Re: ERROR: could not open segment 1 of relation 1663/743352/743420 (target block 6407642): No such file or directory

From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Mike Williams <mike(dot)williams(at)comodo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-admin(at)postgresql(dot)org
Subject: Re: ERROR: could not open segment 1 of relation 1663/743352/743420 (target block 6407642): No such file or directory
Date: 2010-03-30 20:28:09
Message-ID: 34d269d41003301328p760d4cci74da558baf1576b9@mail.gmail.com
Lists: pgsql-admin

On Tue, Mar 30, 2010 at 04:16, Mike Williams <mike(dot)williams(at)comodo(dot)com> wrote:
> Thanks Alex, good to know I've not screwed up the kernel somehow.
>
> I've been using 2.6.32 with grsecurity-2.1.14-2.6.32.9-201002231820 applied.

Looks like the first instance I had of this problem was with
2.6.31.1-rc1-grsec. I know I tried 2.6.32-grsec and various
2.6.32.X-grsecs, but all of those hit this issue at some point. Currently
I'm on a mostly stock 2.6.33.1 with no problems. I have not had the
nerve to try a -grsec kernel on it again.

For reference here are the errors I got:

could not open segment 3 of relation base/4440720/8003730

COPY public.page_loads (cgi, content_length, date_created, defunct,
host, ip, page_load_id, protocol, proxy_ip, referrer, request_method,
sessionid, url, user_id, audit_tid, user_agent_id, action, server) TO
'/tmp/blah.sql';
ERROR: invalid memory alloc request size 18446744073709551613

There were more "could not open segment" errors, but I seem to have lost them.
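
As an aside, the alloc request size in that COPY error is easier to read
as a signed number. A minimal sketch, assuming the size was a 64-bit
unsigned value (size_t on a 64-bit build); the helper name as_signed_64
is made up for illustration:

```python
# Reinterpret the reported allocation size as a signed 64-bit integer.
# Assumption: the size was printed from a 64-bit unsigned value.

def as_signed_64(value: int) -> int:
    """Return the two's-complement signed reading of an unsigned 64-bit value."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value

reported = 18446744073709551613  # size from the COPY error above
print(as_signed_64(reported))    # -3
```

So the request is 2^64 - 3, i.e. a length of -3 read as unsigned. That
looks more like a garbage length field than a genuinely huge value,
though that reading is only a guess.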

Normally I would think the above is corrupt data, but the same query would
sometimes work, and it *always* worked on the non-grsec kernel. So
instead it smells like bad RAM. But the box has ECC RAM and survived
multiple runs of memtest and various versions of memtest86+. [ Yeah, I know
people, including me, have seen RAM that passes all that and is still
bad. ]

Since you are having similar problems with a -grsec kernel, it sounds like
there might be some kind of memory corruption bug in it. I would
recommend trying a stock kernel and seeing if the problem goes away.
I also think the general attitude here is that if you run crazy security
patches, you get to keep both pieces. :)

Another fact that seemed to point to bad RAM or some kind of kernel
corruption came up while trying to find the bad row that COPY reported above:

SELECT count(*) from (select * from page_loads order by page_load_id
desc limit 937980) as foo;
ERROR: could not open segment 3 of relation base/4440720/8003730
(target block 4680336): No such file or directory

SELECT count(*) from ( select * from page_loads order by page_load_id
desc limit 937970) as foo;
count
--------
937970

<limits 937971 through 937978 snipped; all worked>

SELECT count(*) from (select * from page_loads order by page_load_id
desc limit 937979) as foo;
count
--------
937979

-- Uhh, this was just broken...
SELECT count(*) from (select * from page_loads order by page_load_id
desc limit 937980) as foo;
count
--------
937980

I thought I had some stack traces, but they are not in my notes. I
do remember tracing through them and coming to the conclusion that it's
most likely some kind of kernel bug. (That error can only happen if we
try to open a file that does not exist, but we can only get that far
if the file existed, or some such.) Sorry, I'm a bit hazy; this was back
in September.
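
For anyone reading the errors above, the target block numbers can be
mapped to segment files with a little arithmetic. A minimal sketch,
assuming the default 8 kB block size and 1 GB segment size (neither is
confirmed for these servers); locate_block is a made-up helper name:

```python
# Map a relation block number to (segment number, block within segment).
# Assumptions: default BLCKSZ of 8192 bytes and default 1 GB segments,
# which gives 131072 blocks per segment file.

BLOCK_SIZE = 8192
SEGMENT_BLOCKS = (1024 ** 3) // BLOCK_SIZE  # 131072

def locate_block(block: int) -> tuple[int, int]:
    """Return (segment number, block offset inside that segment)."""
    return divmod(block, SEGMENT_BLOCKS)

# Target blocks taken from the two error messages in this thread.
for block in (4680336, 6407642):
    segment, offset = locate_block(block)
    print(f"block {block}: segment {segment}, offset {offset}")
```

Under those assumptions, block 4680336 would sit in segment 35 and block
6407642 in segment 48, both well past the segment numbers named in the
errors (3 and 1).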
