Postgres error when adding new page

Lists: pgsql-general
From: Marco Craveiro <marco(dot)craveiro(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Postgres error when adding new page
Date: 2012-10-01 13:47:17
Message-ID: CAKRNd4yN+-yYKgYcTovc8aKO1ZLFZpYSjzMHX+vxWOit2zTTAQ@mail.gmail.com

Hello Postgres general

We're experiencing a lot of errors when using CDash on PostgreSQL 9.1,
hosted on Mac OSX 10.6.8. The actual error message is as follows:

SQL error in Cannot insert test:
utility/asserter/assert_file_returns_true_for_empty_files into the
database():ERROR: failed to add old item to the right sibling while
splitting block 191 of index "crc323"<br>

After some investigation, it appears the error is coming from the guts of
Postgres: src/backend/access/nbtree/nbtinsert.c:1077:

    if (!_bt_pgaddtup(rightpage, itemsz, item, rightoff))
    {
        memset(rightpage, 0, BufferGetPageSize(rbuf));
        elog(ERROR, "failed to add old item to the right sibling"
             " while splitting block %u of index \"%s\"",
             origpagenumber, RelationGetRelationName(rel));
    }
    rightoff = OffsetNumberNext(rightoff);

I'm a bit stuck from here on. Is the likely reason for this problem
filesystem corruption or am I barking up the wrong tree?

Many thanks for your time.

Marco
--
So young, and already so unknown -- Pauli

blog: http://mcraveiro.blogspot.com


From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Marco Craveiro <marco(dot)craveiro(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres error when adding new page
Date: 2012-10-01 14:02:47
Message-ID: CAEYLb_V0WGG2sOrBgZwa-+PAkOUjQdOJtYsJ78HAB-ugm2Wa-A@mail.gmail.com

On 1 October 2012 14:47, Marco Craveiro <marco(dot)craveiro(at)gmail(dot)com> wrote:
> Hello Postgres general
>
> We're experiencing a lot of errors when using CDash on PostgreSQL 9.1,
> hosted on Mac OSX 10.6.8. The actual error message is as follows:
>
> SQL error in Cannot insert test:
> utility/asserter/assert_file_returns_true_for_empty_files into the
> database():ERROR: failed to add old item to the right sibling while
> splitting block 191 of index "crc323"<br>

A call to PageAddItem(), made within _bt_pgaddtup(), is where this
failure seems to ultimately originate from. What we're missing here is
the reason for PageAddItem() returning InvalidOffsetNumber. That is
usually, though not necessarily, separately available within a WARNING
log message, which you haven't included here. Could you please let us
know if there is a WARNING that you didn't include just prior to the
ERROR?
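
As a rough illustration of the out-of-space case, here is a toy model in
C; it is not the PostgreSQL source. ToyPage, toy_page_init() and
toy_page_add_item() are hypothetical names, and the 256-byte page and
4-byte line pointer are assumed sizes chosen to keep the example small.
It only sketches how an add-item routine can report failure by returning
an invalid offset, without logging anything, when the item does not fit:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed toy sizes, picked for illustration only. */
#define TOY_PAGE_SIZE         256
#define TOY_LINE_POINTER_SIZE 4
#define TOY_INVALID_OFFSET    0   /* stands in for InvalidOffsetNumber */

/* Hypothetical, heavily simplified page: line pointers grow up from
 * "lower", item data grows down from "upper". */
typedef struct ToyPage
{
    size_t        lower;
    size_t        upper;
    unsigned char data[TOY_PAGE_SIZE];
} ToyPage;

static void
toy_page_init(ToyPage *page)
{
    assert(page != NULL);
    memset(page->data, 0, sizeof page->data);
    page->lower = 0;
    page->upper = TOY_PAGE_SIZE;
}

/* Returns the 1-based offset of the new item, or TOY_INVALID_OFFSET if
 * the item plus its line pointer no longer fit. Note that the failure
 * path emits no message; the caller has to raise the error itself. */
static unsigned
toy_page_add_item(ToyPage *page, const void *item, size_t size)
{
    size_t new_lower = page->lower + TOY_LINE_POINTER_SIZE;

    if (size > page->upper || new_lower > page->upper - size)
        return TOY_INVALID_OFFSET;

    page->upper -= size;
    memcpy(page->data + page->upper, item, size);
    page->lower = new_lower;
    return (unsigned) (page->lower / TOY_LINE_POINTER_SIZE);
}
```

With these assumed sizes, two 100-byte items fit on a fresh toy page and
a third is refused. During a real split the items being moved should
always fit on the new right sibling, so a refusal there would suggest
that the page contents are inconsistent.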

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


From: Marco Craveiro <marco(dot)craveiro(at)gmail(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres error when adding new page
Date: 2012-10-01 14:24:25
Message-ID: CAKRNd4w_xtqjbcBquphf-UFz1Zbb_uwWBdDXb8us+iEAuks=pA@mail.gmail.com

Peter,

Thanks for your prompt reply.

> A call to PageAddItem(), made within _bt_pgaddtup(), is where this
> failure seems to ultimately originate from. What we're missing here is
> the reason for PageAddItem() returning InvalidOffsetNumber. That is
> usually, though not necessarily, separately available within a WARNING
> log message, which you haven't included here. Could you please let us
> know if there is a WARNING that you didn't include just prior to the
> ERROR?
>

No warning, I'm afraid. These are the statements I see in the Postgres log
file:

2012-10-01 13:09:12 WEST ERROR: failed to add old item to the right
sibling while splitting block 191 of index "crc323"
2012-10-01 13:09:12 WEST STATEMENT: INSERT INTO test
(projectid,crc32,name,path,command,details,output)
VALUES
('2','2548249718','utility/xml/closing_an_open_text_reader_does_not_throw','./projects/utility/spec','e:\cmake\bin\cmake.exe
"-E" "chdir"
"E:/mingw/msys/1.0/home/ctest/build/Continuous/dogen/mingw-1.0.17-i686-gcc-4.7/build/stage/bin"
"E:/mingw/msys/1.0/home/ctest/build/Continuous/dogen/mingw-1.0.17-i686-gcc-4.7/build/stage/bin/dogen_utility_spec"
"--run_test=xml/closing_an_open_text_reader_does_not_throw"','Completed','UnVubmluZyAxIHRlc3QgY2FzZS4uLgoKKioqIE5vIGVycm9ycyBkZXRlY3RlZAo=')

These are repeated several times as CDash keeps on retrying. After a few
retries we succeed (the actual number of retries is variable - 8, 10, etc).

Cheers

Marco
--
So young, and already so unknown -- Pauli

blog: http://mcraveiro.blogspot.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marco Craveiro <marco(dot)craveiro(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres error when adding new page
Date: 2012-10-01 14:42:02
Message-ID: 24057.1349102522@sss.pgh.pa.us

Marco Craveiro <marco(dot)craveiro(at)gmail(dot)com> writes:
> We're experiencing a lot of errors when using CDash on PostgreSQL 9.1,
> hosted on Mac OSX 10.6.8. The actual error message is as follows:

> SQL error in Cannot insert test:
> utility/asserter/assert_file_returns_true_for_empty_files into the
> database():ERROR: failed to add old item to the right sibling while
> splitting block 191 of index "crc323"<br>

> I'm a bit stuck from here on. Is the likely reason for this problem
> filesystem corruption or am I barking up the wrong tree?

This definitely looks like index corruption, but blaming it on the
filesystem might be premature. I'm wondering if this could be an
artifact of the WAL-replay bug fixed in 9.1.6. I'd suggest updating
and then reindexing the index ...

regards, tom lane


From: Marco Craveiro <marco(dot)craveiro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres error when adding new page
Date: 2012-10-01 14:49:35
Message-ID: CAKRNd4wmyOpYkHmEDoHk4A27U0bGZLpEZkzmzFUrafn6JXqA3Q@mail.gmail.com

Hello Tom,

> This definitely looks like index corruption, but blaming it on the
> filesystem might be premature. I'm wondering if this could be an
> artifact of the WAL-replay bug fixed in 9.1.6. I'd suggest updating
> and then reindexing the index ...
>
>
We are running 9.1.2 it seems:

select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 9.1.2 on x86_64-apple-darwin, compiled by i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3), 64-bit

We'll look into upgrading to 9.1.6.

Cheers

Marco
--
So young, and already so unknown -- Pauli

blog: http://mcraveiro.blogspot.com


From: Marco Craveiro <marco(dot)craveiro(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres error when adding new page
Date: 2012-10-03 06:37:05
Message-ID: CAKRNd4xoOkvWQiWap0zhF+SNiqbH=HmBBPBdjKd1k72a6vuiTw@mail.gmail.com

Tom, Peter,

>> I'm wondering if this could be an artifact of the WAL-replay bug fixed in
>> 9.1.6. I'd suggest updating and then reindexing the index ...
>>
>>
> We are running 9.1.2 it seems
>
>
We did a file system check and it all appeared green, at least as far as
OS X is concerned. We then upgraded, but to save time we moved directly
to 9.2.1. This is not ideal for root-causing the underlying problem, but
as we can't afford the time for multiple upgrades, we had to go for it.

CDash is now very happy - no errors in the log. Many thanks for your help.

Cheers

Marco
--
So young, and already so unknown -- Pauli

blog: http://mcraveiro.blogspot.com