missing chunk 0 for toast value ...

Lists: pgsql-general
From: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: missing chunk 0 for toast value ...
Date: 2011-10-25 20:47:27
Message-ID: CAAC-ZNU00km2bfvrcGW8nURXnoB16uMvb5T=bSPCBTcCWNQ8XA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I found the following error message in my logfiles.

Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
user=nutricate,db=admin_production ERROR: missing chunk number 0 for
toast value
2411466 in pg_toast_2619
Oct 24 04:05:57 db-app02 postgres[24640]: [2-2]
user=nutricate,db=admin_production STATEMENT: SELECT "devices"."id",
"devices"."name", "devices"."info", "devices"."pos_id",
"devices"."device_group_id", "devices"."header_id", "devices"."footer_id",
"devices"."device_type_id", "devices"."auth_code", "devices"."pos_comm_id",
"devices"."printer_comm_id", "devices"."sw_version", "devices"."hw_version",
"devices"."device_status", "devices"."entity_id", "devices"."address",
"devices"."created_by", "devices"."create_method", "devices"."created_on",
"devices"."modified_by", "devices"."updated_on", "devices"."version_id",
"devices"."unique_id", "devices"."hw_key", "devices"."config_status",
"devices"."activated_on", "devices"."store_id",
"devices"."last_status_update", "devices"."loaded_on",
"devices"."header2_id", "devices"."footer2_id", "devices"."timezone_id",
"devices"."scanner_comm_id", "devices"."public_address",
"devices"."hostname", "devices"."update_sw_version",
"devices"."proxy_address", "devices"."proxy_type_id",
"devices"."build_error",
Oct 24 04:05:57 db-app02 postgres[24640]: [2-3] "devices"."local_info",
"devices"."associated_on" FROM "devices" WHERE ("devices"."entity_id" IN
(SELECT U0."id" FROM "entities" U0 WHERE ((U0."lft" > 280 AND U0."rgt" <
2597 ) OR U0."id" = 140 )) AND "devices"."auth_code" = E'0063' )

I tried reproducing it with

SELECT * FROM devices WHERE ("devices"."entity_id" IN (SELECT U0."id" FROM
"entities" U0 WHERE ((U0."lft" > 280 AND U0."rgt" < 2597 ) OR U0."id" = 140
)) AND "devices"."auth_code" = E'0063' )

This returned no rows and did not reproduce the error. I tried reproducing
with an even broader approach

CREATE TEMP TABLE foo AS SELECT * FROM devices;
CREATE TEMP TABLE bar AS SELECT * FROM entities;

Still no repro. This is the second time I've seen this error, and the last
time it also did not reproduce. I'm wondering a couple of things. First, is
there some other way to reproduce this error? Second, would simply deleting
the row cause the problem to go away? I wouldn't think so, but why else is
it not reproducing? Third, what is the recommended course of action here. We
have hourly backups, but this is a production database, so I would prefer
not to have to restore. I tested a restore from an hour ago and it ran
successfully.

A


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: missing chunk 0 for toast value ...
Date: 2011-10-25 21:07:39
Message-ID: CAHyXU0yjmNTcRQhzr8tdzF0F3pWspX_nWpQjRQdFLK5OVO5xDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 25, 2011 at 3:47 PM, Andrew Hammond
<andrew(dot)george(dot)hammond(at)gmail(dot)com> wrote:
> I found the following error message in my logfiles.
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
> user=nutricate,db=admin_production ERROR:  missing chunk number 0
> for toast value 2411466 in pg_toast_2619
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-2]
> user=nutricate,db=admin_production STATEMENT:  SELECT "devices"."id",
> "devices"."name", "devices"."info", "devices"."pos_id",
> "devices"."device_group_id", "devices"."header_id", "devices"."footer_id",
> "devices"."device_type_id", "devices"."auth_code", "devices"."pos_comm_id",
> "devices"."printer_comm_id", "devices"."sw_version", "devices"."hw_version",
> "devices"."device_status", "devices"."entity_id", "devices"."address",
> "devices"."created_by", "devices"."create_method", "devices"."created_on",
> "devices"."modified_by", "devices"."updated_on", "devices"."version_id",
> "devices"."unique_id", "devices"."hw_key", "devices"."config_status",
> "devices"."activated_on", "devices"."store_id",
> "devices"."last_status_update", "devices"."loaded_on",
> "devices"."header2_id", "devices"."footer2_id", "devices"."timezone_id",
> "devices"."scanner_comm_id", "devices"."public_address",
> "devices"."hostname", "devices"."update_sw_version",
> "devices"."proxy_address", "devices"."proxy_type_id",
> "devices"."build_error",
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-3]  "devices"."local_info",
> "devices"."associated_on" FROM "devices" WHERE ("devices"."entity_id" IN
> (SELECT U0."id" FROM "entities" U0 WHERE ((U0."lft" > 280  AND U0."rgt" <
> 2597 ) OR U0."id" = 140 )) AND "devices"."auth_code" = E'0063' )
> I tried reproducing it with
> SELECT * FROM devices WHERE ("devices"."entity_id" IN (SELECT U0."id" FROM
> "entities" U0 WHERE ((U0."lft" > 280  AND U0."rgt" < 2597 ) OR U0."id" = 140
> )) AND "devices"."auth_code" = E'0063' )
> This returned no rows and did not reproduce the error. I tried reproducing
> with an even broader approach
> CREATE TEMP TABLE foo AS SELECT * FROM devices;
> CREATE TEMP TABLE bar AS SELECT * FROM entities;
> Still no repro. This is the second time I've seen this error, and the last
> time it also did not reproduce. I'm wondering a couple of things. First, is
> there some other way to reproduce this error? Second, would simply deleting
> the row cause the problem to go away? I wouldn't think so, but why else is
> it not reproducing? Third, what is the recommended course of action here. We
> have hourly backups, but this is a production database, so I would prefer
> not to have to restore. I tested a restore from an hour ago and it ran
> successfully.

hrm, that's bizarre -- version etc?

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: missing chunk 0 for toast value ...
Date: 2011-10-25 21:20:16
Message-ID: 28935.1319577616@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
> Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
> user=nutricate,db=admin_production ERROR: missing chunk number 0 for
> toast value
> 2411466 in pg_toast_2619

Hmm ... pg_toast_2619 is pg_statistic's toast table. Is this 9.0.x,
and are you by any chance in the habit of running CLUSTER or VACUUM FULL
on your system catalogs? Could one have been running in the background
when this happened?

regards, tom lane


From: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: missing chunk 0 for toast value ...
Date: 2011-10-25 22:12:27
Message-ID: CAAC-ZNV7+oW+geDtRNyb=dQkA1nuqOTXBcDtL_fZ_FbvM3e-qA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 25, 2011 at 2:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
> > Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
> > user=nutricate,db=admin_production ERROR: missing chunk number 0 for
> > toast value
> > 2411466 in pg_toast_2619
>
> Hmm ... pg_toast_2619 is pg_statistic's toast table. Is this 9.0.x,
> and are you by any chance in the habit of running CLUSTER or VACUUM FULL
> on your system catalogs? Could one have been running in the background
> when this happened?
>

This is 9.0.4.

I am not aware of any VACUUM FULL or CLUSTER operations that are scheduled.
To the best of my knowledge this DB has never been either clustered or
vacuum full'd. The following occur in the log file immediately before the
error message above.

Oct 24 04:05:57 db-app02 postgres[24639]: [2-1] user=,db= LOG: automatic
vacuum of table "admin_production.pg_catalog.pg_statistic": index scans: 0
Oct 24 04:05:57 db-app02 postgres[24639]: [2-2] #011pages: 0 removed, 150
remain
Oct 24 04:05:57 db-app02 postgres[24639]: [2-3] #011tuples: 254 removed, 925
remain
Oct 24 04:05:57 db-app02 postgres[24639]: [2-4] #011system usage: CPU
0.01s/0.00u sec elapsed 0.02 sec
Oct 24 04:05:57 db-app02 postgres[24639]: [3-1] user=,db= LOG: automatic
vacuum of table "admin_production.public.devices": index scans: 0
Oct 24 04:05:57 db-app02 postgres[24639]: [3-2] #011pages: 0 removed, 353
remain
Oct 24 04:05:57 db-app02 postgres[24639]: [3-3] #011tuples: 824 removed,
2261 remain
Oct 24 04:05:57 db-app02 postgres[24639]: [3-4] #011system usage: CPU
0.00s/0.00u sec elapsed 0.08 sec
Oct 24 04:05:57 db-app02 postgres[24639]: [4-1] user=,db= LOG: automatic
analyze of table "admin_production.public.devices" system usage: CPU
0.00s/0.08u sec elapsed 0.17 sec


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: missing chunk 0 for toast value ...
Date: 2011-10-25 23:11:26
Message-ID: 717.1319584286@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
> The following occur in the log file immediately before the
> error message above.

Hmm, that's pretty interesting ... are there any nearby autovacuums of
pg_toast_2619?

regards, tom lane


From: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: missing chunk 0 for toast value ...
Date: 2011-10-25 23:33:41
Message-ID: CAAC-ZNUP34YSvtmGYpOS=Ofh0bL3GNSyj+xjz7Zz_7fgVa2pig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 25, 2011 at 4:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
> > The following occur in the log file immediately before the
> > error message above.
>
> Hmm, that's pretty interesting ... are there any nearby autovacuums of
> pg_toast_2619?
>

Several, both before and after the error message:

Oct 24 03:49:57 db-app02 postgres[23554]: [4-1] user=,db= LOG: automatic
vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 03:49:57 db-app02 postgres[23554]: [4-2] #011pages: 0 removed, 206
remain
Oct 24 03:49:57 db-app02 postgres[23554]: [4-3] #011tuples: 220 removed, 608
remain
Oct 24 03:49:57 db-app02 postgres[23554]: [4-4] #011system usage: CPU
0.00s/0.00u sec elapsed 0.10 sec

Oct 24 03:53:57 db-app02 postgres[23800]: [5-1] user=,db= LOG: automatic
vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 03:53:57 db-app02 postgres[23800]: [5-2] #011pages: 0 removed, 206
remain
Oct 24 03:53:57 db-app02 postgres[23800]: [5-3] #011tuples: 220 removed, 608
remain
Oct 24 03:53:57 db-app02 postgres[23800]: [5-4] #011system usage: CPU
0.01s/0.00u sec elapsed 0.12 sec

Oct 24 03:57:57 db-app02 postgres[24059]: [4-1] user=,db= LOG: automatic
vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 03:57:57 db-app02 postgres[24059]: [4-2] #011pages: 0 removed, 206
remain
Oct 24 03:57:57 db-app02 postgres[24059]: [4-3] #011tuples: 220 removed, 608
remain
Oct 24 03:57:57 db-app02 postgres[24059]: [4-4] #011system usage: CPU
0.00s/0.00u sec elapsed 0.04 sec

Oct 24 04:01:57 db-app02 postgres[24394]: [4-1] user=,db= LOG: automatic
vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:01:57 db-app02 postgres[24394]: [4-2] #011pages: 0 removed, 206
remain
Oct 24 04:01:57 db-app02 postgres[24394]: [4-3] #011tuples: 220 removed, 608
remain
Oct 24 04:01:57 db-app02 postgres[24394]: [4-4] #011system usage: CPU
0.00s/0.01u sec elapsed 0.04 sec

Oct 24 04:05:57 db-app02 postgres[24640]: [2-1]
user=nutricate,db=admin_production ERROR: missing chunk number 0 for toast
value 2411466 in pg_toast_2619

Oct 24 04:05:57 db-app02 postgres[24639]: [6-1] user=,db= LOG: automatic
vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:05:57 db-app02 postgres[24639]: [6-2] #011pages: 0 removed, 206
remain
Oct 24 04:05:57 db-app02 postgres[24639]: [6-3] #011tuples: 216 removed, 608
remain
Oct 24 04:05:57 db-app02 postgres[24639]: [6-4] #011system usage: CPU
0.00s/0.00u sec elapsed 0.08 sec

Oct 24 04:09:57 db-app02 postgres[24877]: [5-1] user=,db= LOG: automatic
vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:09:57 db-app02 postgres[24877]: [5-2] #011pages: 0 removed, 206
remain
Oct 24 04:09:57 db-app02 postgres[24877]: [5-3] #011tuples: 220 removed, 608
remain
Oct 24 04:09:57 db-app02 postgres[24877]: [5-4] #011system usage: CPU
0.00s/0.00u sec elapsed 0.10 sec

Oct 24 04:13:57 db-app02 postgres[25116]: [6-1] user=,db= LOG: automatic
vacuum of table "admin_production.pg_toast.pg_toast_2619": index scans: 1
Oct 24 04:13:57 db-app02 postgres[25116]: [6-2] #011pages: 0 removed, 206
remain
Oct 24 04:13:57 db-app02 postgres[25116]: [6-3] #011tuples: 220 removed, 608
remain
Oct 24 04:13:57 db-app02 postgres[25116]: [6-4] #011system usage: CPU
0.00s/0.00u sec elapsed 0.12 sec


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: missing chunk 0 for toast value ...
Date: 2011-10-25 23:53:30
Message-ID: 1418.1319586810@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
> On Tue, Oct 25, 2011 at 4:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hmm, that's pretty interesting ... are there any nearby autovacuums of
>> pg_toast_2619?

> Several, both before and after the error message:

Well, it seems clear that somehow the vacuum deleted a toast tuple that
the other statement was about to fetch, but it's not clear how this
could be. The pg_statistic fetch must have come from the planner, which
should always be called with a transaction snapshot established, and
that ought to protect it against vacuum deleting anything that could be
visible to SnapshotNow. Weird.

[ pokes around for a bit ... ] Hmm, can you say how the failing query
was submitted, exactly? I'm wondering if it came in via simple Query
(PQexec) or extended-query protocol (anything with parameters).

regards, tom lane


From: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: missing chunk 0 for toast value ...
Date: 2011-10-26 00:04:46
Message-ID: CAAC-ZNX7L9ShEJbg1L-LYyWkp65RCOFYM=EPdFJ+miDNC7Dcuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Oct 25, 2011 at 4:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
> > On Tue, Oct 25, 2011 at 4:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Hmm, that's pretty interesting ... are there any nearby autovacuums of
> >> pg_toast_2619?
>
> > Several, both before and after the error message:
>
> Well, it seems clear that somehow the vacuum deleted a toast tuple that
> the other statement was about to fetch, but it's not clear how this
> could be. The pg_statistic fetch must have come from the planner, which
> should always be called with a transaction snapshot established, and
> that ought to protect it against vacuum deleting anything that could be
> visible to SnapshotNow. Weird.
>
> [ pokes around for a bit ... ] Hmm, can you say how the failing query
> was submitted, exactly? I'm wondering if it came in via simple Query
> (PQexec) or extended-query protocol (anything with parameters).
>

The command was sent from a python via django-1.2.1 using psycopg2-2.2.1

Andrew


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: missing chunk 0 for toast value ...
Date: 2011-10-26 08:42:13
Message-ID: CA+mi_8bF6mUxMMKa1q8Ci+WUoZxJq6QxuSi2WAOg5=YgA_jdeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Oct 26, 2011 at 1:04 AM, Andrew Hammond
<andrew(dot)george(dot)hammond(at)gmail(dot)com> wrote:
> On Tue, Oct 25, 2011 at 4:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>> [ pokes around for a bit ... ]  Hmm, can you say how the failing query
>> was submitted, exactly?  I'm wondering if it came in via simple Query
>> (PQexec) or extended-query protocol (anything with parameters).
>
> The command was sent from a python via django-1.2.1 using psycopg2-2.2.1

PQexec then.

-- Daniele


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: missing chunk 0 for toast value ...
Date: 2011-10-26 14:39:34
Message-ID: 12734.1319639974@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> writes:
> On Wed, Oct 26, 2011 at 1:04 AM, Andrew Hammond
> <andrew(dot)george(dot)hammond(at)gmail(dot)com> wrote:
>> On Tue, Oct 25, 2011 at 4:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> [ pokes around for a bit ... ] Hmm, can you say how the failing query
>>> was submitted, exactly? I'm wondering if it came in via simple Query
>>> (PQexec) or extended-query protocol (anything with parameters).

>> The command was sent from a python via django-1.2.1 using psycopg2-2.2.1

> PQexec then.

Thanks for the followup. My current theory is that that's unrelated,
though --- it now seems that this is just a race condition in the use of
toasted pg_statistic entries. See
http://archives.postgresql.org/pgsql-hackers/2011-10/msg01366.php

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: missing chunk 0 for toast value ...
Date: 2011-11-01 23:56:02
Message-ID: 116.1320191762@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
> On Tue, Oct 25, 2011 at 2:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hmm ... pg_toast_2619 is pg_statistic's toast table. Is this 9.0.x,
>> and are you by any chance in the habit of running CLUSTER or VACUUM FULL
>> on your system catalogs? Could one have been running in the background
>> when this happened?

> This is 9.0.4.

I've applied a patch for this:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7f797d27fe0be5200ad5fd5af6cefcee30c8e24a

It turns out that the problem is very significantly harder to reproduce
pre-9.0, which may explain why you hadn't seen it before ... but it does
exist, at least as far back as 8.2, and probably in every version that
has TOAST tables.

regards, tom lane