Re: Errors on missing pg_subtrans/ files with 9.3

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: J Smith <dark(dot)panda+lists(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Errors on missing pg_subtrans/ files with 9.3
Date: 2013-11-25 11:47:12
Message-ID: 20131125114712.GI28145@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2013-11-24 16:56:26 -0500, J Smith wrote:
> coredumper worked like a charm. Useful tool, that is... although as a
> bit of advice, I'd try not to run it on Postgres if your various
> memory settings are tweaked towards production use -- the core dump
> that was captured on my server weighed in at 16 GB.

> Nov 23 14:38:32 dev postgres[23810]: [4-1] user=dev,db=dev ERROR: could not access status of transaction 13514992
> Nov 23 14:38:32 dev postgres[23810]: [4-2] user=dev,db=dev DETAIL: Could not open file "pg_subtrans/00CE": Success.
> Nov 23 14:38:32 dev postgres[23810]: [4-3] user=dev,db=dev CONTEXT: SQL statement "SELECT 1 FROM ONLY "dev"."collection_batches" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"

Ok, this is helpful. Do you rather longrunning transactions? The
transaction that does foreign key checks has an xid of 10260613, while
the row that's getting checked has 13514992.

#4 0x0000000000635dc7 in XactLockTableWait (xid=13514992) at lmgr.c:501
tag = {locktag_field1 = 13514992, locktag_field2 = 0, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 4 '\004', locktag_lockmethodid = 1 '\001'}
#5 0x0000000000482223 in heap_lock_updated_tuple_rec (rel=0x2b20f050a8d0, tuple=<value optimized out>, ctid=<value optimized out>, xid=10260613, mode=LockTupleKeyShare) at heapam.c:4847

I am not sure whether that's the origin of the problem but at the very
least it seems to me that heap_lock_updated_tuple_rec() is missing
several important pieces:
a) do the priorXmax==xmin dance to check we're still following the same
ctid chain. Currently we could easily stumble across completely
unrelated tuples if a chain element aborted and got vacuumed.
b) Check whether a chain element actually aborted - currently we're
only doing that in the HEAP_KEYS_UPDATED updated case, but that seems
wrong (we can't check for committed tho!).
c) (reported separately as well) cope with failure of heap_fetch() to
return anything.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sawada Masahiko 2013-11-25 12:02:47 Re: Logging WAL when updating hintbit
Previous Message Simon Riggs 2013-11-25 10:00:09 Re: Sequence Access Method WIP