Re: Corrupt database? 8.1/FreeBSD6.0

Lists: pgsql-generalpgsql-hackers
From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 03:06:47
Message-ID: 20070112030647.59137.qmail@web60824.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Looking backwards in the logs we see it a few other times this month...
(Autovacuum occurring just prior)...same transaction ID
How could it be the same transaction ID from several days prior?

Jan 2 03:05:04 prod-app-1 postgres[8524]: [4-1] 8524 LOG: autovacuum: processing database "template0"
Jan 2 03:05:05 prod-app-1 postgres[8524]: [5-1] 8524 ERROR: could not access status of transaction 1924023481
Jan 2 03:05:05 prod-app-1 postgres[8524]: [5-2] 8524 DETAIL: could not open file "pg_clog/072A": No such file or
directory

Jeff Amiel <becauseimjeff(at)yahoo(dot)com> wrote: "PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518"

Started seeing this in the logs this afternoon...scaring the begeezus out of me.

Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory

I could find nothing any transaction refereced in pg_locks...nor could I find a file with that designation (was always the same transaction id)

ls -l /db/pg_clog/
total 984
-rw------- 1 pgsql wheel 262144 Jan 11 09:55 07CF
-rw------- 1 pgsql wheel 262144 Jan 11 13:45 07D0
-rw------- 1 pgsql wheel 262144 Jan 11 17:00 07D1
-rw------- 1 pgsql wheel 172032 Jan 11 20:39 07D2

At first I thought it was related to a constantly running batch process which I halted, and sure enough, the problem seemed to go away. If I restarted, the problem returned.

I then tried select * from on most of the tables used by that process (at least the last days worth) and found no obvious issues or errors.
About that time, I noticed that the errors were all IMMEDIATELY preceeded by an autovacuum of template0. So the logs actually looked like this:

Jan 11 19:20:19 prod-app-1 postgres[1752]: [4-1] 1752 LOG: autovacuum: processing database "template0"
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:24:35 prod-app-1 postgres[4094]: [4-1] 4094 LOG: autovacuum: processing database "template0"
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:28:35 prod-app-1 postgres[6728]: [4-1] 6728 LOG: autovacuum: processing database "template0"
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory

We've downloaded/compiled pg_filedump, but are stumped as to what relation (or even what database) to start with.

I restarted the batch process that I thought was the culprit and the issue has 'gone away'. I'm sure there is some corruption somewhere (from reading various similar posts), but not sure where to start.

Any help would be appreciated.

---------------------------------
Access over 1 million songs - Yahoo! Music Unlimited.


---------------------------------
Check out the all-new Yahoo! Mail beta - Fire up a more powerful email and get things done faster.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 04:37:19
Message-ID: 523.1168576639@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> "PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518"

> Jan 2 03:05:04 prod-app-1 postgres[8524]: [4-1] 8524 LOG: autovacuum: processing database "template0"
> Jan 2 03:05:05 prod-app-1 postgres[8524]: [5-1] 8524 ERROR: could not access status of transaction 1924023481
> Jan 2 03:05:05 prod-app-1 postgres[8524]: [5-2] 8524 DETAIL: could not open file "pg_clog/072A": No such file or
> directory

> ls -l /db/pg_clog/
> total 984
> -rw------- 1 pgsql wheel 262144 Jan 11 09:55 07CF
> -rw------- 1 pgsql wheel 262144 Jan 11 13:45 07D0
> -rw------- 1 pgsql wheel 262144 Jan 11 17:00 07D1
> -rw------- 1 pgsql wheel 172032 Jan 11 20:39 07D2

So apparently there's some row in template0 that didn't get marked
committed before the pg_clog segment for it went away. Given 8.1's
rather schizophrenic view of whether it can modify template0 or not,
this is not too surprising, but I thought we'd put in some defense
against this happening. Alvaro?

Jeff, had you changed your autovac settings recently? Given that
autovac seems to be trying to hit template0 every few minutes, it's
hard to see how the tuple got missed for long enough to be a problem.

regards, tom lane


From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 12:28:05
Message-ID: 20070112122805.20949.qmail@web60812.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:So apparently there's some row in template0 that didn't get marked
committed before the pg_clog segment for it went away. Given 8.1's
rather schizophrenic view of whether it can modify template0 or not,
this is not too surprising, but I thought we'd put in some defense
against this happening. Alvaro?

Jeff, had you changed your autovac settings recently? Given that
autovac seems to be trying to hit template0 every few minutes, it's
hard to see how the tuple got missed for long enough to be a problem.

Sure enough I did make autovacuum more agressive about 30-45 days ago (have to check the logs to find the exact date). Was originally whatever default settings that came out of the box with 8.1. Naptime is currently set to 60 seconds.

Am I to assume that this probably isn't the result of some operating system or filesystem misfeasance and that corrupt data in my 'real' databases is probably not an issue?


---------------------------------
Now that's room service! Choose from over 150,000 hotels
in 45,000 destinations on Yahoo! Travel to find your fit.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 12:47:27
Message-ID: 20070112124727.GJ10446@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Jeff Amiel wrote:

> Sure enough I did make autovacuum more agressive about 30-45 days ago (have to check the logs to find the exact date). Was originally whatever default settings that came out of the box with 8.1. Naptime is currently set to 60 seconds.
>
> Am I to assume that this probably isn't the result of some operating system or filesystem misfeasance and that corrupt data in my 'real' databases is probably not an issue?

Did you perchance connect to template0 (probably marking it as
connectable in the process), made some modification there, and then
mark it as non-connectable again, without executing VACUUM FREEZE on it?
AFAICS we only execute VACUUM FREEZE on it, so we shouldn't leave any
unfrozen tuples.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 13:48:35
Message-ID: 952344.50065.qm@web60819.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
Did you perchance connect to template0 (probably marking it as
connectable in the process), made some modification there, and then
mark it as non-connectable again, without executing VACUUM FREEZE on it?
AFAICS we only execute VACUUM FREEZE on it, so we shouldn't leave any
unfrozen tuples.

Nope. Have never touched template0...haven't made any changes to pg_hba.conf (which is how I assume you would make template0 connectable) until last night when we determined that template0 might be at fault and we were going to run pg_filedump on it to see if we could find anything useful.

I assume this is similar/identical to this issue:
http://archives.postgresql.org/pgsql-hackers/2006-03/msg01294.php

Any ideas on how should I move forward?


---------------------------------
Finding fabulous fares is fun.
Let Yahoo! FareChase search your favorite travel sites to find flight and hotel bargains.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 15:00:37
Message-ID: 6764.1168614037@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> Any ideas on how should I move forward?

Well, if the problem is indeed in pg_statistic, it'll be easy to repair
(think TRUNCATE...). Have you turned up the logging level to find out?

BTW, please don't do anything to try to correct the problem until we're
pretty sure we understand how this happened --- we might ask you for
more info. AFAICS this isn't having any bad effects except for bleats
in your log file, so you can wait.

regards, tom lane


From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 15:16:42
Message-ID: 95812.43921.qm@web60813.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
Well, if the problem is indeed in pg_statistic, it'll be easy to repair
(think TRUNCATE...). Have you turned up the logging level to find out?

BTW, please don't do anything to try to correct the problem until we're
pretty sure we understand how this happened --- we might ask you for
more info. AFAICS this isn't having any bad effects except for bleats
in your log file, so you can wait.
We turned up the logging this morning.....(debug2) and are awaiting the issue to re-occur. Will report back then. Thanks.


---------------------------------
It's here! Your new message!
Get new email alerts with the free Yahoo! Toolbar.


From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-13 14:58:43
Message-ID: 587309.33567.qm@web60816.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
BTW, please don't do anything to try to correct the problem until we're
pretty sure we understand how this happened --- we might ask you for
more info. AFAICS this isn't having any bad effects except for bleats
in your log file, so you can wait.

Happened again...however this time not appearingly related to an autovacuum. I have a past example of this almost identical event a few weeks ago. Here's what the logs look like with debug2 as the log_min level:

Jan 13 08:26:53 prod-app-1 postgres[41795]: [40171-1] 41795 LOG: autovacuum: processing database "template1"
Jan 13 08:26:53 prod-app-1 postgres[560]: [40171-1] 560 DEBUG: server process (PID 41794) exited with exit code 0
Jan 13 08:26:57 prod-app-1 postgres[563]: [915-1] 563 DEBUG: checkpoint starting
Jan 13 08:26:57 prod-app-1 postgres[563]: [916-1] 563 DEBUG: recycled transaction log file "000000010000005D00000069"
Jan 13 08:26:57 prod-app-1 postgres[563]: [917-1] 563 DEBUG: checkpoint complete; 0 transaction log file(s) added, 0 removed, 1 recycled
Jan 13 08:27:02 prod-app-1 postgres[560]: [40172-1] 560 DEBUG: forked new backend, pid=42368 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40173-1] 560 DEBUG: forked new backend, pid=42369 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40174-1] 560 DEBUG: forked new backend, pid=42370 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40175-1] 560 DEBUG: server process (PID 42370) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40176-1] 560 DEBUG: forked new backend, pid=42371 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40177-1] 560 DEBUG: server process (PID 42369) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40178-1] 560 DEBUG: server process (PID 42371) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40179-1] 560 DEBUG: forked new backend, pid=42372 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40180-1] 560 DEBUG: server process (PID 42372) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40181-1] 560 DEBUG: forked new backend, pid=42373 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40182-1] 560 DEBUG: server process (PID 42373) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40183-1] 560 DEBUG: forked new backend, pid=42374 socket=8
Jan 13 08:27:02 prod-app-1 postgres[560]: [40184-1] 560 DEBUG: server process (PID 42374) exited with exit code 0
Jan 13 08:27:02 prod-app-1 postgres[560]: [40185-1] 560 DEBUG: server process (PID 42368) exited with exit code 0
Jan 13 08:27:08 prod-app-1 postgres[560]: [40186-1] 560 DEBUG: forked new backend, pid=42375 socket=8
Jan 13 08:27:08 prod-app-1 postgres[560]: [40187-1] 560 DEBUG: server process (PID 42375) exited with exit code 0
Jan 13 08:27:23 prod-app-1 postgres[560]: [40188-1] 560 DEBUG: forked new backend, pid=42376 socket=8
Jan 13 08:27:23 prod-app-1 postgres[560]: [40189-1] 560 DEBUG: server process (PID 42376) exited with exit code 0
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-1] jboss 92257 ERROR: could not access status of transaction 2107200825
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-2] jboss 92257 DETAIL: could not open file "pg_clog/07D9": No such file or directory
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-3] jboss 92257 CONTEXT: SQL statement "DELETE FROM audit_metadata WHERE user_id <> -1"
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-4] PL/pgSQL function "disable_auditing" line 2 at SQL statement
Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-5] jboss 92257 STATEMENT: select disable_auditing()
Jan 13 08:27:38 prod-app-1 postgres[560]: [40190-1] 560 DEBUG: forked new backend, pid=42377 socket=8
Jan 13 08:27:38 prod-app-1 postgres[560]: [40191-1] 560 DEBUG: server process (PID 42377) exited with exit code 0
Jan 13 08:27:49 prod-app-1 postgres[560]: [40192-1] 560 DEBUG: forked new backend, pid=42378 socket=8
Jan 13 08:27:50 prod-app-1 postgres[560]: [40193-1] 560 DEBUG: forked new backend, pid=42379 socket=8
Jan 13 08:27:50 prod-app-1 postgres[560]: [40194-1] 560 DEBUG: forked new backend, pid=42380 socket=8
Jan 13 08:27:53 prod-app-1 postgres[560]: [40195-1] 560 DEBUG: forked new backend, pid=42381 socket=8
Jan 13 08:27:53 prod-app-1 postgres[42382]: [40196-1] 42382 LOG: autovacuum: processing database "postgres"
Jan 13 08:27:53 prod-app-1 postgres[560]: [40196-1] 560 DEBUG: server process (PID 42381) exited with exit code 0
Jan 13 08:28:02 prod-app-1 postgres[560]: [40197-1] 560 DEBUG: forked new backend, pid=42951 socket=8
Jan 13 08:28:02 prod-app-1 postgres[560]: [40198-1] 560 DEBUG: forked new backend, pid=42952 socket=8

pg_clog dir looks like this:
-rw------- 1 pgsql wheel 262144 Jan 13 05:41 07DA
-rw------- 1 pgsql wheel 262144 Jan 13 08:06 07DB
-rw------- 1 pgsql wheel 90112 Jan 13 08:51 07DC

Now that table, audit_metadata, is a temporary table (when accessed by jboss as it is here). There is a 'rea'l table with the same name, but only used by batch processes that connect to postgres.

Thoughts?


---------------------------------
Food fight? Enjoy some healthy debate
in the Yahoo! Answers Food & Drink Q&A.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-13 16:54:29
Message-ID: 25129.1168707269@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-1] jboss 92257 ERROR: could not access status of transaction 2107200825
> Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-2] jboss 92257 DETAIL: could not open file "pg_clog/07D9": No such file or directory
> Jan 13 08:27:26 prod-app-1 postgres[92257]: [30259-3] jboss 92257 CONTEXT: SQL statement "DELETE FROM audit_metadata WHERE user_id <> -1"

> pg_clog dir looks like this:
> -rw------- 1 pgsql wheel 262144 Jan 13 05:41 07DA
> -rw------- 1 pgsql wheel 262144 Jan 13 08:06 07DB
> -rw------- 1 pgsql wheel 90112 Jan 13 08:51 07DC

> Now that table, audit_metadata, is a temporary table (when accessed by jboss as it is here). There is a 'rea'l table with the same name, but only used by batch processes that connect to postgres.

Really? Wow, *that's* an interesting thought. Is it likely that that
temp table could contain many-hour-old data?

This seems unrelated to your issue with autovacuum (which should never
touch a temp table, and certainly isn't going to find one in template0)
... but I suddenly fear that we've missed a fundamental point about
pg_clog truncation. And WAL wraparound for that matter. To wit, a
sufficiently long-lived temp table could contain old XIDs, and there's
no way for anyone except the owning backend to clean them out, or even
guarantee that they're marked committed.

Thoughts?

regards, tom lane


From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-13 17:40:41
Message-ID: 758295.45906.qm@web60818.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
Really? Wow, *that's* an interesting thought. Is it likely that that
temp table could contain many-hour-old data?

Certainly...our connection pool used by jboss can have connections to postgres persisting for multiple days. (We're still looking for a way to tell it to recycle these occasionally). As each 'user' of our web based app performs some action, they acquire one of the connection pool connections and set their user_id in the temporary table used by that connection (we use that for our audit triggers) Once they are 'done' with the connection, the connection is just released back to the pool but not actually closed...so the temp table still contains the data from a previous iteration.


---------------------------------
TV dinner still cooling?
Check out "Tonight's Picks" on Yahoo! TV.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-14 04:22:49
Message-ID: 3445.1168748569@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

I wrote:
> ... but I suddenly fear that we've missed a fundamental point about
> pg_clog truncation. And WAL wraparound for that matter. To wit, a
> sufficiently long-lived temp table could contain old XIDs, and there's
> no way for anyone except the owning backend to clean them out, or even
> guarantee that they're marked committed.

After further thought I believe this is OK as of 8.2, because a temp
table's relfrozenxid is tracked independently of any other's. (This
problem puts a stake through the heart of the recently-discussed idea
that a temp table might be able to get along without a globally visible
pg_class entry, however.)

But it seems that we need a band-aid for 8.1 and earlier. The simplest
fix I can think of is for vacuum not to attempt to advance the
datvacuumxid/datfrozenxid fields if it skipped over any temp tables of
other backends. That's a bit nasty, since in a database making heavy
use of temp tables, you might do a whole lot of vacuums without ever
meeting that condition. Anyone have a better idea?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-18 16:42:49
Message-ID: 20070118164249.GO26080@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> I wrote:
> > ... but I suddenly fear that we've missed a fundamental point about
> > pg_clog truncation. And WAL wraparound for that matter. To wit, a
> > sufficiently long-lived temp table could contain old XIDs, and there's
> > no way for anyone except the owning backend to clean them out, or even
> > guarantee that they're marked committed.
>
> After further thought I believe this is OK as of 8.2, because a temp
> table's relfrozenxid is tracked independently of any other's. (This
> problem puts a stake through the heart of the recently-discussed idea
> that a temp table might be able to get along without a globally visible
> pg_class entry, however.)
>
> But it seems that we need a band-aid for 8.1 and earlier. The simplest
> fix I can think of is for vacuum not to attempt to advance the
> datvacuumxid/datfrozenxid fields if it skipped over any temp tables of
> other backends. That's a bit nasty, since in a database making heavy
> use of temp tables, you might do a whole lot of vacuums without ever
> meeting that condition. Anyone have a better idea?

That seems nasty. Can we examine the xmin of the pg_class entry for
temp tables instead?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-18 16:52:23
Message-ID: 28018.1169139143@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> But it seems that we need a band-aid for 8.1 and earlier. The simplest
>> fix I can think of is for vacuum not to attempt to advance the
>> datvacuumxid/datfrozenxid fields if it skipped over any temp tables of
>> other backends. That's a bit nasty, since in a database making heavy
>> use of temp tables, you might do a whole lot of vacuums without ever
>> meeting that condition. Anyone have a better idea?

> That seems nasty. Can we examine the xmin of the pg_class entry for
> temp tables instead?

No, because any sort of schema update on the temp table would rewrite
its pg_class row with a newer version. You couldn't assume that the
pg_class row is older than what's in the table. Consider this perfectly
reasonable scenario:

CREATE TEMP TABLE foo ...
COPY foo FROM ...
CREATE INDEX ... <- must set relhasindex

regards, tom lane