Corrupt database? 8.1/FreeBSD6.0

Lists: pgsql-generalpgsql-hackerspgsql-patches
From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 02:50:41
Message-ID: 599454.19005.qm@web60814.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

"PostgreSQL 8.1.2 on i386-portbld-freebsd6.0, compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518"

Started seeing this in the logs this afternoon...scaring the begeezus out of me.

Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory

I could find nothing any transaction refereced in pg_locks...nor could I find a file with that designation (was always the same transaction id)

ls -l /db/pg_clog/
total 984
-rw------- 1 pgsql wheel 262144 Jan 11 09:55 07CF
-rw------- 1 pgsql wheel 262144 Jan 11 13:45 07D0
-rw------- 1 pgsql wheel 262144 Jan 11 17:00 07D1
-rw------- 1 pgsql wheel 172032 Jan 11 20:39 07D2

At first I thought it was related to a constantly running batch process which I halted, and sure enough, the problem seemed to go away. If I restarted, the problem returned.

I then tried select * from on most of the tables used by that process (at least the last days worth) and found no obvious issues or errors.
About that time, I noticed that the errors were all IMMEDIATELY preceeded by an autovacuum of template0. So the logs actually looked like this:

Jan 11 19:20:19 prod-app-1 postgres[1752]: [4-1] 1752 LOG: autovacuum: processing database "template0"
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-1] 1752 ERROR: could not access status of transaction 1924023481
Jan 11 19:20:19 prod-app-1 postgres[1752]: [5-2] 1752 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:24:35 prod-app-1 postgres[4094]: [4-1] 4094 LOG: autovacuum: processing database "template0"
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-1] 4094 ERROR: could not access status of transaction 1924023481
Jan 11 19:24:35 prod-app-1 postgres[4094]: [5-2] 4094 DETAIL: could not open file "pg_clog/072A": No such file or directory
Jan 11 19:28:35 prod-app-1 postgres[6728]: [4-1] 6728 LOG: autovacuum: processing database "template0"
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-1] 6728 ERROR: could not access status of transaction 1924023481
Jan 11 19:28:35 prod-app-1 postgres[6728]: [5-2] 6728 DETAIL: could not open file "pg_clog/072A": No such file or directory

We've downloaded/compiled pg_filedump, but are stumped as to what relation (or even what database) to start with.

I restarted the batch process that I thought was the culprit and the issue has 'gone away'. I'm sure there is some corruption somewhere (from reading various similar posts), but not sure where to start.

Any help would be appreciated.


---------------------------------
Access over 1 million songs - Yahoo! Music Unlimited.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 04:40:32
Message-ID: 547.1168576832@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> We've downloaded/compiled pg_filedump, but are stumped as to what relation (or even what database) to start with.

Turn up log_min_messages to DEBUG2 and you'll be able to see which
table autovac is failing at. If I had to bet I'd bet on
template0.pg_statistic ...

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 13:55:52
Message-ID: 20070112135552.GO10446@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:
> Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> > We've downloaded/compiled pg_filedump, but are stumped as to what relation (or even what database) to start with.
>
> Turn up log_min_messages to DEBUG2 and you'll be able to see which
> table autovac is failing at. If I had to bet I'd bet on
> template0.pg_statistic ...

Hmm, that would mean an ANALYZE got done on template0, no? ... but
AFAICT process_whole_db() always sets analyze=false.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 14:16:23
Message-ID: 5984.1168611383@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Hmm, that would mean an ANALYZE got done on template0, no? ... but
> AFAICT process_whole_db() always sets analyze=false.

The thing that's bothering me is that I don't see any certainty that
template0 is only processed via the process_whole_db() path. In the
8.1 code, the existence of a stats-collector DB entry causes a DB
to enter the normal round-robin processing path ... and I'm wondering
whether the mere act of autovac connecting due to process_whole_db()
doesn't cause such an entry to come into existence. IOW once template0
has been vacuumed once for anti-wraparound, it starts receiving normal
vacuums, which will cause their own XIDs to appear in pg_statistic
at least, and then you're at risk: if autovacuuming stops for awhile,
those XIDs will fall below the pg_clog horizon because template0 is
not considered when determining the cutoff.

8.2's approach is saner but I think we need some sort of band-aid
in the 8.1 branch...

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 14:38:07
Message-ID: 20070112143807.GA27743@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Hmm, that would mean an ANALYZE got done on template0, no? ... but
> > AFAICT process_whole_db() always sets analyze=false.
>
> The thing that's bothering me is that I don't see any certainty that
> template0 is only processed via the process_whole_db() path. In the
> 8.1 code, the existence of a stats-collector DB entry causes a DB
> to enter the normal round-robin processing path ... and I'm wondering
> whether the mere act of autovac connecting due to process_whole_db()
> doesn't cause such an entry to come into existence.

Hmm, as far as I can tell, the database entry would not be created
merely by a vacuum. The only way to create a database entry in pgstat
is by calling pgstat_recv_tabstat(); and pgstat_report_tabstat is only
called in postgres.c (not invoked via autovacuum) and in
pgstat_beshutdown_hook (not sure if this one is).

So I agree that there is a risk if the user connects to template0 and
the database pgstat entry gets created -- but that doesn't seem to be
the case here.

Confirmation on which table is causing the trouble would be good.

> 8.2's approach is saner but I think we need some sort of band-aid
> in the 8.1 branch...

Maybe we could forcibly activate the freeze mode on a template database?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 14:52:57
Message-ID: 6668.1168613577@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Hmm, as far as I can tell, the database entry would not be created
> merely by a vacuum. The only way to create a database entry in pgstat
> is by calling pgstat_recv_tabstat(); and pgstat_report_tabstat is only
> called in postgres.c (not invoked via autovacuum) and in
> pgstat_beshutdown_hook (not sure if this one is).

The shutdown hook surely is called, and the scenario for sending stats
seems pretty obvious: stats_block_level = ON.

> Maybe we could forcibly activate the freeze mode on a template database?

Might not be a bad idea. And even more to the point, forcibly disable
analyze.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 16:22:59
Message-ID: 20070112162259.GE27743@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:

> > Maybe we could forcibly activate the freeze mode on a template database?
>
> Might not be a bad idea. And even more to the point, forcibly disable
> analyze.

Patch implementing this (albeit untested!) attached. I'll try to
reproduce the problem without the patch, and then test with the patch
applied.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Attachment Content-Type Size
autovac-freeze.patch text/x-diff 5.0 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 20:13:44
Message-ID: 20070112201344.GJ27743@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>
> > > Maybe we could forcibly activate the freeze mode on a template database?
> >
> > Might not be a bad idea. And even more to the point, forcibly disable
> > analyze.
>
> Patch implementing this (albeit untested!) attached. I'll try to
> reproduce the problem without the patch, and then test with the patch
> applied.

Ok, it does what it's intended to do. But in testing it I also
confirmed that a database-wide vacuum creates a pgstat entry for it and
for all tables in it. Is this something we want to prevent?

I'll apply this patch later today to the 8.1 branch unless somebody
objects.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 20:45:42
Message-ID: 9875.1168634742@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Ok, it does what it's intended to do. But in testing it I also
> confirmed that a database-wide vacuum creates a pgstat entry for it and
> for all tables in it. Is this something we want to prevent?

That's odd, because I didn't see any such thing when I tested in CVS tip
the other day. Or did you have stats_block_level turned on?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 20:54:30
Message-ID: 20070112205430.GL27743@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Ok, it does what it's intended to do. But in testing it I also
> > confirmed that a database-wide vacuum creates a pgstat entry for it and
> > for all tables in it. Is this something we want to prevent?
>
> That's odd, because I didn't see any such thing when I tested in CVS tip
> the other day. Or did you have stats_block_level turned on?

Yes, I turned it on for this test.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Date: 2007-01-12 21:41:57
Message-ID: 10317.1168638117@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>> Ok, it does what it's intended to do. But in testing it I also
>>> confirmed that a database-wide vacuum creates a pgstat entry for it and
>>> for all tables in it. Is this something we want to prevent?
>>
>> That's odd, because I didn't see any such thing when I tested in CVS tip
>> the other day. Or did you have stats_block_level turned on?

> Yes, I turned it on for this test.

Well, the vacuums certainly accounted for I/O, so I suppose this is
reasonable behavior.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [GENERAL] Corrupt database?
Date: 2007-01-20 04:32:30
Message-ID: 200701200432.l0K4WUD14013@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches


Has this been applied, and should it be?

---------------------------------------------------------------------------

Alvaro Herrera wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>
> > > Maybe we could forcibly activate the freeze mode on a template database?
> >
> > Might not be a bad idea. And even more to the point, forcibly disable
> > analyze.
>
> Patch implementing this (albeit untested!) attached. I'll try to
> reproduce the problem without the patch, and then test with the patch
> applied.
>
> --
> Alvaro Herrera http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [pgsql-patches] [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Date: 2007-02-02 22:58:06
Message-ID: 200702022258.l12Mw6k04769@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches


What is the status of this patch?

---------------------------------------------------------------------------

Alvaro Herrera wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>
> > > Maybe we could forcibly activate the freeze mode on a template database?
> >
> > Might not be a bad idea. And even more to the point, forcibly disable
> > analyze.
>
> Patch implementing this (albeit untested!) attached. I'll try to
> reproduce the problem without the patch, and then test with the patch
> applied.
>
> --
> Alvaro Herrera http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Re: [pgsql-patches] [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Date: 2007-02-02 23:14:36
Message-ID: 45C3C5DC.40508@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Bruce Momjian wrote:
> What is the status of this patch?

Alvaro is currently out of town. He should be able to provide more info
next week.

Sincerely,

Joshua D. Drake

>
> ---------------------------------------------------------------------------
>
> Alvaro Herrera wrote:
>> Tom Lane wrote:
>>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>>> Maybe we could forcibly activate the freeze mode on a template database?
>>> Might not be a bad idea. And even more to the point, forcibly disable
>>> analyze.
>> Patch implementing this (albeit untested!) attached. I'll try to
>> reproduce the problem without the patch, and then test with the patch
>> applied.
>>
>> --
>> Alvaro Herrera http://www.CommandPrompt.com/
>> The PostgreSQL Company - Command Prompt, Inc.
>
> [ Attachment, skipping... ]
>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: don't forget to increase your free space map settings
>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Amiel <becauseimjeff(at)yahoo(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [pgsql-patches] [GENERAL] Corrupt database? 8.1/FreeBSD6.0
Date: 2007-02-05 16:05:42
Message-ID: 20070205160542.GA7196@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers pgsql-patches

Bruce Momjian wrote:
>
> What is the status of this patch?

This was applied:

2007-01-14 17:18 alvherre

* src/backend/postmaster/autovacuum.c:

Fix autovacuum to avoid leaving non-permanent Xids in non-connectable
databases.

Apply to the 8.1 branch only, as the new 8.2 (and HEAD) coding does not have
this problem.

> ---------------------------------------------------------------------------
>
> Alvaro Herrera wrote:
> > Tom Lane wrote:
> > > Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> >
> > > > Maybe we could forcibly activate the freeze mode on a template database?
> > >
> > > Might not be a bad idea. And even more to the point, forcibly disable
> > > analyze.
> >
> > Patch implementing this (albeit untested!) attached. I'll try to
> > reproduce the problem without the patch, and then test with the patch
> > applied.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.