Bug: Unreferenced temp tables disables vacuum to update xid

Lists: pgsql-hackers
From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-06 08:40:56
Message-ID: 20080106004056.7e7e2a2d@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

I spent the better part of this evening tracking down a problem with a
high velocity database. The database had entered the point of no return
by invoking xidStopLimit.

This by itself isn't a problem because you just vacuum right? Well we
vacuumed... and the problem didn't resolve itself. It continued to
throw the warning:

FATAL: database is not accepting commands to avoid wraparound
data loss in database "foo"
HINT: Stop the postmaster and use a standalone backend to vacuum
database "foo".

And when in --single with postgres we would get:

2008-01-06 02:04:45 EST WARNING: database "foo" must be vacuumed
within 993712 transactions
2008-01-06 02:04:45 EST HINT: To avoid a database shutdown,
execute a full-database VACUUM in "foo".

We performed all the requisite queries to determine where the problem
was:

SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';

Everything returned ~ 50 mil

But:

SELECT datname, age(datfrozenxid) FROM pg_database;

Always returned ~ 2bil.

Even after two vacuums (one a vacuum and the other a vacuum analyze).

Anyway.. we tried a lot of different things, including adjusting
xidStopLimit so we could get back into interactive mode and have a
reasonable interface to work with...

The end result was that by chance we checked relkind = 't' instead of
'r' (Shout out to AndrewSN). And sure enough:

pg_toast_49013869 | 2146491285

And yes:

SELECT oid::regclass FROM pg_class WHERE
reltoastrelid='pg_toast.pg_toast_49013869'::regclass;

oid | pg_temp_24.tmp_isp_blk_chk

The hack to get this cleaned up was to connect about 2 dozen times (to
get to slot 24) with psql via different sessions and create temp
tables. Once we hit slot 24, the probably instantly went away and the
database returned to normal state.

May I humbly suggest that a:

* We need to check clean up unreferenced temp relations on startup and
remove them

* We need to change the docs for the following query:

SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';

To:

SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' OR
relkind = 't';

I apologize if this doesn't quite make sense. I am very tired but I
wanted to make sure to get this out on the list.

Sincerely,

Joshua D. Drake

- --
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHgJQaATb/zqfZUUQRAnKAAJ0fs0OahvGjlJq6fWrFZ67h1tY6qwCfcHmR
K0xOKL+JMAcPTQGbqR3qy1M=
=te9S
-----END PGP SIGNATURE-----


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-06 09:50:12
Message-ID: 4780A454.7020809@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> I spent the better part of this evening tracking down a problem with a
> high velocity database. The database had entered the point of no return
> by invoking xidStopLimit.

8.2.4

Joshua D. Drake


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 05:59:54
Message-ID: 4781BFDA.50105@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:

Ping?

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> I spent the better part of this evening tracking down a problem with a
> high velocity database. The database had entered the point of no return
> by invoking xidStopLimit.
>
> This by itself isn't a problem because you just vacuum right? Well we
> vacuumed... and the problem didn't resolve itself. It continued to
> throw the warning:
>
> FATAL: database is not accepting commands to avoid wraparound
> data loss in database "foo"
> HINT: Stop the postmaster and use a standalone backend to vacuum
> database "foo".
>
> And when in --single with postgres we would get:
>
> 2008-01-06 02:04:45 EST WARNING: database "foo" must be vacuumed
> within 993712 transactions
> 2008-01-06 02:04:45 EST HINT: To avoid a database shutdown,
> execute a full-database VACUUM in "foo".
>
> We performed all the requisite queries to determine where the problem
> was:
>
> SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';
>
> Everything returned ~ 50 mil
>
> But:
>
> SELECT datname, age(datfrozenxid) FROM pg_database;
>
> Always returned ~ 2bil.
>
> Even after two vacuums (one a vacuum and the other a vacuum analyze).
>
> Anyway.. we tried a lot of different things, including adjusting
> xidStopLimit so we could get back into interactive mode and have a
> reasonable interface to work with...
>
> The end result was that by chance we checked relkind = 't' instead of
> 'r' (Shout out to AndrewSN). And sure enough:
>
> pg_toast_49013869 | 2146491285
>
> And yes:
>
> SELECT oid::regclass FROM pg_class WHERE
> reltoastrelid='pg_toast.pg_toast_49013869'::regclass;
>
> oid | pg_temp_24.tmp_isp_blk_chk
>
> The hack to get this cleaned up was to connect about 2 dozen times (to
> get to slot 24) with psql via different sessions and create temp
> tables. Once we hit slot 24, the probably instantly went away and the
> database returned to normal state.
>
> May I humbly suggest that a:
>
> * We need to check clean up unreferenced temp relations on startup and
> remove them
>
> * We need to change the docs for the following query:
>
> SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';
>
> To:
>
> SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r' OR
> relkind = 't';
>
> I apologize if this doesn't quite make sense. I am very tired but I
> wanted to make sure to get this out on the list.
>
> Sincerely,
>
> Joshua D. Drake
>
> - --
> The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
> Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
> Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
> SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
>
> iD8DBQFHgJQaATb/zqfZUUQRAnKAAJ0fs0OahvGjlJq6fWrFZ67h1tY6qwCfcHmR
> K0xOKL+JMAcPTQGbqR3qy1M=
> =te9S
> -----END PGP SIGNATURE-----
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 12:18:24
Message-ID: 20080107121824.GA4723@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:

>> SELECT oid::regclass FROM pg_class WHERE
>> reltoastrelid='pg_toast.pg_toast_49013869'::regclass;
>>
>> oid | pg_temp_24.tmp_isp_blk_chk
>>
>> The hack to get this cleaned up was to connect about 2 dozen times (to
>> get to slot 24) with psql via different sessions and create temp
>> tables. Once we hit slot 24, the probably instantly went away and the
>> database returned to normal state.

Ah -- interesting. This is a known issue, but we haven't found a
solution yet.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 16:55:09
Message-ID: 20080107085509.77b21e9d@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 7 Jan 2008 09:18:24 -0300
Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> Joshua D. Drake wrote:
>
> >> SELECT oid::regclass FROM pg_class WHERE
> >> reltoastrelid='pg_toast.pg_toast_49013869'::regclass;
> >>
> >> oid | pg_temp_24.tmp_isp_blk_chk
> >>
> >> The hack to get this cleaned up was to connect about 2 dozen times
> >> (to get to slot 24) with psql via different sessions and create
> >> temp tables. Once we hit slot 24, the probably instantly went away
> >> and the database returned to normal state.
>
> Ah -- interesting. This is a known issue, but we haven't found a
> solution yet.
>

Is there bug number?

Sincerely,

Joshua D. Drake

- --
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHglltATb/zqfZUUQRAk19AJ9GywJ7ohqGZa4jrRYtufgbwCacowCgrgml
00egslWmlrI0MK2sJjyc63I=
=Y4Ok
-----END PGP SIGNATURE-----


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 16:58:29
Message-ID: 200801071658.m07GwTY04164@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:
> > >> SELECT oid::regclass FROM pg_class WHERE
> > >> reltoastrelid='pg_toast.pg_toast_49013869'::regclass;
> > >>
> > >> oid | pg_temp_24.tmp_isp_blk_chk
> > >>
> > >> The hack to get this cleaned up was to connect about 2 dozen times
> > >> (to get to slot 24) with psql via different sessions and create
> > >> temp tables. Once we hit slot 24, the probably instantly went away
> > >> and the database returned to normal state.
> >
> > Ah -- interesting. This is a known issue, but we haven't found a
> > solution yet.
> >
>
> Is there bug number?

I assume it is this TODO item:

o Prevent long-lived temporary tables from causing frozen-xid
advancement starvation

The problem is that autovacuum cannot vacuum them to set frozen xids;
only the session that created them can do that.
http://archives.postgresql.org/pgsql-general/2007-06/msg01645.php

but am confused how the fix worked. Have all of these backends been
active for 1 billion transactions?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 17:10:24
Message-ID: 20080107091024.6f403cde@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 7 Jan 2008 11:58:29 -0500 (EST)
Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> > > Ah -- interesting. This is a known issue, but we haven't found a
> > > solution yet.
> > >
> >
> > Is there bug number?
>
> I assume it is this TODO item:
>
> o Prevent long-lived temporary tables from causing frozen-xid
> advancement starvation
>
> The problem is that autovacuum cannot vacuum them to set
> frozen xids; only the session that created them can do that.
> http://archives.postgresql.org/pgsql-general/2007-06/msg01645.php
>
> but am confused how the fix worked. Have all of these backends been
> active for 1 billion transactions?

Well it certainly appears that the TODO item is related. However there
are a couple of differences.

1. I had to manually vacuum because we had already hid xidStoplimit.

2. Postgres has been restarted multiple times and it made zero
difference.

E.g; PostgreSQL isn't cleaning up after itself and it isn't apparent
when it happens.

Sincerely,

Joshua D. Drake

- --
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHgl0CATb/zqfZUUQRAtcnAKChLV9E4p7klYXRnVoEWKGsM+xv2QCgjmKB
JrBjOrL9i/4RcwXKMNk+z5I=
=6Gdf
-----END PGP SIGNATURE-----


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 18:28:49
Message-ID: 87myrh4hi6.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:

> On Mon, 7 Jan 2008 11:58:29 -0500 (EST)
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
>> I assume it is this TODO item:
>>
>> o Prevent long-lived temporary tables from causing frozen-xid
>> advancement starvation
>>
>> The problem is that autovacuum cannot vacuum them to set
>> frozen xids; only the session that created them can do that.
>> http://archives.postgresql.org/pgsql-general/2007-06/msg01645.php
>>
>> but am confused how the fix worked. Have all of these backends been
>> active for 1 billion transactions?
>
> Well it certainly appears that the TODO item is related. However there
> are a couple of differences.

Yeah, it seems there are two bugs here. 1) temporary tables prevent frozen-xid
advancement and 2) if a process dies at the wrong moment it's possible to
temporary tables. Either one alone is pretty minor but I guess the combination
is lethal.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!


From: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 18:37:18
Message-ID: 200801071037.19483.darcyb@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Monday 07 January 2008 09:10:24 Joshua D. Drake wrote:
> On Mon, 7 Jan 2008 11:58:29 -0500 (EST)
>
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > > > Ah -- interesting. This is a known issue, but we haven't found a
> > > > solution yet.
> > >
> > > Is there bug number?
> >
> > I assume it is this TODO item:
> >
> > o Prevent long-lived temporary tables from causing frozen-xid
> > advancement starvation
> >
> > The problem is that autovacuum cannot vacuum them to set
> > frozen xids; only the session that created them can do that.
> >
> > http://archives.postgresql.org/pgsql-general/2007-06/msg01645.php
> >
> > but am confused how the fix worked. Have all of these backends been
> > active for 1 billion transactions?
>
> Well it certainly appears that the TODO item is related. However there
> are a couple of differences.
>
> 1. I had to manually vacuum because we had already hid xidStoplimit.
>
> 2. Postgres has been restarted multiple times and it made zero
> difference.
>
> E.g; PostgreSQL isn't cleaning up after itself and it isn't apparent
> when it happens.

After a fresh start of postgres, there should be no temp tables, so would a
work around to this at least be at postmaster start to (for a lack of a
better pseudo code ) DROP SCHEMA pg_temp* CASCADE; before coming up in
interactive mode? Doing this would at least have allowedthe manual vacuum to
do what it needed and not have caused confusion on the part of the user?
Also it would have greatly reduced the total time to resolution, and not
requiring hacking the backend to get there.

>
> Sincerely,
>
> Joshua D. Drake

--
Darcy Buskermolen
Command Prompt, Inc.
+1.503.667.4564 X 102
http://www.commandprompt.com/
PostgreSQL solutions since 1997


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 18:37:38
Message-ID: 87ir254h3h.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Gregory Stark" <stark(at)enterprisedb(dot)com> writes:

> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>
>> On Mon, 7 Jan 2008 11:58:29 -0500 (EST)
>> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>
>>> I assume it is this TODO item:
>>>
>>> o Prevent long-lived temporary tables from causing frozen-xid
>>> advancement starvation
>>>
>>> The problem is that autovacuum cannot vacuum them to set
>>> frozen xids; only the session that created them can do that.
>>> http://archives.postgresql.org/pgsql-general/2007-06/msg01645.php
>>>
>>> but am confused how the fix worked. Have all of these backends been
>>> active for 1 billion transactions?
>>
>> Well it certainly appears that the TODO item is related. However there
>> are a couple of differences.
>
> Yeah, it seems there are two bugs here. 1) temporary tables prevent frozen-xid
> advancement and 2) if a process dies at the wrong moment it's possible to
> temporary tables. Either one alone is pretty minor but I guess the combination
> is lethal.

oops, "2) if a process dies at the wrong moment it's possible to *leak*
temporary tables"

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 18:48:58
Message-ID: 20080107104858.1bf01a70@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Mon, 7 Jan 2008 10:37:18 -0800
Darcy Buskermolen <darcyb(at)commandprompt(dot)com> wrote:

> > 1. I had to manually vacuum because we had already hid xidStoplimit.
> >
> > 2. Postgres has been restarted multiple times and it made zero
> > difference.
> >
> > E.g; PostgreSQL isn't cleaning up after itself and it isn't apparent
> > when it happens.
>
> After a fresh start of postgres, there should be no temp tables, so
> would a work around to this at least be at postmaster start to (for a
> lack of a better pseudo code ) DROP SCHEMA pg_temp* CASCADE; before
> coming up in interactive mode? Doing this would at least have
> allowedthe manual vacuum to do what it needed and not have caused
> confusion on the part of the user? Also it would have greatly reduced
> the total time to resolution, and not requiring hacking the backend
> to get there.
>

+1

Joshua D. Drake

- --
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHgnQaATb/zqfZUUQRAt1qAJ4hzeNG8fzA4l5y/luNrg3eGOz5QQCfcvtZ
xMuLPQSEbvG+AYfTRkEyLD0=
=+Lkk
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 20:31:15
Message-ID: 24875.1199737875@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Darcy Buskermolen <darcyb(at)commandprompt(dot)com> writes:
> After a fresh start of postgres, there should be no temp tables, so would a
> work around to this at least be at postmaster start to (for a lack of a
> better pseudo code ) DROP SCHEMA pg_temp* CASCADE; before coming up in
> interactive mode?

The argument against this is the same as not wiping out
apparently-unreferenced regular tables: automatically destroying the
evidence after a crash is someday going to bite you. Admittedly,
this argument is a bit weaker for temp tables than it is for regular
tables, but that only goes to the question of whether the data is
valuable on its own terms, not whether it might be valuable for crash
analysis.

The real question that Josh's report brings up to me is why the heck was
there an orphaned temp table? Especially if it was only a toast table
and not the linked "regular" temp table? Something happened there that
should not have.

regards, tom lane


From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-07 23:13:24
Message-ID: slrnfo5cgk.f1c.andrew+nonews@atlantis.supernews.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2008-01-07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The real question that Josh's report brings up to me is why the heck was
> there an orphaned temp table? Especially if it was only a toast table
> and not the linked "regular" temp table? Something happened there that
> should not have.

The regular table was there too, but the regular table's relfrozenxid
was apparently recent, only the toast table's was old:

> pg_toast_49013869 | 2146491285
[...]
> SELECT oid::regclass FROM pg_class WHERE
> reltoastrelid='pg_toast.pg_toast_49013869'::regclass;
>
> oid | pg_temp_24.tmp_isp_blk_chk

The regular table had not shown up on a query of age(relfrozenxid) WHERE
relkind='r' but the toast table showed up on a similar query with WHERE
relkind='t'.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: andrew(at)supernews(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-08 00:25:04
Message-ID: 27971.1199751904@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew - Supernews <andrew+nonews(at)supernews(dot)com> writes:
> On 2008-01-07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The real question that Josh's report brings up to me is why the heck was
>> there an orphaned temp table? Especially if it was only a toast table
>> and not the linked "regular" temp table? Something happened there that
>> should not have.

> The regular table was there too, but the regular table's relfrozenxid
> was apparently recent, only the toast table's was old:

Hmm, that's even more odd, since AFAICS vacuum will always vacuum a
toast table immediately after vacuuming the parent. I wonder whether
we have a bug somewhere that allows a toast table's relfrozenxid to
get initially set to something substantially different from the
parent's.

(BTW, if the parent table *was* there then Josh hardly needed any fancy
jujitsu to clear the problem -- "drop table pg_temp_24.tmp_isp_blk_chk"
as a superuser should've worked. I wouldn't try this if the originating
backend were still around, but if it's not then there's not going to be
anything all that special about the temp table.)

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-08 01:07:32
Message-ID: 20080108010732.GA1828@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew - Supernews <andrew+nonews(at)supernews(dot)com> writes:
> > On 2008-01-07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> The real question that Josh's report brings up to me is why the heck was
> >> there an orphaned temp table? Especially if it was only a toast table
> >> and not the linked "regular" temp table? Something happened there that
> >> should not have.
>
> > The regular table was there too, but the regular table's relfrozenxid
> > was apparently recent, only the toast table's was old:
>
> Hmm, that's even more odd, since AFAICS vacuum will always vacuum a
> toast table immediately after vacuuming the parent. I wonder whether
> we have a bug somewhere that allows a toast table's relfrozenxid to
> get initially set to something substantially different from the
> parent's.

Hmm ... that would be strange. Off-the-cuff idea: we introduced code to
advance relfrozenxid in CLUSTER, TRUNCATE and table-rewriting forms of
ALTER TABLE. Perhaps the problem is that we're neglecting to update it
for the toast table there. AFAIR I analyzed the cases and they were all
handled, but perhaps I forgot something.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-08 02:21:25
Message-ID: 4782DE25.2040809@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Tom Lane wrote:
>> Andrew - Supernews <andrew+nonews(at)supernews(dot)com> writes:
>>> On 2008-01-07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> The real question that Josh's report brings up to me is why the heck was
>>>> there an orphaned temp table? Especially if it was only a toast table
>>>> and not the linked "regular" temp table? Something happened there that
>>>> should not have.
>>> The regular table was there too, but the regular table's relfrozenxid
>>> was apparently recent, only the toast table's was old:
>> Hmm, that's even more odd, since AFAICS vacuum will always vacuum a
>> toast table immediately after vacuuming the parent. I wonder whether
>> we have a bug somewhere that allows a toast table's relfrozenxid to
>> get initially set to something substantially different from the
>> parent's.
>
> Hmm ... that would be strange. Off-the-cuff idea: we introduced code to
> advance relfrozenxid in CLUSTER, TRUNCATE and table-rewriting forms of
> ALTER TABLE. Perhaps the problem is that we're neglecting to update it
> for the toast table there. AFAIR I analyzed the cases and they were all
> handled, but perhaps I forgot something.

Just to throw another variable into the mix. This machine was a PITR
slave that was pushed into production about two weeks ago.

Joshua D. Drake


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-08 02:28:21
Message-ID: 20080108022821.GD1828@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:
> Alvaro Herrera wrote:

>> Hmm ... that would be strange. Off-the-cuff idea: we introduced code to
>> advance relfrozenxid in CLUSTER, TRUNCATE and table-rewriting forms of
>> ALTER TABLE. Perhaps the problem is that we're neglecting to update it
>> for the toast table there. AFAIR I analyzed the cases and they were all
>> handled, but perhaps I forgot something.
>
> Just to throw another variable into the mix. This machine was a PITR slave
> that was pushed into production about two weeks ago.

Ah, right, I bet we have a smoking gun here.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-14 23:16:08
Message-ID: 8131.1200352568@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Hmm ... that would be strange. Off-the-cuff idea: we introduced code to
> advance relfrozenxid in CLUSTER, TRUNCATE and table-rewriting forms of
> ALTER TABLE. Perhaps the problem is that we're neglecting to update it
> for the toast table there. AFAIR I analyzed the cases and they were all
> handled, but perhaps I forgot something.

I found a smoking gun ...

regression=# create table foo (f1 serial primary key, f2 text);
NOTICE: CREATE TABLE will create implicit sequence "foo_f1_seq" for serial column "foo.f1"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
CREATE TABLE
regression=# insert into foo values(1,repeat('xyzzy',100000));
INSERT 0 1
regression=# insert into foo values(2,repeat('xqzzy',100000));
INSERT 0 1
regression=# select relname, relkind, relfrozenxid from pg_class order by oid desc limit 6;
relname | relkind | relfrozenxid
-----------------------+---------+--------------
foo_pkey | i | 0
pg_toast_707220_index | i | 0
pg_toast_707220 | t | 119421
foo | r | 119421
foo_f1_seq | S | 0
xmlview5 | v | 0
(6 rows)

regression=# cluster foo_pkey on foo;
CLUSTER
regression=# select relname, relkind, relfrozenxid from pg_class order by oid desc limit 6;
relname | relkind | relfrozenxid
-----------------------+---------+--------------
pg_toast_707231_index | i | 0
pg_toast_707231 | t | 119424
foo_pkey | i | 0
foo | r | 4195086720
foo_f1_seq | S | 0
xmlview5 | v | 0
(6 rows)

So something is out of whack in CLUSTER. However it only seems to be
broken in HEAD, so I'm not sure this helps to explain the original
report. (Speculation: this is related to the rewrite to make CLUSTER
MVCC-safe?)

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-14 23:38:02
Message-ID: 20080114233802.GB31680@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

> regression=# cluster foo_pkey on foo;
> CLUSTER
> regression=# select relname, relkind, relfrozenxid from pg_class order by oid desc limit 6;
> relname | relkind | relfrozenxid
> -----------------------+---------+--------------
> pg_toast_707231_index | i | 0
> pg_toast_707231 | t | 119424
> foo_pkey | i | 0
> foo | r | 4195086720
> foo_f1_seq | S | 0
> xmlview5 | v | 0
> (6 rows)
>
> So something is out of whack in CLUSTER. However it only seems to be
> broken in HEAD, so I'm not sure this helps to explain the original
> report. (Speculation: this is related to the rewrite to make CLUSTER
> MVCC-safe?)

Right ... see copy_heap_data --- it sets FreezeXid as relfrozenxid.

If we were to scan each tuple as it is inserted, we could store a higher
relfrozenxid, but I doubt we want to do that.

Perhaps what we could do is take the relfrozenxid from the old relation
and copy it over, if it's later than FreezeXid?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug: Unreferenced temp tables disables vacuum to update xid
Date: 2008-01-14 23:48:19
Message-ID: 8672.1200354499@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Perhaps what we could do is take the relfrozenxid from the old relation
> and copy it over, if it's later than FreezeXid?

It certainly doesn't seem to make any sense to allow the rel's
relfrozenxid to go backwards. Indeed this coding lets it end up less
than the DB's datfrozenxid, which is certainly inappropriate.

What might be the best idea is to advance FreezeXid to the old
relfrozenxid between the vacuum_set_xid_limits and begin_heap_rewrite
calls. Then we'd be quite certain we are not lying: anything older
than that did indeed get frozen.

regards, tom lane