Re: [GENERAL] pg_upgrade problem

Lists: pgsql-generalpgsql-hackers
From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: pg_upgrade problem
Date: 2011-08-25 19:57:58
Message-ID: 20110825195757.GA28394@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hi

I have 8.3.11 database, ~ 600GB in size.

I want to upgrade it to 9.0.

First, I tried with 9.0.4, and when I hit problem (the same) I tried
git, head of 9.0 branch.

So. I did pg_upgrade with -c, and it looked like this:

$ time pg_upgrade -c -v -b /opt/pgsql-8.3.11-int/bin/ -B /opt/pgsql-9.0.5a-int/bin/ -d /var/postgresql/6666/ -D /var/postgresql/6666-9.0 -k -l pg_upgrade.log -p 6666 -P 4329
Running in verbose mode
Performing Consistency Checks
-----------------------------
Checking old data directory (/var/postgresql/6666) ok
Checking old bin directory (/opt/pgsql-8.3.11-int/bin) ok
Checking new data directory (/var/postgresql/6666-9.0) ok
Checking new bin directory (/opt/pgsql-9.0.5a-int/bin) ok
"/opt/pgsql-8.3.11-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666" -o "-p 6666 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000" start >> "pg_upgrade.log" 2>&1
Checking for reg* system oid user data types ok
Checking for /contrib/isn with bigint-passing mismatch ok
Checking for invalid 'name' user columns ok
Checking for tsquery user columns ok
Checking for tsvector user columns ok
Checking for hash and gin indexes warning

| Your installation contains hash and/or gin
| indexes. These indexes have different
| internal formats between your old and new
| clusters so they must be reindexed with the
| REINDEX command. After migration, you will
| be given REINDEX instructions.

Checking for bpchar_pattern_ops indexes ok
Checking for large objects ok
"/opt/pgsql-8.3.11-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666" stop >> "pg_upgrade.log" 2>&1
"/opt/pgsql-9.0.5a-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666-9.0" -o "-p 4329 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000" start >> "pg_upgrade.log" 2>&1
Checking for presence of required libraries ok

*Clusters are compatible*
"/opt/pgsql-9.0.5a-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666-9.0" stop >> "pg_upgrade.log" 2>&1

real 0m6.417s
user 0m0.040s
sys 0m0.060s

All looks ok. So I ran the upgrade without -c:

$ time pg_upgrade -v -b /opt/pgsql-8.3.11-int/bin/ -B /opt/pgsql-9.0.5a-int/bin/ -d /var/postgresql/6666/ -D /var/postgresql/6666-9.0 -k -l pg_upgrade.log -p 6666 -P 4329
Running in verbose mode
Performing Consistency Checks
-----------------------------
Checking old data directory (/var/postgresql/6666) ok
Checking old bin directory (/opt/pgsql-8.3.11-int/bin) ok
Checking new data directory (/var/postgresql/6666-9.0) ok
Checking new bin directory (/opt/pgsql-9.0.5a-int/bin) ok
"/opt/pgsql-8.3.11-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666" -o "-p 6666 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000" start >> "pg_upgrade.log" 2>&1
Checking for reg* system oid user data types ok
Checking for /contrib/isn with bigint-passing mismatch ok
Checking for invalid 'name' user columns ok
Checking for tsquery user columns ok
Creating script to adjust sequences ok
Checking for large objects ok
Creating catalog dump "/opt/pgsql-9.0.5a-int/bin/pg_dumpall" --port 6666 --username "postgres" --schema-only --binary-upgrade > "/var/postgresql/pg_upgrade_dump_all.sql"
ok
"/opt/pgsql-8.3.11-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666" stop >> "pg_upgrade.log" 2>&1
"/opt/pgsql-9.0.5a-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666-9.0" -o "-p 4329 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000" start >> "pg_upgrade.log" 2>&1
Checking for presence of required libraries ok

| If pg_upgrade fails after this point, you must
| re-initdb the new cluster before continuing.
| You will also need to remove the ".old" suffix
| from /var/postgresql/6666/global/pg_control.old.

Performing Migration
--------------------
Adding ".old" suffix to old global/pg_control ok
Analyzing all rows in the new cluster "/opt/pgsql-9.0.5a-int/bin/vacuumdb" --port 4329 --username "postgres" --all --analyze >> "pg_upgrade.log" 2>&1
ok
Freezing all rows on the new cluster "/opt/pgsql-9.0.5a-int/bin/vacuumdb" --port 4329 --username "postgres" --all --freeze >> "pg_upgrade.log" 2>&1
ok
"/opt/pgsql-9.0.5a-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666-9.0" stop >> "pg_upgrade.log" 2>&1
Deleting new commit clogs ok
Copying old commit clogs to new server cp -Rf "/var/postgresql/6666/pg_clog" "/var/postgresql/6666-9.0/pg_clog"
ok
Setting next transaction id for new cluster "/opt/pgsql-9.0.5a-int/bin/pg_resetxlog" -f -x 3673553615 "/var/postgresql/6666-9.0" > /dev/null
ok
Resetting WAL archives "/opt/pgsql-9.0.5a-int/bin/pg_resetxlog" -l 1,26478,133 "/var/postgresql/6666-9.0" >> "pg_upgrade.log" 2>&1
ok
"/opt/pgsql-9.0.5a-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666-9.0" -o "-p 4329 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000" start >> "pg_upgrade.log" 2>&1
Setting frozenxid counters in new cluster ok
Creating databases in the new cluster "/opt/pgsql-9.0.5a-int/bin/psql" --set ON_ERROR_STOP=on --no-psqlrc --port 4329 --username "postgres" -f "/var/postgresql/pg_upgrade_dump_globals.sql" --dbname template1 >> "pg_upgrade.log"
psql:/var/postgresql/pg_upgrade_dump_globals.sql:26: NOTICE: schema "check_postgres" does not exist
psql:/var/postgresql/pg_upgrade_dump_globals.sql:26: NOTICE: schema "contrib" does not exist
psql:/var/postgresql/pg_upgrade_dump_globals.sql:57: NOTICE: schema "check_postgres" does not exist
psql:/var/postgresql/pg_upgrade_dump_globals.sql:57: NOTICE: schema "ltree" does not exist
psql:/var/postgresql/pg_upgrade_dump_globals.sql:57: NOTICE: schema "pgcrypto" does not exist
ok
"/opt/pgsql-9.0.5a-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666-9.0" stop >> "pg_upgrade.log" 2>&1
"/opt/pgsql-9.0.5a-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666-9.0" -o "-p 4329 -c autovacuum=off -c autovacuum_freeze_max_age=2000000000" start >> "pg_upgrade.log" 2>&1
Adding support functions to new cluster ok
Restoring database schema to new cluster "/opt/pgsql-9.0.5a-int/bin/psql" --set ON_ERROR_STOP=on --no-psqlrc --port 4329 --username "postgres" -f "/var/postgresql/pg_upgrade_dump_db.sql" --dbname template1 >> "pg_upgrade.log"
ok
Removing support functions from new cluster ok
"/opt/pgsql-9.0.5a-int/bin/pg_ctl" -l "pg_upgrade.log" -D "/var/postgresql/6666-9.0" stop >> "pg_upgrade.log" 2>&1
Restoring user relation files
/var/postgresql/6666/base/113953649/2613 linking /var/postgresql/6666/base/113953649/2613 to /var/postgresql/6666-9.0/base/11826/11790
/var/postgresql/6666/base/113953649/2683 linking /var/postgresql/6666/base/113953649/2683 to /var/postgresql/6666-9.0/base/11826/11792

Could not find 71637071 in old cluster

real 0m53.065s
user 0m0.520s
sys 0m0.870s

What can be wrong? How can I fix it?

I don't care about current instance - it was just a test, but I need to
know how to make the upgrade actually work.

I did grep in generated log files for this value - 71637071, and found:

$ grep -C3 71637071 pg_upgrade*
pg_upgrade_dump_all.sql-
pg_upgrade_dump_all.sql--- For binary upgrade, must preserve relfilenodes
pg_upgrade_dump_all.sql-SELECT binary_upgrade.set_next_heap_relfilenode('71637068'::pg_catalog.oid);
pg_upgrade_dump_all.sql:SELECT binary_upgrade.set_next_toast_relfilenode('71637071'::pg_catalog.oid);
pg_upgrade_dump_all.sql-SELECT binary_upgrade.set_next_index_relfilenode('71637073'::pg_catalog.oid);
pg_upgrade_dump_all.sql-
pg_upgrade_dump_all.sql-CREATE TABLE actions (
--
pg_upgrade_dump_db.sql-
pg_upgrade_dump_db.sql--- For binary upgrade, must preserve relfilenodes
pg_upgrade_dump_db.sql-SELECT binary_upgrade.set_next_heap_relfilenode('71637068'::pg_catalog.oid);
pg_upgrade_dump_db.sql:SELECT binary_upgrade.set_next_toast_relfilenode('71637071'::pg_catalog.oid);
pg_upgrade_dump_db.sql-SELECT binary_upgrade.set_next_index_relfilenode('71637073'::pg_catalog.oid);
pg_upgrade_dump_db.sql-
pg_upgrade_dump_db.sql-CREATE TABLE actions (
--
pg_upgrade.log-linking /var/postgresql/6666/base/113953649/2613 to /var/postgresql/6666-9.0/base/11826/11790
pg_upgrade.log- /var/postgresql/6666/base/113953649/2683
pg_upgrade.log-linking /var/postgresql/6666/base/113953649/2683 to /var/postgresql/6666-9.0/base/11826/11792
pg_upgrade.log:Could not find 71637071 in old cluster

One more thing - one of earlier tests actually worked through
pg_upgrade, but when running vacuumdb -az on newly started 9.0.4, I got
error about missing transaction/clog - don't remember exactly what it
was, though.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-25 20:33:07
Message-ID: 201108252033.p7PKX7629929@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> hi
>
> I have 8.3.11 database, ~ 600GB in size.
>
> I want to upgrade it to 9.0.
>
> First, I tried with 9.0.4, and when I hit problem (the same) I tried
> git, head of 9.0 branch.

Good.

> pg_upgrade_dump_db.sql-
> pg_upgrade_dump_db.sql--- For binary upgrade, must preserve relfilenodes
> pg_upgrade_dump_db.sql-SELECT binary_upgrade.set_next_heap_relfilenode('71637068'::pg_catalog.oid);
> pg_upgrade_dump_db.sql:SELECT binary_upgrade.set_next_toast_relfilenode('71637071'::pg_catalog.oid);
> pg_upgrade_dump_db.sql-SELECT binary_upgrade.set_next_index_relfilenode('71637073'::pg_catalog.oid);
> pg_upgrade_dump_db.sql-
> pg_upgrade_dump_db.sql-CREATE TABLE actions (
> --
> pg_upgrade.log-linking /var/postgresql/6666/base/113953649/2613 to /var/postgresql/6666-9.0/base/11826/11790
> pg_upgrade.log- /var/postgresql/6666/base/113953649/2683
> pg_upgrade.log-linking /var/postgresql/6666/base/113953649/2683 to /var/postgresql/6666-9.0/base/11826/11792
> pg_upgrade.log:Could not find 71637071 in old cluster

The problem appears to be that the Postgres catalogs think there is a
toast table for 'actions', while the file system doesn't seem to have
such a file. I can you look in pg_class and verify that?

SELECT reltoastrelid FROM pg_class WHERE relname = 'actions';

Then look in the file system to see if there is a matching file.

> One more thing - one of earlier tests actually worked through
> pg_upgrade, but when running vacuumdb -az on newly started 9.0.4, I got
> error about missing transaction/clog - don't remember exactly what it
> was, though.

THere was a bug in how how pg_upgrade worked in pre-9.0.4 --- could it
have been that?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-25 20:36:41
Message-ID: 20110825203641.GA19537@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Thu, Aug 25, 2011 at 04:33:07PM -0400, Bruce Momjian wrote:
> The problem appears to be that the Postgres catalogs think there is a
> toast table for 'actions', while the file system doesn't seem to have
> such a file. I can you look in pg_class and verify that?
>
> SELECT reltoastrelid FROM pg_class WHERE relname = 'actions';

$ SELECT reltoastrelid FROM pg_class WHERE relname = 'actions';
reltoastrelid
---------------
(0 rows)

This is done not on the pg from backup, but on normal production, as the test
pg instance doesn't work anymore.

I can re-set the test instance, but extracting from backup, and making it apply
all xlogs usually takes 2-3 days.

> > One more thing - one of earlier tests actually worked through
> > pg_upgrade, but when running vacuumdb -az on newly started 9.0.4, I got
> > error about missing transaction/clog - don't remember exactly what it
> > was, though.
>
> THere was a bug in how how pg_upgrade worked in pre-9.0.4 --- could it
> have been that?

It was done definitely using 9.0.4.

Best regards,

depesz


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-25 20:43:02
Message-ID: 201108252043.p7PKh2g03510@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> On Thu, Aug 25, 2011 at 04:33:07PM -0400, Bruce Momjian wrote:
> > The problem appears to be that the Postgres catalogs think there is a
> > toast table for 'actions', while the file system doesn't seem to have
> > such a file. I can you look in pg_class and verify that?
> >
> > SELECT reltoastrelid FROM pg_class WHERE relname = 'actions';
>
> $ SELECT reltoastrelid FROM pg_class WHERE relname = 'actions';
> reltoastrelid
> ---------------
> (0 rows)
>
> This is done not on the pg from backup, but on normal production, as the test
> pg instance doesn't work anymore.
>
> I can re-set the test instance, but extracting from backup, and making it apply
> all xlogs usually takes 2-3 days.

If you remove the .old extension on pg_control, you can start the old
cluster and check it. This is explained by pg_upgrade output:

| If pg_upgrade fails after this point, you must
| re-initdb the new cluster before continuing.
| You will also need to remove the ".old" suffix
| from /var/postgresql/6666/global/pg_control.old.

Please check the old cluster.

> > > One more thing - one of earlier tests actually worked through
> > > pg_upgrade, but when running vacuumdb -az on newly started 9.0.4, I got
> > > error about missing transaction/clog - don't remember exactly what it
> > > was, though.
> >
> > THere was a bug in how how pg_upgrade worked in pre-9.0.4 --- could it
> > have been that?
>
> It was done definitely using 9.0.4.

Good.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-25 20:53:18
Message-ID: 20110825205318.GA25225@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Thu, Aug 25, 2011 at 04:43:02PM -0400, Bruce Momjian wrote:
> Please check the old cluster.

Sure:

=# SELECT reltoastrelid FROM pg_class WHERE relname = 'actions';
reltoastrelid
---------------
82510395
71637071
(2 rows)

=# SELECT oid::regclass, reltoastrelid FROM pg_class WHERE relname = 'actions';
oid | reltoastrelid
---------------+---------------
xxxxx.actions | 82510395
yyyyy.actions | 71637071
(2 rows)

=# select oid, relfilenode from pg_class where oid in (SELECT reltoastrelid FROM pg_class WHERE relname = 'actions');
oid | relfilenode
----------+-------------
82510395 | 82510395
71637071 | 71637071
(2 rows)

=# select oid from pg_database where datname = current_database();
oid
----------
71635381
(1 row)

$ ls -l 6666/base/71635381/{71637071,82510395}
-rw------- 1 postgres postgres 0 2009-10-12 06:49 6666/base/71635381/71637071
-rw------- 1 postgres postgres 0 2010-08-19 14:02 6666/base/71635381/82510395

> > > > One more thing - one of earlier tests actually worked through
> > > > pg_upgrade, but when running vacuumdb -az on newly started 9.0.4, I got
> > > > error about missing transaction/clog - don't remember exactly what it
> > > > was, though.
> > > THere was a bug in how how pg_upgrade worked in pre-9.0.4 --- could it
> > > have been that?
> > It was done definitely using 9.0.4.
> Good.

Not sure if it's good, since it was after the clog error was fixed, and
I still got it :/

but anyway - the problem with 71637071 is more important now.

Best regards,

depesz


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-26 04:18:55
Message-ID: 201108260418.p7Q4Itk17514@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


OK, this was very helpful. I found out that there is a bug in current
9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
tables. (The bug is not in any released version of pg_upgrade.) The
attached, applied patches should fix it for you. I assume you are
running 9.0.X, and not 9.0.4.

---------------------------------------------------------------------------

hubert depesz lubaczewski wrote:
> On Thu, Aug 25, 2011 at 04:43:02PM -0400, Bruce Momjian wrote:
> > Please check the old cluster.
>
> Sure:
>
> =# SELECT reltoastrelid FROM pg_class WHERE relname = 'actions';
> reltoastrelid
> ---------------
> 82510395
> 71637071
> (2 rows)
>
> =# SELECT oid::regclass, reltoastrelid FROM pg_class WHERE relname = 'actions';
> oid | reltoastrelid
> ---------------+---------------
> xxxxx.actions | 82510395
> yyyyy.actions | 71637071
> (2 rows)
>
> =# select oid, relfilenode from pg_class where oid in (SELECT reltoastrelid FROM pg_class WHERE relname = 'actions');
> oid | relfilenode
> ----------+-------------
> 82510395 | 82510395
> 71637071 | 71637071
> (2 rows)
>
> =# select oid from pg_database where datname = current_database();
> oid
> ----------
> 71635381
> (1 row)
>
> $ ls -l 6666/base/71635381/{71637071,82510395}
> -rw------- 1 postgres postgres 0 2009-10-12 06:49 6666/base/71635381/71637071
> -rw------- 1 postgres postgres 0 2010-08-19 14:02 6666/base/71635381/82510395
>
> > > > > One more thing - one of earlier tests actually worked through
> > > > > pg_upgrade, but when running vacuumdb -az on newly started 9.0.4, I got
> > > > > error about missing transaction/clog - don't remember exactly what it
> > > > > was, though.
> > > > THere was a bug in how how pg_upgrade worked in pre-9.0.4 --- could it
> > > > have been that?
> > > It was done definitely using 9.0.4.
> > Good.
>
> Not sure if it's good, since it was after the clog error was fixed, and
> I still got it :/
>
> but anyway - the problem with 71637071 is more important now.
>
> Best regards,
>
> depesz

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
/rtmp/pg_upgrade.9.0 text/x-diff 5.7 KB
/rtmp/pg_upgrade.9.1 text/x-diff 5.0 KB

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-26 15:28:35
Message-ID: 20110826152834.GA7886@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
>
> OK, this was very helpful. I found out that there is a bug in current
> 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
> tables. (The bug is not in any released version of pg_upgrade.) The
> attached, applied patches should fix it for you. I assume you are
> running 9.0.X, and not 9.0.4.

pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.

will keep you posted.

Best regards,

depesz


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-29 16:54:41
Message-ID: 20110829165441.GA28573@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Fri, Aug 26, 2011 at 05:28:35PM +0200, hubert depesz lubaczewski wrote:
> On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
> >
> > OK, this was very helpful. I found out that there is a bug in current
> > 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
> > tables. (The bug is not in any released version of pg_upgrade.) The
> > attached, applied patches should fix it for you. I assume you are
> > running 9.0.X, and not 9.0.4.
>
> pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.

vacuumdb failed. The fail looks very similar to the one I had on 9.0.4.

After long vacuum I got:
INFO: vacuuming "pg_toast.pg_toast_106668498"
vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.

Unfortunately at the moment, I no longer have the old (8.3) setup, but I do
have the 9.0.X and will be happy to provide any info you might need to help me
debug/fix the problem.

Best regards,

depesz


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-29 17:49:24
Message-ID: 20110829174924.GA14101@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Mon, Aug 29, 2011 at 06:54:41PM +0200, hubert depesz lubaczewski wrote:
> On Fri, Aug 26, 2011 at 05:28:35PM +0200, hubert depesz lubaczewski wrote:
> > On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
> > >
> > > OK, this was very helpful. I found out that there is a bug in current
> > > 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
> > > tables. (The bug is not in any released version of pg_upgrade.) The
> > > attached, applied patches should fix it for you. I assume you are
> > > running 9.0.X, and not 9.0.4.
> >
> > pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.
>
> vacuumdb failed. The fail looks very similar to the one I had on 9.0.4.
>
> After long vacuum I got:
> INFO: vacuuming "pg_toast.pg_toast_106668498"
> vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
> DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.
>
> Unfortunately at the moment, I no longer have the old (8.3) setup, but I do
> have the 9.0.X and will be happy to provide any info you might need to help me
> debug/fix the problem.

this pg_toast is related to table "transactions", which was vacuumed
like this:

INFO: vacuuming "public.transactions"
INFO: index "transaction_id_pkey" now contains 50141303 row versions in 144437 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.08s/0.13u sec elapsed 173.04 sec.
INFO: index "transactions_creation_tsz_idx" now contains 50141303 row versions in 162634 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.19s/0.23u sec elapsed 77.45 sec.
INFO: index "fki_transactions_xxxxxxxxxx_fkey" now contains 50141303 row versions in 163466 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.13s/0.29u sec elapsed 65.45 sec.
INFO: index "fki_transactions_xxxxxxxx_fkey" now contains 50141303 row versions in 146528 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.15s/0.24u sec elapsed 50.28 sec.
INFO: index "fki_transactions_xxxxxxxxxxxxx_fkey" now contains 50141303 row versions in 190914 pages
DETAIL: 0 index row versions were removed.
5 index pages have been deleted, 0 are currently reusable.
CPU 1.49s/0.17u sec elapsed 67.95 sec.
INFO: index "transactions_xxxxxxxxxxxxxxxxxxxxxxxxxx_id" now contains 50141303 row versions in 164669 pages
DETAIL: 0 index row versions were removed.
2 index pages have been deleted, 0 are currently reusable.
CPU 1.36s/0.18u sec elapsed 62.83 sec.
INFO: "transactions": found 0 removable, 39644831 nonremovable row versions in 5978240 out of 7312036 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 8209452 unused item pointers.
0 pages are entirely empty.
CPU 75.75s/18.57u sec elapsed 9268.19 sec.
INFO: vacuuming "pg_toast.pg_toast_106668498"
vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.

Interestingly.

In old dir there is pg_clog directory with files:
0AC0 .. 0DAF (including 0CC6, size 262144)
but new pg_clog has only:
0D2F .. 0DB0

File content - nearly all files that exist in both places are the same, with exception of 2 newest ones in new datadir:
3c5122f3e80851735c19522065a2d12a 0DAF
8651fc2b9fa3d27cfb5b496165cead68 0DB0

0DB0 doesn't exist in old, and 0DAF has different md5sum: 7d48996c762d6a10f8eda88ae766c5dd

one more thing. I did select count(*) from transactions and it worked.

that's about it. I can probably copy over files from old datadir to new (in
pg_clog/), and will be happy to do it, but I'll wait for your call - retry with
copies files might destroy some evidence.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: depesz <depesz(at)depesz(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-29 18:41:38
Message-ID: 1314643001-sup-6186@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Excerpts from hubert depesz lubaczewski's message of lun ago 29 14:49:24 -0300 2011:
> On Mon, Aug 29, 2011 at 06:54:41PM +0200, hubert depesz lubaczewski wrote:
> > On Fri, Aug 26, 2011 at 05:28:35PM +0200, hubert depesz lubaczewski wrote:
> > > On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
> > > >
> > > > OK, this was very helpful. I found out that there is a bug in current
> > > > 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
> > > > tables. (The bug is not in any released version of pg_upgrade.) The
> > > > attached, applied patches should fix it for you. I assume you are
> > > > running 9.0.X, and not 9.0.4.
> > >
> > > pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.
> >
> > vacuumdb failed. The fail looks very similar to the one I had on 9.0.4.
> >
> > After long vacuum I got:
> > INFO: vacuuming "pg_toast.pg_toast_106668498"
> > vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
> > DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.

I don't understand the pg_upgrade code here. It is setting the
datfrozenxid and relfrozenxid values to the latest checkpoint's NextXID,

/* set pg_class.relfrozenxid */
PQclear(executeQueryOrDie(conn,
"UPDATE pg_catalog.pg_class "
"SET relfrozenxid = '%u' "
/* only heap and TOAST are vacuumed */
"WHERE relkind IN ('r', 't')",
old_cluster.controldata.chkpnt_nxtxid));

but I don't see why this is safe. I mean, surely the previous
vacuum might have been a lot earlier than that. Are these values reset
to more correct values (i.e. older ones) later somehow? My question is,
why isn't the new cluster completely screwed?

I wonder if pg_upgrade shouldn't be doing the conservative thing here,
which AFAICT would be to set all frozenxid values as furthest in the
past as possible (without causing a shutdown-due-to-wraparound, and
maybe without causing autovacuum to enter emergency mode either).

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: daveg <daveg(at)sonic(dot)net>
To: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-29 22:23:31
Message-ID: 20110829222331.GB10597@sonic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Mon, Aug 29, 2011 at 07:49:24PM +0200, hubert depesz lubaczewski wrote:
> On Mon, Aug 29, 2011 at 06:54:41PM +0200, hubert depesz lubaczewski wrote:
> vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
> DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.
>
> Interestingly.
>
> In old dir there is pg_clog directory with files:
> 0AC0 .. 0DAF (including 0CC6, size 262144)
> but new pg_clog has only:
> 0D2F .. 0DB0
>
> File content - nearly all files that exist in both places are the same, with exception of 2 newest ones in new datadir:
> 3c5122f3e80851735c19522065a2d12a 0DAF
> 8651fc2b9fa3d27cfb5b496165cead68 0DB0
>
> 0DB0 doesn't exist in old, and 0DAF has different md5sum: 7d48996c762d6a10f8eda88ae766c5dd
>
> one more thing. I did select count(*) from transactions and it worked.
>
> that's about it. I can probably copy over files from old datadir to new (in
> pg_clog/), and will be happy to do it, but I'll wait for your call - retry with
> copies files might destroy some evidence.

I had this same thing happen this Saturday just past and my client had to
restore the whole 2+ TB instance from the previous days pg_dumps.
I had been thinking that perhaps I did something wrong in setting up or
running the upgrade, but had not found it yet. Now that I see Hubert has
the same problem it is starting to look like pg_upgrade can eat all your
data.

After running pg_upgrade apparently successfully and analyzeing all the
tables we restarted the production workload and started getting errors:

2011-08-27 04:18:34.015 12337 c06 postgres ERROR: could not access status of transaction 2923961093
2011-08-27 04:18:34.015 12337 c06 postgres DETAIL: Could not open file "pg_clog/0AE4": No such file or directory.
2011-08-27 04:18:34.015 12337 c06 postgres STATEMENT: analyze public.b_pxx;

On examination the pg_clog directory contained on two files timestamped
after the startup of the new cluster with 9.0.4. Other hosts that upgraded
successfully had numerous files in pg_clog dating back a few days. So it
appears that all the clog files went missing during the upgrade somehow.
a
This happened upgrading from 8.4.7 to 9.0.4, with a brief session in between
at 8.4.8. We have upgraded several hosts to 9.0.4 successfully previously.

-dg

--
David Gould daveg(at)sonic(dot)net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-31 16:16:03
Message-ID: 201108311616.p7VGG3N29000@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
> >
> > OK, this was very helpful. I found out that there is a bug in current
> > 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
> > tables. (The bug is not in any released version of pg_upgrade.) The
> > attached, applied patches should fix it for you. I assume you are
> > running 9.0.X, and not 9.0.4.
>
> pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.
>
> will keep you posted.

FYI, this pg_upgrade bug exists in PG 9.1RC1, but not in earlier betas.
Users can either wait for 9.1 RC2 or Final, or use the patch I posted.
The bug is not in 9.0.4 and will not be in 9.0.5.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: depesz <depesz(at)depesz(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-31 16:23:07
Message-ID: 201108311623.p7VGN7G29572@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Alvaro Herrera wrote:
> Excerpts from hubert depesz lubaczewski's message of lun ago 29 14:49:24 -0300 2011:
> > On Mon, Aug 29, 2011 at 06:54:41PM +0200, hubert depesz lubaczewski wrote:
> > > On Fri, Aug 26, 2011 at 05:28:35PM +0200, hubert depesz lubaczewski wrote:
> > > > On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
> > > > >
> > > > > OK, this was very helpful. I found out that there is a bug in current
> > > > > 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
> > > > > tables. (The bug is not in any released version of pg_upgrade.) The
> > > > > attached, applied patches should fix it for you. I assume you are
> > > > > running 9.0.X, and not 9.0.4.
> > > >
> > > > pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.
> > >
> > > vacuumdb failed. The fail looks very similar to the one I had on 9.0.4.
> > >
> > > After long vacuum I got:
> > > INFO: vacuuming "pg_toast.pg_toast_106668498"
> > > vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
> > > DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.
>
> I don't understand the pg_upgrade code here. It is setting the
> datfrozenxid and relfrozenxid values to the latest checkpoint's NextXID,
>
> /* set pg_class.relfrozenxid */
> PQclear(executeQueryOrDie(conn,
> "UPDATE pg_catalog.pg_class "
> "SET relfrozenxid = '%u' "
> /* only heap and TOAST are vacuumed */
> "WHERE relkind IN ('r', 't')",
> old_cluster.controldata.chkpnt_nxtxid));
>
> but I don't see why this is safe. I mean, surely the previous
> vacuum might have been a lot earlier than that. Are these values reset
> to more correct values (i.e. older ones) later somehow? My question is,
> why isn't the new cluster completely screwed?

Have you looked at my pg_upgrade presentation?

http://momjian.us/main/presentations/features.html#pg_upgrade

This query happens after we have done a VACUUM FREEEZE on an empty
cluster.

pg_dump --binary-upgrade will dump out the proper relfrozen xids for
every object that gets its file system files copied or linked.

> I wonder if pg_upgrade shouldn't be doing the conservative thing here,
> which AFAICT would be to set all frozenxid values as furthest in the
> past as possible (without causing a shutdown-due-to-wraparound, and
> maybe without causing autovacuum to enter emergency mode either).

I already get complaints about requiring an "analyze" run after the
upgrade --- this would make it much worse. In fact I have to look into
upgrading optimizer statistics someday.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: depesz(at)depesz(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-31 16:32:04
Message-ID: CA+TgmoZMpY4CmxOCvWP+1=PKUMop8Pnxj3+E-Czy01xLOV=GUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Wed, Aug 31, 2011 at 12:16 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> hubert depesz lubaczewski wrote:
>> On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
>> >
>> > OK, this was very helpful.  I found out that there is a bug in current
>> > 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
>> > tables.  (The bug is not in any released version of pg_upgrade.)  The
>> > attached, applied patches should fix it for you.  I assume you are
>> > running 9.0.X, and not 9.0.4.
>>
>> pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.
>>
>> will keep you posted.
>
> FYI, this pg_upgrade bug exists in PG 9.1RC1, but not in earlier betas.
> Users can either wait for 9.1 RC2 or Final, or use the patch I posted.
> The bug is not in 9.0.4 and will not be in 9.0.5.

Based on subsequent discussion on this thread, it sounds like
something is still broken.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-31 17:03:22
Message-ID: 20110831170322.GA12377@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Wed, Aug 31, 2011 at 12:16:03PM -0400, Bruce Momjian wrote:
> hubert depesz lubaczewski wrote:
> > On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
> > >
> > > OK, this was very helpful. I found out that there is a bug in current
> > > 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
> > > tables. (The bug is not in any released version of pg_upgrade.) The
> > > attached, applied patches should fix it for you. I assume you are
> > > running 9.0.X, and not 9.0.4.
> >
> > pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.
> >
> > will keep you posted.
>
> FYI, this pg_upgrade bug exists in PG 9.1RC1, but not in earlier betas.
> Users can either wait for 9.1 RC2 or Final, or use the patch I posted.
> The bug is not in 9.0.4 and will not be in 9.0.5.

I assume you mean the bug that caused pg_upgrade to fail.

But there still is (existing in 9.0.4 too) bug which causes vacuum to
fail.

Best regards,

depesz


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-31 17:23:05
Message-ID: 201108311723.p7VHN5m16754@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> INFO: vacuuming "pg_toast.pg_toast_106668498"
> vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
> DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.
>
> Interestingly.
>
> In old dir there is pg_clog directory with files:
> 0AC0 .. 0DAF (including 0CC6, size 262144)
> but new pg_clog has only:
> 0D2F .. 0DB0
>
> File content - nearly all files that exist in both places are the same, with exception of 2 newest ones in new datadir:
> 3c5122f3e80851735c19522065a2d12a 0DAF
> 8651fc2b9fa3d27cfb5b496165cead68 0DB0
>
> 0DB0 doesn't exist in old, and 0DAF has different md5sum: 7d48996c762d6a10f8eda88ae766c5dd
>
> one more thing. I did select count(*) from transactions and it worked.

Count(*) worked because it didn't access any of the long/toasted values.

> that's about it. I can probably copy over files from old datadir to new (in
> pg_clog/), and will be happy to do it, but I'll wait for your call - retry with
> copies files might destroy some evidence.

You can safely copy over any of the clog files that exist in the old
cluster but not in the new one, but another vacuum is likely to remove
those files again. :-(

This sure sounds like a variation on the pg_upgrade/toast bug we fixed
in 9.0.4:

http://wiki.postgresql.org/wiki/20110408pg_upgrade_fix

Can you get me the 9.0.X pg_class.relfrozenxid for the toast and heap
tables involved?

FYI, this is what pg_dump --binary-upgrade does to preserve the
relfrozenxids:

-- For binary upgrade, set heap's relfrozenxid
UPDATE pg_catalog.pg_class
SET relfrozenxid = '702'
WHERE oid = 'test'::pg_catalog.regclass;

-- For binary upgrade, set toast's relfrozenxid
UPDATE pg_catalog.pg_class
SET relfrozenxid = '702'
WHERE oid = '16434';

We also preserve the pg_class oids with:

-- For binary upgrade, must preserve pg_class oids
SELECT binary_upgrade.set_next_heap_pg_class_oid('16431'::pg_catalog.oid);
SELECT binary_upgrade.set_next_toast_pg_class_oid('16434'::pg_catalog.oid);
SELECT binary_upgrade.set_next_index_pg_class_oid('16436'::pg_catalog.oid);

The question is whether this is working, and if not, why not?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-31 17:24:05
Message-ID: 201108311724.p7VHO5F16882@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> On Wed, Aug 31, 2011 at 12:16:03PM -0400, Bruce Momjian wrote:
> > hubert depesz lubaczewski wrote:
> > > On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
> > > >
> > > > OK, this was very helpful. I found out that there is a bug in current
> > > > 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
> > > > tables. (The bug is not in any released version of pg_upgrade.) The
> > > > attached, applied patches should fix it for you. I assume you are
> > > > running 9.0.X, and not 9.0.4.
> > >
> > > pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.
> > >
> > > will keep you posted.
> >
> > FYI, this pg_upgrade bug exists in PG 9.1RC1, but not in earlier betas.
> > Users can either wait for 9.1 RC2 or Final, or use the patch I posted.
> > The bug is not in 9.0.4 and will not be in 9.0.5.
>
> I assume you mean the bug that caused pg_upgrade to fail.

Yes.

> But there still is (existing in 9.0.4 too) bug which causes vacuum to
> fail.

Yes. We need to find the cause of that new bug.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: depesz <depesz(at)depesz(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-31 17:31:07
Message-ID: 1314810875-sup-3777@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Excerpts from Bruce Momjian's message of mié ago 31 13:23:07 -0300 2011:
> Alvaro Herrera wrote:
> > Excerpts from hubert depesz lubaczewski's message of lun ago 29 14:49:24 -0300 2011:
> > > On Mon, Aug 29, 2011 at 06:54:41PM +0200, hubert depesz lubaczewski wrote:
> > > > On Fri, Aug 26, 2011 at 05:28:35PM +0200, hubert depesz lubaczewski wrote:
> > > > > On Fri, Aug 26, 2011 at 12:18:55AM -0400, Bruce Momjian wrote:
> > > > > >
> > > > > > OK, this was very helpful. I found out that there is a bug in current
> > > > > > 9.0.X, 9.1.X, and HEAD that I introduced recently when I excluded temp
> > > > > > tables. (The bug is not in any released version of pg_upgrade.) The
> > > > > > attached, applied patches should fix it for you. I assume you are
> > > > > > running 9.0.X, and not 9.0.4.
> > > > >
> > > > > pg_upgrade worked. Now I'm doing reindex and later on vacuumdb -az.
> > > >
> > > > vacuumdb failed. The fail looks very similar to the one I had on 9.0.4.
> > > >
> > > > After long vacuum I got:
> > > > INFO: vacuuming "pg_toast.pg_toast_106668498"
> > > > vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
> > > > DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.
> >
> > I don't understand the pg_upgrade code here. It is setting the
> > datfrozenxid and relfrozenxid values to the latest checkpoint's NextXID,
> >
> > /* set pg_class.relfrozenxid */
> > PQclear(executeQueryOrDie(conn,
> > "UPDATE pg_catalog.pg_class "
> > "SET relfrozenxid = '%u' "
> > /* only heap and TOAST are vacuumed */
> > "WHERE relkind IN ('r', 't')",
> > old_cluster.controldata.chkpnt_nxtxid));
> >
> > but I don't see why this is safe. I mean, surely the previous
> > vacuum might have been a lot earlier than that. Are these values reset
> > to more correct values (i.e. older ones) later somehow? My question is,
> > why isn't the new cluster completely screwed?
>
> Have you looked at my pg_upgrade presentation?
>
> http://momjian.us/main/presentations/features.html#pg_upgrade

I just did, but it doesn't explain this in much detail. (In any case I
don't think we should be relying in a PDF presentation to explain the
inner pg_upgrade details. I think we should rely more on the
IMPLEMENTATION file rather than your PDF ... amusingly that file doesn't
mention the frozenxids.)

> This query happens after we have done a VACUUM FREEEZE on an empty
> cluster.

Oh, so it only affects the databases that initdb created, right?
The other ones are not even created yet.

> pg_dump --binary-upgrade will dump out the proper relfrozen xids for
> every object that gets its file system files copied or linked.

Okay. I assume that between the moment you copy the pg_clog files from
the old server, and the moment you do the UPDATEs on pg_class and
pg_database, there is no chance for vacuum to run and remove clog
segments.

Still, it seems to me that this coding makes Min(datfrozenxid) to go
backwards, and that's bad news.

> > I wonder if pg_upgrade shouldn't be doing the conservative thing here,
> > which AFAICT would be to set all frozenxid values as furthest in the
> > past as possible (without causing a shutdown-due-to-wraparound, and
> > maybe without causing autovacuum to enter emergency mode either).
>
> I already get complaints about requiring an "analyze" run after the
> upgrade --- this would make it much worse. In fact I have to look into
> upgrading optimizer statistics someday.

Why would it make it worse at all? It doesn't look to me like it
wouldn't affect in any way. The only thing it does, is tell the system
to keep clog segments around.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: depesz <depesz(at)depesz(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-31 17:35:25
Message-ID: 201108311735.p7VHZPA18346@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Alvaro Herrera wrote:
> > > I don't understand the pg_upgrade code here. It is setting the
> > > datfrozenxid and relfrozenxid values to the latest checkpoint's NextXID,
> > >
> > > /* set pg_class.relfrozenxid */
> > > PQclear(executeQueryOrDie(conn,
> > > "UPDATE pg_catalog.pg_class "
> > > "SET relfrozenxid = '%u' "
> > > /* only heap and TOAST are vacuumed */
> > > "WHERE relkind IN ('r', 't')",
> > > old_cluster.controldata.chkpnt_nxtxid));
> > >
> > > but I don't see why this is safe. I mean, surely the previous
> > > vacuum might have been a lot earlier than that. Are these values reset
> > > to more correct values (i.e. older ones) later somehow? My question is,
> > > why isn't the new cluster completely screwed?
> >
> > Have you looked at my pg_upgrade presentation?
> >
> > http://momjian.us/main/presentations/features.html#pg_upgrade
>
> I just did, but it doesn't explain this in much detail. (In any case I
> don't think we should be relying in a PDF presentation to explain the
> inner pg_upgrade details. I think we should rely more on the
> IMPLEMENTATION file rather than your PDF ... amusingly that file doesn't
> mention the frozenxids.)
>
> > This query happens after we have done a VACUUM FREEEZE on an empty
> > cluster.
>
> Oh, so it only affects the databases that initdb created, right?
> The other ones are not even created yet.

Right.

> > pg_dump --binary-upgrade will dump out the proper relfrozen xids for
> > every object that gets its file system files copied or linked.
>
> Okay. I assume that between the moment you copy the pg_clog files from
> the old server, and the moment you do the UPDATEs on pg_class and
> pg_database, there is no chance for vacuum to run and remove clog
> segments.

Right, we disable it, and had a long discussion about it. We actually
start the server with:

"-c autovacuum=off -c autovacuum_freeze_max_age=2000000000",

> Still, it seems to me that this coding makes Min(datfrozenxid) to go
> backwards, and that's bad news.

Yes, it is odd, but I don't see another option. Remember the problem
with xid wrap-around --- we really are defining two different xid eras,
and have to freeze to make that possible.

> > > I wonder if pg_upgrade shouldn't be doing the conservative thing here,
> > > which AFAICT would be to set all frozenxid values as furthest in the
> > > past as possible (without causing a shutdown-due-to-wraparound, and
> > > maybe without causing autovacuum to enter emergency mode either).
> >
> > I already get complaints about requiring an "analyze" run after the
> > upgrade --- this would make it much worse. In fact I have to look into
> > upgrading optimizer statistics someday.
>
> Why would it make it worse at all? It doesn't look to me like it
> wouldn't affect in any way. The only thing it does, is tell the system
> to keep clog segments around.

It will cause excessive vacuum freezing to happen on startup, I assume.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-31 17:45:51
Message-ID: 20110831174550.GA31255@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Wed, Aug 31, 2011 at 01:23:05PM -0400, Bruce Momjian wrote:
> Can you get me the 9.0.X pg_class.relfrozenxid for the toast and heap
> tables involved?

Sure:

=# select oid::regclass, relfrozenxid from pg_class where relname in ('transactions', 'pg_toast_106668498');
oid | relfrozenxid
-----------------------------+--------------
pg_toast.pg_toast_106668498 | 3673553926
transactions | 3623560321
(2 rows)

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: daveg <daveg(at)sonic(dot)net>
Cc: hubert depesz lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-08-31 18:25:44
Message-ID: 201108311825.p7VIPim14330@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


FYI, I am working with depesz on IM right now and will report back when
we have a cause of the bug. FYI, I was without electric power for 53
hours, which is why I am late in replying to this report.

---------------------------------------------------------------------------

daveg wrote:
> On Mon, Aug 29, 2011 at 07:49:24PM +0200, hubert depesz lubaczewski wrote:
> > On Mon, Aug 29, 2011 at 06:54:41PM +0200, hubert depesz lubaczewski wrote:
> > vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
> > DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.
> >
> > Interestingly.
> >
> > In old dir there is pg_clog directory with files:
> > 0AC0 .. 0DAF (including 0CC6, size 262144)
> > but new pg_clog has only:
> > 0D2F .. 0DB0
> >
> > File content - nearly all files that exist in both places are the same, with exception of 2 newest ones in new datadir:
> > 3c5122f3e80851735c19522065a2d12a 0DAF
> > 8651fc2b9fa3d27cfb5b496165cead68 0DB0
> >
> > 0DB0 doesn't exist in old, and 0DAF has different md5sum: 7d48996c762d6a10f8eda88ae766c5dd
> >
> > one more thing. I did select count(*) from transactions and it worked.
> >
> > that's about it. I can probably copy over files from old datadir to new (in
> > pg_clog/), and will be happy to do it, but I'll wait for your call - retry with
> > copies files might destroy some evidence.
>
> I had this same thing happen this Saturday just past and my client had to
> restore the whole 2+ TB instance from the previous days pg_dumps.
> I had been thinking that perhaps I did something wrong in setting up or
> running the upgrade, but had not found it yet. Now that I see Hubert has
> the same problem it is starting to look like pg_upgrade can eat all your
> data.
>
> After running pg_upgrade apparently successfully and analyzeing all the
> tables we restarted the production workload and started getting errors:
>
> 2011-08-27 04:18:34.015 12337 c06 postgres ERROR: could not access status of transaction 2923961093
> 2011-08-27 04:18:34.015 12337 c06 postgres DETAIL: Could not open file "pg_clog/0AE4": No such file or directory.
> 2011-08-27 04:18:34.015 12337 c06 postgres STATEMENT: analyze public.b_pxx;
>
> On examination the pg_clog directory contained on two files timestamped
> after the startup of the new cluster with 9.0.4. Other hosts that upgraded
> successfully had numerous files in pg_clog dating back a few days. So it
> appears that all the clog files went missing during the upgrade somehow.
> a
> This happened upgrading from 8.4.7 to 9.0.4, with a brief session in between
> at 8.4.8. We have upgraded several hosts to 9.0.4 successfully previously.
>
> -dg
>
> --
> David Gould daveg(at)sonic(dot)net 510 536 1443 510 282 0869
> If simplicity worked, the world would be overrun with insects.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Lou Picciano <loupicciano(at)comcast(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: toast error after pg_upgrade 9.0.4 -> 9.1rc1
Date: 2011-08-31 20:11:15
Message-ID: 239726088.702769.1314821475533.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

After running an essentially uneventful* pg_upgrade from 9.0.4 -> 9.1rc1, we are seeing some toast errors logged on the new cluster:

All are of this pattern: ERROR: missing chunk number 0 for toast value 130087 in pg_toast_34735

Have seen the same pattern for a few of the databases in the 9.1rc1 cluster, and all as a result of a select on a usr table (the offending SELECT happens to be the first one any of these DBs sees, as it's the first step in a user authentication process). SELECT count(*) does not produce an error.

*almost uneventful: We also saw messages that the destination cluster did not have one of our schema - (of course it didn't!) - I didn't realize pg_upgrade doesn't 'do' schema?

Lou Picciano


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-01 01:54:20
Message-ID: 201109010154.p811sKm18831@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> On Wed, Aug 31, 2011 at 01:23:05PM -0400, Bruce Momjian wrote:
> > Can you get me the 9.0.X pg_class.relfrozenxid for the toast and heap
> > tables involved?
>
> Sure:
>
> =# select oid::regclass, relfrozenxid from pg_class where relname in ('transactions', 'pg_toast_106668498');
> oid | relfrozenxid
> -----------------------------+--------------
> pg_toast.pg_toast_106668498 | 3673553926
> transactions | 3623560321
> (2 rows)

Working with depesz, I have found the cause. The code I added to fix
pg_upgrade in 9.0.4 and earlier releases didn't handle old 8.3 servers
properly. I mistakenly processed toast table with the same pg_dump
query as used for pre-8.4 toast tables, not realizing those were not
functional because there were no reloptions for toast tables in pre-8.4.

The attached applied patches fix all releases. This will have to be
mentioned in the 9.0.5 release notes, and we should probably do the same
kind of announcement we did when I fixed this for 9.0.4. :-(

Yeah, I should not have caused this bug. It did not show up in any of
my testing.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
/rtmp/pg_upgrade_fix.8.4 text/x-diff 2.2 KB
/rtmp/pg_upgrade_fix.9.0 text/x-diff 2.1 KB
/rtmp/pg_upgrade_fix.9.1 text/x-diff 2.0 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: daveg <daveg(at)sonic(dot)net>
Cc: hubert depesz lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-01 01:56:59
Message-ID: 201109010156.p811uxG19239@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

daveg wrote:
> On Mon, Aug 29, 2011 at 07:49:24PM +0200, hubert depesz lubaczewski wrote:
> > On Mon, Aug 29, 2011 at 06:54:41PM +0200, hubert depesz lubaczewski wrote:
> > vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
> > DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.
> >
> > Interestingly.
> >
> > In old dir there is pg_clog directory with files:
> > 0AC0 .. 0DAF (including 0CC6, size 262144)
> > but new pg_clog has only:
> > 0D2F .. 0DB0
> >
> > File content - nearly all files that exist in both places are the same, with exception of 2 newest ones in new datadir:
> > 3c5122f3e80851735c19522065a2d12a 0DAF
> > 8651fc2b9fa3d27cfb5b496165cead68 0DB0
> >
> > 0DB0 doesn't exist in old, and 0DAF has different md5sum: 7d48996c762d6a10f8eda88ae766c5dd
> >
> > one more thing. I did select count(*) from transactions and it worked.
> >
> > that's about it. I can probably copy over files from old datadir to new (in
> > pg_clog/), and will be happy to do it, but I'll wait for your call - retry with
> > copies files might destroy some evidence.
>
> I had this same thing happen this Saturday just past and my client had to
> restore the whole 2+ TB instance from the previous days pg_dumps.
> I had been thinking that perhaps I did something wrong in setting up or
> running the upgrade, but had not found it yet. Now that I see Hubert has
> the same problem it is starting to look like pg_upgrade can eat all your
> data.
>
> After running pg_upgrade apparently successfully and analyzeing all the
> tables we restarted the production workload and started getting errors:
>
> 2011-08-27 04:18:34.015 12337 c06 postgres ERROR: could not access status of transaction 2923961093
> 2011-08-27 04:18:34.015 12337 c06 postgres DETAIL: Could not open file "pg_clog/0AE4": No such file or directory.
> 2011-08-27 04:18:34.015 12337 c06 postgres STATEMENT: analyze public.b_pxx;
>
> On examination the pg_clog directory contained on two files timestamped
> after the startup of the new cluster with 9.0.4. Other hosts that upgraded
> successfully had numerous files in pg_clog dating back a few days. So it
> appears that all the clog files went missing during the upgrade somehow.
> a
> This happened upgrading from 8.4.7 to 9.0.4, with a brief session in between
> at 8.4.8. We have upgraded several hosts to 9.0.4 successfully previously.

I have posted this fix to the hackers email list, but I found it only
affected old 8.3 servers, not old 8.4.X, so I am confused by your bug
report.

I have tested 8.4.X to 9.0.4 and found pg_upgrade preserves toast
relfrozenxids properly in that case.

Can you tell me what table is showing this error? Does it happen during
vacuum? Can you run a vacuum verbose to see what it is throwing the
error on? Thanks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Lou Picciano <loupicciano(at)comcast(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: toast error after pg_upgrade 9.0.4 -> 9.1rc1
Date: 2011-09-01 02:38:01
Message-ID: 201109010238.p812c1p27084@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Lou Picciano wrote:
> After running an essentially uneventful* pg_upgrade from 9.0.4 ->
> 9.1rc1, we are seeing some toast errors logged on the new cluster:
>
> All are of this pattern: ERROR: missing chunk number 0 for toast value
> 130087 in pg_toast_34735
>
> Have seen the same pattern for a few of the databases in the 9.1rc1
> cluster, and all as a result of a select on a usr table (the offending
> SELECT happens to be the first one any of these DBs sees, as it's the
> first step in a user authentication process). SELECT count(*) does not
> produce an error.
>
> *almost uneventful: We also saw messages that the destination cluster
> did not have one of our schema - (of course it didn't!) - I didn't
> realize pg_upgrade doesn't 'do' schema?

I have fixed two errors in pg_upgrade since 9.1rc1 was released. Can
you use git 9.1 head or wait for 9.1rc2 or 9.1 final? I can email you
patches if you prefer.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Lou Picciano <loupicciano(at)comcast(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: toast error after pg_upgrade 9.0.4 -> 9.1rc1
Date: 2011-09-01 02:43:05
Message-ID: 201109010243.p812h5J28346@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Bruce Momjian wrote:
> Lou Picciano wrote:
> > After running an essentially uneventful* pg_upgrade from 9.0.4 ->
> > 9.1rc1, we are seeing some toast errors logged on the new cluster:
> >
> > All are of this pattern: ERROR: missing chunk number 0 for toast value
> > 130087 in pg_toast_34735
> >
> > Have seen the same pattern for a few of the databases in the 9.1rc1
> > cluster, and all as a result of a select on a usr table (the offending
> > SELECT happens to be the first one any of these DBs sees, as it's the
> > first step in a user authentication process). SELECT count(*) does not
> > produce an error.
> >
> > *almost uneventful: We also saw messages that the destination cluster
> > did not have one of our schema - (of course it didn't!) - I didn't
> > realize pg_upgrade doesn't 'do' schema?

> I have fixed two errors in pg_upgrade since 9.1rc1 was released. Can
> you use git 9.1 head or wait for 9.1rc2 or 9.1 final? I can email you
> patches if you prefer.

Thinking some more, none of these errors was fixed by the patches I
applied.

The schema error seems very odd --- pg_upgrade certainly handles
schemas. In fact, any error makes pg_upgrade stop, so I am curious what
the error was. Did the upgrade fail and you just started the new
server? That isn't good.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Lou Picciano <loupicciano(at)comcast(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: toast error after pg_upgrade 9.0.4 -> 9.1rc1
Date: 2011-09-01 03:00:55
Message-ID: 380511541.722458.1314846055794.JavaMail.root@sz0093a.westchester.pa.mail.comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

----- Original Message -----
From: "Bruce Momjian" <bruce(at)momjian(dot)us>
To: "Lou Picciano" <loupicciano(at)comcast(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Sent: Wednesday, August 31, 2011 10:38:01 PM
Subject: Re: [HACKERS] toast error after pg_upgrade 9.0.4 -> 9.1rc1

Lou Picciano wrote:
> After running an essentially uneventful* pg_upgrade from 9.0.4 ->
> 9.1rc1, we are seeing some toast errors logged on the new cluster:
>
> All are of this pattern: ERROR: missing chunk number 0 for toast value
> 130087 in pg_toast_34735
>
> Have seen the same pattern for a few of the databases in the 9.1rc1
> cluster, and all as a result of a select on a usr table (the offending
> SELECT happens to be the first one any of these DBs sees, as it's the
> first step in a user authentication process). SELECT count(*) does not
> produce an error.
>
> *almost uneventful: We also saw messages that the destination cluster
> did not have one of our schema - (of course it didn't!) - I didn't
> realize pg_upgrade doesn't 'do' schema?

I have fixed two errors in pg_upgrade since 9.1rc1 was released. Can
you use git 9.1 head or wait for 9.1rc2 or 9.1 final? I can email you
patches if you prefer.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Bruce, many thanks. I've done the recent git pull; yes, will build head from there and send you the new mileage report.

(Glad your weekend with Irene is finally over - we had a few crises down here in New York; my fish were swimming in the back yard, lots of flooding, a few trees down - one of which only prevented from falling on the house by the 16K volt primary line. Great! But we were luckier than many. )

Lou Picciano


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-01 06:05:51
Message-ID: 20110901060551.GA16941@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Wed, Aug 31, 2011 at 09:54:20PM -0400, Bruce Momjian wrote:
> Working with depesz, I have found the cause. The code I added to fix
> pg_upgrade in 9.0.4 and earlier releases didn't handle old 8.3 servers
> properly. I mistakenly processed toast table with the same pg_dump
> query as used for pre-8.4 toast tables, not realizing those were not
> functional because there were no reloptions for toast tables in pre-8.4.

Thanks a lot. Will test and post results (around sunday/monday I guess).

Best regards,

depesz


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 15:48:50
Message-ID: 20110905154850.GA30525@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Thu, Sep 01, 2011 at 08:05:51AM +0200, hubert depesz lubaczewski wrote:
> On Wed, Aug 31, 2011 at 09:54:20PM -0400, Bruce Momjian wrote:
> > Working with depesz, I have found the cause. The code I added to fix
> > pg_upgrade in 9.0.4 and earlier releases didn't handle old 8.3 servers
> > properly. I mistakenly processed toast table with the same pg_dump
> > query as used for pre-8.4 toast tables, not realizing those were not
> > functional because there were no reloptions for toast tables in pre-8.4.
>
> Thanks a lot. Will test and post results (around sunday/monday I guess).

All worked.
pg_upgrade/vacuum didn't raise any errors. Will check some random
queries too, but don't expect anything to break.

thanks again for quick help.

Best regards,

depesz


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 15:56:34
Message-ID: 201109051556.p85FuY005837@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> On Thu, Sep 01, 2011 at 08:05:51AM +0200, hubert depesz lubaczewski wrote:
> > On Wed, Aug 31, 2011 at 09:54:20PM -0400, Bruce Momjian wrote:
> > > Working with depesz, I have found the cause. The code I added to fix
> > > pg_upgrade in 9.0.4 and earlier releases didn't handle old 8.3 servers
> > > properly. I mistakenly processed toast table with the same pg_dump
> > > query as used for pre-8.4 toast tables, not realizing those were not
> > > functional because there were no reloptions for toast tables in pre-8.4.
> >
> > Thanks a lot. Will test and post results (around sunday/monday I guess).
>
> All worked.
> pg_upgrade/vacuum didn't raise any errors. Will check some random
> queries too, but don't expect anything to break.

Thanks. I will announce the known bug and the fix.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 17:52:16
Message-ID: 20110905175216.GA6283@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Mon, Sep 05, 2011 at 05:48:50PM +0200, hubert depesz lubaczewski wrote:
> On Thu, Sep 01, 2011 at 08:05:51AM +0200, hubert depesz lubaczewski wrote:
> > On Wed, Aug 31, 2011 at 09:54:20PM -0400, Bruce Momjian wrote:
> > > Working with depesz, I have found the cause. The code I added to fix
> > > pg_upgrade in 9.0.4 and earlier releases didn't handle old 8.3 servers
> > > properly. I mistakenly processed toast table with the same pg_dump
> > > query as used for pre-8.4 toast tables, not realizing those were not
> > > functional because there were no reloptions for toast tables in pre-8.4.
> >
> > Thanks a lot. Will test and post results (around sunday/monday I guess).
>
> All worked.
> pg_upgrade/vacuum didn't raise any errors. Will check some random
> queries too, but don't expect anything to break.

Hmm .. got breakage.

Have table with ltree column, and any select to it causes:

=# select * from categories limit 1;
The connection to the server was lost. Attempting reset: Failed.

strace shows that backend read table, then it opened correct ltree.so,
but then:

29293 17:49:00.667865 stat("/opt/pgsql-9.0.5a-int/lib/ltree", 0x7fffb026ceb0) = -1 ENOENT (No such file or directory) <0.000013>
29293 17:49:00.667935 stat("/opt/pgsql-9.0.5a-int/lib/ltree.so", {st_mode=S_IFREG|0755, st_size=72966, ...}) = 0 <0.000010>
29293 17:49:00.668007 stat("/opt/pgsql-9.0.5a-int/lib/ltree.so", {st_mode=S_IFREG|0755, st_size=72966, ...}) = 0 <0.000009>
29293 17:49:00.668135 open("/opt/pgsql-9.0.5a-int/lib/ltree.so", O_RDONLY) = 46 <0.000012>
29293 17:49:00.668181 read(46, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240.\0\0"..., 832) = 832 <0.000008>
29293 17:49:00.668227 fstat(46, {st_mode=S_IFREG|0755, st_size=72966, ...}) = 0 <0.000006>
29293 17:49:00.668294 mmap(NULL, 2153248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 46, 0) = 0x7feba4abc000 <0.000013>
29293 17:49:00.668341 mprotect(0x7feba4aca000, 2093056, PROT_NONE) = 0 <0.000012>
29293 17:49:00.668381 mmap(0x7feba4cc9000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 46, 0xd000) = 0x7feba4cc9000 <0.000012>
29293 17:49:00.668429 close(46) = 0 <0.000007>
29293 17:49:00.668715 open("/dev/tty", O_RDWR|O_NOCTTY|O_NONBLOCK) = -1 ENXIO (No such device or address) <0.000017>
29293 17:49:00.668771 writev(2, [{"*** glibc detected *** ", 23}, {"postgres: postgres xxxxxxx [loca"..., 41}, {": ", 2}, {"double free or corruption (!prev"..., 33}, {": 0x", 4}, {"0000000000be67a0", 16}, {" ***\n", 5}], 7) = 124 <0.000014>
29293 17:49:00.668863 open("/opt/pgsql-9.0.5a-int/lib/libgcc_s.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000010>
29293 17:49:00.668907 open("/opt/pgsql-8.3.11-int/lib/libgcc_s.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000012>
29293 17:49:00.668952 open("/etc/ld.so.cache", O_RDONLY) = 46 <0.000010>
29293 17:49:00.668990 fstat(46, {st_mode=S_IFREG|0644, st_size=17400, ...}) = 0 <0.000006>
29293 17:49:00.669044 mmap(NULL, 17400, PROT_READ, MAP_PRIVATE, 46, 0) = 0x7feba80d7000 <0.000008>
29293 17:49:00.669077 close(46) = 0 <0.000006>
29293 17:49:00.669110 access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) <0.000009>
29293 17:49:00.669156 open("/lib/libgcc_s.so.1", O_RDONLY) = 46 <0.000012>
29293 17:49:00.669197 read(46, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240!\0\0"..., 832) = 832 <0.000009>
29293 17:49:00.669244 mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7feb9cabc000 <0.000007>
29293 17:49:00.669278 munmap(0x7feb9cabc000, 55853056) = 0 <0.000011>
29293 17:49:00.669313 munmap(0x7feba4000000, 11255808) = 0 <0.000008>
29293 17:49:00.669347 mprotect(0x7feba0000000, 135168, PROT_READ|PROT_WRITE) = 0 <0.000008>
29293 17:49:00.669387 fstat(46, {st_mode=S_IFREG|0644, st_size=56072, ...}) = 0 <0.000006>
29293 17:49:00.669451 mmap(NULL, 2151816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 46, 0) = 0x7feba48ae000 <0.000009>
29293 17:49:00.669487 mprotect(0x7feba48bb000, 2097152, PROT_NONE) = 0 <0.000009>
29293 17:49:00.669522 mmap(0x7feba4abb000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 46, 0xd000) = 0x7feba4abb000 <0.000009>
29293 17:49:00.669565 close(46) = 0 <0.000006>
29293 17:49:00.669614 munmap(0x7feba80d7000, 17400) = 0 <0.000012>
29293 17:49:00.669765 write(2, "======= Backtrace: =========\n", 29) = 29 <0.000011>
29293 17:49:00.669852 writev(2, [{"/lib/libc.so.6", 14}, {"[0x", 3}, {"7feba759908a", 12}, {"]\n", 2}], 4) = 31 <0.000011>
29293 17:49:00.669937 writev(2, [{"/lib/libc.so.6", 14}, {"(", 1}, {"cfree", 5}, {"+0x", 3}, {"8c", 2}, {")", 1}, {"[0x", 3}, {"7feba759cc1c", 12}, {"]\n", 2}], 9) = 43 <0.000012>
29293 17:49:00.670128 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"[0x", 3}, {"6c18c9", 6}, {"]\n", 2}], 4) = 52 <0.000011>
29293 17:49:00.670289 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"(", 1}, {"MemoryContextDelete", 19}, {"+0x", 3}, {"54", 2}, {")", 1}, {"[0x", 3}, {"6c1e54", 6}, {"]\n", 2}], 9) = 78 <0.000012>
29293 17:49:00.670453 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"(", 1}, {"standard_ExecutorEnd", 20}, {"+0x", 3}, {"12e", 3}, {")", 1}, {"[0x", 3}, {"54913e", 6}, {"]\n", 2}], 9) = 80 <0.000011>
29293 17:49:00.670611 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"(", 1}, {"PortalCleanup", 13}, {"+0x", 3}, {"8f", 2}, {")", 1}, {"[0x", 3}, {"51c88f", 6}, {"]\n", 2}], 9) = 72 <0.000012>
29293 17:49:00.670770 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"(", 1}, {"PortalDrop", 10}, {"+0x", 3}, {"4c", 2}, {")", 1}, {"[0x", 3}, {"6c26fc", 6}, {"]\n", 2}], 9) = 69 <0.000011>
29293 17:49:00.670940 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"[0x", 3}, {"5f8c95", 6}, {"]\n", 2}], 4) = 52 <0.000010>
29293 17:49:00.671094 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"(", 1}, {"PostgresMain", 12}, {"+0x", 3}, {"50e", 3}, {")", 1}, {"[0x", 3}, {"5f95de", 6}, {"]\n", 2}], 9) = 72 <0.000012>
29293 17:49:00.671264 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"[0x", 3}, {"5c94f6", 6}, {"]\n", 2}], 4) = 52 <0.000011>
29293 17:49:00.671415 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"(", 1}, {"PostmasterMain", 14}, {"+0x", 3}, {"7ee", 3}, {")", 1}, {"[0x", 3}, {"5ca0fe", 6}, {"]\n", 2}], 9) = 74 <0.000012>
29293 17:49:00.671579 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"(", 1}, {"main", 4}, {"+0x", 3}, {"190", 3}, {")", 1}, {"[0x", 3}, {"574070", 6}, {"]\n", 2}], 9) = 64 <0.000011>
29293 17:49:00.671677 writev(2, [{"/lib/libc.so.6", 14}, {"(", 1}, {"__libc_start_main", 17}, {"+0x", 3}, {"f4", 2}, {")", 1}, {"[0x", 3}, {"7feba75431c4", 12}, {"]\n", 2}], 9) = 55 <0.000011>
29293 17:49:00.671818 writev(2, [{"postgres: postgres xxxxxxx [loca"..., 41}, {"(", 1}, {"memcmp", 6}, {"+0x", 3}, {"489", 3}, {")", 1}, {"[0x", 3}, {"457179", 6}, {"]\n", 2}], 9) = 66 <0.000011>
29293 17:49:00.671893 write(2, "======= Memory map: ========\n", 29) = 29 <0.000010>
29293 17:49:00.671932 open("/proc/self/maps", O_RDONLY) = 46 <0.000025>
29293 17:49:00.671986 read(46, "00400000-00882000 r-xp 00000000 "..., 1024) = 1024 <0.000038>
29293 17:49:00.672054 write(2, "00400000-00882000 r-xp 00000000 "..., 1024) = 1024 <0.000010>
29293 17:49:00.672094 read(46, "000 rw-s 00000000 00:09 11436034"..., 1024) = 1024 <0.000023>
29293 17:49:00.672148 write(2, "000 rw-s 00000000 00:09 11436034"..., 1024) = 1024 <0.000010>
29293 17:49:00.672189 read(46, "feba7882000-7feba7887000 rw-p 7f"..., 1024) = 1024 <0.000024>
29293 17:49:00.672244 write(2, "feba7882000-7feba7887000 rw-p 7f"..., 1024) = 1024 <0.000011>
29293 17:49:00.672283 read(46, "b/ld-2.7.so\n7feba80dc000-7feba81"..., 1024) = 1024 <0.000027>
29293 17:49:00.672340 write(2, "b/ld-2.7.so\n7feba80dc000-7feba81"..., 1024) = 1024 <0.000010>
29293 17:49:00.672379 read(46, " /lib/ld-2.7.so\n7fffb02410"..., 1024) = 267 <0.000010>
29293 17:49:00.672426 write(2, " /lib/ld-2.7.so\n7fffb02410"..., 267) = 267 <0.000010>
29293 17:49:00.672466 read(46, "", 1024) = 0 <0.000006>
29293 17:49:00.672496 close(46) = 0 <0.000009>
29293 17:49:00.672531 rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0 <0.000006>
29293 17:49:00.672580 tgkill(29293, 29293, SIGABRT) = 0 <0.000008>
29293 17:49:00.672612 --- SIGABRT (Aborted) @ 0 (0) ---

Full logged text:

*** glibc detected *** postgres: postgres xxxxxxx [local] SELECT: double free or corruption (!prev): 0x0000000000be67a0 ***
======= Backtrace: =========
/lib/libc.so.6[0x7feba759908a]
/lib/libc.so.6(cfree+0x8c)[0x7feba759cc1c]
postgres: postgres xxxxxxx [local] SELECT[0x6c18c9]
postgres: postgres xxxxxxx [local] SELECT(MemoryContextDelete+0x54)[0x6c1e54]
postgres: postgres xxxxxxx [local] SELECT(standard_ExecutorEnd+0x12e)[0x54913e]
postgres: postgres xxxxxxx [local] SELECT(PortalCleanup+0x8f)[0x51c88f]
postgres: postgres xxxxxxx [local] SELECT(PortalDrop+0x4c)[0x6c26fc]
postgres: postgres xxxxxxx [local] SELECT[0x5f8c95]
postgres: postgres xxxxxxx [local] SELECT(PostgresMain+0x50e)[0x5f95de]
postgres: postgres xxxxxxx [local] SELECT[0x5c94f6]
postgres: postgres xxxxxxx [local] SELECT(PostmasterMain+0x7ee)[0x5ca0fe]
postgres: postgres xxxxxxx [local] SELECT(main+0x190)[0x574070]
/lib/libc.so.6(__libc_start_main+0xf4)[0x7feba75431c4]
postgres: postgres xxxxxxx [local] SELECT(memcmp+0x489)[0x457179]
======= Memory map: ========
00400000-00882000 r-xp 00000000 68:05 17009395 /opt/pgsql-9.0.5a-int/bin/postgres
00a82000-00a8d000 rw-p 00482000 68:05 17009395 /opt/pgsql-9.0.5a-int/bin/postgres
00a8d000-00c0b000 rw-p 00a8d000 00:00 0 [heap]
7feba0000000-7feba0021000 rw-p 7feba0000000 00:00 0
7feba0021000-7feba4000000 ---p 7feba0021000 00:00 0
7feba48ae000-7feba48bb000 r-xp 00000000 68:05 8467164 /lib/libgcc_s.so.1
7feba48bb000-7feba4abb000 ---p 0000d000 68:05 8467164 /lib/libgcc_s.so.1
7feba4abb000-7feba4abc000 rw-p 0000d000 68:05 8467164 /lib/libgcc_s.so.1
7feba4abc000-7feba4aca000 r-xp 00000000 68:05 29781680 /opt/pgsql-9.0.5a-int/lib/ltree.so
7feba4aca000-7feba4cc9000 ---p 0000e000 68:05 29781680 /opt/pgsql-9.0.5a-int/lib/ltree.so
7feba4cc9000-7feba4cca000 rw-p 0000d000 68:05 29781680 /opt/pgsql-9.0.5a-int/lib/ltree.so
7feba4cca000-7feba7102000 rw-s 00000000 00:09 11436034 /SYSV00420e29 (deleted)
7feba7102000-7feba710c000 r-xp 00000000 68:05 8467177 /lib/libnss_files-2.7.so
7feba710c000-7feba730c000 ---p 0000a000 68:05 8467177 /lib/libnss_files-2.7.so
7feba730c000-7feba730e000 rw-p 0000a000 68:05 8467177 /lib/libnss_files-2.7.so
7feba730e000-7feba7324000 r-xp 00000000 68:05 41943384 /usr/lib/libz.so.1.2.3.3
7feba7324000-7feba7524000 ---p 00016000 68:05 41943384 /usr/lib/libz.so.1.2.3.3
7feba7524000-7feba7525000 rw-p 00016000 68:05 41943384 /usr/lib/libz.so.1.2.3.3
7feba7525000-7feba767d000 r-xp 00000000 68:05 8467168 /lib/libc-2.7.so
7feba767d000-7feba787d000 ---p 00158000 68:05 8467168 /lib/libc-2.7.so
7feba787d000-7feba7880000 r--p 00158000 68:05 8467168 /lib/libc-2.7.so
7feba7880000-7feba7882000 rw-p 0015b000 68:05 8467168 /lib/libc-2.7.so
7feba7882000-7feba7887000 rw-p 7feba7882000 00:00 0
7feba7887000-7feba7907000 r-xp 00000000 68:05 8467172 /lib/libm-2.7.so
7feba7907000-7feba7b06000 ---p 00080000 68:05 8467172 /lib/libm-2.7.so
7feba7b06000-7feba7b08000 rw-p 0007f000 68:05 8467172 /lib/libm-2.7.so
7feba7b08000-7feba7b0a000 r-xp 00000000 68:05 8467171 /lib/libdl-2.7.so
7feba7b0a000-7feba7d0a000 ---p 00002000 68:05 8467171 /lib/libdl-2.7.so
7feba7d0a000-7feba7d0c000 rw-p 00002000 68:05 8467171 /lib/libdl-2.7.so
7feba7d0c000-7feba7e49000 r-xp 00000000 68:05 42511255 /usr/lib/libxml2.so.2.6.31
7feba7e49000-7feba8049000 ---p 0013d000 68:05 42511255 /usr/lib/libxml2.so.2.6.31
7feba8049000-7feba8052000 rw-p 0013d000 68:05 42511255 /usr/lib/libxml2.so.2.6.31
7feba8052000-7feba8053000 rw-p 7feba8052000 00:00 0
7feba8053000-7feba8070000 r-xp 00000000 68:05 8467165 /lib/ld-2.7.so
7feba80dc000-7feba813e000 rw-p 7feba80dc000 00:00 0
7feba813e000-7feba817d000 r--p 00000000 68:05 62914972 /usr/lib/locale/en_US.utf8/LC_CTYPE
7feba817d000-7feba8184000 r--s 00000000 68:05 50331831 /usr/lib/gconv/gconv-modules.cache
7feba8184000-7feba8265000 r--p 00000000 68:05 62914975 /usr/lib/locale/en_US.utf8/LC_COLLATE
7feba8265000-7feba8268000 rw-p 7feba8265000 00:00 0
7feba8269000-7feba826a000 r--p 00000000 68:05 62914974 /usr/lib/locale/en_US.utf8/LC_TIME
7feba826a000-7feba826b000 r--p 00000000 68:05 62914973 /usr/lib/locale/en_US.utf8/LC_NUMERIC
7feba826b000-7feba826c000 r--p 00000000 68:05 62915024 /usr/lib/locale/en_US.utf8/LC_MONETARY
7feba826c000-7feba826d000 r--p 00000000 68:05 551 /usr/lib/locale/en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES
7feba826d000-7feba8270000 rw-p 7feba826d000 00:00 0
7feba8270000-7feba8272000 rw-p 0001d000 68:05 8467165 /lib/ld-2.7.so
7fffb0241000-7fffb0271000 rw-p 7ffffffcf000 00:00 0 [stack]
7fffb03fe000-7fffb0400000 r-xp 7fffb03fe000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
2011-09-05 17:49:00 UTC () [29522]: [30-1] user=,db= LOG: server process (PID 29293) was terminated by signal 6: Aborted
2011-09-05 17:49:00 UTC () [29522]: [31-1] user=,db= LOG: terminating any other active server processes

I'm not sure if it's upgrade thing, or is it because of error in
ltree/compilation, but it looks bad.

Is there any more info I could show/gather to help debug the issue?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: depesz(at)depesz(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 18:13:58
Message-ID: 201109051813.p85IDwc29847@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Bruce Momjian wrote:
> hubert depesz lubaczewski wrote:
> > On Wed, Aug 31, 2011 at 01:23:05PM -0400, Bruce Momjian wrote:
> > > Can you get me the 9.0.X pg_class.relfrozenxid for the toast and heap
> > > tables involved?
> >
> > Sure:
> >
> > =# select oid::regclass, relfrozenxid from pg_class where relname in ('transactions', 'pg_toast_106668498');
> > oid | relfrozenxid
> > -----------------------------+--------------
> > pg_toast.pg_toast_106668498 | 3673553926
> > transactions | 3623560321
> > (2 rows)
>
> Working with depesz, I have found the cause. The code I added to fix
> pg_upgrade in 9.0.4 and earlier releases didn't handle old 8.3 servers
> properly. I mistakenly processed toast table with the same pg_dump
> query as used for pre-8.4 toast tables, not realizing those were not
> functional because there were no reloptions for toast tables in pre-8.4.
>
> The attached applied patches fix all releases. This will have to be
> mentioned in the 9.0.5 release notes, and we should probably do the same
> kind of announcement we did when I fixed this for 9.0.4. :-(
>
> Yeah, I should not have caused this bug. It did not show up in any of
> my testing.

I have posted the bug and fix announcement to the announce email list.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 18:18:18
Message-ID: 201109051818.p85IIIm01057@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> I'm not sure if it's upgrade thing, or is it because of error in
> ltree/compilation, but it looks bad.
>
> Is there any more info I could show/gather to help debug the issue?

I am confused by the error --- is it not loading, or can you get a
backtrace of the crash?

If I had to take a guess, it would be that there is some ltree
incompatibility from PG 8.3 that we didn't know about.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 18:44:11
Message-ID: 20110905184411.GA3444@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Mon, Sep 05, 2011 at 02:18:18PM -0400, Bruce Momjian wrote:
> hubert depesz lubaczewski wrote:
> > I'm not sure if it's upgrade thing, or is it because of error in
> > ltree/compilation, but it looks bad.
> >
> > Is there any more info I could show/gather to help debug the issue?
>
> I am confused by the error --- is it not loading, or can you get a
> backtrace of the crash?

The one in logs is not sufficient?
If not - could you tell me how to make the backtrace? I'm by far not a c
programmer, so for this I'd need some tutoring.

> If I had to take a guess, it would be that there is some ltree
> incompatibility from PG 8.3 that we didn't know about.

it's possible.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 18:51:12
Message-ID: 201109051851.p85IpCS05627@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> On Mon, Sep 05, 2011 at 02:18:18PM -0400, Bruce Momjian wrote:
> > hubert depesz lubaczewski wrote:
> > > I'm not sure if it's upgrade thing, or is it because of error in
> > > ltree/compilation, but it looks bad.
> > >
> > > Is there any more info I could show/gather to help debug the issue?
> >
> > I am confused by the error --- is it not loading, or can you get a
> > backtrace of the crash?
>
> The one in logs is not sufficient?
> If not - could you tell me how to make the backtrace? I'm by far not a c
> programmer, so for this I'd need some tutoring.

I think you want this:

http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

While strace is useful, it doesn't show us where the C code is failing.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: depesz(at)depesz(dot)com
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 19:10:55
Message-ID: 12710.1315249855@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski <depesz(at)depesz(dot)com> writes:
> On Mon, Sep 05, 2011 at 02:18:18PM -0400, Bruce Momjian wrote:
>> If I had to take a guess, it would be that there is some ltree
>> incompatibility from PG 8.3 that we didn't know about.

> it's possible.

[ checks the git history... ] This 8.4 commit:
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=8eee65c996048848c20f6637c1d12b319a4ce244
changed a number of ltree data structures, though I'm not sure whether
any of those are on-disk structures.

regards, tom lane


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 20:35:56
Message-ID: 20110905203555.GA12739@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Mon, Sep 05, 2011 at 02:51:12PM -0400, Bruce Momjian wrote:
> hubert depesz lubaczewski wrote:
> > On Mon, Sep 05, 2011 at 02:18:18PM -0400, Bruce Momjian wrote:
> > > hubert depesz lubaczewski wrote:
> > > > I'm not sure if it's upgrade thing, or is it because of error in
> > > > ltree/compilation, but it looks bad.
> > > >
> > > > Is there any more info I could show/gather to help debug the issue?
> > >
> > > I am confused by the error --- is it not loading, or can you get a
> > > backtrace of the crash?
> >
> > The one in logs is not sufficient?
> > If not - could you tell me how to make the backtrace? I'm by far not a c
> > programmer, so for this I'd need some tutoring.
>
> I think you want this:
>
> http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
>
> While strace is useful, it doesn't show us where the C code is failing.

ok.
got this:

(gdb) bt
#0 0x00007fdc28605095 in raise () from /lib/libc.so.6
#1 0x00007fdc28606af0 in abort () from /lib/libc.so.6
#2 0x00007fdc2863fa7b in ?? () from /lib/libc.so.6
#3 0x00007fdc2864708a in ?? () from /lib/libc.so.6
#4 0x00007fdc2864ac1c in free () from /lib/libc.so.6
#5 0x00000000006c18c9 in AllocSetDelete (context=<value optimized out>) at aset.c:551
#6 0x00000000006c1e54 in MemoryContextDelete (context=0xbdae80) at mcxt.c:196
#7 0x000000000054913e in standard_ExecutorEnd (queryDesc=0xbbb4f0) at execMain.c:360
#8 0x000000000051c88f in PortalCleanup (portal=0xbb7a70) at portalcmds.c:268
#9 0x00000000006c26fc in PortalDrop (portal=0xbb7a70, isTopCommit=0 '\0') at portalmem.c:434
#10 0x00000000005f8c95 in exec_simple_query (query_string=0xb9b980 "select * from categories limit 1;") at postgres.c:1067
#11 0x00000000005f95de in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:3936
#12 0x00000000005c94f6 in ServerLoop () at postmaster.c:3555
#13 0x00000000005ca0fe in PostmasterMain (argc=3, argv=0xaf0870) at postmaster.c:1092
#14 0x0000000000574070 in main (argc=3, argv=0xaf0870) at main.c:188

Hope it helps.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 20:43:47
Message-ID: 201109052043.p85KhlB07427@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> On Mon, Sep 05, 2011 at 02:51:12PM -0400, Bruce Momjian wrote:
> > hubert depesz lubaczewski wrote:
> > > On Mon, Sep 05, 2011 at 02:18:18PM -0400, Bruce Momjian wrote:
> > > > hubert depesz lubaczewski wrote:
> > > > > I'm not sure if it's upgrade thing, or is it because of error in
> > > > > ltree/compilation, but it looks bad.
> > > > >
> > > > > Is there any more info I could show/gather to help debug the issue?
> > > >
> > > > I am confused by the error --- is it not loading, or can you get a
> > > > backtrace of the crash?
> > >
> > > The one in logs is not sufficient?
> > > If not - could you tell me how to make the backtrace? I'm by far not a c
> > > programmer, so for this I'd need some tutoring.
> >
> > I think you want this:
> >
> > http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
> >
> > While strace is useful, it doesn't show us where the C code is failing.
>
> ok.
> got this:
>
> (gdb) bt
> #0 0x00007fdc28605095 in raise () from /lib/libc.so.6
> #1 0x00007fdc28606af0 in abort () from /lib/libc.so.6
> #2 0x00007fdc2863fa7b in ?? () from /lib/libc.so.6
> #3 0x00007fdc2864708a in ?? () from /lib/libc.so.6
> #4 0x00007fdc2864ac1c in free () from /lib/libc.so.6
> #5 0x00000000006c18c9 in AllocSetDelete (context=<value optimized out>) at aset.c:551
> #6 0x00000000006c1e54 in MemoryContextDelete (context=0xbdae80) at mcxt.c:196
> #7 0x000000000054913e in standard_ExecutorEnd (queryDesc=0xbbb4f0) at execMain.c:360
> #8 0x000000000051c88f in PortalCleanup (portal=0xbb7a70) at portalcmds.c:268
> #9 0x00000000006c26fc in PortalDrop (portal=0xbb7a70, isTopCommit=0 '\0') at portalmem.c:434
> #10 0x00000000005f8c95 in exec_simple_query (query_string=0xb9b980 "select * from categories limit 1;") at postgres.c:1067
> #11 0x00000000005f95de in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:3936
> #12 0x00000000005c94f6 in ServerLoop () at postmaster.c:3555
> #13 0x00000000005ca0fe in PostmasterMain (argc=3, argv=0xaf0870) at postmaster.c:1092
> #14 0x0000000000574070 in main (argc=3, argv=0xaf0870) at main.c:188

Good. Is it possible to compile with debug symbols, -g? Odd you are
crashing in libc.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 20:46:27
Message-ID: 20110905204627.GA23389@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Mon, Sep 05, 2011 at 04:43:47PM -0400, Bruce Momjian wrote:
> hubert depesz lubaczewski wrote:
> > On Mon, Sep 05, 2011 at 02:51:12PM -0400, Bruce Momjian wrote:
> > > hubert depesz lubaczewski wrote:
> > > > On Mon, Sep 05, 2011 at 02:18:18PM -0400, Bruce Momjian wrote:
> > > > > hubert depesz lubaczewski wrote:
> > > > > > I'm not sure if it's upgrade thing, or is it because of error in
> > > > > > ltree/compilation, but it looks bad.
> > > > > >
> > > > > > Is there any more info I could show/gather to help debug the issue?
> > > > >
> > > > > I am confused by the error --- is it not loading, or can you get a
> > > > > backtrace of the crash?
> > > >
> > > > The one in logs is not sufficient?
> > > > If not - could you tell me how to make the backtrace? I'm by far not a c
> > > > programmer, so for this I'd need some tutoring.
> > >
> > > I think you want this:
> > >
> > > http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
> > >
> > > While strace is useful, it doesn't show us where the C code is failing.
> >
> > ok.
> > got this:
> >
> > (gdb) bt
> > #0 0x00007fdc28605095 in raise () from /lib/libc.so.6
> > #1 0x00007fdc28606af0 in abort () from /lib/libc.so.6
> > #2 0x00007fdc2863fa7b in ?? () from /lib/libc.so.6
> > #3 0x00007fdc2864708a in ?? () from /lib/libc.so.6
> > #4 0x00007fdc2864ac1c in free () from /lib/libc.so.6
> > #5 0x00000000006c18c9 in AllocSetDelete (context=<value optimized out>) at aset.c:551
> > #6 0x00000000006c1e54 in MemoryContextDelete (context=0xbdae80) at mcxt.c:196
> > #7 0x000000000054913e in standard_ExecutorEnd (queryDesc=0xbbb4f0) at execMain.c:360
> > #8 0x000000000051c88f in PortalCleanup (portal=0xbb7a70) at portalcmds.c:268
> > #9 0x00000000006c26fc in PortalDrop (portal=0xbb7a70, isTopCommit=0 '\0') at portalmem.c:434
> > #10 0x00000000005f8c95 in exec_simple_query (query_string=0xb9b980 "select * from categories limit 1;") at postgres.c:1067
> > #11 0x00000000005f95de in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:3936
> > #12 0x00000000005c94f6 in ServerLoop () at postmaster.c:3555
> > #13 0x00000000005ca0fe in PostmasterMain (argc=3, argv=0xaf0870) at postmaster.c:1092
> > #14 0x0000000000574070 in main (argc=3, argv=0xaf0870) at main.c:188
>
> Good. Is it possible to compile with debug symbols, -g? Odd you are
> crashing in libc.

this had debug:

./configure \
--prefix=/opt/pgsql-9.0.5a-int \
--enable-debug \
--disable-rpath \
--without-perl \
--without-python \
--without-tcl \
--without-openssl \
--without-pam \
--without-krb5 \
--without-gssapi \
--enable-nls \
--enable-integer-datetimes \
--enable-thread-safety \
--with-libxml \
--with-libxslt \
--without-ldap

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 20:53:12
Message-ID: 201109052053.p85KrCG10225@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> > Good. Is it possible to compile with debug symbols, -g? Odd you are
> > crashing in libc.
>
> this had debug:
>
> ./configure \
> --prefix=/opt/pgsql-9.0.5a-int \
> --enable-debug \
> --disable-rpath \
> --without-perl \
> --without-python \
> --without-tcl \
> --without-openssl \
> --without-pam \
> --without-krb5 \
> --without-gssapi \
> --enable-nls \
> --enable-integer-datetimes \
> --enable-thread-safety \
> --with-libxml \
> --with-libxslt \
> --without-ldap

--enable-debug adds internal debug calls, not compiler debug symbols.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: depesz(at)depesz(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 21:04:32
Message-ID: 201109052104.p85L4Wx11935@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski wrote:
> On Mon, Sep 05, 2011 at 02:51:12PM -0400, Bruce Momjian wrote:
> > hubert depesz lubaczewski wrote:
> > > On Mon, Sep 05, 2011 at 02:18:18PM -0400, Bruce Momjian wrote:
> > > > hubert depesz lubaczewski wrote:
> > > > > I'm not sure if it's upgrade thing, or is it because of error in
> > > > > ltree/compilation, but it looks bad.
> > > > >
> > > > > Is there any more info I could show/gather to help debug the issue?
> > > >
> > > > I am confused by the error --- is it not loading, or can you get a
> > > > backtrace of the crash?
> > >
> > > The one in logs is not sufficient?
> > > If not - could you tell me how to make the backtrace? I'm by far not a c
> > > programmer, so for this I'd need some tutoring.
> >
> > I think you want this:
> >
> > http://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
> >
> > While strace is useful, it doesn't show us where the C code is failing.
>
> ok.
> got this:
>
> (gdb) bt
> #0 0x00007fdc28605095 in raise () from /lib/libc.so.6
> #1 0x00007fdc28606af0 in abort () from /lib/libc.so.6
> #2 0x00007fdc2863fa7b in ?? () from /lib/libc.so.6
> #3 0x00007fdc2864708a in ?? () from /lib/libc.so.6
> #4 0x00007fdc2864ac1c in free () from /lib/libc.so.6
> #5 0x00000000006c18c9 in AllocSetDelete (context=<value optimized out>) at aset.c:551
> #6 0x00000000006c1e54 in MemoryContextDelete (context=0xbdae80) at mcxt.c:196
> #7 0x000000000054913e in standard_ExecutorEnd (queryDesc=0xbbb4f0) at execMain.c:360
> #8 0x000000000051c88f in PortalCleanup (portal=0xbb7a70) at portalcmds.c:268
> #9 0x00000000006c26fc in PortalDrop (portal=0xbb7a70, isTopCommit=0 '\0') at portalmem.c:434
> #10 0x00000000005f8c95 in exec_simple_query (query_string=0xb9b980 "select * from categories limit 1;") at postgres.c:1067
> #11 0x00000000005f95de in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, username=<value optimized out>) at postgres.c:3936
> #12 0x00000000005c94f6 in ServerLoop () at postmaster.c:3555
> #13 0x00000000005ca0fe in PostmasterMain (argc=3, argv=0xaf0870) at postmaster.c:1092
> #14 0x0000000000574070 in main (argc=3, argv=0xaf0870) at main.c:188

Odd it is dying in the memory freeing at executor close --- not in the
ltree code.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: depesz(at)depesz(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-05 21:26:00
Message-ID: 15262.1315257960@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Odd it is dying in the memory freeing at executor close --- not in the
> ltree code.

Doesn't seem odd. The glibc complaint previously shown already
indicates this is a memory stomp problem.

--enable-cassert might (or might not) provide additional help.

regards, tom lane


From: daveg <daveg(at)sonic(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: hubert depesz lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-06 00:16:50
Message-ID: 20110906001650.GI24583@sonic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers


Sorry I missed your reply, catching up now.

On Wed, Aug 31, 2011 at 09:56:59PM -0400, Bruce Momjian wrote:
> daveg wrote:
> > On Mon, Aug 29, 2011 at 07:49:24PM +0200, hubert depesz lubaczewski wrote:
> > > On Mon, Aug 29, 2011 at 06:54:41PM +0200, hubert depesz lubaczewski wrote:
> > > vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
> > > DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.
> > >
> > > Interestingly.
> > >
> > > In old dir there is pg_clog directory with files:
> > > 0AC0 .. 0DAF (including 0CC6, size 262144)
> > > but new pg_clog has only:
> > > 0D2F .. 0DB0
> > >
> > > File content - nearly all files that exist in both places are the same, with exception of 2 newest ones in new datadir:
> > > 3c5122f3e80851735c19522065a2d12a 0DAF
> > > 8651fc2b9fa3d27cfb5b496165cead68 0DB0
> > >
> > > 0DB0 doesn't exist in old, and 0DAF has different md5sum: 7d48996c762d6a10f8eda88ae766c5dd
...
> > I had this same thing happen this Saturday just past and my client had to
> > restore the whole 2+ TB instance from the previous days pg_dumps.
...
> > After running pg_upgrade apparently successfully and analyzeing all the

Update: reviewing the logs I see some of the analyzes hit the "could not
access status of transaction" error too.

> > tables we restarted the production workload and started getting errors:
> >
> > 2011-08-27 04:18:34.015 12337 c06 postgres ERROR: could not access status of transaction 2923961093
> > 2011-08-27 04:18:34.015 12337 c06 postgres DETAIL: Could not open file "pg_clog/0AE4": No such file or directory.
> > 2011-08-27 04:18:34.015 12337 c06 postgres STATEMENT: analyze public.b_pxx;
> >
> > On examination the pg_clog directory contained on two files timestamped
> > after the startup of the new cluster with 9.0.4. Other hosts that upgraded
> > successfully had numerous files in pg_clog dating back a few days. So it
> > appears that all the clog files went missing during the upgrade somehow.
> > a
> > This happened upgrading from 8.4.7 to 9.0.4, with a brief session in between
> > at 8.4.8. We have upgraded several hosts to 9.0.4 successfully previously.
>
> I have posted this fix to the hackers email list, but I found it only
> affected old 8.3 servers, not old 8.4.X, so I am confused by your bug
> report.
>
> I have tested 8.4.X to 9.0.4 and found pg_upgrade preserves toast
> relfrozenxids properly in that case.
>
> Can you tell me what table is showing this error? Does it happen during
> vacuum? Can you run a vacuum verbose to see what it is throwing the
> error on? Thanks.

This was upgrading from 8.4.8 to 9.0.4. I don't have the running cluster
anymore, but I do have tar.gz archives of it and could probably find
2TB free somewhere to restore it to if there is something useful to extract.

However, I don't think this was toast related. Most of our rows are short and
have only int, float, and short text columns. These errors hit over 60
different tables mostly during the analyzes we ran immediately after the
upgrade. It also hit during select, insert and delete statements. We did not
run the db more than a few minutes as the damage was so extensive.

As far as I can tell pg_upgrade never copied any pg_clog files from the
old cluster to the new cluster. I wish I had detected that before running
the remove_old_cluster.sh script.

-dg

--
David Gould daveg(at)sonic(dot)net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: daveg <daveg(at)sonic(dot)net>
Cc: hubert depesz lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-06 00:19:21
Message-ID: 201109060019.p860JL005675@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

daveg wrote:
> > Can you tell me what table is showing this error? Does it happen during
> > vacuum? Can you run a vacuum verbose to see what it is throwing the
> > error on? Thanks.
>
> This was upgrading from 8.4.8 to 9.0.4. I don't have the running cluster
> anymore, but I do have tar.gz archives of it and could probably find
> 2TB free somewhere to restore it to if there is something useful to extract.
>
> However, I don't think this was toast related. Most of our rows are short and
> have only int, float, and short text columns. These errors hit over 60
> different tables mostly during the analyzes we ran immediately after the
> upgrade. It also hit during select, insert and delete statements. We did not
> run the db more than a few minutes as the damage was so extensive.
>
> As far as I can tell pg_upgrade never copied any pg_clog files from the
> old cluster to the new cluster. I wish I had detected that before running
> the remove_old_cluster.sh script.

Wow, no clogs? That would make the system very confused. You can pull
the clogs out of the old backup and move them over if the files don't
already exist.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: daveg <daveg(at)sonic(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: hubert depesz lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-06 02:17:28
Message-ID: 20110906021728.GK24583@sonic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Mon, Sep 05, 2011 at 08:19:21PM -0400, Bruce Momjian wrote:
> daveg wrote:
> > > Can you tell me what table is showing this error? Does it happen during
> > > vacuum? Can you run a vacuum verbose to see what it is throwing the
> > > error on? Thanks.
> >
> > This was upgrading from 8.4.8 to 9.0.4. I don't have the running cluster
> > anymore, but I do have tar.gz archives of it and could probably find
> > 2TB free somewhere to restore it to if there is something useful to extract.
> >
> > However, I don't think this was toast related. Most of our rows are short and
> > have only int, float, and short text columns. These errors hit over 60
> > different tables mostly during the analyzes we ran immediately after the
> > upgrade. It also hit during select, insert and delete statements. We did not
> > run the db more than a few minutes as the damage was so extensive.
> >
> > As far as I can tell pg_upgrade never copied any pg_clog files from the
> > old cluster to the new cluster. I wish I had detected that before running
> > the remove_old_cluster.sh script.
>
> Wow, no clogs? That would make the system very confused. You can pull
> the clogs out of the old backup and move them over if the files don't
> already exist.

We don't have the old cluster after running delete_old_cluster.ch. We use
pg_dump for backup, so no clogs. We ended up restored 20 odd dbs totalling
2.1TB from the previous days pg_dumps.

If you review my original report I mentioned that there were only 2 clog
files in the new cluster both with ctime after the start of postgresql
after the upgrade. I did the upgrade for three hosts at the same time, the
others were fine. They have dozens of clogs dating back days before the
upgrade. The failing system had only 2 recent clog.

-dg

> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + It's impossible for everything to be true. +
>

--
David Gould daveg(at)sonic(dot)net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: daveg <daveg(at)sonic(dot)net>
Cc: hubert depesz lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-06 02:35:03
Message-ID: 201109060235.p862Z3w25939@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

daveg wrote:
> > > As far as I can tell pg_upgrade never copied any pg_clog files from the
> > > old cluster to the new cluster. I wish I had detected that before running
> > > the remove_old_cluster.sh script.
> >
> > Wow, no clogs? That would make the system very confused. You can pull
> > the clogs out of the old backup and move them over if the files don't
> > already exist.
>
> We don't have the old cluster after running delete_old_cluster.ch. We use
> pg_dump for backup, so no clogs. We ended up restored 20 odd dbs totalling
> 2.1TB from the previous days pg_dumps.
>
> If you review my original report I mentioned that there were only 2 clog
> files in the new cluster both with ctime after the start of postgresql
> after the upgrade. I did the upgrade for three hosts at the same time, the
> others were fine. They have dozens of clogs dating back days before the
> upgrade. The failing system had only 2 recent clog.

That is certainly unusual.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: depesz(at)depesz(dot)com, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-06 06:20:02
Message-ID: 1315290002.29771.0.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On mån, 2011-09-05 at 16:53 -0400, Bruce Momjian wrote:
> hubert depesz lubaczewski wrote:
> > > Good. Is it possible to compile with debug symbols, -g? Odd you are
> > > crashing in libc.
> >
> > this had debug:
> >
> > ./configure \
> > --prefix=/opt/pgsql-9.0.5a-int \
> > --enable-debug \
> > --disable-rpath \
> > --without-perl \
> > --without-python \
> > --without-tcl \
> > --without-openssl \
> > --without-pam \
> > --without-krb5 \
> > --without-gssapi \
> > --enable-nls \
> > --enable-integer-datetimes \
> > --enable-thread-safety \
> > --with-libxml \
> > --with-libxslt \
> > --without-ldap
>
> --enable-debug adds internal debug calls, not compiler debug symbols.

No, you have that backwards.


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-06 10:32:22
Message-ID: 20110906103222.GA20760@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Mon, Sep 05, 2011 at 05:26:00PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Odd it is dying in the memory freeing at executor close --- not in the
> > ltree code.
>
> Doesn't seem odd. The glibc complaint previously shown already
> indicates this is a memory stomp problem.
>
> --enable-cassert might (or might not) provide additional help.

recompiled with cassert.

result:

=# select * from categories where category_id = 177;
The connection to the server was lost. Attempting reset: Succeeded.

which is interesting, as the error is different.

logs show:

2011-09-06 10:28:58 UTC () [21986]: [1-1] user=[unknown],db=[unknown] LOG: connection received: host=[local]
2011-09-06 10:28:58 UTC ([local]) [21986]: [2-1] user=postgres,db=xxxxxxx LOG: connection authorized: user=postgres database=xxxxxxx
2011-09-06 10:28:58 UTC () [21977]: [2-1] user=,db= LOG: server process (PID 21985) was terminated by signal 11: Segmentation fault
2011-09-06 10:28:58 UTC () [21977]: [3-1] user=,db= LOG: terminating any other active server processes
2011-09-06 10:28:58 UTC ([local]) [21986]: [3-1] user=postgres,db=xxxxxxx WARNING: terminating connection because of crash of another server process
2011-09-06 10:28:58 UTC ([local]) [21986]: [4-1] user=postgres,db=xxxxxxx DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2011-09-06 10:28:58 UTC ([local]) [21986]: [5-1] user=postgres,db=xxxxxxx HINT: In a moment you should be able to reconnect to the database and repeat your command.

gdb backtrace is even less helpful:

$ gdb -q -c core* /opt/pgsql-9.0.5a-int/bin/postgres

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /usr/lib/libxml2.so.2...done.
Loaded symbols for /usr/lib/libxml2.so.2
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /usr/lib/libz.so.1...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /lib/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Core was generated by `postgres: postgres xxxxxxx [local] SELECT '.
Program terminated with signal 11, Segmentation fault.
[New process 21985]
#0 0x00007fe18c235e4b in memcpy () from /lib/libc.so.6
(gdb) bt
#0 0x00007fe18c235e4b in memcpy () from /lib/libc.so.6
#1 0x00007fe1897532e4 in ?? ()
#2 0x0000000000000000 in ?? ()
(gdb)

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-06 11:13:58
Message-ID: 20110906111358.GA24613@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Hi,

Worked a bit to get the ltree problem down to smallest possible, repeatable, situation.

Initial setup:

1. PostgreSQL 8.3.11, configured with:
./configure \
--prefix=/opt/pgsql-8.3.11-int \
--disable-rpath \
--without-perl \
--without-python \
--without-tcl \
--without-openssl \
--without-pam \
--without-krb5 \
--without-gssapi \
--enable-nls \
--enable-integer-datetimes \
--enable-thread-safety \
--with-libxml \
--with-libxslt \
--without-ldap
Built and installed with contrib modules.

2. PostgreSQL 9.0.5 (pre), from git checkout, head of 9.0 branch. Configured with:
./configure \
--prefix=/opt/pgsql-9.0.5a-int \
--enable-debug \
--enable-cassert \
--disable-rpath \
--without-perl \
--without-python \
--without-tcl \
--without-openssl \
--without-pam \
--without-krb5 \
--without-gssapi \
--enable-nls \
--enable-integer-datetimes \
--enable-thread-safety \
--with-libxml \
--with-libxslt \
--without-ldap

Now with these two in place, I can make the test:

=$ mkdir /var/tmp/test
=$ cd /var/tmp/test/
=$ export LD_LIBRARY_PATH=/opt/pgsql-8.3.11-int/lib/
=$ export PATH="/opt/pgsql-8.3.11-int/bin:$PATH"
=$ mkdir data-8.3
=$ initdb -D data-8.3
=$ pg_ctl -D data-8.3 -l logfile-8.3 start
=$ psql -d postgres -f /opt/pgsql-8.3.11-int/share/contrib/ltree.sql
=$ psql -d postgres -c "create table z (x ltree)"
=$ psql -d postgres -c "insert into z (x) values ('a.b')"
=$ pg_ctl -D data-8.3/ stop
=$ export LD_LIBRARY_PATH=/opt/pgsql-9.0.5a-int/lib/:/opt/pgsql-8.3.11-int/lib/
=$ export PATH=/opt/pgsql-9.0.5a-int/bin/:/opt/pgsql-8.3.11-int/bin/:~/bin:/usr/local/bin:/usr/bin:/bin
=$ mkdir data-9.0
=$ initdb -D data-9.0/
=$ perl -pi -e 's/#port = 5432/port=7654/' data-9.0/postgresql.conf
=$ pg_upgrade -v -b /opt/pgsql-8.3.11-int/bin/ -B /opt/pgsql-9.0.5a-int/bin/ -d $( pwd )/data-8.3/ -D $( pwd )/data-9.0 -k -l pg_upgrade.log -p 5432 -P 7654
=$ pg_ctl -D data-9.0 -l logfile-9.0 start
=$ psql -p 7654 -d postgres -c "select * from z"
WARNING: detected write past chunk end in ExecutorState 0xc500a0
WARNING: problem in alloc set ExecutorState: req size > alloc size for chunk 0xc500d8 in block 0xc4dfd0
WARNING: problem in alloc set ExecutorState: bad single-chunk 0xc500d8 in block 0xc4dfd0
WARNING: problem in alloc set ExecutorState: bogus aset link in block 0xc4dfd0, chunk 0xc500d8
WARNING: problem in alloc set ExecutorState: found inconsistent memory block 0xc4dfd0
connection to server was lost

Hope it helps.

Best regards,

depesz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: depesz(at)depesz(dot)com
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-06 20:36:35
Message-ID: 20829.1315341395@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

hubert depesz lubaczewski <depesz(at)depesz(dot)com> writes:
> Worked a bit to get the ltree problem down to smallest possible, repeatable, situation.

I looked at this again and verified that indeed, commit
8eee65c996048848c20f6637c1d12b319a4ce244 introduced an incompatible
change into the on-disk format of ltree columns: it widened
ltree_level.len, which is one component of an ltree on disk.
So the crash is hardly surprising. I think that the only thing
pg_upgrade could do about it is refuse to upgrade when ltree columns
are present in an 8.3 database. I'm not sure though how you'd identify
contrib/ltree versus some random user-defined type named ltree.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: depesz(at)depesz(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-07 01:21:02
Message-ID: 201109070121.p871L2h24928@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> hubert depesz lubaczewski <depesz(at)depesz(dot)com> writes:
> > Worked a bit to get the ltree problem down to smallest possible, repeatable, situation.
>
> I looked at this again and verified that indeed, commit
> 8eee65c996048848c20f6637c1d12b319a4ce244 introduced an incompatible
> change into the on-disk format of ltree columns: it widened
> ltree_level.len, which is one component of an ltree on disk.
> So the crash is hardly surprising. I think that the only thing
> pg_upgrade could do about it is refuse to upgrade when ltree columns
> are present in an 8.3 database. I'm not sure though how you'd identify
> contrib/ltree versus some random user-defined type named ltree.

It is actually easy to do using the attached patch. I check for the
functions that support the data type and check of they are from an
'ltree' shared object. I don't check actual user table type names in
this case.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
/rtmp/pg_upgrade text/x-diff 3.9 KB

From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-07 08:07:41
Message-ID: 20110907080741.GA9936@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

On Tue, Sep 06, 2011 at 09:21:02PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > hubert depesz lubaczewski <depesz(at)depesz(dot)com> writes:
> > > Worked a bit to get the ltree problem down to smallest possible, repeatable, situation.
> >
> > I looked at this again and verified that indeed, commit
> > 8eee65c996048848c20f6637c1d12b319a4ce244 introduced an incompatible
> > change into the on-disk format of ltree columns: it widened
> > ltree_level.len, which is one component of an ltree on disk.
> > So the crash is hardly surprising. I think that the only thing
> > pg_upgrade could do about it is refuse to upgrade when ltree columns
> > are present in an 8.3 database. I'm not sure though how you'd identify
> > contrib/ltree versus some random user-defined type named ltree.
>
> It is actually easy to do using the attached patch. I check for the
> functions that support the data type and check of they are from an
> 'ltree' shared object. I don't check actual user table type names in
> this case.

While it will prevent failures in future, it doesn't solve my problem
now :(

Will try to do it via:
- drop indexes on ltree
- convert ltree to text
- upgrade
- convert text to ltree
- create indexes on ltree

Best regards,

depesz


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, depesz(at)depesz(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-07 18:43:19
Message-ID: 201109071843.p87IhJY27277@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general pgsql-hackers

Bruce Momjian wrote:
> Tom Lane wrote:
> > hubert depesz lubaczewski <depesz(at)depesz(dot)com> writes:
> > > Worked a bit to get the ltree problem down to smallest possible, repeatable, situation.
> >
> > I looked at this again and verified that indeed, commit
> > 8eee65c996048848c20f6637c1d12b319a4ce244 introduced an incompatible
> > change into the on-disk format of ltree columns: it widened
> > ltree_level.len, which is one component of an ltree on disk.
> > So the crash is hardly surprising. I think that the only thing
> > pg_upgrade could do about it is refuse to upgrade when ltree columns
> > are present in an 8.3 database. I'm not sure though how you'd identify
> > contrib/ltree versus some random user-defined type named ltree.
>
> It is actually easy to do using the attached patch. I check for the
> functions that support the data type and check of they are from an
> 'ltree' shared object. I don't check actual user table type names in
> this case.

Attached patch applied to 9.0, 9.1, and HEAD. Doc changes included.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
/rtmp/pg_upgrade.9.0 text/x-diff 4.5 KB
/rtmp/pg_upgrade text/x-diff 4.5 KB