Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)

Lists: pgsql-bugspgsql-hackers
From: mfwilson(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-12 22:16:18
Message-ID: E1SpRgI-0007HO-UN@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

The following bug has been logged on the website:

Bug reference: 6733
Logged by: Mike Wilson
Email address: mfwilson(at)gmail(dot)com
PostgreSQL version: 9.1.4
Operating system: Solaris 10
Description:

Filing this under PG914 since there isn't an option for PG9.2beta2.
Possibly this should be filed elsewhere? If so apologies in advance.

We are starting to prepare for the PG9.2 release so I have been performing
some test pg_upgrades on our existing PG8.4.2 cluster. We have a sizable
database and limited space so we have been testing upgrading our cluster
using --link mode of pg_upgrade. After performing an upgrade though the new
PG92 database has only empty tables.

Running this:
export NEWPG=/usr/postgres/postgresql-9.2.0_slony-2.1.1
export OLDPG=/usr/postgres/pg842_slony210

LD_LIBRARY_PATH=${NEWPG}/lib PATH=${NEWPG}/bin:${PATH} /usr/bin/time
${NEWPG}/bin/pg_upgrade --verbose --link \
--old-datadir=/opt/postgres/db/root/old
--new-datadir=/opt/postgres/db/root/new --old-bindir=${OLDPG}/bin/64/ \
--new-bindir=${NEWPG}/bin/ --old-port=5432 --new-port=5920 --user=postgres

This runs without errors and I am able to start the new upgraded PG92beta2
cluster. The schemas appear to be correct as well as the table definitions
and other schema objects. The tables themselves though are all empty. Not
sure what is going on


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: mfwilson(at)gmail(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-12 22:44:06
Message-ID: 13290.1342133046@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

mfwilson(at)gmail(dot)com writes:
> This runs without errors and I am able to start the new upgraded PG92beta2
> cluster. The schemas appear to be correct as well as the table definitions
> and other schema objects. The tables themselves though are all empty.

Hmm, maybe it's confused about XID past/future? Could we see the output
of pg_controldata for both old and new clusters?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: mfwilson(at)gmail(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-12 22:48:26
Message-ID: 20120712224826.GA25074@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, Jul 12, 2012 at 10:16:18PM +0000, mfwilson(at)gmail(dot)com wrote:
> The following bug has been logged on the website:
>
> Bug reference: 6733
> Logged by: Mike Wilson
> Email address: mfwilson(at)gmail(dot)com
> PostgreSQL version: 9.1.4
> Operating system: Solaris 10
> Description:
>
> Filing this under PG914 since there isn't an option for PG9.2beta2.
> Possibly this should be filed elsewhere? If so apologies in advance.
>
> We are starting to prepare for the PG9.2 release so I have been performing
> some test pg_upgrades on our existing PG8.4.2 cluster. We have a sizable
> database and limited space so we have been testing upgrading our cluster
> using --link mode of pg_upgrade. After performing an upgrade though the new
> PG92 database has only empty tables.
>
> Running this:
> export NEWPG=/usr/postgres/postgresql-9.2.0_slony-2.1.1
> export OLDPG=/usr/postgres/pg842_slony210
>
> LD_LIBRARY_PATH=${NEWPG}/lib PATH=${NEWPG}/bin:${PATH} /usr/bin/time
> ${NEWPG}/bin/pg_upgrade --verbose --link \
> --old-datadir=/opt/postgres/db/root/old
> --new-datadir=/opt/postgres/db/root/new --old-bindir=${OLDPG}/bin/64/ \
> --new-bindir=${NEWPG}/bin/ --old-port=5432 --new-port=5920 --user=postgres
>
> This runs without errors and I am able to start the new upgraded PG92beta2
> cluster. The schemas appear to be correct as well as the table definitions
> and other schema objects. The tables themselves though are all empty. Not
> sure what is going on

OK, I just tested an upgrade from 9.1.4 to 9.2 HEAD and it worked just
fine using the regression database as content. The big question is what
is different about your setup. Can you check the data files in
/data/base/db_oid and see if they are all zero length in the new
cluster? A simple 'ls -l' should show it:

$ pwd
/u/pg/data/base/16413
$ ls -l
-rw------- 1 postgres postgres 122880 Jul 12 18:39 11744
-rw------- 1 postgres postgres 24576 Jul 12 18:39 11744_fsm
-rw------- 1 postgres postgres 8192 Jul 12 18:39 11744_vm
-rw------- 1 postgres postgres 16384 Jul 12 18:39 11746
-rw------- 1 postgres postgres 24576 Jul 12 18:39 11746_fsm
-----

These should _not_ be all zeros. Please let me know what you find.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: mfwilson(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-12 22:49:29
Message-ID: 20120712224929.GB25074@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, Jul 12, 2012 at 06:44:06PM -0400, Tom Lane wrote:
> mfwilson(at)gmail(dot)com writes:
> > This runs without errors and I am able to start the new upgraded PG92beta2
> > cluster. The schemas appear to be correct as well as the table definitions
> > and other schema objects. The tables themselves though are all empty.
>
> Hmm, maybe it's confused about XID past/future? Could we see the output
> of pg_controldata for both old and new clusters?

Uh, I thought we only changfed the xlog stuff in 9.3, not 9.2, so I am
confused what would have changed in that area.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Mike Wilson <mfwilson(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-13 00:21:31
Message-ID: 517A874F-B310-4340-803D-EC8625C8B497@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

This can be closed. I figured out what I was doing wrong, which was after the conversion I was cleaning up the old datadir by deleting it, which destroyed the hard links to the data since I am using pg_upgrade --link

Mike Wilson

On Jul 12, 2012, at 3:49 PM, Bruce Momjian wrote:

> On Thu, Jul 12, 2012 at 06:44:06PM -0400, Tom Lane wrote:
>> mfwilson(at)gmail(dot)com writes:
>>> This runs without errors and I am able to start the new upgraded PG92beta2
>>> cluster. The schemas appear to be correct as well as the table definitions
>>> and other schema objects. The tables themselves though are all empty.
>>
>> Hmm, maybe it's confused about XID past/future? Could we see the output
>> of pg_controldata for both old and new clusters?
>
> Uh, I thought we only changfed the xlog stuff in 9.3, not 9.2, so I am
> confused what would have changed in that area.
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Mike Wilson <mfwilson(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-13 01:52:41
Message-ID: 20120713015241.GA15443@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, Jul 12, 2012 at 05:21:31PM -0700, Mike Wilson wrote:
> This can be closed. I figured out what I was doing wrong, which was
> after the conversion I was cleaning up the old datadir by deleting it,
> which destroyed the hard links to the data since I am using pg_upgrade
> --link

Uh, actually, a hard link has two directory entries pointing to the same
file, so you can delete the old datadir and the new datadir should not
be affected.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Mike Wilson <mfwilson(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-15 21:15:35
Message-ID: 8D0918C1-39A0-429A-96DB-997BE362141A@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I've had some time to examine this closer over the weekend. It appears that pg_upgrade for 9.2b2 segfaults which more than likely has something to do with the resulting converted database appearing to have no rows. Earlier in this thread I reported that I was able to get the upgrade to work and this thread to be closed but I was in error. At the time I was also testing with the 9.1.4 pg_upgrade which does work and I thought that I had a successful 9.2b2 pg_upgrade run. Apologies for the confusion and let me know if you would like me to start a new thread.

<pg_upgrade 9.2b2>
...
pg_toast.pg_toast_948075_index: 948081 to 948081
c0.page_metadata_values_pkey: 948082 to 948082
c0.i_page_metadata_values_short_name: 948084 to 948084

Segmentation Fault (core dumped)
root(at)db4 /
</>
My upgrade procedure is scripted and I hadn't noticed the core dump when I first reported the bug. Here are the parameters of the run:
su - postgres -c "pg_upgrade --verbose --link \
--old-datadir=/opt/postgres/db/root/old --new-datadir=/opt/postgres/db/root/new --old-bindir=${OLDPG}/bin/64/ \
--new-bindir=${NEWPG}/bin/ --old-port=5432 --new-port=5920 --user=postgres"

As a test I have also been using the pg_upgrade from 9.1.4 which does work:
<pg_upgrade 9.1.4>

relname: pg_toast.pg_toast_948075: reloid: 948079 reltblspace:
relname: pg_toast.pg_toast_948075_index: reloid: 948081 reltblspace:
relname: c0.page_metadata_values_pkey: reloid: 948082 reltblspace:
relname: c0.i_page_metadata_values_short_name: reloid: 948084 reltblspace:

Database: postgres
relname: pg_catalog.pg_largeobject: reloid: 2613 reltblspace:
relname: pg_catalog.pg_largeobject_loid_pn_index: reloid: 2683 reltblspace:

Database: template1
relname: pg_catalog.pg_largeobject: reloid: 2613 reltblspace:
relname: pg_catalog.pg_largeobject_loid_pn_index: reloid: 2683 reltblspace:

executing: SELECT spclocation FROM pg_catalog.pg_tablespace WHERE spcname != 'pg_default' AND spcname != 'pg_global'

</>

I've also tried a step-wise migration by first converting to PG914 and then to PG92b2. This also fails with a similar segfault after the c0.i_page_metadata_values_short_name index.

Of possible note in this DB is that the previous DBA renamed the "postgres" user. As part of this conversion process I am renaming it back to it's default. I'm doing this before running pg_upgrade:
# shift jibjab su (postgres) account back to postgres rolname
su - postgres -c "psql -U jibjab c0 -c \"update pg_authid set rolname='postgres' where oid=10;\""

This probably isn't an issue as the 9.1.4 conversion works but I thought I should at least mention it. Actually I don't think pg_upgrade will run correctly if there isn't a postgres user so I imagine I need to correct this issue before running the upgrade procedure anyway.

For now I am stymied in my attempt to upgrade and may have to look at trying to get the non-link version of the upgrade working. That would be relatively painful though as this upgrade will be for a commercial internet site that can't easily tolerate a long down and the production DB is over a TB in size. I am really looking forward to 9.2's index only scans due to the size of the DB!

Cheers and thanks for any information you have on the issue.

Mike Wilson
mfwilson(at)gmail(dot)com

On Jul 12, 2012, at 6:52 PM, Bruce Momjian wrote:

> On Thu, Jul 12, 2012 at 05:21:31PM -0700, Mike Wilson wrote:
>> This can be closed. I figured out what I was doing wrong, which was
>> after the conversion I was cleaning up the old datadir by deleting it,
>> which destroyed the hard links to the data since I am using pg_upgrade
>> --link
>
> Uh, actually, a hard link has two directory entries pointing to the same
> file, so you can delete the old datadir and the new datadir should not
> be affected.
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mike Wilson <mfwilson(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-15 21:45:28
Message-ID: 23299.1342388728@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Mike Wilson <mfwilson(at)gmail(dot)com> writes:
> I've had some time to examine this closer over the weekend. It
> appears that pg_upgrade for 9.2b2 segfaults which more than likely has
> something to do with the resulting converted database appearing to
> have no rows.

Yeah, more than likely :-(. Could we see a stack trace from the
segfault?

> Of possible note in this DB is that the previous DBA renamed the
> "postgres" user.

Hmm. There is a known bug in beta2 that's triggered by old and new
clusters not having the same name for the bootstrap superuser; although
I don't recall that the symptoms included a segfault. In any case,
I'd suggest making sure the new cluster is initdb'd under the same
account that currently owns the old cluster.

regards, tom lane


From: Mike Wilson <mfwilson(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-16 17:58:11
Message-ID: 4CF24053-1540-4843-AC50-45F37B9FC536@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

core stack:
root(at)db4 / $ pstack ~postgres/core
core '/opt/postgres/core' of 19868: pg_upgrade --verbose --link --old-datadir=/opt/postgres/db/root/old --
fffffd7ffeda1148 memcpy () + 6b8
000000000040b8b6 transfer_single_new_db () + fa
000000000040b6ea transfer_all_new_dbs () + 116
000000000040ae62 main () + 106
000000000040580c ???????? ()

As to the ownership, the bash script I am testing 9.1.4 and 9.2.0 with recursively chowns the directory that owns the old and the new PGDATA directory before running pg_upgrade.

Mike Wilson
mfwilson(at)gmail(dot)com

On Jul 15, 2012, at 2:45 PM, Tom Lane wrote:

> Mike Wilson <mfwilson(at)gmail(dot)com> writes:
>> I've had some time to examine this closer over the weekend. It
>> appears that pg_upgrade for 9.2b2 segfaults which more than likely has
>> something to do with the resulting converted database appearing to
>> have no rows.
>
> Yeah, more than likely :-(. Could we see a stack trace from the
> segfault?
>
>> Of possible note in this DB is that the previous DBA renamed the
>> "postgres" user.
>
> Hmm. There is a known bug in beta2 that's triggered by old and new
> clusters not having the same name for the bootstrap superuser; although
> I don't recall that the symptoms included a segfault. In any case,
> I'd suggest making sure the new cluster is initdb'd under the same
> account that currently owns the old cluster.
>
> regards, tom lane


From: Mike Wilson <mfwilson(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-16 19:56:25
Message-ID: 8A0530E7-4D74-41BD-BD48-A30E9B31D172@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Just for my interest I also created a new PG 842 db (initdb) and put some sample data it it and successfully did a pg_upgrade from 842 -> 920b2. Whatever the issue is it seems to be related to my actual database as a sample db conversion works. Again, 842-914 conversion works for my db though so possibly it was some change that happened to pg_upgrade since 914 was released that is having poor interaction with my actual db cluster.

Also, I wanted to make sure I wasn't using some git checkout of 920 I downloaded the postgresql-9.2.beta2.tgz from the PostgreSQL site dir-browser and re-compiled. Still have the problem unfortunately.

Cheers.

Mike Wilson
mfwilson(at)gmail(dot)com

On Jul 15, 2012, at 2:45 PM, Tom Lane wrote:

> Mike Wilson <mfwilson(at)gmail(dot)com> writes:
>> I've had some time to examine this closer over the weekend. It
>> appears that pg_upgrade for 9.2b2 segfaults which more than likely has
>> something to do with the resulting converted database appearing to
>> have no rows.
>
> Yeah, more than likely :-(. Could we see a stack trace from the
> segfault?
>
>> Of possible note in this DB is that the previous DBA renamed the
>> "postgres" user.
>
> Hmm. There is a known bug in beta2 that's triggered by old and new
> clusters not having the same name for the bootstrap superuser; although
> I don't recall that the symptoms included a segfault. In any case,
> I'd suggest making sure the new cluster is initdb'd under the same
> account that currently owns the old cluster.
>
> regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mike Wilson <mfwilson(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-17 02:15:35
Message-ID: 11927.1342491335@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Mike Wilson <mfwilson(at)gmail(dot)com> writes:
> Just for my interest I also created a new PG 842 db (initdb) and put
> some sample data it it and successfully did a pg_upgrade from 842 ->
> 920b2. Whatever the issue is it seems to be related to my actual
> database as a sample db conversion works.

Yeah, I was suspecting that, because there's no obvious problem in the
part of pg_upgrade that your stack trace fingers.

What we need to do next is get something that one of the PG developers
can put under a microscope. What I would suggest first is doing a
"pg_dumpall -s" (ie no data) of your database, and seeing whether
loading that into 8.4 produces a database on which pg_upgrade fails.
If so, then you could send us that dump (perhaps after some sanitizing
of names) and we could have at it.

If you aren't able to create a self-contained reproducible case in this
way, the only other way forward is for you to debug it yourself or let
one of the PG developers have access to your system to poke at it.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Mike Wilson <mfwilson(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-17 20:59:49
Message-ID: 20120717205949.GA8146@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Sun, Jul 15, 2012 at 02:15:35PM -0700, Mike Wilson wrote:
> I've had some time to examine this closer over the weekend. It appears that
> pg_upgrade for 9.2b2 segfaults which more than likely has something to do with
> the resulting converted database appearing to have no rows. Earlier in this
> thread I reported that I was able to get the upgrade to work and this thread to
> be closed but I was in error. At the time I was also testing with the 9.1.4
> pg_upgrade which does work and I thought that I had a successful 9.2b2
> pg_upgrade run. Apologies for the confusion and let me know if you would like
> me to start a new thread.
>
> <pg_upgrade 9.2b2>
> ...
> pg_toast.pg_toast_948075_index: 948081 to 948081
> c0.page_metadata_values_pkey: 948082 to 948082
> c0.i_page_metadata_values_short_name: 948084 to 948084
>
>
> Segmentation Fault (core dumped)

What is the actual program output that appears before the core dump
message?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Mike Wilson <mfwilson(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-17 23:39:15
Message-ID: 20120717233915.GC8146@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Jul 17, 2012 at 04:01:08PM -0700, Mike Wilson wrote:
> Please find below the full pg_upgrade output. Let me know if there are any other questions that I may have missed. Note, the attached pg_upgrade stdout log is for the schema only pg_dumpall upgrade attempt from PG842 -> PG920b2.
>

> -rw-r--r-- mwilson/staff 314381 2012-07-17 18:57 pg842_pg_upgrade.log

FYI, I was supplied with an anonymous SQL dump of the database and was
unable to reproduce the failure on Debian.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Mike Wilson <mfwilson(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-18 00:41:05
Message-ID: A36767EF-DE6B-45A4-B187-F99C46FF5551@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Given the stack trace previously supplied and that the pg_upgrade from 9.1.4 does actually complete successfully can you recommend how I might continue to diagnose the issue myself? Since the 9.1.4 version of pg_upgrade did work I've started to look at the source for pg_upgrade in an attempt to see if the 9.1.4 version of the tool could be used instead of the 9.2.0 version. Due to the # of differences in the pg_upgrade source though a port seems unlikely.

Can I get the git version string of the PG source you used for the version of the source that you tested with? I'd like to make sure that I'm using the same version as your successful test. Thanks.

Mike Wilson
mfwilson(at)gmail(dot)com

On Jul 17, 2012, at 4:39 PM, Bruce Momjian wrote:

> On Tue, Jul 17, 2012 at 04:01:08PM -0700, Mike Wilson wrote:
>> Please find below the full pg_upgrade output. Let me know if there are any other questions that I may have missed. Note, the attached pg_upgrade stdout log is for the schema only pg_dumpall upgrade attempt from PG842 -> PG920b2.
>>
>
>> -rw-r--r-- mwilson/staff 314381 2012-07-17 18:57 pg842_pg_upgrade.log
>
> FYI, I was supplied with an anonymous SQL dump of the database and was
> unable to reproduce the failure on Debian.
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Mike Wilson <mfwilson(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-18 04:18:30
Message-ID: 20120718041830.GB29910@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Jul 17, 2012 at 05:41:05PM -0700, Mike Wilson wrote:
> Given the stack trace previously supplied and that the pg_upgrade from 9.1.4
> does actually complete successfully can you recommend how I might continue to
> diagnose the issue myself? Since the 9.1.4 version of pg_upgrade did work I've
> started to look at the source for pg_upgrade in an attempt to see if the 9.1.4
> version of the tool could be used instead of the 9.2.0 version. Due to the # of
> differences in the pg_upgrade source though a port seems unlikely.
>
> Can I get the git version string of the PG source you used for the version of
> the source that you tested with? I'd like to make sure that I'm using the same
> version as your successful test. Thanks.

I am using git head for testing. Tom sees a few things odd in
load_directory() that might be causing some problems on Solaris, and
this is new code for 9.2 for Solaris, so that might explain it. I think
we need Tom to finish and then if you can grab our git source and test
that, it would be great!

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Mike Wilson <mfwilson(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-18 04:31:53
Message-ID: 15721.1342585913@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> I am using git head for testing. Tom sees a few things odd in
> load_directory() that might be causing some problems on Solaris, and
> this is new code for 9.2 for Solaris, so that might explain it. I think
> we need Tom to finish and then if you can grab our git source and test
> that, it would be great!

The only thing I see that looks likely to represent a platform-specific
issue is the entrysize calculation. Mike, just out of curiosity, could
you see if the attached patch makes things better for you?

regards, tom lane

Attachment Content-Type Size
entrysize-fix.patch text/x-patch 691 bytes

From: Mike Wilson <mfwilson(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-18 22:39:31
Message-ID: 64AEA226-6CF1-4B5E-B78F-062BEB4CCD3B@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom, after patching pg_upgrade now runs successfully. I noticed that this patch had been applied since yesterday to the REL9_2_STABLE so I also tested with a git pull without the patch that appears to work also. I think issue has been resolved for me, thanks so much! You guys rock!

Mike Wilson
mfwilson(at)gmail(dot)com

On Jul 17, 2012, at 9:31 PM, Tom Lane wrote:

> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> I am using git head for testing. Tom sees a few things odd in
>> load_directory() that might be causing some problems on Solaris, and
>> this is new code for 9.2 for Solaris, so that might explain it. I think
>> we need Tom to finish and then if you can grab our git source and test
>> that, it would be great!
>
> The only thing I see that looks likely to represent a platform-specific
> issue is the entrysize calculation. Mike, just out of curiosity, could
> you see if the attached patch makes things better for you?
>
> regards, tom lane
>
> diff --git a/contrib/pg_upgrade/file.c b/contrib/pg_upgrade/file.c
> index 1dd3722142c9e83c1ec228099c3a3fd302a2179b..c886a67df43792a1692eec6b3b90238413e9f844 100644
> *** a/contrib/pg_upgrade/file.c
> --- b/contrib/pg_upgrade/file.c
> *************** load_directory(const char *dirname, stru
> *** 259,265 ****
> return -1;
> }
>
> ! entrysize = sizeof(struct dirent) - sizeof(direntry->d_name) +
> strlen(direntry->d_name) + 1;
>
> (*namelist)[name_num] = (struct dirent *) malloc(entrysize);
> --- 259,265 ----
> return -1;
> }
>
> ! entrysize = offsetof(struct dirent, d_name) +
> strlen(direntry->d_name) + 1;
>
> (*namelist)[name_num] = (struct dirent *) malloc(entrysize);


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mike Wilson <mfwilson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-25 17:01:13
Message-ID: 20120725170113.GA22286@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, Jul 19, 2012 at 02:24:36AM -0400, Bruce Momjian wrote:
> On Wed, Jul 18, 2012 at 01:30:34AM -0400, Tom Lane wrote:
> > I wrote:
> > > ... I'm wondering if maybe
> > > Solaris has a weird definition of struct dirent that breaks the
> > > calculation used here:
> >
> > > entrysize = sizeof(struct dirent) - sizeof(direntry->d_name) +
> > > strlen(direntry->d_name) + 1;
> >
> > Some googling suggests that on Solaris, this calculation would actually
> > give a result that's a little too *large* not too small, because of
> > padding in the declaration of struct dirent. That would not be a
> > problem so far as callers were concerned, but it's conceivable that the
> > memcpy in load_directory would run off the end of memory and crash.
> > Could that explain Mike's problem? You'd have to assume that the stack
> > trace he showed us had omitted to mention load_directory, but I don't
> > have a huge problem with making such an assumption. Anyway the offsetof
> > formulation should avoid this hazard, so I'm still interested to know
> > if that helps.
>
> Great to hear this fixed the problem. The only way I can see this
> causing a crash is:
>
> * the padding of direntry->d_name is less than the padding of
> struct dirent
> * there is a non-mapped memory range after the struct
> * the file name is near the full length of d_name
>
> Those are pretty rare odds. The only other issue is that this code
> assumes direntry->d_name is the last element of the struct, and I am not
> sure how we can assume that is true.
>
> Also, this code tries to be tricky by reducing the allocated memory
> instead of allocating the maximum path. Shouldn't there a comment
> about the propose of this code?

This bug report was fixed mostly via private email because a private
schema dump was sent to Tom and me. Tom fixed the problem by changing
the way we interact with struct dirent, but neither of us is sure
exactly why the fix worked. It is something about Solaris.

Tom suggested that load_directory() return a (char *) array, rather than
a struct dirent array, greatly simplifying the code.

I have done this in the attached patch, and because of the uncertainty
of ths fix, I would like to apply this to 9.2 as well.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pg_upgrade.diff text/x-diff 6.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Mike Wilson <mfwilson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-25 20:09:42
Message-ID: 2736.1343246982@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Thu, Jul 19, 2012 at 02:24:36AM -0400, Bruce Momjian wrote:
>> On Wed, Jul 18, 2012 at 01:30:34AM -0400, Tom Lane wrote:
>>> Some googling suggests that on Solaris, this calculation would actually
>>> give a result that's a little too *large* not too small, because of
>>> padding in the declaration of struct dirent. That would not be a
>>> problem so far as callers were concerned, but it's conceivable that the
>>> memcpy in load_directory would run off the end of memory and crash.

> This bug report was fixed mostly via private email because a private
> schema dump was sent to Tom and me. Tom fixed the problem by changing
> the way we interact with struct dirent, but neither of us is sure
> exactly why the fix worked. It is something about Solaris.

I'm quite sure that the explanation is as above, that is, we were
computing an incorrectly large value for the length of the directory
entry, and that ran off the end of a memory-mapped directory page.
The actual length of the entry, for a five-character file name,
would be offsetof(d_name) plus 6 bytes, which assuming the entry
requires 4-byte alignment would pad to offsetof(d_name) plus 8;
but we were computing offsetof(d_name) plus 9 bytes, just enough
to try to fetch a byte that wasn't there.

> Tom suggested that load_directory() return a (char *) array, rather than
> a struct dirent array, greatly simplifying the code.
> I have done this in the attached patch, and because of the uncertainty
> of ths fix, I would like to apply this to 9.2 as well.

I think patching 9.2 is reasonable, but on the grounds of keeping it
parallel to HEAD, not that we don't know why the previous code was
broken.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Mike Wilson <mfwilson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-25 20:30:10
Message-ID: 3109.1343248210@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom suggested that load_directory() return a (char *) array, rather than
> a struct dirent array, greatly simplifying the code.
> I have done this in the attached patch, and because of the uncertainty
> of ths fix, I would like to apply this to 9.2 as well.

BTW, the patch neglects to update the header comment for load_directory.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mike Wilson <mfwilson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-25 20:38:16
Message-ID: 20120725203816.GA21271@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, Jul 25, 2012 at 04:09:42PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Thu, Jul 19, 2012 at 02:24:36AM -0400, Bruce Momjian wrote:
> >> On Wed, Jul 18, 2012 at 01:30:34AM -0400, Tom Lane wrote:
> >>> Some googling suggests that on Solaris, this calculation would actually
> >>> give a result that's a little too *large* not too small, because of
> >>> padding in the declaration of struct dirent. That would not be a
> >>> problem so far as callers were concerned, but it's conceivable that the
> >>> memcpy in load_directory would run off the end of memory and crash.
>
> > This bug report was fixed mostly via private email because a private
> > schema dump was sent to Tom and me. Tom fixed the problem by changing
> > the way we interact with struct dirent, but neither of us is sure
> > exactly why the fix worked. It is something about Solaris.
>
> I'm quite sure that the explanation is as above, that is, we were
> computing an incorrectly large value for the length of the directory
> entry, and that ran off the end of a memory-mapped directory page.
> The actual length of the entry, for a five-character file name,
> would be offsetof(d_name) plus 6 bytes, which assuming the entry
> requires 4-byte alignment would pad to offsetof(d_name) plus 8;
> but we were computing offsetof(d_name) plus 9 bytes, just enough
> to try to fetch a byte that wasn't there.

OK.

> > Tom suggested that load_directory() return a (char *) array, rather than
> > a struct dirent array, greatly simplifying the code.
> > I have done this in the attached patch, and because of the uncertainty
> > of ths fix, I would like to apply this to 9.2 as well.
>
> I think patching 9.2 is reasonable, but on the grounds of keeping it
> parallel to HEAD, not that we don't know why the previous code was
> broken.

Agreed.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mike Wilson <mfwilson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-25 20:40:14
Message-ID: 20120725204014.GB21271@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, Jul 25, 2012 at 04:30:10PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Tom suggested that load_directory() return a (char *) array, rather than
> > a struct dirent array, greatly simplifying the code.
> > I have done this in the attached patch, and because of the uncertainty
> > of ths fix, I would like to apply this to 9.2 as well.
>
> BTW, the patch neglects to update the header comment for load_directory.

Oh, good point. Updated patch attached.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pg_upgrade.diff text/x-diff 6.6 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mike Wilson <mfwilson(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [BUGS] BUG #6733: All Tables Empty After pg_upgrade (PG 9.2.0 beta 2)
Date: 2012-07-26 10:22:19
Message-ID: 20120726102219.GD21271@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, Jul 25, 2012 at 04:40:14PM -0400, Bruce Momjian wrote:
> On Wed, Jul 25, 2012 at 04:30:10PM -0400, Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > Tom suggested that load_directory() return a (char *) array, rather than
> > > a struct dirent array, greatly simplifying the code.
> > > I have done this in the attached patch, and because of the uncertainty
> > > of ths fix, I would like to apply this to 9.2 as well.
> >
> > BTW, the patch neglects to update the header comment for load_directory.
>
> Oh, good point. Updated patch attached.

Applied and back-patched to 9.2.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +