Re: BUG #5288: Restoring a 7.4.5 -Fc dump using -j 2 segfaults (patch included)

Lists: pgsql-bugs
From: "Jon Erdman (aka StuckMojo)" <postgresql(at)thewickedtribe(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5288: Restoring a 7.4.5 -Fc dump using -j 2 segfaults (patch included)
Date: 2010-01-19 05:09:00
Message-ID: 201001190509.o0J5908J065090@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5288
Logged by: Jon Erdman (aka StuckMojo)
Email address: postgresql(at)thewickedtribe(dot)net
PostgreSQL version: 8.5devel, 8.4
Operating system: Debian Sid
Description: Restoring a 7.4.5 -Fc dump using -j 2 segfaults (patch
included)
Details:

So, I still run 7.4.5 for my medical billing app, and in playing around with
8.5 at AustinPUG last week I discovered that if I try to restore one of my
backups from 7.4 (produced with 7.4 pg_dump) into 8.5devel using 8.5
pg_restore and -j 2, it immediately segfaults. 8.4 does as well.

I built 8.5 with debug to get a bt and investigate what's going on, and I
found that it's a dependency in the archive TOC that is much higher than the
highest dump id in the TOC. This doesn't seem all that odd, considering the
comment right above the offending block says there can be deps to things
that aren't in the archive. This causes the code to index way off the end of
the array of TOC entries by dumpId.

After confirming that this dependency id is in the dumped TOC, I created a
patch to check for dumpIds greater than maxDumpId against today's 8.5devel,
which once applied causes the restore to proceed cleanly (and with 2 jobs).
This would be my first code contribution to PG, so I hope it gets accepted!
:) Oh, and in 8.4 the modified line number would be 3681.

Here is a gdb session showing what I found, and below that is my proposed
patch:

[jon(at)stuck daily]$ gdb --args /usr/local/stow/pgsql-8.5-dev/bin/pg_restore
-p 5435 -d debug -j 2 hef.20100113-000001.dmp
GNU gdb (GDB) 7.0-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/stow/pgsql-8.5-dev/bin/pg_restore...done.
(gdb) run
Starting program: /usr/local/stow/pgsql-8.5-dev/bin/pg_restore -p 5435 -d
debug -j 2 hef.20100113-000001.dmp
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x08051865 in fix_dependencies (AH=0x8063190) at pg_backup_archiver.c:3736
3736 if (tocsByDumpId[te->dependencies[i] - 1] ==
NULL)
(gdb) bt
#0 0x08051865 in fix_dependencies (AH=0x8063190) at
pg_backup_archiver.c:3736
#1 0x08050aec in restore_toc_entries_parallel (AH=0x8063190)
at pg_backup_archiver.c:3097
#2 0x0804b4b2 in RestoreArchive (AHX=0x8063190, ropt=0x80630b8)
at pg_backup_archiver.c:366
#3 0x0804ac76 in main (argc=8, argv=0xbffff874) at pg_restore.c:380
(gdb) list
3731 */
3732 for (te = AH->toc->next; te != AH->toc; te = te->next)
3733 {
3734 for (i = 0; i < te->nDeps; i++)
3735 {
3736 if (tocsByDumpId[te->dependencies[i] - 1] ==
NULL)
3737 te->depCount--;
3738 }
3739 }
3740
(gdb) print *te
$1 = {prev = 0x806a028, next = 0x806a308, catalogId = {tableoid = 0, oid =
1001953},
dumpId = 311, section = SECTION_PRE_DATA, hadDumper = 0 '\000',
tag = 0x806a238 "plpgsql", namespace = 0x806a2d8 "public", tablespace =
0x0,
owner = 0x806a2e8 "", withOids = 1 '\001', desc = 0x806a248 "PROCEDURAL
LANGUAGE",
defn = 0x806a260 "CREATE TRUSTED PROCEDURAL LANGUAGE plpgsql HANDLER
plpgsql_call_handler;\n", dropStmt = 0x806a2b0 "DROP PROCEDURAL LANGUAGE
plpgsql;\n", copyStmt = 0x0,
dependencies = 0x806a2f8, nDeps = 1, dataDumper = 0, dataDumperArg = 0x0,
formatData = 0x806a490, par_prev = 0x0, par_next = 0x0, created = 0
'\000',
depCount = 1, lockDeps = 0x0, nLockDeps = 0}
(gdb) print *te->prev
$2 = {prev = 0x8069a10, next = 0x806a1c8, catalogId = {tableoid = 0, oid =
1001952},
dumpId = 312, section = SECTION_PRE_DATA, hadDumper = 0 '\000',
tag = 0x806a0a8 "plpgsql_call_handler()", namespace = 0x806a098 "public",
tablespace = 0x0, owner = 0x806a1a8 "postgres", withOids = 1 '\001',
desc = 0x806a0c8 "FUNC PROCEDURAL LANGUAGE",
defn = 0x806a0e8 "CREATE FUNCTION plpgsql_call_handler() RETURNS
language_handler\n AS '$libdir/plpgsql', 'plpgsql_call_handler'\n
LANGUAGE c;\n",
dropStmt = 0x806a170 "DROP FUNCTION public.plpgsql_call_handler();\n",
copyStmt = 0x0, dependencies = 0x0, nDeps = 0, dataDumper = 0,
dataDumperArg = 0x0,
formatData = 0x806a1b8, par_prev = 0x0, par_next = 0x0, created = 0
'\000',
depCount = 0, lockDeps = 0x0, nLockDeps = 0}
(gdb) print te->dependencies[i]
$3 = 1001952
(gdb) print AH->maxDumpId
$4 = 583
(gdb) print *tocsByDumpId[582]
$5 = {prev = 0x80c13f0, next = 0x80c1b40, catalogId = {tableoid = 0, oid =
1813687},
dumpId = 583, section = SECTION_POST_DATA, hadDumper = 0 '\000',
tag = 0x80c17c0 "providernumber_vw_upd", namespace = 0x80c1b10 "public",
tablespace = 0x0, owner = 0x80c1b20 "postgres", withOids = 1 '\001',
desc = 0x80c17b0 "RULE",
defn = 0x80c17e0 "CREATE RULE providernumber_vw_upd AS ON UPDATE TO
providernumber_vw DO INSTEAD UPDATE tblprovidernumber SET idoctorid =
new.idoctorid, gofficeid = new.gofficeid, iuserid = new.iuserid,
iinsurancecoid "..., dropStmt = 0x80c1b00 "",
copyStmt = 0x0, dependencies = 0x0, nDeps = 0, dataDumper = 0,
dataDumperArg = 0x0,
formatData = 0x80c1b30, par_prev = 0x0, par_next = 0x0, created = 0
'\000',
depCount = 0, lockDeps = 0x0, nLockDeps = 0}
(gdb) print *tocsByDumpId[583]
Cannot access memory at address 0x1569
(gdb)

And my patch:

diff --git a/src/bin/pg_dump/pg_backup_archiver.c
b/src/bin/pg_dump/pg_backup_archiver.c
index dda13ce..1b688ef 100644
*** a/src/bin/pg_dump/pg_backup_archiver.c
--- b/src/bin/pg_dump/pg_backup_archiver.c
*************** fix_dependencies(ArchiveHandle *AH)
*** 3733,3740 ****
{
for (i = 0; i < te->nDeps; i++)
{
! if (tocsByDumpId[te->dependencies[i] - 1] == NULL)
te->depCount--;
}
}

--- 3733,3743 ----
{
for (i = 0; i < te->nDeps; i++)
{
! if (te->dependencies[i] > AH->maxDumpId ||
! tocsByDumpId[te->dependencies[i] - 1] == NULL)
! {
te->depCount--;
+ }
}
}


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jon Erdman (aka StuckMojo)" <postgresql(at)thewickedtribe(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5288: Restoring a 7.4.5 -Fc dump using -j 2 segfaults (patch included)
Date: 2010-01-19 17:15:15
Message-ID: 12493.1263921315@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Jon Erdman (aka StuckMojo)" <postgresql(at)thewickedtribe(dot)net> writes:
> So, I still run 7.4.5 for my medical billing app, and in playing around with
> 8.5 at AustinPUG last week I discovered that if I try to restore one of my
> backups from 7.4 (produced with 7.4 pg_dump) into 8.5devel using 8.5
> pg_restore and -j 2, it immediately segfaults. 8.4 does as well.

Ouch.

> I built 8.5 with debug to get a bt and investigate what's going on, and I
> found that it's a dependency in the archive TOC that is much higher than the
> highest dump id in the TOC. This doesn't seem all that odd, considering the
> comment right above the offending block says there can be deps to things
> that aren't in the archive. This causes the code to index way off the end of
> the array of TOC entries by dumpId.

No, the problem is that back then the dependencies meant something
completely different; they were object OIDs not dump ID numbers.

I think what we have to do is forbid -j restores from archives older
than archive version 1.8 (ie, pg_dump 8.0). It doesn't seem worth it
to try to support parallel restore from nearly-obsolete versions, and
I suspect that we couldn't do it even if we tried --- the reason the
representation got changed is that the old way simply didn't work for
any significant use of the dependency info. Just ignoring the
dependencies, as your patch effectively proposes, is going to lead to
restore failures or worse.

It seems like a good idea to put a range check into that loop as well,
but I think it should throw error not silently ignore bad data ...

Thanks for the report!

regards, tom lane


From: Jon Erdman <postgresql(at)thewickedtribe(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5288: Restoring a 7.4.5 -Fc dump using -j 2 segfaults (patch included)
Date: 2010-01-19 18:01:40
Message-ID: 4B55F384.4000109@thewickedtribe.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tom Lane wrote:
> "Jon Erdman (aka StuckMojo)" <postgresql(at)thewickedtribe(dot)net> writes:
>> So, I still run 7.4.5 for my medical billing app, and in playing around with
>> 8.5 at AustinPUG last week I discovered that if I try to restore one of my
>> backups from 7.4 (produced with 7.4 pg_dump) into 8.5devel using 8.5
>> pg_restore and -j 2, it immediately segfaults. 8.4 does as well.
>
> Ouch.
>
>> I built 8.5 with debug to get a bt and investigate what's going on, and I
>> found that it's a dependency in the archive TOC that is much higher than the
>> highest dump id in the TOC. This doesn't seem all that odd, considering the
>> comment right above the offending block says there can be deps to things
>> that aren't in the archive. This causes the code to index way off the end of
>> the array of TOC entries by dumpId.
>
> No, the problem is that back then the dependencies meant something
> completely different; they were object OIDs not dump ID numbers.

That makes much more sense, I had a feeling it was something like that
based on how far outside the range that particular value was.

> I think what we have to do is forbid -j restores from archives older
> than archive version 1.8 (ie, pg_dump 8.0). It doesn't seem worth it
> to try to support parallel restore from nearly-obsolete versions, and
> I suspect that we couldn't do it even if we tried --- the reason the
> representation got changed is that the old way simply didn't work for
> any significant use of the dependency info. Just ignoring the
> dependencies, as your patch effectively proposes, is going to lead to
> restore failures or worse.

Mind if I take a crack at this patch? I spent a few hours poking around
in that section of the code, and I think I should be able to find the
right place to make the check without too much trouble. Assuming you
haven't already written it of course, knowing how quick you tend to be ;)

> It seems like a good idea to put a range check into that loop as well,
> but I think it should throw error not silently ignore bad data ...

What I'm curious about is the comment above that loop:

/*
* It is possible that the dependencies list items that are not in the
* archive at all. Subtract such items from the depCounts.
*/

If things can be referenced that are not even in the dump, wouldn't it
be somewhat likely that the dependency dumpId could be outside the range
of Ids found in the TOC (and thus outside the range of the array,
requiring a range check)? And if it's better to return an error in that
case, why does this loop exist at all in it's current form?

Nice to be able to pay you back for a patch you did at my request in 7.4
to help me port from MS SQL, which was supporting column defaults on
views...
- --

Jon T Erdman (aka StuckMojo)
PostgreSQL Zealot
-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAktV84QACgkQRAk1+p0GhSFpvgCgiom9Dzre5R2bUlast+dRhmNu
nggAoJXWJ6FiOzaRH7PumLYsq1S6e3Sw
=qRyU
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jon Erdman <postgresql(at)thewickedtribe(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5288: Restoring a 7.4.5 -Fc dump using -j 2 segfaults (patch included)
Date: 2010-01-19 18:17:46
Message-ID: 14479.1263925066@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Jon Erdman <postgresql(at)thewickedtribe(dot)net> writes:
> Tom Lane wrote:
>> It seems like a good idea to put a range check into that loop as well,
>> but I think it should throw error not silently ignore bad data ...

> If things can be referenced that are not even in the dump, wouldn't it
> be somewhat likely that the dependency dumpId could be outside the range
> of Ids found in the TOC (and thus outside the range of the array,
> requiring a range check)?

Yeah, after looking at it further I concluded that you're right.
maxDumpId is only the highest dump ID actually found in the archive
--- I had been thinking it was transmitted by the originating pg_dump,
but it isn't. So we do need to guard against out-of-range dependency
IDs too; this code could fail on partial archives as well as old ones.

The partial archive case is a bit scary, because there might be indirect
dependencies we don't know about. It might be best to throw an error,
but for the moment I'll just make it ignore the dependency to the
missing object and we'll see if anyone complains.

> Mind if I take a crack at this patch?

I already did it before seeing your mail ...

regards, tom lane


From: Chris Travers <chris(at)metatrontech(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jon Erdman (aka StuckMojo)" <postgresql(at)thewickedtribe(dot)net>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5288: Restoring a 7.4.5 -Fc dump using -j 2 segfaults (patch included)
Date: 2010-01-19 18:37:49
Message-ID: 5ed37b141001191037i3b6a988at7061ec1bfc3d10ab@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Just weighing in here.

On Tue, Jan 19, 2010 at 9:15 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> It doesn't seem worth it
> to try to support parallel restore from nearly-obsolete versions, and
> I suspect that we couldn't do it even if we tried --- the reason the
> representation got changed is that the old way simply didn't work for
> any significant use of the dependency info.  Just ignoring the
> dependencies, as your patch effectively proposes, is going to lead to
> restore failures or worse.

Just to clarify, the only part that would not be supported would be
the parallel part, right?

Best Wishes,
Chris Travers


From: Jon Erdman <postgresql(at)thewickedtribe(dot)net>
To: Chris Travers <chris(at)metatrontech(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5288: Restoring a 7.4.5 -Fc dump using -j 2 segfaults (patch included)
Date: 2010-01-19 18:43:49
Message-ID: 4B55FD65.4070802@thewickedtribe.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chris Travers wrote:
> Just weighing in here.
>
> On Tue, Jan 19, 2010 at 9:15 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It doesn't seem worth it
>> to try to support parallel restore from nearly-obsolete versions, and
>> I suspect that we couldn't do it even if we tried --- the reason the
>> representation got changed is that the old way simply didn't work for
>> any significant use of the dependency info. Just ignoring the
>> dependencies, as your patch effectively proposes, is going to lead to
>> restore failures or worse.
>
> Just to clarify, the only part that would not be supported would be
> the parallel part, right?

Far as I can tell yes, since this fix_dependencies() function is only
called for parallel restores code path.

That would seem to indicate that in the case of partial archives these
missing deps wouldn't cause any worse failure than they otherwise would,
since they'd be just as missing if you did a normal restore vs. a
parallel one.
- --

Jon T Erdman (aka StuckMojo)
PostgreSQL Zealot
-----BEGIN PGP SIGNATURE-----

iEYEARECAAYFAktV/WUACgkQRAk1+p0GhSE6JwCeJ9Yalsx6nkffrxxjjfw5Te4M
g+cAnjimrcof/ziEda0kSLI4/A8ln5Jj
=pibw
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Travers <chris(at)metatrontech(dot)com>
Cc: "Jon Erdman (aka StuckMojo)" <postgresql(at)thewickedtribe(dot)net>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5288: Restoring a 7.4.5 -Fc dump using -j 2 segfaults (patch included)
Date: 2010-01-19 18:44:11
Message-ID: 16763.1263926651@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Chris Travers <chris(at)metatrontech(dot)com> writes:
> Just to clarify, the only part that would not be supported would be
> the parallel part, right?

Yeah, you can still restore as long as you don't use -j.

The other alternative is to use a more modern pg_dump to dump from
the old server.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jon Erdman <postgresql(at)thewickedtribe(dot)net>
Cc: Chris Travers <chris(at)metatrontech(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5288: Restoring a 7.4.5 -Fc dump using -j 2 segfaults (patch included)
Date: 2010-01-19 18:49:08
Message-ID: 16891.1263926948@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Jon Erdman <postgresql(at)thewickedtribe(dot)net> writes:
> That would seem to indicate that in the case of partial archives these
> missing deps wouldn't cause any worse failure than they otherwise would,
> since they'd be just as missing if you did a normal restore vs. a
> parallel one.

The dependency info isn't used at all unless you do a parallel restore.
We put that information into the dump file format specification years
ago with the idea that it would someday be used for purposes like this,
but 8.4's -j is the first attempt to actually use it. (The lack of
actual use is not unrelated to the fact that pre-8.0's representation
was unusable ...)

regards, tom lane