Lists: | pgsql-hackers |
---|
From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | parallel restore fixes |
Date: | 2009-03-09 23:09:05 |
Message-ID: | 49B5A191.1000209@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
The attached patch fixes two issues with parallel restore:
* the static buffer problem in dumputils.c::fmtId() on Windows
(solution: use thread-local storage)
* ReopenPtr() is called too often
There is one remaining bug I know of that I can reproduce: we can get
into deadlock when two tables are foreign keyed to each other. So I need
to get a bit more paranoid about dependencies.
I can't reproduce Olivier Prennant's file closing problem on Unixware.
Is it still happening after application of this patch?
cheers
andrew
Attachment | Content-Type | Size |
---|---|---|
parallel_fix.patch | text/x-patch | 5.8 KB |
From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-09 23:22:54 |
Message-ID: | 20090309232254.GI12932@alvh.no-ip.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Andrew Dunstan wrote:
>
> The attached patch fixes two issues with parallel restore:
>
> * the static buffer problem in dumputils.c::fmtId() on Windows
> (solution: use thread-local storage)
> * ReopenPtr() is called too often
Hmm, if pg_restore is the only program that's threaded, why are you
calling init_dump_utils on pg_dump and pg_dumpall? It makes me a bit
nervous because there are some other programs that are linking
dumputils.c (psql and some in src/bin/scripts/) and even calling fmtId.
Also I think the fmtId comment needs to be updated.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-09 23:42:59 |
Message-ID: | 17972.1236642179@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> + void
> + init_dump_utils()
This should be
> + void
> + init_dump_utils(void)
please. We don't do K&R C around here. I'd lose the added retval
variable too; that's not contributing anything.
> ! #endif;
Semicolon is bogus here.
Looks pretty sane otherwise.
regards, tom lane
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-09 23:50:34 |
Message-ID: | 18107.1236642634@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Hmm, if pg_restore is the only program that's threaded, why are you
> calling init_dump_utils on pg_dump and pg_dumpall?
... because fmtId will crash on *any* use without that.
> It makes me a bit
> nervous because there are some other programs that are linking
> dumputils.c (psql and some in src/bin/scripts/) and even calling fmtId.
Actually, why bother with init_dump_utils at all? fmtId could be made
to initialize the ID variable for itself on first call, with only one
extra if-test, which is hardly gonna matter.
regards, tom lane
From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-10 00:37:06 |
Message-ID: | 49B5B632.4040302@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane wrote:
>> It makes me a bit
>> nervous because there are some other programs that are linking
>> dumputils.c (psql and some in src/bin/scripts/) and even calling fmtId.
>>
>
> Actually, why bother with init_dump_utils at all? fmtId could be made
> to initialize the ID variable for itself on first call, with only one
> extra if-test, which is hardly gonna matter.
>
>
Well, the Windows reference I have suggests TlsAlloc() needs to be
called early in the initialisation process ... I guess I could force it
with a dummy call to fmtId() in restore_toc_entries_parallel() before it
starts spawning children, so we'd be sure there wasn't a race condition,
and nothing else is going to have threads so it won't matter. We'd need
a long comment to that effect, though ;-)
> I'd lose the added retval
> variable too; that's not contributing anything.
It is, in fact. Until I put that in I was getting constant crashes. I
suspect it's something to do with stuff Windows does under the hood on
function return.
cheers
andrew
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-10 01:16:20 |
Message-ID: | 19101.1236647780@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> Actually, why bother with init_dump_utils at all?
> Well, the Windows reference I have suggests TlsAlloc() needs to be
> called early in the initialisation process ...
How early is early? The proposed call sites for init_dump_utils seem
already long past the point where any libc-level infrastructure would
think it is "initialization" time.
>> I'd lose the added retval
>> variable too; that's not contributing anything.
> It is, in fact. Until I put that in I was getting constant crashes. I
> suspect it's something to do with stuff Windows does under the hood on
> function return.
Pardon me while I retrieve my eyebrows from the ceiling. I think you've
got something going on there you don't understand, and you need to
understand it not just put in a cargo-cult fix. (Especially one that's
not documented and hence likely to be removed by the next person who
touches the code.)
regards, tom lane
From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-10 01:44:50 |
Message-ID: | 49B5C612.90505@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>
>>> Actually, why bother with init_dump_utils at all?
>>>
>
>
>> Well, the Windows reference I have suggests TlsAlloc() needs to be
>> called early in the initialisation process ...
>>
>
> How early is early? The proposed call sites for init_dump_utils seem
> already long past the point where any libc-level infrastructure would
> think it is "initialization" time.
>
Well, I think at least it needs to be done where other threads won't be
calling it at the same time.
cheers
andrew
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-10 02:09:07 |
Message-ID: | 19822.1236650947@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> How early is early? The proposed call sites for init_dump_utils seem
>> already long past the point where any libc-level infrastructure would
>> think it is "initialization" time.
> Well, I think at least it needs to be done where other threads won't be
> calling it at the same time.
Oh, I see, ye olde race condition. Still, it seems like a bad idea
to expect that we will catch every program that might call fmtId;
as Alvaro notes, that's all over our client-side code.
How about this: by default, fmtId uses the same logic as now (one static
PQExpBuffer). If told to by a call of init_parallel_dump_utils(), which
need only be called by pg_restore during its startup, then it switches to
using per-thread storage. init_parallel_dump_utils can be the place
that calls TlsAlloc. This is almost the same as what you suggested a
couple messages back, but perhaps a bit clearer as to what's going on;
and it definitely cuts the number of places we need to touch.
regards, tom lane
From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-10 20:39:59 |
Message-ID: | 49B6D01F.6050204@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane wrote:
>
> How about this: by default, fmtId uses the same logic as now (one static
> PQExpBuffer). If told to by a call of init_parallel_dump_utils(), which
> need only be called by pg_restore during its startup, then it switches to
> using per-thread storage. init_parallel_dump_utils can be the place
> that calls TlsAlloc. This is almost the same as what you suggested a
> couple messages back, but perhaps a bit clearer as to what's going on;
> and it definitely cuts the number of places we need to touch.
>
>
>
OK, here 'tis.
Moving on to the deadlock with crossed FKs issue.
cheers
andrew
Attachment | Content-Type | Size |
---|---|---|
parallel_fix.patch | text/x-patch | 6.1 KB |
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-10 21:39:38 |
Message-ID: | 15395.1236721178@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> OK, here 'tis.
Looks fairly reasonable to me, but of course I haven't tested it.
regards, tom lane
From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-11 15:19:34 |
Message-ID: | 49B7D686.60906@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> OK, here 'tis.
>
> Looks fairly reasonable to me, but of course I haven't tested it.
Well, I have to do a blitz of parallel restores next week, so hopefully
that will hit any soft spots.
--Josh
From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: parallel restore fixes |
Date: | 2009-03-11 16:47:14 |
Message-ID: | 49B7EB12.6030404@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Josh Berkus wrote:
> Tom Lane wrote:
>> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>>> OK, here 'tis.
>>
>> Looks fairly reasonable to me, but of course I haven't tested it.
>
> Well, I have to do a blitz of parallel restores next week, so
> hopefully that will hit any soft spots.
I have a known outstanding bug to do with deadlock from FKs that cross
(i.e. A has an FK that references B, and B has an FK that references A).
The solution to this could be mildly complex, but I have an outline of
it in my head. Workaround: recreate the failed FK at the end of the restore.
The only other reported problem is the one on Unixware to do with
closing the archive. I haven't been able to reproduce it on Linux or
Windows, the two platforms I test on, although it might be fixed by the
patch I just applied.
cheers
andrew