Re: parallel restore vs. windows

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: parallel restore vs. windows
Date: 2008-12-09 02:08:39
Message-ID: 493DD327.6000102@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


OK, after quite some trying I have hit a brick wall. I have been unable
to get parallel restore to work with Windows threading. No doubt I am
missing something, but I really don't know what. Unless someone can tell
me what I am doing wrong, I have these possibilities:

* run parallel steps on Windows in separate processes rather than
threads, similar to what we do in the server, or
* disable parallel restore on Windows for now.

Time is unfortunately running very short.

Latest attempt is attached.

cheers

andrew

Attachment Content-Type Size
parallel_restore_11.patch.gz application/x-gzip 15.7 KB

From: Andrew Chernow <ac(at)esilo(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 02:19:45
Message-ID: 493DD5C1.6060106@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
> OK, after quite some trying I have hit a brick wall. I have been unable
> to get parallel restore to work with Windows threading. No doubt I am
> missing something, but I really don't know what. Unless someone can tell
> me what I am doing wrong, I have these possibilities:
>
> * run parallel steps on Windows in separate processes rather than
> threads, similar to what we do in the server, or
> * disable parallel restore on Windows for now.
>
>
> Time is unfortunately running very short.
>
> Latest attempt is attached.
>
>

We use _beginthread. I don't remember exactely how it broke, but it did. Try
using the below instead of CreateThread.

// NOTE: if you don't need the returned handle, close it or
// leaks will occur. Closing it doesn't kill the thread.
HANDLE h = (HANDLE)_beginthreadex(NULL, 0, thread_start, arg, 0, NULL);
if(h)
CloseHandle(h);

From MSDN:
"A thread in an executable that calls the C run-time library (CRT) should use
the _beginthread and _endthread functions for thread management rather than
CreateThread and ExitThread;"

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andrew Chernow <ac(at)esilo(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 03:01:17
Message-ID: 493DDF7D.6010503@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Chernow wrote:
> Andrew Dunstan wrote:
>>
>> OK, after quite some trying I have hit a brick wall. I have been
>> unable to get parallel restore to work with Windows threading. No
>> doubt I am missing something, but I really don't know what. Unless
>> someone can tell me what I am doing wrong, I have these possibilities:
>>
>> * run parallel steps on Windows in separate processes rather than
>> threads, similar to what we do in the server, or
>> * disable parallel restore on Windows for now.
>>
>>
>> Time is unfortunately running very short.
>>
>> Latest attempt is attached.
>>
>>
>
> We use _beginthread. I don't remember exactely how it broke, but it
> did. Try using the below instead of CreateThread.
>
> // NOTE: if you don't need the returned handle, close it or
> // leaks will occur. Closing it doesn't kill the thread.
> HANDLE h = (HANDLE)_beginthreadex(NULL, 0, thread_start, arg, 0, NULL);

This didn't give me any more joy, unfortunately. But you're right, I
should be using it.

> if(h)
> CloseHandle(h);

Umm, even if I wait on the handle using waitForMultipleObjects() ?

>
> From MSDN:
> "A thread in an executable that calls the C run-time library (CRT)
> should use the _beginthread and _endthread functions for thread
> management rather than CreateThread and ExitThread;"

I am terminating the thread by returning from the thread function. I
understand this is the recommended way.

cheers

andrew


From: Andrew Chernow <ac(at)esilo(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 03:32:26
Message-ID: 493DE6CA.70407@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> HANDLE h = (HANDLE)_beginthreadex(NULL, 0, thread_start, arg, 0, NULL);
>
> This didn't give me any more joy, unfortunately. But you're right, I
> should be using it.
>

Are these threads sharing memory, intentionally or by mistake?

>> if(h)
>> CloseHandle(h);
>
> Umm, even if I wait on the handle using waitForMultipleObjects() ?
>

I was only trying to demonstrate that the value returned by _beginthread can be
managed/closed just like any other win32 HANDLE.

> I am terminating the thread by returning from the thread function. I
> understand this is the recommended way.

I didn't see a CloseHandle on ret_child anywhere. The HANDLE still exists after
the thread exists, you still have to call CloseHandle.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andrew Chernow <ac(at)esilo(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 03:48:38
Message-ID: 493DEA96.3020603@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Chernow wrote:
>
>>> HANDLE h = (HANDLE)_beginthreadex(NULL, 0, thread_start, arg, 0, NULL);
>>
>> This didn't give me any more joy, unfortunately. But you're right, I
>> should be using it.
>>
>
> Are these threads sharing memory, intentionally or by mistake?

Things they write, and things they read but might not be stable, are not
supposed to be shared. If they are it's a mistake.

>
>>> if(h)
>>> CloseHandle(h);
>>
>> Umm, even if I wait on the handle using waitForMultipleObjects() ?
>>
>
> I was only trying to demonstrate that the value returned by
> _beginthread can be managed/closed just like any other win32 HANDLE.
>
> > I am terminating the thread by returning from the thread function. I
> > understand this is the recommended way.
>
> I didn't see a CloseHandle on ret_child anywhere. The HANDLE still
> exists after the thread exists, you still have to call CloseHandle.

OK. I'll put that in after handling the return.

thanks

andrew


From: Andrew Chernow <ac(at)esilo(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 04:03:23
Message-ID: 493DEE0B.1010202@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
>
> Andrew Chernow wrote:
>>
>>>> HANDLE h = (HANDLE)_beginthreadex(NULL, 0, thread_start, arg, 0, NULL);
>>>
>>> This didn't give me any more joy, unfortunately. But you're right, I
>>> should be using it.
>>>
>>
>> Are these threads sharing memory, intentionally or by mistake?
>
>
> Things they write, and things they read but might not be stable, are not
> supposed to be shared. If they are it's a mistake.
>

Looks like the ArchiveHandle variable 'AH' and the TocEntry 'next_work_item' are
not being deep copied at line 315 of your patch, where you prepare the
RestoreArgs struct for the thread. Every thread is accessing and possibly
updating the members of these structs that need to be deep copied.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: Andrew Chernow <ac(at)esilo(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 04:09:48
Message-ID: 493DEF8C.2030001@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>
>>> Are these threads sharing memory, intentionally or by mistake?
>>
>>
>> Things they write, and things they read but might not be stable, are
>> not supposed to be shared. If they are it's a mistake.
>>
>
> Looks like the ArchiveHandle variable 'AH' and the TocEntry
> 'next_work_item' are not being deep copied at line 315 of your patch,
> where you prepare the RestoreArgs struct for the thread. Every thread
> is accessing and possibly updating the members of these structs that
> need to be deep copied.
>

Forgot something, the prestore function leaks the RestoreArgs struct.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andrew Chernow <ac(at)esilo(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 16:34:06
Message-ID: 493E9DFE.2080601@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Chernow wrote:
>
> Looks like the ArchiveHandle variable 'AH' and the TocEntry
> 'next_work_item' are not being deep copied at line 315 of your patch,
> where you prepare the RestoreArgs struct for the thread. Every thread
> is accessing and possibly updating the members of these structs that
> need to be deep copied.
>

Each thread deals with a different TocEntry, which no other thread deals
with, so there should be no need to clone it, I believe.

Parts of AH need deep cloning, notably the formatData member, which is
done in _ReopenArchive().

I am aware that there are some minor memory leaks, which I will remedy.

cheers

andrew


From: Andrew Chernow <ac(at)esilo(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 17:06:53
Message-ID: 493EA5AD.7080803@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> Parts of AH need deep cloning, notably the formatData member, which is
> done in _ReopenArchive().
>

Is it okay to clone this from within the thread?

The reopen() appears to mess with AH->FH, which mutltiple threads are
calling fclose on. The second thread is going to fail and the first
fclose() will close the main threads handle.

+ #ifndef WIN32
+ if (fclose(AH->FH) != 0)
+ die_horribly(AH, modulename, "could not close archive file: %s\n",
+ strerror(errno));
+ #else

How are things failing? Core dump, maybe you are seeing the above
error? The non-windows path is safe from this because a) it never does
an fclose and b) its a fork and has its own copy of the FH.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andrew Chernow <ac(at)esilo(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 17:36:40
Message-ID: 493EACA8.4070702@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Chernow wrote:
>>
>> Parts of AH need deep cloning, notably the formatData member, which
>> is done in _ReopenArchive().
>>
>
> Is it okay to clone this from within the thread?

I don't see why not.

>
> The reopen() appears to mess with AH->FH, which mutltiple threads are
> calling fclose on. The second thread is going to fail and the first
> fclose() will close the main threads handle.
>
> + #ifndef WIN32
> + if (fclose(AH->FH) != 0)
> + die_horribly(AH, modulename, "could not close archive file:
> %s\n",
> + strerror(errno));
> + #else
>
> How are things failing? Core dump, maybe you are seeing the above
> error? The non-windows path is safe from this because a) it never
> does an fclose and b) its a fork and has its own copy of the FH.

No, as this fragment shows, fclose() is NOT called on Windows.

The program dies with a nasty dialog box when restoring a dump of the
regression database after the second COPY thread disconnects.

cheers

andrew


From: Andrew Chernow <ac(at)esilo(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 17:46:11
Message-ID: 493EAEE3.8090704@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
> No, as this fragment shows, fclose() is NOT called on Windows.
>

Oooppps. I'm the village idiot today.

> The program dies with a nasty dialog box when restoring a dump of the
> regression database after the second COPY thread disconnects.
>

I'll poke around but apparently I need food :)

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: Andrew Chernow <ac(at)esilo(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 18:50:27
Message-ID: 493EBDF3.6010008@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>
>>> Parts of AH need deep cloning, notably the formatData member, which
>>> is done in _ReopenArchive().
>>>
>>
>> Is it okay to clone this from within the thread?
>
> I don't see why not.
>

Because another thread may be modifying the memory you are trying to
clone. If no one modifies the formatData struct, then why is it being
deep copied to begin with.

>
> The program dies with a nasty dialog box when restoring a dump of the
> regression database after the second COPY thread disconnects.

Sounds like the friendly and helpful GPF Dialog (General Protection
Fault). This is a core dump which strongly suggests your threads are
trampling over one another. Its possible that a couple threads get
fired off but upon the first thread completion, something !(deep_copied)
is freed/modified ... bang-bang your dead :o I tried to find this, but
haven't yet.

Maybe do a full deep copy in the main thread and comment out any
in-thread deep copying. I wonder if that would work with no other changes.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andrew Chernow <ac(at)esilo(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 19:36:31
Message-ID: 493EC8BF.2090103@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Chernow wrote:
>>>>
>>>> Parts of AH need deep cloning, notably the formatData member, which
>>>> is done in _ReopenArchive().
>>>>
>>>
>>> Is it okay to clone this from within the thread?
>>
>> I don't see why not.
>>
>
> Because another thread may be modifying the memory you are trying to
> clone. If no one modifies the formatData struct, then why is it being
> deep copied to begin with.
>
>>
>> The program dies with a nasty dialog box when restoring a dump of the
>> regression database after the second COPY thread disconnects.
>
> Sounds like the friendly and helpful GPF Dialog (General Protection
> Fault). This is a core dump which strongly suggests your threads are
> trampling over one another. Its possible that a couple threads get
> fired off but upon the first thread completion, something
> !(deep_copied) is freed/modified ... bang-bang your dead :o I tried
> to find this, but haven't yet.
>
> Maybe do a full deep copy in the main thread and comment out any
> in-thread deep copying. I wonder if that would work with no other
> changes.

I'll try. It's unfortunately not as simple as it sounds, because of the
way the abstractions are arranged. I can't count the number of times I
have had to stop and try to clear my head while working on this code.

Thanks for the suggestion.

cheers

andrew


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Andrew Chernow <ac(at)esilo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 19:45:39
Message-ID: 493ECAE3.6060600@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
>
> Andrew Chernow wrote:
>>>>>
>>>>> Parts of AH need deep cloning, notably the formatData member, which
>>>>> is done in _ReopenArchive().
>>>>>
>>>>
>>>> Is it okay to clone this from within the thread?
>>>
>>> I don't see why not.
>>>
>>
>> Because another thread may be modifying the memory you are trying to
>> clone. If no one modifies the formatData struct, then why is it being
>> deep copied to begin with.
>>
>>>
>>> The program dies with a nasty dialog box when restoring a dump of the
>>> regression database after the second COPY thread disconnects.
>>
>> Sounds like the friendly and helpful GPF Dialog (General Protection
>> Fault). This is a core dump which strongly suggests your threads are
>> trampling over one another. Its possible that a couple threads get
>> fired off but upon the first thread completion, something
>> !(deep_copied) is freed/modified ... bang-bang your dead :o I tried
>> to find this, but haven't yet.
>>
>> Maybe do a full deep copy in the main thread and comment out any
>> in-thread deep copying. I wonder if that would work with no other
>> changes.
>
> I'll try. It's unfortunately not as simple as it sounds, because of the
> way the abstractions are arranged. I can't count the number of times I
> have had to stop and try to clear my head while working on this code.

That's what killed me when I tried to review that stuff and figure it out.

Does that indicate that the abstractions are bad and should be changed,
or just that there's no reasonably way to make the abstractions both
make sense for the internal API itself *and* for being threadsafe?

//Magnus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Andrew Chernow <ac(at)esilo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 20:02:31
Message-ID: 20149.1228852951@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Andrew Dunstan wrote:
>> I'll try. It's unfortunately not as simple as it sounds, because of the
>> way the abstractions are arranged. I can't count the number of times I
>> have had to stop and try to clear my head while working on this code.

> That's what killed me when I tried to review that stuff and figure it out.

> Does that indicate that the abstractions are bad and should be changed,
> or just that there's no reasonably way to make the abstractions both
> make sense for the internal API itself *and* for being threadsafe?

I think pretty much everybody except Philip Warner has found the stuff
around the TOC data structure and the "archiver" API to be confusing.
I'm not immediately sure about a better design though, at least not if
you don't want to duplicate a lot of code between the plain pg_dump and
the pg_dump/pg_restore cases.

I don't see that this has much of anything to do with thread safety,
however --- it's just a matter of too many layers of indirection IMHO.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Andrew Chernow <ac(at)esilo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 22:27:47
Message-ID: 493EF0E3.5010905@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Andrew Dunstan wrote:
>>> I'll try. It's unfortunately not as simple as it sounds, because of the
>>> way the abstractions are arranged. I can't count the number of times I
>>> have had to stop and try to clear my head while working on this code.
>
>> That's what killed me when I tried to review that stuff and figure it out.
>
>> Does that indicate that the abstractions are bad and should be changed,
>> or just that there's no reasonably way to make the abstractions both
>> make sense for the internal API itself *and* for being threadsafe?
>
> I think pretty much everybody except Philip Warner has found the stuff
> around the TOC data structure and the "archiver" API to be confusing.
> I'm not immediately sure about a better design though, at least not if
> you don't want to duplicate a lot of code between the plain pg_dump and
> the pg_dump/pg_restore cases.
>
> I don't see that this has much of anything to do with thread safety,
> however --- it's just a matter of too many layers of indirection IMHO.

It doesn't - but it makes it harder to find the issue I think :-( If it
was reasonably easy, an API redesign might help that. But I haven't
looked at all at the possibility of doing so, so I won't comment on if
it's likely to be doable.

//Magnus


From: Andrew Chernow <ac(at)esilo(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-09 23:28:47
Message-ID: 493EFF2F.2030700@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander wrote:
> Tom Lane wrote:
>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>> Andrew Dunstan wrote:
>>>> I'll try. It's unfortunately not as simple as it sounds, because of the
>>>> way the abstractions are arranged. I can't count the number of times I
>>>> have had to stop and try to clear my head while working on this code.
>>> That's what killed me when I tried to review that stuff and figure it out.
>>> Does that indicate that the abstractions are bad and should be changed,
>>> or just that there's no reasonably way to make the abstractions both
>>> make sense for the internal API itself *and* for being threadsafe?
>> I think pretty much everybody except Philip Warner has found the stuff
>> around the TOC data structure and the "archiver" API to be confusing.
>> I'm not immediately sure about a better design though, at least not if
>> you don't want to duplicate a lot of code between the plain pg_dump and
>> the pg_dump/pg_restore cases.
>>
>> I don't see that this has much of anything to do with thread safety,
>> however --- it's just a matter of too many layers of indirection IMHO.
>
> It doesn't - but it makes it harder to find the issue I think :-( If it
> was reasonably easy, an API redesign might help that. But I haven't
> looked at all at the possibility of doing so, so I won't comment on if
> it's likely to be doable.
>
> //Magnus
>
>

If it previously worked without threads, than in theory a deep copy of the
thread_arg should fix the core dump; especially if the non-windows fork() method
works with this patch. Maybe you can get away with only copying some of the
members (trial-n-error), I don't think they are all being used in this context.
Nothing should be copied from within the thread itself.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Andrew Chernow <ac(at)esilo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-10 00:19:58
Message-ID: 493F0B2E.6070204@rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> I think pretty much everybody except Philip Warner has found the stuff
> around the TOC data structure and the "archiver" API to be confusing.
> I'm not immediately sure about a better design though, at least not if
> you don't want to duplicate a lot of code between the plain pg_dump and
> the pg_dump/pg_restore cases.
>

Here was I thinking it was more or less self-documenting and clear ;-).
But, yes, it is complex, and I can still see no way to reduce the
complexity. I should have some old notes on the code and am happy to
expand them as much as necessary.

If people want to nominate key areas of confusion, I will concentrate on
those first.

In terms of the current discussion, I am not sure I can help greatly;
writing cross-platform thread code is non-trivial. One minor point: I
noticed in early versions of the code that a global AH had been created
-- it occurs to me that this could be problem.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Chernow <ac(at)esilo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-10 00:32:33
Message-ID: 493F0E21.9050508@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Philip Warner wrote:
> Tom Lane wrote:
>
>> I think pretty much everybody except Philip Warner has found the stuff
>> around the TOC data structure and the "archiver" API to be confusing.
>> I'm not immediately sure about a better design though, at least not if
>> you don't want to duplicate a lot of code between the plain pg_dump and
>> the pg_dump/pg_restore cases.
>>
>>
>
> Here was I thinking it was more or less self-documenting and clear ;-).
> But, yes, it is complex, and I can still see no way to reduce the
> complexity. I should have some old notes on the code and am happy to
> expand them as much as necessary.
>
> If people want to nominate key areas of confusion, I will concentrate on
> those first.
>
> In terms of the current discussion, I am not sure I can help greatly;
> writing cross-platform thread code is non-trivial. One minor point: I
> noticed in early versions of the code that a global AH had been created
> -- it occurs to me that this could be problem.
>
>

No, it's not. It's not used in any thread except the main thread.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andrew Chernow <ac(at)esilo(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-14 17:13:48
Message-ID: 49453ECC.5030006@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Chernow wrote:
>
> If it previously worked without threads, than in theory a deep copy of
> the thread_arg should fix the core dump; especially if the non-windows
> fork() method works with this patch. Maybe you can get away with only
> copying some of the members (trial-n-error), I don't think they are
> all being used in this context. Nothing should be copied from within
> the thread itself.
>

I did this, but it turned out that the problem was a logic error that I
found once I had managed to get a working debugger. However, the Windows
thread code should now be more robust, so thanks to Andrew and Magnus
for the suggestions.

This version completes properly on Windows with the regression database.

Left to do:

. improve error checking
. memory leak cleanup
. code cleanup
. docs

I hope to have this done shortly.

cheers

andrew

Attachment Content-Type Size
parallel_restore_12.patch.gz application/x-gzip 16.1 KB

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: parallel restore vs. windows
Date: 2008-12-17 08:57:59
Message-ID: 20081217164555.E917.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

> I did this, but it turned out that the problem was a logic error that I
> found once I had managed to get a working debugger. However, the Windows
> thread code should now be more robust, so thanks to Andrew and Magnus
> for the suggestions.

Hello, I tested parallel restore on Windows.
I have some random comments about it:

* Two compiler warnings.
pg_backup_custom.c: In function `_PrintTocData':
pg_backup_custom.c:437: warning: unused variable `ctx'
pg_backup_custom.c: In function `_ReopenArchive':
pg_backup_custom.c:849: warning: unused variable `ctx'

* No description about new options in pg_restore --help.
There are no help messages about multi-thread (-m) and
truncate-before-load options.

* multi-thread option is ignored if --data-only is on.
Is it an intended behavior? Even if so, we'd better to have
warning messages here.

* Threads, forked processes and connections are disposed per entry.
I think it's a designed behavior, but there might be room for
improvement. The present implementation is slower when there
are many small objects. If we can specialize in thread-based
implementation, thread pooling and connections pooling are
typically used in the context. -- it might be a ToDo item in 8.5.

----
I have no idea about performance because I don't have multi-core
machine for windows. Parallel restore seems to be slower than
serial restore on single-cpu machine.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: parallel restore vs. windows
Date: 2008-12-17 12:53:50
Message-ID: 4948F65E.4020504@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

ITAGAKI Takahiro wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
>> I did this, but it turned out that the problem was a logic error that I
>> found once I had managed to get a working debugger. However, the Windows
>> thread code should now be more robust, so thanks to Andrew and Magnus
>> for the suggestions.
>>
>
> Hello, I tested parallel restore on Windows.
> I have some random comments about it:
>

Thanks for this
> * Two compiler warnings.
> pg_backup_custom.c: In function `_PrintTocData':
> pg_backup_custom.c:437: warning: unused variable `ctx'
> pg_backup_custom.c: In function `_ReopenArchive':
> pg_backup_custom.c:849: warning: unused variable `ctx'
>

Will be fixed in code cleanup
> * No description about new options in pg_restore --help.
> There are no help messages about multi-thread (-m) and
> truncate-before-load options.
>

Will fix
> * multi-thread option is ignored if --data-only is on.
> Is it an intended behavior? Even if so, we'd better to have
> warning messages here.
>

Not intended, unless my memory is fading. I will check.
> * Threads, forked processes and connections are disposed per entry.
> I think it's a designed behavior, but there might be room for
> improvement. The present implementation is slower when there
> are many small objects. If we can specialize in thread-based
> implementation, thread pooling and connections pooling are
> typically used in the context. -- it might be a ToDo item in 8.5.
>

Yes. I only got threading working at all just a few days ago. I think
your suggestion is a good one, and we should probably converge on a
threaded implementation and then look at using pooling. However, as you
say that would be work for the 8.5 timeframe.

> ----
> I have no idea about performance because I don't have multi-core
> machine for windows. Parallel restore seems to be slower than
> serial restore on single-cpu machine.
>

Not surprising. There is extra connection, worker setup/breakdown,
dependency housekeeping and context switching involved. However, I'd be
surprised if the overhead were huge.

cheers

andrew


From: "Jaime Casanova" <jcasanov(at)systemguards(dot)com(dot)ec>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Andrew Chernow" <ac(at)esilo(dot)com>, "Magnus Hagander" <magnus(at)hagander(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-24 22:11:15
Message-ID: 3073cc9b0812241411t71a60f44j16d6a7083a1f88f1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 14, 2008 at 12:13 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> This version completes properly on Windows with the regression database.
>

actually, this one doesn't apply cleanly on head

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: Andrew Chernow <ac(at)esilo(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel restore vs. windows
Date: 2008-12-24 22:58:29
Message-ID: 4952BE95.5000900@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jaime Casanova wrote:
> On Sun, Dec 14, 2008 at 12:13 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>> This version completes properly on Windows with the regression database.
>>
>>
>
> actually, this one doesn't apply cleanly on head
>

I will have a new patch a day or two after Christmas, which I hope will
be very close to being fully done.

cheers

andrew