Re: parallel pg_restore

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel pg_restore
Date: 2008-09-21 22:32:45
Message-ID: 48D6CB8D.2030502@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> I am working on getting parallel pg_restore working. I'm currently
>> getting all the scaffolding working, and hope to have a naive prototype
>> posted within about a week.
>>
>
>
>> The major question is how to choose the restoration order so as to
>> maximize efficiency both on the server and in reading the archive.
>>
>
> One of the first software design principles I ever learned was to
> separate policy from mechanism. ISTM in this first cut you ought to
> concentrate on mechanism and let the policy just be something dumb
> (but coded separately from the infrastructure). We can refine it after
> that.
>

Indeed, that's exactly what I'm doing. However, given that time for the
8.4 window is short, I thought it would be sensible to get people
thinking about what the policy might be, while I get on with the mechanism.

>
>> Another question is what we should do if the user supplies an explicit
>> order with --use-list. I'm inclined to say we should stick strictly with
>> the supplied order. Or maybe that should be an option.
>>
>
> Hmm. I think --use-list is used more for selecting a subset of items
> to restore than for forcing a nondefault restore order. Forcing the
> order used to be a major purpose, but that was years ago before we
> had the dependency-driven-restore-order code working. So I'd vote that
> the default behavior is to still allow parallel restore when this option
> is used, and we should provide an orthogonal option that disables use of
> parallel restore.
>
> You'd really want the latter anyway for some cases, ie, when you don't
> want the restore trying to hog the machine. Maybe the right form for
> the extra option is just a limit on how many connections to use. Set it
> to one to force the exact restore order, and to other values to throttle
> how much of the machine the restore tries to eat.
>

My intention is to have single-thread restore remain the default, at
least for this go round, and have the user be able to choose
--multi-thread=nn to specify the number of concurrent connections to use.

> One problem here though is that you'd need to be sure you behave sanely
> when there is a dependency chain passing through an object that's not to
> be restored. The ordering of the rest of the chain still ought to honor
> the dependencies I think.
>
>
>

Right. I think we'd need to fake doing a full restore and omit actually
restoring items not on the passed in list. That should be simple enough.

cheers

andrew

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-09-21 22:42:36 Re: Assert Levels
Previous Message Greg Smith 2008-09-21 22:19:36 Re: Assert Levels