Re: parallel pg_restore

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: parallel pg_restore
Date: 2008-09-21 19:28:54
Message-ID: 48D6A076.90107@dunslane.net
Lists: pgsql-hackers

I am working on getting parallel pg_restore working. I'm currently
getting all the scaffolding working, and hope to have a naive prototype
posted within about a week.

The major question is how to choose the restoration order so as to
maximize efficiency both on the server and in reading the archive. My
thoughts are currently running something like this:

* when an item is completed, reduce the dependency count for each
item that depends on it by 1.
* when an item has a dependency count of 0 it is available for
execution, and gets moved to the head of the queue.
    * when a new worker spot becomes available, if there is not currently a
data load running then pick the first available data load,
otherwise pick the first available item.

This would mean that loading a table would probably be immediately
followed by creation of its indexes, including PK and UNIQUE
constraints, thus taking possible advantage of synchronised scans, data
in file system buffers, etc.
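
For illustration, here's a minimal sketch of that bookkeeping (all structure
and function names are hypothetical, not pg_restore's actual TOC code):

#include <stdbool.h>
#include <stdio.h>

#define MAX_DEPENDENTS 4

typedef struct TocItem
{
    const char *desc;                       /* e.g. "TABLE DATA foo" */
    bool        is_data;                    /* data load vs. other item */
    bool        done;
    int         dep_count;                  /* unfinished dependencies */
    int         n_dependents;
    int         dependents[MAX_DEPENDENTS]; /* items depending on this one */
} TocItem;

/* On completion, decrement each dependent's count; anything reaching
 * zero becomes available for execution. */
static void
mark_done(TocItem *items, int idx)
{
    items[idx].done = true;
    for (int i = 0; i < items[idx].n_dependents; i++)
    {
        int d = items[idx].dependents[i];
        if (--items[d].dep_count == 0)
            printf("item \"%s\" is now ready\n", items[d].desc);
    }
}

/* Pick work for a free slot: prefer the first ready data load if no
 * data load is currently running, otherwise take any ready item. */
static int
pick_next(TocItem *items, int n, bool data_load_running)
{
    if (!data_load_running)
        for (int i = 0; i < n; i++)
            if (!items[i].done && items[i].dep_count == 0 && items[i].is_data)
                return i;
    for (int i = 0; i < n; i++)
        if (!items[i].done && items[i].dep_count == 0)
            return i;
    return -1;                              /* nothing ready yet */
}

int
main(void)
{
    /* toy archive: a table's data, then its PK and an ordinary index */
    TocItem items[] = {
        {"TABLE DATA foo", true,  false, 0, 2, {1, 2}},
        {"PK foo_pkey",    false, false, 1, 0, {0}},
        {"INDEX foo_idx",  false, false, 1, 0, {0}},
    };

    int next = pick_next(items, 3, false);  /* -> the data load */
    printf("running \"%s\"\n", items[next].desc);
    mark_done(items, next);                 /* frees the PK and index */
    return 0;
}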

Another question is what we should do if the user supplies an explicit
order with --use-list. I'm inclined to say we should stick strictly with
the supplied order. Or maybe that should be an option.

Thoughts and comments welcome.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel pg_restore
Date: 2008-09-21 22:15:21
Message-ID: 12407.1222035321@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I am working on getting parallel pg_restore working. I'm currently
> getting all the scaffolding working, and hope to have a naive prototype
> posted within about a week.

> The major question is how to choose the restoration order so as to
> maximize efficiency both on the server and in reading the archive.

One of the first software design principles I ever learned was to
separate policy from mechanism. ISTM in this first cut you ought to
concentrate on mechanism and let the policy just be something dumb
(but coded separately from the infrastructure). We can refine it after
that.

> Another question is what we should do if the user supplies an explicit
> order with --use-list. I'm inclined to say we should stick strictly with
> the supplied order. Or maybe that should be an option.

Hmm. I think --use-list is used more for selecting a subset of items
to restore than for forcing a nondefault restore order. Forcing the
order used to be a major purpose, but that was years ago before we
had the dependency-driven-restore-order code working. So I'd vote that
the default behavior is to still allow parallel restore when this option
is used, and we should provide an orthogonal option that disables use of
parallel restore.

You'd really want the latter anyway for some cases, ie, when you don't
want the restore trying to hog the machine. Maybe the right form for
the extra option is just a limit on how many connections to use. Set it
to one to force the exact restore order, and to other values to throttle
how much of the machine the restore tries to eat.

One problem here though is that you'd need to be sure you behave sanely
when there is a dependency chain passing through an object that's not to
be restored. The ordering of the rest of the chain still ought to honor
the dependencies I think.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel pg_restore
Date: 2008-09-21 22:32:45
Message-ID: 48D6CB8D.2030502@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> I am working on getting parallel pg_restore working. I'm currently
>> getting all the scaffolding working, and hope to have a naive prototype
>> posted within about a week.
>>
>
>
>> The major question is how to choose the restoration order so as to
>> maximize efficiency both on the server and in reading the archive.
>>
>
> One of the first software design principles I ever learned was to
> separate policy from mechanism. ISTM in this first cut you ought to
> concentrate on mechanism and let the policy just be something dumb
> (but coded separately from the infrastructure). We can refine it after
> that.
>

Indeed, that's exactly what I'm doing. However, given that time for the
8.4 window is short, I thought it would be sensible to get people
thinking about what the policy might be, while I get on with the mechanism.

>
>> Another question is what we should do if the user supplies an explicit
>> order with --use-list. I'm inclined to say we should stick strictly with
>> the supplied order. Or maybe that should be an option.
>>
>
> Hmm. I think --use-list is used more for selecting a subset of items
> to restore than for forcing a nondefault restore order. Forcing the
> order used to be a major purpose, but that was years ago before we
> had the dependency-driven-restore-order code working. So I'd vote that
> the default behavior is to still allow parallel restore when this option
> is used, and we should provide an orthogonal option that disables use of
> parallel restore.
>
> You'd really want the latter anyway for some cases, ie, when you don't
> want the restore trying to hog the machine. Maybe the right form for
> the extra option is just a limit on how many connections to use. Set it
> to one to force the exact restore order, and to other values to throttle
> how much of the machine the restore tries to eat.
>

My intention is to have single-thread restore remain the default, at
least for this go round, and have the user be able to choose
--multi-thread=nn to specify the number of concurrent connections to use.

> One problem here though is that you'd need to be sure you behave sanely
> when there is a dependency chain passing through an object that's not to
> be restored. The ordering of the rest of the chain still ought to honor
> the dependencies I think.
>
>
>

Right. I think we'd need to fake doing a full restore and omit actually
restoring items not on the passed-in list. That should be simple enough.

cheers

andrew


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-22 07:53:16
Message-ID: 200809220953.19208.dfontaine@hi-media.com
Lists: pgsql-hackers

On Monday 22 September 2008, Andrew Dunstan wrote:
> > You'd really want the latter anyway for some cases, ie, when you don't
> > want the restore trying to hog the machine. Maybe the right form for
> > the extra option is just a limit on how many connections to use. Set it
> > to one to force the exact restore order, and to other values to throttle
> > how much of the machine the restore tries to eat.
>
> My intention is to have single-thread restore remain the default, at
> least for this go round, and have the user be able to choose
> --multi-thread=nn to specify the number of concurrent connections to use.

What about make's famous -j option?

-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously. If
there is more than one -j option, the last one is effective. If
the -j option is given without an argument, make will not limit
the number of jobs that can run simultaneously.

Regards,
--
dim


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-22 10:04:09
Message-ID: 1222077849.4445.150.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:

> > My intention is to have single-thread restore remain the default, at
> > least for this go round, and have the user be able to choose
> > --multi-thread=nn to specify the number of concurrent connections to use.
>
> What about make's famous -j option?
>
> -j [jobs], --jobs[=jobs]
> Specifies the number of jobs (commands) to run simultaneously. If
> there is more than one -j option, the last one is effective. If
> the -j option is given without an argument, make will not limit
> the number of jobs that can run simultaneously.

+1

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel pg_restore
Date: 2008-09-22 10:08:44
Message-ID: 1222078124.4445.155.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Sun, 2008-09-21 at 18:15 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > I am working on getting parallel pg_restore working. I'm currently
> > getting all the scaffolding working, and hope to have a naive prototype
> > posted within about a week.
>
> > The major question is how to choose the restoration order so as to
> > maximize efficiency both on the server and in reading the archive.
>
> One of the first software design principles I ever learned was to
> separate policy from mechanism. ISTM in this first cut you ought to
> concentrate on mechanism and let the policy just be something dumb
> (but coded separately from the infrastructure). We can refine it after
> that.

Agreed. We mustn't make too many built-in assumptions about the best way
to parallelise the restore.

For example, running all CREATE INDEX at same time may help I/O on the
scan but it may also swamp memory and force additional I/O as a result.

We might need a setting for the total memory available, so pg_restore can
avoid running a combination of tasks that would exceed it. Preferably
this wouldn't be just a pg_restore setting.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-22 15:38:50
Message-ID: 48D7BC0A.406@dunslane.net
Lists: pgsql-hackers

Simon Riggs wrote:
> On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
>
>
>>> My intention is to have single-thread restore remain the default, at
>>> least for this go round, and have the user be able to choose
>>> --multi-thread=nn to specify the number of concurrent connections to use.
>>>
>> What about make's famous -j option?
>>
>> -j [jobs], --jobs[=jobs]
>> Specifies the number of jobs (commands) to run simultaneously. If
>> there is more than one -j option, the last one is effective. If
>> the -j option is given without an argument, make will not limit
>> the number of jobs that can run simultaneously.
>>
>
> +1
>
>

If that's the preferred name I have no problem. I'm not sure about the
default argument part, though.

First, I'm not sure our getopt infrastructure actually provides for
optional arguments, and I am not going to remove it in pg_restore just to
get around such a problem, at least for now.
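
For what it's worth, GNU getopt_long does provide optional arguments via
the optional_argument / "::" form, though that's a GNU extension, hence the
portability doubt. A hypothetical sketch of how a bare -j could be handled,
not pg_restore's actual code:

#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>

int
main(int argc, char *argv[])
{
    int num_jobs = 1;                       /* default: single-threaded */
    static struct option long_opts[] = {
        {"jobs", optional_argument, NULL, 'j'},
        {NULL, 0, NULL, 0}
    };
    int c;

    while ((c = getopt_long(argc, argv, "j::", long_opts, NULL)) != -1)
    {
        if (c == 'j')
            /* make's convention: a bare -j means "no limit" */
            num_jobs = optarg ? atoi(optarg) : -1;
    }
    printf("jobs = %d\n", num_jobs);
    return 0;
}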

More importantly, I'm not convinced it's a good idea. It seems more like
a footgun that will potentially try to launch thousands of simultaneous
restore connections. I should have thought that optimal performance
would be reached at some small multiple (say maybe 2?) of the number of
CPUs on the server. You could achieve unlimited parallelism by saying
something like --jobs=99999, but I'd rather that were done very
explicitly instead of as the default value of the parameter.

cheers

andrew


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-22 16:24:28
Message-ID: 1222100668.4445.195.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Mon, 2008-09-22 at 11:38 -0400, Andrew Dunstan wrote:
>
> Simon Riggs wrote:
> > On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
> >
> >
> >>> My intention is to have single-thread restore remain the default, at
> >>> least for this go round, and have the user be able to choose
> >>> --multi-thread=nn to specify the number of concurrent connections to use.
> >>>
> >> What about make's famous -j option?
> >>
> >> -j [jobs], --jobs[=jobs]
> >> Specifies the number of jobs (commands) to run simultaneously. If
> >> there is more than one -j option, the last one is effective. If
> >> the -j option is given without an argument, make will not limit
> >> the number of jobs that can run simultaneously.
> >>
> >
> > +1
> >
> >
>
> If that's the preferred name I have no problem. I'm not sure about the
> default argument part, though.
>
> First, I'm not sure our getopt infrastructure actually provides for
> optional arguments, and I am not going to remove it in pg_restore just to
> get around such a problem, at least for now.
>
> More importantly, I'm not convinced it's a good idea. It seems more like
> a footgun that will potentially try to launch thousands of simultaneous
> restore connections. I should have thought that optimal performance
> would be reached at some small multiple (say maybe 2?) of the number of
> CPUs on the server. You could achieve unlimited parallelism by saying
> something like --jobs=99999, but I'd rather that were done very
> explicitly instead of as the default value of the parameter.

OK, sounds best.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Joshua Drake <jd(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-22 16:30:24
Message-ID: 20080922093024.60b5bbf1@jd-laptop
Lists: pgsql-hackers

On Mon, 22 Sep 2008 17:24:28 +0100
Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> > More importantly, I'm not convinced it's a good idea. It seems more
> > like a footgun that will potentially try to launch thousands of
> > simultaneous restore connections. I should have thought that
> > optimal performance would be reached at some small multiple (say
> > maybe 2?) of the number of CPUs on the server. You could achieve
> > unlimited parallelism by saying something like --jobs=99999, but
> > I'd rather that were done very explicitly instead of as the default
> > value of the parameter.
>
> OK, sounds best.
>

I will not argue vehemently here but I will say that "jobs" doesn't
seem correct. The term "workers" seems more appropriate.

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Joshua Drake <jd(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-22 16:34:32
Message-ID: 200809221834.34750.dfontaine@hi-media.com
Lists: pgsql-hackers

On Monday 22 September 2008, Joshua Drake wrote:
> I will not argue vehemently here but I will say that "jobs" doesn't
> seem correct. The term "workers" seems more appropriate.

Mmmm, it sounds like it depends on the implementation (and on whether all
workers will share the same serializable transaction or just be independent
jobs), but my point here is more about giving the user a name they are used
to. Like in "oh, pg_restore -j, I see, thanks".

Now, if your argument is that the make concept of jobs does not match the
parallel pg_restore concept of workers, I'll simply bow to your choice:
barring other "limits", English not being my native language makes it hard
for me to follow there ;)

Regards,
--
dim


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Joshua Drake <jd(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-22 18:03:31
Message-ID: 1222106611.4445.210.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Mon, 2008-09-22 at 09:30 -0700, Joshua Drake wrote:
> On Mon, 22 Sep 2008 17:24:28 +0100
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>
> > > More importantly, I'm not convinced it's a good idea. It seems more
> > > like a footgun that will potentially try to launch thousands of
> > > simultaneous restore connections. I should have thought that
> > > optimal performance would be reached at some small multiple (say
> > > maybe 2?) of the number of CPUs on the server. You could achieve
> > > unlimited parallelism by saying something like --jobs=99999, but
> > > I'd rather that were done very explicitly instead of as the default
> > > value of the parameter.
> >
> > OK, sounds best.
> >
>
> I will not argue vehemently here but I will say that "jobs" doesn't
> seem correct. The term "workers" seems more appropriate.

Agreed, but most utilities have "j" free but not w, p, t or other
letters that might be synonyms.

j is at least used for exactly this purpose in other tools.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Joshua Drake <jd(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-22 19:05:51
Message-ID: 48D7EC8F.1000106@dunslane.net
Lists: pgsql-hackers

Simon Riggs wrote:
>> I will not argue vehemently here but I will say that "jobs" doesn't
>> seem correct. The term "workers" seems more appropriate.
>>
>
> Agreed, but most utilities have "j" free but not w, p, t or other
> letters that might be synonyms.
>
> j is at least used for exactly this purpose in other tools.
>
>

There are in fact very few letters available, as we've been fairly
profligate in our use of option letters in the pg_dump suite.

j and m happen to be two of those that are available.

I honestly don't have a terribly strong opinion about what it should be
called. I can live with jobs or multi-threads.

cheers

andrew


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-23 05:40:10
Message-ID: 48D8813A.5060202@commandprompt.com
Lists: pgsql-hackers

Andrew Dunstan wrote:

> There are in fact very few letters available, as we've been fairly
> profligate in our use of option letters in the pg_dump suite.
>
> j and m happen to be two of those that are available.

--max-workers

Max makes sense because the number of active workers won't be constant; a
worker may not always have a job to do. It is also consistent with
autovacuum_max_workers.

Sincerely,

Joshua D. Drake


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-23 07:14:33
Message-ID: 20080923071433.GA28639@cuci.nl
Lists: pgsql-hackers

Joshua D. Drake wrote:
>Andrew Dunstan wrote:
>>There are in fact very few letters available, as we've been fairly
>>profligate in our use of option letters in the pg_dump suite.

>>j and m happen to be two of those that are available.

>--max-workers

Perhaps, but please do not use that as justification for using -m.
That would be as silly as abbreviating "number of workers" to -n.
--
Sincerely,
Stephen R. van den Berg.

Experience is something you don't get until just after you need it.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joshua Drake <jd(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-23 07:44:19
Message-ID: 1222155859.4445.296.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:

> j and m happen to be two of those that are available.
>
> I honestly don't have a terribly strong opinion about what it should be
> called. I can live with jobs or multi-threads.

Perhaps we can use -j for jobs and -m for memory, so we can set memory
available across all threads with a single total value.

I can live with jobs or multi-threads also, whichever we decide. Neither
one is confusing to explain.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Joshua Drake <jd(at)commandprompt(dot)com>
To: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-23 19:43:20
Message-ID: 20080923124320.27d27b07@jd-laptop
Lists: pgsql-hackers

On Tue, 23 Sep 2008 09:14:33 +0200
"Stephen R. van den Berg" <srb(at)cuci(dot)nl> wrote:

> Joshua D. Drake wrote:
> >Andrew Dunstan wrote:
> >>There are in fact very few letters available, as we've been fairly
> >>profligate in our use of option letters in the pg_dump suite.
>
> >>j and m happen to be two of those that are available.
>
> >--max-workers
>
> Perhaps, but please do not use that as justification for using -m.
> That would be as silly as abbreviating "number of workers" to -n.

Actually I came up with it because it coincides with existing
terminology. Autovacuum has the concept of max_workers.

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/


From: Joshua Drake <jd(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-23 19:43:51
Message-ID: 20080923124351.4c2ba1f0@jd-laptop
Lists: pgsql-hackers

On Tue, 23 Sep 2008 08:44:19 +0100
Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

>
> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
>
> > j and m happen to be two of those that are available.
> >
> > I honestly don't have a terribly strong opinion about what it
> > should be called. I can live with jobs or multi-threads.
>
> Perhaps we can use -j for jobs and -m for memory, so we can set memory
> available across all threads with a single total value.
>
> I can live with jobs or multi-threads also, whichever we decide.
> Neither one is confusing to explain.
>

Memory? Where did that come from? Andrew, is that in your spec?

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Joshua Drake <jd(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-23 19:53:59
Message-ID: 1222199639.4445.462.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
> On Tue, 23 Sep 2008 08:44:19 +0100
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>
> >
> > On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
> >
> > > j and m happen to be two of those that are available.
> > >
> > > I honestly don't have a terribly strong opinion about what it
> > > should be called. I can live with jobs or multi-threads.
> >
> > Perhaps we can use -j for jobs and -m for memory, so we can set memory
> > available across all threads with a single total value.
> >
> > I can live with jobs or multi-threads also, whichever we decide.
> > Neither one is confusing to explain.
> >
>
> Memory? Where did that come from? Andrew, is that in your spec?

No, but it's in mine. As I said upthread, no point in making it more
parallel than memory allows. Different operations need more/less memory
than others, so we must think about that also. We can quickly work out
how big a table is, so we can work out how much memory it will need to
perform sorts for index builds and thus how many parallel builds can
sensibly take place.
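
For illustration, the arithmetic could be as simple as this toy sketch (all
numbers and names are hypothetical):

#include <stdio.h>

int
main(void)
{
    long total_mem_mb = 4096;           /* hypothetical overall budget */
    long sort_mem_mb  = 512;            /* per-build sort memory estimate,
                                           akin to maintenance_work_mem */
    long max_builds   = total_mem_mb / sort_mem_mb;

    if (max_builds < 1)
        max_builds = 1;                 /* always allow at least one */
    printf("at most %ld concurrent index builds\n", max_builds);
    return 0;
}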

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Joshua Drake <jd(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-23 20:50:43
Message-ID: 48D956A3.5030204@dunslane.net
Lists: pgsql-hackers

Simon Riggs wrote:
> On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
>
>> On Tue, 23 Sep 2008 08:44:19 +0100
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>>
>>
>>> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
>>>
>>>
>>>> j and m happen to be two of those that are available.
>>>>
>>>> I honestly don't have a terribly strong opinion about what it
>>>> should be called. I can live with jobs or multi-threads.
>>>>
>>> Perhaps we can use -j for jobs and -m for memory, so we can set memory
>>> available across all threads with a single total value.
>>>
>>> I can live with jobs or multi-threads also, whichever we decide.
>>> Neither one is confusing to explain.
>>>
>>>
>> Memory? Where did that come from? Andrew, is that in your spec?
>>
>
> No, but it's in mine. As I said upthread, no point in making it more
> parallel than memory allows. Different operations need more/less memory
> than others, so we must think about that also. We can quickly work out
> how big a table is, so we can work out how much memory it will need to
> perform sorts for index builds and thus how many parallel builds can
> sensibly take place.
>
>

If that ever happens it will certainly not be in this go round.

In fact, we have some anecdotal evidence that the point of diminishing
returns is not reached until a fairly high degree of parallelism is used
(Joshua's and my client has been using 24 threads, I believe).

In any case, my agenda goes something like this:

* get it working with a basic selection algorithm on Unix (nearly
done - keep your eyes open for a patch soon)
* start testing
* get it working on Windows
* improve the selection algorithm
* harden code

If we get all that done by November we'll have done well. And we know
that in some cases just this much can lead to reductions in restore time
of the order of 80%.

cheers

andrew


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joshua Drake <jd(at)commandprompt(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-23 21:10:33
Message-ID: 1222204233.4445.470.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2008-09-23 at 16:50 -0400, Andrew Dunstan wrote:

> If we get all that done by November we'll have done well. And we know
> that in some cases just this much can lead to reductions in restore
> time
> of the order of 80%.

Agreed. Go for it.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Joshua Drake <jd(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-23 23:51:37
Message-ID: 20080923165137.32902742@jd-laptop
Lists: pgsql-hackers

On Tue, 23 Sep 2008 16:50:43 -0400
Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

>
>
> Simon Riggs wrote:
> > On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
> >
> >> On Tue, 23 Sep 2008 08:44:19 +0100
> >> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> >>
> >>
> >>> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
> >>>
> >>>
> >>>> j and m happen to be two of those that are available.
> >>>>
> >>>> I honestly don't have a terribly strong opinion about what it
> >>>> should be called. I can live with jobs or multi-threads.
> >>>>
> >>> Perhaps we can use -j for jobs and -m for memory, so we can set
> >>> memory available across all threads with a single total value.
> >>>
> >>> I can live with jobs or multi-threads also, whichever we decide.
> >>> Neither one is confusing to explain.
> >>>
> >>>
> >> Memory? Where did that come from? Andrew, is that in your spec?
> >>
> >
> > No, but it's in mine. As I said upthread, no point in making it more
> > parallel than memory allows. Different operations need more/less
> > memory than others, so we must think about that also. We can
> > quickly work out how big a table is, so we can work out how much
> > memory it will need to perform sorts for index builds and thus how
> > many parallel builds can sensibly take place.
> >
> >
>
> If that ever happens it will certainly not be in this go round.
>
> In fact, we have some anecdotal evidence that the point of diminishing
> returns is not reached until a fairly high degree of parallelism is
> used (Joshua's and my client has been using 24 threads, I believe).

Against 8 cores but yes.

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-24 05:17:02
Message-ID: 48D9CD4E.7090200@commandprompt.com
Lists: pgsql-hackers

Simon Riggs wrote:
> On Tue, 2008-09-23 at 16:50 -0400, Andrew Dunstan wrote:
>
>> If we get all that done by November we'll have done well. And we know
>> that in some cases just this much can lead to reductions in restore
>> time
>> of the order of 80%.
>
> Agreed. Go for it.
>

Just as an FYI, by far the number one bottleneck on the multiple-worker
restores I was doing was CPU. RAM and IO were never the problem.

Sincerely,

Joshua D. Drake


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-24 06:52:52
Message-ID: 1222239172.4445.500.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2008-09-23 at 22:17 -0700, Joshua D. Drake wrote:
> Simon Riggs wrote:
> > On Tue, 2008-09-23 at 16:50 -0400, Andrew Dunstan wrote:
> >
> >> If we get all that done by November we'll have done well. And we know
> >> that in some cases just this much can lead to reductions in restore
> >> time
> >> of the order of 80%.
> >
> > Agreed. Go for it.

> Just as an FYI, by far the number one bottleneck on the multiple-worker
> restores I was doing was CPU. RAM and IO were never the problem.

It would be useful to see a full breakdown of those results.

There's always a bottleneck on something for any particular task and we
shouldn't presume the problem is only on CPU, for all data on all
systems. CPU parallelism is the most pressing problem, I agree, but I
think we will quickly hit problems without memory limits. But I agree
with Andrew that this will be a nice problem to have and not everything
is possible by Nov 1.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Joshua Drake <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-24 07:50:11
Message-ID: 200809240950.14128.dfontaine@hi-media.com
Lists: pgsql-hackers

Hi,

On Tuesday 23 September 2008, Andrew Dunstan wrote:
> In any case, my agenda goes something like this:
>
> * get it working with a basic selection algorithm on Unix (nearly
> done - keep your eyes open for a patch soon)
> * start testing
> * get it working on Windows
> * improve the selection algorithm
> * harden code

I'm not sure whether your work will feature single-table restore splitting,
but if that's the case, you could consider having a look at what I've done in
pgloader. The parallel loading work there was asked for by Simon Riggs and
Greg Smith, and it lets you test two different parallel algorithms.
The aim was to have a "simple" testbed allowing PostgreSQL hackers to choose
what to implement in pg_restore, so I still hope it'll prove useful someday :)

Regards,
--
dim


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Joshua Drake <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-24 15:48:35
Message-ID: 48DA6153.7040003@dunslane.net
Lists: pgsql-hackers

Dimitri Fontaine wrote:
> Hi,
>
> On Tuesday 23 September 2008, Andrew Dunstan wrote:
>
>> In any case, my agenda goes something like this:
>>
>> * get it working with a basic selection algorithm on Unix (nearly
>> done - keep your eyes open for a patch soon)
>> * start testing
>> * get it working on Windows
>> * improve the selection algorithm
>> * harden code
>>
>
> I'm not sure whether your work will feature single-table restore splitting,
> but if that's the case, you could consider having a look at what I've done in
> pgloader. The parallel loading work there was asked for by Simon Riggs and
> Greg Smith, and it lets you test two different parallel algorithms.
> The aim was to have a "simple" testbed allowing PostgreSQL hackers to choose
> what to implement in pg_restore, so I still hope it'll prove useful someday :)
>
>
>

No. The proposal will perform exactly the same set of steps as
single-threaded pg_restore, but in parallel. The individual steps won't
be broken up.

Quite apart from anything else, parallel data loading of individual
tables will defeat clustering, as well as making it impossible to avoid
WAL logging of the load (which I have made provision for).

The fact that custom archives are compressed by default would in fact
make parallel loading of individual tables' data difficult with the
present format. We'd have to do something like expanding it on the
client (which might not even have enough disk space) and then split it
before loading it to the server. That's pretty yucky. Alternatively,
each loader thread would need to start decompressing the data from the
start and throw away data until it got to the point it wanted to start
restoring from. Also pretty yucky.

Far better would be to provide for multiple data members in the archive
and teach pg_dump to split large tables as it writes the archive. Then
pg_restore would need comparatively little adjustment.

Also, of course, you can split tables yourself by partitioning them.
That would buy you parallel data load with what I am doing now, with no
extra work.

In any case, data loading is very far from being the only problem. One
of my clients has long running restores where the data load takes about
20% or so of the time - the rest is in index creation and the like. No
amount of table splitting will make a huge difference to them, but
parallel processing will. As against that, if your problem is in loading
one huge table, this won't help you much. However, this is not a pattern
I see much - most of my clients seem to have several large tables plus a
boatload of indexes. They will benefit a lot.

cheers

andrew


From: Joshua Drake <jd(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-24 16:04:29
Message-ID: 20080924090429.4262bea4@jd-laptop
Lists: pgsql-hackers

On Wed, 24 Sep 2008 07:52:52 +0100
Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> > Just as an FYI, by far the number one bottleneck on the multiple-worker
> > restores I was doing was CPU. RAM and IO were never the
> > problem.
>
> It would be useful to see a full breakdown of those results.

It's in the archives. We had a ginormous discussion about it about 6
months back (-hackers).

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Joshua Drake <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-24 16:16:33
Message-ID: 200809241816.36678.dfontaine@hi-media.com
Lists: pgsql-hackers

On Wednesday 24 September 2008, Andrew Dunstan wrote:
> No. The proposal will perform exactly the same set of steps as
> single-threaded pg_restore, but in parallel. The individual steps won't
> be broken up.

Ok, good for a solid, trustworthy parallel restore, which is exactly what we
want. Just out of curiosity, do you plan to use the Postgres-R helper-backend
infrastructure?

> Quite apart from anything else, parallel data loading of individual
> tables will defeat clustering, as well as making it impossible to avoid
> WAL logging of the load (which I have made provision for).

Depends on whether the different workers are able to work from the same
transaction or not, I imagine. Some work has been done to allow multiple
backends to work in the exact same transaction (Simon's snapclone and the
Postgres-R helper-backend infrastructure), so one of them could TRUNCATE the
table and give the go signal to workers to fill the table. In the same
transaction.
Ok, I need to wake up now... :)

> The fact that custom archives are compressed by default would in fact
> make parallel loading of individual tables' data difficult with the
> present format. We'd have to do something like expanding it on the
> client (which might not even have enough disk space) and then split it
> before loading it to the server. That's pretty yucky. Alternatively,
> each loader thread would need to start decompressing the data from the
> start and throw away data until it got to the point it wanted to start
> restoring from. Also pretty yucky.

Another alternative is the round-robin reader implemented in pgloader, where
all the archive reading is done by a single worker, which then splits what it
reads among any number of coworkers, filling the next queue(s) while the
previous one(s) are busy COPYing to the server.
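
For illustration only, a toy sketch of that dispatch scheme (hypothetical
names; the real pgloader hands chunks to separate worker processes):

#include <stdio.h>

#define NWORKERS 3

int
main(void)
{
    const char *chunks[] = {"chunk1", "chunk2", "chunk3",
                            "chunk4", "chunk5"};
    int nchunks = 5;

    /* one reader deals archive chunks out to worker queues in turn */
    for (int i = 0; i < nchunks; i++)
        printf("queue %d gets %s\n", i % NWORKERS, chunks[i]);
    return 0;
}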

> Far better would be to provide for multiple data members in the archive
> and teach pg_dump to split large tables as it writes the archive. Then
> pg_restore would need comparatively little adjustment.

Well, that's another possibility, but I tend to prefer having the parallelism
mechanics on the restore side of things. It may be only an illusion, but
this way I have far more trust in my backups.

> Also, of course, you can split tables yourself by partitioning them.
> That would buy you parallel data load with what I am doing now, with no
> extra work.

And that's excellent :)

> In any case, data loading is very far from being the only problem. One
> of my clients has long running restores where the data load takes about
> 20% or so of the time - the rest is in index creation and the like. No
> amount of table splitting will make a huge difference to them, but
> parallel processing will.

Oh yes, I'm running into this too (not on the same level, but still).
Parallel seqscan should help create indexes in parallel without having the
disks going crazy with read - write - read - write etc. sequences, and POSIX
advice (posix_fadvise) should help a lot here too.
Does the dependency tracker in pg_restore allow considering FK creation as
dependent on the matching PK already being there?

> As against that, if your problem is in loading
> one huge table, this won't help you much. However, this is not a pattern
> I see much - most of my clients seem to have several large tables plus a
> boatload of indexes. They will benefit a lot.

The use case given by Greg Smith at the time was loading a multi-terabyte
table on a RAID array with a lot of spindles. It then becomes impossible for a
single CPU to make full use of the available write bandwidth. No idea how
common this situation is in the field, though.

Regards,
--
dim


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Joshua Drake <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-24 16:56:34
Message-ID: 48DA7142.9060509@dunslane.net
Lists: pgsql-hackers

Dimitri Fontaine wrote:
> On Wednesday 24 September 2008, Andrew Dunstan wrote:
>
>> No. The proposal will perform exactly the same set of steps as
>> single-threaded pg_restore, but in parallel. The individual steps won't
>> be broken up.
>>
>
> Ok, good for a solid, trustworthy parallel restore, which is exactly what we
> want. Just out of curiosity, do you plan to use the Postgres-R helper-backend
> infrastructure?
>

This is purely a patch to pg_restore. There are no backend changes at all
(and if there were, they would not use anything that isn't in core anyway).
>
>> Quite apart from anything else, parallel data loading of individual
>> tables will defeat clustering, as well as making it impossible to avoid
>> WAL logging of the load (which I have made provision for).
>>
>
> Depends on whether the different workers are able to work from the same
> transaction or not, I imagine. Some work has been done to allow multiple
> backends to work in the exact same transaction (Simon's snapclone and the
> Postgres-R helper-backend infrastructure), so one of them could TRUNCATE the
> table and give the go signal to workers to fill the table. In the same
> transaction.
> Ok, I need to wake up now... :)
>
>

Again, I am not doing anything on the backend. I am following Tom's
original suggestion of simply having pg_restore run steps in parallel,
with no backend changes.

Also, you ignored the point about clustered data. Maybe that doesn't
matter to some people, but it does to others. This is designed to
provide the same result as a single threaded pg_restore. Splitting data
will break that.

>> The fact that custom archives are compressed by default would in fact
>> make parallel loading of individual tables' data difficult with the
>> present format. We'd have to do something like expanding it on the
>> client (which might not even have enough disk space) and then split it
>> before loading it to the server. That's pretty yucky. Alternatively,
>> each loader thread would need to start decompressing the data from the
>> start and throw away data until it got to the point it wanted to start
>> restoring from. Also pretty yucky.
>>
>
> Another alternative is the round-robin reader implemented in pgloader, where
> all the archive reading is done by a single worker, which then splits what it
> reads among any number of coworkers, filling the next queue(s) while the
> previous one(s) are busy COPYing to the server.
>
>
>> Far better would be to provide for multiple data members in the archive
>> and teach pg_dump to split large tables as it writes the archive. Then
>> pg_restore would need comparatively little adjustment.
>>
>
> Well, that's another possibility, but I tend to prefer having the parallelism
> mechanics on the restore side of things. It may be only an illusion, but
> this way I have far more trust in my backups.
>

Having pg_dump do the split would mean you get it for free, pretty much.
Rejecting that for a solution that could well be a bottleneck at restore
time would require lots more than just a feeling. I don't see how it
would give you any less reason to trust your backups.
>
>> Also, of course, you can split tables yourself by partitioning them.
>> That would buy you parallel data load with what I am doing now, with no
>> extra work.
>>
>
> And that's excellent :)
>
>
>> In any case, data loading is very far from being the only problem. One
>> of my clients has long running restores where the data load takes about
>> 20% or so of the time - the rest is in index creation and the like. No
>> amount of table splitting will make a huge difference to them, but
>> parallel processing will.
>>
>
> Oh yes, I'm running into this too (not on the same level, but still).
> Parallel seqscan should help create indexes in parallel without having the
> disks going crazy with read - write - read - write etc. sequences, and POSIX
> advice (posix_fadvise) should help a lot here too.
> Does the dependency tracker in pg_restore allow considering FK creation as
> dependent on the matching PK already being there?
>

I believe so. If not, that's a bug and we should fix it IMNSHO.

>
>> As against that, if your problem is in loading
>> one huge table, this won't help you much. However, this is not a pattern
>> I see much - most of my clients seem to have several large tables plus a
>> boatload of indexes. They will benefit a lot.
>>
>
> The use case given by Greg Smith at the time was loading a multi-terabyte
> table on a RAID array with a lot of spindles. It then becomes impossible for a
> single CPU to make full use of the available write bandwidth. No idea how
> common this situation is in the field, though.
>
>
>

I still think the multiple data members of the archive approach would be
best here. One that allowed you to tell pg_dump to split every nn rows,
or every nn megabytes. Quite apart from any parallelism issues, that
could help enormously when there is a data problem as happens from time
to time, and can get quite annoying if it's in the middle of a humungous
data load.

cheers

andrew


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Joshua Drake <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: parallel pg_restore
Date: 2008-09-24 19:10:44
Message-ID: 464C4848-FDEC-40A3-90D1-0BE285CA0C46@hi-media.com
Lists: pgsql-hackers

On 24 Sep 08 at 18:56, Andrew Dunstan wrote:
> This is purely a patch to pg_restore. There are no backend changes at all
> (and if there were, they would not use anything that isn't in core anyway).

Ok, good.
I'm eager to see what -core hackers will want to do with Postgres-R
patches, but that shouldn't be a reason to distract them, sorry...

> Also, you ignored the point about clustered data. Maybe that doesn't
> matter to some people, but it does to others. This is designed to
> provide the same result as a single threaded pg_restore. Splitting
> data will break that.

I'm not sure I understand what you mean by "clustered data" here, in
fact...

> Having pg_dump do the split would mean you get it for free, pretty
> much. Rejecting that for a solution that could well be a bottleneck
> at restore time would require lots more than just a feeling. I don't
> see how it would give you any less reason to trust your backups.

Well, when pg_restore's COPY fails, the table is not loaded and you get
an ERROR, and if you're running with the -1 option, the restore stops
there and you get a nice ROLLBACK.
With the latter option, even if pg_dump did split your tables, the
ROLLBACK still happens.

Now, what happens when only one part of the data cannot be restored
but you didn't run pg_restore -1? I guess you're simply left with a
partially restored table. How will you know which part contains the
error? How will you replay the restore of that part only?

If the answer is to play with the restore catalogue, OK; if that's not
it, I feel the dumps are now less trustworthy with the split
option than they were before.

Of course all this remains hypothetical, as your work does not include
such a feature, which as we see is yet to be designed.

> I still think the multiple data members of the archive approach
> would be best here. One that allowed you to tell pg_dump to split
> every nn rows, or every nn megabytes. Quite apart from any
> parallelism issues, that could help enormously when there is a data
> problem as happens from time to time, and can get quite annoying if
> it's in the middle of a humungous data load.

Agreed, but it depends a lot on the ways to control the part that
failed, IMVHO. And I think we'd prefer to have a version of COPY FROM
with the capability to continue loading on failure...

Regards,
--
dim
