Re: [COMMITTERS] pgsql: Add parallel pg_dump option.

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Add parallel pg_dump option.
Date: 2013-03-24 15:39:44
Message-ID: E1UJn1M-0001uh-Nw@gemulon.postgresql.org
Lists: pgsql-committers pgsql-hackers

Add parallel pg_dump option.

New infrastructure is added which creates a set number of workers
(threads on Windows, forked processes on Unix). Jobs are then
handed out to these workers by the master process as needed.
pg_restore is adjusted to use this new infrastructure in place of the
old setup which created a new worker for each step on the fly. Parallel
dumps acquire a cloned snapshot, where one is available, in order to
stay consistent.

The parallel option is selected by the -j / --jobs command line
parameter of pg_dump.
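
A minimal sketch of what such an invocation might look like. As the
replies in this thread note, --jobs requires --format=directory, which
in turn needs a --file target. The helper name parallel_dump_cmd, the
database name mydb and the directory ./mydb.dump are assumptions made
for illustration only; the helper prints the command rather than running
it, so it can be tried without a server.

```shell
#!/bin/sh
# parallel_dump_cmd JOBS OUTDIR DBNAME
# Hypothetical helper (not part of PostgreSQL): prints a parallel
# pg_dump command line instead of executing it.
parallel_dump_cmd() {
    njobs=$1    # number of workers (-j / --jobs)
    outdir=$2   # output directory (-f / --file)
    dbname=$3   # database to dump
    printf 'pg_dump --jobs=%s --format=directory --file=%s %s\n' \
        "$njobs" "$outdir" "$dbname"
}

# Example values are assumptions, not taken from the commit.
parallel_dump_cmd 4 ./mydb.dump mydb
```

Running the printed command against a real server would write the dump
into ./mydb.dump using four workers.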

Joachim Wieland, lightly editorialized by Andrew Dunstan.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/9e257a181cc1dc5e19eb5d770ce09cc98f470f5f

Modified Files
--------------
doc/src/sgml/backup.sgml | 18 +
doc/src/sgml/perform.sgml | 9 +
doc/src/sgml/ref/pg_dump.sgml | 89 +++-
src/bin/pg_dump/Makefile | 2 +-
src/bin/pg_dump/compress_io.c | 10 +
src/bin/pg_dump/dumputils.c | 86 ++-
src/bin/pg_dump/dumputils.h | 13 +-
src/bin/pg_dump/parallel.c | 1293 +++++++++++++++++++++++++++++++++
src/bin/pg_dump/parallel.h | 85 +++
src/bin/pg_dump/pg_backup.h | 11 +-
src/bin/pg_dump/pg_backup_archiver.c | 735 +++++++------------
src/bin/pg_dump/pg_backup_archiver.h | 35 +-
src/bin/pg_dump/pg_backup_custom.c | 88 +++-
src/bin/pg_dump/pg_backup_db.c | 20 +-
src/bin/pg_dump/pg_backup_directory.c | 264 +++++++-
src/bin/pg_dump/pg_backup_tar.c | 8 +-
src/bin/pg_dump/pg_dump.c | 681 ++++++++++--------
src/bin/pg_dump/pg_dump.h | 3 +
src/bin/pg_dump/pg_dump_sort.c | 92 +++-
src/bin/pg_dump/pg_dumpall.c | 20 +-
src/bin/pg_dump/pg_restore.c | 17 +-
src/tools/msvc/Mkvcbuild.pm | 5 +
22 files changed, 2765 insertions(+), 819 deletions(-)


From: David Fetter <david(at)fetter(dot)org>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add parallel pg_dump option.
Date: 2013-03-29 19:12:40
Message-ID: 20130329191240.GJ17360@fetter.org
Lists: pgsql-committers pgsql-hackers

On Sun, Mar 24, 2013 at 03:39:44PM +0000, Andrew Dunstan wrote:
> Add parallel pg_dump option.

This is great!

While testing, I noticed that the only supported -F option when -j is
specified is directory, which is fine as far as it goes, but I think
it would be easier on users if there were some default like ./pg_dump
when -j is specified. It could of course be overridden by -Fd.

What say?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add parallel pg_dump option.
Date: 2013-03-30 00:04:36
Message-ID: 51562C14.3080101@dunslane.net
Lists: pgsql-committers pgsql-hackers


On 03/29/2013 03:12 PM, David Fetter wrote:
> On Sun, Mar 24, 2013 at 03:39:44PM +0000, Andrew Dunstan wrote:
>> Add parallel pg_dump option.
> This is great!
>
> While testing, I noticed that the only supported -F option when -j is
> specified is directory, which is fine as far as it goes, but I think
> it would be easier on users if there were some default like ./pg_dump
> when -j is specified. It could of course be overridden by -Fd.
>
> What say?
>

What you really want is to supply a default for --file in this case. It
would seem very odd to do so when --jobs is specified but not when it
isn't. You already need to specify a --file value when
--format=directory is in use, and the addition of --jobs has not changed
that; neither has the fact that --jobs requires --format=directory to be
specified. Given that, this can be seen as a feature request that could
be considered for 9.4 (although I'd be skeptical, TBH), but it surely
isn't something that's broken and needs to be fixed, and we're long past
the point where we should be adding new design for 9.3.
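
The two constraints described above (--jobs needs --format=directory,
and --format=directory needs --file) can be sketched as a small
pre-flight check. The function check_dump_opts and its messages are
hypothetical and written only to illustrate the rules; pg_dump does its
own validation and its actual error text will differ.

```shell
#!/bin/sh
# check_dump_opts FORMAT FILE JOBS
# Hypothetical wrapper-side check, not pg_dump's own code.
check_dump_opts() {
    format=$1   # value that would be passed to --format
    file=$2     # value that would be passed to --file ("" if none)
    njobs=$3    # value that would be passed to --jobs (1 = not parallel)

    # Rule 1: a parallel dump is only possible in directory format.
    if [ "$njobs" -gt 1 ] && [ "$format" != "directory" ]; then
        echo "error: --jobs requires --format=directory"
        return 1
    fi
    # Rule 2: directory format always needs an output target,
    # with or without --jobs.
    if [ "$format" = "directory" ] && [ -z "$file" ]; then
        echo "error: --format=directory requires --file"
        return 1
    fi
    echo "ok"
}

check_dump_opts directory ./mydb.dump 4
check_dump_opts custom mydb.dump 4 || true
check_dump_opts directory "" 1 || true
```

The third call shows the point being made: the --file requirement
comes from the directory format and applies even when --jobs is not
given.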

cheers

andrew