Re: WIP patch for parallel pg_dump

From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-26 03:13:23
Message-ID: AANLkTik_-7HATrAYDTwuhgpx03vbb30f8OeN7H+4VWEo@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 6, 2010 at 7:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> >> However, if you were doing something like parallel pg_dump you could
> >> just run the parent and child instances all against the slave, so the
> >> pg_dump scenario doesn't seem to offer much of a supporting use-case for
> >> worrying about this. When would you really need to be able to do it?
>
> > If you had several standbys, you could distribute the work of the
> > pg_dump among them. This would be a huge speedup for a large database,
> > potentially, thanks to parallelization of I/O and network. Imagine
> > doing a pg_dump of a 300GB database in 10min.
>
> That does sound kind of attractive. But to do that I think we'd have to
> go with the pass-the-snapshot-through-the-client approach. Shipping
> internal snapshot files through the WAL stream doesn't seem attractive
> to me.
>
> While I see Robert's point about preferring not to expose the snapshot
> contents to clients, I don't think it outweighs all other considerations
> here; and every other one is pointing to doing it the other way.
>
>
How about this: the publishing transaction puts the snapshot in a (new)
system table and passes a UUID to its children; the joining transactions
then look that UUID up in the system table under a dirty snapshot
(SnapshotAny), via a security-definer function owned by a superuser.
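
Purely to make that concrete, here is a rough, untested C sketch of what
the lookup side might look like. The catalog name pg_published_snapshots,
its Anum_*/OID constants, and the function name are all invented for
illustration; the C function would be exposed through a SECURITY DEFINER
SQL wrapper so ordinary users cannot scan the table directly.

#include "postgres.h"
#include "access/heapam.h"
#include "access/skey.h"
#include "fmgr.h"
#include "utils/fmgroids.h"
#include "utils/tqual.h"

PG_FUNCTION_INFO_V1(join_published_snapshot);

/* Look up a published snapshot by UUID, seeing uncommitted rows. */
Datum
join_published_snapshot(PG_FUNCTION_ARGS)
{
    Datum        snapuuid = PG_GETARG_DATUM(0);
    Relation     rel;
    HeapScanDesc scan;
    HeapTuple    tup;
    ScanKeyData  key;
    bool         found;

    /* Match on the UUID handed down by the publishing transaction. */
    ScanKeyInit(&key,
                Anum_pg_published_snapshots_uuid,    /* hypothetical */
                BTEqualStrategyNumber, F_UUID_EQ,
                snapuuid);

    rel = heap_open(PublishedSnapshotRelationId,     /* hypothetical */
                    AccessShareLock);

    /*
     * Scan under SnapshotAny so the row inserted by the still-open
     * publishing transaction is visible to this backend.
     */
    scan = heap_beginscan(rel, SnapshotAny, 1, &key);
    tup = heap_getnext(scan, ForwardScanDirection);
    found = HeapTupleIsValid(tup);

    if (found)
    {
        /* ... deserialize the stored snapshot and install it ... */
    }

    heap_endscan(scan);
    heap_close(rel, AccessShareLock);

    PG_RETURN_BOOL(found);
}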

No shared memory would be used, and if the table were WAL-logged, the
snapshot would make it to the slaves too.

I realize SnapshotAny wouldn't be sufficient, since we want the tuple to
become invisible as soon as the publishing transaction ends (commit or
rollback); hence something akin to a (new) HeapTupleSatisfiesStillRunning()
would be needed.
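
To illustrate the idea only (under invented names, and glossing over hint
bits, subtransactions, and buffer locking), such a routine might boil down
to checking that the inserting transaction is still in progress, in the
style of the HeapTupleSatisfies* routines in
src/backend/utils/time/tqual.c:

#include "postgres.h"
#include "access/htup.h"
#include "access/transam.h"
#include "storage/buf.h"
#include "storage/procarray.h"
#include "utils/snapshot.h"

/*
 * Sketch only: a published-snapshot tuple is visible exactly as long
 * as the transaction that inserted it (the publisher) is running.
 */
bool
HeapTupleSatisfiesStillRunning(HeapTupleHeader tuple,
                               Snapshot snapshot, Buffer buffer)
{
    TransactionId xmin = HeapTupleHeaderGetXmin(tuple);

    /* Bootstrap/frozen xids cannot belong to a live publisher. */
    if (!TransactionIdIsNormal(xmin))
        return false;

    /*
     * Visible iff the publishing transaction is still in progress;
     * the moment it commits or aborts, the snapshot row vanishes
     * from every backend's view, which is the behavior we want.
     */
    return TransactionIdIsInProgress(xmin);
}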

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh(dot)gurjeet(at){ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device
