WIP patch for parallel pg_dump

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: WIP patch for parallel pg_dump
Date: 2010-11-14 23:52:55
Message-ID: AANLkTin27_TOVU5KF90Ou3qGnT+d76JPgaDbrDLZBaxV@mail.gmail.com
Lists: pgsql-hackers

This is the second patch for parallel pg_dump, now the actual part that
parallelizes the whole thing. More precisely, it adds parallel
backup/restore to pg_dump/pg_restore for the directory archive format
and keeps the parallel restore part of the custom archive format. My
directory archive format patch also includes a prototype of liblzf
compression, so you can combine this compression with any of the
backup/restore scenarios just mentioned. This patch applies on top of
the previous directory patch.

You would run a regular parallel dump with

$ pg_dump -j 4 -Fd -f out.dir dbname

In previous discussions there was a request to add support for multiple
directories, which I have done as well, so that you can also run

$ pg_dump -j 4 -Fd -f dir1:dir2:dir3 dbname

to distribute the data equally among those three directories (we can
still discuss the syntax; I am not all that happy with the colon
either...)

The dump would always start with the largest objects, determined by
looking at the relpages column of pg_class, which should give a good
estimate. The order of the objects to restore is determined by the
dependencies among the objects (as is already done in the parallel
restore of the custom archive format).
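
To illustrate, the estimate is essentially what a query along these
lines returns (ordinary tables only, largest first; relpages is only
maintained by VACUUM/ANALYZE, so it is an approximation, but good
enough for scheduling):

    SELECT c.oid::regclass AS table_name, c.relpages
    FROM pg_class c
    WHERE c.relkind = 'r'       -- ordinary tables
    ORDER BY c.relpages DESC;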

The file test.sh includes some example commands that I have run here as
a kind of regression test; they should give you an impression of how to
call it from the command line.

One thing that is currently missing is proper support for Windows; this
is the next thing that I will be working on. Also, this version still
gives quite a bunch of debug information about what the processes are
doing, so don't try to pipe the pg_dump output anywhere (even when not
run in parallel); it will probably just not work...

The missing part that would make parallel pg_dump work with no strings
attached is snapshot synchronization. As long as there are no
synchronized snapshots, you would need to stop writing to your database
before starting the parallel pg_dump. However, it turns out that most
often when you are especially concerned about a fast dump, you have
shut down your applications anyway (which is the reason why you are so
concerned about speed in the first place). These cases are typically
database migrations from one host/platform to another, or database
upgrades without pg_migrator.

Joachim

Attachment: pg_dump-parallel.diff (text/x-patch, 132.4 KB)

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 05:39:08
Message-ID: AANLkTikQpktCDi6vPZtXz1Pv2uSxRoBn6wJ0Y_MVPeDD@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 14, 2010 at 6:52 PM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
> You would run a regular parallel dump with
>
> $ pg_dump -j 4 -Fd -f out.dir dbname

So this is an updated series of patches for my parallel pg_dump WIP
patch. Most importantly, it now runs on Windows once you get it to
compile there (I have added the new files to the respective project in
Mkvcbuild.pm, but I wondered why the other archive formats do not need
to be defined in that file...).

So far nobody has volunteered to review this patch. It would be great
if people could at least check it out, run it and let me know if it
works and if they have any comments.

I have put all four patches in a tar archive; the patches must be
applied sequentially:

1. pg_dump_compression-refactor.diff
2. pg_dump_directory.diff
3. pg_dump_directory_parallel.diff
4. pg_dump_directory_parallel_lzf.diff

The compression-refactor patch does not include Heikki's latest changes yet.

And the last of the four patches adds LZF compression for whoever
wants to try that out. You need to link against an already installed
liblzf and call it with --compress-lzf.

Joachim

Attachment: pg_dump_parallel.tar.gz (application/x-gzip, 60.9 KB)

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 11:19:20
Message-ID: 4CF780B8.3060202@enterprisedb.com
Lists: pgsql-hackers

On 02.12.2010 07:39, Joachim Wieland wrote:
> On Sun, Nov 14, 2010 at 6:52 PM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
>> You would run a regular parallel dump with
>>
>> $ pg_dump -j 4 -Fd -f out.dir dbname
>
> So this is an updated series of patches for my parallel pg_dump WIP
> patch. Most importantly it now runs on Windows once you get it to
> compile there (I have added the new files to the respective project of
> Mkvcbuild.pm but I wondered why the other archive formats do not need
> to be defined in that file...).
>
> So far nobody has volunteered to review this patch. It would be great
> if people could at least check it out, run it and let me know if it
> works and if they have any comments.

That's a big patch..

I don't see the point of the sort-by-relpages code. The order the
objects are dumped should be irrelevant, as long as you obey the
restrictions dictated by dependencies. Or is it only needed for the
multiple-target-dirs feature? Frankly I don't see the point of that, so
it would be good to cull it out at least in this first stage.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 12:00:21
Message-ID: 8762vc1j1m.fsf@hi-media-techno.com
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I don't see the point of the sort-by-relpages code. The order the objects
> are dumped should be irrelevant, as long as you obey the restrictions
> dictated by dependencies. Or is it only needed for the multiple-target-dirs
> feature? Frankly I don't see the point of that, so it would be good to cull
> it out at least in this first stage.

From the talk at CHAR(10), and provided memory serves, it's an
optimisation so that you're dumping the largest file in one process and
all the little files in other processes. In lots of cases the total
pg_dump duration is then reduced to about the time to dump the biggest
files.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 13:34:08
Message-ID: AANLkTinTg80dvcraEEpNbUYWoR+EbUYEZso+7TkF2YGO@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 2, 2010 at 6:19 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I don't see the point of the sort-by-relpages code. The order the objects
> are dumped should be irrelevant, as long as you obey the restrictions
> dictated by dependencies. Or is it only needed for the multiple-target-dirs
> feature? Frankly I don't see the point of that, so it would be good to cull
> it out at least in this first stage.

A guy called Dimitri Fontaine actually proposed the
several-directories feature here and other people liked the idea.

http://archives.postgresql.org/pgsql-hackers/2008-02/msg01061.php :-)

The code doesn't change much with or without it, and if people are no
longer in favour of it, I have no problem with taking it out.

As Dimitri has already pointed out, the relpage sorting thing is there
to start with the largest table(s) first.

Joachim


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 13:50:32
Message-ID: 87lj48z3kn.fsf@hi-media-techno.com
Lists: pgsql-hackers

Joachim Wieland <joe(at)mcknight(dot)de> writes:
> A guy called Dimitri Fontaine actually proposed the
> several-directories feature here and other people liked the idea.

Hehe :)

Reading that now, it could be that I didn't know at the time that given
a powerful enough disk subsystem there's no way to saturate it with one
CPU. So the use case of parallel dump to a bunch of user-given locations
would be to use different mount points (disk subsystems) at the same
time. Not sure how relevant it is.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 17:56:39
Message-ID: 4CF7DDD7.1040607@agliodbs.com
Lists: pgsql-hackers

On 12/02/2010 05:50 AM, Dimitri Fontaine wrote:
> So the use case of parallel dump to a bunch of user-given locations
> would be to use different mount points (disk subsystems) at the same
> time. Not sure how relevant it is.

I think it will complicate this feature unnecessarily for 9.1.
Personally, I need this patch so much I'm thinking of backporting it.
However, having all the data go to one directory/mount wouldn't trouble
me at all.

Now, if only I could think of some way to write a parallel dump to a set
of pipes, I'd be in heaven.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 18:10:45
Message-ID: 4CF7E125.7050007@dunslane.net
Lists: pgsql-hackers

On 12/02/2010 12:56 PM, Josh Berkus wrote:
> On 12/02/2010 05:50 AM, Dimitri Fontaine wrote:
>> So the use case of parallel dump to a bunch of user-given locations
>> would be to use different mount points (disk subsystems) at the same
>> time. Not sure how relevant it is.
>
> I think it will complicate this feature unnecessarily for 9.1.
> Personally, I need this patch so much I'm thinking of backporting it.
> However, having all the data go to one directory/mount wouldn't
> trouble me at all.
>
> Now, if only I could think of some way to write a parallel dump to a
> set of pipes, I'd be in heaven.

The only way I can see that working sanely would be to have a program
gathering stuff at the other end of the pipes, and ensuring it was all
coherent. That would be a huge growth in scope for this, and I seriously
doubt it's worth it.

cheers

andrew


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 18:18:45
Message-ID: 4CF7E305.7060102@agliodbs.com
Lists: pgsql-hackers


>> Now, if only I could think of some way to write a parallel dump to a
>> set of pipes, I'd be in heaven.
>
> The only way I can see that working sanely would be to have a program
> gathering stuff at the other end of the pipes, and ensuring it was all
> coherent. That would be a huge growth in scope for this, and I seriously
> doubt it's worth it.

Oh, no question. And there are workarounds ... sshfs, for example. I'm
just thinking of the ad-hoc parallel backup I'm running today, which
relies heavily on pipes.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 19:13:13
Message-ID: AANLkTimCN53_hGmi9saEqEdWw=yZcRVMQuCZVLXHgLpt@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 2, 2010 at 12:56 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Now, if only I could think of some way to write a parallel dump to a set of
> pipes, I'd be in heaven.

What exactly are you trying to accomplish with the pipes?

Joachim


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 22:01:16
Message-ID: 3474.1291327276@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> That's a big patch..

Not nearly big enough :-(

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables. I fail to understand
why that consideration can be allowed to go by the wayside now.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 22:13:37
Message-ID: 4CF81A11.7010605@dunslane.net
Lists: pgsql-hackers

On 12/02/2010 05:01 PM, Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> That's a big patch..
> Not nearly big enough :-(
>
> In the past, proposals for this have always been rejected on the grounds
> that it's impossible to assure a consistent dump if different
> connections are used to read different tables. I fail to understand
> why that consideration can be allowed to go by the wayside now.
>
>

Well, snapshot cloning should allow that objection to be overcome, no?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 22:32:16
Message-ID: 8927.1291329136@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 12/02/2010 05:01 PM, Tom Lane wrote:
>> In the past, proposals for this have always been rejected on the grounds
>> that it's impossible to assure a consistent dump if different
>> connections are used to read different tables. I fail to understand
>> why that consideration can be allowed to go by the wayside now.

> Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first not second.

(I'm not actually convinced that snapshot cloning is the only problem
here; locking could be an issue too, if there are concurrent processes
trying to take locks that will conflict with pg_dump's. But the
snapshot issue is definitely a showstopper.)

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 23:12:15
Message-ID: 201012022312.oB2NCF119818@momjian.us
Lists: pgsql-hackers

Dimitri Fontaine wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> > I don't see the point of the sort-by-relpages code. The order the objects
> > are dumped should be irrelevant, as long as you obey the restrictions
> > dictated by dependencies. Or is it only needed for the multiple-target-dirs
> > feature? Frankly I don't see the point of that, so it would be good to cull
> > it out at least in this first stage.
>
> From the talk at CHAR(10), and provided memory serves, it's an
> optimisation so that you're dumping the largest file in one process and
> all the little files in other processes. In lots of cases the total
> pg_dump duration is then reduced to about the time to dump the biggest
> files.

Seems there should be a comment in the code explaining why this is being
done.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-02 23:51:58
Message-ID: 4CF8311E.4050508@dunslane.net
Lists: pgsql-hackers

On 12/02/2010 05:32 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 12/02/2010 05:01 PM, Tom Lane wrote:
>>> In the past, proposals for this have always been rejected on the grounds
>>> that it's impossible to assure a consistent dump if different
>>> connections are used to read different tables. I fail to understand
>>> why that consideration can be allowed to go by the wayside now.
>> Well, snapshot cloning should allow that objection to be overcome, no?
> Possibly, but we need to see that patch first not second.

Yes, I agree with that.

> (I'm not actually convinced that snapshot cloning is the only problem
> here; locking could be an issue too, if there are concurrent processes
> trying to take locks that will conflict with pg_dump's. But the
> snapshot issue is definitely a showstopper.)
>
>

Why is that more an issue with parallel pg_dump?

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 00:13:46
Message-ID: AANLkTi=Ub5sddoWS+O302o9CjfM8eezfUWw09rE_J6Xe@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 2, 2010 at 5:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 12/02/2010 05:01 PM, Tom Lane wrote:
>>> In the past, proposals for this have always been rejected on the grounds
>>> that it's impossible to assure a consistent dump if different
>>> connections are used to read different tables.  I fail to understand
>>> why that consideration can be allowed to go by the wayside now.
>
>> Well, snapshot cloning should allow that objection to be overcome, no?
>
> Possibly, but we need to see that patch first not second.

Yes, by all means let's allow the perfect to be the enemy of the good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 00:21:28
Message-ID: 4CF83808.2010307@dunslane.net
Lists: pgsql-hackers

On 12/02/2010 07:13 PM, Robert Haas wrote:
> On Thu, Dec 2, 2010 at 5:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>>> On 12/02/2010 05:01 PM, Tom Lane wrote:
>>>> In the past, proposals for this have always been rejected on the grounds
>>>> that it's impossible to assure a consistent dump if different
>>>> connections are used to read different tables. I fail to understand
>>>> why that consideration can be allowed to go by the wayside now.
>>> Well, snapshot cloning should allow that objection to be overcome, no?
>> Possibly, but we need to see that patch first not second.
> Yes, by all means let's allow the perfect to be the enemy of the good.
>

That seems like a bit of an easy shot. Requiring that parallel pg_dump
produce a dump that is as consistent as non-parallel pg_dump currently
produces isn't unreasonable. It's not stopping us moving forward, it's
just not wanting to go backwards.

And it shouldn't be terribly hard. IIRC Joachim has already done some
work on it.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 00:48:33
Message-ID: AANLkTimrvQOy5w+-Z1fztgrQYVrZFjBi2=garS38Kg6P@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 2, 2010 at 7:21 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>>>> In the past, proposals for this have always been rejected on the
>>>>> grounds
>>>>> that it's impossible to assure a consistent dump if different
>>>>> connections are used to read different tables.  I fail to understand
>>>>> why that consideration can be allowed to go by the wayside now.
>>>> Well, snapshot cloning should allow that objection to be overcome, no?
>>> Possibly, but we need to see that patch first not second.
>> Yes, by all means let's allow the perfect to be the enemy of the good.
>>
>
> That seems like a bit of an easy shot. Requiring that parallel pg_dump
> produce a dump that is as consistent as non-parallel pg_dump currently
> produces isn't unreasonable. It's not stopping us moving forward, it's just
> not wanting to go backwards.

I certainly agree that would be nice. But if Joachim thought the
patch were useless without that, perhaps he wouldn't have bothered
writing it at this point. In fact, he doesn't think that, and he
mentioned the use cases he sees in his original post. But even
supposing you wouldn't personally find this useful in those
situations, how can you possibly say that HE wouldn't find it useful
in those situations? I understand that people sometimes show up here
and ask for ridiculous things, but I don't think we should be too
quick to attribute ridiculousness to regular contributors.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 01:11:48
Message-ID: 4CF843D4.2010004@dunslane.net
Lists: pgsql-hackers

On 12/02/2010 07:48 PM, Robert Haas wrote:
> On Thu, Dec 2, 2010 at 7:21 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>>>>> In the past, proposals for this have always been rejected on the
>>>>>> grounds
>>>>>> that it's impossible to assure a consistent dump if different
>>>>>> connections are used to read different tables. I fail to understand
>>>>>> why that consideration can be allowed to go by the wayside now.
>>>>> Well, snapshot cloning should allow that objection to be overcome, no?
>>>> Possibly, but we need to see that patch first not second.
>>> Yes, by all means let's allow the perfect to be the enemy of the good.
>>>
>> That seems like a bit of an easy shot. Requiring that parallel pg_dump
>> produce a dump that is as consistent as non-parallel pg_dump currently
>> produces isn't unreasonable. It's not stopping us moving forward, it's just
>> not wanting to go backwards.
> I certainly agree that would be nice. But if Joachim thought the
> patch were useless without that, perhaps he wouldn't have bothered
> writing it at this point. In fact, he doesn't think that, and he
> mentioned the use cases he sees in his original post. But even
> supposing you wouldn't personally find this useful in those
> situations, how can you possibly say that HE wouldn't find it useful
> in those situations? I understand that people sometimes show up here
> and ask for ridiculous things, but I don't think we should be too
> quick to attribute ridiculousness to regular contributors.

Umm, nobody has attributed ridiculousness to anyone. Please don't put
words in my mouth. But I think this is a perfectly reasonable discussion
to have. Nobody gets to come along and get the features they want
without some sort of consensus, not me, not you, not Joachim, not Tom.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 01:59:04
Message-ID: 332D8050-2479-4D0F-95A9-765B89268E83@gmail.com
Lists: pgsql-hackers

On Dec 2, 2010, at 8:11 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Umm, nobody has attributed ridiculousness to anyone. Please don't put words in my mouth. But I think this is a perfectly reasonable discussion to have. Nobody gets to come along and get the features they want without some sort of consensus, not me, not you, not Joachim, not Tom.

I'm not disputing that we COULD reject the patch. I AM disputing that we've made a cogent argument for doing so.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 02:09:59
Message-ID: 9536.1291342199@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 12/02/2010 05:32 PM, Tom Lane wrote:
>> (I'm not actually convinced that snapshot cloning is the only problem
>> here; locking could be an issue too, if there are concurrent processes
>> trying to take locks that will conflict with pg_dump's. But the
>> snapshot issue is definitely a showstopper.)

> Why is that more an issue with parallel pg_dump?

The scenario that bothers me is

1. pg_dump parent process AccessShareLocks everything to be dumped.

2. somebody else tries to acquire AccessExclusiveLock on table foo.

3. pg_dump child process is told to dump foo, tries to acquire
AccessShareLock.

Now, process 3 is blocked behind process 2 is blocked behind process 1
which is waiting for 3 to complete. Can you say "undetectable deadlock"?
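
For illustration, in SQL terms the lock queue builds up like this (foo
as above; sessions numbered like the processes):

    -- process 1 (pg_dump parent)
    BEGIN;
    LOCK TABLE foo IN ACCESS SHARE MODE;        -- granted

    -- process 2 (some other client)
    BEGIN;
    LOCK TABLE foo IN ACCESS EXCLUSIVE MODE;    -- waits behind process 1

    -- process 3 (pg_dump child)
    BEGIN;
    LOCK TABLE foo IN ACCESS SHARE MODE;        -- queues behind process 2

The deadlock detector can't see the edge from process 1 to process 3,
because the parent's wait for its child happens outside the database.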

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 02:24:47
Message-ID: 4CF854EF.3080100@dunslane.net
Lists: pgsql-hackers

On 12/02/2010 09:09 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 12/02/2010 05:32 PM, Tom Lane wrote:
>>> (I'm not actually convinced that snapshot cloning is the only problem
>>> here; locking could be an issue too, if there are concurrent processes
>>> trying to take locks that will conflict with pg_dump's. But the
>>> snapshot issue is definitely a showstopper.)
>> Why is that more an issue with parallel pg_dump?
> The scenario that bothers me is
>
> 1. pg_dump parent process AccessShareLocks everything to be dumped.
>
> 2. somebody else tries to acquire AccessExclusiveLock on table foo.
> 3. pg_dump child process is told to dump foo, tries to acquire
> AccessShareLock.
>
> Now, process 3 is blocked behind process 2 is blocked behind process 1
> which is waiting for 3 to complete. Can you say "undetectable deadlock"?
>
>

Hmm. Yeah. Maybe we could get around it if we prefork the workers and
they all acquire locks on everything to be dumped up front in nowait
mode, right after the parent, and if they can't the whole dump fails. Or
something along those lines.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 02:33:58
Message-ID: 11743.1291343638@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Umm, nobody has attributed ridiculousness to anyone. Please don't put
> words in my mouth. But I think this is a perfectly reasonable discussion
> to have. Nobody gets to come along and get the features they want
> without some sort of consensus, not me, not you, not Joachim, not Tom.

In particular, this issue *has* been discussed before, and there was a
consensus that preserving dump consistency was a requirement. I don't
think that Joachim gets to bypass that decision just by submitting a
patch that ignores it.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 02:41:53
Message-ID: 11860.1291344113@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 12/02/2010 09:09 PM, Tom Lane wrote:
>> Now, process 3 is blocked behind process 2 is blocked behind process 1
>> which is waiting for 3 to complete. Can you say "undetectable deadlock"?

> Hmm. Yeah. Maybe we could get around it if we prefork the workers and
> they all acquire locks on everything to be dumped up front in nowait
> mode, right after the parent, and if they can't the whole dump fails. Or
> something along those lines.

[ thinks for a bit... ] Actually it might be good enough if a child
simply takes the lock it needs in nowait mode, and reports failure on
error. We know the parent already has that lock, so the only way that
the child's request can fail is if something conflicting with
AccessShareLock is queued up behind the parent's lock. So failure to
get the child lock immediately proves that the deadlock case applies.
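
In other words, a sketch of what each child would run, nothing fancier
than:

    LOCK TABLE foo IN ACCESS SHARE MODE NOWAIT;

If that errors out, something conflicting must be queued behind the
parent's lock, which is exactly the deadlock case, so the child reports
failure instead of waiting.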

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 03:03:33
Message-ID: 4CF85E05.7050706@dunslane.net
Lists: pgsql-hackers

On 12/02/2010 09:41 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 12/02/2010 09:09 PM, Tom Lane wrote:
>>> Now, process 3 is blocked behind process 2 is blocked behind process 1
>>> which is waiting for 3 to complete. Can you say "undetectable deadlock"?
>> Hmm. Yeah. Maybe we could get around it if we prefork the workers and
>> they all acquire locks on everything to be dumped up front in nowait
>> mode, right after the parent, and if they can't the whole dump fails. Or
>> something along those lines.
> [ thinks for a bit... ] Actually it might be good enough if a child
> simply takes the lock it needs in nowait mode, and reports failure on
> error. We know the parent already has that lock, so the only way that
> the child's request can fail is if something conflicting with
> AccessShareLock is queued up behind the parent's lock. So failure to
> get the child lock immediately proves that the deadlock case applies.
>
>

Yeah, that would be a whole lot simpler. It would avoid the deadlock,
but it would have lots more chances for failure. But it would at least
be a good place to start.

cheers

andrew


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 04:44:55
Message-ID: AANLkTimwgM8=Gy9r-ydLXtPJvNq=zj7OEsvpLRXJReKP@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> In particular, this issue *has* been discussed before, and there was a
> consensus that preserving dump consistency was a requirement.  I don't
> think that Joachim gets to bypass that decision just by submitting a
> patch that ignores it.

I am not trying to bypass anything here :) Regarding the locking
issue, I probably haven't done sufficient research; at least I managed
to miss the emails that mentioned it. Anyway, that seems to be solved
now, fortunately; I'm going to implement your idea over the weekend.

Regarding snapshot cloning and dump consistency, I brought this up
already several months ago and asked if the feature is considered
useful even without snapshot cloning. And actually it was you who
motivated me to work on it even without having snapshot consistency...

http://archives.postgresql.org/pgsql-hackers/2010-03/msg01181.php

In my patch pg_dump emits a warning when called with -j; if you feel
better with an extra option
--i-know-that-i-have-no-synchronized-snapshots, fine with me :-)

In the end we provide a tool with limitations; it might not serve all
use cases, but there are use cases that would benefit a lot. I
personally think this is better than providing no tool at all...

Joachim


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 11:06:16
Message-ID: AANLkTi=3S1Tq2r0T2_G=KR+GWki=gRFzTMNxnYv+3WSE@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> Umm, nobody has attributed ridiculousness to anyone. Please don't put
>> words in my mouth. But I think this is a perfectly reasonable discussion
>> to have. Nobody gets to come along and get the features they want
>> without some sort of consensus, not me, not you, not Joachim, not Tom.
>
> In particular, this issue *has* been discussed before, and there was a
> consensus that preserving dump consistency was a requirement.  I don't
> think that Joachim gets to bypass that decision just by submitting a
> patch that ignores it.

Well, the discussion that Joachim linked to certainly doesn't have
any sort of clear consensus that that's the only way to go. In fact,
it seems to be much closer to the opposite consensus. Perhaps there
is some OTHER time that this has been discussed where "synchronization
is a hard requirement" was the consensus. There's an old saw that the
nice thing about standards is there are so many to choose from, and
the same thing can certainly be said about -hackers discussions on any
particular topic.

I actually think that the phrase "this has been discussed before and
rejected" should be permanently removed from our list of excuses for
rejecting a patch. Or if we must use that excuse, then I think a link
to the relevant discussion is a must, and the relevant discussion had
better reflect the fact that $TOPIC was in fact rejected. It seems to
me that in at least 50% of cases, someone comes back and says one of
the following things:

1. I searched the archives and could find no discussion along those lines.
2. I read that discussion and it doesn't appear to me that it reflects
a rejection of this idea. Instead what people seemed to be saying was
X.
3. At the time that might have been true, but what has changed in the
meanwhile is X.

In short, the problem with referring to previous discussions is that
our memories grow fuzzy over time. We remember that an idea was not
adopted, but not exactly why it wasn't adopted. We reject a new patch
with a good implementation of $FEATURE because an old patch was badly
done, or fell down on some peripheral issue, or just never got done.
Veteran backend hackers understand the inevitable necessity of arguing
about what consensus is actually reflected in the archives and whether
it's still relevant, but new people can be (and frequently are) put
off by it; and even for experienced contributors, it does little to
advance the dialogue. Hmm, according to so-and-so's memory, sometime
in the fourteen-year history of the project someone didn't like this
idea, or maybe a similar one. Whee, time to start Googling.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 13:02:11
Message-ID: 4CF8EA53.5040101@dunslane.net
Lists: pgsql-hackers

On 12/02/2010 11:44 PM, Joachim Wieland wrote:
> On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> In particular, this issue *has* been discussed before, and there was a
>> consensus that preserving dump consistency was a requirement. I don't
>> think that Joachim gets to bypass that decision just by submitting a
>> patch that ignores it.
> I am not trying to bypass anything here :) Regarding the locking
> issue, I probably haven't done sufficient research; at least I managed
> to miss the emails that mentioned it. Anyway, that seems to be solved
> now, fortunately; I'm going to implement your idea over the weekend.
>
> Regarding snapshot cloning and dump consistency, I brought this up
> already several months ago and asked if the feature is considered
> useful even without snapshot cloning. And actually it was you who
> motivated me to work on it even without having snapshot consistency...
>
> http://archives.postgresql.org/pgsql-hackers/2010-03/msg01181.php
>
> In my patch pg_dump emits a warning when called with -j; if you feel
> better with an extra option
> --i-know-that-i-have-no-synchronized-snapshots, fine with me :-)
>
> In the end we provide a tool with limitations; it might not serve all
> use cases, but there are use cases that would benefit a lot. I
> personally think this is better than providing no tool at all...
>
>
>

I think Tom's statement there:

> I think migration to a new server version (that's too incompatible for
> PITR or pg_migrate migration) is really the only likely use case.

is just wrong. Say you have a site that's open 24/7. But there is a
window of, say, 6 hours, each day, when it's almost but not quite quiet.
You want to be able to make your disaster recovery dump within that
window, and the low level of traffic means you can afford the degraded
performance that might result from a parallel dump. Or say you have a
hot standby machine from which you want to make the dump but want to set
the max_standby_*_delay as low as possible. These are both cases where
you might want parallel dump and yet you want dump consistency. I have a
client currently considering the latter setup, and the timing tolerances
are a little tricky. The times in which the system is in a state that we
want dumped are fixed, and we want to be sure that the dump is finished
by the next time such a time rolls around. (This is a system that in
effect makes one giant state change at a time.) If we can't complete the
dump in that time then there will be a delay introduced to the system's
critical path. Parallel dump will be very useful in helping us avoid
such a situation, but only if it's properly consistent.

I think Josh Berkus' comments in the thread you mentioned are correct:

> Actually, I'd say that there's a broad set of cases of people who want
> to do a parallel pg_dump while their system is active. Parallel pg_dump
> on a stopped system will help some people (for migration, particularly)
> but parallel pg_dump with snapshot cloning will help a lot more people.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 16:23:48
Message-ID: AANLkTikqXMPUk9Mnn+BvWyiouajGTQJPAOM--G0Zanbc@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> I think Josh Berkus' comments in the thread you mentioned are correct:
>
>> Actually, I'd say that there's a broad set of cases of people who want
>> to do a parallel pg_dump while their system is active.  Parallel pg_dump
>> on a stopped system will help some people (for migration, particularly)
>> but parallel pg_dump with snapshot cloning will help a lot more people.

But you failed to quote the rest of what he said:

> So: if parallel dump in single-user mode is what you can get done, then
> do it. We can always improve it later, and we have to start somewhere.
> But we will eventually need parallel pg_dump on active systems, and
> that should remain on the TODO list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 16:40:50
Message-ID: 4CF91D92.6060606@dunslane.net
Lists: pgsql-hackers

On 12/03/2010 11:23 AM, Robert Haas wrote:
> On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> I think Josh Berkus' comments in the thread you mentioned are correct:
>>
>>> Actually, I'd say that there's a broad set of cases of people who want
>>> to do a parallel pg_dump while their system is active. Parallel pg_dump
>>> on a stopped system will help some people (for migration, particularly)
>>> but parallel pg_dump with snapshot cloning will help a lot more people.
> But you failed to quote the rest of what he said:
>
>> So: if parallel dump in single-user mode is what you can get done, then
>> do it. We can always improve it later, and we have to start somewhere.
>> But we will eventually need parallel pg_dump on active systems, and
>> that should remain on the TODO list.

Right, and the reason I don't think that's right is that it seems to me
like a serious potential footgun.

But in any case, the reason I quoted Josh was in answer to a different
point, namely Tom's statement about the limited potential uses.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 16:56:32
Message-ID: AANLkTikP8O7ib5O+rJHtF9R3PFD26FNJHBLH-o4nMqOx@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 3, 2010 at 11:40 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 12/03/2010 11:23 AM, Robert Haas wrote:
>>
>> On Fri, Dec 3, 2010 at 8:02 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>>
>>> I think Josh Berkus' comments in the thread you mentioned are correct:
>>>
>>>> Actually, I'd say that there's a broad set of cases of people who want
>>>> to do a parallel pg_dump while their system is active.  Parallel pg_dump
>>>> on a stopped system will help some people (for migration, particularly)
>>>> but parallel pg_dump with snapshot cloning will help a lot more people.
>>
>> But you failed to quote the rest of what he said:
>>
>>> So: if parallel dump in single-user mode is what you can get done, then
>>> do it.  We can always improve it later, and we have to start somewhere.
>>> But we will eventually need parallel pg_dump on active systems, and
>>> that should remain on the TODO list.
>
> Right, and the reason I don't think that's right is that it seems to me like
> a serious potential footgun.
>
> But in any case, the reason I quoted Josh was in answer to a different
> point, namely Tom's statement about the limited potential uses.

I know the use cases are limited, but I think it's still useful on its own.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Joachim Wieland <joe(at)mcknight(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 17:17:11
Message-ID: 1291396575-sup-9640@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Robert Haas's message of Fri Dec 03 13:56:32 -0300 2010:

> I know the use cases are limited, but I think it's still useful on its own.

I don't understand what's so difficult about starting with the snapshot
cloning patch. AFAIR it's already been written anyway, no?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-03 17:37:11
Message-ID: 4CF92AC7.2090803@dunslane.net
Lists: pgsql-hackers

On 12/03/2010 12:17 PM, Alvaro Herrera wrote:
> Excerpts from Robert Haas's message of Fri Dec 03 13:56:32 -0300 2010:
>
>> I know the use cases are limited, but I think it's still useful on its own.
> I don't understand what's so difficult about starting with the snapshot
> cloning patch. AFAIR it's already been written anyway, no?

Yeah. If we can do it then this whole argument becomes moot. Like you I
don't see why we can't.

cheers

andrew


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-05 17:55:39
Message-ID: 4CFBD21B.40808@2ndquadrant.com
Lists: pgsql-hackers

Joachim Wieland wrote:
> Regarding snapshot cloning and dump consistency, I brought this up
> already several months ago and asked if the feature is considered
> useful even without snapshot cloning.

In addition, Joachim submitted a synchronized snapshot patch that looks
to me like it slipped through the cracks without being fully explored.
Since it's split in the official archives, the easiest way to read the
thread is at
http://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/msg143866.html

Or you can use these two:
http://archives.postgresql.org/pgsql-hackers/2010-01/msg00916.php
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00363.php

That never made it into a CommitFest proper that I can see; it just
picked up review mainly from Markus. The way I read that thread, there
were two objections:

1) This mechanism isn't general enough for all use-cases outside of
pg_dump, which doesn't make it wrong when the question is how to get
parallel pg_dump running

2) Running as superuser is excessive. Running as the database owner was
suggested as likely to be good enough for pg_dump purposes.

Ultimately I think that stalled because without a client that needed it
the code wasn't so interesting yet. But now there is one; should that
get revived again? It seems like all of the pieces needed to build
what's really desired here are available; it's just the always
non-trivial task of integrating them together the right way that's needed.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-05 18:28:47
Message-ID: 27542.1291573727@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> In addition, Joachim submitted a synchronized snapshot patch that looks
> to me like it slipped through the cracks without being fully explored.
> ...
> The way I read that thread, there were two objections:

> 1) This mechanism isn't general enough for all use-cases outside of
> pg_dump, which doesn't make it wrong when the question is how to get
> parallel pg_dump running

> 2) Running as superuser is excessive. Running as the database owner was
> suggested as likely to be good enough for pg_dump purposes.

IIRC, in old discussions of this problem we first considered allowing
clients to pull down an explicit representation of their snapshot (which
actually is an existing feature now, txid_current_snapshot()) and then
upload that again to become the active snapshot in another connection.
That was rejected on the grounds that you could cause all kinds of
mischief by uploading a bad snapshot; so we decided to think about
providing a server-side-only means to clone another backend's current
snapshot. Which is essentially what Joachim's above-mentioned patch
provides. However, as was discussed in that thread, that approach is
far from being ideal either.
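
(For reference, that existing feature already serializes a snapshot as
text in the form xmin:xmax:xip_list, e.g.

    SELECT txid_current_snapshot();   -- e.g. returns '1000:1004:1001,1003'

so the pass-it-through-the-client approach below would hand a value of
that general shape back to the server.)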

I'm wondering if we should reconsider the pass-it-through-the-client
approach, because if we could make that work it would be more general and
it wouldn't need any special privileges. The trick seems to be to apply
sufficient sanity testing to the snapshot proposed to be installed in
the subsidiary transaction. I think the requirements would basically be
(1) xmin <= any listed XIDs < xmax
(2) xmin not so old as to cause GlobalXmin to decrease
(3) xmax not beyond current XID counter
(4) XID list includes all still-running XIDs in the given range

One tricky part would be ensuring GlobalXmin doesn't decrease when the
snap is installed, but I think that could be made to work if we take
ProcArrayLock exclusively and insist on observing some other running
transaction with xmin <= proposed xmin. For the pg_dump case this would
certainly hold since xmin would be the parent pg_dump's xmin.

Given the checks stated above, it would be possible for someone to
install a snapshot that corresponds to no actual state of the database,
eg it shows some T1 as running and T2 as committed when actually T1
committed before T2. I don't see any simple way for the installation
function to detect that, but I'm not sure whether it matters. The user
might see inconsistent data, but do we care? Perhaps as a safety
measure we should only allow snapshot installation in read-only
transactions, so that even if the xact does observe inconsistent data it
can't possibly corrupt the database state thereby. This'd be no skin
off pg_dump's nose, obviously. Or compromise on "only superusers can
do it in non-read-only transactions".
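
To sketch the whole flow (the installation function's name is
hypothetical, of course):

    -- pg_dump parent connection
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT txid_current_snapshot();   -- say it returns '1000:1004:1001,1003'

    -- each pg_dump worker connection
    BEGIN TRANSACTION READ ONLY;
    SELECT pg_install_snapshot('1000:1004:1001,1003');   -- hypothetical
    -- the function applies checks (1) through (4) above before
    -- adopting the snapshot as the transaction snapshot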

Thoughts?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 01:55:37
Message-ID: AANLkTin8UeT8P2cUYVsGA+RjsuK=UYf2JXb8KZHq4FLS@mail.gmail.com
Lists: pgsql-hackers

On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm wondering if we should reconsider the pass-it-through-the-client
> approach, because if we could make that work it would be more general and
> it wouldn't need any special privileges.  The trick seems to be to apply
> sufficient sanity testing to the snapshot proposed to be installed in
> the subsidiary transaction.  I think the requirements would basically be
> (1) xmin <= any listed XIDs < xmax
> (2) xmin not so old as to cause GlobalXmin to decrease
> (3) xmax not beyond current XID counter
> (4) XID list includes all still-running XIDs in the given range
>
> Thoughts?

I think this is too ugly to live. I really think it's a very bad idea
for database clients to need to explicitly know anywhere near this
many details about how the server represents snapshots. It's not
impossible we might want to change this in the future, and even if we
don't, it seems to me to be exposing a whole lot of unnecessary
internal grottiness.

How about just pg_publish_snapshot(), returning a token that is only
valid until the end of the transaction in which it was called, and
pg_subscribe_snapshot(token)? The implementation can be that the
publisher writes its snapshot to a temp file and returns the name of
the temp file, setting an at-commit hook to remove the temp file. The
subscriber reads the temp file and sets the contents as its
transaction snapshot. If security is a concern, one could also save
the publisher's role OID to the file and require the subscriber's to
match.
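
To sketch the publishing side (hand-waving here, untested; the file
naming, the on-disk format, and the at-commit hook are placeholders):

    /* Hypothetical sketch; PG_FUNCTION_INFO_V1 boilerplate, error
     * handling, and the subscriber side are omitted. */
    #include "postgres.h"
    #include "fmgr.h"
    #include "miscadmin.h"
    #include "storage/fd.h"
    #include "utils/builtins.h"
    #include "utils/snapmgr.h"

    Datum
    pg_publish_snapshot(PG_FUNCTION_ARGS)
    {
        Snapshot    snap = GetActiveSnapshot();
        char        path[MAXPGPATH];
        FILE       *f;

        /* The returned token is simply the temp file name. */
        snprintf(path, sizeof(path), "pg_snapshots/%d.snap", MyProcPid);

        f = AllocateFile(path, "w");
        /* ... write GetUserId(), snap->xmin, snap->xmax, snap->xip[] ... */
        FreeFile(f);

        /* ... register an at-commit/abort hook to unlink(path) ... */

        PG_RETURN_TEXT_P(cstring_to_text(path));
    }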

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 02:04:53
Message-ID: 4CFC44C5.8030903@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/05/2010 08:55 PM, Robert Haas wrote:
> On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm wondering if we should reconsider the pass-it-through-the-client
>> approach, because if we could make that work it would be more general and
>> it wouldn't need any special privileges. The trick seems to be to apply
>> sufficient sanity testing to the snapshot proposed to be installed in
>> the subsidiary transaction. I think the requirements would basically be
>> (1) xmin <= any listed XIDs < xmax
>> (2) xmin not so old as to cause GlobalXmin to decrease
>> (3) xmax not beyond current XID counter
>> (4) XID list includes all still-running XIDs in the given range
>>
>> Thoughts?
> I think this is too ugly to live. I really think it's a very bad idea
> for database clients to need to explicitly know anywhere near this
> many details about how the server represents snapshots. It's not
> impossible we might want to change this in the future, and even if we
> don't, it seems to me to be exposing a whole lot of unnecessary
> internal grottiness.
>
> How about just pg_publish_snapshot(), returning a token that is only
> valid until the end of the transaction in which it was called, and
> pg_subscribe_snapshot(token)? The implementation can be that the
> publisher writes its snapshot to a temp file and returns the name of
> the temp file, setting an at-commit hook to remove the temp file. The
> subscriber reads the temp file and sets the contents as its
> transaction snapshot. If security is a concern, one could also save
> the publisher's role OID to the file and require the subscriber's to
> match.

Why not just say give me the snapshot currently held by process nnnn?

And please, not temp files if possible.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 02:27:58
Message-ID: AANLkTim_uPP2AGj3M_Ki0Ubu_NEe14A0RWg7zfivO=qX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Why not just say give me the snapshot currently held by process nnnn?
>
> And please, not temp files if possible.

As far as I'm aware, the full snapshot doesn't normally exist in
shared memory, hence the need for publication of some sort. We could
dedicate a shared memory region for publication but then you have to
decide how many slots to allocate, and any number you pick will be too
many for some people and not enough for others, not to mention that
shared memory is a fairly precious resource.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, koichi(dot)szk(at)gmail(dot)com
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 04:45:18
Message-ID: AANLkTinVTb7JcsHg37nOT15a+28DBK6gY0NEeOoE5XJy@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Why not just say give me the snapshot currently held by process nnnn?
>>
>> And please, not temp files if possible.
>
> As far as I'm aware, the full snapshot doesn't normally exist in
> shared memory, hence the need for publication of some sort.  We could
> dedicate a shared memory region for publication but then you have to
> decide how many slots to allocate, and any number you pick will be too
> many for some people and not enough for others, not to mention that
> shared memory is a fairly precious resource.

So here is a patch that I have been playing with; I wrote it a while
back, and thanks go to Koichi Suzuki for his helpful comments. I have
not published it earlier because I haven't worked on it recently, and
from the discussion that I brought up in March I got the feeling that
people are fine with having a first version of parallel dump without
synchronized snapshots.

I am not really sure that what the patch does is sufficient, or whether
it does it in the right way, but I hope that it can serve as a basis for
collecting ideas (and doubts).

My idea is quite similar to Robert's proposal of publishing snapshots
and subscribing to them; the patch even uses those words.

Basically the idea is that a transaction in isolation level
serializable can publish a snapshot and as long as this transaction is
alive, its snapshot can be adopted by other transactions. Requiring
the publishing transaction to be serializable guarantees that the copy
of the snapshot in shared memory is always current. When the
transaction ends, the copy of the snapshot is also invalidated and
cannot be adopted anymore. So instead of doing explicit checks, the
patch aims at always having a reference transaction around that
guarantees validity of the snapshot information in shared memory.

The patch currently creates a new area in shared memory to store
snapshot information but we can certainly discuss this... I had a GUC
in mind that can control the number of available "slots", similar to
max_prepared_transactions. Snapshot information can become quite
large, especially with a high number of max_connections.
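
To give an idea of the shape, the shared memory area amounts to a fixed
number of slots roughly like this (a simplified sketch, not the actual
structures from the patch; the GUC name is invented):

    /* Simplified sketch only; the real patch differs in detail. */
    typedef struct PublishedSnapshot
    {
        BackendId       publisher;  /* slot is only valid while this
                                     * backend's transaction is alive */
        TransactionId   xmin;
        TransactionId   xmax;
        int             xcnt;
        TransactionId   xip[1];     /* VARIABLE LENGTH, sized by
                                     * max_connections; subxids omitted */
    } PublishedSnapshot;

    /* GUC controlling the number of slots, analogous to
     * max_prepared_transactions */
    int max_published_snapshots = 5;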

Known limitations: the patch completely lacks awareness of prepared
transactions, and it doesn't check whether both backends belong to the
same user.

Joachim

Attachment Content-Type Size
syncSnapshots.diff text/x-patch 13.6 KB

From: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 06:15:45
Message-ID: AANLkTimSDtnQUEpNq0QVvWojXb_5F=wzBCeUxKb_CakL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you Joachim;

Yes, and the current patch requires that the original (publisher)
transaction stay alive, to prevent RecentXmin from being updated.

I hope this restriction is acceptable if publishing/subscribing is
provided via functions, not statements.

Cheers;
----------
Koichi Suzuki

2010/12/6 Joachim Wieland <joe(at)mcknight(dot)de>:
> On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> Why not just say give me the snapshot currently held by process nnnn?
>>>
>>> And please, not temp files if possible.
>>
>> As far as I'm aware, the full snapshot doesn't normally exist in
>> shared memory, hence the need for publication of some sort.  We could
>> dedicate a shared memory region for publication but then you have to
>> decide how many slots to allocate, and any number you pick will be too
>> many for some people and not enough for others, not to mention that
>> shared memory is a fairly precious resource.
>
> So here is a patch that I have been playing with; I wrote it a while
> back, and thanks go to Koichi Suzuki for his helpful comments. I have
> not published it earlier because I haven't worked on it recently, and
> from the discussion that I brought up in March I got the feeling that
> people are fine with having a first version of parallel dump without
> synchronized snapshots.
>
> I am not really sure that what the patch does is sufficient, or whether
> it does it in the right way, but I hope that it can serve as a basis for
> collecting ideas (and doubts).
>
> My idea is quite similar to Robert's proposal of publishing snapshots
> and subscribing to them; the patch even uses those words.
>
> Basically the idea is that a transaction in isolation level
> serializable can publish a snapshot and as long as this transaction is
> alive, its snapshot can be adopted by other transactions. Requiring
> the publishing transaction to be serializable guarantees that the copy
> of the snapshot in shared memory is always current. When the
> transaction ends, the copy of the snapshot is also invalidated and
> cannot be adopted anymore. So instead of doing explicit checks, the
> patch aims at always having a reference transaction around that
> guarantees validity of the snapshot information in shared memory.
>
> The patch currently creates a new area in shared memory to store
> snapshot information but we can certainly discuss this... I had a GUC
> in mind that can control the number of available "slots", similar to
> max_prepared_transactions. Snapshot information can become quite
> large, especially with a high number of max_connections.
>
> Known limitations: the patch completely lacks awareness of prepared
> transactions, and it doesn't check whether both backends belong to the
> same user.
>
>
> Joachim
>


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 07:29:04
Message-ID: 4CFC90C0.6000001@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06.12.2010 02:55, Robert Haas wrote:
> On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm wondering if we should reconsider the pass-it-through-the-client
>> approach, because if we could make that work it would be more general and
>> it wouldn't need any special privileges. The trick seems to be to apply
>> sufficient sanity testing to the snapshot proposed to be installed in
>> the subsidiary transaction. I think the requirements would basically be
>> (1) xmin <= any listed XIDs < xmax
>> (2) xmin not so old as to cause GlobalXmin to decrease
>> (3) xmax not beyond current XID counter
>> (4) XID list includes all still-running XIDs in the given range
>>
>> Thoughts?
>
> I think this is too ugly to live. I really think it's a very bad idea
> for database clients to need to explicitly know anywhere near this
> many details about how the server represents snapshots. It's not
> impossible we might want to change this in the future, and even if we
> don't, it seems to me to be exposing a whole lot of unnecessary
> internal grottiness.

The client doesn't need to know anything about the snapshot blob that
the server gives it. It just needs to pass it back to the server through
the other connection. To the client, it's just an opaque chunk of bytes.
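
To illustrate, the whole client-side flow would be something like the
following (the import function is invented for illustration; only
txid_current_snapshot() exists today):

    /* Hypothetical client flow with libpq; pg_import_snapshot() is
     * invented. Error checks and PQclear of the BEGIN results are
     * omitted for brevity. */
    #include <libpq-fe.h>

    static void
    clone_snapshot_to_worker(PGconn *leader, PGconn *worker)
    {
        PGresult   *res;
        const char *params[1];

        PQexec(leader, "BEGIN ISOLATION LEVEL SERIALIZABLE");
        res = PQexec(leader, "SELECT txid_current_snapshot()");

        /* Opaque to the client; textually it is "xmin:xmax:xip_list",
         * e.g. "10:20:10,14,15". */
        params[0] = PQgetvalue(res, 0, 0);

        PQexec(worker, "BEGIN ISOLATION LEVEL SERIALIZABLE");
        PQexecParams(worker, "SELECT pg_import_snapshot($1)", /* invented */
                     1, NULL, params, NULL, NULL, 0);

        PQclear(res);
    }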

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 13:57:54
Message-ID: AANLkTim=CmsOdLP8FJRZRLpZxPZdTDh0iL6Ca5E_+pAX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 06.12.2010 02:55, Robert Haas wrote:
>>
>> On Sun, Dec 5, 2010 at 1:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>> I'm wondering if we should reconsider the pass-it-through-the-client
>>> approach, because if we could make that work it would be more general and
>>> it wouldn't need any special privileges.  The trick seems to be to apply
>>> sufficient sanity testing to the snapshot proposed to be installed in
>>> the subsidiary transaction.  I think the requirements would basically be
>>> (1) xmin <= any listed XIDs < xmax
>>> (2) xmin not so old as to cause GlobalXmin to decrease
>>> (3) xmax not beyond current XID counter
>>> (4) XID list includes all still-running XIDs in the given range
>>>
>>> Thoughts?
>>
>> I think this is too ugly to live.  I really think it's a very bad idea
>> for database clients to need to explicitly know anywhere near this
>> many details about how the server represents snapshots.  It's not
>> impossible we might want to change this in the future, and even if we
>> don't, it seems to me to be exposing a whole lot of unnecessary
>> internal grottiness.
>
> The client doesn't need to know anything about the snapshot blob that the
> server gives it. It just needs to pass it back to the server through the
> other connection. To the client, it's just an opaque chunk of bytes.

I suppose that would work, but I still think it's a bad idea. We made
this mistake with expression trees. Any oversight in the code that
validates the chunk of bytes when it (or a modified version) is sent
back to the server turns into a security hole. I think it's a whole
lot simpler and cleaner to keep the representation details private to
the server.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 14:45:36
Message-ID: 4CFCF710.8040306@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06.12.2010 14:57, Robert Haas wrote:
> On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> The client doesn't need to know anything about the snapshot blob that the
>> server gives it. It just needs to pass it back to the server through the
>> other connection. To the client, it's just an opaque chunk of bytes.
>
> I suppose that would work, but I still think it's a bad idea. We made
> this mistake with expression trees. Any oversight in the code that
> validates the chunk of bytes when it (or a modified version) is sent
> back to the server turns into a security hole.

True, but a snapshot is a lot simpler than an expression tree. It's
pretty much impossible to plug all the holes in the expression-tree
reading functions, and keep them hole-free in the future. The expression
tree format is constantly in flux. A snapshot, however, is a fairly
isolated small data structure that rarely changes.

> I think it's a whole
> lot simpler and cleaner to keep the representation details private to
> the server.

Well, then you need some sort of cross-backend communication, which is
always a bit clumsy.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 14:53:31
Message-ID: AANLkTikjyUy0tsgV6-tXcra24HmeBAw4C1oYWSVP_GA4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 06.12.2010 14:57, Robert Haas wrote:
>>
>> On Mon, Dec 6, 2010 at 2:29 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com>  wrote:
>>>
>>> The client doesn't need to know anything about the snapshot blob that the
>>> server gives it. It just needs to pass it back to the server through the
>>> other connection. To the client, it's just an opaque chunk of bytes.
>>
>> I suppose that would work, but I still think it's a bad idea.  We made
>> this mistake with expression trees.  Any oversight in the code that
>> validates the chunk of bytes when it (or a modified version) is sent
>> back to the server turns into a security hole.
>
> True, but a snapshot is a lot simpler than an expression tree. It's pretty
> much impossible to plug all the holes in the expression-tree reading
> functions, and keep them hole-free in the future. The expression tree format
> is constantly in flux. A snapshot, however, is a fairly isolated small data
> structure that rarely changes.

I guess. It still seems far too much like exposing the server's guts
for my taste. It might not be as bad as the expression tree stuff,
but there's nothing particularly good about it either.

>>  I think it's a whole
>> lot simpler and cleaner to keep the representation details private to
>> the server.
>
> Well, then you need some sort of cross-backend communication, which is
> always a bit clumsy.

A temp file seems quite sufficient, and not at all difficult.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 14:58:41
Message-ID: 4CFCFA21.5040409@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06.12.2010 15:53, Robert Haas wrote:
> I guess. It still seems far too much like exposing the server's guts
> for my taste. It might not be as bad as the expression tree stuff,
> but there's nothing particularly good about it either.

Note that we already have txid_current_snapshot() function, which
exposes all that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 15:22:53
Message-ID: AANLkTi=LpmeivefHLRFS-OXDUzf=HC5VZ4ttZwx3cnK4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 06.12.2010 15:53, Robert Haas wrote:
>>
>> I guess.  It still seems far too much like exposing the server's guts
>> for my taste.  It might not be as bad as the expression tree stuff,
>> but there's nothing particularly good about it either.
>
> Note that we already have txid_current_snapshot() function, which exposes
> all that.

Fair enough, and I think that's actually useful for Slony &c. But I
don't think we should shy away from providing a cleaner API here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 15:35:56
Message-ID: 4CFD02DC.1070705@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/06/2010 10:22 AM, Robert Haas wrote:
> On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 06.12.2010 15:53, Robert Haas wrote:
>>> I guess. It still seems far too much like exposing the server's guts
>>> for my taste. It might not be as bad as the expression tree stuff,
>>> but there's nothing particularly good about it either.
>> Note that we already have txid_current_snapshot() function, which exposes
>> all that.
> Fair enough, and I think that's actually useful for Slony &c. But I
> don't think we should shy away from providing a cleaner API here.
>

Just don't let the perfect get in the way of the good :P

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 15:38:13
Message-ID: AANLkTimY2w1yKN0_GktufDaO7ieuOc6++dfhy=SVdend@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 6, 2010 at 10:35 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On 12/06/2010 10:22 AM, Robert Haas wrote:
>>
>> On Mon, Dec 6, 2010 at 9:58 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com>  wrote:
>>>
>>> On 06.12.2010 15:53, Robert Haas wrote:
>>>>
>>>> I guess.  It still seems far too much like exposing the server's guts
>>>> for my taste.  It might not be as bad as the expression tree stuff,
>>>> but there's nothing particularly good about it either.
>>>
>>> Note that we already have txid_current_snapshot() function, which exposes
>>> all that.
>>
>> Fair enough, and I think that's actually useful for Slony &c.  But I
>> don't think we should shy away from providing a cleaner API here.
>>
>
> Just don't let the perfect get in the way of the good :P

I'll keep that in mind. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 15:40:44
Message-ID: 26498.1291650044@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Well, then you need some sort of cross-backend communication, which is
>> always a bit clumsy.

> A temp file seems quite sufficient, and not at all difficult.

"Not at all difficult" is nonsense. To do that, you need to invent some
mechanism for sender and receivers to identify which temp file they want
to use, and you need to think of some way to clean up the files when the
client forgets to tell you to do so. That's going to be at least as
ugly as anything else. And I think it's unproven that this approach
would be security-hole-free either. For instance, what about some other
session overwriting pg_dump's snapshot temp file?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 15:47:15
Message-ID: AANLkTin7VTiM-3RH4QOPy76v7ziz-xHTEg-a-QE4O0=t@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 6, 2010 at 10:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Well, then you need some sort of cross-backend communication, which is
>>> always a bit clumsy.
>
>> A temp file seems quite sufficient, and not at all difficult.
>
> "Not at all difficult" is nonsense.  To do that, you need to invent some
> mechanism for sender and receivers to identify which temp file they want
> to use,

Why is this even remotely hard? That's the whole point of having the
"publish" operation return a token. The token either is, or uniquely
identifies, the file name.

> and you need to think of some way to clean up the files when the
> client forgets to tell you to do so. That's going to be at least as
> ugly as anything else.

Backends don't forget to call their end-of-transaction hooks, do they?
They might crash, but we already have code to remove temp files on
server restart. At most it would need minor adjustment.
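
For instance, the existing end-of-transaction callback machinery looks
sufficient for that (a sketch, assuming the publisher stashed the file
path in a static variable):

    /* Sketch: remove the published file at end of transaction. */
    #include "postgres.h"
    #include "access/xact.h"
    #include <unistd.h>

    static char published_path[MAXPGPATH];

    static void
    published_snapshot_cleanup(XactEvent event, void *arg)
    {
        if ((event == XACT_EVENT_COMMIT || event == XACT_EVENT_ABORT) &&
            published_path[0] != '\0')
        {
            unlink(published_path);
            published_path[0] = '\0';
        }
    }

    /* at publish time:
     *     RegisterXactCallback(published_snapshot_cleanup, NULL);
     */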

>  And I think it's unproven that this approach
> would be security-hole-free either.  For instance, what about some other
> session overwriting pg_dump's snapshot temp file?

Why would this be any different from any other temp file? We surely
must have a mechanism in place to ensure that the temporary files used
by sorts or hash joins don't get overwritten by some other session, or
the system would be totally unstable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 15:50:28
Message-ID: 4CFD0644.2020100@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/06/2010 10:40 AM, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Dec 6, 2010 at 9:45 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Well, then you need some sort of cross-backend communication, which is
>>> always a bit clumsy.
>> A temp file seems quite sufficient, and not at all difficult.
> "Not at all difficult" is nonsense. To do that, you need to invent some
> mechanism for sender and receivers to identify which temp file they want
> to use, and you need to think of some way to clean up the files when the
> client forgets to tell you to do so. That's going to be at least as
> ugly as anything else. And I think it's unproven that this approach
> would be security-hole-free either. For instance, what about some other
> session overwriting pg_dump's snapshot temp file?
>
>

Yeah. I'm still not convinced that using shared memory is a bad way to
pass these around. Surely we're not talking about large numbers of them.
What am I missing here?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 17:28:10
Message-ID: 28168.1291656490@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Yeah. I'm still not convinced that using shared memory is a bad way to
> pass these around. Surely we're not talking about large numbers of them.
> What am I missing here?

They're not of a very predictable size.

Robert's idea of publish() returning a temp file identifier, which then
gets removed at transaction end, might work all right.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 17:31:54
Message-ID: 28227.1291656714@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Why not just say give me the snapshot currently held by process nnnn?

There's not a unique snapshot held by a particular process. Also, we
don't want to expend the overhead to fully publish every snapshot.
I think it's really necessary that the "sending" process take some
deliberate action to publish a snapshot.

> And please, not temp files if possible.

Barring the cleanup issue, I don't see why not. This is a relatively
low-usage feature, I think, so I wouldn't be much in favor of dedicating
shmem to it even if the space requirement were predictable.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 17:50:19
Message-ID: 4CFD225B.3020207@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/06/2010 12:28 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> Yeah. I'm still not convinced that using shared memory is a bad way to
>> pass these around. Surely we're not talking about large numbers of them.
>> What am I missing here?
> They're not of a very predictable size.
>
>

Ah. Ok.

cheers

andrew


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Joachim Wieland" <joe(at)mcknight(dot)de>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 18:11:18
Message-ID: 4CFCD2E60200002500038351@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>> I'm still not convinced that using shared memory is a bad way to
>> pass these around. Surely we're not talking about large numbers
>> of them. What am I missing here?
>
> They're not of a very predictable size.

Surely you can predict that any snapshot is no larger than a fairly
small fixed portion plus sizeof(TransactionId) * MaxBackends? So,
for example, if you're configured for 100 connections, you'd be
limited to something under 1kB, maximum?
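
(Spelled out, the naive arithmetic is:)

    /* Naive estimate, ignoring subtransactions:
     *   fixed header (xmin, xmax, counts, ...)   a few dozen bytes
     *   xip[]: sizeof(TransactionId) * 100       4 * 100 = 400 bytes
     * so comfortably under 1kB at 100 connections.
     */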

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Joachim Wieland" <joe(at)mcknight(dot)de>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 18:24:27
Message-ID: 29177.1291659867@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'm still not convinced that using shared memory is a bad way to
>>> pass these around. Surely we're not talking about large numbers
>>> of them. What am I missing here?
>>
>> They're not of a very predictable size.

> Surely you can predict that any snapshot is no larger than a fairly
> small fixed portion plus sizeof(TransactionId) * MaxBackends?

No. See subtransactions.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Joachim Wieland" <joe(at)mcknight(dot)de>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 18:28:39
Message-ID: 4CFCD6F70200002500038363@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:

>> Surely you can predict that any snapshot is no larger than a fairly
>> small fixed portion plus sizeof(TransactionId) * MaxBackends?
>
> No. See subtransactions.

Subtransactions are included in snapshots?

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Joachim Wieland" <joe(at)mcknight(dot)de>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 18:49:28
Message-ID: 29609.1291661368@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> No. See subtransactions.

> Subtransactions are included in snapshots?

Sure, see GetSnapshotData(). You could avoid it by setting
suboverflowed, but that comes at a nontrivial performance cost.
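
(Rough numbers, assuming the current PGPROC_MAX_CACHED_SUBXIDS of 64:)

    /* Worst case with cached subtransaction XIDs in the snapshot:
     *   xip[]:    4 bytes * MaxBackends          =   400 bytes
     *   subxip[]: 4 bytes * 64 * MaxBackends     = 25600 bytes
     * at 100 backends, roughly 25kB rather than "under 1kB", unless
     * suboverflowed is set and the subxid list is discarded.
     */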

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Joachim Wieland" <joe(at)mcknight(dot)de>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 18:58:15
Message-ID: 4CFCDDE7020000250003836F@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> No. See subtransactions.
>
>> Subtransactions are included in snapshots?
>
> Sure, see GetSnapshotData(). You could avoid it by setting
> suboverflowed, but that comes at a nontrivial performance cost.

Yeah, sorry for blurting like that before I checked. I was somewhat
panicked that I'd missed something important for SSI, because my
XidIsConcurrent check just uses xmin, xmax, and xip; I was afraid that
what I have would fall down in the face of subtransactions. But on
review I found that I'd already thought that through (discussion is in
the archives): I always wanted to associate the locks and conflicts
with the top-level transaction, so that was already identified before
checking for overlap, and it was therefore more efficient to just
check that.

Sorry for the "senior moment". :-/

Perhaps a line or two of comments about that in the SSI patch would
be a good idea. And maybe some tests involving subtransactions....

-Kevin


From: marcin mank <marcin(dot)mank(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 20:48:29
Message-ID: AANLkTikOrY05Y5m6Zf0gOebAWeTqzozDYKbkB6Uf6CZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> IIRC, in old discussions of this problem we first considered allowing
> clients to pull down an explicit representation of their snapshot (which
> actually is an existing feature now, txid_current_snapshot()) and then
> upload that again to become the active snapshot in another connection.

Could a hot standby use such a snapshot representation? I.e. same
snapshot on the master and the standby?

Greetings
Marcin Mańk


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: marcin mank <marcin(dot)mank(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 21:06:36
Message-ID: 4CFD505C.9010708@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06.12.2010 21:48, marcin mank wrote:
> On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> IIRC, in old discussions of this problem we first considered allowing
>> clients to pull down an explicit representation of their snapshot (which
>> actually is an existing feature now, txid_current_snapshot()) and then
>> upload that again to become the active snapshot in another connection.
>
> Could a hot standby use such a snapshot representation? I.e. same
> snapshot on the master and the standby?

Hmm, I suppose it could. That's an interesting idea: you could run
parallel pg_dump or something else against the master and/or multiple
hot standby servers, all working on the same snapshot.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: marcin mank <marcin(dot)mank(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 21:15:41
Message-ID: 2287.1291670141@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

marcin mank <marcin(dot)mank(at)gmail(dot)com> writes:
> On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> IIRC, in old discussions of this problem we first considered allowing
>> clients to pull down an explicit representation of their snapshot (which
>> actually is an existing feature now, txid_current_snapshot()) and then
>> upload that again to become the active snapshot in another connection.

> Could a hot standby use such a snapshot representation? I.e. same
> snapshot on the master and the standby?

Hm, that's a good question. It seems like it's at least possibly
workable, but I'm not sure if there are any showstoppers. The other
proposal of publish-a-snapshot would presumably NOT support this, since
we'd not want to ship the snapshot temp files down the WAL stream.

However, if you were doing something like parallel pg_dump you could
just run the parent and child instances all against the slave, so the
pg_dump scenario doesn't seem to offer much of a supporting use-case for
worrying about this. When would you really need to be able to do it?

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 22:46:31
Message-ID: 4CFD67C7.1020703@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> However, if you were doing something like parallel pg_dump you could
> just run the parent and child instances all against the slave, so the
> pg_dump scenario doesn't seem to offer much of a supporting use-case for
> worrying about this. When would you really need to be able to do it?

If you had several standbys, you could distribute the work of the
pg_dump among them. This would be a huge speedup for a large database,
potentially, thanks to parallelization of I/O and network. Imagine
doing a pg_dump of a 300GB database in 10min.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-07 00:22:09
Message-ID: 4881.1291681329@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> However, if you were doing something like parallel pg_dump you could
>> just run the parent and child instances all against the slave, so the
>> pg_dump scenario doesn't seem to offer much of a supporting use-case for
>> worrying about this. When would you really need to be able to do it?

> If you had several standbys, you could distribute the work of the
> pg_dump among them. This would be a huge speedup for a large database,
> potentially, thanks to parallelization of I/O and network. Imagine
> doing a pg_dump of a 300GB database in 10min.

That does sound kind of attractive. But to do that I think we'd have to
go with the pass-the-snapshot-through-the-client approach. Shipping
internal snapshot files through the WAL stream doesn't seem attractive
to me.

While I see Robert's point about preferring not to expose the snapshot
contents to clients, I don't think it outweighs all other considerations
here; and every other one is pointing to doing it the other way.

regards, tom lane


From: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-07 03:13:23
Message-ID: AANLkTinaWfwOBoFdD2H_RJo3sf2j8PShk5d+bpk-aTNV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

We may need other means to ensure that the snapshot is available on
the slave. It could be a bit too early to use the snapshot on the
slave, depending on the delay of WAL replay.
----------
Koichi Suzuki

2010/12/7 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> marcin mank <marcin(dot)mank(at)gmail(dot)com> writes:
>> On Sun, Dec 5, 2010 at 7:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> IIRC, in old discussions of this problem we first considered allowing
>>> clients to pull down an explicit representation of their snapshot (which
>>> actually is an existing feature now, txid_current_snapshot()) and then
>>> upload that again to become the active snapshot in another connection.
>
>> Could a hot standby use such a snapshot representation? I.e. same
>> snapshot on the master and the standby?
>
> Hm, that's a good question.  It seems like it's at least possibly
> workable, but I'm not sure if there are any showstoppers.  The other
> proposal of publish-a-snapshot would presumably NOT support this, since
> we'd not want to ship the snapshot temp files down the WAL stream.
>
> However, if you were doing something like parallel pg_dump you could
> just run the parent and child instances all against the slave, so the
> pg_dump scenario doesn't seem to offer much of a supporting use-case for
> worrying about this.  When would you really need to be able to do it?
>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-07 07:16:51
Message-ID: 4CFDDF63.40301@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/07/2010 01:22 AM, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>> However, if you were doing something like parallel pg_dump you could
>>> just run the parent and child instances all against the slave, so the
>>> pg_dump scenario doesn't seem to offer much of a supporting use-case for
>>> worrying about this. When would you really need to be able to do it?
>
>> If you had several standbys, you could distribute the work of the
>> pg_dump among them. This would be a huge speedup for a large database,
>> potentially, thanks to parallelization of I/O and network. Imagine
>> doing a pg_dump of a 300GB database in 10min.
>
> That does sound kind of attractive. But to do that I think we'd have to
> go with the pass-the-snapshot-through-the-client approach. Shipping
> internal snapshot files through the WAL stream doesn't seem attractive
> to me.

this kind of functionality would also be very useful/interesting for
connection poolers/loadbalancers that are trying to distribute load
across multiple hosts and could use that to at least give some sort of
consistency guarantee.

Stefan


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: stefan(at)kaltenbrunner(dot)cc
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, josh(at)agliodbs(dot)com, marcin(dot)mank(at)gmail(dot)com, greg(at)2ndquadrant(dot)com, joe(at)mcknight(dot)de, andrew(at)dunslane(dot)net, robertmhaas(at)gmail(dot)com, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-07 07:27:54
Message-ID: 20101207.162754.999254373512590109.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 12/07/2010 01:22 AM, Tom Lane wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>>> However, if you were doing something like parallel pg_dump you could
>>>> just run the parent and child instances all against the slave, so the
>>>> pg_dump scenario doesn't seem to offer much of a supporting use-case for
>>>> worrying about this. When would you really need to be able to do it?
>>
>>> If you had several standbys, you could distribute the work of the
>>> pg_dump among them. This would be a huge speedup for a large database,
>>> potentially, thanks to parallelization of I/O and network. Imagine
>>> doing a pg_dump of a 300GB database in 10min.
>>
>> That does sound kind of attractive. But to do that I think we'd have to
>> go with the pass-the-snapshot-through-the-client approach. Shipping
>> internal snapshot files through the WAL stream doesn't seem attractive
>> to me.
>
> this kind of functionality would also be very useful/interesting for
> connection poolers/loadbalancers that are trying to distribute load
> across multiple hosts and could use that to at least give some sort of
> consistency guarantee.

In addition to this, it will greatly help query-based replication
tools such as pgpool-II. Sounds great.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-07 08:23:19
Message-ID: AANLkTi=8Luv--1E3kHL0tp1NHgGQAuHEHWf7vSHTgC=7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This is what Postgres-XC does between a coordinator and a
datanode. A coordinator may correspond to poolers/loadbalancers.
Does anyone think it makes sense to extract the XC implementation of
snapshot shipping into PostgreSQL itself?

Cheers;
----------
Koichi Suzuki

2010/12/7 Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>:
> On 12/07/2010 01:22 AM, Tom Lane wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>>> However, if you were doing something like parallel pg_dump you could
>>>> just run the parent and child instances all against the slave, so the
>>>> pg_dump scenario doesn't seem to offer much of a supporting use-case for
>>>> worrying about this.  When would you really need to be able to do it?
>>
>>> If you had several standbys, you could distribute the work of the
>>> pg_dump among them.  This would be a huge speedup for a large database,
>>> potentially, thanks to parallelization of I/O and network.  Imagine
>>> doing a pg_dump of a 300GB database in 10min.
>>
>> That does sound kind of attractive.  But to do that I think we'd have to
>> go with the pass-the-snapshot-through-the-client approach.  Shipping
>> internal snapshot files through the WAL stream doesn't seem attractive
>> to me.
>
> this kind of functionality would also be very useful/interesting for
> connection poolers/loadbalancers that are trying to distribute load
> across multiple hosts and could use that to at least give some sort of
> consistency guarantee.
>
>
>
> Stefan
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: koichi(dot)szk(at)gmail(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-07 12:51:47
Message-ID: 4CFE2DE3.9020406@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/07/2010 09:23 AM, Koichi Suzuki wrote:
> This is what Postgres-XC is doing between a coordinator and a
> datanode. Coordinator may correspond to poolers/loadbalancers.
> Does anyone think it makes sense to extract XC implementation of
> snapshot shipping to PostgreSQL itself?

Well, if there is a preexisting implementation of that it would
certainly be of interest to see it - but before you go and extract the
code, maybe you could tell us how exactly it works?

Stefan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: koichi(dot)szk(at)gmail(dot)com
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-14 22:13:35
Message-ID: AANLkTik8xPK-mLuv_g5oG+igPU5hhQeFzwdGry7_KEzt@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 7, 2010 at 3:23 AM, Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com> wrote:
> This is what Postgres-XC does between a coordinator and a
> datanode.    A coordinator may correspond to poolers/loadbalancers.
> Does anyone think it makes sense to extract the XC implementation of
> snapshot shipping into PostgreSQL itself?

Perhaps, though of course it would need to be re-licensed. I'd be
happy to see us pursue a snapshot cloning framework, wherever it comes
from. I remain unconvinced that it should be made a hard requirement
for parallel pg_dump, but of course if we can get it implemented then
the point becomes moot.

Let's not let this fall on the floor. Someone should pursue this,
whether it's Joachim or Koichi or someone else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-15 00:06:40
Message-ID: AANLkTi=sgRhLmRaXt1L4_SReDRfb9Od5+A9jOqS01NPr@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert;

Thank you very much for your advice. Indeed, I'm considering changing
the license to the PostgreSQL license. It may take a bit more time,
though...
----------
Koichi Suzuki

2010/12/15 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Tue, Dec 7, 2010 at 3:23 AM, Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com> wrote:
>> This is what Postgres-XC is doing between a coordinator and a
>> datanode. The coordinator may correspond to poolers/load balancers.
>> Does anyone think it makes sense to extract the XC implementation of
>> snapshot shipping into PostgreSQL itself?
>
> Perhaps, though of course it would need to be re-licensed.  I'd be
> happy to see us pursue a snapshot cloning framework, wherever it comes
> from.  I remain unconvinced that it should be made a hard requirement
> for parallel pg_dump, but of course if we can get it implemented then
> the point becomes moot.
>
> Let's not let this fall on the floor.  Someone should pursue this,
> whether it's Joachim or Koichi or someone else.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: koichi(dot)szk(at)gmail(dot)com
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-15 01:26:53
Message-ID: AANLkTik7EW+HFk6+VCjhFkXmURP411fwKoOGJeU0XMbm@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 14, 2010 at 7:06 PM, Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com> wrote:
> Thank you very much for your advice. Indeed, I'm considering changing
> the license to the PostgreSQL license. It may take a bit more time,
> though...

You wouldn't necessarily need to relicense all of Postgres-XC
(although that would be cool, too, at least IMO), just the portion you
were proposing for commit to PostgreSQL. Nor does it sound like it
would be infeasible for someone to code this up from scratch. But we
should try to make something good happen here!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-24 15:52:50
Message-ID: 201012241552.oBOFqoo16792@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> I actually think that the phrase "this has been discussed before and
> rejected" should be permanently removed from our list of excuses for
> rejecting a patch. Or if we must use that excuse, then I think a link
> to the relevant discussion is a must, and the relevant discussion had
> better reflect the fact that $TOPIC was in fact rejected. It seems to
> me that in at least 50% of cases, someone comes back and says one of
> the following things:
>
> 1. I searched the archives and could find no discussion along those lines.
> 2. I read that discussion and it doesn't appear to me that it reflects
> a rejection of this idea. Instead what people seemed to be saying was
> X.
> 3. At the time that might have been true, but what has changed in the
> meanwhile is X.

Agreed. Perhaps we need an anti-TODO that lists, in more detail, the
things we don't want. The TODO has that for a few items, but scaling
things up there will be cumbersome.

I agree that it's ideal for the person saying something was rejected to
find the email discussion --- if they can't find it, odds are the patch
author won't be able to find it either.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-24 19:48:33
Message-ID: 1293220113.30276.180.camel@jd-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> 3. At the time that might have been true, but what has changed in the
>> meanwhile is X.
>
> Agreed. Perhaps we need an anti-TODO that lists, in more detail, the
> things we don't want. The TODO has that for a few items, but scaling
> things up there will be cumbersome.
>

Well, there is a problem with this too. A good example is hints: a lot
of the community wants them, and a lot of the community doesn't. The
community changes as we get more mature and gain more hackers. It isn't
hard to point to dozens of items we have now that would have been on
that list five years ago.

> I agree that it's ideal for the person saying something was rejected to
> find the email discussion --- if they can't find it, odds are the patch
> author won't be able to find it either.

I would have to agree here. The idea that we have to search email is bad
enough (issue/bug/feature tracker anyone?) but to have someone say,
search the archives? That is just plain rude and anti-community.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: jd(at)commandprompt(dot)com
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-24 23:26:43
Message-ID: AANLkTik+gdwG=TxknrYCg2fgwB5b-XdM-X-jNFgt8Dw+@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:

> I would have to agree here. The idea that we have to search email is bad
> enough (issue/bug/feature tracker anyone?) but to have someone say,
> search the archives? That is just plain rude and anti-community.

Saying "search the bugtracker" is no less rude than "search the archives"...

And most of the bugtrackers I've had to search have way *less*
ease-of-use for searching than a good mailing list archive (I tend to
keep going back to gmane's search)

a.

--
Aidan Van Dyk                                             Create like a god,
aidan(at)highrise(dot)ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: jd(at)commandprompt(dot)com, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-24 23:37:26
Message-ID: 4D152EB6.6000407@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/24/2010 06:26 PM, Aidan Van Dyk wrote:
> On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake<jd(at)commandprompt(dot)com> wrote:
>
>> I would have to agree here. The idea that we have to search email is bad
>> enough (issue/bug/feature tracker anyone?) but to have someone say,
>> search the archives? That is just plain rude and anti-community.
> Saying "search the bugtracker" is no less rude than "search the archives"...
>
> And most of the bugtrackers I've had to search have way *less*
> ease-of-use for searching than a good mailing list archive (I tend to
> keep going back to gmane's search)
>
>

It's deja vu all over again. See mailing list archives for details.

cheers

andrew


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-24 23:46:07
Message-ID: 1293234367.30276.184.camel@jd-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2010-12-24 at 18:26 -0500, Aidan Van Dyk wrote:
> On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
>
> > I would have to agree here. The idea that we have to search email is bad
> > enough (issue/bug/feature tracker anyone?) but to have someone say,
> > search the archives? That is just plain rude and anti-community.
>
> Saying "search the bugtracker" is no less rude than "search the archives"...
>
> And most of the bugtrackers I've had to search have way *less*
> ease-of-use for searching than a good mailing list archive (I tend to
> keep going back to gmane's search)

I think you kind of missed my point.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-25 02:11:12
Message-ID: 297DA80E-01A2-4263-8489-FD7C69F8D29F@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 24, 2010, at 10:52 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Agreed. Perhaps we need an anti-TODO that lists, in more detail, the
> things we don't want. The TODO has that for a few items, but scaling
> things up there will be cumbersome.

I don't really think that'd be much better. What might be of some value
is summaries of previous discussions, *with citations*: "Foo seems like
it would be useful [1,2,3], but there are concerns about bar [4,5] and
baz [6]."

...Robert


From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, jd(at)commandprompt(dot)com, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-25 18:14:40
Message-ID: 20101225181440.GA907@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 24, 2010 at 06:37:26PM -0500, Andrew Dunstan wrote:
> On 12/24/2010 06:26 PM, Aidan Van Dyk wrote:
> >On Fri, Dec 24, 2010 at 2:48 PM, Joshua D. Drake<jd(at)commandprompt(dot)com> wrote:
> >
> >>I would have to agree here. The idea that we have to search email
> >>is bad enough (issue/bug/feature tracker anyone?) but to have
> >>someone say, search the archives? That is just plain rude and
> >>anti-community.
> >Saying "search the bugtracker" is no less rude than "search the
> >archives"...
> >
> >And most of the bugtrackers I've had to search have way *less*
> >ease-of-use for searching than a good mailing list archive (I tend
> >to keep going back to gmane's search)
>
> It's deja vu all over again. See mailing list archives for details.

LOL!

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, marcin mank <marcin(dot)mank(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-26 03:13:23
Message-ID: AANLkTik_-7HATrAYDTwuhgpx03vbb30f8OeN7H+4VWEo@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 6, 2010 at 7:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> >> However, if you were doing something like parallel pg_dump you could
> >> just run the parent and child instances all against the slave, so the
> >> pg_dump scenario doesn't seem to offer much of a supporting use-case for
> >> worrying about this. When would you really need to be able to do it?
>
> > If you had several standbys, you could distribute the work of the
> > pg_dump among them. This would be a huge speedup for a large database,
> > potentially, thanks to parallelization of I/O and network. Imagine
> > doing a pg_dump of a 300GB database in 10min.
>
> That does sound kind of attractive. But to do that I think we'd have to
> go with the pass-the-snapshot-through-the-client approach. Shipping
> internal snapshot files through the WAL stream doesn't seem attractive
> to me.
>
> While I see Robert's point about preferring not to expose the snapshot
> contents to clients, I don't think it outweighs all other considerations
> here; and every other one is pointing to doing it the other way.
>
>
How about this: the publishing transaction puts the snapshot in a (new)
system table and passes a UUID to its children, and the joining
transactions look up that UUID in the system table using a dirty
snapshot (SnapshotAny) via a security-definer function owned by a
superuser.

No shared memory would be used, and if the table is WAL-logged, the
snapshot would get to the slaves too.

I realize SnapshotAny wouldn't be sufficient, since we want the tuple to
become invisible when the publishing transaction ends (commit/rollback);
hence something akin to a (new) HeapTupleSatisfiesStillRunning() would be
needed.
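
To make the shape of this concrete, here is a minimal sketch of the SQL
surface (all object and function names here are hypothetical, and the
dirty-snapshot scan and the new visibility routine would of course be C
code in the backend, not something expressible in plain SQL):

  -- Hypothetical system table holding published snapshots.
  CREATE TABLE pg_published_snapshot (
      snap_uuid  uuid PRIMARY KEY,
      snap_data  text NOT NULL  -- serialized xmin/xmax/xip list
  );

  -- The publisher INSERTs its snapshot here inside its still-open
  -- transaction and hands snap_uuid to the joining backends. Under
  -- normal MVCC rules that row is invisible to everyone else, which is
  -- why the lookup must scan with a dirty snapshot; making the wrapper
  -- SECURITY DEFINER and superuser-owned keeps ordinary users from
  -- reading the table directly.
  CREATE FUNCTION pg_join_snapshot(p_uuid uuid) RETURNS text
      AS 'MODULE_PATHNAME', 'pg_join_snapshot'
      LANGUAGE C STRICT SECURITY DEFINER;

The C code behind pg_join_snapshot() would scan the table with the
HeapTupleSatisfiesStillRunning()-style check mentioned above, so that
the row disappears the moment the publisher commits or aborts, and would
then install the deserialized snapshot in the calling backend.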

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh(dot)gurjeet(at){ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device