Re: WIP patch for parallel pg_dump

From: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for parallel pg_dump
Date: 2010-12-06 06:15:45
Message-ID: AANLkTimSDtnQUEpNq0QVvWojXb_5F=wzBCeUxKb_CakL@mail.gmail.com
Lists: pgsql-hackers

Thank you Joachim;

Yes, and the current patch requires that the original (publisher)
transaction stay alive, to prevent RecentXmin from being updated.

I hope this restriction is acceptable if publishing/subscribing is
provided via functions, not statements.

Cheers;
----------
Koichi Suzuki

2010/12/6 Joachim Wieland <joe(at)mcknight(dot)de>:
> On Sun, Dec 5, 2010 at 9:27 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sun, Dec 5, 2010 at 9:04 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> Why not just say give me the snapshot currently held by process nnnn?
>>>
>>> And please, not temp files if possible.
>>
>> As far as I'm aware, the full snapshot doesn't normally exist in
>> shared memory, hence the need for publication of some sort.  We could
>> dedicate a shared memory region for publication but then you have to
>> decide how many slots to allocate, and any number you pick will be too
>> many for some people and not enough for others, not to mention that
>> shared memory is a fairly precious resource.
>
> So here is a patch that I have been playing with. I wrote it a while
> back, and thanks go to Koichi Suzuki for his helpful comments. I have
> not published it earlier because I haven't worked on it recently, and
> from the discussion that I brought up in March I got the feeling that
> people are fine with having a first version of parallel dump without
> synchronized snapshots.
>
> I am not really sure whether what the patch does is sufficient, or
> whether it does it in the right way, but I hope that it can serve as
> a basis to collect ideas (and doubts).
>
> My idea is much the same as Robert's: publishing snapshots and
> subscribing to them. The patch even uses these words.
>
> Basically the idea is that a transaction in isolation level
> serializable can publish a snapshot and as long as this transaction is
> alive, its snapshot can be adopted by other transactions. Requiring
> the publishing transaction to be serializable guarantees that the copy
> of the snapshot in shared memory is always current. When the
> transaction ends, the copy of the snapshot is also invalidated and
> cannot be adopted anymore. So instead of doing explicit checks, the
> patch aims at always having a reference transaction around that
> guarantees validity of the snapshot information in shared memory.
>
> The patch currently creates a new area in shared memory to store
> snapshot information but we can certainly discuss this... I had a GUC
> in mind that can control the number of available "slots", similar to
> max_prepared_transactions. Snapshot information can become quite
> large, especially with a high number of max_connections.
>
> Known limitations: the patch is completely unaware of prepared
> transactions and doesn't check whether both backends belong to the
> same user.
>
>
> Joachim
>
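
For readers following the thread, the publish/subscribe scheme described in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not the patch's code: the names `Snapshot`, `SnapshotSlots`, `publish`, `subscribe`, `end_transaction` and `max_slots` are all hypothetical, the snapshot is reduced to three fields, and a plain Python dict stands in for the shared memory area.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Snapshot:
    """Simplified, hypothetical stand-in for a snapshot."""
    xmin: int                    # oldest transaction id still considered running
    xmax: int                    # first transaction id not yet assigned
    in_progress: FrozenSet[int]  # transaction ids running when the snapshot was taken


class SnapshotSlots:
    """Hypothetical fixed-size slot table standing in for the shared memory area."""

    def __init__(self, max_slots: int) -> None:
        # max_slots plays the role of the GUC mentioned in the message (assumption).
        self.max_slots = max_slots
        self._slots: Dict[int, Snapshot] = {}  # keyed by publisher transaction id

    def publish(self, publisher_xid: int, snapshot: Snapshot,
                serializable: bool) -> None:
        # Only a serializable publisher keeps one snapshot for its whole
        # lifetime, so the published copy cannot go stale while it is alive.
        if not serializable:
            raise ValueError("publisher must run in isolation level serializable")
        if publisher_xid not in self._slots and len(self._slots) >= self.max_slots:
            raise RuntimeError("no free snapshot slot")
        self._slots[publisher_xid] = snapshot

    def subscribe(self, publisher_xid: int) -> Snapshot:
        # Adoption works only while the publishing transaction is still alive.
        snapshot = self._slots.get(publisher_xid)
        if snapshot is None:
            raise LookupError("snapshot not published or publisher has ended")
        return snapshot

    def end_transaction(self, publisher_xid: int) -> None:
        # Ending the publisher invalidates its published snapshot.
        self._slots.pop(publisher_xid, None)
```

In this sketch no explicit validity check is made at subscribe time; a slot simply exists for exactly as long as its publishing transaction does, which mirrors the "reference transaction" idea in the message. The fixed `max_slots` limit also shows the sizing trade-off Robert raised.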
