Re: directory archive format for pg_dump

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 22:55:34
Message-ID: 4D0A98E6.7090606@dunslane.net
Lists: pgsql-hackers

On 12/16/2010 03:52 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 12/16/2010 03:13 PM, Robert Haas wrote:
>>> So how bad would it be if we committed this new format without support
>>> for splitting large relations into multiple files, or with some stub
>>> support that never actually gets used, and fixed this later? Because
>>> this is starting to sound like a bigger project than I think we ought
>>> to be requiring for this patch.
>> I don't think we have to have that in the first go at all. Parallel dump
>> could be extremely useful without it. I haven't looked closely, but I
>> assume there will still be an archive version recorded somewhere. When
>> we change the archive format, bump the version number.
> Sure, but it's worth thinking about the feature now. If there are
> format tweaks to be made, it might be less painful to make them now
> instead of later, even if actual support for the feature isn't there.
> (I agree I don't want to try to implement it just yet.)
>
>

Yeah, OK. Well, time is getting short, but (hand waving wildly) I think
we could probably get by with just adding a member to the TOC for the
section number of the entry (set it to 0 for non-TABLE DATA TOC
entries). The section number could be built into the file name in
directory format. For now that number would always be 1 for TABLE DATA
members.
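
To make that concrete, here is a rough sketch of the idea, not code from the patch. The struct TocEntrySketch, the functions set_section() and data_file_name(), and the "<dump_id>.<section>.dat" naming pattern are all made up for illustration; the real TOC entry struct and the directory format's actual file naming may look quite different.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a TOC entry (not pg_dump's real struct). */
typedef struct
{
    int         dump_id;    /* identifier of this TOC entry */
    const char *desc;       /* entry type, e.g. "TABLE DATA" or "INDEX" */
    int         section;    /* 0 for non-TABLE DATA entries, otherwise >= 1 */
} TocEntrySketch;

/*
 * Apply the rule described above: TABLE DATA entries get section 1
 * (the only value used for now), everything else gets 0.
 */
void
set_section(TocEntrySketch *te)
{
    assert(te != NULL && te->desc != NULL);
    te->section = (strcmp(te->desc, "TABLE DATA") == 0) ? 1 : 0;
}

/*
 * Build a data file name that embeds the section number, using the
 * assumed pattern "<dump_id>.<section>.dat".
 *
 * Returns the length of the name, or -1 if the entry has no data
 * section or the buffer is too small.
 */
int
data_file_name(const TocEntrySketch *te, char *buf, size_t buflen)
{
    int         n;

    assert(te != NULL && buf != NULL);

    if (te->section < 1)
        return -1;          /* non-TABLE DATA entries have no data file here */

    n = snprintf(buf, buflen, "%d.%d.dat", te->dump_id, te->section);
    if (n < 0 || (size_t) n >= buflen)
        return -1;          /* encoding error or truncated name */

    return n;
}
```

With something like this, a later archive version that splits a table into several data members would only need to hand out section numbers above 1; the TOC layout and the file naming would not have to change again.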

This has intriguing possibilities for parallel restore of custom format
dumps too. It could be very useful to be able to restore a single table
in parallel, if we had more than one TABLE DATA member per table.

I'm deliberately just addressing infrastructure issues rather than how
we actually generate multiple sections of data for a single table
(especially if we want to do that in parallel).

cheers

andrew
