Re: directory archive format for pg_dump

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 17:48:06
Message-ID: 4D0A50D6.5070602@enterprisedb.com
Lists: pgsql-hackers

On 16.12.2010 17:23, Heikki Linnakangas wrote:
> On 16.12.2010 12:12, Greg Smith wrote:
>> There's a number of small things that I'd like to see improved in new
>> rev of this code
>> ...
>
> In addition to those:
>...

One more thing: the motivation behind this patch is to allow parallel
pg_dump in the future, so we should make sure this patch caters well
for that.

As soon as we have parallel pg_dump, the next big thing is going to be
parallel dump of the same table using multiple processes. Perhaps we
should prepare for that in the directory archive format, by allowing the
data of a single table to be split into multiple files. That way
parallel pg_dump is simple: you just split the table into chunks of
roughly the same size, say 10GB each, and launch a process for each
chunk, each writing to a separate file.
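The proposal above does not say how chunk boundaries would be chosen. Here is a minimal sketch of the planning step in Python. It rests on two assumptions that are not part of the patch: the table is split by heap block ranges, and each chunk goes to a file named `<oid>.dat.<n>`. The names `plan_chunks` and `Chunk` are likewise hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One slice of a table, dumped by one worker into its own file."""
    index: int
    start_block: int  # first heap block of the slice (inclusive)
    end_block: int    # end of the slice (exclusive)
    filename: str     # hypothetical naming scheme: <oid>.dat.<n>


def plan_chunks(table_oid: int, total_blocks: int, block_size: int,
                target_chunk_bytes: int) -> list[Chunk]:
    """Split a table into chunks of roughly equal size.

    Chunk sizes differ by at most one block, so work is balanced across
    workers, and no chunk exceeds target_chunk_bytes as long as a single
    block fits within the target.
    """
    if total_blocks <= 0:
        return []
    blocks_per_target = max(1, target_chunk_bytes // block_size)
    n_chunks = math.ceil(total_blocks / blocks_per_target)
    # Spread the remainder over the first `extra` chunks, one block each.
    base, extra = divmod(total_blocks, n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(Chunk(i, start, start + size, f"{table_oid}.dat.{i}"))
        start += size
    return chunks


if __name__ == "__main__":
    # A 25 GiB table of 8 kB blocks, with a 10 GiB target per chunk.
    for c in plan_chunks(16384, 25 * 131072, 8192, 10 * 1024**3):
        print(c)
```

Each worker would then dump only its own block range into its own file, and restore would read the files in index order. This sketch leaves open how a worker restricts its scan to a block range. Capping the chunk size also addresses the per-file size limits some filesystems impose.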

It should be quite a simple add-on to the current patch, but it will
make life much easier for parallel pg_dump. It would also help work
around file size limits on some filesystems.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
