Re: directory archive format for pg_dump

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 18:33:10
Message-ID: AANLkTim8vp_iEvzmWwnOG9--3m3sw3gr_zuMWyK31PkP@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> As soon as we have parallel pg_dump, the next big thing is going to be
> parallel dump of the same table using multiple processes. Perhaps we should
> prepare for that in the directory archive format, by allowing the data of a
> single table to be split into multiple files. That way parallel pg_dump is
> simple, you just split the table in chunks of roughly the same size, say
> 10GB each, and launch a process for each chunk, writing to a separate file.

How exactly would you "just split the table in chunks of roughly the
same size"? Which queries should pg_dump send to the backend? If it
just sends a bunch of queries with WHERE clauses, the server would
still scan the same data several times, since each pg_dump client's
query would result in a seqscan over the full table.
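To make the cost concrete, here is a minimal sketch of the "bunch of WHERE queries" approach. It assumes a hypothetical table with an integer key column whose bounds are known, and splits the key range into n_chunks contiguous ranges, one query per pg_dump worker. Without a usable index each of those queries is a full seqscan, so the pages read grow with the number of chunks rather than staying at one pass over the table.

```python
# Hypothetical sketch: split a table on an integer key column into
# n_chunks contiguous ranges, one query per pg_dump worker.
# The table name, key column and key bounds are assumptions.

def chunk_queries(table: str, key: str, lo: int, hi: int, n_chunks: int) -> list[str]:
    """Return one SELECT per chunk covering the integer keys lo..hi (inclusive)."""
    if n_chunks < 1 or hi < lo:
        raise ValueError("need n_chunks >= 1 and hi >= lo")
    span = hi - lo + 1
    # Boundaries are spread as evenly as integer division allows; the last
    # boundary is hi + 1, so every key in lo..hi lands in exactly one chunk.
    bounds = [lo + span * i // n_chunks for i in range(n_chunks + 1)]
    return [
        f"SELECT * FROM {table} WHERE {key} >= {bounds[i]} AND {key} < {bounds[i + 1]}"
        for i in range(n_chunks)
    ]


def pages_read_without_index(table_pages: int, n_chunks: int) -> int:
    """Without a usable index, every chunk query is a full seqscan of the table."""
    return table_pages * n_chunks


if __name__ == "__main__":
    for q in chunk_queries("big_table", "id", 1, 100, 4):
        print(q)
    # Four workers over a 1000-page table read 4000 pages instead of 1000.
    print(pages_read_without_index(1000, 4))
```

Note that even-width key ranges only give equally sized chunks if the keys are evenly distributed, and the sketch needs a numeric key in the first place.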

Ideally, pg_dump should be able to query for all the data in a single
relation segment, so that each segment is scanned by only one backend
process. However, this requires backend support, and we would be
sending queries that we would not want clients other than pg_dump to
send...
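As a rough illustration of the per-segment idea, the sketch below maps a relation's blocks onto its segment files so that each worker could be handed exactly one segment. It assumes the default build settings (8 kB blocks, 1 GB segment files, hence 131072 blocks per segment) and takes the relation size in blocks as input, for example pg_relation_size() divided by the block size. How the backend would accept "scan only this block range" is deliberately left open here; the function name is hypothetical.

```python
# Sketch: map a relation's blocks onto its on-disk segment files so that each
# worker could be assigned exactly one segment. Assumes default build settings
# (8 kB blocks, 1 GB segments); the backend interface for scanning a block
# range is not modelled here.

BLCKSZ = 8192                                   # default block size in bytes
SEGMENT_BYTES = 1024 ** 3                       # default 1 GB segment file
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLCKSZ    # 131072 blocks per segment


def segment_block_ranges(rel_pages: int, blocks_per_segment: int = BLOCKS_PER_SEGMENT):
    """Return half-open (first_block, end_block) ranges, one per segment file."""
    return [
        (start, min(start + blocks_per_segment, rel_pages))
        for start in range(0, rel_pages, blocks_per_segment)
    ]


if __name__ == "__main__":
    # A relation of 300000 blocks spans three segment files; the last is partial.
    for first, end in segment_block_ranges(300000):
        print(first, end)
```

Splitting on physical segments sidesteps the question of indexes and key types entirely, which is why it needs cooperation from the backend rather than plain SQL.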

If you were thinking about WHERE clauses to get equally sized
partitions, how would we deal with unindexed and/or non-numeric data
in a large table?

Joachim
