directory archive format for pg_dump

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: directory archive format for pg_dump
Date: 2010-11-14 23:48:07
Message-ID: AANLkTimUELTXwRSQDQNwxik_k1y3YcH1u-9NgHZqpi9e@mail.gmail.com
Lists: pgsql-hackers

This is the first of two patches for parallel pg_dump. In particular, this
patch adds a new pg_dump archive type which can save pg_dump data to a
directory, with each table/blob being a file so that several processes can
write to different files in parallel.
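
A minimal sketch of the one-file-per-table idea, not the code from the patch: the function names, the ".dat" suffix and the tab-separated row format are all assumptions made for illustration, and threads stand in here for the separate processes described above. The point is only that no two workers ever share an output file.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def dump_table(directory, name, rows):
    """Write one table's rows to its own file and return the file path."""
    # Hypothetical layout: <directory>/<table name>.dat, one row per line.
    path = os.path.join(directory, name + ".dat")
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write("\t".join(str(value) for value in row) + "\n")
    return path


def dump_all(directory, tables, workers=4):
    """Dump every table concurrently; each worker owns exactly one file."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(dump_table, directory, name, rows)
            for name, rows in tables.items()
        ]
        # result() re-raises any exception from a worker.
        return sorted(future.result() for future in futures)
```

Because every worker writes to a distinct path, no locking is needed around the output files themselves.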

Since the compression code currently lives entirely in the custom format
backup code, the first thing I did was refactor the compression functions
into a separate file. While at it, I added support for liblzf compression.
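
To illustrate what pulling compression out behind a common interface can look like, here is a rough sketch. It is not the patch's C API: the class names, the COMPRESSORS registry and get_compressor() are hypothetical, and only "none" and zlib are shown because lzf is not in the Python standard library.

```python
import zlib


class NoCompression:
    """Pass-through compressor, for uncompressed archives."""

    def compress(self, data: bytes) -> bytes:
        return data

    def decompress(self, data: bytes) -> bytes:
        return data


class ZlibCompression:
    """zlib-backed compressor with a configurable level (1 = fastest)."""

    def __init__(self, level: int = 6):
        self.level = level

    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data, self.level)

    def decompress(self, data: bytes) -> bytes:
        return zlib.decompress(data)


# Hypothetical registry: another algorithm would be added here, and the
# archive format code would never need to know which one is in use.
COMPRESSORS = {"none": NoCompression, "zlib": ZlibCompression}


def get_compressor(name: str, **options):
    """Look up a compressor by name and construct it with the given options."""
    try:
        cls = COMPRESSORS[name]
    except KeyError:
        raise ValueError(f"unknown compression algorithm: {name}") from None
    return cls(**options)
```

An archive writer would then call get_compressor("zlib", level=1) once and use only compress() and decompress() from there on.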

Writing the backup to a directory has the disadvantage that your backup now
consists of a bunch of files, and you have to make sure not to lose files or
mix files from different backup sets. Therefore, I have added a -k switch
that checks whether a directory backup set is complete. To do this, every
backup has a different id (basically a random md5sum) which is copied into
every file (both TOC and data files). The TOC also knows the size of each
data file and can check whether it has been truncated for some reason.
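
The completeness check can be sketched roughly as follows. This is only an illustration under assumptions: the on-disk layout (a "toc.json" file, the id stored as the first 32 bytes of each data file) and all function names are invented for the example and are not the format used by the patch.

```python
import hashlib
import json
import os

ID_LEN = 32  # length of an md5 hex digest


def new_backup_id():
    """Return a random md5 hex digest identifying one backup set."""
    return hashlib.md5(os.urandom(16)).hexdigest()


def write_backup(directory, files):
    """Write each payload prefixed with the backup id, then write the TOC."""
    backup_id = new_backup_id()
    sizes = {}
    for name, payload in files.items():
        with open(os.path.join(directory, name), "wb") as f:
            f.write(backup_id.encode("ascii") + payload)
        sizes[name] = ID_LEN + len(payload)
    # Hypothetical TOC: the id plus the expected size of every data file.
    with open(os.path.join(directory, "toc.json"), "w") as f:
        json.dump({"id": backup_id, "sizes": sizes}, f)
    return backup_id


def check_backup(directory):
    """Return a list of problems; an empty list means the set is complete."""
    with open(os.path.join(directory, "toc.json")) as f:
        toc = json.load(f)
    problems = []
    for name, expected_size in toc["sizes"].items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            problems.append(f"{name}: missing")
            continue
        with open(path, "rb") as f:
            file_id = f.read(ID_LEN).decode("ascii", "replace")
        if file_id != toc["id"]:
            problems.append(f"{name}: belongs to a different backup set")
        elif os.path.getsize(path) != expected_size:
            problems.append(f"{name}: size mismatch")
    return problems
```

Storing the id in every file is what catches mixed-up backup sets; the sizes recorded in the TOC are what catch truncation.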

Regarding lzf compression, the last discussion was here:

http://archives.postgresql.org/pgsql-hackers/2010-04/msg00442.php

I have included it so that there are actually multiple compression
algorithms to build a framework for, and to allow people to just compile and
run it and see what they get. In my tests, when I run a backup with lzf
compression, the postgres backend uses 100% of one CPU and pg_dump uses 15%
of another CPU. Running with zlib, however, gives me 100% pg_dump and 70%
postgres. Specifying the fastest zlib compression level of 1 gives me 50%
pg_dump and 100% postgres. lzf compression can be taken out of the code in
about two minutes, it's all in #ifdef's, so please see lzf just as an
optional addition to the directory patch instead of as a main feature.

I am also submitting a WIP patch that shows the parallel version of pg_dump,
which is a patch on top of this one. It is not completely ready yet, but I am
releasing it as a WIP patch so you can see the overall picture and can
already play with it. Hopefully I can also get some feedback on whether I am
going in the right direction.

There is a small shell script included (test.sh) listing some of the
commands, to give people a quick overview of how to call it.

Joachim

Attachment Content-Type Size
pg_dump-directory.diff text/x-patch 110.2 KB
