Re: directory archive format for pg_dump

From: José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-11-19 21:28:03
Message-ID: AANLkTi=AEi3F6jC0Zy3dG9njiwJJC8Y6V090-hMonyyn@mail.gmail.com
Lists: pgsql-hackers

Hi Dimitri and Joachim.

I've looked at the patch as well, and I want to share some thoughts. I
used http://wiki.postgresql.org/wiki/Reviewing_a_Patch to guide my
review.

Submission review:

I've applied and compiled the patch successfully against the current master.

Usability review:

In my database, the dir format generated 60 files of different sizes,
which looks very confusing. Is it possible to use the same trick as
pigz and pbzip2, creating a single concatenated file of streams?

Feature test:

Just a partial review: I can dump and restore using lzf, but I didn't
stress it hard enough to check robustness.

Performance review:

Didn't test it hard either, but it looks ok.

Coding review:

Just a shallow review here.

>> I think I'd like to see a separate patch for the new compression
>> support. Sorry about that, I realize that's extra work…

Same feeling here; this was the first thing I noticed.

The md5.c and kwlookup.c reuse via a link doesn't look nice either.
This way you need to compile twice, among other things, but I think
that it's temporary, right?

--
José Arthur Benetasso Villanova


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: directory archive format for pg_dump
Date: 2010-11-19 21:56:23
Message-ID: 1290203622-sup-4947@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from José Arthur Benetasso Villanova's message of Fri Nov 19 18:28:03 -0300 2010:

> The md5.c and kwlookup.c reuse via a link doesn't look nice either.
> This way you need to compile twice, among other things, but I think
> that it's temporary, right?

Not sure what you mean here, but kwlookup.c is a symlink without this
patch too. It's just the way it works; the compilation environments
here and in the backend are different, so there is no other option but
to compile twice. I guess md5.c is a new one (I didn't check), but I
would assume it's the same thing.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-11-20 04:10:37
Message-ID: AANLkTik+PR2HCsqS7PVAdETPyKSGn2hb-eTeg9QZpRew@mail.gmail.com
Lists: pgsql-hackers

Hi Jose,

2010/11/19 José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>:
> In my database, the dir format generated 60 files of different sizes,
> which looks very confusing. Is it possible to use the same trick as
> pigz and pbzip2, creating a single concatenated file of streams?

What pigz is parallelizing is the actual computation of the compressed
data. The directory archive format however is a preparation for a
parallel pg_dump, dumping several tables (especially large tables of
course) in parallel via multiple database connections and multiple
pg_dump frontends. The idea of multiplexing their output into one file
has been rejected on the grounds that it would probably slow down the
whole process.

Nevertheless, pigz could be implemented as an alternative compression
algorithm, so that both the custom and the directory archive formats
could use it. But here as well, license and patent questions might be
in the way, even though it is based on libz.

> The md5.c and kwlookup.c reuse via a link doesn't look nice either.
> This way you need to compile twice, among other things, but I think
> that it's temporary, right?

No, it isn't. md5.c is used in the same way by e.g. libpq, and there
are other examples of links in core; check out src/bin/psql for
example.

Joachim


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-11-22 16:49:45
Message-ID: 4CEA9F29.5060106@enterprisedb.com
Lists: pgsql-hackers

On 20.11.2010 06:10, Joachim Wieland wrote:
> 2010/11/19 José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>:
>> The md5.c and kwlookup.c reuse via a link doesn't look nice either.
>> This way you need to compile twice, among other things, but I think
>> that it's temporary, right?
>
> No, it isn't. md5.c is used in the same way by e.g. libpq, and there
> are other examples of links in core; check out src/bin/psql for
> example.

It seems like overkill to include md5 just for hashing the random bytes
that getRandomData() generates. And if random() doesn't produce unique
values, it's not going to get better by hashing it. How about using a
timestamp instead of the hash?

If you don't initialize random() with srandom(), BTW, it will always
return the same value.
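
Something along these lines would do; a minimal sketch, with makeDumpId
as a made-up name rather than anything in the patch:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* sketch: identify a dump by its timestamp, with random() as a tiebreaker */
static void
makeDumpId(char *buf, size_t buflen)
{
    /* seed first, or random() returns the same sequence on every run */
    srandom((unsigned int) time(NULL));
    snprintf(buf, buflen, "%ld.%06ld",
             (long) time(NULL), random() % 1000000);
}

int
main(void)
{
    char    id[64];

    makeDumpId(id, sizeof(id));
    printf("dump id: %s\n", id);
    return 0;
}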

But I'm not actually sure we should be preventing mix & match of files
from different dumps. It might be very useful to do just that sometimes,
like restoring a recent backup, with the contents of one table replaced
with older data. A warning would be ok, though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-11-22 17:07:03
Message-ID: 28890.1290445623@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> But I'm not actually sure we should be preventing mix & match of files
> from different dumps. It might be very useful to do just that sometimes,
> like restoring a recent backup, with the contents of one table replaced
> with older data. A warning would be ok, though.

+1. This mechanism seems like a solution in search of a problem.
Just lose the whole thing, and instead fix pg_dump to complain if
the target directory isn't empty. That should be sufficient to guard
against accidental mixing of different dumps, and as Heikki says
there's not a good reason to prevent intentional mixing.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-11-22 20:44:10
Message-ID: 4CEAD61A.6060207@enterprisedb.com
Lists: pgsql-hackers

On 22.11.2010 19:07, Tom Lane wrote:
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> But I'm not actually sure we should be preventing mix & match of files
>> from different dumps. It might be very useful to do just that sometimes,
>> like restoring a recent backup, with the contents of one table replaced
>> with older data. A warning would be ok, though.
>
> +1. This mechanism seems like a solution in search of a problem.
> Just lose the whole thing, and instead fix pg_dump to complain if
> the target directory isn't empty. That should be sufficient to guard
> against accidental mixing of different dumps, and as Heikki says
> there's not a good reason to prevent intentional mixing.

Extending that thought a bit, it would be nice if the per-file header
carried the information on whether the file is compressed, instead of
just one such flag in the TOC. You could then also mix & match files
from compressed and non-compressed archives.

Other than the md5 thing, the patch looks fine to me. There are quite a
few levels of indirection; it took me a while to get my head around call
chains like DataDumper->_WriteData->WriteDataToArchive->_WriteBuf, but I
don't have any ideas on how to improve that.

However, docs are missing, so I'm marking this as "Waiting on Author".

There are some cosmetic changes I'd like to have fixed, or to do myself,
before committing:

* wrap long lines
* use extern in function prototypes in header files
* "inline" some functions like _StartDataCompressor, _EndDataCompressor,
_DoInflate/_DoDeflate that aren't doing anything but call some other
function.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-11-29 05:11:46
Message-ID: AANLkTi=neaBqvLa2ZnDpuAMSen3P4ZHE8__Q7L2pmjD0@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 3:44 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> * wrap long lines
> * use extern in function prototypes in header files
> * "inline" some functions like _StartDataCompressor, _EndDataCompressor,
> _DoInflate/_DoDeflate  that aren't doing anything but call some other
> function.

So here is a new round of patches. It turned out that the feature of
restoring files from a different dump with a different compression
required some changes in the compressor API. And in the
end I didn't like all the #ifdefs either and made a less #ifdef-rich
version using function pointers. The downside now is that I have
created quite a few one-line functions that Heikki doesn't like all
that much, but I assume that they are okay in this case on the grounds
that the public compressor interface is calling the private
implementation of a certain compressor.
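
The scheme looks roughly like this; a simplified sketch, not the patch
verbatim:

#include <stddef.h>

typedef struct CompressorState CompressorState;

/* the private implementation of one compressor */
typedef struct CompressorOps
{
    void    (*writeData) (CompressorState *cs, const void *data, size_t len);
    void    (*end) (CompressorState *cs);
} CompressorOps;

struct CompressorState
{
    const CompressorOps *ops;   /* picked once, at allocation time */
    void    *private_data;
};

/* public interface: a one-line wrapper around the private implementation */
void
WriteDataToArchive(CompressorState *cs, const void *data, size_t len)
{
    cs->ops->writeData(cs, data, len);
}

This way the #ifdefs and the switch on the algorithm are confined to the
allocation step that fills in ops.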

Joachim

Attachment Content-Type Size
pg_dump-compression-refactor.diff text/x-patch 37.6 KB
pg_dump-directory.diff text/x-patch 68.9 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-11-29 15:49:16
Message-ID: 4CF3CB7C.8050104@enterprisedb.com
Lists: pgsql-hackers

On 29.11.2010 07:11, Joachim Wieland wrote:
> On Mon, Nov 22, 2010 at 3:44 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> * wrap long lines
>> * use extern in function prototypes in header files
>> * "inline" some functions like _StartDataCompressor, _EndDataCompressor,
>> _DoInflate/_DoDeflate that aren't doing anything but call some other
>> function.
>
> So here is a new round of patches. It turned out that the feature of
> restoring files from a different dump with a different compression
> required some changes in the compressor API. And in the
> end I didn't like all the #ifdefs either and made a less #ifdef-rich
> version using function pointers. The downside now is that I have
> created quite a few one-line functions that Heikki doesn't like all
> that much, but I assume that they are okay in this case on the grounds
> that the public compressor interface is calling the private
> implementation of a certain compressor.

Thanks, I'll take a look.

BTW, I know you wanted to have support for other compression algorithms;
I think the best way to achieve that is to make it possible to specify
an external command to be used for compression. pg_dump would fork() and
exec() that, and pipe the data to be compressed/decompressed to
stdin/stdout of the external command. We're not going to add support for
every new compression algorithm that's in vogue, but generic external
command support should make happy those who want it. I'd be particularly
excited about using something like pbzip2, to speed up the compression
on multi-core systems.
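
The plumbing would be something like the sketch below; popen() only
shows one direction for brevity (the real thing would fork()/exec() with
both stdin and stdout piped), and gzip merely stands in for whatever
command the user specifies:

#include <stdio.h>
#include <string.h>

int
main(void)
{
    const char *data = "table data to compress\n";
    /* pipe the stream through an external filter; its stdout goes to a file */
    FILE   *cmd = popen("gzip -c > /tmp/3173.dat.gz", "w");

    if (cmd == NULL)
        return 1;
    fwrite(data, 1, strlen(data), cmd);
    return pclose(cmd);
}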

That should be a separate patch, but it's something to keep in mind with
these refactorings.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-11-29 16:06:04
Message-ID: AANLkTikiJd1d1+1SJTNmDUwp6HJ2nTKinef8U6nuHEcj@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 29, 2010 at 10:49 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 29.11.2010 07:11, Joachim Wieland wrote:
>>
>> On Mon, Nov 22, 2010 at 3:44 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com>  wrote:
>>>
>>> * wrap long lines
>>> * use extern in function prototypes in header files
>>> * "inline" some functions like _StartDataCompressor, _EndDataCompressor,
>>> _DoInflate/_DoDeflate  that aren't doing anything but call some other
>>> function.
>>
>> So here is a new round of patches. It turned out that the feature of
>> restoring files from a different dump with a different compression
>> required some changes in the compressor API. And in the
>> end I didn't like all the #ifdefs either and made a less #ifdef-rich
>> version using function pointers. The downside now is that I have
>> created quite a few one-line functions that Heikki doesn't like all
>> that much, but I assume that they are okay in this case on the grounds
>> that the public compressor interface is calling the private
>> implementation of a certain compressor.
>
> Thanks, I'll take a look.
>
> BTW, I know you wanted to have support for other compression algorithms; I
> think the best way to achieve that is to make it possible to specify an
> external command to be used for compression. pg_dump would fork() and exec()
> that, and pipe the data to be compressed/decompressed to stdin/stdout of the
> external command. We're not going to add support for every new compression
> algorithm that's in vogue, but generic external command support should make
> happy those who want it. I'd be particularly excited about using something
> like pbzip2, to speed up the compression on multi-core systems.
>
> That should be a separate patch, but it's something to keep in mind with
> these refactorings.

That would also ease licensing concerns, since we wouldn't have to
redistribute or bundle anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-11-29 20:21:51
Message-ID: 4CF40B5F.1010407@enterprisedb.com
Lists: pgsql-hackers

On 29.11.2010 07:11, Joachim Wieland wrote:
> On Mon, Nov 22, 2010 at 3:44 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> * wrap long lines
>> * use extern in function prototypes in header files
>> * "inline" some functions like _StartDataCompressor, _EndDataCompressor,
>> _DoInflate/_DoDeflate that aren't doing anything but call some other
>> function.
>
> So here is a new round of patches. It turned out that the feature of
> restoring files from a different dump with a different compression
> required some changes in the compressor API. And in the
> end I didn't like all the #ifdefs either and made a less #ifdef-rich
> version using function pointers.

Ok. The separate InitCompressorState() and AllocateCompressorState()
functions seem unnecessary. As the code stands, there's little
performance gain from re-using the same CompressorState, just
re-initializing it, and I can't see any other justification for them either.

I combined those, and the Free/Flush steps, and did a bunch of other
editorializations and cleanups. Here's an updated patch, also available
in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch
"pg_dump-dir". I'm going to continue reviewing this later, tomorrow
hopefully.

> The downside now is that I have
> created quite a few one-line functions that Heikki doesn't like all
> that much, but I assume that they are okay in this case on the grounds
> that the public compressor interface is calling the private
> implementation of a certain compressor.

You could avoid the wrapper functions by calling the function pointers
directly, but I agree it seems neater the way you did it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
pg_dump-compression-refactor-2.diff text/x-diff 34.9 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-01 14:03:15
Message-ID: 4CF655A3.4090308@enterprisedb.com
Lists: pgsql-hackers

On 29.11.2010 22:21, Heikki Linnakangas wrote:
> On 29.11.2010 07:11, Joachim Wieland wrote:
>> On Mon, Nov 22, 2010 at 3:44 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> * wrap long lines
>>> * use extern in function prototypes in header files
>>> * "inline" some functions like _StartDataCompressor, _EndDataCompressor,
>>> _DoInflate/_DoDeflate that aren't doing anything but call some other
>>> function.
>>
>> So here is a new round of patches. It turned out that the feature of
>> restoring files from a different dump with a different compression
>> required some changes in the compressor API. And in the
>> end I didn't like all the #ifdefs either and made a less #ifdef-rich
>> version using function pointers.
>
> Ok. The separate InitCompressorState() and AllocateCompressorState()
> functions seem unnecessary. As the code stands, there's little
> performance gain from re-using the same CompressorState, just
> re-initializing it, and I can't see any other justification for them
> either.
>
> I combined those, and the Free/Flush steps, and did a bunch of other
> editorializations and cleanups. Here's an updated patch, also available
> in my git repository at
> git://git.postgresql.org/git/users/heikki/postgres.git, branch
> "pg_dump-dir". I'm going to continue reviewing this later, tomorrow
> hopefully.

Here's another update. I changed things quite heavily. I didn't see the
point of having the Alloc+Free functions for uncompressing, because
ReadDataFromArchive processes the whole input stream in one go anyway.
So the new API consists of four functions: AllocateCompressor,
WriteDataToArchive, and EndCompressor for writing, and
ReadDataFromArchive for reading.
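
In rough outline the interface now looks like this (a sketch of the
header, not copied verbatim from the patch):

#include <stddef.h>

typedef struct ArchiveHandle ArchiveHandle;     /* from pg_backup_archiver.h */
typedef struct CompressorState CompressorState; /* opaque */

/* raw I/O callbacks supplied by the archive format (custom or directory) */
typedef void (*WriteFunc) (ArchiveHandle *AH, const char *buf, size_t len);
typedef size_t (*ReadFunc) (ArchiveHandle *AH, char **buf, size_t *buflen);

/* writing: allocate, feed data repeatedly, then flush and free */
extern CompressorState *AllocateCompressor(int compression, WriteFunc writeF);
extern size_t WriteDataToArchive(ArchiveHandle *AH, CompressorState *cs,
                                 const void *data, size_t dLen);
extern void EndCompressor(ArchiveHandle *AH, CompressorState *cs);

/* reading: pulls the whole stream through readF and decompresses it */
extern void ReadDataFromArchive(ArchiveHandle *AH, int compression,
                                ReadFunc readF);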

Also, I reverted the zlib buffer size from 64k to 4k. If you want to
raise that, let's discuss that separately.

Please let me know what you think of this version, or if you spot any
bugs. I'll keep working on this; I'm hoping to get it into committable
shape by the end of the week.

The pg_backup_directory patch naturally won't apply over this anymore.
Once we have the compress_io part in shape, that will need to be fixed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-01 14:05:34
Message-ID: 4CF6562E.60506@enterprisedb.com
Lists: pgsql-hackers

On 01.12.2010 16:03, Heikki Linnakangas wrote:
> On 29.11.2010 22:21, Heikki Linnakangas wrote:
>> I combined those, and the Free/Flush steps, and did a bunch of other
>> editorializations and cleanups. Here's an updated patch, also available
>> in my git repository at
>> git://git.postgresql.org/git/users/heikki/postgres.git, branch
>> "pg_dump-dir". I'm going to continue reviewing this later, tomorrow
>> hopefully.
>
> Here's another update.

Forgot attachment. This is also available in the above git repo.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
pg_dump-compression-refactor-3.patch text/x-diff 32.1 KB

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-02 02:35:48
Message-ID: AANLkTinauQsCGt-yeyeN_QWuN+tao+q126rwKnO2QWLP@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 1, 2010 at 9:05 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Forgot attachment. This is also available in the above git repo.

I have quickly checked your modifications. On the one hand I like the
reduction of functions; I would have said that we have AH around all
the time, so we could just allocate once, stuff it all into ctx->cs,
and reuse the buffers for every object, but re-allocating them for
every (dumpable) object should be fine as well.

Regarding the function pointers that you removed, you are now putting
back in what Dimitri wanted me to take out, namely switch/case
instructions for the algorithms and then #ifdefs for every algorithm.
It's not too many now since we have taken out LZF. Well, I can live
with it either way.

There is one thing, however, that I am not in favor of: the removal of
the "sizeHint" parameter for the read functions. The reason for this
parameter is not very clear now without LZF, but I had tried to put in
a few comments to explain the situation (which you have taken out as
well :-) ).

The point is that zlib is a stream based compression algorithm, you
just stuff data in and from time to time you get data out and in the
end you explicitly flush the compressor. The read function can just
return as many bytes as it wants and we can just hand it all over to
zlib. Other compression algorithms however are block based and first
write a block header that contains the information on the next data
block, including uncompressed and compressed sizes. Now with the
sizeHint parameter I used, the compressor could tell the read function
that it just wants to read the fixed size header (6 bytes IIRC). In
the header it would look up the compressed size for the next block and
would then ask the read function to get exactly this amount of data,
decompress it and go on with the next block, and so forth...
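
In code, the loop I have in mind is roughly the following; the names and
the header layout are made up for illustration:

#include <stddef.h>

#define BLOCK_HEADER_LEN 6

typedef size_t (*ReadFunc) (void *userdata, char *buf, size_t sizeHint);

static void
readBlocks(ReadFunc readF, void *userdata)
{
    char    header[BLOCK_HEADER_LEN];
    char    compressed[65536];

    /* ask for exactly one fixed-size block header at a time */
    while (readF(userdata, header, BLOCK_HEADER_LEN) == BLOCK_HEADER_LEN)
    {
        /* the header announces the compressed size of the next block */
        size_t  compLen = ((size_t) (unsigned char) header[0] << 8) |
                          (unsigned char) header[1];

        /* ask for exactly that many bytes: nothing is left over to buffer */
        if (readF(userdata, compressed, compLen) != compLen)
            break;
        /* decompress_block(compressed, compLen) would go here */
    }
}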

Of course you can possibly do that memory management inside the
compressor with an extra buffer holding what you got in excess but
it's a pain. If you removed that part on purpose on the grounds that
there is no block based compression algorithm in core and probably
never will be, then that's okay :-)

Joachim


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-02 07:49:45
Message-ID: 4CF74F99.4050008@enterprisedb.com
Lists: pgsql-hackers

On 02.12.2010 04:35, Joachim Wieland wrote:
> There is one thing, however, that I am not in favor of: the removal of
> the "sizeHint" parameter for the read functions. The reason for this
> parameter is not very clear now without LZF, but I had tried to put in
> a few comments to explain the situation (which you have taken out as
> well :-) ).
>
> The point is that zlib is a stream based compression algorithm, you
> just stuff data in and from time to time you get data out and in the
> end you explicitly flush the compressor. The read function can just
> return as many bytes as it wants and we can just hand it all over to
> zlib. Other compression algorithms however are block based and first
> write a block header that contains the information on the next data
> block, including uncompressed and compressed sizes. Now with the
> sizeHint parameter I used, the compressor could tell the read function
> that it just wants to read the fixed size header (6 bytes IIRC). In
> the header it would look up the compressed size for the next block and
> would then ask the read function to get exactly this amount of data,
> decompress it and go on with the next block, and so forth...
>
> Of course you can possibly do that memory management inside the
> compressor with an extra buffer holding what you got in excess but
> it's a pain. If you removed that part on purpose on the grounds that
> there is no block based compression algorithm in core and probably
> never will be, then that's okay :-)

Yeah, we're not going to have lzf built-in anytime soon. The external
command approach seems like the best way to support additional
compression algorithms, and I don't think it could do anything with
sizeHint. And the custom format didn't obey sizeHint anyway, because it
reads one custom-format block at a time.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-02 19:52:27
Message-ID: 4CF7F8FB.1040305@enterprisedb.com
Lists: pgsql-hackers

Ok, committed, with some small cleanup since the last patch I posted.

Could you update the directory-format patch on top of the committed
version, please?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: directory archive format for pg_dump
Date: 2010-12-02 21:12:41
Message-ID: 1291324036-sup-7776@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Heikki Linnakangas's message of Thu Dec 02 16:52:27 -0300 2010:
> Ok, committed, with some small cleanup since the last patch I posted.

I think the comments on _ReadBuf and friends need to be updated, since
they are not just for headers and TOC stuff anymore. I'm not sure if
they were already outdated before your patch ...

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: directory archive format for pg_dump
Date: 2010-12-03 09:22:53
Message-ID: 4CF8B6ED.3090300@enterprisedb.com
Lists: pgsql-hackers

On 02.12.2010 23:12, Alvaro Herrera wrote:
> Excerpts from Heikki Linnakangas's message of Thu Dec 02 16:52:27 -0300 2010:
>> Ok, committed, with some small cleanup since the last patch I posted.
>
> I think the comments on _ReadBuf and friends need to be updated, since
> they are not just for headers and TOC stuff anymore. I'm not sure if
> they were already outdated before your patch ...

"These routines are only used to read & write headers & TOC"

Hmm, ReadInt calls _ReadByte, and PrintData used to call ReadInt, so it
was indirectly called for things other than headers and TOC already.
Unless you consider the "headers" to include the length integer in each
data block. I'm inclined to just remove that sentence.

I also note that the _Clone and _DeClone functions are a bit misplaced.
There's a big "END OF FORMAT CALLBACKS" earlier in the file, but _Clone
and _DeClone are such callbacks. I'll move them to the right place.

PS. Thanks for the cleanup you did yesterday.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-08 04:38:02
Message-ID: AANLkTik=3XBZq-PbC9CNTV1X6+-+Yd9gOdwf7rYjCpBp@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 2, 2010 at 2:52 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Ok, committed, with some small cleanup since the last patch I posted.
>
> Could you update the directory-format patch on top of the committed version,
> please?

Thanks for committing the first part. Here is the updated and rebased
directory-format patch.

Joachim

Attachment Content-Type Size
pg_dump-directory-rebased.diff.gz application/x-gzip 16.1 KB

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 10:12:52
Message-ID: 4D09E624.4030706@2ndquadrant.com
Lists: pgsql-hackers

Moving on to the directory archive part of this patch, the feature seems
to work as advertised; here's a quick test case:

createdb pgbench
pgbench -i -s 1 pgbench
pg_dump -F d -f test pgbench
pg_restore -k test
pg_restore -l test
createdb copy
pg_restore -d copy test

The copy made that way looked good. There's a good chunk of code in the
patch that revolves around BLOB support. We need to get someone who is
more familiar with those than me to suggest some tests for that part
before this gets committed. If you could suggest how to test that code,
that would be helpful.

There are a number of small things that I'd like to see improved in a
new rev of this code:

pg_dump: the help message for "--file" needs to mention that this is
overloaded to also specify the output directory.

pg_dump: the documentation for --file should say the directory is
created, and must not exist when you start. The code catches this well,
but that expectation is not clear until you try it.

pg_restore: the help message "check the directory archive" would be
clearer as "check an archive in directory format".

There are some tab vs. space whitespace inconsistencies in the
documentation added.

The comments at the beginning of functions could be more consistent.
Early parts of the code have a header for each function that's
extensive. Maybe even a bit more than needed. I'm not sure why it's
important to document here which of these functions is
optional/mandatory for example, and getting rid of just those would trim
a decent number of lines out of the patch. But then at the end, all of
the new functions added aren't documented at all. Some of those are
near trivial, but it would be better to have at least a small
descriptive header for them.

The comment header at the beginning of pg_backup_directory is a bit
weird. I guess Philip Warner should still be credited as the author of
the code this was based on, but it's weird seeing a new file
attributed solely to him. Also, there's an XXX in the identification
field there that should be filled in with the file name.

There's your feedback for this round. I hope we'll see an updated patch
from you as part of the next CommitFest.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 15:23:43
Message-ID: 4D0A2EFF.9020104@enterprisedb.com
Lists: pgsql-hackers

On 16.12.2010 12:12, Greg Smith wrote:
> Moving on to the directory archive part of this patch, the feature seems
> to work as advertised; here's a quick test case:
>
> createdb pgbench
> pgbench -i -s 1 pgbench
> pg_dump -F d -f test pgbench
> pg_restore -k test
> pg_restore -l test
> createdb copy
> pg_restore -d copy test
>
> The copy made that way looked good. There's a good chunk of code in the
> patch that revolves around BLOB support. We need to get someone who is
> more familiar with those than me to suggest some tests for that part
> before this gets committed. If you could suggest how to test that code,
> that would be helpful.
>
> There are a number of small things that I'd like to see improved in a
> new rev of this code:
> ...

In addition to those:

The "check" functionality seems orthogonal, it should be splitted off to
a separate patch. It would possibly be useful to be perform sanity
checks on an archive in custom format too, and the directory format
works just as well without it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 17:48:06
Message-ID: 4D0A50D6.5070602@enterprisedb.com
Lists: pgsql-hackers

On 16.12.2010 17:23, Heikki Linnakangas wrote:
> On 16.12.2010 12:12, Greg Smith wrote:
>> There are a number of small things that I'd like to see improved in a
>> new rev of this code:
>> ...
>
> In addition to those:
>...

One more thing: the motivation behind this patch is to allow parallel
pg_dump in the future, so we should make sure this patch caters well
for that.

As soon as we have parallel pg_dump, the next big thing is going to be
parallel dump of the same table using multiple processes. Perhaps we
should prepare for that in the directory archive format, by allowing the
data of a single table to be split into multiple files. That way
parallel pg_dump is simple, you just split the table in chunks of
roughly the same size, say 10GB each, and launch a process for each
chunk, writing to a separate file.

It should be quite a simple add-on to the current patch, but will make
life so much easier for parallel pg_dump. It would also be helpful to
work around file size limitations on some filesystems.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 17:58:32
Message-ID: AANLkTimu1XtW4oQCKpCBeH-3wPDC3iL+_28=kAL0HSY9@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> One more thing: the motivation behind this patch is to allow parallel
> pg_dump in the future, so we should make sure this patch caters well for
> that.
>
> As soon as we have parallel pg_dump, the next big thing is going to be
> parallel dump of the same table using multiple processes. Perhaps we should
> prepare for that in the directory archive format, by allowing the data of a
> single table to be split into multiple files. That way parallel pg_dump is
> simple, you just split the table in chunks of roughly the same size, say
> 10GB each, and launch a process for each chunk, writing to a separate file.
>
> It should be quite a simple add-on to the current patch, but will make life
> so much easier for parallel pg_dump. It would also be helpful to work around
> file size limitations on some filesystems.

Sounds reasonable. Are you planning to do this and commit?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 18:04:54
Message-ID: 4D0A54C6.3090502@enterprisedb.com
Lists: pgsql-hackers

On 16.12.2010 19:58, Robert Haas wrote:
> On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> One more thing: the motivation behind this patch is to allow parallel
>> pg_dump in the future, so we should make sure this patch caters well for
>> that.
>>
>> As soon as we have parallel pg_dump, the next big thing is going to be
>> parallel dump of the same table using multiple processes. Perhaps we should
>> prepare for that in the directory archive format, by allowing the data of a
>> single table to be split into multiple files. That way parallel pg_dump is
>> simple, you just split the table in chunks of roughly the same size, say
>> 10GB each, and launch a process for each chunk, writing to a separate file.
>>
>> It should be quite a simple add-on to the current patch, but will make life
>> so much easier for parallel pg_dump. It would also be helpful to work around
>> file size limitations on some filesystems.
>
> Sounds reasonable. Are you planning to do this and commit?

I'll defer to Joachim, assuming he has the time & energy.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 18:33:10
Message-ID: AANLkTim8vp_iEvzmWwnOG9--3m3sw3gr_zuMWyK31PkP@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> As soon as we have parallel pg_dump, the next big thing is going to be
> parallel dump of the same table using multiple processes. Perhaps we should
> prepare for that in the directory archive format, by allowing the data of a
> single table to be split into multiple files. That way parallel pg_dump is
> simple, you just split the table in chunks of roughly the same size, say
> 10GB each, and launch a process for each chunk, writing to a separate file.

How exactly would you "just split the table in chunks of roughly the
same size" ? Which queries should pg_dump send to the backend? If it
just sends a bunch of WHERE queries, the server would still scan the
same data several times since each pg_dump client would result in a
seqscan over the full table.

Ideally pg_dump should be able to query for all data in only one
relation segment so that each segment is scanned by only one backend
process. However this requires backend support and we would be sending
queries that we'd not want clients other than pg_dump to send...

If you were thinking about WHERE queries to get equally sized
partitions, how would we deal with unindexed and/or non-numerical data
in a large table?

Joachim


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 18:45:42
Message-ID: 4D0A5E56.1080109@enterprisedb.com
Lists: pgsql-hackers

On 16.12.2010 20:33, Joachim Wieland wrote:
> On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> As soon as we have parallel pg_dump, the next big thing is going to be
>> parallel dump of the same table using multiple processes. Perhaps we should
>> prepare for that in the directory archive format, by allowing the data of a
>> single table to be split into multiple files. That way parallel pg_dump is
>> simple, you just split the table in chunks of roughly the same size, say
>> 10GB each, and launch a process for each chunk, writing to a separate file.
>
> How exactly would you "just split the table in chunks of roughly the
> same size" ?

Check pg_class.relpages, and divide that evenly across the processes.
That should be good enough.

> Which queries should pg_dump send to the backend? If it
> just sends a bunch of WHERE queries, the server would still scan the
> same data several times since each pg_dump client would result in a
> seqscan over the full table.

Hmm, I was thinking of "SELECT * FROM table WHERE ctid BETWEEN ? AND ?",
but we don't support TidScans for ranges. Perhaps we could add that.
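
To sketch the idea (hypothetical, precisely because the ctid range scan
support doesn't exist yet):

#include <stdio.h>

/*
 * Divide the table's block range (from pg_class.relpages) evenly among
 * nproc worker connections; each one issues a ctid-range query.
 */
static void
buildChunkQueries(const char *table, unsigned int relpages, int nproc)
{
    unsigned int chunk = (relpages + nproc - 1) / nproc;
    int     i;

    for (i = 0; i < nproc; i++)
    {
        unsigned int first = i * chunk;
        unsigned int last = first + chunk;

        /* '(block,offset)' ctids; half-open ranges so chunks don't overlap */
        printf("SELECT * FROM %s WHERE ctid >= '(%u,0)' AND ctid < '(%u,0)'\n",
               table, first, last);
    }
}

int
main(void)
{
    buildChunkQueries("pgbench_accounts", 1640, 4);
    return 0;
}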

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 19:29:02
Message-ID: 8305.1292527742@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 16.12.2010 20:33, Joachim Wieland wrote:
>> How exactly would you "just split the table in chunks of roughly the
>> same size" ?

> Check pg_class.relpages, and divide that evenly across the processes.
> That should be good enough.

Not even close ... relpages could be badly out of date. If you believe
it, you could fail to dump data that's in further-out pages. We'd need
to move pg_relpages() or some equivalent into core to make this
workable.

>> Which queries should pg_dump send to the backend?

> Hmm, I was thinking of "SELECT * FROM table WHERE ctid BETWEEN ? AND ?",
> but we don't support TidScans for ranges. Perhaps we could add that.

Yeah, that seems probably workable, given an up-to-date idea of the
possible block range.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 20:13:38
Message-ID: AANLkTimrRrhnoibwb9Z9Nz6CQ-vx22Wezm1YyZR=vR6x@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 16, 2010 at 2:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 16.12.2010 20:33, Joachim Wieland wrote:
>>> How exactly would you "just split the table in chunks of roughly the
>>> same size" ?
>
>> Check pg_class.relpages, and divide that evenly across the processes.
>> That should be good enough.
>
> Not even close ... relpages could be badly out of date.  If you believe
> it, you could fail to dump data that's in further-out pages.  We'd need
> to move pg_relpages() or some equivalent into core to make this
> workable.
>
>>> Which queries should pg_dump send to the backend?
>
>> Hmm, I was thinking of "SELECT * FROM table WHERE ctid BETWEEN ? AND ?",
>> but we don't support TidScans for ranges. Perhaps we could add that.
>
> Yeah, that seems probably workable, given an up-to-date idea of the
> possible block range.

So how bad would it be if we committed this new format without support
for splitting large relations into multiple files, or with some stub
support that never actually gets used, and fixed this later? Because
this is starting to sound like a bigger project than I think we ought
to be requiring for this patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 20:26:20
Message-ID: 4D0A75EC.6080708@enterprisedb.com
Lists: pgsql-hackers

On 16.12.2010 22:13, Robert Haas wrote:
> So how bad would it be if we committed this new format without support
> for splitting large relations into multiple files, or with some stub
> support that never actually gets used, and fixed this later? Because
> this is starting to sound like a bigger project than I think we ought
> to be requiring for this patch.

Would probably be fine, as long as we don't paint ourselves into a corner.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 20:28:40
Message-ID: 4D0A7678.8040609@dunslane.net
Lists: pgsql-hackers

On 12/16/2010 03:13 PM, Robert Haas wrote:
> So how bad would it be if we committed this new format without support
> for splitting large relations into multiple files, or with some stub
> support that never actually gets used, and fixed this later? Because
> this is starting to sound like a bigger project than I think we ought
> to be requiring for this patch.

I don't think we have to have that in the first go at all. Parallel dump
could be extremely useful without it. I haven't looked closely, but I
assume there will still be an archive version recorded somewhere. When
we change the archive format, bump the version number.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 20:52:12
Message-ID: 12022.1292532732@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 12/16/2010 03:13 PM, Robert Haas wrote:
>> So how bad would it be if we committed this new format without support
>> for splitting large relations into multiple files, or with some stub
>> support that never actually gets used, and fixed this later? Because
>> this is starting to sound like a bigger project than I think we ought
>> to be requiring for this patch.

> I don't think we have to have that in the first go at all. Parallel dump
> could be extremely useful without it. I haven't looked closely, but I
> assume there will still be an archive version recorded somewhere. When
> we change the archive format, bump the version number.

Sure, but it's worth thinking about the feature now. If there are
format tweaks to be made, it might be less painful to make them now
instead of later, even if actual support for the feature isn't there.
(I agree I don't want to try to implement it just yet.)

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 22:29:50
Message-ID: 201012162329.51796.andres@anarazel.de
Lists: pgsql-hackers

On Thursday 16 December 2010 19:33:10 Joachim Wieland wrote:
> On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
>
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> > As soon as we have parallel pg_dump, the next big thing is going to be
> > parallel dump of the same table using multiple processes. Perhaps we
> > should prepare for that in the directory archive format, by allowing the
> > data of a single table to be split into multiple files. That way
> > parallel pg_dump is simple, you just split the table in chunks of
> > roughly the same size, say 10GB each, and launch a process for each
> > chunk, writing to a separate file.
>
> How exactly would you "just split the table in chunks of roughly the
> same size" ? Which queries should pg_dump send to the backend? If it
> just sends a bunch of WHERE queries, the server would still scan the
> same data several times since each pg_dump client would result in a
> seqscan over the full table.
I would suggest implementing < and > support for tid scans and doing it
in segment-size chunks...

Andres


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 22:34:02
Message-ID: 4D0A93DA.30308@enterprisedb.com
Lists: pgsql-hackers

On 17.12.2010 00:29, Andres Freund wrote:
> On Thursday 16 December 2010 19:33:10 Joachim Wieland wrote:
>> On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
>>
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> As soon as we have parallel pg_dump, the next big thing is going to be
>>> parallel dump of the same table using multiple processes. Perhaps we
>>> should prepare for that in the directory archive format, by allowing the
>>> data of a single table to be split into multiple files. That way
>>> parallel pg_dump is simple, you just split the table in chunks of
>>> roughly the same size, say 10GB each, and launch a process for each
>>> chunk, writing to a separate file.
>>
>> How exactly would you "just split the table in chunks of roughly the
>> same size" ? Which queries should pg_dump send to the backend? If it
>> just sends a bunch of WHERE queries, the server would still scan the
>> same data several times since each pg_dump client would result in a
>> seqscan over the full table.
> I would suggest implementing < and > support for tid scans and doing it
> in segment-size chunks...

I don't think there's any particular gain from matching the server's
data file segment size, although 1GB does sound like a good chunk size
for this too.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 22:53:21
Message-ID: 201012162353.21655.andres@anarazel.de
Lists: pgsql-hackers

On Thursday 16 December 2010 23:34:02 Heikki Linnakangas wrote:
> On 17.12.2010 00:29, Andres Freund wrote:
> > On Thursday 16 December 2010 19:33:10 Joachim Wieland wrote:
> >> On Thu, Dec 16, 2010 at 12:48 PM, Heikki Linnakangas
> >>
> >> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> >>> As soon as we have parallel pg_dump, the next big thing is going to be
> >>> parallel dump of the same table using multiple processes. Perhaps we
> >>> should prepare for that in the directory archive format, by allowing
> >>> the data of a single table to be split into multiple files. That way
> >>> parallel pg_dump is simple, you just split the table in chunks of
> >>> roughly the same size, say 10GB each, and launch a process for each
> >>> chunk, writing to a separate file.
> >>
> >> How exactly would you "just split the table in chunks of roughly the
> >> same size" ? Which queries should pg_dump send to the backend? If it
> >> just sends a bunch of WHERE queries, the server would still scan the
> >> same data several times since each pg_dump client would result in a
> >> seqscan over the full table.
> >
> > I would suggest implementing < and > support for tid scans and doing it
> > in segment-size chunks...
>
> I don't think there's any particular gain from matching the server's
> data file segment size, although 1GB does sound like a good chunk size
> for this too.
It's noticeably more efficient to read from different files in different
processes than to have them all hammering the same file.

Andres


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Joachim Wieland <joe(at)mcknight(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-16 22:55:34
Message-ID: 4D0A98E6.7090606@dunslane.net
Lists: pgsql-hackers

On 12/16/2010 03:52 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 12/16/2010 03:13 PM, Robert Haas wrote:
>>> So how bad would it be if we committed this new format without support
>>> for splitting large relations into multiple files, or with some stub
>>> support that never actually gets used, and fixed this later? Because
>>> this is starting to sound like a bigger project than I think we ought
>>> to be requiring for this patch.
>> I don't think we have to have that in the first go at all. Parallel dump
>> could be extremely useful without it. I haven't looked closely, but I
>> assume there will still be an archive version recorded somewhere. When
>> we change the archive format, bump the version number.
> Sure, but it's worth thinking about the feature now. If there are
> format tweaks to be made, it might be less painful to make them now
> instead of later, even if actual support for the feature isn't there.
> (I agree I don't want to try to implement it just yet.)
>
>

Yeah, OK. Well, time is getting short but (hand waving wildly) I think
we could probably get by with just adding a member to the TOC for the
section number of the entry (set it to 0 for non TABLE DATA TOC
entries). The section number could be built into the file name in
directory format. For now that number would always be 1 for TABLE DATA
members.
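
Concretely, something like this; the field and the file-name scheme are
purely illustrative:

#include <stdio.h>

typedef struct TocEntry
{
    int     dumpId;
    int     section;    /* 0 for non TABLE DATA entries; data sections count from 1 */
} TocEntry;

/* directory format: build the section number into the file name */
static void
sectionFileName(const TocEntry *te, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%d.%d.dat", te->dumpId, te->section);
}

int
main(void)
{
    TocEntry    te = {2417, 1}; /* for now, always section 1 for TABLE DATA */
    char        fname[32];

    sectionFileName(&te, fname, sizeof(fname));
    printf("%s\n", fname);      /* "2417.1.dat" */
    return 0;
}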

This has intriguing possibilities for parallel restore of custom format
dumps too. It could be very useful to be able to restore a single table
in parallel, if we had more than one TABLE DATA member per table.

I'm deliberately just addressing infrastructure issues rather than how
we actually generate multiple sections of data for a single table
(especially if we want to do that in parallel).

cheers

andrew