Re: generic copy options

From: Dan Colish <dan(at)unencrypted(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generic copy options
Date: 2009-09-17 23:21:19
Message-ID: 20090917232119.GF13715@funkstrom.spiretech.com
Lists: pgsql-hackers

On Thu, Sep 17, 2009 at 07:10:35PM -0400, Andrew Dunstan wrote:
>
>
> Greg Smith wrote:
>> On Thu, 17 Sep 2009, Dan Colish wrote:
>>
>>> - Performance appears to be the same although I don't have a good
>>> way for
>>> testing this at the moment
>>
>> Here's what I do to generate simple COPY performance test cases:
>>
>> CREATE TABLE t (i integer);
>> INSERT INTO t SELECT x FROM generate_series(1,100000) AS x;
>> \timing
>> COPY t TO '/some/file' WITH [options];
>> BEGIN;
>> TRUNCATE TABLE t;
>> COPY t FROM '/some/file' WITH [options];
>> COMMIT;
>>
>> You can adjust the size of the generated table based on whether you
>> want to minimize (small number) or maximize (big number) the impact of
>> the setup overhead relative to actual processing time. Big numbers
>> make sense if there's a per-row change, small ones if it's mainly the
>> COPY setup that's been changed and you want a small bit of data to test
>> against.
>>
>> An example with one column in it is a good test case for seeing
>> whether per-row impact has gone up. You'd want something with a wider
>> row for other types of performance tests.
>>
>> The reason for the BEGIN/COMMIT there is that form utilizes an
>> optimization that lowers WAL volume when doing the COPY insertion,
>> which makes it more likely you'll be testing performance of the right
>> thing.
>>
>>
>
> I usually prefer to test with a table that is more varied than anything
> you can make with generate_series. When I tested my ragged copy patch
> the other day I copied 1,000,000 rows out of a large table with a
> mixture of dates, strings, numbers and nulls.
>
> But then, it has a (tiny) per field overhead so I wanted to make sure
> that was well represented in the test.
>
> You are certainly right about wrapping it in begin/truncate/commit (and
> when you do make sure that archiving is not on).
>
> You probably want to make sure that the file is not on the same disk as
> the database, to avoid disk contention. Or, better, make sure that it is
> in OS file system cache, or on a RAM disk.
>
> cheers
>
> andrew

If someone with a more significant setup can run tests, that would be ideal.
I only have my laptop, which has a single disk and is fairly underpowered.

That said, here are my results from running the script above; it looks like
the patch improves performance. I would be really interested to see results
on a larger data set and heavier iron.
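
Since this box only has the one disk, for a repeat run I can at least take
Andrew's suggestion and keep the dump file off the disk holding the data
directory. On a typical Linux machine /dev/shm is already a tmpfs mount, so
something like the following should do it (the path is just an example):

COPY t TO '/dev/shm/copytest.dat' WITH [options];
BEGIN;
TRUNCATE TABLE t;
COPY t FROM '/dev/shm/copytest.dat' WITH [options];
COMMIT;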

--
--Dan

Without Patch:

CREATE TABLE
INSERT 0 100000
Timing is on.
COPY 100000
Time: 83.273 ms
BEGIN
Time: 0.412 ms
TRUNCATE TABLE
Time: 0.357 ms
COPY 100000
Time: 140.911 ms
COMMIT
Time: 4.909 ms

With Patch:

CREATE TABLE
INSERT 0 100000
Timing is on.
COPY 100000
Time: 80.205 ms
BEGIN
Time: 0.351 ms
TRUNCATE TABLE
Time: 0.346 ms
COPY 100000
Time: 124.303 ms
COMMIT
Time: 4.130 ms
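
For a follow-up run I'd also like to try a more varied table along the
lines Andrew describes, rather than a single integer column. A rough sketch
of what I have in mind (table and column names are just made up for
illustration):

CREATE TABLE t_varied (
    id      integer,
    created timestamptz,
    label   text,
    amount  numeric(10,2)
);

INSERT INTO t_varied
SELECT x,
       now() - x * interval '1 second',
       CASE WHEN x % 10 = 0 THEN NULL ELSE md5(x::text) END,
       CASE WHEN x % 7 = 0 THEN NULL ELSE round((random() * 1000)::numeric, 2) END
FROM generate_series(1, 1000000) AS x;

That gives dates, strings, numbers and some NULLs in every row, so the
per-field overhead Andrew mentions should actually show up. I'll also
double-check that archiving is off (SHOW archive_mode should say 'off') so
the WAL-skipping path inside the transaction is really being exercised.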
