Re: generic copy options

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: Dan Colish <dan(at)unencrypted(dot)org>, Emmanuel Cecchet <manu(at)asterdata(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Emmanuel Cecchet <Emmanuel(dot)Cecchet(at)asterdata(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generic copy options
Date: 2009-09-17 23:10:35
Message-ID: 4AB2C1EB.8000703@dunslane.net
Lists: pgsql-hackers

Greg Smith wrote:
> On Thu, 17 Sep 2009, Dan Colish wrote:
>
>> - Performance appears to be the same although I don't have a good
>> way for testing this at the moment
>
> Here's what I do to generate simple COPY performance test cases:
>
> CREATE TABLE t (i integer);
> INSERT INTO t SELECT x FROM generate_series(1,100000) AS x;
> \timing
> COPY t TO '/some/file' WITH [options];
> BEGIN;
> TRUNCATE TABLE t;
> COPY t FROM '/some/file' WITH [options];
> COMMIT;
>
> You can adjust the size of the generated table based on whether you
> want to minimize (small number) or maximize (big number) the impact of
> the setup overhead relative to actual processing time. Big numbers
> make sense if there's a per-row change; small ones make sense if it's
> mainly the COPY setup that's been changed and you want a small bit of
> data to test against.
>
> An example with one column in it is a good test case for seeing
> whether per-row impact has gone up. You'd want something with a wider
> row for other types of performance tests.
>
> The reason for the BEGIN/COMMIT there is that this form uses an
> optimization that lowers WAL volume during the COPY insertion, which
> makes it more likely you'll be testing the performance of the right
> thing.
>
>

I usually prefer to test with a table that is more varied than anything
you can make with generate_series. When I tested my ragged copy patch
the other day I copied 1,000,000 rows out of a large table with a
mixture of dates, strings, numbers and nulls.
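
For anyone without a suitable table handy, something similarly varied
can still be built from generate_series; a rough sketch (the table and
column names here are illustrative, not the ones from my test):

```sql
-- Illustrative only: a test table mixing dates, strings, numbers and NULLs
CREATE TABLE t_varied AS
SELECT
    x                                    AS id,
    now() - (x || ' minutes')::interval  AS ts,         -- dates
    md5(x::text)                         AS label,      -- strings
    random() * 1000                      AS val,        -- numbers
    CASE WHEN x % 10 = 0 THEN NULL
         ELSE x % 100 END                AS maybe_null  -- nulls
FROM generate_series(1, 1000000) AS x;
```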

But then, that patch adds a (tiny) per-field overhead, so I wanted to
make sure that was well represented in the test.

You are certainly right about wrapping it in begin/truncate/commit (and
when you do, make sure that archiving is not on).
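
For the archives: the WAL-skip optimization only applies when archiving
is disabled, i.e. in postgresql.conf something like (assuming 8.3 or
later, where archive_mode exists):

```
archive_mode = off    # the COPY WAL-volume optimization requires this
```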

You probably want to make sure that the file is not on the same disk as
the database, to avoid disk contention. Or, better, make sure that it is
in the OS file system cache, or on a RAM disk.

cheers

andrew
