Re: COPY FROM performance improvements

From: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
To: "Alvaro Herrera" <alvherre(at)surnet(dot)cl>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-06-27 06:23:05
Message-ID: 3E37B936B592014B978C4415F90D662D0C420B@MI8NYCMAIL06.Mi8.com
Lists: pgsql-patches pgsql-performance

Alvaro,

> Am I the only one annoyed by the fact that the patch is not very nice to
> 80-columns-wide terminals? It doesn't need to be a rigid rule but I
> think the code looks much better if it's not too wide. This code is
> wide already, but I think we should be making it better, not the other
> way around.

Yup - fixed (as well as I can without mucking up readability).

> Also, your text editor seems to be messing the indentation of comments
> when there are ( or other symbols in the comment text. (Maybe this
> doesn't matter a lot because pgindent will fix it, but still -- it makes
> it slightly more difficult to read.)

Yah - I think I fixed several mis-indented comments. I'm using vim with
tabstop=4. I personally don't like tabs in text and would prefer them
expanded using spaces, but that's a nice way to make small formatting
changes look huge in a cvs diff.
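
For review purposes, whitespace-only churn can at least be hidden when
reading such a diff. A rough sketch (assuming a cvs client that passes
diff options through; the path is just an example):

    # -u for unified output, -w to ignore whitespace-only changes
    cvs diff -uw src/backend/commands/copy.c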

See attached - only formatting changes included.

- Luke

Attachment Content-Type Size
fast_copy_V11_8.1.patch application/octet-stream 56.3 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Luke Lonergan <LLonergan(at)greenplum(dot)com>
Cc: Alvaro Herrera <alvherre(at)surnet(dot)cl>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-06-27 11:45:19
Message-ID: 42BFE6CF.1030101@dunslane.net
Lists: pgsql-patches pgsql-performance

You might like to look at running pgindent (see src/tools/pgindent) over
the file before cutting a patch. Since this is usually run over each
file just before a release, the only badness should be things from
recent patches.
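
A rough sketch of that workflow (the README in src/tools/pgindent has the
real procedure, including building the specially patched BSD indent):

    cd pgsql/src/tools/pgindent
    # with the patched indent built and the script's typedef list
    # up to date, per the README:
    ./pgindent ../../backend/commands/copy.c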

cheers

andrew


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Alvaro Herrera" <alvherre(at)surnet(dot)cl>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-06-27 18:52:30
Message-ID: BEE59901.7E8E%llonergan@greenplum.com
Lists: pgsql-patches pgsql-performance

Andrew,

> You might like to look at running pgindent (see src/tools/pgindent) over
> the file before cutting a patch. Since this is usually run over each
> file just before a release, the only badness should be things from
> recent patches.

I've attached two patches, one produced by running pgindent against the
current CVS tip copy.c (:-D) and one produced by running the COPY FROM perf
improvements through the same. Nifty tool!

Only formatting changes in these.

- Luke

Attachment Content-Type Size
copy-pgindent.patch application/octet-stream 79.7 KB
fast_copy_V12_8.1.patch application/octet-stream 125.9 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Alvaro Herrera <alvherre(at)surnet(dot)cl>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-06-27 20:20:24
Message-ID: 42C05F88.7000105@dunslane.net
Lists: pgsql-patches pgsql-performance

Luke,

Something strange has happened. I suspect that you've inadvertently used
GNU indent or an unpatched BSD indent. pgindent needs a specially patched
BSD indent to work according to the PG standards - see the README.

cheers

andrew


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Alvaro Herrera" <alvherre(at)surnet(dot)cl>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-06-28 04:44:59
Message-ID: BEE623DE.7F0A%llonergan@greenplum.com
Lists: pgsql-patches pgsql-performance

Andrew,

> Something strange has happened. I suspect that you've inadvertently used
> GNU indent or an unpatched BSD indent. pgindent needs a specially patched
> BSD indent to work according to the PG standards - see the README.

OK - phew! I generated new symbols for pgindent, fixed a bug in the awk
scripting within it (diff attached), and ran it against the CVS tip copy.c; I
got only minor changes in formatting that appear to be consistent with the
rest of the code. I pgindent'ed the COPY FROM performance-modded code and it
looks good and tests good.

Only formatting changes to the previous patch for copy.c attached.

Patch to update pgindent with new symbols and fix a bug in an awk section
(extra \\ in front of a ')').

- Luke

Attachment Content-Type Size
fast_copy_V14_8.1.patch application/octet-stream 88.2 KB
pgindent-81.patch application/octet-stream 37.1 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Alvaro Herrera <alvherre(at)surnet(dot)cl>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-06-28 12:53:50
Message-ID: 42C1485E.3080008@dunslane.net
Lists: pgsql-patches pgsql-performance

Luke, Alon

OK, I'm going to apply the patch to my copy and try to get my head
around it. Meanwhile:

. we should not be describing things as "old" or "new". The person
reading the code might have no knowledge of the history, and should not
need to.
. we should not have "slow" and "fast" either. We should have "text",
"csv" and "binary".

IOW, the patch comments read slightly as though the patch were intended for
after-the-fact application rather than incorporation into the main code.

Are you looking at putting CSV mode into the fast code? Please let me
know if you have questions about that. There are only a few days left to
whip this into shape.

cheers

andrew


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)surnet(dot)cl>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-06-28 23:14:41
Message-ID: 200506282314.j5SNEfC08435@candle.pha.pa.us
Lists: pgsql-patches pgsql-performance

Luke Lonergan wrote:
> Patch to update pgindent with new symbols and fix a bug in an awk section
> (extra \\ in front of a ')').

Yea, that '\' wasn't needed. I applied the following patch to use //
instead of "" for patterns, and removed the unneeded backslash.

I will update the typedefs in a separate commit.
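
For illustration, the difference between the two awk pattern styles (a
hypothetical pattern and file, not the actual pgindent code):

    # a string pattern is processed twice (string escapes, then regex
    # escapes), so a literal ')' needs a doubled backslash:
    awk '$0 ~ "foo\\)"' typedefs.list
    # a /regex/ constant is parsed only once, so a single backslash does:
    awk '/foo\)/' typedefs.list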

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment Content-Type Size
unknown_filename text/plain 5.3 KB

From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-15 00:22:18
Message-ID: BEFC4FCB.7017%agoldshuv@greenplum.com
Lists: pgsql-patches pgsql-performance

I revisited my patch and removed the code duplications that were there, and
added support for CSV with buffered input, so CSV now runs faster too
(although it is not as optimized as the TEXT format parsing). So now
TEXT, CSV and BINARY are all parsed in CopyFrom(), like in the original file.

Patch attached.

Greetings,
Alon.

Attachment Content-Type Size
copy_parse_improvements_V15.patch application/octet-stream 78.0 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alon Goldshuv <agoldshuv(at)greenplum(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-19 15:23:18
Message-ID: 42DD1AE6.6020204@dunslane.net
Lists: pgsql-patches pgsql-performance


I do not have time to review this 2900-line patch analytically, nor to
benchmark it. I have done some functional testing of it on Windows, and
tried to break it in text and CSV modes, and with both Unix and Windows
type line endings - I have not observed any breakage.
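
A minimal sketch of that kind of smoke test (database, table and file names
are made up):

    printf '1,unix\n2,unix\n' > /tmp/lf.csv
    printf '3,dos\r\n4,dos\r\n' > /tmp/crlf.csv
    psql -d test -c "CREATE TABLE t (id int, name text)"
    psql -d test -c "COPY t FROM '/tmp/lf.csv' WITH CSV"
    psql -d test -c "COPY t FROM '/tmp/crlf.csv' WITH CSV"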

This does need lots of eyeballs, though.

cheers

andrew


From: Mark Wong <markw(at)osdl(dot)org>
To: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-19 19:54:53
Message-ID: 200507191954.j6JJsVjA003280@smtp.osdl.org
Lists: pgsql-patches pgsql-performance


Hi Alon,

I'm curious, what kind of system are you testing this on? I'm trying to
load 100GB of data in our dbt3 workload on a 4-way itanium2. I'm
interested in the results you would expect.

Mark


From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: "Mark Wong" <markw(at)osdl(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-19 21:05:56
Message-ID: BF02B944.7295%agoldshuv@greenplum.com
Lists: pgsql-patches pgsql-performance

Hi Mark,

I improved the data *parsing* capabilities of COPY, and didn't touch the
data conversion or data insertion parts of the code. The parsing improvement
will vary greatly depending on the ratio of parsing to converting and
inserting.

Therefore, the speed increase really depends on the nature of your data:

A 100GB file with
long data rows (lots of parsing)
a small number of columns (a small number of attr conversions per row)
fewer rows (fewer tuple insertions)

will show the best performance improvements.

However, the same file size of 100GB with
short data rows (minimal parsing)
a large number of columns (a large number of attr conversions per row)
AND/OR
more rows (more tuple insertions)

will show improvements, but not as significant.
In general I'd estimate a 40%-95% improvement in load speed for the 1st case
and 10%-40% for the 2nd. But that also depends on the hardware, disk speed,
etc... This is for the TEXT format. As for CSV, it will be faster, but not by
as much as I specified here. BINARY will stay the same as before.
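
To make the two regimes concrete, one could time synthetic extremes along
these lines (a sketch only; table names, row counts and paths are invented):

    # parse-heavy: one long text column per row
    awk 'BEGIN { for (i = 0; i < 1000000; i++) printf "%0400d\n", i }' \
        > /tmp/wide.dat
    psql -c "CREATE TABLE t_wide (c text)"
    time psql -c "COPY t_wide FROM '/tmp/wide.dat'"

    # conversion-heavy: many small integer columns per row
    awk 'BEGIN { for (i = 0; i < 1000000; i++)
                   print "1\t2\t3\t4\t5\t6\t7\t8" }' > /tmp/narrow.dat
    psql -c "CREATE TABLE t_narrow (a int, b int, c int, d int,
                                    e int, f int, g int, h int)"
    time psql -c "COPY t_narrow FROM '/tmp/narrow.dat'"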

HTH
Alon.



From: Mark Wong <markw(at)osdl(dot)org>
To: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-19 21:37:58
Message-ID: 200507192137.j6JLbZjA011011@smtp.osdl.org
Lists: pgsql-patches pgsql-performance

Hi Alon,

Yeah, that helps. I just need to break up my scripts a little to just
load the data and not build indexes.

Is the following information good enough to give a guess about the data
I'm loading, if you don't mind? ;) Here's a link to my script to create
tables:
http://developer.osdl.org/markw/mt/getfile.py?id=eaf16b7831588729780645b2bb44f7f23437e432&path=scripts/pgsql/create_tables.sh.in

File sizes:
-rw-r--r-- 1 markw 50 2.3G Jul 8 15:03 customer.tbl
-rw-r--r-- 1 markw 50 74G Jul 8 15:03 lineitem.tbl
-rw-r--r-- 1 markw 50 2.1K Jul 8 15:03 nation.tbl
-rw-r--r-- 1 markw 50 17G Jul 8 15:03 orders.tbl
-rw-r--r-- 1 markw 50 2.3G Jul 8 15:03 part.tbl
-rw-r--r-- 1 markw 50 12G Jul 8 15:03 partsupp.tbl
-rw-r--r-- 1 markw 50 391 Jul 8 15:03 region.tbl
-rw-r--r-- 1 markw 50 136M Jul 8 15:03 supplier.tbl

Number of rows:
# wc -l *.tbl
15000000 customer.tbl
600037902 lineitem.tbl
25 nation.tbl
150000000 orders.tbl
20000000 part.tbl
80000000 partsupp.tbl
5 region.tbl
1000000 supplier.tbl
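
For reference, dividing the file sizes by the row counts above gives rough
row widths (bc arithmetic; sizes read loosely from the listing, column
counts from the TPC-H schema):

    $ echo "74 * 2^30 / 600037902" | bc      # lineitem: ~132 bytes/row, 16 cols
    132
    $ echo "2.3 * 2^30 / 15000000" | bc -l   # customer: ~165 bytes/row, 8 cols
    164.64...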

Thanks,
Mark



From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: "Mark Wong" <markw(at)osdl(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-19 22:06:17
Message-ID: BF02C769.72A8%agoldshuv@greenplum.com
Lists: pgsql-patches pgsql-performance

Mark,

Thanks for the info.

Yes, taking indexes out of the picture is a good idea for this purpose.

I can't really guess at how fast the load rate should be. I don't know how
your system is configured or all the hardware characteristics (and even if I
knew that info I may not be able to guess...). I am pretty confident that the
load will be faster than before; I'll risk that ;-)
Looking at your TPC-H sizes and metadata, I'd estimate that partsupp,
customer and orders will see the most significant increase in load rate. You
could start with those.

I guess the only way to really know is to try... Load several times with the
existing PG-COPY and then load several times with the patched COPY and
compare. I'll be curious to hear your results.
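
A sketch of such a comparison run (database name and paths are assumptions;
dbgen's .tbl files use '|' delimiters, and a trailing '|' per line may need
to be stripped first):

    for i in 1 2 3; do
        psql -d dbt3 -c "TRUNCATE customer"
        time psql -d dbt3 -c \
            "COPY customer FROM '/data/customer.tbl' WITH DELIMITER '|'"
    done
    # repeat with the patched backend and compare wall-clock times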

Thx,
Alon.



From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Mark Wong <markw(at)osdl(dot)org>
Cc: Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-19 22:17:52
Message-ID: 42DD7C10.4090008@dunslane.net
Lists: pgsql-patches pgsql-performance

Mark,

You should definitely not be doing this sort of thing, I believe:

CREATE TABLE orders (
o_orderkey INTEGER,
o_custkey INTEGER,
o_orderstatus CHAR(1),
o_totalprice REAL,
o_orderDATE DATE,
o_orderpriority CHAR(15),
o_clerk CHAR(15),
o_shippriority INTEGER,
o_comment VARCHAR(79),
PRIMARY KEY (o_orderkey))

Create the table with no constraints, load the data, then set up primary keys
and whatever other constraints you want using ALTER TABLE. Last time I did a
load like this (albeit 2 orders of magnitude smaller) I saw a 50% speedup from
deferring constraint creation.
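
That is, something along these lines (a sketch; the COPY path and '|'
delimiter are assumptions):

CREATE TABLE orders (
o_orderkey INTEGER,
o_custkey INTEGER,
o_orderstatus CHAR(1),
o_totalprice REAL,
o_orderDATE DATE,
o_orderpriority CHAR(15),
o_clerk CHAR(15),
o_shippriority INTEGER,
o_comment VARCHAR(79));

COPY orders FROM '/data/orders.tbl' WITH DELIMITER '|';

-- builds the primary key index once, after the data is loaded:
ALTER TABLE orders ADD PRIMARY KEY (o_orderkey);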

cheers

andrew



From: Mark Wong <markw(at)osdl(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-19 22:51:33
Message-ID: 200507192251.j6JMpAjA016147@smtp.osdl.org
Lists: pgsql-patches pgsql-performance

Whoopsies, yeah good point about the PRIMARY KEY. I'll fix that.

Mark



From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Mark Wong" <markw(at)osdl(dot)org>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-20 00:39:33
Message-ID: BF02EB55.8E9A%llonergan@greenplum.com
Lists: pgsql-patches pgsql-performance

Good points all around; another element in the performance expectations is
the ratio of CPU speed to I/O subsystem speed, as Alon hinted earlier.

This patch substantially (500%) improves the efficiency of parsing in the
COPY path which, on a 3GHz P4 desktop with a commodity disk drive, represents
8 of a total of 30 seconds of processing time. So, by reducing the parsing
time from 8 seconds to 1.5 seconds, the overall COPY time is reduced from 30
seconds to 23.5 seconds, or a speedup of about 20%.

On a dual 2.2GHz Opteron machine with a 6-disk SCSI RAID subsystem capable
of 240MB/s sequential read and writes, the ratios change and we see between
35% and 95% increase in COPY performance, with the bottleneck being CPU.
The disk is only running at about 90MB/s during this period.

I'd expect that as your CPUs slow down relative to your I/O speed (and
Itaniums, even Itanium2s, are quite slow), you should see a larger effect
from the parsing improvements.

One good way to validate the effect is to watch the I/O bandwidth using
vmstat 1 (on Linux) while the load is progressing. If the bandwidth is the
same with the unpatched source as with the patched source, you are I/O
limited and will see no benefit from the patch.
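
Concretely, something like this (sketch; database, table and path are made
up):

    # in one session, start the load:
    psql -d test -c "COPY t FROM '/tmp/data.dat'" &
    # in another, watch the 'bo' (blocks written out per second) and
    # CPU columns:
    vmstat 1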

If you check your underlying sequential write speed, you will find that COPY
bottlenecks at roughly half that rate, because of the write-through to the
WAL.

- Luke



From: Mark Wong <markw(at)osdl(dot)org>
To: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Alvaro Herrera" <alvherre(at)surnet(dot)cl>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-21 21:55:07
Message-ID: 200507212154.j6LLsdjA002847@smtp.osdl.org
Lists: pgsql-patches pgsql-performance

I just ran through a few tests with the v14 patch against 100GB of data
from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours. Just to
give a few details, I only loaded data and started a COPY in parallel
for each of the data files:
http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/

Here's a visual of my disk layout, for those familiar with the database schema:
http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4-010-dbt3.html

I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.

Let me know if you have any questions.

Mark


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Mark Wong" <markw(at)osdl(dot)org>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Alvaro Herrera" <alvherre(at)surnet(dot)cl>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-21 23:14:47
Message-ID: BF057A77.96D7%llonergan@greenplum.com
Lists: pgsql-patches pgsql-performance

Cool!

At what rate does your disk setup write sequential data, e.g.:
time dd if=/dev/zero of=bigfile bs=8k count=500000

(sized for 2x RAM on a system with 2GB)
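
and, for the read side of the same test:

    time dd if=bigfile of=/dev/null bs=8k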

BTW - the Compaq smartarray controllers are pretty broken on Linux from a
performance standpoint in our experience. We've had disastrously bad
results from the SmartArray 5i and 6 controllers on kernels from 2.4 ->
2.6.10, on the order of 20MB/s.

For comparison, the results on our dual opteron with a single LSI SCSI
controller with software RAID0 on a 2.6.10 kernel:

[llonergan(at)stinger4 dbfast]$ time dd if=/dev/zero of=bigfile bs=8k
count=500000
500000+0 records in
500000+0 records out

real 0m24.702s
user 0m0.077s
sys 0m8.794s

Which calculates out to about 161MB/s.

- Luke



From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Mark Wong <markw(at)osdl(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)surnet(dot)cl>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-22 00:08:09
Message-ID: 42E038E9.1080107@commandprompt.com
Lists: pgsql-patches pgsql-performance

Luke Lonergan wrote:
> Cool!
>
> At what rate does your disk setup write sequential data, e.g.:
> time dd if=/dev/zero of=bigfile bs=8k count=500000
>
> (sized for 2x RAM on a system with 2GB)
>
> BTW - the Compaq smartarray controllers are pretty broken on Linux from a
> performance standpoint in our experience. We've had disastrously bad
> results from the SmartArray 5i and 6 controllers on kernels from 2.4 ->
> 2.6.10, on the order of 20MB/s.

O.k., this strikes me as interesting: now we know that Compaq and Dell
are borked for Linux. Is there a name-brand server (read: Enterprise)
that actually does provide reasonable performance?


--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Mark Wong" <markw(at)osdl(dot)org>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Alvaro Herrera" <alvherre(at)surnet(dot)cl>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-22 02:04:55
Message-ID: BF05A257.96F2%llonergan@greenplum.com
Lists: pgsql-patches pgsql-performance

Joshua,

On 7/21/05 5:08 PM, "Joshua D. Drake" <jd(at)commandprompt(dot)com> wrote:

> O.k., this strikes me as interesting: now we know that Compaq and Dell
> are borked for Linux. Is there a name-brand server (read: Enterprise)
> that actually does provide reasonable performance?

I think late model Dell (post the bad chipset problem, circa 2001-2?) and
IBM and Sun servers are fine because they all use simple SCSI adapters from
LSI or Adaptec.

The HP Smartarray is an aberration: they don't have good driver support for
Linux and as a consequence have some pretty bad problems with both
performance and stability. On Windows they perform quite well.

Also - there are very big issues with some SATA controllers and Linux we've
seen, particularly the Silicon Image, Highpoint, and other non-Intel
controllers. Not sure about Nvidia, but the only ones I trust now are 3Ware
and the others mentioned in earlier posts.

- Luke


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Mark Wong <markw(at)osdl(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)surnet(dot)cl>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-22 02:53:35
Message-ID: 42E05FAF.70601@commandprompt.com
Lists: pgsql-patches pgsql-performance


> I think late model Dell (post the bad chipset problem, circa 2001-2?) and
> IBM and Sun servers are fine because they all use simple SCSI adapters from
> LSI or Adaptec.

Well I know that isn't true at least not with ANY of the Dells my
customers have purchased in the last 18 months. They are still really,
really slow.

> Also - there are very big issues with some SATA controllers and Linux we've
> seen, particularly the Silicon Image, Highpoint, and other non-Intel
> controllers. Not sure about Nvidia, but the only ones I trust now are 3Ware
> and the others mentioned in earlier posts.

I have great success with Silicon Image as long as I am running them
with Linux software RAID. The LSI controllers are also really nice.

J


--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Mark Wong" <markw(at)osdl(dot)org>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Alvaro Herrera" <alvherre(at)surnet(dot)cl>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-22 04:19:04
Message-ID: BF05C1C8.9702%llonergan@greenplum.com
Lists: pgsql-patches pgsql-performance

Joshua,

On 7/21/05 7:53 PM, "Joshua D. Drake" <jd(at)commandprompt(dot)com> wrote:
> Well I know that isn't true at least not with ANY of the Dells my
> customers have purchased in the last 18 months. They are still really,
> really slow.

That's too bad, can you cite some model numbers? SCSI?

> I have great success with Silicon Image as long as I am running them
> with Linux software RAID. The LSI controllers are also really nice.

That's good to hear, I gave up on Silicon Image controllers on Linux about 1
year ago, which kernel are you using with success? Silicon Image
controllers are the most popular, so it's important to see them supported
well, though I'd rather see more SATA headers than 2 off of the built-in
chipsets.

- Luke


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <llonergan(at)greenplum(dot)com>
Cc: <jd(at)commandprompt(dot)com>, <markw(at)osdl(dot)org>, <andrew(at)dunslane(dot)net>, <alvherre(at)surnet(dot)cl>, <pgman(at)candle(dot)pha(dot)pa(dot)us>, <agoldshuv(at)greenplum(dot)com>, <pgsql-patches(at)postgresql(dot)org>, <maryedie(at)osdl(dot)org>
Subject: Re: COPY FROM performance improvements
Date: 2005-07-22 05:27:41
Message-ID: 3277.24.211.165.134.1122010061.squirrel@www.dunslane.net
Lists: pgsql-patches pgsql-performance


this discussion belongs on -performance

cheers

andrew



From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Mark Wong <markw(at)osdl(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)surnet(dot)cl>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-22 15:03:28
Message-ID: 42E10AC0.1080905@commandprompt.com
Lists: pgsql-patches pgsql-performance

Luke Lonergan wrote:
> Joshua,
>
> On 7/21/05 7:53 PM, "Joshua D. Drake" <jd(at)commandprompt(dot)com> wrote:
>
>>Well I know that isn't true at least not with ANY of the Dells my
>>customers have purchased in the last 18 months. They are still really,
>>really slow.
>
>
> That's too bad, can you cite some model numbers? SCSI?

Yeah I will get them and post, but yes they are all SCSI.

>
>
>>I have great success with Silicon Image as long as I am running them
>>with Linux software RAID. The LSI controllers are also really nice.
>
>
> That's good to hear, I gave up on Silicon Image controllers on Linux about 1
> year ago, which kernel are you using with success?

Any of the 2.6 kernels. Also the later 2.4 kernels (2.4.22+, I believe)
support it pretty well.


--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org


From: Patrick Welche <prlw1(at)newn(dot)cam(dot)ac(dot)uk>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Mark Wong <markw(at)osdl(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)surnet(dot)cl>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-07-22 16:49:05
Message-ID: 20050722164905.GE18413@quartz.newn.cam.ac.uk
Lists: pgsql-patches pgsql-performance

On Thu, Jul 21, 2005 at 09:19:04PM -0700, Luke Lonergan wrote:
> Joshua,
>
> On 7/21/05 7:53 PM, "Joshua D. Drake" <jd(at)commandprompt(dot)com> wrote:
> > Well I know that isn't true at least not with ANY of the Dells my
> > customers have purchased in the last 18 months. They are still really,
> > really slow.
>
> That's too bad, can you cite some model numbers? SCSI?

I would be interested too, given

http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=30531

Cheers,

Patrick


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Patrick Welche <prlw1(at)newn(dot)cam(dot)ac(dot)uk>
Cc: Luke Lonergan <llonergan(at)greenplum(dot)com>, Mark Wong <markw(at)osdl(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)surnet(dot)cl>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-07-22 17:11:25
Message-ID: 42E128BD.2000104@commandprompt.com
Lists: pgsql-patches pgsql-performance

Here is the SCSI output:

Web Server

SCSI subsystem driver Revision: 1.00
megaraid: v1.18j (Release Date: Mon Jul 7 14:39:55 EDT 2003)
megaraid: found 0x1028:0x000f:idx 0:bus 4:slot 3:func 0
scsi0 : Found a MegaRAID controller at 0xf883f000, IRQ: 18
scsi0 : Enabling 64 bit support
megaraid: [412W:H406] detected 1 logical drives
megaraid: supports extended CDBs.
megaraid: channel[1] is raid.
megaraid: channel[2] is raid.
scsi0 : LSI Logic MegaRAID 412W 254 commands 15 targs 5 chans 7 luns

Database Server

SCSI subsystem driver Revision: 1.00
megaraid: v1.18j (Release Date: Mon Jul 7 14:39:55 EDT 2003)
megaraid: found 0x101e:0x1960:idx 0:bus 5:slot 0:func 0
scsi0 : Found a MegaRAID controller at 0xf883f000, IRQ: 21
scsi0 : Enabling 64 bit support
megaraid: [196T:3.33] detected 1 logical drives
megaraid: supports extended CDBs.
megaraid: channel[1] is raid.
megaraid: channel[2] is raid.
scsi0 : LSI Logic MegaRAID 196T 254 commands 15 targs 5 chans 7 luns
Starting timer : 0 0
blk: queue c5f2d218, I/O limit 4095Mb (mask 0xffffffff)
scsi0: scanning virtual channel 0 for logical drives.
Vendor: MegaRAID Model: LD 0 RAID5 86G Rev: 196T
Type: Direct-Access ANSI SCSI revision: 02
Starting timer : 0 0

The webserver is a 1U and it actually performs better on IO than the
database server, even though the database server is running 6 disks versus 3.

The database server is a PE (Power Edge) 6600

Database Server IO:

[root(at)master root]# /sbin/hdparm -tT /dev/sda

/dev/sda:
Timing buffer-cache reads: 1888 MB in 2.00 seconds = 944.00 MB/sec
Timing buffered disk reads: 32 MB in 3.06 seconds = 10.46 MB/sec

Second Database Server IO:

[root(at)pq-slave root]# /sbin/hdparm -tT /dev/sda

/dev/sda:
Timing buffer-cache reads: 1816 MB in 2.00 seconds = 908.00 MB/sec
Timing buffered disk reads: 26 MB in 3.11 seconds = 8.36 MB/sec
[root(at)pq-slave root]#

Which is just horrible.

Sincerely,

Joshua D. Drake


--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Patrick Welche" <prlw1(at)newn(dot)cam(dot)ac(dot)uk>
Cc: "Mark Wong" <markw(at)osdl(dot)org>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-07-22 19:28:43
Message-ID: BF0696FB.9795%llonergan@greenplum.com
Lists: pgsql-patches pgsql-performance

Joshua,

On 7/22/05 10:11 AM, "Joshua D. Drake" <jd(at)commandprompt(dot)com> wrote:
> The database server is a PE (Power Edge) 6600
>
> Database Server IO:
>
> [root(at)master root]# /sbin/hdparm -tT /dev/sda
>
> /dev/sda:
> Timing buffer-cache reads: 1888 MB in 2.00 seconds = 944.00 MB/sec
> Timing buffered disk reads: 32 MB in 3.06 seconds = 10.46 MB/sec
>
> Second Database Server IO:
>
> [root(at)pq-slave root]# /sbin/hdparm -tT /dev/sda
>
> /dev/sda:
> Timing buffer-cache reads: 1816 MB in 2.00 seconds = 908.00 MB/sec
> Timing buffered disk reads: 26 MB in 3.11 seconds = 8.36 MB/sec
> [root(at)pq-slave root]#

Can you post the "time dd if=/dev/zero of=bigfile bs=8k count=500000"
results? Also do the reverse (read the file) with "time dd if=bigfile
of=/dev/null bs=8k".

I think you are observing what we've known for a while: hardware RAID of
this class is horribly slow. We've not found a hardware RAID adapter of this
class yet that shows reasonable read or write performance. The Adaptec 2400R,
the LSI, and others have terrible internal I/O compared to raw SCSI with
software RAID, and even the CPU usage is higher on these cards while doing
slower I/O than Linux SW RAID.

Notably, we've found that the 3Ware RAID controller does a better job of HW
RAID support than the low-end SCSI RAID adapters, and it also exports JBOD at
high speeds. If you export JBOD on the low-end SCSI RAID adapters, the
performance is also very poor, though generally faster than using HW RAID.

- Luke


From: Mark Wong <markw(at)osdl(dot)org>
To: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Alvaro Herrera" <alvherre(at)surnet(dot)cl>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-07-22 19:47:18
Message-ID: 200507221946.j6MJkljA024095@smtp.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

On a single spindle:

$ time dd if=/dev/zero of=bigfile bs=8k count=2000000
2000000+0 records in
2000000+0 records out

real 2m8.569s
user 0m0.725s
sys 0m19.633s

None of my drives are partitioned big enough for me to create 2x RAM
sized files on a single disk. I have 16GB RAM and only 36GB drives.
But here are some numbers for my 12-disk lvm2 striped volume.

$ time dd if=/dev/zero of=bigfile3 bs=8k count=4000000
4000000+0 records in
4000000+0 records out

real 1m17.059s
user 0m1.479s
sys 0m41.293s

Mark

On Thu, 21 Jul 2005 16:14:47 -0700
"Luke Lonergan" <llonergan(at)greenplum(dot)com> wrote:

> Cool!
>
> At what rate does your disk setup write sequential data, e.g.:
> time dd if=/dev/zero of=bigfile bs=8k count=500000
>
> (sized for 2x RAM on a system with 2GB)
>
> BTW - the Compaq smartarray controllers are pretty broken on Linux from a
> performance standpoint in our experience. We've had disastrously bad
> results from the SmartArray 5i and 6 controllers on kernels from 2.4 ->
> 2.6.10, on the order of 20MB/s.
>
> For comparison, the results on our dual opteron with a single LSI SCSI
> controller with software RAID0 on a 2.6.10 kernel:
>
> [llonergan(at)stinger4 dbfast]$ time dd if=/dev/zero of=bigfile bs=8k
> count=500000
> 500000+0 records in
> 500000+0 records out
>
> real 0m24.702s
> user 0m0.077s
> sys 0m8.794s
>
> Which calculates out to about 161MB/s.
>
> - Luke
>
>
> On 7/21/05 2:55 PM, "Mark Wong" <markw(at)osdl(dot)org> wrote:
>
> > I just ran through a few tests with the v14 patch against 100GB of data
> > from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours. Just to
> > give a few details, I only loaded data and started a COPY in parallel
> > for each of the data files:
> > http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/
> >
> > Here's a visual of my disk layout, for those familiar with the database
> > schema:
> > http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4
> > -010-dbt3.html
> >
> > I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
> > attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.
> >
> > Let me know if you have any questions.
> >
> > Mark
> >
>


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Mark Wong" <markw(at)osdl(dot)org>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Alvaro Herrera" <alvherre(at)surnet(dot)cl>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-07-23 04:09:10
Message-ID: BF0710F6.97DE%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Mark,

On 7/22/05 12:47 PM, "Mark Wong" <markw(at)osdl(dot)org> wrote:

> On a single spindle:
>
> $ time dd if=/dev/zero of=bigfile bs=8k count=2000000
> 2000000+0 records in
> 2000000+0 records out
>
> real 2m8.569s
> user 0m0.725s
> sys 0m19.633s

This is super fast! 124MB/s seems too fast for true write performance on a
single spindle.

> But here are some number for my 12-disk lvm2 striped volume.

So, software striping on how many controllers?

> $ time dd if=/dev/zero of=bigfile3 bs=8k count=4000000
> 4000000+0 records in
> 4000000+0 records out
>
> real 1m17.059s
> user 0m1.479s
> sys 0m41.293s

Again - super fast at 416MB/s. How many controllers?

When we had our problems with the cciss driver and the smartarray 5i/6
controllers, we found the only way to get any performance out of them was to
run them in JBOD mode and software stripe. However, when we did so, the CPU
usage skyrocketed, and simple SCSI adapters were 50% faster with less CPU
consumption. These numbers show 100*(42.8/77.1) = 56% CPU consumption
(user+sys over elapsed); I'd expect less with 2 simple U320 SCSI controllers.

- Luke


From: Mark Wong <markw(at)osdl(dot)org>
To: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Patrick Welche" <prlw1(at)newn(dot)cam(dot)ac(dot)uk>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-07-28 23:43:52
Message-ID: 200507282343.j6SNhEjA013300@smtp.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

On Fri, 22 Jul 2005 12:28:43 -0700
"Luke Lonergan" <llonergan(at)greenplum(dot)com> wrote:

> Joshua,
>
> On 7/22/05 10:11 AM, "Joshua D. Drake" <jd(at)commandprompt(dot)com> wrote:
> > The database server is a PE (Power Edge) 6600
> >
> > Database Server IO:
> >
> > [root(at)master root]# /sbin/hdparm -tT /dev/sda
> >
> > /dev/sda:
> > Timing buffer-cache reads: 1888 MB in 2.00 seconds = 944.00 MB/sec
> > Timing buffered disk reads: 32 MB in 3.06 seconds = 10.46 MB/sec
> >
> > Second Database Server IO:
> >
> > [root(at)pq-slave root]# /sbin/hdparm -tT /dev/sda
> >
> > /dev/sda:
> > Timing buffer-cache reads: 1816 MB in 2.00 seconds = 908.00 MB/sec
> > Timing buffered disk reads: 26 MB in 3.11 seconds = 8.36 MB/sec
> > [root(at)pq-slave root]#
>
> Can you post the "time dd if=/dev/zero of=bigfile bs=8k count=500000"
> results? Also do the reverse (read the file) with "time dd if=bigfile
> of=/dev/null bs=8k".
>
> I think you are observing what we've known for a while, hardware RAID is
> horribly slow. We've not found a hardware RAID adapter of this class yet
> that shows reasonable read or write performance. The Adaptec 2400R or the
> LSI or others have terrible internal I/O compared to raw SCSI with software
> RAID, and even the CPU usage is higher on these cards while doing slower I/O
> than linux SW RAID.
>
> Notably - we've found that the 3Ware RAID controller does a better job than
> the low end SCSI RAID at HW RAID support, and also exports JBOD at high
> speeds. If you export JBOD on the low end SCSI RAID adapters, the
> performance is also very poor, though generally faster than using HW RAID.

Are there any recommendations for Qlogic controllers on Linux, SCSI or
fibre channel? I might be able to get my hands on some. I have PCI-X slots
for AMD, Itanium, or POWER5 if the architecture makes a difference.

Mark


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Mark Wong <markw(at)osdl(dot)org>
Cc: Luke Lonergan <llonergan(at)greenplum(dot)com>, Patrick Welche <prlw1(at)newn(dot)cam(dot)ac(dot)uk>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-07-28 23:57:24
Message-ID: 42E970E4.4010902@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

>>
>>>[root(at)pq-slave root]# /sbin/hdparm -tT /dev/sda
>>>
>>>/dev/sda:
>>> Timing buffer-cache reads: 1816 MB in 2.00 seconds = 908.00 MB/sec
>>> Timing buffered disk reads: 26 MB in 3.11 seconds = 8.36 MB/sec
>>>[root(at)pq-slave root]#
>>
>>Can you post the "time dd if=/dev/zero of=bigfile bs=8k count=500000"
>>results? Also do the reverse (read the file) with "time dd if=bigfile
>>of=/dev/null bs=8k".

I didn't see this come across before... here ya go:

time dd if=/dev/zero of=bigfile bs=8k count=500000

500000+0 records in
500000+0 records out

real 1m52.738s
user 0m0.310s
sys 0m36.590s

time dd if=bigfile of=/dev/null bs=8k

500000+0 records in
500000+0 records out

real 4m38.742s
user 0m0.320s
sys 0m27.870s

FYI on your hardware RAID comment... I easily get 50 megs a second on my
3ware controllers and faster on my LSI SATA controllers.

Sincerely,

Joshua D. Drake

>
--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Mark Wong" <markw(at)osdl(dot)org>
Cc: "Patrick Welche" <prlw1(at)newn(dot)cam(dot)ac(dot)uk>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-07-29 04:03:20
Message-ID: BF0EF898.A360%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

>>> Can you post the "time dd if=/dev/zero of=bigfile bs=8k count=500000"
>>> results? Also do the reverse (read the file) with "time dd if=bigfile
>>> of=/dev/null bs=8k".
>
> I didn't see this come across before... here ya go:
>
> time dd if=/dev/zero of=bigfile bs=8k count=500000
>
> 500000+0 records in
> 500000+0 records out
>
> real 1m52.738s
> user 0m0.310s
> sys 0m36.590s

So, that's 35MB/s, or 1/2 of a single disk drive.

> time dd if=bigfile of=/dev/null bs=8k
>
> 500000+0 records in
> 500000+0 records out
>
> real 4m38.742s
> user 0m0.320s
> sys 0m27.870s

And that's 14MB/s, or < 1/4 of a single disk drive.

> FYI on your hardware raid comment... I easily get 50 megs a second on my
> 3ware controllers and faster on my LSI SATA controllers.

Then you are almost getting one disk's worth of bandwidth.

By comparison, we get this using Linux software RAID on Xeon or Opteron:

$ time dd if=/dev/zero of=bigfile bs=8k count=500000
500000+0 records in
500000+0 records out

real 0m26.927s
user 0m0.074s
sys 0m8.769s

$ time dd if=bigfile of=/dev/null bs=8k
500000+0 records in
500000+0 records out

real 0m28.190s
user 0m0.039s
sys 0m8.349s

That works out to roughly 150MB/s in each direction, with less CPU usage
than HW SCSI RAID controllers.

- Luke


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Mark Wong" <markw(at)osdl(dot)org>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Patrick Welche" <prlw1(at)newn(dot)cam(dot)ac(dot)uk>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-07-29 04:07:12
Message-ID: BF0EF980.A364%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Mark,

On 7/28/05 4:43 PM, "Mark Wong" <markw(at)osdl(dot)org> wrote:

> Are there any recommendations for Qlogic controllers on Linux, scsi or
> fiber channel? I might be able to my hands on some. I have pci-x slots
> for AMD, Itanium, or POWER5 if the architecture makes a difference.

I don't have a recommendation for a particular one; it's been too long
(1998) since I've used one with Linux. However, I'd like to see a
comparison between Emulex and Qlogic and a winner chosen. We've had some
apparent driver issues with a client running Emulex on Linux, even using
many different versions of the kernel.

- Luke


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Mark Wong <markw(at)osdl(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Patrick Welche <prlw1(at)newn(dot)cam(dot)ac(dot)uk>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-07-29 12:37:00
Message-ID: 200507291237.j6TCb0A10838@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Luke Lonergan wrote:
> Mark,
>
> On 7/28/05 4:43 PM, "Mark Wong" <markw(at)osdl(dot)org> wrote:
>
> > Are there any recommendations for Qlogic controllers on Linux, scsi or
> > fiber channel? I might be able to my hands on some. I have pci-x slots
> > for AMD, Itanium, or POWER5 if the architecture makes a difference.
>
> I don't have a recommendation for a particular one; it's been too long
> (1998) since I've used one with Linux. However, I'd like to see a
> comparison between Emulex and Qlogic and a winner chosen. We've had some
> apparent driver issues with a client running Emulex on Linux, even using
> many different versions of the kernel.

Where is the most recent version of the COPY patch?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Mark Wong" <markw(at)osdl(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Patrick Welche" <prlw1(at)newn(dot)cam(dot)ac(dot)uk>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-07-29 17:54:08
Message-ID: BF0FBB52.A401%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Bruce,

On 7/29/05 5:37 AM, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:

> Where is the most recent version of the COPY patch?

My direct e-mails aren't getting to you; they are trapped in a spam filter
on your end, so you didn't get my e-mail with the patch!

I've attached it here; apologies to the list owner for the off-topic patch
inclusion.

- Luke

Attachment Content-Type Size
copy_parse_improvements_V15.patch.gz application/octet-stream 18.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Mark Wong" <markw(at)osdl(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Patrick Welche" <prlw1(at)newn(dot)cam(dot)ac(dot)uk>, pgsql-performance(at)postgresql(dot)org, maryedie(at)osdl(dot)org
Subject: Re: [PATCHES] COPY FROM performance improvements
Date: 2005-08-01 16:32:34
Message-ID: 10468.1122913954@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

"Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
> On 7/29/05 5:37 AM, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>> Where is the most recent version of the COPY patch?

> I've attached it here, sorry to the list owner for the patch inclusion /
> off-topic.

This patch appears to reverse out the most recent committed changes in
copy.c.

regards, tom lane


From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-02 14:59:43
Message-ID: BF1502A1.7BD0%agoldshuv@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

New patch attached. It includes very minor changes. These are changes that
were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
previous patch.

Alon.

Attachment Content-Type Size
copy_parse_improvements_V16.patch application/octet-stream 77.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-06 21:04:52
Message-ID: 26467.1123362292@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

"Alon Goldshuv" <agoldshuv(at)greenplum(dot)com> writes:
> New patch attached. It includes very minor changes. These are changes that
> were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
> previous patch.

I've applied this with (rather extensive) revisions. I didn't like what
you had done with the control structure --- loading the input buffer
only at the outermost loop level was a bad design choice IMHO. You had
sprinkled the code with an unreasonable number of special cases in order
to try to cope with the effects of that mistake, but there were lots
of problems still left. Some of the bugs I noticed:

* Broke old-protocol COPY, since that has no provision for stopping at
the EOF marker except by parsing the data carefully to start with. The
backend would just hang up unless the total data size chanced to be a
multiple of 64K.

* Subtle change in interpretation of \. EOF marker (the existing code
will recognize it even when not at start of line).

* Seems to have thrown away detection of newline format discrepancies.

* Fails for zero-column tables.

* Broke display of column values during error context callback (would
always show the first column contents no matter which one is being
complained of).

* DetectLineEnd mistakenly assumes CR mode if very last character of first
bufferload is CR; need to reserve judgment until next char is available.

* DetectLineEnd fails to account for backslashed control characters,
so it will e.g. accept \ followed by \n as determining the newline
style.

* Fails to apply encoding conversion if first line exceeds copy buf
size, because when DetectLineEnd fails the quick-exit path doesn't do
it.

* There seem to be several bugs associated with the fact that input_buf[]
always has 64K of data in it even when EOF has been reached on the
input. One example:
echo -n 123 >zzz1
psql> create temp table t1(f1 text);
psql> copy t1 from '/home/tgl/zzz1';
psql> select * from t1;
hmm ... where'd that 64K of whitespace come from?

I rewrote the patch in a way that retained most of the speedups without
changing the basic control structure (except for replacing multiple
CopyReadAttribute calls with one CopyReadAttributes call per line).
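
To illustrate the batching idea, here is a simplified sketch - not the
committed copy.c code, and it ignores the escaping, quoting, and NULL markers
the real CopyReadAttributes must handle (names here are hypothetical):

/* Hypothetical one-pass attribute splitter: instead of re-entering a
 * per-attribute reader once per column, a single scan over the line
 * terminates each field in place and records where it starts. */
static int
read_attributes(char *line, char **fields, int maxfields)
{
    int nfields = 0;
    char *p = line;

    while (nfields < maxfields)
    {
        fields[nfields++] = p;
        while (*p != '\t' && *p != '\n' && *p != '\0')
            p++;                    /* scan to the next delimiter */
        if (*p == '\t')
            *p++ = '\0';            /* terminate this field, continue */
        else
        {
            *p = '\0';              /* end of line: terminate and stop */
            break;
        }
    }
    return nfields;
}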

I had some difficulty in generating test cases that weren't largely
I/O-bound, but AFAICT the patch as applied is about the same speed
as what you submitted.

regards, tom lane


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-07 02:33:07
Message-ID: BF1AC0F3.AA43%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Tom,

Thanks for finding the bugs and reworking things.

> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed
> as what you submitted.

You achieve the important objective of knocking the parsing stage down a
lot, but your parsing code is actually about 20% slower than Alon's.

Before your patch:
Time: 14205.606 ms

With your patch:
Time: 10565.374 ms

With Alon's patch:
Time: 10289.845 ms

The parsing part of the code in your version is slower, but as a percentage
of the total it's hidden. The loss of 0.3 seconds on 143MB means:

- If parsing takes a total of 0.9 seconds, the parsing rate is 160MB/s
(143/0.9)

- If we add another 0.3 seconds to parsing to bring it to 1.2, then the
parsing rate becomes 120MB/s

When we improve the next stages of the processing (attribute conversion,
write to disk), this will stand out a lot more. Our objective is to get the
COPY rate *much* faster than the current poky rate of 14MB/s (after this
patch).

- Luke

On 8/6/05 2:04 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com> writes:
>> New patch attached. It includes very minor changes. These are changes that
>> were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
>> previous patch.
>
> I've applied this with (rather extensive) revisions. I didn't like what
> you had done with the control structure --- loading the input buffer
> only at the outermost loop level was a bad design choice IMHO. You had
> sprinkled the code with an unreasonable number of special cases in order
> to try to cope with the effects of that mistake, but there were lots
> of problems still left. Some of the bugs I noticed:
>
> * Broke old-protocol COPY, since that has no provision for stopping at
> the EOF marker except by parsing the data carefully to start with. The
> backend would just hang up unless the total data size chanced to be a
> multiple of 64K.
>
> * Subtle change in interpretation of \. EOF marker (the existing code
> will recognize it even when not at start of line).
>
> * Seems to have thrown away detection of newline format discrepancies.
>
> * Fails for zero-column tables.
>
> * Broke display of column values during error context callback (would
> always show the first column contents no matter which one is being
> complained of).
>
> * DetectLineEnd mistakenly assumes CR mode if very last character of first
> bufferload is CR; need to reserve judgment until next char is available.
>
> * DetectLineEnd fails to account for backslashed control characters,
> so it will e.g. accept \ followed by \n as determining the newline
> style.
>
> * Fails to apply encoding conversion if first line exceeds copy buf
> size, because when DetectLineEnd fails the quick-exit path doesn't do
> it.
>
> * There seem to be several bugs associated with the fact that input_buf[]
> always has 64K of data in it even when EOF has been reached on the
> input. One example:
> echo -n 123 >zzz1
> psql> create temp table t1(f1 text);
> psql> copy t1 from '/home/tgl/zzz1';
> psql> select * from t1;
> hmm ... where'd that 64K of whitespace come from?
>
> I rewrote the patch in a way that retained most of the speedups without
> changing the basic control structure (except for replacing multiple
> CopyReadAttribute calls with one CopyReadAttributes call per line).
>
> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed
> as what you submitted.
>
> regards, tom lane
>


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-07 02:54:04
Message-ID: BF1AC5DC.AA48%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Tom,

The previous timings were for a table with 15 columns of mixed type. We
also test with 1 column to make the parsing overhead more apparent. In the
case of 1 text column with 145MB of input data:

Your patch:
Time: 6612.599 ms

Alon's patch:
Time: 6119.244 ms

Alon's patch is 7.5% faster here, where it was only 3% faster on the 15
column case. This is consistent with a large difference in parsing speed
between your approach and Alon's.

I'm pretty sure that the "mistake" you refer to is responsible for the speed
improvement, and was deliberately chosen to minimize memory copies, etc.
Given that we're looking ahead to much higher speeds, approaching current
high-performance disk speeds, we've been looking more closely at the
parsing speed. It comes down to a tradeoff between elegant code and speed.

We'll prove it in lab tests soon, where we measure the parsing rate
directly, but these experiments show it clearly, though indirectly.

- Luke

On 8/6/05 2:04 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com> writes:
>> New patch attached. It includes very minor changes. These are changes that
>> were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the
>> previous patch.
>
> I've applied this with (rather extensive) revisions. I didn't like what
> you had done with the control structure --- loading the input buffer
> only at the outermost loop level was a bad design choice IMHO. You had
> sprinkled the code with an unreasonable number of special cases in order
> to try to cope with the effects of that mistake, but there were lots
> of problems still left. Some of the bugs I noticed:
>
> * Broke old-protocol COPY, since that has no provision for stopping at
> the EOF marker except by parsing the data carefully to start with. The
> backend would just hang up unless the total data size chanced to be a
> multiple of 64K.
>
> * Subtle change in interpretation of \. EOF marker (the existing code
> will recognize it even when not at start of line).
>
> * Seems to have thrown away detection of newline format discrepancies.
>
> * Fails for zero-column tables.
>
> * Broke display of column values during error context callback (would
> always show the first column contents no matter which one is being
> complained of).
>
> * DetectLineEnd mistakenly assumes CR mode if very last character of first
> bufferload is CR; need to reserve judgment until next char is available.
>
> * DetectLineEnd fails to account for backslashed control characters,
> so it will e.g. accept \ followed by \n as determining the newline
> style.
>
> * Fails to apply encoding conversion if first line exceeds copy buf
> size, because when DetectLineEnd fails the quick-exit path doesn't do
> it.
>
> * There seem to be several bugs associated with the fact that input_buf[]
> always has 64K of data in it even when EOF has been reached on the
> input. One example:
> echo -n 123 >zzz1
> psql> create temp table t1(f1 text);
> psql> copy t1 from '/home/tgl/zzz1';
> psql> select * from t1;
> hmm ... where'd that 64K of whitespace come from?
>
> I rewrote the patch in a way that retained most of the speedups without
> changing the basic control structure (except for replacing multiple
> CopyReadAttribute calls with one CopyReadAttributes call per line).
>
> I had some difficulty in generating test cases that weren't largely
> I/O-bound, but AFAICT the patch as applied is about the same speed
> as what you submitted.
>
> regards, tom lane
>


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "PostgreSQL-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: COPY FROM performance improvements
Date: 2005-08-07 03:25:24
Message-ID: BF1ACD34.AC4D%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Tom,

My direct e-mails to you are apparently blocked, so I'll send this to the
list.

I've attached the case we use for load performance testing, with the data
generator modified to produce a single row version of the dataset.

I do believe that you/we will need to invert the processing loop to get the
maximum parsing speed. We will be implementing much higher loading speeds,
which require it in order to compete with Oracle, Netezza, and Teradata, so
we'll have to work this out in the best interests of our users.

- Luke

Attachment Content-Type Size
IVP.tgz application/octet-stream 13.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
Cc: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-07 04:08:16
Message-ID: 29007.1123387696@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

"Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
>> I had some difficulty in generating test cases that weren't largely
>> I/O-bound, but AFAICT the patch as applied is about the same speed
>> as what you submitted.

> You achieve the important objective of knocking the parsing stage down a
> lot, but your parsing code is actually about 20% slower than Alon's.

I would like to see the exact test case you are using to make this
claim; the tests I did suggested my code is the same speed or faster.
The particular test case I was using was the "tenk1" data from the
regression database, duplicated out to about 600K rows so as to run
long enough to measure with some degree of repeatability.

As best I can tell, my version of CopyReadAttributes is significantly
quicker than Alon's, approximately balancing out the fact that my
version of CopyReadLine is slower. I did the latter first, and would
now be tempted to rewrite it in the same style as CopyReadAttributes,
ie one pass of memory-to-memory copy using pointers rather than buffer
indexes.
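
As a rough illustration of that stylistic difference, consider two simplified
line-copying loops (these are not the actual copy.c routines):

/* Index-based style: every access is buf[i]/line[i], with the index
 * bookkeeping interleaved into the loop tests. */
int
copy_line_indexed(const char *buf, int buflen, char *line)
{
    int i;

    for (i = 0; i < buflen && buf[i] != '\n'; i++)
        line[i] = buf[i];
    line[i] = '\0';
    return i;
}

/* Pointer-based style: one pass of memory-to-memory copy; src and dst
 * can live in registers and the loop body stays tight. */
int
copy_line_pointers(const char *buf, int buflen, char *line)
{
    const char *src = buf;
    const char *end = buf + buflen;
    char *dst = line;

    while (src < end && *src != '\n')
        *dst++ = *src++;
    *dst = '\0';
    return (int) (dst - line);
}

(Modern compilers often generate similar code for both loops; the gain
described here comes from restructuring the surrounding logic into one such
pass rather than from the pointer syntax itself.)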

BTW, late today I figured out a way to get fairly reproducible
non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
the actual inserts, thus:

create table foo ...
create function noway() returns trigger as
'begin return null; end' language plpgsql;
create trigger noway before insert on foo
for each row execute procedure noway();
then repeat:
copy foo from '/tmp/foo.data';

If the source file is not too large to fit in kernel disk cache, then
after the first iteration there is no I/O at all. I got numbers
that were reproducible within less than 1%, as opposed to 5% or more
variation when the thing was partially I/O bound. Pretty useless in the
real world, of course, but great for timing COPY's data-pushing.

regards, tom lane


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-07 05:21:03
Message-ID: BF1AE84F.AC5A%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Tom,

On 8/6/05 9:08 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
>>> I had some difficulty in generating test cases that weren't largely
>>> I/O-bound, but AFAICT the patch as applied is about the same speed
>>> as what you submitted.
>
>> You achieve the important objective of knocking the parsing stage down a
>> lot, but your parsing code is actually about 20% slower than Alon's.
>
> I would like to see the exact test case you are using to make this
> claim; the tests I did suggested my code is the same speed or faster.

I showed mine - you show yours :-) Apparently our e-mail crossed.

> As best I can tell, my version of CopyReadAttributes is significantly
> quicker than Alon's, approximately balancing out the fact that my
> version of CopyReadLine is slower. I did the latter first, and would
> now be tempted to rewrite it in the same style as CopyReadAttributes,
> ie one pass of memory-to-memory copy using pointers rather than buffer
> indexes.

See previous timings - looks like Alon's parsing is substantially faster.
However, I'd like him to confirm by running with the "shunt" placed at
different stages, in this case between parse and attribute conversion (not
attribute parse).

> BTW, late today I figured out a way to get fairly reproducible
> non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
> the actual inserts, thus:
>
> create table foo ...
> create function noway() returns trigger as
> 'begin return null; end' language plpgsql;
> create trigger noway before insert on foo
> for each row execute procedure noway();
> then repeat:
> copy foo from '/tmp/foo.data';

Cool! That's a better way than hacking code and inserting shunts.

Alon will likely hit this tomorrow.

- Luke


From: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-10 00:41:18
Message-ID: BF1E9B3E.803C%agoldshuv@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

I did some performance checks after the recent code commit.

The good news is that the parsing speed of COPY is now MUCH faster, which is
great. It is about 5 times faster - about 100MB/sec on my machine
(previously 20MB/sec at best, usually less).

The better news is that my original patch's parsing speed reaches 120MB/sec,
about 20MB/sec faster than the code that's now in CVS. This can be
significant in the long run and for large data sets. Maybe we
can improve the current code a bit more to reach this number.

I performed those measurements by executing *only the parsing logic* of the
COPY pipeline. All data conversion (functioncall3(string...)) and tuple
handling (form_heaptuple etc...) and insertion were manually disabled. So
the only code measured is reading from disk and parsing to the attribute
level.

Cheers,
Alon.

On 8/7/05 1:21 AM, "Luke Lonergan" <llonergan(at)greenplum(dot)com> wrote:

> Tom,
>
> On 8/6/05 9:08 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> "Luke Lonergan" <llonergan(at)greenplum(dot)com> writes:
>>>> I had some difficulty in generating test cases that weren't largely
>>>> I/O-bound, but AFAICT the patch as applied is about the same speed
>>>> as what you submitted.
>>
>>> You achieve the important objective of knocking the parsing stage down a
>>> lot, but your parsing code is actually about 20% slower than Alon's.
>>
>> I would like to see the exact test case you are using to make this
>> claim; the tests I did suggested my code is the same speed or faster.
>
> I showed mine - you show yours :-) Apparently our e-mail crossed.
>
>> As best I can tell, my version of CopyReadAttributes is significantly
>> quicker than Alon's, approximately balancing out the fact that my
>> version of CopyReadLine is slower. I did the latter first, and would
>> now be tempted to rewrite it in the same style as CopyReadAttributes,
>> ie one pass of memory-to-memory copy using pointers rather than buffer
>> indexes.
>
> See previous timings - looks like Alon's parsing is substantially faster.
> However, I'd like him to confirm by running with the "shunt" placed at
> different stages, in this case between parse and attribute conversion (not
> attribute parse).
>
>> BTW, late today I figured out a way to get fairly reproducible
>> non-I/O-bound numbers about COPY FROM: use a trigger that suppresses
>> the actual inserts, thus:
>>
>> create table foo ...
>> create function noway() returns trigger as
>> 'begin return null; end' language plpgsql;
>> create trigger noway before insert on foo
>> for each row execute procedure noway();
>> then repeat:
>> copy foo from '/tmp/foo.data';
>
> Cool! That's a better way than hacking code and inserting shunts.
>
> Alon will likely hit this tomorrow.
>
> - Luke
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alon Goldshuv <agoldshuv(at)greenplum(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-10 01:01:38
Message-ID: 42F951F2.8010104@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Alon Goldshuv wrote:

>I performed those measurements by executing *only the parsing logic* of the
>COPY pipeline. All data conversion (functioncall3(string...)) and tuple
>handling (form_heaptuple etc...) and insertion were manually disabled. So
>the only code measured is reading from disk and parsing to the attribute
>level.
>
>

Arguably this might exaggerate the effect quite significantly. Users
will want to know the real effect on a complete COPY. Depending on how
much of the total time the parsing accounts for, your 20% improvement in
parsing might translate into only a small fraction of that in overall COPY
time.

Like you, I'm happy we have seen a 5 times improvement in parsing. Is it
possible you can factor out something smallish from your patch that
might make up the balance?

cheers

andrew


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-10 04:39:55
Message-ID: BF1ED32B.C206%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Andrew,

> Arguably this might exaggerate the effect quite significantly. Users
> will want to know the real effect on a complete COPY. Depending on how
> much of the total time the parsing accounts for, your 20% improvement in
> parsing might translate into only a small fraction of that in overall COPY
> time.

Arguably has already been argued. We knew this because we profiled the
entire COPY process and came up with this approx. breakdown for a specific
case:
Parsing: 25%
Attribute Conversion: 40%
Data Insertion: 35%

Net copy rate: 8 MB/s on a filesystem that does 240 MB/s

So - if we speed up parsing by 500% or 450%, the end result is about a
20-30% speed increase in the overall process: with parsing at 25% of the
runtime, a 5x parsing speedup leaves 0.75 + 0.25/5 = 0.80 of the original
time, or roughly 25% faster end to end.

Note that we're still a *long* way from getting anywhere near the limit of
the I/O subsystem at 12 MB/s. Oracle can probably get 5-8 times this data
rate, if not better.

The attribute conversion logic is also very slow and needs similar
improvements.

The reason we focused first on parsing is that our MPP version of Bizgres
will reach data loading rates approaching the parsing speed, so we needed to
improve that part to get it out of the way.

We will continue to improve COPY speed in Bizgres so that we can provide
comparable COPY performance to Oracle and MySQL.

> Like you, I'm happy we have seen a 5 times improvement in parsing. Is it
> possible you can factor out something smallish from your patch that
> might make up the balance?

That parsing was 25% of the workload was traceable to 3 main things:
1) Per character acquisition from source instead of buffering
2) Frequent interruption of the parsing pipeline to handle attribute
conversion
3) Lack of micro-parallelism in the character finding logic

Tom's patch took our contributions to (1) and (2), added his own
improvements, and rejected (3). The net result is that we lost performance from the lack
of (3) but gained performance from his improvements of (1) and (2).
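
To make (1) concrete, here is a minimal sketch - hypothetical helpers, not
patch code - of the difference between pulling each byte through a
per-character call and scanning a block that was read in one go:

#include <stdio.h>
#include <string.h>

/* Per-character acquisition: one stdio call (and its branches) per byte. */
long
count_newlines_percall(FILE *fp)
{
    long n = 0;
    int c;

    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            n++;
    return n;
}

/* Buffered acquisition: read 64K at a time, then scan in memory. */
long
count_newlines_buffered(FILE *fp)
{
    static char buf[65536];
    long n = 0;
    size_t got;

    while ((got = fread(buf, 1, sizeof(buf), fp)) > 0)
    {
        const char *p = buf;
        const char *end = buf + got;

        while ((p = memchr(p, '\n', end - p)) != NULL)
        {
            n++;
            p++;
        }
    }
    return n;
}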

I believe that re-introducing (3) may bring us from 100 MB/s to 150 MB/s
parsing speed.

- Luke


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-10 04:48:02
Message-ID: BF1ED512.C208%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Tom,

> As best I can tell, my version of CopyReadAttributes is significantly
> quicker than Alon's, approximately balancing out the fact that my
> version of CopyReadLine is slower. I did the latter first, and would
> now be tempted to rewrite it in the same style as CopyReadAttributes,
> ie one pass of memory-to-memory copy using pointers rather than buffer
> indexes.

I think you are right, except that Alon's results show that the parsing in
your patch nets out about 20% slower than his.

I think with your speedup of CopyReadAttributes and some additional work on
CopyReadLine the net result could be 50% faster than Alon's patch.

The key thing missing is micro-parallelism in the character processing in
this version. By "inverting the loop", or putting
the characters into a buffer on the outside, then doing fast character
scanning inside with special "fix-up" cases, we exposed long runs of
pipeline-able code to the compiler.

I think there is another way to accomplish the same thing and still preserve
the current structure, but it requires "strip mining" the character buffer
into chunks that can be processed with an explicit loop to check for the
different characters. While it may seem artificial (it is), it will provide
the compiler with the ability to pipeline the character finding logic over
long runs. The other necessary element is avoiding pipeline stalls from the
"if" conditions as much as possible.
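
A sketch of what "strip mining" the buffer might look like - chunked
scanning, with the rare fix-up work deferred out of the hot loop. The chunk
size, delimiter set, and names are illustrative only, and bounds checks on
the output array are omitted:

#define CHUNK 64                   /* strip size; tune for the target pipeline */

/* Scan nbytes of buf, recording the offset of each "special" character.
 * The inner loop over one fixed-size strip has a short, predictable body
 * that the compiler can unroll and pipeline; handling of the specials
 * (field breaks, escapes spanning strips, etc.) happens outside it. */
static void
scan_strips(const char *buf, int nbytes, int *specials, int *nspecials)
{
    int n = 0;

    for (int base = 0; base < nbytes; base += CHUNK)
    {
        int lim = (base + CHUNK < nbytes) ? base + CHUNK : nbytes;

        for (int i = base; i < lim; i++)
        {
            char c = buf[i];

            if (c == '\t' || c == '\n' || c == '\\')
                specials[n++] = i; /* record now, fix up later */
        }
        /* per-strip fix-up cases would go here */
    }
    *nspecials = n;
}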

Anyway, thanks for reviewing this code and improving it - it's important to
bring speed increases to our collective customer base. With Bizgres, we're
not satisfied with 12 MB/s; we won't stop until we saturate the I/O bus, so
we may get more extreme with the code than seems reasonable for the general
audience.

- Luke


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-10 08:15:00
Message-ID: 1123661700.3670.621.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

On Tue, 2005-08-09 at 21:48 -0700, Luke Lonergan wrote:

> The key thing that is missing is the lack of micro-parallelism in the
> character processing in this version. By "inverting the loop", or putting
> the characters into a buffer on the outside, then doing fast character
> scanning inside with special "fix-up" cases, we exposed long runs of
> pipeline-able code to the compiler.
>
> I think there is another way to accomplish the same thing and still preserve
> the current structure, but it requires "strip mining" the character buffer
> into chunks that can be processed with an explicit loop to check for the
> different characters. While it may seem artificial (it is), it will provide
> the compiler with the ability to pipeline the character finding logic over
> long runs. The other necessary element will have to avoid pipeline stalls
> from the "if" conditions as much as possible.

This is a key point, IMHO.

That part of the code was specifically written to take advantage of
processing pipelines in the hardware, not because the actual theoretical
algorithm for that approach was itself faster.

Nobody's said what compiler/hardware they have been using, so since both
Alon and Tom say their character-finding logic is faster, it is likely
to come down to that. Name your platforms, gentlemen, please.

My feeling is that we may learn something here that applies more widely
across many parts of the code.

Best Regards, Simon Riggs


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Luke Lonergan <llonergan(at)greenplum(dot)com>, Alon Goldshuv <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-10 15:29:49
Message-ID: 26482.1123687789@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches pgsql-performance

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Nobody's said what compiler/hardware they have been using, so since both
> Alon and Tom say their character finding logic is faster, it is likely
> to be down to that? Name your platforms gentlemen, please.

I tested on HPPA with gcc 2.95.3 and on a Pentium 4 with gcc 3.4.3.
Got pretty much the same results on both.

regards, tom lane