Re: CopyReadLineText optimization

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: CopyReadLineText optimization
Date: 2008-03-03 19:42:16
Message-ID: 200803031942.m23JgG129730@momjian.us
Lists: pgsql-hackers pgsql-patches


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> Heikki Linnakangas wrote:
> > Attached is a patch that modifies CopyReadLineText so that it uses
> > memchr to speed up the scan. The nice thing about memchr is that we can
> > take advantage of any clever optimizations that might be in libc or
> > the compiler.
>
> Here's an updated version of the patch. The principle is the same, but
> the same optimization is now used for CSV input as well, and there are
> more comments.
>
> I still need to do more benchmarking. I mentioned a ~5% speedup on the
> test I ran earlier, which was a load of the lineitem table from TPC-H.
> It looks like with cheaper data types the gain can be much bigger;
> here's an oprofile from loading the TPC-H partsupp table:
>
> Before:
>
> samples  %        image name    symbol name
> 5146     25.7635  postgres      CopyReadLine
> 4089     20.4716  postgres      DoCopy
> 1449     7.2544   reiserfs      (no symbols)
> 1369     6.8539   postgres      pg_verify_mbstr_len
> 1013     5.0716   libc-2.7.so   memcpy
> 749      3.7499   libc-2.7.so   ____strtod_l_internal
> 598      2.9939   postgres      heap_formtuple
> 548      2.7436   libc-2.7.so   ____strtol_l_internal
> 403      2.0176   libc-2.7.so   memset
> 309      1.5470   libc-2.7.so   strlen
> 208      1.0414   postgres      AllocSetAlloc
> ...
>
> After:
>
> samples  %        image name    symbol name
> 4165     25.7879  postgres      DoCopy
> 1574     9.7455   postgres      pg_verify_mbstr_len
> 1520     9.4112   reiserfs      (no symbols)
> 1005     6.2225   libc-2.7.so   memchr
> 986      6.1049   libc-2.7.so   memcpy
> 632      3.9131   libc-2.7.so   ____strtod_l_internal
> 589      3.6468   postgres      heap_formtuple
> 546      3.3806   libc-2.7.so   ____strtol_l_internal
> 386      2.3899   libc-2.7.so   memset
> 366      2.2661   postgres      CopyReadLine
> 287      1.7770   libc-2.7.so   strlen
> 215      1.3312   postgres      LWLockAcquire
> 208      1.2878   postgres      hash_any
> 176      1.0897   postgres      LWLockRelease
> 161      0.9968   postgres      InputFunctionCall
> 157      0.9721   postgres      AllocSetAlloc
> ...
>
> The profile shows that with the patch, ~8.5% of the CPU time is spent in
> CopyReadLine+memchr, vs. ~25.8% in CopyReadLine before. That's quite a
> significant speedup.
>
> I still need to test the worst-case performance, with input that has a
> lot of escapes. It would be interesting to hear reports with this patch
> from people on different platforms. These results are from my laptop
> with 32-bit Intel CPU, running Linux. There could be big differences in
> the memchr implementations.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
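
For readers who have not opened the patch, the general idea of scanning
with memchr instead of a byte-at-a-time loop can be sketched roughly as
below. This is not code from the patch: the function name find_line_end
and its handling of backslash escapes are assumptions made for
illustration, and it only covers a simplified text-mode case (no CSV
quoting, no \r handling, no multibyte encodings).

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, NOT taken from the patch: return the offset of the
 * first newline in buf[0..len) that is not escaped by a preceding
 * backslash, or -1 if the buffer does not yet hold a complete line.
 */
static long
find_line_end(const char *buf, size_t len)
{
    size_t      pos = 0;

    while (pos < len)
    {
        const char *start = buf + pos;
        const char *nl = memchr(start, '\n', len - pos);
        const char *bs;

        if (nl == NULL)
            return -1;          /* no newline left: caller needs more data */

        /* Look for a backslash only in the bytes before that newline. */
        bs = memchr(start, '\\', (size_t) (nl - start));
        if (bs == NULL)
            return (long) (nl - buf);   /* nothing escaped before it */

        /* Skip the backslash and the byte it escapes, then scan again. */
        pos = (size_t) (bs - buf) + 2;
    }

    return -1;
}
```

In this sketch every backslash forces another memchr call for the
newline, so a long line with many escapes is scanned repeatedly. That is
the kind of worst-case input the quoted message says still needs
testing; how the actual patch behaves there is not shown here.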


--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
