Quick Links

Re: Bug in UTF8-Validation Code?

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Michael Fuhr <mike(at)fuhr(dot)org>, Mario Weilguni <mweilguni(at)sime(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Bug in UTF8-Validation Code?
Date:	2007-03-17 21:58:21
Message-ID:	18930.1174168701@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Here are some timing tests in 1m rows of random utf8 encoded 100 char
> data. It doesn't look to me like the saving you're suggesting is worth
> the trouble.

Hmm ... not sure I believe your numbers. Using a test file of 1m lines
of 100 random latin1 characters converted to utf8 (thus, about half and
half 7-bit ASCII and 2-byte utf8 characters), I get this in SQL_ASCII
encoding:

regression=# \timing
Timing is on.
regression=# create temp table test(f1 text);
CREATE TABLE
Time: 5.047 ms
regression=# copy test from '/home/tgl/zzz1m';
COPY 1000000
Time: 4337.089 ms

and this in UTF8 encoding:

utf8=# \timing
Timing is on.
utf8=# create temp table test(f1 text);
CREATE TABLE
Time: 5.108 ms
utf8=# copy test from '/home/tgl/zzz1m';
COPY 1000000
Time: 7776.583 ms

The numbers aren't super repeatable, but it sure looks to me like the
encoding check adds at least 50% to the runtime in this example; so
doing it twice seems unpleasant.

(This is CVS HEAD, compiled without assert checking, on an x86_64
Fedora Core 6 box.)

regards, tom lane

In response to

Re: Bug in UTF8-Validation Code? at 2007-03-17 20:28:53 from Andrew Dunstan

Responses

Re: Bug in UTF8-Validation Code? at 2007-03-17 23:09:07 from Andrew Dunstan
Re: Bug in UTF8-Validation Code? at 2007-03-18 04:03:58 from Andrew Dunstan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andrew Dunstan	2007-03-17 23:09:07	Re: Bug in UTF8-Validation Code?
Previous Message	Jan Wieck	2007-03-17 21:53:58	Re: As proposed the complete changes to pg_trigger and pg_rewrite