Re: patch (for 9.1) string functions

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-21 06:29:49
Message-ID: AANLkTikmytco6sgwM2jLHLvrrf4uQo9lNoGMMj9KZyyO@mail.gmail.com
Lists: pgsql-hackers

2010/7/21 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> I reviewed the core changes of the patch. I don't think we need
> mb_string_info() at all. Instead, we can just call pg_mbxxx() functions.
>
> I rewrote the patch to use pg_mbstrlen_with_len() and pg_mbcharcliplen().
> What do you think the changes? It requires re-counting lengths of multi-byte
> strings in some cases, but the code will be much simpler and can avoid
> allocating length buffers.
>

It is a good idea. I see a problem only for the "right" function, where
in the most common use case mblen will be called twice for each
character. I cannot say now whether this can be a performance issue.
Most probably not, except for very large strings.
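
To make the double scan concrete, here is a minimal sketch of how "right" ends up walking the string twice when it is built from a character count plus a clip. It is not the patch code: utf8_mblen(), mbstrlen_with_len(), mbcharcliplen() and right_sketch() are simplified, hypothetical stand-ins for the pg_mbxxx() routines, and they assume valid UTF-8 input and a non-negative count.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 character starting at s.
 * Hypothetical stand-in for an mblen routine; assumes valid UTF-8. */
static int utf8_mblen(const unsigned char *s)
{
    if ((*s & 0x80) == 0x00) return 1;
    if ((*s & 0xE0) == 0xC0) return 2;
    if ((*s & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Count the characters held in the first len bytes of s. */
static int mbstrlen_with_len(const char *s, int len)
{
    int nchars = 0;

    while (len > 0)
    {
        int l = utf8_mblen((const unsigned char *) s);

        s += l;
        len -= l;
        nchars++;
    }
    return nchars;
}

/* Byte length of the first `limit` characters of s. */
static int mbcharcliplen(const char *s, int len, int limit)
{
    int clip = 0;
    int nchars = 0;

    while (clip < len && nchars < limit)
    {
        clip += utf8_mblen((const unsigned char *) (s + clip));
        nchars++;
    }
    return clip;
}

/* Sketch of right(s, n): return a pointer to the last n characters
 * and store their byte length in *outlen. Only n >= 0 is handled. */
static const char *right_sketch(const char *s, int len, int n, int *outlen)
{
    int total;
    int skip;
    int off;

    assert(n >= 0);
    total = mbstrlen_with_len(s, len);      /* first pass: whole string */
    skip = (total > n) ? total - n : 0;     /* characters to drop on the left */
    off = mbcharcliplen(s, len, skip);      /* second pass: up to the cut point */
    *outlen = len - off;
    return s + off;
}
```

For a small n the second pass covers almost the whole string again, so nearly every character has its length computed twice. With a single-byte encoding both passes could be replaced by pointer arithmetic, which is why the cost shows up only for multi-byte encodings.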

postgres=# create or replace function randomstr(int) returns text as
$$select string_agg(substring('abcdefghijklmnop' from
trunc(random()*13)::int+1 for 1),'') from generate_series(1,$1) $$
language sql;
CREATE FUNCTION
Time: 27,452 ms

postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5615,061 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5606,937 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5630,771 ms

postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5753,063 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5755,776 ms

It is about 2% slower for UTF8 encoding, so it isn't significant for me.

I agree with your changes. Thank you very much.
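
The figure can be checked from the timings above. The sketch below assumes that the first three runs were taken with the original patch and the last two with the rewritten one; the output does not label the runs, so that grouping is an assumption. The helper names mean_ms() and percent_slowdown() are made up for illustration, and the values are the "Time:" lines with the decimal comma written as a point.

```c
#include <stddef.h>

/* Arithmetic mean of n timings in milliseconds. */
static double mean_ms(const double *t, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += t[i];
    return sum / (double) n;
}

/* Relative slowdown of `patched` against `base`, in percent. */
static double percent_slowdown(const double *base, size_t nbase,
                               const double *patched, size_t npatched)
{
    double b = mean_ms(base, nbase);
    double p = mean_ms(patched, npatched);

    return (p - b) / b * 100.0;
}

/* Assumed grouping: runs 1-3 before the rewrite, runs 4-5 after it. */
static const double base_runs[] = { 5615.061, 5606.937, 5630.771 };
static const double patched_runs[] = { 5753.063, 5755.776 };
```

Calling percent_slowdown(base_runs, 3, patched_runs, 2) gives roughly 2.4, which matches the "about 2%" above.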

Regards

Pavel Stehule

> I'd like to apply contrib/stringinfo apart from the core changes,
> because there seems to be still some idea to improve sprintf().
>
> --
> Itagaki Takahiro
>
