Re: Simplify formatting.c

Lists: pgsql-patches
From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Simplify formatting.c
Date: 2008-05-20 01:41:40
Message-ID: 200805200141.m4K1feb16574@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Now that upper/lower/initcase do not modify the passed string, I have
simplified formatting.c with the attached patch.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/rtmp/diff text/x-diff 7.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Simplify formatting.c
Date: 2008-05-20 02:29:04
Message-ID: 13542.1211250544@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Now that upper/lower/initcase do not modify the passed string, I have
> simplified formatting.c with the attached patch.

I was thinking the same thing while reading the patch. But please,
make str_toupper() and friends declare their argument "const" if you're
going to do this.

Another issue in this area is that these functions could do with some
refactoring to eliminate useless text/cstring conversions; I'm pretty
sure some code paths are doing cstring->text->cstring in direct
succession. Also, it seems a bit inconsistent to be relying on
oracle_compat.c for upper/lower but not initcap.

regards, tom lane


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Simplify formatting.c
Date: 2008-05-20 14:56:15
Message-ID: 4832E68F.2030805@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:

> Also, it seems a bit inconsistent to be relying on
> oracle_compat.c for upper/lower but not initcap.
>
I saw this inconsistence while I'm doing the patch. What about moving
that upper/lower/initcap and wcs* code to another file. pg_locale.c?
BTW, formatting.c and oracle_compat.c already include pg_locale.h.

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Simplify formatting.c
Date: 2008-05-20 23:09:37
Message-ID: 7221.1211324977@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> Tom Lane wrote:
>> Also, it seems a bit inconsistent to be relying on
>> oracle_compat.c for upper/lower but not initcap.
>>
> I saw this inconsistence while I'm doing the patch. What about moving
> that upper/lower/initcap and wcs* code to another file. pg_locale.c?

That doesn't seem a particularly appropriate place for them. pg_locale
is about dealing with the locale state, not about doing actual
operations based on the locale data.

I was just thinking of having oracle_compat expose an initcap routine.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Euler Taveira de Oliveira <euler(at)timbira(dot)com>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Simplify formatting.c
Date: 2008-06-14 00:15:59
Message-ID: 200806140015.m5E0FxX02327@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> > Tom Lane wrote:
> >> Also, it seems a bit inconsistent to be relying on
> >> oracle_compat.c for upper/lower but not initcap.
> >>
> > I saw this inconsistence while I'm doing the patch. What about moving
> > that upper/lower/initcap and wcs* code to another file. pg_locale.c?
>
> That doesn't seem a particularly appropriate place for them. pg_locale
> is about dealing with the locale state, not about doing actual
> operations based on the locale data.
>
> I was just thinking of having oracle_compat expose an initcap routine.

You mean like the attached?

I moved str_initcap() over into oracle_compat.c and then had initcap()
convert to/from TEXT to call it. The code is a little weird because
str_initcap() needs to convert to text to use texttowcs(), so in
multibyte encodings initcap converts the string to text, then to char,
then to text to call texttowcs(). I didn't see a cleaner way to do
this.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/pgpatches/initcap text/x-diff 5.8 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Simplify formatting.c
Date: 2008-06-14 21:29:42
Message-ID: 20080614212942.GD8519@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian wrote:

> I moved str_initcap() over into oracle_compat.c and then had initcap()
> convert to/from TEXT to call it. The code is a little weird because
> str_initcap() needs to convert to text to use texttowcs(), so in
> multibyte encodings initcap converts the string to text, then to char,
> then to text to call texttowcs(). I didn't see a cleaner way to do
> this.

Why not use wchar2char? It seems there's room for extra cleanup here.

Also, the prototype of str_initcap in builtins.h looks out of place.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Simplify formatting.c
Date: 2008-06-17 16:10:35
Message-ID: 200806171610.m5HGAZ901651@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Alvaro Herrera wrote:
> Bruce Momjian wrote:
>
> > I moved str_initcap() over into oracle_compat.c and then had initcap()
> > convert to/from TEXT to call it. The code is a little weird because
> > str_initcap() needs to convert to text to use texttowcs(), so in
> > multibyte encodings initcap converts the string to text, then to char,
> > then to text to call texttowcs(). I didn't see a cleaner way to do
> > this.
>
> Why not use wchar2char? It seems there's room for extra cleanup here.
>
> Also, the prototype of str_initcap in builtins.h looks out of place.

I talked to Alvaro on IM, and there is certainly much more cleanup to do
in this area. I will work from the bottom up. First, is moving the
USE_WIDE_UPPER_LOWER define to c.h, and removing TS_USE_WIDE and using
USE_WIDE_UPPER_LOWER instead. Patch attached and applied.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/rtmp/diff text/x-diff 9.7 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject: Re: Simplify formatting.c
Date: 2008-06-18 18:43:42
Message-ID: 200806181843.m5IIhgm03448@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> >
> > > I moved str_initcap() over into oracle_compat.c and then had initcap()
> > > convert to/from TEXT to call it. The code is a little weird because
> > > str_initcap() needs to convert to text to use texttowcs(), so in
> > > multibyte encodings initcap converts the string to text, then to char,
> > > then to text to call texttowcs(). I didn't see a cleaner way to do
> > > this.
> >
> > Why not use wchar2char? It seems there's room for extra cleanup here.
> >
> > Also, the prototype of str_initcap in builtins.h looks out of place.
>
> I talked to Alvaro on IM, and there is certainly much more cleanup to do
> in this area. I will work from the bottom up. First, is moving the
> USE_WIDE_UPPER_LOWER define to c.h, and removing TS_USE_WIDE and using
> USE_WIDE_UPPER_LOWER instead. Patch attached and applied.

The second step is to move wchar2char() and char2wchar() from tsearch
into /mb to be easier to use for other modules; also move pnstrdup().

Patch attached and applied.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/rtmp/diff text/x-diff 11.4 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject: Re: Simplify formatting.c
Date: 2008-06-21 20:06:45
Message-ID: 200806212006.m5LK6jX07815@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian wrote:
> Bruce Momjian wrote:
> > Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > >
> > > > I moved str_initcap() over into oracle_compat.c and then had initcap()
> > > > convert to/from TEXT to call it. The code is a little weird because
> > > > str_initcap() needs to convert to text to use texttowcs(), so in
> > > > multibyte encodings initcap converts the string to text, then to char,
> > > > then to text to call texttowcs(). I didn't see a cleaner way to do
> > > > this.
> > >
> > > Why not use wchar2char? It seems there's room for extra cleanup here.
> > >
> > > Also, the prototype of str_initcap in builtins.h looks out of place.
> >
> > I talked to Alvaro on IM, and there is certainly much more cleanup to do
> > in this area. I will work from the bottom up. First, is moving the
> > USE_WIDE_UPPER_LOWER define to c.h, and removing TS_USE_WIDE and using
> > USE_WIDE_UPPER_LOWER instead. Patch attached and applied.
>
> The second step is to move wchar2char() and char2wchar() from tsearch
> into /mb to be easier to use for other modules; also move pnstrdup().

The third step is for oracle_compat.c::initcap() to use
formatting.c::str_initcap(). You can see the result; patch attached
(not applied).

This greatly reduces the size of initcap(), with the downside that we
are making two extra copies of the string to convert it to/from char*.

Is this acceptable? If it is I will do the same for uppper()/lower()
with similar code size reduction and modularity.

If not perhaps I should keep the non-multibyte code in initcap() and
have only the multi-byte use str_initcap().

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/pgpatches/initcap3 text/x-diff 4.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject: Re: Simplify formatting.c
Date: 2008-06-22 00:54:07
Message-ID: 28031.1214096047@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> The third step is for oracle_compat.c::initcap() to use
> formatting.c::str_initcap(). You can see the result; patch attached
> (not applied).

> This greatly reduces the size of initcap(), with the downside that we
> are making two extra copies of the string to convert it to/from char*.

> Is this acceptable?

I'd say not. Can't we do some more refactoring and avoid so many
useless conversions? Seems like str_initcap is the wrong primitive API
--- the work ought to be done by a function that takes a char pointer
and a length. That would be a suitable basis for functions operating
on both text datums and C strings.

(Perhaps what I should be asking is whether the performance of upper()
and lower() is equally bad. Certainly all three should have comparable
code, so maybe they all need refactoring.)

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject: Re: Simplify formatting.c
Date: 2008-06-22 01:49:33
Message-ID: 200806220149.m5M1nXa16104@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > The third step is for oracle_compat.c::initcap() to use
> > formatting.c::str_initcap(). You can see the result; patch attached
> > (not applied).
>
> > This greatly reduces the size of initcap(), with the downside that we
> > are making two extra copies of the string to convert it to/from char*.
>
> > Is this acceptable?
>
> I'd say not. Can't we do some more refactoring and avoid so many
> useless conversions? Seems like str_initcap is the wrong primitive API
> --- the work ought to be done by a function that takes a char pointer
> and a length. That would be a suitable basis for functions operating
> on both text datums and C strings.

Yea, I thought about that idea too but it is going to add a strlen()
calls in some places, but not in critical ones.

> (Perhaps what I should be asking is whether the performance of upper()
> and lower() is equally bad. Certainly all three should have comparable
> code, so maybe they all need refactoring.)

Yes, they do. I will work on the length idea and see how that goes.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject: Re: Simplify formatting.c
Date: 2008-06-22 01:59:24
Message-ID: 29109.1214099964@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> I'd say not. Can't we do some more refactoring and avoid so many
>> useless conversions? Seems like str_initcap is the wrong primitive API
>> --- the work ought to be done by a function that takes a char pointer
>> and a length. That would be a suitable basis for functions operating
>> on both text datums and C strings.

> Yea, I thought about that idea too but it is going to add a strlen()
> calls in some places, but not in critical ones.

Sure, but the cost-per-byte of the strlen should be a good bit less than
the cost-per-byte of the actual conversion, so that doesn't bother me
too much.

Actually it seems like the hard part is not so much the input
representation as the output representation --- what should the
base-level initcap routine return, to be reasonably efficient for
both cases?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject: Re: Simplify formatting.c
Date: 2008-06-22 02:43:28
Message-ID: 200806220243.m5M2hS322144@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Tom Lane wrote:
> >> I'd say not. Can't we do some more refactoring and avoid so many
> >> useless conversions? Seems like str_initcap is the wrong primitive API
> >> --- the work ought to be done by a function that takes a char pointer
> >> and a length. That would be a suitable basis for functions operating
> >> on both text datums and C strings.
>
> > Yea, I thought about that idea too but it is going to add a strlen()
> > calls in some places, but not in critical ones.
>
> Sure, but the cost-per-byte of the strlen should be a good bit less than
> the cost-per-byte of the actual conversion, so that doesn't bother me
> too much.
>
> Actually it seems like the hard part is not so much the input
> representation as the output representation --- what should the
> base-level initcap routine return, to be reasonably efficient for
> both cases?

I hadn't gotten to trying it out yet, but I can see the output being a
problem. You can't even really pre-allocate the storage before passing
it because you don't know the length after case change. You could pass
back a char* and repalloc to get the varlena header in there but that is
very messy.

Add to that that the multi-byte case also has to be converted to wide
characters, so you have text -> char * -> wide chars -> char * -> text
for the most complex case.

I am starto to think that the simplest case is to keep the single-copy
version in there for single-byte encodings and not worry about the
overhead of the multi-byte case.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject: Re: Simplify formatting.c
Date: 2008-06-23 02:21:19
Message-ID: 200806230221.m5N2LJI16272@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian wrote:
> > Actually it seems like the hard part is not so much the input
> > representation as the output representation --- what should the
> > base-level initcap routine return, to be reasonably efficient for
> > both cases?
>
> I hadn't gotten to trying it out yet, but I can see the output being a
> problem. You can't even really pre-allocate the storage before passing
> it because you don't know the length after case change. You could pass
> back a char* and repalloc to get the varlena header in there but that is
> very messy.
>
> Add to that that the multi-byte case also has to be converted to wide
> characters, so you have text -> char * -> wide chars -> char * -> text
> for the most complex case.
>
> I am starting to think that the simplest case is to keep the single-copy
> version in there for single-byte encodings and not worry about the
> overhead of the multi-byte case.

My new idea is if we pass the length to str_initcap, we can eliminate
the string copy from text to char *. That leaves us with just one extra
string copy from char * to text, which seems acceptable. We still have
the wide char copy but I don't see any easy way to eliminate that
because the multi-byte code is complex and not something we want to
duplicate.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Subject: Re: Simplify formatting.c
Date: 2008-06-23 19:28:29
Message-ID: 200806231928.m5NJSTE01812@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Bruce Momjian wrote:
> > I am starting to think that the simplest case is to keep the single-copy
> > version in there for single-byte encodings and not worry about the
> > overhead of the multi-byte case.
>
> My new idea is if we pass the length to str_initcap, we can eliminate
> the string copy from text to char *. That leaves us with just one extra
> string copy from char * to text, which seems acceptable. We still have
> the wide char copy but I don't see any easy way to eliminate that
> because the multi-byte code is complex and not something we want to
> duplicate.

I ended up going in this direction, and did the same for upper and
lower. Patch attached and applied. I don't see any other cleanups in
this area.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/rtmp/diff text/x-diff 29.1 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Simplify formatting.c
Date: 2008-06-23 19:30:19
Message-ID: 200806231930.m5NJUJm07912@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Euler Taveira de Oliveira wrote:
> Tom Lane wrote:
>
> > Also, it seems a bit inconsistent to be relying on
> > oracle_compat.c for upper/lower but not initcap.
> >
> I saw this inconsistence while I'm doing the patch. What about moving
> that upper/lower/initcap and wcs* code to another file. pg_locale.c?
> BTW, formatting.c and oracle_compat.c already include pg_locale.h.

I researched this idea but is seems pg_locale.c contains only
locale-specific stuff, while these functions have locale and non-locale
versions; I ended up moving the common stuff into formatting.c.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +