Collation patch's handling of wcstombs/mbstowcs is sheerest fantasy

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Collation patch's handling of wcstombs/mbstowcs is sheerest fantasy
Date: 2011-04-22 20:32:30
Message-ID: 9844.1303504350@sss.pgh.pa.us
Lists: pgsql-hackers

I just noticed that the collation patch has modified char2wchar and
wchar2char to accept a collation OID as argument ... but it hasn't done
anything to make those arguments actually work. Since those functions
depend on mbstowcs and wcstombs respectively, which respond to LC_CTYPE
and nothing else, this flat out does not work in non-default collations. What's
more, there doesn't seem to be any such thing as wcstombs_l or
mbstowcs_l (at least my Fedora box hasn't got them), so this can't be
fixed within the available glibc API.
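
To make the problem concrete, here is a minimal sketch of one possible
workaround that the message above does not propose: temporarily switching
the calling thread's locale around the conversion. It assumes the platform
provides the POSIX.1-2008 calls newlocale() and uselocale(). The helper name
mbstowcs_in_locale is hypothetical and is not part of PostgreSQL or glibc.

```c
/* Request POSIX.1-2008 declarations (locale_t, uselocale) on glibc. */
#define _POSIX_C_SOURCE 200809L

#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: behaves like mbstowcs(), but performs the
 * conversion under the locale "loc" instead of the current LC_CTYPE.
 * mbstowcs() itself takes no locale argument, so we swap the calling
 * thread's locale for the duration of the call and then restore it.
 */
size_t
mbstowcs_in_locale(wchar_t *dest, const char *src, size_t n, locale_t loc)
{
    /* uselocale() returns the locale that was in effect before the call */
    locale_t    save = uselocale(loc);
    size_t      result = mbstowcs(dest, src, n);

    /* Restore the previous thread locale before returning */
    uselocale(save);
    return result;
}
```

A caller would build loc once with newlocale(LC_CTYPE_MASK, name, (locale_t) 0)
and release it with freelocale(). Whether this is acceptable depends on how
widely uselocale() is available on the supported platforms, which this sketch
does not address.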

Right at the moment this only affects str_tolower, str_toupper, and
str_initcap; there are other uses of these functions in the text search
code, but those always pass DEFAULT_COLLATION_OID.

It's possible that things are not too broken in practice, because it's
likely that the transformations done by these functions only depend on
the encoding indicated by LC_CTYPE, and we (try to) enforce that all
locales used in a given database match the database encoding. Still,
that's a rather shaky chain of reasoning.

The complete lack of code comments on this doesn't make me any happier
--- in fact, the comments for char2wchar and wchar2char still claim that
they have the same API as mbstowcs and wcstombs respectively, which can
hardly be considered true when they don't even have the same argument
lists.

Any thoughts what to do about this?

regards, tom lane
