Re: locale

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgman(at)candle(dot)pha(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: locale
Date: 2004-04-08 15:47:25
Message-ID: 14589.1081439245@sss.pgh.pa.us
Lists: pgsql-hackers

Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> writes:
> On Thu, 8 Apr 2004, Tom Lane wrote:
>> No, the ordering *will* be the same as it was before, because strcoll()
>> is still functioning the same. You'd get the same answer from a sort
>> operation since it depends on the same operators.

> But, now when we compare these strings as latin1 strings
> it's no longer the case that c3 84 72 61 > c3 85 6b 65. As latin1 strings
> we compare each character and c3 = c3, and then 84 < 85 (in latin1 84
> and 85 are some control characters).

You're missing the point: strcoll() is not going to compare them as
latin1 strings. It's going to interpret the bytes as utf-8 strings,
because that's what LC_CTYPE will tell it to do. So the sort ordering
of any particular byte string remains the same as it was before, and
the index does not become corrupt.
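
A minimal Python sketch of the distinction, using the two byte strings from the quoted example. It does not call strcoll() at all; it only shows that the same bytes become different character strings depending on the encoding assumed for them, and that a plain byte-by-byte comparison is a third thing again. The helper name `interpretations` is made up for this illustration.

```python
# The two byte strings from the quoted example.
a = bytes.fromhex("c3847261")
b = bytes.fromhex("c3856b65")


def interpretations(raw):
    """Return the same bytes decoded as UTF-8 and as Latin-1 (hypothetical helper)."""
    return raw.decode("utf-8"), raw.decode("latin-1")


a_utf8, a_latin1 = interpretations(a)
b_utf8, b_latin1 = interpretations(b)

# As UTF-8, each string is three characters: c3 84 and c3 85 are single
# two-byte characters (U+00C4 and U+00C5).
print(ascii(a_utf8), ascii(b_utf8))

# As Latin-1, each string is four characters: c3 is one character and
# 84 / 85 are C1 control characters, as the quoted message says.
print(ascii(a_latin1), ascii(b_latin1))

# A raw byte comparison looks only at the byte values (84 < 85) and knows
# nothing about either encoding or about any locale's collation rules.
bytewise_a_first = a < b
print(bytewise_a_first)
```

Which of these readings a collation routine uses is decided by the locale it runs under, not by the label attached to the database.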

Whether the index is delivering answers that you find useful is a whole
different question ;-). For example, if you do a "WHERE col = 'foo'"
type of query, you'll be presenting the latin1 encoding of 'foo', which
may well not equal the utf-8 encoding of 'foo', meaning you won't find
that row even if it exists. However, this would be true whether you used
the index or not --- it's really a data failure and not an index failure.
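
A rough sketch of that lookup failure in Python, assuming the stored values were written as utf-8 and the query literal arrives encoded as latin1. The names `stored_rows` and `find` are hypothetical, and a list membership test stands in for the equality comparison; no index is involved.

```python
# Rows as they sit on disk, written by a client that used UTF-8.
stored_rows = ["foo".encode("utf-8"), "\u00c4ra".encode("utf-8")]


def find(literal, client_encoding):
    """Encode the query literal the way the client would, then compare bytes."""
    probe = literal.encode(client_encoding)
    return probe in stored_rows


# Pure-ASCII text has the same bytes in both encodings, so it is still found.
ascii_found = find("foo", "latin-1")

# A non-ASCII literal encodes to different bytes (c4 72 61 instead of
# c3 84 72 61), so the row is missed even though it exists.
latin1_found = find("\u00c4ra", "latin-1")

# The same literal sent as UTF-8 matches the stored bytes.
utf8_found = find("\u00c4ra", "utf-8")

print(ascii_found, latin1_found, utf8_found)
```

The miss happens in the byte comparison itself, which is why it shows up with or without an index.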

> a) What have we gained by copying this table into the latin1 database?
> It looks broken to me.

It looks broken to me too, in terms of user functionality. I was simply
responding to your assertion that the indexes will be corrupt. They
won't be.

AFAICS, to support per-database encoding and locale correctly, CREATE
DATABASE would have to be prepared to re-encode *and* re-index every
textual column in the copied database. I don't really foresee us going
to that much work in order to have a solution that's still half-baked
and non-spec-compliant. It's much more likely that per-column locale
and encoding will get done instead.

regards, tom lane

In response to

  • Re: locale at 2004-04-08 15:31:59 from Dennis Bjorklund

Responses

  • Re: locale at 2004-04-08 19:19:32 from Dennis Bjorklund
