Re: PATCH: CITEXT 2.0

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: PATCH: CITEXT 2.0
Date: 2008-07-07 20:15:03
Message-ID: 8E2D49F2-E366-4504-9428-0AB6F35468FA@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul 7, 2008, at 12:46, Zdenek Kotala wrote:

>> So, the upshot is that the = and <> operators are not locale-aware,
>> yes? They just do byte comparisons. Is that really the way it
>> should be? I mean, could there not be strings that are equivalent
>> but have different bytes?
>
> Correct. The problem is complex. It works fine only for normalized
> string. But postgres now assume that all utf8 strings are normalized.

I see. So binary equivalence is okay, in that case.

> If you need to implement < <= >= > operators you need to use strcol
> which take care of locale collation.

Which varstr_cmp() does, I guess. It's what textlt uses, for example.

> See unicode collation algorithm http://www.unicode.org/reports/tr10/

Wow, that looks like a fun read.

Best,

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2008-07-07 20:59:16 Re: PATCH: CITEXT 2.0
Previous Message David E. Wheeler 2008-07-07 20:13:15 Re: PATCH: CITEXT 2.0