Re: sortsupport for text

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: sortsupport for text
Date: 2012-06-20 10:00:13
Message-ID: 1340186413.26286.35.camel@vanquo.pezone.net
Lists: pgsql-hackers

On Sun, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote:
> So if you take the word "Aßlar" here - that is equivalent to "Asslar",
> and so strcoll("Aßlar", "Asslar") will return 0 if you have the right
> LC_COLLATE

This is not actually correct. glibc will sort Asslar before Aßlar, and
that is correct in my mind.

When a Wikipedia page on some particular language's alphabet says
something like "$letterA and $letterB are equivalent", what it really
means is that they are sorted the same compared to other letters, but
are distinct when ties are broken.
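
To illustrate the distinction, here is a toy sketch in Python. It is not
glibc's algorithm: primary_key and sort_key are hypothetical helpers, and
the assumption that ß behaves like "ss" at the first level, with ties
broken by code point, is a simplification made for this example only.

```python
# Toy two-level comparison. This is NOT how glibc implements strcoll();
# it only models "same at the primary level, distinct on tie-break".

def primary_key(s: str) -> str:
    # Assumption for illustration: at the primary level, "ß" sorts like "ss".
    return s.replace("ß", "ss")

def sort_key(s: str) -> tuple[str, str]:
    # The primary key decides the order relative to other words;
    # the raw string only breaks ties between primary-equal words.
    return (primary_key(s), s)

words = ["Aßlar", "Astat", "Asslar", "Asien"]

result = sorted(words, key=sort_key)   # two-level order
naive = sorted(words)                  # plain code point order, for contrast

print(result)
print(naive)
```

In this model Aßlar and Asslar land next to each other, ahead of Astat,
yet they still compare as different strings, with Asslar first. Plain
code point order instead pushes Aßlar past Astat.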

> (if you tried this out for yourself and found that I was
> actually lying through my teeth, pretend I said Hungarian instead of
> German and "some really obscure character" rather than ß).

Yeah, there are obviously exceptions, which led to the original change
being made, but they are not as widespread as they appear to be.

The real issue in this area, I suspect, will be dealing with Unicode
combining sequences versus equivalent precombined characters. But
support for that is generally crappy, so it's not urgent to deal with
it.
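
For anyone unfamiliar with the combining-sequence problem, a minimal
sketch using Python's standard unicodedata module. It only shows that the
two spellings differ as raw strings and agree after NFC normalization; it
says nothing about what strcoll() or PostgreSQL does with them.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as a single precombined code point
combining = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# The two render identically but are different code point sequences.
same_raw = precomposed == combining

# After normalizing both to NFC they become the same string.
same_nfc = (unicodedata.normalize("NFC", precomposed)
            == unicodedata.normalize("NFC", combining))

print(len(precomposed), len(combining), same_raw, same_nfc)
```

A comparison that works on the raw code points sees two different
strings here, which is where the trouble would come from.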
