Re: is this a bug or I am blind?

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mage <mage(at)mage(dot)hu>, pgsql-general(at)postgreSQL(dot)org
Subject: Re: is this a bug or I am blind?
Date: 2005-12-16 17:54:15
Message-ID: 20051216175411.GA11985@svana.org
Lists: pgsql-general

On Fri, Dec 16, 2005 at 12:12:08PM -0500, Tom Lane wrote:
> Perhaps the fast-path check is a bad idea, but fixing this is not just
> a matter of removing that. If we subscribe to strcoll's worldview then
> we have to conclude that *text strings are not hashable*, because
> strings that should be "equal" may have different hash codes. And at
> least in the current PG code, that's not something we can flip on and off
> depending on the locale --- texteq would have to be marked non hashable
> in the system catalogs, meaning a big performance hit for *everybody*
> even if their locale is not this weird.

That's true, in the sense that unconverted strings are not hashable.
This is what strxfrm() was created for: it returns the sort key for a
string, and two strings that strcoll() considers equal get the same key.
A quick C program (attached) demonstrates that in hu_HU these two
strings are indeed equal, whereas in en_AU they are not.

$ LC_ALL=hu_HU ./strxfrm potyty potty
String 1: potyty
Strxfrm 1: " ((\x01\x02\x02\x02\x02\x01\x02\x02\x02\x02
String 2: potty
Strxfrm 2: " ((\x01\x02\x02\x02\x02\x01\x02\x02\x02\x02
$ LC_ALL=en_AU ./strxfrm potyty potty
String 1: potyty
Strxfrm 1: \x1B\x1A\x1F$\x1F$\x01\x02\x02\x02\x02\x02\x02\x01\x02\x02\x02\x02\x02\x02
String 2: potty
Strxfrm 2: \x1B\x1A\x1F\x1F$\x01\x02\x02\x02\x02\x02\x01\x02\x02\x02\x02\x02
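
For reference, a minimal sketch of a program that produces output in this
shape. It is not the attached strxfrm.c; the names sort_key, print_key and
show_pair are made up for illustration, and the actual key bytes depend on
the libc version and on which locales are installed.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd strxfrm() sort key for s under the current
 * LC_COLLATE setting, or NULL on allocation failure. Caller frees. */
char *sort_key(const char *s)
{
    /* With n == 0, strxfrm() only reports the length it would need. */
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *key = malloc(need);

    if (key != NULL)
        strxfrm(key, s, need);
    return key;
}

/* Print a key, escaping non-printable bytes as \xNN. */
void print_key(const char *label, const char *key)
{
    printf("%s: ", label);
    for (const unsigned char *p = (const unsigned char *) key; *p; p++) {
        if (*p >= 0x20 && *p < 0x7f)
            putchar(*p);
        else
            printf("\\x%02X", *p);
    }
    putchar('\n');
}

/* Switch collation to the named locale, print both strings with their
 * keys, and return 1 if the keys are identical, 0 if they differ,
 * -1 if the locale is unavailable or memory runs out. */
int show_pair(const char *locale, const char *a, const char *b)
{
    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1;

    char *ka = sort_key(a);
    char *kb = sort_key(b);
    int same = -1;

    if (ka != NULL && kb != NULL) {
        printf("String 1: %s\n", a);
        print_key("Strxfrm 1", ka);
        printf("String 2: %s\n", b);
        print_key("Strxfrm 2", kb);
        same = (strcmp(ka, kb) == 0);
    }
    free(ka);
    free(kb);
    return same;
}
```

Calling show_pair("hu_HU", "potyty", "potty") from a main() would
correspond to the first run above, provided the hu_HU locale is installed.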

I think the only way to make indexes properly locale-sensitive would be
to either use strcoll() in all cases, or store the result of
strxfrm() in the index. Anything else will break somewhere.
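
To illustrate the strxfrm() option for the hashing case, here is a rough
sketch that hashes the sort key instead of the raw bytes, so that strings
strcoll() treats as equal also hash equally. This is not PostgreSQL code:
collation_hash and collation_equal are hypothetical names, and FNV-1a is
only a stand-in for whatever hash function the backend would really use.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* FNV-1a over a NUL-terminated byte string (stand-in hash function). */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;

    for (const unsigned char *p = (const unsigned char *) s; *p; p++) {
        h ^= *p;
        h *= 16777619u;
    }
    return h;
}

/* Hash the strxfrm() key of s under the current LC_COLLATE setting.
 * Stores the hash in *out and returns 0, or returns -1 if out of memory. */
int collation_hash(const char *s, uint32_t *out)
{
    size_t need = strxfrm(NULL, s, 0) + 1;   /* length query, plus NUL */
    char *key = malloc(need);

    if (key == NULL)
        return -1;
    strxfrm(key, s, need);
    *out = fnv1a(key);
    free(key);
    return 0;
}

/* Equality that agrees with collation_hash(): no bytewise fast path,
 * just the collation order's own notion of "equal". */
int collation_equal(const char *a, const char *b)
{
    return strcoll(a, b) == 0;
}
```

The cost is an extra transformation and allocation per hashed string,
which is the kind of overhead that would hit every locale.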

In any case, we first need to determine which answer is correct, before
we run off trying to fix it.

This is Glibc 2.3.2 on a Debian Linux system.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment: strxfrm.c (text/x-csrc, 508 bytes)
