Re: UTF-8 and LIKE vs =


  • From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
  • To: barwick(at)gmail(dot)com
  • Cc: twanger(at)bluetwanger(dot)de, david(at)kineticode(dot)com, pgsql-general(at)postgresql(dot)org
  • Subject: Re: UTF-8 and LIKE vs =
  • Date: Tue, 24 Aug 2004 09:22:28 +0900 (JST)
  • Message-id: <20040824.092228.15271660.t-ishii@sra.co.jp>

> > 
> > On Mon, 23.08.2004, at 23:04, David Wheeler wrote:
> > > On Aug 23, 2004, at 1:58 PM, Ian Barwick wrote:
> > >
> > > > er, the characters in "name" don't seem to match the characters in the
> > > > query - 'êëë' vs. 'ëíì' - does that have any bearing?
> > >
> > > Yes, it means that = is doing the wrong thing!!
> > 
> > The collation rules of your (and my) locale say that these strings are
> > the same:
> > 
> > [markus@teetnang markus]$ cat > t
> > êëë
> > ëíì
> > [markus@teetnang markus]$ uniq t
> > êëë
> > [markus@teetnang markus]$
> 
> wild speculation in need of a Korean speaker, but:
> 
> ian@linux:~/tmp> cat j.txt
> ããã
> íêì
> ìêì
> ìëì
> êëë
> ëíì
> ããã
> ian@linux:~/tmp> uniq j.txt
> ããã
> íêì
> ããã
> 
> All but the first and last lines are random Korean (Hangul)
> characters. Evidently our respective locales think all Hangul strings
> of the same length are identical, which is very probably not the
> case...

Locales for multibyte encodings are often broken on many platforms. I
see the same thing with Japanese on Red Hat. This is one of the
reasons why I tell Japanese PostgreSQL users not to enable locale
support when running initdb...
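A minimal Python sketch of the mechanism discussed in this thread, under stated assumptions: uniq(1) drops a line when it compares equal to the line before it, and, per Markus's explanation above, "equal" is decided by the locale's collation, so a locale whose strcoll() returns 0 for distinct strings makes distinct lines (and, in the original report, distinct values under =) look identical. The function broken_collation_equal is a hypothetical stand-in for the misbehaving locale (it treats any two Hangul strings of the same length as equal, mirroring Ian's observation); it does not model how any real locale is implemented. The names uniq, collation_equal and exact_equal are illustrative, not an existing API.

```python
import locale
from typing import Callable, Iterable, List


def uniq(lines: Iterable[str], same: Callable[[str, str], bool]) -> List[str]:
    """Drop each line that `same` reports equal to the previously kept line,
    keeping the first line of every run (modelled on uniq(1))."""
    kept: List[str] = []
    for line in lines:
        if kept and same(kept[-1], line):
            continue  # treated as a duplicate of the line kept before it
        kept.append(line)
    return kept


def collation_equal(a: str, b: str) -> bool:
    # Equality as the current LC_COLLATE locale sees it: strcoll() == 0.
    return locale.strcoll(a, b) == 0


def exact_equal(a: str, b: str) -> bool:
    # Code-point-for-code-point equality, independent of any locale.
    return a == b


def _is_hangul(s: str) -> bool:
    # True if every character is a precomposed Hangul syllable (U+AC00..U+D7A3).
    return all("\uac00" <= ch <= "\ud7a3" for ch in s)


def broken_collation_equal(a: str, b: str) -> bool:
    # HYPOTHETICAL stand-in for the broken locale reported in this thread:
    # any two Hangul strings of the same length compare as equal.
    if _is_hangul(a) and _is_hangul(b) and len(a) == len(b):
        return True
    return a == b


lines = ["한국어", "데이터", "베이스"]  # three distinct Hangul strings
print(uniq(lines, broken_collation_equal))  # ['한국어']
print(uniq(lines, exact_equal))             # ['한국어', '데이터', '베이스']

# In the C locale, strcoll() compares like strcmp(), so distinct strings
# never collate as equal.
locale.setlocale(locale.LC_COLLATE, "C")
print(uniq(lines, collation_equal))         # ['한국어', '데이터', '베이스']
```

The C locale is what initdb falls back to when locale support is not enabled (in current releases, `initdb --no-locale`, which is equivalent to `--locale=C`). The trade-off behind the advice above is that you lose language-aware sort order but never get false matches between distinct strings.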
--
Tatsuo Ishii

