
Re: LIKE optimization in UTF-8 and locale-C

From:	ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To:	Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: LIKE optimization in UTF-8 and locale-C
Date:	2007-03-23 05:45:47
Message-ID:	20070323142444.6368.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Lists:	pgsql-hackers pgsql-patches

Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org> wrote:

> The problem with the LIKE pattern _ is that it has to know how long the
> single character is that it should pass over. Say you have a UTF-8 string
> with 2 characters encoded in 3 bytes ('ÖA'), where the first character
> is 2 bytes:
>
> 0xC3 0x96 'A'
>
> and now you want to match that with the LIKE pattern:
>
> '_A'

Thanks, that all makes sense to me now. My proposal was completely wrong.
Optimizing MBMatchText() seems to be the right way...
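
To make the '_' problem above concrete, here is a minimal sketch in C. It is
not the MBMatchText() code; utf8_char_len() and like_underscore() are
hypothetical names used only for illustration, the matcher handles just '_'
and literal characters (no '%', no escape), and it assumes well-formed UTF-8
input. The only point is that '_' must advance by one character, which may be
more than one byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 character whose lead byte
 * is b. Invalid lead bytes are treated as one byte long. */
static size_t utf8_char_len(unsigned char b)
{
    if (b < 0x80)
        return 1;
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    if ((b & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/* Hypothetical, simplified LIKE: only '_' and literal characters.
 * With multibyte = false every character is assumed to be one byte,
 * which is what a single-byte matcher would do. */
static bool like_underscore(const char *s, size_t slen,
                            const char *p, size_t plen,
                            bool multibyte)
{
    size_t si = 0;
    size_t pi = 0;

    while (pi < plen)
    {
        if (si >= slen)
            return false;       /* pattern left over, string exhausted */

        if (p[pi] == '_')
        {
            /* Skip one whole character of the string, not one byte. */
            size_t step = multibyte ? utf8_char_len((unsigned char) s[si]) : 1;

            if (step > slen - si)
                return false;   /* truncated character */
            si += step;
            pi += 1;
        }
        else
        {
            /* Compare one literal character byte for byte. */
            size_t n = multibyte ? utf8_char_len((unsigned char) p[pi]) : 1;

            if (n > plen - pi || n > slen - si)
                return false;
            if (memcmp(s + si, p + pi, n) != 0)
                return false;
            si += n;
            pi += n;
        }
    }
    return si == slen;          /* both must be consumed completely */
}
```

With the string 0xC3 0x96 'A' (3 bytes) and the pattern '_A', the
multibyte-aware call skips both bytes of 'Ö' and then matches 'A'. The
single-byte call skips only 0xC3, compares 0x96 against 'A', and fails.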

> Maybe one should simply write a special version of LIKE for the UTF-8
> encoding since it's probably the most used encoding today. But I don't
> think you can use the C locale and that it would work for UTF-8.

But then, the present LIKE matching is not locale-aware. We treat multi-byte
characters properly, but always perform a character-by-character comparison.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

In response to

Re: LIKE optimization in UTF-8 and locale-C at 2007-03-23 05:17:26 from Dennis Bjorklund

Responses

Re: LIKE optimization in UTF-8 and locale-C at 2007-03-23 06:10:39 from Andrew - Supernews
