Re: LIKE optimization in UTF-8 and locale-C

From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hannu Krosing <hannu(at)skype(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: LIKE optimization in UTF-8 and locale-C
Date: 2007-03-23 05:17:26
Message-ID: 460362E6.2040208@zigo.dhs.org
Lists: pgsql-hackers pgsql-patches

ITAGAKI Takahiro wrote:

>> I guess it works well for % but not for _; the latter has to know how
>> many bytes the current (multibyte) character covers.
>
> Yes, % never appears as a trailing byte in any encoding, but _ does
> in some of them. I think we can use the optimization for all
> of the server encodings except JOHAB.

The problem with the LIKE pattern _ is that it has to know how long the
single character it should skip over is. Say you have a UTF-8 string of
2 characters encoded in 3 bytes ('ÖA'), where the first character
takes 2 bytes:

0xC3 0x96 'A'

and now you want to match that with the LIKE pattern:

'_A'

How would that work in the C locale?
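
To make the question concrete, here is a minimal sketch of a matcher that
treats the data as single-byte text, so _ always skips exactly one byte.
It is not PostgreSQL's code; the function name like_bytewise is
hypothetical and only illustrates the byte-at-a-time behaviour I am
worried about.

```python
def like_bytewise(text: bytes, pattern: bytes) -> bool:
    """Hypothetical single-byte LIKE: '_' skips one byte, '%' any run of bytes."""
    if not pattern:
        return not text
    head, rest = pattern[0:1], pattern[1:]
    if head == b"%":
        # '%' matches zero or more bytes: try every possible split point.
        return any(like_bytewise(text[i:], rest) for i in range(len(text) + 1))
    if not text:
        return False
    if head == b"_" or head == text[0:1]:
        # '_' (or an equal literal byte) consumes exactly one byte.
        return like_bytewise(text[1:], rest)
    return False


text = "\u00d6A".encode("utf-8")      # the bytes 0xC3 0x96 'A'
print(like_bytewise(text, b"_A"))     # False: '_' ate only 0xC3
print(like_bytewise(text, b"__A"))    # True: two '_' for one character
print(like_bytewise(text, b"%A"))     # True: '%' is unaffected
```

So % still gives the right answer, but '_A' fails on a string that is
clearly one character followed by 'A'.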

Maybe one should simply write a special version of LIKE for the UTF-8
encoding, since it's probably the most used encoding today. But I don't
think you can just use the C locale and expect it to work for UTF-8.

/Dennis
