Re: like/ilike improvements


  • From: Andrew Dunstan <andrew(at)dunslane(dot)net>
  • To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
  • Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
  • Subject: Re: like/ilike improvements
  • Date: Thu, 24 May 2007 23:21:35 -0400
  • Message-id: <4656563F(dot)50608(at)dunslane(dot)net>



Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> Tom Lane wrote:
>>> You have to be on a first byte before you can meaningfully apply
>>> NextChar, and you have to use NextChar or else you don't count
>>> characters correctly (eg "__" must match 2 chars not 2 bytes).
>>
>> Yes, I agree completely. However it looks to me like IsFirstByte will in fact always be true when we get to call NextChar for matching "_" for UTF8.
>
> If that's true, the patch is failing to achieve its goal of treating %
> bytewise ...


Let's back up. % processing works by looking for a place in the text that might match what follows % in the pattern, and then calling itself recursively. For UTF8, if what follows % is _, it does that search by repeatedly calling NextChar; otherwise it calls NextByte. But if we're not processing a wildcard we have to match an actual complete UTF8 char, so the fact that we proceed byte-wise won't get us out of sync: we will be on a first byte whenever we happen to encounter an _. We can't rely on that process for other multi-byte charsets because the suffix of one char might be the prefix of another, so we could get false matches. That can't happen with UTF8.
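
To make the argument above concrete, here is a minimal sketch in C. It is not the patch itself: is_first_byte and next_char are hypothetical stand-ins for IsFirstByte and NextChar, like_utf8 and like_cstr are names invented for illustration, and the matcher assumes valid UTF8 input and handles only % and _ (no escapes, no case folding). After %, it steps by character only when the next pattern element is _, and by byte otherwise. A literal pattern character always starts with a first byte, which can never equal a UTF8 continuation byte, so a byte-wise probe that lands in the middle of a character simply fails to match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A UTF8 continuation byte has the bit pattern 10xxxxxx. */
static bool is_first_byte(unsigned char c)
{
    return (c & 0xC0) != 0x80;
}

/* Advance past one whole character: one byte plus any continuation bytes. */
static const char *next_char(const char *t, const char *tend)
{
    t++;
    while (t < tend && !is_first_byte((unsigned char) *t))
        t++;
    return t;
}

/* Hypothetical simplified matcher: '%' and '_' only, valid UTF8 assumed. */
static bool like_utf8(const char *t, size_t tlen, const char *p, size_t plen)
{
    const char *tend = t + tlen;
    const char *pend = p + plen;

    assert(t != NULL && p != NULL);

    while (p < pend)
    {
        if (*p == '%')
        {
            /* Collapse a run of % into one. */
            while (p < pend && *p == '%')
                p++;
            if (p == pend)
                return true;    /* trailing % matches any remainder */

            /* Step by character only if '_' follows; otherwise by byte. */
            bool by_char = (*p == '_');

            for (;;)
            {
                if (like_utf8(t, (size_t) (tend - t), p, (size_t) (pend - p)))
                    return true;
                if (t == tend)
                    return false;
                t = by_char ? next_char(t, tend) : t + 1;
            }
        }
        else if (*p == '_')
        {
            /* '_' consumes exactly one character, not one byte. */
            if (t == tend)
                return false;
            t = next_char(t, tend);
            p++;
        }
        else
        {
            /* Literal: compare byte by byte. */
            if (t == tend || *t != *p)
                return false;
            t++;
            p++;
        }
    }
    return t == tend;
}

/* Convenience wrapper for NUL-terminated strings. */
static bool like_cstr(const char *text, const char *pattern)
{
    return like_utf8(text, strlen(text), pattern, strlen(pattern));
}
```

For example, with the text "h\xc3\xa9llo" (the second character is two bytes long), like_cstr accepts the patterns "h_llo" and "%_llo" and rejects "h__llo", which is the "2 chars not 2 bytes" behaviour discussed upthread.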

cheers

andrew


