Re: PATCH: CITEXT 2.0 v4

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Michael Paesold <mpaesold(at)gmx(dot)at>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: CITEXT 2.0 v4
Date: 2008-07-22 20:54:20
Message-ID: 65071C7E-0CE0-489C-B294-D1C6DAD49D16@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul 18, 2008, at 01:39, Michael Paesold wrote:

> Calling regex functions with the case-insensitivity option would be
> great. It should also be possible to rewrite replace() into
> regexp_replace() by first escaping the regex meta characters.
>
> Actually re-implementing those functions in a case insensitive way
> would still be an option, but of course some amount of work. The
> question is, how much use case there is.

I've figured out how to make all the functions work using SQL function
workarounds, converting things and re-dispatching to the text versions
as appropriate. They work quite well, and can be converted to C later
if that becomes a requirement.

Meanwhile, on the question of whether or not regular expression and
LIKE comparisons *should* match case-insensitively, I have a couple
more observations:

* Thinking about how a true case-insensitive collation would work, I'm
quite certain that it would match case-insensitively. Anything else
would just be unexpected, because in a case-insensitive collation,
lowercase characters are, in practice, identical to uppercase
characters. As far as matching is concerned, there is no difference
between them. So the matching operators and functions against CITEXT
should follow that assumption.

* I tried a few matches on MySQL, where the collation is case-
insensitive by default, and it confirms my impression:

mysql> select 'Foo' regexp 'o$';
+-------------------+
| 'Foo' regexp 'o$' |
+-------------------+
| 1 |
+-------------------+
1 row in set (0.00 sec)

mysql> select 'Foo' regexp 'O$';
+-------------------+
| 'Foo' regexp 'O$' |
+-------------------+
| 1 |
+-------------------+
1 row in set (0.00 sec)

mysql> select 'Foo' like '%o';
+-----------------+
| 'Foo' like '%o' |
+-----------------+
| 1 |
+-----------------+
1 row in set (0.00 sec)

mysql> select 'Foo' like '%O';
+-----------------+
| 'Foo' like '%O' |
+-----------------+
| 1 |
+-----------------+
1 row in set (0.00 sec)

I'll grant that MySQL may not be the best model for how things should
work, but it's something, at least. Anyone else got access to another
database with case-insensitive collations to see how LIKE and regular
expressions work?

Thanks,

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2008-07-22 21:06:09 Re: Do we really want to migrate plproxy and citext into PG core distribution?
Previous Message Tom Lane 2008-07-22 20:49:35 Re: Postgres-R: primary key patches