Re: Case Insensitive Data Type

Lists: pgsql-general
From: "Russell Black" <russell(dot)black(at)iarchives(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Case Insensitive Data Type
Date: 2002-05-24 16:46:20
Message-ID: 061801c20342$881c8310$0464a8c0@iarchives.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I'm using email addresses as a primary key in one of my tables. Currently, I have to ensure that the email addresses are converted to lower case before they get put into the table, and that all lookups on that field are converted to lower case before the select statement, in order to ensure that Joe(at)Somewhere(dot)com is the same as joe(at)somewhere(dot)com(dot)

Does anyone know of a case-insensitive data type? I'd want the following behavior:

/* Make the primary key be a case-insensitive data type */
CREATE TABLE foo (email CASE_INSENSITIVE_VARCHAR(50) PRIMARY KEY, name VARCHAR(50));

/* Insert a row with a case insensitive key */
INSERT INTO foo VALUES ('joe(at)somewhere(dot)com', 'Joe');
INSERT 24751 1

/* A different case of an existing primary key should fail */
INSERT INTO foo VALUES ('Joe(at)SOMEWHERE(dot)com', 'Joe');
ERROR: Cannot insert a duplicate key into unique index foo_pkey

/* A lookup on a different case of an existing key should be successful: */
SELECT * FROM foo WHERE email = 'Joe(at)SOMEWHERE(dot)com';
email | name
-------------------+------
joe(at)somewhere(dot)com | Joe
(1 row)

Anyone know how I can accomplish this? Can I create a custom data type to do this?

Thanks, Russell


From: "Russell Black" <rblack(at)iarchives(dot)com>
To: "Russell Black" <russell(dot)black(at)iarchives(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 17:04:06
Message-ID: 062601c20345$037262d0$0464a8c0@iarchives.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I found this method in the archives:

CREATE TABLE t1 (f1 varchar(40));
SELECT f1 FROM t1 WHERE lower( f1 ) = 'whatever';
CREATE INDEX t1_f1_idx ON t1 (lower(f1));

This will unfortunately not work, since I'm using CMP EJB's, and I'm not writing the SQL queries myself. The EJB container does it for me, and it wants to do a
SELECT f1 FROM t1 WHERE f1 = 'whatever';
Instead of
SELECT f1 FROM t1 WHERE lower( f1 ) = 'whatever';

This is why I'm hoping to have the database backend take care of this with perhaps a new datatype, so that the '=' operator is case insensitive.

Russell
----- Original Message -----
From: Russell Black
To: pgsql-general(at)postgresql(dot)org
Sent: Friday, May 24, 2002 10:46 AM
Subject: [GENERAL] Case Insensitive Data Type

I'm using email addresses as a primary key in one of my tables. Currently, I have to ensure that the email addresses are converted to lower case before they get put into the table, and that all lookups on that field are converted to lower case before the select statement, in order to ensure that Joe(at)Somewhere(dot)com is the same as joe(at)somewhere(dot)com(dot)

Does anyone know of a case-insensitive data type? I'd want the following behavior:

/* Make the primary key be a case-insensitive data type */
CREATE TABLE foo (email CASE_INSENSITIVE_VARCHAR(50) PRIMARY KEY, name VARCHAR(50));

/* Insert a row with a case insensitive key */
INSERT INTO foo VALUES ('joe(at)somewhere(dot)com', 'Joe');
INSERT 24751 1

/* A different case of an existing primary key should fail */
INSERT INTO foo VALUES ('Joe(at)SOMEWHERE(dot)com', 'Joe');
ERROR: Cannot insert a duplicate key into unique index foo_pkey

/* A lookup on a different case of an existing key should be successful: */
SELECT * FROM foo WHERE email = 'Joe(at)SOMEWHERE(dot)com';
email | name
-------------------+------
joe(at)somewhere(dot)com | Joe
(1 row)

Anyone know how I can accomplish this? Can I create a custom data type to do this?

Thanks, Russell


From: "Russell Black" <russell(dot)black(at)iarchives(dot)com>
To: "Russell Black" <russell(dot)black(at)iarchives(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 17:08:34
Message-ID: 064601c20345$a324f450$0464a8c0@iarchives.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I found this method in the archives:

CREATE TABLE t1 (f1 varchar(40));
CREATE INDEX t1_f1_idx ON t1 (lower(f1));

SELECT * FROM t1 WHERE lower( f1 ) = 'whatever';
This will unfortunately not work, since I'm using CMP EJB's, and I'm not writing the SQL queries myself. The EJB container does it for me, and it wants to do a
SELECT * FROM t1 WHERE f1 = 'whatever';
Instead of
SELECT * FROM t1 WHERE lower( f1 ) = 'whatever';

This is why I'm hoping to have the database backend take care of this with perhaps a new datatype, so that the '=' operator is case insensitive.
Any ideas?

Russell
----- Original Message -----
From: Russell Black
To: pgsql-general(at)postgresql(dot)org
Sent: Friday, May 24, 2002 10:46 AM
Subject: [GENERAL] Case Insensitive Data Type

I'm using email addresses as a primary key in one of my tables. Currently, I have to ensure that the email addresses are converted to lower case before they get put into the table, and that all lookups on that field are converted to lower case before the select statement, in order to ensure that Joe(at)Somewhere(dot)com is the same as joe(at)somewhere(dot)com(dot)

Does anyone know of a case-insensitive data type? I'd want the following behavior:

/* Make the primary key be a case-insensitive data type */
CREATE TABLE foo (email CASE_INSENSITIVE_VARCHAR(50) PRIMARY KEY, name VARCHAR(50));

/* Insert a row with a case insensitive key */
INSERT INTO foo VALUES ('joe(at)somewhere(dot)com', 'Joe');
INSERT 24751 1

/* A different case of an existing primary key should fail */
INSERT INTO foo VALUES ('Joe(at)SOMEWHERE(dot)com', 'Joe');
ERROR: Cannot insert a duplicate key into unique index foo_pkey

/* A lookup on a different case of an existing key should be successful: */
SELECT * FROM foo WHERE email = 'Joe(at)SOMEWHERE(dot)com';
email | name
-------------------+------
joe(at)somewhere(dot)com | Joe
(1 row)

Anyone know how I can accomplish this? Can I create a custom data type to do this?

Thanks, Russell


From: Darren Ferguson <darren(at)crystalballinc(dot)com>
To: Russell Black <russell(dot)black(at)iarchives(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 17:17:38
Message-ID: Pine.LNX.4.10.10205241313070.25796-100000@thread.crystalballinc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Three ways:

One: write trigger on the table.

CREATE OR REPLACE FUNCTION foo_trigger_func() RETURNS OPAQUE AS '
BEGIN
NEW.email := LOWER(NEW.email);
RETURN NEW;
END;' LANGUAGE 'plpgsql';

CREATE TRIGGER foo_trigger BEFORE INSERT OR UPDATE ON foo
FOR EACH ROW EXECUTE PROCEDURE foo_trigger_func();

Two: use the lower function in your update and insert queries

INSERT INTO foo VALUES(LOWER(email),'Name');

Three: Postgresql has an ILIKE operator

You could play with that but not the best solution

HTH

Darren Ferguson

On Fri, 24 May 2002, Russell Black wrote:

> I'm using email addresses as a primary key in one of my tables. Currently, I have to ensure that the email addresses are converted to lower case before they get put into the table, and that all lookups on that field are converted to lower case before the select statement, in order to ensure that Joe(at)Somewhere(dot)com is the same as joe(at)somewhere(dot)com(dot)
>
> Does anyone know of a case-insensitive data type? I'd want the following behavior:
>
> /* Make the primary key be a case-insensitive data type */
> CREATE TABLE foo (email CASE_INSENSITIVE_VARCHAR(50) PRIMARY KEY, name VARCHAR(50));
>
> /* Insert a row with a case insensitive key */
> INSERT INTO foo VALUES ('joe(at)somewhere(dot)com', 'Joe');
> INSERT 24751 1
>
> /* A different case of an existing primary key should fail */
> INSERT INTO foo VALUES ('Joe(at)SOMEWHERE(dot)com', 'Joe');
> ERROR: Cannot insert a duplicate key into unique index foo_pkey
>
> /* A lookup on a different case of an existing key should be successful: */
> SELECT * FROM foo WHERE email = 'Joe(at)SOMEWHERE(dot)com';
> email | name
> -------------------+------
> joe(at)somewhere(dot)com | Joe
> (1 row)
>
> Anyone know how I can accomplish this? Can I create a custom data type to do this?
>
> Thanks, Russell
>
>
>


From: Fran Fabrizio <ffabrizio(at)mmrd(dot)com>
To: Russell Black <russell(dot)black(at)iarchives(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 17:21:43
Message-ID: 3CEE76A7.9060401@mmrd.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

>
> Does anyone know of a case-insensitive data type? I'd want the
> following behavior:

sounds like you want some combination of the built-in lower() function
and a trigger or rule to ensure lower on insert.

-Fran


From: "Russell Black" <russell(dot)black(at)iarchives(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 17:48:56
Message-ID: 068a01c2034b$4717d550$0464a8c0@iarchives.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thanks for your quick reply!

The first option you gave looks promising. Unfortunately, I have no control
over the insert or select statements, since I'm using EJBs with Container
Managed Persistence. The EJB container does the INSERT and UPDATE SQL.
This means options two and three in your suggestions wouldn't work. But it
looks like your first option would ensure that the data is lower case in the
table, which gets me half way there. Now I just need a way to make the =
operator for the email column case-insensitive. So that I could do

INSERT INTO foo VALUES ('abcde');

and selecting on an upper case version would work:

SELECT * FROM foo WHERE (email = 'ABCDE');

Any suggestions?

Thanks

----- Original Message -----

> Three ways:
>
> One: write trigger on the table.
>
> CREATE OR REPLACE FUNCTION foo_trigger_func() RETURNS OPAQUE AS '
> BEGIN
> NEW.email := LOWER(NEW.email);
> RETURN NEW;
> END;' LANGUAGE 'plpgsql';
>
> CREATE TRIGGER foo_trigger BEFORE INSERT OR UPDATE ON foo
> FOR EACH ROW EXECUTE PROCEDURE foo_trigger_func();
>
> Two: use the lower function in your update and insert queries
>
> INSERT INTO foo VALUES(LOWER(email),'Name');
>
> Three: Postgresql has an ILIKE operator
>
> You could play with that but not the best solution
>
> HTH
>
> Darren Ferguson
>
> On Fri, 24 May 2002, Russell Black wrote:
>
> > I'm using email addresses as a primary key in one of my tables.
Currently, I have to ensure that the email addresses are converted to lower
case before they get put into the table, and that all lookups on that field
are converted to lower case before the select statement, in order to ensure
that Joe(at)Somewhere(dot)com is the same as joe(at)somewhere(dot)com(dot)
> >
> > Does anyone know of a case-insensitive data type? I'd want the
following behavior:
> >
> > /* Make the primary key be a case-insensitive data type */
> > CREATE TABLE foo (email CASE_INSENSITIVE_VARCHAR(50) PRIMARY KEY, name
VARCHAR(50));
> >
> > /* Insert a row with a case insensitive key */
> > INSERT INTO foo VALUES ('joe(at)somewhere(dot)com', 'Joe');
> > INSERT 24751 1
> >
> > /* A different case of an existing primary key should fail */
> > INSERT INTO foo VALUES ('Joe(at)SOMEWHERE(dot)com', 'Joe');
> > ERROR: Cannot insert a duplicate key into unique index foo_pkey
> >
> > /* A lookup on a different case of an existing key should be successful:
*/
> > SELECT * FROM foo WHERE email = 'Joe(at)SOMEWHERE(dot)com';
> > email | name
> > -------------------+------
> > joe(at)somewhere(dot)com | Joe
> > (1 row)
> >
> > Anyone know how I can accomplish this? Can I create a custom data type
to do this?
> >
> > Thanks, Russell
> >
> >
> >
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Russell Black" <russell(dot)black(at)iarchives(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 18:47:31
Message-ID: 13443.1022266051@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Russell Black" <russell(dot)black(at)iarchives(dot)com> writes:
> I'm using email addresses as a primary key in one of my tables. Currently,=
> I have to ensure that the email addresses are converted to lower case befo=
> re they get put into the table, and that all lookups on that field are conv=
> erted to lower case before the select statement, in order to ensure that Jo=
> e(at)Somewhere(dot)com is the same as joe(at)somewhere(dot)com(dot)

A partial answer to this is

CREATE UNIQUE INDEX foo_key on foo (lower(email));

which will prevent duplicate entries with the same-up-to-case-folding
address. However you'd still have to query with queries like

SELECT * FROM foo WHERE lower(email) = lower('probe value');

if the probe values you are handed can't be relied on to be downcased
already. (I tried to think of a way to insert the lower() automatically
using a rule, but couldn't come up with anything --- ideas anyone?)

> Does anyone know of a case-insensitive data type?

I do not think you should go away feeling that a case-insensitive data
type would magically solve your problems. Today you might think that
case-folding is what you want, but by no later than next week you would
figure out that there are other sorts of normalizations you'll also need
to do on provided addresses --- one obvious example is stripping leading
and trailing blanks and reducing consecutive blanks to a single blank.
In fact if you really want to look like you know what you're doing,
you'll have to strip out the comment portions of an address entirely.
For example, these are all equivalent forms per RFC specs:
joe(at)blow(dot)com
Joe Blow <joe(at)blow(dot)com>
"Joe Blow" <joe(at)blow(dot)com>
joe(at)blow(dot)com (Joe Blow)
and should be recognized as such by anything that pretends to know
what email addresses are. (For that matter, you do know that the
specs say only the hostname part should be assumed case-insensitive,
don't you?)

So the real bottom line here is that you'd better figure a way to pass
the input strings through an address-normalization function. "lower()"
might do as a first-order approximation but you won't stay with it
forever.

> Can I create a custom data type to do this?

You probably could, but it'd likely be more work than you want. A
minimal implementation would require an input function (which would
apply the normalization rules you want), an output function (for which
you could probably just commandeer textout), and a no-op text() coercion
function (assuming you'd like to do anything besides display the value).
Then you'd have to create a set of index operators and access method
entries, if you intended to make this indexable --- these could all
piggyback on text operators, but you'd still need to do the work of
making the catalog entries for them.

Assuming you did have all that, I think a query like

select * from foo where email = 'Joe Blow <joe(at)blow(dot)com>'

would indeed work the way you want --- the initially untyped string
literal would eventually get coerced to your datatype, and then your
input conversion routine could do the right things to it. But it'll
likely be easier to make it happen on the client side ;-)

regards, tom lane


From: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
To: "Russell Black" <russell(dot)black(at)iarchives(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 19:00:22
Message-ID: 20020524150022.4701cc2b.nconway@klamath.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, 24 May 2002 11:08:34 -0600
"Russell Black" <russell(dot)black(at)iarchives(dot)com> wrote:
> The EJB container does it for me, and it wants to do a
> SELECT * FROM t1 WHERE f1 = 'whatever';
> Instead of
> SELECT * FROM t1 WHERE lower( f1 ) = 'whatever';

You could have it run the SELECT on a view, where the relevant
column is defined using lower() on the appropriate column
of the real table.

Cheers,

Neil

--
Neil Conway <neilconway(at)rogers(dot)com>
PGP Key ID: DB3C29FC


From: Darren Ferguson <darren(at)crystalballinc(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 19:03:06
Message-ID: Pine.LNX.4.10.10205241500460.26467-100000@thread.crystalballinc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom

Just out of interest and i am not sure if this is possible or practical
but:

Could / Would it be very difficult to add ON SELECT triggers to Postgres?

This could help a little.

It would be as a last resort.

SELECT email FROM foo WHERE email = 'ABCDE';

TRIGGER could have a part where you could then lower the ABCDE before the
query was run.

Just a thought

Darren Ferguson

On Fri, 24 May 2002, Tom Lane wrote:

> "Russell Black" <russell(dot)black(at)iarchives(dot)com> writes:
> > I'm using email addresses as a primary key in one of my tables. Currently,=
> > I have to ensure that the email addresses are converted to lower case befo=
> > re they get put into the table, and that all lookups on that field are conv=
> > erted to lower case before the select statement, in order to ensure that Jo=
> > e(at)Somewhere(dot)com is the same as joe(at)somewhere(dot)com(dot)
>
> A partial answer to this is
>
> CREATE UNIQUE INDEX foo_key on foo (lower(email));
>
> which will prevent duplicate entries with the same-up-to-case-folding
> address. However you'd still have to query with queries like
>
> SELECT * FROM foo WHERE lower(email) = lower('probe value');
>
> if the probe values you are handed can't be relied on to be downcased
> already. (I tried to think of a way to insert the lower() automatically
> using a rule, but couldn't come up with anything --- ideas anyone?)
>
>
> > Does anyone know of a case-insensitive data type?
>
> I do not think you should go away feeling that a case-insensitive data
> type would magically solve your problems. Today you might think that
> case-folding is what you want, but by no later than next week you would
> figure out that there are other sorts of normalizations you'll also need
> to do on provided addresses --- one obvious example is stripping leading
> and trailing blanks and reducing consecutive blanks to a single blank.
> In fact if you really want to look like you know what you're doing,
> you'll have to strip out the comment portions of an address entirely.
> For example, these are all equivalent forms per RFC specs:
> joe(at)blow(dot)com
> Joe Blow <joe(at)blow(dot)com>
> "Joe Blow" <joe(at)blow(dot)com>
> joe(at)blow(dot)com (Joe Blow)
> and should be recognized as such by anything that pretends to know
> what email addresses are. (For that matter, you do know that the
> specs say only the hostname part should be assumed case-insensitive,
> don't you?)
>
> So the real bottom line here is that you'd better figure a way to pass
> the input strings through an address-normalization function. "lower()"
> might do as a first-order approximation but you won't stay with it
> forever.
>
> > Can I create a custom data type to do this?
>
> You probably could, but it'd likely be more work than you want. A
> minimal implementation would require an input function (which would
> apply the normalization rules you want), an output function (for which
> you could probably just commandeer textout), and a no-op text() coercion
> function (assuming you'd like to do anything besides display the value).
> Then you'd have to create a set of index operators and access method
> entries, if you intended to make this indexable --- these could all
> piggyback on text operators, but you'd still need to do the work of
> making the catalog entries for them.
>
> Assuming you did have all that, I think a query like
>
> select * from foo where email = 'Joe Blow <joe(at)blow(dot)com>'
>
> would indeed work the way you want --- the initially untyped string
> literal would eventually get coerced to your datatype, and then your
> input conversion routine could do the right things to it. But it'll
> likely be easier to make it happen on the client side ;-)
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>


From: Doug Fields <dfields-pg-general(at)pexicom(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 19:15:51
Message-ID: 5.1.0.14.2.20020524151332.02c4ab70@mail.pexicom.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


>Assuming you did have all that, I think a query like
>
> select * from foo where email = 'Joe Blow <joe(at)blow(dot)com>'
>
>would indeed work the way you want --- the initially untyped string
>literal would eventually get coerced to your datatype, and then your
>input conversion routine could do the right things to it. But it'll
>likely be easier to make it happen on the client side ;-)

Since he is using EJB, he is probably stripping out everything already
except the core SMTP e-mail address (probably using
javax.mail.internet.InternetAddress's parser).

Not that this is in the least bit relevant to PG so I'll just shut up now. :)

Cheers,

Doug


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Doug Fields <dfields-pg-general(at)pexicom(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 19:41:39
Message-ID: 20473.1022269299@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Doug Fields <dfields-pg-general(at)pexicom(dot)com> writes:
> Since he is using EJB, he is probably stripping out everything already
> except the core SMTP e-mail address (probably using
> javax.mail.internet.InternetAddress's parser).

Hm. I'd sort of expect a facility like that to have done all the
appropriate normalization already --- in which case this discussion
becomes moot ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Darren Ferguson <darren(at)crystalballinc(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 20:02:42
Message-ID: 20803.1022270562@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Darren Ferguson <darren(at)crystalballinc(dot)com> writes:
> Could / Would it be very difficult to add ON SELECT triggers to Postgres?

Don't see that it would help you. An ON SELECT trigger would presumably
fire when a row is selected --- but you are trying to alter the
selection conditions, and cause rows to be selected that would *not* be
normally. There is no place for a trigger to fire to make that happen.

Note that it'd be easy enough for a rule to add lower() to the reference
to email --- really all you're doing there is making a view that
replaces the email column with a lowercased version of it. But I don't
see a way for a rule to rewrite the query to lower-case the literal
constant being compared to email. Can anyone see a way to do it?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
Cc: "Russell Black" <russell(dot)black(at)iarchives(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 20:16:40
Message-ID: 20954.1022271400@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Neil Conway <nconway(at)klamath(dot)dyndns(dot)org> writes:
> You could have it run the SELECT on a view, where the relevant
> column is defined using lower() on the appropriate column
> of the real table.

But that only gets him halfway there. He really wants the system to
do a lower() on the provided comparison constant, too. I haven't
thought of a way to get the rule system to insert that.

regards, tom lane


From: grant <grant(at)amadensor(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Darren Ferguson <darren(at)crystalballinc(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 22:02:20
Message-ID: Pine.LNX.4.33.0205241500580.4951-100000@dbms.amadensor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Would it be possible to make a view that simply returns a tolower() of the
field, then have an instead of trigger on the update of the view to
update the real table?

______________________________________________________________________________

Your mouse has moved.
You must restart Windows for your changes to take effect.

#!/usr/bin/perl
print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 22:08:35
Message-ID: 20020524180835.C28484@mail.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, May 24, 2002 at 03:41:39PM -0400, Tom Lane wrote:
> Doug Fields <dfields-pg-general(at)pexicom(dot)com> writes:
> > Since he is using EJB, he is probably stripping out everything already
> > except the core SMTP e-mail address (probably using

> Hm. I'd sort of expect a facility like that to have done all the
> appropriate normalization already --- in which case this discussion

Except maybe EJB accepts the actual spec, and so allows mixed case?

--
----
Andrew Sullivan 87 Mowat Avenue
Liberty RMS Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M6K 3E3
+1 416 646 3304 x110


From: "Peter Darley" <pdarley(at)kinesis-cem(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Russell Black" <russell(dot)black(at)iarchives(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 22:16:17
Message-ID: NNEAICKPNOGDBHNCEDCPIEBCCJAA.pdarley@kinesis-cem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Friends,
I don't want to imply something about things I don't know anything about,
but I'm interested in why nobody has suggested creating a new data type
using CREATE TYPE, setting up it's various functions and creating new
operators for it. It looks (reading the docs) like it would do what Russel
wants without too much work. Is there some reason that folks avoid this
approach?
Thanks,
Peter Darley

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org
[mailto:pgsql-general-owner(at)postgresql(dot)org]On Behalf Of Tom Lane
Sent: Friday, May 24, 2002 11:48 AM
To: Russell Black
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] Case Insensitive Data Type

"Russell Black" <russell(dot)black(at)iarchives(dot)com> writes:
> I'm using email addresses as a primary key in one of my tables.
Currently,=
> I have to ensure that the email addresses are converted to lower case
befo=
> re they get put into the table, and that all lookups on that field are
conv=
> erted to lower case before the select statement, in order to ensure that
Jo=
> e(at)Somewhere(dot)com is the same as joe(at)somewhere(dot)com(dot)

A partial answer to this is

CREATE UNIQUE INDEX foo_key on foo (lower(email));

which will prevent duplicate entries with the same-up-to-case-folding
address. However you'd still have to query with queries like

SELECT * FROM foo WHERE lower(email) = lower('probe value');

if the probe values you are handed can't be relied on to be downcased
already. (I tried to think of a way to insert the lower() automatically
using a rule, but couldn't come up with anything --- ideas anyone?)

> Does anyone know of a case-insensitive data type?

I do not think you should go away feeling that a case-insensitive data
type would magically solve your problems. Today you might think that
case-folding is what you want, but by no later than next week you would
figure out that there are other sorts of normalizations you'll also need
to do on provided addresses --- one obvious example is stripping leading
and trailing blanks and reducing consecutive blanks to a single blank.
In fact if you really want to look like you know what you're doing,
you'll have to strip out the comment portions of an address entirely.
For example, these are all equivalent forms per RFC specs:
joe(at)blow(dot)com
Joe Blow <joe(at)blow(dot)com>
"Joe Blow" <joe(at)blow(dot)com>
joe(at)blow(dot)com (Joe Blow)
and should be recognized as such by anything that pretends to know
what email addresses are. (For that matter, you do know that the
specs say only the hostname part should be assumed case-insensitive,
don't you?)

So the real bottom line here is that you'd better figure a way to pass
the input strings through an address-normalization function. "lower()"
might do as a first-order approximation but you won't stay with it
forever.

> Can I create a custom data type to do this?

You probably could, but it'd likely be more work than you want. A
minimal implementation would require an input function (which would
apply the normalization rules you want), an output function (for which
you could probably just commandeer textout), and a no-op text() coercion
function (assuming you'd like to do anything besides display the value).
Then you'd have to create a set of index operators and access method
entries, if you intended to make this indexable --- these could all
piggyback on text operators, but you'd still need to do the work of
making the catalog entries for them.

Assuming you did have all that, I think a query like

select * from foo where email = 'Joe Blow <joe(at)blow(dot)com>'

would indeed work the way you want --- the initially untyped string
literal would eventually get coerced to your datatype, and then your
input conversion routine could do the right things to it. But it'll
likely be easier to make it happen on the client side ;-)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Darley" <pdarley(at)kinesis-cem(dot)com>
Cc: "Russell Black" <russell(dot)black(at)iarchives(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 22:37:24
Message-ID: 21811.1022279844@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Peter Darley" <pdarley(at)kinesis-cem(dot)com> writes:
> I don't want to imply something about things I don't know anything about,
> but I'm interested in why nobody has suggested creating a new data type
> using CREATE TYPE, setting up it's various functions and creating new
> operators for it. It looks (reading the docs) like it would do what Russel
> wants without too much work. Is there some reason that folks avoid this
> approach?

That's what I was referring to at the end of the message you quoted.
The question is what you define as "too much work" ;-).

Given a C implementation of the required normalization function (I'm
assuming plain lower() won't really cut it), you'd need probably O(100)
lines of additional C code and a few dozen catalog entries to make it
happen. A couple hours work for someone who'd done it before, rather
longer for someone who hadn't ...

regards, tom lane


From: "Russell Black" <russell(dot)black(at)iarchives(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 22:53:43
Message-ID: 06fb01c20375$dac118a0$0464a8c0@iarchives.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I'd be happy with something which was just plain case-insensitive. I don't
really think anyone is going to enter "Their Email <address(at)like(dot)this>"
Trimming whitepsace would be nice, but I could probably live without it.
I'm also comfortable with ignoring the fact that the spec says
Joe(at)somewhere(dot)com is really a different address than joe(at)somewhere(dot)com(dot)
Given that, I would imagine it would be much less than 100 lines of C code,
and could probably be done in plpgsql. Maybe it's still a big deal to
define all the operators for the new datatype, but I wouldn't mind giving it
a try if someone could tell me how to piggy-back on the regular string
operators, and string datatype (probably VARCHAR).

----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Darley" <pdarley(at)kinesis-cem(dot)com>
Cc: "Russell Black" <russell(dot)black(at)iarchives(dot)com>;
<pgsql-general(at)postgresql(dot)org>
Sent: Friday, May 24, 2002 4:37 PM
Subject: Re: [GENERAL] Case Insensitive Data Type

> "Peter Darley" <pdarley(at)kinesis-cem(dot)com> writes:
> > I don't want to imply something about things I don't know anything
about,
> > but I'm interested in why nobody has suggested creating a new data type
> > using CREATE TYPE, setting up it's various functions and creating new
> > operators for it. It looks (reading the docs) like it would do what
Russel
> > wants without too much work. Is there some reason that folks avoid this
> > approach?
>
> That's what I was referring to at the end of the message you quoted.
> The question is what you define as "too much work" ;-).
>
> Given a C implementation of the required normalization function (I'm
> assuming plain lower() won't really cut it), you'd need probably O(100)
> lines of additional C code and a few dozen catalog entries to make it
> happen. A couple hours work for someone who'd done it before, rather
> longer for someone who hadn't ...
>
> regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Russell Black" <russell(dot)black(at)iarchives(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Case Insensitive Data Type
Date: 2002-05-24 23:08:22
Message-ID: 22024.1022281702@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Russell Black" <russell(dot)black(at)iarchives(dot)com> writes:
> I'd be happy with something which was just plain case-insensitive.
> Given that, I would imagine it would be much less than 100 lines of C code,
> and could probably be done in plpgsql.

You'll still have to get your hands dirty with C. AFAIK there isn't any
way to define I/O functions for new datatypes in any of the PL languages
we have.

regards, tom lane