Re: Select all invalid e-mail addresses

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Andrus <eetasoft(at)online(dot)ee>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Select all invalid e-mail addresses
Date: 2005-10-24 21:07:11
Message-ID: 20051024210711.GA25991@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Oct 24, 2005 at 09:02:26PM +0300, Andrus wrote:
> I applied Michael hint about dollar quoting to this and tried
>
> create temp table customer ( email char(60));
> insert into customer values( 'steve(at)blighty(dot)com');
> SELECT email FROM customer WHERE email !~*
> $$
> ^[^(at)]*@(?:[^(at)]*\(dot))?[a-z0-9_-]+\.(?:a[defgilmnoqrstuwz]|b[abdefghijmnorstvwyz]|c[acdfghiklmnoruvxyz]|d[ejkmoz]|e[ceghrst]|f[ijkmorx]|g[abdefhilmnpqrstuwy]|h[kmnrtu]|i[delnoqrst]|j[mop]|k[eghimnprwyz]|l[abcikrstuvy]|m[acdghklmnopqrstuvwxyz]|n[acefgilopruz]|om|p[aefghklmnrtwy]|qa|r[eouw]|s[abcdeghijklmnortvyz]|t[cdfghjkmnoprtvwz]|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]|edu|com|net|org|gov|mil|info|biz|coop|museum|aero|name|pro|mobi|arpa)$
> $$
>
> but this classifies e-mail address steve(at)blighty(dot)com as invalid (select
> returns it). The same result are without dollar quoting, using your original
> select.

There are at least two problems:

1. Since you're storing the email address as char(60), in some cases
it'll be padded with spaces up to 60 characters. This appears to
be one of those cases:

SELECT 'foo'::char(60) ~ '^foo$';
?column?
----------
f
(1 row)

test=> SELECT 'foo'::char(60) ~ '^foo {57}$';
?column?
----------
t
(1 row)

2. Everything in the quoted string is part of the regular expression,
including the embedded newlines immediately after the open quote and
before the close quote.

test=> SELECT 'foo'::text ~ $$
test$> ^foo$
test$> $$;
?column?
----------
f
(1 row)

test=> SELECT 'foo'::text ~ $_$^foo$$_$;
?column?
----------
t
(1 row)

Note the need to quote with something other than $$ ($_$ in this case)
because of the $ that's part of the regular expression. Otherwise
you'd get this:

test=> SELECT 'foo'::text ~ $$^foo$$$;
ERROR: syntax error at or near "$" at character 30
LINE 1: SELECT 'foo'::text ~ $$^foo$$$;
^

Suggestions: use text or varchar for the email address, don't embed
newlines in the regular expression, and if you use dollar quotes
and the regular expression ends with a dollar sign then quote with
a character sequence other than $$.

--
Michael Fuhr

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Martijn van Oosterhout 2005-10-24 21:17:54 Re: is there a function which elminates spaces?
Previous Message Jeffrey Melloy 2005-10-24 20:50:33 Re: is there a function which elminates spaces?