Re: Select all invalid e-mail addresses

From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Select all invalid e-mail addresses
Date: 2005-10-25 16:54:00
Message-ID: 20051025165400.GB21613@gp.word-to-the-wise.com
Lists: pgsql-general

On Tue, Oct 25, 2005 at 09:09:44AM -0600, Michael Fuhr wrote:
> On Tue, Oct 25, 2005 at 11:20:53AM +0300, Andrus wrote:
> > This regex allows email addresses containing two dots without any letters,
> > like eeta(dot)(dot)soft(at)online(dot)ee
> > I haven't seen any email of such kind.
>
> That's because the regular expression is wrong: it simply checks
> the local part for zero or more non-@ characters instead of checking
> against the RFC822/RFC2822 specification. Use a search engine to
> find a more complete regular expression (beware: it's long).

eeta(dot)(dot)soft(at)online(dot)ee is a perfectly functional email address, even
though its local part is not in dot-atom form and so technically
violates RFC 2822. There are few constraints on the local part of an
email address, and those that exist are often violated in practice
without causing any problems.

I do data analysis on email addresses all day, every day. I'm fully
aware of RFC 2822 constraints, and I'm also aware that the correlation
between them and the real world is high, but not absolute.

If you were using this to validate email software that would be a
different thing, but if you're actually working in the real world with
real world data and are actually concerned about finding email
addresses that are likely to be incorrect (rather than punishing users
with non-RFC 2822 compliant email addresses) then looking at the
local-part in much detail is really not useful.
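
As a rough illustration of that approach, here is a minimal Python
sketch that flags addresses likely to be mistyped while deliberately
ignoring the dot-atom rules for the local part. The function name
looks_incorrect and every specific check in it are assumptions made for
this example, not something taken from the thread; it is a heuristic,
not an RFC 2822 validator, and it will also flag rare legal forms such
as quoted local parts containing "@" or domain literals.

```python
import re

# Hypothetical helper for illustration only: a heuristic, not an RFC 2822 parser.
# A host name label: letters, digits and hyphens, 1 to 63 characters,
# neither starting nor ending with a hyphen.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def looks_incorrect(address: str) -> bool:
    """Return True if the address is probably a typo or junk."""
    # Split on the last "@" so the domain is whatever follows it.
    local, at, domain = address.strip().rpartition("@")
    if not at or not local:
        return True  # no "@" at all, or nothing in front of it
    # Only the most basic sanity checks on the local part (assumption:
    # a second "@" or embedded whitespace is far more likely to be a
    # typo than a deliberately quoted local part).
    if "@" in local or any(ch.isspace() for ch in local):
        return True
    # The domain gets the real scrutiny: at least two well-formed labels.
    labels = domain.rstrip(".").split(".")
    if len(labels) < 2:
        return True
    if not all(_LABEL.fullmatch(label) for label in labels):
        return True
    # An all-numeric last label looks like a bare IP address, not a host name.
    return labels[-1].isdigit()
```

With this sketch an address with consecutive dots in the local part is
accepted, while one with consecutive dots in the domain, or with no
dot in the domain at all, is flagged.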

Cheers,
Steve
