Re: SELECT substring with regex

From: "Aaron Bono" <postgresql(at)aranya(dot)com>
To: mailreg(at)numerixtechnology(dot)de
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 22:39:47
Message-ID: bf05e51c0607071539k1de038ddt846fb158ca8423a7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On 7/7/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de> wrote:
>
>
> I am trying to come up with a semi-automatic solution to tidy up some
> data. If it's got to be done manually via the GUI it would mean a lot of
> dummy work [for the customer].

I would recommend you alter the GUI to NOT allow any more bad data to get in
- perhaps give some nice validation and friendly error message about the
correct format or give various fields that need to be filled out so the user
can easily enter it and the GUI assembles the correct string. It won't fix
the current bad data but would stop the insanity from proliferating :)-

First of all I did a 5 table join to select those NAMEs which don't
> follow the required pattern: the pattern describes a photographic lens
> (focal length followed by lens speed (aperture)) and nothing else.
> Unfortuantely, there are a few hundred occurences where a few attributes
> have been appended which should have been stored elsewhere.
>
>
> valid entries would be:
> "28mm F2.8" (prime lens)
> "30-70mm F4" (zoom lens)
> "30-70mm F2.8" (zoom lens)
> "30-100mm F4.5-5.6" (zoom lens with variable speed)
>
>
> In the WHERE clause I have specified all those NAMEs, which follow that
> pattern but have some gubbins appended:
>
> WHERE NAME ~
> '^((\\d+(-\\d+)?)mm F((\\d+(\.\\d+)?)+(\.(\\d+(\.\\d+)?)+)?))\\D+$'
>
>
> which gives me a listing of those candidates that need to be amended -
> manually or otherwise.
>
> Next, I wanted to produce a result which splits NAME into what it should
> be (BASE) and attributes (SUFFIX). Maybe I can generate some SQL from
> that to tidy up the data.

You might also consider adding the base and suffix columns with a trigger
that parses the name field and sets the values of base and suffix and also
does any integrity checks during inserts and updates. Finally, after the
data is cleaned up and the client apps are changed to use base and suffix
and not name, get rid of the name column.

Then again, this may be exactly what you are already trying to do.

-Aaron

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Rodrigo De Leon 2006-07-08 01:24:07 Re: SELECT substring with regex
Previous Message Richard Broersma Jr 2006-07-07 21:33:01 Re: SELECT substring with regex