Re: SELECT substring with regex

From: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 20:49:09
Message-ID: 44AEC8C5.80103@numerixtechnology.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Erik Jones wrote:
> T E Schmitz wrote:
>
>> Gary Stainburn wrote:
>>
>>> On Friday 07 July 2006 14:51, T E Schmitz wrote:
>>>
>>>> I would like to split the contents of a column using substring with a
>>>> regular expression:
>>>>
>>>> The column contains something like
>>>> "150mm LD AD Asp XR Macro"
>>>> I want to split this into
>>>> "150mm", "LD AD Asp XR Macro"
>>>>
>>>
>>>
>>> select substring('150mm LD AD Asp XR Macro','^\\d+mm') as BASE_NAME,
>>> substring('150mm LD AD Asp XR Macro','^\\d+mm (.*)$') as SUFFIX;
>>> base_name | suffix
>>> -----------+--------------------
>>> 150mm | LD AD Asp XR Macro
>>> (1 row)
>>>
>>> The brackets surround the required match
>> But that takes me to the next problem:
>>
>> For the sake of the example I simplified the regular pattern.
>> In reality, BASE_NAME might be:
>>
>> 28mm
>> 28-70mm
>>
>> So the reg. expr. requires brackets:
>>
>> substring (NAME, '^(\\d+(-\\d+)?mm)' ) as BASE_NAME
>>
> Will the mm always be the end of the base name?

I had thought it best to simplify the problem for the purposes of the
mailing list but maybe I should supply the complete problem and describe
the purpose of the exercise:

I am trying to come up with a semi-automatic solution to tidy up some
data. If it's got to be done manually via the GUI it would mean a lot of
dummy work [for the customer].

First of all I did a 5 table join to select those NAMEs which don't
follow the required pattern: the pattern describes a photographic lens
(focal length followed by lens speed (aperture)) and nothing else.
Unfortuantely, there are a few hundred occurences where a few attributes
have been appended which should have been stored elsewhere.

valid entries would be:
"28mm F2.8" (prime lens)
"30-70mm F4" (zoom lens)
"30-70mm F2.8" (zoom lens)
"30-100mm F4.5-5.6" (zoom lens with variable speed)

In the WHERE clause I have specified all those NAMEs, which follow that
pattern but have some gubbins appended:

WHERE NAME ~
'^((\\d+(-\\d+)?)mm F((\\d+(\.\\d+)?)+(\.(\\d+(\.\\d+)?)+)?))\\D+$'

which gives me a listing of those candidates that need to be amended -
manually or otherwise.

Next, I wanted to produce a result which splits NAME into what it should
be (BASE) and attributes (SUFFIX). Maybe I can generate some SQL from
that to tidy up the data.

--

Regards,

Tarlika Elisabeth Schmitz

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message David Clarke 2006-07-07 20:52:22 Re: Alternative to serial primary key
Previous Message Jim Buttafuoco 2006-07-07 20:34:32 Re: SELECT substring with regex