Re: SELECT substring with regex

Lists: pgsql-sql
From: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
To: pgsql-sql(at)postgresql(dot)org
Subject: SELECT substring with regex
Date: 2006-07-07 13:51:28
Message-ID: 44AE66E0.1040606@numerixtechnology.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

I would like to split the contents of a column using substring with a
regular expression:

SELECT
substring (NAME, '^\\d+mm') as BASE_NAME,
substring (NAME, ??? ) as SUFFIX
FROM MODEL

The column contains something like
"150mm LD AD Asp XR Macro"
I want to split this into
"150mm", "LD AD Asp XR Macro"

How can I extract the bit following the matching substring?

--

Regards,

Tarlika Elisabeth Schmitz


From: "Rodrigo De Leon" <rdeleonp(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 15:55:17
Message-ID: a55915760607070855j57b6b243lf1fd81dce1036bed@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 7/7/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de> wrote:
> I would like to split the contents of a column using substring with a
> regular expression:
>
> SELECT
> substring (NAME, '^\\d+mm') as BASE_NAME,
> substring (NAME, ??? ) as SUFFIX
> FROM MODEL
>
> The column contains something like
> "150mm LD AD Asp XR Macro"
> I want to split this into
> "150mm", "LD AD Asp XR Macro"
>
> How can I extract the bit following the matching substring?
>
> --
>
>
> Regards,
>
> Tarlika Elisabeth Schmitz

I'm sure there's a cleaner, regexp based approach, but how about:

SELECT
substring (NAME, '^\\d+mm') AS BASE_NAME ,
substr(
NAME
, char_length(
substring (NAME, '^\\d+mm')
) + 2
) AS SUFFIX
FROM MODEL

Regards,

Rodrigo


From: Gary Stainburn <gary(dot)stainburn(at)ringways(dot)co(dot)uk>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 15:59:14
Message-ID: 200607071659.14940.gary.stainburn@ringways.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Friday 07 July 2006 14:51, T E Schmitz wrote:
> I would like to split the contents of a column using substring with a
> regular expression:
>
> SELECT
> substring (NAME, '^\\d+mm') as BASE_NAME,
> substring (NAME, ??? ) as SUFFIX
> FROM MODEL
>
> The column contains something like
> "150mm LD AD Asp XR Macro"
> I want to split this into
> "150mm", "LD AD Asp XR Macro"
>
> How can I extract the bit following the matching substring?

select substring('150mm LD AD Asp XR Macro','^\\d+mm') as BASE_NAME,
substring('150mm LD AD Asp XR Macro','^\\d+mm (.*)$') as SUFFIX;
base_name | suffix
-----------+--------------------
150mm | LD AD Asp XR Macro
(1 row)

The brackets surround the required match
--
Gary Stainburn

This email does not contain private or confidential material as it
may be snooped on by interested government parties for unknown
and undisclosed purposes - Regulation of Investigatory Powers Act, 2000


From: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 17:35:11
Message-ID: 44AE9B4F.1020401@numerixtechnology.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Gary Stainburn wrote:
> On Friday 07 July 2006 14:51, T E Schmitz wrote:
>
>>I would like to split the contents of a column using substring with a
>>regular expression:
>>
>>The column contains something like
>>"150mm LD AD Asp XR Macro"
>>I want to split this into
>>"150mm", "LD AD Asp XR Macro"
>>
>
>
> select substring('150mm LD AD Asp XR Macro','^\\d+mm') as BASE_NAME,
> substring('150mm LD AD Asp XR Macro','^\\d+mm (.*)$') as SUFFIX;
> base_name | suffix
> -----------+--------------------
> 150mm | LD AD Asp XR Macro
> (1 row)
>
> The brackets surround the required match

This is ingenious! I had been looking at chapter 9.6 Pattern Matching.
Am I missing something? I did not realize that the brackets indicate
the required match.

But that takes me to the next problem:

For the sake of the example I simplified the regular pattern.
In reality, BASE_NAME might be:

28mm
28-70mm

So the reg. expr. requires brackets:

substring (NAME, '^(\\d+(-\\d+)?mm)' ) as BASE_NAME

Actually, the pattern is more complex than that and I cannot see how I
can express it without brackets.

--

Regards/Gruß,

Tarlika Elisabeth Schmitz


From: "Rodrigo De Leon" <rdeleonp(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 18:37:08
Message-ID: a55915760607071137q2c70346cp5b30cac14beaf1e7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 7/7/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de> wrote:
> But that takes me to the next problem:
>
> For the sake of the example I simplified the regular pattern.
> In reality, BASE_NAME might be:
>
> 28mm
> 28-70mm
>
> So the reg. expr. requires brackets:
>
> substring (NAME, '^(\\d+(-\\d+)?mm)' ) as BASE_NAME
>
> Actually, the pattern is more complex than that and I cannot see how I
> can express it without brackets.

Maybe:

select
substring ('150mm LD AD Asp XR Macro', '^[\\d-]*mm' ) as BASE_NAME
, substring('150mm LD AD Asp XR Macro','^[\\d-]*mm (.*)$') as SUFFIX;

select
substring ('28-70mm LD AD Asp XR Macro', '^[\\d-]*mm' ) as BASE_NAME
, substring('28-70mm LD AD Asp XR Macro','^[\\d-]*mm (.*)$') as SUFFIX;

etc...

Regards,

Rodrigo


From: Erik Jones <erik(at)myemma(dot)com>
To: mailreg(at)numerixtechnology(dot)de
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 18:58:33
Message-ID: 44AEAED9.7080300@myemma.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

T E Schmitz wrote:
> Gary Stainburn wrote:
>> On Friday 07 July 2006 14:51, T E Schmitz wrote:
>>
>>> I would like to split the contents of a column using substring with a
>>> regular expression:
>>>
>>> The column contains something like
>>> "150mm LD AD Asp XR Macro"
>>> I want to split this into
>>> "150mm", "LD AD Asp XR Macro"
>>>
>>
>>
>> select substring('150mm LD AD Asp XR Macro','^\\d+mm') as BASE_NAME,
>> substring('150mm LD AD Asp XR Macro','^\\d+mm (.*)$') as SUFFIX;
>> base_name | suffix
>> -----------+--------------------
>> 150mm | LD AD Asp XR Macro
>> (1 row)
>>
>> The brackets surround the required match
>
>
> This is ingenious! I had been looking at chapter 9.6 Pattern Matching.
> Am I missing something? I did not realize that the brackets indicate
> the required match.
>
> But that takes me to the next problem:
>
> For the sake of the example I simplified the regular pattern.
> In reality, BASE_NAME might be:
>
> 28mm
> 28-70mm
>
> So the reg. expr. requires brackets:
>
> substring (NAME, '^(\\d+(-\\d+)?mm)' ) as BASE_NAME
>
> Actually, the pattern is more complex than that and I cannot see how I
> can express it without brackets.
>
>
Will the mm always be the end of the base name?

substring(NAME, '^(.*?mm)') as BASE_NAME

That should match all the way up to the first mm. If there are actually
a set number of different units that it might end in, such as mm and cm,
you can do:

substring(NAME, '^(.*?(mm|cm))') as BASE_NAME

That will match everything up to the first mm or cm. Note that you
don't have to worry about the second set of brackets returning anything
as the regexp version of substring only returns what is matched by the
first parenthesised subexpression.

--
erik jones <erik(at)myemma(dot)com>
software development
emma(r)


From: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 19:23:50
Message-ID: 44AEB4C6.9000903@numerixtechnology.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Rodrigo De Leon wrote:
> On 7/7/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de> wrote:
>
>> But that takes me to the next problem:
>>
>> For the sake of the example I simplified the regular pattern.
>> In reality, BASE_NAME might be:
>>
>> 28mm
>> 28-70mm
>>
>> So the reg. expr. requires brackets:
>>
>> substring (NAME, '^(\\d+(-\\d+)?mm)' ) as BASE_NAME
>>
>> Actually, the pattern is more complex than that and I cannot see how I
>> can express it without brackets.
>
>
> Maybe:
>
> select
> substring ('28-70mm LD AD Asp XR Macro', '^[\\d-]*mm' ) as BASE_NAME

Sorry, but that would also capture something like
10-30-59mm

The pattern describes either a single length (120 millimeters) or a
range (30 to 70 millimetres), hence:

\\d+(-\\d+)?mm

The ? quantifier refers to the combination of '-' and digits and has to
be bracketed.

If the brackets cannot be avoided in the expression, your original
suggestion might come in handy though:

SELECT
substring (NAME, '^\\d+(-\\d+)?mm') AS BASE_NAME ,
substr(
NAME
, char_length(
substring (NAME, '^\\d+(-\\d+)?mm')
) + 2
) AS SUFFIX

Still, I'd be interested to know whether there is a 'more elegant' solution.

--

Regards,

Tarlika Elisabeth Schmitz


From: "Aaron Bono" <postgresql(at)aranya(dot)com>
To: "Rodrigo De Leon" <rdeleonp(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 19:45:51
Message-ID: bf05e51c0607071245w3738e109l8128d345f5b06977@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 7/7/06, Rodrigo De Leon <rdeleonp(at)gmail(dot)com> wrote:
>
> On 7/7/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de> wrote:
> > But that takes me to the next problem:
> >
> > For the sake of the example I simplified the regular pattern.
> > In reality, BASE_NAME might be:
> >
> > 28mm
> > 28-70mm
> >
> > So the reg. expr. requires brackets:
> >
> > substring (NAME, '^(\\d+(-\\d+)?mm)' ) as BASE_NAME
> >
> > Actually, the pattern is more complex than that and I cannot see how I
> > can express it without brackets.
>
> Maybe:
>
> select
> substring ('150mm LD AD Asp XR Macro', '^[\\d-]*mm' ) as BASE_NAME
> , substring('150mm LD AD Asp XR Macro','^[\\d-]*mm (.*)$') as SUFFIX;
>
> select
> substring ('28-70mm LD AD Asp XR Macro', '^[\\d-]*mm' ) as BASE_NAME
> , substring('28-70mm LD AD Asp XR Macro','^[\\d-]*mm (.*)$') as SUFFIX;
>
> etc...
>
> Regards,
>
> Rodrigo

Is there a reason this column wasn't separated into two different columns?
Or perhaps into a child table if there could be more than one XXXmm value in
the field?

Just curious.

-Aaron


From: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
To: Aaron Bono <postgresql(at)aranya(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 20:12:52
Message-ID: 44AEC044.1040206@numerixtechnology.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Aaron Bono wrote:
> On 7/7/06, *Rodrigo De Leon* <rdeleonp(at)gmail(dot)com
> <mailto:rdeleonp(at)gmail(dot)com>> wrote:
>
> On 7/7/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de
> <mailto:mailreg(at)numerixtechnology(dot)de>> wrote:
> > But that takes me to the next problem:
> >
> > For the sake of the example I simplified the regular pattern.
> > In reality, BASE_NAME might be:
> >
> > 28mm
> > 28-70mm
> >
> > So the reg. expr. requires brackets:
> >
> > substring (NAME, '^(\\d+(-\\d+)?mm)' ) as BASE_NAME
> >
> > Actually, the pattern is more complex than that and I cannot see
> how I
> > can express it without brackets.
>
> Maybe:
>
> select
> substring ('150mm LD AD Asp XR Macro', '^[\\d-]*mm' ) as BASE_NAME
> , substring('150mm LD AD Asp XR Macro','^[\\d-]*mm (.*)$') as SUFFIX;
>
> select
> substring ('28-70mm LD AD Asp XR Macro', '^[\\d-]*mm' ) as BASE_NAME
> , substring('28-70mm LD AD Asp XR Macro','^[\\d-]*mm (.*)$') as SUFFIX;
>
> etc...
>
> Regards,
>
> Rodrigo
>
>
> Is there a reason this column wasn't separated into two different
> columns? Or perhaps into a child table if there could be more than one
> XXXmm value in the field?
>
> Just curious.

You're absolutely right (see my other posting):

what was entered:

MODEL.NAME "150mm F4 E" TYPE.NAME -
MODEL.NAME "150mm F4 PE" TYPE.NAME -

what should've been entered:
MODEL.NAME "150mm F4"
TYPE.NAME "PE"
TYPE.NAME "E"
both referencing the same MODEL

--

Regards,

Tarlika Elisabeth Schmitz


From: "Jim Buttafuoco" <jim(at)contactbda(dot)com>
To: mailreg(at)numerixtechnology(dot)de, pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 20:34:32
Message-ID: 20060707203420.M77919@contactbda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

use plperl

---------- Original Message -----------
From: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
To: pgsql-sql(at)postgresql(dot)org
Sent: Fri, 07 Jul 2006 20:23:50 +0100
Subject: Re: [SQL] SELECT substring with regex

> Rodrigo De Leon wrote:
> > On 7/7/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de> wrote:
> >
> >> But that takes me to the next problem:
> >>
> >> For the sake of the example I simplified the regular pattern.
> >> In reality, BASE_NAME might be:
> >>
> >> 28mm
> >> 28-70mm
> >>
> >> So the reg. expr. requires brackets:
> >>
> >> substring (NAME, '^(\\d+(-\\d+)?mm)' ) as BASE_NAME
> >>
> >> Actually, the pattern is more complex than that and I cannot see how I
> >> can express it without brackets.
> >
> >
> > Maybe:
> >
> > select
> > substring ('28-70mm LD AD Asp XR Macro', '^[\\d-]*mm' ) as BASE_NAME
>
> Sorry, but that would also capture something like
> 10-30-59mm
>
> The pattern describes either a single length (120 millimeters) or a
> range (30 to 70 millimetres), hence:
>
> \\d+(-\\d+)?mm
>
> The ? quantifier refers to the combination of '-' and digits and has to
> be bracketed.
>
> If the brackets cannot be avoided in the expression, your original
> suggestion might come in handy though:
>
> SELECT
> substring (NAME, '^\\d+(-\\d+)?mm') AS BASE_NAME ,
> substr(
> NAME
> , char_length(
> substring (NAME, '^\\d+(-\\d+)?mm')
> ) + 2
> ) AS SUFFIX
>
> Still, I'd be interested to know whether there is a 'more elegant' solution.
>
> --
>
> Regards,
>
> Tarlika Elisabeth Schmitz
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
------- End of Original Message -------


From: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 20:49:09
Message-ID: 44AEC8C5.80103@numerixtechnology.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Erik Jones wrote:
> T E Schmitz wrote:
>
>> Gary Stainburn wrote:
>>
>>> On Friday 07 July 2006 14:51, T E Schmitz wrote:
>>>
>>>> I would like to split the contents of a column using substring with a
>>>> regular expression:
>>>>
>>>> The column contains something like
>>>> "150mm LD AD Asp XR Macro"
>>>> I want to split this into
>>>> "150mm", "LD AD Asp XR Macro"
>>>>
>>>
>>>
>>> select substring('150mm LD AD Asp XR Macro','^\\d+mm') as BASE_NAME,
>>> substring('150mm LD AD Asp XR Macro','^\\d+mm (.*)$') as SUFFIX;
>>> base_name | suffix
>>> -----------+--------------------
>>> 150mm | LD AD Asp XR Macro
>>> (1 row)
>>>
>>> The brackets surround the required match
>> But that takes me to the next problem:
>>
>> For the sake of the example I simplified the regular pattern.
>> In reality, BASE_NAME might be:
>>
>> 28mm
>> 28-70mm
>>
>> So the reg. expr. requires brackets:
>>
>> substring (NAME, '^(\\d+(-\\d+)?mm)' ) as BASE_NAME
>>
> Will the mm always be the end of the base name?

I had thought it best to simplify the problem for the purposes of the
mailing list but maybe I should supply the complete problem and describe
the purpose of the exercise:

I am trying to come up with a semi-automatic solution to tidy up some
data. If it's got to be done manually via the GUI it would mean a lot of
dummy work [for the customer].

First of all I did a 5 table join to select those NAMEs which don't
follow the required pattern: the pattern describes a photographic lens
(focal length followed by lens speed (aperture)) and nothing else.
Unfortuantely, there are a few hundred occurences where a few attributes
have been appended which should have been stored elsewhere.

valid entries would be:
"28mm F2.8" (prime lens)
"30-70mm F4" (zoom lens)
"30-70mm F2.8" (zoom lens)
"30-100mm F4.5-5.6" (zoom lens with variable speed)

In the WHERE clause I have specified all those NAMEs, which follow that
pattern but have some gubbins appended:

WHERE NAME ~
'^((\\d+(-\\d+)?)mm F((\\d+(\.\\d+)?)+(\.(\\d+(\.\\d+)?)+)?))\\D+$'

which gives me a listing of those candidates that need to be amended -
manually or otherwise.

Next, I wanted to produce a result which splits NAME into what it should
be (BASE) and attributes (SUFFIX). Maybe I can generate some SQL from
that to tidy up the data.

--

Regards,

Tarlika Elisabeth Schmitz


From: Richard Broersma Jr <rabroersma(at)yahoo(dot)com>
To: mailreg(at)numerixtechnology(dot)de, pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 21:33:01
Message-ID: 20060707213301.36130.qmail@web31803.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


> valid entries would be:
> "28mm F2.8" (prime lens)
> "30-70mm F4" (zoom lens)
> "30-70mm F2.8" (zoom lens)
> "30-100mm F4.5-5.6" (zoom lens with variable speed)
>
>
> In the WHERE clause I have specified all those NAMEs, which follow that
> pattern but have some gubbins appended:
>
> WHERE NAME ~
> '^((\\d+(-\\d+)?)mm F((\\d+(\.\\d+)?)+(\.(\\d+(\.\\d+)?)+)?))\\D+$'
>
>
> which gives me a listing of those candidates that need to be amended -
> manually or otherwise.
>
> Next, I wanted to produce a result which splits NAME into what it should
> be (BASE) and attributes (SUFFIX). Maybe I can generate some SQL from
> that to tidy up the data.

would this give you the results you want? It admit that it doesn't look to elegant.

name:
substr(your_string, 0, strpos(your_string, ' ')+1)

suffix:
substr(your_string, length(your_string)-strpos(your_string, ' '), length(your_string))

Regards,

Richard Broersma Jr.


From: "Aaron Bono" <postgresql(at)aranya(dot)com>
To: mailreg(at)numerixtechnology(dot)de
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-07 22:39:47
Message-ID: bf05e51c0607071539k1de038ddt846fb158ca8423a7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 7/7/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de> wrote:
>
>
> I am trying to come up with a semi-automatic solution to tidy up some
> data. If it's got to be done manually via the GUI it would mean a lot of
> dummy work [for the customer].

I would recommend you alter the GUI to NOT allow any more bad data to get in
- perhaps give some nice validation and friendly error message about the
correct format or give various fields that need to be filled out so the user
can easily enter it and the GUI assembles the correct string. It won't fix
the current bad data but would stop the insanity from proliferating :)-

First of all I did a 5 table join to select those NAMEs which don't
> follow the required pattern: the pattern describes a photographic lens
> (focal length followed by lens speed (aperture)) and nothing else.
> Unfortuantely, there are a few hundred occurences where a few attributes
> have been appended which should have been stored elsewhere.
>
>
> valid entries would be:
> "28mm F2.8" (prime lens)
> "30-70mm F4" (zoom lens)
> "30-70mm F2.8" (zoom lens)
> "30-100mm F4.5-5.6" (zoom lens with variable speed)
>
>
> In the WHERE clause I have specified all those NAMEs, which follow that
> pattern but have some gubbins appended:
>
> WHERE NAME ~
> '^((\\d+(-\\d+)?)mm F((\\d+(\.\\d+)?)+(\.(\\d+(\.\\d+)?)+)?))\\D+$'
>
>
> which gives me a listing of those candidates that need to be amended -
> manually or otherwise.
>
> Next, I wanted to produce a result which splits NAME into what it should
> be (BASE) and attributes (SUFFIX). Maybe I can generate some SQL from
> that to tidy up the data.

You might also consider adding the base and suffix columns with a trigger
that parses the name field and sets the values of base and suffix and also
does any integrity checks during inserts and updates. Finally, after the
data is cleaned up and the client apps are changed to use base and suffix
and not name, get rid of the name column.

Then again, this may be exactly what you are already trying to do.

-Aaron


From: "Rodrigo De Leon" <rdeleonp(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-08 01:24:07
Message-ID: a55915760607071824k65a09b3ahd52b16bdd59108b8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 7/7/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de> wrote:
> Sorry, but that would also capture something like
> 10-30-59mm
>
> The pattern describes either a single length (120 millimeters) or a
> range (30 to 70 millimetres), hence:
>
> \\d+(-\\d+)?mm
>
> The ? quantifier refers to the combination of '-' and digits and has to
> be bracketed.
>
> ...
>
> Still, I'd be interested to know whether there is a 'more elegant' solution.

OK, last try using regex:

SELECT
NAME
, substring(NAME, '^(\\d+(-\\d+)?mm)') AS BASE_NAME
, regexp_replace(NAME, '^\\d+(-\\d+)?mm (.*)', '\\2') AS SUFFIX
FROM MODEL

Regards,

Rodrigo


From: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
To: Rodrigo De Leon <rdeleonp(at)gmail(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-08 09:40:08
Message-ID: 44AF7D78.2050007@numerixtechnology.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Rodrigo De Leon wrote:
> On 7/7/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de> wrote:
>
>> Sorry, but that would also capture something like
>> 10-30-59mm
>>
>> The pattern describes either a single length (120 millimeters) or a
>> range (30 to 70 millimetres), hence:
>>
>> \\d+(-\\d+)?mm
>>
>> The ? quantifier refers to the combination of '-' and digits and has to
>> be bracketed.
>>
>> ...
>>
>> Still, I'd be interested to know whether there is a 'more elegant'
>> solution.
>
>
> OK, last try using regex:
>
> SELECT
> NAME
> , substring(NAME, '^(\\d+(-\\d+)?mm)') AS BASE_NAME
> , regexp_replace(NAME, '^\\d+(-\\d+)?mm (.*)', '\\2') AS SUFFIX
> FROM MODEL

Is regexp_replace a new feature?
I am running v 7.4.

--

Regards,

Tarlika


From: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
To: Aaron Bono <postgresql(at)aranya(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-08 15:40:15
Message-ID: 44AFD1DF.8010102@numerixtechnology.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Aaron Bono wrote:
> On 7/7/06, *T E Schmitz* <mailreg(at)numerixtechnology(dot)de
> <mailto:mailreg(at)numerixtechnology(dot)de>> wrote:
>
>
> I am trying to come up with a semi-automatic solution to tidy up some
> data. If it's got to be done manually via the GUI it would mean a lot of
> dummy work [for the customer].
>
>
> I would recommend you alter the GUI to NOT allow any more bad data to

I absolutely agree that the user should not be able to enter 'bad' data
in the first place.
However, in this case it is not a hard and fast rule, rather a
suggestion how to organize the data sensibly. The pattern is suitable
for 98% of the lenses; there are some, albeit very few, exceptions.

Also: the MODEL.NAME is a free-form field; only when referred to by a
PRODUCT which is of CATEGORY 'Lens', would the pattern apply.

The only way one could maybe assist the user would be a warning when he
tries to save a 'Lens' PRODUCT, which deviates from the pattern.

> You might also consider adding the base and suffix columns with a
> trigger that parses the name field and sets the values of base and
> suffix and also does any integrity checks during inserts and updates.
> Finally, after the data is cleaned up and the client apps are changed to
> use base and suffix and not name, get rid of the name column.

The suffix should have been entered in TYPE.NAME, which references
MODEL. The structure exists, it just needs to be used efficiently.

--

Regards,

Tarlika


From: "Rodrigo De Leon" <rdeleonp(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-08 16:48:04
Message-ID: a55915760607080948m20db3d99i8cfeca30ca906faa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 7/8/06, T E Schmitz <mailreg(at)numerixtechnology(dot)de> wrote:
> Is regexp_replace a new feature?
> I am running v 7.4.

Given the patch history:

http://archives.postgresql.org/pgsql-patches/2004-07/msg00471.php
http://archives.postgresql.org/pgsql-patches/2005-06/msg00515.php
http://archives.postgresql.org/pgsql-patches/2005-07/msg00260.php

And the release notes:

http://www.postgresql.org/docs/8.1/static/release.html

I'd say at least:

7.4.9

8.0.4

8.1

Anyone care to verify this?

Regards,

Rodrigo


From: Emils <gnudiff(at)gmail(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-10 05:11:47
Message-ID: 9dcb6fa40607092211g5e0e2099x3e6d9d3c56977a02@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

2006/7/7, T E Schmitz <mailreg(at)numerixtechnology(dot)de>:
>
> valid entries would be:
> "28mm F2.8" (prime lens)
> "30-70mm F4" (zoom lens)
> "30-70mm F2.8" (zoom lens)
> "30-100mm F4.5-5.6" (zoom lens with variable speed)

If these are the cases, wouldn't the regex be simply:

"^[\d\-]+mm" for BASE

"^[\d\-]+mm (.+)$" for SUFFIX

Or are you having to deal with malformatted data too (like "30 -70mm"
or "28 mm ")?

Emils


From: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
To: emils(dot)klotins(at)gmail(dot)com, pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-10 11:02:06
Message-ID: 44B233AE.7080003@numerixtechnology.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Emils wrote:
> 2006/7/7, T E Schmitz <mailreg(at)numerixtechnology(dot)de>:
>
>>
>> valid entries would be:
>> "28mm F2.8" (prime lens)
>> "30-70mm F4" (zoom lens)
>> "30-70mm F2.8" (zoom lens)
>> "30-100mm F4.5-5.6" (zoom lens with variable speed)
>
>
> If these are the cases, wouldn't the regex be simply:
>
> "^[\d\-]+mm" for BASE
>
> "^[\d\-]+mm (.+)$" for SUFFIX
>
> Or are you having to deal with malformatted data too (like "30 -70mm"
> or "28 mm ")?

There were quite a few malformed MODEL.NAMEs (upper/lower case,
additional blanks, missing mm) and therefore a precise regexp was required.

I did this as a 2-stage process:

WHERE !~ '^((\\d+(-\\d+)?)mm F((\\d+(\.\\d+)?)+(\.(\\d+(\.\\d+)?)+)?))$'
selected all malformed MODEL.NAMEs
I corrected the malformed basenames, which left me with the BASE/SUFFIX
problem only.

I selected all of those and /knowing/ that none of them were malformed,
I used a simplified pattern without round brackets to produce the BASE
and SUFFIX result columns:

substring (MODEL.MODEL_NAME, '^[-\\d]+mm F[-\\d\.]+' ) as BASE,
substring (MODEL.MODEL_NAME, '^[-\\d]+mm F[-\\d\.]+ *(.*)$') as SUFFIX

exported that together with PKs as CSV and generated SQL from it.

Basically, my problem had been that I needed to express the SUFFIX
pattern without using brackets for the BASE.

I would like to thank everyone who contributed to this thread.

--

Regards/Gruß,

Tarlika Elisabeth Schmitz


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: T E Schmitz <mailreg(at)numerixtechnology(dot)de>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: SELECT substring with regex
Date: 2006-07-10 18:29:44
Message-ID: 20060710182944.GB26570@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

T E Schmitz wrote:

> valid entries would be:
> "28mm F2.8" (prime lens)
> "30-70mm F4" (zoom lens)
> "30-70mm F2.8" (zoom lens)
> "30-100mm F4.5-5.6" (zoom lens with variable speed)

I think you already got all the help you needed, but I want to point out
that the Nikon 18-70mm that was used to take this picture

http://people.planetpostgresql.org/mha/index.php?/archives/97-Conference-is-over....html

seems to have a lot of aberration (I don't know the technical term,
sorry). Notice that grid on the floor? It looked very straight on the
real life. I'd expect a bit of curvature but I think that's too much.

Sorry for the offtopic :-)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.