Re: Removing whitespace using regexp_replace

Lists: pgsql-sql
From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Removing whitespace using regexp_replace
Date: 2007-10-28 11:20:36
Message-ID: fg1ra3$ehj$1@ger.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Hi,

I have a column with the datatype "text" that may contain leading whitespace
(tabs, spaces newlines, ...) and I would like to remove them all (ideally
leading and trailing).

I tried

SELECT regexp_replace(myfield, '\A\s*', '')
FROM mytable;

(for leading whitespace, to start with)

But it does not remove anything. I replace my first attempt '^\s*' with '\A\s*'
after reading the chapter about newline-sensitive matching, but that doesn't
seem to do the trick either.

Just for a test I changed this to

SELECT regexp_replace(myfield, '\s*', '')
FROM mytable;

and expected *all* whitespace to be removed from my string, but only the leading
ones were replaced. Which I don't understand at all. Why weren't other
whitespace sequences not replaced with that expression?

What would be the correct RE to replace leading and trailing whitespace without
affecting anything inbetween?

I'm pretty sure I'm missing someting very obvious...

Thanks in advance
Thomas


From: Andreas Kretschmer <akretschmer(at)spamfence(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Removing whitespace using regexp_replace
Date: 2007-10-28 11:42:47
Message-ID: 20071028114247.GA1327@KanotixBox
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Thomas Kellerer <spam_eater(at)gmx(dot)net> schrieb:

> Hi,
>
> I have a column with the datatype "text" that may contain leading
> whitespace (tabs, spaces newlines, ...) and I would like to remove them all
> (ideally leading and trailing).

You can use trim() for that:

select 'x' || trim(both '\t' from trim(both ' ' from ' \t\tfoo bar ')) || 'x';

(for testing with 'x' around the result)

>
> I tried
>
> SELECT regexp_replace(myfield, '\A\s*', '')
> FROM mytable;

For regexp_replace() you need an extra parameter 'g' like below:

test=*# select 'x' || regexp_replace(regexp_replace(' \t\tfoo bar ', '^[ \t]+','','g'),'[ \t]+$','','g') || 'x';
^^^ ^^^

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknow)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Removing whitespace using regexp_replace
Date: 2007-10-28 12:15:20
Message-ID: fg1ugo$m6v$1@ger.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Andreas Kretschmer wrote on 28.10.2007 12:42:
>> I have a column with the datatype "text" that may contain leading
>> whitespace (tabs, spaces newlines, ...) and I would like to remove them all
>> (ideally leading and trailing).
>
> You can use trim() for that:
>
> select 'x' || trim(both '\t' from trim(both ' ' from ' \t\tfoo bar ')) || 'x';
>
> (for testing with 'x' around the result)
Yes I was thinking about a solution like that as well, but wouldn't that only
work if the order in which spaces and tabs appear is always the same?
The above would replace ' \t' but not '\t ', right?

> For regexp_replace() you need an extra parameter 'g' like below:
Cool, works like a charm.
Didn't see that parameter when first reading that chapter.

But it seems my problem was actually caused by something else:

SELECT regexp_replace(myfield, '\s*', '', 'g')
FROM mytable;

does not replace anything, but

SELECT regexp_replace(myfield, '[ \t\n\r]*', '', 'g')
FROM mytable;

does replace all whitespaces (as I expected). And subsequently

SELECT regexp_replace(myfield, '^[ \t\n\r]*', '', 'g')
FROM mytable;

replaces only the whitespace at the beginning.

I thought \s is a "shortcut" for "whitespace", which in my understanding is the
same as [ \t\r\n]. Am I wrong here?

Cheers
Thomas


From: Andreas Kretschmer <akretschmer(at)spamfence(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Removing whitespace using regexp_replace
Date: 2007-10-28 12:32:54
Message-ID: 20071028123254.GB17410@KanotixBox
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Thomas Kellerer <spam_eater(at)gmx(dot)net> schrieb:

> Andreas Kretschmer wrote on 28.10.2007 12:42:
> >>I have a column with the datatype "text" that may contain leading
> >>whitespace (tabs, spaces newlines, ...) and I would like to remove them
> >>all (ideally leading and trailing).
> >You can use trim() for that:
> >select 'x' || trim(both '\t' from trim(both ' ' from ' \t\tfoo bar '))
> >|| 'x';
> >(for testing with 'x' around the result)
> Yes I was thinking about a solution like that as well, but wouldn't that
> only work if the order in which spaces and tabs appear is always the same?
> The above would replace ' \t' but not '\t ', right?

Oh, yes.

>
>
> >For regexp_replace() you need an extra parameter 'g' like below:
> Cool, works like a charm.

Nice, i'm glad to help you.

> But it seems my problem was actually caused by something else:
>
> SELECT regexp_replace(myfield, '\s*', '', 'g')
> FROM mytable;

you should escape the \, change to ...'\\s*'...

But without anchors this replaces all whitespaces, also within the text
and not only at the beginning/end (^ and $)

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknow)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Removing whitespace using regexp_replace
Date: 2007-10-28 14:35:59
Message-ID: fg26oe$dfg$1@ger.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Andreas Kretschmer wrote on 28.10.2007 13:32:
>> But it seems my problem was actually caused by something else:
>>
>> SELECT regexp_replace(myfield, '\s*', '', 'g')
>> FROM mytable;
>
> you should escape the \, change to ...'\\s*'...
Ah! Didn't think this was necessary, as \t or \n did not need to be escaped.

> But without anchors this replaces all whitespaces, also within the text
> and not only at the beginning/end (^ and $)
Yes of course, this was only for testing ;)

Thanks for your help!

Thomas