proposal: support empty string as separator for string_to_array

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: proposal: support empty string as separator for string_to_array
Date: 2009-07-25 03:40:13
Message-ID: 162867790907242040r6d8e1756s40f65cdc491f135e@mail.gmail.com

Hello

I have an idea that should simplify transforming a string into an
array of characters. The basic idea: there is an empty string between
every pair of characters, so the empty string is a regular separator
for the string_to_array function. This behavior is the inverse of the
array_to_string function's behavior:

postgres=# select array_to_string(array['a','b','c'],'');
 array_to_string
-----------------
 abc
(1 row)

postgres=# select string_to_array('abc','');
 string_to_array
-----------------
 {a,b,c}
(1 row)
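A minimal C sketch of the proposed semantics, outside of PostgreSQL: split_chars produces one element per character (what string_to_array('abc','') would return under this proposal), and join_empty concatenates the elements back with an empty separator, like array_to_string(..., ''). Both helper names are hypothetical, and the code assumes valid UTF-8 input; a real backend function would have to follow the database encoding instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Byte length of the UTF-8 sequence starting with lead byte c.
 * Assumes valid UTF-8; no validation is performed. */
size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;               /* ASCII */
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    return 4;
}

/* Hypothetical helper: split s into one malloc'd element per character,
 * i.e. treat the empty string between characters as the separator.
 * Stores the element count in *count (malloc checks omitted for brevity). */
char **split_chars(const char *s, size_t *count)
{
    assert(s != NULL && count != NULL);
    size_t n = strlen(s);
    /* Never more elements than bytes; allocate at least one slot. */
    char **elems = malloc((n > 0 ? n : 1) * sizeof *elems);
    size_t k = 0;
    size_t i = 0;

    while (i < n) {
        size_t len = utf8_seq_len((unsigned char) s[i]);
        if (i + len > n)
            len = n - i;        /* truncated sequence at end of input */
        elems[k] = malloc(len + 1);
        memcpy(elems[k], s + i, len);
        elems[k][len] = '\0';
        k++;
        i += len;
    }
    *count = k;
    return elems;
}

/* Hypothetical helper: the inverse operation, joining with an empty
 * separator as array_to_string(arr, '') does. Returns a malloc'd string. */
char *join_empty(char **elems, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(elems[i]);

    char *out = malloc(total + 1);
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(elems[i]);
        memcpy(out + pos, elems[i], len);
        pos += len;
    }
    out[pos] = '\0';
    return out;
}
```

For example, split_chars("abc", &n) gives n == 3 with the elements "a", "b" and "c", and join_empty on that result returns "abc". Splitting by character rather than by byte matters here: a byte-wise split would cut a two-byte UTF-8 character such as "ž" into two invalid elements.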

Comments, ideas?

Regards
Pavel Stehule


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-25 14:15:02
Message-ID: b42b73150907250715u4534c7c1we83b5645ce72c111@mail.gmail.com

On Fri, Jul 24, 2009 at 11:40 PM, Pavel Stehule<pavel(dot)stehule(at)gmail(dot)com> wrote:
> Hello
>
> I have an idea that should simplify transforming a string into an
> array of characters. The basic idea: there is an empty string between
> every pair of characters, so the empty string is a regular separator
> for the string_to_array function. This behavior is the inverse of the
> array_to_string function's behavior:
>
> postgres=# select array_to_string(array['a','b','c'],'');
>  array_to_string
> -----------------
>  abc
> (1 row)
>
> postgres=# select string_to_array('abc','');
>  string_to_array
> -----------------
>  {a,b,c}
> (1 row)

postgres=# select regexp_split_to_array('abc', '');
 regexp_split_to_array
-----------------------
 {a,b,c}
(1 row)

:-)

merlin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-25 14:30:37
Message-ID: 162867790907250730o43b25956lc76e0ffac0ebe021@mail.gmail.com

2009/7/25 Merlin Moncure <mmoncure(at)gmail(dot)com>:
> postgres=# select regexp_split_to_array('abc', '');
>  regexp_split_to_array
> -----------------------
>  {a,b,c}
> (1 row)

I know, but a regexp is not necessary here. A simple function for
string decomposition should be faster and a little bit more intuitive.
Not everybody understands regular expressions.

Pavel
>
> :-)
>
> merlin
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-25 17:35:29
Message-ID: 12754.1248543329@sss.pgh.pa.us

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> I have an idea that should simplify transforming a string into an
> array of characters. The basic idea: there is an empty string between
> every pair of characters, so the empty string is a regular separator
> for the string_to_array function.

There already is a definition for what string_to_array does with an
empty field separator, and that is not it. So this change would possibly
break existing applications. It does not seem either intuitively
correct or useful enough to justify that --- particularly seeing that
there's already another way to get the effect.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-25 17:55:21
Message-ID: 162867790907251055y2532c305s2830362c55702a54@mail.gmail.com

2009/7/25 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> I have an idea that should simplify transforming a string into an
>> array of characters. The basic idea: there is an empty string between
>> every pair of characters, so the empty string is a regular separator
>> for the string_to_array function.
>
> There already is a definition for what string_to_array does with an
> empty field separator, and that is not it.  So this change would possibly
> break existing applications.  It does not seem either intuitively
> correct or useful enough to justify that --- particularly seeing that
> there's already another way to get the effect.

I think nobody uses an empty separator with string_to_array, because
it does nothing useful. Or do you know of any case where an empty
separator should be used? I don't. My argument for "some" non-regexp
function is that such a function can be very light and fast, faster
than a regexp.

Another way is a one-parameter string_to_array function. That variant
is not defined yet, so we could use it.

Regards
Pavel

>
>                        regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-25 18:01:26
Message-ID: 13100.1248544886@sss.pgh.pa.us

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> 2009/7/25 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> There already is a definition for what string_to_array does with an
>> empty field separator, and that is not it.

> I think nobody uses an empty separator with string_to_array, because
> it does nothing useful.

According to you, maybe not. But perhaps whoever coded the function
originally had a use-case in mind, or people may have come up with
one since then. In any case we have a perfectly good answer available
for anyone who wants this behavior. I see no reason to change here.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-27 16:42:06
Message-ID: 162867790907270942y74dfc1acud8e9d22b5113c61e@mail.gmail.com

2009/7/25 Merlin Moncure <mmoncure(at)gmail(dot)com>:
> postgres=# select regexp_split_to_array('abc', '');
>  regexp_split_to_array
> -----------------------
>  {a,b,c}
> (1 row)
>
> :-)
>

I tested an implementation, and it is about 30% faster than using a
regexp.

I think 30% is a significant reason to implement it.

regards
Pavel Stehule

> merlin
>


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Merlin Moncure" <mmoncure(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-27 16:55:32
Message-ID: 4A6D95B40200002500028D92@gw.wicourts.gov

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> I tested an implementation, and it is about 30% faster than using a
> regexp.

Rather than making a change which could break existing applications,
how about a new function string_to_array(text) which returns an array
of "char"?

-Kevin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-27 17:03:49
Message-ID: 162867790907271003n9a6a8f5wb8ff7ea51130e8a9@mail.gmail.com

2009/7/27 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>> I tested an implementation, and it is about 30% faster than using a
>> regexp.
>
> Rather than making a change which could break existing applications,
> how about a new function string_to_array(text) which returns an array
> of "char"?

Yes, that was my idea too, or a function named "chars_to_array".

Pavel

>
> -Kevin
>


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-27 17:28:29
Message-ID: b42b73150907271028u57108e31nf00358d3b13a6b1@mail.gmail.com

On Mon, Jul 27, 2009 at 12:42 PM, Pavel Stehule<pavel(dot)stehule(at)gmail(dot)com> wrote:
> I tested an implementation, and it is about 30% faster than using a regexp.
>
> I think 30% is a significant reason to implement it.

Yes, I noticed that too. I was thinking, though, that if anything
should be done, it should go the other way: simple cases of
regexp_split_to_array should use the simpler algorithm in
string_to_array... just not the '' case, since the two functions
produce different results for it.
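Merlin's suggestion can be sketched in C roughly as follows: before handing a pattern to the regex engine, check whether it contains any regex metacharacters, and if it does not, split on the pattern as a plain substring. The function names, the metacharacter set, and the assumption that no regex flags are in effect are all illustrative, not PostgreSQL code; the regex fallback path itself is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Conservative check for a pattern that can only match itself.
 * Assumption: the character set below covers the POSIX ERE / PostgreSQL
 * ARE specials, and no flags (such as 'i') are in effect. The empty
 * pattern is rejected on purpose, because splitting on '' gives different
 * results in regexp_split_to_array and string_to_array. */
int is_literal_pattern(const char *pat)
{
    assert(pat != NULL);
    if (*pat == '\0')
        return 0;
    return strpbrk(pat, ".[](){}*+?|^$\\") == NULL;
}

/* Plain substring split on a non-empty literal separator, matching
 * non-overlapping occurrences left to right; adjacent separators yield
 * empty fields. Returns malloc'd fields and stores their number in *count
 * (malloc error handling omitted for brevity). */
char **split_literal(const char *s, const char *sep, size_t *count)
{
    size_t seplen = strlen(sep);
    assert(seplen > 0);

    /* First pass: number of fields = number of separators + 1. */
    size_t n = 1;
    for (const char *p = strstr(s, sep); p != NULL; p = strstr(p + seplen, sep))
        n++;

    /* Second pass: copy each field. */
    char **fields = malloc(n * sizeof *fields);
    const char *start = s;
    for (size_t k = 0; k < n; k++) {
        const char *end = strstr(start, sep);
        size_t len = end != NULL ? (size_t) (end - start) : strlen(start);
        fields[k] = malloc(len + 1);
        memcpy(fields[k], start, len);
        fields[k][len] = '\0';
        if (end != NULL)
            start = end + seplen;
    }
    *count = n;
    return fields;
}
```

With this sketch, split_literal("a,b,,c", ",", &n) gives four fields: "a", "b", "" and "c". The check errs on the side of caution: a pattern such as "]" that would in fact match literally is still sent to the regex path, so a false negative only costs speed.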

I don't think the chars_to_array function is the way to go. One thing
that might work, though, is to overload the string_to_array function (or
use a default parameter) to control the empty-string case with a bool,
an option, or something similar.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-27 17:51:47
Message-ID: 13161.1248717107@sss.pgh.pa.us

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> I tested an implementation, and it is about 30% faster than using a regexp.

In a real application, that's going to be negligible compared to all the
other costs involved in pushing the data around. And we still haven't
seen any in-the-field requests for this functionality, so even if the
gap were wider, I don't see the point of putting effort into it.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: support empty string as separator for string_to_array
Date: 2009-07-27 17:59:56
Message-ID: 162867790907271059u71a7cef8xb20376599d3fa2a0@mail.gmail.com

2009/7/27 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> I tested an implementation, and it is about 30% faster than using a regexp.
>
> In a real application, that's going to be negligible compared to all the
> other costs involved in pushing the data around.  And we still haven't
> seen any in-the-field requests for this functionality, so even if the
> gap were wider, I don't see the point of putting effort into it.
>

This is just a possible optimization. Maybe Merlin's proposal is the
best: we could add this technique to regexp_split_to_array, so that when
the pattern is empty it uses direct character separation, without any
new function or change to the current functions. Anyone who uses
regexp_split_to_array now would benefit too.

Pavel

>                        regards, tom lane
>