Re: patch (for 9.1) string functions

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: patch (for 9.1) string functions
Date: 2010-03-09 14:30:58
Message-ID: 162867791003090630w2b2ad42fxc360568f1de6fee3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

this patch contains a string formatting function "format"

postgres=# select format('some message: % (current user: %)',
current_date, current_user);
format
------------------------------------------------
some message: 2010-03-09 (current user: pavel)
(1 row)

this patch add new contrib module string functions - contains mainly
sprintf function:

postgres=# select sprintf('some message: %10s (%10s)', current_date,
current_user);
sprintf
---------------------------------------
some message: 2010-03-09 ( pavel)
(1 row)

postgres=# select sprintf('some message: %10s (%-10s)', current_date,
current_user);
sprintf
---------------------------------------
some message: 2010-03-09 (pavel )
(1 row)

some string variadic functions

postgres=# select concat('ahaha',10,null,current_date, true);
concat
------------------------
ahaha,10,,2010-03-09,t
(1 row)

postgres=# select concat_sql('ahaha',10,null,current_date, true);
concat_sql
--------------------------------
'ahaha',10,NULL,'2010-03-09',t
(1 row)

postgres=# select concat_json('ahaha'::text,10,null,current_date, true);
concat_json
-----------------------------------
"ahaha",10,null,"2010-03-09",true
(1 row)

and some basic text function rvrs, left, right.

Regards
Pavel Stehule

Attachment Content-Type Size
stringfunc.diff application/octet-stream 40.4 KB

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-03-09 16:55:34
Message-ID: 7FCC6CA9-3101-4D93-8ACF-77F92B97A314@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mar 9, 2010, at 6:30 AM, Pavel Stehule wrote:

> this patch contains a string formatting function "format"
>
> postgres=# select format('some message: % (current user: %)',
> current_date, current_user);
> format
> ------------------------------------------------
> some message: 2010-03-09 (current user: pavel)
> (1 row)
>
> this patch add new contrib module string functions - contains mainly
> sprintf function:

Seems useful. Add it to the CommitFest so it doesn't get lost?

https://commitfest.postgresql.org/action/commitfest_view?id=6

Best,

David


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-03-09 16:56:59
Message-ID: 162867791003090856j2de2b447h39a8ee3bf3a188f8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/3/9 David E. Wheeler <david(at)kineticode(dot)com>:
> On Mar 9, 2010, at 6:30 AM, Pavel Stehule wrote:
>
>> this patch contains a string formatting function "format"
>>
>> postgres=# select format('some message: % (current user: %)',
>> current_date, current_user);
>>                     format
>> ------------------------------------------------
>> some message: 2010-03-09 (current user: pavel)
>> (1 row)
>>
>> this patch add new contrib module string functions - contains mainly
>> sprintf function:
>
> Seems useful. Add it to the CommitFest so it doesn't get lost?
>
>  https://commitfest.postgresql.org/action/commitfest_view?id=6

https://commitfest.postgresql.org/action/patch_view?id=287

it is there

Pavel

>
> Best,
>
> David
>


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-03-09 18:05:49
Message-ID: 4B968DFD.6090501@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule wrote:
> Hello
>
> this patch contains a string formatting function "format"
>
Hi Pavel,

This is supercool. Haven't tried it out yet but surely will, thanks!

Yeb Havinga


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-03-09 18:25:46
Message-ID: b42b73151003091025o5dd014a7r387b36fb497a76a2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 9, 2010 at 9:30 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> postgres=# select concat('ahaha',10,null,current_date, true);
>         concat
> ------------------------
>  ahaha,10,,2010-03-09,t

why are there commas in the output?

merlin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-03-09 18:31:30
Message-ID: 162867791003091031q724dbd60s14edca2aa4a7189a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/3/9 Merlin Moncure <mmoncure(at)gmail(dot)com>:
> On Tue, Mar 9, 2010 at 9:30 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> postgres=# select concat('ahaha',10,null,current_date, true);
>>         concat
>> ------------------------
>>  ahaha,10,,2010-03-09,t
>
> why are there commas in the output?
>

I though so comma can be default separator - but I see - it is wrong -
separator have to be empty string.

Pavel

> merlin
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-03-09 18:45:59
Message-ID: 162867791003091045i77e28cd2i5d0951250fbf8e2a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

updated version, concat function doesn't use separator

Pavel

2010/3/9 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> 2010/3/9 Merlin Moncure <mmoncure(at)gmail(dot)com>:
>> On Tue, Mar 9, 2010 at 9:30 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> postgres=# select concat('ahaha',10,null,current_date, true);
>>>         concat
>>> ------------------------
>>>  ahaha,10,,2010-03-09,t
>>
>> why are there commas in the output?
>>
>
> I though so comma can be default separator - but I see - it is wrong -
> separator have to be empty string.
>
> Pavel
>
>
>> merlin
>>
>

Attachment Content-Type Size
stringfunc.diff application/octet-stream 40.4 KB

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-03-09 19:30:29
Message-ID: b42b73151003091130l2fd4392at2aeafeda6d640b63@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 9, 2010 at 1:45 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> updated version, concat function doesn't use separator

btw...very cool stuff. I took a brief look at the sprintf
implementation. The main switch:
switch (pdesc.field_type)

It looks like format codes we choose not to support (like %p) are
silently ignored. Is this the correct behavior?

merlin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-03-10 07:39:00
Message-ID: 162867791003092339t5c4f1c4bg3e47ca57895d66cf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/3/9 Merlin Moncure <mmoncure(at)gmail(dot)com>:
> On Tue, Mar 9, 2010 at 1:45 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> updated version, concat function doesn't use separator
>
> btw...very cool stuff.  I took a brief look at the sprintf
> implementation.  The main switch:
> switch (pdesc.field_type)
>
> It looks like format codes we choose not to support (like %p) are
> silently ignored.  Is this the correct behavior?

it could be enhanced. sprintf is little bit baroque function and I am
not able to take all details. Please, add comment with proposals for
change.

Pavel

>
> merlin
>


From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-08 07:15:38
Message-ID: 20100708161538.FFF5.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> updated version, concat function doesn't use separator

BTW, didn't you forget stringfunc.sql.in for contrib/stringfunc ?
So, I have not check stringfunc module yet.

I reviewed your patch, and format() in the core is almost ok. It's very cool!
On the other hand, contrib/stringfunc tries to implement safe-sprintf. It's
very complex, and I have questions about multi-byte character handling in it.

* How to print NULL value.
format() function prints NULL as "NULL", but RAISE statement in PL/pgSQL
does as "<NULL>". Do we need the same result for them?

postgres=# SELECT format('% vs %', 'NULL', NULL);
format
--------------
NULL vs NULL
(1 row)

postgres=# DO $$ BEGIN RAISE NOTICE '% vs %', 'NULL', NULL; END; $$;
NOTICE: NULL vs <NULL>
DO

* Error messages: "too few/many parameters"
For the same reason, "too few/many parameters specified for format()"
might be better for the messages.

For RAISE in PL/pgSQL:
ERROR: too few parameters specified for RAISE
ERROR: too many parameters specified for RAISE

* Why do you need convert multi-byte characters to wide char?
Length specifier in stringfunc_sprintf() means "character length".
But is pg_encoding_mbcliplen() enough for the purpose?

* Character-length vs. disp-length in length specifier for sprintf()
For example, '%10s' for sprintf() means "10 characters" in the code.
But there might be usages to format text values for display. In such
case, display length might be better for the length specifier.
How about having both "s" and "S"?
"%10s" -- 10 characters
"%10S" -- 10 disp length; we could use pg_dsplen() for the purpse.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-08 11:09:41
Message-ID: AANLkTimI35B5Z6cUTfqF8CvYdNhyCJR_2Z4zPP_h1PZX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/7/8 Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:
>
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>> updated version, concat function doesn't use separator
>
> BTW, didn't you forget stringfunc.sql.in for contrib/stringfunc ?
> So, I have not check stringfunc module yet.

sorry, attached fixed patch
>
> I reviewed your patch, and format() in the core is almost ok. It's very cool!
> On the other hand, contrib/stringfunc tries to implement safe-sprintf. It's
> very complex, and I have questions about multi-byte character handling in it.
>

I use same mechanism as RAISE statement does. And it working for mer

postgres=# select sprintf('žlutý%dkůň',10);
sprintf
------------
žlutý10kůň
(1 row)

Time: 0,647 ms

postgres=# select sprintf('%s žlutý kůň','příliš');
sprintf
------------------
příliš žlutý kůň
(1 row)

Time: 11,017 ms
postgres=# select sprintf('%10s žlutý kůň','příliš');
sprintf
----------------------
příliš žlutý kůň
(1 row)

Time: 0,439 ms

> * How to print NULL value.
> format() function prints NULL as "NULL", but RAISE statement in PL/pgSQL
> does as "<NULL>". Do we need the same result for them?

I prefer just NULL. You can add "<" and ">" simple if you want. But
removing is little bit dificult.

postgres=# select sprintf('%s', coalesce(NULL, '<NULL>'));
sprintf
---------
<NULL>
(1 row)

maybe some GUC variable

stringfunc.null_string = '<NULL>' in future??

>
>    postgres=# SELECT format('% vs %', 'NULL', NULL);
>        format
>    --------------
>     NULL vs NULL
>    (1 row)
>
>    postgres=# DO $$ BEGIN RAISE NOTICE '% vs %', 'NULL', NULL; END; $$;
>    NOTICE:  NULL vs <NULL>
>    DO
>
> * Error messages: "too few/many parameters"
>  For the same reason, "too few/many parameters specified for format()"
>  might be better for the messages.
>
>  For RAISE in PL/pgSQL:
>    ERROR:  too few parameters specified for RAISE
>    ERROR:  too many parameters specified for RAISE

ook, I agree

>
> * Why do you need convert multi-byte characters to wide char?
> Length specifier in stringfunc_sprintf() means "character length".
> But is pg_encoding_mbcliplen() enough for the purpose?
>

No, I need it. I use a swprintf function - for output of formated
strings - and there are not some sprintf function for multibyte chars
:(. Without this function I don't need a multibyte->widechars
conversion, but sprintf function will be much more larger and complex.

> * Character-length vs. disp-length in length specifier for sprintf()
> For example, '%10s' for sprintf() means "10 characters" in the code.
> But there might be usages to format text values for display. In such
> case, display length might be better for the length specifier.
> How about having both "s" and "S"?
>    "%10s" -- 10 characters
>    "%10S" -- 10 disp length; we could use pg_dsplen() for the purpse.

it is independent, because I use swprintf function

postgres=# select length(sprintf('%5s', 'ščř'));
length
--------
5
(1 row)

Time: 45,485 ms
postgres=# select length(sprintf('%5s', 'abc'));
length
--------
5
(1 row)

Time: 0,499 ms

so it is equal to using a pg_dsplen()

probably original one byte behave have sense for bytea data type. But
I am not sure if we would to complicate this function for binary data.

>
> Regards,

Thank you very much for review

Pavel Stehule
> ---
> Takahiro Itagaki
> NTT Open Source Software Center
>
>
>

Attachment Content-Type Size
stringfunc.diff application/octet-stream 41.8 KB

From: Takahiro Itagaki <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-09 02:36:13
Message-ID: AANLkTikQ4kApIRtg1mCHQ3nnX9PuEgzbXYOAhe5D5NWC@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/8 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> sorry, attached fixed patch

Make installcheck for contrib/stringfunc is broken.
Please run regression test with --enable-cassert build.
test stringfunc ... TRAP:
FailedAssertion("!(!lc_ctype_is_c())", File: "mbutils.c", Line: 715)
LOG: server process (PID 15121) was terminated by signal 6: Aborted

This patch contains several functions.
- format(fmt text, VARIADIC args "any")
- sprintf(fmt text, VARIADIC args "any")
- concat(VARIADIC args "any")
- concat_ws(separator text, VARIADIC args "any")
- concat_json(VARIADIC args "any")
- concat_sql(VARIADIC args "any")
- rvrs(str text)
- left(str text, n int)
- right(str text, n int)

The first one is in the core, and others are in contrib/stringfunc.
But I think almost
all of them should be in the core, because users want to write portable SQLs.
Contrib modules are not always available. Note that concat() is
supported by Oracle,
MySQL, and DB2. Also left() and right() are supported by MySQL, DB2,
and SQL Server.

Functions that depend on GUC settings should be marked as VOLATILE
instead of IMMUTABLE. I think format(), sprintf(), and all of
concat()s should be
volatile because at least timestamp depends on datestyle parameter.

concat_ws() and rvrs() should be renamed to non-abbreviated forms.
How about concat_with_sep() and reverse() ?

I think we should avoid concat_json() at the moment because there is another
development project for JSON support. The result type will be JOIN type rather
than text then.

I'm not sure usefulness of concat_sql(). Why don't you just quote all values
with quotes and separate them with comma?

>> format() function prints NULL as "NULL", but RAISE statement in PL/pgSQL
>> does as "<NULL>".
> I prefer just NULL.
> maybe some GUC variable
> stringfunc.null_string = '<NULL>' in future??

We have some choices for NULL representation. For example, empty string,
NULL, <NULL>, or (null) . What will be our choice? Each of them looks
equally reasonable for me. GUC idea is also good because we need to
mark format() as VOLATILE anyway. We have nothing to lose.

---
Takahiro Itagaki


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-09 07:40:48
Message-ID: AANLkTinlfI8-bZctFUyplPr0-AiSKtamjlvoM-i2268S@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

hello

2010/7/9 Takahiro Itagaki <itagaki(dot)takahiro(at)gmail(dot)com>:
> 2010/7/8 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>> sorry, attached fixed patch
>
> Make installcheck for contrib/stringfunc is broken.
> Please run regression test with --enable-cassert build.
>  test stringfunc           ... TRAP:
> FailedAssertion("!(!lc_ctype_is_c())", File: "mbutils.c", Line: 715)
>  LOG:  server process (PID 15121) was terminated by signal 6: Aborted
>

> This patch contains several functions.
> - format(fmt text, VARIADIC args "any")
> - sprintf(fmt text, VARIADIC args "any")
> - concat(VARIADIC args "any")
> - concat_ws(separator text, VARIADIC args "any")
> - concat_json(VARIADIC args "any")
> - concat_sql(VARIADIC args "any")
> - rvrs(str text)
> - left(str text, n int)
> - right(str text, n int)
>
> The first one is in the core, and others are in contrib/stringfunc.
> But I think almost
> all of them should be in the core, because users want to write portable SQLs.
> Contrib modules are not always available.  Note that concat() is
> supported by Oracle,
> MySQL, and DB2. Also left() and right() are supported by MySQL, DB2,
> and SQL Server.
>
> Functions that depend on GUC settings should be marked as VOLATILE
> instead of IMMUTABLE. I think format(), sprintf(), and all of
> concat()s should be
> volatile because at least timestamp depends on datestyle parameter.
>

ok, I'll fix it

> concat_ws() and rvrs() should be renamed to non-abbreviated forms.
> How about concat_with_sep() and reverse() ?
>

I used a well known names - concat_ws (MySQL) and rvrs (Oracle rdbms),
I like concat_ws - concat_with_sep is maybe too long. rvrs is too
short, so I'll rename it to reverse - ok?

> I think we should avoid concat_json() at the moment because there is another
> development project for JSON support. The result type will be JOIN type rather
> than text then.
>

ok

> I'm not sure usefulness of concat_sql(). Why don't you just quote all values
> with quotes and separate them with comma?
>

concat_xxx functions are helpers to serialisation. So when when you
would to generate INSERT statements for some export, and you cannot
use a COPY statement, you can do

FOR r IN
SELECT ....
LOOP
RETURN NEXT 'INSERT INTO tab(..) VALUES (' || concat_sql(r.a, r.b, r.c, ... )
END LOOP;
RETURN;

you don't need to solve anything and output is well formated SQL. Some
databases dislike quoted numeric values - and quoted nums can be
sonfusing

>>> format() function prints NULL as "NULL", but RAISE statement in PL/pgSQL
>>> does as "<NULL>".
>> I prefer just NULL.
>> maybe some GUC variable
>> stringfunc.null_string = '<NULL>' in future??
>
> We have some choices for NULL representation. For example, empty string,
> NULL, <NULL>, or (null) . What will be our choice?   Each of them looks
> equally reasonable for me. GUC idea is also good because we need to
> mark format() as VOLATILE anyway. We have nothing to lose.
>

Can ve to solve it other patch? I know to aversion core hackers to new
GUC. Now I propose just "NULL". The GUC for NULL representation has
bigger consequences - probably have to related to RAISE statement, and
to proposed functions to_string, to_array.

> ---
> Takahiro Itagaki
>

Thank You very much, I'do fix it

Pavel


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-09 09:12:19
Message-ID: AANLkTilCHWtDm5C-EY83WW9hNHyZwxs25CiAhmPOVolS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I am sending a actualised patch

* removed concat_json
* renamed function rvsr to reverse
* functions format, sprintf and concat* are stable now (as to_char for example)

2010/7/9 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> hello
>
> 2010/7/9 Takahiro Itagaki <itagaki(dot)takahiro(at)gmail(dot)com>:
>> 2010/7/8 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>>> sorry, attached fixed patch
>>
>> Make installcheck for contrib/stringfunc is broken.
>> Please run regression test with --enable-cassert build.
>>  test stringfunc           ... TRAP:
>> FailedAssertion("!(!lc_ctype_is_c())", File: "mbutils.c", Line: 715)
>>  LOG:  server process (PID 15121) was terminated by signal 6: Aborted

it worked on my station :( - Fedora 64bit

can you send a backtrace, please

Regards

Pavel Stehule

>>
>
>
>> This patch contains several functions.
>> - format(fmt text, VARIADIC args "any")
>> - sprintf(fmt text, VARIADIC args "any")
>> - concat(VARIADIC args "any")
>> - concat_ws(separator text, VARIADIC args "any")
>> - concat_json(VARIADIC args "any")
>> - concat_sql(VARIADIC args "any")
>> - rvrs(str text)
>> - left(str text, n int)
>> - right(str text, n int)
>>
>> The first one is in the core, and others are in contrib/stringfunc.
>> But I think almost
>> all of them should be in the core, because users want to write portable SQLs.
>> Contrib modules are not always available.  Note that concat() is
>> supported by Oracle,
>> MySQL, and DB2. Also left() and right() are supported by MySQL, DB2,
>> and SQL Server.
>>
>> Functions that depend on GUC settings should be marked as VOLATILE
>> instead of IMMUTABLE. I think format(), sprintf(), and all of
>> concat()s should be
>> volatile because at least timestamp depends on datestyle parameter.
>>
>
> ok, I'll fix it
>
>> concat_ws() and rvrs() should be renamed to non-abbreviated forms.
>> How about concat_with_sep() and reverse() ?
>>
>
> I used a well known names - concat_ws (MySQL) and rvrs (Oracle rdbms),
> I like concat_ws - concat_with_sep is maybe too long. rvrs is too
> short, so I'll rename it to reverse - ok?
>
>> I think we should avoid concat_json() at the moment because there is another
>> development project for JSON support. The result type will be JOIN type rather
>> than text then.
>>
>
> ok
>
>> I'm not sure usefulness of concat_sql(). Why don't you just quote all values
>> with quotes and separate them with comma?
>>
>
> concat_xxx functions are helpers to serialisation. So when when you
> would to generate INSERT statements for some export, and you cannot
> use a COPY statement, you can do
>
> FOR r IN
>  SELECT ....
> LOOP
>  RETURN NEXT 'INSERT INTO tab(..) VALUES (' || concat_sql(r.a, r.b, r.c, ... )
> END LOOP;
> RETURN;
>
> you don't need to solve anything and output is well formated SQL. Some
> databases dislike quoted numeric values - and quoted nums can be
> sonfusing
>
>
>>>> format() function prints NULL as "NULL", but RAISE statement in PL/pgSQL
>>>> does as "<NULL>".
>>> I prefer just NULL.
>>> maybe some GUC variable
>>> stringfunc.null_string = '<NULL>' in future??
>>
>> We have some choices for NULL representation. For example, empty string,
>> NULL, <NULL>, or (null) . What will be our choice?   Each of them looks
>> equally reasonable for me. GUC idea is also good because we need to
>> mark format() as VOLATILE anyway. We have nothing to lose.
>>
>
> Can ve to solve it other patch? I know to aversion core hackers to new
> GUC. Now I propose just "NULL". The GUC for NULL representation has
> bigger consequences - probably have to related to RAISE statement, and
> to proposed functions to_string, to_array.
>
>> ---
>> Takahiro Itagaki
>>
>
> Thank You very much, I'do fix it
>
> Pavel
>

Attachment Content-Type Size
stringfunc.diff application/octet-stream 38.8 KB

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-12 02:30:40
Message-ID: AANLkTinMM-TNC8naXfj7Tag3Y25wxIxKsVd-fpmuYr94@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/9 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> I am sending a actualised patch
> * removed concat_json
> * renamed function rvsr to reverse
> * functions format, sprintf and concat* are stable now (as to_char for example)

I'd like to move all proposed functions into the core, and not to add
contrib/stringfunc.
I think those functions are very useful and worth adding in core.
* concat(), concat_ws(), reverse(), left() and right() are ready to commit.
* format() is almost ready, except consensus of NULL representation.
* sprintf() is also useful, but we cannot use swprintf() in it because
there are many problems in converting to wide chars. We should
develop mbchar-aware version of %s formatter.
* IMHO, concat_sql() has very limited use cases. Boolean and numeric
values are not quoted, but still need product-specific conversions because
some DBs prefer 1/0 instead of true/false.
Also, dblink_build_sql_insert() provides similar functionality. Will
we have both?

> it worked on my station :( - Fedora 64bit
Still failed :-( I used UTF8 database with *locale=C* on 64bit Linux.
char2wchar() doesn't seem to work on C locale. We should avoid
using the function and converting mb chars to wide chars.

select sprintf('>>>%10s %10d<<<', 'hello', 10);
! server closed the connection unexpectedly
TRAP: FailedAssertion("!(!lc_ctype_is_c())", File: "mbutils.c", Line: 715)

#0 0x00000038c0c332f5 in raise () from /lib64/libc.so.6
#1 0x00000038c0c34b20 in abort () from /lib64/libc.so.6
#2 0x00000000006e951d in ExceptionalCondition (conditionName=<value
optimized out>, errorType=<value optimized out>, fileName=<value
optimized out>,
lineNumber=<value optimized out>) at assert.c:57
#3 0x00000000006fa8bf in char2wchar (to=0x1daf188 L"", tolen=16,
from=0x1da95b0 "%*s", fromlen=3) at mbutils.c:715
#4 0x00007f8e8c436d37 in stringfunc_sprintf (fcinfo=0x7fff9bdcd4b0)
at stringfunc.c:463

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-12 02:58:41
Message-ID: AANLkTim5xfU0iY4OcFD_JPu_zTZIeky_08oWsp3oZnmC@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jul 11, 2010 at 10:30 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> 2010/7/9 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>> I am sending a actualised patch
>> * removed concat_json
>> * renamed function rvsr to reverse
>> * functions format, sprintf and concat* are stable now (as to_char for example)
>
> I'd like to move all proposed functions into the core, and not to add
> contrib/stringfunc.
> I think those functions are very useful and worth adding in core.
> * concat(), concat_ws(), reverse(), left() and right() are ready to commit.
> * format() is almost ready, except consensus of NULL representation.
> * sprintf() is also useful, but we cannot use swprintf() in it because
>  there are many problems in converting to wide chars. We should
>  develop mbchar-aware version of %s formatter.
> * IMHO, concat_sql() has very limited use cases. Boolean  and numeric
>  values are not quoted, but still need product-specific conversions because
>  some DBs prefer 1/0 instead of true/false.
>  Also, dblink_build_sql_insert() provides similar functionality. Will
> we have both?

I'm all in favor of putting such things in core as are supported by
multiple competing products, but is that really true for all of these?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-12 03:32:24
Message-ID: AANLkTika4JCT659Z6eqHZ88w0GbBxO5rND1YWIUKtZdt@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/12 Robert Haas <robertmhaas(at)gmail(dot)com>:
> I'm all in favor of putting such things in core as are supported by
> multiple competing products, but is that really true for all of these?

- concat() : MySQL, Oracle, DB2
- concat_ws() : MySQL,
- left(), right() : MySQL, SQL Server, DB2
- reverse() : MySQL, SQL Server, Oracle (as utl_raw.reverse)

concat_sql(), format(), and sprintf() will be our unique features.

--
Itagaki Takahiro


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-12 08:47:22
Message-ID: AANLkTinz_NOfHoFwqepQJUjSWgJVLtfVp4AVxMqS3O98@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/7/12 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Sun, Jul 11, 2010 at 10:30 PM, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> 2010/7/9 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>>> I am sending a actualised patch
>>> * removed concat_json
>>> * renamed function rvsr to reverse
>>> * functions format, sprintf and concat* are stable now (as to_char for example)
>>
>> I'd like to move all proposed functions into the core, and not to add
>> contrib/stringfunc.
>> I think those functions are very useful and worth adding in core.
>> * concat(), concat_ws(), reverse(), left() and right() are ready to commit.
>> * format() is almost ready, except consensus of NULL representation.

what solution for this moment - be a consistent with RAISE statement ???

>> * sprintf() is also useful, but we cannot use swprintf() in it because
>>  there are many problems in converting to wide chars. We should
>>  develop mbchar-aware version of %s formatter.

ook I'll work on this - but there is same problem with NULL like a
format function

>> * IMHO, concat_sql() has very limited use cases. Boolean  and numeric
>>  values are not quoted, but still need product-specific conversions because
>>  some DBs prefer 1/0 instead of true/false.
>>  Also, dblink_build_sql_insert() provides similar functionality. Will
>> we have both?
>

I can remove it - when I checked it I found so it doesn't well
serialize PostgreSQL specific types as array or record, so I am not
against to remove it now.

> I'm all in favor of putting such things in core as are supported by
> multiple competing products, but is that really true for all of these?
>

I have not a strong opinion on this - I would to like see reverse and
format in core. But I think, so contrib is enought. Can somebody from
commiters to decide it, please? Any sprintf implemenation needs lots
of code - minimally for this function I prefer contrib for this
function.

Regards

Pavel Stehule

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-12 12:59:33
Message-ID: 2FC90519-3A14-4F22-B0F8-8087B9850EA6@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 11, 2010, at 10:32 PM, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> 2010/7/12 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> I'm all in favor of putting such things in core as are supported by
>> multiple competing products, but is that really true for all of these?
>
> - concat() : MySQL, Oracle, DB2
> - concat_ws() : MySQL,
> - left(), right() : MySQL, SQL Server, DB2
> - reverse() : MySQL, SQL Server, Oracle (as utl_raw.reverse)
>
> concat_sql(), format(), and sprintf() will be our unique features.

I think concat, left, right, and reverse should go into core, and perhaps format also. The rest sound like contrib material to me.

...Robert


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Itagaki Takahiro" <itagaki(dot)takahiro(at)gmail(dot)com>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Merlin Moncure" <mmoncure(at)gmail(dot)com>, "Takahiro Itagaki" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-12 17:17:25
Message-ID: 4C3B07D50200002500033463@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> wrote:

> I'd like to move all proposed functions into the core, and not to
> add contrib/stringfunc.

> Still failed :-( I used UTF8 database with *locale=C* on 64bit
> Linux.
> char2wchar() doesn't seem to work on C locale. We should avoid
> using the function and converting mb chars to wide chars.
>
> select sprintf('>>>%10s %10d<<<', 'hello', 10);
> ! server closed the connection unexpectedly
> TRAP: FailedAssertion("!(!lc_ctype_is_c())", File: "mbutils.c",
> Line: 715)
>
> #0 0x00000038c0c332f5 in raise () from /lib64/libc.so.6
> #1 0x00000038c0c34b20 in abort () from /lib64/libc.so.6
> #2 0x00000000006e951d in ExceptionalCondition
> (conditionName=<value optimized out>, errorType=<value optimized
> out>, fileName=<value optimized out>, lineNumber=<value optimized
> out>) at assert.c:57
> #3 0x00000000006fa8bf in char2wchar (to=0x1daf188 L"", tolen=16,
> from=0x1da95b0 "%*s", fromlen=3) at mbutils.c:715
> #4 0x00007f8e8c436d37 in stringfunc_sprintf
> (fcinfo=0x7fff9bdcd4b0)
> at stringfunc.c:463

Based on this and subsequent posts, I've changed this patch's status
to "Waiting on Author".

-Kevin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-12 17:36:18
Message-ID: AANLkTik6VG57RtLFDWKDNi1NAiUIhrhHVQlJnzP6KlNs@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/12 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>:
> Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>
>> I'd like to move all proposed functions into the core, and not to
>> add contrib/stringfunc.
>
>> Still failed :-(  I used UTF8 database with *locale=C* on 64bit
>> Linux.
>> char2wchar() doesn't seem to work on C locale. We should avoid
>> using the function and converting mb chars to wide chars.
>>
>>   select sprintf('>>>%10s %10d<<<', 'hello', 10);
>> ! server closed the connection unexpectedly
>> TRAP: FailedAssertion("!(!lc_ctype_is_c())", File: "mbutils.c",
>> Line: 715)
>>
>> #0  0x00000038c0c332f5 in raise () from /lib64/libc.so.6
>> #1  0x00000038c0c34b20 in abort () from /lib64/libc.so.6
>> #2  0x00000000006e951d in ExceptionalCondition
>> (conditionName=<value optimized out>, errorType=<value optimized
>> out>, fileName=<value optimized out>, lineNumber=<value optimized
>> out>) at assert.c:57
>> #3  0x00000000006fa8bf in char2wchar (to=0x1daf188 L"", tolen=16,
>> from=0x1da95b0 "%*s", fromlen=3) at mbutils.c:715
>> #4  0x00007f8e8c436d37 in stringfunc_sprintf
>> (fcinfo=0x7fff9bdcd4b0)
>> at stringfunc.c:463
>
> Based on this and subsequent posts, I've changed this patch's status
> to "Waiting on Author".

ook - I'll send actualised version tomorrow

Pavel

>
> -Kevin
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-12 20:40:27
Message-ID: AANLkTin1R0jYklt2XbiODiko-XDOTFmY_iOfAtYWsszJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

so this is actualised patch:

* concat_sql removed
* left, right, reverse and concat are in core
* printf and concat_ws are in contrib
* format show "<NULL>" as NULL string
* removed an using of wide chars

todo:

NULL handling for printf function

Query:
what is corect result for

* printf(">>%3.2d<<", NULL) ??
* printf(">>%3.2s", NULL) ??

Regards

Pavel Stehule

2010/7/12 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> 2010/7/12 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> I'm all in favor of putting such things in core as are supported by
>> multiple competing products, but is that really true for all of these?
>
> - concat() : MySQL, Oracle, DB2
> - concat_ws() : MySQL,
> - left(), right() : MySQL, SQL Server, DB2
> - reverse() : MySQL, SQL Server, Oracle (as utl_raw.reverse)
>
> concat_sql(), format(), and sprintf() will be our unique features.
>
> --
> Itagaki Takahiro
>

Attachment Content-Type Size
stringfunc.diff application/octet-stream 39.6 KB

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-13 06:50:54
Message-ID: AANLkTik5V1T2ceoDvYgLlrBUTjsva8FvGUxGLn0yKYGy@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/13 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> so this is actualised patch:
> * concat_sql removed
> * left, right, reverse and concat are in core
> * printf and concat_ws are in contrib
> * format show "<NULL>" as NULL string
> * removed an using of wide chars

I think function codes in the core (concat, format, left, right,
and reverse) are ready for committers. They also have docs, but
the names are not listed in Index page (bookindex.html).
Please add
<indexterm>
<primary>funcname</primary>
</indexterm>
in func.sgml for each new function.

However, I have a couple of comments to stringfunc module. sprintf()
and concat_ws() are not installed by default, but provided by the module.

> todo:
> NULL handling for printf function

I like <NULL> for null arguments. It is just same as format() and RAISE.

=== Questions ===
* concat_ws() transforms NULLs into empty strings.
Is it an intended behavior and compatible with MySQL?
Note that string_agg() doesn't add separators to NULLs.

=# SELECT coalesce(concat_ws(',', 'A', NULL, 'B'), '(null)');
coalesce
----------
A,,B
(1 row)

* concat_ws() returns NULL when the separator is NULL.
Is it an intended behavior and compatible with MySQL?

=# SELECT coalesce(concat_ws(NULL, 'A', NULL, 'B'), '(null)');
coalesce
----------
(null)
(1 row)

=== Trivial issues ===
* Some function prototypes are declared but not used.
We can just remove them.
- mb_string_info()
- stringfunc_concat(PG_FUNCTION_ARGS);
- stringfunc_left(PG_FUNCTION_ARGS);
- stringfunc_right(PG_FUNCTION_ARGS);
- stringfunc_reverse(PG_FUNCTION_ARGS);

* Some error messages need to be improved.
For example, "1th" is wrong.
=# select sprintf('>>>%*s<<<', NULL, 'abcdef');
ERROR: null value not allowed
HINT: width (1th) arguments is NULL

* sprintf() has some typos in error messages
For example, "sprinf".

--
Itagaki Takahiro


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-13 07:54:30
Message-ID: AANLkTimkwWOHbLBGMxUPVYyO-VsQLVpQp1kyaavOnjU9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/7/13 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> 2010/7/13 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>> so this is actualised patch:
>> * concat_sql removed
>> * left, right, reverse and concat are in core
>> * printf and concat_ws are in contrib
>> * format show "<NULL>" as NULL string
>> * removed an using of wide chars
>
> I think function codes in the core (concat, format, left, right,
> and reverse) are ready for committers. They also have docs, but
> the names are not listed in Index page (bookindex.html).
> Please add
>   <indexterm>
>    <primary>funcname</primary>
>   </indexterm>
> in func.sgml for each new function.
>

fixed
> However, I have a couple of comments to stringfunc module. sprintf()
> and concat_ws() are not installed by default, but provided by the module.
>
>> todo:
>> NULL handling for printf function
>
> I like <NULL> for null arguments. It is just same as format() and RAISE.

done

>
> === Questions ===
> * concat_ws() transforms NULLs into empty strings.
> Is it an intended behavior and compatible with MySQL?
> Note that string_agg() doesn't add separators to NULLs.
>

no I was wrong - original concat_ws just ignore NULL - fixed, now
concat_ws has same behave like original.

>  =# SELECT coalesce(concat_ws(',', 'A', NULL, 'B'), '(null)');
>   coalesce
>  ----------
>   A,,B
>  (1 row)
>
> * concat_ws() returns NULL when the separator is NULL.
> Is it an intended behavior and compatible with MySQL?
>
>  =# SELECT coalesce(concat_ws(NULL, 'A', NULL, 'B'), '(null)');
>   coalesce
>  ----------
>   (null)
>  (1 row)
>
> === Trivial issues ===
> * Some function prototypes are declared but not used.
>  We can just remove them.
>  - mb_string_info()
>  - stringfunc_concat(PG_FUNCTION_ARGS);
>  - stringfunc_left(PG_FUNCTION_ARGS);
>  - stringfunc_right(PG_FUNCTION_ARGS);
>  - stringfunc_reverse(PG_FUNCTION_ARGS);
>
> * Some error messages need to be improved.
>  For example, "1th" is wrong.
>    =# select sprintf('>>>%*s<<<', NULL, 'abcdef');
>    ERROR:  null value not allowed
>    HINT:  width (1th) arguments is NULL

have you a some idea about it?

>
> * sprintf() has some typos in error messages
>  For example, "sprinf".
>

fixed

> --
> Itagaki Takahiro
>

Regards

Pavel

Attachment Content-Type Size
stringfunc.diff application/octet-stream 40.0 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-16 19:08:39
Message-ID: AANLkTikDwBjBmeuV-xwyTDs8VPY1LHUUxnz7eJ7S6VYd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I have a one idea nonstandard enhancing of sprintf - relatie often job
is a quoting in PostgreSQL. So sprintf should have a special formats
for quoted values. What do you think about

%lq ... literal quoted
%iq ... ident quoted

??

Regards

Pavel

2010/7/13 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> Hello
>
> 2010/7/13 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
>> 2010/7/13 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>>> so this is actualised patch:
>>> * concat_sql removed
>>> * left, right, reverse and concat are in core
>>> * printf and concat_ws are in contrib
>>> * format show "<NULL>" as NULL string
>>> * removed an using of wide chars
>>
>> I think function codes in the core (concat, format, left, right,
>> and reverse) are ready for committers. They also have docs, but
>> the names are not listed in Index page (bookindex.html).
>> Please add
>>   <indexterm>
>>    <primary>funcname</primary>
>>   </indexterm>
>> in func.sgml for each new function.
>>
>
> fixed
>> However, I have a couple of comments to stringfunc module. sprintf()
>> and concat_ws() are not installed by default, but provided by the module.
>>
>>> todo:
>>> NULL handling for printf function
>>
>> I like <NULL> for null arguments. It is just same as format() and RAISE.
>
> done
>
>>
>> === Questions ===
>> * concat_ws() transforms NULLs into empty strings.
>> Is it an intended behavior and compatible with MySQL?
>> Note that string_agg() doesn't add separators to NULLs.
>>
>
> no I was  wrong - original concat_ws just ignore NULL - fixed, now
> concat_ws has same behave like original.
>
>>  =# SELECT coalesce(concat_ws(',', 'A', NULL, 'B'), '(null)');
>>   coalesce
>>  ----------
>>   A,,B
>>  (1 row)
>>
>> * concat_ws() returns NULL when the separator is NULL.
>> Is it an intended behavior and compatible with MySQL?
>>
>>  =# SELECT coalesce(concat_ws(NULL, 'A', NULL, 'B'), '(null)');
>>   coalesce
>>  ----------
>>   (null)
>>  (1 row)
>>
>> === Trivial issues ===
>> * Some function prototypes are declared but not used.
>>  We can just remove them.
>>  - mb_string_info()
>>  - stringfunc_concat(PG_FUNCTION_ARGS);
>>  - stringfunc_left(PG_FUNCTION_ARGS);
>>  - stringfunc_right(PG_FUNCTION_ARGS);
>>  - stringfunc_reverse(PG_FUNCTION_ARGS);
>>
>> * Some error messages need to be improved.
>>  For example, "1th" is wrong.
>>    =# select sprintf('>>>%*s<<<', NULL, 'abcdef');
>>    ERROR:  null value not allowed
>>    HINT:  width (1th) arguments is NULL
>
> have you a some idea about it?
>
>>
>> * sprintf() has some typos in error messages
>>  For example, "sprinf".
>>
>
> fixed
>
>> --
>> Itagaki Takahiro
>>
>
> Regards
>
> Pavel
>


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-21 03:27:00
Message-ID: AANLkTimuv9Rxu1cRQwQa4pO4Av9kAIwUrW-LzQ8IlZrf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I reviewed the core changes of the patch. I don't think we need
mb_string_info() at all. Instead, we can just call pg_mbxxx() functions.

I rewrote the patch to use pg_mbstrlen_with_len() and pg_mbcharcliplen().
What do you think the changes? It requires re-counting lengths of multi-byte
strings in some cases, but the code will be much simpler and can avoid
allocating length buffers.

I'd like to apply contrib/stringinfo apart from the core changes,
because there seems to be still some idea to improve sprintf().

--
Itagaki Takahiro

Attachment Content-Type Size
stringfunc_core-20100721.diff application/octet-stream 14.7 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-21 06:29:49
Message-ID: AANLkTikmytco6sgwM2jLHLvrrf4uQo9lNoGMMj9KZyyO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/21 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> I reviewed the core changes of the patch. I don't think we need
> mb_string_info() at all. Instead, we can just call pg_mbxxx() functions.
>
> I rewrote the patch to use pg_mbstrlen_with_len() and pg_mbcharcliplen().
> What do you think the changes? It requires re-counting lengths of multi-byte
> strings in some cases, but the code will be much simpler and can avoid
> allocating length buffers.
>

It is a good idea. I see a problem only for "right" function, where
for most common use case a mblen will be called two times. I am not
able to say now, if this can be a performance issue or not. Highly
probably not - only for very large strings.

postgres=# create or replace function randomstr(int) returns text as
$$select string_agg(substring('abcdefghijklmnop' from
trunc(random()*13)::int+1 for 1),'') from generate_series(1,$1) $$
language sql;
CREATE FUNCTION
Time: 27,452 ms

postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5615,061 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5606,937 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5630,771 ms

postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)

Time: 5753,063 ms
postgres=# select count(*) from(select right(randomstr(1000),3) from
generate_series(1,10000))x;
count
-------
10000
(1 row)
Time: 5755,776 ms

It is about 2% slower for UTF8 encoding. So it isn't significant for me.

I agree with your changes. Thank You very much

Regards

Pavel Stehule

> I'd like to apply contrib/stringinfo apart from the core changes,
> because there seems to be still some idea to improve sprintf().
>
> --
> Itagaki Takahiro
>


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-23 02:47:40
Message-ID: AANLkTikdSujjNLfkH2AD8fyfV_ydeqoTHS9tPJYHLu2R@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/21 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> It is about 2% slower for UTF8 encoding. So it isn't significant for me.
> I agree with your changes. Thank You very much

Thanks. The core-part is almost ready to commit.
I'll continue to review the contrib part.

But I found there is a design issue in format() :
Appending a '%' is common use-case, but format() cannot append
% char without any spaces between placeholder and the raw % char.

itagaki=# SELECT format('%%%', 10), format('% %%', 10);
format | format
--------+--------
%10 | 10 %
(1 row)

It is a design issue, and RAISE in PL/pgSQL has the same issue, too.
Do we accept the restriction? Or should we add another escape
syntax and/or placeholder pattern?

--
Itagaki Takahiro


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-23 05:46:40
Message-ID: AANLkTikvPxPEfEL=U=tGNEXs_eiZJ27ih_sD7=-somi9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/23 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> 2010/7/21 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>> It is about 2% slower for UTF8 encoding. So it isn't significant for me.
>> I agree with your changes. Thank You very much
>
> Thanks. The core-part is almost ready to commit.
> I'll continue to review the contrib part.
>
> But I found there is a design issue in format() :
> Appending a '%' is common use-case, but format() cannot append
> % char without any spaces between placeholder and the raw % char.
>
> itagaki=# SELECT format('%%%', 10), format('% %%', 10);
>  format | format
> --------+--------
>  %10    | 10 %
> (1 row)
>
> It is a design issue, and RAISE in PL/pgSQL has the same issue, too.
> Do we accept the restriction? Or should we add another escape
> syntax and/or placeholder pattern?
>

I prefer a current behave - RAISE statement uses same and it is not
reported as bug for ten years, what I read a mailing lists. I would to
have a FORMAT implementation simple as possible.

and there is simple workaround:

postgres=# create or replace function fx()
returns void as $$
begin
raise notice '>>>%<<<', '%';
end;
$$ language plpgsql;
CREATE FUNCTION
Time: 560.063 ms
postgres=# select fx();
NOTICE: >>>%<<<
fx
────

(1 row)

Regards

Pavel Stehule

> --
> Itagaki Takahiro
>


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-23 06:04:26
Message-ID: AANLkTil2s2B1af48PruuF7iQSaPGOhWJfCl2nLmZTqgJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm reviewing contrib part of the string functions patch.

I found an issue in sprintf() to print integer values. In this case,
'l' (for long type) is used on *all* platforms. For example,
SELECT sprintf('%d', 10);
internally uses
appendStringInfo('%ld', (int64) 10)

But there are some platform that requires to use %lld for int64 format, probably
on Windows. That's why we have INT64_FORMAT macro. sprintf() needs to be
adjusted to use INT64_FORMAT or similar portable codes.

Other portion of the patch seems to be OK for me,
unless you have still some idea to extend the feature.

2010/7/17 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> I have a one idea nonstandard enhancing of sprintf - relatie often job
> is a quoting in PostgreSQL. So sprintf should have a special formats
> for quoted values. What do you think about
>
> %lq ... literal quoted
> %iq ... ident quoted

They save some keyboard types to write quote_literal() and quote_ident(), right?
They seem to be useful and reasonable for me. One comment is that you might
want to print NULL values as "NULL" instead of "<NULL>" in such cases.

--
Itagaki Takahiro


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-23 06:22:12
Message-ID: AANLkTi=Wp8J2w3eSqZLURETM2NA1wKWC9wswWqpTAsud@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/7/23 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> I'm reviewing contrib part of the string functions patch.
>
> I found an issue in sprintf() to print integer values. In this case,
> 'l' (for long type) is used on *all* platforms. For example,
>  SELECT sprintf('%d', 10);
> internally uses
>  appendStringInfo('%ld', (int64) 10)
>
> But there are some platform that requires to use %lld for int64 format, probably
> on Windows. That's why we have INT64_FORMAT macro. sprintf() needs to be
> adjusted to use INT64_FORMAT or similar portable codes.

ok, I'll look on it

>
> Other portion of the patch seems to be OK for me,
> unless you have still some idea to extend the feature.
>
> 2010/7/17 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>> I have a one idea nonstandard enhancing of sprintf - relatie often job
>> is a quoting in PostgreSQL. So sprintf should have a special formats
>> for quoted values. What do you think about
>>
>> %lq ... literal quoted
>> %iq ... ident quoted
>
> They save some keyboard types to write quote_literal() and quote_ident(), right?
> They seem to be useful and reasonable for me. One comment is that you might
> want to print NULL values as "NULL" instead of "<NULL>" in such cases.
>

yes, it is good note

Thank You very much

Regards

Pavel Stehule

> --
> Itagaki Takahiro
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-24 15:17:33
Message-ID: AANLkTi=qb5G4HwkagzWo4Yvbs_6-d07G79tFE6=bVoQC@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/7/23 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> I'm reviewing contrib part of the string functions patch.
>
> I found an issue in sprintf() to print integer values. In this case,
> 'l' (for long type) is used on *all* platforms. For example,
>  SELECT sprintf('%d', 10);
> internally uses
>  appendStringInfo('%ld', (int64) 10)
>

> But there are some platform that requires to use %lld for int64 format, probably
> on Windows. That's why we have INT64_FORMAT macro. sprintf() needs to be
> adjusted to use INT64_FORMAT or similar portable codes.
>

fixed - it depends on INT64_FORMAT now.

> Other portion of the patch seems to be OK for me,
> unless you have still some idea to extend the feature.
>
> 2010/7/17 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>> I have a one idea nonstandard enhancing of sprintf - relatie often job
>> is a quoting in PostgreSQL. So sprintf should have a special formats
>> for quoted values. What do you think about
>>
>> %lq ... literal quoted
>> %iq ... ident quoted
>
> They save some keyboard types to write quote_literal() and quote_ident(), right?
> They seem to be useful and reasonable for me. One comment is that you might
> want to print NULL values as "NULL" instead of "<NULL>" in such cases.
>

NULL is showed as NULL for literal quoting and when ident quoting is
used, then exception is raised.

Maybe last rule is too hard, but it should be a protection before SQL
injection via mal formated SQL

Regards

Pavel

> --
> Itagaki Takahiro
>

Attachment Content-Type Size
stringfunc.diff text/x-patch 41.0 KB

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 03:29:32
Message-ID: AANLkTimyWtTMAmfpEU0UGeQuciZK4D2oQ8YtRT7NWwYc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I merged and enhanced some part of your patch:
- contrib/stringfunc are merged in the core patch
- Old format() is replaced with sprintf(), but the function name is
still format().
- Support %q as alias for %iq.

2010/7/25 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> fixed - it depends on INT64_FORMAT now.
I modified the code a bit not to expect 'll' or 'l'.

> %lq ... literal quoted
> %iq ... ident quoted
I also modified 'q' without specifier, i.e, %q is handled as same as %lq.

>> But I found there is a design issue in format() :
> I prefer a current behave - RAISE statement uses same and it is not
> reported as bug for ten years

I think RAISE is badly designed. Using % as a placeholder has a limitation
to format strings. For example, format() cannot work as concat():
SELECT format('%%', 123, 456) => ERROR

So, my proposal is renaming stringfunc//sprintf() to format(),
and moving it into the core. I think sprintf() is superior to format()
in every aspect; '%s%s' works as concat(), and '%s%%' can append
% without blanks.

Then, concat_ws() will be moved into core because contrib/stringfunc
only has the function now. In addition, I'd like to include the function for
the compatibility to MySQL. Also, concat() and concat_ws() can share
the implementation.

Comments?

--
Itagaki Takahiro

Attachment Content-Type Size
stringfunc_core-20100726.diff application/octet-stream 33.1 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 03:44:28
Message-ID: AANLkTinLDc1YT5hHcDiueY+Eg8vR6pN22ZBqFEqF6LY-@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/26 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> I merged and enhanced some part of your patch:
>  - contrib/stringfunc are merged in the core patch
>  - Old format() is replaced with sprintf(), but the function name is
> still format().
>  - Support %q as alias for %iq.
>
> 2010/7/25 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>> fixed - it depends on INT64_FORMAT now.
> I modified the code a bit not to expect 'll' or 'l'.
>
>> %lq ... literal quoted
>> %iq ... ident quoted
> I also modified 'q' without specifier, i.e, %q is handled as same as %lq.
>
>>> But I found there is a design issue in format() :
>> I prefer a current behave - RAISE statement uses same and it is not
>> reported as bug for ten years
>
> I think RAISE is badly designed. Using % as a placeholder has a limitation
> to format strings. For example, format() cannot work as concat():
>  SELECT format('%%', 123, 456) => ERROR
>
> So, my proposal is renaming stringfunc//sprintf() to format(),
> and moving it into the core. I think sprintf() is superior to format()
> in every aspect; '%s%s' works as concat(), and '%s%%' can append
> % without blanks.
>

Sorry, I am strong against. Using a format just for string string
concation is bad idea - there are a concat function, there are ||
operator. Look on complexity of format/RAISE and sprintf. More - it
can be strange, when we have a "format" function and it is almost
"sprintf". I still prefer simplicity of format - you have a true - it
has a issue, but there are simply workaround

format('%', 123||345)

format is very simple - but usually you don't need to specify a with,
a precision.

sprintf has some issue based on common sprintf implementation and
expecting too. For example a precision is used very dynamically - it
has a different sense for integers and for floats, so I wouldn't have
a sprintf in core.

> Then, concat_ws() will be moved into core because contrib/stringfunc
> only has the function now. In addition, I'd like to include the function for
> the compatibility to MySQL. Also, concat() and concat_ws() can share
> the implementation.
>
> Comments?

I disagree - please, return to prev variant

Regards

Pavel Stehule
>
> --
> Itagaki Takahiro
>


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 04:00:50
Message-ID: AANLkTimowzPkW4E0rRF0iWzDJj5tJdUjUnh0L06porde@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/26 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> sprintf has some issue based on common sprintf implementation and
> expecting too. For example a precision is used very dynamically - it
> has a different sense for integers and for floats, so I wouldn't have
> a sprintf in core.

Why do we need to have similar functions in core and contrib?
It will just confuse users. If you want to RAISE-version of format(),
I don't want to have stringfunc in contrib.

sprintf() is cool! So, I'd like to use sprintf() by default rather than
format() which has limited features. Almost all users don't know
well about contrib modules. Books about functions in inter-databases
don't consider about postgres' contrib modules. That's why I want to
move the useful features into core rather than contrib modules.

--
Itagaki Takahiro


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 04:33:47
Message-ID: AANLkTim8SSR5BNjg6fxQJ05eBh4HSz5iF6gT_1zBUj2h@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/26 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> 2010/7/26 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
>> sprintf has some issue based on common sprintf implementation and
>> expecting too. For example a precision is used very dynamically - it
>> has a different sense for integers and for floats, so I wouldn't have
>> a sprintf in core.
>
> Why do we need to have similar functions in core and contrib?
> It will just confuse users. If you want to RAISE-version of format(),
> I don't want to have stringfunc in contrib.
>

:(

please, look back to discus about this module. There was desided, so
"format" will be in core and "sprintf" in contrib. One reason for this
decision was complexity of printf's implementation.

> sprintf() is cool! So, I'd like to use sprintf() by default rather than
> format() which has limited features. Almost all users don't know
> well about contrib modules. Books about functions in inter-databases
> don't consider about postgres' contrib modules. That's why I want to
> move the useful features into core rather than contrib modules.
>

I have a different opinion and I am not alone. sprintf is good for c
language, but it is problematic in scripting environments, where are
not pointers, where we have more info about variables - where we can
use a introspection - it is like dinosaurus in IT. My implementation
is little bit simple, bacause it is use a buildin functionality - but
still it has more then hundred rows. The full implementation has about
thousand rows. More sprintf is little bit slower than format - it have
to do little bit more work - and it can be confusing for people who
doesn't known it well.

for example - sprintf("%d", 10.2) ---> 10.

next - sprintf respect common standard - but this standard doesn't
calculate with PostgreSQL datatypes - there are not support for
"date", "timestemp" for example.

Function format is designed to work with builtin function to_char.
This is simple and full functional combination - I have not a plan to
replace it.

Regards

Pavel Stehule

> --
> Itagaki Takahiro
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 12:02:44
Message-ID: AANLkTinT6yQcr65aVS+JWPDM4OHzOo0GA5rBEE3HYQGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jul 25, 2010 at 11:29 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> I think RAISE is badly designed. Using % as a placeholder has a limitation
> to format strings. For example, format() cannot work as concat():
>  SELECT format('%%', 123, 456) => ERROR

It's hard to argue with this, as far as it goes.

> So, my proposal is renaming stringfunc//sprintf() to format(),
> and moving it into the core. I think sprintf() is superior to format()
> in every aspect; '%s%s' works as concat(), and '%s%%' can append
> % without blanks.

So forget about format() and put sprintf() in contrib/stringfunc.
That's not an argument for putting anything in core. Perhaps such an
argument can be made, but this isn't it.

> Then, concat_ws() will be moved into core because contrib/stringfunc
> only has the function now. In addition, I'd like to include the function for
> the compatibility to MySQL. Also, concat() and concat_ws() can share
> the implementation.

Regardless of where this function ends up, the concat_ws documentation
should contain some mention of the fact that "ws" is intended to mean
"with separator", and that the naming is chosen for compatibility with
MySQL. As for where to put it, I see no terribly compelling reason
why it needs to be in core. You can write array_to_string(array[txt1,
txt2, txt3], sep) and get the same effect as concat_ws(sep, txt1,
txt2, txt3). I don't really want to start maintaining duplicate
functionality for things like this, especially since MySQL users will
no doubt expect that our implementation will be bug-compatible. If
the output isn't identical to what MySQL does for every set of
arguments, it'll be reported as a bug.

Come to think of it, have we checked that the behavior of LEFT, RIGHT,
REVERSE, etc. is the same on other DBs, especially as far as nulls,
empty strings, too-large or negative subscripts, etc is concerned? Is
CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
implements here? And why does CONCAT() take a variadic "ANY"
argument? Shouldn't that be variadic TEXT?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 13:10:06
Message-ID: AANLkTi=h2wGYM1y6D8_7HX=tugtKaFasg6=gDnWBKpaZ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 26, 2010 at 8:02 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Regardless of where this function ends up, the concat_ws documentation
> should contain some mention of the fact that "ws" is intended to mean
> "with separator",

big +1 on that -- I've been loosely following the thread and I had
assumed 'ws' meant 'wide string' all this time :-).

> Come to think of it, have we checked that the behavior of LEFT, RIGHT,
> REVERSE, etc. is the same on other DBs, especially as far as nulls,
> empty strings, too-large or negative subscripts, etc is concerned?

Probably 'standard' behavior wrt null would be to be strict; return
null if any argument is null. The proposed behavior seems ok though.

> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
> implements here?  And why does CONCAT() take a variadic "ANY"
> argument?  Shouldn't that be variadic TEXT?

What does that accomplish, besides forcing you to sprinkle every
concat call with text casts (maybe that's not a bad thing?)?

merlin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 13:26:05
Message-ID: AANLkTimKiNH0E8BRvZthgpuvxdY4FR_m6w81-px867St@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
>> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
>> implements here?  And why does CONCAT() take a variadic "ANY"
>> argument?  Shouldn't that be variadic TEXT?
>

CONCAT with variadic text parameter will be limited with existing
default casts to text - for example, you can't to cast date to text,
int to text.

postgres=# create or replace function concat(variadic text[]) returns
text as $$select string_agg(x,'') from unnest($1) x$$ language sql;
CREATE FUNCTION

postgres=# select concat('a','b');
concat
--------
ab
(1 row)

Time: 20,812 ms
postgres=# select concat('a',10);
ERROR: function concat(unknown, integer) does not exist
LINE 1: select concat('a',10);
^
HINT: No function matches the given name and argument types. You
might need to add explicit type casts.

so with variadic "any"[] concat doesn't need explicit cats.

Regards

Pavel Stehule

p.s. inside function is every value transformed to text.

> merlin
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 13:26:59
Message-ID: AANLkTinsM5o9dztt6vPecGyKiHH-aYr71izNh59_EoFZ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 26, 2010 at 9:10 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
>> implements here?  And why does CONCAT() take a variadic "ANY"
>> argument?  Shouldn't that be variadic TEXT?
>
> What does that accomplish, besides forcing you to sprinkle every
> concat call with text casts (maybe that's not a bad thing?)?

You could ask the same thing about the existing || operator. And in
fact, we used to have that behavior. We changed it in 8.3. Perhaps
that was a good decision and perhaps it wasn't, but I don't think
using CONCAT() to make an end-run around that decision is the way to
go.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 14:39:02
Message-ID: AANLkTimA-4RObdVx8-NJ6G1CMWBCkk6TOo=M35EAEstK@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 26, 2010 at 9:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jul 26, 2010 at 9:10 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
>>> implements here?  And why does CONCAT() take a variadic "ANY"
>>> argument?  Shouldn't that be variadic TEXT?
>>
>> What does that accomplish, besides forcing you to sprinkle every
>> concat call with text casts (maybe that's not a bad thing?)?
>
> You could ask the same thing about the existing || operator.  And in
> fact, we used to have that behavior.  We changed it in 8.3.  Perhaps
> that was a good decision and perhaps it wasn't, but I don't think
> using CONCAT() to make an end-run around that decision is the way to
> go.

It was absolutely a good decision because it prevented type inference
in ways that were ambiguous or surprising (for a canonical case see:
http://www.mail-archive.com/pgsql-general(at)postgresql(dot)org/msg93224.html).

|| operator is still pretty tolerant in the 8.3+ world.
select interval_col || bool_col; -- error
select interval_col::text || bool_col; -- text concatenation
select text_col || interval_col || bool_col; -- text concatenation

variadic text would require text casts on EVERY non 'unknown' argument
which drops it below the threshold of usefulness IMO -- it would be
far stricter than vanilla || concatenation. Ok, pure bikeshed here
(shutting my trap now!), but concat() is one of those wonder functions
that you want to make as usable and terse as possible. I don't see
the value in making it overly strict.

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 15:07:50
Message-ID: AANLkTikTma8hZfsFyDtN+We1fXW8Zj82KPyi0X_uv6XQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 26, 2010 at 10:39 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Mon, Jul 26, 2010 at 9:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Jul 26, 2010 at 9:10 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>>> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
>>>> implements here?  And why does CONCAT() take a variadic "ANY"
>>>> argument?  Shouldn't that be variadic TEXT?
>>>
>>> What does that accomplish, besides forcing you to sprinkle every
>>> concat call with text casts (maybe that's not a bad thing?)?
>>
>> You could ask the same thing about the existing || operator.  And in
>> fact, we used to have that behavior.  We changed it in 8.3.  Perhaps
>> that was a good decision and perhaps it wasn't, but I don't think
>> using CONCAT() to make an end-run around that decision is the way to
>> go.
>
> It was absolutely a good decision because it prevented type inference
> in ways that were ambiguous or surprising (for a canonical case see:
> http://www.mail-archive.com/pgsql-general(at)postgresql(dot)org/msg93224.html).
>
> || operator is still pretty tolerant in the 8.3+ world.
> select interval_col || bool_col; -- error
> select interval_col::text || bool_col; -- text concatenation
> select text_col || interval_col || bool_col; -- text concatenation
>
> variadic text would require text casts on EVERY non 'unknown' argument
> which drops it below the threshold of usefulness IMO -- it would be
> far stricter than vanilla || concatenation.  Ok, pure bikeshed here
> (shutting my trap now!), but concat() is one of those wonder functions
> that you want to make as usable and terse as possible.  I don't see
> the value in making it overly strict.

I'm just very skeptical that we should give our functions argument
types that are essentially fantasy. CONCAT() doesn't concatenate
integers or intervals or boxes: it concatenates strings, and only
strings. Surely the right fix, if our casting rules are too
restrictive, is to fix the casting rules; not to start adding a bunch
of kludgery in every function we define.

The problem with your canonical example (and the other examples of
this type I've seen) is that they are underspecified. Given two
identically-named operators, one of which accepts types T1 and T2, and
the other of which accepts types T3 and T4, what is one to do with T1
OP T4? You can cast T1 to T3 and call the first operator or you can
cast T4 to T2 and call the second one, and it's really not clear which
is right, so you had better thrown an error. The same applies to
functions: given foo(text) and foo(date) and the call
foo('2010-04-15'), you had better complain, or you may end up with a
very sad user. But sometimes our current casting rules require casts
in situations where they don't seem necessary, such as
LPAD(integer_column, '0', 5). That's not ambiguous because there's
only one definition of LPAD, and the chances that the user will be
unpleasantly surprised if you call it seem quite low.

In other words, this problem is not unique to CONCAT().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 15:39:40
Message-ID: AANLkTi=iwCxJuLMYYoxzeSQTKgjcgopPkJFw+=Kwd9xc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 26, 2010 at 11:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jul 26, 2010 at 10:39 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> It was absolutely a good decision because it prevented type inference
>> in ways that were ambiguous or surprising (for a canonical case see:
>> http://www.mail-archive.com/pgsql-general(at)postgresql(dot)org/msg93224.html).
>>
>> || operator is still pretty tolerant in the 8.3+ world.
>> select interval_col || bool_col; -- error
>> select interval_col::text || bool_col; -- text concatenation
>> select text_col || interval_col || bool_col; -- text concatenation
>>
>> variadic text would require text casts on EVERY non 'unknown' argument
>> which drops it below the threshold of usefulness IMO -- it would be
>> far stricter than vanilla || concatenation.  Ok, pure bikeshed here
>> (shutting my trap now!), but concat() is one of those wonder functions
>> that you want to make as usable and terse as possible.  I don't see
>> the value in making it overly strict.
>
> I'm just very skeptical that we should give our functions argument
> types that are essentially fantasy.  CONCAT() doesn't concatenate
> integers or intervals or boxes: it concatenates strings, and only
> strings.  Surely the right fix, if our casting rules are too
> restrictive, is to fix the casting rules; not to start adding a bunch
> of kludgery in every function we define.
>
> The problem with your canonical example (and the other examples of
> this type I've seen) is that they are underspecified.  Given two
> identically-named operators, one of which accepts types T1 and T2, and
> the other of which accepts types T3 and T4, what is one to do with T1
> OP T4?  You can cast T1 to T3 and call the first operator or you can
> cast T4 to T2 and call the second one, and it's really not clear which
> is right, so you had better thrown an error.  The same applies to
> functions: given foo(text) and foo(date) and the call
> foo('2010-04-15'), you had better complain, or you may end up with a
> very sad user.  But sometimes our current casting rules require casts
> in situations where they don't seem necessary, such as
> LPAD(integer_column, '0', 5).  That's not ambiguous because there's
> only one definition of LPAD, and the chances that the user will be
> unpleasantly surprised if you call it seem quite low.
>
> In other words, this problem is not unique to CONCAT().

shoot, can't resist :-).

Are the casting rules broken? If you want to do lpad w/o casts for
integers, you can define it explicitly -- feature, not bug. You can
basically do this for any function with fixed arguments -- either in
userland or core. lpad(int) actually introduces a problem case with
the minus sign possibly requiring application specific intervention,
so things are probably correct exactly as they are. Casting rules
need to be tight because if they are not they can introduce
independent behaviors when you underspecify as you noted above.

ISTM you are unhappy with the "any" variadic mechanism in general, not
the casting rules. "any" essentially means: "the types you pass to me
are irrelevant, because i'm going to look up behaviors on the server
and apply them as I see fit". You've defined a regular methodology
across ALL types and allow the user to pass anything -- I see nothing
at all wrong with this as long as it's only implemented in very
special cases. If you happen to not like the predefined 'box'
behavior, nothing is stopping you from massaging it before sending it
in. Also, there's no guarantee that the behavior hidden under the
"any" can be reproduced via manual cast...concat() is some thing of a
special case where you can.

merlin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 16:04:27
Message-ID: AANLkTinxeY3q8gZNLif_kXWWQe6hgK=aCBTLYmCFnwKv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/26 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Jul 26, 2010 at 10:39 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Mon, Jul 26, 2010 at 9:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Mon, Jul 26, 2010 at 9:10 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>>>> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
>>>>> implements here?  And why does CONCAT() take a variadic "ANY"
>>>>> argument?  Shouldn't that be variadic TEXT?
>>>>
>>>> What does that accomplish, besides forcing you to sprinkle every
>>>> concat call with text casts (maybe that's not a bad thing?)?
>>>
>>> You could ask the same thing about the existing || operator.  And in
>>> fact, we used to have that behavior.  We changed it in 8.3.  Perhaps
>>> that was a good decision and perhaps it wasn't, but I don't think
>>> using CONCAT() to make an end-run around that decision is the way to
>>> go.
>>
>> It was absolutely a good decision because it prevented type inference
>> in ways that were ambiguous or surprising (for a canonical case see:
>> http://www.mail-archive.com/pgsql-general(at)postgresql(dot)org/msg93224.html).
>>
>> || operator is still pretty tolerant in the 8.3+ world.
>> select interval_col || bool_col; -- error
>> select interval_col::text || bool_col; -- text concatenation
>> select text_col || interval_col || bool_col; -- text concatenation
>>
>> variadic text would require text casts on EVERY non 'unknown' argument
>> which drops it below the threshold of usefulness IMO -- it would be
>> far stricter than vanilla || concatenation.  Ok, pure bikeshed here
>> (shutting my trap now!), but concat() is one of those wonder functions
>> that you want to make as usable and terse as possible.  I don't see
>> the value in making it overly strict.
>
> I'm just very skeptical that we should give our functions argument
> types that are essentially fantasy.  CONCAT() doesn't concatenate
> integers or intervals or boxes: it concatenates strings, and only
> strings.  Surely the right fix, if our casting rules are too
> restrictive, is to fix the casting rules; not to start adding a bunch
> of kludgery in every function we define.
>

Rules are correct probably. The problem is in searching function
algorithm - it is too simple (and fast, just only one cycle). And some
exceptions - like COALESCE and similar are solved specifically on
parser level. We cannot enforce some casting on user level. PostgreSQL
is more strict with timestamps, dates than other databases. Sometimes
you have to do explicit casts, but it clean from context desired
datatype -

SELECT date_trunc('day', current_date) - result is timestamp, but it
is clean, so result have to be date ... When I proposed a parser hook
I though about these functions. With this hook, you can enforce any
necessary casting like some buildin functions does.

so "any"

> The problem with your canonical example (and the other examples of
> this type I've seen) is that they are underspecified.  Given two
> identically-named operators, one of which accepts types T1 and T2, and
> the other of which accepts types T3 and T4, what is one to do with T1
> OP T4?  You can cast T1 to T3 and call the first operator or you can
> cast T4 to T2 and call the second one, and it's really not clear which
> is right, so you had better thrown an error.  The same applies to
> functions: given foo(text) and foo(date) and the call
> foo('2010-04-15'), you had better complain, or you may end up with a
> very sad user.  But sometimes our current casting rules require casts
> in situations where they don't seem necessary, such as
> LPAD(integer_column, '0', 5).  That's not ambiguous because there's
> only one definition of LPAD, and the chances that the user will be
> unpleasantly surprised if you call it seem quite low.
>
> In other words, this problem is not unique to CONCAT().
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 16:10:19
Message-ID: AANLkTik6QRMSYCCHwo+OMh81E1ZEhJdBD8cGhBpQqPxx@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> so "any" datatype is last possibility - is workaroud for custom functions.

Probably the most correct implementation of CONCAT have to contains a
parser changes - and then you can use a some internal concat function
with text only parameters. VARIADIC with "any" is just workaround that
is enouhg

Regards

Pavel

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>> The problem with your canonical example (and the other examples of
>> this type I've seen) is that they are underspecified.  Given two
>> identically-named operators, one of which accepts types T1 and T2, and
>> the other of which accepts types T3 and T4, what is one to do with T1
>> OP T4?  You can cast T1 to T3 and call the first operator or you can
>> cast T4 to T2 and call the second one, and it's really not clear which
>> is right, so you had better thrown an error.  The same applies to
>> functions: given foo(text) and foo(date) and the call
>> foo('2010-04-15'), you had better complain, or you may end up with a
>> very sad user.  But sometimes our current casting rules require casts
>> in situations where they don't seem necessary, such as
>> LPAD(integer_column, '0', 5).  That's not ambiguous because there's
>> only one definition of LPAD, and the chances that the user will be
>> unpleasantly surprised if you call it seem quite low.
>>
>> In other words, this problem is not unique to CONCAT().
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise Postgres Company
>>
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 17:45:06
Message-ID: AANLkTikoH2jNzo2sP1sApGTwGYCSxbSmBw+4sdHWYi_4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 26, 2010 at 11:39 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> I'm just very skeptical that we should give our functions argument
>> types that are essentially fantasy.  CONCAT() doesn't concatenate
>> integers or intervals or boxes: it concatenates strings, and only
>> strings.  Surely the right fix, if our casting rules are too
>> restrictive, is to fix the casting rules; not to start adding a bunch
>> of kludgery in every function we define.
>>
>> The problem with your canonical example (and the other examples of
>> this type I've seen) is that they are underspecified.  Given two
>> identically-named operators, one of which accepts types T1 and T2, and
>> the other of which accepts types T3 and T4, what is one to do with T1
>> OP T4?  You can cast T1 to T3 and call the first operator or you can
>> cast T4 to T2 and call the second one, and it's really not clear which
>> is right, so you had better thrown an error.  The same applies to
>> functions: given foo(text) and foo(date) and the call
>> foo('2010-04-15'), you had better complain, or you may end up with a
>> very sad user.  But sometimes our current casting rules require casts
>> in situations where they don't seem necessary, such as
>> LPAD(integer_column, '0', 5).  That's not ambiguous because there's
>> only one definition of LPAD, and the chances that the user will be
>> unpleasantly surprised if you call it seem quite low.
>>
>> In other words, this problem is not unique to CONCAT().
>
> shoot, can't resist :-).
>
> Are the casting rules broken? If you want to do lpad w/o casts for
> integers, you can define it explicitly -- feature, not bug.  You can
> basically do this for any function with fixed arguments -- either in
> userland or core.  lpad(int) actually introduces a problem case with
> the minus sign possibly requiring application specific intervention,
> so things are probably correct exactly as they are.

Huh? If you're arguing that LPAD should require a cast on an integer
argument because the defined behavior of the function might not be
what someone wants, then apparently explicit casts should be required
in all cases. If you're arguing that I can eliminate the need for an
explicit cast by overloading LPAD(), I agree with you, but that's not
a feature.

> ISTM you are unhappy with the "any" variadic mechanism in general, not
> the casting rules.

No, I'm unhappy with the use of "any" to make an end-run around the
casting rules. If you're writing a function that operates on an
argument of any type, like pg_sizeof() - or if you're writing a
function that does something like append two arrays of unspecified but
identical type or, perhaps, search an array of unspecified type for an
element of that same type - or if you're writing a function where the
types of the arguments can't be known in advance, like printf(), well,
then any is what you need. But the only argument anyone has put
forward for making CONCAT() accept ANY instead of TEXT is that it
might require casting otherwise. My response to that is "well then
you have to cast it, or fix the casting rules".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 18:09:03
Message-ID: AANLkTikCToXQVN8Hw-mVn5Ltcu8xO8bMmq2zThb6tgfn@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/26 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Jul 26, 2010 at 11:39 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>> I'm just very skeptical that we should give our functions argument
>>> types that are essentially fantasy.  CONCAT() doesn't concatenate
>>> integers or intervals or boxes: it concatenates strings, and only
>>> strings.  Surely the right fix, if our casting rules are too
>>> restrictive, is to fix the casting rules; not to start adding a bunch
>>> of kludgery in every function we define.
>>>
>>> The problem with your canonical example (and the other examples of
>>> this type I've seen) is that they are underspecified.  Given two
>>> identically-named operators, one of which accepts types T1 and T2, and
>>> the other of which accepts types T3 and T4, what is one to do with T1
>>> OP T4?  You can cast T1 to T3 and call the first operator or you can
>>> cast T4 to T2 and call the second one, and it's really not clear which
>>> is right, so you had better thrown an error.  The same applies to
>>> functions: given foo(text) and foo(date) and the call
>>> foo('2010-04-15'), you had better complain, or you may end up with a
>>> very sad user.  But sometimes our current casting rules require casts
>>> in situations where they don't seem necessary, such as
>>> LPAD(integer_column, '0', 5).  That's not ambiguous because there's
>>> only one definition of LPAD, and the chances that the user will be
>>> unpleasantly surprised if you call it seem quite low.
>>>
>>> In other words, this problem is not unique to CONCAT().
>>
>> shoot, can't resist :-).
>>
>> Are the casting rules broken? If you want to do lpad w/o casts for
>> integers, you can define it explicitly -- feature, not bug.  You can
>> basically do this for any function with fixed arguments -- either in
>> userland or core.  lpad(int) actually introduces a problem case with
>> the minus sign possibly requiring application specific intervention,
>> so things are probably correct exactly as they are.
>
> Huh?  If you're arguing that LPAD should require a cast on an integer
> argument because the defined behavior of the function might not be
> what someone wants, then apparently explicit casts should be required
> in all cases.  If you're arguing that I can eliminate the need for an
> explicit cast by overloading LPAD(), I agree with you, but that's not
> a feature.
>
>> ISTM you are unhappy with the "any" variadic mechanism in general, not
>> the casting rules.
>
> No, I'm unhappy with the use of "any" to make an end-run around the
> casting rules.  If you're writing a function that operates on an
> argument of any type, like pg_sizeof() - or if you're writing a
> function that does something like append two arrays of unspecified but
> identical type or, perhaps, search an array of unspecified type for an
> element of that same type - or if you're writing a function where the
> types of the arguments can't be known in advance, like printf(), well,
> then any is what you need.  But the only argument anyone has put
> forward for making CONCAT() accept ANY instead of TEXT is that it
> might require casting otherwise.  My response to that is "well then
> you have to cast it, or fix the casting rules".

I understand, but with only text accepting, then CONCAT will has much
less benefit - you can't do a numeric list, for example

see concat(c1::text, ',', c2::text, ',' ...)

with this is much simpler use a pipes '' || c1 || ',' || c2 ... and
this operator does necessary cast self.

This function is probably one use case of exception from our rules.

Regards

Pavel Stehule
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 19:07:08
Message-ID: AANLkTin51OW-=ntjiajOw_T_y6azX9J1WOsVU3KDoLmV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 26, 2010 at 2:09 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> I understand, but with only text accepting, then CONCAT will has much
> less benefit - you can't do a numeric list, for example
>
> see concat(c1::text, ',', c2::text, ',' ...)
>
> with this is much simpler use a pipes '' || c1 || ',' || c2 ... and
> this operator does necessary cast self.
>
> This function is probably one use case of exception from our rules.

At least two, right? Because for that use case you'd probably want
concat_ws(). In fact, it's hard for me to think of a variadic text
function where you wouldn't want the "no casts" behavior you're
getting via ANY.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-26 19:49:05
Message-ID: AANLkTikm0qDyHxfOJ9MEBOftRdhTdxKtyvX46Q3MmiMf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 26, 2010 at 3:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jul 26, 2010 at 2:09 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> I understand, but with only text accepting, then CONCAT will has much
>> less benefit - you can't do a numeric list, for example
>>
>> see concat(c1::text, ',', c2::text, ',' ...)
>>
>> with this is much simpler use a pipes '' || c1 || ',' || c2 ... and
>> this operator does necessary cast self.
>>
>> This function is probably one use case of exception from our rules.
>
> At least two, right?

correct: there would be at least two.

> Because for that use case you'd probably want
> concat_ws().  In fact, it's hard for me to think of a variadic text
> function where you wouldn't want the "no casts" behavior you're
> getting via ANY.

concat() is not a variadic text function. it is variadic "any" that
happens to do text coercion (not casting) inside the function. The
the assumption that concat is casting internally is probably wrong.
Suppose I had hacked the int->text cast to call a custom function -- I
would very much expect concat() not to produce output from that
function, just the vanilla output text (I could always force the cast
if I wanted to).

concat is just a function that does something highly similar to
casting. suppose I had a function count_memory(variadic "any") that
summed memory usage of input args -- forcing casts would make no sense
in that context (I'm not suggesting that you think so -- just bringing
up a case that illustrates how forcing cast into the function can
change behavior in subtle ways).

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-27 00:48:46
Message-ID: AANLkTikouqVvXmE=eKUoJg8-ddXJvf-URwhi2P_1MccV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 26, 2010 at 3:49 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> concat() is not a variadic text function. it is variadic "any" that
> happens to do text coercion (not casting) inside the function. The
> the assumption that concat is casting internally is probably wrong.
> Suppose I had hacked the int->text cast to call a custom function -- I
> would very much expect concat() not to produce output from that
> function, just the vanilla output text (I could always force the cast
> if I wanted to).
>
> concat is just a function that does something highly similar to
> casting.  suppose I had a function count_memory(variadic "any") that
> summed memory usage of input args -- forcing casts would make no sense
> in that context (I'm not suggesting that you think so -- just bringing
> up a case that illustrates how forcing cast into the function can
> change behavior in subtle ways).

Right, but I already said I wasn't objecting to the use of variadic
ANY in cases like that - only in cases where, as here, you were
basically taking any old arguments and forcing them all to text.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-27 06:22:54
Message-ID: AANLkTinxKRgg68HK3Hpa0z7o7P95ywXVVvgvCr6Fer74@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/26 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Jul 26, 2010 at 2:09 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> I understand, but with only text accepting, then CONCAT will has much
>> less benefit - you can't do a numeric list, for example
>>
>> see concat(c1::text, ',', c2::text, ',' ...)
>>
>> with this is much simpler use a pipes '' || c1 || ',' || c2 ... and
>> this operator does necessary cast self.
>>
>> This function is probably one use case of exception from our rules.
>
> At least two, right?  Because for that use case you'd probably want
> concat_ws().

sorry, yes

Pavel

 In fact, it's hard for me to think of a variadic text
> function where you wouldn't want the "no casts" behavior you're
> getting via ANY.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-29 20:43:57
Message-ID: 2320c2f603642bf0c4fdd57c30077a37.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Pavel,

In xfunc.sgml, I came across a function example (for use of VARIADIC in polymorphic functions),
where the function name is concat(): (in the manual: 35.4.10. Polymorphic SQL Functions).
Although that is not strictly wrong, it seems better to change that name when concat goes into
core, as seems to be the plan.

If you agree, it seems best to include this change in your patch and change that example
function's name when the stringfunc patch gets applied.

Erik Rijkers

Attachment Content-Type Size
xfunc.sgml.diff application/octet-stream 1.6 KB

From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions ( correct patch attached )
Date: 2010-07-29 20:49:27
Message-ID: 179b1edefaf48e9327264d2671e4015b.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, July 29, 2010 22:43, Erik Rijkers wrote:
> Hi Pavel,
>
> In xfunc.sgml, I came across a function example (for use of VARIADIC in polymorphic functions),
> where the function name is concat(): (in the manual: 35.4.10. Polymorphic SQL Functions).
> Although that is not strictly wrong, it seems better to change that name when concat goes into
> core, as seems to be the plan.
>
> If you agree, it seems best to include this change in your patch and change that example
> function's name when the stringfunc patch gets applied.
>

My apologies, the previous email had the wrong doc-patch attached.

Here is the correct one.

Erik Rijkers

Attachment Content-Type Size
xfunc.sgml.diff application/octet-stream 496 bytes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-07-29 20:53:06
Message-ID: AANLkTimwJ-MzG_dT+vYSyJ73H80dF1ER=9j-RzXMw0Pm@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 26, 2010 at 8:48 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jul 26, 2010 at 3:49 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> concat() is not a variadic text function. it is variadic "any" that
>> happens to do text coercion (not casting) inside the function. The
>> the assumption that concat is casting internally is probably wrong.
>> Suppose I had hacked the int->text cast to call a custom function -- I
>> would very much expect concat() not to produce output from that
>> function, just the vanilla output text (I could always force the cast
>> if I wanted to).
>>
>> concat is just a function that does something highly similar to
>> casting.  suppose I had a function count_memory(variadic "any") that
>> summed memory usage of input args -- forcing casts would make no sense
>> in that context (I'm not suggesting that you think so -- just bringing
>> up a case that illustrates how forcing cast into the function can
>> change behavior in subtle ways).
>
> Right, but I already said I wasn't objecting to the use of variadic
> ANY in cases like that - only in cases where, as here, you were
> basically taking any old arguments and forcing them all to text.

I believe that another unpleasant side effect of this is that CONCAT()
will have to be declared stable rather than immutable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions ( correct patch attached )
Date: 2010-07-30 09:56:53
Message-ID: AANLkTimQFVgcAGtK_F-R-k4+i_M+joT0os33y6jjVcFu@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/7/29 Erik Rijkers <er(at)xs4all(dot)nl>:
> On Thu, July 29, 2010 22:43, Erik Rijkers wrote:
>> Hi Pavel,
>>
>> In xfunc.sgml, I came across a function example (for use of VARIADIC in polymorphic functions),
>> where the function name is concat():  (in the manual: 35.4.10. Polymorphic SQL Functions).
>> Although that is not strictly wrong, it seems better to change that name when concat goes into
>> core, as seems to be the plan.
>>
>> If you agree, it seems best to include this change in your patch and change that example
>> function's name when the stringfunc patch gets applied.
>>
>
> My apologies, the previous email had the wrong doc-patch attached.
>
> Here is the correct one.
>

it is good idea, thank you

Pavel

>
> Erik Rijkers
>


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-08-07 00:43:16
Message-ID: AANLkTik+w153TvJfPyhGw6QqdJzi1f1TvvyzU=_ME6S1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/26 Robert Haas <robertmhaas(at)gmail(dot)com>:
> Come to think of it, have we checked that the behavior of LEFT, RIGHT,
> REVERSE, etc. is the same on other DBs, especially as far as nulls,
> empty strings, too-large or negative subscripts, etc is concerned?  Is
> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
> implements here?

I made a discussion page in wiki for the compatibility issue.
http://wiki.postgresql.org/wiki/String_Functions_and_Operators_Compatibility

Please fill empty cells and fix wrong descriptions.
* concat() is not compatible between MySQL and Oracle/DB2. Which do we buy?
* How do other databases behave in left() and right() with negative lengths?
* Are there any databases that has similar features with format() or
sprintf() ?

> And why does CONCAT() take a variadic "ANY"
> argument?  Shouldn't that be variadic TEXT?

I think we have no other choice but to use VARIADIC "any" for variadic
functions.
We have all combinations of argument types for || operator, (text, text),
(text, any), (any, text), but we cannot use such codes for variadic functions
-- they have no limits of argument numbers. And in the case, the functions
should be STABLE because they convert arguments to text in it with typout
functions that might be STABLE.

IMHO, I'd repeat, syntax for format() is a bad choice because it cannot
concatenate multiple arguments without separator, though RAISE also uses it.
%s format in sprintf() or {n} syntax in C#'s String.Format() seems to be
a better design.

--
Itagaki Takahiro


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-08-07 11:39:03
Message-ID: AANLkTi=-96n=oWVvKCD-EMyHaDG70F45nso5DZFfkdGx@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/8/7 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> 2010/7/26 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> Come to think of it, have we checked that the behavior of LEFT, RIGHT,
>> REVERSE, etc. is the same on other DBs, especially as far as nulls,
>> empty strings, too-large or negative subscripts, etc is concerned?  Is
>> CONCAT('foo', NULL) => 'foo' really the behavior that everyone else
>> implements here?
>
> I made a discussion page in wiki for the compatibility issue.
> http://wiki.postgresql.org/wiki/String_Functions_and_Operators_Compatibility
>

nice, thank you

> Please fill empty cells and fix wrong descriptions.
>  * concat() is not compatible between MySQL and Oracle/DB2. Which do we buy?

I prefer a our implementation - it skip a NULL values and it has a
variadic arguments. MySQL's concat isn't too consistent - I don't know
why it has different NULL handlidg than concat_ws.

>  * How do other databases behave in left() and right() with negative lengths?

I don't know about one with left() and right() functions. What I know,
only MS Access has these functions. The design of these functions is
inspirited by wide used a Oracle library PLvision - this library is
freeware now - but my code is original. See plvstr.left() and
plvstr.right() - and little bit by python substring operations. The
sense of negative arguments is elimination of necessary detoast
operations and utf8 related calculations. For right() it means skip
first n chars, for left() skip last n chars. These functions was
originally designed for contrib - and I still thinking so contrib is
better - My opinion isn't strong here - I prefer a fully functional
function in contrib before minimalistic version in core. Minimalistic
functions are trivial via substring.

>  * Are there any databases that has similar features with format() or
> sprintf() ?

I know only about package from PLvision library -

select plvsubst.string('My name is %s %s', ARRAY['Pavel','Stěhule']);

but you can find a lot of custom implementations. I found a some
similar - not exactly this in T-SQL see FORMATMESSAGE() function. But
the using of this function is very limited and it is C API function
(available from T-SQL). It doesn't return a string, just write to log.

>
>
>> And why does CONCAT() take a variadic "ANY"
>> argument?  Shouldn't that be variadic TEXT?
>
> I think we have no other choice but to use VARIADIC "any" for variadic
> functions.
> We have all combinations of argument types for || operator, (text, text),
> (text, any), (any, text), but we cannot use such codes for variadic functions
> -- they have no limits of argument numbers. And in the case, the functions
> should be STABLE because they convert arguments to text in it with typout
> functions that might be STABLE.
>
>
> IMHO, I'd repeat, syntax for format() is a bad choice because it cannot
> concatenate multiple arguments without separator, though RAISE also uses it.
> %s format in sprintf() or {n} syntax in C#'s String.Format() seems to be
> a better design.

I don't agree. This function isn't designed to replace string
concation. It is designed to build a SQL string (for dynamic SQL) or
format messages. It isn't designed to replace to_char function. It is
designed to work mainly inside PLpgSQL functions and then is
consistent with RAISE statement.

Thank you

Regards

Pavel Stehule

>
> --
> Itagaki Takahiro
>


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-08-23 11:11:39
Message-ID: AANLkTi=xVcVsyegazdYwsfd1U1QO-Sv9P_5-bohrhLhp@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Aug 7, 2010 at 8:39 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> I made a discussion page in wiki for the compatibility issue.
>> http://wiki.postgresql.org/wiki/String_Functions_and_Operators_Compatibility
> nice, thank you

I filled cells for SQL Server and DB2.

>>  * concat() is not compatible between MySQL and Oracle/DB2. Which do we buy?
> I prefer a our implementation - it skip a NULL values and it has a
> variadic arguments.

OK. I'm going to put both concat() and concat_ws() into core.

>>  * How do other databases behave in left() and right() with negative lengths?
> little bit by python substring operations.

I'll respect your proposal. The behaviors for negative lengths will
be our specific feature, but I don't see any problems there.
Since other databases raises errors, user should have negative-protections
in their existing codes.

> I don't agree. This function isn't designed to replace string
> concation. It is designed to build a SQL string (for dynamic SQL) or
> format messages. It isn't designed to replace to_char function. It is
> designed to work mainly inside PLpgSQL functions and then is
> consistent with RAISE statement.

OK. I'll revert my changes to your original format().

But please wait a moment to include sprintf() and contrib/stringfunc.
I think the function is useful, but don't want to have two versions
of formatting functions. So, the extended features will be merged
into format() with additional syntax something like {10s}. Then,
we could simplify the code because some of complex format syntax
are not so useful in SQL, especially length+string formatter (*s).

--
Itagaki Takahiro


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-08-23 11:29:53
Message-ID: AANLkTimxW--56RdtE4pPZ_UuPrhZw1UdErXVTUfCMK8f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/8/23 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> On Sat, Aug 7, 2010 at 8:39 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> I made a discussion page in wiki for the compatibility issue.
>>> http://wiki.postgresql.org/wiki/String_Functions_and_Operators_Compatibility
>> nice, thank you
>
> I filled cells for SQL Server and DB2.
>
>>>  * concat() is not compatible between MySQL and Oracle/DB2. Which do we buy?
>> I prefer a our implementation - it skip a NULL values and it has a
>> variadic arguments.
>
> OK. I'm going to put both concat() and concat_ws() into core.
>
>>>  * How do other databases behave in left() and right() with negative lengths?
>> little bit by python substring operations.
>
> I'll respect your proposal. The behaviors for negative lengths will
> be our specific feature, but I don't see any problems there.
> Since other databases raises errors, user should have negative-protections
> in their existing codes.
>
>> I don't agree. This function isn't designed to replace string
>> concation. It is designed to build a SQL string (for dynamic SQL) or
>> format messages. It isn't designed to replace to_char function. It is
>> designed to work mainly inside PLpgSQL functions and then is
>> consistent with RAISE statement.
>
> OK. I'll revert my changes to your original format().
>
> But please wait a moment to include sprintf() and contrib/stringfunc.
> I think the function is useful, but don't want to have two versions
> of formatting functions. So, the extended features will be merged
> into format() with additional syntax something like {10s}. Then,
> we could simplify the code because some of complex format syntax
> are not so useful in SQL, especially length+string formatter (*s).

It's reason, why I moved sprintf to contrib. When you build a SQL
statement or error message, you don't need (usually) complex
formating. And when you need it, then you can use a contrib sprintf
function. I am not against to additional simply formating - but I
afraid so we will create a second "printf" function. The formating
enhancing should be shared with RAISE NOTICE command. Probably we can
share code better now, when a PLpgSQL is in core.

Regards

Pavel

>
> --
> Itagaki Takahiro
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-08-23 14:14:38
Message-ID: 20168.1282572878@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> ... The formating
> enhancing should be shared with RAISE NOTICE command.

I remain of the opinion that RAISE's approach to formatting is
completely broken and inextensible, and that any attempt to be somehow
compatible with it is going to lead to an unusably broken design.
You should leave RAISE alone and just think about printf.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-08-23 14:41:31
Message-ID: AANLkTikdZ+30dBYPpbbOQ6e9O+YEg-zQKQXkFwGvWzW2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/8/23 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> ... The formating
>> enhancing should be shared with RAISE NOTICE command.
>
> I remain of the opinion that RAISE's approach to formatting is
> completely broken and inextensible, and that any attempt to be somehow
> compatible with it is going to lead to an unusably broken design.
> You should leave RAISE alone and just think about printf.
>

ok - then we don't need modify proposed patch. "Format" function is
enough for PL/pgSQL and other PL languages has own mutation of this
functions. There are not barrier for implementation as custom
function, so we can hold this function most simple.

Regards

Pavel

>                        regards, tom lane
>


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-08-24 06:32:47
Message-ID: AANLkTinUY-DnDSJxUrHDzJP14OTn7f=PtkMzDH65XFV7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I applied the attached patch to HEAD. concat(), concat_ws(), left(),
right(), and reverse() are in it, but format() and sprintf() are not.
It's my understanding that we don't have consensus about the best syntax
for the formatting function. We can forget about RAISE. C-like printf
syntax is the next candidate, but we should consider about other ones
restarting with a clean slate.

Anyway, the newly added functions are useful for developers especially
migrated from other database products. Thank you.

On Mon, Aug 23, 2010 at 11:41 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2010/8/23 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> You should leave RAISE alone and just think about printf.
>
> ok - then we don't need modify proposed patch. "Format" function is
> enough for PL/pgSQL and other PL languages has own mutation of this
> functions. There are not barrier for implementation as custom
> function, so we can hold this function most simple.

--
Itagaki Takahiro

Attachment Content-Type Size
stringfunc-20100824.diff application/octet-stream 12.5 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>
Subject: Re: patch (for 9.1) string functions
Date: 2010-08-24 06:38:24
Message-ID: AANLkTi=a12zMEML2eXh7ADA9caDteaYYbhAD91ieYY7D@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/8/24 Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>:
> I applied the attached patch to HEAD. concat(), concat_ws(), left(),
> right(), and reverse() are in it, but format() and sprintf() are not.
> It's my understanding that we don't have consensus about the best syntax
> for the formatting function. We can forget about RAISE. C-like printf
> syntax is the next candidate, but we should consider about other ones
> restarting with a clean slate.
>

Thank you very much

C-like printf function is the most worse candidate - I don't like to
repeat discussion again - this function is designed for totally
different environment than SQL. I am sure, so we don't need a function
with complex formatting - maintaining "to_char" is good example.

Regards

Pavel Stehule

> Anyway, the newly added functions are useful for developers especially
> migrated from other database products. Thank you.
>
>
> On Mon, Aug 23, 2010 at 11:41 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> 2010/8/23 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>> You should leave RAISE alone and just think about printf.
>>
>> ok - then we don't need modify proposed patch. "Format" function is
>> enough for PL/pgSQL and other PL languages has own mutation of this
>> functions. There are not barrier for implementation as custom
>> function, so we can hold this function most simple.
>
> --
> Itagaki Takahiro
>


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Itagaki Takahiro" <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Merlin Moncure" <mmoncure(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions
Date: 2010-08-24 11:42:33
Message-ID: 6c59aa5f9391a77c40f620163e8a2d85.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, August 24, 2010 08:32, Itagaki Takahiro wrote:
> I applied the attached patch to HEAD. concat(), concat_ws(), left(),
> right(), and reverse() are in it, but format() and sprintf() are not.

+1 to add also sprintf

Erik Rijkers


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch (for 9.1) string functions ( correct patch attached )
Date: 2011-02-17 17:37:31
Message-ID: 201102171737.p1HHbV009648@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Erik Rijkers wrote:
> On Thu, July 29, 2010 22:43, Erik Rijkers wrote:
> > Hi Pavel,
> >
> > In xfunc.sgml, I came across a function example (for use of VARIADIC in polymorphic functions),
> > where the function name is concat(): (in the manual: 35.4.10. Polymorphic SQL Functions).
> > Although that is not strictly wrong, it seems better to change that name when concat goes into
> > core, as seems to be the plan.
> >
> > If you agree, it seems best to include this change in your patch and change that example
> > function's name when the stringfunc patch gets applied.
> >
>
> My apologies, the previous email had the wrong doc-patch attached.
>
> Here is the correct one.

Applied for 9.1. Thanks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +