Quick Links

Re: What happened to the is_<type> family of functions proposal?

Lists:	pgsql-hackers

From:	"Colin 't Hart" <colinthart(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	What happened to the is_<type> family of functions proposal?
Date:	2010-09-20 14:29:52
Message-ID:	AANLkTikWO3cWfVLeRMjVcYKA2mbCQb+AGgRMkB=fZGE0@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Back in 2002 these were proposed, what happened to them?

http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php

Also I note:

colin(at)ruby:~/workspace/eyedb$ psql
psql (8.4.4)
Type "help" for help.

colin=> select to_date('731332', 'YYMMDD');
to_date
------------
1974-02-01
(1 row)

colin=>

The fact that this wraps would seem to me to make the implementation of
is_date() difficult.

I'm trying to query character strings for valid dates but can't see how to
do this quickly... but for that discussion I will move to pgsql-general :-)

Cheers,

Colin

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Colin 't Hart <colinthart(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-20 14:54:34
Message-ID:	4C9775AA.4010000@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 09/20/2010 10:29 AM, Colin 't Hart wrote:
> Hi,
>
> Back in 2002 these were proposed, what happened to them?
>
> http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php

2002 is a looooooooong time ago.

>
>
> Also I note:
>
> colin(at)ruby:~/workspace/eyedb$ psql
> psql (8.4.4)
> Type "help" for help.
>
> colin=> select to_date('731332', 'YYMMDD');
> to_date
> ------------
> 1974-02-01
> (1 row)
>
> colin=>
>
>
> The fact that this wraps would seem to me to make the implementation
> of is_date() difficult.
>
>
>

I think to_date is the wrong gadget to use here. You should probably be
using the date input routine and trapping any data exception. e.g.:

test_date := date_in(textout(some_text));

In plpgsql you'd put that inside a begin/exception/end block that traps
SQLSTATE '22000' which is the class covering data exceptions.

cheers

andrew

From:	"Colin 't Hart" <colinthart(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-20 15:31:11
Message-ID:	AANLkTi=b799xu1qfrPaLh7vEz8fYW+ihqs+Cze=ks2nO@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 20 September 2010 16:54, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 09/20/2010 10:29 AM, Colin 't Hart wrote:
>>
>> Hi,
>>
>> Back in 2002 these were proposed, what happened to them?
>>
>> http://archives.postgresql.org/pgsql-sql/2002-09/msg00406.php
>
>
> 2002 is a looooooooong time ago.

<snip>

> I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping any data exception. e.g.:
>
> test_date := date_in(textout(some_text));
>
> In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering data exceptions.

So it's not possible using pure SQL unless one writes a function?

Are the is_<type> family of functions still desired?

Also, where are the to_<type> conversions done?

Thanks,

Colin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	"Colin 't Hart" <colinthart(at)gmail(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 15:03:28
Message-ID:	AANLkTikAKE5ERpaZcEPte14Ct8E3Soweb0yRYC4Nmv10@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Sep 20, 2010 at 11:31 AM, Colin 't Hart <colinthart(at)gmail(dot)com> wrote:
>> I think to_date is the wrong gadget to use here. You should probably be using the date input routine and trapping any data exception. e.g.:
>>
>> test_date := date_in(textout(some_text));
>>
>> In plpgsql you'd put that inside a begin/exception/end block that traps SQLSTATE '22000' which is the class covering data exceptions.
>
> So it's not possible using pure SQL unless one writes a function?

I think that is true.

> Are the is_<type> family of functions still desired?

I think it would be useful to have a way of testing whether a cast to
a given type will succeed. The biggest problem with the
exception-catching method is not that it requires writing a function
(which, IMHO, is no big deal) but that exception handling is pretty
slow and inefficient. You end up doing things like... write a regexp
to see whether the data is in approximately the right format and then
if it is try the cast inside an exception block. Yuck.

(On the other hand, whether the work that was done in 2002 is still
relevant to today's code is questionable. Things have changed a lot.)

> Also, where are the to_<type> conversions done?

I think maybe you are looking for the type input functions?

select typname, typinput::regprocedure from pg_type;

There are also some functions with names of the form to_<type>. You
can get a list of those with the following psql command:

\dfS to_*

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	"Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 15:49:56
Message-ID:	3790.1285084196@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I think it would be useful to have a way of testing whether a cast to
> a given type will succeed. The biggest problem with the
> exception-catching method is not that it requires writing a function
> (which, IMHO, is no big deal) but that exception handling is pretty
> slow and inefficient. You end up doing things like... write a regexp
> to see whether the data is in approximately the right format and then
> if it is try the cast inside an exception block. Yuck.

The problem here is that putting the exception handling in C doesn't
make things any better: it's still slow and inefficient. And in the
general case the only way to be sure that a string will be accepted by
the input function is to try it.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 15:56:51
Message-ID:	AANLkTi=QPvSTT75UcEi7R7U=0mUbOfMSo_aV7fGWD_os@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I think it would be useful to have a way of testing whether a cast to
>> a given type will succeed. The biggest problem with the
>> exception-catching method is not that it requires writing a function
>> (which, IMHO, is no big deal) but that exception handling is pretty
>> slow and inefficient. You end up doing things like... write a regexp
>> to see whether the data is in approximately the right format and then
>> if it is try the cast inside an exception block. Yuck.
>
> The problem here is that putting the exception handling in C doesn't
> make things any better: it's still slow and inefficient. And in the
> general case the only way to be sure that a string will be accepted by
> the input function is to try it.

Given the current API, that is true.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Colin 't Hart <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 17:02:05
Message-ID:	1285088398-sup-7388@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Robert Haas's message of mar sep 21 11:56:51 -0400 2010:
> On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> > The problem here is that putting the exception handling in C doesn't
> > make things any better: it's still slow and inefficient. And in the
> > general case the only way to be sure that a string will be accepted by
> > the input function is to try it.
>
> Given the current API, that is true.

So we could refactor the input functions so that there's an internal
function that returns the accepted datum in the OK case and an ErrorData
for the failure case. The regular input function would just throw the
error data in the latter case; but this would allow another function to
just return whether it worked or not.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 17:41:32
Message-ID:	5922.1285090892@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The problem here is that putting the exception handling in C doesn't
>>> make things any better:

> So we could refactor the input functions so that there's an internal
> function that returns the accepted datum in the OK case and an ErrorData
> for the failure case.

This makes the untenable assumption that there are no elog(ERROR)s in
the "internal" input function *or anything it calls*. Short of truly
massive restructuring, including uglifying many internal APIs to have
error return codes instead of allowing elog within the callee, you will
never make this work for anything more complicated than say float8in().

regards, tom lane

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 17:45:12
Message-ID:	AANLkTik8FBMYDU=EcM83SGbaYx7rjds3tXZY5urxVgoO@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 21, 2010 at 6:02 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> So we could refactor the input functions so that there's an internal
> function that returns the accepted datum in the OK case and an ErrorData
> for the failure case. The regular input function would just throw the
> error data in the latter case; but this would allow another function to
> just return whether it worked or not.

You're assuming the input function won't have any work it has to undo
which it would need the savepoint for anyways. For most of the
built-in datatypes -- all of the ones intended for holding real data
-- that's true. But for things like regclass or regtype it might not
be and for user-defined data types.... who knows?

Of course all people really want is to test whether something is a
valid integer, floating point value, etc.

--
greg

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 17:56:03
Message-ID:	AANLkTinhYHEQpfp5Oo3earuYtjLer5fOzkJ=pOFzTFui@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 21, 2010 at 1:45 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Tue, Sep 21, 2010 at 6:02 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
>> So we could refactor the input functions so that there's an internal
>> function that returns the accepted datum in the OK case and an ErrorData
>> for the failure case. The regular input function would just throw the
>> error data in the latter case; but this would allow another function to
>> just return whether it worked or not.
>
> You're assuming the input function won't have any work it has to undo
> which it would need the savepoint for anyways. For most of the
> built-in datatypes -- all of the ones intended for holding real data
> -- that's true. But for things like regclass or regtype it might not
> be and for user-defined data types.... who knows?
>
> Of course all people really want is to test whether something is a
> valid integer, floating point value, etc.

Right. Or a date - that's a case that comes up for me pretty
frequently. It's not too hard to write a regular expression to test
whether something is an integer -- although there is the question of
whether it will overflow, which is sometimes relevant -- but a date or
timestamp field is a bit harder.

I don't understand the argument that we need type input functions to
be protected by a savepoint. That seems crazy to me. We're taking a
huge performance penalty here to protect against something that seems
insane to me in the first instance. Not to mention cutting ourselves
off from really important features, like the ability to recover from
errors during COPY. I don't understand why we can't just make some
rules about what type input functions are allowed to do. And if you
break those rules then you get to keep both pieces. Why is this
unreasonable? A savepoint can hardly protect you against damage
inflicted by the execution of arbitrary code; IOW, we're already
relying on the user to follow some rules.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Colin 't Hart <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 18:06:02
Message-ID:	1285092165-sup-6030@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Tom Lane's message of mar sep 21 13:41:32 -0400 2010:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> >> On Tue, Sep 21, 2010 at 11:49 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> The problem here is that putting the exception handling in C doesn't
> >>> make things any better:
>
> > So we could refactor the input functions so that there's an internal
> > function that returns the accepted datum in the OK case and an ErrorData
> > for the failure case.
>
> This makes the untenable assumption that there are no elog(ERROR)s in
> the "internal" input function *or anything it calls*. Short of truly
> massive restructuring, including uglifying many internal APIs to have
> error return codes instead of allowing elog within the callee, you will
> never make this work for anything more complicated than say float8in().

... which is what people want anyway. I mean, the day someone requests
is_sthcomplex, we could happily tell them that they need to use the
expensive workaround involving savepoints. I don't think we really need
to support the ones that would require truly expensive refactoring; the
simple ones would cover 99% of the use cases.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 23:05:39
Message-ID:	11329.1285110339@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I don't understand the argument that we need type input functions to
> be protected by a savepoint. That seems crazy to me. We're taking a
> huge performance penalty here to protect against something that seems
> insane to me in the first instance. Not to mention cutting ourselves
> off from really important features, like the ability to recover from
> errors during COPY. I don't understand why we can't just make some
> rules about what type input functions are allowed to do.

There are many rules that you could possibly make for type input
functions. But "you cannot throw an error" is not one of them ---
or at least, not one that you can usefully expect to be followed
for anything more than trivial straightline code.

The poster child for this is of course domain_in(). But even without
that, I don't think you can realistically legislate that no errors be
thrown by something of the complexity of, say, the timestamp input
functions. Just for starters, what of a palloc() failure?

regards, tom lane

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 23:09:27
Message-ID:	201009220109.29700.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wednesday 22 September 2010 01:05:39 Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I don't understand the argument that we need type input functions to
> > be protected by a savepoint. That seems crazy to me. We're taking a
> > huge performance penalty here to protect against something that seems
> > insane to me in the first instance. Not to mention cutting ourselves
> > off from really important features, like the ability to recover from
> > errors during COPY. I don't understand why we can't just make some
> > rules about what type input functions are allowed to do.
>
> There are many rules that you could possibly make for type input
> functions. But "you cannot throw an error" is not one of them ---
> or at least, not one that you can usefully expect to be followed
> for anything more than trivial straightline code.
>
> The poster child for this is of course domain_in(). But even without
> that, I don't think you can realistically legislate that no errors be
> thrown by something of the complexity of, say, the timestamp input
> functions. Just for starters, what of a palloc() failure?
Uhm. Isnt a palloc failure a really, really bad example because it will kill
the session anyway? FATAL+ is not relevant in that context, right?

Andres

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 23:09:52
Message-ID:	11394.1285110592@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of mar sep 21 13:41:32 -0400 2010:
>> ...never make this work for anything more complicated than say float8in().

> ... which is what people want anyway. I mean, the day someone requests
> is_sthcomplex, we could happily tell them that they need to use the
> expensive workaround involving savepoints. I don't think we really need
> to support the ones that would require truly expensive refactoring; the
> simple ones would cover 99% of the use cases.

Robert was complaining about COPY in particular. It's hard to see how
you make any progress on that if you don't have pretty near 100%
coverage of datatypes. I don't object if someone puts in is_integer,
is_float, etc; but that's nowhere near a general purpose solution.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 23:35:25
Message-ID:	11813.1285112125@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On Wednesday 22 September 2010 01:05:39 Tom Lane wrote:
>> Just for starters, what of a palloc() failure?

> Uhm. Isnt a palloc failure a really, really bad example because it will kill
> the session anyway? FATAL+ is not relevant in that context, right?

Huh? It's not fatal, just elog(ERROR, "out of memory").

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 23:55:43
Message-ID:	AANLkTikKmwS1PVaz=kiY56b4G7s94wUjiQbx31oEane9@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I don't understand the argument that we need type input functions to
>> be protected by a savepoint. That seems crazy to me. We're taking a
>> huge performance penalty here to protect against something that seems
>> insane to me in the first instance. Not to mention cutting ourselves
>> off from really important features, like the ability to recover from
>> errors during COPY. I don't understand why we can't just make some
>> rules about what type input functions are allowed to do.
>
> There are many rules that you could possibly make for type input
> functions. But "you cannot throw an error" is not one of them ---
> or at least, not one that you can usefully expect to be followed
> for anything more than trivial straightline code.

OK. This is one of the things I don't understand. Why does throwing
an error imply that we need to abort the current transaction? Why
can't we just catch the longjmp() and trundle onwards? Obviously,
that's unsafe if a pretty wide variety of cases, but if you're just
scrutinizing the input string (even with a little bit of read-only
database access) it's not obvious to me what can go wrong. (I assume
there is something, but I don't know what it is...)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-21 23:57:37
Message-ID:	201009212357.o8LNvbN13015@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> I don't understand the argument that we need type input functions to
> >> be protected by a savepoint. ?That seems crazy to me. ?We're taking a
> >> huge performance penalty here to protect against something that seems
> >> insane to me in the first instance. ?Not to mention cutting ourselves
> >> off from really important features, like the ability to recover from
> >> errors during COPY. ?I don't understand why we can't just make some
> >> rules about what type input functions are allowed to do.
> >
> > There are many rules that you could possibly make for type input
> > functions. ?But "you cannot throw an error" is not one of them ---
> > or at least, not one that you can usefully expect to be followed
> > for anything more than trivial straightline code.
>
> OK. This is one of the things I don't understand. Why does throwing
> an error imply that we need to abort the current transaction? Why
> can't we just catch the longjmp() and trundle onwards? Obviously,
> that's unsafe if a pretty wide variety of cases, but if you're just
> scrutinizing the input string (even with a little bit of read-only
> database access) it's not obvious to me what can go wrong. (I assume
> there is something, but I don't know what it is...)

That would be interesting. You would need to flag that this was not a
longjump requiring cleanup, but the harder part would be getting back to
where the error occurred. I guess you could rerun the query. :-|

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Colin 't Hart <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-22 01:54:07
Message-ID:	1285120447.7762.15.camel@jdavis-ux.asterdata.local
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2010-09-21 at 19:55 -0400, Robert Haas wrote:
> OK. This is one of the things I don't understand. Why does throwing
> an error imply that we need to abort the current transaction? Why
> can't we just catch the longjmp() and trundle onwards? Obviously,
> that's unsafe if a pretty wide variety of cases, but if you're just
> scrutinizing the input string (even with a little bit of read-only
> database access) it's not obvious to me what can go wrong. (I assume
> there is something, but I don't know what it is...)

The worry (from me) would be the "little bit of read-only database
access". If you SPI_connect() without an SPI_finish(), that sounds like
a potential problem (as would anything else that requires cleanup).

Regards,
Jeff Davis

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-24 19:41:47
Message-ID:	23835.1285357307@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There are many rules that you could possibly make for type input
>> functions. But "you cannot throw an error" is not one of them ---
>> or at least, not one that you can usefully expect to be followed
>> for anything more than trivial straightline code.

> OK. This is one of the things I don't understand. Why does throwing
> an error imply that we need to abort the current transaction? Why
> can't we just catch the longjmp() and trundle onwards? Obviously,
> that's unsafe if a pretty wide variety of cases, but if you're just
> scrutinizing the input string (even with a little bit of read-only
> database access) it's not obvious to me what can go wrong.

The problem is to know that "all you did" was scrutinize the input
string. If it's simple straightline code (even with some C library
calls) then you can know that, but then you can write such code without
including any elog(ERROR) in it in the first place. If you are trapping
longjmps then what you'd need to assert is that no error thrown from
anywhere in any of the code reachable from that place represents a
problem that requires transaction abort to clean up after. This gets
unmaintainable remarkably quickly, especially if you invoke anything
as complicated as database access. And then there are asynchronous
error reasons (query cancel) which you shouldn't trap in any case.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-25 03:15:49
Message-ID:	AANLkTinmLjkNdPTCZC4NF8bu_b6dHr5CsYQs0OZJ=JUn@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 24, 2010 at 3:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> There are many rules that you could possibly make for type input
>>> functions. But "you cannot throw an error" is not one of them ---
>>> or at least, not one that you can usefully expect to be followed
>>> for anything more than trivial straightline code.
>
>> OK. This is one of the things I don't understand. Why does throwing
>> an error imply that we need to abort the current transaction? Why
>> can't we just catch the longjmp() and trundle onwards? Obviously,
>> that's unsafe if a pretty wide variety of cases, but if you're just
>> scrutinizing the input string (even with a little bit of read-only
>> database access) it's not obvious to me what can go wrong.
>
> The problem is to know that "all you did" was scrutinize the input
> string. If it's simple straightline code (even with some C library
> calls) then you can know that, but then you can write such code without
> including any elog(ERROR) in it in the first place. If you are trapping
> longjmps then what you'd need to assert is that no error thrown from
> anywhere in any of the code reachable from that place represents a
> problem that requires transaction abort to clean up after. This gets
> unmaintainable remarkably quickly, especially if you invoke anything
> as complicated as database access. And then there are asynchronous
> error reasons (query cancel) which you shouldn't trap in any case.

Hmm. So the problem is that we don't want to accidentally catch an
error that isn't actually safe to catch. We could probably mitigate
this problem to a considerable degree by throwing data validation
errors using some special flag that say "this is a recoverable error".
And if that flag isn't set then we abort the whole transaction, but
if it is then we continue on. It's still possible for the person
writing the typinput function to set that flag when they should not,
but at least it's less likely to happen by accident. Another
alternative would be to create some kind of explicit way for the
function to RETURN an error instead of throwing it.

But neither of these things is totally bullet-proof, because you could
still do something that requires clean-up and then lie about it. To
protect against that, you'd presumably need to set some kind of a flag
whenever, say, a heap tuple gets modified, and then you could assert
said flag false. What, other than writing to the database, requires
subtransaction cleanup?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Darren Duncan <darren(at)darrenduncan(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-25 03:51:40
Message-ID:	4C9D71CC.1060900@darrenduncan.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Colin 't Hart wrote:
> The fact that this wraps would seem to me to make the implementation of
> is_date() difficult.

Having separate is_foo() syntax per type is a bad design idea, same as having a
different equality test like eq_int() or assignment syntax like assign_str() per
type.

There should just be a single syntax that works for all types, in the general
case, for testing whether a value is a member of that type, or alternately
whether a value can be cast to a particular type.

For example, one could say "is_type( <value>, <type-name> )" or it could be
spelled "isa()" or if you wanted to be more ambitious it could be an infix op,
like "<value> isa <type-name>" to test when a value is of a type already.

Pg already gets it right in this regard by having a single general syntax for
type casting, the "<value>::<type-name>" and value membership of a type should
be likewise.

Maybe to test if a value can be cast as a type, you can continue the ::
mnemonic, say adding a "?" for yes and a "!" for no.

For example, "<value>?::<type-name>" tests if the value can be cast as the type
and "<value>!::<type-name>" or "not <value>?::<type-name>" tests the opposite.
An expression like this results in a boolean.

-- Darren Duncan

From:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To:	Darren Duncan <darren(at)darrenduncan(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-25 07:01:02
Message-ID:	4C9D9E2E.2050003@postnewspapers.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 25/09/2010 11:51 AM, Darren Duncan wrote:
> Colin 't Hart wrote:
>> The fact that this wraps would seem to me to make the implementation
>> of is_date() difficult.
>
> Having separate is_foo() syntax per type is a bad design idea, same as
> having a different equality test like eq_int() or assignment syntax like
> assign_str() per type.
>
> There should just be a single syntax that works for all types, in the
> general case, for testing whether a value is a member of that type, or
> alternately whether a value can be cast to a particular type.

Good point. That also permits a general-case implementation that catches
any exceptions thrown, with optimized exception-free cases for
int/date/string etc behind the scenes.

That'd do a good job of protecting the SQL programmer from having to
deal with which types had tests and which they had to use exception
handling for. It'd have to be documented due to the performance
differences, but it'd be great to be able to do this in a general
purpose way at the SQL level.

> Pg already gets it right in this regard by having a single general
> syntax for type casting, the "<value>::<type-name>" and value membership
> of a type should be likewise.

or the standard:

CAST(value AS typename)

> Maybe to test if a value can be cast as a type, you can continue the ::
> mnemonic, say adding a "?" for yes and a "!" for no.
>
> For example, "<value>?::<type-name>" tests if the value can be cast as
> the type and "<value>!::<type-name>" or "not <value>?::<type-name>"
> tests the opposite. An expression like this results in a boolean.

Personal opinion here: Blech, if I wanted to use Perl6 I'd do so ;-)

Good shorthand, I guess, but a CAST syntax extension or alternate CAST
version would be a bonus for readability.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/

From:	Darren Duncan <darren(at)darrenduncan(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-25 08:05:00
Message-ID:	4C9DAD2C.7070404@darrenduncan.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Craig Ringer wrote:
> On 25/09/2010 11:51 AM, Darren Duncan wrote:
>> There should just be a single syntax that works for all types, in the
>> general case, for testing whether a value is a member of that type, or
>> alternately whether a value can be cast to a particular type.
<snip>
>> Pg already gets it right in this regard by having a single general
>> syntax for type casting, the "<value>::<type-name>" and value membership
>> of a type should be likewise.
>
> or the standard:
>
> CAST(value AS typename)

Indeed. The exact syntax doesn't matter to me but the point is that the type
name is its own lexical element, conceptually a function argument, rather than
being a substring of another one.

>> Maybe to test if a value can be cast as a type, you can continue the ::
>> mnemonic, say adding a "?" for yes and a "!" for no.
>>
>> For example, "<value>?::<type-name>" tests if the value can be cast as
>> the type and "<value>!::<type-name>" or "not <value>?::<type-name>"
>> tests the opposite. An expression like this results in a boolean.
>
> Personal opinion here: Blech, if I wanted to use Perl6 I'd do so ;-)

I see that someone has been paying attention.

Yes, the idea of using ? or ! to derive a boolean expression from some other
expression did indeed come from Perl 6. The ? means "is so", ! means "is not".
A very useful mnemonic in general.

> Good shorthand, I guess, but a CAST syntax extension or alternate CAST
> version would be a bonus for readability.

Well, sure. But something consistent with cast syntax that Pg provides.

-- Darren Duncan

From:	Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-25 08:36:24
Message-ID:	AANLkTi=hpDUbPiv4Q7cSAP7jFNB6+tyGu=Fs_vETvYqn@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 24, 2010 at 11:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Sep 24, 2010 at 3:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> On Tue, Sep 21, 2010 at 7:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> There are many rules that you could possibly make for type input
> >>> functions. But "you cannot throw an error" is not one of them ---
> >>> or at least, not one that you can usefully expect to be followed
> >>> for anything more than trivial straightline code.
> >
> >> OK. This is one of the things I don't understand. Why does throwing
> >> an error imply that we need to abort the current transaction? Why
> >> can't we just catch the longjmp() and trundle onwards? Obviously,
> >> that's unsafe if a pretty wide variety of cases, but if you're just
> >> scrutinizing the input string (even with a little bit of read-only
> >> database access) it's not obvious to me what can go wrong.
> >
> > The problem is to know that "all you did" was scrutinize the input
> > string. If it's simple straightline code (even with some C library
> > calls) then you can know that, but then you can write such code without
> > including any elog(ERROR) in it in the first place. If you are trapping
> > longjmps then what you'd need to assert is that no error thrown from
> > anywhere in any of the code reachable from that place represents a
> > problem that requires transaction abort to clean up after. This gets
> > unmaintainable remarkably quickly, especially if you invoke anything
> > as complicated as database access. And then there are asynchronous
> > error reasons (query cancel) which you shouldn't trap in any case.
>
> Hmm. So the problem is that we don't want to accidentally catch an
> error that isn't actually safe to catch. We could probably mitigate
> this problem to a considerable degree by throwing data validation
> errors using some special flag that say "this is a recoverable error".
> And if that flag isn't set then we abort the whole transaction, but
> if it is then we continue on. It's still possible for the person
> writing the typinput function to set that flag when they should not,
> but at least it's less likely to happen by accident. Another
> alternative would be to create some kind of explicit way for the
> function to RETURN an error instead of throwing it.
>
> But neither of these things is totally bullet-proof, because you could
> still do something that requires clean-up and then lie about it. To
> protect against that, you'd presumably need to set some kind of a flag
> whenever, say, a heap tuple gets modified, and then you could assert
> said flag false. What, other than writing to the database, requires
> subtransaction cleanup?
>
>
Andrew suggested upthread:

<snip>
test_date := date_in(textout(some_text));

In plpgsql you'd put that inside a begin/exception/end block that traps
SQLSTATE '22000' which is the class covering data exceptions.
</snip>

In context of COPY, can we check for this SQLSTATE in PG_CATCH() and
avoid PG_RE_THROW(). This would require that all type input functions call
errcode() with proper MAKE_SQLSTATE() as part of ereport()/elog().

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh(dot)gurjeet(at){ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-25 14:34:35
Message-ID:	16584.1285425275@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com> writes:
> On Fri, Sep 24, 2010 at 11:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Hmm. So the problem is that we don't want to accidentally catch an
>> error that isn't actually safe to catch. We could probably mitigate
>> this problem to a considerable degree by throwing data validation
>> errors using some special flag that say "this is a recoverable error".

> Andrew suggested upthread:
> <snip>
> test_date := date_in(textout(some_text));
> In plpgsql you'd put that inside a begin/exception/end block that traps
> SQLSTATE '22000' which is the class covering data exceptions.
> </snip>

> In context of COPY, can we check for this SQLSTATE in PG_CATCH() and
> avoid PG_RE_THROW(). This would require that all type input functions call
> errcode() with proper MAKE_SQLSTATE() as part of ereport()/elog().

This is all pretty much a dead end, because it offers no confidence
whatsoever. Suppose that COPY calls type X's input function, which
calls function Y, which calls function Z. Z doesn't like what it sees
so it throws an error, which it marks "recoverable" since Z hasn't
done anything dangerous. Unfortunately, Y *did* do something that
requires cleanup. If COPY catches the longjmp and decides that it
can skip doing a transaction abort, you're screwed.

Again, the problem doesn't really arise as long as you're thinking about
simple code. You can fairly easily verify that no unsafe situation
arises so long as there's only one or two layers of functions involved.
But this approach just doesn't scale to anything complicated.

What I'm wondering is whether we can fix this by reducing the overhead
of subtransactions, enough so that we can afford to run each row's input
function calls within a subxact. In the past that was dismissed because
you'd run out of subxact XIDs at 4G rows. But we have "lazy" assignment
of XIDs now, so a subxact that didn't actually try to modify the
database shouldn't need to consume any permanent resources. Then we're
just looking at the time needed to call all the per-module subxact start
and subxact cleanup functions, which seems like something that might be
optimizable for the typical case where nothing actually needs to be
done.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "Colin 't Hart" <colinthart(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: What happened to the is_<type> family of functions proposal?
Date:	2010-09-26 03:03:35
Message-ID:	AANLkTinChAnFEQP_yQ1eFyCpM53XfsVFUBP=4sbM_ODP@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Sep 25, 2010 at 10:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> This is all pretty much a dead end, because it offers no confidence
> whatsoever. Suppose that COPY calls type X's input function, which
> calls function Y, which calls function Z. Z doesn't like what it sees
> so it throws an error, which it marks "recoverable" since Z hasn't
> done anything dangerous. Unfortunately, Y *did* do something that
> requires cleanup. If COPY catches the longjmp and decides that it
> can skip doing a transaction abort, you're screwed.

Yep. Although it seems a bit pathological for Z to do that, because
if Y is doing something like taking an LWLock then Z is some low-level
internals function that is not in a good position to judge whether
error recovery is feasible. The point is not to recover from as many
errors as possible, but to recover specifically from *data validation*
errors, which I would not expect to be the sort of thing thrown from
someplace deep down in the call stack where we're deep in the middle
of things. The toplevel typinput is a pretty good position to know
whether it's done anything shady.

> What I'm wondering is whether we can fix this by reducing the overhead
> of subtransactions, enough so that we can afford to run each row's input
> function calls within a subxact. In the past that was dismissed because
> you'd run out of subxact XIDs at 4G rows. But we have "lazy" assignment
> of XIDs now, so a subxact that didn't actually try to modify the
> database shouldn't need to consume any permanent resources. Then we're
> just looking at the time needed to call all the per-module subxact start
> and subxact cleanup functions, which seems like something that might be
> optimizable for the typical case where nothing actually needs to be
> done.

Well, reducing the overhead of subtransaction cleanup would certainly
be VERY nice, as it would benefit a FAR broader set of use cases than
just typinput functions. It seems a bit tricky though, because
AbortSubTransaction() calls a whole LOT of cleanup functions, and many
of them already have fast-paths. Where do you anticipate getting a
further large speed-up out of that? The problem seems particularly
tricky because those functions are cleaning up different subsystems.
Maybe you could group them in some way and figure out some method of
skipping entire groups with some kind of super-duper fast path, but
it's not obvious to me how to make that work. And I think you'd need
a pretty considerable speed-up, too. My gut says that even knocking
50% off, while it might be really nice for other reasons, is not going
to be enough to make sticking it inside COPY workable. I bet you need
an order-of-magnitude speed-up, maybe more.

It seems like a good slice of the problem here comes from the
difficulties of being certain what the state is after a longjmp. It
seems like you could get around all of these difficulties almost
completely if the type input function were empowered to return either
(1) a Datum which is the result of the conversion or (2) an SQLSTATE
and error message indicating what went wrong. We're already willing
to believe that cleanup isn't required when the function returns
successfully, so we ought to also believe it when the function returns
a failure result (as opposed to throwing an error indicating a
failure). The conditions that require cleanup here are probably
transient: take an LWLock, do something, release the LWLock. As long
as you know that you haven't stopped somewhere in the middle of that
sequence, it seems like it should be reasonably safe.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company