Re: ALTER TYPE 3: add facility to identify further no-work cases

Lists: pgsql-hackers
From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-09 22:03:53
Message-ID: 20110109220353.GD5777@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here I add the notion of an "exemptor function", a property of a cast that
determines when calls to the cast would be superfluous. Such calls can be
removed, reduced to RelabelType expressions, or annotated (via a new field in
FuncExpr) with the applicable exemptions. I modify various parse_coerce.c
functions to retrieve, call, and act on these exemptor functions; this includes
GetCoerceExemptions() from the last patch. I did opt to make
find_typmod_coercion_function return COERCION_PATH_RELABELTYPE when no work is
needed, rather than COERCION_PATH_NONE; this makes it consistent with
find_coercion_pathway's use of that enumeration.

To demonstrate the functionality, I add exemptor functions for varchar and xml.
Originally I was only going to start with varchar, but xml tests the other major
code path, and the exemptor function for xml is dead simple.

This helps on conversions like varchar(4)->varchar(8) and text->xml.

Attachment Content-Type Size
at3-exemptor.patch text/plain 81.0 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-25 00:18:47
Message-ID: AANLkTimKmCPCjjk-SyE1+ymbwy7FNnfzBcN1A5KT7UY5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 9, 2011 at 5:03 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> Here I add the notion of an "exemptor function", a property of a cast that
> determines when calls to the cast would be superfluous.  Such calls can be
> removed, reduced to RelabelType expressions, or annotated (via a new field in
> FuncExpr) with the applicable exemptions.  I modify various parse_coerce.c
> functions to retrieve, call, and act on these exemptor functions; this includes
> GetCoerceExemptions() from the last patch.  I did opt to make
> find_typmod_coercion_function return COERCION_PATH_RELABELTYPE when no work is
> needed, rather than COERCION_PATH_NONE; this makes it consistent with
> find_coercion_pathway's use of that enumeration.
>
> To demonstrate the functionality, I add exemptor functions for varchar and xml.
> Originally I was only going to start with varchar, but xml tests the other major
> code path, and the exemptor function for xml is dead simple.
>
> This helps on conversions like varchar(4)->varchar(8) and text->xml.

I've read through this patch somewhat. As I believe Tom also
commented previously, exemptor is a pretty bad choice of name. More
generally, I think this patch is a case of coding something
complicated without first achieving consensus on what the behavior
should be. I think we need to address that problem first, and then
decide what code needs to be written, and then write the code.

It seems to me that patch two in this series has the right idea: skip
rewriting the table in cases where the types are binary coercible
(WITHOUT FUNCTION) or one is an unconstrained domain over the other.
IIUC, the problem this patch is trying to address is that our current
CAST infrastructure is insufficient to identify all cases where this
is true - in particular, it makes the decision solely based on the
type OIDs, without consideration of the typmods. In general, varchar
-> varchar is not binary coercible, but varchar(4) -> varchar(8) is
binary coercible. I think what this patch is trying to do is tag the
call to the cast function with the information that we might not need
to call it, but ISTM it would be better to just notice that the
function call is unnecessary and insert a RelabelType node instead.

*reads archives*

In fact, Tom already made pretty much exactly this suggestion, on a
thread entitled "Avoiding rewrite in ALTER TABLE ALTER TYPE". The
only possible objection I can see to that line of attack is that it's
not clear what the API should be, or whether there might be too much
runtime overhead for the amount of benefit that can be obtained.

This is, incidentally, another problem with this patch - if you want
to make wide-ranging changes to the system like this, you need to
provide a convincing argument that it's not going to compromise
correctness or performance. Just submitting the patch without making
an argument for why it's correct is no good, unless it's so obviously
correct as to be unquestionable, which is certainly not the case here.
You've got to write up some kind of submission notes so that the
reviewer isn't trying to guess why you think this is the right
approach, or spending a lot of time thinking through approaches you've
discarded for good reason. If there is a danger of performance
regression, you need to refute it, preferably with actual benchmarks.
A particular aspect I'm concerned about with this patch in particular:
it strikes me that the overhead of calling the exemptor functions in
the ALTER TABLE case is negligible, but that this might not be true in
other contexts where some of this logic may get invoked - it appears
to implicate the casting machinery much more generally.

I think it's a mistake to merge the handling of the rewrite-vs-noop
case with the check-vs-noop case. They're fundamentally dissimilar.
As noted above, being able to notice the noop case seems like it could
save run-time work during ordinary query execution. But detecting the
"check" case is useless work except in the specific case of a table
rewrite. If we're going to do that at all, it seems to me that it
needs to be done via a code-path that's only going to get invoked in
the case of ALTER TABLE, not something that's going to happen every
time we parse an expression tree. But it seems to me that it might be
better to take the suggestion I made before: forget about the
check-only case for now, and focus on the do-nothing case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-25 04:13:29
Message-ID: 20110125041329.GC20879@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 24, 2011 at 07:18:47PM -0500, Robert Haas wrote:
> On Sun, Jan 9, 2011 at 5:03 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > Here I add the notion of an "exemptor function", a property of a cast that
> > determines when calls to the cast would be superfluous. ?Such calls can be
> > removed, reduced to RelabelType expressions, or annotated (via a new field in
> > FuncExpr) with the applicable exemptions. ?I modify various parse_coerce.c
> > functions to retrieve, call, and act on these exemptor functions; this includes
> > GetCoerceExemptions() from the last patch. ?I did opt to make
> > find_typmod_coercion_function return COERCION_PATH_RELABELTYPE when no work is
> > needed, rather than COERCION_PATH_NONE; this makes it consistent with
> > find_coercion_pathway's use of that enumeration.
> >
> > To demonstrate the functionality, I add exemptor functions for varchar and xml.
> > Originally I was only going to start with varchar, but xml tests the other major
> > code path, and the exemptor function for xml is dead simple.
> >
> > This helps on conversions like varchar(4)->varchar(8) and text->xml.
>
> I've read through this patch somewhat. As I believe Tom also
> commented previously, exemptor is a pretty bad choice of name.

I spent awhile with a thesaurus before coining that. Since Tom's comment, I've
been hoping the next person to complain would at least suggest a better name.
Shooting in the dark on a nomenclature issue isn't fun, and I'm not picky.

> More
> generally, I think this patch is a case of coding something
> complicated without first achieving consensus on what the behavior
> should be. I think we need to address that problem first, and then
> decide what code needs to be written, and then write the code.

Well, that's why I started the design thread "Avoiding rewrite in ALTER TABLE
ALTER TYPE" before writing any code. That thread petered out rather than reach
any consensus. What would you have done then?

> It seems to me that patch two in this series has the right idea: skip
> rewriting the table in cases where the types are binary coercible
> (WITHOUT FUNCTION) or one is an unconstrained domain over the other.
> IIUC, the problem this patch is trying to address is that our current
> CAST infrastructure is insufficient to identify all cases where this
> is true - in particular, it makes the decision solely based on the
> type OIDs, without consideration of the typmods. In general, varchar
> -> varchar is not binary coercible, but varchar(4) -> varchar(8) is
> binary coercible. I think what this patch is trying to do is tag the
> call to the cast function with the information that we might not need
> to call it, but ISTM it would be better to just notice that the
> function call is unnecessary and insert a RelabelType node instead.

This patch does exactly that for varchar(4) -> varchar(8) and similar cases.

> *reads archives*
>
> In fact, Tom already made pretty much exactly this suggestion, on a
> thread entitled "Avoiding rewrite in ALTER TABLE ALTER TYPE". The
> only possible objection I can see to that line of attack is that it's
> not clear what the API should be, or whether there might be too much
> runtime overhead for the amount of benefit that can be obtained.

I believe the patch does implement Tom's suggestion, at least in spirit. He
mentioned expression_planner() as the place to do it. In principle, We could do
it *right there* and avoid any changes to coerce_to_target_type(), but the
overhead would increase. Specifically, we would be walking an already-built
expression tree looking for casts to remove, probably doing a syscache lookup
for every cast to grab the exemptor function OID. Doing it there would prevent
us from directly helping or harming plain old queries. Since ExecRelCheck(),
FormIndexDatum() and other notable paths call expression_planner(), we wouldn't
want to casually add an expensive algorithm there. To avoid the extra syscache
lookups, we'd piggyback off those already done in coerce_to_target_type(). That
brings us to the current implementation. Better to make the entire decision in
coerce_to_target_type() and never add the superfluous node to the expression
tree in the first place.

coerce_to_target_type() and callees seemed like the natural, low cost place to
make the changes. By doing it that way, did I miss some important detail or
implication of Tom's suggestion?

> This is, incidentally, another problem with this patch - if you want
> to make wide-ranging changes to the system like this, you need to
> provide a convincing argument that it's not going to compromise
> correctness or performance. Just submitting the patch without making
> an argument for why it's correct is no good, unless it's so obviously
> correct as to be unquestionable, which is certainly not the case here.
> You've got to write up some kind of submission notes so that the
> reviewer isn't trying to guess why you think this is the right
> approach, or spending a lot of time thinking through approaches you've
> discarded for good reason. If there is a danger of performance
> regression, you need to refute it, preferably with actual benchmarks.
> A particular aspect I'm concerned about with this patch in particular:
> it strikes me that the overhead of calling the exemptor functions in
> the ALTER TABLE case is negligible, but that this might not be true in
> other contexts where some of this logic may get invoked - it appears
> to implicate the casting machinery much more generally.

I assumed that readers of this patch email had read the series introduction
email, which referred them to the design discussion that you found. Surely that
design answered more than zero of "why this is correct". What questions in
particular did you find unanswered?

As for performance, coerce_to_target_type() is certainly in wide use, but it's
not exactly a hot path. I prepared and ran the attached bench-coerce-worst.sql
and bench-coerce-best.sql. Both are quite artificial. The first one basically
asks "Can we measure the new overhead at all?" The second one asks "Would any
ordinary query benefit from removing a superfluous cast from an expression
tree?" The worst case had an 4% _speedup_, and the best case had a 27% speedup,
suggesting answers of "no" and "yes (but not dramatically)". The "worst-case"
speedup doesn't make sense, so it's probably an artifact of some recent change
on master creating a larger performance dip in this pathological test case. I
could rebase the patch and rerun the benchmark, but I doubt an exact figure
would be much more meaningful. Unfortunately, I can't think of any more-natural
tests (help welcome) that would still illustrate any performance difference.

> I think it's a mistake to merge the handling of the rewrite-vs-noop
> case with the check-vs-noop case. They're fundamentally dissimilar.
> As noted above, being able to notice the noop case seems like it could
> save run-time work during ordinary query execution. But detecting the
> "check" case is useless work except in the specific case of a table
> rewrite. If we're going to do that at all, it seems to me that it
> needs to be done via a code-path that's only going to get invoked in
> the case of ALTER TABLE, not something that's going to happen every
> time we parse an expression tree.

I disagree that they're fundamentally dissimilar. They're _fundamentally_
similar, in that they are both fences, standing at different distances, around
the necessity of a cast function. Your argument only mentions performance, and
it boils down to a suggestion that we proceed to optimize things now by
separating these questions. Perhaps we should, but that's a different question.
Based on my benchmarks, I'm not seeing a need to micro-optimize.

> But it seems to me that it might be
> better to take the suggestion I made before: forget about the
> check-only case for now, and focus on the do-nothing case.

Let's revisit this when we're on the same page about the above points.

Thanks,
nm

Attachment Content-Type Size
bench-coerce-worst.sql text/plain 406 bytes
bench-coerce-best.sql text/plain 1.8 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 14:35:24
Message-ID: AANLkTikuqc1V+pRfs-AN_cp6O0XZjEt4gkchc9AAaNCV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 24, 2011 at 11:13 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> > This helps on conversions like varchar(4)->varchar(8) and text->xml.
>>
>> I've read through this patch somewhat.  As I believe Tom also
>> commented previously, exemptor is a pretty bad choice of name.
>
> I spent awhile with a thesaurus before coining that.  Since Tom's comment, I've
> been hoping the next person to complain would at least suggest a better name.
> Shooting in the dark on a nomenclature issue isn't fun, and I'm not picky.

I can't speak for Tom, but I guess my gripe about that name is that we
seem to view a cast as either WITH FUNCTION or WITHOUT FUNCTION. A
cast-without-function is trivial from an implementation point of view,
but it's still a cast, whereas the word "exempt" seems to imply that
you're skipping the cast altogether. A side issue is that I really
want to avoid adding a new parser keyword for this. It doesn't bother
me too much to add keywords for really important and user-visible
features, but when we're adding stuff that's only going to be used by
0.1% of our users it's really better if we can avoid it, because it
slows down the parser. Maybe we could do something like:

CREATE CAST (source_type AS target_type)
WITH FUNCTION function_name (argument_type, [, ...])
[ ANALYZE USING function_name ]
[ AS ASSIGNMENT | AS IMPLICIT ]

Presumably, we wouldn't need the ANALYZE USING clause for the WITHOUT
FUNCTION case, since the answer is a foregone conclusion in that case.
We could possibly support it also for WITH INOUT, but I'm not sure
whether the real-world utility is more than zero.

Internally, we could refer to a function of this type as a "cast analyzer".

Don't take the above as Gospel truth, it's just an idea from one
person on one day.

>> More
>> generally, I think this patch is a case of coding something
>> complicated without first achieving consensus on what the behavior
>> should be.  I think we need to address that problem first, and then
>> decide what code needs to be written, and then write the code.
>
> Well, that's why I started the design thread "Avoiding rewrite in ALTER TABLE
> ALTER TYPE" before writing any code.  That thread petered out rather than reach
> any consensus.  What would you have done then?

Let's defer this question until we're more on the same page about the
rest of this. It's obvious I'm not completely understanding this
patch...

>> I think what this patch is trying to do is tag the
>> call to the cast function with the information that we might not need
>> to call it, but ISTM it would be better to just notice that the
>> function call is unnecessary and insert a RelabelType node instead.
>
> This patch does exactly that for varchar(4) -> varchar(8) and similar cases.

Oh, uh, err... OK, I obviously haven't understood it well enough.
I'll look at it some more, but if there's anything you can write up to
explain what you changed and why, that would be helpful. I think I'm
also having trouble with the fact that these patches don't apply
cleanly against the master branch, because they're stacked up on each
other. Maybe this will be easier once we get more of the ALTER TYPE 2
stuff in.

>
>> *reads archives*
>>
>> In fact, Tom already made pretty much exactly this suggestion, on a
>> thread entitled "Avoiding rewrite in ALTER TABLE ALTER TYPE".  The
>> only possible objection I can see to that line of attack is that it's
>> not clear what the API should be, or whether there might be too much
>> runtime overhead for the amount of benefit that can be obtained.
>
> I believe the patch does implement Tom's suggestion, at least in spirit.  He
> mentioned expression_planner() as the place to do it.  In principle, We could do
> it *right there* and avoid any changes to coerce_to_target_type(), but the
> overhead would increase.  Specifically, we would be walking an already-built
> expression tree looking for casts to remove, probably doing a syscache lookup
> for every cast to grab the exemptor function OID.  Doing it there would prevent
> us from directly helping or harming plain old queries.  Since ExecRelCheck(),
> FormIndexDatum() and other notable paths call expression_planner(), we wouldn't
> want to casually add an expensive algorithm there.  To avoid the extra syscache
> lookups, we'd piggyback off those already done in coerce_to_target_type().  That
> brings us to the current implementation.  Better to make the entire decision in
> coerce_to_target_type() and never add the superfluous node to the expression
> tree in the first place.
>
> coerce_to_target_type() and callees seemed like the natural, low cost place to
> make the changes.  By doing it that way, did I miss some important detail or
> implication of Tom's suggestion?

I'm not sure. Tom?

> As for performance, coerce_to_target_type() is certainly in wide use, but it's
> not exactly a hot path.  I prepared and ran the attached bench-coerce-worst.sql
> and bench-coerce-best.sql.  Both are quite artificial.  The first one basically
> asks "Can we measure the new overhead at all?"  The second one asks "Would any
> ordinary query benefit from removing a superfluous cast from an expression
> tree?"  The worst case had an 4% _speedup_, and the best case had a 27% speedup,
> suggesting answers of "no" and "yes (but not dramatically)".  The "worst-case"
> speedup doesn't make sense, so it's probably an artifact of some recent change
> on master creating a larger performance dip in this pathological test case.  I
> could rebase the patch and rerun the benchmark, but I doubt an exact figure
> would be much more meaningful.  Unfortunately, I can't think of any more-natural
> tests (help welcome) that would still illustrate any performance difference.

That certainly seems promising, but I don't understand how the new
code can be faster on your constructed worst case. Thoughts?

>> I think it's a mistake to merge the handling of the rewrite-vs-noop
>> case with the check-vs-noop case.  They're fundamentally dissimilar.
>> As noted above, being able to notice the noop case seems like it could
>> save run-time work during ordinary query execution.  But detecting the
>> "check" case is useless work except in the specific case of a table
>> rewrite.  If we're going to do that at all, it seems to me that it
>> needs to be done via a code-path that's only going to get invoked in
>> the case of ALTER TABLE, not something that's going to happen every
>> time we parse an expression tree.
>
> I disagree that they're fundamentally dissimilar.  They're _fundamentally_
> similar, in that they are both fences, standing at different distances, around
> the necessity of a cast function.  Your argument only mentions performance, and
> it boils down to a suggestion that we proceed to optimize things now by
> separating these questions.  Perhaps we should, but that's a different question.
> Based on my benchmarks, I'm not seeing a need to micro-optimize.

I'm more concerned about modularity than performance. Sticking a
field into every FuncExpr so that if it happens to get passed to an
ALTER TABLE statement we can pull the flag out seems really ugly to
me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 17:47:22
Message-ID: 21002.1296064042@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> ... A side issue is that I really
> want to avoid adding a new parser keyword for this. It doesn't bother
> me too much to add keywords for really important and user-visible
> features, but when we're adding stuff that's only going to be used by
> 0.1% of our users it's really better if we can avoid it, because it
> slows down the parser. Maybe we could do something like:

> CREATE CAST (source_type AS target_type)
> WITH FUNCTION function_name (argument_type, [, ...])
> [ ANALYZE USING function_name ]
> [ AS ASSIGNMENT | AS IMPLICIT ]

I'm not terribly thrilled with the suggestion of "ANALYZE" here, because
given the way we use that word elsewhere, people are likely to think it
means something to do with statistics collection; or at least that it
implies some deep and complicated analysis of the cast.

I suggest using a phrase based on the idea that this function tells you
whether you can skip the cast, or (if the sense is inverted) whether the
cast has to be executed. "SKIP IF function_name" would be nice but SKIP
isn't an extant keyword either. The best variant that I can come up
with after a minute's contemplation of kwlist.h is "NO WORK IF
function_name". If you didn't mind inverting the sense of the result
then we could use "EXECUTE IF function_name".

> Internally, we could refer to a function of this type as a "cast analyzer".

Another possibility is to call it a "cast checker" and use CHECK USING.
But I'm not sure that's much better than ANALYZE in terms of conveying
the purpose to a newbie.

>> I believe the patch does implement Tom's suggestion, at least in spirit. He
>> mentioned expression_planner() as the place to do it. In principle, We could do
>> it *right there* and avoid any changes to coerce_to_target_type(), but the
>> overhead would increase. Specifically, we would be walking an already-built
>> expression tree looking for casts to remove, probably doing a syscache lookup
>> for every cast to grab the exemptor function OID.

No no no no. The right place to do it is during expression
simplification in eval_const_expressions(). We are already
disassembling and reassembling the tree, and looking up functions,
in that pass. Detecting that a call is a no-op fits into that
very naturally. Think of it as an alternative to inline_function().

One point worth making here is that eval_const_expressions() does not
currently care very much whether a function call came from cast syntax
or explicitly. It might be worth thinking about whether we want to have
a generic this-function-call-is-a-no-op simplifier hook available for
*all* functions not just those that are casts. I'm not sure we want to
pay the overhead of another pg_proc column, but it's something to think
about.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 19:18:09
Message-ID: AANLkTinHOJ2AjpTLvdryD=cpqK+PG9onmbKa+kZ5aGHy@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 12:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> ... A side issue is that I really
>> want to avoid adding a new parser keyword for this.  It doesn't bother
>> me too much to add keywords for really important and user-visible
>> features, but when we're adding stuff that's only going to be used by
>> 0.1% of our users it's really better if we can avoid it, because it
>> slows down the parser.  Maybe we could do something like:
>
>> CREATE CAST (source_type AS target_type)
>>     WITH FUNCTION function_name (argument_type, [, ...])
>>     [ ANALYZE USING function_name ]
>>     [ AS ASSIGNMENT | AS IMPLICIT ]
>
> I'm not terribly thrilled with the suggestion of "ANALYZE" here, because
> given the way we use that word elsewhere, people are likely to think it
> means something to do with statistics collection; or at least that it
> implies some deep and complicated analysis of the cast.
>
> I suggest using a phrase based on the idea that this function tells you
> whether you can skip the cast, or (if the sense is inverted) whether the
> cast has to be executed.  "SKIP IF function_name" would be nice but SKIP
> isn't an extant keyword either.  The best variant that I can come up
> with after a minute's contemplation of kwlist.h is "NO WORK IF
> function_name".  If you didn't mind inverting the sense of the result
> then we could use "EXECUTE IF function_name".

What about borrowing from the trigger syntax?

WITH FUNCTION function_name (argument_type, [...]) WHEN
function_that_returns_true_when_the_call_is_needed

> One point worth making here is that eval_const_expressions() does not
> currently care very much whether a function call came from cast syntax
> or explicitly.  It might be worth thinking about whether we want to have
> a generic this-function-call-is-a-no-op simplifier hook available for
> *all* functions not just those that are casts.  I'm not sure we want to
> pay the overhead of another pg_proc column, but it's something to think
> about.

It's not obvious to me that it has a use case outside of casts, but
it's certainly possible I'm missing something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 20:08:51
Message-ID: 24348.1296072531@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Jan 26, 2011 at 12:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> If you didn't mind inverting the sense of the result
>> then we could use "EXECUTE IF function_name".

> What about borrowing from the trigger syntax?

> WITH FUNCTION function_name (argument_type, [...]) WHEN
> function_that_returns_true_when_the_call_is_needed

That's a good thought. Or we could use WHEN NOT check_function if you
want to keep to the negative-result semantics.

>> One point worth making here is that eval_const_expressions() does not
>> currently care very much whether a function call came from cast syntax
>> or explicitly. It might be worth thinking about whether we want to have
>> a generic this-function-call-is-a-no-op simplifier hook available for
>> *all* functions not just those that are casts. I'm not sure we want to
>> pay the overhead of another pg_proc column, but it's something to think
>> about.

> It's not obvious to me that it has a use case outside of casts, but
> it's certainly possible I'm missing something.

A possible example is simplifying X + 0 to X, or X * 0 to 0. I've never
wanted to inject such datatype-specific knowledge directly into the
planner, but if we viewed it as function-specific knowledge supplied by
a per-function helper routine, maybe it would fly. Knowing that a cast
function does nothing useful for particular cases could be seen as a
special case of this type of simplification.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 20:47:27
Message-ID: AANLkTikYMzmr1SZwE64pLeRC=XT=xUb1chjLmU6hDF8V@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 3:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Jan 26, 2011 at 12:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> If you didn't mind inverting the sense of the result
>>> then we could use "EXECUTE IF function_name".
>
>> What about borrowing from the trigger syntax?
>
>> WITH FUNCTION function_name (argument_type, [...]) WHEN
>> function_that_returns_true_when_the_call_is_needed
>
> That's a good thought.  Or we could use WHEN NOT check_function if you
> want to keep to the negative-result semantics.

Seems a bit contorted; I don't see any particular reason to prefer
positive vs negative semantics in this case.

>>> One point worth making here is that eval_const_expressions() does not
>>> currently care very much whether a function call came from cast syntax
>>> or explicitly.  It might be worth thinking about whether we want to have
>>> a generic this-function-call-is-a-no-op simplifier hook available for
>>> *all* functions not just those that are casts.  I'm not sure we want to
>>> pay the overhead of another pg_proc column, but it's something to think
>>> about.
>
>> It's not obvious to me that it has a use case outside of casts, but
>> it's certainly possible I'm missing something.
>
> A possible example is simplifying X + 0 to X, or X * 0 to 0.  I've never
> wanted to inject such datatype-specific knowledge directly into the
> planner, but if we viewed it as function-specific knowledge supplied by
> a per-function helper routine, maybe it would fly.  Knowing that a cast
> function does nothing useful for particular cases could be seen as a
> special case of this type of simplification.

Oh, I see. The times I've seen an issue with those kinds of
expressions have always been related to selectivity estimation. For
example, you'd like to get a selectivity estimate of 1-nullfrac for
(x+0)=x and 0 for (x+1)=x, and maybe (1-nullfrac)/2 for (x%2)=0. This
would only handle the first of those cases, so I'm inclined to think
it's too weak to have much general utility. The casting cases can, I
think, have a much larger impact - they occur more often in practice,
and if you can (e.g.) avoid an entire table rewrite, that's a pretty
big deal.

Another semi-related problem case I've run across is that CASE WHEN
... THEN 1 WHEN ... THEN 2 END ought to be known to only ever emit 1
and 2, and the selectivity of comparing that to some other value ought
to be computed on that basis. But now I'm really wandering off into
the weeds.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 21:02:53
Message-ID: 25440.1296075773@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Jan 26, 2011 at 3:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> It's not obvious to me that it has a use case outside of casts, but
>>> it's certainly possible I'm missing something.

>> A possible example is simplifying X + 0 to X, or X * 0 to 0.

> Oh, I see. The times I've seen an issue with those kinds of
> expressions have always been related to selectivity estimation.

Yeah, helping the planner recognize equivalent cases is at least as
large a reason for wanting this as any direct savings of execution time.

I don't mind confining the feature to casts to start with, but it might
be a good idea to specify the check-function API in a way that would let
it be plugged into a more generally available call-simplification hook
later.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 21:26:25
Message-ID: AANLkTikwvb1kcuEma-bCmP+DMKxha+KqkubG7rL4Z8V8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 4:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Jan 26, 2011 at 3:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> It's not obvious to me that it has a use case outside of casts, but
>>>> it's certainly possible I'm missing something.
>
>>> A possible example is simplifying X + 0 to X, or X * 0 to 0.
>
>> Oh, I see.  The times I've seen an issue with those kinds of
>> expressions have always been related to selectivity estimation.
>
> Yeah, helping the planner recognize equivalent cases is at least as
> large a reason for wanting this as any direct savings of execution time.

Agreed.

> I don't mind confining the feature to casts to start with, but it might
> be a good idea to specify the check-function API in a way that would let
> it be plugged into a more generally available call-simplification hook
> later.

Got any suggestions? My thought was that it should just get (type,
typemod, type, typemod) and return a boolean. We could do something
more complicated, like Expr -> Expr, but I'm pretty unconvinced
there's any value there. I'd like to see some kind of appropriate
hook for interjecting selectivity estimates for cases that we
currently can't recognize, but my gut feeling is that that's
insufficiently related at the problem at hand to worry about it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 22:01:01
Message-ID: 26702.1296079261@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Jan 26, 2011 at 4:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I don't mind confining the feature to casts to start with, but it might
>> be a good idea to specify the check-function API in a way that would let
>> it be plugged into a more generally available call-simplification hook
>> later.

> Got any suggestions? My thought was that it should just get (type,
> typemod, type, typemod) and return a boolean. We could do something
> more complicated, like Expr -> Expr, but I'm pretty unconvinced
> there's any value there.

Well, (type, typemod, type, typemod) would be fine as long as the only
case you're interested in optimizing is typemod-lengthening coercions.
I think we're going to want the more general Expr-to-Expr capability
eventually.

I guess we could do the restricted case for now, with the idea that
there could be a standard Expr-to-Expr interface function created later
and installed into the generic hook facility for functions that are cast
functions. That could look into pg_cast and make the more restricted
call when appropriate. (The meat of this function would be the same
code you'd have to add to eval_const_expressions anyway for the
hard-wired version...)

> I'd like to see some kind of appropriate
> hook for interjecting selectivity estimates for cases that we
> currently can't recognize, but my gut feeling is that that's
> insufficiently related at the problem at hand to worry about it.

Agreed, that seems a bit off-topic. There are ancient comments in the
source code complaining about the lack of selectivity hooks for function
(as opposed to operator) calls, but that's not what Noah signed up to
fix. In any case I'm not sure that expression simplification is
closely connected to selectivity estimation --- to my mind it's an
independent process that is good to run first. As an example, the ALTER
TABLE case has a use for the former but not the latter.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 22:12:00
Message-ID: AANLkTin0JNVtTnRgWJ1WP7CCwgzVP=d8nUr071p0s9kY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 5:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Jan 26, 2011 at 4:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I don't mind confining the feature to casts to start with, but it might
>>> be a good idea to specify the check-function API in a way that would let
>>> it be plugged into a more generally available call-simplification hook
>>> later.
>
>> Got any suggestions?  My thought was that it should just get (type,
>> typemod, type, typemod) and return a boolean.  We could do something
>> more complicated, like Expr -> Expr, but I'm pretty unconvinced
>> there's any value there.
>
> Well, (type, typemod, type, typemod) would be fine as long as the only
> case you're interested in optimizing is typemod-lengthening coercions.
> I think we're going to want the more general Expr-to-Expr capability
> eventually.
>
> I guess we could do the restricted case for now, with the idea that
> there could be a standard Expr-to-Expr interface function created later
> and installed into the generic hook facility for functions that are cast
> functions.  That could look into pg_cast and make the more restricted
> call when appropriate.  (The meat of this function would be the same
> code you'd have to add to eval_const_expressions anyway for the
> hard-wired version...)

Well, if you're positive we're eventually going to want this in
pg_proc, we may as well add it now. But I'm not too convinced it's
the right general API. The number of people writing exactly x + 0 or
x * 0 in a query has got to be vanishingly small; I'm not eager to add
additional parse analysis time to every SQL statement that has a
function in it just to detect those cases. Even slightly more
complicated problems seem intractable - e.g. (x + 1) = x can be
simplified to constant false, and NOT ((x + 1) = x) can be simplified
to x IS NOT NULL, but under the proposed API those would have to hang
off of =(int4,int4), which seems pretty darn ugly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 22:32:00
Message-ID: 27312.1296081120@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Well, if you're positive we're eventually going to want this in
> pg_proc, we may as well add it now. But I'm not too convinced it's
> the right general API. The number of people writing exactly x + 0 or
> x * 0 in a query has got to be vanishingly small; I'm not eager to add
> additional parse analysis time to every SQL statement that has a
> function in it just to detect those cases.

Actually, you've got that backwards: the facility I've got in mind would
cost next to nothing when not used. The place where we'd want to insert
this in eval_const_expressions has already got its hands on the relevant
pg_proc row, so checking for a nonzero hook-function reference would be
a matter of a couple of instructions. If we go with a pg_cast entry
then we're going to have to add a pg_cast lookup for every cast, whether
it turns out to be optimizable or not; which is going to cost quite a
lot more. The intermediate hook function I was sketching might be
worthwhile from a performance standpoint even if we don't expose the
more general feature to users, just because it would be possible to
avoid useless pg_cast lookups (by not installing the hook except on
pg_proc entries for which there's a relevant CAST WHEN function to call).

> Even slightly more
> complicated problems seem intractable - e.g. (x + 1) = x can be
> simplified to constant false, and NOT ((x + 1) = x) can be simplified
> to x IS NOT NULL, but under the proposed API those would have to hang
> off of =(int4,int4), which seems pretty darn ugly.

True, but where else are you going to hang them off of? Not that I was
particularly thinking of doing either one of those.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 22:35:23
Message-ID: AANLkTinNymbcnUgrDKLdn26p2nvMYRchSoXaYt3tu4vD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 5:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Well, if you're positive we're eventually going to want this in
>> pg_proc, we may as well add it now.  But I'm not too convinced it's
>> the right general API.  The number of people writing exactly x + 0 or
>> x * 0 in a query has got to be vanishingly small; I'm not eager to add
>> additional parse analysis time to every SQL statement that has a
>> function in it just to detect those cases.
>
> Actually, you've got that backwards: the facility I've got in mind would
> cost next to nothing when not used.  The place where we'd want to insert
> this in eval_const_expressions has already got its hands on the relevant
> pg_proc row, so checking for a nonzero hook-function reference would be
> a matter of a couple of instructions.  If we go with a pg_cast entry
> then we're going to have to add a pg_cast lookup for every cast, whether
> it turns out to be optimizable or not; which is going to cost quite a
> lot more.  The intermediate hook function I was sketching might be
> worthwhile from a performance standpoint even if we don't expose the
> more general feature to users, just because it would be possible to
> avoid useless pg_cast lookups (by not installing the hook except on
> pg_proc entries for which there's a relevant CAST WHEN function to call).

Oh, really? I was thinking the logic should go into find_coercion_pathway().

>> Even slightly more
>> complicated problems seem intractable - e.g. (x + 1) = x can be
>> simplified to constant false, and NOT ((x + 1) = x) can be simplified
>> to x IS NOT NULL, but under the proposed API those would have to hang
>> off of =(int4,int4), which seems pretty darn ugly.
>
> True, but where else are you going to hang them off of?  Not that I was
> particularly thinking of doing either one of those.

Beats me, just thinking out loud.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 22:41:44
Message-ID: 27542.1296081704@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Oh, really? I was thinking the logic should go into find_coercion_pathway().

Well, I've been saying right along that it should be in
eval_const_expressions. Putting this sort of optimization in the parser
is invariably the wrong thing, because it fails to catch all the
possibilities. As an example, inlining a SQL function could expose
opportunities of this type. Another issue is that premature
optimization in the parser creates headaches if conditions change such
that a previous optimization is no longer valid --- you may have stored
rules wherein the optimization was already applied. (Not sure that
specific issue applies to casting, since we have no ALTER CAST commmand;
but in general you want expression optimizations applied downstream from
the rule rewriter not upstream.)

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 22:54:43
Message-ID: AANLkTinKVDt_vqFp3KBNq2Kv4sg+vEAceQrmk6YaFb-k@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 5:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Oh, really?  I was thinking the logic should go into find_coercion_pathway().
>
> Well, I've been saying right along that it should be in
> eval_const_expressions.  Putting this sort of optimization in the parser
> is invariably the wrong thing, because it fails to catch all the
> possibilities.  As an example, inlining a SQL function could expose
> opportunities of this type.  Another issue is that premature
> optimization in the parser creates headaches if conditions change such
> that a previous optimization is no longer valid --- you may have stored
> rules wherein the optimization was already applied.  (Not sure that
> specific issue applies to casting, since we have no ALTER CAST commmand;
> but in general you want expression optimizations applied downstream from
> the rule rewriter not upstream.)

Well, I guess my thought was that we what we were doing is extending
the coercion system to be able to make decisions based on both type
OID and typemod. It seems very odd to make an initial decision based
on type OID in one place and then have a completely different system
for incorporating the typemod; and it also seems quite a bit more
expensive, since you'd have to consider it for every function, not
just casts.

A further problem is that I don't think it'd play well at all with the
richer-typemod concept we keep bandying about. If, for example, we
represented all arrays with the same OID (getting rid of the
double-entries in pg_type) and used some kind of augmented-typemod to
store the number of dimensions and the base type, this would
completely fall over.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 22:55:37
Message-ID: 27835.1296082537@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> ... Another issue is that premature
> optimization in the parser creates headaches if conditions change such
> that a previous optimization is no longer valid --- you may have stored
> rules wherein the optimization was already applied. (Not sure that
> specific issue applies to casting, since we have no ALTER CAST commmand;
> but in general you want expression optimizations applied downstream from
> the rule rewriter not upstream.)

Actually, I can construct a concrete example where applying this
optimization in the parser will do the wrong thing:

CREATE TABLE base (f1 varchar(4));
CREATE VIEW vv AS SELECT f1::varchar(8) FROM base;
ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);

If the parser is taught to throw away "useless" length coercions, then
the stored form of vv will contain no cast, and the results will be
wrong after base's column is widened.

regards, tom lane


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 23:01:11
Message-ID: 20110126230111.GA2498@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 05:55:37PM -0500, Tom Lane wrote:
> I wrote:
> > ... Another issue is that premature
> > optimization in the parser creates headaches if conditions change such
> > that a previous optimization is no longer valid --- you may have stored
> > rules wherein the optimization was already applied. (Not sure that
> > specific issue applies to casting, since we have no ALTER CAST commmand;
> > but in general you want expression optimizations applied downstream from
> > the rule rewriter not upstream.)
>
> Actually, I can construct a concrete example where applying this
> optimization in the parser will do the wrong thing:
>
> CREATE TABLE base (f1 varchar(4));
> CREATE VIEW vv AS SELECT f1::varchar(8) FROM base;
> ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);
>
> If the parser is taught to throw away "useless" length coercions, then
> the stored form of vv will contain no cast, and the results will be
> wrong after base's column is widened.

We reject that:

[local] test=# ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);
ERROR: cannot alter type of a column used by a view or rule
DETAIL: rule _RETURN on view vv depends on column "f1"


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 23:06:02
Message-ID: 28015.1296083162@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Well, I guess my thought was that we what we were doing is extending
> the coercion system to be able to make decisions based on both type
> OID and typemod.

Well, that's an interesting thought, but the proposal at hand is far
more limited than that --- it's only an optimization that can be applied
in limited situations. If we want to do something like what you're
suggesting here, it's not going to get done for 9.1.

> A further problem is that I don't think it'd play well at all with the
> richer-typemod concept we keep bandying about. If, for example, we
> represented all arrays with the same OID (getting rid of the
> double-entries in pg_type) and used some kind of augmented-typemod to
> store the number of dimensions and the base type, this would
> completely fall over.

Your proposal would completely fall over, you mean. An Expr->Expr hook
API wouldn't be affected at all.

I'm not sure how important that concern is though, because it's hard to
see how any such change wouldn't break existing cast implementation
functions anyway. If the API for length-coercing cast functions
changes, breaking their helper functions too hardly seems like an issue.
Or are you saying you want to punt this whole proposal till after that
happens? I had muttered something of the sort way upthread, but I
didn't think anyone else thought that way.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 23:12:26
Message-ID: 28148.1296083546@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Wed, Jan 26, 2011 at 05:55:37PM -0500, Tom Lane wrote:
>> Actually, I can construct a concrete example where applying this
>> optimization in the parser will do the wrong thing:
>>
>> CREATE TABLE base (f1 varchar(4));
>> CREATE VIEW vv AS SELECT f1::varchar(8) FROM base;
>> ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);
>>
>> If the parser is taught to throw away "useless" length coercions, then
>> the stored form of vv will contain no cast, and the results will be
>> wrong after base's column is widened.

> We reject that:

> [local] test=# ALTER TABLE base ALTER COLUMN f1 TYPE varchar(16);
> ERROR: cannot alter type of a column used by a view or rule
> DETAIL: rule _RETURN on view vv depends on column "f1"

Nonetheless, the stored form of vv will contain no cast, which can
hardly be thought safe here. Relaxing the restriction you cite is (or
should be) on the TODO list, but we'll never be able to do it if the
parser throws away relevant information.

regards, tom lane


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 23:13:17
Message-ID: 20110126231317.GB2498@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 05:32:00PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > Well, if you're positive we're eventually going to want this in
> > pg_proc, we may as well add it now. But I'm not too convinced it's
> > the right general API. The number of people writing exactly x + 0 or
> > x * 0 in a query has got to be vanishingly small; I'm not eager to add
> > additional parse analysis time to every SQL statement that has a
> > function in it just to detect those cases.
>
> Actually, you've got that backwards: the facility I've got in mind would
> cost next to nothing when not used. The place where we'd want to insert
> this in eval_const_expressions has already got its hands on the relevant
> pg_proc row, so checking for a nonzero hook-function reference would be
> a matter of a couple of instructions. If we go with a pg_cast entry
> then we're going to have to add a pg_cast lookup for every cast, whether
> it turns out to be optimizable or not; which is going to cost quite a
> lot more. The intermediate hook function I was sketching might be
> worthwhile from a performance standpoint even if we don't expose the
> more general feature to users, just because it would be possible to
> avoid useless pg_cast lookups (by not installing the hook except on
> pg_proc entries for which there's a relevant CAST WHEN function to call).

If we hook this into eval_const_expressions, it definitely seems cleaner to
attach the auxiliary function to the pg_proc. Otherwise, we'd reconstruct which
cast led to each function call -- is there even enough information available to
do so unambiguously? Unlike something typmod-specific, these functions would
effectively need to be written in C. Seems like a perfectly acceptable
constraint, though.

For the syntax, then, would a new common_func_opt_item of "WHEN func" fit?

That covers fully-removable casts, but ALTER TABLE still needs to identify casts
that may throw errors but never change the value's binary representation. Where
does that fit? Another pg_proc column for a function called to answer that
question, called only from an ALTER TABLE-specific code path?

Thanks for the feedback/analysis.

nm


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-26 23:29:57
Message-ID: 28474.1296084597@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> If we hook this into eval_const_expressions, it definitely seems
> cleaner to attach the auxiliary function to the pg_proc. Otherwise,
> we'd reconstruct which cast led to each function call -- is there even
> enough information available to do so unambiguously?

As far as that goes, yes there is --- otherwise ruleutils would be
unable to reverse-list cast constructs. IIRC the function call is
marked as being a cast, and you get the source and dest type IDs
from ordinary exprType() inspection.

> For the syntax, then, would a new common_func_opt_item of "WHEN func" fit?

WHEN is appropriate for the restricted CAST case, but it doesn't seem
like le mot juste for a general function option, unless we restrict
the helper function to return boolean which is not what I had in mind.

> That covers fully-removable casts, but ALTER TABLE still needs to identify casts
> that may throw errors but never change the value's binary representation.

I remain completely unexcited about optimizing that case, especially if
it doesn't fit into a general framework. The bang for the buck ratio
is not good: too much work, too much uglification, too little return.

regards, tom lane


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, jim(at)nasby(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 00:24:57
Message-ID: 20110127002457.GC2498@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 06:29:57PM -0500, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > If we hook this into eval_const_expressions, it definitely seems
> > cleaner to attach the auxiliary function to the pg_proc. Otherwise,
> > we'd reconstruct which cast led to each function call -- is there even
> > enough information available to do so unambiguously?
>
> As far as that goes, yes there is --- otherwise ruleutils would be
> unable to reverse-list cast constructs. IIRC the function call is
> marked as being a cast, and you get the source and dest type IDs
> from ordinary exprType() inspection.

Good point. So it is, FuncExpr.funcformat conveys that.

> > For the syntax, then, would a new common_func_opt_item of "WHEN func" fit?
>
> WHEN is appropriate for the restricted CAST case, but it doesn't seem
> like le mot juste for a general function option, unless we restrict
> the helper function to return boolean which is not what I had in mind.

True. Uhh, "PARSER MAPPING func"? "PLANS CONVERSION func"?

> > That covers fully-removable casts, but ALTER TABLE still needs to identify casts
> > that may throw errors but never change the value's binary representation.
>
> I remain completely unexcited about optimizing that case, especially if
> it doesn't fit into a general framework. The bang for the buck ratio
> is not good: too much work, too much uglification, too little return.

The return looks attractive when you actually save six hours of downtime. If
I'm the only one that sees such a savings for one of his databases, though, I
suppose it's not worthwhile. We'd miss optimizing these cases:

numeric(8,2) -> numeric(7,2)
varbit(8) -> varbit(7)
text -> xml

The xml one is the most interesting to me. In the design thread, Robert also
expressed interest in that one. Jim Nasby mentioned numeric generally; Jim,
what kind of numeric conversions are important to you? If anyone else will miss
one of those greatly, please speak up.

I found that many interesting cases in this area, most notably varchar(8) ->
varchar(4), are safe a good deal of the time, but not provably so. So perhaps a
system for automatically detecting them would be overkill, but an addition to
the ALTER TABLE syntax[1] to request them would be worthwhile.

nm

[1] http://archives.postgresql.org/message-id/20110106042626.GA28230@tornado.leadboat.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, jim(at)nasby(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 00:44:43
Message-ID: 9482.1296089083@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Wed, Jan 26, 2011 at 06:29:57PM -0500, Tom Lane wrote:
>> I remain completely unexcited about optimizing that case, especially if
>> it doesn't fit into a general framework. The bang for the buck ratio
>> is not good: too much work, too much uglification, too little return.

> The return looks attractive when you actually save six hours of downtime. If
> I'm the only one that sees such a savings for one of his databases, though, I
> suppose it's not worthwhile. We'd miss optimizing these cases:

> numeric(8,2) -> numeric(7,2)
> varbit(8) -> varbit(7)
> text -> xml

But how often do those really come up? And do you really save that
much? The table still has to be locked against other users, so you're
still down, and you're still doing all the reads and computation. I
don't deny that saving the writes is worth something; I just don't agree
that it's worth the development and maintenance effort that such a wart
is going to cost us. User-exposed features are *expensive*.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, jim(at)nasby(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 00:52:10
Message-ID: 9611.1296089530@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> text -> xml

BTW, that reminds me of something that I think was mentioned way back
when, but absolutely fails to fit into any of the frameworks discussed
here: the mere fact that a type is binary-compatible (with or without
checking) doesn't make it compatible enough to not have to recreate
indexes. Where and how are we going to have a wart to determine if
that's necessary? And if the answer is "rebuild indexes whenever the
data type changes", isn't that a further big dent in the argument that
it's worth avoiding a table rewrite? A text->xml replacement is going
to be far from cheap anyway.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 01:11:35
Message-ID: AANLkTimX3GGw1h9YJmxjXZJtfBPNHJYhKedX9-5N_V-O@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 6:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Well, I guess my thought was that we what we were doing is extending
>> the coercion system to be able to make decisions based on both type
>> OID and typemod.
>
> Well, that's an interesting thought, but the proposal at hand is far
> more limited than that --- it's only an optimization that can be applied
> in limited situations.  If we want to do something like what you're
> suggesting here, it's not going to get done for 9.1.

Eh, why is this not entirely straightforward? *scratches head*

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, jim(at)nasby(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 01:12:53
Message-ID: AANLkTinX6uu0YBcDdPavFNLHrQQbgRK_3i96VUabtPqW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 7:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> But how often do those really come up?  And do you really save that
> much?  The table still has to be locked against other users, so you're
> still down, and you're still doing all the reads and computation.  I
> don't deny that saving the writes is worth something; I just don't agree
> that it's worth the development and maintenance effort that such a wart
> is going to cost us.  User-exposed features are *expensive*.

I would think that text -> [something that's still a varlena but with
tighter validation] would be quite common.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, jim(at)nasby(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 01:13:54
Message-ID: 20110127011354.GA3164@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 07:44:43PM -0500, Tom Lane wrote:
> > numeric(8,2) -> numeric(7,2)
> > varbit(8) -> varbit(7)
> > text -> xml
>
> But how often do those really come up?

I'll speak from my own experience, having little idea of the larger community
experience on this one. I usually don't even contemplate changes like this on
nontrivial tables, because the pain of the downtime usually won't make up for
getting the schema just like I want it. Cases that I can't discard on those
grounds are fairly rare. As an order-of-magnitude estimate, I'd throw out one
instance per DBA-annum among the DBAs whose work I witness.

> And do you really save that
> much? The table still has to be locked against other users, so you're
> still down, and you're still doing all the reads and computation. I
> don't deny that saving the writes is worth something; I just don't agree
> that it's worth the development and maintenance effort that such a wart
> is going to cost us. User-exposed features are *expensive*.

If you have no indexes, you still save 50-75% of the cost by just reading and
computing, not rewriting. Each index you add, even if it doesn't involve the
column, pushes that advantage even further. With a typical handful of indexes,
a 95+% cost savings is common enough.

If we implemented ALTER TABLE ... SET DATA TYPE ... IMPLICIT, I'd agree that the
marginal value of automatically detecting the above three cases would not
justify the cost.


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, jim(at)nasby(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 01:34:46
Message-ID: 20110127013446.GB3164@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 07:52:10PM -0500, Tom Lane wrote:
> Noah Misch <noah(at)leadboat(dot)com> writes:
> > text -> xml
>
> BTW, that reminds me of something that I think was mentioned way back
> when, but absolutely fails to fit into any of the frameworks discussed
> here: the mere fact that a type is binary-compatible (with or without
> checking) doesn't make it compatible enough to not have to recreate
> indexes. Where and how are we going to have a wart to determine if
> that's necessary?

Design (section 3):
http://archives.postgresql.org/message-id/20101229125625.GA27643@tornado.gateway.2wire.net

Implementation:
http://archives.postgresql.org/message-id/20110113230124.GA18733@tornado.gateway.2wire.net
[disclaimer: I've yet to post an updated version fixing a localized bug in this patch]

I ended up making no attempt to optimize indexes that have predicates or
expression columns; we'll just rebuild them every time. Aside from that, I
failed to find an index on built-in types that requires a rebuild after a type
change optimized by this patch stack. So, the entire wart is really for the
benefit of third-party types that might need it.

> And if the answer is "rebuild indexes whenever the
> data type changes", isn't that a further big dent in the argument that
> it's worth avoiding a table rewrite?

No. Rewriting the table means rebuilding *all* indexes, but the worst case for
a non-rewrite type change is to rebuild all indexes that depend on the changed
column. That's a large win by itself, but we'll usually do even better.

> A text->xml replacement is going
> to be far from cheap anyway.

It's tough to generalize. You can certainly construct a pathological case with
minimal win, but you can just as well construct the opposite. Consider a wide
table with a narrow XML column. Consider a usually-NULL XML column.

nm


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 06:14:28
Message-ID: 20110127061428.GA29873@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 09:35:24AM -0500, Robert Haas wrote:
> On Mon, Jan 24, 2011 at 11:13 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> >> > This helps on conversions like varchar(4)->varchar(8) and text->xml.
> >>
> >> I've read through this patch somewhat. ?As I believe Tom also
> >> commented previously, exemptor is a pretty bad choice of name.
> >
> > I spent awhile with a thesaurus before coining that. ?Since Tom's comment, I've
> > been hoping the next person to complain would at least suggest a better name.
> > Shooting in the dark on a nomenclature issue isn't fun, and I'm not picky.

> CREATE CAST (source_type AS target_type)
> WITH FUNCTION function_name (argument_type, [, ...])
> [ ANALYZE USING function_name ]
> [ AS ASSIGNMENT | AS IMPLICIT ]

Thanks for writing about it. Seems the rest of the thread converged pretty well
on a set of viable name candidates.

> >>?I think what this patch is trying to do is tag the
> >> call to the cast function with the information that we might not need
> >> to call it, but ISTM it would be better to just notice that the
> >> function call is unnecessary and insert a RelabelType node instead.
> >
> > This patch does exactly that for varchar(4) -> varchar(8) and similar cases.
>
> Oh, uh, err... OK, I obviously haven't understood it well enough.
> I'll look at it some more, but if there's anything you can write up to
> explain what you changed and why, that would be helpful.

Based on downthread discussion, I figure this will all change a good deal. I'll
still briefly explain the patch as written. Most of the patch is plumbing to
support the new syntax, catalog entries, and FuncExpr field. The important
changes are in parse_coerce.c. I modified find_coercion_pathway() and
find_typmod_coercion_function() to retrieve pg_cast.castexemptor alongside
pg_cast.castfunc. Their callers (coerce_type() and coerce_type_typmod(),
respectively) then call the new apply_exemptor_function(), which calls the
exemptor function, if any, returns the value to place in FuncExpr.funcexempt,
and possibly updates the CoercionPathType the caller is about to use.
build_coercion_expression(), unchanged except to populate FuncExpr.funcexempt,
remains responsible for creating the appropriate node (RelabelType, FuncExpr).
Finally, I change GetCoerceExemptions to use FuncExpr.funcexempt.

> I think I'm
> also having trouble with the fact that these patches don't apply
> cleanly against the master branch, because they're stacked up on each
> other. Maybe this will be easier once we get more of the ALTER TYPE 2
> stuff in.

It's certainly tricky to review a bunch of patches in parallel when they have a
serial dependency chain. I'll merge the at0 and at2 bits per your request to
see what we can do there.

> > As for performance, coerce_to_target_type() is certainly in wide use, but it's
> > not exactly a hot path. ?I prepared and ran the attached bench-coerce-worst.sql
> > and bench-coerce-best.sql. ?Both are quite artificial. ?The first one basically
> > asks "Can we measure the new overhead at all?" ?The second one asks "Would any
> > ordinary query benefit from removing a superfluous cast from an expression
> > tree?" ?The worst case had an 4% _speedup_, and the best case had a 27% speedup,
> > suggesting answers of "no" and "yes (but not dramatically)". ?The "worst-case"
> > speedup doesn't make sense, so it's probably an artifact of some recent change
> > on master creating a larger performance dip in this pathological test case. ?I
> > could rebase the patch and rerun the benchmark, but I doubt an exact figure
> > would be much more meaningful. ?Unfortunately, I can't think of any more-natural
> > tests (help welcome) that would still illustrate any performance difference.
>
> That certainly seems promising, but I don't understand how the new
> code can be faster on your constructed worst case. Thoughts?

I hadn't merged master into my at3 branch in awhile; my best guess is that some
recent change in master slowed that test case down by about 4%. I could merge
up to confirm that theory. Again, though, it's an abjectly silly test case.
Even if the test showed a 50% slowdown, we wouldn't have much cause for concern.
Perhaps a 300% slowdown would have been notable.

> I'm more concerned about modularity than performance. Sticking a
> field into every FuncExpr so that if it happens to get passed to an
> ALTER TABLE statement we can pull the flag out seems really ugly to
> me.

Understood. I agree that it's awfully specialized to be hanging around in
FuncExpr. Doing it this way seemed to minimize the global quantity of ugly
code, but you're right -- better to have somewhat-ugly code in tablecmds.c than
to expose this in FuncExpr. Most of the benefit of the current approach will be
gone when the optimization moves to eval_const_expressions(), anyway. One way
or another, the next version will not do this.

Thanks,
nm


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 14:02:26
Message-ID: AANLkTikyomS+jj61Zb4XYc-fjNdUCkrmgGhX-tuxdz_o@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 27, 2011 at 1:14 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> Based on downthread discussion, I figure this will all change a good deal.  I'll
> still briefly explain the patch as written.  Most of the patch is plumbing to
> support the new syntax, catalog entries, and FuncExpr field.  The important
> changes are in parse_coerce.c.  I modified find_coercion_pathway() and
> find_typmod_coercion_function() to retrieve pg_cast.castexemptor alongside
> pg_cast.castfunc.  Their callers (coerce_type() and coerce_type_typmod(),
> respectively) then call the new apply_exemptor_function(), which calls the
> exemptor function, if any, returns the value to place in FuncExpr.funcexempt,
> and possibly updates the CoercionPathType the caller is about to use.
> build_coercion_expression(), unchanged except to populate FuncExpr.funcexempt,
> remains responsible for creating the appropriate node (RelabelType, FuncExpr).
> Finally, I change GetCoerceExemptions to use FuncExpr.funcexempt.

OK. I was thinking that instead moving this into
eval_const_expressions(), we just make the logic in
find_coercion_pathway() call the "exemptor" function (or whatever we
call it) right around here:

switch (castForm->castmethod)
{
case COERCION_METHOD_FUNCTION:
result = COERCION_PATH_FUNC;
*funcid = castForm->castfunc;
break;
case COERCION_METHOD_INOUT:
result = COERCION_PATH_COERCEVIAIO;
break;
case COERCION_METHOD_BINARY:
result = COERCION_PATH_RELABELTYPE;
break;
default:
elog(ERROR, "unrecognized
castmethod: %d",
(int) castForm->castmethod);
break;
}

If it's COERCION_METHOD_FUNCTION, then instead of unconditionally
setting the result to COERCION_PATH_FUNC, we inquire - based on the
typemods - whether it's OK to downgrade to a
COERCION_PATH_RELABELTYPE. The only fly in the ointment is that
find_coercion_pathway() doesn't current get the typemods. Not sure
how ugly that would be to fix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 14:46:56
Message-ID: 21936.1296139616@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> OK. I was thinking that instead moving this into
> eval_const_expressions(), we just make the logic in
> find_coercion_pathway() call the "exemptor" function (or whatever we
> call it) right around here:

No, no, no, no. Didn't you read yesterday's discussion? parse_coerce
is entirely the wrong place for this.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 15:42:14
Message-ID: AANLkTimuCiwhZ9n8av-vw7zDig+oeJcd0FdmfooMHRAc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 27, 2011 at 9:46 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> OK. I was thinking that instead moving this into
>> eval_const_expressions(), we just make the logic in
>> find_coercion_pathway() call the "exemptor" function (or whatever we
>> call it) right around here:
>
> No, no, no, no.  Didn't you read yesterday's discussion?  parse_coerce
> is entirely the wrong place for this.

I not only read it, I participated in it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 16:24:39
Message-ID: AANLkTik22fqCNhMyZmw__64eLtFqSXUq4QNHhazfVji=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 27, 2011 at 10:42 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jan 27, 2011 at 9:46 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> OK. I was thinking that instead moving this into
>>> eval_const_expressions(), we just make the logic in
>>> find_coercion_pathway() call the "exemptor" function (or whatever we
>>> call it) right around here:
>>
>> No, no, no, no.  Didn't you read yesterday's discussion?  parse_coerce
>> is entirely the wrong place for this.
>
> I not only read it, I participated in it.

To put that another way, there's a difference between reading
something and agreeing with it. I did the former, but not the latter.
It seems to me that this discussion is unnecessarily antagonistic.
Is it not OK for me to have a different idea about which way to go
with something than you do?

My personal view is that we ought to try to be increasing the number
of places where type modifiers behave like they're really part of the
type, rather than being an afterthought that we deal with when
convenient and otherwise ignore. Otherwise, I see the chances of any
substantive improvements in our type system as being just about zero.

However, the larger point here is simply this: I'm trying to make some
progress on reviewing and, where appropriate, committing the patches
that were submitted for this CommitFest. I'd like to try to avoid the
phenomenon where we're tripping over each other's feet. We're not
making out very well on that front at the moment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 16:35:01
Message-ID: 15364.1296146101@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Is it not OK for me to have a different idea about which way to go
> with something than you do?

Well, in this case I firmly believe that you're mistaken.

> My personal view is that we ought to try to be increasing the number
> of places where type modifiers behave like they're really part of the
> type, rather than being an afterthought that we deal with when
> convenient and otherwise ignore.

And this argument is 100% irrelevant to the problem. The problem is
that you want to put an optimization into the wrong phase of processing.
That is going to hurt us, tomorrow if not today, and it has got *no*
redeeming social value in terms of beng more flexible about what typmods
are or how "well integrated" they are.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 16:37:26
Message-ID: AANLkTinkWwYcb1pBKYRpPS2NQOjxNLjL0CMr2P4Wy8Fa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 27, 2011 at 11:35 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> My personal view is that we ought to try to be increasing the number
>> of places where type modifiers behave like they're really part of the
>> type, rather than being an afterthought that we deal with when
>> convenient and otherwise ignore.
>
> And this argument is 100% irrelevant to the problem.  The problem is
> that you want to put an optimization into the wrong phase of processing.
> That is going to hurt us, tomorrow if not today, and it has got *no*
> redeeming social value in terms of beng more flexible about what typmods
> are or how "well integrated" they are.

The only thing we're deciding here is whether or not a cast requires a
function. That's a function of the type OIDs and the typemods. I
don't see why it's wrong to do the portion that involves the types in
the same place as the portion that involves the typemods. Perhaps you
could explain.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-27 17:23:17
Message-ID: 20110127172317.GA14528@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 27, 2011 at 09:02:26AM -0500, Robert Haas wrote:
> OK. I was thinking that instead moving this into
> eval_const_expressions(), we just make the logic in
> find_coercion_pathway() call the "exemptor" function (or whatever we
> call it) right around here:
>
> switch (castForm->castmethod)
> {
> case COERCION_METHOD_FUNCTION:
> result = COERCION_PATH_FUNC;
> *funcid = castForm->castfunc;
> break;
> case COERCION_METHOD_INOUT:
> result = COERCION_PATH_COERCEVIAIO;
> break;
> case COERCION_METHOD_BINARY:
> result = COERCION_PATH_RELABELTYPE;
> break;
> default:
> elog(ERROR, "unrecognized
> castmethod: %d",
> (int) castForm->castmethod);
> break;
> }
>
> If it's COERCION_METHOD_FUNCTION, then instead of unconditionally
> setting the result to COERCION_PATH_FUNC, we inquire - based on the
> typemods - whether it's OK to downgrade to a
> COERCION_PATH_RELABELTYPE. The only fly in the ointment is that
> find_coercion_pathway() doesn't current get the typemods. Not sure
> how ugly that would be to fix.

Basically, this is a stylistic variation of the approach from my original at3
patch. I believe I looked at that particular positioning and decided that the
extra arguments and effects on non-parse_coerce.c callers were undesirable.
Debatable as any style issue, though. Note that you need to do the same thing
in find_typmod_coercion_function().


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-28 18:52:32
Message-ID: AANLkTikmoMLSMWtML1PPHgvw6YSQ_sR3wqOhkD0ULfLX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 6:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm not sure how important that concern is though, because it's hard to
> see how any such change wouldn't break existing cast implementation
> functions anyway.  If the API for length-coercing cast functions
> changes, breaking their helper functions too hardly seems like an issue.
> Or are you saying you want to punt this whole proposal till after that
> happens?  I had muttered something of the sort way upthread, but I
> didn't think anyone else thought that way.

I've been thinking about this patch a little bit more and I'm coming
around to the viewpoint that we should mark this (and the successor
patches in the same series) Returned with Feedback, and revisit the
issue for 9.2. I'm not necessarily signing on to the viewpoint that
we should wait to do any of this work until after we refactor
typemods, but it does strike me that the fact that Tom and I have
completely different ideas of how this will interact with future
changes in this area probably means we need to take some more time to
talk about what those future enhancements might look like, rather than
rushing something now and maybe regretting it later. We may not need
to actually do all that work before we do this, but it sounds like at
a minimum we need some agreement on what the design should look like.

I think there is still hope for ALTER TYPE 2 (really ALTER TABLE 2 or
SET DATA TYPE 2; it's misnamed) to be committed this cycle and I'll
continue reviewing that one to see whether we can get at least that
much done for 9.1. But I think the rest of this just needs more time
than we have to give it right now. There are more than 60 patches
left open in the CommitFest at this point, and we have less than three
weeks left. If we don't focus our efforts on the things where we have
clear agreement and good, finished code, we're going to end up either
dragging this CommitFest out for months or else getting very little
done for the next few weeks and then indiscriminately punting whatever
is left over at the end. I don't want to do either of those things,
so that means it's time to start making some tough choices, and
unfortunately I think this is one that's going to have to get cut, for
lack of agreement on the basic design.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-28 21:49:39
Message-ID: 20110128214939.GA16966@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 28, 2011 at 01:52:32PM -0500, Robert Haas wrote:
> On Wed, Jan 26, 2011 at 6:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I'm not sure how important that concern is though, because it's hard to
> > see how any such change wouldn't break existing cast implementation
> > functions anyway. ?If the API for length-coercing cast functions
> > changes, breaking their helper functions too hardly seems like an issue.
> > Or are you saying you want to punt this whole proposal till after that
> > happens? ?I had muttered something of the sort way upthread, but I
> > didn't think anyone else thought that way.
>
> I've been thinking about this patch a little bit more and I'm coming
> around to the viewpoint that we should mark this (and the successor
> patches in the same series) Returned with Feedback, and revisit the
> issue for 9.2.

This is just.

> I'm not necessarily signing on to the viewpoint that
> we should wait to do any of this work until after we refactor
> typemods, but it does strike me that the fact that Tom and I have
> completely different ideas of how this will interact with future
> changes in this area probably means we need to take some more time to
> talk about what those future enhancements might look like, rather than
> rushing something now and maybe regretting it later. We may not need
> to actually do all that work before we do this, but it sounds like at
> a minimum we need some agreement on what the design should look like.

I've deferred comment due to my obvious bias, but I can't see any sense in
blocking a change like this one on account of the exceptionally-preliminary
discussions about enriching typmod. Suppose, like my original design, we make
no provision to insulate against future typmod-related changes. The number of
interfaces that deal in typmod are so great that the marginal effort to update
the new ones will be irrelevant. I still like Tom's idea of an Expr<->Expr
interface. I like it because it opens more opportunities now, not because it
will eliminate some modicum of effort from an enriched-typmod implementation.

nm


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-29 08:26:07
Message-ID: 20110129082607.GB18784@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 28, 2011 at 04:49:39PM -0500, Noah Misch wrote:
> On Fri, Jan 28, 2011 at 01:52:32PM -0500, Robert Haas wrote:
> > On Wed, Jan 26, 2011 at 6:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > I'm not sure how important that concern is though, because it's hard to
> > > see how any such change wouldn't break existing cast implementation
> > > functions anyway. ?If the API for length-coercing cast functions
> > > changes, breaking their helper functions too hardly seems like an issue.
> > > Or are you saying you want to punt this whole proposal till after that
> > > happens? ?I had muttered something of the sort way upthread, but I
> > > didn't think anyone else thought that way.
> >
> > I've been thinking about this patch a little bit more and I'm coming
> > around to the viewpoint that we should mark this (and the successor
> > patches in the same series) Returned with Feedback, and revisit the
> > issue for 9.2.
>
> This is just.

One other thing: #7 does not depend on #3,4,5,6 or any design problems raised
thus far, so there's no need to treat it the same as that group.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-29 10:53:29
Message-ID: AANLkTi=J3ZC9CoAARwU_1k+yqxiPtpfkh2AwSkB-4hpW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 28, 2011 at 4:49 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> I'm not necessarily signing on to the viewpoint that
>> we should wait to do any of this work until after we refactor
>> typemods, but it does strike me that the fact that Tom and I have
>> completely different ideas of how this will interact with future
>> changes in this area probably means we need to take some more time to
>> talk about what those future enhancements might look like, rather than
>> rushing something now and maybe regretting it later.  We may not need
>> to actually do all that work before we do this, but it sounds like at
>> a minimum we need some agreement on what the design should look like.
>
> I've deferred comment due to my obvious bias, but I can't see any sense in
> blocking a change like this one on account of the exceptionally-preliminary
> discussions about enriching typmod.  Suppose, like my original design, we make
> no provision to insulate against future typmod-related changes.  The number of
> interfaces that deal in typmod are so great that the marginal effort to update
> the new ones will be irrelevant.  I still like Tom's idea of an Expr<->Expr
> interface.  I like it because it opens more opportunities now, not because it
> will eliminate some modicum of effort from an enriched-typmod implementation.

Once we add syntax to support this feature, we have to support that
syntax for an extremely long time. We can't just remove it in the
next release and replace it with something else - pg_dump has to still
work, for example, and we have to able to reload whatever it produces.
Adding a user-visible API that we may want to turn around and change
*next release* is just a bad idea. If we were talking about
implementing this through some sort of hard-coded internal list of
types, it wouldn't be quite so much of an issue, but that's not where
we're at. Moreover, I fear that injecting this into
eval_const_expressions() is adding a frammish that has no practical
utility apart from this case, but we still have to pay the overhead;
even Tom expressed some initial doubt about whether that made sense,
and I'm certainly not sold on it either. I don't see any particular
reason why we can't resolve all of these issues, but it's going to
take more time than we have right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 3: add facility to identify further no-work cases
Date: 2011-01-29 11:06:20
Message-ID: AANLkTin3HYvQ_g17F7dwx4izAmjAVfmtHBexoQUXGxSv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jan 29, 2011 at 3:26 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> One other thing: #7 does not depend on #3,4,5,6 or any design problems raised
> thus far, so there's no need to treat it the same as that group.

Well, if you want to post an updated patch that's independent of the
rest of the series, I guess you can... but I think the chances of that
getting applied for this release are pretty much zero. That patch
relies on some subtle changes to the contract implied by an operator
family and what looks at first blush like a lot of grotty hackery. I
can't get very excited about spending a lot of time on that right now,
especially given the amount of difficulty we've had reaching a meeting
of the minds on cases that don't have those problems.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company