Re: ALTER TYPE 2: skip already-provable no-work rewrites

From: Noah Misch <noah(at)leadboat(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ALTER TYPE 2: skip already-provable no-work rewrites
Date: 2011-02-14 19:15:24
Message-ID: 20110214191524.GA16834@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Stephen,

Thanks for jumping in on this one.

On Mon, Feb 14, 2011 at 01:12:21PM -0500, Stephen Frost wrote:
> First question is- why do you use #ifdef USE_ASSERT_CHECKING ..?

The other six code sites checking assert_enabled directly do the same.

> assert_enabled exists and will work the way you expect regardless, no?

Yes.

> Strikes me as unlikely that the checks would be a real performance
> problem..

Agreed.

> In ATColumnChangeSetWorkLevel(), I'm really not a huge fan of using a
> passed-in argument to move through a list with.. I'd suggest using a
> local variable that is set from what's passed in. I do see that's
> inheirited, but still, you've pretty much redefined that entire
> function anyway..

Could do that. However, the function would reference the original argument just
once, to make that copy. I'm not seeing a win.

> Also, I feel like that while(!tab->rewrite) really deserves more
> explanation of what's happening than the function-level comment above
> gives it. I'd prefer to see a comment above the while() explaining
> that we're moving through a list to see if there's any level at which
> expr is something complicated or is referring to a domain which has
> constraints on it (presuming that I've followed what's going on
> correctly, that is..).

The way I like to think about it is that we're recursively walking an expression
tree to determine which of three categories it falls into: always produces the
same value without error; always produces the same value on success, but may
throw an error; neither of the above. We have a whitelist of node types that
are acceptable, and anything else makes us assume the worst. (The nodes we
accept are simple enough that the recursion degenerates to iteration.) Here's
the comment explaining the algorithm, from an earlier version of the patch:

+ /* GetCoerceExemptions()
+ * Assess invariants of a coercion expression.
+ *
+ * Various common expressions arising from type coercion are subject to
+ * optimizations. For example, a simple varchar -> text cast will never change
+ * the underlying data (COERCE_EXEMPT_NOCHANGE) and never yield an error
+ * (COERCE_EXEMPT_NOERROR). A varchar(8) -> varchar(4) will never change the
+ * data, but it may yield an error. Given a varno and varattno denoting "the"
+ * source datum, determine which invariants hold for an expression by walking it
+ * per these rules:
+ *
+ * 1. A Var with the varno/varattno in question has both invariants.
+ * 2. A RelabelType node inherits the invariants of its sole argument.
+ * 3. A CoerceToDomain node inherits any COERCE_EXEMPT_NOCHANGE invariant from
+ * its sole argument. When GetDomainConstraints() == NIL, it also inherits
+ * COERCE_EXEMPT_NOERROR. Otherwise, COERCE_EXEMPT_NOERROR becomes false.
+ * 4. All other nodes have neither invariant.
+ *
+ * Returns a bit string that may contain the following bits:
+ * COERCE_EXEMPT_NOCHANGE: expression result will always have the same binary
+ * representation as a Var expression having the given varno and
+ * varattno
+ * COERCE_EXEMPT_NOERROR: expression will never throw an error
+ */

The return value changed, but the rest remains accurate. Here's a similar
explanation from the design thread; it covers a more general case:

http://archives.postgresql.org/message-id/20101231013534.GA7521@tornado.leadboat.com

Should I restore some of that? Any other particular text that would have helped?

> It also seems like it'd make more sense to me to
> be a while() controlled by (IsA(expr, Var) && ((Var *) expr)->varattno
> == varattno), since that's really the normal "stopping point".

If we can optimize to some extent, that is the stopping point. If not,
tab->rewrite is the stopping point. I picked the no-optimization case as
"normal" and used that as the loop condition, but one could go either way.

> These are all more stylistic issues than anything else. Last, but not
> least, I do worry about how this may impact contrib modules, external
> projects, or user-added data types, such as PostGIS, hstore, and ip4r.
> Could we/should we limit this to only PG data types that we 'know' are
> binary compatible? Is there any way or reason such external modules
> could be fouled up by this?

External modules are safe. If a binary coercion cast (CREATE CAST ... WITHOUT
FUNCTION) implements the type conversion for an ALTER TABLE ... SET DATA TYPE,
coerce_to_target_type() will inject a RelabelType node, and this code will pick
up on that and avoid the rewrite. If such a cast were defined erroneously, the
user is no less in trouble. He'd get a table rewrite, but the rewrite would
just transfer the bits unchanged. The validity of this optimization does not
rely on any user-settable property of a data type, but it does lean heavily on
the execution semantics of specific nodes.

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Cédric Villemain 2011-02-14 19:31:45 Re: why two dashes in extension load files
Previous Message Andrew Dunstan 2011-02-14 19:07:17 Re: sepgsql contrib module