Re: Design notes for EquivalenceClasses

From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Design notes for EquivalenceClasses
Date: 2007-01-18 00:13:20
Message-ID: Pine.LNX.4.58.0701181052040.25798@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 17 Jan 2007, Tom Lane wrote:

> strict. However, we also allow equivalence clauses that appear below the
> nullable side of an outer join to form EquivalenceClasses; for these
> classes, the interpretation is that either all the values are equal, or
> all (except pseudo-constants) have gone to null. (This requires a
> limitation that non-constant members be strict, else they might not go
> to null when the other members do.) Consider for example
>
> SELECT *
> FROM a LEFT JOIN
> (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss
> ON a.x = ss.y
> WHERE a.x = 42;
>
> We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby
> apply c.z = 10 while scanning c. (The reason we disallow outerjoin-delayed
> clauses from forming EquivalenceClasses is exactly that we want to be able
> to push any derived clauses as far down as possible.) But once above the
> outer join it's no longer necessarily the case that b.y = 10, and thus we
> cannot use such EquivalenceClasses to conclude that sorting is unnecessary
> (see discussion of PathKeys below). In this example, notice also that
> a.x = ss.y (really a.x = b.y) is not an equivalence clause because its
> applicability to b is restricted by the outer join; thus we do not make
> the mistake of concluding b.y = 42, even though we do have an equivalence
> class for {a.x 42}.

This seems to be correct. A nice improvement.

> With a little further thought, it becomes apparent that nestloop joins
> can also produce sorted output. For example, if we do a nestloop join
> between outer relation A and inner relation B, then any pathkeys relevant
> to A are still valid for the join result: we have not altered the order of
> the tuples from A. Even more interesting, if there was an equivalence clause
> A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert
> that B.Y is a pathkey for the join result; X was ordered before and still
> is, and the joined values of Y are equal to the joined values of X, so Y
> must now be ordered too. This is true even though we used neither an
> explicit sort nor a mergejoin on Y. (Note: hash joins cannot be counted
> on to preserve the order of their outer relation, because the executor
> might decide to "batch" the join, so we always set pathkeys to NIL for
> a hashjoin path.) Exception: a RIGHT or FULL join doesn't preserve the
> ordering of its outer relation, because it might insert nulls at random
> points in the ordering.

I was thinking about this, but in relation to hash joins. A hash join
cannot be guaranteed to produce output sorted according to the pathkey of
the outer relation (as explained in the existing README). I wonder,
however, if it might be useful for hash join to pass a hint that the
output is known ordered (i.e., the join was not split into multiple
batches).

Thanks,

Gavin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2007-01-18 00:37:53 Re: Function execution costs 'n all that
Previous Message Alvaro Herrera 2007-01-17 22:41:19 Re: [COMMITTERS] pgsql: Implement width_bucket() for the