Re: Using indices for UNION.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Using indices for UNION.
Date: 2014-04-04 20:57:54
Message-ID: 26886.1396645074@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> I still think this stuff mostly needs to be thrown away and rewritten
> in Path-creation style, but that's a long-term project. In the meantime
> this seems like a relatively small increment of complexity, so maybe it's
> worth applying. I'm concerned about the method for building a new
> DISTINCT clause though; the best that can be said for that is that it's
> a crude hack, and I'm less than convinced that there are no cases where
> it'll dump core.

OK, after still more time thinking about it, here's the issues with that
way of generating a DISTINCT clause corresponding to the UNION:

1. Calling transformDistinctClause with a NULL pstate seems pretty unsafe.
It might accidentally fail to fail, today, but it's surely fragile.
It's violating the API contract not only of transformDistinctClause
itself (which is not documented as accepting a NULL pstate) but of
addTargetToGroupList (q.v.). A minimum requirement before I'd accept a
patch that did that is that it extended the header comments of those
functions to specify under what circumstances a NULL pstate can be passed.
However, that's not the direction to go, because of #2.

2. This approach potentially changes the semantics of the UNION. (This is
only important for a datatype that has more than one btree equality
operator, but let's posit that.) transformDistinctClause, as per its
header comment, allows the outer ORDER BY to influence which equality
operator it picks for DISTINCT. However, in the case of
"(SELECT ... UNION SELECT ...) ORDER BY ...", the UNION semantics were
chosen without reference to the ORDER BY, so it's possible that the
equality operators cited in the SetOperationStmt's groupClauses list are
not what you'd get from applying transformDistinctClause as the patch does.
It is not okay for the planner to override the parser's choice of
semantics like that.

Now I'm fairly sure that planner.c is expecting that the query's
distinctClause be a superset of the sortClause if any (cf comments for
SortGroupClause in parsenodes.h), so it wouldn't be okay to just blindly
build a distinctClause from topop->groupClauses. I think what you need
to do is check that topop->groupClauses is compatible with the sortClause
if any (skipping the flattening transformation if not) and then build a
distinctClause by extending the sort clause list with any missing items
from topop->groupClauses. So this is sort of like what
transformDistinctClause does, but different in detail, and the failure
case is to not do the transformation, rather than ereport'ing. (See also
generate_setop_grouplist, which you could almost use except that it's not
expecting to have to merge with a sortClause list.)

So this is all doable enough, but you're probably going to end up with
a new distinctClause-generating function that's at least twice the size
of the patch's former net code addition. So the question is whether it's
worth the trouble, considering that all this code has (I hope) a life
expectancy of no more than one or two more release cycles.

I'm going to set the patch back to Waiting on Author, but unless you
are prepared to rewrite the distinctClause creation code in the next
couple of days, you should change it to Returned with Feedback.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2014-04-04 21:23:49 Re: B-Tree support function number 3 (strxfrm() optimization)
Previous Message Andres Freund 2014-04-04 20:43:07 Re: Another thought about search_path semantics