Re: Terrible plan for join to nested union

From: Nate Allan <nallan(at)ancestry(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Terrible plan for join to nested union
Date: 2012-07-09 04:02:23
Message-ID: 9B2D6747F4AB8A47BE45216B06DEDAF92ABE6739@PREXMB01.myfamily.int
Lists: pgsql-performance

>>Nate Allan <nallan(at)ancestry(dot)com> writes:
>> It seems to me that the join condition (and hence the restriction) should be pushed down into both sides of the union to bring the cardinality limit from millions to 1. I'm imagining a rewrite like this:
>> R(a) J (b U c) -> (b J R(a)) U (c J R(a))   ...where R = Restrict, J = Join, U = Union

>[ eyes that suspiciously ... ] I'm not convinced that such a transformation is either correct in general (you seem to be assuming at least that A's join column is unique, and what is the UNION operator supposed to do with A's other columns?) or likely to lead to a performance improvement in general.
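To make the correctness question concrete, here is a minimal Python sketch (a toy model, not PostgreSQL code) that evaluates both sides of the proposed rewrite on hypothetical tables a, b and c joined on their first column. Because the outer UNION in the rewritten form also deduplicates A's columns, the two forms agree when the restricted side has no duplicate rows but can diverge when it does; a unique join column is one way to rule that out. The table contents and the key value 1 are made up for illustration.

```python
from collections import Counter

# Toy model (an assumption, not PostgreSQL internals): a relation is a list
# of tuples, so duplicate rows survive (SQL bag semantics). Column 0 is the
# join key in every relation.

def restrict(rows, key):
    """R(a): keep rows whose join key equals `key`."""
    return [row for row in rows if row[0] == key]

def union(left, right):
    """SQL UNION (not UNION ALL): concatenate, then drop duplicate rows."""
    return list(dict.fromkeys(left + right))

def join(left, right):
    """Equi-join on column 0; NULL (None) keys never match, as in SQL."""
    return [l + r for l in left for r in right
            if l[0] is not None and l[0] == r[0]]

def original(a, b, c, key):
    # R(a) J (b U c): deduplicate the union first, then join.
    return join(restrict(a, key), union(b, c))

def rewritten(a, b, c, key):
    # (b J R(a)) U (c J R(a)): join inside each branch, then UNION.
    # a's columns stay on the left in both plans so the rows are comparable.
    ra = restrict(a, key)
    return union(join(ra, b), join(ra, c))

def same_result(x, y):
    """Compare results as multisets, since SQL row order is unspecified."""
    return Counter(x) == Counter(y)

# Hypothetical sample data.
B = [(1, "x"), (2, "y")]
C = [(1, "x"), (3, "z")]
A_DISTINCT = [(1, "alice"), (2, "bob")]
A_DUPLICATE = [(1, "alice"), (1, "alice"), (2, "bob")]

if __name__ == "__main__":
    print(same_result(original(A_DISTINCT, B, C, 1), rewritten(A_DISTINCT, B, C, 1)))    # True
    print(same_result(original(A_DUPLICATE, B, C, 1), rewritten(A_DUPLICATE, B, C, 1)))  # False
```

With A_DUPLICATE the original form returns two identical rows while the rewritten form returns one, because the outer UNION collapses the duplicate coming from a.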

If there are more columns, you are correct that you might have to project off any additional columns within the union, and leave the join outside of the union intact to bring in the extra columns. Those are essentially the same considerations that apply to other rewrites, though. As for this optimization making unions faster in general, I would argue that it is rather easy to produce a plan superior to complete materialization of the union.
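One reading of the variant described above, sketched in the same hypothetical Python toy model: only the join key is pushed into each UNION branch (as a filter that adds no columns), and the join with a stays outside the union to re-attach a's other columns. Under these assumptions the result matches the original query even when a contains duplicate rows, because the UNION now deduplicates only b's and c's rows.

```python
from collections import Counter

# Toy model (an assumption): relations are lists of tuples, column 0 is the
# join key.

def restrict(rows, key):
    return [row for row in rows if row[0] == key]

def union(left, right):
    """SQL UNION: concatenate, then drop duplicate rows."""
    return list(dict.fromkeys(left + right))

def join(left, right):
    """Equi-join on column 0; NULL (None) keys never match."""
    return [l + r for l in left for r in right
            if l[0] is not None and l[0] == r[0]]

def semijoin(rows, keys):
    """Keep rows whose key appears in `keys`; no columns are added."""
    return [row for row in rows if row[0] in keys]

def original(a, b, c, key):
    return join(restrict(a, key), union(b, c))

def key_pushed_down(a, b, c, key):
    ra = restrict(a, key)
    keys = {row[0] for row in ra if row[0] is not None}
    # Only the join key goes into each branch, so the UNION still sees just
    # b's and c's columns; the join left outside the union brings in a's
    # other columns afterwards.
    return join(ra, union(semijoin(b, keys), semijoin(c, keys)))

def same_result(x, y):
    """Compare results as multisets, since SQL row order is unspecified."""
    return Counter(x) == Counter(y)

# Hypothetical sample data.
B = [(1, "x"), (2, "y")]
C = [(1, "x"), (3, "z")]
A_DUPLICATE = [(1, "alice"), (1, "alice"), (2, "bob")]

if __name__ == "__main__":
    print(same_result(original(A_DUPLICATE, B, C, 1),
                      key_pushed_down(A_DUPLICATE, B, C, 1)))  # True
```

The design choice here is that filtering a branch never changes which rows are duplicates of each other, so deduplicating after the filter gives the same rows as filtering after deduplication.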

>We possibly could push down a join condition on the inner side of a nestloop, similarly to what's done in the UNION ALL case ... but that would require a complete refactoring of what the planner does with UNIONs. By and large, very little optimization effort has been put into non-ALL UNION (or INTERSECT or EXCEPT). You should not expect that to change on a time scale of less than years.
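The nestloop idea mentioned above can be illustrated with a toy Python sketch; it does not reflect PostgreSQL's actual planner or executor structures. For each outer row, the join key is passed into each branch of the UNION as a parameter, and only the few matching rows are deduplicated, instead of building and deduplicating the whole union once. branch_lookup is a hypothetical stand-in for a per-branch index scan, and the data is made up.

```python
from collections import Counter

# Toy model (an assumption): relations are lists of tuples, column 0 is the
# join key.

def branch_lookup(rows, key):
    """Hypothetical stand-in for an index scan on one UNION branch."""
    return [row for row in rows if row[0] == key]

def materialize_then_join(outer, b, c):
    """Baseline: compute the entire UNION up front, then join."""
    full = list(dict.fromkeys(b + c))
    return [o + x for o in outer for x in full
            if o[0] is not None and o[0] == x[0]]

def parameterized_nestloop(outer, b, c):
    """Re-run a small UNION per outer row, with the key pushed into each branch."""
    out = []
    for o in outer:
        k = o[0]
        if k is None:
            continue  # NULL never satisfies an equality join condition
        inner = list(dict.fromkeys(branch_lookup(b, k) + branch_lookup(c, k)))
        out.extend(o + x for x in inner)
    return out

def same_result(x, y):
    """Compare results as multisets, since SQL row order is unspecified."""
    return Counter(x) == Counter(y)

# Hypothetical sample data: the restricted outer side has a single row.
B = [(1, "x"), (2, "y")]
C = [(1, "x"), (3, "z")]
OUTER = [(1, "alice")]

if __name__ == "__main__":
    print(same_result(parameterized_nestloop(OUTER, B, C),
                      materialize_then_join(OUTER, B, C)))  # True
```

Deduplicating per outer row is safe in this sketch because two duplicate rows necessarily share the same join key, so they always land in the same per-key inner union.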

I hate to come across as contrary, but I'm pretty shocked by this answer for a couple of reasons:
1) This is a clear-cut case of an untenable execution plan, essentially a bug in the planner. This response contradicts the widely broadcast assertion that the PG community fixes planner bugs quickly and will not introduce hints because they would rather address these kinds of issues "correctly".
2) Why would more effort go into Union All rather than Union? Are people using Union All more than Union, and if so, is this because they actually want duplicates, or because they've been trained to use it due to the performance problems with Union? Union All, in many people's opinions, shouldn't even exist in a true relational sense.

Again, sorry if I'm coming off as abrasive. I've spent political capital pushing to get PG in on this project, and now I'm a little worried about whether it is going to work at this kind of scale and complexity, so I'm a little stressed. I do appreciate your responses.

Best,

-Nate
