Problem with CVS HEAD's handling of mergejoins

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Problem with CVS HEAD's handling of mergejoins
Date: 2008-01-09 02:08:00
Message-ID: 24443.1199844480@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

So I adjusted the patch I was working on as suggested here
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00251.php
and things started blowing up all over the place --- Assert failures,
"too few pathkeys for mergeclauses" errors, etc :-(

On investigation, the problem seems to be a bit of brain-fade on my
part. The planner uses "PathKeys" lists both to represent what's known
about the output sort order of a Path, and to represent per-merge-clause
ordering data for a merge join. A PathKeys list that represents sort
ordering ought to be "canonical", meaning it contains no redundant keys
--- a key might be redundant because it is the same as some prior key
in the list ("ORDER BY x,x") or because it is known equal to a constant
and thus uninteresting for sorting ("WHERE x = 1 ... ORDER BY x").
However, there are several places in the planner that expect the
pathkeys for a merge join to be one-for-one with the selected merge
clauses.

I'm not sure why we didn't come across test cases exposing this problem
earlier in beta. A partial explanation is that the equal-to-a-constant
case was unlikely to arise before, because given "WHERE x = y AND x = 1"
the code up to now would get rid of "x = y" altogether and then never
try for a mergejoin; my patch to eliminate that behavior was what
exposed the problem. But there are other ways to get redundant keys in
a list of candidate mergejoin clauses.

One possibility seems to be to keep a list of "raw" (not canonical)
pathkeys for each side of a list of proposed mergejoin clauses, which
is one-to-one with the clauses, with the clear understanding that the
canonical list that describes the path's sort ordering might be just a
subset of this list. This would mean a couple of new fields in MergePath
structs, but fortunately no on-disk format changes since MergePaths
never get to disk.

A perhaps less invasive idea is to discard any proposed mergeclauses
that are redundant in this sense. This would still require some
reshuffling of responsibility between select_mergejoin_clauses and
the code in pathkeys.c, since right now select_mergejoin_clauses
takes no account of that. However, I'm worried that that might result
in planner failure on some FULL JOIN cases that work today, since we
require all the join clauses to be mergejoinable for a FULL JOIN.
People seem to complain when the planner fails, even for really
stupid queries ;-). I think this would only be acceptable if we can
prove that ignoring clauses that are redundant in this sense doesn't
change the result --- which might be the case, but I'm not sure.

I think I can fix this in a day or so, but I now definitely feel that
we'll need an RC2 :-(

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Problem with CVS HEAD's handling of mergejoins
Date: 2008-01-09 02:21:00
Message-ID: 200801090221.m092L0107477@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> I think I can fix this in a day or so, but I now definitely feel that
> we'll need an RC2 :-(

Understood. :-|

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Problem with CVS HEAD's handling of mergejoins
Date: 2008-01-09 19:13:22
Message-ID: 11594.1199906002@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> A perhaps less invasive idea is to discard any proposed mergeclauses
> that are redundant in this sense. This would still require some
> reshuffling of responsibility between select_mergejoin_clauses and
> the code in pathkeys.c, since right now select_mergejoin_clauses
> takes no account of that. However, I'm worried that that might result
> in planner failure on some FULL JOIN cases that work today, since we
> require all the join clauses to be mergejoinable for a FULL JOIN.
> People seem to complain when the planner fails, even for really
> stupid queries ;-). I think this would only be acceptable if we can
> prove that ignoring clauses that are redundant in this sense doesn't
> change the result --- which might be the case, but I'm not sure.

On further study, this doesn't seem nearly as bad as it looked. I had
hastily misread some of the existing code as assuming one-for-one
correspondence of mergeclauses and pathkeys, but actually it just
assumes that the pathkeys list contains at least one pathkey matching
each mergeclause. Redundancy between two pathkeys in a list is
eliminated by allowing adjacent mergeclauses to share the same pathkey.
So that part actually all works, and the only problem is when an
equivalence class is redundant-for-sorting in itself --- that is,
when one of the eclass members is a constant. So that explains why
we'd not seen it before; except in very odd corner cases, the old code
would have eliminated the mergejoinable clause anyway.

If we simply make select_mergejoin_clauses() reject a clause as
not-mergejoinable if either side is equated to a constant, then
the problems go away. We'd have a new problem if this meant that
we fail to do a FULL JOIN, but AFAICS that can't happen: a variable
coming from a full join can't be equivalenced to a constant, except
perhaps in a below_outer_join eclass, which isn't considered redundant
for sorting anyway.

Bottom line is that it just takes another half dozen lines of code
to fix this; attached is the core of the fix (there are some
uninteresting prototype changes and macro-moving as well).

> I think I can fix this in a day or so, but I now definitely feel that
> we'll need an RC2 :-(

I no longer think that about this problem, but we've still got some
nasty-looking issues in xml.c, plus Hannes Dorbath's unsolved reports
of GIST/GIN problems ...

regards, tom lane


+ /*
+ * Insist that each side have a non-redundant eclass. This
+ * restriction is needed because various bits of the planner expect
+ * that each clause in a merge be associatable with some pathkey in a
+ * canonical pathkey list, but redundant eclasses can't appear in
+ * canonical sort orderings. (XXX it might be worth relaxing this,
+ * but not enough time to address it for 8.3.)
+ *
+ * Note: it would be bad if this condition failed for an otherwise
+ * mergejoinable FULL JOIN clause, since that would result in
+ * undesirable planner failure. I believe that is not possible
+ * however; a variable involved in a full join could only appear
+ * in below_outer_join eclasses, which aren't considered redundant.
+ *
+ * This *can* happen for left/right join clauses, however: the
+ * outer-side variable could be equated to a constant. (Because we
+ * will propagate that constant across the join clause, the loss of
+ * ability to do a mergejoin is not really all that big a deal, and
+ * so it's not clear that improving this is important.)
+ */
+ cache_mergeclause_eclasses(root, restrictinfo);
+
+ if (EC_MUST_BE_REDUNDANT(restrictinfo->left_ec) ||
+ EC_MUST_BE_REDUNDANT(restrictinfo->right_ec))
+ {
+ have_nonmergeable_joinclause = true;
+ continue; /* can't handle redundant eclasses */
+ }
+
result_list = lappend(result_list, restrictinfo);
}