Re: UNION ALL on partitioned tables won't use indices.

From: Noah Misch <noah(at)leadboat(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: peter_e(at)gmx(dot)net, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: UNION ALL on partitioned tables won't use indices.
Date: 2013-11-23 07:39:13
Message-ID: 20131123073913.GA1008138@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 24, 2013 at 07:39:53PM +0900, Kyotaro HORIGUCHI wrote:
> 1. Observed symptom
>
> As you know, UNION ALL accompanied with ORDER BY uses indexes if
> available.

> So do simple queries on partitioned (inheritance) tables.

> Nevertheless, UNION ALL on partitioned tables doesn't. This is
> quite unfavourable behavior especially having LIMIT.
>
> >uniontest=# EXPLAIN ANALYZE SELECT * FROM p1
> > UNION ALL SELECT * FROM p2 ORDER BY a LIMIT 10;
> > QUERY PLAN
> >------------------------------------------------------------------------
> > Limit (actual time=182.732..182.735 rows=10 loops=1)
> > -> Sort (actual time=182.729..182.730 rows=10 loops=1)
> > Sort Key: p1.a
> > Sort Method: top-N heapsort Memory: 25kB
> > -> Append (actual time=0.029..108.109 rows=400000 loops=1)
> > -> Seq Scan on p1 (actual time=0.001..0.001 rows=0 loops=1)
> > -> Seq Scan on c11 (actual time=0.027..19.074 rows=100000 loops=1)
> > -> Seq Scan on c12 (actual time=0.014..16.653 rows=100000 loops=1)
> > -> Seq Scan on p2 (actual time=0.000..0.000 rows=0 loops=1)
> > -> Seq Scan on c21 (actual time=0.012..16.677 rows=100000 loops=1)
> > -> Seq Scan on c22 (actual time=0.012..16.794 rows=100000 loops=1)
> > Total runtime: 182.857 ms

That is indeed surprising.

> 2. The cause
>
> In grouping_planner, flattern_simple_union_all creates
> appendrelinfos for each subqueries then expand_inherited_tables
> furthur expands the parent tables in each subqueries. Finally,
> this sequence leaves following structure. Where rte[2] and [3]
> are subqueries abandoned on the way pulling up rte[4] and [5].
>
> rte[1] Subquery "SELECT*1", inh = 1
> +- appinfo[0] - rte[4] Relation "p1", inh = 1
> | +- appinfo[2] - rte[6] Relation "p1", inh = 0
> | +- appinfo[3] - rte[7] Relation "c11", inh = 0
> | +- appinfo[4] - rte[8] Relation "c12", inh = 0
> +- appinfo[1] - rte[5] Relation "p2", inh = 1
> +- appinfo[5] - rte[9] Relation "p1", inh = 0
> +- appinfo[6] - rte[10] Relation "c11", inh = 0
> +- appinfo[7] - rte[11] Relation "c12", inh = 0
>
> On the other hand, ec member finally has exprs only for varno =
> 1, 4 and 5, in other words, it lacks the members for the most
> descendant RTEs. This is because add_child_rel_equivalences()
> always inhibits add_eq_member from registering the child_rel's
> relid on EC. Conequently these grandchild relations does not find
> index pathkeys for themselves.

I'm unclear on the key ideas behind distinguishing em_is_child members from
ordinary EC members. src/backend/optimizer/README says "These members are
*not* full-fledged members of the EquivalenceClass and do not affect the
class's overall properties at all." Is that an optimization to avoid futile
planner work, or is it necessary for correctness? If the latter, what sort of
query might give wrong answers if an EquivalenceMember were incorrectly added
as em_is_child = false?

> 3. Measures
>
> I could thought up three approaches for the problem.

Thanks for writing up multiple options. My tentative preference is for (3),
then (2), then (1):

> One is to simplly modifying here to give child flag in the
> parameters of add_eq_member accordig to whether the child_rel is
> inh or not. This seems to work fine although leaves two levels of
> MergeAppends (PATCH-1). So the additional patch is attached to
> collapse these MergeAppends (PATCH-2). This gives the same plan
> as PATCH-3.

Revising the append graph this late in planning (create_merge_append_path())
seems like the wrong location. Changing add_child_rel_equivalences() in this
way adds further subtlety to the meaning of em_is_child, and I haven't
comprehended the full implications. Neither of those objections are fatal,
but let's focus on the other two approaches, which avoid those objections.

> The second is to collapse the appendrel structure shown above to
> have only single level inheritance. This also seems to work
> fine. This makes the plan with single level
> MergeAppend. (PATCH-3)

> The last one is most strait-forward in some sense, and conversely
> is most ad hoc measure. Modifying expand_inherited_rtentry() to
> pull up adding appendrels if needed, using the similar manner to
> PATCH-3. This is added as PATCH-4.

The choice between (2) and (3) should depend on the extent of ways in which we
can end up with a nested AppendRelInfo graph. If expand_inherited_rtentry()
is the only place that can introduce nesting, modifying it to stop doing that
makes sense. Otherwise, adding a generic collapse_appendrels() step might
help more situations. expand_inherited_rtentry() is the only culprit I can
identify, so I lean toward approach-3/PATCH-4. For a data point corroborating
that hypothesis, "make check" issues no queries that trigger this warning:

--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1924,6 +1924,9 @@ add_child_rel_equivalences(PlannerInfo *root,
{
ListCell *lc1;

+ if (root->simple_rte_array[child_rel->relid]->inh)
+ elog(WARNING, "unexpected nested append");
+
foreach(lc1, root->eq_classes)
{
EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);

On Wed, Nov 13, 2013 at 05:38:11PM +0900, Kyotaro HORIGUCHI wrote:

[Approach (3): union_inh_idx_typ3_v2_20131113.patch]

> @@ -1210,6 +1217,34 @@ expand_inherited_tables(PlannerInfo *root)
> }
> }
>
> +static Node *
> +transvars_merge_mutator(Node *node, transvars_merge_context *ctx)
> +{
> + if (node == NULL)
> + return NULL;
> +
> + if (IsA(node, Var))
> + {
> + Var *oldv = (Var*)node;
> +
> + if (!oldv->varlevelsup && oldv->varno == ctx->target_rti)
> + {
> + if (oldv->varattno >
> + list_length(ctx->child_appinfo->translated_vars))
> + elog(ERROR,
> + "attribute %d of relation \"%s\" does not exist",
> + oldv->varattno,
> + get_rel_name(ctx->child_appinfo->parent_reloid));
> +
> + return (Node*)copyObject(
> + list_nth(ctx->child_appinfo->translated_vars,
> + oldv->varattno - 1));
> + }
> + }
> + return expression_tree_mutator(node,
> + transvars_merge_mutator, (void*)ctx);
> +}

This has roughly the same purpose as adjust_appendrel_attrs_mutator(), but it
propagates the change to far fewer node types. Why can it be so much simpler?
The translated_vars of parent_appinfo may bear mostly-arbitrary expressions.

> + appinfo->translated_vars = (List*)expression_tree_mutator(
> + parent_appinfo->translated_vars,
> + transvars_merge_mutator, &ctx);

I get this warning:

prepunion.c: In function `expand_inherited_rtentry':
prepunion.c:1450: warning: passing argument 1 of `expression_tree_mutator' from incompatible pointer type

I filled out your test case as follows to reproduce your results:

create table p1 (a int, b int);
create table c11 () inherits (p1);
create table c12 () inherits (p1);
create table p2 (a int, b int);
create table c21 () inherits (p2);
create table c22 () inherits (p2);
alter table p1 add primary key (a);
alter table c11 add primary key (a);
alter table c12 add primary key (a);
alter table p2 add primary key (a);
alter table c21 add primary key (a);
alter table c22 add primary key (a);
EXPLAIN ANALYZE SELECT * FROM p1 UNION ALL SELECT * FROM p2 ORDER BY a LIMIT 10;

However, adding a qual to one of the inheritance queries once again defeated
MergeAppend with the patches for approaches (2) and (3). The approach-1 patch
gives a segfault. Test queries:

-- uses MergeAppend for p1
EXPLAIN ANALYZE SELECT * FROM p1 WHERE b <> 0 ORDER BY a;
-- no MergeAppend for p1
EXPLAIN ANALYZE SELECT * FROM p1 WHERE b <> 0 UNION ALL SELECT * FROM p2 ORDER BY a;

I did not study this further to identify the cause. However, I am suspicious
that we're missing much of the work in set_append_rel_size(), notably the
transfer of baserestrictinfo to the child. set_append_rel_size() identifies
the need to do that work based on AppendRelId parent_relid connectivity, but
we can now flatten that graph in an earlier step.

Approaches (2) and (3) leave the inheritance parent with rte->inh == true
despite no AppendRelInfo pointing to it as their parent. Until now,
expand_inherited_rtentry() has been careful to clear rte->inh in such cases.
My gut feeling is that we should retain that property; what do you think?

Finally, the patch should add test cases.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2013-11-23 08:07:34 Re: COPY TO
Previous Message mohsen soodkhah mohammadi 2013-11-23 06:48:22 COPY TO