Re: Partition-wise join for join between (declaratively) partitioned tables

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Rajkumar Raghuwanshi <rajkumar(dot)raghuwanshi(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partition-wise join for join between (declaratively) partitioned tables
Date: 2017-09-20 13:21:07
Message-ID: CAM2+6=U9P8ED2gf5_AA+1b-bDSo0eik31fAb8PXNU6gVZPS+Sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 19, 2017 at 3:17 PM, Ashutosh Bapat <
ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:

> On Tue, Sep 19, 2017 at 2:35 AM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > On Mon, Sep 18, 2017 at 8:02 AM, Ashutosh Bapat
> > <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> >> partition pruning might need partexprs look up relevant quals, but
> >> nullable_partexprs doesn't have any use there. So may be we should add
> >> nullable_partexpr to RelOptInfo as part of 0002 (partition-wise join
> >> implementation) instead of 0001. What do you think?
> >
> > +1.
>
> Done.
>
> >
> >>> - I'm not entirely sure whether maintaining partexprs and
> >>> nullable_partexprs is the right design. If I understand correctly,
> >>> whether or not a partexpr is nullable is really a per-RTI property,
> >>> not a per-expression property. You could consider something like
> >>> "Relids nullable_rels".
> >>
> >> That's true. However in order to decide whether an expression falls on
> >> nullable side of a join, we will need to call pull_varnos() on it and
> >> check the output against nullable_rels. Separating the expressions
> >> themselves avoids that step.
> >
> > Good point. Also, I'm not sure about cases like this:
> >
> > SELECT * FROM (SELECT b.x, b.y FROM a LEFT JOIN b ON a.x = b.x WHERE
> > a.y = b.y) w LEFT JOIN c ON w.x = c.x AND w.y = c.y;
> >
> > Suppose the relations are all partitioned by (x, y) but that the =
> > operator is not strict. A partition-wise join is valid between a and
> > b, but we can't regard w as partitioned any more, because w.x might
> > contain nulls in partitions where the partitioning scheme wouldn't
> > allow them. On the other hand, if the subquery were to select a.x,
> > a.y then clearly it would be fine: there would be no possibility of a
> > NULL having been substituted for a proper value.
> >
> > What if the subquery selected a.x, b.y? Initially, I thought that
> > would be OK too, because of the fact that the a.y = b.y clause is in
> > the WHERE clause rather than the join condition. But on further
> > thought I think that probably doesn't work, because with = being a
> > non-strict operator there's no guarantee that it would remove any
> > nulls introduced by the left join. Of course, if the subselect had a
> > WHERE clause saying that b.x/b.y IS NOT NULL then having the SELECT
> > list mention those columns would be fine.
> >
>
> I am actually not sure whether we can use partition-wise join for a
> LEFT JOIN b when the partition key equalities are spread across ON and
> WHERE clauses. I am not able to find any example against it, but I am
> not able to prove it as well. The reference I used for partition-wise
> join [1], mentions JOIN conditions i.e. ON clause conditions. But all
> the examples used in that paper are that of INNER join. So, I am not
> sure what exactly the authors meant by JOIN conditions. Right now I am
> restricting the patch to work with only conditions in the ON clause.
>
> Practically most of the operators are strict. OUTER join's WHERE
> clause has any partition key equality with strict operator, optimizer
> will turn
> that OUTER join into an INNER one, turning all clauses into join
> clauses. That will enable partition-wise join. So, the current
> restriction doesn't restrict any practical cases.
>
> OTOH, I have seen that treating ON and WHERE clauses as same for an
> OUTER join leads to surprising results. So, I am leaning to treat them
> separate for partition-wise join as well and only use ON clause
> conditions for partition-wise join. If we get complaints about
> partition-wise join not being picked we will fix them after proving
> that it's not harmful. Lifting that restriction is not so difficult.
> have_partition_key_equijoin() ignores "pushed down" quals. We have to
> just change that condition.
>
> Your last sentence about a clause b.x IS NOT NULL or b.y IS NOT NULL
> is interesting. If those conditions are in ON clause, we may still
> have a result where b.x and b.y as NULL when no row in "a" matches a
> row in "b". If those conditions are in WHERE clause, I think optimizer
> will turn the join into an INNER join irrespective of whether the
> equality operator is strict.
>
> >
> >> If partition-wise join is disabled, partition-wise aggregates,
> >> strength reduction of MergeAppend won't be possible on a join tree,
> >> but those will be possible on a base relation. Even if partition-wise
> >> join enabled, one may want to disable other partition-wise
> >> optimizations individually. So, they are somewhat independent
> >> switches. I don't think we should bundle all of those into one.
> >> Whatever names we choose for those GUCs, I think they should have same
> >> naming convention e.g. "partition_wise_xyz". I am open to suggestions
> >> about the names.
> >
> > I think the chances of you getting multiple GUCs for different
> > partition-wise optimizations past Tom are pretty low.
>
> We do have enable_hashjoin and enable_hashagg to control use of
> hashing for aggregate and join. On similar lines we can have three
> GUCs to enable use of partition-wise strategy, one for each of join,
> aggregation and sorting. Having granular switches would be useful for
> debugging and may be to turn partition-wise strategies off when they
> are not optimal.

I think having a granular control over each of these optimization will be
handy for the DBAs too.

> Do we want a switch to turn ON/OFF partition pruning?
> Said, that I am fine with single GUC controlling all. We won't set any
> partitioning information in RelOptInfo if that GUC is turned OFF.
>
> [1] https://pdfs.semanticscholar.org/27c2/ba75f8b6a39d4bce85d5579dace609
> c9abaa.pdf
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

--
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-09-20 13:25:12 Re: Allow GiST opcalsses without compress\decompres functions
Previous Message Robert Haas 2017-09-20 13:14:44 Re: Page Scan Mode in Hash Index