Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Thom Brown <thom(at)linux(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-03-18 15:56:17
Message-ID: CA+TgmoaHTano5mE2MLyk9E9vVFgJKZUD6QmaV072WBu7Be-8WA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 18, 2015 at 9:33 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>> On Wed, Mar 18, 2015 at 2:34 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>> > So, overall consensus for the FDW hook location is just before the set_chepest()
>> > at standard_join_search() and merge_clump(), isn't it?
>>
>> Yes, I think so.
>>
>> > Let me make a design of FDW hook to reduce code duplications for each FDW driver,
>> > especially, to identify baserel/joinrel to be involved in this join.
>>
>> Great, thanks!
>>
>> One issue, which I think Ashutosh alluded to upthread, is that we need
>> to make sure it's not unreasonably difficult for foreign data wrappers
>> to construct the FROM clause of an SQL query to be pushed down to the
>> remote side. It should be simple when there are only inner joins
>> involved, but when there are all outer joins it might be a bit
>> complex. It would be very good if someone could try to write that
>> code, based on the new hook locations, and see how it turns out, so
>> that we can figure out how to address any issues that may crop up
>> there.
>>
> Here is an idea that provides a common utility function that break down
> the supplied RelOptInfo of joinrel into a pair of join-type and a list of
> baserel/joinrel being involved in the relations join. It intends to be
> called by FDW driver to list up underlying relations.
> IIUC, root->join_info_list will provide information of how relations are
> combined to the upper joined relations, thus, I expect it is not
> unreasonably complicated way to solve.
> Once a RelOptInfo of the target joinrel is broken down into multiple sub-
> relations (N>=2 if all inner join, elsewhere N=2), FDW driver can
> reference the RestrictInfo to be used in relations join.
>
> Anyway, I'll try to investigate the existing code for more detail today,
> to clarify whether the above approach is feasible.

Sounds good. Keep in mind that, while the parse tree will obviously
reflect the way that the user actually specified the join
syntactically, it's not the job of the join_info_list to make it
simple to reconstruct that information. To the contrary,
join_info_list is supposed to be structured in a way that makes it
easy to determine whether *a particular join order is one of the legal
join orders* not *whether it is the specific join order selected by
the user*. See join_is_legal().

For FDW pushdown, I think it's sufficient to be able to identify *any
one* legal join order, not necessarily the same order the user
originally entered. For exampe, if the user entered A LEFT JOIN B ON
A.x = B.x LEFT JOIN C ON A.y = C.y and the FDW generates a query that
instead does A LEFT JOIN C ON a.y = C.y LEFT JOIN B ON A.x = B.x, I
suspect that's just fine. Particular FDWs might wish to try to be
smart about what they emit based on knowledge of what the remote
side's optimizer is likely to do, and that's fine. If the remote side
is PostgreSQL, it shouldn't matter much.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-03-18 16:01:14 Re: assessing parallel-safety
Previous Message Justin Pryzby 2015-03-18 15:48:48 pg9.4 relpages of child tables