Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)

From: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, 花田茂 <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Hooking at standard_join_search (Was: Re: Foreign join pushdown vs EvalPlanQual)
Date: 2015-09-09 06:30:58
Message-ID: 55EFD222.8060701@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015/09/09 3:53, Robert Haas wrote:
> On Tue, Sep 8, 2015 at 5:35 AM, Etsuro Fujita
> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> On 2015/09/04 0:33, Robert Haas wrote:
>>> I'm worried that trawling through that
>>> SpecialJoinInfo data will end up needing to duplicate much of
>>> make_join_rel and add_paths_to_joinrel. For example, consider:
>>>
>>> SELECT * FROM verysmall v JOIN (bigft1 FULL JOIN bigft2 ON bigft1.x =
>>> bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r;
>>>
>>> The best path for this plan is presumably something like this:
>>>
>>> Nested Loop
>>> -> Seq Scan on verysmall v
>>> -> Foreign Scan on bigft1 and bigft2
>>> Remote SQL: SELECT * FROM bigft1 FULL JOIN bigft2 ON bigft1.x =
>>> bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
>>>
>>> Now, how is the FDW going to figure out that it needs to generate this
>>> parameterized path without duplicating this code from
>>> add_paths_to_joinrel?
>>>
>>> /*
>>> * Decide whether it's sensible to generate parameterized paths for
>>> this
>>> * joinrel, and if so, which relations such paths should require.
>>> There
>>> * is usually no need to create a parameterized result path unless
>>> there
>>> ...
>>>
>>> Maybe there's a very simple answer to this question and I'm just not
>>> seeing it, but I really don't see how that's going to work.

>> Why don't you look at the "regular" (local join execution) paths that were
>> already generated. I think that if we called the FDW at a proper hook
>> location, the FDW could probably find a regular path in rel->pathlist of the
>> join rel (bigft1, bigft2) that possibly generates something like:
>>
>> Nested Loop
>> -> Seq Scan on verysmall v
>> -> Nested Loop
>> Join Filter: (bigft1.a = bigft2.a)
>> -> Foreign Scan on bigft1
>> Remote SQL: SELECT * FROM bigft1 WHERE bigft1.q = $1
>> -> Foreign Scan on bigft2
>> Remote SQL: SELECT * FROM bigft2 WHERE bigft2.r = $2
>>
>> From the parameterization of the regular nestloop path for joining bigft1
>> and bigft2 locally, I think that the FDW could find that it's sensible to
>> generate the foreign-join path for (bigft1, bigft2) with the
>> parameterization.

> But that path might have already been discarded on the basis of cost.
> I think Tom's idea is better: let the FDW consult some state cached
> for this purpose in the RelOptInfo.

Do you have an idea of what information would be collected into the
state and how the FDW would derive parameterizations to consider
producing pushed-down joins with from that information? What I'm
concerned about that is to reduce the number of parameterizations to
consider, to reduce overhead in costing the corresponding queries. I'm
missing something, though.

Best regards,
Etsuro Fujita

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2015-09-09 06:36:33 Re: On-demand running query plans using auto_explain and signals
Previous Message Haribabu Kommi 2015-09-09 06:17:54 Re: Parallel Seq Scan