Re: [v9.5] Custom Plan API

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Stephen Frost" <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Subject: Re: [v9.5] Custom Plan API
Date: 2014-09-15 12:38:06
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8FE5CBE@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Thu, Sep 11, 2014 at 8:40 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> >> On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai
> >> <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> >> wrote:
> >> >> Don't the changes to src/backend/optimizer/plan/createplan.c
> >> >> belong in patch #2?
> >> >>
> >> > The borderline between #1 and #2 is little bit bogus. So, I moved
> >> > most of portion into #1, however, invocation of InitCustomScan
> >> > (that is a callback in CustomPlanMethod) in create_custom_plan() is
> still in #2.
> >>
> >> Eh, create_custom_scan() certainly looks like it is in #1 from here,
> >> or at least part of it is. It calculates tlist and clauses and then
> >> does nothing with them. That clearly can't be the right division.
> >>
> >> I think it would make sense to have create_custom_scan() compute
> >> tlist and clauses first, and then pass those to CreateCustomPlan().
> >> Then you don't need a separate InitCustomScan() - which is misnamed
> >> anyway, since it has nothing to do with ExecInitCustomScan().
> >>
> > The only reason why I put separate hooks here is, create_custom_scan()
> > needs to know exact size of the CustomScan node (including private
> > fields), however, it is helpful for extensions to kick its callback to
> > initialize the node next to the common initialization stuff.
>
> Why does it need to know that? I don't see that it's doing anything that
> requires knowing the size of that node, and if it is, I think it shouldn't
> be. That should get delegated to the callback provided by the custom plan
> provider.
>
Sorry, my explanation might be confusable. The create_custom_scan() does not
need to know the exact size of the CustomScan (or its inheritance) because of
the two separated hooks; one is node allocation time, the other is the tail
of the series of initialization.
If we have only one hook here, we need to have a mechanism to informs
create_custom_scan() an exact size of the CustomScan node; including private
fields managed by the provider, instead of the first hook on node allocation
time. In this case, node allocation shall be processed by create_custom_scan()
and it has to know exact size of the node to be allocated.

How do I implement the feature here? Is the combination of static node size
and callback on the tail more simple than the existing design that takes two
individual hooks on create_custom_scan()?

> > Regarding to the naming, how about GetCustomScan() instead of
> InitCustomScan()?
> > It follows the manner in create_foreignscan_plan().
>
> I guess that's a bit better, but come to think of it, I'd really like to
> avoid baking in the assumption that the custom path provider has to return
> any particular type of plan node. A good start would be to give it a name
> that doesn't imply that - e.g. PlanCustomPath().
>
OK, I'll use this naming.

> >> > OK, I revised. Now custom-scan assumes it has a particular valid
> >> > relation to be scanned, so no code path with scanrelid == 0 at this
> moment.
> >> >
> >> > Let us revisit this scenario when custom-scan replaces relation-joins.
> >> > In this case, custom-scan will not be associated with a particular
> >> > base- relation, thus it needs to admit a custom-scan node with
> >> > scanrelid
> >> == 0.
> >>
> >> Yeah, I guess the question there is whether we'll want let CustomScan
> >> have scanrelid == 0 or require that CustomJoin be used there instead.
> >>
> > Right now, I cannot imagine a use case that requires individual
> > CustomJoin node because CustomScan with scanrelid==0 (that performs
> > like custom-plan rather than custom-scan in actually) is sufficient.
> >
> > If a CustomScan gets chosen instead of built-in join logics, it shall
> > looks like a relation scan on the virtual one that is consists of two
> > underlying relation. Callbacks of the CustomScan has a responsibility
> > to join underlying relations; that is invisible from the core executor.
> >
> > It seems to me CustomScan with scanrelid==0 is sufficient to implement
> > an alternative logic on relation joins, don't need an individual node
> > from the standpoint of executor.
>
> That's valid logic, but it's not the only way to do it. If we have CustomScan
> and CustomJoin, either of them will require some adaption to handle this
> case. We can either allow a custom scan that isn't scanning any particular
> relation (i.e. scanrelid == 0), or we can allow a custom join that has no
> children. I don't know which way will come out cleaner, and I think it's
> good to leave that decision to one side for now.
>
Yep. I agree with you. It may not be productive discussion to conclude
this design topic right now. Let's assume CustomScan scans on a particular
relation (scanrelid != 0) on the first revision.

> >> >> Why can't the Custom(GpuHashJoin) node build the hash table
> >> >> internally instead of using a separate node?
> >> >>
> >> > It's possible, however, it prevents to check sub-plans using
> >> > EXPLAIN if we manage inner-plans internally. So, I'd like to have a
> >> > separate node being connected to the inner-plan.
> >>
> >> Isn't that just a matter of letting the EXPLAIN code print more stuff?
> >> Why can't it?
> >>
> > My GpuHashJoin takes multiple relations to load them a hash-table.
> > On the other hand, Plan node can have two underlying relations at most
> > (inner/outer). Outer-side is occupied by the larger relation, so it
> > needs to make multiple relations visible using inner-branch.
> > If CustomScan can has a list of multiple underlying plan-nodes, like
> > Append node, it can represent the structure above in straightforward
> > way, but I'm uncertain which is the better design.
>
> Right. I think the key point is that it is *possible* to make this work
> without a multiexec interface, and it seems like we're agreed that it is.
> Now perhaps we will decide that there is enough benefit in having multiexec
> support that we want to do it anyway, but it's clearly not a hard requirement,
> because it can be done without that in the way you describe here. Let's
> leave to the future the decision as to how to proceed here; getting the
> basic thing done is hard enough.
>
OK, let's postpone the discussion on the custom-join support.
Either of approaches (1. multi-exec support, or 2. multiple subplans like
Append) is sufficient for this purpose, and the multi-exec interface is
a way to implement it, not a goal.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-09-15 12:41:22 Re: WAL format and API changes (9.5)
Previous Message Alexander Korotkov 2014-09-15 11:58:41 Re: PoC: Partial sort