Re: Custom Scan APIs (Re: Custom Plan node)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Stephen Frost <sfrost(at)snowman(dot)net>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-03-17 00:29:29
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8F8B004@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I adjusted the custom-plan interface patch little bit for the cache-only
scan patch; that is a demonstration module for vacuum-page hook on top of
the custom-plan interface.

fix_scan_expr() looks to me useful for custom-plan providers that want to
implement its own relation scan logic, even though they can implement it
using fix_expr_common() being already exposed.

Also, I removed the hardcoded portion from the nodeCustom.c although, it
may make sense to provide a few template functions to be called by custom-
plan providers, that performs usual common jobs like construction of expr-
context, assignment of result-slot, open relations, and so on.
I though the idea during implementation of BeginCustomPlan handler.
(These template functions are not in the attached patch yet.)
How about your opinion?

The major portion of this patch is not changed from v10.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kouhei Kaigai
> Sent: Wednesday, March 12, 2014 1:55 PM
> To: Tom Lane
> Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; Robert
> Haas; PgHacker; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
> Hello,
>
> The attached two patches are the revised custom-plan interface and example
> usage that implements existing MergeJoin on top of this interface.
>
> According to the discussion last week, I revised the portion where
> custom-node is expected to perform a particular kind of task, like scanning
> a relation, by putting polymorphism with a set of callbacks set by
> custom-plan provider.
> So, the core backend can handle this custom-plan node just an abstracted
> plan-node with no anticipation.
> Even though the subject of this message says "custom-scan", I'd like to
> name the interface "custom-plan" instead, because it became fully arbitrary
> of extension whether it scan on a particular relation.
>
> Definition of CustomXXXX data types were simplified:
>
> typedef struct CustomPath
> {
> Path path;
> const struct CustomPathMethods *methods;
> } CustomPath;
>
> typedef struct CustomPlan
> {
> Plan plan;
> const struct CustomPlanMethods *methods;
> } CustomPlan;
>
> typedef struct CustomPlanState
> {
> PlanState ps;
> const CustomPlanMethods *methods;
> } CustomPlanState;
>
> Each types have a base class and a set of function pointers that characterize
> the behavior of this custom-plan node.
> In usual use-cases, extension is expected to extend these classes to keep
> their private data fields needed to implement its own functionalities.
>
> Most of the methods are designed to work as a thin layer towards existing
> planner / executor functions, so custom-plan provides has to be responsible
> to implement its method to communicate with core backend as built-in ones
> doing.
>
> Regarding to the topic we discussed last week,
>
> * CUSTOM_VAR has gone.
> The reason why CUSTOM_VAR was needed is, we have to handle EXPLAIN command
> output (including column names being referenced) even if a custom-plan node
> replaced a join but has no underlying subplans on left/right subtrees.
> A typical situation like this is a remote-join implementation that I tried
> to extend postgres_fdw on top of the previous interface.
> It retrieves a flat result set of the remote join execution, thus has no
> subplan locally. On the other hand, EXPLAIN tries to find out "actual" Var
> node from the underlying subplan if a Var node has special varno
> (INNER/OUTER/INDEX).
> I put a special method to solve the problem. GetSpecialCustomVar method
> is called if a certain Var node of custom-plan has a special varno, then
> custom-plan provider can inform the core backend an expression node to be
> referenced by this Var node.
> It allows to solve the column name without recursive walking on the subtrees,
> so it enables a custom-plan node that replaces a part of plan-tree.
> This method is optional, so available to adopt existing way if custom-plan
> provider does not do anything special.
>
>
> * Functions to be exposed, from static declaration
>
> Right now, static functions are randomly exposed on demand.
> So, we need more investigation which functions are needed, and which others
> are not.
> According to my trial, the part-2 patch that is MergeJoin on top of the
> custom-plan interface, class of functions that recursively walk on subplan
> tree have to be exposed. Like, ExplainPreScanNode, create_plan_recurse,
> set_plan_refs, fix_expr_common or finalize_plan.
> In case when custom-plan performs like built-in Append node, it keeps a
> list of sub-plans in its private field, so the core backend cannot know
> existence of sub-plans, thus its unavailable to make subplan, unavailable
> to output EXPLAIN and so on.
> It does not make sense to reworking on the extension side again.
> Also, createplan.c has many useful functions to construct plan-node,
> however, most of them are static because all the built-in plan-node are
> constructed by the routines in this file, we didn't need to expose them
> to others. I think, functions in createplan.c being called by
> create_xxxx_plan() functions to construct plan-node should be exposed for
> extension's convenient.
>
>
> * Definition of add_join_path_hook
>
> I didn't have idea to improve the definition and location of this hook,
> so it is still on the tail of the add_paths_to_joinrel().
> Its definition was a bit adjusted according to the feedback on the
> pgsql-hackers. I omitted the "mergeclause_list" and " semifactors"
> from the argument list. Indeed, these are specific to the built-in MergeJoin
> logic and easy to reproduce.
>
>
> * Hook location of add_scan_path_hook
>
> I moved the add_scan_path_hook and set_cheapest() into
> set_base_rel_pathlists() from various caller locations;
> set_xxxx_pathlist() functions typically.
> It enabled to consolidate the location to add custom-path for base
> relations.
>
>
> * CustomMergeJoin as a proof-of-concept
>
> The contrib module in the part-2 portion is, a merge-join implementation
> on top of custom-plan interface, even though 99% of its implementation is
> identical with built-in ones.
> Its purpose is to demonstrate a custom join logic can be implemented using
> custom-plan interface, even if custom-plan node has underlying sub-plans
> unlike previous my examples.
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> <kaigai(at)ak(dot)jp(dot)nec(dot)com>
>
>
> > -----Original Message-----
> > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> > Sent: Friday, March 07, 2014 3:09 AM
> > To: Kaigai Kouhei(海外 浩平)
> > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
> > Robert Haas; PgHacker; Peter Eisentraut
> > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> >
> > Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
> > > I expected to include simple function pointers for copying and
> > > text-output as follows:
> >
> > > typedef struct {
> > > Plan plan;
> > > :
> > > NodeCopy_function node_copy;
> > > NodeTextOut_function node_textout;
> > > } Custom;
> >
> > I was thinking more like
> >
> > typedef struct CustomPathFuncs {
> > const char *name; /* used for debugging purposes only */
> > NodeCopy_function node_copy;
> > NodeTextOut_function node_textout;
> > ... etc etc etc ...
> > } CustomPathFuncs;
> >
> > typedef struct CustomPath {
> > Path path;
> > const CustomPathFuncs *funcs;
> > ... maybe a few more fields here, but not too darn many ...
> > } CustomPath;
> >
> > and similarly for CustomPlan.
> >
> > The advantage of this way is it's very cheap for (what I expect will
> > be) the common case where an extension has a fixed set of support
> > functions for its custom paths and plans. It just declares a static
> > constant CustomPathFuncs struct, and puts a pointer to that into its paths.
> >
> > If an extension really needs to set the support functions on a
> > per-object basis, it can do this:
> >
> > typdef struct MyCustomPath {
> > CustomPath cpath;
> > CustomPathFuncs funcs;
> > ... more fields ...
> > } MyCustomPath;
> >
> > and then initialization of a MyCustomPath would include
> >
> > mypath->cpath.funcs = &mypath->funcs;
> > mypath->funcs.node_copy = MyCustomPathCopy;
> > ... etc etc ...
> >
> > In this case we're arguably wasting one pointer worth of space in the
> > path, but considering the number of function pointers such a path will
> > be carrying, I don't think that's much of an objection.
> >
> > >> So? If you did that, then you wouldn't have renumbered the Vars as
> > >> INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all;
> > >> if it is needed, then there would also be a need for an additional
> > >> tuple slot in executor contexts, which you haven't provided.
> >
> > > For example, the enhanced postgres_fdw fetches the result set of
> > > remote join query, thus a tuple contains the fields come from both side.
> > > In this case, what varno shall be suitable to put?
> >
> > Not sure what we'd do for the general case, but CUSTOM_VAR isn't the
> solution.
> > Consider for example a join where both tables supply columns named "id"
> > --- if you put them both in one tupledesc then there's no non-kluge
> > way to identify them.
> >
> > Possibly the route to a solution involves adding another plan-node
> > callback function that ruleutils.c would use for printing Vars in custom
> join nodes.
> > Or maybe we could let the Vars keep their original RTE numbers, though
> > that would complicate life at execution time.
> >
> > Anyway, if we're going to punt on add_join_path_hook for the time
> > being, this problem can probably be left to solve later. It won't
> > arise for simple table-scan cases, nor for single-input plan nodes such
> as sorts.
> >
> > regards, tom lane

Attachment Content-Type Size
pgsql-v9.4-custom-scan.part-2.v11.patch application/octet-stream 147.1 KB
pgsql-v9.4-custom-scan.part-1.v11.patch application/octet-stream 66.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kouhei Kaigai 2014-03-17 00:45:42 Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)
Previous Message Kyotaro HORIGUCHI 2014-03-17 00:15:26 Re: Archive recovery won't be completed on some situation.