The way to get custom plan interface committed? (RE: Custom Scan APIs)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Stephen Frost <sfrost(at)snowman(dot)net>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: The way to get custom plan interface committed? (RE: Custom Scan APIs)
Date: 2014-03-28 00:45:30
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8F8F6BA@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hackers,

We now have about two weeks by the final feature freeze towards of v9.4.
What is the best way to get the custom-plan interface committed in this
limited time?
Fortunately, we could have a series of active discussion three weeks
before, then we reached a consensus that custom "scan" interface should
perform as custom "plan" interface; that gives custom plannode polymorphism
using a set of function pointers as like an object in object oriented
programming language.
The attached patch is the latest version of custom-plan interface being
re-designed according to the suggestion by Tom. It allows extensions to
create an object that extends the base custom types, with a set of
callback function pointers. This approach works well; at least, I could
demonstrate an implementation of existing merge-join on top of this
interface.

One negative factor of my submission is scale of patch. Even though
the main part of this patch is less than 1KL (incl 300lines sgml),
the demonstration part is more than 4.7KL but 99% of them were copied
from the core code.
Probably, it makes sense to focus on the main part of this patch.
The contrib portion I attached have no sense more than its demonstration
because its functionality is identical with existing merge join code.

Also, two other small patches that implements a callback on vacuum-pages
and allows HeapTupleSatisfiesVisibility on cache-only tuples.
These enhancements are needed to synchronize in-memory cache structure
on-heap data structure, as I demonstrated in the cache-only scan module.

I'd like to see above new features in v9.4, that will help people who
want to implement experimental, niche or edge-features on top of the
mainline PostgreSQL.

Your opinions?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kouhei Kaigai
> Sent: Monday, March 24, 2014 7:26 PM
> To: Tom Lane; PgHacker
> Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; Robert
> Haas; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
> Hello,
>
> Because of patch conflict towards the latest master branch, I rebased the
> custom-plan interface patches; no functional difference from the v11.
>
> Brief summary of the current approach that has been revised from my original
> submission through the discussion on pgsql-hackers:
>
> The plannode was renamed to CustomPlan, instead of CustomScan, because it
> dropped all the hardcoded portion that assumes the custom-node shall perform
> as alternative scan or join method, because it prevents this custom-node
> to perform as other stuff; like sort or append potentially.
> According to the suggestion by Tom, I put a structure that contains several
> function pointers on the new CustomPlan node, and extension will allocate
> an object that extends CustomPlan node.
> It looks like polymorphism in object oriented programming language.
> The core backend knows abstracted set of methods defined in the tables of
> function pointers, and extension can implement its own logic on the callback,
> using private state on the extended object.
>
> Some issues are still under discussion:
> * Design of add_join_path_hook
> Tom suggested that core backend can support to check join legality and
> parameterization stuff, however, it looks to me existing code handles join
> legality checks within the function of individual join logic.
> So, I'm not certain whether we can have a common legality check for all
> the (potential) alternative join implementation.
> The part-2 is a demonstration module that implemented existing merge- join
> logic, but on top of this interface.
>
> * Functions to be exposed for extensions Some utility functions useful to
> implement plan node are declared as static functions, because most of the
> stuff are implemented within createplan.c, setrefs.c and so on, thus these
> were not needed to expose other stuff. However, extension will become a
> caller for these functions due to custom-plan interface, even though they
> are implemented out of the core.
> In my investigation, class of functions that recursively walk on subplan
> tree have to be exposed, like ExplainPreScanNode, create_plan_recurse,
> set_plan_refs, fix_expr_common or finalize_plan.
> Do we have other criteria to determine what function shall be exposed?
>
> Any help and feedback are welcome.
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> <kaigai(at)ak(dot)jp(dot)nec(dot)com>
>
>
> > -----Original Message-----
> > From: Kaigai Kouhei(海外 浩平)
> > Sent: Thursday, March 20, 2014 10:46 AM
> > To: Kaigai Kouhei(海外 浩平); Tom Lane
> > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
> > Robert Haas; PgHacker; Peter Eisentraut
> > Subject: RE: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> >
> > Hello,
> >
> > > > * Definition of add_join_path_hook
> > > >
> > > > I didn't have idea to improve the definition and location of this
> > > > hook, so it is still on the tail of the add_paths_to_joinrel().
> > > > Its definition was a bit adjusted according to the feedback on the
> > > > pgsql-hackers. I omitted the "mergeclause_list" and " semifactors"
> > > > from the argument list. Indeed, these are specific to the built-in
> > > > MergeJoin logic and easy to reproduce.
> > > >
> > >
> > After the submission, I'm still investigating better way to put a hook
> > to add custom join paths.
> >
> > Regarding to the comment from Tom:
> > | * The API for add_join_path_hook seems overcomplex, as well as too
> > | full of implementation details that should remain private to joinpath.c.
> > | I particularly object to passing the mergeclause lists, which seem
> > | unlikely to be of interest for non-mergejoin plan types anyway.
> > | More generally, it seems likely that this hook is at the wrong level
> > | of abstraction; it forces the hook function to concern itself with a
> > | lot of stuff about join legality and parameterization (which I note
> > | your patch3 code fails to do; but that's not an optional thing).
> > |
> > The earlier half was already done. My trouble is the later portion.
> >
> > The overall jobs of add_join_path_hook are below:
> > 1. construction of parameterized path information; being saved at
> > param_source_rel and extra_lateral_rels.
> > 2. Try to add mergejoin paths with underlying Sort node 3. Try to add
> > mergejoin/nestloop paths without underlying Sort node 4. Try to add
> > hashjoin paths
> >
> > It seems to me the check for join legality and parameterization are
> > built within individual routines for each join algorithm.
> > (what does the "join legality check" actually mean?)
> >
> > Probably, it makes sense to provide a common utility function to be
> > called back from the extension if we can find out a common part for
> > all the join logics, however, I don't have clear idea to cut off the common
> portion.
> > What's jobs can be done independent from the join algorithm??
> >
> > I'd like to see ideas around this issue. Of course, I also think it is
> > still an option to handle it by extension on the initial version.
> >
> > Thanks,
> > --
> > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> > <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> >
> >
> > > -----Original Message-----
> > > From: pgsql-hackers-owner(at)postgresql(dot)org
> > > [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kouhei
> > > Kaigai
> > > Sent: Monday, March 17, 2014 9:29 AM
> > > To: Kaigai Kouhei(海外 浩平); Tom Lane
> > > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
> > > Robert Haas; PgHacker; Peter Eisentraut
> > > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> > >
> > > Hello,
> > >
> > > I adjusted the custom-plan interface patch little bit for the
> > > cache-only scan patch; that is a demonstration module for
> > > vacuum-page hook on top of the custom-plan interface.
> > >
> > > fix_scan_expr() looks to me useful for custom-plan providers that
> > > want to implement its own relation scan logic, even though they can
> > > implement it using fix_expr_common() being already exposed.
> > >
> > > Also, I removed the hardcoded portion from the nodeCustom.c
> > > although, it may make sense to provide a few template functions to
> > > be called by
> > > custom- plan providers, that performs usual common jobs like
> > > construction of expr- context, assignment of result-slot, open
> > > relations,
> > and so on.
> > > I though the idea during implementation of BeginCustomPlan handler.
> > > (These template functions are not in the attached patch yet.) How
> > > about your opinion?
> > >
> > > The major portion of this patch is not changed from v10.
> > >
> > > Thanks,
> > > --
> > > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> > > <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> > >
> > >
> > > > -----Original Message-----
> > > > From: pgsql-hackers-owner(at)postgresql(dot)org
> > > > [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Kouhei
> > > > Kaigai
> > > > Sent: Wednesday, March 12, 2014 1:55 PM
> > > > To: Tom Lane
> > > > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
> > > > Robert Haas; PgHacker; Peter Eisentraut
> > > > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> > > >
> > > > Hello,
> > > >
> > > > The attached two patches are the revised custom-plan interface and
> > > > example usage that implements existing MergeJoin on top of this
> > interface.
> > > >
> > > > According to the discussion last week, I revised the portion where
> > > > custom-node is expected to perform a particular kind of task, like
> > > > scanning a relation, by putting polymorphism with a set of
> > > > callbacks set by custom-plan provider.
> > > > So, the core backend can handle this custom-plan node just an
> > > > abstracted plan-node with no anticipation.
> > > > Even though the subject of this message says "custom-scan", I'd
> > > > like to name the interface "custom-plan" instead, because it
> > > > became fully arbitrary of extension whether it scan on a particular
> relation.
> > > >
> > > > Definition of CustomXXXX data types were simplified:
> > > >
> > > > typedef struct CustomPath
> > > > {
> > > > Path path;
> > > > const struct CustomPathMethods *methods;
> > > > } CustomPath;
> > > >
> > > > typedef struct CustomPlan
> > > > {
> > > > Plan plan;
> > > > const struct CustomPlanMethods *methods;
> > > > } CustomPlan;
> > > >
> > > > typedef struct CustomPlanState
> > > > {
> > > > PlanState ps;
> > > > const CustomPlanMethods *methods;
> > > > } CustomPlanState;
> > > >
> > > > Each types have a base class and a set of function pointers that
> > > > characterize the behavior of this custom-plan node.
> > > > In usual use-cases, extension is expected to extend these classes
> > > > to keep their private data fields needed to implement its own
> > > functionalities.
> > > >
> > > > Most of the methods are designed to work as a thin layer towards
> > > > existing planner / executor functions, so custom-plan provides has
> > > > to be responsible to implement its method to communicate with core
> > > > backend as built-in ones doing.
> > > >
> > > > Regarding to the topic we discussed last week,
> > > >
> > > > * CUSTOM_VAR has gone.
> > > > The reason why CUSTOM_VAR was needed is, we have to handle EXPLAIN
> > > > command output (including column names being referenced) even if a
> > > > custom-plan node replaced a join but has no underlying subplans on
> > > left/right subtrees.
> > > > A typical situation like this is a remote-join implementation that
> > > > I tried to extend postgres_fdw on top of the previous interface.
> > > > It retrieves a flat result set of the remote join execution, thus
> > > > has no subplan locally. On the other hand, EXPLAIN tries to find
> > > > out "actual" Var node from the underlying subplan if a Var node
> > > > has special varno (INNER/OUTER/INDEX).
> > > > I put a special method to solve the problem. GetSpecialCustomVar
> > > > method is called if a certain Var node of custom-plan has a
> > > > special varno, then custom-plan provider can inform the core
> > > > backend an expression node to be referenced by this Var node.
> > > > It allows to solve the column name without recursive walking on
> > > > the subtrees, so it enables a custom-plan node that replaces a
> > > > part of
> > > plan-tree.
> > > > This method is optional, so available to adopt existing way if
> > > > custom-plan provider does not do anything special.
> > > >
> > > >
> > > > * Functions to be exposed, from static declaration
> > > >
> > > > Right now, static functions are randomly exposed on demand.
> > > > So, we need more investigation which functions are needed, and
> > > > which others are not.
> > > > According to my trial, the part-2 patch that is MergeJoin on top
> > > > of the custom-plan interface, class of functions that recursively
> > > > walk on subplan tree have to be exposed. Like, ExplainPreScanNode,
> > > > create_plan_recurse, set_plan_refs, fix_expr_common or
> finalize_plan.
> > > > In case when custom-plan performs like built-in Append node, it
> > > > keeps a list of sub-plans in its private field, so the core
> > > > backend cannot know existence of sub-plans, thus its unavailable
> > > > to make subplan, unavailable to output EXPLAIN and so on.
> > > > It does not make sense to reworking on the extension side again.
> > > > Also, createplan.c has many useful functions to construct
> > > > plan-node, however, most of them are static because all the
> > > > built-in plan-node are constructed by the routines in this file,
> > > > we didn't need to expose them to others. I think, functions in
> > > > createplan.c being called by
> > > > create_xxxx_plan() functions to construct plan-node should be
> > > > exposed for extension's convenient.
> > > >
> > > >
> > > > * Definition of add_join_path_hook
> > > >
> > > > I didn't have idea to improve the definition and location of this
> > > > hook, so it is still on the tail of the add_paths_to_joinrel().
> > > > Its definition was a bit adjusted according to the feedback on the
> > > > pgsql-hackers. I omitted the "mergeclause_list" and " semifactors"
> > > > from the argument list. Indeed, these are specific to the built-in
> > > > MergeJoin logic and easy to reproduce.
> > > >
> > > >
> > > > * Hook location of add_scan_path_hook
> > > >
> > > > I moved the add_scan_path_hook and set_cheapest() into
> > > > set_base_rel_pathlists() from various caller locations;
> > > > set_xxxx_pathlist() functions typically.
> > > > It enabled to consolidate the location to add custom-path for base
> > > > relations.
> > > >
> > > >
> > > > * CustomMergeJoin as a proof-of-concept
> > > >
> > > > The contrib module in the part-2 portion is, a merge-join
> > > > implementation on top of custom-plan interface, even though 99% of
> > > > its implementation is identical with built-in ones.
> > > > Its purpose is to demonstrate a custom join logic can be
> > > > implemented using custom-plan interface, even if custom-plan node
> > > > has underlying sub-plans unlike previous my examples.
> > > >
> > > > Thanks,
> > > > --
> > > > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> > > > <kaigai(at)ak(dot)jp(dot)nec(dot)com>
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> > > > > Sent: Friday, March 07, 2014 3:09 AM
> > > > > To: Kaigai Kouhei(海外 浩平)
> > > > > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
> > > > > Robert Haas; PgHacker; Peter Eisentraut
> > > > > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> > > > >
> > > > > Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
> > > > > > I expected to include simple function pointers for copying and
> > > > > > text-output as follows:
> > > > >
> > > > > > typedef struct {
> > > > > > Plan plan;
> > > > > > :
> > > > > > NodeCopy_function node_copy;
> > > > > > NodeTextOut_function node_textout;
> > > > > > } Custom;
> > > > >
> > > > > I was thinking more like
> > > > >
> > > > > typedef struct CustomPathFuncs {
> > > > > const char *name; /* used for debugging purposes
> only */
> > > > > NodeCopy_function node_copy;
> > > > > NodeTextOut_function node_textout;
> > > > > ... etc etc etc ...
> > > > > } CustomPathFuncs;
> > > > >
> > > > > typedef struct CustomPath {
> > > > > Path path;
> > > > > const CustomPathFuncs *funcs;
> > > > > ... maybe a few more fields here, but not too darn many ...
> > > > > } CustomPath;
> > > > >
> > > > > and similarly for CustomPlan.
> > > > >
> > > > > The advantage of this way is it's very cheap for (what I expect
> > > > > will
> > > > > be) the common case where an extension has a fixed set of
> > > > > support functions for its custom paths and plans. It just
> > > > > declares a static constant CustomPathFuncs struct, and puts a
> > > > > pointer to that into its
> > > paths.
> > > > >
> > > > > If an extension really needs to set the support functions on a
> > > > > per-object basis, it can do this:
> > > > >
> > > > > typdef struct MyCustomPath {
> > > > > CustomPath cpath;
> > > > > CustomPathFuncs funcs;
> > > > > ... more fields ...
> > > > > } MyCustomPath;
> > > > >
> > > > > and then initialization of a MyCustomPath would include
> > > > >
> > > > > mypath->cpath.funcs = &mypath->funcs;
> > > > > mypath->funcs.node_copy = MyCustomPathCopy;
> > > > > ... etc etc ...
> > > > >
> > > > > In this case we're arguably wasting one pointer worth of space
> > > > > in the path, but considering the number of function pointers
> > > > > such a path will be carrying, I don't think that's much of an objection.
> > > > >
> > > > > >> So? If you did that, then you wouldn't have renumbered the
> > > > > >> Vars as INNER/OUTER. I don't believe that CUSTOM_VAR is
> > > > > >> necessary at all; if it is needed, then there would also be a
> > > > > >> need for an additional tuple slot in executor contexts, which
> > > > > >> you haven't
> > > provided.
> > > > >
> > > > > > For example, the enhanced postgres_fdw fetches the result set
> > > > > > of remote join query, thus a tuple contains the fields come
> > > > > > from both
> > > side.
> > > > > > In this case, what varno shall be suitable to put?
> > > > >
> > > > > Not sure what we'd do for the general case, but CUSTOM_VAR isn't
> > > > > the
> > > > solution.
> > > > > Consider for example a join where both tables supply columns
> > > > > named
> > "id"
> > > > > --- if you put them both in one tupledesc then there's no
> > > > > non-kluge way to identify them.
> > > > >
> > > > > Possibly the route to a solution involves adding another
> > > > > plan-node callback function that ruleutils.c would use for
> > > > > printing Vars in custom
> > > > join nodes.
> > > > > Or maybe we could let the Vars keep their original RTE numbers,
> > > > > though that would complicate life at execution time.
> > > > >
> > > > > Anyway, if we're going to punt on add_join_path_hook for the
> > > > > time being, this problem can probably be left to solve later.
> > > > > It won't arise for simple table-scan cases, nor for single-input
> > > > > plan nodes such
> > > > as sorts.
> > > > >
> > > > > regards, tom lane

Attachment Content-Type Size
pgsql-v9.4-mvcc_allows_cache.v13.patch application/octet-stream 918 bytes
pgsql-v9.4-vacuum_page_hook.v13.patch application/octet-stream 1.8 KB
pgsql-v9.4-custom-scan.part-1.v13.patch application/octet-stream 66.3 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2014-03-28 01:19:19 Re: History of WAL_LEVEL (archive vs hot_standby)
Previous Message Andrew Dunstan 2014-03-28 00:17:36 Re: Useless "Replica Identity: NOTHING" noise from psql \d