Re: Custom Scan APIs (Re: Custom Plan node)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-03-04 19:34:54
Message-ID: 12605.1393961694@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I apologize for not having paid much attention to this thread so far.
It kept getting stuck on my "to look at later" queue. Anyway, I've
taken a preliminary look at the v7 patch now.

While the patch seems roughly along the lines of what we talked about
last PGCon, I share Stephen's unease about a lot of the details. It's
not entirely clear that these hooks are really good for anything, and
it's even less clear what APIs the hook functions should be expected
to depend on. I really do not like the approach embodied in the later
patches of "oh, we'll just expose whatever static planner functions seem
convenient". That's not an approach that's available when doing actual
external development of an extension, and even if it were that doesn't
make it a good idea. The larger the exposed surface of functions the
harder it is to know what's safe to change.

Anyway, on to specifics:

* Please drop the whole register_custom_provider/get_custom_provider API.
There is no reason other than debugging for a provider to have a name at
all, and if we expect providers to have unique names then that creates a
collision risk for independently-created extensions. AFAICS, it's
sufficient for a function hooked into one of the add-a-path hooks to
include a pointer to a struct-of-function-pointers in the Path object it
returns, and similarly the CustomScan Plan object can contain a pointer
inserted when it's created. I don't object to having a name field in the
function pointer structs for debugging reasons, but I don't want any
lookups being done on it.

* The function-struct pointers should be marked const in the referencing
nodes, to indicate that the core code won't be modifying or copying them.
In practice they'd probably be statically allocated constants in the
extensions anyway.

* The patch does lots of violence to the separation between planner and
executor, starting with the decision to include nodes/relation.h in
executor.h. That will not do at all. I see that you did that because you
wanted to make ExecSupportsMarkRestore take a Path, but we need some other
answer. One slightly grotty answer is to invent two different customscan
Plan types, one that supports mark/restore and one that doesn't, so that
ExecSupportsMarkRestore could still just look at the Plan type tag.
(BTW, path->pathtype is supposed to contain the node tag of the Plan node
that the path would produce. Putting T_CustomPath in it is entirely
tone-deaf.) Another way would be to remove ExecSupportsMarkRestore in
favor of some new function in the planner; but it's good to keep it in
execAmi.c since that has other knowledge of which plan types support
mark/restore.

* More generally, I'm not convinced that exactly one Path type and exactly
one Plan type is going to get us very far. It seems rather ugly to use
the same Plan type for both scan and join nodes, and what will happen if
somebody wants to build a custom Append node, or something else that has
neither zero nor two subplans?

* nodeCustom.h is being completely abused. That should only export the
functions in nodeCustom.c, which are going to be pretty much one-liners
anyway. The right place to put the function pointer struct definitions
is someplace else. I'd be inclined to start by separating the function
pointers into two structs, one for functions needed for a Path and one for
functions needed for a Plan, so that you don't have this problem of having
to import everything the planner knows into an executor header or vice
versa. Most likely you could just put the Path function pointer struct
declaration next to CustomPath in relation.h, and the one for Plans next
to CustomPlan (or the variants thereof) in plannodes.h.

* The set of fields provided in CustomScan seems nonsensical. I'm not
even sure that it should be derived from Scan; that's okay for plan types
that actually are scans of a base relation, but it's confusing overhead
for anything that's say a join, or a custom sort node, or anything like
that. Maybe one argument for multiple plan node types is that one would
be derived from Scan and one directly from Plan.

* More generally, what this definition for CustomScan exposes is that we
have no idea whatever what fields a custom plan node might need. I'm
inclined to think that what we should be assuming is that any custom path
or plan node is really an object of a struct type known only to its
providing extension, whose first field is the CustomPath or CustomPlan
struct known to the core backend. (Think C++ subclassing.) This would
imply that copyfuncs/equalfuncs/outfuncs support would have to be provided
by the extension, which is in principle possible if we add function
pointers for those operations to the struct linked to from the path/plan
object. (Notationally this might be a bit of a pain, since the macros
that we use in the functions in copyfuncs.c etc aren't public. Not sure
if it's worth exposing those somewhere, or if people should just
copy/paste them.) This approach would probably also avoid the need for
the klugy bitmapset representation you propose in patch 3.

* This also implies that create_customscan_plan is completely bogus.
A custom plan provider *must* provide a callback function for that,
because only it will know how big a node to palloc. There are far too
many assumptions in create_customscan_plan anyway; I think there is
probably nothing at all in that function that shouldn't be getting done
by the custom provider instead.

* Likewise, there is way too much hard-wired stuff in explain.c. It
should not be optional whether a custom plan provider provides an explain
support function, and that function should be doing pretty much everything
involved in printing the node.

* I don't believe in the hard-wired implementation in setrefs.c either.

* Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).
That's at best badly thought out, and at worst a source of new bugs.
Under what circumstances is a custom plan node going to contain any Vars
that don't reduce to either scan or input-plan variables? None, because
that would imply that it was doing something unrelated to the requested
query.

* The API for add_join_path_hook seems overcomplex, as well as too full
of implementation details that should remain private to joinpath.c.
I particularly object to passing the mergeclause lists, which seem
unlikely to be of interest for non-mergejoin plan types anyway.
More generally, it seems likely that this hook is at the wrong level of
abstraction; it forces the hook function to concern itself with a lot of
stuff about join legality and parameterization (which I note your patch3
code fails to do; but that's not an optional thing).

* After a bit of reflection I think the best thing to do might be to ditch
add_join_path_hook for 9.4, on these grounds:
1. You've got enough to do to get the rest of the patch committable.
2. Like Stephen, I feel that the proposed usage as a substitute for
FDW-based foreign join planning is not the direction we want to travel.
3. Without that, the use case for new join nodes seems pretty marginal,
as opposed to say alternative sort-node implementations (a case not
supported by this patch, except to the extent that you could use them in
explicit-sort mergejoins if you duplicated large parts of joinpath.c).

* Getting nitpicky ... what is the rationale for the doubled underscore
in the CUSTOM__ flag names? That's just a typo waiting to happen IMO.

* Why is there more than one call site for add_scan_path_hook? I don't
see the need for the calling macro with the randomly inconsistent name,
either.

* The test arrangement for contrib/ctidscan is needlessly complex, and
testing it in the core tests is a bogus idea anyway. Why not just
let it contain its own test script like most other contrib modules?

That's all I've got for now. I've not really looked at the extension code
in either patch2 or patch3, just at the changes in the core code.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-03-04 19:39:55 Re: ALTER TABLE lock strength reduction patch is unsafe
Previous Message Stephen Frost 2014-03-04 19:34:28 Re: Custom Scan APIs (Re: Custom Plan node)