Re: [v9.5] Custom Plan API

Lists: pgsql-hackers
From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, "Jim Mlodgenski" <jimmy76(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: [v9.5] Custom Plan API
Date: 2014-09-04 23:57:07
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8FD6E29@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On Sun, Aug 31, 2014 at 12:54 AM, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> wrote:
> > 2014-08-29 13:33 GMT-04:00 Robert Haas <robertmhaas(at)gmail(dot)com>:
> >> Comments:
> >>
> >> 1. There seems to be no reason for custom plan nodes to have
> >> MultiExec support; I think this as an area where extensibility is
> >> extremely unlikely to work out. The MultiExec mechanism is really
> >> only viable between closely-cooperating nodes, like Hash and
> >> HashJoin, or BitmapIndexScan, BitmapAnd, BitmapOr, and
> >> BitmapHeapScan; and arguably those things could have been written as
> a single, more complex node.
> >> Are we really going to want to support a custom plan that can
> >> substitute for a Hash or BitmapAnd node? I really doubt that's very
> >> useful.
> >>
> > This intends to allows a particular custom-scan provider to exchange
> > its internal data when multiple custom-scan node is stacked.
> > So, it can be considered a facility to implement closely-cooperating
> > nodes; both of them are managed by same custom-scan provider.
> > An example is gpu-accelerated version of hash-join that takes
> > underlying custom-scan node that will returns a hash table with gpu
> > preferable data structure, but should not be a part of row-by-row
> interface.
> > I believe it is valuable for some use cases, even though I couldn't
> > find a use-case in ctidscan example.
>
> Color me skeptical. Please remove that part for now, and we can revisit
> it when, and if, a plausible use case emerges.
>
Now, I removed the multi-exec portion from the patch set.

Existence of this interface affects to the query execution cost so much,
so I want to revisit it as soon as possible. Also see the EXPLAIN output
on the tail of this message.

> > It came from the discussion I had long time before during patch
> > reviewing of postgres_fdw. I suggested to use static table of
> > FdwRoutine but I got a point that says some compiler raise
> > error/warning to put function pointers on static initialization.
> > I usually use GCC only, so I'm not sure whether this argue is right or
> > not, even though the postgres_fdw_handler() allocates FdwRoutine using
> > palloc() then put function pointers for each.
>
> That's odd, because aset.c has used static initializers since forever, and
> I'm sure someone would have complained by now if there were a problem with
> that usage.
>
I reminded the discussion at that time. The GCC specific manner was not
static initialization itself, it was static initialization with field name.
Like:
static CustomPathMethods ctidscan_path_methods = {
.CustomName = "ctidscan",
.CreateCustomPlan = CreateCtidScanPlan,
.TextOutCustomPath = TextOutCtidScanPath,
};

Regarding to the attached three patches:
[1] custom-path and hook
It adds register_custom_path_provider() interface for registration of
custom-path entrypoint. Callbacks are invoked on set_plain_rel_pathlist
to offer alternative scan path on regular relations.
I may need to explain the terms in use. I calls the path-node custom-path
that is the previous step of population of plan-node (like custom-scan
and potentially custom-join and so on). The node object created by
CreateCustomPlan() is called custom-plan because it is abstraction for
all the potential custom-xxx node; custom-scan is the first of all.

[2] custom-scan node
It adds custom-scan node support. The custom-scan node is expected to
generate contents of a particular relation or sub-plan according to its
custom-logic.
Custom-scan provider needs to implement callbacks of CustomScanMethods
and CustomExecMethods. Once a custom-scan node is populated from
custom-path node, the backend calls back these methods in the planning
and execution stage.

[3] contrib/ctidscan
It adds a logic to scan a base relation if WHERE clause contains
inequality expression around ctid system column; that allows to skip
blocks which will be unread obviously.

During the refactoring, I noticed a few interface is omissible.
The backend can know which relation is the target of custom-scan node
being appeared in the plan-tree if its scanrelid > 0. So, I thought
ExplainCustomPlanTargetRel() and ExplainCustomPreScanNode() are
omissible, then removed from the patch.

Please check the attached ones.

--------
Also, regarding to the use-case of multi-exec interface.
Below is an EXPLAIN output of PG-Strom. It shows the custom GpuHashJoin has
two sub-plans; GpuScan and MultiHash.
GpuHashJoin is stacked on the GpuScan. It is a case when these nodes utilize
multi-exec interface for more efficient data exchange between the nodes.
GpuScan already keeps a data structure that is suitable to send to/recv from
GPU devices and constructed on the memory segment being DMA available.
If we have to form a tuple, pass it via row-by-row interface, then deform it,
it will become a major performance degradation in this use case.

postgres=# explain select * from t10 natural join t8 natural join t9 where x < 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Custom (GpuHashJoin) (cost=10979.56..90064.15 rows=333 width=49)
pseudo scan tlist: 1:(t10.bid), 3:(t10.aid), 4:<t10.x>, 2:<t8.data>, 5:[t8.aid], 6:[t9.bid]
hash clause 1: ((t8.aid = t10.aid) AND (t9.bid = t10.bid))
-> Custom (GpuScan) on t10 (cost=10000.00..88831.26 rows=3333327 width=16)
Host References: aid, bid, x
Device References: x
Device Filter: (x < 10::double precision)
-> Custom (MultiHash) (cost=464.56..464.56 rows=1000 width=41)
hash keys: aid, bid
-> Hash Join (cost=60.06..464.56 rows=1000 width=41)
Hash Cond: (t9.data = t8.data)
-> Index Scan using t9_pkey on t9 (cost=0.29..357.29 rows=10000 width=37)
-> Hash (cost=47.27..47.27 rows=1000 width=37)
-> Index Scan using t8_pkey on t8 (cost=0.28..47.27 rows=1000 width=37)
Planning time: 0.810 ms
(15 rows)

--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Attachment Content-Type Size
pgsql-v9.5-custom-scan.part-3.v8.patch application/octet-stream 45.0 KB
pgsql-v9.5-custom-scan.part-2.v8.patch application/octet-stream 48.3 KB
pgsql-v9.5-custom-scan.part-1.v8.patch application/octet-stream 9.1 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Jim Mlodgenski <jimmy76(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: [v9.5] Custom Plan API
Date: 2014-09-08 16:39:59
Message-ID: CA+Tgmoa=tH5a6hpxEKFJzEkLJhZRYp5k-cTnLDXxfwUkm=6EpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 4, 2014 at 7:57 PM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> Regarding to the attached three patches:
> [1] custom-path and hook
> It adds register_custom_path_provider() interface for registration of
> custom-path entrypoint. Callbacks are invoked on set_plain_rel_pathlist
> to offer alternative scan path on regular relations.
> I may need to explain the terms in use. I calls the path-node custom-path
> that is the previous step of population of plan-node (like custom-scan
> and potentially custom-join and so on). The node object created by
> CreateCustomPlan() is called custom-plan because it is abstraction for
> all the potential custom-xxx node; custom-scan is the first of all.

I don't think it's a good thing that add_custom_path_type is declared
as void (*)(void *) rather than having a real type. I suggest we add
the path-creation callback function to CustomPlanMethods instead, like
this:

void (*CreateCustomScanPath)(PlannerInfo *root, RelOptInfo *baserel,
RangeTblEntry *rte);

Then, register_custom_path_provider() can just take CustomPathMethods
* as an argument; and create_customscan_paths can just walk the list
of CustomPlanMethods objects and call CreateCustomScanPath for each
one where that is non-NULL. This conflates the path generation
mechanism with the type of path getting generated a little bit, but I
don't see any real downside to that. I don't see a reason why you'd
ever want two different providers to offer the same type of
custompath.

Don't the changes to src/backend/optimizer/plan/createplan.c belong in patch #2?

> [2] custom-scan node
> It adds custom-scan node support. The custom-scan node is expected to
> generate contents of a particular relation or sub-plan according to its
> custom-logic.
> Custom-scan provider needs to implement callbacks of CustomScanMethods
> and CustomExecMethods. Once a custom-scan node is populated from
> custom-path node, the backend calls back these methods in the planning
> and execution stage.

It looks to me like this patch is full of holdovers from its earlier
life as a more-generic CustomPlan node. In particular, it contains
numerous defenses against the case where scanrelid != 0. These are
confusingly written as scanrelid > 0, but I think really they're just
bogus altogether: if this is specifically a CustomScan, not a
CustomPlan, then the relid should always be filled in. Please
consider what can be simplified here.

The comment in _copyCustomScan looks bogus to me. I think we should
*require* a static method table.

In create_custom_plan, you if (IsA(custom_plan, CustomScan)) { lots of
stuff; } else elog(ERROR, ...). I think it would be clearer to write
if (!IsA(custom_plan, CustomScan)) elog(ERROR, ...); lots of stuff;

> Also, regarding to the use-case of multi-exec interface.
> Below is an EXPLAIN output of PG-Strom. It shows the custom GpuHashJoin has
> two sub-plans; GpuScan and MultiHash.
> GpuHashJoin is stacked on the GpuScan. It is a case when these nodes utilize
> multi-exec interface for more efficient data exchange between the nodes.
> GpuScan already keeps a data structure that is suitable to send to/recv from
> GPU devices and constructed on the memory segment being DMA available.
> If we have to form a tuple, pass it via row-by-row interface, then deform it,
> it will become a major performance degradation in this use case.
>
> postgres=# explain select * from t10 natural join t8 natural join t9 where x < 10;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------
> Custom (GpuHashJoin) (cost=10979.56..90064.15 rows=333 width=49)
> pseudo scan tlist: 1:(t10.bid), 3:(t10.aid), 4:<t10.x>, 2:<t8.data>, 5:[t8.aid], 6:[t9.bid]
> hash clause 1: ((t8.aid = t10.aid) AND (t9.bid = t10.bid))
> -> Custom (GpuScan) on t10 (cost=10000.00..88831.26 rows=3333327 width=16)
> Host References: aid, bid, x
> Device References: x
> Device Filter: (x < 10::double precision)
> -> Custom (MultiHash) (cost=464.56..464.56 rows=1000 width=41)
> hash keys: aid, bid
> -> Hash Join (cost=60.06..464.56 rows=1000 width=41)
> Hash Cond: (t9.data = t8.data)
> -> Index Scan using t9_pkey on t9 (cost=0.29..357.29 rows=10000 width=37)
> -> Hash (cost=47.27..47.27 rows=1000 width=37)
> -> Index Scan using t8_pkey on t8 (cost=0.28..47.27 rows=1000 width=37)
> Planning time: 0.810 ms
> (15 rows)

Why can't the Custom(GpuHashJoin) node build the hash table internally
instead of using a separate node?

Also, for this patch we are only considering custom scan. Custom join
is another patch. We don't need to provide infrastructure for that
patch in this one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company