Re: Custom Scan APIs (Re: Custom Plan node)

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, "Jim Mlodgenski" <jimmy76(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2014-02-26 06:50:32
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8F7FA71@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> * Kouhei Kaigai (kaigai(at)ak(dot)jp(dot)nec(dot)com) wrote:
> > > Instead of a custom node, it might be a better idea to improve the
> > > FDW infrastructure to push down joins. For starters, is it possible
> > > for the custom scan node hooks to create a ForeignScan node? In
> > > general, I think it might be better for the custom scan hooks to
> > > create existing nodes if they serve the purpose.
> > >
> > It does not work well because the existing FDW infrastructure is
> > designed to operate on foreign tables, not regular tables. We would
> > probably need to revise many of the assumptions in the surrounding
> > code if we redefined the purpose of the FDW infrastructure. For
> > example, ForeignScan is expected to return tuples according to a
> > TupleDesc that exactly matches the table definition.
> > That does not fit the requirement if we replace a join node with a
> > ForeignScan, because the TupleDesc of the joined relations is not
> > predefined.
>
> I'm not following this logic at all- how are you defining "foreign" from
> "regular"? Certainly, in-memory-only tables which are sitting out in some
> non-persistent GPU memory aren't "regular" by any PG definition.
> Perhaps you can't make ForeignScan suddenly work as a join-node replacement,
> but I've not seen where anyone has proposed that (directly- I've implied
> it on occasion where a remote view can be used, but that's not the same
> thing as having proper push-down support for joins).
>
By "regular" I mean ordinary tables. Even though a custom implementation
may reference a self-managed in-memory cache instead of the raw heap, the
table referenced in the user's query is still an ordinary table.
In the past, Hanada-san proposed an FDW enhancement to support remote
joins, but it was eventually rejected.

> > I'd like to treat these features as designed for separate purposes.
>
> My previous complaint about this patch set has been precisely that each
> piece seems to be custom-built and every patch needs more and more backend
> changes. If every time someone wants to do something with this CustomScan
> API, they need changes made to the backend code, then it's not a generally
> useful external API. We really don't want to define such an external API
> as then we have to deal with backwards compatibility, particularly when
> it's all specialized to specific use cases which are all different.
>
The changes to the backend are just for convenience. We could implement
the functions that translate a Bitmapset to and from cstring form inside
postgres_fdw, but does it make sense to maintain them separately there?
I thought these functions would be useful to have in the backend for
common use, but they are not fundamental functionality that the
custom-scan interface lacks.
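
As a rough illustration of the kind of Bitmapset-to-cstring translation
meant here, the following is a standalone sketch, not the code from the
patch. It assumes a toy set limited to members 0..63 in place of the
backend's Bitmapset, and a "(b 1 3 5)" text form modeled on how node
output prints a Bitmapset. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Bitmapset: members 0..63 held in one word. */
typedef uint64_t toy_bitmapset;

/*
 * Write the set as "(b 1 3 5)" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
int
toy_bms_to_cstring(toy_bitmapset set, char *buf, size_t buflen)
{
    size_t      used;
    int         n;

    n = snprintf(buf, buflen, "(b");
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    used = (size_t) n;

    for (int i = 0; i < 64; i++)
    {
        if ((set & (UINT64_C(1) << i)) == 0)
            continue;           /* i is not a member */
        n = snprintf(buf + used, buflen - used, " %d", i);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }

    n = snprintf(buf + used, buflen - used, ")");
    if (n < 0 || (size_t) n >= buflen - used)
        return -1;
    return 0;
}

/*
 * Parse "(b 1 3 5)" back into a set.
 * Returns 0 on success, -1 on malformed input or an out-of-range member.
 */
int
toy_bms_from_cstring(const char *str, toy_bitmapset *result)
{
    toy_bitmapset set = 0;
    const char *p = str;

    assert(str != NULL && result != NULL);

    if (strncmp(p, "(b", 2) != 0)
        return -1;
    p += 2;

    /* Each member is introduced by a single space. */
    while (*p == ' ')
    {
        char       *end;
        long        member;

        p++;
        member = strtol(p, &end, 10);
        if (end == p || member < 0 || member > 63)
            return -1;
        set |= UINT64_C(1) << member;
        p = end;
    }

    if (strcmp(p, ")") != 0)
        return -1;
    *result = set;
    return 0;
}
```

The point of the sketch is only that such helpers are small and generic,
which is why they could live either in the backend or in a contrib module.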

> > FDW is designed to mediate between an external data source and the
> > internal heap representation according to the foreign table
> > definition. In other words, its role is to generate the contents of
> > a predefined database object on the fly.
>
> There's certainly nothing in the FDW API which requires that the remote
> side have an internal heap representation, as evidenced by the various FDWs
> which already exist and certainly are not any kind of 'normal'
> heap. Every query against the foreign relation goes through the FDW API
> and can end up returning whatever the FDW author decides is appropriate
> to return at that time, as long as it matches the tuple description- which
> is absolutely necessary for any kind of sanity, imv.
>
Yes. That is my understanding of the role of an FDW driver.

> > On the other hand, custom-scan is designed to implement alternative
> > ways to scan / join relations in addition to the methods supported by
> > built-in features.
>
> I can see the usefulness in being able to push down aggregates or other
> function-type calls to the remote side of an FDW and would love to see work
> done along those lines, along with the ability to push down joins to remote
> systems- but I'm not convinced that the claimed flexibility with the
> CustomScan API is there, given the need to continue modifying the backend
> code for each use-case, nor that there are particularly new and inventive
> ways of saying "find me all the cases where set X overlaps with set Y".
> I'm certainly open to the idea that we could have an FDW API which allows
> us to ask exactly that question and let the remote side cost it out and
> give us an answer for a pair of relations but that isn't what this is. Note
> also that in any kind of aggregation push-down we must be sure that the
> function is well-defined and that the FDW is on the hook to ensure that
> the returned data is the same as if we ran the same aggregate function locally,
> otherwise the results of a query might differ based on if the aggregate
> was fired locally or remotely (which could be influenced by costing- eg:
> the size of the relation or its statistics).
>
I can also see the usefulness of pushing joins or aggregation down to the
remote side when foreign tables are referenced. In a similar way, it is
also useful to push these CPU-intensive operations to co-processors when
regular tables are referenced.
As I mentioned above, the backend changes in the part-2/-3 patches are
minor, and I thought they should not be implemented locally in a contrib
module.
Regarding the condition under which we can run remote aggregation, you
are right. Just as the current postgres_fdw pushes qualifiers down to the
remote side, we need to ensure that the remote aggregate definition is
identical to the local one.

> > I'm motivated to implement a GPU acceleration feature that works
> > transparently for applications. Thus, it has to work on regular
> > tables, because most applications store data in regular tables, not
> > foreign ones.
>
> You want to persist that data in the GPU across multiple calls though, which
> makes it unlike any kind of regular PG table and much more like some foreign
> table. Perhaps the data is initially loaded from a local table and then
> updated on the GPU card in some way when the 'real' table is updated, but
> neither of those makes it a "regular" PG table.
>
No. What I want to implement is this: read the regular table, transfer its
contents into the GPU's local memory for calculation, then receive the
calculation result. The in-memory cache (which I'm also working on) is
supplemental, because disk access is much slower and a row-oriented data
structure is not suitable for SIMD-style instructions.
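
To illustrate the row-oriented versus column-oriented point, here is a
minimal standalone sketch. It is not PG-Strom code. The toy_row and
toy_columns layouts and both function names are hypothetical, and plain
arrays stand in for both the heap and the cache. Rows are copied once
into per-attribute arrays, after which a calculation walks contiguous
values of one attribute at a time, which is the access pattern that
SIMD-style hardware tends to prefer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row layout: one struct per tuple, as a row store holds it. */
typedef struct
{
    int         id;
    float       x;
    float       y;
} toy_row;

/* Hypothetical columnar layout: one contiguous array per attribute. */
typedef struct
{
    int        *id;
    float      *x;
    float      *y;
    size_t      nrows;
} toy_columns;

/*
 * Copy nrows rows into caller-allocated column arrays.
 * Each array in cols must have room for at least nrows elements.
 */
void
toy_rows_to_columns(const toy_row *rows, size_t nrows, toy_columns *cols)
{
    assert(rows != NULL && cols != NULL);

    for (size_t i = 0; i < nrows; i++)
    {
        cols->id[i] = rows[i].id;
        cols->x[i] = rows[i].x;
        cols->y[i] = rows[i].y;
    }
    cols->nrows = nrows;
}

/*
 * Sum of x * y over all rows, reading only the two columns involved.
 * The id column is never touched, unlike in the row layout where it
 * would be interleaved with x and y in memory.
 */
float
toy_sum_xy(const toy_columns *cols)
{
    float       sum = 0.0f;

    assert(cols != NULL);

    for (size_t i = 0; i < cols->nrows; i++)
        sum += cols->x[i] * cols->y[i];
    return sum;
}
```

In this picture the conversion step corresponds to loading the cache, and
toy_sum_xy corresponds to the calculation that would be offloaded.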

> > > Since a custom node is an open implementation, it will be important
> > > to pass as much information down to the hooks as possible;
> > > otherwise the hooks will be constrained. Since the function
> > > signatures within the planner and optimizer will change from time
> > > to time, the custom node hook signatures will need to change from
> > > time to time as well. That might turn out to be a maintenance
> > > overhead.
>
> It's more than "from time-to-time", it was "for each use case in the given
> patch set asking for this feature", which is why I'm pushing back on it.
>
My patch set didn't change the interface itself. All it added were
(probably) useful utility routines placed in the backend rather than in
contrib.

> > Yes, you are also right. But it also creates maintenance overhead if
> > a hook has many arguments that nobody uses.
>
> I can agree with this- there should be a sensible API if we're going to
> do this.
>
> > It probably makes sense to sort the arguments into those that cannot
> > be reproduced from other information, those that can be reproduced
> > but only through complicated steps, and those that can be reproduced
> > easily.
>
> This really strikes me as the wrong approach for an FDW join-pushdown API,
> which should be geared around giving the remote side an opportunity on a
> case-by-case basis to cost out joins using whatever methods it has available
> to implement them. I've outlined above the reasons I don't agree with just
> making the entire planner/optimizer pluggable.
>
I'm also inclined to have arguments that will provide enough information
for extensions to determine the best path for them.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
