Re: Custom Scan APIs (Re: Custom Plan node)

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: Custom Scan APIs (Re: Custom Plan node)
Date: 2013-11-11 10:43:07
Message-ID: CADyhKSU6W-r+1j+cs5TOJ47tYYvonGGVaHRSg1KbV=nzuULqTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I tried to write up a wikipage to introduce how custom-scan works.

https://wiki.postgresql.org/wiki/CustomScanAPI

Any comments please.

2013/11/6 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
> The attached patches provide a feature to implement custom scan node
> that allows extension to replace a part of plan tree with its own code
> instead of the built-in logic.
> In addition to the previous proposition, it enables us to integrate custom
> scan as a part of candidate paths to be chosen by optimizer.
> Here is two patches. The first one (pgsql-v9.4-custom-scan-apis) offers
> a set of API stuff and a simple demonstration module that implement
> regular table scan using inequality operator on ctid system column.
> The second one (pgsql-v9.4-custom-scan-remote-join) enhances
> postgres_fdw to support remote join capability.
>
> Below is an example to show how does custom-scan work.
>
> We usually run sequential scan even if clause has inequality operator
> that references ctid system column.
>
> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
> QUERY PLAN
> --------------------------------------------------------
> Seq Scan on t1 (cost=0.00..209.00 rows=3333 width=43)
> Filter: (ctid > '(10,0)'::tid)
> (2 rows)
>
> An extension that performs as custom-scan provider suggests
> an alternative path, and its cost was less than sequential scan,
> thus optimizer choose it.
>
> postgres=# LOAD 'ctidscan';
> LOAD
> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
> QUERY PLAN
> ----------------------------------------------------------------------
> Custom Scan (ctidscan) on t1 (cost=0.00..100.00 rows=3333 width=43)
> Filter: (ctid > '(10,0)'::tid)
> (2 rows)
>
> Of course, more cost effective plan will win if exists.
>
> postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
> QUERY PLAN
> -------------------------------------------------------------------
> Index Scan using t1_pkey on t1 (cost=0.29..8.30 rows=1 width=43)
> Index Cond: (a = 200)
> Filter: (ctid > '(10,0)'::tid)
> (3 rows)
>
> One other worthwhile example is remote-join enhancement on the
> postgres_fdw as follows. Both of ft1 and ft2 are foreign table being
> managed by same foreign server.
>
> postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
> WHERE f_leak(b) AND y
> like '%aaa%';
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------
> Custom Scan (postgres-fdw) (cost=100.00..100.01 rows=0 width=72)
> Output: a, b, x, y
> Filter: f_leak(b)
> Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN
> public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
> (4 rows)
>
> ---------------------------
> How does it works
> ---------------------------
> This patch adds two hooks (for base and join relations) around allpaths.c
> and joinpaths.c. It allows extensions to add alternative paths to handle
> scanning on the base relation or join of two relations.
>
> Its callback routine can add CustomPath using add_path() to inform
> optimizer this alternative scan path. Every custom-scan provider is
> identified by its name being registered preliminary using the following
> function.
>
> void register_custom_provider(const CustomProvider *provider);
>
> CustomProvider is a set of name string and function pointers of callbacks.
>
> Once CustomPath got chosen, create_scan_plan() construct a custom-
> scan plan and calls back extension to initialize the node.
> Rest of portions are similar to foreign scan, however, some of detailed
> portions are different. For example, foreign scan is assumed to return
> a tuple being formed according to table definition. On the other hand,
> custom-scan does not have such assumption, so extension needs to
> set tuple-descriptor on the scan tuple slot of ScanState structure by
> itself.
>
> In case of join, custom-scan performs as like a regular scan but it
> returns tuples being already joined on underlying relations.
> The patched postgres_fdw utilizes a hook at joinpaths.c to run
> remote join.
>
> ------------
> Issues
> ------------
> I'm not 100% certain whether arguments of add_join_path_hook is
> reasonable. I guess the first 7 arguments are minimum necessity.
> The mergeclause_list and semifactors might be useful if someone
> tries to implement its own mergejoin or semijoin. Also, I'm not
> good at usage of path parameterization, but the last two arguments
> are related to. Where is the best code to learn about its usage?
>
> +/* Hook for plugins to add custom join path, in addition to default ones */
> +typedef void (*add_join_path_hook_type)(PlannerInfo *root,
> + RelOptInfo *joinrel,
> + RelOptInfo *outerrel,
> + RelOptInfo *innerrel,
> + JoinType jointype,
> + SpecialJoinInfo *sjinfo,
> + List *restrictlist,
> + List *mergeclause_list,
> + SemiAntiJoinFactors *semifactors,
> + Relids param_source_rels,
> + Relids extra_lateral_rels);
> +extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
>
>
> When we replace a join by a custom scan, where is the best target
> for Var node that referenced relations under the join?
> Usually, Var->varno is given as rtindex of tables being joined, then,
> it shall be replaced to OUTER_VAR or INNER_VAR at set_join_references().
> It eventually determines the slot to be fetched on ExecEvalScalarVar().
> On the other hand, we want Var-node to reference scan-tuple-slot
> neither outer-slot nor inner-slot, if we replaced a join.
> I tried to add a new CUSTOM_VAR that references scan-tuple-slot.
> Probably, it is a straightforward way to run remote join as like a scan,
> but I'm not certain whether it is the best way.
>
>
> I was concerned about FDW callback of postgres_fdw is designed to
> take ForeignState argument. Because of this, remote join code did
> not available to call these routines, even though most of custom-join
> portions are similar.
> So, I'd like to rework postgres_fdw first to put a common routine that
> can be called from FDW portion and remote join portions.
> However, I thought it makes reviewing hard due to the large scale of
> changeset. So, I'd like to have a code reworking first.
>
>
> ----------------
> Jobs to do
> ----------------
> * SGML documentation like fdwhandler.sgml is still under construction.
> * Probably, a wikipage may help people to understand it well.
> * Postgres_fdw needs reworking to share common code for both of
> FDW and remote join portions.
>
> Thanks,
>
> 2013/10/5 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
>> 2013/10/3 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>> Well, there were a lot of problems with your demonstration, which have
>>> already been pointed out upthread. I'm skeptical about the idea of
>>> simply replacing planner nodes wholesale, and Tom is outright opposed.
>>> I think you'll do better to focus on a narrower case - I'd suggest
>>> custom scan nodes - and leave the rest as a project for another time.
>>>
>> Thanks, it makes me clear what we should target on v9.4 development.
>> Towards the next commitfest, I'm planning to develop the following
>> features:
>> * CustomScan node that can run custom code instead of built-in
>> scan nodes.
>> * Join-pushdown of postgres_fdw using the hook to be located on
>> the add_paths_to_joinrel(), for demonstration purpose.
>> * Something new way to scan a relation; probably, your suggested
>> ctid scan with less or bigger qualifier is a good example, also for
>> demonstration purpose.
>>
>> Probably, above set of jobs will be the first chunk of this feature.
>> Then, let's do other stuff like Append, Sort, Aggregate and so on
>> later. It seems to me a reasonable strategy.
>>
>
> --
> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rohit Goyal 2013-11-11 11:06:03 Test of Algorithm || Indexing Scheme
Previous Message Kohei KaiGai 2013-11-11 10:37:19 Re: Exempting superuser from row-security isn't enough. Run predicates as DEFINER?