Re: WIP: Join push-down for foreign tables

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Join push-down for foreign tables
Date: 2011-10-03 08:07:04
Message-ID: CADyhKSVX=bivbzF0VT9KSwqe=i68tirh+hUZ921XTN770SvUzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hanada-san,

I applied your patch and run a few test cases. while this test, I
noticed a few points.

At first, I tried to use file_fdw, however, it was crashed of course.
It seems to me this logic should be modified to confirm whether the target FDW
support join push down, or not.

+ if (enable_foreignjoin &&
+ joinrel->serverid != InvalidOid &&
+ (IsA(outerpath, ForeignPath) || IsA(outerpath, ForeignJoinPath)) &&
+ (IsA(inner_cheapest_total, ForeignPath) ||
+ IsA(inner_cheapest_total, ForeignJoinPath)))
+
+ {
+ ForeignJoinPath *path;
+ path = create_foreignjoin_path(root,
+ joinrel,
+ jointype,
+ sjinfo,
+ outerpath,
+ inner_cheapest_total,
+ restrictlist,
+ merge_pathkeys);
+ if (path != NULL)
+ add_path(joinrel, (Path *) path);
+ }
+

In my opinion, FdwRoutine should have an additional API to inform the core its
supported features; such as inner-join, outer-join, order-by,
group-by, aggregate
functions, insert, update, delete, etc... in the future version.

Obviously, it is not hard to implement inner/outer-join feature for
pgsql_fdw module,
but it may be a tough work for memcached_fdw module.

> *) SELECT * FROM A JOIN B (...) doesn't work. Specifying columns in
> SELECT clause explicitly like "SELECT A.col1, A.col2, ..." seems to work.
> *) ORDER BY causes error if no column is specified in SELECT clause from
> sort key's table.
>
I doubt these issues are in pgsql_fdw side, not the proposed patch itself.

In the case when the table and column names/types are compatible between
local-side and remote-side, the problem was not reproduced in my environment.
I'd like to suggest you to add a functionality to map remote column names to
the local ones in pgsql_fdw.

See below:

* I set up three foreign tables on the local side.
CREATE FOREIGN TABLE ft_1 (a int, b text) SERVER local_db;
CREATE FOREIGN TABLE ft_2 (x int, y text) SERVER local_db;
CREATE FOREIGN TABLE ft_3 (s int, t text) SERVER local_db;

* I also set up related tables on the remote side.
CREATE TABLE ft_1 (a int, b text);
CREATE TABLE ft_2 (x int, y text);
CREATE TABLE ft_3 (ss int, tt text);
Please note that column name of ft_3 is not compatible

* JOIN ft_1 and ft_2 works collectly.
postgres=# SELECT * FROM ft_1 JOIN ft_2 ON a = x;
a | b | x | y
---+-----+---+-----
2 | bbb | 2 | bbb
3 | ccc | 3 | ccc
4 | ddd | 4 | ddd
(3 rows)

postgres=# EXPLAIN SELECT * FROM ft_1 JOIN ft_2 ON a = x;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Foreign Scan on multiple foreign tables (cost=0.00..0.00 rows=5000 width=72)
Remote SQL: SELECT ft_1.a, ft_1.b, ft_2.x, ft_2.y FROM public.ft_1
ft_1, public.ft_2 ft_2 WHERE (ft_1.a = ft_2.x)
(2 rows)

* JOIN ft_1 and ft_3 does not works. Error message says ft_3.s does
not exist. Probably, it means ft_3.s does not exist "on the remote
host".

postgres=# SELECT * FROM ft_1 JOIN ft_3 ON a = s;
ERROR: could not execute foreign query
DETAIL: ERROR: column ft_3.s does not exist
LINE 1: SELECT ft_1.a, ft_1.b, ft_3.s, ft_3.t FROM public.ft_1 ft_1,...
^

HINT: SELECT ft_1.a, ft_1.b, ft_3.s, ft_3.t FROM public.ft_1 ft_1,
public.ft_3 ft_3 WHERE (ft_1.a = ft_3.s)
postgres=# EXPLAIN SELECT * FROM ft_1 JOIN ft_3 ON a = s;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Foreign Scan on multiple foreign tables (cost=0.00..0.00 rows=5000 width=72)
Remote SQL: SELECT ft_1.a, ft_1.b, ft_3.s, ft_3.t FROM public.ft_1
ft_1, public.ft_3 ft_3 WHERE (ft_1.a = ft_3.s)
(2 rows)

In fact, EXPLAIN shows us the remote SQL tries to reference ft_3.s,
instead of ft_3.ss.

Thanks,

2011年9月14日10:24 Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>:
> Hi all,
>
> I'd like to propose $SUBJECT for further foreign query optimization.
> I've not finished development, but I'd appreciate it if I got someone's
> review on my WIP code and its design.
>
> Changes I made
> ==============
>
> (1) Add foreign server OID to RelOptInfo
> I think it would be nice to know whether a join comes from one foreign
> server or not without digging into child nodes during considering paths
> for a query. So I added serverid field to RelOptInfo, which defaults to
> InvalidOid ,and is set to OID of the server if the node and all of its
> children are from same foreign server. This also avoids looking catalog
> up for foreign table entry to determine FDW routine.
>
> (2) Add new planner node, ForeignJoinPath
> ForeignJoinPath derives JoinPath, like other Join nodes, and holds
> FdwPlan like ForeignPath node.
>
> This node is used to represent a pushed-down join between foreign tables
> and/or another foreign join in early planning phase, for all of
> combination such as table-table, table-join, join-table and join-join
> will be considered. In addition, though they might generate same
> internal (FDW-specific) plan, reversed combination is considered;
> planner generates two ForeignJoinPath for both (A & B) and (B & A).
>
> During creation of this node, planner calls new FDW handler function
> PlanForeignJoin to get a FdwPlan which includes costs and internal plan
> of a foreign join. If a FDW can't (or doesn't want to) handle this
> join, just return NULL is OK, and then planner gives such optimization
> up and considers other usual join methods such as nested loop and hash join.
>
> A subtree which has a ForeignJoin on its top is translated into a
> ForeignScan node during constructing a plan tree. This behavior is
> different from other join path nodes such as NestPath and MergePath,
> because they have child plan nodes correspond to path nodes.
>
> (3) Add EXPALIN support for foreign join (currently just for debug)
> ForeignScan might not be a simple foreign table scan, so
> ExplainScanTarget() can't be used for it. An idea I have is adding
> ExplainForeignScanTarget() to handle ForeignScan separately from other
> scan nodes.
>
> (4) Add new GUC parameter, enable_foreignjoin
> If this was off, planner never generates ForeignJoinPath. In such case,
> foreign tables will be joined with one of NestLoop, MergeJoin and HashJoin.
>
> Known issue
> ===========
>
> I'm sorry but attached patch, join_pushdown_v1.patch, is WIP, so
> currently some kind of query fails. Known failure patterns are:
>
> *) SELECT * FROM A JOIN B (...) doesn't work. Specifying columns in
> SELECT clause explicitly like "SELECT A.col1, A.col2, ..." seems to work.
> *) ORDER BY causes error if no column is specified in SELECT clause from
> sort key's table.
>
> Probably more problems still are there...
>
> PG-wrapper as sample implementation
> ===================================
>
> pgsql_fdw-0.1.0.tar.gz is an WIP implementation of PG-wrapper, which can
> (hopefully) handle both simple foreign table scan and multiple foreign
> joins. You can build it with placing in contrib/, or using pgxs. Note
> that it has some issues such as memory leak of PGresult. I'm planning
> to propose this wrapper as a contrib module, but it would be after
> clearing such issues.
>
> Regards,
> --
> Shigeru Hanada
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2011-10-03 08:21:29 Re: Should we get rid of custom_variable_classes altogether?
Previous Message Simon Riggs 2011-10-03 07:32:04 Re: bug of recovery?