Re: [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: Chris Travers <chris(dot)travers(at)adjust(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] What would be difficult to make data models pluggable for making PostgreSQL a multi-model database?
Date: 2017-08-21 01:37:15
Message-ID: CAMsr+YGmqcDD6JhP0OO-vMtJ-EftnNn4d8EXTV1Dc1BOmM9apQ@mail.gmail.com
Lists: pgsql-hackers

On 20 August 2017 at 10:10, MauMau <maumau307(at)gmail(dot)com> wrote:

> From: Chris Travers
> > Why cannot you do all this in a language handler and treat as a user
> defined function?
> > ...
> > If you have a language handler for cypher, why do you need in_region
> or cast_region? Why not just have a graph_search() function which
> takes in a cypher query and returns a set of records?
>
> The language handler is for *stored* functions. The user-defined
> function (UDF) doesn't participate in the planning of the outer
> (top-level) query. And they both assume that they are executed in SQL
> commands.
>

While I generally agree with Tom on this, I think there are some useful
ideas to examine.

Allow a UDF to emit multiple result sets that can then be incorporated into
an outer query. IMO it'd be fine to support this by returning a wide row of
REFCURSORs and then allowing FETCH to be used in a subquery.

The UDF would need to be invoked before the rest of the query was planned,
so the planner could learn the structure of the cursors' result sets.

Or some higher level concept could be introduced, like it was for
aggregates and window functions, where one call can be made to get the
output structure and some stats estimates, and another call (or series) to
get the rows.
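
As a rough illustration of that two-call shape, here is a minimal Python
sketch. It is only a model of the idea, not an existing PostgreSQL API:
the names RelationShape, GraphSearch, describe and execute are all
hypothetical, and the "graph" is just a list of edges.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class RelationShape:
    """What a planner would want to know before execution (hypothetical)."""
    columns: List[Tuple[str, str]]   # (column name, type name)
    estimated_rows: int              # rough row-count estimate


class GraphSearch:
    """Toy set-returning function split into a planning call and a row call."""

    def __init__(self, edges):
        self.edges = list(edges)

    def describe(self) -> RelationShape:
        # Call 1: report output structure and a stats estimate, returning no rows.
        return RelationShape(
            columns=[("src", "text"), ("dst", "text")],
            estimated_rows=len(self.edges),
        )

    def execute(self) -> Iterator[Tuple[str, str]]:
        # Call 2 (or a series of calls): produce the rows themselves.
        yield from self.edges


fn = GraphSearch([("a", "b"), ("b", "c")])
shape = fn.describe()       # what the planner would consume
rows = list(fn.execute())   # what the executor would consume
```

The point of the split is that the outer query can be planned from
describe() alone, before any rows exist.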

I guess you're going two steps further than that, seeking a more integrated
model where the plugin can generate paths and participate more actively in
planning, and where you can optionally make it the default so you don't
need a SQL function call to access it.

If you want to pursue that, I suggest you start small and go step-by-step.
Things like:

* Allow FETCH ... <refcursor> to be used in subqueries with explicitly
listed output relation structure, like calling a function that returns
record

* Allow pre-execution of parts of a query that produce refcursors used in
subqueries, then finish planning the outer query once the cursor output
types are known

* A construct that can inject arbitrary virtual relations into the
namespace at parse-time, so you don't have to do the dance with refcursors.
(Like WITH).

* A construct that can supply stats estimates for the virtual relations

So try to build it in stages.

You could also potentially use the FDW interface.

> I want the data models to meet these:
>
> 1) The query language can be used as a top-level session language.
> For example, if an app specifies "region=cypher_graph" at database
> connection, it can use the database as a graph database and submit
> Cypher queries without embedding them in SQL.
>

Why? What does this offer over the app or client tool wrapping its queries
in "SELECT cypher_graph('....')" ?
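
For concreteness, the client-side wrapping could be as small as the
following Python sketch. It assumes a DB-API driver that uses %s
placeholders (psycopg2 does), and cypher_graph is the hypothetical
function from the example above, not something that exists today.

```python
def wrap_cypher(cypher: str, func: str = "cypher_graph"):
    """Return (sql, params) that hands a Cypher query to a SQL function.

    The function name is hypothetical. The Cypher text is sent as a bound
    parameter, so the client does not have to quote it.
    """
    if not func.isidentifier():
        raise ValueError("function name must be a plain identifier")
    return f"SELECT {func}(%s)", (cypher,)


sql, params = wrap_cypher("MATCH (n:Person) RETURN n.name")
# A client would then run something like: cursor.execute(sql, params)
```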

> 2) When a query contains multiple query fragments of different data
> models, all those fragments are parsed and planned before execution.
> The planner comes up with the best plan, crossing the data model
> boundary. To take the query example in my first mail, which joins a
> relational table and the result of a graph query. The relational
> planner considers how to scan the table, the graph planner considers
> how to search the graph, and the relational planner considers how to
> join the two fragments.
>

Here, what you need is a way to define a set of virtual relations on a
per-query basis, where you can get stats estimates for the relations during
planning.
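
A toy Python model of that, to show the shape of the idea rather than how
the planner would really do it: each query gets its own namespace of
relations, each relation carries a row estimate, and a planning decision
reads those estimates. All names here (VirtualRelation, QueryNamespace,
pick_hash_build_side) are hypothetical, and "orders" stands in for an
ordinary table only to give the comparison something to work with.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VirtualRelation:
    name: str
    columns: List[str]
    estimated_rows: int   # supplied by whoever registers the relation


class QueryNamespace:
    """Per-query set of relations visible to the (toy) planner."""

    def __init__(self):
        self._rels: Dict[str, VirtualRelation] = {}

    def register(self, rel: VirtualRelation) -> None:
        if rel.name in self._rels:
            raise ValueError(f"relation {rel.name!r} already defined")
        self._rels[rel.name] = rel

    def estimate(self, name: str) -> int:
        return self._rels[name].estimated_rows


def pick_hash_build_side(ns: QueryNamespace, left: str, right: str) -> str:
    # Toy planning decision: build the hash table on the smaller input.
    return left if ns.estimate(left) <= ns.estimate(right) else right


ns = QueryNamespace()
ns.register(VirtualRelation("orders", ["id", "customer"], 1_000_000))
ns.register(VirtualRelation("graph_result", ["customer"], 500))
build_side = pick_hash_build_side(ns, "orders", "graph_result")
```

Without the estimate for graph_result the planner has nothing to base
that choice on, which is why the stats hook matters as much as the
relation itself.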

I guess what you're imagining is something more sophisticated where you're
generating some kind of sub-plan candidates, like the path model, with some
kind of interaction so the sub-planner for the other model could know to
generate a different sub-plan based on the context of the outer plan. I
have no idea how that could work. But I think you have about zero chance of
achieving what you want by going straight there. Focus on small incremental
steps, preferably ones you can find other uses for too.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
