WIP: Hierarchical Queries - stage 1

Lists: pgsql-hackerspgsql-patches
From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Hierarchical Queries--Status
Date: 2006-08-05 15:12:09
Message-ID: 36e682920608050812t6af5825x4a61607db21fa294@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

All,

In the spirit of our previous discussion, I am writing to inform you
that Mark Cave-Ayland and I will be working on this TODO-item
together. We are thinking through a new design (not based on the
current patch) and will post it to -hackers for approval soon.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 2nd Floor | jharris(at)enterprisedb(dot)com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hierarchical Queries--Status
Date: 2006-08-27 01:03:20
Message-ID: 20060827010320.GA30132@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jonah H. Harris wrote:
> All,
>
> In the spirit of our previous discussion, I am writing to inform you
> that Mark Cave-Ayland and I will be working on this TODO-item
> together. We are thinking through a new design (not based on the
> current patch) and will post it to -hackers for approval soon.

Was it posted?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hierarchical Queries--Status
Date: 2006-08-27 01:41:45
Message-ID: 36e682920608261841g1a075effqe6366710197b2713@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 8/26/06, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> Was it posted?

No, the patch has not yet been posted. I'm not sure of where Mark's
at with it at this point in time.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 2nd Floor | jharris(at)enterprisedb(dot)com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hierarchical Queries--Status
Date: 2006-08-27 02:35:01
Message-ID: 20060827023501.GB30132@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jonah H. Harris wrote:
> On 8/26/06, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> >Was it posted?
>
> No, the patch has not yet been posted. I'm not sure of where Mark's
> at with it at this point in time.

Actually I was thinking in the design rather than the code ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hierarchical Queries--Status
Date: 2006-08-27 02:46:30
Message-ID: 36e682920608261946g333e6016n10cfeec92e809824@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 8/26/06, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> Actually I was thinking in the design rather than the code ...

Doh! We hadn't posted the design just yet. Let me write him and see
where he's at and we'll throw something together for the list.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation | fax: 732.331.1301
33 Wood Ave S, 2nd Floor | jharris(at)enterprisedb(dot)com
Iselin, New Jersey 08830 | http://www.enterprisedb.com/


From: Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hierarchical Queries--Status
Date: 2006-09-04 16:15:57
Message-ID: 1157386557.7980.24.camel@mca-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sat, 2006-08-26 at 22:46 -0400, Jonah H. Harris wrote:
> On 8/26/06, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> > Actually I was thinking in the design rather than the code ...
>
> Doh! We hadn't posted the design just yet. Let me write him and see
> where he's at and we'll throw something together for the list.

[Note to Jonah: I've tried sending a similar version of this email to
you a couple of times, but I'm not sure that it's getting through, hence
the post to -hackers in the hope you may be able to pick it up there.]

Hi everyone,

I've had a chance to sit down for a day or so to see how to approach
adding hierarchical queries to PostgreSQL, so I thought I'd post my
initial thoughts on how to proceed. My aim is to refine the list below
based upon feedback from hackers until it gets to the point where I can
start work on it. Here's what I've got so far:

1) Add detection of the WITH clause to the parser.

2) Create a new type of RangeTblEntry to represent each common table
expression specified within the WITH clause where the subquery field
points to the nodes representing the common table expression.

3) Add planner support so that WITH clauses are mapped to a new type of
node that utilises two tuplestores - an output tuplestore and a working
tuplestore. The output tuple store will in effect be the contents of the
table expression while the working tuplestore holds the results of the
last iteration if recursive. Also implement some kind of WithState node
which keeps track of the recursion state.

Having spent some more time today looking at 1) and also at the SQL 2003
spec, it would seem that other databases offer the WITH clause within
subqueries and also as part of a view. I'd be interested to hear
feedback from other developers as to whether it would be possible to
achieve this level of support within PostgreSQL.

Many thanks,

Mark.


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hierarchical Queries--Status
Date: 2006-09-04 20:14:43
Message-ID: 20060904201443.GF16894@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, Sep 04, 2006 at 05:15:57PM +0100, Mark Cave-Ayland wrote:
> 3) Add planner support so that WITH clauses are mapped to a new type of
> node that utilises two tuplestores - an output tuplestore and a working
> tuplestore. The output tuple store will in effect be the contents of the
> table expression while the working tuplestore holds the results of the
> last iteration if recursive. Also implement some kind of WithState node
> which keeps track of the recursion state.

That's basically what I came up with. Basically you have a sort of loop
in the execution plan where tuples that come out are copied into a
tuplestore and run through a particular part of the executor again. The
top-down approach of the executor makes it a bit trickier...

> Having spent some more time today looking at 1) and also at the SQL 2003
> spec, it would seem that other databases offer the WITH clause within
> subqueries and also as part of a view. I'd be interested to hear
> feedback from other developers as to whether it would be possible to
> achieve this level of support within PostgreSQL.

Absolutly possible. The question is how much work :)

Incidently, if you find a way to support common subplans (where a part
of the executor is shared between two executions) that might provide a
way to solve some of the trickier multiple evaluation problems with
rules. Again, it would just be a tuplestore the stored the results for
multiple executions.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>
To: pgsql-patches(at)postgresql(dot)org
Subject: WIP: Hierarchical Queries - stage 1
Date: 2006-09-20 19:43:38
Message-ID: 1158781418.5658.57.camel@mca-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi everyone,

After spending several days reading PostgreSQL source code (and another
couple of days coding), I've managed to come up with some alpha code
that attempts to implement non-recursive WITH common table expressions.
Having got this far, I feel that I need to ask for advice from some more
experienced PostgreSQL hackers and so I post the alpha version here.

The patch attempts to alter the parser grammar, and using code modified
from that used when processing subselects in the FROM clause, attempts
to treat the CTE as if it were presented the same as a FROM subselect.

My main issue at the moment is that the code in transformFromClauseItem
seems a terrible hack, mainly because the grammar returns each string
within the FROM clause as a RangeVar, and transformFromClauseItem
assumes that each RangeVar represents a physical relation. Of course,
this is not the case when referencing a CTE and so the code first checks
to see if an entry has already been created when processing the WITH
clause; if it does then we return NULL to indicate that
transformFromClause should do nothing. Messy, but I wanted to see what
other developers thought before jumping in and rewriting this part of
the code.

Another point to think about is what should a query return if the SELECT
doesn't refer to a CTE? For example:

- WITH foo(a, b) AS (SELECT * FROM pg_class) SELECT * FROM pg_class;

Since I've inserted the CTE as a range table entry of type RTE_SUBQUERY
then the current behaviour is to perform a cross-join between foo and
bar which I'm not sure is the correct behaviour since CTEs seem to be
more like views in this respect.

Finally, the current code fails when supplying an additional alias to a
CTE in the select statement and then trying to refer to it, e.g.

- with myrel(p1) as (select oid from pg_class) select myrel.p1 from
myrel AS foo, pg_class AS bar WHERE myrel.p1 = bar.oid; -- WORKS

- with myrel(p1) as (select oid from pg_class) select myrel.p1 from
myrel AS foo, pg_class AS bar WHERE foo.p1 = bar.oid; -- FAILS

ERROR: missing FROM-clause entry for table "foo" at character 103

So in this case, should foo be accepted as a valid alias for myrel? My
feeling is that I will end up with an RTE_CTE range table entry kind
which borrows from the current subquery behaviour, but it would be very
useful to see where the similarity between the two range table entry
types ends.

TIA,

Mark.

Attachment Content-Type Size
pgsql_with.patch text/x-patch 11.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP: Hierarchical Queries - stage 1
Date: 2006-09-21 00:13:16
Message-ID: 20916.1158797596@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk> writes:
> My main issue at the moment is that the code in transformFromClauseItem
> seems a terrible hack, mainly because the grammar returns each string
> within the FROM clause as a RangeVar, and transformFromClauseItem
> assumes that each RangeVar represents a physical relation. Of course,
> this is not the case when referencing a CTE and so the code first checks
> to see if an entry has already been created when processing the WITH
> clause; if it does then we return NULL to indicate that
> transformFromClause should do nothing. Messy, but I wanted to see what
> other developers thought before jumping in and rewriting this part of
> the code.

You really can't get away with having the identical representation for
CTEs and ordinary sub-selects in the range table. For instance, it
looks like your patch will think that

select ... from (select ...) as x, x, ...

is legal when it certainly is not. I think you need either a new
RTEKind or an additional flag in the RTE to show that it's a CTE rather
than a plain subselect. I'm not entirely sure that you even want the
CTEs in the rangetable at all --- that still needs some thought.

> Another point to think about is what should a query return if the SELECT
> doesn't refer to a CTE?

The spec ought to make this perfectly clear ... or perhaps not so clear,
but I'm sure it's defined.

> - with myrel(p1) as (select oid from pg_class) select myrel.p1 from
> myrel AS foo, pg_class AS bar WHERE foo.p1 = bar.oid; -- FAILS

> So in this case, should foo be accepted as a valid alias for myrel?

This comes back to the question of whether the CTE per se should be an
RTE at all. Maybe only the reference to it should be an RTE. The
behavior when seeing a plain RangeVar in FROM would be to first search
the side list of valid CTEs, and only on failure go looking for a real
table.

regards, tom lane


From: Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP: Hierarchical Queries - stage 1
Date: 2006-09-22 07:29:00
Message-ID: 1158910140.8379.27.camel@mca-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi Tom,

Thanks for your initial thoughts on this.

On Wed, 2006-09-20 at 20:13 -0400, Tom Lane wrote:

(cut)

> You really can't get away with having the identical representation for
> CTEs and ordinary sub-selects in the range table. For instance, it
> looks like your patch will think that
>
> select ... from (select ...) as x, x, ...
>
> is legal when it certainly is not. I think you need either a new
> RTEKind or an additional flag in the RTE to show that it's a CTE rather
> than a plain subselect. I'm not entirely sure that you even want the
> CTEs in the rangetable at all --- that still needs some thought.

For semantic reasons, I can see why you are questioning whether or not
the CTE should be contained within the rangetable - there is an implicit
hint that RTEs reflect entries within the FROM clause, but then I also
see that the rewriter adds RTEs when substituting view definitions into
queries. The comments in parsenodes.h also suggest that an RTE is a
namespace/data source reference for a named entity within the query.

The main problem I can see with keeping the CTEs outside the rangetable
is that according to the source, jointree nodes must currently have
RANGETBLREF nodes as leaf nodes; as I understand it, your suggestion of
maintaining the CTEs separately would involve something along the lines
of keeping a separate CTETable and creating some form of CTETBLREF node
that could be referenced within the jointree. While arguably it may be
semantically neater, it appears to involve quite a bit of extra work...
could you explain in more detail as to why you feel that CTEs should
remain outside the rangetable?

> This comes back to the question of whether the CTE per se should be an
> RTE at all. Maybe only the reference to it should be an RTE. The
> behavior when seeing a plain RangeVar in FROM would be to first search
> the side list of valid CTEs, and only on failure go looking for a real
> table.

This is effectively what the patch does, albeit not particularly
elegantly. I'll spend some time on making those changes a bit more
refined.

Kind regards,

Mark.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP: Hierarchical Queries - stage 1
Date: 2006-09-22 14:02:33
Message-ID: 25922.1158933753@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk> writes:
> The main problem I can see with keeping the CTEs outside the rangetable
> is that according to the source, jointree nodes must currently have
> RANGETBLREF nodes as leaf nodes; as I understand it, your suggestion of
> maintaining the CTEs separately would involve something along the lines
> of keeping a separate CTETable and creating some form of CTETBLREF node
> that could be referenced within the jointree.

No, what I'm thinking is that a *reference* to a CTE, from within the
main query's FROM list, would create a "CTERef" RTE and then you'd have
a normal RANGETBLREF node linking to that in the jointree. This solves
the problem of where do you put the alias: on the RTE. What's not clear
to me at this point is whether there can be multiple references in a
query to the same CTE --- if there can, I suspect you must have a data
structure like this.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hierarchical Queries--Status
Date: 2007-02-09 01:49:46
Message-ID: 200702090149.l191nkD06558@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Who is working on this item?

---------------------------------------------------------------------------

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Mon, Sep 04, 2006 at 05:15:57PM +0100, Mark Cave-Ayland wrote:
> > 3) Add planner support so that WITH clauses are mapped to a new type of
> > node that utilises two tuplestores - an output tuplestore and a working
> > tuplestore. The output tuple store will in effect be the contents of the
> > table expression while the working tuplestore holds the results of the
> > last iteration if recursive. Also implement some kind of WithState node
> > which keeps track of the recursion state.
>
> That's basically what I came up with. Basically you have a sort of loop
> in the execution plan where tuples that come out are copied into a
> tuplestore and run through a particular part of the executor again. The
> top-down approach of the executor makes it a bit trickier...
>
> > Having spent some more time today looking at 1) and also at the SQL 2003
> > spec, it would seem that other databases offer the WITH clause within
> > subqueries and also as part of a view. I'd be interested to hear
> > feedback from other developers as to whether it would be possible to
> > achieve this level of support within PostgreSQL.
>
> Absolutly possible. The question is how much work :)
>
> Incidently, if you find a way to support common subplans (where a part
> of the executor is shared between two executions) that might provide a
> way to solve some of the trickier multiple evaluation problems with
> rules. Again, it would just be a tuplestore the stored the results for
> multiple executions.
>
> Have a nice day,
> --
> Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> > From each according to his ability. To each according to his ability to litigate.
-- End of PGP section, PGP failed!

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hierarchical Queries--Status
Date: 2007-02-09 14:11:23
Message-ID: 1171030283.8404.6.camel@mca-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 2007-02-08 at 20:49 -0500, Bruce Momjian wrote:
> Who is working on this item?

Jonah was trying to complete this for 8.3, but I believe that he has
handed it onto Gregory Stark - I think
http://archives.postgresql.org/pgsql-hackers/2007-01/msg01586.php is the
latest update.

Kind regards,

Mark.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Mark Cave-Ayland" <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hierarchical Queries--Status
Date: 2007-02-09 16:39:03
Message-ID: 87ejoze15k.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


"Mark Cave-Ayland" <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk> writes:

> On Thu, 2007-02-08 at 20:49 -0500, Bruce Momjian wrote:
>> Who is working on this item?
>
> Jonah was trying to complete this for 8.3, but I believe that he has
> handed it onto Gregory Stark - I think
> http://archives.postgresql.org/pgsql-hackers/2007-01/msg01586.php is the
> latest update.

There's also

http://archives.postgresql.org/pgsql-patches/2007-02/msg00086.php

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com