WITH RECURSIVE patch V0.1

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: pgsql-patches(at)postgresql(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: WITH RECURSIVE patch V0.1
Date: 2008-05-18 11:51:29
Message-ID: 20080518.205129.86993930.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

WITH RECURSIVE patch V0.1

Here are patches to implement WITH RECURSIVE clause. There are some
limitiations and TODO items(see the "Current limitations" section
below). Comments are welcome.

1. Credit

These patches were developed by Yoshiyuki Asaba (y-asab(at)sraoss(dot)co(dot)jp)
with some discussions with Tatsuo Ishii (ishii(at)sraoss(dot)co(dot)jp).

- prior patches:

http://archives.postgresql.org/pgsql-patches/2008-03/msg00327.php

The patches include some extentions to the patches above.

2. Design proposales

See: http://archives.postgresql.org/pgsql-hackers/2008-02/msg00642.php

3. Implementation details

- Parsing recursive queries

We convert WITH clause to following structure in analyzer.

/*
* WithClause -
* reprensentation of WITH clause
*/
typedef struct WithClause
{
NodeTag type;
List *subquery; /* query list */
bool recursive; /* true = WITH RECURSIVE */
} WithClause;

If recursive is false, WITH clause is converted to a subquery.

Next we call transformWithClause() to check if the query defined in
WITH RECURSIVE clause is a liner recursive query.

transformWithClause() does followings:

1) Detect query which is not allowed in SQL standard

2) Create a name dependency graphs

transoform* functions checks if target tables actually exist and throw
an error if tables do not exist.

For example in following query:

WITH RECURSIVE x AS (SELECT * from y),
y AS (SELECT * FROM pg_class)
SELECT * FROM x;

x's definition referes to y and y is defined after x's definition. If
we evaluate x and y in the order of above, an error will occur while
evaluating x since y is not defined yet.

To avoid the problem we create new function:

void checkWellFormedCte(ParseState *pstate, WithClause *withClause);

which performs topological sort on graphs.

Non recursive term will be the start of the graph, checks type
checking one by one.

names appear in WITH RECURSIVE clause form a name dependency
graph. The nodes in the graph is represented by CteNode type:

typedef struct CteNode;
struct CteNode
{
Alias *alias;
CteList *dependency; /* if NULL the node does not call other node */
bool self_reference;
bool non_recursive;
} CteNode;

If there is loop in the dependency graph, a matual recursion exists
and an error occurs (error code 0A00).
* Note that this part is not implemented in the V1.0 patches yet.

Tables defined in WITH RECURSIVE clause is identified as RTE_RECURSIVE.
A name space is added to p_recursive_namespace in ParseState
structure. The added item's structure is following:

typedef struct RangeRecursive
{
NodeTag type;
Node *subquery; /* whole subquery */
Alias *alias; /* table alias & optional column aliases */
List *coldeflist; /* list of ColumnDef nodes to describe result
* of function returning RECORD */
Node *non_recursive_term; /* anaylze result of non recursive term */
bool recursive; /* reserved */
} RangeRecursive;

non_recursive_term keeps the result of analyze on non recursive term.
Using this, the type of recursive query is inferenced.

- Generating a plan

New plan node "Recursion" and "Recursive scan" is added.
Recursion scan is a plan which returns WT. Recursion is a plan which
execute recursive query. The subplan of Recursion is always "Append".

Below is an example EXPLAIN output.

EXPLAIN WITH RECURSIVE subdepartment AS
(
-- non recursive term
SELECT * FROM department WHERE name = 'A'

UNION ALL

-- recursive term
SELECT d.* FROM department AS d, subdepartment
WHERE d.parent_department = subdepartment.id
)
SELECT * FROM subdepartment;

QUERY PLAN
-------------------------------------------------------------------------------------------
Recursion on subdepartment (cost=0.00..50.76 rows=12 width=40)
-> Append (cost=0.00..50.64 rows=12 width=40)
-> Seq Scan on department (cost=0.00..24.50 rows=6 width=40) <-- non recusrive term
Filter: (name = 'A'::text)
-> Hash Join (cost=0.01..26.02 rows=6 width=40) <-- recursive term
Hash Cond: (d.parent_department = subdepartment.id)
-> Seq Scan on department d (cost=0.00..21.60 rows=1160 width=40)
-> Hash (cost=0.00..0.00 rows=1 width=4)
-> Recursive Scan on subdepartment (cost=0.00..0.00 rows=1 width=4)

- Executor

new files nodeRecursion.c and nodeRecursivescan.c are added.

The structure which represents the Recursion plan is as follows:

typedef struct Recursion
{
Scan scan;
Plan *subplan;
List *subrtable; /* temporary workspace for planner */
} Recursion;

typedef struct RecursionState
{
ScanState ss; /* its first field is NodeTag */
PlanState *subplan;
bool recursive_empty; /* if recursive term does exist, true */
Tuplestorestate *intermediate_tuplestorestate; /* intermediate table */
bool init_done; /* initialization flag */
} RecursiveState;

Following function are defined.

TupleTableSlot *
ExecRecursion(RecursionState *)
{
...
if (non recursive term)
{
slot = ExecProcNode(non recurisve term)
tuplestore_puttuple(WT, slot);
return slot;
}

/* calculate recursive term */
slot = ExecProcNode(recursive term)
if (!TupIsNull(slot)
{
slot = ExecProcNode(Nested Loop);
if (slot is empty)
swap(intermediate_tuplestorestate, WT)
else
tuplestore_puttuple(intermediate_tuplestorestate, slot)
}
...
}

- Executing RecursiveScan

Returns tuples store in a work table. Work table is stored in Estate structure.

typedef struct EState
{
...
Tuplestorestate *es_tuplestorestate; /* Stuff used for recursive query */
...
} EState;

RecursiveScan and RecursiveScanState structures.

typedef struct RecursiveScan
{
Plan *plan;
Plan *subplan;
} RecursiveScan;

typedef struct RecursiveScanState
{
ScanState ss;
TupleDesc tupdesc;
bool done_init;
} RecursiveScanState;

RecursiveScanState initialized RecursionState the initialize the type
information. Since Recursive plan is an upper plan of RecursiveScan,
initialization should be delayed. For this purpose we use done_init
flag.

Scan functions:

static TupleTableSlot *
RecursivescanNext(RecursiveScanState *node)
{
...
if (tuplestore_gettupleslot(node->ss.ps->state->es_tuplestorestate, true, slot))
return slot;
...
}

TupleTableSlot *
ExecRecursiveScan(RecursiveScanState *)
{
/*
* Delay initializing type info because type inference did not do
* in ExecInitRecursiveScan.
*/
if (!node->init_done)
{
ExecAssignScanType(&node->ss,
node->ss.ps.state->es_rscan_tupledesc);

/*
* Initialize result tuple type and projection info.
*/
ExecAssignResultTypeFromTL(&node->ss.ps);
ExecAssignScanProjectionInfo(&node->ss);
node->init_done = true;
}
return ExecScan(&node->ss, (ExecScanAccessMtd) RecursivescanNext);
}

4. New source codes:

- src/backend/parser/parse_cte.c
parse CTE (not completed)

- src/backend/executor/nodeRecursion.c
- src/include/nodeRecursion.h
Execute Recursion plan

- src/backend/executor/nodeRecursivescan.c
- src/include/nodeRecursivescan.h
Execute Recursive scan plan

5. Limitations and TODO

1) Only the last SELECT of UNION ALL can include self recursion name.

2) Cost of Recursion and Recursivescan plan are always 0.

3) Some recursion type checking is not complete.

4) Outer join for recursive name and tables does not work.

5) Need regression test cases.

6) Documentation needed.

6. Executing WITH RECURSIVE cluase examples

DROP TABLE IF EXISTS department;
DROP TABLE
CREATE TABLE department (
id INT PRIMARY KEY, -- department ID
parent_department INT REFERENCES department, -- upper department ID
name TEXT -- department name
);
psql:recursive-e.sql:6: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "department_pkey" for table "department"
CREATE TABLE
INSERT INTO department VALUES (0, NULL, 'ROOT');
INSERT 0 1
INSERT INTO department VALUES (1, 0, 'A');
INSERT 0 1
INSERT INTO department VALUES (2, 1, 'B');
INSERT 0 1
INSERT INTO department VALUES (3, 2, 'C');
INSERT 0 1
INSERT INTO department VALUES (4, 2, 'D');
INSERT 0 1
INSERT INTO department VALUES (5, 0, 'E');
INSERT 0 1
INSERT INTO department VALUES (6, 4, 'F');
INSERT 0 1
INSERT INTO department VALUES (7, 5, 'G');
INSERT 0 1
-- department structure represented here is as follows:
--
-- ROOT-+->A-+->B-+->C
-- | |
-- | +->D-+->F
-- +->E-+->G
EXPLAIN WITH RECURSIVE subdepartment AS
(
-- non recursive term
SELECT * FROM department WHERE name = 'A'
UNION ALL
-- recursive term
SELECT d.* FROM department AS d, subdepartment
WHERE d.parent_department = subdepartment.id
)
SELECT * FROM subdepartment;
QUERY PLAN
-------------------------------------------------------------------------------------------
Recursion on subdepartment (cost=0.00..50.76 rows=12 width=40)
-> Append (cost=0.00..50.64 rows=12 width=40)
-> Seq Scan on department (cost=0.00..24.50 rows=6 width=40)
Filter: (name = 'A'::text)
-> Hash Join (cost=0.01..26.02 rows=6 width=40)
Hash Cond: (d.parent_department = subdepartment.id)
-> Seq Scan on department d (cost=0.00..21.60 rows=1160 width=40)
-> Hash (cost=0.00..0.00 rows=1 width=4)
-> Recursive Scan on subdepartment (cost=0.00..0.00 rows=1 width=4)
(9 rows)

-- extract all departments under 'A'. Result should be A, B, C, D and F
WITH RECURSIVE subdepartment AS
(
-- non recursive term
SELECT * FROM department WHERE name = 'A'
UNION ALL
-- recursive term
SELECT d.* FROM department AS d, subdepartment
WHERE d.parent_department = subdepartment.id
)
SELECT * FROM subdepartment ORDER BY name;
id | parent_department | name
----+-------------------+------
1 | 0 | A
2 | 1 | B
3 | 2 | C
4 | 2 | D
6 | 4 | F
(5 rows)

-- extract all departments under 'E'. Result should be E and G.
WITH RECURSIVE subdepartment AS
(
-- non recursive term
SELECT * FROM department WHERE name = 'E'
UNION ALL
-- recursive term
SELECT d.* FROM department AS d, subdepartment
WHERE d.parent_department = subdepartment.id
)
SELECT * FROM subdepartment ORDER BY name;
id | parent_department | name
----+-------------------+------
5 | 0 | E
7 | 5 | G
(2 rows)
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Attachment Content-Type Size
recursive_query.patch.gz application/octet-stream 21.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2008-05-18 12:38:12 Re: odd output in restore mode
Previous Message Greg Smith 2008-05-18 07:18:13 Re: New DTrace probes proposal

Browse pgsql-patches by date

  From Date Subject
Next Message Andrew Dunstan 2008-05-18 12:38:12 Re: odd output in restore mode
Previous Message Andrew Dunstan 2008-05-18 06:53:33 Re: [HACKERS] use of pager on Windows psql