Writeable CTEs and side effects

From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Writeable CTEs and side effects
Date: 2009-10-07 21:08:03
Message-ID: 4ACD0333.3090803@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've made progress in implementing writeable CTEs (repo at
git://git.postgresql.org/git/writeable_cte.git , branch actually_write)
and I've hit a few corner-cases which have lead me to think that we
should be handling DML inside CTEs a bit differently. Before I go on
implementing this, I'd like to hear your input.

1) WITH t AS
(UPDATE foo SET bar = bar+1 RETURNING *)
SELECT * FROM t LIMIT 1;

What's problematic here is that only 1 row is read from the CTE, meaning
also that only one row is updated which, at least how I see it, is not
what we want. The CTE should only store one row and return that after
it has completely processed the UPDATE statement.

2) WITH t1 AS
(UPDATE foo SET bar=bar+1 RETURNING *),
t2 AS
(UPDATE foo SET bar=bar+1 RETURNING *)
SELECT * FROM t1
UNION ALL
SELECT * FROM t2;

This is probably not the most common scenario, but is still very
surprising if you for some reason happen to hit it. Both of the updates
actually have the same transaction ID and command ID, so the rows are
first updated by t1, but when t2 is processed, it looks at the rows and
thinks that it already updated them.

3) WITH t1 AS
(UPDATE foo SET bar=bar+1 RETURNING *),
t2 AS
(UPDATE baz SET bat=bat+1 RETURNING *)
VALUES (true);

This isn't probably the most common situation either, but I think it's
worth looking at; the user wants to update two different tables, but
ignore the RETURNING data completely. On IRC, this has been requested
multiple times. Even if we wouldn't agree that this feature is useful,
it pretty much follows the semantics of example #1.

Trying to tackle all of these at once, I've come up with this kind of
execution strategy:

Before starting the execution of the main plan tree, for every CTE which
is a DML query, do the following:

1) Get a new CID
2a) If there are no references to the CTE (example #3), run the DML
query to the end but ignore the results of the RETURNING query,
or
2b) If there are references, run the DML query to the end but store
either as many as rows as you need to to answer the outer query (example
#1) or if we can't determine the number of rows we need (most cases,
example #2) run the query and store all of its results.

Then, if required, get a new CID for the main execution tree and execute
it using the data we now have inside the CTEs. This way we can avoid
storing useless rows in memory without unexpected behaviour and caveats.

Regards,
Marko Tiikkaja

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jaime Casanova 2009-10-07 21:16:20 Re: Writeable CTEs and side effects
Previous Message Tom Lane 2009-10-07 20:58:18 Re: Issues for named/mixed function notation patch