Writeable CTEs and side effects

Lists: pgsql-hackers
From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Writeable CTEs and side effects
Date: 2009-10-07 21:08:03
Message-ID: 4ACD0333.3090803@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've made progress in implementing writeable CTEs (repo at
git://git.postgresql.org/git/writeable_cte.git , branch actually_write)
and I've hit a few corner-cases which have lead me to think that we
should be handling DML inside CTEs a bit differently. Before I go on
implementing this, I'd like to hear your input.

1) WITH t AS
(UPDATE foo SET bar = bar+1 RETURNING *)
SELECT * FROM t LIMIT 1;

What's problematic here is that only 1 row is read from the CTE, meaning
also that only one row is updated which, at least how I see it, is not
what we want. The CTE should only store one row and return that after
it has completely processed the UPDATE statement.

2) WITH t1 AS
(UPDATE foo SET bar=bar+1 RETURNING *),
t2 AS
(UPDATE foo SET bar=bar+1 RETURNING *)
SELECT * FROM t1
UNION ALL
SELECT * FROM t2;

This is probably not the most common scenario, but is still very
surprising if you for some reason happen to hit it. Both of the updates
actually have the same transaction ID and command ID, so the rows are
first updated by t1, but when t2 is processed, it looks at the rows and
thinks that it already updated them.

3) WITH t1 AS
(UPDATE foo SET bar=bar+1 RETURNING *),
t2 AS
(UPDATE baz SET bat=bat+1 RETURNING *)
VALUES (true);

This isn't probably the most common situation either, but I think it's
worth looking at; the user wants to update two different tables, but
ignore the RETURNING data completely. On IRC, this has been requested
multiple times. Even if we wouldn't agree that this feature is useful,
it pretty much follows the semantics of example #1.

Trying to tackle all of these at once, I've come up with this kind of
execution strategy:

Before starting the execution of the main plan tree, for every CTE which
is a DML query, do the following:

1) Get a new CID
2a) If there are no references to the CTE (example #3), run the DML
query to the end but ignore the results of the RETURNING query,
or
2b) If there are references, run the DML query to the end but store
either as many as rows as you need to to answer the outer query (example
#1) or if we can't determine the number of rows we need (most cases,
example #2) run the query and store all of its results.

Then, if required, get a new CID for the main execution tree and execute
it using the data we now have inside the CTEs. This way we can avoid
storing useless rows in memory without unexpected behaviour and caveats.

Regards,
Marko Tiikkaja


From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-07 21:16:20
Message-ID: 3073cc9b0910071416r1413fb55vc536cf98a2e5033f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 7, 2009 at 4:08 PM, Marko Tiikkaja
<marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
>
> 1)      WITH t AS
>        (UPDATE foo SET bar = bar+1 RETURNING *)
>        SELECT * FROM t LIMIT 1;
>
> What's problematic here is that only 1 row is read from the CTE, meaning
> also that only one row is updated which, at least how I see it, is not
> what we want.  The CTE should only store one row and return that after
> it has completely processed the UPDATE statement.
>

i'm talking from my ignorance... but AFAIU this should process the
whole UPDATE and then form a WorkTable with the results...
what the select reads from that results is another thing...

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-07 21:20:20
Message-ID: 4ACD0614.30800@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jaime Casanova wrote:
> On Wed, Oct 7, 2009 at 4:08 PM, Marko Tiikkaja
> <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
>> 1) WITH t AS
>> (UPDATE foo SET bar = bar+1 RETURNING *)
>> SELECT * FROM t LIMIT 1;
>>
>> What's problematic here is that only 1 row is read from the CTE, meaning
>> also that only one row is updated which, at least how I see it, is not
>> what we want. The CTE should only store one row and return that after
>> it has completely processed the UPDATE statement.
>>
>
> i'm talking from my ignorance... but AFAIU this should process the
> whole UPDATE and then form a WorkTable with the results...
> what the select reads from that results is another thing...
>

Right. This is exactly what I'm trying to do, except I think we could
easily optimize this case and store only the first processed row inside
the CTE.

Regards,
Marko Tiikkaja


From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-07 21:22:21
Message-ID: 3073cc9b0910071422x24ba862ape0e6212d096e11c2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 7, 2009 at 4:20 PM, Marko Tiikkaja
<marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
> Jaime Casanova wrote:
>>
>> On Wed, Oct 7, 2009 at 4:08 PM, Marko Tiikkaja
>> <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
>>>
>>> 1)      WITH t AS
>>>       (UPDATE foo SET bar = bar+1 RETURNING *)
>>>       SELECT * FROM t LIMIT 1;
>>>
>>> What's problematic here is that only 1 row is read from the CTE, meaning
>>> also that only one row is updated which, at least how I see it, is not
>>> what we want.  The CTE should only store one row and return that after
>>> it has completely processed the UPDATE statement.
>>>
>>
>> i'm talking from my ignorance... but AFAIU this should process the
>> whole UPDATE and then form a WorkTable with the results...
>> what the select reads from that results is another thing...
>>
>
> Right.  This is exactly what I'm trying to do, except I think we could
> easily optimize this case and store only the first processed row inside
> the CTE.
>

why? we don't should be thinking in optimizations in this phase

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-07 21:27:47
Message-ID: 4ACD07D3.6020700@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jaime Casanova wrote:
> On Wed, Oct 7, 2009 at 4:20 PM, Marko Tiikkaja
> <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
>> Right. This is exactly what I'm trying to do, except I think we could
>> easily optimize this case and store only the first processed row inside
>> the CTE.
>>
>
> why? we don't should be thinking in optimizations in this phase
>

Maybe you're right. This probably isn't a very common case any way.

Regards,
Marko Tiikkaja


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 10:52:41
Message-ID: 603c8f070910080352y361742dg51dd90a3b4c8e146@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 7, 2009 at 5:08 PM, Marko Tiikkaja
<marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
> I've made progress in implementing writeable CTEs (repo at
> git://git.postgresql.org/git/writeable_cte.git , branch actually_write)
> and I've hit a few corner-cases which have lead me to think that we
> should be handling DML inside CTEs a bit differently.  Before I go on
> implementing this, I'd like to hear your input.
>
> 1)      WITH t AS
>        (UPDATE foo SET bar = bar+1 RETURNING *)
>        SELECT * FROM t LIMIT 1;
>
> What's problematic here is that only 1 row is read from the CTE, meaning
> also that only one row is updated which, at least how I see it, is not
> what we want.  The CTE should only store one row and return that after
> it has completely processed the UPDATE statement.

I agree.

> 2)      WITH t1 AS
>        (UPDATE foo SET bar=bar+1 RETURNING *),
>        t2 AS
>        (UPDATE foo SET bar=bar+1 RETURNING *)
>        SELECT * FROM t1
>        UNION ALL
>        SELECT * FROM t2;
>
> This is probably not the most common scenario, but is still very
> surprising if you for some reason happen to hit it.  Both of the updates
> actually have the same transaction ID and command ID, so the rows are
> first updated by t1, but when t2 is processed, it looks at the rows and
> thinks that it already updated them.

Yeah, that sucks, although maybe we could live with it. We do in
other contexts, e.g. triggers.

> 3)      WITH t1 AS
>        (UPDATE foo SET bar=bar+1 RETURNING *),
>        t2 AS
>        (UPDATE baz SET bat=bat+1 RETURNING *)
>        VALUES (true);
>
> This isn't probably the most common situation either, but I think it's
> worth looking at; the user wants to update two different tables, but
> ignore the RETURNING data completely.  On IRC, this has been requested
> multiple times.  Even if we wouldn't agree that this feature is useful,
> it pretty much follows the semantics of example #1.

Right.

> Trying to tackle all of these at once, I've come up with this kind of
> execution strategy:
>
> Before starting the execution of the main plan tree, for every CTE which
> is a DML query, do the following:
>
>  1) Get a new CID
>  2a) If there are no references to the CTE (example #3), run the DML
> query to the end but ignore the results of the RETURNING query,
>  or
>  2b) If there are references, run the DML query to the end but store
> either as many as rows as you need to to answer the outer query (example
> #1) or if we can't determine the number of rows we need (most cases,
> example #2) run the query and store all of its results.
>
> Then, if required, get a new CID for the main execution tree and execute
> it using the data we now have inside the CTEs.  This way we can avoid
> storing useless rows in memory without unexpected behaviour and caveats.

This has one MAJOR disadvantage: all the tuples from the CTE queries
have to be buffered someplace, rather than streamed from the subquery
up to the main query. For what may turn out to be pretty common uses
cases like WITH tuples AS (DELETE FROM big_table_1 RETURNING ...)
INSERT INTO big_table_2 ... this is going to suck pretty bad. I
wonder if it isn't better to just use the same command-ID throughout
and live with the weirdness of #2.

I think you should definitely get some input from Tom Lane on this
before you go too much further, but if he doesn't respond to this
thread, I suggest trying again after CommitFest.

Does this have any impact on the pending DML-node patch?

...Robert


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 11:02:41
Message-ID: 4ACDC6D1.7000703@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> This has one MAJOR disadvantage: all the tuples from the CTE queries
> have to be buffered someplace, rather than streamed from the subquery
> up to the main query. For what may turn out to be pretty common uses
> cases like WITH tuples AS (DELETE FROM big_table_1 RETURNING ...)
> INSERT INTO big_table_2 ... this is going to suck pretty bad. I
> wonder if it isn't better to just use the same command-ID throughout
> and live with the weirdness of #2.

I haven't looked at the CTE code in much detail but I was under the
impression that it had to store the results somewhere in any case.
You're right, though, it sucks for this use case. Weirdness of #2 is
probably a lot easier to live with.

> I think you should definitely get some input from Tom Lane on this
> before you go too much further, but if he doesn't respond to this
> thread, I suggest trying again after CommitFest.

Agreed.

> Does this have any impact on the pending DML-node patch?

Not really. This could be done without the patch, but we can use far
more of the existing CTE code with the patch.

Regards,
Marko Tiikkaja


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 12:40:10
Message-ID: 4ACDDDAA.20808@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> 3) WITH t1 AS
> (UPDATE foo SET bar=bar+1 RETURNING *),
> t2 AS
> (UPDATE baz SET bat=bat+1 RETURNING *)
> VALUES (true);
>
> This isn't probably the most common situation either, but I think it's
> worth looking at; the user wants to update two different tables, but
> ignore the RETURNING data completely. On IRC, this has been requested
> multiple times. Even if we wouldn't agree that this feature is useful,
> it pretty much follows the semantics of example #1.

A bit off topic, but IMHO if we allow this case, we could allow
INSERT/UPDATE/DELETE without a RETURNING clause if the CTE isn't
referenced, so you could write this as:

WITH t1 AS
(UPDATE foo SET bar=bar+1),
t2 AS
(UPDATE baz SET bar=bar+1)
VALUES(true);

Regards,
Marko Tiikkaja


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 15:54:08
Message-ID: b42b73150910080854u38547aa1u4c5ee9b85596102e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 8, 2009 at 6:52 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> up to the main query.  For what may turn out to be pretty common uses
> cases like WITH tuples AS (DELETE FROM big_table_1 RETURNING ...)
> INSERT INTO big_table_2 ... this is going to suck pretty bad.  I

Is the above form:

with x as (delete .. returning *) insert into y select * from x

going to be allowed? I was informed on irc that it wasn't...it would
have to be written as:

insert into y with x as (delete .. returning *) select * from x

IOW, will WITH be able to terminate in update/insert/delete and not just select?

merlin


From: David Fetter <david(at)fetter(dot)org>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 16:15:47
Message-ID: 20091008161547.GE14810@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 08, 2009 at 11:54:08AM -0400, Merlin Moncure wrote:
> On Thu, Oct 8, 2009 at 6:52 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > up to the main query.  For what may turn out to be pretty common uses
> > cases like WITH tuples AS (DELETE FROM big_table_1 RETURNING ...)
> > INSERT INTO big_table_2 ... this is going to suck pretty bad.  I
>
> Is the above form:
>
> with x as (delete .. returning *) insert into y select * from x
>
> going to be allowed? I was informed on irc that it wasn't...it would
> have to be written as:
>
> insert into y with x as (delete .. returning *) select * from x
>
> IOW, will WITH be able to terminate in update/insert/delete and not just select?

Having INSERT/UPDATE/DELETE after the CTE definition was part of my
original idea for the feature. To keep changes as small as possible,
would it be OK to have the more verbose equivalent to start?

Or have you come up with a case where the more verbose one just
doesn't work?

Cheers,
David (not going into the RECURSIVE version just yet ;)
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 16:30:36
Message-ID: 25734.1255019436@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Thu, Oct 8, 2009 at 6:52 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> up to the main query. For what may turn out to be pretty common uses
>> cases like WITH tuples AS (DELETE FROM big_table_1 RETURNING ...)
>> INSERT INTO big_table_2 ... this is going to suck pretty bad. I

> Is the above form:

> with x as (delete .. returning *) insert into y select * from x

> going to be allowed? I was informed on irc that it wasn't...it would
> have to be written as:

> insert into y with x as (delete .. returning *) select * from x

I would think that we would require the former and forbid the latter.
One of the basic limitations of the feature is going to be that you
can only have WITH (something RETURNING) at the top level, and the
latter syntax doesn't look like that to me.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 16:57:34
Message-ID: 26185.1255021054@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
> Robert Haas wrote:
>> This has one MAJOR disadvantage: all the tuples from the CTE queries
>> have to be buffered someplace, rather than streamed from the subquery
>> up to the main query. For what may turn out to be pretty common uses
>> cases like WITH tuples AS (DELETE FROM big_table_1 RETURNING ...)
>> INSERT INTO big_table_2 ... this is going to suck pretty bad. I
>> wonder if it isn't better to just use the same command-ID throughout
>> and live with the weirdness of #2.

> I haven't looked at the CTE code in much detail but I was under the
> impression that it had to store the results somewhere in any case.
> You're right, though, it sucks for this use case. Weirdness of #2 is
> probably a lot easier to live with.

>> I think you should definitely get some input from Tom Lane on this
>> before you go too much further, but if he doesn't respond to this
>> thread, I suggest trying again after CommitFest.

> Agreed.

Well, I don't know that I have the controlling vote here, but to me
predictable results come first and implementation efficiency is second.
I think the semantics should definitely be that each WITH RETURNING
query is executed, to completion, exactly once, and then you do the
main query. If you end up dumping lots of useless rows into a
tuplestore, so be it. I doubt that this is a major optimization issue
anyway, because I don't believe that it would be common for the main
query to not look at all/most of the result rows.

I also agree with bumping the CID in between. Consider for example that
the main query contains a function call and the function looks at the
table modified by the WITH RETURNING. What state should it see? If you
don't run the WITH query to completion and then bump CID, the answer
will be full of odd implementation details (including whether some
unrelated function caused a CID bump first).

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 18:57:13
Message-ID: 1255028233.16369.100.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2009-10-08 at 12:57 -0400, Tom Lane wrote:
> I also agree with bumping the CID in between.

Do you mean bump the CID in between each DML statement, or between the
last DML statement and the main query? If the former, how should we
choose the order of execution?

I'm not sure if this is a problem, but it seems like we're essentially
allowing a complex transaction to take place in one statement. Is that
what we want?

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 19:11:59
Message-ID: 6537.1255029119@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Thu, 2009-10-08 at 12:57 -0400, Tom Lane wrote:
>> I also agree with bumping the CID in between.

> Do you mean bump the CID in between each DML statement, or between the
> last DML statement and the main query? If the former, how should we
> choose the order of execution?

I would suggest bumping the CID after each WITH RETURNING query is
complete.

The order-of-execution issue is a good point, but is there a reason
not to do them in the order the WITH clauses appear syntactically?

One potentially interesting issue is if RETURNING withs are mixed
with plain-SELECT withs. If we stick with the current plan then
the plain-SELECTs would execute after the RETURNINGs, which would
make things apparently not syntax-order. An easy answer to that
one is to insist that all the RETURNING queries appear first.

> I'm not sure if this is a problem, but it seems like we're essentially
> allowing a complex transaction to take place in one statement. Is that
> what we want?

Yeah, I think that's more or less the point ...

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 19:34:22
Message-ID: 1255030462.16369.119.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2009-10-08 at 15:11 -0400, Tom Lane wrote:
> > I'm not sure if this is a problem, but it seems like we're essentially
> > allowing a complex transaction to take place in one statement. Is that
> > what we want?
>
> Yeah, I think that's more or less the point ...

I'm still trying to ponder the consequences of this. Most people assume
that a single statement means that everything in the statement happens
at once (intuitively). The few cases where that's not true are special
commands or things that we are trying to fix, like:
"UPDATE foo SET a = a + 1".

I get the feeling that we're turning a declarative statement into
something more procedural.

I suppose one difference between this and a BEGIN ... END block would be
that the isolation from other transactions would always be SERIALIZABLE.

I can't clearly articulate a problem with any of these things, but it
does seem vaguely troubling.

Also, are we missing out on an opportunity to provide some interesting
functionality if we do treat two DML statements as happening
simultaneously? I've read some interesting perspectives on this in the
past, and it's not trivial, but we might want to leave the possibility
open.

Regards,
Jeff Davis


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-08 23:23:43
Message-ID: 1255044223.13157.11.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2009-10-08 at 12:34 -0700, Jeff Davis wrote:
> I'm still trying to ponder the consequences of this. Most people
> assume
> that a single statement means that everything in the statement happens
> at once (intuitively). The few cases where that's not true are special
> commands or things that we are trying to fix, like:
> "UPDATE foo SET a = a + 1".

Well, the classical case of

INSERT INTO tab1 SELECT ... FROM tab1

clearly requires the SELECT to be distinctly before the INSERT.

Basically, this would not do it the other way around: write first, then
select.

I'm not sure why it needs to be tied in with CTEs, though. Why couldn't
this work:

SELECT * FROM test1 WHERE a IN (UPDATE test2 SET b = b + 1 RETURNING b);

I think I'd want "writable subqueries" instead of only "writable CTEs".


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-09 00:06:58
Message-ID: 1255046818.16369.174.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2009-10-09 at 02:23 +0300, Peter Eisentraut wrote:
> INSERT INTO tab1 SELECT ... FROM tab1
>
> clearly requires the SELECT to be distinctly before the INSERT.

That's effectively only one thing: assigning a relation (the result of
the select) to a variable (tab1). I was talking about multiple
assignment.

What if you want to append foo to bar and bar to foo?

WITH
t1 AS (INSERT INTO foo SELECT * FROM bar),
t2 AS (INSERT INTO bar SELECT * FROM foo)
VALUES(1);

That could be an interesting command if we didn't increment the command
counter.

> SELECT * FROM test1 WHERE a IN (UPDATE test2 SET b = b + 1 RETURNING b);
>
> I think I'd want "writable subqueries" instead of only "writable CTEs".

I think the original motivation was that it's more clear that a CTE is
separated and can only be executed once (if it has side effects).
Depending on how the query is written, it might be less obvious how many
times the subquery should be executed, and it might change based on the
plan.

We could make the same rules for a subquery that has side effects, and
always materialize it. But for now maybe CTEs are a better place to get
the feature working.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-09 13:35:25
Message-ID: 6804.1255095325@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Fri, 2009-10-09 at 02:23 +0300, Peter Eisentraut wrote:
>> I think I'd want "writable subqueries" instead of only "writable CTEs".

> I think the original motivation was that it's more clear that a CTE is
> separated and can only be executed once (if it has side effects).
> Depending on how the query is written, it might be less obvious how many
> times the subquery should be executed, and it might change based on the
> plan.

Right. The behavior would be entirely unpredictable, and usually
undesirable, if the RETURNING query is underneath a join, or an
aggregate, or a LIMIT, yadda yadda. Tying it to WITH provides a
convenient way, from both the user-visible and implementation sides,
of saying "this is an independent query that we will execute once
and then make the RETURNING results available for use in this other
query".

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-09 20:32:09
Message-ID: 200910092032.n99KW9a19255@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Added to TODO:

Allow INSERT/UPDATE/DELETE ... RETURNING in common table expressions

* http://archives.postgresql.org/pgsql-hackers/2009-10/msg00472.php

---------------------------------------------------------------------------

Marko Tiikkaja wrote:
> I've made progress in implementing writeable CTEs (repo at
> git://git.postgresql.org/git/writeable_cte.git , branch actually_write)
> and I've hit a few corner-cases which have lead me to think that we
> should be handling DML inside CTEs a bit differently. Before I go on
> implementing this, I'd like to hear your input.
>
> 1) WITH t AS
> (UPDATE foo SET bar = bar+1 RETURNING *)
> SELECT * FROM t LIMIT 1;
>
> What's problematic here is that only 1 row is read from the CTE, meaning
> also that only one row is updated which, at least how I see it, is not
> what we want. The CTE should only store one row and return that after
> it has completely processed the UPDATE statement.
>
> 2) WITH t1 AS
> (UPDATE foo SET bar=bar+1 RETURNING *),
> t2 AS
> (UPDATE foo SET bar=bar+1 RETURNING *)
> SELECT * FROM t1
> UNION ALL
> SELECT * FROM t2;
>
> This is probably not the most common scenario, but is still very
> surprising if you for some reason happen to hit it. Both of the updates
> actually have the same transaction ID and command ID, so the rows are
> first updated by t1, but when t2 is processed, it looks at the rows and
> thinks that it already updated them.
>
> 3) WITH t1 AS
> (UPDATE foo SET bar=bar+1 RETURNING *),
> t2 AS
> (UPDATE baz SET bat=bat+1 RETURNING *)
> VALUES (true);
>
> This isn't probably the most common situation either, but I think it's
> worth looking at; the user wants to update two different tables, but
> ignore the RETURNING data completely. On IRC, this has been requested
> multiple times. Even if we wouldn't agree that this feature is useful,
> it pretty much follows the semantics of example #1.
>
>
> Trying to tackle all of these at once, I've come up with this kind of
> execution strategy:
>
> Before starting the execution of the main plan tree, for every CTE which
> is a DML query, do the following:
>
> 1) Get a new CID
> 2a) If there are no references to the CTE (example #3), run the DML
> query to the end but ignore the results of the RETURNING query,
> or
> 2b) If there are references, run the DML query to the end but store
> either as many as rows as you need to to answer the outer query (example
> #1) or if we can't determine the number of rows we need (most cases,
> example #2) run the query and store all of its results.
>
> Then, if required, get a new CID for the main execution tree and execute
> it using the data we now have inside the CTEs. This way we can avoid
> storing useless rows in memory without unexpected behaviour and caveats.
>
>
> Regards,
> Marko Tiikkaja
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-19 14:29:45
Message-ID: 4ADC77D9.5070303@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>> Is the above form:
>> with x as (delete .. returning *) insert into y select * from x
>> going to be allowed? I was informed on irc that it wasn't...it would
>> have to be written as:
>> insert into y with x as (delete .. returning *) select * from x
>
> I would think that we would require the former and forbid the latter.
> One of the basic limitations of the feature is going to be that you
> can only have WITH (something RETURNING) at the top level, and the
> latter syntax doesn't look like that to me.

I'm looking at this, and if I understood correctly, you're suggesting
we'd add a WithClause to InsertStmt. Would we also allow this?

WITH t1 AS (DELETE FROM foo RETURNING *)
INSERT INTO bar
WITH t2 AS (VALUES(0))
SELECT * FROM t1 UNION ALL
SELECT * FROM t2;

I could also see use for adding this for UDPATE and DELETE too, i.e.

WITH t AS (DELETE FROM foo RETURNING id)
UPDATE bar SET foo_id = NULL FROM t WHERE t.id = bar.foo_id;

Did I misunderstand something here?

Regards,
Marko Tiikkaja


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Writeable CTEs and side effects
Date: 2009-10-19 15:50:17
Message-ID: 6184.1255967417@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
> I'm looking at this, and if I understood correctly, you're suggesting
> we'd add a WithClause to InsertStmt. Would we also allow this?

Yeah, we could eventually do all that. I think supporting it in SELECT
would be plenty to start with, though.

regards, tom lane