Re: Writeable CTEs and empty relations

Lists: pgsql-hackers
From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Writeable CTEs and empty relations
Date: 2010-02-08 19:30:03
Message-ID: 4B70663B.1020409@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

While playing around with another issue with the patch, I came across
the following:

=> create table foo(a int);
CREATE TABLE
=> with t as (insert into foo values(0)) select * from foo;
a
---
(0 rows)

I traced this down to heapam.c, which has this:

/*
* return null immediately if relation is empty
*/
if (scan->rs_nblocks == 0)
{
Assert(!BufferIsValid(scan->rs_cbuf));
tuple->t_data = NULL;
return;
}

and

/*
* Determine the number of blocks we have to scan.
*
* It is sufficient to do this once at scan start, since any tuples added
* while the scan is in progress will be invisible to my snapshot anyway.
* (That is not true when using a non-MVCC snapshot. However, we couldn't
* guarantee to return tuples added after scan start anyway, since they
* might go into pages we already scanned. To guarantee consistent
* results for a non-MVCC snapshot, the caller must hold some higher-level
* lock that ensures the interesting tuple(s) won't change.)
*/
scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_rd);

This doesn't exactly work anymore since we modify the snapshot after
calling ExecInitScan(). I'm not really familiar with this part of the
code, so I'm asking: is there a simple enough way around this? Would
updating scan->rs_nblocks before scanning the first tuple be OK?

Regards,
Marko Tiikkaja


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-08 23:09:19
Message-ID: 910.1265670559@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
> I traced this down to heapam.c, which has this:
> ...
> This doesn't exactly work anymore since we modify the snapshot after
> calling ExecInitScan().

So don't do that. The executor is generally entitled to assume that
parameters given to ExecutorStart are correct. In particular, changing
the snapshot after the query has started to run seems to me to ensure
all sorts of inconsistent and undesirable behavior.

regards, tom lane


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-09 05:32:44
Message-ID: 4B70F37C.2000808@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-02-09 01:09 +0200, Tom Lane wrote:
> Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
>> I traced this down to heapam.c, which has this:
>> ...
>> This doesn't exactly work anymore since we modify the snapshot after
>> calling ExecInitScan().
>
> So don't do that. The executor is generally entitled to assume that
> parameters given to ExecutorStart are correct. In particular, changing
> the snapshot after the query has started to run seems to me to ensure
> all sorts of inconsistent and undesirable behavior.

You do remember that the whole patch in its current form depends on
modifying the snapshot?

Regards,
Marko Tiikkaja


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-09 22:49:07
Message-ID: 4B71E663.1020200@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-02-08 21:30 +0200, I wrote:
> This doesn't exactly work anymore since we modify the snapshot after
> calling ExecInitScan(). I'm not really familiar with this part of the
> code, so I'm asking: is there a simple enough way around this? Would
> updating scan->rs_nblocks before scanning the first tuple be OK?

I've looked at this some more, and the problem is a lot bigger than I
originally thought. We'd basically be forced to do another initscan()
before starting a new scan after the snapshot changed. One way to
accomplish this would be that ExecutePlan() would leave a flag in EState
whenever the scan nodes need to reinit.

Does this sound completely unacceptable?

Regards,
Marko Tiikkaja


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-10 00:19:56
Message-ID: 1328.1265761196@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
> On 2010-02-08 21:30 +0200, I wrote:
>> This doesn't exactly work anymore since we modify the snapshot after
>> calling ExecInitScan(). I'm not really familiar with this part of the
>> code, so I'm asking: is there a simple enough way around this? Would
>> updating scan->rs_nblocks before scanning the first tuple be OK?

> I've looked at this some more, and the problem is a lot bigger than I
> originally thought. We'd basically be forced to do another initscan()
> before starting a new scan after the snapshot changed. One way to
> accomplish this would be that ExecutePlan() would leave a flag in EState
> whenever the scan nodes need to reinit.

> Does this sound completely unacceptable?

You still haven't explained why it's a good idea to change the snapshot
after the executor has started. Right at the moment I'm prepared to
reject the patch on that ground alone.

regards, tom lane


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-10 00:25:19
Message-ID: 4B71FCEF.8040208@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-02-10 02:19 +0200, Tom Lane wrote:
> Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
>> Does this sound completely unacceptable?
>
> You still haven't explained why it's a good idea to change the snapshot
> after the executor has started. Right at the moment I'm prepared to
> reject the patch on that ground alone.

The patch only touches the snapshot's curcid. That's needed to allow
the queries see the tuples of the DML WITHs ran before that particular
query.

Regards,
Marko Tiikkaja


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-10 01:20:52
Message-ID: 1977.1265764852@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
> On 2010-02-10 02:19 +0200, Tom Lane wrote:
>> You still haven't explained why it's a good idea to change the snapshot
>> after the executor has started. Right at the moment I'm prepared to
>> reject the patch on that ground alone.

> The patch only touches the snapshot's curcid. That's needed to allow
> the queries see the tuples of the DML WITHs ran before that particular
> query.

... and they will also see, for example, tuples inserted by triggers
fired by those queries. I thought the plan for this feature was to
store and re-read the RETURNING data, not to go back and re-read the
underlying tables.

regards, tom lane


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-10 10:05:53
Message-ID: 4B728501.2080502@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-02-10 03:20 +0200, Tom Lane wrote:
> Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
>> On 2010-02-10 02:19 +0200, Tom Lane wrote:
>>> You still haven't explained why it's a good idea to change the snapshot
>>> after the executor has started. Right at the moment I'm prepared to
>>> reject the patch on that ground alone.
>
>> The patch only touches the snapshot's curcid. That's needed to allow
>> the queries see the tuples of the DML WITHs ran before that particular
>> query.
>
> ... and they will also see, for example, tuples inserted by triggers
> fired by those queries. I thought the plan for this feature was to
> store and re-read the RETURNING data, not to go back and re-read the
> underlying tables.

I originally suggested this approach about four months ago. During this
whole time you haven't expressed any concerns about this, but suddenly
it's a reason to reject the patch?

Anyway, if we want to avoid modifying the snapshot, we can't bump the
CID between queries. I haven't thought this through, but it seems like
this could work. However, none of the WITH queries can see the previous
statement's tuples, which means that someone may be suprised when they
try to modify the same tuples twice just to discover that only the first
modification took place. I don't see this as a huge problem though.

This will also solve the problem I started this thread for and makes the
patch smaller, simpler and even a bit prettier. Unless someone sees a
problem with this approach, I'm going to make the change.

Regards,
Marko Tiikkaja


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-10 19:59:17
Message-ID: 603c8f071002101159p3132775eqc127457fd0c515b9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 10, 2010 at 5:05 AM, Marko Tiikkaja
<marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
> On 2010-02-10 03:20 +0200, Tom Lane wrote:
>> Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
>>> On 2010-02-10 02:19 +0200, Tom Lane wrote:
>>>> You still haven't explained why it's a good idea to change the snapshot
>>>> after the executor has started.  Right at the moment I'm prepared to
>>>> reject the patch on that ground alone.
>>
>>> The patch only touches the snapshot's curcid.  That's needed to allow
>>> the queries see the tuples of the DML WITHs ran before that particular
>>> query.
>>
>> ... and they will also see, for example, tuples inserted by triggers
>> fired by those queries.  I thought the plan for this feature was to
>> store and re-read the RETURNING data, not to go back and re-read the
>> underlying tables.
>
> I originally suggested this approach about four months ago.  During this
> whole time you haven't expressed any concerns about this, but suddenly
> it's a reason to reject the patch?
>
> Anyway, if we want to avoid modifying the snapshot, we can't bump the
> CID between queries.  I haven't thought this through, but it seems like
> this could work.  However, none of the WITH queries can see the previous
> statement's tuples, which means that someone may be suprised when they
> try to modify the same tuples twice just to discover that only the first
> modification took place.  I don't see this as a huge problem though.

I think that we agreed previously that running all the DML queries to
completion before the main query, bumping the CID after each one, and
saving the outputs, was the only workable approach.

http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php
http://archives.postgresql.org/pgsql-hackers/2009-10/msg00614.php

If the executor has buried in it the assumption that the snapshot
can't change after startup, then does that mean that we need to start
up and shut down the executor for each subquery?

http://archives.postgresql.org/pgsql-hackers/2009-11/msg01964.php

...Robert


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-10 20:24:01
Message-ID: 4B7315E1.3040902@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-02-10 21:59 +0200, Robert Haas wrote:
> On Wed, Feb 10, 2010 at 5:05 AM, Marko Tiikkaja
> <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
>> On 2010-02-10 03:20 +0200, Tom Lane wrote:
>>> Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
>>>> On 2010-02-10 02:19 +0200, Tom Lane wrote:
>>>>> You still haven't explained why it's a good idea to change the snapshot
>>>>> after the executor has started. Right at the moment I'm prepared to
>>>>> reject the patch on that ground alone.
>>>
>>>> The patch only touches the snapshot's curcid. That's needed to allow
>>>> the queries see the tuples of the DML WITHs ran before that particular
>>>> query.
>>>
>>> ... and they will also see, for example, tuples inserted by triggers
>>> fired by those queries. I thought the plan for this feature was to
>>> store and re-read the RETURNING data, not to go back and re-read the
>>> underlying tables.
>>
>> I originally suggested this approach about four months ago. During this
>> whole time you haven't expressed any concerns about this, but suddenly
>> it's a reason to reject the patch?
>>
>> Anyway, if we want to avoid modifying the snapshot, we can't bump the
>> CID between queries. I haven't thought this through, but it seems like
>> this could work. However, none of the WITH queries can see the previous
>> statement's tuples, which means that someone may be suprised when they
>> try to modify the same tuples twice just to discover that only the first
>> modification took place. I don't see this as a huge problem though.
>
> I think that we agreed previously that running all the DML queries to
> completion before the main query, bumping the CID after each one, and
> saving the outputs, was the only workable approach.
>
> http://archives.postgresql.org/pgsql-hackers/2009-10/msg00566.php

Hmm. Yeah. Didn't think about that :-(

> If the executor has buried in it the assumption that the snapshot
> can't change after startup, then does that mean that we need to start
> up and shut down the executor for each subquery?

Then I guess that's pretty much the only option.

Regards,
Marko Tiikkaja


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-10 21:57:51
Message-ID: 1296.1265839071@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> If the executor has buried in it the assumption that the snapshot
> can't change after startup, then does that mean that we need to start
> up and shut down the executor for each subquery?

Yes, I think so. That's the way it's always worked in the past;
see for example PortalRunMulti() and ProcessQuery(). I think trying
to change that is a high-risk, low-reward activity.

This probably means that the planner output for queries involving
writeable CTEs has to be a separate PlannedStmt per such CTE.

regards, tom lane


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-10 22:50:48
Message-ID: 4B733848.3000107@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-02-10 23:57 +0200, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> If the executor has buried in it the assumption that the snapshot
>> can't change after startup, then does that mean that we need to start
>> up and shut down the executor for each subquery?
>
> Yes, I think so. That's the way it's always worked in the past;
> see for example PortalRunMulti() and ProcessQuery(). I think trying
> to change that is a high-risk, low-reward activity.
>
> This probably means that the planner output for queries involving
> writeable CTEs has to be a separate PlannedStmt per such CTE.

I'm looking at this, but I can't think of any good way of associating
the tuplestores from PortalRunMulti() with the correct CTEs. Any ideas?

Regards,
Marko Tiikkaja


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-10 23:25:22
Message-ID: 4B734062.2020109@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-02-11 00:50 +0200, Marko Tiikkaja wrote:
> On 2010-02-10 23:57 +0200, Tom Lane wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> If the executor has buried in it the assumption that the snapshot
>>> can't change after startup, then does that mean that we need to start
>>> up and shut down the executor for each subquery?
>>
>> Yes, I think so. That's the way it's always worked in the past;
>> see for example PortalRunMulti() and ProcessQuery(). I think trying
>> to change that is a high-risk, low-reward activity.
>>
>> This probably means that the planner output for queries involving
>> writeable CTEs has to be a separate PlannedStmt per such CTE.
>
> I'm looking at this, but I can't think of any good way of associating
> the tuplestores from PortalRunMulti() with the correct CTEs. Any ideas?

Ok, what about the following:
- after planning the original query, standard_planner() goes through
the list of top-level CTEs and assigns a running number for each of
the DML WITHs, in the order they will be executed and returns a
list of PlannedStmts. all necessary statements wi have a flag
signaling that the result should be stored in a tuplestore.
- the portal keeps the list of tuplestores around and passes that
list to every query through PlannedStmt. it keeps on executing
the statements until it finds a PlannedStmt that doesn't want its
results stored anywhere and then frees the list of tuplestores

Does this sound reasonable? And more importantly, am I going to be
wasting my time implementing this? The 15th isn't that far away..

Regards,
Marko Tiikkaja


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-10 23:58:04
Message-ID: 603c8f071002101558j1fe9ab9eje117887fd3596eac@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 10, 2010 at 6:25 PM, Marko Tiikkaja
<marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
> On 2010-02-11 00:50 +0200, Marko Tiikkaja wrote:
>> On 2010-02-10 23:57 +0200, Tom Lane wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> If the executor has buried in it the assumption that the snapshot
>>>> can't change after startup, then does that mean that we need to start
>>>> up and shut down the executor for each subquery?
>>>
>>> Yes, I think so.  That's the way it's always worked in the past;
>>> see for example PortalRunMulti() and ProcessQuery().  I think trying
>>> to change that is a high-risk, low-reward activity.
>>>
>>> This probably means that the planner output for queries involving
>>> writeable CTEs has to be a separate PlannedStmt per such CTE.
>>
>> I'm looking at this, but I can't think of any good way of associating
>> the tuplestores from PortalRunMulti() with the correct CTEs.  Any ideas?
>
> Ok, what about the following:
>  - after planning the original query, standard_planner() goes through
>    the list of top-level CTEs and assigns a running number for each of
>    the DML WITHs, in the order they will be executed and returns a
>    list of PlannedStmts.  all necessary statements wi have a flag
>    signaling that the result should be stored in a tuplestore.
>  - the portal keeps the list of tuplestores around and passes that
>    list to every query through PlannedStmt.  it keeps on executing
>    the statements until it finds a PlannedStmt that doesn't want its
>    results stored anywhere and then frees the list of tuplestores

Wouldn't you'd want to stop when you ran out of PlannedStmts, not when
you hit one that didn't need its results stored? What happens if
there's an unreferenced CTE in the middle of the list?

> Does this sound reasonable?  And more importantly, am I going to be
> wasting my time implementing this?  The 15th isn't that far away..

I have to admit I've been starting to have a feeling over the last
couple of days that this patch isn't going to make it for 9.0... but
obviously I'm in no position to guarantee anything one way or the
other. Please do keep us up to date on your plans, though.

Thanks,

...Robert


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-11 00:13:19
Message-ID: 4B734B9F.3010400@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-02-11 01:58 +0200, Robert Haas wrote:
> On Wed, Feb 10, 2010 at 6:25 PM, Marko Tiikkaja
> <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
>> On 2010-02-11 00:50 +0200, Marko Tiikkaja wrote:
>> Ok, what about the following:
>> - after planning the original query, standard_planner() goes through
>> the list of top-level CTEs and assigns a running number for each of
>> the DML WITHs, in the order they will be executed and returns a
>> list of PlannedStmts. all necessary statements wi have a flag
>> signaling that the result should be stored in a tuplestore.
>> - the portal keeps the list of tuplestores around and passes that
>> list to every query through PlannedStmt. it keeps on executing
>> the statements until it finds a PlannedStmt that doesn't want its
>> results stored anywhere and then frees the list of tuplestores
>
> Wouldn't you'd want to stop when you ran out of PlannedStmts, not when
> you hit one that didn't need its results stored? What happens if
> there's an unreferenced CTE in the middle of the list?

Right, of course.

>> Does this sound reasonable? And more importantly, am I going to be
>> wasting my time implementing this? The 15th isn't that far away..
>
> I have to admit I've been starting to have a feeling over the last
> couple of days that this patch isn't going to make it for 9.0... but
> obviously I'm in no position to guarantee anything one way or the
> other. Please do keep us up to date on your plans, though.

Unfortunately, so have I. I'll take a shot at implementing this, but if
I come across any problems, I have to give up.

Regards,
Marko Tiikkaja


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-11 01:44:22
Message-ID: 4B7360F6.2080602@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-02-11 02:13 +0200, I wrote:
> On 2010-02-11 01:58 +0200, Robert Haas wrote:
>> I have to admit I've been starting to have a feeling over the last
>> couple of days that this patch isn't going to make it for 9.0... but
>> obviously I'm in no position to guarantee anything one way or the
>> other. Please do keep us up to date on your plans, though.
>
> Unfortunately, so have I. I'll take a shot at implementing this, but if
> I come across any problems, I have to give up.

I'm going to have to disappoint a bunch of people and give up. :-(

At this point, I've already used a huge amount of my time and energy to
work on this patch during this commit fest, and I simply can't continue
like this. There's only 4 days left anyway, and I don't feel confident
enough to spend a lot of time during the next couple of days just to get
one more shot. I don't even have any kind of certainty that there
would've been a shot, even if I finished the patch on time.

Thanks everyone for helping out and reviewing this, especially Robert
and Tom.

Regards,
Marko Tiikkaja


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-11 13:46:45
Message-ID: 4B740A45.4060609@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2010-02-11 03:44 +0200, I wrote:
> I'm going to have to disappoint a bunch of people and give up. :-(

Btw. would it make sense to apply the WITH-on-top-of-DML part of this
patch? At least to me, this seems useful because you can write a
RECURSIVE SELECT and then use UPDATE .. FROM or DELETE .. USING on that CTE.

Regards,
Marko Tiikkaja


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-11 15:53:22
Message-ID: 603c8f071002110753u6be64d27u9a45399c5a711c41@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 11, 2010 at 8:46 AM, Marko Tiikkaja
<marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
> On 2010-02-11 03:44 +0200, I wrote:
>> I'm going to have to disappoint a bunch of people and give up. :-(
>
> Btw. would it make sense to apply the WITH-on-top-of-DML part of this
> patch?  At least to me, this seems useful because you can write a
> RECURSIVE SELECT and then use UPDATE .. FROM or DELETE .. USING on that CTE.

Hmm, that's a thought. Can you split out just that part?

...Robert


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-11 17:28:28
Message-ID: ca21d184f7fc2b3476e35b1e3e193ae5@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 11 Feb 2010 10:53:22 -0500, Robert Haas <robertmhaas(at)gmail(dot)com>
wrote:
> On Thu, Feb 11, 2010 at 8:46 AM, Marko Tiikkaja
> <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
>> On 2010-02-11 03:44 +0200, I wrote:
>>> I'm going to have to disappoint a bunch of people and give up. :-(
>>
>> Btw. would it make sense to apply the WITH-on-top-of-DML part of this
>> patch?  At least to me, this seems useful because you can write a
>> RECURSIVE SELECT and then use UPDATE .. FROM or DELETE .. USING on that
>> CTE.
>
> Hmm, that's a thought. Can you split out just that part?

Here's the patch. It's the same as the stuff in writeable CTE patches, but
I added regression tests.

Regards,
Marko Tiikkaja

Attachment Content-Type Size
with_on_dml.patch text/plain 14.9 KB

From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-11 17:35:18
Message-ID: 8a2b9358757e9e665093b7b006b097bc@cs.helsinki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 11 Feb 2010 19:28:28 +0200, I wrote:
> On Thu, 11 Feb 2010 10:53:22 -0500, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
>> On Thu, Feb 11, 2010 at 8:46 AM, Marko Tiikkaja
>> <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
>>> On 2010-02-11 03:44 +0200, I wrote:
>>>> I'm going to have to disappoint a bunch of people and give up. :-(
>>>
>>> Btw. would it make sense to apply the WITH-on-top-of-DML part of this
>>> patch?  At least to me, this seems useful because you can write a
>>> RECURSIVE SELECT and then use UPDATE .. FROM or DELETE .. USING on that
>>> CTE.
>>
>> Hmm, that's a thought. Can you split out just that part?
>
> Here's the patch. It's the same as the stuff in writeable CTE patches,
but
> I added regression tests.

Whoops. The reference section in docs still had some traces of writeable
CTEs. Updated patch attached.

Regards,
Marko Tiikkaja

Attachment Content-Type Size
with_on_dml2.patch text/plain 14.6 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-12 04:26:02
Message-ID: 603c8f071002112026l6ce8f5fayd1c910fad66c3924@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 11, 2010 at 12:35 PM, Marko Tiikkaja
<marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
> On Thu, 11 Feb 2010 19:28:28 +0200, I wrote:
>> On Thu, 11 Feb 2010 10:53:22 -0500, Robert Haas <robertmhaas(at)gmail(dot)com>
>> wrote:
>>> On Thu, Feb 11, 2010 at 8:46 AM, Marko Tiikkaja
>>> <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:
>>>> On 2010-02-11 03:44 +0200, I wrote:
>>>>> I'm going to have to disappoint a bunch of people and give up. :-(
>>>>
>>>> Btw. would it make sense to apply the WITH-on-top-of-DML part of this
>>>> patch?  At least to me, this seems useful because you can write a
>>>> RECURSIVE SELECT and then use UPDATE .. FROM or DELETE .. USING on that
>>>> CTE.
>>>
>>> Hmm, that's a thought.  Can you split out just that part?
>>
>> Here's the patch.  It's the same as the stuff in writeable CTE patches,
> but
>> I added regression tests.
>
> Whoops.  The reference section in docs still had some traces of writeable
> CTEs.  Updated patch attached.

This looks simple and useful. I haven't tested it, but if it's really
this easy, we should definitely do it.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-12 05:11:47
Message-ID: 18510.1265951507@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> This looks simple and useful. I haven't tested it, but if it's really
> this easy, we should definitely do it.

I should be out from under the window functions patch tomorrow,
will look at this one then.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-12 05:19:39
Message-ID: 603c8f071002112119g73e3c65ib7be9444fe259455@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 12, 2010 at 12:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> This looks simple and useful.  I haven't tested it, but if it's really
>> this easy, we should definitely do it.
>
> I should be out from under the window functions patch tomorrow,
> will look at this one then.

Cool, thanks.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-13 01:06:43
Message-ID: 16303.1266023203@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
>> Here's the patch. It's the same as the stuff in writeable CTE patches,
>> but I added regression tests.

> Whoops. The reference section in docs still had some traces of writeable
> CTEs. Updated patch attached.

I spent some time playing with this but concluded that it's not
committable. I ran into two significant problems:

1. In an INSERT statement, it's already possible to attach a WITH to
the contained statement, ie
INSERT INTO foo WITH ... SELECT ...
INSERT INTO foo WITH ... VALUES (...)
and the patch wasn't doing anything nice with the case where one tries
to put WITH at both places:
WITH ... INSERT INTO foo WITH ... VALUES (...)
(The SELECT case actually works, mostly, but the VALUES one doesn't.)
I thought about just concat'ing the two WITH lists but this introduces
various strange corner cases; in particular when one list is marked
RECURSIVE and the other isn't there's no way to avoid surprising
behavior. However, since the option for an inner WITH already does
everything you would want, we could just forget about adding outer WITH
for INSERT. The attached modified patch does that.

2. None of the cases play nicely with NEW or OLD references in a rule.
For example,

regression=# create temp table x(f1 int);
CREATE TABLE
regression=# create temp table y(f2 int);
CREATE TABLE
regression=# create rule r2 as on update to x do instead
with t as (select old.*) update y set f2 = t.f1 from t;
CREATE RULE
regression=# update x set f1 = f1+1;
ERROR: bogus local parameter passed to WITH query
regression=#

I don't see any very nice way to fix this. The problem is that the
NEW or OLD reference is treated as though it were a relation of the
main query (the UPDATE in this case), which is something that's not
valid to reference in a WITH query. I'm afraid that it might not
be possible to fix it without significant changes in the way rules
work (and consequent compatibility issues).

We could possibly put in some hack to disallow OLD/NEW references in
the WITH queries, but that got past my threshold of ugliness, so
I'm not going to commit it without further discussion.

Attached is the patch as I had it before giving up (sans documentation
since I'd not gotten to that yet). The main other change from what
you submitted was adding ruleutils.c support.

regards, tom lane

Attachment Content-Type Size
with_on_dml3.patch text/x-patch 11.2 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-13 02:12:43
Message-ID: 603c8f071002121812t21036c7fye50fce1d310449a6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 12, 2010 at 8:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> writes:
>>> Here's the patch.  It's the same as the stuff in writeable CTE patches,
>>> but I added regression tests.
>
>> Whoops.  The reference section in docs still had some traces of writeable
>> CTEs.  Updated patch attached.
>
> I spent some time playing with this but concluded that it's not
> committable.  I ran into two significant problems:
>
> 1. In an INSERT statement, it's already possible to attach a WITH to
> the contained statement, ie
>        INSERT INTO foo WITH ... SELECT ...
>        INSERT INTO foo WITH ... VALUES (...)
> and the patch wasn't doing anything nice with the case where one tries
> to put WITH at both places:
>        WITH ... INSERT INTO foo WITH ... VALUES (...)
> (The SELECT case actually works, mostly, but the VALUES one doesn't.)
> I thought about just concat'ing the two WITH lists but this introduces
> various strange corner cases; in particular when one list is marked
> RECURSIVE and the other isn't there's no way to avoid surprising
> behavior.  However, since the option for an inner WITH already does
> everything you would want, we could just forget about adding outer WITH
> for INSERT.  The attached modified patch does that.

That seems reasonable, though we might want to document that somehow
for posterity.

By the way, are we going to support WITH ... TABLE?

> 2. None of the cases play nicely with NEW or OLD references in a rule.
> For example,
>
> regression=#  create temp table x(f1 int);
> CREATE TABLE
> regression=#  create temp table y(f2 int);
> CREATE TABLE
> regression=# create rule r2 as on update to x do instead
> with t as (select old.*) update y set f2 = t.f1 from t;
> CREATE RULE
> regression=# update x set f1 = f1+1;
> ERROR:  bogus local parameter passed to WITH query
> regression=#
>
> I don't see any very nice way to fix this.  The problem is that the
> NEW or OLD reference is treated as though it were a relation of the
> main query (the UPDATE in this case), which is something that's not
> valid to reference in a WITH query.  I'm afraid that it might not
> be possible to fix it without significant changes in the way rules
> work (and consequent compatibility issues).
>
> We could possibly put in some hack to disallow OLD/NEW references in
> the WITH queries, but that got past my threshold of ugliness, so
> I'm not going to commit it without further discussion.

On the face of it it's not obvious to me why you wouldn't just do
that. If it's not valid to reference them there, then just don't let
it happen (now comes the part where you tell me why it's not that
simple).

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-13 02:31:56
Message-ID: 20913.1266028316@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Feb 12, 2010 at 8:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> We could possibly put in some hack to disallow OLD/NEW references in
>> the WITH queries, but that got past my threshold of ugliness, so
>> I'm not going to commit it without further discussion.

> On the face of it it's not obvious to me why you wouldn't just do
> that. If it's not valid to reference them there, then just don't let
> it happen (now comes the part where you tell me why it's not that
> simple).

Well, there's no obvious-to-the-user reason why it shouldn't work.
If we hack it, then an example like I gave will give a "no such
table" error, and I'll bet long odds we'll get bug reports about it.

(Now maybe we could suppress the bug reports if we could get it to
print something more like "NEW can't be referenced in WITH", but
doing that seems significantly harder --- the way that I can see
to do it would be to not have NEW/OLD in the namespace while parsing
WITH, and that would lead to a pretty stupid error message.)

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Writeable CTEs and empty relations
Date: 2010-02-14 04:47:54
Message-ID: 603c8f071002132047t56813fd8kafdcd9b5741fabc1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 12, 2010 at 9:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Feb 12, 2010 at 8:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> We could possibly put in some hack to disallow OLD/NEW references in
>>> the WITH queries, but that got past my threshold of ugliness, so
>>> I'm not going to commit it without further discussion.
>
>> On the face of it it's not obvious to me why you wouldn't just do
>> that.  If it's not valid to reference them there, then just don't let
>> it happen (now comes the part where you tell me why it's not that
>> simple).
>
> Well, there's no obvious-to-the-user reason why it shouldn't work.
> If we hack it, then an example like I gave will give a "no such
> table" error, and I'll bet long odds we'll get bug reports about it.
>
> (Now maybe we could suppress the bug reports if we could get it to
> print something more like "NEW can't be referenced in WITH", but
> doing that seems significantly harder --- the way that I can see
> to do it would be to not have NEW/OLD in the namespace while parsing
> WITH, and that would lead to a pretty stupid error message.)

Oh, I get it. I thought you were saying that it was semantically
nonsensical, but now I think you're saying that there's some
implementation artifact that prevents us from supporting it.
Personally, I don't use NEW and OLD in UPDATE queries, so it wouldn't
bother me to just disallow them, but (1) I might be in the minority
and (2) even if I'm not, the ugliness factor is something to consider.
So I'm not sure what the right thing to do is, but we need to make up
our mind...

If we do decide to hold off on this, then we should probably have a
discussion of what the right way to fix the rule issues is for 9.1.
If there's no non-ugly way to do this, then we might as well go ahead
and do something ugly. If there is a non-ugly way to fix it, we
should figure out what it is so that those who may be interested in
having this feature can see about coding it up.

...Robert