"ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"

Lists: pgsql-hackers
From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-14 21:32:28
Message-ID: 603c8f070811141332l3b8ebbc0vaf70219a27cc795@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I found the following behavior surprising. Is there a reason for it?
This is 8.3.5. ...Robert

rhaas=# BEGIN WORK;
BEGIN
rhaas=# CREATE TABLE test_table (id serial, foo integer, bar integer);
NOTICE: CREATE TABLE will create implicit sequence
"test_table_id_seq" for serial column "test_table.id"
CREATE TABLE
rhaas=# INSERT INTO test_table VALUES (DEFAULT, 1, 1);
INSERT 0 1
rhaas=# DECLARE c CURSOR FOR SELECT id FROM test_table ORDER BY foo FOR UPDATE;
DECLARE CURSOR
rhaas=# FETCH c;
id
----
1
(1 row)

rhaas=# UPDATE test_table SET bar = NULL WHERE CURRENT OF c;
ERROR: cursor "c" is not a simply updatable scan of table "test_table"
rhaas=# \q


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-14 21:56:57
Message-ID: 87ej1em0nq.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:

> I found the following behavior surprising. Is there a reason for it?
> This is 8.3.5. ...Robert
>
> rhaas=# DECLARE c CURSOR FOR SELECT id FROM test_table ORDER BY foo FOR UPDATE;
> DECLARE CURSOR

Skimming the code we don't support WHERE CURRENT OF on a query which involves
a Sort above the table specified. The way CURRENT OF works depends on the
nature of the plan and depends on there being an active scan of the specified
table. Since sort reads all its inputs in advance that information is lost by
the time the results are output.

If you had an index it would work. That this is tied to the nature of the plan
does seem kind of unfortunate. I'm not sure the alternatives are very
appealing though -- they would involve a lot of code to track a lot more
information for queries that might never need it.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-14 22:06:04
Message-ID: 21885.1226700364@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
> I found the following behavior surprising. Is there a reason for it?

Yes.

regards, tom lane

(Oh, you wanted to know what the reason is? It's because a sort step
is not going to pass through any tuple identity information.)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-14 22:24:05
Message-ID: 22096.1226701445@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
>> I found the following behavior surprising. Is there a reason for it?
>> This is 8.3.5. ...Robert
>>
>> rhaas=# DECLARE c CURSOR FOR SELECT id FROM test_table ORDER BY foo FOR UPDATE;

> Skimming the code we don't support WHERE CURRENT OF on a query which involves
> a Sort above the table specified. The way CURRENT OF works depends on the
> nature of the plan and depends on there being an active scan of the specified
> table. Since sort reads all its inputs in advance that information is lost by
> the time the results are output.

[ dept of second thoughts... ] Actually, given that he said FOR UPDATE,
the plan should be propagating the tuple identity through to top level
so that execMain can do its thing. So in principle we could probably
get the information without widespread changes. This would fit
reasonably well with the spec's requirements too --- any but trivial
cursors are not deemed updatable unless you say FOR UPDATE. But it
would mean two completely independent implementations within
execCurrent.c...

regards, tom lane


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-14 22:27:48
Message-ID: 603c8f070811141427j1609ad09s84f636cee67d0a30@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> [ dept of second thoughts... ] Actually, given that he said FOR UPDATE,
> the plan should be propagating the tuple identity through to top level
> so that execMain can do its thing. So in principle we could probably
> get the information without widespread changes. This would fit
> reasonably well with the spec's requirements too --- any but trivial
> cursors are not deemed updatable unless you say FOR UPDATE. But it
> would mean two completely independent implementations within
> execCurrent.c...

What's particularly unfortunate is Greg's comment that this would work
if the planner chose an index scan. Had I defined an index on the
columns in question, my code might have worked and been deployed to
production - and then broken if the planner changed its mind and
decided to use a seqscan after all.

ISTM any cursor that's going to be updated should specify WHERE UPDATE
in the query, but maybe that's not a hard requirement as of now.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-15 00:09:54
Message-ID: 28188.1226707794@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> [ dept of second thoughts... ] Actually, given that he said FOR UPDATE,
> the plan should be propagating the tuple identity through to top level
> so that execMain can do its thing. So in principle we could probably
> get the information without widespread changes. This would fit
> reasonably well with the spec's requirements too --- any but trivial
> cursors are not deemed updatable unless you say FOR UPDATE. But it
> would mean two completely independent implementations within
> execCurrent.c...

Here's a draft patch (no docs, no regression test) for that. It doesn't
look as ugly as I expected. Comments? I'm hesitant to call this a bug
fix, and we are past feature freeze ...

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 3.6 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-15 01:06:32
Message-ID: 28666.1226711192@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
> What's particularly unfortunate is Greg's comment that this would work
> if the planner chose an index scan. Had I defined an index on the
> columns in question, my code might have worked and been deployed to
> production - and then broken if the planner changed its mind and
> decided to use a seqscan after all.

> ISTM any cursor that's going to be updated should specify WHERE UPDATE
> in the query, but maybe that's not a hard requirement as of now.

Well, it's contrary to SQL spec, which says that sufficiently simple
cursors should be updatable even if they don't say FOR UPDATE.

However ... the more I think about it, the more I wonder if we shouldn't
rip out the current search-the-plan-tree hack and allow WHERE CURRENT OF
*only* for cursors that say FOR UPDATE. Aside from the above issue,
there's an already known and documented risk if you omit FOR UPDATE,
which is that your WHERE CURRENT OF update silently becomes a no-op
if someone else has already updated the target row since your query
started. It seems like not using FOR UPDATE is sufficiently dangerous
practice that requiring it wouldn't be doing our users a disservice.

There is one thing we lack in order to go that far, though: the current
implementation of WHERE CURRENT OF can cope with inheritance queries,
that is you can say
DECLARE c CURSOR ... FROM parent_table ...
UPDATE parent_table ... WHERE CURRENT OF c
and the right things will happen even if the cursor is currently showing
a row from some child table. SELECT FOR UPDATE does not presently
support inheritance, so we'd really have to fix that in order to not
have a loss of functionality. This is something that was on my private
to-do list for 8.4 but I hadn't thought of an easy way to do it. But,
revisiting the issue just now, I have an idea: when a FOR UPDATE target
is an inheritance tree, make the plan return not only ctid as a junk
column, but also tableoid. The executor top level could easily use that
to determine which table to actually try to do heap_lock_tuple in.
I haven't looked at the planner code for this lately, but I'm thinking
it is probably less than a day's work to make it happen.

Comments?

regards, tom lane


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-15 01:18:19
Message-ID: 603c8f070811141718ie494ef2mf095004cd82c5ed@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Here's a draft patch (no docs, no regression test) for that. It doesn't
> look as ugly as I expected. Comments? I'm hesitant to call this a bug
> fix, and we are past feature freeze ...

Considering the number of and invasiveness of the patches remaining in
the queue, I'm inclined to consider this a drop in the bucket, but
then I'm biased since I just got bitten by it.

...Robert


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-15 01:24:00
Message-ID: 603c8f070811141724w6ef8fe82ic25c929672c2bb60@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> However ... the more I think about it, the more I wonder if we shouldn't
> rip out the current search-the-plan-tree hack and allow WHERE CURRENT OF
> *only* for cursors that say FOR UPDATE. Aside from the above issue,
> there's an already known and documented risk if you omit FOR UPDATE,
> which is that your WHERE CURRENT OF update silently becomes a no-op
> if someone else has already updated the target row since your query
> started. It seems like not using FOR UPDATE is sufficiently dangerous
> practice that requiring it wouldn't be doing our users a disservice.

Yes, I was reading that documentation today and scratching my head.
It didn't quite dawn on me that the solution was to just always use
FOR UPDATE, but certainly if it is then you may as well enforce it.
I've encountered that no-op behavior in other situations in the past,
specifically BEFORE triggers, and it's really weird and confusing,
because you typically have to be doing something fairly complicated
before the problem case arises, and then it mysteriously fails to
work.

...Robert


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-15 05:56:23
Message-ID: 873ahtmt14.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> "Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
>> What's particularly unfortunate is Greg's comment that this would work
>> if the planner chose an index scan. Had I defined an index on the
>> columns in question, my code might have worked and been deployed to
>> production - and then broken if the planner changed its mind and
>> decided to use a seqscan after all.
>
>> ISTM any cursor that's going to be updated should specify WHERE UPDATE
>> in the query, but maybe that's not a hard requirement as of now.
>
> Well, it's contrary to SQL spec, which says that sufficiently simple
> cursors should be updatable even if they don't say FOR UPDATE.
>
> However ... the more I think about it, the more I wonder if we shouldn't
> rip out the current search-the-plan-tree hack and allow WHERE CURRENT OF
> *only* for cursors that say FOR UPDATE.

It is tempting since it would make application code which looped updating
WHERE CURRENT OF semantically equivalent to UPDATE FROM. It seems better to
have one set of functionality rather than two similar but subtly different
sets of functionality available depending on the coding style.

> Aside from the above issue, there's an already known and documented risk if
> you omit FOR UPDATE, which is that your WHERE CURRENT OF update silently
> becomes a no-op if someone else has already updated the target row since
> your query started. It seems like not using FOR UPDATE is sufficiently
> dangerous practice that requiring it wouldn't be doing our users a
> disservice.

Could we implicitly add FOR UPDATE when planning and executing a cursor of a
sufficiently simple query? How simple does the spec say the query needs to be?

> There is one thing we lack in order to go that far, though: the current
> implementation of WHERE CURRENT OF can cope with inheritance queries,

How would this implementation relate to the issues described in
inheritance_planner (which always seemed strange):

* inheritance_planner
* Generate a plan in the case where the result relation is an
* inheritance set.
*
* We have to handle this case differently from cases where a source relation
* is an inheritance set. Source inheritance is expanded at the bottom of the
* plan tree (see allpaths.c), but target inheritance has to be expanded at
* the top. The reason is that for UPDATE, each target relation needs a
* different targetlist matching its own column set. Also, for both UPDATE
* and DELETE, the executor needs the Append plan node at the top, else it
* can't keep track of which table is the current target table. Fortunately,
* the UPDATE/DELETE target can never be the nullable side of an outer join,
* so it's OK to generate the plan this way.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-15 22:06:09
Message-ID: 7292.1226786769@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Aside from the above issue, there's an already known and documented risk if
>> you omit FOR UPDATE, which is that your WHERE CURRENT OF update silently
>> becomes a no-op if someone else has already updated the target row since
>> your query started. It seems like not using FOR UPDATE is sufficiently
>> dangerous practice that requiring it wouldn't be doing our users a
>> disservice.

> Could we implicitly add FOR UPDATE when planning and executing a cursor of a
> sufficiently simple query?

No, not unless you want plain SELECTs to suddenly start blocking each
other.

>> There is one thing we lack in order to go that far, though: the current
>> implementation of WHERE CURRENT OF can cope with inheritance queries,

> How would this implementation relate to the issues described in
> inheritance_planner (which always seemed strange):

Yeah, it is very tempting to think about getting rid of all the
inherited-target cruft (both in the planner, and in the executor's weird
interactions between nodeAppend and execMain) in favor of using a
tableoid junk column to figure out which target rel to update.
However there's one other nasty problem to fix, which is that in an
inherited UPDATE you may need a different update targetlist for each
target relation. I'm not seeing a solution for that yet in the context
of this simplified approach.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "ORDER BY" clause prevents "UPDATE WHERE CURRENT OF"
Date: 2008-11-16 00:25:58
Message-ID: 13531.1226795158@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Well, it's contrary to SQL spec, which says that sufficiently simple
>> cursors should be updatable even if they don't say FOR UPDATE.
>>
>> However ... the more I think about it, the more I wonder if we shouldn't
>> rip out the current search-the-plan-tree hack and allow WHERE CURRENT OF
>> *only* for cursors that say FOR UPDATE.

> It is tempting since it would make application code which looped updating
> WHERE CURRENT OF semantically equivalent to UPDATE FROM. It seems better to
> have one set of functionality rather than two similar but subtly different
> sets of functionality available depending on the coding style.

After playing around with this for awhile I realized that there really
would be a functional loss if we remove support for WHERE CURRENT OF
with non-FOR-UPDATE cursors. Namely, that a non-FOR-UPDATE cursor is
insensitive to subsequent updates, which sometimes is handy. There are
examples of both cases in the portals.sql regression test.

So what I now propose is:

1. If the cursor includes FOR UPDATE/FOR SHARE, use the ExecRowMark
technique to get the target row identity. In this path we fail if there
is not exactly one FOR UPDATE reference to the UPDATE's target table.
(An interesting property here is that you can update from a self-join,
if you mark only one arm of the join as FOR UPDATE. See example in
attached regression test additions.)

2. If the cursor doesn't have FOR UPDATE/SHARE, use the existing code.
In this path we fail if the plan is "too complicated". However, it
shouldn't fail for any case that is simply updatable according to the
SQL spec.

We should revise the documentation to make it very clear that FOR UPDATE
is the preferred way, but that sometimes you might need the other.

Attached is a draft patch that is code-complete but lacks documentation.
The patch is against CVS HEAD, ie, it assumes the SELECT FOR UPDATE
inheritance fixes I made earlier today.

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 12.3 KB