Quick Links

Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order

Lists:	pgsql-hackers

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-26 02:34:02
Message-ID:	1611.1256524442@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Now that we've got a hopefully-non-broken implementation of SELECT FOR
UPDATE locking as a plan node, we can finally contemplate fixing two
misbehaviors that are called out on the SELECT reference page:

It is possible for a SELECT command using both LIMIT and FOR
UPDATE/SHARE clauses to return fewer rows than specified by LIMIT. This
is because LIMIT is applied first. The command selects the specified
number of rows, but might then block trying to obtain a lock on one or
more of them. Once the SELECT unblocks, the row might have been deleted
or updated so that it does not meet the query WHERE condition anymore,
in which case it will not be returned.

Similarly, it is possible for a SELECT command using ORDER BY and FOR
UPDATE/SHARE to return rows out of order. This is because ORDER BY is
applied first. The command orders the result, but might then block
trying to obtain a lock on one or more of the rows. Once the SELECT
unblocks, one of the ordered columns might have been modified and be
returned out of order. A workaround is to perform SELECT ... FOR
UPDATE/SHARE and then SELECT ... ORDER BY.

All that we have to do to fix the first one is to put the LockRows node
below the Limit node instead of above it. The solution for the second
one is to also put LockRows underneath the Sort node, and to regard its
output as unsorted so that a Sort node will certainly be generated.
(This in turn implies that we should prefer the cheapest-total plan
for the rest of the query.)

Does anyone have any objections to this? I can't see that it will break
any applications that work today, but maybe I'm missing something.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-26 03:29:59
Message-ID:	23F588E6-94A4-4E22-A4B2-96BA1840BE0C@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Oct 25, 2009, at 10:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Now that we've got a hopefully-non-broken implementation of SELECT FOR
> UPDATE locking as a plan node, we can finally contemplate fixing two
> misbehaviors that are called out on the SELECT reference page:
>
> It is possible for a SELECT command using both LIMIT and FOR
> UPDATE/SHARE clauses to return fewer rows than specified by
> LIMIT. This
> is because LIMIT is applied first. The command selects the
> specified
> number of rows, but might then block trying to obtain a lock on
> one or
> more of them. Once the SELECT unblocks, the row might have been
> deleted
> or updated so that it does not meet the query WHERE condition
> anymore,
> in which case it will not be returned.
>
> Similarly, it is possible for a SELECT command using ORDER BY and
> FOR
> UPDATE/SHARE to return rows out of order. This is because ORDER
> BY is
> applied first. The command orders the result, but might then block
> trying to obtain a lock on one or more of the rows. Once the SELECT
> unblocks, one of the ordered columns might have been modified and
> be
> returned out of order. A workaround is to perform SELECT ... FOR
> UPDATE/SHARE and then SELECT ... ORDER BY.
>
> All that we have to do to fix the first one is to put the LockRows
> node
> below the Limit node instead of above it. The solution for the second
> one is to also put LockRows underneath the Sort node, and to regard
> its
> output as unsorted so that a Sort node will certainly be generated.
> (This in turn implies that we should prefer the cheapest-total plan
> for the rest of the query.)

This seems like it could potentially introduce a performance
regression, but the current behavior is so bizarre that it seems like
we should still change it.

> Does anyone have any objections to this? I can't see that it will
> break
> any applications that work today, but maybe I'm missing something.

I'm pretty excited about this. It is a nifty piece of foot-gun
removal. Thanks!

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	"pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-26 03:40:26
Message-ID:	2399.1256528426@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Oct 25, 2009, at 10:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... The solution for the second
>> one is to also put LockRows underneath the Sort node, and to regard its
>> output as unsorted so that a Sort node will certainly be generated.
>> (This in turn implies that we should prefer the cheapest-total plan
>> for the rest of the query.)

> This seems like it could potentially introduce a performance
> regression, but the current behavior is so bizarre that it seems like
> we should still change it.

Yeah, it could definitely run slower than the existing code --- in
particular the combination of all three (FOR UPDATE ORDER BY LIMIT)
would tend to become a seqscan-and-sort rather than possibly just
reading one end of an index. However, I quote the old aphorism that
it can be made indefinitely fast if it doesn't have to give the right
answer. The reason the current behavior is fast is it's giving the
wrong answer :-(

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-26 12:53:52
Message-ID:	20091026125351.GA8812@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane escribió:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:

> > This seems like it could potentially introduce a performance
> > regression, but the current behavior is so bizarre that it seems like
> > we should still change it.
>
> Yeah, it could definitely run slower than the existing code --- in
> particular the combination of all three (FOR UPDATE ORDER BY LIMIT)
> would tend to become a seqscan-and-sort rather than possibly just
> reading one end of an index. However, I quote the old aphorism that
> it can be made indefinitely fast if it doesn't have to give the right
> answer. The reason the current behavior is fast is it's giving the
> wrong answer :-(

So this probably merits a warning in the release notes for people to
check that their queries continue to run with the performance they
expect.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-26 14:30:04
Message-ID:	28333.1256567404@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane escribi:
>> Yeah, it could definitely run slower than the existing code --- in
>> particular the combination of all three (FOR UPDATE ORDER BY LIMIT)
>> would tend to become a seqscan-and-sort rather than possibly just
>> reading one end of an index. However, I quote the old aphorism that
>> it can be made indefinitely fast if it doesn't have to give the right
>> answer. The reason the current behavior is fast is it's giving the
>> wrong answer :-(

> So this probably merits a warning in the release notes for people to
> check that their queries continue to run with the performance they
> expect.

One problem with this is that there isn't any good way for someone to
get back the old behavior if they want to. Which might be a perfectly
reasonable thing, eg if they know that no concurrent update is supposed
to change the sort-key column. The obvious thing would be to allow

select * from (select * from foo order by col limit 10) ss for update;

to apply the FOR UPDATE last. Unfortunately, that's not how it works
now because the FOR UPDATE will get pushed down into the subquery.
I was shot down when I proposed a related change, a couple weeks ago
http://archives.postgresql.org/message-id/7741.1255278907@sss.pgh.pa.us
but it seems like we might want to reconsider.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-26 14:44:38
Message-ID:	603c8f070910260744g5ac2ef9agad9789b715253c8f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 26, 2009 at 10:30 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Tom Lane escribió:
>>> Yeah, it could definitely run slower than the existing code --- in
>>> particular the combination of all three (FOR UPDATE ORDER BY LIMIT)
>>> would tend to become a seqscan-and-sort rather than possibly just
>>> reading one end of an index. However, I quote the old aphorism that
>>> it can be made indefinitely fast if it doesn't have to give the right
>>> answer. The reason the current behavior is fast is it's giving the
>>> wrong answer :-(
>
>> So this probably merits a warning in the release notes for people to
>> check that their queries continue to run with the performance they
>> expect.
>
> One problem with this is that there isn't any good way for someone to
> get back the old behavior if they want to. Which might be a perfectly
> reasonable thing, eg if they know that no concurrent update is supposed
> to change the sort-key column. The obvious thing would be to allow
>
> select * from (select * from foo order by col limit 10) ss for update;
>
> to apply the FOR UPDATE last. Unfortunately, that's not how it works
> now because the FOR UPDATE will get pushed down into the subquery.
> I was shot down when I proposed a related change, a couple weeks ago
> http://archives.postgresql.org/message-id/7741.1255278907@sss.pgh.pa.us
> but it seems like we might want to reconsider.

"Shot down" might be an overstatement of the somewhat cautious
reaction that proposal. :-)

Could the desired behavior be obtained using a CTE?

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-26 15:54:15
Message-ID:	6403.1256572455@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Oct 26, 2009 at 10:30 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> One problem with this is that there isn't any good way for someone to
>> get back the old behavior if they want to. Which might be a perfectly
>> reasonable thing, eg if they know that no concurrent update is supposed
>> to change the sort-key column. The obvious thing would be to allow
>>
>> select * from (select * from foo order by col limit 10) ss for update;
>>
>> to apply the FOR UPDATE last. Unfortunately, that's not how it works
>> now because the FOR UPDATE will get pushed down into the subquery.

> Could the desired behavior be obtained using a CTE?

Nope, we push FOR UPDATE into WITHs too. I don't really see any way to
deal with this without some sort of semantic changes.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-26 16:06:35
Message-ID:	6562.1256573195@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Could the desired behavior be obtained using a CTE?

> Nope, we push FOR UPDATE into WITHs too. I don't really see any way to
> deal with this without some sort of semantic changes.

... although on reflection, I'm not sure *why* we push FOR UPDATE into
WITHs. That seems a bit antithetical to the position we've evolved that
WITH queries are executed independently of the outer query.

If we removed that bit of behavior, which hopefully is too new for much
code to depend on, then the old FOR-UPDATE-last behavior could be
attained via a WITH. And we'd not have to risk touching the interaction
between plain subqueries and FOR UPDATE, which is something that seems
much more likely to break existing apps.

That seems like a reasonable compromise to me ... any objections?

regards, tom lane

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-26 17:58:14
Message-ID:	407d949e0910261058v476ac5f4id4c84724af0dc89e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Oct 25, 2009 at 7:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> All that we have to do to fix the first one is to put the LockRows node
> below the Limit node instead of above it. The solution for the second
> one is to also put LockRows underneath the Sort node, and to regard its
> output as unsorted so that a Sort node will certainly be generated.
> (This in turn implies that we should prefer the cheapest-total plan
> for the rest of the query.)

I'm not following how this would work. Would it mean that every record
would end up being locked?

--
greg

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-26 18:01:57
Message-ID:	13436.1256580117@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Sun, Oct 25, 2009 at 7:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> All that we have to do to fix the first one is to put the LockRows node
>> below the Limit node instead of above it. The solution for the second
>> one is to also put LockRows underneath the Sort node, and to regard its
>> output as unsorted so that a Sort node will certainly be generated.
>> (This in turn implies that we should prefer the cheapest-total plan
>> for the rest of the query.)

> I'm not following how this would work. Would it mean that every record
> would end up being locked?

In the case of LIMIT, no, because LIMIT doesn't fetch any more rows than
it needs from its input node. In the case of ORDER BY, yes,
potentially. So we might conceivably decide we should fix the first
issue and not the second. However, I'd prefer to have a solution
whereby the query does what it appears to mean and you have to write
something more complicated if you want performance over correctness.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-27 15:22:48
Message-ID:	14029.1256656968@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> Could the desired behavior be obtained using a CTE?

>> Nope, we push FOR UPDATE into WITHs too. I don't really see any way to
>> deal with this without some sort of semantic changes.

> ... although on reflection, I'm not sure *why* we push FOR UPDATE into
> WITHs. That seems a bit antithetical to the position we've evolved that
> WITH queries are executed independently of the outer query.

> If we removed that bit of behavior, which hopefully is too new for much
> code to depend on, then the old FOR-UPDATE-last behavior could be
> attained via a WITH. And we'd not have to risk touching the interaction
> between plain subqueries and FOR UPDATE, which is something that seems
> much more likely to break existing apps.

On further investigation, this still seems like a good change to make,
but it doesn't solve the problem at hand. If we make FOR UPDATE not
propagate into WITH then the effect of

with w as (select ... order by x limit n) select * from w for update

is not going to be to lock just the rows pulled from the WITH; it's
going to be to not lock any rows at all. So we're still up against a
performance problem if we make these otherwise-desirable changes in plan
node order.

I realized just now that the backwards-compatibility problem is not
nearly as bad as I thought it was, because a lot of the syntaxes
we might want to change the behavior of will fail outright in 8.4
and before. We had this little bit in preptlist.c:

/*
* Currently the executor only supports FOR UPDATE/SHARE at top level
*/
if (root->query_level > 1)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("SELECT FOR UPDATE/SHARE is not allowed in subqueries")));

and that is the error you will get if you have FOR UPDATE in or applying
to a sub-select that does not get flattened into the calling query.
So in particular, a sub-select using ORDER BY or LIMIT has never been
compatible with FOR UPDATE at all, and that means we can define the
behavior of that combination freely.

What I am thinking we should do is define that FOR UPDATE happens before
ORDER BY or LIMIT normally, but that if the FOR UPDATE is inherited from
an outer query level, it happens after the sub-select's ORDER BY or
LIMIT. The first provision fixes the bugs noted in our documentation,
and the second one allows people to get back the old behavior if they
need it for performance. This also seems reasonably non-astonishing
from a semantic viewpoint.

Actually implementing this will be more than a one-line change, but it
doesn't seem too terribly difficult --- we'll just need to modify the
query representation of FOR UPDATE enough that we can remember which
case we had.

Comments?

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-27 16:59:37
Message-ID:	603c8f070910270959y6c21dfb1x8215e5c14be25e60@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 27, 2009 at 11:22 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> Could the desired behavior be obtained using a CTE?
>
>>> Nope, we push FOR UPDATE into WITHs too. I don't really see any way to
>>> deal with this without some sort of semantic changes.
>
>> ... although on reflection, I'm not sure *why* we push FOR UPDATE into
>> WITHs. That seems a bit antithetical to the position we've evolved that
>> WITH queries are executed independently of the outer query.
>
>> If we removed that bit of behavior, which hopefully is too new for much
>> code to depend on, then the old FOR-UPDATE-last behavior could be
>> attained via a WITH. And we'd not have to risk touching the interaction
>> between plain subqueries and FOR UPDATE, which is something that seems
>> much more likely to break existing apps.
>
> On further investigation, this still seems like a good change to make,
> but it doesn't solve the problem at hand. If we make FOR UPDATE not
> propagate into WITH then the effect of
>
> with w as (select ... order by x limit n) select * from w for update
>
> is not going to be to lock just the rows pulled from the WITH; it's
> going to be to not lock any rows at all. So we're still up against a
> performance problem if we make these otherwise-desirable changes in plan
> node order.
>
> I realized just now that the backwards-compatibility problem is not
> nearly as bad as I thought it was, because a lot of the syntaxes
> we might want to change the behavior of will fail outright in 8.4
> and before. We had this little bit in preptlist.c:
>
> /*
> * Currently the executor only supports FOR UPDATE/SHARE at top level
> */
> if (root->query_level > 1)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("SELECT FOR UPDATE/SHARE is not allowed in subqueries")));
>
> and that is the error you will get if you have FOR UPDATE in or applying
> to a sub-select that does not get flattened into the calling query.
> So in particular, a sub-select using ORDER BY or LIMIT has never been
> compatible with FOR UPDATE at all, and that means we can define the
> behavior of that combination freely.
>
> What I am thinking we should do is define that FOR UPDATE happens before
> ORDER BY or LIMIT normally, but that if the FOR UPDATE is inherited from
> an outer query level, it happens after the sub-select's ORDER BY or
> LIMIT. The first provision fixes the bugs noted in our documentation,
> and the second one allows people to get back the old behavior if they
> need it for performance. This also seems reasonably non-astonishing
> from a semantic viewpoint.
>
> Actually implementing this will be more than a one-line change, but it
> doesn't seem too terribly difficult --- we'll just need to modify the
> query representation of FOR UPDATE enough that we can remember which
> case we had.

When you refer to an "outer query level", is that the same thing as a
sub-select? If so, I think I agree that the behavior is
non-astonishing.

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-27 17:06:30
Message-ID:	10454.1256663190@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Oct 27, 2009 at 11:22 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> What I am thinking we should do is define that FOR UPDATE happens before
>> ORDER BY or LIMIT normally, but that if the FOR UPDATE is inherited from
>> an outer query level, it happens after the sub-select's ORDER BY or
>> LIMIT. The first provision fixes the bugs noted in our documentation,
>> and the second one allows people to get back the old behavior if they
>> need it for performance. This also seems reasonably non-astonishing
>> from a semantic viewpoint.

> When you refer to an "outer query level", is that the same thing as a
> sub-select? If so, I think I agree that the behavior is
> non-astonishing.

Right, the case would be something like

select * from
(select * from foo order by x limit n) ss
for update of ss;

If you try this in any existing release it will just fail, because the
planner knows that it hasn't got a way to execute FOR UPDATE in a
subquery.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-27 17:51:47
Message-ID:	603c8f070910271051h410b227ft319e01e8f0fa5c8e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 27, 2009 at 1:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Oct 27, 2009 at 11:22 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> What I am thinking we should do is define that FOR UPDATE happens before
>>> ORDER BY or LIMIT normally, but that if the FOR UPDATE is inherited from
>>> an outer query level, it happens after the sub-select's ORDER BY or
>>> LIMIT. The first provision fixes the bugs noted in our documentation,
>>> and the second one allows people to get back the old behavior if they
>>> need it for performance. This also seems reasonably non-astonishing
>>> from a semantic viewpoint.
>
>> When you refer to an "outer query level", is that the same thing as a
>> sub-select? If so, I think I agree that the behavior is
>> non-astonishing.
>
> Right, the case would be something like
>
> select * from
> (select * from foo order by x limit n) ss
> for update of ss;
>
> If you try this in any existing release it will just fail, because the
> planner knows that it hasn't got a way to execute FOR UPDATE in a
> subquery.

That's a pretty odd construction.

In some sense I don't like the proposed behavior, because it's
imaginable that someone would use this syntax without realizing that
it could produce wrong answers. My own gut instinct would be to
always push down the FOR UPDATE as being a clearer way to convey what
was meant - but we've already established that not everyone's gut
instincts agree with mine, and if someone does write this, they might
easily fail to understand the risk that it poses.

I'm not sure what to do about it, though. Not giving people ANY way
to recover the old behavior is a little troubling.

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-27 20:42:32
Message-ID:	5841.1256676152@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Oct 27, 2009 at 1:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Right, the case would be something like
>>
>> select * from
>> (select * from foo order by x limit n) ss
>> for update of ss;

> That's a pretty odd construction.

Dunno why you think that. That's exactly what one would write if one
wanted certain operations to execute in a different order than they're
defined to execute in within a single query level. We have not
previously been very clear about the order of operations for FOR UPDATE
locking relative to other steps, but now we will be.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Endgame for all those SELECT FOR UPDATE changes: fix plan node order
Date:	2009-10-27 21:05:45
Message-ID:	6197.1256677545@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Oct 27, 2009 at 1:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Right, the case would be something like

>>> select * from
>>> (select * from foo order by x limit n) ss
>>> for update of ss;

>> That's a pretty odd construction.

> Dunno why you think that. That's exactly what one would write if one
> wanted certain operations to execute in a different order than they're
> defined to execute in within a single query level. We have not
> previously been very clear about the order of operations for FOR UPDATE
> locking relative to other steps, but now we will be.

Actually ... it strikes me that there is another way we could approach
this. Namely, leave the semantics as-is (FOR UPDATE runs last) and
document that you can do

select * from
(select * from foo for update) ss
order by x limit n;

if you need FOR UPDATE to run before sorting. Or perhaps better,
redefine the ordering as ORDER BY then FOR UPDATE then LIMIT. Swapping
FOR UPDATE and LIMIT has no performance cost and eliminates the worse of
the two complaints in the documentation, without breaking any working
queries AFAICS. If you have the case where you want to cope with
concurrent updates to the sort key, then you can write the more
complicated query, and it's gonna cost ya. But that's not a typical
usage, as proven by the fact that it took years to realize there was
a problem there. So we shouldn't optimize for that usage at the expense
of cases where the sort key isn't expected to change.

It could be argued that this approach doesn't satisfy the principle of
least astonishment as well as doing FOR UPDATE first, but on reflection
I'm not sure I buy that. The traditional definition has been that we
only lock the rows that are actually returned, and putting FOR UPDATE
underneath the sort will break that expectation. If it's only underneath
LIMIT we can still meet that expectation.

So I'm liking this more the more I think about it ... and it's also
significantly less work than the other way would be.

regards, tom lane