Is FOR UPDATE an optimization fence?

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Is FOR UPDATE an optimization fence?
Date: 2009-10-11 16:35:07
Message-ID: 7741.1255278907@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm fooling around with pushing FOR UPDATE locking into a new plan node
type, and I just noticed a behavior that seems a bit bogus.
Historically we have dealt with FOR UPDATE in sub-selects by flattening
the sub-select if we could, because the alternative was to fail
altogether. For example, consider

select * from a join (select * from b for update) ss on a.x = ss.y;

The FOR UPDATE effectively got hoisted to the top because that's where
we could implement it, making this equivalent to

select * from a join b on a.x = b.y for update of b;

It seems to me, though, that this is changing the semantics. In the
latter case it's clear that we should only lock b rows that have a join
partner in a (which indeed is what happens). In the former case, what
I think should be expected to happen is that *all* b rows get locked.

With FOR UPDATE as a plan node, it's possible to fix this by treating
FOR UPDATE in a sub-select as an optimization fence that prevents
flattening of the sub-select, much like LIMIT has always done. The
FOR UPDATE node will end up at the top of the subplan and it will act
as the syntax would suggest.

Of course the downside of changing it is that queries that worked fine
before might work differently (and much slower) now; first because not
flattening the sub-select might lead to a worse plan, and second because
locking more rows takes more time.

The alternative would be to let it continue to flatten such sub-selects
when possible, and to tell anyone who doesn't want that to stick in
OFFSET 0 as an optimization fence.

It's an entirely trivial code change either way. I'm inclined to think
that we should prevent flattening, on the grounds of least astonishment.

Comments?

regards, tom lane


From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Is FOR UPDATE an optimization fence?
Date: 2009-10-11 17:52:23
Message-ID: 4AD21B57.10507@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Tom Lane wrote:
> It's an entirely trivial code change either way. I'm inclined to think
> that we should prevent flattening, on the grounds of least astonishment.

Yeah, I also tend towards making FOR UPDATE an optimization fence
(that's how I understood the non-flattening approach). While it might
change behavior compared to previous versions, it doesn't force people
into writing kludges like OFFSET 0.

BTW: how do other databases deal with this? Anything of relevance in the
SQL standards?

Regards

Markus Wanner


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Is FOR UPDATE an optimization fence?
Date: 2009-10-11 18:00:19
Message-ID: 13522.1255284019@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Markus Wanner <markus(at)bluegap(dot)ch> writes:
> BTW: how do other databases deal with this? Anything of relevance in the
> SQL standards?

SQL99 treats FOR UPDATE as an attribute of DECLARE CURSOR, so there's
no way for it to appear in a sub-select per spec. (In general our
approach to FOR UPDATE is only loosely related to what the spec
thinks it does ...)

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Is FOR UPDATE an optimization fence?
Date: 2009-10-11 19:17:57
Message-ID: 603c8f070910111217j23932139wa5d121fd535837d8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 11, 2009 at 12:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm fooling around with pushing FOR UPDATE locking into a new plan node
> type, and I just noticed a behavior that seems a bit bogus.
> Historically we have dealt with FOR UPDATE in sub-selects by flattening
> the sub-select if we could, because the alternative was to fail
> altogether.  For example, consider
>
>        select * from a join (select * from b for update) ss on a.x = ss.y;
>
> The FOR UPDATE effectively got hoisted to the top because that's where
> we could implement it, making this equivalent to
>
>        select * from a join b on a.x = b.y for update of b;
>
> It seems to me, though, that this is changing the semantics.  In the
> latter case it's clear that we should only lock b rows that have a join
> partner in a (which indeed is what happens).  In the former case, what
> I think should be expected to happen is that *all* b rows get locked.
>
> With FOR UPDATE as a plan node, it's possible to fix this by treating
> FOR UPDATE in a sub-select as an optimization fence that prevents
> flattening of the sub-select, much like LIMIT has always done.  The
> FOR UPDATE node will end up at the top of the subplan and it will act
> as the syntax would suggest.
>
> Of course the downside of changing it is that queries that worked fine
> before might work differently (and much slower) now; first because not
> flattening the sub-select might lead to a worse plan, and second because
> locking more rows takes more time.
>
> The alternative would be to let it continue to flatten such sub-selects
> when possible, and to tell anyone who doesn't want that to stick in
> OFFSET 0 as an optimization fence.
>
> It's an entirely trivial code change either way.  I'm inclined to think
> that we should prevent flattening, on the grounds of least astonishment.

It seems like this is somewhat related to the question of embedding an
{INSERT|UPDATE|DELETE}...RETURNING in some arbitrary part of a query
versus only allowing it in a WITH clause. The argument for only
allowing it in a WITH clause is that there is otherwise no guarantee
that it is evaluated in its entirety but just once. ISTM we could
contrariwise give it the handling you're proposing here: allow it
anywhere in the query, but make it act as an optimization fence.

For that reason, I think I'd be inclined to make it act as an
optimization fence if used as a top-level CTE, but otherwise flatten
it, so that the handling is consistent with what we've proposed to do
elsewhere. But I'm not really familiar with how these constructs are
being treated by the executor, so I might be creating a false parallel
here.

The other comment I have is that I *expect* subqueries to be pulled
up. So my own personal POLA would not be violated by locking only the
rows with a join partner; in fact it would be more likely to be
violated by the reverse behavior. I might not be typical, though. My
experience is that not pulling up subqueries tends to have disastrous
effects on performance, so I'm somewhat biased against creating more
situations where that will happen.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Is FOR UPDATE an optimization fence?
Date: 2009-10-11 20:16:39
Message-ID: 15044.1255292199@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Oct 11, 2009 at 12:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It's an entirely trivial code change either way. I'm inclined to think
>> that we should prevent flattening, on the grounds of least astonishment.

> It seems like this is somewhat related to the question of embedding an
> {INSERT|UPDATE|DELETE}...RETURNING in some arbitrary part of a query
> versus only allowing it in a WITH clause. The argument for only
> allowing it in a WITH clause is that there is otherwise no guarantee
> that it is evaluated in its entirety but just once. ISTM we could
> contrariwise give it the handling you're proposing here: allow it
> anywhere in the query, but make it act as an optimization fence.

I don't think this analogy to update queries is a very solid one.
SELECT FOR UPDATE has a couple of properties that make it much less
critical for it to have evaluate-exactly-once semantics:

* Locking the same row twice is a no-op.

* If you don't run the query to completion and thus don't lock all
the rows, so what? Arguably, the semantics of the query are to
lock only the rows actually read/returned.

> For that reason, I think I'd be inclined to make it act as an
> optimization fence if used as a top-level CTE, but otherwise flatten
> it, so that the handling is consistent with what we've proposed to do
> elsewhere.

Well, the point of the WITH RETURNING restriction is that just because a
subselect isn't flattened doesn't mean you have evaluate-exactly-once
semantics for the subselect. The upper query is still free to execute
the subselect multiple times, partially, or not at all. A CTE, on the
other hand, is more than just an optimization fence --- we've decided to
give it evaluate-exactly-once semantics. So SELECT FOR UPDATE at the
top of a CTE is really irrelevant to this discussion; it's going to be
treated the same no matter what.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Is FOR UPDATE an optimization fence?
Date: 2009-10-12 17:59:30
Message-ID: 20772.1255370370@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Oct 11, 2009 at 12:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Of course the downside of changing it is that queries that worked fine
>> before might work differently (and much slower) now; first because not
>> flattening the sub-select might lead to a worse plan, and second because
>> locking more rows takes more time.
>>
>> The alternative would be to let it continue to flatten such sub-selects
>> when possible, and to tell anyone who doesn't want that to stick in
>> OFFSET 0 as an optimization fence.
>>
>> It's an entirely trivial code change either way. I'm inclined to think
>> that we should prevent flattening, on the grounds of least astonishment.

> The other comment I have is that I *expect* subqueries to be pulled
> up. So my own personal POLA would not be violated by locking only the
> rows with a join partner; in fact it would be more likely to be
> violated by the reverse behavior. I might not be typical, though. My
> experience is that not pulling up subqueries tends to have disastrous
> effects on performance, so I'm somewhat biased against creating more
> situations where that will happen.

On further reflection I've decided to stick with the old behavior on
this point, at least for the time being. I'm concerned about subtly
altering the behavior of existing queries, and I've also realized that
changing it isn't as much of a one-liner as I thought. The current
behavior of the parser and rewriter really depends on the assumption
that there's not much of a semantic difference between FOR UPDATE
markings at different syntactic levels, because they will happily push
down a FOR UPDATE *into* a sub-select. That is,

select * from a join (select * from b) ss on a.x = ss.y for update;

gets transformed into

select * from a join (select * from b for update of b) ss
on a.x = ss.y
for update of a;

There isn't any simple way to avoid that with the current RowMarkClause
representation, because it only applies to the current query level.
Maybe we should think about changing that sometime, but it seems like
material for a different patch.

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Is FOR UPDATE an optimization fence?
Date: 2009-10-12 18:21:49
Message-ID: b42b73150910121121m50b8a3ccv4598ce772316ec0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 12, 2009 at 1:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Oct 11, 2009 at 12:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Of course the downside of changing it is that queries that worked fine
>>> before might work differently (and much slower) now; first because not
>>> flattening the sub-select might lead to a worse plan, and second because
>>> locking more rows takes more time.
>>>
>>> The alternative would be to let it continue to flatten such sub-selects
>>> when possible, and to tell anyone who doesn't want that to stick in
>>> OFFSET 0 as an optimization fence.
>>>
>>> It's an entirely trivial code change either way.  I'm inclined to think
>>> that we should prevent flattening, on the grounds of least astonishment.
>
>> The other comment I have is that I *expect* subqueries to be pulled
>> up.  So my own personal POLA would not be violated by locking only the
>> rows with a join partner; in fact it would be more likely to be
>> violated by the reverse behavior.  I might not be typical, though.  My
>> experience is that not pulling up subqueries tends to have disastrous
>> effects on performance, so I'm somewhat biased against creating more
>> situations where that will happen.
>
> On further reflection I've decided to stick with the old behavior on
> this point, at least for the time being.  I'm concerned about subtly
> altering the behavior of existing queries, and I've also realized that
> changing it isn't as much of a one-liner as I thought.  The current
> behavior of the parser and rewriter really depends on the assumption
> that there's not much of a semantic difference between FOR UPDATE
> markings at different syntactic levels, because they will happily push
> down a FOR UPDATE *into* a sub-select.  That is,

For the record, I wasn't sure if I agreed with your original point that:

select * from a join (select * from b for update) ss on a.x = ss.y;

should necessarily be expected to lock all rows from b (does the
standard insist on it?). The select inside the join clause describes
'how you get' the records, not that they should be all gotten. Along
the same vein, does:
create view foo_update as select * from foo for update;

necessarily lock all the rows from foo for any query against the view?
(It doesn't and IMO shouldn't). ISTM that the particular rows being
locked in your first example are not really defined very well.

merlin