cursors FOR UPDATE don't return most recent row

Lists: pgsql-hackers
From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: cursors FOR UPDATE don't return most recent row
Date: 2012-01-26 04:03:20
Message-ID: 1327536411-sup-7848@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


This is my test case (all in one session):

CREATE TABLE foo (
key int PRIMARY KEY,
value int
);

INSERT INTO foo VALUES (1, 1);

BEGIN;
DECLARE foo CURSOR FOR SELECT * FROM foo FOR UPDATE;
UPDATE foo SET value = 2 WHERE key = 1;
UPDATE foo SET value = 3 WHERE key = 1;
FETCH 1 FROM foo;
COMMIT;

I expected the FETCH to return one row, with the latest data, i.e.
(1, 3), but instead it's returning empty.

If instead I run both UPDATEs in another session, then I do get

alvherre=# FETCH 1 FROM foo;
key | value
-----+-------
1 | 3
(1 fila)

Is this intended?

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: cursors FOR UPDATE don't return most recent row
Date: 2012-01-28 04:35:33
Message-ID: 4086.1327725333@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> I expected the FETCH to return one row, with the latest data, i.e.
> (1, 3), but instead it's returning empty.

This is the same thing I was complaining about in the bug #6123 thread,
http://archives.postgresql.org/message-id/9698.1327266271@sss.pgh.pa.us

It looks a bit ticklish to fix.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: cursors FOR UPDATE don't return most recent row
Date: 2012-01-30 01:07:06
Message-ID: 1327885054-sup-7077@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Tom Lane's message of sáb ene 28 01:35:33 -0300 2012:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > I expected the FETCH to return one row, with the latest data, i.e.
> > (1, 3), but instead it's returning empty.
>
> This is the same thing I was complaining about in the bug #6123 thread,
> http://archives.postgresql.org/message-id/9698.1327266271@sss.pgh.pa.us
>
> It looks a bit ticklish to fix.

Hm. Okay, I hadn't read that.

In my FOR KEY SHARE patch I have added a heap_lock_updated_tuple that
makes heap_lock_tuple follow the update chain forward when the tuple
being locked is being updated by a concurrent transaction. I haven't
traced through FETCH to see if it makes sense to apply some of that to
it.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: cursors FOR UPDATE don't return most recent row
Date: 2012-01-30 01:13:43
Message-ID: 22199.1327886023@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of sb ene 28 01:35:33 -0300 2012:
>> This is the same thing I was complaining about in the bug #6123 thread,
>> http://archives.postgresql.org/message-id/9698.1327266271@sss.pgh.pa.us

> Hm. Okay, I hadn't read that.
> In my FOR KEY SHARE patch I have added a heap_lock_updated_tuple that
> makes heap_lock_tuple follow the update chain forward when the tuple
> being locked is being updated by a concurrent transaction.

Um, we do that already, no? Certainly in READ COMMITTED queries, we
will do so, though it happens at a higher level than heap_lock_tuple.

> I haven't traced through FETCH to see if it makes sense to apply some
> of that to it.

The issue here is what to do when the update came from our *own*
transaction. In particular I'm a bit worried about avoiding what the
code calls the Halloween problem, namely an infinite loop of re-updating
the same tuple if the scan keeps coming across newer versions.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: cursors FOR UPDATE don't return most recent row
Date: 2012-01-30 02:09:02
Message-ID: 1327888710-sup-5047@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Tom Lane's message of dom ene 29 22:13:43 -0300 2012:
>
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Excerpts from Tom Lane's message of sáb ene 28 01:35:33 -0300 2012:
> >> This is the same thing I was complaining about in the bug #6123 thread,
> >> http://archives.postgresql.org/message-id/9698.1327266271@sss.pgh.pa.us
>
> > Hm. Okay, I hadn't read that.
> > In my FOR KEY SHARE patch I have added a heap_lock_updated_tuple that
> > makes heap_lock_tuple follow the update chain forward when the tuple
> > being locked is being updated by a concurrent transaction.
>
> Um, we do that already, no? Certainly in READ COMMITTED queries, we
> will do so, though it happens at a higher level than heap_lock_tuple.

Well, it's not quite the same thing. Consider this isolation spec file:

# When a tuple that has been updated is locked, the locking command
# should traverse the update chain; thus, a DELETE should not be able
# to proceed until the lock has been released.

setup
{
CREATE TABLE foo (
key int PRIMARY KEY,
value int
);

INSERT INTO foo VALUES (1, 1);
}

teardown
{
DROP TABLE foo;
}

session "s1"
step "s1b" { BEGIN ISOLATION LEVEL REPEATABLE READ; }
step "s1s" { SELECT * FROM foo; } # obtain snapshot
step "s1l" { SELECT * FROM foo FOR KEY SHARE; } # obtain lock
step "s1c" { COMMIT; }

session "s2"
step "s2b" { BEGIN; }
step "s2u" { UPDATE foo SET value = 2 WHERE key = 1; }
step "s2c" { COMMIT; }
step "s2d" { DELETE FROM foo WHERE key = 1; }

permutation "s1b" "s2b" "s1s" "s2u" "s1l" "s2c" "s2d" "s1c"

Note that session s1 is using repeatable read isolation level, and the
snapshot is older than the update in session s2, so the row it sees is
correctly the old one; however, in order for the delete to honour the
lock (which is necessary for correctness), it has to be propagated up to
tuples that the lock doesn't see itself. Only the old row is returned;
newer rows are locked too, but not returned. So they don't get back to
the executor at all.

> > I haven't traced through FETCH to see if it makes sense to apply some
> > of that to it.
>
> The issue here is what to do when the update came from our *own*
> transaction. In particular I'm a bit worried about avoiding what the
> code calls the Halloween problem, namely an infinite loop of re-updating
> the same tuple if the scan keeps coming across newer versions.

Hmm. Since locking rows does not create new versions, I don't quite see
how we could get into such a problem. A scan should only see each
version once, and will discard all but one due to visibility. This new
routine of mine only follows the ctids to future versions on updated
tuples; there's no new scan.

If I'm wrong about this, I'd sure like to be aware :-)

The fact that SELECT FOR UPDATE returns empty means that so far I've
been unable to exercise the SelfUpdate case in the new routine. The
test case I pasted above started working as I intended once I wrote it;
previously, the DELETE would just be allowed to continue immediately
without blocking.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support