Re: SKIP LOCKED DATA (work in progress)

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Thomas Munro <munro(at)ip9(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SKIP LOCKED DATA (work in progress)
Date: 2014-07-23 12:15:11
Message-ID: CAApHDvqOZGAdSexbMsVamr21QvPV=ELMhZiUPbzQeSKcbMo56Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 29, 2014 at 9:01 PM, Thomas Munro <munro(at)ip9(dot)org> wrote:

>
> Please find attached a rebased version of my SKIP LOCKED
> patch (formerly SKIP LOCKED DATA), updated to support only the
> Oracle-like syntax.
>
>
Hi Thomas,

Apologies for taking this long to get to reviewing this, I'd gotten a bit
side tracked with my own patches during this commitfest.

I'm really glad to see this patch is back again. I think it will be very
useful for processing queues. I could have made good use of it in my last
work, using it for sending unsent emails which were "queued" up in a table
in the database.

I've so far read over the patch and done only some brief tests of the
functionality.

Here's what I've picked up on so far:

* In heap_lock_tuple() there's a few places where you test the wait_policy,
but you're not always doing this in the same order. The previous code did
if (nolock) first each time, but since there's now 3 values I think if
(wait_policy == LockWaitError) should be first each time as likely this is
the most common case.

* The following small whitespace change can be removed in gram.y:

@@ -119,7 +119,6 @@ typedef struct PrivTarget
#define CAS_NOT_VALID 0x10
#define CAS_NO_INHERIT 0x20

-
#define parser_yyerror(msg) scanner_yyerror(msg, yyscanner)
#define parser_errposition(pos) scanner_errposition(pos, yyscanner)

* analyze.c. rc->waitPolicy = Max(rc->waitPolicy, waitPolicy);
I'm not quite sure I understand this yet, but I'm guessing the original
code was there so that something like:

SELECT * FROM (SELECT * FROM a FOR UPDATE) a FOR UPDATE NOWAIT;
Would give:
ERROR: could not obtain lock on row in relation "a"

So it seems that with the patch as you've defined in by the order of the
enum values in LockClauseWaitPolicy that SKIP LOCKED overrides FOR UPDATE.
I'm wondering if this is dangerous.

Should the following really skip locked tuples?
SELECT * FROM (SELECT * FROM a FOR UPDATE SKIP LOCKED) a FOR UPDATE LIMIT 1;

But on the other hand perhaps I've missed a discussion on this, if so then
I think the following comment should be updated to explain it all:

* We also consider that NOWAIT wins if it's specified both ways. This
* is a bit more debatable but raising an error doesn't seem helpful.
* (Consider for instance SELECT FOR UPDATE NOWAIT from a view that
* internally contains a plain FOR UPDATE spec.)
*

* plannodes.h -> RowWaitPolicy waitPolicy; /* NOWAIT and SKIP LOCKED DATA
options */
Should be "NOWAIT and SKIP LOCKED options" since DATA has now been removed
from the syntax.

* nodeLockRow.c has extra space in if condition: else if (erm->waitPolicy
== RWP_SKIP )

I'm also wondering about this block of code in general:

if (erm->waitPolicy == RWP_WAIT)
wait_policy = LockWaitBlock;
else if (erm->waitPolicy == RWP_SKIP )
wait_policy = LockWaitSkip;
else /* erm->waitPolicy == RWP_NOWAIT */
wait_policy = LockWaitError;

Just after this code heap_lock_tuple() is called, and if that
returns HeapTupleWouldBlock, the code does a goto lnext, which then will
repeat that whole block again. I'm wondering why there's 2 enums that are
for the same thing? if you just had 1 then you'd not need this block of
code at all, you could just pass erm->waitPolicy to heap_lock_tuple().

* parsenodes.h comment does not meet project standards (
http://www.postgresql.org/docs/devel/static/source-format.html)

typedef enum LockClauseWaitPolicy
{
/* order is important (see applyLockingClause which takes the greatest
value when several wait policies have been specified), and values must
match RowWaitPolicy from plannodes.h */

* parsenode.h remove "DATA" from LockClauseWaitPolicy waitPolicy; /* NOWAIT
and SKIP LOCKED DATA */

I have noticed the /* TODO -- work out what needs to be released here */
comments in head_lock_tuple(), and perhaps the lack of cleaning up is what
is causing the following:

create table skiptest (id int primary key); insert into skiptest (id)
select x.x from generate_series(1,1000000) x(x);

Session 1: begin work; select * from skiptest for update limit 999999;
Session 2: select * from skiptest for update skip locked limit 1;
WARNING: out of shared memory
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.

Yet if I do:

session 1: begin work; select * from skiptest where id > 1 for update;
session 2: select * from skiptest for update skip locked limit 1;
id
----
1
(1 row)

That test makes me think your todo comments are in the right place,
something is missing there for sure!

* plays about with patch for a bit *

I don't quite understand how heap_lock_tuple works, as if I debug session 2
from the above set of queries the first call
to ConditionalLockTupleTuplock() (heapam.c line 4236) succeeds where I'd
have thought it would fail, since session 1 should be locking this tuple?
Later at line 4527 on the line if (!ConditionalXactLockTableWait(xwait)),
it fails to grab the lock and returns HeapTupleWouldBlock. The above query
seems to work ok if I just apply the following patch:

diff --git a/src/backend/access/heap/heapam.c
b/src/backend/access/heap/heapam.c
index b661d0e..b3e9dcc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4525,8 +4525,10 @@ l3:
else /* wait_policy == LockWaitSkip */
{
if
(!ConditionalXactLockTableWait(xwait))
- /* TODO -- work out what
needs to be released here */
+ {
+
UnlockTupleTuplock(relation, tid, mode);
return HeapTupleWouldBlock;
+ }
}

/* if there are updates, follow the update
chain */

Running the test query again, I get:
select * from skiptest for update skip locked limit 1;
id
---------
1000000
(1 row)

Would you also be able to supply a rebased patch with current master? The
applying the patch is a bit noisy and the isolation tests part does not
seem to apply at all now, so I've not managed to look at those yet.

I've still got more to look at, but hopefully this will get things moving
again.

Regards

David Rowley

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2014-07-23 13:15:15 Re: PDF builds broken again
Previous Message Magnus Hagander 2014-07-23 11:31:11 Re: PDF builds broken again