Re: SSI modularity questions

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI modularity questions
Date: 2011-06-28 17:47:28
Message-ID: 4E09CD60020000250003ECC3@gw.wicourts.gov
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> On 27.06.2011 21:23, Kevin Grittner wrote:

> The bigger question is if those calls are needed at all
> (http://archives.postgresql.org/message-id/4E072EA9.3030200@enterprisedb.com).

Ah, I didn't properly grasp your concerns the first time I read that.
The heap relation lock for a seqscan is indeed required for
correctness and has been there all along. The rs_relpredicatelocked
flag was added in response to this:

http://archives.postgresql.org/pgsql-hackers/2011-01/msg00730.php

> I'm uneasy about changing them this late in the release cycle, but
> I don't feel good about leaving useless clutter in place just
> because we're late in the release cycle either. More importantly,
> if locking the whole relation in a seqscan is not just a
> performance optimization, but is actually required for correctness,
> it's important that we make the code and comments reflect that
> or someone will break it in the future.

OK, if that isn't clear in the comments, we should definitely make it
clear. Basically, the predicate locking strategy is as follows:

(1) We're only concerned with read/write dependencies, also known as
rw-conflicts. This is where two transactions overlap (each gets its
snapshot before the other commits, so neither can see the work of the
other), and one does a read which doesn't see the write of the other
due only to the timing. (A concrete two-session example follows
point (5) below.)

(2) For rw-conflicts where the read follows the write, the predicate
locks don't come into play -- we use the MVCC data in the heap tuples
directly.

(3) Heap tuples are locked so that updates or deletes by an
overlapping transaction of the tuple which has been read can be
detected as a rw-conflict. Keep in mind that access for such a
delete or update may not go through the same index on which the
conflicting read occurred. It might use a different index or a
seqscan. These may be promoted to page or heap relation locks to
control the shared space used by predicate locks, but the concept is
the same -- we're locking actual tuples read, not any gaps.

(4) Index ranges are locked to detect inserts or updates which
create heap tuples which would have been read by an overlapping
transaction if they had existed and been visible at the time of the
index scan. The entire goal of locks on indexes is to lock the
"gaps" where a scan *didn't* find anything; we only care about
conflicting index tuple inserts.

(5) When a heap scan is executed, there is no index gap to lock to
cover the predicate involved, so we need to acquire a heap relation
lock -- any insert to the relation by an overlapping transaction is a
rw-conflict. While these *look* just like tuple locks which got
promoted, their purpose is entirely different. Like index locks,
they are for detecting inserts into the "gaps". [Light bulb goes on
over head: in some future release, perhaps it would be worth
differentiating between the two uses of heap relation locks, to
reduce the frequency of false positives. A couple bit flags in the
lock structure might do it.]
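
To make that concrete, here is a minimal two-session sketch of the
sort of cycle points (1), (3), and (5) describe -- the table name and
values are purely illustrative. Each session's read covers the range
the other session inserts into; with a tiny table and no index both
reads are seqscans, so it's the heap relation locks from (5) that
catch the conflicting inserts:

  create table mytab (class int, value int);
  insert into mytab values (1, 10), (1, 20), (2, 100), (2, 200);

  -- session 1
  begin transaction isolation level serializable;
  select sum(value) from mytab where class = 1;  -- seqscan, sees 30

  -- session 2
  begin transaction isolation level serializable;
  select sum(value) from mytab where class = 2;  -- seqscan, sees 300

  -- session 1
  insert into mytab values (2, 30);  -- writes into the range session 2 read
  commit;                            -- succeeds

  -- session 2
  insert into mytab values (1, 300); -- writes into the range session 1 read
  commit;
  -- session 2 fails, at the insert or at commit, with:
  --   ERROR: could not serialize access due to read/write dependencies
  --   among transactions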

So, the heap relation lock is clearly needed for the seqscan. There
is room for performance improvement there in skipping the tuple lock
attempt when we're in a seqscan, which will always be a no-op when it
finds the heap relation lock after a hash table lookup. But you are
also questioning whether the predicate locking of the tuples where
rs_relpredicatelocked is tested can be removed entirely, rather than
conditioned on the boolean. The question is: can the code be reached
on something other than a seqscan of the heap, and can this happen
for a non-temporary, non-system table using an MVCC snapshot?

I've been trying to work backward to all the spots which call these
functions, directly or indirectly, to determine that. That's
obviously not trivial or easy work, so unless someone more familiar
with the code than I am can weigh in on that question for any
particular PredicateLockTuple() call, I would rather leave the calls
alone for 9.1 and sort this out in 9.2. I'm confident that they
don't do any damage where they are; it's a matter of very marginal
performance benefit (skipping a call to a fast return) and code
tidiness (not making unnecessary calls).

I can, with confidence, now answer my own previous question about
moving the calls outside the influence of HeapKeyTest(): it's not
necessary. The rows currently excluded won't be seen by the caller,
so they don't fit under the needs of (3) above, and if (4) or (5)
aren't covered where they need to be, locking a few extra rows won't
help at all. So we can drop that issue.

>> (2) In reviewing the above, Heikki noticed that there was a second
>> place in the executor that SSI calls were needed but missing. I
>> submitted a patch here:
>>
>> http://archives.postgresql.org/message-id/4E07550F020000250003EC42@gw.wicourts.gov
>>
>> I wonder, though, whether the section of code which I needed to
>> modify should be moved to a new function in heapam.c on modularity
>> grounds.
>>
>> If these two places were moved, there would be no SSI calls from
>> any source file in the executor subdirectory.
>
> Same here, we might not need those PredicateLockTuple calls in
> bitmap heap scan at all. Can you check my logic, and verify if
> those PredicateLockTuple() calls are needed?

These sure look like they are needed per point (3) above. I would
like to add a test involving a lossy bitmap scan. How many rows are
normally needed to force a bitmap scan to be lossy? What's the
easiest way to check whether a plan is going to use (or is using) a
lossy bitmap scan? I assume that narrow rows with an index on a
randomly generated value are the way to go.

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI modularity questions
Date: 2011-06-28 18:29:35
Message-ID: 4E0A1D8F.7060909@enterprisedb.com
Lists: pgsql-hackers

On 28.06.2011 20:47, Kevin Grittner wrote:
> (3) Heap tuples are locked so that updates or deletes by an
> overlapping transaction of the tuple which has been read can be
> detected as a rw-conflict. Keep in mind that access for such a
> delete or update may not go through the same index on which the
> conflicting read occurred. It might use a different index or a
> seqscan. These may be promoted to page or heap relation locks to
> control the shared space used by predicate locks, but the concept is
> the same -- we're locking actual tuples read, not any gaps.

Ok, that's what I was missing. So the predicate locks on heap tuples are
necessary. Thanks for explaining this again.

> So, the heap relation lock is clearly needed for the seqscan. There
> is room for performance improvement there in skipping the tuple lock
> attempt when we're in a seqscan, which will always be a no-op when it
> finds the heap relation lock after a hash table lookup. But you are
> also questioning whether the predicate locking of the tuples where
> rs_relpredicatelocked is tested can be removed entirely, rather than
> conditioned on the boolean. The question is: can the code be reached
> on something other than a seqscan of the heap, and can this happen
> for a non-temporary, non-system table using an MVCC snapshot?
>
> I've been trying to work backward to all the spots which call these
> functions, directly or indirectly, to determine that. That's
> obviously not trivial or easy work, so unless someone more familiar
> with the code than I am can weigh in on that question for any
> particular PredicateLockTuple() call, I would rather leave the calls
> alone for 9.1 and sort this out in 9.2. I'm confident that they
> don't do any damage where they are; it's a matter of very marginal
> performance benefit (skipping a call to a fast return) and code
> tidiness (not making unnecessary calls).

Hmm, the calls in question are the ones in heapgettup() and
heapgettup_pagemode(), which are subroutines of heap_getnext().
heap_getnext() is only used in sequential scans, so it seems safe to
remove those calls.

>>> (2) In reviewing the above, Heikki noticed that there was a second
>>> place in the executor that SSI calls were needed but missing. I
>>> submitted a patch here:
>>>
>>> http://archives.postgresql.org/message-id/4E07550F020000250003EC42@gw.wicourts.gov
>>>
>>> I wonder, though, whether the section of code which I needed to
>>> modify should be moved to a new function in heapam.c on modularity
>>> grounds.
>>>
>>> If these two places were moved, there would be no SSI calls from
>>> any source file in the executor subdirectory.
>>
>> Same here, we might not need those PredicateLockTuple calls in
>> bitmap heap scan at all. Can you check my logic, and verify if
>> those PredicateLockTuple() calls are needed?
>
> These sure look like they are needed per point (3) above.

Yep.

> I would
> like to add a test involving a lossy bitmap scan. How many rows are
> normally needed to force a bitmap scan to be lossy?

The size of bitmaps is controlled by work_mem, so you can set work_mem
very small to cause them to become lossy earlier. Off the top of my head
I don't have any guesstimate on how many rows you need.
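
Something along these lines ought to do it (untested sketch; the
names are made up, and the row count is only a rough guess at what
should overflow a 64kB bitmap -- the point is just to scatter the
matching rows across several thousand heap pages):

  create table bitmaptest (val int);
  insert into bitmaptest
    select (random() * 10000)::int from generate_series(1, 1000000);
  create index bitmaptest_val_idx on bitmaptest (val);
  analyze bitmaptest;

  set work_mem = '64kB';       -- the minimum, so the bitmap lossifies early
  set enable_seqscan = off;    -- nudge the planner toward a bitmap scan
  set enable_indexscan = off;
  explain analyze
  select count(*) from bitmaptest where val < 5000;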

> What's the
> easiest way to check whether a plan is going to use (or is using) a
> lossy bitmap scan?

Good question. There doesn't seem to be anything in the EXPLAIN ANALYZE
output to show that, so I think you'll have to resort to adding some
elog()s in the right places.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI modularity questions
Date: 2011-06-28 18:31:21
Message-ID: BANLkTin8mMUf=0_R2Q3Zp5xLaHjmvU8idQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 28, 2011 at 1:47 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> (5)  When a heap scan is executed, there is no index gap to lock to
> cover the predicate involved, so we need to acquire a heap relation
> lock -- any insert to the relation by an overlapping transaction is a
> rw-conflict.  While these *look* just like tuple locks which got
> promoted, their purpose is entirely different.  Like index locks,
> they are for detecting inserts into the "gaps".  [Light bulb goes on
> over head: in some future release, perhaps it would be worth
> differentiating between the two uses of heap relation locks, to
> reduce the frequency of false positives.  A couple bit flags in the
> lock structure might do it.]

You know, it just occurred to me while reading this email that you're
using the term "predicate lock" in a way that is totally different
from what I learned in school. What I was taught is that the word
"predicate" in "predicate lock" is like the word "tuple" in "tuple
lock" or the word "relation" in "relation lock" - that is, it
describes *the thing being locked*. In other words, you are
essentially doing:

LOCK TABLE foo WHERE i = 1;

I think that what you're calling the predicate lock manager should
really be called the siread lock manager, and all of the places where
you are "predicate locking" a tuple should really be "siread locking"
the tuple.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI modularity questions
Date: 2011-06-28 20:52:20
Message-ID: BANLkTimt_Ky+zCTkr8mKiqFh71i9pdWzDw@mail.gmail.com
Lists: pgsql-hackers

2011/6/28, Robert Haas <robertmhaas(at)gmail(dot)com>:

> You know, it just occurred to me while reading this email that you're
> using the term "predicate lock" in a way that is totally different
> from what I learned in school. What I was taught is that the word
> "predicate" in "predicate lock" is like the word "tuple" in "tuple
> lock" or the word "relation" in "relation lock" - that is, it
> describes *the thing being locked*. In other words, you are
> essentially doing:
>
> LOCK TABLE foo WHERE i = 1;
>
> I think that what you're calling the predicate lock manager should
> really be called the siread lock manager, and all of the places where
> you are "predicate locking" a tuple should really be "siread locking"
> the tuple.

The predicate in the "full table" case is: "any tuple in this table"
(including tuples that don't exist yet, otherwise it wouldn't be a
predicate). The predicate in the index case is: "any tuple that would
be returned by so-and-such index scan" (idem regarding tuples that
don't exist yet, hence "locking the gaps").

The lock semantics (i.e., how conflicts between it and other locks are
defined and treated) are "siread". The thing that it applies to is a
predicate. (I.e., PostgreSQL before SSI already supported some rather
trivial kind of predicate lock: the full table lock.)
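
Incidentally, that is also how these locks surface in pg_locks: the
mode is SIReadLock, and the locked object -- a tuple, a page, or a
whole relation -- is the current approximation of the predicate. A
quick illustration (the table name is made up; the reading select has
to run at the serializable isolation level for any SIReadLock rows to
show up):

  begin transaction isolation level serializable;
  select count(*) from mytab where class = 1;
  select locktype, relation::regclass, page, tuple, mode
    from pg_locks where mode = 'SIReadLock';
  commit;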

Conclusion: I don't see the problem :-).

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?