Reducing overhead of frequent table locks

From: Alexey Klyukin <alexk(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: 'tuple concurrently updated' error for alter role ... set
Date: 2011-05-12 22:09:16
Message-ID: 645346BD-753F-4EEA-94C8-61A0A52F59A0@commandprompt.com
Lists: pgsql-hackers

Hello,

We have recently observed a problem with concurrent execution of ALTER ROLE ... SET in several sessions. It's similar to the one fixed by http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=fbcf4b92aa64d4577bcf25925b055316b978744a. The result is the 'tuple concurrently updated' error message, and the problem is easily reproducible:

CREATE SCHEMA test;
CREATE SCHEMA test2;
CREATE ROLE testrole;

session 1:
while [ 1 ]; do psql postgres -c 'alter role testrole set search_path=test2';done

session 2:
while [ 1 ]; do psql postgres -c 'alter role testrole set search_path=test';done

The error message appears almost immediately on my system.

After digging in the code I've found that a RowExclusiveLock is acquired on the pg_db_role_setting catalog in AlterSetting(). While the name of the lock suggests that it should conflict with itself, it doesn't. After I replaced the lock in question with ShareUpdateExclusiveLock, the problem disappeared. Attached is a simple patch with these changes.
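
Roughly, the change is just the lock level passed to heap_open() on the catalog (the attached diff has the exact context; this is illustrative, not the literal hunk):

    /* in AlterSetting(), src/backend/catalog/pg_db_role_setting.c */
    -   rel = heap_open(DbRoleSettingRelationId, RowExclusiveLock);
    +   rel = heap_open(DbRoleSettingRelationId, ShareUpdateExclusiveLock);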

Regards,
--
Alexey Klyukin
The PostgreSQL Company - Command Prompt, Inc.

Attachment: db_role_setting.diff (application/octet-stream, 1.3 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexey Klyukin <alexk(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'tuple concurrently updated' error for alter role ... set
Date: 2011-05-12 22:28:48
Message-ID: 11572.1305239328@sss.pgh.pa.us
Lists: pgsql-hackers

Alexey Klyukin <alexk(at)commandprompt(dot)com> writes:
> After digging in the code I've found that a RowExclusiveLock is acquired on a pg_db_role_setting table in AlterSetting(). While the name of the locks suggests that it should conflict with itself, it doesn't. After I've replaced the lock in question with ShareUpdateExclusiveLock, the problem disappeared. Attached is the simple patch with these changes.

We're not likely to do that, first because it's randomly different from
the handling of every other system catalog update, and second because it
would serialize all updates on this catalog, and probably create
deadlock cases that don't exist now. (BTW, as the patch is given I'd
expect it to still fail, though perhaps with lower probability than
before. For this to actually stop all such cases, you'd have to hold
the lock till commit, which greatly increases the risks of deadlock.)

I see no particular reason why conflicting updates like those *shouldn't*
be expected to fail occasionally.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'tuple concurrently updated' error for alter role ... set
Date: 2011-05-12 22:53:56
Message-ID: BANLkTikvW0u6ea1gBw-r3CAEsSAqex9DEg@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 12, 2011 at 6:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alexey Klyukin <alexk(at)commandprompt(dot)com> writes:
>> After digging in the code I've found that a RowExclusiveLock is acquired on a pg_db_role_setting table in AlterSetting(). While the name of the locks suggests that it should conflict with itself, it doesn't. After I've replaced the lock in question with ShareUpdateExclusiveLock, the problem disappeared. Attached is the simple patch with these changes.
>
> We're not likely to do that, first because it's randomly different from
> the handling of every other system catalog update,

We have very robust locking of this type for table-related DDL
operations and just about none for anything else. I don't consider
the latter to be a feature.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'tuple concurrently updated' error for alter role ... set
Date: 2011-05-12 22:59:23
Message-ID: 12194.1305241163@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, May 12, 2011 at 6:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> We're not likely to do that, first because it's randomly different from
>> the handling of every other system catalog update,

> We have very robust locking of this type for table-related DDL
> operations and just about none for anything else. I don't consider
> the latter to be a feature.

I didn't say it was ;-). What I *am* saying is that if we're going to
do anything about this sort of problem, there needs to be a
well-considered system-wide plan. Arbitrarily changing the locking
rules for individual operations is not going to make things better,
and taking exclusive locks on whole catalogs is definitely not going to
make things better.

regards, tom lane


From: Alexey Klyukin <alexk(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'tuple concurrently updated' error for alter role ... set
Date: 2011-05-12 23:07:53
Message-ID: 1AF28A4D-65D3-495B-9C29-87E6D31D9B5F@commandprompt.com
Lists: pgsql-hackers


On May 13, 2011, at 1:28 AM, Tom Lane wrote:

> Alexey Klyukin <alexk(at)commandprompt(dot)com> writes:
>> After digging in the code I've found that a RowExclusiveLock is acquired on a pg_db_role_setting table in AlterSetting(). While the name of the locks suggests that it should conflict with itself, it doesn't. After I've replaced the lock in question with ShareUpdateExclusiveLock, the problem disappeared. Attached is the simple patch with these changes.
>
> We're not likely to do that, first because it's randomly different from
> the handling of every other system catalog update, and second because it
> would serialize all updates on this catalog, and probably create
> deadlock cases that don't exist now. (BTW, as the patch is given I'd
> expect it to still fail, though perhaps with lower probability than
> before. For this to actually stop all such cases, you'd have to hold
> the lock till commit, which greatly increases the risks of deadlock.)

Fair enough. I think AlterSetting() holds the lock till commit (it does
heap_close with NoLock); DropSetting() doesn't, though.

>
> I see no particular reason why conflicting updates like those *shouldn't*
> be expected to fail occasionally.

Excellent question, I don't have enough context to properly answer that (other
than a guess that an unexpected transaction rollback is too unexpected :))
Let me ask the customer first.

--
Alexey Klyukin
The PostgreSQL Company - Command Prompt, Inc.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'tuple concurrently updated' error for alter role ... set
Date: 2011-05-13 03:47:42
Message-ID: BANLkTimXvwCcuywtConu8cDx6a_wV=c-JQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 12, 2011 at 6:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, May 12, 2011 at 6:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> We're not likely to do that, first because it's randomly different from
>>> the handling of every other system catalog update,
>
>> We have very robust locking of this type for table-related DDL
>> operations and just about none for anything else.  I don't consider
>> the latter to be a feature.
>
> I didn't say it was ;-).  What I *am* saying is that if we're going to
> do anything about this sort of problem, there needs to be a
> well-considered system-wide plan.  Arbitrarily changing the locking
> rules for individual operations is not going to make things better,
> and taking exclusive locks on whole catalogs is definitely not going to
> make things better.

Yes; true. I'm inclined to say that this is a bug, but not one we're
going to fix before 9.2. I think it might be about time to get
serious about making an effort to sprinkle the code with a few more
LockDatabaseObject() and LockSharedObject() calls.
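
For the case in this thread, that would mean taking a lock on the role itself
before touching its pg_db_role_setting entry, along these lines (the call site
and lock level here are only illustrative):

    /* hypothetical: serialize concurrent ALTER ROLE ... SET on one role */
    LockSharedObject(AuthIdRelationId, roleid, 0, ShareUpdateExclusiveLock);

LockSharedObject() because roles are shared objects; database-local objects
would use LockDatabaseObject() instead.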

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'tuple concurrently updated' error for alter role ... set
Date: 2011-05-13 04:56:42
Message-ID: 18555.1305262602@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, May 12, 2011 at 6:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I didn't say it was ;-). What I *am* saying is that if we're going to
>> do anything about this sort of problem, there needs to be a
>> well-considered system-wide plan. Arbitrarily changing the locking
>> rules for individual operations is not going to make things better,
>> and taking exclusive locks on whole catalogs is definitely not going to
>> make things better.

> Yes; true. I'm inclined to say that this is a bug, but not one we're
> going to fix before 9.2. I think it might be about time to get
> serious about making an effort to sprinkle the code with a few more
> LockDatbaseObject() and LockSharedObject() calls.

Yeah. That doesn't rise to the level of a "well-considered plan", but
I believe that we could develop a plan around that concept, ie, take a
lock associated with the individual object we are about to operate on.

BTW, I thought a bit more about why I didn't like the initial proposal
in this thread, and the basic objection is this: the AccessShareLock or
RowExclusiveLock we take on the catalog is not meant to provide any
serialization of operations on individual objects within the catalog.
What it's there for is to interlock against operations that are
operating on the catalog as a table, such as VACUUM FULL (which has to
lock out all accesses to the catalog) or REINDEX (which has to lock out
updates). So the catalog-level lock is the right thing and shouldn't be
changed. If we want to interlock updates of individual objects then we
need a different locking concept for that.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'tuple concurrently updated' error for alter role ... set
Date: 2011-05-13 13:07:34
Message-ID: BANLkTinhL9X4Biv-7GLV78Oi1B5NPd1_Kw@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 13, 2011 at 12:56 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> BTW, I thought a bit more about why I didn't like the initial proposal
> in this thread, and the basic objection is this: the AccessShareLock or
> RowExclusiveLock we take on the catalog is not meant to provide any
> serialization of operations on individual objects within the catalog.
> What it's there for is to interlock against operations that are
> operating on the catalog as a table, such as VACUUM FULL (which has to
> lock out all accesses to the catalog) or REINDEX (which has to lock out
> updates).  So the catalog-level lock is the right thing and shouldn't be
> changed.  If we want to interlock updates of individual objects then we
> need a different locking concept for that.

Right, I agree. Fortunately, we don't have to invent a new one.
There is already locking being done exactly along these lines for
DROP, COMMENT, and SECURITY LABEL (which is important, because
otherwise we could leave behind orphaned security labels that would be
inherited by a later object with the same OID, leading to a security
problem). I think it would be sensible, and quite simple, to extend
that to other DDL operations.

I think that we probably *don't* want to lock non-table objects when
they are just being *used*. We do that for tables (to lock against
concurrent drop operations) and in some workloads it becomes a severe
bottleneck. Doing it for functions and operators would make the
problem far worse, for no particular benefit. Unlike tables, there is
no underlying relation file to worry about, so the worst thing that
happens is someone continues to use a dropped object slightly after
it's gone, or the old definition of an object that's been modified.

Actually, it's occurred to me from time to time that it would be nice
to eliminate ACCESS SHARE (and while I'm dreaming, maybe ROW SHARE and
ROW EXCLUSIVE) locks for tables as well. Under normal operating
conditions (i.e. no DDL running), these locks generate a huge amount
of lock manager traffic even though none of the locks conflict with
each other. Unfortunately, I don't really see a way to make this
work. But maybe it would at least be possible to create some sort of
fast path. For example, suppose every backend opens a file and uses
that file to record lock tags for the objects on which it is taking
"weak" (ACCESS SHARE/ROW SHARE/ROW EXCLUSIVE) locks on. Before taking
a "strong" lock (anything that conflicts with one of those lock
types), the exclusive locker is required to open all of those files
and transfer the locks into the lock manager proper. Of course, it's
also necessary to nail down the other direction: you have to have some
way of making sure that the backend can't record in its local file a
lock that would have conflicted had it been taken in the actual lock
manager. But maybe there's some lightweight way we could detect that,
as well. For example, we could keep, say, a 1K array in shared
memory, representing a 1024-way partitioning of the locktag space.
Each byte is 1 if there are any "strong" locks on objects with that
locktag in the lock manager, and 0 if there are none (or maybe you
need a 4K array with exact counts, for bookkeeping). When a backend
wants to take a "weak" lock, it checks the array: if it finds a 0 then
it just records the lock in its file; otherwise, it goes through the
lock manager. When a backend wants a "strong" lock, it first sets the
byte (or bumps the count) in the array, then transfers any existing
weak locks from individual backends to the lock manager, then tries to
get its own lock. Possibly the array operations could be done with
memory synchronization primitives rather than spinlocks, especially on
architectures that support an atomic fetch-and-add. Of course I don't
know quite how we recover if we try to do one of these "lock
transfers" and run out of shared memory... and overall I'm hand-waving
here quite a bit, but in theory it seems like we ought to be able to
rejigger this locking so that we reduce the cost of obtaining a "weak"
lock, perhaps at the expense of making it more expensive to obtain a
"strong" lock, which are relatively rare by comparison.

<end of rambling digression>

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'tuple concurrently updated' error for alter role ... set
Date: 2011-05-13 13:22:27
Message-ID: 201105131322.p4DDMRZ16948@momjian.us
Lists: pgsql-hackers


Is this a TODO? I don't see it on the TODO list.

---------------------------------------------------------------------------

Robert Haas wrote:
> On Fri, May 13, 2011 at 12:56 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > BTW, I thought a bit more about why I didn't like the initial proposal
> > in this thread, and the basic objection is this: the AccessShareLock or
> > RowExclusiveLock we take on the catalog is not meant to provide any
> > serialization of operations on individual objects within the catalog.
> > What it's there for is to interlock against operations that are
> > operating on the catalog as a table, such as VACUUM FULL (which has to
> > lock out all accesses to the catalog) or REINDEX (which has to lock out
> > updates). ?So the catalog-level lock is the right thing and shouldn't be
> > changed. ?If we want to interlock updates of individual objects then we
> > need a different locking concept for that.
>
> Right, I agree. Fortunately, we don't have to invent a new one.
> There is already locking being done exactly along these lines for
> DROP, COMMENT, and SECURITY LABEL (which is important, because
> otherwise we could leave behind orphaned security labels that would be
> inherited by a later object with the same OID, leading to a security
> problem). I think it would be sensible, and quite simple, to extend
> that to other DDL operations.
>
> I think that we probably *don't* want to lock non-table objects when
> they are just being *used*. We do that for tables (to lock against
> concurrent drop operations) and in some workloads it becomes a severe
> bottleneck. Doing it for functions and operators would make the
> problem far worse, for no particular benefit. Unlike tables, there is
> no underlying relation file to worry about, so the worst thing that
> happens is someone continues to use a dropped object slightly after
> it's gone, or the old definition of an object that's been modified.
>
> Actually, it's occurred to me from time to time that it would be nice
> to eliminate ACCESS SHARE (and while I'm dreaming, maybe ROW SHARE and
> ROW EXCLUSIVE) locks for tables as well. Under normal operating
> conditions (i.e. no DDL running), these locks generate a huge amount
> of lock manager traffic even though none of the locks conflict with
> each other. Unfortunately, I don't really see a way to make this
> work. But maybe it would at least be possible to create some sort of
> fast path. For example, suppose every backend opens a file and uses
> that file to record lock tags for the objects on which it is taking
> "weak" (ACCESS SHARE/ROW SHARE/ROW EXCLUSIVE) locks on. Before taking
> a "strong" lock (anything that conflicts with one of those lock
> types), the exclusive locker is required to open all of those files
> and transfer the locks into the lock manager proper. Of course, it's
> also necessary to nail down the other direction: you have to have some
> way of making sure that the backend can't record in it's local file a
> lock that would have conflicted had it been taken in the actual lock
> manager. But maybe there's some lightweight way we could detect that,
> as well. For example, we could keep, say, a 1K array in shared
> memory, representing a 1024-way partitioning of the locktag space.
> Each byte is 1 if there are any "strong" locks on objects with that
> locktag in the lock manager, and 0 if there are none (or maybe you
> need a 4K array with exact counts, for bookkeeping). When a backend
> wants to take a "weak" lock, it checks the array: if it finds a 0 then
> it just records the lock in its file; otherwise, it goes through the
> lock manager. When a backend wants a "strong" lock, it first sets the
> byte (or bumps the count) in the array, then transfers any existing
> weak locks from individual backends to the lock manager, then tries to
> get its own lock. Possibly the array operations could be done with
> memory synchronization primitives rather than spinlocks, especially on
> architectures that support an atomic fetch-and-add. Of course I don't
> know quite how we recover if we try to do one of these "lock
> transfers" and run out of shared memory... and overall I'm hand-waving
> here quite a bit, but in theory it seems like we ought to be able to
> rejigger this locking so that we reduce the cost of obtaining a "weak"
> lock, perhaps at the expense of making it more expensive to obtain a
> "strong" lock, which are relatively rare by comparison.
>
> <end of rambling digression>
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Reducing overhead of frequent table locks
Date: 2011-05-13 20:16:13
Message-ID: 20110513201613.GA22947@tornado.gateway.2wire.net
Lists: pgsql-hackers

On Fri, May 13, 2011 at 09:07:34AM -0400, Robert Haas wrote:
> Actually, it's occurred to me from time to time that it would be nice
> to eliminate ACCESS SHARE (and while I'm dreaming, maybe ROW SHARE and
> ROW EXCLUSIVE) locks for tables as well. Under normal operating
> conditions (i.e. no DDL running), these locks generate a huge amount
> of lock manager traffic even though none of the locks conflict with
> each other. Unfortunately, I don't really see a way to make this
> work. But maybe it would at least be possible to create some sort of
> fast path. For example, suppose every backend opens a file and uses
> that file to record lock tags for the objects on which it is taking
> "weak" (ACCESS SHARE/ROW SHARE/ROW EXCLUSIVE) locks on. Before taking
> a "strong" lock (anything that conflicts with one of those lock
> types), the exclusive locker is required to open all of those files
> and transfer the locks into the lock manager proper. Of course, it's
> also necessary to nail down the other direction: you have to have some
> way of making sure that the backend can't record in it's local file a
> lock that would have conflicted had it been taken in the actual lock
> manager. But maybe there's some lightweight way we could detect that,
> as well. For example, we could keep, say, a 1K array in shared
> memory, representing a 1024-way partitioning of the locktag space.
> Each byte is 1 if there are any "strong" locks on objects with that
> locktag in the lock manager, and 0 if there are none (or maybe you
> need a 4K array with exact counts, for bookkeeping). When a backend
> wants to take a "weak" lock, it checks the array: if it finds a 0 then
> it just records the lock in its file; otherwise, it goes through the
> lock manager. When a backend wants a "strong" lock, it first sets the
> byte (or bumps the count) in the array, then transfers any existing
> weak locks from individual backends to the lock manager, then tries to
> get its own lock. Possibly the array operations could be done with
> memory synchronization primitives rather than spinlocks, especially on
> architectures that support an atomic fetch-and-add. Of course I don't
> know quite how we recover if we try to do one of these "lock
> transfers" and run out of shared memory... and overall I'm hand-waving
> here quite a bit, but in theory it seems like we ought to be able to
> rejigger this locking so that we reduce the cost of obtaining a "weak"
> lock, perhaps at the expense of making it more expensive to obtain a
> "strong" lock, which are relatively rare by comparison.
>
> <end of rambling digression>

The key is putting a rapid hard stop to all fast-path lock acquisitions and
then reconstructing a valid global picture of the affected lock table regions.
Your 1024-way table of strong lock counts sounds promising. (Offhand, I do
think they would need to be counts, not just flags.)

If I'm understanding correctly, your pseudocode would look roughly like this:

        if (level >= ShareUpdateExclusiveLock)
                ++strong_lock_counts[my_strong_lock_count_partition]
                sfence
                if (strong_lock_counts[my_strong_lock_count_partition] == 1)
                        /* marker 1 */
                        import_all_local_locks
                normal_LockAcquireEx
        else if (level <= RowExclusiveLock)
                lfence
                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
                        /* marker 2 */
                        local_only
                        /* marker 3 */
                else
                        normal_LockAcquireEx
        else
                normal_LockAcquireEx

At marker 1, we need to block until no code is running between markers two and
three. You could do that with a per-backend lock (LW_SHARED by the strong
locker, LW_EXCLUSIVE by the backend). That would probably still be a win over
the current situation, but it would be nice to have something even cheaper.
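
(For concreteness, I've been assuming my_strong_lock_count_partition is just a
cheap hash of the lock tag, along these lines; the helper name and the cell
count are placeholders:)

    #define STRONG_LOCK_CELLS 1024      /* 1 KB of flags, or 4 KB of counts */

    /* map a lock tag to its strong-lock-count cell */
    static inline int
    strong_lock_cell(const LOCKTAG *tag)
    {
        return LockTagHashCode(tag) % STRONG_LOCK_CELLS;    /* existing tag hash */
    }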

Then you have the actual procedure for transfer of local locks to the global
lock manager. Using record locks in a file could work, but that's a system call
per lock acquisition regardless of the presence of strong locks. Is that cost
sufficiently trivial? I wonder if, instead, we could signal all backends at
marker 1 to dump the applicable parts of their local (memory) lock tables to
files. Or to another shared memory region, if that didn't mean statically
allocating the largest possible required amount. If we were willing to wait
until all backends reach a CHECK_FOR_INTERRUPTS, they could instead make the
global insertions directly. That might yield a decent amount of bug swatting to
fill in missing CHECK_FOR_INTERRUPTS, though.

Improvements in this area would also have good synergy with efforts to reduce
the global impact of temporary table usage. CREATE TEMP TABLE can be the
major source of AccessExclusiveLock acquisitions. However, with the strong
lock indicator partitioned 1024 ways or more, that shouldn't be a deal killer.

nm


From: Alexey Klyukin <alexk(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 'tuple concurrently updated' error for alter role ... set
Date: 2011-05-13 23:29:43
Message-ID: 4D38043D-57AA-42A7-9C46-102C1A5EBC94@commandprompt.com
Lists: pgsql-hackers


On May 13, 2011, at 2:07 AM, Alexey Klyukin wrote:

> On May 13, 2011, at 1:28 AM, Tom Lane wrote:
>
>>
>> We're not likely to do that, first because it's randomly different from
>> the handling of every other system catalog update, and second because it
>> would serialize all updates on this catalog, and probably create
>> deadlock cases that don't exist now. (BTW, as the patch is given I'd
>> expect it to still fail, though perhaps with lower probability than
>> before. For this to actually stop all such cases, you'd have to hold
>> the lock till commit, which greatly increases the risks of deadlock.)
>
....
>>
>> I see no particular reason why conflicting updates like those *shouldn't*
>> be expected to fail occasionally.
>
> Excellent question, I don't have enough context to properly answer that (other
> than a guess that an unexpected transaction rollback is too unexpected :))
> Let me ask the customer first.

The original use case is sporadic failures of some internal unit tests due
to the error message in the subject line.

--
Alexey Klyukin
The PostgreSQL Company - Command Prompt, Inc.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-14 00:55:34
Message-ID: BANLkTik-15v5d3FLMJk_utYNkCyLHbXajA@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> The key is putting a rapid hard stop to all fast-path lock acquisitions and
> then reconstructing a valid global picture of the affected lock table regions.
> Your 1024-way table of strong lock counts sounds promising.  (Offhand, I do
> think they would need to be counts, not just flags.)
>
> If I'm understanding correctly, your pseudocode would look roughly like this:
>
>        if (level >= ShareUpdateExclusiveLock)
>                ++strong_lock_counts[my_strong_lock_count_partition]
>                sfence
>                if (strong_lock_counts[my_strong_lock_count_partition] == 1)
>                        /* marker 1 */
>                        import_all_local_locks
>                normal_LockAcquireEx
>        else if (level <= RowExclusiveLock)
>                lfence
>                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
>                        /* marker 2 */
>                        local_only
>                        /* marker 3 */
>                else
>                        normal_LockAcquireEx
>        else
>                normal_LockAcquireEx

I think ShareUpdateExclusiveLock should be treated as neither weak nor
strong. It certainly can't be treated as weak - i.e. use the fast
path - because it's self-conflicting. It could be treated as strong,
but since it doesn't conflict with any of the weak lock types, that
would only serve to prevent fast-path lock acquisitions that otherwise
could have succeeded. In particular, it would unnecessarily disable
fast-path lock acquisition for any relation being vacuumed, which
could be really ugly considering that one of the main workloads that
would benefit from something like this is the case where lots of
backends are fighting over a lock manager partition lock on a table
they all want to read and/or modify. I think it's best for
ShareUpdateExclusiveLock to always use the regular lock-acquisition
path, but it need not worry about incrementing strong_lock_counts[] or
importing local locks in so doing.
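
To restate that with some invented names:

    /*
     * "Weak" modes (AccessShare, RowShare, RowExclusive) may use the fast path.
     * "Strong" modes (Share and above) conflict with at least one weak mode, so
     * they must bump strong_lock_counts[] and transfer matching fast-path
     * entries before acquiring.  ShareUpdateExclusiveLock is neither: it always
     * goes through the normal path but never touches strong_lock_counts[].
     */
    #define FastPathWeakMode(mode)      ((mode) <= RowExclusiveLock)
    #define FastPathStrongMode(mode)    ((mode) > ShareUpdateExclusiveLock)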

Also, I think in the step just after marker one, we'd import only
local locks whose lock tags were equal to the lock tag on which we
were attempting to acquire a strong lock. The downside of this whole
approach is that acquiring a strong lock becomes, at least
potentially, a lot slower, because you have to scan through the whole
backend array looking for fast-path locks to import (let's not use the
term "local lock", which is already in use within the lock manager
code). But maybe that can be optimized enough not to matter. After
all, if the lock manager scaled perfectly at high concurrency, we
wouldn't be thinking about this in the first place.

> At marker 1, we need to block until no code is running between markers two and
> three.  You could do that with a per-backend lock (LW_SHARED by the strong
> locker, LW_EXCLUSIVE by the backend).  That would probably still be a win over
> the current situation, but it would be nice to have something even cheaper.

I don't have a better idea than to use an LWLock. I have a patch
floating around to speed up our LWLock implementation, but I haven't
got a workload where the bottleneck is the actual speed of operation
of the LWLock rather than the fact that it's contended in the first
place. And the whole point of this would be to arrange things so that
the LWLocks are uncontended nearly all the time.

> Then you have the actual procedure for transfer of local locks to the global
> lock manager.  Using record locks in a file could work, but that's a system call
> per lock acquisition regardless of the presence of strong locks.  Is that cost
> sufficiently trivial?

No, I don't think we want to go into kernel space. When I spoke of
using a file, I was imagining it as an mmap'd region that one backend
could write to which, at need, another backend could mmap and grovel
through. Another (perhaps simpler) option would be to just put it in
shared memory. That doesn't give you as much flexibility in terms of
expanding the segment, but it would be reasonable to allow space for
only, dunno, say 32 locks per backend in shared memory; if you need
more than that, you flush them all to the main lock table and start
over. You could possibly even just make this a hack for the
particular special case where we're taking a relation lock on a
non-shared relation; then you'd need only 128 bytes for a 32-entry
array, plus the LWLock (I think the database ID is already visible in
shared memory).
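
Something like this, say, with sizes and field names strictly hand-waved:

    /* one per backend, in shared memory, guarded by that backend's LWLock */
    #define FP_LOCK_SLOTS_PER_BACKEND 32

    typedef struct FastPathLockArea
    {
        LWLockId    fastPathLock;                       /* per-backend LWLock */
        Oid         relid[FP_LOCK_SLOTS_PER_BACKEND];   /* non-shared relations, 128 bytes */
        uint8       modes[FP_LOCK_SLOTS_PER_BACKEND];   /* bitmask of weak modes held */
    } FastPathLockArea;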

> I wonder if, instead, we could signal all backends at
> marker 1 to dump the applicable parts of their local (memory) lock tables to
> files.  Or to another shared memory region, if that didn't mean statically
> allocating the largest possible required amount.  If we were willing to wait
> until all backends reach a CHECK_FOR_INTERRUPTS, they could instead make the
> global insertions directly.  That might yield a decent amount of bug swatting to
> fill in missing CHECK_FOR_INTERRUPTS, though.

I've thought about this; I believe it's unworkable. If one backend
goes into the tank (think: SIGSTOP, or blocking on I/O to an
unreadable disk sector) this could lead to cascading failure.

> Improvements in this area would also have good synergy with efforts to reduce
> the global impact of temporary table usage.  CREATE TEMP TABLE can be the
> major source of AccessExclusiveLock acquisitions.  However, with the strong
> lock indicator partitioned 1024 ways or more, that shouldn't be a deal killer.

If that particular case is a problem for you, it seems like optimizing
away the exclusive lock altogether might be possible. No other
backend should be able to touch the table until the transaction
commits anyhow, and at that point we're going to release the lock.
There are possibly some sticky wickets here but it seems at least
worth thinking about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-14 03:05:50
Message-ID: 20110514030550.GB22947@tornado.gateway.2wire.net
Lists: pgsql-hackers

On Fri, May 13, 2011 at 08:55:34PM -0400, Robert Haas wrote:
> On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > If I'm understanding correctly, your pseudocode would look roughly like this:
> >
> >        if (level >= ShareUpdateExclusiveLock)

> I think ShareUpdateExclusiveLock should be treated as neither weak nor
> strong.

Indeed; that should be ShareLock.

> It certainly can't be treated as weak - i.e. use the fast
> path - because it's self-conflicting. It could be treated as strong,
> but since it doesn't conflict with any of the weak lock types, that
> would only serve to prevent fast-path lock acquisitions that otherwise
> could have succeeded. In particular, it would unnecessarily disable
> fast-path lock acquisition for any relation being vacuumed, which
> could be really ugly considering that one of the main workloads that
> would benefit from something like this is the case where lots of
> backends are fighting over a lock manager partition lock on a table
> they all want to run read and/or modify. I think it's best for
> ShareUpdateExclusiveLock to always use the regular lock-acquisition
> path, but it need not worry about incrementing strong_lock_counts[] or
> importing local locks in so doing.

Agreed.

> Also, I think in the step just after marker one, we'd only import only
> local locks whose lock tags were equal to the lock tag on which we
> were attempting to acquire a strong lock. The downside of this whole
> approach is that acquiring a strong lock becomes, at least
> potentially, a lot slower, because you have to scan through the whole
> backend array looking for fast-path locks to import (let's not use the
> term "local lock", which is already in use within the lock manager
> code). But maybe that can be optimized enough not to matter. After
> all, if the lock manager scaled perfectly at high concurrency, we
> wouldn't be thinking about this in the first place.

Incidentally, I used the term "local lock" because I assumed fast-path locks
would still go through the lock manager far enough to populate the local lock
table. But there may be no reason to do so.

> > I wonder if, instead, we could signal all backends at
> > marker 1 to dump the applicable parts of their local (memory) lock tables to
> > files.  Or to another shared memory region, if that didn't mean statically
> > allocating the largest possible required amount.  If we were willing to wait
> > until all backends reach a CHECK_FOR_INTERRUPTS, they could instead make the
> > global insertions directly.  That might yield a decent amount of bug swatting to
> > fill in missing CHECK_FOR_INTERRUPTS, though.
>
> I've thought about this; I believe it's unworkable. If one backend
> goes into the tank (think: SIGSTOP, or blocking on I/O to an
> unreadable disk sector) this could lead to cascading failure.

True. It would need some fairly major advantages to justify that risk, and I
don't see any.

Overall, looks like a promising design sketch to me. Thanks.

nm


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-14 14:01:01
Message-ID: BANLkTinCYnmzzH3WbtOdEc4NG-RKnyGOzg@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 13, 2011 at 11:05 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> Incidentally, I used the term "local lock" because I assumed fast-path locks
> would still go through the lock manager far enough to populate the local lock
> table.  But there may be no reason to do so.

Oh, good point. I think we probably WOULD need to update the local
lock hash table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-14 15:14:27
Message-ID: 21022.1305386067@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, May 13, 2011 at 11:05 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> Incidentally, I used the term "local lock" because I assumed fast-path locks
>> would still go through the lock manager far enough to populate the local lock
>> table. But there may be no reason to do so.

> Oh, good point. I think we probably WOULD need to update the local
> lock lock hash table.

I haven't read this thread closely, but the general behavior of the
backend assumes that it's very very cheap to re-acquire a lock that's
already held by the current transaction. It's probably worth
maintaining a local counter just so you can keep that being true, even
if there were no other need for it. (Since I've not read the thread,
I'll refrain from asking how you're gonna clean up at transaction end
if there's no local memory of what locks you hold.)

regards, tom lane


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-14 17:33:36
Message-ID: BANLkTi=6kBJ0=opKEFrGM9akL5BSwP0EZA@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 13, 2011 at 5:55 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:

>> I wonder if, instead, we could signal all backends at
>> marker 1 to dump the applicable parts of their local (memory) lock tables to
>> files.  Or to another shared memory region, if that didn't mean statically
>> allocating the largest possible required amount.  If we were willing to wait
>> until all backends reach a CHECK_FOR_INTERRUPTS, they could instead make the
>> global insertions directly.  That might yield a decent amount of bug swatting to
>> fill in missing CHECK_FOR_INTERRUPTS, though.
>
> I've thought about this; I believe it's unworkable.  If one backend
> goes into the tank (think: SIGSTOP, or blocking on I/O to an
> unreadable disk sector) this could lead to cascading failure.

Would that risk be substantially worse than it currently is? If a
backend goes into the tank while holding access share locks, it will
still block access exclusive locks until it recovers. And those
queued access exclusive locks will block new access share locks from
other backends. How much is the risk magnified by the new approach,
going from "any backend holding the lock is tanked" to "any process at
all is tanked"?

What I'd considered playing with in the past is having
LockMethodLocalHash hang on to an Access Share lock even after
locallock->nLocks == 0, so that re-granting the lock would be a purely
local operation. Anyone wanting an Access Exclusive lock and not
immediately getting it would have to send out a plea (via SINVA?) for
other processes to release their locallock->nLocks == 0 locks. But
this would suffer from the same problem of tanked processes.

Cheers,

Jeff


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-15 18:12:11
Message-ID: BANLkTimvEtXpp4-XiXjf9P8FonuJBpEYZw@mail.gmail.com
Lists: pgsql-hackers

On Sat, May 14, 2011 at 1:33 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> Would that risk be substantially worse than it currently is?  If a
> backend goes into the tank while holding access shared locks, it will
> still block access exclusive locks until it recovers.  And those
> queued access exclusive locks will block new access shared locks from
> other backends.   How much is risk magnified by the new approach,
> going from "any backend holding the lock is tanked" to "any process at
> all is tanked"?

I think that's a pretty substantial increase in risk. Consider that
there may be 100 backends out there, one of which holds a relevant
lock. Needing to wait for all of them to do something instead of just
one is quite different.

Also, quite apart from the possibility of hanging altogether, the
latency would probably be increased quite a bit, and not in a very
predictable fashion.

I have the impression that most of the problem comes from fighting
over CPU cache lines. If that's correct, it may not be important to
avoid shared memory access per se; it may be good enough to arrange
things so that the shared memory which is accessed is *typically* not
being accessed by other backends.

> What I'd considered playing with in the past is having
> LockMethodLocalHash hang on to an Access Shared lock even after
> locallock->nLocks == 0, so that re-granting the lock would be a purely
> local operation.  Anyone wanting an Access Exclusive lock and not
> immediately getting it would have to send out a plea (via SINVA?) for
> other processes to release their locallock->nLocks == 0 locks.  But
> this would suffer from the same problem of tanked processes.

Yeah. I have thought about this, too, but as with Noah's suggestion,
I think this would make the risk of things hanging up substantially
worse than it is now. A backend that, under the present code,
wouldn't be holding an AccessShareLock at all, would now be holding
one that you'd have to convince it to release.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 01:15:27
Message-ID: BANLkTikvVVQsuQ_4ziY6EWATXVqoyZbjXA@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>        if (level >= ShareUpdateExclusiveLock)
>                ++strong_lock_counts[my_strong_lock_count_partition]
>                sfence
>                if (strong_lock_counts[my_strong_lock_count_partition] == 1)
>                        /* marker 1 */
>                        import_all_local_locks
>                normal_LockAcquireEx
>        else if (level <= RowExclusiveLock)
>                lfence
>                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
>                        /* marker 2 */
>                        local_only
>                        /* marker 3 */
>                else
>                        normal_LockAcquireEx
>        else
>                normal_LockAcquireEx
>
> At marker 1, we need to block until no code is running between markers two and
> three.  You could do that with a per-backend lock (LW_SHARED by the strong
> locker, LW_EXCLUSIVE by the backend).  That would probably still be a win over
> the current situation, but it would be nice to have something even cheaper.

Barring some brilliant idea, or anyway for a first cut, it seems to me
that we can adjust the above pseudocode by assuming the use of a
LWLock. In addition, two other adjustments: first, the first line
should test level > ShareUpdateExclusiveLock, rather than >=, per
previous discussion. Second, import_all_local_locks needn't really
move everything; just those locks with a matching locktag. Thus:

!       if (level > ShareUpdateExclusiveLock)
!               ++strong_lock_counts[my_strong_lock_count_partition]
!               sfence
!               for each backend
!                       take per-backend lwlock for target backend
!                       transfer fast-path entries with matching locktag
!                       release per-backend lwlock for target backend
!               normal_LockAcquireEx
!       else if (level <= RowExclusiveLock)
!               lfence
!               if (strong_lock_counts[my_strong_lock_count_partition] == 0)
!                       take per-backend lwlock for own backend
!                       fast-path lock acquisition
!                       release per-backend lwlock for own backend
!               else
!                       normal_LockAcquireEx
!       else
!               normal_LockAcquireEx

Now, a small fly in the ointment is that we haven't got, with
PostgreSQL, a portable library of memory primitives. So there isn't
an obvious way of doing that sfence/lfence business. Now, it seems to
me that in the "strong lock" case, the sfence isn't really needed
anyway, because we're about to start acquiring and releasing an lwlock
for every backend, and that had better act as a full memory barrier
anyhow, or we're doomed. The "weak lock" case is more interesting,
because we need the fence before we've taken any LWLock.

But perhaps it'd be sufficient to just acquire the per-backend lwlock
before checking strong_lock_counts[]. If, as we hope, we get back a
zero, then we do the fast-path lock acquisition, release the lwlock,
and away we go. If we get back any other value, then we've wasted an
lwlock acquisition cycle. Or actually maybe not: it seems to me that
in that case we'd better transfer all of our fast-path entries into
the main hash table before trying to acquire any lock the slow way, at
least if we don't want the deadlock detector to have to know about the
fast-path. So then we get this:

!       if (level > ShareUpdateExclusiveLock)
!               ++strong_lock_counts[my_strong_lock_count_partition]
!               for each backend
!                       take per-backend lwlock for target backend
!                       transfer fastpath entries with matching locktag
!                       release per-backend lwlock for target backend
!       else if (level <= RowExclusiveLock)
!               take per-backend lwlock for own backend
!               if (strong_lock_counts[my_strong_lock_count_partition] == 0)
!                       fast-path lock acquisition
!                       done = true
!               else
!                       transfer all fastpath entries
!               release per-backend lwlock for own backend
!       if (!done)
!               normal_LockAcquireEx

That seems like it ought to work, at least assuming the position of
your fencing instructions was correct in the first place. But there's
one big problem to worry about: what happens if the lock transfer
fails due to shared memory exhaustion? It's not so bad in the "weak
lock" case; it'll feel just like the already-existing case where you
try to push another lock into the shared-memory hash table and there's
no room. Essentially you've been living on borrowed time anyway. On
the other hand, the "strong lock" case is a real problem, because a
large number of granted fast-path locks can effectively DOS any strong
locker, even one that wouldn't have conflicted with them. That's
clearly not going to fly, but it's not clear to me what the best way
is to patch around it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 09:07:34
Message-ID: 20110524090734.GA18831@tornado.gateway.2wire.net
Lists: pgsql-hackers

On Mon, May 23, 2011 at 09:15:27PM -0400, Robert Haas wrote:
> On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> >        if (level >= ShareUpdateExclusiveLock)
> >                ++strong_lock_counts[my_strong_lock_count_partition]
> >                sfence
> >                if (strong_lock_counts[my_strong_lock_count_partition] == 1)
> >                        /* marker 1 */
> >                        import_all_local_locks
> >                normal_LockAcquireEx
> >        else if (level <= RowExclusiveLock)
> >                lfence
> >                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> >                        /* marker 2 */
> >                        local_only
> >                        /* marker 3 */
> >                else
> >                        normal_LockAcquireEx
> >        else
> >                normal_LockAcquireEx
> >
> > At marker 1, we need to block until no code is running between markers two and
> > three.  You could do that with a per-backend lock (LW_SHARED by the strong
> > locker, LW_EXCLUSIVE by the backend).  That would probably still be a win over
> > the current situation, but it would be nice to have something even cheaper.
>
> Barring some brilliant idea, or anyway for a first cut, it seems to me
> that we can adjust the above pseudocode by assuming the use of a
> LWLock. In addition, two other adjustments: first, the first line
> should test level > ShareUpdateExclusiveLock, rather than >=, per
> previous discussion. Second, import_all_local locks needn't really
> move everything; just those locks with a matching locktag. Thus:
>
> !       if (level > ShareUpdateExclusiveLock)
> !               ++strong_lock_counts[my_strong_lock_count_partition]
> !               sfence
> !               for each backend
> !                       take per-backend lwlock for target backend
> !                       transfer fast-path entries with matching locktag
> !                       release per-backend lwlock for target backend
> !               normal_LockAcquireEx
> !       else if (level <= RowExclusiveLock)
> !               lfence
> !               if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> !                       take per-backend lwlock for own backend
> !                       fast-path lock acquisition
> !                       release per-backend lwlock for own backend
> !               else
> !                       normal_LockAcquireEx
> !       else
> !               normal_LockAcquireEx

This drops the part about only transferring fast-path entries once when a
strong_lock_counts cell transitions from zero to one. Granted, that itself
requires some yet-undiscussed locking. For that matter, we can't have
multiple strong lockers completing transfers on the same cell in parallel.
Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each
strong locker holds for that entire "if" body and while decrementing the
strong_lock_counts cell at lock release.

As far as the level of detail of this pseudocode goes, there's no need to hold
the per-backend LWLock while transferring the fast-path entries. You just
need to hold it sometime between bumping strong_lock_counts and transferring
the backend's locks. This ensures that, for example, the backend is not
sleeping in the middle of a fast-path lock acquisition for the whole duration
of this code.

> Now, a small fly in the ointment is that we haven't got, with
> PostgreSQL, a portable library of memory primitives. So there isn't
> an obvious way of doing that sfence/lfence business.

I was thinking that, if the final implementation could benefit from memory
barrier interfaces, we should create those interfaces now. Start with only a
platform-independent dummy implementation that runs a lock/unlock cycle on a
spinlock residing in backend-local memory. I'm 75% sure that would be
sufficient on all architectures for which we support spinlocks. It may turn
out that we can't benefit from such interfaces at this time ...
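
i.e., roughly this as the fallback (completely untested; the names are
placeholders):

    /* dummy memory barrier: a lock/unlock cycle on a backend-local spinlock,
     * relied on only for the hardware barrier it implies; S_INIT_LOCK() it
     * once at process start */
    static slock_t dummy_barrier_lock;

    #define pg_memory_barrier() \
        do { \
            S_LOCK(&dummy_barrier_lock); \
            S_UNLOCK(&dummy_barrier_lock); \
        } while (0)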

> Now, it seems to
> me that in the "strong lock" case, the sfence isn't really needed
> anyway, because we're about to start acquiring and releasing an lwlock
> for every backend, and that had better act as a full memory barrier
> anyhow, or we're doomed. The "weak lock" case is more interesting,
> because we need the fence before we've taken any LWLock.

Agreed.

> But perhaps it'd be sufficient to just acquire the per-backend lwlock
> before checking strong_lock_counts[]. If, as we hope, we get back a
> zero, then we do the fast-path lock acquisition, release the lwlock,
> and away we go. If we get back any other value, then we've wasted an
> lwlock acquisition cycle. Or actually maybe not: it seems to me that
> in that case we'd better transfer all of our fast-path entries into
> the main hash table before trying to acquire any lock the slow way, at
> least if we don't want the deadlock detector to have to know about the
> fast-path. So then we get this:
>
> !       if (level > ShareUpdateExclusiveLock)
> !               ++strong_lock_counts[my_strong_lock_count_partition]
> !               for each backend
> !                       take per-backend lwlock for target backend
> !                       transfer fastpath entries with matching locktag
> !                       release per-backend lwlock for target backend
> !       else if (level <= RowExclusiveLock)
> !               take per-backend lwlock for own backend
> !               if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> !                       fast-path lock acquisition
> !                       done = true
> !               else
> !                       transfer all fastpath entries
> !               release per-backend lwlock for own backend
> !       if (!done)
> !               normal_LockAcquireEx

Could you elaborate on the last part (the need for "else transfer all fastpath
entries") and, specifically, how it aids deadlock avoidance? I didn't think
this change would have any impact on deadlocks, because all relevant locks
will be in the global lock table before any call to normal_LockAcquireEx.

To validate the locking at this level of detail, I think we need to sketch the
unlock protocol, too. On each strong lock release, we'll decrement the
strong_lock_counts cell. No particular interlock with fast-path lockers
should be needed; a stray AccessShareLock needlessly making it into the global
lock table is no problem. As mentioned above, we _will_ need an interlock
with lock transfer operations. How will transferred fast-path locks get
removed from the global lock table? Presumably, the original fast-path locker
should do so at transaction end; anything else would contort the life cycle.
Then add a way for the backend to know which locks had been transferred as
well as an interlock against concurrent transfer operations. Maybe that's
all.

> That seems like it ought to work, at least assuming the position of
> your fencing instructions was correct in the first place. But there's
> one big problem to worry about: what happens if the lock transfer
> fails due to shared memory exhaustion? It's not so bad in the "weak
> lock" case; it'll feel just like the already-existing case where you
> try to push another lock into the shared-memory hash table and there's
> no room. Essentially you've been living on borrowed time anyway. On
> the other hand, the "strong lock" case is a real problem, because a
> large number of granted fast-path locks can effectively DOS any strong
> locker, even one that wouldn't have conflicted with them. That's
> clearly not going to fly, but it's not clear to me what the best way
> is to patch around it.

To put it another way: the current system is fair; the chance of hitting lock
exhaustion is independent of lock level. The new system would be unfair; lock
exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock
acquisition, through no fault of that transaction. I agree this isn't ideal,
but it doesn't look to me like an unacceptable weakness. Making lock slots
first-come, first-served is inherently unfair; we're not at all set up to
justly arbitrate between mutually-hostile lockers competing for slots. The
overall situation will get better, not worse, for the admin who wishes to
defend against hostile unprivileged users attempting a lock table DOS.

Thanks,
nm


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 12:53:11
Message-ID: BANLkTikxUXEyupnyOX+2zTjav4Pxz9txBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 24, 2011 at 5:07 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> This drops the part about only transferring fast-path entries once when a
> strong_lock_counts cell transitions from zero to one.

Right: that's because I don't think that's what we want to do. I
don't think we want to transfer all per-backend locks to the shared
hash table as soon as anyone attempts to acquire a strong lock;
instead, I think we want to transfer only those fast-path locks which
have the same locktag as the strong lock someone is attempting to
acquire. If we do that, then it doesn't matter whether the
strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7:
we still have to check for strong locks with that particular locktag.

> Granted, that itself
> requires some yet-undiscussed locking.  For that matter, we can't have
> multiple strong lockers completing transfers on the same cell in parallel.
> Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each
> strong locker holds for that entire "if" body and while decrementing the
> strong_lock_counts cell at lock release.

I was imagining that the per-backend LWLock would protect the list of
fast-path locks. So to transfer locks, you would acquire the
per-backend LWLock for the backend which has the lock, and then the
lock manager partition LWLock, and then perform the transfer.
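
A rough standalone model of that locking order (per-backend lock first, then
the lock-manager partition lock, then move and delete the matching entries)
might look like the sketch below. Backends are modeled as array slots, pthread
mutexes stand in for LWLocks, a single partition lock stands in for the real
per-partition locks, and every name here is invented rather than taken from
any patch.

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BACKENDS          64
#define FP_SLOTS_PER_BACKEND  16

typedef struct
{
    pthread_mutex_t fp_lock;                    /* "per-backend LWLock" */
    uint32_t        fp_locktag[FP_SLOTS_PER_BACKEND];
    bool            fp_used[FP_SLOTS_PER_BACKEND];
} FastPathQueue;

static FastPathQueue   backends[MAX_BACKENDS];
static pthread_mutex_t partition_lock = PTHREAD_MUTEX_INITIALIZER; /* one partition, simplified */

/* Call once at startup to set up the per-backend locks. */
static void
init_fastpath_queues(void)
{
    for (int b = 0; b < MAX_BACKENDS; b++)
        pthread_mutex_init(&backends[b].fp_lock, NULL);
}

/* Stand-in for inserting an entry, on behalf of its original holder,
 * into the shared lock hash table. */
static void
shared_table_insert(int backend_id, uint32_t locktag)
{
    (void) backend_id;
    (void) locktag;
}

static void
transfer_matching_fastpath_locks(uint32_t locktag)
{
    for (int b = 0; b < MAX_BACKENDS; b++)
    {
        FastPathQueue *q = &backends[b];

        pthread_mutex_lock(&q->fp_lock);        /* per-backend lock first */
        pthread_mutex_lock(&partition_lock);    /* then the partition lock */
        for (int i = 0; i < FP_SLOTS_PER_BACKEND; i++)
        {
            if (q->fp_used[i] && q->fp_locktag[i] == locktag)
            {
                shared_table_insert(b, locktag); /* move it to the shared table */
                q->fp_used[i] = false;           /* and drop the fast-path copy */
            }
        }
        pthread_mutex_unlock(&partition_lock);
        pthread_mutex_unlock(&q->fp_lock);
    }
}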

> As far as the level of detail of this pseudocode goes, there's no need to hold
> the per-backend LWLock while transferring the fast-path entries.  You just
> need to hold it sometime between bumping strong_lock_counts and transferring
> the backend's locks.  This ensures that, for example, the backend is not
> sleeping in the middle of a fast-path lock acquisition for the whole duration
> of this code.

See above; I'm lost.

>> Now, a small fly in the ointment is that we haven't got, with
>> PostgreSQL, a portable library of memory primitives.  So there isn't
>> an obvious way of doing that sfence/lfence business.
>
> I was thinking that, if the final implementation could benefit from memory
> barrier interfaces, we should create those interfaces now.  Start with only a
> platform-independent dummy implementation that runs a lock/unlock cycle on a
> spinlock residing in backend-local memory.  I'm 75% sure that would be
> sufficient on all architectures for which we support spinlocks.  It may turn
> out that we can't benefit from such interfaces at this time ...

OK.

>> Now, it seems to
>> me that in the "strong lock" case, the sfence isn't really needed
>> anyway, because we're about to start acquiring and releasing an lwlock
>> for every backend, and that had better act as a full memory barrier
>> anyhow, or we're doomed.  The "weak lock" case is more interesting,
>> because we need the fence before we've taken any LWLock.
>
> Agreed.
>
>> But perhaps it'd be sufficient to just acquire the per-backend lwlock
>> before checking strong_lock_counts[].  If, as we hope, we get back a
>> zero, then we do the fast-path lock acquisition, release the lwlock,
>> and away we go.  If we get back any other value, then we've wasted an
>> lwlock acquisition cycle.  Or actually maybe not: it seems to me that
>> in that case we'd better transfer all of our fast-path entries into
>> the main hash table before trying to acquire any lock the slow way, at
>> least if we don't want the deadlock detector to have to know about the
>> fast-path.  So then we get this:
>>
>> !        if (level > ShareUpdateExclusiveLock)
>> !                ++strong_lock_counts[my_strong_lock_count_partition]
>> !                for each backend
>> !                        take per-backend lwlock for target backend
>> !                        transfer fastpath entries with matching locktag
>> !                        release per-backend lwlock for target backend
>> !        else if (level <= RowExclusiveLock)
>> !                take per-backend lwlock for own backend
>> !                if (strong_lock_counts[my_strong_lock_count_partition] == 0)
>> !                        fast-path lock acquisition
>> !                        done = true
>> !                else
>> !                        transfer all fastpath entries
>> !                release per-backend lwlock for own backend
>> !        if (!done)
>> !                normal_LockAcquireEx
>
> Could you elaborate on the last part (the need for "else transfer all fastpath
> entries") and, specifically, how it aids deadlock avoidance?  I didn't think
> this change would have any impact on deadlocks, because all relevant locks
> will be in the global lock table before any call to normal_LockAcquireEx.

Oh, hmm, maybe you're right. I was concerned about the possibility
of a backend which already holds locks going to sleep on a lock
wait, and maybe running the deadlock detector, and failing to notice a
deadlock. But I guess that can't happen: if any of the locks it holds
are relevant to the deadlock detector, the backend attempting to
acquire those locks will transfer them before attempting to acquire
the lock itself, so it should be OK.

> To validate the locking at this level of detail, I think we need to sketch the
> unlock protocol, too.  On each strong lock release, we'll decrement the
> strong_lock_counts cell.  No particular interlock with fast-path lockers
> should be needed; a stray AccessShareLock needlessly making it into the global
> lock table is no problem.  As mentioned above, we _will_ need an interlock
> with lock transfer operations.  How will transferred fast-path locks get
> removed from the global lock table?  Presumably, the original fast-path locker
> should do so at transaction end; anything else would contort the life cycle.
> Then add a way for the backend to know which locks had been transferred as
> well as an interlock against concurrent transfer operations.  Maybe that's
> all.

I'm thinking that the backend can note, in its local-lock table,
whether it originally acquired a lock via the fast-path or not. Any
lock not originally acquired via the fast-path will be released just
as now. For any lock that WAS originally acquired via the fast-path,
we'll take our own per-backend lwlock, which protects the fast-path
queue, and scan the fast-path queue for a matching entry. If none is
found, then we know the lock was transferred, so release the
per-backend lwlock and do it the regular way (take lock manager
partition lock, etc.).

At transaction end, we need to release all non-session locks, so we
can consolidate a bit to avoid excess locking and unlocking. Take the
per-backend lwlock just once and scan through the queue. Any locks we
find there (that are not session locks) can be nuked from the
local-lock table and we're done. Release the per-backend lwlock. At
this point, any remaining locks that need to be released are in the
shared hash tables and we can proceed as now (see LockReleaseAll -
basically, we iterate over the lock partitions).
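
As a standalone sketch of those two release paths (a single lock originally
taken via the fast path, and the consolidated end-of-transaction pass), with a
pthread mutex standing in for the per-backend LWLock and all names invented
for illustration:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define FP_SLOTS 16

typedef struct
{
    pthread_mutex_t fp_lock;                 /* "per-backend LWLock" */
    uint32_t        locktag[FP_SLOTS];
    bool            used[FP_SLOTS];
    bool            session_lock[FP_SLOTS];
} MyFastPathQueue;

static MyFastPathQueue my_queue = { .fp_lock = PTHREAD_MUTEX_INITIALIZER };

/* Stand-ins for the existing slow-path release and for today's
 * LockReleaseAll() walk over the lock partitions. */
static void slow_path_release(uint32_t locktag) { (void) locktag; }
static void release_remaining_shared_locks(void) { }

/* Release one lock that the local lock table says was originally taken
 * via the fast path. */
static void
release_fastpath_origin_lock(uint32_t locktag)
{
    bool    found = false;

    pthread_mutex_lock(&my_queue.fp_lock);
    for (int i = 0; i < FP_SLOTS; i++)
    {
        if (my_queue.used[i] && my_queue.locktag[i] == locktag)
        {
            my_queue.used[i] = false;        /* still here: fast-path release */
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&my_queue.fp_lock);

    if (!found)
        slow_path_release(locktag);          /* it was transferred; do it the regular way */
}

/* The consolidated pass at transaction end. */
static void
release_all_at_xact_end(void)
{
    pthread_mutex_lock(&my_queue.fp_lock);
    for (int i = 0; i < FP_SLOTS; i++)
    {
        if (my_queue.used[i] && !my_queue.session_lock[i])
            my_queue.used[i] = false;        /* anything still here can just be dropped */
    }
    pthread_mutex_unlock(&my_queue.fp_lock);

    /* Whatever remains (transferred or slow-path locks) lives in the
     * shared tables and is released as it is today. */
    release_remaining_shared_locks();
}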

> To put it another way: the current system is fair; the chance of hitting lock
> exhaustion is independent of lock level.  The new system would be unfair; lock
> exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock
> acquisition, through no fault of that transaction.  I agree this isn't ideal,
> but it doesn't look to me like an unacceptable weakness.  Making lock slots
> first-come, first-served is inherently unfair; we're not at all set up to
> justly arbitrate between mutually-hostile lockers competing for slots.  The
> overall situation will get better, not worse, for the admin who wishes to
> defend against hostile unprivileged users attempting a lock table DOS.

Well, it's certainly true that the proposed system is far less likely
to bomb out trying to acquire an AccessShareLock than what we have
today, since in the common case the AccessShareLock doesn't use up any
shared resources. And that should make a lot of people happy. But as
to the bad scenario, one needn't presume that the lockers are hostile
- it may just be that the system is running on the edge of a full lock
table. In the worst case, someone wanting a strong lock on a table
may end up transferring a hundred or more locks (up to one per
backend) into that table. Even if they all fit and the strong locker
gets his lock, it may now happen that the space is just about
exhausted and other transactions start rolling back, apparently at
random.

Or, even more pathologically, one backend grabs a strong lock on table
X, but it just so happens that there is another table Y in the same
lock partition which is highly trafficked but, as all of the locks
involved are weak, uncontended. Now that strong_lock_counts[] is
non-zero for that partition, all those locks start going into the main
lock manager table. Performance will drop, which isn't great but we
can live with it. But maybe all the locks don't fit. Now we have a
situation in which, due to one backend acquiring a strong lock on
table X, a bunch of other backends are unable to obtain weak locks on
table Y, and transactions start rolling back all over the place.

Now maybe you can argue that this scenario is sufficiently unlikely in
practice that we shouldn't worry about it, but if it does happen the
DBA will be incredibly confused, because an apparently innocuous
operation on one table will have resulted in rollbacks of transactions acquiring locks
on an apparently unrelated table. Maybe you want to argue that's
sufficiently rare that we shouldn't worry about it, but the
unpleasantness factor seems pretty high to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 14:03:04
Message-ID: 20110524140304.GA21833@tornado.gateway.2wire.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 24, 2011 at 08:53:11AM -0400, Robert Haas wrote:
> On Tue, May 24, 2011 at 5:07 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > This drops the part about only transferring fast-path entries once when a
> > strong_lock_counts cell transitions from zero to one.
>
> Right: that's because I don't think that's what we want to do. I
> don't think we want to transfer all per-backend locks to the shared
> hash table as soon as anyone attempts to acquire a strong lock;
> instead, I think we want to transfer only those fast-path locks which
> have the same locktag as the strong lock someone is attempting to
> acquire. If we do that, then it doesn't matter whether the
> strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7:
> we still have to check for strong locks with that particular locktag.

Oh, I see. I was envisioning that you'd transfer all locks associated with
the strong_lock_counts cell; that is, all the locks that would now go directly
to the global lock table when requested going forward. Transferring only
exact matches seems fine too, and then I agree with your other conclusions.

> > Granted, that itself
> > requires some yet-undiscussed locking.  For that matter, we can't have
> > multiple strong lockers completing transfers on the same cell in parallel.
> > Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each
> > strong locker holds for that entire "if" body and while decrementing the
> > strong_lock_counts cell at lock release.
>
> I was imagining that the per-backend LWLock would protect the list of
> fast-path locks. So to transfer locks, you would acquire the
> per-backend LWLock for the backend which has the lock, and then the
> lock manager partition LWLock, and then perform the transfer.

I see later in your description that the transferer will delete each fast-path
lock after transferring it. Given that, this does sound adequate.

> > As far as the level of detail of this pseudocode goes, there's no need to hold
> > the per-backend LWLock while transferring the fast-path entries.  You just
> > need to hold it sometime between bumping strong_lock_counts and transferring
> > the backend's locks.  This ensures that, for example, the backend is not
> > sleeping in the middle of a fast-path lock acquisition for the whole duration
> > of this code.
>
> See above; I'm lost.

It wasn't a particularly useful point.

> > To validate the locking at this level of detail, I think we need to sketch the
> > unlock protocol, too.  On each strong lock release, we'll decrement the
> > strong_lock_counts cell.  No particular interlock with fast-path lockers
> > should be needed; a stray AccessShareLock needlessly making it into the global
> > lock table is no problem.  As mentioned above, we _will_ need an interlock
> > with lock transfer operations.  How will transferred fast-path locks get
> > removed from the global lock table?  Presumably, the original fast-path locker
> > should do so at transaction end; anything else would contort the life cycle.
> > Then add a way for the backend to know which locks had been transferred as
> > well as an interlock against concurrent transfer operations.  Maybe that's
> > all.
>
> I'm thinking that the backend can note, in its local-lock table,
> whether it originally acquired a lock via the fast-path or not. Any
> lock not originally acquired via the fast-path will be released just
> as now. For any lock that WAS originally acquired via the fast-path,
> we'll take our own per-backend lwlock, which protects the fast-path
> queue, and scan the fast-path queue for a matching entry. If none is
> found, then we know the lock was transferred, so release the
> per-backend lwlock and do it the regular way (take lock manager
> partition lock, etc.).

Sounds good.

> > To put it another way: the current system is fair; the chance of hitting lock
> > exhaustion is independent of lock level.  The new system would be unfair; lock
> > exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock
> > acquisition, through no fault of that transaction.  I agree this isn't ideal,
> > but it doesn't look to me like an unacceptable weakness.  Making lock slots
> > first-come, first-served is inherently unfair; we're not at all set up to
> > justly arbitrate between mutually-hostile lockers competing for slots.  The
> > overall situation will get better, not worse, for the admin who wishes to
> > defend against hostile unprivileged users attempting a lock table DOS.
>
> Well, it's certainly true that the proposed system is far less likely
> to bomb out trying to acquire an AccessShareLock than what we have
> today, since in the common case the AccessShareLock doesn't use up any
> shared resources. And that should make a lot of people happy. But as
> to the bad scenario, one needn't presume that the lockers are hostile
> - it may just be that the system is running on the edge of a full lock
> table. In the worst case, someone wanting a strong lock on a table
> may end up transferring a hundred or more locks (up to one per
> backend) into that table. Even if they all fit and the strong locker
> gets his lock, it may now happen that the space is just about
> exhausted and other transactions start rolling back, apparently at
> random.
>
> Or, even more pathologically, one backend grabs a strong lock on table
> X, but it just so happens that there is another table Y in the same
> lock partition which is highly trafficked but, as all of the locks
> involved are weak, uncontended. Now that strong_lock_counts[] is
> non-zero for that partition, all those locks start going into the main
> lock manager table. Performance will drop, which isn't great but we
> can live with it. But maybe all the locks don't fit. Now we have a
> situation in which, due to one backend acquiring a strong lock on
> table X, a bunch of other backends are unable to obtain weak locks on
> table Y, and transactions start rolling back all over the place.
>
> Now maybe you can argue that this scenario is sufficiently unlikely in
> practice that we shouldn't worry about it, but if it does happen the
> DBA will be incredibly confused, because an apparently innocuous
> operation on one table will have resulted in rollbacks of transactions acquiring locks
> on an apparently unrelated table. Maybe you want to argue that's
> sufficiently rare that we shouldn't worry about it, but the
> unpleasantness factor seems pretty high to me.

Let's see if I understand the risk better now: the new system will handle lock
load better, but when it does hit a limit, understanding why that happened
will be more difficult. Good point. No silver-bullet ideas come to mind for
avoiding that. A lock transfer operation could abort if it sees itself
burning all the shared memory, but I'd be pessimistic about identifying a
concrete heuristic not subject to its own confusing corner cases. Overall,
the original outcome doesn't sound especially confusing to me. YMMV.

Will the pg_locks view scan fast-path lock tables? If not, we probably need
another view that does. We can then encourage administrators to monitor for
fast-path lock counts that get high relative to shared memory capacity.

Thanks,
nm


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 14:35:23
Message-ID: BANLkTi=3KD=9V-Lmc5uODMCci=pzBp-43g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 24, 2011 at 10:03 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> Let's see if I understand the risk better now: the new system will handle lock
> load better, but when it does hit a limit, understanding why that happened
> will be more difficult.  Good point.  No silver-bullet ideas come to mind for
> avoiding that.

The only idea I can think of is to try to impose some bounds. For
example, suppose we track the total number of locks that the system
can handle in the shared hash table. We try to maintain the system in
a state where the number of locks that actually exist is less than
that number, even though some of them may be stored elsewhere. You
can imagine a system where backends grab a global mutex to reserve a
certain number of slots, and store that in shared memory together with
their fast-path list, but another backend which is desperate for space
can go through and trim back reservations to actual usage.
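
A rough standalone model of that reservation scheme (capacity numbers, names,
and the trimming policy are all invented for illustration, not taken from any
patch):

#include <pthread.h>
#include <stdbool.h>

#define LOCK_TABLE_CAPACITY 1024
#define MAX_BACKENDS        64

static pthread_mutex_t reservation_mutex = PTHREAD_MUTEX_INITIALIZER;
static int total_reserved;                 /* slots promised to backends */
static int reserved[MAX_BACKENDS];         /* per-backend reservation */
static int in_use[MAX_BACKENDS];           /* per-backend actual usage (updated elsewhere) */

/* Try to reserve extra slots for one backend before it takes more
 * fast-path locks; fail rather than overcommit the shared table. */
static bool
reserve_slots(int backend_id, int n)
{
    bool    ok = false;

    pthread_mutex_lock(&reservation_mutex);
    if (total_reserved + n <= LOCK_TABLE_CAPACITY)
    {
        reserved[backend_id] += n;
        total_reserved += n;
        ok = true;
    }
    pthread_mutex_unlock(&reservation_mutex);
    return ok;
}

/* A backend that is desperate for space trims everyone's reservation
 * back to what they actually use. */
static void
trim_reservations(void)
{
    pthread_mutex_lock(&reservation_mutex);
    for (int b = 0; b < MAX_BACKENDS; b++)
    {
        if (reserved[b] > in_use[b])
        {
            total_reserved -= reserved[b] - in_use[b];
            reserved[b] = in_use[b];
        }
    }
    pthread_mutex_unlock(&reservation_mutex);
}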

> Will the pg_locks view scan fast-path lock tables?  If not, we probably need
> another view that does.  We can then encourage administrators to monitor for
> fast-path lock counts that get high relative to shared memory capacity.

I think pg_locks should probably scan the fast-path tables.

Another random idea for optimization: we could have a lock-free array
with one entry per backend, indicating whether any fast-path locks are
present. Before acquiring its first fast-path lock, a backend writes
a 1 into that array and inserts a store fence. After releasing its
last fast-path lock, it performs a store fence and writes a 0 into the
array. Anyone who needs to grovel through all the per-backend
fast-path arrays for whatever reason can perform a load fence and then
scan the array. If I understand how this stuff works (and it's very
possible that I don't), when the scanning backend sees a 0, it can be
assured that the target backend has no fast-path locks and therefore
doesn't need to acquire and release that LWLock or scan that fast-path
array for entries.
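
In C11-atomics terms, the mechanics being described might look like the sketch
below (names invented; the fence placement is an approximate rendering of the
sfence/lfence wording above, and the follow-up messages in this thread point
out a race in this scheme, so this illustrates the proposal as stated rather
than a proven protocol):

#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BACKENDS 64

static atomic_bool fastpath_used[MAX_BACKENDS];

/* Before taking its first fast-path lock. */
static void
announce_fastpath_use(int backend_id)
{
    atomic_store_explicit(&fastpath_used[backend_id], true,
                          memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* the "store fence" */
}

/* After releasing its last fast-path lock. */
static void
retract_fastpath_use(int backend_id)
{
    atomic_thread_fence(memory_order_release);   /* the "store fence" */
    atomic_store_explicit(&fastpath_used[backend_id], false,
                          memory_order_relaxed);
}

/* A scanner may skip backends whose flag reads as false. */
static bool
backend_may_hold_fastpath_locks(int backend_id)
{
    atomic_thread_fence(memory_order_acquire);   /* the "load fence" */
    return atomic_load_explicit(&fastpath_used[backend_id],
                                memory_order_relaxed);
}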

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 15:38:52
Message-ID: 20110524153852.GC21833@tornado.gateway.2wire.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 24, 2011 at 10:35:23AM -0400, Robert Haas wrote:
> On Tue, May 24, 2011 at 10:03 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > Let's see if I understand the risk better now: the new system will handle lock
> > load better, but when it does hit a limit, understanding why that happened
> > will be more difficult.  Good point.  No silver-bullet ideas come to mind for
> > avoiding that.
>
> The only idea I can think of is to try to impose some bounds. For
> example, suppose we track the total number of locks that the system
> can handle in the shared hash table. We try to maintain the system in
> a state where the number of locks that actually exist is less than
> that number, even though some of them may be stored elsewhere. You
> can imagine a system where backends grab a global mutex to reserve a
> certain number of slots, and store that in shared memory together with
> their fast-path list, but another backend which is desperate for space
> can go through and trim back reservations to actual usage.

Forcing artificial resource exhaustion is a high price to pay. I suppose it's
quite like disabling Linux memory overcommit, and some folks would like it.

> Another random idea for optimization: we could have a lock-free array
> with one entry per backend, indicating whether any fast-path locks are
> present. Before acquiring its first fast-path lock, a backend writes
> a 1 into that array and inserts a store fence. After releasing its
> last fast-path lock, it performs a store fence and writes a 0 into the
> array. Anyone who needs to grovel through all the per-backend
> fast-path arrays for whatever reason can perform a load fence and then
> scan the array. If I understand how this stuff works (and it's very
> possible that I don't), when the scanning backend sees a 0, it can be
> assured that the target backend has no fast-path locks and therefore
> doesn't need to acquire and release that LWLock or scan that fast-path
> array for entries.

I'm probably just missing something, but can't that conclusion become obsolete
arbitrarily quickly? What if the scanning backend sees a 0, and the subject
backend is currently sleeping just before it would have bumped that value? We
need to take the LWLock if there's any chance that the subject backend has not
yet seen the scanning backend's strong_lock_counts[] update.

nm


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 15:52:54
Message-ID: BANLkTik4e-r0Z7P=1mGbdthjD91-MnCe5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 24, 2011 at 11:38 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> Another random idea for optimization: we could have a lock-free array
>> with one entry per backend, indicating whether any fast-path locks are
>> present.  Before acquiring its first fast-path lock, a backend writes
>> a 1 into that array and inserts a store fence.  After releasing its
>> last fast-path lock, it performs a store fence and writes a 0 into the
>> array.  Anyone who needs to grovel through all the per-backend
>> fast-path arrays for whatever reason can perform a load fence and then
>> scan the array.  If I understand how this stuff works (and it's very
>> possible that I don't), when the scanning backend sees a 0, it can be
>> assured that the target backend has no fast-path locks and therefore
>> doesn't need to acquire and release that LWLock or scan that fast-path
>> array for entries.
>
> I'm probably just missing something, but can't that conclusion become obsolete
> arbitrarily quickly?  What if the scanning backend sees a 0, and the subject
> backend is currently sleeping just before it would have bumped that value?  We
> need to take the LWLock if there's any chance that the subject backend has not
> yet seen the scanning backend's strong_lock_counts[] update.

Can't we bump strong_lock_counts[] *first*, make sure that change is
globally visible, and only then start scanning the array?

Once we've bumped strong_lock_counts[] and made sure everyone can see
that change, it's still possible for backends to take a fast-path lock
in some *other* fast-path partition, but nobody should be able to add
any more fast-path locks in the partition we care about after that
point.
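
A minimal standalone sketch of that ordering, with C11 atomics standing in for
whatever primitives would really be used and all names invented (the next
message points out that this alone still leaves a window on the weak-lock
side):

#include <stdatomic.h>

#define FP_LOCK_PARTITIONS 16
#define MAX_BACKENDS       64

static atomic_uint strong_lock_counts[FP_LOCK_PARTITIONS];
static atomic_bool fastpath_used[MAX_BACKENDS];

/* Stand-in for transferring one backend's matching fast-path entries. */
static void transfer_backend_fastpath_locks(int backend_id) { (void) backend_id; }

static void
prepare_strong_lock(unsigned partition)
{
    /* Bump the counter first; the seq_cst increment is ordered before
     * the scan below, mirroring the flag-before-check ordering that the
     * weak-lock side would use. */
    atomic_fetch_add(&strong_lock_counts[partition], 1);

    /* Only then look at the per-backend state; any backend that checks
     * the counter after this point avoids the fast path for this
     * partition. */
    for (int b = 0; b < MAX_BACKENDS; b++)
    {
        if (atomic_load(&fastpath_used[b]))
            transfer_backend_fastpath_locks(b);
    }
}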

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 16:34:26
Message-ID: 20110524163426.GD21833@tornado.gateway.2wire.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 24, 2011 at 11:52:54AM -0400, Robert Haas wrote:
> On Tue, May 24, 2011 at 11:38 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> >> Another random idea for optimization: we could have a lock-free array
> >> with one entry per backend, indicating whether any fast-path locks are
> >> present.  Before acquiring its first fast-path lock, a backend writes
> >> a 1 into that array and inserts a store fence.  After releasing its
> >> last fast-path lock, it performs a store fence and writes a 0 into the
> >> array.  Anyone who needs to grovel through all the per-backend
> >> fast-path arrays for whatever reason can perform a load fence and then
> >> scan the array.  If I understand how this stuff works (and it's very
> >> possible that I don't), when the scanning backend sees a 0, it can be
> >> assured that the target backend has no fast-path locks and therefore
> >> doesn't need to acquire and release that LWLock or scan that fast-path
> >> array for entries.
> >
> > I'm probably just missing something, but can't that conclusion become obsolete
> > arbitrarily quickly?  What if the scanning backend sees a 0, and the subject
> > backend is currently sleeping just before it would have bumped that value?  We
> > need to take the LWLock if there's any chance that the subject backend has not
> > yet seen the scanning backend's strong_lock_counts[] update.
>
> Can't we bump strong_lock_counts[] *first*, make sure that change is
> globally visible, and only then start scanning the array?
>
> Once we've bumped strong_lock_counts[] and made sure everyone can see
> that change, it's still possible for backends to take a fast-path lock
> in some *other* fast-path partition, but nobody should be able to add
> any more fast-path locks in the partition we care about after that
> point.

There's a potentially-unbounded delay between when the subject backend reads
strong_lock_counts[] and when it sets its fast-path-used flag. (I didn't mean
"not yet seen" in the sense that some memory load would not show the latest
value. I just meant that the subject backend may still be taking relevant
actions based on its previous load of the value.) We could have the subject
set its fast-path-used flag before even checking strong_lock_counts[], then
clear the flag when strong_lock_counts[] dissuaded it from proceeding. Maybe
that's what you had in mind?
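
For illustration, that ordering on the weak-lock side could be modeled roughly
as below, using C11 atomics; the names are invented, and whether the ordering
argument holds on every platform is exactly the kind of thing the earlier
memory-barrier discussion would have to settle.

#include <stdatomic.h>
#include <stdbool.h>

#define FP_LOCK_PARTITIONS 16

static atomic_bool my_fastpath_used;        /* this backend's flag */
static atomic_uint strong_lock_counts[FP_LOCK_PARTITIONS];

/* Stand-in for adding an entry to this backend's fast-path queue. */
static void fastpath_insert(unsigned partition) { (void) partition; }

static bool
try_fastpath_weak_lock(unsigned partition)
{
    /* Announce intent before looking at the counter... */
    atomic_store(&my_fastpath_used, true);

    /* ...so that, with the counter bump on the strong-lock side also a
     * seq_cst operation, either our check below sees the bump, or the
     * strong locker's later scan sees our flag. */
    if (atomic_load(&strong_lock_counts[partition]) != 0)
    {
        /* Dissuaded: clear the flag (only safe if we hold no other
         * fast-path locks, which this toy sketch does not track) and
         * let the caller fall back to the slow path. */
        atomic_store(&my_fastpath_used, false);
        return false;
    }

    fastpath_insert(partition);
    return true;
}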

That being said, it's a slight extra cost for all fast-path lockers to benefit
the strong lockers, so I'm not prepared to guess whether it will pay off.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-24 17:37:38
Message-ID: BANLkTi=CN8GJJki5UvfuSMm86g+bMFt_-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 24, 2011 at 12:34 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> There's a potentially-unbounded delay between when the subject backend reads
> strong_lock_counts[] and when it sets its fast-path-used flag.  (I didn't mean
> "not yet seen" in the sense that some memory load would not show the latest
> value.  I just meant that the subject backend may still be taking relevant
> actions based on its previous load of the value.)  We could have the subject
> set its fast-path-used flag before even checking strong_lock_counts[], then
> clear the flag when strong_lock_counts[] dissuaded it from proceeding.  Maybe
> that's what you had in mind?

I'd like to say yes, but actually, no, I just failed to notice the
race condition. It's definitely less appealing if we have to do it
that way.

Another idea would be to only clear the fast-path-used flags lazily.
If backend A inspects the fast-path queue for backend B and finds it
completely empty, it clears the flag; otherwise it just stays set
indefinitely.
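
A tiny sketch of that lazy clearing, with invented names, the queue scan
stubbed out, and the question of exactly what interlock the scanner holds
while looking left aside, as in the surrounding discussion:

#include <stdatomic.h>
#include <stdbool.h>

#define MAX_BACKENDS 64

static atomic_bool fastpath_used[MAX_BACKENDS];

/* Stand-in: returns true if backend b's fast-path queue has any entries;
 * assumed to be called with that backend's per-backend lock held. */
static bool fastpath_queue_nonempty(int b) { (void) b; return false; }

static void
maybe_clear_fastpath_flag(int b)
{
    /* Clear only when the queue is found completely empty; otherwise the
     * flag just stays set, as described above. */
    if (atomic_load(&fastpath_used[b]) && !fastpath_queue_nonempty(b))
        atomic_store(&fastpath_used[b], false);
}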

> That being said, it's a slight extra cost for all fast-path lockers to benefit
> the strong lockers, so I'm not prepared to guess whether it will pay off.

Yeah. Basically this entire idea is about trying to make life easier
for weak lockers at the expense of making it more difficult for strong
lockers. I think that's a good trade-off in general, but we might
need to wait until we have an actual implementation to judge whether
we've turned the dial too far.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-25 12:27:47
Message-ID: BANLkTinhhL8e3hhrQUiL4H=7XHMY++oErg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 24, 2011 at 6:37 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> That being said, it's a slight extra cost for all fast-path lockers to benefit
>> the strong lockers, so I'm not prepared to guess whether it will pay off.
>
> Yeah.  Basically this entire idea is about trying to make life easier
> for weak lockers at the expense of making it more difficult for strong
> lockers.  I think that's a good trade-off in general, but we might
> need to wait until we have an actual implementation to judge whether
> we've turned the dial too far.

I like this overall concept and like the way this has been described
with strong and weak locks. It seems very useful to me, since temp
tables can be skipped. That leaves shared DDL and we have done lots to
reduce the lock levels held and are looking at further reductions
also. I think even quite extensive delays are worth the trade-off.

I'd been looking at this also, though hadn't mentioned it previously
because I found an Oracle patent that discusses dynamically turning on
and off locking. So that's something to be aware of. IMHO if we
discuss this in terms of sharing/not sharing locking information then
it is sufficient to avoid the patent. That patent also discusses that the
locking state change needs to wait longer than required.

I got a bit lost with the description of a potential solution. It
seemed like you were unaware that there is a local lock and a shared
lock table, maybe just me?

Design seemed relatively easy from there: put local lock table in
shared memory for all procs. We then have a use_strong_lock at proc
and at transaction level. Anybody that wants a strong lock first sets
use_strong_lock at proc and transaction level, then copies all local
lock data into shared lock table, double checking
TransactionIdIsInProgress() each time. Then queues for lock using the
now fully set up shared lock table. When transaction with strong lock
completes we do not attempt to reset transaction level boolean, only
at proc level, since DDL often occurs in groups and we want to avoid
flip-flopping quickly between lock share states. Cleanup happens
regularly by bgwriter, perhaps every 10 seconds or so. All locks are
still visible for pg_locks.

Hopefully thats of use.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-25 12:44:57
Message-ID: BANLkTikBeCRcOXRLsPfJ0kX5NpcMTsxJdw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 25, 2011 at 8:27 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I got a bit lost with the description of a potential solution. It
> seemed like you were unaware that there is a local lock and a shared
> lock table, maybe just me?

No, I'm not unaware of the local lock table. The point of this
proposal is to avoid fighting over the LWLocks that protect the shared
hash table by allowing some locks to be taken without touching it.

> Design seemed relatively easy from there: put local lock table in
> shared memory for all procs. We then have a use_strong_lock at proc
> and at transaction level. Anybody that wants a strong lock first sets
> use_strong_lock at proc and transaction level, then copies all local
> lock data into shared lock table, double checking
> TransactionIdIsInProgress() each time. Then queues for lock using the
> now fully set up shared lock table. When transaction with strong lock
> completes we do not attempt to reset transaction level boolean, only
> at proc level, since DDL often occurs in groups and we want to avoid
> flip-flopping quickly between lock share states. Cleanup happens
> regularly by bgwriter, perhaps every 10 seconds or so. All locks are
> still visible for pg_locks.

I'm not following this...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-25 12:56:37
Message-ID: BANLkTikBRkZjpx-4rdERfU9hmcx81R8TVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 25, 2011 at 1:44 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, May 25, 2011 at 8:27 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> I got a bit lost with the description of a potential solution. It
>> seemed like you were unaware that there is a local lock and a shared
>> lock table, maybe just me?
>
> No, I'm not unaware of the local lock table.  The point of this
> proposal is to avoid fighting over the LWLocks that protect the shared
> hash table by allowing some locks to be taken without touching it.

Yes, I got that. I just couldn't work out where mmap came in.

>> Design seemed relatively easy from there: put local lock table in
>> shared memory for all procs. We then have a use_strong_lock at proc
>> and at transaction level. Anybody that wants a strong lock first sets
>> use_strong_lock at proc and transaction level, then copies all local
>> lock data into shared lock table, double checking
>> TransactionIdIsInProgress() each time. Then queues for lock using the
>> now fully set up shared lock table. When transaction with strong lock
>> completes we do not attempt to reset transaction level boolean, only
>> at proc level, since DDL often occurs in groups and we want to avoid
>> flip-flopping quickly between lock share states. Cleanup happens
>> regularly by bgwriter, perhaps every 10 seconds or so. All locks are
>> still visible for pg_locks.
>
> I'm not following this...

Which bit aren't you following? It's a design outline for how to
implement, deliberately brief to allow a discussion of design
alternatives.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-25 14:35:24
Message-ID: 5273.1306334124@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Wed, May 25, 2011 at 1:44 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, May 25, 2011 at 8:27 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> Design seemed relatively easy from there: put local lock table in
>>> shared memory for all procs. We then have a use_strong_lock at proc
>>> and at transaction level. Anybody that wants a strong lock first sets
>>> use_strong_lock at proc and transaction level, then copies all local
>>> lock data into shared lock table,

>> I'm not following this...

> Which bit aren't you following? It's a design outline for how to
> implement, deliberately brief to allow a discussion of design
> alternatives.

What I'm not following is how moving the local lock table into shared
memory can possibly be a good idea. The reason we invented the local
lock table in the first place (we didn't use to have one) is so that a
process could do some manipulations without touching shared memory.
(Notably, it is currently nearly free, and certainly lock-free, to
re-request a lock type you already hold. This is not an infrequent
case.) That advantage will go away if you do this.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-25 15:15:53
Message-ID: BANLkTimQxLDPkavZBveOSEHcPRti8gGZQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 25, 2011 at 8:56 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, May 25, 2011 at 1:44 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, May 25, 2011 at 8:27 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> I got a bit lost with the description of a potential solution. It
>>> seemed like you were unaware that there is a local lock and a shared
>>> lock table, maybe just me?
>>
>> No, I'm not unaware of the local lock table.  The point of this
>> proposal is to avoid fighting over the LWLocks that protect the shared
>> hash table by allowing some locks to be taken without touching it.
>
> Yes, I got that. I just couldn't work out where mmap came in.

I don't think we were planning to use mmap anywhere.

>>> Design seemed relatively easy from there: put local lock table in
>>> shared memory for all procs. We then have a use_strong_lock at proc
>>> and at transaction level. Anybody that wants a strong lock first sets
>>> use_strong_lock at proc and transaction level, then copies all local
>>> lock data into shared lock table, double checking
>>> TransactionIdIsInProgress() each time. Then queues for lock using the
>>> now fully set up shared lock table. When transaction with strong lock
>>> completes we do not attempt to reset transaction level boolean, only
>>> at proc level, since DDL often occurs in groups and we want to avoid
>>> flip-flopping quickly between lock share states. Cleanup happens
>>> regularly by bgwriter, perhaps every 10 seconds or so. All locks are
>>> still visible for pg_locks.
>>
>> I'm not following this...
>
> Which bit aren't you following? It's a design outline for how to
> implement, deliberately brief to allow a discussion of design
> alternatives.

Well, OK:

1. I don't think putting the local lock table in shared memory is a
good idea both for performance (keeping it uncontended has value) and
memory management (it would increase shared memory requirements quite
a bit). The design Noah and I were discussing upthread uses only a
small and fixed amount of shared memory for each backend, and
preserves the local lock table as an unshared resource.

2. You haven't explained the procedure for acquiring a weak lock at all,
and in particular how a weak locker would be able to quickly determine
whether a conflicting strong lock was potentially present.

3. I don't understand the proposed procedure for acquiring a strong
lock either; in particular, I don't see why
TransactionIdIsInProgress() would be relevant. The lock manager
doesn't really do anything with transaction IDs now, and you haven't
offered any explanation of why that would be necessary or advisable.

4. You haven't explained what the transaction-level boolean would
actually do. It's not even clear whether you intend for that to be
kept in local or shared memory. It's also unclear what you intend for
bgwriter to clean up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-25 15:47:31
Message-ID: 201105251547.p4PFlV828380@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Tue, May 24, 2011 at 6:37 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> >> That being said, it's a slight extra cost for all fast-path lockers to benefit
> >> the strong lockers, so I'm not prepared to guess whether it will pay off.
> >
> > Yeah.  Basically this entire idea is about trying to make life easier
> > for weak lockers at the expense of making it more difficult for strong
> > lockers.  I think that's a good trade-off in general, but we might
> > need to wait until we have an actual implementation to judge whether
> > we've turned the dial too far.
>
> I like this overall concept and like the way this has been described
> with strong and weak locks. It seems very useful to me, since temp
> tables can be skipped. That leaves shared DDL and we have done lots to
> reduce the lock levels held and are looking at further reductions
> also. I think even quite extensive delays are worth the trade-off.
>
> I'd been looking at this also, though hadn't mentioned it previously
> because I found an Oracle patent that discusses dynamically turning on
> and off locking. So that's something to be aware of. IMHO if we
> discuss this in terms of sharing/not sharing locking information then
> it is sufficient to avoid the patent. That patent also discusses that the
> locking state change needs to wait longer than required.

FYI, I thought we had agreed not to look at any patents because looking
at them might cause us more problems than not looking at them.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-27 20:55:07
Message-ID: BANLkTinD2=Ak2x7_U7eQL=8Ne=THAFJ44g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 24, 2011 at 10:03 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Tue, May 24, 2011 at 08:53:11AM -0400, Robert Haas wrote:
>> On Tue, May 24, 2011 at 5:07 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> > This drops the part about only transferring fast-path entries once when a
>> > strong_lock_counts cell transitions from zero to one.
>>
>> Right: that's because I don't think that's what we want to do.  I
>> don't think we want to transfer all per-backend locks to the shared
>> hash table as soon as anyone attempts to acquire a strong lock;
>> instead, I think we want to transfer only those fast-path locks which
>> have the same locktag as the strong lock someone is attempting to
>> acquire.  If we do that, then it doesn't matter whether the
>> strong_lock_counts[] cell is transitioning from 0 to 1 or from 6 to 7:
>> we still have to check for strong locks with that particular locktag.
>
> Oh, I see.  I was envisioning that you'd transfer all locks associated with
> the strong_lock_counts cell; that is, all the locks that would now go directly
> to the global lock table when requested going forward.  Transferring only
> exact matches seems fine too, and then I agree with your other conclusions.

I took a crack at implementing this and ran into difficulties.
Actually, I haven't gotten to the point of actually testing whether it
works, but I'm worried about a possible problem with the algorithm.

When a strong lock is taken or released, we have to increment or
decrement strong_lock_counts[fasthashpartition]. Here's the question:
is that atomic? In other words, suppose that strong_lock_counts[42]
starts out at 0, and two backends both do ++strong_lock_counts[42].
Are we guaranteed to end up with "2" in that memory location or might
we unluckily end up with "1"? I think the latter is possible... and
some guard is needed to make sure that doesn't happen.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-27 21:01:51
Message-ID: 3368.1306530111@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> When a strong lock is taken or released, we have to increment or
> decrement strong_lock_counts[fasthashpartition]. Here's the question:
> is that atomic? In other words, suppose that strong_lock_counts[42]
> starts out at 0, and two backends both do ++strong_lock_counts[42].
> Are we guaranteed to end up with "2" in that memory location or might
> we unluckily end up with "1"? I think the latter is possible... and
> some guard is needed to make sure that doesn't happen.

There are "atomic increment" primitives on most/all multiprocessors,
although availing ourselves of them everywhere will take an amount of
work not unlike developing the spinlock primitives :-(. You are dead
right that this is unsafe without that.

regards, tom lane


From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-27 21:50:36
Message-ID: 20110527215036.GA7188@tornado.gateway.2wire.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 27, 2011 at 04:55:07PM -0400, Robert Haas wrote:
> When a strong lock is taken or released, we have to increment or
> decrement strong_lock_counts[fasthashpartition]. Here's the question:
> is that atomic? In other words, suppose that strong_lock_counts[42]
> starts out at 0, and two backends both do ++strong_lock_counts[42].
> Are we guaranteed to end up with "2" in that memory location or might
> we unluckily end up with "1"? I think the latter is possible... and
> some guard is needed to make sure that doesn't happen.

Yeah: what Tom said. Guard it with a spinlock? Given that the backend is about
to (or did earlier) go off and acquire dozens or hundreds of LWLocks, it doesn't
seem like an area begging for early optimization.
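
For illustration, the two remedies mentioned here could be modeled in
standalone C roughly as follows, with a pthread mutex standing in for a
spinlock and all names invented:

#include <pthread.h>
#include <stdatomic.h>

#define FP_LOCK_PARTITIONS 16

/* Option 1: an atomic increment primitive. */
static atomic_uint strong_lock_counts_atomic[FP_LOCK_PARTITIONS];

static void
bump_with_atomic(unsigned partition)
{
    atomic_fetch_add(&strong_lock_counts_atomic[partition], 1);
}

/* Option 2: an ordinary counter guarded by a lock. */
static unsigned        strong_lock_counts_plain[FP_LOCK_PARTITIONS];
static pthread_mutex_t strong_lock_counts_mutex = PTHREAD_MUTEX_INITIALIZER;

static void
bump_with_lock(unsigned partition)
{
    pthread_mutex_lock(&strong_lock_counts_mutex);
    ++strong_lock_counts_plain[partition];
    pthread_mutex_unlock(&strong_lock_counts_mutex);
}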


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-31 13:22:52
Message-ID: BANLkTim9SR56i6d+VKamM6fqtDSdvcOdww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 25, 2011 at 3:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On Wed, May 25, 2011 at 1:44 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Wed, May 25, 2011 at 8:27 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>> Design seemed relatively easy from there: put local lock table in
>>>> shared memory for all procs. We then have a use_strong_lock at proc
>>>> and at transaction level. Anybody that wants a strong lock first sets
>>>> use_strong_lock at proc and transaction level, then copies all local
>>>> lock data into shared lock table,
>
>>> I'm not following this...
>
>> Which bit aren't you following? It's a design outline for how to
>> implement, deliberately brief to allow a discussion of design
>> alternatives.
>
> What I'm not following is how moving the local lock table into shared
> memory can possibly be a good idea.  The reason we invented the local
> lock table in the first place (we didn't use to have one) is so that a
> process could do some manipulations without touching shared memory.
> (Notably, it is currently nearly free, and certainly lock-free, to
> re-request a lock type you already hold.  This is not an infrequent
> case.)  That advantage will go away if you do this.

The basis for this is that weak locks do not conflict with each other,
whereas strong locks conflict with both strong and weak locks.
(There's a couple of special cases which I ignore for now).
(Using Robert's description of strong/weak locks)

Since most actions in normal running only require weak locks then we
see that 99% of the time we don't need to share lock information at
all.

So the idea is that we have 2 modes of operation: mode (1) when nobody
is requesting a strong lock we don't share lock information. We switch
into mode (2) when somebody requests a strong lock and in this mode we
must share all lock information just as we do now.

The basic analysis is that we have a way of removing 99% of the
overhead of lock information sharing. We still have the local lock
table and we still perform locking, we just don't share that with
other backends unless we need to. So there is a slight reduction in
path length and a total avoidance of contention.

Ideally, we would want to be in mode 2 for a short period of time.

The difficulty is how to move from mode 1 (non-shared locking) to mode
2 (shared locking) and back again?

A strong lock request causes the mode flip automatically via one of
these mechanisms:
1. signal to each backend causes them to update shared lock
information (at that point non-conflicting)
2. local lock table in shared memory
3. files
4. other
The requirement is that the mode be flipped in all backends before we
process the request for a strong lock.

The idea is to make the local lock table accessible for occasional use
in mode switching. Reading the local lock table by its owning backend
would always be lock free. Locks are only required when modifying the
local lock table by the owning backend, or when another backend reads
it. So making the local lock table accessible is not a problem.

Flipping back from mode 2 to mode 1 should be fairly slow, since DDL
usually occurs in groups.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Alexey Klyukin <alexk(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing overhead of frequent table locks
Date: 2011-05-31 13:45:28
Message-ID: BANLkTimLc98CqP14R-b0byXxov=Cw=7Ehw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 31, 2011 at 9:22 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The basis for this is that weak locks do not conflict with each other,
> whereas strong locks conflict with both strong and weak locks.
> (There's a couple of special cases which I ignore for now).
> (Using Robert's description of strong/weak locks)
>
> Since most actions in normal running only require weak locks then we
> see that 99% of the time we don't need to share lock information at
> all.
>
> So the idea is that we have 2 modes of operation: mode (1) when nobody
> is requesting a strong lock we don't share lock information. We switch
> into mode (2) when somebody requests a strong lock and in this mode we
> must share all lock information just as we do now.
>
> The basic analysis is that we have a way of removing 99% of the
> overhead of lock information sharing. We still have the local lock
> table and we still perform locking, we just don't share that with
> other backends unless we need to. So there is a slight reduction in
> path length and a total avoidance of contention.
>
> Ideally, we would want to be in mode 2 for a short period of time.
>
> The difficulty is how to move from mode 1 (non-shared locking) to mode
> 2 (shared locking) and back again?
>
> A strong lock request causes the mode flip automatically via one of
> these mechanisms:
> 1. signal to each backend causes them to update shared lock
> information (at that point non-conflicting)
> 2. local lock table in shared memory
> 3. files
> 4. other
> The requirement is that the mode be flipped in all backends before we
> process the request for a strong lock.
>
> The idea is to make the local lock table accessible for occasional use
> in mode switching. Reading the local lock table by its owning backend
> would always be lock free. Locks are only required when modifying the
> local lock table by the owning backend, or when another backend reads
> it. So making the local lock table accessible is not a problem.

You can't actually make the local lock table lock-free to the owning
backend, if other backends are going to be modifying it, or even
examining it.

However, as discussed upthread, what does seem possible is to allow
each backend to maintain a queue of "weak" locks that are protected by
an LWLock which is normally taken only by the owning backend, except
on those rare occasions when a "strong" lock enters the picture. This
doesn't completely eliminate LWLocks from the picture, but preliminary
tests with my hacked-up, work-in-progress patch shows that it results
in a very large decrease in LWLock *contention*. I'm going to post
the patch once I get it debugged and tested a bit more.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company