Reducing Transaction Start/End Contention

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Reducing Transaction Start/End Contention
Date: 2007-07-30 19:20:48
Message-ID: 1185823248.4176.60.camel@ebony.site
Lists: pgsql-hackers

Jignesh Shah's scalability testing on Solaris has revealed further
tuning opportunities around the start and end of a transaction. Such
tuning should be especially important, since async commit is likely
to allow much higher transaction rates than were previously possible.

There is strong contention on the ProcArrayLock in Exclusive mode, with
the top path being CommitTransaction(). This becomes clear as the number
of connections increases, but it seems likely that the contention can be
caused in a range of other circumstances. My thoughts on the causes of
this contention are that the following 3 tasks contend with each other
in the following way:

CommitTransaction(): takes ProcArrayLock Exclusive
but only needs access to one ProcArray element

waits for

GetSnapshotData():ProcArrayLock Shared
ReadNewTransactionId():XidGenLock Shared

which waits for

GetNextTransactionId()
takes XidGenLock Exclusive
ExtendCLOG(): takes ClogControlLock Exclusive, WALInsertLock Exclusive
two possible places where I/O is required
ExtendSubtrans(): takes SubtransControlLock
one possible place where I/O is required
Avoids lock on ProcArrayLock: atomically updates one ProcArray element

or more simply:

CommitTransaction() -- i.e. once per transaction
waits for
GetSnapshotData() -- i.e. once per SQL statement
which waits for
GetNextTransactionId() -- i.e. once per transaction

This gives some goals for scalability improvements and some proposals.
(1) and (2) are proposals for 8.3 tuning, the others are directions for
further research.

Goal: Reduce total time that GetSnapshotData() waits for
GetNextTransactionId()

1. Increase size of Clog-specific BLCKSZ
Clog currently uses BLCKSZ to define the size of clog buffers. This can
be changed to use CLOG_BLCKSZ, which would then be set to 32768.
This will naturally increase the amount of memory allocated to the clog,
so we need not alter CLOG_BUFFERS above 8 if we do this (as previously
suggested, with successful results). This will also reduce the number of
ExtendClog() calls, which will probably reduce the overall contention
also.
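
As a back-of-the-envelope illustration of why a larger page helps (a toy
model, not code from any patch; the function names are mine): clog stores
two status bits per transaction, so one page covers four xids per byte, and
quadrupling the page size cuts the number of page extensions by a factor of
four:

```python
# Toy model of clog page coverage. The 2-bits-per-xid figure matches
# PostgreSQL's clog layout; the function names are illustrative.

def xids_per_clog_page(blcksz: int) -> int:
    # Each transaction status takes 2 bits => 4 statuses per byte.
    return blcksz * 4

def extendclog_calls(n_xacts: int, blcksz: int) -> int:
    # One ExtendClog-style page allocation per page of xids consumed.
    per_page = xids_per_clog_page(blcksz)
    return (n_xacts + per_page - 1) // per_page

# With the default 8 KB pages, 1 million transactions need 31 extensions;
# with 32 KB pages, only 8.
print(extendclog_calls(1_000_000, 8192))   # 31
print(extendclog_calls(1_000_000, 32768))  # 8
```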

2. Perform ExtendClog() as a background activity
A background process could look at the next transaction id once each cycle
without holding any lock. If the xid is almost at the point where a new
clog page would be allocated, then it will allocate one prior to the new
page being absolutely required. Doing this as a background task would
mean that we do not need to hold the XidGenLock in exclusive mode while
we do this, which means that GetSnapshotData() and CommitTransaction()
would also be less likely to block. Also, if any clog writes need to be
performed when the page is moved forwards this would also be performed
in the background.
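
A minimal sketch of that background check (a toy Python model; the
constants, the lookahead margin, and the names are mine, not from any
patch):

```python
# Toy model of background clog pre-extension. XIDS_PER_PAGE and the
# lookahead margin are illustrative values, not PostgreSQL constants.

XIDS_PER_PAGE = 32768   # 8 KB page, 2 bits per xid
LOOKAHEAD = 1024        # pre-allocate when this close to a page boundary

allocated_pages = {0}   # pages already extended

def maybe_extend_clog(next_xid: int) -> bool:
    """Called each cycle by the background process, without any lock held;
    returns True if it allocated a page ahead of need."""
    page = next_xid // XIDS_PER_PAGE
    into_page = next_xid % XIDS_PER_PAGE
    if XIDS_PER_PAGE - into_page <= LOOKAHEAD and (page + 1) not in allocated_pages:
        allocated_pages.add(page + 1)   # any clog I/O happens here, off the hot path
        return True
    return False

assert maybe_extend_clog(100) is False          # far from the boundary
assert maybe_extend_clog(32768 - 512) is True   # close: page 1 pre-allocated
assert maybe_extend_clog(32768 - 256) is False  # already allocated
```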

3. Consider whether ProcArrayLock should use a new queued-shared lock
mode that puts a maximum wait time on ExclusiveLock requests. It would
be fairly hard to implement this well as a timer, but it might be
possible to place a limit on queue length. i.e. allow Share locks to be
granted immediately if a Shared holder already exists, but only if there
is a queue of no more than N exclusive mode requests queued. This might
prevent the worst cases of exclusive lock starvation.
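
The admission rule could be modelled as follows (a toy sketch; the cap
value and the function name are hypothetical):

```python
# Toy model of the proposed queued-shared admission rule: a new Shared
# request may be granted immediately only while a Shared holder is active
# and no more than N Exclusive requests are already waiting.

MAX_QUEUED_EXCLUSIVE = 4  # the "N" in the proposal; value is illustrative

def grant_shared_immediately(shared_holders: int, queued_exclusive: int) -> bool:
    return shared_holders > 0 and queued_exclusive <= MAX_QUEUED_EXCLUSIVE

assert grant_shared_immediately(3, 2) is True    # short X queue: jump allowed
assert grant_shared_immediately(3, 10) is False  # long X queue: go to the back
assert grant_shared_immediately(0, 0) is False   # no Shared holder: normal path
```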

4. Since shared locks are currently queued behind exclusive requests
when they cannot be immediately satisfied, it might be worth
reconsidering the way LWLockRelease works also. When we wake up the
queue we only wake the Shared requests that are adjacent to the head of
the queue. Instead we could wake *all* waiting Shared requestors.

e.g. with a lock queue like this:
(HEAD) S<-S<-X<-S<-X<-S<-X<-S
Currently we would wake the 1st and 2nd waiters only.

If we were to wake the 4th, 6th and 8th waiters also, then the queue
would reduce in length very quickly, if we assume generally uniform
service times. (If the head of the queue is X, then we wake only that
one process and I'm not proposing we change that). That would mean queue
jumping, right? Well, that's what already happens in other circumstances,
so there cannot be anything intrinsically wrong with allowing it, the
only question is: would it help?
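
To make the comparison concrete, here is a toy Python model of the two
wake-up policies over the example queue (illustrative only, not LWLock
code; the function names are mine):

```python
# Toy model of LWLockRelease wake-up policies over a waiter queue.
# 'S' = Shared waiter, 'X' = Exclusive waiter; index 0 is the queue head.

def wake_current(queue: str) -> list[int]:
    """Current behaviour: wake the leading run of Shared waiters,
    or the single Exclusive waiter at the head."""
    if not queue:
        return []
    if queue[0] == 'X':
        return [0]
    woken = []
    for i, w in enumerate(queue):
        if w != 'S':
            break
        woken.append(i)
    return woken

def wake_all_shared(queue: str) -> list[int]:
    """Proposed behaviour: wake every Shared waiter in the queue
    (unless an Exclusive waiter is at the head)."""
    if queue and queue[0] == 'X':
        return [0]
    return [i for i, w in enumerate(queue) if w == 'S']

q = "SSXSXSXS"   # the example queue from the text, head first
print(wake_current(q))     # [0, 1]          -- the 1st and 2nd waiters
print(wake_all_shared(q))  # [0, 1, 3, 5, 7] -- also the 4th, 6th and 8th
```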

We need not wake the whole queue; there may be some generally more
beneficial heuristic. The reason for considering this is not to speed up
Shared requests but to reduce the queue length, and thus the waiting time
for the Exclusive requestors. Each time a Shared request is dequeued, we
effectively re-enable queue jumping, so a Shared request arriving during
that point will actually jump ahead of Shared requests that were unlucky
enough to arrive while an Exclusive lock was held. Worse than that, the
new incoming Shared requests exacerbate the starvation, so the more
non-adjacent groups of Shared lock requests there are in the queue, the
worse the starvation of the exclusive requestors becomes. We are
effectively randomly starving some shared locks as well as exclusive
locks in the current scheme, based upon the state of the lock when they
make their request. The situation is worst when the lock is heavily
contended and the workload has a 50/50 mix of shared/exclusive requests,
e.g. serializable transactions or transactions with lots of
subtransactions.

Goal: Reduce the total time that CommitTransaction() waits for
GetSnapshotData()

5. Reduce the time that GetSnapshotData holds ProcArray lock. To do
this, we split the ProcArrayLock into multiple partitions (as suggested
by Alvaro). There are comments in GetNewTransactionId() about having one
spinlock per ProcArray entry. That would be too many; instead we could reduce
contention by having one lock for each N ProcArray entries. Since we
don't see too much contention with 100 users (default) it would seem
sensible to make N ~ 120. Striped or contiguous? If we stripe the lock
partitions then we will need multiple partitions however many users we
have connected, whereas using contiguous ranges would allow one lock for
low numbers of users and yet enough locks for higher numbers of users.
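
The striped-versus-contiguous trade-off amounts to two different
slot-to-partition mappings, as this toy sketch shows (N = 120 from above;
the partition count and function names are illustrative):

```python
# Toy model of the two ways to map ProcArray slots onto lock partitions.

N = 120             # entries per partition, per the suggestion above
NUM_PARTITIONS = 8  # illustrative

def striped_partition(slot: int) -> int:
    # Round-robin: adjacent slots land on different partitions, so even a
    # handful of backends touches several locks.
    return slot % NUM_PARTITIONS

def contiguous_partition(slot: int) -> int:
    # Ranges: slots 0..119 share partition 0, 120..239 partition 1, etc.,
    # so a low-connection-count install uses a single lock.
    return slot // N

# With 10 connected backends (slots 0..9), striping already spreads them
# across all 8 partitions, while contiguous ranges use just one lock:
assert {striped_partition(s) for s in range(10)} == {0, 1, 2, 3, 4, 5, 6, 7}
assert {contiguous_partition(s) for s in range(10)} == {0}
```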

6. Reduce the number of times ProcArrayLock is called in Exclusive mode.
To do this, optimise group commit so that all of the actions for
multiple transactions are executed together: flushing WAL, updating CLOG
and updating ProcArray, whenever it is appropriate to do so. There's no
point in having a group commit facility that optimises just one of those
contention points when all 3 need to be considered. That needs to be
done as part of a general overhaul of group commit. This would include
making TransactionLogMultiUpdate() take CLogControlLock once for each
page that it needs to access, which would also reduce contention from
TransactionIdCommitTree().
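
The once-per-page locking idea can be sketched as follows (a toy model;
the real TransactionIdCommitTree/TransactionLogMultiUpdate code differs in
detail, and the function name here is mine):

```python
# Toy model of taking CLogControlLock once per clog page rather than once
# per xid: group the xids to be committed by the page they fall on.
from collections import defaultdict

XIDS_PER_PAGE = 32768  # 8 KB clog page, 2 bits per xid

def lock_acquisitions_for_commit(xids: list[int]) -> int:
    """Number of CLogControlLock acquisitions if we batch per page."""
    by_page = defaultdict(list)
    for xid in xids:
        by_page[xid // XIDS_PER_PAGE].append(xid)
    # One exclusive acquisition per distinct page, however many xids it holds.
    return len(by_page)

# A main xact committing many subtransactions that straddle a page boundary
# needs only two acquisitions instead of twenty:
xids = list(range(32760, 32780))
assert lock_acquisitions_for_commit(xids) == 2
assert lock_acquisitions_for_commit([5, 6, 7]) == 1
```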

(1) and (2) can be patched fairly easily for 8.3. I have a prototype
patch for (1) on the shelf already from 6 months ago.

(3), (4) and (5) seem like changes that would require significant
testing time to ensure we did it correctly, even though the patches
might be fairly small. I'm thinking this is probably an 8.4 change, but
I can get test versions out fairly quickly I think.

(6) seems definitely an 8.4 change.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing Transaction Start/End Contention
Date: 2007-07-31 15:34:15
Message-ID: 20070731153415.GI5103@alvh.no-ip.org
Lists: pgsql-hackers

Simon Riggs wrote:

> 1. Increase size of Clog-specific BLCKSZ

> 2. Perform ExtendClog() as a background activity

> (1) and (2) can be patched fairly easily for 8.3. I have a prototype
> patch for (1) on the shelf already from 6 months ago.

Hmm, I think (1) may be 8.3 material but all the rest are complex enough
that being left for 8.4 is called for.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing Transaction Start/End Contention
Date: 2007-07-31 16:05:59
Message-ID: 25697.1185897959@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Simon Riggs wrote:
>> 1. Increase size of Clog-specific BLCKSZ

>> 2. Perform ExtendClog() as a background activity

>> (1) and (2) can be patched fairly easily for 8.3. I have a prototype
>> patch for (1) on the shelf already from 6 months ago.

> Hmm, I think (1) may be 8.3 material but all the rest are complex enough
> that being left for 8.4 is called for.

NONE of this is 8.3 material. Full stop. Try to keep your eyes on the
ball people --- 8.3 is already months past feature freeze.

I don't even think that Simon has made a case that #1 is a good idea.
Increasing the page size will increase contention between transactions
trying to hit the same clog page, no?

regards, tom lane


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing Transaction Start/End Contention
Date: 2007-07-31 17:00:03
Message-ID: 46AF6A93.4020601@kaltenbrunner.cc
Lists: pgsql-hackers

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Simon Riggs wrote:
>>> 1. Increase size of Clog-specific BLCKSZ
>
>>> 2. Perform ExtendClog() as a background activity
>
>>> (1) and (2) can be patched fairly easily for 8.3. I have a prototype
>>> patch for (1) on the shelf already from 6 months ago.
>
>> Hmm, I think (1) may be 8.3 material but all the rest are complex enough
>> that being left for 8.4 is called for.
>
> NONE of this is 8.3 material. Full stop. Try to keep your eyes on the
> ball people --- 8.3 is already months past feature freeze.

yeah - we still have 12(!) open items on the PatchStatus board:

http://developer.postgresql.org/index.php/Todo:PatchStatus

and at least half of them are in need of reviewer capacity (and some of
them have been there for nearly half a year).

Stefan


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Subject: Re: Reducing Transaction Start/End Contention
Date: 2007-09-05 20:06:07
Message-ID: 1189022767.4175.256.camel@ebony.site
Lists: pgsql-hackers

On Mon, 2007-07-30 at 20:20 +0100, Simon Riggs wrote:

> Jignesh Shah's scalability testing on Solaris has revealed further
> tuning opportunities surrounding the start and end of a transaction.
> Tuning that should be especially important since async commit is likely
> to allow much higher transaction rates than were previously possible.
>
> There is strong contention on the ProcArrayLock in Exclusive mode, with
> the top path being CommitTransaction(). This becomes clear as the number
> of connections increases, but it seems likely that the contention can be
> caused in a range of other circumstances. My thoughts on the causes of
> this contention are that the following 3 tasks contend with each other
> in the following way:
>
> CommitTransaction(): takes ProcArrayLock Exclusive
> but only needs access to one ProcArray element
>
> waits for
>
> GetSnapshotData():ProcArrayLock Shared
> ReadNewTransactionId():XidGenLock Shared
>
> which waits for
>
> GetNextTransactionId()
> takes XidGenLock Exclusive
> ExtendCLOG(): takes ClogControlLock Exclusive, WALInsertLock Exclusive
> two possible places where I/O is required
> ExtendSubtrans(): takes SubtransControlLock
> one possible place where I/O is required
> Avoids lock on ProcArrayLock: atomically updates one ProcArray element
>
>
> or more simply:
>
> CommitTransaction() -- i.e. once per transaction
> waits for
> GetSnapshotData() -- i.e. once per SQL statement
> which waits for
> GetNextTransactionId() -- i.e. once per transaction
>
> This gives some goals for scalability improvements and some proposals.
> (1) and (2) are proposals for 8.3 tuning, the others are directions for
> further research.
>
>
> Goal: Reduce total time that GetSnapshotData() waits for
> GetNextTransactionId()

The latest patch for lazy xid allocation reduces the number of times
GetNextTransactionId() is called by eliminating the call entirely for
read-only transactions. That will reduce the number of waits and so, for
most real-world cases, will increase the scalability of Postgres.
Write-mostly workloads will be slightly less scalable, so we should
expect our TPC-C numbers to be slightly worse than our TPC-E numbers.

We should retest to see whether the bottleneck has been moved
sufficiently to allow us to avoid doing techniques (1), (2), (3), (5) or
(6) at all.

> 1. Increase size of Clog-specific BLCKSZ
> Clog currently uses BLCKSZ to define the size of clog buffers. This can
> be changed to use CLOG_BLCKSZ, which would then be set to 32768.
> This will naturally increase the amount of memory allocated to the clog,
> so we need not alter CLOG_BUFFERS above 8 if we do this (as previously
> suggested, with successful results). This will also reduce the number of
> ExtendClog() calls, which will probably reduce the overall contention
> also.
>
> 2. Perform ExtendClog() as a background activity
> Background process can look at the next transactionid once each cycle
> without holding any lock. If the xid is almost at the point where a new
> clog page would be allocated, then it will allocate one prior to the new
> page being absolutely required. Doing this as a background task would
> mean that we do not need to hold the XidGenLock in exclusive mode while
> we do this, which means that GetSnapshotData() and CommitTransaction()
> would also be less likely to block. Also, if any clog writes need to be
> performed when the page is moved forwards this would also be performed
> in the background.

> 3. Consider whether ProcArrayLock should use a new queued-shared lock
> mode that puts a maximum wait time on ExclusiveLock requests. It would
> be fairly hard to implement this well as a timer, but it might be
> possible to place a limit on queue length. i.e. allow Share locks to be
> granted immediately if a Shared holder already exists, but only if there
> is a queue of no more than N exclusive mode requests queued. This might
> prevent the worst cases of exclusive lock starvation.

(4) is a general concern that remains valid.

> 4. Since shared locks are currently queued behind exclusive requests
> when they cannot be immediately satisfied, it might be worth
> reconsidering the way LWLockRelease works also. When we wake up the
> queue we only wake the Shared requests that are adjacent to the head of
> the queue. Instead we could wake *all* waiting Shared requestors.
>
> e.g. with a lock queue like this:
> (HEAD) S<-S<-X<-S<-X<-S<-X<-S
> Currently we would wake the 1st and 2nd waiters only.
>
> If we were to wake the 4th, 6th and 8th waiters also, then the queue
> would reduce in length very quickly, if we assume generally uniform
> service times. (If the head of the queue is X, then we wake only that
> one process and I'm not proposing we change that). That would mean queue
> jumping, right? Well, that's what already happens in other circumstances,
> so there cannot be anything intrinsically wrong with allowing it, the
> only question is: would it help?
>
> We need not wake the whole queue, there may be some generally more
> beneficial heuristic. The reason for considering this is not to speed up
> Shared requests but to reduce the queue length and thus the waiting time
> for the Exclusive requestors. Each time a Shared request is dequeued, we
> effectively re-enable queue jumping, so a Shared request arriving during
> that point will actually jump ahead of Shared requests that were unlucky
> enough to arrive while an Exclusive lock was held. Worse than that, the
> new incoming Shared requests exacerbate the starvation, so the more
> non-adjacent groups of Shared lock requests there are in the queue, the
> worse the starvation of the exclusive requestors becomes. We are
> effectively randomly starving some shared locks as well as exclusive
> locks in the current scheme, based upon the state of the lock when they
> make their request. The situation is worst when the lock is heavily
> contended and the workload has a 50/50 mix of shared/exclusive requests,
> e.g. serializable transactions or transactions with lots of
> subtransactions.
>
>
> Goal: Reduce the total time that CommitTransaction() waits for
> GetSnapshotData()
>
> 5. Reduce the time that GetSnapshotData holds ProcArray lock. To do
> this, we split the ProcArrayLock into multiple partitions (as suggested
> by Alvaro). There are comments in GetNewTransactionId() about having one
> spinlock per ProcArray entry. This would be too many and we could reduce
> contention by having one lock for each N ProcArray entries. Since we
> don't see too much contention with 100 users (default) it would seem
> sensible to make N ~ 120. Striped or contiguous? If we stripe the lock
> partitions then we will need multiple partitions however many users we
> have connected, whereas using contiguous ranges would allow one lock for
> low numbers of users and yet enough locks for higher numbers of users.
>
> 6. Reduce the number of times ProcArrayLock is called in Exclusive mode.
> To do this, optimise group commit so that all of the actions for
> multiple transactions are executed together: flushing WAL, updating CLOG
> and updating ProcArray, whenever it is appropriate to do so. There's no
> point in having a group commit facility that optimises just one of those
> contention points when all 3 need to be considered. That needs to be
> done as part of a general overhaul of group commit. This would include
> making TransactionLogMultiUpdate() take CLogControlLock once for each
> page that it needs to access, which would also reduce contention from
> TransactionIdCommitTree().
>
> (1) and (2) can be patched fairly easily for 8.3. I have a prototype
> patch for (1) on the shelf already from 6 months ago.
>
> (3), (4) and (5) seem like changes that would require significant
> testing time to ensure we did it correctly, even though the patches
> might be fairly small. I'm thinking this is probably an 8.4 change, but
> I can get test versions out fairly quickly I think.
>
> (6) seems definitely an 8.4 change.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing Transaction Start/End Contention
Date: 2007-09-14 03:16:42
Message-ID: 200709140316.l8E3Ggq19385@momjian.us
Lists: pgsql-hackers


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Simon Riggs wrote:
> Jignesh Shah's scalability testing on Solaris has revealed further
> tuning opportunities surrounding the start and end of a transaction.
> Tuning that should be especially important since async commit is likely
> to allow much higher transaction rates than were previously possible.
>
> There is strong contention on the ProcArrayLock in Exclusive mode, with
> the top path being CommitTransaction(). This becomes clear as the number
> of connections increases, but it seems likely that the contention can be
> caused in a range of other circumstances. My thoughts on the causes of
> this contention are that the following 3 tasks contend with each other
> in the following way:
>
> CommitTransaction(): takes ProcArrayLock Exclusive
> but only needs access to one ProcArray element
>
> waits for
>
> GetSnapshotData():ProcArrayLock Shared
> ReadNewTransactionId():XidGenLock Shared
>
> which waits for
>
> GetNextTransactionId()
> takes XidGenLock Exclusive
> ExtendCLOG(): takes ClogControlLock Exclusive, WALInsertLock Exclusive
> two possible places where I/O is required
> ExtendSubtrans(): takes SubtransControlLock
> one possible place where I/O is required
> Avoids lock on ProcArrayLock: atomically updates one ProcArray element
>
>
> or more simply:
>
> CommitTransaction() -- i.e. once per transaction
> waits for
> GetSnapshotData() -- i.e. once per SQL statement
> which waits for
> GetNextTransactionId() -- i.e. once per transaction
>
> This gives some goals for scalability improvements and some proposals.
> (1) and (2) are proposals for 8.3 tuning, the others are directions for
> further research.
>
>
> Goal: Reduce total time that GetSnapshotData() waits for
> GetNextTransactionId()
>
> 1. Increase size of Clog-specific BLCKSZ
> Clog currently uses BLCKSZ to define the size of clog buffers. This can
> be changed to use CLOG_BLCKSZ, which would then be set to 32768.
> This will naturally increase the amount of memory allocated to the clog,
> so we need not alter CLOG_BUFFERS above 8 if we do this (as previously
> suggested, with successful results). This will also reduce the number of
> ExtendClog() calls, which will probably reduce the overall contention
> also.
>
> 2. Perform ExtendClog() as a background activity
> Background process can look at the next transactionid once each cycle
> without holding any lock. If the xid is almost at the point where a new
> clog page would be allocated, then it will allocate one prior to the new
> page being absolutely required. Doing this as a background task would
> mean that we do not need to hold the XidGenLock in exclusive mode while
> we do this, which means that GetSnapshotData() and CommitTransaction()
> would also be less likely to block. Also, if any clog writes need to be
> performed when the page is moved forwards this would also be performed
> in the background.
>
> 3. Consider whether ProcArrayLock should use a new queued-shared lock
> mode that puts a maximum wait time on ExclusiveLock requests. It would
> be fairly hard to implement this well as a timer, but it might be
> possible to place a limit on queue length. i.e. allow Share locks to be
> granted immediately if a Shared holder already exists, but only if there
> is a queue of no more than N exclusive mode requests queued. This might
> prevent the worst cases of exclusive lock starvation.
>
> 4. Since shared locks are currently queued behind exclusive requests
> when they cannot be immediately satisfied, it might be worth
> reconsidering the way LWLockRelease works also. When we wake up the
> queue we only wake the Shared requests that are adjacent to the head of
> the queue. Instead we could wake *all* waiting Shared requestors.
>
> e.g. with a lock queue like this:
> (HEAD) S<-S<-X<-S<-X<-S<-X<-S
> Currently we would wake the 1st and 2nd waiters only.
>
> If we were to wake the 4th, 6th and 8th waiters also, then the queue
> would reduce in length very quickly, if we assume generally uniform
> service times. (If the head of the queue is X, then we wake only that
> one process and I'm not proposing we change that). That would mean queue
> jumping, right? Well, that's what already happens in other circumstances,
> so there cannot be anything intrinsically wrong with allowing it, the
> only question is: would it help?
>
> We need not wake the whole queue, there may be some generally more
> beneficial heuristic. The reason for considering this is not to speed up
> Shared requests but to reduce the queue length and thus the waiting time
> for the Exclusive requestors. Each time a Shared request is dequeued, we
> effectively re-enable queue jumping, so a Shared request arriving during
> that point will actually jump ahead of Shared requests that were unlucky
> enough to arrive while an Exclusive lock was held. Worse than that, the
> new incoming Shared requests exacerbate the starvation, so the more
> non-adjacent groups of Shared lock requests there are in the queue, the
> worse the starvation of the exclusive requestors becomes. We are
> effectively randomly starving some shared locks as well as exclusive
> locks in the current scheme, based upon the state of the lock when they
> make their request. The situation is worst when the lock is heavily
> contended and the workload has a 50/50 mix of shared/exclusive requests,
> e.g. serializable transactions or transactions with lots of
> subtransactions.
>
>
> Goal: Reduce the total time that CommitTransaction() waits for
> GetSnapshotData()
>
> 5. Reduce the time that GetSnapshotData holds ProcArray lock. To do
> this, we split the ProcArrayLock into multiple partitions (as suggested
> by Alvaro). There are comments in GetNewTransactionId() about having one
> spinlock per ProcArray entry. This would be too many and we could reduce
> contention by having one lock for each N ProcArray entries. Since we
> don't see too much contention with 100 users (default) it would seem
> sensible to make N ~ 120. Striped or contiguous? If we stripe the lock
> partitions then we will need multiple partitions however many users we
> have connected, whereas using contiguous ranges would allow one lock for
> low numbers of users and yet enough locks for higher numbers of users.
>
> 6. Reduce the number of times ProcArrayLock is called in Exclusive mode.
> To do this, optimise group commit so that all of the actions for
> multiple transactions are executed together: flushing WAL, updating CLOG
> and updating ProcArray, whenever it is appropriate to do so. There's no
> point in having a group commit facility that optimises just one of those
> contention points when all 3 need to be considered. That needs to be
> done as part of a general overhaul of group commit. This would include
> making TransactionLogMultiUpdate() take CLogControlLock once for each
> page that it needs to access, which would also reduce contention from
> TransactionIdCommitTree().
>
> (1) and (2) can be patched fairly easily for 8.3. I have a prototype
> patch for (1) on the shelf already from 6 months ago.
>
> (3), (4) and (5) seem like changes that would require significant
> testing time to ensure we did it correctly, even though the patches
> might be fairly small. I'm thinking this is probably an 8.4 change, but
> I can get test versions out fairly quickly I think.
>
> (6) seems definitely an 8.4 change.
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>
>

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing Transaction Start/End Contention
Date: 2007-09-14 03:52:19
Message-ID: 20070914035219.GA12618@alvh.no-ip.org
Lists: pgsql-hackers

Bruce Momjian wrote:
>
> This has been saved for the 8.4 release:
>
> http://momjian.postgresql.org/cgi-bin/pgpatches_hold

I think the work on VXIDs and latestCompletedXid makes this completely
obsolete.

> ---------------------------------------------------------------------------
>
> Simon Riggs wrote:
> > Jignesh Shah's scalability testing on Solaris has revealed further
> > tuning opportunities surrounding the start and end of a transaction.
> > Tuning that should be especially important since async commit is likely
> > to allow much higher transaction rates than were previously possible.
> >
> > There is strong contention on the ProcArrayLock in Exclusive mode, with
> > the top path being CommitTransaction().

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing Transaction Start/End Contention
Date: 2007-09-14 03:59:23
Message-ID: 200709140359.l8E3xNl20766@momjian.us
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Bruce Momjian wrote:
> >
> > This has been saved for the 8.4 release:
> >
> > http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>
> I think the work on VXIDs and latestCompletedXid makes this completely
> obsolete.

Please confirm, all of Simon's issues?

http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php

---------------------------------------------------------------------------

>
> > ---------------------------------------------------------------------------
> >
> > Simon Riggs wrote:
> > > Jignesh Shah's scalability testing on Solaris has revealed further
> > > tuning opportunities surrounding the start and end of a transaction.
> > > Tuning that should be especially important since async commit is likely
> > > to allow much higher transaction rates than were previously possible.
> > >
> > > There is strong contention on the ProcArrayLock in Exclusive mode, with
> > > the top path being CommitTransaction().
>
>
> --
> Alvaro Herrera http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing Transaction Start/End Contention
Date: 2007-09-14 04:12:04
Message-ID: 20035.1189743124@sss.pgh.pa.us
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Alvaro Herrera wrote:
>> I think the work on VXIDs and latestCompletedXid makes this completely
>> obsolete.

> Please confirm, all of Simon's issues?

Not sure --- the area is certainly still worth looking at, but the
recent patches have changed things enough that no older patches should
be applied without study.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing Transaction Start/End Contention
Date: 2007-09-14 04:12:58
Message-ID: 20070914041258.GE10355@alvh.no-ip.org
Lists: pgsql-hackers

Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > >
> > > This has been saved for the 8.4 release:
> > >
> > > http://momjian.postgresql.org/cgi-bin/pgpatches_hold
> >
> > I think the work on VIDs and latestCompletedXid make this completely
> > obsolete.
>
> Please confirm, all of Simon's issues?
>
> http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php

Hmm, in looking closer, it seems there are some things that still seem
worthy of more discussion.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Subject: Re: Reducing Transaction Start/End Contention
Date: 2008-03-12 00:23:25
Message-ID: 200803120023.m2C0NPw02466@momjian.us


Is this still a TODO?

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Mon, 2007-07-30 at 20:20 +0100, Simon Riggs wrote:
>
> > Jignesh Shah's scalability testing on Solaris has revealed further
> > tuning opportunities surrounding the start and end of a transaction.
> > Tuning that should be especially important since async commit is likely
> > to allow much higher transaction rates than were previously possible.
> >
> > There is strong contention on the ProcArrayLock in Exclusive mode, with
> > the top path being CommitTransaction(). This becomes clear as the number
> > of connections increases, but it seems likely that the contention can be
> > caused in a range of other circumstances. My thoughts on the causes of
> > this contention are that the following 3 tasks contend with each other
> > in the following way:
> >
> > CommitTransaction(): takes ProcArrayLock Exclusive
> > but only needs access to one ProcArray element
> >
> > waits for
> >
> > GetSnapshotData():ProcArrayLock Shared
> > ReadNewTransactionId():XidGenLock Shared
> >
> > which waits for
> >
> > GetNextTransactionId()
> > takes XidGenLock Exclusive
> > ExtendCLOG(): takes ClogControlLock Exclusive, WALInsertLock Exclusive
> > two possible places where I/O is required
> > ExtendSubtrans(): takes SubtransControlLock()
> > one possible place where I/O is required
> > Avoids lock on ProcArrayLock: atomically updates one ProcArray element
> >
> >
> > or more simply:
> >
> > CommitTransaction() -- i.e. once per transaction
> > waits for
> > GetSnapshotData() -- i.e. once per SQL statement
> > which waits for
> > GetNextTransactionId() -- i.e. once per transaction
> >
> > This gives some goals for scalability improvements and some proposals.
> > (1) and (2) are proposals for 8.3 tuning, the others are directions for
> > further research.
> >
> >
> > Goal: Reduce total time that GetSnapshotData() waits for
> > GetNextTransactionId()
>
> The latest patch for lazy xid allocation reduces the number of times
> GetNextTransactionId() is called by eliminating the call entirely for
> read-only transactions. That will reduce the number of waits and so,
> for most real-world cases, will increase the scalability of Postgres.
> Write-mostly workloads will be slightly less scalable, so we should
> expect our TPC-C numbers to be slightly worse than our TPC-E numbers.
>
> We should retest to see whether the bottleneck has been moved
> sufficiently to allow us to avoid doing techniques (1), (2), (3), (5) or
> (6) at all.
>
> > 1. Increase size of Clog-specific BLCKSZ
> > Clog currently uses BLCKSZ to define the size of clog buffers. This can
> > be changed to use CLOG_BLCKSZ, which would then be set to 32768.
> > This will naturally increase the amount of memory allocated to the clog,
> > so we need not alter CLOG_BUFFERS above 8 if we do this (as previously
> > suggested, with successful results). This will also reduce the number of
> > ExtendClog() calls, which will probably reduce the overall contention
> > also.
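[A concrete sketch of proposal (1). CLOG_BLCKSZ is the proposed constant and does not exist in the tree; the derived macros below mirror the existing definitions in src/backend/access/transam/clog.c.]

```c
#include <assert.h>

/* Proposal (1): give the clog its own block size instead of reusing BLCKSZ.
 * CLOG_BLCKSZ is the proposed (not yet existing) constant; the derived
 * macros follow the existing definitions in clog.c. */
#define CLOG_BLCKSZ 32768

#define CLOG_BITS_PER_XACT  2                          /* 2 status bits per xact */
#define CLOG_XACTS_PER_BYTE (8 / CLOG_BITS_PER_XACT)
#define CLOG_XACTS_PER_PAGE (CLOG_BLCKSZ * CLOG_XACTS_PER_BYTE)

#define CLOG_BUFFERS 8                                 /* unchanged, per the proposal */

/* With 32 KB pages each clog page covers 131072 transactions and the
 * 8 buffers hold 256 KB of clog, so ExtendClog() runs once per 131072
 * xids instead of once per 32768 at the default 8 KB BLCKSZ. */
```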
> >
> > 2. Perform ExtendClog() as a background activity
> > A background process can look at the next transaction id once each cycle
> > without holding any lock. If the xid is almost at the point where a new
> > clog page would be allocated, then it will allocate one prior to the new
> > page being absolutely required. Doing this as a background task would
> > mean that we do not need to hold the XidGenLock in exclusive mode while
> > we do this, which means that GetSnapshotData() and CommitTransaction()
> > would also be less likely to block. Also, if any clog writes need to be
> > performed when the page is moved forwards this would also be performed
> > in the background.
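[A minimal sketch of the background pre-extension cycle in (2). The function and constant names are invented for illustration, and bumping a counter stands in for the real page-zeroing and clog-write I/O.]

```c
#include <assert.h>

#define CLOG_XACTS_PER_PAGE 131072   /* xids covered by one 32 KB clog page */
#define PREALLOC_MARGIN       1024   /* start extending this many xids early */

static unsigned int clog_pages_allocated = 1;   /* pages 0..n-1 exist */

/* One cycle of the proposed background task: peek at the next xid (no
 * lock needed for a read-only peek) and, if the current clog page is
 * nearly consumed, allocate the next page ahead of demand so that
 * GetNextTransactionId() never has to extend the clog in the
 * foreground while holding XidGenLock exclusively. */
static void clog_background_cycle(unsigned int next_xid)
{
    unsigned int page = next_xid / CLOG_XACTS_PER_PAGE;
    unsigned int used = next_xid % CLOG_XACTS_PER_PAGE;

    if (CLOG_XACTS_PER_PAGE - used <= PREALLOC_MARGIN &&
        clog_pages_allocated <= page + 1)
        clog_pages_allocated = page + 2;   /* stands in for zero/write I/O */
}
```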
>
> > 3. Consider whether ProcArrayLock should use a new queued-shared lock
> > mode that puts a maximum wait time on ExclusiveLock requests. It would
> > be fairly hard to implement this well as a timer, but it might be
> > possible to place a limit on queue length. i.e. allow Share locks to be
> > granted immediately if a Shared holder already exists, but only if there
> > is a queue of no more than N exclusive mode requests queued. This might
> > prevent the worst cases of exclusive lock starvation.
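[The admission rule in (3) reduces to a few lines. This is a sketch, not lwlock.c code; the name and the value of N are illustrative.]

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QUEUED_EXCLUSIVE 4   /* "N" in the proposal; value is illustrative */

/* A new shared request may be granted immediately (jumping the queue)
 * only while the lock is already held shared AND no more than N
 * exclusive requests are waiting; beyond that it must queue, which
 * bounds how long exclusive waiters can starve. */
static bool shared_granted_immediately(int shared_holders, int queued_exclusive)
{
    return shared_holders > 0 && queued_exclusive <= MAX_QUEUED_EXCLUSIVE;
}
```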
>
> (4) is a general concern that remains valid.
>
> > 4. Since shared locks are currently queued behind exclusive requests
> > when they cannot be immediately satisfied, it might be worth
> > reconsidering the way LWLockRelease works also. When we wake up the
> > queue we only wake the Shared requests that are adjacent to the head of
> > the queue. Instead we could wake *all* waiting Shared requestors.
> >
> > e.g. with a lock queue like this:
> > (HEAD) S<-S<-X<-S<-X<-S<-X<-S
> > Currently we would wake the 1st and 2nd waiters only.
> >
> > If we were to wake the 3rd, 5th and 7th waiters also, then the queue
> > would reduce in length very quickly, if we assume generally uniform
> > service times. (If the head of the queue is X, then we wake only that
> > one process and I'm not proposing we change that). That would mean queue
> > jumping, right? Well, that's what already happens in other circumstances,
> > so there cannot be anything intrinsically wrong with allowing it; the
> > only question is: would it help?
> >
> > We need not wake the whole queue, there may be some generally more
> > beneficial heuristic. The reason for considering this is not to speed up
> > Shared requests but to reduce the queue length and thus the waiting time
> > for the Exclusive requestors. Each time a Shared request is dequeued, we
> > effectively re-enable queue jumping, so a Shared request arriving during
> > that point will actually jump ahead of Shared requests that were unlucky
> > enough to arrive while an Exclusive lock was held. Worse than that, the
> > new incoming Shared requests exacerbate the starvation, so the more
> > non-adjacent groups of Shared lock requests there are in the queue, the
> > worse the starvation of the exclusive requestors becomes. We are
> > effectively randomly starving some shared locks as well as exclusive
> > locks in the current scheme, based upon the state of the lock when they
> > make their request. The situation is worst when the lock is heavily
> > contended and the workload has a 50/50 mix of shared/exclusive requests,
> > e.g. serializable transactions or transactions with lots of
> > subtransactions.
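[To make the two wakeup policies in (4) concrete, here is a toy model, not the real LWLockRelease code: the wait queue is a string of 'S'/'X' waiters with the head at index 0, and each function returns how many waiters one release wakes.]

```c
#include <assert.h>

/* Current policy: wake only the run of shared waiters adjacent to the
 * head of the queue (or a single exclusive waiter at the head). */
static int wake_current(const char *queue)
{
    if (queue[0] == 'X')
        return 1;
    int n = 0;
    while (queue[n] == 'S')
        n++;
    return n;
}

/* Proposed policy: wake every shared waiter in the queue, however many
 * exclusive waiters sit between them, so the queue drains faster and
 * exclusive waiters spend less total time queued. */
static int wake_proposed(const char *queue)
{
    if (queue[0] == 'X')
        return 1;              /* head is exclusive: behaviour unchanged */
    int n = 0;
    for (int i = 0; queue[i] != '\0'; i++)
        if (queue[i] == 'S')
            n++;
    return n;
}
```

For the example queue S S X S X S X S, the current policy wakes 2 waiters and the proposed policy wakes all 5 shared waiters.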
> >
> >
> > Goal: Reduce the total time that CommitTransaction() waits for
> > GetSnapshotData()
> >
> > 5. Reduce the time that GetSnapshotData holds ProcArray lock. To do
> > this, we split the ProcArrayLock into multiple partitions (as suggested
> > by Alvaro). There are comments in GetNewTransactionId() about having one
> > spinlock per ProcArray entry. This would be too many and we could reduce
> > contention by having one lock for each N ProcArray entries. Since we
> > don't see too much contention with 100 users (default) it would seem
> > sensible to make N ~ 120. Striped or contiguous? If we stripe the lock
> > partitions then we will need multiple partitions however many users we
> > have connected, whereas using contiguous ranges would allow one lock for
> > low numbers of users and yet enough locks for higher numbers of users.
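[The striped-versus-contiguous question in (5) can be seen in two one-line mapping functions (names invented, N = 120 as suggested, partition count illustrative).]

```c
#include <assert.h>

#define PROCS_PER_PARTITION 120   /* "N" from the proposal */
#define NUM_PARTITIONS        4   /* illustrative */

/* Contiguous: backends 0..119 share partition 0, 120..239 partition 1,
 * and so on, so a low connection count touches only one lock. */
static int contiguous_partition(int proc_no)
{
    return proc_no / PROCS_PER_PARTITION;
}

/* Striped: adjacent backends land on different partitions, so even a
 * handful of connections must take every partition lock to scan the
 * whole ProcArray. */
static int striped_partition(int proc_no)
{
    return proc_no % NUM_PARTITIONS;
}
```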
> >
> > 6. Reduce the number of times ProcArrayLock is called in Exclusive mode.
> > To do this, optimise group commit so that all of the actions for
> > multiple transactions are executed together: flushing WAL, updating CLOG
> > and updating ProcArray, whenever it is appropriate to do so. There's no
> > point in having a group commit facility that optimises just one of those
> > contention points when all 3 need to be considered. That needs to be
> > done as part of a general overhaul of group commit. This would include
> > making TransactionLogMultiUpdate() take CLogControlLock once for each
> > page that it needs to access, which would also reduce contention from
> > TransactionIdCommitTree().
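[The once-per-page CLogControlLock idea in (6) amounts to this kind of batching. A sketch with an invented helper; the real TransactionIdCommitTree() works on trees of subtransaction xids.]

```c
#include <assert.h>

#define CLOG_XACTS_PER_PAGE 131072   /* xids covered by one 32 KB clog page */

/* For a batch of committing xids (sorted ascending), count the distinct
 * clog pages touched. A grouped commit would take CLogControlLock once
 * per page returned here, instead of once per transaction. */
static int clog_pages_for_batch(const unsigned int *xids, int n)
{
    int pages = 0;
    unsigned int last_page = 0;

    for (int i = 0; i < n; i++)
    {
        unsigned int page = xids[i] / CLOG_XACTS_PER_PAGE;

        if (pages == 0 || page != last_page)
        {
            last_page = page;
            pages++;
        }
    }
    return pages;
}
```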
> >
> > (1) and (2) can be patched fairly easily for 8.3. I have a prototype
> > patch for (1) on the shelf already from 6 months ago.
> >
> > (3), (4) and (5) seem like changes that would require significant
> > testing time to ensure we did it correctly, even though the patches
> > might be fairly small. I'm thinking this is probably an 8.4 change, but
> > I can get test versions out fairly quickly I think.
> >
> > (6) seems definitely an 8.4 change.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com
>
>

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Subject: Re: Reducing Transaction Start/End Contention
Date: 2008-03-13 08:13:18
Message-ID: 1205395998.4285.50.camel@ebony.site

On Tue, 2008-03-11 at 20:23 -0400, Bruce Momjian wrote:

> Is this still a TODO?

I think so.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Subject: Re: Reducing Transaction Start/End Contention
Date: 2008-03-13 12:44:37
Message-ID: 20080313124437.GC4764@alvh.no-ip.org

Simon Riggs wrote:
> On Tue, 2008-03-11 at 20:23 -0400, Bruce Momjian wrote:
>
> > Is this still a TODO?
>
> I think so.

How about this wording:

"Review Simon's claims to improve performance"

;-)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Subject: Re: Reducing Transaction Start/End Contention
Date: 2008-03-13 13:00:47
Message-ID: 47D9257F.9000703@mark.mielke.cc

Alvaro Herrera wrote:
> Simon Riggs wrote:
>
>> On Tue, 2008-03-11 at 20:23 -0400, Bruce Momjian wrote:
>>
>>> Is this still a TODO?
>>>
>> I think so.
>>
>
> How about this wording:
>
> "Review Simon's claims to improve performance"

What sort of evidence is usually compelling? It seems to me that this
sort of change only benefits configurations with dozens or more CPUs/cores?

I ask, because I saw a few references to "I see no performance change -
but then, I don't have the right hardware." It seems to me that it
should be obvious that contention will only show up under very high
concurrency? :-)

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Subject: Re: Reducing Transaction Start/End Contention
Date: 2008-03-13 15:56:33
Message-ID: 18080.1205423793@sss.pgh.pa.us

Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
> Alvaro Herrera wrote:
>> How about this wording:
>> "Review Simon's claims to improve performance

> What sort of evidence is usually compelling? It seems to me that this
> sort of change only benefits configurations with dozens or more CPUs/cores?

The main point in my mind was that that analysis was based on the code
as it then stood. Florian's work to reduce ProcArrayLock contention
might have invalidated some or all of the ideas. So it needs a fresh
look.

regards, tom lane


From: Paul van den Bogaard <Paul(dot)Vandenbogaard(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Subject: Re: Reducing Transaction Start/End Contention
Date: 2008-03-14 08:00:04
Message-ID: E377F6FE-6A04-456B-B94A-221FC633805C@sun.com

I have just started a blog series on my findings running Postgres
8.3 (beta) on a mid-range Sun Fire server. The second entry is about
the time lost on LWLock handling: when concurrency increases you can
see the ProcArrayLock wait queue start to explode.

http://blogs.sun.com/paulvandenbogaard/entry/leight_weight_lock_contention

I will add more posts on all the other LWLock findings and the
instrumentation method being used. Unfortunately a high-priority
project popped up that I need to focus on, so please be patient. I hope
to finish this in the first week of April.

Thanks,
Paul

On 13-mrt-2008, at 16:56, Tom Lane wrote:

> Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
>> Alvaro Herrera wrote:
>>> How about this wording:
>>> "Review Simon's claims to improve performance
>
>> What sort of evidence is usually compelling? It seems to me that this
>> sort of change only benefits configurations with dozens or more
>> CPUs/cores?
>
> The main point in my mind was that that analysis was based on the code
> as it then stood. Florian's work to reduce ProcArrayLock contention
> might have invalidated some or all of the ideas. So it needs a fresh
> look.
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

---------------------------------------------------------------------------------------------
Paul van den Bogaard
Paul(dot)vandenBogaard(at)sun(dot)com
ISV-E -- ISV Engineering, Opensource Engineering group

Sun Microsystems, Inc          phone: +31 334 515 918
Saturnus 1                     extension: x (70)15918
3824 ME Amersfoort             mobile: +31 651 913 354
The Netherlands                fax: +31 334 515 001



From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Paul van den Bogaard <Paul(dot)Vandenbogaard(at)Sun(dot)COM>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
Subject: Re: Reducing Transaction Start/End Contention
Date: 2008-03-24 17:41:47
Message-ID: 200803241741.m2OHfls07630@momjian.us


Thread URL added to TODO:

* SMP scalability improvements

---------------------------------------------------------------------------

Paul van den Bogaard wrote:
> Just started a blog session on my findings running Postgres 8.3(beta)
> on a mid range Sun Fire server. Second entry is about the time lost
> on LWLock handling. When concurrency increases you can see the
> ProcArrayLock wait queue to start and explode.
>
> http://blogs.sun.com/paulvandenbogaard/entry/leight_weight_lock_contention
>
> I will add more posts on all the other LWlock findings and the
> instrumentation method being used. Unfortunately a high priority
> project popped up I need to focus on. So please be patient. Hope to
> finish this in the first week of april.
>
> Thanks,
> Paul
>
>
> On 13-mrt-2008, at 16:56, Tom Lane wrote:
>
> > Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
> >> Alvaro Herrera wrote:
> >>> How about this wording:
> >>> "Review Simon's claims to improve performance
> >
> >> What sort of evidence is usually compelling? It seems to me that this
> >> sort of change only benefits configurations with dozens or more
> >> CPUs/cores?
> >
> > The main point in my mind was that that analysis was based on the code
> > as it then stood. Florian's work to reduce ProcArrayLock contention
> > might have invalidated some or all of the ideas. So it needs a fresh
> > look.
> >
> > regards, tom lane
> >
>
> ---------------------------------------------------------------------------------------------
> Paul van den Bogaard
> Paul(dot)vandenBogaard(at)sun(dot)com
> ISV-E -- ISV Engineering, Opensource Engineering group
>
> Sun Microsystems, Inc          phone: +31 334 515 918
> Saturnus 1                     extension: x (70)15918
> 3824 ME Amersfoort             mobile: +31 651 913 354
> The Netherlands                fax: +31 334 515 001
>
>

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing Transaction Start/End Contention
Date: 2008-03-26 01:50:56
Message-ID: 200803260150.m2Q1ouX06621@momjian.us


Added to TODO:

> * Consider transaction start/end performance improvements
>
> http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php
> http://archives.postgresql.org/pgsql-hackers/2008-03/msg00361.php

---------------------------------------------------------------------------

Simon Riggs wrote:
> Jignesh Shah's scalability testing on Solaris has revealed further
> tuning opportunities surrounding the start and end of a transaction.
> Tuning that should be especially important since async commit is likely
> to allow much higher transaction rates than were previously possible.
>
> There is strong contention on the ProcArrayLock in Exclusive mode, with
> the top path being CommitTransaction(). This becomes clear as the number
> of connections increases, but it seems likely that the contention can be
> caused in a range of other circumstances. My thoughts on the causes of
> this contention are that the following 3 tasks contend with each other
> in the following way:
>
> CommitTransaction(): takes ProcArrayLock Exclusive
> but only needs access to one ProcArray element
>
> waits for
>
> GetSnapshotData():ProcArrayLock Shared
> ReadNewTransactionId():XidGenLock Shared
>
> which waits for
>
> GetNextTransactionId()
> takes XidGenLock Exclusive
> ExtendCLOG(): takes ClogControlLock Exclusive, WALInsertLock Exclusive
> two possible places where I/O is required
> ExtendSubtrans(): takes SubtransControlLock()
> one possible place where I/O is required
> Avoids lock on ProcArrayLock: atomically updates one ProcArray element
>
>
> or more simply:
>
> CommitTransaction() -- i.e. once per transaction
> waits for
> GetSnapshotData() -- i.e. once per SQL statement
> which waits for
> GetNextTransactionId() -- i.e. once per transaction
>
> This gives some goals for scalability improvements and some proposals.
> (1) and (2) are proposals for 8.3 tuning, the others are directions for
> further research.
>
>
> Goal: Reduce total time that GetSnapshotData() waits for
> GetNextTransactionId()
>
> 1. Increase size of Clog-specific BLCKSZ
> Clog currently uses BLCKSZ to define the size of clog buffers. This can
> be changed to use CLOG_BLCKSZ, which would then be set to 32768.
> This will naturally increase the amount of memory allocated to the clog,
> so we need not alter CLOG_BUFFERS above 8 if we do this (as previously
> suggested, with successful results). This will also reduce the number of
> ExtendClog() calls, which will probably reduce the overall contention
> also.
>
> 2. Perform ExtendClog() as a background activity
> A background process can look at the next transaction id once each cycle
> without holding any lock. If the xid is almost at the point where a new
> clog page would be allocated, then it will allocate one prior to the new
> page being absolutely required. Doing this as a background task would
> mean that we do not need to hold the XidGenLock in exclusive mode while
> we do this, which means that GetSnapshotData() and CommitTransaction()
> would also be less likely to block. Also, if any clog writes need to be
> performed when the page is moved forwards this would also be performed
> in the background.
>
> 3. Consider whether ProcArrayLock should use a new queued-shared lock
> mode that puts a maximum wait time on ExclusiveLock requests. It would
> be fairly hard to implement this well as a timer, but it might be
> possible to place a limit on queue length. i.e. allow Share locks to be
> granted immediately if a Shared holder already exists, but only if there
> is a queue of no more than N exclusive mode requests queued. This might
> prevent the worst cases of exclusive lock starvation.
>
> 4. Since shared locks are currently queued behind exclusive requests
> when they cannot be immediately satisfied, it might be worth
> reconsidering the way LWLockRelease works also. When we wake up the
> queue we only wake the Shared requests that are adjacent to the head of
> the queue. Instead we could wake *all* waiting Shared requestors.
>
> e.g. with a lock queue like this:
> (HEAD) S<-S<-X<-S<-X<-S<-X<-S
> Currently we would wake the 1st and 2nd waiters only.
>
> If we were to wake the 3rd, 5th and 7th waiters also, then the queue
> would reduce in length very quickly, if we assume generally uniform
> service times. (If the head of the queue is X, then we wake only that
> one process and I'm not proposing we change that). That would mean queue
> jumping, right? Well, that's what already happens in other circumstances,
> so there cannot be anything intrinsically wrong with allowing it; the
> only question is: would it help?
>
> We need not wake the whole queue, there may be some generally more
> beneficial heuristic. The reason for considering this is not to speed up
> Shared requests but to reduce the queue length and thus the waiting time
> for the Exclusive requestors. Each time a Shared request is dequeued, we
> effectively re-enable queue jumping, so a Shared request arriving during
> that point will actually jump ahead of Shared requests that were unlucky
> enough to arrive while an Exclusive lock was held. Worse than that, the
> new incoming Shared requests exacerbate the starvation, so the more
> non-adjacent groups of Shared lock requests there are in the queue, the
> worse the starvation of the exclusive requestors becomes. We are
> effectively randomly starving some shared locks as well as exclusive
> locks in the current scheme, based upon the state of the lock when they
> make their request. The situation is worst when the lock is heavily
> contended and the workload has a 50/50 mix of shared/exclusive requests,
> e.g. serializable transactions or transactions with lots of
> subtransactions.
>
>
> Goal: Reduce the total time that CommitTransaction() waits for
> GetSnapshotData()
>
> 5. Reduce the time that GetSnapshotData holds ProcArray lock. To do
> this, we split the ProcArrayLock into multiple partitions (as suggested
> by Alvaro). There are comments in GetNewTransactionId() about having one
> spinlock per ProcArray entry. This would be too many and we could reduce
> contention by having one lock for each N ProcArray entries. Since we
> don't see too much contention with 100 users (default) it would seem
> sensible to make N ~ 120. Striped or contiguous? If we stripe the lock
> partitions then we will need multiple partitions however many users we
> have connected, whereas using contiguous ranges would allow one lock for
> low numbers of users and yet enough locks for higher numbers of users.
>
> 6. Reduce the number of times ProcArrayLock is called in Exclusive mode.
> To do this, optimise group commit so that all of the actions for
> multiple transactions are executed together: flushing WAL, updating CLOG
> and updating ProcArray, whenever it is appropriate to do so. There's no
> point in having a group commit facility that optimises just one of those
> contention points when all 3 need to be considered. That needs to be
> done as part of a general overhaul of group commit. This would include
> making TransactionLogMultiUpdate() take CLogControlLock once for each
> page that it needs to access, which would also reduce contention from
> TransactionIdCommitTree().
>
> (1) and (2) can be patched fairly easily for 8.3. I have a prototype
> patch for (1) on the shelf already from 6 months ago.
>
> (3), (4) and (5) seem like changes that would require significant
> testing time to ensure we did it correctly, even though the patches
> might be fairly small. I'm thinking this is probably an 8.4 change, but
> I can get test versions out fairly quickly I think.
>
> (6) seems definitely an 8.4 change.
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +