pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.

Lists: pgsql-committerspgsql-hackers
From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Date: 2011-07-19 16:22:31
Message-ID: E1QjD43-00083m-Tq@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Remove O(N^2) performance issue with multiple SAVEPOINTs.
Subtransaction locks now released en masse at main commit, rather than
repeatedly re-scanning for locks as we ascend the nested transaction tree.
Split transaction state TBLOCK_SUBEND into two states, TBLOCK_SUBCOMMIT
and TBLOCK_SUBRELEASE to allow the commit path to be optimised using
the existing code in ResourceOwnerRelease() which appears to have been
intended for this usage, judging from comments therein.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/7cb7122800ec996d4849ce9b4ad3065db19a2aae

Modified Files
--------------
src/backend/access/transam/xact.c | 102 +++++++++++++++++++++++++------------
1 files changed, 70 insertions(+), 32 deletions(-)


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Date: 2011-07-19 19:49:26
Message-ID: 4E25DFC6.7070008@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 19.07.2011 19:22, Simon Riggs wrote:
> Remove O(N^2) performance issue with multiple SAVEPOINTs.
> Subtransaction locks now released en masse at main commit, rather than
> repeatedly re-scanning for locks as we ascend the nested transaction tree.
> Split transaction state TBLOCK_SUBEND into two states, TBLOCK_SUBCOMMIT
> and TBLOCK_SUBRELEASE to allow the commit path to be optimised using
> the existing code in ResourceOwnerRelease() which appears to have been
> intended for this usage, judging from comments therein.

CommitSubTransaction(true) does this:

ResourceOwnerRelease(s->curTransactionOwner, RESOURCE_RELEASE_LOCKS,
true, isTopLevel /* == true */);
...
ResourceOwnerDelete(s->curTransactionOwner);

Because isTopLevel is passed as true, ResourceOwnerRelease() doesn't
release or transfer the locks belonging to the resource owner. After the
call, they still point to s->curTransactionOwner. Then, the resource
owner is deleted. After those two calls, the locks still have pointers
to the now-pfree'd ResourceOwner object. Looking at lock.c, we
apparently never dereference LOCALLOCKOWNER.owner field. Nevertheless, a
dangling pointer like that seems like a recipe for trouble. After
releasing all subtransactions, we still fire deferred triggers, for
example, which can do arbitrarily complex things. For example, you might
allocate new resource owners, which if you're really unlucky might get
allocated at the same address as the already-pfree'd resource owner. I'm
not sure what would happen then, but it can't be good.

Instead of leaving the locks dangling to an already-destroyed resource
owner, how about assigning all locks directly to the top-level resource
owner in one sweep? That'd still be much better than the old way of
recursively reassigning them up the subtransaction tree, one level at a
time.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Date: 2011-07-19 20:08:30
Message-ID: CA+U5nMJHQnr0ADFK_2nF37svHGCvTDu3QCFDOxPCBBk850shuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Jul 19, 2011 at 8:49 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 19.07.2011 19:22, Simon Riggs wrote:
>>
>> Remove O(N^2) performance issue with multiple SAVEPOINTs.
>> Subtransaction locks now released en masse at main commit, rather than
>> repeatedly re-scanning for locks as we ascend the nested transaction tree.
>> Split transaction state TBLOCK_SUBEND into two states, TBLOCK_SUBCOMMIT
>> and TBLOCK_SUBRELEASE to allow the commit path to be optimised using
>> the existing code in ResourceOwnerRelease() which appears to have been
>> intended for this usage, judging from comments therein.
>
> CommitSubTransaction(true) does this:
>
> ResourceOwnerRelease(s->curTransactionOwner, RESOURCE_RELEASE_LOCKS, true,
> isTopLevel /* == true */);
> ...
> ResourceOwnerDelete(s->curTransactionOwner);
>
> Because isTopLevel is passed as true, ResourceOwnerRelease() doesn't release
> or transfer the locks belonging to the resource owner. After the call, they
> still point to s->curTransactionOwner. Then, the resource owner is deleted.
> After those two calls, the locks still have pointers to the now-pfree'd
> ResourceOwner object. Looking at lock.c, we apparently never dereference
> LOCALLOCKOWNER.owner field. Nevertheless, a dangling pointer like that seems
> like a recipe for trouble. After releasing all subtransactions, we still
> fire deferred triggers, for example, which can do arbitrarily complex
> things. For example, you might allocate new resource owners, which if you're
> really unlucky might get allocated at the same address as the
> already-pfree'd resource owner. I'm not sure what would happen then, but it
> can't be good.
>
> Instead of leaving the locks dangling to an already-destroyed resource
> owner, how about assigning all locks directly to the top-level resource
> owner in one sweep? That'd still be much better than the old way of
> recursively reassigning them up the subtransaction tree, one level at a
> time.

Yes, I did see what the code was doing.

My feeling was the code was specifically written that way, just never
used. So I wired it up to be used the way intended. Have a look at
ResourceOwnerReleaseInternal()... not code I wrote or touched on this
patch.

You might persuade me to do it another way, but I can't see how to
make that way work. Your case seems a stretch. Not sure why you
mention it now, >7 weeks after review.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Date: 2011-07-19 20:24:15
Message-ID: 4E25E7EF.9030302@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 19.07.2011 23:08, Simon Riggs wrote:
> On Tue, Jul 19, 2011 at 8:49 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 19.07.2011 19:22, Simon Riggs wrote:
>>>
>>> Remove O(N^2) performance issue with multiple SAVEPOINTs.
>>> Subtransaction locks now released en masse at main commit, rather than
>>> repeatedly re-scanning for locks as we ascend the nested transaction tree.
>>> Split transaction state TBLOCK_SUBEND into two states, TBLOCK_SUBCOMMIT
>>> and TBLOCK_SUBRELEASE to allow the commit path to be optimised using
>>> the existing code in ResourceOwnerRelease() which appears to have been
>>> intended for this usage, judging from comments therein.
>>
>> CommitSubTransaction(true) does this:
>>
>> ResourceOwnerRelease(s->curTransactionOwner, RESOURCE_RELEASE_LOCKS, true,
>> isTopLevel /* == true */);
>> ...
>> ResourceOwnerDelete(s->curTransactionOwner);
>>
>> Because isTopLevel is passed as true, ResourceOwnerRelease() doesn't release
>> or transfer the locks belonging to the resource owner. After the call, they
>> still point to s->curTransactionOwner. Then, the resource owner is deleted.
>> After those two calls, the locks still have pointers to the now-pfree'd
>> ResourceOwner object. Looking at lock.c, we apparently never dereference
>> LOCALLOCKOWNER.owner field. Nevertheless, a dangling pointer like that seems
>> like a recipe for trouble. After releasing all subtransactions, we still
>> fire deferred triggers, for example, which can do arbitrarily complex
>> things. For example, you might allocate new resource owners, which if you're
>> really unlucky might get allocated at the same address as the
>> already-pfree'd resource owner. I'm not sure what would happen then, but it
>> can't be good.
>>
>> Instead of leaving the locks dangling to an already-destroyed resource
>> owner, how about assigning all locks directly to the top-level resource
>> owner in one sweep? That'd still be much better than the old way of
>> recursively reassigning them up the subtransaction tree, one level at a
>> time.
>
> Yes, I did see what the code was doing.
>
> My feeling was the code was specifically written that way, just never
> used. So I wired it up to be used the way intended. Have a look at
> ResourceOwnerReleaseInternal()... not code I wrote or touched on this
> patch.

The way ResourceOwnerReleaseIntenal(isTopLevel==true) works in the case
of a genuine top-level commit doesn't have this problem, because the
sub-resource owners are not deleted until TopTransactionResourceOwner
has been processed, and all the locks released. In fact, before this
patch I think an "Assert(!isTopLevel || owner ==
TopTransactionResourceOwner)" would've be in order in
ResourceOwnerRelease(). Or it could've done "bool isTopLevel = (owner ==
TopTransactionResoureOwner)" in the beginning instead of having
isTopLevel as an argument.

> You might persuade me to do it another way, but I can't see how to
> make that way work. Your case seems a stretch.

You get coincidences with memory allocations surprisingly often, because
things tend to get allocated and free'd in chunks of certain sizes. It's
also pretty fragile in the face of future development. It's not hard to
imagine someone adding code in lock.c to dereference the pointer.

> Not sure why you mention it now,>7 weeks after review.

Because I only just spotted it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Date: 2011-07-19 20:46:54
Message-ID: CA+U5nMLsw0FfGVAHzqa-e=L1XEy=_7tttXikLft1rb2wZ7Za+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Jul 19, 2011 at 9:24 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

>> You might persuade me to do it another way, but I can't see how to
>> make that way work. Your case seems a stretch.
>
> You get coincidences with memory allocations surprisingly often, because
> things tend to get allocated and free'd in chunks of certain sizes. It's
> also pretty fragile in the face of future development. It's not hard to
> imagine someone adding code in lock.c to dereference the pointer.

Then I think we need a 4th phase (would actually happen first).

I will revoke and rework.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Date: 2011-07-21 16:56:53
Message-ID: 16113.1311267413@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Instead of leaving the locks dangling to an already-destroyed resource
> owner, how about assigning all locks directly to the top-level resource
> owner in one sweep? That'd still be much better than the old way of
> recursively reassigning them up the subtransaction tree, one level at a
> time.

I haven't actually read the patch, but the reason for pushing them up
only one level at a time is that if an intermediate-level subtransaction
aborts, the locks taken by its child subtransactions have to be released
at that time. It sure sounds like this patch broke that.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Date: 2011-07-21 17:30:25
Message-ID: CA+U5nM+AWHMTN7fW_N7hDz3hAKZa2Lqm1phaLco6kFLJr1=J4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Thu, Jul 21, 2011 at 5:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Instead of leaving the locks dangling to an already-destroyed resource
>> owner, how about assigning all locks directly to the top-level resource
>> owner in one sweep? That'd still be much better than the old way of
>> recursively reassigning them up the subtransaction tree, one level at a
>> time.
>
> I haven't actually read the patch, but the reason for pushing them up
> only one level at a time is that if an intermediate-level subtransaction
> aborts, the locks taken by its child subtransactions have to be released
> at that time.  It sure sounds like this patch broke that.

The only path altered by the patch was the
final-commit-while-in-a-subxact, so I don't see a problem in the part
you mention.

At commit all the locks get transferred to the parent, so we scan the
the lock table repeatedly, giving O(N^2).

I think I'll just revert it though. Subtransactions need a lot of
tuning but this isn't high enough up my list to be worth the work.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Date: 2011-07-21 18:05:54
Message-ID: 1311271526-sup-4181@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Excerpts from Simon Riggs's message of jue jul 21 13:30:25 -0400 2011:

> I think I'll just revert it though. Subtransactions need a lot of
> tuning but this isn't high enough up my list to be worth the work.

If it works and is sane, why would you revert it?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Remove O(N^2) performance issue with multiple SAVEPOINTs.
Date: 2011-07-22 07:34:32
Message-ID: 4E292808.4040400@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 21.07.2011 21:05, Alvaro Herrera wrote:
> Excerpts from Simon Riggs's message of jue jul 21 13:30:25 -0400 2011:
>
>> I think I'll just revert it though. Subtransactions need a lot of
>> tuning but this isn't high enough up my list to be worth the work.
>
> If it works and is sane, why would you revert it?

See http://archives.postgresql.org/pgsql-hackers/2011-07/msg01037.php.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com