Re: log_lock_waits to identify transaction's relation

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: log_lock_waits to identify transaction's relation
Date: 2013-01-16 02:42:58
Message-ID: CA+U5nMKOLkzz9AFgT-_Us5w3_C+VxZ8jWUL0AtgW7Ss7Wezfrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

When there is contention between concurrent transactions it shows up
as with log_lock_waits as
LOG: process %d acquired %s on %s transaction %u after %ld.%03d ms

Which is mostly useless for identifying and eliminating the contention
in the application since you can't tell which table is causing
problems. This only occurs for lock waits on transactions, not on
other lock types, such as tuples which are much more informative, for
example
LOG: process %d acquired %s on tuple(x,y) of relation %u of
database %u after %ld.%03d ms

Attached patch passes through further information about the lock wait,
so we can display the following message instead
LOG: process %d acquired %s on transaction %u on relation %u of
database %u after %ld.%03d ms

This should help identify benefit/loss from the FKlocks feature, since
MultiXactIdWait() shows up in the log as contention of this type.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
identify_xact_waits.v1.patch application/octet-stream 12.1 KB

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: log_lock_waits to identify transaction's relation
Date: 2013-01-16 03:47:57
Message-ID: 20130116034757.GD16126@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon,

* Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
> Attached patch passes through further information about the lock wait,
> so we can display the following message instead
> LOG: process %d acquired %s on transaction %u on relation %u of
> database %u after %ld.%03d ms

I love this idea. Please take these comments as an initial/quick review
because I'd really like to see this get in.

A couple quick notes regarding the patch- what does
GetXactLockTableRelid really provide..? This patch does use it outside
of lmgr.c, where XactLockTableRelid is defined. Second, do we really
need to describeXact bool to DescribeLockTag..? Strikes me as
unnecessary- if the information is available, include it.

Lastly, I really don't like the changes made to XactLockTableWait() and
friends. They had a very clearly defined and simple goal previously and
the additional parameter seems to muddy things a bit, particularly
without any comments about what it's all about or why it's there. I
understand that you're trying to pass the necessary information down to
where the log is generated, but it doesn't feel quite right. Also, what
about VirtualXactLockTableWait()..? Should that have a similar
treatment?

Thanks!

Stephen


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: log_lock_waits to identify transaction's relation
Date: 2013-01-16 15:08:33
Message-ID: CA+U5nMJoOO0Jni+i-rGMAn+jiXdJ6rCXh4X9vh6QLyh5mGQdBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16 January 2013 03:47, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Simon,
>
> * Simon Riggs (simon(at)2ndQuadrant(dot)com) wrote:
>> Attached patch passes through further information about the lock wait,
>> so we can display the following message instead
>> LOG: process %d acquired %s on transaction %u on relation %u of
>> database %u after %ld.%03d ms
>
> I love this idea. Please take these comments as an initial/quick review
> because I'd really like to see this get in.

It's there as a way to prove FKlocks works, or not, and to help that
get committed.

It's possible we reject this in this CF, since I see it as low priority anyway.

> A couple quick notes regarding the patch- what does
> GetXactLockTableRelid really provide..?

The ability to access a static variable in a different module. It
doesn't provide anything other than that,

> This patch does use it outside
> of lmgr.c, where XactLockTableRelid is defined. Second, do we really
> need to describeXact bool to DescribeLockTag..? Strikes me as
> unnecessary- if the information is available, include it.

The information isn't available, which is why I didn't include it.
Comments explain that the additional information is only available
within the backend that was waiting. That's sufficient for stats, but
not enough for an extended multi-backend deadlock report.

> Lastly, I really don't like the changes made to XactLockTableWait() and
> friends. They had a very clearly defined and simple goal previously and
> the additional parameter seems to muddy things a bit, particularly
> without any comments about what it's all about or why it's there. I
> understand that you're trying to pass the necessary information down to
> where the log is generated, but it doesn't feel quite right.

There is an API change to XactLockTableWait() to pass the relid. That
part is essential to any solution.

There are 2 possible designs for this...

(1) Simple - we remember the relid we are waiting on in a backend-only
variable so it can be used later when generating stats.

(2) Complex - record the additional information within the unused
fields of the locktag of a TRANSACTION lock, so they will be stored
and accessible by all, within the shared lock table itself. This would
require changing the hash function for transaction lock tags so that
only the key and not the additional payload items get hashed. That
seemed a much more contentious change.

> Also, what
> about VirtualXactLockTableWait()..? Should that have a similar
> treatment?

That isn't needed, IMHO. We wait on Xids whenever we see them on heap
tuples. Vxids aren't recorded anywhere, so they don't offer the same
blockage, which is basically for visibility checks on index creation.
That is more obvious. One might argue its needed as well, but I would
see it as a separate patch.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: log_lock_waits to identify transaction's relation
Date: 2013-01-16 16:12:29
Message-ID: 20130116161229.GL16126@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Simon Riggs (simon(at)2ndquadrant(dot)com) wrote:
> > A couple quick notes regarding the patch- what does
> > GetXactLockTableRelid really provide..?
>
> The ability to access a static variable in a different module. It
> doesn't provide anything other than that,

It isn't actually necessary for that currently though, is it? All calls
to it appeared to be inside the same module. Do you anticipate that
needing to change in the future?

> The information isn't available, which is why I didn't include it.
> Comments explain that the additional information is only available
> within the backend that was waiting. That's sufficient for stats, but
> not enough for an extended multi-backend deadlock report.

Right, sorry, to be clear, it seemed that we could simply detect if the
information is available and print it out if it is- and not print it if
it isn't. The additional boolean variable looked to be unnecessary, or
is there a point where OidIsValid(GetXactLockTableRelid()) will be true
but the additional information shouldn't be printed or isn't actually
available?

> There is an API change to XactLockTableWait() to pass the relid. That
> part is essential to any solution.

I'm wondering if we could pass in something both more generic and with
more information- perhaps a simple string which is constructed when we
have more information and is then included in the log message which is
generated? Perhaps something which includes both the relname and the
OID? Could that be used across more than just these kinds of locks? Of
course, on the flip side, I understand that we don't want to spend a lot
of time building strings and whatnot during all of these code paths;
would be nice if we could defer that until we're about to log.

Regardless, I do think it'd be good to update the comments to indicate
that the relid is being passed in solely for the purpose of being
available to be included in the log, if necessary.

> (2) Complex - record the additional information within the unused
> fields of the locktag of a TRANSACTION lock, so they will be stored
> and accessible by all, within the shared lock table itself. This would
> require changing the hash function for transaction lock tags so that
> only the key and not the additional payload items get hashed. That
> seemed a much more contentious change.

As the lock table is itself contended for greatly, I agree that we
probably don't want to go changing it. It would be nice if we had a way
to associate additional meta-data with why a lock is being attempted
though and for logging it.

> That isn't needed, IMHO. We wait on Xids whenever we see them on heap
> tuples. Vxids aren't recorded anywhere, so they don't offer the same
> blockage, which is basically for visibility checks on index creation.
> That is more obvious. One might argue its needed as well, but I would
> see it as a separate patch.

Alright.

Thanks,

Stephen


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: log_lock_waits to identify transaction's relation
Date: 2013-01-17 01:45:00
Message-ID: CA+U5nMJ8OVFW5R5VU9z+UDHfLiqopGJvVq4Q6wMNEP=35wC9xA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16 January 2013 16:12, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Simon Riggs (simon(at)2ndquadrant(dot)com) wrote:
>> > A couple quick notes regarding the patch- what does
>> > GetXactLockTableRelid really provide..?
>>
>> The ability to access a static variable in a different module. It
>> doesn't provide anything other than that,
>
> It isn't actually necessary for that currently though, is it? All calls
> to it appeared to be inside the same module. Do you anticipate that
> needing to change in the future?

OK, will change.

>> The information isn't available, which is why I didn't include it.
>> Comments explain that the additional information is only available
>> within the backend that was waiting. That's sufficient for stats, but
>> not enough for an extended multi-backend deadlock report.
>
> Right, sorry, to be clear, it seemed that we could simply detect if the
> information is available and print it out if it is- and not print it if
> it isn't. The additional boolean variable looked to be unnecessary, or
> is there a point where OidIsValid(GetXactLockTableRelid()) will be true
> but the additional information shouldn't be printed or isn't actually
> available?

You might be reporting a deadlock in another backend when you are
waiting on a transaction yourself. If we don't address that we'll end
up reporting the wrong info in a way that makes it look real and
probably never even notice.

>> There is an API change to XactLockTableWait() to pass the relid. That
>> part is essential to any solution.
>
> I'm wondering if we could pass in something both more generic and with
> more information- perhaps a simple string which is constructed when we
> have more information and is then included in the log message which is
> generated? Perhaps something which includes both the relname and the
> OID? Could that be used across more than just these kinds of locks? Of
> course, on the flip side, I understand that we don't want to spend a lot
> of time building strings and whatnot during all of these code paths;
> would be nice if we could defer that until we're about to log.

Keeping it as an Oid is better, since we don't always need the string.

> Regardless, I do think it'd be good to update the comments to indicate
> that the relid is being passed in solely for the purpose of being
> available to be included in the log, if necessary.

OK, will add.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services