Re: Patch: show relation and tuple infos of a lock to acquire

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Christian Kruse <christian(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: show relation and tuple infos of a lock to acquire
Date: 2014-02-22 06:23:24
Message-ID: CAA4eK1+-HTa5u+wfyUJRJhwYLqH=Pu8CybWEmF3oqNe0pjjgiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 21, 2014 at 4:55 PM, Christian Kruse
<christian(at)2ndquadrant(dot)com> wrote:
> Hi,
>> 1. New context added by this patch is display at wrong place
>> [...]
>> Do this patch expect to print the context at cancel request
>> as well?
>
> To be honest, I don't see a problem here. If you cancel the request in
> most cases it is because it doesn't finish in an acceptable time. So
> displaying the context seems logical to me.

Isn't it a matter of consistency, we might not be able to display it
on all long running updates, rather only when it goes for update on tuple
on which some other transaction is operating.

Also though this functionality is mainly going to be useful for the case
when log_lock_waits = on, but it will display newly added context even
without that.

Is it possible that we set callback to error_context_stack, only
when we go for displaying this message. The way to do could be that
rather than forming callback in local variable in fuction
XactLockTableWaitWithErrorCallback(), store it in global variable
and then use that at appropriate place to set error_context_stack.

In general, why I am suggesting to restrict display of newly added
context for the case it is added to ensure that it doesn't get started
displaying in other cases where it might make sense or might not
make sense.

I think if it is too tedious to do it or there are some problems
due to which we cannot restrict it for case it has been added, then
we can think of going with it.

>> 2a. About the message:
>> LOG: process 1936 still waiting for ShareLock on transaction 842
>> after 1014.000ms
>> CONTEXT: relation name: foo (OID 57496)
>> tuple (ctid (0,1)): (1, 2, abc )
>>
>> Isn't it better to display database name as well, as if we see in
>> one of related messages as below, it displays database name as well.
>>
>> "LOG: process 3012 still waiting for ExclusiveLock on tuple (0,1) of
>> relation 57
>> 499 of database 12045 after 1014.000 ms"
>
> Maybe. But then we either have duplicate infos in some cases (e.g. on
> a table lock) or we have to distinguish error cases and show distinct
> contextes.

I think if we go with the way suggested above, then this should
not be problem, or do you think it can still create duplicate
info problem.

> And also getting the database name from a relation is not
> really straight forward

I think there should not be any complication, we can do something like
get_database_name(rel->rd_node.dbNode);

> (according to Andres we would have to look at
> rel->rd_node.dbNode) and since other commands dealing with names don't
> do so, either, I think we should leave out the database name.

For this case as I have shown that in similar message, we already display
database oid, it should not be a problem to display here as well.
It will be more information to user.

>> 2b. I think there is no need to display *ctid* before tuple offset, it seems
>> to be obvious as well as consistent with other similar message.
>
> Ok, I'll drop it.

I have asked to just remove *ctid* from info not the actual info of
tuple location. It should look like tuple (0,1).

The way you have currently changed the code, it is crashing the backend.

Another thing related to this is that below code should also not use *ctid*
+ if (geterrlevel() >= ERROR)
+ {
+ errcontext("tuple ctid (%u,%u)",
+ BlockIdGetBlockNumber(&(info->tuple->t_self.ip_blkid)),
+ info->tuple->t_self.ip_posid);
+ }

>> 3. About callback I think rather than calling setup/tear down
>> functions from multiple places, it will be better to have wrapper
>> functions for XactTableLockWait() and MultiXactIdWait(). Just check
>> in plpgsql_exec_function(), how the errorcallback is set, can we do
>> something similar without to avoid palloc?
>
> OK, Attached.

It looks better than previous, one minor suggestion:
Function name XactLockTableWaitWithErrorCallback() looks bit awkward
to me, shall we name something like XactLockTableWaitWithInfo()?

>> 4. About the values of tuple
>> I think it might be useful to display some unique part of tuple for
>> debugging part, but what will we do incase there is no primary
>> key on table, so may be we can display initial few columns (2 or 3)
>> or just display tuple offset as is done in other similar message
>> shown above.
>
> Currently I simply display the whole tuple with truncating long
> fields. This is plain easy, no distinction necessary. I did some code
> reading and it seems somewhat complex to get the PK columns and it
> seems that we need another lock for this, too - maybe not the best
> idea when we are already in locking trouble, do you disagree?

I don't think you need to take another lock for this, it must already
have appropriate lock by that time. There should not be any problem
in calling relationHasPrimaryKey except that currently it is static, you
need to expose it.

> Also showing just a few columns when no PK is available might not be
> helpful since the tuple is not necessarily identified by the columns
> showed. And since we drop the CTID there is no alternative solution to
> identify the tuple.

We should not drop the CTID value.

> IMHO simply displaying the whole tuple is the best solution in this
> case, do you agree?

I think what we can do is check if relation has primary or unique
key, then display the value for same, else don't display.

>> More to the point, I have observed that in few cases
>> displaying values of tuple can be confusing as well. For Example:
>> [...]
>> Now here it will be bit confusing to display values with tuple, as
>> in session-3 statement has asked to update value (3), but we have
>> started waiting for value (4). Although it is right, but might not
>> make much sense.
>
> What do you suggest to avoid confusion? I can see what you are talking
> about, but I'm not sure what to do about it.

I think best could have been to just display tuple offset (ctid) to avoid
such confusion, but I think displaying for unique/primary key could be
acceptable way.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-02-22 09:39:16 Re: walsender doesn't send keepalives when writes are pending
Previous Message MauMau 2014-02-22 04:44:19 Re: [bug fix] postgres.exe fails to start on Windows Server 2012 due to ASLR