Re: Let's invent a function to report lock-wait-blocking PIDs

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Let's invent a function to report lock-wait-blocking PIDs
Date: 2013-03-24 14:26:12
Message-ID: 514F0D04.5040404@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/20/13 2:02 PM, Tom Lane wrote:
> If isolationtester were the only market for this type of information,
> maybe it wouldn't be worth worrying about. But I'm pretty sure that
> there are a *lot* of monitoring applications out there that are trying
> to extract who-blocks-whom information from pg_locks. I hadn't realized
> before quite how painful it is to do that, even incorrectly.

As a FYI, the one Marco wrote here is over 100 lines of code, and while
he did a great job I'd still never suggest we release it--because it's
misleading in just enough cases to be dangerous. We can run it
usefully, but I'd never hand this over to a customer and expect them to
do something with it.

> I propose that we should add a backend function that simplifies this
> type of query. The API that comes to mind is (name subject to
> bikeshedding)
>
> pg_blocking_pids(pid int) returns int[]

I think there's a whole family of functions like this needed. This is
one of them, so if it helps the isolation tester I'd be happy to see it
added as a first one, whether or not more come along one day.

I'd rather get the data back as a SRF because I'd usually be joining it
to pg_locks and/or pg_stat_activity to figure out what the blocking pids
own or are doing. You can obviously convert the array form to/from the
SRF form. The exposed function API that is easier for users to join
with is my preference. If the isolation tester is easier to write
against the array form, it can play the appropriate nesting game to do
so. I see that as the unusual case though, and it is also the one being
coded by people who know how to handle the conversion.

The longer list of views/functions I keep wanting includes things like:

-What processes are blocking P from running? [This new function]

-What processes hold locks and are running usefully--they have some
locks but all are granted? [Easy to extract from pg_locks]

-For each running process, which processes are waiting on them?
[Requires a long WITH RECURSIVE query that doesn't get trapped by
circular locks]

-If I try to grab lock type L on object O, what existing locks will that
conflict with?

One really magic thing I'd like in this area is EXPLAIN (ANALYZE ON,
LOCKS ON) which pops out a list of all the locks acquired when running
that statement. We're never going to get fully correct documentation of
what locks a given statement needs. If I can figure that out in a test
environment by running the statement there and seeing what locks it
grabbed along the way, that would eliminate most of the need for
documenting things.

Note that an EXPLAIN based approach doesn't solve all the problems in
this area, because the trickiest ones I run into are ALTER TABLE
changes--which you can't EXPLAIN. Some API that dumps the locks an
arbitrary statement acquired just before it exits would be ideal. When
a user can ask "what locks did an ALTER TABLE adding a foreign key take
and what order were they grabbed in?", that would solve the hardest of
the questions I see in the field.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2013-03-24 15:39:44 pgsql: Add parallel pg_dump option.
Previous Message Tom Lane 2013-03-24 13:38:15 Re: Limiting setting of hint bits by read-only queries; vacuum_delay