Lock Wait Statistics (next commitfest)

Lists: pgsql-hackers
From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Lock Wait Statistics (next commitfest)
Date: 2009-01-25 23:57:56
Message-ID: 497CFC84.2070302@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Where I work they make extensive use of Postgresql. One of the things
they typically want to know about are lock waits. Out of the box in
there is not much in the way of tracking for such, particularly in older
versions. The view pg_stats is fine for stuff happening *now*, but
typically I find I'm being asked about something that happened last
night...

Now for those platforms with dtrace there is a lock wait probe, useful -
but not for those of us on Linux! There is the conf option to log lock
waits > deadlock timeout (8.3 onwards), and this is great, but I
wondered if having something like this available as part of the stats
module would be useful.

So here is my initial attempt at this, at this point merely to spark
discussion (attached patch)

I have followed some of the ideas from the function execution stats, but
locks required some special treatment.

- new parameter track_locks (maybe should be track_lock_waits?)
- new hash locks attached to stats dbentry
- several new stat*lock c functions
- new view pg_stat_lock_waits
- new attributes for pg_stat_database

This version has simply exposed the locktag structure in the view along
with corresponding #waits and wait time. This should probably get
reformed to look a little more like pg_locks. I figured this is easily
doable along with the no doubt many changes coming from revew comments.

Also I did not do any clever amalgamation of transaction lock waits -
there is probably gonna be a lot of those and keeping the detail is
unlikely to be useful. It would be easy to collect them all together in
one transaction entry.

regards

Mark

Attachment Content-Type Size
lockstats-1.patch.gz application/x-gzip 10.1 KB

From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-07-10 17:23:47
Message-ID: 3073cc9b0907101023od71e75av8ab43466638c2d18@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 25, 2009 at 6:57 PM, Mark Kirkwood<markir(at)paradise(dot)net(dot)nz> wrote:
>
> So here is my initial attempt at this, at this point merely to spark
> discussion (attached patch)
>

this patch doesn't apply cleanly to head... can you update it, please?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-07-11 00:56:20
Message-ID: 3073cc9b0907101756s6b433824i2bf35f623ee1c2a9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 10, 2009 at 12:23 PM, Jaime
Casanova<jcasanov(at)systemguards(dot)com(dot)ec> wrote:
> On Sun, Jan 25, 2009 at 6:57 PM, Mark Kirkwood<markir(at)paradise(dot)net(dot)nz> wrote:
>>
>> So here is my initial attempt at this, at this point merely to spark
>> discussion (attached patch)
>>
>
> this patch doesn't apply cleanly to head... can you update it, please?
>

i did it myself, i think this is something we need...

this compile and seems to work... something i was wondering is that
having the total time of lock waits is not very accurate because we
can have 9 lock waits awaiting 1 sec each and one awaiting for 1
minute... simply sum them all will give a bad statistic or am i
missing something?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

Attachment Content-Type Size
lockstats-2.patch text/x-diff 49.6 KB

From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-07-17 08:38:46
Message-ID: 4A603896.7010100@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jaime Casanova wrote:
>
> i did it myself, i think this is something we need...
>
> this compile and seems to work... something i was wondering is that
> having the total time of lock waits is not very accurate because we
> can have 9 lock waits awaiting 1 sec each and one awaiting for 1
> minute... simply sum them all will give a bad statistic or am i
> missing something?
>
>
Thank you Jaime - looks good. I seem to have been asleep at the wheel
and missed *both* of your emails until now, belated apologies for that
- especially the first one :-(

With respect to the sum of wait times being not very granular, yes -
quite true. I was thinking it is useful to be able to answer the
question 'where is my wait time being spent' - but it hides cases like
the one you mention. What would you like to see? would max and min wait
times be a useful addition, or are you thinking along different lines?

Cheers

Mark


From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-07-20 07:08:57
Message-ID: 3073cc9b0907200008ua66c322peaa5dc21156f3db1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 17, 2009 at 3:38 AM, Mark Kirkwood<markir(at)paradise(dot)net(dot)nz> wrote:
>
> With respect to the sum of wait times being not very granular, yes - quite
> true. I was thinking it is useful to be able to answer the question 'where
> is my wait time being spent' - but it hides cases like the one you mention.
> What would you like to see?  would max and min wait times be a useful
> addition, or are you thinking along different lines?
>

track number of locks, sum of wait times, max(wait time).
but actually i started to think that the best is just make use of
log_lock_waits send the logs to csvlog and analyze there...

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-07-23 07:41:56
Message-ID: 4A681444.80004@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jaime Casanova wrote:
> On Fri, Jul 17, 2009 at 3:38 AM, Mark Kirkwood<markir(at)paradise(dot)net(dot)nz> wrote:
>
>> With respect to the sum of wait times being not very granular, yes - quite
>> true. I was thinking it is useful to be able to answer the question 'where
>> is my wait time being spent' - but it hides cases like the one you mention.
>> What would you like to see? would max and min wait times be a useful
>> addition, or are you thinking along different lines?
>>
>>
>
> track number of locks, sum of wait times, max(wait time).
> but actually i started to think that the best is just make use of
> log_lock_waits send the logs to csvlog and analyze there...
>
>
Right - I'll look at adding max (at least) early next week.

Yeah, enabling log_lock_waits is certainly another approach, however you
currently miss out on those that are < deadlock_timeout - and
potentially they could be the source of your problem (i.e millions of
waits all < deadlock_timeout but taken together rather significant).
This shortcoming could be overcome by making the cutoff wait time
decoupled from deadlock_timeout (e.g a new parameter
log_min_lock_wait_time or similar).

I'm thinking that having the lock waits analyzable via sql easily may
mean that for most people they don't need to collect and analyze their
logs for this stuff (they just examine the lock stats view from Pgadmin
or similar).

Cheers

Mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-07-23 16:16:38
Message-ID: 18010.1248365798@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> writes:
> Yeah, enabling log_lock_waits is certainly another approach, however you
> currently miss out on those that are < deadlock_timeout - and
> potentially they could be the source of your problem (i.e millions of
> waits all < deadlock_timeout but taken together rather significant).
> This shortcoming could be overcome by making the cutoff wait time
> decoupled from deadlock_timeout (e.g a new parameter
> log_min_lock_wait_time or similar).

The reason that they're tied together is to keep from creating
unreasonable complexity (and an unreasonable number of extra kernel
calls) in management of the timeout timers. You will find that you
can't just wave your hand and decree that they are now decoupled.

regards, tom lane


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-07-25 00:22:50
Message-ID: 4A6A505A.5070504@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> writes:
>
>> Yeah, enabling log_lock_waits is certainly another approach, however you
>> currently miss out on those that are < deadlock_timeout - and
>> potentially they could be the source of your problem (i.e millions of
>> waits all < deadlock_timeout but taken together rather significant).
>> This shortcoming could be overcome by making the cutoff wait time
>> decoupled from deadlock_timeout (e.g a new parameter
>> log_min_lock_wait_time or similar).
>>
>
> The reason that they're tied together is to keep from creating
> unreasonable complexity (and an unreasonable number of extra kernel
> calls) in management of the timeout timers. You will find that you
> can't just wave your hand and decree that they are now decoupled.
>
>

Thanks Tom - I did wonder if there was a deeper reason they were tied
together!

Cheers

Mark


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-07-25 00:29:01
Message-ID: 4A6A51CD.50604@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood wrote:
> Jaime Casanova wrote:
>> On Fri, Jul 17, 2009 at 3:38 AM, Mark
>> Kirkwood<markir(at)paradise(dot)net(dot)nz> wrote:
>>
>>> With respect to the sum of wait times being not very granular, yes -
>>> quite
>>> true. I was thinking it is useful to be able to answer the question
>>> 'where
>>> is my wait time being spent' - but it hides cases like the one you
>>> mention.
>>> What would you like to see? would max and min wait times be a useful
>>> addition, or are you thinking along different lines?
>>>
>>>
>>
>> track number of locks, sum of wait times, max(wait time).
>> but actually i started to think that the best is just make use of
>> log_lock_waits send the logs to csvlog and analyze there...
>>
>>
> Right - I'll look at adding max (at least) early next week.
>

I'm also thinking of taking a look at amalgamating transaction type lock
waits. This seems like a good idea because:

- individually, and viewed at a later date, I don't think they
individual detail is going to be useful
- there will be a lot of them
- I think the statistical data (count, sum elapsed, max elapsed) may be
sufficiently interesting

Cheers

Mark


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-08-01 01:14:16
Message-ID: 4A7396E8.2040407@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood wrote:
> Jaime Casanova wrote:
>> On Fri, Jul 17, 2009 at 3:38 AM, Mark
>> Kirkwood<markir(at)paradise(dot)net(dot)nz> wrote:
>>
>>> With respect to the sum of wait times being not very granular, yes -
>>> quite
>>> true. I was thinking it is useful to be able to answer the question
>>> 'where
>>> is my wait time being spent' - but it hides cases like the one you
>>> mention.
>>> What would you like to see? would max and min wait times be a useful
>>> addition, or are you thinking along different lines?
>>>
>>>
>>
>> track number of locks, sum of wait times, max(wait time).
>> but actually i started to think that the best is just make use of
>> log_lock_waits send the logs to csvlog and analyze there...
>>
>>
> Right - I'll look at adding max (at least) early next week.
>
>
>
>
Patch with max(wait time).

Still TODO

- amalgamate individual transaction lock waits
- redo (rather ugly) temporary pg_stat_lock_waits in a form more like
pg_locks

Attachment Content-Type Size
lockstats-3.patch.gz application/x-gzip 10.5 KB

From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-08-09 00:47:18
Message-ID: 4A7E1C96.3090201@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood wrote:
> Mark Kirkwood wrote:
>> Jaime Casanova wrote:
>>> On Fri, Jul 17, 2009 at 3:38 AM, Mark
>>> Kirkwood<markir(at)paradise(dot)net(dot)nz> wrote:
>>>
>>>> With respect to the sum of wait times being not very granular, yes
>>>> - quite
>>>> true. I was thinking it is useful to be able to answer the question
>>>> 'where
>>>> is my wait time being spent' - but it hides cases like the one you
>>>> mention.
>>>> What would you like to see? would max and min wait times be a useful
>>>> addition, or are you thinking along different lines?
>>>>
>>>>
>>>
>>> track number of locks, sum of wait times, max(wait time).
>>> but actually i started to think that the best is just make use of
>>> log_lock_waits send the logs to csvlog and analyze there...
>>>
>>>
>> Right - I'll look at adding max (at least) early next week.
>>
>>
>>
>>
> Patch with max(wait time).
>
> Still TODO
>
> - amalgamate individual transaction lock waits
> - redo (rather ugly) temporary pg_stat_lock_waits in a form more like
> pg_locks
>
This version has the individual transaction lock waits amalgamated.

Still TODO: redo pg_stat_lock_waits ...

Attachment Content-Type Size
lockstats-4.patch.gz application/x-gzip 10.6 KB

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-09-16 01:58:03
Message-ID: 20090916015803.GY17756@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark,

Your last email on this patch, from August 9th, indicates that you've
still got "TODO: redo pg_stat_lock_waits ...". Has you updated this
patch since then?

Thanks!

Stephen


From: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
To: "Stephen Frost" <sfrost(at)snowman(dot)net>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-09-18 18:44:39
Message-ID: op.u0gucp0ucke6l8@soyouz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I have this patch, if you're interested.

LWLock Instrumentation Patch

- counts locks and waits in shared and exclusive mode
- for selected locks, measures wait and hold times
- for selected locks, displays a histogram of wait and hold times
- information is printed at backend exit

Configurable by #define's in lwlock.c

Regards,
pierre

Attachment Content-Type Size
lwlock_instrument_v4.patch application/octet-stream 18.1 KB

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-09-18 18:59:09
Message-ID: 4AB3D87D.3050908@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pierre,

> Configurable by #define's in lwlock.c

Given that we already have dtrace/systemtap probes around the lwlocks,
is there some way you could use those instead of extra #defines?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-09-23 06:18:51
Message-ID: 4AB9BDCB.3010906@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen Frost wrote:
> Mark,
>
> Your last email on this patch, from August 9th, indicates that you've
> still got "TODO: redo pg_stat_lock_waits ...". Has you updated this
> patch since then?
>
>
>

Stephen,

No - that is still a TODO for me - real life getting in the way :-)

Cheers

Mark


From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-09-28 07:14:30
Message-ID: 3073cc9b0909280014g55ac952cke351d806f9c9420b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Aug 8, 2009 at 7:47 PM, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> wrote:
>>>
>>>
>> Patch with max(wait time).
>>
>> Still TODO
>>
>> - amalgamate individual transaction lock waits
>> - redo (rather ugly) temporary pg_stat_lock_waits in a form more like
>> pg_locks
>>
> This version has the individual transaction lock waits amalgamated.
>
> Still TODO: redo pg_stat_lock_waits ...
>

it applies with some hunks, compiles fine and seems to work...
i'm still not sure this is what we need, some more comments could be helpful.

what kind of questions are we capable of answer with this and and what
kind of questions are we still missing?

for example, now we know "number of locks that had to wait", "total
time waiting" and "max time waiting for a single lock"... but still we
can have an inaccurate understanding if we have lots of locks waiting
little time and a few waiting a huge amount of time...

something i have been asked when system starts to slow down is "can we
know if there were a lock contention on that period"? for now the only
way to answer that is logging locks

about the patch itself:
you haven't documented either. what is the pg_stat_lock_waits view
for? and what are those fieldx it has?

i'll let this patch as "needs review" for more people to comment on it...

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-09-30 01:02:36
Message-ID: f67928030909291802r79501f58y2e93fe469ff34db1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 28, 2009 at 12:14 AM, Jaime Casanova
<jcasanov(at)systemguards(dot)com(dot)ec> wrote:
> On Sat, Aug 8, 2009 at 7:47 PM, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> wrote:
>>>>
>>>>
>>> Patch with max(wait time).
>>>
>>> Still TODO
>>>
>>> - amalgamate individual transaction lock waits
>>> - redo (rather ugly) temporary pg_stat_lock_waits in a form more like
>>> pg_locks
>>>
>> This version has the individual transaction lock waits amalgamated.
>>
>> Still TODO: redo pg_stat_lock_waits ...
>>
>
> it applies with some hunks, compiles fine and seems to work...
> i'm still not sure this is what we need, some more comments could be helpful.

I'm pretty sure the logic of this patch is not correct.

in pgstat_init_lock_wait(LOCKTAG *locktag)
...
+ l_curr = htabent->l_counts.l_tot_wait_time;
+ INSTR_TIME_SET_CURRENT(l_start);
+ INSTR_TIME_ADD(l_curr, l_start);
+ htabent->l_counts.l_tot_wait_time = l_curr;

in pgstat_end_lock_wait(LOCKTAG *locktag)
...
+ l_start = htabent->l_counts.l_tot_wait_time;
+ INSTR_TIME_SET_CURRENT(l_end);
+ INSTR_TIME_SUBTRACT(l_end, l_start);
+ htabent->l_counts.l_tot_wait_time = l_end;

So l_start = time cumulatively waited previously + time at start of this wait.

l_end - l_start is equal to:

= time at end of this wait - ( time at start of this wait + time
cumulatively waited previously)
= (time at end of this wait - time at start of this wait) - time
cumulatively waited previously
= (duration of this wait) - time waited cumulatively previously.

That minus sign in the last line can't be good, can it?

Also

+ htabent->l_counts.l_tot_wait_time = l_end;
+
+ if (INSTR_TIME_GET_MICROSEC(l_end) >
INSTR_TIME_GET_MICROSEC(htabent->l_counts.l_max_wait_time))
+ htabent->l_counts.l_max_wait_time = l_end;

The total wait time is equal to the max wait time (which are both
equal to l_end)?
One or both of those has to end up being wrong. At this stage, is
l_end supposed to be the last wait time, or the cumulative wait time?

One of the things in the patch review checklist is about compiler
warnings, so I am reporting these:

lock.c: In function `LockAcquire':
lock.c:797: warning: passing arg 1 of `pgstat_init_lock_wait' discards
qualifiers from pointer target type
lock.c:802: warning: passing arg 1 of `pgstat_end_lock_wait' discards
qualifiers from pointer target type

Cheers,

Jeff


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-10-03 23:14:55
Message-ID: 4AC7DAEF.1050409@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jaime Casanova wrote:
>
> it applies with some hunks, compiles fine and seems to work...
> i'm still not sure this is what we need, some more comments could be helpful.
>
>
Yeah, that's the big question. Are the current capabilities (logging 'em
for waits > deadlock timeout + dtrace hooks) enough? I'm thinking that
if we had dtrace for linux generally available, then the need for this
patch would be lessened.

> what kind of questions are we capable of answer with this and and what
> kind of questions are we still missing?
>
> for example, now we know "number of locks that had to wait", "total
> time waiting" and "max time waiting for a single lock"... but still we
> can have an inaccurate understanding if we have lots of locks waiting
> little time and a few waiting a huge amount of time...
>
> something i have been asked when system starts to slow down is "can we
> know if there were a lock contention on that period"? for now the only
> way to answer that is logging locks
>
>
Right - there still may be other aggregates that need to be captured....
it would be great to have some more feedback from the field about this.
In my case, I was interested in seeing if the elapsed time was being
spent waiting for locks or actually executing (in fact it turned out to
be the latter - but was still very useful to be able to rule out locking
issues). However , as you mention - there maybe cases where the
question is more about part of the system suffering a disproportional
number/time of lock waits...

> about the patch itself:
> you haven't documented either. what is the pg_stat_lock_waits view
> for? and what are those fieldx it has?
>
>

Yeah, those fields are straight from the LOCKTAG structure. I need to
redo the view to be more like pg_locks, and also do the doco. However
I've been a bit hesitant to dive into these last two steps until I see
that there is some real enthusiasm for this patch (or failing that, a
feeling that it is needed...).

> i'll let this patch as "needs review" for more people to comment on it...
>
>
Agreed, thanks for the comments.

Mark


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-10-03 23:22:03
Message-ID: 4AC7DC9B.9060904@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes wrote:
>
> The total wait time is equal to the max wait time (which are both
> equal to l_end)?
> One or both of those has to end up being wrong. At this stage, is
> l_end supposed to be the last wait time, or the cumulative wait time?
>
>
>
Hmm - I may well have fat fingered the arithmetic, thanks I'll take a look!

> One of the things in the patch review checklist is about compiler
> warnings, so I am reporting these:
>
> lock.c: In function `LockAcquire':
> lock.c:797: warning: passing arg 1 of `pgstat_init_lock_wait' discards
> qualifiers from pointer target type
> lock.c:802: warning: passing arg 1 of `pgstat_end_lock_wait' discards
> qualifiers from pointer target type
>
>
>
>

Right, will look at too.

Cheers

Mark


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Cc: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-10-04 20:14:32
Message-ID: f67928030910041314t22e9dd83t9b85ba8e90648c67@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 28, 2009 at 12:14 AM, Jaime Casanova
<jcasanov(at)systemguards(dot)com(dot)ec> wrote:
> On Sat, Aug 8, 2009 at 7:47 PM, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> wrote:
>>>>
>>>>
>>> Patch with max(wait time).
>>>
>>> Still TODO
>>>
>>> - amalgamate individual transaction lock waits
>>> - redo (rather ugly) temporary pg_stat_lock_waits in a form more like
>>> pg_locks
>>>
>> This version has the individual transaction lock waits amalgamated.
>>
>> Still TODO: redo pg_stat_lock_waits ...
>>
>
> it applies with some hunks, compiles fine and seems to work...
> i'm still not sure this is what we need, some more comments could be helpful.
>
> what kind of questions are we capable of answer with this and and what
> kind of questions are we still missing?
>
> for example, now we know "number of locks that had to wait", "total
> time waiting" and "max time waiting for a single lock"... but still we
> can have an inaccurate understanding if we have lots of locks waiting
> little time and a few waiting a huge amount of time...

Aren't the huge ones already loggable from the deadlock detector?

With the max, we can at least put an upper limit on how long the
longest ones could have been. However, is there a way to reset the
max? I tried deleting data/pg_stat_tmp, but that didn't work. With
cumulative values, you can you take snapshots and then take the
difference of them, that won't work with max. If the max can't be
reset except with an initdb, I think that makes it barely usable.

> something i have been asked when system starts to slow down is "can we
> know if there were a lock contention on that period"? for now the only
> way to answer that is logging locks

I was surprised to find that running with track_locks on did not cause
a detectable difference in performance, so you could just routinely do
regularly scheduled snapshots and go back and mine them over the time
that a problem was occurring. I just checked with pgbench over
various levels of concurrency and fsync settings. If potential
slowness wouldn't show up there, I don't know how else to look for it.

Cheers,

Jeff


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2009-10-06 01:03:34
Message-ID: 603c8f070910051803n26c69260wd6735431291fc0d5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 4, 2009 at 4:14 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Mon, Sep 28, 2009 at 12:14 AM, Jaime Casanova
> <jcasanov(at)systemguards(dot)com(dot)ec> wrote:
>> On Sat, Aug 8, 2009 at 7:47 PM, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> wrote:
>>>>>
>>>>>
>>>> Patch with max(wait time).
>>>>
>>>> Still TODO
>>>>
>>>> - amalgamate individual transaction lock waits
>>>> - redo (rather ugly) temporary pg_stat_lock_waits in a form more like
>>>> pg_locks
>>>>
>>> This version has the individual transaction lock waits amalgamated.
>>>
>>> Still TODO: redo pg_stat_lock_waits ...
>>>
>>
>> it applies with some hunks, compiles fine and seems to work...
>> i'm still not sure this is what we need, some more comments could be helpful.
>>
>> what kind of questions are we capable of answer with this and and what
>> kind of questions are we still missing?
>>
>> for example, now we know "number of locks that had to wait", "total
>> time waiting" and "max time waiting for a single lock"... but still we
>> can have an inaccurate understanding if we have lots of locks waiting
>> little time and a few waiting a huge amount of time...
>
> Aren't the huge ones already loggable from the deadlock detector?
>
> With the max, we can at least put an upper limit on how long the
> longest ones could have been.  However, is there a way to reset the
> max?  I tried deleting data/pg_stat_tmp, but that didn't work.  With
> cumulative values, you can you take snapshots and then take the
> difference of them, that won't work with max.  If the max can't be
> reset except with an initdb, I think that makes it barely usable.
>
>> something i have been asked when system starts to slow down is "can we
>> know if there were a lock contention on that period"? for now the only
>> way to answer that is logging locks
>
> I was surprised to find that running with track_locks on did not cause
> a detectable difference in performance, so you could just routinely do
> regularly scheduled snapshots and go back and mine them over the time
> that a problem was occurring.  I just checked with pgbench over
> various levels of concurrency and fsync settings.  If potential
> slowness wouldn't show up there, I don't know how else to look for it.

It seems that this patch had open TODO items at the beginning of the
CommitFest (so perhaps we should have bounced it immediately), and I
think that's still the case now, so I am going to mark this as
Returned with Feedback. A lot of good reviewing has been done,
though, so many this can be submitted for a future CommitFest in
something close to a final form.

Thanks,

...Robert


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 03:15:21
Message-ID: 201002270315.o1R3FLt23583@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


What happened to this patch?

---------------------------------------------------------------------------

Mark Kirkwood wrote:
> Where I work they make extensive use of Postgresql. One of the things
> they typically want to know about are lock waits. Out of the box in
> there is not much in the way of tracking for such, particularly in older
> versions. The view pg_stats is fine for stuff happening *now*, but
> typically I find I'm being asked about something that happened last
> night...
>
> Now for those platforms with dtrace there is a lock wait probe, useful -
> but not for those of us on Linux! There is the conf option to log lock
> waits > deadlock timeout (8.3 onwards), and this is great, but I
> wondered if having something like this available as part of the stats
> module would be useful.
>
> So here is my initial attempt at this, at this point merely to spark
> discussion (attached patch)
>
> I have followed some of the ideas from the function execution stats, but
> locks required some special treatment.
>
> - new parameter track_locks (maybe should be track_lock_waits?)
> - new hash locks attached to stats dbentry
> - several new stat*lock c functions
> - new view pg_stat_lock_waits
> - new attributes for pg_stat_database
>
> This version has simply exposed the locktag structure in the view along
> with corresponding #waits and wait time. This should probably get
> reformed to look a little more like pg_locks. I figured this is easily
> doable along with the no doubt many changes coming from revew comments.
>
> Also I did not do any clever amalgamation of transaction lock waits -
> there is probably gonna be a lot of those and keeping the detail is
> unlikely to be useful. It would be easy to collect them all together in
> one transaction entry.
>
> regards
>
> Mark
>
>
>

[ application/x-gzip is not supported, skipping... ]

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 05:10:50
Message-ID: 4B88A95A.4020803@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> What happened to this patch?
>

Returned with feedback in October after receiving a lot of review, no
updated version submitted since then:

https://commitfest.postgresql.org/action/patch_view?id=98

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Gokulakannan Somasundaram <gokul007(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 06:43:49
Message-ID: 9362e74e1002262243s5a6977fap74d4a52c2ca0722a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I am just adding my two cents, please ignore it, if its totally irrelevant.
While we do performance testing/tuning of any applications, the important
things, a standard monitoring requirement from a database are
a) Different type of wait events and the time spent in each of them
b) Top ten Queries by Total Logical reads & Average Logical Reads
c) Top ten Queries by Total CPU Time & Average CPU Time

The monitoring methodology should not put too much overhead during the test
to invalidate the application response times captured during the performance
test (Let's not worry about Heisenberg uncertainty for now :)) )

Of all the databases i worked with, Oracle provides the best monitoring
product in the form of Statspack.

Statspack works by the following way -a) it takes a copy of important
catalog tables(pg_ tables) which store the information like wait statistics
against wait events, i/o statistics cumulative against each SQL_Hash( and
SQL_Text), whether a particular plan went for hard parse/ soft parse(because
of plan caching) and the status of different in-memory data structures etc.

So we take a snapshot like this before and after the test and generate
statspack report out of it, which contains all the necessary information for
database level tuning. So we are never left in the dark from database tuning
perspective.

Recently i wrote a set of SQL Statements, which will do the same for SQL
Server from their sys tables like wait_io_events, query_io_stats etc and
finally will retrieve the information in the same format as Statspack.

But i think we lack some functionality like that in Postgres. I think things
like DTrace are more for developers than for users and as already pointed
out, will work only in Solaris. While we can expect that for Linux shortly,
people in windows do not have much options. (While i am maintaining that
DTrace is a absolutely wonderful hooking mechanism). So we should aim to
develop a monitoring mechanism like statspack for postgres.

Hope i have delievered my concern.

Thanks,
Gokul.

On Sat, Feb 27, 2010 at 10:40 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> Bruce Momjian wrote:
>
>> What happened to this patch?
>>
>>
>
> Returned with feedback in October after receiving a lot of review, no
> updated version submitted since then:
>
> https://commitfest.postgresql.org/action/patch_view?id=98
>
> --
> Greg Smith 2ndQuadrant US Baltimore, MD
> PostgreSQL Training, Services and Support
> greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 10:40:24
Message-ID: 4B88F698.4000304@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
> Bruce Momjian wrote:
>> What happened to this patch?
>>
>
> Returned with feedback in October after receiving a lot of review, no
> updated version submitted since then:
>
> https://commitfest.postgresql.org/action/patch_view?id=98
>

Hmm - I would say a bit of review rather than a lot :-)

The feeling I received from the various comments was a lukewarm level of
interest at best, so I must confess that I let other things take
precedence. Also I was after some clear feedback about whether a
separate stats utility was necessary at all given we have Dtrace support
- despite this not being available for Linux... and the only comment
dealing to this concern is from Gokul just now!

I'd also like to take the opportunity to express a little frustration
about the commitfest business - really all I wanted was the patch
*reviewed* as WIP - it seemed that in order to do that I needed to enter
it into the various commitfests... then I was faced with comments to the
effect that it was not ready for commit so should not have been entered
into a commifest at all... sigh, a bit of an enthusiasm killer I'm afraid...

regards

Mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 16:44:28
Message-ID: 25908.1267289068@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
> I'd also like to take the opportunity to express a little frustration
> about the commitfest business - really all I wanted was the patch
> *reviewed* as WIP - it seemed that in order to do that I needed to enter
> it into the various commitfests... then I was faced with comments to the
> effect that it was not ready for commit so should not have been entered
> into a commifest at all... sigh, a bit of an enthusiasm killer I'm afraid...

Well, entering a patch in a commitfest is certainly the best way to
ensure that you'll get some feedback. If you just pop it up on the
mailing lists, you may or may not draw much commentary depending on
how interested people are and how much time they have that day.
(A day or so later there'll be other topics to distract them.)

As long as the patch submission is clearly labeled WIP you shouldn't
get complaints about it not being ready to commit.

The other approach I'd suggest, if what you mainly want is design
review, is to not post a patch at all. Post a design spec, or even
just specific questions. It's less work for people to look at and
so you're more likely to get immediate feedback.

regards, tom lane


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 18:17:42
Message-ID: 4B8961C6.2000408@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood wrote:
> Greg Smith wrote:
>> Returned with feedback in October after receiving a lot of review, no
>> updated version submitted since then:
>>
>> https://commitfest.postgresql.org/action/patch_view?id=98
>>
>
> Hmm - I would say a bit of review rather than a lot :-)

It looks like you got useful feedback from at least three people, and
people were regularly looking at your patch in some form for about three
months. That's a lot of review. In many other open-source projects,
your first patch would have been rejected after a quick look as
unsuitable and that would have been the end of things for you. I feel
lucky every time I get a volunteer to spend time reading my work and
suggesting how it could be better; your message here doesn't seem to
share that perspective.

> I'd also like to take the opportunity to express a little frustration
> about the commitfest business - really all I wanted was the patch
> *reviewed* as WIP - it seemed that in order to do that I needed to
> enter it into the various commitfests... then I was faced with
> comments to the effect that it was not ready for commit so should not
> have been entered into a commifest at all... sigh, a bit of an
> enthusiasm killer I'm afraid...

To lower your frustration level next time, make sure to label the e-mail
and the entry on the CommitFest app with the magic abbreviation "WIP"
and this shouldn't be so much of an issue. The assumption for patches
is that someone submitted them as commit candidates, and therefore they
should be reviewed to that standard, unless clearly labeled otherwise.
You briefly disclaimed yours as not being in that category in the
initial text of your first message, but it was easy to miss that,
particularly once it had been >8 months from when that messages showed
up and it was still being discussed.

If you wanted to pick this back up again, I'd think that a look at
what's been happening with the lock_timeout GUC patch would be
informative--I'd think that has some overlap with the sort of thing you
were trying to do.

FYI, I thought your patch was useful, but didn't spent time on it
because it's not ambitious enough. I would like to see statistics on a
lot more types of waiting than just locks, and keep trying to find time
to think about that big problem rather than worrying about the
individual pieces of it.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Gokulakannan Somasundaram <gokul007(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 19:10:07
Message-ID: 4B896E0F.1000002@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gokulakannan Somasundaram wrote:
> Statspack works by the following way -a) it takes a copy of important
> catalog tables(pg_ tables) which store the information like wait
> statistics against wait events, i/o statistics cumulative against each
> SQL_Hash( and SQL_Text), whether a particular plan went for hard
> parse/ soft parse(because of plan caching) and the status of different
> in-memory data structures etc.

This is actually missing the real work that went into building this
feature into Oracle. Before it was possible to build statspack, first
they went to the trouble of noting every source of this form of
latency--lock waits, I/O statistics and waits, buffer pool waits--and
instrumented every single one of them internally. Basically, every time
something that might wait for a resource you later wanted to monitor the
wait for happens, a start/end timestamp for that wait is noted, and
ultimately the difference between them noting how long the event took is
stored into the database. That's the data you must have collected at
some point in order to get the summary.

Meanwhile, PostgreSQL development is resistant to making any changes in
this direction under the theory that a) it adds a lot of code complexity
and b) constant calls to get the current system time are too expensive
on some platforms to do them all the time. Until those two things are
sorted out, what the high-level interface to the direction you're
suggesting looks like doesn't really matter. DTrace support has managed
to clear both of those hurdles due to its optional nature, perceived low
overhead, and removing *storing* all the events generated to something
that happens outside of the database.

I agree with you that something like statspack is the direction
PostgreSQL eventually needs to go, but it's going to be an uphill battle
the whole time to get it built. The objections will be that it will add
too much overhead at the lowest levels, where the data needed to support
it is collected at. Once that is cleared, the high-level interface is
easy to build.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 21:04:31
Message-ID: 4B8988DF.1060703@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
> Mark Kirkwood wrote:
>> Greg Smith wrote:
>>> Returned with feedback in October after receiving a lot of review,
>>> no updated version submitted since then:
>>>
>>> https://commitfest.postgresql.org/action/patch_view?id=98
>>>
>>
>> Hmm - I would say a bit of review rather than a lot :-)
>
> It looks like you got useful feedback from at least three people, and
> people were regularly looking at your patch in some form for about
> three months. That's a lot of review. In many other open-source
> projects, your first patch would have been rejected after a quick look
> as unsuitable and that would have been the end of things for you. I
> feel lucky every time I get a volunteer to spend time reading my work
> and suggesting how it could be better; your message here doesn't seem
> to share that perspective.

I don't mean to be ungrateful about the actual reviews at all - and I
did value the feedback received (which I hope was reasonably clear in
the various replies I sent). I sense a bit of attacking the messenger in
your tone... I've submitted several patches to Postgres in the past, and
have previously always enjoyed the experience, and I do get the culture
- being a volunteer myself.
>
> To lower your frustration level next time, make sure to label the
> e-mail and the entry on the CommitFest app with the magic abbreviation
> "WIP" and this shouldn't be so much of an issue. The assumption for
> patches is that someone submitted them as commit candidates, and
> therefore they should be reviewed to that standard, unless clearly
> labeled otherwise. You briefly disclaimed yours as not being in that
> category in the initial text of your first message, but it was easy to
> miss that, particularly once it had been >8 months from when that
> messages showed up and it was still being discussed.
>

LOL - I said a bit - it was only a little, connected with the commit vs
review confusion. I think I just got caught in the bedding in time for
the new development processes, I was advised to add it to the
commitfests, and them advised that it should not have been at a later
time. Yes, a bit frustrating at the time but not earth shattering at
all. I'm mentioning it now mainly to help clarify the situation for
anyone else wanting a WIP patch reviewed! In my case while labeling as
WIP could well have helped - the (pretty short) message accompanying the
patch is very clear that there is stuff to do - using the magic phrase
"...merely to spark discussion..." - as were all the followup
accompanying ones, with phrases like "still todo...". So yes, next time
I'll label any patches more clearly, reviewers need to read the text of
the patch they are about to review (note that most did).

> If you wanted to pick this back up again, I'd think that a look at
> what's been happening with the lock_timeout GUC patch would be
> informative--I'd think that has some overlap with the sort of thing
> you were trying to do.
>
> FYI, I thought your patch was useful, but didn't spent time on it
> because it's not ambitious enough. I would like to see statistics on
> a lot more types of waiting than just locks, and keep trying to find
> time to think about that big problem rather than worrying about the
> individual pieces of it.
>
Excellent thanks - that is exactly the sort of comment that would have
been very valuable to have made at the time (echo's Gokul's recent one
interestingly enough). After all if enough people share this view, then
clearly I need to either forget about the current patch, or design
something more ambitious!

regards

Mark


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers(at)postgresql(dot)org, gokul007(at)gmail(dot)com
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 21:15:43
Message-ID: 4B898B7F.4080003@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2/27/10 1:04 PM, Mark Kirkwood wrote:
>>
>
> LOL - I said a bit - it was only a little, connected with the commit vs
> review confusion. I think I just got caught in the bedding in time for
> the new development processes, I was advised to add it to the
> commitfests, and them advised that it should not have been at a later time.

Yeah,this is only the 2nd year we have done CFs, and is the first year
we've had non-Core Team managing them. So the cement on the procedure
is still wet.

--Josh


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 22:23:18
Message-ID: 4B899B56.1030400@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood wrote:
> I don't mean to be ungrateful about the actual reviews at all - and I
> did value the feedback received (which I hope was reasonably clear in
> the various replies I sent). I sense a bit of attacking the messenger
> in your tone...

I thought there was a moderately big difference between the reality of
the review you got and how you were characterizing it, and I was just
trying to provide some perspective on how bad a true "bit of review"
only would have worked. Since I saw you disclaimed that wording with a
smiley I know it wasn't intending to be ungrateful, and I didn't intend
to shoot the messenger. Apologies if my tone grazed you though.

In any case, process feedback noted and assimilated into recommended
practice: I just added a section about WIP patches to
http://wiki.postgresql.org/wiki/Submitting_a_Patch#Patch_submission

While I was in there I also added some more notes on my personal top
patch submission peeve, patches whose purpose in life is to improve
performance that don't come with associated easy to run test cases,
including a sample of that test running on a system that shows the
speedup clearly. If I were in charge I just would make it standard
project policy to reject any performance patch without those
characteristics immediately.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Performance Patches Was: Lock Wait Statistics (next commitfest)
Date: 2010-02-27 23:22:58
Message-ID: 4B89A952.1000709@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Smith wrote:
>
>
> While I was in there I also added some more notes on my personal top
> patch submission peeve, patches whose purpose in life is to improve
> performance that don't come with associated easy to run test cases,
> including a sample of that test running on a system that shows the
> speedup clearly. If I were in charge I just would make it standard
> project policy to reject any performance patch without those
> characteristics immediately.
>

While I completely agree that the submitter should be required to supply
a test case and their results, so the rest of us can try to reproduce
said improvement - rejecting the patch out of hand is a bit harsh I feel
- Hey, they may just have forgotten to supply these things! The reviewer
can always ask, can they not? I would prefer to see the wiki say
something along the lines of "If you don't supply a test case you will
be asked for one before any further review can proceed..."

Cheers

Mark


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Re: Performance Patches Was: Lock Wait Statistics (next commitfest)
Date: 2010-02-28 00:06:24
Message-ID: 4B89B380.5040300@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood wrote:
> While I completely agree that the submitter should be required to
> supply a test case and their results, so the rest of us can try to
> reproduce said improvement - rejecting the patch out of hand is a bit
> harsh I feel - Hey, they may just have forgotten to supply these things!
I didn't put any strong wording in the Wiki, I was just mentioning my
personal position is far less tolerant of this than the current project
policy. What I added was:

"If the patch is intended to improve performance, it's a good idea to
include some reproducible tests to demonstrate the improvement. If a
reviewer cannot duplicate your claimed performance improvement in a
short period of time, it's very likely your patch will be bounced. Do
not expect that a reviewer is going to find your performance feature so
interesting that they will build an entire test suite to prove it works.
You should have done that as part of patch validation, and included the
necessary framework for testing with the submission."

Finding a reviewer for a performance patch and getting them up to speed
to evaluate any submitted patch is time intensive, and it really sucks
from the perspective of the CF manager and any reviewer who is handed a
messy one. The intention was not to cut people off without warning
them. The position I would advocate as being a fair one is that if you
don't provide a test case for a performance improvement patch, you can't
then expect that you'll be assigned a reviewer by the CF manager either
until that's corrected. And if time for the CF runs out before you do
that, you're automatically moved to "returned with
feedback"--specifically, "write us a test case".

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-28 01:50:37
Message-ID: 603c8f071002271750u6180d54ek640ffe340ca7cd3e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 27, 2010 at 5:40 AM, Mark Kirkwood
<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
> I'd also like to take the opportunity to express a little frustration about
> the commitfest business - really all I wanted was the patch *reviewed* as
> WIP - it seemed that in order to do that I needed to enter it into the
> various commitfests... then I was faced with comments to the effect that it
> was not ready for commit so should not have been entered into a commifest at
> all... sigh, a bit of an enthusiasm killer I'm afraid...

This might be my fault, so I apologize for killing your enthusiasm. I
think when I get wrapped up in a CommitFest (and especially during the
second half) I get wound up in determining whether or not things are
going to get applied and tend to give short shrift to thinks that seem
like they won't. My bad.

Generally speaking, I am always in favor of adding things to the
CommitFest, but I guess the one exception I would make is if there are
outstanding comments already given that haven't been addressed yet.
In that case it seems a little unfair to ask people to review it
further unless there are very specific questions you need answered. I
think you were good about communicating that the patch wasn't ready to
be applied yet, but I also think that it's to be expected that you'll
get less feedback while it's in that state.

Anyway, my apologies for turning you off to the process... that wasn't
my intent and I'm sorry that it had that effect.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Re: Performance Patches Was: Lock Wait Statistics (next commitfest)
Date: 2010-02-28 01:53:39
Message-ID: 603c8f071002271753k7b4594afq3906d214b63d7f7b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 27, 2010 at 6:22 PM, Mark Kirkwood
<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
> Greg Smith wrote:
>> While I was in there I also added some more notes on my personal top patch
>> submission peeve, patches whose purpose in life is to improve performance
>> that don't come with associated easy to run test cases, including a sample
>> of that test running on a system that shows the speedup clearly.  If I were
>> in charge I just would make it standard project policy to reject any
>> performance patch without those characteristics immediately.
>
> While I completely agree that the submitter should be required to supply a
> test case and their results, so the rest of us can try to reproduce said
> improvement - rejecting the patch out of hand is a bit harsh I feel - Hey,
> they may just have forgotten to supply these things! The reviewer can always
> ask, can they not? I would prefer to see the wiki say something along the
> lines of "If you don't supply a test case you will be asked for one before
> any further review can proceed..."

Agreed. Personally, I have no problem with giving a patch a brief
once-over even if it lacks an appropriate test case, but serious
review without a test case is really hard. That's one of the things
that slowed down rbtree a lot this last CommitFest. We should
probably try to make a point of trying to point this problem out to
patch submitters before the CommitFest even starts, so that they can
address it in advance.

...Robert


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-02-28 06:06:36
Message-ID: 4B8A07EC.6050607@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
>
>
> This might be my fault, so I apologize for killing your enthusiasm. I
> think when I get wrapped up in a CommitFest (and especially during the
> second half) I get wound up in determining whether or not things are
> going to get applied and tend to give short shrift to thinks that seem
> like they won't. My bad.
>
> Generally speaking, I am always in favor of adding things to the
> CommitFest, but I guess the one exception I would make is if there are
> outstanding comments already given that haven't been addressed yet.
> In that case it seems a little unfair to ask people to review it
> further unless there are very specific questions you need answered. I
> think you were good about communicating that the patch wasn't ready to
> be applied yet, but I also think that it's to be expected that you'll
> get less feedback while it's in that state.
>
>

Yeah, makes sense, altho perhaps there needs to be a way to get
incremental progress reviewed?

> Anyway, my apologies for turning you off to the process... that wasn't
> my intent and I'm sorry that it had that effect.
>
>
>
I think there was a level of confusion on both sides, especially with a
newish process for me to get my head around, so no apology needed at all
as it is/was clear that there was no intent on your part to make things
hard! (that is why I said nothing at the time). But thank you for your
kind words, much appreciated.

Best wishes

Mark

best wishes


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, gokul007(at)gmail(dot)com
Subject: Re: Lock Wait Statistics (next commitfest)
Date: 2010-03-01 03:21:20
Message-ID: 603c8f071002281921j197a8b7ey55f4b6dcb2d45a4d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 28, 2010 at 1:06 AM, Mark Kirkwood
<mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:
> Robert Haas wrote:
>> This might be my fault, so I apologize for killing your enthusiasm.  I
>> think when I get wrapped up in a CommitFest (and especially during the
>> second half) I get wound up in determining whether or not things are
>> going to get applied and tend to give short shrift to thinks that seem
>> like they won't.  My bad.
>>
>> Generally speaking, I am always in favor of adding things to the
>> CommitFest, but I guess the one exception I would make is if there are
>> outstanding comments already given that haven't been addressed yet.
>> In that case it seems a little unfair to ask people to review it
>> further unless there are very specific questions you need answered.  I
>> think you were good about communicating that the patch wasn't ready to
>> be applied yet, but I also think that it's to be expected that you'll
>> get less feedback while it's in that state.
>
> Yeah, makes sense, altho perhaps there needs to be a way to get incremental
> progress reviewed?

I think it's possible to get that, but there's a certain way you need
to ask. As a general rule, anything that is of the form "here's my
code, can you take a look" gets less attention - with the possible
except of a patch from a committer who is planning to commit it if no
one writes back. And even then it often doesn't get looked at. Code
dumps are just no fun. Now if you write something like "here's my
patch... I can't quite finish it because of X and I'm not sure whether
the best solution is Y or Z", those tend to get answered a lot more
often, at least IME. Reading a patch and trying to understand what
it's doing and why it's doing it and whether it's really the best
solution is a fairly time-consuming effort; giving the reader some
context makes that a lot easier, and so people are more likely to help
you if you do it, again IME.

...Robert