Re: pg_stat_lwlocks view - lwlocks statistics, round 2

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Qi Huang <huangqiyx(at)hotmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: pg_stat_lwlocks view - lwlocks statistics, round 2
Date: 2012-10-20 01:02:57
Message-ID: CAHGQGwF5cQgn14JQK77oQGNPH3NSvt7_1q2=nHF3+7cys_a5Jw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 20, 2012 at 1:03 AM, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:
> 2012/10/19 23:48, Fujii Masao wrote:
>>
>> On Wed, Oct 17, 2012 at 12:31 AM, Satoshi Nagayasu <snaga(at)uptime(dot)jp>
>> wrote:
>>>
>>> 2012/10/16 2:40, Jeff Janes wrote:
>>>>
>>>>
>>>> On Sun, Oct 14, 2012 at 9:43 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>
>>>>>
>>>>> Satoshi Nagayasu <snaga(at)uptime(dot)jp> writes:
>>>>>>
>>>>>>
>>>>>> (2012/10/14 13:26), Fujii Masao wrote:
>>>>>>>
>>>>>>>
>>>>>>> The tracing lwlock usage seems to still cause a small performance
>>>>>>> overhead even if reporting is disabled. I believe some users would
>>>>>>> prefer to avoid such overhead even if pg_stat_lwlocks is not
>>>>>>> available.
>>>>>>> It should be up to a user to decide whether to trace lwlock usage,
>>>>>>> e.g.,
>>>>>>> by using trace_lwlock parameter, I think.
>>>>>
>>>>>
>>>>>
>>>>>> Frankly speaking, I do not agree with disabling performance
>>>>>> instrument to improve performance. DBA must *always* monitor
>>>>>> the performance metrix when having such heavy workload.
>>>>>
>>>>>
>>>>>
>>>>> This brings up a question that I don't think has been honestly
>>>>> considered, which is exactly whom a feature like this is targeted at.
>>>>> TBH I think it's of about zero use to DBAs (making the above argument
>>>>> bogus). It is potentially of use to developers, but a DBA is unlikely
>>>>> to be able to do anything about lwlock-level contention even if he has
>>>>> the knowledge to interpret the data.
>>>>
>>>>
>>>>
>>>> Waiting on BufFreelistLock suggests increasing shared_buffers.
>>>>
>>>> Waiting on ProcArrayLock perhaps suggests use of a connection pooler
>>>> (or does it?)
>>>>
>>>> WALWriteLock suggests doing something about IO, either moving logs to
>>>> different disks, or getting BBU, or something.
>>>>
>>>> WALInsertLock suggests trying to adapt your data loading process so it
>>>> can take advantage of the bulk, or maybe increasing wal_buffers.
>>>>
>>>> And a lot of waiting on any of the locks gives a piece of information
>>>> the DBA can use when asking the mailing lists for help, even if it
>>>> doesn't allow him to take unilateral action.
>>>>
>>>>> So I feel it isn't something that should be turned on in production
>>>>> builds. I'd vote for enabling it by a non-default configure option,
>>>>> and making sure that it doesn't introduce any overhead when the option
>>>>> is off.
>>>>
>>>>
>>>>
>>>> I think hackers would benefit from getting reports from DBAs in the
>>>> field with concrete data on bottlenecks.
>>>>
>>>> If the only way to get this is to do some non-standard compile and
>>>> deploy it to production, or to create a "benchmarking" copy of the
>>>> production database system including a realistic work-load driver and
>>>> run the non-standard compile there; either of those is going to
>>>> dramatically cut down on the participation.
>>>
>>>
>>>
>>> Agreed.
>>>
>>> The hardest thing to investigate performance issue is
>>> reproducing a situation in the different environment
>>> from the production environment.
>>>
>>> I often see people struggling to reproduce a situation
>>> with different hardware and (similar but) different
>>> workload. It is very time consuming, and also it often
>>> fails.
>>>
>>> So, we need to collect any piece of information, which
>>> would help us to understand what's going on within
>>> the production PostgreSQL, without any changes of
>>> binaries and configurations in the production environment.
>>>
>>> That's the reason why I stick to a "built-in" instrument,
>>> and I disagree to disable such instrument even if it has
>>> minor performance overhead.
>>>
>>> A flight-recorder must not be disabled. Collecting
>>> performance data must be top priority for DBA.
>>
>>
>> pg_stat_lwlocks seems not adequate 'flight-recorder'. It collects
>> only narrow performance data concerning lwlock. What we should
>> have as 'flight-recorder' is something like Oracle wait event, I think.
>> Not only lwlocks but also all of wait events should be collected for
>> DBA to investigate the performance bottleneck.
>
>
> That's the reason why I said "I accept that it's not enough
> for DBA", and I'm going to work on another lock stats.
>
>
>> This idea was
>> proposed by Itagaki-san before. Though he implemented the
>> sampling-profiler patch, it failed to be committed. I'm not sure why
>> not.
>
>
> Yeah, I know the previous patch posted by Itagaki-san.
> So, I'm questioning why (again) for this time.
> I think this is very important question because it would
> be critical in order to involve new DBAs to PostgreSQL.
>
>
>> Anyway, I think that this would be more right approach to
>> provide the 'flight-recorder' to DBA.
>
>
> Ok, I guess we have reached the consensus to have
> "some flight-recorder". Right?

Yes, at least I agree to add that. But whatever is added, if it may
have any impact on the performance, on-off switch must be required.
Also the default value of the switch should be off until we'll have
implemented the low-overhead instrument and agreed to enable it
by default.

Regards,

--
Fujii Masao

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-10-20 01:05:16 Re: assertion failure w/extended query protocol
Previous Message Will Leinweber 2012-10-20 00:15:27 patch to add \watch to psql