Re: pg_stat_statements: calls under-estimation propagation

From: Sameer Thakur <samthakur74(at)gmail(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements: calls under-estimation propagation
Date: 2013-10-05 14:47:45
Message-ID: CABzZFEuxkqd7_5HrEVYA1TuUrkX_iYqNJqo6EJmraUsNQ6yvGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 5, 2013 at 1:38 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur <samthakur74(at)gmail(dot)com> wrote:
>>> On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur <samthakur74(at)gmail(dot)com> wrote:
>>>>>
>>>>> Looks pretty good. Do you want to package up the patch with your
>>>>> change and do the honors and re-submit it? Thanks for helping out so
>>>>> much!
>>>> Sure, will do. Need to add a bit of documentation explaining
>>>> statistics session as well.
>>>> I did some more basic testing around pg_stat_statements.max, now that
>>>> we have clarity from Peter about its value being legitimate below 100.
>>>> Seems to work fine, with pg_stat_statements =4 the max unique queries
>>>> in the view are 4. On the 5th query the view holds just the latest
>>>> unique query discarding the previous 4. Fujii had reported a
>>>> segmentation fault in this scenario.
>>>> Thank you for the patch
>>>
>>> Please find the patch attached
>>
>> Thanks for the patch! Here are the review comments:
>>
>> + OUT session_start timestamptz,
>> + OUT introduced timestamptz,
>>
>> The patch exposes these columns in pg_stat_statements view.
>> These should be documented.
>>
>> I don't think that session_start should be exposed in every
>> rows in pg_stat_statements because it's updated only when
>> all statistics are reset, i.e., session_start of all entries
>> in pg_stat_statements indicate the same.
>
> Dunno. I agree it'd be less query traffic and noise. Maybe hidden
> behind a UDF? I thought "stats_reset" on pg_database may be prior
> art, but realized that the statistics there differ depending on stats
> resets per database (maybe a name change of 'session' to 'stats_reset'
> would be useful to avoid too much in-cohesion, though).
>
> I didn't want to bloat the taxonomy of exposed API/symbols too much
> for pg_stat_statements, but perhaps in this instance it is reasonable.
> Also, isn't the interlock with the result set is perhaps more
> precise/fine-grained with the current solution? Yet, that's awfully
> corner-casey.
>
> I'm on the fence because the simplicity and precision of the current
> regime for aggregation tools is nice, but avoiding the noise for
> inspecting humans in the common case is also nice. I don't see a
> reason right now to go strongly either way, so if you feel moderately
> strongly that the repetitive column should be stripped then I am happy
> to relent there and help out. Let me know of your detailed thoughts
> (or modify the patch) with your idea.
>

Thinking a bit more, if its just a question of a repeating value we
have the same situation for userid and dbid. They would be the same
for a user across multiple queries in same statistics session. So
userid,dbid and session_start do repeat across rows. Not sure why
treatment for session_start be different. I also checked pg_stat_plans
@ https://github.com/2ndQuadrant/pg_stat_plans and did not see any
special treatment given for a particular field in terms of access i.e.
the granularity of api wrt pg_stat_statements has been maintained.

regards
Sameer

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2013-10-05 18:22:54 custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Previous Message Oskari Saarenmaa 2013-10-05 13:57:15 Re: [PATCH] pg_upgrade: support for btrfs copy-on-write clones