Re: pg_stat_statements: calls under-estimation propagation

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Sameer Thakur <samthakur74(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements: calls under-estimation propagation
Date: 2013-10-05 08:08:13
Message-ID: CAAZKuFY=pxC=+imBkMjF+vk2WK2v+ZyaQpg4dFx08tGdf-9ukA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 4, 2013 at 7:22 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Oct 3, 2013 at 5:11 PM, Sameer Thakur <samthakur74(at)gmail(dot)com> wrote:
>> On Wed, Oct 2, 2013 at 6:40 PM, Sameer Thakur <samthakur74(at)gmail(dot)com> wrote:
>>>>
>>>> Looks pretty good. Do you want to package up the patch with your
>>>> change and do the honors and re-submit it? Thanks for helping out so
>>>> much!
>>> Sure, will do. Need to add a bit of documentation explaining
>>> statistics session as well.
>>> I did some more basic testing around pg_stat_statements.max, now that
>>> we have clarity from Peter about its value being legitimate below 100.
>>> Seems to work fine, with pg_stat_statements =4 the max unique queries
>>> in the view are 4. On the 5th query the view holds just the latest
>>> unique query discarding the previous 4. Fujii had reported a
>>> segmentation fault in this scenario.
>>> Thank you for the patch
>>
>> Please find the patch attached
>
> Thanks for the patch! Here are the review comments:
>
> + OUT session_start timestamptz,
> + OUT introduced timestamptz,
>
> The patch exposes these columns in pg_stat_statements view.
> These should be documented.
>
> I don't think that session_start should be exposed in every
> rows in pg_stat_statements because it's updated only when
> all statistics are reset, i.e., session_start of all entries
> in pg_stat_statements indicate the same.

Dunno. I agree it'd be less query traffic and noise. Maybe hidden
behind a UDF? I thought "stats_reset" on pg_database may be prior
art, but realized that the statistics there differ depending on stats
resets per database (maybe a name change of 'session' to 'stats_reset'
would be useful to avoid too much in-cohesion, though).

I didn't want to bloat the taxonomy of exposed API/symbols too much
for pg_stat_statements, but perhaps in this instance it is reasonable.
Also, isn't the interlock with the result set is perhaps more
precise/fine-grained with the current solution? Yet, that's awfully
corner-casey.

I'm on the fence because the simplicity and precision of the current
regime for aggregation tools is nice, but avoiding the noise for
inspecting humans in the common case is also nice. I don't see a
reason right now to go strongly either way, so if you feel moderately
strongly that the repetitive column should be stripped then I am happy
to relent there and help out. Let me know of your detailed thoughts
(or modify the patch) with your idea.

> + OUT query_id int8,
>
> query_id or queryid? I like the latter. Also the document
> uses the latter.

Not much opinion. I prefer expending the _ character because most
compound words in postgres performance statistics catalogs seem to use
an underscore, though.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2013-10-05 11:38:38 Re: Patch: FORCE_NULL option for copy COPY in CSV mode
Previous Message Kohei KaiGai 2013-10-05 08:01:07 Re: Custom Plan node