Re: pg_stat_statements: calls under-estimation propagation

Lists: pgsql-hackers
From: pilum(dot)70(at)uni-muenster(dot)de
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_statements: calls under-estimation propagation
Date: 2013-10-10 10:11:03
Message-ID: alpine.LNX.2.00.1310101204300.5867@ZIVPC313.uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The V7-Patch applied cleanly and I got no issues in my first tests.

The change from column session_start to a function seems very reasonable for
me.

Concernig the usability, I would like to suggest a minor change, that massively
increases the usefulness of the patch for beginners, who often use this view as
a first approach to optimize index structure.

The history of this tool contains a first version without normalization.
This wasn't useful enough except for prepared queries.
The actual version has normalized queries, so calls get
summarized to get a glimpse of bad queries.
But the drawback of this approach is impossibility to use
explain analyze without further substitutions.

The identification patch provides the possibility to summarize calls
by query_id, so that the normalized query string itself is no longer needed to
be exposed in the view for everyone.

I suggest to add a parameter to recover the possibility to
display real queries. The following very minor change
(based on V7) exposes the first real query getting this
query_id if normalization of the exposed string ist deactivated (The actual
behaviour is the default). This new option is not always the best starting
point to discover index shortfalls, but a huge gain for beginners because it
serves the needs
in more than 90% of the normal use cases.

What do you think?

Arne

P.S. This message is resent, because it is stalled two days, so sorry for a possible duplicate

Date: Mon Oct 7 17:54:08 2013 +0000

Switch to disable normalized query strings

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c
b/contrib/pg_stat_statements/pg_stat_statements.c
index e50dfba..6cc9244 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -234,7 +234,7 @@ static int pgss_max; /* max # statements to track */
static int pgss_track; /* tracking level */
static bool pgss_track_utility; /* whether to track utility commands */
static bool pgss_save; /* whether to save stats across shutdown */
-
+static bool pgss_normalize; /* whether to normalize the query representation shown in the view (otherwise show the first query executed with this query_id) */

#define pgss_enabled() \
(pgss_track == PGSS_TRACK_ALL || \
@@ -356,6 +356,17 @@ _PG_init(void)
NULL,
NULL);

+ DefineCustomBoolVariable("pg_stat_statements.normalize",
+ "Selects whether the view column contains the query strings in a normalized form.",
+ NULL,
+ &pgss_normalize,
+ true,
+ PGC_SUSET,
+ 0,
+ NULL,
+ NULL,
+ NULL);
+
EmitWarningsOnPlaceholders("pg_stat_statements");

/*
@@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId,

query_len = strlen(query);

- if (jstate)
+ if (jstate && pgss_normalize)
{
- /* Normalize the string if enabled */
+ /* Normalize the string is not NULL and normalized query strings are enabled */
norm_query = generate_normalized_query(jstate, query,
&query_len,
key.encoding);


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: pilum(dot)70(at)uni-muenster(dot)de
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements: calls under-estimation propagation
Date: 2013-10-10 10:26:49
Message-ID: CAM3SWZTB_Puh+ir_3t4L9Vz61qkhp=G9SpAgGBtsyYkPA0pP8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 10, 2013 at 3:11 AM, <pilum(dot)70(at)uni-muenster(dot)de> wrote:
> But the drawback of this approach is impossibility to use
> explain analyze without further substitutions.

You can fairly easily disable the swapping of constants with '?'
symbols, so that the query text stored would match the full originally
executed query. Why would you want to, though? There could be many
actual plans whose costs are aggregated as one query. Seeing one of
them is not necessarily useful at all, and could be misleading.

--
Peter Geoghegan


From: pilum(dot)70(at)uni-muenster(dot)de
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements: calls under-estimation propagation
Date: 2013-10-10 11:50:23
Message-ID: alpine.LNX.2.00.1310101313140.5867@ZIVPC313.uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thx for your reply.

On Thu, 10 Oct 2013, Peter Geoghegan wrote:

> On Thu, Oct 10, 2013 at 3:11 AM, <pilum(dot)70(at)uni-muenster(dot)de> wrote:
>> But the drawback of this approach is impossibility to use
>> explain analyze without further substitutions.
>
> You can fairly easily disable the swapping of constants with '?'
> symbols, so that the query text stored would match the full originally

I thought I did ?! I introduced an additional user parameter
to disable the normalization in the patch shown in my last mail.

If there is already an easier way in the actual distribution,
i simply missed ist.
Where is this behaviour documented?

> executed query. Why would you want to, though? There could be many
> actual plans whose costs are aggregated as one query. Seeing one of
> them is not necessarily useful at all, and could be misleading.
>

Yeah, (thinking of for example parameter ranges) I mentioned that, I think,
but in the majority of cases beginners can easily conclude missing indices
executing explain analyze, because the queries, that are aggregated
and displayed under one query_id have very similar (or simply the same) query plans.

It's also only an option disabled by default: You can simply do nothing, if you don't like it :-)

VlG

Arne Scheffer