From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Daniel Farina <drfarina(at)acm(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements: calls under-estimation propagation
Date: 2012-12-28 17:58:53
Message-ID: CAEYLb_Uic+mfDHokX0UWKSSx-5WcFzNjU5ngBJ-MAHOambL9Qw@mail.gmail.com

On 28 December 2012 11:43, Daniel Farina <drfarina(at)acm(dot)org> wrote:
> Without further ado, the cover letter taken from the top of the patch:
>
> This tries to establish a maximum under-estimate of the number of
> calls for a given pg_stat_statements entry. That means the number of
> calls to the canonical form of the query is between 'calls' and 'calls
> + calls_underest'.

Cool.

One possible issue I see is with this code:

+
+ /* propagate calls under-estimation bound */
+ entry->counters.calls_underest = pgss->calls_max_underest;
+

which appears within entry_alloc(). So if the first execution of the
query results in an error during planning or (much more likely)
execution, as in the event of an integrity constraint violation,
you're going to get a dead sticky entry that no one will ever see.
Now, we currently decay sticky entries much more aggressively, so the
mere fact that we waste an entry isn't a real problem. This was
established by this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d5375491f8e391224b48e4bb449995a4642183ea
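
For reference, the more aggressive decay looks approximately like this
in entry_dealloc() (quoting from memory, so treat the exact factors as
illustrative):

#define USAGE_DECREASE_FACTOR	(0.99)	/* decayed-usage decrease factor */
#define STICKY_DECREASE_FACTOR	(0.50)	/* sticky entries decay faster */

/* within entry_dealloc()'s pass over all hash table entries */
if (entry->counters.calls == 0)
	entry->counters.usage *= STICKY_DECREASE_FACTOR;	/* still sticky */
else
	entry->counters.usage *= USAGE_DECREASE_FACTOR;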

However, with this approach, calls_underest values might appear to the
user in a surprising order. I'm not suggesting that's necessarily a
problem - just that those may not be the semantics we want,
particularly since we could reasonably defer assigning calls_underest
until a sticky entry is "unstuck" and becomes user-visible, within
pgss_store().
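
To illustrate (a rough, untested sketch reusing your field names): at
the point in pgss_store() where a sticky entry is unstuck by its first
successful execution, the bound could be assigned there instead:

volatile pgssEntry *e = (volatile pgssEntry *) entry;

SpinLockAcquire(&e->mutex);
if (e->counters.calls == 0)
{
	/* was sticky; the entry becomes user-visible now */
	e->counters.usage = USAGE_INIT;
	e->counters.calls_underest = pgss->calls_max_underest;
}
e->counters.calls += 1;
/* ... accumulate timings, rows and buffer usage as before ... */
SpinLockRelease(&e->mutex);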

Also, it seems like you should initialise pgss->calls_max_underest
within pgss_shmem_startup(). You should probably serialise the value
to disk, and initialise it to 0 if there is no saved value to begin
with.
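
Something like this, following the existing PGSS_DUMP_FILE handling
(sketch only, and I'm assuming the counter is an int64):

/* in pgss_shmem_shutdown(), alongside writing out the entries: */
if (fwrite(&pgss->calls_max_underest, sizeof(int64), 1, file) != 1)
	goto error;

/* in pgss_shmem_startup(): default to 0, then restore any saved value */
pgss->calls_max_underest = 0;
if (fread(&pgss->calls_max_underest, sizeof(int64), 1, file) != 1)
	goto error;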

I wonder if the way that pg_stat_statements throws its hands up when
it comes to crash safety (i.e. the contents of the hash table are
completely lost) could be a concern here. In other words, a program
tasked with tracking execution costs over time and graphing
time-series data from snapshots has to take responsibility for
ensuring that there hasn't been a crash (or, indeed, a reset).

Another issue is that I don't think that what you've done here solves
the problem of uniquely identifying each entry over time, in the same
way that simply exposing the hash value would. I'm concerned with a
problem related to the one solved here: simply identifying each entry
uniquely. As we've already discussed, the query string is an imperfect
proxy for each entry, even with constants swapped out for '?'
characters (and even when combined with the userid and dbid values,
it's still not the same as the hashtable key, in a way that is likely
to bother people who use pg_stat_statements for long enough).
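
By contrast, exposing the value would just mean adding a column
sourced from the key in pg_stat_statements()'s tuple-building loop -
something like this (the column name, and the assumption that the key
still carries a uint32 queryid, are mine):

/* within pg_stat_statements(), filling in "values" for one entry */
values[i++] = ObjectIdGetDatum(entry->key.userid);
values[i++] = ObjectIdGetDatum(entry->key.dbid);
values[i++] = Int64GetDatum((int64) entry->key.queryid);	/* new column */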

I think you probably should have created a
PG_STAT_STATEMENTS_COLS_V1_1 macro, since that version of the module
is now legacy, like *V1_0 is in HEAD.
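
That is, mirror the existing convention, so pg_stat_statements() can
keep branching on the number of attributes in the tuple descriptor it
is handed (column counts from memory, so double-check them):

#define PG_STAT_STATEMENTS_COLS_V1_0	14
#define PG_STAT_STATEMENTS_COLS_V1_1	18	/* now also legacy */
#define PG_STAT_STATEMENTS_COLS		19	/* adds calls_underest */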

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
