Storing pg_stat_statements query texts externally, pg_stat_statements in core

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2013-11-14 08:17:58
Message-ID: CAM3SWZRYYnfwXi3r3eDAKWBOYtaf1_PwGJxTAyddNSbjAfDzjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached patch allows pg_stat_statements to store arbitrarily long
query texts, using an external, transparently managed lookaside file.

This is of great practical benefit to certain types of users, who need
to understand the execution costs of queries with associated
excessively long query texts, often due to the fact that the query was
generated by an ORM. This is a common complaint of Heroku customers,
though I'm sure they're not alone there.

As I mentioned on a previous occasion, since query fingerprinting was
added, the query text is nothing more than a way of displaying the
entries to users that are interested. As such, it seems perfectly
reasonable to store those externally, out of shared memory - the
function pg_stat_statements() itself (that the view is defined as a
call to) isn't particularly performance sensitive. Creating a brand
new pg_stat_statements entry is already costly, since it necessitates
an exclusive lock that blocks everything, so the additional cost of
storing the query text when that happens is assumed to be of marginal
concern. Better to have more entries in the first place, so that
doesn't need to happen very frequently - using far less memory per
entry helps the user to increase pg_stat_statements.max in the first
place, making excessive exclusive locking more unlikely in the first
place. Having said all that, the code bends over backwards to make
sure that physical I/O is as unlikely as possible during exclusive
locking. There are numerous tricks employed here where cache pressure
is a concern, that I won't go into here, since it's all well commented
in the patch.

Maybe we should have a warning when eviction occurs too frequently? I
also think that we might want to take another look at making the
normalization logic less strict in terms of differentiating queries
that a reasonable person might consider equivalent. This is obviously
rather fuzzy, but I've had complaints about things like
pg_stat_statements being overly fussy in seeing row and array
comparisons as distinct from each other. This is a discussion for
another thread most probably, but informed by one of the same concerns
- making expensive cache pressure less likely.

This incidentally furthers the case for pg_stat_statements in core
(making it similar to pg_stat_user_functions - not turned on by
default, but easily available, even on busy production systems that
cannot afford a restart and didn't know they needed it until
performance tanked 5 minutes ago). The amount of shared memory now
used (and therefore arguably wasted by having a little bit of shared
memory just in case pg_stat_statements becomes useful) is greatly
reduced. That's really a separate discussion, though, which is why I
haven't done the additional work of integrating pg_stat_statements
into core here. Doing a restart to enable pg_stat_statements is, in my
experience, only acceptable infrequently - restarts are generally to
be avoided at all costs.

I can see a day when the query hash is actually a general query cache
identifier, at which time the query text will also be stored outside
of the pgss hash table proper.

Refactoring
=========

I've decided to remove the encoding id from the pgss entry key (it's
now just in the main entry struct) - I don't think it makes sense
anymore. It is clearly no longer true that it's a "notational
convenience" (and hasn't been since 9.1). Like query texts themselves,
it is something that exists for the sole purpose of displaying
statistics to users, and as such cannot possibly result in incorrectly
failing to differentiate or spuriously differentiating two entries.
Now, I suppose it's true that since we're sometimes directly hashing
query texts, it might matter (e.g. utility statements) assuming that
they could vary. But since encoding invariably comes from database
encoding anyway, and we always differentiate on databaseid to begin
with, I fail to see how that could possibly matter.

I've bumped PGSS_FILE_HEADER, just in case a pg_stat_statements temp
file survives a pg_upgrade. I think that some minor tweaks made by
Noah to pg_stat_statements (as part of commit b560ec1b) ought to have
necessitated doing so anyway, but no matter.

The role of time-series aggregation
=============================

Over on the min/max pg_stat_statements storage thread, I've stated
that I think the way forward is that third party tools aggregate
deltas fairly aggressively - maybe even as aggressively as every 3
seconds or something. So I'm slightly concerned that I may be
hampering a future tool that needs to aggregate statistics very
aggressively. For that reason, we should probably provide an
alternative function/view that identifies pg_stat_statements entries
by hashid + databaseid + userid only, so any overhead from reading big
query strings from disk with a shared lock held is eliminated.

Thoughts?
--
Peter Geoghegan

Attachment Content-Type Size
pg_stat_statements_ext_text.v1.2013_11_14.patch.gz application/x-gzip 12.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2013-11-14 08:21:07 Re: nested hstore patch
Previous Message Amit Kapila 2013-11-14 07:50:10 Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: Proposal for Allow postgresql.conf values to be changed via SQL [review])