Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2013-12-10 03:31:15
Message-ID: CAM3SWZRbxGA9dGXRe8+RDPmKa7LED75z1LxCEDnbTe7EFLZM6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Dec 7, 2013 at 9:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> The patch doesn't apply cleanly against the master due to recent change of
> pg_stat_statements. Could you update the patch?

Revision is attached, including changes recently discussed on other
thread [1] to allow avoiding sending query text where that's not
desirable and increase in default pg_stat_statements.max to 5000.

I've found myself tweaking things a bit more here and there. I've
added additional clarification around why we do an in-place garbage
collection (i.e. why we don't just write a new file and atomically
rename -- grep "over-engineer" to find what I mean). I also fully
documented the pg_stat_statements function. If we want external tool
authors to use it to avoid sending query texts, they have to know
about it in the first place.

I now use the pg_stat_tmp directory for my temp file (so
pg_stat_statements behaves more like the statistics collector). The
stats_temp_directory GUC is not used, for reasons that a patch comment
explains.

I've fuzz-tested this throughout with the "make installcheck-parallel"
regression tests. I found it useful to run that in one terminal, while
running pgbench with multiple clients in another. The pgbench script
looks something like:

"""
select * from pg_stat_statements;
select pg_stat_statements_reset();
"""

pg_stat_statements_reset() was temporarily rigged to perform a garbage
collection (and not a reset) for the benefit of this stress-test. The
function pg_stat_statements(), the garbage collection function and
pgss_store() will complain loudly if they notice anything out of the
ordinary. I was not able to demonstrate any problems through this
technique (though I think it focused my thinking on the right areas
during development). Clearly, missed subtleties around managing the
external file while locking in order to ensure correct behavior (that
no session can observe something that it should or should not) will be
something that a committer will want to pay particular attention to. I
go to some lengths to to avoid doing this with only a shared lock.

FYI, without hacking things up, it's quite hard to make external query
text garbage collection occur - need_gc_qtexts() will almost always
say no. It will only automatically happen once with
pg_stat_statements.max set to 1000 when the standard regression tests
are run. This is a really extreme workload in terms of causing
pg_stat_statements churn, which shows just how unlikely garbage
collection is in the real world. Still, it ought to be totally robust
when it happens.

In passing, I did this:

/*
! * Rename file into place, to atomically commit to serializing.
*/

Because at this juncture, there ought to be no file to replace (not
since Magnus had pg_stat_statements not keep the main global file
around for as long as the server is running, in the style of the
statistics collector).

Thanks

[1] http://www.postgresql.org/message-id/CAM3SWZSVFf-vBNC8sc-0CpWZxVQH49REYBcHb95t95fcrGSa6A@mail.gmail.com
--
Peter Geoghegan

Attachment Content-Type Size
pg_stat_statements_ext_text.v5.2013_12_09.patch.gz application/x-gzip 14.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2013-12-10 03:35:28 Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Previous Message Amit Kapila 2013-12-10 03:30:46 Re: [bug fix] pg_ctl always uses the same event source