From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2013-11-14 08:17:58
Message-ID: CAM3SWZRYYnfwXi3r3eDAKWBOYtaf1_PwGJxTAyddNSbjAfDzjA@mail.gmail.com
Lists: pgsql-hackers

Attached patch allows pg_stat_statements to store arbitrarily long
query texts, using an external, transparently managed lookaside file.

This is of great practical benefit to certain types of users who need
to understand the execution costs of queries whose texts are
excessively long, often because the query was generated by an ORM.
This is a common complaint of Heroku customers, though I'm sure
they're not alone there.

As I mentioned on a previous occasion, since query fingerprinting was
added, the query text is nothing more than a way of displaying the
entries to users that are interested. As such, it seems perfectly
reasonable to store the texts externally, out of shared memory - the
function pg_stat_statements() itself (that the view is defined as a
call to) isn't particularly performance sensitive. Creating a brand
new pg_stat_statements entry is already costly, since it necessitates
an exclusive lock that blocks everything, so the additional cost of
storing the query text when that happens is assumed to be of marginal
concern. Better to have room for more entries, so that creating new
ones doesn't need to happen very frequently - using far less memory
per entry helps the user to increase pg_stat_statements.max, making
excessive exclusive locking less likely in the first place. Having
said all that, the code bends over backwards to make sure that
physical I/O is as unlikely as possible while the exclusive lock is
held. There are numerous tricks employed where cache pressure is a
concern, which I won't go into here, since it's all well commented in
the patch.

Maybe we should have a warning when eviction occurs too frequently? I
also think that we might want to take another look at making the
normalization logic less strict in terms of differentiating queries
that a reasonable person might consider equivalent. This is obviously
rather fuzzy, but I've had complaints about things like
pg_stat_statements being overly fussy in seeing row and array
comparisons as distinct from each other. This is most probably a
discussion for another thread, but it is informed by one of the same
concerns - making expensive cache pressure less likely.

This incidentally furthers the case for pg_stat_statements in core
(making it similar to pg_stat_user_functions - not turned on by
default, but easily available, even on busy production systems that
cannot afford a restart and didn't know they needed it until
performance tanked 5 minutes ago). The amount of shared memory now
used (and therefore arguably wasted by having a little bit of shared
memory just in case pg_stat_statements becomes useful) is greatly
reduced. That's really a separate discussion, though, which is why I
haven't done the additional work of integrating pg_stat_statements
into core here. Doing a restart to enable pg_stat_statements is, in my
experience, only acceptable infrequently - restarts are generally to
be avoided at all costs.

I can see a day when the query hash is actually a general query cache
identifier, at which time the query text will also be stored outside
of the pgss hash table proper.

Refactoring
=========

I've decided to remove the encoding id from the pgss entry key (it's
now just in the main entry struct) - I don't think it makes sense
anymore. It is clearly no longer true that it's a "notational
convenience" (and hasn't been since 9.1). Like query texts themselves,
it is something that exists for the sole purpose of displaying
statistics to users, and as such cannot possibly result in incorrectly
failing to differentiate or spuriously differentiating two entries.
Now, I suppose that since we sometimes hash query texts directly
(e.g. for utility statements), the encoding might matter, assuming
encodings could actually vary. But since the encoding invariably comes
from the database encoding anyway, and we always differentiate on
databaseid to begin with, I fail to see how it could possibly matter.

I've bumped PGSS_FILE_HEADER, just in case a pg_stat_statements temp
file survives a pg_upgrade. I think that some minor tweaks made by
Noah to pg_stat_statements (as part of commit b560ec1b) ought to have
necessitated doing so anyway, but no matter.

The role of time-series aggregation
=============================

Over on the min/max pg_stat_statements storage thread, I've stated
that I think the way forward is that third party tools aggregate
deltas fairly aggressively - maybe even as aggressively as every 3
seconds or something. So I'm slightly concerned that I may be
hampering a future tool that needs to aggregate statistics very
aggressively. For that reason, we should probably provide an
alternative function/view that identifies pg_stat_statements entries
by hashid + databaseid + userid only, so any overhead from reading big
query strings from disk with a shared lock held is eliminated.

Thoughts?
--
Peter Geoghegan

Attachment Content-Type Size
pg_stat_statements_ext_text.v1.2013_11_14.patch.gz application/x-gzip 12.3 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2013-11-15 16:51:25
Message-ID: 5286510D.1040004@gmx.net
Lists: pgsql-hackers

On 11/14/13, 3:17 AM, Peter Geoghegan wrote:
> Attached patch allows pg_stat_statements to store arbitrarily long
> query texts, using an external, transparently managed lookaside file.

Compiler warnings with fortify settings:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -fpic -I. -I. -I../../src/include -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -c -o pg_stat_statements.o pg_stat_statements.c -MMD -MP -MF .deps/pg_stat_statements.Po
pg_stat_statements.c: In function ‘entry_reset’:
pg_stat_statements.c:1827:11: warning: ignoring return value of ‘ftruncate’, declared with attribute warn_unused_result [-Wunused-result]
pg_stat_statements.c: In function ‘garbage_collect_query_texts’:
pg_stat_statements.c:1779:11: warning: ignoring return value of ‘ftruncate’, declared with attribute warn_unused_result [-Wunused-result]
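
A hedged sketch of one way to address these - actually checking
ftruncate()'s return value and logging failures rather than discarding
it - is below. The function, stream, and path names are illustrative
only, not necessarily what the patch (or the eventual fix) uses:

#include "postgres.h"

#include <unistd.h>

/*
 * Truncate the external query-text file to new_size, logging (but not
 * erroring out) if the truncation fails.  Checking the return value is
 * what satisfies -Wunused-result.
 */
static void
truncate_qtext_file(FILE *qfile, off_t new_size, const char *path)
{
    if (ftruncate(fileno(qfile), new_size) != 0)
        ereport(LOG,
                (errcode_for_file_access(),
                 errmsg("could not truncate file \"%s\": %m", path)));
}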


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2013-11-17 00:36:35
Message-ID: CAM3SWZRkDy9VhKaJmv9GDwkFk2GX+uXk-PPBaBsTGLxCTK+VHA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 15, 2013 at 8:51 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> Compiler warnings with fortify settings:

I'll post a revision with fixes for those. Another concern is that I
see some race conditions that only occur when the pg_stat_statements
cache is under a lot of pressure with a lot of concurrency, and that
weren't revealed by my original testing. I'm working on a fix for that.

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2013-11-17 04:15:50
Message-ID: CAM3SWZQdsXO2976HbmzXWBYx8C5eUq6LpmbBrOF=nirtiTuCkA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Nov 16, 2013 at 4:36 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I'll post a revision with fixes for those. Another concern is that I
> see some race conditions that only occur when the pg_stat_statements
> cache is under a lot of pressure with a lot of concurrency, and that
> weren't revealed by my original testing. I'm working on a fix for that.

Revision attached.

--
Peter Geoghegan

Attachment Content-Type Size
pg_stat_statements_ext_text.v2.2013_11_16.patch.gz application/x-gzip 12.5 KB

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2013-11-20 23:37:36
Message-ID: CAM3SWZT582X_h=Vjdk23qhaVFGAMK5KLErwduHYCidiHVUr7sg@mail.gmail.com
Lists: pgsql-hackers

With a pg_stat_statements.max of 1,000, on my system with the patch
applied the additional amount of shared memory required is only
192KiB. That compares with about 1.17MiB for the same setting in
master's version of pg_stat_statements. Surely with this huge
reduction, we should revisit that default - I think that a default
setting of 5,000 for pg_stat_statements.max makes more sense.
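
(Back-of-envelope, assuming the numbers scale roughly linearly: 192KiB
for 1,000 entries is about 200 bytes per entry with the patch, versus
roughly 1.2KiB per entry in master, so even 5,000 entries would need
only about 960KiB of shared memory - still less than master uses for
1,000.)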

entry_dealloc() requires an exclusive lock, locking all queries out of
simply recording their costs. With a lot of cache pressure this could
be really expensive. In my opinion that risk alone justifies the
change.

Without adding another GUC, we might even go so far as to have a
mechanism like checkpoint_warning warn that entry_dealloc() calls
occur too frequently...

--
Peter Geoghegan


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2013-12-07 17:26:12
Message-ID: CAHGQGwELDsmKYcvmE6nZamevGQHT82scsYN_0ZfxePJBFCvN5Q@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 17, 2013 at 1:15 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Sat, Nov 16, 2013 at 4:36 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> I'll post a revision with fixes for those. Another concern is that I
>> see some race conditions that only occur when the pg_stat_statements
>> cache is under a lot of pressure with a lot of concurrency, and that
>> weren't revealed by my original testing. I'm working on a fix for that.
>
> Revision attached.

The patch doesn't apply cleanly against the master due to recent change of
pg_stat_statements. Could you update the patch?

Regards,

--
Fujii Masao


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2013-12-10 03:31:15
Message-ID: CAM3SWZRbxGA9dGXRe8+RDPmKa7LED75z1LxCEDnbTe7EFLZM6g@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 7, 2013 at 9:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> The patch doesn't apply cleanly against the master due to recent change of
> pg_stat_statements. Could you update the patch?

A revision is attached, including the changes recently discussed on
another thread [1] to allow avoiding sending query texts where that's
not desirable, and an increase in the default pg_stat_statements.max
to 5000.

I've found myself tweaking things a bit more here and there. I've
added additional clarification around why we do an in-place garbage
collection (i.e. why we don't just write a new file and atomically
rename -- grep "over-engineer" to find what I mean). I also fully
documented the pg_stat_statements function. If we want external tool
authors to use it to avoid sending query texts, they have to know
about it in the first place.

I now use the pg_stat_tmp directory for my temp file (so
pg_stat_statements behaves more like the statistics collector). The
stats_temp_directory GUC is not used, for reasons that a patch comment
explains.

I've fuzz-tested this throughout with the "make installcheck-parallel"
regression tests. I found it useful to run that in one terminal, while
running pgbench with multiple clients in another. The pgbench script
looks something like:

"""
select * from pg_stat_statements;
select pg_stat_statements_reset();
"""

pg_stat_statements_reset() was temporarily rigged to perform a garbage
collection (and not a reset) for the benefit of this stress-test. The
function pg_stat_statements(), the garbage collection function and
pgss_store() will complain loudly if they notice anything out of the
ordinary. I was not able to demonstrate any problems through this
technique (though I think it focused my thinking on the right areas
during development). Clearly, missed subtleties around managing the
external file while locking in order to ensure correct behavior (that
no session can observe something that it shouldn't) will be something
that a committer will want to pay particular attention to. I go to
some lengths to avoid doing this with only a shared lock.

FYI, without hacking things up, it's quite hard to make external query
text garbage collection occur - need_gc_qtexts() will almost always
say no. It will only automatically happen once with
pg_stat_statements.max set to 1000 when the standard regression tests
are run. This is a really extreme workload in terms of causing
pg_stat_statements churn, which shows just how unlikely garbage
collection is in the real world. Still, it ought to be totally robust
when it happens.

In passing, I did this:

/*
! * Rename file into place, to atomically commit to serializing.
*/

Because at this juncture, there ought to be no file to replace (not
since Magnus had pg_stat_statements not keep the main global file
around for as long as the server is running, in the style of the
statistics collector).

Thanks

[1] http://www.postgresql.org/message-id/CAM3SWZSVFf-vBNC8sc-0CpWZxVQH49REYBcHb95t95fcrGSa6A@mail.gmail.com
--
Peter Geoghegan

Attachment Content-Type Size
pg_stat_statements_ext_text.v5.2013_12_09.patch.gz application/x-gzip 14.6 KB

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2013-12-10 03:35:28
Message-ID: CAM3SWZSU0VSp4EOFk4M=Zy8QT5AGi7y4DH7azAJdwxHcwbCVTQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 9, 2013 at 7:31 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I go to some lengths to avoid doing this with only a shared lock.

I should have said: I go to great lengths to do this with only a
shared lock, and not an exclusive (see gc_count stuff).

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-20 20:55:40
Message-ID: 2973.1390251340@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Sat, Dec 7, 2013 at 9:26 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> The patch doesn't apply cleanly against the master due to recent change of
>> pg_stat_statements. Could you update the patch?

> A revision is attached, including the changes recently discussed on
> another thread [1] to allow avoiding sending query texts where that's
> not desirable, and an increase in the default pg_stat_statements.max
> to 5000.

I see this is marked as ready for committer. Where does it stand in
relation to the other long-running thread about "calls under-estimation
propagation"? I was surprised to find that there isn't any CommitFest
entry linked to that thread, so I'm wondering if that proposal is
abandoned or what. If it's not, is committing this going to blow up
that patch?

BTW, I'm also thinking that the "detected_version" kluge is about ready
to collapse of its own weight, or at least is clearly going to break in
future. What we need to do is embed the API version in the C name of the
pg_stat_statements function instead.

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-20 21:08:52
Message-ID: CAM3SWZSX0GroNCKvgnxVaLxMqaWt8NTeDfZDZD_y_zU5XzeVjw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 20, 2014 at 12:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I see this is marked as ready for committer. Where does it stand in
> relation to the other long-running thread about "calls under-estimation
> propagation"? I was surprised to find that there isn't any CommitFest
> entry linked to that thread, so I'm wondering if that proposal is
> abandoned or what. If it's not, is committing this going to blow up
> that patch?

I believe that proposal was withdrawn. I think the conclusion there
was that we should just expose queryid and be done with it. In a way,
exposing the queryid enabled this work, because it provides an
identifier that can be used instead of sending large query texts each
call.

> BTW, I'm also thinking that the "detected_version" kluge is about ready
> to collapse of its own weight, or at least is clearly going to break in
> future. What we need to do is embed the API version in the C name of the
> pg_stat_statements function instead.

I agree that it isn't scalable.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-20 21:24:51
Message-ID: 3644.1390253091@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Mon, Jan 20, 2014 at 12:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, I'm also thinking that the "detected_version" kluge is about ready
>> to collapse of its own weight, or at least is clearly going to break in
>> future. What we need to do is embed the API version in the C name of the
>> pg_stat_statements function instead.

> I agree that it isn't scalable.

Yeah. It was more or less tolerable as long as we were only using it in
connection with identifying the set of output columns, but using it to
identify the expected input arguments too seems darn rickety. I'll
refactor so there are separate C call points for each supported API
version. (Well, I guess it's too late to separate 1.0 and 1.1, but
we can make 1.2 and future versions separate.)

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-22 02:22:19
Message-ID: 24505.1390357339@sss.pgh.pa.us
Lists: pgsql-hackers

BTW, what is the value of using posix_fadvise() in warm_fcache?

ISTM that if you're worried about waiting for I/O while holding the
LWLock (as I concur you should be), this coding is entirely inadequate
for preventing that, as the whole point of posix_fadvise is that it
won't wait around for the I/O to happen. It strains credulity to
the breaking point to imagine that the kernel will have gotten
around to reading the file in the small number of nanoseconds that
will elapse before you start needing the data. Unless of course the file
is already in buffers, but then the whole thing was a waste of code.

I think we should flush the posix_fadvise() code path as being neither
useful nor portable, and just do the actual read() as in the other
path (which btw is lacking an essential fseek).

Actually, I think the whole thing is rather badly designed, as warm
cache or no you're still proposing to do thousands of kernel calls
while holding a potentially contended LWLock. I suggest what we
do is (1) read in the whole file, (2) acquire the lock, (3)
re-read the whole file in the same buffer, (4) work from the buffer.
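
A minimal sketch of the file-slurping helper such a scheme needs (used
for both step 1 and step 3) might look about like this; the name and
error handling are illustrative only, not a finished implementation:

#include "postgres.h"

#include "storage/fd.h"

/*
 * Read the entire query-text file into a palloc'd, NUL-terminated
 * buffer, returning NULL on any failure.  In the scheme above this
 * would be called once before acquiring the lock and again (ideally
 * reusing the first buffer) once the lock is held.
 */
static char *
slurp_qtext_file(const char *path, Size *buffer_size)
{
    FILE       *file;
    off_t       len;
    char       *buf;

    file = AllocateFile(path, PG_BINARY_R);
    if (file == NULL)
        return NULL;

    /* determine the file's size */
    if (fseeko(file, 0, SEEK_END) != 0 ||
        (len = ftello(file)) < 0 ||
        fseeko(file, 0, SEEK_SET) != 0)
    {
        FreeFile(file);
        return NULL;
    }

    buf = (char *) palloc(len + 1);
    if (fread(buf, 1, len, file) != (size_t) len)
    {
        pfree(buf);
        FreeFile(file);
        return NULL;
    }
    buf[len] = '\0';            /* simplifies using the texts in place */
    *buffer_size = (Size) len;

    FreeFile(file);
    return buf;
}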

It might be possible to optimize the re-read away in the common case
that the file didn't actually change since we started to read it;
we'd need to put a bit more reporting into qtext_store probably, so
that it's possible to tell if any space is reserved but still unwritten.
BTW shouldn't there be an fflush() in qtext_store?

We could also eliminate some palloc/pfree cycles by using the string
directly in the buffer (at the small cost of needing to include the
strings' trailing nulls in the file).
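
With the trailing nulls included on disk, fetching an entry's text once
the file image is in memory could be as simple as the sketch below
(the offset/length bookkeeping per entry is assumed, and the function
name is made up):

#include "postgres.h"

/*
 * Return a pointer to one entry's query text, used in place within the
 * already-loaded file image.  Returns NULL if the stored offset/length
 * don't look sane for this buffer, e.g. because the file was
 * garbage-collected after the buffer was read.
 */
static const char *
qtext_in_place(const char *buffer, Size buffer_size,
               Size query_offset, int query_len)
{
    if (query_len < 0 || query_offset + query_len >= buffer_size)
        return NULL;
    if (buffer[query_offset + query_len] != '\0')
        return NULL;            /* expected terminator is missing */
    return buffer + query_offset;
}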

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-22 04:01:51
Message-ID: CAM3SWZSySkp2z44C_BFthC9rDr10PTF7KjL80nD-Sr=Xv-XjZA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 21, 2014 at 6:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> BTW, what is the value of using posix_fadvise() in warm_fcache?
>
> ISTM that if you're worried about waiting for I/O while holding the
> LWLock (as I concur you should be), this coding is entirely inadequate
> for preventing that, as the whole point of posix_fadvise is that it
> won't wait around for the I/O to happen. It strains credulity to
> the breaking point to imagine that the kernel will have gotten
> around to reading the file in the small number of nanoseconds that
> will elapse before you start needing the data. Unless of course the file
> is already in buffers, but then the whole thing was a waste of code.

Not necessarily. Under Linux for example, POSIX_FADV_SEQUENTIAL "sets
the readahead window to the default size for the backing device". That
seems like a useful thing, though I will admit that it's hard to
quantify how useful here. If it is useful, and furthermore if it's
more useful than just reading the file into a temp buffer (which, I
admit, is not clear), then why not do it where we can? Adopting
support for the non-portable posix_fadvise is a sunk cost.
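
For reference, the kind of hint under discussion amounts to something
like the sketch below (illustrative only - not necessarily how
warm_fcache() is actually written in the patch):

#include <fcntl.h>

/*
 * Advise the kernel that fd will be read sequentially, so that it can
 * use a larger readahead window.  The call merely registers advice; it
 * does not wait for, or by itself start, any particular I/O.
 */
static void
advise_sequential_read(int fd)
{
#ifdef POSIX_FADV_SEQUENTIAL
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
#endif
}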

> I think we should flush the posix_fadvise() code path as being neither
> useful nor portable, and just do the actual read() as in the other
> path

Perhaps.

> Actually, I think the whole thing is rather badly designed, as warm
> cache or no you're still proposing to do thousands of kernel calls
> while holding a potentially contended LWLock. I suggest what we
> do is (1) read in the whole file, (2) acquire the lock, (3)
> re-read the whole file in the same buffer, (4) work from the buffer.

fseeko() and fread() are part of the standard library, not any kernel.
I don't believe that what I've done in qtext_fetch() implies thousands
of kernel calls, or thousands of context switches.

Now, using a buffer not managed by libc might be better, but I should
note what holding that LWLock actually represents. As you know, it's a
shared lock that does not block the updating of existing entries. It
will only block the creation of entirely new ones. That ought to be
very rare. You may recall that when I worked with you on
pg_stat_statements in the past, I characterized the rate of growth as
being logarithmic. Once the system has been running for half an hour,
new entries become very infrequent indeed. Admittedly, I may be
failing to consider a worst case there, but in the average case this
works fine. Encouraging automated tools to lazily retrieve query texts
is a big part of why I thought this might be the least worst thing.

It's not obvious to me that always avoiding blocking new entries while
performing physical I/O is important enough to warrant the additional
complexity of copying to a buffer first, and mostly working off that.
Also, you must consider that there may have been a garbage collection
in the interim, so you have to get the shared lock twice (to check the
garbage collection count, to later guard against your buffer having
been invalidated by a garbage collection) instead of just getting the
shared lock once. I discussed this with Pavel, actually. Getting the
shared lock twice on every pg_stat_statements() call doesn't seem very
good either. Especially since you'll have to open the file twice,
*with* the shared lock the second time around, when your scheme misses
new entries. Your scheme won't work out under exactly the same
circumstances in which mine doesn't do too well, which is when there
is a new entry to consider. When that doesn't happen, holding a shared
lock is not that important.

> It might be possible to optimize the re-read away in the common case
> that the file didn't actually change since we started to read it;
> we'd need to put a bit more reporting into qtext_store probably, so
> that it's possible to tell if any space is reserved but still unwritten.

So, to be clear, you'd still be reading from the file if that proved
necessary? I think you'll have to, because it seems useful that
pg_stat_statements offers a limited form of consistency, in that you
only see entries at a point in time, even if the figures themselves
are not exactly consistent. Maybe I've missed something, but it's not
obvious to me that you're any better off as compared to just using a
FILE/stream.

> BTW shouldn't there be an fflush() in qtext_store?

I don't think so, no. Whenever qtext_store() callers don't fflush()
before releasing their exclusive lock, they FreeFile() before doing so
instead. In the case of pgss_shmem_startup() there is no lock held
anyway, because it doesn't matter, for the same reason it doesn't
matter that we're modifying shmem structures in a way that ordinarily
necessitates an exclusive lock. Why fflush() every qtext_store() call
if there is expected to be more than one, as is the case for several
callers?

FreeFile() calls fclose(). fclose() is described by the standard as
flushing its stream. It is my understanding that calling fflush() on a
stream immediately prior to calling fclose() on the same stream is
always superfluous.

> We could also eliminate some palloc/pfree cycles by using the string
> directly in the buffer (at the small cost of needing to include the
> strings' trailing nulls in the file).

It might be useful to do that anyway, for sanity checking purposes. So
+1 from me.

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-22 04:05:28
Message-ID: CAM3SWZQ03w48qEEp-ZFx7Qt6dTPEftd79J4QHNpon+swM5GFnA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> Not necessarily. Under Linux for example, POSIX_FADV_SEQUENTIAL "sets
> the readahead window to the default size for the backing device"

Excuse me; I meant to put "POSIX_FADV_SEQUENTIAL doubles this size
[default size for the backing device]".

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-22 10:25:38
Message-ID: CAM3SWZQBBrCF5xq4Tp4+Rw7b-Layfm97NPUvz8Exsv786Vbpbw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> Actually, I think the whole thing is rather badly designed, as warm
>> cache or no you're still proposing to do thousands of kernel calls
>> while holding a potentially contended LWLock. I suggest what we
>> do is (1) read in the whole file, (2) acquire the lock, (3)
>> re-read the whole file in the same buffer, (4) work from the buffer.
>
> fseeko() and fread() are part of the standard library, not any kernel.
> I don't believe that what I've done in qtext_fetch() implies thousands
> of kernel calls, or thousands of context switches.

I ran an strace on a pg_stat_statements backend with a full ~5,000
entries. It seems that glibc is actually content to always make
lseek() and read() syscalls in the event of an fseek(), even though it
does maintain its own buffer and could in principle have a lot more
gumption about that [1]. I agree that we should copy everything into a
buffer in one go.

I think that making the LWLocking duration as brief as possible isn't
critically important. Automated tools will only be interested in new
query texts (implying possible I/O while share locking) as they
observe new entries. But new entries are the only thing worth
considering that may potentially block on that shared lock (although,
not as they write out their query texts, which almost always just
requires a shared lock). New entries ought to be a very rare
occurrence compared to the execution of queries - certainly, on the busy
production systems I use pg_stat_statements on, it may be weeks before
a new query is observed (uh, with the possible exception of *my*
ad-hoc pg_stat_statements queries). On my laptop, "explain analyze
select * from pg_stat_statements" indicates a total runtime of ~9ms
with 5,000 regression test entries, even with the (still modest)
overhead of doing all those read()/lseek() calls. So we're talking
about a small impact on a new entry, that will only be affected once,
that was always going to have to write out its query text to file
anyway. On top of all this, in general it's very improbable that any
given new entry will be affected at all, because two unlikely things
need to happen at the same time for that to be possible.

The most important aspect of keeping the overhead of
pg_stat_statements low is that there not be too much cache pressure.
Obviously the huge reduction in its shared memory footprint that this
patch allows helps with that.

[1] http://www.pixelbeat.org/programming/stdio_buffering/
--
Peter Geoghegan


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-22 10:39:40
Message-ID: CAM3SWZStk01Aa5noBea15XrOihRKp0tZOdVwCcZC+wuDQq9zjg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> BTW shouldn't there be an fflush() in qtext_store?
>
> I don't think so, no. Whenever qtext_store() callers don't fflush()
> before releasing their exclusive lock, they FreeFile() before doing so
> instead. In the case of pgss_shmem_startup() there is no lock held
> anyway, because it doesn't matter, for the same reason it doesn't
> matter that we're modifying shmem structures in a way that ordinarily
> necessitates an exclusive lock. Why fflush() every qtext_store() call
> if there is expected to be more than one, as is the case for several
> callers?

Having said all that, it occurs to me now that I could have been
smarter about where I fflush(). I'm pretty sure pgss_store() should
have two fflush() calls. The first can occur with just a shared LWLock
held, before the lock strength promotion interim (you probably noticed
that I no longer generate a normalized query text in that interim, for
reasons that are obvious). The second fflush() may occur only in the
very unlikely event of a garbage collection necessitating a new
qtext_store() call with an exclusive lock held, a few lines after the
first fflush(). No need to fflush() with the exclusive lock the vast
majority of the time.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-22 18:29:53
Message-ID: 15115.1390415393@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Tue, Jan 21, 2014 at 8:01 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>>> Actually, I think the whole thing is rather badly designed, as warm
>>> cache or no you're still proposing to do thousands of kernel calls
>>> while holding a potentially contended LWLock. I suggest what we
>>> do is (1) read in the whole file, (2) acquire the lock, (3)
>>> re-read the whole file in the same buffer, (4) work from the buffer.

>> fseeko() and fread() are part of the standard library, not any kernel.
>> I don't believe that what I've done in qtext_fetch() implies thousands
>> of kernel calls, or thousands of context switches.

> I ran an strace on a pg_stat_statements backend with a full ~5,000
> entries. It seems that glibc is actually content to always make
> lseek() and read() syscalls in the event of an fseek(), even though it
> does maintain its own buffer and could in principle have a lot more
> gumption about that [1]. I agree that we should copy everything into a
> buffer in one go.

Yeah, that was what I thought might happen. Even if glibc were smarter,
a lot of other libc implementations aren't. Also, ISTM that traversal
of the hash table will generally lead to more-or-less-random access into
the text file. So unless you assume that libc has mmap'd the whole
file, it'd have to do a lot of kernel calls anyway to get different pages
of the file into its buffer at different times.

I think that "read the whole file" is probably going to net out not any
more complex as far as our code goes, and it'll be making a lot fewer
assumptions about how smart the libc level is and what's happening at
the kernel level. I'll have a go at coding it, anyway.

> I think that making the LWLocking duration as brief as possible isn't
> critically important.

Maybe, but I don't like the idea that calling pg_stat_statements()
frequently could result in random glitches in response time for other
queries.

> Automated tools will only be interested in new
> query texts (implying possible I/O while share locking) as they
> observe new entries.

This argument would be more plausible if there were a way to fetch
the text for a previously-observed entry without invoking
pg_stat_statements(). I'm surprised you've not submitted a patch
to add such a function.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-22 21:22:47
Message-ID: 19429.1390425767@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> I ran an strace on a pg_stat_statements backend with a full ~5,000
> entries.

BTW, have you got any sort of test scenario you could share for this
purpose? I'm sure I could build something, but if you've already
done it ...

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-22 21:36:04
Message-ID: CAM3SWZSoDumus=jQkRpQBhOM+QmdyhYKyPmR8-MmyLYUsxf5hQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 22, 2014 at 1:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> BTW, have you got any sort of test scenario you could share for this
> purpose? I'm sure I could build something, but if you've already
> done it ...

I simply ran the standard regression tests, and then straced a backend
as it executed the pgss-calling query. I'm not sure that I've
understood your question.

As previously noted, my approach to testing this patch involved variations of:

running "make installcheck-parallel"

Concurrently, running pgbench with multiple clients each calling
pg_stat_statements() and pg_stat_statements_reset(). The latter was
sometimes rigged to do a direct garbage collection to stress test
things. My expectation was that the sanity checking done everywhere
would complain if there were any race conditions or other bugs. This
was pretty effective as a smoke test during development.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-22 21:38:13
Message-ID: 19886.1390426693@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Wed, Jan 22, 2014 at 1:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, have you got any sort of test scenario you could share for this
>> purpose? I'm sure I could build something, but if you've already
>> done it ...

> I simply ran the standard regression tests, and then straced a backend
> as it executed the pgss-calling query. I'm not sure that I've
> understood your question.

Hm, are there 5K distinct queries in the regression tests? I'd have
thought they weren't enough to fill a large pgss hash.

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-22 21:45:16
Message-ID: CAM3SWZQm4h1OiGkprMa8V+4ZFTMiuYO0V3USFpO4Ywbujg1mKQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 22, 2014 at 1:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hm, are there 5K distinct queries in the regression tests? I'd have
> thought they weren't enough to fill a large pgss hash.

Well, yes. They're a really bad case for pg_stat_statements, in that
almost all distinct queries are executed only once or twice, and yet
there are enough of them to fill the hash table. I think this is
generally useful for testing.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-25 19:04:29
Message-ID: 2080.1390676669@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Wed, Jan 22, 2014 at 1:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, have you got any sort of test scenario you could share for this
>> purpose? I'm sure I could build something, but if you've already
>> done it ...

> I simply ran the standard regression tests, and then straced a backend
> as it executed the pgss-calling query. I'm not sure that I've
> understood your question.

> As previously noted, my approach to testing this patch involved variations of:
> running "make installcheck-parallel"

Do the regression tests fail for you when doing this?

What I see is a duplicate occurrence of an escape_string_warning bleat:

*** /home/postgres/pgsql/src/test/regress/expected/plpgsql.out Fri Jan 3 17:07:46 2014
--- /home/postgres/pgsql/src/test/regress/results/plpgsql.out Sat Jan 25 13:37:20 2014
***************
*** 4568,4573 ****
--- 4568,4579 ----
HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
QUERY: SELECT 'foo\\bar\041baz'
CONTEXT: PL/pgSQL function strtest() line 4 at RETURN
+ WARNING: nonstandard use of \\ in a string literal
+ LINE 1: SELECT 'foo\\bar\041baz'
+ ^
+ HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
+ QUERY: SELECT 'foo\\bar\041baz'
+ CONTEXT: PL/pgSQL function strtest() line 4 at RETURN
strtest
-------------
foo\bar!baz

======================================================================

which seems to happen because generate_normalized_query() reruns the
core lexer on the given statement, and if you got a warning the first
time, you'll get another one.

It seems this only happens with rather small values of
pg_stat_statements.max, else there's already a "RETURN text-constant"
query in the hashtable so we don't need to do reparsing right here.
(I'm testing with max = 100 to exercise the garbage collection code.)

I'm not sure this is a big deal for real-world use, because surely by now
everyone has either fixed their escape-string issues or disabled the
warning. But it's annoying for testing.

The big picture of course is that having this warning issued this way
is broken anyhow, and has been since day one, so it's not entirely
pg_stat_statements' fault. I'm just wondering if it's worth taking
the trouble to turn off the warning when reparsing.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-25 21:11:26
Message-ID: 8741.1390684286@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> I think that "read the whole file" is probably going to net out not any
> more complex as far as our code goes, and it'll be making a lot fewer
> assumptions about how smart the libc level is and what's happening at
> the kernel level. I'll have a go at coding it, anyway.

Attached is a revised patch that does the I/O that way, and also makes
substantial cleanups in the error handling. (The previous patch did
questionable things like resetting the shared hash table while callers
were actively iterating through it, and the error logging left a lot
to be desired as well.) I've chosen to handle failures to load query
text data by just returning NULL for that query text, which seems
reasonable given the new mindset that the query text is auxiliary data
less important than the actual counts.

I've not done much more than smoke-testing on this; I'm thinking about
hot-wiring the code to sometimes force it through the error paths,
just to make sure they act as expected. I've also not done any
real performance testing.

regards, tom lane

Attachment Content-Type Size
pg_stat_statements_ext_text-6.patch text/x-diff 68.4 KB

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-25 21:13:15
Message-ID: CAM3SWZQLR0zqq6P0WmCL7nR4-9TZ3gHez+5u28i5BX7EpNSHdQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 25, 2014 at 11:04 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Do the regression tests fail for you when doing this?
>
> What I see is a duplicate occurrence of an escape_string_warning bleat:
>
> *** /home/postgres/pgsql/src/test/regress/expected/plpgsql.out Fri Jan 3 17:07:46 2014
> --- /home/postgres/pgsql/src/test/regress/results/plpgsql.out Sat Jan 25 13:37:20 2014
> ***************
> *** 4568,4573 ****
> --- 4568,4579 ----
> HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
> QUERY: SELECT 'foo\\bar\041baz'
> CONTEXT: PL/pgSQL function strtest() line 4 at RETURN
> + WARNING: nonstandard use of \\ in a string literal
> + LINE 1: SELECT 'foo\\bar\041baz'
> + ^
> + HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
> + QUERY: SELECT 'foo\\bar\041baz'
> + CONTEXT: PL/pgSQL function strtest() line 4 at RETURN
> strtest
> -------------
> foo\bar!baz
>
> ======================================================================
>
> which seems to happen because generate_normalized_query() reruns the
> core lexer on the given statement, and if you got a warning the first
> time, you'll get another one.

Oh, yes, I noticed that and reached the same conclusion. Sorry, I
probably should have mentioned this pro-actively.

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-25 22:08:56
Message-ID: CAM3SWZTbB96TJwSWcs=aeDCGDGwrgTSqRnPA2OqTvyvba36ZNg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 25, 2014 at 1:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I've chosen to handle failures to load query
> text data by just returning NULL for that query text, which seems
> reasonable given the new mindset that the query text is auxiliary data
> less important than the actual counts.

I guess that's okay.

> I've not done much more than smoke-testing on this; I'm thinking about
> hot-wiring the code to sometimes force it through the error paths,
> just to make sure they act as expected. I've also not done any
> real performance testing.

I was never too worried about the impact on performance, because any
price that must be paid is paid only when new entries are created,
which is typically a very rare event that was already expensive due to
requiring an exclusive lock. This patch will make it much rarer in
practice by increasing the pg_stat_statements.max default, and thus
reducing the impact of such exclusive locks on *all* sessions, not
just those executing less popular queries. I don't know about you, but
the reason that I didn't performance test this is that it's really
hard to think of a representative scenario where it could possibly
matter, even though it seems like there might be one.

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-25 22:12:57
Message-ID: CAM3SWZSr7k-0LYJb8PysnyvDN2hkgbut5ABQOrRQSaLL68d9zg@mail.gmail.com
Lists: pgsql-hackers

Why do you think it's better to release the shared lock while
generating a normalized query text, only to acquire it once more? I'm
not suggesting that it's the wrong thing to do. I'm curious about the
reasoning around assessing the costs.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-25 22:20:24
Message-ID: 10426.1390688424@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> Why do you think it's better to release the shared lock while
> generating a normalized query text, only to acquire it once more? I'm
> not suggesting that it's the wrong thing to do. I'm curious about the
> reasoning around assessing the costs.

Well, it's fairly expensive to generate that text, in the case of a
large/complex statement. It's possible that continuing to hold the lock
is nonetheless the right thing to do because release+reacquire is also
expensive; but there is no proof of that AFAIK, and I believe that
release+reacquire is not likely to be expensive unless the lock is heavily
contended, which would be exactly the conditions under which keeping it
wouldn't be such a good idea anyway. So I'd prefer to leave it doing what
it did before, until there's some concrete evidence that keeping the lock
is a better idea.

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-26 00:03:09
Message-ID: CAM3SWZS0yB4SdWqJz2NZTUYUKNCJA0KNR6Hu+nRCdn=FWEaVNg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 25, 2014 at 2:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Well, it's fairly expensive to generate that text, in the case of a
> large/complex statement. It's possible that continuing to hold the lock
> is nonetheless the right thing to do because release+reacquire is also
> expensive; but there is no proof of that AFAIK, and I believe that
> release+reacquire is not likely to be expensive unless the lock is heavily
> contended, which would be exactly the conditions under which keeping it
> wouldn't be such a good idea anyway.

My reason for only acquiring the shared lock once, and performing text
normalization with it held was that in practice most query texts are
not all that expensive to lex/normalize, and the cost of holding the
lock for the vast majority of sessions (that won't experience
contention) is nil. Acquiring the shared lock twice for something like
that struck me as unjustified. I thought it was closer to the original
coding to lex with the lock, because of course the original coding
will only ever acquire the shared lock once. The original
lex-with-no-lock coding is only justified by "well, might as well".

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-26 18:38:48
Message-ID: 14522.1390761528@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Sat, Jan 25, 2014 at 2:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, it's fairly expensive to generate that text, in the case of a
>> large/complex statement. It's possible that continuing to hold the lock
>> is nonetheless the right thing to do because release+reacquire is also
>> expensive; but there is no proof of that AFAIK, and I believe that
>> release+reacquire is not likely to be expensive unless the lock is heavily
>> contended, which would be exactly the conditions under which keeping it
>> wouldn't be such a good idea anyway.

> My reason for only acquiring the shared lock once, and performing text
> normalization with it held was that in practice most query texts are
> not all that expensive to lex/normalize, and the cost of holding the
> lock for the vast majority of sessions (that won't experience
> contention) is nil.

Meh. This line of argument seems to reduce to "we don't need to worry
about performance of this code path because it won't be reached often".
I don't particularly buy that; it may be true in some usage patterns but
probably not all. And if you *do* buy it, it can be turned around to say
that we needn't worry about the costs of an extra locking cycle, anyway.

However, the only thing that is really going to settle this argument is
data, so here's a simple test case. I made a script that just consisted
of many repetitions of
\dd
SELECT pg_stat_statements_reset();
and ran it in psql while tracing the system with "perf". (\dd produces
a query that is middlingly complex, but certainly not a patch on those
I've seen produced by many real applications. The point of the reset
is to force us through the stats-entry-creation code path each time.)

The hits above 1%, in a non-assert-enabled backend:

13.92% 34576 postmaster [kernel.kallsyms] [k] 0xffffffff8103ed21
8.64% 21645 postmaster postgres [.] AllocSetAlloc
5.09% 12502 postmaster postgres [.] SearchCatCache
2.37% 6025 postmaster postgres [.] MemoryContextStrdup
2.22% 5624 postmaster libc-2.12.so [.] __strcmp_sse42
1.93% 4860 postmaster postgres [.] core_yylex
1.84% 4501 postmaster postgres [.] base_yyparse
1.83% 4601 postmaster postgres [.] ScanKeywordLookup
1.54% 3893 postmaster postgres [.] copyObject
1.54% 3849 postmaster postgres [.] MemoryContextAllocZeroAligned
1.30% 3237 postmaster postgres [.] expression_tree_walker
1.23% 3069 postmaster postgres [.] palloc
1.15% 2903 postmaster libc-2.12.so [.] memcpy
1.04% 2566 postmaster postgres [.] hash_uint32

So 3.76% of the entire runtime is going into core_yylex+ScanKeywordLookup,
and presumably half of that can be blamed on generate_normalized_query.
On the other hand, the earliest mention of lock-related functions in the
profile is

0.17% 411 postmaster postgres [.] LWLockAcquire

and LWLockRelease is down here:

0.11% 264 postmaster postgres [.] LWLockRelease

So on this example, the *total* cost of LWLocks, across the entire
backend, is something like 15% of the cost of generate_normalized_query.
This seems to me to be reasonably strong evidence that we should worry
about that cost, and that releasing/reacquiring the pgss LWLock is
trivial by comparison.

Of course, if the lock were contended then the locking cost would go
up substantially --- but that would also be a scenario in which it'd
be important not to hold the lock unnecessarily.

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-27 05:23:53
Message-ID: CAM3SWZScShFysawm8MetVyvTe2F7aa9v5TdEdoZTPS5_3KoR3g@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 26, 2014 at 10:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Meh. This line of argument seems to reduce to "we don't need to worry
> about performance of this code path because it won't be reached often".

I think I may have over-elaborated, giving you the false impression
that this was something I felt strongly about. I'm glad that the
overhead has been shown to be quite low, and I think that lexing
without the lock held will be fine.

--
Peter Geoghegan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-27 20:39:39
Message-ID: 21713.1390855179@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Sun, Jan 26, 2014 at 10:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Meh. This line of argument seems to reduce to "we don't need to worry
>> about performance of this code path because it won't be reached often".

> I think I may have over-elaborated, giving you the false impression
> that this was something I felt strongly about. I'm glad that the
> overhead has been shown to be quite low, and I think that lexing
> without the lock held will be fine.

OK. Committed after a couple of small further revisions.

regards, tom lane


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-27 20:44:02
Message-ID: CAM3SWZSSwh5fW9jx-TqxqKHx5cz3AhfCHVGi0H2T_aPNJfVvrQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 27, 2014 at 12:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> OK. Committed after a couple of small further revisions.

Great. Thank you.

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-28 08:45:00
Message-ID: CAM3SWZROM8qmnfSi_b+2BfraNBjeB=mj=wPHodS8PerzHENN3A@mail.gmail.com
Lists: pgsql-hackers

I noticed a minor omission in the patch as committed. Attached patch
corrects this.

--
Peter Geoghegan

Attachment Content-Type Size
pg_stat_statements_tweak.patch text/x-patch 553 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date: 2014-01-28 17:34:53
Message-ID: 25549.1390930493@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> I noticed a minor omission in the patch as committed. Attached patch
> corrects this.

Duh. Thanks.

regards, tom lane