Timing events WIP v1

Lists: pgsql-hackers
From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Timing events WIP v1
Date: 2012-11-15 09:56:30
Message-ID: 50A4BC4E.4030007@2ndQuadrant.com

Attached is a first WIP saving Timing Events via a new hook, grabbed by
a new pg_timing_events contrib module. This only implements a small
portion of the RFP spec I presented earlier this month, and is by no
means finished code looking for a regular review. This is just enough
framework to do something useful while looking for parts that I suspect
have tricky details.

Right now this saves just a single line of text and inserts the hook
into the checkpoint code, so output looks like this:

$ psql -c "select * from pg_timing_events" -x
-[ RECORD 1 ]-
seq | 0
event | check/restartpoint complete: wrote 0 buffers (0.0%); 0
transaction log file(s) added, 0 removed, 0 recycled; write=0.000 s,
sync=0.000 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s

Extending this to save the key/value set and most of the other data I
mentioned before is pretty straightforward. There is a fair amount of
debris in here from the pg_stat_statements code this started as. I've
left a lot of that in, commented out, because it gives examples of
grabbing things like the user and database that I'll need soon.

There's a modest list of things that I've done or am facing soon that
involve less obvious choices though, and I would appreciate feedback on
these things in particular:

-This will eventually aggregate data from clients running queries, the
checkpointer process, and who knows what else in the future. The most
challenging part of this so far is deciding how to handle the memory
management side. I've tried just dumping everything into
TopMemoryContext, and the fact that this isn't working as hoped is the
biggest known bug right now. I haven't decided whether to keep doing
that and resolve the bug; whether there is an alternate context that
makes more sense for a contrib module like this; or whether I should fix
the size of the data and allocate everything at load time. The size of
the data from log_lock_waits in particular will be hard to estimate in
advance.

-The event queue goes into a simple array accessed in circular fashion.
After realizing that output needed to navigate in the opposite order of
element addition, I ended up just putting all the queue navigation code
directly into the add/output routines. I'd be happy to use a more
formal Postgres type here instead--I looked at SHM_QUEUE for
example--but I haven't found something yet that makes this any simpler.
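For readers following along, a minimal standalone sketch of that
circular-array scheme, with readout walking opposite to insertion order
(names like EventQueue and queue_newest are illustrative inventions, not
the patch's actual API):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size circular queue of event strings; the real
 * patch stores richer data, this only shows the navigation idea. */
#define QUEUE_SIZE 4

typedef struct EventQueue
{
    char    events[QUEUE_SIZE][64];
    int     next;               /* slot the next event will overwrite */
    int     count;              /* valid entries, capped at QUEUE_SIZE */
} EventQueue;

static void
queue_add(EventQueue *q, const char *text)
{
    strncpy(q->events[q->next], text, 63);
    q->events[q->next][63] = '\0';
    q->next = (q->next + 1) % QUEUE_SIZE;
    if (q->count < QUEUE_SIZE)
        q->count++;
}

/* Return the i-th newest event (0 = most recent), navigating the array
 * in the opposite order of element addition, as the output side must. */
static const char *
queue_newest(EventQueue *q, int i)
{
    int     slot = (q->next - 1 - i + 2 * QUEUE_SIZE) % QUEUE_SIZE;

    return q->events[slot];
}
```

Once the queue wraps, the oldest entries are silently overwritten, which
matches the fixed-allocation behavior discussed above.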

-I modeled the hook here on the logging one that went into 9.2. It's
defined in its own include file and it gets initialized by the logging
system. No strong justification for putting it there; it was just
similar to the existing hook, and that seemed reasonable. This is in
some respects an alternate form of logging after all. The data sent by
the hook itself needs to be more verbose in order to handle the full
spec; right now I'm just asking about placement.
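The hook pattern being modeled on (a core-declared function pointer that
a loadable module installs itself into, chaining to any previous
occupant) can be sketched standalone like this; every name here is a toy
stand-in, not the patch's real API:

```c
#include <assert.h>
#include <stddef.h>

/* Toy re-creation of the PostgreSQL hook convention (cf. the 9.2
 * logging hook): core declares a pointer, NULL until a module loads. */
typedef void (*timing_event_hook_type) (const char *event_text);

timing_event_hook_type timing_event_hook = NULL;

/* Call site in "core": fire the hook only when someone is listening. */
static void
report_timing_event(const char *event_text)
{
    if (timing_event_hook)
        timing_event_hook(event_text);
}

/* A "contrib module" saves the previous hook and chains to it, so
 * multiple consumers can coexist. */
static timing_event_hook_type prev_hook = NULL;
static int events_captured = 0;

static void
my_module_hook(const char *event_text)
{
    events_captured++;          /* a real module would store the event */
    if (prev_hook)
        prev_hook(event_text);
}

static void
my_module_load(void)
{
    prev_hook = timing_event_hook;
    timing_event_hook = my_module_hook;
}
```

Events reported before the module loads are simply lost, which is why
hook placement relative to things like log_checkpoints matters.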

-I'm growing increasingly worried about allowing concurrent reads of
this data (which might be large and take a while to return to the
client) without blocking insertions. The way that was split to improve
pg_stat_statements concurrency doesn't convert naturally to this data.
The serial number field in here is part of one idea I had: I might grab
the header with an exclusive lock for only a moment and look up the
serial number of the last record I should display. Then I could drop to
a share lock while looping over the elements, stopping if I find one
with a serial number over that.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment Content-Type Size
timing-events-v1.diff text/plain 24.9 KB

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timing events WIP v1
Date: 2012-11-15 18:26:20
Message-ID: 50A533CC.6010805@agliodbs.com
Lists: pgsql-hackers


> Extending this to save the key/value set and most of the other data I
> mentioned before is pretty straightforward.

Why not use Hstore? Seriously?

It would require merging Hstore into core, but I don't necessarily see
that as a bad thing.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timing events WIP v1
Date: 2012-11-15 20:15:24
Message-ID: 20121115201524.GA14789@alvh.no-ip.org
Lists: pgsql-hackers

Greg Smith wrote:

> -The event queue goes into a simple array accessed in circular fashion.
> After realizing that output needed to navigate in the opposite order
> of element addition, I ended up just putting all the queue
> navigation code directly into the add/output routines. I'd be happy
> to use a more formal Postgres type here instead--I looked at
> SHM_QUEUE for example--but I haven't found something yet that makes
> this any simpler.

SHM_QUEUE is on death row. Try a dlist from src/backend/lib/ilist.c

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timing events WIP v1
Date: 2012-11-16 06:15:53
Message-ID: 50A5DA19.2070507@2ndQuadrant.com
Lists: pgsql-hackers

On 11/15/12 10:26 AM, Josh Berkus wrote:
>
>> Extending this to save the key/value set and most of the other data I
>> mentioned before is pretty straightforward.
>
> Why not use Hstore? Seriously?

I just haven't done that part yet. The fact that hstore is the likely
way to do it is why I said it's straightforward, and haven't gotten
stressed about not having that finished. I wanted to try and shake out
some feedback on hook location and how best to deal with memory context,
both of which might impact this. Extending the current hook setup I've
got here to pass data via hstore would mean pulling hstore into core.
If there's a better approach for adding a hook that avoids that I'd be
happy to adopt it, I just haven't thought of one.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Craig Ringer <craig(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timing events WIP v1
Date: 2012-11-16 08:20:33
Message-ID: 50A5F751.8060009@2ndQuadrant.com
Lists: pgsql-hackers

On 11/16/2012 02:26 AM, Josh Berkus wrote:
>> Extending this to save the key/value set and most of the other data I
>> mentioned before is pretty straightforward.
> Why not use Hstore? Seriously?
The only issue I see is that hstore's text format is non-standard, not
widely understood and a pain to work with in applications. You can
always write queries against it in Pg and extract normal row sets of
key/value pairs, so it's not a big problem - but I'd want to think about
adding a built-in `hstore_to_json` if we were going to use hstore to
manage the data, since json is so much more widely understood by client
applications.

Not that implementing `hstore_to_json` is exactly hard, mind you, as a
quick search shows: https://gist.github.com/2318757

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Craig Ringer <craig(at)2ndQuadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timing events WIP v1
Date: 2012-11-20 08:36:26
Message-ID: 50AB410A.2030405@2ndQuadrant.com
Lists: pgsql-hackers

On 11/16/12 12:20 AM, Craig Ringer wrote:

> Not that implementing `hstore_to_json` is exactly hard, mind you, as a
> quick search shows: https://gist.github.com/2318757

Both pulling hstore more firmly into core and adopting something like
an hstore_to_json call as the preferred UI for timing event data are
directions I'd be happy to go in. That's why I stopped before even
trying to implement that part. I think the general direction is clear,
but the details of how to present the resulting data are more of a tarp
than even a bikeshed at this point. It's not a hard problem though.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timing events WIP v1
Date: 2012-11-20 08:48:21
Message-ID: CAFj8pRBJj=bwBzxSc+TFMeBd1+7SAwCnicek2=kdrUsChxCPnQ@mail.gmail.com
Lists: pgsql-hackers

2012/11/20 Greg Smith <greg(at)2ndquadrant(dot)com>:
> On 11/16/12 12:20 AM, Craig Ringer wrote:
>
>> Not that implementing `hstore_to_json` is exactly hard, mind you, as a
>> quick search shows: https://gist.github.com/2318757
>
>
> Both pulling hstore more firmly into core and adopting something like a
> hstore_to_json call as the preferred UI for timing event data are all
> directions I'd be happy to go. That's why I stopped before trying to even
> implement that part. I think the general direction to go is clear, but the
> details on how to present the resulting data is more of a tarp than even a
> bikeshed at this point. It's not a hard problem though.
>

I don't like to see the current hstore in core - it doesn't support
nesting, it doesn't support different datatypes, and it is not well
supported by PL/pgSQL.

regards

Pavel

> --
> Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Craig Ringer <craig(at)2ndQuadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timing events WIP v1
Date: 2012-11-21 01:02:26
Message-ID: 50AC2822.1020401@2ndQuadrant.com
Lists: pgsql-hackers

On 11/20/2012 04:48 PM, Pavel Stehule wrote:
> 2012/11/20 Greg Smith <greg(at)2ndquadrant(dot)com>:
>> On 11/16/12 12:20 AM, Craig Ringer wrote:
>>
>>> Not that implementing `hstore_to_json` is exactly hard, mind you, as a
>>> quick search shows: https://gist.github.com/2318757
>>
>> Both pulling hstore more firmly into core and adopting something like a
>> hstore_to_json call as the preferred UI for timing event data are all
>> directions I'd be happy to go. That's why I stopped before trying to even
>> implement that part. I think the general direction to go is clear, but the
>> details on how to present the resulting data is more of a tarp than even a
>> bikeshed at this point. It's not a hard problem though.
>>
> I don't like to see current hstore in core - It doesn't support
> nesting, it doesn't support different datatypes, it is not well
> supported by plpgsql
>

... or by many client libraries, though converting hstore values to json
notation for output usually takes care of that.

I can't help but suspect that hstore will be subsumed by native storage
and indexing of json objects, since these are already widely supported
in KV or document stores. Of course, that requires that we come to some
agreement on how to represent literal scalars for interaction with json,
which hasn't seen much progress.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Craig Ringer <craig(at)2ndQuadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timing events WIP v1
Date: 2012-12-12 01:43:30
Message-ID: 50C7E142.7030605@2ndQuadrant.com
Lists: pgsql-hackers

On 11/20/12 8:02 PM, Craig Ringer wrote:
> On 11/20/2012 04:48 PM, Pavel Stehule wrote:

>> I don't like to see current hstore in core - It doesn't support
>> nesting, it doesn't support different datatypes, it is not well
>> supported by plpgsql
>>
>
> ... or by many client libraries, though converting hstore values to json
> notation for output usually takes care of that.

The more I look at this, the less comfortable I am moving forward with
hand-waving this issue away. The obvious but seemingly all-troublesome
options:

-Store things internally using hstore. This seems to require pulling
hstore fully into core, which has this list of issues. Outputting
through a JSON filter seems like a lot of overhead. And even if the
output concerns are addressed, if there's a problem processing the raw
data with PL/pgSQL that would be bad. I didn't think about that at all
until Pavel brought it up.

-Go back to a simpler composite type idea for storing this data. It
feels wrong to create something that looks a lot like hstore data, but
isn't. There's probably subtle display/client/processing issues in
there I haven't thought of too.

-Try to store the data in a JSON friendly way in the first place.

That last one seems to lead right to...

> I can't help but suspect that hstore will be subsumed by native storage
> and indexing of json objects, since these are already widely supported
> in KV or document stores. Of course, that requires that we come to some
> agreement on how to represent literal scalars for interaction with json,
> which hasn't seen much progress.

If this is the Right Way to solve this problem, that puts a serious
snag in doing something useful with Timing Events in the near future. I
know better than to try and fight against following the correct path
once it's identified. I can't claim any good expertise on client
encoding/transfer issues though; that's not a strong area for me.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timing events WIP v1
Date: 2012-12-12 08:07:16
Message-ID: CAFj8pRAtA4K92pDSEKHcpN=aRUkEPjz9C5Kbkp9VDauSuLyvDg@mail.gmail.com
Lists: pgsql-hackers

2012/12/12 Greg Smith <greg(at)2ndquadrant(dot)com>:
> On 11/20/12 8:02 PM, Craig Ringer wrote:
>>
>> On 11/20/2012 04:48 PM, Pavel Stehule wrote:
>
>
>>> I don't like to see current hstore in core - It doesn't support
>>> nesting, it doesn't support different datatypes, it is not well
>>> supported by plpgsql
>>>
>>
>> ... or by many client libraries, though converting hstore values to json
>> notation for output usually takes care of that.
>
>
> The more I look at this, the less comfortable I am moving forward with hand
> waving this issue away. The obvious but seemingly all troublesome options:
>
> -Store things internally using hstore. This seems to require pulling hstore
> fully into core, which has this list of issues. Outputting through a JSON
> filter seems like a lot of overhead. And even if the output concerns are
> addressed, if there's a problem processing the raw data with PL/pgSQL that
> would be bad. I didn't think about that at all until Pavel brought it up.
>
> -Go back to a simpler composite type idea for storing this data. It feels
> wrong to create something that looks a lot like hstore data, but isn't.
> There's probably subtle display/client/processing issues in there I haven't
> thought of too.
>
> -Try to store the data in a JSON friendly way in the first place.
>
> That last one seems to lead right to...
>
>> I can't help but suspect that hstore will be subsumed by native storage
>> and indexing of json objects, since these are already widely supported
>> in KV or document stores. Of course, that requires that we come to some
>> agreement on how to represent literal scalars for interaction with json,
>> which hasn't seen much progress.
>
>
> If this is the Right Way to solve this problem, that puts a serious snag
> in doing something useful with Timing Events in the near future. I know
> better than to try and fight against following the correct path once it's
> identified. I can't claim any good expertise on client encoding/transfer
> issues though; that's not a strong area for me.
>

I haven't followed this whole discussion, but I don't understand why
you need complex datatypes.

Maybe you can normalize the data, and keep the complex data types only
in memory.

Regards

Pavel

> --
> Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timing events WIP v1
Date: 2013-01-14 16:19:42
Message-ID: CAEYLb_X0=s+jFgYGTs0n3t0ga4=0qTTddub2USR+bLBgJn0Ufw@mail.gmail.com
Lists: pgsql-hackers

I decided to take a look at this.

On 15 November 2012 09:56, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> -I modeled the hook here on the logging one that went into 9.2. It's
> defined in its own include file and it gets initialized by the logging
> system. No strong justification for putting it there, it was just similar
> to the existing hook and that seemed reasonable. This is in some respects
> an alternate form of logging after all. The data sent by the hook itself
> needs to be more verbose in order to handle the full spec, right now I'm
> just asking about placement.

I noticed that when !log_checkpoints, control never reaches the site
where the hook is called, and thus the checkpoint info is not stored.
Is that the intended behaviour of the patch? I don't feel strongly
either way, but it did give me pause for a second.

I don't like the design of this patch. It seems like we should be
moving as far away from an (internal) textual representation as
possible - that's basically the main reason why logging is bad.
Textual representations make easy, correct querying hard, are bloated,
and are really no better than manual logging.

The first question I'd ask is "what is true of all timing events?".
You address this in your original RFC [1]. However, this isn't
reflected in the datastructures that the patch uses, and I suspect
that it should be.

What I'm envisioning is new NodeTag infrastructure. You'd have an
"abstract base class". This is sort of like the one used by plan nodes
(i.e. the Plan struct). This would contain the information that is
essential to all timing events (in the case of Plan, this includes
startup and total costs for the node in question). You'd refactor
structs like CheckpointStatsData, to compose them such that their
first field was this parent struct (i.e. logically, they'd inherit
from, say, TimingEvent, which itself would inherit from Node, in the
style of Plan's children). That'd entail normalising things like
inconsistent use of datatypes to represent time intervals and whatnot
among these sorts of structs in the extant code, which is painful, but
long term it seems like the way forward.

With this commonality and variability analysis performed, and having
refactored the code, you'd be able to use type punning to pass the
essential information to your new hook. You'd have a way to usefully
restrict statistics to only one class of timing event, because you
could expose the NodeTag in your view (or perhaps a prettified version
thereof), by putting something in a query predicate. You can perform
aggregation, too. Maybe this abstract base class would look something
like:

typedef struct TimingEvent
{
    NodeTag     type;

    /*
     * Common structural data for all TimingEvent types.
     */
    double      duration;       /* Duration of event in milliseconds
                                 * (per project policy for views) */
    double      naffected;      /* Number of objects (of variable type)
                                 * affected */
    char       *string;         /* Non-standardised event string */
} TimingEvent;

However, my failure to come up with more fields here leads me to
believe that we'd be better off with a slightly different approach.
Let's forget about storing text altogether, and make the unstructured
"union" field JSON (while making fields common to all timing events
plain-old scalar datums, for ease of use).

It would become the job of PGTE_memsize() to make each entry in the
shared hashtable wide enough to accommodate the widest of all possible
TimingEvent-inheriting structs. We'd just store heterogeneous
TimingEvent-inheriting structs in the hash table. It would be
pg_timing_events' job to call a JSON conversion routine directly, with
baked-in knowledge, within a NodeTag case statement.
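The combination being proposed (an embedded base struct, tag-based type
punning, and direct JSON rendering in a case statement, much as
explain.c renders plans) can be sketched as a standalone toy; every
name below is invented for illustration, not taken from the patch or
from PostgreSQL headers:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy tag enum standing in for NodeTag. */
typedef enum { TE_Checkpoint, TE_LockWait } TimingEventTag;

/* "Abstract base class": fields common to all timing events. */
typedef struct TimingEvent
{
    TimingEventTag tag;
    double      duration;       /* milliseconds */
} TimingEvent;

/* A concrete event embeds the base struct as its FIRST field, which is
 * what makes the pointer cast (type punning) below legal. */
typedef struct CheckpointEvent
{
    TimingEvent te;
    int         buffers_written;
} CheckpointEvent;

/* Render one event as JSON text with baked-in per-tag knowledge,
 * without involving any JSON datatype machinery. */
static void
event_to_json(const TimingEvent *ev, char *buf, size_t len)
{
    switch (ev->tag)
    {
        case TE_Checkpoint:
            {
                const CheckpointEvent *cp = (const CheckpointEvent *) ev;

                snprintf(buf, len,
                         "{\"event\": \"checkpoint\", \"duration\": %.3f, "
                         "\"buffers_written\": %d}",
                         cp->te.duration, cp->buffers_written);
                break;
            }
        default:
            snprintf(buf, len, "{\"event\": \"unknown\"}");
    }
}
```

A heterogeneous array of such structs can be scanned with one loop over
TimingEvent pointers, with the tag deciding how each entry is decoded,
which is the property the shared hash table storage would rely on.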

It actually wouldn't be hard to have the view spit out some JSON from
the internal, compact struct representation. Currently, explain.c
manages to generate JSON representations of plans with very little
fuss, and without using any of the JSON datatype stuff. Doing this
with hstore is just way too controversial, and not particularly worth
it, IMHO. With a "where event_type = x" in the query predicate, the
JSON datums would have predictable, consistent structure, facilitating
machine reading and aggregation.

[1] Original RFC e-mail:
http://www.postgresql.org/message-id/509300F7.5000803@2ndQuadrant.com

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timing events WIP v1
Date: 2013-01-14 20:37:40
Message-ID: 50F46C94.5030906@2ndQuadrant.com
Lists: pgsql-hackers

On 1/14/13 11:19 AM, Peter Geoghegan wrote:
> I noticed that when !log_checkpoints, control never reaches the site
> where the hook is called, and thus the checkpoint info is not stored.
> Is that the intended behaviour of the patch?

I was aware and considered it a defensible situation--that turning off
checkpoint logging takes that data out everywhere. But it was not
necessarily the right thing to do.

> Currently, explain.c
> manages to generate JSON representations of plans with very little
> fuss, and without using any of the JSON datatype stuff. Doing this
> with hstore is just way too controversial, and not particularly worth
> it, IMHO.

I wasn't optimistic after seeing the number of bugs that scurried out
when the hstore rock was turned over for this job. On the
implementation side, the next round of code I've been playing with here
has struggled with the problem of rendering to strings earlier than I'd
like. I'd like to delay that as long as possible; certainly not do it
during storage, and preferably have it happen only when someone asks for
the timing event.

> With a "where event_type = x" in the query predicate, the
> JSON datums would have predictable, consistent structure, facilitating
> machine reading and aggregation.

Filtering on a range of timestamps or on the serial number field is the
main thing that I imagined, as something that should limit results
before even producing tuples. For the expected and important case where
someone wants "all timing events after #123" after persisting #123 to
disk, I'd like that to be efficient. All of the fields I'd want to
filter on are part of the fixed set of columns every entry will have.

To summarize, your suggestion is to build an in-memory structure capable
of holding the timing event data. The Datum approach will be used to
cope with different types of events having different underlying types.
The output format for queries against this data set will be JSON,
rendered directly from the structure similarly to how EXPLAIN (FORMAT
JSON) outputs query trees. The columns every row contains, like a
serial number, timestamp, and pid, can be filtered on by something
operating at the query executor level. Doing something useful with the
more generic, "dynamic schema" parts will likely require parsing the
JSON output.

Those are all great ideas I think people could live with.

It looks to me like the hook definition itself would need the entire
data structure defined, and known to work, before its API could be
nailed down. I was hoping that we might get a hook for diverting this
data committed into 9.3 even if the full extension to expose it wasn't
nailed down. That was based on similarity to the generic logging hook
that went into 9.2. This new implementation idea reminds me more of the
query length decoration needed for normalized pg_stat_statements
though--something that wasn't easy to just extract out from the consumer
at all.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com