Re: Logging checkpoints and other slowdown causes

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Logging checkpoints and other slowdown causes
Date: 2007-04-01 07:27:46
Message-ID: Pine.GSO.4.64.0704010051050.14920@westnet.com
Lists: pgsql-hackers pgsql-patches

This patch puts entries in the log file around some backend events that
regularly cause client pauses, most of which are visible only when running
with log_min_messages = debug2. The main focus is regularly occurring
checkpoints, triggered either by running out of segments or by hitting a
timeout.
This is an example of what gets written to the logs at checkpoint time:

DEBUG2: checkpoint required (timeout passed)
DEBUG2: checkpoint starting
DEBUG2: checkpoint flushing buffer pool
DEBUG2: checkpoint database fsync starting
DEBUG2: checkpoint database fsync complete
DEBUG1: checkpoint buffers dirty=16.7 MB (13.0%) write=174.8 ms
sync=3716.5 ms
DEBUG2: checkpoint complete; 0 transaction log file(s) added, 0 removed,
0 recycled

The idea here is that if you want to really watch what your system is
doing, either to track down a problem or for benchmarking, you log at
DEBUG2 and timestamp the logs with something like log_line_prefix =
'[%m]:'. Then line up your timestamped transactions with the logs to
figure out what happened during bad periods. This is very easy to do with
the timestamped pgbench latency logs for example, so you can put them in
time sequence order and see something like "oh, those long transactions
all came in between the fsync start and end". The summary line with the
write/sync timing information has a broader use and suggests when you need
to go deeper, which is why I put that one at DEBUG1.
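As a sketch of that workflow: the snippet below (Python, purely illustrative; the pgbench -l field layout and the '[%m]:' prefix shown are assumptions based on this thread, and the layout varies between versions) merges a pgbench latency log with a timestamped postgres log into one time-ordered stream.

```python
# Sketch (not from the patch): merge a timestamped pgbench latency log with
# a postgres log written under log_line_prefix = '[%m]:' into one stream
# sorted by time. The pgbench -l field layout assumed here
# ("client xact latency_us file epoch_sec epoch_usec") varies by version.
import re
from datetime import datetime

PG_PREFIX = re.compile(r'^\[(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)')

def read_pgbench(path):
    events = []
    for line in open(path):
        f = line.split()
        latency_ms = int(f[2]) / 1000.0
        when = float(f[4]) + int(f[5]) / 1e6        # completion time (epoch)
        events.append((when, 'xact latency=%.1f ms' % latency_ms))
    return events

def read_pglog(path):
    events = []
    for line in open(path):
        m = PG_PREFIX.match(line)
        if m:
            ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S.%f')
            events.append((ts.timestamp(), line.rstrip()))
    return events

def merged(pgbench_path, pglog_path):
    # one time-ordered stream: long transactions land next to whatever
    # checkpoint phase the server was in at that moment
    return sorted(read_pgbench(pgbench_path) + read_pglog(pglog_path))
```

With that in hand, a long transaction sandwiched between the "fsync starting" and "fsync complete" lines points straight at the sync phase.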

I also adjust some other log messages to make it easier to run the system
at DEBUG2 without going completely crazy, like moving individual WAL
segment recycles to DEBUG3 and standardizing the format of the fork/exit
messages so they're easier to filter out.

My main concern about this code is how it adds several gettimeofday calls
and complicated string building inside FlushBufferPool, all of which is
wasted if this data isn't actually going to get logged. I'd like to have
something that tells me whether I should even bother, but I didn't want to
poke into the elog variables for fear of adding a dependency on its
current structure to here. Ideally, it would be nice to call something
like WillLog(DEBUG2) and get a boolean back saying whether something at
that log level will be output anywhere; I don't know enough about the
logging code to add such a thing to it myself.
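For comparison only: the guard being asked for here is the same pattern as Logger.isEnabledFor() in Python's standard logging module. A sketch of the shape (the function and names are invented for illustration, not PostgreSQL code):

```python
# Illustration of the WillLog()-style guard using Python's stdlib logging
# (not PostgreSQL code; names here are invented for the example). The
# expensive timing and string building run only if DEBUG output will
# actually be emitted somewhere.
import logging
import time

log = logging.getLogger("checkpoint")

def flush_buffer_pool_sketch(write_buffers):
    if log.isEnabledFor(logging.DEBUG):   # the WillLog(DEBUG2) idea
        start = time.time()
        written = write_buffers()
        elapsed_ms = (time.time() - start) * 1000.0
        log.debug("checkpoint write=%.1f ms buffers=%d", elapsed_ms, written)
        return written
    return write_buffers()                # fast path: no timing, no strings
```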

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

Attachment Content-Type Size
checkpointlog.txt text/plain 8.1 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-04-03 01:23:17
Message-ID: 200704030123.l331NHj11400@momjian.us


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Greg Smith wrote:
> This patch puts entries in the log file around some backend events that
> regularly cause client pauses, most of which are visible only when running
> with log_min_messages = debug2. The main focus is regularly occurring
> checkpoints, from either running out of segments or hitting a timeout.
> This is an example of what gets written to the logs at checkpoint time:
>
> DEBUG2: checkpoint required (timeout passed)
> DEBUG2: checkpoint starting
> DEBUG2: checkpoint flushing buffer pool
> DEBUG2: checkpoint database fsync starting
> DEBUG2: checkpoint database fsync complete
> DEBUG1: checkpoint buffers dirty=16.7 MB (13.0%) write=174.8 ms
> sync=3716.5 ms
> DEBUG2: checkpoint complete; 0 transaction log file(s) added, 0 removed,
> 0 recycled
>
> The idea here is that if you want to really watch what your system is
> doing, either to track down a problem or for benchmarking, you log at
> DEBUG2 and timestamp the logs with something like log_line_prefix =
> '[%m]:'. Then line up your timestamped transactions with the logs to
> figure out what happened during bad periods. This is very easy to do with
> the timestamped pgbench latency logs for example, so you can put them in
> time sequence order and see something like "oh, those long transactions
> all came in between the fsync start and end". The summary line with the
> write/sync timing information has a broader use and suggests when you need
> to go deeper, which is why I put that one at DEBUG1.
>
> I also adjust some other log messages to make it easier to run the system
> at DEBUG2 without going completely crazy, like moving individual WAL
> segment recycles to DEBUG3 and standardizing the format of the fork/exit
> messages so they're easier to filter out.
>
> My main concern about this code is how it adds several gettimeofday calls
> and complicated string building inside FlushBufferPool, all of which is
> wasted if this data isn't actually going to get logged. I'd like to have
> something that tells me whether I should even bother, but I didn't want to
> poke into the elog variables for fear of adding a dependency on its
> current structure to here. Ideally, it would be nice to call something
> like WillLog(DEBUG2) and get a boolean back saying whether something at
> that log level will be output anywhere; I don't know enough about the
> logging code to add such a thing to it myself.
>
> --
> * Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-patches(at)postgresql(dot)org
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-04-03 06:14:33
Message-ID: 200704030814.33999.peter_e@gmx.net

Greg Smith wrote:
> This patch puts entries in the log file around some backend events
> that regularly cause client pauses, most of which are visible only
> when running with log_min_messages = debug2.

Something that is aimed at a user should not be enabled at a "debug"
level. Debug levels are for debugging, not for very high verbosity.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-04-03 13:53:37
Message-ID: Pine.GSO.4.64.0704030927180.5732@westnet.com

On Tue, 3 Apr 2007, Peter Eisentraut wrote:

> Something that is aimed at a user should not be enabled at a "debug"
> level. Debug levels are for debugging, not for very high verbosity.

I asked for feedback about which levels to log at when I initially sent in
the first version of this and didn't hear anything back on that part, so I
pushed these in line with other log messages I saw. The messages for when
checkpoints start and stop were both logged at DEBUG2, so I put progress
reports on the other significant phases of the process there as well.

I don't expect these messages will be helpful for a normal user--that's
what the new data in pg_stat_bgwriter is for. The main purpose of this
patch is debugging checkpoint related performance issues at a level I'd
expect only a developer to work at; they're also helpful for someone
writing benchmark code.

There are several patches floating around that aim to adjust
either the background writer or the checkpoint process to reduce the
impact of checkpoints. This logging allows grading their success at that.
As my tests with this patch in place suggest this problem is far from
solved with any of the current suggestions, I'd like to get other
developers looking at that problem the same way I have been; that's why
I'd like to see some standardization on how checkpoints are instrumented.
The fact that really advanced users might also use this for
troubleshooting I consider a bonus rather than the main focus here.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-05-11 10:01:38
Message-ID: 46443F02.6060300@enterprisedb.com

Greg Smith wrote:
> On Tue, 3 Apr 2007, Peter Eisentraut wrote:
>
>> Something that is aimed at a user should not be enabled at a "debug"
>> level. Debug levels are for debugging, not for very high verbosity.
>
> I asked for feedback about where to log at when I initially sent the
> first version of this in and didn't hear anything back on that part, so
> I pushed these in line with other log messages I saw. The messages for
> when checkpoints start and stop were both logged at DEBUG2, so I put
> progress reports on the other significant phases of the process there as
> well.

I agree that debug levels are not suitable for this.

I'm thinking of INFO, NOTICE or LOG. The user manual says about LOG:

LOG

Reports information of interest to administrators, e.g., checkpoint
activity.

But looking at the code, all the checkpoint related messages are at
DEBUG-levels, nothing gets printed at LOG-level. Printing the messages
at LOG-level would bring the code in line with the documentation, but I
don't think we want to fill the log with checkpoint chatter unless the
DBA explicitly asks for that.

How about INFO? It seems like the best level for information on normal
activity of the system. The documentation however has this to say about it:

INFO

Provides information implicitly requested by the user, e.g., during
VACUUM VERBOSE.

We should adjust the documentation, but INFO seems like the best level
to me. Or we could add a GUC variable similar to log_connections or
log_statement to control if the messages are printed or not, and use LOG.

> I don't expect these messages will be helpful for a normal user--that's
> what the new data in pg_stat_bgwriter is for. The main purpose of
> this patch is debugging checkpoint related performance issues at a level
> I'd expect only a developer to work at; they're also helpful for someone
> writing benchmark code.

I disagree. They would be helpful to an administrator chasing down
checkpoint related problems. E.g. checkpoints taking too long, occurring
too often (though we already have log_checkpoint_warning for that), or
to identify if transient performance problems that users complain about
coincide with checkpoints. And at least I like to have messages like
that in the log just to comfort me that everything is going ok.

> There are several patches in process floating around that aim to adjust
> either the background writer or the checkpoint process to reduce the
> impact of checkpoints. This logging allows grading their success at
> that. As my tests with this patch in place suggest this problem is far
> from solved with any of the current suggestions, I'd like to get other
> developers looking at that problem the same way I have been; that's why
> I'd like to see some standardization on how checkpoints are
> instrumented. The fact that really advanced users might also use this
> for troubleshooting I consider a bonus rather than the main focus here.

Agreed.

Looking at the patch, I think we want one line in the log when
checkpoint starts, one when buffer flushing starts (or maybe not, if we
assume that checkpointing clog, subtrans and multixact don't take long),
one when sync-phase starts and when the checkpoint is done. We don't
need to print the times elapsed in each phase on a separate line, that's
just derived information from the other lines, unless we use different
log-levels for detail lines (like you did in your patch).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-05-11 11:29:09
Message-ID: 1d4e0c10705110429n3e7f1f9as8eab7fcbeb610758@mail.gmail.com

On 5/11/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
> I agree that debug levels are not suitable for this.

Squid (the (reverse) proxy) has a nice feature for debugging purposes
(admin and developers). It's possible to have different logging levels
for different sections of the code. For example, if you want to debug
a particular part, you can have ALL,1 33,2 in your config file which
allows you to have a more verbose debug on the section 33 of the code
(33 is ACL).
It's really useful because you can even debug a particular code path
or a misconfiguration in production without generating a huge amount
of logs.

--
Guillaume


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-05-11 11:41:36
Message-ID: 46445670.9020209@enterprisedb.com

Guillaume Smet wrote:
> On 5/11/07, Heikki Linnakangas <heikki(at)enterprisedb(dot)com> wrote:
>> I agree that debug levels are not suitable for this.
>
> Squid (the (reverse) proxy) has a nice feature for debugging purposes
> (admin and developers). It's possible to have different logging level
> for different sections of the code. For example, if you want to debug
> a particular part, you can have ALL,1 33,2 in your config file which
> allows you to have a more verbose debug on the section 33 of the code
> (33 is ACL).
> It's really useful because you can even debug a particular code path
> or a misconfiguration in production without generating a huge amount
> of logs.

Yes, that'd be really nice, though using numeric codes is a bit archaic.
In Java-world, a logging framework called log4j has become the de facto
standard to implement logging. It's basically the same idea, you have
different log levels, and you can have enable different levels of
logging for different modules. I'm sure there are similar libraries for C,
that's something to think about in future releases.

8.3 will add some new log_* GUC variables, log_autovacuum and
log_temp_files, so it seems we're headed in that direction already. I'm
now thinking that's the approach we should take for checkpoints as well:
add a new GUC variable, log_checkpoints, and print the messages at
LOG-level if that's enabled.
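In postgresql.conf terms, the proposal amounts to something like the following (hypothetical here, since no such GUC exists yet in a released version):

```
# postgresql.conf -- sketch of the proposed knob; the name log_checkpoints
# is the suggestion above, not a setting in any current release
log_checkpoints = on          # checkpoint messages at LOG level
log_line_prefix = '[%m]: '    # millisecond timestamps for correlating logs
```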

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-05-11 14:31:09
Message-ID: Pine.GSO.4.64.0705111009080.18685@westnet.com

On Fri, 11 May 2007, Heikki Linnakangas wrote:

> Printing the messages at LOG-level would bring the code in line with the
> documentation, but I don't think we want to fill the log with checkpoint
> chatter unless the DBA explicitly asks for that. How about INFO?

In putting that together, I pushed everything to the lowest DEBUG level it
could run at without the messages I wanted to see being completely lost. I
didn't want this to get rejected just on the basis of being too chatty for
regular users. Putting a single line in
for each checkpoint at INFO, with all the individual progress points being
at DEBUG1 or 2, would be close to optimal as far as I'm concerned. More
on this below.

> Or we could add a GUC variable similar to log_connections or log_statement to
> control if the messages are printed or not, and use LOG.

If the levels for the individual messages are adjusted usefully the
ability to control whether they show up or not falls out of the existing
log level adjustments; I haven't ever felt a strong need for a GUC when
using this code. If, as you say, there's already a move in this
direction, then fine--another GUC would be better.

> We don't need to print the times elapsed in each phase on a separate
> line, that's just derived information from the other lines, unless we
> use different log-levels for detail lines

Let me turn this around for a second and ask you this: what do you think
people are going to do with this information? I can tell you what I do.
I parse the logs and look for that single line with the summary
information. I then take the three pieces of information it provides (MB
written, time to write, time to fsync) and save them into a table. From
there I generate stats, scatter plots, all sorts of useful stuff.

If you know when the checkpoint ended, and you know how long each of the
pieces took, you can reconstruct the other times easily. The way you
describe it, it's true that the summary is redundant given the
detail--but if you put yourself in the shoes of a log file parser the
other way around is easier to work with. Piecing together log entries is
a pain, splitting them is easy.

If I had to only keep one line out of this, it would be the one with the
summary. It would be nice to have it logged at INFO.
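A sketch of the parsing described above (Python; the regex is keyed to the sample summary line quoted earlier in this thread, and the reconstruction assumes the summary is emitted at checkpoint completion):

```python
# Parse the DEBUG1 checkpoint summary line and, given the epoch time at
# which the checkpoint completed, work backwards to the start of the
# write and fsync phases. The line format here follows the example in
# this thread; treat it as illustrative, not a stable format.
import re

SUMMARY = re.compile(
    r'checkpoint buffers dirty=(?P<mb>[\d.]+) MB \((?P<pct>[\d.]+)%\)'
    r' write=(?P<write_ms>[\d.]+) ms sync=(?P<sync_ms>[\d.]+) ms')

def parse_summary(line):
    m = SUMMARY.search(line)
    return {k: float(v) for k, v in m.groupdict().items()} if m else None

def phase_starts(complete_epoch, stats):
    # fsync ran for sync_ms just before completion; the write ran before that
    sync_start = complete_epoch - stats['sync_ms'] / 1000.0
    write_start = sync_start - stats['write_ms'] / 1000.0
    return write_start, sync_start
```

Splitting the one line this way is all a log parser needs before loading the three figures into a table.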

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-05-11 15:37:44
Message-ID: 46448DC8.4050102@enterprisedb.com

Greg Smith wrote:
> On Fri, 11 May 2007, Heikki Linnakangas wrote:
>> Or we could add a GUC variable similar to log_connections or
>> log_statement to control if the messages are printed or not, and use LOG.
>
> If the levels for the individual messages are adjusted usefully the
> ability to control whether they show up or not falls out of the existing
> log level adjustments; I haven't ever felt a strong need for GUC when
> using this code. If, as you say, there's already a move in this
> direction, then fine--another GUC would be better.

The need for a GUC becomes apparent when you start adding logging to other
modules as well, like we did for autovacuum a while back. If there
wasn't a log_autovacuum parameter, you'd have to choose between logging
checkpoints and autovacuum actions, or neither.

>> We don't need to print the times elapsed in each phase on a separate
>> line, that's just derived information from the other lines, unless we
>> use different log-levels for detail lines
>
> Let me turn this around for a second and ask you this: what do you
> think people are going to do with this information? I can tell you what
> I do. I parse the logs and look for that single line with the summary
> information. I then take the three pieces of information it provides
> (MB written, time to write, time to fsync) and save them into a table.
> From there I generate stats, scatter plots, all sorts of useful stuff.
>
> If you know when the checkpoint ended, and you know how long each of the
> pieces took, you can reconstruct the other times easily. The way you
> describe this it is true--that the summary is redundant given the
> detail--but if you put yourself in the shoes of a log file parser the
> other way around is easier to work with. Piecing together log entries
> is a pain, splitting them is easy.
>
> If I had to only keep one line out of this, it would be the one with the
> summary. It would be nice to have it logged at INFO.

Yeah, if we have the summary line we don't need the other lines and vice
versa. I have sympathy for parsing log files, I've done that a lot in
the past and I can see what you mean. Having the individual lines is
nice when you're monitoring a running system; you don't get the summary
line until the checkpoint is finished. I suppose we can have both the
individual lines and the summary, the extra lines shouldn't hurt anyone,
and you won't get them unless you turn on the new log_checkpoints
parameter anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)
Date: 2007-05-12 21:19:48
Message-ID: 20070512211948.GA58786@nasby.net

Moving to -hackers.

On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
> >If you know when the checkpoint ended, and you know how long each of the
> >pieces took, you can reconstruct the other times easily. The way you
> >describe this it is true--that the summary is redundant given the
> >detail--but if you put yourself in the shoes of a log file parser the
> >other way around is easier to work with. Piecing together log entries
> >is a pain, splitting them is easy.
> >
> >If I had to only keep one line out of this, it would be the one with the
> >summary. It would be nice to have it logged at INFO.
>
> Yeah, if we have the summary line we don't need the other lines and vice
> versa. I have sympathy for parsing log files, I've done that a lot in
> the past and I can see what you mean. Having the individual lines is
> nice when you're monitoring a running system; you don't get the summary
> line until the checkpoint is finished. I suppose we can have both the
> individual lines and the summary, the extra lines shouldn't hurt anyone,
> and you won't get them unless you turn on the new log_checkpoints
> parameter anyway.

Not to beat a dead horse, but do we really want to force folks to be
parsing logs for performance monitoring? Especially if that log parsing
is just going to result in data being inserted into a table anyway?

I know there's concern about performance of the stats system and maybe
that needs to be addressed, but pushing users to log parsing is a lot of
extra effort, non-standard, likely to be overlooked, and doesn't play
well with other tools. It also conflicts with the existing statistics
framework.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance monitoring
Date: 2007-05-12 21:26:02
Message-ID: 464630EA.5000701@commandprompt.com


> Not to beat a dead horse, but do we really want to force folks to be
> parsing logs for performance monitoring? Especially if that log parsing
> is just going to result in data being inserted into a table anyway?
>
> I know there's concern about performance of the stats system and maybe
> that needs to be addressed, but pushing users to log parsing is a lot of
> extra effort, non-standard, likely to be overlooked, and doesn't play
> well with other tools. It also conflicts with all the existing
> statistics framework.

One thing that doesn't seem to be getting looked at is the cost of
logging. Logging is very expensive. I don't know if it is more expensive
than the stats system, but you can cut your tps in half by having any
level of verbose logging on.

Yes, that can be offset by pushing the logging to another spindle and
being careful about what you are logging, but still.

Either way, we are taking the hit, it is just a matter of where. IMO it
would be better to have the information in the database where it makes
sense, than pushing out to a log that:

A. Will likely be forgotten
B. Is only accessible if you have shell access to the machine (not as
common as all of us would like to think)

Sincerely,

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)
Date: 2007-05-12 23:09:55
Message-ID: Pine.GSO.4.64.0705121839520.7527@westnet.com

On Sat, 12 May 2007, Jim C. Nasby wrote:

> Not to beat a dead horse, but do we really want to force folks to be
> parsing logs for performance monitoring?

All of the really interesting DBA level information about checkpoints is
now sitting in pg_stat_bgwriter. There really is no reason I'd expect
this information to be part of a normal performance monitoring setup; for
me it has fit into troubleshooting and R&D.

When I'm talking about parsing logs in this context, it's for answering
questions like "how does fsync time correlate with amount of data written
at checkpoint time?" Since I've found it educational to sort through
that, I just didn't want the patch modified so that was harder to do than
it had to be.

If I thought it worth the trouble to expose that level of information via
the stats interface, I'd have submitted that instead of this log-based
patch. This circles back to the previous discussion of whether this
particular information is strictly developer-level.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance monitoring
Date: 2007-05-12 23:24:51
Message-ID: Pine.GSO.4.64.0705121910010.9615@westnet.com

On Sat, 12 May 2007, Joshua D. Drake wrote:

> One thing that doesn't seem to be getting looked at is the cost of
> logging.

If any of this executed at something like the query level, sure, that
would be really important. The majority of the logging I suggested here is
of things that happen at checkpoint time. The presumption is that the
overhead of that is considerably greater than writing a log line or even
five.

The least intensive event I would like to be loggable is when a new WAL
segment is created and cleared, which is again a pretty small bit of log
compared to the 16MB write. I wouldn't mind seeing that exposed under
pg_stats instead, just had more interesting things to statify first.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Neil Conway <neilc(at)samurai(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance monitoring
Date: 2007-05-13 04:41:24
Message-ID: 1179031284.2830.19.camel@goldbach

On Sat, 2007-12-05 at 14:26 -0700, Joshua D. Drake wrote:
> Either way, we are taking the hit, it is just a matter of where. IMO it
> would be better to have the information in the database where it makes
> sense, than pushing out to a log

If performance monitoring information is provided as a database object,
what would the right interface be? IMHO the problem with cleanly
presenting monitoring information within a normal database system is
that this sort of data is fundamentally dynamic and continuous: to
determine how the performance of the system changes over time, you need
to repeatedly rescan the table/view/SRF and recompute your analysis
essentially from scratch. Trying to get even simple information like
"queries per second" from pg_stat_activity is an example of how this can
be painful.

<plug>
BTW, if the system included the concept of a continuous data *stream* as
a kind of database object, this problem would be much more tractable :)
In fact, there is some code in a version of TelegraphCQ that exposes
various information about the runtime state of the system as a set of
system-defined data streams -- like any other stream, users could then
use those streams in arbitrary queries.
</plug>

-Neil


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance monitoring
Date: 2007-05-13 06:54:20
Message-ID: 4646B61C.1010103@enterprisedb.com

Jim C. Nasby wrote:
> Moving to -hackers.
>
> On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
>>> If you know when the checkpoint ended, and you know how long each of the
>>> pieces took, you can reconstruct the other times easily. The way you
>>> describe this it is true--that the summary is redundant given the
>>> detail--but if you put yourself in the shoes of a log file parser the
>>> other way around is easier to work with. Piecing together log entries
>>> is a pain, splitting them is easy.
>>>
>>> If I had to only keep one line out of this, it would be the one with the
>>> summary. It would be nice to have it logged at INFO.
>> Yeah, if we have the summary line we don't need the other lines and vice
>> versa. I have sympathy for parsing log files, I've done that a lot in
>> the past and I can see what you mean. Having the individual lines is
>> nice when you're monitoring a running system; you don't get the summary
>> line until the checkpoint is finished. I suppose we can have both the
>> individual lines and the summary, the extra lines shouldn't hurt anyone,
>> and you won't get them unless you turn on the new log_checkpoints
>> parameter anyway.
>
> Not to beat a dead horse, but do we really want to force folks to be
> parsing logs for performance monitoring? Especially if that log parsing
> is just going to result in data being inserted into a table anyway?
>
> I know there's concern about performance of the stats system and maybe
> that needs to be addressed, but pushing users to log parsing is a lot of
> extra effort, non-standard, likely to be overlooked, and doesn't play
> well with other tools. It also conflicts with all the existing
> statistics framework.

There are two counters for checkpoints in pgstats: the number of timed
(triggered by checkpoint_timeout) and requested (triggered by
checkpoint_segments) checkpoints.

Maybe we should improve the stats system so that we can collect events
with timestamps and durations, but in my experience log files actually
are the most reliable and universal way to collect real-time performance
information. Any serious tool has a generic log parser. The other
alternative is SNMP. I welcome the efforts on pgsnmpd..
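For what it's worth, a generic parser for the summary line is only a few
lines. This is a rough sketch in Python; the line format and field names
are taken from the example earlier in the thread and are assumptions, not
the final format:

```python
import re

# Matches the checkpoint summary line quoted earlier in this thread, e.g.
# "DEBUG1: checkpoint buffers dirty=16.7 MB (13.0%) write=174.8 ms sync=3716.5 ms"
SUMMARY = re.compile(
    r"dirty=(?P<mb>[\d.]+) MB \((?P<pct>[\d.]+)%\)"
    r" write=(?P<write_ms>[\d.]+) ms\s+sync=(?P<sync_ms>[\d.]+) ms"
)

def parse_summary(line):
    """Return the size/timing fields from a checkpoint summary line, or None."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    return {k: float(v) for k, v in m.groupdict().items()}
```

Line up the parsed timings against timestamped transaction latencies and
the correlation Greg describes falls out directly.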

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance monitoring
Date: 2007-05-13 15:37:01
Message-ID: 4647309D.4090805@hagander.net
Lists: pgsql-hackers pgsql-patches

Heikki Linnakangas wrote:
>>> Yeah, if we have the summary line we don't need the other lines and
>>> vice versa. I have sympathy for parsing log files, I've done that a
>>> lot in the past and I can see what you mean. Having the individual
>>> lines is nice when you're monitoring a running system; you don't get
>>> the summary line until the checkpoint is finished. I suppose we can
>>> have both the individual lines and the summary, the extra lines
>>> shouldn't hurt anyone, and you won't get them unless you turn on the
>>> new log_checkpoints parameter anyway.
>>
>> Not to beat a dead horse, but do we really want to force folks to be
>> parsing logs for performance monitoring? Especially if that log parsing
>> is just going to result in data being inserted into a table anyway?
>>
>> I know there's concern about performance of the stats system and maybe
>> that needs to be addressed, but pushing users to log parsing is a lot of
>> extra effort, non-standard, likely to be overlooked, and doesn't play
>> well with other tools. It also conflicts with all the existing
>> statistics framework.
>
> There is two counters for checkpoints in pgstats, the number of timed
> (triggered by checkpoint_timeout) and requested (triggered by
> checkpoint_segments) checkpoints.
>
> Maybe we should improve the stats system so that we can collect events
> with timestamps and durations, but in my experience log files actually
> are the most reliable and universal way to collect real-time performance
> information. Any serious tool has a generic log parser. The other
> alternative is SNMP. I welcome the efforts on pgsnmpd..

pgsnmpd can't provide any information that's not in the backend. Unless
we'd turn it into a log parser, which is not really something I think is
a good idea.

Log files are great for one kind of thing; "live statistics" through
SNMP or the statistics collector are better for another kind. It only
goes wrong when you put data in the wrong place. Things you poll
regularly make a lot more sense in some kind of live view than in a log file.

//Magnus


From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance monitoring
Date: 2007-05-13 20:46:41
Message-ID: 20070513204640.GL60707@nasby.net
Lists: pgsql-hackers pgsql-patches

On Sun, May 13, 2007 at 07:54:20AM +0100, Heikki Linnakangas wrote:
> Maybe we should improve the stats system so that we can collect events
> with timestamps and durations, but in my experience log files actually
> are the most reliable and universal way to collect real-time performance
> information. Any serious tool has a generic log parser. The other
> alternative is SNMP. I welcome the efforts on pgsnmpd..

Having timing information in the stats system would be useful, but I'm
not sure how it could actually be done. But at least if the information
is in the stats system it's easy to programmatically collect and process.
SNMP is just one example of that (fwiw I agree with Magnus that it
probably doesn't make sense to turn pgsnmpd into a log parser...)
--
Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance monitoring
Date: 2007-05-14 11:12:49
Message-ID: 46484431.1070307@sun.com
Lists: pgsql-hackers pgsql-patches

Heikki Linnakangas napsal(a):
> Jim C. Nasby wrote:

>
> There is two counters for checkpoints in pgstats, the number of timed
> (triggered by checkpoint_timeout) and requested (triggered by
> checkpoint_segments) checkpoints.
>
> Maybe we should improve the stats system so that we can collect events
> with timestamps and durations, but in my experience log files actually
> are the most reliable and universal way to collect real-time performance
> information. Any serious tool has a generic log parser. The other
> alternative is SNMP. I welcome the efforts on pgsnmpd..

What about adding some DTrace probes?

Zdenek


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-05-28 20:25:31
Message-ID: 465B3ABB.6010709@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

Here's a work-in-progress updated version of this patch. It's not
completed, but I've settled on:

- a log_checkpoints GUC variable to control whether anything is printed
or not. This is in line with log_connections, log_autovacuum etc.

- when checkpoints are logged, LOG-level is used

- the following log output:

LOG: checkpoint starting (CHECKPOINT command)
LOG: checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms
sync=32.0 ms

where "checkpoint starting" is obviously printed when checkpoint starts,
and "checkpoint complete" when the checkpoint is fully complete.

- all information on the progress of a checkpoint is collected into a global
struct. A global struct sounds a bit ugly, but it avoids the ugliness of
littering all functions with multiple return values to pass back
information to the caller that's only used to print it out. This makes
it easy to add more data points. MoveOfflineLogs should probably be
changed to use that as well for the sake of consistency, instead of
returning nsegsremoved and nsegsrecycled through pointers.

- RequestCheckpoint and CreateCheckpoint were changed to take a
CheckpointReason enum as argument, instead of boolean flags. This allows
printing a different message depending on the cause of checkpoint, and
is also cleaner IMHO.
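As a sketch of the reason-code idea (illustrative names only, not the
identifiers in the attached patch, and in Python for brevity rather than
the backend's C):

```python
from enum import Enum, auto

class CheckpointReason(Enum):
    # Illustrative reason codes; the actual patch defines its own set.
    SHUTDOWN = auto()
    XLOG_SEGMENTS = auto()   # checkpoint_segments reached
    TIMEOUT = auto()         # checkpoint_timeout passed
    COMMAND = auto()         # explicit CHECKPOINT command

REASON_TEXT = {
    CheckpointReason.SHUTDOWN: "shutdown",
    CheckpointReason.XLOG_SEGMENTS: "checkpoint_segments reached",
    CheckpointReason.TIMEOUT: "checkpoint_timeout passed",
    CheckpointReason.COMMAND: "CHECKPOINT command",
}

def checkpoint_start_message(reason):
    # One enum argument replaces a pile of boolean flags and lets the
    # log line say why the checkpoint was triggered.
    return "checkpoint starting (%s)" % REASON_TEXT[reason]
```

A single reason argument also means adding a new trigger cause touches
one enum and one table entry, not every call site.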

I'm posting now to allow others, and Greg Smith in particular, to review
the changes this far, and take over from here.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
cplog-1.patch text/x-diff 25.3 KB

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-05-28 20:48:39
Message-ID: 1180385320.26297.17.camel@silverbirch.site
Lists: pgsql-hackers pgsql-patches

On Mon, 2007-05-28 at 21:25 +0100, Heikki Linnakangas wrote:

> - RequestCheckpoint and CreateCheckpoint was changed to take a
> CheckpointReason enum as argument, instead of boolean flags. This allows
> printing a different message depending on the cause of checkpoint, and
> is also cleaner IMHO.

Looks good.

I'd be inclined to make the reason code for creating/dropping database
the same and just say "Cluster DDL" or something, since for analysis
purposes we just want to know it was DDL.

I'd rather see the timings in seconds rather than ms. Counting
significant digits will be a pain...

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-05-29 02:21:54
Message-ID: Pine.GSO.4.64.0705282200190.28443@westnet.com
Lists: pgsql-hackers pgsql-patches

I'll take another stab at refining this can of worms I opened. The one
thing I noticed on a quick review is that it's almost possible to skip all
the calls to gettimeofday if log_checkpoints is off now. I'd like to make
that a specific goal, because that will make me feel better that adding
this code has almost no performance impact unless you turn the feature
on.

I agree with Simon that tracking create/drop separately is unnecessary.
As for why all the timing info is in ms, given the scale of the numbers
typically encountered I found it easier to work with. I originally wanted
resolution down to 0.1ms if the underlying OS supports it, which means 4
figures to the right of the decimal point if the unit was switched to
seconds. Quite often the times reported are less than 100ms, so you'll
normally be dealing with fractional part of a second. If we take Heikki's
example:

LOG: checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms
sync=32.0 ms

And switch it to seconds:

LOG: checkpoint complete; buffers written=3.1 MB (9.6%) write=0.0968 s
sync=0.0320 s

I don't find that as easy to work with. The only way a timing in seconds
would look OK is if the resolution of the whole thing is reduced to ms,
which then makes 3 decimal points--easy to read as ms instead. Having
stared at a fair amount of this data now, that's probably fine; I'll
collect up some more data on it from a fast server this week to confirm
whether it's worthless precision or worth capturing.
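The trade-off is easy to see with a quick formatting comparison (just an
illustration of the two candidate units, not proposed code):

```python
def fmt_ms(ms):
    """Format a duration, given in ms, at 0.1 ms resolution."""
    return "%.1f ms" % ms

def fmt_s(ms):
    """The same duration in seconds at ms resolution (3 decimal places)."""
    return "%.3f s" % (ms / 1000.0)

# A 96.8 ms write phase under the two schemes:
# fmt_ms(96.8) -> "96.8 ms"
# fmt_s(96.8)  -> "0.097 s"
```

Either way the information content is the same; the question is purely
which column of digits is easier to scan in a log file.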

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-05-29 17:56:02
Message-ID: 465C6932.4070800@enterprisedb.com
Lists: pgsql-hackers pgsql-patches

Greg Smith wrote:
> I'll take another stab at refining this can of worms I opened. The one
> thing I noticed on a quick review is that it's almost possible to skip
> all the calls to gettimeofday if log_checkpoints is off now. I'd like
> to make that a specific goal, because that will make me feel better that
> adding this code has almost no performance impact relative to now unless
> you turn the feature on.

Saving a couple of gettimeofday calls on an event that happens as
infrequently as checkpoints is not going to make any difference.
Especially if you compare it to all the other work that's done on
checkpoint.

> I agree with Simon that tracking create/drop separately is unnecessary.
> As for why all the timing info is in ms, given the scale of the numbers
> typically encountered I found it easier to work with. I originally
> wanted resolution down to 0.1ms if the underlying OS supports it, which
> means 4 figures to the right of the decimal point if the unit was
> switched to seconds. Quite often the times reported are less than
> 100ms, so you'll normally be dealing with fractional part of a second.
> If we take Heikki's example:
>
> LOG: checkpoint complete; buffers written=3.1 MB (9.6%) write=96.8 ms
> sync=32.0 ms
>
> And switch it to seconds:
>
> LOG: checkpoint complete; buffers written=3.1 MB (9.6%) write=0.0968 s
> sync=0.0320 s
>
> I don't find that as easy to work with. The only way a timing in
> seconds would look OK is if the resolution of the whole thing is reduced
> to ms, which then makes 3 decimal points--easy to read as ms instead.
> Having stared at a fair amount of this data now, that's probably fine;
> I'll collect up some more data on it from a fast server this week to
> confirm whether's it's worthless precision or worth capturing.

The checkpoint will take at least a couple of seconds on any interesting
system, so 0.1 s resolution should be enough IMHO.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: Logging checkpoints and other slowdown causes
Date: 2007-05-30 04:09:29
Message-ID: Pine.GSO.4.64.0705292337150.12775@westnet.com
Lists: pgsql-hackers pgsql-patches

On Tue, 29 May 2007, Heikki Linnakangas wrote:

> The checkpoint will take at least a couple of seconds on any interesting
> system, so 0.1 s resolution should be enough IMHO.

You may be underestimating the resources some interesting systems are
willing to put into speeding up checkpoints. I'm sometimes dumping into a
SAN whose cache is bigger than the shared_buffer cache in the server, and
0.1s isn't really enough resolution in that situation. A second is a
really long checkpoint there. Since even that's limited by fiber-channel
speeds, I know it's possible to do better than what I'm seeing with
something like a PCIe host adapter having on-board cache in the GB range
(which isn't that expensive nowadays).

Also, even if the checkpoint total takes seconds, much of that is in the
sync phase; the write time can still be in the small number of ms range,
and I wouldn't want to see that truncated too much.

Anyway, I have a bunch of data on this subject being collected at this
moment, and I'll rescale the results based on what I see after analyzing
that this week.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD