Vacuum rate limit in KBps

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Vacuum rate limit in KBps
Date: 2012-01-15 08:24:15
Message-ID: 4F128D2F.4070200@2ndQuadrant.com
Lists: pgsql-hackers

So far the reaction I've gotten to my recent submission to make
autovacuum log its read/write rates in MB/s has been rather positive.
I've been surprised at the unprecedented (to me at least) amount of
backporting onto big production systems it's gotten. There is a whole
lot of pent-up frustration among larger installs over not having good
visibility into how changing the cost-based vacuum parameters translates
into real-world units.

That got me thinking: if MB/s is what everyone wants to monitor, can we
provide a UI to set these parameters that way too? The attached patch
is a bit rough still, but it does that. The key was recognizing that
the cost delay plus cost limit can be converted into an upper limit on
cost units per second, presuming the writes themselves are free. If you
then also assume the worst case--that everything will end up dirty--by
throwing in the block size, too, you compute a maximum rate in MB/s.
That represents the fastest you can possibly write.

If you then turn that equation around, making the maximum write rate the
input, for any given cost delay and dirty page cost you can solve for
the cost limit--the parameter in fictitious units everyone hates. It
works like this, with the computation internals logged every time they
run for now:

#vacuum_cost_rate_limit = 4000 # maximum write rate in kilobytes/second
LOG: cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
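
To spell out the arithmetic behind that log line, here's a standalone
sketch of the math (not code from the patch; variable names are mine,
and it assumes the default 8kB block size plus the worst case where
every page processed gets dirtied):

#include <stdio.h>

int main(void)
{
    double rate_limit_kbps = 4000.0;   /* vacuum_cost_rate_limit */
    double delay_ms        = 20.0;     /* vacuum_cost_delay */
    double dirty_cost      = 20.0;     /* vacuum_cost_page_dirty */
    double block_kb        = 8.0;      /* BLCKSZ in kilobytes */

    /* Forward direction: a cost limit plus delay gives a worst-case
     * write rate, presuming every page processed is dirtied. */
    double cost_limit   = 200.0;       /* old-style vacuum_cost_limit */
    double units_per_s  = cost_limit * (1000.0 / delay_ms);
    double pages_per_s  = units_per_s / dirty_cost;
    printf("max write rate = %.0f KB/s\n", pages_per_s * block_kb);  /* 4000 */

    /* Reverse direction: solve for the cost limit from the rate. */
    double derived_limit = (rate_limit_kbps / block_kb) * dirty_cost
                           * (delay_ms / 1000.0);
    printf("derived cost limit = %.0f\n", derived_limit);            /* 200 */
    return 0;
}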

That's the same cost limit that was there before, except now it's
derived from that maximum write rate figure. vacuum_cost_limit is gone
as a GUC, replaced with this new vacuum_cost_rate_limit. Internally,
the cost limit hasn't gone anywhere though. All of the entry
points into vacuum and autovacuum derive an internal-only
VacuumCostLimit as part of any setup or rebalance operation. But
there's no change to underlying cost management code; the cost limit is
budgeted and accounted for in exactly the same way as it always was.

Why is this set in kilobytes/second rather than using something based on
a memory unit? That decision was made after noting these values can
also be set in relation options. Making relation options aware of
memory unit math seemed ambitious relative to its usefulness, and it's
not like KB/s is hard to work with in this context.

OK, I lied; technically this is set in kibibytes per second right now.
Ran out of energy before I got to confirming that was consistent with
all similar GUC settings, will put on my pedantic hat later to check that.

One nice thing that falls out of this is that the *vacuum_cost_delay
settings essentially turn into a boolean. If the delay is 0, cost
limits are off; set it to any other value, and the rate you get is
driven almost entirely by vacuum_cost_rate_limit (the "almost" is mainly
because issues like sleep time accuracy come into play). You can see
that at work in these examples:

LOG: cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
LOG: cost limit=100 based on rate limit=4000 KB/s delay=10 dirty cost=20

LOG: cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
LOG: cost limit=100 based on rate limit=2000 KB/s delay=20 dirty cost=20

Halve the delay to 10, and the cost limit drops in half too to keep the
same I/O rate. Halve the rate limit instead, and the cost limit halves
with it. Most sites will never need to change the delay figure from
20ms, they can just focus on tuning the more human-readable rate limit
figure instead. The main reason I can think of to still keep the delay
around as an integer is sites trying to minimize power use; they might
increase it from the normally used 20ms. I'm not as worried about
postgresql.conf settings bloat to support a valid edge use case, so long
as most sites find a setting unnecessary to tune. And the autovacuum
side of cost delay should fall into that category with this change.

Here's a full autovacuum log example. This shows how close to the KBps
rate the server actually got, along with the autovacuum cost balancing
working the same old way (this is after running the boring
autovac-big.sql test case attached here too):

2012-01-15 02:10:51.905 EST: LOG: cost limit=200 based on rate
limit=4000 KB/s delay=20 dirty cost=20
2012-01-15 02:10:51.906 EST: DEBUG: autovac_balance_cost(pid=13054
db=16384, rel=16444, cost_rate_limit=4000, cost_limit=200,
cost_limit_base=200, cost_delay=20)

2012-01-15 02:11:05.127 EST: DEBUG: "t": removed 4999999 row versions
in 22124 pages
2012-01-15 02:11:05.127 EST: DEBUG: "t": found 4999999 removable,
5000001 nonremovable row versions in 44248 out of 44248 pages
2012-01-15 02:11:05.127 EST: DETAIL: 0 dead row versions cannot be
removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.27s/0.97u sec elapsed 131.73 sec.
2012-01-15 02:11:05.127 EST: LOG: automatic vacuum of table
"gsmith.public.t": index scans: 0
pages: 0 removed, 44248 remain
tuples: 4999999 removed, 5000001 remain
buffer usage: 48253 hits, 40296 misses, 43869 dirtied
avg read rate: 2.390 MiB/s, avg write rate: 2.602 MiB/s
system usage: CPU 0.27s/0.97u sec elapsed 131.73 sec

I think this new setting suggests the recently added logging is missing
a combined I/O figure, something that measures reads + writes over the
time period. This is good enough to demonstrate the sort of UI I was
aiming for in action though. Administrator says "don't write more than
4MiB/s", and when autovacuum kicks in it averages 2.4 read + 2.6 write.

I see this change as taking something that feels like black arts tuning
magic now and turning it into a simple interface that's for the most
part intuitive. None of the flexibility is lost here: you can still
retune the relative dirty vs. miss vs. hit costs, you have the option of
reducing the delay to a small value on a busy server where small sleep
values are possible. But you don't have to do any of that just to tweak
autovacuum up or down at a gross level; you can just turn the simple "at
most this much write I/O" knob instead.

All implementation notes and concerns from here down.

The original cost limit here defaulted to 200 and allowed a range of 1
to 10000. The new default of 4000 shows these values need to be 20X as
large. The maximum was adjusted to 200000 KBps. Look at that, the
maximum rate you can run cost delay vacuum at is 200MB/s; there's
another good example of something that used to be mysterious to compute
that is obvious now.

I didn't adjust the lower limit downward, so it's actually possible to
set the new code to only operate at 1/200 the minimum speed you could
set before. On balance this is surely a reduction in foot-gun potential
though, and I could make the minimum 200 to eliminate it. Seems
a needless detail to worry about.

This code is new and just complicated enough that there's surely some
edge cases I broke here. In particular I haven't put together a good
concurrent autovacuum test yet to really prove all the balancing logic
still works correctly. Need to test that with a settings change in the
middle of a long vacuum too.

There's one serious concern I don't have a quick answer to. What do we
do with in-place upgrade of relations that specified a custom
vacuum_cost_limit? I can easily chew on getting the right logic to
convert those to equivalents in the new setting style, but I am not prepared
to be solely on the hook for all the in-place upgrade work one might do
here. Would this be easiest to handle as one of those dump/restore
transformations? My guess is that's more sensible than the alternative
of making an on-read converter that only writes in the new format, then
worrying about upgrading all old pages before moving forward. While
this could be an interesting small test case for that sort of thing, I'd
rather not be patient #1 for that part of the long-term in-place upgrade
path right now.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment Content-Type Size
vacuum_rate_limit-v1.patch text/x-patch 22.8 KB
autovac-big.sql text/x-sql 138 bytes

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-15 21:17:54
Message-ID: 4F134282.3070404@enterprisedb.com
Lists: pgsql-hackers

On 15.01.2012 10:24, Greg Smith wrote:
> That got me thinking: if MB/s is what everyone wants to monitor, can we
> provide a UI to set these parameters that way too? The attached patch is
> a bit rough still, but it does that. The key was recognizing that the
> cost delay plus cost limit can be converted into an upper limit on cost
> units per second, presuming the writes themselves are free. If you then
> also assume the worst case--that everything will end up dirty--by
> throwing in the block size, too, you compute a maximum rate in MB/s.
> That represents the fastest you can possibly write.

+1. I've been thinking we should do that for a long time, but haven't
gotten around to it.

I think it makes more sense to use the max read rate as the main knob,
rather than write rate. That's because the max read rate is higher than
the write rate, when you don't need to dirty pages. Or do you think
saturating the I/O system with writes is so much bigger a problem than
read I/O that it makes more sense to emphasize the writes?

I was thinking of something like this, in postgresql.conf:

# - Vacuum Throttling -

#vacuum_cost_page_miss = 1.0 # measured on an arbitrary scale
#vacuum_cost_page_dirty = 2.0 # same scale as above
#vacuum_cost_page_hit = 0.1 # same scale as above
#vacuum_rate_limit = 8MB # max reads per second

This is now similar to the cost settings for the planner, which is good.

> There's one serious concern I don't have a quick answer to. What do we
> do with in-place upgrade of relations that specified a custom
> vacuum_cost_limit? I can easily chew on getting the right logic to
> convert those to equals in the new setting style, but I am not prepared
> to go solely on the hook for all in-place upgrade work one might do
> here. Would this be easiest to handle as one of those dump/restore
> transformations?

It needs to be handled at dump/restore time. I'm not sure where that
transformation belongs, though. Do we have any precedent for this? I
think we have two options:

1. Accept the old "autovacuum_cost_limit" setting in CREATE TABLE, and
transform it immediately into the corresponding autovacuum_rate_limit setting.

2. Transform in pg_dump, so that the CREATE TABLE statements in the dump
use the new autovacuum_rate_limit setting.

The advantage of option 1 is that dumps taken with old 9.1 pg_dump
still work on a 9.2 server. We usually try to preserve that
backwards-compatibility, although we always recommend using the pg_dump
from the newer version on upgrade. However, you need to know the
vacuum_cost_page_miss setting effective in the old server to do the
transformation correctly (or vacuum_cost_page_dirty, if we use the write
max rate as the main knob as you suggested), and we don't have access
to that when restoring a dump.

> My guess is that's more sensible than the alternative
> of making an on-read converter that only writes in the new format, then
> worrying about upgrading all old pages before moving forward.

This doesn't require any page format changes, so I don't think the above
sentence makes any sense.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-16 02:13:05
Message-ID: 4F1387B1.1090108@2ndQuadrant.com
Lists: pgsql-hackers

On 01/15/2012 04:17 PM, Heikki Linnakangas wrote:
> I think it makes more sense to use the max read rate as the main knob,
> rather than write rate. That's because the max read rate is higher
> than the write rate, when you don't need to dirty pages. Or do you
> think saturating the I/O system with writes is so much bigger a
> problem than read I/O that it makes more sense to emphasize the writes?

I haven't had the I/O rate logging available for long enough to have a
good feel for which is more important to emphasize. I'm agnostic on
this. I'd have no problem accepting the argument that exposing the
larger of the two rates--which is the read one--makes for a cleaner UI.
Or that it is the one more in line with the precedents other knobs have set here.

My guess is that the changed documentation will actually be a bit
cleaner that way. I give an example in the patch of how read and write
rate are related, based on the ratio of the values for dirty vs. hit. I
wasn't perfectly happy with how that was written yet, and I think it
could be cleaner if the read rate is the primary tunable.

> We usually try to preserve that backwards-compatibility, although we
> always recommend using the pg_dump from the newer version on upgrade.
> However, you need to know the vacuum_cost_page_miss setting effective
> in the old server to do the transformation correctly (or
> vacuum_cost_page_dirty, if we use the write max rate as the main knob
> as you suggested), and we don't have access when restoring a dump.

If someone sets the autovacuum_vacuum_cost_limit storage parameter but
doesn't touch autovacuum_vacuum_cost_delay there, I think it's possible
to also need the GUC value for autovacuum_vacuum_cost_delay, which can
in turn fall back to vacuum_cost_delay as well.

I don't think that tweaking these parameters, particularly at the
storage options level, is a popular thing to do. I've run into
customers who made changes there while trying to navigate the complexity
of autovacuum tuning, but not very many. My guess as I think about that
history is that I've ended up reverting them as often, or maybe even
slightly more often, than I've ended up keeping them around. It's been
hard to do well. And the level of PostgreSQL deployment that reaches
that stage of tuning, where they managed to tweak those productively,
doesn't seem likely to do a blind upgrade to me.

One of the reasons I thought now was a good time to work on this change
is because there's already things brewing that are going to make 9.2
break a few more things than anyone would like, all for long-term
positive benefits. recovery.conf and pg_stat_activity changes are the
two of those I've been tracking the closest.

My current thinking on this is that we ought to learn a lesson from the
8.3 casting breakage saga and provide a clear "9.2 Upgrade Migration
Guide" that goes beyond just listing what changed in the release notes.
Aim more toward having a checklist of things to look for and tools to
help find them. In this case, having the migration guide include a
query that pokes through the catalogs looking for this particular
customization would be helpful. It would be easy to produce a bit of
post-9.2 upgrade SQL that converted a customization here if you started
by running something against the existing installation.

And that should be a wiki page so it can be extended as new things are
discovered, some of which will include application specific guidance.
The rough idea in this direction I put together for 8.3 (it didn't catch
on for later versions) is at
http://wiki.postgresql.org/wiki/Version_History
Note how that grew to
include tsearch2 migration info and MediaWiki specific advice at one
point (some of that was then lost in a Planet fire, but you can get the
idea just from the article titles). Needing a version migration guide
with specific details about issues like this is something I think
PostgreSQL needs to accept as part of the release cycle. Expecting that
pg_upgrade can transparently handle every possible change would be
setting a high bar to clear, higher than I think is expected by the
database industry at large.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Jim Nasby <jim(at)nasby(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-18 02:00:16
Message-ID: 5C4C2A81-9334-4069-A8BD-EA49C464B1B9@nasby.net
Lists: pgsql-hackers

On Jan 15, 2012, at 8:13 PM, Greg Smith wrote:
> On 01/15/2012 04:17 PM, Heikki Linnakangas wrote:
>> I think it makes more sense to use the max read rate as the main knob, rather than write rate. That's because the max read rate is higher than the write rate, when you don't need to dirty pages. Or do you think saturating the I/O system with writes is so much bigger a problem than read I/O that it makes more sense to emphasize the writes?
>
> I haven't had the I/O rate logging available for long enough to have a good feel for which is more important to emphasize. I'm agnostic on this. I'd have no problem accepting the argument that exposing the larger of the two rates--which is the read one--makes for a cleaner UI. Or that it is the one more like other knobs setting precedents here.

Could we expose both?

On our systems writes are extremely cheap... we don't do a ton of them (relatively speaking), so they tend to just fit into BBU cache. Reads on the other hand are a lot more expensive, at least if they end up actually hitting disk. So we actually set page_dirty and page_hit the same.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-18 09:49:26
Message-ID: 4F1695A6.5030509@2ndQuadrant.com
Lists: pgsql-hackers

On 01/17/2012 09:00 PM, Jim Nasby wrote:
> Could we expose both?
>
> On our systems writes are extremely cheap... we don't do a ton of them (relatively speaking), so they tend to just fit into BBU cache. Reads on the other hand are a lot more expensive, at least if they end up actually hitting disk. So we actually set page_dirty and page_hit the same.

My thinking had been that you set one of these as the rate tunable, and
then the rates of the others can be adjusted by advanced users using the
ratio between the primary one and the others. So at the defaults:

vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20

Setting a read rate cap will imply a write rate cap at 1/2 the value.
Your setup would then be:

vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 1

Which would still work fine if the new tunable was a read cap. If the
cap is a write one, though, this won't make any sense. It would allow
reads to happen at 10X the speed of writes, which is weird.
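
Spelling out the read cap case as a quick sketch (just the arithmetic,
nothing from the patch; assume 8kB blocks and a hypothetical 8000 KB/s
read cap):

#include <stdio.h>

/* Implied cap for a category, given the primary read (page miss) cap:
 * rate_for(x) = read_cap * miss_cost / x_cost. */
static double implied_kbps(double read_cap_kbps, double miss_cost,
                           double cost)
{
    return read_cap_kbps * miss_cost / cost;
}

int main(void)
{
    double read_cap = 8000.0;   /* hypothetical read cap in KB/s */

    /* Default ratios: hit=1, miss=10, dirty=20 */
    printf("defaults: dirty cap = %.0f KB/s\n",
           implied_kbps(read_cap, 10, 20));   /* 4000, half the read cap */

    /* Your ratios: hit=1, miss=10, dirty=1 */
    printf("yours:    dirty cap = %.0f KB/s\n",
           implied_kbps(read_cap, 10, 1));    /* 80000, 10X the read cap */
    return 0;
}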

I need to go back and consider each of the corner cases here, where
someone wants one of [hit,miss,dirty] to be an unusual value relative to
the rest. If I can't come up with a way to make that work as it does
now in the new code, that's a problem. I don't think it really is, it's
just that people in that situation will need to adjust all three upwards. It's
still a simpler thing to work out than the current situation, and this
is an unusual edge case.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Jim Nasby <jim(at)nasby(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-18 21:18:34
Message-ID: 74E74164-63C1-4B67-B304-73514537F4FE@nasby.net
Lists: pgsql-hackers

On Jan 18, 2012, at 3:49 AM, Greg Smith wrote:
> On 01/17/2012 09:00 PM, Jim Nasby wrote:
>> Could we expose both?
>>
>> On our systems writes are extremely cheap... we don't do a ton of them (relatively speaking), so they tend to just fit into BBU cache. Reads on the other hand are a lot more expensive, at least if they end up actually hitting disk. So we actually set page_dirty and page_hit the same.
>
> My thinking had been that you set one of these as the rate tunable, and then the rates of the others can be adjusted by advanced users using the ratio between the primary one and the others. So at the defaults:
>
> vacuum_cost_page_hit = 1
> vacuum_cost_page_miss = 10
> vacuum_cost_page_dirty = 20
>
> Setting a read rate cap will imply a write rate cap at 1/2 the value. Your setup would then be:
>
> vacuum_cost_page_hit = 1
> vacuum_cost_page_miss = 10
> vacuum_cost_page_dirty = 1
>
> Which would still work fine if the new tunable was a read cap. If the cap is a write one, though, this won't make any sense. It would allow reads to happen at 10X the speed of writes, which is weird.
>
> I need to go back and consider each of the corner cases here, where someone wants one of [hit,miss,dirty] to be an unusual value relative to the rest. If I can't come up with a way to make that work as it does now in the new code, that's a problem. I don't think it really is, it's just that people in that situation will need to adjust all three upwards. It's still a simpler thing to work out than the current situation, and this is an unusual edge case.

What about doing away with all the arbitrary numbers completely, and just state data rate limits for hit/miss/dirty?

BTW, this is a case where it would be damn handy to know if the miss was really a miss or not... in the case where we're already rate limiting vacuum, could we afford the cost of get_time_of_day() to see if a miss actually did have to come from disk?
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-19 18:10:39
Message-ID: CA+Tgmobt02R45oCc5YCLkqhrhe+-C8rTgK-zx5F0wkR8pZxTKg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 15, 2012 at 4:17 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> +1. I've been thinking we should do that for a long time, but haven't gotten
> around to it.
>
> I think it makes more sense to use the max read rate as the main knob,
> rather than write rate. That's because the max read rate is higher than the
> write rate, when you don't need to dirty pages. Or do you think saturating
> the I/O system with writes is so much bigger a problem than read I/O that it
> makes more sense to emphasize the writes?
>
> I was thinking of something like this, in postgresql.conf:
>
> # - Vacuum Throttling -
>
> #vacuum_cost_page_miss = 1.0            # measured on an arbitrary scale
> #vacuum_cost_page_dirty = 2.0           # same scale as above
> #vacuum_cost_page_hit = 0.1             # same scale as above
> #vacuum_rate_limit = 8MB                # max reads per second
>
> This is now similar to the cost settings for the planner, which is good.

I have to say that I find that intensely counterintuitive. The
current settings are not entirely easy to tune correctly, but at least
they're easy to explain. What does that 8MB mean and how does it
relate to vacuum_cost_page_miss? If I double vacuum_cost_page_miss,
does that effectively also double the cost limit, so that dirty pages
and hits become relatively cheaper? If so, then I think what that
really means is that the limit is 8MB only if there are no hits and no
dirtied pages - otherwise it's less, and the amount by which it is
less is the result of some arcane calculation. Ugh!

I can really imagine people wanting to limit two things here: either
they want to limit the amount of read I/O, or they want to limit the
amount of write I/O. If your database fits in physical memory you
probably don't care about the cost of page misses very much at all,
but you probably do care about how much data you dirty. OTOH, if your
database doesn't fit in physical memory and you have a relatively
small percentage of dirty pages because the tables are lightly
updated, dirtying might be pretty secondary; if you care at all, it's
going to be because busying the disk head with large sequential reads
eats up too much of the system's I/O capacity. If we added
vacuum_read_rate_limit and vacuum_dirty_rate_limit, totally
independently of each other, and threw the current system where
those two things get mixed together in one big bucket out the window
completely, I could maybe sign onto that as an improvement to the UI.
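
Very roughly, and purely as a sketch of the idea (the GUC names and all
of the code here are hypothetical), the accounting could look like two
independent budgets:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-second budgets in kilobytes, refilled once a second. */
static double read_budget_kb  = 8000.0;   /* vacuum_read_rate_limit */
static double dirty_budget_kb = 1000.0;   /* vacuum_dirty_rate_limit */

/* Charge one 8kB page against whichever budgets apply; return true if
 * vacuum should nap because either budget is exhausted. */
static bool charge_page(bool was_read, bool was_dirtied)
{
    if (was_read)
        read_budget_kb -= 8.0;
    if (was_dirtied)
        dirty_budget_kb -= 8.0;
    return read_budget_kb <= 0.0 || dirty_budget_kb <= 0.0;
}

int main(void)
{
    /* A dirtied page that was already cached only draws down the dirty
     * budget, so read-heavy and write-heavy vacuums hit different limits. */
    printf("nap? %d\n", charge_page(false, true));
    printf("nap? %d\n", charge_page(true, false));
    return 0;
}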

But even then, I think we need to balance the amount of the gain
against the backward compatibility problems we're going to create. If
we start removing autovacuum options, then, as Greg notes, we have to
figure out how to make old pg_dumps load into new databases and
hopefully do something close to what the DBA intended. And the DBA
will have to learn the new system. I'm not sure we're really going to
get enough mileage out of changing this to justify the hassle. It's
basically a cosmetic improvement, and I think we should be careful
about breaking compatibility for cosmetic improvements, especially at
the end of a release cycle when we're under time pressure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-19 19:05:36
Message-ID: CA+U5nML_cu_1xdka+Vi5oRHLxkRQL3Qez42c7hQg9LCoSPDDhg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 15, 2012 at 9:17 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> I think it makes more sense to use the max read rate as the main knob,
> rather than write rate. That's because the max read rate is higher than the
> write rate, when you don't need to dirty pages. Or do you think saturating
> the I/O system with writes is so much bigger a problem than read I/O that it
> makes more sense to emphasize the writes?

Yes, the writes are the more important of the two.

Too many writes at one time can overflow hardware caches, so things
tend to get much worse beyond a certain point.

Also, rate limiting writes means we limit the WAL rate too, which is
very important.

I'd like this to apply to large DDL, not just VACUUMs.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-19 19:15:12
Message-ID: 1327000472-sup-5934@alvh.no-ip.org
Lists: pgsql-hackers


Excerpts from Simon Riggs's message of Thu Jan 19 16:05:36 -0300 2012:
> On Sun, Jan 15, 2012 at 9:17 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
> > I think it makes more sense to use the max read rate as the main knob,
> > rather than write rate. That's because the max read rate is higher than the
> > write rate, when you don't need to dirty pages. Or do you think saturating
> > the I/O system with writes is so much bigger a problem than read I/O that it
> > makes more sense to emphasize the writes?
>
> Yes, the writes are more important of the two.
>
> Too many writes at one time can overflow hardware caches, so things
> tend to get much worse beyond a certain point.
>
> Also, rate limiting writes means we rate limit WAL rate also which is
> very important.
>
> I'd like this to apply to large DDL, not just VACUUMs.

More generally, this can sometimes be useful in general queries as well.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-19 22:23:52
Message-ID: 4F1897F8.103@2ndQuadrant.com
Lists: pgsql-hackers

On 1/18/12 4:18 PM, Jim Nasby wrote:
> What about doing away with all the arbitrary numbers completely, and just state data rate limits for hit/miss/dirty?

Since many workloads will have a mix of all three, it still seems like
there's some need for weighing these individually, even if they each got
their own rates. If someone says read=8MB/s and write=4MB/s (the
current effective defaults), I doubt they would be happy with seeing
12MB/s happen.

> BTW, this is a case where it would be damn handy to know if the miss was really a miss or not... in the case where we're already rate limiting vacuum, could we afford the cost of get_time_of_day() to see if a miss actually did have to come from disk?

We certainly might if it's a system where timing information is
reasonably cheap, and measuring that exact area will be easy if the
timing test contrib module submitted into this CF gets committed. I
could see using that to re-classify some misses as hits if the read
returns fast enough.

There's not an obvious way to draw that line though. The "fast=hit" vs.
"slow=miss" transition happens at very different place on SSD vs.
regular disks, as the simplest example. I don't see any way to wander
down this path that doesn't end up introducing multiple new GUCs, which
is the opposite of what I'd hoped to do--which was at worst to keep the
same number, but reduce how many were likely to be touched.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-19 22:39:41
Message-ID: 4F189BAD.60300@2ndQuadrant.com
Lists: pgsql-hackers

On 1/19/12 1:10 PM, Robert Haas wrote:
> I have to say that I find that intensely counterintuitive. The
> current settings are not entirely easy to tune correctly, but at least
> they're easy to explain.

I attempt to explain those settings to people in training classes about
once a month. It's never been anything but a complete disaster. I am
barely concerned about preserving the current UI because, as far as I've
been able to tell, there are only a handful of PostgreSQL installations
on the planet that have managed to use it happily. Even the ones that
do have a non-default setup that works usually flailed about for a few
frustrating months until they got there. And the result is settings few
dare touch for fear of breaking them.

It's also worth pointing out that VACUUM problems are very close to the
top of the list of problems larger sites run into. So right now we have
an inscrutable UI around an often essential part of the database to
tune, one that any production site that gets over a few hundred GB of
data in it will run into problems with. I wouldn't care about this area
if it weren't for people screaming about how bad it is every time the
topic comes up.

If there's anyone out there who has run a larger PostgreSQL database and
not at some point been extremely frustrated with how the current VACUUM
settings are controlled, please speak up and say I'm wrong about this.
I thought it was well understood that the UI was nearly unusably bad; it just
wasn't obvious what to do about it.

> What does that 8MB mean and how does it
> relate to vacuum_cost_page_miss? If I double vacuum_cost_page_miss,
> does that effectively also double the cost limit, so that dirty pages
> and hits become relatively cheaper? If so, then I think what that
> really means is that the limit is 8MB only if there are no hits and no
> dirtied pages - otherwise it's less, and the amount by which it is
> less is the result of some arcane calculation. Ugh!

Saying what I suggested is an arcane calculation strikes me as pretty
weird--we'd be hard pressed to design a more arcane calculation than the
one that's already happening.

The feedback here so far seems to lead toward making independent read
and write knobs. I'm going to chew on the scenarios Robert described
and the ones Jim has been commenting on and see if I can refactor this
into something friendlier that addresses them.

As for the suggestion that I'm bringing this up a bit late in the
release cycle, I've been trying. My first submission pushing in this
direction--improving the logging first, which is needed before you can
usefully measure a behavior change--happened back in September. I've
been moving this area as fast as I can get it to budge. I'm concerned
now that much will be made of improved performance in 9.2, leading to
people converting even larger systems than they used to. And it's not
hard at all to find a large system where inability to tune vacuum easily
is the top limiting factor on overall performance.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-20 02:42:52
Message-ID: CA+Tgmob9qYfMva79EugTKNCKnVDQjgGQvyovaWHWnN7Zr2SGYA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 19, 2012 at 5:39 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> On 1/19/12 1:10 PM, Robert Haas wrote:
>> I have to say that I find that intensely counterintuitive.  The
>> current settings are not entirely easy to tune correctly, but at least
>> they're easy to explain.
>
> I attempt to explain those settings to people in training classes about once
> a month.  It's never been anything but a complete disaster.  I am barely
> concerned about preserving the current UI because, as far as I've been able
> to tell, there are only a handful of PostgreSQL installations on the planet
> that have managed to use it happily.  Even the ones that do have a
> non-default setup that works usually flailed about for some time until they
> get something that works, over a few frustrating months.  And the result are
> settings few dare touch for fear of breaking it.
>
> It's also worth pointing out that VACUUM problems are very close to the top
> of the list of problems larger sites run into.  So right now we have an
> inscrutable UI around an often essential part of the database to tune, one
> that any production site that gets over a few hundred GB of data in it will
> run into problems with.  I wouldn't care about this area if it weren't for
> people screaming about how bad it is every time the topic comes up.
>
> If there's anyone out there who has run a larger PostgreSQL database and not
> at some point been extremely frustrated with how the current VACUUM settings
> are controlled, please speak up and say I'm wrong about this. I thought it
> was well understood the UI was near unusably bad, it just wasn't obvious
> what to do about it.
>
>> What does that 8MB mean and how does it
>> relate to vacuum_cost_page_miss?  If I double vacuum_cost_page_miss,
>> does that effectively also double the cost limit, so that dirty pages
>> and hits become relatively cheaper?  If so, then I think what that
>> really means is that the limit is 8MB only if there are no hits and no
>> dirtied pages - otherwise it's less, and the amount by which it is
>> less is the result of some arcane calculation.  Ugh!
>
> Saying what I suggested is an arcane calculation strikes me as pretty
> weird--we'd be hard pressed to design a more arcane calculation than the one
> that's already happening.

Perhaps so, but I'm willing to bet that if we have a variable that
looks like a pure read limit or a pure dirty limit and really is not,
we'll have succeeded. :-)

> The feedback here so far seems to lead toward making independent read and
> write knobs.  I'm going to chew on the scenarios Robert described and the
> ones Jim has been commenting on and see if I can refactor this into
> something friendlier that addresses them.
>
> As for the suggestion that I'm bringing this up a bit late in the release
> cycle, I've been trying.  My first submission pushing in this
> direction--improving the logging first, which is needed before you can
> usefully measure a behavior change--happened back in September.  I've been
> moving this area as fast as I can get it to budge.  I'm concerned now that
> much will be made of improved performance in 9.2, leading to people
> converting even larger systems than they used to.  And it's not hard at all
> to find a large system where inability to tune vacuum easily is the top
> limiting factor on overall performance.

I certainly didn't intend to come across as disparaging your work on
this topic. I understand that there are big problems with the way
things work now; I'm just cautious about trying to replace them too
hastily with something that may not turn out to be any better. Of
course, if we can replace it with something that we're sure is
actually an improvement, I'm all in favor of that. But, IMHO, the
problems in this area are too serious to be solved by renaming the
knobs. At most, we're going to buy ourselves a little time to come up
with a better solution.

IMHO, and at the risk of repeating myself, one of the big problems in
this area is that we're making the user guess something that we really
ought to be figuring out for them. Just as users want checkpoints to
run as slowly as possible while still not bumping into the next
checkpoint, they'd presumably like vacuum to run as slowly as possible
without bumping into the next vacuum. Instead, we make them tell us
how fast they'd like it to run, which requires them to guess a value
high enough to finish soon enough but low enough to minimize the
impact on the rest of the system.

Another problem is that the vacuum algorithm itself could, I think, be
made much smarter. We could teach HOT to prune pages that contain no
HOT chains but do contain dead tuples. That would leave dead line
pointers behind, but that's not nearly as bad as leaving the entire
tuple behind. We could, as Simon and others have suggested, have one
threshold for vacuuming the heap (i.e. reclaiming dead tuples) and
another for vacuuming the indexes (i.e. reclaiming dead line
pointers). That would open the door to partial vacuuming: just vacuum
half a gigabyte or so of the heap, and then move on; the next vacuum
can pick up where that one left off, at least up to the point where we
decide we need to make an index pass; it would possibly also allow us
to permit more than one vacuum on the same table at the same time,
which is probably needed for very large tables. We could have
backends that see dead tuples on a page throw them over the fence
to the background writer for immediate pruning. I blather, but I
guess my point is that I really hope we're going to do something
deeper here at some point in the near future, whatever becomes of the
proposals now on the table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-20 04:29:03
Message-ID: 4F18ED8F.2020809@2ndQuadrant.com
Lists: pgsql-hackers

I chewed a bit on Heikki's comment that similarity to the query planning
parameters might be useful, and Robert's that being able to explain how
the feature works more easily has value. I have an initial adjustment
of my general idea that I think moves usefully in both those directions.

The existing VACUUM cost constants look like this:

vacuum_cost_page_hit = 1
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20

These could be adjusted to instead be ratios like the query planner ones
(seq_page_cost, random_page_cost, etc.), referenced off a value of 1.0
for page miss ~= a read is expected:

vacuum_cost_page_hit = 0.1
vacuum_cost_page_miss = 1.0
vacuum_cost_page_dirty = 2.0

Now add in the new setting, which is explicitly said to be the read value:

vacuum_cost_read_limit = 8000 # maximum page miss read rate in
kilobytes/second

And I can shuffle the numbers around internally such that things still
work exactly the same, at the default parameters. And then anyone who
spends time learning how either the query planning or vacuum cost ratio
constants work will find the learning curve to pick up the other set easier.

An interesting fall-out of this refactoring is that old postgresql.conf
settings moved forward for *all* these values will still work fine. The
ratios are right and the internal computation won't care. The math is
just more complicated to explain when vacuum_cost_page_miss is anything
but 1.0, which is a problem the manual doesn't have to address. We
don't worry about making every query planner parameter discussion
consider what happens if someone moves seq_page_cost around, this will
put vacuum_cost_page_miss into the same reference constant category.
The only problem is for someone who changed one but not all of them in
their old configuration; that's going to give an unexpected result.

It might be a bit more straightforward yet if things were renamed so it
was more obvious that page miss~=read, but I haven't seen a good way to
do that yet. Renaming the reference cost value to vacuum_cost_page_read
has two problems. It makes the backward compatibility issues larger,
and it's not quite true. The way I think this should be explained, they
really aren't the same; that's why I used ~= above. A page miss is not
guaranteed to be a read, it just is expected to be one in the worst
case. The read rate that vacuum page misses introduce will not be
exactly the same as vacuum_cost_read_limit--but it will be below that
limit, which is all it claims to be.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-20 15:37:21
Message-ID: CA+TgmoZnzvmyBAzz2-BNYt+xPetH-gxYsBU7SoN_2h4KjdhbBg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 19, 2012 at 11:29 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> I chewed a bit on Heikki's comment that similarity to the query planning
> parameters might be useful, and Robert's that being able to explain how the
> feature works more easily has value.  I have an initial adjustment of my
> general idea that I think moves usefully in both those directions.
>
> The existing VACUUM cost constants look like this:
>
> vacuum_cost_page_hit = 1
> vacuum_cost_page_miss = 10
> vacuum_cost_page_dirty = 20
>
> These could be adjusted to instead be ratios like the query planner ones
> (seq_page_cost, random_page_cost, etc.), referenced off a value of 1.0 for
> page miss ~= a read is expected:
>
> vacuum_cost_page_hit = 0.1
> vacuum_cost_page_miss = 1.0
> vacuum_cost_page_dirty = 2.0
>
> Now add in the new setting, which is explicitly said to be the read value:
>
> vacuum_cost_read_limit = 8000 # maximum page miss read rate in
> kilobytes/second
>
> And I can shuffle the numbers around internally such that things still work
> exactly the same, at the default parameters.  And then anyone who spends
> time learning how either the query planning or vacuum cost ratio constants
> work will find the learning curve to pick up the other set easier.

That may be a little better, but I still don't think it's worth
breaking backward compatibility for. I mean, suppose I don't care
about read rate, but I want to limit my dirty data rate to 1MB/s.
What parameters should I set?

> It might be a bit more straightforward yet if things were renamed so it was
> more obvious that page miss~=read, but I haven't seen a good way to do that
> yet.  Renaming the reference cost value to vacuum_cost_page_read has two
> problems.  It makes the backward compatibility issues larger, and it's not
> quite true.  The way I think this should be explained, they really aren't
> the same; that's why I used ~= above.  A page miss is not guaranteed to be a
> read, it just is expected to be one in the worst case.  The read rate that
> vacuum page misses introduce will not be exactly the same as
> vacuum_cost_read_limit--but it will be below that limit, which is all it
> claims to be.

Maybe, but I still think having the read rate limit the dirty rate or
vice versa is *really* weird.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-20 17:35:38
Message-ID: 4F19A5EA.9050100@2ndQuadrant.com
Lists: pgsql-hackers

On 01/20/2012 10:37 AM, Robert Haas wrote:
> On Thu, Jan 19, 2012 at 11:29 PM, Greg Smith<greg(at)2ndquadrant(dot)com> wrote:
>> vacuum_cost_page_hit = 0.1
>> vacuum_cost_page_miss = 1.0
>> vacuum_cost_page_dirty = 2.0
>>
>> Now add in the new setting, which is explicitly said to be the read value:
>>
>> vacuum_cost_read_limit = 8000 # maximum page miss read rate in
>> kilobytes/second
> That may be a little better, but I still don't think it's worth
> breaking backward compatibility for. I mean, suppose I don't care
> about read rate, but I want to limit my dirty data rate to 1MB/s.
> What parameters should I set?

vacuum_cost_page_dirty = 8.0

The resulting maximum rates will then be:

hit = 80MB/s
miss = 8MB/s
dirty = 1MB/s
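
Checking those numbers with a quick arithmetic sketch (each category's
cap is the read limit scaled by the miss cost over that category's cost;
KB treated as 1/1000 of a MB to keep the numbers round):

#include <stdio.h>

int main(void)
{
    double read_limit = 8000.0;                  /* vacuum_cost_read_limit, KB/s */
    double hit = 0.1, miss = 1.0, dirty = 8.0;   /* cost ratios */

    printf("hit   = %.0f MB/s\n", read_limit * miss / hit   / 1000.0);  /* 80 */
    printf("miss  = %.0f MB/s\n", read_limit * miss / miss  / 1000.0);  /*  8 */
    printf("dirty = %.0f MB/s\n", read_limit * miss / dirty / 1000.0);  /*  1 */
    return 0;
}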

The question you should ask yourself next is "how do I limit my dirty data rate to 1MB/s in 9.1?" Working that out by hand is a good exercise, to show just how much less complicated this proposal is compared to the current state of things. Show me how it's possible to do that in a way we can expect new DBAs to follow, and then the idea of keeping strong backwards compatibility here would have some weight. I see sticking too closely to the current scheme as bug-level compatibility; it's fundamentally broken--too difficult for most people to use--in its current form.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-20 18:58:52
Message-ID: CA+TgmoaKV4TC_NGFo3tNVuAdr9ZiYmF+5v6Tpf9wjS4jdu-XiA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 20, 2012 at 12:35 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> On 01/20/2012 10:37 AM, Robert Haas wrote:
>>
>> On Thu, Jan 19, 2012 at 11:29 PM, Greg Smith<greg(at)2ndquadrant(dot)com>  wrote:
>>>
>>> vacuum_cost_page_hit = 0.1
>>>
>>> vacuum_cost_page_miss = 1.0
>>> vacuum_cost_page_dirty = 2.0
>>>
>>> Now add in the new setting, which is explicitly said to be the read
>>> value:
>>>
>>> vacuum_cost_read_limit = 8000 # maximum page miss read rate in
>>> kilobytes/second
>>
>> That may be a little better, but I still don't think it's worth
>>
>> breaking backward compatibility for.  I mean, suppose I don't care
>> about read rate, but I want to limit my dirty data rate to 1MB/s.
>> What parameters should I set?
>
> vacuum_cost_page_dirty = 8.0
>
> The resulting maximum rates will then be:
>
> hit = 80MB/s
> miss = 8MB/s
> dirty = 1MB/s

That is, of course, not quite what I asked for. In fact it's likely
that my actual rate will be less than 1MB/s, because there will
probably be a miss for every dirty page. It will probably be about 8/9ths
of a MB/s.

> The question you should ask yourself next is "how do I limit my dirty data
> rate to 1MB/s in 9.1?"  Working that out by hand is a good exercise, to show
> just how much less complicated this proposal is over the current state of
> things.

OK, sure. Our block size is 8kB, so we need every 128 blocks to
involve 1000 ms of delay. Obviously there are many combinations of
parameters that will make that work, but here's one: delay for 125
milliseconds after each 16 blocks:

vacuum_cost_page_hit = 0
vacuum_cost_page_miss = 0
vacuum_cost_page_dirty = 1
vacuum_cost_limit = 16
autovacuum_vacuum_cost_delay = 125ms
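
Spelled out, assuming 8kB blocks (just a sketch of the arithmetic):

#include <stdio.h>

int main(void)
{
    double cost_limit = 16.0;    /* vacuum_cost_limit */
    double dirty_cost = 1.0;     /* vacuum_cost_page_dirty */
    double delay_s    = 0.125;   /* autovacuum_vacuum_cost_delay */
    double block_kb   = 8.0;

    /* Only dirtied pages cost anything, so at most 16 pages per nap. */
    double pages_per_s = (cost_limit / dirty_cost) / delay_s;       /* 128 */
    printf("dirty rate = %.0f KB/s\n", pages_per_s * block_kb);     /* 1024 */
    return 0;
}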

Maybe that strikes you as worse than what you're proposing; it strikes
me as better. Either way I think it's not going to be a good day for
people who are bad at math. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jim Nasby <jim(at)nasby(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-21 22:54:07
Message-ID: 3CCA38FD-46EC-4B54-97CE-26A1CCBC2AA3@nasby.net
Lists: pgsql-hackers

On Jan 19, 2012, at 4:23 PM, Greg Smith wrote:
> On 1/18/12 4:18 PM, Jim Nasby wrote:
>> What about doing away with all the arbitrary numbers completely, and just state data rate limits for hit/miss/dirty?
>
> Since many workloads will have a mix of all three, it still seems like there's some need for weighing these individually, even if they each got their own rates. If someone says read=8MB/s and write=4MB/s (the current effective defaults), I doubt they would be happy with seeing 12MB/s happen.
>
>> BTW, this is a case where it would be damn handy to know if the miss was really a miss or not... in the case where we're already rate limiting vacuum, could we afford the cost of get_time_of_day() to see if a miss actually did have to come from disk?
>
> We certainly might if it's a system where timing information is reasonably cheap, and measuring that exact area will be easy if the timing test contrib module submitted into this CF gets committed. I could see using that to re-classify some misses as hits if the read returns fast enough.
>
> There's not an obvious way to draw that line though. The "fast=hit" vs. "slow=miss" transition happens at very different place on SSD vs. regular disks, as the simplest example. I don't see any way to wander down this path that doesn't end up introducing multiple new GUCs, which is the opposite of what I'd hoped to do--which was at worst to keep the same number, but reduce how many were likely to be touched.

Your two comments together made me realize something... at the end of the day people don't care about MB/s. They care about impact to other read and write activity in the database.

What would be interesting is if we could monitor how long all *foreground* IO requests took. If they start exceeding some number, that means the system is at or near full capacity, and we'd like background stuff to slow down.

Dealing with SSDs vs real media would be a bit challenging... though, I think it would only be an issue if the two were randomly mixed together. Kept separately I would expect them to have distinct behavior patterns that could be measured and identified.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-22 23:47:17
Message-ID: 4F1CA005.1040001@2ndquadrant.com
Lists: pgsql-hackers

Jim Nasby wrote:
> Your two comments together made me realize something... at the end of the day people don't care about MB/s. They care about impact to other read and write activity in the database.
>
> What would be interesting is if we could monitor how long all *foreground* IO requests took. If they start exceeding some number, that means the system is at or near full capacity, and we'd like background stuff to slow down.
>

My hope for 9.2 was to get VACUUM moved over into some human-readable
units. Having the whole thing work only via these abstract cost units
is driving most of my customers with larger databases crazy. The patch
I suggested was the easiest refactoring I thought moved in the right
direction. While it may not be the perfect thing to care about, the
very positive reaction I've gotten to the already landed patch to log in
MB/s has suggested to me people are a lot more comfortable with that
than the cost limit numbers.

For 9.3, this whole mess needs to become integrated with a full-system
monitoring approach, to really solve this well. pg_stat_bgwriter knows
how many writes are coming from the various parts of the system, the
total amount of write I/O. Given that, I can turn VACUUM completely
dynamic based on what else is happening in many common situations. The
sort of end goal I was thinking about was be able to say something like
"let VACUUM use up to 4MB/s on writes, but subtract off the average
write level of everything else". Now it's a background process running
only when there's capacity to spare for it. You could turn it up a lot
higher, if you knew it was only going to run at that level when the
system wasn't as busy. That's one reason I started by suggesting a
write-based limit; it fit into that longer-range plan better. Maybe
that idea is junk and focusing on actual read I/O is the real problem
with VACUUM for most people. I can tell you once I get more data out of
systems that are logging in MB/s.
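
The core of that idea is simple enough to sketch; everything below is
hypothetical, names included, just the shape of the feedback I have in
mind:

#include <stdio.h>

/* Give vacuum whatever is left of a total write budget after the recent
 * average write rate of everything else, and never go negative. */
static double vacuum_write_budget_kbps(double total_budget_kbps,
                                       double other_writes_kbps)
{
    double leftover = total_budget_kbps - other_writes_kbps;
    return leftover > 0.0 ? leftover : 0.0;
}

int main(void)
{
    /* Quiet system: vacuum gets nearly the whole 4MB/s. */
    printf("%.0f KB/s\n", vacuum_write_budget_kbps(4000.0, 300.0));
    /* Busy system: vacuum backs off to whatever remains. */
    printf("%.0f KB/s\n", vacuum_write_budget_kbps(4000.0, 3500.0));
    return 0;
}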

If instead or in addition we get some better field data on systems that
can afford to time a lot more things, and then start building feedback
limiters based on how long all sorts of operations take to occur, that's
a whole different parallel approach for auto-tuning this. I haven't
thought about that as much simply because it only just became clear
recently when the timing data is cheap to collect. I need to get a lot
more production server data about that overhead to work with here too.

> Dealing with SSDs vs real media would be a bit challenging... though, I think it would only be an issue if the two were randomly mixed together. Kept separately I would expect them to have distinct behavior patterns that could be measured and identified

This might just turn into another one of those things where we will
eventually need to have some more information on a per-tablespace
basis. I envision allowing the server to collect more timing data as
being something you can turn on for a bit, let it populate statistics
about just what fast or slow means for each tablespace. Then you can
keep those results around to guide future decisions even after timing is
turned off. Maybe toggle it back on one day a month to make sure the
numbers are still sane, if it's too expensive to time things every day.


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-23 05:29:27
Message-ID: CAFNqd5Xw2c=YRCpaVwHu9wtyVLpS7r10ikO+ihxipS51=K=2Pg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jan 21, 2012 at 5:54 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> What would be interesting is if we could monitor how long all *foreground* IO requests took. If they start exceeding some number, that means the system is at or near full capacity, and we'd like background stuff to slow down.

There's something to that...

On occasion, we've had a bit of "hue and cry" when sysadmins noticed
that nearly 100% of bandwidth was being chewed up by VACUUM.

A closer look showed that there wasn't a *true* problem, as there
wasn't anything else material looking for I/O. Letting VACUUM have
all the bandwidth it wants when the system isn't otherwise busy is a
pretty fine idea.

At such times, limiting VACUUM is counterproductive; it would be
better to let it go through and clean up as many tables in as short a
time as possible.

However, as soon as the number of processes waiting for I/O starts to
grow, you want VACUUM to back off.

That's an entirely more "dynamic" kind of dynamic than the suggested
parameterization indicates. There's enough there that I'm not yet sure
how to measure, so it may be necessary to start with the more static
indicator of "maximum acceptable bandwidth usage."

As long as the parameters can be changed without requiring a
postmaster restart, it should be possible to do something more
adaptive if and when measurements emerge.
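
Just to illustrate what that back-off could look like, here is a toy C
sketch that scales the vacuum delay with the number of processes blocked
on I/O, read from the Linux-specific "procs_blocked" counter in
/proc/stat. It is not a proposal for the actual mechanism, only the shape
of the feedback loop:

/* Toy sketch: stretch a base delay as I/O waiters pile up. */
#include <stdio.h>
#include <string.h>

static int
read_procs_blocked(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char  line[256];
    int   blocked = 0;

    if (f == NULL)
        return 0;
    while (fgets(line, sizeof(line), f) != NULL)
    {
        if (strncmp(line, "procs_blocked ", 14) == 0)
        {
            sscanf(line + 14, "%d", &blocked);
            break;
        }
    }
    fclose(f);
    return blocked;
}

int
main(void)
{
    int base_delay_ms = 20;
    int blocked = read_procs_blocked();

    /* Crude policy: add 10ms of extra sleep per blocked process. */
    printf("suggested vacuum delay: %d ms (procs_blocked=%d)\n",
           base_delay_ms + 10 * blocked, blocked);
    return 0;
}
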
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-23 08:21:48
Message-ID: 20120123082147.GS27895@ldn-qws-004.delacy.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 19/01/12 17:39, Greg Smith wrote:
> On 1/19/12 1:10 PM, Robert Haas wrote:
> >I have to say that I find that intensely counterintuitive. The
> >current settings are not entirely easy to tune correctly, but at least
> >they're easy to explain.
>
> If there's anyone out there who has run a larger PostgreSQL database
> and not at some point been extremely frustrated with how the current
> VACUUM settings are controlled, please speak up and say I'm wrong
> about this. I thought it was well understood the UI was near unusably
> bad, it just wasn't obvious what to do about it.
>
We are frustrated, but mostly our frustration is not about the
somewhat inscrutable knobs but about the inscrutable meters, or lack
thereof.

Postgres (auto or manual for that matter) vacuuming and analyzing
is essentially a performance tuning problem without a good way to
measure the current performance; the fact that the knobs to turn
are confusing as well is secondary.

What I think is missing is a clear way to know if you are vacuuming
(and analyzing) enough, and how much you are paying for that.

At the moment we are basically changing the knobs blindly based on
some back-of-the-envelope calculations and hearsay. Then sometimes
months later we find out that, oops, we haven't been analyzing enough
and that's why on that particular table the planner is now picking
a "bad" plan.

What I want is for that page

http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html

to start with "Here is how you know if you are vacuuming enough..."

In an ideal world one would like some meter in a statistics table
or similar that returns a percentage: 100% means just enough, 50%
means you have to double, 150% means 50% too much (e.g. wasted)...
But I could do with a boolean as well. A complicated extension,
or the recommendation to install 3 different extensions, would
be better than what is there right now, but only very barely. Of
course a meter wouldn't tell you whether you would still keep up
if traffic doubled; for that you need a complicated calculation
(or you just keep looking at the meter and adjust).

But at the moment there is no such meter (at least I don't know
of one) and that is the actual problem.

My 2cents,

Bene


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-24 05:09:47
Message-ID: 4F1E3D1B.9050801@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Benedikt Grundmann wrote:
> What I think is missing is a clear way to know if you are vacuuming
> (and analyzing) enough, and how much you are paying for that.
>

Any good way to measure if you're vacuuming a particular table enough
needs to note how much free space is in that table and its indexes over
time. That's why one of the other building blocks in the submission
queue for 9.2 is a function to make that easier to do. It's one of the
important dependencies to get settled before we can move very far toward
answering "am I vacuuming enough?".

A second piece to that is recording a history of that information over
time. Can't predict the future need for something without some record
of its past to extrapolate from. That's probably a job better suited
for an external tool. The way you'd want to audit it most easily is to
graph it over time, which isn't the sort of thing PostgreSQL is likely
to build in. Also, whether a suggested implementation of a vacuum
"meter" is actually useful would be easiest to validate that way. No
sense in doing the difficult work of building one until there's a
working prototype, which is possible to do more quickly in languages
other than C.

A simple meter might not be possible to create even with some better
building blocks to base it on. There is a lot of difference in this
area that is workload dependent, and there are many types of database
workloads out there. The two hardest systems to tune vacuum for that I
work on have settled on completely different approaches to the problem.
The only thing I've found so far that is true about both of them is that
they'd really appreciate easier controls on the maximum rate.

> At the moment we are basically changing the knobs blindly based on
> some back of the envelope calculations and hearsay. Than sometimes
> month later we find out that eps we haven't been analyzing enough
> and that's why on that particular table the planner is now picking
> a "bad" query.
>

Unlike VACUUM, ANALYZE is so cheap to run that it's possible to improve
this situation more easily--just do it a lot more. Lowering
autovacuum_analyze_scale_factor is the easiest way. By default that is
0.10, requiring approximately a 10% change in the table size before a
new ANALYZE is done. I think the lowest production setting I have for
that somewhere is 0.03, on a roughly terabyte-scale database. There, a
10% change in one of the larger tables is well past the point of
impacting query plans badly.

If your data changes its character quite frequently based on new
information, I wouldn't be afraid to drop as low as 0.01 here. That
would give you an ANALYZE that happens 10X as often as it does now.
You'll waste a moderate amount of CPU and disk resources, but a
tuning error that leans toward analyzing too frequently isn't that
expensive.
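
For reference, the trigger arithmetic behind that advice: autovacuum
launches an ANALYZE once the changed-tuple count passes
autovacuum_analyze_threshold plus autovacuum_analyze_scale_factor times
the table's row count, so dropping the factor from 0.10 to 0.01 fires it
at roughly a tenth of the churn. A quick worked example in C (the table
size is made up):

/* Worked example of the ANALYZE trigger threshold at different
 * scale factors; 100M rows is a hypothetical table size. */
#include <stdio.h>

int
main(void)
{
    double reltuples = 100000000.0;       /* hypothetical ~100M row table */
    int    base_threshold = 50;           /* autovacuum_analyze_threshold default */
    double factors[] = {0.10, 0.03, 0.01};

    for (int i = 0; i < 3; i++)
        printf("scale_factor=%.2f -> ANALYZE after ~%.0f changed tuples\n",
               factors[i], base_threshold + factors[i] * reltuples);
    return 0;
}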

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Robert Treat <rob(at)xzilla(dot)net>
To: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-24 06:16:33
Message-ID: CABV9wwPoxSTrmpXYR8kO0yMuAC9n8PqMDU8X2P2zV7noUm6vPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 23, 2012 at 3:21 AM, Benedikt Grundmann
<bgrundmann(at)janestreet(dot)com> wrote:
> On 19/01/12 17:39, Greg Smith wrote:
>> On 1/19/12 1:10 PM, Robert Haas wrote:
>> >I have to say that I find that intensely counterintuitive.  The
>> >current settings are not entirely easy to tune correctly, but at least
>> >they're easy to explain.
>>
>> If there's anyone out there who has run a larger PostgreSQL database
>> and not at some point been extremely frustrated with how the current
>> VACUUM settings are controlled, please speak up and say I'm wrong
>> about this. I thought it was well understood the UI was near unusably
>> bad, it just wasn't obvious what to do about it.
>>
> We are frustrated, but mostly our frustration is not about the
> somewhat inscrutable knobs but about the inscrutable meters, or lack
> thereof.
>

I keep thinking Greg has mistaken happiness with the MB-based info in
the vacuum patch for happiness with the output format, when really
it is all about increased visibility. (For the record, we've
backpatched that initial change to a large number of our customers,
just because we're a bit zealous about monitoring.)

> Postgres (auto or manual for that matter) vacuuming and analyzing
> is essentially a performance tuning problem without a good way to
> measure the current performance, the fact that the knobs to turn
> are confusing as well is secondary.
>
> What I think is missing is a clear way to know if you are vacuuming
> (and analyzing) enough, and how much you are paying for that.
>
> At the moment we are basically changing the knobs blindly based on
> some back-of-the-envelope calculations and hearsay.  Then sometimes
> months later we find out that, oops, we haven't been analyzing enough
> and that's why on that particular table the planner is now picking
> a "bad" plan.
>

Hmm, I've always thought the answer here is just a systematic approach
to operations. We monitor free space across the system (along with a
bunch of other stuff) so that we know when we're not vacuuming /
analyzing enough.

> What I want is that page
>
> http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html
>
> to start with "Here is how you know if you are vacuuming enough..."
>
> In an ideal world one would like some meter in a statistics table
> or similar that returns a percentage 100% means just enough 50%
> means you have to double 150% means 50% too much (e.g. wasted)...
> But I could do with a boolean as well.  A complicated extension
> and the recommendation to install 3 different extensions would
> be better than what is there right now but only very barely. Of
> course a meter wouldn't tell you that if traffic doubled you would
> still keep up and for that you need a complicated calculation or
> (you just keep looking at the meter and adjust).
>
> But at the moment there is no such meter (at least I don't know
> of it) and that is the actual problem.
>

These pieces are out there. I guess I'd say they are crude, but you
can get a handle on it. Of course, if your problem is with analyze,
that's cheap enough that you should probably just do it more. We're
probably a lot more aggressive on our vacuum / analyze scale settings
than some people (we cut the defaults in half as a matter of course),
and I come from the "don't limit stuff" camp too, but by and large
what we do works, even if it's more black magic than people would
like. :-)

Robert Treat
conjecture: xzilla.net
consulting: omniti.com


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-26 01:17:28
Message-ID: 4F20A9A8.7000106@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/23/2012 11:16 PM, Robert Treat wrote:
> I keep thinking Greg has mistaken happiness with the MB-based info in
> the vacuum patch for happiness with the output format, when really
> it is all about increased visibility.

I haven't taken that as anything but evidence I'm at least moving in the
right direction. I'm relatively systematic about how I approach these
things nowadays: figure out what isn't being logged/accounted for yet,
add visibility to that thing, iterate on improving its behavior as
measured by that, then make the best sorts of behavior changes
automatic. This suggested feature change, moving around what I see as
the worst part of the tuning knob UI, is an initial attempt along step 3
in that path. There's still plenty of ongoing work adding more
visibility too.

To more quickly summarize the point I was trying to make: providing a
meter that trivially shows you good vs. bad is almost the same problem
as making it fully automatic. If you can measure exactly when something
needs to happen and how badly screwed up it is, that's the hard part of
knowing when to just go fix it. Right now, the autovacuum measure is
based on the percent of change to something. I don't think any improved
vacuum triggering will go somewhere useful unless it starts with a
better measure of how real-world bloat accumulates, which you cannot
extract from the data collected yet. The free space measurement thread
and the ideas that have mainly been bouncing between Jaime and Noah
recently on this subject are directly addressing that, the part that
I've found to be the single most useful additional thing to monitor. It
may not be obvious that's providing information that can be consumed by
autovacuum, but that was my intention for how these pieces will
ultimately fit together.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-01-29 23:29:42
Message-ID: CAMkU=1yM2WxUx8TZe=2ZG-wfFwMHCh-OmeV3ZQOEBuymH4i06Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 15, 2012 at 12:24 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> If you then turn that equation around, making the maximum write rate the
> input, for any given cost delay and dirty page cost you can solve for the
> cost limit--the parameter in fictitious units everyone hates.  It works like
> this, with the computation internals logged every time they run for now:
>
> #vacuum_cost_rate_limit = 4000      # maximum write rate in kilobytes/second
> LOG:  cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20

The computation seems to be suffering from some kind of overflow:

cost limit=50 based on rate limit=1000 KB/s delay=20 dirty cost=20
cost limit=100 based on rate limit=2000 KB/s delay=20 dirty cost=20
cost limit=150 based on rate limit=3000 KB/s delay=20 dirty cost=20
cost limit=200 based on rate limit=4000 KB/s delay=20 dirty cost=20
cost limit=250 based on rate limit=5000 KB/s delay=20 dirty cost=20
cost limit=1 based on rate limit=6000 KB/s delay=20 dirty cost=20
cost limit=1 based on rate limit=7000 KB/s delay=20 dirty cost=20
cost limit=1 based on rate limit=8000 KB/s delay=20 dirty cost=20
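
For reference, the formula implied by these log lines is cost limit =
rate limit (KB/s) * delay (ms) * dirty cost / (1000 * block size in KB),
which should give 300 at 6000 KB/s rather than 1. One guess at the
cause--speculative, I haven't traced the patch--is an intermediate value
computed in bytes in 32-bit arithmetic, which crosses INT_MAX right
between 5000 and 6000 KB/s:

/* Sketch of the derivation, plus the hypothetical 32-bit overflow. */
#include <stdio.h>
#include <stdint.h>

#define BLCKSZ 8192

int
main(void)
{
    int delay_ms = 20;
    int dirty_cost = 20;

    for (int rate_kbps = 4000; rate_kbps <= 6000; rate_kbps += 1000)
    {
        int64_t limit = (int64_t) rate_kbps * delay_ms * dirty_cost
                        / (1000 * (BLCKSZ / 1024));
        int64_t bytes = (int64_t) rate_kbps * 1024 * delay_ms * dirty_cost;

        printf("rate=%d KB/s -> cost limit=%lld; bytes intermediate=%lld (%s)\n",
               rate_kbps, (long long) limit, (long long) bytes,
               bytes > INT32_MAX ? "overflows int32" : "fits in int32");
    }
    return 0;
}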

Cheers,

Jeff


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-02-02 16:32:28
Message-ID: CAMkU=1zCnqkp1oZqhXR-YQ+HUkTApW-=zR1Y4psbcUOFt7Y5mg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 15, 2012 at 12:24 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> So far the reaction I've gotten from my recent submission to make autovacuum
> log its read/write in MB/s has been rather positive.  I've been surprised at
> the unprecedented (to me at least) amount of backporting onto big production
> systems it's gotten.  There is a whole lot of pent up frustration among
> larger installs over not having good visibility into how changing cost-based
> vacuum parameters turns into real-world units.
>
> That got me thinking:  if MB/s is what everyone wants to monitor, can we
> provide a UI to set these parameters that way too?  The attached patch is a
> bit rough still, but it does that.  The key was recognizing that the cost
> delay plus cost limit can be converted into an upper limit on cost units per
> second, presuming the writes themselves are free.  If you then also assume
> the worst case--that everything will end up dirty--by throwing in the block
> size, too,  you compute a maximum rate in MB/s.  That represents the fastest
> you can possibly write.

Since this is mostly a usability patch, I was looking at the
documentation, trying to pretend I'm an end user who hasn't seen the
sausage being made.

I think the doc changes are too conservative. "When using cost-based
vacuuming" should be something like "When using rate-limited
vacuuming".

What does the "cost" mean in the variable "vacuum_cost_rate_limit"?
It seems to be genuflection to the past--the past we think is too
confusing. Since this is the primary knob the end user is expected to
use, the fact that we use these "cost things" to implement
rate-limited vacuuming is a detail that should not be reflected in the
variable name, so "vacuum_rate_limit" seems better. Leave the cost
stuff to the advanced users who want to read beyond the primary knob.

Whether I want to rate-limit the vacuum at all should be determined by
vacuum_rate_limit instead of by setting vacuum_cost_delay.
vacuum_rate_limit=0 should mean unlimited. I think it is pretty
intuitive that, in cases where a literal 0 makes no sense, then 0
really means infinity, and that convention is used in other places.

I think it is confusing to have more variables than there are degrees
of freedom. If we want one knob which is largely writes but mixes in
some reads and simple page visits as well, then I think
vacuum_cost_page_dirty should go away (effectively be fixed at 1.0),
and the vacuum_cost_page_miss would default to 0.5 and
vacuum_cost_page_hit default to 0.05.

Also, in the current patch, in addition to the overflow at high rate
limits, there is a rounding-to-zero at low rate limits that leads to
floating-point exceptions.

PST:LOG: cost limit=0 based on rate limit=10 KB/s delay=20 dirty cost=20
PST:STATEMENT: VACUUM VERBOSE t;
PST:ERROR: floating-point exception
PST:DETAIL: An invalid floating-point operation was signaled. This
probably means an out-of-range result or an invalid operation, such as
division by zero.
PST:STATEMENT: VACUUM VERBOSE t;
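
Whatever the exact fix, a clamp on the derived value seems like cheap
insurance--just a sketch, not the patch code, using the 1..10000 range
the existing vacuum_cost_limit GUC already enforces:

/* Sketch: clamp the derived cost limit so extreme rate limits can't
 * produce zero (and a later division by zero) or a huge value. */
#include <stdio.h>

static int
clamp_cost_limit(long computed)
{
    if (computed < 1)
        return 1;           /* the rounding-to-zero case above */
    if (computed > 10000)
        return 10000;       /* existing vacuum_cost_limit ceiling */
    return (int) computed;
}

int
main(void)
{
    printf("%d %d %d\n",
           clamp_cost_limit(0), clamp_cost_limit(300), clamp_cost_limit(123456));
    return 0;
}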

Cheers,

Jeff


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-02-08 03:38:13
Message-ID: 20120208033813.GA24440@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 19, 2012 at 05:39:41PM -0500, Greg Smith wrote:
> On 1/19/12 1:10 PM, Robert Haas wrote:
> >I have to say that I find that intensely counterintuitive. The
> >current settings are not entirely easy to tune correctly, but at least
> >they're easy to explain.
>
> I attempt to explain those settings to people in training classes
> about once a month. It's never been anything but a complete
> disaster.

I tell students you will not have problems with these settings unless
you change them. ;-)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-02-08 03:58:58
Message-ID: 20120208035858.GB24440@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 19, 2012 at 09:42:52PM -0500, Robert Haas wrote:
> I certainly didn't intend to come across as disparaging your work on
> this topic. I understand that there are big problems with the way
> things work now; I'm just cautious about trying to replace them too
> hastily with something that may not turn out to be any better. Of
> course, if we can replace it with something that we're sure is
> actually an improvement, I'm all in favor of that. But, IMHO, the
> problems in this area are too serious to be solved by renaming the
> knobs. At most, we're going to buy ourselves a little time to come up
> with a better solution.
>
> IMHO, and at the risk of repeating myself, one of the big problems in
> this area is that we're making the user guess something that we really
> ought to be figuring out for them. Just as users want checkpoints to
> run as slowly as possible while still not bumping into the next
> checkpoint, they'd presumably like vacuum to run as slowly as possible
> without bumping into the next vacuum. Instead, we make them tell us
> how fast they'd like it to run, which requires them to guess a value
> high enough to finish soon enough but low enough to minimize the
> impact on the rest of the system.
>
> Another problem is that the vacuum algorithm itself could, I think, be
> made much smarter. We could teach HOT to prune pages that contain no
> HOT chains but do contain dead tuples. That would leave dead line
> pointers behind, but that's not nearly as bad as leaving the entire
> tuple behind. We could, as Simon and others have suggested, have one
> threshold for vacuuming the heap (i.e. reclaiming dead tuples) and
> another for vacuuming the indexes (i.e. reclaiming dead line
> pointers). That would open the door to partial vacuuming: just vacuum
> half a gigabyte or so of the heap, and then move on; the next vacuum
> can pick up where that one left off, at least up to the point where we
> decide we need to make an index pass; it would possibly also allow us
> to permit more than one vacuum on the same table at the same time,
> which is probably needed for very large tables. We could have
> backends that see dead tuples on a page throw them over the fence
> to the background writer for immediate pruning. I blather, but I
> guess my point is that I really hope we're going to do something
> deeper here at some point in the near future, whatever becomes of the
> proposals now on the table.

As much as I hate to poo-poo a patch addition, I have to agree with
Robert Haas on this one. Renaming settings really isn't moving us
forward. It introduces a migration problem and really doesn't move us
forward in solving the underlying problem. Additional monitoring, while
helpful, also is only a stop-gap.

Only a small number of sites are going to monitor auto-vacuum/analyze.
Let's not start writing Postgres for those super-busy sites with a team
of administrators --- while they are important, they are not the
majority of our user-base, and we can pride ourselves that Postgres runs
pretty well without a team of admins. We don't want to get into a case
where our super-visible, high-volume folks are overly setting the
project direction.

If we look at checkpoint smoothing, that was solved the right way with
a setting that worked automatically for everyone.

Now, I don't know if the solution is to time read/write duration to see
how busy the system is, or to look at the statistics to see how
backlogged the autovacuum system is when it gets time to actually
process a table, but those are the questions we should be asking here.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-02-08 14:38:20
Message-ID: 1328711414-sup-5164@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Bruce Momjian's message of mié feb 08 00:58:58 -0300 2012:

> As much as I hate to poo-poo a patch addition, I have to agree with
> Robert Haas on this one. Renaming settings really isn't moving us
> forward. It introduces a migration problem and really doesn't move us
> forward in solving the underlying problem. Additional monitoring, while
> helpful, also is only a stop-gap.

I think that (part of) the underlying problem is that we have no clear
way to specify "how much I/O do you want autovacuum to use". That's
what this patch is all about, AFAIU; it has nothing to do with
monitoring. Right now, as has been said, the only way to tweak this is
to change vacuum_cost_delay; the problem with that setting is that
making the calculation is not straightforward.

(Now, I disagree that it's so complex that it cannot ever be explained to
a class; or that it's so obscure that the only way to make it work is to
leave it alone and never touch it. It's complex, okay, but it's not
exactly rocket science either.)

If the only real downside to this patch is that some people have already
changed vacuum_cost_delay and they will want to migrate those settings
forward, maybe we shouldn't be looking at _replacing_ that one with a
new setting, but rather just add the new setting; and in the code for
each, make sure that only one of them is set, and throw an error if the
other one is.

> On Thu, Jan 19, 2012 at 09:42:52PM -0500, Robert Haas wrote:

> > Another problem is that the vacuum algorithm itself could, I think, be
> > made much smarter. We could teach HOT to prune pages that contain no
> > HOT chains but do contain dead tuples. That would leave dead line
> > pointers behind, but that's not nearly as bad as leaving the entire
> > tuple behind. We could, as Simon and others have suggested, have one
> > threshold for vacuuming the heap (i.e. reclaiming dead tuples) and
> > another for vacuuming the indexes (i.e. reclaiming dead line
> > pointers). That would open the door to partial vacuuming: just vacuum
> > half a gigabyte or so of the heap, and then move on; the next vacuum
> > can pick up where that one left off, at least up to the point where we
> > decide we need to make an index pass; it would possibly also allow us
> > to permit more than one vacuum on the same table at the same time,
> > which is probably needed for very large tables. We could have
> > backends that see dead tuples on a page throw them over the fence
> > to the background writer for immediate pruning. I blather, but I
> > guess my point is that I really hope we're going to do something
> > deeper here at some point in the near future, whatever becomes of the
> > proposals now on the table.

This is all fine, but what does it have to do with the current patch? I
mean, if we change vacuum to do some stuff differently, it's still going
to have to read and dirty pages and thus account for I/O.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-02-08 14:56:17
Message-ID: CA+TgmoZj3kt5bpF=O6Bg4f49qa6kwFcvLG+tqrfkcD2M44-Xiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 8, 2012 at 9:38 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> I think that (part of) the underlying problem is that we have no clear
> way to specify "how much I/O do you want autovacuum to use".  That's
> what this patch is all about, AFAIU; it has nothing to do with
> monitoring.  Right now, as has been said, the only way to tweak this is
> to change vacuum_cost_delay; the problem with that setting is that
> making the calculation is not straightforward.

I agree.

> (Now, I disagree that it's so complex that it cannot ever be explained to
> a class; or that it's so obscure that the only way to make it work is to
> leave it alone and never touch it.  It's complex, okay, but it's not
> exactly rocket science either.)

I emphatically agree.

> If the only real downside to this patch is that some people have already
> changed vacuum_cost_delay and they will want to migrate those settings
> forward, maybe we shouldn't be looking at _replacing_ that one with a
> new setting, but rather just add the new setting; and in the code for
> each, make sure that only one of them is set, and throw an error if the
> other one is.

I think that part of the confusion here is that the current settings
aren't strictly trying to regulate vacuum's I/O rate so much as the
total amount of work it can do per unit time. Even if autovac dirties
no pages and has no misses, the pages it hits will eventually fill up
the cost limit and it will sleep; but the work in that case is all CPU
utilization, not I/O. Similarly, we don't have separate control knobs
for read I/O (misses) and write I/O (dirty pages). The sleeps happen
due to a blend of factors (CPU, read I/O, write I/O) which are
combined to produce an estimate of the total impact of vacuum on the
rest of the system.

Now, an argument could be made that we ought not to care about that.
We could say: trying to use this blended algorithm to figure out when
vacuum is getting too expensive is a loser. We just want to limit the
amount of dirty data that's being generated, because that's the only
thing that matters to us. In that case, I could see adding a new
setting, sitting alongside the existing settings: amount of dirty data
that can be generated per second. But that's not what Greg is
proposing. He is proposing, essentially, that we keep the blended
algorithm, but then stack another calculation on top that backs into
the blended cost limit based on the user's tolerance for dirtying
data. So there will still be limits on the amount of read I/O and CPU
usage, but they'll be derived from the allowable rate of data dirtying
and the ratios between the different cost parameters. The current
settings aren't exactly intuitive, but I really don't want to have to
explain to a customer that setting the dirty data limit to 8MB/s will
actually limit it to just 5.3MB/s if all the pages are not in shared
buffers (because the page miss costs will account for a third of the
budget) and to 7.6MB/s if all the pages are in shared buffers (because
the page hit costs will account for a twenty-first of the budget) and
somewhere in between if only some of the pages are resident; and that,
further, by setting the dirty data rate to 8MB/s, they've implicitly
set the max read rate from disk at 16MB/s, again because of the 2:1
dirty:miss cost ratio. Yikes!
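
To spell out where those numbers come from--a quick back-of-the-envelope,
assuming the default cost parameters (hit=1, miss=10, dirty=20) and 8KB
pages, not anything from the patch:

/* Back-of-the-envelope: what an 8MB/s dirty-data limit turns into
 * once the blended hit/miss costs consume part of the budget. */
#include <stdio.h>

int
main(void)
{
    double dirty_limit_mbps = 8.0;
    double pages_per_mb = 1024.0 / 8.0;                     /* 8KB pages */
    double budget = dirty_limit_mbps * pages_per_mb * 20;   /* cost units/sec */

    double all_miss_pages = budget / (20 + 10);   /* read from disk, then dirtied */
    double all_hit_pages  = budget / (20 + 1);    /* already in shared buffers */

    printf("all pages missed: %.1f MB/s dirtied (and the same amount read)\n",
           all_miss_pages / pages_per_mb);
    printf("all pages hit:    %.1f MB/s dirtied\n",
           all_hit_pages / pages_per_mb);
    printf("budget spent purely on misses: %.1f MB/s read\n",
           (budget / 10) / pages_per_mb);
    return 0;
}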

Honestly, I think the best place for this work is in something like
pgtune. It's totally useful to have a calculator for this stuff (for
many of the same reasons that it's useful to have explain.depesz.com)
but the abstraction being proposed is leaky enough that I think it's
bound to cause confusion.

> This is all fine, but what does it have to do with the current patch?  I
> mean, if we change vacuum to do some stuff differently, it's still going
> to have to read and dirty pages and thus account for I/O.

Yeah, I drifted off topic there a bit. I think the only relevant
point in all that is that even if we all agreed that this is an
improvement, I'd be reluctant to slap a band-aid on something that I
think needs surgery.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-02-08 16:22:21
Message-ID: 20120208162221.GG24440@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 08, 2012 at 09:56:17AM -0500, Robert Haas wrote:
> > This is all fine, but what does it have to do with the current patch?  I
> > mean, if we change vacuum to do some stuff differently, it's still going
> > to have to read and dirty pages and thus account for I/O.
>
> Yeah, I drifted off topic there a bit. I think the only relevant
> point in all that is that even if we all agreed that this is an
> improvement, I'd be reluctant to slap a band-aid on something that I
> think needs surgery.

Agreed. I see this only as a short-term fix that will be changed in the
next release if we really decide to tackle the problem. What percentage
of our userbase is ever going to adjust these settings --- <1%, for
sure.

What we have now just isn't cutting it for 99% of our users, and we need
to address that if we are going to ever make any real headway here.

Why can't vacuum handle things automatically like checkpoint smoothing?
Why can't it detect when it is falling behind and speed up? Why can't
it see a busy background writer and slow down? Unless we answer
these questions, we are not solving the problem for 99% of our users.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-02-08 18:02:16
Message-ID: CA+TgmoaiKeUdLrL=C1gs5JMHyoxVngN9S5X3t_Djz5CbdPtKCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 8, 2012 at 11:22 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> What we have now just isn't cutting it for 99% of our users, and we need
> to address that if we are going to ever make any real headway here.
>
> Why can't vacuum handle things automatically like checkpoint smoothing?
> Why can't it detect when it is falling behind and speed up?  Why can't
> it see a busy background writer and slow down?   Unless we answer
> these questions, we are not solving the problem for 99% of our users.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-02-09 03:41:43
Message-ID: 4F334077.3060703@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/08/2012 11:22 AM, Bruce Momjian wrote:
> Why can't vacuum handle things automatically like checkpoint smoothing?
> Why can't it detect when it is falling behind and speed up? Why can't
> it see a busy background writer and slow down? Unless we answer
> these questions, we are not solving the problem for 99% of our users.

Scheduling VACUUM is a much harder problem than checkpoint smoothing.
Checkpoints can define "ahead" and "behind" simply; defining what those
terms mean for vacuum scheduling across a set of tables is tricky. A
possible downside to not vacuuming hard enough is that you'll have
tables reach their thresholds faster than you're cleaning them up.
That's a bad event in the future, caused by not doing the right thing now.
Predicting when that will happen is a whole different class of problem
than tracking whether checkpoints are running on schedule.

I have a design sketched out for something that adjusts the vacuum rate
based on background writer activity. I'm not optimistic that will ever
be automatic to the level you'd like here either. That looks like it
will be prone to feedback loops where autovacuum can starve forever. If
it ever lags enough that commonly accessed tables have become
inefficiently stored, they will then generate more buffer I/O to access,
and thus continue to block future vacuum work. That's the exact
situation where vacuum is most needed, even though it seems too
expensive to do. That's one trap people already fall into when doing
this by hand. Everyone who's ever said "autovacuum uses too much I/O
when it pops up unexpectedly during the day, I'm going to turn that off"
has seen how that works out. Hilarity ensues when the inevitable and
gigantic wraparound vacuums happen in the middle of the day instead.

Just trying to set the expectations bar realistically here. I don't
consider the easier problem of checkpoint smoothing a solved one,
either. Saying autovacuum needs to reach even that level of automation
to be a useful improvement over now is a slippery goal. Regardless, the
simple idea I submitted to this CF is clearly dead for now. I'll take
the feedback of "this level of change can live in a user-side tuning
tool" and do that instead. Since I was already thinking in the
direction of background activity monitoring, I have a good idea how I'd
need to approach this next, to be more likely to gain community buy-in
as an automated improvement. That's a longer term project though, which
I'll hopefully be able to revisit for 9.3.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum rate limit in KBps
Date: 2012-02-09 14:42:21
Message-ID: 20120209144221.GA15011@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 08, 2012 at 10:41:43PM -0500, Greg Smith wrote:
> Just trying to set the expectations bar realistically here. I don't
> consider the easier problem of checkpoint smoothing a solved one,
> either. Saying autovacuum needs to reach even that level of
> automation to be a useful improvement over now is a slippery goal.
> Regardless, the simple idea I submitted to this CF is clearly dead
> for now. I'll take the feedback of "this level of change can live
> in a user-side tuning tool" and do that instead. Since I was
> already thinking in the direction of background activity monitoring,
> I have a good idea how I'd need to approach this next, to be more
> likely to gain community buy-in as an automated improvement. That's
> a longer term project though, which I'll hopefully be able to
> revisit for 9.3.

Totally agree. If it is hard for us, it is super-hard for admins to set
this, so we had better give it serious thought.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +