Re: limiting hint bit I/O

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: limiting hint bit I/O
Date: 2011-02-07 19:02:54
Message-ID: AANLkTi==bJdFbNr0GgHXX-paT1f4_SudX4s18OPuW86q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2011/2/7 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Feb 7, 2011 at 10:48 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> Robert Haas wrote:
>>> On Sat, Feb 5, 2011 at 4:31 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> > Uh, in this C comment:
>>> >
>>> > + ? ? ? ?* or not we want to take the time to write it. ?We allow up to 5% of
>>> > + ? ? ? ?* otherwise-not-dirty pages to be written due to hint bit changes,
>>> >
>>> > 5% of what? ?5% of all buffers? ?5% of all hint-bit-dirty ones? ?Can you
>>> > clarify this in the patch?
>>>
>>> 5% of buffers that are hint-bit-dirty but not otherwise dirty.  ISTM
>>> that's exactly what the comment you just quoted says on its face, but
>>> I'm open to some other wording you want to propose.
>>
>> How about:
>>
>>        otherwise-not-dirty -> only-hint-bit-dirty
>>
>> So 95% of your hint bit modificates are discarded if the pages is not
>> otherwise dirtied?  That seems pretty radical.
>
> No, it's more subtle than that, although I admit it *is* radical.
> There are three ways that pages can get written out to disk:
>
> 1. Checkpoints.
> 2. Background writer activity.
> 3. Backends writing out dirty buffers because there are no clean
> buffers available to allocate.
>
> What the latest version of the patch implements is:
>
> 1. Checkpoints no longer write only-hint-bit-dirty pages to disk.
> Since a checkpoint doesn't evict pages from memory, the hint bits are
> still there to be written out (or not) by (2) or (3), below.
>
> 2. When the background writer's cleaning scan hits an
> only-hint-bit-dirty page, it writes it, same as before.  This
> definitely doesn't result in the loss of any hint bits.
>
> 3. When a backend writes out a dirty buffer itself, because there are
> no clean buffers available to allocate, it initially writes them.  But
> if there are more than 100 such pages per block of 2000 allocations,
> it recycles any after the first 100 without writing them.
>
> In normal operation, I suspect that there will be very little impact
> from this change.  The change described in #1 may slightly reduce the
> size of some checkpoints, but it's unclear that it will be enough to
> be material.  The change described in #3 will probably also not
> matter, because, in a well-tuned system, the background writer should
> be set aggressively enough to provide a supply of clean pages, and
> therefore backends shouldn't be doing many writes themselves, and
> therefore most buffer allocations will be of already-clean pages, and
> the logic described in #3 will probably never kick in.  Even if they
> are writing a lot of buffers themselves, the logic in #3 still won't
> kick in if many of the pages being written are actually dirty - it
> will only matter if the backends are writing out lots and lots of
> pages *solely because they are only-hint-bit-dirty*.
>
> Where I expect this to make a big difference is on sequential scans of
> just-loaded tables.  In that case, the BufferAccessStrategy machinery
> will force the backend to reuse the same buffers over and over again,
> and all of those pages will be only-hint-bit-dirty.  So the backend
> has to do a write for every page it allocates, and even though those
> writes are being absorbed by the OS cache, it's still slow.  With this
> patch, what will happen is that the backend will write about 100
> pages, then perform the next 1900 allocations without writing, then
> write another 100 pages, etc.  So at the end of the scan, instead of
> having written an amount of data equal to the size of the table, we
> will have written 5% of that amount, and 5% of the hint bits will be
> on disk.  Each subsequent scan will get another 5% of the hint bits on
> disk until after 20 scans they are all set.  So the work of setting
> the hint bits is spread out across the first 20 table scans instead of
> all being done the first time through.
>
> Clearly, there's further jiggering that can be done here.  But the
> overall goal is simply that some of our users don't seem to like it
> when the first scan of a newly loaded table generates a huge storm of
> *write* traffic.  Given that the hint bits appear to be quite
> important from a performance perspective (see benchmark numbers
> upthread),

those are not real benchmarks, just quick guess to check behavior.
(and I agree it looks good, but I also got inconsistent results, the
patched postgresql hardly reach the same speed of the original
9.1devel even after 200 hundreds select of your testcase)

> we don't really have the option of just not writing them -
> but we can try to not to do it all at once, if we think that's an
> improvement, which I think is likely.
>
> Overall, I'm inclined to move this patch to the next CommitFest and
> forget about it for now.  I don't think we're going to get enough
> testing of this in the next week to be really confident that it's
> right.  I might be willing to commit with some more moderate amount of
> testing if we were right at the beginning of a development cycle,
> figuring that we'd shake out any warts as the cycle went along, but
> this isn't seeming like the right time for this kind of a change.

I agree.
I think it might be better to do the hint_bit_allowance decrement when
we write something (dirty or dirtyhint).
And so we can have something like :

100% writte : write dirty + hint
5 % write : write 5 % of (dirty + hint) (instead of write 5% of the hint only).

So come a simple Bandwith/IOrequest limiter.
Open for next commitfest :)

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Cédric Villemain 2011-02-07 19:04:31 Re: limiting hint bit I/O
Previous Message Dimitri Fontaine 2011-02-07 18:58:30 Re: More extension issues: ownership and search_path