From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Gregory Stark <stark(at)enterprisedb(dot)com>, Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Proposal of PITR performance improvement for 8.4. |
Date: | 2008-10-29 08:32:34 |
Message-ID: | 1225269154.3971.278.camel@ebony.2ndQuadrant |
Lists: | pgsql-hackers |
On Tue, 2008-10-28 at 14:21 +0200, Heikki Linnakangas wrote:
> 1. You should avoid useless posix_fadvise() calls. In the naive
> implementation, where you simply call posix_fadvise() for every page
> referenced in every WAL record, you'll do 1-2 posix_fadvise() syscalls
> per WAL record, and that's a lot of overhead. We face the same design
> question as with Greg's patch to use posix_fadvise() to prefetch index
> and bitmap scans: what should the interface to the buffer manager look
> like? The simplest approach would be a new function call like
> AdviseBuffer(Relation, BlockNumber), that calls posix_fadvise() for the
> page if it's not in the buffer cache, but is a no-op otherwise. But that
> means more overhead, since for every page access, we need to find the
> page twice in the buffer cache; once for the AdviseBuffer() call, and
> 2nd time for the actual ReadBuffer().
That's a much smaller overhead than waiting for an I/O. The CPU overhead
isn't really a problem if we're I/O bound.
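A minimal sketch of the AdviseBuffer() idea discussed above, assuming a toy direct-mapped cache table in place of the real buffer manager. The names advise_buffer(), block_is_cached(), cache_block() and CACHE_SLOTS are hypothetical, and a file descriptor stands in for the Relation argument; only posix_fadvise() and POSIX_FADV_WILLNEED are real interfaces.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_fadvise() */
#endif
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define BLCKSZ      8192   /* PostgreSQL's default page size */
#define CACHE_SLOTS 64     /* toy cache size (assumption) */

typedef unsigned int BlockNumber;

/* Toy stand-in for the buffer cache lookup table (hypothetical). */
static BlockNumber cached_block[CACHE_SLOTS];
static bool        slot_used[CACHE_SLOTS];

/* First cache lookup: is the page already in the (toy) cache? */
bool block_is_cached(BlockNumber blkno)
{
    size_t slot = blkno % CACHE_SLOTS;
    return slot_used[slot] && cached_block[slot] == blkno;
}

/* Record a page as cached, as a later ReadBuffer() would. */
void cache_block(BlockNumber blkno)
{
    size_t slot = blkno % CACHE_SLOTS;
    cached_block[slot] = blkno;
    slot_used[slot] = true;
}

/*
 * Hint the kernel that a page will be needed soon, unless it is already
 * cached. Returns true if a hint was issued, false if this was a no-op.
 * The hint is advisory, so the posix_fadvise() result is ignored.
 */
bool advise_buffer(int fd, BlockNumber blkno)
{
    if (block_is_cached(blkno))
        return false;               /* no syscall for cached pages */

    (void) posix_fadvise(fd, (off_t) blkno * BLCKSZ, BLCKSZ,
                         POSIX_FADV_WILLNEED);
    return true;
}
```

The double lookup Heikki mentions shows up here as block_is_cached() being called once by advise_buffer() and again, in effect, when the page is actually read. The sketch accepts that cost in exchange for skipping the syscall on cached pages.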
> It would be more efficient to pin
> the buffer in the AdviseBuffer() call already, but that requires much
> more changes to the callers.
That would be hard to clean up safely, plus we'd have difficulty with
timing: is there enough buffer space to allow all the prefetched blocks
to live in cache at once? If not, this approach would cause problems.
> 2. The format of each WAL record is different, so you need a "readahead
> handler" for every resource manager, for every record type. It would be
> a lot simpler if there was a standardized way to store that information
> in the WAL records.
I would prefer a new rmgr API call that returns a list of blocks. That's
better than trying to make everything fit one pattern. If an rmgr doesn't
provide the call, then its records won't get prefetched.
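One way such an optional rmgr call could look, as a rough sketch only. The WalRecord layout, the rm_blocks callback, rmgr_table and blocks_to_prefetch() are all hypothetical and much simpler than PostgreSQL's real structures; the point is just that a NULL callback means "no prefetch for this rmgr".

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int BlockNumber;

/* Hypothetical, heavily simplified WAL record. */
typedef struct WalRecord
{
    unsigned char rmid;        /* index into rmgr_table */
    BlockNumber   blocks[2];   /* pages this record touches */
    int           nblocks;
} WalRecord;

/* Optional per-rmgr callback: copy referenced blocks into out[]. */
typedef int (*rm_blocks_fn) (const WalRecord *rec,
                             BlockNumber *out, int max_out);

typedef struct RmgrEntry
{
    const char  *name;
    rm_blocks_fn rm_blocks;    /* NULL: this rmgr gets no prefetch */
} RmgrEntry;

/* Example callback for a heap-like rmgr (illustrative only). */
int heap_blocks(const WalRecord *rec, BlockNumber *out, int max_out)
{
    int n = rec->nblocks < max_out ? rec->nblocks : max_out;

    if (n < 0)
        n = 0;
    for (int i = 0; i < n; i++)
        out[i] = rec->blocks[i];
    return n;
}

const RmgrEntry rmgr_table[] = {
    {"Heap", heap_blocks},
    {"XLOG", NULL},            /* no callback provided */
};

#define N_RMGRS ((int) (sizeof(rmgr_table) / sizeof(rmgr_table[0])))

/* Returns the number of blocks to prefetch; 0 if the rmgr opts out. */
int blocks_to_prefetch(const WalRecord *rec, BlockNumber *out, int max_out)
{
    assert(rec != NULL && out != NULL);

    if (rec->rmid >= N_RMGRS || rmgr_table[rec->rmid].rm_blocks == NULL)
        return 0;
    return rmgr_table[rec->rmid].rm_blocks(rec, out, max_out);
}
```

With this shape, each resource manager keeps its own record format, and prefetch support can be added one rmgr at a time without touching the others.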
> 3. IIRC I tried to handle just a few most important WAL records at
> first, but it turned out that you really need to handle all WAL records
> (that are used at all) before you see any benefit. Otherwise, every time
> you hit a WAL record that you haven't done posix_fadvise() on, the
> recovery "stalls", and you don't need many of those to diminish the gains.
>
> Not sure how these apply to your approach, it's very different. You seem
> to handle 1. by collecting all the page references for the WAL file, and
> sorting and removing the duplicates. I wonder how much CPU time is spent
> on that?
Removing duplicates seems like it will save CPU, since each block then
gets at most one posix_fadvise() call.
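To make the sort-and-deduplicate step concrete, here is a small sketch using qsort() followed by a single compaction pass. The PageRef struct and sort_unique_pagerefs() are hypothetical names chosen for illustration, not taken from Koichi's patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical page reference collected from one WAL file. */
typedef struct PageRef
{
    unsigned int relfile;   /* which relation file */
    unsigned int blkno;     /* which block within it */
} PageRef;

/* Order by relation file first, then by block number. */
static int pageref_cmp(const void *a, const void *b)
{
    const PageRef *x = a;
    const PageRef *y = b;

    if (x->relfile != y->relfile)
        return x->relfile < y->relfile ? -1 : 1;
    if (x->blkno != y->blkno)
        return x->blkno < y->blkno ? -1 : 1;
    return 0;
}

/*
 * Sort refs[] in place and drop duplicates.
 * Returns the number of distinct references kept at the front of refs[].
 */
size_t sort_unique_pagerefs(PageRef *refs, size_t n)
{
    size_t kept = 1;

    assert(refs != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(refs, n, sizeof(PageRef), pageref_cmp);

    /* After sorting, duplicates are adjacent; keep the first of each run. */
    for (size_t i = 1; i < n; i++)
    {
        if (pageref_cmp(&refs[i], &refs[kept - 1]) != 0)
            refs[kept++] = refs[i];
    }
    return kept;
}
```

The cost is one O(n log n) sort per WAL file. Whether that is cheaper than the syscalls it avoids depends on how often the same pages recur, which is the measurement Heikki is asking about.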
--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support