Re: patch for new feature: Buffer Cache Hibernation

From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Mitsuru IWASAKI <iwasaki(at)jp(dot)freebsd(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org, jeff(dot)janes(at)gmail(dot)com
Subject: Re: patch for new feature: Buffer Cache Hibernation
Date: 2011-05-05 11:35:52
Message-ID: BANLkTikc81tQqKv_yuMsD+UnQxMKvuTUgA@mail.gmail.com
Lists: pgsql-hackers

2011/5/5 Mitsuru IWASAKI <iwasaki(at)jp(dot)freebsd(dot)org>:
> Hi,
>
>> I think that PgFincore (http://pgfoundry.org/projects/pgfincore/)
>> provides similar functionality.  Are you familiar with that?  If so,
>> could you contrast your approach with that one?
>
> I'm not familiar with PgFincore at all, sorry, but I got the source code
> and documentation and read through them just now.
> # and I'm actually a novice with postgres...
> Both aim to reduce physical I/O, but their approaches and
> gains are different.
> My understanding is as follows:
>
> +---------------------+     +---------------------+
> | Postgres(backend)   |     | Postgres            |
> | +-----------------+ |     |                     |
> | | DB Buffer Cache | |     |                     |
> | | (shared buffers)| |     |                     |
> | |*my target       | |     |                     |
> | +-----------------+ |     |                     |
> |   ^      ^          |     |                     |
> |   |      |          |     |                     |
> |   v      v          |     |                     |
> | +-----------------+ |     | +-----------------+ |
> | |  buffer manager | |     | |    pgfincore    | |
> | +-----------------+ |     | +-----------------+ |
> +---^------^----------+     +----------^----------+
>     |      |smgrread()                 |posix_fadvise()
>     |read()|                           |                 userland
> ==================================================================
>     |      |                           |                 kernel
>     |      +-------------+-------------+
>     |                    |
>     |                    v
>     |       +------------------------+
>     |       | File System            |
>     |       |   +-----------------+  |
>     +------>|   | FS Buffer Cache |  |
>             |   |*PgFincore target|  |
>             |   +-----------------+  |
>             |    ^       ^           |
>             +----|-------|-----------+
>                  |       |
> ==================================================================
>                  |       |                               hardware
>        +---------|-------|----------------+
>        |         |       v  Physical Disk |
>        |         |   +------------------+ |
>        |         |   | base/16384/24598 | |
>        |         v   +------------------+ |
>        | +------------------------------+ |
>        | |Buffer Cache Hibernation Files| |
>        | +------------------------------+ |
>        +----------------------------------+
>

Little detail: pgfincore stores its data per relation in a file, like you do.
I have rewritten that part a bit: it will store its data directly in
PostgreSQL tables, and it will also be able to restore the cache from
a raw bitstring.
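To make the "raw bitstring" idea concrete, here is a minimal C sketch of one possible per-relation encoding: one character per block, '1' if the block was cached and '0' otherwise, which is the same 0/1 text form a PostgreSQL `bit varying` value is written in. The function names are hypothetical, this is not pgfincore's actual storage format, and how the per-block "cached" flags are obtained is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding: cached[i] becomes '1' or '0' at position i.
 * out must have room for nblocks + 1 characters. */
static void cache_map_to_bits(const bool *cached, size_t nblocks, char *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < nblocks; i++)
        out[i] = cached[i] ? '1' : '0';
    out[nblocks] = '\0';
}

/* Inverse: fill cached[] from a '0'/'1' string.  Returns the number of
 * blocks decoded, or -1 on an invalid character (cached[] may then be
 * partially written) or if the string is longer than max_blocks. */
static long bits_to_cache_map(const char *bits, bool *cached, size_t max_blocks)
{
    size_t n = strlen(bits);

    if (n > max_blocks)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (bits[i] != '0' && bits[i] != '1')
            return -1;
        cached[i] = (bits[i] == '1');
    }
    return (long) n;
}
```

One string per relation keeps snapshot and restore independent for each table or index.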

> In summary, PgFincore targets the File System Buffer Cache, while Buffer
> Cache Hibernation targets the DB Buffer Cache (shared buffers).

Correct. (btw, I am very happy about your idea and that you found the time to do it)

>
> PgFincore tries to preload database files into the File System Buffer
> Cache with posix_fadvise(), not into the DB Buffer Cache (shared buffers).
> During query execution, the buffer manager fetches DB buffer blocks from
> the file system with smgrread() unless the needed blocks are already in
> the DB Buffer Cache.  At that point, physical reads may not happen because
> part (or all) of the database file is already loaded into the FS Buffer Cache.
>
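The lookup order described above can be illustrated with a toy C model: a tiny direct-mapped array stands in for shared buffers, and on a miss the block is fetched with pread(), which the kernel may answer from its own page cache (the level PgFincore warms). Everything here is an illustrative assumption, including the names, the 4-slot pool and the 8 kB block size (PostgreSQL's default BLCKSZ); the real buffer manager uses a shared hash table, pinning and smgrread(), none of which is modeled.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define TOY_BLCKSZ   8192              /* PostgreSQL's default block size */
#define TOY_NBUFFERS 4                 /* a very small "shared buffers" */

typedef struct {
    bool valid;
    long blkno;                        /* block held by this slot */
    char page[TOY_BLCKSZ];
} ToySlot;

typedef struct {
    ToySlot slots[TOY_NBUFFERS];
    long    hits;                      /* served from the toy shared buffers */
    long    misses;                    /* had to ask the file system */
} ToyPool;

/* Return the cached copy of blkno from fd, reading it on a miss.  The pread()
 * may be answered from the kernel's page cache (what PgFincore warms); Buffer
 * Cache Hibernation would instead pre-fill slots[] at startup. */
static const char *toy_read_block(ToyPool *pool, int fd, long blkno)
{
    assert(pool != NULL && blkno >= 0);
    ToySlot *slot = &pool->slots[blkno % TOY_NBUFFERS];   /* direct-mapped */

    if (slot->valid && slot->blkno == blkno) {
        pool->hits++;
        return slot->page;
    }
    pool->misses++;
    slot->valid = false;               /* evict whatever the slot held */
    if (pread(fd, slot->page, TOY_BLCKSZ, (off_t) blkno * TOY_BLCKSZ) != TOY_BLCKSZ)
        return NULL;                   /* error, or block past end of file */
    slot->valid = true;
    slot->blkno = blkno;
    return slot->page;
}
```

Reading block 0 twice gives one miss (a file-system read) followed by one hit; only the miss can benefit from a warm FS cache.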
> The gain depends on the file system, especially the size of the File
> System Buffer Cache.
> In short, preloading a database file is equivalent to the following command:
> $ cat base/16384/24598 > /dev/null

Not exactly.

There are 2 calls:

* pgfadv_WILLNEED
* pgfadv_WILLNEED_snapshot

The former asks the kernel to load each segment of a relation, *but* the
kernel can decide not to do it, or to load only part of each segment
(so it is not as brutal as cat file > /dev/null).
The latter reads *exactly* the blocks recorded in the snapshot for each
segment, not all blocks (unless all of them were in cache when the
snapshot was taken). This one is part of the snapshot/restore combo.
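A rough C sketch of the difference between the two calls, under stated assumptions: PostgreSQL's default 8 kB blocks, relations split into 1 GB segment files named 24598, 24598.1, ..., and a '0'/'1' snapshot string with one character per block. The function names are hypothetical and this is not pgfincore's implementation; it only contrasts whole-segment POSIX_FADV_WILLNEED with per-block WILLNEED driven by a snapshot. In both cases posix_fadvise() is only advice, so the kernel remains free to ignore it.

```c
/* Hypothetical sketch, not pgfincore's code.  Assumes PostgreSQL's default
 * 8 kB blocks and 1 GB segment files (131072 blocks per segment). */
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_BLCKSZ      8192L
#define SKETCH_RELSEG_SIZE 131072L     /* blocks per segment file */

/* Segment 0 is "<relpath>", later ones are "<relpath>.N". */
static void segment_path(const char *relpath, long segno, char *buf, size_t len)
{
    if (segno == 0)
        snprintf(buf, len, "%s", relpath);
    else
        snprintf(buf, len, "%s.%ld", relpath, segno);
}

/* In the spirit of pgfadv_WILLNEED: advise each whole segment (len 0 means
 * "to end of file").  The kernel may load all, part, or none of it.
 * Returns the number of segments for which the advice was accepted. */
static int advise_all_segments(const char *relpath)
{
    char path[4096];
    int advised = 0;

    for (long segno = 0;; segno++) {
        segment_path(relpath, segno, path, sizeof path);
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            break;                     /* no further segment files */
        if (posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED) == 0)
            advised++;
        close(fd);
    }
    return advised;
}

/* In the spirit of pgfadv_WILLNEED_snapshot: bits[i] == '1' means block i
 * was cached when the snapshot was taken, so only those blocks are advised.
 * Returns the number of blocks advised, or -1 if a segment cannot be opened. */
static long advise_from_snapshot(const char *relpath, const char *bits)
{
    char path[4096];
    long advised = 0, cur_seg = -1;
    int fd = -1;

    assert(relpath != NULL && bits != NULL);
    for (size_t blk = 0; bits[blk] != '\0'; blk++) {
        if (bits[blk] != '1')
            continue;
        long segno = (long) (blk / SKETCH_RELSEG_SIZE);
        off_t offset = (off_t) (blk % SKETCH_RELSEG_SIZE) * SKETCH_BLCKSZ;

        if (segno != cur_seg) {        /* move to the segment holding blk */
            if (fd >= 0)
                close(fd);
            segment_path(relpath, segno, path, sizeof path);
            if ((fd = open(path, O_RDONLY)) < 0)
                return -1;
            cur_seg = segno;
        }
        if (posix_fadvise(fd, offset, SKETCH_BLCKSZ, POSIX_FADV_WILLNEED) == 0)
            advised++;
    }
    if (fd >= 0)
        close(fd);
    return advised;
}
```

With a snapshot string such as "101", only blocks 0 and 2 are advised, whereas the whole-segment variant asks for everything and leaves the choice to the kernel.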

>
> I think PgFincore is good for data warehouse applications.

Pgfincore with bitstring storage in a table allows the cache map to be
streamed to Hot Standby servers, giving better response times after a
switch-over/fail-over: by doing some housekeeping on the Hot Standby you
keep it really hot ;)

Even web applications have large databases today...

(there is more, but that is not the subject here)

>
>
> Buffer Cache Hibernation, my approach, is simpler and more straightforward.
> It tries to save/load the contents of the DB Buffer Cache (shared buffers)
> using regular files (called Buffer Cache Hibernation Files).
> At startup, the buffer manager loads DB buffer blocks into the DB Buffer
> Cache from the Buffer Cache Hibernation Files that were saved at the last
> shutdown.  Note that the database files are not read, so they are not
> cached in the File System Buffer Cache at all.  Only the DB Buffer Cache
> is filled.  Therefore, the DB buffer cache miss penalty would be larger
> than with PgFincore.
>
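A minimal C sketch of the save/restore cycle described above, assuming a hypothetical file layout: a magic number and a count, then for each buffer a simplified tag followed by the raw 8 kB page, all in native byte order. This is not the patch's actual file format; the point is only that loading copies pages straight back into memory without opening the relation files.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HIB_BLCKSZ 8192
#define HIB_MAGIC  0x48494231u         /* arbitrary marker ("HIB1"), hypothetical */

typedef struct {
    uint32_t relnode;                  /* simplified: real tags carry more fields */
    uint32_t blkno;
} HibTag;

typedef struct {
    HibTag tag;
    char   page[HIB_BLCKSZ];
} HibBuffer;

/* "Shutdown": write a header, then each buffer's tag and raw page. */
static int hib_save(const char *path, const HibBuffer *bufs, uint32_t n)
{
    assert(path != NULL && (bufs != NULL || n == 0));
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    uint32_t hdr[2] = { HIB_MAGIC, n };
    int ok = fwrite(hdr, sizeof hdr, 1, f) == 1;
    for (uint32_t i = 0; ok && i < n; i++)
        ok = fwrite(&bufs[i].tag, sizeof bufs[i].tag, 1, f) == 1 &&
             fwrite(bufs[i].page, HIB_BLCKSZ, 1, f) == 1;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* "Startup": copy pages straight back into bufs[]; relation files are never
 * opened.  Returns the number of buffers loaded, or -1 on any mismatch. */
static long hib_load(const char *path, HibBuffer *bufs, uint32_t max)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    uint32_t hdr[2];
    long loaded = -1;
    if (fread(hdr, sizeof hdr, 1, f) == 1 && hdr[0] == HIB_MAGIC && hdr[1] <= max) {
        uint32_t i = 0;
        while (i < hdr[1] &&
               fread(&bufs[i].tag, sizeof bufs[i].tag, 1, f) == 1 &&
               fread(bufs[i].page, HIB_BLCKSZ, 1, f) == 1)
            i++;
        if (i == hdr[1])
            loaded = (long) i;
    }
    fclose(f);
    return loaded;
}
```

Because the pages come from the hibernation file rather than the relation files, the FS cache stays cold, which is why a later DB cache miss still costs a physical read.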
> The gain depends on the size of shared buffers, and on how often
> similar queries are executed before and after the restart.
>
> Buffer Cache Hibernation is good for OLTP applications.

It is also very helpful for debugging and analysis purposes, IIUC.
I would probably prefer a per-relation approach (so you can snapshot and
restore only the interesting tables/indexes). From what I read in your
patch, it looks easy to do, doesn't it?
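Restricting a snapshot to selected relations could be as simple as filtering buffer tags before they are saved. This C sketch uses a hypothetical two-field tag and made-up names; PostgreSQL's real buffer tag also identifies the tablespace, database and fork, which a real filter would have to compare as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t relnode;                  /* hypothetical simplified tag */
    uint32_t blkno;
} SnapTag;

/* Copy into out[] only the tags belonging to one of the wanted relations,
 * preserving order.  out must have room for ntags entries. */
static size_t select_relations(const SnapTag *tags, size_t ntags,
                               const uint32_t *wanted, size_t nwanted,
                               SnapTag *out)
{
    size_t kept = 0;

    assert(out != NULL || ntags == 0);
    for (size_t i = 0; i < ntags; i++) {
        for (size_t j = 0; j < nwanted; j++) {
            if (tags[i].relnode == wanted[j]) {
                out[kept++] = tags[i];
                break;                 /* matched; go to next tag */
            }
        }
    }
    return kept;
}
```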

I also prefer the idea of keeping a map of the Buffer Cache (yes, like
what I do with pgfincore) over storing the data and reading it back
directly. This latter part seems a bit dangerous to me, even if it
looks sane for a normal PostgreSQL stop/start process.
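The map-based alternative can be sketched as follows: only block numbers are saved (4 bytes per buffer instead of 8 kB), and at restore time each page is read from the current relation file, so in this sketch an outdated map can at worst cause useless reads, never bring back old page contents. It is a hypothetical single-segment sketch with made-up names, not code from pgfincore or from the patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MAP_BLCKSZ 8192

/* Save only *which* blocks were cached: a count, then one uint32 per block. */
static int map_save(const char *mapfile, const uint32_t *blknos, uint32_t n)
{
    assert(mapfile != NULL && (blknos != NULL || n == 0));
    FILE *f = fopen(mapfile, "wb");
    if (f == NULL)
        return -1;
    int ok = fwrite(&n, sizeof n, 1, f) == 1 &&
             fwrite(blknos, sizeof *blknos, n, f) == n;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Restore by reading each listed block from the current relation file into
 * pages[0..].  Blocks past end of file are skipped, so a stale map costs at
 * most some useless reads.  Returns pages filled, or -1 on error. */
static long map_restore(const char *mapfile, const char *relpath,
                        char (*pages)[MAP_BLCKSZ], uint32_t max)
{
    FILE *f = fopen(mapfile, "rb");
    if (f == NULL)
        return -1;
    int fd = open(relpath, O_RDONLY);
    uint32_t n, blkno;
    long filled = -1;

    if (fd >= 0 && fread(&n, sizeof n, 1, f) == 1 && n <= max) {
        filled = 0;
        for (uint32_t i = 0; i < n && fread(&blkno, sizeof blkno, 1, f) == 1; i++)
            if (pread(fd, pages[filled], MAP_BLCKSZ,
                      (off_t) blkno * MAP_BLCKSZ) == MAP_BLCKSZ)
                filled++;
    }
    if (fd >= 0)
        close(fd);
    fclose(f);
    return filled;
}
```

The trade-off is extra reads at restore time in exchange for never trusting page images that were written outside the relation files.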

>
>
> I think that PgFincore and Buffer Cache Hibernation are not mutually
> exclusive; they can work together at different caching levels.

Yes.

>
>
>
> Sorry for my poor English, but I'm doing my best :)

Better than mine, and your patch remains very easy to read in any case.

>
> Thanks
>

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
