Re: pg_prewarm

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: cedric(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, Stefan Keller <sfkeller(at)gmail(dot)com>
Subject: Re: pg_prewarm
Date: 2012-04-09 15:32:00
Message-ID: CA+TgmoZihvzFW6n6pPwzDOBH2G175WcYsyNyuoNrybySs15tPQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 18, 2012 at 7:25 AM, Cédric Villemain
<cedric(at)2ndquadrant(dot)com> wrote:
>> Would be nice to sort out the features of the two Postgres extensions
>> pgfincore (https://github.com/klando/pgfincore ) and pg_prewarm: what
>> do they have in common, what is complementary?
>
> pg_prewarm use postgresql functions (buffer manager) to warm data (different
> kind of 'warm', see pg_prewarm code). Relations are warmed block by block,
> for a range of block.

pg_prewarm actually supports three modes of prewarming: (1) pulling
things into the OS cache using PostgreSQL's asynchronous prefetching
code, which internally uses posix_fadvise on platforms where it's
available, (2) reading the data into a fixed-size buffer a block at a
time to force the OS to read it in synchronously, and (3) actually
pulling the data all the way into shared buffers. So in terms of
prewarming, it can do the stuff that pgfincore does, plus some extra
stuff. Of course, pgfincore has a bunch of extra capabilities in
related areas, like being able to check what's in core and being able
to evict things from core, but those things aren't prewarming and I
didn't feel any urge to include them in pg_prewarm, not because they
are bad ideas but just because they weren't what I was trying to do.
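To make the first two modes concrete, here is a rough OS-level sketch in Python. It is not pg_prewarm's code (the module is written in C and goes through PostgreSQL's own prefetch and smgr paths); the function names and the 8192-byte block size are assumptions made for illustration, and mode (3) is omitted because it needs the buffer manager.

```python
import os

BLOCK_SIZE = 8192  # assumed: PostgreSQL's default block size


def prewarm_read(path, block_size=BLOCK_SIZE):
    """'read' mode sketch: pull the file through one reused, fixed-size
    buffer a block at a time, forcing synchronous reads by the OS.
    Returns the number of blocks read (a partial last block counts)."""
    buf = bytearray(block_size)
    blocks = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)
            if not n:  # 0 bytes means end of file
                break
            blocks += 1
    return blocks


def prewarm_prefetch(path, block_size=BLOCK_SIZE):
    """'prefetch' mode sketch: issue one asynchronous read-ahead hint
    per block. Returns the number of blocks hinted, or None where
    posix_fadvise is unavailable (e.g. Windows, Mac OS X)."""
    if not hasattr(os, "posix_fadvise"):
        return None
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        blocks = 0
        for offset in range(0, size, block_size):
            os.posix_fadvise(fd, offset, block_size, os.POSIX_FADV_WILLNEED)
            blocks += 1
        return blocks
    finally:
        os.close(fd)
```

Both helpers only touch the OS cache; nothing here lands in shared buffers.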

> pgfincore does not use the postgresql buffer manager, it uses the posix
> calls. It can proceed per block or full relation.
>
> Both need POSIX_FADVISE compatible system to be efficient.
>
> The main difference between pgfincore and pg_prewarm about full relation
> warm is that pgfincore will make very few system calls when pg_prewarm will
> do much more.

That's a fair complaint, but I'm not sure it matters in practice,
because I think that in real life the time spent prewarming is going
to be dominated by I/O, not system call time. Now, that's not an
excuse for being less efficient, but I actually did have a reason for
doing it this way, which is that it makes it work on systems that
don't support POSIX_FADVISE, like Windows and Mac OS X. Unless I'm
mistaken or it's changed recently, pgfincore makes no effort to be
cross-platform, whereas pg_prewarm should be usable anywhere that
PostgreSQL is, and you'll be able to do prewarming in any of those
places, though of course it may be a bit less efficient without
POSIX_FADVISE, since you'll have to use the "read" or "buffer" mode
rather than "prefetch". Still, being able to do it less efficiently
is better than not being able to do it at all.
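For a sense of scale on the system call count, and of what the fallback looks like, here is a small Python sketch. The helper names are hypothetical, the 8192-byte block size is assumed, and "one call covers the whole file" is a simplification of the whole-relation approach rather than a description of pgfincore's actual code.

```python
import os

BLOCK_SIZE = 8192  # assumed: PostgreSQL's default block size


def choose_mode(requested="prefetch"):
    """Fall back from 'prefetch' to 'read' on platforms that lack
    posix_fadvise; any other requested mode is returned unchanged."""
    if requested == "prefetch" and not hasattr(os, "posix_fadvise"):
        return "read"
    return requested


def advise_calls(file_size, per_block, block_size=BLOCK_SIZE):
    """Number of fadvise-style calls needed to cover file_size bytes:
    one per block, or a single call spanning the whole file."""
    if file_size == 0:
        return 0
    if not per_block:
        return 1
    return -(-file_size // block_size)  # ceiling division


one_gib = 1 << 30
print(advise_calls(one_gib, per_block=True))   # 131072
print(advise_calls(one_gib, per_block=False))  # 1
```

So a 1 GiB file costs 131072 hint calls block by block versus one for the whole range, which is the overhead being weighed against the I/O time here.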

Again, I'm not saying this to knock pgfincore: I see the advantages of
its approach in exposing a whole suite of tools to people running on,
well, the operating systems on which the largest number of people run
PostgreSQL. But I do think that being cross-platform is an advantage,
and I think it's essential for anything we'd consider shipping as a
contrib module. I think you could rightly view all of this as
pointing to a deficiency in the APIs exposed by core: there's no way
for anything above the smgr layer to do anything with a range of
blocks, which is exactly what we want to do here. But I wasn't as
interested in fixing that as I was in getting something which did what
I needed, which happened to be getting the entirety of a relation into
shared_buffers without much ado.

> The current implementation of pgfincore allows to make a snapshot and
> restore via pgfincore or via pg_prewarm (just need some SQL-fu for the
> latter).

Indeed.

Just to make my position on pgfincore vs. pg_prewarm completely clear:
I think they are complementary utilities with a small overlap. I
think that prewarming is important enough to a broad enough group
of people that we should find some way of exposing that functionality
in core or contrib, and I wrote pg_prewarm as a minimalist
implementation of that concept. I am not necessarily opposed to
someone taking the bull by the horns and coming up with a grander
vision for what kind of tool we pull into the core distribution -
either by extending pg_prewarm, recasting pgfincore as a contrib
module with appropriate cross-platform sauce, or coming up with some
third approach that is truly the one ring to rule them all and in the
darkness bind them. At the same time, I want to get something done
for 9.3 and I don't want to make it harder than it needs to be. I
honestly believe that just having an easy way to pull stuff into
memory/shared_buffers will give us eighty to ninety percent of what
people need in this area; we can do more, either in core or elsewhere,
as the motivation may strike us.

Attached is an updated patch, with a fix for the documentation typo
noted by Jeff Janes and some additional documentation examples, also
inspired by comments from Jeff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
pg_prewarm_v2.patch application/octet-stream 11.8 KB
