Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: James Bottomley <James(dot)Bottomley(at)HansenPartnership(dot)com>
Cc: Trond Myklebust <trondmy(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-14 01:26:25
Message-ID: 20140114012625.GB20335@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-13 17:13:51 -0800, James Bottomley wrote:
> a file into a user provided buffer, thus obtaining a page cache entry
> and a copy in their userspace buffer, then insert the page of the user
> buffer back into the page cache as the page cache page ... that's right,
> isn't it, Postgres people?

Pretty much, yes. When reading, we'd probably hint (*advise(DONTNEED))
that the page isn't needed anymore. And we'd normally write the page
back if it is dirty.
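
A minimal sketch of that read-then-hint pattern, in Python for brevity.
It assumes Linux, where os.posix_fadvise() is available, and an 8 kB
page. PAGE_SIZE and both function names are hypothetical illustrations,
not PostgreSQL code:

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size for this sketch


def read_page_and_drop_cache(fd, page_no):
    """Copy one page into a userspace buffer, then hint that the
    kernel's page-cache copy is no longer needed."""
    offset = page_no * PAGE_SIZE
    data = os.pread(fd, PAGE_SIZE, offset)
    # Advisory only: the kernel is free to ignore the hint.
    os.posix_fadvise(fd, offset, PAGE_SIZE, os.POSIX_FADV_DONTNEED)
    return data


def write_page_if_dirty(fd, page_no, data, dirty):
    """Write the userspace copy back only if it was modified."""
    if not dirty:
        return False
    os.pwrite(fd, data, page_no * PAGE_SIZE)
    return True
```

The hint does not guarantee eviction; it only tells the kernel that
this process will not need the cached copy again soon.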

> Effectively you end up with buffered read/write that's also mapped into
> the page cache. It's a pretty awful way to hack around mmap.

Well, the problem is that you can't really use mmap() for the things we
do. Postgres' durability works by guaranteeing that our journal entries
(called WAL := Write Ahead Log) are written & synced to disk before the
corresponding entries of tables and indexes reach the disk. That also
allows us to group many random writes into a few contiguous writes that
are fdatasync()ed at once. Only during a checkpointing phase is the bulk
of the data then (slowly, in the background) synced to disk.
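
The ordering described above can be sketched as a toy in Python. This
is an illustration under stated assumptions, not how PostgreSQL
implements WAL: the TinyStore class, its record format (a 4-byte page
number followed by the page image), the file names and PAGE_SIZE are
all hypothetical, and os.fdatasync() is assumed to be available (Linux).

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size for this sketch


class TinyStore:
    """Toy write-ahead-logging store (hypothetical)."""

    def __init__(self, directory):
        self.wal_fd = os.open(os.path.join(directory, "wal"),
                              os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.data_fd = os.open(os.path.join(directory, "data"),
                               os.O_RDWR | os.O_CREAT, 0o600)
        self.dirty = {}           # page_no -> page image held in userspace
        self.wal_pending = False  # WAL records written but not yet synced

    def modify(self, page_no, data):
        # Append a journal record: a sequential write, wherever the
        # page itself lives in the data file.
        os.write(self.wal_fd, page_no.to_bytes(4, "little") + data)
        self.dirty[page_no] = data  # the data file is NOT touched yet
        self.wal_pending = True

    def commit(self):
        # One fdatasync() covers every record appended since the last one.
        os.fdatasync(self.wal_fd)
        self.wal_pending = False

    def checkpoint(self):
        # The WAL must be durable before any data page reaches disk.
        if self.wal_pending:
            self.commit()
        for page_no, data in sorted(self.dirty.items()):
            os.pwrite(self.data_fd, data, page_no * PAGE_SIZE)
        os.fdatasync(self.data_fd)
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed

    def close(self):
        os.close(self.wal_fd)
        os.close(self.data_fd)
```

The point of the sketch is that the application decides when a data
page is handed to the kernel: nothing is written to the data file
until checkpoint(), and checkpoint() syncs the WAL first.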

I don't see how that's doable while holding all pages in mmap()ed
buffers.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
