Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Greg Stark <stark(at)mit(dot)edu>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: James Bottomley <James(dot)Bottomley(at)hansenpartnership(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-13 21:29:02
Message-ID: CAM-w4HMk9GwHurUpuBm+djJoTSjuKe23Ab9Od=rrujgpPvXdgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 13, 2014 at 9:12 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> For one, postgres doesn't use mmap for files (and can't without major
> new interfaces). Frequently mmap()/madvise()/munmap()ing 8kb chunks has
> horrible consequences for performance/scalability - very quickly you
> contend on locks in the kernel.

I may as well dump this in this thread. We've discussed this in person
a few times, including at least once with Ted T'so when he visited
Dublin last year.

The fundamental conflict is that the kernel understands better the
hardware and other software using the same resources, Postgres
understands better its own access patterns. We need to either add
interfaces so Postgres can teach the kernel what it needs about its
access patterns or add interfaces so Postgres can find out what it
needs to know about the hardware context.

The more ambitious and interesting direction is to let Postgres tell
the kernel what it needs to know to manage everything. To do that we
would need the ability to control when pages are flushed out. This is
absolutely necessary to maintain consistency. Postgres would need to
be able to mark pages as unflushable until some point in time in the
future when the journal is flushed. We discussed various ways that
interface could work but it would be tricky to keep it low enough
overhead to be workable.

The less exciting, more conservative option would be to add kernel
interfaces to teach Postgres about things like raid geometries. Then
Postgres could use directio and decide to do prefetching based on the
raid geometry, how much available i/o bandwidth and iops is available,
etc.

Reimplementing i/o schedulers and all the rest of the work that the
kernel provides inside Postgres just seems like something outside our
competency and that none of us is really excited about doing.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-01-13 21:29:59 Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Previous Message Jim Nasby 2014-01-13 21:26:46 Re: Disallow arrays with non-standard lower bounds