Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Greg Stark <stark(at)mit(dot)edu>
To: Mel Gorman <mgorman(at)suse(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dave Chinner <david(at)fromorbit(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, James Bottomley <James(dot)Bottomley(at)hansenpartnership(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-17 02:13:11
Message-ID: CAM-w4HNBnmuBx5mdMqVbHsTYiXumV+E_Jjc_+U=p0f1EGc=fAg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 15, 2014 at 7:53 AM, Mel Gorman <mgorman(at)suse(dot)de> wrote:
> The second is to have
> pages that are strictly kept dirty until the application syncs them. An
> unbounded number of these pages would blow up but maybe bounds could be
> placed on it. There are no solid conclusions on that part yet.

I think the interface would be subtler than that. The current
architecture is that if an individual process decides to evict one of
these pages it knows how much of the log needs to be flushed and
fsynced before it can do so and proceeds to do it itself. This is a
situation to be avoided as much as possible but there are workloads
where it's inevitable (the typical example is mass data loads).
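
As a rough illustration of that rule, here is a toy model in Python. It
is only a sketch: the names (Log, Page, modify, evict) are made up for
this example and are not taken from the Postgres source, and flush()
merely stands in for a real write plus fsync.

```python
# Toy model of the write-ahead rule described above. All names are
# illustrative assumptions, not PostgreSQL identifiers.

class Log:
    def __init__(self):
        self.end_lsn = 0        # end position of the last record appended
        self.durable_lsn = 0    # position known to be on stable storage
        self.fsync_calls = 0    # how many times we had to flush

    def append(self, nbytes):
        """Append a record and return the log position at its end."""
        self.end_lsn += nbytes
        return self.end_lsn

    def flush(self, upto_lsn):
        """Make the log durable up to upto_lsn (stand-in for write + fsync)."""
        if upto_lsn > self.durable_lsn:
            self.fsync_calls += 1
            self.durable_lsn = min(upto_lsn, self.end_lsn)


class Page:
    def __init__(self, page_id):
        self.page_id = page_id
        self.lsn = 0            # position of the last log record touching this page
        self.dirty = False


def modify(page, log, nbytes):
    """Log the change first, then remember how far the log must be flushed."""
    page.lsn = log.append(nbytes)
    page.dirty = True


def evict(page, log, written):
    """The evicting process flushes the log itself before writing the page."""
    if page.dirty:
        log.flush(page.lsn)
        assert log.durable_lsn >= page.lsn
        written.append(page.page_id)
        page.dirty = False


log = Log()
a, b = Page("a"), Page("b")
modify(a, log, 100)
modify(b, log, 50)

written = []
evict(b, log, written)   # forces the log out to position 150
evict(a, log, written)   # log is already durable past 100, no extra flush
```

The second eviction is the cheap case; the first is the one to avoid,
since the evicting process has to stall on the log flush itself.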

There would need to be a similar interface that lets the kernel force
log pages to be written so that it can advance the epoch. Either there
is some way to wake Postgres up and inform it of the urgency, or better
yet Postgres would always write out pages without fsyncing them and
instead issue some other syscall to mark the points in the log file
that correspond to the write barriers that would unpin these buffers.
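
To make the second variant a bit more concrete, here is a purely
hypothetical model in Python. No such kernel interface exists; ToyKernel
and its methods (write_log, dirty_page, advance_epoch) are invented for
this sketch and only show the ordering being asked for: a data page
stays pinned until the log is durable up to the barrier recorded for it.

```python
# Hypothetical model only: this is NOT an existing kernel API. Every name
# here is an assumption made up to illustrate the ordering guarantee.
import heapq


class ToyKernel:
    def __init__(self):
        self.log_written = 0    # log bytes handed over, not necessarily durable
        self.log_durable = 0    # log bytes the kernel has made durable
        self.pinned = []        # min-heap of (barrier_offset, page_id)
        self.writable = []      # pages the kernel is now free to write back

    def write_log(self, nbytes):
        """Application writes log data without fsyncing; returns the new end offset."""
        self.log_written += nbytes
        return self.log_written

    def dirty_page(self, page_id, barrier_offset):
        """Pin a dirty page until the log is durable up to barrier_offset."""
        heapq.heappush(self.pinned, (barrier_offset, page_id))

    def advance_epoch(self):
        """Kernel-initiated: flush the buffered log, then unpin covered pages."""
        self.log_durable = self.log_written
        while self.pinned and self.pinned[0][0] <= self.log_durable:
            _, page_id = heapq.heappop(self.pinned)
            self.writable.append(page_id)


k = ToyKernel()
k.dirty_page("p1", k.write_log(100))   # barrier at log offset 100
k.dirty_page("p2", k.write_log(50))    # barrier at log offset 150
k.advance_epoch()                      # kernel decides when to pay for the flush
k.dirty_page("p3", k.write_log(10))    # stays pinned until the next epoch
```

The point of the sketch is that the application never blocks on fsync
here; the kernel chooses when to flush the log and what that unpins.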

Ted Ts'o was concerned this would all be a massive layering violation
and I have to admit that's a huge risk. It would take some clever API
engineering to come up with a clean set of primitives to express the kind
of ordering guarantees we need without being too tied to Postgres's
specific implementation. The reason I think it's more interesting
though is that Postgres's journalling and checkpointing architecture
is pretty bog-standard CS stuff and there are hundreds or thousands of
pieces of software out there that do pretty much the same work and
trying to do it efficiently with fsync or O_DIRECT is like working
with both hands tied to your feet.

--
greg
