Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Jeff Layton <jlayton(at)redhat(dot)com>
To: Dave Chinner <david(at)fromorbit(dot)com>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-20 13:43:34
Message-ID: 20140120084334.775641f0@tlielax.poochiereds.net
Lists: pgsql-hackers

On Mon, 20 Jan 2014 10:51:41 +1100
Dave Chinner <david(at)fromorbit(dot)com> wrote:

> On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
> > On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> > > it's very common to create temporary file data that will never, ever, ever
> > > actually NEED to hit disk. Where I work being able to tell the kernel to
> > > avoid flushing those files unless the kernel thinks it's got better things
> > > to do with that memory would be EXTREMELY valuable
> >
> > Windows has the FILE_ATTRIBUTE_TEMPORARY flag for this purpose.
> >
> > ISTR that there was discussion about implementing something analogous
> > in Linux when ext4 got delayed allocation support, but I don't think
> > it got anywhere and I can't find the discussion now. I think the
> > proposed interface was to create and then unlink the file immediately,
> > which serves as a hint that the application doesn't care about
> > persistence.
>
> You're thinking about O_TMPFILE, which is for making temp files that
> can't be seen in the filesystem namespace, not for preventing them
> from being written to disk.
>
> I don't really like the idea of overloading a namespace directive to
> have special writeback connotations. What we are getting into the
> realm of here is generic user controlled allocation and writeback
> policy...
>

Agreed -- O_TMPFILE semantics are a different beast entirely.
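To make the distinction concrete, here is a minimal C sketch of creating an unnamed temporary file with O_TMPFILE. It is an illustration, not code from this thread. O_TMPFILE needs Linux 3.11 or later, _GNU_SOURCE, and filesystem support. The file gets an inode with no directory entry, but anything written to it is ordinary dirty page cache and is still written back on the usual schedule. The helper names open_unnamed_tmpfile and link_count are illustrative, and so is the choice to fall back to the older create-then-unlink idiom.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an unnamed temporary file under dir and return its fd, or -1.
 * O_TMPFILE only changes the namespace: the inode never gets a name,
 * but its dirty pages are written back like any other file's. */
static int open_unnamed_tmpfile(const char *dir)
{
    assert(dir != NULL);
#ifdef O_TMPFILE
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd >= 0)
        return fd;
    /* Typically EISDIR on pre-3.11 kernels or EOPNOTSUPP on filesystems
     * without support; fall through to the older idiom. */
#endif
    /* Fallback: create a named file, then unlink it right away. */
    char path[4096];
    snprintf(path, sizeof path, "%s/tmpXXXXXX", dir);
    int tmp_fd = mkstemp(path);
    if (tmp_fd >= 0)
        unlink(path);
    return tmp_fd;
}

/* Link count of an open file: 0 means it has no name in the namespace. */
static long link_count(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

A caller writes to the fd as usual and closes it when done. link_count() returns 0 on either path, since neither file has a name, and neither path says anything about when the data reaches disk.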

What might be reasonable, though, is a new fadvise hint (POSIX_FADV_TMPFILE,
say) that tells the kernel: "Don't write out this data unless it's
necessary due to memory pressure".

If the inode is only open through file descriptors that have that hint
set, could we then exempt it from dirty_expire_interval and
dirty_writeback_interval?
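The proposed hint would presumably be applied like any other posix_fadvise() advice. In the sketch below, POSIX_FADV_TMPFILE is hypothetical: it is only the name floated above and is not defined by any kernel or libc, so the #ifdef lets the code compile today and simply skips the call. The wrapper name hint_scratch_file is likewise illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>

/* Ask the kernel to keep this file's data in memory unless memory
 * pressure forces writeback. POSIX_FADV_TMPFILE is HYPOTHETICAL: it is
 * only the hint proposed in this thread and no kernel or libc defines it,
 * so on real systems the #else branch is what compiles. Returns 0 or an
 * error number, following posix_fadvise() conventions. */
static int hint_scratch_file(int fd)
{
    assert(fd >= 0);
#ifdef POSIX_FADV_TMPFILE
    /* offset 0 and len 0 apply the advice to the whole file. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_TMPFILE);
#else
    (void)fd;
    return 0; /* hint unavailable: normal writeback applies */
#endif
}
```

Note that posix_fadvise() returns the error number directly rather than returning -1 and setting errno, which is why the wrapper passes its result straight through.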

Tracking that desire on an inode that is open multiple times might be
"interesting", though. We'd have to be quite careful not to let that
become an attack vector.
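One way to picture the multiple-open tracking problem is a pair of per-inode counters. The toy model below is not kernel code, and its struct and function names are hypothetical. It exempts an inode from periodic writeback only while every open file description carries the hint, so a single opener without the hint is enough to restore normal writeback while it has the file open.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT kernel code: counters a filesystem might keep per inode
 * to decide whether periodic writeback can skip it. Names are hypothetical. */
struct inode_model {
    unsigned int open_count;    /* open file descriptions on this inode */
    unsigned int tmpfile_hints; /* how many of them set the proposed hint */
};

static void model_open(struct inode_model *ino, bool hinted)
{
    ino->open_count++;
    if (hinted)
        ino->tmpfile_hints++;
}

static void model_close(struct inode_model *ino, bool hinted)
{
    assert(ino->open_count > 0);
    ino->open_count--;
    if (hinted) {
        assert(ino->tmpfile_hints > 0);
        ino->tmpfile_hints--;
    }
}

/* Skip dirty_expire_interval / dirty_writeback_interval only while every
 * current opener agreed. Not modeled: pages dirtied by a non-hinting
 * opener that has since closed the file, and writeback forced by
 * memory pressure. */
static bool exempt_from_periodic_writeback(const struct inode_model *ino)
{
    return ino->open_count > 0 && ino->tmpfile_hints == ino->open_count;
}
```

The gaps listed in the last comment are exactly where the "interesting" cases live: a hinted opener that outlives a non-hinted writer would make that writer's dirty pages exempt again under this simple rule.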

> > Postgres is far from being the only application that wants this; many
> > people resort to tmpfs because of this:
> > https://lwn.net/Articles/499410/
>
> Yes, we covered the possibility of using tmpfs much earlier in the
> thread, and came to the conclusion that temp files can be larger
> than memory so tmpfs isn't the solution here. :)
>

--
Jeff Layton <jlayton(at)redhat(dot)com>
