Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Mel Gorman <mgorman(at)suse(dot)de>
To: Dave Chinner <david(at)fromorbit(dot)com>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Joshua Drake <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-20 14:27:03
Message-ID: 20140120142703.GS4963@suse.de
Lists: pgsql-hackers

On Mon, Jan 20, 2014 at 10:51:41AM +1100, Dave Chinner wrote:
> On Sun, Jan 19, 2014 at 03:37:37AM +0200, Marti Raudsepp wrote:
> > On Wed, Jan 15, 2014 at 5:34 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> > > it's very common to create temporary file data that will never, ever, ever
> > > actually NEED to hit disk. Where I work being able to tell the kernel to
> > > avoid flushing those files unless the kernel thinks it's got better things
> > > to do with that memory would be EXTREMELY valuable
> >
> > Windows has the FILE_ATTRIBUTE_TEMPORARY flag for this purpose.
> >
> > ISTR that there was discussion about implementing something analogous
> > in Linux when ext4 got delayed allocation support, but I don't think
> > it got anywhere and I can't find the discussion now. I think the
> > proposed interface was to create and then unlink the file immediately,
> > which serves as a hint that the application doesn't care about
> > persistence.
>
> You're thinking about O_TMPFILE, which is for making temp files that
> can't be seen in the filesystem namespace, not for preventing them
> from being written to disk.
>
> I don't really like the idea of overloading a namespace directive to
> have special writeback connotations. What we are getting into the
> realm of here is generic user controlled allocation and writeback
> policy...
>

Such overloading would be unwelcome. FWIW, I assumed this would be an
fadvise thing: initially, something that controlled writeback on an inode
rather than an fd context, and that ignored the offset and length parameters.
Granted, someone will probably throw a fit about adding a Linux-specific
flag to the fadvise64 syscall. POSIX_FADV_NOREUSE is currently unimplemented
and it could be argued that it could be used to flag temporary files that
have a different writeback policy but it's not clear if that matches the
original intent of the POSIX flag.
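
For illustration only, here is roughly what the application side of such a
hint might look like, as a minimal sketch. It assumes POSIX_FADV_NOREUSE were
given the temporary-file writeback meaning floated above, which is NOT what the
kernel does today: the flag is accepted and has no such effect. The helper
names hint_temporary and write_scratch are hypothetical. Offset and length are
both passed as 0 because the proposal would apply to the whole inode and
ignore them.

```python
import os
import tempfile


def hint_temporary(fd: int) -> bool:
    """Issue POSIX_FADV_NOREUSE on fd; return True if the call was accepted.

    Assumption: the flag is only a stand-in for a future "temporary data"
    writeback hint. Today it does not change writeback behaviour.
    """
    advice = getattr(os, "POSIX_FADV_NOREUSE", None)
    if advice is None or not hasattr(os, "posix_fadvise"):
        return False  # platform does not expose fadvise
    try:
        # offset=0, len=0 covers the whole file
        os.posix_fadvise(fd, 0, 0, advice)
    except OSError:
        return False  # e.g. the filesystem rejects the advice
    return True


def write_scratch(payload: bytes) -> tuple[bool, bytes]:
    """Write payload to an anonymous temp file after hinting, then read it back."""
    with tempfile.TemporaryFile() as scratch:
        accepted = hint_temporary(scratch.fileno())
        scratch.write(payload)
        scratch.seek(0)
        return accepted, scratch.read()
```

The hint is advisory, so the caller behaves identically whether or not it is
accepted; that is also why a failure is reported as False rather than raised.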

> > Postgres is far from being the only application that wants this; many
> > people resort to tmpfs because of this:
> > https://lwn.net/Articles/499410/
>
> Yes, we covered the possibility of using tmpfs much earlier in the
> thread, and came to the conclusion that temp files can be larger
> than memory so tmpfs isn't the solution here. :)
>

And swap IO patterns blow chunks because people rarely want to touch
that area of the code with a 50 foot pole. It gets filed under "if you're
swapping, you already lost".

--
Mel Gorman
SUSE Labs
