Re: 9.4 regression

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Thom Brown <thom(at)linux(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.4 regression
Date: 2013-08-19 19:49:10
Message-ID: 20130819194910.GF26775@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-08-19 14:40:07 -0500, Jon Nelson wrote:
> On Fri, Aug 16, 2013 at 3:57 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > On Thu, Aug 15, 2013 at 12:08:57PM -0500, Jon Nelson wrote:
> >> > Where are we on this issue?
> >>
> >> I've been able to replicate it pretty easily with PostgreSQL and
> >> continue to look into it. I've contacted Theodore Ts'o and have gotten
> >> some useful information, however I'm unable to replicate the behavior
> >> with the test program (even one that's been modified). What I've
> >> learned is:
> >>
> >> - XLogWrite appears to take approx. 2.5 times longer when writing to a
> >> file allocated with posix_fallocate, but only the first time the file
> >> contents are overwritten. This is partially explained by how ext4
> >> handles extents and uninitialized data, but 2.5x is MUCH more
> >> expensive than anticipated or expected here.
> >> - Writing zeroes to a file allocated with posix_fallocate (essentially
> >> adding a posix_fallocate step before the usual write-zeroes-in-a-loop
> >> approach) not only doesn't seem to hurt performance, it seems to help
> >> or at least have parity, *and* the space is guaranteed to exist on
> >> disk. At the very least that seems useful.
> >
> > Is it time to revert this patch until we know more?
>
> While I'm not qualified to say, my inclination is to say yes. It can
> always be added back later. The only caveat there would be that -
> perhaps - a small modification of the patch would be warranted.
> Specifically, with posix_fallocate, I saw no undesirable behavior
> when the (newly allocated) file was manually zeroed anyway. The only
> advantages (that I can see) to doing it this way versus not using
> posix_fallocate at all are (a) a potential reduction in the number of
> extents

I vote for adapting the patch to additionally zero out the file via
write(). In your tests that seemed to perform at least as well as the
old method... It also has the advantage that we can use it a little bit
more as a testbed for possibly using posix_fallocate for heap extensions one day.
We're pretty early in the cycle, so I am not worried about this too much...
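A minimal sketch of the proposed approach in C, assuming a POSIX system: reserve the space with posix_fallocate(), then overwrite the whole range with zeroes via write() (the "write-zeroes-in-a-loop" step), then fsync(). The helper name allocate_zeroed_file and the 8192-byte chunk size are illustrative assumptions, not the actual patch; 8192 merely mirrors PostgreSQL's default XLOG_BLCKSZ.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size; mirrors PostgreSQL's default XLOG_BLCKSZ. */
#define ZERO_CHUNK 8192

/*
 * allocate_zeroed_file (hypothetical helper): reserve `size` bytes with
 * posix_fallocate(), overwrite the whole range with zeroes via write(),
 * then fsync(). Returns 0 on success or an errno value on failure.
 */
static int
allocate_zeroed_file(int fd, off_t size)
{
    static const char zeroes[ZERO_CHUNK] = {0};
    off_t written = 0;
    int rc;

    assert(fd >= 0 && size >= 0);   /* caller's responsibility */

    /* Step 1: reserve the blocks. posix_fallocate() returns an error
     * number directly and does not set errno. */
    rc = posix_fallocate(fd, 0, size);
    if (rc != 0)
        return rc;

    /* Step 2: overwrite the reserved range with real zeroes, so the
     * filesystem's "uninitialized" extents are written now rather than
     * on the first real overwrite (the slow case reported on ext4). */
    if (lseek(fd, 0, SEEK_SET) < 0)
        return errno;
    while (written < size)
    {
        size_t chunk = sizeof zeroes;
        ssize_t n;

        if ((off_t) chunk > size - written)
            chunk = (size_t) (size - written);   /* final partial chunk */
        n = write(fd, zeroes, chunk);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;                        /* retry interrupted write */
            return errno;
        }
        if (n == 0)
            return EIO;                          /* avoid spinning forever */
        written += n;
    }

    /* Step 3: make both the allocation and the zeroes durable. */
    if (fsync(fd) != 0)
        return errno;
    return 0;
}
```

Keeping the posix_fallocate() call in front of the zeroing loop is what would preserve the "space is guaranteed to exist on disk" property mentioned in the thread, while the loop avoids paying the first-overwrite cost later inside XLogWrite.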

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
