Re: fallocate / posix_fallocate for new WAL file creation (etc...)

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fallocate / posix_fallocate for new WAL file creation (etc...)
Date: 2013-07-01 17:13:22
Message-ID: CAHGQGwGnYcX=Rp4f-YQqamHcLYMcskUxXa+k_3nm1VnOVdX3+A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 2, 2013 at 1:55 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Sun, 2013-06-30 at 18:55 -0400, Greg Smith wrote:
>> This makes platform level testing a lot easier, thanks. Attached is an
>> updated copy of that program with some error checking. If the files it
>> creates already existed, the code didn't notice, and a series of write
>> errors happened. If you set the test up right it's not a problem, but
>> it's better if a bad setup is caught. I wrapped the whole test with a
>> shell script, also attached, which ensures the right test sequence and
>> checks.
>
> Thank you.
>
>> That's glibc helpfully converting your call to posix_fallocate into
>> small writes, because the OS doesn't provide a better way in that
>> kernel. It's not hard to imagine this being slower than what the WAL
>> code is doing right now. I'm not worried about correctness issues
>> anymore, but my gut paranoia about this not working as expected on older
>> systems was justified. Everyone who thought I was just whining owes me
>> a cookie.
>
> So your theory is that it may be slower because there are twice as many
> syscalls (one per 4K page rather than one per 8K page)? Interesting
> observation.
>
>> This is what I plan to benchmark specifically next.
>
> In the interest of keeping this patch moving forward, do you have an
> estimate for when this testing will be complete?
>
>> If the
>> posix_fallocate approach is actually slower than what's done now when
>> it's not getting kernel acceleration, which is the case on RHEL5 era
>> kernels, we might need to make the configure time test more complicated.
>> Whether posix_fallocate is defined isn't sensitive enough; on Linux it
>> may be the case that this only is usable when fallocate() is also there.
>
> I'd say that if posix_fallocate is slower than the existing code on
> pretty much any platform, we shouldn't commit the patch at all.

Even in that case, if users can easily tell which platforms posix_fallocate
should be used on, we could still commit the patch with a GUC parameter that
lets them enable or disable it.
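
To make the idea concrete, here is a minimal sketch of what a switchable
preallocation path might look like. It is not taken from the patch:
wal_use_fallocate is a hypothetical name standing in for the GUC, and
preallocate_file() and zero_fill() are illustrative helpers. The assumption is
that the fallback is a plain zero-fill loop using 8K writes, as discussed
upthread.

```c
#define _XOPEN_SOURCE 600		/* exposes posix_fallocate() in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define ZERO_BLOCK_SIZE 8192	/* 8K writes, as mentioned in the thread */

/* Hypothetical stand-in for the proposed GUC; not a real PostgreSQL setting. */
bool		wal_use_fallocate = true;

/* Write `size` bytes of zeros from the start of fd, one 8K block at a time. */
int
zero_fill(int fd, off_t size)
{
	char		buf[ZERO_BLOCK_SIZE];
	off_t		done = 0;

	memset(buf, 0, sizeof(buf));
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;

	while (done < size)
	{
		off_t		remaining = size - done;
		size_t		chunk = remaining < (off_t) sizeof(buf)
			? (size_t) remaining : sizeof(buf);
		ssize_t		n = write(fd, buf, chunk);

		if (n < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted, retry the same block */
			return -1;
		}
		if (n == 0)
		{
			errno = ENOSPC;		/* no progress; treat as out of space */
			return -1;
		}
		done += n;
	}
	return 0;
}

/*
 * Reserve `size` bytes for fd. Returns 0 on success, -1 with errno set on
 * failure. posix_fallocate() is tried only when the flag is on; if the call
 * reports that it is unsupported, we fall back to zero_fill().
 */
int
preallocate_file(int fd, off_t size)
{
	assert(size > 0);

	if (wal_use_fallocate)
	{
		/* posix_fallocate() returns an error number; it does not set errno */
		int			rc = posix_fallocate(fd, 0, size);

		if (rc == 0)
			return 0;
		if (rc != EINVAL && rc != EOPNOTSUPP)
		{
			errno = rc;
			return -1;
		}
		/* unsupported here: fall through to the write-based path */
	}
	return zero_fill(fd, size);
}
```

With something like this, a user on a platform where posix_fallocate turns out
to be slower could set the flag to off and get the write-based behavior. Note
that the sketch cannot detect glibc's own emulation with small writes, since
that case still returns success.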

Regards,

--
Fujii Masao
