Re: fallocate / posix_fallocate for new WAL file creation (etc...)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Greg Smith <greg(at)2ndQuadrant(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fallocate / posix_fallocate for new WAL file creation (etc...)
Date: 2013-05-29 14:36:07
Message-ID: 20130529143607.GD6434@tamriel.snowman.net
Lists: pgsql-hackers

* Peter Eisentraut (peter_e(at)gmx(dot)net) wrote:
> On 5/28/13 11:36 AM, Greg Smith wrote:
> > Outside of the run for performance testing, I think it would be good at
> > this point to validate that there is really a 16MB file full of zeroes
> > resulting from these operations. I am not really concerned that
> > posix_fallocate might be slower in some cases; that seems unlikely. I
> > am concerned that it might result in a file that isn't structurally the
> > same as the 16MB of zero writes implementation used now.
>
> I see nothing in the posix_fallocate() man pages that says that the
> allocated space is filled with any kind of data or zeroes. It will
> likely be garbage data, but that should be fine for a new WAL file.

I *really* hope that the Linux kernel folks, and others, are smart enough
to realize that they can't just re-use random blocks from an I/O device
without cleaning them first. That would be one massive security hole. I
expect posix_fallocate() actually works more like sparse files, except
that it also counts the space as being 'taken', but it doesn't go out
and actually pull blocks to use until you actually go to write to it.
At which point, perhaps there's an optimization that says "if the first
thing done with this is writing, then just write out whatever data is
requested and then fill the rest of the block out with zeros", and a
similar read operation which says "if we haven't formally assigned a
block for this, just return zeros". Hopefully it's smart enough to
avoid writing out all zeros and then turning around and writing out
whatever data is given, though since it'd all be in memory, perhaps
that wouldn't be too bad and might be simpler to implement.
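
Rather than guess, the question of what a freshly preallocated file
reads back as can be checked directly. The following is only a minimal
sketch, assuming Linux (or another platform where Python 3 exposes
os.posix_fallocate) and a filesystem that supports the call. The helper
name preallocate_and_inspect and the temporary file name are made up for
illustration, and nothing here reflects what the patch under discussion
actually does.

```python
import os
import tempfile

# 16MB, the WAL segment size discussed in this thread.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def preallocate_and_inspect(path, size=WAL_SEGMENT_SIZE, chunk=1024 * 1024):
    """Preallocate `size` bytes with posix_fallocate(), then read the file back.

    Returns (apparent_size, all_zero): the size reported by fstat() and
    whether every byte read back was zero. Hypothetical helper, for
    illustration only.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # Reserve the space without writing any data ourselves.
        os.posix_fallocate(fd, 0, size)
        apparent_size = os.fstat(fd).st_size

        # Read the whole file back in chunks and look for any non-zero byte.
        all_zero = True
        os.lseek(fd, 0, os.SEEK_SET)
        while True:
            buf = os.read(fd, chunk)
            if not buf:
                break
            if buf.count(0) != len(buf):
                all_zero = False
                break
    finally:
        os.close(fd)
    return apparent_size, all_zero


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        size, zeros = preallocate_and_inspect(os.path.join(tmpdir, "walseg"))
        print("apparent size:", size)
        print("reads back as all zeros:", zeros)
```

This only shows what a read() through the filesystem returns. It says
nothing about what is physically on the underlying blocks, and results
may differ between filesystems, so it would need to be run on each one
of interest.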

Thanks,

Stephen
