Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Andres Freund <andres(at)anarazel(dot)de>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>,Anthony Iliopoulos <ailiop(at)altatus(dot)com>,Greg Stark <stark(at)mit(dot)edu>,Geoff Winkless <pgsqladmin(at)geoff(dot)dj>,Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>,Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>,Bruce Momjian <bruce(at)momjian(dot)us>,Robert Haas <robertmhaas(at)gmail(dot)com>,Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,Catalin Iacob <iacobcatalin(at)gmail(dot)com>,PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-10 02:00:59
Message-ID: 9CE3ABD7-72DD-4D11-A940-7B56E090D11C@anarazel.de
Lists: pgsql-hackers

On April 9, 2018 6:59:03 PM PDT, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>On 10 April 2018 at 04:37, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Hi,
>>
>> On 2018-04-09 22:30:00 +0200, Tomas Vondra wrote:
>>> Maybe. I'd certainly prefer automated recovery from temporary I/O
>>> issues (like full disk on thin-provisioning) without the database
>>> crashing and restarting. But I'm not sure it's worth the effort.
>>
>> Oh, I agree on that one. But that's more a question of how we force the
>> kernel's hand on allocating disk space. In most cases the kernel
>> allocates the disk space immediately, even if delayed allocation is in
>> effect. For the cases where that's not the case (if there are current
>> ones, rather than just past bugs), we should be able to make sure that's
>> not an issue by pre-zeroing the data and/or using fallocate.
>
>Nitpick: In most cases the kernel reserves disk space immediately,
>before returning from write(). NFS seems to be the main exception
>here.
>
>EXT4 and XFS don't allocate until later, by performing actual
>writes to FS metadata, initializing disk blocks, etc. So we won't
>notice errors that are only detectable at actual time of allocation,
>like thin provisioning problems, until after write() returns and we
>face the same writeback issues.
>
>So I reckon you're safe from space-related issues if you're not on NFS
>(and whyyy would you do that?) and not thinly provisioned. I'm sure
>there are other corner cases, but I don't see any reason to expect
>space-exhaustion-related corruption problems on a sensible FS backed
>by a sensible block device. I haven't tested things like quotas,
>verified how reliable space reservation is under concurrency, etc as
>yet.

How's that not solved by pre-zeroing and/or fallocate, as I suggested above?
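
To illustrate the idea of reserving space up front, here is a minimal sketch, written in Python rather than PostgreSQL's C. The function `reserve_space` and its parameters are hypothetical and are not PostgreSQL code. It assumes a POSIX system, tries `posix_fallocate` first, and falls back to writing zeros when that call is unavailable or unsupported. Whether either path actually guarantees space on a thinly provisioned device is the open question in this thread; the sketch does not settle it.

```python
import errno
import os


def reserve_space(path, size, chunk=8192):
    """Create a new file at `path` with `size` bytes reserved.

    Returns "fallocate" or "pre-zero" depending on which path was taken.
    Hypothetical helper for illustration only.
    """
    # O_EXCL: only ever touch a brand-new file, so zero-filling
    # cannot clobber existing data.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        try:
            # Ask the filesystem to allocate the whole range now, so that
            # ENOSPC surfaces here rather than during later writeback.
            os.posix_fallocate(fd, 0, size)
            method = "fallocate"
        except AttributeError:
            # Platform has no posix_fallocate at all.
            method = "pre-zero"
        except OSError as e:
            # Fall back only when the call is unsupported; real errors
            # such as ENOSPC must propagate to the caller.
            if e.errno not in (errno.EINVAL, errno.EOPNOTSUPP, errno.ENOSYS):
                raise
            method = "pre-zero"

        if method == "pre-zero":
            zeros = bytes(chunk)
            offset = 0
            while offset < size:
                n = min(chunk, size - offset)
                # pwrite may write fewer bytes than asked; advance by
                # the amount actually written.
                offset += os.pwrite(fd, zeros[:n], offset)

        # Flush now so that any allocation failure is reported here,
        # before the file holds data anyone cares about.
        os.fsync(fd)
        return method
    finally:
        os.close(fd)
```

The point of doing this before any real data is written is that a failure at this stage loses nothing: the caller can report the error and retry, instead of discovering the problem through a failed fsync() on dirty data.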

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
