Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-09 20:30:00
Message-ID: 11277432-7a18-b409-fc0a-671bd0d009c7@2ndquadrant.com
Lists: pgsql-hackers

On 04/09/2018 10:04 PM, Andres Freund wrote:
> Hi,
>
> On 2018-04-09 21:54:05 +0200, Tomas Vondra wrote:
>> Isn't the expectation that when a fsync call fails, the next one will
>> retry writing the pages in the hope that it succeeds?
>
> Some people expect that, I personally don't think it's a useful
> expectation.
>

Maybe. I'd certainly prefer automated recovery from temporary I/O
issues (like a full disk with thin provisioning) without the database
crashing and restarting. But I'm not sure it's worth the effort.

And most importantly, it's rather delusional to think the kernel
developers are going to be enthusiastic about that approach ...
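To make the retry question concrete, here is a toy Python model (not kernel code) of the failure mode this thread is about: a failed writeback reports the error once, the affected pages are no longer treated as dirty, and a second fsync() therefore "succeeds" even though the data never reached the device. The class `ToyPageCache` and its fields are hypothetical; real behaviour varies by kernel version and filesystem.

```python
class ToyPageCache:
    """Hypothetical toy model of buffered writes plus fsync(), NOT kernel code."""

    def __init__(self):
        self.dirty = {}        # page number -> data not yet on "disk"
        self.disk = {}         # page number -> data durably written
        self.device_ok = True  # set to False to simulate an I/O error
        self.pending_error = False

    def write(self, page, data):
        self.dirty[page] = data

    def _writeback(self):
        if self.device_ok:
            self.disk.update(self.dirty)
        else:
            # Writeback failed: remember the error, but drop the dirty state.
            self.pending_error = True
        self.dirty.clear()

    def fsync(self):
        """Return True on success, False if an error is reported."""
        self._writeback()
        if self.pending_error:
            self.pending_error = False  # the error is reported only once
            return False
        return True


# Usage: the first fsync() reports the error, the "retry" then succeeds,
# yet the page never made it to disk.
cache = ToyPageCache()
cache.write(1, b"committed row")
cache.device_ok = False
first = cache.fsync()
cache.device_ok = True
second = cache.fsync()
print(first, second, 1 in cache.disk)  # False True False
```

In this model the retry has nothing left to write, which is why relying on a second fsync() to repair the first failure is not a useful expectation.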

>
> We should just deal with this by crash-recovery. The big problem I
> see is that you always need to keep a file descriptor open for
> pretty much any file written to inside and outside of postgres, to be
> guaranteed to see errors. And that'd solve that. Even if retrying
> would work, I'd advocate for that (I've done so in the past, and I've
> written code in pg that panics on fsync failure...).
>

Sure. And it's likely way less invasive from the kernel's perspective.
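A minimal sketch of the crash-recovery policy, assuming a process that keeps the file descriptor open from write() through fsync() (Andres's point being that only a descriptor held open like this is guaranteed to see the error). The helper name `write_and_fsync_or_panic` is hypothetical and this is not PostgreSQL's actual code; it only illustrates that an fsync() failure ends the process instead of being retried, leaving crash recovery to replay the WAL.

```python
import os
import sys
import tempfile


def write_and_fsync_or_panic(fd: int, data: bytes) -> None:
    """Write `data` via an fd kept open until fsync(); never retry fsync()."""
    view = memoryview(data)
    while view:  # os.write() may perform a short write
        written = os.write(fd, view)
        view = view[written:]
    try:
        os.fsync(fd)
    except OSError as exc:
        # Do not retry: the dirty pages may already be gone. Stop hard so that
        # crash recovery replays the WAL instead (hypothetical PANIC stand-in).
        sys.stderr.write(f"PANIC: fsync failed: {exc}\n")
        os._exit(1)


# Usage on a healthy filesystem: the write is durable and can be read back.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "segment")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        write_and_fsync_or_panic(fd, b"page contents")
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        contents = f.read()
```

os._exit() is used rather than raising an exception so that no cleanup code can run and accidentally treat the unsynced data as durable.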

>
> What we'd need to do however is to clear that bit during crash
> recovery... Which is interesting from a policy perspective. Could be
> that other apps wouldn't want that.
>

IMHO it'd be enough if a remount clears it.

>
> I also wonder if we couldn't just somewhere read each relevant
> mounted filesystem's errseq value. Whenever checkpointer notices
> before finishing a checkpoint that it has changed, do a crash
> restart.
>

Hmmmm, that's an interesting idea, and it's about the only thing that
would help us on older kernels. There's a wb_err in address_space, but
that's at the inode level. Not sure if there's something at the fs level.
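Andres's errseq idea can be sketched as: sample an error sequence value when a checkpoint starts and compare it again before the checkpoint is declared complete. `ToyErrSeq` is a hypothetical, simplified stand-in for the kernel's `errseq_t`, loosely modelled on `errseq_sample()` and `errseq_check()`; the real type packs an errno, a "seen" flag and a counter into one 32-bit value and does not necessarily bump the counter on every error. A single per-filesystem value is also an assumption here, since `wb_err` lives in `struct address_space`, i.e. per inode.

```python
import errno
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyErrSeq:
    # Hypothetical simplification: bump a counter on every recorded error.
    counter: int = 0
    last_errno: int = 0

    def set(self, errno_value: int) -> None:
        """Record a writeback error (cf. errseq_set())."""
        self.last_errno = errno_value
        self.counter += 1

    def sample(self) -> int:
        """Snapshot the current value (cf. errseq_sample())."""
        return self.counter

    def check(self, since: int) -> int:
        """Return the latest errno if anything was recorded after `since`, else 0."""
        return self.last_errno if self.counter != since else 0


class CrashRestartRequired(Exception):
    """Raised instead of completing a checkpoint whose writes may be lost."""


def run_checkpoint(fs_err: ToyErrSeq, flush_buffers: Callable[[], None]) -> None:
    since = fs_err.sample()    # snapshot taken when the checkpoint starts
    flush_buffers()            # write out and fsync dirty buffers
    err = fs_err.check(since)  # re-check before declaring the checkpoint done
    if err:
        raise CrashRestartRequired(f"writeback error {err} during checkpoint")


# Usage: the first checkpoint is clean, the second sees EIO and must restart.
fs = ToyErrSeq()
run_checkpoint(fs, lambda: None)
try:
    run_checkpoint(fs, lambda: fs.set(errno.EIO))
    restarted = False
except CrashRestartRequired:
    restarted = True
```

Because the check compares against a snapshot rather than consuming the error, it catches failures from any writer on that filesystem during the checkpoint window, including ones whose file descriptors were already closed.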

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
