Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-04 07:32:04
Message-ID: CAEepm=2A4vrFkQgk7z6QnqQtCW7qm4gp3hFzg=e_Of-ZB8AiJQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 4, 2018 at 6:00 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 4 April 2018 at 13:29, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
> wrote:
>> /* Ensure that we skip any errors that predate opening of the file */
>> f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
>>
>> [...]
>
> Holy hell. So even PANICing on fsync() isn't sufficient, because the kernel
> will deliberately hide writeback errors that predate our fsync() call from
> us?

More precisely, errors that predate the opening of the file by the
process that calls fsync(). Yeah, it sure looks that way based on the
above code fragment. Does anyone know better?
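
To make the concern concrete, here is a deliberately simplified user-space model of the behaviour the quoted fragment suggests. It is not kernel code: the `toy_*` names and the plain counter are assumptions made for illustration, and the kernel's real error tracking is more involved.

```c
/* Toy model only: one writeback-error counter per file mapping. */
struct toy_mapping {
    unsigned wb_err;            /* bumped whenever a writeback error is recorded */
};

/* Each open file remembers the counter value it sampled at open time. */
struct toy_file {
    struct toy_mapping *mapping;
    unsigned wb_err_seen;
};

/* Analogue of the quoted line: sample the counter when the file is opened. */
void toy_open(struct toy_file *f, struct toy_mapping *m)
{
    f->mapping = m;
    f->wb_err_seen = m->wb_err;
}

/* Record a (simulated) failed writeback on the mapping. */
void toy_writeback_failure(struct toy_mapping *m)
{
    m->wb_err++;
}

/* Returns -1 if an error was recorded since this file last sampled, else 0. */
int toy_fsync(struct toy_file *f)
{
    if (f->wb_err_seen != f->mapping->wb_err) {
        f->wb_err_seen = f->mapping->wb_err;   /* report each error once per file */
        return -1;
    }
    return 0;
}
```

In this model, a file opened before `toy_writeback_failure()` gets -1 from `toy_fsync()`, while a file opened afterwards gets 0 and never learns of the failure.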

> Does that mean that the ONLY ways to do reliable I/O are:
>
> - single-process, single-file-descriptor write() then fsync(); on failure,
> retry all work since last successful fsync()

I suppose you could come up with some crazy complicated IPC scheme to
make sure that the checkpointer always has an fd older than any writes
to be flushed, with some fallback strategy for when it can't take any
more fds.
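
Stripped of all the IPC, the ordering requirement might look like the following single-process sketch. The helper name `write_then_flush_with_older_fd` is hypothetical, and whether the older descriptor actually sees every writeback error depends on the kernel behaviour discussed above; this only illustrates "open the flushing fd first, write afterwards".

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the descriptor used for fsync() BEFORE any
 * write happens, write through a second descriptor, then fsync() via the
 * older one. Returns 0 on success, -1 if any step fails.
 */
int write_then_flush_with_older_fd(const char *path, const char *data)
{
    /* Stands in for the checkpointer's fd: it predates the write below. */
    int flush_fd = open(path, O_RDWR | O_CREAT, 0600);
    if (flush_fd < 0)
        return -1;

    /* Stands in for a backend's fd: opened later, used only for writing. */
    int write_fd = open(path, O_WRONLY);
    if (write_fd < 0) {
        close(flush_fd);
        return -1;
    }

    size_t len = strlen(data);
    ssize_t n = write(write_fd, data, len);
    int rc = (n == (ssize_t) len) ? 0 : -1;

    if (close(write_fd) != 0)
        rc = -1;

    /* Flush through the descriptor that was open before the write. */
    if (fsync(flush_fd) != 0)
        rc = -1;

    if (close(flush_fd) != 0)
        rc = -1;

    return rc;
}
```

A caller would treat a -1 return as "redo everything since the last successful flush". The hard part, keeping such an fd alive in the checkpointer for every file any backend might dirty, is exactly what this sketch leaves out.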

I haven't got any good ideas right now.

> - direct I/O

As a bit of an aside, I gather that when you resize files (think
truncating/extending relation files) you still need to call fsync()
even if you read/write all data with O_DIRECT, to make it flush the
filesystem metadata. I have no idea whether that could also be
affected by eaten writeback errors.
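
As a minimal sketch of that resize-then-fsync() step (the helper name `resize_and_sync` is made up here, and the data path is left out entirely since the point applies whether or not data is written with O_DIRECT):

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: change a file's length and then fsync() so that
 * the size change (filesystem metadata) is flushed as well.
 * Returns 0 on success, -1 if any step fails.
 */
int resize_and_sync(const char *path, off_t new_size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    int rc = 0;

    /* Truncate or extend the file to the requested length. */
    if (ftruncate(fd, new_size) != 0)
        rc = -1;

    /* Still needed after a resize, however the data itself was written. */
    if (rc == 0 && fsync(fd) != 0)
        rc = -1;

    if (close(fd) != 0)
        rc = -1;

    return rc;
}
```

A failure reported by fsync() here is subject to the same question as above: the sketch says nothing about errors the kernel may already have dropped.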

--
Thomas Munro
http://www.enterprisedb.com
