Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
Cc: David Powers <dpowers(at)janestreet(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming replication, "frozen snapshot backup on it" and missing relfile (postgres 9.2.3 on xfs + LVM)
Date: 2013-05-28 23:27:49
Message-ID: CA+TgmoYC8yY7WDWJ5cankEEOMAT=v7aZPXLT9Z-HCRxPATebhQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 28, 2013 at 10:53 AM, Benedikt Grundmann
<bgrundmann(at)janestreet(dot)com> wrote:
> Today we have seen
>
> 2013-05-28 04:11:12.300 EDT,,,30600,,51a41946.7788,1,,2013-05-27 22:41:10
> EDT,,0,ERROR,XX000,"xlog flush request 1E95/AFB2DB10 is not satisfied ---
> flushed only to 1E7E/21CB79A0",,,,,"writing block 9 of relation
> base/16416/293974676",,,,""
> 2013-05-28 04:11:13.316 EDT,,,30600,,51a41946.7788,2,,2013-05-27 22:41:10
> EDT,,0,ERROR,XX000,"xlog flush request 1E95/AFB2DB10 is not satisfied ---
> flushed only to 1E7E/21CB79A0",,,,,"writing block 9 of relation
> base/16416/293974676",,,,""
>
> while taking the backup of the primary. We have been running for a few days
> like that and today is the first day where we see these problems again. So
> it's not entirely deterministic / we don't know yet what we have to do to
> reproduce.
>
> So this makes Robert's theory more likely. However, we have also been using
> this method (LVM + rsync with hardlinks from primary) for years without
> these problems. So the big question is what changed?

Well... I don't know. But my guess is there's something wrong with
the way you're using hardlinks. Remember, a hardlink means two
directory entries pointing to the same file on disk. So if the file
gets modified in place through either name, the other name is going to
see the changes. The "xlog flush request is not satisfied" error could
happen if, for example, the backup is pointing to a file, and the
primary is pointing to the same file, and the primary modifies the file
after the backup is taken (thus modifying the backup after the fact).
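
To make the hardlink point concrete, here is a minimal sketch in Python.
It is not a reproduction of the setup described in this thread; the file
names "relfile" and "relfile.bak" and the function name are hypothetical
stand-ins for a relation file on the primary and its hard-linked backup
copy. It only shows that an in-place write through one name is visible
through the other.

```python
import os
import tempfile


def hardlink_shares_data():
    """Return (backup_content, same_inode) after an in-place write via the primary name."""
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical names, chosen only for illustration.
        primary = os.path.join(d, "relfile")
        backup = os.path.join(d, "relfile.bak")

        # Initial contents, as seen at "backup time".
        with open(primary, "wb") as f:
            f.write(b"old page")

        # The "backup" is just a second directory entry for the same inode.
        os.link(primary, backup)

        # The primary later overwrites the file in place.
        with open(primary, "r+b") as f:
            f.write(b"new page")

        # Reading through the backup name returns the modified data.
        with open(backup, "rb") as f:
            backup_content = f.read()

        same_inode = os.stat(primary).st_ino == os.stat(backup).st_ino
        return backup_content, same_inode


if __name__ == "__main__":
    content, same_inode = hardlink_shares_data()
    print(content, same_inode)
```

If the file had instead been replaced by writing a new file and renaming
it over the old name, the backup name would still refer to the old
contents, since the rename creates a new inode rather than modifying
the shared one.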

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
