Re: Production block comparison facility

From: Greg Stark <stark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Production block comparison facility
Date: 2014-07-22 11:54:58
Message-ID: CAM-w4HPk0mmt1SPFmNBJdd6affOEB23UAoW8KE1EnEqbZ7SqDw@mail.gmail.com
Lists: pgsql-hackers

If you're always doing an FPW then there's no point in the rest of the record.
The point here was to find problems so that users could run normally with
confidence.

The cases where you might want to run in the mode you describe are the build
farm or integration testing. When testing your application on the next release
of postgres it would be nice to have tests for the replication in your
workload, given the experience in 9.3.

Even without the constant full page writes, a live production system could
do a comparison whenever an ordinary FPW arrives, provided it is in a
consistent state. That would give standbys periodic verification at low cost.
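
A minimal sketch of what such a check might look like, assuming the standby
has both the incoming full-page image and its own copy of the page in memory.
Everything named here (SKETCH_BLCKSZ, MaskRange, fpw_matches_page) is
hypothetical and is not taken from the PostgreSQL source; the byte ranges to
mask (for example hint bits that may legitimately differ) are supplied by the
caller rather than derived from the page layout. The check is skipped until
consistency is reached, as suggested in the quoted proposal below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the default 8 kB block size. */
#define SKETCH_BLCKSZ 8192

/* Hypothetical: a byte range that may differ legitimately and is ignored. */
typedef struct MaskRange
{
    size_t offset;
    size_t length;
} MaskRange;

/* Zero out every masked range, clamping ranges that run past the page end. */
static void
apply_mask(uint8_t *page, const MaskRange *ranges, size_t nranges)
{
    for (size_t i = 0; i < nranges; i++)
    {
        size_t off = ranges[i].offset;
        size_t len = ranges[i].length;

        if (off >= SKETCH_BLCKSZ)
            continue;
        if (len > SKETCH_BLCKSZ - off)
            len = SKETCH_BLCKSZ - off;
        memset(page + off, 0, len);
    }
}

/*
 * Return true if the full-page image matches the current page once the
 * masked ranges are ignored. Before consistency is reached the check is
 * skipped and nothing is reported, so the function returns true.
 */
bool
fpw_matches_page(const uint8_t *fpw, const uint8_t *current,
                 const MaskRange *ranges, size_t nranges,
                 bool reached_consistency)
{
    uint8_t a[SKETCH_BLCKSZ];
    uint8_t b[SKETCH_BLCKSZ];

    assert(fpw != NULL && current != NULL);

    if (!reached_consistency)
        return true;

    /* Work on copies so neither input page is modified by masking. */
    memcpy(a, fpw, SKETCH_BLCKSZ);
    memcpy(b, current, SKETCH_BLCKSZ);
    apply_mask(a, ranges, nranges);
    apply_mask(b, ranges, nranges);

    return memcmp(a, b, SKETCH_BLCKSZ) == 0;
}
```

A real implementation would have to know which parts of each page type can
differ harmlessly; the sketch only shows the shape of the comparison.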

--
greg

On 22 Jul 2014 12:28, "Simon Riggs" <simon(at)2ndquadrant(dot)com> wrote:

> On 22 July 2014 08:49, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> > On Sun, Jul 20, 2014 at 5:31 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >> The block comparison facility presented earlier by Heikki would not be
> >> able to be used in production systems. ISTM that it would be desirable
> >> to have something that could be used in that way.
> >>
> >> ISTM easy to make these changes
> >>
> >> * optionally generate an FPW for every WAL record, not just the first
> >>   change after a checkpoint
> >>     full_page_writes = 'always'
> >>
> >> * when an FPW arrives, optionally run a check to see if it compares
> >>   correctly against the page already there, when running streaming
> >>   replication without a recovery target. We could skip reporting any
> >>   problems until the database is consistent
> >>     full_page_write_check = on
> >>
> >> The above changes seem easy to implement.
> >>
> >> With FPW compression, this would be a usable feature in production.
> >>
> >> Comments?
> >
> > This is an interesting idea, and it would be easier to use than what
> > has been submitted for CF1. However, full_page_writes set to "always"
> > would generate a large amount of WAL even for small records,
> > increasing I/O on the partition holding pg_xlog and the frequency of
> > checkpoints run on the system. Is this really something suitable for
> > production?
>
> For critical systems, yes, I think it is.
>
> It would be possible to make that user selectable for particular
> transactions or tables.
>
> > Then, looking at the code, we would need to tweak XLogInsert for the
> > WAL record construction to always do a FPW and to update
> > XLogCheckBufferNeedsBackup. Then for the redo part, we would need to
> > do some extra operations in the area of
> > RestoreBackupBlock/RestoreBackupBlockContents, including masking
> > operations before comparing the content of the FPW and the current
> > page.
> >
> > Does that sound right?
>
> Yes, it doesn't look like very much code because it fits well with
> existing approaches.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
