Re: WIP checksums patch

Lists: pgsql-hackers
From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: WIP checksums patch
Date: 2012-09-15 00:58:37
Message-ID: 1347670717.4161.74.camel@sussancws0025
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This is just a rebased version of the patch by Simon here:

http://archives.postgresql.org/message-id/CA
+U5nMKw_GBs6qQ_Y8-RjGL1V7MVW2HWBHartB8LoJhnPfxL8g(at)mail(dot)gmail(dot)com

There are other things I'd like to do, like:

* include page# in checksum, and perhaps relfilenode or object OID (but
those two might cause difficulties)
* use TLI field instead of page version, if that is the consensus
(though it seems some may still disagree with that)
* we might want to make it slightly easier for external utilities, like
for backup/replication, to verify the pages
* optionally protect temporary files and local buffers
* come up with a robust way to determine that all pages in the database
are protected
* protect uninitialized pages

But not all of these will make the first patch. This is just a starting
point to reignite the discussion.

I welcome any feedback.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-09-15 02:33:04
Message-ID: 1347676384.2867.1.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-09-14 at 17:58 -0700, Jeff Davis wrote:
> This is just a rebased version of the patch by Simon here:
>
> http://archives.postgresql.org/message-id/CA
> +U5nMKw_GBs6qQ_Y8-RjGL1V7MVW2HWBHartB8LoJhnPfxL8g(at)mail(dot)gmail(dot)com

Now with attachment.

Regards,
Jeff Davis

Attachment Content-Type Size
checksums_rebase.patch.gz application/x-gzip 71.0 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: Re: WIP checksums patch
Date: 2012-10-01 02:09:20
Message-ID: 1349057360.15580.36.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-09-14 at 17:58 -0700, Jeff Davis wrote:
> This is just a rebased version of the patch by Simon here:

I just noticed the following note in the docs for this patch:

The default is <literal>off</> for backwards compatibility and
to allow upgrade. The recommended setting is <literal>on</> though
this should not be enabled until upgrade is successfully complete
with full set of new backups.

I don't understand what that means -- if they have the page_checksums
GUC available, then surely upgrade is complete, right? And what is the
backwards-compatibility issue?

Also, it looks out of date, because the default in guc.c is set to true.
I think we should probably default to true, because it's safer and it
can always be disabled at runtime, but I don't have a strong opinion
about that.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-10-01 02:19:04
Message-ID: 1349057944.15580.42.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-09-14 at 17:58 -0700, Jeff Davis wrote:
> * we might want to make it slightly easier for external utilities, like
> for backup/replication, to verify the pages

Ideally, PageVerificationInfoOK should be available to external
utilities, so that someone might script a background job to verify
pages. Or, perhaps we should just include a new utility,
pg_verify_checksums?

Either way, where is a good place to put the function so that it's
shared between the server and the utility? In src/port somewhere?

Regards,
Jeff Davis


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: Re: WIP checksums patch
Date: 2012-10-01 14:43:00
Message-ID: 20121001144300.GA30089@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Sep 30, 2012 at 07:09:20PM -0700, Jeff Davis wrote:
> On Fri, 2012-09-14 at 17:58 -0700, Jeff Davis wrote:
> > This is just a rebased version of the patch by Simon here:
>
> I just noticed the following note in the docs for this patch:
>
> The default is <literal>off</> for backwards compatibility and
> to allow upgrade. The recommended setting is <literal>on</> though
> this should not be enabled until upgrade is successfully complete
> with full set of new backups.
>
> I don't understand what that means -- if they have the page_checksums
> GUC available, then surely upgrade is complete, right? And what is the
> backwards-compatibility issue?
>
> Also, it looks out of date, because the default in guc.c is set to true.
> I think we should probably default to true, because it's safer and it
> can always be disabled at runtime, but I don't have a strong opinion
> about that.

I think this need to clearly state "pg_upgrade", not a dump/restore
upgrade, which would be fine. It would be interesting to have
pg_upgrade change this setting, or tell the user to change it. I am not
sure enough people are using pg_upgrade to change a default value.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: Re: WIP checksums patch
Date: 2012-10-01 16:25:43
Message-ID: 1349108743.15580.44.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2012-10-01 at 10:43 -0400, Bruce Momjian wrote:
> > The default is <literal>off</> for backwards compatibility and
> > to allow upgrade. The recommended setting is <literal>on</> though
> > this should not be enabled until upgrade is successfully complete
> > with full set of new backups.
> >
> > I don't understand what that means -- if they have the page_checksums
> > GUC available, then surely upgrade is complete, right? And what is the
> > backwards-compatibility issue?

> I think this need to clearly state "pg_upgrade", not a dump/restore
> upgrade, which would be fine. It would be interesting to have
> pg_upgrade change this setting, or tell the user to change it. I am not
> sure enough people are using pg_upgrade to change a default value.

I still don't understand why pg_upgrade and page_checksums don't mix.

Regards,
Jeff Davis


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: Re: WIP checksums patch
Date: 2012-10-01 16:35:24
Message-ID: 20121001163524.GC30089@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 1, 2012 at 09:25:43AM -0700, Jeff Davis wrote:
> On Mon, 2012-10-01 at 10:43 -0400, Bruce Momjian wrote:
> > > The default is <literal>off</> for backwards compatibility and
> > > to allow upgrade. The recommended setting is <literal>on</> though
> > > this should not be enabled until upgrade is successfully complete
> > > with full set of new backups.
> > >
> > > I don't understand what that means -- if they have the page_checksums
> > > GUC available, then surely upgrade is complete, right? And what is the
> > > backwards-compatibility issue?
>
> > I think this need to clearly state "pg_upgrade", not a dump/restore
> > upgrade, which would be fine. It would be interesting to have
> > pg_upgrade change this setting, or tell the user to change it. I am not
> > sure enough people are using pg_upgrade to change a default value.
>
> I still don't understand why pg_upgrade and page_checksums don't mix.

The heap/index files are copied unmodified from the old cluster, so
there are no checksums on the pages.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: Re: WIP checksums patch
Date: 2012-10-01 17:04:09
Message-ID: 1349111049.15580.49.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2012-10-01 at 12:35 -0400, Bruce Momjian wrote:
> The heap/index files are copied unmodified from the old cluster, so
> there are no checksums on the pages.

That's fine though, the patch still reads the old format the same way,
and the pages are treated as though they have no checksum. How is that a
reason for defaulting the GUC to off?

I'm missing something here. Are we worried about users who turn the GUC
on and then expect all of their old data pages to magically be
protected?

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-10-01 17:13:49
Message-ID: 1349111629.15580.55.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-09-14 at 17:58 -0700, Jeff Davis wrote:
> This is just a rebased version of the patch by Simon here:
>
> http://archives.postgresql.org/message-id/CA
> +U5nMKw_GBs6qQ_Y8-RjGL1V7MVW2HWBHartB8LoJhnPfxL8g(at)mail(dot)gmail(dot)com

Another thing I noticed about the design of this patch:

It looks like the checksum will usually be wrong in a backup block in
WAL, because it writes the backup block before calculating the checksum
(which isn't done until the time the block is written out).

I think that's OK, because it's still protected by the WAL CRC, and
there's no expectation that the checksum is correct in shared buffers,
and the correct checksum should be set on the next checkpoint. Just an
observation.

Regards,
Jeff Davis


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-10-01 17:14:18
Message-ID: CA+U5nMKiK=cBNMp=hHLze8kPvaygTswjnHv4TS-3tReOSxJSkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1 October 2012 18:04, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Mon, 2012-10-01 at 12:35 -0400, Bruce Momjian wrote:
>> The heap/index files are copied unmodified from the old cluster, so
>> there are no checksums on the pages.
>
> That's fine though, the patch still reads the old format the same way,
> and the pages are treated as though they have no checksum.

> How is that a
> reason for defaulting the GUC to off?

It's not.

> Are we worried about users who turn the GUC
> on and then expect all of their old data pages to magically be
> protected?

Yes, but as you say, that is a separate consideration.

You are missing large parts of the previous thread, giving various
opinions on what the UI should look like for enabling checksums.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject: Re: WIP checksums patch
Date: 2012-10-01 17:14:33
Message-ID: 20121001171433.GD30089@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 1, 2012 at 10:04:09AM -0700, Jeff Davis wrote:
> On Mon, 2012-10-01 at 12:35 -0400, Bruce Momjian wrote:
> > The heap/index files are copied unmodified from the old cluster, so
> > there are no checksums on the pages.
>
> That's fine though, the patch still reads the old format the same way,
> and the pages are treated as though they have no checksum. How is that a
> reason for defaulting the GUC to off?

No idea then.

> I'm missing something here. Are we worried about users who turn the GUC
> on and then expect all of their old data pages to magically be
> protected?

Perhaps we don't allow this to be turned per page, but rather per
cluster, and per-cluster would require the entire cluster to be
rewritten.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-10-01 17:22:19
Message-ID: 5069D14B.4020105@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> I think that's OK, because it's still protected by the WAL CRC, and
> there's no expectation that the checksum is correct in shared buffers,
> and the correct checksum should be set on the next checkpoint. Just an
> observation.

We'd need to document that emphatically. Otherwise folks running on ZFS
and/or FusionIO with atomic writes (and, in the future, BTRFS) will
assume that they can turn "full_page_writes" off and checksums on, and
clearly that won't work with the current code. I think that's an
acceptable limitation, I just think we need to document it carefully,
and maybe throw a warning if people start up in that configuration.

> Perhaps we don't allow this to be turned per page, but rather per
> cluster, and per-cluster would require the entire cluster to be
> rewritten.

We dicussed this last year, and options which require a total rewrite of
the database in order to turn on the option were rejected as impractical
for users.

People did say it was desirable to have a manual option which says
"rewrite this entire table with checksums". However, that's not
required for the initial patch. For that matter, it will become
desirable to turn on checksums only for specific tables by they user
(again, future feature).

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-10-01 17:40:46
Message-ID: 1349113246.15580.63.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2012-10-01 at 18:14 +0100, Simon Riggs wrote:
> You are missing large parts of the previous thread, giving various
> opinions on what the UI should look like for enabling checksums.

I read through all of the discussion that I could find. There was quite
a lot, so perhaps I have forgotten pieces of it.

But that one section in the docs does look out of date and/or confusing
to me.

I remember there was discussion about a way to ensure that checksums are
set cluster-wide with some kind of special command (perhaps related to
VACUUM) and a magic file to let recovery know whether checksums are set
everywhere or not. That doesn't necessarily conflict with the GUC though
(the GUC could be a way to write checksums lazily, while this new
command could be a way to write checksums eagerly).

If some consensus was reached on the exact user interface, can you
please send me a link?

Regards,
Jeff Davis


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Jeff Davis'" <pgsql(at)j-davis(dot)com>, "'Simon Riggs'" <simon(at)2ndQuadrant(dot)com>
Cc: "'Bruce Momjian'" <bruce(at)momjian(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP checksums patch
Date: 2012-10-09 06:51:45
Message-ID: 008001cda5ea$8cdd8330$a6988990$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Monday, October 01, 2012 11:11 PM Jeff Davis wrote:
> On Mon, 2012-10-01 at 18:14 +0100, Simon Riggs wrote:
> > You are missing large parts of the previous thread, giving various
> > opinions on what the UI should look like for enabling checksums.
>
> I read through all of the discussion that I could find. There was quite
> a lot, so perhaps I have forgotten pieces of it.
>
> But that one section in the docs does look out of date and/or confusing
> to me.
>
> I remember there was discussion about a way to ensure that checksums are
> set cluster-wide with some kind of special command (perhaps related to
> VACUUM) and a magic file to let recovery know whether checksums are set
> everywhere or not. That doesn't necessarily conflict with the GUC though
> (the GUC could be a way to write checksums lazily, while this new
> command could be a way to write checksums eagerly).
>
> If some consensus was reached on the exact user interface, can you
> please send me a link?

AFAICT complete consensus has not been reached but one of the discussions can be found on below link:
http://archives.postgresql.org/pgsql-hackers/2012-03/msg00279.php
Here Robert has given suggestions and then further there is more discussion based on that points.

According to me, the main points where more work for this patch is required as per previous discussions is as follows:

1. Performance impact of WAL log for hint-bits needs to be verified for scenario's other than pg_bench (Such as bulk data load (which I
feel there is some way to optimize, but I don't know if that’s part of this patch)).
2. How to put the information in Page header.
I think general direction is use pd_tli.
Storing whether page has checksum should be done or it needs to be maintained at table or db level is not decided.
3. User Interface for Checksum options.
4. Still not there is consensus about locking the buffer.
5. Any more which I have missed?

Apart from above, one of the concern raised by many members is that there should be page format upgrade infrastructure first
and then we should add thinking of checksums(http://archives.postgresql.org/pgsql-hackers/2012-02/msg01517.php).
The major point for upgrade is that it should be an online upgrade and
the problem which I think there is no clear solution yet is hot to ensure that a database will never have more than 2 page formats.

If the general consensus is we should do it without having upgrade, then I think we can pursue discussion about the main points listed above.

With Regards,
Amit Kapila.


From: Jim Nasby <jim(at)nasby(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-10-29 20:31:13
Message-ID: 508EE791.3030504@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/1/12 12:22 PM, Josh Berkus wrote:
>> >Perhaps we don't allow this to be turned per page, but rather per
>> >cluster, and per-cluster would require the entire cluster to be
>> >rewritten.
> We dicussed this last year, and options which require a total rewrite of
> the database in order to turn on the option were rejected as impractical
> for users.

For whatever it's worth... we (and presumably others) still use londiste (or Slony) as our upgrade path, so we could tolerate a cluster-wide setting. We'd just set it when building new clusters via londiste and forget about it.

So I'd rather see this get in at a cluster level than not make it at all while we wait for something better.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: jesper(at)krogh(dot)cc
To: "Jim Nasby" <jim(at)nasby(dot)net>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-10-30 08:17:48
Message-ID: aee8d8c0eb23f66c3b2d54eb0bfbaae4.squirrel@shrek.krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 10/1/12 12:22 PM, Josh Berkus wrote:
>>> >Perhaps we don't allow this to be turned per page, but rather per
>>> >cluster, and per-cluster would require the entire cluster to be
>>> >rewritten.
>> We dicussed this last year, and options which require a total rewrite of
>> the database in order to turn on the option were rejected as impractical
>> for users.
>
> For whatever it's worth... we (and presumably others) still use londiste
> (or Slony) as our upgrade path, so we could tolerate a cluster-wide
> setting. We'd just set it when building new clusters via londiste and
> forget about it.
>
> So I'd rather see this get in at a cluster level than not make it at all
> while we wait for something better.

Some still do dump/restore between major releases, so there it is also
applicable. (and preferrable).

I do know a lot of things had been discussed last year, an API I could
easily imagine would work:

ALTER TABLE <tablename> SET ENABLE_CHECKSUMMING=YES
(would make the server do the checksums on all writes on table+indices
going forward, here, verification of the checksums would still be enabled,
but only on "already checksummed pages" ).
ALTER TABLE <tablename> set VERIFY_CHECKSUM=YES
(would cause the server to scan table and index, and rewrite the parts
that hadn't an updated with a checksum already.

Perhaps the names are not well-chosen, but it is based on the assumptions
that check-summing overhead is measurable and it is thereby desireable to
be able to dis/enable it on a per-table level.

Then I could let my system go 6-12 months between the above two commands
and the amount of rewrite would hopefully not require a rewrite of the
entire 2.5TB of PGDATA.

Jesper

--
Jesper Krogh


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-11-05 17:19:08
Message-ID: CA+TgmoYLoKgn5SethR3B+8g37WNUBdXFLPFm1vk+n9f9z7usvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 29, 2012 at 4:31 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> For whatever it's worth... we (and presumably others) still use londiste (or
> Slony) as our upgrade path, so we could tolerate a cluster-wide setting.
> We'd just set it when building new clusters via londiste and forget about
> it.
>
> So I'd rather see this get in at a cluster level than not make it at all
> while we wait for something better.

Yeah. I definitely think that we could shed an enormous amount of
complexity by deciding that this is, for now, an option that can only
be selected at initdb time. That would remove approximately 85% of
everything I've ever disliked about this patch - without, I think,
precluding the possibility of improving things later.

It also occurred to me that another way to reduce the scope of this
change would be to have a first version that does CRCs only for SLRU
pages. That would be useful for verifying the integrity of some of
our most critical data (pg_clog) and be a useful building block toward
a more complete implementation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP checksums patch
Date: 2012-11-09 02:17:26
Message-ID: CAFNqd5XPpvKkY7gyT37Z_H-TAjyFw1sG2RQzSOFP2f9CXmKG2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 5, 2012 at 12:19 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Oct 29, 2012 at 4:31 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> > For whatever it's worth... we (and presumably others) still use londiste
> (or
> > Slony) as our upgrade path, so we could tolerate a cluster-wide setting.
> > We'd just set it when building new clusters via londiste and forget about
> > it.
> >
> > So I'd rather see this get in at a cluster level than not make it at all
> > while we wait for something better.
>
> Yeah. I definitely think that we could shed an enormous amount of
> complexity by deciding that this is, for now, an option that can only
> be selected at initdb time. That would remove approximately 85% of
> everything I've ever disliked about this patch - without, I think,
> precluding the possibility of improving things later.
>
>
I see one thing to be concerned about, there...

I imagine it would not be a totally happy thing if the only way to switch
it on/off was to use Slony or Londiste to replicate into a database with
the opposite setting. (e.g. - This implies that built-in replication may
only replicate into a database with the identical checksum configuration.)

It's not outrageous for it to be a pretty heavyweight operation to switch
polarities, but there's such a thing as too heavy.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP checksums patch
Date: 2012-11-09 15:18:05
Message-ID: CA+Tgmoa4Yb-XDQ5vDA470_Gy92rzxAt5Zhk=AjG73EMEV0jL5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 8, 2012 at 9:17 PM, Christopher Browne <cbbrowne(at)gmail(dot)com> wrote:
> I see one thing to be concerned about, there...
>
> I imagine it would not be a totally happy thing if the only way to switch it
> on/off was to use Slony or Londiste to replicate into a database with the
> opposite setting. (e.g. - This implies that built-in replication may only
> replicate into a database with the identical checksum configuration.)

Sure, I agree. I don't think it should stay that way forever, but
removing the burden of dealing with this issue from the initial commit
would likely allow that commit to happen this release cycle, perhaps
even in the next CommitFest. And then we'd have half a loaf, which is
better than none, and we could deal with the issues of switching it on
and off as a further enhancement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-11-09 23:14:26
Message-ID: 1352502866.26644.15.camel@sussancws0025
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2012-11-05 at 12:19 -0500, Robert Haas wrote:
> Yeah. I definitely think that we could shed an enormous amount of
> complexity by deciding that this is, for now, an option that can only
> be selected at initdb time. That would remove approximately 85% of
> everything I've ever disliked about this patch - without, I think,
> precluding the possibility of improving things later.

That's certainly true, but it introduces one large problem: upgrading
would not work, which (in the past few releases) we've treated as a
major showstopper for many features.

If there is really no other good way to do it, then that might be
reasonable. But it seems within grasp to at least offer an offline way
to set checksums.

> It also occurred to me that another way to reduce the scope of this
> change would be to have a first version that does CRCs only for SLRU
> pages. That would be useful for verifying the integrity of some of
> our most critical data (pg_clog) and be a useful building block toward
> a more complete implementation.

That also breaks upgrade, right?

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP checksums patch
Date: 2012-11-09 23:19:42
Message-ID: 1352503182.26644.20.camel@sussancws0025
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-11-09 at 10:18 -0500, Robert Haas wrote:
> Sure, I agree. I don't think it should stay that way forever, but
> removing the burden of dealing with this issue from the initial commit
> would likely allow that commit to happen this release cycle, perhaps
> even in the next CommitFest. And then we'd have half a loaf, which is
> better than none, and we could deal with the issues of switching it on
> and off as a further enhancement.

Just after sending the last email, I realized that it can be separated
into separate commits fairly naturally, I think. So, I agree with you
that we should focus on an initdb setting for the next commitfest and
try for at least an offline migration tool (if not online) later.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-11-09 23:25:15
Message-ID: 1352503515.26644.24.camel@sussancws0025
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2012-10-01 at 10:22 -0700, Josh Berkus wrote:
> > I think that's OK, because it's still protected by the WAL CRC, and
> > there's no expectation that the checksum is correct in shared buffers,
> > and the correct checksum should be set on the next checkpoint. Just an
> > observation.
>
> We'd need to document that emphatically. Otherwise folks running on ZFS
> and/or FusionIO with atomic writes (and, in the future, BTRFS) will
> assume that they can turn "full_page_writes" off and checksums on, and
> clearly that won't work with the current code. I think that's an
> acceptable limitation, I just think we need to document it carefully,
> and maybe throw a warning if people start up in that configuration.

What situation are you concerned about here? I think that COW
filesystems should still be safe with full_page_writes off, right?

The checksum is calculated before every write, and the COW filesystems
do atomic writes, so the checksums should always be fine. What am I
missing?

Regards,
Jeff Davis


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: WIP checksums patch
Date: 2012-11-12 05:39:17
Message-ID: 50A08B85.1020708@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/9/12 6:14 PM, Jeff Davis wrote:
> On Mon, 2012-11-05 at 12:19 -0500, Robert Haas wrote:
>> Yeah. I definitely think that we could shed an enormous amount of
>> complexity by deciding that this is, for now, an option that can only
>> be selected at initdb time. That would remove approximately 85% of
>> everything I've ever disliked about this patch - without, I think,
>> precluding the possibility of improving things later.
>
> That's certainly true, but it introduces one large problem: upgrading
> would not work, which (in the past few releases) we've treated as a
> major showstopper for many features.

If you have pages that were written with one datetime storage format,
and you create a cluster using the other one, that will fail. If
checksums are done as an initdb time option, I see "checksummed" as
being a block change on that level, and the precedent for not supporting
it defensible. pg_upgrade will need to check for a mismatch--new
cluster has checksums turned on, old one doesn't--and abort out if that
happens. That can be lumped in with the other pg_controldata tests
easily enough.

What I think this argues for, though, is being precise about naming what
goes into pg_controldata. Let's say the initial commit target is an
initdb time switch, but later finer grained ones are expected to be
available. I think the output and source code labels on the initdb
controldata part should refer to this as something like "checksums
available" then. The word "enabled" could be misleading when there's
finer grained enabling coming later. We don't want people to run
pg_controldata, see "checksums: enabled/on", and later discover they're
not fully operational in that cluster yet. Saying "checksums:
available" suggests there's somewhere else that should be checked, to
tell if they're available on a given database/table or not.

The sort of text I'm thinking of for the manual and/or warning is:

"You can't use pg_upgrade to migrate from a cluster where checksums are
not available to one where they are. This limitation may be removed by
a future version."

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-11-14 17:54:54
Message-ID: 20121114175454.GF13888@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 12, 2012 at 12:39:17AM -0500, Greg Smith wrote:
> On 11/9/12 6:14 PM, Jeff Davis wrote:
> >On Mon, 2012-11-05 at 12:19 -0500, Robert Haas wrote:
> >>Yeah. I definitely think that we could shed an enormous amount of
> >>complexity by deciding that this is, for now, an option that can only
> >>be selected at initdb time. That would remove approximately 85% of
> >>everything I've ever disliked about this patch - without, I think,
> >>precluding the possibility of improving things later.
> >
> >That's certainly true, but it introduces one large problem: upgrading
> >would not work, which (in the past few releases) we've treated as a
> >major showstopper for many features.
>
> If you have pages that were written with one datetime storage
> format, and you create a cluster using the other one, that will
> fail. If checksums are done as an initdb time option, I see
> "checksummed" as being a block change on that level, and the
> precedent for not supporting it defensible. pg_upgrade will need to
> check for a mismatch--new cluster has checksums turned on, old one
> doesn't--and abort out if that happens. That can be lumped in with
> the other pg_controldata tests easily enough.

Yes, pg_upgrade can do that easily.

> What I think this argues for, though, is being precise about naming
> what goes into pg_controldata. Let's say the initial commit target
> is an initdb time switch, but later finer grained ones are expected
> to be available. I think the output and source code labels on the
> initdb controldata part should refer to this as something like
> "checksums available" then. The word "enabled" could be misleading
> when there's finer grained enabling coming later. We don't want
> people to run pg_controldata, see "checksums: enabled/on", and
> later discover they're not fully operational in that cluster yet.
> Saying "checksums: available" suggests there's somewhere else that
> should be checked, to tell if they're available on a given
> database/table or not.
>
> The sort of text I'm thinking of for the manual and/or warning is:
>
> "You can't use pg_upgrade to migrate from a cluster where checksums
> are not available to one where they are. This limitation may be
> removed by a future version."

"available" sounds like it is compiled in, but in this case, it means it
is active. I think we are just going to need to rename it as we make it
finer grained.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP checksums patch
Date: 2012-11-14 22:28:09
Message-ID: CAHyXU0yRFE7hR4jbV+s3Z0knpf3AZNCnmGMKACxs_J1BBnUQqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 9, 2012 at 1:51 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Monday, October 01, 2012 11:11 PM Jeff Davis wrote:
>> On Mon, 2012-10-01 at 18:14 +0100, Simon Riggs wrote:
>> > You are missing large parts of the previous thread, giving various
>> > opinions on what the UI should look like for enabling checksums.
>>
>> I read through all of the discussion that I could find. There was quite
>> a lot, so perhaps I have forgotten pieces of it.
>>
>> But that one section in the docs does look out of date and/or confusing
>> to me.
>>
>> I remember there was discussion about a way to ensure that checksums are
>> set cluster-wide with some kind of special command (perhaps related to
>> VACUUM) and a magic file to let recovery know whether checksums are set
>> everywhere or not. That doesn't necessarily conflict with the GUC though
>> (the GUC could be a way to write checksums lazily, while this new
>> command could be a way to write checksums eagerly).
>>
>> If some consensus was reached on the exact user interface, can you
>> please send me a link?
>
> AFAICT complete consensus has not been reached but one of the discussions can be found on below link:
> http://archives.postgresql.org/pgsql-hackers/2012-03/msg00279.php
> Here Robert has given suggestions and then further there is more discussion based on that points.
>
> According to me, the main points where more work for this patch is required as per previous discussions is as follows:
>
> 1. Performance impact of WAL log for hint-bits needs to be verified for scenario's other than pg_bench (Such as bulk data load (which I
> feel there is some way to optimize, but I don't know if that’s part of this patch)).

Atri has a patch posted which (if it passes muster) would eliminate
the i/o impact of WAL logging hint bits following a bulk load or any
scenario where many pages worth of tuples were sequentially written
out with the same XID. Atri's patch was written with the checksum
patch in mind.

http://archives.postgresql.org/message-id/CAOeZVid7zd9V-9WsEtf9wRCd0H4yw_1OgCLm7%3DCdvGPCxZcJJw%40mail.gmail.com

merlin