Re: 16-bit page checksums for 9.2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, david(at)fetter(dot)org, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-02-16 13:53:48
Message-ID: CA+Tgmoa5TSiu9G0A3Qbtu_NHB7XvE6necuz-YsLcUgxp6tWE=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 16, 2012 at 6:16 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> v8 attached

It's hard to believe that this version has been tested terribly
thoroughly, because it doesn't compile.

+ LockBufHdr(buf);
+
+ /*
+ * Run PageGetLSN while holding header lock.
+ */
+ recptr = BufferGetLSN(buf);
+
+ /* To check if block content changes while flushing. - vadim 01/17/97 */
+ buf->flags &= ~BM_JUST_DIRTIED;
+ UnlockBufHdr(buf);
+

This doesn't seem very helpful. It's obvious even without the comment
that we're running PageGetLSN while holding the header lock. What's
not obvious, and what the comment should be explaining, is why we're
doing that.

+ /*
+ * If we're in recovery we cannot dirty a page because of a hint.
+ * We can set the hint, just not dirty the page as a result so
+ * the hint is lost when we evict the page or shutdown.
+ *
+ * See long discussion in bufpage.c
+ */
+ if (RecoveryInProgress())
+ return;

Doesn't this seem awfully bad for performance on Hot Standby servers?
I agree that it fixes the problem with un-WAL-logged pages there, but
I seem to recall some recent complaining about performance features
that work on the master but not the standby. Durable hint bits are
one such feature.

+ * Basically, we simply prevent the checkpoint WAL record from
+ * being written until we have marked the buffer dirty. We don't
+ * start the checkpoint flush until we have marked dirty, so our
+ * checkpoint must flush the change to disk successfully or the
+ * checkpoint never gets written, so crash recovery will fix.
+ *
+ * It's possible we may enter here without an xid, so it is
+ * essential that CreateCheckpoint waits for virtual transactions
+ * rather than full transactionids.
+ */
+ MyPgXact->delayChkpt = delayChkpt = true;

I am slightly worried that this expansion in the use of this mechanism
(formerly called inCommit, for those following along at home) could
lead to checkpoint starvation. Suppose we've got one or two large
table scans wandering along, setting hint bits, and now suddenly it's
time to checkpoint. How long will it take the checkpoint process to
find a time when nobody's got delayChkpt set?

+ #define PageSetChecksum(page) \
+ do \
+ { \
+ PageHeader p = (PageHeader) page; \
+ p->pd_flags |= PD_PAGE_VERSION_PLUS1; \
+ p->pd_flags |= PD_CHECKSUM1; \
+ p->pd_flags &= ~PD_CHECKSUM2; \
+ p->pd_verify.pd_checksum16 = PageCalcChecksum16(page); \
+ } while (0);
+
+ /* ensure any older checksum info is overwritten with watermark */
+ #define PageResetVersion(page) \
+ do \
+ { \
+ PageHeader p = (PageHeader) page; \
+ if (!PageHasNoChecksum(p)) \
+ { \
+ p->pd_flags &= ~PD_PAGE_VERSION_PLUS1; \
+ p->pd_flags &= ~PD_CHECKSUM1; \
+ p->pd_flags &= ~PD_CHECKSUM2; \
+ PageSetPageSizeAndVersion(p, BLCKSZ, PG_PAGE_LAYOUT_VERSION); \
+ } \
+ } while (0);

So, when the page has a checksum, PD_CHECKSUM2 is not set, and when it
doesn't have a checksum, PD_CHECKSUM2 is not set? What good does that
do?

* PageGetPageSize
* Returns the page size of a page.
*
! * Since PageSizeIsValid() when pagesize == BLCKSZ, just written BLCKSZ.
! * This can be called on any page, initialised or not, in or out of buffers.
! * You might think this can vary at runtime but you'd be wrong, since pages
! * frequently need to occupy buffers and pages are copied from one to another
! * so there are many hidden assumptions that this simple definition is true.
*/
! #define PageGetPageSize(page) (BLCKSZ)

I think I agree that the current definition of PageGetPageSize seems
unlikely to come anywhere close to allowing us to cope with multiple
page sizes, but I think this method of disabling it is a hack. The
callers that want to know how big the page really is should just use
BLCKSZ instead of this macro, and those that want to know how big the
page THINKS it is (notably contrib/pageinspect) need a way to get that
information.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-02-16 13:58:35 Re: run GUC check hooks on RESET
Previous Message Shigeru Hanada 2012-02-16 13:41:01 Re: pgsql_fdw, FDW for PostgreSQL server