WAL format changes

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: WAL format changes
Date: 2012-06-14 21:01:42
Message-ID: 4FDA5136.6080206@enterprisedb.com
Lists: pgsql-hackers

As I threatened earlier
(http://archives.postgresql.org/message-id/4FD0B1AB.3090405@enterprisedb.com),
here are three patches that change the WAL format. The goal is to change
the format so that when you're inserting a WAL record of a given size,
you know exactly how much space it requires in the WAL.

1. Use a 64-bit segment number, instead of the log/seg combination. And
don't waste the last segment on each logical 4 GB log file. The concept
of a "logical log file" is now completely gone. XLogRecPtr is unchanged,
but it should now be understood as a plain 64-bit value, just split into
two 32-bit integers for historical reasons. On disk, this means that
there will be log files ending in FF, which were skipped before.

2. Always include the xl_rem_len field, used for continuation records,
in the xlog page header. A continuation record contained only that one
field; it is now included directly in the page header, so the concept
of a continuation record doesn't exist anymore. Because of alignment,
this wastes 4 bytes on every page that contains continued data from a
previous record, and 8 bytes on pages that don't. That's not very much,
and the next step will buy that back:

3. Allow WAL record header to be split across pages. Per Tom's
suggestion, move xl_tot_len to be the first field in XLogRecord, so that
even if the header is split, xl_tot_len is always on the first page.
xl_crc is moved to be the last field, and xl_prev is the second to last.
This has the advantage that you can calculate the CRC for all the other
fields before acquiring WALInsertLock. For xl_prev, you need to know
where exactly the record is inserted, so it's handy that it's the last
field before CRC. This patch doesn't try to take advantage of that,
however, and I'm not sure if that makes any difference once I finish the
patch to make XLogInsert scale better, which is the ultimate goal of all
this.
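
To make items 1 and 3 concrete, here is a minimal sketch, not code from the
patches. It assumes 16 MB segments, and everything prefixed Sketch/sketch_ is
a hypothetical name. The fields between xl_tot_len and xl_prev are
placeholders I assumed for illustration; only the positions of xl_tot_len,
xl_prev and xl_crc come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16 MB WAL segments; the real size is a build-time setting. */
#define SKETCH_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/* XLogRecPtr as described above: still two 32-bit halves. */
typedef struct SketchRecPtr
{
    uint32_t xlogid;   /* high 32 bits of the WAL position */
    uint32_t xrecoff;  /* low 32 bits of the WAL position */
} SketchRecPtr;

/* Item 1: read the two halves as one plain 64-bit byte position. */
static inline uint64_t
sketch_ptr_to_u64(SketchRecPtr p)
{
    return ((uint64_t) p.xlogid << 32) | p.xrecoff;
}

/* Item 1: 64-bit segment number, with no segment skipped per 4 GB. */
static inline uint64_t
sketch_segno(SketchRecPtr p)
{
    return sketch_ptr_to_u64(p) / SKETCH_SEG_SIZE;
}

/* Position of the segment within its 4 GB range (0x00..0xFF at 16 MB). */
static inline uint32_t
sketch_seg_within_4gb(SketchRecPtr p)
{
    uint64_t segs_per_4gb = UINT64_C(0x100000000) / SKETCH_SEG_SIZE;

    return (uint32_t) (sketch_segno(p) % segs_per_4gb);
}

/* Item 3: header with xl_tot_len first, xl_prev second to last, xl_crc last. */
typedef struct SketchRecord
{
    uint32_t     xl_tot_len; /* first, so it is always on the first page */
    uint32_t     xl_xid;     /* assumed placeholder field */
    uint32_t     xl_len;     /* assumed placeholder field */
    uint8_t      xl_info;    /* assumed placeholder field */
    uint8_t      xl_rmid;    /* assumed placeholder field */
    SketchRecPtr xl_prev;    /* needs the insert position */
    uint32_t     xl_crc;     /* last field */
} SketchRecord;

static_assert(offsetof(SketchRecord, xl_tot_len) == 0,
              "xl_tot_len must be the first field");
static_assert(offsetof(SketchRecord, xl_prev) + sizeof(SketchRecPtr)
              == offsetof(SketchRecord, xl_crc),
              "xl_prev must directly precede xl_crc");
static_assert(offsetof(SketchRecord, xl_crc) + sizeof(uint32_t)
              == sizeof(SketchRecord),
              "xl_crc must be the last field");

/* Header bytes that do not depend on where the record gets inserted. */
static inline size_t
sketch_position_independent_len(void)
{
    return offsetof(SketchRecord, xl_prev);
}
```

With these assumptions, a pointer of xlogid = 0, xrecoff = 0xFF000000 falls
in segment 255, the "FF" file that used to be skipped, and xlogid = 1,
xrecoff = 0 falls in segment 256. Everything before xl_prev in the header
could in principle be checksummed before the insert position is known.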

Those are the three patches I'd like to get committed in this
commitfest. To see where all this is leading, I've included a rough
WIP version of the XLogInsert scaling patch. This version is quite
different from the one I posted in the spring: it takes advantage of the
WAL format changes, and I'm also experimenting with a different method of
tracking how far each WAL insertion has progressed. But more on that later.

(Note to self: remember to bump XLOG_PAGE_MAGIC)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment                                        Content-Type  Size
1-use-uint64-got-segno.patch                      text/x-diff   82.2 KB
2-move-continuation-record-to-page-header.patch   text/x-diff   5.1 KB
3-allow-wal-record-header-to-be-split.patch       text/x-diff   22.1 KB
4-WIP-xloginsert-scale.patch                      text/x-diff   86.5 KB
