Re: WAL format

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL format
Date: 2009-12-08 06:40:11
Message-ID: 4B1DF4CB.5010105@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> In particular I wonder why we bother with the page headers.
>
>> Since we re-use the file for a new segment, without overwriting the
>> old contents, it seems like we would need to do *something* to
>> reliably determine when we've hit the end of a segment and have
>> moved into old data from a previous use of the file. Would your
>> proposed changes cover that adequately?
>
> AFAICT the proposal would make us 100% dependent on the record CRC
> to detect when a record has been torn (ie, only the first few sectors
> made it to disk). I'm a bit nervous about that from a reliability
> standpoint --- with a 32-bit CRC you've got a 1-in-4-billion chance
> of accepting bad data. Checking the page headers too gives us many
> more bits that have to be as-expected to consider the data good.

We also check the prev-link, and do some weak sanity checks on rmid and
the length fields.
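
As a rough illustration of how those checks stack up in front of the CRC, here is a minimal sketch. The struct, the limits and every demo_* name are made up for the example; this is not the real XLogRecord layout or the actual ReadRecord code, just the general shape of "cheap field checks first, CRC last".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record header; NOT the real XLogRecord layout. */
typedef struct DemoRecordHeader
{
    uint32_t crc;      /* CRC over prev, tot_len, rmid and the payload */
    uint64_t prev;     /* position of the previous record (the prev-link) */
    uint32_t tot_len;  /* total record length, header included */
    uint8_t  rmid;     /* resource manager id */
} DemoRecordHeader;

#define DEMO_RM_MAX_ID      16              /* assumed upper bound on rmid */
#define DEMO_MAX_RECORD_LEN (1024u * 1024u) /* assumed upper bound on length */

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320); calls can be chained. */
static uint32_t demo_crc32(uint32_t crc, const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *) data;

    crc = ~crc;
    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* CRC over the individual header fields (avoids struct padding) plus payload. */
static uint32_t demo_record_crc(const DemoRecordHeader *hdr,
                                const uint8_t *payload, size_t payload_len)
{
    uint32_t crc = 0;

    crc = demo_crc32(crc, &hdr->prev, sizeof(hdr->prev));
    crc = demo_crc32(crc, &hdr->tot_len, sizeof(hdr->tot_len));
    crc = demo_crc32(crc, &hdr->rmid, sizeof(hdr->rmid));
    crc = demo_crc32(crc, payload, payload_len);
    return crc;
}

/* Accept a record only if every cheap check passes and the CRC matches. */
static bool demo_record_is_valid(const DemoRecordHeader *hdr,
                                 const uint8_t *payload, size_t payload_len,
                                 uint64_t expected_prev)
{
    assert(hdr != NULL);

    if (hdr->rmid > DEMO_RM_MAX_ID)
        return false;           /* weak check: rmid out of range */
    if (hdr->tot_len > DEMO_MAX_RECORD_LEN ||
        hdr->tot_len != sizeof(DemoRecordHeader) + payload_len)
        return false;           /* weak check: implausible length */
    if (hdr->prev != expected_prev)
        return false;           /* prev-link does not point at the last record */

    return hdr->crc == demo_record_crc(hdr, payload, payload_len);
}
```

Stale data from a previous use of the segment would have to pass the rmid, length and prev-link checks in addition to matching the CRC before it was accepted.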

> Since the records are fed to XLogInsert as units, it seems like the
> actual problem might be addressable by hooking in the sync-rep data
> sending at that level, rather than looking at the WAL page buffers
> as I gather it must be doing now.

No, walsender reads from disk. The sending side actually looks OK to me;
it's the code in ReadRecord that reads partial pages at the receiving
end that I'd like to simplify. It works as it is, but we have to re-read
the most recent page when it hasn't been received in full yet, and add
some state to track that. I think it's already relying on the fact that
walsender always sends full records (although it can stop at a page
boundary, at a continuation record).
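
To make the "extra state" concrete, here is a minimal sketch of the kind of bookkeeping a reader needs when the newest page may only be partially received. Everything in it (DemoPageReader, demo_ensure_cached, the 8192-byte page size, the received_upto argument) is hypothetical and only models the problem; it is not how ReadRecord is actually written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 8192u    /* assumed page size for the example */

/* Hypothetical reader state: which page is cached and how much of it is valid. */
typedef struct DemoPageReader
{
    bool     have_page;   /* is any page cached at all? */
    uint64_t page_start;  /* start position of the cached page */
    uint32_t valid_len;   /* bytes of that page that had arrived when cached */
    unsigned reads;       /* number of (simulated) page reads from disk */
} DemoPageReader;

/* How many bytes of the page at page_start have been received so far. */
static uint32_t demo_valid_bytes(uint64_t page_start, uint64_t received_upto)
{
    uint64_t avail;

    if (received_upto <= page_start)
        return 0;
    avail = received_upto - page_start;
    return avail >= DEMO_PAGE_SIZE ? DEMO_PAGE_SIZE : (uint32_t) avail;
}

/*
 * Make sure bytes [pos, pos + len) of a single page are covered by the
 * cached copy.  Returns false if they have not been received yet, in which
 * case the caller has to wait and retry.  A cached page that was only
 * partially valid is re-read once more data has arrived.
 */
static bool demo_ensure_cached(DemoPageReader *r, uint64_t pos, uint32_t len,
                               uint64_t received_upto)
{
    uint64_t page_start = pos - (pos % DEMO_PAGE_SIZE);
    uint32_t need = (uint32_t) (pos - page_start) + len;
    uint32_t valid;

    assert(len > 0 && need <= DEMO_PAGE_SIZE);

    if (r->have_page && r->page_start == page_start && r->valid_len >= need)
        return true;            /* cached copy already covers the range */

    valid = demo_valid_bytes(page_start, received_upto);
    if (valid < need)
        return false;           /* data not received yet */

    /* (Re-)read the page; the sketch only does the bookkeeping. */
    r->have_page = true;
    r->page_start = page_start;
    r->valid_len = valid;
    r->reads++;
    return true;
}
```

The valid_len field is the state in question: without partial pages, a cached page could always be assumed complete and would never need a second read.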

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
