Re: Updating FSM on recovery

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Updating FSM on recovery
Date: 2008-10-28 14:22:25
Message-ID: 49072021.7010801@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The one remaining issue I'd like to address in the new FSM
implementation is the fact that the FSM is currently not updated at all
in WAL recovery. The old FSM wasn't updated on WAL recovery either, and
was in fact completely thrown away if the system wasn't shut down
cleanly. The difference is that after recovery, we used to start with no
FSM information at all, and all inserts would have to extend the
relations until the next vacuum, while now the inserts use the old data
in the FSM. In case of a PITR recovery or warm stand-by, the FSM would
information would come from the last base backup, which could be *very* old.

The first inserter after the recovery might have to visit a lot of pages
that the FSM claimed had free space, but didn't in reality, before
finding a suitable target. In the absolutely worst case, where the table
was almost empty when the base backup was taken, but is now full, it
might have to visit every single heap page. That's not good.

So we should try to update the FSM during recovery as well. It doesn't
need to be very accurate, as the FSM information isn't accurate anyway,
but we should try to avoid the worst case scenarios.

The attached patch is my first attempt at that. Arbitrarily, if after a
heap insert/update there's less than 20% of free space on the page, the
FSM is updated. Compared to updating it every time, that saves a lot of
overhead, while doing a pretty good job at marking full pages as full in
the FSM. My first thought was to update the FSM if there isn't enough
room on the page for a new tuple of the same size as the one just
inserted; that would be pretty close to the logic we have during normal
operation, where the FSM is updated when the tuple that we're about to
insert doesn't fit on the page. But because we don't know the fillfactor
during recovery, I don't think we can do reliably.

One issue with this patch is that it doesn't update the FSM at all when
pages are restored from full page images. It would require fetching the
page and checking the free space on it, or peeking into the size of the
backup block data, and I'm not sure if it's worth the extra code to do that.

Thoughts?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
fsm-updateonrecovery-1.patch text/x-diff 5.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-28 14:51:18
Message-ID: 24334.1225205478@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> So we should try to update the FSM during recovery as well. It doesn't
> need to be very accurate, as the FSM information isn't accurate anyway,
> but we should try to avoid the worst case scenarios.

Agreed.

> One issue with this patch is that it doesn't update the FSM at all when
> pages are restored from full page images. It would require fetching the
> page and checking the free space on it, or peeking into the size of the
> backup block data, and I'm not sure if it's worth the extra code to do that.

I'd vote not to bother, at least not in the first cut. As you say, 100%
accuracy isn't required, and I think that in typical scenarios an
insert/update that causes a page to become full would be relatively less
likely to have a full-page image.

As far as the ugliness in XLogRecordPageWithFreeSpace goes: couldn't you
just call XLogReadBufferWithFork with init = true, and then initialize
the page if PageIsNew?

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-28 15:35:49
Message-ID: 1225208150.3971.191.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Tue, 2008-10-28 at 16:22 +0200, Heikki Linnakangas wrote:

> Arbitrarily, if after a
> heap insert/update there's less than 20% of free space on the page,
> the FSM is updated. Compared to updating it every time, that saves a
> lot of overhead, while doing a pretty good job at marking full pages
> as full in the FSM. My first thought was to update the FSM if there
> isn't enough room on the page for a new tuple of the same size as the
> one just
> inserted; that would be pretty close to the logic we have during
> normal
> operation, where the FSM is updated when the tuple that we're about
> to
> insert doesn't fit on the page. But because we don't know the
> fillfactor
> during recovery, I don't think we can do reliably.

With HOT, we tend to hover around the nearly-full state, so this seems
like it will trigger repeatedly.

Is it possible that we could put an extra field onto a heap_clean record
to show remaining space. We would use it only for VACUUMs, not HOT, just
as we do now.

Probably good idea to make a list of user cases and say what we do in
eahc case. e.g. COPY, other bulk ops, HOT etc..

I wonder if there is merit in having an XLogInsertMulti() which inserts
multiple records in a batch as a way of reducing WALInsertLock traffic.
It might be possible to piggyback FSM records onto the main heap
changes.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-28 16:02:31
Message-ID: 1225209751.3971.196.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Tue, 2008-10-28 at 15:35 +0000, Simon Riggs wrote:

> I wonder if there is merit in having an XLogInsertMulti() which inserts
> multiple records in a batch as a way of reducing WALInsertLock traffic.
> It might be possible to piggyback FSM records onto the main heap
> changes.

Or possibly an XLogInsertDeferred() which just queues up some work so
the next time we call XLogInsert() it will insert the deferred work as
well as the main work all in one lock cycle. It would only be usable for
low priority info like FSM stuff that isn't needed for recovery. Maybe
we could do that with hints also.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-28 16:12:38
Message-ID: 490739F6.3040103@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Tue, 2008-10-28 at 16:22 +0200, Heikki Linnakangas wrote:
>
>> Arbitrarily, if after a
>> heap insert/update there's less than 20% of free space on the page,
>> the FSM is updated. Compared to updating it every time, that saves a
>> lot of overhead, while doing a pretty good job at marking full pages
>> as full in the FSM. My first thought was to update the FSM if there
>> isn't enough room on the page for a new tuple of the same size as the
>> one just
>> inserted; that would be pretty close to the logic we have during
>> normal
>> operation, where the FSM is updated when the tuple that we're about
>> to
>> insert doesn't fit on the page. But because we don't know the
>> fillfactor
>> during recovery, I don't think we can do reliably.
>
> With HOT, we tend to hover around the nearly-full state, so this seems
> like it will trigger repeatedly.

Hmm, true. Perhaps we should skip updating the FSM on HOT updates. After
recovery, the new HOT-updated tuples are prunable anyway, so for
inserting a new tuple, the page is almost as good as it was before the
HOT update.

> Is it possible that we could put an extra field onto a heap_clean record
> to show remaining space. We would use it only for VACUUMs, not HOT, just
> as we do now.

Sure, we could do that. I'm more worried about "killing" the pages from
the FSM that are full, though, than keeping track of pages with plenty
of free space accurately.

> I wonder if there is merit in having an XLogInsertMulti() which inserts
> multiple records in a batch as a way of reducing WALInsertLock traffic.
> It might be possible to piggyback FSM records onto the main heap
> changes.

Umm, in the version that was finally committed, FSM doesn't generate any
extra WAL records (except for truncation).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-28 16:16:16
Message-ID: 7992.1225210576@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Or possibly an XLogInsertDeferred() which just queues up some work so
> the next time we call XLogInsert() it will insert the deferred work as
> well as the main work all in one lock cycle. It would only be usable for
> low priority info like FSM stuff that isn't needed for recovery. Maybe
> we could do that with hints also.

If it isn't needed for recovery, why would we be logging it at all?

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-28 16:45:14
Message-ID: 1225212314.3971.206.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Tue, 2008-10-28 at 12:16 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > Or possibly an XLogInsertDeferred() which just queues up some work so
> > the next time we call XLogInsert() it will insert the deferred work as
> > well as the main work all in one lock cycle. It would only be usable for
> > low priority info like FSM stuff that isn't needed for recovery. Maybe
> > we could do that with hints also.
>
> If it isn't needed for recovery, why would we be logging it at all?

You just agreed that the info didn't need to be very accurate. There's a
few things on the server that aren't needed for recovery, but it might
be useful if they were logged occasionally to give roughly correct
values.

Contention on WALInsertLock seems to be a problem, yet writing WAL to
disk is not a bottleneck. Deferring writing it slightly to allow things
to be batched might be one way of smoothing the impact of that type of
operation. That might be better than a heuristic method of reducing the
number of inserts.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-28 17:10:17
Message-ID: 49074779.2080309@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> As far as the ugliness in XLogRecordPageWithFreeSpace goes: couldn't you
> just call XLogReadBufferWithFork with init = true, and then initialize
> the page if PageIsNew?

With init=true, we don't even try to read the page from the disk (since
8.3), so all FSM pages accessed during recovery would be zeroed out. I
don't think that's what you intended.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-28 17:36:30
Message-ID: 9672.1225215390@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> As far as the ugliness in XLogRecordPageWithFreeSpace goes: couldn't you
>> just call XLogReadBufferWithFork with init = true, and then initialize
>> the page if PageIsNew?

> With init=true, we don't even try to read the page from the disk (since
> 8.3), so all FSM pages accessed during recovery would be zeroed out. I
> don't think that's what you intended.

Ah, right. Maybe the API change you suggested in the comment is the
way to go.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-29 12:12:23
Message-ID: 49085327.5010608@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> As far as the ugliness in XLogRecordPageWithFreeSpace goes: couldn't you
>>> just call XLogReadBufferWithFork with init = true, and then initialize
>>> the page if PageIsNew?
>
>> With init=true, we don't even try to read the page from the disk (since
>> 8.3), so all FSM pages accessed during recovery would be zeroed out. I
>> don't think that's what you intended.
>
> Ah, right. Maybe the API change you suggested in the comment is the
> way to go.

Done, patch attached. But while I was hacking that, I realized another
problem:

Because changes to FSM pages are not WAL-logged, they can be "torn" if
at crash, one part of the page is flushed to disk, but another is not.
The FSM code will recover from internally inconsistent pages, caused by
torn pages or other errors, but we still have a problem if the FSM file
is extended, and the new page is torn. It can happen that the first part
of the page, containing the page header, doesn't make it to disk, but
other parts of the page do. ReadBuffer() checks that the page header is
valid, so it will throw an error on a torn page like that. ReadBuffer()
doesn't complain about a page that is all-zeros, but it's not in this
scenario.

The FSM would be perfectly happy to just initialize torn or otherwise
damaged pages, so I think we should add yet another mode to ReadBuffer()
to allow that. We could also treat read() errors as merely warnings in
that mode, effectively the same as with zero_damaged_pages=on.

The ReadBuffer() interface is already pretty complex, with all the
different variants. We should probably keep the good old ReadBuffer()
the same, for the sake of simplicity in the callers, but try to reduce
the number of other variatns.

The current API is this:

Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
Buffer ReadBufferWithFork(Relation reln, ForkNumber forkNum, BlockNumber
blockNum);
Buffer ReadBufferWithStrategy(Relation reln, BlockNumber blockNum,
BufferAccessStrategy strategy);
Buffer ReadOrZeroBuffer(Relation reln, ForkNumber forkNum, BlockNumber
blockNum);
Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
ForkNumber forkNum, BlockNumber blockNum, bool zeroPage);

Here's my proposal for new API:

typedef enum
{
RBM_NORMAL, /* checks header, ereport(ERROR) on errors */
RBM_INIT, /* just allocate a buffer, don't read from disk. Caller
must initialize the page */
RBM_INIT_ON_ERROR /* read, but instead of ERRORing, return an
all-zeros page */
} ReadBufferMode;

Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
Buffer ReadBufferExt(Relation reln, ForkNumber forkNum, BlockNumber
blockNum, BufferAccessStrategy strategy, ReadBufferMode mode);
Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp,
ForkNumber forkNum, BlockNumber blockNum, ReadBufferMode mode);

Thoughts?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
fsm-updateonrecovery-2.patch text/x-diff 13.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-29 12:36:54
Message-ID: 11303.1225283814@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> The FSM would be perfectly happy to just initialize torn or otherwise
> damaged pages, so I think we should add yet another mode to ReadBuffer()
> to allow that. We could also treat read() errors as merely warnings in
> that mode, effectively the same as with zero_damaged_pages=on.

> The ReadBuffer() interface is already pretty complex, with all the
> different variants. We should probably keep the good old ReadBuffer()
> the same, for the sake of simplicity in the callers, but try to reduce
> the number of other variatns.

Indeed. Did you see the discussion about the similarly-too-complex
heap_insert API a couple days ago in connection with bulk-write
scenarios? The conclusion there was to try to shift stuff into a
bitmask options argument, in hopes that future additions might not
require touching every caller. Can we do it similarly here?

regards, tom lane


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-29 13:17:18
Message-ID: 603c8f070810290617rcdc4da4yc2cf88deae4b5ae3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
> Buffer ReadBufferExt(Relation reln, ForkNumber forkNum, BlockNumber
> blockNum, BufferAccessStrategy strategy, ReadBufferMode mode);
> Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp, ForkNumber
> forkNum, BlockNumber blockNum, ReadBufferMode mode);
>
> Thoughts?

I'm not sure why we would abbreviate Extended to Ext when nothing else
in here is abbreviated. Seems needlessly inconsistent.

We may also want to rethink our approach to BufferAccessStrategy a
bit. Right now, we don't admit that
GetBufferAccessStrategy(BAS_NORMAL) just returns a NULL pointer - we
expect the caller to get that strategy and later call
FreeBufferAccessStrategy it just as if it were a real object.
Particularly in light of this API change, I think we should just give
up on that. Otherwise, a caller that wants to specify a fork number
or ReadBufferMode has to get and free an access strategy that doesn't
amount to anything. Perhaps it would be sufficient to do this:

#define NormalBufferAccessStrategy NULL

That way, it would be easy to grep for any place where we used this to
get around a useless pair of get/free calls if we ever need to go back
and make a normal buffer access strategy into a real object.

...Robert


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Updating FSM on recovery
Date: 2008-10-30 08:40:05
Message-ID: 490972E5.1010605@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> The ReadBuffer() interface is already pretty complex, with all the
>> different variants. We should probably keep the good old ReadBuffer()
>> the same, for the sake of simplicity in the callers, but try to reduce
>> the number of other variatns.
>
> Indeed. Did you see the discussion about the similarly-too-complex
> heap_insert API a couple days ago in connection with bulk-write
> scenarios? The conclusion there was to try to shift stuff into a
> bitmask options argument, in hopes that future additions might not
> require touching every caller. Can we do it similarly here?

Hmm. I think an enum is better than a bitmask here. At the moment, we
need three different modes of operation:
1. Read the page as usual, throw an error on corrupted page (ReadBuffer())
2. Read the page, zero page on corruption (this is new)
3. Don't read the page from disk, just allocate a buffer.
(ReadOrZeroBuffer())

If we turned this into a bitmask, what would the bits be? Perhaps:

DONT_READ /* don't read the page from disk, just allocate buffer */
NO_ERROR_ON_CORRUPTION /* don't throw an error if page is corrupt */

With two bits, there's four different combinations. I don't think the
DONT_READ | NO_ERROR_ON_CORRUPTION combination makes much sense. Also,
negative arguments like that can be confusing, but if we inverted the
meanings, most callers would have to pass both flags to get the normal
behavior.

Looking into the crystal ball, there's two forthcoming features to the
interface that I can see:
1. Pin the buffer if the page is in buffer cache. If it's not, do
nothing. This is what Simon proposed for the B-tree vacuum interlocking,
and I can see that it might be useful elsewhere as well.
2. The posix_fadvise() thing. Or async I/O. It looks like it's going to
be a separate function you call before ReadBuffer(), but it could also
be implemented as a new mode to ReadBuffer() that just allocates a
buffer, issues a posix_fadvise(), and returns. You would then pass the
Buffer to another function to finish the read and make the contents of
the buffer valid.

Neither of these fits too well with the bitmask. Neither would make
sense with DONT_READ or NO_ERROR_ON_CORRUPTION.

So, attached is a patch using an enum. Barring objections, I'll commit this.

There is a conflict with Simon's hot standby patch, though. Simon's
patch adds yet another argument to XLogReadBufferWithFork(), to indicate
whether a normal exclusive lock or a cleanup lock is taken on the
buffer. I'm inclined to change the interface of XLogReadBufferExtended
(as it's now called, after this patch) so that it only pins the page,
and leave the locking to the caller.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
readbuffer-refactor-1.patch text/x-diff 45.7 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-30 09:57:58
Message-ID: 1225360678.3971.362.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2008-10-30 at 10:40 +0200, Heikki Linnakangas wrote:

> So, attached is a patch using an enum. Barring objections, I'll commit
> this.

I probably agree with the changes from reading your post, but I'd ask
that you hang fire on committing this for a few days.

It's just going to prevent Koichi and myself from submitting clean
patches on F-Day, or it will cause us to spend time on rework before
we've even submitted the patch. I'd like to avoid the pileup for now,
though don't have any problem with the rework after that point.

Thanks,

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Updating FSM on recovery
Date: 2008-10-30 10:04:57
Message-ID: 1225361097.3971.367.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Thu, 2008-10-30 at 09:57 +0000, Simon Riggs wrote:
> On Thu, 2008-10-30 at 10:40 +0200, Heikki Linnakangas wrote:
>
> > So, attached is a patch using an enum. Barring objections, I'll commit
> > this.
>
> I probably agree with the changes from reading your post, but I'd ask
> that you hang fire on committing this for a few days.

Best thing from here is for me to just freeze my tree for next few days.
It will make my submission a few days out of date, but we can fix that
up fairly quickly.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Updating FSM on recovery
Date: 2008-10-30 10:52:50
Message-ID: 87iqracq2l.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:

> Hmm. I think an enum is better than a bitmask here. At the moment, we need
> three different modes of operation:
> 1. Read the page as usual, throw an error on corrupted page (ReadBuffer())
> 2. Read the page, zero page on corruption (this is new)

Is this new? Would it make sense for zero_damaged_pages to use this? Perhaps
the enum should have an option to error on damaged pages, warn and zero
damaged pages, or just zero damaged pages.

We might also want different behaviour for pages for which the crc doesn't
match versus pages that have nonsensical page headers.

> 3. Don't read the page from disk, just allocate a buffer. (ReadOrZeroBuffer())

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!