Quick Links

Re: bug of recovery?

Lists:	pgsql-hackers

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	bug of recovery?
Date:	2011-09-26 09:50:06
Message-ID:	CAHGQGwEEn13awaWZA-fMb76fNXASLLpqmUY8N-sK55ApZy-ErA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Currently, if a reference to an invalid page is found during recovery,
its information
is saved in hash table "invalid_page_tab". Then, if such a reference
is resolved,
its information is removed from the hash table. If there is unresolved
reference to
an invalid page in the hash table at the end of recovery, PANIC error occurs.

What I'm worried about is that the hash table is volatile. If a user restarts
the server before reaching end of recovery, any information in the
hash table is lost,
and we wrongly miss the PANIC error case because we cannot find any unresolved
reference. That is, even if database is corrupted at the end of recovery,
a user might not be able to notice that. This looks like a serious problem. No?

To prevent the above problem, we should write the contents of the hash table to
the disk for every restartpoints, I think. Then, when the server
starts recovery,
it should reload the hash table from the disk. Thought? Am I missing something?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-26 17:15:18
Message-ID:	8CFFA577-7145-473C-9E2B-8ACE08D5C16C@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep26, 2011, at 11:50 , Fujii Masao wrote:
> Currently, if a reference to an invalid page is found during recovery,
> its information
> is saved in hash table "invalid_page_tab". Then, if such a reference
> is resolved,
> its information is removed from the hash table. If there is unresolved
> reference to
> an invalid page in the hash table at the end of recovery, PANIC error occurs.
>
> What I'm worried about is that the hash table is volatile. If a user restarts
> the server before reaching end of recovery, any information in the
> hash table is lost,
> and we wrongly miss the PANIC error case because we cannot find any unresolved
> reference. That is, even if database is corrupted at the end of recovery,
> a user might not be able to notice that. This looks like a serious problem. No?
>
> To prevent the above problem, we should write the contents of the hash table to
> the disk for every restartpoints, I think. Then, when the server
> starts recovery,
> it should reload the hash table from the disk. Thought? Am I missing something?

Shouldn't references to invalid pages only occur before we reach a consistent
state? If so, the right fix would be to check whether all invalid page references
have been resolved after we've reached a consistent state, and to skip creating
restart points while there're unresolved page references.

best regards,
Florian Pflug

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-26 19:06:41
Message-ID:	CA+U5nML8aK6iMpnaXf=vGYLCRkdqKUFJ6nTLTjEVpLj6oYYTRA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Sep 26, 2011 at 10:50 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> Currently, if a reference to an invalid page is found during recovery,
> its information
> is saved in hash table "invalid_page_tab". Then, if such a reference
> is resolved,
> its information is removed from the hash table. If there is unresolved
> reference to
> an invalid page in the hash table at the end of recovery, PANIC error occurs.
>
> What I'm worried about is that the hash table is volatile. If a user restarts
> the server before reaching end of recovery, any information in the
> hash table is lost,
> and we wrongly miss the PANIC error case because we cannot find any unresolved
> reference. That is, even if database is corrupted at the end of recovery,
> a user might not be able to notice that. This looks like a serious problem. No?
>
> To prevent the above problem, we should write the contents of the hash table to
> the disk for every restartpoints, I think. Then, when the server
> starts recovery,
> it should reload the hash table from the disk. Thought? Am I missing something?

That doesn't happen because the when we stop the server it will
restart from a valid restartpoint - one where there is no in-progress
multi-phase operation.

So when it replays it will always replay both parts of the operation.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-26 20:39:16
Message-ID:	889.1317069556@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> On Mon, Sep 26, 2011 at 10:50 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> To prevent the above problem, we should write the contents of the hash table to
>> the disk for every restartpoints, I think. Then, when the server
>> starts recovery,
>> it should reload the hash table from the disk. Thought? Am I missing something?

> That doesn't happen because the when we stop the server it will
> restart from a valid restartpoint - one where there is no in-progress
> multi-phase operation.

Not clear that that's true. The larger point though is that the
invalid-page table is only interesting during crash recovery --- once
you've reached a consistent state, it should be empty and remain so.
So I see no particular value in Fujii's proposal of logging the table to
disk during standby mode.

It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
we (think we) have reached consistency, rather than leaving it to be
done only when we exit recovery mode.

regards, tom lane

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-26 22:28:33
Message-ID:	F4BDC274-D382-4BC2-A093-282E29523DD8@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep26, 2011, at 22:39 , Tom Lane wrote:
> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
> we (think we) have reached consistency, rather than leaving it to be
> done only when we exit recovery mode.

I believe we also need to prevent the creation of restart points before
we've reached consistency. If we're starting from an online backup,
and a checkpoint occurred between pg_start_backup() and pg_stop_backup(),
we currently create a restart point upon replaying that checkpoint's
xlog record. At that point, however, unresolved page references are
not an error, since a truncation that happened after the checkpoint
(but before pg_stop_backup()) might or might not be reflected in the
online backup.

best regards,
Florian Pflug

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-27 00:49:51
Message-ID:	CAHGQGwH2O=-1etYFzE10ftsOsJQEU_RbGcD8PuE4w5CSqkvMMg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 27, 2011 at 7:28 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Sep26, 2011, at 22:39 , Tom Lane wrote:
>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
>> we (think we) have reached consistency, rather than leaving it to be
>> done only when we exit recovery mode.
>
> I believe we also need to prevent the creation of restart points before
> we've reached consistency. If we're starting from an online backup,
> and a checkpoint occurred between pg_start_backup() and pg_stop_backup(),
> we currently create a restart point upon replaying that checkpoint's
> xlog record. At that point, however, unresolved page references are
> not an error, since a truncation that happened after the checkpoint
> (but before pg_stop_backup()) might or might not be reflected in the
> online backup.

Preventing the creation of restartpoints before reaching consistent point
sounds fragile to the case where the backup takes very long time. It might
also take very long time to reach consistent point when replaying from that
backup. Which prevents also the removal of WAL files (e.g., streamed from
the master server) for a long time, and then might cause disk full failure.

ISTM that writing an invalid-page table to the disk for every restartpoints is
better approach. If an invalid-page table is never updated after we've
reached consistency point, we probably should make restartpoints write
that table only after that point. And, if a reference to an invalid
page is found
after the consistent point, we should emit error and cancel a recovery.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-27 04:05:44
Message-ID:	13861.1317096344@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> ISTM that writing an invalid-page table to the disk for every restartpoints is
> better approach.

I still say that's uncalled-for overkill. The invalid-page table is not
necessary for recovery, it's only a debugging cross-check. You're more
likely to introduce bugs than fix any by adding a mechanism like that.

regards, tom lane

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-27 05:00:14
Message-ID:	CAHGQGwGwVrsXGKDd5MSO-m3sG+RZ_5zNPvLYHUPD+KLAXw907Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 27, 2011 at 1:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> ISTM that writing an invalid-page table to the disk for every restartpoints is
>> better approach.
>
> I still say that's uncalled-for overkill. The invalid-page table is not
> necessary for recovery, it's only a debugging cross-check.

If so, there is no risk even if the invalid-page table is lost and the check
is skipped unexpectedly?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-27 05:54:58
Message-ID:	4E816532.4050904@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 26.09.2011 21:06, Simon Riggs wrote:
> On Mon, Sep 26, 2011 at 10:50 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>
>> Currently, if a reference to an invalid page is found during recovery,
>> its information
>> is saved in hash table "invalid_page_tab". Then, if such a reference
>> is resolved,
>> its information is removed from the hash table. If there is unresolved
>> reference to
>> an invalid page in the hash table at the end of recovery, PANIC error occurs.
>>
>> What I'm worried about is that the hash table is volatile. If a user restarts
>> the server before reaching end of recovery, any information in the
>> hash table is lost,
>> and we wrongly miss the PANIC error case because we cannot find any unresolved
>> reference. That is, even if database is corrupted at the end of recovery,
>> a user might not be able to notice that. This looks like a serious problem. No?
>>
>> To prevent the above problem, we should write the contents of the hash table to
>> the disk for every restartpoints, I think. Then, when the server
>> starts recovery,
>> it should reload the hash table from the disk. Thought? Am I missing something?
>
> That doesn't happen because the when we stop the server it will
> restart from a valid restartpoint - one where there is no in-progress
> multi-phase operation.
>
> So when it replays it will always replay both parts of the operation.

I think you're mixing this up with the multi-page page split operations
in b-tree. This is different from that. What the "invalid_page_tab" is
for is the situation where you for example, insert to a page on table X,
and later table X is dropped, and then you crash. On WAL replay, you
will see the insert record, but the file for the table doesn't exist,
because the table was dropped. In that case we skip the insert, note
what happened in invalid_page_tab, and move on with recovery. When we
see the later record to drop the table, we know it was OK that the file
was missing earlier. But if we don't see it before end of recovery, we
PANIC, because then the file should've been there.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-27 05:59:44
Message-ID:	4E816650.80607@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 27.09.2011 00:28, Florian Pflug wrote:
> On Sep26, 2011, at 22:39 , Tom Lane wrote:
>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
>> we (think we) have reached consistency, rather than leaving it to be
>> done only when we exit recovery mode.
>
> I believe we also need to prevent the creation of restart points before
> we've reached consistency.

Seems reasonable. We could still allow restartpoints when the hash table
is empty, though. And once we've reached consistency, we can throw an
error immediately in log_invalid_page(), instead of adding the entry in
the hash table.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-27 07:00:50
Message-ID:	CA+U5nMJz3iH2S=D3kh9-4+Def0MaZwrPsRVD7UH1GXvffGKLEA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 27, 2011 at 6:54 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> I think you're mixing this up with the multi-page page split operations in
> b-tree. This is different from that. What the "invalid_page_tab" is for is
> the situation where you for example, insert to a page on table X, and later
> table X is dropped, and then you crash. On WAL replay, you will see the
> insert record, but the file for the table doesn't exist, because the table
> was dropped. In that case we skip the insert, note what happened in
> invalid_page_tab, and move on with recovery. When we see the later record to
> drop the table, we know it was OK that the file was missing earlier. But if
> we don't see it before end of recovery, we PANIC, because then the file
> should've been there.

OK, yes, I see. Thanks.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-27 11:06:39
Message-ID:	DC50F45C-95E9-4F9A-B835-B6BDEEFA4EEC@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep27, 2011, at 07:59 , Heikki Linnakangas wrote:
> On 27.09.2011 00:28, Florian Pflug wrote:
>> On Sep26, 2011, at 22:39 , Tom Lane wrote:
>>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
>>> we (think we) have reached consistency, rather than leaving it to be
>>> done only when we exit recovery mode.
>>
>> I believe we also need to prevent the creation of restart points before
>> we've reached consistency.
>
> Seems reasonable. We could still allow restartpoints when the hash table is empty, though. And once we've reached consistency, we can throw an error immediately in log_invalid_page(), instead of adding the entry in the hash table.

That mimics the way the rm_safe_restartpoint callbacks work, which is good.

Actually, why don't we use that machinery to implement this? There's currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need to create one that checks whether invalid_page_tab is empty.

best regards,
Florian Pflug

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-29 11:31:11
Message-ID:	CAHGQGwHDmzMs-GjYw-k=Ob+YSeK9c9KpDcBzbm1BDNNp5xcb1g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Sep27, 2011, at 07:59 , Heikki Linnakangas wrote:
>> On 27.09.2011 00:28, Florian Pflug wrote:
>>> On Sep26, 2011, at 22:39 , Tom Lane wrote:
>>>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
>>>> we (think we) have reached consistency, rather than leaving it to be
>>>> done only when we exit recovery mode.
>>>
>>> I believe we also need to prevent the creation of restart points before
>>> we've reached consistency.
>>
>> Seems reasonable. We could still allow restartpoints when the hash table is empty, though. And once we've reached consistency, we can throw an error immediately in log_invalid_page(), instead of adding the entry in the hash table.
>
> That mimics the way the rm_safe_restartpoint callbacks work, which is good.
>
> Actually, why don't we use that machinery to implement this? There's currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need to create one that checks whether invalid_page_tab is empty.

Okay, the attached patch prevents the creation of restartpoints by using
rm_safe_restartpoint callback if we've not reached a consistent state yet
and the invalid-page table is not empty. But the invalid-page table is not
tied to the specific resource manager, so using rm_safe_restartpoint for
that seems to slightly odd. Is this OK?

Also, according to other suggestions, the patch changes XLogCheckInvalidPages()
so that it's called as soon as we've reached a consistent state, and changes
log_invalid_page() so that it emits PANIC immediately if consistency is already
reached. These are very good changes, I think. Because they enable us to
notice serious problem which causes PANIC error immediately. Without these
changes, you unfortunately might notice that the standby database is corrupted
when failover happens. Though such a problem might rarely happen, I think it's
worth doing those changes.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
invalid_page_table_v1.patch	text/x-patch	6.5 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-29 11:49:11
Message-ID:	CA+U5nMKDoseVh4usTz7B-N_XCFq-JHMFy2Q6=epy2udsS=e+yA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Sep 29, 2011 at 12:31 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> On Sep27, 2011, at 07:59 , Heikki Linnakangas wrote:
>>> On 27.09.2011 00:28, Florian Pflug wrote:
>>>> On Sep26, 2011, at 22:39 , Tom Lane wrote:
>>>>> It might be worthwhile to invoke XLogCheckInvalidPages() as soon as
>>>>> we (think we) have reached consistency, rather than leaving it to be
>>>>> done only when we exit recovery mode.
>>>>
>>>> I believe we also need to prevent the creation of restart points before
>>>> we've reached consistency.
>>>
>>> Seems reasonable. We could still allow restartpoints when the hash table is empty, though. And once we've reached consistency, we can throw an error immediately in log_invalid_page(), instead of adding the entry in the hash table.
>>
>> That mimics the way the rm_safe_restartpoint callbacks work, which is good.
>>
>> Actually, why don't we use that machinery to implement this? There's currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need to create one that checks whether invalid_page_tab is empty.
>
> Okay, the attached patch prevents the creation of restartpoints by using
> rm_safe_restartpoint callback if we've not reached a consistent state yet
> and the invalid-page table is not empty. But the invalid-page table is not
> tied to the specific resource manager, so using rm_safe_restartpoint for
> that seems to slightly odd. Is this OK?
>
> Also, according to other suggestions, the patch changes XLogCheckInvalidPages()
> so that it's called as soon as we've reached a consistent state, and changes
> log_invalid_page() so that it emits PANIC immediately if consistency is already
> reached. These are very good changes, I think. Because they enable us to
> notice serious problem which causes PANIC error immediately. Without these
> changes, you unfortunately might notice that the standby database is corrupted
> when failover happens. Though such a problem might rarely happen, I think it's
> worth doing those changes.

Patch does everything we agreed it should.

Good suggestion from Florian.

This worries me slightly now though because the patch makes us PANIC
in a place we didn't used to and once we do that we cannot restart the
server at all. Are we sure we want that? It's certainly a great way to
shake down errors in other code...

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Florian Pflug <fgp(at)phlo(dot)org>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-29 14:12:30
Message-ID:	75A06739-4819-4460-A1B2-860309EDEE15@phlo.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sep29, 2011, at 13:49 , Simon Riggs wrote:
> This worries me slightly now though because the patch makes us PANIC
> in a place we didn't used to and once we do that we cannot restart the
> server at all. Are we sure we want that? It's certainly a great way to
> shake down errors in other code...

The patch only introduces a new PANIC condition during archive recovery,
though. Crash recovery is unaffected, except that we no longer create
restart points before we reach consistency.

Also, if we hit an invalid page reference after reaching consistency,
the cause is probably either a bug in our recovery code, or (quite unlikely)
a corrupted WAL that passed the CRC check. In both cases, the likelyhood
of data-corruption seems high, so PANICing seems like the right thing to do.

best regards,
Florian Pflug

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Florian Pflug <fgp(at)phlo(dot)org>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-30 01:09:07
Message-ID:	CAHGQGwFGQccis3QNiu7CgmdtiMPUrjFUZeQer5S8aeZhLPHiHQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Sep 29, 2011 at 11:12 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Sep29, 2011, at 13:49 , Simon Riggs wrote:
>> This worries me slightly now though because the patch makes us PANIC
>> in a place we didn't used to and once we do that we cannot restart the
>> server at all. Are we sure we want that? It's certainly a great way to
>> shake down errors in other code...
>
> The patch only introduces a new PANIC condition during archive recovery,
> though. Crash recovery is unaffected, except that we no longer create
> restart points before we reach consistency.
>
> Also, if we hit an invalid page reference after reaching consistency,
> the cause is probably either a bug in our recovery code, or (quite unlikely)
> a corrupted WAL that passed the CRC check. In both cases, the likelyhood
> of data-corruption seems high, so PANICing seems like the right thing to do.

Fair enough.

We might be able to use FATAL or ERROR instead of PANIC because they
also cause all processes to exit when the startup process emits them.
For example, we now use FATAL to stop the server in recovery mode
when recovery is about to end before we've reached a consistent state.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-09-30 06:57:02
Message-ID:	CA+U5nMK+S+bqZwEpwQUaVoE+gzqspwq8rp-6TQPFP77kyGS1NQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 30, 2011 at 2:09 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Sep 29, 2011 at 11:12 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> On Sep29, 2011, at 13:49 , Simon Riggs wrote:
>>> This worries me slightly now though because the patch makes us PANIC
>>> in a place we didn't used to and once we do that we cannot restart the
>>> server at all. Are we sure we want that? It's certainly a great way to
>>> shake down errors in other code...
>>
>> The patch only introduces a new PANIC condition during archive recovery,
>> though. Crash recovery is unaffected, except that we no longer create
>> restart points before we reach consistency.
>>
>> Also, if we hit an invalid page reference after reaching consistency,
>> the cause is probably either a bug in our recovery code, or (quite unlikely)
>> a corrupted WAL that passed the CRC check. In both cases, the likelyhood
>> of data-corruption seems high, so PANICing seems like the right thing to do.
>
> Fair enough.
>
> We might be able to use FATAL or ERROR instead of PANIC because they
> also cause all processes to exit when the startup process emits them.
> For example, we now use FATAL to stop the server in recovery mode
> when recovery is about to end before we've reached a consistent state.

I think we should issue PANIC if the source is a critical rmgr, or
just WARNING if from a non-critical rmgr, such as indexes.

Ideally, I think we should have a mechanism to allow indexes to be
marked corrupt. For example, a file that if present shows that the
index is corrupt and would be marked not valid. We can then create the
file and send a sinval message to force the index relcache to be
rebuilt showing valid set to false.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-10-03 05:23:25
Message-ID:	CAHGQGwGBMUoZ30x80HaROaMuKMSHBA1J4ccYm47+ehhK-u50Rg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 30, 2011 at 3:57 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I think we should issue PANIC if the source is a critical rmgr, or
> just WARNING if from a non-critical rmgr, such as indexes.
>
> Ideally, I think we should have a mechanism to allow indexes to be
> marked corrupt. For example, a file that if present shows that the
> index is corrupt and would be marked not valid. We can then create the
> file and send a sinval message to force the index relcache to be
> rebuilt showing valid set to false.

This seems not to be specific to the invalid-page table problem.
All error cases from a non-critical rmgr should be treated not-PANIC.
So I think that the idea should be implemented separately from
the patch I've posted.

Anyway what if read-only query accesses the index marked invalid?
Just emit ERROR?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-10-03 07:07:10
Message-ID:	CA+U5nMLW1khVOz3+N-qGfprueT0y9kAO2Rq4mPmNgv7hOnPBiA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 3, 2011 at 6:23 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> So I think that the idea should be implemented separately from
> the patch I've posted.

Agreed. I'll do a final review and commit today.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-10-03 07:21:26
Message-ID:	4E896276.7020209@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 29.09.2011 14:31, Fujii Masao wrote:
> On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug<fgp(at)phlo(dot)org> wrote:
>> Actually, why don't we use that machinery to implement this? There's currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need to create one that checks whether invalid_page_tab is empty.
>
> Okay, the attached patch prevents the creation of restartpoints by using
> rm_safe_restartpoint callback if we've not reached a consistent state yet
> and the invalid-page table is not empty. But the invalid-page table is not
> tied to the specific resource manager, so using rm_safe_restartpoint for
> that seems to slightly odd. Is this OK?

I don't think this should use the rm_safe_restartpoint machinery. As you
said, it's not tied to any specific resource manager. And I've actually
been thinking that we will get rid of rm_safe_restartpoint altogether in
the future. The two things that still use it are the b-tree and gin, and
I'd like to change both of those to not require any post-recovery
cleanup step to finish multi-page operations, similar to what I did with
GiST in 9.1.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-10-03 07:32:04
Message-ID:	CA+U5nMJewJBQ4xRDsMKNE1KPK2-iwNo9uNWhYrV3SOnYU73ZVw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 3, 2011 at 8:21 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 29.09.2011 14:31, Fujii Masao wrote:
>>
>> On Tue, Sep 27, 2011 at 8:06 PM, Florian Pflug<fgp(at)phlo(dot)org> wrote:
>>>
>>> Actually, why don't we use that machinery to implement this? There's
>>> currently no rm_safe_restartpoint callback for RM_XLOG_ID, so we'd just need
>>> to create one that checks whether invalid_page_tab is empty.
>>
>> Okay, the attached patch prevents the creation of restartpoints by using
>> rm_safe_restartpoint callback if we've not reached a consistent state yet
>> and the invalid-page table is not empty. But the invalid-page table is not
>> tied to the specific resource manager, so using rm_safe_restartpoint for
>> that seems to slightly odd. Is this OK?
>
> I don't think this should use the rm_safe_restartpoint machinery. As you
> said, it's not tied to any specific resource manager. And I've actually been
> thinking that we will get rid of rm_safe_restartpoint altogether in the
> future. The two things that still use it are the b-tree and gin, and I'd
> like to change both of those to not require any post-recovery cleanup step
> to finish multi-page operations, similar to what I did with GiST in 9.1.

I thought that was quite neat doing it that way, but there's no
specific reason to do it that way I guess. If you're happy to rewrite
the patch then I guess we're OK.

I certainly would like to get rid of rm_safe_restartpoint in the
longer term, hopefully sooner.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-10-04 06:43:30
Message-ID:	CAHGQGwFLAX09=CJdsNOUe513kwrBGfyop45b8EuHsHBXby1+5Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 3, 2011 at 4:32 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> I don't think this should use the rm_safe_restartpoint machinery. As you
>> said, it's not tied to any specific resource manager. And I've actually been
>> thinking that we will get rid of rm_safe_restartpoint altogether in the
>> future. The two things that still use it are the b-tree and gin, and I'd
>> like to change both of those to not require any post-recovery cleanup step
>> to finish multi-page operations, similar to what I did with GiST in 9.1.
>
> I thought that was quite neat doing it that way, but there's no
> specific reason to do it that way I guess. If you're happy to rewrite
> the patch then I guess we're OK.
>
> I certainly would like to get rid of rm_safe_restartpoint in the
> longer term, hopefully sooner.

Though Heikki might be already working on that,... anyway,
the attached patch is the version which doesn't use rm_safe_restartpoint
machinery.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
invalid_page_table_v2.patch	text/x-patch	7.1 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-11-24 13:59:23
Message-ID:	CA+U5nM+yp1usAJabYi=Wu0SLPoADogP97EN4p2UNrbWrcqe1oQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 4, 2011 at 7:43 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Oct 3, 2011 at 4:32 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> I don't think this should use the rm_safe_restartpoint machinery. As you
>>> said, it's not tied to any specific resource manager. And I've actually been
>>> thinking that we will get rid of rm_safe_restartpoint altogether in the
>>> future. The two things that still use it are the b-tree and gin, and I'd
>>> like to change both of those to not require any post-recovery cleanup step
>>> to finish multi-page operations, similar to what I did with GiST in 9.1.
>>
>> I thought that was quite neat doing it that way, but there's no
>> specific reason to do it that way I guess. If you're happy to rewrite
>> the patch then I guess we're OK.
>>
>> I certainly would like to get rid of rm_safe_restartpoint in the
>> longer term, hopefully sooner.
>
> Though Heikki might be already working on that,... anyway,
> the attached patch is the version which doesn't use rm_safe_restartpoint
> machinery.

Heikki - I see you are down on the CF app to review this.

I'd been working on it as well, just forgot to let Greg know.

Did you start already? Should I stop?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: bug of recovery?
Date:	2011-12-02 08:59:01
Message-ID:	4ED89355.90006@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 04.10.2011 09:43, Fujii Masao wrote:
> On Mon, Oct 3, 2011 at 4:32 PM, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
>>> I don't think this should use the rm_safe_restartpoint machinery. As you
>>> said, it's not tied to any specific resource manager. And I've actually been
>>> thinking that we will get rid of rm_safe_restartpoint altogether in the
>>> future. The two things that still use it are the b-tree and gin, and I'd
>>> like to change both of those to not require any post-recovery cleanup step
>>> to finish multi-page operations, similar to what I did with GiST in 9.1.
>>
>> I thought that was quite neat doing it that way, but there's no
>> specific reason to do it that way I guess. If you're happy to rewrite
>> the patch then I guess we're OK.
>>
>> I certainly would like to get rid of rm_safe_restartpoint in the
>> longer term, hopefully sooner.
>
> Though Heikki might be already working on that,...

Just haven't gotten around to it. It's a fair amount of work with little
user-visible benefit.

> anyway,
> the attached patch is the version which doesn't use rm_safe_restartpoint
> machinery.

Thanks, committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com