Re: Streaming replication, some small issues

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Streaming replication, some small issues
Date: 2009-12-08 08:30:59
Message-ID: 4B1E0EC3.8020902@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

A couple of small issues spotted while reviewing the streaming
replication patch:

- Because sentPtr is initialized to zeros, GetOldestWALSendPointer will
return zero before a just-launched WAL sender has sent its first
message. That can lead to WAL files that are still needed by another
standby to be deleted prematurely.

- If a WAL file is not found in the master for some reason, standby goes
into an infinite loop retrying it:

ERROR: could not read xlog records: FATAL: could not open file
"pg_xlog/000000010000000000000000" (log file 0, segment 0): No such file
or directory

ERROR: could not read xlog records: FATAL: could not open file
"pg_xlog/000000010000000000000000" (log file 0, segment 0): No such file
or directory

ERROR: could not read xlog records: FATAL: could not open file
"pg_xlog/000000010000000000000000" (log file 0, segment 0): No such file
or directory

...

- It's possible to shut down master, change max_wal_senders to 0,
restart and do an operation like CLUSTER which then skips WAL-logging.
Then shutdown, change max_wal_senders back to non-zero. All this while
the standby is running. Leads to a corrupt standby.

I've also pushed a couple of small cosmetic changes to replication
branch at git://git.postgresql.org/git/users/heikki/postgres.git

I'll continue reviewing...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, some small issues
Date: 2009-12-08 11:38:31
Message-ID: 3f0b79eb0912080338m71505de4g1aa61e6229fc1666@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 8, 2009 at 5:30 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> A couple of small issues spotted while reviewing the streaming
> replication patch:

Thanks for the review!

> - Because sentPtr is initialized to zeros, GetOldestWALSendPointer will
> return zero before a just-launched WAL sender has sent its first
> message. That can lead to WAL files that are still needed by another
> standby to be deleted prematurely.

Oops! I fixed that (in my git repository, see the bottom of this mail).

> - If a WAL file is not found in the master for some reason, standby goes
> into an infinite loop retrying it:
>
> ERROR:  could not read xlog records: FATAL:  could not open file
> "pg_xlog/000000010000000000000000" (log file 0, segment 0): No such file
> or directory

http://archives.postgresql.org/pgsql-hackers/2009-09/msg01393.php
>> walreceiver shouldn't die on connection error, just to be restarted by
>> startup process. Can we add error handling a la bgwriter and have a
>> retry loop within walreceiver?

As the result of your current and previous comment, you mean that
walreceiver should always retry connecting to the primary after
a connection error occurs in PQgetXLogData/PQputXLogRecPtr, and
exit after the other errors occur? Though I'm not sure whether
we can determine the error type precisely.

> - It's possible to shut down master, change max_wal_senders to 0,
> restart and do an operation like CLUSTER which then skips WAL-logging.
> Then shutdown, change max_wal_senders back to non-zero. All this while
> the standby is running. Leads to a corrupt standby.

I've regarded this case as a restriction. But, how do you think
we should cope with it?

1. Restriction: only documentation is required?
2. Needs safe guard:
- forbid the primary to perform such operations while the
standby is running?
- emit PANIC error on the standby if the primary which lost sync
restarts?
3. Full solution: automatic resync mechanism is required?

> I've also pushed a couple of small cosmetic changes to replication
> branch at git://git.postgresql.org/git/users/heikki/postgres.git

Your changes seem good.

I pulled and merged your changes into my repository:

git://git.postgresql.org/git/users/fujii/postgres.git
branch: replication

And, I pushed the capability of replication of a backup history file
into the repository.

> I'll continue reviewing...

Thanks a lot!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, some small issues
Date: 2009-12-08 11:54:58
Message-ID: 407d949e0912080354t5b1f3b8bscf7c889dd17c6b39@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 8, 2009 at 8:30 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> - It's possible to shut down master, change max_wal_senders to 0,
> restart and do an operation like CLUSTER which then skips WAL-logging.
> Then shutdown, change max_wal_senders back to non-zero. All this while
> the standby is running. Leads to a corrupt standby.

The same thing is possible with archived logs as well, no?

I suspect we should have a WAL record to say "unlogged operation
performed here" which a standby database would recognize and throw a
large warning up. The only reason I say warning is because it might be
reasonable if the relation is some temporary ETL table which isn't
needed in the standby. Perhaps if we note the relation affected we
could throw an error iff the standby is activated with the relation
still existing.

--
greg


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, some small issues
Date: 2009-12-08 12:05:03
Message-ID: 4B1E40EF.2020405@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> On Tue, Dec 8, 2009 at 8:30 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> - It's possible to shut down master, change max_wal_senders to 0,
>> restart and do an operation like CLUSTER which then skips WAL-logging.
>> Then shutdown, change max_wal_senders back to non-zero. All this while
>> the standby is running. Leads to a corrupt standby.
>
> The same thing is possible with archived logs as well, no?

Yeah, I think you're right.

> I suspect we should have a WAL record to say "unlogged operation
> performed here" which a standby database would recognize and throw a
> large warning up.

+1. Seems like a very simple solution.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, some small issues
Date: 2009-12-09 01:01:22
Message-ID: 3f0b79eb0912081701t4adbde40s4aa63b1f0df60196@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 8, 2009 at 9:05 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> I suspect we should have a WAL record to say "unlogged operation
>> performed here" which a standby database would recognize and throw a
>> large warning up.
>
> +1. Seems like a very simple solution.

Sounds good. This is not just a problem of SR, so I'll implement it
as self-contained feature later.

Design:
- If relation is not temp and archiving (and streaming replication) is enabled,
we log the "unlogged OP" record including relfilenode of the relation.

- If "unlogged OP" record is found during archive recovery, we register its
relfilenode to the hashtable which tracks maybe corrupted relations.
If the registered relfilenode is brandnew, we emit warning. Also, the log
record indicating "DROP TABLE" etc is found, we remove its relfilenode
from the hashtable.

- When restartpoint occurs, we write all the registered relfilenodes to the
flat file.

- At the end of archive recovery, if there is relfilenode in the hashtable, we
emit FATAL error to prevent the server from being brought up.
XXX: But this might be too conservative. I believe that some people want
to complete archive recovery even if a relation is corrupted, and drop that
relation after the server has been activated. So I'm going to provide new
recovery.conf parameter specifying whether to let archive recovery fail
when some relations might be corrupted.

Thought? Am I missing something?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, some small issues
Date: 2009-12-09 01:12:25
Message-ID: 18363.1260321145@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> Thought? Am I missing something?

This seems terribly overdesigned. Just emit a warning when you see
the "unlogged op" record and have done.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, some small issues
Date: 2009-12-09 01:51:58
Message-ID: 3f0b79eb0912081751s46a0c88dw5d6021336afce750@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 9, 2009 at 10:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> Thought? Am I missing something?
>
> This seems terribly overdesigned.  Just emit a warning when you see
> the "unlogged op" record and have done.

Sounds quite simple. OK, I'll do so.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, some small issues
Date: 2009-12-09 09:25:09
Message-ID: 3f0b79eb0912090125h75283c34ta4591a31e826b08@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 9, 2009 at 10:51 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Dec 9, 2009 at 10:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>> Thought? Am I missing something?
>>
>> This seems terribly overdesigned.  Just emit a warning when you see
>> the "unlogged op" record and have done.
>
> Sounds quite simple. OK, I'll do so.

Here is the patch:

- Write an XLOG UNLOGGED record in WAL if WAL-logging is skipped for only
the reason that WAL archiving is not enabled and such record has not been
written yet.

- Cause archive recovery to end if an XLOG UNLOGGED record is found during
it.

I add this patch to the CommitFest 2010-01.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
log_unlogged_op.patch text/x-patch 7.1 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication, some small issues
Date: 2009-12-11 08:01:15
Message-ID: 3f0b79eb0912110001m63c166e8x4eaf7ed0f20137a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 8, 2009 at 5:30 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> - If a WAL file is not found in the master for some reason, standby goes
> into an infinite loop retrying it:
>
> ERROR:  could not read xlog records: FATAL:  could not open file
> "pg_xlog/000000010000000000000000" (log file 0, segment 0): No such file
> or directory

I also fixed this problem.

git://git.postgresql.org/git/users/fujii/postgres.git
branch: replication

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center