Re: Streaming replication status

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication status
Date: 2010-01-09 06:49:13
Message-ID: 3f0b79eb1001082249r2c410f5q8b1386fc8c765f61@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 9, 2010 at 6:16 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I've gone through the patch in detail now. Here's my list of remaining
> issues:

Great! Thanks a lot!

> * If there's no WAL to send, walsender doesn't notice if the client has
> closed connection already. This is the issue Fujii reported already.
> We'll need to add a select() call to the walsender main loop to check if
> the socket has been closed.

Should we reactivate pq_wait() and secure_poll()?
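
As a rough illustration of the select() check (a Python sketch, not
walsender code; peer_has_closed is a hypothetical name): wait on the socket
with a timeout, and treat "readable but zero bytes available" as a closed
connection.

```python
import select
import socket


def peer_has_closed(sock, timeout=0.0):
    """Return True if the peer appears to have closed the connection.

    Hypothetical helper for illustration only.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        # Nothing pending on the socket: the connection still looks alive.
        return False
    try:
        # Peek so that real data from the peer is not consumed here.
        data = sock.recv(1, socket.MSG_PEEK)
    except ConnectionError:
        # A reset or aborted connection also counts as closed.
        return True
    # A readable socket that yields zero bytes signals an orderly shutdown.
    return data == b""
```

A sender loop could call this once per idle cycle, when there is no WAL to
send, and exit if it returns True.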

> * I removed the feature that archiver was started during recovery. The
> idea of that was to enable archiving from a standby server, to relieve
> the master server of that duty, but I found it annoying because it
> causes trouble if the standby and master are configured to archive to
> the same location; they will fight over which copies the file to the
> archive first. Frankly the feature doesn't seem very useful as the patch
> stands, because you still have to configure archiving in the master in
> practice; you can't take an online base backup otherwise, and you have
> the risk of standby falling too much behind and having to restore from
> base backup whenever the standby is disconnected for any reason. Let's
> revisit this later when it's truly useful.

Okay.

> * We still have a related issue, though: if standby is configured to
> archive to the same location as master (as it always is on my laptop,
> where I use the postgresql.conf of the master unmodified in the server),
> right after failover the standby server will try to archive all the old
> WAL files that were streamed from the master; but they exist already in
> the archive, as the master archived them already. I'm not sure if this
> is a pilot error, or if we should do something in the server to tell
> apart WAL segments streamed from master and those generated in the
> standby server after failover. Maybe we should immediately create a
> .done file for every file received from master?

There is no guarantee that such a file has already been archived by the
master. This is just an idea, but a new WAL record indicating the completion
of archiving would let the standby know when to create the .done file.
However, this idea might kill the "archiving during recovery" idea discussed
above.

Personally, I'm OK with that issue because we can avoid it by tweaking
archive_command. Could we revisit this discussion with the "archiving
during recovery" discussion later?
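
One possible tweak, sketched in Python (this is an assumption about what
such an archive_command could do, not something specified here;
archive_segment is a hypothetical name): report success when an identical
copy is already in the archive, and fail only when the existing copy differs.

```python
import filecmp
import os
import shutil


def archive_segment(src_path, archive_dir):
    """Copy a WAL segment into archive_dir; return True on success.

    Hypothetical helper: an existing identical file counts as success,
    an existing different file counts as failure.
    """
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        # Already archived (for example by the master): compare contents.
        return filecmp.cmp(src_path, dest, shallow=False)
    # Copy to a temporary name first so a partial file is never visible
    # under the final segment name.
    tmp = dest + ".tmp"
    shutil.copyfile(src_path, tmp)
    os.replace(tmp, dest)
    return True
```

A wrapper script would turn the boolean into exit status 0 or non-zero.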

> * I don't think we should require superuser rights for replication.
> Although you see all WAL and potentially all data in the system through
> that, a standby doesn't need any write access to the master, so it would
> be good practice to create a dedicated account with limited privileges
> for replication.

Okay, let's just drop the superuser() check from walsender.c.

> * A standby that connects to master, initiates streaming, and then sits
> idle stalls recycling of old WAL files in the master. That will
> eventually lead to a full disk in master. Do we need some kind of an
> emergency valve on that?

I think that we need a GUC parameter to specify the maximum number of log
file segments held in the pg_xlog directory for sending to the standby
server. Replication to a standby that falls more than that GUC value behind
is simply terminated.
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01901.php
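
The rule above could be sketched like this (Python, illustration only;
apply_retention_limit and its arguments are hypothetical, segments are
modelled as plain integers, and checkpoint-related retention is ignored):

```python
def apply_retention_limit(current_segno, standby_segnos, max_kept_segments):
    """Return (recycle_before, terminated).

    recycle_before: segments numbered below this value may be recycled.
    terminated: sorted names of standbys that fell too far behind.
    """
    # Oldest segment the master is still willing to hold for standbys.
    horizon = max(current_segno - max_kept_segments, 0)
    # Standbys needing anything older than the horizon are disconnected.
    terminated = sorted(
        name for name, segno in standby_segnos.items() if segno < horizon
    )
    # Surviving standbys still pin the segments they have not yet received.
    survivors = [s for s in standby_segnos.values() if s >= horizon]
    recycle_before = min(survivors, default=current_segno)
    return recycle_before, terminated
```

With the limit in place, the space held for standbys is bounded by
max_kept_segments no matter how long a standby sits idle.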

> * Do we really need REPLICATION_DEBUG_ENABLED? The output doesn't seem
> very useful to me.

This was useful for me when debugging the code. But I'm okay with dropping
it now.

> * Need to add comments somewhere to note that ReadRecord depends on the
> fact that a WAL record is always sent as a whole, never split across two
> messages.

Okay.

> * Do we really need to split the sleep in walsender to NAPTIME_PER_CYCLE
>  increments?

Yes. It's required for some platforms (probably HP-UX) on which signals
cannot interrupt the sleep.
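
The pattern is roughly the following (a Python sketch, not the walsender
loop itself; the function name, the should_wake callback and the 100 ms
value are assumptions): sleep in short naps and recheck a wake-up flag
between them, so a signal handler that only sets a flag is noticed within
one nap.

```python
import time

NAPTIME_PER_CYCLE_MS = 100  # assumed value, for illustration only


def interruptible_sleep(total_ms, should_wake,
                        naptime_ms=NAPTIME_PER_CYCLE_MS,
                        sleep_ms=lambda ms: time.sleep(ms / 1000.0)):
    """Sleep up to total_ms in naps of naptime_ms; return the time slept.

    should_wake() is polled before every nap, so the wait ends at most
    one nap after the flag is set, even if the sleep call itself cannot
    be interrupted by a signal.
    """
    remaining = total_ms
    slept = 0
    while remaining > 0 and not should_wake():
        nap = min(naptime_ms, remaining)
        sleep_ms(nap)
        remaining -= nap
        slept += nap
    return slept
```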

> * Walreceiver should flush less aggressively than after each received
> piece of WAL as noted by XXX comment.

> * XXX: Flushing after each received message is overly aggressive. Should
> * implement some sort of lazy flushing. Perhaps check in the main loop
> * if there's any more messages before blocking and waiting for one, and
> * flush the WAL if there isn't, just before blocking.

With this approach, if messages arrive continuously from the master, the
fsync would be delayed until the WAL segment is switched. Recovery would be
delayed likewise, which seems to be a problem.

How about a more straightforward approach: let the process that wants to
flush the buffer send an fsync request to walreceiver and wait until WAL
is flushed up to the buffer's LSN?
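
A single-process model of that request/flush bookkeeping (Python, for
illustration only; the class and method names are hypothetical, LSNs are
plain byte counts, and the inter-process signalling and waiting are left
out):

```python
class WalReceiverSketch:
    """Toy model: write WAL on receipt, fsync only when someone asks."""

    def __init__(self):
        self.written_lsn = 0   # end of WAL written but maybe not fsynced
        self.flushed_lsn = 0   # end of WAL known to be on disk
        self.fsync_calls = 0   # counts simulated fsync() calls

    def receive(self, nbytes):
        # Received WAL is written without flushing.
        self.written_lsn += nbytes

    def request_flush(self, lsn):
        """Ensure WAL up to lsn is flushed; return the flushed position."""
        if lsn > self.written_lsn:
            raise ValueError("cannot flush WAL that has not been received")
        if lsn > self.flushed_lsn:
            # One fsync covers everything written so far, so later
            # requests for lower LSNs need no further fsync.
            self.fsync_calls += 1
            self.flushed_lsn = self.written_lsn
        return self.flushed_lsn
```

In this model the number of fsyncs follows the number of flush requests
rather than the number of received messages.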

> * Consider renaming PREPARE_REPLICATION to IDENTIFY_SYSTEM or something.

Okay.

> * What's the change in bgwriter.c for?

It's for the bgwriter to know the current timeline for recycling the WAL files.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
