Re: Streaming Replication patch for CommitFest 2009-09

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication patch for CommitFest 2009-09
Date: 2009-09-17 08:46:48
Message-ID: 9837222c0909170146g7721af7fte033c4a08349f407@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 17, 2009 at 10:08, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>> On Tue, Sep 15, 2009 at 7:53 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> After playing with this a little bit, I think we need logic in the slave
>>> to reconnect to the master if the connection is broken for some reason,
>>> or can't be established in the first place. At the moment, that is
>>> considered as the end of recovery, and the slave starts up. You have the
>>> trigger file mechanism to stop that, but it only gives you a chance to
>>> manually kill and restart the slave before it chooses a new timeline and
>>> starts up, it doesn't reconnect automatically.
>>
>> I was thinking that the automatic reconnection capability is the TODO item
>> for the later CF. The infrastructure for it has already been introduced in the
>> current patch. Please see the macro MAX_WALRCV_RETRIES (backend/
>> postmaster/walreceiver.c). This is the maximum number of times to retry
>> walreceiver. In the current version, this is the fixed value, but we can make
>> this user-configurable (parameter of recovery.conf is suitable, I think).
>
> Ah, I see.
>
> Robert Haas suggested a while ago that walreceiver could be a
> stand-alone utility, not requiring postmaster at all. That would allow
> you to set up streaming replication as another way to implement WAL
> archiving. Looking at how the processes interact, there really isn't
> much communication between walreceiver and the rest of the system, so
> that sounds pretty attractive.

Yes, that would be very very useful.

> Walreceiver is really a slave to the startup process. The startup
> process decides when it's launched, and it's the startup process that
> then waits for it to advance. But the way it's set up at the moment, the
> startup process needs to ask the postmaster to start it up, and it
> doesn't look very robust to me. For example, if launching walreceiver
> fails for some reason, startup process will just hang waiting for it.
>
> I'm thinking that walreceiver should be a stand-alone program that the
> startup process launches, similar to how it invokes restore_command in
> PITR recovery. Instead of using system(), though, it would use
> fork+exec, and a pipe to communicate.

Not having looked at all into the details, that sounds like a nice
improvement :-)
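
To make sure I understand the fork+exec-plus-pipe idea, here is a rough
sketch in plain POSIX C. It is not taken from the patch: launch_child()
and drain_and_wait() are hypothetical names, and the child here is just
an arbitrary program whose stdout is wired to the pipe. The point is
that a failed launch shows up in the parent as EOF on the pipe plus a
nonzero exit status, so the parent does not hang waiting.

```c
/* Hypothetical sketch: names are illustrative, not from the patch. */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Launch argv[0] as a child whose stdout is connected to a pipe.
 * Returns the child's pid and stores the pipe's read end in *read_fd,
 * or returns -1 if pipe() or fork() fails.
 */
static pid_t
launch_child(char *const argv[], int *read_fd)
{
    int   fds[2];
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL && read_fd != NULL);

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0)
    {
        /* Child: route stdout into the pipe, then exec the program. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(126);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: keep only the read end. */
    close(fds[1]);
    *read_fd = fds[0];
    return pid;
}

/*
 * Read the pipe until EOF, keeping at most buflen - 1 bytes in buf
 * (always NUL-terminated), then reap the child.
 * Returns the child's exit status, or -1 if it did not exit normally.
 */
static int
drain_and_wait(pid_t pid, int read_fd, char *buf, size_t buflen)
{
    size_t  used = 0;
    ssize_t n;
    int     status;

    assert(buf != NULL && buflen > 0);

    for (;;)
    {
        char scratch[256];
        int  full = (used >= buflen - 1);

        /* Once buf is full, keep draining so the child never blocks. */
        n = full ? read(read_fd, scratch, sizeof(scratch))
                 : read(read_fd, buf + used, buflen - 1 - used);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;              /* EOF or a real error */
        if (!full)
            used += (size_t) n;
    }
    buf[used] = '\0';
    close(read_fd);

    while (waitpid(pid, &status, 0) < 0)
    {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In this sketch, launching a program that does not exist still returns
a valid pid, but drain_and_wait() sees EOF straight away and reports
exit status 127, so the caller can decide whether to retry or give up.
A real version would presumably poll the pipe rather than block in
read(), but that depends on how the startup process wants to wait.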

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
