Re: Proposal for 9.1: WAL streaming from WAL buffers

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for 9.1: WAL streaming from WAL buffers
Date: 2010-06-21 12:49:25
Message-ID: AANLkTim3x4SJf4ipsqoDTsu2vW2eYbM9AFapO7UqSSPj@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 21, 2010 at 10:40 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I guess, but you have to be very careful to correctly refrain from applying
> the WAL. For example, a naive implementation might write the WAL to disk in
> walreceiver immediately, but refrain from telling the startup process about
> it. If walreceiver is then killed because the connection is broken (and it
> will be because the master just crashed), the startup process will read the
> streamed WAL from the file in pg_xlog, and go ahead to apply it anyway.

So the goal is that when you *do* fail over to the standby, it replays
these additional records. So whether the startup process obeys this
limit would have to be conditional on whether it's still in standby
mode.
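
To make that rule concrete, here is a minimal sketch of it. This is not
PostgreSQL code: StandbyState, replay_limit and the integer positions are
hypothetical names invented for illustration, and the sketch assumes the
standby already knows the master's flush location.

```python
from dataclasses import dataclass


@dataclass
class StandbyState:
    # All fields are hypothetical; positions are plain integers standing in
    # for WAL locations.
    received_lsn: int       # last WAL position walreceiver wrote locally
    master_flush_lsn: int   # last position the master reported as fsynced
    in_standby_mode: bool   # False once failover has been triggered


def replay_limit(state: StandbyState) -> int:
    """Return the highest WAL position the startup process may apply."""
    if state.in_standby_mode:
        # While following the master, never apply WAL the master has not
        # yet flushed to its own disk.
        return min(state.received_lsn, state.master_flush_lsn)
    # After failover the limit is lifted: replay everything received.
    return state.received_lsn
```

The point of the conditional is that the same WAL already sitting in
pg_xlog is held back while in standby mode but replayed once the standby
is promoted.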

> So maybe there's some room for optimization there, but given the round-trip
> required for the acknowledgment anyway it might not buy you much, and the
> implementation is not very straightforward. This is clearly 9.1 material, if
> worth optimizing at all.

I don't see any need for a round-trip acknowledgement -- no more than
currently. The master just includes the flush location in every
response. It might have to send additional responses, though, when
fsyncs happen, to update the flush location even if no additional
records are sent. Otherwise a hot standby might spend a long time with
out-of-date data even though on failover it would be up to date. That
seems non-ideal for hot standby users.
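
A rough sketch of the sending side, under the assumption that each
streamed message simply carries an extra flush-location field. The names
WalMessage, MasterSender, send_records and on_fsync are made up for this
example and do not correspond to the actual walsender code or protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WalMessage:
    start_lsn: int   # position of the first byte of data
    data: bytes      # WAL records; empty for a flush-only update
    flush_lsn: int   # master's flush location when the message was built


class MasterSender:
    """Hypothetical sender that piggybacks the flush location on messages."""

    def __init__(self) -> None:
        self.sent_lsn = 0            # end of WAL streamed so far
        self.flush_lsn = 0           # end of WAL fsynced on the master
        self.reported_flush_lsn = 0  # flush location last told to the standby

    def send_records(self, data: bytes) -> WalMessage:
        # Every data message includes the current flush location, so no
        # separate round trip is needed.
        msg = WalMessage(self.sent_lsn, data, self.flush_lsn)
        self.sent_lsn += len(data)
        self.reported_flush_lsn = self.flush_lsn
        return msg

    def on_fsync(self, new_flush_lsn: int) -> Optional[WalMessage]:
        # If an fsync advances the flush location and the standby has not
        # been told, send an empty message that only carries the update.
        self.flush_lsn = max(self.flush_lsn, new_flush_lsn)
        if self.flush_lsn > self.reported_flush_lsn:
            self.reported_flush_lsn = self.flush_lsn
            return WalMessage(self.sent_lsn, b"", self.flush_lsn)
        return None
```

The flush-only message is the "additional response" case: without it, a
hot standby would keep holding back records it has already received until
the next batch of WAL happened to arrive.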

I think this would be a good improvement for databases processing
large batch updates so the standby doesn't have an increased risk of
losing a large amount of data if there's a crash after processing such
a large query. I agree it's 9.1 material.

Earlier we made a change to the WAL streaming protocol on the basis
that we wanted to get the protocol right even if we don't use the
change right away. I'm not sure I understand that -- it's not like
we're going to stream WAL from 9.0 to 9.1. But if that reasoning holds,
then perhaps we need to add the WAL flush location to the protocol now,
even if we're not going to use it yet?

--
greg
