Re: logical changeset generation v5

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v5
Date: 2013-09-03 19:56:15
Message-ID: CA+TgmoaHPnVBfyjcKrbWdgGMMtyftM5y1+zm+Od=w_+NNED4pw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 3, 2013 at 12:57 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> To my way of thinking, it seems as though we ought to always begin
>> replay at a checkpoint, so the standby ought always to see one of
>> these records immediately. Obviously that's not good enough, but why
>> not?
>
> We always see one after the checkpoint (well, actually before the
> checkpoint record, but ...), correct. The problem is just that reading a
> single xl_running_xacts record doesn't automatically make you consistent. If
> there's a single suboverflowed transaction running on the primary when
> the xl_running_xacts record is logged we won't be able to switch to
> consistent. Check procarray.c:ProcArrayApplyRecoveryInfo() for some fun
> and some optimizations.
> Since the only place where we currently have the information to
> potentially become consistent is ProcArrayApplyRecoveryInfo() we will
> have to wait for checkpoint_timeout until we become consistent. Which
> sucks as there are good arguments to set that to 1h.
> That especially sucks as you lose consistency every time you restart the
> standby...

Right, OK.
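
To check my understanding, here is a toy model of the behavior described
above. It is only a sketch: every name in it is hypothetical, XID
wraparound is ignored, and the real ProcArrayApplyRecoveryInfo() does
considerably more. The "pending" path (becoming consistent once every
transaction from the overflowed snapshot has finished) is my reading of
that function, not something stated in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified standby states; not the real enum. */
typedef enum
{
    TOY_SNAPSHOT_NONE,      /* no running-xacts record seen yet */
    TOY_SNAPSHOT_PENDING,   /* saw an overflowed record, still waiting */
    TOY_SNAPSHOT_READY      /* consistent */
} ToyStandbyState;

/* Hypothetical, stripped-down stand-in for a running-xacts record. */
typedef struct ToyRunningXacts
{
    uint32_t next_xid;            /* next XID the primary would assign */
    uint32_t oldest_running_xid;  /* oldest XID still running on the primary */
    bool     subxid_overflow;     /* some subxid cache had overflowed */
} ToyRunningXacts;

typedef struct ToyStandby
{
    ToyStandbyState state;
    uint32_t        pending_xmin; /* XIDs below this must finish first */
} ToyStandby;

/* Apply one record; XID comparisons ignore wraparound (assumption). */
static void
toy_apply_running_xacts(ToyStandby *standby, const ToyRunningXacts *rec)
{
    if (standby->state == TOY_SNAPSHOT_READY)
        return;

    /* A non-overflowed record is enough on its own. */
    if (!rec->subxid_overflow)
    {
        standby->state = TOY_SNAPSHOT_READY;
        return;
    }

    /* Overflowed again, but everything from the pending record is done. */
    if (standby->state == TOY_SNAPSHOT_PENDING &&
        rec->oldest_running_xid >= standby->pending_xmin)
    {
        standby->state = TOY_SNAPSHOT_READY;
        return;
    }

    /* First overflowed record: remember what we have to wait out. */
    if (standby->state == TOY_SNAPSHOT_NONE)
    {
        standby->state = TOY_SNAPSHOT_PENDING;
        standby->pending_xmin = rec->next_xid;
    }
}
```

In this model the standby can only make progress when another record
arrives, which is why the interval between records matters so much.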

>> And why is every 15 seconds good enough?
>
> Waiting 15s to become consistent instead of checkpoint_timeout seems to
> be ok to me and to be a good tradeoff between overhead and waiting. We
> can certainly discuss other values or making it configurable. The latter
> seemed to be unnecessary to me, but I don't have a problem
> implementing it. I just don't want to document it :P

I don't think it particularly needs to be configurable, but I wonder
if we can't be a bit smarter about when we do it. For example,
suppose we logged it every 15 s but only until we log a non-overflowed
snapshot. I realize that the overhead of a WAL record every 15
seconds is fairly small, but the load on some systems is all but
nonexistent. It would be nice not to wake up the hard disk unnecessarily.
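
Roughly what I have in mind, as a minimal sketch rather than a proposed
patch. All names here are hypothetical, and resetting the flag at each
checkpoint is my own assumption (a standby may begin replay at any
checkpoint, so each one would start a new cycle):

```c
#include <stdbool.h>
#include <stdint.h>

/* The 15 s interval under discussion, in milliseconds. */
#define SNAPSHOT_LOG_INTERVAL_MS 15000

/* Hypothetical bookkeeping; none of these names come from the patch. */
typedef struct SnapshotLogState
{
    int64_t last_logged_ms;         /* when we last logged a snapshot */
    bool    logged_non_overflowed;  /* one good snapshot since checkpoint? */
} SnapshotLogState;

/* Should the periodic timer log another running-xacts snapshot now? */
static bool
should_log_snapshot(const SnapshotLogState *state, int64_t now_ms)
{
    /* Once a non-overflowed snapshot is in WAL, stay quiet. */
    if (state->logged_non_overflowed)
        return false;

    return now_ms - state->last_logged_ms >= SNAPSHOT_LOG_INTERVAL_MS;
}

/* Record that a snapshot was just written to WAL. */
static void
record_snapshot_logged(SnapshotLogState *state, int64_t now_ms,
                       bool overflowed)
{
    state->last_logged_ms = now_ms;
    if (!overflowed)
        state->logged_non_overflowed = true;
}

/* Assumption: a new checkpoint starts the cycle over again. */
static void
reset_at_checkpoint(SnapshotLogState *state, int64_t now_ms)
{
    state->last_logged_ms = now_ms;
    state->logged_non_overflowed = false;
}
```

With something like this, an idle system writes no further snapshot
records after the first non-overflowed one, so nothing is written to
disk just for this.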

>> The WAL writer is supposed to call XLogBackgroundFlush() every time
>> WalWriterDelay expires. Yeah, it can hibernate, but if it's
>> hibernating, then we should respect that decision for this WAL record
>> type also.
>
> Why should we respect it?

Because I don't see any reason to believe that this WAL record is any
more important or urgent than any other WAL record that might get
logged.

>> >> I understand why logical replication needs to connect to a database,
>> >> but I don't understand why any other walsender would need to connect
>> >> to a database.
>> >
>> > Well, logical replication actually streams out data using the walsender,
>> > so that's the reason why I want to add it there. But there have been
>> > cases in the past where we wanted to do stuff in the walsender that need
>> > database access, but we couldn't do so because you cannot connect to
>> > one.
>
>> Could you be more specific?
>
> I only remember 3959(dot)1349384333(at)sss(dot)pgh(dot)pa(dot)us but I think it has come up
> before.

It seems we need some more design there. Perhaps entering replication
mode could be triggered by writing either dbname=replication or
replication=yes. But then, do the replication commands simply become
SQL commands? I've certainly seen hackers use them that way. And I
can imagine that being a sensible approach, but this patch seems like
it's only covering a fairly small fraction of what really ought to be
a single commit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
