From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: logical changeset generation v5 |
Date: | 2013-09-03 19:56:15 |
Message-ID: | CA+TgmoaHPnVBfyjcKrbWdgGMMtyftM5y1+zm+Od=w_+NNED4pw@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Sep 3, 2013 at 12:57 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> To my way of thinking, it seems as though we ought to always begin
>> replay at a checkpoint, so the standby ought always to see one of
>> these records immediately. Obviously that's not good enough, but why
>> not?
>
> We always see one after the checkpoint (well, actually before the
> checkpoint record, but ...), correct. The problem is just that reading a
> single xact_running record doesn't automatically make you consistent. If
> there's a single suboverflowed transaction running on the primary when
> the xl_running_xacts is logged we won't be able to switch to
> consistent. Check procarray.c:ProcArrayApplyRecoveryInfo() for some fun
> and some optimizations.
> Since the only place where we currently have the information to
> potentially become consistent is ProcArrayApplyRecoveryInfo() we will
> have to wait checkpoint_timeout time till we get consistent. Which
> sucks as there are good arguments to set that to 1h.
> That especially sucks as you lose consistency every time you restart the
> standby...
Right, OK.
>> And why is every 15 seconds good enough?
>
> Waiting 15s to become consistent instead of checkpoint_timeout seems to
> be ok to me and to be a good tradeoff between overhead and waiting. We
> can certainly discuss other values or making it configurable. The latter
> seemed to be unnecessary to me, but I don't have a problem
> implementing it. I just don't want to document it :P
I don't think it particularly needs to be configurable, but I wonder
if we can't be a bit smarter about when we do it. For example,
suppose we logged it every 15 s but only until we log a non-overflowed
snapshot. I realize that the overhead of a WAL record every 15
seconds is fairly small, but the load on some systems is all but
nonexistent. It would be nice not to wake up the HD unnecessarily.
>> The WAL writer is supposed to call XLogBackgroundFlush() every time
>> WalWriterDelay expires. Yeah, it can hibernate, but if it's
>> hibernating, then we should respect that decision for this WAL record
>> type also.
>
> Why should we respect it?
Because I don't see any reason to believe that this WAL record is any
more important or urgent than any other WAL record that might get
logged.
>> >> I understand why logical replication needs to connect to a database,
>> >> but I don't understand why any other walsender would need to connect
>> >> to a database.
>> >
>> > Well, logical replication actually streams out data using the walsender,
>> > so that's the reason why I want to add it there. But there have been
>> > cases in the past where we wanted to do stuff in the walsender that need
>> > database access, but we couldn't do so because you cannot connect to
>> > one.
>
>> Could you be more specific?
>
> I only remember 3959(dot)1349384333(at)sss(dot)pgh(dot)pa(dot)us but I think it has come up
> before.
It seems we need some more design there. Perhaps entering replication
mode could be triggered by writing either dbname=replication or
replication=yes. But then, do the replication commands simply become
SQL commands? I've certainly seen hackers use them that way. And I
can imagine that being a sensible approach, but this patch seems like
it's only covering a fairly small fraction of what really ought to be
a single commit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company