Re: logical changeset generation v6

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v6
Date: 2013-09-19 14:02:31
Message-ID: CA+TgmoZDE0f32jjRyrXuRs=FTj-D+DnHQysXUOzUSmHFvJvSLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 17, 2013 at 10:31 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Rebased patches attached.

I spent a bit of time looking at these patches yesterday and today.
It seems to me that there's a fair amount of stylistic cleanup that is
still needed, and some pretty bad naming choices, and some FIXMEs that
should probably be fixed, but for an initial review email it seems
more helpful to focus on high-level design, so here goes.

- Looking specifically at the 0004 patch, I think that the
RecentGlobalDataXmin changes are logically separate from the rest of
the patch, and that if we're going to commit them at all, it should be
separate from the rest of this. I think this is basically a
performance optimization. AFAICS, there's no correctness problem with
the idea of continuing to maintain a single RecentGlobalXmin; it's
just that you'll potentially end up with quite a lot of bloat. But
that's not an argument that those two things HAVE to be committed
together; either could be done first, and independently of the other.
Also, these changes are quite complex and it's different to form a
judgement as to whether this idea is safe when they are intermingled
with the rest of the logical replication stuff.

More generally, the thing that bugs me about this approach is that
logical replication is not really special, but what you've done here
MAKES it special. There are plenty of other situations where we are
too aggressive about holding back xmin. A single-statement
transaction that selects from a single table holds back xmin for the
entire cluster, and that is Very Bad. It would probably be unfair to
say, well, you have to solve that problem first. But on the other
hand, how do we know that the approach you've adopted here isn't going
to make the more general problem harder to solve? It seems invasive
at a pretty low level. I think we should at least spend some time
thinking about what *general* solutions to this problem would like
like and then decide whether this is approach is likely to be
forward-compatible with those solutions.

- There are no changes to the "doc" directory. Obviously, if you're
going to add a new value for the wal_level GUC, it's gonna need to be
documented. Similarly, pg_receivellog needs to be documented. In all
likelihood, you also need a whole chapter providing general background
on this technology. A couple of README files is not going to do it,
and those files aren't suitable for check-in anyway (e.g. DESIGN.txt
makes reference to a URL where the current version of some patch can
be found; that's not appropriate for permanent documentation). But
aside from that, what we really need here is user documentation, not
developer documentation. I can perhaps pass judgement on whether the
guts of this functionality do things that are fundamentally unsafe,
but whether the user interface is good or bad is a question that
deserves broader input, and without documentation, most people aren't
going to understand it well enough to know whether they like it. And
TBH, *I* don't really want to reverse-engineer what pg_receivellog
does from a read-through of the code, either.

- Given that we don't reassemble transactions until commit time, why
do we need to to ensure that XIDs are logged before their sub-XIDs
appear in WAL? As I've said previously, I actually think that
on-the-fly reassembly is probably going to turn out to be very
important. But if we're not getting that, do we really need this?
Also, while I'm trying to keep this email focused on high-level
concerns, I have to say that guaranteedlyLogged has got to be one of
the worst variable names I've ever seen, starting (but not ending)
with the fact that guaranteedly is not a word. I'm also tempted to
say that all of the wal_level=logical changes should be split out as
their own patch, separate from the decoding stuff. Personally, I
would find that much easier to review, although I admit it's less
critical here than for the RecentGlobalDataXmin stuff.

- If XLOG_HEAP_INPLACE is not decoded, doesn't that mean that this
facility can't be used to replicate a system catalog table? Is that a
restriction we should enforce/document somehow?

- The new code is rather light on comments. decode.c is extremely
light. For example, consider the function DecodeAbort(), which
contains the following comment:

+ /*
+ * this is a bit grotty, but if we're "faking" an abort we've
already gone
+ * through
+ */

Well, I have no idea what that means. I'm sure you do, but I bet the
next person who isn't you that tries to understand this probably
won't. It's also probably true that I could figure it out if I spent
more time on it, but I think the point of comments is to keep the
amount of time that must be spent trying to understand code to a
manageable level. Generally, I'd suggest that any non-trivial
functions in these files should have a header comment explaining what
their job is; e.g. for DecodeStandbyOp you could write something like
"Decode an RM_STANDBY WAL record. Currently, we only care about
XLOG_RUNNING_XACTS records, which tell us about transactions that may
have aborted when without writing an explicit abort record." Or
whatever the right explanation is. And then particularly tricky bits
should have their own comments.

- It still bothers me that we're going to have mandatory slots for
logical replication and no slots for physical replication. Why are
slots mandatory in one case and not even allowable in the other? Is
it sensible to think about slotless logical replication - or maybe I
should say, is it any LESS sensible than slotless physical
replication?

- What is the purpose of (Un)SuspendDecodingSnapshots? It seems that
should be explained somewhere. I have my doubts about how safe that
is. And I definitely think that SetupDecodingSnapshots() is not OK.
Overwriting the satisfies functions in static pointers may be a great
way to make sure you've covered all bases during development, but I
can't see us wanting that ugliness in the official sources.

- I don't really like "time travel" as a name for reconstructing a
previous snapshot of a catalog. Maybe it's as good as anything, but
it also doesn't help that "during decoding" is used in some places to
refer to the same concept. I wonder if we should call these "logical
replication snapshots" or "historical MVCC snapshots" or somesuch and
then try to make the terminology consistent throughout.
ReorderBufferTXN->does_timetravel really means "time travel will be
needed to decode what this transaction did", which is not really the
same thing.

That's as much relatively-big-picture stuff as I'm able to notice on a
first read-through.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-09-19 14:11:01 Re: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Previous Message Sawada Masahiko 2013-09-19 13:41:30 Re: Patch for fail-back without fresh backup