Re: [PATCH 07/16] Log enough data into the wal to reconstruct logical changes from it if wal_level=logical

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Andres Freund" <andres(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH 07/16] Log enough data into the wal to reconstruct logical changes from it if wal_level=logical
Date: 2012-06-13 15:27:06
Message-ID: 4FD86AFA02000025000483F6@gw.wicourts.gov
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> This adds a new wal_level value 'logical'
>
> Missing cases:
> - heap_multi_insert
> - primary key changes for updates
> - no primary key
> - LOG_NEWPAGE

First, Wow!

I look forward to the point where we can replace our trigger-based
replication with this! Your "missing cases" for primary key issues
would not cause us any pain for our current system, since we require
a primary key and don't support updates to PKs for replicated
tables. While I don't expect that the first cut of this will be able
to replace our replication-related functionality, I'm interested in
making sure it can be extended in that direction, so I have a few
things to consider:

(1) For our usage, with dozens of source databases feeding into
multiple aggregate databases and interfaces, DDL replication is not
of much if any interest. It should be easy enough to ignore as long
as it is low volume, so that doesn't worry me too much; but if I'm
missing something and you run across any logical WAL logging for DDL
which does generate a lot of WAL traffic, it would be nice to have a
way to turn that off at generation time rather than filtering it out
or ignoring it later. (Probably won't be an issue; just a heads-up.)
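
Just to illustrate the kind of switch I mean (both GUC names below
are invented for illustration and aren't meant to reflect anything
actually in the patch):

    /* Hypothetical sketch only; names are made up. */
    static bool wal_logical_info = true;  /* i.e. wal_level = 'logical' */
    static bool wal_logical_ddl = true;   /* proposed: log DDL logically */

    static void
    log_logical_ddl_if_wanted(const char *command_tag)
    {
        /* Skip the record at generation time instead of filtering
         * it out downstream. */
        if (!(wal_logical_info && wal_logical_ddl))
            return;

        /* ... build and insert the DDL record here, however the
         * patch would otherwise do it ... */
    }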

(2) To match the functionality we now have, we would need the
logical stream to include the *before* image of the whole tuple for
each row updated or deleted. I understand that this is not needed
for the use cases you are initially targeting; I just hope the
design leaves this option open without needing to disturb other use
cases. Perhaps this would require yet another wal_level value; or
perhaps, rather than testing the current value directly to determine
whether to log something, the GUC processing could set some
booleans, allowing faster tests and less code churn when the initial
implementation is expanded to support other use cases (like ours).
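
To make that concrete, here is the sort of thing I have in mind; the
extra level, the booleans, and the hook are all made-up names, just
for illustration:

    typedef enum WalLevel
    {
        WAL_LEVEL_MINIMAL = 0,
        WAL_LEVEL_ARCHIVE,
        WAL_LEVEL_HOT_STANDBY,
        WAL_LEVEL_LOGICAL,              /* what this patch adds */
        WAL_LEVEL_LOGICAL_OLD_TUPLES    /* hypothetical: before images too */
    } WalLevel;

    /* Derived flags, recomputed once when wal_level is assigned. */
    static bool wal_logical_info = false;
    static bool wal_logical_old_tuples = false;

    static void
    assign_wal_level(int newval, void *extra)
    {
        wal_logical_info = (newval >= WAL_LEVEL_LOGICAL);
        wal_logical_old_tuples = (newval >= WAL_LEVEL_LOGICAL_OLD_TUPLES);
    }

    /*
     * Logging sites then test a boolean instead of comparing against
     * the enum, e.g. in heap_update():
     *
     *     if (wal_logical_old_tuples)
     *         also log the full old tuple;
     */

Adding another level for a use case like ours would then mean
adjusting the hook rather than auditing every place that currently
compares against wal_level.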

(3) Similar to point 2, it would be extremely desirable to be able
to determine the table name and column names for the tuples in a
stream from that stream itself, without needing to query a hot
standby or do similar digging into other sources of information.
Not only will the various source databases all have different OID
values for the same objects, and the aggregate targets have
different values from each other and from the sources, but some
targets don't have the tables at all. I'm talking about our
database transaction repository and the interfaces to business
partners, which we currently drive off of the same transaction
stream which drives replication.
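
Purely to illustrate what a consumer like ours would want to find in
the stream itself (this is not a proposal for an actual record
format, and the names are invented):

    #include "postgres.h"               /* Oid, NAMEDATALEN */

    /*
     * Hypothetical record emitted the first time a relation appears
     * in the logical stream (and again whenever its definition
     * changes), so the tuple records that follow can be interpreted
     * without any catalog access on the receiving side.
     */
    typedef struct xl_logical_rel_metadata
    {
        Oid     relid;                  /* OID on the source node only */
        char    nspname[NAMEDATALEN];   /* schema name */
        char    relname[NAMEDATALEN];   /* table name */
        int     natts;                  /* number of columns */
        /* followed by natts (column name, type name) pairs */
    } xl_logical_rel_metadata;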

Would it be helpful or just a distraction if I were to provide a
more detailed description of our whole replication / transaction
store / interface area?

If it would be useful, I could also describe some other replication
patterns I have seen over the years. One which might be particularly
interesting is where subsets of the data are distributed to multiple
standalone machines with intermittent or unreliable connections to a
central site. That central site periodically collects data from all
the remote sites, recalculates the distribution, and sends
transactions back out to those remote sites to add, remove, and
update rows based on the distribution rules and the new data.

-Kevin
