Re: logical changeset generation v6.1

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v6.1
Date: 2013-10-01 14:07:19
Message-ID: CA+TgmobekX+n8REsWLs1LciChrWX9-cDWERdnxDBYmcyTudabg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Review comments on 0004:

- In heap_insert and heap_multi_insert, please rewrite the following
comment for clarity: "add record for the buffer without actual content
thats removed if fpw is done for that buffer".
- In heap_delete, the assignment to need_tuple_data() need not
separately check RelationNeedsWAL(), as RelationIsLogicallyLogged()
does that.
- It seems that HeapSatisfiesHOTandKeyUpdate is now
HeapSatisfiesHOTandKeyandCandidateKeyUpdate. Considering I think this
was merely HeapSatisfiesHOTUpdate a year ago, it's hard not to be
afraid that something unscalable is happening to this function. On a
related node, any overhead added here costs broadly; I'm not sure if
there's enough to worry about.
- MarkCurrentTransactionIdLoggedIfAny has superfluous braces.
- AssignTransactionId changes "Mustn't" to "May not", which seems like
an entirely pointless change.
- You've removed a blank line just before IsSystemRelation; this is an
unnecessary whitespace change.
- Do none of the callers of IsSystemRelation() care about the fact
that you've considerably changed the semantics?
- RelationIsDoingTimetravel is still a crappy name. How about
RelationRequiredForLogicalDecoding? And maybe the reloption
treat_as_catalog_table can become required_for_logical_decoding.
- I don't understand the comment in xl_heap_new_cid to the effect that
the combocid isn't needed for decoding. How not?
- xlogreader.h includes an additional header with no other changes.
Doesn't seem right.
- relcache.h has a cuddled curly brace.

Review comments on 0003:

I have no problem with caching the primary key in the relcache, or
with using that as the default key for logical decoding, but I'm
extremely uncomfortable with the fallback strategy when no primary key
exists. Choosing any old unique index that happens to present itself
as the primary key feels wrong to me. The choice of key is
user-visible. If we say, update the row with a = 1 to
(a,b,c)=(2,2,2), that's different than saying update the row with b =
1 to (a,b,c)=(2,2,2). Suppose the previous contents of the target
table are (a,b,c)=(1,2,3) and (a,b,c)=(2,1,4). You get different
answers depending on which you choose. I think multi-master
replication just isn't going to work unless the two sides agree on the
key, and I think you'll get strange conflicts unless that key is
chosen by the user according to their business logic.

In single-master replication, being able to pick the key is clearly
not essential for correctness, but it's still desirable, because if
the system picks the "wrong" key, the change stream will in the end
get the database to the right state, but it may do so by "turning one
record into a different one" from the user's perspective.

All in all, it seems to me that we shouldn't try to punt. Maybe we
should have something that works like ALTER TABLE name CLUSTER ON
index_name to configure which index should be used for logical
replication. Possibly this same syntax could be used as ALTER
MATERIALIZED VIEW to set the candidate key for that case.

What happens if new unique indexes are created or old ones dropped
while logical replication is running?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-10-01 14:12:13 Re: Cmpact commits and changeset extraction
Previous Message Andres Freund 2013-10-01 13:35:44 [PATCH] Fix pg_isolation_regress to work outside its build directory