Re: logical changeset generation v6.1

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v6.1
Date: 2013-10-01 14:58:10
Message-ID: CA+TgmoaLOJ-m+10j40UuV+uOtXA-CH=B0NT-ywfSPEn5pS3YCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 1, 2013 at 10:31 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> I have no problem with caching the primary key in the relcache, or
>> with using that as the default key for logical decoding, but I'm
>> extremely uncomfortable with the fallback strategy when no primary key
>> exists. Choosing any old unique index that happens to present itself
>> as the primary key feels wrong to me.
>> [stuff I don't disagree with]
>
> People lobbied vigorously to allow candidate keys before. I personally
> would never want to use anything but an actual primary key for
> replication, but there's other usecases than replication.

I like allowing candidate keys; I just don't like assuming that any
old one we select will be as good as any other.

>> All in all, it seems to me that we shouldn't try to punt. Maybe we
>> should have something that works like ALTER TABLE name CLUSTER ON
>> index_name to configure which index should be used for logical
>> replication. Possibly this same syntax could be used as ALTER
>> MATERIALIZED VIEW to set the candidate key for that case.
>
> I'd be fine with that, but I am also not particularly interested in it
> because I personally don't see much of a usecase.
> For replication ISTM the only case where there would be no primary key
> is a) initial load b) replacing the primary key by another index.

The latter is the case I'd be principally concerned about. I once had
to change the columns that formed the key for a table being used in a
production web application; fortunately, it has traditionally not
mattered much whether a unique index is the primary key, so creating a
new unique index and dropping the old primary key was good enough.
But I would have wanted to control the point at which we changed our
notion of what the candidate key was, I think.

One other thought: you could just log the whole old tuple if there's
no key available. That would let this work on tables that don't have
indexes. Replaying the changes might be horribly complex and slow,
but extracting them would work. If a replication plugin got <old
tuple, new tuple> with no information on keys, it could find *a* tuple
(not all tuples) that match the old tuple exactly and update each
column to the value from new tuple. From a correctness point of view,
there's no issue there; it's all about efficiency. But the user can
solve that problem whenever they like by indexing the destination
table. It need not even be a unique index, so long as it's reasonably
selective.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2013-10-01 15:15:36 C question about bitmasks in datetime.c
Previous Message David Johnston 2013-10-01 14:49:31 Re: Documentation for SET var_name FROM CURRENT