Re: logical changeset generation v6.2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v6.2
Date: 2013-10-22 15:59:35
Message-ID: CA+TgmoY9MY0hh4Od=fBZW3n+5e9dPh8Ey3axdR547TT_ZfnG7Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 22, 2013 at 11:02 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-10-22 10:52:48 -0400, Robert Haas wrote:
>> On Fri, Oct 18, 2013 at 2:26 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > So. As it turns out, that solution isn't sufficient in the face of VACUUM
>> > FULL and mixed DML/DDL transactions that have not yet been decoded.
>> >
>> > To reiterate, as published it works like:
>> > For every modification of a catalog tuple (insert, multi_insert, update,
>> > delete) that affects visibility, issue a record that contains:
>> > * filenode
>> > * ctid
>> > * (cmin, cmax)
>> >
>> > When doing a visibility check on a catalog row while decoding a mixed
>> > DML/DDL transaction, look up (cmin, cmax) for that row, since we don't
>> > store both in the tuple itself.
>> >
>> > That mostly works great.
>> >
>> > The problematic scenario is decoding a transaction that has done mixed
>> > DML/DDL *after* a VACUUM FULL/CLUSTER has been performed. The VACUUM
>> > FULL obviously changes the filenode and the ctid of a tuple, so we
>> > cannot successfully do a lookup based on what we logged before.
>>
>> So I have a new idea for handling this problem, which seems obvious in
>> retrospect. What if we make the VACUUM FULL or CLUSTER log the old
>> CTID -> new CTID mappings? This would only need to be done for
>> catalog tables, and maybe could be skipped for tuples whose XIDs are
>> old enough that we know those transactions must already be decoded.
>
> Ah. If only it were so simple ;). That was my first idea, and right after
> I'd bragged in a 2ndQ internal chat that I'd found a simple solution, I
> of course realized it doesn't work.
>
> Consider:
> INIT_LOGICAL_REPLICATION;
> CREATE TABLE foo(...);
> BEGIN;
> INSERT INTO foo;
> ALTER TABLE foo ...;
> INSERT INTO foo;
> COMMIT TX 3;
> VACUUM FULL pg_class;
> START_LOGICAL_REPLICATION;
>
> When we decode tx 3, we haven't yet read the mapping from the VACUUM
> FULL. That scenario can happen either because decoding was stopped for
> a moment, or because decoding couldn't keep up (slow connection,
> whatever).
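The failure in the quoted scenario can be sketched as follows. This is a minimal illustration, not the patch's code: the type and function names (CidTable, cid_table_add, cid_table_lookup) are hypothetical, and plain integers stand in for PostgreSQL's relfilenode, CTID and CommandId types. Records are keyed on (filenode, ctid) exactly as described above, so once VACUUM FULL has moved a catalog tuple to a new filenode and ctid, a lookup using the tuple's current location finds nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a relfilenode and a CommandId. */
typedef uint32_t FileNode;
typedef uint32_t CmdId;

typedef struct
{
    uint32_t block;   /* heap block number */
    uint16_t offset;  /* line pointer within the block */
} Ctid;

/* One logged record: where the catalog tuple lived, plus its (cmin, cmax). */
typedef struct
{
    FileNode filenode;
    Ctid     ctid;
    CmdId    cmin;
    CmdId    cmax;
} CidRecord;

#define MAX_RECORDS 64

typedef struct
{
    CidRecord rec[MAX_RECORDS];
    size_t    n;
} CidTable;

/* Remember a record as it is read from WAL during decoding. */
static void
cid_table_add(CidTable *t, FileNode filenode, Ctid ctid, CmdId cmin, CmdId cmax)
{
    assert(t->n < MAX_RECORDS);
    t->rec[t->n++] = (CidRecord){filenode, ctid, cmin, cmax};
}

/* Visibility-check helper: find (cmin, cmax) for the tuple at (filenode, ctid). */
static bool
cid_table_lookup(const CidTable *t, FileNode filenode, Ctid ctid,
                 CmdId *cmin, CmdId *cmax)
{
    for (size_t i = 0; i < t->n; i++)
    {
        const CidRecord *r = &t->rec[i];

        if (r->filenode == filenode &&
            r->ctid.block == ctid.block && r->ctid.offset == ctid.offset)
        {
            *cmin = r->cmin;
            *cmax = r->cmax;
            return true;
        }
    }
    return false;   /* not found: what happens after VACUUM FULL moved the tuple */
}
```

For example (filenode numbers made up), if tx 3 logged a record for a pg_class tuple in filenode 16384 at (0,5), and VACUUM FULL pg_class later rewrote that tuple into filenode 16390 at (0,2), then decoding tx 3 sees the tuple at its new location and cid_table_lookup(&t, 16390, (Ctid){0, 2}, ...) returns false.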

That strikes me as a flaw in the implementation rather than the idea.
You're presupposing a patch where the necessary information is
available in WAL yet you don't make use of it at the proper time. It
seems to me that you have to think of the CTID map as tied to a
relfilenode; if you try to use one relfilenode's map with a different
relfilenode, it's obviously not going to work. So don't do that.
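One reading of the suggestion above, sketched under stated assumptions: VACUUM FULL/CLUSTER logs (old relfilenode, old CTID) -> (new relfilenode, new CTID) pairs, and the decoder has those pairs available when it checks visibility (for instance by having read ahead in WAL; that part is not shown). Everything here (CtidMap, ctid_map_add, ctid_map_resolve) is hypothetical. The point it illustrates is that each mapping is only applied to a location in the relfilenode it was produced for, and chained rewrites are undone one step at a time until the location is back in the relfilenode the (cmin, cmax) records were logged under.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a relfilenode plus a CTID (block, offset). */
typedef uint32_t FileNode;

typedef struct
{
    FileNode filenode;
    uint32_t block;
    uint16_t offset;
} TupleLoc;

/* One entry of a CTID map logged by a hypothetical VACUUM FULL/CLUSTER record. */
typedef struct
{
    TupleLoc old_loc;
    TupleLoc new_loc;
} CtidMapping;

#define MAX_MAPPINGS 64

typedef struct
{
    CtidMapping m[MAX_MAPPINGS];
    size_t      n;
} CtidMap;

static bool
loc_equal(TupleLoc a, TupleLoc b)
{
    return a.filenode == b.filenode && a.block == b.block && a.offset == b.offset;
}

static void
ctid_map_add(CtidMap *map, TupleLoc old_loc, TupleLoc new_loc)
{
    assert(map->n < MAX_MAPPINGS);
    map->m[map->n++] = (CtidMapping){old_loc, new_loc};
}

/*
 * Translate a tuple's current location back into relfilenode 'target'.
 * A mapping is used only if its new_loc (including the relfilenode)
 * matches exactly, so one relfilenode's map is never applied to another.
 * Each step undoes one rewrite; the step bound guards against bad input.
 */
static bool
ctid_map_resolve(const CtidMap *map, TupleLoc cur, FileNode target, TupleLoc *out)
{
    for (size_t steps = 0; cur.filenode != target; steps++)
    {
        bool found = false;

        if (steps > map->n)
            return false;
        for (size_t i = 0; i < map->n; i++)
        {
            if (loc_equal(map->m[i].new_loc, cur))
            {
                cur = map->m[i].old_loc;
                found = true;
                break;
            }
        }
        if (!found)
            return false;   /* no mapping covers this tuple */
    }
    *out = cur;
    return true;
}
```

With the resolved old location in hand, the decoder could then look up (cmin, cmax) keyed on the relfilenode and CTID that were logged when tx 3 ran.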

/me looks innocent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
