Re: logical changeset generation v6.2

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v6.2
Date: 2013-10-22 15:02:12
Message-ID: 20131022150212.GC7435@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-10-22 10:52:48 -0400, Robert Haas wrote:
> On Fri, Oct 18, 2013 at 2:26 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > So. As it turns out that solution isn't sufficient in the face of VACUUM
> > FULL and mixed DML/DDL transaction that have not yet been decoded.
> >
> > To reiterate, as published it works like:
> > For every modification of catalog tuple (insert, multi_insert, update,
> > delete) that has influence over visibility issue a record that contains:
> > * filenode
> > * ctid
> > * (cmin, cmax)
> >
> > When doing a visibility check on a catalog row during decoding of mixed
> > DML/DDL transaction lookup (cmin, cmax) for that row since we don't
> > store both for the tuple.
> >
> > That mostly works great.
> >
> > The problematic scenario is decoding a transaction that has done mixed
> > DML/DDL *after* a VACUUM FULL/CLUSTER has been performed. The VACUUM
> > FULL obviously changes the filenode and the ctid of a tuple, so we
> > cannot successfully do a lookup based on what we logged before.
>
> So I have a new idea for handling this problem, which seems obvious in
> retrospect. What if we make the VACUUM FULL or CLUSTER log the old
> CTID -> new CTID mappings? This would only need to be done for
> catalog tables, and maybe could be skipped for tuples whose XIDs are
> old enough that we know those transactions must already be decoded.

Ah. If it only were so simple ;). That was my first idea, and after I'd
bragged in an 2ndq internal chat that I'd found a simple idea I
obviously had to realize it doesn't work.

Consider:
INIT_LOGICAL_REPLICATION;
CREATE TABLE foo(...);
BEGIN;
INSERT INTO foo;
ALTER TABLE foo ...;
INSERT INTO foo;
COMMIT TX 3;
VACUUM FULL pg_class;
START_LOGICAL_REPLICATION;

When we decode tx 3 we haven't yet read the mapping from the vacuum
freeze. That scenario can happen either because decoding was stopped for
a moment, or because decoding couldn't keep up (slow connection,
whatever).

There also can be nasty variations where the VACUUM FULL happens while a
transaction is writing data since we don't hold locks on system
relations for very long.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-10-22 15:59:35 Re: logical changeset generation v6.2
Previous Message Tom Lane 2013-10-22 15:00:42 Re: Why the asprintf patch is still breaking the buildfarm