Re: First draft of snapshot snapshot building design document

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <peter(at)2ndquadrant(dot)com>, hlinnakangas(at)vmware(dot)com
Subject: Re: First draft of snapshot snapshot building design document
Date: 2012-10-18 14:47:12
Message-ID: CA+TgmoZXkCo5FAbU=3JHuXXF0Op2SLhGJcVuFM3tkmcBnmhBMQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 16, 2012 at 7:30 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On Thursday, October 11, 2012 01:02:26 AM Peter Geoghegan wrote:
>> The design document [2] really just explains the problem (which is the
>> need for catalog metadata at a point in time to make sense of heap
>> tuples), without describing the solution that this patch offers with
>> any degree of detail. Rather, [2] says "How we build snapshots is
>> somewhat intricate and complicated and seems to be out of scope for
>> this document", which is unsatisfactory. I look forward to reading the
>> promised document that describes this mechanism in more detail.
>
> Here's the first version of the promised document. I hope it answers most of
> the questions.
>
> Input welcome!

I haven't grokked all of this in its entirety, but I'm kind of
uncomfortable with the relfilenode -> OID mapping stuff. I'm
wondering if we should, when logical replication is enabled, find a
way to cram the table OID into the XLOG record. It seems like that
would simplify things.

If we don't choose to do that, it's worth noting that you actually
need 16 bytes of data to generate a unique identifier for a relation,
as in database OID + tablespace OID + relfilenode# + backend ID.
Backend ID might be ignorable because WAL-based logical replication is
going to ignore temporary relations anyway, but you definitely need
the other two. There's nothing, for example, to keep you from having
two relations with the same value in pg_class.relfilenode in the same
database but in different tablespaces. It's unlikely to happen,
because for new relations we set OID = relfilenode, but a subsequent
rewrite can bring it about if the stars align just right. (Such
situations are, of course, a breeding ground for bugs, which might
make you question whether our current scheme for assigning
relfilenodes has much of anything to recommend it.)
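
To make the 16 bytes concrete, here is a minimal sketch in C of such a
key. The names (RelKey, relkey_equal, and so on) are hypothetical and
chosen for illustration; this is not PostgreSQL's actual struct layout,
and the -1 convention for backend ID is an assumption. It only shows
that comparing on database OID + relfilenode alone would conflate two
relations that live in different tablespaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's 4-byte Oid type. */
typedef uint32_t Oid;

/* Illustrative key only; field names and order are assumptions. */
typedef struct RelKey
{
    Oid     spc_oid;      /* tablespace OID */
    Oid     db_oid;       /* database OID */
    Oid     relfilenode;  /* pg_class.relfilenode */
    int32_t backend_id;   /* assumed -1 for non-temporary relations */
} RelKey;

/* Four 4-byte fields with no padding: 16 bytes in total. */
static_assert(sizeof(RelKey) == 16, "RelKey should be 16 bytes");

/* Full comparison on all four components. */
static bool
relkey_equal(const RelKey *a, const RelKey *b)
{
    return a->spc_oid == b->spc_oid &&
           a->db_oid == b->db_oid &&
           a->relfilenode == b->relfilenode &&
           a->backend_id == b->backend_id;
}

/* Too-weak comparison: ignores the tablespace, so two distinct
 * relations sharing a relfilenode in one database look identical. */
static bool
relkey_equal_ignoring_tablespace(const RelKey *a, const RelKey *b)
{
    return a->db_oid == b->db_oid &&
           a->relfilenode == b->relfilenode;
}
```

With two keys that differ only in spc_oid, relkey_equal() reports them
as different while relkey_equal_ignoring_tablespace() reports a match,
which is the collision described above.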

Another thing to think about is that, like catalog snapshots,
relfilenode mappings have to be time-relativized; that is, you need to
know what the mapping was at the proper point in the WAL sequence, not
what it is now. In practice, the risk here seems to be minimal,
because it takes a while to churn through 4 billion OIDs. However, I
suspect it pays to think about this fairly carefully because if we do
ever run into a situation where the OID counter wraps during a time
period comparable to the replication lag, the bugs will be extremely
difficult to debug.
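
As a rough illustration of what "time-relativized" means here, the
following C sketch looks up the table OID that a relfilenode mapped to
as of a given WAL position. Everything in it is an assumption made for
the example: the MapEntry type, the plain 64-bit Lsn, the linear scan,
and the function name are hypothetical and not part of the proposed
patch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;   /* hypothetical stand-in for PostgreSQL's Oid */
typedef uint64_t Lsn;   /* assumed: WAL position as one 64-bit number */

#define INVALID_OID ((Oid) 0)

/* One historical fact: from valid_from onward, relfilenode belonged
 * to table_oid (until a later entry for the same relfilenode). */
typedef struct MapEntry
{
    Oid relfilenode;
    Oid table_oid;
    Lsn valid_from;
} MapEntry;

/* Return the table OID that relfilenode mapped to at position lsn,
 * i.e. the matching entry with the greatest valid_from <= lsn.
 * Returns INVALID_OID if no mapping existed yet at that point. */
static Oid
lookup_table_oid_at(const MapEntry *hist, size_t n,
                    Oid relfilenode, Lsn lsn)
{
    Oid result = INVALID_OID;
    Lsn best = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (hist[i].relfilenode == relfilenode &&
            hist[i].valid_from <= lsn &&
            hist[i].valid_from >= best)
        {
            best = hist[i].valid_from;
            result = hist[i].table_oid;
        }
    }
    return result;
}
```

If the same relfilenode is recorded for table 100 from position 10 and
for table 200 from position 50, a lookup at position 20 yields 100 and
a lookup at position 60 yields 200. Consulting only the current
mapping would give 200 in both cases, which is the failure mode to
guard against.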

Anyhow, adding the table OID to the WAL record header would chew up a few
more bytes of WAL space, but it seems like it might be worth it to
avoid having to think very hard about all of these issues.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
