Re: Set new system identifier using pg_resetxlog

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Set new system identifier using pg_resetxlog
Date: 2014-06-17 16:50:11
Message-ID: 20140617165011.GA3115@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-06-17 12:07:04 -0400, Robert Haas wrote:
> On Tue, Jun 17, 2014 at 10:33 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> > On 17/06/14 16:18, Robert Haas wrote:
> >> On Fri, Jun 13, 2014 at 8:31 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com>
> >> wrote:
> >>> attached is a simple patch which makes it possible to change the system
> >>> identifier of the cluster in pg_control. This is useful for
> >>> individualization of the instance that is started on top of data
> >>> directory
> >>> produced by pg_basebackup - something that's helpful for logical
> >>> replication
> >>> setup where you need to easily identify each node (it's used by
> >>> Bidirectional Replication for example).
> >>
> >>
> >> I can clearly understand the utility of being able to reset the system
> >> ID to a new, randomly-generated system ID - but giving the user the
> >> ability to set a particular value of their own choosing seems like a
> >> pretty sharp tool. What is the use case for that?

I've previously hacked this up ad hoc during data recovery when I needed
to make another cluster similar enough that I could replay WAL.

Another use case is to mark a database as independent from its
origin. Imagine a database that gets sharded across several
servers. It's not uncommon to do that by initially basebackup'ing the
database to several nodes and then using them separately from
then on. It's quite useful to actually mark them as being
distinct. Especially as several of them right now would end up with the
same timeline id...

> But it seems to me that we might need to have a process discussion
> here, because, while I'm all in favor of incremental feature proposals
> that build towards a larger goal, it currently appears that the larger
> goal toward which you are building is not something that's been
> publicly discussed and debated on this list. And I really think we
> need to have that conversation. Obviously, individual patches will
> still need to be debated, but I feel like 2ndQuadrant is trying to
> construct a castle without showing the community the floor plan. I
> believe that there is relatively broad agreement that we would all
> like a castle, but different people may have legitimately different
> ideas about how it should be constructed. If the work arrives as a
> series of disconnected pieces (user-specified system ID, event
> triggers for CREATE, etc.), then everyone outside of 2ndQuadrant has
> to take it on faith that those pieces are going to eventually fit
> together in a way that we'll all be happy with. In some cases, that's
> fine, because the feature is useful on its own merits whether it ends
> up being part of the castle or not.
>

Uh. Right now this patch has been written because it's needed for an
out-of-core replication solution. That's what BDR is at this point. The
patch is unobtrusive, has other use cases than just our internal one and
doesn't make pg_resetxlog even more dangerous than it already is. I
don't see much problem with considering it on its own cost/benefit?

So this seems to be a concern that's relatively independent of this
patch. Am I seeing that right?

I think one very important point here is that BDR is *not* the proposed
in-core solution. I think a reasonable community perspective - besides
it also being useful on its own - is to view it as a *prototype* for an
in-core solution. And e.g. logical decoding would have looked much worse -
and likely not have been integrated - without already being used
externally for BDR.

I'm not sure how we can ease or even resolve your concerns when talking
about pretty independent and general pieces of functionality like the
DDL event trigger stuff. We needed to actually *write* those to see what
BDR will look like. And the community's feedback heavily influenced what
BDR looks like by accepting some pieces, demanding others, and outright
rejecting the remainder.

I think there are some pieces that need to be considered on their own
merit. Logical decoding is useful on its own. The ability for out-of-core
systems to do DDL replication is another piece (that you referred
to above).
I think the likelihood of success if we were to try to design an in-core
system from the ground up first and then follow through pretty exactly
along those lines is minimal.

So, what I think we can do is to continue trying to build independent,
generally useful bits. Which imo all the stuff that's been integrated
is. Then, somewhat soon I think, we'll have to come up with a proposal
for how the parts that are *not* necessarily useful outside of in-core
logical rep. might look. Which will likely trigger some long long
discussions that turn that design around a couple of times. Which is
fine. I *don't* think that's going to be a trimmed down version of
today's BDR.

> But in other cases, like this one, if the premise that the slot name
> should match the system identifier isn't something the community wants
> to accept, then taking a patch that lets people do that is probably a
> bad idea, because at least one person will use it to set the system
> identifier of a system to a value that enables physical replication to
> take place when that is actually totally unsafe, and we don't want to
> enable that for no reason.

It also allows many other dangerous things. Many of which are much more
dangerous than changing the system identifier. Resetting an independent
cluster is also not very likely to work - the LSNs would still not
match. But it wouldn't corrupt the copy of the database that's been
changed...

> Maybe the slot name should match the
> replication identifier rather than the standby system ID, for example.
> There are conflicting proposals for how replication identifiers should
> work, but one of those proposals limits it to 16 bits.

I actually don't think any of the discussions I was involved in had the
externally visible version of replication identifiers limited to 16 bits?
If you are referring to my patch, 16 bits was just the width of the
*internal* name that should basically never be looked at. User visible
replication identifiers are always identified by an arbitrary string -
whose format is determined by the user of the replication identifier
facility. *BDR* currently stores the system identifier, the database id
and a name in there - but that's nothing core needs to concern itself
with.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
