Re: logical changeset generation v4 - Heikki's thoughts about the patch state

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Phil Sorber <phil(at)omniti(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical changeset generation v4 - Heikki's thoughts about the patch state
Date: 2013-01-24 18:27:00
Message-ID: CA+TgmoYhkMpkB8JZYhVei--h-onT-kT-Ko8bHvrrBUsewi_u-Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 24, 2013 at 6:14 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Thats way much more along the lines of what I am afraid of than the
> performance stuff - but Heikki cited those, so I replied to that.
>
> Note that I didn't say this must, must go in - I just don't think
> Heikki's reasoning about why not hit the nail on the head.

Fair enough, no argument.

Before getting bogged down in technical commentary, let me say this
very clearly: I am enormously grateful for your work on this project.
Logical replication based on WAL decoding is a feature of enormous
value that PostgreSQL has needed for a long time, and your work has
made that look like an achievable goal. Furthermore, it seems to me
that you have pursued the community process with all the vigor and
sincerity for which anyone could ask. Serious design concerns were
raised early in the process and you made radical changes to the design
which I believe have improved it tremendously, and you've continued to
display an outstanding attitude at every phase of this process about
which I can't say enough good things. There is no question in my mind
that this work is going to be the beginning of a process that
revolutionizes the way people think about replication and PostgreSQL,
and you deserve our sincere thanks for that.

Now, the bad news is, I don't think it's very reasonable to try to
commit this to 9.3. I think it is just too much stuff too late in the
cycle. I've reviewed some of the patches from time to time but there
is a lot more stuff and it's big and complicated and it's not really
clear that we have the interface quite right yet, even though I think
it's also clear that we are a lot of closer than we were. I don't
want to be fixing that during beta, much less after release.

> I tried very, very hard to get the basics of the design & interface
> solid. Which obviously doesn't man I am succeeding - luckily not being
> superhuman after all ;). And I think thats very much where input is
> desparetely needed and where I failed to raise enough attention. The
> "output plugin" interface follewed by the walsender interface is what
> needs to be most closely vetted.
> Those are the permanent, user/developer exposed UI and the one we should
> try to keep as stable as possible.
>
> The output plugin callbacks are defined here:
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/include/replication/output_plugin.h;hb=xlog-decoding-rebasing-cf4
> To make it more agnostic of the technology to implement changeset
> extraction we possibly should replace the ReorderBuffer(TXN|Change)
> structs being passed by something more implementation agnostic.
>
> walsender interface:
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=src/backend/replication/repl_gram.y;hb=xlog-decoding-rebasing-cf4
> The interesting new commands are:
> 1) K_INIT_LOGICAL_REPLICATION NAME NAME
> 2) K_START_LOGICAL_REPLICATION NAME RECPTR plugin_options
> 3) K_FREE_LOGICAL_REPLICATION NAME
>
> 1 & 3 allocate (respectively free) the permanent state associated with
> one changeset consumer whereas START_LOGICAL_REPLICATION streams out
> changes starting at RECPTR.

Forgive me for not having looked at the patch, but to what extent is
all this, ah, documented?

> Btw, there are currently *no* changes to the wal format at all if
> wal_format < logical except that xl_running_xacts are logged more
> frequently which obviously could easily be made conditional. Baring bugs
> of course.
> The changes with wal_level>=logical aren't that big either imo:
> * heap_insert, heap_update prevent full page writes from removing their
> normal record by using a separate XLogRecData block for the buffer and
> the record
> * heap_delete adds more data (the pkey of the tuple) after the unchanged
> xl_heap_delete struct
> * On changes to catalog tables (relfilenode, tid, cmin, cmax) are logged.
>
> No changes to mvcc for normal backends at all, unless you count the very
> slightly changed *Satisfies interface (getting passed a HeapTuple
> instead of HeapTupleHeader).
>
> I am not sure what you're concerned about WRT the on-disk format of the
> tuples? We are pretty much nailed down on that due to pg_upgrade'ability
> anyway and it could be changed from this patches POV without a problem,
> the output plugin just sees normal HeapTuples? Or are you concerned
> about the code extracting them from the xlog records?

Mostly, my concern is that you've accidentally broken something, or
that your code will turn out to be flaky in ways we can't now predict.
My only really specific concern at this point is about the special
treatment of catalog tables. We've never done anything like that
before, and it feels like a bad idea. In particular, the fact that
you have to WAL-log new information about cmin/cmax really suggests
that we're committing ourselves to the MVCC infrastructure in a way
that we weren't previously. There's some category of stuff that our
MVCC implementation didn't previously require us to persist on disk
which, after this, it will. I don't understand exactly where the
boundaries of that are in terms of future changes we might want to
make - but I don't like moving the goalposts in that area.

> So I think the "won't break anything else" argument can be made rather
> fairly if the heapam.c changes, which aren't that complex, are vetted
> closely.
>
> Now, the disucssion about all the code thats active *during* decoding is
> something else entirely :/
>
>> You
>> agreed with Tom that 9.2 is the buggiest release in recent memory, but
>> I think logical replication could easily be an order of magnitude
>> worse.
>
> I unfortunately think that not providing more builtin capabilities in
> this area also has significant dangers. Imo this is one of the weakest,
> or even the weakest, area of postgres.
>
> I personally have the impression that just about nobody did actual beta
> testing of the lastest releases, especially 9.2, and that is the reason
> why its the buggiest recent release.
>
>> I also have serious concerns about checksums and foreign key locks.
>> Any single one of those three patches could really inflict
>> unprecedented damage on our community's reputation for stability and
>> reliability if they turn out to be seriously buggy, and unfortunately
>> I don't consider that an unlikely outcome. I don't know what to do
>> about it, either.
>
> I see your point although I would attest both having far more danger of
> collateral damage than logical decoding itself. Especially fklocks
> pretty much has to be active by default and there's not much that can
> be reasonably done about that.
> I tried to give fklocks a very thorough look but unfortunately I didn't
> know anything beforehand about most areas of the code it touches which
> obviously limits the amount of dangers one is seeing (FWIW I am still
> mostly concerned with multixact logging and the locking changes in
> heapam.c itself).

Yeah, I don't disagree with any of that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2013-01-24 18:29:10 Re: Materialized views WIP patch
Previous Message Fujii Masao 2013-01-24 18:12:47 Re: [PATCH] pg_isready (was: [WIP] pg_ping utility)