logical changeset generation v5

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: logical changeset generation v5
Date: 2013-06-14 22:48:17
Message-ID: 20130614224817.GA19641@awork2.anarazel.de
Lists: pgsql-hackers

Hi!

I am rather pleased to announce the next version of the changeset
extraction patchset. Thanks to help from a large number of people I
think we are slowly getting to the point where it is getting
committable.

Since the last submitted version
(20121115002746(dot)GA7692(at)awork2(dot)anarazel(dot)de) a large number of fixes and
the results of a good amount of review have been added to the tree. All
bugs known to me have been fixed.

Fixes include:
* synchronous replication support
* don't peg the xmin for user tables, do it only for catalog ones
* arbitrarily large transaction support by spilling large transactions
  to disk
* spill snapshots to disk, so we can restart without waiting for a new
  snapshot to be built
* don't read all WAL from the establishment of a logical slot
* tests via SQL interface to changeset extraction

The todo list includes:
* morph the "logical slot" interface into being "replication slots" that
  can also be used by streaming replication
* move some more code from snapbuild.c to decode.c to remove a largely
  duplicated switch
* do some more header/comment cleanup & clarification
* move pg_receivellog into its own directory in src/bin or contrib/
* user/developer level documentation

The patch series currently has two interfaces to logical decoding. One
is SQL based and is primarily useful for pg_regress style tests and
playing around; the other one uses a walsender replication connection.

A quick demonstration of the SQL interface (server needs to be started
with wal_level = logical and max_logical_slots > 0):
=# CREATE EXTENSION test_logical_decoding;
=# SELECT * FROM init_logical_replication('regression_slot', 'test_decoding');
    slotname     | xlog_position
-----------------+---------------
 regression_slot | 0/17D5908
(1 row)

=# CREATE TABLE foo(id serial primary key, data text);

=# INSERT INTO foo(data) VALUES(1);

=# UPDATE foo SET id = -id, data = ':'||data;

=# DELETE FROM foo;

=# DROP TABLE foo;

=# SELECT * FROM start_logical_replication('regression_slot', 'now', 'hide-xids', '0');
 location  | xid |                                      data
-----------+-----+--------------------------------------------------------------------------------
 0/17D59B8 | 695 | BEGIN
 0/17D59B8 | 695 | COMMIT
 0/17E8B58 | 696 | BEGIN
 0/17E8B58 | 696 | table "foo": INSERT: id[int4]:1 data[text]:1
 0/17E8B58 | 696 | COMMIT
 0/17E8CA8 | 697 | BEGIN
 0/17E8CA8 | 697 | table "foo": UPDATE: old-pkey: id[int4]:1 new-tuple: id[int4]:-1 data[text]::1
 0/17E8CA8 | 697 | COMMIT
 0/17E8E50 | 698 | BEGIN
 0/17E8E50 | 698 | table "foo": DELETE: id[int4]:-1
 0/17E8E50 | 698 | COMMIT
 0/17E9058 | 699 | BEGIN
 0/17E9058 | 699 | COMMIT
(13 rows)

=# SELECT * FROM pg_stat_logical_decoding ;
    slot_name    |    plugin     | database | active | xmin | restart_decoding_lsn
-----------------+---------------+----------+--------+------+----------------------
 regression_slot | test_decoding |    12042 | f      |  695 | 0/17D58D0
(1 row)

=# SELECT * FROM stop_logical_replication('regression_slot');
 stop_logical_replication
--------------------------
                        0
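
For anyone wanting to consume the 'data' column in a test script, here
is a minimal Python sketch that picks apart the change lines shown
above. It is not part of the patchset: parse_change, CHANGE_RE and
COLUMN_RE are names made up for this illustration, the format is
inferred only from the sample output, and values containing whitespace
are not handled.

```python
import re

# Hypothetical helper, inferred from the sample test_decoding output only.
# Matches e.g.: table "foo": INSERT: id[int4]:1 data[text]:1
CHANGE_RE = re.compile(
    r'^table "(?P<table>[^"]+)": (?P<op>INSERT|UPDATE|DELETE): (?P<rest>.*)$'
)
# Matches one name[type]:value item; the value runs up to the next whitespace.
COLUMN_RE = re.compile(r'(\w+)\[(\w+)\]:(\S*)')


def parse_change(line):
    """Return (table, op, columns) for a change line, or None for BEGIN/COMMIT."""
    m = CHANGE_RE.match(line)
    if m is None:
        return None
    columns = COLUMN_RE.findall(m.group("rest"))
    return m.group("table"), m.group("op"), columns


if __name__ == "__main__":
    print(parse_change('table "foo": INSERT: id[int4]:1 data[text]:1'))
    print(parse_change("BEGIN"))
```

Note that for an UPDATE the returned list simply contains the old-pkey
columns followed by the new-tuple columns, in the order they appear.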

The walsender interface has the same calls:
INIT_LOGICAL_REPLICATION 'slot' 'plugin';
START_LOGICAL_REPLICATION 'slot' restart_lsn [(option value)*];
STOP_LOGICAL_REPLICATION 'slot';

The only difference is that START_LOGICAL_REPLICATION can stream changes
and can support synchronous replication.
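
A client of the walsender interface has to produce these command
strings and handle positions such as 0/17D5908. The following Python
sketch shows one way to do that. The function names are hypothetical,
the positions are treated as the usual hi/lo pair of 32-bit hex numbers,
and the option list of START_LOGICAL_REPLICATION is left out because its
exact syntax is not spelled out here.

```python
def parse_lsn(text):
    """Convert 'hi/lo' (two hex numbers) into one 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(value):
    """Inverse of parse_lsn: render a 64-bit position as 'hi/lo'."""
    return "%X/%X" % (value >> 32, value & 0xFFFFFFFF)


# Hypothetical command builders following the three calls listed above.
def init_command(slot, plugin):
    return "INIT_LOGICAL_REPLICATION '%s' '%s';" % (slot, plugin)


def start_command(slot, restart_lsn):
    # Options are intentionally omitted (syntax not assumed).
    return "START_LOGICAL_REPLICATION '%s' %s;" % (slot, format_lsn(restart_lsn))


def stop_command(slot):
    return "STOP_LOGICAL_REPLICATION '%s';" % slot


if __name__ == "__main__":
    start = parse_lsn("0/17D5908")
    print(init_command("regression_slot", "test_decoding"))
    print(start_command("regression_slot", start))
    print(stop_command("regression_slot"))
```

Keeping positions as integers makes it trivial to compare them, e.g. to
check that the stream never moves backwards.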

The output seen in the 'data' column is produced by a so-called 'output
plugin' which users of the facility can write to suit their needs. A
plugin is written by implementing 5 functions in the shared object that's
passed to init_logical_replication() above:
* pg_decode_init (optional)
* pg_decode_begin_txn
* pg_decode_change
* pg_decode_commit_txn
* pg_decode_cleanup (optional)

The most interesting function, pg_decode_change, gets passed a structure
containing old/new versions of the row, the 'struct Relation' belonging
to it and meta-information about the transaction.

The output plugin can rely on syscache lookups et al. to decode the
changed tuple in whatever fashion it wants.
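
To make the shape of the callback interface easier to picture, here is
a toy model of it in Python. This is only an illustration of the calling
sequence, not the C API from the patch: the class, the run_plugin driver
and the dictionary layout of a transaction are all assumptions made up
for this sketch. The two optional callbacks are only invoked when the
plugin provides them.

```python
class TextPlugin:
    """Toy stand-in for an output plugin; collects text lines in memory."""

    def __init__(self):
        self.lines = []

    # Models pg_decode_begin_txn
    def begin_txn(self, txn):
        self.lines.append("BEGIN")

    # Models pg_decode_change: receives the relation and the row change
    def change(self, txn, relation, change):
        self.lines.append('table "%s": %s' % (relation, change))

    # Models pg_decode_commit_txn
    def commit_txn(self, txn):
        self.lines.append("COMMIT")


def run_plugin(plugin, transactions):
    """Hypothetical driver: call the callbacks in order for each transaction."""
    init = getattr(plugin, "init", None)        # optional callback
    if init is not None:
        init()
    for txn in transactions:
        plugin.begin_txn(txn)
        for relation, change in txn["changes"]:
            plugin.change(txn, relation, change)
        plugin.commit_txn(txn)
    cleanup = getattr(plugin, "cleanup", None)  # optional callback
    if cleanup is not None:
        cleanup()


if __name__ == "__main__":
    plugin = TextPlugin()
    run_plugin(plugin, [{"xid": 696, "changes": [("foo", "INSERT: id[int4]:1")]}])
    print("\n".join(plugin.lines))
```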

I'd like to invite reviewers to first look at:
* the output plugin interface
* the walsender/SRF interface
* patch 12 which contains most of the code

When reading the code, the information flow during decoding might be
interesting:
---------------
               +---------------+
               | XLogReader    |
               +---------------+
                       |
                XLOG Records
                       |
                       v
               +---------------+
               | decode.c      |
               +---------------+
                 |         |
                 |         |
                 v         |
    +---------------+      |
    | snapbuild.c   |  HeapTupleData
    +---------------+      |
           |               |
    catalog snapshots      |
           |               |
           v               v
          +---------------+
          |reorderbuffer.c|
          +---------------+
                  |
         HeapTuple & Metadata
                  |
                  v
          +---------------+
          | Output Plugin |
          +---------------+
                  |
          Whatever you want
                  |
                  v
          +---------------+
          | Output Handler|
          |               |
          |WalSnd or SRF  |
          +---------------+
---------------
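
As a rough illustration of the reorderbuffer.c step in the flow above:
changes from concurrent transactions arrive interleaved in the WAL and
are collected per transaction until the commit record shows up, at which
point the whole transaction is handed on. The Python sketch below models
only that idea. The record tuples and the reorder function are invented
for this example, and spilling to disk, subtransactions and snapshots
are all ignored.

```python
from collections import defaultdict


def reorder(records):
    """Yield (xid, changes) once per committed transaction, in commit order.

    records is an iterable of (kind, xid, payload) tuples where kind is
    'change', 'commit' or 'abort' (a made-up format for this sketch).
    """
    pending = defaultdict(list)
    for kind, xid, payload in records:
        if kind == "change":
            pending[xid].append(payload)      # buffer until we know the outcome
        elif kind == "commit":
            yield xid, pending.pop(xid, [])   # hand over the whole transaction
        elif kind == "abort":
            pending.pop(xid, None)            # aborted work is never emitted


if __name__ == "__main__":
    wal = [
        ("change", 1, "INSERT a"),
        ("change", 2, "INSERT b"),
        ("change", 1, "UPDATE a"),
        ("commit", 2, None),
        ("commit", 1, None),
    ]
    for xid, changes in reorder(wal):
        print(xid, changes)
```

A transaction that commits without any buffered changes still comes out
with an empty change list, which matches the bare BEGIN/COMMIT pairs in
the sample output earlier.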

Overview of the attached patches:
0001: indirect toast tuples; required but submitted independently
0002: functions for testing; not required
0003: (tablespace, filenode) syscache; required
0004: RelationMapFilenodeToOid; required, simple
0005: pg_relation_by_filenode() function; not required but useful
0006: Introduce InvalidCommandId; required, simple
0007: Adjust Satisfies* interface; required, mechanical
0008: Allow walsender to attach to a database; required, needs review
0009: New GetOldestXmin() parameter; required, pretty boring
0010: Log xl_running_xact regularly in the bgwriter; required
0011: make fsync_fname() public; required, needs to be in a different file
0012: Relcache support for a Relation's primary key; required
0013: Actual changeset extraction; required
0014: Output plugin demo; not required (except for testing) but useful
0015: Add pg_receivellog program; not required but useful
0016: Add test_logical_decoding extension; not required, but contains
      the tests for the feature. Uses 0014
0017: Snapshot building docs; not required

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
