Re: pglogical_output - a general purpose logical decoding output plugin

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pglogical_output - a general purpose logical decoding output plugin
Date: 2016-01-30 21:09:33
Message-ID: CAMsr+YECDGHG_WMfu8Vi8J3GdOuV=nUincYNWryb26d93ngtEg@mail.gmail.com
Lists: pgsql-hackers

On 29 January 2016 at 18:16, Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> so, I'm reviewing the output of:
>

Thank you very much for the review.

> > + pglogical_output_plhooks \
>
> I'm doubtful we want these plhooks. You aren't allowed to access normal
> (non user catalog) tables in output plugins. That seems too much to
> expose to plpgsql function imo.
>

You're right. We've got no way to make sure the user sticks to things
that're reasonably safe.

The intent of that module was to allow people to write row and origin
filters in plpgsql, to serve as an example of how to implement hooks, and
to provide a tool usable in testing pglogical_output from pg_regress.

An example can be in C instead, since as you noted it's not safe to do it in
plpgsql. A few toy functions will be sufficient for test use.

As for allowing users to flexibly filter, I'm starting to think that hooks
in pglogical_output aren't really the best long term option. They're needed
for now, but for 9.7+ we should look at whether it's practical to separate
"what gets forwarded" policy from the mechanics of how it gets sent.
pglogical_output currently just farms part of the logical decoding hook out
to its own hooks, but it wouldn't have to do that if logical decoding let
you plug in policy on what you send separately from how you send it, either
via catalogs or via plugin functions separate from the output plugin.

(Kinda off topic, though, and I think we need the hooks for now, just not
the plpgsql implementation).

> > +++ b/contrib/pglogical_output/README.md
>
> I don't think we've markdown in postgres so far - so let's just keep the
> current content and remove the .md
>

I'm halfway through turning it all into SGML anyway. I just got sidetracked
by other work. I'd be just as happy to leave it as markdown but figured
SGML would likely be required.

>
> > +==== Table metadata header
> > +
> > +|===
> > +|*Message*|*Type/Size*|*Notes*
> > +
> > +|Message type|signed char|Literal ‘**R**’ (0x52)
> > +|flags|uint8| * 0-6: Reserved, client _must_ ERROR if set and not recognised.
> > +|relidentifier|uint32|Arbitrary relation id, unique for this upstream. In practice this will probably be the upstream table’s oid, but the downstream can’t assume anything.
> > +|nspnamelength|uint8|Length of namespace name
> > +|nspname|signed char[nspnamelength]|Relation namespace (null terminated)
> > +|relnamelength|uint8|Length of relation name
> > +|relname|char[relname]|Relation name (null terminated)
> > +|attrs block|signed char|Literal: ‘**A**’ (0x41)
> > +|natts|uint16|number of attributes
> > +|[fields]|[composite]|Sequence of ‘natts’ column metadata blocks, each of which begins with a column delimiter followed by zero or more column metadata blocks, each with the same column metadata block header.
>
> That's a fairly high overhead. Hm.
>

Yeah, and it shows in Oleksandr's measurements. However, that metadata
message is sent only infrequently if you enable relation metadata caching,
which is really necessary to get reasonable performance on anything but the
simplest workloads. Caching is only optional because it makes it easier to
write and test a client without it first.
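
To make that concrete, the upstream-side logic is roughly the following
sketch (relmetacache_get_entry and write_rel_metadata_message are
illustrative names, not the actual functions in the patch): the full 'R'
metadata message only goes out when the client isn't already known to have
the relation cached.

    static void
    maybe_send_rel_metadata(LogicalDecodingContext *ctx, Relation rel)
    {
        /* look up (or create) the cache entry for this relation */
        struct PGLRelMetaCacheEntry *entry =
            relmetacache_get_entry(RelationGetRelid(rel));

        if (!entry->is_cached)
        {
            /* client hasn't seen this relation yet; send the 'R' message */
            write_rel_metadata_message(ctx->out, rel);
            entry->is_cached = true;
        }
    }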

> > +== JSON protocol
> > +
> > +If `proto_format` is set to `json` then the output plugin will emit JSON
> > +instead of the custom binary protocol. JSON support is intended mainly for
> > +debugging and diagnostics.
> > +
>
> I'm fairly strongly opposed to including two formats in one output
> plugin. I think the demand for being able to look into the binary
> protocol should instead be satisfied by having a function that "expands"
> the binary data returned into something easier to understand.
>

Per our discussion yesterday I think I agree with you on that now.

My thinking is that we should patch pg_recvlogical to be able to load a
decoder plugin. Then extract the protocol decoding support from pglogical
into a separately usable library that can be loaded by pg_recvlogical, the
pglogical downstream, and by SQL-level debug/test helper functions.

pg_recvlogical won't be able to decode binary or internal format field
values, but you can simply not request that they be sent.

> > + case PARAM_BINARY_BASETYPES_MAJOR_VERSION:
> > + val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_UINT32);
> > + data->client_binary_basetypes_major_version = DatumGetUInt32(val);
> > + break;
>
> Why is the major version tied to basetypes (by name)? Seem more
> generally useful.
>

I found naming that param rather awkward.

The idea is that we can rely on the Pg major version only for types defined
in core. It's mostly safe for built-in extensions in that few if any
people ever upgrade them, but it's not strictly correct even there. Most of
them (hstore, etc) don't expose their own versions so it's hard to know
what to do about them.

What I want(ed?) to do is let a downstream enumerate the extensions it has
and the version(s) it knows they're compatible with for send/recv and
internal formats. But designing that was going to be hairy in the time
available, and I think falling back on text representation for contribs and
extensions is the safest choice. A solid way to express an extension
compatibility matrix seemed rather too much to bite off on top of
everything else.

>
> > + case PARAM_RELMETA_CACHE_SIZE:
> > + val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_INT32);
> > + data->client_relmeta_cache_size = DatumGetInt32(val);
> > + break;
>
> I'm not convinced this a) should be optional b) should have a size
> limit. Please argue for that choice. And how the client should e.g. know
> about evictions in that cache.
>

I don't think sending metadata with every row makes sense, so it shouldn't
be optional in that sense. It was optional mostly because the downstream
didn't initially support it, and I thought it'd be useful to allow people to
write simpler clients.

The cache logic is really quite simple though, so I'm happy enough to make
it required.

Re the size limit, my thinking was that there are (unfortunately) real
world apps that create and drop tables in production, that have tens of
thousands or more schemas each with hundreds or more tables, etc. They're
awful and I wish people wouldn't do that, but they do. Rather than expecting
the downstream to cope with an unknown and unbounded cache size, being able
to limit the number of tables for which it's required to retain metadata
seemed possibly useful. I'm far from wedded to the idea, as it requires the
upstream to maintain a metadata cache LRU (which it doesn't yet) and send
cache eviction messages to the downstream. Cutting that whole idea out would
simplify things.

>
>
> > --- /dev/null
> > +++ b/contrib/pglogical_output/pglogical_config.h
> > @@ -0,0 +1,55 @@
> > +#ifndef PG_LOGICAL_CONFIG_H
> > +#define PG_LOGICAL_CONFIG_H
> > +
> > +#ifndef PG_VERSION_NUM
> > +#error <postgres.h> must be included first
> > +#endif
>
> Huh?
>

It's a stray that should've been part of pglogical/compat.h (which will
presumably get cut for inclusion in contrib anyway).

> > +#include "nodes/pg_list.h"
> > +#include "pglogical_output.h"
> > +
> > +inline static bool
> > +server_float4_byval(void)
> > +{
> > +#ifdef USE_FLOAT4_BYVAL
> > + return true;
> > +#else
> > + return false;
> > +#endif
> > +}
> > +
> > +inline static bool
> > +server_float8_byval(void)
> > +{
> > +#ifdef USE_FLOAT8_BYVAL
> > + return true;
> > +#else
> > + return false;
> > +#endif
> > +}
> > +
> > +inline static bool
> > +server_integer_datetimes(void)
> > +{
> > +#ifdef USE_INTEGER_DATETIMES
> > + return true;
> > +#else
> > + return false;
> > +#endif
> > +}
> > +
> > +inline static bool
> > +server_bigendian(void)
> > +{
> > +#ifdef WORDS_BIGENDIAN
> > + return true;
> > +#else
> > + return false;
> > +#endif
> > +}
>
> Not convinced these should exist, and even moreso exposed in a header.
>

Yeah. The info's needed in both pglogical_config.c and pglogical_output.c
but that can probably be avoided.

> > +/*
> > + * Returns Oid of the hooks function specified in funcname.
> > + *
> > + * Error is thrown if function doesn't exist or doesn't return correct datatype
> > + * or is volatile.
> > + */
> > +static Oid
> > +get_hooks_function_oid(List *funcname)
> > +{
> > + Oid funcid;
> > + Oid funcargtypes[1];
> > +
> > + funcargtypes[0] = INTERNALOID;
> > +
> > + /* find the function */
> > + funcid = LookupFuncName(funcname, 1, funcargtypes, false);
> > +
> > + /* Validate that the function returns void */
> > + if (get_func_rettype(funcid) != VOIDOID)
> > + {
> > + ereport(ERROR,
> > + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> > + errmsg("function %s must return void",
> > + NameListToString(funcname))));
> > + }
>
> Hm, this seems easy to poke holes into. I mean you later use it like:
>
> > + if (data->hooks_setup_funcname != NIL)
> > + {
> > + hooks_func = get_hooks_function_oid(data->hooks_setup_funcname);
> > +
> > + old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> > + (void) OidFunctionCall1(hooks_func, PointerGetDatum(&data->hooks));
> > + MemoryContextSwitchTo(old_ctxt);
>
> e.g. you basically assume the function the does something reasonable
> with those types. Why don't we instead create a 'plogical_hooks' return
> type, and have the function return that?
>

I want to avoid requiring any extension to be loaded for the output plugin
to work; it should just provide the mechanism for transporting data changes,
without adding types, user catalogs, etc.

That's still OK if the signature in pg_proc is 'returns internal', I just
don't want anything visible from SQL.

My other reasons for this approach have been obsoleted now that we're
installing the pglogical/hooks.h header in Pg's server includes. Originally
I was afraid extensions would have to make a *copy* of the hooks struct
definition in their own headers. In that case we'd be able to add entries
only strictly at the end of the struct and could only use it by palloc0'ing
it and passing a pointer. Of course, I didn't think of the case where the
hook-defining extension had the bigger of the two definitions...

So yeah. Happy to just return 'internal', DatumGetPointer it and cast to a
pointer to our hooks struct.
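
Roughly this at the call site, as a sketch (assuming the hooks struct is
named PGLogicalHooks, and that get_hooks_function_oid is changed to check
for INTERNALOID rather than VOIDOID as the return type):

    if (data->hooks_setup_funcname != NIL)
    {
        Oid           hooks_func;
        Datum         ret;
        MemoryContext old_ctxt;

        hooks_func = get_hooks_function_oid(data->hooks_setup_funcname);

        old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
        /* the setup function is now declared RETURNS internal */
        ret = OidFunctionCall0(hooks_func);
        data->hooks = *(struct PGLogicalHooks *) DatumGetPointer(ret);
        MemoryContextSwitchTo(old_ctxt);
    }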

> > + if (func_volatile(funcid) == PROVOLATILE_VOLATILE)
> > + {
> > + ereport(ERROR,
> > + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> > + errmsg("function %s must not be VOLATILE",
> > + NameListToString(funcname))));
> > + }
>
> Hm, not sure what that's supposed to achieve. You could argue for
> requiring the function to be immutable (i.e. not stable or volatile),
> but I'm not sure what that'd achieve.
>

It's a stupid holdover from the function's use elsewhere, and entirely my
fault.

> > + old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> > + (void) (*data->hooks.startup_hook)(&args);
> > + MemoryContextSwitchTo(old_ctxt);
>
> What is the hooks memory contexts intended to achieve? It's apparently
> never reset. Normally output plugin callbacks are called in more
> shortlived memory contexts, for good reason, to avoid leaks....
>

Mainly to make memory allocated by and used by hooks more visible when
debugging, rather than showing it conflated with the main logical decoding
context, while still giving hooks somewhere to store state that they may
need across the decoding session.

Not going to argue strongly for it. Just seemed like something I'd regret
not having when trying to figure out why the decoding context was massive
for no apparent reason later...
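
For what it's worth, the setup is just a child of the decoding context,
along these lines (a sketch, using the 9.5-era allocset macros):

    /*
     * Child of the decoding context, so hook allocations get their own
     * line in MemoryContextStats() output instead of being conflated
     * with the rest of the decoding session's memory.
     */
    data->hooks_mctxt = AllocSetContextCreate(ctx->context,
                                              "pglogical hooks context",
                                              ALLOCSET_SMALL_MINSIZE,
                                              ALLOCSET_SMALL_INITSIZE,
                                              ALLOCSET_SMALL_MAXSIZE);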

> > +bool
> > +call_row_filter_hook(PGLogicalOutputData *data, ReorderBufferTXN *txn,
> > + Relation rel, ReorderBufferChange *change)
> > +{
> > + struct PGLogicalRowFilterArgs hook_args;
> > + MemoryContext old_ctxt;
> > + bool ret = true;
> > +
> > + if (data->hooks.row_filter_hook != NULL)
> > + {
> > + hook_args.change_type = change->action;
> > + hook_args.private_data = data->hooks.hooks_private_data;
> > + hook_args.changed_rel = rel;
> > + hook_args.change = change;
> > +
> > + elog(DEBUG3, "calling pglogical row filter hook");
> > +
> > + old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> > + ret = (*data->hooks.row_filter_hook)(&hook_args);
>
> Why aren't we passing txn to the filter? ISTM it'd be better to
> basically reuse/extend the signature used by the original change
> callback.
>

Yeah, probably.

I somewhat wish the original callbacks used struct arguments. Otherwise you
land up with fun #ifdef's when supporting multiple Pg versions and it's
hard to add new params. Quite annoying when dealing with extensions. OTOH
it's presumably faster to use the usual C calling convention.
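
E.g. with a struct argument, the txn you mention could just be appended
without breaking hooks compiled against the older definition (illustrative
sketch only):

    struct PGLogicalRowFilterArgs
    {
        enum ReorderBufferChangeType change_type;
        void                *private_data;
        Relation             changed_rel;
        ReorderBufferChange *change;
        ReorderBufferTXN    *txn;   /* new field, appended at the end */
    };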

> > +/* These must be available to pg_dlsym() */
>
> No the following don't? And they aren't, since they're static functions?
> _PG_init and _PG_output_plugin_init need to, but that's it.
>

Yeah. Total thinko.

> > +/*
> > + * COMMIT callback
> > + */
> > +void
> > +pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> > + XLogRecPtr commit_lsn)
> > +{
>
> Missing static's?
>

Yes.

>
> > + /*
> > + * Nobody keeps pointers to entries in this hash table around so
> > + * it's safe to directly HASH_REMOVE the entries as soon as they are
> > + * invalidated. Finding them and flagging them invalid then removing
> > + * them lazily might save some memory churn for tables that get
> > + * repeatedly invalidated and re-sent, but it doesn't seem worth
> > + * doing.
> > + *
> > + * Getting invalidations for relations that aren't in the table is
> > + * entirely normal, since there's no way to unregister for an
> > + * invalidation event. So we don't care if it's found or not.
> > + */
> > + (void) hash_search(RelMetaCache, &relid, HASH_REMOVE, NULL);
> > + }
>
> So, I don't buy this, like at all. The cache entry is passed to
> functions, while we call output functions and such. Which in turn can
> cause cache invalidations to be processed.
>

That was a misunderstanding on my part. I hadn't realised that cache
invalidations could be fired so easily by apparently innocuous actions, and
had assumed incorrectly that we'd get invalidations only outside the scope
of a decoding callback, not during one.

Clearly I need to fix this with the usual invalid-flag based approach, which
in turn makes me agree with your proposal yesterday about adding a generic
mechanism for extensions to register their interest in invalidations on
tables, attach data to them, and not worry about the details of managing
the hash table correctly, etc.
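
So the invalidation callback would only flag the entry, roughly like this
sketch (is_valid is a new field, not in the current struct; the entry gets
refreshed or dropped at the next lookup, when no pointers into the hash can
be live):

    static void
    relmeta_cache_invalidation_cb(Datum arg, Oid relid)
    {
        struct PGLRelMetaCacheEntry *entry;

        if (RelMetaCache == NULL)
            return;

        /* flag only; removal/refresh happens at the next lookup */
        entry = (struct PGLRelMetaCacheEntry *)
            hash_search(RelMetaCache, &relid, HASH_FIND, NULL);
        if (entry != NULL)
            entry->is_valid = false;
    }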

> > +struct PGLRelMetaCacheEntry
> > +{
> > + Oid relid;
> > + /* Does the client have this relation cached? */
> > + bool is_cached;
> > + /* Field for API plugin use, must be alloc'd in decoding context */
> > + void *api_private;
> > +};
>
> I don't see how api_private can safely be used. At the very least it
> needs a lot more documentation about memory lifetime rules and
> such. Afaics we'd just forever leak memory atm.
>

Yeah. It's pretty much just wrong, and since I don't have a compelling
reason for it I'm happy enough for it to just go away. Doing it correctly
would pretty much require a callback to be registered for freeing it, and
... meh.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
