Re: Sequence Access Method WIP

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sequence Access Method WIP
Date: 2013-11-18 09:48:56
Message-ID: 20131118094856.GA5526@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
> On 15.11.2013 20:21, Andres Freund wrote:
> >Well, it exposes log_sequence_tuple() - together with the added "am
> >private" column of pg_squence that allows to do quite a bit of different
> >things. I think unless we really implement pluggable RMGRs - which I
> >don't really see coming - we cannot be radically different.
>
> The proposed abstraction leaks like a sieve. The plugin should not need to
> touch anything but the private amdata field. log_sequence_tuple() is way too
> low-level.

Why? It's *less* low level than a good part of other crash recovery code
we have. I have quite some doubts about the layering, but it's imo
pretty sensible to centralize the wal logging code that plugins couldn't
write.

If you look at what e.g the _alloc callback currently gets passed.
a) Relation: Important for metadata like TupleDesc, name etc.
b) SeqTable entry: Important to setup state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuples changed state

And if you then look at what gets modified in that callback:
Form_pg_sequence->amdata
->is_called
->last_value
->log_cnt
SeqTable ->last
SeqTable ->cached
SeqTable ->last_valid

You need is_called, last_valid because of our current representation of
sequences state (which we could change, to remove that need). The rest
contains values that need to be set depending on how you want your
sequence to behave:
* Amdata is obvious.
* You need log_cnt to influence/remember how big the chunks are you WAL log at
once are. Which e.g. you need to control if want a sequence that
doesn't leak values in crashes
* cached is needed to control the per-backend caching.

> Just as a thought-experiment, imagine that we decided to re-implement
> sequences so that all the sequence values are stored in one big table, or
> flat-file in the data directory, instead of the current implementation where
> we have a one-block relation file for each sequence. If the sequence access
> method API is any good, it should stay unchanged. That's clearly not the
> case with the proposed API.

I don't think we can entirely abstract away the reality that now they are
based on relations and might not be at some later point. Otherwise we'd
have to invent an API that copied all the data that's stored in struct
RelationData. Like name, owner, ...
Which non SQL accessible API we provide has a chance of providing that
level of consistency in the face of fundamental refactoring? I'd say
none.

> Stepping back a bit and looking at this problem from a higher level, why do
> you need to hack this stuff into the sequences? Couldn't you just define a
> new set of functions, say bdr_currval() and bdr_nextval(), to operate on
> these new kinds of sequences?

Basically two things:
a) User interface. For one everyone would need to reimplement the entire
sequence DDL from start. For another it means it's hard to write
(client) code that doesn't depend on a specific implementation.
b) It's not actually easy to get similar semantics in "userspace". How
would you emulate the crash-safe but non-transactional semantics of
sequences without copying most of sequence.c? Without writing
XLogInsert()s which we cannot do from a out-of-tree code?

> You're not making much use of the existing sequence infrastructure, anyway, so it might be best to just keep the
> implementation completely separate. If you need it for compatibility with
> applications, you could create facade currval/nextval() functions that call
> the built-in version or the bdr version depending on the argument.

That doesn't get you very far:
a) the default functions created by sequences are pg_catalog
prefixed. So you'd need to hack up the catalog to get your own functions
to be used if you want the application to work transparently. In which
you need to remember the former function because you now cannot call it
normally anymore. Yuck.
b) One would need nearly all of sequence.c again. You need the state
across calls, the locking, the WAL logging, DDL support. Pretty much
the only thing *not* used would be nextval_internal() and do_setval().

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sameer Thakur 2013-11-18 09:54:16 Re: pg_stat_statements: calls under-estimation propagation
Previous Message Haribabu kommi 2013-11-18 09:31:14 Re: New option for pg_basebackup, to specify a different directory for pg_xlog