Quick Links

Re: Sequence Access Method WIP

Lists:	pgsql-hackers

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Sequence Access Method WIP
Date:	2013-01-16 02:40:32
Message-ID:	CA+U5nMLV3ccdzbqCvcedd-HfrE4dUmoFmTBPL_uJ9YjsQbR7iQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

SeqAm allows you to specify a plugin that alters the behaviour for
sequence allocation and resetting, aimed specifically at clustering
systems.

New command options on end of statement allow syntax
CREATE SEQUENCE foo_seq
USING globalseq WITH (custom_option=setting);
which allows you to specify an alternate Sequence Access Method for a
sequence object,
or you can specify a default_sequence_mgr as a USERSET parameter
SET default_sequence_mgr = globalseq;

Existing sequences can be modified to use a different SeqAM, by calling
ALTER SEQUENCE foo_seq USING globalseq;

SeqAM is similar to IndexAM: There is a separate catalog table for
SeqAMs, but no specific API to create them. Initdb creates one
sequence am, called "local", which is the initial default. If
default_sequence_mgr is set to '' or 'local' then we use the local
seqam. The local seqam's functions are included in core.

Status is still "Work In Progress". Having said that most of the grunt
work is done and if we agree the shape of this is right, its
relatively easy going code.

postgres=# select relname, relam from pg_class where relname = 'foo2';
relname | relam
---------+-------
foo2 | 3839

postgres=# create sequence foo5 using global;
ERROR: access method "global" does not exist

Tasks to complete
* contrib module for example/testing
* Docs

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
seqam.v5.patch	application/octet-stream	40.6 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-14 20:10:17
Message-ID:	CA+U5nMKh+pmbaGUd_x9ev-F5k+whT8HM7wXfkX9ouF=cPkAQEg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 16 January 2013 00:40, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> SeqAm allows you to specify a plugin that alters the behaviour for
> sequence allocation and resetting, aimed specifically at clustering
> systems.
>
> New command options on end of statement allow syntax
> CREATE SEQUENCE foo_seq
> USING globalseq

Production version of this, ready for commit to PG 9.4

Includes test extension which allows sequences without gaps - "gapless".

Test using seq_test.psql after creating extension.

No dependencies on other patches.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
seqam.v33.patch	application/octet-stream	70.8 KB
gapless.tar	application/x-tar	12.5 KB
seq_test.psql	application/octet-stream	754 bytes

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-15 18:08:30
Message-ID:	5286631E.1000406@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 14.11.2013 22:10, Simon Riggs wrote:
> On 16 January 2013 00:40, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> SeqAm allows you to specify a plugin that alters the behaviour for
>> sequence allocation and resetting, aimed specifically at clustering
>> systems.
>>
>> New command options on end of statement allow syntax
>> CREATE SEQUENCE foo_seq
>> USING globalseq
>
> Production version of this, ready for commit to PG 9.4
>
> Includes test extension which allows sequences without gaps - "gapless".
>
> Test using seq_test.psql after creating extension.
>
> No dependencies on other patches.

It's pretty hard to review the this without seeing the "other"
implementation you're envisioning to use this API. But I'll try:

I wonder if the level of abstraction is right. The patch assumes that
the on-disk storage of all sequences is the same, so the access method
can't change that. But then it leaves the details of actually updating
the on-disk block, WAL-logging and all, as the responsibility of the
access method. Surely that's going to look identical in all the seqams,
if they all use the same on-disk format. That also means that the
sequence access methods can't be implemented as plugins, as plugins
can't do WAL-logging.

The comment in seqam.c says that there's a private column reserved for
access method-specific data, called am_data, but that seems to be the
only mention of am_data in the patch.

- Heikki

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-15 18:21:02
Message-ID:	20131115182102.GF5489@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:
> It's pretty hard to review the this without seeing the "other"
> implementation you're envisioning to use this API. But I'll try:

We've written a distributed sequence implementation against it, so it
works at least for that use case.

While certainly not release worthy, it still might be interesting to
look at.
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=blob;f=contrib/bdr/bdr_seq.c;h=c9480c8021882f888e35080f0e7a7494af37ae27;hb=bdr

bdr_sequencer_fill_sequence pre-allocates ranges of values,
bdr_sequence_alloc returns them upon nextval().

> I wonder if the level of abstraction is right. The patch assumes that the
> on-disk storage of all sequences is the same, so the access method can't
> change that. But then it leaves the details of actually updating the on-disk
> block, WAL-logging and all, as the responsibility of the access method.
> Surely that's going to look identical in all the seqams, if they all use the
> same on-disk format. That also means that the sequence access methods can't
> be implemented as plugins, as plugins can't do WAL-logging.

Well, it exposes log_sequence_tuple() - together with the added "am
private" column of pg_squence that allows to do quite a bit of different
things. I think unless we really implement pluggable RMGRs - which I
don't really see coming - we cannot be radically different.

> The comment in seqam.c says that there's a private column reserved for
> access method-specific data, called am_data, but that seems to be the only
> mention of am_data in the patch.

It's amdata, not am_data :/. Directly at the end of pg_sequence.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-15 18:31:07
Message-ID:	CA+U5nMJ1xaHEjvVowyQnJYarUsiRLR3DMozyxjRU0ayZ1mK6FA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15 November 2013 15:08, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> I wonder if the level of abstraction is right.

That is the big question and not something to shy away from.

What is presented is not the first thought, by a long way. Andres'
contribution to the patch is mainly around this point, so the seq am
is designed with the needs of the main use case in mind.

I'm open to suggested changes but I would say that practical usage
beats changes suggested for purity.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-15 18:48:09
Message-ID:	52866C69.80307@gmx.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/14/13, 3:10 PM, Simon Riggs wrote:
> On 16 January 2013 00:40, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> SeqAm allows you to specify a plugin that alters the behaviour for
>> sequence allocation and resetting, aimed specifically at clustering
>> systems.
>>
>> New command options on end of statement allow syntax
>> CREATE SEQUENCE foo_seq
>> USING globalseq
>
> Production version of this, ready for commit to PG 9.4

This patch doesn't apply anymore.

Also, you set this to "returned with feedback" in the CF app. Please
verify whether that was intentional.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-15 19:00:00
Message-ID:	CA+U5nMJ2vpvf2AVnFZD+suGJGLNHKHy1ri_+VJpL+Z0frXXC2Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15 November 2013 15:48, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 11/14/13, 3:10 PM, Simon Riggs wrote:
>> On 16 January 2013 00:40, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>
>>> SeqAm allows you to specify a plugin that alters the behaviour for
>>> sequence allocation and resetting, aimed specifically at clustering
>>> systems.
>>>
>>> New command options on end of statement allow syntax
>>> CREATE SEQUENCE foo_seq
>>> USING globalseq
>>
>> Production version of this, ready for commit to PG 9.4
>
> This patch doesn't apply anymore.

Yes, a patch conflict was just committed, will repost.

> Also, you set this to "returned with feedback" in the CF app. Please
> verify whether that was intentional.

Not sure that was me, if so, corrected.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-18 08:40:16
Message-ID:	5289D270.9000905@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15.11.2013 21:00, Simon Riggs wrote:
> On 15 November 2013 15:48, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> Also, you set this to "returned with feedback" in the CF app. Please
>> verify whether that was intentional.
>
> Not sure that was me, if so, corrected.

It was me, sorry. I figured this needs such a large restructuring that
it's not going to be committable in this commitfest. But I'm happy to
keep the discussion going if you think otherwise.

- Heikki

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-18 08:54:42
Message-ID:	5289D5D2.6010902@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15.11.2013 20:21, Andres Freund wrote:
> On 2013-11-15 20:08:30 +0200, Heikki Linnakangas wrote:
>> It's pretty hard to review the this without seeing the "other"
>> implementation you're envisioning to use this API. But I'll try:
>
> We've written a distributed sequence implementation against it, so it
> works at least for that use case.
>
> While certainly not release worthy, it still might be interesting to
> look at.
> https://urldefense.proofpoint.com/v1/url?u=http://git.postgresql.org/gitweb/?p%3Dusers/andresfreund/postgres.git%3Ba%3Dblob%3Bf%3Dcontrib/bdr/bdr_seq.c%3Bh%3Dc9480c8021882f888e35080f0e7a7494af37ae27%3Bhb%3Dbdr&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=xGch4oNJbpD%2BKPJECmgw4SLBhytSZLBX7UnkZhtNcpw%3D%0A&m=Rbmo%2BDar4PZrQmGH2adz7cCgqRl2%2FXH845YIA1ifS7A%3D%0A&s=8d287f35070fe7cb586f10b5bfe8664ad29e986b5f2d2286c4109e09f615668d
>
> bdr_sequencer_fill_sequence pre-allocates ranges of values,
> bdr_sequence_alloc returns them upon nextval().

Thanks. That pokes into the low-level details of sequence tuples, just
like I was afraid it would.

>> I wonder if the level of abstraction is right. The patch assumes that the
>> on-disk storage of all sequences is the same, so the access method can't
>> change that. But then it leaves the details of actually updating the on-disk
>> block, WAL-logging and all, as the responsibility of the access method.
>> Surely that's going to look identical in all the seqams, if they all use the
>> same on-disk format. That also means that the sequence access methods can't
>> be implemented as plugins, as plugins can't do WAL-logging.
>
> Well, it exposes log_sequence_tuple() - together with the added "am
> private" column of pg_squence that allows to do quite a bit of different
> things. I think unless we really implement pluggable RMGRs - which I
> don't really see coming - we cannot be radically different.

The proposed abstraction leaks like a sieve. The plugin should not need
to touch anything but the private amdata field. log_sequence_tuple() is
way too low-level.

Just as a thought-experiment, imagine that we decided to re-implement
sequences so that all the sequence values are stored in one big table,
or flat-file in the data directory, instead of the current
implementation where we have a one-block relation file for each
sequence. If the sequence access method API is any good, it should stay
unchanged. That's clearly not the case with the proposed API.

>> The comment in seqam.c says that there's a private column reserved for
>> access method-specific data, called am_data, but that seems to be the only
>> mention of am_data in the patch.
>
> It's amdata, not am_data :/. Directly at the end of pg_sequence.

Ah, got it.

Stepping back a bit and looking at this problem from a higher level, why
do you need to hack this stuff into the sequences? Couldn't you just
define a new set of functions, say bdr_currval() and bdr_nextval(), to
operate on these new kinds of sequences? You're not making much use of
the existing sequence infrastructure, anyway, so it might be best to
just keep the implementation completely separate. If you need it for
compatibility with applications, you could create facade
currval/nextval() functions that call the built-in version or the bdr
version depending on the argument.

- Heikki

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-18 09:48:56
Message-ID:	20131118094856.GA5526@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
> On 15.11.2013 20:21, Andres Freund wrote:
> >Well, it exposes log_sequence_tuple() - together with the added "am
> >private" column of pg_squence that allows to do quite a bit of different
> >things. I think unless we really implement pluggable RMGRs - which I
> >don't really see coming - we cannot be radically different.
>
> The proposed abstraction leaks like a sieve. The plugin should not need to
> touch anything but the private amdata field. log_sequence_tuple() is way too
> low-level.

Why? It's *less* low level than a good part of other crash recovery code
we have. I have quite some doubts about the layering, but it's imo
pretty sensible to centralize the wal logging code that plugins couldn't
write.

If you look at what e.g the _alloc callback currently gets passed.
a) Relation: Important for metadata like TupleDesc, name etc.
b) SeqTable entry: Important to setup state for cur/lastval
c) Buffer of the tuple: LSN etc.
d) HeapTuple itself: a place to store the tuples changed state

And if you then look at what gets modified in that callback:
Form_pg_sequence->amdata
->is_called
->last_value
->log_cnt
SeqTable ->last
SeqTable ->cached
SeqTable ->last_valid

You need is_called, last_valid because of our current representation of
sequences state (which we could change, to remove that need). The rest
contains values that need to be set depending on how you want your
sequence to behave:
* Amdata is obvious.
* You need log_cnt to influence/remember how big the chunks are you WAL log at
once are. Which e.g. you need to control if want a sequence that
doesn't leak values in crashes
* cached is needed to control the per-backend caching.

> Just as a thought-experiment, imagine that we decided to re-implement
> sequences so that all the sequence values are stored in one big table, or
> flat-file in the data directory, instead of the current implementation where
> we have a one-block relation file for each sequence. If the sequence access
> method API is any good, it should stay unchanged. That's clearly not the
> case with the proposed API.

I don't think we can entirely abstract away the reality that now they are
based on relations and might not be at some later point. Otherwise we'd
have to invent an API that copied all the data that's stored in struct
RelationData. Like name, owner, ...
Which non SQL accessible API we provide has a chance of providing that
level of consistency in the face of fundamental refactoring? I'd say
none.

> Stepping back a bit and looking at this problem from a higher level, why do
> you need to hack this stuff into the sequences? Couldn't you just define a
> new set of functions, say bdr_currval() and bdr_nextval(), to operate on
> these new kinds of sequences?

Basically two things:
a) User interface. For one everyone would need to reimplement the entire
sequence DDL from start. For another it means it's hard to write
(client) code that doesn't depend on a specific implementation.
b) It's not actually easy to get similar semantics in "userspace". How
would you emulate the crash-safe but non-transactional semantics of
sequences without copying most of sequence.c? Without writing
XLogInsert()s which we cannot do from a out-of-tree code?

> You're not making much use of the existing sequence infrastructure, anyway, so it might be best to just keep the
> implementation completely separate. If you need it for compatibility with
> applications, you could create facade currval/nextval() functions that call
> the built-in version or the bdr version depending on the argument.

That doesn't get you very far:
a) the default functions created by sequences are pg_catalog
prefixed. So you'd need to hack up the catalog to get your own functions
to be used if you want the application to work transparently. In which
you need to remember the former function because you now cannot call it
normally anymore. Yuck.
b) One would need nearly all of sequence.c again. You need the state
across calls, the locking, the WAL logging, DDL support. Pretty much
the only thing *not* used would be nextval_internal() and do_setval().

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-18 10:50:21
Message-ID:	5289F0ED.20708@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 18.11.2013 11:48, Andres Freund wrote:
> On 2013-11-18 10:54:42 +0200, Heikki Linnakangas wrote:
>> On 15.11.2013 20:21, Andres Freund wrote:
>>> Well, it exposes log_sequence_tuple() - together with the added "am
>>> private" column of pg_squence that allows to do quite a bit of different
>>> things. I think unless we really implement pluggable RMGRs - which I
>>> don't really see coming - we cannot be radically different.
>>
>> The proposed abstraction leaks like a sieve. The plugin should not need to
>> touch anything but the private amdata field. log_sequence_tuple() is way too
>> low-level.
>
> Why? It's *less* low level than a good part of other crash recovery code
> we have. I have quite some doubts about the layering, but it's imo
> pretty sensible to centralize the wal logging code that plugins couldn't
> write.

It doesn't go far enough, it's still too *low*-level. The sequence AM
implementation shouldn't need to have direct access to the buffer page
at all.

> If you look at what e.g the _alloc callback currently gets passed.
> a) Relation: Important for metadata like TupleDesc, name etc.
> b) SeqTable entry: Important to setup state for cur/lastval
> c) Buffer of the tuple: LSN etc.
> d) HeapTuple itself: a place to store the tuples changed state
>
> And if you then look at what gets modified in that callback:
> Form_pg_sequence->amdata
> ->is_called
> ->last_value
> ->log_cnt
> SeqTable ->last
> SeqTable ->cached
> SeqTable ->last_valid
>
> You need is_called, last_valid because of our current representation of
> sequences state (which we could change, to remove that need). The rest
> contains values that need to be set depending on how you want your
> sequence to behave:
> * Amdata is obvious.
> * You need log_cnt to influence/remember how big the chunks are you WAL log at
> once are. Which e.g. you need to control if want a sequence that
> doesn't leak values in crashes
> * cached is needed to control the per-backend caching.

I don't think the sequence AM should be in control of 'cached'. The
caching is done outside the AM. And log_cnt probably should be passed to
the _alloc function directly as an argument, ie. the server code asks
the AM to allocate N new values in one call.

I'm thinking that the alloc function should look something like this:

seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

And it should return:

int64 value - the first value allocated.
int nvalues - the number of values allocated.
am_private - updated private data.

The backend code handles the caching and logging of values. When it has
exhausted all the cached values (or doesn't have any yet), it calls the
AM's alloc function to get a new batch. The AM returns the new batch,
and updates its private state as necessary. Then the backend code
updates the relation file with the new values and the AM's private data.
WAL-logging and checkpointing is the backend's responsibility.

>> Just as a thought-experiment, imagine that we decided to re-implement
>> sequences so that all the sequence values are stored in one big table, or
>> flat-file in the data directory, instead of the current implementation where
>> we have a one-block relation file for each sequence. If the sequence access
>> method API is any good, it should stay unchanged. That's clearly not the
>> case with the proposed API.
>
> I don't think we can entirely abstract away the reality that now they are
> based on relations and might not be at some later point. Otherwise we'd
> have to invent an API that copied all the data that's stored in struct
> RelationData. Like name, owner, ...
> Which non SQL accessible API we provide has a chance of providing that
> level of consistency in the face of fundamental refactoring? I'd say
> none.

I'm not thinking that we'd change sequences to not be relations. I'm
thinking that we might not want to store the state as a one-page file
anymore. In fact that was just discussed in the other thread titled
"init_sequence spill to hash table".

> b) It's not actually easy to get similar semantics in "userspace". How
> would you emulate the crash-safe but non-transactional semantics of
> sequences without copying most of sequence.c? Without writing
> XLogInsert()s which we cannot do from a out-of-tree code?

heap_inplace_update()

- Heikki

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-18 11:35:53
Message-ID:	20131118113553.GA20305@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-11-18 12:50:21 +0200, Heikki Linnakangas wrote:
> On 18.11.2013 11:48, Andres Freund wrote:
> I don't think the sequence AM should be in control of 'cached'. The caching
> is done outside the AM. And log_cnt probably should be passed to the _alloc
> function directly as an argument, ie. the server code asks the AM to
> allocate N new values in one call.

Sounds sane.

> I'm thinking that the alloc function should look something like this:
>
> seqam_alloc(Relation seqrel, int nrequested, Datum am_private)

I don't think we can avoid giving access to the other columns of
pg_sequence, stuff like increment, limits et all need to be available
for reading, so that'd also need to get passed in. And we need to signal
that am_private can be NULL, otherwise we'd end up with ugly ways to
signal that.
So I'd say to pass in the entire tuple, and return a copy? Alternatively
we can return am_private as a 'struct varlena *', so we can properly
signal empty values.

We also need a way to set am_private from outside
seqam_alloc/setval/... Many of the fancier sequences that people talked
about will need preallocation somewhere in the background. As proposed
that's easy enough to write using log_sequence_tuple(), this way we'd
need something that calls a callback with the appropriate buffer lock
held. So maybe a seqam_update(Relation seqrel, ...) callback that get's
called when somebody executes pg_sequence_update(oid)?

It'd probably a good idea to provide a generic function for checking
whether a new value falls in the boundaries of the sequence's min, max +
error handling.

> I'm not thinking that we'd change sequences to not be relations. I'm
> thinking that we might not want to store the state as a one-page file
> anymore. In fact that was just discussed in the other thread titled
> "init_sequence spill to hash table".

Yes, I read and even commented in that thread... But nothing in the
current proposed API would prevent you from going in that direction,
you'd just get passed in a different tuple/buffer.

> >b) It's not actually easy to get similar semantics in "userspace". How
> >would you emulate the crash-safe but non-transactional semantics of
> >sequences without copying most of sequence.c? Without writing
> >XLogInsert()s which we cannot do from a out-of-tree code?
>
> heap_inplace_update()

That gets the crashsafe part partially (doesn't allow making the tuple
wider than before), but not the caching/stateful part et al. The point
is that you need most of sequence.c again.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-18 11:48:22
Message-ID:	CA+U5nMJ=fBW+ussJhTHQt5DxXRdSqP8OhuGOWKo=pSLrwghrfw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> It doesn't go far enough, it's still too *low*-level. The sequence AM
> implementation shouldn't need to have direct access to the buffer page at
> all.

> I don't think the sequence AM should be in control of 'cached'. The caching
> is done outside the AM. And log_cnt probably should be passed to the _alloc
> function directly as an argument, ie. the server code asks the AM to
> allocate N new values in one call.

I can't see what the rationale of your arguments is. All index Ams
write WAL and control buffer locking etc..

Do you have a new use case that shows why changes should happen? We
can't just redesign things based upon arbitrary decisions about what
things should or should not be possible via the API.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-18 12:06:25
Message-ID:	528A02C1.1060308@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 18.11.2013 13:48, Simon Riggs wrote:
> On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>
>> It doesn't go far enough, it's still too *low*-level. The sequence AM
>> implementation shouldn't need to have direct access to the buffer page at
>> all.
>
>> I don't think the sequence AM should be in control of 'cached'. The caching
>> is done outside the AM. And log_cnt probably should be passed to the _alloc
>> function directly as an argument, ie. the server code asks the AM to
>> allocate N new values in one call.
>
> I can't see what the rationale of your arguments is. All index Ams
> write WAL and control buffer locking etc..

Index AM's are completely responsible for the on-disk structure, while
with the proposed API, both the AM and the backend are intimately aware
of the on-disk representation. Such a shared responsibility is not a
good thing in an API. I would also be fine with going 100% to the index
AM direction, and remove all knowledge of the on-disk layout from the
backend code and move it into the AMs. Then you could actually implement
the discussed "store all sequences in a single file" change by writing a
new sequence AM for it.

- Heikki

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-18 12:36:35
Message-ID:	528A09D3.1090201@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 14.11.2013 22:10, Simon Riggs wrote:
> Includes test extension which allows sequences without gaps - "gapless".

I realize this is just for demonstration purposes, but it's worth noting
that it doesn't actually guarantee that when you use the sequence to
populate a column in the table, the column would not have gaps.
Sequences are not transactional, so rollbacks will still produce gaps.
The documentation is misleading on that point. Without a strong
guarantee, it's a pretty useless extension.

- Heikki

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-24 17:15:38
Message-ID:	CA+U5nM+cXmO6cEpkZMcsQhgY99SfGCDGBWp7oOQpcjpcrL_BdQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 18 November 2013 07:36, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> On 14.11.2013 22:10, Simon Riggs wrote:
>>
>> Includes test extension which allows sequences without gaps - "gapless".
>
>
> I realize this is just for demonstration purposes, but it's worth noting
> that it doesn't actually guarantee that when you use the sequence to
> populate a column in the table, the column would not have gaps. Sequences
> are not transactional, so rollbacks will still produce gaps. The
> documentation is misleading on that point. Without a strong guarantee, it's
> a pretty useless extension.

True.

If I fix that problem, I should change the name to "lockup" sequences,
since only one transaction at a time could use the nextval.

Should I change the documentation, or just bin the idea?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-24 17:23:59
Message-ID:	CA+U5nMJ=fjfWCBTMT9dai-nsL_W_uNTJoWaBp2npo69DVxnRkQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 18 November 2013 07:06, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> On 18.11.2013 13:48, Simon Riggs wrote:
>>
>> On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
>> wrote:
>>
>>> It doesn't go far enough, it's still too *low*-level. The sequence AM
>>> implementation shouldn't need to have direct access to the buffer page at
>>> all.
>>
>>
>>> I don't think the sequence AM should be in control of 'cached'. The
>>> caching
>>> is done outside the AM. And log_cnt probably should be passed to the
>>> _alloc
>>> function directly as an argument, ie. the server code asks the AM to
>>> allocate N new values in one call.
>>
>>
>> I can't see what the rationale of your arguments is. All index Ams
>> write WAL and control buffer locking etc..
>
>
> Index AM's are completely responsible for the on-disk structure, while with
> the proposed API, both the AM and the backend are intimately aware of the
> on-disk representation. Such a shared responsibility is not a good thing in
> an API. I would also be fine with going 100% to the index AM direction, and
> remove all knowledge of the on-disk layout from the backend code and move it
> into the AMs. Then you could actually implement the discussed "store all
> sequences in a single file" change by writing a new sequence AM for it.

I think the way to resolve this is to do both of these things, i.e. a
two level API

1. Implement SeqAM API at the most generic level. Add a nextval() call
as well as alloc()

2. Also implement the proposed changes to alloc()

So the SeqAM would implement either nextval() or alloc() but not both

global sequences as envisaged for BDR would use a special alloc() call.

I don't think that is too much work, but I want to do this just once...

Thoughts on exact next steps for implementation please?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-25 09:01:29
Message-ID:	529311E9.8050905@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 24.11.2013 19:23, Simon Riggs wrote:
> On 18 November 2013 07:06, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>> On 18.11.2013 13:48, Simon Riggs wrote:
>>>
>>> On 18 November 2013 07:50, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
>>> wrote:
>>>
>>>> It doesn't go far enough, it's still too *low*-level. The sequence AM
>>>> implementation shouldn't need to have direct access to the buffer page at
>>>> all.
>>>
>>>> I don't think the sequence AM should be in control of 'cached'. The
>>>> caching
>>>> is done outside the AM. And log_cnt probably should be passed to the
>>>> _alloc
>>>> function directly as an argument, ie. the server code asks the AM to
>>>> allocate N new values in one call.
>>>
>>> I can't see what the rationale of your arguments is. All index Ams
>>> write WAL and control buffer locking etc..
>>
>> Index AM's are completely responsible for the on-disk structure, while with
>> the proposed API, both the AM and the backend are intimately aware of the
>> on-disk representation. Such a shared responsibility is not a good thing in
>> an API. I would also be fine with going 100% to the index AM direction, and
>> remove all knowledge of the on-disk layout from the backend code and move it
>> into the AMs. Then you could actually implement the discussed "store all
>> sequences in a single file" change by writing a new sequence AM for it.
>
> I think the way to resolve this is to do both of these things, i.e. a
> two level API
>
> 1. Implement SeqAM API at the most generic level. Add a nextval() call
> as well as alloc()
>
> 2. Also implement the proposed changes to alloc()

The proposed changes to alloc() would still suffer from all the problems
that I complained about. Adding a new API alongside doesn't help with that.

- Heikki

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-25 10:00:09
Message-ID:	CA+U5nMLG+X8TcsmZPO9w3-1s14C=1xJhtSYYdJa1yWyOU9Yy3w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 25 November 2013 04:01, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> The proposed changes to alloc() would still suffer from all the problems
> that I complained about. Adding a new API alongside doesn't help with that.

You made two proposals. I suggested implementing both.

What would you have me do?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-26 11:28:33
Message-ID:	529485E1.6090402@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/24/13 19:15, Simon Riggs wrote:
> On 18 November 2013 07:36, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>> On 14.11.2013 22:10, Simon Riggs wrote:
>>>
>>> Includes test extension which allows sequences without gaps - "gapless".
>>
>> I realize this is just for demonstration purposes, but it's worth noting
>> that it doesn't actually guarantee that when you use the sequence to
>> populate a column in the table, the column would not have gaps. Sequences
>> are not transactional, so rollbacks will still produce gaps. The
>> documentation is misleading on that point. Without a strong guarantee, it's
>> a pretty useless extension.
>
> True.
>
> If I fix that problem, I should change the name to "lockup" sequences,
> since only one transaction at a time could use the nextval.
>
> Should I change the documentation, or just bin the idea?

Just bin it. It would be useful if it could guarantee gaplessness, but I
don't see how to do that.

- Heikki

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2013-11-26 11:32:58
Message-ID:	529486EA.9060603@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/25/13 12:00, Simon Riggs wrote:
> On 25 November 2013 04:01, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>
>> The proposed changes to alloc() would still suffer from all the problems
>> that I complained about. Adding a new API alongside doesn't help with that.
>
> You made two proposals. I suggested implementing both.
>
> What would you have me do?

Dunno. I do know that the proposed changes to alloc() are not a good
API. You could implement the other API, I think that has a chance of
being a cleaner API, but looking at the BDR extension that would use
that facility, I'm not sure how useful that would be for you. (and I'd
really like to see an actual implementation of whatever API we come up
with, before we commit to maintaining it).

- Heikki

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-09-14 23:38:52
Message-ID:	5416270C.5000005@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 18/11/13 11:50, Heikki Linnakangas wrote:
>
> I don't think the sequence AM should be in control of 'cached'. The
> caching is done outside the AM. And log_cnt probably should be passed to
> the _alloc function directly as an argument, ie. the server code asks
> the AM to allocate N new values in one call.
>
> I'm thinking that the alloc function should look something like this:
>
> seqam_alloc(Relation seqrel, int nrequested, Datum am_private)
>

I was looking at this a bit today and what I see is that it's not that
simple.

Minimum input the seqam_alloc needs is:
- Relation seqrel
- int64 minv, maxv, incby, bool is_cycled - these are basically options
giving info about how the new numbers are allocated (I guess some
implementations are not going to support all of those)
- bool is_called - the current built-in sequence generator behaves
differently based on it and I am not sure we can get over it (it could
perhaps be done in back-end independently of AM?)
- int64 nrequested - number of requested values
- Datum am_private - current private data

In this light I agree with what Andres wrote - let's just send the whole
Form_pg_sequence object.

Also makes me think that the seqam options interface should also be
passed the minv/maxv/incby/is_cycled etc options for validation, not
just the amoptions.

> And it should return:
>
> int64 value - the first value allocated.
> int nvalues - the number of values allocated.
> am_private - updated private data.
>

There is also more needed than this, you need:
- int64 value - first value allocated (value to be returned)
- int64 nvalues - number of values allocated
- int64 last - last cached value (used for cached/last_value)
- int64 next - last logged value (used for wal logging)
- am_private - updated private data, must be possible to return as null

I personally don't like that we need all the "nvalues", "next" and
"last" as it makes the seqam a little bit too aware of the sequence
logging internals in my opinion but I haven't found a way around it -
it's impossible for backend to know how the AM will act around
incby/maxv/minv/cycling so it can't really calculate these values by
itself, unless ofcourse we fix the behavior and require seqams to behave
predictably, but that somewhat breaks the whole idea of leaving the
allocation to the seqam. Obviously it would also work to return list of
allocated values and then backend could calculate the "value",
"nvalues", "last", "next" from that list by itself, but I am worried
about performance of that approach.

>
> The backend code handles the caching and logging of values. When it has
> exhausted all the cached values (or doesn't have any yet), it calls the
> AM's alloc function to get a new batch. The AM returns the new batch,
> and updates its private state as necessary. Then the backend code
> updates the relation file with the new values and the AM's private data.
> WAL-logging and checkpointing is the backend's responsibility.
>

Agreed here.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-09-16 12:17:01
Message-ID:	20140916121701.GC25887@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-09-15 01:38:52 +0200, Petr Jelinek wrote:
> - int64 minv, maxv, incby, bool is_cycled - these are basically options
> giving info about how the new numbers are allocated (I guess some
> implementations are not going to support all of those)
> - bool is_called - the current built-in sequence generator behaves
> differently based on it and I am not sure we can get over it (it could
> perhaps be done in back-end independently of AM?)

I think we might be able to get rid of is_called entirely. Or at least
get rid of it from the view of the AMs.

> Also makes me think that the seqam options interface should also be passed
> the minv/maxv/incby/is_cycled etc options for validation, not just the
> amoptions.

Sup.

BTW: Is 'is_cycled' a horrible name, or is that just me? Horribly easy
to confuse with the fact that a sequence has already wrapped around...

> >And it should return:
> >
> >int64 value - the first value allocated.
> >int nvalues - the number of values allocated.
> >am_private - updated private data.
> >
>
> There is also more needed than this, you need:
> - int64 value - first value allocated (value to be returned)
> - int64 nvalues - number of values allocated
> - int64 last - last cached value (used for cached/last_value)
> - int64 next - last logged value (used for wal logging)
> - am_private - updated private data, must be possible to return as null

> I personally don't like that we need all the "nvalues", "next" and "last" as
> it makes the seqam a little bit too aware of the sequence logging internals
> in my opinion but I haven't found a way around it - it's impossible for
> backend to know how the AM will act around incby/maxv/minv/cycling so it
> can't really calculate these values by itself, unless ofcourse we fix the
> behavior and require seqams to behave predictably, but that somewhat breaks
> the whole idea of leaving the allocation to the seqam. Obviously it would
> also work to return list of allocated values and then backend could
> calculate the "value", "nvalues", "last", "next" from that list by itself,
> but I am worried about performance of that approach.

Yea, it's far from pretty.

I'm not convinced that the AM ever needs to/should care about
caching. To me that's more like a generic behaviour. So it probably
should be abstracted away from the individual AMs.

I think the allocation routine might also need to be able to indicate
whether WAL logging is needed or not.

One thing I want attention to be paid to is that the interface should be
able to support 'gapless' sequences. I.e. where nextval() (and thus
alloc) needs to wait until the last caller to it finished. That very
well can be relevant from the locking *and* WAL logging perspective.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-09-16 21:00:46
Message-ID:	5418A4FE.5010803@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 16/09/14 14:17, Andres Freund wrote:
> On 2014-09-15 01:38:52 +0200, Petr Jelinek wrote:
>>
>> There is also more needed than this, you need:
>> - int64 value - first value allocated (value to be returned)
>> - int64 nvalues - number of values allocated
>> - int64 last - last cached value (used for cached/last_value)
>> - int64 next - last logged value (used for wal logging)
>> - am_private - updated private data, must be possible to return as null
>
>> I personally don't like that we need all the "nvalues", "next" and "last" as
>> it makes the seqam a little bit too aware of the sequence logging internals
>> in my opinion but I haven't found a way around it - it's impossible for
>> backend to know how the AM will act around incby/maxv/minv/cycling so it
>> can't really calculate these values by itself, unless ofcourse we fix the
>> behavior and require seqams to behave predictably, but that somewhat breaks
>> the whole idea of leaving the allocation to the seqam. Obviously it would
>> also work to return list of allocated values and then backend could
>> calculate the "value", "nvalues", "last", "next" from that list by itself,
>> but I am worried about performance of that approach.
>
> Yea, it's far from pretty.
>
> I'm not convinced that the AM ever needs to/should care about
> caching. To me that's more like a generic behaviour. So it probably
> should be abstracted away from the individual AMs.
>
> I think the allocation routine might also need to be able to indicate
> whether WAL logging is needed or not.

Well that means we probably want to return first allocated value, last
allocated value and then some boolean that tells backend if to wal log
the sequence or not (number of values allocated does not really seem to
be important unless I am missing something).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-10-13 10:01:10
Message-ID:	543BA2E6.3070308@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

I rewrote the patch with different API along the lines of what was
discussed.

The API now consists of following functions:
sequence_alloc - allocating range of new values
The function receives the sequence relation, current value, number of
requested values amdata and relevant sequence options like min/max and
returns new amdata, new current value, number of values allocated and
also if it needs wal write (that should be returned if amdata has
changed plus other reasons the AM might have to force the wal update).

sequence_setval - notification that setval is happening
This function gets sequence relation, previous value and new value plus
the amdata and returns amdata (I can imagine some complex sequence AMs
will want to throw error that setval can't be done on them).

sequence_request_update/sequence_update - used for background processing
Basically AM can call the sequence_request_update and backend will then
call the sequence_update method of an AM with current amdata and will
write the updated amdata to disk

sequence_seqparams - function to process/validate the standard sequence
options like start position, min/max, increment by etc by the AM, it's
called in addition to the standard processing

sequence_reloptions - this is the only thing that remained unchanged
from previous patch, it's meant to pass custom options to the AM

Only the alloc and reloptions methods are required (and implemented by
the local AM).

The caching, xlog writing, updating the page, etc is handled by backend,
the AM does not see the tuple at all. I decided to not pass even the
struct around and just pass the relevant options because I think if we
want to abstract the storage properly then the AM should not care about
how the pg_sequence looks like at all, even if it means that the
sequence_alloc parameter list is bit long.

For the amdata handling (which is the AM's private data variable) the
API assumes that (Datum) 0 is NULL, this seems to work well for
reloptions so should work here also and it simplifies things a little
compared to passing pointers to pointers around and making sure
everything is allocated, etc.

Sadly the fact that amdata is not fixed size and can be NULL made the
page updates of the sequence relation quite more complex that it used to
be. There are probably some optimizations possible there but I think the
patch is good enough for the review now, so I am adding it to October
commitfest.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
seqamv3.patch	text/x-diff	80.1 KB

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-04 12:11:04
Message-ID:	5458C258.802@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/13/2014 01:01 PM, Petr Jelinek wrote:
> Hi,
>
> I rewrote the patch with different API along the lines of what was
> discussed.

Thanks, that's better.

It would be good to see an alternative seqam to implement this API, to
see how it really works. The "local" one is too dummy to expose any
possible issues.

> The API now consists of following functions:
> sequence_alloc - allocating range of new values
> The function receives the sequence relation, current value, number of
> requested values amdata and relevant sequence options like min/max and
> returns new amdata, new current value, number of values allocated and
> also if it needs wal write (that should be returned if amdata has
> changed plus other reasons the AM might have to force the wal update).
>
> sequence_setval - notification that setval is happening
> This function gets sequence relation, previous value and new value plus
> the amdata and returns amdata (I can imagine some complex sequence AMs
> will want to throw error that setval can't be done on them).
>
> sequence_request_update/sequence_update - used for background processing
> Basically AM can call the sequence_request_update and backend will then
> call the sequence_update method of an AM with current amdata and will
> write the updated amdata to disk
>
> sequence_seqparams - function to process/validate the standard sequence
> options like start position, min/max, increment by etc by the AM, it's
> called in addition to the standard processing
>
> sequence_reloptions - this is the only thing that remained unchanged
> from previous patch, it's meant to pass custom options to the AM
>
> Only the alloc and reloptions methods are required (and implemented by
> the local AM).
>
> The caching, xlog writing, updating the page, etc is handled by backend,
> the AM does not see the tuple at all. I decided to not pass even the
> struct around and just pass the relevant options because I think if we
> want to abstract the storage properly then the AM should not care about
> how the pg_sequence looks like at all, even if it means that the
> sequence_alloc parameter list is bit long.

Hmm. The division of labour between the seqam and commands/sequence.c
still feels a bit funny. sequence.c keeps track of how many values have
been WAL-logged, and thus usable immediately, but we still call
sequence_alloc even when using up those already WAL-logged values. If
you think of using this for something like a centralized sequence server
in a replication cluster, you certainly don't want to make a call to the
remote server for every value - you'll want to cache them.

With the "local" seqam, there are two levels of caching. Each backend
caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
addition to that, the server WAL-logs 32 values at a time. If you have a
remote seqam, it would most likely add a third cache, but it would
interact in strange ways with the second cache.

Considering a non-local seqam, the locking is also a bit strange. The
server keeps the sequence page locked throughout nextval(). But if the
actual state of the sequence is maintained elsewhere, there's no need to
serialize the calls to the remote allocator, i.e. the sequence_alloc()
calls.

I'm not exactly sure what to do about that. One option is to completely
move the maintenance of the "current" value, i.e. sequence.last_value,
to the seqam. That makes sense from an abstraction point of view. For
example with a remote server managing the sequence, storing the
"current" value in the local catalog table makes no sense as it's always
going to be out-of-date. The local seqam would store it as part of the
am-private data. However, you would need to move the responsibility of
locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
provide an API that the seqam can call to do that. Perhaps just let the
seqam call heap_inplace_update on the sequence relation.

> For the amdata handling (which is the AM's private data variable) the
> API assumes that (Datum) 0 is NULL, this seems to work well for
> reloptions so should work here also and it simplifies things a little
> compared to passing pointers to pointers around and making sure
> everything is allocated, etc.
>
> Sadly the fact that amdata is not fixed size and can be NULL made the
> page updates of the sequence relation quite more complex that it used to
> be.

It would be nice if the seqam could define exactly the columns it needs,
with any datatypes. There would be a set of common attributes:
sequence_name, start_value, cache_value, increment_by, max_value,
min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
and "is_called" to that. A remote seqam that calls out to some other
server might store the remote server's hostname etc.

There could be a seqam function that returns a TupleDesc with the
required columns, for example.

- Heikki

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-04 21:01:52
Message-ID:	54593EC0.4030009@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 04/11/14 13:11, Heikki Linnakangas wrote:
> On 10/13/2014 01:01 PM, Petr Jelinek wrote:
>> Hi,
>>
>> I rewrote the patch with different API along the lines of what was
>> discussed.
>
> Thanks, that's better.
>
> It would be good to see an alternative seqam to implement this API, to
> see how it really works. The "local" one is too dummy to expose any
> possible issues.
>

Yeah, I don't know what that would be, It's hard for me to find
something that would sever purely demonstration purposes. I guess I
could port BDR sequences to this if it would help (once we have bit more
solid agreement that the proposed API at least theoretically seems ok so
that I don't have to rewrite it 10 times if at all possible).

>> ...
>> Only the alloc and reloptions methods are required (and implemented by
>> the local AM).
>>
>> The caching, xlog writing, updating the page, etc is handled by backend,
>> the AM does not see the tuple at all. I decided to not pass even the
>> struct around and just pass the relevant options because I think if we
>> want to abstract the storage properly then the AM should not care about
>> how the pg_sequence looks like at all, even if it means that the
>> sequence_alloc parameter list is bit long.
>
> Hmm. The division of labour between the seqam and commands/sequence.c
> still feels a bit funny. sequence.c keeps track of how many values have
> been WAL-logged, and thus usable immediately, but we still call
> sequence_alloc even when using up those already WAL-logged values. If
> you think of using this for something like a centralized sequence server
> in a replication cluster, you certainly don't want to make a call to the
> remote server for every value - you'll want to cache them.
>
> With the "local" seqam, there are two levels of caching. Each backend
> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
> addition to that, the server WAL-logs 32 values at a time. If you have a
> remote seqam, it would most likely add a third cache, but it would
> interact in strange ways with the second cache.
>
> Considering a non-local seqam, the locking is also a bit strange. The
> server keeps the sequence page locked throughout nextval(). But if the
> actual state of the sequence is maintained elsewhere, there's no need to
> serialize the calls to the remote allocator, i.e. the sequence_alloc()
> calls.
>
> I'm not exactly sure what to do about that. One option is to completely
> move the maintenance of the "current" value, i.e. sequence.last_value,
> to the seqam. That makes sense from an abstraction point of view. For
> example with a remote server managing the sequence, storing the
> "current" value in the local catalog table makes no sense as it's always
> going to be out-of-date. The local seqam would store it as part of the
> am-private data. However, you would need to move the responsibility of
> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
> provide an API that the seqam can call to do that. Perhaps just let the
> seqam call heap_inplace_update on the sequence relation.
>

My idea of how this works is - sequence_next handles the allocation and
the level2 caching (WAL logged cache) via amdata if it supports it, or
returns single value if it doesn't - then the WAL will always just write
the 1 value and there will basically be no level2 cache, since it is the
sequence_next who controls how much will be WAL-logged, what backend
asks for is just "suggestion".

The third level caching as you say should be solved by the
sequence_request_update and sequence_update callback - that will enable
the sequence AM to handle this kind of caching asynchronously without
blocking the sequence_next unnecessarily.

That way it's possible to implement many different strategies, from no
cache at all, to three levels of caching, with and without blocking of
calls to sequence_next.

>> For the amdata handling (which is the AM's private data variable) the
>> API assumes that (Datum) 0 is NULL, this seems to work well for
>> reloptions so should work here also and it simplifies things a little
>> compared to passing pointers to pointers around and making sure
>> everything is allocated, etc.
>>
>> Sadly the fact that amdata is not fixed size and can be NULL made the
>> page updates of the sequence relation quite more complex that it used to
>> be.
>
> It would be nice if the seqam could define exactly the columns it needs,
> with any datatypes. There would be a set of common attributes:
> sequence_name, start_value, cache_value, increment_by, max_value,
> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
> and "is_called" to that. A remote seqam that calls out to some other
> server might store the remote server's hostname etc.
>
> There could be a seqam function that returns a TupleDesc with the
> required columns, for example.
>

Wouldn't that somewhat bloat catalog if we had new catalog table for
each sequence AM? It also does not really solve the amdata being dynamic
size "issue". I think just dynamic field where AM stores whatever it
wants is enough (ie the current solution), it's just that the hackery
that sequences storage implementation does is more visible once there
are non-fixed fields.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-05 12:45:31
Message-ID:	545A1BEB.9030406@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/04/2014 11:01 PM, Petr Jelinek wrote:
> On 04/11/14 13:11, Heikki Linnakangas wrote:
>> On 10/13/2014 01:01 PM, Petr Jelinek wrote:
>>> Only the alloc and reloptions methods are required (and implemented by
>>> the local AM).
>>>
>>> The caching, xlog writing, updating the page, etc is handled by backend,
>>> the AM does not see the tuple at all. I decided to not pass even the
>>> struct around and just pass the relevant options because I think if we
>>> want to abstract the storage properly then the AM should not care about
>>> how the pg_sequence looks like at all, even if it means that the
>>> sequence_alloc parameter list is bit long.
>>
>> Hmm. The division of labour between the seqam and commands/sequence.c
>> still feels a bit funny. sequence.c keeps track of how many values have
>> been WAL-logged, and thus usable immediately, but we still call
>> sequence_alloc even when using up those already WAL-logged values. If
>> you think of using this for something like a centralized sequence server
>> in a replication cluster, you certainly don't want to make a call to the
>> remote server for every value - you'll want to cache them.
>>
>> With the "local" seqam, there are two levels of caching. Each backend
>> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
>> addition to that, the server WAL-logs 32 values at a time. If you have a
>> remote seqam, it would most likely add a third cache, but it would
>> interact in strange ways with the second cache.
>>
>> Considering a non-local seqam, the locking is also a bit strange. The
>> server keeps the sequence page locked throughout nextval(). But if the
>> actual state of the sequence is maintained elsewhere, there's no need to
>> serialize the calls to the remote allocator, i.e. the sequence_alloc()
>> calls.
>>
>> I'm not exactly sure what to do about that. One option is to completely
>> move the maintenance of the "current" value, i.e. sequence.last_value,
>> to the seqam. That makes sense from an abstraction point of view. For
>> example with a remote server managing the sequence, storing the
>> "current" value in the local catalog table makes no sense as it's always
>> going to be out-of-date. The local seqam would store it as part of the
>> am-private data. However, you would need to move the responsibility of
>> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
>> provide an API that the seqam can call to do that. Perhaps just let the
>> seqam call heap_inplace_update on the sequence relation.
>
> My idea of how this works is - sequence_next handles the allocation and
> the level2 caching (WAL logged cache) via amdata if it supports it, or
> returns single value if it doesn't - then the WAL will always just write
> the 1 value and there will basically be no level2 cache, since it is the
> sequence_next who controls how much will be WAL-logged, what backend
> asks for is just "suggestion".

Hmm, so the AM might return an "nallocated" value less than the "fetch"
value that sequence.c requested? As the patch stands, wouldn't that make
sequence.c write a WAL record more often?

In fact, if the seqam manages the current value outside the database
(e.g. a "remote" seqam that gets the value from another server),
nextval() never needs to write a WAL record.

> The third level caching as you say should be solved by the
> sequence_request_update and sequence_update callback - that will enable
> the sequence AM to handle this kind of caching asynchronously without
> blocking the sequence_next unnecessarily.
>
> That way it's possible to implement many different strategies, from no
> cache at all, to three levels of caching, with and without blocking of
> calls to sequence_next.

It's nice that asynchronous operation is possible, but that seems like a
more advanced feature. Surely an approach where you fetch a bunch of
values when needed is going to be more common. Let's focus on the simple
synchronous case first, and make sure the API works well for that.

>>> For the amdata handling (which is the AM's private data variable) the
>>> API assumes that (Datum) 0 is NULL, this seems to work well for
>>> reloptions so should work here also and it simplifies things a little
>>> compared to passing pointers to pointers around and making sure
>>> everything is allocated, etc.
>>>
>>> Sadly the fact that amdata is not fixed size and can be NULL made the
>>> page updates of the sequence relation quite more complex that it used to
>>> be.
>>
>> It would be nice if the seqam could define exactly the columns it needs,
>> with any datatypes. There would be a set of common attributes:
>> sequence_name, start_value, cache_value, increment_by, max_value,
>> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
>> and "is_called" to that. A remote seqam that calls out to some other
>> server might store the remote server's hostname etc.
>>
>> There could be a seqam function that returns a TupleDesc with the
>> required columns, for example.
>
> Wouldn't that somewhat bloat catalog if we had new catalog table for
> each sequence AM?

No, that's not what I meant. The number of catalog tables would be the
same as today. Sequences look much like any other relation, with entries
in pg_attribute catalog table for all the attributes for each sequence.
Currently, all sequences have the same set of attributes, sequence_name,
last_value and so forth. What I'm proposing is that there would a set of
attributes that are common to all sequences, but in addition to that
there could be any number of AM-specific attributes.

> It also does not really solve the amdata being dynamic
> size "issue".

Yes it would. There would not be a single amdata attribute, but the AM
could specify any number of custom attributes, which could be fixed size
or varlen. It would be solely the AM's responsibility to set the values
of those attributes.

> I think just dynamic field where AM stores whatever it
> wants is enough (ie the current solution), it's just that the hackery
> that sequences storage implementation does is more visible once there
> are non-fixed fields.

Using a single amdata field is like creating a table with a single text
column, and putting all your data in the single column separated by
tabs. I'm sure you can make it work, but normalized data is much nicer
to work with.

- Heikki

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-05 15:07:24
Message-ID:	545A3D2C.1080404@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05/11/14 13:45, Heikki Linnakangas wrote:
> On 11/04/2014 11:01 PM, Petr Jelinek wrote:
>> On 04/11/14 13:11, Heikki Linnakangas wrote:
>>> On 10/13/2014 01:01 PM, Petr Jelinek wrote:
>>>> Only the alloc and reloptions methods are required (and implemented by
>>>> the local AM).
>>>>
>>>> The caching, xlog writing, updating the page, etc is handled by
>>>> backend,
>>>> the AM does not see the tuple at all. I decided to not pass even the
>>>> struct around and just pass the relevant options because I think if we
>>>> want to abstract the storage properly then the AM should not care about
>>>> how the pg_sequence looks like at all, even if it means that the
>>>> sequence_alloc parameter list is bit long.
>>>
>>> Hmm. The division of labour between the seqam and commands/sequence.c
>>> still feels a bit funny. sequence.c keeps track of how many values have
>>> been WAL-logged, and thus usable immediately, but we still call
>>> sequence_alloc even when using up those already WAL-logged values. If
>>> you think of using this for something like a centralized sequence server
>>> in a replication cluster, you certainly don't want to make a call to the
>>> remote server for every value - you'll want to cache them.
>>>
>>> With the "local" seqam, there are two levels of caching. Each backend
>>> caches some values (per the CACHE <value> option in CREATE SEQUENCE). In
>>> addition to that, the server WAL-logs 32 values at a time. If you have a
>>> remote seqam, it would most likely add a third cache, but it would
>>> interact in strange ways with the second cache.
>>>
>>> Considering a non-local seqam, the locking is also a bit strange. The
>>> server keeps the sequence page locked throughout nextval(). But if the
>>> actual state of the sequence is maintained elsewhere, there's no need to
>>> serialize the calls to the remote allocator, i.e. the sequence_alloc()
>>> calls.
>>>
>>> I'm not exactly sure what to do about that. One option is to completely
>>> move the maintenance of the "current" value, i.e. sequence.last_value,
>>> to the seqam. That makes sense from an abstraction point of view. For
>>> example with a remote server managing the sequence, storing the
>>> "current" value in the local catalog table makes no sense as it's always
>>> going to be out-of-date. The local seqam would store it as part of the
>>> am-private data. However, you would need to move the responsibility of
>>> locking and WAL-logging to the seqam. Maybe that's OK, but we'll need to
>>> provide an API that the seqam can call to do that. Perhaps just let the
>>> seqam call heap_inplace_update on the sequence relation.
>>
>> My idea of how this works is - sequence_next handles the allocation and
>> the level2 caching (WAL logged cache) via amdata if it supports it, or
>> returns single value if it doesn't - then the WAL will always just write
>> the 1 value and there will basically be no level2 cache, since it is the
>> sequence_next who controls how much will be WAL-logged, what backend
>> asks for is just "suggestion".
>
> Hmm, so the AM might return an "nallocated" value less than the "fetch"
> value that sequence.c requested? As the patch stands, wouldn't that make
> sequence.c write a WAL record more often?
>

That's correct, that's also why you usually want to have some form of
local caching if possible.

> In fact, if the seqam manages the current value outside the database
> (e.g. a "remote" seqam that gets the value from another server),
> nextval() never needs to write a WAL record.

Sure it does, you need to keep the current state in Postgres also, at
least the current value so that you can pass correct input to
sequence_alloc(). And you need to do this in crash-safe way so WAL is
necessary.

I think sequences will cache in amdata if possible for that type of
sequence and in case where it's not and it's caching on some external
server then you'll most likely get bigger overhead from network than WAL
anyway...

>
>>>> For the amdata handling (which is the AM's private data variable) the
>>>> API assumes that (Datum) 0 is NULL, this seems to work well for
>>>> reloptions so should work here also and it simplifies things a little
>>>> compared to passing pointers to pointers around and making sure
>>>> everything is allocated, etc.
>>>>
>>>> Sadly the fact that amdata is not fixed size and can be NULL made the
>>>> page updates of the sequence relation quite more complex that it
>>>> used to
>>>> be.
>>>
>>> It would be nice if the seqam could define exactly the columns it needs,
>>> with any datatypes. There would be a set of common attributes:
>>> sequence_name, start_value, cache_value, increment_by, max_value,
>>> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
>>> and "is_called" to that. A remote seqam that calls out to some other
>>> server might store the remote server's hostname etc.
>>>
>>> There could be a seqam function that returns a TupleDesc with the
>>> required columns, for example.
>>
>> Wouldn't that somewhat bloat catalog if we had new catalog table for
>> each sequence AM?
>
> No, that's not what I meant. The number of catalog tables would be the
> same as today. Sequences look much like any other relation, with entries
> in pg_attribute catalog table for all the attributes for each sequence.
> Currently, all sequences have the same set of attributes, sequence_name,
> last_value and so forth. What I'm proposing is that there would a set of
> attributes that are common to all sequences, but in addition to that
> there could be any number of AM-specific attributes.

Oh, that's interesting idea, so the AM interfaces would basically return
updated tuple and there would be some description function that returns
tupledesc. I am bit worried that this would kill any possibility of
ALTER SEQUENCE USING access_method. Plus I don't think it actually
solves any real problem - serializing the internal C structs into bytea
is not any harder than serializing them into tuple IMHO.

>
>> It also does not really solve the amdata being dynamic
>> size "issue".
>
> Yes it would. There would not be a single amdata attribute, but the AM
> could specify any number of custom attributes, which could be fixed size
> or varlen. It would be solely the AM's responsibility to set the values
> of those attributes.
>

That's not the issue I was referring to, I was talking about the page
replacement code which is not as simple now that we have potentially
dynamic size tuple and if tuples were different for different AMs the
code would still have to be able to handle that case. Setting the values
in tuple itself is not too complicated.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-05 17:32:27
Message-ID:	545A5F2B.2010503@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/05/2014 05:07 PM, Petr Jelinek wrote:
> On 05/11/14 13:45, Heikki Linnakangas wrote:
>> In fact, if the seqam manages the current value outside the database
>> (e.g. a "remote" seqam that gets the value from another server),
>> nextval() never needs to write a WAL record.
>
> Sure it does, you need to keep the current state in Postgres also, at
> least the current value so that you can pass correct input to
> sequence_alloc(). And you need to do this in crash-safe way so WAL is
> necessary.

Why does sequence_alloc need the current value? If it's a "remote"
seqam, the current value is kept in the remote server, and the last
value that was given to this PostgreSQL server is irrelevant.

That irks me with this API. The method for acquiring a new value isn't
fully abstracted behind the AM interface, as sequence.c still needs to
track it itself. That's useful for the local AM, of course, and maybe
some others, but for others it's totally useless.

>>>>> For the amdata handling (which is the AM's private data variable) the
>>>>> API assumes that (Datum) 0 is NULL, this seems to work well for
>>>>> reloptions so should work here also and it simplifies things a little
>>>>> compared to passing pointers to pointers around and making sure
>>>>> everything is allocated, etc.
>>>>>
>>>>> Sadly the fact that amdata is not fixed size and can be NULL made the
>>>>> page updates of the sequence relation quite more complex that it
>>>>> used to
>>>>> be.
>>>>
>>>> It would be nice if the seqam could define exactly the columns it needs,
>>>> with any datatypes. There would be a set of common attributes:
>>>> sequence_name, start_value, cache_value, increment_by, max_value,
>>>> min_value, is_cycled. The local seqam would add "last_value", "log_cnt"
>>>> and "is_called" to that. A remote seqam that calls out to some other
>>>> server might store the remote server's hostname etc.
>>>>
>>>> There could be a seqam function that returns a TupleDesc with the
>>>> required columns, for example.
>>>
>>> Wouldn't that somewhat bloat catalog if we had new catalog table for
>>> each sequence AM?
>>
>> No, that's not what I meant. The number of catalog tables would be the
>> same as today. Sequences look much like any other relation, with entries
>> in pg_attribute catalog table for all the attributes for each sequence.
>> Currently, all sequences have the same set of attributes, sequence_name,
>> last_value and so forth. What I'm proposing is that there would a set of
>> attributes that are common to all sequences, but in addition to that
>> there could be any number of AM-specific attributes.
>
> Oh, that's interesting idea, so the AM interfaces would basically return
> updated tuple and there would be some description function that returns
> tupledesc.

Yeah, something like that.

> I am bit worried that this would kill any possibility of
> ALTER SEQUENCE USING access_method. Plus I don't think it actually
> solves any real problem - serializing the internal C structs into bytea
> is not any harder than serializing them into tuple IMHO.

I agree that serialization to bytea isn't that difficult, but it's still
nicer to work directly with the correct data types. And it makes the
internal state easily accessible for monitoring and debugging purposes.

>>> It also does not really solve the amdata being dynamic
>>> size "issue".
>>
>> Yes it would. There would not be a single amdata attribute, but the AM
>> could specify any number of custom attributes, which could be fixed size
>> or varlen. It would be solely the AM's responsibility to set the values
>> of those attributes.
>>
>
> That's not the issue I was referring to, I was talking about the page
> replacement code which is not as simple now that we have potentially
> dynamic size tuple and if tuples were different for different AMs the
> code would still have to be able to handle that case. Setting the values
> in tuple itself is not too complicated.

I don't see the problem with that. We deal with variable-sized tuples in
heap pages all the time. The max size of amdata (or the extra
AM-specific columns) is going to be determined by the block size, though.

- Heikki

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-05 22:43:00
Message-ID:	545AA7F4.3010403@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05/11/14 18:32, Heikki Linnakangas wrote:
> On 11/05/2014 05:07 PM, Petr Jelinek wrote:
>> On 05/11/14 13:45, Heikki Linnakangas wrote:
>>> In fact, if the seqam manages the current value outside the database
>>> (e.g. a "remote" seqam that gets the value from another server),
>>> nextval() never needs to write a WAL record.
>>
>> Sure it does, you need to keep the current state in Postgres also, at
>> least the current value so that you can pass correct input to
>> sequence_alloc(). And you need to do this in crash-safe way so WAL is
>> necessary.
>
> Why does sequence_alloc need the current value? If it's a "remote"
> seqam, the current value is kept in the remote server, and the last
> value that was given to this PostgreSQL server is irrelevant.

Hmm, I am not sure if I see this usecase as practical TBH, but I also
don't see fundamental problem with it.

> That irks me with this API. The method for acquiring a new value isn't
> fully abstracted behind the AM interface, as sequence.c still needs to
> track it itself. That's useful for the local AM, of course, and maybe
> some others, but for others it's totally useless.

Hmm, I think that kind of abstraction can only be done by passing
current tuple and returning updated tuple (yes I realize that it's what
you have been saying basically).

In general it sounds like the level of abstraction you'd want would be
one where AM cares about everything except the the code that does the
actual writes to page and WAL (but when to do those would still be
controlled completely by AM?) and the SQL interface.
I don't see how to make that work with ALTER SEQUENCE USING to be honest
and I do care quite a lot about that use-case (I think the ability to
convert the "local" sequences to 3rd party ones and back is very important).

>>
>> That's not the issue I was referring to, I was talking about the page
>> replacement code which is not as simple now that we have potentially
>> dynamic size tuple and if tuples were different for different AMs the
>> code would still have to be able to handle that case. Setting the values
>> in tuple itself is not too complicated.
>
> I don't see the problem with that. We deal with variable-sized tuples in
> heap pages all the time. The max size of amdata (or the extra
> AM-specific columns) is going to be determined by the block size, though.
>

Glad to hear that. Yes the limit is block size, I think we can live with
that at least for the moment...

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-06 10:22:16
Message-ID:	545B4BD8.2050001@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/05/2014 05:01 AM, Petr Jelinek wrote:
> I guess I could port BDR sequences to this if it would help (once we
> have bit more solid agreement that the proposed API at least
> theoretically seems ok so that I don't have to rewrite it 10 times if at
> all possible).

Because the BDR sequences rely on all the other BDR machinery I suspect
that'd be a pretty big thing to review and follow for someone who
doesn't know the BDR code.

Do you think it'd be simple to provide a blocking, transactional
sequence allocator via this API? i.e. gapless sequences, much the same
as typically implemented with SELECT ... FOR UPDATE on a counter table.

It might be more digestible standalone, and would be a handy contrib/
example extension demonstrating use of the API.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-07 15:30:11
Message-ID:	20141107153011.GO1791@alvin.alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Craig Ringer wrote:
> On 11/05/2014 05:01 AM, Petr Jelinek wrote:
> > I guess I could port BDR sequences to this if it would help (once we
> > have bit more solid agreement that the proposed API at least
> > theoretically seems ok so that I don't have to rewrite it 10 times if at
> > all possible).
>
> Because the BDR sequences rely on all the other BDR machinery I suspect
> that'd be a pretty big thing to review and follow for someone who
> doesn't know the BDR code.
>
> Do you think it'd be simple to provide a blocking, transactional
> sequence allocator via this API? i.e. gapless sequences, much the same
> as typically implemented with SELECT ... FOR UPDATE on a counter table.

I think that would be a useful contrib module too; gapless allocation is
a frequent user need. If we can build it in some reasonable way with the
seqam API, it'd show the API is good.

But on the other hand, since BDR sequences are the ultimate goal of this
patch, we need to make sure that they can be made to work with whatever
the seqam API ends up being. It's not necessary for reviewers here to
review that code, but the BDR developers need to say "yes, this API
works for us" -- just like the slony developers (well, ssinger) did for
logical decoding.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-07 16:35:13
Message-ID:	545CF4C1.4080007@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 06/11/14 11:22, Craig Ringer wrote:
> On 11/05/2014 05:01 AM, Petr Jelinek wrote:
>> I guess I could port BDR sequences to this if it would help (once we
>> have bit more solid agreement that the proposed API at least
>> theoretically seems ok so that I don't have to rewrite it 10 times if at
>> all possible).
>
> Because the BDR sequences rely on all the other BDR machinery I suspect
> that'd be a pretty big thing to review and follow for someone who
> doesn't know the BDR code.
>
> Do you think it'd be simple to provide a blocking, transactional
> sequence allocator via this API? i.e. gapless sequences, much the same
> as typically implemented with SELECT ... FOR UPDATE on a counter table.
>
> It might be more digestible standalone, and would be a handy contrib/
> example extension demonstrating use of the API.
>

Yes I think that's doable (once we iron out the API we can agree on).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-07 23:45:39
Message-ID:	9D8BE011-5E2E-4608-A716-8B7FB90B745F@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> I don't see how to make that work with ALTER SEQUENCE USING to be honest and I do care quite a lot about that use-case (I think the ability to convert the "local" sequences to 3rd party ones and back is very important).

What specific problems do you foresee? There's an issue if something depends on one of the added sequence columns, but if that is the case then you had *better* fail.

I think that the debugability value of making extra sequence columns human-readable is quite high.

...Robert

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-07 23:57:41
Message-ID:	545D5C75.90303@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 08/11/14 00:45, Robert Haas wrote:
> On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> I don't see how to make that work with ALTER SEQUENCE USING to be honest and I do care quite a lot about that use-case (I think the ability to convert the "local" sequences to 3rd party ones and back is very important).
>
> What specific problems do you foresee? There's an issue if something depends on one of the added sequence columns, but if that is the case then you had *better* fail.
>
> I think that the debugability value of making extra sequence columns human-readable is quite high.
>

My main problem is actually not with having tuple per-seqAM, but more
with the fact that Heikki does not want to have last_value as compulsory
column/parameter. How is the new AM then supposed to know where to pick
up and if it even can pick up?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-08 00:26:46
Message-ID:	545D6346.8000407@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 08/11/14 00:57, Petr Jelinek wrote:
> On 08/11/14 00:45, Robert Haas wrote:
>> On Nov 5, 2014, at 5:43 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>>> I don't see how to make that work with ALTER SEQUENCE USING to be
>>> honest and I do care quite a lot about that use-case (I think the
>>> ability to convert the "local" sequences to 3rd party ones and back
>>> is very important).
>>
>> What specific problems do you foresee? There's an issue if something
>> depends on one of the added sequence columns, but if that is the case
>> then you had *better* fail.
>>
>> I think that the debugability value of making extra sequence columns
>> human-readable is quite high.
>>
>
> My main problem is actually not with having tuple per-seqAM, but more
> with the fact that Heikki does not want to have last_value as compulsory
> column/parameter. How is the new AM then supposed to know where to pick
> up and if it even can pick up?
>

And obviously, once the last_value is part of the compulsory columns we
again have to WAL log all the time for the use-case which Heikki is
using as model, so it does not help there (just to clear what my point
was about).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-08 02:10:41
Message-ID:	CA+TgmoYA5Ud4qd0zXCjdwrz=ohve25cfk8n_SOVr_6SnmuxKAA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Nov 7, 2014 at 7:26 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> My main problem is actually not with having tuple per-seqAM, but more
>> with the fact that Heikki does not want to have last_value as compulsory
>> column/parameter. How is the new AM then supposed to know where to pick
>> up and if it even can pick up?

That seems pretty well impossible to know anyway. If the pluggable AM
was handing out values at random or in some unpredictable fashion,
there may be no well-defined point where it's safe for the default AM
to resume. Granted, in the case of replication, it probably is
possible, and maybe that's important enough to be worth catering to.

> And obviously, once the last_value is part of the compulsory columns we
> again have to WAL log all the time for the use-case which Heikki is using as
> model, so it does not help there (just to clear what my point was about).

But I don't know what to do about that. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-08 11:15:03
Message-ID:	545DFB37.6060304@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/08/2014 12:35 AM, Petr Jelinek wrote:
>>
>> Do you think it'd be simple to provide a blocking, transactional
>> sequence allocator via this API? i.e. gapless sequences, much the same
>> as typically implemented with SELECT ... FOR UPDATE on a counter table.
>>
>> It might be more digestible standalone, and would be a handy contrib/
>> example extension demonstrating use of the API.
>>
>
> Yes I think that's doable (once we iron out the API we can agree on).

Cool; I think that'd be an interesting self-contained example that'd
also have real-world uses ... and provide a nice succinct way to answer
those semi-regular "how do I create a sequence that doesn't have holes"
questions.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-08 11:17:25
Message-ID:	545DFBC5.3010405@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 08/11/14 03:10, Robert Haas wrote:
> On Fri, Nov 7, 2014 at 7:26 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>>> My main problem is actually not with having tuple per-seqAM, but more
>>> with the fact that Heikki does not want to have last_value as compulsory
>>> column/parameter. How is the new AM then supposed to know where to pick
>>> up and if it even can pick up?
>
> That seems pretty well impossible to know anyway. If the pluggable AM
> was handing out values at random or in some unpredictable fashion,
> there may be no well-defined point where it's safe for the default AM
> to resume. Granted, in the case of replication, it probably is
> possible, and maybe that's important enough to be worth catering to.

While this is true, I think 99% of this use-case in practice is about
converting existing schema with "local" sequence to some other sequence
and perhaps fixing schema after people create sequence using wrong AM
because their default is not what they thought it is.

>
>> And obviously, once the last_value is part of the compulsory columns we
>> again have to WAL log all the time for the use-case which Heikki is using as
>> model, so it does not help there (just to clear what my point was about).
>
> But I don't know what to do about that. :-(
>

Me neither and I don't think this is actually solvable, we either have
last_value and logcnt as mandatory columns and the central sequence
server example Heikki is talking about will be impacted by this, or we
leave those columns to AM implementations and we lose the ability to do
AM change for majority of cases, and also IMHO most AMs will then have
to implement their own last_value and logcnt logic anyway.

Honestly, I am still not convinced that the centralized sequence server
with no local caching is something we should be optimizing for as that
one will suffer from performance problems anyway. And it can ignore the
last_value input from postgres if it choses to, so it's not like the
currently proposed patch forbids implementation of such AMs.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-08 14:21:54
Message-ID:	CA+U5nMLBRDJRDj4=mrToqkoP_U=E7awfLiyRAX-5=zNWVheZSg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5 November 2014 17:32, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> Why does sequence_alloc need the current value? If it's a "remote" seqam,
> the current value is kept in the remote server, and the last value that was
> given to this PostgreSQL server is irrelevant.
>
> That irks me with this API. The method for acquiring a new value isn't fully
> abstracted behind the AM interface, as sequence.c still needs to track it
> itself. That's useful for the local AM, of course, and maybe some others,
> but for others it's totally useless.

Please bear in mind what seqam is for...

At present it's only use is to provide Global Sequences. There's a few
ways of doing that, but they all look sorta similar.

The only other use we thought of was shot down, so producing the
perfect API isn't likely to help anyone. It's really not worth the
effort to produce a better API.

BDR doesn't require Global Sequences, nor are Global Sequences
restricted in their use to just BDR - lots of cluster configurations
would want something like this.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-08 15:46:26
Message-ID:	CA+TgmoYiKUG335uw3B6ZhUgacszz6OorC3M56E7j-icZ6-Vc5Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Nov 8, 2014 at 6:17 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> Honestly, I am still not convinced that the centralized sequence server with
> no local caching is something we should be optimizing for as that one will
> suffer from performance problems anyway. And it can ignore the last_value
> input from postgres if it choses to, so it's not like the currently proposed
> patch forbids implementation of such AMs.

I can buy that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-09 19:47:37
Message-ID:	545FC4D9.2020901@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/08/2014 01:57 AM, Petr Jelinek wrote:
> My main problem is actually not with having tuple per-seqAM, but more
> with the fact that Heikki does not want to have last_value as compulsory
> column/parameter. How is the new AM then supposed to know where to pick
> up and if it even can pick up?

Call nextval(), and use the value it returns as the starting point for
the new AM.

- Heikki

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-10 00:39:02
Message-ID:	54600926.4040304@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 09/11/14 20:47, Heikki Linnakangas wrote:
> On 11/08/2014 01:57 AM, Petr Jelinek wrote:
>> My main problem is actually not with having tuple per-seqAM, but more
>> with the fact that Heikki does not want to have last_value as compulsory
>> column/parameter. How is the new AM then supposed to know where to pick
>> up and if it even can pick up?
>
> Call nextval(), and use the value it returns as the starting point for
> the new AM.
>

Hah, so simple, works for me.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-11-24 11:16:24
Message-ID:	54731388.4070608@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/08/2014 04:21 PM, Simon Riggs wrote:
> On 5 November 2014 17:32, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>
>> Why does sequence_alloc need the current value? If it's a "remote" seqam,
>> the current value is kept in the remote server, and the last value that was
>> given to this PostgreSQL server is irrelevant.
>>
>> That irks me with this API. The method for acquiring a new value isn't fully
>> abstracted behind the AM interface, as sequence.c still needs to track it
>> itself. That's useful for the local AM, of course, and maybe some others,
>> but for others it's totally useless.
>
> Please bear in mind what seqam is for...
>
> At present it's only use is to provide Global Sequences. There's a few
> ways of doing that, but they all look sorta similar.

Ok.

> The only other use we thought of was shot down, so producing the
> perfect API isn't likely to help anyone. It's really not worth the
> effort to produce a better API.

Your argument seems to be that because this API is only used for
creating global sequences, it's not worth it to make it better for that
purpose. That doesn't make much sense, so that's probably not what you
meant. I'm confused.

To be clear: I don't think this API is very good for its stated purpose,
for implementing global sequences for use in a cluster. For the reasons
I've mentioned before. I'd like to see two changes to this proposal:

1. Make the AM implementation solely responsible for remembering the
"last value". (if it's a global or remote sequence, the current value
might not be stored in the local server at all)

2. Instead of the single amdata field, make it possible for the
implementation to define any number of fields with any datatype in the
tuple. That would make debugging, monitoring etc. easier.

> BDR doesn't require Global Sequences, nor are Global Sequences
> restricted in their use to just BDR - lots of cluster configurations
> would want something like this.

Sure. (I don't see how that's relevant to this discussion, however).

(marking this as "Returned with Feedback" in the commitfest)

- Heikki

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-12-02 19:21:28
Message-ID:	20141202192128.GK2456@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:
> To be clear: I don't think this API is very good for its stated purpose, for
> implementing global sequences for use in a cluster. For the reasons I've
> mentioned before. I'd like to see two changes to this proposal:
>
> 1. Make the AM implementation solely responsible for remembering the "last
> value". (if it's a global or remote sequence, the current value might not be
> stored in the local server at all)

I think that reason isn't particularly good. The practical applicability
for such a implementation doesn't seem to be particularly large.

> 2. Instead of the single amdata field, make it possible for the
> implementation to define any number of fields with any datatype in the
> tuple. That would make debugging, monitoring etc. easier.

My main problem with that approach is that it pretty much nails the door
shut for moving sequences into a catalog table instead of the current,
pretty insane, approach of a physical file per sequence. Currently, with
our without seqam, it'd not be all that hard to force it into a catalog,
taking care to to force each tuple onto a separate page...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	José Luis Tallón <jltallon(at)adv-solutions(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2014-12-03 09:59:50
Message-ID:	547EDF16.3040306@adv-solutions.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/02/2014 08:21 PM, Andres Freund wrote:
> [snip]
>> 2. Instead of the single amdata field, make it possible for the
>> implementation to define any number of fields with any datatype in the
>> tuple. That would make debugging, monitoring etc. easier.
> My main problem with that approach is that it pretty much nails the door
> shut for moving sequences into a catalog table instead of the current,
> pretty insane, approach of a physical file per sequence.

Hmm... having done my fair bit of testing, I can say that this isn't
actually that bad (though depends heavily on the underlying filesystem
and workload, of course)
With this approach, I fear extreme I/O contention with an update-heavy
workload... unless all sequence activity is finally WAL-logged and hence
writes to the actual files become mostly sequential and asynchronous.

May I possibly suggest a file-per-schema model instead? This approach
would certainly solve the excessive i-node consumption problem that --I
guess-- Andres is trying to address here.
Moreover, the "one file per schema for sequences" solution would fit a
quite common model of grouping tables (in schemas) for physical
[tablespace] location purposes....
> Currently, with
> our without seqam, it'd not be all that hard to force it into a catalog,
> taking care to to force each tuple onto a separate page...

IMHO, this is jst as wasteful as the current approach (one-page file per
sequence) in terms of disk usage and complicates the code a bit .... but
I really don't see how we can have more than one sequence state per page
without severe (page) locking problems.
However, someone with deeper knowledge of page pinning and buffer
manager internals could certainly devise a better solution...

Just my 2c

Thanks,

/ J.L.

From:	Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-12-03 10:17:26
Message-ID:	547EE336.6020307@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 02/12/14 20:21, Andres Freund wrote:
> On 2014-11-24 13:16:24 +0200, Heikki Linnakangas wrote:
>> To be clear: I don't think this API is very good for its stated purpose, for
>> implementing global sequences for use in a cluster. For the reasons I've
>> mentioned before. I'd like to see two changes to this proposal:
>> ...
>> 2. Instead of the single amdata field, make it possible for the
>> implementation to define any number of fields with any datatype in the
>> tuple. That would make debugging, monitoring etc. easier.
>
> My main problem with that approach is that it pretty much nails the door
> shut for moving sequences into a catalog table instead of the current,
> pretty insane, approach of a physical file per sequence. Currently, with
> our without seqam, it'd not be all that hard to force it into a catalog,
> taking care to to force each tuple onto a separate page...
>

I don't know, I think if we decide to change storage format we can do
serialization/conversion in seqam layer, it does not seem to matter too
much if the serialization into some opaque type is done in AM itself or
by the API layer. Or we can have one relation for all sequences in
single AM, etc. In general I don't think that the custom columns for AM
approach prohibits future storage changes.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	José Luis Tallón <jltallon(at)adv-solutions(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2014-12-03 10:24:25
Message-ID:	20141203102425.GT2456@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
> On 12/02/2014 08:21 PM, Andres Freund wrote:
> >[snip]
> >>2. Instead of the single amdata field, make it possible for the
> >>implementation to define any number of fields with any datatype in the
> >>tuple. That would make debugging, monitoring etc. easier.
> >My main problem with that approach is that it pretty much nails the door
> >shut for moving sequences into a catalog table instead of the current,
> >pretty insane, approach of a physical file per sequence.
>
> Hmm... having done my fair bit of testing, I can say that this isn't
> actually that bad (though depends heavily on the underlying filesystem and
> workload, of course)
> With this approach, I fear extreme I/O contention with an update-heavy
> workload... unless all sequence activity is finally WAL-logged and hence
> writes to the actual files become mostly sequential and asynchronous.

I don't think the WAL logging would need to change much in comparison to
the current solution. We'd just add the page number to the WAL record.

The biggest advantage would be to require fewer heavyweight locks,
because the pg_sequence one could be a single fastpath lock. Currently
we have to take the sequence's relation lock (heavyweight) and then the
the page level lock (lwlock) for every single sequence used.

> May I possibly suggest a file-per-schema model instead? This approach would
> certainly solve the excessive i-node consumption problem that --I guess--
> Andres is trying to address here.

I don't think that really has any advantages.

> >Currently, with
> >our without seqam, it'd not be all that hard to force it into a catalog,
> >taking care to to force each tuple onto a separate page...
>
> IMHO, this is jst as wasteful as the current approach (one-page file per
> sequence) in terms of disk usage and complicates the code a bit .... but I
> really don't see how we can have more than one sequence state per page
> without severe (page) locking problems.

The overhead of a file is much more than wasting the remainder of a
page. Alone the requirement of separate fsyncs and everything is pretty
bothersome. The generated IO patterns are also much worse...

> However, someone with deeper knowledge of page pinning and buffer manager
> internals could certainly devise a better solution...

I think there's pretty much no chance of accepting more than one page
per

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	José Luis Tallón <jltallon(at)adv-solutions(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2014-12-03 14:50:00
Message-ID:	547F2318.6080207@adv-solutions.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/03/2014 11:24 AM, Andres Freund wrote:
> On 2014-12-03 10:59:50 +0100, José Luis Tallón wrote:
>> snip]
> I don't think the WAL logging would need to change much in comparison to
> the current solution. We'd just add the page number to the WAL record.
>
> The biggest advantage would be to require fewer heavyweight locks,
> because the pg_sequence one could be a single fastpath lock. Currently
> we have to take the sequence's relation lock (heavyweight) and then the
> the page level lock (lwlock) for every single sequence used.

Got it, thank you for the explanation.

>> May I possibly suggest a file-per-schema model instead? This approach would
>> certainly solve the excessive i-node consumption problem that --I guess--
>> Andres is trying to address here.
> I don't think that really has any advantages.

Just spreading the I/O load, nothing more, it seems:

Just to elaborate a bit on the reasoning, for completeness' sake:
Given that a relation's segment maximum size is 1GB, we'd have
(1048576/8)=128k sequences per relation segment.
Arguably, not many real use cases will have that many sequences.... save
for *massively* multi-tenant databases.

The downside being that all that random I/O --- in general, it can't
really be sequential unless there are very very few sequences--- can't
be spread to other spindles. Create a "sequence_default_tablespace" GUC
+ ALTER SEQUENCE SET TABLESPACE, to use an SSD for this purpose maybe?
(I could take a shot at the patch, if deemed worthwhile)

>> [snip]
> The overhead of a file is much more than wasting the remainder of a
> page. Alone the requirement of separate fsyncs and everything is pretty
> bothersome. The generated IO patterns are also much worse...

Yes, you are right. I stand corrected.

> [snip]
> I think there's pretty much no chance of accepting more than one page
> per sequence

Definitively.

Thanks,

J.L.

From:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To:	José Luis Tallón <jltallon(at)adv-solutions(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2014-12-04 16:33:27
Message-ID:	54808CD7.9040100@BlueTreble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/3/14, 8:50 AM, José Luis Tallón wrote:
>
>>> May I possibly suggest a file-per-schema model instead? This approach would
>>> certainly solve the excessive i-node consumption problem that --I guess--
>>> Andres is trying to address here.
>> I don't think that really has any advantages.
>
> Just spreading the I/O load, nothing more, it seems:
>
> Just to elaborate a bit on the reasoning, for completeness' sake:
> Given that a relation's segment maximum size is 1GB, we'd have (1048576/8)=128k sequences per relation segment.
> Arguably, not many real use cases will have that many sequences.... save for *massively* multi-tenant databases.
>
> The downside being that all that random I/O --- in general, it can't really be sequential unless there are very very few sequences--- can't be spread to other spindles. Create a "sequence_default_tablespace" GUC + ALTER SEQUENCE SET TABLESPACE, to use an SSD for this purpose maybe?

Why not? RAID arrays typically use stripe sizes in the 128-256k range, which means only 16 or 32 sequences per stripe.

It still might make sense to allow controlling what tablespace a sequence is in, but IMHO the default should just be pg_default.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	José Luis Tallón <jltallon(at)adv-solutions(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2014-12-04 18:01:29
Message-ID:	20141204180129.GE27550@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> >>May I possibly suggest a file-per-schema model instead? This approach would
> >>certainly solve the excessive i-node consumption problem that --I guess--
> >>Andres is trying to address here.
> >I don't think that really has any advantages.
>
> Just spreading the I/O load, nothing more, it seems:
>
> Just to elaborate a bit on the reasoning, for completeness' sake:
> Given that a relation's segment maximum size is 1GB, we'd have
> (1048576/8)=128k sequences per relation segment.
> Arguably, not many real use cases will have that many sequences.... save for
> *massively* multi-tenant databases.
>
> The downside being that all that random I/O --- in general, it can't really
> be sequential unless there are very very few sequences--- can't be spread to
> other spindles. Create a "sequence_default_tablespace" GUC + ALTER SEQUENCE
> SET TABLESPACE, to use an SSD for this purpose maybe?
> (I could take a shot at the patch, if deemed worthwhile)

I think that's just so far outside the sane usecases that I really don't
see us adding complexity to reign it in. If your frequently used
sequences get flushed out to disk by anything but the checkpointer the
schema is just badly designed...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-12-10 02:33:54
Message-ID:	5487B112.9090304@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 24/11/14 12:16, Heikki Linnakangas wrote:
> I'd like to see two changes to this proposal:
>
> 1. Make the AM implementation solely responsible for remembering the
> "last value". (if it's a global or remote sequence, the current value
> might not be stored in the local server at all)
>
> 2. Instead of the single amdata field, make it possible for the
> implementation to define any number of fields with any datatype in the
> tuple. That would make debugging, monitoring etc. easier.
>

Hi,

here is the new version for next CF, there are several rough edges (more
about that later) but I think it's ready for more feedback.

I did change the API quite considerably (rewrite is the word I believe).

Instead of trying to put smarts on one or the other side of the API, I
basically created 2 APIs, one is the seqam which is responsible for
maintaining the sequence data and the other is sequence storage API
provided by postgres backend.

This means that backend cares purely about storage, and local caching
while AM cares about the data inside sequence, and decides when to WAL.

I'll start with storage API description:
- sequence_open(Oid) - open sequence with given Oid and returns
SequenceHandle (opens relation in share lock only)

- sequence_read_tuple(SequenceHandle) - returns heap tuple of the
sequence, this will also do the buffer locking

- sequence_save_tuple(SequenceHandle, newtuple, do_wal) - updates the
buffer with new sequence, optionally does WAL logging, newtuple can be
NULL in which case the tuple is not replaced (this is performance
optimization for sequences that don't need varlena/nullable columns and
can update everything inplace)

- sequence_release_tuple(SequenceHandle) - unlocks previously locked
buffer - usable for synchronization inside the seqam implementation, the
gapless sequence uses this

- sequence_close(SequenceHandle) - closes the sequence handle, also
unlocks buffer if needed

- sequence_needs_wal(SequenceHandle) - returns true if sequence needs
wal, that is if the last wal write happened before last checkpoint -
this is a little bit low level but I don't see a way around it if the
local sequence is supposed to maintain last_value and logcnt itself,
also we have similar function for relations (that checks for different
things but naming and usage is similar) so it's probably not too bad

The SequenceHandle is opaque struct so no gory details seen by the AM.

Now for the actual sequence AM API:
- seqam_extra_columns(Oid) - returns list of custom columns needed by
specified sequenceAM

- seqam_init - Initializes the columns for the custom sequenceAM, used
when creating, altering or resetting sequence. It gets sequenceam Oid,
sequence options, reloptions on input and returns updated values and
nulls arrays.

- seqam_alloc - Allocates new sequence values. It gets sequence
relation, handle and how many new values the backend wants and returns
first and last allocated value.

- seqam_setval - implements setval() support

- seqam_dump_state - pg_dump support
- seqam_restore_state - pg_dump support

The API is slightly bigger, but it also includes proper support for
dumping and restoring via pg_dump and seems somewhat cleaner, plus the
synchronous and asynchronous APIs are now same and no callbacks anywhere
(if AM needs to do something asynchronously it just calls the storage API).

There is also gapless sequence contrib module included as separate
patch, it's missing proper locking during dump/restore atm but otherwise
should be fully working (see tests). I am also using it for testing the
ALTER TABLE USING.

About the rough edges:
- The AlterSequence is not prettiest code around as we now have to
create new relation when sequence AM is changed and I don't know how to
do that nicely
- I am not sure if I did the locking, command order and dependency
handling in AlterSequence correcly
- I removed the setval3 since there is no guarantee that sequence will
have is_called parameter but there is now proper pg_dump support so it's
not needed for pg_dump usecase, but it's still incompatible change
- ResetSequence basically does ALTER SEQUENCE RESET internally instead
calling setval now because of the above
- there are no docs for the C APIs yet - I don't even know where those
should be placed btw, would appreciate suggestions (I have same problem
with committs API docs)

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-seqam-v4.patch	text/x-diff	138.8 KB
0002-gapless_seq-v1.patch	text/x-diff	16.3 KB

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2014-12-15 10:36:53
Message-ID:	548EB9C5.5090905@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/12/14 03:33, Petr Jelinek wrote:
> On 24/11/14 12:16, Heikki Linnakangas wrote:
>
> About the rough edges:
> - The AlterSequence is not prettiest code around as we now have to
> create new relation when sequence AM is changed and I don't know how to
> do that nicely
> - I am not sure if I did the locking, command order and dependency
> handling in AlterSequence correcly

This version does AlterSequence differently and better. Didn't attach
the gapless sequence again as that one is unchanged.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-seqam-v5.patch	text/x-diff	143.0 KB

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-01-12 21:33:39
Message-ID:	54B43DB3.3070908@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15/12/14 11:36, Petr Jelinek wrote:
> On 10/12/14 03:33, Petr Jelinek wrote:
>> On 24/11/14 12:16, Heikki Linnakangas wrote:
>>
>> About the rough edges:
>> - The AlterSequence is not prettiest code around as we now have to
>> create new relation when sequence AM is changed and I don't know how to
>> do that nicely
>> - I am not sure if I did the locking, command order and dependency
>> handling in AlterSequence correcly
>
> This version does AlterSequence differently and better. Didn't attach
> the gapless sequence again as that one is unchanged.
>
>

And another version, separated into patch-set of 3 patches.
First patch contains the seqam patch itself, not many changes there,
mainly docs/comments related. What I wrote in the previous email for
version 3 still applies.

Second patch adds DDL support. I originally wanted to make it
CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
which does not need to change anything (besides adding METHOD to
unreserved keywords).
The DDL support uses the DefineStmt infra with some very small change as
the sequence ams are not schema qualified, but I think it's acceptable
and saves considerable amount of boilerplate.

And third patch is gapless sequence implementation updated to work with
the new DDL support with some tests added for checking if dependencies
work correctly. It also acts as example on how to make custom AMs.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-seqam-v5.patch	text/x-diff	150.3 KB
0002-seqam-DDL-WIP.patch	text/x-diff	33.2 KB
0003-gapless-sequence-v2.patch	text/x-diff	18.5 KB

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Sequence Access Method WIP
Date:	2015-01-13 12:24:37
Message-ID:	54B50E85.2060901@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12.1.2015 22:33, Petr Jelinek wrote:
> On 15/12/14 11:36, Petr Jelinek wrote:
>> On 10/12/14 03:33, Petr Jelinek wrote:
>>> On 24/11/14 12:16, Heikki Linnakangas wrote:
>>>
>>> About the rough edges:
>>> - The AlterSequence is not prettiest code around as we now have to
>>> create new relation when sequence AM is changed and I don't know how to
>>> do that nicely
>>> - I am not sure if I did the locking, command order and dependency
>>> handling in AlterSequence correcly
>>
>> This version does AlterSequence differently and better. Didn't attach
>> the gapless sequence again as that one is unchanged.
>>
>>
>
> And another version, separated into patch-set of 3 patches.
> First patch contains the seqam patch itself, not many changes there,
> mainly docs/comments related. What I wrote in the previous email for
> version 3 still applies.

I did a review of the first part today - mostly by reading through the
diff. I plan to do a bit more thorough testing in a day or two. I'll
also look at the two (much smaller) patches.

comments/questions/general nitpicking:

(1) Why treating empty string as equal to 'local'? Isn't enforcing the
actual value a better approach?

(2) NITPICK: Maybe we could use access_method in the docs (instead of
sequence_access_method), as the 'sequence' part is clear from the
context?

(3) Why index_reloptions / sequence_reloptions when both do exactly the
same thing (call to common_am_reloptions)? I'd understand this if
the patch(es) then change the sequence_reloptions() but that's not
happening. Maybe that's expected to happen?

(4) DOCS: Each sequence can only use one access method at a time.

Does that mean a sequence can change the access method during it's
lifetime? My understanding is that the AM is fixed after creating
the sequence?

(5) DOCS/NITPICK: SeqAM / SeqAm ... (probably should be the same?)

(6) cache lookup failed for relation %u

I believe this should reference 'sequence access method' instead of
a relation.

(7) seqam_init

I believe this is a bug (not setting nulls[4] but twice nulls[3]):

+ fcinfo.argnull[0] = seqparams == NULL;
+ fcinfo.argnull[1] = reloptions_parsed == NULL;
+ fcinfo.argnull[2] = false;
+ fcinfo.argnull[3] = false;
+ fcinfo.argnull[3] = false;

(8) check_default_seqam without a transaction

* If we aren't inside a transaction, we cannot do database access so
* cannot verify the name. Must accept the value on faith.

In which situation this happens? Wouldn't it be better to simply
enforce the transaction and fail otherwise?

regards

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Sequence Access Method WIP
Date:	2015-01-13 14:00:36
Message-ID:	54B52504.6050608@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 13/01/15 13:24, Tomas Vondra wrote:
> On 12.1.2015 22:33, Petr Jelinek wrote:
>> On 15/12/14 11:36, Petr Jelinek wrote:
>>> On 10/12/14 03:33, Petr Jelinek wrote:
>>>> On 24/11/14 12:16, Heikki Linnakangas wrote:
>>>>
>>>> About the rough edges:
>>>> - The AlterSequence is not prettiest code around as we now have to
>>>> create new relation when sequence AM is changed and I don't know how to
>>>> do that nicely
>>>> - I am not sure if I did the locking, command order and dependency
>>>> handling in AlterSequence correcly
>>>
>>> This version does AlterSequence differently and better. Didn't attach
>>> the gapless sequence again as that one is unchanged.
>>>
>>>
>>
>> And another version, separated into patch-set of 3 patches.
>> First patch contains the seqam patch itself, not many changes there,
>> mainly docs/comments related. What I wrote in the previous email for
>> version 3 still applies.
>
> I did a review of the first part today - mostly by reading through the
> diff. I plan to do a bit more thorough testing in a day or two. I'll
> also look at the two (much smaller) patches.
>

Thanks!

> comments/questions/general nitpicking:
>
> (1) Why treating empty string as equal to 'local'? Isn't enforcing the
> actual value a better approach?
>

Álvaro mentioned on IM also, I already changed it to saner normal GUC
with 'local' as default value in my working copy

> (2) NITPICK: Maybe we could use access_method in the docs (instead of
> sequence_access_method), as the 'sequence' part is clear from the
> context?

Yes.

> (3) Why index_reloptions / sequence_reloptions when both do exactly the
> same thing (call to common_am_reloptions)? I'd understand this if
> the patch(es) then change the sequence_reloptions() but that's not
> happening. Maybe that's expected to happen?

That's leftover from the original design where AM was supposed to call
this, since this is not exposed to AM anymore I think it can be single
function now.

>
> (4) DOCS: Each sequence can only use one access method at a time.
>
> Does that mean a sequence can change the access method during it's
> lifetime? My understanding is that the AM is fixed after creating
> the sequence?
>

Oh, I forgot to add ALTER SEQUENCE USING into docs, you can change AM
even though you probably don't want to do it often, but for migrations
it's useful.

> (8) check_default_seqam without a transaction
>
> * If we aren't inside a transaction, we cannot do database access so
> * cannot verify the name. Must accept the value on faith.
>
> In which situation this happens? Wouldn't it be better to simply
> enforce the transaction and fail otherwise?

Reading postgresql.conf during startup, I don't think we want to fail in
that scenario ;)

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-01-22 15:50:08
Message-ID:	54C11C30.6000801@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01/12/2015 11:33 PM, Petr Jelinek wrote:
> Second patch adds DDL support. I originally wanted to make it
> CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
> a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
> which does not need to change anything (besides adding METHOD to
> unreserved keywords).
> The DDL support uses the DefineStmt infra with some very small change as
> the sequence ams are not schema qualified, but I think it's acceptable
> and saves considerable amount of boilerplate.

Do we need DDL commands for this at all? I could go either way on that
question. We recently had a discussion on that wrt. index access methods
[1], and Tom opined that providing DDL for creating index access methods
is not worth it. The extension can just insert the rows into pg_seqam
with INSERT. Do we expect sequence access methods as extensions to be
more popular than index access methods? Maybe, because the WAL-logging
problem doesn't exist. But OTOH, if you're writing something like a
replication system that needs global sequences as part of it, there
aren't that many of those, and the installation scripts will need to
deal with more complicated stuff than inserting a row in pg_seqam.

[1] http://www.postgresql.org/message-id/26822.1414516012@sss.pgh.pa.us

> And third patch is gapless sequence implementation updated to work with
> the new DDL support with some tests added for checking if dependencies
> work correctly. It also acts as example on how to make custom AMs.

I'll take a look at that to see how the API works, but we're not going
to include it in the source tree, because it doesn't actually guarantee
gaplessness. That makes it a pretty dangerous example. I'm sure we can
come up with a better example that might even be useful. How about a
Lamport's clock sequence, which advances once per second, in addition to
when anyone calls nextval() ? Or a remote sequence that uses an FDW to
call nextval() in a foreign server?

- Heikki

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-01-22 16:00:17
Message-ID:	CA+TgmoY1cMofPRj_A6V3Bk2woNPMbkf86XxmDU5S3ybr1_2DAQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 22, 2015 at 10:50 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 01/12/2015 11:33 PM, Petr Jelinek wrote:
>> Second patch adds DDL support. I originally wanted to make it
>> CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
>> a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
>> which does not need to change anything (besides adding METHOD to
>> unreserved keywords).
>> The DDL support uses the DefineStmt infra with some very small change as
>> the sequence ams are not schema qualified, but I think it's acceptable
>> and saves considerable amount of boilerplate.
>
> Do we need DDL commands for this at all? I could go either way on that
> question. We recently had a discussion on that wrt. index access methods
> [1], and Tom opined that providing DDL for creating index access methods is
> not worth it. The extension can just insert the rows into pg_seqam with
> INSERT. Do we expect sequence access methods as extensions to be more
> popular than index access methods? Maybe, because the WAL-logging problem
> doesn't exist. But OTOH, if you're writing something like a replication
> system that needs global sequences as part of it, there aren't that many of
> those, and the installation scripts will need to deal with more complicated
> stuff than inserting a row in pg_seqam.

I think the main reason we don't need DDL for pg_am is because there's
no real hope of anybody actually inserting anything in there whether
we have a command for it or not. If we actually expect this to be
used, I think it should probably have some kind of DDL, because
otherwise what will pg_dump emit? "Nothing" is bad because then dumps
won't restore, and "direct inserts to pg_seqam" doesn't seem great
either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-01-22 16:02:36
Message-ID:	54C11F1C.8090702@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 22/01/15 16:50, Heikki Linnakangas wrote:
> On 01/12/2015 11:33 PM, Petr Jelinek wrote:
>> Second patch adds DDL support. I originally wanted to make it
>> CREATE/DROP SEQUENCE ACCESS METHOD... but that would mean making ACCESS
>> a reserver keyword so I went for CREATE ACCESS METHOD FOR SEQUENCES
>> which does not need to change anything (besides adding METHOD to
>> unreserved keywords).
>> The DDL support uses the DefineStmt infra with some very small change as
>> the sequence ams are not schema qualified, but I think it's acceptable
>> and saves considerable amount of boilerplate.
>
> Do we need DDL commands for this at all? I could go either way on that
> question. We recently had a discussion on that wrt. index access methods
> [1], and Tom opined that providing DDL for creating index access methods
> is not worth it. The extension can just insert the rows into pg_seqam
> with INSERT. Do we expect sequence access methods as extensions to be
> more popular than index access methods? Maybe, because the WAL-logging
> problem doesn't exist. But OTOH, if you're writing something like a
> replication system that needs global sequences as part of it, there
> aren't that many of those, and the installation scripts will need to
> deal with more complicated stuff than inserting a row in pg_seqam.

I don't know about popularity, and I've seen the discussion about
indexes. The motivation for DDL for me was handling dependencies
correctly, that's all. If we say we don't care about that (and allow
DROP EXTENSION even though user has sequences that are using the AM)
then we don't need DDL.

>
> [1] http://www.postgresql.org/message-id/26822.1414516012@sss.pgh.pa.us
>
>> And third patch is gapless sequence implementation updated to work with
>> the new DDL support with some tests added for checking if dependencies
>> work correctly. It also acts as example on how to make custom AMs.
>
> I'll take a look at that to see how the API works, but we're not going
> to include it in the source tree, because it doesn't actually guarantee
> gaplessness. That makes it a pretty dangerous example. I'm sure we can
> come up with a better example that might even be useful. How about a
> Lamport's clock sequence, which advances once per second, in addition to
> when anyone calls nextval() ? Or a remote sequence that uses an FDW to
> call nextval() in a foreign server?
>

I have updated patch ready, just didn't submit it because I am otherwise
busy this week, I hope to get to it today evening or tomorrow morning,
so I'd wait until that with looking at the patch.

The new version (the one that is not submitted yet) of gapless sequence
is way more ugly and probably not best example either but does guarantee
gaplessness (it stores the last value in it's own value table). So I am
not sure if it should be included either...

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-01-23 00:34:35
Message-ID:	54C1971B.4050107@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 22/01/15 17:02, Petr Jelinek wrote:
>
> The new version (the one that is not submitted yet) of gapless sequence
> is way more ugly and probably not best example either but does guarantee
> gaplessness (it stores the last value in it's own value table). So I am
> not sure if it should be included either...
>

Here it is as promised.

Notable changes:
- The gapless sequence rewritten to use the transactional storage as
that's the only way to guarantee gaplessness between dump and restore.

- Custom columns are now stored in the catalog instead of generated by
some special interface. This makes it possible to record dependencies on
types and also seems in line with how other similar things are done.

- Removed couple of pallocs from the nextval codepath which improved
performance considerably.

- Changed how the ALTER SEQUENCE ... USING behaves - when RESET WITH
value is specified the new sequenceAM will use that one for starting, if
it's not specified the nextval from original sequenceAM will be used as
RESET value. In general handling of the RESET parameter is left to the AMs.

- Unified the reloptions for index and sequences into single function.

- Re-added setval3 for compatibility reasons - it only works on local
sequences, otherwise it throws error.

- Made the 'local' default value for the GUC instead of special handling
of the empty string.

Notable missing things:
- Docs for pg_seqam.

- pg_dump support for pg_seqam. It's embarrassing but I didn't think
about this until Robert mentioned it, I am apparently too used to handle
these things using extensions.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-seqam-v6.patch	text/x-diff	134.1 KB
0002-seqam-ddl-v2.patch	text/x-diff	34.8 KB
0003-gapless-sequence-v3.patch	text/x-diff	27.0 KB

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-01-28 17:09:47
Message-ID:	54C917DB.2090307@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01/23/2015 02:34 AM, Petr Jelinek wrote:
> On 22/01/15 17:02, Petr Jelinek wrote:
>>
>> The new version (the one that is not submitted yet) of gapless sequence
>> is way more ugly and probably not best example either but does guarantee
>> gaplessness (it stores the last value in it's own value table). So I am
>> not sure if it should be included either...
>
> Here it is as promised.

I generally like the division of labour between the seqam
implementations and the API now.

I don't like the default_sequenceam GUC. That seems likely to create
confusion. And it's not really enough for something like a remote
sequence AM anyway that definitely needs some extra reloptions, like the
hostname of the remote server. The default should be 'local', unless you
specify something else with CREATE SEQUENCE USING - no GUCs.

Some comments on pg_dump:

* In pg_dump's dumpSequence function, check the server version number
instead of checking whether pg_sequence_dump_state() function exists.
That's what we usually do when new features are introduced. And there's
actually a bug there: you have the check backwards. (try dumping a
database with any sequences in it; it fails)

* pg_dump should not output a USING clause when a sequence uses the
'local' sequence. No point in adding such noise - making the SQL command
not standard-compatible - for the 99% of databases that don't use other
sequence AMs.

* Be careful to escape all strings correctly in pg_dump. I think you're
missing escaping for the 'state' variable at least.

In sequence_save_tuple:
> else
> {
> /*
> * New tuple was not sent, so the original tuple was probably just
> * changed inline, all we need to do is mark the buffer dirty and
> * optionally log the update tuple.
> */
> START_CRIT_SECTION();
>
> MarkBufferDirty(seqh->buf);
>
> if (do_wal)
> log_sequence_tuple(seqh->rel, &seqh->tup, seqh->buf, page);
>
> END_CRIT_SECTION();
> }

This is wrong when checksums are enabled and !do_wal. I believe this
should use MarkBufferDirtyHint().

> Notable changes:
> - The gapless sequence rewritten to use the transactional storage as
> that's the only way to guarantee gaplessness between dump and restore.

Can you elaborate?

Using the auxiliary seqam_gapless_values is a bit problematic. First of
all, the event trigger on DROP SEQUENCE doesn't fire if you drop the
whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly,
updating a row in a table for every nextval() call is pretty darn
expensive. But I don't actually see a problem with dump and restore.

Can you rewrite it without using the values table? AFAICS, there are a
two of problems to solve:

1. If the top-level transaction aborts, you need to restore the old
value. I suggest keeping two values in the sequence tuple: the old,
definitely committed one, and the last value. The last value is only
considered current if the associated XID committed; otherwise the old
value is current. When a transaction is about to commit, it stores its
top-level XID and the new value in the "last" field, and copies the
previously current value to the "old" field.

2. You need to track the last value on a per-subtransaction basis, until
the transaction commits/rolls back, in order to have "rollback to
savepoint" to retreat the sequence's value. That can be done in
backend-private memory, maintaining a stack of subtransactions and the
last value of each. Use RegisterSubXactCallback to hook into subxact
commit/abort. Just before top-level commit (in pre-commit callback),
update the sequence tuple with the latest value, so that it gets
WAL-logged before the commit record.

- Heikki

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-01-28 17:56:21
Message-ID:	54C922C5.7030107@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 28/01/15 18:09, Heikki Linnakangas wrote:
> On 01/23/2015 02:34 AM, Petr Jelinek wrote:
>> On 22/01/15 17:02, Petr Jelinek wrote:
>>>
>>> The new version (the one that is not submitted yet) of gapless sequence
>>> is way more ugly and probably not best example either but does guarantee
>>> gaplessness (it stores the last value in it's own value table). So I am
>>> not sure if it should be included either...
>>
>> Here it is as promised.
>
> I generally like the division of labour between the seqam
> implementations and the API now.
>
> I don't like the default_sequenceam GUC. That seems likely to create
> confusion. And it's not really enough for something like a remote
> sequence AM anyway that definitely needs some extra reloptions, like the
> hostname of the remote server. The default should be 'local', unless you
> specify something else with CREATE SEQUENCE USING - no GUCs.
>

Hmm, I am not too happy about this, I want SERIAL to work with custom
sequences (as long as it's possible for the sequence to work that way at
least). That's actually important feature for me. Your argument about
this being potential problem for some sequenceAMs is valid though.

> Some comments on pg_dump:
>
> * In pg_dump's dumpSequence function, check the server version number
> instead of checking whether pg_sequence_dump_state() function exists.
> That's what we usually do when new features are introduced. And there's
> actually a bug there: you have the check backwards. (try dumping a
> database with any sequences in it; it fails)

Right.

> * pg_dump should not output a USING clause when a sequence uses the
> 'local' sequence. No point in adding such noise - making the SQL command
> not standard-compatible - for the 99% of databases that don't use other
> sequence AMs.

Well this only works if we remove the GUC. Because otherwise if GUC is
there then you always need to either add USING clause or set the GUC in
advance (like we do with search_path for example).

> * Be careful to escape all strings correctly in pg_dump. I think you're
> missing escaping for the 'state' variable at least.

Ouch.

> In sequence_save_tuple:
>> else
>> {
>> /*
>> * New tuple was not sent, so the original tuple was probably
>> just
>> * changed inline, all we need to do is mark the buffer dirty and
>> * optionally log the update tuple.
>> */
>> START_CRIT_SECTION();
>>
>> MarkBufferDirty(seqh->buf);
>>
>> if (do_wal)
>> log_sequence_tuple(seqh->rel, &seqh->tup, seqh->buf, page);
>>
>> END_CRIT_SECTION();
>> }
>
> This is wrong when checksums are enabled and !do_wal. I believe this
> should use MarkBufferDirtyHint().
>

Oh ok, didn't realize.

>> Notable changes:
>> - The gapless sequence rewritten to use the transactional storage as
>> that's the only way to guarantee gaplessness between dump and restore.
>
> Can you elaborate?
>
> Using the auxiliary seqam_gapless_values is a bit problematic. First of
> all, the event trigger on DROP SEQUENCE doesn't fire if you drop the
> whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly,

Yeah but at worst there are some unused rows there, it's not too bad. I
could also create relation per sequence so that dependency code would
handle everything correctly, but seems bit too expensive to create not
one but two relations per sequence...

> updating a row in a table for every nextval() call is pretty darn
> expensive.

Yes it's expensive, but the gapless sequence also serializes access so
it will never be speed daemon.

> But I don't actually see a problem with dump and restore.

The problem is that the tuple stored in the sequence relation is always
the one with latest state (and with frozen xid), so pg_dump has no way
of getting the value as it was at the time it took the snapshot. This
means that after dump/restore cycle, the sequence can be further away
than the table it's attached to and you end up with a gap there.

> Can you rewrite it without using the values table? AFAICS, there are a
> two of problems to solve:
>
> 1. If the top-level transaction aborts, you need to restore the old
> value. I suggest keeping two values in the sequence tuple: the old,
> definitely committed one, and the last value. The last value is only
> considered current if the associated XID committed; otherwise the old
> value is current. When a transaction is about to commit, it stores its
> top-level XID and the new value in the "last" field, and copies the
> previously current value to the "old" field.
>

Yes, this is how the previous implementation worked.

> 2. You need to track the last value on a per-subtransaction basis, until
> the transaction commits/rolls back, in order to have "rollback to
> savepoint" to retreat the sequence's value. That can be done in
> backend-private memory, maintaining a stack of subtransactions and the
> last value of each. Use RegisterSubXactCallback to hook into subxact
> commit/abort. Just before top-level commit (in pre-commit callback),
> update the sequence tuple with the latest value, so that it gets
> WAL-logged before the commit record.
>

I started writing something like what you described here but then I
realized the dump/restore problem and went with the values table which
solves both of these issues at the same time.

BTW I am happy to discuss this at FOSDEM if you are interested and will
have time.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-02-15 18:40:24
Message-ID:	54E0E818.4030009@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

sending new version that is updated along the lines of what we discussed
at FOSDEM, which means:

- back to single bytea amdata column (no custom columns)

- the dump/restore interfaces were changed to produce/accept array of
key/value pairs

- psql shows the output of dump interface in verbose mode (this makes it
easier to inspect the state)

- last_value and is_called are always present columns even if the
sequence am does not care for them (this helps with backwards
compatibility with various admin tools)

- pg_dump uses new interface for dumping only for non-local sequences so
that dumps of schemas which are not using custom seqams are restorable
to older postgres

- the default_sequenceam was changed to serial_sequenceam and only
applies to serial/bigserial columns - I would personally prefer to have
the old default but default for serial is better than nothing

- added description of pg_seqam catalog to docs

- gapless_seq still uses table internally to support snapshots for pg_dump

- pg_dump support for dumping the sequence ams

It would be nice to also have something along the lines of chapter 55.2.
(Index Access Method Functions), I plan to send that as additional patch
in few days.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-seqam-v7.patch	text/x-diff	134.2 KB
0002-seqam-ddl-v3.patch	text/x-diff	40.7 KB
0003-gapless-sequence-v4.patch	text/x-diff	27.5 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-02-17 22:11:26
Message-ID:	CA+TgmoZqEQOa4jgwuNkD476yuT+e87zh3ZJF2WHtvDYJL90=Wg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> sending new version that is updated along the lines of what we discussed at
> FOSDEM, which means:
>
> - back to single bytea amdata column (no custom columns)

Why?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-02-18 01:59:32
Message-ID:	54E3F204.7070804@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 17/02/15 23:11, Robert Haas wrote:
> On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> sending new version that is updated along the lines of what we discussed at
>> FOSDEM, which means:
>>
>> - back to single bytea amdata column (no custom columns)
>

Well, the main argument is still future possibility of moving into
single table for storage. And when we discussed about it in person we
agreed that there is not too big advantage in having separate columns
since there need to be dump/restore state interfaces anyway so you can
see the relevant state via those as we made the output more human
readable (and the psql support reflects that).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-02-18 02:02:13
Message-ID:	54E3F2A5.1030008@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 18/02/15 02:59, Petr Jelinek wrote:
> On 17/02/15 23:11, Robert Haas wrote:
>> On Sun, Feb 15, 2015 at 1:40 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com>
>> wrote:
>>> sending new version that is updated along the lines of what we
>>> discussed at
>>> FOSDEM, which means:
>>>
>>> - back to single bytea amdata column (no custom columns)
>>
>
> Well, the main argument is still future possibility of moving into
> single table for storage. And when we discussed about it in person we
> agreed that there is not too big advantage in having separate columns
> since there need to be dump/restore state interfaces anyway so you can
> see the relevant state via those as we made the output more human
> readable (and the psql support reflects that).
>

Also makes the patch a little simpler obviously.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-03-15 19:07:07
Message-ID:	5505D85B.1000306@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Slightly updated version of the patch.

Mainly rebased against current master (there were several conflicts) and
fixed some typos, no real functional change.

I also attached initial version of the API sgml doc.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0004-seqam-api-doc-v1.patch	text/x-diff	9.6 KB
0003-gapless-sequence-v4.patch	text/x-diff	27.5 KB
0002-seqam-ddl-v4.patch	text/x-diff	40.7 KB
0001-seqam-v8.patch	text/x-diff	135.3 KB

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-04-20 09:49:39
Message-ID:	5534CBB3.4010007@iki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 03/15/2015 09:07 PM, Petr Jelinek wrote:
> Slightly updated version of the patch.
>
> Mainly rebased against current master (there were several conflicts) and
> fixed some typos, no real functional change.
>
> I also attached initial version of the API sgml doc.

Thanks!

With the patch, pg_class.relam column references to the pg_seqam table
for sequences, but pg_indexam for indexes. I believe it's the first
instance where we reuse a "foreign key" column like that. It's not a
real foreign key, of course - that wouldn't work with a real foreign key
at all - but it's a bit strange. That makes me a bit uncomfortable. How
do others feel about that?

- Heikki

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-04-20 10:05:37
Message-ID:	20150420100537.GC25107@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2015-04-20 12:49:39 +0300, Heikki Linnakangas wrote:
> With the patch, pg_class.relam column references to the pg_seqam table for
> sequences, but pg_indexam for indexes. I believe it's the first instance
> where we reuse a "foreign key" column like that. It's not a real foreign
> key, of course - that wouldn't work with a real foreign key at all - but
> it's a bit strange. That makes me a bit uncomfortable. How do others feel
> about that?

Hm. I'd modeled it more as an extension of the 'relkind' column
mentally. I.e. it further specifies how exactly the relation is
behaving. Given that the field has been added to pg_class and not
pg_index, combined with it not having index in its name, makes me think
that it actually was intended to be used the way it's done in the patch.

It's not the first column that behaves that way btw, at least pg_depend
comes to mind.

Greetings,

Andres Freund

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-04-20 10:26:15
Message-ID:	5534D447.2040701@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 20/04/15 12:05, Andres Freund wrote:
> On 2015-04-20 12:49:39 +0300, Heikki Linnakangas wrote:
>> With the patch, pg_class.relam column references to the pg_seqam table for
>> sequences, but pg_indexam for indexes. I believe it's the first instance
>> where we reuse a "foreign key" column like that. It's not a real foreign
>> key, of course - that wouldn't work with a real foreign key at all - but
>> it's a bit strange. That makes me a bit uncomfortable. How do others feel
>> about that?
>
> Hm. I'd modeled it more as an extension of the 'relkind' column
> mentally. I.e. it further specifies how exactly the relation is
> behaving. Given that the field has been added to pg_class and not
> pg_index, combined with it not having index in its name, makes me think
> that it actually was intended to be used the way it's done in the patch.
>

That's how I think about it too. It's also short for access method,
nothing really suggests to me that it should be index specific by design.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-04-20 15:50:44
Message-ID:	55352054.5020408@iki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I finally got around to have another round of review on this. I fixed a
couple of little bugs, did some minor edition on comments etc. See
attached. It is also available in my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
if you want to look at individual changes. It combines your patches 1
and 4, I think those need to be applied together. I haven't looked at
the DDL changes yet.

I'm fairly happy with the alloc API now. I'm not sure it's a good idea
for the AM to access the sequence tuple directly, though. I would seem
cleaner if it only manipulated the amdata Datum. But it's not too bad as
it is.

We went back and forth on whether 'amdata' should be a single column or
multiple columns, but I guess the single bytea column was the consensus
in the end.

The division of labour between sequence.c and the AM, in the init and
the get/set_state functions, is a bit more foggy:

* Do we really need a separate amoptions() method and an init() method,
when the only caller to amoptions() is just before the init() method?
The changes in extractRelOptions suggest that it would call
am_reloptions for sequences too, but that doesn't actually seem to be
happening.

postgres=# create sequence fooseq using local with (garbage=option);
CREATE SEQUENCE

Invalid options probably should throw an error.

* Currently, the AM's init function is responsible for basic sanity
checking, like min < max. It also has to extract the RESTART value from
the list of options. I think it would make more sense to move that to
sequence.c. The AM should only be responsible for initializing the
'amdata' portion, and for handling any AM-specific options. If the AM
doesn't support some options, like MIN and MAX value, it can check them
and throw an error, but it shouldn't be responsible for just passing
them through from the arguments to the sequence tuple.

* It might be better to form the sequence tuple before calling the init
function, and pass the ready-made tuple to it. All the other API
functions deal with the tuple (after calling sequence_read_tuple), so
that would be more consistent. The init function would need to
deconstruct it to modify it, but we're not concerned about performance here.

* The transformations of the arrays in get_state() and set_state()
functions are a bit complicated. The seqam_get_state() function returns
two C arrays, and pg_sequence_get_state() turns them into a text[]
array. Why not construct the text[] array directly in the AM? I guess
it's a bit more convenient to the AM, if the pg_sequence_get_state() do
that, but if that's an issue, you could create generic helper function
to turn two C string arrays into text[], and call that from the AM.

> seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>
> for (i = 0; i < count; i++)
> {
> if (pg_strcasecmp(keys[i], "last_value") == 0)
> seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
> CStringGetDatum(values[i])));
> else if (pg_strcasecmp(keys[i], "is_called") == 0)
> seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
> CStringGetDatum(values[i])));
> else
> ereport(ERROR,
> (errcode(ERRCODE_SYNTAX_ERROR),
> errmsg("invalid state key \"%s\" for local sequence",
> keys[i])));
> }
>
> sequence_save_tuple(seqh, NULL, true);

If that error happens after having already processed a "last_value" or
"is_called" entry, you have already modified the on-disk tuple. AFAICS
that's the only instance of that bug, but sequence_read_tuple - modify
tuple in place - sequence_save_tuple pattern is quite unsafe in general.
If you modify a tuple directly in a Buffer, you should have a critical
section around it. It would make sense to start a critical section in
sequence_read_tuple(), except that not all callers want to modify the
tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
functions should be split into two:

sequence_read_tuple_for_update()
sequence_save_tuple()

* seqam_local_get_state() calls sequence_read_tuple() but not
sequence_release_tuple(). It looks like sequence_close() releases the
tuple and the buffer, but that seems sloppy. sequence_close() probably
shouldn't be calling sequence_release_tuple at all, so that you'd get a
warning at the end of transaction about the buffer leak.

- Heikki

Attachment	Content-Type	Size
seqam-v8+api-doc-heikki.patch	application/x-patch	134.2 KB

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-04-20 16:22:56
Message-ID:	20150420162256.GU4369@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas wrote:

> * The transformations of the arrays in get_state() and set_state() functions
> are a bit complicated. The seqam_get_state() function returns two C arrays,
> and pg_sequence_get_state() turns them into a text[] array. Why not
> construct the text[] array directly in the AM? I guess it's a bit more
> convenient to the AM, if the pg_sequence_get_state() do that, but if that's
> an issue, you could create generic helper function to turn two C string
> arrays into text[], and call that from the AM.

Um, see strlist_to_textarray() in objectaddress.c if you do that. Maybe
we need some generic place to store that kind of helper function.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	hlinnaka(at)iki(dot)fi, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-04-22 20:01:57
Message-ID:	5537FE35.5070608@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 20/04/15 17:50, Heikki Linnakangas wrote:
> On 03/15/2015 09:07 PM, Petr Jelinek wrote:
>> Slightly updated version of the patch.
>>
>> Mainly rebased against current master (there were several conflicts) and
>> fixed some typos, no real functional change.
>>
>> I also attached initial version of the API sgml doc.
>
> I finally got around to have another round of review on this. I fixed a
> couple of little bugs, did some minor edition on comments etc. See
> attached. It is also available in my git repository at
> git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
> if you want to look at individual changes. It combines your patches 1
> and 4, I think those need to be applied together. I haven't looked at
> the DDL changes yet.

Thanks!

>
> I'm fairly happy with the alloc API now. I'm not sure it's a good idea
> for the AM to access the sequence tuple directly, though. I would seem
> cleaner if it only manipulated the amdata Datum. But it's not too bad as
> it is.

Yeah, I was thinking about this myself I just liked sending 10
parameters to the function less than this.

>
> The division of labour between sequence.c and the AM, in the init and
> the get/set_state functions, is a bit more foggy:
>
> * Do we really need a separate amoptions() method and an init() method,
> when the only caller to amoptions() is just before the init() method?
> The changes in extractRelOptions suggest that it would call
> am_reloptions for sequences too, but that doesn't actually seem to be
> happening.

Hmm yes it should and I am sure it did at some point, must have messed
it during one of the rebases :(

And it's the reason why we need separate API function.

>
> postgres=# create sequence fooseq using local with (garbage=option);
> CREATE SEQUENCE
>
> Invalid options probably should throw an error.
>
> * Currently, the AM's init function is responsible for basic sanity
> checking, like min < max. It also has to extract the RESTART value from
> the list of options. I think it would make more sense to move that to
> sequence.c. The AM should only be responsible for initializing the
> 'amdata' portion, and for handling any AM-specific options. If the AM
> doesn't support some options, like MIN and MAX value, it can check them
> and throw an error, but it shouldn't be responsible for just passing
> them through from the arguments to the sequence tuple.

Well then we need to send restart as additional parameter to the init
function as restart is normally not stored anywhere.

The checking is actually if new value is withing min/max but yes that
can be done inside sequence.c I guess.

>
> * It might be better to form the sequence tuple before calling the init
> function, and pass the ready-made tuple to it. All the other API
> functions deal with the tuple (after calling sequence_read_tuple), so
> that would be more consistent. The init function would need to
> deconstruct it to modify it, but we're not concerned about performance
> here.

Right, this is actually more of a relic of the custom columns
implementation where I didn't want to build the tuple with NULLs for
columns that might have been specified as NOT NULL, but with the amdata
approach it can create the tuple safely.

>
> * The transformations of the arrays in get_state() and set_state()
> functions are a bit complicated. The seqam_get_state() function returns
> two C arrays, and pg_sequence_get_state() turns them into a text[]
> array. Why not construct the text[] array directly in the AM? I guess
> it's a bit more convenient to the AM, if the pg_sequence_get_state() do
> that, but if that's an issue, you could create generic helper function
> to turn two C string arrays into text[], and call that from the AM.

Yeah that was exactly the reasoning. Helper function works for me (will
check what Álvaro's suggested, maybe it can be moved somewhere and reused).

>
>> seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>>
>> for (i = 0; i < count; i++)
>> {
>> if (pg_strcasecmp(keys[i], "last_value") == 0)
>> seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
>>
>> CStringGetDatum(values[i])));
>> else if (pg_strcasecmp(keys[i], "is_called") == 0)
>> seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
>>
>> CStringGetDatum(values[i])));
>> else
>> ereport(ERROR,
>> (errcode(ERRCODE_SYNTAX_ERROR),
>> errmsg("invalid state key \"%s\" for local
>> sequence",
>> keys[i])));
>> }
>>
>> sequence_save_tuple(seqh, NULL, true);
>
> If that error happens after having already processed a "last_value" or
> "is_called" entry, you have already modified the on-disk tuple. AFAICS
> that's the only instance of that bug, but sequence_read_tuple - modify
> tuple in place - sequence_save_tuple pattern is quite unsafe in general.
> If you modify a tuple directly in a Buffer, you should have a critical
> section around it. It would make sense to start a critical section in
> sequence_read_tuple(), except that not all callers want to modify the
> tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
> functions should be split into two:
>
> sequence_read_tuple_for_update()
> sequence_save_tuple()

Agreed, however I am much more concerned about the way
seqam_local_alloc() works, see the XXX comment there. I am thinking it's
not really safe that way, but problem is that we can't put critical
section around it currently as the sequence_save_tuple() can potentially
call GetTopTransactionId(). Separating it into more functions might be
solution. Except the split into two does not really help there, it would
have to be something like this:
sequence_tuple_update_start() - does the GetTopTransactionId bit and
starts critical section
sequence_tuple_save() - saves the tuple/does wal
sequence_tuple_update_finish() - ends the critical section

That looks slightly cumbersome but I don't really have better idea.

>
> * seqam_local_get_state() calls sequence_read_tuple() but not
> sequence_release_tuple(). It looks like sequence_close() releases the
> tuple and the buffer, but that seems sloppy. sequence_close() probably
> shouldn't be calling sequence_release_tuple at all, so that you'd get a
> warning at the end of transaction about the buffer leak.
>

Do you mean that you want to make call to sequence_release_tuple()
mandatory when sequence_read_tuple() was called?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	hlinnaka(at)iki(dot)fi, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject:	Re: Sequence Access Method WIP
Date:	2015-04-27 15:49:27
Message-ID:	553E5A87.9030105@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 22/04/15 22:01, Petr Jelinek wrote:
> On 20/04/15 17:50, Heikki Linnakangas wrote:
>>>
>> I finally got around to have another round of review on this. I fixed a
>> couple of little bugs, did some minor edition on comments etc. See
>> attached. It is also available in my git repository at
>> git://git.postgresql.org/git/users/heikki/postgres.git, branch "seqam",
>> if you want to look at individual changes. It combines your patches 1
>> and 4, I think those need to be applied together. I haven't looked at
>> the DDL changes yet.
>
> Thanks!

I merged all your changes and merged the patch#1 with patch#4.

Also I pushed my repo to https://github.com/PJMODOS/postgres/tree/seqam
and gave you (Heikki) commit rights there in case you want to change
anything.

>> The division of labour between sequence.c and the AM, in the init and
>> the get/set_state functions, is a bit more foggy:
>>
>> * Do we really need a separate amoptions() method and an init() method,
>> when the only caller to amoptions() is just before the init() method?
>> The changes in extractRelOptions suggest that it would call
>> am_reloptions for sequences too, but that doesn't actually seem to be
>> happening.
>
> Hmm yes it should and I am sure it did at some point, must have messed
> it during one of the rebases :(
>
> And it's the reason why we need separate API function.
>

Actually the reloption handling was broken in more than one place, I
fixed it all around. I had to teach the heap_create_with_catalog about
relam but the change does not seem to be too intrusive to me so I think
it's fine. We no longer do the update of pg_class tuple after the
sequence was created anymore and the relcache works correctly now. Also
the whole reloption handling was moved to more proper places than the
seqam_init so it's more in line with how it works for other relation kinds.

>>
>> postgres=# create sequence fooseq using local with (garbage=option);
>> CREATE SEQUENCE
>>
>> Invalid options probably should throw an error.
>>

Done, well actually the local sequence now throws error on any options.
I also updated CREATE SEQUENCE and ALTER SEQUENCE docs with the
reloptions syntax.

>> * Currently, the AM's init function is responsible for basic sanity
>> checking, like min < max. It also has to extract the RESTART value from
>> the list of options. I think it would make more sense to move that to
>> sequence.c. The AM should only be responsible for initializing the
>> 'amdata' portion, and for handling any AM-specific options. If the AM
>> doesn't support some options, like MIN and MAX value, it can check them
>> and throw an error, but it shouldn't be responsible for just passing
>> them through from the arguments to the sequence tuple.
>

I now do the checking in sequence.c where possible, the init got
generally redone to accept the newly created tuple and restart value
parameters.

One thing that I am not sure how much like is that if the AM wants to
change the amdata in the init, it now has to modify the tuple and free
the old one.

>
>>
>> * The transformations of the arrays in get_state() and set_state()
>> functions are a bit complicated. The seqam_get_state() function returns
>> two C arrays, and pg_sequence_get_state() turns them into a text[]
>> array. Why not construct the text[] array directly in the AM? I guess
>> it's a bit more convenient to the AM, if the pg_sequence_get_state() do
>> that, but if that's an issue, you could create generic helper function
>> to turn two C string arrays into text[], and call that from the AM.
>
> Yeah that was exactly the reasoning. Helper function works for me (will
> check what Álvaro's suggested, maybe it can be moved somewhere and reused).
>

Didn't use Álvaro's code in the end as ISTM working directly with arrays
is simple enough for this use case. The only slightly ugly part is the
use of TextDatumGetCString/CStringGetDatum but I think that's survivable
(maybe the sequence.c could convert the TEXT[] to CSTRING[] or just char
*[] as this is not performance critical code path).

>>
>>> seq = (FormData_pg_sequence *) GETSTRUCT(sequence_read_tuple(seqh));
>>>
>>> for (i = 0; i < count; i++)
>>> {
>>> if (pg_strcasecmp(keys[i], "last_value") == 0)
>>> seq->last_value = DatumGetInt64(DirectFunctionCall1(int8in,
>>>
>>> CStringGetDatum(values[i])));
>>> else if (pg_strcasecmp(keys[i], "is_called") == 0)
>>> seq->is_called = DatumGetBool(DirectFunctionCall1(boolin,
>>>
>>> CStringGetDatum(values[i])));
>>> else
>>> ereport(ERROR,
>>> (errcode(ERRCODE_SYNTAX_ERROR),
>>> errmsg("invalid state key \"%s\" for local
>>> sequence",
>>> keys[i])));
>>> }
>>>
>>> sequence_save_tuple(seqh, NULL, true);
>>
>> If that error happens after having already processed a "last_value" or
>> "is_called" entry, you have already modified the on-disk tuple. AFAICS
>> that's the only instance of that bug, but sequence_read_tuple - modify
>> tuple in place - sequence_save_tuple pattern is quite unsafe in general.
>> If you modify a tuple directly in a Buffer, you should have a critical
>> section around it. It would make sense to start a critical section in
>> sequence_read_tuple(), except that not all callers want to modify the
>> tuple in place. Perhaps the sequence_read_tuple/sequence_save_tuple
>> functions should be split into two:
>>
>> sequence_read_tuple_for_update()
>> sequence_save_tuple()
>
> Agreed, however I am much more concerned about the way
> seqam_local_alloc() works, see the XXX comment there. I am thinking it's
> not really safe that way, but problem is that we can't put critical
> section around it currently as the sequence_save_tuple() can potentially
> call GetTopTransactionId(). Separating it into more functions might be
> solution. Except the split into two does not really help there, it would
> have to be something like this:
> sequence_tuple_update_start() - does the GetTopTransactionId bit and
> starts critical section
> sequence_tuple_save() - saves the tuple/does wal
> sequence_tuple_update_finish() - ends the critical section
>
> That looks slightly cumbersome but I don't really have better idea.
>

So I went with something like what was proposed:
- sequence_start_update(seqh, do_wal) - prepares the state and starts
the critical section
- sequence_apply_update(seqh, do_wal) - does buffer dirtying and wal logging
- sequence_finish_update(seqh) - closes the critical section and cleans
up the state

The API might look slightly hairy now but I don't see much difference
between having to do:
START_CRIT_SECTION()
sequence_update_tuple()
END_CRIT_SECTION()
and the above, except the above also solves the xid acquiring and shows
less internals.

I also added sequence_swap_tuple() for the use-case when the AM does not
want to do inline updates because having this separate makes things
easier and less ugly. And once you are changing whole tuple you are
already in slow/complex code path so one more function call does not matter.

>> * seqam_local_get_state() calls sequence_read_tuple() but not
>> sequence_release_tuple(). It looks like sequence_close() releases the
>> tuple and the buffer, but that seems sloppy. sequence_close() probably
>> shouldn't be calling sequence_release_tuple at all, so that you'd get a
>> warning at the end of transaction about the buffer leak.
>>

I did this (requiring the sequence_release_tuple() always) and
documented it, but personally don't like it much as it just adds one
more call to the API which already has enough of them and I really don't
see the advantage of it.

Other than things mentioned above I rebased it on current master (the
transforms commit produced several issues). And fixed several bugs like
checking min/max value in set_state, proper initialization of session
cached increment_by, etc.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-seqam-v9.patch	application/x-patch	158.1 KB
0002-seqam-ddl-v5.patch	application/x-patch	39.5 KB
0003-gapless-sequence-v5.patch	application/x-patch	28.6 KB

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	hlinnaka(at)iki(dot)fi, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-05-13 04:10:36
Message-ID:	20150513041035.GR2523@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki, do you have time to go through this at this point?

Petr Jelinek wrote:

> I merged all your changes and merged the patch#1 with patch#4.
>
> Also I pushed my repo to https://github.com/PJMODOS/postgres/tree/seqam and
> gave you (Heikki) commit rights there in case you want to change anything.

I rebased your #1 to current master; attached. It builds and passes
regression test, but I didn't check any further. Doc build fails with:

$ make check
onsgmls -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D /pgsql/source/master/doc/src/sgml -s /pgsql/source/master/doc/src/sgml/postgres.sgml
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:3:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:44:E: document type does not allow element "LITERAL" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:54:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:53:E: element "STRUCT" undefined
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:71:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:31:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:55:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:12:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:38:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:28:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:54:E: character data is not allowed here
onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:191:8:E: end tag for element "PARA" which is not open
Makefile:320: recipe for target 'check' failed
make: *** [check] Error 1

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
seqam-v9ah.patch	text/x-diff	154.8 KB

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-05-13 10:56:11
Message-ID:	55532DCB.70007@iki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
> Heikki, do you have time to go through this at this point?

I'm afraid I won't :-(. I did intend to, but looking at the calendar, I
won't have the time to review this thoroughly enough to commit. Sorry.

I haven't looked at the CREATE/DROP ACCESS METHOD FOR SEQUENCE syntax
patch at all yet.

We discussed using a single amdata column vs. any number of am-specific
columns. We settled on amdata, but I'm still not 100% convinced that's
the best approach. Just as a data point, this removes the log_cnt field
and moves it into amdata in a non-human-readable format. So for someone
who only uses the local seqam, this just makes things slightly worse.
For more complicated seqam's, it would be even more important to display
the state in a human-readable format. Perhaps it's OK that each seqam
provides its own functions or similar to do that, but I'd like to
revisit that decision.

I still don't like the serial_sequenceam GUC. Not sure what to do
instead. Needs some thought.

- Heikki

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-05-13 11:12:56
Message-ID:	CANP8+jKuBG3C0-ja8uObihZ-2=THHMKj-SgzX8X1WY7Z8GauzA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 13 May 2015 at 11:56, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
>
>> Heikki, do you have time to go through this at this point?
>>
>
> I'm afraid I won't :-(. I did intend to, but looking at the calendar, I
> won't have the time to review this thoroughly enough to commit. Sorry.
>
> I haven't looked at the CREATE/DROP ACCESS METHOD FOR SEQUENCE syntax
> patch at all yet.
>
> We discussed using a single amdata column vs. any number of am-specific
> columns. We settled on amdata, but I'm still not 100% convinced that's the
> best approach. Just as a data point, this removes the log_cnt field and
> moves it into amdata in a non-human-readable format. So for someone who
> only uses the local seqam, this just makes things slightly worse. For more
> complicated seqam's, it would be even more important to display the state
> in a human-readable format. Perhaps it's OK that each seqam provides its
> own functions or similar to do that, but I'd like to revisit that decision.
>
> I still don't like the serial_sequenceam GUC. Not sure what to do instead.
> Needs some thought.

This has had around 2 years of thought at this point. I don't agree it
needs more thought.

There is one clear use case for this and it is of benefit to many
distributed architectures.

I don't see what calamity will occur if we commit this. If you don't want a
sequence AM, don't ever use this.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	hlinnaka(at)iki(dot)fi, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-05-13 11:23:21
Message-ID:	55533429.30603@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 13/05/15 12:56, Heikki Linnakangas wrote:
> On 05/13/2015 07:10 AM, Alvaro Herrera wrote:
>> Heikki, do you have time to go through this at this point?
>
> I'm afraid I won't :-(. I did intend to, but looking at the calendar, I
> won't have the time to review this thoroughly enough to commit. Sorry.
>
> We discussed using a single amdata column vs. any number of am-specific
> columns. We settled on amdata, but I'm still not 100% convinced that's
> the best approach. Just as a data point, this removes the log_cnt field
> and moves it into amdata in a non-human-readable format. So for someone
> who only uses the local seqam, this just makes things slightly worse.
> For more complicated seqam's, it would be even more important to display
> the state in a human-readable format. Perhaps it's OK that each seqam
> provides its own functions or similar to do that, but I'd like to
> revisit that decision.
>

I do think it's ok for seqam to provide functions that can give you the
data in human readable form.

I think main argument against custom human readable columns is that it
will kill any possibility to have common storage for sequences.

There is also additional complexity in the API required for that.

The performance implications are probably small as one could still
define opaque bytea column and store the data same way a now.

> I still don't like the serial_sequenceam GUC. Not sure what to do
> instead. Needs some thought.
>

I think it would be ok if this issue was solved by follow-up patch at
later time.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-05-13 12:15:53
Message-ID:	55534079.70109@iki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05/13/2015 02:12 PM, Simon Riggs wrote:
> This has had around 2 years of thought at this point. I don't agree it
> needs more thought.

Noted.

> There is one clear use case for this and it is of benefit to many
> distributed architectures.

Right. What's your point?

> I don't see what calamity will occur if we commit this. If you don't want a
> sequence AM, don't ever use this.

I'd like the API to be good for its purpose. Also, I did mention that
the the current patch makes the situation slightly worse for people who
never use this: it makes the log_cnt field non human-readable. That's a
really minor thing, but it shows that it *does* matter how this is
implemented, even if you only ever use the local seq AM.

- Heikki

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	hlinnaka(at)iki(dot)fi, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-05-13 13:45:47
Message-ID:	5553558B.2000500@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 13/05/15 14:15, Heikki Linnakangas wrote:
>> I don't see what calamity will occur if we commit this. If you don't
>> want a
>> sequence AM, don't ever use this.
>
> I'd like the API to be good for its purpose. Also, I did mention that
> the the current patch makes the situation slightly worse for people who
> never use this: it makes the log_cnt field non human-readable. That's a
> really minor thing, but it shows that it *does* matter how this is
> implemented, even if you only ever use the local seq AM.
>

It definitely does matter.

I don't think we'll find perfect compromise here though, you can either
do it one way or the other. Trust me it does not make me happy either, I
like perfect solutions too, but when there is lack of perfect solution I
prefer the simpler one.

Both of the solutions have drawbacks
- amdata has opaque blob which does not store data in user visible way
but that can be worked around by providing function that shows it in
human readable way (and the dump function for many sequence types
actually does that).
- multiple columns basically kill any future ability to unify the
storage for sequences and also adds complexity, especially around alter
table (since it means drop/add column and stuff)

But I already wrote both versions anyway so from that perspective it
does not matter much which part we merge.

(As a side-node I would have preferred to have this discussion earlier
than 2 days before feature freeze because the current implementation is
something that we agreed on several months ago so there was plenty of
time to revisit that decision.)

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	hlinnaka(at)iki(dot)fi, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-05-13 16:40:33
Message-ID:	55537E81.3000902@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 13/05/15 06:10, Alvaro Herrera wrote:
>
> I rebased your #1 to current master; attached. It builds and passes

Thanks

> regression test, but I didn't check any further. Doc build fails with:
>
> $ make check
> onsgmls -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D /pgsql/source/master/doc/src/sgml -s /pgsql/source/master/doc/src/sgml/postgres.sgml
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:3:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:44:E: document type does not allow element "LITERAL" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:185:54:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:53:E: element "STRUCT" undefined
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:186:71:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:31:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:187:55:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:12:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:189:38:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:28:E: document type does not allow element "FUNCTION" here; missing one of "REMARK", "SYNOPSIS", "LITERALLAYOUT", "PROGRAMLISTING", "SCREEN", "PARA", "SIMPARA", "BRIDGEHEAD" start-tag
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:190:54:E: character data is not allowed here
> onsgmls:/pgsql/source/master/doc/src/sgml/seqam.sgml:191:8:E: end tag for element "PARA" which is not open
> Makefile:320: recipe for target 'check' failed
> make: *** [check] Error 1
>

Missing <para>, attached fixed version with your rebase and one of the
commits from Heikki that I missed last time.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-seqam-v10.patch	binary/octet-stream	154.0 KB
0002-seqam-ddl-v5.patch	binary/octet-stream	39.5 KB
0003-gapless-sequence-v5.patch	binary/octet-stream	28.6 KB

From:	Vik Fearing <vik(at)2ndquadrant(dot)fr>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	hlinnaka(at)iki(dot)fi, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-06-15 09:32:47
Message-ID:	557E9BBF.9030807@2ndquadrant.fr
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I've been looking at these patches a bit and here are some comments:

Patch 1: seqam

I would like to see an example in the docs for CREATE SEQUENCE. That's
perhaps not possible (or desirable) with only the "local" seqam? Not sure.

In the docs for pg_class, there is no mention that relam refers to
pg_seqam for sequences but pg_am for all other types.

There are errant half-sentences in the documentation, for example "to
the options in the CREATE SEQUENCE or ALTER SEQUENCE statement." in
Sequence Access Method Functions.

I'd prefer a README instead of the long comment at the start of seqam.c.
The other ams do that.

As mentioned upthread, this patch isn't a seamless replacement for
what's already there because of the amdata field. I wasn't part of the
conversation of FOSDEM unfortunately, and there's not enough information
in this thread to know why this solution is preferred over each seqam
having its own table type with all the columns it needs. I see that
Heikki is waffling a bit between the two, and I have a fairly strong
opinion that amdata should be split into separate columns. The patch
already destroys and recreates what it needs when changing access method
via ALTER SEQUENCE, so I don't really see what the problem is.

There is no psql tab completion for the new USING clause in ALTER
SEQUENCE, and in looking at that I noticed we don't have tab completion
for CREATE SEQUENCE at all. I know we don't complete everything, but if
we're going to do ALTER, I think we should do CREATE. I'll submit a
patch for that on its own thread, but then this patch will need to
modify it to include USING.

There is no \d command for sequence access methods. Without querying
pg_seqam directly, how does one discover what's available?

There are some unfinished comments, such as "When relam is specified,
record dependency on the" in heap.c.

Comments and code in pg_dump.c check that the server is at least 9.5.
Those'll need to be changed to at least 9.6.

Patch 2: seqam ddl

When defining a new access method for sequences, it is possible to list
the arguments multiple times (last one wins). Other defel loops raise
an error if the argument is specified more than once. I haven't looked
at all of such loops to see if this is the only odd man out or not, but
I prefer the error behavior.

It doesn't seem possible to create a comment for a seqam, is that
intentional? Ah, after more review I see it is possible, just not
documented. I think that needs to be corrected.

Same comment as above about testing for 9.5.

Patch 3: gapless_seq

I really like the idea of having a gapless sequence in contrib.
Although it has big potential to be abused, doing it manually when it's
needed (like for invoices, at least in France) is a major pain. So big
+1 for including this.

However, the patch doesn't update the main contrib Makefile so you have
to compile it explicitly. It also doesn't have the .sgml file in the
right place, so that's not installed either.

There is a FIXME in get_last_value_tup() which should probably be
removed in light of the code comment in catcache.c stating that pg_seqam
is too small to benefit from an index scan.

It would be nice to be able to specify an access method when declaring a
serial or bigserial column. This could be a separate patch later on,
though.

On the whole, I think this is a pretty good patchset. Aside from the
design decision of whether amdata is a single opaque column or a set of
columns, there are only a few things that need to be changed before it's
ready for committer, and those things are mostly documentation.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Vik Fearing <vik(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	hlinnaka(at)iki(dot)fi, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-06-17 15:51:26
Message-ID:	5581977E.3080302@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2015-06-15 11:32, Vik Fearing wrote:
> I've been looking at these patches a bit and here are some comments:
>

Thanks for looking at this.

> Patch 1: seqam
>
> I would like to see an example in the docs for CREATE SEQUENCE. That's
> perhaps not possible (or desirable) with only the "local" seqam? Not sure.
>

It is possible to have example with local seqam, it might be confusing
though, given it produces same results as not putting USING in the query.

> In the docs for pg_class, there is no mention that relam refers to
> pg_seqam for sequences but pg_am for all other types.
>
> There are errant half-sentences in the documentation, for example "to
> the options in the CREATE SEQUENCE or ALTER SEQUENCE statement." in
> Sequence Access Method Functions.

I think that's the side effect of all the rebases and rewrites over the
2y(!) that this has been going forward and back. It can be easily fixed
by proof reading before final submission. I didn't pay too much
attention yet because it's not clear how the docs should look like if
there is no real agreement on the api. (This applies to other comments
about docs as well)

>
> I'd prefer a README instead of the long comment at the start of seqam.c.
> The other ams do that.
>

OK, since things have been moved to separate directory, README is
doable, I personally prefer the docs in the main .c file usually but I
know project uses README sometimes for this.

> As mentioned upthread, this patch isn't a seamless replacement for
> what's already there because of the amdata field. I wasn't part of the
> conversation of FOSDEM unfortunately, and there's not enough information
> in this thread to know why this solution is preferred over each seqam
> having its own table type with all the columns it needs. I see that
> Heikki is waffling a bit between the two, and I have a fairly strong
> opinion that amdata should be split into separate columns. The patch
> already destroys and recreates what it needs when changing access method
> via ALTER SEQUENCE, so I don't really see what the problem is.
>

FOSDEM was just about agreeing that amdata is simpler after we discussed
it with Heikki. Nothing too important you missed there I guess.

I can try to summarize what are the differences:
- amdata is somewhat simpler in terms of code for both init, alter and
DDL, since with custom columns you have to specify them somehow and deal
with them in catalog, also ALTER SEQUENCE USING means that there are
going to be colums marked as deleted which produces needless waste, etc
- amdata make it easier to change the storage model as the tuple
descriptor is same for all sequences
- the separate columns are much nicer from user point of view
- my opinion is that separate columns also more nicely separate state
from options and I think that if we move to separate storage model,
there can be state table per AM which solves the tuple descriptor issue
- there is probably some slight performance benefit to amdata but I
don't think it's big enough to be important

I personally have slight preference to separate columns design, but I am
ok with both ways honestly.

>
> There is no \d command for sequence access methods. Without querying
> pg_seqam directly, how does one discover what's available?
>

Good point.

>
> Patch 2: seqam ddl
>
> When defining a new access method for sequences, it is possible to list
> the arguments multiple times (last one wins). Other defel loops raise
> an error if the argument is specified more than once. I haven't looked
> at all of such loops to see if this is the only odd man out or not, but
> I prefer the error behavior.
>

Hmm yeah, there should be error. I think only tsearch doesn't enforce
errors from the existing stuff, should probably be fixed as well
(separately of course).

>
> Patch 3: gapless_seq
>
> I really like the idea of having a gapless sequence in contrib.
> Although it has big potential to be abused, doing it manually when it's
> needed (like for invoices, at least in France) is a major pain. So big
> +1 for including this.
>

Yeah, people make gapless sequences regardless, it's better to provide
them one that behaves correctly, also it's quite good test for the API.

>
> It would be nice to be able to specify an access method when declaring a
> serial or bigserial column. This could be a separate patch later on,
> though.
>

The patch originally had GUC for this, but Heikki didn't like it so it's
left for the future developments.

>
> On the whole, I think this is a pretty good patchset. Aside from the
> design decision of whether amdata is a single opaque column or a set of
> columns, there are only a few things that need to be changed before it's
> ready for committer, and those things are mostly documentation.
>

Unfortunately the amdata being opaque vs set of columns is the main
issue here.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-07-28 18:11:36
Message-ID:	55B7C5D8.80502@iki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

So, we have this patch in the commitfest again. Let's see where we are,
and try to find a consensus on what needs to be done before this can be
committed.

On 06/17/2015 06:51 PM, Petr Jelinek wrote:
> On 2015-06-15 11:32, Vik Fearing wrote:
>> I've been looking at these patches a bit and here are some comments:
>
> Thanks for looking at this.

+1, thanks Vik.

>> As mentioned upthread, this patch isn't a seamless replacement for
>> what's already there because of the amdata field. I wasn't part of the
>> conversation of FOSDEM unfortunately, and there's not enough information
>> in this thread to know why this solution is preferred over each seqam
>> having its own table type with all the columns it needs. I see that
>> Heikki is waffling a bit between the two, and I have a fairly strong
>> opinion that amdata should be split into separate columns. The patch
>> already destroys and recreates what it needs when changing access method
>> via ALTER SEQUENCE, so I don't really see what the problem is.
>
> FOSDEM was just about agreeing that amdata is simpler after we discussed
> it with Heikki. Nothing too important you missed there I guess.
>
> I can try to summarize what are the differences:
> - amdata is somewhat simpler in terms of code for both init, alter and
> DDL, since with custom columns you have to specify them somehow and deal
> with them in catalog, also ALTER SEQUENCE USING means that there are
> going to be colums marked as deleted which produces needless waste, etc
> - amdata make it easier to change the storage model as the tuple
> descriptor is same for all sequences
> - the separate columns are much nicer from user point of view
> - my opinion is that separate columns also more nicely separate state
> from options and I think that if we move to separate storage model,
> there can be state table per AM which solves the tuple descriptor issue
> - there is probably some slight performance benefit to amdata but I
> don't think it's big enough to be important
>
> I personally have slight preference to separate columns design, but I am
> ok with both ways honestly.

Regarding the amdata issue, I'm also leaning towards set of columns.
I've felt that way all along, but not very strongly, so I relented at
some point when Andres felt strongly that a single column would be
better. But the more I think about it, the more I feel that separate
columns really would be better. As evidence, I offer this recent thread:

Tom said
(http://www.postgresql.org/message-id/8739.1436893588@sss.pgh.pa.us):
> I really don't see what's wrong with "SELECT last_value FROM sequence",
> especially since that has worked in every Postgres version since 6.x.
> Anyone slightly worried about backwards compatibility wouldn't use
> an equivalent function even if we did add one.

If we went with the single amdata column, that would break. Or we'd need
to leave last_value as a separate column anyway, and leave it unused for
sequence AMs where it's not applicable. But that's a bit ugly too.

Jim Nasby said in the same thread:
> FWIW, I think it'd be better to have a pg_sequences view that's the
> equivalent of SELECT * FROM <sequence> for every sequence in the
> database. That would let you get whatever info you needed.

Creating such a view would be difficult if all the sequences have a
different set of columns. But when you think about it, it's not really
any better with a single amdata column. You can't easily access the data
in the amdata column that way either.

Anyway, that's my opinion. Several others have weighed in to support
separate columns, too, so I think that is the consensus. Separate
columns it is.

>> There is no \d command for sequence access methods. Without querying
>> pg_seqam directly, how does one discover what's available?
>
> Good point.

Well, you can query pg_seqam. I don't think this deserves a \d command.

>> On the whole, I think this is a pretty good patchset. Aside from the
>> design decision of whether amdata is a single opaque column or a set of
>> columns, there are only a few things that need to be changed before it's
>> ready for committer, and those things are mostly documentation.
>
> Unfortunately the amdata being opaque vs set of columns is the main
> issue here.

There was discussion on another thread on how the current sequence AM
API is modeled after the indexam API, at
http://www.postgresql.org/message-id/3896.1437059303@sss.pgh.pa.us. Will
need to do something about that too.

Petr, is this enough feedback on this patch for this commitfest, or are
there some other issues you want to discuss before I mark this as returned?

- Heikki

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-07-28 18:51:22
Message-ID:	55B7CF2A.5010602@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2015-07-28 20:11, Heikki Linnakangas wrote:
>
> Petr, is this enough feedback on this patch for this commitfest, or are
> there some other issues you want to discuss before I mark this as returned?
>

You can mark it as returned, I didn't have much time to actually do much
useful work on this in the current CF.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Thom Brown <thom(at)linux(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-09-16 11:21:32
Message-ID:	CAA-aLv5ed-_BSc_8m3wsm5cRXuCBODiiui4kCiPOKP0ye2CctQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 28 July 2015 at 19:51, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> On 2015-07-28 20:11, Heikki Linnakangas wrote:
>
>>
>> Petr, is this enough feedback on this patch for this commitfest, or are
>> there some other issues you want to discuss before I mark this as
>> returned?
>>
>>
> You can mark it as returned, I didn't have much time to actually do much
> useful work on this in the current CF.

Is this now dependant on the work Alexander Korotkov is doing on the AM
interface?

Thom

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Thom Brown <thom(at)linux(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-09-17 02:36:11
Message-ID:	55FA271B.8000407@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2015-09-16 13:21, Thom Brown wrote:
> On 28 July 2015 at 19:51, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>
> On 2015-07-28 20:11, Heikki Linnakangas wrote:
>
>
> Petr, is this enough feedback on this patch for this commitfest,
> or are
> there some other issues you want to discuss before I mark this
> as returned?
>
>
> You can mark it as returned, I didn't have much time to actually do
> much useful work on this in the current CF.
>
>
> Is this now dependant on the work Alexander Korotkov is doing on the AM
> interface?
>

Yes.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2015-12-31 23:51:51
Message-ID:	5685BF97.7080800@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

here is another try at this. I had some more time to think about various
corner cases (like the sequence not having TOAST).

I decided for this version to try bit different approach and that is to
use custom datatype for the sequence access method state. So there is
one extra amstate column and the access method defines the custom type.

The idea is that the custom type should be fixed width so that we can do
proper checks during the creation of access method for size and stuff.
It also makes the interface bit cleaner as we can provide method for
reading and saving of the state and abstract the internals more nicely.
It also solves the issues with critical sections vs memory allocations
when the access methods wants to save completely new state.

It also means that get_state and set_state interfaces just work with
text representation of custom type, no need to have complex conversions
to arrays and back.

One downside is that this means access methods can't use composite types
as those are not fixed width (nor plain) so the names of the individual
items in the output are somewhat hidden (see the modified create_view
test for example of how this affects things).

Other than that, this is based on the new am api by Alexander Korotkov
[1]. It extends it by adding another column called amkind to the pg_am
which can have either value "i" for index or "S" for sequence (same as
relkind in pg_class for those).

I didn't attach DDL and the gapless sequence yet, mainly because I don't
want to waste time rewriting it again in case we can't agree that this
approach is good (based on the long discussion and resulting several
rewrites wrt the multiple columns vs bytea previously).

[1] https://commitfest.postgresql.org/8/336/

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
seqam-2015-12-31.patch	binary/octet-stream	151.3 KB

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Vik Fearing <vik(at)2ndquadrant(dot)fr>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-18 08:17:37
Message-ID:	CAMsr+YETwmoL2tGa8Zvsq8D8VM=oMPM1zYPMYZge-3tzpCG29A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 1 January 2016 at 07:51, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

>
> Other than that, this is based on the new am api by Alexander Korotkov
> [1]. It extends it by adding another column called amkind to the pg_am
> which can have either value "i" for index or "S" for sequence (same as
> relkind in pg_class for those).
>

This patch will no longer apply after 65c5fcd353 (
http://github.com/postgres/postgres/commit/65c5fcd353) as outlined in
http://www.postgresql.org/message-id/10804.1453077804@sss.pgh.pa.us .

Setting waiting-on-author in the CF app.

The good news is that the commit of the pg_am rework greatly eases the path
of this patch into core.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-18 08:19:20
Message-ID:	20160118081920.1299.35459.pgcf@coridan.postgresql.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Needs rework after the commit of https://commitfest.postgresql.org/8/336/

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-27 16:56:28
Message-ID:	CALLjQTTLxNJXQ3oPFSmp7eACK+HMC1qM4grBWo7Zc73mOAd19Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 18 January 2016 at 09:19, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> Needs rework after the commit of https://commitfest.postgresql.org/8/336/

Here is version that applies to current master. There is some work to
do (mostly cleanup) and the DDL is missing, but that's because I want
to know what people think about the principle of how it works now and
if it makes sense to finish it this way (explained in the original
submission for Jan CF).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-seqam-2015-01-28.patch	application/octet-stream	155.2 KB

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-29 12:00:37
Message-ID:	20160129120037.GA764006@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Petr Jelinek wrote:
> On 18 January 2016 at 09:19, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> > Needs rework after the commit of https://commitfest.postgresql.org/8/336/
>
> Here is version that applies to current master. There is some work to
> do (mostly cleanup) and the DDL is missing, but that's because I want
> to know what people think about the principle of how it works now and
> if it makes sense to finish it this way (explained in the original
> submission for Jan CF).

I would guess that the DDL boilterplate should come from Alexander
Korotkov's patch, right? I think a first easy step may be to combine
parts both patches so that we get the "amkind" column from this patch
and the DDL support from Alexander's patch (means that his proposed
command needs a new clause to specify the amkind); then the really tough
stuff in both Alexander's and this patch can be rebased on top of that.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-29 13:48:34
Message-ID:	94685.1454075314@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> I would guess that the DDL boilterplate should come from Alexander
> Korotkov's patch, right? I think a first easy step may be to combine
> parts both patches so that we get the "amkind" column from this patch
> and the DDL support from Alexander's patch (means that his proposed
> command needs a new clause to specify the amkind);

Uh, what? Surely we would provide a bespoke command for each possible
sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to check
that the provided function has the right signature, and then it would put
the correct amkind into the pg_am entry automatically.

If we skimp on this infrastructure then we're just relying on the
user not to make a typo, which doesn't seem like a good idea.

(Agreed though that a reasonable first step would be to add amkind to
pg_am and make the appropriate adjustments to existing code, ie check
that amkind is correct when fetching an index handler. I considered
putting that into the AM API refactor patch, but desisted.)

regards, tom lane

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-29 14:11:21
Message-ID:	CALLjQTT6b-DuP5daQKo7A7xBw3X6tONhK-m4nVYE+PbvVci8yw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 29 January 2016 at 14:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> I would guess that the DDL boilterplate should come from Alexander
>> Korotkov's patch, right? I think a first easy step may be to combine
>> parts both patches so that we get the "amkind" column from this patch
>> and the DDL support from Alexander's patch (means that his proposed
>> command needs a new clause to specify the amkind);
>
> Uh, what? Surely we would provide a bespoke command for each possible
> sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to check
> that the provided function has the right signature, and then it would put
> the correct amkind into the pg_am entry automatically.
>
> If we skimp on this infrastructure then we're just relying on the
> user not to make a typo, which doesn't seem like a good idea.
>

Yeah it has to be completely separate command for both that do
completely different sanity checks. It can't even be called CREATE
INDEX/SEQUENCE ACCESS METHOD unless we are willing to make ACCESS a
keyword due to preexisting CREATE INDEX/SEQUENCE <name> commands. The
previous version of the patch which used somewhat different underlying
catalog structure already had DDL and honestly writing the DDL part is
quite easy. I used CREATE ACCESS METHOD FOR SEQUENCE there.

> (Agreed though that a reasonable first step would be to add amkind to
> pg_am and make the appropriate adjustments to existing code, ie check
> that amkind is correct when fetching an index handler. I considered
> putting that into the AM API refactor patch, but desisted.)
>

Well I was wondering how to handle this as well, the submitted patch
currently just adds Asserts, because IMHO the actual ERROR should be
thrown in the DDL not in the code that just uses it.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-29 15:36:46
Message-ID:	20160129153645.GA773178@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Petr Jelinek wrote:
> On 29 January 2016 at 14:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:

> > Uh, what? Surely we would provide a bespoke command for each possible
> > sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to check
> > that the provided function has the right signature, and then it would put
> > the correct amkind into the pg_am entry automatically.

I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
like that.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-29 16:03:16
Message-ID:	94895.1454083396@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>> On 29 January 2016 at 14:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Uh, what? Surely we would provide a bespoke command for each possible
> sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to check
> that the provided function has the right signature, and then it would put
> the correct amkind into the pg_am entry automatically.

> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
> like that.

That seems okay. I had the impression you were proposing
CREATE ACCESS METHOD foobar TYPE 'i' USING functionname
or something like that, where it would be totally up to the user that
the amkind matched the function. That seems unnecessarily error-prone,
not to mention that it would close the door forever on any hope that
we might allow non-superusers to execute such commands.

regards, tom lane

From:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-29 20:55:55
Message-ID:	CAPpHfdvH3G1WmevJv7P5aZzRO9uYhxEOHHW+2XxUQ-=+LBsbZA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:

> Petr Jelinek wrote:
> > On 29 January 2016 at 14:48, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>
> > > Uh, what? Surely we would provide a bespoke command for each possible
> > > sort of handler. As an example, CREATE INDEX ACCESS METHOD ought to
> check
> > > that the provided function has the right signature, and then it would
> put
> > > the correct amkind into the pg_am entry automatically.
>
> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
> like that.
>

Ok! Let's nail down the syntax and I can integrate it into my createam
patch.
I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
handler;", but I don't insist.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-29 22:24:07
Message-ID:	66776.1454106247@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
> wrote:
>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>> like that.

> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
> handler;", but I don't insist.

I think that Alvaro's idea is less likely to risk future grammar
conflicts. We've had enough headaches from CREATE [ UNIQUE ] INDEX
[ CONCURRENTLY ] to make me really wary of variables in the statement-name
part of the syntax.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-29 22:59:52
Message-ID:	CA+TgmoYy92CO1tcpDG-hd=pxPBcyqcyetFqeW=2Cu2H8st0VKA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
>> wrote:
>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>> like that.
>
>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>> handler;", but I don't insist.
>
> I think that Alvaro's idea is less likely to risk future grammar
> conflicts. We've had enough headaches from CREATE [ UNIQUE ] INDEX
> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
> part of the syntax.

Strong +1. If we put the type of access method immediately after
CREATE, I'm almost positive we'll regret it for exactly that reason.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-30 12:37:51
Message-ID:	CALLjQTQ8cf75FskqfGaSR5sGzbVNNUEPnKfZg_Gikw1PgNfDoA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 29 January 2016 at 23:59, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
>>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
>>> wrote:
>>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>>> like that.
>>
>>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>>> handler;", but I don't insist.
>>
>> I think that Alvaro's idea is less likely to risk future grammar
>> conflicts. We've had enough headaches from CREATE [ UNIQUE ] INDEX
>> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
>> part of the syntax.
>
> Strong +1. If we put the type of access method immediately after
> CREATE, I'm almost positive we'll regret it for exactly that reason.
>

Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
conflict now, that's why my proposal was different, I didn't want to
add more keywords. I think Alvaro's proposal is fine as well.

The other point is that we are creating ACCESS METHOD object so that's
what should be after CREATE.

In any case this is slightly premature IMHO as DDL is somewhat unless
until we have sequence access methods implementation we can agree on,
or the generic WAL logging so that custom indexes can be crash safe.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-30 12:48:37
Message-ID:	CA+TgmoZBMHRNe9ErmcFgpd00cTV7+2B0xkhM2k_EGr=QC92BdQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
> conflict now, that's why my proposal was different, I didn't want to
> add more keywords. I think Alvaro's proposal is fine as well.

I missed your proposal, I guess, so please don't take as having any
position on whether it's better or worse than Alvaro's. I was only
intending to vote for the proposition that the type of access method
should follow the name of the access method.

> The other point is that we are creating ACCESS METHOD object so that's
> what should be after CREATE.

Agreed.

> In any case this is slightly premature IMHO as DDL is somewhat unless
> until we have sequence access methods implementation we can agree on,
> or the generic WAL logging so that custom indexes can be crash safe.

Generic WAL logging seems like a *great* idea to me. But I think it's
largely independent from the access method stuff. If we have generic
WAL logging, people can create WAL-logged stuff that is not a new
access method. If we have access methods, they can create new access
methods that are not WAL-logged. If we have both, then they can
create WAL-logged access methods which of course is the payoff pitch,
but I don't see it as necessary or desirable for the two systems to be
tied together in any way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-01-30 13:03:47
Message-ID:	CALLjQTSUtMK9YvcrXWeYywdpuAgGNWVuB2OcjpVewUbH_dbVew@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 30 January 2016 at 13:48, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Jan 30, 2016 at 7:37 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
>> conflict now, that's why my proposal was different, I didn't want to
>> add more keywords. I think Alvaro's proposal is fine as well.
>
> I missed your proposal, I guess, so please don't take as having any
> position on whether it's better or worse than Alvaro's. I was only
> intending to vote for the proposition that the type of access method
> should follow the name of the access method.
>

No worries didn't mean it that way.

>> In any case this is slightly premature IMHO as DDL is somewhat unless
>> until we have sequence access methods implementation we can agree on,
>> or the generic WAL logging so that custom indexes can be crash safe.
>
> Generic WAL logging seems like a *great* idea to me. But I think it's
> largely independent from the access method stuff. If we have generic
> WAL logging, people can create WAL-logged stuff that is not a new
> access method. If we have access methods, they can create new access
> methods that are not WAL-logged. If we have both, then they can
> create WAL-logged access methods which of course is the payoff pitch,
> but I don't see it as necessary or desirable for the two systems to be
> tied together in any way.

Okay, I am only debating the usefulness of DDL for access methods in
the current situation where we only have custom index access methods
which can't create WAL records. In other words trying to nudge people
slightly back towards the actual patch(es).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-02-16 12:58:48
Message-ID:	CAPpHfdvRsfqe+aPvxYvSNJUvuXEbS6FOQdcZzGOAgVWjC23Z4g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 30, 2016 at 3:37 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:

> On 29 January 2016 at 23:59, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> >>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera <
> alvherre(at)2ndquadrant(dot)com>
> >>> wrote:
> >>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or
> something
> >>>> like that.
> >>
> >>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name
> HANDLER
> >>> handler;", but I don't insist.
> >>
> >> I think that Alvaro's idea is less likely to risk future grammar
> >> conflicts. We've had enough headaches from CREATE [ UNIQUE ] INDEX
> >> [ CONCURRENTLY ] to make me really wary of variables in the
> statement-name
> >> part of the syntax.
> >
> > Strong +1. If we put the type of access method immediately after
> > CREATE, I'm almost positive we'll regret it for exactly that reason.
> >
>
> Just as a note, CREATE SEQUENCE ACCESS METHOD already causes grammar
> conflict now, that's why my proposal was different, I didn't want to
> add more keywords. I think Alvaro's proposal is fine as well.
>
> The other point is that we are creating ACCESS METHOD object so that's
> what should be after CREATE.
>
> In any case this is slightly premature IMHO as DDL is somewhat unless
> until we have sequence access methods implementation we can agree on,
> or the generic WAL logging so that custom indexes can be crash safe.

I've changed syntax of CREATE ACCESS METHOD syntax in the thread about
index access method extendability.
Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;". New
column amtype of pg_am stores access method type.
http://www.postgresql.org/message-id/CAPpHfdu9gLN7kuicweGsp50CaAMWx8Q-JWzbPehc92rvFHzkeg@mail.gmail.com
It could be easily extended to "CREATE ACCESS METHOD name TYPE SEQUENCE
HANDLER handler;".

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-05 04:09:24
Message-ID:	56DA5BF4.7040503@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 16/02/16 13:58, Alexander Korotkov wrote:
>
> I've changed syntax of CREATE ACCESS METHOD syntax in the thread about
> index access method extendability.
> Now it is "CREATE ACCESS METHOD name TYPE INDEX HANDLER handler;". New
> column amtype of pg_am stores access method type.
> http://www.postgresql.org/message-id/CAPpHfdu9gLN7kuicweGsp50CaAMWx8Q-JWzbPehc92rvFHzkeg@mail.gmail.com
> It could be easily extended to "CREATE ACCESS METHOD name TYPE SEQUENCE
> HANDLER handler;".
>

Yes I've seen and I will send a review within couple of days.

But first here is updated patch for sequence access methods. I went with
the previously discussed custom type as this gives us proper control
over the width of the state and making sure that it's not gonna be
toastable, etc and we need this as sequence is limited to single page.

Attached are 3 separate files. The first one (0001-create-am) is mainly
separate for the reason that there is some overlap with Alexander's
index access methods patch, so I extracted common code into separate
patch as it will make it easier to merge in case we are lucky enough to
get these patches in (but it's still based on Alexander's code). It
provides infra for defining new access methods from SQL, although
without the parser and without any actual access methods.

The 0002-seqam provides the SQL interface and support for sequence
access methods on top of the infra patch, and also provides all the
sequence access methods infrastructure and abstraction and documentation.

And finally the 0003-gapless-seq is example contrib module that
implements dependably and transitionally safe gapless sequence access
method. It's obviously slow as it has to do locking and basically
serialize all the changes to sequence so only one transaction may use it
at a time but it's truly gapless. It also serves as an example of use of
the api and as a test.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-create-am-infra-2015-03-05.patch	application/x-patch	35.1 KB
0002-seqam-2015-03-05.patch	application/x-patch	183.9 KB
0003-gapless-seq-2015-03-05.patch	application/x-patch	28.4 KB

From:	David Steele <david(at)pgmasters(dot)net>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-16 15:42:05
Message-ID:	56E97ECD.6080906@pgmasters.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 3/4/16 11:09 PM, Petr Jelinek wrote:

> But first here is updated patch for sequence access methods. I went with
> the previously discussed custom type as this gives us proper control
> over the width of the state and making sure that it's not gonna be
> toastable, etc and we need this as sequence is limited to single page.
>
> Attached are 3 separate files. The first one (0001-create-am) is mainly
> separate for the reason that there is some overlap with Alexander's
> index access methods patch, so I extracted common code into separate
> patch as it will make it easier to merge in case we are lucky enough to
> get these patches in (but it's still based on Alexander's code). It
> provides infra for defining new access methods from SQL, although
> without the parser and without any actual access methods.
>
> The 0002-seqam provides the SQL interface and support for sequence
> access methods on top of the infra patch, and also provides all the
> sequence access methods infrastructure and abstraction and documentation.
>
> And finally the 0003-gapless-seq is example contrib module that
> implements dependably and transitionally safe gapless sequence access
> method. It's obviously slow as it has to do locking and basically
> serialize all the changes to sequence so only one transaction may use it
> at a time but it's truly gapless. It also serves as an example of use of
> the api and as a test.

As far as I can see Petr has addressed all the outstanding issues in
this patch and it's ready for a review. The patch applies with some
easily-fixed conflicts:

$ git apply -3 ../other/0002-seqam-2015-03-05.patch
error: patch failed: src/include/catalog/pg_proc.h:5225
Falling back to three-way merge...
Applied patch to 'src/include/catalog/pg_proc.h' with conflicts.

Simon, you are signed up to review. Do you have an idea of when you'll
be doing that?

--
-David
david(at)pgmasters(dot)net

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-19 22:02:05
Message-ID:	20160319220205.GA329479@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Petr Jelinek wrote:

> And finally the 0003-gapless-seq is example contrib module that implements
> dependably and transitionally safe gapless sequence access method. It's
> obviously slow as it has to do locking and basically serialize all the
> changes to sequence so only one transaction may use it at a time but it's
> truly gapless. It also serves as an example of use of the api and as a test.

I'm trying to figure out this one, and I think I found something very
surprising. This code contains this

+#define GAPLESS_SEQ_NAMESPACE "gapless_seq"
+#define VALUES_TABLE_NAME "seqam_gapless_values"

which as far as I understand is something like a side table where values
for all the sequences are stored. Is that right? If it is, I admit I
didn't realize that these sequences worked in this way. This seems
broken to me. I had been under the impression that the values were
always stored in the sequence's relation. Maybe I'm misunderstanding;
please explain how this works.

FWIW I find it annoying that this submission's 0001 patch and
Alexander's 0001 have so much in common, yet they aren't similar enough
to consider that either is the definite form. Also, you have the SGML
docs for CREATE ACCESS METHOD in 0002 instead of 0001, so comparing both
versions isn't trivial.

Anyway I'm back at reviewing Alexander's 0001, which should be okay as a
basis for this series regardless of any edits I suggest there.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-21 05:26:25
Message-ID:	56EF8601.7020008@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 19/03/16 23:02, Alvaro Herrera wrote:
> Petr Jelinek wrote:
>
>> And finally the 0003-gapless-seq is example contrib module that implements
>> dependably and transitionally safe gapless sequence access method. It's
>> obviously slow as it has to do locking and basically serialize all the
>> changes to sequence so only one transaction may use it at a time but it's
>> truly gapless. It also serves as an example of use of the api and as a test.
>
> I'm trying to figure out this one, and I think I found something very
> surprising. This code contains this
>
> +#define GAPLESS_SEQ_NAMESPACE "gapless_seq"
> +#define VALUES_TABLE_NAME "seqam_gapless_values"
>
> which as far as I understand is something like a side table where values
> for all the sequences are stored. Is that right? If it is, I admit I
> didn't realize that these sequences worked in this way. This seems
> broken to me. I had been under the impression that the values were
> always stored in the sequence's relation. Maybe I'm misunderstanding;
> please explain how this works.
>

Yes, I did explain in the original submission that added gapless
sequence, which is about million messages upthread so it's
understandable that it's hard to follow.

The sequence am API is non-transactional (and does not use MVCC) as
sequences are non-trasactional and this provides significant performance
boost. But the gapless sequence needs to handle things like
subtransactions and rollbacks so it needs MVCC, for this reasons it also
uses external table to handle that, the locking and stuff is still
handled using normal sequence api but the MVCC part is handled by side
table yes.

Just as a note to you and anybody who wasn't following in the beginning
of this quite long thread - one of the goals of the sequence am api was
to abstract the actual storage and WAL logging of the sequences away
from the extension, so there is no generic WAL or anything like the
index am provides here. At this point though, it might be worth
reevaluating that approach if the generic WAL gets in.

>
> FWIW I find it annoying that this submission's 0001 patch and
> Alexander's 0001 have so much in common, yet they aren't similar enough
> to consider that either is the definite form. Also, you have the SGML
> docs for CREATE ACCESS METHOD in 0002 instead of 0001, so comparing both
> versions isn't trivial.

Well it's relatively annoying for the patch author as well. I tried to
write it so that it's as easy to merge as possible - the common code is
in my 0001, except for the SQL (gram.y, docs) because the SQL does not
have any meaning until either indexam or seqeunceam gets in (which also
makes it impossible to submit as separate infrastructure patch, because
there is no point of having the code in without the stuff that either
indexam or sequenceam provides on top of it).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-24 21:12:53
Message-ID:	56F45855.3070900@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

I rebased this on top of the recently committed CREATE ACCESS METHOD.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0002-gapless-seq2016-02-24.patch	text/plain	28.4 KB
0001-seqam-2016-03-24.patch	text/plain	169.3 KB

From:	Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-28 19:11:29
Message-ID:	CAFcNs+pKtc6Y9vUO8TYN0vGcAkTgON0iPPKgCTp6S5_j7JwnfQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Mar 24, 2016 at 6:12 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> I rebased this on top of the recently committed CREATE ACCESS METHOD.
>

Hi,

I got the above error trying to apply to the current master:

$ git apply /home/fabrizio/Downloads/0001-seqam-2016-03-24.patch
error: patch failed: src/backend/commands/amcmds.c:29
error: src/backend/commands/amcmds.c: patch does not apply

There are a wrong definition at the beginning of the amcmds.c:

34 <<<<<<< ours
35 static Oid lookup_index_am_handler_func(List *handler_name, char
amtype);
36 static const char *get_am_type_string(char amtype);
37 =======
38 static Oid lookup_am_handler_func(List *handler_name, char amtype);
39 static char *get_am_type_string(char amtype);
40 >>>>>>> theirs

After this small fix I can build and ran regress tests without errors.

But running "check-world" I got the error:

make[1]: Leaving directory `/data/postgresql/src/test/regress'
make: Leaving directory `/data/postgresql'
+ pg_dumpall -f /data/postgresql/src/bin/pg_upgrade/tmp_check/dump1.sql
ok 9 - dropuser foobar1 exit code 0
ok 10 - SQL DROP ROLE run: SQL found in server log
ok 11 - fails with nonexistent user
ok
t/080_pg_isready.pl .......
1..10
ok 1 - pg_isready --help exit code 0
ok 2 - pg_isready --help goes to stdout
ok 3 - pg_isready --help nothing to stderr
ok 4 - pg_isready --version exit code 0
ok 5 - pg_isready --version goes to stdout
ok 6 - pg_isready --version nothing to stderr
ok 7 - pg_isready with invalid option nonzero exit code
ok 8 - pg_isready with invalid option prints error message
ok 9 - fails with no server running
pg_dump: [archiver (db)] query failed: ERROR: column "sequence_name" does
not exist
LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
^
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
increment_by, CASE WHEN increment_by > 0 AND max_value =
9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value = -1
THEN NULL ELSE max_value END AS max_value, CASE WHEN increment_by > 0
AND min_value = 1 THEN NULL WHEN increment_by < 0 AND min_value =
-9223372036854775807 THEN NULL ELSE min_value END AS min_value,
cache_value, is_cycled FROM check_seq
pg_dumpall: pg_dump failed on database "regression", exiting
+ pg_dumpall1_status=1
+ [ /data/postgresql != /data/postgresql ]
+
/data/postgresql/src/bin/pg_upgrade/tmp_check/install//home/fabrizio/pgsql/bin/pg_ctl
-m fast stop
waiting for server to shut down.... done
server stopped
+ [ -n ]
+ [ -n ]
+ [ -n 1 ]
+ echo pg_dumpall of pre-upgrade database cluster failed
pg_dumpall of pre-upgrade database cluster failed
+ exit 1
+ rm -rf /tmp/pg_upgrade_check-3NUa0X
make[2]: *** [check] Error 1
make[2]: Leaving directory `/data/postgresql/src/bin/pg_upgrade'
make[1]: *** [check-pg_upgrade-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....

And testing pg_dump itself I got the same error trying to dump a database
that contains a sequence.

fabrizio=# \q

fabrizio(at)bagual:~/pgsql
$ bin/pg_dump > /tmp/fabrizio.sql
pg_dump: [archiver (db)] query failed: ERROR: column "sequence_name" does
not exist
LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
^
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
increment_by, CASE WHEN increment_by > 0 AND max_value =
9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value = -1
THEN NULL ELSE max_value END AS max_value, CASE WHEN increment_by > 0
AND min_value = 1 THEN NULL WHEN increment_by < 0 AND min_value =
-9223372036854775807 THEN NULL ELSE min_value END AS min_value,
cache_value, is_cycled FROM x

Toninght I'll review some parts of the code.

Regards,

Att,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

From:	David Steele <david(at)pgmasters(dot)net>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	fabriziomello(at)gmail(dot)com, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-29 15:25:24
Message-ID:	56FA9E64.9060601@pgmasters.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Petr,

On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:

> fabrizio(at)bagual:~/pgsql
> $ bin/pg_dump > /tmp/fabrizio.sql
> pg_dump: [archiver (db)] query failed: ERROR: column "sequence_name"
> does not exist
> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
> ^
> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
> increment_by, CASE WHEN increment_by > 0 AND max_value =
> 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value =
> -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN
> increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0
> AND min_value = -9223372036854775807 THEN NULL ELSE min_value END
> AS min_value, cache_value, is_cycled FROM x

It looks like a new patch is needed. I've marked this "waiting on author".

Thanks,
--
-David
david(at)pgmasters(dot)net

From:	Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To:	David Steele <david(at)pgmasters(dot)net>
Cc:	Petr Jelinek <petr(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-29 16:50:05
Message-ID:	CAFcNs+q9VKkiHDXSqM+LWrVnbBc3-tEV=V5q9hFJvMCfYL3mVA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 29, 2016 at 12:25 PM, David Steele <david(at)pgmasters(dot)net> wrote:
>
> Hi Petr,
>
> On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
>
>> fabrizio(at)bagual:~/pgsql
>> $ bin/pg_dump > /tmp/fabrizio.sql
>> pg_dump: [archiver (db)] query failed: ERROR: column "sequence_name"
>> does not exist
>> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
>> ^
>> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
>> increment_by, CASE WHEN increment_by > 0 AND max_value =
>> 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value =
>> -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN
>> increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0
>> AND min_value = -9223372036854775807 THEN NULL ELSE min_value END
>> AS min_value, cache_value, is_cycled FROM x
>
>
> It looks like a new patch is needed. I've marked this "waiting on
author".
>

The attached patches fix the issues previous pointed by me.

But there are other issue in the gapless-seq extension when I ran "make
check":

43 CREATE EXTENSION gapless_seq;
44 CREATE SEQUENCE test_gapless USING gapless;
45 SELECT nextval('test_gapless'::regclass);
46 ! ERROR: could not access status of transaction 1275068416
47 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or
directory.
48 BEGIN;
49 SELECT nextval('test_gapless'::regclass);
50 ! ERROR: could not access status of transaction 1275068416
51 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or
directory.
52 SELECT nextval('test_gapless'::regclass);
53 ! ERROR: current transaction is aborted, commands ignored until end of
transaction block
54 SELECT nextval('test_gapless'::regclass);
55 ! ERROR: current transaction is aborted, commands ignored until end of
transaction block
56 ROLLBACK;
57 SELECT nextval('test_gapless'::regclass);
58 ! ERROR: could not access status of transaction 1275068416
59 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or
directory.

And I see the same running manually:

fabrizio=# create extension gapless_seq;
CREATE EXTENSION
fabrizio=# create sequence x using gapless;
CREATE SEQUENCE
fabrizio=# select nextval('x');
ERROR: could not access status of transaction 1258291200
DETAIL: Could not open file "pg_subtrans/4B00": No such file or directory.

Regards,

Attachment	Content-Type	Size
0001-seqam-2016-03-29.patch	text/x-diff	165.8 KB
0002-gapless-seq-2016-03-29.patch	text/x-diff	28.9 KB

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	fabriziomello(at)gmail(dot)com, David Steele <david(at)pgmasters(dot)net>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-29 17:26:28
Message-ID:	56FABAC4.9050303@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 29/03/16 18:50, Fabrízio de Royes Mello wrote:
>
>
> On Tue, Mar 29, 2016 at 12:25 PM, David Steele <david(at)pgmasters(dot)net
> <mailto:david(at)pgmasters(dot)net>> wrote:
> >
> > Hi Petr,
> >
> > On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
> >
> >> fabrizio(at)bagual:~/pgsql
> >> $ bin/pg_dump > /tmp/fabrizio.sql
> >> pg_dump: [archiver (db)] query failed: ERROR: column "sequence_name"
> >> does not exist
> >> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN i...
> >> ^
> >> pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value,
> >> increment_by, CASE WHEN increment_by > 0 AND max_value =
> >> 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value =
> >> -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN
> >> increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0
> >> AND min_value = -9223372036854775807 THEN NULL ELSE min_value END
> >> AS min_value, cache_value, is_cycled FROM x
> >
> >
> > It looks like a new patch is needed. I've marked this "waiting on
> author".
> >
>

Yeah there were some incompatible commits since my last rebase, fixed,
along with the pg_dump bugs..

>
> But there are other issue in the gapless-seq extension when I ran "make
> check":
>
> 43 CREATE EXTENSION gapless_seq;
> 44 CREATE SEQUENCE test_gapless USING gapless;
> 45 SELECT nextval('test_gapless'::regclass);
> 46 ! ERROR: could not access status of transaction 1275068416
> 47 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or
> directory.
> 48 BEGIN;
> 49 SELECT nextval('test_gapless'::regclass);
> 50 ! ERROR: could not access status of transaction 1275068416
> 51 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or
> directory.
> 52 SELECT nextval('test_gapless'::regclass);
> 53 ! ERROR: current transaction is aborted, commands ignored until
> end of transaction block
> 54 SELECT nextval('test_gapless'::regclass);
> 55 ! ERROR: current transaction is aborted, commands ignored until
> end of transaction block
> 56 ROLLBACK;
> 57 SELECT nextval('test_gapless'::regclass);
> 58 ! ERROR: could not access status of transaction 1275068416
> 59 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or
> directory.
>
>
> And I see the same running manually:
>
> fabrizio=# create extension gapless_seq;
> CREATE EXTENSION
> fabrizio=# create sequence x using gapless;
> CREATE SEQUENCE
> fabrizio=# select nextval('x');
> ERROR: could not access status of transaction 1258291200
> DETAIL: Could not open file "pg_subtrans/4B00": No such file or directory.
>
> Regards,
>

Hmm I am unable to reproduce this. What OS? Any special configure flags
you use?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0002-gapless-seq-2016-03-29.patch	text/x-diff	28.4 KB
0001-seqam-2016-03-29.patch	text/x-diff	170.5 KB

From:	Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-29 17:46:53
Message-ID:	CAFcNs+r-8JAX-WrK+CZsjXMCqRZo7pp0iujj-q9Uei7ysOX8=g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 29, 2016 at 2:26 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 29/03/16 18:50, Fabrízio de Royes Mello wrote:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 12:25 PM, David Steele <david(at)pgmasters(dot)net
>> <mailto:david(at)pgmasters(dot)net>> wrote:
>> >
>> > Hi Petr,
>> >
>> > On 3/28/16 3:11 PM, Fabrízio de Royes Mello wrote:
>> >
>> >> fabrizio(at)bagual:~/pgsql
>> >> $ bin/pg_dump > /tmp/fabrizio.sql
>> >> pg_dump: [archiver (db)] query failed: ERROR: column "sequence_name"
>> >> does not exist
>> >> LINE 1: SELECT sequence_name, start_value, increment_by, CASE WHEN
i...
>> >> ^
>> >> pg_dump: [archiver (db)] query was: SELECT sequence_name,
start_value,
>> >> increment_by, CASE WHEN increment_by > 0 AND max_value =
>> >> 9223372036854775807 THEN NULL WHEN increment_by < 0 AND
max_value =
>> >> -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN
>> >> increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by
< 0
>> >> AND min_value = -9223372036854775807 THEN NULL ELSE min_value
END
>> >> AS min_value, cache_value, is_cycled FROM x
>> >
>> >
>> > It looks like a new patch is needed. I've marked this "waiting on
>> author".
>> >
>>
>
> Yeah there were some incompatible commits since my last rebase, fixed,
along with the pg_dump bugs..
>

Now all applies without errors, build and "make check" too.

>> But there are other issue in the gapless-seq extension when I ran "make
>> check":
>>
>> 43 CREATE EXTENSION gapless_seq;
>> 44 CREATE SEQUENCE test_gapless USING gapless;
>> 45 SELECT nextval('test_gapless'::regclass);
>> 46 ! ERROR: could not access status of transaction 1275068416
>> 47 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or
>> directory.
>> 48 BEGIN;
>> 49 SELECT nextval('test_gapless'::regclass);
>> 50 ! ERROR: could not access status of transaction 1275068416
>> 51 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or
>> directory.
>> 52 SELECT nextval('test_gapless'::regclass);
>> 53 ! ERROR: current transaction is aborted, commands ignored until
>> end of transaction block
>> 54 SELECT nextval('test_gapless'::regclass);
>> 55 ! ERROR: current transaction is aborted, commands ignored until
>> end of transaction block
>> 56 ROLLBACK;
>> 57 SELECT nextval('test_gapless'::regclass);
>> 58 ! ERROR: could not access status of transaction 1275068416
>> 59 ! DETAIL: Could not open file "pg_subtrans/4C00": No such file or
>> directory.
>>
>>
>> And I see the same running manually:
>>
>> fabrizio=# create extension gapless_seq;
>> CREATE EXTENSION
>> fabrizio=# create sequence x using gapless;
>> CREATE SEQUENCE
>> fabrizio=# select nextval('x');
>> ERROR: could not access status of transaction 1258291200
>> DETAIL: Could not open file "pg_subtrans/4B00": No such file or
directory.
>>
>> Regards,
>>
>
> Hmm I am unable to reproduce this. What OS? Any special configure flags
you use?
>

In my environment the error remains with your last patches.

I didn't use any special.

./configure --prefix=/home/fabrizio/pgsql --enable-cassert
--enable-coverage --enable-tap-tests --enable-depend
make -s -j8 install
make check-world

My environment:

fabrizio(at)bagual:/d/postgresql (0002-gapless-seq-petr)
$ uname -a
Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux

fabrizio(at)bagual:/d/postgresql (0002-gapless-seq-petr)
$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Regards,

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	fabriziomello(at)gmail(dot)com
Cc:	David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-29 19:59:42
Message-ID:	56FADEAE.1030602@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>
> >
> > Hmm I am unable to reproduce this. What OS? Any special configure
> flags you use?
> >
>
> In my environment the error remains with your last patches.
>
> I didn't use any special.
>
> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
> --enable-coverage --enable-tap-tests --enable-depend
> make -s -j8 install
> make check-world
>
>
> My environment:
>
> fabrizio(at)bagual:/d/postgresql (0002-gapless-seq-petr)
> $ uname -a
> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
> 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> fabrizio(at)bagual:/d/postgresql (0002-gapless-seq-petr)
> $ gcc --version
> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
> Copyright (C) 2013 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>

Hmm nothing special indeed, still can't reproduce, I did one blind try
for a fix. Can you test with attached?

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0002-gapless-seq-2016-03-29-2.patch	text/x-diff	28.4 KB

From:	Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-29 20:08:02
Message-ID:	CAFcNs+o=O4yp2SYBMZHKnai83CeyBj4yMONOj0KQ3_oOOkrCyg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>>
>>
>> >
>> > Hmm I am unable to reproduce this. What OS? Any special configure
>> flags you use?
>> >
>>
>> In my environment the error remains with your last patches.
>>
>> I didn't use any special.
>>
>> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>> --enable-coverage --enable-tap-tests --enable-depend
>> make -s -j8 install
>> make check-world
>>
>>
>> My environment:
>>
>> fabrizio(at)bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ uname -a
>> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
>> 2016 x86_64 x86_64 x86_64 GNU/Linux
>>
>> fabrizio(at)bagual:/d/postgresql (0002-gapless-seq-petr)
>> $ gcc --version
>> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>> Copyright (C) 2013 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is
NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
>>
>
> Hmm nothing special indeed, still can't reproduce, I did one blind try
for a fix. Can you test with attached?
>

Same error... Now I'll leave in a trip but when I arrive I'll try to figure
out what happening debugging the code.

Regards,

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	fabriziomello(at)gmail(dot)com
Cc:	David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-29 20:58:45
Message-ID:	56FAEC85.9080305@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 29/03/16 22:08, Fabrízio de Royes Mello wrote:
>
>
> On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> > On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
> >>
> >>
> >> >
> >> > Hmm I am unable to reproduce this. What OS? Any special configure
> >> flags you use?
> >> >
> >>
> >> In my environment the error remains with your last patches.
> >>
> >> I didn't use any special.
> >>
> >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
> >> --enable-coverage --enable-tap-tests --enable-depend
> >> make -s -j8 install
> >> make check-world
> >>
> >>
> >> My environment:
> >>
> >> fabrizio(at)bagual:/d/postgresql (0002-gapless-seq-petr)
> >> $ uname -a
> >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37 UTC
> >> 2016 x86_64 x86_64 x86_64 GNU/Linux
> >>
> >> fabrizio(at)bagual:/d/postgresql (0002-gapless-seq-petr)
> >> $ gcc --version
> >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
> >> Copyright (C) 2013 Free Software Foundation, Inc.
> >> This is free software; see the source for copying conditions. There
> is NO
> >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> PURPOSE.
> >>
> >
> > Hmm nothing special indeed, still can't reproduce, I did one blind
> try for a fix. Can you test with attached?
> >
>
> Same error... Now I'll leave in a trip but when I arrive I'll try to
> figure out what happening debugging the code.
>

Okay, thanks.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-30 04:06:34
Message-ID:	CAFcNs+pfX2+9LBECxde=UuufMdh=5cN=WmP+tSGhBwHnWxEhLg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 29, 2016 at 5:58 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> On 29/03/16 22:08, Fabrízio de Royes Mello wrote:
>>
>>
>>
>> On Tue, Mar 29, 2016 at 4:59 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com
>> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
>> >
>> > On 29/03/16 19:46, Fabrízio de Royes Mello wrotez
>> >>
>> >>
>> >> >
>> >> > Hmm I am unable to reproduce this. What OS? Any special configure
>> >> flags you use?
>> >> >
>> >>
>> >> In my environment the error remains with your last patches.
>> >>
>> >> I didn't use any special.
>> >>
>> >> ./configure --prefix=/home/fabrizio/pgsql --enable-cassert
>> >> --enable-coverage --enable-tap-tests --enable-depend
>> >> make -s -j8 install
>> >> make check-world
>> >>
>> >>
>> >> My environment:
>> >>
>> >> fabrizio(at)bagual:/d/postgresql (0002-gapless-seq-petr)
>> >> $ uname -a
>> >> Linux bagual 3.13.0-83-generic #127-Ubuntu SMP Fri Mar 11 00:25:37
UTC
>> >> 2016 x86_64 x86_64 x86_64 GNU/Linux
>> >>
>> >> fabrizio(at)bagual:/d/postgresql (0002-gapless-seq-petr)
>> >> $ gcc --version
>> >> gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
>> >> Copyright (C) 2013 Free Software Foundation, Inc.
>> >> This is free software; see the source for copying conditions. There
>> is NO
>> >> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
>> PURPOSE.
>> >>
>> >
>> > Hmm nothing special indeed, still can't reproduce, I did one blind
>> try for a fix. Can you test with attached?
>> >
>>
>> Same error... Now I'll leave in a trip but when I arrive I'll try to
>> figure out what happening debugging the code.
>>
>
> Okay, thanks.
>

Hi,

After some debugging seems that the "seqstate->xid" in "wait_for_sequence"
isn't properly initialized or initialized with the wrong value (1258291200).

Regards,

From:	Jose Luis Tallon <jltallon(at)adv-solutions(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-30 13:22:41
Message-ID:	20160330132241.1315.90869.pgcf@coridan.postgresql.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

[Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
Needs updating code copyright years ... or is this really from 2013? [ contrib/gapless_seq/gapless_seq.c ]
Patch applies cleanly to current master (3063e7a84026ced2aadd2262f75eebbe6240f85b)
It does compile cleanly.

DESIGN
The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq") and VALUES_TABLE_NAME ("seqam_gapless_values") strikes me a bit: while I understand the creation of a private schema named after the extension, it seems overkill for just a single table.
I would suggest to devote some reserved schema name for internal implementation details and/or AM implementation details, if deemed reasonable.
On the other hand, creating the schema/table upon extension installation makes the values table use the default tablespace for the database, which can be good for concurrency --- direct writes to less loaded storage
(Note that users may want to move this table into an SSD-backed tablespace or equivalently fast storage ... moreso when many --not just one-- gapless sequences are being used)

Yet, there is <20141203102425(dot)GT2456(at)alap3(dot)anarazel(dot)de> where Andres argues against anything different than one-page-per-sequence implementations.
I guess this patch changes everything in this respect.

CODE
seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 restart_value,
bool restart_requested, bool is_init)
-> is_init sounds confusing; suggest renaming to "initialize" or "initial" to avoid reading as "is_initIALIZED"

DEPENDS ON 0001-seqam-v10.patch , which isn't commited yet --- and it doesn't apply cleanly to current git master.
Please update/rebase the patch and resubmit.

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	Jose Luis Tallon <jltallon(at)adv-solutions(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-30 15:32:37
Message-ID:	56FBF195.8010702@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Thanks for review.

On 30/03/16 15:22, Jose Luis Tallon wrote:
> [Partial review] Evaluated: 0002-gapless-seq-2016-03-29-2.patch
> Needs updating code copyright years ... or is this really from 2013? [ contrib/gapless_seq/gapless_seq.c ]
> Patch applies cleanly to current master (3063e7a84026ced2aadd2262f75eebbe6240f85b)
>

Ouch good point, it does show how long the whole sequence am thing has
been around.

> DESIGN
> The decision to hardcode the schema GAPLESS_SEQ_NAMESPACE ("gapless_seq") and VALUES_TABLE_NAME ("seqam_gapless_values") strikes me a bit: while I understand the creation of a private schema named after the extension, it seems overkill for just a single table.
> I would suggest to devote some reserved schema name for internal implementation details and/or AM implementation details, if deemed reasonable.
> On the other hand, creating the schema/table upon extension installation makes the values table use the default tablespace for the database, which can be good for concurrency --- direct writes to less loaded storage
> (Note that users may want to move this table into an SSD-backed tablespace or equivalently fast storage ... moreso when many --not just one-- gapless sequences are being used)
>

Schema is needed for the handler function as well. In general I don't
want to add another internal schema that will be empty when no sequence
AM is installed.

> Yet, there is <20141203102425(dot)GT2456(at)alap3(dot)anarazel(dot)de> where Andres argues against anything different than one-page-per-sequence implementations.
> I guess this patch changes everything in this respect.

Not really, gapless just needs table for transactionality and as such is
an exception. It does not change how the nontransactional sequence
storage works though. I agree with Andres on this one.

> CODE
> seqam_gapless_init(Relation seqrel, Form_pg_sequence seq, int64 restart_value,
> bool restart_requested, bool is_init)
> -> is_init sounds confusing; suggest renaming to "initialize" or "initial" to avoid reading as "is_initIALIZED"

Sounds good.

> DEPENDS ON 0001-seqam-v10.patch , which isn't commited yet --- and it doesn't apply cleanly to current git master.
> Please update/rebase the patch and resubmit.
>

The current version of seqam is 0001-seqam-2016-03-29 which should apply
correctly.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	fabriziomello(at)gmail(dot)com
Cc:	David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-03-31 12:19:17
Message-ID:	56FD15C5.7070605@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

new version attached that should fix the issue. It was alignment -
honestly don't know what I was thinking using fixed alignment when the
AMs can define their own type.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0002-gapless-seq-2016-03-31.patch	text/x-diff	28.6 KB
0001-seqam-2016-03-31.patch	text/x-diff	170.8 KB

From:	Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-04-04 13:53:13
Message-ID:	CAFcNs+of-yp13AwXp7waP40O_9YFG8TqPLovKL-uVYkp+1jc3w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Mar 31, 2016 at 9:19 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> new version attached that should fix the issue. It was alignment -
honestly don't know what I was thinking using fixed alignment when the AMs
can define their own type.
>

Yeah... now all works fine... I had a minor issue when apply to the current
master but the attached fix it and I also added tab-complete support for
CREATE SEQUENCE...USING and ALTER SEQUENCE...USING.

I ran all the regression tests and all passed.

I have just one question about de 0002 patch:
- There are some reason to not use TransactionId instead of uint32 in
GaplessSequenceState struct?

Regards,

Attachment	Content-Type	Size
0001-seqam-2016-04-04.patch	text/x-diff	168.4 KB
0002-gapless-seq-2016-04-04.patch	text/x-diff	27.2 KB

From:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
To:	fabriziomello(at)gmail(dot)com
Cc:	David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-04-05 15:52:46
Message-ID:	5703DF4E.2030004@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 04/04/16 15:53, Fabrízio de Royes Mello wrote:
>
>
> On Thu, Mar 31, 2016 at 9:19 AM, Petr Jelinek <petr(at)2ndquadrant(dot)com
> <mailto:petr(at)2ndquadrant(dot)com>> wrote:
> >
> > Hi,
> >
> > new version attached that should fix the issue. It was alignment -
> honestly don't know what I was thinking using fixed alignment when the
> AMs can define their own type.
> >
>
> Yeah... now all works fine... I had a minor issue when apply to the
> current master but the attached fix it and I also added tab-complete
> support for CREATE SEQUENCE...USING and ALTER SEQUENCE...USING.
>

Cool thanks.

> I ran all the regression tests and all passed.
>
> I have just one question about de 0002 patch:
> - There are some reason to not use TransactionId instead of uint32 in
> GaplessSequenceState struct?
>

I missed we have that :)

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To:	Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc:	David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Craig Ringer <craig(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Sequence Access Method WIP
Date:	2016-04-05 17:54:32
Message-ID:	CAFcNs+peRxMhWGvfQhCbcJy8b78Q7mj1bLKesgkjFcXNJfdFQA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Apr 5, 2016 at 12:52 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>
>> I have just one question about de 0002 patch:
>> - There are some reason to not use TransactionId instead of uint32 in
>> GaplessSequenceState struct?
>>
>
> I missed we have that :)
>

Attached fix it and also I changed to use TransactionIdEquals instead of
compare xids directly.

- if (seqstate->xid != local_xid)
+ if (!TransactionIdEquals(seqstate->xid, local_xid))

IMHO this patch is Ready for Commiter. Changed status in CommitfestApp.

Regards,

Attachment	Content-Type	Size
0001-seqam-2016-04-05.patch	text/x-diff	168.4 KB
0002-gapless-seq-2016-04-05.patch	text/x-diff	27.2 KB