Shared sequence-like objects in PostgreSQL

Lists: pgsql-hackers
From: Vlad Arkhipov <arhipov(at)dc(dot)baikal(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Shared sequence-like objects in PostgreSQL
Date: 2011-09-21 07:19:34
Message-ID: 4E799006.4030308@dc.baikal.ru

Hello all,

I'm writing a C-language function that is similar to nextval() but
should return the next member of the recurrent sequence:
T(n+1) = f(T(n), T(n-1), ..., T(n-k)), where f is some function and k is
a constant.
The state of this object should be persistent between database restarts
and should be easily recovered if the database crashes.
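
For illustration only, the state described above could be kept in a small ring buffer. This is a minimal sketch in plain C, not PostgreSQL code: the names RecState, rec_init, rec_next and rec_f are hypothetical, k = 1 is assumed, and f is assumed to be a simple sum (a Fibonacci-like recurrence) because the message does not say what f is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for the example: k = 1, so T(n+1) = f(T(n), T(n-1)). */
#define REC_K      1
#define REC_TERMS  (REC_K + 1)

typedef struct RecState
{
    uint64_t n;                 /* index of the newest term T(n) */
    uint64_t hist[REC_TERMS];   /* ring buffer: T(j) is stored in hist[j % REC_TERMS] */
} RecState;

/*
 * Example f only: the sum of the last REC_TERMS terms. A sum does not care
 * about term order, so it can read the ring buffer directly. Unsigned
 * arithmetic wraps around on overflow instead of being undefined.
 */
static uint64_t
rec_f(const uint64_t *hist)
{
    uint64_t sum = 0;

    for (int i = 0; i < REC_TERMS; i++)
        sum += hist[i];
    return sum;
}

/* seed[0..REC_K] are the initial terms T(0)..T(k). */
static void
rec_init(RecState *st, const uint64_t *seed)
{
    assert(st != NULL && seed != NULL);
    for (int i = 0; i < REC_TERMS; i++)
        st->hist[i] = seed[i];
    st->n = REC_K;
}

/* Compute T(n+1), overwrite the oldest stored term with it, and return it. */
static uint64_t
rec_next(RecState *st)
{
    uint64_t next = rec_f(st->hist);

    st->n++;
    st->hist[st->n % REC_TERMS] = next;
    return next;
}
```

With seeds T(0) = T(1) = 1, successive calls to rec_next() return 2, 3, 5, and so on. The whole state is one fixed-size struct, which is what would have to be shared between backends and persisted.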

So the first problem I encountered was where to store the current state
of this object (n and values T(n), T(n-1), ... T(n-k)). I believe that
TopMemoryContext is not shared between processes, therefore I must use
shmem functions from backend/storage/ipc/shmem.c to create a structure
in shared memory.

The next issue is how to synchronize backends' reads/writes to this
chunk of shared memory. I suppose the Postgres code already has some
facility for this, probably built on semaphores.
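
The synchronization question comes down to making the read-modify-write of the state atomic. The sketch below shows that pattern with a C11 atomic_flag used as a tiny spinlock, so that it compiles outside the server; inside a backend the server's own primitives (lightweight locks or spinlocks) would take its place. SharedRec, shared_init and shared_next are hypothetical names, and the recurrence is again assumed to be a plain sum with k = 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SharedRec
{
    atomic_flag lock;   /* stand-in for a server lock guarding the fields below */
    uint64_t    n;      /* index of the newest term */
    uint64_t    prev;   /* T(n-1) */
    uint64_t    cur;    /* T(n) */
} SharedRec;

static void
shared_init(SharedRec *s, uint64_t t0, uint64_t t1)
{
    assert(s != NULL);
    atomic_flag_clear(&s->lock);
    s->n = 1;
    s->prev = t0;
    s->cur = t1;
}

/* Advance the state by one term while holding the lock. */
static uint64_t
shared_next(SharedRec *s)
{
    uint64_t next;

    /* Acquire: spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;

    /* Critical section: nobody else can see a half-updated state. */
    next = s->prev + s->cur;    /* assumed f; wraps on overflow */
    s->prev = s->cur;
    s->cur = next;
    s->n++;

    /* Release: make the updated fields visible to the next lock holder. */
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
    return next;
}
```

The critical section is kept to a few assignments on purpose. Anything slow, such as writing the state out, is better done outside it.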

Then I periodically need to persist the state of this object to the
database, for example for every 100 generated values, as well as on the
postmaster's shutdown. What is the best method for doing that?
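
One possible shape for the "every 100 values" idea is to persist the state as it will be 100 steps ahead, before handing out any of those values. After a crash the generator resumes from the persisted state, so it may skip up to 100 values but never repeats one. The sketch below is plain C with an in-memory struct standing in for durable storage; every name in it is hypothetical, and f is assumed to be a sum with k = 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PERSIST_EVERY 100

typedef struct GenState
{
    uint64_t n;      /* index of the newest term */
    uint64_t prev;   /* T(n-1) */
    uint64_t cur;    /* T(n) */
} GenState;

typedef struct Generator
{
    GenState live;    /* in-memory state, lost in a crash */
    uint64_t budget;  /* values that may be handed out before the next persist */
} Generator;

/* Stand-in for storage that survives a crash (a table row, a file, ...). */
static GenState durable_copy;

static void
gen_step(GenState *s)
{
    uint64_t next = s->prev + s->cur;   /* assumed f; wraps on overflow */

    s->prev = s->cur;
    s->cur = next;
    s->n++;
}

static void
gen_start(Generator *g, uint64_t t0, uint64_t t1)
{
    assert(g != NULL);
    g->live.n = 1;
    g->live.prev = t0;
    g->live.cur = t1;
    g->budget = 0;
    durable_copy = g->live;
}

/* Persist the state PERSIST_EVERY steps ahead of the live state. */
static void
gen_persist_ahead(Generator *g)
{
    GenState ahead = g->live;

    for (int i = 0; i < PERSIST_EVERY; i++)
        gen_step(&ahead);
    durable_copy = ahead;
    g->budget = PERSIST_EVERY;
}

static uint64_t
gen_next(Generator *g)
{
    if (g->budget == 0)
        gen_persist_ahead(g);
    gen_step(&g->live);
    g->budget--;
    return g->live.cur;
}

/* After a crash: resume from the durable copy, skipping unused values. */
static void
gen_recover(Generator *g)
{
    g->live = durable_copy;
    g->budget = 0;
}
```

The cost of this variant is that every term is computed twice, once when looking ahead and once when it is handed out. It also only works if gaps in the sequence are acceptable.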

Please let me know if this problem has been solved before. Thanks for
your help.


From: Greg Stark <stark(at)mit(dot)edu>
To: Vlad Arkhipov <arhipov(at)dc(dot)baikal(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Shared sequence-like objects in PostgreSQL
Date: 2011-09-21 12:33:33
Message-ID: CAM-w4HOEW6vuvZ-om8EHnakgh1fjVgOoWUoQWetYHo-98f+kMw@mail.gmail.com

On Wed, Sep 21, 2011 at 8:19 AM, Vlad Arkhipov <arhipov(at)dc(dot)baikal(dot)ru> wrote:
> I'm writing a C-language function that is similar to nextval() but should
> return the next member of the recurrent sequence:
> T(n+1) = f(T(n), T(n-1), ..., T(n-k)), where f is some function and k is a
> constant.
> The state of this object should be persistent between database restarts and
> should be easily recovered if the database crashes.

The purpose of nextval() is to provide an escape hatch from the usual
transactional guarantees, which would otherwise serialize every
transaction using it. Avoiding the performance impact of that is the
only reason it needs to use shared memory and so on.

If this function isn't performance critical and doesn't need to be
highly concurrent then you would be better off storing this
information in a table and updating the table using regular database
updates. The way you've defined it also makes me wonder whether you
can afford to skip values. If not then you don't really get an option
of avoiding the serialization.

If you can, one short-cut you could consider would be to populate a
table with the values of the sequence, and periodically populate more
values when you run short of unused values. Then you can use a regular
postgres sequence to generate indexes into that table. That would not
perform quite as well as a shared memory native implementation like
you describe but wouldn't require nearly as much Postgres-specific C
code.
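
To make the short-cut concrete, here is a rough simulation in plain C rather than SQL: an array stands in for the table of precomputed values, a counter stands in for the regular sequence that hands out row indexes, and a refill step adds a batch of rows when the table runs short. All names and sizes are hypothetical, and the recurrence is assumed to be a Fibonacci-like sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_CAP     1024   /* arbitrary capacity for the example */
#define REFILL_BATCH  8      /* rows added each time the table runs short */

typedef struct ValueTable
{
    uint64_t values[TABLE_CAP];  /* values[i] holds T(i) */
    uint64_t filled;             /* number of rows populated so far */
    uint64_t next_index;         /* stand-in for an ordinary sequence */
} ValueTable;

static void
table_init(ValueTable *t, uint64_t t0, uint64_t t1)
{
    assert(t != NULL);
    t->values[0] = t0;
    t->values[1] = t1;
    t->filled = 2;
    t->next_index = 0;
}

/* Populate up to REFILL_BATCH more rows from the last two stored terms. */
static void
table_refill(ValueTable *t)
{
    for (int b = 0; b < REFILL_BATCH && t->filled < TABLE_CAP; b++)
    {
        uint64_t i = t->filled;

        t->values[i] = t->values[i - 1] + t->values[i - 2];  /* assumed f */
        t->filled++;
    }
}

/* Take the next index from the "sequence" and look its value up. */
static uint64_t
table_next(ValueTable *t)
{
    uint64_t idx = t->next_index++;

    assert(idx < TABLE_CAP);
    while (idx >= t->filled)
        table_refill(t);
    return t->values[idx];
}
```

Only the counter is contended here. The expensive part, computing the recurrence, happens in batches and could just as well run ahead of demand in a separate job.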

Perhaps if you can explain what problem you're actually trying to
solve, it might be clearer whether it justifies working at such a
low level.

--
greg