Listen / Notify rewrite

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Listen / Notify rewrite
Date: 2009-11-11 21:25:05
Message-ID: dc7b844e0911111325u39f92d7aib8bae8322b12a7c2@mail.gmail.com
Lists: pgsql-hackers

Hi,

Attached is a patch for a new listen/notify implementation.

In a few words, the patch reimplements listen/notify as an slru-based queue
which works similarly to the sinval structure. Essentially, it is a ring buffer
on disk with pages mapped into shared memory for read/write access.
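
To make the ring-buffer idea concrete, here is a minimal in-memory sketch in
Python. It is not the patch's code: the real queue lives in slru pages on disk,
and all names here (NotifyQueue, PAGE_SIZE, NUM_PAGES, listen, push, read_all)
are hypothetical. It only illustrates a paged ring with one read position per
listening backend, where the tail is the position of the slowest reader.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4   # entries per page (hypothetical; tiny so wraparound is visible)
NUM_PAGES = 3   # pages in the ring (hypothetical)
CAPACITY = PAGE_SIZE * NUM_PAGES


@dataclass
class NotifyQueue:
    pages: list = field(
        default_factory=lambda: [[None] * PAGE_SIZE for _ in range(NUM_PAGES)]
    )
    head: int = 0                                # next absolute position to write
    readers: dict = field(default_factory=dict)  # backend id -> next position to read

    def tail(self):
        # Oldest position still needed by any listener.
        return min(self.readers.values(), default=self.head)

    def listen(self, backend):
        # A new listener starts at the current head and sees only later entries.
        self.readers[backend] = self.head

    def push(self, entry):
        if self.head - self.tail() >= CAPACITY:
            raise BufferError("queue full: slowest reader has not caught up")
        page, slot = divmod(self.head % CAPACITY, PAGE_SIZE)
        self.pages[page][slot] = entry
        self.head += 1

    def read_all(self, backend):
        # Return every entry this backend has not seen yet, in queue order.
        out = []
        pos = self.readers[backend]
        while pos < self.head:
            page, slot = divmod(pos % CAPACITY, PAGE_SIZE)
            out.append(self.pages[page][slot])
            pos += 1
        self.readers[backend] = pos
        return out
```

With only 12 slots this toy queue fills up quickly; the point of backing the
queue with slru pages is that the equivalent limit is far away in practice.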

Additionally the patch does the following (see below for details):

1. It removes the pg_listener relation.
2. It adds an optional payload parameter, i.e. executing in SQL
"NOTIFY foo 'payload';" delivers 'payload' to every listening backend.
3. Every distinct notification is delivered.
4. Order is preserved, i.e. if txn 1 first does NOTIFY foo, then NOTIFY bar, a
backend (listening to both "foo" and "bar") will always first receive the
notification "foo" and then the notification "bar".
5. It's now "listen to a channel", not "listen to a relation" anymore...

Details:

1. Instead of placing the queue in shared memory only, I propose to create a
new subdirectory pg_notify/ and make the queue slru-based, so that we do not
risk blocking. Several people here have pointed out that blocking is a true
no-go for a new listen/notify implementation. With an slru-based queue we have
so much space that blocking is really unlikely, even in periods with extreme
notify bursts.
Regarding performance, the slru-queue is not fsync-ed to disk, so most activity
would be in the OS file cache anyway, and most backends will probably work on
the same pages most of the time. However, more locking overhead is required
than in a shared-memory-only implementation.

There is one doubt that I have: Currently the patch adds notifications to the
queue after the transaction has committed to clog. The advantage is that we do
not need to take care of visibility. When we add notifications to the queue, we
have committed our transaction already and all reading backends are not in a
transaction anyway, so everything is visible to everyone and we can just write
to and read from the queue.

However, if for some reason we cannot write to the slru files in the pg_notify/
directory, we might want to roll back the current transaction, but with the
proposed patch we cannot, because we have already committed...
But... if there is a problem with the pg_notify/ directory, then something is
fundamentally wrong on the file system of the database server, and pg_subtrans/
and pg_clog/ are probably affected by the same problem... One possible solution
would be to write to the queue before committing and to add the TransactionID
to each entry. Then other backends can check whether our TransactionID has
successfully committed or not. Not sure if this is worth the overhead however...
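
A rough sketch of that alternative, in Python and under stated assumptions: it
is not what the patch does, and every name in it (XactStatus, deliverable, the
xact_status dict standing in for a clog lookup) is hypothetical. It also
assumes that a reader stops at the first entry whose transaction is still in
progress and retries later, which is one possible policy, not a decided one.

```python
from enum import Enum


class XactStatus(Enum):
    IN_PROGRESS = 1
    COMMITTED = 2
    ABORTED = 3


def deliverable(queue, xact_status):
    """Return (entries to deliver now, number of queue entries consumed).

    queue: list of (xid, channel, payload) tuples in queue order.
    xact_status: dict mapping xid -> XactStatus (stand-in for a clog lookup).
    """
    out = []
    consumed = 0
    for xid, channel, payload in queue:
        status = xact_status[xid]
        if status is XactStatus.IN_PROGRESS:
            # Outcome unknown: stop here and retry later (assumed policy).
            break
        if status is XactStatus.COMMITTED:
            out.append((channel, payload))
        # Entries of aborted transactions are consumed but never delivered.
        consumed += 1
    return out, consumed
```

The extra status lookup per entry is the overhead mentioned above.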

2. The payload parameter is optional. A notifying client can either call
"NOTIFY foo;" or "NOTIFY foo 'payload';". The length of the payload is
currently limited to 128 characters... Not sure if we should allow longer
payload strings... If there is more complex data to be transferred, the sending
transaction can always put all of that data into a relation and just send the
id of the entry. If no payload is specified, then this is treated internally
like an empty payload. Consequently an empty string will be delivered as the
payload to the listening backend.
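
The payload rules can be summarized in a few lines of Python. This is only an
illustration: normalize_payload and MAX_PAYLOAD_LEN are made-up names, and
rejecting an over-long payload with an error (rather than, say, truncating it)
is an assumption, as is treating exactly 128 characters as still allowed.

```python
MAX_PAYLOAD_LEN = 128  # limit currently proposed in this mail


def normalize_payload(payload=None):
    """Map the optional NOTIFY payload to the string that gets delivered."""
    if payload is None:
        # "NOTIFY foo;" is treated like "NOTIFY foo '';"
        return ""
    if len(payload) > MAX_PAYLOAD_LEN:
        # Assumed behavior: reject instead of truncating.
        raise ValueError("payload longer than %d characters" % MAX_PAYLOAD_LEN)
    return payload
```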

3. Not every notification is delivered, only distinct notifications (per
transaction).
In other words, each sending transaction eliminates duplicate notifications
that have the exact same payload and channel name as another notification that
is already in the queue to be sent out.

Should we have an upper limit on the number of notifications that a transaction
is allowed to send? Without an upper limit, a client can open a transaction and
send a series of NOTIFYs, each with a different payload until its backend runs
out of memory...

4.
Sending transaction does:      This will be delivered (on commit):

NOTIFY foo;                    NOTIFY foo;
NOTIFY foo;                    NOTIFY bar 'value1';
NOTIFY bar 'value1';           NOTIFY bar 'value2';
NOTIFY foo;
NOTIFY bar 'value2';
NOTIFY bar 'value1';
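
The example above combines points 3 and 4: duplicates within a transaction are
dropped, and the remaining notifications keep the order of their first
occurrence. A minimal Python sketch of that rule follows; the function name
pending_notifications is hypothetical, and a missing payload is written as the
empty string, as described under point 2.

```python
def pending_notifications(notifies):
    """Drop duplicate (channel, payload) pairs, keeping first-occurrence order."""
    seen = set()
    out = []
    for channel, payload in notifies:
        key = (channel, payload)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


# The sending transaction from the example above.
sent = [
    ("foo", ""),
    ("foo", ""),
    ("bar", "value1"),
    ("foo", ""),
    ("bar", "value2"),
    ("bar", "value1"),
]
delivered = pending_notifications(sent)
```

Here delivered ends up as foo, bar 'value1', bar 'value2', matching the
right-hand column.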

Note that we do _not_ guarantee that notifications from transaction 1 are
always delivered before the notifications of transaction 2 just because
transaction 1 committed before transaction 2.

Let me know what you think,

Regards,
Joachim

Attachment Content-Type Size
listennotify.1.diff text/x-diff 66.2 KB
