Re: Listen / Notify - what to do when the queue is full

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Arnaud Betremieux <arnaud(dot)betremieux(at)keyconsulting(dot)fr>
Subject: Re: Listen / Notify - what to do when the queue is full
Date: 2010-02-14 16:22:12
Message-ID: dc7b844e1002140822s3329a59el3fb953afff3e40ba@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 14, 2010 at 2:46 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Some minor review comments without having taken in much of previous
> discussion:
>
> * Patch doesn't apply cleanly anymore, close.

New version is rebased against current HEAD.

> * In async.c you say "new async notification model". Please say "async
> notification model in 9.0 is".

I haven't changed that line at all... ;-) But the fact that the
former "new async model" is now the old one really shows that a
version number makes sense here.

> In (2) you say there is a central queue,

fixed with some additional comments.

> * There is no mention of what to do with pg_notify at checkpoint. Look
> at how pg_subtrans handles this. Should pg_notify do the same?

Actually we don't care... We even hope that the pg_notify pages are
not flushed at all. Notifications don't survive a server restart
anyway and upon restart we just delete whatever is in the directory.

> * Is there a lazy backend avoidance scheme as exists for relcache
> invalidation messages? see storage/ipc/sinval...

We cannot avoid a lazy backend. If a listening backend is (probably
idle) in a running transaction we don't clean up the notification
queue anymore but wait for the slowest backend to do it. But there are
other effects of long running transactions that you will probably
notice long before your notification queue gets filled up.

> OK, I see asyncQueueFillWarning() but nowhere else says it exists and
> there aren't any comments in it to say what it does or why
> * What do you expect the DBA to do when they receive a queue fill
> warning? Where is that written down?

Comment added as well as an errdetail for whoever sees this message:

WARNING: pg_notify queue is more than 50% full. Among the slowest
backends: 27744
DETAIL: Cleanup can only proceed if this backend ends its current transaction

I have also added a note in the NOTIFY documentation about this.

> * Not clear of the purpose of backendSendsNotifications. In
> AtCommit_NotifyAfterCommit() the logic seems strange. Code is
> /* Allow transactions that have not executed
> LISTEN/UNLISTEN/NOTIFY to
> * return as soon as possible */
> if (!pendingActions && !backendSendsNotifications)
> return;
> but since backendSendsNotifications is set true earlier there is no fast
> path.

What exactly is the strange logic that you're seeing? The fast path is
for transactions that do neither LISTEN/UNLISTEN nor NOTIFY. Without
LISTEN/UNLISTEN they have pendingActions = NIL and without NOTIFY they
have backendSendsNotifications = false and will return immediately.

backendSendsNotifications is only set to be true if the backend has
executed NOTIFY...

> * AtSubCommit_Notify doesn't seem to have a fastpath when no notify
> commands have been executed.

I haven't changed this function at all. I don't see much benefit
either as it's just concatenating empty lists. This immediately
returns from the respective list handling function.

> * In Send_Notify() you throw ERROR while still holding the lock. It
> seems better practice to drop the lock then ERROR.

Explicit LWLockRelease() added.

> * Why is Send_Notify() a separate function? It's only called from one
> point in the code.

I have now inlined Send_Notify().

> * We know that sometimes a FATAL error can occur without properly
> processing the abort. Do we depend upon this never happening?

What exactly could happen? I thought that a backend either does a
clean shutdown or that the postmaster starts over. We depend on a
listening backend to execute its on_shmem_exit() handlers before
leaving the game...

> * Can we make NUM_ASYNC_BUFFERS = 8? 4 just seems too small

I have done some tests with NUM_ASYNC_BUFFERS and couldn't find any
measurable performance difference at all, so I kept it at 4. 8 isn't
exaggerated either and seems to be a fine number though. Changed to 8.

> * backendSendsNotifications is future tense. The variable should be
> called something like hasSentNotifications. Couple of other variables
> with similar names

Variable names changed as proposed.

New patch attached, thanks for the review.

Joachim

Attachment Content-Type Size
listennotify.12.diff text/x-patch 111.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2010-02-14 17:06:09 Re: Optimizing GetConflictingVirtualXIDs()
Previous Message Yeb Havinga 2010-02-14 16:08:05 Re: function to display ddl