Re: Serialization, Locking...implement processing Queue with a table

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "D(dot) Dante Lorenso" <dante(at)lorenso(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Serialization, Locking...implement processing Queue with a table
Date: 2003-05-12 14:23:26
Message-ID: 8777.1052749406@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"D. Dante Lorenso" <dante(at)lorenso(dot)com> writes:
> How should I go about implementing a synchronized process id
> queue that will select one unique row from a table at a time
> and make the selection be safe with concurrent accesses?

You can find various discussions of this in the archives, but a
reasonable way to proceed is:

1. The table of pending or in-process jobs has a column "processor_id"
that is zero for pending jobs and equal to the (unique) processor number
for active jobs. (Assume for the moment that completed jobs are removed
from the table entirely.)

2. When idle, you try to reserve a job like so:

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT job_id, ... FROM job_table
WHERE processor_id = 0 LIMIT 1 FOR UPDATE;

The SELECT has three possible outcomes:

2a: One row is returned. You do

UPDATE job_table SET processor_id = $me
WHERE job_id = $jobid;
COMMIT;

and then go about executing the job. When done, delete the row from
job_table and try to get another one.

2b: No row is returned: no jobs are pending. Commit your transaction,
sleep for an appropriate delay period, and try again.

2c: You get a "can't serialize" failure. This will happen if two
processors try to reserve the same row at the same time. In this case,
roll back your transaction, sleep for a short interval (maybe a few
msec) and try again. You don't want to sleep as long as normal in this
case, since there might be another available job.

(Note that you *cannot* do this in a plpgsql function, since it cannot
start or commit a transaction; these commands have got to be directly
issued by the application.)

Assuming that there aren't a vast number of pending jobs at any time,
this should work pretty well without even bothering with an index on
job_table. You will want to vacuum it often though (at least every few
hundred job completions, I'd think).

Now, what if you wanted to remember completed jobs? I'd actually
recommend transferring the records of completed jobs to a different
table. But if you really want to keep them in the same table, maybe
add a boolean "completed" field, and make the initial SELECT be

SELECT job_id, ... FROM job_table
WHERE processor_id = 0 AND NOT completed LIMIT 1 FOR UPDATE;

Now you *will* need an index to keep things speedy. I'd try a partial
index on processor_id with condition "NOT completed". You'll still
need frequent vacuums.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Max Bernaert 2003-05-12 14:30:31 Re: More than one user on postgresql database on Windows ?
Previous Message Jason Hihn 2003-05-12 14:20:30 Arrays with MS Access?