Re: queueing via database table?

Lists: pgsql-general
From: Mark Harrison <mh(at)pixar(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: queueing via database table?
Date: 2007-01-03 06:34:24
Message-ID: 459B4E70.2050307@pixar.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I have a cluster of CPUs generating thumbnails for
a render farm. I would like to place thumbnail
requests on a queue, and have the cluster of client
dequeue the requests and process them.

Of course, each request should be only dequeued once...
if a thumbnail is being processed by one CPU, it
shouldn't be processed by another CPU.

Does the following sound like a reasonable approach?
If not, what's a good way to go?

The processes generating the requests will insert into
a queue table. They may add a priority and timestamp.

The several processes servicing the requests will do a
SELECT FOR UPDATE where ... limit 1, generate thumbnail,
delete the record and commit.

Comments and suggestions welcome,
Mark

--
Mark Harrison
Pixar Animation Studios


From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: queueing via database table?
Date: 2007-01-03 07:00:06
Message-ID: AD0E0B23-6AD2-4AE5-BC4D-CB4EB4B1969B@blighty.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Jan 2, 2007, at 10:34 PM, Mark Harrison wrote:

> I have a cluster of CPUs generating thumbnails for
> a render farm. I would like to place thumbnail
> requests on a queue, and have the cluster of client
> dequeue the requests and process them.
>
> Of course, each request should be only dequeued once...
> if a thumbnail is being processed by one CPU, it
> shouldn't be processed by another CPU.
>
> Does the following sound like a reasonable approach?
> If not, what's a good way to go?
>
> The processes generating the requests will insert into
> a queue table. They may add a priority and timestamp.
>
> The several processes servicing the requests will do a
> SELECT FOR UPDATE where ... limit 1, generate thumbnail,
> delete the record and commit.
>
> Comments and suggestions welcome,

Holding a lock while generating the thumbnail doesn't
sound like a great idea, and I think that the select
for update will end up serialising the requests.

I'd add a "rendering" field, text, defaulting
to an empty string.

Then do a "select for update where ... and rendering = '' limit 1",
update the rendering field to the hostname of the box doing the
work and commit. Render the thumbnail. Delete the record.

That'll also give you an easy way to show status of which
box is rendering which scene.

Depending on what else you're putting into the where clause
a partial index on something for records where rendering=''
might be helpful.

Cheers,
Steve


From: Richard Huxton <dev(at)archonet(dot)com>
To: Steve Atkins <steve(at)blighty(dot)com>
Cc: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: queueing via database table?
Date: 2007-01-03 08:37:30
Message-ID: 459B6B4A.5030903@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Steve Atkins wrote:
>
> Holding a lock while generating the thumbnail doesn't
> sound like a great idea, and I think that the select
> for update will end up serialising the requests.
>
> I'd add a "rendering" field, text, defaulting
> to an empty string.
>
> Then do a "select for update where ... and rendering = '' limit 1",
> update the rendering field to the hostname of the box doing the
> work and commit. Render the thumbnail. Delete the record.

Assuming each processing host keeps its connection open, I'd store the
process-id instead (get via function pg_backend_pid()). Also have a
separate status (pending|processing|done) and timestamps to track when
each status is set. Only delete rows that have been marked "done" for a
certain length of time.

This will let you spot when a host has stopped processing (e.g. crashed)
and also let you measure throughput on particular hosts.

The other thing to be aware of is that queries of the SELECT FOR UPDATE
LIMIT 1 form can return 0 rows. Open two psql connections and try it to
see what happens. You'll need to have the processing clients retry the
query in this case.

HTH
--
Richard Huxton
Archonet Ltd


From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: "Mark Harrison" <mh(at)pixar(dot)com>, "PostgreSQL general" <pgsql-general(at)postgresql(dot)org>
Subject: Re: queueing via database table?
Date: 2007-01-03 09:14:54
Message-ID: 758d5e7f0701030114t4a1b590ds319f534c2793545@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 1/3/07, Mark Harrison <mh(at)pixar(dot)com> wrote:
> Does the following sound like a reasonable approach?
> If not, what's a good way to go?
>
> The processes generating the requests will insert into
> a queue table. They may add a priority and timestamp.
>
> The several processes servicing the requests will do a
> SELECT FOR UPDATE where ... limit 1, generate thumbnail,
> delete the record and commit.

Well, this will block. So it will mean that only one thumbnail
will be processed while running the transaction.

You may want to rather use SELECT FOR UPDATE NOWAIT,
probably "wrapped" into a PL/PgSQL function. I did that and
I'm quite satisfied with this approach.

A simple implementation would be something like this:

CREATE OR REPLACE FUNCTION get_next() RETURNS int AS $$
DECLARE
r RECORD;
BEGIN
FOR r IN SELECT id FROM foo_table LIMIT 100 LOOP
BEGIN
PERFORM id FROM foo_table WHERE id=r.id FOR UPDATE NOWAIT;
RETURN r.id;
EXCEPTION
WHEN lock_not_available THEN -- do nothing
END;
END LOOP;
RETURN NULL;
END;
$$ LANGUAGE PLpgSQL;

Of course you should customize the query, and use better tuned limit.
I think good rule of the thumb size of LIMIT is twice the number of
simultaneous processing nodes working. An ORDER BY might be
worh it or not, etc, etc.

Other approach might be using something like
LOOP
BEGIN
SELECT id INTO i FROM foo_table LIMIT 1 OFFSET n FOR UPDATE NOWAIT;
RETURN i;
EXCEPTION
WHEN lock_not_avaibale THEN -- do nothing;
END;
n := n + 1;
END LOOP;

But I feel it will be slower most of the time.


From: "Gregory S(dot) Williamson" <gsw(at)globexplorer(dot)com>
To: "Mark Harrison" <mh(at)pixar(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: queueing via database table?
Date: 2007-01-03 09:52:08
Message-ID: 71E37EF6B7DCC1499CEA0316A256832802B3EB44@loki.wc.globexplorer.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Mark --

As others have indicated, there may be some blocking issues with the approach you outlined.

A variant I have seen used in the past uses a table with a unique id for the job, the work queue it is in, a status flag, priority and at least one time stamp (and perhaps space for a process id).

Each client that wants work issues a request (SELECT FOR UPDATE) to get the next job in its queue that has a status flag of "Available" ordered by priority or initial time of creation, etc.; update that entry with the current timestamp (and perhaps the process id of the client) and set the status flag to show the job is now being worked on all in one transaction.

This releases the job but now with a changed status flag so other processes pulling work from the same queue won't see it anymore.

When the job finishes it selects its entry and updates the status flag and timestamp (and probably clears its process id). Logic for how to bump a job to the next step can be embedded in the client or in another process, depending on your needs.

It is useful to have a daemon or some other process to sweep the queue table and at least send an alert about stale or frozen jobs.

HTH,

Greg Williamson
DBA
GlobeXplorer LLC
-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org on behalf of Mark Harrison
Sent: Tue 1/2/2007 10:34 PM
To: pgsql-general(at)postgresql(dot)org
Cc:
Subject: [GENERAL] queueing via database table?

I have a cluster of CPUs generating thumbnails for
a render farm. I would like to place thumbnail
requests on a queue, and have the cluster of client
dequeue the requests and process them.

Of course, each request should be only dequeued once...
if a thumbnail is being processed by one CPU, it
shouldn't be processed by another CPU.

Does the following sound like a reasonable approach?
If not, what's a good way to go?

The processes generating the requests will insert into
a queue table. They may add a priority and timestamp.

The several processes servicing the requests will do a
SELECT FOR UPDATE where ... limit 1, generate thumbnail,
delete the record and commit.

Comments and suggestions welcome,
Mark

--
Mark Harrison
Pixar Animation Studios

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
message can get through to the mailing list cleanly

-------------------------------------------------------
Click link below if it is SPAM gsw(at)globexplorer(dot)com
"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=459b5025191744846743324&user=gsw(at)globexplorer(dot)com&retrain=spam&template=history&history_page=1"
!DSPAM:459b5025191744846743324!
-------------------------------------------------------


From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: queueing via database table?
Date: 2007-01-03 12:10:32
Message-ID: 459B9D38.8020304@cox.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/03/07 00:34, Mark Harrison wrote:
> I have a cluster of CPUs generating thumbnails for
> a render farm. I would like to place thumbnail
> requests on a queue, and have the cluster of client
> dequeue the requests and process them.
>
> Of course, each request should be only dequeued once...
> if a thumbnail is being processed by one CPU, it
> shouldn't be processed by another CPU.
>
> Does the following sound like a reasonable approach?
> If not, what's a good way to go?
>
> The processes generating the requests will insert into
> a queue table. They may add a priority and timestamp.
>
> The several processes servicing the requests will do a
> SELECT FOR UPDATE where ... limit 1, generate thumbnail,
> delete the record and commit.
>
> Comments and suggestions welcome,

That's not what relational tables are good at. Instead, use a
message queuing library with a file backing-store (so that if the
machine goes down for any reason, the messages are still in the queue).

In a message-passing system, a network-aware daemon manages a set of
named FIFO queues. Some processes call in_q(), and other processes
(same machine, or not) call de_q(). If nothing is calling de_q(),
messages just pile up in the queue until such time as something
*does* start calling de_q().

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFFm504S9HxQb37XmcRAmj6AKDWa7Sx15wygoTc+/wOfLZIpqi4awCg34SZ
rkq1IEjdqu1zx0B5QyFW/n0=
=V0oF
-----END PGP SIGNATURE-----


From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Mark Harrison <mh(at)pixar(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: queueing via database table?
Date: 2007-01-03 15:34:26
Message-ID: 1167838466.3393.4.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 2007-01-02 at 22:34 -0800, Mark Harrison wrote:
> I have a cluster of CPUs generating thumbnails for
> a render farm. I would like to place thumbnail
> requests on a queue, and have the cluster of client
> dequeue the requests and process them.
>
> Of course, each request should be only dequeued once...
> if a thumbnail is being processed by one CPU, it
> shouldn't be processed by another CPU.
>
> Does the following sound like a reasonable approach?
> If not, what's a good way to go?
>
> The processes generating the requests will insert into
> a queue table. They may add a priority and timestamp.
>
> The several processes servicing the requests will do a
> SELECT FOR UPDATE where ... limit 1, generate thumbnail,
> delete the record and commit.

Here's what I'd do. Create two sequences. Sequence one is used to
assign ids to the thumbnail records when they're placed into the control
table. The other is used to "check out" the records.

A process selects nextval from sequence two, and then selects the
corresponding record info from the control table, and marks the record
as being in work. When it's done, it marks it as done, and selects
another value from the second sequence and repeats the process.

Every now and then run a check program to look for thumbs that have been
missed or skipped and process them or assign them a new id from sequence
one to put them back into the queue.


From: Vivek Khera <vivek(at)khera(dot)org>
To: pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: queueing via database table?
Date: 2007-01-03 19:01:44
Message-ID: E43DE9CF-3C9D-4635-A4A2-1B58AA444A7E@khera.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Jan 3, 2007, at 2:00 AM, Steve Atkins wrote:

> Holding a lock while generating the thumbnail doesn't
> sound like a great idea, and I think that the select
> for update will end up serialising the requests.
>
> I'd add a "rendering" field, text, defaulting
> to an empty string.
>
> Then do a "select for update where ... and rendering = '' limit 1",
> update the rendering field to the hostname of the box doing the
> work and commit. Render the thumbnail. Delete the record.
>
> That'll also give you an easy way to show status of which
> box is rendering which scene.
>
> Depending on what else you're putting into the where clause
> a partial index on something for records where rendering=''
> might be helpful.

this is more or less how we do it, so i second this.

we also use NOTIFY/LISTEN to "wake up" the job processors when new
work is added.