Re: how to pass data (tuples) to worker processes?

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>, "'Andrew Tipton'" <andrew(at)kiwidrew(dot)com>
Cc: "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, "'Tomas Vondra'" <tv(at)fuzzy(dot)cz>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: how to pass data (tuples) to worker processes?
Date: 2013-08-07 05:41:17
Message-ID: 004d01ce9330$bde5f280$39b1d780$@kapila@huawei.com
Lists: pgsql-hackers

On Tuesday, August 06, 2013 6:29 PM Robert Haas wrote:
> On Sat, Aug 3, 2013 at 6:31 AM, Andrew Tipton <andrew(at)kiwidrew(dot)com>
> wrote:
> > Robert: any chance you could share a few more details on the
> enhancements
> > you're planning for bgworkers? I seem to recall reading that
> communicating
> > with the dynamic bgworkers after they had been launched was next on
> your
> > agenda...
>
> Yeah, it is. I'm working on a patch to allow additional shared memory
> segments to be created on the fly. The idea I'm working with is that
> a backend that plans to launch a worker will first create a dynamic
> shared memory segment, then pass the ID of that segment to the worker
> via bgw_main_arg. The worker will map the segment, and then the two
> processes can use that to communicate. My thought is to create a
> queue abstraction that sits on top of the dynamic shared memory
> infrastructure, so that you can set aside a portion of your dynamic
> shared memory segment to use as a ring buffer and send messages back
> and forth using some kind of API along these lines:
>
> extern void dsm_queue_send(dsm_queue *, char *data, uint64 len);
> extern uint64 dsm_queue_receive(dsm_queue *, char **dataptr);
>
> It would also be possible to implement message sending and receiving
> using pipes, but I'm leaning away from that because it would require
> even more OS-dependent code than I'm already having to write, and
> writing OS-dependent shim layers is one of the world's less-rewarding
> coding tasks; and also because I think it will be easier to achieve
> zero-copy semantics using shared memory.
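
For illustration, here is a minimal sketch of how the two sides might use such a queue API. Apart from dsm_queue_send/dsm_queue_receive quoted above, every dsm_* helper and the exact bgworker setup below is a guess at what the patch might provide, not its actual interface:

/* Illustrative sketch only: dsm_queue_send/dsm_queue_receive are the
 * proposed API above; every other dsm_* call here is a hypothetical
 * placeholder for whatever the dynamic shared memory patch ends up
 * providing. */
#include "postgres.h"
#include "postmaster/bgworker.h"

void worker_main(Datum main_arg);

/* --- launching backend --- */
static void
launch_parallel_worker(void)
{
    BackgroundWorker worker;
    dsm_segment *seg;
    dsm_queue   *mq;

    seg = dsm_segment_create(64 * 1024);      /* hypothetical */
    mq  = dsm_queue_init(seg, 16 * 1024);     /* hypothetical */

    memset(&worker, 0, sizeof(worker));
    snprintf(worker.bgw_name, BGW_MAXLEN, "parallel worker");
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_main = worker_main;
    /* hand the segment ID to the worker via bgw_main_arg */
    worker.bgw_main_arg = Int32GetDatum(dsm_segment_id(seg));   /* hypothetical */
    RegisterDynamicBackgroundWorker(&worker);

    dsm_queue_send(mq, "serialized plan goes here", 26);
}

/* --- worker side --- */
void
worker_main(Datum main_arg)
{
    dsm_segment *seg = dsm_segment_map(DatumGetInt32(main_arg)); /* hypothetical */
    dsm_queue   *mq  = dsm_queue_find(seg);                      /* hypothetical */
    char        *data;
    uint64       len;

    len = dsm_queue_receive(mq, &data);
    /* ... execute the plan, send tuples back on another queue ... */
}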

Another idea for getting parallel tasks done by bgworkers: rather than
dynamically launching a new bgworker each time, we could pre-allocate a set of
bgworkers for the server and then, based on need, allocate a bgworker from the
pre-allocated pool.

We could then allocate shared memory at the beginning, sized for the number of
bgworkers and the information that needs to be shared between the backend and
the bgworkers (plan, tuples, snapshot, ...).

The basic idea would work as below:
a. A backend that wishes to get parallel tasks done by bgworkers divides the
work, checks which bgworkers are free, and writes the plan into the
corresponding bgworker's shared memory slot.
b. A bgworker polling on its slot of shared memory retrieves the plan and
executes it.
c. The bgworker shares the resulting tuples back in its shared memory slot.
d. The backend retrieves tuples from the shared memory slots of the bgworkers
to which it communicated the plan.
e. The backend sends the tuples back to the client.

This idea has the drawback that the queue of tuples to be shared has to be of
fixed size, since the memory is allocated at the beginning. A rough sketch of
what one such pre-allocated slot might look like follows below.
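
To make the layout concrete, here is a minimal sketch of one slot, assuming a
spinlock-protected status field and a fixed-size tuple ring buffer; all names
and sizes here are hypothetical:

/* Hypothetical layout of one pre-allocated worker slot; names and
 * sizes are invented for illustration only. */
#include "postgres.h"
#include "storage/s_lock.h"

typedef enum
{
    SLOT_FREE,          /* worker idle; a backend may claim this slot */
    SLOT_PLAN_READY,    /* backend has written a plan; worker should run it */
    SLOT_TUPLES_READY   /* worker has produced tuples for the backend */
} WorkerSlotStatus;

#define SLOT_PLAN_SIZE   (8 * 1024)
#define SLOT_TUPLE_SIZE  (64 * 1024)   /* fixed at startup: the drawback above */

typedef struct WorkerSlot
{
    slock_t          mutex;        /* protects status transitions */
    WorkerSlotStatus status;
    char             plan[SLOT_PLAN_SIZE];    /* serialized plan + snapshot */
    uint32           tuple_head;   /* ring-buffer read position (backend) */
    uint32           tuple_tail;   /* ring-buffer write position (worker) */
    char             tuples[SLOT_TUPLE_SIZE]; /* fixed-size tuple queue */
} WorkerSlot;

/* One array, allocated in the main shared memory segment at startup,
 * with one slot per pre-allocated bgworker. */
extern WorkerSlot *worker_slots;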

With Regards,
Amit Kapila.
