Re: how to pass data (tuples) to worker processes?

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: how to pass data (tuples) to worker processes?
Date: 2013-04-07 21:24:45
Message-ID: 5161E41D.2090609@fuzzy.cz
Lists: pgsql-hackers

Hi,

I'm learning how to use the "background worker processes" committed in
9.3. The usage basics are quite nicely illustrated in the worker_spi
extension (kudos to those who designed the feature / extension).

I'm not quite sure how to pass data between the regular backend and a
worker. Implementing the channel (socket/pipe/...) itself is not a big
deal, that's IPC 101, but deciding which data to copy (and how) is.

Say I need to forward a tuple to the worker process - e.g. from a
nodeAgg node, so that the worker can build the hash table. Is there
something (a rule of thumb, method, ...) that would help me to
identify the pieces of data that need to be copied?

Or do I need to go through the objects and decide what to copy
and how on my own?

regards
Tomas


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: how to pass data (tuples) to worker processes?
Date: 2013-08-02 21:43:03
Message-ID: 20130802214302.GV5669@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Tomas Vondra wrote:

> I'm learning how to use the "background worker processes" committed in
> 9.3. The usage basics are quite nicely illustrated in the worker_spi
> extension (kudos to those who designed the feature / extension).

Thanks!

> I'm not quite sure how to pass data between the regular backend and a
> worker. Implementing the channel (socket/pipe/...) itself is not a big
> deal, that's IPC 101, but deciding which data to copy (and how) is.
>
> Say I need to forward a tuple to the worker process - e.g. from a
> nodeAgg node, so that the worker can build the hash table. Is there
> something (a rule of thumb, method, ...) that would help me to
> identify the pieces of data that need to be copied?
>
> Or do I need to go through the objects and decide what to copy
> and how on my own?

Were you able to figure it out? If so, would you share?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andrew Tipton <andrew(at)kiwidrew(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: how to pass data (tuples) to worker processes?
Date: 2013-08-03 10:31:34
Message-ID: CA+M2pVUxk5OP3y==LL+v_P7zrr2qAF6ZpTfpQAbN4z7h_CjQvw@mail.gmail.com
Lists: pgsql-hackers

On Sat, Aug 3, 2013 at 5:43 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:

> Tomas Vondra wrote:
>
> > I'm learning how to use the "background worker processes" committed in
> > 9.3. The usage basics are quite nicely illustrated in the worker_spi
> > extension (kudos to those who designed the feature / extension).
>
> Thanks!
>
> > I'm not quite sure how to pass data between the regular backend and a
> > worker. Implementing the channel (socket/pipe/...) itself is not a big
> > deal, that's IPC 101, but deciding which data to copy (and how) is.
> >
> > [...]
>
> Were you able to figure it out? If so, would you share?

I'm also in the middle of doing some experiments with bgworkers, and for me
it's the IPC part that's proving tricky. I'd love to have a simple socket
that can be used to communicate with the bgworker. But because the
bgworker is launched by the postmaster -- and not by the backend that
registers it -- there's no chance for the bgworker to inherit one end of
the socketpair().
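
A minimal sketch of the problem, assuming the socketpair() is created in
the backend that registers the worker:

int fds[2];

if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
    elog(ERROR, "could not create socketpair: %m");
/* fds[0] and fds[1] now exist only in this backend's descriptor table.
 * The bgworker is fork()ed from the postmaster, whose descriptor table
 * never contained them, so the worker has nothing to inherit. */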

Tomas: in the end, what approach did you use for IPC?

Robert: any chance you could share a few more details on the enhancements
you're planning for bgworkers? I seem to recall reading that communicating
with the dynamic bgworkers after they had been launched was next on your
agenda...

Regards,
Andrew Tipton


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Tipton <andrew(at)kiwidrew(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: how to pass data (tuples) to worker processes?
Date: 2013-08-06 12:59:25
Message-ID: CA+TgmobiD=Bm+-Jci_1uXSGTF7bfzKEw9Uzp1CFQmP2o0ecwCQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Aug 3, 2013 at 6:31 AM, Andrew Tipton <andrew(at)kiwidrew(dot)com> wrote:
> Robert: any chance you could share a few more details on the enhancements
> you're planning for bgworkers? I seem to recall reading that communicating
> with the dynamic bgworkers after they had been launched was next on your
> agenda...

Yeah, it is. I'm working on a patch to allow additional shared memory
segments to be created on the fly. The idea I'm working with is that
a backend that plans to launch a worker will first create a dynamic
shared memory segment, then pass the ID of that segment to the worker
via bgw_main_arg. The worker will map the segment, and then the two
processes can use that to communicate. My thought is to create a
queue abstraction that sits on top of the dynamic shared memory
infrastructure, so that you can set aside a portion of your dynamic
shared memory segment to use as a ring buffer and send messages back
and forth using some kind of API along these lines:

extern void dsm_queue_send(dsm_queue *, char *data, uint64 len);
extern uint64 dsm_queue_receive(dsm_queue *, char **dataptr);
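
To make the intended flow concrete, here is a rough sketch of how the two
sides might fit together. dsm_queue_send() and dsm_queue_receive() are just
the proposed API above, and everything else with a dsm_ prefix
(dsm_create(), dsm_attach(), dsm_segment_handle(), dsm_queue_init(), ...)
is a placeholder name for the in-progress infrastructure, so don't hold me
to the details:

/* In the launching backend: create a segment, carve a queue out of it,
 * and hand the segment's handle to the worker via bgw_main_arg. */
dsm_segment *seg = dsm_create(65536);
dsm_queue *outq = dsm_queue_init(dsm_segment_address(seg), 32768);

BackgroundWorker worker;
BackgroundWorkerHandle *handle;
memset(&worker, 0, sizeof(worker));
/* ... fill in bgw_name, bgw_flags, bgw_main, etc. ... */
worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
RegisterDynamicBackgroundWorker(&worker, &handle);

dsm_queue_send(outq, "hello, worker", 14);

/* In the worker: map the segment named by bgw_main_arg and read. */
dsm_segment *seg = dsm_attach(DatumGetUInt32(main_arg));
dsm_queue *inq = dsm_queue_attach(dsm_segment_address(seg));
char *data;
uint64 len = dsm_queue_receive(inq, &data);   /* returns a pointer into
                                               * the ring buffer, so no
                                               * extra copy is needed */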

It would also be possible to implement message sending and receiving
using pipes, but I'm leaning away from that because it would require
even more OS-dependent code than I'm already having to write, and
writing OS-dependent shim layers is one of the world's less-rewarding
coding tasks; and also because I think it will be easier to achieve
zero-copy semantics using shared memory.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>, "'Andrew Tipton'" <andrew(at)kiwidrew(dot)com>
Cc: "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, "'Tomas Vondra'" <tv(at)fuzzy(dot)cz>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: how to pass data (tuples) to worker processes?
Date: 2013-08-07 05:41:17
Message-ID: 004d01ce9330$bde5f280$39b1d780$@kapila@huawei.com
Lists: pgsql-hackers

On Tuesday, August 06, 2013 6:29 PM Robert Haas wrote:
> On Sat, Aug 3, 2013 at 6:31 AM, Andrew Tipton <andrew(at)kiwidrew(dot)com> wrote:
> > Robert: any chance you could share a few more details on the enhancements
> > you're planning for bgworkers? I seem to recall reading that communicating
> > with the dynamic bgworkers after they had been launched was next on your
> > agenda...
>
> Yeah, it is. I'm working on a patch to allow additional shared memory
> segments to be created on the fly. The idea I'm working with is that
> a backend that plans to launch a worker will first create a dynamic
> shared memory segment, then pass the ID of that segment to the worker
> via bgw_main_arg. The worker will map the segment, and then the two
> processes can use that to communicate. My thought is to create a
> queue abstraction that sits on top of the dynamic shared memory
> infrastructure, so that you can set aside a portion of your dynamic
> shared memory segment to use as a ring buffer and send messages back
> and forth using some kind of API along these lines:
>
> extern void dsm_queue_send(dsm_queue *, char *data, uint64 len);
> extern uint64 dsm_queue_receive(dsm_queue *, char **dataptr);
>
> It would also be possible to implement message sending and receiving
> using pipes, but I'm leaning away from that because it would require
> even more OS-dependent code than I'm already having to write, and
> writing OS-dependent shim layers is one of the world's less-rewarding
> coding tasks; and also because I think it will be easier to achieve
> zero-copy semantics using shared memory.

Another idea for getting parallel tasks done by bgworkers is, rather than
dynamically invoking a new bgworker, to have a set of pre-allocated
bgworkers for a server and then allocate a bgworker from that pre-allocated
array as needed.

We can then allocate the shared memory once at startup, sized by the number
of bgworkers and the information that needs to be shared between the
backend and the bgworkers (plan, tuples, snapshot, ...).

The basic idea can work as below:
a. A backend that wishes to get parallel tasks done by bgworkers divides
the tasks, checks which bgworkers are free, and shares the plan in the
corresponding bgworker's shared memory slot.
b. A bgworker polling its slot of shared memory retrieves the plan and
executes it.
c. The bgworker shares the resulting tuples in its shared memory slot.
d. The backend retrieves the tuples from the shared memory slots of the
bgworkers to which it communicated the plan.
e. The backend sends the tuples back to the client.

This idea has the drawback that the queue of tuples to be shared has to be
of fixed size, since the memory is allocated at the beginning (a rough
sketch of such a slot follows below).
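
A minimal sketch of what such a pre-allocated slot might look like (all of
the names here -- WorkerSlot, SlotState, worker_slots -- are invented for
illustration, not existing PostgreSQL structures):

#define SLOT_QUEUE_SIZE 8192                /* fixed at allocation time */

typedef enum
{
    SLOT_FREE,              /* available for a backend to claim */
    SLOT_PLAN_READY,        /* backend stored a plan; worker may run it */
    SLOT_TUPLES_READY       /* worker stored its result tuples */
} SlotState;

typedef struct WorkerSlot
{
    volatile SlotState state;               /* polled by the bgworker */
    Size    plan_len;
    char    plan[SLOT_QUEUE_SIZE];          /* serialized plan fragment */
    Size    tuples_len;
    char    tuples[SLOT_QUEUE_SIZE];        /* fixed-size tuple queue --
                                             * the drawback noted above */
} WorkerSlot;

/* one slot per pre-allocated bgworker, carved out of the main shared
 * memory segment once at startup */
static WorkerSlot *worker_slots;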

With Regards,
Amit Kapila.


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: how to pass data (tuples) to worker processes?
Date: 2013-08-07 08:29:47
Message-ID: 5202057B.1010408@wizmail.org
Lists: pgsql-hackers

On 06/08/13 13:59, Robert Haas wrote:
> My thought is to create a
> queue abstraction that sits on top of the dynamic shared memory
> infrastructure, so that you can set aside a portion of your dynamic
> shared memory segment to use as a ring buffer and send messages back
> and forth using some kind of API along these lines:

You may find http://quatermass.co.uk/toolsmith/mplib1/ of use here.
--
Cheers,
Jeremy