Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Dilip kumar <dilip(dot)kumar(at)huawei(dot)com>, Jan Lentfer <Jan(dot)Lentfer(at)web(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>
Subject: Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date: 2014-07-16 06:43:25
Message-ID: CABUevEyPFmOtLmqT124BDQtNhHBaDw6YPUijgUXu8UktENMc+A@mail.gmail.com
Lists: pgsql-hackers

On Jul 16, 2014 7:05 AM, "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com> wrote:
>
> Tom Lane wrote:
> > Dilip kumar <dilip(dot)kumar(at)huawei(dot)com> writes:
> > > On 15 July 2014 19:01, Magnus Hagander Wrote,
> > >> I am late to this game, but the first thing to my mind was - do we
> > >> really need the whole forking/threading thing on the client at all?
> >
> > > Thanks for the review, I understand your point, but I think if we
> > > have to do this directly with independent connections,
> > > it's difficult to equally divide the jobs b/w multiple independent
> > > connections.
> >
> > That argument seems like complete nonsense. You're confusing work
> > allocation strategy with the implementation technology for the multiple
> > working threads. I see no reason why a good allocation strategy couldn't
> > work with either approach; indeed, I think it would likely be easier to
> > do some things *without* client-side physical parallelism, because that
> > makes it much simpler to handle feedback between the results of different
> > operational threads.
>
> So you would have one initial connection, which generates a task list;
> then open N libpq connections. Launch one vacuum on each, and then
> sleep on select() on the N sockets. Whenever one returns
> read-ready, the vacuuming is done and we send another item from the task
> list. Repeat until the task list is empty. No need to fork anything.
>

Yeah, those are exactly my points. I think it would be significantly
simpler to do it that way, rather than forking and threading. And also
easier to make portable...

(And as an optimization on Alvaro's suggestion, you can of course reuse the
initial connection as one of the workers, as long as you got the full list
of tasks from it up front, which I think you do anyway in order to sort
the tasks...)

/Magnus
