Re: modeling parallel contention (was: Parallel Append implementation)

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: modeling parallel contention (was: Parallel Append implementation)
Date: 2017-05-05 02:54:05
Message-ID: CAEepm=1KoDvMPGussumtdW=eaj5G_ZANfpBiwionVyCY36Zf5Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 5, 2017 at 2:23 PM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 5 May 2017 at 13:37, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2017-05-02 15:13:58 -0400, Robert Haas wrote:
>>> Multiple people (including David Rowley
>>> as well as folks here at EnterpriseDB) have demonstrated that for
>>> certain queries, we can actually use a lot more workers and everything
>>> works great. The problem is that for other queries, using a lot of
>>> workers works terribly. The planner doesn't know how to figure out
>>> which it'll be - and honestly, I don't either.
>>
>> Have those benchmarks, even in a very informal form, been shared /
>> collected / referenced centrally? I'd be very interested to know where
>> the different contention points are. Possibilities:
>
> I posted mine on [1], although the post does not go into much detail
> about the contention points. I only really briefly mention it at the
> end.

Just for fun, check out pages 42 and 43 of Wei Hong's thesis. He
worked on Berkeley POSTGRES parallel query and a spin-off called XPRS,
and they got linear seq scan scaling up to the number of spindles:

http://db.cs.berkeley.edu/papers/ERL-M93-28.pdf

I gather from flicking through the POSTGRES 4.2 sources and this
stuff about XPRS that they switched from a "launch N workers!" model
to a "generate tasks and schedule them" model somewhere between these
systems. Chapters 2 and 3 cover the problem of avoiding excessive
parallelism that reduces performance, and of adjusting dynamically to
maximize throughput. I suspect we're going that way too at some
point, and it would certainly fix some problems I ran into with
Parallel Shared Hash.
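
To make the "generate tasks and schedule them" idea concrete, here is a
toy sketch. It is not taken from the POSTGRES 4.2 sources or from XPRS;
the function names, the block-range task shape and the use of a thread
pool are all assumptions made for illustration. The point is only that
the amount of work is cut into tasks up front, and the degree of
parallelism becomes a separate knob that a scheduler could adjust.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tasks(n_blocks, blocks_per_task):
    """Split a scan of n_blocks into (start, end) block ranges (hypothetical task shape)."""
    return [
        (start, min(start + blocks_per_task, n_blocks))
        for start in range(0, n_blocks, blocks_per_task)
    ]


def scan_range(block_range):
    """Stand-in for real work: just count the blocks in the range."""
    start, end = block_range
    return end - start


def run_scheduled(n_blocks, blocks_per_task, max_workers):
    """Workers pull tasks from the pool's queue until none remain.

    max_workers caps parallelism independently of how many tasks exist.
    """
    tasks = make_tasks(n_blocks, blocks_per_task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(scan_range, tasks))
```

With this shape, changing max_workers never changes the set of tasks,
which is what would let a scheduler throttle parallelism when adding
workers stops helping.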

XPRS's cost model included resource consumption, not just 'timerons'.
This is something I grappled with when trying to put a price tag on
Parallel Shared Hash plans where just one worker builds the hash table
while the others wait. I removed that plan from the patch because it
became mostly redundant, but when it was there Postgres thought it was
the same cost as a plan where every worker hammers your system
building the same hash table, whereas XPRS would have considered such
a plan ludicrously expensive (depending on Hong's 'w' term, see page 28,
which determines whether you care more about resource usage or
response time).
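
A rough sketch of why the two hash-build plans come out differently
once resource usage is priced in. This is not the formula from the
thesis: the linear blend, the direction of w (0 means only response
time matters, 1 means only resource usage matters) and the numbers are
assumptions chosen purely to illustrate the effect.

```python
def blended_cost(response_time, resource_usage, w):
    """Hypothetical cost: blend response time and total resource usage.

    w = 0.0 scores response time only; w = 1.0 scores resource usage only.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be between 0 and 1")
    return (1.0 - w) * response_time + w * resource_usage


# Made-up numbers: building the hash table takes 10 units of work.
BUILD_TIME = 10.0
WORKERS = 4

# Plan A: one worker builds while the others wait.
plan_a = (BUILD_TIME, BUILD_TIME)
# Plan B: every worker builds its own copy of the same table.
plan_b = (BUILD_TIME, BUILD_TIME * WORKERS)

# Response time only: the plans look identical.
cost_a_time_only = blended_cost(*plan_a, w=0.0)
cost_b_time_only = blended_cost(*plan_b, w=0.0)

# Resource usage weighted in: plan B is clearly more expensive.
cost_a_blended = blended_cost(*plan_a, w=0.5)
cost_b_blended = blended_cost(*plan_b, w=0.5)
```

With w = 0 both plans cost the same, which matches what Postgres
concluded; any w above 0 makes the every-worker-builds plan more
expensive in proportion to the number of workers.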

--
Thomas Munro
http://www.enterprisedb.com
