From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: modeling parallel contention (was: Parallel Append implementation)
Date: 2017-05-05 02:54:05
Message-ID: CAEepm=1KoDvMPGussumtdW=eaj5G_ZANfpBiwionVyCY36Zf5Q@mail.gmail.com
Lists: pgsql-hackers
On Fri, May 5, 2017 at 2:23 PM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 5 May 2017 at 13:37, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2017-05-02 15:13:58 -0400, Robert Haas wrote:
>>> Multiple people (including David Rowley
>>> as well as folks here at EnterpriseDB) have demonstrated that for
>>> certain queries, we can actually use a lot more workers and everything
>>> works great. The problem is that for other queries, using a lot of
>>> workers works terribly. The planner doesn't know how to figure out
>>> which it'll be - and honestly, I don't either.
>>
>> Have those benchmarks, even in a very informal form, been shared /
>> collected / referenced centrally? I'd be very interested to know where
>> the different contention points are. Possibilities:
>
> I posted mine on [1], although the post does not go into much detail
> about the contention points. I only really briefly mention it at the
> end.
Just for fun, check out pages 42 and 43 of Wei Hong's thesis. He
worked on Berkeley POSTGRES parallel query and a spin-off called XPRS,
and they got linear seq scan scaling up to number of spindles:
http://db.cs.berkeley.edu/papers/ERL-M93-28.pdf
I gather from flicking through the POSTGRES 4.2 sources and this
stuff about XPRS that they switched from a "launch N workers!" model
to a "generate tasks and schedule them" model somewhere between these
systems. Chapters 2 and 3 cover the problem of avoiding excessive
parallelism that reduces performance, by adjusting dynamically toward
maximum throughput. I suspect we're going that way too at some point, and it
would certainly fix some problems I ran into with Parallel Shared
Hash.
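To make the contrast concrete, here is a toy sketch (in Python; all names are illustrative, not taken from POSTGRES or XPRS) of the "generate tasks and schedule them" model: scan chunks go into a shared queue and each worker pulls the next chunk when it is free, so the effective parallelism adapts to how fast each worker actually runs, instead of a fixed "launch N workers!" partition stalling on the slowest one.

```python
# Toy "generate tasks and schedule them" model: hypothetical sketch,
# not POSTGRES/XPRS code. Pages of a scan are queued as tasks; idle
# workers pull the next page, so work distribution is self-balancing.
import queue
import threading

def scan_with_task_queue(pages, n_workers):
    tasks = queue.Queue()
    for page in pages:
        tasks.put(page)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                page = tasks.get_nowait()
            except queue.Empty:
                return  # no more tasks: this worker retires
            with lock:
                results.append(page * 2)  # stand-in for real scan work

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

The point of the sketch is only the shape of the control flow: the number of workers and the number of tasks are decoupled, which is what makes it possible to throttle parallelism dynamically.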
XPRS's cost model included resource consumption, not just 'timerons'.
This is something I grappled with when trying to put a price tag on
Parallel Shared Hash plans where just one worker builds the hash table
while the others wait. I removed that plan from the patch because it
became mostly redundant, but when it was there Postgres thought it was
the same cost as a plan where every worker hammers your system
building the same hash table, whereas XPRS would have considered such
a plan ludicrously expensive (depending on his 'w' term, see page 28,
which determines whether you care more about resource usage or
response time).
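As a rough illustration of that trade-off (my own formulation, not Hong's exact notation; see his thesis for the real definition of 'w'), a cost function that blends total resource consumption with response time distinguishes the two hash-build plans above, whereas pure response-time costing cannot:

```python
# Hypothetical XPRS-style blended cost: w = 1 cares only about
# response time, w = 0 only about total resources consumed.
# (Illustrative only; Hong's actual formula is in the thesis.)
def combined_cost(resource: float, response_time: float, w: float) -> float:
    return w * response_time + (1.0 - w) * resource

# Two plans that both finish in time t:
#  - plan A: one worker builds the hash table   (resource = t)
#  - plan B: all n workers redundantly build it (resource = n * t)
t, n = 10.0, 8
plan_a = combined_cost(resource=t,     response_time=t, w=0.5)
plan_b = combined_cost(resource=n * t, response_time=t, w=0.5)
# With w = 1 the plans cost the same; any w < 1 makes the
# redundant-build plan strictly more expensive.
```

This is exactly the gap described above: a planner that prices only response time sees the two plans as equal, while any weight on resource consumption makes the everyone-builds plan look as expensive as it really is.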
--
Thomas Munro
http://www.enterprisedb.com