Re: Parallel Seq Scan

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Parallel Seq Scan
Date: 2014-12-19 19:26:28
Message-ID: 54947BE4.8080900@archidevsys.co.nz
Lists: pgsql-hackers

On 20/12/14 03:54, Heikki Linnakangas wrote:
> On 12/19/2014 04:39 PM, Stephen Frost wrote:
>> * Marko Tiikkaja (marko(at)joh(dot)to) wrote:
>>> On 12/19/14 3:27 PM, Stephen Frost wrote:
>>>> We'd have to coach our users to
>>>> constantly be tweaking the enable_parallel_query (or whatever) option
>>>> for the queries where it helps and turning it off for others. I'm not
>>>> so excited about that.
>>>
>>> I'd be perfectly (that means 100%) happy if it just defaulted to
>>> off, but I could turn it up to 11 whenever I needed it. I don't
>>> believe to be the only one with this opinion, either.
>>
>> Perhaps we should reconsider our general position on hints then and
>> add them so users can define the plan to be used.. For my part, I don't
>> see this as all that much different.
>>
>> Consider if we were just adding HashJoin support today as an example.
>> Would we be happy if we had to default to enable_hashjoin = off? Or if
>> users had to do that regularly because our costing was horrid? It's bad
>> enough that we have to resort to those tweaks today in rare cases.
>
> This is somewhat different. Imagine that we achieve perfect
> parallelization, so that when you set enable_parallel_query=8, every
> query runs exactly 8x faster on an 8-core system, by using all eight
> cores.
>
> Now, you might still want to turn parallelization off, or at least set
> it to a lower setting, on an OLTP system. You might not want a single
> query to hog all CPUs to run one query faster; you'd want to leave
> some for other queries. In particular, if you run a mix of short
> transactions, and some background-like tasks that run for minutes or
> hours, you do not want to starve the short transactions by giving all
> eight CPUs to the background task.
>
> Admittedly, this is a rather crude knob to tune for such things,
> but it's quite intuitive to a DBA: how many CPU cores is one query
> allowed to utilize? And we don't really have anything better.
>
> In real life, there's always some overhead to parallelization, so that
> even if you can make one query run faster by doing it, you might hurt
> overall throughput. To some extent, it's a latency vs. throughput
> tradeoff, and it's quite reasonable to have a GUC for that because
> people have different priorities.
>
> - Heikki
>
>
>
How about 3 numbers:

minCPUs # > 0
maxCPUs # >= minCPUs
fractionOfCPUs # rounded up

If you just have the *number* of CPUs, then a setting that is
appropriate for a quad-core processor may be too *small* for an
octa-core one.

If you just have the *fraction* of CPUs, then a setting that is
appropriate for a quad-core processor may be too *large* for an
octa-core one.
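
To make the idea concrete, here is a rough Python sketch of how the three
numbers might combine. The function name, the validation rules, and the
choice to clamp the rounded-up fraction between minCPUs and maxCPUs are
assumptions for illustration only; none of this is an existing PostgreSQL
setting or API.

```python
import math


def cpus_for_query(total_cpus, min_cpus, max_cpus, fraction_of_cpus):
    """Hypothetical helper: how many CPUs one query may use.

    Assumption: the fraction is applied first (rounded up), and the
    result is then clamped to the range [min_cpus, max_cpus].
    """
    if min_cpus <= 0:
        raise ValueError("min_cpus must be > 0")
    if max_cpus < min_cpus:
        raise ValueError("max_cpus must be >= min_cpus")
    if not 0 < fraction_of_cpus <= 1:
        raise ValueError("fraction_of_cpus must be in (0, 1]")

    # Fraction of the machine, rounded up as proposed above.
    wanted = math.ceil(total_cpus * fraction_of_cpus)

    # Clamp so small machines still get min_cpus and large
    # machines never exceed max_cpus.
    return max(min_cpus, min(max_cpus, wanted))


if __name__ == "__main__":
    # Same three settings on machines of different sizes.
    for cores in (2, 4, 8, 32):
        print(cores, cpus_for_query(cores, min_cpus=2, max_cpus=6,
                                    fraction_of_cpus=0.5))
```

With these example values the fraction scales the limit between quad core
and octa core, while minCPUs and maxCPUs bound it on very small and very
large machines.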

Cheers,
Gavin
