Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, John Gorman <johngorman2(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-23 11:42:51
Message-ID: CAA4eK1+B=c6rNNTNFcap=QXeCaEeDijqdz6dwdrdcD-T58b7ig@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 22, 2015 at 7:23 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Jan 22, 2015 at 5:57 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > 1. Scanning block-by-block has negative impact on performance and
> > I think it will degrade more if we increase parallel count as that can
> > lead to more randomness.
> >
> > 2. Scanning in fixed chunks improves the performance. Increasing
> > parallel count to a very large number might impact the performance,
> > but I think we can have a lower bound below which we will not allow
> > multiple processes to scan the relation.
>
> I'm confused. Your actual test numbers seem to show that the
> performance with the block-by-block approach was slightly higher with
> parallelism than without, whereas the performance with the
> chunk-by-chunk approach was lower with parallelism than without, but
> the text quoted above, summarizing those numbers, says the opposite.
>
> Also, I think testing with 2 workers is probably not enough. I think
> we should test with 8 or even 16.
>

Below is the data with a larger number of workers. The amount of data and
the other configuration settings are the same as before; I have only
increased the parallel worker count:

*Block-By-Block*

*No. of workers/Time (ms)*

            0       2       4       8      16      24      32
Run-1  257851  287353  350091  330193  284913  338001  295057
Run-2  263241  314083  342166  347337  378057  351916  348292
Run-3  315374  334208  389907  340327  328695  330048  330102
Run-4  301054  312790  314682  352835  323926  324042  302147
Run-5  304547  314171  349158  350191  350468  341219  281315

*Fixed-Chunks*

*No. of workers/Time (ms)*

            0       2       4       8      16      24      32
Run-1  250536  266279  251263  234347   87930   50474   35474
Run-2  249587  230628  225648  193340   83036   35140    9100
Run-3  234963  220671  230002  256183  105382   62493   27903
Run-4  239111  245448  224057  189196  123780   63794   24746
Run-5  239937  222820  219025  220478  114007   77965   39766
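
Since the individual runs vary quite a bit, one way to compare the two
approaches is to take the median time per worker count. The following is
only a sketch for summarizing the figures above; the names BLOCK_BY_BLOCK,
FIXED_CHUNKS and medians are made up for this illustration and are not part
of the patch or the test setup.

```python
from statistics import median

# Worker counts used in the test (0 means no parallel workers).
WORKERS = [0, 2, 4, 8, 16, 24, 32]

# Execution times in ms, one row per run, copied from the tables above.
BLOCK_BY_BLOCK = [
    [257851, 287353, 350091, 330193, 284913, 338001, 295057],
    [263241, 314083, 342166, 347337, 378057, 351916, 348292],
    [315374, 334208, 389907, 340327, 328695, 330048, 330102],
    [301054, 312790, 314682, 352835, 323926, 324042, 302147],
    [304547, 314171, 349158, 350191, 350468, 341219, 281315],
]
FIXED_CHUNKS = [
    [250536, 266279, 251263, 234347, 87930, 50474, 35474],
    [249587, 230628, 225648, 193340, 83036, 35140, 9100],
    [234963, 220671, 230002, 256183, 105382, 62493, 27903],
    [239111, 245448, 224057, 189196, 123780, 63794, 24746],
    [239937, 222820, 219025, 220478, 114007, 77965, 39766],
]


def medians(runs):
    """Return {worker_count: median time in ms} across all runs."""
    # zip(*runs) transposes rows (runs) into columns (worker counts).
    return {w: median(col) for w, col in zip(WORKERS, zip(*runs))}


if __name__ == "__main__":
    for name, runs in (("Block-By-Block", BLOCK_BY_BLOCK),
                       ("Fixed-Chunks", FIXED_CHUNKS)):
        print(name)
        for workers, ms in medians(runs).items():
            print(f"  {workers:>2} workers: {ms} ms")
```

The median is used rather than the mean so that a single unusually fast or
slow run does not dominate the summary.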

The trend remains the same, although there is some variation.
With the block-by-block approach, performance dips (execution takes
more time) as the number of workers increases, though it stabilizes at
some higher value. I still feel the result is random, since this
approach leads to a random scan.
With the fixed-chunk approach, performance improves with more
workers, especially at higher worker counts (16 and above in the
data above).
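
To illustrate why the two approaches can differ in access pattern, here is
a simplified model of how blocks might end up distributed among workers.
This is a sketch under assumptions, not the code from the patch: it assumes
that in the block-by-block case workers take blocks in strict rotation, and
that in the fixed-chunk case each worker gets one contiguous range. The
function names are hypothetical.

```python
def block_by_block(nblocks, nworkers):
    """Idealized interleaving: worker w gets blocks w, w + nworkers, ...

    Assumption: workers claim the next block in strict rotation. In a real
    run the order would depend on scheduling and could be less regular.
    """
    return [list(range(w, nblocks, nworkers)) for w in range(nworkers)]


def fixed_chunks(nblocks, nworkers):
    """Split the relation into one contiguous block range per worker.

    Assumption: a single chunk per worker, with any remainder blocks
    spread over the first few workers.
    """
    base, extra = divmod(nblocks, nworkers)
    assignment = []
    start = 0
    for w in range(nworkers):
        size = base + (1 if w < extra else 0)
        assignment.append(list(range(start, start + size)))
        start += size
    return assignment


def max_gap(blocks):
    """Largest jump between consecutive blocks read by one worker."""
    return max((b - a for a, b in zip(blocks, blocks[1:])), default=0)


if __name__ == "__main__":
    nblocks, nworkers = 16, 4
    for name, fn in (("block-by-block", block_by_block),
                     ("fixed-chunks", fixed_chunks)):
        print(name)
        for w, blocks in enumerate(fn(nblocks, nworkers)):
            print(f"  worker {w}: {blocks} (max gap {max_gap(blocks)})")
```

In this model each worker in the block-by-block case skips nworkers blocks
between reads, so its own reads are never adjacent, while in the
fixed-chunk case every worker reads consecutive blocks. Whether that is
what explains the numbers above is not established by this sketch.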

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
