Re: Initial prefetch performance testing

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial prefetch performance testing
Date: 2008-09-22 17:09:01
Message-ID: 8763onvzyp.fsf@oxford.xeocode.com
Lists: pgsql-hackers


[Resending because the attachment was too large for the -hackers list --
weren't we going to raise the size limit when we killed -patches?]

Greg Smith <gsmith(at)gregsmith(dot)com> writes:

> Using the maximum prefetch working set tested, 8192, here's the speedup
> multiplier on this benchmark for both sorted and unsorted requests using a 8GB
> file:
>
> OS               Spindles  Unsorted X  Sorted X
> 1:Linux          1         2.3         2.1
> 2:Linux          1         1.5         1.0
> 3:Solaris        1         2.6         3.0
> 4:Linux          3         6.3         2.8
> 5:Linux (Stark)  3         5.3         3.6
> 6:Linux          10        5.4         4.9
> 7:Solaris*       48        16.9        9.2

Incidentally I've been looking primarily at the sorted numbers because they
parallel bitmap heap scans. (Note that the heap scan is only about half the
i/o of a bitmap index scan + heap scan so even if it's infinitely faster it'll
only halve the time spent in the two nodes.)
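
To put rough numbers on that parenthetical: if the heap scan accounts for a
fraction f of the combined i/o time and prefetching speeds up only that part
by a factor s, the combined speedup is 1 / ((1 - f) + f / s). The sketch below
just evaluates that formula. f = 0.5 is the "about half" estimate above, and
the function name is made up for illustration; it is not Postgres code.

```c
#include <assert.h>

/*
 * Overall speedup of (bitmap index scan + heap scan) when only the heap
 * scan, which accounts for heap_fraction of the original time, becomes
 * heap_speedup times faster. Hypothetical helper for illustration.
 */
double
combined_speedup(double heap_fraction, double heap_speedup)
{
    assert(heap_fraction >= 0.0 && heap_fraction <= 1.0);
    assert(heap_speedup > 0.0);

    /* Unaffected part keeps its time; the heap scan part shrinks. */
    return 1.0 / ((1.0 - heap_fraction) + heap_fraction / heap_speedup);
}
```

With sorted multipliers from the table, combined_speedup(0.5, 3.6) is about
1.57 and combined_speedup(0.5, 9.2) is about 1.80; with f = 0.5 no heap scan
speedup can push the result past 2.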

Hm, I'm disappointed with the 48-drive array here. I wonder why it maxed out
at only 10x the bandwidth of one drive. I would have expected 24x or more.
I wonder if Solaris's aio has an internal limit on how many pending i/o
requests it can handle. Perhaps it's a tunable?

Unfortunately I don't see a convenient, minimally invasive way to integrate aio
into Postgres. With posix_fadvise we can just issue the advice and then forget
about it. With aio we would pretty much have to pick a target buffer, pin it,
issue the aio and then remember the pin later when we need to read the buffer.
That would require restructuring the code significantly. I'm quite surprised
Solaris doesn't support posix_fadvise -- perhaps it's in some other version of
Solaris?
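
As a rough sketch of the difference described above, in plain POSIX C rather
than Postgres code: the helper names and BLOCK_SIZE are made up for
illustration, and nothing here touches the buffer manager. The point is only
that the aio call needs a destination buffer and a control block up front,
both of which must be tracked until completion, while the fadvise call needs
neither.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192     /* assumed block size for this sketch */

/*
 * posix_fadvise style: hint and forget. No buffer is chosen and nothing
 * has to be remembered until the ordinary read happens later.
 * Returns 0 on success or an error number (errno is not set).
 */
int
prefetch_hint(int fd, off_t blockno)
{
    return posix_fadvise(fd, blockno * BLOCK_SIZE, BLOCK_SIZE,
                         POSIX_FADV_WILLNEED);
}

/*
 * aio style: the destination buffer must be picked before the request is
 * issued, and both buf and cb must stay valid (in Postgres terms, the
 * buffer stays pinned) until the read completes.
 */
int
aio_start_read(int fd, off_t blockno, char *buf, struct aiocb *cb)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = BLOCK_SIZE;
    cb->aio_offset = blockno * BLOCK_SIZE;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE;     /* no completion signal */
    return aio_read(cb);
}

/* Wait for a request started above; returns bytes read, or -1 on error. */
ssize_t
aio_finish_read(struct aiocb *cb)
{
    const struct aiocb *const list[1] = { cb };

    while (aio_error(cb) == EINPROGRESS)
        (void) aio_suspend(list, 1, NULL);
    return aio_return(cb);
}
```

On older glibc versions the aio functions live in librt, so linking may need
-lrt.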

Here's a graph of results from this program for arrays of various sizes on a
single machine:

http://wiki.postgresql.org/images/a/a3/Results.svg

Each colour corresponds to an array of a different number of spindles ranging
from 1 to 15 drives. The X axis is how much prefetching was done and the Y
axis is the bandwidth obtained.

There is a distinct maximum followed by a dropoff, and it would be great to get
some data points for larger arrays to understand where that maximum moves as
the array gets larger.
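
For readers without the test program at hand, "how much prefetching was done"
can be pictured as the number of posix_fadvise hints kept outstanding ahead of
the read position. The loop below is a minimal sketch under that assumption,
not the actual test program; read_with_prefetch, its parameters and BLOCK_SIZE
are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192     /* assumed block size for this sketch */

/*
 * Read nblocks blocks in the order given by blocks[], keeping hints for up
 * to `depth` upcoming blocks outstanding ahead of the one being read.
 * depth = 0 degenerates to plain synchronous reads.
 * Returns the number of blocks that were read in full.
 */
size_t
read_with_prefetch(int fd, const off_t *blocks, size_t nblocks, size_t depth)
{
    char    buf[BLOCK_SIZE];
    size_t  advised = 0;    /* blocks[] entries below this index need no hint */
    size_t  done = 0;

    assert(blocks != NULL || nblocks == 0);

    for (size_t i = 0; i < nblocks; i++)
    {
        /* The block about to be read no longer needs a hint. */
        if (advised < i + 1)
            advised = i + 1;

        /* Top up the window: hint blocks i+1 .. i+depth. */
        while (advised < nblocks && advised - i <= depth)
        {
            /* Purely advisory, so a failure here is ignored. */
            (void) posix_fadvise(fd, blocks[advised] * BLOCK_SIZE,
                                 BLOCK_SIZE, POSIX_FADV_WILLNEED);
            advised++;
        }

        if (pread(fd, buf, BLOCK_SIZE, blocks[i] * BLOCK_SIZE) == BLOCK_SIZE)
            done++;
    }
    return done;
}
```

Timing this loop for a range of depth values on one array would give one
curve of the kind shown in the graph.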

> Conclusion: on all the systems I tested on, this approach gave excellent
> results, which makes me feel confident that I should see a corresponding
> speedup on database-level tests that use this same basic technique. I'm not
> sure whether it might make sense to bundle this test program up somehow so
> others can use it for similar compatibility tests (I'm thinking of something
> similar to contrib/test_fsync), will revisit that after the rest of the review.
>
> Next step: I've got two data sets (one generated, one real-world sample) that
> should demonstrate a useful heap scan prefetch speedup, and one test program I
> think will demonstrate whether the sequential scan prefetch code works right.
> Now that I've vetted all the hardware/OS combinations I hope I can squeeze that
> in this week, I don't need to test all of them now that I know which are the
> interesting systems.

I have an updated patch I'll be sending along shortly. You might want to test
with that?
