Re: There's random access and then there's random access

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: There's random access and then there's random access
Date: 2007-12-11 14:32:26
Message-ID: 8763z5xpxh.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Gregory Stark" <stark(at)enterprisedb(dot)com> writes:

> I think this will be easiest to do for bitmap index scans. Since we gather up
> all the pages we'll need before starting the heap scan, we can easily skim
> through them, issue posix_fadvises for at least a certain number ahead of the
> actual read point, and then proceed with the rest of the scan unchanged.
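
To make that concrete, the core of such a prefetch pass might look roughly like
the sketch below. This is only an illustration of the idea, not the actual
patch: the function and variable names are invented (though preread_pages
matches the GUC used in the tests below), and real code would go through the
buffer manager rather than calling posix_fadvise() on a raw fd directly.

/* Illustrative sketch only -- invented names, not the actual patch. */
#define _XOPEN_SOURCE 600
#include <fcntl.h>

#define BLCKSZ 8192             /* PostgreSQL block size */

/*
 * Walk the bitmap's sorted list of heap block numbers and tell the
 * kernel we will soon need every page between the current read point
 * and the prefetch horizon.  POSIX_FADV_WILLNEED starts the reads
 * asynchronously, so by the time the scan read()s a page it is
 * likely already in the OS cache.
 */
static void
preread_heap_pages(int fd, const long *blocks, int read_pos,
                   int nblocks, int preread_pages)
{
    int horizon = read_pos + preread_pages;
    int i;

    for (i = read_pos; i < nblocks && i < horizon; i++)
        (void) posix_fadvise(fd, (off_t) blocks[i] * BLCKSZ,
                             BLCKSZ, POSIX_FADV_WILLNEED);
}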

I've written up a simple test implementation of prefetching using
posix_fadvise(). Here are some nice results on a query accessing 1,000 records
from a 10G table with 300 million records:

postgres=# set preread_pages=0; explain analyze select (select count(*) from h where h = any (x)) from (select random_array(1000,1,300000000) as x)x;
SET
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan x  (cost=0.00..115.69 rows=1 width=32) (actual time=6069.505..6069.509 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.058..0.061 rows=1 loops=1)
         SubPlan
           ->  Aggregate  (cost=115.66..115.67 rows=1 width=0) (actual time=6069.425..6069.426 rows=1 loops=1)
                 ->  Bitmap Heap Scan on h  (cost=75.49..115.63 rows=10 width=0) (actual time=3543.107..6068.335 rows=1000 loops=1)
                       Recheck Cond: (h = ANY ($0))
                       ->  Bitmap Index Scan on hi  (cost=0.00..75.49 rows=10 width=0) (actual time=3542.220..3542.220 rows=1000 loops=1)
                             Index Cond: (h = ANY ($0))
 Total runtime: 6069.632 ms
(9 rows)

postgres=# set preread_pages=300; explain analyze select (select count(*) from h where h = any (x)) from (select random_array(1000,1,300000000) as x)x;
SET
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan x  (cost=0.00..115.69 rows=1 width=32) (actual time=3945.602..3945.607 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.060..0.064 rows=1 loops=1)
         SubPlan
           ->  Aggregate  (cost=115.66..115.67 rows=1 width=0) (actual time=3945.520..3945.521 rows=1 loops=1)
                 ->  Bitmap Heap Scan on h  (cost=75.49..115.63 rows=10 width=0) (actual time=3505.546..3944.817 rows=1000 loops=1)
                       Recheck Cond: (h = ANY ($0))
                       ->  Bitmap Index Scan on hi  (cost=0.00..75.49 rows=10 width=0) (actual time=3452.759..3452.759 rows=1000 loops=1)
                             Index Cond: (h = ANY ($0))
 Total runtime: 3945.730 ms
(9 rows)

Note that while the query as a whole is only about 50% faster, the bitmap heap
scan specifically is 575% faster than without readahead.

It would be nice to optimize the bitmap index scan as well, but that will be a
bit trickier, and it probably won't be able to cover as many pages, so it's
unlikely to see the same 5x speedup as the heap scan.

Also, this is with fairly aggressive readahead, which only makes sense for
queries that look a lot like this one and will read all the tuples. For a more
general solution I think it would make sense to give up a bit of performance in
exchange for some protection against doing unnecessary I/O in cases where the
query isn't actually going to read all the tuples.
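
One way to get that protection, for instance, would be to ramp the prefetch
distance up gradually as the scan proceeds, so a query abandoned after a few
rows wastes at most a handful of advisory reads. A sketch of that idea (again
with invented names, not code from my test implementation):

/*
 * Sketch: grow the prefetch distance gradually instead of jumping
 * straight to the configured target.  A scan that stops early then
 * issues only a few wasted posix_fadvise() calls.
 * (Illustrative only -- not from the test implementation.)
 */
static int
next_prefetch_distance(int current, int target)
{
    if (current == 0)
        return 1;            /* begin with a single page ahead */
    if (current * 2 < target)
        return current * 2;  /* double while below the target */
    return target;           /* cap at the configured maximum */
}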

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
