Re: New style of hash join proposal

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 13:14:06
Message-ID: 87ve3lcx29.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> We currently execute a lot of joins as Nested Loops which would be more
>> efficient if we could batch together all the outer keys and execute a single
>> inner bitmap index scan for all of them together.
>
> Please give an example of what you're talking about that you think we
> can't do now.

So basically given a simple join like:

postgres=# explain analyze select * from a where i in (select i from b);
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Hash IN Join (cost=3.25..181.75 rows=100 width=4) (actual time=0.704..44.262 rows=100 loops=1)
Hash Cond: (a.i = b.i)
-> Seq Scan on a (cost=0.00..140.00 rows=10000 width=4) (actual time=0.034..20.864 rows=10000 loops=1)
-> Hash (cost=2.00..2.00 rows=100 width=4) (actual time=0.530..0.530 rows=100 loops=1)
-> Seq Scan on b (cost=0.00..2.00 rows=100 width=4) (actual time=0.013..0.236 rows=100 loops=1)
Total runtime: 44.550 ms
(6 rows)

Note that we're doing a full sequential scan of "a" even though we've already
finished hashing "b" and know full well which keys we'll need. If we have an
index on "a" and "b" is sufficiently smaller than "a", as in this case, then
we could do a bitmap index scan on "a" and pull out just those keys.

In a simple case like this you could hack it by doing a query like:

postgres=# explain analyze select * from a where i = any (array(select i from b));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on a (cost=40.59..64.01 rows=10 width=4) (actual time=2.193..2.535 rows=100 loops=1)
Recheck Cond: (i = ANY ($0))
InitPlan
-> Seq Scan on b (cost=0.00..2.00 rows=100 width=4) (actual time=0.024..0.279 rows=100 loops=1)
-> Bitmap Index Scan on ai (cost=0.00..38.59 rows=10 width=0) (actual time=2.155..2.155 rows=100 loops=1)
Index Cond: (i = ANY ($0))
Total runtime: 2.829 ms
(7 rows)

This is effectively an equivalent plan but it runs 20x faster.

But constructing an array is pointless, we could just do a hash_seq_search
directly and in any case I'm not sure it's always possible to transform the
query in this way.

I was thinking we could pass the hash itself as a parameter to the bitmap
index scan kind of like how we pass the bitmap structures up from bitmap index
scans to bitmap heap scans and get a plan like:

QUERY PLAN
------------------------------------------------------------------------------------------------------------
Hash IN Join
Hash Cond: (a.i = b.i)
-> Bitmap Heap Scan on a
Recheck Cond: (a.i = ANY ($0))
-> Bitmap Index Scan on ai
Index Cond: (i = ANY ($0))
-> Hash
-> Seq Scan on b

The really promising thing about this is it would actually simplify the work
in the case where the hash overflows. Instead of having to set aside tuples
for subsequent batches we would know we got all the matching records in the
index scan. So we can throw out the hash and start a new one for each batch.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2008-03-17 13:16:59 Re: Remove hacks for old bad qsort() implementations?
Previous Message Mike Aubury 2008-03-17 11:26:59 Request for feature - ECPGget_PGconn