Re: New style of hash join proposal

Lists: pgsql-hackers
From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: New style of hash join proposal
Date: 2007-12-13 13:13:33
Message-ID: 87y7by4u0y.fsf@oxford.xeocode.com

We currently execute a lot of joins as Nested Loops which would be more
efficient if we could batch together all the outer keys and execute a single
inner bitmap index scan for all of them together.

Essentially what I'm saying is that we're missing a trick with Hash Joins
which currently require that we can execute the inner side once without any
parameters from the outer side.

Instead what we could do is build up the hash table, then scan the hash table
building up an array of keys and pass them as a parameter to the inner side.
The inner side could do a bitmap index scan to fetch them all at once and
start returning them just as normal to the hash join.
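As a sketch of the executor shape being proposed (illustrative Python rather than PostgreSQL's C; the sorted pass over `probe_index` stands in for a bitmap index scan visiting matches in heap order, and all names here are invented):

```python
from collections import defaultdict

def proposed_hash_join(build_rows, probe_index, key_of):
    # Phase 1: build the hash table, which also collects every key
    # we will need from the other side.
    table = defaultdict(list)
    for row in build_rows:
        table[key_of(row)].append(row)

    # Phase 2: pass the whole key set to the inner side, which fetches
    # all matches in one ordered pass (the bitmap-index-scan stand-in),
    # then returns rows to the join as normal.
    wanted = set(table)
    for key, probe_row in sorted(probe_index):
        if key in wanted:
            for build_row in table[key]:
                yield build_row, probe_row

# Toy data: an "index" over table a, and a small table b to hash.
a_index = [(i, ('a', i)) for i in range(1000)]
b_rows = [('b', k) for k in (5, 5, 42, 977)]

result = list(proposed_hash_join(b_rows, a_index, key_of=lambda r: r[1]))
```

Only the rows of "a" whose keys appear in the hash are touched, each exactly once and in key order.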

There are a couple details:

1) Batched hash joins. Actually I think this would be fairly straightforward.
You want to rescan the inner side once for each batch. That would actually
be easier than what we currently do with saving tuples to files and all
that.

2) How to pass the keys. This could be a bit tricky especially for
multi-column keys. My first thought was to build up an actual Array node
but that only really works for single-column keys I think. Besides it would
be more efficient to somehow arrange to pass over a reference to the whole
hash.
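On point (2), a hash table handles multi-column keys with no special representation, since the tuple of key columns can be the key itself; a toy illustration (invented data):

```python
# A multi-column join key needs no Array-style encoding in a hash
# table: key on the tuple of key columns directly. Illustrative only.

rows = [(2007, 4, 'inv-a'), (2007, 4, 'inv-b'), (2008, 1, 'inv-c')]

table = {}
for year, quarter, payload in rows:
    table.setdefault((year, quarter), []).append(payload)

# Probing with a two-column key is one lookup.
matches = table.get((2007, 4), [])
```

This is part of why passing a reference to the whole hash is attractive compared with flattening keys into an array.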

I fear the real complexity would be (as always) in the planner rather than the
executor. I haven't really looked into what it would take to arrange this or
how to decide when to do it.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


From: Decibel! <decibel(at)decibel(dot)org>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2007-12-20 00:24:33
Message-ID: 2E697850-E210-46F4-8602-BA68E33E5D29@decibel.org
Lists: pgsql-hackers

On Dec 13, 2007, at 7:13 AM, Gregory Stark wrote:
> We currently execute a lot of joins as Nested Loops which would be more
> efficient if we could batch together all the outer keys and execute a single
> inner bitmap index scan for all of them together.
>
> Essentially what I'm saying is that we're missing a trick with Hash Joins
> which currently require that we can execute the inner side once without any
> parameters from the outer side.
>
> Instead what we could do is build up the hash table, then scan the hash table
> building up an array of keys and pass them as a parameter to the inner side.
> The inner side could do a bitmap index scan to fetch them all at once and
> start returning them just as normal to the hash join.
>
> There are a couple details:
>
> 1) Batched hash joins. Actually I think this would be fairly straightforward.
> You want to rescan the inner side once for each batch. That would actually
> be easier than what we currently do with saving tuples to files and all
> that.
>
> 2) How to pass the keys. This could be a bit tricky especially for
> multi-column keys. My first thought was to build up an actual Array node
> but that only really works for single-column keys I think. Besides it would
> be more efficient to somehow arrange to pass over a reference to the whole
> hash.
>
> I fear the real complexity would be (as always) in the planner rather than the
> executor. I haven't really looked into what it would take to arrange this or
> how to decide when to do it.

TODO?
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Decibel!" <decibel(at)decibel(dot)org>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2007-12-22 02:50:08
Message-ID: 200712220250.lBM2o8Y07366@momjian.us
Lists: pgsql-hackers

Decibel! wrote:
> > I fear the real complexity would be (as always) in the planner rather than
> > the executor. I haven't really looked into what it would take to arrange
> > this or how to decide when to do it.
>
> TODO?

This email was added to the 8.4 queue:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

Once we are in 8.4 development we can decide if we are going to
implement it or add it to the TODO list.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 02:34:16
Message-ID: 21264.1205721256@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> We currently execute a lot of joins as Nested Loops which would be more
> efficient if we could batch together all the outer keys and execute a single
> inner bitmap index scan for all of them together.

Please give an example of what you're talking about that you think we
can't do now.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 13:14:06
Message-ID: 87ve3lcx29.fsf@oxford.xeocode.com
Lists: pgsql-hackers


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> We currently execute a lot of joins as Nested Loops which would be more
>> efficient if we could batch together all the outer keys and execute a single
>> inner bitmap index scan for all of them together.
>
> Please give an example of what you're talking about that you think we
> can't do now.

So basically given a simple join like:

postgres=# explain analyze select * from a where i in (select i from b);
QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Hash IN Join  (cost=3.25..181.75 rows=100 width=4) (actual time=0.704..44.262 rows=100 loops=1)
   Hash Cond: (a.i = b.i)
   ->  Seq Scan on a  (cost=0.00..140.00 rows=10000 width=4) (actual time=0.034..20.864 rows=10000 loops=1)
   ->  Hash  (cost=2.00..2.00 rows=100 width=4) (actual time=0.530..0.530 rows=100 loops=1)
         ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=4) (actual time=0.013..0.236 rows=100 loops=1)
Total runtime: 44.550 ms
(6 rows)

Note that we're doing a full sequential scan of "a" even though we've already
finished hashing "b" and know full well which keys we'll need. If we have an
index on "a" and "b" is sufficiently smaller than "a", as in this case, then
we could do a bitmap index scan on "a" and pull out just those keys.

In a simple case like this you could hack it by doing a query like:

postgres=# explain analyze select * from a where i = any (array(select i from b));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on a  (cost=40.59..64.01 rows=10 width=4) (actual time=2.193..2.535 rows=100 loops=1)
   Recheck Cond: (i = ANY ($0))
   InitPlan
     ->  Seq Scan on b  (cost=0.00..2.00 rows=100 width=4) (actual time=0.024..0.279 rows=100 loops=1)
   ->  Bitmap Index Scan on ai  (cost=0.00..38.59 rows=10 width=0) (actual time=2.155..2.155 rows=100 loops=1)
         Index Cond: (i = ANY ($0))
Total runtime: 2.829 ms
(7 rows)

This is effectively an equivalent plan but it runs 20x faster.

But constructing an array is pointless; we could just do a hash_seq_search
directly, and in any case I'm not sure it's always possible to transform the
query in this way.

I was thinking we could pass the hash itself as a parameter to the bitmap
index scan kind of like how we pass the bitmap structures up from bitmap index
scans to bitmap heap scans and get a plan like:

QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Hash IN Join
   Hash Cond: (a.i = b.i)
   ->  Bitmap Heap Scan on a
         Recheck Cond: (a.i = ANY ($0))
         ->  Bitmap Index Scan on ai
               Index Cond: (i = ANY ($0))
   ->  Hash
         ->  Seq Scan on b

The really promising thing about this is it would actually simplify the work
in the case where the hash overflows. Instead of having to set aside tuples
for subsequent batches we would know we got all the matching records in the
index scan. So we can throw out the hash and start a new one for each batch.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 14:41:33
Message-ID: 9251.1205764893@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Please give an example of what you're talking about that you think we
>> can't do now.

> Note that we're doing a full sequential scan of "a" even though we've already
> finished hashing "b" and know full well which keys we'll need. If we have an
> index on "a" and "b" is sufficiently smaller than "a", as in this case, then
> we could do a bitmap index scan on "a" and pull out just those keys.

You mean like this?

regression=# explain select * from tenk1 a where unique1 in (select f1 from int4_tbl b);
QUERY PLAN
-------------------------------------------------------------------------------------
 Nested Loop  (cost=1.06..42.52 rows=5 width=244)
   ->  HashAggregate  (cost=1.06..1.11 rows=5 width=4)
         ->  Seq Scan on int4_tbl b  (cost=0.00..1.05 rows=5 width=4)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..8.27 rows=1 width=244)
         Index Cond: (a.unique1 = b.f1)
(5 rows)

In the example you give, this type of plan was rejected because there
were too many rows in the subplan (or so I suppose anyway; you might
play around with the cost constants and see what happens).

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 16:01:26
Message-ID: 877ig1cpbd.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>> Please give an example of what you're talking about that you think we
>>> can't do now.
>
>> Note that we're doing a full sequential scan of "a" even though we've already
>> finished hashing "b" and know full well which keys we'll need. If we have an
>> index on "a" and "b" is sufficiently smaller than "a", as in this case, then
>> we could do a bitmap index scan on "a" and pull out just those keys.
>
> You mean like this?
>
> regression=# explain select * from tenk1 a where unique1 in (select f1 from int4_tbl b);
> QUERY PLAN
> -------------------------------------------------------------------------------------
>  Nested Loop  (cost=1.06..42.52 rows=5 width=244)
>    ->  HashAggregate  (cost=1.06..1.11 rows=5 width=4)
>          ->  Seq Scan on int4_tbl b  (cost=0.00..1.05 rows=5 width=4)
>    ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..8.27 rows=1 width=244)
>          Index Cond: (a.unique1 = b.f1)
> (5 rows)
>
> In the example you give, this type of plan was rejected because there
> were too many rows in the subplan (or so I suppose anyway; you might
> play around with the cost constants and see what happens).

Sort of, except using a bitmap index scan, which is precisely what would allow
it to work for much larger sets of records.

It doesn't have to be an IN join either. Hash joins are used for normal joins
for even quite large sets of records. If the join is very selective then a
nested loop is fine, and if it covers most of the table then a sequential scan
might be fine. But in many cases you're dealing with a smallish percentage of
a very large table. That's what bitmap index scans are tailor made for.

Cases like:

select * from invoice join invoice_detail using (invoice_id) where invoice.quarter='Q4'

is going to pull out thousands of invoices from the invoice table, then want
to pull out all the matching invoice_detail records from the invoice_detail
table. It'll probably be 5-10% of the invoice_detail table, making a sequential
scan a big waste, but enough records that a nested loop with a plain index scan
will be very slow. Worse, if it does the hash join there may be enough
invoices to force the hash join to work in batches.

It would be ideal if it could scan the invoices using an index, toss them all
in a hash, then do a bitmap index scan to pull out all the matching detail
records. If there are multiple batches it can start a whole new index scan for
each of the batches.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 16:52:18
Message-ID: 87tzj5b8e5.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Gregory Stark" <stark(at)enterprisedb(dot)com> writes:

> It would be ideal if it could scan the invoices using an index, toss them all
> in a hash, then do a bitmap index scan to pull out all the matching detail
> records. If there are multiple batches it can start a whole new index scan for
> each of the batches.

A more general solution would be to tackle the broader problem of postponing
the heap lookups until we really need columns which aren't present in the
index keys.

So something like the aforementioned

select * from invoice join invoice_detail using (invoice_id) where invoice.quarter='Q4'

could be done with something like

 Heap Scan on invoice_detail
   ->  Heap Scan on invoice
         ->  Nested Loop
               ->  Index Scan on invoice_quarter
                     Index Cond: (quarter='Q4')
               ->  Index Scan on pk_invoice_detail
                     Index Cond: (invoice_id = $0)

But that would be a much more wide-ranging change. And it would still not be
sequential unless we do extra work to sort the index tuples by tid.

There would be plenty of fiddly bits around which paths it's safe to execute
prior to checking the visibility as well. Obviously the visibility would have
to be checked before things like Unique or Aggregate nodes.
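The postponed-heap-lookup shape can be sketched as follows (illustrative Python; tids are plain ints, the "heaps" are dicts, and the visibility checks are omitted):

```python
# Sketch of the "postponed heap lookup" idea: join on index entries
# (key, tid) only, sort the surviving tids, and fetch full rows from
# the heap last, so each heap page is visited in order.

def join_then_fetch(outer_index, inner_index, heap_outer, heap_inner):
    # Join using index tuples alone, carrying tids instead of rows.
    inner_by_key = {}
    for key, tid in inner_index:
        inner_by_key.setdefault(key, []).append(tid)
    matches = [(otid, itid)
               for key, otid in outer_index
               for itid in inner_by_key.get(key, [])]
    # The "extra work": sort by tid before touching the heap.
    matches.sort()
    return [(heap_outer[o], heap_inner[i]) for o, i in matches]

# Invented toy data standing in for invoice / invoice_detail,
# with invoice_id as the join key in both index entry lists.
outer_index = [(101, 2), (102, 0)]              # (invoice_id, tid)
inner_index = [(101, 5), (101, 9), (103, 7)]    # (invoice_id, tid)
heap_outer = {0: 'invoice-102', 2: 'invoice-101'}
heap_inner = {5: 'detail-5', 7: 'detail-7', 9: 'detail-9'}

pairs = join_then_fetch(outer_index, inner_index, heap_outer, heap_inner)
```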

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 17:05:47
Message-ID: 14662.1205773547@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> It would be ideal if it could scan the invoices using an index, toss them all
> in a hash, then do a bitmap index scan to pull out all the matching detail
> records. If there are multiple batches it can start a whole new index scan for
> each of the batches.

I don't understand which part of "we can do that now" isn't clear to you.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 17:38:38
Message-ID: 87prttb68x.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> It would be ideal if it could scan the invoices using an index, toss them all
>> in a hash, then do a bitmap index scan to pull out all the matching detail
>> records. If there are multiple batches it can start a whole new index scan for
>> each of the batches.
>
> I don't understand which part of "we can do that now" isn't clear to you.

Uh, except we can't.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 19:43:32
Message-ID: 25307.1205783012@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> I don't understand which part of "we can do that now" isn't clear to you.

> Uh, except we can't.

I already demonstrated that we could. If the problem is that the
planner is cutting over from one plan type to the other at the wrong
rowcount because of bad cost estimates, that's a different issue
altogether.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 21:00:17
Message-ID: 87d4ptawwu.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>> I don't understand which part of "we can do that now" isn't clear to you.
>
>> Uh, except we can't.
>
> I already demonstrated that we could. If the problem is that the
> planner is cutting over from one plan type to the other at the wrong
> rowcount because of bad cost estimates, that's a different issue
> altogether.

We seem to be talking past each other. The plan you showed is analogous but
using a plain old index scan. That means it has to look up each matching
record in a separate index scan, potentially repeatedly. A bitmap index scan
would be able to look up precisely the elements in the hash in sequential
order and precisely once.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-17 22:32:57
Message-ID: 116.1205793177@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> I already demonstrated that we could.

> We seem to be talking past each other. The plan you showed is analogous but
> using a plain old index scan.

That's only because that seemed like the appropriate thing for the given
case's statistics. [ fiddles with example... ]

regression=# explain select * from tenk1 a where thousand in (select f1 from int4_tbl b);
QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop  (cost=5.39..198.81 rows=51 width=244)
   ->  HashAggregate  (cost=1.06..1.11 rows=5 width=4)
         ->  Seq Scan on int4_tbl b  (cost=0.00..1.05 rows=5 width=4)
   ->  Bitmap Heap Scan on tenk1 a  (cost=4.33..39.41 rows=10 width=244)
         Recheck Cond: (a.thousand = b.f1)
         ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..4.33 rows=10 width=0)
               Index Cond: (a.thousand = b.f1)
(7 rows)

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-18 00:23:06
Message-ID: 878x0gc239.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>> I already demonstrated that we could.
>
>> We seem to be talking past each other. The plan you showed is analogous but
>> using a plain old index scan.
>
> That's only because that seemed like the appropriate thing for the given
> case's statistics. [ fiddles with example... ]
>
> regression=# explain select * from tenk1 a where thousand in (select f1 from int4_tbl b);
> QUERY PLAN
> ------------------------------------------------------------------------------------------
>  Nested Loop  (cost=5.39..198.81 rows=51 width=244)
>    ->  HashAggregate  (cost=1.06..1.11 rows=5 width=4)
>          ->  Seq Scan on int4_tbl b  (cost=0.00..1.05 rows=5 width=4)
>    ->  Bitmap Heap Scan on tenk1 a  (cost=4.33..39.41 rows=10 width=244)
>          Recheck Cond: (a.thousand = b.f1)
>          ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..4.33 rows=10 width=0)
>                Index Cond: (a.thousand = b.f1)
> (7 rows)

Sure, but that's still re-executing the bitmap index scan 51 times -- possibly
having to fetch the same records off disk repeatedly. Avoiding that is kind of
the point behind the hash join plan after all.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-18 00:58:33
Message-ID: 4947.1205801913@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>  Nested Loop  (cost=5.39..198.81 rows=51 width=244)
>>    ->  HashAggregate  (cost=1.06..1.11 rows=5 width=4)
>>          ->  Seq Scan on int4_tbl b  (cost=0.00..1.05 rows=5 width=4)
>>    ->  Bitmap Heap Scan on tenk1 a  (cost=4.33..39.41 rows=10 width=244)
>>          Recheck Cond: (a.thousand = b.f1)
>>          ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..4.33 rows=10 width=0)
>>                Index Cond: (a.thousand = b.f1)
>> (7 rows)

> Sure, but that's still re-executing the bitmap index scan 51 times -- possibly
> having to fetch the same records off disk repeatedly.

It's not fetching any record repeatedly, because the HashAggregate
step eliminated duplicate keys on the other side.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-03-18 03:41:01
Message-ID: 87myowaecy.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>>  Nested Loop  (cost=5.39..198.81 rows=51 width=244)
>>>    ->  HashAggregate  (cost=1.06..1.11 rows=5 width=4)
>>>          ->  Seq Scan on int4_tbl b  (cost=0.00..1.05 rows=5 width=4)
>>>    ->  Bitmap Heap Scan on tenk1 a  (cost=4.33..39.41 rows=10 width=244)
>>>          Recheck Cond: (a.thousand = b.f1)
>>>          ->  Bitmap Index Scan on tenk1_thous_tenthous  (cost=0.00..4.33 rows=10 width=0)
>>>                Index Cond: (a.thousand = b.f1)
>>> (7 rows)
>
>> Sure, but that's still re-executing the bitmap index scan 51 times -- possibly
>> having to fetch the same records off disk repeatedly.

Sorry, 5 times.

> It's not fetching any record repeatedly, because the HashAggregate
> step eliminated duplicate keys on the other side.

Only because it's an IN query. If it were a normal join, which hash joins are
perfectly capable of handling, then it would be. As it is, it could be fetching
the same page repeatedly but not precisely the same tuples.

It happens that transforming this query to

explain analyze select * from tenk1 a where thousand = any (array(select f1 from int4_tbl b));

makes it run about 40% faster. That could just be that the hash is small
enough that a linear search is more efficient than calling hashint4 though.
(Perhaps we should be protecting against that in dynahash, actually)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-04-06 23:53:51
Message-ID: 200804062353.m36Nrp517194@momjian.us
Lists: pgsql-hackers

Gregory Stark wrote:
> We currently execute a lot of joins as Nested Loops which would be more
> efficient if we could batch together all the outer keys and execute a single
> inner bitmap index scan for all of them together.
>
> Essentially what I'm saying is that we're missing a trick with Hash Joins
> which currently require that we can execute the inner side once without any
> parameters from the outer side.
>
> Instead what we could do is build up the hash table, then scan the hash table
> building up an array of keys and pass them as a parameter to the inner side.
> The inner side could do a bitmap index scan to fetch them all at once and
> start returning them just as normal to the hash join.
>
> There are a couple details:
>
> 1) Batched hash joins. Actually I think this would be fairly straightforward.
> You want to rescan the inner side once for each batch. That would actually
> be easier than what we currently do with saving tuples to files and all
> that.
>
> 2) How to pass the keys. This could be a bit tricky especially for
> multi-column keys. My first thought was to build up an actual Array node
> but that only really works for single-column keys I think. Besides it would
> be more efficient to somehow arrange to pass over a reference to the whole
> hash.
>
> I fear the real complexity would be (as always) in the planner rather than the
> executor. I haven't really looked into what it would take to arrange this or
> how to decide when to do it.

If the scanning of the inner side is a performance problem, why would we
be choosing a nested loop in the first place, vs. another type of join?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New style of hash join proposal
Date: 2008-04-07 09:30:48
Message-ID: 878wzqovuf.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

> If the scanning of the inner side is a performance problem, why would we
> be choosing a nested loop in the first place, vs. another type of join?

I clearly haven't done a good job explaining this, as nobody seems to be getting
what I'm describing. I think I'm better off focusing on the patches I've
already started rather than starting yet another project, though, so perhaps I
should put this aside until I can construct a good demonstration.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!