Re: small table left outer join big table

Lists: pgsql-hackers
From: Jie Li <jay23jack(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: small table left outer join big table
Date: 2010-12-28 10:13:40
Message-ID: AANLkTikNKpOfM=OPzfMgi9_q2tyUVSSPA0vBYaJ2_mk4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Please see the following plan:

postgres=# explain select * from small_table left outer join big_table using
(id);
QUERY PLAN
----------------------------------------------------------------------------
Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
Hash Cond: (small_table.id = big_table.id)
-> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
-> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
-> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
width=8)
(5 rows)

Here I have a puzzle, why not choose the small table to build hash table? It
can avoid multiple batches thus save significant I/O cost, isn't it?

We can perform this query in two phases:
1) inner join, using the small table to build hash table.
2) check whether each tuple in the hash table has matches before, which can
be done with another flag bit

The only compromise is the output order, due to the two separate phases. Not
sure whether the SQL standard requires it.

Thanks,
Li Jie


From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Jie Li <jay23jack(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-28 15:09:51
Message-ID: AANLkTi=_==09etYitq0Aiysa2BPScWZcq4WSq=mwm-3t@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack(at)gmail(dot)com> wrote:

> Hi,
>
> Please see the following plan:
>
> postgres=# explain select * from small_table left outer join big_table
> using (id);
> QUERY PLAN
>
>
> ----------------------------------------------------------------------------
> Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
> Hash Cond: (small_table.id = big_table.id)
> -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
> -> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
> -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
> width=8)
> (5 rows)
>
> Here I have a puzzle, why not choose the small table to build hash table?
> It can avoid multiple batches thus save significant I/O cost, isn't it?
>
> We can perform this query in two phases:
> 1) inner join, using the small table to build hash table.
> 2) check whether each tuple in the hash table has matches before, which can
> be done with another flag bit
>
> The only compromise is the output order, due to the two separate phases.
> Not sure whether the SQL standard requires it.
>
>
SQL standard does not require the result to be in any particular order
unless an ORDER BY is used.

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh(dot)gurjeet(at){ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jie Li <jay23jack(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-29 12:17:17
Message-ID: AANLkTi=7=BP6fsHSO31C9x_RAFgSttXxac1vreYWdL6M@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack(at)gmail(dot)com> wrote:
> Hi,
>
> Please see the following plan:
>
> postgres=# explain select * from small_table left outer join big_table using
> (id);
>                                  QUERY PLAN
> ----------------------------------------------------------------------------
>  Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
>    Hash Cond: (small_table.id = big_table.id)
>    ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
>    ->  Hash  (cost=59142.00..59142.00 rows=4100000 width=8)
>          ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=4100000
> width=8)
> (5 rows)
>
> Here I have a puzzle, why not choose the small table to build hash table? It
> can avoid multiple batches thus save significant I/O cost, isn't it?

Yeah, you'd think. Can you post a full reproducible test case?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-29 12:34:05
Message-ID: 1293626045.1892.4409.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
> >
> > Here I have a puzzle, why not choose the small table to build hash table? It
> > can avoid multiple batches thus save significant I/O cost, isn't it?
>
> Yeah, you'd think. Can you post a full reproducible test case?

It's not a bug, that's the way it currently works. We don't need a test
case for that.

I agree that the optimisation would be a useful one.

It allows you to ask the query "Show me sales for each of my stores"
efficiently, rather than being forced to request the inner join query
"Show me the sales for each of my stores for which there have been
sales", which is a much less useful query.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-29 12:39:43
Message-ID: 1293626360-sup-1760@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Robert Haas's message of mié dic 29 09:17:17 -0300 2010:
> On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack(at)gmail(dot)com> wrote:
> > Hi,
> >
> > Please see the following plan:
> >
> > postgres=# explain select * from small_table left outer join big_table using
> > (id);
> >                                  QUERY PLAN
> > ----------------------------------------------------------------------------
> >  Hash Left Join  (cost=126408.00..142436.98 rows=371 width=12)
> >    Hash Cond: (small_table.id = big_table.id)
> >    ->  Seq Scan on small_table  (cost=0.00..1.09 rows=9 width=8)
> >    ->  Hash  (cost=59142.00..59142.00 rows=4100000 width=8)
> >          ->  Seq Scan on big_table  (cost=0.00..59142.00 rows=4100000
> > width=8)
> > (5 rows)
> >
> > Here I have a puzzle, why not choose the small table to build hash table? It
> > can avoid multiple batches thus save significant I/O cost, isn't it?
>
> Yeah, you'd think. Can you post a full reproducible test case?

Also, what version is this?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-29 12:59:19
Message-ID: AANLkTim77uhiyCkfTfUW2Az47EtM163ztTk1eSOC+nFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
>> >
>> > Here I have a puzzle, why not choose the small table to build hash table? It
>> > can avoid multiple batches thus save significant I/O cost, isn't it?
>>
>> Yeah, you'd think.  Can you post a full reproducible test case?
>
> It's not a bug, that's the way it currently works. We don't need a test
> case for that.
>
> I agree that the optimisation would be a useful one.
>
> It allows you to ask the query "Show me sales for each of my stores"
> efficiently, rather than being forced to request the inner join query
> "Show me the sales for each of my stores for which there have been
> sales", which is a much less useful query.

Oh, you're right. I missed the fact that it's a left join.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Li Jie" <jay23jack(at)gmail(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-29 14:45:00
Message-ID: 005301cba766$fb00fd80$0801a8c0@A0078508
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

----- Original Message -----
From: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Jie Li" <jay23jack(at)gmail(dot)com>; "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Sent: Wednesday, December 29, 2010 8:39 PM
Subject: Re: [HACKERS] small table left outer join big table

> Excerpts from Robert Haas's message of mié dic 29 09:17:17 -0300 2010:
>> On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack(at)gmail(dot)com> wrote:
>> > Hi,
>> >
>> > Please see the following plan:
>> >
>> > postgres=# explain select * from small_table left outer join big_table using
>> > (id);
>> > QUERY PLAN
>> > ----------------------------------------------------------------------------
>> > Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
>> > Hash Cond: (small_table.id = big_table.id)
>> > -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
>> > -> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
>> > -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
>> > width=8)
>> > (5 rows)
>> >
>> > Here I have a puzzle, why not choose the small table to build hash table? It
>> > can avoid multiple batches thus save significant I/O cost, isn't it?
>>
>> Yeah, you'd think. Can you post a full reproducible test case?
>
> Also, what version is this?
>
> --
> Álvaro Herrera <alvherre(at)commandprompt(dot)com>
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support

The version is 9.0.1. I believe the latest version works in the same way.

Thanks,
Li Jie


From: "Li Jie" <jay23jack(at)gmail(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-29 14:59:16
Message-ID: 006801cba769$09ed6a70$0801a8c0@A0078508
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for all your comments.

I think the condition of this optimization is whether the small table can fit into memory. If not, then it doesn't work since two tables still need to be written to disk. But if yes, we can save all I/O costs in the hash join process.

Thanks,
Li Jie

----- Original Message -----
From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Jie Li" <jay23jack(at)gmail(dot)com>; "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Sent: Wednesday, December 29, 2010 8:59 PM
Subject: Re: [HACKERS] small table left outer join big table

On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
>> >
>> > Here I have a puzzle, why not choose the small table to build hash table? It
>> > can avoid multiple batches thus save significant I/O cost, isn't it?
>>
>> Yeah, you'd think. Can you post a full reproducible test case?
>
> It's not a bug, that's the way it currently works. We don't need a test
> case for that.
>
> I agree that the optimisation would be a useful one.
>
> It allows you to ask the query "Show me sales for each of my stores"
> efficiently, rather than being forced to request the inner join query
> "Show me the sales for each of my stores for which there have been
> sales", which is a much less useful query.

Oh, you're right. I missed the fact that it's a left join.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-29 14:59:32
Message-ID: 9856.1293634772@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> It's not a bug, that's the way it currently works. We don't need a test
>> case for that.

> Oh, you're right. I missed the fact that it's a left join.

The only thing that struck me as curious about it was that the OP didn't
get a nestloop-with-inner-indexscan plan. That would be explainable if
there was no index on the large table's "id" column ... but columns
named like that usually have indexes.

I can't get all *that* excited about complicating hash joins as
proposed. The query is still fundamentally going to be slow because
you won't get out of having to seqscan the large table. The only way
to make it really fast is to not read all of the large table, and
nestloop-with-inner-indexscan is the only plan type with a hope of
doing that.

regards, tom lane


From: "Li Jie" <jay23jack(at)gmail(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-29 15:13:27
Message-ID: 006d01cba76a$f5190260$0801a8c0@A0078508
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>; "Jie Li" <jay23jack(at)gmail(dot)com>; "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Sent: Wednesday, December 29, 2010 10:59 PM
Subject: Re: [HACKERS] small table left outer join big table

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> It's not a bug, that's the way it currently works. We don't need a test
>>> case for that.
>
>> Oh, you're right. I missed the fact that it's a left join.
>
> The only thing that struck me as curious about it was that the OP didn't
> get a nestloop-with-inner-indexscan plan. That would be explainable if
> there was no index on the large table's "id" column ... but columns
> named like that usually have indexes.
>
> I can't get all *that* excited about complicating hash joins as
> proposed. The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table. The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.
>
> regards, tom lane

Yes there is no index on the joined column, otherwise nestloop-with-inner-indexscan should be preferred.

But why can't outer join be as clever as inner join? Anyway, if we unfortunately don't have available index, we have no choice but rely on hash join, right?

Thanks,
Li Jie


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-29 20:58:18
Message-ID: 1293656298.1892.10087.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >> It's not a bug, that's the way it currently works. We don't need a test
> >> case for that.
>
> > Oh, you're right. I missed the fact that it's a left join.
>
> The only thing that struck me as curious about it was that the OP didn't
> get a nestloop-with-inner-indexscan plan. That would be explainable if
> there was no index on the large table's "id" column ... but columns
> named like that usually have indexes.
>
> I can't get all *that* excited about complicating hash joins as
> proposed. The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table. The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.

Seq scanning the big table isn't bad... we've gone to a lot of trouble
to make it easy to do this, especially with many users.

Maintaining many large indexes is definitely bad, all that random I/O is
going to suck badly.

Seems like an interesting and relatively optimisation to me. Not sure if
this is a request for feature, or a proposal to write the optimisation.
I hope its the latter.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Jie Li <jay23jack(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-30 01:54:10
Message-ID: AANLkTin2naZLSc8imUXUUP9F9tE=bQh3aBN+3-fOsQTD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 29, 2010 at 3:58 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> > >> It's not a bug, that's the way it currently works. We don't need a
> test
> > >> case for that.
> >
> > > Oh, you're right. I missed the fact that it's a left join.
> >
> > The only thing that struck me as curious about it was that the OP didn't
> > get a nestloop-with-inner-indexscan plan. That would be explainable if
> > there was no index on the large table's "id" column ... but columns
> > named like that usually have indexes.
> >
> > I can't get all *that* excited about complicating hash joins as
> > proposed. The query is still fundamentally going to be slow because
> > you won't get out of having to seqscan the large table. The only way
> > to make it really fast is to not read all of the large table, and
> > nestloop-with-inner-indexscan is the only plan type with a hope of
> > doing that.
>
> Seq scanning the big table isn't bad... we've gone to a lot of trouble
> to make it easy to do this, especially with many users.
>
> Maintaining many large indexes is definitely bad, all that random I/O is
> going to suck badly.
>
> Seems like an interesting and relatively optimisation to me. Not sure if
> this is a request for feature, or a proposal to write the optimisation.
> I hope its the latter.
>
>
Thanks for your comments. Yeah I'm excited to write code for PostgreSQL,
but I'm new here
and not familiar with the code routine or patch submission. I will try to
learn in near future. So
for the moment, it is a request for feature, and I'm looking forward to any
pgsql-hackers working
on this.

Thanks,
Li Jie


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jie Li <jay23jack(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)
Date: 2010-12-30 15:45:54
Message-ID: 16990.1293723954@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I had an epiphany about this topic, or actually two of them.

1. Whether or not you think there's a significant performance reason
to support hash right joins, there's a functionality reason. The
infrastructure for right join could just as easily do full joins.
And AFAICS, a hash full join would only require one hashable join
clause --- the other FULL JOIN ON conditions could be anything at all.
This is unlike the situation for merge join, where all the JOIN ON
conditions have to be mergeable or it doesn't work right. So we could
greatly reduce the scope of the dreaded "FULL JOIN is only supported
with merge-joinable join conditions" error. (Well, okay, it's not
*that* dreaded, but people complain about it occasionally.)

2. The obvious way to implement this would involve adding an extra bool
field to struct HashJoinTupleData. The difficulty with that, and the
reason I'd been resistant to the whole idea, is that it'd eat up a full
word per hashtable entry because of alignment considerations. (On
64-bit machines it'd be free because of alignment considerations, but
that's cold comfort when 32-bit machines are the ones pressed for
address space.) But we only need one bit, so what about commandeering
an infomask bit in the tuple itself? For the initial implementation
I'd be inclined to take one of the free bits in t_infomask2. We could
actually get away with overlaying the flag bit with one of the tuple
visibility bits, since it will only be used in tuples that are in the
in-memory hash table, which don't need visibility info anymore. But
that seems like a kluge that could wait until we really need the flag
space.

Comments?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jie Li <jay23jack(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)
Date: 2010-12-30 15:50:16
Message-ID: AANLkTikcuSc9p1XDmN3ZE5YDtWvFF+1MU-LCjEVRVyBD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I had an epiphany about this topic, or actually two of them.
>
> 1. Whether or not you think there's a significant performance reason
> to support hash right joins, there's a functionality reason.  The
> infrastructure for right join could just as easily do full joins.
> And AFAICS, a hash full join would only require one hashable join
> clause --- the other FULL JOIN ON conditions could be anything at all.
> This is unlike the situation for merge join, where all the JOIN ON
> conditions have to be mergeable or it doesn't work right.  So we could
> greatly reduce the scope of the dreaded "FULL JOIN is only supported
> with merge-joinable join conditions" error.  (Well, okay, it's not
> *that* dreaded, but people complain about it occasionally.)

Yeah, that would be neat. It might be a lot faster in some cases, too.

> 2. The obvious way to implement this would involve adding an extra bool
> field to struct HashJoinTupleData.  The difficulty with that, and the
> reason I'd been resistant to the whole idea, is that it'd eat up a full
> word per hashtable entry because of alignment considerations.  (On
> 64-bit machines it'd be free because of alignment considerations, but
> that's cold comfort when 32-bit machines are the ones pressed for
> address space.)  But we only need one bit, so what about commandeering
> an infomask bit in the tuple itself?  For the initial implementation
> I'd be inclined to take one of the free bits in t_infomask2.  We could
> actually get away with overlaying the flag bit with one of the tuple
> visibility bits, since it will only be used in tuples that are in the
> in-memory hash table, which don't need visibility info anymore.  But
> that seems like a kluge that could wait until we really need the flag
> space.

I think that's a reasonable approach, although I might be inclined to
do the overlay sooner rather than later if it doesn't add too much
complexity.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jie Li <jay23jack(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)
Date: 2010-12-30 16:20:36
Message-ID: AANLkTikfbkea=_uSR-XfWMo_hOAidL_yq7iE3zx06TTP@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 30, 2010 at 11:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I had an epiphany about this topic, or actually two of them.
> >
> > 1. Whether or not you think there's a significant performance reason
> > to support hash right joins, there's a functionality reason. The
> > infrastructure for right join could just as easily do full joins.
> > And AFAICS, a hash full join would only require one hashable join
> > clause --- the other FULL JOIN ON conditions could be anything at all.
> > This is unlike the situation for merge join, where all the JOIN ON
> > conditions have to be mergeable or it doesn't work right. So we could
> > greatly reduce the scope of the dreaded "FULL JOIN is only supported
> > with merge-joinable join conditions" error. (Well, okay, it's not
> > *that* dreaded, but people complain about it occasionally.)
>
> Yeah, that would be neat. It might be a lot faster in some cases, too.
>

Yeah, PostgreSQL should have this great feature.

Actually Oracle 10g already has the right hash join,
http://dbcrusade.blogspot.com/2008/01/oracle-hash-join-right-outer.html

And Oracle 11g has the full hash join.
http://www.dba-oracle.com/oracle11g/oracle_11g_full_hash_join.htm

Haven't checked whether other DBMS have this feature.

Thanks,
Li Jie


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jie Li <jay23jack(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)
Date: 2010-12-30 16:35:05
Message-ID: 18600.1293726905@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... But we only need one bit, so what about commandeering
>> an infomask bit in the tuple itself? For the initial implementation
>> I'd be inclined to take one of the free bits in t_infomask2. We could
>> actually get away with overlaying the flag bit with one of the tuple
>> visibility bits, since it will only be used in tuples that are in the
>> in-memory hash table, which don't need visibility info anymore. But
>> that seems like a kluge that could wait until we really need the flag
>> space.

> I think that's a reasonable approach, although I might be inclined to
> do the overlay sooner rather than later if it doesn't add too much
> complexity.

Well, there's no "complexity" involved, it's just which bit do we define
the macro as. Any complexity is conceptual.

regards, tom lane


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: small table left outer join big table
Date: 2010-12-30 21:05:09
Message-ID: m2fwtfynsq.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> I can't get all *that* excited about complicating hash joins as
> proposed. The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table. The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.

That sounds somewhat like Loose Indexscan as described in the following
wiki page, right?

http://wiki.postgresql.org/wiki/Loose_indexscan

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jie Li <jay23jack(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table)
Date: 2011-01-01 11:18:40
Message-ID: 1293880720.1892.55684.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2010-12-30 at 10:45 -0500, Tom Lane wrote:

> Comments?

Thanks for working on this. I love the reuse of tuple flags; I can't
help feeling that opens up doors, just not sure how yet...

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services