Lists: | pgsql-hackers |
---|
From: | Jie Li <jay23jack(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | small table left outer join big table |
Date: | 2010-12-28 10:13:40 |
Message-ID: | AANLkTikNKpOfM=OPzfMgi9_q2tyUVSSPA0vBYaJ2_mk4@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
Please see the following plan:
postgres=# explain select * from small_table left outer join big_table using
(id);
QUERY PLAN
----------------------------------------------------------------------------
Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
Hash Cond: (small_table.id = big_table.id)
-> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
-> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
-> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
width=8)
(5 rows)
Here I have a puzzle, why not choose the small table to build hash table? It
can avoid multiple batches thus save significant I/O cost, isn't it?
We can perform this query in two phases:
1) inner join, using the small table to build hash table.
2) check whether each tuple in the hash table has matches before, which can
be done with another flag bit
The only compromise is the output order, due to the two separate phases. Not
sure whether the SQL standard requires it.
Thanks,
Li Jie
From: | Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com> |
---|---|
To: | Jie Li <jay23jack(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-28 15:09:51 |
Message-ID: | AANLkTi=_==09etYitq0Aiysa2BPScWZcq4WSq=mwm-3t@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack(at)gmail(dot)com> wrote:
> Hi,
>
> Please see the following plan:
>
> postgres=# explain select * from small_table left outer join big_table
> using (id);
> QUERY PLAN
>
>
> ----------------------------------------------------------------------------
> Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
> Hash Cond: (small_table.id = big_table.id)
> -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
> -> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
> -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
> width=8)
> (5 rows)
>
> Here I have a puzzle, why not choose the small table to build hash table?
> It can avoid multiple batches thus save significant I/O cost, isn't it?
>
> We can perform this query in two phases:
> 1) inner join, using the small table to build hash table.
> 2) check whether each tuple in the hash table has matches before, which can
> be done with another flag bit
>
> The only compromise is the output order, due to the two separate phases.
> Not sure whether the SQL standard requires it.
>
>
SQL standard does not require the result to be in any particular order
unless an ORDER BY is used.
Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com
singh(dot)gurjeet(at){ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet
Mail sent from my BlackLaptop device
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Jie Li <jay23jack(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-29 12:17:17 |
Message-ID: | AANLkTi=7=BP6fsHSO31C9x_RAFgSttXxac1vreYWdL6M@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack(at)gmail(dot)com> wrote:
> Hi,
>
> Please see the following plan:
>
> postgres=# explain select * from small_table left outer join big_table using
> (id);
> QUERY PLAN
> ----------------------------------------------------------------------------
> Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
> Hash Cond: (small_table.id = big_table.id)
> -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
> -> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
> -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
> width=8)
> (5 rows)
>
> Here I have a puzzle, why not choose the small table to build hash table? It
> can avoid multiple batches thus save significant I/O cost, isn't it?
Yeah, you'd think. Can you post a full reproducible test case?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-29 12:34:05 |
Message-ID: | 1293626045.1892.4409.camel@ebony |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
> >
> > Here I have a puzzle, why not choose the small table to build hash table? It
> > can avoid multiple batches thus save significant I/O cost, isn't it?
>
> Yeah, you'd think. Can you post a full reproducible test case?
It's not a bug, that's the way it currently works. We don't need a test
case for that.
I agree that the optimisation would be a useful one.
It allows you to ask the query "Show me sales for each of my stores"
efficiently, rather than being forced to request the inner join query
"Show me the sales for each of my stores for which there have been
sales", which is a much less useful query.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
From: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-29 12:39:43 |
Message-ID: | 1293626360-sup-1760@alvh.no-ip.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Excerpts from Robert Haas's message of mié dic 29 09:17:17 -0300 2010:
> On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack(at)gmail(dot)com> wrote:
> > Hi,
> >
> > Please see the following plan:
> >
> > postgres=# explain select * from small_table left outer join big_table using
> > (id);
> > QUERY PLAN
> > ----------------------------------------------------------------------------
> > Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
> > Hash Cond: (small_table.id = big_table.id)
> > -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
> > -> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
> > -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
> > width=8)
> > (5 rows)
> >
> > Here I have a puzzle, why not choose the small table to build hash table? It
> > can avoid multiple batches thus save significant I/O cost, isn't it?
>
> Yeah, you'd think. Can you post a full reproducible test case?
Also, what version is this?
--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-29 12:59:19 |
Message-ID: | AANLkTim77uhiyCkfTfUW2Az47EtM163ztTk1eSOC+nFg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
>> >
>> > Here I have a puzzle, why not choose the small table to build hash table? It
>> > can avoid multiple batches thus save significant I/O cost, isn't it?
>>
>> Yeah, you'd think. Can you post a full reproducible test case?
>
> It's not a bug, that's the way it currently works. We don't need a test
> case for that.
>
> I agree that the optimisation would be a useful one.
>
> It allows you to ask the query "Show me sales for each of my stores"
> efficiently, rather than being forced to request the inner join query
> "Show me the sales for each of my stores for which there have been
> sales", which is a much less useful query.
Oh, you're right. I missed the fact that it's a left join.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | "Li Jie" <jay23jack(at)gmail(dot)com> |
---|---|
To: | "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com> |
Cc: | "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-29 14:45:00 |
Message-ID: | 005301cba766$fb00fd80$0801a8c0@A0078508 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
----- Original Message -----
From: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Jie Li" <jay23jack(at)gmail(dot)com>; "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Sent: Wednesday, December 29, 2010 8:39 PM
Subject: Re: [HACKERS] small table left outer join big table
> Excerpts from Robert Haas's message of mié dic 29 09:17:17 -0300 2010:
>> On Tue, Dec 28, 2010 at 5:13 AM, Jie Li <jay23jack(at)gmail(dot)com> wrote:
>> > Hi,
>> >
>> > Please see the following plan:
>> >
>> > postgres=# explain select * from small_table left outer join big_table using
>> > (id);
>> > QUERY PLAN
>> > ----------------------------------------------------------------------------
>> > Hash Left Join (cost=126408.00..142436.98 rows=371 width=12)
>> > Hash Cond: (small_table.id = big_table.id)
>> > -> Seq Scan on small_table (cost=0.00..1.09 rows=9 width=8)
>> > -> Hash (cost=59142.00..59142.00 rows=4100000 width=8)
>> > -> Seq Scan on big_table (cost=0.00..59142.00 rows=4100000
>> > width=8)
>> > (5 rows)
>> >
>> > Here I have a puzzle, why not choose the small table to build hash table? It
>> > can avoid multiple batches thus save significant I/O cost, isn't it?
>>
>> Yeah, you'd think. Can you post a full reproducible test case?
>
> Also, what version is this?
>
> --
> Álvaro Herrera <alvherre(at)commandprompt(dot)com>
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
The version is 9.0.1. I believe the latest version works in the same way.
Thanks,
Li Jie
From: | "Li Jie" <jay23jack(at)gmail(dot)com> |
---|---|
To: | "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com> |
Cc: | "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-29 14:59:16 |
Message-ID: | 006801cba769$09ed6a70$0801a8c0@A0078508 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Thank you for all your comments.
I think the condition of this optimization is whether the small table can fit into memory. If not, then it doesn't work since two tables still need to be written to disk. But if yes, we can save all I/O costs in the hash join process.
Thanks,
Li Jie
----- Original Message -----
From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Jie Li" <jay23jack(at)gmail(dot)com>; "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Sent: Wednesday, December 29, 2010 8:59 PM
Subject: Re: [HACKERS] small table left outer join big table
On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, 2010-12-29 at 07:17 -0500, Robert Haas wrote:
>> >
>> > Here I have a puzzle, why not choose the small table to build hash table? It
>> > can avoid multiple batches thus save significant I/O cost, isn't it?
>>
>> Yeah, you'd think. Can you post a full reproducible test case?
>
> It's not a bug, that's the way it currently works. We don't need a test
> case for that.
>
> I agree that the optimisation would be a useful one.
>
> It allows you to ask the query "Show me sales for each of my stores"
> efficiently, rather than being forced to request the inner join query
> "Show me the sales for each of my stores for which there have been
> sales", which is a much less useful query.
Oh, you're right. I missed the fact that it's a left join.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-29 14:59:32 |
Message-ID: | 9856.1293634772@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> It's not a bug, that's the way it currently works. We don't need a test
>> case for that.
> Oh, you're right. I missed the fact that it's a left join.
The only thing that struck me as curious about it was that the OP didn't
get a nestloop-with-inner-indexscan plan. That would be explainable if
there was no index on the large table's "id" column ... but columns
named like that usually have indexes.
I can't get all *that* excited about complicating hash joins as
proposed. The query is still fundamentally going to be slow because
you won't get out of having to seqscan the large table. The only way
to make it really fast is to not read all of the large table, and
nestloop-with-inner-indexscan is the only plan type with a hope of
doing that.
regards, tom lane
From: | "Li Jie" <jay23jack(at)gmail(dot)com> |
---|---|
To: | "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-29 15:13:27 |
Message-ID: | 006d01cba76a$f5190260$0801a8c0@A0078508 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>; "Jie Li" <jay23jack(at)gmail(dot)com>; "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Sent: Wednesday, December 29, 2010 10:59 PM
Subject: Re: [HACKERS] small table left outer join big table
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> It's not a bug, that's the way it currently works. We don't need a test
>>> case for that.
>
>> Oh, you're right. I missed the fact that it's a left join.
>
> The only thing that struck me as curious about it was that the OP didn't
> get a nestloop-with-inner-indexscan plan. That would be explainable if
> there was no index on the large table's "id" column ... but columns
> named like that usually have indexes.
>
> I can't get all *that* excited about complicating hash joins as
> proposed. The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table. The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.
>
> regards, tom lane
Yes there is no index on the joined column, otherwise nestloop-with-inner-indexscan should be preferred.
But why can't outer join be as clever as inner join? Anyway, if we unfortunately don't have available index, we have no choice but rely on hash join, right?
Thanks,
Li Jie
From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-29 20:58:18 |
Message-ID: | 1293656298.1892.10087.camel@ebony |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >> It's not a bug, that's the way it currently works. We don't need a test
> >> case for that.
>
> > Oh, you're right. I missed the fact that it's a left join.
>
> The only thing that struck me as curious about it was that the OP didn't
> get a nestloop-with-inner-indexscan plan. That would be explainable if
> there was no index on the large table's "id" column ... but columns
> named like that usually have indexes.
>
> I can't get all *that* excited about complicating hash joins as
> proposed. The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table. The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.
Seq scanning the big table isn't bad... we've gone to a lot of trouble
to make it easy to do this, especially with many users.
Maintaining many large indexes is definitely bad, all that random I/O is
going to suck badly.
Seems like an interesting and relatively optimisation to me. Not sure if
this is a request for feature, or a proposal to write the optimisation.
I hope its the latter.
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
From: | Jie Li <jay23jack(at)gmail(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-30 01:54:10 |
Message-ID: | AANLkTin2naZLSc8imUXUUP9F9tE=bQh3aBN+3-fOsQTD@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Dec 29, 2010 at 3:58 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, 2010-12-29 at 09:59 -0500, Tom Lane wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > On Wed, Dec 29, 2010 at 7:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> > >> It's not a bug, that's the way it currently works. We don't need a
> test
> > >> case for that.
> >
> > > Oh, you're right. I missed the fact that it's a left join.
> >
> > The only thing that struck me as curious about it was that the OP didn't
> > get a nestloop-with-inner-indexscan plan. That would be explainable if
> > there was no index on the large table's "id" column ... but columns
> > named like that usually have indexes.
> >
> > I can't get all *that* excited about complicating hash joins as
> > proposed. The query is still fundamentally going to be slow because
> > you won't get out of having to seqscan the large table. The only way
> > to make it really fast is to not read all of the large table, and
> > nestloop-with-inner-indexscan is the only plan type with a hope of
> > doing that.
>
> Seq scanning the big table isn't bad... we've gone to a lot of trouble
> to make it easy to do this, especially with many users.
>
> Maintaining many large indexes is definitely bad, all that random I/O is
> going to suck badly.
>
> Seems like an interesting and relatively optimisation to me. Not sure if
> this is a request for feature, or a proposal to write the optimisation.
> I hope its the latter.
>
>
Thanks for your comments. Yeah I'm excited to write code for PostgreSQL,
but I'm new here
and not familiar with the code routine or patch submission. I will try to
learn in near future. So
for the moment, it is a request for feature, and I'm looking forward to any
pgsql-hackers working
on this.
Thanks,
Li Jie
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jie Li <jay23jack(at)gmail(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table) |
Date: | 2010-12-30 15:45:54 |
Message-ID: | 16990.1293723954@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I had an epiphany about this topic, or actually two of them.
1. Whether or not you think there's a significant performance reason
to support hash right joins, there's a functionality reason. The
infrastructure for right join could just as easily do full joins.
And AFAICS, a hash full join would only require one hashable join
clause --- the other FULL JOIN ON conditions could be anything at all.
This is unlike the situation for merge join, where all the JOIN ON
conditions have to be mergeable or it doesn't work right. So we could
greatly reduce the scope of the dreaded "FULL JOIN is only supported
with merge-joinable join conditions" error. (Well, okay, it's not
*that* dreaded, but people complain about it occasionally.)
2. The obvious way to implement this would involve adding an extra bool
field to struct HashJoinTupleData. The difficulty with that, and the
reason I'd been resistant to the whole idea, is that it'd eat up a full
word per hashtable entry because of alignment considerations. (On
64-bit machines it'd be free because of alignment considerations, but
that's cold comfort when 32-bit machines are the ones pressed for
address space.) But we only need one bit, so what about commandeering
an infomask bit in the tuple itself? For the initial implementation
I'd be inclined to take one of the free bits in t_infomask2. We could
actually get away with overlaying the flag bit with one of the tuple
visibility bits, since it will only be used in tuples that are in the
in-memory hash table, which don't need visibility info anymore. But
that seems like a kluge that could wait until we really need the flag
space.
Comments?
regards, tom lane
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jie Li <jay23jack(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table) |
Date: | 2010-12-30 15:50:16 |
Message-ID: | AANLkTikcuSc9p1XDmN3ZE5YDtWvFF+1MU-LCjEVRVyBD@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I had an epiphany about this topic, or actually two of them.
>
> 1. Whether or not you think there's a significant performance reason
> to support hash right joins, there's a functionality reason. The
> infrastructure for right join could just as easily do full joins.
> And AFAICS, a hash full join would only require one hashable join
> clause --- the other FULL JOIN ON conditions could be anything at all.
> This is unlike the situation for merge join, where all the JOIN ON
> conditions have to be mergeable or it doesn't work right. So we could
> greatly reduce the scope of the dreaded "FULL JOIN is only supported
> with merge-joinable join conditions" error. (Well, okay, it's not
> *that* dreaded, but people complain about it occasionally.)
Yeah, that would be neat. It might be a lot faster in some cases, too.
> 2. The obvious way to implement this would involve adding an extra bool
> field to struct HashJoinTupleData. The difficulty with that, and the
> reason I'd been resistant to the whole idea, is that it'd eat up a full
> word per hashtable entry because of alignment considerations. (On
> 64-bit machines it'd be free because of alignment considerations, but
> that's cold comfort when 32-bit machines are the ones pressed for
> address space.) But we only need one bit, so what about commandeering
> an infomask bit in the tuple itself? For the initial implementation
> I'd be inclined to take one of the free bits in t_infomask2. We could
> actually get away with overlaying the flag bit with one of the tuple
> visibility bits, since it will only be used in tuples that are in the
> in-memory hash table, which don't need visibility info anymore. But
> that seems like a kluge that could wait until we really need the flag
> space.
I think that's a reasonable approach, although I might be inclined to
do the overlay sooner rather than later if it doesn't add too much
complexity.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Jie Li <jay23jack(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table) |
Date: | 2010-12-30 16:20:36 |
Message-ID: | AANLkTikfbkea=_uSR-XfWMo_hOAidL_yq7iE3zx06TTP@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Dec 30, 2010 at 11:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I had an epiphany about this topic, or actually two of them.
> >
> > 1. Whether or not you think there's a significant performance reason
> > to support hash right joins, there's a functionality reason. The
> > infrastructure for right join could just as easily do full joins.
> > And AFAICS, a hash full join would only require one hashable join
> > clause --- the other FULL JOIN ON conditions could be anything at all.
> > This is unlike the situation for merge join, where all the JOIN ON
> > conditions have to be mergeable or it doesn't work right. So we could
> > greatly reduce the scope of the dreaded "FULL JOIN is only supported
> > with merge-joinable join conditions" error. (Well, okay, it's not
> > *that* dreaded, but people complain about it occasionally.)
>
> Yeah, that would be neat. It might be a lot faster in some cases, too.
>
Yeah, PostgreSQL should have this great feature.
Actually Oracle 10g already has the right hash join,
http://dbcrusade.blogspot.com/2008/01/oracle-hash-join-right-outer.html
And Oracle 11g has the full hash join.
http://www.dba-oracle.com/oracle11g/oracle_11g_full_hash_join.htm
Haven't checked whether other DBMS have this feature.
Thanks,
Li Jie
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Jie Li <jay23jack(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table) |
Date: | 2010-12-30 16:35:05 |
Message-ID: | 18600.1293726905@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Dec 30, 2010 at 10:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... But we only need one bit, so what about commandeering
>> an infomask bit in the tuple itself? For the initial implementation
>> I'd be inclined to take one of the free bits in t_infomask2. We could
>> actually get away with overlaying the flag bit with one of the tuple
>> visibility bits, since it will only be used in tuples that are in the
>> in-memory hash table, which don't need visibility info anymore. But
>> that seems like a kluge that could wait until we really need the flag
>> space.
> I think that's a reasonable approach, although I might be inclined to
> do the overlay sooner rather than later if it doesn't add too much
> complexity.
Well, there's no "complexity" involved, it's just which bit do we define
the macro as. Any complexity is conceptual.
regards, tom lane
From: | Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jie Li <jay23jack(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: small table left outer join big table |
Date: | 2010-12-30 21:05:09 |
Message-ID: | m2fwtfynsq.fsf@2ndQuadrant.fr |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> I can't get all *that* excited about complicating hash joins as
> proposed. The query is still fundamentally going to be slow because
> you won't get out of having to seqscan the large table. The only way
> to make it really fast is to not read all of the large table, and
> nestloop-with-inner-indexscan is the only plan type with a hope of
> doing that.
That sounds somewhat like Loose Indexscan as described in the following
wiki page, right?
http://wiki.postgresql.org/wiki/Loose_indexscan
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jie Li <jay23jack(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: RIGHT/FULL OUTER hash joins (was Re: small table left outer join big table) |
Date: | 2011-01-01 11:18:40 |
Message-ID: | 1293880720.1892.55684.camel@ebony |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, 2010-12-30 at 10:45 -0500, Tom Lane wrote:
> Comments?
Thanks for working on this. I love the reuse of tuple flags; I can't
help feeling that opens up doors, just not sure how yet...
--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services