Re: a few crazy ideas about hash joins

From: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-03 15:24:12
Message-ID: 6EEA43D22289484890D119821101B1DF05190DEB@exchange20.mercury.ad.ubc.ca
Lists: pgsql-hackers

> While investigating some performance problems recently I've had cause
> to think about the way PostgreSQL uses hash joins. So here are a few
> thoughts. Some of these have been brought up before.
>
> 1. When the hash is not expected to spill to disk, it preserves the
> pathkeys of the outer side of the join. If the optimizer were allowed
> to assume that, it could produce significantly more efficient query
> plans in some cases.

This is definitely possible, but you will have to dynamically modify the
execution path if the hash join ends up being more than one batch.
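The batching caveat can be seen with a toy model (plain Python, not PostgreSQL code; assigning batches by hash modulo is an assumption for illustration): a single-batch join probes the outer side in its original order, but once rows spill into batches, output comes out grouped by batch.

```python
# Toy model (NOT PostgreSQL internals) of why a multi-batch hash join
# loses the outer side's ordering while a single-batch join keeps it.

def hash_join(outer, inner, nbatches):
    """Join outer [(key, payload)] rows against a set of inner keys;
    return the matching outer rows in emission order."""
    inner_set = set(inner)
    if nbatches == 1:
        # In-memory case: probe outer rows in their original order.
        return [row for row in outer if row[0] in inner_set]
    # Multi-batch case: rows are spilled per batch, then each batch is
    # processed in turn, so output is grouped by batch number.
    batches = [[] for _ in range(nbatches)]
    for row in outer:
        batches[hash(row[0]) % nbatches].append(row)
    out = []
    for b in batches:
        out.extend(row for row in b if row[0] in inner_set)
    return out

outer = [(k, f"r{k}") for k in range(8)]   # outer input sorted by key
inner = list(range(8))

assert hash_join(outer, inner, 1) == outer        # order preserved
multi = hash_join(outer, inner, 4)
assert sorted(multi) == outer                     # same rows...
assert multi != outer                             # ...different order
```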

> 3. Avoid building the exact same hash table twice in the same query.
> This happens more often than you'd think. For example, a table may have
> two columns creator_id and last_updater_id which both reference person
> (id). If you're considering a hash join between paths A and B, you
> could conceivably check whether what is essentially a duplicate of B
> has already been hashed somewhere within path A. If so, you can reuse
> that same hash table at zero startup-cost.
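The reuse idea in #3 can be sketched with a toy Python model (the person/docs schema is made up for illustration; this is not planner code): one hash table built on person.id serves both join clauses.

```python
# Toy sketch (assumed schema, NOT PostgreSQL internals): a query joining
# person twice -- once via creator_id, once via last_updater_id -- can
# share one hash table built on person.id instead of building it twice.

person = {1: "alice", 2: "bob"}
docs = [
    {"id": 10, "creator_id": 1, "last_updater_id": 2},
    {"id": 11, "creator_id": 2, "last_updater_id": 2},
]

# Build the hash table on person.id once...
person_hash = dict(person)

# ...and probe it for both join clauses, at zero extra startup cost.
joined = [
    (d["id"], person_hash[d["creator_id"]], person_hash[d["last_updater_id"]])
    for d in docs
]
assert joined == [(10, "alice", "bob"), (11, "bob", "bob")]
```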

> 4. As previously discussed, avoid hashing for distinct and then
> hashing the results for a hash join on the same column with the same
> operators.
>
> Thoughts on the value and/or complexity of implementation of any of
these?

I would be interested in working with you on any of these changes to
hash join if you decide to pursue them. I am especially interested in
looking at the hash aggregation code and potentially improving its
efficiency.

We have implemented a multi-way hash join (can join more than 2 tables
at a time) which may help with cases #3 and #4. Performance results
look very good, and we are planning on building a patch for this over
the summer.
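The shape of such an operator might look like this toy Python sketch (an assumed illustration, not the actual implementation): build hash tables on both inner tables, then make a single pass over the outer table probing each, instead of two separate binary hash joins.

```python
# Toy multi-way hash join (assumed shape of the proposed operator,
# NOT the actual patch): one probe pass over the outer table against
# two pre-built inner hash tables.

h1 = {1: "x", 2: "y"}            # inner table 1, hashed on its key
h2 = {1: "p", 2: "q"}            # inner table 2, hashed on its key
outer = [(1, 1), (2, 2), (3, 1)]  # (key into h1, key into h2)

result = [(a, h1[a], h2[b]) for a, b in outer if a in h1 and b in h2]
assert result == [(1, "x", "p"), (2, "y", "q")]
```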

--
Ramon Lawrence


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-03 15:45:55
Message-ID: 603c8f070904030845n57d7f8f7u68282a7381a8616e@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 3, 2009 at 11:24 AM, Lawrence, Ramon <ramon(dot)lawrence(at)ubc(dot)ca> wrote:
> I would be interested in working with you on any of these changes to
> hash join if you decide to pursue them.   I am especially interested in
> looking at the hash aggregation code and potentially improving its
> efficiency.

Wow, that would be great!

> We have implemented a multi-way hash join (can join more than 2 tables
> at a time) which may help with cases #3 and #4.  Performance results
> look very good, and we are planning on building a patch for this over
> the summer.

I'd be interested in hearing how you cope with this on the planner end of things.

...Robert


From: Greg Stark <stark(at)enterprisedb(dot)com>
To: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-03 16:55:12
Message-ID: 4136ffa0904030955u5dce6e03nf999beeb242f73c7@mail.gmail.com
Lists: pgsql-hackers

>> 1. When the hash is not expected to spill to disk, it preserves the
>> pathkeys of the outer side of the join.  If the optimizer were allowed
>> to assume that, it could produce significantly more efficient query
>> plans in some cases.
>
> This is definitely possible, but you will have to dynamically modify the
> execution path if the hash join ends up to be more than one batch.

Yeah, this item seems to be another case where having a "conditional"
branch in the plan would be valuable. What's interesting is that it's
branching based on whether the hash has spilled into batches rather
than whether the number of rows is greater or less than some breakeven
point. It looks like these branch nodes are going to need to know
more than just the generic plan parameters. They're going to need to
know about specifics like "sort has spilled to disk" or "hash has
spilled into batches" etc.

I like the idea of coalescing hash tables. I'm not sure the order in
which the planner decides on things is conducive to making good
decisions about it, though. Even if we did it afterwards, without
adjusting the planner, it might still be worthwhile.

Incidentally a similar optimization is possible for materialize or
even sorts. They may not come up nearly so often since you would
normally want to go around adding indexes if you're repeatedly sorting
on the same columns. But it might not be any harder to get them all
in one swoop.

--
greg


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <stark(at)enterprisedb(dot)com>
Cc: "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-03 20:15:10
Message-ID: 603c8f070904031315v61fcff1cp3c2442692d492908@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 3, 2009 at 12:55 PM, Greg Stark <stark(at)enterprisedb(dot)com> wrote:
>>> 1. When the hash is not expected to spill to disk, it preserves the
>>> pathkeys of the outer side of the join.  If the optimizer were allowed
>>> to assume that, it could produce significantly more efficient query
>>> plans in some cases.
>>
>> This is definitely possible, but you will have to dynamically modify the
>> execution path if the hash join ends up to be more than one batch.
>
> Yeah, this item seems to be another case where having a "conditional"
> branch in the plan would be valuable. What's interesting is that it's
> branching based on whether the hash has spilled into batches rather
> than whether the number of rows is greater or less than some breakeven
> point. It looks like these branch nodes are going to need to know
> more than just the generic plan parameters. They're going to need to
> know about specifics like "sort has spilled to disk" or "hash has
> spilled into batches" etc.

In this particular case, the operation that has to be performed is
more specific than "sort": it's "merge this set of sorted tapes". So
it's more subtle than just inserting a sort node.

> I like the idea of coalescing hash tables. I'm not sure the order in
> which the planner decides on things is conducive to being able to make
> good decisions about it though. Even if we did it afterwards without
> adjusting the planner it might still be worthwhile though.

I don't see why hash_inner_and_outer can't walk the outer path looking
for suitable hashes to reuse. I think the question is how aggressive
we want to be in performing that search. If we limit the search to
hashes on base tables without restriction conditions, we'll probably
catch most of the interesting cases, but obviously not all of them.
If we look for hashes on baserels with restriction conditions or
hashes on joinrels, etc., we might pick up a few more cases, but at
the expense of increased planning time.

The advantage of searching in the executor, I suppose, is that the
checks don't have to be as cheap, since you're only checking the plan
that has already won, rather than lots and lots of potential plans.
On the other hand, your costing will be less accurate, which could
lead to bad decisions elsewhere.

> Incidentally a similar optimization is possible for materialize or
> even sorts. They may not come up nearly so often since you would
> normally want to go around adding indexes if you're repeatedly sorting
> on the same columns. But it might not be any harder to get them all
> in one swoop.

At least in my experience, sort and materialize nodes are pretty rare,
so I'm not sure it would be worth the time it would take to search for
them. In most cases it seems to be cheaper to hash the inner and
outer paths than to sort even one of them and then merge-join the
result. When I do get these sorts of paths, they tend to be in
larger, more complex queries where there's less chance of useful
reuse. But my experience might not be representative...

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-03 20:29:25
Message-ID: 21826.1238790565@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I don't see why hash_inner_and_outer can't walk the outer path looking
> for suitable hashes to reuse. I think the question is how aggressive
> we want to be in performing that search.

Correct, but you've got the details all wrong. The real problem is that
the planner might discard a join path hash(A,B) at level 2 because it
loses compared to, say, merge(A,B). But when we get to level three,
perhaps hash(hash(A,B),C) would've been the best plan due to synergy
of the two hashes. We'll never find that out unless we keep the
"inferior" hash path around. We can certainly do that; the question
is what's it going to cost us to allow more paths to survive to the
next join level. (And I'm afraid the answer may be "plenty"; it's a
combinatorial explosion we're looking at here.)

>> Incidentally a similar optimization is possible for materialize or
>> even sorts.

The planner already goes to great lengths to avoid extra sorts for
multiple levels of mergejoin ... that's what all the "pathkey" thrashing
is about. It's pretty expensive.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-03 21:02:03
Message-ID: 603c8f070904031402y5a47cd48w92cd09a59fa0d6f2@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I don't see why hash_inner_and_outer can't walk the outer path looking
>> for suitable hashes to reuse.  I think the question is how aggressive
>> we want to be in performing that search.
>
> Correct, but you've got the details all wrong.  The real problem is that
> the planner might discard a join path hash(A,B) at level 2 because it
> loses compared to, say, merge(A,B).  But when we get to level three,
> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
> of the two hashes.  We'll never find that out unless we keep the
> "inferior" hash path around.  We can certainly do that; the question
> is what's it going to cost us to allow more paths to survive to the
> next join level.  (And I'm afraid the answer may be "plenty"; it's a
> combinatorial explosion we're looking at here.)

That would be crazy. I think doing it the way I suggested is correct,
just not guaranteed to catch every case. The reality is that even if
we took Greg Stark's suggestion of detecting this situation only in
the executor, we'd still get some benefit out of this. If we take my
intermediate approach, we'll catch more cases where this is a win.
What you're suggesting here would catch every conceivable case, but at
the expense of what I'm sure would be an unacceptable increase in
planning time for very limited benefit.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-03 21:10:56
Message-ID: 22570.1238793056@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Correct, but you've got the details all wrong. The real problem is that
>> the planner might discard a join path hash(A,B) at level 2 because it
>> loses compared to, say, merge(A,B). But when we get to level three,
>> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
>> of the two hashes. We'll never find that out unless we keep the
>> "inferior" hash path around. We can certainly do that; the question
>> is what's it going to cost us to allow more paths to survive to the
>> next join level. (And I'm afraid the answer may be "plenty"; it's a
>> combinatorial explosion we're looking at here.)

> That would be crazy. I think doing it the way I suggested is correct,
> just not guaranteed to catch every case. The reality is that even if
> we took Greg Stark's suggestion of detecting this situation only in
> the executor, we'd still get some benefit out of this. If we take my
> intermediate approach, we'll catch more cases where this is a win.
> What you're suggesting here would catch every conceivable case, but at
> the expense of what I'm sure would be an unacceptable increase in
> planning time for very limited benefit.

Maybe, maybe not. I've seen plenty of plans that have several
mergejoins stacked up on top of each other with no intervening sorts.
There is 0 chance that the planner would have produced that if it
thought that it had to re-sort at each level; something else would have
looked cheaper. I think that your proposals will end up getting very
little of the possible benefit, because the planner will fail to choose
plan trees in which the optimization can be exploited.

regards, tom lane


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-04 11:26:00
Message-ID: 2F337F24-DD69-4619-83C1-C44C040BF1E4@hi-media.com
Lists: pgsql-hackers

Hi,

On 3 Apr 09, at 22:29, Tom Lane wrote:
> Correct, but you've got the details all wrong. The real problem is that
> the planner might discard a join path hash(A,B) at level 2 because it
> loses compared to, say, merge(A,B). But when we get to level three,
> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
> of the two hashes. We'll never find that out unless we keep the
> "inferior" hash path around. We can certainly do that; the question
> is what's it going to cost us to allow more paths to survive to the
> next join level. (And I'm afraid the answer may be "plenty"; it's a
> combinatorial explosion we're looking at here.)

I remember doing a board game simulation project at school, using
minimax search with alpha-beta cuts as an optimisation. Those are
heuristics, but you can decide to run them on the full set of possible
trees when you want a global optimum rather than a local one.

Now, I don't know the specifics of the planner code, but would it be
possible to use a minimax kind of heuristic? Then a planner effort GUC
would allow users to choose whether they want to risk the "plenty"
combinatorial explosion in some requests.

It could be that the planner already is smarter than this of course,
and I can't even say I'd be surprised about it, but still trying...
--
dim

http://en.wikipedia.org/wiki/Minimax


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-05 02:39:34
Message-ID: 603c8f070904041939j2b28f322mfbcfdfa6294f6902@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 3, 2009 at 5:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Correct, but you've got the details all wrong.  The real problem is that
>>> the planner might discard a join path hash(A,B) at level 2 because it
>>> loses compared to, say, merge(A,B).  But when we get to level three,
>>> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
>>> of the two hashes.  We'll never find that out unless we keep the
>>> "inferior" hash path around.  We can certainly do that; the question
>>> is what's it going to cost us to allow more paths to survive to the
>>> next join level.  (And I'm afraid the answer may be "plenty"; it's a
>>> combinatorial explosion we're looking at here.)
>
>> That would be crazy.  I think doing it the way I suggested is correct,
>> just not guaranteed to catch every case.  The reality is that even if
>> we took Greg Stark's suggestion of detecting this situation only in
>> the executor, we'd still get some benefit out of this.  If we take my
>> intermediate approach, we'll catch more cases where this is a win.
>> What you're suggesting here would catch every conceivable case, but at
>> the expense of what I'm sure would be an unacceptable increase in
>> planning time for very limited benefit.
>
> Maybe, maybe not.  I've seen plenty of plans that have several
> mergejoins stacked up on top of each other with no intervening sorts.
> There is 0 chance that the planner would have produced that if it
> thought that it had to re-sort at each level; something else would have
> looked cheaper.  I think that your proposals will end up getting very
> little of the possible benefit, because the planner will fail to choose
> plan trees in which the optimization can be exploited.

Well, I'm all ears if you have suggestions for improvement. For
sorts, we use PathKeys to represent the ordering of each path and keep
the paths for each set of pathkeys. By analogy, we could maintain a
list of PathHash structures for each path representing the tables that
had already been hashed. add_path() would then have to consider both
the PathHash structures and the PathKey structures before concluding
that a path was definitely worse than some path previously found. At
each level of the join tree, we'd need to truncate PathHash structures
that provably have no further use (e.g. on a base table that does not
appear again above the level of the join already planned) to avoid
keeping around paths that appeared to be better only because we didn't
know that the paths they have hashed are worthless in practice. Maybe
that wouldn't even be that expensive, actually, because there will be
lots of cases where you know the relevant table doesn't appear
elsewhere in the query, so no extra paths get saved. But I think we'd
have to write the code and benchmark it to really know.
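The add_path analogy above can be sketched as a toy Python model (assumed structures; the real add_path compares costs, pathkeys, and more dimensions): a new path survives unless some existing path is at least as cheap and offers at least the same set of already-hashed relations.

```python
# Toy add_path dominance check (assumed structures, NOT the real
# planner code): a path is discarded only if another path is at least
# as cheap AND offers a superset of its "already hashed" relations,
# by analogy with how PathKeys keep differently-sorted paths alive.

def add_path(paths, new):
    # paths/new: dicts with 'cost' (float) and 'hashes' (frozenset of rels)
    for p in paths:
        if p["cost"] <= new["cost"] and p["hashes"] >= new["hashes"]:
            return paths  # dominated: discard the new path
    # Drop old paths that the new one now dominates.
    kept = [p for p in paths
            if not (new["cost"] <= p["cost"] and new["hashes"] >= p["hashes"])]
    return kept + [new]

paths = []
paths = add_path(paths, {"cost": 10.0, "hashes": frozenset()})
# More expensive, but it has already hashed B -- both survive.
paths = add_path(paths, {"cost": 12.0, "hashes": frozenset({"B"})})
assert len(paths) == 2
# A cheap path that also hashes B dominates both earlier paths.
paths = add_path(paths, {"cost": 9.0, "hashes": frozenset({"B"})})
assert len(paths) == 1
```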

I guess the reason I'm not too worked up about this is because my
experience is that the planner nearly always prefers hash joins on
small tables, even when an index is present - the queries I'm worried
about optimizing don't need any additional encouragement to use hash
joins; they're doing it already. But certainly it doesn't hurt to see
how many cases we can pick up.

...Robert


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-07 13:55:28
Message-ID: 200904071355.n37DtS617785@momjian.us
Lists: pgsql-hackers


Are there any TODOs here?

---------------------------------------------------------------------------

Robert Haas wrote:
> On Fri, Apr 3, 2009 at 5:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> On Fri, Apr 3, 2009 at 4:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> Correct, but you've got the details all wrong.  The real problem is that
> >>> the planner might discard a join path hash(A,B) at level 2 because it
> >>> loses compared to, say, merge(A,B).  But when we get to level three,
> >>> perhaps hash(hash(A,B),C) would've been the best plan due to synergy
> >>> of the two hashes.  We'll never find that out unless we keep the
> >>> "inferior" hash path around.  We can certainly do that; the question
> >>> is what's it going to cost us to allow more paths to survive to the
> >>> next join level.  (And I'm afraid the answer may be "plenty"; it's a
> >>> combinatorial explosion we're looking at here.)
> >
> >> That would be crazy.  I think doing it the way I suggested is correct,
> >> just not guaranteed to catch every case.  The reality is that even if
> >> we took Greg Stark's suggestion of detecting this situation only in
> >> the executor, we'd still get some benefit out of this.  If we take my
> >> intermediate approach, we'll catch more cases where this is a win.
> >> What you're suggesting here would catch every conceivable case, but at
> >> the expense of what I'm sure would be an unacceptable increase in
> >> planning time for very limited benefit.
> >
> > Maybe, maybe not.  I've seen plenty of plans that have several
> > mergejoins stacked up on top of each other with no intervening sorts.
> > There is 0 chance that the planner would have produced that if it
> > thought that it had to re-sort at each level; something else would have
> > looked cheaper.  I think that your proposals will end up getting very
> > little of the possible benefit, because the planner will fail to choose
> > plan trees in which the optimization can be exploited.
>
> Well, I'm all ears if you have suggestions for improvement. For
> sorts, we use PathKeys to represent the ordering of each path and keep
> the paths for each set of pathkeys. By analogy, we could maintain a
> list of PathHash structures for each path representing the tables that
> had already been hashed. add_path() would then have to consider both
> the PathHash structures and the PathKey structures before concluding
> that a path was definitely worse than some path previously found. At
> each level of the join tree, we'd need to truncate PathHash structures
> that provably have no further use (e.g. on a base table that does not
> appear again above the level of the join already planned) to avoid
> keeping around paths that appeared to be better only because we didn't
> know that the paths they have hashed are worthless in practice. Maybe
> that wouldn't even be that expensive, actually, because there will be
> lots of cases where you know the relevant table doesn't appear
> elsewhere in the query and not save any extra paths. But I think we'd
> have to write the code and benchmark it to really know.
>
> I guess the reason I'm not too worked up about this is because my
> experience is that the planner nearly always prefers hash joins on
> small tables, even when an index is present - the queries I'm worried
> about optimizing don't need any additional encouragement to use hash
> joins; they're doing it already. But certainly it doesn't hurt to see
> how many cases we can pick up.
>
> ...Robert
>

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-07 14:13:42
Message-ID: 603c8f070904070713s5f6bce44t4df8a998bc8ee500@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Are there any TODOs here?

I'd say that all of the items listed in my original email could be
TODOs. I'm planning to work on as many of them as I have time for.
Ramon Lawrence is also working on some related ideas, as discussed
upthread. AFAICS no one has expressed the idea that anything that's
been talked about is a bad idea, so it's just a question of finding
enough round tuits.

...Robert


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-07 21:11:08
Message-ID: 200904072111.n37LB8u15286@momjian.us
Lists: pgsql-hackers

Robert Haas wrote:
> On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Are there any TODOs here?
>
> I'd say that all of the items listed in my original email could be
> TODOs. I'm planning to work on as many of them as I have time for.
> Ramon Lawrence is also working on some related ideas, as discussed
> upthread. AFAICS no one has expressed the idea that anything that's
> been talked about is a bad idea, so it's just a question of finding
> enough round tuits.

OK, would you please add them to the Index/Hash section of the TODO
list; I am afraid I will not do them justice.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-07 21:52:12
Message-ID: 603c8f070904071452o7b2736f1wabcb63bcbc14d86a@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 7, 2009 at 5:11 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> > Are there any TODOs here?
>>
>> I'd say that all of the items listed in my original email could be
>> TODOs.  I'm planning to work on as many of them as I have time for.
>> Ramon Lawrence is also working on some related ideas, as discussed
>> upthread.  AFAICS no one has expressed the idea that anything that's
>> been talked about is a bad idea, so it's just a question of finding
>> enough round tuits.
>
> OK, would you please add them to the Index/Hash section of the TODO
> list; I am afraid I will not do them justice.

I think perhaps Optimizer / Executor would be more appropriate, since
these are not about hash indices but rather about hash joins. I will
look at doing that.

Also I think the last item under Index / Hash is actually NOT done,
and belongs in the main index section rather than Index / Hash.

The first item in the Index / Hash section doesn't really look like a
TODO, or at the very least it's unclear what the action item is
supposed to be.

...Robert


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-07 21:57:20
Message-ID: 200904072157.n37LvKR04430@momjian.us
Lists: pgsql-hackers

Robert Haas wrote:
> On Tue, Apr 7, 2009 at 5:11 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Robert Haas wrote:
> >> On Tue, Apr 7, 2009 at 9:55 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >> > Are there any TODOs here?
> >>
> >> I'd say that all of the items listed in my original email could be
> >> TODOs.  I'm planning to work on as many of them as I have time for.
> >> Ramon Lawrence is also working on some related ideas, as discussed
> >> upthread.  AFAICS no one has expressed the idea that anything that's
> >> been talked about is a bad idea, so it's just a question of finding
> >> enough round tuits.
> >
> > OK, would you please add them to the Index/Hash section of the TODO
> > list; I am afraid I will not do them justice.
>
> I think perhaps Optimizer / Executor would be more appropriate, since
> these are not about hash indices but rather about hash joins. I will
> look at doing that.

Yes, please.

> Also I think the last item under Index / Hash is actually NOT done,
> and belongs in the main index section rather than Index / Hash.

Yep, I didn't realize that editing "Index" also does the subsections,
while editing the subsections doesn't edit the upper level.

> The first item in the Index / Hash section doesn't really look like a
> TODO, or at the very least it's unclear what the action item is
> supposed to be.

Yep, remove, thanks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)enterprisedb(dot)com>, "Lawrence, Ramon" <ramon(dot)lawrence(at)ubc(dot)ca>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a few crazy ideas about hash joins
Date: 2009-04-08 03:02:54
Message-ID: 603c8f070904072002t15f9ff4i474c957f58dd7baa@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 7, 2009 at 5:57 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> I think perhaps Optimizer / Executor would be more appropriate, since
>> these are not about hash indices but rather about hash joins.  I will
>> look at doing that.
>
> Yes, please.

Done. See what you think...

>> Also I think the last item under Index / Hash is actually NOT done,
>> and belongs in the main index section rather than Index / Hash.
>
> Yep, I didn't realize that editing "Index" also does the subsections,
> while editing the subsections doesn't edit the upper level.

Heh. I'd write some webapps to do some of these things, but I haven't
been able to interest anyone in providing me with a
postgresql.org-based hosting arrangement.

>> The first item in the Index / Hash section doesn't really look like a
>> TODO, or at the very least it's unclear what the action item is
>> supposed to be.
>
> Yep, remove, thanks.

...Robert