Re: Hash Join Optimization

Lists: pgsql-hackers
From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Hash Join Optimization
Date: 2008-03-25 20:32:54
Message-ID: 9362e74e0803251332y43391419j2be7eec320b084b1@mail.gmail.com

Hi,
I had a chance to go through the hash join code of PostgreSQL and had the
following thoughts.

- Currently Postgres takes the heap tuple from the slot, creates a
minimal_tuple from it, and copies that into the temp file.

I think the creation of minimal_tuple in the middle is an overhead which can
be avoided by creating a mem-map and directly creating the minimal_tuple in
the mem-map. Since hash join is used mainly to join huge tables, this might
benefit the data warehouse users of Postgres.

Am I missing something?

Thanks,
Gokul.


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Join Optimization
Date: 2008-03-28 08:34:20
Message-ID: 20080328172434.6A07.52131E4D@oss.ntt.co.jp


"Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com> wrote:

> I think the creation of minimal_tuple in the middle is an overhead which can
> be avoided by creating a mem-map and directly creating the minimal_tuple in
> the mem-map.

Many mmap implementations do not allow a mapping to be extended.
Do you have a solution for extending the mmap-ed region?

> Since Hash join is used mainly to join huge tables, this might
> benefit those warehouse customers of postgres.

If we use mmap, we will be restricted by virtual memory size.
That would mean dropping support for huge temp space on 32-bit platforms, no?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash Join Optimization
Date: 2008-03-30 10:32:11
Message-ID: 9362e74e0803300332j29020b90u4b29f5042004db8@mail.gmail.com

On Fri, Mar 28, 2008 at 2:04 PM, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:

> "Gokulakannan Somasundaram" <gokul007(at)gmail(dot)com> wrote:
>
> > I think the creation of minimal_tuple in the middle is an overhead which
> > can be avoided by creating a mem-map and directly creating the
> > minimal_tuple in the mem-map.
>
> Many mmap implementations do not allow a mapping to be extended.
> Do you have a solution for extending the mmap-ed region?

No. I think the solution would be to unmap and remap it. Since the mmap
is local to the backend, this should not be a problem.
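
To make the unmap-and-remap idea concrete, here is a minimal sketch using
plain POSIX calls. It is not PostgreSQL code: remap_larger() and remap_demo()
are hypothetical names, and error handling is reduced to the bare minimum.
The base address may change across the remap, so nothing may hold a pointer
into the old mapping.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: grow a file-backed mapping by
 * unmapping it, extending the file, and mapping the larger range again.
 * Returns the new base address, or MAP_FAILED on error.
 */
void *
remap_larger(int fd, void *old_base, size_t old_size, size_t new_size)
{
    assert(new_size > old_size);

    if (old_base != NULL && munmap(old_base, old_size) != 0)
        return MAP_FAILED;
    if (ftruncate(fd, (off_t) new_size) != 0)
        return MAP_FAILED;
    /* MAP_SHARED so that stores reach the file rather than a private copy */
    return mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Usage: data written before the remap is still there afterwards. */
int
remap_demo(void)
{
    char    path[] = "/tmp/hjremapXXXXXX";
    int     fd = mkstemp(path);
    char   *p;
    int     ok;

    if (fd < 0)
        return -1;
    unlink(path);               /* the file goes away when fd is closed */

    p = (char *) remap_larger(fd, NULL, 0, 4096);
    if (p == (char *) MAP_FAILED)
    {
        close(fd);
        return -1;
    }
    memcpy(p, "tuple", 6);

    p = (char *) remap_larger(fd, p, 4096, 8192);
    if (p == (char *) MAP_FAILED)
    {
        close(fd);
        return -1;
    }
    ok = (memcmp(p, "tuple", 6) == 0);

    munmap(p, 8192);
    close(fd);
    return ok ? 0 : -1;
}
```

Linux also offers mremap(), which can grow a mapping without the explicit
unmap step, but it is not portable.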

> > Since Hash join is used mainly to join huge tables, this might
> > benefit those warehouse customers of postgres.
>
> If we use mmap, we will be restricted by virtual memory size.
> It means we need to drop huge tempspace supports in 32bit platform, no?

Yes, you are right here. I was thinking of 64-bit platforms. On 32-bit
platforms this would need more work: selectively mapping and unmapping
portions of the file as needed.
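
Selective mapping could look roughly like the sketch below, again with plain
POSIX calls and hypothetical names (FileWindow, window_map(), window_unmap(),
window_demo()); none of this exists in PostgreSQL. The one detail worth
showing is that mmap() needs a page-aligned file offset, so the window starts
at the preceding page boundary and the caller gets a pointer adjusted by the
slack.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for one mapped window of a larger temp file. */
typedef struct FileWindow
{
    void   *base;               /* address returned by mmap() */
    size_t  map_len;            /* number of bytes actually mapped */
    char   *data;               /* points at the requested file offset */
} FileWindow;

/*
 * Map only the bytes [offset, offset + len) of fd. The caller must make
 * sure the file is at least that long. Returns 0 on success, -1 on failure.
 */
int
window_map(int fd, off_t offset, size_t len, FileWindow *w)
{
    long    page = sysconf(_SC_PAGESIZE);
    off_t   aligned;
    size_t  slack;

    assert(page > 0 && len > 0 && offset >= 0);
    aligned = offset - (offset % page);     /* round down to a page boundary */
    slack = (size_t) (offset - aligned);

    w->base = mmap(NULL, len + slack, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, aligned);
    if (w->base == MAP_FAILED)
        return -1;
    w->map_len = len + slack;
    w->data = (char *) w->base + slack;
    return 0;
}

/* Release a window so its address space can be reused for another one. */
int
window_unmap(FileWindow *w)
{
    return munmap(w->base, w->map_len);
}

/* Usage: write at an unaligned offset, then read it back via a window. */
int
window_demo(void)
{
    char        path[] = "/tmp/hjwindowXXXXXX";
    int         fd = mkstemp(path);
    long        page = sysconf(_SC_PAGESIZE);
    off_t       where = (off_t) page * 2 + 100;
    FileWindow  w;
    int         ok;

    if (fd < 0)
        return -1;
    unlink(path);               /* the file goes away when fd is closed */

    if (ftruncate(fd, (off_t) page * 4) != 0 ||
        pwrite(fd, "tuple", 5, where) != 5 ||
        window_map(fd, where, 5, &w) != 0)
    {
        close(fd);
        return -1;
    }
    ok = (memcmp(w.data, "tuple", 5) == 0);
    window_unmap(&w);
    close(fd);
    return ok ? 0 : -1;
}
```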

But my aim here is to avoid two copies: HeapTuple -> MinimalTuple and
MinimalTuple -> file. Suggestions are welcome.
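
As a rough illustration of forming the record directly in the mapped file
instead of building it in memory first and then writing it out, consider the
sketch below. Everything in it is an assumption made for the example:
SourceRow, SpillArea, row_size(), spill_row() and the length-prefixed layout
are hypothetical stand-ins, not PostgreSQL's tuple format or executor API.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a source row; not a PostgreSQL type. */
typedef struct SourceRow
{
    int32_t     key;
    const char *payload;        /* NUL-terminated */
} SourceRow;

/* Hypothetical mapped spill area with a simple bump pointer. */
typedef struct SpillArea
{
    char   *base;
    size_t  size;
    size_t  used;
} SpillArea;

/* Bytes needed for: uint32 length, int32 key, payload including its NUL. */
size_t
row_size(const SourceRow *row)
{
    return sizeof(uint32_t) + sizeof(int32_t) + strlen(row->payload) + 1;
}

/*
 * Form the record in place inside the mapping, so the data is copied once
 * (source -> mapped file pages) instead of source -> buffer -> file.
 * Returns a pointer to the record, or NULL if the area is full.
 */
char *
spill_row(SpillArea *area, const SourceRow *row)
{
    size_t      len = row_size(row);
    uint32_t    len32 = (uint32_t) len;
    char       *dst;

    if (area->size - area->used < len)
        return NULL;
    dst = area->base + area->used;
    memcpy(dst, &len32, sizeof(len32));
    memcpy(dst + sizeof(len32), &row->key, sizeof(row->key));
    memcpy(dst + sizeof(len32) + sizeof(row->key), row->payload,
           len - sizeof(len32) - sizeof(row->key));
    area->used += len;
    return dst;
}

/* Usage: spill one row through the mapping, read it back with pread(). */
int
spill_demo(void)
{
    char        path[] = "/tmp/hjspillXXXXXX";
    int         fd = mkstemp(path);
    SourceRow   row = {42, "hello"};
    SpillArea   area;
    char        buf[64];
    size_t      len = row_size(&row);
    int32_t     key = 0;
    int         ok;

    if (fd < 0)
        return -1;
    unlink(path);               /* the file goes away when fd is closed */
    if (ftruncate(fd, 4096) != 0)
    {
        close(fd);
        return -1;
    }
    area.base = (char *) mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);
    if (area.base == (char *) MAP_FAILED)
    {
        close(fd);
        return -1;
    }
    area.size = 4096;
    area.used = 0;

    ok = (spill_row(&area, &row) != NULL);
    /* flush so that the data is guaranteed visible through read calls */
    ok = ok && msync(area.base, 4096, MS_SYNC) == 0;
    ok = ok && pread(fd, buf, len, 0) == (ssize_t) len;
    if (ok)
    {
        memcpy(&key, buf + sizeof(uint32_t), sizeof(key));
        ok = (key == 42 &&
              strcmp(buf + sizeof(uint32_t) + sizeof(int32_t), "hello") == 0);
    }
    munmap(area.base, 4096);
    close(fd);
    return ok ? 0 : -1;
}
```

Whether this actually saves time compared with a buffered write would have
to be measured.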

Thanks,
Gokul.