Quick Links

Re: query execution question

Lists:	pgsql-hackers

From:	amit sehas <cun23(at)yahoo(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	query execution question
Date:	2011-02-09 19:16:11
Message-ID:	872907.86083.qm@web32203.mail.mud.yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

We are making some performance measurements, we are trying to determine query execution behavior. Lets say we have 4 tables T1, T2, T3 and T4

and the query has the form:

select * from T1, T2, T3, T4 where (T1.a = T2.b and T2.c = T3.d
T3.e = T4.f)

where a,b,c,d,e,f are properties of the respective tables.

Lets say that the cost based optimizer determines that the order of the joins should be T1.a=T2.b followed by T2.c = T3.d followed by T3.e = T4.f

the question we have is during query execution are the joins evaluated completely one by one in that order, or the first join is evaluated completely and generates an intermediate table which is then utilized to perform the next join....this means that for such a query we will need space for all the intermediate tables, which if they are very large tables as they are in our case can significantly alter the cost of the operations...

thanks

-Ashish

From:	Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To:	amit sehas <cun23(at)yahoo(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: query execution question
Date:	2011-02-09 23:30:50
Message-ID:	AANLkTim+4gPvFTeEEPb5au-EhXxzB_JxYArh6hFZTko1@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2011/2/9 amit sehas <cun23(at)yahoo(dot)com>:

> Lets say that the cost based optimizer determines that the order of the
> joins should be T1.a=T2.b followed by T2.c = T3.d followed by T3.e = T4.f
>
> the question we have is during query execution are the joins evaluated
> completely one by one in that order, or the first join is evaluated
> completely and generates an intermediate table which is then utilized
> to perform the next join....this means that for such a query we will need
> space for all the intermediate tables, which if they are very large tables
> as they are in our case can significantly alter the cost of the operations...

[ This is a question more appropriate for pgsql-performance. ]

The optimizer doesn't only determine the order (or "tree" actually) in
which to perform the joins, but also how to perform them: nested loop,
merge, or hash join. Depending on those physical join types, something
might need to be materialized (merge: the intermediate sort "tapes";
hash: the full outer operand's contents) or not (nested loop).

Please see the EXPLAIN statement if you want to know how the query
would be executed.

Nicolas

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<pgsql-hackers(at)postgresql(dot)org>,"amit sehas" <cun23(at)yahoo(dot)com>
Subject:	Re: query execution question
Date:	2011-02-09 23:42:50
Message-ID:	4D52D21A020000250003A71F@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

amit sehas <cun23(at)yahoo(dot)com> wrote:

> We are making some performance measurements, we are trying to
> determine query execution behavior.

I hope you're not doing this on an un-tuned server or "toy" tables.
There are a number of configuration parameters which should be tuned
for your particular server, which will affect the plans used on that
server. Also, the plans the optimizer will choose for small tables
are generally quite different for those chosen for large tables.

> the question we have is during query execution are the joins
> evaluated completely one by one in that order, or the first join
> is evaluated completely and generates an intermediate table which
> is then utilized to perform the next join....this means that for
> such a query we will need space for all the intermediate tables,
> which if they are very large tables as they are in our case can
> significantly alter the cost of the operations...

The query looks at the estimated cost of various plans, and uses the
one with the lowest estimated cost. The plan consists of a set of
steps, with each step pulling rows from one or more lower steps; so
the process is driven from the top down -- you try to pull a result
row, and if the top level step needs something from a lower row it
tries to pull that, etc. Some plans involve "materializing" a set
of rows, some pull through indexes or heap scans as rows are
requested from them. Depending on the query, the indexes, the table
sizes, etc., a SELECT statement with joins like you describe might
use nested index joins and never have more than a few rows in RAM at
a time without ever writing anything to disk.

By the way, this is the wrong list for this question. The -hackers
list is for discussion about work on developing the product.
Questions like this about how it works are better posted to the
-general or -novice list.

-Kevin