Re: query decorrelation in postgres

Lists: pgsql-hackers
From: mahendra chavan <mahcha(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: query decorrelation in postgres
Date: 2009-07-23 12:29:19
Message-ID: 5a0dbeb90907230529q4e8ce1earcdd78a3ede57e5f8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

I am a master's student in computer science at IIT Bombay. As part of my
project, I need to get a decorrelated version of a SQL query. Please could
anyone let me know if we have query decorrelation feature implemented in
postgres ?

Thanks,
Mahendra Chavan


From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: mahendra chavan <mahcha(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: query decorrelation in postgres
Date: 2009-07-24 01:02:25
Message-ID: 20090724095459.93D7.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


mahendra chavan <mahcha(at)gmail(dot)com> wrote:

> I am a master's student in computer science at IIT Bombay. As part of my
> project, I need to get a decorrelated version of a SQL query. Please could
> anyone let me know if we have query decorrelation feature implemented in
> postgres ?

What do you mean by "query decorrelation"? Is it an addtional method for
query optimization? At least there is no word 'decorrelation' in
the postgres documentation.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: mahendra chavan <mahcha(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: query decorrelation in postgres
Date: 2009-07-24 06:27:51
Message-ID: 5a0dbeb90907232327g22b9f3b1hefc8e281a6225eb0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I am sorry for not elaborating on that. What I meant by de-correlation was
optimizing a query to get rid of sub-queirs by using joins.

eg. In the TPC-H schema, a query to find out the names of suppliers who
supply parts having size < 100

*Query with nested subqueries:*

SELECT
S_NAME
FROM
SUPPLIER
WHERE
S_SUPPKEY
IN (
SELECT
PS_SUPPKEY
FROM
PARTSUPP
WHERE
PS_PARTKEY
IN (
SELECT
P_PARTKEY
FROM
PART
WHERE
P_SIZE < 100
)

*Query with joins without subqueries:*

SELECT
S_NAME
FROM
SUPPLIER
INNER JOIN
PARTSUPP
ON
S_SUPPKEY = PS_SUPPKEY
INNER JOIN
PART
ON
P_PARTKEY = PS_PARTKEY
WHERE
P_SIZE < 100

Thanks,
Mahendra

On Thu, Jul 23, 2009 at 9:02 PM, Itagaki Takahiro <
itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:

>
> mahendra chavan <mahcha(at)gmail(dot)com> wrote:
>
> > I am a master's student in computer science at IIT Bombay. As part of my
> > project, I need to get a decorrelated version of a SQL query. Please
> could
> > anyone let me know if we have query decorrelation feature implemented in
> > postgres ?
>
> What do you mean by "query decorrelation"? Is it an addtional method for
> query optimization? At least there is no word 'decorrelation' in
> the postgres documentation.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>
>
>


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "mahendra chavan" <mahcha(at)gmail(dot)com>, "Itagaki Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: query decorrelation in postgres
Date: 2009-07-24 14:02:38
Message-ID: 4A6978AF0200002500028D07@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

mahendra chavan <mahcha(at)gmail(dot)com> wrote:
> What I meant by de-correlation was optimizing a query to get rid of
> sub-queirs by using joins.
>
> eg. In the TPC-H schema, a query to find out the names of suppliers
> who supply parts having size < 100
>
> *Query with nested subqueries:*
>
> SELECT
> S_NAME
> FROM
> SUPPLIER
> WHERE
> S_SUPPKEY
> IN (
> SELECT
> PS_SUPPKEY
> FROM
> PARTSUPP
> WHERE
> PS_PARTKEY
> IN (
> SELECT
> P_PARTKEY
> FROM
> PART
> WHERE
> P_SIZE < 100
> )
>
>
>
> *Query with joins without subqueries:*
>
>
> SELECT
> S_NAME
> FROM
> SUPPLIER
> INNER JOIN
> PARTSUPP
> ON
> S_SUPPKEY = PS_SUPPKEY
> INNER JOIN
> PART
> ON
> P_PARTKEY = PS_PARTKEY
> WHERE
> P_SIZE < 100
>

Those two queries aren't exactly identical, because you could get
duplicate rows in the second which would not be there in the first.
Optimizations to "pull up" subqueries into a higher level FROM clause
as joins have been in PostgreSQL for as long as I've been using it,
but the ability to do the specific optimization you show there
(without the duplicates) was added in version 8.4 using "semi-joins".
I don't think any syntax was added to explicitly write a query using
semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc.

To see how the planner chooses to execute a given query against a
particular schema which has a particular set of statistics about the
data distributions, use the EXPLAIN option.

http://www.postgresql.org/docs/8.4/interactive/sql-explain.html

-Kevin


From: mahendra chavan <mahcha(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: query decorrelation in postgres
Date: 2009-07-24 15:25:40
Message-ID: 5a0dbeb90907240825g662834ceg80dbb88f073e4c2c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you for you response. I was looking for a query rewriting mechanism
which would be outside the optimizer and will do this kind of
transformations at the query level.

~Mahendra

On Fri, Jul 24, 2009 at 7:32 PM, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov
> wrote:

> mahendra chavan <mahcha(at)gmail(dot)com> wrote:
> > What I meant by de-correlation was optimizing a query to get rid of
> > sub-queirs by using joins.
> >
> > eg. In the TPC-H schema, a query to find out the names of suppliers
> > who supply parts having size < 100
> >
> > *Query with nested subqueries:*
> >
> > SELECT
> > S_NAME
> > FROM
> > SUPPLIER
> > WHERE
> > S_SUPPKEY
> > IN (
> > SELECT
> > PS_SUPPKEY
> > FROM
> > PARTSUPP
> > WHERE
> > PS_PARTKEY
> > IN (
> > SELECT
> > P_PARTKEY
> > FROM
> > PART
> > WHERE
> > P_SIZE < 100
> > )
> >
> >
> >
> > *Query with joins without subqueries:*
> >
> >
> > SELECT
> > S_NAME
> > FROM
> > SUPPLIER
> > INNER JOIN
> > PARTSUPP
> > ON
> > S_SUPPKEY = PS_SUPPKEY
> > INNER JOIN
> > PART
> > ON
> > P_PARTKEY = PS_PARTKEY
> > WHERE
> > P_SIZE < 100
> >
>
> Those two queries aren't exactly identical, because you could get
> duplicate rows in the second which would not be there in the first.
> Optimizations to "pull up" subqueries into a higher level FROM clause
> as joins have been in PostgreSQL for as long as I've been using it,
> but the ability to do the specific optimization you show there
> (without the duplicates) was added in version 8.4 using "semi-joins".
> I don't think any syntax was added to explicitly write a query using
> semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc.
>
> To see how the planner chooses to execute a given query against a
> particular schema which has a particular set of statistics about the
> data distributions, use the EXPLAIN option.
>
> http://www.postgresql.org/docs/8.4/interactive/sql-explain.html
>
> -Kevin
>