Re: CTE vs Subquery

Lists: pgsql-performance
From: Linos <info(at)linos(dot)es>
To: pgsql-performance(at)postgresql(dot)org
Subject: CTE vs Subquery
Date: 2011-10-25 16:22:42
Message-ID: 4EA6E252.6030002@linos.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi all,
i am having any problems with performance of queries that uses CTE, can the
join on a CTE use the index of the original table?, suppose two simple tables:

CREATE TABLE employee
(
emp_id integer NOT NULL,
name character varying NOT NULL,
CONSTRAINT employee_pkey PRIMARY KEY (emp_id )
);

CREATE TABLE employee_telephone
(
emp_id integer NOT NULL,
phone_type character varying NOT NULL,
phone_number character varying NOT NULL,
CONSTRAINT employee_telephone_pkey PRIMARY KEY (emp_id , phone_type ),
CONSTRAINT employee_telephone_emp_id_fkey FOREIGN KEY (emp_id)
REFERENCES employee (emp_id)
);

and this two queries, i know this particular case don't need either a CTE or
subquery it is only an example:

WITH phones AS (SELECT emp_id,
phone_number
ORDER BY emp_id,
phone_type)
SELECT emp.emp_id,
emp.name,
array_to_string(array_agg(phones.phone_number)) AS phones

FROM employee AS emp
JOIN phones ON phones.emp_id = emp.emp_id

VS

SELECT emp.emp_id,
emp.name,
array_to_string(array_agg(phones.phone_number)) AS phones

FROM employee AS emp
JOIN (SELECT emp_id,
phone_number
ORDER BY emp_id,
phone_type) AS phones ON phones.emp_id = emp.emp_id

Why the CTE it is slower in many cases? does the CTE don't use the index for the
join and the subquery do? if the CTE it is usually slower where should be used
instead of a subquery other than recursive CTE's?

Regards,
Miguel Angel.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Linos <info(at)linos(dot)es>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: CTE vs Subquery
Date: 2011-10-25 16:43:45
Message-ID: 24672.1319561025@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Linos <info(at)linos(dot)es> writes:
> i am having any problems with performance of queries that uses CTE, can the
> join on a CTE use the index of the original table?

CTEs act as optimization fences. This is a feature, not a bug. Use
them when you want to isolate the evaluation of a subquery.

regards, tom lane


From: Linos <info(at)linos(dot)es>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: CTE vs Subquery
Date: 2011-10-25 16:47:53
Message-ID: 4EA6E839.6050804@linos.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

El 25/10/11 18:43, Tom Lane escribió:
> Linos <info(at)linos(dot)es> writes:
>> i am having any problems with performance of queries that uses CTE, can the
>> join on a CTE use the index of the original table?
>
> CTEs act as optimization fences. This is a feature, not a bug. Use
> them when you want to isolate the evaluation of a subquery.
>
> regards, tom lane
>

The truth it is that complex queries seems more readable using them (maybe a
personal preference no doubt).

Do have other popular databases the same behavior? SQL Server or Oracle for example?

Regards,
Miguel Ángel.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Linos <info(at)linos(dot)es>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: CTE vs Subquery
Date: 2011-10-25 17:11:23
Message-ID: CAHyXU0wTCW0ymy4OQWxmO_kuMiSf+3PkFonTujDy_PVMCvGYYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Oct 25, 2011 at 11:47 AM, Linos <info(at)linos(dot)es> wrote:
> El 25/10/11 18:43, Tom Lane escribió:
>> Linos <info(at)linos(dot)es> writes:
>>>     i am having any problems with performance of queries that uses CTE, can the
>>> join on a CTE use the index of the original table?
>>
>> CTEs act as optimization fences.  This is a feature, not a bug.  Use
>> them when you want to isolate the evaluation of a subquery.
>>
>>                       regards, tom lane
>>
>
> The truth it is that complex queries seems more readable using them (maybe a
> personal preference no doubt).
>
> Do have other popular databases the same behavior? SQL Server or Oracle for example?

In my experience, SQL server also materializes them -- basically CTE
is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to
foo. If you want join behavior, use a join (by the way IIRC SQL
Server is a lot more restrictive about placement of ORDER BY).

I like CTE current behavior -- the main place I find it awkward is in
use of recursive queries because the CTE fence forces me to abstract
the recursion behind a function, not a view since pushing the view
qual down into the CTE is pretty horrible:

postgres=# explain select foo.id, (with bar as (select id from foo f
where f.id = foo.id) select * from bar) from foo where foo.id = 11;
QUERY PLAN
-------------------------------------------------------------------------------------
Index Scan using foo_idx on foo (cost=0.00..16.57 rows=1 width=4)
Index Cond: (id = 11)
SubPlan 2
-> CTE Scan on bar (cost=8.28..8.30 rows=1 width=4)
CTE bar
-> Index Scan using foo_idx on foo f (cost=0.00..8.28
rows=1 width=4)
Index Cond: (id = $0)
(7 rows)

whereas for function you can inject your qual inside the CTE pretty
easily. this is a different problem than the one you're describing
though. for the most part, CTE execution fence is a very good thing,
since it enforces restrictions that other features can leverage, for
example 'data modifying with' queries (by far my all time favorite
postgres enhancement).

merlin


From: Linos <info(at)linos(dot)es>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: CTE vs Subquery
Date: 2011-10-26 09:00:48
Message-ID: 4EA7CC40.9000200@linos.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

El 25/10/11 19:11, Merlin Moncure escribió:
> On Tue, Oct 25, 2011 at 11:47 AM, Linos <info(at)linos(dot)es> wrote:
>> El 25/10/11 18:43, Tom Lane escribió:
>>> Linos <info(at)linos(dot)es> writes:
>>>> i am having any problems with performance of queries that uses CTE, can the
>>>> join on a CTE use the index of the original table?
>>>
>>> CTEs act as optimization fences. This is a feature, not a bug. Use
>>> them when you want to isolate the evaluation of a subquery.
>>>
>>> regards, tom lane
>>>
>>
>> The truth it is that complex queries seems more readable using them (maybe a
>> personal preference no doubt).
>>
>> Do have other popular databases the same behavior? SQL Server or Oracle for example?
>
> In my experience, SQL server also materializes them -- basically CTE
> is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to
> foo. If you want join behavior, use a join (by the way IIRC SQL
> Server is a lot more restrictive about placement of ORDER BY).
>
> I like CTE current behavior -- the main place I find it awkward is in
> use of recursive queries because the CTE fence forces me to abstract
> the recursion behind a function, not a view since pushing the view
> qual down into the CTE is pretty horrible:
>
> postgres=# explain select foo.id, (with bar as (select id from foo f
> where f.id = foo.id) select * from bar) from foo where foo.id = 11;
> QUERY PLAN
> -------------------------------------------------------------------------------------
> Index Scan using foo_idx on foo (cost=0.00..16.57 rows=1 width=4)
> Index Cond: (id = 11)
> SubPlan 2
> -> CTE Scan on bar (cost=8.28..8.30 rows=1 width=4)
> CTE bar
> -> Index Scan using foo_idx on foo f (cost=0.00..8.28
> rows=1 width=4)
> Index Cond: (id = $0)
> (7 rows)
>
> whereas for function you can inject your qual inside the CTE pretty
> easily. this is a different problem than the one you're describing
> though. for the most part, CTE execution fence is a very good thing,
> since it enforces restrictions that other features can leverage, for
> example 'data modifying with' queries (by far my all time favorite
> postgres enhancement).
>
> merlin
>

ok, i get the idea, but i still don't understand what Tom says about isolate
evaluation, apart from the performance and the readability, if i am not using
writable CTE or recursive CTE, what it is the difference in evaluation (about
being isolate) of a subquery vs CTE with the same text inside.

I have been using this form lately:

WITH inv (SELECT item_id,
SUM(units) AS units
FROM invoices),

quo AS (SELECT item_id,
SUM(units) AS units
FROM quotes)

SELECT items.item_id,
CASE WHEN inv.units IS NOT NULL THEN inv.units ELSE 0 END AS
units_invoices,
CASE WHEN quo.units IS NOT NULL THEN quo.units ELSE 0 END AS
units_quotes

FROM items
LEFT JOIN inv ON inv.item_id = items.item_id
LEFT JOIN quo ON quo.item_id = items.item_id

Well this is oversimplified because i use much more tables and filter based on
dates, but you get the idea, it seems that this type of query should use
subqueries, no?

Regards,
Miguel Angel.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Linos <info(at)linos(dot)es>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: CTE vs Subquery
Date: 2011-10-26 12:23:18
Message-ID: CAHyXU0z7qj=6G-B6aJ+-D0+kQtV7rBg3-J5=iFuhQEg_0WmoEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Oct 26, 2011 at 4:00 AM, Linos <info(at)linos(dot)es> wrote:
> El 25/10/11 19:11, Merlin Moncure escribió:
>> On Tue, Oct 25, 2011 at 11:47 AM, Linos <info(at)linos(dot)es> wrote:
>>> El 25/10/11 18:43, Tom Lane escribió:
>>>> Linos <info(at)linos(dot)es> writes:
>>>>>     i am having any problems with performance of queries that uses CTE, can the
>>>>> join on a CTE use the index of the original table?
>>>>
>>>> CTEs act as optimization fences.  This is a feature, not a bug.  Use
>>>> them when you want to isolate the evaluation of a subquery.
>>>>
>>>>                       regards, tom lane
>>>>
>>>
>>> The truth it is that complex queries seems more readable using them (maybe a
>>> personal preference no doubt).
>>>
>>> Do have other popular databases the same behavior? SQL Server or Oracle for example?
>>
>> In my experience, SQL server also materializes them -- basically CTE
>> is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to
>> foo.  If you want join behavior, use a join (by the way IIRC SQL
>> Server is a lot more restrictive about placement of ORDER BY).
>>
>> I like CTE current behavior -- the main place I find it awkward is in
>> use of recursive queries because the CTE fence forces me to abstract
>> the recursion behind a function, not a view since pushing the view
>> qual down into the CTE is pretty horrible:
>>
>> postgres=# explain select foo.id, (with bar as (select id from foo f
>> where f.id = foo.id) select * from bar) from foo where foo.id = 11;
>>                                      QUERY PLAN
>> -------------------------------------------------------------------------------------
>>  Index Scan using foo_idx on foo  (cost=0.00..16.57 rows=1 width=4)
>>    Index Cond: (id = 11)
>>    SubPlan 2
>>      ->  CTE Scan on bar  (cost=8.28..8.30 rows=1 width=4)
>>            CTE bar
>>              ->  Index Scan using foo_idx on foo f  (cost=0.00..8.28
>> rows=1 width=4)
>>                    Index Cond: (id = $0)
>> (7 rows)
>>
>> whereas for function you can inject your qual inside the CTE pretty
>> easily.  this is a different problem than the one you're describing
>> though.  for the most part, CTE execution fence is a very good thing,
>> since it enforces restrictions that other features can leverage, for
>> example 'data modifying with' queries (by far my all time favorite
>> postgres enhancement).
>>
>> merlin
>>
>
> ok, i get the idea, but i still don't understand what Tom says about isolate
> evaluation, apart from the performance and the readability, if i am not using
> writable CTE or recursive CTE, what it is the difference in evaluation (about
> being isolate) of a subquery vs CTE with the same text inside.
>
> I have been using this form lately:
>
> WITH inv (SELECT item_id,
>                               SUM(units) AS units
>                 FROM invoices),
>
> quo AS (SELECT item_id,
>                             SUM(units) AS units
>              FROM quotes)
>
> SELECT items.item_id,
>              CASE WHEN inv.units IS NOT NULL THEN inv.units ELSE 0 END AS
> units_invoices,
>              CASE WHEN quo.units IS NOT NULL THEN quo.units ELSE 0 END AS
> units_quotes
>
> FROM items
>    LEFT JOIN inv ON inv.item_id = items.item_id
>    LEFT JOIN quo ON quo.item_id = items.item_id
>
> Well this is oversimplified because i use much more tables and filter based on
> dates, but you get the idea, it seems that this type of query should use
> subqueries, no?

Think about a query like this:
with foo as
(
select id, volatile_func() from bar
)
select * from baz join foo using (id) join bla using(id) limit 10;

How many times does volatile_func() get called? How many times in the
JOIN version? The answers are different...

One of the key features of CTEs is controlling how/when query
operations occur so you can do things like control side effects and
force query plans that the server would not otherwise choose (although
this is typically an unoptimization).

merlin


From: Linos <info(at)linos(dot)es>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: CTE vs Subquery
Date: 2011-10-26 12:55:36
Message-ID: 4EA80348.3030203@linos.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

El 26/10/11 14:23, Merlin Moncure escribió:
> On Wed, Oct 26, 2011 at 4:00 AM, Linos <info(at)linos(dot)es> wrote:
>> El 25/10/11 19:11, Merlin Moncure escribió:
>>> On Tue, Oct 25, 2011 at 11:47 AM, Linos <info(at)linos(dot)es> wrote:
>>>> El 25/10/11 18:43, Tom Lane escribió:
>>>>> Linos <info(at)linos(dot)es> writes:
>>>>>> i am having any problems with performance of queries that uses CTE, can the
>>>>>> join on a CTE use the index of the original table?
>>>>>
>>>>> CTEs act as optimization fences. This is a feature, not a bug. Use
>>>>> them when you want to isolate the evaluation of a subquery.
>>>>>
>>>>> regards, tom lane
>>>>>
>>>>
>>>> The truth it is that complex queries seems more readable using them (maybe a
>>>> personal preference no doubt).
>>>>
>>>> Do have other popular databases the same behavior? SQL Server or Oracle for example?
>>>
>>> In my experience, SQL server also materializes them -- basically CTE
>>> is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to
>>> foo. If you want join behavior, use a join (by the way IIRC SQL
>>> Server is a lot more restrictive about placement of ORDER BY).
>>>
>>> I like CTE current behavior -- the main place I find it awkward is in
>>> use of recursive queries because the CTE fence forces me to abstract
>>> the recursion behind a function, not a view since pushing the view
>>> qual down into the CTE is pretty horrible:
>>>
>>> postgres=# explain select foo.id, (with bar as (select id from foo f
>>> where f.id = foo.id) select * from bar) from foo where foo.id = 11;
>>> QUERY PLAN
>>> -------------------------------------------------------------------------------------
>>> Index Scan using foo_idx on foo (cost=0.00..16.57 rows=1 width=4)
>>> Index Cond: (id = 11)
>>> SubPlan 2
>>> -> CTE Scan on bar (cost=8.28..8.30 rows=1 width=4)
>>> CTE bar
>>> -> Index Scan using foo_idx on foo f (cost=0.00..8.28
>>> rows=1 width=4)
>>> Index Cond: (id = $0)
>>> (7 rows)
>>>
>>> whereas for function you can inject your qual inside the CTE pretty
>>> easily. this is a different problem than the one you're describing
>>> though. for the most part, CTE execution fence is a very good thing,
>>> since it enforces restrictions that other features can leverage, for
>>> example 'data modifying with' queries (by far my all time favorite
>>> postgres enhancement).
>>>
>>> merlin
>>>
>>
>> ok, i get the idea, but i still don't understand what Tom says about isolate
>> evaluation, apart from the performance and the readability, if i am not using
>> writable CTE or recursive CTE, what it is the difference in evaluation (about
>> being isolate) of a subquery vs CTE with the same text inside.
>>
>> I have been using this form lately:
>>
>> WITH inv (SELECT item_id,
>> SUM(units) AS units
>> FROM invoices),
>>
>> quo AS (SELECT item_id,
>> SUM(units) AS units
>> FROM quotes)
>>
>> SELECT items.item_id,
>> CASE WHEN inv.units IS NOT NULL THEN inv.units ELSE 0 END AS
>> units_invoices,
>> CASE WHEN quo.units IS NOT NULL THEN quo.units ELSE 0 END AS
>> units_quotes
>>
>> FROM items
>> LEFT JOIN inv ON inv.item_id = items.item_id
>> LEFT JOIN quo ON quo.item_id = items.item_id
>>
>> Well this is oversimplified because i use much more tables and filter based on
>> dates, but you get the idea, it seems that this type of query should use
>> subqueries, no?
>
> Think about a query like this:
> with foo as
> (
> select id, volatile_func() from bar
> )
> select * from baz join foo using (id) join bla using(id) limit 10;
>
> How many times does volatile_func() get called? How many times in the
> JOIN version? The answers are different...
>
> One of the key features of CTEs is controlling how/when query
> operations occur so you can do things like control side effects and
> force query plans that the server would not otherwise choose (although
> this is typically an unoptimization).
>
> merlin

Ok, i think i understand now the difference, thanks Merlin.

Regards,
Miguel Ángel.