Re: CTE vs Subquery

From: Linos <info(at)linos(dot)es>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: CTE vs Subquery
Date: 2011-10-26 12:55:36
Message-ID: 4EA80348.3030203@linos.es
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

El 26/10/11 14:23, Merlin Moncure escribió:
> On Wed, Oct 26, 2011 at 4:00 AM, Linos <info(at)linos(dot)es> wrote:
>> El 25/10/11 19:11, Merlin Moncure escribió:
>>> On Tue, Oct 25, 2011 at 11:47 AM, Linos <info(at)linos(dot)es> wrote:
>>>> El 25/10/11 18:43, Tom Lane escribió:
>>>>> Linos <info(at)linos(dot)es> writes:
>>>>>> i am having any problems with performance of queries that uses CTE, can the
>>>>>> join on a CTE use the index of the original table?
>>>>>
>>>>> CTEs act as optimization fences. This is a feature, not a bug. Use
>>>>> them when you want to isolate the evaluation of a subquery.
>>>>>
>>>>> regards, tom lane
>>>>>
>>>>
>>>> The truth it is that complex queries seems more readable using them (maybe a
>>>> personal preference no doubt).
>>>>
>>>> Do have other popular databases the same behavior? SQL Server or Oracle for example?
>>>
>>> In my experience, SQL server also materializes them -- basically CTE
>>> is short hand for 'CREATE TEMP TABLE foo AS SELECT...' then joining to
>>> foo. If you want join behavior, use a join (by the way IIRC SQL
>>> Server is a lot more restrictive about placement of ORDER BY).
>>>
>>> I like CTE current behavior -- the main place I find it awkward is in
>>> use of recursive queries because the CTE fence forces me to abstract
>>> the recursion behind a function, not a view since pushing the view
>>> qual down into the CTE is pretty horrible:
>>>
>>> postgres=# explain select foo.id, (with bar as (select id from foo f
>>> where f.id = foo.id) select * from bar) from foo where foo.id = 11;
>>> QUERY PLAN
>>> -------------------------------------------------------------------------------------
>>> Index Scan using foo_idx on foo (cost=0.00..16.57 rows=1 width=4)
>>> Index Cond: (id = 11)
>>> SubPlan 2
>>> -> CTE Scan on bar (cost=8.28..8.30 rows=1 width=4)
>>> CTE bar
>>> -> Index Scan using foo_idx on foo f (cost=0.00..8.28
>>> rows=1 width=4)
>>> Index Cond: (id = $0)
>>> (7 rows)
>>>
>>> whereas for function you can inject your qual inside the CTE pretty
>>> easily. this is a different problem than the one you're describing
>>> though. for the most part, CTE execution fence is a very good thing,
>>> since it enforces restrictions that other features can leverage, for
>>> example 'data modifying with' queries (by far my all time favorite
>>> postgres enhancement).
>>>
>>> merlin
>>>
>>
>> ok, i get the idea, but i still don't understand what Tom says about isolate
>> evaluation, apart from the performance and the readability, if i am not using
>> writable CTE or recursive CTE, what it is the difference in evaluation (about
>> being isolate) of a subquery vs CTE with the same text inside.
>>
>> I have been using this form lately:
>>
>> WITH inv (SELECT item_id,
>> SUM(units) AS units
>> FROM invoices),
>>
>> quo AS (SELECT item_id,
>> SUM(units) AS units
>> FROM quotes)
>>
>> SELECT items.item_id,
>> CASE WHEN inv.units IS NOT NULL THEN inv.units ELSE 0 END AS
>> units_invoices,
>> CASE WHEN quo.units IS NOT NULL THEN quo.units ELSE 0 END AS
>> units_quotes
>>
>> FROM items
>> LEFT JOIN inv ON inv.item_id = items.item_id
>> LEFT JOIN quo ON quo.item_id = items.item_id
>>
>> Well this is oversimplified because i use much more tables and filter based on
>> dates, but you get the idea, it seems that this type of query should use
>> subqueries, no?
>
> Think about a query like this:
> with foo as
> (
> select id, volatile_func() from bar
> )
> select * from baz join foo using (id) join bla using(id) limit 10;
>
> How many times does volatile_func() get called? How many times in the
> JOIN version? The answers are different...
>
> One of the key features of CTEs is controlling how/when query
> operations occur so you can do things like control side effects and
> force query plans that the server would not otherwise choose (although
> this is typically an unoptimization).
>
> merlin

Ok, i think i understand now the difference, thanks Merlin.

Regards,
Miguel Ángel.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Cezariusz Marek 2011-10-26 12:56:51 Re: Slow cursor
Previous Message Gregg Jaskiewicz 2011-10-26 12:49:17 Re: Slow cursor