Re: Query Optimizer Failure / Possible Bug

Lists: pgsql-performance
From: Hannes Dorbath <light(at)theendofthetunnel(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Query Optimizer Failure / Possible Bug
Date: 2005-03-24 10:31:11
Message-ID: d1u4m3$1qm2$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

The query and the corresponding EXPLAIN is at

http://hannes.imos.net/query.txt

I'd like to use the column q.replaced_serials for multiple calculations
in the SELECT clause, but every time it is referenced there in some way
the whole query in the FROM clause returning q is executed again.

This doesn't make sense to me at all and eats performance.

If this wasn't clear enough, for every

q.replaced_serials <insert_random_calculation> AS some_column

in the SELECT clause there is new block of

---------------------------------------------------------------
-> Aggregate (cost=884.23..884.23 rows=1 width=0)
-> Nested Loop (cost=0.00..884.23 rows=1 width=0)
-> Index Scan using ix_rma_ticket_serials_replace on

rma_ticket_serials rts (cost=0.00..122.35
rows=190 width=4)
Index Cond: ("replace" = false)
-> Index Scan using pk_serials on serials s
(cost=0.00..3.51 rows=1 width=4)
Index Cond: (s.serial_id = "outer".serial_id)
Filter: ((article_no = $0) AND (delivery_id = $1))
---------------------------------------------------------------

in the EXPLAIN result.

For those who wonder why I do this FROM (SELECT...). I was searching for
a way to use the result of an subselect for multiple calculations in the
SELECT clause and return that calculation results as individual columns.

I tested a bit further and found out that PG behaves the same in case q
is a view. This makes me wonder how efficient the optimizer can work
with views - or even worse - nested views.

Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.

Thanks in advance,
Hannes Dorbath


From: Hannes Dorbath <light(at)theendofthetunnel(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query Optimizer Failure / Possible Bug
Date: 2005-03-28 14:14:44
Message-ID: d293gg$1p0e$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

hm, a few days and not a single reply :|

any more information needed? test data? simplified test case? anything?

thanks

Hannes Dorbath wrote:
> The query and the corresponding EXPLAIN is at
>
> http://hannes.imos.net/query.txt
>
> I'd like to use the column q.replaced_serials for multiple calculations
> in the SELECT clause, but every time it is referenced there in some way
> the whole query in the FROM clause returning q is executed again.
>
> This doesn't make sense to me at all and eats performance.
>
> If this wasn't clear enough, for every
>
> q.replaced_serials <insert_random_calculation> AS some_column
>
> in the SELECT clause there is new block of
>
> ---------------------------------------------------------------
> -> Aggregate (cost=884.23..884.23 rows=1 width=0)
> -> Nested Loop (cost=0.00..884.23 rows=1 width=0)
> -> Index Scan using ix_rma_ticket_serials_replace on
> rma_ticket_serials rts (cost=0.00..122.35
> rows=190 width=4)
> Index Cond: ("replace" = false)
> -> Index Scan using pk_serials on serials s
> (cost=0.00..3.51 rows=1 width=4)
> Index Cond: (s.serial_id = "outer".serial_id)
> Filter: ((article_no = $0) AND (delivery_id = $1))
> ---------------------------------------------------------------
>
> in the EXPLAIN result.
>
> For those who wonder why I do this FROM (SELECT...). I was searching for
> a way to use the result of an subselect for multiple calculations in the
> SELECT clause and return that calculation results as individual columns.
>
> I tested a bit further and found out that PG behaves the same in case q
> is a view. This makes me wonder how efficient the optimizer can work
> with views - or even worse - nested views.
>
> Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.
>
>
> Thanks in advance,
> Hannes Dorbath

--
imos Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen // Stauferpark Ost
Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Hannes Dorbath <light(at)theendofthetunnel(dot)de>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query Optimizer Failure / Possible Bug
Date: 2005-03-28 19:51:17
Message-ID: 200503281151.17344.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hannes,

> The query and the corresponding EXPLAIN is at
>
> http://hannes.imos.net/query.txt

The problem is that you're using a complex corellated sub-select in the SELECT
clause:

SELECT
d.delivery_id,
da.article_no,
da.amount,
(
SELECT
COUNT(*)
FROM
serials s
INNER JOIN rma_ticket_serials rts ON (
s.serial_id = rts.serial_id
)
WHERE
s.article_no = da.article_no AND
s.delivery_id = d.delivery_id AND
rts.replace = FALSE
) AS replaced_serials

This means that the planner pretty much has to iterate over the subquery,
running it once for each row in the result set. If you want the optimizer
to use a JOIN structure instead, put the subselect in the FROM clause.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


From: Hannes Dorbath <light(at)theendofthetunnel(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query Optimizer Failure / Possible Bug
Date: 2005-03-29 00:15:01
Message-ID: d2a6m1$tkh$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Thank you very much for your reply. I'll try to modify it.

Josh Berkus wrote:
> Hannes,
>
>
>>The query and the corresponding EXPLAIN is at
>>
>>http://hannes.imos.net/query.txt
>
>
> The problem is that you're using a complex corellated sub-select in the SELECT
> clause:
>
> SELECT
> d.delivery_id,
> da.article_no,
> da.amount,
> (
> SELECT
> COUNT(*)
> FROM
> serials s
> INNER JOIN rma_ticket_serials rts ON (
> s.serial_id = rts.serial_id
> )
> WHERE
> s.article_no = da.article_no AND
> s.delivery_id = d.delivery_id AND
> rts.replace = FALSE
> ) AS replaced_serials
>
> This means that the planner pretty much has to iterate over the subquery,
> running it once for each row in the result set. If you want the optimizer
> to use a JOIN structure instead, put the subselect in the FROM clause.
>

--
imos Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen // Stauferpark Ost
Tel: 07161 93339-14 // Fax: 07161 93339-99 // Internet: www.imos.net


From: PFC <lists(at)boutiquenumerique(dot)com>
To: "Hannes Dorbath" <light(at)theendofthetunnel(dot)de>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query Optimizer Failure / Possible Bug
Date: 2005-04-03 08:01:13
Message-ID: op.sonckb1pth1vuj@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


Noticed this problem,too.
You can always make the calculation you want done once inside a set
returning function so it'll behave like a table, but that's ugly.

On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath
<light(at)theendofthetunnel(dot)de> wrote:

> hm, a few days and not a single reply :|
>
> any more information needed? test data? simplified test case? anything?
>
>
> thanks
>
>
> Hannes Dorbath wrote:
>> The query and the corresponding EXPLAIN is at
>> http://hannes.imos.net/query.txt
>> I'd like to use the column q.replaced_serials for multiple calculations
>> in the SELECT clause, but every time it is referenced there in some way
>> the whole query in the FROM clause returning q is executed again.
>> This doesn't make sense to me at all and eats performance.
>> If this wasn't clear enough, for every
>> q.replaced_serials <insert_random_calculation> AS some_column
>> in the SELECT clause there is new block of
>> ---------------------------------------------------------------
>> -> Aggregate (cost=884.23..884.23 rows=1 width=0)
>> -> Nested Loop (cost=0.00..884.23 rows=1 width=0)
>> -> Index Scan using ix_rma_ticket_serials_replace on
>> rma_ticket_serials rts (cost=0.00..122.35
>> rows=190 width=4)
>> Index Cond: ("replace" = false)
>> -> Index Scan using pk_serials on serials s
>> (cost=0.00..3.51 rows=1 width=4)
>> Index Cond: (s.serial_id = "outer".serial_id)
>> Filter: ((article_no = $0) AND (delivery_id = $1))
>> ---------------------------------------------------------------
>> in the EXPLAIN result.
>> For those who wonder why I do this FROM (SELECT...). I was searching
>> for
>> a way to use the result of an subselect for multiple calculations in the
>> SELECT clause and return that calculation results as individual columns.
>> I tested a bit further and found out that PG behaves the same in case q
>> is a view. This makes me wonder how efficient the optimizer can work
>> with views - or even worse - nested views.
>> Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.
>> Thanks in advance,
>> Hannes Dorbath
>


From: Hannes Dorbath <light(at)theendofthetunnel(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query Optimizer Failure / Possible Bug
Date: 2005-04-04 15:18:24
Message-ID: 42515478$0$5507$8fe63b2a@news.disputo.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Mhh. I have no clue about the internals of PostgreSQL and query planing,
but to me as user this should really be a thing the optimizer has to
work out..

On 03.04.2005 10:01, PFC wrote:
>
> Noticed this problem,too.
> You can always make the calculation you want done once inside a set
> returning function so it'll behave like a table, but that's ugly.
>
> On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath
> <light(at)theendofthetunnel(dot)de> wrote:
>
>> hm, a few days and not a single reply :|
>>
>> any more information needed? test data? simplified test case? anything?
>>
>>
>> thanks
>>
>>
>> Hannes Dorbath wrote:
>>
>>> The query and the corresponding EXPLAIN is at
>>> http://hannes.imos.net/query.txt
>>> I'd like to use the column q.replaced_serials for multiple calculations
>>> in the SELECT clause, but every time it is referenced there in some way
>>> the whole query in the FROM clause returning q is executed again.
>>> This doesn't make sense to me at all and eats performance.
>>> If this wasn't clear enough, for every
>>> q.replaced_serials <insert_random_calculation> AS some_column
>>> in the SELECT clause there is new block of
>>> ---------------------------------------------------------------
>>> -> Aggregate (cost=884.23..884.23 rows=1 width=0)
>>> -> Nested Loop (cost=0.00..884.23 rows=1 width=0)
>>> -> Index Scan using ix_rma_ticket_serials_replace on
>>> rma_ticket_serials rts (cost=0.00..122.35
>>> rows=190 width=4)
>>> Index Cond: ("replace" = false)
>>> -> Index Scan using pk_serials on serials s
>>> (cost=0.00..3.51 rows=1 width=4)
>>> Index Cond: (s.serial_id = "outer".serial_id)
>>> Filter: ((article_no = $0) AND (delivery_id = $1))
>>> ---------------------------------------------------------------
>>> in the EXPLAIN result.
>>> For those who wonder why I do this FROM (SELECT...). I was
>>> searching for
>>> a way to use the result of an subselect for multiple calculations in the
>>> SELECT clause and return that calculation results as individual columns.
>>> I tested a bit further and found out that PG behaves the same in case q
>>> is a view. This makes me wonder how efficient the optimizer can work
>>> with views - or even worse - nested views.
>>> Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.
>>> Thanks in advance,
>>> Hannes Dorbath
>>
>>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>


From: Hannes Dorbath <light(at)theendofthetunnel(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query Optimizer Failure / Possible Bug
Date: 2005-04-16 10:45:30
Message-ID: 4260e649$0$24290$8fe63b2a@news.disputo.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Some people on the #postgresql irc channel pointed out that it's a known
issue.

http://www.qaix.com/postgresql-database-development/246-557-select-based-on-function-result-read.shtml

A more simple testcase is below. Adding OFFSET 0 to the inner query does
indeed fix it in my case.

SELECT
tmp.user_id AS foo,
tmp.user_id AS bar,
tmp.user_id AS baz
FROM
(
SELECT
u.user_id
FROM
users u
) AS tmp;

Seq Scan on users (cost=0.00..1.53 rows=53 width=4) (actual
time=0.230..0.233 rows=1 loops=1)
Total runtime: 0.272 ms

---------------------------

SELECT
tmp.user_id AS foo,
tmp.user_id AS bar,
tmp.user_id AS baz
FROM
(
SELECT
(SELECT 1) AS user_id
FROM
users u
) AS tmp;

Seq Scan on users u (cost=0.03..1.56 rows=53 width=0) (actual
time=0.216..0.219 rows=1 loops=1)
InitPlan
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.004..0.006 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.002..0.004 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.002..0.003 rows=1 loops=1)
Total runtime: 0.270 ms

---------------------------

SELECT
tmp.user_id AS foo,
tmp.user_id AS bar,
tmp.user_id AS baz
FROM
(
SELECT
(SELECT 1) AS user_id
FROM
users u
OFFSET 0
) AS tmp;

Subquery Scan tmp (cost=0.01..1.03 rows=1 width=4) (actual
time=0.032..0.042 rows=1 loops=1)
-> Limit (cost=0.01..1.02 rows=1 width=0) (actual time=0.026..0.033
rows=1 loops=1)
InitPlan
-> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.003..0.004 rows=1 loops=1)
-> Seq Scan on users u (cost=0.00..1.01 rows=1 width=0)
(actual time=0.022..0.027 rows=1 loops=1)
Total runtime: 0.090 ms

On 04.04.2005 17:18, Hannes Dorbath wrote:
> Mhh. I have no clue about the internals of PostgreSQL and query planing,
> but to me as user this should really be a thing the optimizer has to
> work out..
>
>
> On 03.04.2005 10:01, PFC wrote:
>
>>
>> Noticed this problem,too.
>> You can always make the calculation you want done once inside a
>> set returning function so it'll behave like a table, but that's ugly.
>>
>> On Mon, 28 Mar 2005 16:14:44 +0200, Hannes Dorbath
>> <light(at)theendofthetunnel(dot)de> wrote:
>>
>>> hm, a few days and not a single reply :|
>>>
>>> any more information needed? test data? simplified test case? anything?
>>>
>>>
>>> thanks
>>>
>>>
>>> Hannes Dorbath wrote:
>>>
>>>> The query and the corresponding EXPLAIN is at
>>>> http://hannes.imos.net/query.txt
>>>> I'd like to use the column q.replaced_serials for multiple
>>>> calculations
>>>> in the SELECT clause, but every time it is referenced there in some way
>>>> the whole query in the FROM clause returning q is executed again.
>>>> This doesn't make sense to me at all and eats performance.
>>>> If this wasn't clear enough, for every
>>>> q.replaced_serials <insert_random_calculation> AS some_column
>>>> in the SELECT clause there is new block of
>>>> ---------------------------------------------------------------
>>>> -> Aggregate (cost=884.23..884.23 rows=1 width=0)
>>>> -> Nested Loop (cost=0.00..884.23 rows=1 width=0)
>>>> -> Index Scan using ix_rma_ticket_serials_replace on
>>>> rma_ticket_serials rts (cost=0.00..122.35
>>>> rows=190 width=4)
>>>> Index Cond: ("replace" = false)
>>>> -> Index Scan using pk_serials on serials s
>>>> (cost=0.00..3.51 rows=1 width=4)
>>>> Index Cond: (s.serial_id = "outer".serial_id)
>>>> Filter: ((article_no = $0) AND (delivery_id = $1))
>>>> ---------------------------------------------------------------
>>>> in the EXPLAIN result.
>>>> For those who wonder why I do this FROM (SELECT...). I was
>>>> searching for
>>>> a way to use the result of an subselect for multiple calculations in
>>>> the
>>>> SELECT clause and return that calculation results as individual
>>>> columns.
>>>> I tested a bit further and found out that PG behaves the same in
>>>> case q
>>>> is a view. This makes me wonder how efficient the optimizer can work
>>>> with views - or even worse - nested views.
>>>> Tested and reproduced on PG 7.4.1 linux and 8.0.0 win32.
>>>> Thanks in advance,
>>>> Hannes Dorbath
>>>
>>>
>>>
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: Have you checked our extensive FAQ?
>>
>> http://www.postgresql.org/docs/faq
>>