Re: How do query optimizers affect window functions

Lists: pgsql-general
From: Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: How do query optimizers affect window functions
Date: 2012-11-14 08:12:42
Message-ID: CABBDWwfCfL779RiMpGaj6epJcurW0CBdM3gt5AVTqxr=LHGPug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi, Postgresql,

I want to understand how the query optimizers affect the output of the
window functions.

For example, set "cpu_tuple_cost = 50" in postgresql.conf and start the
server, I apply the regress test (make installcheck). The test of window
function fails.

Checking the diff and I found the output of the window functions are
different. For example,

For the following query:

SELECT sum(unique1) over (rows between current row and unbounded
following), unique1, four FROM tenk1 WHERE unique1 < 10;

The expected results are:

sum | unique1 | four
-----+---------+------
45 | 4 | 0
41 | 2 | 2
39 | 1 | 1
38 | 6 | 2
32 | 9 | 1
23 | 8 | 0
15 | 5 | 1
10 | 3 | 3
7 | 7 | 3
0 | 0 | 0

But the real results are:

sum | unique1 | four
-----+---------+------
45 | 0 | 0
45 | 1 | 1
44 | 2 | 2
42 | 3 | 3
39 | 4 | 0
35 | 5 | 1
30 | 6 | 2
24 | 7 | 3
17 | 8 | 0
9 | 9 | 1

There're altogether 6 queries in window test that outputs different query
results.

I don't understand why the results are different. Intuitively, the queries
show return the same results no matter what plan the optimizer choose.

I suspected the previous queries had some side effect on the latter one
(e.g., change the current row), so I removed all the previous queries
before this query in window.sql. But the result did not change.

Could anyone explain this behavior? Or point out how to investigate?

Thanks a lot!
Tianyin


From: Igor Romanchenko <igor(dot)a(dot)romanchenko(at)gmail(dot)com>
To: Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How do query optimizers affect window functions
Date: 2012-11-14 09:55:59
Message-ID: CAP95Gq=VpHbtX8gvML0q13ugqx97PA_h58HZORUUAkvrOOjCbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Nov 14, 2012 at 10:12 AM, Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu> wrote:

> Hi, Postgresql,
>
> I want to understand how the query optimizers affect the output of the
> window functions.
>
> For example, set "cpu_tuple_cost = 50" in postgresql.conf and start the
> server, I apply the regress test (make installcheck). The test of window
> function fails.
>
> Checking the diff and I found the output of the window functions are
> different. For example,
>
> For the following query:
>
> SELECT sum(unique1) over (rows between current row and unbounded
> following), unique1, four FROM tenk1 WHERE unique1 < 10;
>
> The expected results are:
>
> sum | unique1 | four
> -----+---------+------
> 45 | 4 | 0
> 41 | 2 | 2
> 39 | 1 | 1
> 38 | 6 | 2
> 32 | 9 | 1
> 23 | 8 | 0
> 15 | 5 | 1
> 10 | 3 | 3
> 7 | 7 | 3
> 0 | 0 | 0
>
> But the real results are:
>
> sum | unique1 | four
> -----+---------+------
> 45 | 0 | 0
> 45 | 1 | 1
> 44 | 2 | 2
> 42 | 3 | 3
> 39 | 4 | 0
> 35 | 5 | 1
> 30 | 6 | 2
> 24 | 7 | 3
> 17 | 8 | 0
> 9 | 9 | 1
>
> There're altogether 6 queries in window test that outputs different query
> results.
>
> I don't understand why the results are different. Intuitively, the queries
> show return the same results no matter what plan the optimizer choose.
>
> I suspected the previous queries had some side effect on the latter one
> (e.g., change the current row), so I removed all the previous queries
> before this query in window.sql. But the result did not change.
>
> Could anyone explain this behavior? Or point out how to investigate?
>
> Thanks a lot!
> Tianyin
>
>

Hi.
In short: if no explicit ordering specivied for a query the resulting set
can be in any order. It is up to query optimizer to chose in what order the
resulting tuples will be.
The window function used in this test case rely on the order of the
resulting set (it sums from current to the last) so it will generate
different results for different query plans.

I think for this test cases (window functions) explicit ordering should be
specified. In "normal" cases order dependent window functions are newer
used without explicit ordering.


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How do query optimizers affect window functions
Date: 2012-11-14 19:59:29
Message-ID: CAMkU=1xe8waKiFLv-8VSKYvzCF3oY4WkBvaxY1v1GyY+s_Zg9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Nov 14, 2012 at 12:12 AM, Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu> wrote:
> Hi, Postgresql,
>
> I want to understand how the query optimizers affect the output of the
> window functions.

Use "EXPLAIN".

One is an index scan, one is a bitmap scan. They return rows in a
different order.

..

> I don't understand why the results are different. Intuitively, the queries
> show return the same results no matter what plan the optimizer choose.

My intuition is that the query should refuse to run at all, because
the results are order dependent and you haven't specified an ordering.

Cheers,

Jeff


From: Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How do query optimizers affect window functions
Date: 2012-11-14 22:16:29
Message-ID: CABBDWwehiNbVwSDpYA4b4U-eKVXxoqUfGSVUW3MRm-O6WdZzCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thanks a lot, Jeff!

On Wed, Nov 14, 2012 at 11:59 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Wed, Nov 14, 2012 at 12:12 AM, Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu> wrote:
> > Hi, Postgresql,
> >
> > I want to understand how the query optimizers affect the output of the
> > window functions.
>
> Use "EXPLAIN".
>
> One is an index scan, one is a bitmap scan. They return rows in a
> different order.
>
> ..
>
> > I don't understand why the results are different. Intuitively, the
> queries
> > show return the same results no matter what plan the optimizer choose.
>
> My intuition is that the query should refuse to run at all, because
> the results are order dependent and you haven't specified an ordering.
>
>
What do you mean by "refused to run"? You mean we have to specify the order
when using the window functions? Could you explain more?

Thanks!

> Cheers,
>
> Jeff
>

--
Tianyin XU,
http://cseweb.ucsd.edu/~tixu/


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How do query optimizers affect window functions
Date: 2012-11-14 22:38:46
Message-ID: CAMkU=1z2iZc3SYZMLc+WE_NaEJ96ONMUxpraA0e56deTsXi5qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Nov 14, 2012 at 2:16 PM, Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu> wrote:
> Thanks a lot, Jeff!
>
>
> On Wed, Nov 14, 2012 at 11:59 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>> On Wed, Nov 14, 2012 at 12:12 AM, Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu> wrote:
>> > Hi, Postgresql,
>> >
>> > I want to understand how the query optimizers affect the output of the
>> > window functions.
>>
>> Use "EXPLAIN".
>>
>> One is an index scan, one is a bitmap scan. They return rows in a
>> different order.
>>
>> ..
>>
>> > I don't understand why the results are different. Intuitively, the
>> > queries
>> > show return the same results no matter what plan the optimizer choose.
>>
>> My intuition is that the query should refuse to run at all, because
>> the results are order dependent and you haven't specified an ordering.
>>
>
> What do you mean by "refused to run"?

I mean that it could throw an error. Kind of like the way this
currently throws an error:

select b, sum(b) from foo;
ERROR: column "foo.b" must appear in the GROUP BY clause or be used
in an aggregate function.

To be clear, I am not saying that it does do this (clearly it does
not), just that my intuition is that it should do this.

> You mean we have to specify the order
> when using the window functions? Could you explain more?

Not all uses of window functions have results that depend on the
order. If you only use "partition by", there would be no reason to
force an ordering, for example.

Cheers,

Jeff


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How do query optimizers affect window functions
Date: 2012-11-14 23:04:12
Message-ID: 28279.1352934252@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Wed, Nov 14, 2012 at 2:16 PM, Tianyin Xu <tixu(at)cs(dot)ucsd(dot)edu> wrote:
>> What do you mean by "refused to run"?

> I mean that it could throw an error. Kind of like the way this
> currently throws an error:

> select b, sum(b) from foo;
> ERROR: column "foo.b" must appear in the GROUP BY clause or be used
> in an aggregate function.

> To be clear, I am not saying that it does do this (clearly it does
> not), just that my intuition is that it should do this.

The SQL standard says that underspecified window ordering gives you
implementation-dependent results, but not an error. (Their use of
"implementation-dependent" basically means "unspecified".)

I think this is a fairly reasonable definition, since in many practical
cases it would be hard for the parser to tell whether the window
ordering was nailed down sufficiently to give a unique result, anyway.
(Even if we required you to give an ORDER BY for each column, there are
examples such as zero/minus-zero in float8 where that doesn't produce a
unique ordering. And such a requirement would just be a pain in the
rear a lot of the time.)

It's also consistent with what you get if, for example, you use LIMIT
without an ORDER BY or with an ORDER BY that doesn't constrain the
results to a unique row ordering.

In practice it's on the user to be sure he's nailed down the row
ordering sufficiently to get the results he wants in these cases.

regards, tom lane