BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries

Lists: pgsql-bugs
From: kouber(at)saparev(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries
Date: 2012-02-22 18:53:19
Message-ID: E1S0HJX-0004pP-9Y@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 6483
Logged by: Kouber Saparev
Email address: kouber(at)saparev(dot)com
PostgreSQL version: 9.1.2
Operating system: Debian
Description:

The rows of a SELECT statement are being evaluated, even when not shown in
the final result, when using an OFFSET > 0. Although I know that LIMIT is
imposed just before flushing the result set to the client, this behaviour
seems quite confusing, especially when using DML statements in the field
list of the SELECT itself.

CREATE TABLE xxx (id INT);

CREATE FUNCTION f(xxx) RETURNS VOID AS $$
BEGIN
-- imagine some DML statements here --

RAISE NOTICE '%', $1.id;
END;
$$ LANGUAGE PLPGSQL;

INSERT INTO xxx VALUES (1), (2), (3), (4), (5);

-- shows a notice for 1 and 2
SELECT x.id, f(x) FROM xxx as x LIMIT 2;

-- shows a notice for 1, 2, 3 and 4
SELECT x.id, f(x) FROM xxx as x LIMIT 2 OFFSET 2;


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: kouber(at)saparev(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries
Date: 2012-02-22 18:58:43
Message-ID: CAFj8pRBFVec_RuoQ-4CrYPjcq6zpdB+HpkFCm+9-HhN3j-wc2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hello

this is not bug

"OFFSET" doesn't mean go to line n - it means - first n lines don't
send to client.

Regards

Pavel Stehule

2012/2/22 <kouber(at)saparev(dot)com>:
> The following bug has been logged on the website:
>
> Bug reference:      6483
> Logged by:          Kouber Saparev
> Email address:      kouber(at)saparev(dot)com
> PostgreSQL version: 9.1.2
> Operating system:   Debian
> Description:
>
> The rows of a SELECT statement are being evaluated, even when not shown in
> the final result, when using an OFFSET > 0. Although I know that LIMIT is
> imposed just before flushing the result set to the client, this behaviour
> seems quite confusing, especially when using DML statements in the field
> list of the SELECT itself.
>
> CREATE TABLE xxx (id INT);
>
> CREATE FUNCTION f(xxx) RETURNS VOID AS $$
> BEGIN
>  -- imagine some DML statements here --
>
>  RAISE NOTICE '%', $1.id;
> END;
> $$ LANGUAGE PLPGSQL;
>
> INSERT INTO xxx VALUES (1), (2), (3), (4), (5);
>
>
> -- shows a notice for 1 and 2
> SELECT x.id, f(x) FROM xxx as x LIMIT 2;
>
>
> -- shows a notice for 1, 2, 3 and 4
> SELECT x.id, f(x) FROM xxx as x LIMIT 2 OFFSET 2;
>
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: kouber(at)saparev(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries
Date: 2012-02-22 21:13:03
Message-ID: CABRT9RA6BWuTV1PESSQ-x_5+3qUJMgz0NnVZyzbFyjwY1CQgmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Wed, Feb 22, 2012 at 20:53, <kouber(at)saparev(dot)com> wrote:
> -- shows a notice for 1, 2, 3 and 4
> SELECT x.id, f(x) FROM xxx as x LIMIT 2 OFFSET 2;

Currently the way to fix this is to use a subquery that acts as an
optimization barrier in the presence of OFFSET:
SELECT x.id, f(x) FROM (SELECT * FROM xxx as x LIMIT 2 OFFSET 2) as xxx;

> The rows of a SELECT statement are being evaluated, even when not shown in
> the final result, when using an OFFSET > 0. Although I know that LIMIT is
> imposed just before flushing the result set to the client, this behaviour
> seems quite confusing, especially when using DML statements in the field
> list of the SELECT itself.

Interesting, the model for evaluating queries is documented here:
http://www.postgresql.org/docs/9.1/static/sql-select.html

According to this model, evaluating SELECT clause fields for *all*
found rows is done in step 5, whereas LIMIT/OFFSET are only applied
later at step 9. So we're already bending the rules here (in general
we don't do such optimizations around volatile functions). The worst
thing is that it's inconsistent -- the LIMIT gets applied when
computing the SELECT list, but OFFSET doesn't.

In theory we could bend the model even more -- to push SELECT list
fields below the "Limit" node if they aren't referenced by ORDER and
there are no set operations. However, adapting the model to back to
this behavior seems rather impossible -- ORDER BY must be strictly
evaluated after SELECT list (it can refer to SELECT fields), and LIMIT
must be evaluated after ORDER BY, otherwise it makes no sense.

Or going the other way -- we could make it evaluate all rows if the
SELECT list if it contains volatile functions, and then apply the
LIMIT afterwards. That would go even more against "common sense", but
at least it would be "correct" :)

Regards,
Marti


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: kouber(at)saparev(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries
Date: 2012-02-22 21:40:32
Message-ID: 25657.1329946832@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> According to this model, evaluating SELECT clause fields for *all*
> found rows is done in step 5, whereas LIMIT/OFFSET are only applied
> later at step 9. So we're already bending the rules here (in general
> we don't do such optimizations around volatile functions). The worst
> thing is that it's inconsistent -- the LIMIT gets applied when
> computing the SELECT list, but OFFSET doesn't.

On what grounds do you say that? LIMIT and OFFSET are practically the
same thing internally, and are certainly applied in the same way.

regards, tom lane


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: kouber(at)saparev(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries
Date: 2012-02-22 22:05:22
Message-ID: CABRT9RCzOKPCbnFwoaw_nfVgz17q+GkW=eUMVa01mEBPw_yO5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Wed, Feb 22, 2012 at 23:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marti Raudsepp <marti(at)juffo(dot)org> writes:
>> According to this model, evaluating SELECT clause fields for *all*
>> found rows is done in step 5, whereas LIMIT/OFFSET are only applied
>> later at step 9. So we're already bending the rules here (in general
>> we don't do such optimizations around volatile functions). The worst
>> thing is that it's inconsistent -- the LIMIT gets applied when
>> computing the SELECT list, but OFFSET doesn't.
>
> On what grounds do you say that?  LIMIT and OFFSET are practically the
> same thing internally, and are certainly applied in the same way.

The difference is that the SELECT fields for the first OFFSET rows are
*evaluated*, but aren't simply returned to the client. But beyond
LIMIT, query evaluation terminates entirely -- the rest of the SELECT
clause rows aren't evaluated.

AFAICT, the model in the documentation suggests that the SELECT fields
are evaluated for all matching rows in indeterminate order, before
ORDER BY is applied and before the result set is sliced by
OFFSET/LIMIT.

Regards,
Marti


From: Kouber Saparev <postgres(at)saparev(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries
Date: 2012-02-23 09:23:30
Message-ID: ji50ii$1e8$1@dough.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 02/23/2012 12:05 AM, Marti Raudsepp wrote:
> On Wed, Feb 22, 2012 at 23:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Marti Raudsepp <marti(at)juffo(dot)org> writes:
>>> According to this model, evaluating SELECT clause fields for *all*
>>> found rows is done in step 5, whereas LIMIT/OFFSET are only applied
>>> later at step 9. So we're already bending the rules here (in general
>>> we don't do such optimizations around volatile functions). The worst
>>> thing is that it's inconsistent -- the LIMIT gets applied when
>>> computing the SELECT list, but OFFSET doesn't.
>>
>> On what grounds do you say that? LIMIT and OFFSET are practically the
>> same thing internally, and are certainly applied in the same way.
>
> The difference is that the SELECT fields for the first OFFSET rows are
> *evaluated*, but aren't simply returned to the client. But beyond
> LIMIT, query evaluation terminates entirely -- the rest of the SELECT
> clause rows aren't evaluated.
>
> AFAICT, the model in the documentation suggests that the SELECT fields
> are evaluated for all matching rows in indeterminate order, before
> ORDER BY is applied and before the result set is sliced by
> OFFSET/LIMIT.

Indeed, that's probably the main issue - it is not behaving
symmetrically, i.e. fetching the first two rows has one effect (and
performance impact), while fetching the last two - completely different.

In my case, I am making something like an "ON SELECT" rule, triggering
some actions once the rows are read (and sent to the client) from a
SELECT statement. The thing is that "read" and "sent to the client"
appear to be two different things in that case. While I will certainly
use a subquery for it, as proposed by Marti (since real cursors are not
an option in my stateless web environment), I do believe that at least
the documentation should be more clear concerning cases like that (if
the behaviour stays that way).

Regards,
--
Kouber Saparev


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: kouber(at)saparev(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6483: Rows being evaluated, although being outside the LIMIT / OFFSET boundaries
Date: 2012-02-23 16:18:15
Message-ID: 14925.1330013895@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> AFAICT, the model in the documentation suggests that the SELECT fields
> are evaluated for all matching rows in indeterminate order, before
> ORDER BY is applied and before the result set is sliced by
> OFFSET/LIMIT.

That is in fact the case if you have a query plan that involves a Sort
node followed by Limit. We have some optimizations that avoid the need
for an explicit sort, but it would be pretty hard to write a
specification for exactly when unretrieved rows will not be evaluated.

regards, tom lane