why does LIMIT 2 take orders of magnitude longer than LIMIT 1 in this query?

From: Chris Rogers <teukros(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: why does LIMIT 2 take orders of magnitude longer than LIMIT 1 in this query?
Date: 2014-10-31 16:07:59
Message-ID: CAPo4y_U=2XjXrdttd8tJ7asnL_J0u59KT189=8gwSZW3ke6nUw@mail.gmail.com
Lists: pgsql-hackers

I'm on PostgreSQL 9.3. This should reproduce on any table with 100,000+
rows. The EXPLAIN ANALYZE shows many more rows getting scanned with LIMIT
2, but I can't figure out why.
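For anyone wanting to reproduce this locally, a minimal setup sketch (the table name matches the plans below, but the column shape is illustrative; the original report never shows the table definition):

-- Build a throwaway table of ~5M rows; any reasonably wide table should do.
CREATE TABLE a_big_table AS
  SELECT g AS id, md5(g::text) AS payload
  FROM generate_series(1, 5000000) AS g;
ANALYZE a_big_table;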

Limit 1:

EXPLAIN ANALYZE WITH base AS (
  SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
), filter AS (
  SELECT rownum, true AS thing FROM base
)
SELECT * FROM base LEFT JOIN filter USING (rownum) WHERE filter.thing
LIMIT 1;

Result:

Limit  (cost=283512.19..283517.66 rows=1 width=2114) (actual time=0.019..0.019 rows=1 loops=1)
  CTE base
    ->  WindowAgg  (cost=0.00..188702.69 rows=4740475 width=101) (actual time=0.008..0.008 rows=1 loops=1)
          ->  Seq Scan on a_big_table  (cost=0.00..129446.75 rows=4740475 width=101) (actual time=0.003..0.003 rows=1 loops=1)
  CTE filter
    ->  CTE Scan on base base_1  (cost=0.00..94809.50 rows=4740475 width=8) (actual time=0.000..0.000 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..307677626611.24 rows=56180269915 width=2114) (actual time=0.018..0.018 rows=1 loops=1)
        Join Filter: (base.rownum = filter.rownum)
        ->  CTE Scan on base  (cost=0.00..94809.50 rows=4740475 width=2113) (actual time=0.011..0.011 rows=1 loops=1)
        ->  CTE Scan on filter  (cost=0.00..94809.50 rows=2370238 width=9) (actual time=0.002..0.002 rows=1 loops=1)
              Filter: thing
Total runtime: 0.057 ms

Limit 2:

EXPLAIN ANALYZE WITH base AS (
  SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
), filter AS (
  SELECT rownum, true AS thing FROM base
)
SELECT * FROM base LEFT JOIN filter USING (rownum) WHERE filter.thing
LIMIT 2;

Result:

Limit  (cost=283512.19..283523.14 rows=2 width=2114) (actual time=0.018..14162.283 rows=2 loops=1)
  CTE base
    ->  WindowAgg  (cost=0.00..188702.69 rows=4740475 width=101) (actual time=0.008..4443.359 rows=4714243 loops=1)
          ->  Seq Scan on a_big_table  (cost=0.00..129446.75 rows=4740475 width=101) (actual time=0.002..1421.622 rows=4714243 loops=1)
  CTE filter
    ->  CTE Scan on base base_1  (cost=0.00..94809.50 rows=4740475 width=8) (actual time=0.001..10214.684 rows=4714243 loops=1)
  ->  Nested Loop  (cost=0.00..307677626611.24 rows=56180269915 width=2114) (actual time=0.018..14162.280 rows=2 loops=1)
        Join Filter: (base.rownum = filter.rownum)
        Rows Removed by Join Filter: 4714243
        ->  CTE Scan on base  (cost=0.00..94809.50 rows=4740475 width=2113) (actual time=0.011..0.028 rows=2 loops=1)
        ->  CTE Scan on filter  (cost=0.00..94809.50 rows=2370238 width=9) (actual time=0.009..6595.770 rows=2357122 loops=2)
              Filter: thing
Total runtime: 14247.374 ms


From: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: why does LIMIT 2 take orders of magnitude longer than LIMIT 1 in this query?
Date: 2014-10-31 16:38:36
Message-ID: 1414773516879-5825212.post@n5.nabble.com
Lists: pgsql-hackers

Chris Rogers wrote
> I'm on PostgreSQL 9.3. This should reproduce on any table with 100,000+
> rows. The EXPLAIN ANALYZE shows many more rows getting scanned with LIMIT
> 2, but I can't figure out why.
>
> EXPLAIN ANALYZE WITH base AS (
> SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
> ), filter AS (
> SELECT rownum, true AS thing FROM base
> ) SELECT * FROM base LEFT JOIN filter USING (rownum) WHERE filter.thing
> LIMIT 1

The LIMIT 1 case has been optimized (special cased) while all others end up
using a normal plan.

Two things make your example query particularly unrealistic:

1. The presence of a ROW_NUMBER() window aggregate over an unsorted input
2. A LEFT JOIN whose WHERE clause requires a right-side column to be
non-NULL, which effectively turns it into an inner join (a sketch
addressing both points follows)
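A hedged sketch of a more realistic formulation (the ORDER BY column "id" is an assumption; the report never names the table's columns):

EXPLAIN ANALYZE WITH base AS (
  -- Give the row numbering a defined order...
  SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rownum FROM a_big_table
), filter AS (
  SELECT rownum, true AS thing FROM base
)
-- ...and use a plain JOIN, since WHERE filter.thing discards the
-- NULL-extended rows the LEFT JOIN would add anyway.
SELECT * FROM base JOIN filter USING (rownum)
LIMIT 2;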

David J.



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Rogers <teukros(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: why does LIMIT 2 take orders of magnitude longer than LIMIT 1 in this query?
Date: 2014-10-31 16:42:15
Message-ID: 4951.1414773735@sss.pgh.pa.us
Lists: pgsql-hackers

Chris Rogers <teukros(at)gmail(dot)com> writes:
> I'm on PostgreSQL 9.3. This should reproduce on any table with 100,000+
> rows. The EXPLAIN ANALYZE shows many more rows getting scanned with LIMIT
> 2, but I can't figure out why.

This is not -hackers material.

The first row pulled from the nestloop LEFT JOIN is created by joining
the first output row from "base" to the first output row from "filter"
(which is indirectly also the first row from "base"). So, cheap; we've
only had to read one row from "a_big_table".

The second attempt to pull a row from the nestloop LEFT JOIN requires
evaluating all the rest of the output of the "filter" CTE, to see if there
are any more "filter" rows matching the first "base" row. (There are not,
since the "rownum" output is unique, but the nestloop doesn't know that.)
That in turn causes scanning all the rest of "base" and so all the rest
of "a_big_table". Only after that do we get to the second "base" row
at which another join output row is possible.

If the planner were about an order of magnitude cleverer than it is,
it might realize that this would happen and switch to another plan ...
although TBH the only way I can see to not have a large startup cost
would be to somehow realize that the outputs of both CTEs are sorted
by rownum and hence can be mergejoined without an explicit sort step.
That would require more understanding of the behavior of row_number()
than it's got or probably should have.
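Not spelled out in the thread, but for the repro as written the join can simply be collapsed by hand (assuming the trivial "filter" CTE isn't standing in for something more complex):

EXPLAIN ANALYZE WITH base AS (
  SELECT *, ROW_NUMBER() OVER () AS rownum FROM a_big_table
)
-- "filter" contributes exactly one row per rownum, each with thing = true,
-- so LEFT JOIN filter ... WHERE filter.thing reduces to:
SELECT *, true AS thing FROM base
LIMIT 2;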

regards, tom lane