BUG #10816: explain analyze reports different rows counts before and after sort

Lists: pgsql-bugs
From: bashtanov(at)imap(dot)cc
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #10816: explain analyze reports different rows counts before and after sort
Date: 2014-06-30 09:43:43
Message-ID: 20140630094343.15698.80138@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 10816
Logged by: Alexey Bashtanov
Email address: bashtanov(at)imap(dot)cc
PostgreSQL version: 9.3.4
Operating system: Ubuntu Linux 12.04
Description:

Hello!

Please have a look at the two bottom nodes of the plan. Function scan
produces 1000 rows, then they are sorted and turn into 3201 rows! How could
this happen?

localhost/stat 13:31:19 > select version();
version

-----------------------------------------------------------------------------------------------------
PostgreSQL 9.3.4 on i686-pc-linux-gnu, compiled by gcc (Ubuntu/Linaro
4.6.3-1ubuntu5) 4.6.3, 32-bit
(1 row)

localhost/stat 13:31:21 > set enable_hashjoin to off;
SET
localhost/stat 13:31:22 > explain analyze select * from (select k % 5 b from
generate_series(1,4000) k) l natural join generate_series(1,1000) b;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=119.66..227.16 rows=5000 width=4) (actual
time=5.977..11.848 rows=3200 loops=1)
Merge Cond: (((k.k % 5)) = b.b)
-> Sort (cost=59.83..62.33 rows=1000 width=4) (actual time=4.571..5.825
rows=4000 loops=1)
Sort Key: ((k.k % 5))
Sort Method: quicksort Memory: 221kB
-> Function Scan on generate_series k (cost=0.00..10.00 rows=1000
width=4) (actual time=0.732..2.737 rows=4000 loops=1)
-> Sort (cost=59.83..62.33 rows=1000 width=4) (actual time=0.878..1.942
rows=3201 loops=1)
Sort Key: b.b
Sort Method: quicksort Memory: 40kB
-> Function Scan on generate_series b (cost=0.00..10.00 rows=1000
width=4) (actual time=0.193..0.528 rows=1000 loops=1)
Total runtime: 12.814 ms
(11 rows)

Regards,
Alexey Bashtanov


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: bashtanov(at)imap(dot)cc
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #10816: explain analyze reports different rows counts before and after sort
Date: 2014-06-30 14:16:57
Message-ID: 4200.1404137817@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

bashtanov(at)imap(dot)cc writes:
> Please have a look at the two bottom nodes of the plan. Function scan
> produces 1000 rows, then they are sorted and turn into 3201 rows! How could
> this happen?

The discrepancy is easily explained because the sort is the righthand
child of a merge join. Merge join will rewind and rescan its right
child every time the left child produces duplicate keys, so that the
appropriate right-child rows get joined to all the similarly-keyed
rows from the left child. And you do have duplicate outputs coming
from that left child ...

IOW, it's just a measurement artifact arising from re-fetching the
same rows repeatedly. I think there's a warning about this in the
docs.

regards, tom lane