performance of SELECT * much faster than SELECT <colname> with large offset

From: Marc Slemko <marcs(at)znep(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: performance of SELECT * much faster than SELECT <colname> with large offset
Date: 2014-10-03 02:17:48
Message-ID: CAN1FPGN1ynBj3m1DMszc9MEYVj41S96OMy8-Q1cgKMGxv482SA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I ran into this oddity lately that goes against everything I thought I
understood and was wondering if anyone had any insight. Version/env
details at the end.

The root of it is these query times:

marcs=# select * from ccrimes offset 5140000 limit 1;
[...data omitted...]
(1 row)
Time: 650.280 ms
marcs=# select description from ccrimes offset 5140000 limit 1;
description
-------------------------------------
FINANCIAL IDENTITY THEFT OVER $ 300
(1 row)

Time: 1298.672 ms

These times are all from data that is cached and are very repeatable.
Yes, I know that offset and limit without an order by isn't useful for
paging through data.

And an explain on them both... everything looks the same other than
the width and actual times:

marcs=# explain (analyze,buffers) select * from ccrimes offset 5140000 limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Limit (cost=204146.73..204146.73 rows=1 width=202) (actual
time=1067.901..1067.901 rows=1 loops=1)
Buffers: shared hit=152743
-> Seq Scan on ccrimes (cost=0.00..204146.73 rows=5139873
width=202) (actual time=0.014..810.672 rows=5140001 loops=1)
Buffers: shared hit=152743
Total runtime: 1067.951 ms
(5 rows)

Time: 1068.612 ms
marcs=# explain (analyze,buffers) select description from ccrimes
offset 5140000 limit 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Limit (cost=204146.73..204146.73 rows=1 width=17) (actual
time=1713.027..1713.027 rows=1 loops=1)
Buffers: shared hit=152743
-> Seq Scan on ccrimes (cost=0.00..204146.73 rows=5139873
width=17) (actual time=0.013..1457.521 rows=5140001 loops=1)
Buffers: shared hit=152743
Total runtime: 1713.053 ms
(5 rows)

Time: 1713.612 ms

When I run the query and capture a profile using "perf" and compare
the two, the thing that stands out is the slot_getsomeattrs call that
dominates the trace in the slow query but not in the faster "SELECT *"
version:

- 39.25% postgres postgres [.] _start
- _start
- 99.47% slot_getsomeattrs
ExecProject
ExecScan
ExecProcNode
ExecLimit
ExecProcNode
standard_ExecutorRun
0x7f4315f7c427
PortalRun
PostgresMain
PostmasterMain
main
__libc_start_main
+ 0.53% ExecProject
+ 18.82% postgres postgres [.] HeapTupleSatisfiesMVCC
+ 12.01% postgres postgres [.] 0xb6353
+ 9.47% postgres postgres [.] ExecProject

The table is defined as:

Column | Type | Modifiers
----------------------+--------------------------------+-----------
s_updated_at_0 | timestamp(3) with time zone |
s_version_1 | bigint |
s_id_2 | bigint |
s_created_at_3 | timestamp(3) with time zone |
id | numeric |
case_number | text |
date | timestamp(3) without time zone |
block | text |
iucr | text |
primary_type | text |
description | text |
location_description | text |
arrest | boolean |
domestic | boolean |
beat | text |
district | text |
ward | numeric |
community_area | text |
fbi_code | text |
x_coordinate | numeric |
y_coordinate | numeric |
year | numeric |
updated_on | timestamp(3) without time zone |
latitude | numeric |
longitude | numeric |
location_lat | double precision |
location_long | double precision |

I've been testing this against Postgres 9.3.5 on Ubuntu 12.04 LTS
running with a 3.2.0 kernel, and get similar results on both raw
hardware and in Azure VMs. This repros on boxes with no other load.

Any suggestions about what is going on or where to dig further would
be appreciated. I can make a pgdump of the data I'm using if anyone
is interested.

Thanks.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2014-10-03 02:39:37 Re: performance of SELECT * much faster than SELECT <colname> with large offset
Previous Message Peter Geoghegan 2014-10-03 00:54:06 Re: Yet another abort-early plan disaster on 9.3