sequential scan result order vs performance

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: sequential scan result order vs performance
Date: 2016-10-30 07:36:55
Message-ID: 20161030073655.rfa6nvbyk4w2kkpk@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

while working on the executor, to process "batches" or "bubbles" of
tuples I hit some weird performance issues (as in things didn't improve
as much as I had hoped). A fair amount of headscratching later I
figured out that the tuple order in sequential scans is a major
bottleneck.

When iterating over a page we return tuples in itemid order, which makes
them returned in *descending* order address-wise, as tuples are stored
starting from the end of the page. But when actually accessing the
tuples, we access them increasing address order (header, and then column
by column). It appears that that, quite understandable confuses prefetch
units, leading to drastically increased cache miss ratios.

It's quite easy to change iteration so we start with the latest item,
and iterate till the first, rather than the other way round. In
benchmarks with somewhat wide columns and aggregation, this yields
speedups of over 30%, before hitting other bottlenecks.

I do wonder however if it's acceptable to change the result order of
sequential scans. People don't tend to specify ORDER BY everwhere - as
evidenced by large swathes of our regression tests failing spuriously -
so they might not be happy to see a somewhat weird order (pages
sequentially increasing, but individual tuples inside a page in reverse
order).

We could change the order only in cases where the user doesn't actually
see the result, say below aggregation, sort, and whatnot nodes. On the
other hand the benefit is quite significant for heavily filtered
sequential scans as well, COPY out also benefits.

Comments?

Greetings,

Andres Freund

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2016-10-30 07:40:56 Re: JIT compiler for expressions
Previous Message Karl O. Pinc 2016-10-30 07:20:47 Re: Patch to implement pg_current_logfile() function