Quick Links

Re: sequential scan result order vs performance

Lists:	pgsql-hackers

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	sequential scan result order vs performance
Date:	2016-10-30 07:36:55
Message-ID:	20161030073655.rfa6nvbyk4w2kkpk@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

while working on the executor, to process "batches" or "bubbles" of
tuples I hit some weird performance issues (as in things didn't improve
as much as I had hoped). A fair amount of headscratching later I
figured out that the tuple order in sequential scans is a major
bottleneck.

When iterating over a page we return tuples in itemid order, which makes
them returned in *descending* order address-wise, as tuples are stored
starting from the end of the page. But when actually accessing the
tuples, we access them increasing address order (header, and then column
by column). It appears that that, quite understandable confuses prefetch
units, leading to drastically increased cache miss ratios.

It's quite easy to change iteration so we start with the latest item,
and iterate till the first, rather than the other way round. In
benchmarks with somewhat wide columns and aggregation, this yields
speedups of over 30%, before hitting other bottlenecks.

I do wonder however if it's acceptable to change the result order of
sequential scans. People don't tend to specify ORDER BY everwhere - as
evidenced by large swathes of our regression tests failing spuriously -
so they might not be happy to see a somewhat weird order (pages
sequentially increasing, but individual tuples inside a page in reverse
order).

We could change the order only in cases where the user doesn't actually
see the result, say below aggregation, sort, and whatnot nodes. On the
other hand the benefit is quite significant for heavily filtered
sequential scans as well, COPY out also benefits.

Comments?

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: sequential scan result order vs performance
Date:	2016-10-30 07:43:12
Message-ID:	20161030074312.gm5iovv3hehxmtml@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 2016-10-30 00:36:55 -0700, Andres Freund wrote:
> It's quite easy to change iteration so we start with the latest item,
> and iterate till the first, rather than the other way round. In
> benchmarks with somewhat wide columns and aggregation, this yields
> speedups of over 30%, before hitting other bottlenecks.

One more point: Over IM Robert commented that it's not guaranteed that
itemid order correlates precisely with reverse "tuple data" order. After
PageRepairFragmentation() that'll not be the case anymore. That's true,
but I suspect in many cases with larger analytics queries the
correlation will still be significant, and we also could make it
guaranteed with the price of making PageRepairFragmentation() a bit more
expensive.

Greetings,

Andres Freund

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: sequential scan result order vs performance
Date:	2016-10-30 14:12:05
Message-ID:	20461.1477836725@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> It's quite easy to change iteration so we start with the latest item,
> and iterate till the first, rather than the other way round. In
> benchmarks with somewhat wide columns and aggregation, this yields
> speedups of over 30%, before hitting other bottlenecks.

> I do wonder however if it's acceptable to change the result order of
> sequential scans.

I think there will be a lot of howls. People expect that creating
a table by inserting a bunch of rows, and then reading back those
rows, will not change the order. We already futzed with that guarantee
a bit with syncscans, but that only affects quite large tables --- and
even there, we were forced to provide a way to turn it off.

If you were talking about 3X then maybe it would be worth it, but for 30%
(on a subset of queries) I am not excited.

I wonder whether we could instead adjust the rules for insertion so
that tuples tend to be physically in order by itemid. I'm imagining
leaving two "holes" in a page and sometimes (hopefully not often)
having to shift data during insert to preserve the ordering.

regards, tom lane

From:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: sequential scan result order vs performance
Date:	2016-10-31 03:37:47
Message-ID:	dac9b505-56d2-c852-805b-e1c902de113e@BlueTreble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/30/16 9:12 AM, Tom Lane wrote:
> I think there will be a lot of howls. People expect that creating
> a table by inserting a bunch of rows, and then reading back those
> rows, will not change the order. We already futzed with that guarantee
> a bit with syncscans, but that only affects quite large tables --- and
> even there, we were forced to provide a way to turn it off.

Leaving a 30% performance improvement on the floor because some people
don't grok how sets work seems insane to me.

We could have a GUC to disable this. I suspect ORDER BY ctid would be
another option.

BTW, I've sometimes wished for a mode where queries would silently have
result ordering intentionally futzed, to eliminate any possibility of
dependence on tuple ordering (as well as having sequences start at some
random value). I guess with the hooks that are in place today it
wouldn't be hard to stick a ORDER BY random() in if there wasn't already
a Sort node at the top level...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

From:	Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To:	"'Jim Nasby EXTERN'" <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: sequential scan result order vs performance
Date:	2016-10-31 08:14:37
Message-ID:	A737B7A37273E048B164557ADEF4A58B539720AF@ntex2010i.host.magwien.gv.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim Nasby wrote:
> On 10/30/16 9:12 AM, Tom Lane wrote:
>> I think there will be a lot of howls. People expect that creating
>> a table by inserting a bunch of rows, and then reading back those
>> rows, will not change the order. We already futzed with that guarantee
>> a bit with syncscans, but that only affects quite large tables --- and
>> even there, we were forced to provide a way to turn it off.
>
> Leaving a 30% performance improvement on the floor because some people
> don't grok how sets work seems insane to me.

Yours,
Laurenz Albe

From:	ilmari(at)ilmari(dot)org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=)
To:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: sequential scan result order vs performance
Date:	2016-10-31 10:52:31
Message-ID:	d8jk2cordsg.fsf@dalvik.ping.uio.no
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> writes:

FWIW, SQLite has this, in the form of 'PRAGMA reverse_unordered_selects'.

http://sqlite.org/pragma.html#pragma_reverse_unordered_selects

--
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
the consequences of." -- Skud's Meta-Law

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: sequential scan result order vs performance
Date:	2016-10-31 20:54:18
Message-ID:	CADkLM=f43+ceZc0YYy_U+-7aDcbczuJrPp-L+k+bw7qgZNwd=g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Oct 30, 2016 at 11:37 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
wrote:

> BTW, I've sometimes wished for a mode where queries would silently have
> result ordering intentionally futzed, to eliminate any possibility of
> dependence on tuple ordering (as well as having sequences start at some
> random value). I guess with the hooks that are in place today it wouldn't
> be hard to stick a ORDER BY random() in if there wasn't already a Sort node
> at the top level...

+1
In Oracle, we sorta had that feature by adding a parallel hint to a query
even if it didn't need it. It came in handy.