Re: Why is indexonlyscan so darned slow?

Lists: pgsql-hackers
From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Why is indexonlyscan so darned slow?
Date: 2012-05-17 03:08:23
Message-ID: 1645421906.291004.1337224103503.JavaMail.root@mail-1.01.com

So, I set up a test which should have been an ideal setup for index-only scans. The index was 1/10 the size of the table and fits in RAM (1GB), which the table does not:

bench2=# select pg_size_pretty(pg_relation_size('pgbench_accounts_pkey'));
pg_size_pretty
----------------
428 MB
(1 row)

bench2=# select pg_size_pretty(pg_relation_size('pgbench_accounts'));
pg_size_pretty
----------------
5768 MB
(1 row)

The table was just VACUUM ANALYZED and had no subsequent updates. So, what's going on here?

bench2=# explain ( analyze on, buffers on ) select count(*) from pgbench_accounts;
                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=855069.99..855070.00 rows=1 width=0) (actual time=64014.573..64014.574 rows=1 loops=1)
   Buffers: shared hit=33 read=738289
   I/O Timings: read=27691.314
   ->  Seq Scan on pgbench_accounts  (cost=0.00..831720.39 rows=9339839 width=0) (actual time=6790.669..46530.408 rows=20000000 loops=1)
         Buffers: shared hit=33 read=738289
         I/O Timings: read=27691.314
 Total runtime: 64014.626 ms
(7 rows)

bench2=# explain ( analyze on, buffers on ) select count(*) from pgbench_accounts;
                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=382829.37..382829.38 rows=1 width=0) (actual time=38325.026..38325.027 rows=1 loops=1)
   Buffers: shared hit=1 read=54653
   I/O Timings: read=907.202
   ->  Index Only Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..359479.77 rows=9339839 width=0) (actual time=33.459..20110.908 rows=20000000 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=1 read=54653
         I/O Timings: read=907.202
 Total runtime: 38333.536 ms

As you can see, the indexonlyscan version of the query spends 5% as much time reading the data as the seq scan version, and doesn't have to read the heap at all. Yet it spends 20 seconds doing ... what, exactly?

BTW, kudos on the new explain analyze reporting ... works great!

--Josh

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco


From: Ants Aasma <ants(at)cybertec(dot)at>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-17 09:46:54
Message-ID: CA+CSw_tA8hjGobbLnx-a3WYJHpVOohdRd-Pi25TX72Bd4zE9fA@mail.gmail.com

On Thu, May 17, 2012 at 6:08 AM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
> As you can see, the indexonlyscan version of the query spends 5% as much time reading the data as the seq scan version, and doesn't have to read the heap at all.  Yet it spends 20 seconds doing ... what, exactly?
>
> BTW, kudos on the new explain analyze reporting ... works great!

Looks like timing overhead. Timing is called twice per tuple, which
gives around 950ns per timing call for your index-only result. This is
around what is expected of HPET-based timing. If you are on Linux you
can check which clocksource you are using by running:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

You can verify that it is due to timing overhead by adding TIMING OFF
to the EXPLAIN options, or use the pg_test_timing utility to check the
timing overhead on your system. With HPET-based timing I'm seeing
660ns timing overhead and 26.5s execution for your query; with timing
off, execution time falls to 2.1s. For reference, TSC-based timing
gives 19.2ns overhead and 2.3s execution time with timing on.
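A back-of-the-envelope check of this explanation, using only the figures quoted in the thread (20M tuples, two clock reads per tuple, per-call costs that vary by machine and are assumptions here, not fresh measurements):

```python
# Estimate of EXPLAIN ANALYZE instrumentation overhead for the query
# above. All inputs are the thread's figures: 20M tuples scanned, two
# clock reads per tuple, ~950 ns per HPET read, ~19.2 ns per TSC read.

rows = 20_000_000
calls_per_row = 2                 # clock reads per tuple
hpet_ns, tsc_ns = 950, 19.2

hpet_overhead_s = rows * calls_per_row * hpet_ns / 1e9
tsc_overhead_s = rows * calls_per_row * tsc_ns / 1e9

print(f"HPET instrumentation overhead: {hpet_overhead_s:.1f} s")  # 38.0 s
print(f"TSC instrumentation overhead:  {tsc_overhead_s:.2f} s")   # 0.77 s
```

By this estimate, clock reads alone could account for most of the 38.3s EXPLAIN ANALYZE runtime shown earlier; plugging in the 660ns Ants measured gives about 26.4s, which lines up with his 26.5s-with-timing vs 2.1s-without figures.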

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-17 12:22:12
Message-ID: 250538564.300581.1337257332564.JavaMail.root@mail-1.01.com

Ants,

Well, that's somewhat better, but again hardly the gain in performance I'd expect to see ... especially since these are ideal circumstances for an index-only scan.

bench2=# select count(*) from pgbench_accounts;
count
----------
20000000
(1 row)

Time: 3827.508 ms

bench2=# set enable_indexonlyscan=off;
SET
Time: 0.241 ms
bench2=# select count(*) from pgbench_accounts;
count
----------
20000000
(1 row)

Time: 16012.444 ms

For some reason counting tuples in an index takes 5X as long (per tuple) as counting them in a table. Why?
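Running the arithmetic both ways on the numbers quoted above (purely a recalculation of the thread's figures, not a new measurement): per tuple the index-only scan is actually about 4x faster; it is per megabyte actually read that it comes out roughly 3x slower, which is presumably the real puzzle here.

```python
# Recomputing from the figures in this thread: 20M rows; a 428 MB index
# vs a 5768 MB heap; 3827.508 ms (IOS) vs 16012.444 ms (seq scan).

rows = 20_000_000
ios_ms, seq_ms = 3827.508, 16012.444
index_mb, heap_mb = 428, 5768

print(f"per tuple: IOS {ios_ms / rows * 1e6:.0f} ns vs seq scan {seq_ms / rows * 1e6:.0f} ns")
print(f"per MB:    IOS {ios_ms / index_mb:.2f} ms vs seq scan {seq_ms / heap_mb:.2f} ms")
```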

----- Original Message -----
> On Thu, May 17, 2012 at 6:08 AM, Joshua Berkus <josh(at)agliodbs(dot)com>
> wrote:
> > As you can see, the indexonlyscan version of the query spends 5% as
> > much time reading the data as the seq scan version, and doesn't
> > have to read the heap at all.  Yet it spends 20 seconds doing ...
> > what, exactly?
> >
> > BTW, kudos on the new explain analyze reporting ... works great!
>
> Looks like timing overhead. Timing is called twice per tuple which
> gives around 950ns per timing call for your index only result. This
> is
> around what is expected of hpet based timing. If you are on Linux you
> can check what clocksource you are using by running cat
> /sys/devices/system/clocksource/clocksource0/current_clocksource
>
> You can verify that it is due to timing overhead by adding timing off
> to the explain clause. Or use the pg_test_timing utility to check the
> timing overhead on your system. With hpet based timing I'm seeing
> 660ns timing overhead and 26.5s execution for your query, with timing
> off execution time falls to 2.1s. For reference, tsc based timing
> gives 19.2ns overhead and 2.3s execution time with timing.
>
> Ants Aasma
> --
> Cybertec Schönig & Schönig GmbH
> Gröhrmühlgasse 26
> A-2700 Wiener Neustadt
> Web: http://www.postgresql-support.de
>


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-17 16:26:24
Message-ID: CAMkU=1woatEkfHguEiLF3zFYhrf2OKUGpXLV_07uQEwShoL8Jg@mail.gmail.com

On Thu, May 17, 2012 at 5:22 AM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
> Ants,
>
> Well, that's somewhat better, but again hardly the gain in performance I'd expect to see ... especially since this is ideal circumstances for index-only scan.
>
> bench2=# select count(*) from pgbench_accounts;
>  count
> ----------
>  20000000
> (1 row)
>
> Time: 3827.508 ms
>
> bench2=# set enable_indexonlyscan=off;
> SET
> Time: 0.241 ms
> bench2=# select count(*) from pgbench_accounts;
>  count
> ----------
>  20000000
> (1 row)
>
> Time: 16012.444 ms
>
> For some reason counting tuples in an index takes 5X as long (per tuple) as counting them in a table.  Why?
>

It looks like the IOS is taking 4x less time, not more time.

Anyway, the IOS follows the index logical structure, not the physical
structure, so if the index is not in RAM it will really be hurt by the
lack of sequential reads.

Cheers,

Jeff


From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-17 18:35:14
Message-ID: 1974185342.339650.1337279714066.JavaMail.root@mail-1.01.com

Jeff,

That's in-RAM speed ... I ran the query twice to make sure the index was cached, and it didn't get any better. And I meant 5X per byte rather than 5X per tuple.

I talked this over with Haas, and his opinion is that we have a LOT of overhead in the way we traverse indexes, especially lookups which happen once per leaf node instead of in bulk. Certainly the performance I'm seeing would be consistent with that idea.

I'll try some multi-column covering indexes next to see how it looks.

----- Original Message -----
> On Thu, May 17, 2012 at 5:22 AM, Joshua Berkus <josh(at)agliodbs(dot)com>
> wrote:
> > Ants,
> >
> > Well, that's somewhat better, but again hardly the gain in
> > performance I'd expect to see ... especially since this is ideal
> > circumstances for index-only scan.
> >
> > bench2=# select count(*) from pgbench_accounts;
> >  count
> > ----------
> >  20000000
> > (1 row)
> >
> > Time: 3827.508 ms
> >
> > bench2=# set enable_indexonlyscan=off;
> > SET
> > Time: 0.241 ms
> > bench2=# select count(*) from pgbench_accounts;
> >  count
> > ----------
> >  20000000
> > (1 row)
> >
> > Time: 16012.444 ms
> >
> > For some reason counting tuples in an index takes 5X as long (per
> > tuple) as counting them in a table.  Why?
> >
>
> It looks like the IOS is taking 4x less time, not more time.
>
> Anyway, the IOS follows the index logical structure, not the physical
> structure, so if the index is not in RAM it will really be hurt by
> the
> lack of sequential reads.
>
> Cheers,
>
> Jeff
>


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-17 18:53:51
Message-ID: CAMkU=1wV+n6B+38_N=d6UKB56EPvUU1VOxqOVDx3hzhH9c-0-A@mail.gmail.com

On Thu, May 17, 2012 at 11:35 AM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
> Jeff,
>
> That's in-RAM speed ... I ran the query twice to make sure the index was cached, and it didn't get any better.  And I meant 5X per byte rather than 5X per tuple.

Ah, OK, that makes more sense. I played around with this, specifically
count(*), quite a bit when IOS first came out, and I attributed a
large part of the time to the code that forms a tuple out of raw
bytes, and the code that advances the aggregate. The first one is
probably more a per-tuple cost than a per-byte cost, and the second is
definitely a per-tuple cost.

I can't find my detailed notes from this work, so this is just from memory.

Cheers,

Jeff


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-20 19:24:50
Message-ID: 7600.1337541890@sss.pgh.pa.us

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> On Thu, May 17, 2012 at 11:35 AM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
>> That's in-RAM speed ... I ran the query twice to make sure the index was cached, and it didn't get any better. And I meant 5X per byte rather than 5X per tuple.

> Ah, OK that makes more sense. I played around with this, specifically
> count(*), quite a bit when IOS first came out, and I attributed a
> large part of the time to the code that forms a tuple out of raw
> bytes, and the code that advances the aggregate. The first one is
> probably more a per-tuple cost than per byte, and the second
> definitely is per tuple cost.
> I can't find my detailed notes from this work, so this is just from memory.

I did a quick investigation of this example with oprofile, and found
that there's not going to be any easy large improvement available.
It's basically all per-tuple CPU costs, breaking down like this:

samples % symbol name
15513 13.4664 IndexOnlyNext
10886 9.4498 index_getnext_tid
7526 6.5331 visibilitymap_test
7116 6.1772 ExecClearTuple
7054 6.1234 _bt_checkkeys
6804 5.9064 _bt_next
6344 5.5070 ExecProject
6033 5.2371 advance_aggregates
5619 4.8777 ExecScan
5331 4.6277 advance_transition_function
5202 4.5157 btgettuple
4928 4.2779 _bt_saveitem
4653 4.0391 ExecProcNode
4473 3.8829 int8inc
3404 2.9549 MemoryContextReset
3125 2.7127 _bt_readpage
2768 2.4028 FunctionCall2Coll
2278 1.9775 ExecAgg
1502 1.3038 ExecStoreVirtualTuple
1198 1.0399 BufferGetBlockNumber
1105 0.9592 ExecIndexOnlyScan
946 0.8212 hash_search_with_hash_value

A fair chunk of what's being attributed to IndexOnlyNext is actually the
inlined code for StoreIndexTuple, and that in turn is mostly the inlined
code for index_getattr. We might possibly save a bit here if we noticed
that the query doesn't actually need us to fetch the indexed columns,
but it looks like that would only buy a couple percent overall --- and
testing for that would add useless cycles to every case where we *do*
need the indexed value. So I'm afraid that it might amount to optimizing
SELECT COUNT(*) at the expense of everything else, which I'm not for.

Another possibility is to try to reduce the costs of index_getnext_tid
and FunctionCall2Coll, which are basically just trampolines to reach
btgettuple. It's not immediately obvious how to make that much better
though.

Anyway, on my machine it seems that the per-tuple CPU costs for SELECT
COUNT(*) with an index-only scan are something like 10% higher than the
per-tuple costs with a heap scan. We might get that down to roughly par
with some hacking, but it's never going to be vastly better. The
argument in favor of index-only scans is mainly about reducing I/O costs
anyway.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Joshua Berkus <josh(at)agliodbs(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-20 23:57:06
Message-ID: CA+TgmoaR3OS=SPGBX62z_oZP02495F6QN9y-=ZBQjHEXg_f+9w@mail.gmail.com

On Sun, May 20, 2012 at 3:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Another possibility is to try to reduce the costs of index_getnext_tid
> and FunctionCall2Coll, which are basically just trampolines to reach
> btgettuple.  It's not immediately obvious how to make that much better
> though.

Hmm... this seems awfully similar to the problem we tried to solve
with the sortsupport infrastructure. Maybe something similar would be
warranted here, to save the overhead of repeated argument packing and
unpacking.

Here's some 'perf' results from the IBM POWER7 box:

10.01% postgres postgres [.] visibilitymap_test
8.78% postgres postgres [.] IndexOnlyNext
7.85% postgres postgres [.] btgettuple
5.67% postgres postgres [.] ExecProject
5.56% postgres postgres [.] ExecProcNode
5.51% postgres postgres [.] advance_transition_function
5.06% postgres postgres [.] advance_aggregates
5.02% postgres postgres [.] ExecScan
4.43% postgres postgres [.] FunctionCall2Coll
4.11% postgres postgres [.] _bt_checkkeys
3.54% postgres postgres [.] ExecClearTuple
3.42% postgres postgres [.] int8inc
3.25% postgres postgres [.] _bt_next
3.19% postgres postgres [.] MemoryContextReset
2.95% postgres postgres [.] index_getnext_tid
2.81% postgres postgres [.] _bt_readpage
2.43% postgres postgres [.] _bt_saveitem
2.42% postgres postgres [.] ExecIndexOnlyScan
2.32% postgres libc-2.14.90.so [.] memcpy

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 17:30:59
Message-ID: 4FBA7BD3.2060203@agliodbs.com


> Anyway, on my machine it seems that the per-tuple CPU costs for SELECT
> COUNT(*) with an index-only scan are something like 10% higher than the
> per-tuple costs with a heap scan. We might get that down to roughly par
> with some hacking, but it's never going to be vastly better. The
> argument in favor of index-only scans is mainly about reducing I/O costs
> anyway.

Well, if it's not CPU costs, then something else is eating the time,
since I'm seeing per-tuple COUNT costs on indexes being 400% more than
on heap.

In the airport you said something about index-only scan not scanning the
tuples in leaf page order. Can you elaborate on that?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 17:41:02
Message-ID: 2640.1337622062@sss.pgh.pa.us

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Well, if it's not CPU costs, then something else is eating the time,
> since I'm seeing per-tuple COUNT counts on indexes being 400% more than
> on heap.

Well, I'm not: as I said, it looks like about 10% here. Perhaps you're
testing a cassert-enabled build?

> In the airport you said something about index-only scan not scanning the
> tuples in leaf page order. Can you elaborate on that?

If the index is too big to fit in RAM, you'd be looking at random
fetches of the index pages in most cases (since logical ordering of the
index pages is typically different from physical ordering), leading to
it likely being a lot slower per page than a heapscan. Not sure this
has anything to do with your test case though, since you said you'd
sized the index to fit in RAM.
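A toy sketch (not PostgreSQL code) of why following the leaf chain in logical order hurts once page splits have scattered an index on disk: the fraction of logical steps that land on the physically next page, i.e. the reads OS readahead could satisfy, collapses as physical order diverges from logical order.

```python
import random

# Toy model: leaf pages are chained in key (logical) order, but page
# splits scatter their physical placement. Count how often the
# logically next page is also the physically next one.

def sequential_fraction(physical_pages):
    pairs = zip(physical_pages, physical_pages[1:])
    return sum(1 for a, b in pairs if b == a + 1) / (len(physical_pages) - 1)

pages = list(range(10_000))
print(f"freshly built (physical == logical): {sequential_fraction(pages):.1%}")

random.seed(42)
random.shuffle(pages)   # heavy churn: placement ~uncorrelated with key order
print(f"heavily split index:                 {sequential_fraction(pages):.1%}")
```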

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 17:44:59
Message-ID: 4FBA7F1B.1010608@agliodbs.com

On 5/21/12 10:41 AM, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> Well, if it's not CPU costs, then something else is eating the time,
>> since I'm seeing per-tuple COUNT counts on indexes being 400% more than
>> on heap.
>
> Well, I'm not: as I said, it looks like about 10% here. Perhaps you're
> testing a cassert-enabled build?

Oh, right, now I get you. It's not the per-tuple costs which matter,
it's the per-size costs. Per-tuple costs are fairly similar, right.

> If the index is too big to fit in RAM, you'd be looking at random
> fetches of the index pages in most cases (since logical ordering of the
> index pages is typically different from physical ordering), leading to
> it likely being a lot slower per page than a heapscan. Not sure this
> has anything to do with your test case though, since you said you'd
> sized the index to fit in RAM.

Right. So what I'm trying to figure out is why counting an index which
fits in RAM (and which, I've confirmed via EXPLAIN ( buffers on ), is
neither heap-fetched nor read from disk) takes 25% as long as counting
a table which is 80% on disk.

I'll try comparting on-disk to on-disk speeds, as well as in-memory to
in-memory speeds, and some non-count tests, as well as multicolumn
covering indexes. I just need to generate more complex test cases than
I can get from pgbench first.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 19:22:52
Message-ID: CA+U5nMJUK1opc=XDhTj9fWdfob0psvgmdivg3HXkzpkUEnaowQ@mail.gmail.com

On 21 May 2012 13:41, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> Well, if it's not CPU costs, then something else is eating the time,
>> since I'm seeing per-tuple COUNT counts on indexes being 400% more than
>> on heap.
>
> Well, I'm not: as I said, it looks like about 10% here.  Perhaps you're
> testing a cassert-enabled build?
>
>> In the airport you said something about index-only scan not scanning the
>> tuples in leaf page order.   Can you elaborate on that?
>
> If the index is too big to fit in RAM, you'd be looking at random
> fetches of the index pages in most cases (since logical ordering of the
> index pages is typically different from physical ordering), leading to
> it likely being a lot slower per page than a heapscan.  Not sure this
> has anything to do with your test case though, since you said you'd
> sized the index to fit in RAM.

As you point out, this is still an IndexScan even if the heap access is zero.

Surely the way to solve this is by having a new plan node that does a
physical SeqScan of the index relation. It means we wouldn't preserve
the sort order of the rows from the index, but that is just a plan
cost issue.

This is exactly what we do for VACUUM and it works faster there.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 20:02:47
Message-ID: 5495.1337630567@sss.pgh.pa.us

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Surely the way to solve this is by having a new plan node that does a
> physical SeqScan of the index relation. It means we wouldn't preserve
> the sort order of the rows from the index, but that is just a plan
> cost issue.

> This is exactly what we do for VACUUM and it works faster there.

The reason that's okay for vacuum is that vacuum doesn't care if it
visits the same index tuple multiple times. It will not work for real
queries, unless you would like to lock out all concurrent inserts.

regards, tom lane


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 20:30:37
Message-ID: CAMkU=1w-i8nA_-xEhgZNah6ZF2U0c7OhQDyyhLEEUw4P2rmUZg@mail.gmail.com

On Mon, May 21, 2012 at 10:44 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
> Right.  So what I'm trying to figure out is why counting an index which
> fits in ram (and I've confirmed via EXPLAIN ( buffers on ) ) is not
> being heap-fetched or read from disk would take 25% as long as counting
> a table which is 80% on disk.

Sequential disk reads are fast. Parsing the data after it has been
read from disk is also fast, but not infinitely so. If you can get
your IO system to be about 4 times faster, then you would start being
limited by CPU even on disk-based sequential scans.

Earlier you said that this should be an ideal setup for IOS. But it
isn't really--the ideal set up is one in which the alternative to an
IOS is a regular index scan which makes many uncached scattered reads
into the heap. I don't think that situation can really be engineered
with a where-less query.

Iterating over any non-trivial data structure with 20,000,000 entries
is going to take some time. By way of comparison, iterating over a
Perl hash doing nothing but a counter increment takes several times
longer than a same-sized IOS count does. (Of course you don't need to
iterate over a Perl hash to get the size, but just directly fetching
the size would not be a fair comparison.)
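A Python analogue of that Perl comparison (illustrative only; absolute times depend entirely on the machine and language): even a pure in-memory pass over 20 million hash entries with nothing but a counter increment has a real per-entry CPU cost, with zero I/O involved.

```python
import time

# Iterate 20M in-memory hash entries, doing nothing but incrementing a
# counter, to show that per-entry cost exists even without any I/O.

entries = dict.fromkeys(range(1_000_000))   # 1M-entry hash
passes = 20                                 # 20 x 1M = 20M iterations

start = time.perf_counter()
count = 0
for _ in range(passes):
    for _ in entries:
        count += 1
elapsed = time.perf_counter() - start

print(f"{count:,} increments in {elapsed:.2f} s")
```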

Cheers,

Jeff


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 20:37:06
Message-ID: CA+U5nM+gf5F=nm3ceaiWofXS+rjybUn_vsLRQakOp+0byGDn4A@mail.gmail.com

On 21 May 2012 16:02, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> Surely the way to solve this is by having a new plan node that does a
>> physical SeqScan of the index relation. It means we wouldn't preserve
>> the sort order of the rows from the index, but that is just a plan
>> cost issue.
>
>> This is exactly what we do for VACUUM and it works faster there.
>
> The reason that's okay for vacuum is that vacuum doesn't care if it
> visits the same index tuple multiple times.  It will not work for real
> queries, unless you would like to lock out all concurrent inserts.

I checked a little more and Oracle supports something called a Fast
Full Index Scan. Maybe there is a way.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 20:42:26
Message-ID: 4FBAA8B2.7010601@agliodbs.com


> Earlier you said that this should be an ideal setup for IOS. But it
> isn't really--the ideal set up is one in which the alternative to an
> IOS is a regular index scan which makes many uncached scattered reads
> into the heap. I don't think that that situation can't really be
> engineered with a where-less query.

Can you give me some suggested comparisons which *would* be ideal, then?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 21:07:13
Message-ID: CA+U5nM+HC9LnaX2Lch5BhybqscwAe2cQiSFd-iO+ynRrE9=g7A@mail.gmail.com

On 21 May 2012 16:42, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> Earlier you said that this should be an ideal setup for IOS.  But it
>> isn't really--the ideal set up is one in which the alternative to an
>> IOS is a regular index scan which makes many uncached scattered reads
>> into the heap.  I don't think that that situation can't really be
>> engineered with a where-less query.
>
> Can you give me some suggested comparisons which *would* be ideal, then?

Presumably the use case we were tuning for was established prior to
development of the patch?

Surely the performance test results that demonstrate the gain have
been posted to hackers, so we just need to refer back to them?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 21:17:26
Message-ID: CAMkU=1yQujSOjbu29r23LWH3HM-gPAv7X0FQwaxP86v_RZkMwQ@mail.gmail.com

On Mon, May 21, 2012 at 1:42 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> Earlier you said that this should be an ideal setup for IOS.  But it
>> isn't really--the ideal set up is one in which the alternative to an
>> IOS is a regular index scan which makes many uncached scattered reads
>> into the heap.  I don't think that that situation can't really be
>> engineered with a where-less query.
>
> Can you give me some suggested comparisons which *would* be ideal, then?

Are you looking for vaguely real-life examples, or highly contrived
examples used to dissect the server?

For vaguely real life, take your example of pgbench -i -s200 -F 50;
I have 2GB of RAM, which seems to match your setup.

With select only work load (pgbench -S -M prepared -T 30), I get

tps = 193

But now enable index-only scans:

psql -c "create index on pgbench_accounts(aid, abalance);"

and it goes up to:

tps = 10137

Cheers,

Jeff


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-21 21:29:32
Message-ID: CAHyXU0xT=Se_zMdjEaC9W6uGuuP7LEOLKpJ7CRwrH8z+k4SQ-A@mail.gmail.com

On Mon, May 21, 2012 at 4:17 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Mon, May 21, 2012 at 1:42 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>
>>> Earlier you said that this should be an ideal setup for IOS.  But it
>>> isn't really--the ideal set up is one in which the alternative to an
>>> IOS is a regular index scan which makes many uncached scattered reads
>>> into the heap.  I don't think that that situation can't really be
>>> engineered with a where-less query.
>>
>> Can you give me some suggested comparisons which *would* be ideal, then?
>
> Are you looking for vaguely real-life examples, or highly contrived
> examples used to dissect the server?
>
> For vaguely real life, take your example of pgbench -i -s200 -F 50,
> and I have 2Gig RAM, which seems to be the same as you do.
>
> With select only work load (pgbench -S -M prepared -T 30), I get
>
> tps = 193
>
> But now enable index-only scans:
>
> psql -c "create index on pgbench_accounts(aid, abalance);"
>
> and it goes up to:
>
> tps = 10137

Right -- the main driver here is that your index fits neatly in ram
and the heap does not -- so you're effectively measuring the
difference between a buffered and non-buffered access. That's highly
contrived as you noted and unlikely to come up all *that* often in the
real world.

Generally, though, the real-world wins (although the gains will
generally be less spectacular) come from heavily i/o bound queries
where the indexed subset of data you want is nicely packed and the
(non clustered) heap records are all over the place. By skipping the
semi-random heap lookups you can see enormous speedups. I figure
50-90% improvement would be the norm there, but that is against
queries that are taking forever because they are i/o bound.

See here: http://www.devheads.net/database/postgresql/performance/index-all-necessary-columns-postgres-vs-mssql.htm
for an 'in the wild' gripe about not having index-only scans.

merlin


From: Ants Aasma <ants(at)cybertec(dot)at>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-22 00:13:18
Message-ID: CA+CSw_vAEtNsNz4q5BPYi6ok=6+e7fH+-ouUpHhs9P-vwe4psw@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 22, 2012 at 12:29 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> Generally though the real world wins (although the gains will be
> generally less spectacular) are heavily i/o bound queries where the
> indexed subset of data you want is nicely packed and the (non
> clustered) heap records are all over the place.  By skipping the semi
> random heap lookups you can see enormous speedups.  I figure 50-90%
> improvement would be the norm there, but this is against queries that
> are taking forever, being i/o bound.

The heap fetch/visibility check overhead is also a problem for CPU
bound workloads. Example:

CREATE TABLE test AS SELECT x, (RANDOM()*1000000000) AS value FROM
generate_series(1,10000000) AS x;
CREATE INDEX ON test(value, x);
VACUUM ANALYZE test;

Then running the following pgbench script with 4G buffers:

\setrandom rangemin 1 1000000000
\set rangemax :rangemin + 1000000
SELECT MIN(x) FROM test WHERE value BETWEEN :rangemin AND :rangemax;

I get the following results:

bitmap scan: 106 tps
index scan: 146 tps
index only scan: 653 tps
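As an aside, whether an index-only scan can actually skip the heap depends on how much of the table the visibility map covers; after the VACUUM ANALYZE above it should be essentially all of it. A rough way to check, using the relallvisible counter that 9.2 tracks in pg_class:

```sql
-- Fraction of heap pages marked all-visible in the visibility map.
-- Close to 100 means the index-only scan rarely needs a heap fetch.
SELECT relname, relpages, relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1)
         AS pct_all_visible
FROM pg_class
WHERE relname = 'test';
```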

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-22 16:33:32
Message-ID: CAMkU=1wL75zCo40EHyV9h6SX-bUE4yajPsiRGUHRuTr6n-Fnwg@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 21, 2012 at 2:29 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Mon, May 21, 2012 at 4:17 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>
>> For vaguely real life, take your example of pgbench -i -s200 -F 50,
>> and I have 2Gig RAM, which seems to be the same as you do.
>>
>> With select only work load (pgbench -S -M prepared -T 30), I get
>>
>> tps = 193
>>
>> But now enable index-only scans:
>>
>> psql -c "create index on pgbench_accounts(aid, abalance);"
>>
>> and it goes up to:
>>
>> tps = 10137
>
> Right -- the main driver here is that your index fits neatly in ram
> and the heap does not -- so you're effectively measuring the
> difference between a buffered and non-buffered access.  That's highly
> contrived as you noted and unlikely to come up all *that* often in the
> real world.

I don't think this one is highly contrived. With the index being 10
fold smaller than the table, there is plenty of window for one to fit
in RAM and the other one not to in a variety of real world situations.
(Although I only get a 10 fold window because of -F 50.  I don't know
why Josh had that big of a difference in his original, unless he also
used a nondefault fill factor setting.  But of course many real world
tables will be wider than pgbench_accounts is.)

The highly contrived example useful for dissecting the implementation
would be to do:

set enable_seqscan=off;
set enable_indexonlyscan=off;
select count(*) from pgbench_accounts where aid is not null;

The key that I keep forgetting is the "where aid is not null".
Without that it uses the full table scan, even with enable_seqscan
off, rather than doing an ordinary index scan.

> Generally though the real world wins (although the gains will be
> generally less spectacular) are heavily i/o bound queries where the
> indexed subset of data you want is nicely packed and the (non
> clustered) heap records are all over the place.  By skipping the semi
> random heap lookups you can see enormous speedups.  I figure 50-90%
> improvement would be the norm there, but this is against queries that
> are taking forever, being i/o bound.
>
> See here: http://www.devheads.net/database/postgresql/performance/index-all-necessary-columns-postgres-vs-mssql.htm
> for an 'in the wild' gripe about not having index-only scans.

But without scripts to recreate the data with the right selectivities
and correlations, and to generate a long stream of appropriate query
parameterizations so that they don't become cached, that is just a
gripe and not an example.

I tried to reproduce the problem as stated, and couldn't make IOS be
useful because I couldn't make the query slow even without them.
Presumably I'm doing something wrong, but how could I tell what? Have
we heard back on whether IOS was tried and proved useful to the
originator of that thread?

If you want an example where neither index nor table fit in memory,
then just bump up the scale to 2000---and get a machine with only 2G
of memory if you don't already have one :)

With the extra index in place:

with enable_indexonlyscan=off
tps = 155.1
with enable_indexonlyscan=on
tps = 453.7

It seems like that should have only doubled; I'm not sure why it more
than doubled.  Maybe the index became better cached when the table
stopped competing with it for buffers.

If you leave the select-only world and go back to doing updates, then
that extra index is going to hurt you somewhat, but if the dominant
bottleneck is rpm of your WAL drive, it might not be noticeable.

Cheers,

Jeff


From: Greg Stark <stark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-23 15:09:04
Message-ID: CAM-w4HP3Oy=TjPo64PHDJR3q2M=R5pa9cyKQv3N6e9p22wPtdw@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 21, 2012 at 9:37 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> This is exactly what we do for VACUUM and it works faster there.
>>
>> The reason that's okay for vacuum is that vacuum doesn't care if it
>> visits the same index tuple multiple times.  It will not work for real
>> queries, unless you would like to lock out all concurrent inserts.

Even if you didn't care about seeing duplicates (for instance, if you
keep a hash table of seen tids to dedup), Vacuum's approach wouldn't
work, because the technique it uses to detect concurrent page splits,
so that it knows to go back and look for the resulting child pages,
only works if there's only one scanner at a time. Vacuum
actually marks each page with a vacuum generation number -- that
doesn't work if you want to allow multiple concurrent *scans*. Locking
out concurrent inserts might conceivably be tolerable for some use
cases, but locking out concurrent read-only scans would really be
beyond the pale.

> I checked a little more and Oracle supports something called a Fast
> Index Scan. Maybe there is a way.

Oracle maintains their indexes differently. Since they do
transactional consistency at the block level and it applies to all
relations -- even indexes -- they see a consistent view of the entire
index. This is engineering. There's always a way but there's no free
lunch. They incur some overhead when they find a block in the index
and have to look up the old version of the block in the rollback
segment. In Postgres I suspect the kind of change that would be needed
would cost concurrency on inserts/updates in exchange for more
flexibility scanning the index.

--
greg


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is indexonlyscan so darned slow?
Date: 2012-05-23 15:47:44
Message-ID: CAHyXU0wVkO+etd=wCY_AW=vbMNixAfXpj7wgKq_GfbX4MqHNCA@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 22, 2012 at 11:33 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Mon, May 21, 2012 at 2:29 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> See here: http://www.devheads.net/database/postgresql/performance/index-all-necessary-columns-postgres-vs-mssql.htm
>> for an 'in the wild' gripe about not having index-only scans.
>
> But without scripts to recreate the data with the right selectivities
> and correlations, and to generate a long stream of appropriate query
> parameterizations so that they don't become cached, that is just a
> gripe and not an example.
>
> I tried to reproduce the problem as stated, and couldn't make IOS be
> useful because I couldn't make it be slow even without them.
> Presumably I'm doing something wrong, but how could I tell what?  Have
> we heard back on whether IOS was tried and proved useful to the
> originator of that thread?

nope. but the damning evidence was that non-IOS on sql server
performed on par with postgres on the OP's data. (i also tried to
reproduce with similar results as you).

I bet i/o bound IOS will do better than 50% most of the time, because
the 'tuples' are packed better than on a typical heap page, unless
the heap is well clustered around that particular index, in which
case the heap lookups involve less random I/O anyway. The win will
also translate directly into cpu efficiencies as storage gets faster.
It's just an all-around fabulous feature and, like HOT, something to
consider carefully when laying out a schema.

merlin