Re: Planner avoidance of index only scans for partial indexes

Lists: pgsql-hackers
From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Planner avoidance of index only scans for partial indexes
Date: 2012-08-15 22:28:10
Message-ID: CAHyXU0x1OGao48WajAfUsbXqkUDLf=_6ho6hLMB8DSfkWdaVVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

If you create an index like this:

create index on foo(a,b,v) where d = some_constant;

there is no way to get an IOS on the index: you have to supply a the
partial index exclusionary value to get the value of the index and
that fools the IOS chooser because it doesn't see the value in the
explicit list of index columns. The workaround is to include the
'partial value' (d) in the index, but often that's unnecessary.

In other words, if you have a partial index that is based on a
constant, a query that is filtering on the constant is an exception to
the rule that all columns must be in the index to get the IOS. Not a
bug, but it's worth noting.

Aside: the performance gains I'm seeing for IOS are nothing short of
spectacular.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner avoidance of index only scans for partial indexes
Date: 2012-08-15 23:20:58
Message-ID: 25141.1345072858@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> If you create an index like this:
> create index on foo(a,b,v) where d = some_constant;
> there is no way to get an IOS on the index: you have to supply a the
> partial index exclusionary value to get the value of the index and
> that fools the IOS chooser because it doesn't see the value in the
> explicit list of index columns.

Yeah, this is a known limitation that we'll probably try to improve
someday. Per the comment in check_index_only():

/*
* Check that all needed attributes of the relation are available from the
* index.
*
* XXX this is overly conservative for partial indexes, since we will
* consider attributes involved in the index predicate as required even
* though the predicate won't need to be checked at runtime. (The same is
* true for attributes used only in index quals, if we are certain that
* the index is not lossy.) However, it would be quite expensive to
* determine that accurately at this point, so for now we take the easy
* way out.
*/

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Planner avoidance of index only scans for partial indexes
Date: 2012-08-16 20:02:26
Message-ID: 502D51D2.30909@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 8/15/12 3:28 PM, Merlin Moncure wrote:
> Aside: the performance gains I'm seeing for IOS are nothing short of
> spectacular.

Do you have some metrics? I could use them for publicity stuff.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner avoidance of index only scans for partial indexes
Date: 2012-08-17 15:36:04
Message-ID: CAHyXU0yYN1MqTnerkYu42W6eeLbm3wPfr9iwMS3O6qDSmKcRpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 16, 2012 at 5:20 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I'm mainly looking for numbers for the 9.2 release. Like, "up to 14X
> speedup on data warehousing queries".

OK:
I work in the multi family industry and we have a large database which
we are organizing into an on-demand OLAP style data delivery project.
The data is organized into a fact table like this:
(PropertyId, Floorplan, UnitType, TimeSeries, AggregationTypeCode,
MetricCode, MetricValue)

Where metric value contains a single numeric value for the calculated
metric (say, average rent). The other fields point at the identifying
criteria for the metric: property it pertains to, etc. TimeSeries
represents a point in time: It's a string that is 'Y2012M01',
'Y2011Q3', etc.

The table is partitioned on a two year basis. The 2010_2011 partition
has 37million records and is only expected to grow as we add new
properties and metrics. One of the important questions this table has
to answer is to gather metric values like this:

SELECT PropertyId, AGG(MetricValue) FROM PrepertyMetric
WHERE Floorplan = w and UnitType = x and TimeSeries = y and MetricCode = z;

And that query might return 7k-15k records depending. It is extremely
important to be fast and cache warm-up is a huge issue for us as we
have various nightly processes that blow the cache out. The main
interface to the function is a routine called GetChartData which runs
the query above N times looking back in time from a known point
(typically 12 times) and returns arrays which get converted to json
when going out the door.

On the 9.2 database I organized one of the partitions (only) to
utilize IOS and started capturing timings when I came in this morning
with a completely cold cache. Partition tables have been chilled as
part of the nightly build.

hese are 100% real world results. output data is 3 parallel arrays of size 12.

9.1 Run 1 (Cold), 12M partition
Time: 5147.000 ms

9.1 Run 2 (Warm), 12M partition
Time: 219.000 ms

9.2 Run 1 (Cold) 12M partition, Index Only scan
Time: 257.000 ms

9.2 Run 2 (Warm) 2M partition, Index Only scan
Time: 92.000 ms

9.1 Run 1 (Cold) 37M partition
Time: 22074.000 ms

9.1 Run 2 (Warm) 37M partition
Time: 435.000 ms

9.2 Run 1 (Cold) 37M partition (not IOS!)
Time: 7629.000 ms

9.2 Run 2 (Warm) 37M partition (not IOS!)
Time: 183.000 ms

The takeaway is that for this query I'm seeing end to end query
execution improvement of 58% in the worst case (warm cache) and 20x or
more in the best case when the cache is cold: that 22 sec time is
unpredictable and can run into the minutes as the index randomizes the
i/o to the heap (the table is not clustered on this index nor can it
be). This 'best case' is extremely important to us as the generally
the first thing the user sees when firing up the application the first
time and with 9.2 it's a 'night and day' experience. Here is the
bottom line in 9.1 vs 9.2 IOS:

9.1:
ysanalysis=# explain (buffers, analyze) SELECT ...
QUERY PLAN
----------------------------------------------------------------------------------------
Result (cost=0.00..0.26 rows=1 width=0) (actual
time=219.979..219.979 rows=1 loops=1)
Buffers: shared hit=176227
Total runtime: 219.990 ms

9.2:
ysanalysis=# explain (buffers, analyze) SELECT ...
QUERY PLAN
--------------------------------------------------------------------------------------
Result (cost=0.00..0.26 rows=1 width=0) (actual time=92.909..92.909
rows=1 loops=1)
Buffers: shared hit=27198
Total runtime: 92.921 ms

That's going to translate into less cache pressure and overall cpu
efficiency. It's a also a good technique to manage i/o patterns
especially for those of us that don't have the luxury of an all-SSD
storage unit (my backend is a
generally-good-but-you-never-know-performance enterprise SAN). Good
knowledge of the database internals is helpful and precise indexing is
a must.

merlin


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Planner avoidance of index only scans for partial indexes
Date: 2012-08-17 20:53:54
Message-ID: 502EAF62.4020101@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin,

Thanks!

> The takeaway is that for this query I'm seeing end to end query
> execution improvement of 58% in the worst case (warm cache) and 20x or
> more in the best case when the cache is cold: that 22 sec time is
> unpredictable and can run into the minutes as the index randomizes the
> i/o to the heap (the table is not clustered on this index nor can it
> be). This 'best case' is extremely important to us as the generally
> the first thing the user sees when firing up the application the first
> time and with 9.2 it's a 'night and day' experience. Here is the
> bottom line in 9.1 vs 9.2 IOS:

So I could easily say: "An up to 20X speedup on data warehousing
queries,", no? Or "A between 2X and 20X speed increase on data
warehousing queries".

Of course, I'd love to see you able to blog the detail ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com