From: Ben <midfield(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: contsel and gist
Date: 2010-10-28 16:57:49
Message-ID: 42FF469A-0BB5-4CE4-90A0-D9A7EF3C0BB3@gmail.com
Lists: pgsql-hackers

hello --

i have a largish table (~8 billion rows) which makes use of the temporal period datatype and gist indexes. i find that query plans are somewhat "unstable", in that slightly altering a query often causes it to switch from using the index (usually via a bitmap scan) to a sequential scan. there is basically no situation where a sequential scan is the right plan for the kinds of queries we are doing -- the table is so large that any sequential scan would take hours. (i will give more details about our setup below.)

my guess is that it has to do with the selectivity of the @> operator. i've looked and noticed that the selectivity functions for @> and other period operators are basically stubs, with constant selectivity. my questions are:

1 - am i wrong in my assessment? are the constant contsel, areasel, etc. hurting us?

2 - how hard would it be to implement contsel et al for period data types? i've read the gist papers but find the eqsel code a bit hard to understand. (would anyone be willing to help?)

more details:

pg 9 on a linux x86_64 box with 24gb ram and software raid-5. (not ideal, i understand.) the table is

create table timeseries ( id integer not null, value float not null, timespan period not null );
create index timeseries_id on timeseries (id);
create index timeseries_idx_timespan on timeseries using gist (timespan);

in our dataset there are about 2000 different time series, each given a different id. the time series are piecewise constant functions we are representing as (id, value, time interval) triples. the intervals are typically no more than a few seconds, at most a few minutes. for each id, the intervals are non-overlapping and cover the total time period. there are about 8 billion rows of historical data, and there are about 3 million new rows a day, relatively evenly spread across the different ids. the database gets updated once a day with the new rows, and the rows are loaded in time order; the historical data is basically ordered in time. other than that single daily update, the workload is basically read-only. the most important query for us is to sample the time series at (potentially irregular) grid points (i will give some example queries below.)
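
(to make the representation concrete, a couple of made-up rows for one id might look like

 id | value |                  timespan
----+-------+---------------------------------------------
  7 |  1.25 | [2006-10-12 09:30:00, 2006-10-12 09:30:04)
  7 |  1.30 | [2006-10-12 09:30:04, 2006-10-12 09:30:09)

with adjacent, non-overlapping intervals.)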

there are some small side tables (less than 150K rows) for things like different grid points to sample at, or auxiliary data which augment the time series.

create table grids ( gridid integer not null, gridpt timestamp without time zone, primary key (gridid, gridpt) );
create table adjs1 ( id integer not null, timespan period not null, adj double precision not null);
create index adjs1_timespan on adjs1 using gist (timespan);
create index adjs1_id on adjs1 (id);

an example query that works:

postgres=> explain analyze select * from timeseries join grids on timespan @> gridpt where gridid = 2 and gridpt between '2006-10-12' and '2006-10-15';
                                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=18082.74..33760441.77 rows=6525038912 width=48) (actual time=204.655..2498.152 rows=34275 loops=1)
   ->  Index Scan using grids_pkey on grids  (cost=0.00..76.64 rows=23 width=12) (actual time=0.059..0.559 rows=38 loops=1)
         Index Cond: ((gridid = 2) AND (gridpt >= '2006-10-12 00:00:00'::timestamp without time zone) AND (gridpt <= '2006-10-15 00:00:00'::timestamp without time zone))
   ->  Bitmap Heap Scan on timeseries  (cost=18082.74..1460749.52 rows=567395 width=36) (actual time=32.561..62.545 rows=902 loops=38)
         Recheck Cond: (timeseries.timespan @> grids.gridpt)
         ->  Bitmap Index Scan on timeseries_idx_timespan  (cost=0.00..17940.89 rows=567395 width=0) (actual time=32.184..32.184 rows=902 loops=38)
               Index Cond: (timeseries.timespan @> grids.gridpt)
 Total runtime: 2553.386 ms
(8 rows)

Time: 2555.498 ms

where there are about 10-100 gridpts between the times selected. this query plan looks good to me, and indeed it runs fairly fast.

an example query that goes haywire:

postgres=> explain select * from timeseries as TS join grids on TS.timespan @> gridpt join adjs1 as AD1 on TS.id=AD1.id and AD1.timespan @> gridpt where gridid=2 and gridpt between '2006-10-19' and '2006-10-22';
                                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=7166.37..2517194919.79 rows=83476436469 width=76)
   Hash Cond: (ts.id = ad1.id)
   Join Filter: (ts.timespan @> grids.gridpt)
   ->  Seq Scan on timeseries ts  (cost=0.00..10402235.88 rows=567394688 width=36)
   ->  Hash  (cost=2024.57..2024.57 rows=411344 width=40)
         ->  Nested Loop  (cost=4.36..2024.57 rows=411344 width=40)
               ->  Index Scan using grids_pkey on grids  (cost=0.00..76.64 rows=23 width=12)
                     Index Cond: ((gridid = 2) AND (gridpt >= '2006-10-19 00:00:00'::timestamp without time zone) AND (gridpt <= '2006-10-22 00:00:00'::timestamp without time zone))
               ->  Bitmap Heap Scan on adjs1 ad1  (cost=4.36..84.24 rows=36 width=28)
                     Recheck Cond: (ad1.timespan @> grids.gridpt)
                     ->  Bitmap Index Scan on adjs1_timespan  (cost=0.00..4.35 rows=36 width=0)
                           Index Cond: (ad1.timespan @> grids.gridpt)

i've omitted the explain analyze output because the query is too slow to run to completion.

i can circumvent this sequential scan with some outer join trickery, but that seems like a hack, and although the query is relatively quick, it doesn't seem like an ideal plan:

postgres=> explain analyze select * from timeseries as TS join grids on TS.timespan @> gridpt left outer join adjs1 as AD1 on TS.id=AD1.id and AD1.timespan @> gridpt where gridid=2 and gridpt between '2006-10-19' and '2006-10-22';
                                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1068.80..2563197208.41 rows=83476436469 width=76) (actual time=145.504..2542.112 rows=34125 loops=1)
   Hash Cond: (ts.id = ad1.id)
   Join Filter: (ad1.timespan @> grids.gridpt)
   ->  Nested Loop  (cost=0.00..44001399.98 rows=6525038912 width=48) (actual time=7.079..1634.089 rows=34125 loops=1)
         ->  Index Scan using grids_pkey on grids  (cost=0.00..76.64 rows=23 width=12) (actual time=0.040..0.426 rows=38 loops=1)
               Index Cond: ((gridid = 2) AND (gridpt >= '2006-10-19 00:00:00'::timestamp without time zone) AND (gridpt <= '2006-10-22 00:00:00'::timestamp without time zone))
         ->  Index Scan using timeseries_idx_timespan on timeseries ts  (cost=0.00..1906008.58 rows=567395 width=36) (actual time=3.695..39.859 rows=898 loops=38)
               Index Cond: (ts.timespan @> grids.gridpt)
   ->  Hash  (cost=621.69..621.69 rows=35769 width=28) (actual time=138.374..138.374 rows=35769 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 2236kB
         ->  Seq Scan on adjs1 ad1  (cost=0.00..621.69 rows=35769 width=28) (actual time=0.009..63.904 rows=35769 loops=1)
 Total runtime: 2596.653 ms

what i would really like to do is to inform the planner that the row counts and selectivity of the gist index are such that it is better to use the index in almost every situation. i have run vacuum analyze but it does not seem to help. alternatively, is there some way of rephrasing the query (via subselects?) that will encourage the query planner in the right direction?
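
for concreteness, here is the sort of thing i have in mind (untested sketches; i have not verified that either actually changes the plan):

-- blunt per-session workaround: forbid seq scans outright
set enable_seqscan = off;
-- ... run the query ...
reset enable_seqscan;

-- subselect version: materialize the small grids/adjs1 join first,
-- using "offset 0" as an optimization fence, then probe the big
-- timeseries table per grid point
select TS.*, sub.adj, sub.gridpt
from (
    select AD1.id, AD1.adj, gridpt
    from grids join adjs1 as AD1 on AD1.timespan @> gridpt
    where gridid = 2 and gridpt between '2006-10-19' and '2006-10-22'
    offset 0
) as sub
join timeseries as TS on TS.id = sub.id and TS.timespan @> sub.gridpt;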

best regards, ben


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ben <midfield(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: contsel and gist
Date: 2010-10-28 17:50:24
Message-ID: 23452.1288288224@sss.pgh.pa.us
Lists: pgsql-hackers

Ben <midfield(at)gmail(dot)com> writes:
> my guess is that it has to do with the selectivity of the @> operator. i've looked and noticed that the selectivity functions for @> and other period operators are basically stubs, with constant selectivity. my questions are:

> 1 - am i wrong in my assessment? are the constant contsel, areasel, etc. hurting us?

The stub selectivity functions definitely suck.

> 2 - how hard would it be to implement contsel et al for period data types?

If it were easy, it'd likely have been done already :-(

However, having said that: the constant value of the stub contsel
function is intended to be small enough to encourage use of an
indexscan. Maybe we just need to decrease it a bit more. Have you
investigated what the cutover point is for your queries?
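
(For reference, the stub in question, in src/backend/utils/adt/geo_selfuncs.c,
is essentially just

Datum
contsel(PG_FUNCTION_ARGS)
{
    PG_RETURN_FLOAT8(0.001);
}

so the experiment amounts to trying smaller constants there.)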

regards, tom lane


From: Ben <midfield(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: contsel and gist
Date: 2010-10-28 20:05:07
Message-ID: 3BA3D995-4597-4C25-BB38-B0FB4E53189B@gmail.com
Lists: pgsql-hackers

thanks for the prompt reply.

On Oct 28, 2010, at 10:50 AM, Tom Lane wrote:

>> 1 - am i wrong in my assessment? are the constant contsel, areasel, etc. hurting us?
>
> The stub selectivity functions definitely suck.

i'm taking this as implying that my intuition here is basically right.

>> 2 - how hard would it be to implement contsel et al for period data types?
>
> If it were easy, it'd likely have been done already :-(

i am interested in learning more about this, in hopes that it might be possible for me to do it some day. do you have any pointers to things i could look at to learn from? i imagine this must be a problem for the postgis people too...

i guess the first step is to figure out what kind of statistics / histograms to collect for the period datatype. (i don't see anything in pg_stats.) has there been previous work / thinking on this?
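
(for what it's worth, this is how i looked:

select attname, n_distinct, most_common_vals, histogram_bounds
from pg_stats
where tablename = 'timeseries' and attname = 'timespan';

and it comes back with nothing useful for the timespan column.)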

> However, having said that: the constant value of the stub contsel
> function is intended to be small enough to encourage use of an
> indexscan. Maybe we just need to decrease it a bit more. Have you
> investigated what the cutover point is for your queries?

i'd be happy to investigate this for you, but my guess is that my dataset is probably not a good example to use for setting the constant more generally. i'm joining an 8e9-row table against a 150K-row table, so the selectivity fraction would probably have to drop by many orders of magnitude. that being said, i'll poke around and see if i can find the cutoff point. is there an easy way to do this that doesn't involve recompiling postgres?

best regards, b


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ben <midfield(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: contsel and gist
Date: 2010-10-28 21:19:40
Message-ID: 26257.1288300780@sss.pgh.pa.us
Lists: pgsql-hackers

Ben <midfield(at)gmail(dot)com> writes:
> On Oct 28, 2010, at 10:50 AM, Tom Lane wrote:
>> However, having said that: the constant value of the stub contsel
>> function is intended to be small enough to encourage use of an
>> indexscan. Maybe we just need to decrease it a bit more. Have you
>> investigated what the cutover point is for your queries?

> i'd be happy to investigate this for you, but my guess is that my
> dataset is probably not a good example to use for setting the constant
> more generally. i'm joining an 8e9-row table against a 150K-row table,
> so the selectivity fraction would probably have to drop by many orders
> of magnitude.

I doubt it.

> that being said, i'll poke around and see if i can find the cutoff point. is there an easy way to do this that doesn't involve recompiling postgres?

No, those are just hardwired constants. If you wanted to avoid multiple
recompilations while experimenting, you could set up a custom GUC
variable for the functions to read...
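
Something along these lines, wedged into geo_selfuncs.c (an untested
sketch; the GUC name is made up, and this uses the 9.0 hook signature):

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

/* replaces the hardwired 0.001 in contsel() */
static double contsel_fraction = 0.001;

/* call once during backend startup to register the knob */
void
define_contsel_fraction_guc(void)
{
	DefineCustomRealVariable("debug.contsel_fraction",
							 "Selectivity estimate returned by contsel()",
							 NULL,
							 &contsel_fraction,
							 0.001,		/* boot value = existing constant */
							 0.0, 1.0,	/* min, max */
							 PGC_USERSET,
							 0,		/* flags */
							 NULL, NULL);	/* assign_hook, show_hook */
}

Datum
contsel(PG_FUNCTION_ARGS)
{
	PG_RETURN_FLOAT8(contsel_fraction);
}

Then "SET debug.contsel_fraction = 0.0001" in a session, followed by
another EXPLAIN, would let you binary-search the cutover point without
rebuilding each time.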

regards, tom lane