Re: Index isn't used during a join.

From: Robert Creager <Robert_Creager(at)LogicalChaos(dot)org>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: PGPerformance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index isn't used during a join.
Date: 2006-01-11 14:26:59
Message-ID: 20060111072659.7c1772ad@thunder.logicalchaos.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

When grilled further on (Wed, 11 Jan 2006 00:56:55 -0700),
Michael Fuhr <mike(at)fuhr(dot)org> confessed:

> On Tue, Jan 10, 2006 at 10:10:55PM -0700, Robert Creager wrote:
> > The query is now correct, but still is slow because of lack of
> > index usage. I don't know how to structure the query correctly to
> > use the index.
>
> Have you tried adding restrictions on doy in the WHERE clause?
> Something like this, I think:

I cannot. That's what I thought I would get from the join. The query shown will always have two days involved, and only grows from there. The data is graphed at http://www.logicalchaos.org/weather/index.html, and I'm looking at adding historical data to the graphs.

Opps, never mind. You hit the nail on the head:

weather-# SELECT *, unmunge_time( time_group ) AS time,
weather-# EXTRACT( doy FROM unmunge_time( time_group ) )
weather-# FROM minute."windspeed"
weather-# JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy )
weather-# WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
weather-# AND doy BETWEEN EXTRACT( doy FROM now() - '24 hour'::interval)
weather-# AND EXTRACT( doy FROM now() )
weather-# ORDER BY time_group;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=21914.09..21914.10 rows=1 width=48) (actual time=76.595..76.662 rows=286 loops=1)
Sort Key: windspeed.time_group
-> Hash Join (cost=21648.19..21914.08 rows=1 width=48) (actual time=64.656..75.562 rows=286 loops=1)
Hash Cond: (date_part('doy'::text, unmunge_time("outer".time_group)) = "inner".doy)
-> Bitmap Heap Scan on windspeed (cost=2.27..267.40 rows=74 width=28) (actual time=0.585..1.111 rows=286 loops=1)
Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
-> Bitmap Index Scan on minute_windspeed_unmunge_index (cost=0.00..2.27 rows=74 width=0) (actual time=0.566..0.566 rows=287 loops=1)
Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
-> Hash (cost=21645.92..21645.92 rows=3 width=20) (actual time=63.849..63.849 rows=2 loops=1)
-> HashAggregate (cost=21645.84..21645.89 rows=3 width=20) (actual time=63.832..63.834 rows=2 loops=1)
-> Bitmap Heap Scan on readings (cost=59.21..21596.85 rows=6532 width=20) (actual time=15.174..53.249 rows=7613 loops=1)
Recheck Cond: ((date_part('doy'::text, "when") >= date_part('doy'::text, (now() - '24:00:00'::interval))) AND (date_part('doy'::text, "when") <= date_part('doy'::text, now())))
-> Bitmap Index Scan on readings_doy_index (cost=0.00..59.21 rows=6532 width=0) (actual time=12.509..12.509 rows=10530 loops=1)
Index Cond: ((date_part('doy'::text, "when") >= date_part('doy'::text, (now() - '24:00:00'::interval))) AND (date_part('doy'::text, "when") <= date_part('doy'::text, now())))
Total runtime: 77.177 ms

What I had thought is that PG would (could?) be smart enough to realize that one query was restricted, and apply that restriction to the other based on the join. I know it works in other cases (using indexes on both tables using the join)...

>
> Something else occurred to me: do you (or will you) have more than
> one year of data? If so then matching on doy could be problematic
> unless you also check for the year, or unless you want to match
> more than one year.

Yes and yes. I'm doing both aggregate by day of the year for all data, and aggregate by day of year within each year. The examples are:

weather=# select * from doy_agg where doy = extract( doy from now() );
doy | avg_windspeed | max_windspeed
-----+------------------+---------------
11 | 6.14058239764748 | 69
(1 row)

weather=# select * from doy_day_agg where extract( doy from day ) = extract( doy from now() );
day | avg_windspeed | max_windspeed
---------------------+------------------+---------------
2004-01-11 00:00:00 | 5.03991313397539 | 17
2006-01-11 00:00:00 | 18.532050716667 | 69
2005-01-11 00:00:00 | 3.6106763448041 | 13

Thanks for your help Michael.

Cheers,
Rob

--
07:07:30 up 3 days, 23:34, 9 users, load average: 2.29, 2.44, 2.43
Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Creager 2006-01-11 15:02:37 Re: Index isn't used during a join.
Previous Message Bendik Rognlien Johansen 2006-01-11 10:59:39 Slow query with joins