Simple join optimized badly?

From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Simple join optimized badly?
Date: 2006-10-07 06:34:31
Message-ID: 45274A77.7080005@modgraph-usa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have two tables, SAMPLE and HITLIST that when joined, generate a monsterous sort.

HITLIST_ROWS has about 48,000 rows
SAMPLE has about 16 million rows

The joined column is indexed in SAMPLE
HITLIST_ROWS is a scratch table which is used a few times then discarded.
HITLIST_ROWS has no indexes at all

There are two plans below. The first is before an ANALYZE HITLIST_ROWS, and it's horrible -- it looks to me like it's sorting the 16 million rows of the SEARCH table. Then I run ANALYZE HITLIST_ROWS, and the plan is pretty decent.

First question: HITLIST_ROWS so small, I don't understand why the lack of ANALYZE should cause SAMPLE's contents to be sorted.

Second question: Even though ANALYZE brings it down from 26 minutes to 47 seconds, a huge improvement, it still seems slow to me. Its going at roughly 1 row per millisecond -- are my expectations too high? This is a small-ish Dell computer (Xeon), 4 GB memory, with a four-disk SATA software RAID0 (bandwidth limited to about 130 MB/sec due to PCI cards). Other joins of a similar size seem much faster.

It looks like I'll need to do an ANALYZE every time I modify HITLIST_ROWS, which seems like a waste because HITLIST_ROWS is rarely used more than once or twice before being truncated and rebuilt with new content. (HITLIST_ROWS can't be an actual temporary table, though, because it's a web application and each access is from a new connection.)

This is Postgres 8.0.3. (We're upgrading soon.)

Thanks,
Craig

explain analyze select t.SAMPLE_ID from SAMPLE t, HITLIST_ROWS ph where t.VERSION_ID = ph.ObjectID);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=4782.35..1063809.82 rows=613226 width=4) (actual time=174.212..1593886.582 rows=176294 loops=1)
Merge Cond: ("outer".version_id = "inner".objectid)
-> Index Scan using i_sample_version_id on sample t (cost=0.00..1008713.68 rows=16446157 width=8) (actual time=0.111..1571911.208 rows=16446157 loops=1)
-> Sort (cost=4782.35..4910.39 rows=51216 width=4) (actual time=173.669..389.496 rows=176329 loops=1)
Sort Key: ph.objectid
-> Seq Scan on hitlist_rows_378593 ph (cost=0.00..776.16 rows=51216 width=4) (actual time=0.015..90.059 rows=48834 loops=1)
Total runtime: 1594093.725 ms
(7 rows)

chmoogle2=> analyze HITLIST_ROWS;
ANALYZE
chmoogle2=> explain analyze select t.SAMPLE_ID from SAMPLE t, HITLIST_ROWS ph where t.VERSION_ID = ph.ObjectID;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=874.43..457976.83 rows=584705 width=4) (actual time=302.792..47796.719 rows=176294 loops=1)
Hash Cond: ("outer".version_id = "inner".objectid)
-> Seq Scan on sample t (cost=0.00..369024.57 rows=16446157 width=8) (actual time=46.344..26752.343 rows=16446157 loops=1)
-> Hash (cost=752.34..752.34 rows=48834 width=4) (actual time=149.548..149.548 rows=0 loops=1)
-> Seq Scan on hitlist_rows_378593 ph (cost=0.00..752.34 rows=48834 width=4) (actual time=0.048..80.721 rows=48834 loops=1)
Total runtime: 47988.572 ms
(6 rows)

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2006-10-07 15:51:40 Re: Simple join optimized badly?
Previous Message Carlo Stonebanks 2006-10-06 20:45:07 Re: Performance Optimization for Dummies 2 - the SQL