From: PFC <lists(at)peufeu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Per-table random_page_cost for tables that we know are always cached
Date: 2008-04-22 10:13:53
Message-ID: op.t90b1fmucigqcu@apollo13.peufeu.com
Lists: pgsql-hackers


It started with this query:

EXPLAIN ANALYZE SELECT * FROM relations r JOIN nodes n ON (n.id=r.child_id) WHERE r.parent_id=16330;

                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=370.96..496.29 rows=543 width=273) (actual time=18.887..21.164 rows=543 loops=1)
   Hash Cond: (r.child_id = n.id)
   ->  Index Scan using relations_unique on relations r  (cost=0.00..111.75 rows=543 width=58) (actual time=0.022..0.519 rows=543 loops=1)
         Index Cond: (parent_id = 16330)
   ->  Hash  (cost=243.76..243.76 rows=10176 width=215) (actual time=18.830..18.830 rows=10176 loops=1)
         ->  Seq Scan on nodes n  (cost=0.00..243.76 rows=10176 width=215) (actual time=0.006..5.135 rows=10176 loops=1)
 Total runtime: 21.453 ms

SET enable_hashjoin TO 0;

EXPLAIN ANALYZE SELECT * FROM relations r JOIN nodes n ON (n.id=r.child_id) WHERE r.parent_id=16330;

                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..514.50 rows=543 width=273) (actual time=0.037..4.412 rows=543 loops=1)
   ->  Index Scan using relations_unique on relations r  (cost=0.00..111.75 rows=543 width=58) (actual time=0.023..0.476 rows=543 loops=1)
         Index Cond: (parent_id = 16330)
   ->  Index Scan using nodes_pkey on nodes n  (cost=0.00..0.73 rows=1 width=215) (actual time=0.004..0.005 rows=1 loops=543)
         Index Cond: (n.id = r.child_id)
 Total runtime: 4.638 ms

In order to get the fast plan I must set random_page_cost to 1, which I
absolutely don't want to do.
Setting effective_cache_size to huge values has no effect.
If I select a value of parent_id that has far fewer children, the index
will be used, but in this case I think the threshold is misplaced; it
should be slightly higher. Here about 5% of the values are selected, and
the hash join only becomes better at about 15% because the table is cached.
This is 8.3.
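
For what it's worth, the change can at least be confined to one transaction
with SET LOCAL instead of being made globally, but that is still a manual
per-query hack rather than a planner fix. A minimal sketch:

BEGIN;
SET LOCAL random_page_cost = 1;   -- reverted automatically at end of transaction
EXPLAIN ANALYZE SELECT * FROM relations r JOIN nodes n
  ON (n.id=r.child_id) WHERE r.parent_id=16330;
COMMIT;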

Perhaps there is a need for a per-object setting
(object = table, index, or partition) to alter the aggressiveness/laziness of
page flushing and how long the pages of that object are kept in
shared_buffers... this would then be used to modify random_page_cost on a
per-table/index/partition basis.

Example: let's imagine a "cache priority" setting.

- "cache priority" set to the minimum means this table is mostly write-only
- "cache priority" set to the default gives the current behaviour (which is
correct in most cases)
- "cache priority" set to a high value tells Postgres "I know this
table/index/partition is small and often accessed rather randomly, so I
want you to keep it in shared_buffers; purge it if you must, but otherwise
keep it in memory and flush something with a lower cache priority instead".

The optimizer could then use a different (much lower) value of
random_page_cost for tables whose "cache priority" is set high, since it
would know those pages are very likely to already be in shared_buffers.
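
Purely as an illustration, such a per-relation knob could ride on the
existing storage-parameter syntax; cache_priority is hypothetical and does
not exist, and big_log_table is a made-up name:

ALTER TABLE nodes SET (cache_priority = 100);        -- hypothetical: small, hot, keep it cached
ALTER TABLE big_log_table SET (cache_priority = 0);  -- hypothetical: mostly write-only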

An alternative would be for the background writer to keep some stats and
do the work for us:

- begin bgwriter scan
- set up a hash table of [relid => page count]
- for each page that is scanned, increment the page count for its relation
(uses very little CPU)
- end of bgwriter scan
- for each relation, compare the number of pages found in shared_buffers
with the number of pages in the relation, and draw conclusions about how
well cached the relation is
- update random_page_cost accordingly for this relation

This would not examine whatever is in the OS' cache, though.
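
That per-relation "how well cached" figure can already be approximated by
hand with the contrib/pg_buffercache module, assuming it is installed; a
sketch for the current database (relpages is only as fresh as the last
VACUUM/ANALYZE, so the percentage is an estimate, and like the bgwriter
idea it says nothing about the OS cache):

SELECT c.relname,
       count(*) AS buffers_in_cache,
       c.relpages,
       round(100.0 * count(*) / nullif(c.relpages, 0), 1) AS pct_cached
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname, c.relpages
ORDER BY buffers_in_cache DESC;

The bgwriter scheme above would essentially automate this bookkeeping and
feed the result back into the planner.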
