Re: Per-table random_page_cost for tables that we know are always cached

From: Decibel! <decibel(at)decibel(dot)org>
To: PFC <lists(at)peufeu(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Per-table random_page_cost for tables that we know are always cached
Date: 2008-04-22 18:41:51
Message-ID: F1CD2C9D-B78F-4205-913E-BE76AFE0EF96@decibel.org
Lists: pgsql-hackers

On Apr 22, 2008, at 5:13 AM, PFC wrote:
> In order to have it use the fast plan I must set random_page_cost
> to 1, which I absolutely don't want to do.
> Setting effective_cache_size to huge values has no effect.
> If I select a value of parent_id that has far fewer children, the
> index will be used, but in this case I think the threshold is
> misplaced; it should be slightly higher. Here we have about 5% of
> the rows selected. Hash join becomes better at about 15% because
> the table is cached.
> This is 8.3.
>
> Perhaps there would be a need for a per-object setting
> (object=table,index,partition) to alter the aggressiveness/laziness
> of the page flushing and how long the pages for this object are
> kept in shared_buffers... this would be used to modify
> random_page_cost on a per-table/index/partition basis.
>
> Example: let's imagine a "cache priority" setting.
>
> - "cache priority" set to the minimum means this table is mostly
> write-only
> - "cache priority" set to default would give current behaviour
> (which is correct in most cases)
> - "cache priority" set to a high value would tell Postgres "I know
> this table/index/partition is small and often accessed rather
> randomly, so I want you to keep it in shared_buffers, purge it if
> you must but otherwise keep it in memory, flush something else
> instead which has lower cache_priority".
>
> The optimizer could then use a different (much lower) value of
> random_page_cost for tables whose "cache priority" is set highest,
> since it would know those pages are likely to be in memory.

"cache priority" to me sounds like we're trying to influence caching
behavior, which isn't what's happening. I do agree that we need a
better way to tell the planner what tables are in memory.
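If such a knob existed, one plausible shape would be a planner-cost
reloption piggybacking on the existing ALTER TABLE ... SET (...)
syntax. To be clear, this is purely hypothetical, illustrative syntax;
no such reloption exists in any released version:

    -- Hypothetical: mark these relations as effectively always cached
    -- for planning purposes, without touching the global setting.
    ALTER TABLE lookup_table SET (random_page_cost = 1.0);
    ALTER INDEX lookup_table_pkey SET (random_page_cost = 1.0);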

> An alternative would be for the background writer to keep some
> stats and do this for us:
>
> - begin bgwriter scan
> - set up a hashtable of [relid => page count]
> - at each page that is scanned, increment "page count" for this
> relation (uses very little CPU)
> - end bgwriter stats
> - for each relation, compare the number of pages we found in
> shared_buffers with the number of pages in the relation and draw
> conclusions about how well cached the relation is
> - update random_page_cost accordingly for this relation
>
> This would not examine whatever is in the OS' cache, though.
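For what it's worth, the contrib/pg_buffercache module can already
produce that per-relation count from a snapshot of shared_buffers,
without involving bgwriter. A rough sketch, assuming the module is
installed in the current database; note that relpages is only the
estimate from the last VACUUM/ANALYZE:

    SELECT c.relname,
           count(*) AS buffered_pages,
           c.relpages,
           round(100.0 * count(*) / nullif(c.relpages, 0), 1)
               AS pct_in_shared_buffers
    FROM pg_buffercache b
    JOIN pg_class c ON c.relfilenode = b.relfilenode
    JOIN pg_database d ON d.oid = b.reldatabase
    WHERE d.datname = current_database()
    GROUP BY c.relname, c.relpages
    ORDER BY buffered_pages DESC;

As the poster notes, this still says nothing about the OS cache.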

Actually, there's no need for bgwriter to do that; we can just look
at the hit rate for the object. But we'd also need stats for how
often we find pages for a relation in the OS cache, which no one has
come up with a good method for.
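For the shared_buffers side, something like this already works against
the stats collector. The hit counts here are shared-buffer hits only;
a read satisfied by the OS cache still shows up as a "read", which is
exactly the part we can't see:

    SELECT relname,
           heap_blks_read,
           heap_blks_hit,
           round(100.0 * heap_blks_hit
                 / nullif(heap_blks_hit + heap_blks_read, 0), 1)
               AS heap_hit_pct
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read + heap_blks_hit DESC;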
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828
