Re: Super Optimizing Postgres

From: Hannu Krosing <hannu(at)sid(dot)tm(dot)ee>
To: Justin Clift <justin(at)postgresql(dot)org>
Cc: mlw <markw(at)mohawksoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, matthew(at)zeut(dot)net, Alex Pilosov <alex(at)pilosoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Super Optimizing Postgres
Date: 2001-11-18 06:39:11
Message-ID: 3BF7578F.7000108@sid.tm.ee
Lists: pgsql-hackers

Justin Clift wrote:

>>In my example, two computers with exactly the same hardware, except one has a
>>5400 RPM IDE drive, the other has a 10,000 RPM IDE drive. These machines should
>>not use the same settings, it is obvious that a sequential scan block read on
>>one will be faster than the other.
>>
>
>If we're going to do this bit properly, then we'll have to take into
>consideration that many database objects will need their own individual
>statistics. For example, let's say we have a database with a bunch of
>10k rpm SCSI drives on which the tables are kept, and the system also has
>one or more 15k rpm SCSI drives (say, Seagate Cheetah II drives) on
>which the indices have been placed. With the 10k rpm drives, the tables
>needing the fastest throughput or having the highest usage are put on
>the outer edges of the disk media, and the rest of the tables are placed
>in the available space.
>
>On this theoretical system, we will be better off measuring the
>performance of each table and index in turn then generating and storing
>costs for each one which are as "accurate as possible at this point in
>time".
>
That would mean that these statistic values must be stored in pg_class
and not be SET variables at all. This would probably have the added
benefit that some caching effects of small/big tables would be accounted
for automatically, so you don't have to do that in the optimizer.
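The idea of per-relation cost figures, kept alongside the relation's other statistics rather than in one global SET variable, could be sketched roughly as follows. This is purely an illustrative model, not actual PostgreSQL internals; the struct fields and numbers are assumptions:

```python
# Hypothetical sketch (NOT actual PostgreSQL internals): per-relation cost
# parameters stored with the relation's statistics, as if in a pg_class-like
# catalog, instead of one global SET variable such as random_page_cost.

from dataclasses import dataclass

@dataclass
class RelStats:
    """Per-relation cost figures; all fields are illustrative assumptions."""
    relname: str
    relpages: int            # size of the relation in pages
    seq_page_cost: float     # measured sequential-read cost for THIS relation
    random_page_cost: float  # measured random-read cost for THIS relation

def seq_scan_cost(rel: RelStats) -> float:
    # A full scan touches every page at the relation's own sequential cost.
    return rel.relpages * rel.seq_page_cost

# Two relations of equal size on different spindles: the measured per-relation
# costs differ, so the planner would get different estimates for the same kind
# of scan without anyone changing a global setting.
fast_disk = RelStats("orders_idx", relpages=1000,
                     seq_page_cost=0.5, random_page_cost=2.0)
slow_disk = RelStats("orders", relpages=1000,
                     seq_page_cost=1.0, random_page_cost=4.0)

print(seq_scan_cost(fast_disk))  # 500.0
print(seq_scan_cost(slow_disk))  # 1000.0
```

The point of the sketch is only that the cost constant becomes a property of the relation, so differently-placed tables and indexes naturally get different scan estimates.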

> A model like this would probably have these costs re-calculated
>each time the ANALYZE command is run to ensure their accuracy through
>database growth and changes.
>
Then ANALYZE should be run on both tables and indexes. AFAIK we
currently analyze only the actual table data.

>
>I think this would be decently accurate, and RAID systems would be
>accurately analysed. Don't know how to take into account large cache
>sizes though. :)
>
Maybe keep some volatile statistics on how much of a table "may be"
cached in the disk/fs caches, assuming that we already know how much of
each is in shared memory.
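Such a cache statistic could feed the cost estimate by blending the full disk cost with a much cheaper cost for the cached fraction. Again a rough illustrative sketch, with made-up names and numbers, not a real PostgreSQL statistic or formula:

```python
# Illustrative sketch only: weight the per-page cost by an estimated cached
# fraction of the relation. Names, constants, and the blending formula are
# assumptions for the sake of the example.

def effective_page_cost(disk_page_cost: float,
                        cached_page_cost: float,
                        cached_fraction: float) -> float:
    """Cached pages are nearly free; the rest pay the full disk cost."""
    assert 0.0 <= cached_fraction <= 1.0
    return (cached_fraction * cached_page_cost
            + (1.0 - cached_fraction) * disk_page_cost)

# A small table fully in cache vs. a big table that is mostly on disk.
print(effective_page_cost(4.0, 0.1, 1.0))   # 0.1
print(effective_page_cost(4.0, 0.1, 0.25))  # 3.025
```

With something like this, small hot tables would automatically look cheap to scan, which is the caching effect mentioned above.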

-----------
Hannu
