Re: Super Optimizing Postgres

From: Justin Clift <justin(at)postgresql(dot)org>
To: mlw <markw(at)mohawksoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, matthew(at)zeut(dot)net, Alex Pilosov <alex(at)pilosoft(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Super Optimizing Postgres
Date: 2001-11-17 15:26:48
Message-ID: 3BF681B8.21DCD91A@postgresql.org
Lists: pgsql-hackers

mlw wrote:
>
> Tom Lane wrote:
> >
> > Justin Clift <justin(at)postgresql(dot)org> writes:
> > > I think it's an interesting thought of having a program which will test
> > > a system and work out the Accurate and Correct values for this.
> >
> > I think if you start out with the notion that there is an Accurate
> > and Correct value for these parameters, you've already lost the game.

I believe we can evolve and refine the model so it becomes more
accurate. We at least have to be willing to try; otherwise, I think THAT
is where we lose the game. :)

<snip>
> In my example, two computers with exactly the same hardware, except one has a
> 5400 RPM IDE drive, the other has a 10,000 RPM IDE drive. These machines should
> not use the same settings, it is obvious that a sequential scan block read on
> one will be faster than the other.

If we're going to do this bit properly, then we'll have to take into
consideration that many database objects will need their own individual
statistics. For example, let's say we have a database with a bunch of
10k RPM SCSI drives on which the tables sit, and the system also has one
or more 15k RPM SCSI drives (say, Seagate Cheetah II drives) on which
the indices have been placed. On the 10k RPM drives, the tables needing
the fastest throughput or having the highest usage are put on the outer
edges of the disk media, and the rest of the tables are placed in the
available space.
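
Just to make that concrete, below is a very rough sketch (in Python,
with the file path, sample count and so on purely made up for
illustration) of the kind of per-drive measurement I'm thinking of:
time sequential versus random 8kB page reads against a relation's data
file and look at the ratio, which is roughly what the sequential vs.
random page cost settings are trying to model.

#!/usr/bin/env python3
# Hypothetical sketch only: time sequential vs. random 8 kB page reads on a
# relation's data file and report the ratio. The path and sample count are
# placeholders, not anything PostgreSQL itself provides.

import os
import random
import time

PAGE_SIZE = 8192          # PostgreSQL's default block size
SAMPLES = 1000            # pages to read in each test

def time_sequential(path, samples=SAMPLES):
    """Read the first `samples` pages of the file in order."""
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for _ in range(samples):
            if not f.read(PAGE_SIZE):
                break
    return time.perf_counter() - start

def time_random(path, samples=SAMPLES):
    """Read `samples` pages at random offsets within the file."""
    pages = max(os.path.getsize(path) // PAGE_SIZE, 1)
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        for _ in range(samples):
            f.seek(random.randrange(pages) * PAGE_SIZE)
            f.read(PAGE_SIZE)
    return time.perf_counter() - start

if __name__ == "__main__":
    path = "/path/to/relation/file"   # placeholder; use a file bigger than RAM
    seq = time_sequential(path)
    rnd = time_random(path)
    print(f"sequential: {seq:.3f}s  random: {rnd:.3f}s  ratio: {rnd / seq:.1f}")

Run that against a file on each class of drive and you'd expect visibly
different ratios, which is exactly why one global set of numbers can't
describe this box.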

On this theoretical system, we would be better off measuring the
performance of each table and index in turn, then generating and storing
costs for each one which are as "accurate as possible at this point in
time". A model like this would probably re-calculate these costs each
time the ANALYZE command is run, to keep them accurate as the database
grows and changes.
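
Here's an equally hypothetical sketch of the "store a cost per object,
refresh it at ANALYZE time" side of it. Nothing like this exists in
PostgreSQL; the relation-to-file mapping and the measurement function
are pure assumptions (the measurement function could be something like
the timing ratio above).

# Hypothetical per-relation cost catalogue, refreshed whenever statistics
# are gathered. Not PostgreSQL code; names and structures are illustrative.

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class RelationCosts:
    seq_page_cost: float      # sequential page read, used as the unit cost
    random_page_cost: float   # random page read, relative to sequential

# per-relation costs, keyed by relation name
cost_catalog: Dict[str, RelationCosts] = {}

def refresh_costs(relation_files: Dict[str, str],
                  measure_ratio: Callable[[str], float]) -> None:
    """Re-measure and store costs for each relation's data file.

    Meant to run alongside ANALYZE so the numbers keep tracking database
    growth and physical placement on the disks.
    """
    for name, path in relation_files.items():
        ratio = measure_ratio(path)
        # Sequential reads are the baseline; random reads are scaled by the
        # measured ratio for this particular relation's drive.
        cost_catalog[name] = RelationCosts(seq_page_cost=1.0,
                                           random_page_cost=ratio)

The planner could then look up the relation it is costing instead of
falling back on one global pair of numbers.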

I think this would be decently accurate, and it would handle RAID
systems properly as well. I don't know how to take large cache sizes
into account, though. :)

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi
