Tuning queries on large database

From: Valerie Schneider DSI/DEV <Valerie(dot)Schneider(at)meteo(dot)fr>
To: pgsql-performance(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Cc: Valerie(dot)Schneider(at)meteo(dot)fr
Subject: Tuning queries on large database
Date: 2004-08-04 12:44:43
Message-ID: 200408041244.i74CihO19344@mu.meteo.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-performance

Hi,

I have some problem of performance on a PG database, and I don't
know how to improve. I Have two questions : one about the storage
of data, one about tuning queries. If possible !

My job is to compare Oracle and Postgres. All our operational databases
have been running under Oracle for about fifteen years. Now I try to replace
Oracle by Postgres.

I have a test platform under linux (Dell server, 4 Gb RAM, bi-processor,
Linux Red Hat 9 (2.4.20-31.9)) with 2 databases, 1 with Oracle
(V8i or V9i it's quite the same), 1 with PG (7.4.2). Both databases
have the same structure, same content, about 100 Gb each. I developped
some benches, representative of our use of databases. My problem
is that I have tables (relations) with more than 100 millions rows,
and each row has about 160 fields and an average size 256 bytes.

For Oracle I have a SGA size of 500 Mb.
For PG I have a postgresql.conf as :
max_connections = 1500
shared_buffers = 30000
sort_mem = 50000
effective_cache_size = 200000
and default value for other parameters.

I have a table named "data" which looks like this :
bench=> \d data
Table "public.data"
Column | Type | Modifiers
------------+-----------------------------+-----------
num_poste | numeric(9,0) | not null
dat | timestamp without time zone | not null
datrecu | timestamp without time zone | not null
rr1 | numeric(5,1) |
qrr1 | numeric(2,0) | ...
... all numeric fields
...
Indexes:
"pk_data" primary key, btree (num_poste, dat)
"i_data_dat" btree (dat)

It contains 1000 different values of "num_poste" and for each one
125000 different values of "dat" (1 row per hour, 15 years).

I run a vacuum analyze of the table.

bench=> select * from tailledb ;
schema | relfilenode | table | index | reltuples | size
--------+-------------+------------------+------------+-------------+----------
public | 125615917 | data | | 1.25113e+08 | 72312040
public | 251139049 | data | i_data_dat | 1.25113e+08 | 2744400
public | 250870177 | data | pk_data | 1.25113e+08 | 4395480

My first remark is that the table takes a lot of place on disk, about
70 Gb, instead of 35 Gb with oracle.
125 000 000 rows x 256 b = about 32 Gb. This calculation gives an idea
not so bad for oracle. What about for PG ? How data is stored ?

The different queries of the bench are "simple" queries (no join,
sub-query, ...) and are using indexes (I "explained" each one to
be sure) :
Q1 select_court : access to about 700 rows : 1 "num_poste" and 1 month
(using PK : num_poste=p1 and dat between p2 and p3)
Q2 select_moy : access to about 7000 rows : 10 "num_poste" and 1 month
(using PK : num_poste between p1 and p1+10 and dat between p2 and p3)
Q3 select_long : about 250 000 rows : 2 "num_poste"
(using PK : num_poste in (p1,p1+2))
Q4 select_tres_long : about 3 millions rows : 25 "num_poste"
(using PK : num_poste between p1 and p1 + 25)

The result is that for "short queries" (Q1 and Q2) it runs in a few
seconds on both Oracle and PG. The difference becomes important with
Q3 : 8 seconds with oracle
80 sec with PG
and too much with Q4 : 28s with oracle
17m20s with PG !

Of course when I run 100 or 1000 parallel queries such as Q3 or Q4,
it becomes a disaster !
I can't understand these results. The way to execute queries is the
same I think. I've read recommended articles on the PG site.
I tried with a table containing 30 millions rows, results are similar.

What can I do ?

Thanks for your help !

********************************************************************
* Les points de vue exprimes sont strictement personnels et *
* n'engagent pas la responsabilite de METEO-FRANCE. *
********************************************************************
* Valerie SCHNEIDER Tel : +33 (0)5 61 07 81 91 *
* METEO-FRANCE / DSI/DEV Fax : +33 (0)5 61 07 81 09 *
* 42, avenue G. Coriolis Email : Valerie(dot)Schneider(at)meteo(dot)fr *
* 31057 TOULOUSE Cedex - FRANCE http://www.meteo.fr *
********************************************************************

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2004-08-04 13:06:54 Re: [PERFORM] Tuning queries on large database
Previous Message terry 2004-08-04 12:07:55 Re: trash talk

Browse pgsql-performance by date

  From Date Subject
Next Message Paul Serby 2004-08-04 12:45:55 The black art of postgresql.conf tweaking
Previous Message Ulrich Wisser 2004-08-04 12:00:39 How to know which queries are to be optimised?