Skip site navigation (1) Skip section navigation (2)

Peripheral Links

Header And Logo

PostgreSQL
| The world's most advanced open source database.

Site Navigation

Search archives
  Advanced Search

Re: Load a database into memory


  • From: Guillaume Le'mery <glemery(at)comclick(dot)com>
  • To: Denis Perchine <dyp(at)perchine(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
  • Subject: Re: Load a database into memory
  • Date: Fri, 26 Jan 2001 09:25:47 +0100
  • Message-id: <3A71348B.9020000@comclick.com> <text/plain>


For memory :

CREATE TABLE accord_editeur
(
id_regie int4 not null,
num_editeur int4 not null,
num_site int4 not null,
num_emplacement int4 not null,
num_campagne int4 not null,
num_publicite int4 not null,
num_periode int4,
par_id_technologie int4 not null,
affichage_possible int4 default 0,
ponderation_calculee int4,
date_pilotage timestamp NULL,
id_ct1 int4,
PRIMARY
KEY(id_regie,num_editeur,num_site,num_emplacement,num_campagne,num_publicit
e) );

(I've got a primary key on multiple fields because of the complexity of
my database. It's the only way to have unique record.)

CREATE TABLE parametre
(
id_parametre int4 not null primary key,
id_regie int4 NULL ,
par_id_parametre int4 NULL ,
type INT4 not null,
valeur_str varchar null,
valeur_int int4 null,
valeur_fl float8 null,
valeur_txt varchar,
date_pilotage timestamp NULL,
id_ct1 int4
);
CREATE INDEX parametre_tracking_idx ON parametre(type, valeur_int);

The query :
SELECT ae.id_regie,
ae.num_campagne,
ae.num_publicite,
ae.ponderation_calculee * random(),
>> FROM accord_editeur ae,
>> parametre par
>> WHERE ae.id_regie=1
>> AND ae.num_editeur = 1494
>> AND ae.num_site = 1
>> AND ae.num_emplacement = 1
>> AND ae.affichage_possible = 1
>> AND ae.par_id_technologie = par.id_parametre
>> AND par.type = 10
>> AND par.valeur_int = 1
>>
>> And the Explain :
>> NOTICE:  QUERY PLAN:
>>
>> Nested Loop  (cost=0.00..228.27 rows=1 width=56)
>> -> Index Scan using accord_editeur_pkey on accord_editeur ae
>> (cost=0.00..225.50 rows=1 width=48)
>> -> Index Scan using parametre_tracking_idx on parametre par
>> (cost=0.00..2.02 rows=1 width=8)
>>
>> EXPLAIN

How much tuples do you usually get from this query?
About 100.

random() can be quite expensive.
Ok, Here is with the change :

Query whitout the random() :
SELECT ae.id_regie,
ae.num_campagne,
ae.num_publicite,
ae.ponderation_calculee,
ae.num_periode
FROM accord_editeur ae,
parametre par
WHERE ae.id_regie=1
AND ae.num_editeur = 1494
AND ae.num_site = 1
AND ae.num_emplacement = 1
AND ae.affichage_possible = 1
AND ae.par_id_technologie = par.id_parametre
AND par.type = 10
AND par.valeur_int = 1

Explain :
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..228.27 rows=1 width=56)
-> Index Scan using parametre_tracking_idx on parametre par (cost=0.00..2.02 rows=1 width=8) -> Index Scan using accord_editeur_pkey on accord_editeur ae (cost=0.00..225.50 rows=60 width=48)

EXPLAIN

random() is not quite expensive... I'll do it in my C code...


Also please give us output of
vacuum verbose analyze parametre;
NOTICE:  --Relation parametre--
NOTICE: Pages 3: Changed 0, reaped 0, Empty 0, New 0; Tup 235: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 64, MaxLen 148; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
NOTICE:  Index parametre_pkey: Pages 2; Tuples 235. CPU 0.00s/0.00u sec.
NOTICE: Index parametre_tracking_idx: Pages 2; Tuples 235. CPU 0.00s/0.00u sec. NOTICE: Index parametre_idct1_idx: Pages 2; Tuples 235. CPU 0.00s/0.00u sec.
VACUUM

vacuum verbose analyze accord_editeur;
NOTICE:  --Relation accord_editeur--
NOTICE: Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124; Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU 0.21s/1.85u sec. NOTICE: Index accord_editeur_pkey: Pages 3339; Tuples 447032: Deleted 0. CPU 0.19s/1.01u sec.
VACUUM

But it seems that the problem is in your too large index. It is quite expensive to traverse it (note that all cost flows from index scan over it).

I do not know the exact statictics for your data, but if you create an index with small amount of fields, which account vast of the data, query should became much faster, if optimizer will decide to use this index. As a first try you can create index on num_editeur. Do not forget to do vacuum analyze accord_editeur; after this.

OK, I dropped the large index and only set one on num_editeur.

Vacuum verbose analyze accord_editeur :
NOTICE:  --Relation accord_editeur--
NOTICE: Pages 7096: Changed 0, reaped 6790, Empty 0, New 0; Tup 447032: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 23768, MinLen 124, MaxLen 124; Re-using: Free/Avail. Space 721776/0; EndEmpty/Avail. Pages 0/0. CPU 0.33s/1.74u sec. NOTICE: Index accord_editeur_tracking_idx: Pages 1330; Tuples 447032: Deleted 0. CPU 0.05s/0.93u sec.
VACUUM

Here is the new Explain (always whithout the random()) :
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..512.65 rows=1 width=56)
-> Index Scan using parametre_tracking_idx on parametre par (cost=0.00..2.02 rows=1 width=8) -> Index Scan using accord_editeur_tracking_idx on accord_editeur ae (cost=0.00..509.88 rows=60 width=48)

EXPLAIN

So, the large index seems to be better...
Or maybe I should try with a hash index ?

If I hash the num_editeur in another field and I set my index on these 2 fields, it gives me better cost, but performance is always slow :-(


In fact it is a CPU-Bound... But why ?
One query takes up to 14% of CPU usage ! The server is a bi-proc PIII 660.
And Id'like to run more than 100 at the same time...

Maybe if I set hash indices on single fields instead of one index on
multiple columns ?

Thanks,

Guillaume.




Home | Main Index | Thread Index

Privacy Policy | About PostgreSQL
Copyright © 1996 – 2012 PostgreSQL Global Development Group