Re: performance with query

From: Alberto Dalmaso <dalmaso(at)clesius(dot)it>
To: pgsql-performance(at)postgresql(dot)org
Subject: performance with query
Date: 2009-06-16 13:37:42
Message-ID: 1245159462.5027.16.camel@dalmaso-opensuse.cl
Lists: pgsql-performance

Hi everybody, I'm building my database on Postgres, and after some days
of hard work I've ended up with good performance and awful performance
under the same configuration.
I have a complex query that performs very well with mergejoin on and
nestloop off.
If I activate nestloop, Postgres tries to use it and the query execution
becomes inconclusive: after 3 hours there was still no answer, so I
killed the query.
That's OK, but with this configuration very simple little queries like
"select column from table where primarykey = value" become incredibly slow.
The only solution I have found so far is to set mergejoin to off before
running this kind of query.
That is an awful solution, because it means changing a big, old
application in the many places where this kind of query is used (setting
mergejoin to off for the whole system and re-enabling it on the
connections that run the hard query has the same problem).
Do you have any suggestion to accelerate both the complex and the simple
queries?
I've tried a lot of combinations of the "Planner Method Configuration"
settings, but the only combination that really accelerates the hard
query is mergejoin on and nestloop off; the other settings seem to be
useless.
Thanks in advance.


From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: Alberto Dalmaso <dalmaso(at)clesius(dot)it>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance with query
Date: 2009-06-16 14:13:22
Message-ID: 20090616141322.GB23544@eddie
Lists: pgsql-performance

On Tue, Jun 16, 2009 at 03:37:42PM +0200, Alberto Dalmaso wrote:
> Hi everybody, I'm building my database on Postgres, and after some days
> of hard work I've ended up with good performance and awful performance
> under the same configuration.
> I have a complex query that performs very well with mergejoin on and
> nestloop off.
> If I activate nestloop, Postgres tries to use it and the query execution
> becomes inconclusive: after 3 hours there was still no answer, so I
> killed the query.
> That's OK, but with this configuration very simple little queries like
> "select column from table where primarykey = value" become incredibly slow.
> The only solution I have found so far is to set mergejoin to off before
> running this kind of query.
> That is an awful solution, because it means changing a big, old
> application in the many places where this kind of query is used (setting
> mergejoin to off for the whole system and re-enabling it on the
> connections that run the hard query has the same problem).
> Do you have any suggestion to accelerate both the complex and the simple
> queries?
> I've tried a lot of combinations of the "Planner Method Configuration"
> settings, but the only combination that really accelerates the hard
> query is mergejoin on and nestloop off; the other settings seem to be
> useless.
> Thanks in advance.

It would be helpful if you posted EXPLAIN ANALYZE results for both queries.
This will require you to run each query to completion; if that's not possible
for the 3 hour query, at least run EXPLAIN and post those results.
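For example (only the first form actually executes the query):

EXPLAIN ANALYZE SELECT ... ;  -- runs the query, reports actual times and row counts
EXPLAIN SELECT ... ;          -- estimates only, safe for the 3-hour query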

- Josh / eggyknap


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alberto Dalmaso" <dalmaso(at)clesius(dot)it>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance with query
Date: 2009-06-16 14:21:18
Message-ID: 4A37640E0200002500027BCF@gw.wicourts.gov
Lists: pgsql-performance

Alberto Dalmaso <dalmaso(at)clesius(dot)it> wrote:

> I have a complex query that performs very well with mergejoin on and
> nestloop off.
> If I activate nestloop, Postgres tries to use it and the query
> execution becomes inconclusive: after 3 hours there was still no
> answer, so I killed the query.
> That's OK, but with this configuration very simple little queries
> like "select column from table where primarykey = value" become
> incredibly slow.
> The only solution I have found so far is to set mergejoin to off
> before running this kind of query.

We'll need a lot more information to be able to provide useful
advice.

What version of PostgreSQL?

What OS?

What does the hardware look like? (CPUs, drives, memory, etc.)

Do you have autovacuum running? What other regular maintenance do you
do?

What does your postgresql.conf file look like? (If you can strip out
all comments and show the rest, that would be great.)

With that as background, if you can show us the schema for the
table(s) involved and the text of a query, along with the EXPLAIN
ANALYZE output (or just EXPLAIN, if the query runs too long to get the
EXPLAIN ANALYZE results) that would allow us to see where things are
going wrong. Please show this information without setting any of the
optimizer options off; but then, as a diagnostic step, *also* show
EXPLAIN ANALYZE results when you set options to a configuration that
runs faster.

-Kevin


From: Alberto Dalmaso <dalmaso(at)clesius(dot)it>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance with query
Date: 2009-06-16 14:45:53
Message-ID: 1245163553.5027.33.camel@dalmaso-opensuse.cl
Lists: pgsql-performance

> What version of PostgreSQL?
8.3 that comes with opensuse 11.1
>
> What OS?
Linux, opensuse 11.1 64 bit
>
> What does the hardware look like? (CPUs, drives, memory, etc.)
2 * dual-core Opterons, 8 GB RAM, 70 GB SCSI U320 RAID 1
>
> Do you have autovacuum running? What other regular maintenance to you
> do?
Yes, autovacuum and analyze are running; the only other regular activity
is the WAL backup.
>
> What does your postgresql.conf file look like? (If you can strip out
> all comments and show the rest, that would be great.)

I'll post only the values I've changed:

shared_buffers = 1536MB
temp_buffers = 5MB
max_prepared_transactions = 30

work_mem = 50MB    # I have a lot of ORDER BY work
maintenance_work_mem = 50MB
max_stack_depth = 6MB

max_fsm_pages = 160000
max_fsm_relations = 5000

wal_buffers = 3072kB

enable_bitmapscan = on
enable_hashagg = on
enable_hashjoin = off
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = off
enable_seqscan = off
enable_sort = off
enable_tidscan = on

effective_cache_size = 3600MB

geqo = off
default_statistics_target = 100

>
> With that as background, if you can show us the schema for the
> table(s) involved and the text of a query, along with the EXPLAIN
> ANALYZE output (or just EXPLAIN, if the query runs too long to get the
> EXPLAIN ANALYZE results) that would allow us to wee where things are
> going wrong. Please show this information without setting any of the
> optimizer options off; but then, as a diagnostic step, *also* show
> EXPLAIN ANALYZE results when you set options to a configuration that
> runs faster.
>
> -Kevin

The problem is that the simple query uses a merge join instead of a
nested loop (obvious, given the parameters I set), and in this situation
it becomes very, very slow (15 s versus 5 ms when I set mergejoin to off).

This is the EXPLAIN of the complex query, which runs with more than
acceptable performance:

"Merge Right Join (cost=508603077.17..508603195.59 rows=1 width=227)"
" Merge Cond: (ve_edil_rendite.id_domanda = domande.id_domanda)"
" -> GroupAggregate (cost=0.00..105.51 rows=1031 width=11)"
" -> Index Scan using pk_ve_edil_rendite on ve_edil_rendite
(cost=0.00..86.84 rows=1157 width=11)"
" -> Materialize (cost=508603077.17..508603077.18 rows=1 width=195)"
" -> Nested Loop (cost=506932259.90..508603077.17 rows=1
width=195)"
" -> Merge Join (cost=406932259.90..408603074.89 rows=1
width=188)"
" Merge Cond: (domande.id_domanda =
c_elaout_7.id_domanda)"
" -> Merge Join (cost=406932259.90..408188339.97
rows=1 width=240)"
" Merge Cond: (c_elaout_5.id_domanda =
domande.id_domanda)"
" -> Merge Join (cost=3895.15..1259628.81
rows=138561 width=41)"
" Merge Cond: (edil_veneto.id_domanda =
c_elaout_5.id_domanda)"
" -> Merge Join
(cost=1123.18..372710.75 rows=98122 width=29)"
" Merge Cond:
(edil_veneto.id_domanda = c_elaout_6.id_domanda)"
" -> Index Scan using
"IDX_pk_Edil_Veneto" on edil_veneto (cost=0.00..11825.14 rows=232649
width=17)"
" -> Index Scan using
"IDX_3_c_elaout" on c_elaout c_elaout_6 (cost=0.00..359914.34
rows=98122 width=12)"
" Index Cond:
((c_elaout_6.node)::text = 'contributo_sociale'::text)"
" -> Index Scan using "IDX_3_c_elaout"
on c_elaout c_elaout_5 (cost=0.00..887091.20 rows=245306 width=12)"
" Index Cond:
((c_elaout_5.node)::text = 'contributo'::text)"
" -> Materialize
(cost=406928364.74..406928364.75 rows=1 width=199)"
" -> Nested Loop
(cost=402583154.89..406928364.74 rows=1 width=199)"
" Join Filter:
((r_enti.codice_ente)::text = (r_luoghi.cod_catastale)::text)"
" -> Merge Join
(cost=202583154.89..206928031.60 rows=1 width=198)"
" Merge Cond:
(domande.id_domanda = c_elaout_4.id_domanda)"
" -> Merge Join
(cost=202583154.89..206425374.54 rows=1 width=186)"
" Merge Cond:
(domande.id_domanda = c_elain_3.id_domanda)"
" -> Merge Join
(cost=201328203.80..205170407.27 rows=41 width=138)"
" Merge Cond:
(domande.id_domanda = c_elain_7.id_domanda)"
" -> Merge Join
(cost=201328203.80..204498966.35 rows=93 width=126)"
" Merge
Cond: (domande.id_domanda = c_elain_9.id_domanda)"
" -> Merge
Join (cost=201322293.83..203828121.81 rows=424 width=114)"
"
Merge Cond: (domande.id_domanda = c_elain_8.id_domanda)"
" ->
Nested Loop (cost=201318498.02..203164011.74 rows=2431 width=102)"
"
-> Merge Join (cost=101318498.02..103147289.10 rows=2431 width=79)"
"
Merge Cond: (domande.id_domanda = doc.id)"
"
-> Merge Join (cost=101318487.80..103060677.64 rows=2493 width=75)"
"
Merge Cond: (domande.id_domanda = c_elain_1.id_domanda)"
"
-> Merge Join (cost=101316002.90..102447327.03 rows=15480 width=63)"
"
Merge Cond: (domande.id_domanda = c_elain.id_domanda)"
"
-> Merge Join (cost=101314975.72..101780946.74 rows=88502 width=51)"
"
Merge Cond: (c_elain_2.id_domanda = domande.id_domanda)"
"
-> Index Scan using "IDX_1_c_elain" on c_elain c_elain_2
(cost=0.00..461104.96 rows=129806 width=12)"
"
Index Cond: ((node)::text = 'N_componenti'::text)"
"
-> Sort (cost=101314967.66..101316800.15 rows=732995 width=39)"
"
Sort Key: domande.id_domanda"
"
-> Merge Join (cost=119414.31..1243561.32 rows=732995 width=39)"
"
Merge Cond: (domande.id_dichiarazione =
generiche_data_nascita_piu_anziano.id_dichiarazione)"
"
-> Merge Join (cost=18770.82..1126115.64 rows=123933 width=39)"
"
Merge Cond: (domande.id_dichiarazione = c_elaout.id_domanda)"
"
-> Index Scan using "IDX_5_domande" on domande (cost=0.00..91684.40
rows=31967 width=27)"
"
Index Cond: (id_servizio = 11002)"
"
Filter: (id_ente > 0)"
"
-> Index Scan using "IDX_2_c_elaout" on c_elaout
(cost=0.00..1031179.16 rows=805279 width=12)"
"
Filter: ((c_elaout.node)::text = 'ISEE'::text)"
"
-> Materialize (cost=100643.49..106653.58 rows=601009 width=12)"
"
-> Subquery Scan generiche_data_nascita_piu_anziano
(cost=0.00..100042.48 rows=601009 width=12)"
"
-> GroupAggregate (cost=0.00..94032.39 rows=601009 width=12)"
"
-> Index Scan using "IDX_1_componenti" on componenti
(cost=0.00..76403.45 rows=2023265 width=12)"
"
-> Index Scan using "IDX_1_c_elain" on c_elain (cost=0.00..665581.51
rows=188052 width=12)"
"
Index Cond: ((c_elain.node)::text = 'VSE'::text)"
"
-> Index Scan using "IDX_1_c_elain" on c_elain c_elain_1
(cost=0.00..613000.48 rows=173074 width=12)"
"
Index Cond: ((c_elain_1.node)::text = 'AffittoISEE'::text)"
"
-> Index Scan using pk_doc on doc (cost=0.00..81963.12 rows=1847118
width=4)"
"
Filter: (doc.id_tp_stato_doc = 1)"
"
-> Index Scan using "IDX_pk_R_Enti" on r_enti (cost=0.00..6.87 rows=1
width=31)"
"
Index Cond: (r_enti.id_ente = domande.id_ente)"
" ->
Index Scan using "IDX_1_c_elain" on c_elain c_elain_8
(cost=0.00..663631.02 rows=187497 width=12)"
"
Index Cond: ((c_elain_8.node)::text = 'Spese'::text)"
" -> Index
Scan using "IDX_2_c_elain" on c_elain c_elain_9 (cost=0.00..670253.16
rows=235758 width=12)"
"
Filter: ((c_elain_9.node)::text = 'Mesi'::text)"
" -> Index Scan
using "IDX_2_c_elain" on c_elain c_elain_7 (cost=0.00..670253.16
rows=474845 width=12)"
" Filter:
((c_elain_7.node)::text = 'Affitto'::text)"
" -> Materialize
(cost=1254951.09..1254963.95 rows=1286 width=48)"
" -> Merge Join
(cost=2423.84..1254949.80 rows=1286 width=48)"
" Merge
Cond: (c_elain_3.id_domanda = c_elaout_1.id_domanda)"
" -> Merge
Join (cost=1094.64..606811.53 rows=1492 width=36)"
"
Merge Cond: (c_elain_3.id_domanda = c_elaout_3.id_domanda)"
" ->
Merge Join (cost=224.20..182997.39 rows=2667 width=24)"
"
Merge Cond: (c_elain_3.id_domanda = c_elaout_2.id_domanda)"
"
-> Index Scan using "IDX_1_c_elain" on c_elain c_elain_3
(cost=0.00..74101.14 rows=19621 width=12)"
"
Index Cond: ((node)::text = 'Solo_anziani'::text)"
"
-> Index Scan using "IDX_3_c_elaout" on c_elaout c_elaout_2
(cost=0.00..108761.74 rows=28155 width=12)"
"
Index Cond: ((c_elaout_2.node)::text = 'ise_fsa'::text)"
" ->
Index Scan using "IDX_3_c_elaout" on c_elaout c_elaout_3
(cost=0.00..423543.07 rows=115886 width=12)"
"
Index Cond: ((c_elaout_3.node)::text = 'incidenza'::text)"
" -> Index
Scan using "IDX_3_c_elaout" on c_elaout c_elaout_1
(cost=0.00..647740.85 rows=178481 width=12)"
"
Index Cond: ((c_elaout_1.node)::text = 'isee_fsa'::text)"
" -> Index Scan using
"IDX_3_c_elaout" on c_elaout c_elaout_4 (cost=0.00..502312.35
rows=137879 width=12)"
" Index Cond:
((c_elaout_4.node)::text = 'esito'::text)"
" -> Seq Scan on r_luoghi
(cost=100000000.00..100000200.84 rows=10584 width=11)"
" -> Index Scan using "IDX_3_c_elaout" on c_elaout
c_elaout_7 (cost=0.00..414451.53 rows=113348 width=12)"
" Index Cond: ((c_elaout_7.node)::text =
'contributo_regolare'::text)"
" -> Index Scan using "IDX_pk_VE_EDIL_tp_superfici" on
ve_edil_tp_superfici (cost=0.00..2.27 rows=1 width=11)"
" Index Cond: (ve_edil_tp_superfici.id_tp_superficie
= edil_veneto.id_tp_superficie)"

And this is the EXPLAIN of the too-slow simple query:

"Merge Join (cost=0.00..1032305.52 rows=4 width=12)"
" Merge Cond: (domande.id_dichiarazione = c_elaout.id_domanda)"
" -> Index Scan using "IDX_8_domande" on domande (cost=0.00..8.39
rows=1 width=4)"
" Index Cond: (id_domanda = 4165757)"
" -> Index Scan using "IDX_2_c_elaout" on c_elaout
(cost=0.00..1030283.89 rows=805279 width=12)"
" Filter: ((c_elaout.node)::text = 'Invalido'::text)"

This takes 15 seconds.

With mergejoin set to off:

"Nested Loop (cost=100000000.00..100000022.97 rows=4 width=12)"
" -> Index Scan using "IDX_8_domande" on domande (cost=0.00..8.39
rows=1 width=4)"
" Index Cond: (id_domanda = 4165757)"
" -> Index Scan using "IDX_2_c_elaout" on c_elaout (cost=0.00..14.54
rows=4 width=12)"
" Index Cond: (c_elaout.id_domanda = domande.id_dichiarazione)"
" Filter: ((c_elaout.node)::text = 'Invalido'::text)"

This takes 15 msec!!!

This query works fine even with:
set enable_mergejoin='on';
set enable_nestloop='on';

"Nested Loop (cost=0.00..22.97 rows=4 width=12) (actual
time=10.110..10.122 rows=1 loops=1)"
" -> Index Scan using "IDX_8_domande" on domande (cost=0.00..8.39
rows=1 width=4) (actual time=0.071..0.075 rows=1 loops=1)"
" Index Cond: (id_domanda = 4165757)"
" -> Index Scan using "IDX_2_c_elaout" on c_elaout (cost=0.00..14.54
rows=4 width=12) (actual time=10.029..10.031 rows=1 loops=1)"
" Index Cond: (c_elaout.id_domanda = domande.id_dichiarazione)"
" Filter: ((c_elaout.node)::text = 'Invalido'::text)"
"Total runtime: 10.211 ms"

But in this situation the previous (complex) kind of query never
finishes, and its plan becomes:
"Merge Right Join (cost=100707011.72..100707130.15 rows=1 width=227)"
" Merge Cond: (ve_edil_rendite.id_domanda = domande.id_domanda)"
" -> GroupAggregate (cost=0.00..105.51 rows=1031 width=11)"
" -> Index Scan using pk_ve_edil_rendite on ve_edil_rendite
(cost=0.00..86.84 rows=1157 width=11)"
" -> Materialize (cost=100707011.72..100707011.73 rows=1 width=195)"
" -> Nested Loop (cost=100689558.36..100707011.72 rows=1
width=195)"
" -> Nested Loop (cost=100689558.36..100706997.17 rows=1
width=247)"
" -> Nested Loop (cost=100689558.36..100706982.62
rows=1 width=235)"
" Join Filter: ((r_enti.codice_ente)::text =
(r_luoghi.cod_catastale)::text)"
" -> Nested Loop (cost=689558.36..706649.48
rows=1 width=234)"
" -> Nested Loop
(cost=689558.36..706647.20 rows=1 width=227)"
" -> Nested Loop
(cost=689558.36..706632.65 rows=1 width=215)"
" -> Nested Loop
(cost=689558.36..706618.10 rows=1 width=203)"
" Join Filter:
(domande.id_domanda = edil_veneto.id_domanda)"
" -> Index Scan using
"IDX_pk_Edil_Veneto" on edil_veneto (cost=0.00..11825.14 rows=232649
width=17)"
" -> Materialize
(cost=689558.36..689558.37 rows=1 width=186)"
" -> Nested Loop
(cost=100643.49..689558.36 rows=1 width=186)"
" ->
Nested Loop (cost=100643.49..689543.81 rows=1 width=174)"
" ->
Nested Loop (cost=100643.49..689530.86 rows=1 width=162)"
"
-> Nested Loop (cost=100643.49..689517.93 rows=1 width=150)"
"
-> Nested Loop (cost=100643.49..689505.01 rows=1 width=138)"
"
-> Nested Loop (cost=100643.49..689490.46 rows=1 width=126)"
"
-> Nested Loop (cost=100643.49..688816.73 rows=44 width=114)"
"
-> Merge Join (cost=100643.49..657277.54 rows=2431 width=102)"
"
Merge Cond: (domande.id_dichiarazione =
generiche_data_nascita_piu_anziano.id_dichiarazione)"
"
-> Nested Loop (cost=0.00..549096.04 rows=412 width=102)"
"
-> Nested Loop (cost=0.00..547345.02 rows=106 width=90)"
"
-> Nested Loop (cost=0.00..546615.85 rows=106 width=67)"
"
-> Nested Loop (cost=0.00..545694.51 rows=109 width=63)"
"
-> Nested Loop (cost=0.00..537605.96 rows=621 width=51)"
"
-> Nested Loop (cost=0.00..487675.59 rows=3860 width=39)"
"
-> Index Scan using "IDX_5_domande" on domande (cost=0.00..91684.40
rows=31967 width=27)"
"
Index Cond: (id_servizio = 11002)"
"
Filter: (id_ente > 0)"
"
-> Index Scan using "IDX_1_c_elain" on c_elain c_elain_2
(cost=0.00..12.37 rows=1 width=12)"
"
Index Cond: (((c_elain_2.node)::text = 'N_componenti'::text) AND
(c_elain_2.id_domanda = domande.id_domanda))"
"
-> Index Scan using "IDX_1_c_elain" on c_elain c_elain_1
(cost=0.00..12.92 rows=1 width=12)"
"
Index Cond: (((c_elain_1.node)::text = 'AffittoISEE'::text) AND
(c_elain_1.id_domanda = domande.id_domanda))"
"
-> Index Scan using "IDX_1_c_elain" on c_elain (cost=0.00..13.01
rows=1 width=12)"
"
Index Cond: (((c_elain.node)::text = 'VSE'::text) AND
(c_elain.id_domanda = domande.id_domanda))"
"
-> Index Scan using pk_doc on doc (cost=0.00..8.44 rows=1 width=4)"
"
Index Cond: (doc.id = domande.id_domanda)"
"
Filter: (doc.id_tp_stato_doc = 1)"
"
-> Index Scan using "IDX_pk_R_Enti" on r_enti (cost=0.00..6.87 rows=1
width=31)"
"
Index Cond: (r_enti.id_ente = domande.id_ente)"
"
-> Index Scan using "IDX_2_c_elaout" on c_elaout (cost=0.00..16.47
rows=4 width=12)"
"
Index Cond: (c_elaout.id_domanda = domande.id_dichiarazione)"
"
Filter: ((c_elaout.node)::text = 'ISEE'::text)"
"
-> Materialize (cost=100643.49..106653.58 rows=601009 width=12)"
"
-> Subquery Scan generiche_data_nascita_piu_anziano
(cost=0.00..100042.48 rows=601009 width=12)"
"
-> GroupAggregate (cost=0.00..94032.39 rows=601009 width=12)"
"
-> Index Scan using "IDX_1_componenti" on componenti
(cost=0.00..76403.45 rows=2023265 width=12)"
"
-> Index Scan using "IDX_1_c_elain" on c_elain c_elain_3
(cost=0.00..12.96 rows=1 width=12)"
"
Index Cond: (((c_elain_3.node)::text = 'Solo_anziani'::text) AND
(c_elain_3.id_domanda = domande.id_domanda))"
"
-> Index Scan using "IDX_3_c_elaout" on c_elaout c_elaout_2
(cost=0.00..15.30 rows=1 width=12)"
"
Index Cond: (((c_elaout_2.node)::text = 'ise_fsa'::text) AND
(c_elaout_2.id_domanda = domande.id_domanda))"
"
-> Index Scan using "IDX_2_c_elaout" on c_elaout c_elaout_3
(cost=0.00..14.54 rows=1 width=12)"
"
Index Cond: (c_elaout_3.id_domanda = domande.id_domanda)"
"
Filter: ((c_elaout_3.node)::text = 'incidenza'::text)"
"
-> Index Scan using "IDX_2_c_elain" on c_elain c_elain_9
(cost=0.00..12.91 rows=1 width=12)"
"
Index Cond: (c_elain_9.id_domanda = domande.id_domanda)"
"
Filter: ((c_elain_9.node)::text = 'Mesi'::text)"
"
-> Index Scan using "IDX_2_c_elain" on c_elain c_elain_8
(cost=0.00..12.91 rows=1 width=12)"
"
Index Cond: (c_elain_8.id_domanda = domande.id_domanda)"
"
Filter: ((c_elain_8.node)::text = 'Spese'::text)"
" ->
Index Scan using "IDX_2_c_elain" on c_elain c_elain_7 (cost=0.00..12.91
rows=3 width=12)"
"
Index Cond: (c_elain_7.id_domanda = domande.id_domanda)"
"
Filter: ((c_elain_7.node)::text = 'Affitto'::text)"
" -> Index
Scan using "IDX_2_c_elaout" on c_elaout c_elaout_1 (cost=0.00..14.54
rows=1 width=12)"
"
Index Cond: (c_elaout_1.id_domanda = domande.id_domanda)"
"
Filter: ((c_elaout_1.node)::text = 'isee_fsa'::text)"
" -> Index Scan using
"IDX_2_c_elaout" on c_elaout c_elaout_7 (cost=0.00..14.54 rows=1
width=12)"
" Index Cond:
(c_elaout_7.id_domanda = domande.id_domanda)"
" Filter:
((c_elaout_7.node)::text = 'contributo_regolare'::text)"
" -> Index Scan using
"IDX_2_c_elaout" on c_elaout c_elaout_6 (cost=0.00..14.54 rows=1
width=12)"
" Index Cond:
(c_elaout_6.id_domanda = domande.id_domanda)"
" Filter:
((c_elaout_6.node)::text = 'contributo_sociale'::text)"
" -> Index Scan using
"IDX_pk_VE_EDIL_tp_superfici" on ve_edil_tp_superfici (cost=0.00..2.27
rows=1 width=11)"
" Index Cond:
(ve_edil_tp_superfici.id_tp_superficie = edil_veneto.id_tp_superficie)"
" -> Seq Scan on r_luoghi
(cost=100000000.00..100000200.84 rows=10584 width=11)"
" -> Index Scan using "IDX_2_c_elaout" on c_elaout
c_elaout_5 (cost=0.00..14.54 rows=1 width=12)"
" Index Cond: (c_elaout_5.id_domanda =
domande.id_domanda)"
" Filter: ((c_elaout_5.node)::text =
'contributo'::text)"
" -> Index Scan using "IDX_2_c_elaout" on c_elaout
c_elaout_4 (cost=0.00..14.54 rows=1 width=12)"
" Index Cond: (c_elaout_4.id_domanda =
domande.id_domanda)"
" Filter: ((c_elaout_4.node)::text = 'esito'::text)"

Many thanks for your interest and your help!


From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: Alberto Dalmaso <dalmaso(at)clesius(dot)it>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance with query
Date: 2009-06-16 14:58:56
Message-ID: alpine.DEB.2.00.0906161554550.4337@aragorn.flymine.org
Lists: pgsql-performance

On Tue, 16 Jun 2009, Alberto Dalmaso wrote:
>> What does your postgresql.conf file look like?

> enable_hashjoin = off
> enable_nestloop = off
> enable_seqscan = off
> enable_sort = off

Why are these switched off?

> And this is the EXPLAIN of the too-slow simple query:
>
> Merge Join  (cost=0.00..1032305.52 rows=4 width=12)
>   Merge Cond: (domande.id_dichiarazione = c_elaout.id_domanda)
>   ->  Index Scan using "IDX_8_domande" on domande  (cost=0.00..8.39 rows=1 width=4)
>         Index Cond: (id_domanda = 4165757)
>   ->  Index Scan using "IDX_2_c_elaout" on c_elaout  (cost=0.00..1030283.89 rows=805279 width=12)
>         Filter: ((c_elaout.node)::text = 'Invalido'::text)
>
> This takes 15 seconds.
>
> With mergejoin set to off:
>
> Nested Loop  (cost=100000000.00..100000022.97 rows=4 width=12)
>   ->  Index Scan using "IDX_8_domande" on domande  (cost=0.00..8.39 rows=1 width=4)
>         Index Cond: (id_domanda = 4165757)
>   ->  Index Scan using "IDX_2_c_elaout" on c_elaout  (cost=0.00..14.54 rows=4 width=12)
>         Index Cond: (c_elaout.id_domanda = domande.id_dichiarazione)
>         Filter: ((c_elaout.node)::text = 'Invalido'::text)
>
> This takes 15 msec!!!

Well duh. What you're effectively doing is telling Postgres to NEVER use a
nested loop. Then you're getting upset because it isn't using a nested
loop. When you tell it to NEVER use anything (switching all join
algorithms off), it ignores you and chooses the right plan anyway.

Matthew

--
You can configure Windows, but don't ask me how. -- Bill Gates


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alberto Dalmaso" <dalmaso(at)clesius(dot)it>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance with query
Date: 2009-06-16 15:06:33
Message-ID: 4A376EA90200002500027BE1@gw.wicourts.gov
Lists: pgsql-performance

Alberto Dalmaso <dalmaso(at)clesius(dot)it> wrote:

>> What version of PostgreSQL?
> 8.3 that comes with opensuse 11.1

Could you show us the result of SELECT version(); ?

> max_prepared_transactions = 30

Unless you're using distributed transactions or need a lot of locks,
that's just going to waste some RAM. Zero is fine for most people.

> maintenance_work_mem =50MB

That's a little small -- this only comes into play for maintenance
tasks like index builds. Not directly part of your reported problem,
but maybe something to bump to the 1GB range.

> max_fsm_pages = 160000
> max_fsm_relations = 5000

Have you done any VACUUM VERBOSE lately and captured the output? If
so, what do the last few lines say? (That's a lot of relations for
the number of pages; just curious how it maps to actual.)

> enable_hashjoin = off
> enable_nestloop = off
> enable_seqscan = off
> enable_sort = off

That's probably a bad idea. If particular queries aren't performing
well, you can always set these temporarily on a particular connection.
Even then, turning these off is rarely a good idea except for
diagnostic purposes. I *strongly* recommend you put all of these back
to the defaults of 'on' and start from there, turning off selected
items as needed to get EXPLAIN ANALYZE output to demonstrate the
better plans you've found for particular queries.
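
As a sketch of that temporary, per-connection approach (SET LOCAL keeps
the change to one transaction):

BEGIN;
SET LOCAL enable_mergejoin = off;
-- run the one problem query here
COMMIT;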

> effective_cache_size = 3600MB

That seems a little on the low side for an 8GB machine, unless you
have other things on there using a lot of RAM. Do you?

If you could set the optimizer options back on and get new plans where
you show specifically which options (if any) were turned off for the
run, that would be good. Also, please attach the plans to the email
instead of pasting -- the word wrap makes them hard to read. Finally,
if you could do \d on the tables involved in the query, it would help.
I'll hold off looking at these in hopes that you can do the above.

-Kevin


From: Alberto Dalmaso <dalmaso(at)clesius(dot)it>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance with query
Date: 2009-06-16 15:12:54
Message-ID: 1245165174.5027.37.camel@dalmaso-opensuse.cl
Lists: pgsql-performance

On Tue, 16/06/2009 at 15:58 +0100, Matthew Wakeling wrote:
> On Tue, 16 Jun 2009, Alberto Dalmaso wrote:
> >> What does your postgresql.conf file look like?
>
> > enable_hashjoin = off
> > enable_nestloop = off
> > enable_seqscan = off
> > enable_sort = off
>
> Why are these switched off?
>
Because I need to pump up the performance of the complex query. If I
set them to on, it tries to use a nested loop even in the complex query,
and that query never comes back with a response... and, of course, I
need a response from it!!!
So my problem is to find a configuration that keeps the performance
acceptable for both kinds of query, but I'm not able to find one.
Tuning the RAM parameters saves about 10% of the time on the complex
query, while it changes nothing on the simple one...


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alberto Dalmaso <dalmaso(at)clesius(dot)it>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance with query
Date: 2009-06-16 15:31:30
Message-ID: 15480.1245166290@sss.pgh.pa.us
Lists: pgsql-performance

Alberto Dalmaso <dalmaso(at)clesius(dot)it> writes:
> On Tue, 16/06/2009 at 15:58 +0100, Matthew Wakeling wrote:
>>> enable_hashjoin = off
>>> enable_nestloop = off
>>> enable_seqscan = off
>>> enable_sort = off
>>
>> Why are these switched off?
>>
> Because I need to pump up the performance of the complex query.

That is *not* the way to improve performance of a query. Turning off
specific enable_ parameters can be helpful while investigating planner
behavior, but it is never recommended as a production solution. You
have already found out why.

regards, tom lane


From: Alberto Dalmaso <dalmaso(at)clesius(dot)it>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance with query
Date: 2009-06-16 15:31:56
Message-ID: 1245166316.5027.51.camel@dalmaso-opensuse.cl
Lists: pgsql-performance

> Could you show us the result of SELECT version(); ?
of course I can
PostgreSQL 8.3.7 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE
Linux) 4.3.2 [gcc-4_3-branch revision 141291]
>
> Have you done any VACUUM VERBOSE lately and captured the output? If
> so, what do the last few lines say? (That's a lot of relations for
> the number of pages; just curious how it maps to actual.)
It needs a lot of time (it's a 20 GB database); when I have the output
I'll post it.
>
> > enable_hashjoin = off
> > enable_nestloop = off
> > enable_seqscan = off
> > enable_sort = off
>
> That's probably a bad idea. If particular queries aren't performing
> well, you can always set these temporarily on a particular connection.
> Even then, turning these off is rarely a good idea except for
> diagnostic purposes. I *strongly* recommend you put all of these back
> to the defaults of 'on' and start from there, turning off selected
> items as needed to get EXPLAIN ANALYZE output to demonstrate the
> better plans you've found for particular queries.

OK, that would be the reverse of what I'm doing now (set them to on
globally and set them to off only on the appropriate connections,
instead of setting them to off globally and on only for the connections
that have to run the hard query).
But the question is: do you think it is impossible to find a
configuration that works fine for both kinds of query? The application
also has to run against an Oracle DB... I don't want to have to write
different source code for the two databases...

>
> > effective_cache_size = 3600MB
>
> That seems a little on the low side for an 8GB machine, unless you
> have other things on there using a lot of RAM. Do you?
Yes, there are two instances of Postgres running on the same server (the
databases have to stay completely separate).
>
> If you could set the optimizer options back on and get new plans where
> you show specifically which options (if any) where turned off for the
> run, that would be good. Also, please attach the plans to the email
> instead of pasting -- the word wrap makes them hard to read. Finally,
> if you could do \d on the tables involved in the query, it would help.
> I'll hold off looking at these in hopes that you can do the above.
>
> -Kevin
I attach the EXPLAIN of the long query after setting all the enable_*
options to on. In this configuration the query never finishes...

Attachment Content-Type Size
long_query-explain.txt text/plain 12.6 KB

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alberto Dalmaso" <dalmaso(at)clesius(dot)it>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance with query
Date: 2009-06-16 15:32:36
Message-ID: 4A3774C40200002500027BED@gw.wicourts.gov
Lists: pgsql-performance

Alberto Dalmaso <dalmaso(at)clesius(dot)it> wrote:
> On Tue, 16/06/2009 at 15:58 +0100, Matthew Wakeling wrote:
>> On Tue, 16 Jun 2009, Alberto Dalmaso wrote:

>> > enable_hashjoin = off
>> > enable_nestloop = off
>> > enable_seqscan = off
>> > enable_sort = off
>>
>> Why are these switched off?
>>
> Because I need to pump up the performance of the complex query.

These really are meant primarily for diagnostic purposes. As a last
resort, you could set them off right before running a problem query,
and set them back on again afterward; but you will be much better off
if you can cure the underlying problem. The best chance of that is to
show us the plan you get with all turned on.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alberto Dalmaso" <dalmaso(at)clesius(dot)it>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance with query
Date: 2009-06-16 15:48:30
Message-ID: 4A37787E0200002500027BF8@gw.wicourts.gov
Lists: pgsql-performance

Alberto Dalmaso <dalmaso(at)clesius(dot)it> wrote:

> do you think it is impossible to find a
> configuration that works fine for both kinds of query?

No. We probably just need a little more information.

> The application also has to run against an Oracle DB... I don't want
> to have to write different source code for the two databases...

I understand completely.

> I attach the EXPLAIN of the long query after setting all the
> enable_* options to on. In this configuration the query never finishes...

We're getting close. Can you share the table structure and the actual
query you are running? It's a lot easier (for me, anyway) to put this
puzzle together with all the pieces in hand.

Also, if you can set off some of the optimizer options and get a fast
plan, please show us an EXPLAIN ANALYZE for that, with information on
which settings were turned off. That will help show where bad
estimates may be causing a problem, or possibly give a hint of table
or index bloat problems.

I think we're getting close to critical mass for seeing the
solution....

-Kevin


From: Alberto Dalmaso <dalmaso(at)clesius(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance with query
Date: 2009-06-16 15:54:37
Message-ID: 1245167677.5027.72.camel@dalmaso-opensuse.cl
Lists: pgsql-performance

On Tue, 16/06/2009 at 11:31 -0400, Tom Lane wrote:
> Alberto Dalmaso <dalmaso(at)clesius(dot)it> writes:
> > On Tue, 16/06/2009 at 15:58 +0100, Matthew Wakeling wrote:
> >>> enable_hashjoin = off
> >>> enable_nestloop = off
> >>> enable_seqscan = off
> >>> enable_sort = off
> >>
> >> Why are these switched off?
> >>
> > because of the need to pump up the performance of the complex query.
>
> That is *not* the way to improve performance of a query. Turning off
> specific enable_ parameters can be helpful while investigating planner
> behavior, but it is never recommended as a production solution. You
> have already found out why.
>
> regards, tom lane
OK, but the problem is that my very long query performs quite well when
it uses a merge join, and it never reaches the end if it uses any other
kind of join.
If I put all the parameters to on, as both of you tell me, in the
explanation I'll see that the db uses a nasted loop.
If I put nasted loop to off, it will use a hash join.
How can I write the query so that the planner will use a merge join (the
only option that lets the query give me the answer I'm waiting for)
without changing the settings on the connection every time?


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alberto Dalmaso" <dalmaso(at)clesius(dot)it>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance with query
Date: 2009-06-16 16:00:04
Message-ID: 4A377B340200002500027BFD@gw.wicourts.gov
Lists: pgsql-performance

Alberto Dalmaso <dalmaso(at)clesius(dot)it> wrote:

> I attach the EXPLAIN of the long query after setting all the
> enable_* options to on. In this configuration the query never finishes...

I notice that you have many joins in there. If the query can't be
simplified, you probably need to boost the join_collapse_limit and
from_collapse_limit quite a bit. If planning time goes through the
roof in that case, you may need to enable geqo -- this is what it's
intended to help. If you try geqo, you may need to tune it; I'm not
familiar with the knobs for tuning that, so maybe someone else will
jump in if you get to that point.
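
For example (the numbers are only a starting point; what matters is that
the limits exceed the number of relations the query joins):

SET from_collapse_limit = 30;   -- default is 8
SET join_collapse_limit = 30;   -- default is 8
SET geqo = on;                  -- if planning time then goes through the roof
SET geqo_threshold = 12;        -- plan with GEQO from 12 joined relations up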

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alberto Dalmaso <dalmaso(at)clesius(dot)it>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: performance with query
Date: 2009-06-16 16:12:56
Message-ID: 16299.1245168776@sss.pgh.pa.us
Lists: pgsql-performance

Alberto Dalmaso <dalmaso(at)clesius(dot)it> writes:
> OK, but the problem is that my very long query performs quite well when
> it uses a merge join, and it never reaches the end if it uses any other
> kind of join.
> If I put all the parameters to on, as both of you tell me, in the
> explanation I'll see that the db uses a nasted loop.
> If I put nasted loop to off, it will use a hash join.
> How can I write the query so that the planner will use a merge join (the
> only option that lets the query give me the answer I'm waiting for)
> without changing the settings on the connection every time?

You have the wrong mindset completely. Instead of thinking "how can I
force the planner to do it my way", you need to be thinking "why is the
planner guessing wrong about which is the best way to do it? And how
can I improve its guess?"

There's not really enough information in what you've posted so far to
let people help you with that question, but one thing that strikes me
from the EXPLAIN is that you have a remarkably large number of joins.
Perhaps increasing from_collapse_limit and/or join_collapse_limit
(to more than the number of tables in the query) would help.

regards, tom lane


From: Alberto Dalmaso <dalmaso(at)clesius(dot)it>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance with query
Date: 2009-06-16 16:12:58
Message-ID: 1245168778.5027.82.camel@dalmaso-opensuse.cl
Lists: pgsql-performance

Unfortunately the query needs that level of complexity, as the
information I have to show is spread across different tables.
I tried geqo on at the beginning, but only with the default parameters.
Tomorrow (my working day here in Italy ended some minutes ago, so I will
wait for the EXPLAIN ANALYZE to finish and then go home ;-P) I'll try
increasing join_collapse_limit and from_collapse_limit as you suggest.
If someone can give me some information on how to configure geqo, I'll
try it again.
In the meantime, tonight I'll leave VACUUM VERBOSE working for me.


From: "Hartman, Matthew" <Matthew(dot)Hartman(at)krcc(dot)on(dot)ca>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Speeding up a query.
Date: 2009-06-16 18:35:23
Message-ID: 366642367C5B354197A1E0D27BC175BD0225970C@KGHMAIL.KGH.ON.CA
Lists: pgsql-performance

Good afternoon.

I have developed an application to efficiently schedule chemotherapy
patients at our hospital. The application takes into account several
resource constraints (available chairs, available nurses, nurse coverage
assignment to chairs) as well as the chair time and nursing time
required for a regimen.

The algorithm for packing appointments respects each constraint and
typically schedules a day of treatments (30-60) within 9-10 seconds on
my workstation, down from 27 seconds initially. I would like to get it
below 5 seconds if possible.

I think what's slowing it down is simply the number of rows and joins.
The algorithm creates a scheduling matrix with one row per 5 minute
timeslot, per unit, per nurse assigned to the unit. That translates to
3,280 rows for the days I have designed in development (each day can
change).

To determine the available slots, the algorithm finds the earliest slot
that has an available chair and a count of the required concurrent
intervals afterwards. So a 60 minute regimen requires 12 concurrent
rows. This is accomplished by joining the table on itself. A second
query is run for the same range, but with respect to the nurse time and
an available nurse. Finally, those two are joined against each other.
Effectively, it is:

Select *
From (
      Select *
      From matrix m1, matrix m2
      Where m1.xxxxx = m2.xxxxx
     ) chair,
     (
      Select *
      From matrix m1, matrix m2
      Where m1.xxxxx = m2.xxxxx
     ) nurse
Where chair.id = nurse.id

With matrix having 3,280 rows. Ugh.
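
In slightly more concrete terms, the chair half alone is something like
this (a sketch only; the real column names differ):

-- earliest start slot with 12 consecutive free-chair slots (60 minutes)
Select m1.slot_id
From matrix m1
Join matrix m2
  On  m2.unit_id = m1.unit_id
  And m2.slot_time >= m1.slot_time
  And m2.slot_time <  m1.slot_time + interval '60 minutes'
  And m2.chair_free
Group By m1.slot_id, m1.slot_time
Having count(*) = 12
Order By m1.slot_time
Limit 1;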

I have tried various indexes and clustering approaches with little
success. Any ideas?

Thanks,

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549-6666 x4294


From: Anthony Presley <anthony(at)resolution(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Speeding up a query.
Date: 2009-06-16 19:36:45
Message-ID: 1245181005.3687.80.camel@dellLaptop.resolution.com
Lists: pgsql-performance

On the DB side of things, you will want to make sure that you're caching
as much as possible - putting a front-end like memcached could help. I
assume you have indexes on the appropriate tables? What does the
EXPLAIN ANALYZE on that query look like?

Not necessarily a "postgres" solution, but I'd think this type of
solution would work really, really well inside of, say, a mixed integer
or integer solver... something like glpk or cbc. You'd need to
reformulate the problem, but we've built applications using these tools
which can crunch through multiple billions of combinations in under 1 or
2 seconds.

(Of course, you still need to store the results, and feed the input,
using a database of some kind).

--
Anthony Presley

On Tue, 2009-06-16 at 14:35 -0400, Hartman, Matthew wrote:
> Good afternoon.
>
> I have developed an application to efficiently schedule chemotherapy
> patients at our hospital. The application takes into account several
> resource constraints (available chairs, available nurses, nurse coverage
> assignment to chairs) as well as the chair time and nursing time
> required for a regimen.
>
> The algorithm for packing appointments respects each constraint and
> typically schedules a day of treatments (30-60) within 9-10 seconds on
> my workstation, down from 27 seconds initially. I would like to get it
> below 5 seconds if possible.
>
> I think what's slowing it down is simply the number of rows and joins.
> The algorithm creates a scheduling matrix with one row per 5 minute
> timeslot, per unit, per nurse assigned to the unit. That translates to
> 3,280 rows for the days I have designed in development (each day can
> change).
>
> To determine the available slots, the algorithm finds the earliest slot
> that has an available chair and a count of the required concurrent
> intervals afterwards. So a 60 minute regimen requires 12 concurrent
> rows. This is accomplished by joining the table on itself. A second
> query is run for the same range, but with respect to the nurse time and
> an available nurse. Finally, those two are joined against each other.
> Effectively, it is:
>
> Select *
> From (
> Select *
> From matrix m1, matrix m2
> Where m1.xxxxx = m2.xxxxx
> ) chair,
> (
> Select *
> From matrix m1, matrix m2
> Where m1.xxxxx = m2.xxxxx
> ) nurse
> Where chair.id = nurse.id
>
> With matrix having 3,280 rows. Ugh.
>
> I have tried various indexes and clustering approaches with little
> success. Any ideas?
>
> Thanks,
>
> Matthew Hartman
> Programmer/Analyst
> Information Management, ICP
> Kingston General Hospital
> (613) 549-6666 x4294
>
>


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Alberto Dalmaso *EXTERN*" <dalmaso(at)clesius(dot)it>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: performance with query (OT)
Date: 2009-06-17 07:16:59
Message-ID: D960CB61B694CF459DCFB4B0128514C202FF6651@exadv11.host.magwien.gv.at
Lists: pgsql-performance

Alberto Dalmaso wrote:
[...]
> in the explanation I'll see that the db uses a nasted loop
[...]

Sorry for the remark off topic, but I *love* the term
"nasted loop". It should not go to oblivion unnoticed.

Yours,
Laurenz Albe


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Hartman, Matthew *EXTERN*" <Matthew(dot)Hartman(at)krcc(dot)on(dot)ca>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Speeding up a query.
Date: 2009-06-17 07:33:35
Message-ID: D960CB61B694CF459DCFB4B0128514C202FF6652@exadv11.host.magwien.gv.at
Lists: pgsql-performance

Matthew Hartman wrote:
> To determine the available slots, the algorithm finds the earliest slot
> that has an available chair and a count of the required concurrent
> intervals afterwards. So a 60 minute regimen requires 12 concurrent
> rows. This is accomplished by joining the table on itself. A second
> query is run for the same range, but with respect to the nurse time and
> an available nurse. Finally, those two are joined against each other.
> Effectively, it is:
>
> Select *
> From (
> Select *
> From matrix m1, matrix m2
> Where m1.xxxxx = m2.xxxxx
> ) chair,
> (
> Select *
> From matrix m1, matrix m2
> Where m1.xxxxx = m2.xxxxx
> ) nurse
> Where chair.id = nurse.id
>
> With matrix having 3,280 rows. Ugh.
>
> I have tried various indexes and clustering approachs with little
> success. Any ideas?

I don't understand your data model well enough to understand
the query, so I can only give you general hints (which you probably
already know):

- Frequently the biggest performance gains can be reached by
a (painful) redesign. Can you change the table structure in a way
that makes this query less expensive?

- You have an index on matrix.xxxxx, right?

- Can you reduce the row count of the two subqueries by adding
additional conditions that weed out rows that can be excluded
right away?

- Maybe you can gain a little by changing the "select *" to
"select id" in both subqueries and adding an additional join
with matrix that adds the relevant columns in the end.
I don't know the executor, so I don't know if that will help,
but it would be a simple thing to test in an experiment.

Yours,
Laurenz Albe


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "Hartman, Matthew *EXTERN*" <Matthew(dot)Hartman(at)krcc(dot)on(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Speeding up a query.
Date: 2009-06-17 08:40:30
Message-ID: 2f4958ff0906170140q673ba1c6jf8eff4aa42798a22@mail.gmail.com
Lists: pgsql-performance

On Wed, Jun 17, 2009 at 8:33 AM, Albe Laurenz<laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:

>
> I don't understand your data model well enough to understand
> the query, so I can only give you general hints (which you probably
> already know):

He is effectively joining the same table 4 times in a for loop to get
the result, which is veeery inefficient.
Imagine:

for(x)
  for(x)
    for(x)
      for(x)
        ..

where x is the number of rows in the matrix table. Not scared yet?

--
GJ


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: "Hartman, Matthew" <Matthew(dot)Hartman(at)krcc(dot)on(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Speeding up a query.
Date: 2009-06-17 13:08:33
Message-ID: b42b73150906170608p2f1cc545lf0ded4113b723b7a@mail.gmail.com
Lists: pgsql-performance

On Tue, Jun 16, 2009 at 2:35 PM, Hartman,
Matthew<Matthew(dot)Hartman(at)krcc(dot)on(dot)ca> wrote:
> Good afternoon.
>
> I have developed an application to efficiently schedule chemotherapy
> patients at our hospital. The application takes into account several
> resource constraints (available chairs, available nurses, nurse coverage
> assignment to chairs) as well as the chair time and nursing time
> required for a regimen.
>
> The algorithm for packing appointments respects each constraint and
> typically schedules a day of treatments (30-60) within 9-10 seconds on
> my workstation, down from 27 seconds initially. I would like to get it
> below 5 seconds if possible.
>
> I think what's slowing it down is simply the number of rows and joins.
> The algorithm creates a scheduling matrix with one row per 5 minute
> timeslot, per unit, per nurse assigned to the unit. That translates to
> 3,280 rows for the days I have designed in development (each day can
> change).
>
> To determine the available slots, the algorithm finds the earliest slot
> that has an available chair and a count of the required concurrent
> intervals afterwards. So a 60 minute regimen requires 12 concurrent
> rows. This is accomplished by joining the table on itself. A second
> query is run for the same range, but with respect to the nurse time and
> an available nurse. Finally, those two are joined against each other.
> Effectively, it is:
>
> Select *
> From   (
>        Select *
>        From matrix m1, matrix m2
>        Where m1.xxxxx = m2.xxxxx
>        ) chair,
>        (
>        Select *
>        From matrix m1, matrix m2
>        Where m1.xxxxx = m2.xxxxx
>        ) nurse
> Where chair.id = nurse.id
>
> With matrix having 3,280 rows. Ugh.
>
> I have tried various indexes and clustering approaches with little
> success. Any ideas?

how far in advance do you schedule? As far as necessary?

How many chairs are there? How many nurses are there? This is a
tricky (read: interesting) problem.

merlin


From: "Hartman, Matthew" <Matthew(dot)Hartman(at)krcc(dot)on(dot)ca>
To: <pgsql-performance(at)postgresql(dot)org>
Cc: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "Grzegorz Jaskiewicz" <gryzman(at)gmail(dot)com>, "Alberto Dalmaso" <dalmaso(at)clesius(dot)it>, "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Subject: Re: Speeding up a query.
Date: 2009-06-17 14:58:50
Message-ID: 366642367C5B354197A1E0D27BC175BD0225971A@KGHMAIL.KGH.ON.CA
Lists: pgsql-performance

Thanks for the replies everyone. I'll try to answer them all in this one email. I will send another email immediately after this with additional details about the query.

> - Frequently the biggest performance gains can be reached by
> a (painful) redesign. Can ou change the table structure in a way
> that makes this query less expensive?

I have considered redesigning the algorithm to accommodate this. As I've said, there's one row per five-minute time slot. Instead, I could represent an interval of time with a row. For example, a "start_time" of "08:00" with an "end_time" of "12:00", or perhaps an interval "duration" of "4 hours". The difficulty is in managing separate time requirements (nurse vs unit) for each time slot, and in inserting/updating new rows as pieces of those time slots or intervals are used up. Having a row per five-minute interval avoids those complications so far. Still, I'd start with 32 rows and increase the number, never reaching 3,280... :)

> - You have an index on matrix.xxxxx, right?

I have tried indexes on each common join criterion. Usually it's "time,unit", "time,nurse", or "time,unit_scheduled", "time,nurse_scheduled" (the latter two being Booleans). In the first two cases it made a difference of less than a second. In the last two, the time actually increases if I add "analyze" statements after updates are made.

> - Can you reduce the row count of the two subqueries by adding
> additional conditions that weed out rows that can be excluded
> right away?

I use some additional conditions. I'll paste the meat of the query below.

> - Maybe you can gain a little by changing the "select *" to
> "select id" in both subqueries and adding an additional join
> with matrix that adds the relevant columns in the end.
> I don't know the executor, so I don't know if that will help,
> but it would be a simple thing to test in an experiment.

I wrote the "select *" as simplified, but really, it returns the primary key for that row.

> how far in advance do you schedule? As far as necessary?

It's done on a per day basis, each day taking 8-12 seconds or so on my workstation. We typically schedule patients as much as three to six months in advance. The query already pulls data to a temporary table to avoid having to manage a massive number of rows.

> How many chairs are there? How many nurses are there? This is a
> tricky (read: interesting) problem.

In my current template there are 17 chairs and 7 nurses. Chairs are grouped into pods of 2-4 chairs. Nurses cover one to many pods, allowing for a primary nurse per pod as well as floater nurses that cover multiple pods.

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549-6666 x4294



From: "Hartman, Matthew" <Matthew(dot)Hartman(at)krcc(dot)on(dot)ca>
To: "Anthony Presley" <anthony(at)resolution(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Speeding up a query.
Date: 2009-06-17 15:13:14
Message-ID: 366642367C5B354197A1E0D27BC175BD0225971D@KGHMAIL.KGH.ON.CA
Lists: pgsql-performance

Sorry, I missed this response.

I'm entirely new to PostgreSQL and have yet to figure out how to use
EXPLAIN ANALYZE on a function. I think I realize where the problem is,
though (the loop); I simply do not know how to fix it yet ;).
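As far as I can tell, EXPLAIN ANALYZE on the function call only shows
the call as a single node, so one thing I plan to try is pulling a
statement out of the function body and testing it standalone (names
here are made up):

PREPARE slot_query (int) AS
    Select ... ;  -- the statement from inside the function
EXPLAIN ANALYZE EXECUTE slot_query (12);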

Glpk and cbc, thanks, I'll look into those. You're right, the very
nature of using a loop suggests that another tool might be more
appropriate.

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital
(613) 549-6666 x4294



From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: "Hartman, Matthew" <Matthew(dot)Hartman(at)krcc(dot)on(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Speeding up a query.
Date: 2009-07-07 09:38:58
Message-ID: 1246959538.3874.146.camel@ebony.2ndQuadrant
Lists: pgsql-performance


On Tue, 2009-06-16 at 14:35 -0400, Hartman, Matthew wrote:

> The algorithm for packing appointments in respects each constraint and
> typically schedules a day of treatments (30-60) within 9-10 seconds on
> my workstation, down from 27 seconds initially. I would like to get it
> below 5 seconds if possible.
>
> I think what's slowing is down is simply the number of rows and joins.
> The algorithm creates a scheduling matrix with one row per 5 minute
> timeslot, per unit, per nurse assigned to the unit. That translates to
> 3,280 rows for the days I have designed in development (each day can
> change).

ISTM the efficiency of your algorithm is geometrically related to the
number of time slots into which appointments might fit. So reduce the
number of possible time slots...

Assign each appointment (randomly/hash/round robin) to either the
morning or the afternoon and then run exactly the same queries, just
with half the number of time slots. Since the cost grows with the square
of the slot count, that should cut the execution time to roughly one
quarter, without using multiple CPUs for each morning/afternoon. Then
run twice, once for the morning, once for the afternoon.

You could parallelise this and run both at the same time on different
CPUs, if the extra work is worthwhile, but it seems not, judging from
your requirements.

Another way would be to arrange all appointments that need an odd number
of timeslots into pairs, so that you have at most one appointment that
needs an odd number of timeslots. Then schedule appointments on
10-minute boundaries, rounding up their timeslot requirement. (The
single odd-timeslot appointment will always waste one timeslot.)

Hope that helps.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Hartman, Matthew" <Matthew(dot)Hartman(at)krcc(dot)on(dot)ca>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Speeding up a query.
Date: 2009-07-07 12:32:58
Message-ID: 366642367C5B354197A1E0D27BC175BD02259847@KGHMAIL.KGH.ON.CA
Lists: pgsql-performance

> From: Simon Riggs [mailto:simon(at)2ndQuadrant(dot)com]
> Sent: Tuesday, July 07, 2009 5:39 AM
>
> Another way would be to arrange all appointments that need odd number
of
> timeslots into pairs so that you have at most one appointment that
needs
> an odd number of timeslots. Then schedule appointments on 10 minute
> boundaries, rounding up their timeslot requirement. (The single odd
> timeslot appointment will always waste 1 timeslot).

Now THAT is an interesting idea. I'll have to play with this in my head
a bit (during really boring meetings) and get back to you. Thanks!

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital