Lists: | pgsql-general |
---|
From: | David Link <dlink(at)soundscan(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | EXPLAIN SELECT .. does not return |
Date: | 2005-12-05 19:17:16 |
Message-ID: | 4394923C.5060808@soundscan.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Hi, This has become a major problem for us. Thank you in advance for
your help.
OS: SUSE Linux 2.6.5-7.191-bigsmp
PostgreSQL: 7.4.8
Application: ModPerl Web application using DBI.pm
Database size: 100 Gb, 1025 Tables.
Problem: EXPLAIN SELECT ... does not return.
Description:
The Application uses an EXPLAIN cost to determine whether a client's dynamic
request for data is too demanding for the server so it can gracefully deny
them. (Currently, anything over cost=0.00..500000.00).
The system gets about 3000 page requests a day.
Certain SQL Queries, I believe those with many table joins, when run as
EXPLAIN plans, never return. As a result they seem to stay churning in the
system. Once that happens other queries build up and the performance of the
whole database server grinds to a halt. Postgresql never dies, but
eventually, user requests start timing out.
This happens on average two or three times a week. I kill an offending
process and all's well again. I have not been able to identify with
certainty
an offending SQL statement.
Config params, that have changed from default:
tcpip_socket = true
max_connections = 200
shared_buffers = 2000
sort_mem = 1048576
vacuum_mem = 65536
max_fsm_pages = 100000
max_fsm_relations = 1000
max_files_per_process = 1000
fsync = false
wal_sync_method = fsync
wal_buffers = 800
checkpoint_segments = 30
commit_delay = 100
commit_siblings = 50
effective_cache_size = 1000
random_page_cost = 4
geqo = true
geqo_threshold = 14
default_statistics_target = 100
from_collapse_limit = 13
join_collapse_limit = 13
Note: we load lumps of data ea. week. Then primarily it is a readonly
database.
From: | Jaime Casanova <systemguards(at)gmail(dot)com> |
---|---|
To: | David Link <dlink(at)soundscan(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: EXPLAIN SELECT .. does not return |
Date: | 2005-12-05 19:38:39 |
Message-ID: | c2d9e70e0512051138s7c94cac5h98b884f59002a908@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
On 12/5/05, David Link <dlink(at)soundscan(dot)com> wrote:
> Hi, This has become a major problem for us. Thank you in advance for
> your help.
>
> OS: SUSE Linux 2.6.5-7.191-bigsmp
> PostgreSQL: 7.4.8
> Application: ModPerl Web application using DBI.pm
> Database size: 100 Gb, 1025 Tables.
>
> Problem: EXPLAIN SELECT ... does not return.
>
> Description:
>
> The Application uses an EXPLAIN cost to determine whether a client's dynamic
> request for data is too demanding for the server so it can gracefully deny
> them. (Currently, anything over cost=0.00..500000.00).
>
> The system gets about 3000 page requests a day.
>
> Certain SQL Queries, I believe those with many table joins, when run as
> EXPLAIN plans, never return. As a result they seem to stay churning in the
> system. Once that happens other queries build up and the performance of the
> whole database server grinds to a halt. Postgresql never dies, but
> eventually, user requests start timing out.
>
> This happens on average two or three times a week. I kill an offending
> process and all's well again. I have not been able to identify with
> certainty
> an offending SQL statement.
>
> Config params, that have changed from default:
>
> tcpip_socket = true
> max_connections = 200
> shared_buffers = 2000
> sort_mem = 1048576
> vacuum_mem = 65536
> max_fsm_pages = 100000
> max_fsm_relations = 1000
> max_files_per_process = 1000
> fsync = false
> wal_sync_method = fsync
> wal_buffers = 800
> checkpoint_segments = 30
> commit_delay = 100
> commit_siblings = 50
> effective_cache_size = 1000
> random_page_cost = 4
> geqo = true
> geqo_threshold = 14
> default_statistics_target = 100
> from_collapse_limit = 13
> join_collapse_limit = 13
>
> Note: we load lumps of data ea. week. Then primarily it is a readonly
> database.
>
>
when you have thoses cases, you can take a look in pg_stats_activity
to find the offending query...
or simply logs all queries
--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | David Link <dlink(at)soundscan(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: EXPLAIN SELECT .. does not return |
Date: | 2005-12-05 19:54:01 |
Message-ID: | 1927.1133812441@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
David Link <dlink(at)soundscan(dot)com> writes:
> Certain SQL Queries, I believe those with many table joins, when run as
> EXPLAIN plans, never return.
I'd guess that one or all of these settings are excessive:
> geqo_threshold = 14
> from_collapse_limit = 13
> join_collapse_limit = 13
Keep in mind that the planning cost is exponential in these limits,
eg geqo_threshold = 14 probably allows planning times about 14 times
greater than geqo_threshold = 13.
While I'm looking:
> shared_buffers = 2000
That seems extremely low for modern machines.
> sort_mem = 1048576
That, on the other hand, is almost certainly way too high for a system-wide
setting. You're promising you have 1Gb available for *each* sort.
> max_fsm_pages = 100000
And this way too low for a 100Gb database, unless most of the tables
never see any UPDATEs or DELETEs.
> wal_buffers = 800
Seems a bit high, especially considering you have fsync disabled and
thus there is no benefit whatever to buffering WAL.
> commit_delay = 100
> commit_siblings = 50
Have you measured any benefit to having this turned on?
All in all it looks like your configuration settings were chosen by
throwing darts :-(
regards, tom lane
From: | David Link <dlink(at)soundscan(dot)com> |
---|---|
To: | Jaime Casanova <systemguards(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: EXPLAIN SELECT .. does not return |
Date: | 2005-12-05 22:07:28 |
Message-ID: | 4394BA20.3020403@soundscan.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Jaime Casanova wrote:
>when you have thoses cases, you can take a look in pg_stats_activity
>to find the offending query...
>
>or simply logs all queries
>
>
Thanks for the advice. I also turned on stat_command_string
From: | David Link <dlink(at)soundscan(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: EXPLAIN SELECT .. does not return |
Date: | 2005-12-05 22:11:08 |
Message-ID: | 4394BAFC.5030406@soundscan.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Tom Lane wrote:
>David Link <dlink(at)soundscan(dot)com> writes:
>
>
>>Certain SQL Queries, I believe those with many table joins, when run as
>>EXPLAIN plans, never return.
>>
>>
>
>I'd guess that one or all of these settings are excessive:
>
>
>
>>geqo_threshold = 14
>>from_collapse_limit = 13
>>join_collapse_limit = 13
>>
>>
>
>Keep in mind that the planning cost is exponential in these limits,
>eg geqo_threshold = 14 probably allows planning times about 14 times
>greater than geqo_threshold = 13.
>
>While I'm looking:
>
>
>
>>shared_buffers = 2000
>>
>>
>
>That seems extremely low for modern machines.
>
>
>
>>sort_mem = 1048576
>>
>>
>
>That, on the other hand, is almost certainly way too high for a system-wide
>setting. You're promising you have 1Gb available for *each* sort.
>
>
>
>>max_fsm_pages = 100000
>>
>>
>
>And this way too low for a 100Gb database, unless most of the tables
>never see any UPDATEs or DELETEs.
>
>
>
>>wal_buffers = 800
>>
>>
>
>Seems a bit high, especially considering you have fsync disabled and
>thus there is no benefit whatever to buffering WAL.
>
>
>
>>commit_delay = 100
>>commit_siblings = 50
>>
>>
>
>Have you measured any benefit to having this turned on?
>
>All in all it looks like your configuration settings were chosen by
>throwing darts :-(
>
> regards, tom lane
>
>
>
Thanks for your reply, Tom. Different folks have made different
suggestions. Can you suggest more reasonable values for these? But
more importantly, do you think the problem I am having is due to these
configuration short comings?
Thanks much.
David
From: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
---|---|
To: | David Link <dlink(at)soundscan(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: EXPLAIN SELECT .. does not return |
Date: | 2005-12-05 22:17:17 |
Message-ID: | 4394BC6D.509@commandprompt.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
>>
>>
> Thanks for your reply, Tom. Different folks have made different
> suggestions. Can you suggest more reasonable values for these? But
> more importantly, do you think the problem I am having is due to these
> configuration short comings?
>
> Thanks much.
David take a look at:
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html
Joshua D. Drake
> David
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | David Link <dlink(at)soundscan(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: EXPLAIN SELECT .. does not return |
Date: | 2005-12-05 22:19:58 |
Message-ID: | 3135.1133821198@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
David Link <dlink(at)soundscan(dot)com> writes:
> more importantly, do you think the problem I am having is due to these
> configuration short comings?
Yeah, the planning thresholds ...
regards, tom lane