EXPLAIN SELECT .. does not return

Lists: pgsql-general
From: David Link <dlink(at)soundscan(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: EXPLAIN SELECT .. does not return
Date: 2005-12-05 19:17:16
Message-ID: 4394923C.5060808@soundscan.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi, This has become a major problem for us. Thank you in advance for
your help.

OS: SUSE Linux 2.6.5-7.191-bigsmp
PostgreSQL: 7.4.8
Application: ModPerl Web application using DBI.pm
Database size: 100 Gb, 1025 Tables.

Problem: EXPLAIN SELECT ... does not return.

Description:

The Application uses an EXPLAIN cost to determine whether a client's dynamic
request for data is too demanding for the server so it can gracefully deny
them. (Currently, anything over cost=0.00..500000.00).

The system gets about 3000 page requests a day.

Certain SQL Queries, I believe those with many table joins, when run as
EXPLAIN plans, never return. As a result they seem to stay churning in the
system. Once that happens other queries build up and the performance of the
whole database server grinds to a halt. Postgresql never dies, but
eventually, user requests start timing out.

This happens on average two or three times a week. I kill an offending
process and all's well again. I have not been able to identify with
certainty
an offending SQL statement.

Config params, that have changed from default:

tcpip_socket = true
max_connections = 200
shared_buffers = 2000
sort_mem = 1048576
vacuum_mem = 65536
max_fsm_pages = 100000
max_fsm_relations = 1000
max_files_per_process = 1000
fsync = false
wal_sync_method = fsync
wal_buffers = 800
checkpoint_segments = 30
commit_delay = 100
commit_siblings = 50
effective_cache_size = 1000
random_page_cost = 4
geqo = true
geqo_threshold = 14
default_statistics_target = 100
from_collapse_limit = 13
join_collapse_limit = 13

Note: we load lumps of data ea. week. Then primarily it is a readonly
database.


From: Jaime Casanova <systemguards(at)gmail(dot)com>
To: David Link <dlink(at)soundscan(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: EXPLAIN SELECT .. does not return
Date: 2005-12-05 19:38:39
Message-ID: c2d9e70e0512051138s7c94cac5h98b884f59002a908@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 12/5/05, David Link <dlink(at)soundscan(dot)com> wrote:
> Hi, This has become a major problem for us. Thank you in advance for
> your help.
>
> OS: SUSE Linux 2.6.5-7.191-bigsmp
> PostgreSQL: 7.4.8
> Application: ModPerl Web application using DBI.pm
> Database size: 100 Gb, 1025 Tables.
>
> Problem: EXPLAIN SELECT ... does not return.
>
> Description:
>
> The Application uses an EXPLAIN cost to determine whether a client's dynamic
> request for data is too demanding for the server so it can gracefully deny
> them. (Currently, anything over cost=0.00..500000.00).
>
> The system gets about 3000 page requests a day.
>
> Certain SQL Queries, I believe those with many table joins, when run as
> EXPLAIN plans, never return. As a result they seem to stay churning in the
> system. Once that happens other queries build up and the performance of the
> whole database server grinds to a halt. Postgresql never dies, but
> eventually, user requests start timing out.
>
> This happens on average two or three times a week. I kill an offending
> process and all's well again. I have not been able to identify with
> certainty
> an offending SQL statement.
>
> Config params, that have changed from default:
>
> tcpip_socket = true
> max_connections = 200
> shared_buffers = 2000
> sort_mem = 1048576
> vacuum_mem = 65536
> max_fsm_pages = 100000
> max_fsm_relations = 1000
> max_files_per_process = 1000
> fsync = false
> wal_sync_method = fsync
> wal_buffers = 800
> checkpoint_segments = 30
> commit_delay = 100
> commit_siblings = 50
> effective_cache_size = 1000
> random_page_cost = 4
> geqo = true
> geqo_threshold = 14
> default_statistics_target = 100
> from_collapse_limit = 13
> join_collapse_limit = 13
>
> Note: we load lumps of data ea. week. Then primarily it is a readonly
> database.
>
>

when you have thoses cases, you can take a look in pg_stats_activity
to find the offending query...

or simply logs all queries

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Link <dlink(at)soundscan(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: EXPLAIN SELECT .. does not return
Date: 2005-12-05 19:54:01
Message-ID: 1927.1133812441@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

David Link <dlink(at)soundscan(dot)com> writes:
> Certain SQL Queries, I believe those with many table joins, when run as
> EXPLAIN plans, never return.

I'd guess that one or all of these settings are excessive:

> geqo_threshold = 14
> from_collapse_limit = 13
> join_collapse_limit = 13

Keep in mind that the planning cost is exponential in these limits,
eg geqo_threshold = 14 probably allows planning times about 14 times
greater than geqo_threshold = 13.

While I'm looking:

> shared_buffers = 2000

That seems extremely low for modern machines.

> sort_mem = 1048576

That, on the other hand, is almost certainly way too high for a system-wide
setting. You're promising you have 1Gb available for *each* sort.

> max_fsm_pages = 100000

And this way too low for a 100Gb database, unless most of the tables
never see any UPDATEs or DELETEs.

> wal_buffers = 800

Seems a bit high, especially considering you have fsync disabled and
thus there is no benefit whatever to buffering WAL.

> commit_delay = 100
> commit_siblings = 50

Have you measured any benefit to having this turned on?

All in all it looks like your configuration settings were chosen by
throwing darts :-(

regards, tom lane


From: David Link <dlink(at)soundscan(dot)com>
To: Jaime Casanova <systemguards(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: EXPLAIN SELECT .. does not return
Date: 2005-12-05 22:07:28
Message-ID: 4394BA20.3020403@soundscan.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jaime Casanova wrote:

>when you have thoses cases, you can take a look in pg_stats_activity
>to find the offending query...
>
>or simply logs all queries
>
>
Thanks for the advice. I also turned on stat_command_string


From: David Link <dlink(at)soundscan(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: EXPLAIN SELECT .. does not return
Date: 2005-12-05 22:11:08
Message-ID: 4394BAFC.5030406@soundscan.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane wrote:

>David Link <dlink(at)soundscan(dot)com> writes:
>
>
>>Certain SQL Queries, I believe those with many table joins, when run as
>>EXPLAIN plans, never return.
>>
>>
>
>I'd guess that one or all of these settings are excessive:
>
>
>
>>geqo_threshold = 14
>>from_collapse_limit = 13
>>join_collapse_limit = 13
>>
>>
>
>Keep in mind that the planning cost is exponential in these limits,
>eg geqo_threshold = 14 probably allows planning times about 14 times
>greater than geqo_threshold = 13.
>
>While I'm looking:
>
>
>
>>shared_buffers = 2000
>>
>>
>
>That seems extremely low for modern machines.
>
>
>
>>sort_mem = 1048576
>>
>>
>
>That, on the other hand, is almost certainly way too high for a system-wide
>setting. You're promising you have 1Gb available for *each* sort.
>
>
>
>>max_fsm_pages = 100000
>>
>>
>
>And this way too low for a 100Gb database, unless most of the tables
>never see any UPDATEs or DELETEs.
>
>
>
>>wal_buffers = 800
>>
>>
>
>Seems a bit high, especially considering you have fsync disabled and
>thus there is no benefit whatever to buffering WAL.
>
>
>
>>commit_delay = 100
>>commit_siblings = 50
>>
>>
>
>Have you measured any benefit to having this turned on?
>
>All in all it looks like your configuration settings were chosen by
>throwing darts :-(
>
> regards, tom lane
>
>
>
Thanks for your reply, Tom. Different folks have made different
suggestions. Can you suggest more reasonable values for these? But
more importantly, do you think the problem I am having is due to these
configuration short comings?

Thanks much.
David


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: David Link <dlink(at)soundscan(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: EXPLAIN SELECT .. does not return
Date: 2005-12-05 22:17:17
Message-ID: 4394BC6D.509@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


>>
>>
> Thanks for your reply, Tom. Different folks have made different
> suggestions. Can you suggest more reasonable values for these? But
> more importantly, do you think the problem I am having is due to these
> configuration short comings?
>
> Thanks much.
David take a look at:
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Joshua D. Drake

> David
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Link <dlink(at)soundscan(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: EXPLAIN SELECT .. does not return
Date: 2005-12-05 22:19:58
Message-ID: 3135.1133821198@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

David Link <dlink(at)soundscan(dot)com> writes:
> more importantly, do you think the problem I am having is due to these
> configuration short comings?

Yeah, the planning thresholds ...

regards, tom lane