Re: how could select id=xx so slow?

Lists: pgsql-performance
From: Yan Chunlu <springrider(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: how could select id=xx so slow?
Date: 2012-07-06 06:17:23
Message-ID: CAOA66tEJLDOt4JpJ0RwOEsgizfQPWkCHd+SgT7kbiHix_io18A@mail.gmail.com

I have grabbed one day's slow query log and analyzed it with pgFouine, and to
my surprise, the slowest query is just a simple select statement:

select diggcontent_data_message.thing_id, diggcontent_data_message.KEY,
diggcontent_data_message.value, diggcontent_data_message.kind FROM
diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;

where thing_id is the primary key, guess how long it takes?

754.61 seconds!!

I tried explain analyze on it and below is the result, which is very fast:

explain analyze select diggcontent_data_message.thing_id,
diggcontent_data_message.KEY, diggcontent_data_message.value,
diggcontent_data_message.kind FROM diggcontent_data_message WHERE
diggcontent_data_message.thing_id = 3570882;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_thing_id_diggcontent_data_message on
diggcontent_data_message (cost=0.00..15.34 rows=32 width=51) (actual
time=0.080..0.096 rows=8 loops=1)
Index Cond: (thing_id = 3570882)
Total runtime: 0.115 ms
(3 rows)

so I wonder: could this simple select be innocent and just affected badly by
other queries? how could I find the queries that really slow down the
database?
thanks!
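(A practical way to find such queries is to watch pg_stat_activity while the slowdown is happening. This is a sketch using the pre-9.2 column names current at the time; on 9.2 and later the columns are pid, state and query instead:)

```sql
-- List the longest-running non-idle backends, worst first.
SELECT procpid,
       now() - query_start AS runtime,
       waiting,                      -- true if blocked on a lock
       current_query
FROM pg_stat_activity
WHERE current_query <> '<IDLE>'
ORDER BY runtime DESC NULLS LAST;
```

Anything with a large runtime, or with waiting = true, is a candidate for the query dragging the rest down.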


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-06 09:46:18
Message-ID: CAAZKuFZn3gXpr-b60+b_A6XqX8Do6P8hL770bcTg9BaLT0mBFw@mail.gmail.com

On Thu, Jul 5, 2012 at 11:17 PM, Yan Chunlu <springrider(at)gmail(dot)com> wrote:
> I have grabbed one day slow query log and analyzed it by pgfouine, to my
> surprise, the slowest query is just a simple select statement:
>
> select diggcontent_data_message.thing_id, diggcontent_data_message.KEY,
> diggcontent_data_message.value, diggcontent_data_message.kind FROM
> diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;
>
>
> where thing_id is the primary key, guess how long it takes?
>
> 754.61 seconds!!

Is it possible that the size of the tuple is enormous? Because one
area where I've noticed EXPLAIN ANALYZE blows away normal performance
is when a lot of the work would be in reassembling, decompressing
(collectively: de-TOASTING) and sending the data.

Even then, that time seems excessive...but something to think about.
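(To gauge whether de-TOASTing could be the cost, one can compare the detoasted size of the returned columns against the table's on-disk footprint. A sketch; the table and column names are taken from the original query:)

```sql
-- Detoasted size of the widest column for the rows in question.
SELECT sum(pg_column_size(value)) AS value_bytes
FROM diggcontent_data_message
WHERE thing_id = 3570882;

-- Heap size vs. total size (total includes TOAST and indexes).
SELECT pg_size_pretty(pg_relation_size('diggcontent_data_message'))        AS heap,
       pg_size_pretty(pg_total_relation_size('diggcontent_data_message')) AS total;
```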

--
fdr


From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-06 11:16:27
Message-ID: 4FF6C90B.8070402@ringerc.id.au

On 07/06/2012 02:17 PM, Yan Chunlu wrote:

> so I wonder could this simple select is innocent and affected badly by
> other queries? how could I find those queries that really slow down
> the database?

It might not be other queries. Your query could be taking that long
because it was blocked by a lock during maintenance work (say, an ALTER
TABLE). It's also quite possible that it was held up by a slow
checkpoint; check your logs to see if there are warnings about
checkpoint activity.
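(Lock blocking is easy to check for while the statement is stuck. A sketch:)

```sql
-- Any lock that is requested but not granted means someone is waiting.
SELECT pid, locktype, mode, relation::regclass AS rel
FROM pg_locks
WHERE NOT granted;
```

Joining against pg_stat_activity on pid shows which statement is waiting; the granted locks on the same relation identify the holder.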

--
Craig Ringer


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Yan Chunlu *EXTERN*" <springrider(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-06 13:10:51
Message-ID: D960CB61B694CF459DCFB4B0128514C2081BF021@exadv11.host.magwien.gv.at

Yan Chunlu wrote:
> I have grabbed one day slow query log and analyzed it by pgfouine, to my surprise, the slowest query
> is just a simple select statement:
>
> select diggcontent_data_message.thing_id, diggcontent_data_message.KEY,
> diggcontent_data_message.value, diggcontent_data_message.kind FROM
> diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;
>
> where thing_id is the primary key, guess how long it takes?
>
> 754.61 seconds!!
>
> I tried explain analyze it and below is the result, which is very fast:
>
> explain analyze select diggcontent_data_message.thing_id,
> diggcontent_data_message.KEY, diggcontent_data_message.value,
> diggcontent_data_message.kind FROM diggcontent_data_message WHERE
> diggcontent_data_message.thing_id = 3570882;
>
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------
> Index Scan using idx_thing_id_diggcontent_data_message on
> diggcontent_data_message (cost=0.00..15.34 rows=32 width=51) (actual
> time=0.080..0.096 rows=8 loops=1)
>   Index Cond: (thing_id = 3570882)
> Total runtime: 0.115 ms
> (3 rows)
>
> so I wonder could this simple select is innocent and affected badly by
> other queries? how could I find those queries that really slow down the
> database?

Are these by any chance the aggregated costs in pgFouine?
Could it be that the statement just ran very often and used that time in
total?

Other than that, it could have been blocked by something that takes an
exclusive lock on the table.

There are no ON SELECT DO INSTEAD rules or similar things on the table,
right?
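(That can be verified quickly from the catalog. A sketch:)

```sql
-- Rewrite rules attached to the table, if any (an empty result is good).
SELECT rulename, definition
FROM pg_rules
WHERE tablename = 'diggcontent_data_message';
```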

Yours,
Laurenz Albe


From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-09 09:20:56
Message-ID: CAOA66tF8dAzAS9wG381bhp+AiC2LrzcF7MDmOxvgW0p07rMNtw@mail.gmail.com

thanks for all the help. I checked the possibilities and found that:
1, the size of the tuple is small
2, I checked the log manually and it indeed cost that much time, not
aggregated

the value of "log_min_messages" in postgresql.conf was "error"; I have changed
it to "warning", and so far I have not received any warnings, still waiting.

besides, I do see some COMMITs which are relatively slow, for example:
60 2012-07-08 00:00:29 CST [19367]: [131-1] LOG: duration: 375.851 ms
statement: COMMIT
61 2012-07-08 00:00:30 CST [19367]: [132-1] LOG: duration: 327.964 ms
statement: COMMIT

but there is only one "BEGIN" in the same one-day log file; did that influence
the query time too?



From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-10 01:25:33
Message-ID: 4FFB848D.7080900@ringerc.id.au

On 07/09/2012 05:20 PM, Yan Chunlu wrote:
>
> the value of "log_min_messages" in postgresql.conf is error, I have
> changed it to "warning", so far does not received any warning, still
> waiting.
When trying to track down performance issues, increasing logging to at
least `info' would seem to be sensible.

I suggest increasing your logging and enabling the auto_explain module
so it logs slow queries. If you can afford the substantial performance
hit, you could enable its analyze mode to get details on why they are slow.
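(For a single session auto_explain can be tried without touching postgresql.conf; enabling it for all backends requires loading it via shared_preload_libraries and a restart. A sketch, with an illustrative threshold:)

```sql
LOAD 'auto_explain';                          -- superuser only
SET auto_explain.log_min_duration = '500ms';  -- log plans of anything slower
SET auto_explain.log_analyze = on;            -- the expensive part: times every plan node
```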

>
> beside I do see some COMMIT which is relatively slow for example:
> 60 2012-07-08 00:00:29 CST [19367]: [131-1] LOG: duration: 375.851
> ms statement: COMMIT
> 61 2012-07-08 00:00:30 CST [19367]: [132-1] LOG: duration: 327.964
> ms statement: COMMIT
That certainly is slow. Again, I suspect checkpoint activity could be at
fault. You may need to tune to spread your checkpoints out and use more
aggressive bgwriter settings. See the wiki for performance tuning info.

>
> but only one "BEGIN" in the same one day log file, did that influence
> the query time too?

Only one BEGIN in the whole day? Or do you mean "only one BEGIN slow
enough to trigger slow query logging" ?

Do you have a "log_min_duration_statement" directive set in your
postgresql.conf ?

--
Craig Ringer


From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-10 02:25:59
Message-ID: CAOA66tEe4m8OQQ+26AKxrreEF458R-983UVMARZ6zeqEkZxwCQ@mail.gmail.com

the transaction part is weird. I have filtered BEGIN and COMMIT from a one-day
log by using:
cat /usr/local/pgsql/data/pg_log/Saturday.log | grep -E "BEGIN|COMMIT"
>trans.txt

and pasted it to a gist (only three BEGINs and many COMMITs):
https://gist.github.com/3080600

I didn't set log_min_duration_statement in postgresql.conf, but execute
dbapi_con.cursor().execute("SET log_min_duration_statement to 30")
for every connection.
the system is in production and under relatively heavy load; I thought it
would be better not to "reload" the db too frequently, and with "SET
log_min_duration_statement to 30" I can turn the logging off within my
application.



From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-10 02:46:25
Message-ID: 4FFB9781.60601@ringerc.id.au

On 07/10/2012 10:25 AM, Yan Chunlu wrote:
> I didn't set log_min_duration_statement in the postgresql.conf, but
> execute
> dbapi_con.cursor().execute("SET log_min_duration_statement to 30")
> for every connection.

OK, same effect: You're only logging slow statements.

It's not at all surprising that BEGIN doesn't appear when a
log_min_duration_statement is set. It's an incredibly fast operation.
What's amazing is that it appears even once - that means your database
must be in serious performance trouble, as BEGIN should take tenths of a
millisecond on an unloaded system. For example my quick test here:

LOG: statement: BEGIN;
LOG: duration: 0.193 ms

... which is actually a lot slower than I expected, but hardly slow
statement material.

The frequent appearance of slow (multi-second) COMMIT statements in your
slow statement logs suggests there's enough load on your database that
there's real contention for disk, and/or that checkpoints are stalling
transactions.

First, you need to set log_min_messages = 'info' to allow Pg to complain
about things like checkpoint frequency.

Now temporarily set log_checkpoints = on to record when checkpoints
happen and how long they take. Most likely you'll find you need to tune
checkpoint behaviour. Some information, albeit old, on that is here:

http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

Basically you might want to try increasing your
checkpoint_completion_target and making the bgwriter more aggressive -
assuming that your performance issues are in fact checkpoint related.
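(The relevant knobs can all be listed in one place before changing anything. A sketch:)

```sql
-- Current checkpoint and background-writer settings.
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE 'checkpoint%' OR name LIKE 'bgwriter%'
ORDER BY name;
```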

It's also possible that they're just overall load, especially if you
have lots and lots (hundreds) of connections to the database all trying
to do work at once without any kind of admission control or
pooling/queuing. In that case, introducing a connection pool like
PgBouncer may help.

--
Craig Ringer


From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-10 02:58:46
Message-ID: CAOA66tFrgZ12FcxeM3ob+M_vSMK3oO7AkmL9EqSn3qktmAusVg@mail.gmail.com

great thanks for the help and explanation, I will start logging the
information you mentioned and do some analysis.



From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-11 06:24:24
Message-ID: CAOA66tEiZ+OrU6mRutVq6Vksa6tRsgQYvB4erSNUpWx7t8LksQ@mail.gmail.com

I have logged one day of data and found that checkpoints are rather
frequent (detail: https://gist.github.com/3088338). Not sure if it is
normal, but the average time of a checkpoint is about 100sec~200sec, and it
seems related to my settings:

checkpoint_segments = 64
wal_keep_segments = 5000

I set checkpoint_segments to a very large value because otherwise the slave
server can never keep up with the master; should I lower that value?

or is the slow query about something else? thanks!



From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Yan Chunlu *EXTERN*" <springrider(at)gmail(dot)com>, "Craig Ringer" <ringerc(at)ringerc(dot)id(dot)au>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-11 08:23:07
Message-ID: D960CB61B694CF459DCFB4B0128514C2082464D4@exadv11.host.magwien.gv.at

Yan Chunlu wrote:
> I have logged one day data and found the checkpoint is rather frequently
> (detail: https://gist.github.com/3088338). Not sure if it is normal, but
> the average time of checkpoint is about 100sec~200sec, it seems related
> with my settings:
>
> 574 checkpoint_segments = 64
> 575 wal_keep_segments = 5000
>
> I set checkpoint_segments as a very large value which is because otherwise
> the slave server always can not follow the master, should I lower that
> value?

You mean, you set wal_keep_segments high for the standby, right?

wal_keep_segments has no impact on checkpoint frequency and intensity.

You are right that your checkpoint frequency is high. What is your value
of checkpoint_timeout?

You can increase the value of checkpoint_segments to decrease the
checkpoint frequency, but recovery will take longer then.
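(pg_stat_bgwriter shows which limit is actually triggering the checkpoints. A sketch:)

```sql
-- checkpoints_timed come from checkpoint_timeout expiring;
-- checkpoints_req come from filling checkpoint_segments worth of WAL.
SELECT checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_backend
FROM pg_stat_bgwriter;
```

If nearly all checkpoints are timed, raising checkpoint_segments further will not change their frequency; only checkpoint_timeout will.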

> or the slow query is about something else? thanks!

I guess the question is how saturated the I/O system is during
checkpoints. But even if it is very busy, I find it hard to believe
that such a trivial statement can take extremely long.

Yours,
Laurenz Albe


From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-11 11:40:35
Message-ID: CAOA66tE9PwbQ07+g9=HhbVwqXosy=Tb=M51NwSpoAmh5LvnVnw@mail.gmail.com

could that be because my system is really busy?
1, postgresql always has 400+ connections (dozens of python processes using
a client pool)
2, the query peak is 50+/s
3, I do have some badly performing sql executing periodically that needs 100+
seconds to complete. could those bad sql influence the others? because when I
execute those simple sql directly, they are fast, but the slow query log
shows they took too much time.
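(Those 400+ connections can be summarized directly from pg_stat_activity. A sketch using the pre-9.2 column names; on 9.2 and later group by state instead:)

```sql
-- How many backends are idle, working, or waiting on locks.
SELECT waiting,
       current_query = '<IDLE>' AS idle,
       count(*)
FROM pg_stat_activity
GROUP BY 1, 2
ORDER BY 3 DESC;
```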



From: Ants Aasma <ants(at)cybertec(dot)at>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-11 11:59:35
Message-ID: CA+CSw_saQJXHUnnBQU5G54VdBNMbmxs5G1vJazfA_wdOBBzSXA@mail.gmail.com

On Wed, Jul 11, 2012 at 9:24 AM, Yan Chunlu <springrider(at)gmail(dot)com> wrote:
> I have logged one day data and found the checkpoint is rather
> frequently(detail: https://gist.github.com/3088338). Not sure if it is
> normal, but the average time of checkpoint is about 100sec~200sec, it seems
> related with my settings:
>
> 574 checkpoint_segments = 64
> 575 wal_keep_segments = 5000
>
> I set checkpoint_segments as a very large value which is because otherwise
> the slave server always can not follow the master, should I lower that
> value?
>
> or the slow query is about something else? thanks!

Some things to notice from the checkpoints log:
* All checkpoints are triggered by checkpoint_timeout, using up only a
couple of log files
* Checkpoints write out around 40MB of buffers
* The write-out period is spread out nicely like it's supposed to be, but
the sync phase is occasionally taking a very long time (more than 2
minutes)

This looks like something (not necessarily the checkpoint sync itself)
is overloading the IO system. You might want to monitor the IO load
with iostat and correlate it with the checkpoints and slow queries to
find the culprit. It's also possible that something else is causing
the issues.

If the cause is checkpoints, just making them less frequent might make
the problem worse. I'm assuming you have 16GB+ of RAM because you have
4GB of shared_buffers. Just making checkpoint_timeout longer will
accumulate a larger number of dirty buffers that will clog up the IO
queues even worse. If you are on Linux, lowering
dirty_expire_centisecs or dirty_background_bytes might help to spread
the load out but will make overall throughput worse.

On the other hand, if the I/O overload is from queries (more likely
because some checkpoints sync quickly) there are no easy tuning
answers. Making queries less IO intensive is probably the best you can
do. From the tuning side, newer Linux kernels handle I/O fairness a
lot better, and you could also try tweaking the I/O scheduler to
achieve better throughput to avoid congestion or at least provide
better latency for trivial queries. And of course it's always possible
to throw more hardware at the problem and upgrade the I/O subsystem.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-11 16:35:59
Message-ID: CAOA66tFRfXGFvNaU2hvBq1d285QeiFcJR-0jru+bS=TBJ=nZ6A@mail.gmail.com

huge thanks for the patient explanations, I think you are right, it is
really related to the IO.
I monitored the IO using iostat -x and found the utilization reaches 100%
frequently; postgresql is the only service running on that machine, so I
think it is either checkpoints or queries causing the problem.

and I agree that checkpoints may not be the problem, I guess I need to tackle
those damn queries.
currently the data dir (pgsql/data/base) uses 111GB of disk space, and some
tables have tens of millions of records. could that cause the queries' heavy
disk IO? when should I split the data to other machines (aka sharding)?

and you are right, the machine has 16GB memory and a commodity 500GB disk.
kernel: Linux adams 2.6.26-2-amd64 #1 SMP Mon Jun 13 16:29:33 UTC 2011
x86_64 GNU/Linux

by "newer kernel" which version do you mean?

and about those IO-intensive queries, I can only tell the time used from the
slow query log; is there anything like "explain analyze" that shows
specific information about IO usage?
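(On PostgreSQL 9.0 and later, EXPLAIN can report buffer usage, which is the closest thing to per-query IO accounting. A sketch on the original query:)

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT thing_id, key, value, kind
FROM diggcontent_data_message
WHERE thing_id = 3570882;
-- "shared hit" pages were found in shared_buffers;
-- "read" pages had to come from the OS, possibly from disk.
```

9.2 additionally adds track_io_timing, which reports actual block read/write times in the ANALYZE output.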



From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject: Re: how could select id=xx so slow?
Date: 2012-07-12 00:18:23
Message-ID: 4FFE17CF.1050601@ringerc.id.au

On 07/11/2012 07:40 PM, Yan Chunlu wrote:
> could that be because my system is really busy?
> 1, postgresql always has 400+ connections (dozens of python processes
> using a client pool)
> 2, the query peak is 50+/s
> 3, I do have some badly performing SQL executing periodically that needs
> 100+ seconds to complete. could that badly performing SQL influence
> the others? because when I execute those simple SQL statements directly,
> they are fast, but the slow query log shows they took too much time.
>
Oh, come on, these are the sorts of things you tell us /when you ask
your question/, not days later after lots of back-and-forth discussion.

--
Craig Ringer


From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-12 00:47:21
Message-ID: CAOA66tF1M+S=m-7mU=TV91_hkxdcb1T8vwSwb-KRFOrdgBqWwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Really sorry for the lack of information, but I did ask whether the slow
queries could affect those simple ones:
'so I wonder could this simple select is innocent and affected badly by
other queries? '

I didn't mention the connection count because I didn't think my app was
that busy, and I thought the large number of connections was caused by the
slow queries.

I was wrong; everything is connected, and too many factors could end with
this result. I am really sorry, and I will tell you everything I know the
next time :)
I learnt a lot during the back and forth!

On Thursday, July 12, 2012, Craig Ringer wrote:

> On 07/11/2012 07:40 PM, Yan Chunlu wrote:
>
> could that be because my system is really busy?
> 1, postgresql always has 400+ connections (dozens of python processes using
> a client pool)
> 2, the query peak is 50+/s
> 3, I do have some badly performing SQL executing periodically that needs
> 100+ seconds to complete. could that badly performing SQL influence the
> others? because when I execute those simple SQL statements directly, they
> are fast, but the slow query log shows they took too much time.
>
> Oh, come on, these are the sorts of things you tell us *when you ask
> your question*, not days later after lots of back-and-forth discussion.
>
> --
> Craig Ringer
>


From: Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-12 01:07:00
Message-ID: CAOtHd0Cq-hy1q1Mc8PW0MP8WMuztCsVHxGvU1z+B1xY5P6QkrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Jul 11, 2012 at 5:47 PM, Yan Chunlu <springrider(at)gmail(dot)com> wrote:
> I learnt a lot during the back and forth!

Great to hear.

>> 1, postgresql always have 400+ connections(dozens of python process using client pool)

Note that Postgres does not deal well with a large number of
connections[1]: consider shrinking the size of the pool.

[1]: http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
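The wiki page linked above gives a rough starting point for sizing a pool of ((core_count * 2) + effective_spindle_count) connections. A minimal sketch of that rule of thumb; the function name and the example hardware numbers are hypothetical:

```python
def recommended_pool_size(core_count: int, effective_spindle_count: int) -> int:
    """Rule-of-thumb starting point from the PostgreSQL wiki page on
    connection counts: ((core_count * 2) + effective_spindle_count).
    A starting point for benchmarking, not a hard limit."""
    return (core_count * 2) + effective_spindle_count

# A hypothetical 8-core box with a 2-spindle RAID would start around
# 18 pooled connections -- far below the 400+ reported in this thread.
print(recommended_pool_size(8, 2))
```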


From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-12 02:20:26
Message-ID: 4FFE346A.7030402@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 07/12/2012 08:47 AM, Yan Chunlu wrote:
> Really sorry for the lack of information
I shouldn't have grumped like that either, sorry about that.

> I didn't mention the connections number because I don't think my app
> is that busy, and the large number connections was caused by slow queries.

Yep - assumptions are a killer like that.

Now you know to watch your system load with iostat, vmstat, top, etc and
to monitor your overall load.

--
Craig Ringer


From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-12 05:10:06
Message-ID: CAOA66tGY7K=3aPagXUsmFu+3YXn+5gmM7HvxvoPz7YSk-=6EWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

After checking out the wiki page Maciek mentioned, it turns out that a
heavy connection count also burdens the disk heavily.
It looks like I am in a vicious circle:
1, slow queries cause connections to block, so the clients request more
connections.
2, more connections cause high disk io and make even the simplest query
slow and blocked.

I guess I should optimize those queries first...

On Thu, Jul 12, 2012 at 10:20 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>wrote:

> On 07/12/2012 08:47 AM, Yan Chunlu wrote:
>
> Really sorry for the lack of information
>
> I shouldn't have grumped like that either, sorry about that.
>
>
> I didn't mention the connection count because I didn't think my app was
> that busy, and I thought the large number of connections was caused by the
> slow queries.
>
>
> Yep - assumptions are a killer like that.
>
> Now you know to watch your system load with iostat, vmstat, top, etc and
> to monitor your overall load.
>
> --
> Craig Ringer
>


From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-12 06:56:29
Message-ID: 4FFE751D.3050105@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 07/12/2012 01:10 PM, Yan Chunlu wrote:
> after checking out the wiki page Maciek mentioned, it turns out that a
> heavy connection count also burdens the disk heavily.
> it looks like I am in a vicious circle:
> 1, slow queries cause connections to block, so the clients request more
> connections.
> 2, more connections cause high disk io and make even the simplest query
> slow and blocked.

While true, you can often control this by making sure you don't
completely overload your hardware, queuing queries instead of running
them all at once.

You may still discover that your hardware can't cope with the workload
in that your queues may just keep on getting deeper or time out. In that
case, you certainly need to optimise your queries, tune your database,
and/or get bigger hardware.
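Queuing queries instead of running them all at once can be done client-side by capping concurrency with a semaphore. A minimal sketch under the assumption of a threaded Python client; `MAX_CONCURRENT` and the worker body are hypothetical stand-ins for real database work:

```python
import threading

MAX_CONCURRENT = 4  # hypothetical cap; tune to what the hardware sustains
slots = threading.BoundedSemaphore(MAX_CONCURRENT)
lock = threading.Lock()
in_flight = 0
peak = 0  # track the highest observed concurrency

def run_query(i):
    """Stand-in for real database work. Only MAX_CONCURRENT callers run
    at once; the rest block (queue) here instead of piling onto the server."""
    global in_flight, peak
    with slots:
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        # ... execute the actual query here ...
        with lock:
            in_flight -= 1

threads = [threading.Thread(target=run_query, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak <= MAX_CONCURRENT)  # the cap was never exceeded
```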

--
Craig Ringer


From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-12 12:48:01
Message-ID: CAOA66tGPS0crmWXZ6tr6Y1-HvNOmuGV3Jfu8kg2U9yhdg2gPeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Yes, the system seems overloaded. I am dealing with a simple "INSERT", but
I'm not sure if it is normal that it took more time than EXPLAIN estimated:

explain analyze INSERT INTO vote_content ( thing1_id, thing2_id, name,
date) VALUES (1,1, E'1', '2012-07-12T12:34:29.926863+00:00'::timestamptz)

QUERY PLAN

------------------------------------------------------------------------------------------
Insert (cost=0.00..0.01 rows=1 width=0) (actual time=79.610..79.610
rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.058..0.060
rows=1 loops=1)
Total runtime: 79.656 ms

it is a table with *50 million* rows, so I'm not sure if it is too large...
I have attached the schema below:

Column | Type |
Modifiers
-----------+--------------------------+------------------------------------------------------------------------------------
rel_id | bigint | not null default
nextval('vote_content_rel_id_seq'::regclass)
thing1_id | bigint | not null
thing2_id | bigint | not null
name | character varying | not null
date | timestamp with time zone | not null
Indexes:
"vote_content_pkey" PRIMARY KEY, btree (rel_id)
"vote_content_thing1_id_key" UNIQUE, btree (thing1_id, thing2_id, name)
"idx_date_vote_content" btree (date)
"idx_name_vote_content" btree (name)
"idx_thing1_id_vote_content" btree (thing1_id)
"idx_thing1_name_date_vote_content" btree (thing1_id, name, date)
"idx_thing2_id_vote_content" btree (thing2_id)
"idx_thing2_name_date_vote_content" btree (thing2_id, name, date)

Besides, it's not rush hour, so the disk IO is not the problem
currently (I think):
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz
avgqu-sz await r_await w_await svctm %util
sda 0.00 44.50 9.50 21.50 76.00 264.00 21.94
0.16 5.10 12.42 1.86 4.39 13.60
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00
0.00 0.00 0.00 0.00 0.00 0.00

On Thu, Jul 12, 2012 at 2:56 PM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:

> On 07/12/2012 01:10 PM, Yan Chunlu wrote:
>
> after checking out the wiki page Maciek mentioned, it turns out that a
> heavy connection count also burdens the disk heavily.
> it looks like I am in a vicious circle:
> 1, slow queries cause connections to block, so the clients request more
> connections.
> 2, more connections cause high disk io and make even the simplest query
> slow and blocked.
>
>
> While true, you can often control this by making sure you don't completely
> overload your hardware, queuing queries instead of running them all at once.
>
> You may still discover that your hardware can't cope with the workload in
> that your queues may just keep on getting deeper or time out. In that case,
> you certainly need to optimise your queries, tune your database, and/or get
> bigger hardware.
>
> --
> Craig Ringer
>


From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-12 14:39:28
Message-ID: 4FFEE1A0.7060500@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 07/12/2012 08:48 PM, Yan Chunlu wrote:
>
>
> explain analyze INSERT INTO vote_content ( thing1_id, thing2_id, name,
> date) VALUES (1,1, E'1', '2012-07-12T12:34:29.926863+00:00'::timestamptz)
>
> QUERY PLAN
> ------------------------------------------------------------------------------------------
> Insert (cost=0.00..0.01 rows=1 width=0) (actual time=79.610..79.610
> rows=0 loops=1)
> -> Result (cost=0.00..0.01 rows=1 width=0) (actual
> time=0.058..0.060 rows=1 loops=1)
> Total runtime: 79.656 ms
>
> it is a table with *50 million* rows, so not sure if it is too
> large... I have attached the schema below:

You have eight indexes on that table according to the schema you showed.
Three of them cover three columns. Those indexes are going to be
expensive to update; frankly I'm amazed it's that FAST to update them
when they're that big.

Use pg_size_pretty(pg_relation_size('index_name')) to get the index
sizes and compare to the pg_relation_size of the table. It might be
informative.
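A sketch of that comparison against the schema posted earlier in the thread (run on the live database; the table and index names come from the posted schema):

```sql
-- Size of the table itself.
SELECT pg_size_pretty(pg_relation_size('vote_content')) AS table_size;

-- Size of each of its indexes, largest first.
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'vote_content'
ORDER BY pg_relation_size(indexrelid) DESC;
```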

You may see some insert performance benefits with a non-100% fill factor
on the indexes, but with possible performance costs to index scans.

--
Craig Ringer


From: Ants Aasma <ants(at)cybertec(dot)at>
To: Yan Chunlu <springrider(at)gmail(dot)com>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-12 16:07:17
Message-ID: CA+CSw_sWjnGoKPSqPKBB-u1Phe0ba-bm428wp-11-=qtRazkcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, Jul 12, 2012 at 3:48 PM, Yan Chunlu <springrider(at)gmail(dot)com> wrote:
> Yes, the system seems overloaded. I am dealing with a simple "INSERT", but
> I'm not sure if it is normal that it took more time than EXPLAIN estimated:

The estimated cost is in arbitrary units, its purpose is to compare
different execution plans, not estimate time taken. So it's completely
normal that it doesn't match actual time taken.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Yan Chunlu <springrider(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-12 18:53:35
Message-ID: CAMkU=1xb-5uqpc=fG42vmWUo9RrxDKE5pnwpNHBY=4rs8XE39g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, Jul 12, 2012 at 9:07 AM, Ants Aasma <ants(at)cybertec(dot)at> wrote:
> On Thu, Jul 12, 2012 at 3:48 PM, Yan Chunlu <springrider(at)gmail(dot)com> wrote:
>> Yes, the system seems overloaded. I am dealing with a simple "INSERT", but
>> I'm not sure if it is normal that it took more time than EXPLAIN estimated:
>
> The estimated cost is in arbitrary units, its purpose is to compare
> different execution plans, not estimate time taken. So it's completely
> normal that it doesn't match actual time taken.

Right. And to make explicit what you implied, when there is only one way
to do something (like insert a row, or do maintenance on an index) the
planner often doesn't even attempt to cost that at all, as there is no
choice. So it is not just a matter of units.

Cheers,

Jeff


From: Yan Chunlu <springrider(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: how could select id=xx so slow?
Date: 2012-07-13 04:02:02
Message-ID: CAOA66tEcE9wRaTLAYdS+y+L_FA86=2MZF53N+hqZ7TAuGBGnQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Got it, thanks! Without your help I would really have no idea what should
be fast and what is supposed to be slower.

I also found that a "SELECT" can involve a lot of writes:

iotop shows:

2789 be/4 postgres 0.00 B 57.34 M 0.00 % 0.00 % postgres: goov
conta 192.168.1.129(27300) SELECT

I knew that a SELECT could cause writes, but not at this magnitude...

On Fri, Jul 13, 2012 at 2:53 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Thu, Jul 12, 2012 at 9:07 AM, Ants Aasma <ants(at)cybertec(dot)at> wrote:
> > On Thu, Jul 12, 2012 at 3:48 PM, Yan Chunlu <springrider(at)gmail(dot)com>
> wrote:
> >> Yes, the system seems overloaded. I am dealing with a simple "INSERT",
> >> but I'm not sure if it is normal that it took more time than EXPLAIN
> >> estimated:
> >
> > The estimated cost is in arbitrary units, its purpose is to compare
> > different execution plans, not estimate time taken. So it's completely
> > normal that it doesn't match actual time taken.
>
> Right. And to make explicit what you implied, when there is only one way
> to do something (like insert a row, or do maintenance on an index) the
> planner often doesn't even attempt to cost that at all, as there is no
> choice. So it is not just a matter of units.
>
> Cheers,
>
> Jeff
>