Help in avoiding a query 'Warm-Up' period/shared buffer cache

From: Mark Liberman <mliberman(at)mixedsignals(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Help in avoiding a query 'Warm-Up' period/shared buffer cache
Date: 2006-01-04 22:49:43
Message-ID: 200601041449.43557.mliberman@mixedsignals.com
Lists: pgsql-performance

Hello,

We have a web-application running against a postgres 8.1 database, and
basically, every time I run a report after no other reports have been run for
several hours, the report takes significantly longer (e.g. 30 seconds) than
if I re-run the report again, or run it when the web-application has been
used recently (< 1 second).

Keep in mind that our web-app might issue 30 or more individual queries to
return a given report, and that this behavior is not isolated to a single
report type - it happens for basically any of the reports after the web-app
has been inactive. Also, I can trace it back to the timing of the underlying
queries, which show the same behavior (i.e. it's not overhead in our
web-app).

So, it appears to be some sort of caching issue. I'm not 100% clear on how
the shared buffer cache works, and what we might do to make sure that we
don't have these periods where queries take a long time. Since our users'
typical usage scenario is to not use the web-app for a long time and then
come back and use it, if reports which generally take a second are taking 30
seconds, we have a real problem.

I have isolated a single example of one such query which is very slow when no
other queries have been run, and then speeds up significantly on the second
run.

First run, after a night of inactivity:

explain analyze SELECT average_size, end_time FROM 1min_events
WHERE file_id = '137271' AND end_time > now() - interval '2 minutes'
ORDER BY end_time DESC LIMIT 1;

 Limit  (cost=47.06..47.06 rows=1 width=24) (actual time=313.585..313.585 rows=1 loops=1)
   ->  Sort  (cost=47.06..47.06 rows=1 width=24) (actual time=313.584..313.584 rows=1 loops=1)
         Sort Key: end_time
         ->  Bitmap Heap Scan on 1min_events  (cost=44.03..47.05 rows=1 width=24) (actual time=313.562..313.568 rows=2 loops=1)
               Recheck Cond: ((end_time > (now() - '00:02:00'::interval)) AND (file_id = 137271))
               ->  BitmapAnd  (cost=44.03..44.03 rows=1 width=0) (actual time=313.551..313.551 rows=0 loops=1)
                     ->  Bitmap Index Scan on 1min_events_end_idx  (cost=0.00..5.93 rows=551 width=0) (actual time=0.076..0.076 rows=46 loops=1)
                           Index Cond: (end_time > (now() - '00:02:00'::interval))
                     ->  Bitmap Index Scan on 1min_events_file_id_begin_idx  (cost=0.00..37.85 rows=3670 width=0) (actual time=313.468..313.468 rows=11082 loops=1)
                           Index Cond: (file_id = 137271)
 Total runtime: 313.643 ms
(11 rows)

Second run, after that:

explain analyze SELECT average_size, end_time FROM 1min_events
WHERE file_id = '137271' AND end_time > now() - interval '2 minutes'
ORDER BY end_time DESC LIMIT 1;

 Limit  (cost=47.06..47.06 rows=1 width=24) (actual time=2.209..2.209 rows=1 loops=1)
   ->  Sort  (cost=47.06..47.06 rows=1 width=24) (actual time=2.208..2.208 rows=1 loops=1)
         Sort Key: end_time
         ->  Bitmap Heap Scan on 1min_events  (cost=44.03..47.05 rows=1 width=24) (actual time=2.192..2.194 rows=2 loops=1)
               Recheck Cond: ((end_time > (now() - '00:02:00'::interval)) AND (file_id = 137271))
               ->  BitmapAnd  (cost=44.03..44.03 rows=1 width=0) (actual time=2.186..2.186 rows=0 loops=1)
                     ->  Bitmap Index Scan on 1min_events_end_idx  (cost=0.00..5.93 rows=551 width=0) (actual time=0.076..0.076 rows=46 loops=1)
                           Index Cond: (end_time > (now() - '00:02:00'::interval))
                     ->  Bitmap Index Scan on 1min_events_file_id_begin_idx  (cost=0.00..37.85 rows=3670 width=0) (actual time=2.106..2.106 rows=11082 loops=1)
                           Index Cond: (file_id = 137271)
 Total runtime: 2.276 ms
(11 rows)

One of the things that is perplexing about the initial slowness of this query
is that it's accessing the most recent rows in a given table (e.g. those in
the last 2 minutes). So, I would expect the OS cache to be updated with
these new rows.

Some general information about the server / db:

1) The database is 25 GB and has about 60 tables - some very small, but
several with more than 5 MM rows.

2) The table I am querying against above (1min_events) has 5.5 MM rows. It is
indexed on end_time and also has a compound index on (file_id, begin_time).

3) The following are running on the server that holds the db:

a) A program which is reading files and making several (5-10) database calls
per minute (these calls tend to take < 100 ms each). These calls insert tens
of rows into several of the tables.

b) An Apache web server

c) The 8.1 postgres DB

d) we are running periodic CRON jobs (generally at 11pm, 1 am and 3am) that
truncate some of the older data

e) autovacuum is on with a 60-second naptime and low scale factors (0.2), so
analyzes and vacuums happen throughout the day - vacuums are also generally
triggered by the truncate CRON jobs.

4) Some of our config settings:

shared_buffers = 8192
work_mem = 8192

Total RAM on server is 1 Gig
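
(If I have the units right, shared_buffers = 8192 works out to 8192 x 8 kB =
64 MB, and work_mem = 8192 is 8 MB per sort, out of that 1 GB of RAM.)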

Basically any advice as to what to look at to avoid this situation would be
greatly appreciated. Is this simply a matter of tuning the shared_buffers
parameter? If not, is scheduling a set of queries to force the proper
loading of the cache a logical solution?
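
For example, would it be reasonable to have cron run something like the
following every few minutes for each active file_id, purely to pull the
relevant index and heap pages back into the cache? (The file_id is hard-coded
here just for illustration, and I've quoted the table name since it starts
with a digit.)

SELECT count(*)
FROM "1min_events"
WHERE file_id = '137271'
  AND end_time > now() - interval '30 minutes';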

Thanks in advance,

Mark


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Help in avoiding a query 'Warm-Up' period/shared buffer cache
Date: 2006-01-05 23:12:13
Message-ID: dpk922$slo$1@news.hub.org
Lists: pgsql-performance


"Mark Liberman" <mliberman(at)mixedsignals(dot)com> wrote
>
> First run, after a night of inactivity:
>
>    ->  Bitmap Index Scan on 1min_events_file_id_begin_idx
>        (cost=0.00..37.85 rows=3670 width=0) (actual time=313.468..313.468 rows=11082 loops=1)
>          Index Cond: (file_id = 137271)
> Total runtime: 313.643 ms
>
> Second run, after that:
>
>    ->  Bitmap Index Scan on 1min_events_file_id_begin_idx
>        (cost=0.00..37.85 rows=3670 width=0) (actual time=2.106..2.106 rows=11082 loops=1)
>          Index Cond: (file_id = 137271)
> Total runtime: 2.276 ms

It is clear that the first query takes longer because of the I/O time spent
on index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106 ms). I am
afraid there is currently no easy solution for this, unless you can predict
which part of the relation/index your query will use; then you could preload
or "warm up" the cache for it.

Regards,
Qingqing


From: Mark Liberman <mliberman(at)mixedsignals(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Help in avoiding a query 'Warm-Up' period/shared buffer cache
Date: 2006-01-06 02:15:36
Message-ID: 200601051815.36181.mliberman@mixedsignals.com
Lists: pgsql-performance

On Thursday 05 January 2006 15:12, Qingqing Zhou wrote:
> "Mark Liberman" <mliberman(at)mixedsignals(dot)com> wrote
>
> > First run, after a night of inactivity:
> >
> >    ->  Bitmap Index Scan on 1min_events_file_id_begin_idx
> >        (cost=0.00..37.85 rows=3670 width=0) (actual time=313.468..313.468 rows=11082 loops=1)
> >          Index Cond: (file_id = 137271)
> > Total runtime: 313.643 ms
> >
> > Second run, after that:
> >
> >    ->  Bitmap Index Scan on 1min_events_file_id_begin_idx
> >        (cost=0.00..37.85 rows=3670 width=0) (actual time=2.106..2.106 rows=11082 loops=1)
> >          Index Cond: (file_id = 137271)
> > Total runtime: 2.276 ms
>
> It is clear that the first query takes longer because of the I/O time spent
> on index 1min_events_file_id_begin_idx (see 313.468 vs. 2.106 ms). I am
> afraid there is currently no easy solution for this, unless you can predict
> which part of the relation/index your query will use; then you could
> preload or "warm up" the cache for it.
>
> Regards,
> Qingqing

Thanks Qingqing,

this actually helped me determine that the compound index,
1min_events_file_id_begin_idx, is not the proper index to use, as it is based
on file_id and begin_time - the latter of which is not involved in the where
clause. It is only using that index to "filter" out the listed file_id.

Now, my follow-up question / assumption: I am assuming that the I/O time on
that index is so long because it has to read the entire index (for that
file_id) into memory - it cannot restrict the scan to a date range, since
begin_time is not in the where clause.

But if I replaced that compound index with the proper compound index on
file_id / end_time, it should give performance similar to the scan on
1min_events_end_idx (which was < 1 ms) - i.e. the most recently updated rows
are more likely to be in the cache, and it is smart enough to read only the
index rows that it needs.

Alternatively, I could create a single index on file_id (and rely on the new
bitmap scan capabilities in 8.1). But I fear that, although this index would
be smaller than the erroneous compound index on file_id / begin_time, it
would still show the same behavior: it would need to read all of the rows in
that index for the given file_id, and since the data goes back one row per
minute for 60 days, that I/O could be large.
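
For illustration, the two alternatives I'm weighing would look roughly like
this (the index names are made up, and the identifiers are quoted because
they start with a digit):

-- option 1: compound index matching the where clause
CREATE INDEX "1min_events_file_id_end_idx" ON "1min_events" (file_id, end_time);

-- option 2: narrower single-column index, relying on the 8.1 bitmap-AND behavior
CREATE INDEX "1min_events_file_id_idx" ON "1min_events" (file_id);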

Obviously I will be testing this, but it might take a few days, as I haven't
figured out how to simulate the "period of inactivity" that gets the data
flushed out of the cache ... so I have to run each test in the morning. But
any confirmation / corrections to my assumptions are greatly appreciated.
E.g. is the compound index the way to go, or the solo index on file_id?

Thanks,

Mark


From: David Lang <dlang(at)invendra(dot)net>
To: Mark Liberman <mliberman(at)mixedsignals(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Help in avoiding a query 'Warm-Up' period/shared buffer
Date: 2006-01-06 02:50:22
Message-ID: Pine.LNX.4.62.0601051848510.11046@qnivq.ynat.uz
Lists: pgsql-performance

On Thu, 5 Jan 2006, Mark Liberman wrote:

> Obviously I will be testing this, but it might take a few days, as I haven't
> figured out how to simulate the "period of inactivity" that gets the data
> flushed out of the cache ... so I have to run each test in the morning.

cat large_file >/dev/null

will probably do a pretty good job of this (especially if large_file is
noticeably larger than the amount of RAM you have).

David Lang


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Help in avoiding a query 'Warm-Up' period/shared buffer cache
Date: 2006-01-06 03:08:05
Message-ID: dpkms7$1van$1@news.hub.org
Lists: pgsql-performance


"Mark Liberman" <mliberman(at)mixedsignals(dot)com> wrote
>
> Now, my follow-up question / assumption: I am assuming that the I/O time on
> that index is so long because it has to read the entire index (for that
> file_id) into memory
>
> any confirmation / corrections to my assumptions are greatly appreciated.
> E.g. is the compound index the way to go, or the solo index on file_id?
>

Only part of the index file is read - it is a btree index. Keeping the index
small but sufficient to guide your search is always good, because even with
the guidance of the index, a heap visit to fetch the real data is
unavoidable.
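
If I remember right, 8.1 also has pg_relation_size() and pg_size_pretty(), so
you can compare the on-disk size of the indexes you are choosing between,
e.g.:

SELECT pg_size_pretty(pg_relation_size('1min_events_file_id_begin_idx')),
       pg_size_pretty(pg_relation_size('1min_events_end_idx'));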

Regards,
Qingqing


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: David Lang <dlang(at)invendra(dot)net>
Cc: Mark Liberman <mliberman(at)mixedsignals(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Help in avoiding a query 'Warm-Up' period/shared buffer
Date: 2006-01-06 03:14:21
Message-ID: 20060106031421.GM43311@pervasive.com
Lists: pgsql-performance

On Thu, Jan 05, 2006 at 06:50:22PM -0800, David Lang wrote:
> On Thu, 5 Jan 2006, Mark Liberman wrote:
>
> >Obviously I will be testing this, but it might take a few days, as I
> >haven't figured out how to simulate the "period of inactivity" that gets
> >the data flushed out of the cache ... so I have to run each test in the
> >morning.
>
> cat large_file >/dev/null
>
> will probably do a pretty good job of this (especially if large_file is
> noticeably larger than the amount of RAM you have)

The following C code is much faster...

/*
 * $Id: clearmem.c,v 1.1 2003/06/29 20:41:33 decibel Exp $
 *
 * Utility to clear out a chunk of memory and zero it. Useful for flushing disk buffers.
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    if (argc < 2)
        return 1;                       /* expects the number of megabytes as argv[1] */
    /* allocate and zero that many megabytes, pushing cached file pages out of RAM */
    if (!calloc(atoi(argv[1]), 1024 * 1024))
        printf("Error allocating memory.\n");
    return 0;
}

Compile it and then pass in the number of MB of memory to allocate on
the command line.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461