Re: Are bitmap index scans slow to start?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Are bitmap index scans slow to start?
Date: 2013-02-26 18:11:57
Message-ID: CAMkU=1wc1uMsx5viu6NHU=LFn3q_AjwUWE0y6vuCN=eYKZ+eAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Feb 25, 2013 at 9:04 AM, Carlo Stonebanks <
stonec(dot)register(at)sympatico(dot)ca> wrote:

> Hi Jeff, thanks for the insight.****
>
> ** **
>
> << And then the next question would be, once they are in the cache, why
> don't they stay there? For that you would have to know what other types of
> activities are going on that might be driving the data out of the cache.**
> **
>
> >>** **
>
> ** **
>
> To give you an idea of the activity level, each physical machine hosts
> multiple DB’s with the same structure – one DB per client.****
>
> ** **
>
> We run automated ETL processes which digests client feeds (E) normalizes
> them (T) and then stores them in our DB (L).****
>
> ** **
>
> Looking at the stats from our audit log, the average feed load is 4 hours,
> divided up into 14 client sessions. Each session averages about 50 write
> (update, insert, no deletes) operations per second, representing 700 write
> operations per second.
>

Is each of these write operations just covering a single row? Does this
description apply to just one of the many (how many?) databases, so that
there are really 14*N concurrent sessions?

> The ratio of reads per write is pretty high as the system goes through the
> transformation process.****
>
> ** **
>
> Since I don’t know how this compares to other PG installations, the
> question of using periodic REINDEX and CLUSTER brings up these questions:*
> ***
>
> ** **
>
> **1) **Because we are hosting multiple DB’s, what is the impact on
> OS and disk caches?
>

They have to share the RAM. One strategy would be run ETL processes only
one at a time, rather than trying to run several concurrently, if that is
what you are doing. That way you can concentrate one customers data in
RAM, and then another's, to reduce the competition.

> ****
>
> **2) **Is there an automated CLUSTER and REINDEX strategy that will
> not interfere with normal operations?****
>
> **3) **By PG standards, is this a busy DB - and does explain why the
> general caches expire?
>

You really need to know whether those reads and writes are concentrated in
a small region (relative to the amount of your RAM), or widely scattered.
If you are reading and writing intensively (which you do seem to be doing)
but only within a compact region, then it should not drive other data out
of the cache. But, since you do seem to have IO problems from cache
misses, and you do have a high level of activity, the easy conclusion is
that you have too little RAM to hold the working size of your data.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matt Daw 2013-02-26 19:35:45 Estimation question...
Previous Message Andre 2013-02-26 12:53:17 Re: Server stalls, all CPU 100% system time