Re: Are bitmap index scans slow to start?

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: "'Jeff Janes'" <jeff(dot)janes(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Are bitmap index scans slow to start?
Date: 2013-03-08 21:27:54
Message-ID: 005001ce1c43$cbd9acb0$638d0610$@sympatico.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Sorry this took so long to get back to you. Here is where we were:

I said: <<

So, this query is not called often, but the fact is that if it takes over 30
seconds to load an item (because the audit report takes so long to prepare
the bitmap index scan when passed new query parameters) then it severely
restricts how much data we can resurrect at any one time.

>>

Your reply: <<
Is that a restriction you have observed, or are you extrapolating based on a
single query? If you run a bunch of similar queries in close succession, it
is likely that the first few queries will warm up the cache, and following
queries will then run much faster. Also, if you restructure the series of
queries into a large one that reconstructs many rows simultaneously, it
might choose a more efficient path than if it is fed the queries one at a
time.
>>

Actual observation. The first run with a new parameter actually takes 90
seconds. Another run with the same parameter takes 15-30 seconds. Running
the query immediately afterwards with different parameters starts with a new
90 seconds query. Unfortunately, since going to LINUX, our sys ops hiss and
snarl at anyone who comes anywhere near machine or DB server configs, so I
am no longer well informed on how well optimized the machines are.

Ultimately, the machines need to be optimized by an expert. As I mentioned
before, our ETL is entirely single-load reads-and-writes (I didn't go into
the "why" of this because the nature of the data and the product dictates
this). And this is an example of one of the few complex joins that return
hundreds/thousands of rows. The problem is that a full index scan has to be
done before we can start building the results. So, if clustering will help
such that the index scan KNOWS that there's no point is scanning the rest of
the index because we've gone beyond the maximum value in the list of
possible values, then that would help, as each table being scanned has 50 -
100 million rows (there is one table for every month of production).

As always, thanks.

From: Jeff Janes [mailto:jeff(dot)janes(at)gmail(dot)com]
Sent: March 5, 2013 4:21 PM
To: Carlo Stonebanks
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Are bitmap index scans slow to start?

On Wed, Feb 27, 2013 at 1:38 PM, Carlo Stonebanks
<stonec(dot)register(at)sympatico(dot)ca> wrote:

>> Is the original query you posted part of the transform process, rather
than being the production query you run after the ETL is over?

Neither, it is part of our auditing and maintenance processes. It is not
called with any great frequency. The audit report generates rows defining
how the a particular item (an "item" being a particular table/row) was
created: it returns the names of the import tables, the row ids, the write
operations and any transformation messages that may have been generated -
all in the order they occurred.


...

So, this query is not called often, but the fact is that if it takes over 30
seconds to load an item (because the audit report takes so long to prepare
the bitmap index scan when passed new query parameters) then it severely
restricts how much data we can resurrect at any one time.

Is that a restriction you have observed, or are you extrapolating based on a
single query? If you run a bunch of similar queries in close succession, it
is likely that the first few queries will warm up the cache, and following
queries will then run much faster. Also, if you restructure the series of
queries into a large one that reconstructs many rows simultaneously, it
might choose a more efficient path than if it is fed the queries one at a
time.

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Gregg Jaskiewicz 2013-03-09 17:53:19 Re: New server setup
Previous Message Emre Hasegeli 2013-03-08 19:56:23 Re: PostgreSQL 9.2.3 performance problem caused Exclusive locks