Re: How to optimize select count(*)..group by?

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bryan Field-Elliot <bryan_lists(at)netmeme(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: How to optimize select count(*)..group by?
Date: 2005-07-28 17:22:04
Message-ID: 87fyty977n.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Bryan Field-Elliot <bryan_lists(at)netmeme(dot)org> writes:

> We have this simple query:
>
> select status, count(*) from customer group by status;
>
> There is already a btree index on status, but, the customer table is
> huge, and this query must be executed very frequently... an "explain" on
> this query shows that it is quite costly (and we notice it runs
> slowly)...
>
> Can someone recommend the best technique to optimize this? We can create
> new indices, we can re-write this query.. But we'd rather not add new
> tables or columns if possible (not just to solve this problem).

Without creating a new table you have three options, none of which will be
very fast:

1) If the entire table fits in ram then you could try setting random_page_cost
close enough to 1 to cause the index to be used.

2) If it doesn't but there are a reasonably small number of distinct values of
status you would hope to see a sequential scan and a hash aggregate being
used. I would expect this to be what you would see with the default
configuration. If not you might have to increase work_mem (or sort_mem
depending on your version I think).

3) If you have a large table with a large set of status values then Postgres
may be compelled to resort the entire table. In which case you should
experiment with work_mem/sort_mem to get the largest value you can without
inducing swapping. You could also experiment with putting pgsql_tmp on
separate spindles.

None of these will be fast enough for OLTP access like a web page. You would
be best off performing this query periodically and stuffing the results in a
table for quick access.

--
greg

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Scott Marlowe 2005-07-28 17:23:39 Re: How to optimize select count(*)..group by?
Previous Message Jaime Casanova 2005-07-28 17:19:04 Fwd: Trigger ad mutli database