Re: Yet Another COUNT(*)...WHERE...question

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Phoenix Kiula" <phoenix(dot)kiula(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Postgres General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Yet Another COUNT(*)...WHERE...question
Date: 2007-08-15 15:24:13
Message-ID: dcc563d10708150824q2ed10e27g6457dcec8cf14671@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 8/15/07, Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> wrote:
> On 15/08/07, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
> > On 8/15/07, Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> wrote:
> > > On 15/08/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
> > > > "Phoenix Kiula" <phoenix(dot)kiula(at)gmail(dot)com> writes:
> > > >
> > > > > I'm grappling with a lot of reporting code for our app that relies on
> > > > > queries such as:
> > > > >
> > > > > SELECT COUNT(*) FROM TABLE WHERE ....(conditions)...
> > > > >...
> > > > > The number of such possibilities for multiple WHERE conditions is
> > > > > infinite...
> > > >
> > > > Depends on the "conditions" bit. You can't solve all of the infinite
> > > > possibilities -- well you can, just run the query above -- but if you want > to do better it's all about understanding your data.
> > >
> > >
> > > I am not sure what the advice here is. The WHERE condition comes from
> > > the indices. So if the query was not "COUNT(*)" but just a couple of
> > > columns, the query executes in less than a second. Just that COUNT(*)
> > > becomes horribly slow.
> >
> > Sorry, but I don't believe you. if you're doing a count(*) on the
> > same dataset that returns in < 1 second, then the count(*) with the
> > same where clause will run in < 1 second. I haven't seen pgsql do
> > anything else.
>
>
>
> Sorry I was not clear. Imagine an Amazon.com search results page. It
> has about 15 results on Page 1, then it shows "Page 1 of 190".
>
> To show each page, the query probably has a "LIMIT 15 OFFSET 0" for
> Page 1. However, to calculate the total number of pages, they probably
> do a separate counts query, because doing a "select *" and then
> counting the number of rows returned would be even more inefficient
> than a count(*).

When I go to amazon.com I only ever get three pages of results. ever.
Because they know that returning 190 pages is not that useful, as
hardly anyone is going to wander through that many pages.

Google, you'll notice says "Results 1 - 10 of about 5,610,000 for
blacksmith" i.e. it's guesstimating as well. no reason for google to
look at every single row for blacksmith to know that there's about 5.6
million.

> So, in reporting, two queries are fairly common I would think, unless
> I am missing something?

Yes, optimization. :) You don't need an exact count to tell someone
that there's more data and they can go to it. Note that if you are
planning on doing things google sized, you'll need to do what they
did, invent your own specialized database.

For us mere mortals, it's quite likely that you can do something like:

explain select * from table where field like 'abc%';

and then parse the explain output for an approximate number.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Richard Broersma Jr 2007-08-15 15:29:30 Re: Yet Another COUNT(*)...WHERE...question
Previous Message Phoenix Kiula 2007-08-15 15:22:51 Customizing psql console to show execution times