Re: Performance of count(*) on large tables vs SQL Server

From: Stef <stef(at)ummon(dot)com>
To: Shridhar Daithankar <ghodechhap(at)ghodechhap(dot)net>
Cc: Andrew Mayo <ajmayo(at)kohuconsulting(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of count(*) on large tables vs SQL Server
Date: 2005-02-01 13:22:50
Message-ID: 20050201132250.GF32625@survivor.hades
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Hello Andrew,
Everything that Shridhar says makes perfect
sense, and, speaking from experience in dealing with
this type of 'problem', everything you say does as
well. Such is life really :)

I would not be at -all- surprised if Sybase
and Oracle did query re-writing behind the scene's
to send un-defined count's to a temporary table which
holds the row count. For an example of such done in
postgreSQL (using triggers and a custom procedure)
look into the 'General Bits' newsletter. Specifically
http://www.varlena.com/varlena/GeneralBits/49.php

I know, giving a URL as an answer 'sucks', but,
well, it simply repeats my experience. Triggers and
Procedures.

Regards
Steph

On Tue, Feb 01, 2005 at 06:32:56PM +0530, Shridhar Daithankar wrote:
> On Tuesday 01 Feb 2005 6:11 pm, Andrew Mayo wrote:
> > PG, on the other hand, appears to do a full table scan
> > to answer this question, taking nearly 4 seconds to
> > process the query.
> >
> > Doing an ANALYZE on the table and also VACUUM did not
> > seem to affect this.
> >
> > Can PG find a table's row count more efficiently?.
> > This is not an unusual practice in commercial
> > applications which assume that count(*) with no WHERE
> > clause will be a cheap query - and use it to test if
> > a table is empty, for instance. (because for
> > Oracle/Sybase/SQL Server, count(*) is cheap).
>
> First of all, such an assumption is no good. It should hit concurrency under
> heavy load but I know people do use it.
>
> For the specific question, after a vacuum analyze, you can use
>
> select reltuples from pg_class where relname='Foo';
>
> Remember, you will get different results between 'analyze' and 'vacuum
> analyze', since later actually visit every page in the table and hence is
> expected to be more accurate.
>
> > (sure, I appreciate there are other ways of doing
> > this, but I am curious about the way PG works here).
>
> Answer is MVCC and PG's inability use index alone. This has been a FAQ for a
> loong time.. Furthermore PG has custom aggregates to complicate the matter..
>
> Most of the pg developers/users think that unqualified select count(*) is of
> no use. You can search the archives for more details..
>
> HTH
>
> Shridhar
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message PFC 2005-02-01 13:36:30 Re: Performance of count(*) on large tables vs SQL Server
Previous Message Shridhar Daithankar 2005-02-01 13:02:56 Re: Performance of count(*) on large tables vs SQL Server