Re: Critical performance problems on large databases

From: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Critical performance problems on large databases
Date: 2002-04-11 00:20:28
Message-ID: Pine.LNX.4.21.0204110040270.2690-100000@ponder.fairway2k.co.uk
Lists: pgsql-general


On Wed, 10 Apr 2002, Tom Lane wrote:

> Gunther Schadow <gunther(at)aurora(dot)regenstrief(dot)org> writes:
> > We also noted that a
> > SELECT COUNT(*) FROM BigQuery;
> > can take quite a long time and again use a lot of resources,
> > whereas
> > SELECT COUNT(smallcolumn) FROM BigQuery;
> > may be faster and less resource consuming.
>
> This is complete nonsense... if anything, the second one will take
> more cycles, since it has to actually examine a column.
>

OK, I hate to do this, partly because I seem to remember mention that this sort
of thing has to happen this way because the system isn't clever enough to do
the optimisation, but...

Shouldn't
SELECT count(*) FROM chat_post
give an immediate answer, especially when there's an index it can use?

Here's an example. The index situation looks like overkill, but the primary key
is a two-column affair, whereas the 'time' column counted in the second select
has its own index:

avid_chat_archive=> SET ENABLE_SEQSCAN TO FALSE;
SET VARIABLE
avid_chat_archive=> EXPLAIN ANALYZE SELECT COUNT(*) FROM chat_post;
NOTICE: QUERY PLAN:

Aggregate  (cost=100020853.10..100020853.10 rows=1 width=0) (actual time=48557.48..48557.49 rows=1 loops=1)
  ->  Seq Scan on chat_post  (cost=100000000.00..100018291.08 rows=1024808 width=0) (actual time=6.68..32598.56 rows=1024808 loops=1)
Total runtime: 48557.93 msec

EXPLAIN
avid_chat_archive=> SELECT COUNT(*) FROM chat_post;
count
---------
1024808
(1 row)

avid_chat_archive=> EXPLAIN ANALYZE SELECT COUNT(time) FROM chat_post;
NOTICE: QUERY PLAN:

Aggregate  (cost=100020853.10..100020853.10 rows=1 width=8) (actual time=51314.54..51314.55 rows=1 loops=1)
  ->  Seq Scan on chat_post  (cost=100000000.00..100018291.08 rows=1024808 width=8) (actual time=6.78..35012.81 rows=1024808 loops=1)
Total runtime: 51314.97 msec

EXPLAIN
avid_chat_archive=> SELECT COUNT(time) FROM chat_post;
count
---------
1024808
(1 row)

avid_chat_archive=> \d chat_post

            Table "chat_post"
   Column    |           Type           | Modifiers
-------------+--------------------------+-----------
 session_id  | smallint                 | not null
 poster_name | character varying(32)    | not null
 time        | timestamp with time zone | not null
 post_number | smallint                 | not null
Indexes: chat_post_time_key,
         chat_post_user_key
Primary key: chat_post_pkey
Triggers: RI_ConstraintTrigger_4369014,
          RI_ConstraintTrigger_4369012,
          RI_ConstraintTrigger_4369010,
          RI_ConstraintTrigger_4369008,
          RI_ConstraintTrigger_4369006

avid_chat_archive=>

Having muttered about the primary key using two columns, I see the planner can
estimate the table size without having to resort to an index. Which makes
sense, if only I'd turned my brain on first.
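(For cases where an estimate is good enough, the planner's statistics can be
read directly from the system catalogs instead of counting at all; a sketch,
with the caveat that reltuples is only refreshed by VACUUM/ANALYZE and so lags
the true count:)

```sql
-- Approximate row count from the planner's own statistics, avoiding
-- the sequential scan entirely. reltuples is an estimate, not exact.
SELECT relname, reltuples
FROM pg_class
WHERE relname = 'chat_post';
```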

Anyway, the question still stands: why does postgres do this query this way?
It does a full sequential scan, i.e. fetches the tuples from disk, when that
data is not necessary for the query result. Is it down to the calling
requirements of count(), of aggregate functions generally, and/or of functions
in the select list, or does determining such optimisations simply require too
much intelligence from the system?
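(Where an exact count is needed repeatedly, one workaround, not from this
thread, is to maintain a running count in a side table via triggers, paying the
cost at write time instead of at SELECT time; a sketch in 7.2-era syntax, with
illustrative table and function names, and noting that the single counter row
serialises concurrent writers:)

```sql
-- Hypothetical side table holding the exact running count.
CREATE TABLE chat_post_count (n bigint NOT NULL);
INSERT INTO chat_post_count VALUES (0);

-- Trigger function keeping the counter in step with inserts/deletes.
CREATE FUNCTION chat_post_count_trig() RETURNS opaque AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        UPDATE chat_post_count SET n = n + 1;
    ELSIF TG_OP = ''DELETE'' THEN
        UPDATE chat_post_count SET n = n - 1;
    END IF;
    RETURN NULL;
END;
' LANGUAGE 'plpgsql';

CREATE TRIGGER chat_post_count_trig AFTER INSERT OR DELETE ON chat_post
    FOR EACH ROW EXECUTE PROCEDURE chat_post_count_trig();

-- The count then becomes a single-row read:
SELECT n FROM chat_post_count;
```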

Also, I can't see this specific item addressed in the FAQ. I'm sure I'm not the
only one to have brought this up (oh wait, I'm replying to a related message,
so obviously not). Shouldn't it be there?

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
