Re: Performance Issues with count()

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Jan Wieck <janwieck(at)yahoo(dot)com>
Cc: asdf asdasfa <sjg(at)email(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Performance Issues with count()
Date: 2002-04-23 23:53:37
Message-ID: 20020424095337.A17391@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Apr 23, 2002 at 07:25:34PM -0400, Jan Wieck wrote:
> In particular, the Perl script uses a hash table with one
> entry for each group. Now what happens if the input data is
> spread out and contains 1G groups? It'll simply blow away
> your script because it runs out of memory. This behaviour is
> unacceptable for a database system, so as you see in the
> Explain output, PostgreSQL sorts and groups the input data in
> temporary files before counting the rows. Due to that, the
> PostgreSQL solution to the problem requires a gazillion of IO
> operations, but it'll work whatever the input data is, given
> that there is enough disk space.

I've always had a little problem with this setup. Sure, if you're using 1
billion groups then you have a problem, but if you are going into only a
small number of groups, the sorting takes forever compared to using a hash.
Currently the planner beleives that the output of a group will be 1/10th of
the input, but I think the square root would be a better estimate.

Here we regularly summerise 2 million rows into 6 groups and it's almost
faster to do 6 sequential scans than it is to do the sort/group.

Besides, the running out of memory argument is silly, because if sorting
moves out to disk when it gets too big, you can do the same with a hash
table.
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Canada, Mexico, and Australia form the Axis of Nations That
> Are Actually Quite Nice But Secretly Have Nasty Thoughts About America

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2002-04-23 23:56:11 Re: PgAccess patching
Previous Message Bruce Momjian 2002-04-23 23:52:50 Re: nested transactions