Skip site navigation (1) Skip section navigation (2)

Peripheral Links

Header And Logo

PostgreSQL
| The world's most advanced open source database.

Site Navigation

Search for
  Advanced Search

Re: Group by more efficient than distinct?



On Fri, 18 Apr 2008 11:36:02 +0200, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:

"Francisco Reyes" <lists(at)stringsutils(dot)com> writes:

Is there any dissadvantage of using "group by" to obtain a unique list?

On a small dataset the difference was about 20% percent.

Group by
HashAggregate  (cost=369.61..381.12 rows=1151 width=8) (actual
time=76.641..85.167 rows=2890 loops=1)

	Basically :

- If you process up to some percentage of your RAM worth of data, hashing is going to be a lot faster - If the size of the hash grows larger than your RAM, hashing will fail miserably and sorting will be much faster since PG's disksort is really good
	- GROUP BY knows this and acts accordingly
	- DISTINCT doesn't know this, it only knows sorting, so it sorts
- If you need DISTINCT x ORDER BY x, sorting may be faster too (depending on the % of distinct rows)
	- If you need DISTINCT ON, well, you're stuck with the Sort
	- So, for the time being, you can replace DISTINCT with GROUP BY...



Home | Main Index | Thread Index

Privacy Policy | PostgreSQL Archives hosted by Command Prompt, Inc. | Designed by tinysofa
Copyright © 1996 – 2008 PostgreSQL Global Development Group