Skip site navigation (1) Skip section navigation (2)

Peripheral Links

Header And Logo

PostgreSQL
| The world's most advanced open source database.

Site Navigation

Search for
  Advanced Search

Re: Group by more efficient than distinct?



Title: Re: [PERFORM] Group by more efficient than distinct?
Hi Francisco,

Generally, PG sorting is much slower than hash aggregation for performing the distinct operation.  There may be small sizes where this isn’t true, but for large amounts of data (in-memory or not), hash agg (used most often, but not always by GROUP BY) is faster.

We’ve implemented a special optimization to PG sorting that does the distinct processing within the sort, instead of afterward, but it’s limited to some small-ish number (10,000) of distinct values due to it’s use of a memory and processing intensive heap.

So, you’re better off using GROUP BY and making sure that the planner is using hash agg to do the work.

- Luke


On 4/17/08 8:46 PM, "Francisco Reyes" <lists(at)stringsutils(dot)com> wrote:

I am trying to get a distinct set of rows from 2 tables.
After looking at someone else's query I noticed they were doing a group by
to obtain the unique list.

After comparing on multiple machines with several tables, it seems using
group by to obtain a distinct list is substantially faster than using
select distinct.

Is there any dissadvantage of using "group by" to obtain a unique list?

On a small dataset the difference was about 20% percent.

Group by
 HashAggregate  (cost=369.61..381.12 rows=1151 width=8) (actual
time=76.641..85.167 rows=2890 loops=1)

Distinct
 Unique  (cost=1088.23..1174.53 rows=1151 width=8) (actual
time=90.516..140.123 rows=2890 loops=1)

Although I don't have the numbers here with me, a simmilar result was
obtaining against a query that would return 100,000 rows. 20% and more
speed differnce between "group by" over "select distinct".  

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Home | Main Index | Thread Index

Privacy Policy | PostgreSQL Archives hosted by Command Prompt, Inc. | Designed by tinysofa
Copyright © 1996 – 2008 PostgreSQL Global Development Group