Re: Sort performance on large tables

From: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
To: "Charlie Savage" <cfis(at)interserv(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sort performance on large tables
Date: 2005-11-08 16:21:39
Message-ID: 3E37B936B592014B978C4415F90D662D0193B9F0@MI8NYCMAIL06.Mi8.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Charlie,

> Should I expect results like this? I realize that the
> computer is quite low-end and is very IO bound for this
> query, but I'm still surprised that the sort operation takes so long.

It's the sort performance of Postgres that's your problem.

> Out of curiosity, I setup an Oracle database on the same
> machine with the same data and ran the same query. Oracle
> was over an order of magnitude faster. Looking at its query
> plan, it avoided the sort by using "HASH GROUP BY." Does
> such a construct exist in PostgreSQL (I see only hash joins)?

Yes, hashaggregate does a similar thing. You can force the planner to
do it, don't remember off the top of my head but someone else on-list
will.

> Also as an experiment I forced oracle to do a sort by running
> this query:
>
> SELECT tlid, min(ogc_fid)
> FROM completechain
> GROUP BY tlid
> ORDER BY tlid;
>
> Even with this, it was more than a magnitude faster than Postgresql.
> Which makes me think I have somehow misconfigured postgresql
> (see the relevant parts of postgresql.conf below).

Just as we find with a similar comparison (with a "popular commercial,
proprietary database" :-) Though some might suggest you increase
work_mem or other tuning suggestions to speed sorting, none work. In
fact, we find that increasing work_mem actually slows sorting slightly.

We are commissioning an improved sorting routine for bizgres
(www.bizgres.org) which will be contributed to the postgres main, but
won't come out at least until 8.2 comes out, possibly 12 mos. In the
meantime, you will be able to use the new routine in the bizgres version
of postgres, possibly in the next couple of months.

Also - we (Greenplum) are about to announce the public beta of the
bizgres MPP database, which will use all of your CPUs, and those of
other nodes in a cluster, for sorting. We see a linear scaling of sort
performance, so you could add CPUs and/or hosts and scale out of the
problem.

Cheers,

- Luke

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Stephan Szabo 2005-11-08 17:38:21 Re: Sort performance on large tables
Previous Message Marc Morin 2005-11-08 13:19:44 Re: Sort performance on large tables