Re: Sort performance on large tables

Lists: pgsql-performance
From: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
To: "Charlie Savage" <cfis(at)interserv(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sort performance on large tables
Date: 2005-11-08 16:21:39
Message-ID: 3E37B936B592014B978C4415F90D662D0193B9F0@MI8NYCMAIL06.Mi8.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Charlie,

> Should I expect results like this? I realize that the
> computer is quite low-end and is very IO bound for this
> query, but I'm still surprised that the sort operation takes so long.

It's the sort performance of Postgres that's your problem.

> Out of curiosity, I setup an Oracle database on the same
> machine with the same data and ran the same query. Oracle
> was over an order of magnitude faster. Looking at its query
> plan, it avoided the sort by using "HASH GROUP BY." Does
> such a construct exist in PostgreSQL (I see only hash joins)?

Yes, hashaggregate does a similar thing. You can force the planner to
do it, don't remember off the top of my head but someone else on-list
will.

> Also as an experiment I forced oracle to do a sort by running
> this query:
>
> SELECT tlid, min(ogc_fid)
> FROM completechain
> GROUP BY tlid
> ORDER BY tlid;
>
> Even with this, it was more than a magnitude faster than Postgresql.
> Which makes me think I have somehow misconfigured postgresql
> (see the relevant parts of postgresql.conf below).

Just as we find with a similar comparison (with a "popular commercial,
proprietary database" :-) Though some might suggest you increase
work_mem or other tuning suggestions to speed sorting, none work. In
fact, we find that increasing work_mem actually slows sorting slightly.

We are commissioning an improved sorting routine for bizgres
(www.bizgres.org) which will be contributed to the postgres main, but
won't come out at least until 8.2 comes out, possibly 12 mos. In the
meantime, you will be able to use the new routine in the bizgres version
of postgres, possibly in the next couple of months.

Also - we (Greenplum) are about to announce the public beta of the
bizgres MPP database, which will use all of your CPUs, and those of
other nodes in a cluster, for sorting. We see a linear scaling of sort
performance, so you could add CPUs and/or hosts and scale out of the
problem.

Cheers,

- Luke


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Luke Lonergan <LLonergan(at)greenplum(dot)com>
Cc: Charlie Savage <cfis(at)interserv(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sort performance on large tables
Date: 2005-11-08 17:38:21
Message-ID: 20051108093500.N31541@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, 8 Nov 2005, Luke Lonergan wrote:

> > SELECT tlid, min(ogc_fid)
> > FROM completechain
> > GROUP BY tlid
> > ORDER BY tlid;
> >
> > Even with this, it was more than a magnitude faster than Postgresql.
> > Which makes me think I have somehow misconfigured postgresql
> > (see the relevant parts of postgresql.conf below).
>
> Just as we find with a similar comparison (with a "popular commercial,
> proprietary database" :-) Though some might suggest you increase
> work_mem or other tuning suggestions to speed sorting, none work. In
> fact, we find that increasing work_mem actually slows sorting slightly.

I wish you'd qualify your statements, because I can demonstrably show that
I can make sorts go faster on my machine at least by increasing work_mem
under some conditions.


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Kurt Harriman" <kharriman(at)greenplum(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sort performance on large tables
Date: 2005-11-08 19:09:07
Message-ID: BF9637D3.1338F%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Stephan,

On 11/8/05 9:38 AM, "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com> wrote:

>> >
>> > Just as we find with a similar comparison (with a "popular commercial,
>> > proprietary database" :-) Though some might suggest you increase
>> > work_mem or other tuning suggestions to speed sorting, none work. In
>> > fact, we find that increasing work_mem actually slows sorting slightly.
>
> I wish you'd qualify your statements, because I can demonstrably show that
> I can make sorts go faster on my machine at least by increasing work_mem
> under some conditions.
>
Cool ­ can you provide your test case please? I¹ll ask our folks to do the
same, but as I recall we did some pretty thorough testing and found that it
doesn¹t help. Moreover, the conclusion was that the current algorithm isn¹t
designed to use memory effectively.

Recognize also that we¹re looking for a factor of 10 or more improvement
here ­ this is not a small increase that¹s needed.

- Luke


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Kurt Harriman <kharriman(at)greenplum(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sort performance on large tables
Date: 2005-11-08 20:48:45
Message-ID: 20051108124255.Y43056@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


On Tue, 8 Nov 2005, Luke Lonergan wrote:

> Stephan,
>
> On 11/8/05 9:38 AM, "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com> wrote:
>
> >> >
> >> > Just as we find with a similar comparison (with a "popular commercial,
> >> > proprietary database" :-) Though some might suggest you increase
> >> > work_mem or other tuning suggestions to speed sorting, none work. In
> >> > fact, we find that increasing work_mem actually slows sorting slightly.
> >
> > I wish you'd qualify your statements, because I can demonstrably show that
> > I can make sorts go faster on my machine at least by increasing work_mem
> > under some conditions.
> >
> Cool can you provide your test case please?

I probably should have added the wink smiley to make it obvious I was
talking about the simplest case, things that don't fit in work_mem at the
current level but for which it's easy to raise work_mem to cover. It's not
a big a gain as one might hope, but it does certainly drop again.

> Recognize also that were looking for a factor of 10 or more improvement
> here this is not a small increase thats needed.

I agree that we definately need help on that regard. I do see the effect
where raising work_mem lowers the performance up until that point. I just
think that it requires more care in the discussion than disregarding the
suggestions entirely especially since people are going to see this in the
archives.