Re: Best way to load test a postgresql server

Lists: pgsql-performance
From: "Peter Sheats" <psheats(at)pbpost(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Best way to load test a postgresql server
Date: 2009-06-01 15:55:53
Message-ID: C6497249.20452%psheats@pbpost.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi,

I¹m about to set up a large instance on Amazon EC2 to be our DB server.

Before we switch to using it in production I would like to simulate some
load on it so that I know what it can handle and so that I can make sure I
have the optimal settings in the config file.

What is the best strategy out there for doing this? Does anyone know of
some resource that talks about doing this?

Thanks,

Peter


From: Alan McKay <alan(dot)mckay(at)gmail(dot)com>
To: Peter Sheats <psheats(at)pbpost(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-01 16:27:23
Message-ID: 844129e80906010927p5243b7cj588e821cbde63281@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Disclaimer : I'm very much a newbie here!

But I am on the path in my new job to figure this stuff out as well,
and went to PG Con here in Ottawa 2 weeks ago and attended quite a few
lectures on this topic. Have a look at :

http://wiki.postgresql.org/wiki/PgCon_2009

And in particular "Database Hardware Benchmarking" by Greg Smith
and
"Visualizing Postgres" by Michael Glaesmann
"Performance Whack-a-Mole" by Josh Berkus

--
“Mother Nature doesn’t do bailouts.”
- Glenn Prickett


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: "Peter Sheats" <psheats(at)pbpost(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-02 09:26:41
Message-ID: 87iqjf2dgu.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi, "Peter Sheats" <psheats(at)pbpost(dot)com> writes: > I’m about to
set up a large instance on Amazon EC2 to be our DB server. > >
Before we switch to using it in production I would like to
simulate some load on it so that I know what it can handle and so
that I can make sure I have the > optimal settings in the config
file. > > What is the best strategy out there for doing this?
Does anyone know of some resource that talks about doing this?
I'd recommand having a look at tsung which will be able to replay
a typical application scenario with as many concurrent users as
you want to:
http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php
http://tsung.erlang-projects.org/
http://pgfouine.projects.postgresql.org/tsung.html

If you want to replay your logs at the current production speed
and
concurrency, see Playr.
https://area51.myyearbook.com/trac.cgi/wiki/Playr

Regards,
--
dim


From: "Kenneth Cox" <kenstir(at)gmail(dot)com>
To: "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>, "Peter Sheats" <psheats(at)pbpost(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-02 13:02:51
Message-ID: op.uuwei1pb5ru9c3@kent60.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, 02 Jun 2009 05:26:41 -0400, Dimitri Fontaine
<dfontaine(at)hi-media(dot)com> wrote:
> I'd recommand having a look at tsung which will be able to replay a
> typical application scenario with as many concurrent users as you want
> to: http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php
> http://tsung.erlang-projects.org/
> http://pgfouine.projects.postgresql.org/tsung.html

I am having a look at tsung and not getting very far yet. Have you had
luck with it and do you really mean as many concurrent users as you want?
I was hoping to use it to simulate my current load while tuning and making
improvements. So far tsung doesn't appear well suited to my needs. I use
persistent connections; each tsung session uses a new connection. I have
multiple applications that have very usage patterns (some web and largely
idle, some non web and almost saturated); tsung has virtual users choosing
a session based on a probability with think times. I know many
programming languages; tsung (and its error messages) is in erlang.

> If you want to replay your logs at the current production speed and
> concurrency, see Playr.
> https://area51.myyearbook.com/trac.cgi/wiki/Playr

Thanks for this tip. It seems worth a look.

Regards,
Ken


From: Shaul Dar <shauldar(at)gmail(dot)com>
To: Peter Sheats <psheats(at)pbpost(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-02 13:31:03
Message-ID: 234efe30906020631o72482f22q4617cd41629c68ee@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi Peter,

I was looking for the same recently, and my answer is as follows:

1. If you want to test the *H/W and configuration of your DBMS* then you can
use the pgbench tool (which uses a specific built-in DB+schema, following
the TPC benchmark).

2. If you want to *load test your own specific DB* then I am unaware of any
such tools. I ended up using JMeter with the JDBC connector for
Postgresql<http://jakarta.apache.org/jmeter/usermanual/build-db-test-plan.html>.
It took me a while to get it configured and running, but I now think JMeter
is excellent. I suggest you use JMeter 2.3.2, as I upgraded to 2.3.3 and it
seems to have a bug with JDBC connection to Postgres.

-- Shaul

On Mon, Jun 1, 2009 at 6:55 PM, Peter Sheats <psheats(at)pbpost(dot)com> wrote:

> Hi,
>
> I’m about to set up a large instance on Amazon EC2 to be our DB server.
>
> Before we switch to using it in production I would like to simulate some
> load on it so that I know what it can handle and so that I can make sure I
> have the optimal settings in the config file.
>
> What is the best strategy out there for doing this? Does anyone know of
> some resource that talks about doing this?
>
> Thanks,
>
> Peter
>


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: "Kenneth Cox" <kenstir(at)gmail(dot)com>
Cc: "Peter Sheats" <psheats(at)pbpost(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-03 09:29:02
Message-ID: 87iqjdy8bl.fsf@enterprise.technique.hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Kenneth Cox" <kenstir(at)gmail(dot)com> writes:
> On Tue, 02 Jun 2009 05:26:41 -0400, Dimitri Fontaine
> <dfontaine(at)hi-media(dot)com> wrote:
>> I'd recommand having a look at tsung which will be able to replay a
>> typical application scenario with as many concurrent users as you want
>> to: http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php
>> http://tsung.erlang-projects.org/
>> http://pgfouine.projects.postgresql.org/tsung.html
>
> I am having a look at tsung and not getting very far yet. Have you had luck
> with it and do you really mean as many concurrent users as you want?

Last time I used it it was in the context of a web application and to
compare PostgreSQL against Informix after a migration. So I used the
HTTP protocol support of the injector.

Tsung is based on erlang and can be run from more than one node at any
time, last time I checked you could run 600 to 800 concurrent clients
from each node. Recent versions of erlang allow a much greater number
per node, one or two orders of magnitude greater, as I've been told by
Tsung's main developer.

> I was
> hoping to use it to simulate my current load while tuning and making
> improvements. So far tsung doesn't appear well suited to my needs. I use
> persistent connections; each tsung session uses a new connection. I have
> multiple applications that have very usage patterns (some web and largely
> idle, some non web and almost saturated); tsung has virtual users choosing
> a session based on a probability with think times. I know many programming
> languages; tsung (and its error messages) is in erlang.

Tsung can be setup as an http or postgresql proxy: in this mode it'll
prepare session files for you while you use your application as
usual. The thinktime it sees will then get randomized at run time to
better reflect real usage.

You can define several user arrival phases to see what happens when the
load raises then get back to normal traffic. Lots of options, really.

Tsung generates statistics and comes with tools to analyze them and
provide graphs organized into a web page, one of those tools allow to
draw graphs from different simulations onto the same chart, with the
same scaling, in order to easily compare results.

It seems to me tsung is a good tool for your use case.

Regards,
--
dim


From: Shaul Dar <shauldar(at)gmail(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Kenneth Cox <kenstir(at)gmail(dot)com>, Peter Sheats <psheats(at)pbpost(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-03 11:11:16
Message-ID: 234efe30906030411y68361b97w6e34a244f9b8a13@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

I considered Tsung myself but haven't tried it. If you intend to, I suggest
you read this excellent tutorial on using Tsung for test-loading
Postgresql<http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php>.
While impressed I decided the procedure was too daunting and went with
JMeter :-) It too can run test from multiple clients and has built in tables
and graphs and you can save results as CSV or XML etc. In particular I
recommend adding the extenion "listener" (JMeter term for anything that
captures and portrays test results) called Statitical Aggregate Report.

May the force be with you,

-- Shaul

On Wed, Jun 3, 2009 at 12:29 PM, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>wrote:

> "Kenneth Cox" <kenstir(at)gmail(dot)com> writes:
> > On Tue, 02 Jun 2009 05:26:41 -0400, Dimitri Fontaine
> > <dfontaine(at)hi-media(dot)com> wrote:
> >> I'd recommand having a look at tsung which will be able to replay a
> >> typical application scenario with as many concurrent users as you want
> >> to: http://archives.postgresql.org/pgsql-admin/2008-12/msg00032.php
> >> http://tsung.erlang-projects.org/
> >> http://pgfouine.projects.postgresql.org/tsung.html
> >
> > I am having a look at tsung and not getting very far yet. Have you had
> luck
> > with it and do you really mean as many concurrent users as you want?
>
> Last time I used it it was in the context of a web application and to
> compare PostgreSQL against Informix after a migration. So I used the
> HTTP protocol support of the injector.
>
> Tsung is based on erlang and can be run from more than one node at any
> time, last time I checked you could run 600 to 800 concurrent clients
> from each node. Recent versions of erlang allow a much greater number
> per node, one or two orders of magnitude greater, as I've been told by
> Tsung's main developer.
>
> > I was
> > hoping to use it to simulate my current load while tuning and making
> > improvements. So far tsung doesn't appear well suited to my needs. I
> use
> > persistent connections; each tsung session uses a new connection. I have
> > multiple applications that have very usage patterns (some web and largely
> > idle, some non web and almost saturated); tsung has virtual users
> choosing
> > a session based on a probability with think times. I know many
> programming
> > languages; tsung (and its error messages) is in erlang.
>
> Tsung can be setup as an http or postgresql proxy: in this mode it'll
> prepare session files for you while you use your application as
> usual. The thinktime it sees will then get randomized at run time to
> better reflect real usage.
>
> You can define several user arrival phases to see what happens when the
> load raises then get back to normal traffic. Lots of options, really.
>
> Tsung generates statistics and comes with tools to analyze them and
> provide graphs organized into a web page, one of those tools allow to
> draw graphs from different simulations onto the same chart, with the
> same scaling, in order to easily compare results.
>
> It seems to me tsung is a good tool for your use case.
>
> Regards,
> --
> dim
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


From: "Kenneth Cox" <kenstir(at)gmail(dot)com>
To: "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>
Cc: "Peter Sheats" <psheats(at)pbpost(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-03 13:09:15
Message-ID: op.uux9hpz65ru9c3@kent60.office.vivox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, 03 Jun 2009 05:29:02 -0400, Dimitri Fontaine
<dfontaine(at)hi-media(dot)com> wrote:
> Last time I used it it was in the context of a web application and to
> compare PostgreSQL against Informix after a migration. So I used the
> HTTP protocol support of the injector.

Tsung seems well suited for that.

> Tsung is based on erlang...you could run 600 to 800 concurrent clients
> from each node.

But each tsung session (virtual user) uses a separate PG connection, and I
need 30k virtual users. I can't imagine 30k PG connections. I could
imagine using pgbouncer in statement pooling mode, but that doesn't
characterize my load well, where different PG connections have different
profiles. I have about 500 connections:

~450 from web servers, often idle, various work loads, no prepared
statements
50 from another client, mostly idle, small set of prepared statements
10 from another client, extremely active, small set of prepared
statements

I know a tsung session doesn't have to exactly mimic a user and I tried to
coerce a tsung session to represent instead a DB client, with loops and
multiple CSV files. I wasn't so successful there, and was nagged by the
assignment of sessions by probability, when I wanted a fixed number
running each session.

I do appreciate the suggestions, and I agree Tsung has lots of nifty
features. I used pgfouine to generate tsung sessions I love the graph
generation but for me it comes down to simulating my DB load so that I can
profile and tune the DB. I am not seeing how to get tsung to fit my case.

Next up I will try JMeter (thanks Shaul Dar for the suggestions).

Regards,
Ken


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Shaul Dar <shauldar(at)gmail(dot)com>
Cc: Peter Sheats <psheats(at)pbpost(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-03 23:01:32
Message-ID: alpine.GSO.2.01.0906031854270.27272@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, 2 Jun 2009, Shaul Dar wrote:

> If you want to test the H/W and configuration of your DBMS then you can
> use the pgbench tool (which uses a specific built-in DB+schema,
> following the TPC benchmark).

There are a lot of TPC benchmarks. pgbench simulates TPC-B (badly), which
is a benchmark from 1990. It's not at all representative of the current
TPC benchmarks.

> If you want to load test your own specific DB then I am unaware of any
> such tools.

pgbench will run against any schema and queries, the built-in set are just
the easiest to use. I just released a bunch of slides and a package I
named pgbench-tools that show some of the possibilities here, links to
everything are at:
http://notemagnet.blogspot.com/2009/05/bottom-up-postgresql-benchmarking-and.html

I'd mentioned working on that this before on this list but the code just
got stable enough to release recently. Anybody who is running lots of
pgbench tests at different database sizes and client loads might benefit
from using my toolset to automate running the tests and reporting on the
results.

The last few slides of my pgbench presentation show how you might write a
custom test that measures how fast rows of various sizes can be inserted
into your database at various client counts.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Shaul Dar <shauldar(at)gmail(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: Peter Sheats <psheats(at)pbpost(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-09 12:50:10
Message-ID: 234efe30906090550t1794b9aevfefc6a1cee1cc142@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, Jun 4, 2009 at 2:01 AM, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:

>
> If you want to load test your own specific DB then I am unaware of any
>> such tools.
>>
>
> pgbench will run against any schema and queries, the built-in set are just
> the easiest to use. I just released a bunch of slides and a package I named
> pgbench-tools that show some of the possibilities here, links to everything
> are at:
> http://notemagnet.blogspot.com/2009/05/bottom-up-postgresql-benchmarking-and.html
>

ׂGreg,

Have you actually run pgbench against your own schema? Can you point me to
an example? I also had the same impression reading the documentation. But
when I tried it with the proper flags to use my own DB and query file I got
an error that it couldn't find one of the tables mentioned in the built-in
test! I concluded that I cannot use any schema, I could only supply my own
DB but with the same set of tables pgbench expects. Maybe I missed something
or made a mistake?

Thanks,

-- Shaul


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Shaul Dar <shauldar(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, Peter Sheats <psheats(at)pbpost(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-09 13:53:30
Message-ID: 17341.1244555610@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Shaul Dar <shauldar(at)gmail(dot)com> writes:
> Have you actually run pgbench against your own schema? Can you point me to
> an example? I also had the same impression reading the documentation. But
> when I tried it with the proper flags to use my own DB and query file I got
> an error that it couldn't find one of the tables mentioned in the built-in
> test! I concluded that I cannot use any schema,

No, you just need to read the documentation. There's a switch that
prevents the default action of trying to vacuum the "standard" tables.
I think -N, but too lazy to look ...

regards, tom lane


From: Erik Aronesty <erik(at)q32(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Shaul Dar <shauldar(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, Peter Sheats <psheats(at)pbpost(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Best way to load test a postgresql server
Date: 2009-06-11 21:29:57
Message-ID: ccd588d90906111429s3a27a91cx4f26fc4f0c97bc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Technically you can then use pgbench on that set of statements, but I
usually just use perl's "Benchmark" module.... (i'm sure ruby or java
or whatever has a similar tool)

(First, I log statements by loading the application or web server with
statement logging turned on.... so I'm not "guessing" what sql will be
called. Usually doing this exposes a flotilla of inefficencies in
the code ....)

On Tue, Jun 9, 2009 at 9:53 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Shaul Dar <shauldar(at)gmail(dot)com> writes:
>> Have you actually run pgbench against your own schema? Can you point me to
>> an example? I also had the same impression reading the documentation. But
>> when I tried it with the proper flags to use my own DB and query file I got
>> an error that it couldn't find one of the tables mentioned in the built-in
>> test! I concluded that I cannot use any schema,
>
> No, you just need to read the documentation.  There's a switch that
> prevents the default action of trying to vacuum the "standard" tables.
> I think -N, but too lazy to look ...
>
>                        regards, tom lane
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


From: Craig James <craig_james(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Sorting by an arbitrary criterion
Date: 2009-07-09 16:26:42
Message-ID: 4A561A42.1080301@emolecules.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Suppose I have a large table with a small-cardinality CATEGORY column (say, categories 1..5). I need to sort by an arbitrary (i.e. user-specified) mapping of CATEGORY, something like this:

1 => 'z'
2 => 'a'
3 => 'b'
4 => 'w'
5 => 'h'

So when I get done, the sort order should be 2,3,5,4,1.

I could create a temporary table with the category-to-key mapping, but is there any way to do this in a single SQL statement?

Thanks,
Craig


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Craig James <craig_james(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sorting by an arbitrary criterion
Date: 2009-07-09 16:35:01
Message-ID: 2f4958ff0907090935r32f44227m13a158c6d2fcc5b2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, Jul 9, 2009 at 5:26 PM, Craig James<craig_james(at)emolecules(dot)com> wrote:
> Suppose I have a large table with a small-cardinality CATEGORY column (say,
> categories 1..5).  I need to sort by an arbitrary (i.e. user-specified)
> mapping of CATEGORY, something like this:
>
>  1 => 'z'
>  2 => 'a'
>  3 => 'b'
>  4 => 'w'
>  5 => 'h'
>
> So when I get done, the sort order should be 2,3,5,4,1.
>
> I could create a temporary table with the category-to-key mapping, but is
> there any way to do this in a single SQL statement?
>

you can create translation table, join it, and sort by its key.

--
GJ


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
Cc: Craig James <craig_james(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sorting by an arbitrary criterion
Date: 2009-07-09 16:38:39
Message-ID: 2181.1247157519@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman(at)gmail(dot)com> writes:
> On Thu, Jul 9, 2009 at 5:26 PM, Craig James<craig_james(at)emolecules(dot)com> wrote:
>> Suppose I have a large table with a small-cardinality CATEGORY column (say,
>> categories 1..5). I need to sort by an arbitrary (i.e. user-specified)
>> mapping of CATEGORY, something like this:

> you can create translation table, join it, and sort by its key.

Much easier to
ORDER BY CASE category WHEN 'z' THEN 1 WHEN 'a' THEN 2 ... END

Actually, consider putting the CASE into a function and doing
ORDER BY sort_order(category)

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Craig James" <craig_james(at)emolecules(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Sorting by an arbitrary criterion
Date: 2009-07-09 16:39:04
Message-ID: 4A55D6D80200002500028659@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Craig James <craig_james(at)emolecules(dot)com> wrote:
> Suppose I have a large table with a small-cardinality CATEGORY
> column (say, categories 1..5). I need to sort by an arbitrary
> (i.e. user-specified) mapping of CATEGORY

There was a recent thread discussing ways to do that:

http://archives.postgresql.org/pgsql-admin/2009-07/msg00016.php

-Kevin


From: Alexander Staubo <alex(at)bengler(dot)no>
To: Craig James <craig_james(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sorting by an arbitrary criterion
Date: 2009-07-09 16:39:15
Message-ID: 88daf38c0907090939l68cd97dak38360473e6ffd9e8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, Jul 9, 2009 at 6:26 PM, Craig James<craig_james(at)emolecules(dot)com> wrote:
> Suppose I have a large table with a small-cardinality CATEGORY column (say,
> categories 1..5).  I need to sort by an arbitrary (i.e. user-specified)
> mapping of CATEGORY, something like this:
>
>  1 => 'z'
>  2 => 'a'
>  3 => 'b'
>  4 => 'w'
>  5 => 'h'
>
> So when I get done, the sort order should be 2,3,5,4,1.

If the object is to avoid a separate table, you can do it with a
"case" statement:

select ... from ...
order by case category
when 1 then 'z'
when 2 then 'a'
when 3 then 'b'
when 4 then 'w'
when 5 then 'h'
end

If you this sounds slow, you're right. But it might perform well
enough for your use case.

A.


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Craig James <craig_james(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Sorting by an arbitrary criterion
Date: 2009-07-09 18:11:09
Message-ID: 20090709181109.GB25676@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, Jul 09, 2009 at 09:26:42AM -0700, Craig James wrote:
> Suppose I have a large table with a small-cardinality CATEGORY column (say, categories 1..5). I need to sort by an arbitrary (i.e. user-specified) mapping of CATEGORY, something like this:
>
> 1 => 'z'
> 2 => 'a'
> 3 => 'b'
> 4 => 'w'
> 5 => 'h'
> So when I get done, the sort order should be 2,3,5,4,1.
> I could create a temporary table with the category-to-key mapping, but is there any way to do this in a single SQL statement?

You can do it like this:

select c.*
from categories c, ( values (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'), (5, 'h') ) as o (id, ordering) on c.id = o.id
order by o.ordering

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz(at)depesz(dot)com / aim:depeszhdl / skype:depesz_hdl / gg:6749007


From: "Hartman, Matthew" <Matthew(dot)Hartman(at)krcc(dot)on(dot)ca>
To: <depesz(at)depesz(dot)com>, "Craig James" <craig_james(at)emolecules(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Sorting by an arbitrary criterion
Date: 2009-07-09 18:13:52
Message-ID: 366642367C5B354197A1E0D27BC175BD02259884@KGHMAIL.KGH.ON.CA
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> On Thu, Jul 09, 2009 at 09:26:42AM -0700, Craig James wrote:
> You can do it like this:
> select c.*
> from categories c, ( values (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'),
(5,
> 'h') ) as o (id, ordering) on c.id = o.id
> order by o.ordering

Another option would be:

select c.*
from categories c
order by case(c.category) when 1 then 'z' when 2 then 'a' then 3 then
'b' when 4 then 'w' when 5 then 'h' end;

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig James <craig_james(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sorting by an arbitrary criterion
Date: 2009-07-09 20:25:21
Message-ID: 2f4958ff0907091325v3f21dd96y7d8c0ae93513f052@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

2009/7/9 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> =?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman(at)gmail(dot)com> writes:
>> On Thu, Jul 9, 2009 at 5:26 PM, Craig James<craig_james(at)emolecules(dot)com> wrote:
>>> Suppose I have a large table with a small-cardinality CATEGORY column (say,
>>> categories 1..5).  I need to sort by an arbitrary (i.e. user-specified)
>>> mapping of CATEGORY, something like this:
>
>> you can create translation table, join it, and sort by its key.
>
> Much easier to
>        ORDER BY CASE category WHEN 'z' THEN 1 WHEN 'a' THEN 2 ... END
>
> Actually, consider putting the CASE into a function and doing
>        ORDER BY sort_order(category)

I suppose table is handy, when you have a lot of items as keys...

--
GJ