Re: low cardinality column

Lists: pgsql-performance
From: "Rong Wu" <rwu(at)cbnco(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: low cardinality column
Date: 2003-10-02 18:30:01
Message-ID: 33423.207.164.182.8.1065119401.squirrel@mail.cbnco.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi,

I have a select like this:

SELECT MAX(transactionid) FROM cbntransaction WHERE transactiontypeid=0;

in the query:
transactionid is the primary key of cbntransaction table,
But transactiontypeid is a low cardinality column, there're over 100,000
records has the same trnsactiontypeid.
I was trying to create an index on (transactiontypeid, transactionid), but
no luck on that, postgresql will still scan the table.
I'm wondering if there's solution for this query:
Maybe something like if I can partition the table using transactiontypeid,
and do a local index on transactionid on each partition, but I couldnt'
find any doc on postgresql to do that.

Thanks in advance,
rong :-)


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: rwu(at)cbnco(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: low cardinality column
Date: 2003-10-02 18:37:22
Message-ID: 200310021137.22119.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Rong,

> I have a select like this:
>
> SELECT MAX(transactionid) FROM cbntransaction WHERE transactiontypeid=0;

Simple workaround:

Create an mulit-column index on transactiontypeid, transactionid.

SELECT transactionid FROM cbtransaction
WHERE transactiontypeid=0
ORDER BY transactionid DESC LIMIT 1;

This approach will use the index.

Of course, if the reason you are selecting the max id is to get the next id,
there are much better ways to do that.

--
-Josh Berkus
Aglio Database Solutions
San Francisco


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: rwu(at)cbnco(dot)com
Cc: Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: low cardinality column
Date: 2003-10-02 18:50:44
Message-ID: 1065120643.89807.40.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, 2003-10-02 at 14:30, Rong Wu wrote:
> Hi,
>
> I have a select like this:
>
> SELECT MAX(transactionid) FROM cbntransaction WHERE transactiontypeid=0;

For various reasons (primarily MVCC and the ability to make custom
aggregates making it difficult) MAX() is not optimized in this fashion.

Try:

SELECT transactionid
FROM ...
WHERE ...
ORDER BY transactionid DESC
LIMIT 1;


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To:
Cc: rwu(at)cbnco(dot)com, Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: low cardinality column
Date: 2003-10-02 19:00:19
Message-ID: 3F7C75C3.4060904@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Rod Taylor wrote:
> On Thu, 2003-10-02 at 14:30, Rong Wu wrote:
>
>>Hi,
>>
>>I have a select like this:
>>
>>SELECT MAX(transactionid) FROM cbntransaction WHERE transactiontypeid=0;
>
>
> For various reasons (primarily MVCC and the ability to make custom
> aggregates making it difficult) MAX() is not optimized in this fashion.
>
> Try:
>
> SELECT transactionid
> FROM ...
> WHERE ...
> ORDER BY transactionid DESC
> LIMIT 1;

Despite this good suggestion, if you're using this technique to generate
the next transaction ID, you're going to have errors as concurrency rises.

Use a SERIAL, which guarantees that you won't have two processes generate
the same number.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


From: "Rong Wu" <rwu(at)cbnco(dot)com>
To: "Postgresql Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Thanks - Re: low cardinality column
Date: 2003-10-02 20:11:21
Message-ID: 33506.207.164.182.8.1065125481.squirrel@mail.cbnco.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Thanks, Rod, Josh and Bill, That' fantastic.

have a nice day,
rong :-)

> Rod Taylor wrote:
>> On Thu, 2003-10-02 at 14:30, Rong Wu wrote:
>>
>>>Hi,
>>>
>>>I have a select like this:
>>>
>>>SELECT MAX(transactionid) FROM cbntransaction WHERE transactiontypeid=0;
>>
>>
>> For various reasons (primarily MVCC and the ability to make custom
>> aggregates making it difficult) MAX() is not optimized in this fashion.
>>
>> Try:
>>
>> SELECT transactionid
>> FROM ...
>> WHERE ...
>> ORDER BY transactionid DESC
>> LIMIT 1;
>
> Despite this good suggestion, if you're using this technique to generate
> the next transaction ID, you're going to have errors as concurrency rises.
>
> Use a SERIAL, which guarantees that you won't have two processes generate
> the same number.
>
> --
> Bill Moran
> Potential Technologies
> http://www.potentialtech.com
>
>