Re: Composite keys

Lists: pgsql-performance
From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Composite keys
Date: 2011-10-11 15:16:07
Message-ID: CCB4B01D184042ECBBE456D1ACB185A4@CAPRICA
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Excuse the noob question, I couldn't find any reading material on this
topic.

Let's say my_table has two fields, pkey_id and another_id. The primary key
is pkey_id and of course indexed.

Then someone adds a composite index on btree(pkey_id, another_id).

Question 1) Is there any benefit to having pkey_id in the second index
(assuming the index was created to satisfy some arbitrary WHERE clause)?

Question 2) Regardless of the answer to Question 1 - if another_id is not
guaranteed to be unique, whereas pkey_id is - there any value to changing
the order of declaration (more generally, is there a performance impact for
column ordering in btree composite keys?)

Thanks,

Carlo


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Composite keys
Date: 2011-10-12 00:52:16
Message-ID: CAGTBQpYMzj+xufEQBr7_aL3ZjZWaADU1oF40-ji8n8GcpZLj1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Oct 11, 2011 at 5:16 PM, Carlo Stonebanks
<stonec(dot)register(at)sympatico(dot)ca> wrote:
> Question 2) Regardless of the answer to Question 1 - if another_id is not
> guaranteed to be unique, whereas pkey_id is – there any value to changing
> the order of declaration (more generally, is there a performance impact for
> column ordering in btree composite keys?)

Multicolumn indices on (c1, c2, ..., cn) can only be used on where
clauses involving c1..ck with k<n.

So, an index on (a,b) does *not* help for querying on b.

Furthermore, if a is unique, querying on a or querying on a and b is
equally selective. b there is just consuming space and cpu cycles.

I'd say, although it obviously depends on the queries you issue, you
only need an index on another_id.


From: Dave Crooke <dcrooke(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Composite keys
Date: 2011-10-12 01:28:29
Message-ID: CALi4UpjWC15S8D3w-8u7TDVRdcq+PtHoaCOBSppxPiKw-BXJVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Claudio is on point, I'll be even more pointed ....

If pkey_id truly is a primary key in the database sense of the term, and
thus unique, then IIUC there is no circumstance in which your composite
index would ever even get used ... all it's doing is slowing down writes :-)
If the query is sufficiently selective on pkey_id to merit using an index,
then the planner will use the primary key index, because it's narrower; if
not, then the only other option is to do a full table scan because there is
no index of which another_id is a prefix.

There are only three options which make sense:

1. No additional indexes, just the primary key
2. An additional index on (another_id)
3. An additional index on (another_id, pkey_id)
4. Both 2. and 3.

Choosing between these depends on a lot of variables of the query mix in
practice ... you could set up both 2. and 3. and then see which indexes the
planner actually uses in practice and then decide which to keep.

The value in having pkey_id in the index in 3. is for queries whose primary
selectivity is on another_id, but which also have some selectivity on
pkey_id .... the planner can use an index scan to filter candidate rows /
blocks to look at. This is especially helpful if another_id is not very
selective and / or the rows are quite wide.

On gut feel, it seems unlikely that you'd have a real-world circumstance in
which it makes sense to choose option 4. but it can't be ruled out without
further context.

Cheers
Dave

On Tue, Oct 11, 2011 at 7:52 PM, Claudio Freire <klaussfreire(at)gmail(dot)com>wrote:

> On Tue, Oct 11, 2011 at 5:16 PM, Carlo Stonebanks
> <stonec(dot)register(at)sympatico(dot)ca> wrote:
> > Question 2) Regardless of the answer to Question 1 - if another_id is not
> > guaranteed to be unique, whereas pkey_id is – there any value to changing
> > the order of declaration (more generally, is there a performance impact
> for
> > column ordering in btree composite keys?)
>
> Multicolumn indices on (c1, c2, ..., cn) can only be used on where
> clauses involving c1..ck with k<n.
>
> So, an index on (a,b) does *not* help for querying on b.
>
> Furthermore, if a is unique, querying on a or querying on a and b is
> equally selective. b there is just consuming space and cpu cycles.
>
> I'd say, although it obviously depends on the queries you issue, you
> only need an index on another_id.
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: "'Dave Crooke'" <dcrooke(at)gmail(dot)com>, "'Claudio Freire'" <klaussfreire(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Composite keys
Date: 2011-10-12 04:39:08
Message-ID: EF3BA8897ED94C449F8D3C2722AE4210@CAPRICA
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Thanks Dave & Claudio.

Unfortunately, my specific example had a primary key in it (based on a
real-world case) but this kind of distracted from the general point.

So with PG I will stick to the general SQL rule that IF I use compound keys
then we have the most selective columns to the left. correct?

_____

From: Dave Crooke [mailto:dcrooke(at)gmail(dot)com]
Sent: October 11, 2011 9:28 PM
To: Claudio Freire
Cc: Carlo Stonebanks; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Composite keys

Claudio is on point, I'll be even more pointed ....

If pkey_id truly is a primary key in the database sense of the term, and
thus unique, then IIUC there is no circumstance in which your composite
index would ever even get used ... all it's doing is slowing down writes :-)
If the query is sufficiently selective on pkey_id to merit using an index,
then the planner will use the primary key index, because it's narrower; if
not, then the only other option is to do a full table scan because there is
no index of which another_id is a prefix.

There are only three options which make sense:

1. No additional indexes, just the primary key
2. An additional index on (another_id)
3. An additional index on (another_id, pkey_id)
4. Both 2. and 3.

Choosing between these depends on a lot of variables of the query mix in
practice ... you could set up both 2. and 3. and then see which indexes the
planner actually uses in practice and then decide which to keep.

The value in having pkey_id in the index in 3. is for queries whose primary
selectivity is on another_id, but which also have some selectivity on
pkey_id .... the planner can use an index scan to filter candidate rows /
blocks to look at. This is especially helpful if another_id is not very
selective and / or the rows are quite wide.

On gut feel, it seems unlikely that you'd have a real-world circumstance in
which it makes sense to choose option 4. but it can't be ruled out without
further context.

Cheers
Dave

On Tue, Oct 11, 2011 at 7:52 PM, Claudio Freire <klaussfreire(at)gmail(dot)com>
wrote:

On Tue, Oct 11, 2011 at 5:16 PM, Carlo Stonebanks
<stonec(dot)register(at)sympatico(dot)ca> wrote:
> Question 2) Regardless of the answer to Question 1 - if another_id is not
> guaranteed to be unique, whereas pkey_id is - there any value to changing
> the order of declaration (more generally, is there a performance impact
for
> column ordering in btree composite keys?)

Multicolumn indices on (c1, c2, ..., cn) can only be used on where
clauses involving c1..ck with k<n.

So, an index on (a,b) does *not* help for querying on b.

Furthermore, if a is unique, querying on a or querying on a and b is
equally selective. b there is just consuming space and cpu cycles.

I'd say, although it obviously depends on the queries you issue, you
only need an index on another_id.

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Composite keys
Date: 2011-10-12 10:26:33
Message-ID: 4E956B59.8030608@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 10/12/2011 12:39 AM, Carlo Stonebanks wrote:
>
> So with PG I will stick to the general SQL rule that IF I use compound
> keys then we have the most selective columns to the left... correct?
>

There was a subtle point Dave made you should pay close attention to
though. If there are multiple indexes that start with the same column,
PostgreSQL is biased toward picking the smallest of them. The amount of
extra I/O needed to navigate a wider index is such that the second
column has to be very selective, too, before it will be used instead of
a narrower single column one. There are plenty of times that the reason
behind "why isn't it using my index?" is "the index is too fat to
navigate efficiently", because the actual number of blocks involved is
factored into the cost computations.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Composite keys
Date: 2011-10-31 17:08:11
Message-ID: CA+TgmoY4Q+CyvxEVcNRjtMH_HeCP+uNXkrC8goVDVktsbAPNnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, Oct 11, 2011 at 8:52 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Tue, Oct 11, 2011 at 5:16 PM, Carlo Stonebanks
> <stonec(dot)register(at)sympatico(dot)ca> wrote:
>> Question 2) Regardless of the answer to Question 1 - if another_id is not
>> guaranteed to be unique, whereas pkey_id is – there any value to changing
>> the order of declaration (more generally, is there a performance impact for
>> column ordering in btree composite keys?)
>
> Multicolumn indices on (c1, c2, ..., cn) can only be used on where
> clauses involving c1..ck with k<n.

I don't think that's true. I believe it can be used for a query that
only touches, say, c2. It's just extremely inefficient.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Composite keys
Date: 2011-10-31 17:52:14
Message-ID: CAGTBQpb1E7jQBn1S-yBCMo2w=V+iQRed-sDqNTj=cKPnYRicRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Oct 31, 2011 at 2:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Multicolumn indices on (c1, c2, ..., cn) can only be used on where
>> clauses involving c1..ck with k<n.
>
> I don't think that's true.  I believe it can be used for a query that
> only touches, say, c2.  It's just extremely inefficient.

Does postgres generate those kinds of plans?
I do not think so. I've never seen it happening.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Composite keys
Date: 2011-10-31 18:24:46
Message-ID: CA+Tgmobvu1naVbBBPi9ng39=dDAPdzQ1wiMbEP8yJRVAF3O1fA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Oct 31, 2011 at 1:52 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Mon, Oct 31, 2011 at 2:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> Multicolumn indices on (c1, c2, ..., cn) can only be used on where
>>> clauses involving c1..ck with k<n.
>>
>> I don't think that's true.  I believe it can be used for a query that
>> only touches, say, c2.  It's just extremely inefficient.
>
> Does postgres generate those kinds of plans?
> I do not think so. I've never seen it happening.

Sure it does:

rhaas=# create table baz (a bool, b int, c text, primary key (a, b));
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"baz_pkey" for table "baz"
CREATE TABLE
rhaas=# insert into baz select true, g,
random()::text||random()::text||random()::text||random()::text from
generate_series(1,400000) g;
INSERT 0 400000
rhaas=# analyze baz;
ANALYZE
rhaas=# explain analyze select * from baz where b = 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Scan using baz_pkey on baz (cost=0.00..7400.30 rows=1
width=74) (actual time=0.104..20.691 rows=1 loops=1)
Index Cond: (b = 1)
Total runtime: 20.742 ms
(3 rows)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Composite keys
Date: 2011-10-31 18:34:25
Message-ID: CAGTBQpb+HTEsJ8xFx5jjSzReS57W3t5CrcUptPwfmgj5RQiF6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Oct 31, 2011 at 3:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Sure it does:
>
> rhaas=# create table baz (a bool, b int, c text, primary key (a, b));
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "baz_pkey" for table "baz"
> CREATE TABLE
> rhaas=# insert into baz select true, g,
> random()::text||random()::text||random()::text||random()::text from
> generate_series(1,400000) g;

Ok, that's artificially skewed, since the index has only one value in
the first column.

But it does prove PG considers the case, and takes into account the
number of values it has to iterate over on the first column, which is
very very interesting and cool.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Composite keys
Date: 2011-10-31 18:42:09
Message-ID: CA+TgmoaKfMX8rykDNbufyY-Mvk=pnt_cpvOxe9CaAEpzLc+Wkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, Oct 31, 2011 at 2:34 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Mon, Oct 31, 2011 at 3:24 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Sure it does:
>>
>> rhaas=# create table baz (a bool, b int, c text, primary key (a, b));
>> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
>> "baz_pkey" for table "baz"
>> CREATE TABLE
>> rhaas=# insert into baz select true, g,
>> random()::text||random()::text||random()::text||random()::text from
>> generate_series(1,400000) g;
>
> Ok, that's artificially skewed, since the index has only one value in
> the first column.
>
> But it does prove PG considers the case, and takes into account the
> number of values it has to iterate over on the first column, which is
> very very interesting and cool.

Yes. As your experience indicates, it's rare for this to be the best
plan. But it is considered. So there you have it. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Composite keys
Date: 2011-10-31 18:59:19
Message-ID: 5180.1320087559@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Claudio Freire <klaussfreire(at)gmail(dot)com> writes:
> On Mon, Oct 31, 2011 at 2:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> Multicolumn indices on (c1, c2, ..., cn) can only be used on where
>>> clauses involving c1..ck with k<n.

>> I don't think that's true. I believe it can be used for a query that
>> only touches, say, c2. It's just extremely inefficient.

> Does postgres generate those kinds of plans?

Sure it does. It doesn't usually think they're efficient enough,
because they require full-index scans. But sometimes that's the
best you can do.

regards, tom lane