Re: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index

Lists: pgsql-general
From: Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index
Date: 2007-07-20 22:54:26
Message-ID: 200707210054.26881.vincenzo.romano@gmail.com

Hi all.
Maybe mine is a stupid question, but I'd like to know the answer if
possible.

In an inner join between a table with 16M+ rows and a table with 100+
rows, performance improved by more than 100 times after I replaced a
UNIQUE-NOT NULL index with a PRIMARY KEY on the very same columns in
the very same order. The query has not been modified.

In the older case, the EXPLAIN output showed that the join was causing
a sort on the indexed columns, while with the primary key there was no
sort.

So there's some difference for sure, but I'm missing it.
Any hint?

--
Vincenzo Romano
--
Maybe Computer will never become as intelligent as Humans.
For sure they won't ever become so stupid. [VR-1988]


From: Michael Glaesemann <grzm(at)seespotcode(dot)net>
To: Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index
Date: 2007-07-20 23:36:27
Message-ID: D83A8DC3-7EE7-423A-A42A-ABB1663E1BEE@seespotcode.net


On Jul 20, 2007, at 17:54 , Vincenzo Romano wrote:

> In an inner join involving a 16M+ rows table and a 100+ rows table
> performances got drastically improved by 100+ times by replacing a
> UNIQUE-NOT NULL index with a PRIMARY KEY on the very same columns in
> the very same order. The query has not been modified.

There should be no difference in query performance, AIUI.

> In the older case, thanks to the EXPLAIN command, I saw that the join
> was causing a sort on the index elements, while the primary key was
> not.

Can you provide the actual EXPLAIN ANALYZE (not just EXPLAIN)
outputs for us to look at? I suspect there's a
difference wrt the size of the tables, the distribution of the values
of the involved columns, index bloat, or how recently the tables have
been analyzed. (Most likely the last.) Dropping the UNIQUE NOT NULL
constraint and adding the PRIMARY KEY constraint will cause the index
to be recreated, which could affect which plan is chosen and its
efficacy. Without the EXPLAIN ANALYZE output, I don't think there's a
lot of hope in understanding what's different.

Michael Glaesemann
grzm seespotcode net


From: "Josh Tolley" <eggyknap(at)gmail(dot)com>
To: "Michael Glaesemann" <grzm(at)seespotcode(dot)net>
Cc: "Vincenzo Romano" <vincenzo(dot)romano(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index
Date: 2007-07-21 05:32:29
Message-ID: e7e0a2570707202232q7f47d8deie8c33e20bdad224a@mail.gmail.com

On 7/20/07, Michael Glaesemann <grzm(at)seespotcode(dot)net> wrote:
>
> On Jul 20, 2007, at 17:54 , Vincenzo Romano wrote:
>
> > In an inner join involving a 16M+ rows table and a 100+ rows table
> > performances got drastically improved by 100+ times by replacing a
> > UNIQUE-NOT NULL index with a PRIMARY KEY on the very same columns in
> > the very same order. The query has not been modified.
>
> There should be no difference in query performance, AIUI.

If I read the documentation correctly, PRIMARY KEY is simply syntactic
sugar equivalent to UNIQUE + NOT NULL, the only difference being that
a PRIMARY KEY is reported as such to someone looking at the table
structure, which is more intuitive than seeing UNIQUE + NOT NULL.

>
> > In the older case, thanks to the EXPLAIN command, I saw that the join
> > was causing a sort on the index elements, while the primary key was
> > not.
>

Might it just be that the original UNIQUE + NOT NULL index was bloated
or otherwise degraded, and reindexing it would have resulted in the
same performance gain? That's just a guess.

-Josh


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Josh Tolley" <eggyknap(at)gmail(dot)com>
Cc: "Michael Glaesemann" <grzm(at)seespotcode(dot)net>, "Vincenzo Romano" <vincenzo(dot)romano(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index
Date: 2007-07-21 06:00:11
Message-ID: 21140.1184997611@sss.pgh.pa.us

"Josh Tolley" <eggyknap(at)gmail(dot)com> writes:
> Might it just be that the original UNIQUE + NOT NULL index was bloated
> or otherwise degraded, and reindexing it would have resulted in the
> same performance gain? That's just a guess.

Yeah. There is precious little difference between UNIQUE+NOT NULL and
PRIMARY KEY --- to be exact, the latter will allow another table to
reference this one in FOREIGN KEY without specifying column names.
The planner knows nothing of that little convenience.

The interesting thing about this report is that the plan changed after
creating the new index. That has to mean that some statistic visible to
the planner changed. Creating an index does update the pg_class columns
about the table's size and number of rows, but probably those weren't
that far off to start with. My bet is that the new index is a lot
smaller than the old because of bloat in the old index. If so, REINDEX
would have had the same result.

regards, tom lane
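
The FOREIGN KEY convenience described above can be tried out in a small
sketch. Everything here is an assumption for illustration and not part of
the original thread: it uses SQLite through Python's sqlite3 module as a
stand-in (not PostgreSQL), and the table names "parent" and "child" and
the helper try_fk are hypothetical. SQLite reports the problem when rows
are modified, whereas PostgreSQL rejects the child table definition
itself, so only the general rule carries over: a REFERENCES clause with
no column list needs a PRIMARY KEY on the referenced table, while
UNIQUE + NOT NULL columns have to be named explicitly.

```python
import sqlite3


def try_fk(parent_ddl: str, child_ref: str) -> str:
    """Create parent and child tables, insert one matching row in each.

    Returns "ok" if everything succeeds, otherwise "error: <message>".
    `parent` and `child` are hypothetical tables used only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    try:
        conn.execute(parent_ddl)
        conn.execute(f"CREATE TABLE child (parent_id INTEGER {child_ref})")
        conn.execute("INSERT INTO parent (id) VALUES (1)")
        conn.execute("INSERT INTO child (parent_id) VALUES (1)")
        return "ok"
    except sqlite3.Error as exc:
        return f"error: {exc}"
    finally:
        conn.close()


PK_PARENT = "CREATE TABLE parent (id INTEGER PRIMARY KEY)"
UNIQUE_PARENT = "CREATE TABLE parent (id INTEGER NOT NULL UNIQUE)"

# PRIMARY KEY: the referenced column can be left implicit.
with_pk = try_fk(PK_PARENT, "REFERENCES parent")
# UNIQUE + NOT NULL only: there is no primary key to default to.
with_unique_implicit = try_fk(UNIQUE_PARENT, "REFERENCES parent")
# UNIQUE + NOT NULL with the column named explicitly.
with_unique_explicit = try_fk(UNIQUE_PARENT, "REFERENCES parent(id)")

print("PRIMARY KEY, implicit column:", with_pk)
print("UNIQUE NOT NULL, implicit column:", with_unique_implicit)
print("UNIQUE NOT NULL, explicit column:", with_unique_explicit)
```

None of the three cases touches query planning, which fits the remark
that the planner knows nothing of this convenience.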


From: Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Josh Tolley" <eggyknap(at)gmail(dot)com>, "Michael Glaesemann" <grzm(at)seespotcode(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index
Date: 2007-07-22 07:24:50
Message-ID: 200707220924.50949.vincenzo.romano@gmail.com

On Saturday 21 July 2007 08:00:11 Tom Lane wrote:
> "Josh Tolley" <eggyknap(at)gmail(dot)com> writes:
> > Might it just be that the original UNIQUE + NOT NULL index was
> > bloated or otherwise degraded, and reindexing it would have
> > resulted in the same performance gain? That's just a guess.
>
> Yeah. There is precious little difference between UNIQUE+NOT NULL
> and PRIMARY KEY --- to be exact, the latter will allow another
> table to reference this one in FOREIGN KEY without specifying
> column names. The planner knows nothing of that little convenience.
>
> The interesting thing about this report is that the plan changed
> after creating the new index. That has to mean that some statistic
> visible to the planner changed. Creating an index does update the
> pg_class columns about the table's size and number of rows, but
> probably those weren't that far off to start with. My bet is that
> the new index is a lot smaller than the old because of bloat in the
> old index. If so, REINDEX would have had the same result.
>
> regards, tom lane

I've done a slightly deeper analysis.

In the original setup, the "UNIQUE" constraint had been dropped
*before* doing the tests. So the "slow" case is without the UNIQUE
constraint but with an index. The NOT NULL was instead there.

What I don't understand is why the planner, in order to perform a
JOIN, does the sort when there is no UNIQUE constraint and doesn't
need to sort when there is one.

--
Vincenzo Romano
--
Maybe Computer will never become as intelligent as Humans.
For sure they won't ever become so stupid. [VR-1988]


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com>
Cc: "Josh Tolley" <eggyknap(at)gmail(dot)com>, "Michael Glaesemann" <grzm(at)seespotcode(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index
Date: 2007-07-22 17:20:08
Message-ID: 1905.1185124808@sss.pgh.pa.us

Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com> writes:
> In the original setup, the "UNIQUE" constraint had been dropped
> *before* doing the tests. So the "slow" case is without the UNIQUE
> constraint but with an index. The NOT NULL was instead there.

With what index, pray tell?

regards, tom lane


From: Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Josh Tolley" <eggyknap(at)gmail(dot)com>, "Michael Glaesemann" <grzm(at)seespotcode(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index
Date: 2007-07-22 17:31:08
Message-ID: 200707221931.09970.vincenzo.romano@gmail.com

On Sunday 22 July 2007 19:20:08 Tom Lane wrote:
> Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com> writes:
> > In the original setup, the "UNIQUE" constraint had been dropped
> > *before* doing the tests. So the "slow" case is without the
> > UNIQUE constraint but with an index. The NOT NULL was instead
> > there.
>
> With what index, pray tell?
>
> regards, tom lane

Sorry for the incomplete sentence.
Read it as:

In the original setup, the "UNIQUE" constraint had been dropped
*before* doing the tests. So the "slow" case is without the
UNIQUE constraint but with an index on NOT NULL fields.

The "fast" case was with the primary key on the very same fields
in the very same order.

--
Vincenzo Romano
--
Maybe Computer will never become as intelligent as Humans.
For sure they won't ever become so stupid. [VR-1988]


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com>
Cc: "Josh Tolley" <eggyknap(at)gmail(dot)com>, "Michael Glaesemann" <grzm(at)seespotcode(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index
Date: 2007-07-22 18:09:03
Message-ID: 2454.1185127743@sss.pgh.pa.us

Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com> writes:
> On Sunday 22 July 2007 19:20:08 Tom Lane wrote:
>> With what index, pray tell?

> In the original setup, the "UNIQUE" constraint had been dropped
> *before* doing the tests. So the "slow" case is without the
> UNIQUE constraint but with an index on NOT NULL fields.

You haven't said where you think this index is coming from.

regards, tom lane


From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: "Vincenzo Romano" <vincenzo(dot)romano(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Difference between PRIMARY KEY index and UNIQUE-NOT NULL index
Date: 2007-07-23 07:21:53
Message-ID: 758d5e7f0707230021p5c78e9b1pa6eeb1bf30f0bdb3@mail.gmail.com

On 7/22/07, Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com> wrote:
> On Sunday 22 July 2007 19:20:08 Tom Lane wrote:
> > Vincenzo Romano <vincenzo(dot)romano(at)gmail(dot)com> writes:
> > > In the original setup, the "UNIQUE" constraint had been dropped
> > > *before* doing the tests. So the "slow" case is without the
> > > UNIQUE constraint but with an index. The NOT NULL was instead
> > > there.
> >
> > With what index, pray tell?
> >
> > regards, tom lane
>
> Sorry for the incomplete sentence.
> Read it as:
>
> In the original setup, the "UNIQUE" constraint had been dropped
> *before* doing the tests. So the "slow" case is without the
> UNIQUE constraint but with an index on NOT NULL fields.

Control question: did you recreate a non-unique index after dropping
the UNIQUE constraint?

Regards,
Dawid