Re: Allow replacement of bloated primary key indexes without foreign key rebuilds

Lists: pgsql-hackers
From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Allow replacement of bloated primary key indexes without foreign key rebuilds
Date: 2012-07-07 03:53:12
Message-ID: CABwTF4UxTg+kERo1Nd4dt+H2miJoLPcASMFecS1-XHijABOpPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Today I learnt [1,2,3] that the feature ALTER TABLE .. ADD CONSTRAINT ...
USING INDEX we added back in the day is not so useful in the field. Having
to drop foreign key constraints before this command, and recreate them
afterwards makes this command useless to most database setups. I feel sorry
that no one brought this up when we were implementing the feature; maybe we
could've done something about it right then.

I wish to correct it now, and did some research. Long story short, I
realized that the foreign key constraint depends on the index relation of
the primary key, and does not depend on the constraint object as I had
expected (Please see rows 5 and 11 of the result set shown below). This
behaviour is also seen when the FKey references a unique constraint. As
much as that perplexes me, I think it makes our job a bit easier.

All we need to do is allow swapping of pg_class.relfilenode of two indexes.
This will let the dependency entries stand as they are and allow us to drop
the bloated primary key index structure without having to rebuild the
foreign key constraints.

As for the syntactical sugar, this can be added to either ALTER TABLE or to
ALTER INDEX. Although under no normal circumstances one would need to use
ALTER INDEX to swap two indexes' relfilenode (because one can easily create
a duplicate index and drop/rename-in-place the old one), I think it would
make more sense here since it is just an operation on two indexes and has
nothing to do with the constraints, apart from the fact that we want to use
this feature to meddle with the constraints.

Syntax options:

ALTER TABLE tbl REPLACE [CONSTRAINT constr] {PRIMARY KEY | UNIQUE} USING
INDEX new_index;

ALTER INDEX ind REPLACE WITH new_index;

Note that in both the syntaxes, it is assumed that all remnants of
new_index will be gone after the command completes successfully; that is,
the commands will behave as if they deleted the index structure of the
index being replaced and placed the new structure in its place, while
dropping the index that was used for the replacement.

I don't think we need to ensure that the new_index is completely flushed to
disk before the operation, but we do need to issue relevant cache
invalidation messages after the operation is done.

For replacement to be successful, new_index should not be associated with
any constraints, and, new_index should be identical to the index being
replaced, except for the index names.

The ALTER TABLE syntax closely emulates the existing syntax of replacing a
constraint using an existing index, but looking at the grammar construction
I feel that it may be more complex to implement than the ALTER INDEX syntax.

ALTER INDEX feels easier to do, since we won't have to jump through hoops
like in ALTER TABLE's multi-command support (ATExec*() functions), and
dropping the new_index might be easier to do.

Thoughts?

postgres=# \d+ test
Table "public.test"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
a | integer | not null | plain | |
b | integer | | plain | |
Indexes:
"test_pkey" PRIMARY KEY, btree (a)
"temp_idx" UNIQUE, btree (a)
Referenced by:
TABLE "test2" CONSTRAINT "test2_b_fkey" FOREIGN KEY (b) REFERENCES
test(a)
Has OIDs: no

postgres=# \d+ test2
Table "public.test2"
Column | Type | Modifiers | Storage | Stats target | Description
--------+---------+-----------+---------+--------------+-------------
a | integer | | plain | |
b | integer | | plain | |
Foreign-key constraints:
"test2_b_fkey" FOREIGN KEY (b) REFERENCES test(a)
Has OIDs: no

Relevant output of query [4] on pg_depend:

classid | objid | objid | refclassid |
refobjid | refobjid | deptype
---------------+-------+------------------------------+---------------+----------+--------------+---------
pg_class | 16413 | test | pg_namespace |
2200 | public | n
pg_type | 16415 | test | pg_class |
16413 | test | i
pg_type | 16414 | test[] | pg_type |
16415 | test | i
pg_constraint | 16417 | test_pkey | pg_class |
16413 | test | a
pg_class | 16416 | test_pkey | pg_constraint |
16417 | test_pkey | i
pg_class | 16418 | test2 | pg_namespace |
2200 | public | n
pg_type | 16420 | test2 | pg_class |
16418 | test2 | i
pg_type | 16419 | test2[] | pg_type |
16420 | test2 | i
pg_constraint | 16421 | test2_b_fkey | pg_class |
16413 | test | n
pg_constraint | 16421 | test2_b_fkey | pg_class |
16418 | test2 | a
pg_constraint | 16421 | test2_b_fkey | pg_class |
16416 | test_pkey | n
pg_trigger | 16422 | RI_ConstraintTrigger_a_16422 | pg_constraint |
16421 | test2_b_fkey | i
pg_trigger | 16423 | RI_ConstraintTrigger_a_16423 | pg_constraint |
16421 | test2_b_fkey | i
pg_trigger | 16424 | RI_ConstraintTrigger_c_16424 | pg_constraint |
16421 | test2_b_fkey | i
pg_trigger | 16425 | RI_ConstraintTrigger_c_16425 | pg_constraint |
16421 | test2_b_fkey | i

[1] http://archives.postgresql.org/pgsql-general/2012-07/msg00104.php
[2] http://archives.postgresql.org/pgsql-general/2012-07/msg00105.php
[3] http://archives.postgresql.org/pgsql-general/2012-07/msg00110.php
[4] select classid::regclass, objid, case classid::regclass::text when
'pg_class' then objid::regclass::text when 'pg_type' then
objid::regtype::text when 'pg_constraint' then (select conname from
pg_constraint where oid = objid) when 'pg_namespace' then (select nspname
from pg_namespace where oid = objid) when 'pg_trigger' then (select tgname
from pg_trigger where oid = objid) else objid::text end,
refclassid::regclass, refobjid, case refclassid::regclass::text when
'pg_class' then refobjid::regclass::text when 'pg_type' then
refobjid::regtype::text when 'pg_constraint' then (select conname from
pg_constraint where oid = refobjid) when 'pg_namespace' then (select
nspname from pg_namespace where oid = refobjid) when 'pg_trigger' then
(select tgname from pg_trigger where oid = refobjid) else refobjid::text
end, deptype from pg_depend;

--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


From: Greg Stark <stark(at)mit(dot)edu>
To: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow replacement of bloated primary key indexes without foreign key rebuilds
Date: 2012-07-10 14:30:29
Message-ID: CAM-w4HNyq+Ljq-4Kav0Y1bf_rB--+MBkxYxMNpYOCwHGhiwNSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jul 7, 2012 at 4:53 AM, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com> wrote:
> All we need to do is allow swapping of pg_class.relfilenode of two indexes.
> This will let the dependency entries stand as they are and allow us to drop
> the bloated primary key index structure without having to rebuild the
> foreign key constraints.

Fwiw I don't like swapping relfilenodes on indexes the user created.
REINDEX currently does this but it's a bit of a hack and only works
because reindex carefully builds the new index with exactly the same
definition as the old one.

The problem you describe is one of constraints and dependencies and
not one of indexes. It seems what you really want is a way to alter
foreign key dependencies to depend on a new index. Either an explicit
command that lets you set the new dependency or what seems even better
would be to have DROP INDEX check any dependent objects to see if
there's another index that can satisfy them and change their
dependency.

These might suffer from deadlock problems but hopefully they could be
manageable since it's not a frequent operation and there aren't any
other operations that rejigger dependencies.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Allow replacement of bloated primary key indexes without foreign key rebuilds
Date: 2012-07-10 14:44:03
Message-ID: 21013.1341931443@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> On Sat, Jul 7, 2012 at 4:53 AM, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com> wrote:
>> All we need to do is allow swapping of pg_class.relfilenode of two indexes.

> Fwiw I don't like swapping relfilenodes on indexes the user created.
> REINDEX currently does this but it's a bit of a hack and only works
> because reindex carefully builds the new index with exactly the same
> definition as the old one.

Yes. The swap-relfilenodes operation would have to carefully check that
the index definitions were exactly equivalent, and there would be a
constant risk for bugs of omission if that code weren't taught about
any new index properties we invent.

> The problem you describe is one of constraints and dependencies and
> not one of indexes. It seems what you really want is a way to alter
> foreign key dependencies to depend on a new index. Either an explicit
> command that lets you set the new dependency or what seems even better
> would be to have DROP INDEX check any dependent objects to see if
> there's another index that can satisfy them and change their
> dependency.

Either of these have exactly the same issue, namely their correctness
depends on determining if two indexes have identical properties.

All of these things seem like ugly, hard-to-use kluges anyway (the
make-sure-the-indexes-match business is just as much of a PITA for the
DBA as it is for the system). What we really want is REINDEX
CONCURRENTLY.

regards, tom lane


From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Allow replacement of bloated primary key indexes without foreign key rebuilds
Date: 2012-07-10 15:11:50
Message-ID: CAM-w4HNevJ6HkMW3UBSRnm_DjMPuBzXwWZQzWqnN3SerMZeVhA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 10, 2012 at 3:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The problem you describe is one of constraints and dependencies and
>> not one of indexes. It seems what you really want is a way to alter
>> foreign key dependencies to depend on a new index. Either an explicit
>> command that lets you set the new dependency or what seems even better
>> would be to have DROP INDEX check any dependent objects to see if
>> there's another index that can satisfy them and change their
>> dependency.
>
> Either of these have exactly the same issue, namely their correctness
> depends on determining if two indexes have identical properties.

This doesn't sound right to me. In these cases all it would have to
know about is the same set of properties that CREATE CONSTRAINT looks
for to find a satisfactory index to depend on.

--
greg


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Allow replacement of bloated primary key indexes without foreign key rebuilds
Date: 2012-07-10 15:40:08
Message-ID: 1341934770-sup-6570@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Excerpts from Tom Lane's message of mar jul 10 10:44:03 -0400 2012:

> All of these things seem like ugly, hard-to-use kluges anyway (the
> make-sure-the-indexes-match business is just as much of a PITA for the
> DBA as it is for the system). What we really want is REINDEX
> CONCURRENTLY.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.9961 ?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Allow replacement of bloated primary key indexes without foreign key rebuilds
Date: 2012-07-10 16:02:48
Message-ID: 22748.1341936168@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of mar jul 10 10:44:03 -0400 2012:
>> What we really want is REINDEX CONCURRENTLY.

> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.9961 ?

Hm ... that paper looks like something we might want to incorporate into
btree's VACUUM processing, but it's not very on-point if someone really
wants to rebuild the index totally.

regards, tom lane


From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Allow replacement of bloated primary key indexes without foreign key rebuilds
Date: 2012-07-12 23:08:36
Message-ID: CABwTF4X6q7K=bANQ6gM7n_CTTaLJzjexCBW0VOb7+CLZ4x07fA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 10, 2012 at 10:44 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Greg Stark <stark(at)mit(dot)edu> writes:
> > On Sat, Jul 7, 2012 at 4:53 AM, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
> wrote:
> >> All we need to do is allow swapping of pg_class.relfilenode of two
> indexes.
>
> > Fwiw I don't like swapping relfilenodes on indexes the user created.
> > REINDEX currently does this but it's a bit of a hack and only works
> > because reindex carefully builds the new index with exactly the same
> > definition as the old one.
>
> Yes. The swap-relfilenodes operation would have to carefully check that
> the index definitions were exactly equivalent, and there would be a
> constant risk for bugs of omission if that code weren't taught about
> any new index properties we invent.
>

IMHO there must be many other places in this code-base where we run that
risk.

The way I am planning to do it was to compare all relevant fields of the
FormData_pg_index. And I am assuming anybody changing the struct members
will take care of relevant changes needed for this code too.

We can add a runtime/compile-time assert to make sure that
Natts_pg_index==17. That way, if a new column gets added, we will get
alerted promptly.

> All of these things seem like ugly, hard-to-use kluges anyway (the
> make-sure-the-indexes-match business is just as much of a PITA for the
> DBA as it is for the system). What we really want is REINDEX
> CONCURRENTLY.
>

+1, but I can't take on that task.

--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Allow replacement of bloated primary key indexes without foreign key rebuilds
Date: 2012-07-12 23:18:15
Message-ID: CABwTF4WYVdERR+Mrra-tPESDk7=Oi3v7mKFLWSLdOV653Jt+VA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 10, 2012 at 11:11 AM, Greg Stark <stark(at)mit(dot)edu> wrote:

> On Tue, Jul 10, 2012 at 3:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> The problem you describe is one of constraints and dependencies and
> >> not one of indexes. It seems what you really want is a way to alter
> >> foreign key dependencies to depend on a new index. Either an explicit
> >> command that lets you set the new dependency or what seems even better
> >> would be to have DROP INDEX check any dependent objects to see if
> >> there's another index that can satisfy them and change their
> >> dependency.
> >
> > Either of these have exactly the same issue, namely their correctness
> > depends on determining if two indexes have identical properties.
>
> This doesn't sound right to me. In these cases all it would have to
> know about is the same set of properties that CREATE CONSTRAINT looks
> for to find a satisfactory index to depend on.
>

I like the DROP index idea, but the silent side-effect may not make people
happy. Can you give me a pointer to relevant code.

--
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company