Re: Support for REINDEX CONCURRENTLY

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2012-12-17 02:44:00
Message-ID: CAB7nPqRMQzJvVGJuDUGAN9RtohmbhJgsBSXDTZLvicvAEuc1LQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for all your comments.
The new version (v5) of this patch fixes the error you found when
reindexing indexes being referenced in foreign keys.
The fix is done with switchIndexConstraintOnForeignKey:pg_constraint.c, in
charge of scanning pg_constraint for foreign keys that refer the parent
relation (confrelid) of the index being swapped and then switch conindid to
the new index if the old index was referenced.
This API also takes care of switching the dependency between the foreign
key and the old index by calling changeDependencyFor.
I also added a regression test for this purpose.

On Tue, Dec 11, 2012 at 12:28 AM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> Some review comments:
>
> * Some of the added !is_reindex in index_create don't seem safe to
> me.
>
This is added to control concurrent index relation for toast indexes. If we
do not add an additional flag for that it will not be possible to reindex
concurrently a toast index.

> * Why do we now support reindexing exclusion constraints?
>
CREATE INDEX CONCURRENTLY is not supported for exclusive constraints but I
played around with exclusion constraints with my patch and did not
particularly see any problems in supporting them as for example index_build
performs a second scan of the heap when running so it looks enough solid
for that. Is it because the structure of REINDEX CONCURRENTLY patch is
different? Honestly I think no so is there something I am not aware of?

* REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
> concurrent reindexing for user-tables and non-concurrent for system
> tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...
>
OK. I thought that this was out of scope for the time being. I haven't done
anything about that yet. Supporting that will not be complicated as
ReindexRelationsConcurrently (new API) is more flexible now, the only thing
needed is to gather the list of relations that need to be reindexed.

* ISTM index_concurrent_swap should get exlusive locks on the relation
> *before* printing their names. This shouldn't be required because we
> have a lock prohibiting schema changes on the parent table, but it
> feels safer.
>
Done. AccessExclusiveLock is taken before calling RenameRelationInternal
now.

* temporary index names during swapping should also be named via
> ChooseIndexName
>
Done. I used instead ChooseRelationName which is externalized through
defrem.h.

> * why does create_toast_table pass an unconditional 'is_reindex' to
> index_create?
>
Done. The flag is changed to false.

> * would be nice (but thats probably a step #2 thing) to do the
> individual steps of concurrent reindex over multiple relations to
> avoid too much overall waiting for other transactions.
>
I think I did that by now using one transaction per index for each
operation except the drop phase...

> * ReindexConcurrentIndexes:
>

I renamed ReindexConcurrentIndexes to ReindexRelationsConcurrently and
changed the arguments it used to something more generic:
ReindexRelationsConcurrently(List *relationIds)
relationIds is a list of relation Oids that can be include tables and/or
indexes Oid.
Based on this list of relation Oid, we build the list of indexes that are
rebuilt, including the toast indexes if necessary.

* says " Such indexes are simply bypassed if caller has not specified
> anything." but ERROR's. Imo ERROR is fine, but the comment should be
> adjusted...
>
Done.

>
> * should perhaps be names ReindexIndexesConcurrently?
>
Kind of done.

> * Imo the PHASE 1 comment should be after gathering/validitating the
> chosen indexes
>
Comment is moved. Thanks.

> * It seems better to me to do use individual transactions + snapshots
> for each index, no need to keep very long transactions open (PHASE
> 2/3)
>
Good point. I did that. Now individual transactions are used for each index.

> * s/same whing/same thing/
>
Done.

> * Shouldn't a CacheInvalidateRelcacheByRelid be done after PHASE 2 and
> 5 as well?
>
Done. Nice catch.

> * PHASE 6 should acquire exlusive locks on the indexes
>
The necessary lock is taken when calling index_drop through
performMultipleDeletion. Do you think it is not enough and that i should
add an Exclusive lock inside index_concurrent_drop?

* can some of index_concurrent_* infrastructure be reused for
> DROP INDEX CONCURRENTLY?
>
Indeed. After looking at the code I found that that 2 steps are done in a
concurrent context: invalidating the index and set it as dead.
As REINDEX CONCURRENTLY does the following 2 steps in batch for a list of
indexes, I added index_concurrent_set_dead to set up the dropped indexes as
dead, and index_concurrent_clear_valid. Those 2 functions are used by both
REINDEX CONCURRENTLY and DROP INDEX CONCURRENTLY.

* in CREATE/DROP INDEX CONCURRENTLY 'CONCURRENTLY comes before the
> object name, should we keep that conventioN?
>
Good point. I changed the grammar to REINDEX obj [ CONCURRENTLY ] objname.

Thanks,
--
Michael Paquier
http://michael.otacoo.com

Attachment Content-Type Size
20121217_reindex_concurrently_v5.patch application/octet-stream 77.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2012-12-17 02:45:20 Re: Unresolved error 0xC0000409 on Windows Server
Previous Message Noah Misch 2012-12-17 01:17:58 Re: Adjusting elog behavior in bootstrap/standalone mode