Re: Support for REINDEX CONCURRENTLY

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-01-15 09:16:59
Message-ID: CAB7nPqSN+TKihL6enssBYTxfikGK9CBRFt-SVuHPqnQ=FVWWsQ@mail.gmail.com
Lists: pgsql-hackers

OK. I am back to this patch after too long a break.

Please find an updated version of the patch attached (v6). It addresses all
the previous comments, except for the support of REINDEX DATABASE
CONCURRENTLY. I am working on that right now, but I am not sure it is that
straightforward...

On Wed, Dec 19, 2012 at 11:24 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> On 2012-12-17 11:44:00 +0900, Michael Paquier wrote:
> > Thanks for all your comments.
> > The new version (v5) of this patch fixes the error you found when
> > reindexing indexes being referenced in foreign keys.
> > The fix is done with switchIndexConstraintOnForeignKey:pg_constraint.c, in
> > charge of scanning pg_constraint for foreign keys that refer the parent
> > relation (confrelid) of the index being swapped and then switch conindid to
> > the new index if the old index was referenced.
> > This API also takes care of switching the dependency between the foreign
> > key and the old index by calling changeDependencyFor.
> > I also added a regression test for this purpose.
>
> Ok. Are there no other dependencies towards indexes? I don't know of any
> right now, but I have the feeling there were some other cases.
>
The patch covers the cases of PRIMARY KEY, UNIQUE and plain indexes, exclusion
constraints and foreign keys. Just based on the docs, I don't think anything
is missing.
http://www.postgresql.org/docs/9.2/static/ddl-constraints.html

> > > * REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
> > > concurrent reindexing for user-tables and non-concurrent for system
> > > tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...
> > >
> > OK. I thought that this was out of scope for the time being. I haven't done
> > anything about that yet. Supporting that will not be complicated as
> > ReindexRelationsConcurrently (new API) is more flexible now, the only thing
> > needed is to gather the list of relations that need to be reindexed.
>
> Imo that so greatly reduces the usability of this patch that you should
> treat it as in scope ;). Especially as you say, it really shouldn't be
> that much work with all the groundwork built.
>
OK. So... What should we do when REINDEX DATABASE CONCURRENTLY is run?
- only reindex user tables and bypass system tables?
- reindex user tables concurrently and system tables non-concurrently?
- forbid the operation entirely on a database that has system tables?
Any input?
Btw, the attached version of the patch does not include this feature yet,
but I am working on it.
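
As a rough illustration of the first two options, gathering the per-database
list of target relations while leaving out system catalogs could look like the
sketch below. This is not part of the attached v6; the function name and the
exact checks are only assumptions of mine, loosely following what
ReindexDatabase already does:

/*
 * Hypothetical sketch (not in the attached patch): collect the OIDs of
 * plain user relations of the current database, skipping system catalogs
 * and shared relations, so that only those are reindexed concurrently.
 */
static List *
get_tables_for_concurrent_reindex(void)
{
    List       *relids = NIL;
    Relation    pgclass = heap_open(RelationRelationId, AccessShareLock);
    HeapScanDesc scan = heap_beginscan(pgclass, SnapshotNow, 0, NULL);
    HeapTuple   tuple;

    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple);

        /* Keep only plain relations, no catalogs and no shared relations */
        if (classtuple->relkind != RELKIND_RELATION ||
            IsSystemClass(classtuple) ||
            classtuple->relisshared)
            continue;

        relids = lappend_oid(relids, HeapTupleGetOid(tuple));
    }

    heap_endscan(scan);
    heap_close(pgclass, AccessShareLock);

    return relids;
}

System tables could then either be skipped entirely or be handed to the
existing non-concurrent code path.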

>
> > > * would be nice (but that's probably a step #2 thing) to do the
> > > individual steps of concurrent reindex over multiple relations to
> > > avoid too much overall waiting for other transactions.
> > >
> > I think I did that by now using one transaction per index for each
> > operation except the drop phase...
>
> Without yet having read the new version, I think that's not what I
> meant. There currently is a wait for concurrent transactions to end
> after most of the phases for every relation, right? If you have a busy
> database with somewhat long-running transactions that's going to slow
> everything down with waiting quite a bit. I wondered whether it would make
> sense to do PHASE1 for all indexes in all relations, then wait once,
> then PHASE2...
>
> That obviously has some space and index maintenance overhead issues, but
> it's probably sensible anyway in many cases.
>
OK, phase 1 is done with only one transaction for all the indexes. Do you
mean that we should do that with a single transaction for each index?

> Isn't the following block content that's mostly available somewhere else
> already?
> [... doc extract ...]
>
Yes, this portion of the docs is pretty similar to what can be found under
CREATE INDEX CONCURRENTLY. Why not create a new common documentation section
that both CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY could refer to?
I think we should first get the code right and then do the docs properly,
though.

> > -    if (concurrent && is_exclusion)
> > +    if (concurrent && is_exclusion && !is_reindex)
> >          ereport(ERROR,
> >                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> >                   errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
>
> This is what I referred to above wrt reindex and CONCURRENTLY. We
> shouldn't pass concurrently if we don't deem it to be safe for exclusion
> constraints.
>
So does that mean that it is not possible to create an exclusion constraint
in a concurrent context? The code path used by REINDEX CONCURRENTLY creates
an index in parallel with an existing one, not a completely new index.
Shouldn't this work for indexes used by exclusion constraints as well?

> > +/*
> > + * index_concurrent_drop
> > + *
> > + * Drop a list of indexes as the last step of a concurrent process. Deletion is
> > + * done through performDeletion or dependencies of the index are not dropped.
> > + * At this point all the indexes are already considered as invalid and dead so
> > + * they can be dropped without using any concurrent options.
> > + */
> > +void
> > +index_concurrent_drop(List *indexIds)
> > +{
> > +    ListCell   *lc;
> > +    ObjectAddresses *objects = new_object_addresses();
> > +
> > +    Assert(indexIds != NIL);
> > +
> > +    /* Scan the list of indexes and build object list for normal indexes */
> > +    foreach(lc, indexIds)
> > +    {
> > +        Oid         indexOid = lfirst_oid(lc);
> > +        Oid         constraintOid = get_index_constraint(indexOid);
> > +        ObjectAddress object;
> > +
> > +        /* Register constraint or index for drop */
> > +        if (OidIsValid(constraintOid))
> > +        {
> > +            object.classId = ConstraintRelationId;
> > +            object.objectId = constraintOid;
> > +        }
> > +        else
> > +        {
> > +            object.classId = RelationRelationId;
> > +            object.objectId = indexOid;
> > +        }
> > +
> > +        object.objectSubId = 0;
> > +
> > +        /* Add object to list */
> > +        add_exact_object_address(&object, objects);
> > +    }
> > +
> > +    /* Perform deletion for normal and toast indexes */
> > +    performMultipleDeletions(objects, DROP_RESTRICT, 0);
> > +}
>
> Just for warm and fuzzy feeling I think it would be a good idea to
> recheck that indexes are !indislive here.
>
OK, done. Indexes that still have indislive set to true are now bypassed.
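
The recheck is done when building the object list, roughly along these lines
(a simplified sketch of my own, not the exact code in the attached patch):

foreach(lc, indexIds)
{
    Oid         indexOid = lfirst_oid(lc);
    HeapTuple   indexTuple;
    Form_pg_index indexForm;

    indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexOid));
    if (!HeapTupleIsValid(indexTuple))
        elog(ERROR, "cache lookup failed for index %u", indexOid);
    indexForm = (Form_pg_index) GETSTRUCT(indexTuple);

    /* Only dead indexes are expected here, bypass anything still live */
    if (indexForm->indislive)
    {
        ReleaseSysCache(indexTuple);
        continue;
    }
    ReleaseSysCache(indexTuple);

    /* ... register the index or its constraint for deletion as before ... */
}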

> > diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
> > index 5e8c6da..55c092d 100644
> > +
> > +/*
> > + * switchIndexConstraintOnForeignKey
> > + *
> > + * Switch foreign keys references for a given index to a new index created
> > + * concurrently. This process is used when swapping indexes for a concurrent
> > + * process. All the constraints that are not referenced externally like primary
> > + * keys or unique indexes should be switched using the structure of index.c for
> > + * concurrent index creation and drop.
> > + * This function takes care of also switching the dependencies of the foreign
> > + * key from the old index to the new index in pg_depend.
> > + *
> > + * In order to complete this process, the following process is done:
> > + * 1) Scan pg_constraint and extract the list of foreign keys that refer to the
> > + *    parent relation of the index being swapped as conrelid.
> > + * 2) Check in this list the foreign keys that use the old index as reference
> > + *    here with conindid
> > + * 3) Update field conindid to the new index Oid on all the foreign keys
> > + * 4) Switch dependencies of the foreign key to the new index
> > + */
> > +void
> > +switchIndexConstraintOnForeignKey(Oid parentOid,
> > +                                  Oid oldIndexOid,
> > +                                  Oid newIndexOid)
> > +{
> > +    ScanKeyData skey[1];
> > +    SysScanDesc conscan;
> > +    Relation    conRel;
> > +    HeapTuple   htup;
> > +
> > +    /*
> > +     * Search pg_constraint for the foreign key constraints associated
> > +     * with the index by scanning using conrelid.
> > +     */
> > +    ScanKeyInit(&skey[0],
> > +                Anum_pg_constraint_confrelid,
> > +                BTEqualStrategyNumber, F_OIDEQ,
> > +                ObjectIdGetDatum(parentOid));
> > +
> > +    conRel = heap_open(ConstraintRelationId, AccessShareLock);
> > +    conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
> > +                                 true, SnapshotNow, 1, skey);
> > +
> > +    while (HeapTupleIsValid(htup = systable_getnext(conscan)))
> > +    {
> > +        Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
> > +
> > +        /* Check if a foreign constraint uses the index being swapped */
> > +        if (contuple->contype == CONSTRAINT_FOREIGN &&
> > +            contuple->confrelid == parentOid &&
> > +            contuple->conindid == oldIndexOid)
> > +        {
> > +            /* Found an index, so update its pg_constraint entry */
> > +            contuple->conindid = newIndexOid;
> > +            /* And write it back in place */
> > +            heap_inplace_update(conRel, htup);
>
> I am pretty doubtful that using heap_inplace_update is the correct thing
> to do here. What if we fail later? Even if there's some justification
> for it being safe it deserves a big comment.
>
> The other cases where heap_inplace_update is used in the context of
> CONCURRENTLY are pretty careful about where to do it and have special
> state flags indicating that this has been done...
>
Oops, fixed. I changed it to simple_heap_update.
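
The update now goes through a copy of the tuple and a regular catalog update,
roughly like this (a sketch only; the exact code is in the attached patch):

HeapTuple   copyTuple = heap_copytuple(htup);
Form_pg_constraint copy_con = (Form_pg_constraint) GETSTRUCT(copyTuple);

/* Switch the constraint to the new index */
copy_con->conindid = newIndexOid;

/* Normal, transactional catalog update instead of an in-place write */
simple_heap_update(conRel, &copyTuple->t_self, copyTuple);
CatalogUpdateIndexes(conRel, copyTuple);

heap_freetuple(copyTuple);

So if the transaction fails later, the change simply rolls back.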

> >
> > +bool
> > +ReindexRelationsConcurrently(List *relationIds)
> > +{
> > +    foreach(lc, relationIds)
> > +    {
> > +        Oid         relationOid = lfirst_oid(lc);
> > +
> > +        switch (get_rel_relkind(relationOid))
> > +        {
> > +            case RELKIND_RELATION:
> > +            {
> > +                /*
> > +                 * In the case of a relation, find all its indexes
> > +                 * including toast indexes.
> > +                 */
> > +                Relation    heapRelation = heap_open(relationOid,
> > +                                                     ShareUpdateExclusiveLock);
> > +
> > +                /* Relation on which is based index cannot be shared */
> > +                if (heapRelation->rd_rel->relisshared)
> > +                    ereport(ERROR,
> > +                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > +                             errmsg("concurrent reindex is not supported for shared relations")));
> > +
> > +                /* Add all the valid indexes of relation to list */
> > +                foreach(lc2, RelationGetIndexList(heapRelation))
> > +                {
> > +                    Oid         cellOid = lfirst_oid(lc2);
> > +                    Relation    indexRelation = index_open(cellOid,
> > +                                                           ShareUpdateExclusiveLock);
> > +
> > +                    if (!indexRelation->rd_index->indisvalid)
> > +                        ereport(WARNING,
> > +                                (errcode(ERRCODE_INDEX_CORRUPTED),
> > +                                 errmsg("cannot reindex concurrently invalid index \"%s.%s\", bypassing",
> > +                                        get_namespace_name(get_rel_namespace(cellOid)),
> > +                                        get_rel_name(cellOid))));
> > +                    else
> > +                        indexIds = list_append_unique_oid(indexIds, cellOid);
> > +
> > +                    index_close(indexRelation, ShareUpdateExclusiveLock);
> > +                }
>
> Why are we releasing the locks here if we are going to reindex the
> relations? They might change in between. I think we should take an
> appropriate lock here, including the locks on the parent relations. Yes,
> it's slightly more duplicative code, and not acquiring locks multiple
> times is somewhat complicated, but I think it's required.
>
OK, the locks are now kept until the end of the transaction in which the
session locks are taken on those relations, so no schema change can happen
between the moment the list of indexes is built and the moment the session
locks are taken.
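
Concretely, once the list is built, session-level locks are taken on each
relation and index in the same way they are released at the end (a simplified
sketch; see the patch for the exact code):

foreach(lc, relationLocks)
{
    LockRelId   lockRel = * (LockRelId *) lfirst(lc);

    /*
     * Session locks survive the per-phase transaction commits, so no
     * schema change can sneak in until they are released in the last phase.
     */
    LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
}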

> I think you should also explicitly do the above in a transaction...
>
I am not sure I get your point here. This phase is only in place to gather
the list of all the indexes to reindex, based on the list of relations given
by the caller.

>
> > +    /*
> > +     * Phase 2 of REINDEX CONCURRENTLY
> > +     *
> > +     * Build concurrent indexes in a separate transaction for each index to
> > +     * avoid having open transactions for an unnecessary long time. We also
> > +     * need to wait until no running transactions could have the parent table
> > +     * of index open. A concurrent build is done for each concurrent
> > +     * index that will replace the old indexes.
> > +     */
> > +
> > +    /* Get the first element of concurrent index list */
> > +    lc2 = list_head(concurrentIndexIds);
> > +
> > +    foreach(lc, indexIds)
> > +    {
> > +        Relation    indexRel;
> > +        Oid         indOid = lfirst_oid(lc);
> > +        Oid         concurrentOid = lfirst_oid(lc2);
> > +        Oid         relOid;
> > +        bool        primary;
> > +        LOCKTAG    *heapLockTag = NULL;
> > +        ListCell   *cell;
> > +
> > +        /* Move to next concurrent item */
> > +        lc2 = lnext(lc2);
> > +
> > +        /* Start new transaction for this index concurrent build */
> > +        StartTransactionCommand();
> > +
> > +        /* Get the parent relation Oid */
> > +        relOid = IndexGetRelation(indOid, false);
> > +
> > +        /*
> > +         * Find the locktag of parent table for this index, we need to wait for
> > +         * locks on it.
> > +         */
> > +        foreach(cell, lockTags)
> > +        {
> > +            LOCKTAG    *localTag = (LOCKTAG *) lfirst(cell);
> > +
> > +            if (relOid == localTag->locktag_field2)
> > +                heapLockTag = localTag;
> > +        }
> > +
> > +        Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
> > +        WaitForVirtualLocks(*heapLockTag, ShareLock);
>
> Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
> once for all relations after each phase? Otherwise the waiting time will
> really start to hit when you do this on a somewhat busy server.
>
Each new index is built and set as ready in its own transaction, so doesn't
it make sense to wait for the parent relation each time? It would be possible
to wait for a parent relation only once during this phase, but in that case
all the indexes of the same relation would need to be set as ready in the
same transaction. So the choice is either to wait for the same relation
multiple times, once per index, or to wait only once per parent relation but
build all its concurrent indexes within the same transaction. Choice 1 makes
the code clearer and more robust to my mind, as phase 2 is done cleanly for
each index separately. Thoughts?

>
> > +        /*
> > +         * Invalidate the relcache for the table, so that after this commit all
> > +         * sessions will refresh any cached plans that might reference the index.
> > +         */
> > +        CacheInvalidateRelcacheByRelid(relOid);
>
> I am not sure whether I suggested adding a
> CacheInvalidateRelcacheByRelid here, but afaics it's not required yet,
> the plan isn't valid yet, so no need for replanning.
>
Sure, I removed it.

> > + indexRel = index_open(indOid, ShareUpdateExclusiveLock);
>
> I wonder whether we should directly open it exclusive here given it's going
> to be opened exclusively in a bit anyway. Not that that will really reduce
> the deadlock likelihood since we already hold the ShareUpdateExclusiveLock
> in session mode ...
>
I tried to use an AccessExclusiveLock here, but it turns out this is not
compatible with index_set_state_flags. Does taking an exclusive lock assign
a transaction ID to the running transaction? Because what I am seeing is that
taking AccessExclusiveLock on this index causes a transaction ID to be
assigned.
For those reasons the current code sticks with ShareUpdateExclusiveLock. Not
a big deal, btw...

> > +    /*
> > +     * Phase 5 of REINDEX CONCURRENTLY
> > +     *
> > +     * The old indexes need to be marked as not ready. We need also to wait for
> > +     * transactions that might use them. Each operation is performed with a
> > +     * separate transaction.
> > +     */
> > +
> > +    /* Mark the old indexes as not ready */
> > +    foreach(lc, indexIds)
> > +    {
> > +        LOCKTAG    *heapLockTag;
> > +        Oid         indOid = lfirst_oid(lc);
> > +        Oid         relOid;
> > +
> > +        StartTransactionCommand();
> > +        relOid = IndexGetRelation(indOid, false);
> > +
> > +        /*
> > +         * Find the locktag of parent table for this index, we need to wait for
> > +         * locks on it.
> > +         */
> > +        foreach(lc2, lockTags)
> > +        {
> > +            LOCKTAG    *localTag = (LOCKTAG *) lfirst(lc2);
> > +
> > +            if (relOid == localTag->locktag_field2)
> > +                heapLockTag = localTag;
> > +        }
> > +
> > +        Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
> > +
> > +        /* Finish the index invalidation and set it as dead */
> > +        index_concurrent_set_dead(indOid, relOid, *heapLockTag);
> > +
> > +        /* Commit this transaction to make the update visible. */
> > +        CommitTransactionCommand();
> > +    }
>
> No waiting here?
>
A wait phase is done inside index_concurrent_set_dead, so no problem.

> > +    StartTransactionCommand();
> > +
> > +    /* Get fresh snapshot for next step */
> > +    PushActiveSnapshot(GetTransactionSnapshot());
> > +
> > +    /*
> > +     * Phase 6 of REINDEX CONCURRENTLY
> > +     *
> > +     * Drop the old indexes. This needs to be done through performDeletion
> > +     * or related dependencies will not be dropped for the old indexes. The
> > +     * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
> > +     * indexes are already considered as dead and invalid, so they will not
> > +     * be used by other backends.
> > +     */
> > +    index_concurrent_drop(indexIds);
> > +
> > +    /*
> > +     * Last thing to do is release the session-level lock on the parent table
> > +     * and the indexes of table.
> > +     */
> > +    foreach(lc, relationLocks)
> > +    {
> > +        LockRelId   lockRel = * (LockRelId *) lfirst(lc);
> > +
> > +        UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
> > +    }
> > +
> > +    /* We can do away with our snapshot */
> > +    PopActiveSnapshot();
>
> I think I would do the drop in individual transactions as well.
>
Done. Each drop is now done in its own transaction.
--
Michael Paquier
http://michael.otacoo.com

Attachment Content-Type Size
20130115_reindex_concurrently_v6.patch application/octet-stream 78.2 KB
