Re: Support for REINDEX CONCURRENTLY

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-03-05 13:35:16
Message-ID: CAB7nPqRosYo2JQOmhVu+w8w3CtEY20Va_JATN=6Run8_em9EwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review. All your comments are addressed and updated patches
are attached.
Please see below for the details, and if you find anything else just let me
know.

On Tue, Mar 5, 2013 at 6:27 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> Have you benchmarked the toastrelidx removal stuff in any way? If not,
> thats fine, but if yes I'd be interested.
>
No I haven't. Is it really that easily measurable? I think not, but me too
I'd be interested in looking at such results.

> On 2013-03-04 22:33:53 +0900, Michael Paquier wrote:
> > + ListCell *lc;
> > + int count = 0;
>
> I find count a confusing name for a loop iteration variable... i of orr,
> idxno, or ...
>
That's only a matter of personal way of doing... But done for all the
functions I modified in this file.

> + if (toastrel->rd_indexvalid == 0)
> > + RelationGetIndexList(toastrel);
>
> Hm, I think we should move this into a macro, this is cropping up at
> more and more places.
>
This is not necessary. RelationGetIndexList does a check similar at its
top, so I simply removed all those checks.

> > + for (count = 0; count < num_indexes; count++)
> > + index_insert(toastidxs[count], t_values, t_isnull,
> > + &(toasttup->t_self),
> > + toastrel,
> > +
> toastidxs[count]->rd_index->indisunique ?
> > + UNIQUE_CHECK_YES :
> UNIQUE_CHECK_NO);
>
> The indisunique check looks like a copy & pasto to me, albeit not
> yours...
>
Yes it is the same for all the indexes normally, but it looks more solid to
me to do that as it is. So unchanged.

> + /*
> > + * We actually use only the first index but taking a lock on all is
> > + * necessary.
> > + */
>
> Hm, is it guaranteed that the first index is valid?
>
Not at all. Fixed. If all the indexes are invalid, an error is returned.

> + * If we're swapping two toast tables by content, do the same for
> all of
> > + * their indexes. The swap can actually be safely done only if all
> the indexes
> > + * have valid Oids.
>
> What's an index without a valid oid?
>
That's a good question... I re-read the code and it didn't any sense, so
switched to a check on empty index list for both relations.

> + /* Open relations */
> > + toastRel1 = heap_open(relform1->reltoastrelid,
> RowExclusiveLock);
> > + toastRel2 = heap_open(relform2->reltoastrelid,
> RowExclusiveLock);
>
> Shouldn't those be Access Exlusive Locks?
>
Yeah seems better for this swap.

>
> > + /* Obtain index list if necessary */
> > + if (toastRel1->rd_indexvalid == 0)
> > + RelationGetIndexList(toastRel1);
> > + if (toastRel2->rd_indexvalid == 0)
> > + RelationGetIndexList(toastRel2);
> > +
> > + /* Check if the swap is possible for all the toast indexes
> */
>
> So there's no error being thrown if this turns out not to be possible?
>
There are no errors also in the former process... This should fail
silently, no?

> > + if (count == 0)
> > + snprintf(NewToastName,
> NAMEDATALEN, "pg_toast_%u_index",
> > + OIDOldHeap);
> > + else
> > + snprintf(NewToastName,
> NAMEDATALEN, "pg_toast_%u_index_cct%d",
> > + OIDOldHeap,
> count);
> > + RenameRelationInternal(lfirst_oid(lc),
> > +
> NewToastName);
> > + count++;
> > + }
>
> Hm. It seems wrong that this layer needs to know about _cct.
>
Any other idea? For the time being I removed cct and added only a suffix
based on the index number...

>
> > /*
> > - * Calculate total on-disk size of a TOAST relation, including its
> index.
> > + * Calculate total on-disk size of a TOAST relation, including its
> indexes.
> > * Must not be applied to non-TOAST relations.
> > */
> > static int64
> > @@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
> > {
> > ...
> > + /* Size is evaluated based on the first index available */
>
> Uh. Why? Imo all indexes should be counted.
>
They are! The comment only is incorrect. Fixed.

> > -#define CATALOG_VERSION_NO 201302181
> > +#define CATALOG_VERSION_NO 20130219
>
> Think you forgot a digit here ;)
>
Fixed.

> /*
> * This case is currently only supported during a concurrent index
> * rebuild, but there is no way to ask for it in the grammar otherwise
> * anyway.
> */
>
> Or similar.
>
Makes sense. Thanks.

> > + ReleaseSysCache(constTuple);
> > + }
>
> Very, very nitpicky, but I find "constTuple" to be confusing, I thought
> at first it meant that the tuple shouldn't be modified or something.
>
Made that clear.

> > + /*
> > + * Index is considered as a constraint if it is PRIMARY KEY or
> EXCLUSION.
> > + */
> > + isconstraint = indexRelation->rd_index->indisprimary ||
> > + indexRelation->rd_index->indisexclusion;
>
> unique constraints aren't mattering here?
>
No they are not. Unique indexes are not counted as constraints in the case
of index_create. Previous versions of the patch did that but there are
issues with unique indexes using expressions.

> > +/*
> > + * index_concurrent_swap
> > + *
> > + * Replace old index by old index in a concurrent context. For the time
> being
> > + * what is done here is switching the relation relfilenode of the
> indexes. If
> > + * extra operations are necessary during a concurrent swap, processing
> should
> > + * be added here. AccessExclusiveLock is taken on the index relations
> that are
> > + * swapped until the end of the transaction where this function is
> called.
> > + */
> > +void
> > +index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
> > +{
> > + Relation oldIndexRel, newIndexRel, pg_class;
> > + HeapTuple oldIndexTuple, newIndexTuple;
> > + Form_pg_class oldIndexForm, newIndexForm;
> > + Oid tmpnode;
> > +
> > + /*
> > + * Take an exclusive lock on the old and new index before swapping
> them.
> > + */
> > + oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
> > + newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
> > +
> > + /* Now swap relfilenode of those indexes */
>
> Any chance to reuse swap_relation_files here? Not sure whether it would
> be beneficial given that it is more generic and normally works on a
> relation level...
>
Hum. I am not sure. The current way of doing is enough to my mind.

> We probably should remove the fsm of the index altogether after this?
>
The freespace map? Not sure it is necessary here. Isn't it going to be
removed with the relation anyway?

> + /* The lock taken previously is not released until the end of
> transaction */
> > + relation_close(oldIndexRel, NoLock);
> > + relation_close(newIndexRel, NoLock);
>
> It might be worthwile adding a heap_freetuple here for (old,
> new)IndexTuple, just to spare the reader the thinking whether it needs
> to be done.
>
Indeed, I forgot some cleanup here. Fixed.

> +/*
> > + * index_concurrent_drop
> > + */
>
> "or dependencies of the index would not get dropped"?
>
Fixed.

> > +void
> > +index_concurrent_drop(Oid indexOid)
> > +{
> > + Oid constraintOid =
> get_index_constraint(indexOid);
> > + ObjectAddress object;
> > + Form_pg_index indexForm;
> > + Relation pg_index;
> > + HeapTuple indexTuple;
> > + bool indislive;
> > +
> > + /*
> > + * Check that the index dropped here is not alive, it might be
> used by
> > + * other backends in this case.
> > + */
> > + pg_index = heap_open(IndexRelationId, RowExclusiveLock);
> > +
> > + indexTuple = SearchSysCacheCopy1(INDEXRELID,
> > +
> ObjectIdGetDatum(indexOid));
> > + if (!HeapTupleIsValid(indexTuple))
> > + elog(ERROR, "cache lookup failed for index %u", indexOid);
> > + indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
> > + indislive = indexForm->indislive;
> > +
> > + /* Clean up */
> > + heap_close(pg_index, RowExclusiveLock);
> > +
> > + /* Leave if index is still alive */
> > + if (indislive)
> > + return;
>
> This seems like a confusing path? Why is it valid to get here with a
> valid index and why is it ok to silently ignore that case?
>
I added that because of a comment of one of the past reviews. Personally I
think it makes more sense to remove that for clarity.

> + case RELKIND_RELATION:
> > + {
> > + /*
> > + * In the case of a relation, find all its
> indexes
> > + * including toast indexes.
> > + */
> > + Relation heapRelation =
> heap_open(relationOid,
> > +
> ShareUpdateExclusiveLock);
>
> Hm. This means we will not notice having about-to-be dropped indexes
> around. Which seems safe because locks will prevent that anyway...
>
I think that's OK as-is.

> + default:
> > + /* nothing to do */
> > + break;
>
> Shouldn't we error out?
>
Don't think so. For example what if the relation is a matview? For REINDEX
DATABASE this could finish as an error because a materialized view is
listed as a relation to reindex. I prefer having this path failing silently
and leave if there are no indexes.

> > + foreach(lc, indexIds)
> > + {
> > + Relation indexRel;
> > + Oid indOid = lfirst_oid(lc);
> > + Oid concurrentOid = lfirst_oid(lc2);
> > + bool primary;
> > +
> > + /* Move to next concurrent item */
> > + lc2 = lnext(lc2);
>
> forboth()
>
Oh, I didn't know this trick. Thanks.

> > + /*
> > + * Phase 3 of REINDEX CONCURRENTLY
> > + *
> > + * During this phase the concurrent indexes catch up with the
> INSERT that
> > + * might have occurred in the parent table and are marked as valid
> once done.
> > + *
> > + * We once again wait until no transaction can have the table open
> with
> > + * the index marked as read-only for updates. Each index
> validation is done
> > + * with a separate transaction to avoid opening transaction for an
> > + * unnecessary too long time.
> > + */
>
> Maybe I am being dumb because I have the feeling I said differently in
> the past, but why do we not need a WaitForMultipleVirtualLocks() here?
> The comment seems to say we need to do so.
>
Yes you said the contrary in a previous review. The purpose of this
function is to first gather the locks and then wait for everything at once
to reduce possible conflicts.

> > + /*
> > + * Finish the index invalidation and set it as dead. It is
> not
> > + * necessary to wait for virtual locks on the parent
> relation as it
> > + * is already sure that this session holds sufficient
> locks.s
> > + */
>
> tiny typo (lock.s)
>
Fixed.

> > + /*
> > + * Phase 6 of REINDEX CONCURRENTLY
> > + *
> > + * Drop the concurrent indexes. This needs to be done through
> > + * performDeletion or related dependencies will not be dropped for
> the old
> > + * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is
> not used
> > + * as here the indexes are already considered as dead and invalid,
> so they
> > + * will not be used by other backends.
> > + */
> > + foreach(lc, concurrentIndexIds)
> > + {
> > + Oid indexOid = lfirst_oid(lc);
> > +
> > + /* Start transaction to drop this index */
> > + StartTransactionCommand();
> > +
> > + /* Get fresh snapshot for next step */
> > + PushActiveSnapshot(GetTransactionSnapshot());
> > +
> > + /*
> > + * Open transaction if necessary, for the first index
> treated its
> > + * transaction has been already opened previously.
> > + */
> > + index_concurrent_drop(indexOid);
> > +
> > + /*
> > + * For the last index to be treated, do not commit
> transaction yet.
> > + * This will be done once all the locks on indexes and
> parent relations
> > + * are released.
> > + */
>
> Hm. This doesn't seem to commit the last transaction at all right now?
>
It is better like this. The end of the process needs to be done inside a
transaction, so not committing immediately the last drop makes sense, no?

> Not sure why UnlockRelationIdForSession needs to be run in a transaction
> anyway?
>
Even in the case of CREATE INDEX CONCURRENTLY, UnlockRelationIdForSession
is run inside a transaction block.

> + /*
> > + * Check the case of a system index that might have been
> invalidated by a
> > + * failed concurrent process and allow its drop.
> > + */
>
> This is only possible for toast indexes right now, right? If so, the
> comment should mention that.
>
Yes, fixed. I mentioned that in the comment.
--
Michael

Attachment Content-Type Size
20130305_1_remove_reltoastidxid_v3.patch application/octet-stream 39.2 KB
20130305_2_reindex_concurrently_v17.patch application/octet-stream 75.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-03-05 13:46:15 Re: Fix pgstattuple/pgstatindex to use regclass-type as the argument
Previous Message Heikki Linnakangas 2013-03-05 13:32:43 Re: Optimizing pglz compressor