Re: Support for REINDEX CONCURRENTLY

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Support for REINDEX CONCURRENTLY
Date: 2013-06-19 00:55:24
Message-ID: CAB7nPqSKX=1YkYxLh7G4S3awnLhbqs3GU_W_WBFb-qd9G=2czA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Please find an updated patch. The regression test rules has been
updated, and all the comments are addressed.

On Tue, Jun 18, 2013 at 6:35 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> On 2013-06-18 10:53:25 +0900, Michael Paquier wrote:
>> diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
>> index c381f11..3a6342c 100644
>> --- a/contrib/pg_upgrade/info.c
>> +++ b/contrib/pg_upgrade/info.c
>> @@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
>> "INSERT INTO info_rels "
>> "SELECT reltoastrelid "
>> "FROM info_rels i JOIN pg_catalog.pg_class c "
>> - " ON i.reloid = c.oid"));
>> + " ON i.reloid = c.oid "
>> + " AND c.reltoastrelid != %u", InvalidOid));
>> PQclear(executeQueryOrDie(conn,
>> "INSERT INTO info_rels "
>> - "SELECT reltoastidxid "
>> - "FROM info_rels i JOIN pg_catalog.pg_class c "
>> - " ON i.reloid = c.oid"));
>> + "SELECT indexrelid "
>> + "FROM pg_index "
>> + "WHERE indrelid IN (SELECT reltoastrelid "
>> + " FROM pg_class "
>> + " WHERE oid >= %u "
>> + " AND reltoastrelid != %u)",
>> + FirstNormalObjectId, InvalidOid));
>
> What's the idea behind the >= here?
It is here to avoid fetching the toast relations of system tables. But
I see your point, the inner query fetching the toast OIDs should do a
join on the exising info_rels and not try to do a join on a plain
pg_index, so changed this way.

> I think we should ignore the invalid indexes in that SELECT?
Yes indeed, it doesn't make sense to grab invalid toast indexes.
Changed this way.

>> @@ -1392,19 +1390,62 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
>> }
>>
>> /*
>> - * If we're swapping two toast tables by content, do the same for their
>> - * indexes.
>> + * If we're swapping two toast tables by content, do the same for all of
>> + * their indexes. The swap can actually be safely done only if the
>> + * relations have indexes.
>> */
>> if (swap_toast_by_content &&
>> - relform1->reltoastidxid && relform2->reltoastidxid)
>> - swap_relation_files(relform1->reltoastidxid,
>> - relform2->reltoastidxid,
>> - target_is_pg_class,
>> - swap_toast_by_content,
>> - is_internal,
>> - InvalidTransactionId,
>> - InvalidMultiXactId,
>> - mapped_tables);
>> + relform1->reltoastrelid &&
>> + relform2->reltoastrelid)
>> + {
>> + Relation toastRel1, toastRel2;
>> +
>> + /* Open relations */
>> + toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
>> + toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
>> +
>> + /* Obtain index list */
>> + RelationGetIndexList(toastRel1);
>> + RelationGetIndexList(toastRel2);
>> +
>> + /* Check if the swap is possible for all the toast indexes */
>> + if (list_length(toastRel1->rd_indexlist) == 1 &&
>> + list_length(toastRel2->rd_indexlist) == 1)
>> + {
>> + ListCell *lc1, *lc2;
>> +
>> + /* Now swap each couple */
>> + lc2 = list_head(toastRel2->rd_indexlist);
>> + foreach(lc1, toastRel1->rd_indexlist)
>> + {
>> + Oid indexOid1 = lfirst_oid(lc1);
>> + Oid indexOid2 = lfirst_oid(lc2);
>> + swap_relation_files(indexOid1,
>> + indexOid2,
>> + target_is_pg_class,
>> + swap_toast_by_content,
>> + is_internal,
>> + InvalidTransactionId,
>> + InvalidMultiXactId,
>> + mapped_tables);
>> + lc2 = lnext(lc2);
>> + }
>
> Why are you iterating over the indexlists after checking they are both
> of length == 1? Looks like the code would be noticeably shorter without
> that.
OK. Modified this way.

>> + }
>> + else
>> + {
>> + /*
>> + * As this code path is only taken by shared catalogs, who cannot
>> + * have multiple indexes on their toast relation, simply return
>> + * an error.
>> + */
>> + ereport(ERROR,
>> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> + errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
>> + }
>> +
>
> Absolutely minor thing, using an elog() seems to be better here since
> that uses the appropriate error code for some codepath that's not
> expected to be executed.
OK. Modified this way.

>
>> /* Clean up. */
>> heap_freetuple(reltup1);
>> @@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
>> if (OidIsValid(newrel->rd_rel->reltoastrelid))
>> {
>> Relation toastrel;
>> - Oid toastidx;
>> char NewToastName[NAMEDATALEN];
>> + ListCell *lc;
>> + int count = 0;
>>
>> toastrel = relation_open(newrel->rd_rel->reltoastrelid,
>> AccessShareLock);
>> - toastidx = toastrel->rd_rel->reltoastidxid;
>> + RelationGetIndexList(toastrel);
>> relation_close(toastrel, AccessShareLock);
>>
>> /* rename the toast table ... */
>> @@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
>> RenameRelationInternal(newrel->rd_rel->reltoastrelid,
>> NewToastName, true);
>>
>> - /* ... and its index too */
>> - snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
>> - OIDOldHeap);
>> - RenameRelationInternal(toastidx,
>> - NewToastName, true);
>> + /* ... and its indexes too */
>> + foreach(lc, toastrel->rd_indexlist)
>> + {
>> + /*
>> + * The first index keeps the former toast name and the
>> + * following entries have a suffix appended.
>> + */
>> + if (count == 0)
>> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
>> + OIDOldHeap);
>> + else
>> + snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
>> + OIDOldHeap, count);
>> + RenameRelationInternal(lfirst_oid(lc),
>> + NewToastName, true);
>> + count++;
>> + }
>> }
>> relation_close(newrel, NoLock);
>> }
>
> Is it actually possible to get here with multiple toast indexes?
Actually it is possible. finish_heap_swap is also called for example
in ALTER TABLE where rewriting the table (phase 3), so I think it is
better to protect this code path this way.

>> diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
>> index ec956ad..ac42389 100644
>> --- a/src/bin/pg_dump/pg_dump.c
>> +++ b/src/bin/pg_dump/pg_dump.c
>> @@ -2781,16 +2781,16 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
>> Oid pg_class_reltoastidxid;
>>
>> appendPQExpBuffer(upgrade_query,
>> - "SELECT c.reltoastrelid, t.reltoastidxid "
>> + "SELECT c.reltoastrelid, t.indexrelid "
>> "FROM pg_catalog.pg_class c LEFT JOIN "
>> - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
>> - "WHERE c.oid = '%u'::pg_catalog.oid;",
>> + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
>> + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
>> pg_class_oid);
>
> This possibly needs a version qualification due to querying
> indisvalid. How far back do we support pg_upgrade?
By having a look at the docs, pg_upgrade has been added in 9.0 and
support upgrades for version >= 8.3.X. indisvalid has been added in
8.2 so we are fine.

>
>> diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
>> index 8ac2549..31309ed 100644
>> --- a/src/include/utils/relcache.h
>> +++ b/src/include/utils/relcache.h
>> @@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
>> typedef Relation *RelationPtr;
>>
>> /*
>> + * RelationGetIndexListIfValid
>> + * Get index list of relation without recomputing it.
>> + */
>> +#define RelationGetIndexListIfValid(rel) \
>> +do { \
>> + if (rel->rd_indexvalid == 0) \
>> + RelationGetIndexList(rel); \
>> +} while(0)
>
> Isn't this function misnamed and should be
> RelationGetIndexListIfInValid?
When naming that; I had more in mind: "get the list of indexes if it
is already there". It looks more intuitive to my mind.
--
Michael

Attachment Content-Type Size
20130618_1_remove_reltoastidxid_v11.patch application/octet-stream 58.7 KB
20130618_2_reindex_concurrently_v27.patch application/octet-stream 91.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2013-06-19 01:11:42 Re: Vacuum, Freeze and Analyze: the big picture
Previous Message Michael Paquier 2013-06-19 00:50:07 Re: Support for REINDEX CONCURRENTLY