Index: doc/src/sgml/catalogs.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/catalogs.sgml,v retrieving revision 2.129 diff -c -r2.129 catalogs.sgml *** doc/src/sgml/catalogs.sgml 31 Jul 2006 20:08:55 -0000 2.129 --- doc/src/sgml/catalogs.sgml 21 Aug 2006 23:33:19 -0000 *************** *** 2421,2426 **** --- 2421,2436 ---- + indisvalid + bool + + True if the index is currently valid for queries. + When false, the index is possibly incomplete; it should still be + inserted into by INSERT/UPDATE operations, but it cannot safely be + used for queries. + + + indkey int2vector pg_attribute.attnum Index: doc/src/sgml/indices.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/indices.sgml,v retrieving revision 1.57 diff -c -r1.57 indices.sgml *** doc/src/sgml/indices.sgml 24 May 2006 11:01:39 -0000 1.57 --- doc/src/sgml/indices.sgml 21 Aug 2006 23:33:19 -0000 *************** *** 824,829 **** --- 824,909 ---- + + Building Indexes Concurrently + + + index + building concurrently + + + + With large tables or index types that require extensive work to create, + creating an index can be a long operation. In the case of large data + warehousing applications it can easily take hours or even days to build + indexes. It's important to understand the impact creating indexes has on a + system. + + + + Normally PostgreSQL locks the table and performs the + entire index build with a single scan of the table. Other transactions can + still read the table, but if they try to insert, update, or delete rows in + the table they will block until the index build is finished. + + + + PostgreSQL also supports building indexes without locking + out data modifications. This method is invoked by specifying the + CONCURRENTLY option in CREATE INDEX, for example + + CREATE INDEX CONCURRENTLY sales_index ON sales_table (id); + + + + + When the CONCURRENTLY option is used, + PostgreSQL must perform two scans of the table, + and in between these scans it must wait for all existing transactions + to terminate. Thus this method takes much longer to complete than the + regular index build process. However, since it allows normal operations + to continue while the index is built, this method is useful for adding + new indexes in a production environment. + + + + If a problem arises during the second scan of the table, such as a + uniqueness violation in a unique index, the CREATE INDEX + command will fail but leave behind an invalid index. This index + will be ignored for querying purposes because it may be incomplete; + however it will still consume update overhead. The recommended recovery + method in such cases is to drop the index and try again to perform + CREATE INDEX CONCURRENTLY. (Another possibility is to rebuild + the index with REINDEX. However, since REINDEX + does not support concurrent builds, this option is unlikely to seem + attractive.) + + + + Another caveat when building a unique index concurrently is that the + uniqueness constraint is already being enforced against other transactions + when the second table scan begins. This means that constraint violations + could be reported in other queries prior to the index becoming available + for use, or even in cases where the index build eventually fails. Also, + if a failure does occur in the second scan, the invalid index + continues to enforce its uniqueness constraint with respect to the entries + that it has. 
+ + + + Concurrent builds of expression indexes and partial indexes are supported. + Errors occurring in the evaluation of these expressions could cause + behavior similar to that described above for unique constraint violations. + + + + Both regular and concurrent index builds permit other index builds on the + same table to occur in parallel, but no other types of schema modification + on the table are possible. + + + + Examining Index Usage Index: doc/src/sgml/mvcc.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/mvcc.sgml,v retrieving revision 2.56 diff -c -r2.56 mvcc.sgml *** doc/src/sgml/mvcc.sgml 23 Apr 2006 03:39:52 -0000 2.56 --- doc/src/sgml/mvcc.sgml 21 Aug 2006 23:33:20 -0000 *************** *** 622,628 **** ! Acquired by VACUUM (without ). --- 622,629 ---- ! Acquired by VACUUM (without ) ! and by CREATE INDEX CONCURRENTLY. *************** *** 641,647 **** ! Acquired by CREATE INDEX. --- 642,649 ---- ! Acquired by CREATE INDEX ! (without ). Index: doc/src/sgml/ref/create_index.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/create_index.sgml,v retrieving revision 1.55 diff -c -r1.55 create_index.sgml *** doc/src/sgml/ref/create_index.sgml 11 Jul 2006 21:05:57 -0000 1.55 --- doc/src/sgml/ref/create_index.sgml 21 Aug 2006 23:33:20 -0000 *************** *** 20,26 **** ! CREATE [ UNIQUE ] INDEX name ON table [ USING method ] ( { column | ( expression ) } [ opclass ] [, ...] ) [ WITH ( storage_parameter = value [, ... ] ) ] [ TABLESPACE tablespace ] --- 20,26 ---- ! CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] name ON table [ USING method ] ( { column | ( expression ) } [ opclass ] [, ...] ) [ WITH ( storage_parameter = value [, ... ] ) ] [ TABLESPACE tablespace ] *************** *** 111,116 **** --- 111,135 ---- + CONCURRENTLY + + + When this option is used, PostgreSQL will build the + index without taking any locks that prevent concurrent inserts, + updates, or deletes on the table; whereas a standard index build + locks out writes (but not reads) on the table until it's done. + This option allows a new index to be created in a production system + without interfering with regular operation, though the added I/O load + will still of course have some impact. This method requires more + total work than a standard index build and takes significantly longer + to complete. There are other caveats for concurrent index builds, + especially if an error such as a unique constraint violation occurs. + See for more information. + + + + + name Index: doc/src/sgml/ref/reindex.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/reindex.sgml,v retrieving revision 1.31 diff -c -r1.31 reindex.sgml *** doc/src/sgml/ref/reindex.sgml 4 Jul 2006 18:07:24 -0000 1.31 --- doc/src/sgml/ref/reindex.sgml 21 Aug 2006 23:33:20 -0000 *************** *** 30,36 **** REINDEX rebuilds an index using the data stored in the index's table, replacing the old copy of the index. There are ! three main reasons to use REINDEX: --- 30,36 ---- REINDEX rebuilds an index using the data stored in the index's table, replacing the old copy of the index. There are ! several scenarios in which to use REINDEX: *************** *** 61,66 **** --- 61,78 ---- for an index, and wish to ensure that the change has taken full effect. + + + + An index build with the CONCURRENTLY option failed, leaving + an invalid index. 
Such indexes are useless but it can be + convenient to use REINDEX to rebuild them. Note that + REINDEX will not perform a concurrent build. To build the + index without interfering with production you should drop the index and + reissue the CREATE INDEX CONCURRENTLY command. + + + Index: src/backend/bootstrap/bootparse.y =================================================================== RCS file: /cvsroot/pgsql/src/backend/bootstrap/bootparse.y,v retrieving revision 1.83 diff -c -r1.83 bootparse.y *** src/backend/bootstrap/bootparse.y 31 Jul 2006 01:16:36 -0000 1.83 --- src/backend/bootstrap/bootparse.y 21 Aug 2006 23:33:20 -0000 *************** *** 257,263 **** $10, NULL, NIL, NIL, false, false, false, ! false, false, true, false); do_end(); } ; --- 257,263 ---- $10, NULL, NIL, NIL, false, false, false, ! false, false, true, false, false); do_end(); } ; *************** *** 275,281 **** $11, NULL, NIL, NIL, true, false, false, ! false, false, true, false); do_end(); } ; --- 275,281 ---- $11, NULL, NIL, NIL, true, false, false, ! false, false, true, false, false); do_end(); } ; Index: src/backend/catalog/index.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/catalog/index.c,v retrieving revision 1.272 diff -c -r1.272 index.c *** src/backend/catalog/index.c 31 Jul 2006 20:09:00 -0000 1.272 --- src/backend/catalog/index.c 21 Aug 2006 23:33:20 -0000 *************** *** 34,45 **** --- 34,47 ---- #include "catalog/index.h" #include "catalog/indexing.h" #include "catalog/pg_constraint.h" + #include "catalog/pg_operator.h" #include "catalog/pg_opclass.h" #include "catalog/pg_type.h" #include "executor/executor.h" #include "miscadmin.h" #include "optimizer/clauses.h" #include "parser/parse_expr.h" + #include "storage/proc.h" #include "storage/procarray.h" #include "storage/smgr.h" #include "utils/builtins.h" *************** *** 49,54 **** --- 51,57 ---- #include "utils/memutils.h" #include "utils/relcache.h" #include "utils/syscache.h" + #include "utils/tuplesort.h" /* non-export function prototypes */ *************** *** 61,67 **** static void UpdateIndexRelation(Oid indexoid, Oid heapoid, IndexInfo *indexInfo, Oid *classOids, ! bool primary); static void index_update_stats(Relation rel, bool hasindex, bool isprimary, Oid reltoastidxid, double reltuples); static Oid IndexGetRelation(Oid indexId); --- 64,71 ---- static void UpdateIndexRelation(Oid indexoid, Oid heapoid, IndexInfo *indexInfo, Oid *classOids, ! bool primary, ! bool isvalid); static void index_update_stats(Relation rel, bool hasindex, bool isprimary, Oid reltoastidxid, double reltuples); static Oid IndexGetRelation(Oid indexId); *************** *** 315,321 **** Oid heapoid, IndexInfo *indexInfo, Oid *classOids, ! bool primary) { int2vector *indkey; oidvector *indclass; --- 319,326 ---- Oid heapoid, IndexInfo *indexInfo, Oid *classOids, ! bool primary, ! 
bool isvalid) { int2vector *indkey; oidvector *indclass; *************** *** 383,388 **** --- 388,394 ---- values[Anum_pg_index_indisunique - 1] = BoolGetDatum(indexInfo->ii_Unique); values[Anum_pg_index_indisprimary - 1] = BoolGetDatum(primary); values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(false); + values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid); values[Anum_pg_index_indkey - 1] = PointerGetDatum(indkey); values[Anum_pg_index_indclass - 1] = PointerGetDatum(indclass); values[Anum_pg_index_indexprs - 1] = exprsDatum; *************** *** 427,433 **** * isconstraint: index is owned by a PRIMARY KEY or UNIQUE constraint * allow_system_table_mods: allow table to be a system catalog * skip_build: true to skip the index_build() step for the moment; caller ! * must do it later (typically via reindex_index()) * * Returns OID of the created index. */ --- 433,440 ---- * isconstraint: index is owned by a PRIMARY KEY or UNIQUE constraint * allow_system_table_mods: allow table to be a system catalog * skip_build: true to skip the index_build() step for the moment; caller ! * must do it later (typically via reindex_index()) ! * concurrent: Avoid taking locks blocking writers to the table while building. * * Returns OID of the created index. */ *************** *** 443,449 **** bool isprimary, bool isconstraint, bool allow_system_table_mods, ! bool skip_build) { Relation pg_class; Relation heapRelation; --- 450,457 ---- bool isprimary, bool isconstraint, bool allow_system_table_mods, ! bool skip_build, ! bool concurrent) { Relation pg_class; Relation heapRelation; *************** *** 456,464 **** pg_class = heap_open(RelationRelationId, RowExclusiveLock); /* ! * Only SELECT ... FOR UPDATE/SHARE are allowed while doing this ! */ ! heapRelation = heap_open(heapRelationId, ShareLock); /* * The index will be in the same namespace as its parent table, and is --- 464,474 ---- pg_class = heap_open(RelationRelationId, RowExclusiveLock); /* ! * Only SELECT ... FOR UPDATE/SHARE are allowed while doing this unless ! * we're doing an concurrent index build in which case we block vacuum but not ! * UPDATE/INSERT/DELETE ! */ ! heapRelation = heap_open(heapRelationId, (concurrent ? ShareUpdateExclusiveLock : ShareLock)); /* * The index will be in the same namespace as its parent table, and is *************** *** 578,584 **** * ---------------- */ UpdateIndexRelation(indexRelationId, heapRelationId, indexInfo, ! classObjectId, isprimary); /* * Register constraint and dependencies for the index. --- 588,594 ---- * ---------------- */ UpdateIndexRelation(indexRelationId, heapRelationId, indexInfo, ! classObjectId, isprimary, !concurrent); /* * Register constraint and dependencies for the index. *************** *** 1394,1431 **** break; case HEAPTUPLE_INSERT_IN_PROGRESS: ! /* ! * Since caller should hold ShareLock or better, we should ! * not see any tuples inserted by open transactions --- ! * unless it's our own transaction. (Consider INSERT ! * followed by CREATE INDEX within a transaction.) An ! * exception occurs when reindexing a system catalog, ! * because we often release lock on system catalogs before ! * committing. */ - if (!TransactionIdIsCurrentTransactionId( - HeapTupleHeaderGetXmin(heapTuple->t_data)) - && !IsSystemRelation(heapRelation)) - elog(ERROR, "concurrent insert in progress"); indexIt = true; tupleIsAlive = true; break; case HEAPTUPLE_DELETE_IN_PROGRESS: ! ! /* ! * Since caller should hold ShareLock or better, we should ! 
* not see any tuples deleted by open transactions --- ! * unless it's our own transaction. (Consider DELETE ! * followed by CREATE INDEX within a transaction.) An ! * exception occurs when reindexing a system catalog, ! * because we often release lock on system catalogs before ! * committing. ! */ ! Assert(!(heapTuple->t_data->t_infomask & HEAP_XMAX_IS_MULTI)); ! if (!TransactionIdIsCurrentTransactionId( ! HeapTupleHeaderGetXmax(heapTuple->t_data)) ! && !IsSystemRelation(heapRelation)) ! elog(ERROR, "concurrent delete in progress"); indexIt = true; tupleIsAlive = false; break; --- 1404,1421 ---- break; case HEAPTUPLE_INSERT_IN_PROGRESS: ! /* We used to insist the caller held ShareLock or better ! * but we no longer do. Concurrent index builds may well get ! * here holding a weaker lock allowing concurrent updates. ! * In that case we could index or not index the tuple but ! * we always prefer to index it to aim for a better quality ! * index structure. */ indexIt = true; tupleIsAlive = true; break; case HEAPTUPLE_DELETE_IN_PROGRESS: ! /* See INSERT_IN_PROGRESS case above */ indexIt = true; tupleIsAlive = false; break; *************** *** 1605,1610 **** --- 1595,1924 ---- heap_close(heapRelation, NoLock); } + + + /* State and callback function for concurrent index builds. + * + * The strategy for concurrent index builds is to build the index regularly + * recognizing that some tuples may be missing from the index because they were + * inserted by a concurrent transaction after the index build had passed the + * page they were inserted into. At this point the index is marked as not being + * usable for query plans. + * + * Then validate_index is called on the index to insert any missing index + * tuples. It gathers up all the existing TIDs from the index using bulkdelete + * (hence validate_index_callback_from_indexscan which merely stores the TIDs + * in a tuple store and returns false). Sorts them, and does a kind of "merge" + * against the heap to find any missing tuples. We can't reuse + * IndexBuildHeapScan for this purpose since we need to exclude + * INSERT_IN_PROGRESS tuples and in any case we want to avoid executing + * expressions or predicates for records already in the index. + * + * This is a brute force strategy. We could try to be clever storing new tuples + * in a special area of the table (such making the table append-only by setting + * use_fsm) or by using clever visibility checks and probing the index instead + * of scanning the whole thing. However most of these strategies would be + * impossible or at least very hard to support for arbitrary index methods. + * They would also be more fragile and harder to protect against duplicate or + * missing index entries. This strategy is nicely robust. + */ + + struct v_i_state + { + Tuplesortstate *tuplesort; + ItemPointer cursor; + bool finished; + Relation heapRel; + int itups, htups, tups_inserted; + }; + + /* This is called from bulkdelete to gather the tids in a tuplesort */ + static bool + validate_index_callback_from_indexscan(ItemPointer itemptr, void *opaque) + { + struct v_i_state *state = opaque; + tuplesort_putdatum(state->tuplesort, PointerGetDatum(itemptr), false); + state->itups++; + return false; /* of course we're not actually deleting anything */ + } + + /* validate_index_heapscan - This is mostly copied from IndexBuildHeapScan but + * the expression executions and predicate tests are postponed until after the + * tid check. 
Also we need to be able to distinguish INSERT_IN_PROGRESS from + * other cases + * + * The only tricky bit is that caller (ie validate_index) *MUST* ensure that at + * the time this scan begins no in progress transactions are unaware of the new + * index. We must take care here not duplicate inserts so we ignore + * INSERT_IN_PROGRESS tuples since we can safely assume they will be indexed by + * the inserted. + */ + + static void + validate_index_heapscan(Relation heapRelation, + Relation indexRelation, + IndexInfo *indexInfo, + struct v_i_state *state) + { + HeapScanDesc scan; + HeapTuple heapTuple; + Datum values[INDEX_MAX_KEYS]; + bool isnull[INDEX_MAX_KEYS]; + List *predicate; + TupleTableSlot *slot; + EState *estate; + ExprContext *econtext; + TransactionId OldestXmin; + /* state variables used to merge */ + ItemPointer heapcursor, indexcursor = NULL; + bool tuplesort_empty = false; + + /* + * sanity checks + */ + Assert(OidIsValid(indexRelation->rd_rel->relam)); + + /* + * Need an EState for evaluation of index expressions and partial-index + * predicates. Also a slot to hold the current tuple. + */ + estate = CreateExecutorState(); + econtext = GetPerTupleExprContext(estate); + slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation)); + + /* Arrange for econtext's scan tuple to be the tuple under test */ + econtext->ecxt_scantuple = slot; + + /* Set up execution state for predicate, if any. */ + predicate = (List *) + ExecPrepareExpr((Expr *) indexInfo->ii_Predicate, + estate); + + /* + * Ok, begin our scan of the base relation. We use SnapshotAny because we + * must retrieve all tuples and do our own time qual checks. + */ + OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true); + + scan = heap_beginscan(heapRelation, /* relation */ + SnapshotAny, /* seeself */ + 0, /* number of keys */ + NULL); /* scan key */ + + /* + * Scan all tuples in the base relation. + */ + while ((heapTuple = heap_getnext(scan, ForwardScanDirection)) != NULL) + { + bool indexIt; + CHECK_FOR_INTERRUPTS(); + + /* + * We could possibly get away with not locking the buffer here, + * since caller should hold ShareLock on the relation, but let's + * be conservative about it. + * + * XXX Is that still true for validate_index? + */ + LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE); + switch (HeapTupleSatisfiesVacuum(heapTuple->t_data, OldestXmin, + scan->rs_cbuf)) + { + /* If the tuple is inserted by a transaction that could still be running + * it's important that we ignore the heap tuple. The transaction will + * definitely index it itself (that's why we waited before starting this + * process for all transactions to see the index) and we may have caught it + * between inserting the heap tuple and the index entry + */ + case HEAPTUPLE_DEAD: + case HEAPTUPLE_INSERT_IN_PROGRESS: + case HEAPTUPLE_DELETE_IN_PROGRESS: + /* connot continue here because we must release the buffer lock */ + indexIt = false; + break; + + /* Normal case, we have to make sure the tuple is in the index */ + case HEAPTUPLE_LIVE: + case HEAPTUPLE_RECENTLY_DEAD: + indexIt = true; + break; + + default: + indexIt = false; /* shut the compiler up */ + elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result"); + } + LockBuffer(scan->rs_cbuf, BUFFER_LOCK_UNLOCK); + + if (!indexIt) + continue; + + state->htups++; + + /* "merge" by skipping through the index tuples until we get to (or past) + * the current heap tuple. 
+ */ + + heapcursor = &heapTuple->t_self; + + while (!tuplesort_empty && (!indexcursor || ItemPointerCompare(indexcursor, heapcursor) < 0)) { + bool ts_isnull; + if (indexcursor) + pfree(indexcursor); + tuplesort_empty = !tuplesort_getdatum(state->tuplesort, true, (Datum*)&indexcursor, &ts_isnull); + Assert(tuplesort_empty || !ts_isnull); + } + + MemoryContextReset(econtext->ecxt_per_tuple_memory); + + if (tuplesort_empty || ItemPointerCompare(indexcursor, heapcursor) > 0) { + /* we've overshot, which means this heap tuple is missing from the index + * + * We've still postponed predicate checks until here, and expression + * calculations */ + + ExecStoreTuple(heapTuple, slot, InvalidBuffer, false); + if (predicate != NIL && !ExecQual(predicate, econtext, false)) + continue; + + /* + * For the current heap tuple, extract all the attributes we use in + * this index, and note which are null. This also performs evaluation + * of any expressions needed. + */ + FormIndexDatum(indexInfo, + slot, + estate, + values, + isnull); + + /* If the unique constraint fails then we throw an error. Currently + * this means the DBA has to manually drop the invalid index but we + * do have a solution for this to be implemented in the future */ + index_insert(indexRelation, + values, + isnull, + heapcursor, + heapRelation, + indexInfo->ii_Unique); + state->tups_inserted++; + } + } + + heap_endscan(scan); + + ExecDropSingleTupleTableSlot(slot); + + FreeExecutorState(estate); + + /* These may have been pointing to the now-gone estate */ + indexInfo->ii_ExpressionsState = NIL; + indexInfo->ii_PredicateState = NIL; + } + + void + validate_index(Oid IndexId) + { + Oid heapId; + Relation heapRelation, indexRelation, pg_index; + IndexInfo *indexInfo; + HeapTuple tuple; + Form_pg_index index; + struct v_i_state state; + IndexVacuumInfo ivinfo; + IndexBulkDeleteResult stats; + + pg_index = heap_open(IndexRelationId, RowExclusiveLock); + + tuple = SearchSysCacheCopy(INDEXRELID, + ObjectIdGetDatum(IndexId), + 0, 0, 0); + if (!HeapTupleIsValid(tuple)) + elog(ERROR, "cache lookup failed for index %u", IndexId); + index = (Form_pg_index) GETSTRUCT(tuple); + Assert(index->indexrelid == IndexId); + + /* Open and lock the parent heap relation. */ + heapId = IndexGetRelation(IndexId); + heapRelation = heap_open(heapId, ShareUpdateExclusiveLock); /* Allow concurrent DML but not VACUUM so we can do bulkdelete */ + /* And the target index relation */ + indexRelation = index_open(IndexId, RowExclusiveLock); + LockRelation(indexRelation, AccessShareLock); + + /* Check this after we open the parent heap so we don't get two + * validate_indexes running concurrently.
It wouldn't cause any damage, just + * be pointless */ + if (index->indisvalid) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("index does not need validation"))); + + + /* Fetch info needed for index_build */ + indexInfo = BuildIndexInfo(indexRelation); + + ivinfo.index = indexRelation; + ivinfo.vacuum_full = false; + ivinfo.message_level = DEBUG2; + ivinfo.num_heap_tuples = -1; + + /* We have to sort the tids to "merge" them with the heap. */ + state.tuplesort = tuplesort_begin_datum(TIDOID, + TIDLessOperator, + maintenance_work_mem, + false); + state.itups = 0; + /* Now gather up all the tids in the index */ + index_bulk_delete(&ivinfo, &stats, validate_index_callback_from_indexscan, &state); + + /* And we're off */ + tuplesort_performsort(state.tuplesort); + + /* Now scan the heap and "merge" it with the index + * + * We don't do a real symmetric "merge", instead we read in all the heap + * pages and pull out index tuples until we find the matching tid. If we + * don't find one before we get past the tid we're looking for then it's + * missing. + * + * We reuse the same state object -- why not + */ + + state.htups = state.tups_inserted = 0; + + validate_index_heapscan(heapRelation, + indexRelation, + indexInfo, + &state); + + tuplesort_end(state.tuplesort); + + ereport(DEBUG1, + (errmsg("Found %d heap tuples, %d index tuples. Inserted %d missing tuples", + state.htups, state.itups, state.tups_inserted))); + + /* Index is now valid -- update pg_index and catalog index */ + index->indisvalid = true; + simple_heap_update(pg_index, &tuple->t_self, tuple); + CatalogUpdateIndexes(pg_index, tuple); + + /* Make the new index usable by queries in our transaction. This ought not + * be necessary if we're wrapped up in a single command */ + CommandCounterIncrement(); + + /* I don't think this is necessary, the index is always listed in the + * relcache anyway and is just ignored by plancat which is reconstructed + * on each command + */ + CacheInvalidateRelcache(heapRelation); + + /* Close rels, but keep locks */ + index_close(indexRelation, NoLock); + heap_close(heapRelation, NoLock); + heap_close(pg_index, NoLock); + } + + + /* * reindex_relation - This routine is used to recreate all indexes * of a relation (and optionally its toast relation too, if any). Index: src/backend/catalog/toasting.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/catalog/toasting.c,v retrieving revision 1.1 diff -c -r1.1 toasting.c *** src/backend/catalog/toasting.c 31 Jul 2006 01:16:37 -0000 1.1 --- src/backend/catalog/toasting.c 21 Aug 2006 23:33:20 -0000 *************** *** 227,233 **** BTREE_AM_OID, rel->rd_rel->reltablespace, classObjectId, (Datum) 0, ! true, false, true, false); /* * Store the toast table's OID in the parent relation's pg_class row --- 227,233 ---- BTREE_AM_OID, rel->rd_rel->reltablespace, classObjectId, (Datum) 0, !
true, false, true, false, false); /* * Store the toast table's OID in the parent relation's pg_class row Index: src/backend/commands/indexcmds.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/indexcmds.c,v retrieving revision 1.146 diff -c -r1.146 indexcmds.c *** src/backend/commands/indexcmds.c 31 Jul 2006 01:16:37 -0000 1.146 --- src/backend/commands/indexcmds.c 21 Aug 2006 23:33:20 -0000 *************** *** 18,23 **** --- 18,24 ---- #include "access/genam.h" #include "access/heapam.h" #include "access/reloptions.h" + #include "access/transam.h" #include "access/xact.h" #include "catalog/catalog.h" #include "catalog/dependency.h" *************** *** 85,90 **** --- 86,92 ---- * 'skip_build': make the catalog entries but leave the index file empty; * it will be filled later. * 'quiet': suppress the NOTICE chatter ordinarily provided for constraints. + * 'concurrent': avoid blocking writers to the table while building. */ void DefineIndex(RangeVar *heapRelation, *************** *** 102,108 **** bool is_alter_table, bool check_rights, bool skip_build, ! bool quiet) { Oid *classObjectId; Oid accessMethodId; --- 104,111 ---- bool is_alter_table, bool check_rights, bool skip_build, ! bool quiet, ! bool concurrent) { Oid *classObjectId; Oid accessMethodId; *************** *** 116,121 **** --- 119,125 ---- Datum reloptions; IndexInfo *indexInfo; int numberOfAttributes; + HeapTuple indexTuple; /* * count attributes in index *************** *** 133,140 **** /* * Open heap relation, acquire a suitable lock on it, remember its OID */ ! rel = heap_openrv(heapRelation, ShareLock); /* Note: during bootstrap may see uncataloged relation */ if (rel->rd_rel->relkind != RELKIND_RELATION && --- 137,149 ---- /* * Open heap relation, acquire a suitable lock on it, remember its OID + * + * Only SELECT ... FOR UPDATE/SHARE are allowed while doing a standard + * index build; but for concurrent builds we allow INSERT/UPDATE/DELETE + * (but not VACUUM). */ ! rel = heap_openrv(heapRelation, ! (concurrent ? ShareUpdateExclusiveLock : ShareLock)); /* Note: during bootstrap may see uncataloged relation */ if (rel->rd_rel->relkind != RELKIND_RELATION && *************** *** 410,419 **** primary ? "PRIMARY KEY" : "UNIQUE", indexRelationName, RelationGetRelationName(rel)))); ! index_create(relationId, indexRelationName, indexRelationId, ! indexInfo, accessMethodId, tablespaceId, classObjectId, ! reloptions, primary, isconstraint, ! allowSystemTableMods, skip_build); } --- 419,544 ---- primary ? "PRIMARY KEY" : "UNIQUE", indexRelationName, RelationGetRelationName(rel)))); ! indexRelationId = ! index_create(relationId, indexRelationName, indexRelationId, ! indexInfo, accessMethodId, tablespaceId, classObjectId, ! reloptions, primary, isconstraint, ! allowSystemTableMods, skip_build, concurrent); ! ! if (!concurrent) ! return; /* We're done, in the standard case */ ! ! /* Phase 2 of concurrent index build */ ! ! CommitTransactionCommand(); ! StartTransactionCommand(); ! ! #if 0 ! /* If we fail at this point do some fancy footwork to drop the table */ ! PG_TRY(); ! #endif ! { ! unsigned sleeps; ! ! /* we need to do this so we can get the latest snapshots below without ! * triggering an error */ ! GetTransactionSnapshot(); ! ! /* We're ready to go ahead when no running transaction could have the table ! * open with the old list of indexes. If we can take an exclusive lock then ! * there are none and anybody who opens it now will get the new index in ! 
* their relcache entry. Alternatively if our Xmin reaches our own ! * transaction then we know no running transactions that started before us ! * are left anyways. ! */ ! CHECK_FOR_INTERRUPTS(); ! for (sleeps=0; /* until break */ ; sleeps++ ) { ! Relation rel = RelationIdGetRelation(relationId); ! if (!RelationIsValid(rel)) ! ereport(ERROR, ! (errcode(ERRCODE_INTERNAL_ERROR), /* XXX */ ! errmsg("concurrent index build failed due to concurrent DROP TABLE"))); ! ! if (ConditionalLockRelation(rel, ExclusiveLock)) { ! UnlockRelation(rel,ExclusiveLock); ! RelationClose(rel); ! break; ! } ! RelationClose(rel); ! ! if (TransactionIdEquals(GetLatestSnapshot()->xmin, GetTopTransactionId())) ! break; ! ! if (!sleeps) ! ereport(DEBUG1, ! (errmsg("waiting for new index to be visible to other transactions"))); ! ! ! /* .5s */ ! pg_usleep(500000L); ! CHECK_FOR_INTERRUPTS(); ! } ! ! /* We have to protect against someone jumping in the window between ! * transactions and changing our index or table structure. The only piece ! * of state we care about here is the OID of the index however. Make sure ! * it hasn't been dropped (or worse, though extremely unlikley, replaced ! * with some other index entirely). Make sure it's still the index we want ! * with some belt-and-suspenders checks that probably can't ever happen. ! * Most other DDL either can work just fine on invalid indexes and ! * validate_index will work fine or else it's their responsibility to ! * detect and fail if they're run on an invalid index. ! */ ! ! indexTuple = SearchSysCache(INDEXRELID, ! ObjectIdGetDatum(indexRelationId), ! 0, 0, 0); ! if (!HeapTupleIsValid(indexTuple)) ! ereport(ERROR, ! (errcode(ERRCODE_INTERNAL_ERROR), /* XXX */ ! errmsg("concurrent index build failed due to concurrent DROP INDEX"))); ! { ! Form_pg_index result = (Form_pg_index) GETSTRUCT(indexTuple); ! if (result->indrelid != relationId || result->indisvalid) ! ereport(ERROR, ! (errcode(ERRCODE_INTERNAL_ERROR), /* XXX */ ! errmsg("concurrent index build failed due to concurrent DDL"))); ! } ! ReleaseSysCache(indexTuple); ! ! /* functions in indexes may want a snapshot set */ ! ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); ! ! validate_index(indexRelationId); ! ! CommitTransactionCommand(); ! } ! ! #if 0 ! /* Some tedious fancy footwork to drop the index if a unique constraint or ! * other problem arises */ ! PG_CATCH(); ! { ! 1 Copy error context (in portal memory context?) ! 2 abort the transaction ! 3 start new transaction ! ! ObjectAddress object; ! object.classId = RelationRelationId; ! object.objectId = indexRelationId; ! object.objectSubId = 0; ! performDeletion(&object, behavior); ! ! 5 commit ! 6 start new transaction ! 7 restore error context ! 8 throw said error ! ! PG_RE_THROW(); ! } ! #endif ! ! StartTransactionCommand(); } *************** *** 941,947 **** { Oid indOid; HeapTuple tuple; ! indOid = RangeVarGetRelid(indexRelation, false); tuple = SearchSysCache(RELOID, ObjectIdGetDatum(indOid), --- 1066,1075 ---- { Oid indOid; HeapTuple tuple; ! Relation pg_index; ! HeapTuple pg_index_tuple; ! Form_pg_index index; ! indOid = RangeVarGetRelid(indexRelation, false); tuple = SearchSysCache(RELOID, ObjectIdGetDatum(indOid), *************** *** 954,965 **** (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("\"%s\" is not an index", indexRelation->relname))); ! 
/* Check permissions */ if (!pg_class_ownercheck(indOid, GetUserId())) aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS, indexRelation->relname); ReleaseSysCache(tuple); reindex_index(indOid); --- 1082,1113 ---- (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("\"%s\" is not an index", indexRelation->relname))); ! /* Check permissions */ if (!pg_class_ownercheck(indOid, GetUserId())) aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS, indexRelation->relname); + + /* Check if this is an invalid index left over from a concurrent create. This + * gives DBAs two outs if their CREATE INDEX CONCURRENTLY fails. They can drop the index or + * they can reindex it. */ + pg_index = heap_open(IndexRelationId, RowExclusiveLock); + pg_index_tuple = SearchSysCacheCopy(INDEXRELID, + ObjectIdGetDatum(indOid), + 0, 0, 0); + if (!HeapTupleIsValid(pg_index_tuple)) + elog(ERROR, "cache lookup failed for index %u", indOid); + + index = (Form_pg_index)GETSTRUCT(pg_index_tuple); + + if (!index->indisvalid) { + index->indisvalid = true; + simple_heap_update(pg_index, &pg_index_tuple->t_self, pg_index_tuple); + CatalogUpdateIndexes(pg_index, pg_index_tuple); + } + heap_close(pg_index, NoLock); + ReleaseSysCache(tuple); reindex_index(indOid); Index: src/backend/commands/tablecmds.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/commands/tablecmds.c,v retrieving revision 1.200 diff -c -r1.200 tablecmds.c *** src/backend/commands/tablecmds.c 21 Aug 2006 00:57:24 -0000 1.200 --- src/backend/commands/tablecmds.c 21 Aug 2006 23:33:20 -0000 *************** *** 3832,3838 **** true, /* is_alter_table */ check_rights, skip_build, ! quiet); } /* --- 3832,3839 ---- true, /* is_alter_table */ check_rights, skip_build, ! quiet, ! false); } /* Index: src/backend/nodes/copyfuncs.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/nodes/copyfuncs.c,v retrieving revision 1.348 diff -c -r1.348 copyfuncs.c *** src/backend/nodes/copyfuncs.c 21 Aug 2006 00:57:24 -0000 1.348 --- src/backend/nodes/copyfuncs.c 21 Aug 2006 23:33:20 -0000 *************** *** 2049,2054 **** --- 2049,2055 ---- COPY_SCALAR_FIELD(unique); COPY_SCALAR_FIELD(primary); COPY_SCALAR_FIELD(isconstraint); + COPY_SCALAR_FIELD(concurrent); return newnode; } Index: src/backend/nodes/equalfuncs.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/nodes/equalfuncs.c,v retrieving revision 1.282 diff -c -r1.282 equalfuncs.c *** src/backend/nodes/equalfuncs.c 21 Aug 2006 00:57:24 -0000 1.282 --- src/backend/nodes/equalfuncs.c 21 Aug 2006 23:33:20 -0000 *************** *** 962,967 **** --- 962,968 ---- COMPARE_SCALAR_FIELD(unique); COMPARE_SCALAR_FIELD(primary); COMPARE_SCALAR_FIELD(isconstraint); + COMPARE_SCALAR_FIELD(concurrent); return true; } Index: src/backend/nodes/outfuncs.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/nodes/outfuncs.c,v retrieving revision 1.283 diff -c -r1.283 outfuncs.c *** src/backend/nodes/outfuncs.c 21 Aug 2006 00:57:24 -0000 1.283 --- src/backend/nodes/outfuncs.c 21 Aug 2006 23:33:20 -0000 *************** *** 1353,1358 **** --- 1353,1359 ---- WRITE_BOOL_FIELD(unique); WRITE_BOOL_FIELD(primary); WRITE_BOOL_FIELD(isconstraint); + WRITE_BOOL_FIELD(concurrent); } static void Index: src/backend/optimizer/util/plancat.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/optimizer/util/plancat.c,v
retrieving revision 1.124 diff -c -r1.124 plancat.c *** src/backend/optimizer/util/plancat.c 5 Aug 2006 00:22:49 -0000 1.124 --- src/backend/optimizer/util/plancat.c 21 Aug 2006 23:33:20 -0000 *************** *** 138,143 **** --- 138,155 ---- indexRelation = index_open(indexoid, lmode); index = indexRelation->rd_index; + /* + * Ignore invalid indexes, since they can't safely be used for + * queries. Note that this is OK because the data structure + * we are constructing is only used by the planner --- the + * executor still needs to insert into "invalid" indexes! + */ + if (!index->indisvalid) + { + index_close(indexRelation, NoLock); + continue; + } + info = makeNode(IndexOptInfo); info->indexoid = index->indexrelid; Index: src/backend/parser/analyze.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/parser/analyze.c,v retrieving revision 1.347 diff -c -r1.347 analyze.c *** src/backend/parser/analyze.c 21 Aug 2006 00:57:24 -0000 1.347 --- src/backend/parser/analyze.c 21 Aug 2006 23:33:21 -0000 *************** *** 1488,1493 **** --- 1488,1494 ---- index->tableSpace = constraint->indexspace; index->indexParams = NIL; index->whereClause = NULL; + index->concurrent = false; /* * Make sure referenced keys exist. If we are making a PRIMARY KEY Index: src/backend/parser/gram.y =================================================================== RCS file: /cvsroot/pgsql/src/backend/parser/gram.y,v retrieving revision 2.557 diff -c -r2.557 gram.y *** src/backend/parser/gram.y 21 Aug 2006 00:57:25 -0000 2.557 --- src/backend/parser/gram.y 21 Aug 2006 23:33:21 -0000 *************** *** 364,370 **** CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT ! COMMITTED CONNECTION CONSTRAINT CONSTRAINTS CONVERSION_P CONVERT COPY CREATE CREATEDB CREATEROLE CREATEUSER CROSS CSV CURRENT_DATE CURRENT_ROLE CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE --- 364,371 ---- CACHE CALLED CASCADE CASCADED CASE CAST CHAIN CHAR_P CHARACTER CHARACTERISTICS CHECK CHECKPOINT CLASS CLOSE CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT ! COMMITTED CONCURRENTLY CONNECTION CONSTRAINT CONSTRAINTS ! CONVERSION_P CONVERT COPY CREATE CREATEDB CREATEROLE CREATEUSER CROSS CSV CURRENT_DATE CURRENT_ROLE CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE *************** *** 3638,3657 **** /***************************************************************************** * ! * QUERY: ! * create index on ! * [ using ] "(" ( [ using ] )+ ")" ! * [ tablespace ] [ where ] * * Note: we cannot put TABLESPACE clause after WHERE clause unless we are * willing to make TABLESPACE a fully reserved word. *****************************************************************************/ ! IndexStmt: CREATE index_opt_unique INDEX index_name ON qualified_name ! access_method_clause '(' index_params ')' opt_definition OptTableSpace where_clause { IndexStmt *n = makeNode(IndexStmt); n->unique = $2; n->idxname = $4; n->relation = $6; n->accessMethod = $7; --- 3639,3660 ---- /***************************************************************************** * ! * QUERY: CREATE INDEX ! * ! * Note: we can't factor CONCURRENTLY into a separate production without ! * making it a reserved word. * * Note: we cannot put TABLESPACE clause after WHERE clause unless we are * willing to make TABLESPACE a fully reserved word. *****************************************************************************/ ! 
IndexStmt: CREATE index_opt_unique INDEX index_name ! ON qualified_name access_method_clause '(' index_params ')' ! opt_definition OptTableSpace where_clause { IndexStmt *n = makeNode(IndexStmt); n->unique = $2; + n->concurrent = false; n->idxname = $4; n->relation = $6; n->accessMethod = $7; *************** *** 3661,3666 **** --- 3664,3685 ---- n->whereClause = $13; $$ = (Node *)n; } + | CREATE index_opt_unique INDEX CONCURRENTLY index_name + ON qualified_name access_method_clause '(' index_params ')' + opt_definition OptTableSpace where_clause + { + IndexStmt *n = makeNode(IndexStmt); + n->unique = $2; + n->concurrent = true; + n->idxname = $5; + n->relation = $7; + n->accessMethod = $8; + n->indexParams = $10; + n->options = $12; + n->tableSpace = $13; + n->whereClause = $14; + $$ = (Node *)n; + } ; index_opt_unique: *************** *** 8491,8496 **** --- 8510,8516 ---- | COMMENT | COMMIT | COMMITTED + | CONCURRENTLY | CONNECTION | CONSTRAINTS | CONVERSION_P Index: src/backend/parser/keywords.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/parser/keywords.c,v retrieving revision 1.175 diff -c -r1.175 keywords.c *** src/backend/parser/keywords.c 12 Aug 2006 02:52:05 -0000 1.175 --- src/backend/parser/keywords.c 21 Aug 2006 23:33:21 -0000 *************** *** 84,89 **** --- 84,90 ---- {"comment", COMMENT}, {"commit", COMMIT}, {"committed", COMMITTED}, + {"concurrently", CONCURRENTLY}, {"connection", CONNECTION}, {"constraint", CONSTRAINT}, {"constraints", CONSTRAINTS}, Index: src/backend/storage/page/itemptr.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/storage/page/itemptr.c,v retrieving revision 1.17 diff -c -r1.17 itemptr.c *** src/backend/storage/page/itemptr.c 14 Jul 2006 14:52:23 -0000 1.17 --- src/backend/storage/page/itemptr.c 21 Aug 2006 23:33:21 -0000 *************** *** 16,28 **** #include "storage/itemptr.h" /* * ItemPointerEquals * Returns true if both item pointers point to the same item, * otherwise returns false. * * Note: ! * Assumes that the disk item pointers are not NULL. */ bool ItemPointerEquals(ItemPointer pointer1, ItemPointer pointer2) --- 16,29 ---- #include "storage/itemptr.h" + /* * ItemPointerEquals * Returns true if both item pointers point to the same item, * otherwise returns false. * * Note: ! * Asserts that the disk item pointers are both valid! */ bool ItemPointerEquals(ItemPointer pointer1, ItemPointer pointer2) *************** *** 35,37 **** --- 36,65 ---- else return false; } + + /* + * ItemPointerCompare + * Generic btree-style comparison for item pointers. + */ + int32 + ItemPointerCompare(ItemPointer arg1, ItemPointer arg2) + { + /* + * Don't use ItemPointerGetBlockNumber or ItemPointerGetOffsetNumber here, + * because they assert ip_posid != 0 which might not be true for a + * user-supplied TID. 
+ */ + BlockNumber b1 = BlockIdGetBlockNumber(&(arg1->ip_blkid)); + BlockNumber b2 = BlockIdGetBlockNumber(&(arg2->ip_blkid)); + + if (b1 < b2) + return -1; + else if (b1 > b2) + return 1; + else if (arg1->ip_posid < arg2->ip_posid) + return -1; + else if (arg1->ip_posid > arg2->ip_posid) + return 1; + else + return 0; + } Index: src/backend/tcop/utility.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/tcop/utility.c,v retrieving revision 1.266 diff -c -r1.266 utility.c *** src/backend/tcop/utility.c 15 Aug 2006 18:26:58 -0000 1.266 --- src/backend/tcop/utility.c 21 Aug 2006 23:33:21 -0000 *************** *** 781,786 **** --- 781,789 ---- { IndexStmt *stmt = (IndexStmt *) parsetree; + if (stmt->concurrent) + PreventTransactionChain(stmt, "CREATE INDEX CONCURRENTLY"); + CheckRelationOwnership(stmt->relation, true); DefineIndex(stmt->relation, /* relation */ *************** *** 795,804 **** stmt->unique, stmt->primary, stmt->isconstraint, ! false, /* is_alter_table */ ! true, /* check_rights */ ! false, /* skip_build */ ! false); /* quiet */ } break; --- 798,808 ---- stmt->unique, stmt->primary, stmt->isconstraint, ! false, /* is_alter_table */ ! true, /* check_rights */ ! false, /* skip_build */ ! false, /* quiet */ ! stmt->concurrent); /* concurrent */ } break; Index: src/backend/utils/adt/tid.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/tid.c,v retrieving revision 1.54 diff -c -r1.54 tid.c *** src/backend/utils/adt/tid.c 21 Jul 2006 20:51:32 -0000 1.54 --- src/backend/utils/adt/tid.c 21 Aug 2006 23:33:21 -0000 *************** *** 158,193 **** * PUBLIC ROUTINES * *****************************************************************************/ - static int32 - tid_cmp_internal(ItemPointer arg1, ItemPointer arg2) - { - /* - * Don't use ItemPointerGetBlockNumber or ItemPointerGetOffsetNumber here, - * because they assert ip_posid != 0 which might not be true for a - * user-supplied TID. - */ - BlockNumber b1 = BlockIdGetBlockNumber(&(arg1->ip_blkid)); - BlockNumber b2 = BlockIdGetBlockNumber(&(arg2->ip_blkid)); - - if (b1 < b2) - return -1; - else if (b1 > b2) - return 1; - else if (arg1->ip_posid < arg2->ip_posid) - return -1; - else if (arg1->ip_posid > arg2->ip_posid) - return 1; - else - return 0; - } - Datum tideq(PG_FUNCTION_ARGS) { ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(tid_cmp_internal(arg1,arg2) == 0); } Datum --- 158,170 ---- * PUBLIC ROUTINES * *****************************************************************************/ Datum tideq(PG_FUNCTION_ARGS) { ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(ItemPointerCompare(arg1,arg2) == 0); } Datum *************** *** 196,202 **** ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(tid_cmp_internal(arg1,arg2) != 0); } Datum --- 173,179 ---- ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(ItemPointerCompare(arg1,arg2) != 0); } Datum *************** *** 205,211 **** ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(tid_cmp_internal(arg1,arg2) < 0); } Datum --- 182,188 ---- ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! 
PG_RETURN_BOOL(ItemPointerCompare(arg1,arg2) < 0); } Datum *************** *** 214,220 **** ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(tid_cmp_internal(arg1,arg2) <= 0); } Datum --- 191,197 ---- ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(ItemPointerCompare(arg1,arg2) <= 0); } Datum *************** *** 223,229 **** ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(tid_cmp_internal(arg1,arg2) > 0); } Datum --- 200,206 ---- ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(ItemPointerCompare(arg1,arg2) > 0); } Datum *************** *** 232,238 **** ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(tid_cmp_internal(arg1,arg2) >= 0); } Datum --- 209,215 ---- ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_BOOL(ItemPointerCompare(arg1,arg2) >= 0); } Datum *************** *** 241,247 **** ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_INT32(tid_cmp_internal(arg1, arg2)); } Datum --- 218,224 ---- ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_INT32(ItemPointerCompare(arg1, arg2)); } Datum *************** *** 250,256 **** ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_ITEMPOINTER(tid_cmp_internal(arg1,arg2) >= 0 ? arg1 : arg2); } Datum --- 227,233 ---- ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_ITEMPOINTER(ItemPointerCompare(arg1,arg2) >= 0 ? arg1 : arg2); } Datum *************** *** 259,265 **** ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_ITEMPOINTER(tid_cmp_internal(arg1,arg2) <= 0 ? arg1 : arg2); } --- 236,242 ---- ItemPointer arg1 = PG_GETARG_ITEMPOINTER(0); ItemPointer arg2 = PG_GETARG_ITEMPOINTER(1); ! PG_RETURN_ITEMPOINTER(ItemPointerCompare(arg1,arg2) <= 0 ? arg1 : arg2); } Index: src/bin/psql/describe.c =================================================================== RCS file: /cvsroot/pgsql/src/bin/psql/describe.c,v retrieving revision 1.142 diff -c -r1.142 describe.c *** src/bin/psql/describe.c 27 Jul 2006 19:52:06 -0000 1.142 --- src/bin/psql/describe.c 21 Aug 2006 23:33:21 -0000 *************** *** 942,948 **** PGresult *result; printfPQExpBuffer(&buf, ! "SELECT i.indisunique, i.indisprimary, i.indisclustered, a.amname, c2.relname,\n" " pg_catalog.pg_get_expr(i.indpred, i.indrelid, true)\n" "FROM pg_catalog.pg_index i, pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_am a\n" "WHERE i.indexrelid = c.oid AND c.oid = '%s' AND c.relam = a.oid\n" --- 942,948 ---- PGresult *result; printfPQExpBuffer(&buf, ! "SELECT i.indisunique, i.indisprimary, i.indisclustered, i.indisvalid, a.amname, c2.relname,\n" " pg_catalog.pg_get_expr(i.indpred, i.indrelid, true)\n" "FROM pg_catalog.pg_index i, pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_am a\n" "WHERE i.indexrelid = c.oid AND c.oid = '%s' AND c.relam = a.oid\n" *************** *** 962,970 **** char *indisunique = PQgetvalue(result, 0, 0); char *indisprimary = PQgetvalue(result, 0, 1); char *indisclustered = PQgetvalue(result, 0, 2); ! char *indamname = PQgetvalue(result, 0, 3); ! 
char *indtable = PQgetvalue(result, 0, 4); ! char *indpred = PQgetvalue(result, 0, 5); int count_footers = 0; if (strcmp(indisprimary, "t") == 0) --- 962,971 ---- char *indisunique = PQgetvalue(result, 0, 0); char *indisprimary = PQgetvalue(result, 0, 1); char *indisclustered = PQgetvalue(result, 0, 2); ! char *indisvalid = PQgetvalue(result, 0, 3); ! char *indamname = PQgetvalue(result, 0, 4); ! char *indtable = PQgetvalue(result, 0, 5); ! char *indpred = PQgetvalue(result, 0, 6); int count_footers = 0; if (strcmp(indisprimary, "t") == 0) *************** *** 985,990 **** --- 986,994 ---- if (strcmp(indisclustered, "t") == 0) appendPQExpBuffer(&tmpbuf, _(", clustered")); + if (strcmp(indisvalid, "f") == 0) + appendPQExpBuffer(&tmpbuf, _(", invalid")); + footers = pg_malloc_zero(4 * sizeof(*footers)); footers[count_footers++] = pg_strdup(tmpbuf.data); add_tablespace_footer(tableinfo.relkind, tableinfo.tablespace, *************** *** 1067,1073 **** if (tableinfo.hasindex) { printfPQExpBuffer(&buf, ! "SELECT c2.relname, i.indisprimary, i.indisunique, i.indisclustered, " "pg_catalog.pg_get_indexdef(i.indexrelid, 0, true), c2.reltablespace\n" "FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i\n" "WHERE c.oid = '%s' AND c.oid = i.indrelid AND i.indexrelid = c2.oid\n" --- 1071,1077 ---- if (tableinfo.hasindex) { printfPQExpBuffer(&buf, ! "SELECT c2.relname, i.indisprimary, i.indisunique, i.indisclustered, i.indisvalid, " "pg_catalog.pg_get_indexdef(i.indexrelid, 0, true), c2.reltablespace\n" "FROM pg_catalog.pg_class c, pg_catalog.pg_class c2, pg_catalog.pg_index i\n" "WHERE c.oid = '%s' AND c.oid = i.indrelid AND i.indexrelid = c2.oid\n" *************** *** 1201,1207 **** ? " UNIQUE," : "")); /* Everything after "USING" is echoed verbatim */ ! indexdef = PQgetvalue(result1, i, 4); usingpos = strstr(indexdef, " USING "); if (usingpos) indexdef = usingpos + 7; --- 1205,1211 ---- ? " UNIQUE," : "")); /* Everything after "USING" is echoed verbatim */ ! indexdef = PQgetvalue(result1, i, 5); usingpos = strstr(indexdef, " USING "); if (usingpos) indexdef = usingpos + 7; *************** *** 1211,1221 **** if (strcmp(PQgetvalue(result1, i, 3), "t") == 0) appendPQExpBuffer(&buf, " CLUSTER"); /* Print tablespace of the index on the same line */ count_footers += 1; initPQExpBuffer(&tmpbuf); if (add_tablespace_footer('i', ! atooid(PQgetvalue(result1, i, 5)), footers, &count_footers, tmpbuf, false)) { appendPQExpBuffer(&buf, ", "); --- 1215,1228 ---- if (strcmp(PQgetvalue(result1, i, 3), "t") == 0) appendPQExpBuffer(&buf, " CLUSTER"); + if (strcmp(PQgetvalue(result1, i, 4), "f") == 0) + appendPQExpBuffer(&buf, " INVALID"); + /* Print tablespace of the index on the same line */ count_footers += 1; initPQExpBuffer(&tmpbuf); if (add_tablespace_footer('i', ! atooid(PQgetvalue(result1, i, 6)), footers, &count_footers, tmpbuf, false)) { appendPQExpBuffer(&buf, ", "); Index: src/include/catalog/catversion.h =================================================================== RCS file: /cvsroot/pgsql/src/include/catalog/catversion.h,v retrieving revision 1.352 diff -c -r1.352 catversion.h *** src/include/catalog/catversion.h 21 Aug 2006 00:57:26 -0000 1.352 --- src/include/catalog/catversion.h 21 Aug 2006 23:33:21 -0000 *************** *** 53,58 **** */ /* yyyymmddN */ ! #define CATALOG_VERSION_NO 200608191 #endif --- 53,58 ---- */ /* yyyymmddN */ ! 
#define CATALOG_VERSION_NO	200608211
  
  #endif
Index: src/include/catalog/index.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/index.h,v
retrieving revision 1.70
diff -c -r1.70 index.h
*** src/include/catalog/index.h	31 Jul 2006 01:16:37 -0000	1.70
--- src/include/catalog/index.h	21 Aug 2006 23:33:21 -0000
***************
*** 39,45 ****
  			 bool isprimary,
  			 bool isconstraint,
  			 bool allow_system_table_mods,
! 			 bool skip_build);
  
  extern void index_drop(Oid indexId);
  
--- 39,46 ----
  			 bool isprimary,
  			 bool isconstraint,
  			 bool allow_system_table_mods,
! 			 bool skip_build,
! 			 bool concurrent);
  
  extern void index_drop(Oid indexId);
  
***************
*** 67,70 ****
--- 68,73 ----
  extern void reindex_index(Oid indexId);
  extern bool reindex_relation(Oid relid, bool toast_too);
  
+ extern void validate_index(Oid IndexId);
+ 
  #endif   /* INDEX_H */
Index: src/include/catalog/pg_attribute.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_attribute.h,v
retrieving revision 1.123
diff -c -r1.123 pg_attribute.h
*** src/include/catalog/pg_attribute.h	10 Jul 2006 16:20:51 -0000	1.123
--- src/include/catalog/pg_attribute.h	21 Aug 2006 23:33:21 -0000
***************
*** 460,468 ****
  { 0, {"indisunique"},	16, -1, 1, 4, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
  { 0, {"indisprimary"},	16, -1, 1, 5, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
  { 0, {"indisclustered"},	16, -1, 1, 6, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 0, {"indkey"},	22, -1, -1, 7, 1, -1, -1, false, 'p', 'i', true, false, false, true, 0 }, \
! { 0, {"indclass"},	30, -1, -1, 8, 1, -1, -1, false, 'p', 'i', true, false, false, true, 0 }, \
! { 0, {"indexprs"},	25, -1, -1, 9, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \
! { 0, {"indpred"},	25, -1, -1, 10, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0 }
  
  #endif   /* PG_ATTRIBUTE_H */
--- 460,469 ----
  { 0, {"indisunique"},	16, -1, 1, 4, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
  { 0, {"indisprimary"},	16, -1, 1, 5, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
  { 0, {"indisclustered"},	16, -1, 1, 6, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 0, {"indisvalid"},	16, -1, 1, 7, 0, -1, -1, true, 'p', 'c', true, false, false, true, 0 }, \
! { 0, {"indkey"},	22, -1, -1, 8, 1, -1, -1, false, 'p', 'i', true, false, false, true, 0 }, \
! { 0, {"indclass"},	30, -1, -1, 9, 1, -1, -1, false, 'p', 'i', true, false, false, true, 0 }, \
! { 0, {"indexprs"},	25, -1, -1, 10, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0 }, \
! { 0, {"indpred"},	25, -1, -1, 11, 0, -1, -1, false, 'x', 'i', false, false, false, true, 0 }
  
  #endif   /* PG_ATTRIBUTE_H */
Index: src/include/catalog/pg_index.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/catalog/pg_index.h,v
retrieving revision 1.39
diff -c -r1.39 pg_index.h
*** src/include/catalog/pg_index.h	5 Mar 2006 15:58:54 -0000	1.39
--- src/include/catalog/pg_index.h	21 Aug 2006 23:33:21 -0000
***************
*** 41,46 ****
--- 41,47 ----
  	bool		indisunique;	/* is this a unique index? */
  	bool		indisprimary;	/* is this index for primary key? */
  	bool		indisclustered;	/* is this the index last clustered by? */
+ 	bool		indisvalid;		/* is this index valid for use by queries? */
  
  	/* VARIABLE LENGTH FIELDS: */
  	int2vector	indkey;			/* column numbers of indexed cols, or 0 */
***************
*** 63,78 ****
   * compiler constants for pg_index
   * ----------------
   */
! #define Natts_pg_index					10
  #define Anum_pg_index_indexrelid		1
  #define Anum_pg_index_indrelid			2
  #define Anum_pg_index_indnatts			3
  #define Anum_pg_index_indisunique		4
  #define Anum_pg_index_indisprimary		5
  #define Anum_pg_index_indisclustered	6
! #define Anum_pg_index_indkey			7
! #define Anum_pg_index_indclass			8
! #define Anum_pg_index_indexprs			9
! #define Anum_pg_index_indpred			10
  
  #endif   /* PG_INDEX_H */
--- 64,80 ----
   * compiler constants for pg_index
   * ----------------
   */
! #define Natts_pg_index					11
  #define Anum_pg_index_indexrelid		1
  #define Anum_pg_index_indrelid			2
  #define Anum_pg_index_indnatts			3
  #define Anum_pg_index_indisunique		4
  #define Anum_pg_index_indisprimary		5
  #define Anum_pg_index_indisclustered	6
! #define Anum_pg_index_indisvalid		7
! #define Anum_pg_index_indkey			8
! #define Anum_pg_index_indclass			9
! #define Anum_pg_index_indexprs			10
! #define Anum_pg_index_indpred			11
  
  #endif   /* PG_INDEX_H */
Index: src/include/commands/defrem.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/commands/defrem.h,v
retrieving revision 1.75
diff -c -r1.75 defrem.h
*** src/include/commands/defrem.h	18 Jul 2006 17:42:01 -0000	1.75
--- src/include/commands/defrem.h	21 Aug 2006 23:33:21 -0000
***************
*** 33,39 ****
  				bool is_alter_table,
  				bool check_rights,
  				bool skip_build,
! 				bool quiet);
  extern void RemoveIndex(RangeVar *relation, DropBehavior behavior);
  extern void ReindexIndex(RangeVar *indexRelation);
  extern void ReindexTable(RangeVar *relation);
--- 33,40 ----
  				bool is_alter_table,
  				bool check_rights,
  				bool skip_build,
! 				bool quiet,
! 				bool concurrent);
  extern void RemoveIndex(RangeVar *relation, DropBehavior behavior);
  extern void ReindexIndex(RangeVar *indexRelation);
  extern void ReindexTable(RangeVar *relation);
Index: src/include/nodes/parsenodes.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/nodes/parsenodes.h,v
retrieving revision 1.324
diff -c -r1.324 parsenodes.h
*** src/include/nodes/parsenodes.h	21 Aug 2006 00:57:26 -0000	1.324
--- src/include/nodes/parsenodes.h	21 Aug 2006 23:33:21 -0000
***************
*** 1463,1468 ****
--- 1463,1469 ----
  	bool		unique;			/* is index unique? */
  	bool		primary;		/* is index on primary key? */
  	bool		isconstraint;	/* is it from a CONSTRAINT clause? */
+ 	bool		concurrent;		/* should this be a concurrent index build? */
  } IndexStmt;
  
  /* ----------------------
Index: src/include/storage/itemptr.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/storage/itemptr.h,v
retrieving revision 1.28
diff -c -r1.28 itemptr.h
*** src/include/storage/itemptr.h	5 Mar 2006 15:58:59 -0000	1.28
--- src/include/storage/itemptr.h	21 Aug 2006 23:33:21 -0000
***************
*** 141,145 ****
--- 141,146 ----
   */
  extern bool ItemPointerEquals(ItemPointer pointer1, ItemPointer pointer2);
  
+ extern int32 ItemPointerCompare(ItemPointer arg1, ItemPointer arg2);
  
  #endif   /* ITEMPTR_H */
Index: src/test/regress/expected/create_index.out
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/expected/create_index.out,v
retrieving revision 1.20
diff -c -r1.20 create_index.out
*** src/test/regress/expected/create_index.out	11 Jul 2006 19:49:14 -0000	1.20
--- src/test/regress/expected/create_index.out	21 Aug 2006 23:33:21 -0000
***************
*** 360,362 ****
--- 360,412 ----
  create unique index hash_f8_index_1 on hash_f8_heap(abs(random));
  create unique index hash_f8_index_2 on hash_f8_heap((seqno + 1), random);
  create unique index hash_f8_index_3 on hash_f8_heap(random) where seqno > 1000;
+ --
+ -- Try some concurrent index builds
+ --
+ -- Unfortunately this only tests about half the code paths because there are
+ -- no concurrent updates happening to the table at the same time.
+ CREATE TABLE concur_heap (f1 text, f2 text);
+ -- empty table
+ CREATE INDEX CONCURRENTLY concur_index1 ON concur_heap(f2,f1);
+ INSERT INTO concur_heap VALUES ('a','b');
+ INSERT INTO concur_heap VALUES ('b','b');
+ -- unique index
+ CREATE UNIQUE INDEX CONCURRENTLY concur_index2 ON concur_heap(f1);
+ -- check if constraint is set up properly to be enforced
+ INSERT INTO concur_heap VALUES ('b','x');
+ ERROR:  duplicate key violates unique constraint "concur_index2"
+ -- check if constraint is enforced properly at build time
+ CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
+ ERROR:  could not create unique index
+ DETAIL:  Table contains duplicated values.
+ -- test that expression indexes and partial indexes work concurrently
+ CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
+ CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
+ CREATE INDEX CONCURRENTLY concur_index6 on concur_heap((f2||f1));
+ -- You can't do a concurrent index build in a transaction
+ BEGIN;
+ CREATE INDEX CONCURRENTLY concur_index7 ON concur_heap(f1);
+ ERROR:  CREATE INDEX CONCURRENTLY cannot run inside a transaction block
+ COMMIT;
+ -- But you can do a regular index build in a transaction
+ BEGIN;
+ CREATE INDEX std_index on concur_heap(f2);
+ COMMIT;
+ -- check to make sure that the failed indexes were cleaned up properly and the
+ -- successful indexes are created properly. Notably that they do NOT have the
+ -- "invalid" flag set.
+ \d concur_heap
+       Table "public.concur_heap"
+  Column | Type | Modifiers 
+ --------+------+-----------
+  f1     | text | 
+  f2     | text | 
+ Indexes:
+     "concur_index2" UNIQUE, btree (f1)
+     "concur_index1" btree (f2, f1)
+     "concur_index4" btree (f2) WHERE f1 = 'a'::text
+     "concur_index5" btree (f2) WHERE f1 = 'x'::text
+     "concur_index6" btree ((f2 || f1))
+     "std_index" btree (f2)
+ 
+ DROP TABLE concur_heap;
Index: src/test/regress/sql/create_index.sql
===================================================================
RCS file: /cvsroot/pgsql/src/test/regress/sql/create_index.sql,v
retrieving revision 1.19
diff -c -r1.19 create_index.sql
*** src/test/regress/sql/create_index.sql	11 Jul 2006 19:49:14 -0000	1.19
--- src/test/regress/sql/create_index.sql	21 Aug 2006 23:33:22 -0000
***************
*** 219,221 ****
--- 219,261 ----
  create unique index hash_f8_index_1 on hash_f8_heap(abs(random));
  create unique index hash_f8_index_2 on hash_f8_heap((seqno + 1), random);
  create unique index hash_f8_index_3 on hash_f8_heap(random) where seqno > 1000;
+ 
+ --
+ -- Try some concurrent index builds
+ --
+ -- Unfortunately this only tests about half the code paths because there are
+ -- no concurrent updates happening to the table at the same time.
+ 
+ CREATE TABLE concur_heap (f1 text, f2 text);
+ -- empty table
+ CREATE INDEX CONCURRENTLY concur_index1 ON concur_heap(f2,f1);
+ INSERT INTO concur_heap VALUES ('a','b');
+ INSERT INTO concur_heap VALUES ('b','b');
+ -- unique index
+ CREATE UNIQUE INDEX CONCURRENTLY concur_index2 ON concur_heap(f1);
+ -- check if constraint is set up properly to be enforced
+ INSERT INTO concur_heap VALUES ('b','x');
+ -- check if constraint is enforced properly at build time
+ CREATE UNIQUE INDEX CONCURRENTLY concur_index3 ON concur_heap(f2);
+ -- test that expression indexes and partial indexes work concurrently
+ CREATE INDEX CONCURRENTLY concur_index4 on concur_heap(f2) WHERE f1='a';
+ CREATE INDEX CONCURRENTLY concur_index5 on concur_heap(f2) WHERE f1='x';
+ CREATE INDEX CONCURRENTLY concur_index6 on concur_heap((f2||f1));
+ 
+ -- You can't do a concurrent index build in a transaction
+ BEGIN;
+ CREATE INDEX CONCURRENTLY concur_index7 ON concur_heap(f1);
+ COMMIT;
+ 
+ -- But you can do a regular index build in a transaction
+ BEGIN;
+ CREATE INDEX std_index on concur_heap(f2);
+ COMMIT;
+ 
+ -- check to make sure that the failed indexes were cleaned up properly and the
+ -- successful indexes are created properly. Notably that they do NOT have the
+ -- "invalid" flag set.
+ 
+ \d concur_heap
+ 
+ DROP TABLE concur_heap;
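
(Illustration only, not part of the patch: the query below is a sketch of how
the new pg_index.indisvalid column might be consulted in practice. The joins
use the existing pg_class and pg_namespace catalogs; only the indisvalid test
depends on this patch, and the column aliases are made up for readability.)

-- Sketch: list any indexes currently marked invalid, e.g. ones left behind
-- by a failed CREATE INDEX CONCURRENTLY.
SELECT n.nspname  AS schema_name,
       ic.relname AS index_name,
       tc.relname AS table_name
FROM pg_index i
     JOIN pg_class ic    ON ic.oid = i.indexrelid
     JOIN pg_class tc    ON tc.oid = i.indrelid
     JOIN pg_namespace n ON n.oid  = ic.relnamespace
WHERE NOT i.indisvalid;

Any index reported by such a query carries the "invalid" flag that the new
regression test checks for, and would need to be dropped and rebuilt.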