Re: ALTER TABLE lock strength reduction patch is unsafe Reply-To:

From: Noah Misch <noah(at)leadboat(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE lock strength reduction patch is unsafe Reply-To:
Date: 2014-03-21 03:45:56
Message-ID: 20140321034556.GA3927180@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 08, 2014 at 11:14:30AM +0000, Simon Riggs wrote:
> On 7 March 2014 09:04, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > The right thing to do here is to not push to the extremes. If we mess
> > too much with the ruleutil stuff it will just be buggy. A more
> > considered analysis in a later release is required for a full and
> > complete approach. As I indicated earlier, an 80/20 solution is better
> > for this release.
> >
> > Slimming down the patch, I've removed changes to lock levels for
> > almost all variants. The only lock levels now reduced are those for
> > VALIDATE, plus setting of relation and attribute level options.

Good call.

> The following commands (only) are allowed with
> ShareUpdateExclusiveLock, patch includes doc changes.
>
> ALTER TABLE ... VALIDATE CONSTRAINT constraint_name
> covered by isolation test, plus verified manually with pg_dump

I found a pre-existing bug aggravated by this, which I will shortly report on
a new thread.

> ALTER TABLE ... ALTER COLUMN ... SET STATISTICS
> ALTER TABLE ... ALTER COLUMN ... SET (...)
> ALTER TABLE ... ALTER COLUMN ... RESET (...)
>
> ALTER TABLE ... CLUSTER ON ...
> ALTER TABLE ... SET WITHOUT CLUSTER

A comment at check_index_is_clusterable() still mentions "exclusive lock".

> ALTER TABLE ... SET (...)
> covered by isolation test
>
> ALTER TABLE ... RESET (...)
>
> ALTER INDEX ... SET (...)
> ALTER INDEX ... RESET (...)

See discussion below.

> All other ALTER commands take AccessExclusiveLock

> --- a/doc/src/sgml/mvcc.sgml
> +++ b/doc/src/sgml/mvcc.sgml
> @@ -865,7 +865,8 @@ ERROR: could not serialize access due to read/write dependencies among transact
> <para>
> Acquired by <command>VACUUM</command> (without <option>FULL</option>),
> <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
> - some forms of <command>ALTER TABLE</command>.
> + <command>ALTER TABLE VALIDATE</command> and other
> + <command>ALTER TABLE</command> variants that set options.

ALTER TABLE's documentation covers the details, so the old text sufficed for
here. I find "variants that set options" too vague, considering the nuances
of the actual list of affected forms.

> </para>
> </listitem>
> </varlistentry>
> @@ -951,7 +952,7 @@ ERROR: could not serialize access due to read/write dependencies among transact
> </para>
>
> <para>
> - Acquired by the <command>ALTER TABLE</>, <command>DROP TABLE</>,
> + Acquired by the <command>ALTER TABLE</> for rewriting, <command>DROP TABLE</>,
> <command>TRUNCATE</command>, <command>REINDEX</command>,
> <command>CLUSTER</command>, and <command>VACUUM FULL</command>
> commands.

Forms that rewrite the table are just one class of examples. I would write
"Acquired by DROP TABLE, ..., VACUUM FULL, and some forms of ALTER TABLE."

> --- a/doc/src/sgml/ref/alter_table.sgml
> +++ b/doc/src/sgml/ref/alter_table.sgml

> @@ -420,6 +439,9 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
> index specification from the table. This affects
> future cluster operations that don't specify an index.
> </para>
> + <para>
> + Changing cluster options uses a <literal>SHARE UPDATE EXCLUSIVE</literal> lock.
> + </para>
> </listitem>
> </varlistentry>
>
> @@ -467,6 +489,10 @@ ALTER TABLE [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
> FULL</>, <xref linkend="SQL-CLUSTER"> or one of the forms
> of <command>ALTER TABLE</> that forces a table rewrite.
> </para>
> + <para>
> + Changing storage parameters requires only a
> + <literal>SHARE UPDATE EXCLUSIVE</literal> lock.
> + </para>

Some places say "requires only", while others say "uses". Please adopt one of
those consistently. I somewhat prefer the latter.

> @@ -1075,6 +1105,14 @@ ALTER TABLE distributors ADD CONSTRAINT distfk FOREIGN KEY (address) REFERENCES
> </para>
>
> <para>
> + To add a foreign key constraint to a table minimising impact on other work:

Our documentation has used the "minimize" spelling exclusively.

> --- a/src/backend/catalog/toasting.c
> +++ b/src/backend/catalog/toasting.c

> @@ -52,7 +53,7 @@ static bool needs_toast_table(Relation rel);
> * to end with CommandCounterIncrement if it makes any changes.
> */
> void
> -AlterTableCreateToastTable(Oid relOid, Datum reloptions)
> +AlterTableCreateToastTable(Oid relOid, Datum reloptions, LOCKMODE lockmode)
> {
> Relation rel;
>
> @@ -63,10 +64,10 @@ AlterTableCreateToastTable(Oid relOid, Datum reloptions)
> * concurrent readers of the pg_class tuple won't have visibility issues,
> * so let's be safe.
> */

The comment ending right here is falsified by the change.

> - rel = heap_open(relOid, AccessExclusiveLock);
> + rel = heap_open(relOid, lockmode);

We now request whatever lock our caller has already taken. If that is
AccessExclusiveLock, create_toast_table() could actually add a toast table.
Otherwise, it will either deduce that no change is required or raise an error.

> @@ -161,6 +164,14 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
> return false;
>
> /*
> + * We need to create a toast table. We shouldn't
> + * have got this far if we're in ALTER TABLE unless
> + * we are holding AccessExclusiveLock.
> + */
> + if (lockmode < AccessExclusiveLock)
> + elog(ERROR, "AccessExclusiveLock required to add toast tables.");

Our model of lock levels has not regarded them as being ordered, so I
recommend "lockmode != AccessExclusiveLock". (I don't object to using
incidental order in AlterTableGetLockLevel(), where all the relevant levels
originate in the same function.)

> --- a/src/backend/commands/tablecmds.c
> +++ b/src/backend/commands/tablecmds.c
> @@ -2687,8 +2687,7 @@ AlterTableLookupRelation(AlterTableStmt *stmt, LOCKMODE lockmode)
> * The caller must lock the relation, with an appropriate lock level
> * for the subcommands requested. Any subcommand that needs to rewrite
> * tuples in the table forces the whole command to be executed with
> - * AccessExclusiveLock (actually, that is currently required always, but
> - * we hope to relax it at some point). We pass the lock level down
> + * AccessExclusiveLock. We pass the lock level down

The set of commands needing AccessExclusiveLock is much broader than those
rewriting the table. Consider replacing the quoted paragraph with something
like:

* The caller has locked the relation with "lockmode", which must be at least
* as strong as AlterTableGetLockLevel(stmt->cmds). We pass the lock level
* down so ...

> @@ -2750,35 +2772,29 @@ AlterTableInternal(Oid relid, List *cmds, bool recurse)
> * ALTER TABLE RENAME which is treated as a different statement type T_RenameStmt
> * and does not travel through this section of code and cannot be combined with
> * any of the subcommands given here.
> + *
> + * Note that Hot Standby only knows about AccessExclusiveLocks on the master
> + * so any changes that might affect SELECTs running on standbys need to use
> + * AccessExclusiveLocks even if you think a lesser lock would do, unless you
> + * have a solution for that also.

Out of curiosity, do SELECTs on hot standbys impose known challenges in this
area not shared with local SELECTs?

> + *
> + * Also note that pg_dump uses only an AccessShareLock, meaning that anything
> + * that takes a lock less than AccessExclusiveLock can change object definitions
> + * while pg_dump is running. Be careful to check that the appropriate data is
> + * derived by pg_dump using an MVCC snapshot, rather than syscache lookups,
> + * otherwise we might end up with an inconsistent dump that can't restore.
> + *
> + * Be careful to ensure this function is called for Tables and Indexes only.
> + * It is not currently safe to be called for Views because security_barrier
> + * is listed as an option and so would be allowed to be set at a level lower
> + * than AccessExclusiveLock, which would not be correct.

This statement is accepted and takes only ShareUpdateExclusiveLock:

alter table information_schema.triggers set (security_barrier = true);

I suggest adding a LOCKMODE field to relopt_gen and adding a
reloptions_locklevel() function to determine the required level from an
options list. That puts control of the lock level near the code that
understands the implications for each option. You can then revert the
addition of AlterViewInternal() and some of the utility.c changes.

If I had to pick a reloption most likely to benefit from AccessExclusiveLock,
it would be user_catalog_table rather than security_barrier. I wonder if
output plugins would want a clear moment in the WAL stream -- the commit of
the transaction that sets the reloption -- after which they can assume that
all future records pertaining to the table in question have the needed bits.
If so, that implies AccessExclusiveLock for this reloption, because
heap_page_prune_opt() issues affected WAL records under AccessShareLock.

AT_ReplaceRelOptions could require the highest lock level required by any
option. It usage seems to be limited to CREATE OR REPLACE VIEW, so I would
give it AccessExclusiveLock and be done.

> - * 2. Relcache needs to be internally consistent, so unless we lock the
> - * definition during reads we have no way to guarantee that.

I looked for hazards like this, but I found none in the ALTER forms covered by
this patch. None of them modify multiple catalog rows affecting the same
relcache entry. However, thinking about that did lead me to ponder another
class of hazards. When backends can use one or more relations concurrently
with a DDL operation affecting those relations, those backends can find
themselves running with a subset of the catalog changes made within a
particular DDL operation. Consider VALIDATE CONSTRAINT against an inherited
constraint of an inheritance parent. It validates child table constraints,
modifying one catalog row per table. At COMMIT time, we queue sinval messages
for all affected tables. We add to the queue in atomic groups of
WRITE_QUANTUM (64) messages. Between two such groups joining the queue,
another backend may process the first group of messages. If the original DDL
used AccessExclusiveLock, this is always harmless. The DDL-issuing backend
still holds its lock, which means the inval-accepting backend must not have
the relation open. If the inval-accepting backend later opens the affected
relation, it will first acquire some lock and process the rest of the
invalidations from the DDL operation. When doing DDL under a weaker lock, the
inval-accepting backend might apply half the invalidations and immediately use
them in the context of an open relation. For VALIDATE CONSTRAINT, this means
a backend might briefly recognize only a subset of the inheritance tree
becoming valid. (I did not actually build a test case to confirm this.)

Considering that constraint exclusion is the sole consumer of
convalidated/ccvalid that can run in parallel with VALIDATE CONSTRAINT, I
think this is harmless. I did not find problems of this nature in any ALTER
TABLE forms affected by the patch. Let's just keep it in mind during future
lock level changes.

> --- a/src/backend/utils/adt/ruleutils.c
> +++ b/src/backend/utils/adt/ruleutils.c

> static char *
> pg_get_constraintdef_worker(Oid constraintId, bool fullCommand,
> - int prettyFlags)
> + int prettyFlags, bool useSyscache)
> {
> HeapTuple tup;
> Form_pg_constraint conForm;
> StringInfoData buf;
> + SysScanDesc scandesc;
> + ScanKeyData scankey[1];
> + Relation relation;

gcc 4.2.4 gives me these warnings:

ruleutils.c: In function `pg_get_constraintdef_worker':
ruleutils.c:1318: warning: `relation' may be used uninitialized in this function
ruleutils.c:1316: warning: `scandesc' may be used uninitialized in this function

> +
> + if (useSyscache)
> + tup = SearchSysCache1(CONSTROID, ObjectIdGetDatum(constraintId));
> + else
> + {
> + Snapshot snapshot = RegisterSnapshot(GetTransactionSnapshot());
> + PushActiveSnapshot(snapshot);

There's no need to push the snapshot; registration is enough.

> +
> + relation = heap_open(ConstraintRelationId, AccessShareLock);
> +
> + ScanKeyInit(&scankey[0],
> + ObjectIdAttributeNumber,
> + BTEqualStrategyNumber, F_OIDEQ,
> + ObjectIdGetDatum(constraintId));
> +
> + scandesc = systable_beginscan(relation,
> + ConstraintOidIndexId,
> + true,
> + snapshot,
> + 1,
> + scankey);
> + tup = systable_getnext(scandesc);
> +
> + PopActiveSnapshot();
> + UnregisterSnapshot(snapshot);
> + }

Later code uses SysCacheGetAttr() regardless of how we acquired the tuple.
That works, but it deserves a brief comment mention.

pg_get_constraintdef_mvcc() still does syscache lookups by way of
decompile_column_index_array(), get_constraint_index(), and
deparse_expression_pretty(). It uses MVCC for things that matter for pg_dump
vs. reduced lock levels, but not comprehensively. I recommend not adding a
new function and instead changing pg_get_constraintdef() to use the
transaction snapshot unconditionally. It's hard to imagine an application
author making a reasoned choice between the two. Our in-tree callers, psql
and pg_dump, have no cause to prefer the less-MVCC behavior at any time.

> --- a/src/backend/utils/cache/relcache.c
> +++ b/src/backend/utils/cache/relcache.c
> @@ -161,6 +161,14 @@ static bool eoxact_list_overflowed = false;
> eoxact_list_overflowed = true; \
> } while (0)
>
> +/*
> + * EOXactTupleDescArray stores *TupleDescs that (might) need AtEOXact

Stray asterisk.

> @@ -2312,6 +2334,37 @@ RelationCloseSmgrByOid(Oid relationId)
> RelationCloseSmgr(relation);
> }
>
> +void
> +RememberToFreeTupleDescAtEOX(TupleDesc td)
> +{
> + if (EOXactTupleDescArray == NULL)
> + {
> + MemoryContext oldcxt;
> + oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
> +
> + EOXactTupleDescArray = (TupleDesc *) palloc(16 * sizeof(TupleDesc));
> + EOXactTupleDescArrayLen = 16;
> + NextEOXactTupleDescNum = 0;
> + MemoryContextSwitchTo(oldcxt);
> + }
> + else if (NextEOXactTupleDescNum >= EOXactTupleDescArrayLen)
> + {
> + MemoryContext oldcxt;
> + int32 newlen = EOXactTupleDescArrayLen * 2;
> +
> + oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
> +
> + Assert(EOXactTupleDescArrayLen > 0);
> +
> + EOXactTupleDescArray = (TupleDesc *) repalloc(EOXactTupleDescArray,
> + newlen * sizeof(TupleDesc));
> + EOXactTupleDescArrayLen = newlen;
> + MemoryContextSwitchTo(oldcxt);

No need to switch contexts for repalloc().

> --- /dev/null
> +++ b/src/test/isolation/specs/alter-table-1.spec
> @@ -0,0 +1,34 @@
> +# ALTER TABLE - Add and Validate constraint with concurrent writes
> +#
> +# ADD FOREIGN KEY allows a minimum of ShareRowExclusiveLock,
> +# so we mix writes with it to see what works or waits.
> +# VALIDATE allows a minimum of ShareUpdateExclusiveLock
> +# so we mix reads with it to see what works or waits

This comment is obsolete regarding the ADD FOREIGN KEY lock type.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2014-03-21 04:12:21 equalTupleDescs() ignores ccvalid/ccnoinherit
Previous Message Craig Ringer 2014-03-21 03:22:06 Re: QSoC proposal: Rewrite pg_dump and pg_restore