Re: DDL Damage Assessment

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DDL Damage Assessment
Date: 2014-10-02 21:11:41
Message-ID: 20141002211141.GA28859@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

* Peter Geoghegan (pg(at)heroku(dot)com) wrote:
> On Thu, Oct 2, 2014 at 12:40 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > The downside of the 'explain' approach is that the script then has to be
> > modified to put 'explain' in front of everything and then you have to go
> > through each statement and consider it. Having a 'dry-run' transaction
> > type which then produces a report at the end feels like it'd be both
> > easier to assess the overall implications, and less error-prone as you
> > don't have to prefex every statement with 'explain'. It might even be
> > possible to have the local "view" of post-alter statements be available
> > inside of this 'dry-run' option- that is, if you add a column in the
> > transaction then the column exists to the following commands, so it
> > doesn't just error out. Having 'explain <whatever>' wouldn't give you
> > that and so you really wouldn't be able to have whole scripts run by
> > just pre-pending each command with 'explain'.
>
> It's kind of tricky to implement a patch to figure this out ahead of
> time. Some of the actual lock acquisitions are well hidden, in terms
> of how the code is structured. In others cases, it may not even be
> possible to determine ahead of time exactly what locks will be taken.

I was thinking this would be a new kind of transaction and we'd have to
teach parts of the system about it- yes, that's pretty invasive, but
it's at least one approach to consider.

> As Harold mentioned, another idea along the same lines would be to
> decorate DDL with a NOWAIT "no locking assertion" and/or "no rewrite
> assertion". Basically, if this DDL (or perhaps any DDL, if this is
> implemented as a GUC instead) necessitates a table rewrite (and
> requires an AccessExclusiveLock), throw an error. That's the case that
> most people care about.

The problem I see with this approach is outlined above. I agree that it
may be independently valuable, but I don't see it as being a solution to
the issue.

> This may not even be good enough, though. Consider:
>
> Session 1 is a long running transaction. Maybe it's a spurious
> idle-in-transaction situation, but it could also be totally
> reasonable. It holds an AccessShareLock on some relation, as long
> running transactions are inclined to do.
>
> Session 2 is our migration. It needs an AccessExclusiveLock to ALTER
> TABLE on the same relation (or whatever). But it doesn't need a
> rewrite, which is good. It comes along and attempts to acquire the
> lock, blocking on session 1.
>
> Session 3 is an innocent bystander. It goes to query the same table in
> an ordinary, routine way - a SELECT statement. Even though session 2's
> lock is not granted yet, session 3 is not at liberty to skip the queue
> and get its own AccessShareLock. The effect is about the same as if
> session 2 did need to hold an AccessExclusiveLock for ages: read
> queries block for a long time. And yet, in theory session 2's impact
> on production should not be minimal, if we consider something like
> EXPLAIN output.
>
> Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
> particularly bad case. NOWAIT might be the wrong thing for DDL
> generally.

I agree that this is a concern, but this feels to me like a next-step
over top of the "assess the locks required" transaction type which I am
trying to outline. Specifically, having a way to take the report of
what locks are going to be required and then actaully attempt to acquire
them all (or fail if any can't be granted immediately) would be a
natural next step and a way to start off the actual migration script-
either all get acquired and the script runs to completion, or a lock
isn't granted and the whole thing fails immediately without anything
actually being done.

Thanks,

Stephen

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-10-02 21:14:13 Re: DDL Damage Assessment
Previous Message Andres Freund 2014-10-02 21:11:39 Re: TAP test breakage on MacOS X