Re: Make relation_openrv atomic wrt DDL

From: Noah Misch <noah(at)leadboat(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, heikki(dot)linnakangas(at)enterprisedb(dot)com
Subject: Re: Make relation_openrv atomic wrt DDL
Date: 2011-06-20 03:42:11
Message-ID: 20110620034211.GA3320@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Greg,

On Sun, Jun 19, 2011 at 11:44:35PM +0100, Greg Stark wrote:
> So I was the victim assigned to review this patch.

Thanks for doing so.

> The code is pretty much impeccable as usual from Noah. But I have
> questions about the semantics of it.
>
> Firstly this bit makes me wonder:
>
> + /* Not-found is always final. */
> + if (!OidIsValid(relOid1))
> + return relOid1;
>
> If someone does
>
> BEGIN; DROP TABLE foo; CREATE TABLE foo; COMMIT;
>
> Then what prevents this logic from finding the doomed relation,
> blocking until the transaction commits, then finding it's deleted and
> returning InvalidOid?
> RangeVarGetRelid is just going to complete its index scan of pg_class
> and may not come across the newly inserted row.

RangeVarGetRelid() always runs its index scan to completion, and the blocking
happens in LockRelationOid(). You will get a sequence like this:

RangeVarGetRelid("foo") => 20000
LockRelationOid(20000) [... blocks ...]
AcceptInvalidationMessages() [process a message]
RangeVarGetRelid("foo") => 20001
[restart loop]
LockRelationOid(20001)
AcceptInvalidationMessages() [no new messages - done]

RangeVarGetRelid() *is* vulnerable to the problem Simon just reported in the
"ALTER TABLE lock strength reduction patch is unsafe" thread, which arises
when the DDL transaction actually commits in the middle of a concurrent system
table scan. I don't think this patch makes things better or worse in that
regard, but I haven't thought it through in great depth.

> Am I missing something? I would have expected to have to loop around
> and retry if no valid record is found. But this raises the question --
> if no lock was acquired then what would have triggered an
> AcceptInvalidatationMessages and how would we know we waited long
> enough to find out about the newly created table?

Good question. I think characterizing "long enough" quickly leads to defining
one or more sequence points after which all backends must recognize a new
table as existing. My greenfield definition would be "a command should see
precisely the tables visible to its MVCC snapshot", but that has practical
problems. Let's see what implementation concerns would suggest...

This leads to a case I had not considered explicitly: CREATE TABLE on a name
that has not recently mapped to any table. If the catcache has a negative
entry on the key in question, we will rely on that and miss the new table
until we call AcceptInvalidationMessages() somehow. To hit this, you need a
command that dynamically chooses to query a table that has been created since
the command started running. DROP/CREATE of the same name in a single
transaction can't hit the problem. Consider this test script:

psql -X <<\_EOSQL &
-- Cleanup from last run
DROP TABLE IF EXISTS public.foo;

BEGIN;

-- Create the neg catcache entry.
SAVEPOINT q;
SELECT 1 FROM public.foo;
ROLLBACK to q;

--SET client_min_messages = debug5; -- use with CACHEDEBUG for insight
DO $$
BEGIN
EXECUTE 'SELECT 1 FROM pg_am'; -- prime basic catcache entries
PERFORM pg_sleep(11);
EXECUTE 'SELECT 1 FROM public.foo';
END
$$;
_EOSQL

sleep 1
psql -Xc 'CREATE TABLE public.foo ()'

wait

The first backend fails to see the new table despite its creating transaction
having committed ~10s ago. Starting a transaction, beginning to process a new
client-issued command, or successfully locking any relation prevents the miss.
We could narrow the window in most cases by re-adding a call to
AcceptInvalidationMessages() before RangeVarLockRelid()'s first call to
RangeVarGetRelid(). My current thinking is that it's not worth adding that
cost to every RangeVarLockRelid(). Thus, specify that, minimally, each
client-issued command will see all tables whose names were occupied at the
time the command started. I would add a comment to that effect. Thoughts?

> As a side note, if there are a long stream of such concurrent DDL then
> this code will leave all the old versions locked. This is consistent
> with our "hold locks until end of transaction" semantics but it seems
> weird for tables that we locked "accidentally" and didn't really end
> up using at all. I'm not sure it's really bad though.

Yes. If that outcome were more common, this would be a good place to try
relaxing the rule.

nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2011-06-20 03:47:14 Re: patch for 9.2: enhanced errors
Previous Message Steve Singer 2011-06-20 02:32:15 Re: patch for 9.2: enhanced errors