Re: locks in CREATE TRIGGER, ADD FK

Lists: pgsql-hackers
From: Neil Conway <neilc(at)samurai(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-22 05:56:31
Message-ID: 423FB38F.4090002@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

AndrewSN pointed out on IRC that ALTER TABLE ... ADD FOREIGN KEY and
CREATE TRIGGER both acquire AccessExclusiveLocks on the table they are
adding triggers to (the PK table, in the case of ALTER TABLE). Is this
necessary? I don't see why we can't allow SELECT queries on the table to
proceed while the trigger is being added.

-Neil


From: Neil Conway <neilc(at)samurai(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-22 06:10:11
Message-ID: 423FB6C3.8010107@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway wrote:
> AndrewSN pointed out on IRC that ALTER TABLE ... ADD FOREIGN KEY and
> CREATE TRIGGER both acquire AccessExclusiveLocks on the table they are
> adding triggers to (the PK table, in the case of ALTER TABLE). Is this
> necessary? I don't see why we can't allow SELECT queries on the table to
> proceed while the trigger is being added.

Sorry, I forgot to mention: I think RowExclusiveLock or ExclusiveLock
would be sufficient instead.

-Neil


From: Neil Conway <neilc(at)samurai(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 01:32:58
Message-ID: 4240C74A.1020201@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway wrote:
> AndrewSN pointed out on IRC that ALTER TABLE ... ADD FOREIGN KEY and
> CREATE TRIGGER both acquire AccessExclusiveLocks on the table they are
> adding triggers to (the PK table, in the case of ALTER TABLE). Is this
> necessary? I don't see why we can't allow SELECT queries on the table to
> proceed while the trigger is being added.

Attached is a patch that changes both to use ShareRowExclusiveLock, and
updates the documentation accordingly. I'll apply this later today,
barring any objections.

-Neil

Attachment Content-Type Size
create_trigger_lock-1.patch text/x-patch 2.9 KB

From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 01:40:21
Message-ID: 4240C905.9080301@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

If you want to be my friend forever, then fix CLUSTER so that it uses
sharerowexclusive as well :D

Chris

Neil Conway wrote:
> Neil Conway wrote:
>
>> AndrewSN pointed out on IRC that ALTER TABLE ... ADD FOREIGN KEY and
>> CREATE TRIGGER both acquire AccessExclusiveLocks on the table they are
>> adding triggers to (the PK table, in the case of ALTER TABLE). Is this
>> necessary? I don't see why we can't allow SELECT queries on the table
>> to proceed while the trigger is being added.
>
>
> Attached is a patch that changes both to use ShareRowExclusiveLock, and
> updates the documentation accordingly. I'll apply this later today,
> barring any objections.


From: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 02:37:59
Message-ID: 200503231337.59881.mr-russ@pws.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 23 Mar 2005 12:40 pm, Christopher Kings-Lynne wrote:
> If you want to be my friend forever, then fix CLUSTER so that it uses
> sharerowexclusive as well :D
>
I don't think it's as easy as that, because you have to move tuples
around in the cluster operation. Same sort of issue as vacuum full I would suggest.

Russell Smith

> Chris
>
> Neil Conway wrote:
> > Neil Conway wrote:
> >
> >> AndrewSN pointed out on IRC that ALTER TABLE ... ADD FOREIGN KEY and
> >> CREATE TRIGGER both acquire AccessExclusiveLocks on the table they are
> >> adding triggers to (the PK table, in the case of ALTER TABLE). Is this
> >> necessary? I don't see why we can't allow SELECT queries on the table
> >> to proceed while the trigger is being added.
> >
> >
> > Attached is a patch that changes both to use ShareRowExclusiveLock, and
> > updates the documentation accordingly. I'll apply this later today,
> > barring any objections.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>
>


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 02:42:01
Message-ID: 4240D779.3070103@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>If you want to be my friend forever, then fix CLUSTER so that it uses
>>sharerowexclusive as well :D
>
> I don't think it's as easy as that, because you have to move tuples
> around in the cluster operation. Same sort of issue as vacuum full I would suggest.

Cluster doesn't move rows...

I didn't say it was easy. It would involve changing how cluster works.
It would keep the old table around while building the new, then grab
an exclusive lock to swap the two.

Chris


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 02:49:41
Message-ID: 20050323024941.GA7702@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 23, 2005 at 10:42:01AM +0800, Christopher Kings-Lynne wrote:
> >>If you want to be my friend forever, then fix CLUSTER so that it uses
> >>sharerowexclusive as well :D
> >
> >I don't think it's as easy as that, because you have to move tuples
> >around in the cluster operation. Same sort of issue as vacuum full I
> >would suggest.
>
> Cluster doesn't move rows...
>
> I didn't say it was easy. It would involve changing how cluster works.
> It would keep the old table around while building the new, then grab
> an exclusive lock to swap the two.

Huh, cluster already does that.

I don't remember what the rationale was for locking the table, leaving
even simple SELECTs out. (In fact, IIRC the decision wasn't made by me,
and it wasn't discussed at all.)

--
Alvaro Herrera (<alvherre[(at)]dcc(dot)uchile(dot)cl>)
"I would rather have GNU than GNOT." (ccchips, lwn.net/Articles/37595/)


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 02:54:22
Message-ID: 4240DA5E.6030005@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Huh, cluster already does that.

It does and it doesn't. Something like the first thing it does is muck
with the old table's filenode IIRC, meaning that immediately the old
table will no longer work.

Chris


From: Neil Conway <neilc(at)samurai(dot)com>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 02:55:53
Message-ID: 4240DAB9.1020601@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne wrote:
> If you want to be my friend forever, then fix CLUSTER so that it uses
> sharerowexclusive as well :D

Hmm, this might be possible as well. During a CLUSTER, we currently

- lock the heap relation with AccessExclusiveLock
- lock the index we're clustering on with AccessExclusiveLock
- create a temporary heap relation
- fill with data from the old heap relation, via an index scan
- swap the relfilenodes of the old and temporary heap relations
- rebuild indexes

We certainly can't allow concurrent modifications to either the table or
the clustered index while this is happening. Allowing index scans
*should* be safe -- an index scan could result in modifications to the
index (e.g. updating "tuple is killed" bits), but those shouldn't be
essential. We might also want to disallow SELECT FOR UPDATE, since we
would end up invoking heap_mark4update() on the old heap relation. Not
sure offhand how serious that would be.

So I think it should be possible to lock both the heap relation and the
index with ExclusiveLock, which would allow SELECTs on them. This would
apply to both the single relation and multiple relation variants of
CLUSTER (since we do each individual clustering in its own transaction).

... except that when we rebuild the relation's indexes, we acquire an
AccessExclusiveLock on the index. This would introduce the risk of
deadlock. It seems necessary to acquire an AccessExclusiveLock when
rebuilding shared indexes, since we do the index build in-place, but I
think we can get by with an ExclusiveLock in the non-shared case, for
similar reasons as above: we build the new index and then swap relfilenodes.

-Neil


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 03:03:26
Message-ID: 200503230303.j2N33Q813031@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway wrote:
> So I think it should be possible to lock both the heap relation and the
> index with ExclusiveLock, which would allow SELECTs on them. This would
> apply to both the single relation and multiple relation variants of
> CLUSTER (since we do each individual clustering in its own transaction).
>
> ... except that when we rebuild the relation's indexes, we acquire an
> AccessExclusiveLock on the index. This would introduce the risk of
> deadlock. It seems necessary to acquire an AccessExclusiveLock when
> rebuilding shared indexes, since we do the index build in-place, but I
> think we can get by with an ExclusiveLock in the non-shared case, for
> similar reasons as above: we build the new index and then swap relfilenodes.

Certainly we need to upgrade to an exclusive table lock to replace the
heap table. Do we want to get a shared lock and possibly starve waiting
for an exclusive lock on the table to swap the new one in? Do we do
such escallation anywhere else?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Neil Conway <neilc(at)samurai(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 03:12:24
Message-ID: 4240DE98.40806@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> Certainly we need to upgrade to an exclusive table lock to replace the
> heap table.

Well, we will be holding an ExclusiveLock on the heap relation
regardless. We "replace" the heap table by swapping its relfilenode, so
ISTM we needn't hold an AccessExclusiveLock.

> Do we want to get a shared lock and possibly starve waiting
> for an exclusive lock on the table to swap the new one in?

What I'm saying is that REINDEX on non-shared indexes need only acquire
an ExclusiveLock, and hence not need to escalate its lock.

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 04:03:36
Message-ID: 1385.1111550616@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
>> AndrewSN pointed out on IRC that ALTER TABLE ... ADD FOREIGN KEY and
>> CREATE TRIGGER both acquire AccessExclusiveLocks on the table they are
>> adding triggers to (the PK table, in the case of ALTER TABLE). Is this
>> necessary? I don't see why we can't allow SELECT queries on the table to
>> proceed while the trigger is being added.

> Attached is a patch that changes both to use ShareRowExclusiveLock, and
> updates the documentation accordingly. I'll apply this later today,
> barring any objections.

I don't think this has been adequately thought through at all ... but
at least make it ExclusiveLock. What is the use-case for allowing
SELECT FOR UPDATE in parallel with this? One may suppose that someone
doing SELECT FOR UPDATE intends an UPDATE. (No, don't tell me about
foreign keys. Alvaro is going to fix that.)

As Chris suggests nearby, this is really only the tip of the iceberg.
I would prefer to see someone do a survey of all our DDL commands and
put forward a coherent proposal for minimum required locks for all of
them.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 04:06:18
Message-ID: 1420.1111550778@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:

> /*
> ! * Grab an exclusive lock on the pk table, so that someone doesn't
> ! * delete rows out from under us. (Although a lesser lock would do for
> ! * that purpose, we'll need exclusive lock anyway to add triggers to
> ! * the pk table; trying to start with a lesser lock will just create a
> ! * risk of deadlock.)
> */
> ! pkrel = heap_openrv(fkconstraint->pktable, AccessExclusiveLock);

> /*
> * Validity and permissions checks
> --- 3829,3839 ----
> Oid constrOid;

> /*
> ! * Grab a lock on the pk table, so that someone doesn't delete
> ! * rows out from under us; ShareRowExclusive should be good
> ! * enough.
> */

BTW, the above comment change is seriously inadequate, because it
removes the explanation of *why* that is the minimum required lock.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 04:07:45
Message-ID: 1447.1111550865@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
> It would keep the old table around while building the new, then grab
> an exclusive lock to swap the two.

Lock upgrading is right out.

regards, tom lane


From: Neil Conway <neilc(at)samurai(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 04:09:11
Message-ID: 4240EBE7.4090207@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway wrote:
> ... except that when we rebuild the relation's indexes, we acquire an
> AccessExclusiveLock on the index. This would introduce the risk of
> deadlock. It seems necessary to acquire an AccessExclusiveLock when
> rebuilding shared indexes, since we do the index build in-place, but I
> think we can get by with an ExclusiveLock in the non-shared case, for
> similar reasons as above: we build the new index and then swap
> relfilenodes.

From looking at the code, it should be quite possible to do this.

Further points from discussion on IRC:

- TRUNCATE suffers from the same behavior (it acquires an
AccessExclusiveLock where really an ExclusiveLock or similar should be
good enough)

- if we make these changes, we will need some way to delete a
no-longer-visible relfilenode. It should be sufficient to delete a
relfilenode when the expired pg_class row that refers to it is no longer
visible to any transactions -- but this isn't necessarily going to be
true when the transaction that executed the REINDEX/CLUSTER/TRUNCATE
commits. We could perform this check in some kind of periodic process,
perhaps -- like the bgwriter, at checkpoint time.

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 04:19:38
Message-ID: 1537.1111551578@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> Well, we will be holding an ExclusiveLock on the heap relation
> regardless. We "replace" the heap table by swapping its relfilenode, so
> ISTM we needn't hold an AccessExclusiveLock.

Utterly wrong. When you commit you will physically drop the old table.
If there is a SELECT running against the old table it will be quite
unhappy after that.

regards, tom lane


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 04:26:37
Message-ID: 4240EFFD.4090805@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Utterly wrong. When you commit you will physically drop the old table.
> If there is a SELECT running against the old table it will be quite
> unhappy after that.

How can we drop the file at commit, given that a serializable
transaction's snapshot should still be able to see old relfilenode's
content?

(If the serializable transaction has already acquired a read lock before
the TRUNCATE begins, it will block the TRUNCATE -- but there is no
guarantee that the operations will be ordered like that.)

-Neil


From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: swapping relfilenodes (was: Re: locks in CREATE TRIGGER, ADD FK)
Date: 2005-03-23 04:28:57
Message-ID: slrnd41s49.2a3u.andrew+nonews@trinity.supernews.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2005-03-23, Neil Conway <neilc(at)samurai(dot)com> wrote:
> - swap the relfilenodes of the old and temporary heap relations

While discussing this one further on IRC, I noticed the following:

Everywhere I could find that currently replaces the relfilenode of a
relation does so while holding an AccessExclusive lock, and assumes that
this is sufficient to ensure that the old relfilenode can be killed when
the transaction commits. This is not correct.

Example:

- backend A begins a serializable transaction
- backend B truncates a table (and commits)
- backend A, still in the same transaction, accesses the truncated table

Currently backend A sees the truncated table as empty, which is obviously
not right. This is obviously related to any attempt to weaken the locking
on other operations that modify relfilenodes, because doing it right implies
a mechanism to defer the removals past the commit of the modifying
transaction and up to the point where the old data can no longer be seen by
a live transaction.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 04:33:02
Message-ID: 200503230433.j2N4X2m25327@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway wrote:
> Tom Lane wrote:
> > Utterly wrong. When you commit you will physically drop the old table.
> > If there is a SELECT running against the old table it will be quite
> > unhappy after that.
>
> How can we drop the file at commit, given that a serializable
> transaction's snapshot should still be able to see old relfilenode's
> content?

Vacuum will not remove any old rows because of the transaction xid so
why does it care if the table is clustered/reindexed? It doesn't have
the table open yet.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: andrew(at)supernews(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: swapping relfilenodes (was: Re: locks in CREATE TRIGGER,
Date: 2005-03-23 04:41:30
Message-ID: 200503230441.j2N4fUe26407@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew - Supernews wrote:
> On 2005-03-23, Neil Conway <neilc(at)samurai(dot)com> wrote:
> > - swap the relfilenodes of the old and temporary heap relations
>
> While discussing this one further on IRC, I noticed the following:
>
> Everywhere I could find that currently replaces the relfilenode of a
> relation does so while holding an AccessExclusive lock, and assumes that
> this is sufficient to ensure that the old relfilenode can be killed when
> the transaction commits. This is not correct.
>
> Example:
>
> - backend A begins a serializable transaction
> - backend B truncates a table (and commits)
> - backend A, still in the same transaction, accesses the truncated table
>
> Currently backend A sees the truncated table as empty, which is obviously
> not right. This is obviously related to any attempt to weaken the locking
> on other operations that modify relfilenodes, because doing it right implies
> a mechanism to defer the removals past the commit of the modifying
> transaction and up to the point where the old data can no longer be seen by
> a live transaction.

This is a good point. While DELETE keeps the old rows around and VACUUM
perserves them until the serialized transaction commits, truncate does
not keep the old rows around.

In fact, would a truncate during a backup cause the backup to be
inconsistent because it wouldn't be a true snapshot of the database at
backup start time? Seems so.

The docs mention:

TRUNCATE cannot be used if there are foreign-key refer-
ences to the table from other tables. Checking validity in
such cases would require table scans, and the whole point
is not to do one.

so it doesn't make the referential integrity inconsistent.

Perhaps we should document this.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 04:42:29
Message-ID: 1707.1111552949@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> How can we drop the file at commit, given that a serializable
> transaction's snapshot should still be able to see old relfilenode's
> content?

It isn't 100% MVCC, I agree. But it works because system catalog
lookups are SnapshotNow, and so when another session comes and wants to
look at the table it will see the committed new version of the pg_class
row pointing at the new relfilenode file. What you have to prevent is
somebody accessing the table *while* the changeover happens ... and
that's why your lock has to be AccessExclusive.

If you want to complain about MVCC violations in CLUSTER, think about
the fact that it scans the table with SnapshotNow, and therefore loses
rows that are committed-dead but might still be visible to somebody.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: swapping relfilenodes (was: Re: locks in CREATE TRIGGER,
Date: 2005-03-23 04:47:36
Message-ID: 1766.1111553256@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> In fact, would a truncate during a backup cause the backup to be
> inconsistent because it wouldn't be a true snapshot of the database at
> backup start time? Seems so.

No, because pg_dump holds AccessShareLock on every table that it intends
to dump, thereby ensuring that TRUNCATE/CLUSTER/etc are held off. The
proposal to weaken the locks that those operations take would in fact
break pg_dump.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: swapping relfilenodes (was: Re: locks in CREATE TRIGGER,
Date: 2005-03-23 04:51:16
Message-ID: 200503230451.j2N4pG927892@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > In fact, would a truncate during a backup cause the backup to be
> > inconsistent because it wouldn't be a true snapshot of the database at
> > backup start time? Seems so.
>
> No, because pg_dump holds AccessShareLock on every table that it intends
> to dump, thereby ensuring that TRUNCATE/CLUSTER/etc are held off. The
> proposal to weaken the locks that those operations take would in fact
> break pg_dump.

Oh, it pre-locks. I didn't know that.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 04:53:58
Message-ID: 1824.1111553638@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> - if we make these changes, we will need some way to delete a
> no-longer-visible relfilenode.

This is presuming that we abandon the notion that system catalog
access use SnapshotNow. Which opens the question of what they should
use instead ... to which "transaction snapshot" isn't the answer,
because we have to be able to do system catalog accesses before
we've set the snapshot. (Else forget issuing LOCK TABLE before
the snapshot is set.)

I really think that you haven't the faintest idea of the size of
the can of worms you are opening here :-(

regards, tom lane


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 05:16:36
Message-ID: 4240FBB4.3010702@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> It isn't 100% MVCC, I agree. But it works because system catalog
> lookups are SnapshotNow, and so when another session comes and wants to
> look at the table it will see the committed new version of the pg_class
> row pointing at the new relfilenode file.

If by "works", you mean "provides correct transactional semantics", then
that simply isn't true. Not making CLUSTER and similar DDL commands MVCC
compliant isn't the end of the world, I agree, but that doesn't make it
correct, either.

> If you want to complain about MVCC violations in CLUSTER, think about
> the fact that it scans the table with SnapshotNow, and therefore loses
> rows that are committed-dead but might still be visible to somebody.

This seems like another facet of the same problem (a serializable
transaction's snapshot effectively includes the relfilenodes that were
visible when the snapshot was taken, and swapping in another relfilenode
under its nose is asking for trouble).

We could fix the CLUSTER bug, although not the TRUNCATE bug, by scanning
the old relation with SnapshotAny (or ideally, "the snapshot such that
we can see all tuples visible to any currently running transaction", if
we can produce such a snapshot easily). Not sure if that's worth doing;
it would be nice to solve the root problem (scanning system catalogs
with SnapshotNow, per discussion elsewhere in thread).

-Neil


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 05:50:02
Message-ID: 4241038A.30201@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> This is presuming that we abandon the notion that system catalog
> access use SnapshotNow. Which opens the question of what they should
> use instead ... to which "transaction snapshot" isn't the answer,
> because we have to be able to do system catalog accesses before
> we've set the snapshot.

I wonder if it would be possible to use SnapshotNow before the
transaction's snapshot has been established, and the transaction's
snapshot subsequently. Although it definitely makes me nervous to use
multiple snapshots over the life of a single transaction...

-Neil


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 07:47:22
Message-ID: 42411F0A.2020509@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> I don't think this has been adequately thought through at all ... but
> at least make it ExclusiveLock. What is the use-case for allowing
> SELECT FOR UPDATE in parallel with this?

Ok, patch applied -- I adjusted it to use ExclusiveLock, and fleshed out
some of the comments.

-Neil

Attachment Content-Type Size
create_trigger_lock-2.patch text/x-patch 3.4 KB

From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 14:45:40
Message-ID: 87d5tq7757.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:

> Tom Lane wrote:
> > It isn't 100% MVCC, I agree. But it works because system catalog
> > lookups are SnapshotNow, and so when another session comes and wants to
> > look at the table it will see the committed new version of the pg_class
> > row pointing at the new relfilenode file.
>
> If by "works", you mean "provides correct transactional semantics", then that
> simply isn't true. Not making CLUSTER and similar DDL commands MVCC compliant
> isn't the end of the world, I agree, but that doesn't make it correct, either.

I think he means it works because it doesn't matter whether the serializable
transaction sees the old table or the new one. As soon as the CLUSTER commits
the serializable transaction can start using the new one since it's
functionally identical to the old one (at least it's supposed to be, Tom
points out it isn't).

> > If you want to complain about MVCC violations in CLUSTER, think about
> > the fact that it scans the table with SnapshotNow, and therefore loses
> > rows that are committed-dead but might still be visible to somebody.

Ouch. That's, er, a problem. I guess currently it's fine for any transaction
using READ COMMITTED but it's already wrong for serializable transactions. And
it'll be wrong for READ COMMITTED if CLUSTER is changed not to take an
exclusive lock.

--
greg


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: swapping relfilenodes (was: Re: locks in CREATE TRIGGER,
Date: 2005-03-23 14:47:45
Message-ID: 877jjy771q.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > In fact, would a truncate during a backup cause the backup to be
> > inconsistent because it wouldn't be a true snapshot of the database at
> > backup start time? Seems so.
>
> No, because pg_dump holds AccessShareLock on every table that it intends
> to dump, thereby ensuring that TRUNCATE/CLUSTER/etc are held off. The
> proposal to weaken the locks that those operations take would in fact
> break pg_dump.

It seems like that would be true for TRUNCATE but not CLUSTER.

Though pg_dump works in READ COMMITTED mode doesn't it? So it doesn't really
get a consistent view of the database exactly anyways. If it tried to go in
SERIALIZABLE mode I suspect it would rarely complete though.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 15:04:43
Message-ID: 6459.1111590283@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> Tom Lane wrote:
>> I don't think this has been adequately thought through at all ... but
>> at least make it ExclusiveLock. What is the use-case for allowing
>> SELECT FOR UPDATE in parallel with this?

> Ok, patch applied -- I adjusted it to use ExclusiveLock, and fleshed out
> some of the comments.

I think last night's discussion makes it crystal-clear why I felt that
this hasn't been sufficiently thought through. Please revert until the
discussion comes to a conclusion.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-23 15:57:17
Message-ID: 6934.1111593437@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> Tom Lane wrote:
>> It isn't 100% MVCC, I agree. But it works because system catalog
>> lookups are SnapshotNow, and so when another session comes and wants to
>> look at the table it will see the committed new version of the pg_class
>> row pointing at the new relfilenode file.

> If by "works", you mean "provides correct transactional semantics", then
> that simply isn't true. Not making CLUSTER and similar DDL commands MVCC
> compliant isn't the end of the world, I agree, but that doesn't make it
> correct, either.

I agree that we aren't MVCC with respect to DDL operations (and for this
purpose CLUSTER is DDL). Trying to become so would open a can of worms
far larger than it's worth, though, IMHO.

Let me give you a couple of illustrations of why things are the way they
are. Consider

Transaction 1 Transaction 2

BEGIN;

... BEGIN;

... INSERT INTO a;

CLUSTER a; ...

Currently, because T2 continues to hold a write lock on A until it
commits, T1's CLUSTER will block until T2 commits; therefore CLUSTER's
use of SnapshotNow is sufficient to not lose any live tuples. Were T1
to use a transaction-aware snapshot to scan A, this would not be so.
(SnapshotAny would work, but CLUSTER would have to become an order of
magnitude harder than it is now, because it would have to preserve
UPDATE chain link relationships ... compare VACUUM FULL's treatment of
tuple chains.)

Transaction 1 Transaction 2

BEGIN;

... CLUSTER a;

INSERT INTO a;

Were T1 using a transaction-aware snapshot to read pg_class, it would
insert its new tuple into the wrong relfilenode for A, causing either
immediate failure or eventual loss of a live tuple.

Transaction 1 Transaction 2

BEGIN;

... CREATE TRIGGER ... ON INSERT TO a ...

INSERT INTO a;

Were T1 using a transaction-aware snapshot to read pg_trigger, it would
fail to fire the already-committed insert trigger. (Variants of this
include failing to insert index entries into a new index, failing to
apply a new constraint, etc.) You don't have to assume that T1 is
in Serializable mode, either. It might be using Read Committed, but
the INSERT starts and sets its snapshot while T2 is still in progress;
then of course blocks until T2 commits; then does the wrong thing
because it is still paying attention to pre-T2 catalog entries. This is
why LockRelation accepts SI inval messages immediately after each lock
acquisition; it's to ensure that we do see the effects of T2.

Transaction 1 Transaction 2

BEGIN;

SELECT FROM a;

... CREATE TRIGGER ... ON INSERT TO a ...

INSERT INTO a;

Ordinarily, once T1 has touched relation A, it can be sure that A's
schema will not change until end of transaction. The change you
committed last night removes that guarantee, at least for the
limited case of triggers, and makes the above interleaving possible.
While I haven't come up with a clear failure case after a few minutes'
thought, I'm not convinced that there isn't one.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: swapping relfilenodes (was: Re: locks in CREATE TRIGGER,
Date: 2005-03-23 17:11:37
Message-ID: 7861.1111597897@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Though pg_dump works in READ COMMITTED mode doesn't it?

Certainly not.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: swapping relfilenodes (was: Re: locks in CREATE TRIGGER,
Date: 2005-03-23 18:35:15
Message-ID: 87k6ny5hy4.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > Though pg_dump works in READ COMMITTED mode doesn't it?
>
> Certainly not.

It works in serializable mode? I guess the only reason we don't see
"transaction failed" ever is because it's not doing any updates?

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: swapping relfilenodes (was: Re: locks in CREATE TRIGGER,
Date: 2005-03-23 18:40:13
Message-ID: 8847.1111603213@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Greg Stark <gsstark(at)mit(dot)edu> writes:
>>> Though pg_dump works in READ COMMITTED mode doesn't it?
>>
>> Certainly not.

> It works in serializable mode? I guess the only reason we don't see
> "transaction failed" ever is because it's not doing any updates?

Right.

regards, tom lane


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-24 00:03:35
Message-ID: 424203D7.7040701@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> I think last night's discussion makes it crystal-clear why I felt that
> this hasn't been sufficiently thought through. Please revert until the
> discussion comes to a conclusion.

I applied the patch because I don't think it is very closely related to
the discussion. But if you'd prefer, I'm happy to wait until we're done,
so I've reverted the patch.

-Neil


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-24 00:31:57
Message-ID: 42420A7D.1010004@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> I agree that we aren't MVCC with respect to DDL operations (and for this
> purpose CLUSTER is DDL). Trying to become so would open a can of worms
> far larger than it's worth, though, IMHO.

I think if we can come up with a reasonable way to handle all the
consequences, it's worth doing. And yes, I realize there are a lot of
consequences, so it may well not be possible.

> Transaction 1 Transaction 2
>
> BEGIN;
>
> ... BEGIN;
>
> ... INSERT INTO a;
>
> CLUSTER a; ...
>
> Currently, because T2 continues to hold a write lock on A until it
> commits, T1's CLUSTER will block until T2 commits; therefore CLUSTER's
> use of SnapshotNow is sufficient to not lose any live tuples. Were T1
> to use a transaction-aware snapshot to scan A, this would not be so.

I think this is somewhat tangential: we're discussing changing the
snapshot used to scan system catalogs, not user relations like A. The
only reason that CLUSTER's use of SnapshotNow is a problem at the moment
is the same reason that TRUNCATE is a problem -- a concurrent
serializable transaction will use the new relfilenode, not the old one.

> Transaction 1 Transaction 2
>
> BEGIN;
>
> ... CLUSTER a;
>
> INSERT INTO a;
>
> Were T1 using a transaction-aware snapshot to read pg_class, it would
> insert its new tuple into the wrong relfilenode for A, causing either
> immediate failure or eventual loss of a live tuple.

Yes, definitely a problem :( The same applies to TRUNCATE, naturally.
The only somewhat reasonable behavior I can think of is to cause
modifications to the oldrelfilenode to fail in a concurrent serializable
transaction. The behavior would be somewhat analogous to an UPDATE in a
serializable transaction failing because of a concurrent data
modification, although in this case we would error out on any
modification (e.g. INSERT).

-Neil


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-03-28 05:04:01
Message-ID: 200503280504.j2S541B12913@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera wrote:
> On Wed, Mar 23, 2005 at 10:42:01AM +0800, Christopher Kings-Lynne wrote:
> > >>If you want to be my friend forever, then fix CLUSTER so that it uses
> > >>sharerowexclusive as well :D
> > >
> > >I don't think it's as easy as that, because you have to move tuples
> > >around in the cluster operation. Same sort of issue as vacuum full I
> > >would suggest.
> >
> > Cluster doesn't move rows...
> >
> > I didn't say it was easy. It would involve changing how cluster works.
> > It would keep the old table around while building the new, then grab
> > an exclusive lock to swap the two.
>
> Huh, cluster already does that.
>
> I don't remember what the rationale was for locking the table, leaving
> even simple SELECTs out. (In fact, IIRC the decision wasn't made by me,
> and it wasn't discussed at all.)
>

The issue is that we would have to escalate the shared lock to exclusive
to swap out the files, and lock escallation could lead to
deadlock/starvation.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-05-30 01:52:55
Message-ID: 1117417975.23266.2.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2005-03-23 at 10:04 -0500, Tom Lane wrote:
> I think last night's discussion makes it crystal-clear why I felt that
> this hasn't been sufficiently thought through. Please revert until the
> discussion comes to a conclusion.

Are there any remaining objections to reapplying this patch?

The original commit message is here:

http://archives.postgresql.org/pgsql-committers/2005-03/msg00316.php

The archives of the -hackers thread are here:

http://archives.postgresql.org/pgsql-hackers/2005-03/msg00764.php

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-05-30 02:33:39
Message-ID: 4750.1117420419@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> Are there any remaining objections to reapplying this patch?
> The original commit message is here:
> http://archives.postgresql.org/pgsql-committers/2005-03/msg00316.php
> The archives of the -hackers thread are here:
> http://archives.postgresql.org/pgsql-hackers/2005-03/msg00764.php

I'm still concerned about the last example I raised in
http://archives.postgresql.org/pgsql-hackers/2005-03/msg00840.php
which was:

>> Transaction 1 Transaction 2
>>
>> BEGIN;
>>
>> SELECT FROM a;
>>
>> ... CREATE TRIGGER ... ON INSERT TO a ...
>>
>> INSERT INTO a;
>>
>> Ordinarily, once T1 has touched relation A, it can be sure that A's
>> schema will not change until end of transaction. The change you
>> committed last night removes that guarantee, at least for the
>> limited case of triggers, and makes the above interleaving possible.
>> While I haven't come up with a clear failure case after a few minutes'
>> thought, I'm not convinced that there isn't one.

It's possible that this is safe for triggers only, but I'm not 100%
convinced of that, and I'm not sure I see the point of relaxing the
general rule "schema changes require AccessExclusiveLock" for just
this one operation.

regards, tom lane


From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: locks in CREATE TRIGGER, ADD FK
Date: 2005-05-30 11:33:09
Message-ID: slrnd9lufl.1d3v.andrew+nonews@trinity.supernews.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2005-05-30, Neil Conway <neilc(at)samurai(dot)com> wrote:
> On Wed, 2005-03-23 at 10:04 -0500, Tom Lane wrote:
>> I think last night's discussion makes it crystal-clear why I felt that
>> this hasn't been sufficiently thought through. Please revert until the
>> discussion comes to a conclusion.
>
> Are there any remaining objections to reapplying this patch?

I've run into a few questions recently that might be relevent to the
issue of DDL locking in general and therefore possibly this change in
particular.

The most significant one is to do with the pg_get_*def functions and
their consistency, or otherwise, with explicit scans of the system
catalogs. The problem here of course is that the pg_get_*def functions
mostly (but not exclusively) use the syscache and therefore see data
relative to SnapshotNow, whereas the queries that are invoking them
are likely to be doing explicit scans of the catalog tables within the
transaction's active snapshot (and for a long-running serializable
transaction such as pg_dump, these may be some way apart).

The obvious place to look for failure modes is to see whether pg_dump
can be made to fail by deleting something (index, perhaps?) that it is
expecting to find, and see whether it chokes (pg_get_indexdef will elog
if the index doesn't exist in SnapshotNow). Dropping a view might be
another case where this can be made to fail.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services