Quick Links

Re: Should I implement DROP INDEX CONCURRENTLY?

Lists:	pgsql-hackers

From:	Daniel Farina <daniel(at)heroku(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Should I implement DROP INDEX CONCURRENTLY?
Date:	2011-08-24 18:24:49
Message-ID:	CACN56+NNLO=RamDAy+uSa_mKXVsM+HjrVj8ehGjfg-mO9qcpzA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello list,

At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently when frobbing around some indexes I realized that there is no
equivalent for DROP INDEX, and this is a similar but lesser problem
(as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which nominally one would think to be trivial, but I assure you
it is not at times for even indexes that are a handful of gigabytes
(let's say ~=< a dozen). By non-trivial, I mean it can take 30+
seconds, but less than a couple of minutes. The storage layer
(starting from the higher levels of abstraction) are XFS, a somewhat
trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).

I was poking around at tablecmds and index.c and wonder if a similar
two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
create a DROP INDEX CONCURRENTLY, and if there would be any interest
in accepting such a patch.

Quoth index.c:

/*
* To drop an index safely, we must grab exclusive lock on its parent
* table. Exclusive lock on the index alone is insufficient because
* another backend might be about to execute a query on the parent table.
* If it relies on a previously cached list of index OIDs, then it could
* attempt to access the just-dropped index. We must therefore take a
* table lock strong enough to prevent all queries on the table from
* proceeding until we commit and send out a shared-cache-inval notice
* that will make them update their index lists.
*/

Could I make the ACCESS EXCLUSIVE section just long enough to commit
catalog updates, and then have the bulk of the work happen afterwards?

The general idea is:

1) set an index as "invalid", to ensure no backend will use it in planning
2) wait for the xmin horizon to advance to ensure no open snapshots
that may not see the invalidation of the index are gone (is there a
way to tighten that up? although even this conservative version would
be 80-90% of the value for us...)
3) then use performDeletions without taking a lock on the parent
table, similar to what's in tablecmds.c already.

A DROP INDEX CONCURRENTLY may leave an invalid index if aborted
instead of waiting for statement confirmation, just like CREATE INDEX
CONCURRENTLY.

--
fdr

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Daniel Farina <daniel(at)heroku(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2011-08-24 18:33:46
Message-ID:	CAHyXU0w4sKtbk_HzkcvJLJ80ZM41mSBAmRxD1bNz6w93d+6wGw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> Hello list,
>
> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
> recently when frobbing around some indexes I realized that there is no
> equivalent for DROP INDEX, and this is a similar but lesser problem
> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
> EXCLUSIVE lock on the parent table while doing the work to unlink
> files, which nominally one would think to be trivial, but I assure you
> it is not at times for even indexes that are a handful of gigabytes
> (let's say ~=< a dozen). By non-trivial, I mean it can take 30+
> seconds, but less than a couple of minutes. The storage layer
> (starting from the higher levels of abstraction) are XFS, a somewhat
> trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).

Are you sure that you are really waiting on the time to unlink the
file? there's other stuff going on in there like waiting for lock,
plan invalidation, etc. Point being, maybe the time consuming stuff
can't really be deferred which would make the proposal moot.

merlin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Daniel Farina <daniel(at)heroku(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2011-08-24 19:36:29
Message-ID:	CA+TgmoYvGjaARPofXTWC+OBVF13Nib2MLVcc7940w-91ZuQGtg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 24, 2011 at 2:24 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> Could I make the ACCESS EXCLUSIVE section just long enough to commit
> catalog updates, and then have the bulk of the work happen afterwards?
>
> The general idea is:
>
> 1) set an index as "invalid", to ensure no backend will use it in planning
> 2) wait for the xmin horizon to advance to ensure no open snapshots
> that may not see the invalidation of the index are gone (is there a
> way to tighten that up? although even this conservative version would
> be 80-90% of the value for us...)
> 3) then use performDeletions without taking a lock on the parent
> table, similar to what's in tablecmds.c already.
>
> A DROP INDEX CONCURRENTLY may leave an invalid index if aborted
> instead of waiting for statement confirmation, just like CREATE INDEX
> CONCURRENTLY.

This might be a dumb idea, but could we rearrange CommitTransaction()
so that smgrDoPendingDeletes() happens just a bit further down, after
those ResourceOwnerRelease() calls? It seems like that might
accomplish what you're trying to do here without needing a new
command.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2011-08-24 19:38:09
Message-ID:	4063.1314214689@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
>> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
>> recently when frobbing around some indexes I realized that there is no
>> equivalent for DROP INDEX, and this is a similar but lesser problem
>> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
>> EXCLUSIVE lock on the parent table while doing the work to unlink
>> files, which nominally one would think to be trivial, but I assure you
>> it is not at times for even indexes that are a handful of gigabytes
>> (let's say ~=< a dozen).

> Are you sure that you are really waiting on the time to unlink the
> file? there's other stuff going on in there like waiting for lock,
> plan invalidation, etc. Point being, maybe the time consuming stuff
> can't really be deferred which would make the proposal moot.

Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.

regards, tom lane

From:	Daniel Farina <daniel(at)heroku(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2011-08-24 20:04:55
Message-ID:	CAAZKuFZdz23cvOov7g0XxSyK6j+a+TFAoiNTeNrwhiUZx5DV9A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>> On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
>>> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
>>> recently when frobbing around some indexes I realized that there is no
>>> equivalent for DROP INDEX, and this is a similar but lesser problem
>>> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
>>> EXCLUSIVE lock on the parent table while doing the work to unlink
>>> files, which nominally one would think to be trivial, but I assure you
>>> it is not at times for even indexes that are a handful of gigabytes
>>> (let's say ~=< a dozen).
>
>> Are you sure that you are really waiting on the time to unlink the
>> file? there's other stuff going on in there like waiting for lock,
>> plan invalidation, etc. Point being, maybe the time consuming stuff
>> can't really be deferred which would make the proposal moot.
>
> Assuming the issue really is the physical unlinks (which I agree I'd
> like to see some evidence for), I wonder whether the problem could be
> addressed by moving smgrDoPendingDeletes() to after locks are released,
> instead of before, in CommitTransaction/AbortTransaction. There does
> not seem to be any strong reason why we have to do that before lock
> release, since incoming potential users of a table should not be trying
> to access the old physical storage after that anyway.

Alright, since this concern about confirming the expensive part of
index dropping has come up a few times but otherwise the waters are
warm, I'll go ahead and do some work to pin things down a bit before
we continue working on those assumptions.

--
fdr

From:	Daniel Farina <daniel(at)heroku(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2011-09-09 22:02:53
Message-ID:	CAAZKuFZyTdWx-Cz4mKgxB1JSYXonhabLKOUcXUQOz0o3-+JjjA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Assuming the issue really is the physical unlinks (which I agree I'd
>> like to see some evidence for), I wonder whether the problem could be
>> addressed by moving smgrDoPendingDeletes() to after locks are released,
>> instead of before, in CommitTransaction/AbortTransaction. There does
>> not seem to be any strong reason why we have to do that before lock
>> release, since incoming potential users of a table should not be trying
>> to access the old physical storage after that anyway.
>
> Alright, since this concern about confirming the expensive part of
> index dropping has come up a few times but otherwise the waters are
> warm, I'll go ahead and do some work to pin things down a bit before
> we continue working on those assumptions.
>

This suspicion seems to be proven correct; there came an opportunity
where we were removing some indexes on a live system and I took the
opportunity to carefully control and time the process. There's not
much relationship between size of the index and the delay, but the
pauses are still very real. On the other hand, the first time this was
noticed there was significantly higher load.

I'd still like to do something to solve this problem, though: even if
the time-consuming part of the process is not file unlinking, it's
clearly something after the AccessExclusiveLock is acquired based on
our other measurements.

Back to the drawing board...

--
fdr

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Daniel Farina <daniel(at)heroku(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2011-12-30 22:20:20
Message-ID:	1325283620.11282.7.camel@vanquo.pezone.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
> recently when frobbing around some indexes I realized that there is no
> equivalent for DROP INDEX, and this is a similar but lesser problem
> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
> EXCLUSIVE lock on the parent table while doing the work to unlink
> files, which nominally one would think to be trivial, but I assure you
> it is not at times for even indexes that are a handful of gigabytes
> (let's say ~=< a dozen). By non-trivial, I mean it can take 30+
> seconds, but less than a couple of minutes. The storage layer
> (starting from the higher levels of abstraction) are XFS, a somewhat
> trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).
>
> I was poking around at tablecmds and index.c and wonder if a similar
> two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
> create a DROP INDEX CONCURRENTLY, and if there would be any interest
> in accepting such a patch.

Hmm, it seems I just independently came up with this same concept. My
problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
exclusive lock on the table just to clean that up. If the table is
under constant load, you can't easily do that. So a two-pass DROP INDEX
CONCURRENTLY might have been helpful for me.

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2011-12-31 13:26:25
Message-ID:	CA+U5nMLQjFQvgyDVZ7wH7EcGM7bZRoLngJSp7aOFXko0-7Ab9g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
>> I was poking around at tablecmds and index.c and wonder if a similar
>> two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
>> create a DROP INDEX CONCURRENTLY, and if there would be any interest
>> in accepting such a patch.
>
> Hmm, it seems I just independently came up with this same concept. My
> problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
> exclusive lock on the table just to clean that up. If the table is
> under constant load, you can't easily do that. So a two-pass DROP INDEX
> CONCURRENTLY might have been helpful for me.

Here's a patch for this. Please review.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
drop_index_concurrently.v1.patch	text/x-patch	21.8 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Daniel Farina <daniel(at)heroku(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-03 18:11:58
Message-ID:	CA+U5nM+WY6rF=mrGEPu2cY0ufth3qUHXe3SnWSs95UadBZbj-Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Sep 9, 2011 at 11:02 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
>> On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Assuming the issue really is the physical unlinks (which I agree I'd
>>> like to see some evidence for), I wonder whether the problem could be
>>> addressed by moving smgrDoPendingDeletes() to after locks are released,
>>> instead of before, in CommitTransaction/AbortTransaction. There does
>>> not seem to be any strong reason why we have to do that before lock
>>> release, since incoming potential users of a table should not be trying
>>> to access the old physical storage after that anyway.
>>
>> Alright, since this concern about confirming the expensive part of
>> index dropping has come up a few times but otherwise the waters are
>> warm, I'll go ahead and do some work to pin things down a bit before
>> we continue working on those assumptions.
>>
>
> This suspicion seems to be proven correct; there came an opportunity
> where we were removing some indexes on a live system and I took the
> opportunity to carefully control and time the process. There's not
> much relationship between size of the index and the delay, but the
> pauses are still very real. On the other hand, the first time this was
> noticed there was significantly higher load.
>
> I'd still like to do something to solve this problem, though: even if
> the time-consuming part of the process is not file unlinking, it's
> clearly something after the AccessExclusiveLock is acquired based on
> our other measurements.

This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.

On top of that, taking what Robert Haas mentioned on another thread,
InvalidateBuffer currently calls StretegyFreeBuffer(), which waits for
an ExclusiveLock on the BufFreelistLock. On a busy system this will be
heavily contended, so adding blocks to the freelist only if the lock
is free seems warranted.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
strategyfreebuffer.v1.patch	text/x-patch	1.1 KB

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-03 23:11:37
Message-ID:	FBB4EDF8-2CEB-48B3-8F99-A87352A87847@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
> This could well be related to the fact that DropRelFileNodeBuffers()
> does a scan of shared_buffers, which is an O(N) approach no matter the
> size of the index.
>
> On top of that, taking what Robert Haas mentioned on another thread,
> InvalidateBuffer currently calls StretegyFreeBuffer(), which waits for
> an ExclusiveLock on the BufFreelistLock. On a busy system this will be
> heavily contended, so adding blocks to the freelist only if the lock
> is free seems warranted.

Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.

In fact, IIRC the function that scans for buffers actually checks to see if a rel still exists before it returns the buffer...
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-03 23:28:19
Message-ID:	28784.1325633299@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim Nasby <jim(at)nasby(dot)net> writes:
> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>> This could well be related to the fact that DropRelFileNodeBuffers()
>> does a scan of shared_buffers, which is an O(N) approach no matter the
>> size of the index.

> Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.

No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file. It's
important that we kill the buffers immediately during relation drop.

I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist. Simon is after a different solution involving getting
rid of the clock sweep, but he has failed to explain how that's not
going to end up being the same type of contention-prone coding that we
got rid of by adopting the clock sweep, some years ago. Yeah, the sweep
takes a lot of spinlocks, but that only matters if there is contention
for them, and the sweep approach avoids the need for a centralized data
structure.

(BTW, do we have a separate clock sweep hand for each backend? If not,
there might be some low hanging fruit there.)

regards, tom lane

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-04 00:03:17
Message-ID:	CA994BD8-9A07-40EB-A3CA-7C4A0A0B0E67@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jan 3, 2012, at 5:28 PM, Tom Lane wrote:
> Jim Nasby <jim(at)nasby(dot)net> writes:
>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>> size of the index.
>
>> Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
>
> No, we can't, because if they're still dirty then the bgwriter would
> first try to write them to the no-longer-existing storage file. It's
> important that we kill the buffers immediately during relation drop.
>
> I'm still thinking that it might be sufficient to mark the buffers
> invalid and let the clock sweep find them, thereby eliminating the need
> for a freelist. Simon is after a different solution involving getting
> rid of the clock sweep, but he has failed to explain how that's not
> going to end up being the same type of contention-prone coding that we
> got rid of by adopting the clock sweep, some years ago. Yeah, the sweep
> takes a lot of spinlocks, but that only matters if there is contention
> for them, and the sweep approach avoids the need for a centralized data
> structure.

Yeah, but the problem we run into is that with every backend trying to run the clock on it's own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).

Heh, it occurs to me that the SQL analogy for how things work right now is that backends currently have to run a SeqScan (or 5) to find a free page... what we need to do is CREATE INDEX free ON buffers(buffer_id) WHERE count = 0;.

> (BTW, do we have a separate clock sweep hand for each backend? If not,
> there might be some low hanging fruit there.)

No... having multiple clock hands is an interesting idea, but I'm worried that it could potentially get us into trouble if scores of backends were suddenly decrementing usage counts all over the place. For example, what if 5 backends all had their hands in basically the same place, all pointing at a very heavily used buffer. All 5 backends go for free space, they each grab the spinlock on that buffer in succession and suddenly this highly used buffer that started with a count of 5 has now been freed. We could potentially use more than one hand, but I think the relation between the number of hands and the maximum usage count has to be tightly controlled.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-04 00:11:35
Message-ID:	29638.1325635895@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim Nasby <jim(at)nasby(dot)net> writes:
> Yeah, but the problem we run into is that with every backend trying to run the clock on it's own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).

Well, maybe, but I think the historical evidence suggests that that
approach will be a loser, simply because no matter how cheap, the
freelist will remain a centralized and heavily contended data structure.
IMO we need to be looking for a mechanism that has no single point of
contention, and modifying the clock sweep rules looks like the best way
to get there.

Still, he who wants to do the work can try whatever approach he feels
like.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-04 01:34:34
Message-ID:	CA+TgmoYtQ2NpE3d7iavsNn-=SxwZJmF-DjSpkhdcu7i5kBViQg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jim Nasby <jim(at)nasby(dot)net> writes:
>> Yeah, but the problem we run into is that with every backend trying to run the clock on it's own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
>
> Well, maybe, but I think the historical evidence suggests that that
> approach will be a loser, simply because no matter how cheap, the
> freelist will remain a centralized and heavily contended data structure.
> IMO we need to be looking for a mechanism that has no single point of
> contention, and modifying the clock sweep rules looks like the best way
> to get there.
>
> Still, he who wants to do the work can try whatever approach he feels
> like.

It might be possible to partition the free list. So imagine, for
example, 8 free lists. The background writer runs the clock sweep and
finds some buffers that are about ready to be reallocated and
distributes one-eighth of them to each free list. Then, when a
backend wants to allocate a buffer, it picks a free list somehow
(round robin?) and pulls a buffer off it. If the buffer turns out to
have been used since it was added to the free list, we give up on it
and try again. This hopefully shouldn't happen too often, though, as
long as we only add enough buffers to the free list to satisfy the
requests that we expect to get over the next
small-fraction-of-a-second.

Of course you have to think about what happens if you find that your
chosen free list is empty. In that case you probably want to try a
different one, and if in the worst case where they're all empty, run
the clock sweep in the foreground. You probably also want to kick the
background writer in the pants if even a single one is empty (or
nearly empty?) and tell it to hurry up and add some more buffers to
the freelist.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-04 03:59:03
Message-ID:	06CF0B08-8E1E-4E47-81F3-90C6771D58F9@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jan 3, 2012, at 7:34 PM, Robert Haas wrote:
> On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jim Nasby <jim(at)nasby(dot)net> writes:
>>> Yeah, but the problem we run into is that with every backend trying to run the clock on it's own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
>>
>> Well, maybe, but I think the historical evidence suggests that that
>> approach will be a loser, simply because no matter how cheap, the
>> freelist will remain a centralized and heavily contended data structure.
>> IMO we need to be looking for a mechanism that has no single point of
>> contention, and modifying the clock sweep rules looks like the best way
>> to get there.
>>
>> Still, he who wants to do the work can try whatever approach he feels
>> like.
>
> It might be possible to partition the free list. So imagine, for
> example, 8 free lists. The background writer runs the clock sweep and
> finds some buffers that are about ready to be reallocated and
> distributes one-eighth of them to each free list. Then, when a
> backend wants to allocate a buffer, it picks a free list somehow
> (round robin?) and pulls a buffer off it. If the buffer turns out to
> have been used since it was added to the free list, we give up on it
> and try again. This hopefully shouldn't happen too often, though, as
> long as we only add enough buffers to the free list to satisfy the
> requests that we expect to get over the next
> small-fraction-of-a-second.
>
> Of course you have to think about what happens if you find that your
> chosen free list is empty. In that case you probably want to try a
> different one, and if in the worst case where they're all empty, run
> the clock sweep in the foreground. You probably also want to kick the
> background writer in the pants if even a single one is empty (or
> nearly empty?) and tell it to hurry up and add some more buffers to
> the freelist.

If it comes down to it, we can look at partitioning the free list. But here's the thing: this is the strategy that FreeBSD (and I think now Linux as well) use to service memory requests, be they for free memory or for reading data from disk. If it's good enough for an OS to use, I would expect we could make it work as well.

I would expect that pulling a page off the free list would be an extremely fast operation... lock the read lock on the list (we should probably have separate locks for putting stuff on the list vs taking it off), pin the buffer (which shouldn't be contentious), update the read pointer and drop the read lock. Even in C code that's probably less than 100 machine instructions, and no looping. Compared to the cost of running a clock sweep it should be significantly faster. So while there will be contention on the free list read lock, none of that contention should last very long.

Do we have any other places where we take extremely short locks like this and still run into problems?
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-04 11:14:29
Message-ID:	CA+U5nMJOBUOMFdSscKr1UZ6PZ0RDZo-=qjFeNnp98XMC6vAfZg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jim Nasby <jim(at)nasby(dot)net> writes:
>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>> size of the index.
>
>> Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
>
> No, we can't, because if they're still dirty then the bgwriter would
> first try to write them to the no-longer-existing storage file. It's
> important that we kill the buffers immediately during relation drop.
>
> I'm still thinking that it might be sufficient to mark the buffers
> invalid and let the clock sweep find them, thereby eliminating the need
> for a freelist.

My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.

> Simon is after a different solution involving getting
> rid of the clock sweep...

err, No, he isn't. Not sure where that came from since I'm advocating
only minor changes there to curb worst case behaviour.

But lets discuss that on the main freelist thread.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
removebufmgrfreelist.v1.patch	text/x-patch	4.7 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-08 16:19:48
Message-ID:	CA+U5nMLDq_xCVPL5FR5=-0yU5yzzNs8+0RaZ_R2wnUrzxYb2ow@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> Not having a freelist at all is probably a simpler way of avoiding the
> lock contention, so I'll happily back that suggestion instead. Patch
> attached, previous patch revoked.

v2 attached with cleanup of some random stuff that crept onto patch.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
removebufmgrfreelist.v2.patch	text/x-patch	3.5 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-16 16:17:00
Message-ID:	CA+Tgmob+_CyarTbdB9GByVkrju9vgu7gQRoTsrC5o1N9JTaVOQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Dec 31, 2011 at 8:26 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
>>> I was poking around at tablecmds and index.c and wonder if a similar
>>> two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
>>> create a DROP INDEX CONCURRENTLY, and if there would be any interest
>>> in accepting such a patch.
>>
>> Hmm, it seems I just independently came up with this same concept. My
>> problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
>> exclusive lock on the table just to clean that up. If the table is
>> under constant load, you can't easily do that. So a two-pass DROP INDEX
>> CONCURRENTLY might have been helpful for me.
>
> Here's a patch for this. Please review.

I don't see how setting indisvalid to false helps with this, because
IIUC when a session sees indisvalid = false, it is supposed to avoid
using the index for queries but still make new index entries when a
write operation happens - but to drop an index, I think you'd need to
get into a state where no one was using the index for anything at all.

Maybe we need to change indisvalid to a "char" and make it three
valued: c = being created currently, v = valid, d = being dropped
concurrently, or something like that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-16 19:06:18
Message-ID:	1326740778.29466.10.camel@vanquo.pezone.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
> I don't see how setting indisvalid to false helps with this, because
> IIUC when a session sees indisvalid = false, it is supposed to avoid
> using the index for queries but still make new index entries when a
> write operation happens - but to drop an index, I think you'd need to
> get into a state where no one was using the index for anything at all.

ISTM that one would need to set indisready to false instead.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-16 19:46:21
Message-ID:	CA+Tgmoa5u8M4UAdBJCnUArEz74osJ2T2WDBdFpMZ1Rp3nuTQyQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
>> I don't see how setting indisvalid to false helps with this, because
>> IIUC when a session sees indisvalid = false, it is supposed to avoid
>> using the index for queries but still make new index entries when a
>> write operation happens - but to drop an index, I think you'd need to
>> get into a state where no one was using the index for anything at all.
>
> ISTM that one would need to set indisready to false instead.

Maybe we should set both to false?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-17 17:06:04
Message-ID:	1326819964.2820.8.camel@vanquo.pezone.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
> On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
> >> I don't see how setting indisvalid to false helps with this, because
> >> IIUC when a session sees indisvalid = false, it is supposed to avoid
> >> using the index for queries but still make new index entries when a
> >> write operation happens - but to drop an index, I think you'd need to
> >> get into a state where no one was using the index for anything at all.
> >
> > ISTM that one would need to set indisready to false instead.
>
> Maybe we should set both to false?

Well, ready = false and valid = true doesn't make any sense. There is
only just-created -> ready -> valid. We might as well convert that to a
single "char" column, as you had indicated in your earlier email. But
that's independent of the proposed patch.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-18 01:28:38
Message-ID:	CA+Tgmobx59VkyRCyHp6EvCmccLgm3+3aGkM4+pn=LsOHd8UkSA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
>> On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> > On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
>> >> I don't see how setting indisvalid to false helps with this, because
>> >> IIUC when a session sees indisvalid = false, it is supposed to avoid
>> >> using the index for queries but still make new index entries when a
>> >> write operation happens - but to drop an index, I think you'd need to
>> >> get into a state where no one was using the index for anything at all.
>> >
>> > ISTM that one would need to set indisready to false instead.
>>
>> Maybe we should set both to false?
>
> Well, ready = false and valid = true doesn't make any sense. There is
> only just-created -> ready -> valid. We might as well convert that to a
> single "char" column, as you had indicated in your earlier email. But
> that's independent of the proposed patch.

Sure, but the point is that I think if you want everyone to stop
touching the index, you ought to mark it both not-valid and not-ready,
which the current patch doesn't do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-18 22:49:56
Message-ID:	CA+U5nM+TR5RfZ3PLmttjEguRPR3zGE9+QxKbJpNabapZOzym-A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 18, 2012 at 1:28 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
>>> On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>>> > On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
>>> >> I don't see how setting indisvalid to false helps with this, because
>>> >> IIUC when a session sees indisvalid = false, it is supposed to avoid
>>> >> using the index for queries but still make new index entries when a
>>> >> write operation happens - but to drop an index, I think you'd need to
>>> >> get into a state where no one was using the index for anything at all.
>>> >
>>> > ISTM that one would need to set indisready to false instead.
>>>
>>> Maybe we should set both to false?
>>
>> Well, ready = false and valid = true doesn't make any sense. There is
>> only just-created -> ready -> valid. We might as well convert that to a
>> single "char" column, as you had indicated in your earlier email. But
>> that's independent of the proposed patch.
>
> Sure, but the point is that I think if you want everyone to stop
> touching the index, you ought to mark it both not-valid and not-ready,
> which the current patch doesn't do.

Thanks for the review and the important observation. I agree that I've
changed the wrong column. indisready must be set false. Also agree
setting both false makes most sense.

Can I just check with you that the only review comment is a one line
change? Seems better to make any additional review comments in one go.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-18 23:12:40
Message-ID:	CA+TgmoZzzWn7VqV1LJmXjrND2vQZLBSscCLwB-yo1wT082yLLg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Can I just check with you that the only review comment is a one line
> change? Seems better to make any additional review comments in one go.

No, I haven't done a full review yet - I just noticed that right at
the outset and wanted to be sure that got addressed. I can look it
over more carefully, and test it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-19 15:02:31
Message-ID:	CA+U5nMLMKWEs84Zgc5Px-JTOT5Ea=Csvs0p8or39=WWsXmd08g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Can I just check with you that the only review comment is a one line
>> change? Seems better to make any additional review comments in one go.
>
> No, I haven't done a full review yet - I just noticed that right at
> the outset and wanted to be sure that got addressed. I can look it
> over more carefully, and test it.

Corrected your point, and another found during retest.

Works as advertised, docs corrected.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
drop_index_concurrently.v2.patch	text/x-patch	24.5 KB

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <jim(at)nasby(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-20 17:54:00
Message-ID:	4F19AA38.3000602@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 04.01.2012 13:14, Simon Riggs wrote:
> On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jim Nasby<jim(at)nasby(dot)net> writes:
>>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>>> size of the index.
>>
>>> Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
>>
>> No, we can't, because if they're still dirty then the bgwriter would
>> first try to write them to the no-longer-existing storage file. It's
>> important that we kill the buffers immediately during relation drop.
>>
>> I'm still thinking that it might be sufficient to mark the buffers
>> invalid and let the clock sweep find them, thereby eliminating the need
>> for a freelist.
>
> My patch puts things on the freelist only when it is free to do so.
> Not having a freelist at all is probably a simpler way of avoiding the
> lock contention, so I'll happily back that suggestion instead. Patch
> attached, previous patch revoked.

So, you're proposing that we remove freelist altogether? Sounds
reasonable, but that needs to be performance tested somehow. I'm not
sure what exactly the test should look like, but it probably should
involve an OLTP workload, and large tables being created, populated to
fill the cache with pages from the table, and dropped, while the OLTP
workload is running. I'd imagine that the worst case scenario looks
something like that.

Removing the freelist should speed up the drop, but the question is how
costly are the extra clock sweeps that the backends have to do.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-21 01:53:33
Message-ID:	CA+TgmoahY-RJFBAe2x5MDGrXmAMLJAD48fBLnnbM9GD1PX34eQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> Can I just check with you that the only review comment is a one line
>>> change? Seems better to make any additional review comments in one go.
>>
>> No, I haven't done a full review yet - I just noticed that right at
>> the outset and wanted to be sure that got addressed. I can look it
>> over more carefully, and test it.
>
> Corrected your point, and another found during retest.
>
> Works as advertised, docs corrected.

This comment needs updating:

Speaking of that comment, why does a concurrent index drop need any
lock at all on the parent table? If, as the current comment text
says, the point of locking the parent table is to make sure that no
new backends can create relcache entries until our transaction
commits, then it's not needed here, or at least not for that purpose.
We're going to out-wait anyone who has the index open *after* we've
committed our changes to the pg_index entry, so there shouldn't be a
race condition there. There may very well be a reason why we need a
lock on the parent, or there may not, but that's not it.

On the flip side, it strikes me that there's considerable danger that
ShareUpdateExclusiveLock on the index might not be strong enough. In
particular, it would fall down if there's anyone who takes an
AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
to access the index data despite its being marked !indisready and
!indisvalid. There are certainly instances of this in contrib, such
as in pgstatindex(), which I'm pretty sure will abort with a nastygram
if somebody truncates away the file under it. That might not be
enough reason to reject the patch, but I do think it would be the only
place in the system where we allow something like that to happen. I'm
not sure whether there are any similar risks in core - I am a bit
worried about get_actual_variable_range() and gincostestimate(), but I
can't tell for sure whether there is any actual problem there, or
elsewhere. I wonder if we should only do the *first* phase of the
drop with ShareUpdateExcusliveLock, and then after outwaiting
everyone, take an AccessExclusiveLock on the index (but not the table)
to do the rest of the work. If that doesn't allow the drop to proceed
concurrently, then we go find the places that block against it and
teach them not to lock/use/examine indexes that are !indisready. That
way, the worst case is that D-I-C is less concurrent than we'd like,
rather than making random other things fall over - specifically,
anything that count on our traditional guarantee that AccessShareLock
is enough to keep the file from disappearing.

In the department of minor nitpicks, the syntax synopsis you've added
looks wrong:

DROP INDEX
[ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
[, ...] [ CASCADE | RESTRICT ]
CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>

That implies that you may write if exists, followed by 1 or more
names, followed by cascade or restrict. After that you must write
CONCURRENTLY, followed by exactly one name. I think what you want is:

DROP INDEX [ IF EXISTS ] <replaceable
class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
DROP INDEX CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>

...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
CONCURRENTLY. It could also be very useful to be able to concurrently
drop multiple indexes at once, since that would require waiting out
all concurrent transactions only once instead of once per drop.
Either way, I think we should have only one syntax, and then reject
combinations we can't handle in the code. So I think we should have
the parser accept:

DROP INDEX [ IF EXISTS ] [ CONCURRENTLY ] <replaceable
class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]

And then, if you try to use cascade, or try to drop multiple indexes
at once, we should throw an error saying that those options aren't
supported in combination with CONCURRENTLY, rather than rejecting them
as a syntax error at parse time. That gives a better error message
and will make it easier for anyone who wants to extend this in the
future - plus it will allow things like DROP INDEX CONCURRENTLY bob
RESTRICT, which ought to be legal even if CASCADE isn't supported.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-21 15:11:04
Message-ID:	CA+U5nMKyHikNHJOaRkLJU3Fn+aribZ5crEuHwhuxS_sJfQiQYQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 21, 2012 at 1:53 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>> Can I just check with you that the only review comment is a one line
>>>> change? Seems better to make any additional review comments in one go.
>>>
>>> No, I haven't done a full review yet - I just noticed that right at
>>> the outset and wanted to be sure that got addressed. I can look it
>>> over more carefully, and test it.
>>
>> Corrected your point, and another found during retest.
>>
>> Works as advertised, docs corrected.
>
> This comment needs updating:
>
> /*
> * To drop an index safely, we must grab exclusive lock on its parent
> * table. Exclusive lock on the index alone is insufficient because
> * another backend might be about to execute a query on the parent table.
> * If it relies on a previously cached list of index OIDs, then it could
> * attempt to access the just-dropped index. We must therefore take a
> * table lock strong enough to prevent all queries on the table from
> * proceeding until we commit and send out a shared-cache-inval notice
> * that will make them update their index lists.
> */

OK, I did reword some but clearly not enough.

> Speaking of that comment, why does a concurrent index drop need any
> lock at all on the parent table? If, as the current comment text
> says, the point of locking the parent table is to make sure that no
> new backends can create relcache entries until our transaction
> commits, then it's not needed here, or at least not for that purpose.
> We're going to out-wait anyone who has the index open *after* we've
> committed our changes to the pg_index entry, so there shouldn't be a
> race condition there. There may very well be a reason why we need a
> lock on the parent, or there may not, but that's not it.

I feel happier locking the parent as well. I'd rather avoid strange
weirdness later as has happened before in this type of discussion.

> On the flip side, it strikes me that there's considerable danger that
> ShareUpdateExclusiveLock on the index might not be strong enough. In
> particular, it would fall down if there's anyone who takes an
> AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
> to access the index data despite its being marked !indisready and
> !indisvalid. There are certainly instances of this in contrib, such
> as in pgstatindex(), which I'm pretty sure will abort with a nastygram
> if somebody truncates away the file under it. That might not be
> enough reason to reject the patch, but I do think it would be the only
> place in the system where we allow something like that to happen. I'm
> not sure whether there are any similar risks in core - I am a bit
> worried about get_actual_variable_range() and gincostestimate(), but I
> can't tell for sure whether there is any actual problem there, or
> elsewhere. I wonder if we should only do the *first* phase of the
> drop with ShareUpdateExcusliveLock, and then after outwaiting
> everyone, take an AccessExclusiveLock on the index (but not the table)
> to do the rest of the work. If that doesn't allow the drop to proceed
> concurrently, then we go find the places that block against it and
> teach them not to lock/use/examine indexes that are !indisready. That
> way, the worst case is that D-I-C is less concurrent than we'd like,
> rather than making random other things fall over - specifically,
> anything that count on our traditional guarantee that AccessShareLock
> is enough to keep the file from disappearing.

Your caution is wise. All users of an index have already checked
whether the index is usable at plan time, so although there is much
code that assumes they can look at the index itself, that is not
executed until after the correct checks.

I'll look at VACUUM and other utilities, so thanks for that.

I don't see much point in having the higher level lock, except perhaps
as a test this code works.

> In the department of minor nitpicks, the syntax synopsis you've added
> looks wrong:
>
> DROP INDEX
> [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
> [, ...] [ CASCADE | RESTRICT ]
> CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>
>
> That implies that you may write if exists, followed by 1 or more
> names, followed by cascade or restrict. After that you must write
> CONCURRENTLY, followed by exactly one name. I think what you want is:
>
> DROP INDEX [ IF EXISTS ] <replaceable
> class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
> DROP INDEX CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>
>
> ...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
> CONCURRENTLY. It could also be very useful to be able to concurrently
> drop multiple indexes at once, since that would require waiting out
> all concurrent transactions only once instead of once per drop.
> Either way, I think we should have only one syntax, and then reject
> combinations we can't handle in the code. So I think we should have
> the parser accept:
>
> DROP INDEX [ IF EXISTS ] [ CONCURRENTLY ] <replaceable
> class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
>
> And then, if you try to use cascade, or try to drop multiple indexes
> at once, we should throw an error saying that those options aren't
> supported in combination with CONCURRENTLY, rather than rejecting them
> as a syntax error at parse time. That gives a better error message
> and will make it easier for anyone who wants to extend this in the
> future - plus it will allow things like DROP INDEX CONCURRENTLY bob
> RESTRICT, which ought to be legal even if CASCADE isn't supported.

The syntax diagram is correct because there are restrictions there.

That's mostly because of problems trying to get the grammar clean and
conflict free, so that's the best I could do.

If you or anyone else know better how to do that, I'd appreciate some
insight there.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-21 22:29:58
Message-ID:	700ADBFE-DF86-4658-BC90-6C463B177B39@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote:
> On 04.01.2012 13:14, Simon Riggs wrote:
>> On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Jim Nasby<jim(at)nasby(dot)net> writes:
>>>> On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
>>>>> This could well be related to the fact that DropRelFileNodeBuffers()
>>>>> does a scan of shared_buffers, which is an O(N) approach no matter the
>>>>> size of the index.
>>>
>>>> Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
>>>
>>> No, we can't, because if they're still dirty then the bgwriter would
>>> first try to write them to the no-longer-existing storage file. It's
>>> important that we kill the buffers immediately during relation drop.
>>>
>>> I'm still thinking that it might be sufficient to mark the buffers
>>> invalid and let the clock sweep find them, thereby eliminating the need
>>> for a freelist.
>>
>> My patch puts things on the freelist only when it is free to do so.
>> Not having a freelist at all is probably a simpler way of avoiding the
>> lock contention, so I'll happily back that suggestion instead. Patch
>> attached, previous patch revoked.
>
> So, you're proposing that we remove freelist altogether? Sounds reasonable, but that needs to be performance tested somehow. I'm not sure what exactly the test should look like, but it probably should involve an OLTP workload, and large tables being created, populated to fill the cache with pages from the table, and dropped, while the OLTP workload is running. I'd imagine that the worst case scenario looks something like that.

We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-23 02:06:39
Message-ID:	CA+TgmoaOFLiAf8wEqJKv7xvf+TdybJ4wCkQxkO95uXtJ=yc8Bw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...

That's kinda my feeling as well. The free list in its current form is
pretty much useless, but I don't think we'll save much by getting rid
of it, because that's just a single test. The expensive part of what
we do while holding BufFreelistLock is, I think, iterating through
buffers taking and releasing a spinlock on each one (!). So I guess
my vote would be to leave it alone pending further study, and maybe
remove it later if we can't find a way to rejigger it to be more
useful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-23 05:12:48
Message-ID:	2096.1327295568@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>> We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...

> That's kinda my feeling as well. The free list in its current form is
> pretty much useless, but I don't think we'll save much by getting rid
> of it, because that's just a single test. The expensive part of what
> we do while holding BufFreelistLock is, I think, iterating through
> buffers taking and releasing a spinlock on each one (!).

Yeah ... spinlocks that, by definition, will be uncontested. So I think
it would be advisable to prove rather than just assume that that's a
problem.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-23 13:06:46
Message-ID:	CA+Tgmoap8ZtdQSRB0KHJds0AeFe6hCse8oBvTVCqt=iqDg7X=g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>>> We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...
>
>> That's kinda my feeling as well. The free list in its current form is
>> pretty much useless, but I don't think we'll save much by getting rid
>> of it, because that's just a single test. The expensive part of what
>> we do while holding BufFreelistLock is, I think, iterating through
>> buffers taking and releasing a spinlock on each one (!).
>
> Yeah ... spinlocks that, by definition, will be uncontested.

What makes you think that they are uncontested? Or for that matter,
that even an uncontested spinlock operation is cheap enough to do
while holding a badly contended LWLock?

> So I think
> it would be advisable to prove rather than just assume that that's a
> problem.

It's pretty trivial to prove that there is a very serious problem with
BufFreelistLock. I'll admit I can't prove what the right fix is just
yet, and certainly measurement is warranted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <jim(at)nasby(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-23 13:16:14
Message-ID:	CA+U5nMJD3Hjgb6Urvn-+s0uqZz8d+Y20dx22TbpsxRpxnRhdFQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 23, 2012 at 1:06 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> It's pretty trivial to prove that there is a very serious problem with
> BufFreelistLock. I'll admit I can't prove what the right fix is just
> yet, and certainly measurement is warranted.

I agree there is a problem with BufFreelistLock (so please share your
results with Heikki, who seems not to).

As a result, I've published patches which reduce contention on that
lock in various ways, all of which seem valid to me.

There are lots of things we could have done for 9.2 but didn't, yet we
have some things that did get done on the table right now so it would
be useful to push those through immediately or at least defer
discussion on other things until we get back around to this.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-23 16:01:52
Message-ID:	205.1327334512@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The expensive part of what
>>> we do while holding BufFreelistLock is, I think, iterating through
>>> buffers taking and releasing a spinlock on each one (!).

>> Yeah ... spinlocks that, by definition, will be uncontested.

> What makes you think that they are uncontested?

Ah, never mind. I was thinking that we'd only be touching buffers that
were *on* the freelist, but of course this is incorrect. The real
problem there is that BufFreelistLock is also used to protect the
clock sweep pointer. I think basically we gotta find a way to allow
multiple backends to run clock sweeps concurrently. Or else fix
things so that the freelist never (well, hardly ever) runs dry.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-23 16:03:28
Message-ID:	CA+TgmobeGYzbNpvf3gwyHM7Onhh75Pw6D8kUYKPjpMViSXsN=g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 23, 2012 at 11:01 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> The expensive part of what
>>>> we do while holding BufFreelistLock is, I think, iterating through
>>>> buffers taking and releasing a spinlock on each one (!).
>
>>> Yeah ... spinlocks that, by definition, will be uncontested.
>
>> What makes you think that they are uncontested?
>
> Ah, never mind. I was thinking that we'd only be touching buffers that
> were *on* the freelist, but of course this is incorrect. The real
> problem there is that BufFreelistLock is also used to protect the
> clock sweep pointer. I think basically we gotta find a way to allow
> multiple backends to run clock sweeps concurrently. Or else fix
> things so that the freelist never (well, hardly ever) runs dry.

I'd come to the same conclusion myself. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-24 14:49:14
Message-ID:	CAMkU=1xpfQRcFPx3x++KVxQ7FQO=6MYERPS22Fh5ekJztmfytA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 21, 2012 at 2:29 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote:

>> So, you're proposing that we remove freelist altogether? Sounds reasonable, but that needs to be performance tested somehow. I'm not sure what exactly the test should look like, but it probably should involve an OLTP workload, and large tables being created, populated to fill the cache with pages from the table, and dropped, while the OLTP workload is running. I'd imagine that the worst case scenario looks something like that.
>
> We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...

If the head and tail are both protected by BufFreelistLock, I'm pretty
sure this will make things worse, not better.

If we change to having head and tail protected by separate spinlocks,
then I don't see how you can remove the last buffer from the list, or
add a buffer to an empty list, without causing havoc.

Does anyone have ideas for implementing a cross-platform, lock-free,
FIFO linked list? Without that, I don't see how we are going to get
anywhere on this approach.

Cheers,

Jeff

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <jim(at)nasby(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-24 15:04:41
Message-ID:	CAMkU=1zrHLMUrdr7E9UT1nFyR_eXannJPz3T3vnN+6UBx+CEJw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 23, 2012 at 5:06 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
>>>> We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...
>>
>>> That's kinda my feeling as well. The free list in its current form is
>>> pretty much useless, but I don't think we'll save much by getting rid
>>> of it, because that's just a single test. The expensive part of what
>>> we do while holding BufFreelistLock is, I think, iterating through
>>> buffers taking and releasing a spinlock on each one (!).
>>
>> Yeah ... spinlocks that, by definition, will be uncontested.
>
> What makes you think that they are uncontested?

It is automatically partitioned over 131,072 spinlocks if
shared_buffers=1GB, so the chances of contention seem pretty low.
If a few very hot index root blocks are heavily contested, still that
would only be
encountered a few times out of the 131,072. I guess we could count
them, similar
to your LWLOCK_STATS changes.

> Or for that matter,
> that even an uncontested spinlock operation is cheap enough to do
> while holding a badly contended LWLock?

Is the concern that getting a uncontested spinlock has lots of
instructions, or that the associated fences are expensive?

>
>> So I think
>> it would be advisable to prove rather than just assume that that's a
>> problem.
>
> It's pretty trivial to prove that there is a very serious problem with
> BufFreelistLock. I'll admit I can't prove what the right fix is just
> yet, and certainly measurement is warranted.

From my own analysis and experiments, the mere act of getting the
BufFreelistLock when it is contended is far more important than
anything actually done while holding that lock. When the clock-sweep
is done often enough that the lock is contended, then it is done often
enough to keep the average usagecount low and so the average number of
buffers visited during a clock sweep before finding a usable one was
around 2.0. YMMV.

Can we get some people who run busy high-CPU servers that churn the
cache and think they may be getting hit with this problem post the
results of this query:

select usagecount, count(*) from pg_buffercache group by usagecount;

Cheers,

Jeff

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Removing freelist (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-01-24 15:07:07
Message-ID:	CA+U5nMJd35FaMGZxYkXRG+vP4C16=ESn0Zq714e1PAsPw0ZYBA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 23, 2012 at 4:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> The real
> problem there is that BufFreelistLock is also used to protect the
> clock sweep pointer.

Agreed

> I think basically we gotta find a way to allow
> multiple backends to run clock sweeps concurrently. Or else fix
> things so that the freelist never (well, hardly ever) runs dry.

Agreed.

The only question is what do we do now. I'm happy with the thought of
jam tomorrow, I'd like to see some minor improvements now, given that
when we say "we'll do that even better in the next release" it often
doesn't happen. Which is where those patches come in.

I've posted an improved lock wait analysis patch and have scheduled
some runs on heavily used systems to see what this tells us. Results
from live machines also greatly appreciated, since test systems seem
likely to be inappropriate tests. Patch copied here again.

This is a key issue since RAM is cheap enough now that people are
swamped by it, so large shared_buffers are very common.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-26 02:25:33
Message-ID:	CA+TgmoZj3WjU-eA8z4KPg5TL7LhDt0GFq7tKBHPeiXA4U9CFpg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 21, 2012 at 10:11 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Your caution is wise. All users of an index have already checked
> whether the index is usable at plan time, so although there is much
> code that assumes they can look at the index itself, that is not
> executed until after the correct checks.
>
> I'll look at VACUUM and other utilities, so thanks for that.
>
> I don't see much point in having the higher level lock, except perhaps
> as a test this code works.

I thought of another way this can cause a problem: suppose that while
we're dropping the relation with only ShareUpdateExclusiveLock, we get
as far as calling DropRelFileNodeBuffers. Just after we finish, some
other process that holds AccessShareLock or RowShareLock or
RowExclusiveLock reads and perhaps even dirties a page in the
relation. Now we've got pages in the buffer pool that might even be
dirty, but the backing file is truncated or perhaps removed
altogether. If the page is dirty, I think the background writer will
eventually choke trying to write out the page. If the page is not
dirty, I'm less certain whether anything is going to explode
violently, but it seems mighty sketchy to have pages in the buffer
pool with a buffer tag that don't exist any more. As the comment
says:

* It is the responsibility of higher-level code to ensure that the
* deletion or truncation does not lose any data that could be needed
* later. It is also the responsibility of higher-level code to ensure
* that no other process could be trying to load more pages of the
* relation into buffers.

...and of course, the intended mechanism is an AccessExclusiveLock.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-01-29 22:01:01
Message-ID:	CA+U5nMK1YzXvBWb94t_VU9Wx44QL9RZ5t6fCR=_WH_LEQRjHZw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 26, 2012 at 2:25 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Jan 21, 2012 at 10:11 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Your caution is wise. All users of an index have already checked
>> whether the index is usable at plan time, so although there is much
>> code that assumes they can look at the index itself, that is not
>> executed until after the correct checks.
>>
>> I'll look at VACUUM and other utilities, so thanks for that.
>>
>> I don't see much point in having the higher level lock, except perhaps
>> as a test this code works.
>
> I thought of another way this can cause a problem: suppose that while
> we're dropping the relation with only ShareUpdateExclusiveLock, we get
> as far as calling DropRelFileNodeBuffers. Just after we finish, some
> other process that holds AccessShareLock or RowShareLock or
> RowExclusiveLock reads and perhaps even dirties a page in the
> relation.

I can't see any way that situation can occur. The patch *explicitly*
waits until all people that can see the index as usable have dropped
their lock. So I don't think this is necessary. Having said that,
since we are talking about the index and not the whole table, if I
believe the above statement then I can't have any reasonable objection
to doing as you suggest.

Patch now locks index in AccessExclusiveLock in final stage of drop.

v3 attached.

If you have suggestions to improve grammar issues, they;re most
welcome. Otherwise this seems good to go.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
drop_index_concurrently.v3.patch	text/x-diff	23.0 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-02-01 02:56:12
Message-ID:	CA+TgmoZwbk=+wJFJcVgkAKggNxP-jfyL2K-Nj-CFRdRm8tNw3w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Jan 29, 2012 at 5:01 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I can't see any way that situation can occur. The patch *explicitly*
> waits until all people that can see the index as usable have dropped
> their lock. So I don't think this is necessary. Having said that,
> since we are talking about the index and not the whole table, if I
> believe the above statement then I can't have any reasonable objection
> to doing as you suggest.
>
> Patch now locks index in AccessExclusiveLock in final stage of drop.
>
> v3 attached.
>
> If you have suggestions to improve grammar issues, they;re most
> welcome. Otherwise this seems good to go.

I improved the grammar issues in the attached version of the patch -
the syntax is now simpler and more consistent, IF EXISTS now works,
and RESTRICT is accepted (without changing the behavior) while CASCADE
fails with a nicer error message. I also fixed a bug in
RangeVarCallbackForDropRelation.

However, testing reveals that this doesn't really work. I found that
by playing around with a couple of concurrent sessions, I could get a
SELECT statement to hang waiting on the AccessExclusiveLock in the
second phase, because we're really still opening the relation (even
though it's invalid) and thus still locking it, and thus still waiting
for the AccessExclusiveLock. And I managed to get this:

rhaas=# select * from foo where a = 1;
ERROR: could not open relation with OID 16388

So then I tried changing the lock level back to
ShareUpdateExclusiveLock. It appears that that merely narrows the
window for errors of this type, rather than getting rid of them. I
ran this in one window:

pgbench -c 30 -j 30 -f f -n -T 180

where the file f contained this:

select * from foo where a = 1

and then in the other window I repeatedly did this:

rhaas=# create index foo_a on foo (a);
CREATE INDEX
rhaas=# drop index concurrently foo_a;
DROP INDEX

and pgbench started issuing messages like this:

Client 2 aborted in state 0: ERROR: could not open relation with OID 16394
Client 9 aborted in state 0: ERROR: could not open relation with OID 16397
Client 18 aborted in state 0: ERROR: could not open relation with OID 16397

My conclusion is that regardless of whether ShareUpdateExclusiveLock
or AccessExclusiveLock is used for the final phase of the drop, this
isn't safe. I haven't tracked down exactly where the wheels are
coming off, but the problem does not occur when using the non-current
form of DROP INDEX.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
drop_index_concurrently.v4.patch	application/octet-stream	26.8 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-02-01 17:18:59
Message-ID:	CA+U5nM+c0ZZ9PB+fNh+MJT+4+_QUJb9xMk_woZ2OzbzjBM0s1A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 1, 2012 at 2:56 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I improved the grammar issues in the attached version of the patch -
> the syntax is now simpler and more consistent, IF EXISTS now works,
> and RESTRICT is accepted (without changing the behavior) while CASCADE
> fails with a nicer error message. I also fixed a bug in
> RangeVarCallbackForDropRelation.

Plus tests as well. Many thanks.

I fixed the main bug you observed and your test case now works
perfectly. I used pgbench to continuously drop/create an index, so a
little more than manual testing.

v5 Attached.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
drop_index_concurrently.v5.patch	text/x-diff	27.6 KB

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-02-01 19:31:12
Message-ID:	1328124672.28270.3.camel@vanquo.pezone.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On sön, 2012-01-29 at 22:01 +0000, Simon Riggs wrote:
> Patch now locks index in AccessExclusiveLock in final stage of drop.

Doesn't that defeat the point of doing the CONCURRENTLY business in the
first place?

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-02-01 19:39:00
Message-ID:	CA+U5nMJycjsHcu__qC4nxu7nyn2F7eF08hX=z=oHu873nzhiDg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On sön, 2012-01-29 at 22:01 +0000, Simon Riggs wrote:
>> Patch now locks index in AccessExclusiveLock in final stage of drop.
>
> Doesn't that defeat the point of doing the CONCURRENTLY business in the
> first place?

That was my initial reaction.

We lock the index in AccessExclusiveLock only once we are certain
nobody else is looking at it any more.

So its a Kansas City Shuffle, with safe locking in case of people
doing strange low level things.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-02-03 15:28:05
Message-ID:	CA+TgmoYJ8o8ZT=K=4STR1uNvebYzuP2==FuN2Xn-C_YiTMvfOA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On sön, 2012-01-29 at 22:01 +0000, Simon Riggs wrote:
>>> Patch now locks index in AccessExclusiveLock in final stage of drop.
>>
>> Doesn't that defeat the point of doing the CONCURRENTLY business in the
>> first place?
>
> That was my initial reaction.
>
> We lock the index in AccessExclusiveLock only once we are certain
> nobody else is looking at it any more.
>
> So its a Kansas City Shuffle, with safe locking in case of people
> doing strange low level things.

Yeah, I think this is much safer, and in this version that doesn't
seem to harm concurrency.

Given our previous experiences in this area, I wouldn't like to bet my
life savings on this having no remaining bugs - but if it does, I
can't find them.

I'll mark this "Ready for Committer".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jim Nasby <jim(at)nasby(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-02-12 19:30:40
Message-ID:	CAMkU=1yqTy=zeegfoVfk8hZY7cWu-rt8ubixLCa=y+Hcy4djFg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Jan 8, 2012 at 8:19 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> Not having a freelist at all is probably a simpler way of avoiding the
>> lock contention, so I'll happily back that suggestion instead. Patch
>> attached, previous patch revoked.
>
> v2 attached with cleanup of some random stuff that crept onto patch.

Hi Simon,

Based on the way this patch leaves the old code behind (using #ifdef),
this looks more like a WIP patch which you want people to do
performance testing with, rather than patch proposed for committing.
If that is the case, could you outline the type of performance testing
where you think it would make a difference (and whether it should be
done on top of the main patch from this thread, the concurrent index
drop one)?

Also, it would be much easier to do the performance testing if this
behavior was controlled by a temporarily added GUC, rather than an
#ifdef. Do you think it is feasible to do that, or would the overhead
of a single "if (some_guc)" per StrategyGetBuffer and
StrategyFreeBuffer call distort things too much?

Cheers,

Jeff

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Should I implement DROP INDEX CONCURRENTLY?
Date:	2012-03-28 13:51:54
Message-ID:	CA+TgmobFn9VvAoOr4ucJvCUJfyWzjpi-n=9mi0JiszS2HDRYJw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Feb 3, 2012 at 10:28 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>>> On sön, 2012-01-29 at 22:01 +0000, Simon Riggs wrote:
>>>> Patch now locks index in AccessExclusiveLock in final stage of drop.
>>>
>>> Doesn't that defeat the point of doing the CONCURRENTLY business in the
>>> first place?
>>
>> That was my initial reaction.
>>
>> We lock the index in AccessExclusiveLock only once we are certain
>> nobody else is looking at it any more.
>>
>> So its a Kansas City Shuffle, with safe locking in case of people
>> doing strange low level things.
>
> Yeah, I think this is much safer, and in this version that doesn't
> seem to harm concurrency.
>
> Given our previous experiences in this area, I wouldn't like to bet my
> life savings on this having no remaining bugs - but if it does, I
> can't find them.
>
> I'll mark this "Ready for Committer".

I don't think this has been committed - does that mean you've decided
against doing so, or just haven't had the round tuits?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Noah Misch <noah(at)leadboat(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Merlin Moncure <mmoncure(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-06-10 20:19:53
Message-ID:	20120610201953.GC10817@tornado.leadboat.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 24, 2011 at 03:38:09PM -0400, Tom Lane wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> > On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> >> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
> >> recently when frobbing around some indexes I realized that there is no
> >> equivalent for DROP INDEX, and this is a similar but lesser problem
> >> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
> >> EXCLUSIVE lock on the parent table while doing the work to unlink
> >> files, which nominally one would think to be trivial, but I assure you
> >> it is not at times for even indexes that are a handful of gigabytes
> >> (let's say ~=< a dozen).
>
> > Are you sure that you are really waiting on the time to unlink the
> > file? there's other stuff going on in there like waiting for lock,
> > plan invalidation, etc. Point being, maybe the time consuming stuff
> > can't really be deferred which would make the proposal moot.
>
> Assuming the issue really is the physical unlinks (which I agree I'd
> like to see some evidence for), I wonder whether the problem could be

I'd believe it. With a cold cache (sudo sysctl -w vm.drop_caches=3), my
desktop takes 2.6s to "rm" a 985 MiB file.

> addressed by moving smgrDoPendingDeletes() to after locks are released,
> instead of before, in CommitTransaction/AbortTransaction. There does
> not seem to be any strong reason why we have to do that before lock
> release, since incoming potential users of a table should not be trying
> to access the old physical storage after that anyway.

Agreed. We now have $OLD_SUBJECT, but this is a win independently. I have
reviewed the code that runs between the old and new call sites, and I did not
identify a hazard of moving it as you describe.

Attachment	Content-Type	Size
unlink-after-lockrel-v1.patch	text/plain	3.0 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-06-10 21:27:33
Message-ID:	CA+TgmoY5kcn0Y=oXF0RHg2J3XwnvFWmL0cj1W9HqpfUcvAR9RQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Jun 10, 2012 at 4:19 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Wed, Aug 24, 2011 at 03:38:09PM -0400, Tom Lane wrote:
>> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>> > On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
>> >> At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
>> >> recently when frobbing around some indexes I realized that there is no
>> >> equivalent for DROP INDEX, and this is a similar but lesser problem
>> >> (as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
>> >> EXCLUSIVE lock on the parent table while doing the work to unlink
>> >> files, which nominally one would think to be trivial, but I assure you
>> >> it is not at times for even indexes that are a handful of gigabytes
>> >> (let's say ~=< a dozen).
>>
>> > Are you sure that you are really waiting on the time to unlink the
>> > file? there's other stuff going on in there like waiting for lock,
>> > plan invalidation, etc. Point being, maybe the time consuming stuff
>> > can't really be deferred which would make the proposal moot.
>>
>> Assuming the issue really is the physical unlinks (which I agree I'd
>> like to see some evidence for), I wonder whether the problem could be
>
> I'd believe it. With a cold cache (sudo sysctl -w vm.drop_caches=3), my
> desktop takes 2.6s to "rm" a 985 MiB file.
>
>> addressed by moving smgrDoPendingDeletes() to after locks are released,
>> instead of before, in CommitTransaction/AbortTransaction. There does
>> not seem to be any strong reason why we have to do that before lock
>> release, since incoming potential users of a table should not be trying
>> to access the old physical storage after that anyway.
>
> Agreed. We now have $OLD_SUBJECT, but this is a win independently. I have
> reviewed the code that runs between the old and new call sites, and I did not
> identify a hazard of moving it as you describe.

I looked at this when we last discussed it and didn't see a problem
either, so I tend to agree that we ought to go ahead and do this,
unless someone else sees a problem. Holding locks for even slightly
less time is a good idea, and if it turns out to be substantially less
time in some cases, then we win more.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-06-10 21:37:50
Message-ID:	20664.1339364270@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Jun 10, 2012 at 4:19 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> Agreed. We now have $OLD_SUBJECT, but this is a win independently. I have
>> reviewed the code that runs between the old and new call sites, and I did not
>> identify a hazard of moving it as you describe.

> I looked at this when we last discussed it and didn't see a problem
> either, so I tend to agree that we ought to go ahead and do this,

+1, as long as you mean in 9.3 not 9.2. I don't see any risk either,
but the time for taking new risks in 9.2 is past.

Noah, please add this patch to the upcoming CF, if you didn't already.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-06-14 14:21:49
Message-ID:	CA+TgmoYtU0F_twcy9ZzcY4kAuzETontzMxn1Hncfbd-27G3QJw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Jun 10, 2012 at 5:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Jun 10, 2012 at 4:19 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>>> Agreed. We now have $OLD_SUBJECT, but this is a win independently. I have
>>> reviewed the code that runs between the old and new call sites, and I did not
>>> identify a hazard of moving it as you describe.
>
>> I looked at this when we last discussed it and didn't see a problem
>> either, so I tend to agree that we ought to go ahead and do this,
>
> +1, as long as you mean in 9.3 not 9.2. I don't see any risk either,
> but the time for taking new risks in 9.2 is past.
>
> Noah, please add this patch to the upcoming CF, if you didn't already.

I re-reviewed this and committed it.

Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything? Is it
just for extensions?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-06-14 14:45:40
Message-ID:	7743.1339685140@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything? Is it
> just for extensions?

I'm too lazy to go look, but it certainly ought to be in use.
The idea is that that's the phase for post-lock-release cleanup,
and anything that can possibly be postponed till after releasing
locks certainly should be ...

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)
Date:	2012-06-14 15:04:34
Message-ID:	CA+TgmoY6FdA=JcjM1D2TdRj2dTH=X0TdegH8BFOun04E5=Yi4w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 14, 2012 at 10:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything? Is it
>> just for extensions?
>
> I'm too lazy to go look, but it certainly ought to be in use.
> The idea is that that's the phase for post-lock-release cleanup,
> and anything that can possibly be postponed till after releasing
> locks certainly should be ...

Oh, you're right. I missed the logic in ResourceOwnerReleaseInternal.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company