Re: Should I implement DROP INDEX CONCURRENTLY?

From: Jim Nasby <jim(at)nasby(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should I implement DROP INDEX CONCURRENTLY?
Date: 2012-01-04 03:59:03
Message-ID: 06CF0B08-8E1E-4E47-81F3-90C6771D58F9@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jan 3, 2012, at 7:34 PM, Robert Haas wrote:
> On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jim Nasby <jim(at)nasby(dot)net> writes:
>>> Yeah, but the problem we run into is that with every backend trying to run the clock on it's own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
>>
>> Well, maybe, but I think the historical evidence suggests that that
>> approach will be a loser, simply because no matter how cheap, the
>> freelist will remain a centralized and heavily contended data structure.
>> IMO we need to be looking for a mechanism that has no single point of
>> contention, and modifying the clock sweep rules looks like the best way
>> to get there.
>>
>> Still, he who wants to do the work can try whatever approach he feels
>> like.
>
> It might be possible to partition the free list. So imagine, for
> example, 8 free lists. The background writer runs the clock sweep and
> finds some buffers that are about ready to be reallocated and
> distributes one-eighth of them to each free list. Then, when a
> backend wants to allocate a buffer, it picks a free list somehow
> (round robin?) and pulls a buffer off it. If the buffer turns out to
> have been used since it was added to the free list, we give up on it
> and try again. This hopefully shouldn't happen too often, though, as
> long as we only add enough buffers to the free list to satisfy the
> requests that we expect to get over the next
> small-fraction-of-a-second.
>
> Of course you have to think about what happens if you find that your
> chosen free list is empty. In that case you probably want to try a
> different one, and if in the worst case where they're all empty, run
> the clock sweep in the foreground. You probably also want to kick the
> background writer in the pants if even a single one is empty (or
> nearly empty?) and tell it to hurry up and add some more buffers to
> the freelist.

If it comes down to it, we can look at partitioning the free list. But here's the thing: this is the strategy that FreeBSD (and I think now Linux as well) use to service memory requests, be they for free memory or for reading data from disk. If it's good enough for an OS to use, I would expect we could make it work as well.

I would expect that pulling a page off the free list would be an extremely fast operation... lock the read lock on the list (we should probably have separate locks for putting stuff on the list vs taking it off), pin the buffer (which shouldn't be contentious), update the read pointer and drop the read lock. Even in C code that's probably less than 100 machine instructions, and no looping. Compared to the cost of running a clock sweep it should be significantly faster. So while there will be contention on the free list read lock, none of that contention should last very long.

Do we have any other places where we take extremely short locks like this and still run into problems?
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2012-01-04 05:47:50 PL/Perl Does not Like vstrings
Previous Message Tom Lane 2012-01-04 02:38:38 Re: controlling the location of server-side SSL files