patch: avoid heavyweight locking on hash metapage

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: patch: avoid heavyweight locking on hash metapage
Date: 2012-05-30 22:14:41
Message-ID: CA+Tgmoaf=nOJxLyzGcbrrY+pe-0VLL0vfHi6tjdM3fFtVwsOmw@mail.gmail.com
Lists: pgsql-hackers

I developed the attached patch to avoid taking a heavyweight lock on
the metapage of a hash index. Instead, an exclusive buffer content
lock is viewed as sufficient permission to modify the metapage, and a
shared buffer content lock is used when such modifications need to be
prevented. For the most part this is a trivial change, because we
were already taking these locks: we were just taking the heavyweight
locks in addition. The only sticking point is that, when we're
searching or inserting, we previously locked the bucket before
releasing the heavyweight metapage lock, which is unworkable when
holding only a buffer content lock because (1) we might deadlock and
(2) buffer content locks can't be held for long periods of time even
when there's no deadlock risk. To fix this, I implemented a simple
loop-and-retry system: we release the metapage content lock, acquire
the heavyweight lock on the target bucket, and then reacquire the
metapage content lock and check that the bucket mapping has not
changed. Normally it hasn't, and we're done. But if by chance it
has, we simply unlock the metapage, release the heavyweight lock we
acquired previously, lock the new bucket, and loop around again. Even
in the worst case we cannot loop very many times here, since we don't
split the same bucket again until we've split all the other buckets,
and 2^N gets big pretty fast.
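To make the control flow concrete, here is a rough sketch of that
loop-and-retry logic in the style of the existing hash AM routines
(_hash_getbuf, _hash_getlock, _hash_hashkey2bucket, and so on). This is
simplified for illustration rather than lifted from the attached patch,
and the function name sketch_lock_bucket is made up:

#include "postgres.h"
#include "access/hash.h"

/*
 * Simplified sketch: find the bucket for a hash key and take the
 * heavyweight lock on it, using only buffer content locks on the
 * metapage.  Returns with the metapage pinned and share-locked and
 * with the heavyweight bucket lock held.
 */
static Buffer
sketch_lock_bucket(Relation rel, uint32 hashkey,
                   Bucket *bucketp, BlockNumber *blknop)
{
    Buffer       metabuf;
    HashMetaPage metap;
    Bucket       bucket;
    BlockNumber  blkno;

    /* Read the metapage under a shared buffer content lock. */
    metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
    metap = HashPageGetMeta(BufferGetPage(metabuf));

    for (;;)
    {
        /* Compute the target bucket from the current metapage. */
        bucket = _hash_hashkey2bucket(hashkey,
                                      metap->hashm_maxbucket,
                                      metap->hashm_highmask,
                                      metap->hashm_lowmask);
        blkno = BUCKET_TO_BLKNO(metap, bucket);

        /*
         * Drop the content lock (keeping the pin) before taking the
         * heavyweight bucket lock; we must not wait for a heavyweight
         * lock while holding a buffer content lock.
         */
        LockBuffer(metabuf, BUFFER_LOCK_UNLOCK);
        _hash_getlock(rel, blkno, HASH_SHARE);

        /* Reacquire the content lock and recheck the bucket mapping. */
        LockBuffer(metabuf, BUFFER_LOCK_SHARE);
        if (_hash_hashkey2bucket(hashkey,
                                 metap->hashm_maxbucket,
                                 metap->hashm_highmask,
                                 metap->hashm_lowmask) == bucket)
            break;              /* mapping unchanged, so we're done */

        /*
         * The key now maps to a different bucket (a split moved it);
         * release the lock we took and go around again.  Releasing a
         * heavyweight lock never blocks, so it's safe to do while
         * holding the content lock.
         */
        _hash_droplock(rel, blkno, HASH_SHARE);
    }

    *bucketp = bucket;
    *blknop = blkno;
    return metabuf;
}

The actual patch of course modifies the existing search and insert
paths (_hash_first and _hash_doinsert) rather than adding a helper
like this.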

I tested the effect of this by setting up a series of 5-minute
read-only pgbench runs at scale factor 300 with 8GB of shared buffers
on the IBM POWER7 machine. For these runs, I dropped the primary
key constraint on pgbench_accounts (aid) and created a hash index on
that column instead. I ran each test three times and took the median
result. Here are the results on unpatched master, at various client
counts:

m01 tps = 9004.070422 (including connections establishing)
m04 tps = 34838.126542 (including connections establishing)
m08 tps = 70584.356826 (including connections establishing)
m16 tps = 128726.248198 (including connections establishing)
m32 tps = 123639.248172 (including connections establishing)
m64 tps = 104650.296143 (including connections establishing)
m80 tps = 88412.736416 (including connections establishing)

And here are the results with the patch:

h01 tps = 9110.561413 (including connections establishing) [+1.2%]
h04 tps = 36012.787524 (including connections establishing) [+3.4%]
h08 tps = 72606.302993 (including connections establishing) [+2.9%]
h16 tps = 141938.762793 (including connections establishing) [+10%]
h32 tps = 205325.232316 (including connections establishing) [+66%]
h64 tps = 274156.881975 (including connections establishing) [+162%]
h80 tps = 291224.012066 (including connections establishing) [+229%]

Obviously, even with this change, there's a lot not to like about hash
indexes: they still aren't crash-safe, and they still won't perform
as well under high concurrency as btree indexes. But neither of those
problems seems like a good reason not to fix this one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
hash-avoid-heavyweight-metapage-locks-v1.patch application/octet-stream 14.2 KB
