Re: [Testperf-general] Re: First set of OSDL Shared Memscalability results, some wierdness ...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Timothy D(dot) Witham" <wookie(at)osdl(dot)org>, josh(at)agliodbs(dot)com, pgsql-performance(at)postgresql(dot)org, testperf-general(at)pgfoundry(dot)org
Subject: Re: [Testperf-general] Re: First set of OSDL Shared Memscalability results, some wierdness ...
Date: 2004-10-15 16:48:14
Message-ID: 10202.1097858894@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> Speculating wildly because I don't know that portion of the code this might
> be:
> CONJECTURE 1: the act of searching for a block in cache is an O(n)
> operation, not an O(1) or O(log n) operation

I'm not sure how this meme got into circulation, but I've seen a couple
of people recently either conjecturing or asserting that. Let me remind
people of the actual facts:

1. We use a hashtable to keep track of which blocks are currently in
shared buffers. Either a cache hit or a cache miss should be O(1),
because the hashtable size is scaled proportionally to shared_buffers,
and so the number of hash entries examined should remain constant.

2. There are some allegedly-not-performance-critical operations that do
scan through all the buffers, and therefore are O(N) in shared_buffers.

I just eyeballed all the latter, and came up with this list of O(N)
operations and their call points:

AtEOXact_Buffers
transaction commit or abort
UnlockBuffers
transaction abort, backend exit
StrategyDirtyBufferList
background writer's idle loop
FlushRelationBuffers
VACUUM
DROP TABLE, DROP INDEX
TRUNCATE, CLUSTER, REINDEX
ALTER TABLE SET TABLESPACE
DropRelFileNodeBuffers
TRUNCATE (only for ON COMMIT TRUNC temp tables)
REINDEX (inplace case only)
smgr_internal_unlink (ie, the tail end of DROP TABLE/INDEX)
DropBuffers
DROP DATABASE

The fact that the first two are called during transaction commit/abort
is mildly alarming. The constant factors are going to be very tiny
though, because what these routines actually do is scan backend-local
status arrays looking for locked buffers, which they're not going to
find very many of. For instance AtEOXact_Buffers looks like

int i;

for (i = 0; i < NBuffers; i++)
{
if (PrivateRefCount[i] != 0)
{
// some code that should never be executed at all in the commit
// case, and not that much in the abort case either
}
}

I suppose with hundreds of thousands of shared buffers this might get to
the point of being noticeable, but I've never seen it show up at all in
profiling with more-normal buffer counts. Not sure if it's worth
devising a more complex data structure to aid in finding locked buffers.
(To some extent this code is intended to be belt-and-suspenders stuff
for catching omissions elsewhere, and so a more complex data structure
that could have its own bugs is not especially attractive.)

The one that's bothering me at the moment is StrategyDirtyBufferList,
which is a new overhead in 8.0. It wouldn't directly affect foreground
query performance, but indirectly it would hurt by causing the bgwriter
to suck more CPU cycles than one would like (and it holds the BufMgrLock
while it's doing it, too :-(). One easy way you could see whether this
is an issue in the OSDL test is to see what happens if you double all
three bgwriter parameters (delay, percent, maxpages). This should
result in about the same net I/O demand from the bgwriter, but
StrategyDirtyBufferList will be executed half as often.

I doubt that the other ones are issues. We could improve them by
devising a way to quickly find all buffers for a given relation, but
I am just about sure that complicating the buffer management to do so
would be a net loss for normal workloads.

> For the record, what I think we need is dynamically resizable
> shared_buffers, not a-priori knowledge of what you should set
> shared_buffers to.

This isn't likely to happen because the SysV shared memory API isn't
conducive to it. Absent some amazingly convincing demonstration that
we have to have it, the effort of making it happen in a portable way
isn't going to get spent.

> I've been thinking about implementing a scheme that helps you decide how
> big the shared_buffers SHOULD BE, by making the LRU list bigger than the
> cache itself, so you'd be able to see whether there is beneficial effect in
> increasing shared_buffers.

ARC already keeps such a list --- couldn't you learn what you want to
know from the existing data structure? It'd be fairly cool if we could
put out warnings "you ought to increase shared_buffers" analogous to the
existing facility for noting excessive checkpointing.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yann Michel 2004-10-15 16:49:39 Re: plans for bitmap indexes?
Previous Message Gaetano Mendola 2004-10-15 16:39:36 Re: Why we still see some reports of "could not access transaction

Browse pgsql-performance by date

  From Date Subject
Next Message Richard_D_Levine 2004-10-15 16:54:44 Does PostgreSQL run with Oracle?
Previous Message Merlin Moncure 2004-10-15 16:22:40 Re: Performance on Win32 vs Cygwin