Re: "large" spam tables and performance: postgres memory parameters

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Gary Warner" <gar(at)cis(dot)uab(dot)edu>,<pgsql-performance(at)postgresql(dot)org>
Subject: Re: "large" spam tables and performance: postgres memory parameters
Date: 2010-01-07 17:35:48
Message-ID: 4B45C714020000250002DEF4@gw.wicourts.gov
Lists: pgsql-performance

Welcome out of the shadows, Gary! ;-)

Gary Warner <gar(at)cis(dot)uab(dot)edu> wrote:

> My biggest question mark there really has to do with how many
> users I have and how that might alter the results.

In benchmarks I've run with our software load, I've found that I get
best throughput when I use a connection pool which limits the active
database transaction count to (2 * CPU_count) + effective_spindles.
CPU_count should be fairly obvious; effective_spindles is
essentially "what's the maximum number of random read requests your
disk subsystem can productively handle concurrently?"
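
Just to make the arithmetic concrete, here's a trivial sketch of that
rule of thumb; the CPU and spindle counts are made up for illustration:

    # Rough sketch of the (2 * CPU_count) + effective_spindles rule of thumb.
    # The hardware numbers below are hypothetical.
    cpu_count = 8              # cores in the database server
    effective_spindles = 12    # concurrent random reads the I/O subsystem handles well
    pool_size = (2 * cpu_count) + effective_spindles
    print(pool_size)           # 28 -- a starting point for the pool limit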

One or two others commented that their benchmark results seemed to
fit with that formula. I don't know just how far to trust it as a
generalization, but in the absence of anything else, it probably
isn't a horrible rule of thumb. If you expect to have more logical
connections than that number, you might want to establish a
connection pool which limits to that number. Be sure that if it is
"full" when a request to start a transaction comes in, the request
queues instead of failing.
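
I'm not describing any particular pooler's API here, but the "queue
instead of fail" behavior I mean looks roughly like this sketch in
Python (connect_fn, the DSN, and the size are placeholders):

    import queue

    class BlockingPool:
        # Minimal sketch: a fixed-size pool whose checkout blocks (queues)
        # when every connection is busy, instead of raising an error.
        def __init__(self, connect_fn, size):
            self._idle = queue.Queue()
            for _ in range(size):
                self._idle.put(connect_fn())   # pre-open `size` connections

        def acquire(self, timeout=None):
            # Queue.get() blocks until a connection is returned,
            # which is exactly the queuing behavior described above.
            return self._idle.get(timeout=timeout)

        def release(self, conn):
            self._idle.put(conn)

    # e.g. pool = BlockingPool(lambda: psycopg2.connect(dsn), size=28)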

To convince yourself that this really helps, picture a hypothetical
machine which can only make progress on one request at a time, but
will task-switch among as many requests as are presented. Then
picture 100 requests being presented simultaneously, each of which
needs one second of time to complete. Even without figuring in the
overhead of task switching or the cache effects, it's clear that a
connection "pool" of one connection improve response time by almost
50% with no cost in throughput. When you factor in the very real
costs of having large numbers of requests competing, both throughput
and response time win with connection pooling above some threshold.
Of course, with multiple CPUs, multiple spindles, network latency,
etc., the pool should be large enough to tend to keep them all busy.
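
Putting rough numbers on that thought experiment (idealized, ignoring
switching overhead and cache effects):

    n, work = 100, 1.0                    # 100 requests, 1 second of work each

    # Pure time-slicing: every request finishes near the end, so each one
    # sees roughly the full n * work seconds of response time.
    avg_timesliced = n * work             # ~100 s average

    # Pool of one (run them back to back): request i finishes at i seconds.
    avg_pooled = sum(i * work for i in range(1, n + 1)) / n   # 50.5 s average

    print(avg_pooled / avg_timesliced)    # ~0.505 -- almost a 50% improvement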

Of course, the exact point at which a connection pool gives optimal
performance depends on so many factors that the only way to *really*
get it right is to test with a realistic load through your actual
software. The above is just intended to suggest a reasonable
starting point.

-Kevin
