amazon ec2

Lists: pgsql-performance
From: Joel Reymont <joelr1(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: amazon ec2
Date: 2011-05-03 18:48:35
Message-ID: ADB48657-780F-4465-ABE6-752C2FB3AF48@gmail.com

What are the best practices for setting up PG 9.x on Amazon EC2 to get the best performance?

Thanks in advance, Joel

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------


From: Alan Hodgson <ahodgson(at)simkin(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: amazon ec2
Date: 2011-05-03 19:41:20
Message-ID: 201105031241.20348.ahodgson@simkin.ca

On May 3, 2011 11:48:35 am Joel Reymont wrote:
> What are the best practices for setting up PG 9.x on Amazon EC2 to get the
> best performance?
>

I am also interested in tips for this. EBS seems to suck pretty bad.


From: Joel Reymont <joelr1(at)gmail(dot)com>
To: Alan Hodgson <ahodgson(at)simkin(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: amazon ec2
Date: 2011-05-03 19:43:13
Message-ID: F3B854F7-53C8-4895-99BF-EB64BD6E5B53@gmail.com


On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:

> I am also interested in tips for this. EBS seems to suck pretty bad.

Alan, can you elaborate? Are you using PG on top of EBS?



From: Alan Hodgson <ahodgson(at)simkin(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: amazon ec2
Date: 2011-05-03 20:09:51
Message-ID: 201105031309.52154.ahodgson@simkin.ca

On May 3, 2011 12:43:13 pm you wrote:
> On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
> > I am also interested in tips for this. EBS seems to suck pretty bad.
>
> Alan, can you elaborate? Are you using PG on top of EBS?
>

Trying to, yes.

Let's see ...

EBS volumes seem to vary in speed. Some are relatively fast. Some are really
slow. Some fast ones become slow randomly. Some are fast attached to one
instance, but really slow attached to another.

Fast being a relative term, though. The fast ones seem to be able to do maybe
400 random IOPS. And of course you can only get about 80MB/sec sequential
access to them on a good day.

Which is why I'm interested in how other people are doing it. So far EC2
doesn't seem well suited to running databases at all.


From: Mark Rostron <mrostron(at)ql2(dot)com>
To: Alan Hodgson <ahodgson(at)simkin(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: amazon ec2
Date: 2011-05-03 20:48:47
Message-ID: BANLkTinKbgzSGwg1GWTpwUCkwSpaCaebsA@mail.gmail.com

iowait is a problem on any platform that relies on spinning media, compared
to RAM. No matter how fast a disk is, and no matter how intelligent the
controller is, you are still dealing with an access-speed differential of
several orders of magnitude (roughly 10^5 to 10^6) between disk and memory.

I have had good results by avoiding disk access altogether. If you can do
this, EC2 is not too shabby, but beware: it doesn't come free.

It is achievable under the following circumstances (and maybe there are
other ways to do it). I use a technique of proactively querying enough of
my anticipated result set with a daemon procedure. As long as your query
daemon runs more frequently than the competing processes (e.g. ETL and
other update activity), AND a substantial part of the result set fits in
available RAM, the result set will be served from the filesystem cache at
the time you want it. I have found that it doesn't take much to make this
happen once you have identified your critical result set; you can get away
with running it once an hour, and I'm still reducing the frequency and
getting good results.

This approach basically assumes a 90/10 rule: at any point in time, you
only want to access 10% of your data. If you can work out what that 10% is,
and it will fit into RAM, then you can arrange to cache it. It also imposes
no additional cost on EC2, because Amazon doesn't bill you for CPU
activity, although the large-RAM machines do cost more. It depends on how
big your critical result set is and how much speed you need.

Don't know if this helps - the success/failure of it depends on your typical
query activity, the size of your critical result set, and whether you are
able to get enough RAM to make this work.

As I said, it doesn't come for free, but you can make it work.
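For concreteness, the pre-warming daemon can be sketched as a small shell script driven by cron. This is a hypothetical example: the connection settings, table names, and predicates are placeholders, and it assumes `psql` can reach your database.

```shell
#!/bin/sh
# warm_cache.sh - proactively read the "critical result set" so its pages
# stay in the filesystem cache. Schedule it from cron more often than the
# competing ETL/update jobs run, e.g.:
#   0 * * * * /usr/local/bin/warm_cache.sh
PSQL="psql -q -A -t -U app -d appdb"

# The query results are thrown away; the point is the read itself, which
# pulls the hot pages back into the OS cache. Table and column names here
# are illustrative only.
$PSQL -c "SELECT count(*) FROM orders
          WHERE created_at > now() - interval '7 days';" > /dev/null
$PSQL -c "SELECT count(*) FROM order_items i
          JOIN orders o ON o.id = i.order_id
          WHERE o.created_at > now() - interval '7 days';" > /dev/null
```

On the PostgreSQL versions discussed in this thread, count(*) always visits the heap (there are no index-only scans yet), so queries like these reliably touch the data pages they cover.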

As a further point, try also checking out Greenplum: it is an excellent
Postgres derivative with a very powerful free version. The reason I bring
it up is that it offers block-level compression (with caveats; it also
doesn't come for free, so do due diligence and RTFM carefully). The
compression enabled me to improve the cache hit rate, which further
reduces the iowait problem.
greenplum is also a better parallel machine than postgres, so combining the
cache technique above with greenplum compression and parallel query, i have
been able to get 20:1 reduction in response times for some of our queries.
Obviously introducing new database technology is a big deal, but we needed
the speed, and it kinda worked.

mr

On Tue, May 3, 2011 at 1:09 PM, Alan Hodgson <ahodgson(at)simkin(dot)ca> wrote:

> On May 3, 2011 12:43:13 pm you wrote:
> > On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
> > > I am also interested in tips for this. EBS seems to suck pretty bad.
> >
> > Alan, can you elaborate? Are you using PG on top of EBS?
> >
>
> Trying to, yes.
>
> Let's see ...
>
> EBS volumes seem to vary in speed. Some are relatively fast. Some are
> really
> slow. Some fast ones become slow randomly. Some are fast attached to one
> instance, but really slow attached to another.
>
> Fast being a relative term, though. The fast ones seem to be able to do
> maybe
> 400 random IOPS. And of course you can only get about 80MB/sec sequential
> access to them on a good day.
>
> Which is why I'm interested in how other people are doing it. So far EC2
> doesn't seem well suited to running databases at all.
>
>


From: Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
To: Alan Hodgson <ahodgson(at)simkin(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: amazon ec2
Date: 2011-05-03 20:52:37
Message-ID: BANLkTi=nTpG=bS+rAONQi_ZDorBMwUzN1Q@mail.gmail.com

On Tue, May 3, 2011 at 2:09 PM, Alan Hodgson <ahodgson(at)simkin(dot)ca> wrote:

> On May 3, 2011 12:43:13 pm you wrote:
> > On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
> > > I am also interested in tips for this. EBS seems to suck pretty bad.
> >
> > Alan, can you elaborate? Are you using PG on top of EBS?
> >
>
> Trying to, yes.
>
> Let's see ...
>
> EBS volumes seem to vary in speed. Some are relatively fast. Some are
> really
> slow. Some fast ones become slow randomly. Some are fast attached to one
> instance, but really slow attached to another.
>
>
I ran pgbench tests late last year comparing EC2, GoGrid, a 5-year-old lab
server, and a new server. Whether I used a stock postgresql.conf or tweaked,
the current 8.4 or 9.0, or varied the EC2 instance size, EC2 was always at
the bottom, ranging from 409.834 to 693.100 tps. GoGrid's pgbench TPS
numbers in similar tests were, on average, 3X those of EC2 (1,399.550 to
1,631.887 tps). The tests I conducted were small, with 10 connections and
5,000 total transactions. The single variable that helped pgbench tests in
EC2 was selecting an instance size where the number of cores was equal to or
greater than the number of connections used in the tests; however, this only
improved things slightly (715.931 tps).

For comparison purposes, I ran the same tests on a 24-way X5650 with 12 GB
and SAS RAID 10. This server typically ranged from 2,188.348 to 2,216.377
tps.

I attributed GoGrid's superior performance over EC2 to EC2 simply being
over-allocated, but that's just speculation on my part. To test my theory,
I had wanted to put the database on a ramdisk, or a similar device, in EC2
and GoGrid, but never got around to it.
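For reference, the shape of the runs described above (10 connections, 5,000 total transactions) looks roughly like the following; the database name and scale factor are illustrative:

```shell
# One-time setup: create and populate the pgbench tables.
# Scale factor 10 is roughly a 150 MB database.
pgbench -i -s 10 bench

# 10 client connections, 500 transactions each = 5,000 total transactions.
# pgbench reports tps both including and excluding connection setup time.
pgbench -c 10 -t 500 bench
```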

> Fast being a relative term, though. The fast ones seem to be able to do
> maybe
> 400 random IOPS. And of course you can only get about 80MB/sec sequential
> access to them on a good day.
>
> Which is why I'm interested in how other people are doing it. So far EC2
> doesn't seem well suited to running databases at all.
>
>
I ran these tests partly to convince management to give me some time to
validate our software (PG-backed) on some of the cloud providers, but with
those abysmal numbers I didn't bother at the time. I may revisit at some
point because I know Amazon, at least, has been making architecture
adjustments and updates.

Greg


From: david(at)lang(dot)hm
To: Alan Hodgson <ahodgson(at)simkin(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: amazon ec2
Date: 2011-05-03 20:58:43
Message-ID: alpine.DEB.2.00.1105031357220.25985@asgard.lang.hm

Phoronix did some benchmarks of the EC2 machines and they show pretty poor
numbers, especially on the I/O side of things:

http://www.phoronix.com/scan.php?page=article&item=amazon_ec2_round1&num=1
http://www.phoronix.com/scan.php?page=article&item=amazon_ec2_micro&num=1

David Lang

On Tue, 3 May 2011, Alan Hodgson wrote:

> Date: Tue, 3 May 2011 13:09:51 -0700
> From: Alan Hodgson <ahodgson(at)simkin(dot)ca>
> To: pgsql-performance(at)postgresql(dot)org
> Subject: Re: [PERFORM] amazon ec2
>
> On May 3, 2011 12:43:13 pm you wrote:
>> On May 3, 2011, at 8:41 PM, Alan Hodgson wrote:
>>> I am also interested in tips for this. EBS seems to suck pretty bad.
>>
>> Alan, can you elaborate? Are you using PG on top of EBS?
>>
>
> Trying to, yes.
>
> Let's see ...
>
> EBS volumes seem to vary in speed. Some are relatively fast. Some are really
> slow. Some fast ones become slow randomly. Some are fast attached to one
> instance, but really slow attached to another.
>
> Fast being a relative term, though. The fast ones seem to be able to do maybe
> 400 random IOPS. And of course you can only get about 80MB/sec sequential
> access to them on a good day.
>
> Which is why I'm interested in how other people are doing it. So far EC2
> doesn't seem well suited to running databases at all.
>
>


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: amazon ec2
Date: 2011-05-03 21:39:23
Message-ID: 4DC0760B.4060501@agliodbs.com

On 5/3/11 11:48 AM, Joel Reymont wrote:
> What are the best practices for setting up PG 9.x on Amazon EC2 to get the best performance?

Yes. Don't use EC2.

There is no "best" performance on EC2. There's not even "good
performance". Basically, EC2 is the platform for when performance
doesn't matter.

Use a dedicated server, or use a better cloud host.

http://it.toolbox.com/blogs/database-soup/how-to-make-your-database-perform-well-on-amazon-ec2-45725

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mark Rostron <mrostron(at)ql2(dot)com>
Cc: Alan Hodgson <ahodgson(at)simkin(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: amazon ec2
Date: 2011-05-03 22:39:28
Message-ID: 4DC08420.7050602@2ndquadrant.com

Mark Rostron wrote:
> the success/failure of it depends on your typical query activity, the
> size of your critical result set, and whether you are able to get
> enough RAM to make this work.

Basically, it all comes down to "does the working set of data I access
frequently fit in RAM?" If it does, it's possible to get reasonable
performance out of an EC2 instance. The EBS disks are so slow, both on
average and particularly in cases where you have contention with other
users slowing you down, that any situation where you have to use them is
never going to work well. If most of the data fits in RAM, and the CPU
resources available to your instance are sufficient to service your
queries, you might see acceptable performance.

> greenplum is also a better parallel machine than postgres, so
> combining the cache technique above with greenplum compression and
> parallel query, i have been able to get 20:1 reduction in response
> times for some of our queries.

I've also seen over a 20:1 speedup over PostgreSQL by using Greenplum's
free Community Edition server, in situations where its column store +
compression features work well on the data set. That's easiest with an
append-only workload, and the data set needs to fit within the
constraints where indexes on compressed data are useful. But if you fit
the use profile it's good at, you end up with considerable ability to
trade-off using more CPU resources to speed up queries. It effectively
increases the amount of data that can be cached in RAM by a large
multiple, and in the EC2 context (where any access to disk is very slow)
it can be quite valuable. My colleague Gabrielle wrote something about
setting this up on an earlier version of Greenplum's software at
http://blog.2ndquadrant.com/en/2010/03/installing-greenplum-sne-ec2.html
that gives an idea how that was setup.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Greg Spiegelberg <gspiegelberg(at)gmail(dot)com>
Cc: Alan Hodgson <ahodgson(at)simkin(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: amazon ec2
Date: 2011-05-03 22:44:13
Message-ID: 4DC0853D.4070401@2ndquadrant.com

Greg Spiegelberg wrote:
> I ran pgbench tests late last year comparing EC2, GoGrid, a 5 year-old
> lab server and a new server. Whether I used a stock postgresql.conf
> or tweaked, the current 8.4 or 9.0, or varied the EC2 instance size
> EC2 was always at the bottom ranging from 409.834 to 693.100 tps.
> GoGrid's pgbench TPS numbers in similar tests were, on average, 3X
> that of EC2 (1,399.550 to 1,631.887 tps). The tests I conducted were
> small with 10 connections and total 5,000 transactions. The single
> variable that helped pgbench tests in EC2 was to select an instance
> size where the number of cores was equal to or greater than the number
> of connections I used in the tests however this only improved things
> slightly (715.931 tps).

The standard pgbench test is extremely sensitive to how fast
transactions can be committed to disk. That doesn't reflect what
performance looks like on most real-world workloads, which tend toward
more reads. The fact that GoGrid is much faster at doing commits than
EC2 is interesting, but that's only one of many parameters that impact
performance on more normal workloads.

The one parameter that can change how the test runs is turning off
synchronous_commit, which pulls the commit time out of the results to
some extent. And if you'd switched pgbench to a more read-oriented
test, you'd discover it becomes extremely sensitive to the size of the
database, as set by pgbench's scale parameter during setup.
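As a sketch of those two variations (the paths and database name are illustrative, and the config edit suits the 8.4/9.0-era servers discussed in this thread):

```shell
# 1. Take commit latency mostly out of the measurement by disabling
#    synchronous commits, then reload the server configuration.
echo "synchronous_commit = off" >> "$PGDATA/postgresql.conf"
pg_ctl -D "$PGDATA" reload

# 2. A read-oriented run: -S makes pgbench issue SELECT-only transactions.
#    Results now hinge on whether the data set, sized by the scale factor
#    chosen at "pgbench -i -s N" time, fits in RAM.
pgbench -S -c 10 -t 500 bench
```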

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Jim Nasby <jim(at)nasby(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Mark Rostron <mrostron(at)ql2(dot)com>, Alan Hodgson <ahodgson(at)simkin(dot)ca>, pgsql-performance(at)postgresql(dot)org
Subject: Re: amazon ec2
Date: 2011-05-04 13:05:35
Message-ID: 00C0C53C-7042-4C05-B6CD-68BFDF400C76@nasby.net

On May 3, 2011, at 5:39 PM, Greg Smith wrote:
> I've also seen over a 20:1 speedup over PostgreSQL by using Greenplum's free Community Edition server, in situations where its column store + compression features work well on the data set. That's easiest with an append-only workload, and the data set needs to fit within the constraints where indexes on compressed data are useful. But if you fit the use profile it's good at, you end up with considerable ability to trade-off using more CPU resources to speed up queries. It effectively increases the amount of data that can be cached in RAM by a large multiple, and in the EC2 context (where any access to disk is very slow) it can be quite valuable.

FWIW, EnterpriseDB's "InfiniCache" provides the same caching benefit. The way that works is when PG goes to evict a page from shared buffers that page gets compressed and stuffed into a memcache cluster. When PG determines that a given page isn't in shared buffers it will then check that memcache cluster before reading the page from disk. This allows you to cache amounts of data that far exceed the amount of memory you could put in a physical server.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: Joel Reymont <joelr1(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: amazon ec2
Date: 2011-05-04 14:04:18
Message-ID: 4DC15CE2.8020708@peak6.com

On 05/03/2011 01:48 PM, Joel Reymont wrote:

> What are the best practices for setting up PG 9.x on Amazon EC2 to
> get the best performance?

Use EC2 and other Amazon hosting for cloud-based client access only.
Their shared disk services are universally despised by basically
everyone who has tried to use them for database hosting.

The recommended pattern is to have the scalable cloud clients access
(and memcache) a remote DB at a colo or managed services host. EBS is a
nice idea, and probably fine for things like image or video hosting, but
database access, especially for OLTP databases, will just result in
wailing and gnashing of teeth.

Just ask anyone who got bit by the recent EBS failure that spanned
*several* availability zones. For all those clients who thought they
were safe by deploying across multiple AZs, it was a rather rude awakening.

http://aws.amazon.com/message/65648/

The consensus seems to be that Amazon's cloud is fine... so long as you
stay far, far away from EBS. Apparently that needs a little more work.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com



From: Josh Berkus <josh(at)agliodbs(dot)com>
To: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: amazon ec2
Date: 2011-05-05 00:02:53
Message-ID: 4DC1E92D.5020904@agliodbs.com


> FWIW, EnterpriseDB's "InfiniCache" provides the same caching benefit. The way that works is when PG goes to evict a page from shared buffers that page gets compressed and stuffed into a memcache cluster. When PG determines that a given page isn't in shared buffers it will then check that memcache cluster before reading the page from disk. This allows you to cache amounts of data that far exceed the amount of memory you could put in a physical server.

So memcached basically replaces the filesystem?

That sounds cool, but I'm wondering if it's actually a performance
speedup. Seems like it would only be a benefit for single-row lookups;
any large reads would be a mess.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: david(at)lang(dot)hm
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: amazon ec2
Date: 2011-05-05 00:12:30
Message-ID: alpine.DEB.2.00.1105041706210.20459@asgard.lang.hm

On Wed, 4 May 2011, Josh Berkus wrote:

> Date: Wed, 04 May 2011 17:02:53 -0700
> From: Josh Berkus <josh(at)agliodbs(dot)com>
> To: postgres performance list <pgsql-performance(at)postgresql(dot)org>
> Subject: Re: [PERFORM] amazon ec2
>
>
>> FWIW, EnterpriseDB's "InfiniCache" provides the same caching benefit. The way that works is when PG goes to evict a page from shared buffers that page gets compressed and stuffed into a memcache cluster. When PG determines that a given page isn't in shared buffers it will then check that memcache cluster before reading the page from disk. This allows you to cache amounts of data that far exceed the amount of memory you could put in a physical server.
>
> So memcached basically replaces the filesystem?
>
> That sounds cool, but I'm wondering if it's actually a performance
> speedup. Seems like it would only be a benefit for single-row lookups;
> any large reads would be a mess.

I think it would depend a lot on how well the page compresses.

If the network I/O plus decompression time is faster than the seek and
read from disk, it should be a win.

I don't see why the benefit would be limited to single-row lookups;
anything within that page should behave the same.

For multi-page actions, the disk would have less of a disadvantage, as
readahead may be able to hide some of the work, while the memcache approach
would need a separate round trip for each page.

This does bring up an interesting variation on hierarchical storage:
using compressed pages in memcache rather than dedicated resources on a
local system. Thanks for the thought; I'll keep it in mind. I can think
of lots of cases where the database stores a relatively small set of values
that would compress well.
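A rough illustration of how much the "page compresses well" assumption matters: gzip an 8 kB block of repetitive row-like text versus 8 kB of incompressible random bytes.

```shell
# Build two 8 kB "pages": one of repetitive text, one of random bytes.
yes "status=shipped,region=us-east-1" | head -c 8192 > page_text.bin
head -c 8192 /dev/urandom > page_rand.bin

# Compress both. The repetitive page shrinks to a tiny fraction of 8 kB;
# the random page actually grows slightly (gzip framing overhead).
gzip -c page_text.bin > page_text.bin.gz
gzip -c page_rand.bin > page_rand.bin.gz
wc -c page_text.bin.gz page_rand.bin.gz
```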

David Lang


From: Denis de Bernardy <ddebernardy(at)yahoo(dot)com>
To: PG Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: amazon ec2
Date: 2011-05-05 00:21:14
Message-ID: 781460.25103.qm@web112419.mail.gq1.yahoo.com

----- Original Message -----

> From: Josh Berkus <josh(at)agliodbs(dot)com>
> To: postgres performance list <pgsql-performance(at)postgresql(dot)org>
> Cc:
> Sent: Thursday, May 5, 2011 2:02 AM
> Subject: Re: [PERFORM] amazon ec2
> So memcached basically replaces the filesystem?
>
> That sounds cool, but I'm wondering if it's actually a performance
> speedup.  Seems like it would only be a benefit for single-row lookups;
> any large reads would be a mess.

I've never tested with pgsql, but with MySQL it makes a *huge* difference when you're pulling data repeatedly. Multi-row lookups can be cached too (a sketch; $cache is a Memcached-style client and $db->fetchAll() stands in for your query helper):

$key = md5($query . '--' . serialize($args));
$rows = $cache->get($key);

if ($rows === false) {
    // Cache miss: run the query and store the rows for a few hours.
    $rows = $db->fetchAll($query, $args);
    $cache->set($key, $rows, 3 * 3600);
}

This is true even with MySQL's caching features turned on. You spare the DB from running identical queries over and over; memcache lets you pull those results straight from memory, leaving the DB server free to handle new queries.


From: Dave Page <dpage(at)pgadmin(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: amazon ec2
Date: 2011-05-05 07:49:47
Message-ID: BANLkTinKCu3brz498R7nP8WVmu_n_F7rUA@mail.gmail.com

On Thu, May 5, 2011 at 1:02 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> FWIW, EnterpriseDB's "InfiniCache" provides the same caching benefit. The way that works is when PG goes to evict a page from shared buffers that page gets compressed and stuffed into a memcache cluster. When PG determines that a given page isn't in shared buffers it will then check that memcache cluster before reading the page from disk. This allows you to cache amounts of data that far exceed the amount of memory you could put in a physical server.
>
> So memcached basically replaces the filesystem?

No, it sits in between shared buffers and the filesystem, effectively
providing an additional layer of extremely large, compressed cache.
Even on a single server there can be benefits over larger shared
buffers due to the compression.

> That sounds cool, but I'm wondering if it's actually a performance
> speedup.  Seems like it would only be a benefit for single-row lookups;
> any large reads would be a mess.

Depends on the database and the workload. If you can fit your entire
100GB database in cache, and your workload is read-intensive, then the
speedups are potentially huge (I've seen benchmarks showing 20x+).
Write-intensive workloads benefit less, and similarly if the working set
is far larger than your cache size.

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company