Re: bg worker: general purpose requirements

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Markus Wanner <markus(at)bluegap(dot)ch>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bg worker: general purpose requirements
Date: 2010-09-25 18:03:53
Message-ID: AANLkTi=Z5JupJXX53jcCyvs-bLwNvCHJhEc=xSD7Q35d@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 18, 2010 at 4:21 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> (It's exactly what apache pre-fork does, no? Is anybody concerned about the
>> idle processes there? Or do they consume much less resources?)
....
>
> I don't know whether an idle Apache worker consumes more or less
> memory than an idle PostgreSQL worker, but another difference between
> the Apache case and the PostgreSQL case is that presumably all those

Apache, like Postgres, handles a lot of different use cases and the
ideal configuration depends heavily on how you use it. The default
configs for Apache are meant to be flexible and handle a mixed
workload where some requests are heavyweight scripts which might have
waits on a database and others are lightweight requests for static
objects. This is a hard configuration to get right but the default is
to ramp up the number of processes dynamically in the hopes of
reaching some kind of equilibrium.

Generally the recommended approach for a high traffic site is to use a
dedicated Apache or thttpd or equivalent install for the static
objects -- this one would have hundreds of workers or threads or
whatever and each one would be fairly lightweight. In fact nearly all
the RAM can be shared, and forking a new process costs too much
relative to serving static content from cache for it to be worth
letting the number scale dynamically. If you have 200 processes each of which has
only a few kB of private RAM then nearly all the RAM is available for
filesystem cache and requests can be served in milliseconds (mostly
network latency).

Then the heavyweight scripts can run on a dedicated Apache install
where the total number of processes is limited to something sane like
a small multiple of the number of cores -- basically RAM / RAM
required to run the interpreter. If you have 20 processes and each
uses 40 MB then your 2GB machine has about half its RAM available
for filesystem cache or other uses. Again you want to run with the 20
processes always running -- this time because the interpreter startup
is usually quite slow.
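
As a rough sketch of the arithmetic behind those two dedicated installs
(the function names are made up for illustration, and "a few kB" is
assumed to mean 4 kB per process, which is only a guess):

```python
KB = 1024
MB = 1024 * KB


def ram_left_for_cache(total_ram, workers, private_per_worker):
    """Bytes left over once every worker's private memory is counted."""
    return total_ram - workers * private_per_worker


def max_workers(ram_budget, private_per_worker):
    """How many workers fit in a budget: RAM / RAM needed per interpreter."""
    return ram_budget // private_per_worker


total = 2048 * MB  # the 2GB machine from the example

# Heavyweight script server: 20 processes at 40 MB each.
script_left = ram_left_for_cache(total, 20, 40 * MB)

# Static-content server: 200 processes, 4 kB private each (assumed figure).
static_left = ram_left_for_cache(total, 200, 4 * KB)

print(script_left // MB)                # 1248
print(round(script_left / total, 2))    # 0.61
print(round(static_left / total, 4))    # 0.9996
print(max_workers(1024 * MB, 40 * MB))  # 25
```

So the script server leaves roughly 1.2 GB for filesystem cache, while
the static server's private memory is close to a rounding error. It
ignores shared pages, kernel overhead and everything else on the box,
so treat the numbers as ballpark only.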

The dynamic ramp-up is a feature to deal with the default install and
with use cases where the system has lots of different users with
different needs.

--
greg
