Re: bg worker: general purpose requirements

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bg worker: general purpose requirements
Date: 2010-09-17 20:49:44
Message-ID: 4C93D468.1060207@bluegap.ch
Lists: pgsql-hackers

Robert,

On 09/17/2010 05:52 PM, Robert Haas wrote:
> Technically, you could start an autonomous transaction from within an
> autonomous transaction, so I don't think there's a hard maximum of one
> per normal backend. However, I agree that the expected case is to not
> have very many.

Thanks for pointing that out. I somehow knew that was wrong.

> I guess it depends on what your goals are.

Agreed.

> If you're optimizing for
> ability to respond quickly to a sudden load, keeping idle backends
> will probably win even when the number of them you're keeping around
> is fairly high. If you're optimizing for minimal overall resource
> consumption, though, you'll not be as happy about that.

What resources are we talking about here? Are idle backends really that
resource-hungry? My feeling so far has been that idle processes are
relatively cheap (e.g. some 100 idle processes shouldn't hurt on a
modern server).

> What I'm
> struggling to understand is this: if there aren't any preforked
> workers around when the load hits, how much does it slow things down?

As the startup code is largely the same as that of the current
autovacuum launcher (avlauncher), the coordinator can only request one
bgworker at a time.

This means the signal needs to reach the postmaster, which then forks a
bgworker process. That new process starts up, connects to the requested
database and then sends an imessage to the coordinator to register. Only
after receiving that registration can the coordinator request another
bgworker (note that this is a single global limit, not one per
database).
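
To make the cost of this handshake concrete, here is a minimal C sketch
that models it as strictly serialized round trips. It assumes the four
steps above dominate startup time; the struct, field and function names
are hypothetical, and the latencies are placeholders to be filled in by
measurement, not numbers taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-step latencies (microseconds) of one bgworker startup,
 * following the sequence described above. Placeholders, not measurements. */
typedef struct
{
    double signal_postmaster_us; /* coordinator signals the postmaster */
    double fork_us;              /* postmaster forks the new process */
    double connect_db_us;        /* worker connects to the requested database */
    double register_us;          /* worker's imessage reaches the coordinator */
} StartupCost;

/* Latency of a single request/registration round trip. */
static double
startup_round_trip_us(const StartupCost *c)
{
    return c->signal_postmaster_us + c->fork_us +
           c->connect_db_us + c->register_us;
}

/* Time until `needed` new workers are registered when only one startup
 * request may be outstanding at a time (globally, not per database):
 * the round trips cannot overlap, so their latencies simply add up. */
static double
serialized_ramp_up_us(const StartupCost *c, size_t needed)
{
    assert(c != NULL);
    return (double) needed * startup_round_trip_us(c);
}

/* Same, but assuming `idle` preforked workers are already connected to
 * the right database: only the shortfall goes through the slow path. */
static double
ramp_up_with_spares_us(const StartupCost *c, size_t needed, size_t idle)
{
    size_t missing = needed > idle ? needed - idle : 0;
    return serialized_ramp_up_us(c, missing);
}
```

Under this model, starting k workers from an empty pool costs k full
round trips, while idle spares reduce the wait to the shortfall only.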

I haven't measured the actual time it takes, but given the use case of a
connection pool, I had so far assumed it was obvious that this process
takes too long.

(It's exactly what Apache's prefork does, no? Is anybody concerned about
the idle processes there? Or do they consume far fewer resources?)
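
For comparison, a prefork server keeps the number of idle processes
within a band. A minimal C sketch of such a policy follows, assuming a
coordinator that periodically compares its idle workers against a lower
and an upper bound. The names min_spare, max_spare and max_workers are
hypothetical stand-ins, loosely modelled on Apache's
MinSpareServers/MaxSpareServers and on the max_spare_background_workers
setting discussed in this thread.

```c
#include <assert.h>

/* Hypothetical pool limits; not the actual GUCs of the bgworker patch. */
typedef struct
{
    int min_spare;   /* keep at least this many idle workers */
    int max_spare;   /* terminate idle workers beyond this many */
    int max_workers; /* hard cap on busy + idle workers */
} SparePolicy;

/* Returns how many workers to start (> 0) or terminate (< 0),
 * or 0 if the number of idle workers is already within bounds. */
static int
spare_pool_adjustment(const SparePolicy *p, int busy, int idle)
{
    assert(p->min_spare <= p->max_spare);
    assert(busy >= 0 && idle >= 0);

    if (idle < p->min_spare)
    {
        int wanted = p->min_spare - idle;
        int room = p->max_workers - (busy + idle);

        if (room < 0)
            room = 0;
        /* never exceed the overall cap, even if below min_spare */
        return wanted < room ? wanted : room;
    }
    if (idle > p->max_spare)
        return -(idle - p->max_spare);
    return 0;
}
```

Setting both bounds to 0 means no idle workers are kept, so every request
has to go through the slow, serialized startup path.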

> I would have thought that a few seconds to ramp up to speed after an
> extended idle period (5 minutes, say) would be acceptable for most of
> the applications you mention.

A few seconds? That might be sufficient for autovacuum, but most queries
complete in less than one second. So for parallel querying, autonomous
transactions and Postgres-R, I certainly don't think a delay of a few
seconds is reasonable, especially given how little idle backends cost.

> Is the ramp-up time longer than that,
> or is even that much delay unacceptable for Postgres-R, or is there
> some other aspect to the problem I'm failing to grasp? I can tell you
> have some experience tuning this so I'd like to try to understand
> where you're coming from.

I never compared against a max_spare_background_workers = 0
configuration, so I don't have any hard numbers, sorry.

> I think this is an interesting example, and worth some further
> thought. I guess I don't really understand how Postgres-R uses these
> bgworkers.

The example doesn't apply only to Postgres-R. But with fewer bgworkers
in total, you are more likely to want to use them all for one database,
yes.

> Are you replicating one transaction at a time, or how does
> the data get sliced up?

Yes, one transaction at a time: one transaction per backend (bgworker).
On a cluster with n nodes that only performs writing transactions, at an
average of m concurrent transactions per node, you ideally end up with
m normal backends and (n-1) * m bgworkers per node that concurrently
apply the remote transactions.
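
The arithmetic can be written down as a small C helper. It assumes, as
the text does, a write-only workload spread evenly across the nodes; the
type and function names are hypothetical.

```c
#include <assert.h>

/* Processes one node needs to keep all transactions running concurrently. */
typedef struct
{
    unsigned local_backends;     /* normal backends serving local clients */
    unsigned applying_bgworkers; /* bgworkers applying remote transactions */
} NodeProcessCount;

/* nodes = n, writers_per_node = m: each node runs its own m transactions
 * and applies the m transactions of each of the other n - 1 nodes. */
static NodeProcessCount
ideal_processes_per_node(unsigned nodes, unsigned writers_per_node)
{
    assert(nodes >= 1);

    NodeProcessCount c;
    c.local_backends = writers_per_node;
    c.applying_bgworkers = (nodes - 1) * writers_per_node;
    return c;
}
```

Per node that adds up to m + (n-1) * m = n * m transaction-processing
processes, of which only m serve the node's own clients; for instance,
4 nodes with 10 concurrent writers each would need 30 bgworkers per node.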

> I remember you mentioning
> sync/async/eager/other replication strategies previously - do you have
> a pointer to some good reading on that topic?

Postgres-R is mainly eager multi-master replication. www.postgres-r.org
has some links; the most up-to-date is my concept paper:
http://www.postgres-r.org/downloads/concept.pdf

> That seems like it would be useful, too.

Okay, I'll try to come up with something soon(ish).

Thank you for your feedback and constructive criticism.

Regards

Markus Wanner
