Re: postgresql roadmap for horizontal scalability?

From: "Mike Rylander" <mrylander(at)gmail(dot)com>
To: Timasmith <timasmith(at)hotmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: postgresql roadmap for horizontal scalability?
Date: 2006-12-03 20:40:10
Message-ID: b918cf3d0612031240s5da6a9a6pdcdaaf846007e15c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1 Dec 2006 09:43:57 -0800, Timasmith <timasmith(at)hotmail(dot)com> wrote:
> Hi,
>
> This database has exceeded all expectations and perhaps I am getting
> ahead of myself but is there anything in the roadmap to match Oracles
> ability to cluster databases across multiple servers for concurrent
> access?
>

My preferred solution is Slony-1 + pgPool (more below).

> The reason I ask is for the architecture of an open source database
> project I am working on. It is tempting to ignore caching altogether
> and push the onus on to the database to perform more database queries.
> This is makes life easier and allows the focus to be on the business
> logic.
>
> However if Postgresql is the datababase and the only means to scale is
> to buy a bigger single server then that could become a problem of cost
> to the community. It would be cheaper to scale across multiple smaller
> servers (I think).

It surely is, depending on your sync speed requirements.

>
> The application I am working on is OLTP with say 90% reads to writes
> ratio.
>

That's very similar to the read/write ratio of my app. Here's what we do:

We've set up
1 master read/write DB
2 (soon to be 3) slave read-only DBs
cascading pgPool setup in "external replication" mode

We spread the load over all machines using the built in pgPool
weighting algorithm, giving the master machine about 1/6 of the read
only traffic (it also gets all in-transaction traffic per the pgPool
config) and spread the rest of the load over the other machines.
Behind the cluster, taking care of keeping everyone in sync, is a
simple Slony-1 setup. It's fairly simple to get going once you wrap
your head around it, and we've yet to see a single problem that we
didn't cause by not reading the documentation.

One thing to watch out for, though, is the replication lag. We see
anywhere from 0.5 to 3 seconds of lag on our setup, and we expect to
start seeing a little more when we add the third slave machine. The
one sure way to counteract this is to always go to the master whenever
you're building a page that changes any data. Those should be few and
far between, realistically, but will cause more than the expected load
that the pgPool config would suggest. Giving the vast majority of the
read-only load to the slaves has mitigated the load issue for us, and
I expect would do the same for you.

Hope that helps. :)

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--
Mike Rylander
mrylander(at)gmail(dot)com
GPLS -- PINES Development
Database Developer
http://open-ils.org

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Sabino Mullane 2006-12-03 22:59:33 Re: Notify enhancement
Previous Message Oleg Bartunov 2006-12-03 19:50:13 Re: Problems to create the portuguese dictionary