Re: Postgresql replication

From: Carlos Henrique Reimer <carlosreimer(at)yahoo(dot)com(dot)br>
To: pgsql-general(at)postgresql(dot)org
Subject: Postgresql replication
Date: 2005-08-24 12:21:50
Message-ID: 20050824122150.14159.qmail@web53214.mail.yahoo.com
Lists: pgsql-general

Hello,

Currently we have only one database, accessed by the headquarters and two branches, but performance in the branches is very poor and I was asked to find a way to improve it.

One possible solution is to replicate the headquarters DB into the two branches.

I read about Slony-I, but then the replicated DBs will be read-only.

PGCluster is a synchronous solution and I think it is not suited for us because the replicated DBs will be located remotely and we have a lot of updates on the DBs.

I think I'm looking for a master-slave asynchronous solution. I know pgReplicator can do it, but I think the project is not active any more.

Are there other solutions?

Thanks in advance!

Reimer



From: Michael Meskes <meskes(at)postgresql(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-24 13:31:34
Message-ID: 200508241531.34314.meskes@postgresql.org
Lists: pgsql-general

On Wednesday, 24 August 2005 14:21, Carlos Henrique Reimer wrote:
> One possible solution is to replicate the headquarters DB into the two
> branches.
>
> I read about Slony-I, but then the replicated DBs will be read-only.

That's because it's master-slave replication. If you could sync the slave
back to the master, it would be a master itself.

> I think I'm looking for a master-slave asynchronous solution. I know
> pgReplicator can do it, but I think the project is not active any more.

But Slony does master/slave replication.

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes(at)jabber(dot)org
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-24 15:54:42
Message-ID: 60fyszpby5.fsf@dba2.int.libertyrms.com
Lists: pgsql-general

carlosreimer(at)yahoo(dot)com(dot)br (Carlos Henrique Reimer) writes:
> Currently we have only one database, accessed by the headquarters and
> two branches, but performance in the branches is very poor and I was
> asked to find a way to improve it.
>
> One possible solution is to replicate the headquarters DB into the two
> branches.
>
> I read about Slony-I, but then the replicated DBs will be read-only.

Correct.

> PGCluster is a synchronous solution and I think it is not suited for us
> because the replicated DBs will be located remotely and we have a lot
> of updates on the DBs.

Unfortunately, PGCluster isn't much maintained anymore.

> I think I'm looking for a master-slave asynchronous solution. I
> know pgReplicator can do it, but I think the project is not active
> any more.

Slony-I is a master/slave asynchronous replication system; if you
already considered it unsuitable, then I see little likelihood of
other systems with the same sorts of properties being suitable.

What could conceivably be of use to you would be a *multimaster*
asynchronous replication system. Unfortunately, multimaster
*anything* is a really tough nut to crack.

There is an ongoing Slony-II project that is trying to construct a
more-or-less synchronous multimaster replication system (where part of
the cleverness involves getting as much as possible to take place
asynchronously), but it would almost certainly be of no use to your
"use case."

The most successful "multimaster asynchronous" replication system that
I am aware of is the Palm Computing "PalmSync" system.

It would presumably be possible to use some of the components of
Slony-I to construct a multimaster async replication system. A
pre-requisite would be the creation of some form of "distributed
sequence" which would try to minimize the conflicts that arise out of
auto-generation of sequence numbers.

But beyond that lies the larger challenge of conflict resolution.

Slony-I, as a single-master system, does not need to address
conflicts, as changes must be made on the "master" and propagate
elsewhere.

Synchronous multimaster systems address conflicts by detecting them
when they occur and rejecting one or another of the conflicting
transactions.

Asynchronous multimaster systems require some sort of conflict
management/resolution system for situations where tuples are being
concurrently updated on multiple nodes. How that is managed is, well,
troublesome :-(. The PalmSync approach is that if it finds conflicts,
it duplicates records and leaves you, the user, to clean things up.
That may not be suitable for every kind of application...
--
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc"))
http://cbbrowne.com/info/slony.html
((LAMBDA (X) (X X)) (LAMBDA (X) (X X)))


From: Carlos Henrique Reimer <carlosreimer(at)yahoo(dot)com(dot)br>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-24 17:31:06
Message-ID: 20050824173106.76233.qmail@web53210.mail.yahoo.com
Lists: pgsql-general

I read some documents about replication and realized that if you plan on using asynchronous replication, your application should be designed from the outset with that in mind because asynchronous replication is not something that can be easily “added on” after the fact.

Am I right?

Reimer



From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org, carlosreimer(at)yahoo(dot)com(dot)br
Subject: Re: Postgresql replication
Date: 2005-08-24 19:53:34
Message-ID: 430CD03E.3070107@empires.org
Lists: pgsql-general

Chris Browne wrote:
>
> Slony-I is a master/slave asynchronous replication system; if you
> already considered it unsuitable, then I see little likelihood of
> other systems with the same sorts of properties being suitable.
>
> What could conceivably be of use to you would be a *multimaster*
> asynchronous replication system. Unfortunately, multimaster
> *anything* is a really tough nut to crack.
>

In general that's a difficult problem, but in practice there may be a
solution.

For instance, perhaps the following configuration would be helpful:

Make a database for each physical server, called db1 ... dbN. Let your
logical tables in each database be table1 ... tableM. Now, for each
logical tableX (where 1 <= X <= M), make N physical tables, tableX_1 ...
tableX_N. Now, make a view called tableX that is the UNION of tableX_1
... tableX_N (tableX is not a real table, it's just a logical table).

Now, use Slony-I. For each dbY (where 1 <= Y <= N), make dbY a master
for tableX_Y (for all X where 1 <= X <= M) and a slave for tableX_Z (for
all X,Z where 1 <= X <= M, Z != Y).

Now, use a rule that replaces all INSERTs to tableX (where 1 <= X <= M)
on dbY (where 1 <= Y <= N) with INSERTs to tableX_Y.

That was my attempt at being unambiguous. In general what I mean is that
each database is master of one piece of a table, and slave to all the
other pieces of that table, and then you have a view which is the union
of those pieces. That view is the logical table. Then have a RULE which
makes INSERTs go to the physical table for which that database is master.
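To make that concrete, a rough sketch for two databases and a single
logical table (call it "accounts" here; all names are invented, and db2
gets the mirror-image setup, with the rule pointing at accounts_2):

-- On db1; db1 is the Slony-I origin for accounts_1 and a
-- subscriber for accounts_2.
CREATE TABLE accounts_1 (
    id      text PRIMARY KEY,
    name    text,
    balance numeric
);

CREATE TABLE accounts_2 (
    id      text PRIMARY KEY,
    name    text,
    balance numeric
);

-- The logical table is just the union of the pieces
-- (UNION ALL, since the pieces are disjoint by construction).
CREATE VIEW accounts AS
    SELECT * FROM accounts_1
    UNION ALL
    SELECT * FROM accounts_2;

-- Local INSERTs land in the piece this database masters.
CREATE RULE accounts_insert AS
    ON INSERT TO accounts DO INSTEAD
    INSERT INTO accounts_1 (id, name, balance)
    VALUES (NEW.id, NEW.name, NEW.balance);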

The advantages: if one machine goes down, the rest keep going, and
merely miss the updates from that one site to that table. If one machine
makes an insert to the table, it quickly propagates to the other
machines and transparently becomes a part of the logical tables on those
machines.

The disadvantages: UPDATEs are difficult, and might end up with a
complicated set of rules/procedures/triggers. You may have to program
the application defensively in case the database is unable to update a
remote database for various reasons (if the record to be updated is a
part of a table for which another database is master). Also, since the
solution is asynchronous, the databases may provide different results to
the same query.

In general, this solution does not account for all kinds of data
constraints. The conflict resolution is very simplified because it's
basically just the union of data. If that union could cause a constraint
violation itself, this solution might not be right for you. For
instance, let's say you're tracking video rentals, and store policy says
that you only rent one video per person. However, maybe they go to store
1 and rent a video, and run to store 2 and rent a video before store 1
sends the INSERT record over to store 2. Now, when they finally do
attempt to UNION the data for the view, you have an inconsistent state.

Many applications can get by just fine by UNIONing the data like that,
and if not, perhaps work around it.

I hope this is helpful. Let me know if there's some reason my plan won't
work.

Regards,
Jeff Davis


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org, carlosreimer(at)yahoo(dot)com(dot)br
Subject: Re: Postgresql replication
Date: 2005-08-24 22:19:17
Message-ID: 20050824221917.GC96732@pervasive.com
Lists: pgsql-general

Or, for something far easier, try
http://pgfoundry.org/projects/pgcluster/ which provides synchronous
multi-master clustering.

On Wed, Aug 24, 2005 at 12:53:34PM -0700, Jeff Davis wrote:
> Chris Browne wrote:
> >
> > Slony-I is a master/slave asynchronous replication system; if you
> > already considered it unsuitable, then I see little likelihood of
> > other systems with the same sorts of properties being suitable.
> >
> > What could conceivably be of use to you would be a *multimaster*
> > asynchronous replication system. Unfortunately, multimaster
> > *anything* is a really tough nut to crack.
> >
>
> In general that's a difficult problem, but in practice there may be a
> solution.
>
> For instance, perhaps the following configuration would be helpful:
>
> Make a database for each physical server, called db1 ... dbN. Let your
> logical tables in each database be table1 ... tableM. Now, for each
> logical tableX (where 1 <= X <= M), make N physical tables, tableX_1 ...
> tableX_N. Now, make a view called tableX that is the UNION of tableX_1
> ... tableX_N (tableX is not a real table, it's just a logical table).
>
> Now, use Slony-I. For each dbY (where 1 <= Y <= N), make dbY a master
> for tableX_Y (for all X where 1 <= X <= M) and a slave for tableX_Z (for
> all X,Z where 1 <= X <= M, Z != Y).
>
> Now, use a rule that replaces all INSERTs to tableX (where 1 <= X <= M)
> on dbY (where 1 <= Y <= N) with INSERTs to tableX_Y.
>
> That was my attempt at being unambiguous. In general what I mean is that
> each database is master of one piece of a table, and slave to all the
> other pieces of that table, and then you have a view which is the union
> of those pieces. That view is the logical table. Then have a RULE which
> makes INSERTs go to the physical table for which that database is master.
>
> The advantages: if one machine goes down, the rest keep going, and
> merely miss the updates from that one site to that table. If one machine
> makes an insert to the table, it quickly propagates to the other
> machines and transparently becomes a part of the logical tables on those
> machines.
>
> The disadvantages: UPDATEs are difficult, and might end up with a
> complicated set of rules/procedures/triggers. You may have to program
> the application defensively in case the database is unable to update a
> remote database for various reasons (if the record to be updated is a
> part of a table for which another database is master). Also, since the
> solution is asynchronous, the databases may provide different results to
> the same query.
>
> In general, this solution does not account for all kinds of data
> constraints. The conflict resolution is very simplified because it's
> basically just the union of data. If that union could cause a constraint
> violation itself, this solution might not be right for you. For
> instance, let's say you're tracking video rentals, and store policy says
> that you only rent one video per person. However, maybe they go to store
> 1 and rent a video, and run to store 2 and rent a video before store 1
> sends the INSERT record over to store 2. Now, when they finally do
> attempt to UNION the data for the view, you have an inconsistent state.
>
> Many applications can get by just fine by UNIONing the data like that,
> and if not, perhaps work around it.
>
> I hope this is helpful. Let me know if there's some reason my plan won't
> work.
>
> Regards,
> Jeff Davis

--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com 512-569-9461


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-24 23:22:39
Message-ID: deivg4$ura$1@news.hub.org
Lists: pgsql-general

Carlos Henrique Reimer wrote:
> I read some documents about replication and realized that if you plan on
> using asynchronous replication, your application should be designed from
> the outset with that in mind because asynchronous replication is not
> something that can be easily “added on” after the fact.

Yes, it requires a lot of foresight to do multi-master replication --
especially across high latency connections. I do that now for 2
different projects. We have servers across the country replicating data
every X minutes, with custom app logic resolving conflicting data.

Allocation of unique IDs that don't collide across servers is a must.
For one project, instead of using numeric IDs, we use CHAR and
prepend a unique server code so record #1 on server A is A0000000001
versus ?x0000000001 on other servers. For the other project, we were too
far along in development to change all our numerics into chars, so we
wrote custom sequence logic to divide our 10-billion ID space into
1-X billion for server 1, X-Y billion for server 2, etc.
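As a rough sketch of the two approaches (names and ranges are made up;
'A' stands for this server's code):

-- Approach 1: prefix a per-server code onto a local sequence.
CREATE SEQUENCE orders_local_seq;

CREATE TABLE orders (
    order_id char(11) PRIMARY KEY
        DEFAULT 'A' || lpad(nextval('orders_local_seq')::text, 10, '0'),
    item     text
);

-- Approach 2: keep numeric IDs but give each server its own slice of
-- the ID space (server 1 shown; server 2 would start at 2000000001).
CREATE SEQUENCE orders_id_seq
    MINVALUE 1000000001 MAXVALUE 2000000000 START 1000000001;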

With this step taken, we then had to isolate (1) transactions that could
run on any server w/o issue (where we always take the newest record),
(2) transactions that required an amalgam of all actions, and (3)
transactions that had to be limited to "home" servers. Record-keeping
stuff where we keep a running history of all changes fell into the first
category. It would have been no different than two users on the same
server updating the same object at different times during the day.
Updating of summary data fell into category #2 and required parsing the
change history of individual elements. Category #3 was financial
transactions requiring strict locks; these were divided up by client/user
space and restricted to the user's home server. This case would not allow
auto-failover. Instead, it would require some prolonged threshold of
downtime for a server before full financials are allowed on backup servers.


From: Greg Stark <gsstark(at)mit(dot)edu>
To: William Yu <wyu(at)talisys(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 00:39:50
Message-ID: 877jeaetnt.fsf@stark.xeocode.com
Lists: pgsql-general


William Yu <wyu(at)talisys(dot)com> writes:

> Allocation of unique IDs that don't collide across servers is a must. For one
> project, instead of using numeric IDs, we use CHAR and prepend a unique
> server code so record #1 on server A is A0000000001 versus ?x0000000001 on other
> servers. For the other project, we were too far along in development to change
> all our numerics into chars, so we wrote custom sequence logic to divide our
> 10-billion ID space into 1-X billion for server 1, X-Y billion for server 2, etc.

I would have thought setting the sequences to "INCREMENT BY 100" would let you
handle this simply by setting the sequences on each server to start at a
different value modulo 100.
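Something along these lines (values purely for illustration):

-- On server 1: generates 1, 101, 201, ...
CREATE SEQUENCE invoice_id_seq INCREMENT BY 100 START WITH 1;

-- On server 2: generates 2, 102, 202, ... so the two can never collide.
CREATE SEQUENCE invoice_id_seq INCREMENT BY 100 START WITH 2;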

I wonder if it might be handy to be able to set default sequence parameters on
a per-database level so that you could set this up and then just do a normal
pg_restore of the same schema and get proper non-conflicting sequences on each
server.

I suppose it's the least of your problems though.

--
greg


From: Aly Dharshi <aly(dot)dharshi(at)telus(dot)net>
To: William Yu <wyu(at)talisys(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 00:46:34
Message-ID: 430D14EA.50502@telus.net
Lists: pgsql-general

I know I am wading into this discussion as a beginner compared to the rest who
have answered this thread, but doesn't something like pgpool provide relief for
pseudo-multimaster replication, and what about software like sqlrelay -- wouldn't
these suites help to some extent? Looking forward to being enlightened.

Cheers,

Aly.

William Yu wrote:
> Carlos Henrique Reimer wrote:
>
>> I read some documents about replication and realized that if you plan
>> on using asynchronous replication, your application should be designed
>> from the outset with that in mind because asynchronous replication is
>> not something that can be easily “added on” after the fact.
>
>
> Yes, it requires a lot of foresight to do multi-master replication --
> especially across high latency connections. I do that now for 2
> different projects. We have servers across the country replicating data
> every X minutes, with custom app logic resolving conflicting data.
>
> Allocation of unique IDs that don't collide across servers is a must.
> For one project, instead of using numeric IDs, we use CHAR and
> prepend a unique server code so record #1 on server A is A0000000001
> versus ?x0000000001 on other servers. For the other project, we were too
> far along in development to change all our numerics into chars, so we
> wrote custom sequence logic to divide our 10-billion ID space into
> 1-X billion for server 1, X-Y billion for server 2, etc.
>
> With this step taken, we then had to isolate (1) transactions that could
> run on any server w/o issue (where we always take the newest record),
> (2) transactions that required an amalgam of all actions, and (3)
> transactions that had to be limited to "home" servers. Record-keeping
> stuff where we keep a running history of all changes fell into the first
> category. It would have been no different than two users on the same
> server updating the same object at different times during the day.
> Updating of summary data fell into category #2 and required parsing the
> change history of individual elements. Category #3 was financial
> transactions requiring strict locks; these were divided up by client/user
> space and restricted to the user's home server. This case would not allow
> auto-failover. Instead, it would require some prolonged threshold of
> downtime for a server before full financials are allowed on backup servers.

--
Aly S.P Dharshi
aly(dot)dharshi(at)telus(dot)net

"A good speech is like a good dress
that's short enough to be interesting
and long enough to cover the subject"


From: Chris Travers <chris(at)travelamericas(dot)com>
To: Carlos Henrique Reimer <carlosreimer(at)yahoo(dot)com(dot)br>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 06:29:44
Message-ID: 430D6558.8040605@travelamericas.com
Lists: pgsql-general

Carlos Henrique Reimer wrote:

> I read some documents about replication and realized that if you plan
> on using asynchronous replication, your application should be designed
> from the outset with that in mind because asynchronous replication is
> not something that can be easily “added on” after the fact.
> Am I right?

Depending on your needs, you may find pgpool and Slony to be a workable
combination. This is better when you have a lot of reads and only
occasional writes. This way writes get redirected back to the master,
and read-only transactions get run on the slaves.

Best Wishes,
Chris Travers
Metatron Technology Consulting


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 07:01:49
Message-ID: dejqcu$1slo$1@news.hub.org
Lists: pgsql-general

It provides pseudo relief if all your servers are in the same building.
Having a front-end pgpool connector pointing to servers across the world
is not workable -- performance ends up being completely decrepit due to
the high latency.

Which is the problem we face. Great, you've got multiple servers for
failover. Too bad it doesn't do much good if your building gets hit by
fire/earthquake/hurricane/etc.

Aly Dharshi wrote:
> I know I am wading into this discussion as a beginner compared to the
> rest who have answered this thread, but doesn't something like pgpool
> provide relief for pseudo-multimaster replication, and what about
> software like sqlrelay -- wouldn't these suites help to some extent?
> Looking forward to being enlightened.
>
> Cheers,
>
> Aly.
>
> William Yu wrote:
>
>> Carlos Henrique Reimer wrote:
>>
>>> I read some documents about replication and realized that if you plan
>>> on using asynchronous replication, your application should be
>>> designed from the outset with that in mind because asynchronous
>>> replication is not something that can be easily “added on” after the
>>> fact.
>>
>>
>> Yes, it requires a lot of foresight to do multi-master replication --
>> especially across high latency connections. I do that now for 2
>> different projects. We have servers across the country replicating
>> data every X minutes, with custom app logic resolving conflicting data.
>>
>> Allocation of unique IDs that don't collide across servers is a must.
>> For one project, instead of using numeric IDs, we use CHAR and
>> prepend a unique server code so record #1 on server A is
>> A0000000001 versus ?x0000000001 on other servers. For the other
>> project, we were too far along in development to change all our
>> numerics into chars, so we wrote custom sequence logic to divide our
>> 10-billion ID space into 1-X billion for server 1, X-Y billion for
>> server 2, etc.
>>
>> With this step taken, we then had to isolate (1) transactions that
>> could run on any server w/o issue (where we always take the newest
>> record), (2) transactions that required an amalgam of all actions, and
>> (3) transactions that had to be limited to "home" servers.
>> Record-keeping stuff where we keep a running history of all changes
>> fell into the first category. It would have been no different than two
>> users on the same server updating the same object at different times
>> during the day. Updating of summary data fell into category #2 and
>> required parsing the change history of individual elements. Category
>> #3 was financial transactions requiring strict locks; these were
>> divided up by client/user space and restricted to the user's home
>> server. This case would not allow auto-failover. Instead, it would
>> require some prolonged threshold of downtime for a server before full
>> financials are allowed on backup servers.


From: Bohdan Linda <bohdan(dot)linda(at)seznam(dot)cz>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 11:44:15
Message-ID: 20050825114411.GA30455@bafster.chello.upc.cz
Lists: pgsql-general


I would have a slight off-topic question: is this an issue only with pgsql,
or are there other DB solutions which have good performance when doing
this kind of replication across the world?

Regards,
Bohdan

On Thu, Aug 25, 2005 at 09:01:49AM +0200, William Yu wrote:
> It provides pseudo relief if all your servers are in the same building.
> Having a front-end pgpool connector pointing to servers across the world
> is not workable -- performance ends up being completely decrepit due to
> the high latency.
>
> Which is the problem we face. Great, you've got multiple servers for
> failover. Too bad it doesn't do much good if your building gets hit by
> fire/earthquake/hurricane/etc.
>
>


From: Tino Wildenhain <tino(at)wildenhain(dot)de>
To: Bohdan Linda <bohdan(dot)linda(at)seznam(dot)cz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 11:50:48
Message-ID: 430DB098.9070905@wildenhain.de
Lists: pgsql-general

Bohdan Linda wrote:
> I would have a slight off-topic question: is this an issue only with pgsql,
> or are there other DB solutions which have good performance when doing
> this kind of replication across the world?

It depends entirely on your application. There is no "one size
fits all".

For example, to have an online backup, WAL archiving to remote
sites is often sufficient.
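(Roughly along these lines in postgresql.conf; the destination host and
path are of course made up:)

# ship each completed WAL segment to a remote site
archive_command = 'rsync -a %p standby-host:/wal_archive/%f'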

However, you cannot have synchronous multimaster replication
over slow lines and high update performance at the same time.

There is always a tradeoff, in any solution (even in high-cost
commercial ones), that you have to consider carefully.

>
> Regards,
> Bohdan
>
> On Thu, Aug 25, 2005 at 09:01:49AM +0200, William Yu wrote:
>
>>It provides pseudo relief if all your servers are in the same building.
>>Having a front-end pgpool connector pointing to servers across the world
>>is not workable -- performance ends up being completely decrepit due to
>>the high latency.
>>
>>Which is the problem we face. Great, you've got multiple servers for
>>failover. Too bad it doesn't do much good if your building gets hit by
>>fire/earthquake/hurricane/etc.

This would remove the application using that data too, or not? ;)


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 12:03:24
Message-ID: dekc2c$1pqm$1@news.hub.org
Lists: pgsql-general

As far as I know, nobody has a generic solution for multi-master
replication where servers are not in close proximity. Single master
replication? Doable. Application specific conflict resolution? Doable.
Off the shelf package that somehow knows financial transactions on a
server shouldn't be duplicated on another? Uhh...I'd be wary of trying
it out myself.

Bohdan Linda wrote:
> I would have a slight off-topic question: is this an issue only with pgsql,
> or are there other DB solutions which have good performance when doing
> this kind of replication across the world?
>
> Regards,
> Bohdan
>
> On Thu, Aug 25, 2005 at 09:01:49AM +0200, William Yu wrote:
>
>>It provides pseudo relief if all your servers are in the same building.
>>Having a front-end pgpool connector pointing to servers across the world
>>is not workable -- performance ends up being completely decrepit due to
>>the high latency.
>>
>>Which is the problem we face. Great, you've got multiple servers for
>>failover. Too bad it doesn't do much good if your building gets hit by
>>fire/earthquake/hurricane/etc.


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 12:24:53
Message-ID: dekdak$1v8e$1@news.hub.org
Lists: pgsql-general

Another tidbit I'd like to add. What has helped a lot in implementing
high-latency master-master replication is writing our software with a
business process model in mind, where data is not posted directly to the
final tables. Instead, users are generally allowed to enter anything --
it could be incorrect, incomplete, or the user may not have rights -- and
the data is still dumped into "pending" tables for people with rights to
fix/review/approve later. Only after that process is the data posted to
the final tables. (Good data entered on the first try still gets pended
-- the validation phase simply assumes the user who entered the data is
also the one who fixed/reviewed/approved it.)

In terms of replication, this model allows users to enter data on
any server. The pending records then get replicated to every server.
Each specific server then looks at its own dataset of pendings to post
to the final tables. Final data is then replicated back to all the
participating servers.
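A minimal sketch of the posting step (all table and column names are
invented; 'A' stands for the local server's code):

-- Invented schema: a replicated pending table plus the final table.
CREATE TABLE orders_pending (
    pending_id   text PRIMARY KEY,   -- globally unique (e.g. server-prefixed)
    home_server  text NOT NULL,      -- which server is allowed to post it
    payload      text,
    status       text NOT NULL DEFAULT 'pending'  -- pending/approved/posted
);

CREATE TABLE orders_final (
    order_id  text PRIMARY KEY,
    payload   text,
    posted_at timestamp NOT NULL DEFAULT now()
);

-- Each server periodically posts only its own approved pendings
-- (both statements in one transaction); the final rows then
-- replicate back out to the other servers.
INSERT INTO orders_final (order_id, payload)
    SELECT pending_id, payload
      FROM orders_pending
     WHERE home_server = 'A' AND status = 'approved';

UPDATE orders_pending
   SET status = 'posted'
 WHERE home_server = 'A' AND status = 'approved';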

There may be a delay for the user if he/she is working on a server that
doesn't have rights to post his data. However, the pending->post model
gets users used to the idea that (1) all data is entered in one large
swoop and validated/posted afterwards and (2) data can/will sit in
pending for a period of time until it is acted upon by somebody/some
server with the proper authority. Hence users aren't expecting results
to pop up on the screen the moment they press the submit button.

William Yu wrote:
> Yes, it requires a lot of foresight to do multi-master replication --
> especially across high latency connections. I do that now for 2
> different projects. We have servers across the country replicating data
> every X minutes, with custom app logic resolving conflicting data.
>
> Allocation of unique IDs that don't collide across servers is a must.
> For one project, instead of using numeric IDs, we use CHAR and
> prepend a unique server code so record #1 on server A is A0000000001
> versus ?x0000000001 on other servers. For the other project, we were too
> far along in development to change all our numerics into chars, so we
> wrote custom sequence logic to divide our 10-billion ID space into
> 1-X billion for server 1, X-Y billion for server 2, etc.
>
> With this step taken, we then had to isolate (1) transactions that could
> run on any server w/o issue (where we always take the newest record),
> (2) transactions that required an amalgam of all actions, and (3)
> transactions that had to be limited to "home" servers. Record-keeping
> stuff where we keep a running history of all changes fell into the first
> category. It would have been no different than two users on the same
> server updating the same object at different times during the day.
> Updating of summary data fell into category #2 and required parsing the
> change history of individual elements. Category #3 was financial
> transactions requiring strict locks; these were divided up by client/user
> space and restricted to the user's home server. This case would not allow
> auto-failover. Instead, it would require some prolonged threshold of
> downtime for a server before full financials are allowed on backup servers.


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 12:34:46
Message-ID: dekdt6$219p$1@news.hub.org
Lists: pgsql-general

Tino Wildenhain wrote:
>>> Which is the problem we face. Great, you've got multiple servers for
>>> failover. Too bad it doesn't do much good if your building gets hit
>>> by fire/earthquake/hurricane/etc.
>
>
> This would remove the application using that data too, or not? ;)

Yes and no. If your DB is an internal app for a company where the users
are in the same building as the servers, it doesn't really matter, I
guess. A meteor hitting the building would kill the users in addition to
the server, so nobody will be calling you to complain about system downtime.

If your app is used by external customers who are all across the
country, they want to continue to use your software even though
you and data center #1 are 6 feet under due to an 8.0 earthquake. They
want auto-failover to data center #2, which is in close proximity to CIA
headquarters and other juicy terrorist targets.


From: David Goodenough <david(dot)goodenough(at)btconnect(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 13:06:51
Message-ID: 200508251406.52090.david.goodenough@btconnect.com
Lists: pgsql-general

On Thursday 25 August 2005 13:03, William Yu wrote:
> As far as I know, nobody has a generic solution for multi-master
> replication where servers are not in close proximity. Single master
> replication? Doable. Application specific conflict resolution? Doable.
> Off the shelf package that somehow knows financial transactions on a
> server shouldn't be duplicated on another? Uhh...I'd be wary of trying
> it out myself.

The most obvious one that does exactly this (generic multi-master
replication) is Lotus Domino. It is not a relational DB, but not sufficiently
far off to stop the analogy.

Domino marks each document with a binary value which identifies the
server (built from a hash of the server name and the time the DB was
created) and a timestamp when it was last modified, and also each document
(record) has an ID (like OIDs). More recent versions also do this at a field
level to avoid conflicts and speed replication. When two servers replicate
they look for all documents modified since the last replication time, and
compare the list. Those modified on only one server are copied across
to the other server, replacing the old record and carrying the updated-on
server and timestamp with them. When a document is deleted in Domino it
does not actually disappear; it is reduced to a deletion stub, and this gets
replicated as it has the same ID as the original record. Those that have been
modified on both sides are copied to the other DB, but both records remain and
it is left to the user to resolve conflicts. Field level replication reduces
the need for this considerably. Periodically the deletion stubs are purged,
once all known replicas have replicated.

Domino has absolutely no concept of a master DB.

Obviously this scheme would be difficult to do on a pure relational system.
But with triggers and a few standard fields it would not be impossible to
do for a limited application set. How the user would resolve conflicts would
also be application specific, I suspect, and how one would handle having two
versions of a record in the DB when they both have a field which is supposed
to be unique is also a problem that would have to be resolved (Domino does
not have the concept of unique keys).
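As a very rough sketch of what "a few standard fields" plus a trigger
might look like (all names here are invented):

-- Each replicated table carries Domino-style bookkeeping columns,
-- kept up to date by a trigger, so a reconciler can compare rows.
CREATE TABLE documents (
    doc_id        text PRIMARY KEY,                 -- globally unique ID
    origin_server text NOT NULL,                    -- where the last change was made
    last_modified timestamp NOT NULL DEFAULT now(),
    is_deleted    boolean NOT NULL DEFAULT false,   -- the "deletion stub"
    body          text
);

CREATE FUNCTION documents_touch() RETURNS trigger AS '
BEGIN
    NEW.last_modified := now();
    NEW.origin_server := ''serverA'';  -- this server''s name, illustrative
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER documents_touch
    BEFORE INSERT OR UPDATE ON documents
    FOR EACH ROW EXECUTE PROCEDURE documents_touch();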

David

>
> Bohdan Linda wrote:
> > I would have a slight off-topic question: is this an issue only with pgsql,
> > or are there other DB solutions which have good performance when doing
> > this kind of replication across the world?
> >
> > Regards,
> > Bohdan
> >
> > On Thu, Aug 25, 2005 at 09:01:49AM +0200, William Yu wrote:
> >>It provides pseudo relief if all your servers are in the same building.
> >>Having a front-end pgpool connector pointing to servers across the world
> >>is not workable -- performance ends up being completely decrepit due to
> >>the high latency.
> >>
> >>Which is the problem we face. Great, you've got multiple servers for
> >>failover. Too bad it doesn't do much good if your building gets hit by
> >>fire/earthquake/hurricane/etc.


From: Tino Wildenhain <tino(at)wildenhain(dot)de>
To: William Yu <wyu(at)talisys(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 13:19:19
Message-ID: 430DC557.7090905@wildenhain.de
Lists: pgsql-general

William Yu wrote:
> Tino Wildenhain wrote:
>
>>>> Which is the problem we face. Great, you've got multiple servers for
>>>> failover. Too bad it doesn't do much good if your building gets hit
>>>> by fire/earthquake/hurricane/etc.
>>
>>
>>
>> This would remove the application using that data too, or not? ;)
>
>
> Yes and no. If your DB is an internal app for a company where the users
> are in the same building as the servers, it doesn't really matter, I
> guess. A meteor hitting the building would kill the users in addition
> to the server, so nobody will be calling you to complain about system
> downtime.
>
> If your app is used by external customers who are all across the
> country, they want to continue to use your software even though
> you and data center #1 are 6 feet under due to an 8.0 earthquake. They
> want auto-failover to data center #2, which is in close proximity to CIA
> headquarters and other juicy terrorist targets.

Sure, but in this case a "simple" async master-slave (slony-1)
and the usual failover (also DNS-failover) should be sufficient.


From: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 14:47:41
Message-ID: 430DDA0D.8090800@ca.afilias.info
Lists: pgsql-general

Chris Travers wrote:

> Carlos Henrique Reimer wrote:
>
>> I read some documents about replication and realized that if you plan
>> on using asynchronous replication, your application should be
>> designed from the outset with that in mind because asynchronous
>> replication is not something that can be easily “added on” after the
>> fact.
>> Am I right?
>
>
> Depending on your needs, you may find pgpool and Slony to be a
> workable combination. This is better when you have a lot of reads and
> only occasional writes. This way writes get redirected back to the
> master, and read-only transactions get run on the slaves.

As long as you don't have any functions that write to the db. pgpool
could (and likely would) redirect some of these to the subscriber.
Slony would prevent the data from being written (which would prevent
the subscriber from being corrupted).

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


From: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 14:56:05
Message-ID: 430DDC05.90809@ca.afilias.info
Lists: pgsql-general

Bohdan Linda wrote:

>I would have a slight off-topic question: is this an issue only with pgsql,
>or are there other DB solutions which have good performance when doing
>this kind of replication across the world?
>
>
>
It all depends on the quality of the connection Node A to Node B. If
connectivity is poor, then it is impossible to have good performance
doing anything across that connection.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 16:18:55
Message-ID: 60irxung5s.fsf@dba2.int.libertyrms.com
Lists: pgsql-general

aly(dot)dharshi(at)telus(dot)net (Aly Dharshi) writes:
> I know I am wading into this discussion as a beginner compared to
> the rest who have answered this thread, but doesn't something like
> pgpool provide relief for pseudo-multimaster replication, and what
> about software like sqlrelay -- wouldn't these suites help to some
> extent? Looking forward to being enlightened.

pgpool and sqlrelay provide you a loose equivalent to synchronous
replication, as they have to submit the queries to *all* of the nodes.

If you can live with the long latency times that result if the nodes
are widely separated, that's great.

Unfortunately, the reason for people to want *asynchronous*
multimaster replication is that they /cannot/ afford that latency
time. They want to submit updates only to the local database, and
have the updates head to the other server some time later.

That's why pgpool/sqlrelay aren't an answer.
--
output = ("cbbrowne" "@" "acm.org")
http://cbbrowne.com/info/languages.html
Rules of the Evil Overlord #46. "If an advisor says to me "My liege,
he is but one man. What can one man possibly do?", I will reply
"This." and kill the advisor." <http://www.eviloverlord.com/>


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 16:20:52
Message-ID: 60ek8ing2j.fsf@dba2.int.libertyrms.com
Lists: pgsql-general

bohdan(dot)linda(at)seznam(dot)cz (Bohdan Linda) writes:
> I would have a slight off-topic question: is this an issue only with pgsql,
> or are there other DB solutions which have good performance when doing
> this kind of replication across the world?

Asynchronous multimaster replication is pretty much a generally tough
problem.

Oracle, Sybase, and DB2 all have methods of doing it; none are either
simple, straightforward, or transparent to use.

It's a tough problem, in general.
--
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://cbbrowne.com/info/rdbms.html
"I think we might have been better off with a slide rule."
-- Zaphod Beeblebrox


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 16:23:52
Message-ID: 60acj6nfxj.fsf@dba2.int.libertyrms.com
Lists: pgsql-general

david(dot)goodenough(at)btconnect(dot)com (David Goodenough) writes:
> On Thursday 25 August 2005 13:03, William Yu wrote:
>> As far as I know, nobody has a generic solution for multi-master
>> replication where servers are not in close proximity. Single master
>> replication? Doable. Application specific conflict resolution?
>> Doable. Off the shelf package that somehow knows financial
>> transactions on a server shouldn't be duplicated on another?
>> Uhh...I'd be wary of trying it out myself.
>
> The most obvious one that does exactly this (generic multi-master
> replication) is Lotus Domino. It is not a relational DB, but not
> sufficiently far off to stop the analogy.

There's a better known case, and that's PalmOS "PalmSync."

It does pretty much the same thing as Lotus Notes(tm).

In both cases, they are doing something that works reasonably well for
data that looks somewhat like "documents." Conflict resolution at the
"document" level is something that users can handle pretty well by
hand.

Unfortunately, for data that's more at the "hordes of little facts"
level, conflict resolution is a tough problem :-(.
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/internet.html
Rules of the Evil Overlord #223. "I will install a fire extinguisher
in every room -- three, if the room contains vital equipment or
volatile chemicals." <http://www.eviloverlord.com/>


From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org, carlosreimer(at)yahoo(dot)com(dot)br
Subject: Re: Postgresql replication
Date: 2005-08-25 18:52:31
Message-ID: 430E136F.2010405@empires.org
Lists: pgsql-general

Jim C. Nasby wrote:
> Or, for something far easier, try
> http://pgfoundry.org/projects/pgcluster/ which provides synchronous
> multi-master clustering.
>

He specifically said that pgcluster did not work for him because the
databases would be at physically separate locations. PGCluster requires
that there be a load balancer and a replicator centrally located
managing the cluster. If a network problem happens at the centralized
location, it would bring down all locations completely.

I think he's looking for an async solution because of that. In my
solution, if one location goes down, the others keep going.

Regards,
Jeff Davis


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 19:02:05
Message-ID: del4jg$242o$1@news.hub.org
Lists: pgsql-general

Tino Wildenhain wrote:
>> If your app is used by external customers who are all across the
>> country, they want to continue to use your software even though
>> you and data center #1 are 6 feet under due to an 8.0 earthquake. They
>> want auto-failover to data center #2, which is in close proximity to
>> CIA headquarters and other juicy terrorist targets.
>
>
> Sure, but in this case a "simple" async master-slave (slony-1)
> and the usual failover (also DNS-failover) should be sufficient.

Workable if you're willing to keep multiple sets of servers idling on
stand-by only for catastrophic failure. For temporary failures, I believe
at this time Slony requires manual admin work to resync a promoted
master and demote it back to slave. Not a big deal if the downtime
doesn't happen often, but when you're depending on a cross-country internet
connection, it happens far more often than you'd like to deal with.

Of course, we're also using multi-master replication to load balance
traffic across multiple servers because the price jump from going 2xDC
to 4xDC is major. So we have a bunch of 2x1 and 2xDC servers across the
country serving our customers versus 1 big server for primary access and
1 big server as a standby.


From: Carlos Henrique Reimer <carlosreimer(at)yahoo(dot)com(dot)br>
To: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org, Jeff Davis <jdavis-pgsql(at)empires(dot)org>
Subject: Re: Postgresql replication
Date: 2005-08-25 19:26:57
Message-ID: 20050825192657.5098.qmail@web53202.mail.yahoo.com
Lists: pgsql-general

Exactly!!!!

Jeff Davis <jdavis-pgsql(at)empires(dot)org> wrote:
Jim C. Nasby wrote:
> Or, for something far easier, try
> http://pgfoundry.org/projects/pgcluster/ which provides synchronous
> multi-master clustering.
>

He specifically said that pgcluster did not work for him because the
databases would be at physically separate locations. PGCluster requires
that there be a load balancer and a replicator centrally located
managing the cluster. If a network problem happens at the centralized
location, it would bring down all locations completely.

I think he's looking for an async solution because of that. In my
solution, if one location goes down, the others keep going.

Regards,
Jeff Davis




From: Matt Miller <mattm(at)epx(dot)com>
To: Carlos Henrique Reimer <carlosreimer(at)yahoo(dot)com(dot)br>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org, Jeff Davis <jdavis-pgsql(at)empires(dot)org>
Subject: Re: Postgresql replication
Date: 2005-08-25 19:54:49
Message-ID: 1124999689.3076.8.camel@dbamm01-linux
Lists: pgsql-general

> > http://pgfoundry.org/projects/pgcluster/ which provides synchronous
> > multi-master clustering.
>
> He specifically said that pgcluster did not work for him
> because ...PGCluster requires that there be a load balancer and a
> replicator centrally located managing the cluster. If a network
> problem happens at the centralized location, it would bring down all
> locations completely.

I think the load balancer is an optional component. Clients can connect
either to a load balancer, or directly to a specific machine.

Also, I think pgcluster can operate async. If the system runs in
"normal" mode then the client gets a response as soon as the transaction
is committed on the local machine. In "reliable" mode the client waits
for the commit to happen on all machines.

See
http://pgcluster.projects.postgresql.org/structure_of_replication.html


From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 20:05:19
Message-ID: 20050825200519.GB23397@gp.word-to-the-wise.com
Lists: pgsql-general

On Thu, Aug 25, 2005 at 12:20:52PM -0400, Chris Browne wrote:

> bohdan(dot)linda(at)seznam(dot)cz (Bohdan Linda) writes:
> > I would have a slight off-topic question: is this an issue only with pgsql,
> > or are there other DB solutions which have good performance when doing
> > this kind of replication across the world?
>
> Asynchronous multimaster replication is pretty much a generally tough
> problem.
>
> Oracle, Sybase, and DB2 all have methods of doing it; none are either
> simple, straightforward, or transparent to use.
>
> It's a tough problem, in general.

I've been involved in one project that did multi-master async replication
across geographically separated servers.

It worked but was operationally so horrible that I've sworn never to
architect a system that requires multi-master replication at the
database ever again. Application level? Possibly. But not at the DB
level.

Cheers,
Steve


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 21:13:14
Message-ID: delc9f$30d4$1@news.hub.org
Lists: pgsql-general

David Goodenough wrote:
> The most obvious one that does exactly this (generic multi-master
> replication) is Lotus Domino. It is not a relational DB, but not sufficiently
> far off to stop the analogy.
>
> Domino marks each document with a binary value which identifies the
> server (built from a hash of the server name and the time the DB was
> created) and a timestamp when it was last modified, and also each document
> (record) has an ID (like OIDs). More recent versions also do this at a field
> level to avoid conflicts and speed replication. When two servers replicate

This system sounds OK for documents and general data that can always be
revived via version control/history. But I can't see how this would work
for financial transactions where you're dealing with money and bank
accounts. Suppose I have $100 in my account. I decide to log in to
multiple servers and wire transfer $100 to another account on every
server. And I hit submit at exactly the same time on every server, so
each one's check passes. Sure, they can resolve the conflict afterwards
in terms of deciding which transfer to kill off. But the fact is that my
other account has that N x $100 already and I've just fleeced the bank.


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 21:17:52
Message-ID: 60vf1tn2bj.fsf@dba2.int.libertyrms.com
Lists: pgsql-general

bnichols(at)ca(dot)afilias(dot)info (Brad Nicholson) writes:
> Bohdan Linda wrote:
>
>>I would have a slight off-topic question: is this an issue only with pgsql,
>>or are there other DB solutions which have good performance when doing
>>this kind of replication across the world?
>>
>>
>>
> It all depends on the quality of the connection Node A to Node B. If
> connectivity is poor, then it is impossible to have good performance
> doing anything across that connection.

The "nifty magic" of asynchronous multimaster would be that you'd be
able to have fast access at "Site Alpha" to "Node Alpha" as well as
fast access at "Site Beta" to "Node Beta."

That would, indeed, be valuable. That's why it's one of those "Holy
Grail" things... It's Rather Hard, which is why there isn't a
"Slony-III: The Wrath of Async MultiMaster" just yet.

It would be valuable for someone to look into a replication system to
support that sort of scenario. With the stipulation that there be
some big, fiery dragons there ready to eat you and your data :-).
--
(format nil "~S(at)~S" "cbbrowne" "ntlug.org")
http://www3.sympatico.ca/cbbrowne/linuxdistributions.html
"I'm guilty of a lot of things, but I didn't ever do that.''
-- Bill Clinton, on why he had astroturf lining the back of his pickup
truck
[In fact, he _DID_ do this, thus making life creepier than fiction...]


From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: Carlos Henrique Reimer <carlosreimer(at)yahoo(dot)com(dot)br>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 21:30:56
Message-ID: 430E3890.7010803@empires.org
Lists: pgsql-general

Carlos Henrique Reimer wrote:
> Exactly!!!!
>

Was there something lacking in my suggested solution at:
<http://archives.postgresql.org/pgsql-general/2005-08/msg01240.php>

It's a little complicated to administer, but it seems well-suited to a
company that has several locations that want to share data without being
too dependent upon each other.

Regards,
Jeff Davis


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 21:39:39
Message-ID: 60mzn5n1b8.fsf@dba2.int.libertyrms.com
Lists: pgsql-general

steve(at)blighty(dot)com (Steve Atkins) writes:
> On Thu, Aug 25, 2005 at 12:20:52PM -0400, Chris Browne wrote:
>> bohdan(dot)linda(at)seznam(dot)cz (Bohdan Linda) writes:
>> > I would have a slight off-topic question: is this an issue only with pgsql,
>> > or are there other DB solutions which have good performance when doing
>> > this kind of replication across the world?
>>
>> Asynchronous multimaster replication is pretty much a generally tough
>> problem.
>>
>> Oracle, Sybase, and DB2 all have methods of doing it; none are either
>> simple, straightforward, or transparent to use.
>>
>> It's a tough problem, in general.
>
> I've been involved in one project that did multi-master async replication
> across geographically separated servers.
>
> It worked but was operationally so horrible that I've sworn never to
> architect a system that requires multi-master replication at the
> database ever again. Application level? Possibly. But not at the DB
> level.

I have heard something of the same argument being thrown at attempts
to use things like CORBA to allow you to pretend that there is no
difference between local and remote access to things.

There are some good arguments to be made, there.

You really need to access remote data in different ways than local
data because the latency *will* kill you if you ignore it.

It's a nice and neat idea to try to find ways to hide those
differences under the carpet, but it is quite likely that the "better way"
will involve addressing that somewhere in the application, as opposed
to trying to get the DB (or the ORB or other 'distributed thingie') to
do it for you.

Tables that contain balances (e.g., credit balances, account
balances, and such) would be a good example. You do NOT want to
distribute the process of updating balances across some
hideous-latency "Great Divide" link. That points, of course, to
application design, not to "pure DB level" activity...
--
(format nil "~S(at)~S" "cbbrowne" "cbbrowne.com")
http://www.ntlug.org/~cbbrowne/advocacy.html
"Tools that are no good require more skill."


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 21:48:46
Message-ID: 60irxtn0w1.fsf@dba2.int.libertyrms.com
Lists: pgsql-general

William Yu <wyu(at)talisys(dot)com> writes:
> David Goodenough wrote:
>> The most obvious one that does exactly this (generic multi-master
>> replication) is Lotus Domino. It is not a relational DB, but not sufficiently
>> far off to stop the analogy.
>> Domino marks each document with a binary value which identifies the
>> server (built from a hash of the server name and the time the DB was
>> created) and a timestamp when it was last modified, and also each document
>> (record) has an ID (like OIDs). More recent versions also do this at a field
>> level to avoid conflicts and speed replication. When two servers replicate
>
> This system sounds ok for documents and general data that can always
> be revived via version control/history. But I can't see how this
> would work for financial transactions where you're dealing with
> money and bank accounts. Suppose I have $100 in my account. I
> decided to login to multiple servers and wire transfer $100 to
> another account on every server. And I hit submit exactly at the
> same time for every server so check. Sure they can resolve the
> conflict afterwards in terms of saying in terms of which transfer to
> kill off. But the fact is that my other account has that N x $100
> already and I've just fleeced the bank.

There are two pretty reasonable ways to address this:

1. Your application does not replicate balances; those are always
computed locally. Only *transactions* are relayed.

2. Balance updates take place as a special kind of "delta update"
where replication transfers around (old.balance - new.balance) instead
of either of the absolute values (see the sketch below).
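
To make option 2 concrete, a minimal sketch of what the delta capture
could look like, with hypothetical table, column, and trigger names
(this is not how any particular replication package does it):

  CREATE TABLE accounts (
      account_id integer PRIMARY KEY,
      balance    numeric NOT NULL DEFAULT 0
  );

  -- queue of deltas to be shipped to the other nodes
  CREATE TABLE balance_deltas (
      delta_id   serial PRIMARY KEY,
      account_id integer NOT NULL,
      delta      numeric NOT NULL
  );

  CREATE FUNCTION record_balance_delta() RETURNS trigger AS $$
  BEGIN
      -- capture the change, not the absolute value; the sign convention
      -- just has to match how the receiving side applies it
      INSERT INTO balance_deltas (account_id, delta)
          VALUES (NEW.account_id, NEW.balance - OLD.balance);
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER balance_delta_trg
      AFTER UPDATE ON accounts
      FOR EACH ROW EXECUTE PROCEDURE record_balance_delta();

  -- a receiving node applies each delta relative to its own current value:
  -- UPDATE accounts SET balance = balance + <delta> WHERE account_id = <id>;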

Either of these leads to the nifty case where that bank discovers that
you have overdrawn your account, and then takes whatever actions they
deem are appropriate next.

Such as billing you $29 for each overdraft transaction.

You're hardly going to be retiring to the Bahamas on Nx$100, are you
:-).
--
select 'cbbrowne' || '@' || 'ntlug.org';
http://cbbrowne.com/info/internet.html
Rules of the Evil Overlord #211. "If my chief engineer displeases me,
he will be shot, not imprisoned in the dungeon or beyond the traps he
helped design." <http://www.eviloverlord.com/>


From: Chris Travers <chris(at)metatrontech(dot)com>
To: William Yu <wyu(at)talisys(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-25 23:00:21
Message-ID: 430E4D85.1040306@metatrontech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

William Yu wrote:

>
> This system sounds ok for documents and general data that can always
> be revived via version control/history. But I can't see how this would
> work for financial transactions where you're dealing with money and
> bank accounts. Suppose I have $100 in my account. I decided to login
> to multiple servers and wire transfer $100 to another account on every
> server. And I hit submit exactly at the same time for every server so
> check. Sure they can resolve the conflict afterwards in terms of
> saying in terms of which transfer to kill off. But the fact is that my
> other account has that N x $100 already and I've just fleeced the bank.
>
Hmmm.......

I think you should realize by now that no multimaster async replication
solution is going to work adequately for financial transactions like
these. You need either synchronous or single-master replication, simply
because you need a *single* authoritative source for the current status
and balance of the account. You can't really expect async-replicated
servers to know about transactions that haven't been replicated yet,
can you?

In other words this problem is inherent to multimaster async
replication. There is no way around it. If there was it would either
not be async or not be multimaster :-)

I have built my own async multimaster replication systems for some
applications. One of them is actually a system where you have two and
only two masters which replicate back and forth. The trick I used here
was for each master to use a different SERIAL field as its primary key
so there are no duplicates. Another option is to have a compound
primary key which is a serial and a server-hash (this would scale to a
larger number of masters).
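
For what it's worth, one way to read the "different SERIAL" trick is
non-overlapping sequences; a rough sketch with hypothetical names, where
each master runs its own variant:

  -- On master A: hand out odd keys only
  CREATE SEQUENCE orders_id_seq INCREMENT BY 2 START WITH 1;
  -- (on master B the same sequence would START WITH 2 instead)

  CREATE TABLE orders (
      order_id   bigint PRIMARY KEY DEFAULT nextval('orders_id_seq'),
      placed_at  timestamptz NOT NULL DEFAULT now()
  );

  -- The compound-key alternative, which scales past two masters:
  CREATE TABLE orders_by_origin (
      order_id   serial,
      origin_id  integer NOT NULL,   -- identifies the server that created the row
      placed_at  timestamptz NOT NULL DEFAULT now(),
      PRIMARY KEY (order_id, origin_id)
  );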

Best Wishes,
Chris Travers
Metatron Technology Consulting

Attachment Content-Type Size
chris.vcf text/x-vcard 127 bytes

From: Chris Travers <chris(at)travelamericas(dot)com>
To: William Yu <wyu(at)talisys(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgresql replication
Date: 2005-08-25 23:15:34
Message-ID: 430E5116.5000108@travelamericas.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

William Yu wrote:

> Another tidbit I'd like to add. What has helped a lot in implementing
> high-latency master-master replication writing our software with a
> business process model in mind where data is not posted directly to
> the final tables. Instead, users are generally allowed to enter
> anything -- could be incorrect, incomplete or the user does not have
> rights -- the data is still dumped into "pending" tables for people
> with rights to fix/review/approve later. Only after that process is
> the data posted to the final tables. (Good data entered on the first
> try still gets pended -- validation phase simply assumes the user who
> entered the data is also the one who fixed/reviewed/approved.)
>
In this case, why have multimaster replication at all?

Why not have the people who have rights to review this all write to the
master database and have that replicated back? It seems like latency is
not really an issue. Replication here is only going to complicate
things. If it were me, I would have my approval app pull data from
*all* of the databases independently and not rely on the replication for
this part. The replication could then be used to replicate *approved*
data back to the slaves.

>
> There may be a delay for the user if he/she is working on a server
> that doesn't have rights to post his data. However, the pending->post
> model gets users used to the idea of (1) entering all data in large
> swoop and validating/posting it afterwards and (2) data can/will sit
> in pending for a period of time until it is acted upon with
> somebody/some server with the proper authority. Hence users aren't
> expecting results to pop up on the screen the moment they press the
> submit button.
>
I still don't understand the purpose of replicating the pending data...

Best Wishes,
Chris Travers
Metatron Technology Consulting


From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: Matt Miller <mattm(at)epx(dot)com>
Cc: Carlos Henrique Reimer <carlosreimer(at)yahoo(dot)com(dot)br>, Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-26 00:45:51
Message-ID: 430E663F.1080107@empires.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Matt Miller wrote:
>>>http://pgfoundry.org/projects/pgcluster/ which provides syncronous
>>>multi-master clustering.
>>
>>He specifically said that pgcluster did not work for him
>>because ...PGCluster requires that there be a load balancer and a
>>replicator centrally located managing the cluster. If a network
>>problem happens at the centralized location, it would bring down all
>>locations completely.
>
>
> I think the load balancer is an optional component. Clients can connect
> either to a load balancer, or directly to a specific machine.

The replicator surely is not optional, and must be centralized.

> Also, I think pgcluster can operate async. If the system runs in
> "normal" mode then the client gets a response as soon as the transaction
> is committed on the local machine. In "reliable" mode the client waits
> for the commit to happen on all machines.
>

Interesting. I suppose whatever works for your application is the right
answer for replication. PGCluster is query-based, right? I suppose the
question would then be, would he rather use a trigger-based replication
solution, like Slony, or a query-based replication solution.

Regards,
Jeff Davis


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-26 07:23:51
Message-ID: demg28$2k1m$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Chris Travers wrote:
> Why not have the people who have rights to review this all write to the
> master database and have that replicated back? It seems like latency is
> not really an issue. Replication here is only going to complicate

What master database? Having a single master defeats the purpose of load
balancing to handle more users.

> things. If it were me, I would be having my approval app pull data
> from *all* of the databases independently and not rely on the
> replication for this part. The replication could then be used to
> replicate *approved* data back to the slaves.

If your app client happens to have high speed access to all servers,
fine. And you can guarantee uptime connections to all servers except for
the rare cases of hardware failure. The problem is if you don't, you end
up with every transaction running at the speed of the slowest connection
between a client and the farthest DB. While the final status of a
transaction does not need to show up anytime soon on a user's screen,
there still needs to be fast response for each individual user action.

How bad does the response get? I've done some simple tests comparing APP
<-LAN-> DB versus APP <-cross-country VPN-> DB. Even simple actions like
inserting a record and checking for a dupe key violation (i.e. almost
no bandwidth needed) take about 10,000 times longer than over a 100Mbit
LAN.

> I still don't understand the purpose of replicating the pending data...

Imagine a checking account. A request to make an online payment can be
made on any server. The moment the user submits a request, he sees it on
his screen. This request is considered pending and not a true
transaction yet. All requests are collected together via replication and
the "home" server for that account then can check the account balance
and decide whether there's enough funds to issue those payments.


From: Carlos Henrique Reimer <carlosreimer(at)yahoo(dot)com(dot)br>
To: Jeff Davis <jdavis-pgsql(at)empires(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-26 12:55:38
Message-ID: 20050826125539.5239.qmail@web53214.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

I'm thinking of testing your suggestion, basically because there are only a few sites to connect, but there are some points that aren't very clear to me.

My questions:

1. How do I make a view updatable? Using the rule system?
2. Why are inserts handled differently from updates?
3. Can't I use the rule system to make updates go to the base table (the master piece) too, just like the inserts?
4. Aren't there some performance issues related to using a view that is a UNION instead of a normal table?

Thanks in advance!

Reimer

Jeff Davis <jdavis-pgsql(at)empires(dot)org> escreveu:
Chris Browne wrote:
>
> Slony-I is a master/slave asynchronous replication system; if you
> already considered it unsuitable, then I see little likelihood of
> other systems with the same sorts of properties being suitable.
>
> What could conceivably be of use to you would be a *multimaster*
> asynchronous replication system. Unfortunately, multimaster
> *anything* is a really tough nut to crack.
>

In general that's a difficult problem, but in practice there may be a
solution.

For instance, perhaps the following configuration would be helpful:

Make a database for each physical server, called db1 ... dbN. Let your
logical tables in each database be table1 ... tableM. Now, for each
logical tableX (where 1 <= X <= M), make N physical tables, tableX_1 ...
tableX_N. Now, make a view called tableX that is the UNION of tableX_1
... tableX_N (tableX is not a real table, it's just a logical table).

Now, use Slony-I. For each dbY (where 1 <= Y <= N), make dbY a master
for tableX_Y (for all X where 1 <= X <= M) and a slave for tableX_Z (for
all X,Z where 1 <= X <= M, Z != Y).

Now, use a rule that replaces all INSERTs to tableX (where 1 <= X <= M)
on dbY (where 1 <= Y <= N) with INSERTs to tableX_Y.

That was my attempt at being unambiguous. In general what I mean is that
each database is master of one piece of a table, and slave to all the
other pieces of that table, and then you have a view which is the union
of those pieces. That view is the logical table. Then have a RULE which
makes INSERTs go to the physical table for which that database is master.
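
To make that concrete, a simplified sketch of what db1's side could look
like for one logical table, with only two sites and hypothetical table
names:

  -- db1 is the Slony-I origin for its own piece ...
  CREATE TABLE customers_1 (
      id    serial PRIMARY KEY,
      name  text NOT NULL
  );

  -- ... and a subscriber for the piece owned by db2
  CREATE TABLE customers_2 (
      id    integer PRIMARY KEY,
      name  text NOT NULL
  );

  -- the logical table is just the union of the pieces
  CREATE VIEW customers AS
      SELECT id, name FROM customers_1
      UNION ALL
      SELECT id, name FROM customers_2;

  -- on db1, INSERTs against the logical table land in db1's own piece
  CREATE RULE customers_insert AS
      ON INSERT TO customers DO INSTEAD
      INSERT INTO customers_1 (name) VALUES (NEW.name);

An ON UPDATE rule on the view could be written in the same style, but it
would have to refuse or redirect changes to rows owned by another site,
which is where the complications below come in.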

The advantages: if one machine goes down, the rest keep going, and
merely miss the updates from that one site to that table. If one machine
makes an insert to the table, it quickly propagates to the other
machines and transparently becomes a part of the logical tables on those
machines.

The disadvantages: UPDATEs are difficult, and might end up with a
complicated set of rules/procedures/triggers. You may have to program
the application defensively in case the database is unable to update a
remote database for various reasons (if the record to be updated is a
part of a table for which another database is master). Also, since the
solution is asynchronous, the databases may provide different results to
the same query.

In general, this solution does not account for all kinds of data
constraints. The conflict resolution is very simplified because it's
basically just the union of data. If that union could cause a constraint
violation itself, this solution might not be right for you. For
instance, let's say you're tracking video rentals, and store policy says
that you only rent one video per person. However, maybe they go to store
1 and rent a video, and run to store 2 and rent a video before store 1
sends the INSERT record over to store 2. Now, when they finally do
attempt to UNION the data for the view, you have an inconsistent state.

Many applications can get by just fine by UNIONing the data like that,
and if not, they can perhaps work around it.

I hope this is helpful. Let me know if there's some reason my plan won't
work.

Regards,
Jeff Davis



From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-26 15:35:36
Message-ID: 604q9cn22f.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

chris(at)verkiel(dot)metatrontech(dot)com (Chris Travers) writes:
> William Yu wrote:
>> This system sounds ok for documents and general data that can
>> always be revived via version control/history. But I can't see how
>> this would work for financial transactions where you're dealing
>> with money and bank accounts. Suppose I have $100 in my account. I
>> decided to login to multiple servers and wire transfer $100 to
>> another account on every server. And I hit submit exactly at the
>> same time for every server so check. Sure they can resolve the
>> conflict afterwards in terms of saying in terms of which transfer
>> to kill off. But the fact is that my other account has that N x
>> $100 already and I've just fleeced the bank.
>>
> Hmmm.......
>
> I think you should realize by now that no multimaster async
> replication solution is going to adequately work for something like
> these financial transactions. You need either synchronous or
> single-master simply because you need to have a concept that there
> is a *single* authoritative source of the current status and balance
> on the account. You can't really expect async-replicated servers to
> know about transactions that haven't been replicated yet can you?

Actually, I disagree. There _is_ a way to cope with that issue.

The methodology involves not having a Balance Column anywhere in the
database, where activity *always* involves recording the "deltas."

If the DEBIT and the CREDIT for each financial transaction are always
committed, in any given place, as part of a single transaction, then
there is no reason for the system to fall out of balance.
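
A minimal sketch of that shape, with hypothetical names; the balance is
never stored, only derived:

  CREATE TABLE ledger (
      entry_id   serial PRIMARY KEY,
      txn_id     integer NOT NULL,
      account_id integer NOT NULL,
      amount     numeric NOT NULL   -- positive for credits, negative for debits
  );

  -- both legs of a transfer commit together or not at all
  BEGIN;
  INSERT INTO ledger (txn_id, account_id, amount) VALUES (42, 1001, -100.00);
  INSERT INTO ledger (txn_id, account_id, amount) VALUES (42, 2002,  100.00);
  COMMIT;

  -- any node computes a balance locally from whatever it has received
  SELECT account_id, sum(amount) AS balance
    FROM ledger
   GROUP BY account_id;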

This doesn't prevent the scenario of someone logging into many servers
simultaneously and emptying their account multiple times. But there
are other ways of addressing that problem.

This sort of scenario is almost certainly the case for the
stupendously enormous quantities of billing transactions that result
from long distance and cellular activity in the telco industry.

I'm pretty sure that they _don't_ track balance updates for each
transaction that applies to a customer's account. You could, via one
form of trickery or another, "overdraw" your account by a fairly hefty
amount, and they probably won't notice for a day or even three. But
once they notice/estimate that the Gentle Caller has built up some
dangerously high balance, they'll warn of impending discontinuation of
service if some sum isn't paid by some date.

This sort of analysis does not require that any location is considered
"authoritative" for the balance of the account. It suffices to have
some sort of "transaction cutoff," that the analysis is sure to
include all transactions up to *yesterday* at midnight UTC.
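
The analysis itself can be an ordinary aggregate against the cutoff; a
hedged sketch with hypothetical table and column names, assuming the
timestamps are stored in UTC:

  SELECT account_id, sum(amount) AS balance_estimate
    FROM billing_transactions
   WHERE recorded_at < date_trunc('day', now() AT TIME ZONE 'UTC')
   GROUP BY account_id
  HAVING sum(amount) > 500.00;   -- arbitrary "dangerously high" threshold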

Some "bad things" might take place during the up-to-24h lag; the
assumption is that that won't be material, or that you can do other
sorts of traffic analysis to warn of impending problem accounts...
--
output = ("cbbrowne" "@" "ntlug.org")
http://www.ntlug.org/~cbbrowne/sap.html
"... The book [CLtL1] is about 400 pages of 8.5" by 11" Dover output.
Apparently the publisher and typesetter decided that this made the
lines too wide for easy reading, so they will use a 6" by 9" format.
This will make the shape of the book approximately cubical. Now,
there are 26 chapters counting the index, and a Rubik's cube has 26
exterior cubies. I'll let you individually extrapolate and fantasize
from there." -- GLS


From: Chris Travers <chris(at)travelamericas(dot)com>
To: William Yu <wyu(at)talisys(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-26 17:26:44
Message-ID: 430F50D3.9090709@travelamericas.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

William Yu wrote:

> Chris Travers wrote:
>
>> Why not have the people who have rights to review this all write to
>> the master database and have that replicated back? It seems like
>> latency is not really an issue. Replication here is only going to
>> complicate
>
>
> What master database? Having a single master defeats the purpose of
> load balancing to handle more users.

I guess I am thinking along different lines than you. I was thinking
that the simplest solution would be to have master/slave replication for
*approved* transactions only and no replication for initial commits
prior to approval. This makes the assumption that a single transaction
will be committed on a single server, and that a single transaction will
not be split over multiple servers. In this way, you can commit a
pending transaction to any single server, and when it is approved, it
gets replicated via the master. See below for more.

>
>
> > things. If it were me, I would be having my approval app pull data
> > from *all* of the databases independently and not rely on the
> > replication for this part. The replication could then be used to
> > replicate *approved* data back to the slaves.
>
> If your app client happens to have high speed access to all servers,
> fine. And you can guarantee uptime connections to all servers except
> for the rare cases of hardware failure. The problem is if you don't,
> you end up with every transaction running at the speed of the slowest
> connection between a client and the farthest DB. While the final
> status of a transaction does not need to show up anytime soon on a
> user's screen, there still needs to be fast response for each
> individual user action.

Well... It depends on how it is implemented, I guess. If you pull
transactional information in the background while the user is doing
other things, then it shouldn't matter. Besides, what should actually
happen is that your connection is only as slow as the connection to the
server which hosts the pending transaction you are trying to commit at
the moment. In this way, each request only goes to one server (the one
which has the connection). You could probably use DBI-Link and some
clever materialized views to maintain the metadata at each location
without replicating the whole transaction. You could probably even use
DBI-Link or dblink to pull the transactions in a transparent way. Or
you could replicate transactions into a pending queue dynamically...
There are all sorts of ways you could make this respond well over slow
connections. Remember, PostgreSQL allows you to separate storage from
presentation of the data, and this is quite powerful.
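
As a rough sketch (hypothetical connection string and table), pulling
pending rows from the server that holds them via the contrib dblink
module could look like:

  SELECT *
    FROM dblink('host=branch2.example.com dbname=app user=approver',
                'SELECT txn_id, account_id, amount FROM pending_transactions')
         AS remote(txn_id integer, account_id integer, amount numeric);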

>
> How bad does the response get? I've done some simple tests comparing
> APP <-LAN-> DB versus APP <-cross country VPN-> DB. Even simple
> actions like inserting a recording and checking for a dupe key
> violation (e.g. almost no bandwidth needed) takes about 10,000 times
> longer than over a 100mbit LAN.

I think you could design the database such that duplicate keys are not
an issue: they only get checked on the master, and so should never be a
problem.

Thinking about it.... It seems here that one ends up with a sort of
weird "multi-master" replication based on master/slave replication if
you replicate these changes in the background (via another process,
Notify, etc).

>
>
>> I still don't understand the purpose of replicating the pending data...
>
>
> Imagine a checking account. A request to make an online payment can be
> made on any server. The moment the user submits a request, he sees it
> on his screen. This request is considered pending and not a true
> transaction yet. All requests are collected together via replication
> and the "home" server for that account then can check the account
> balance and decide whether there's enough funds to issue those payments.
>
Thinking about this.... The big issue is that you only want to
replicate the deltas, not the entire account. I am still thinking
master/slave, but something where the deltas are replicated in the
background or where the user, in checking his account, is actually
querying the home server. This second issue could be done via dblink or
DBI-Link and would simply require that a master table linking the
accounts with home servers be replicated (this should, I think, be
fairly low-overhead).
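
A hedged sketch of that replicated routing table, again with
hypothetical names and dblink for the remote lookup:

  CREATE TABLE account_home (
      account_id  integer PRIMARY KEY,
      home_conn   text NOT NULL   -- e.g. a dblink connection string for the home server
  );

  -- check a balance by asking the account's home server directly
  SELECT balance
    FROM dblink((SELECT home_conn FROM account_home WHERE account_id = 1001),
                'SELECT sum(amount) FROM ledger WHERE account_id = 1001')
         AS t(balance numeric);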

Best Wishes,
Chris Travers
Metatron Technology Consulting


From: Matt Miller <mattm(at)epx(dot)com>
To: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
Cc: Carlos Henrique Reimer <carlosreimer(at)yahoo(dot)com(dot)br>, Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-26 18:30:07
Message-ID: 1125081007.3660.40.camel@dbamm01-linux
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, 2005-08-25 at 17:45 -0700, Jeff Davis wrote:
> The replicator surely is not optional, and must be centralized.

From http://pgcluster.projects.postgresql.org/1_3/index.html:

"Several replication server can be set up. When an problem occurs at the
replication server, Cluster DB automatically changes connection to the
standing-by replication server. When all replication servers stop,
Cluster DB operates in stand-alone mode. There are two modes for
stand-alone mode: read only mode which only permits references, and read
write mode, which permits updating."

> PGCluster is query-based, right?

Yes.


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-26 22:50:43
Message-ID: deo6c8$gsh$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Chris Browne wrote:
> I'm pretty sure that they _don't_ track balance updates for each
> transaction that applies to a customer's account. You could, via one
> form of trickery or another, "overdraw" your account by a fairly hefty
> amount, and they probably won't notice for a day or even three. But
> once they notice/estimate that the Gentle Caller has built up some
> dangerously high balance, they'll warn of impending discontinuation of
> service if some sum isn't paid by some date.

This works for companies that have some degree of power over their
customers. E.g. pay up or we disconnect your service. Return your
overdrafts/pay your fees or we mess up your credit.

This doesn't work if it's a small company whose account has been
emptied. Sure, the bank will refuse to honor the check, but then that
company will be hit with overdraw penalties and possible legal penalties
to the payee for the bounced check.

The warning threshold system is easy to implement but there will always
be corner cases where the warning is not soon enough or a single payment
wipes out ~100% of the account. Warn too often and it'll be ignored by
people as a "boy who cried wolf" alarm.


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-26 23:07:00
Message-ID: deo7ap$kip$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Chris Travers wrote:
> I guess I am thinking along different lines than you. I was thinking
> that the simplest solution would be to have master/slave replication for
> *approved* transactions only and no replication for initial commits
> prior to approval. This makes the assumption that a single transaction
> will be committed on a single server, and that a single transaction will
> not be split over multiple servers. In this way, you can commit a
> pending transaction to any single server, and when it is approved, it
> gets replicated via the master. See below for more.

This works if you don't care that multiple servers commit transactions
that force a budget or bank account to be exceeded.

> Thinking about this.... The big issue is that you only want to
> replicate the deltas, not the entire account. I am still thinking
> master/slave, but something where the deltas are replicated in the
> background or where the user, in checking his account, is actually
> querying the home server. This second issue could be done via dblink or
> DBI-Link and would simply require that a master table linking the
> accounts with home servers be replicated (this should, I think, be
> fairly low-overhead).

Except what you now have is a system that fails if any server fails or
is inaccessible.


From: Chris Travers <chris(at)travelamericas(dot)com>
To: William Yu <wyu(at)talisys(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgresql replication
Date: 2005-08-27 01:26:22
Message-ID: 430FC13E.2000302@travelamericas.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

William Yu wrote:

> Chris Travers wrote:
>
>> I guess I am thinking along different lines than you. I was thinking
>> that the simplest solution would be to have master/slave replication
>> for *approved* transactions only and no replication for initial
>> commits prior to approval. This makes the assumption that a single
>> transaction will be committed on a single server, and that a single
>> transaction will not be split over multiple servers. In this way,
>> you can commit a pending transaction to any single server, and when
>> it is approved, it gets replicated via the master. See below for more.
>
>
> This works if you don't care that multiple servers commit transactions
> that force a budget or bank account to be exceeded.
>
Ok, then let's look at possibility B (async multimaster replication is
out).

>
>> Thinking about this.... The big issue is that you only want to
>> replicate the deltas, not the entire account. I am still thinking
>> master/slave, but something where the deltas are replicated in the
>> background or where the user, in checking his account, is actually
>> querying the home server. This second issue could be done via dblink
>> or DBI-Link and would simply require that a master table linking the
>> accounts with home servers be replicated (this should, I think, be
>> fairly low-overhead).
>
>
> Except what you know have is your system fails if any server fail or
> is inaccessible.
>
Ok. If you have a system where each location is authoritative for its
customers and the server transparently queries that server where needed
(via a view), then when any server becomes inaccessible, the customers
whose accounts are on that server become inaccessible too. That may not
be acceptable. But if this is the case, then you could treat this as a
partitioned table, where each partition is authoritative at one location
(see a previous post on how to do this), and then use Slony to
replicate. Again, this does not get around your objection above, namely
that it is possible to do duplicate transactions at multiple locations.
For that you would need an explicit call to the authoritative server; I
see no other way around it. This might allow people to view tentative
balances from other branches if the home server (or its connection) is
down, but they would not be able to withdraw funds.

But if you do this, you have a different problem: replication will be
occurring over your slower-than-desired links. As the number of branches
grows, so will the bandwidth demands on every branch. This may therefore
not scale.

Unfortunately there is no free lunch here. And I think that at some
point you are going to have to choose between:

1) Efficiency of network throughput
2) Tolerance to attempts at repeat transactions before replication
(emptying an account multiple times)
3) Availability of a transaction.

You can pick any two. I think that this is the way it will work with
any other database system as well.

Best Wishes,
Chris Travers
Metatron Technology Consulting



From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-27 13:27:43
Message-ID: deppoc$11n7$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Chris Travers wrote:
> 1) Efficiency of network throughput
> 2) Tolerance to attempts at repeat transactions before replication
> (emptying an account multiple times)
> 3) Availability of a transaction.

We ended up having to give up #1. It's possible for our transactions to
be routed to multiple servers before they become final. User1 might
request a payment on ServerA. User2 then marks the payment as approved
on ServerB. ServerC is authoritative and checks the bank/budget balances
before posting as final. Each of these steps requires replication of the
latest changes to all other servers. (In theory, the first 2 steps only
require replication to the authoritative server, but we do so anyway so
all servers can act as backups for each other -- pending transactions
still need to be restored in case of total DB failure.)
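
To picture the flow, a hedged sketch of the kind of pending table
involved (hypothetical names, not the real schema); each step just
changes a row's status, and those status changes are what get
replicated:

  CREATE TABLE payment_requests (
      request_id   serial PRIMARY KEY,
      account_id   integer NOT NULL,
      amount       numeric NOT NULL,
      status       text NOT NULL DEFAULT 'pending'
                   CHECK (status IN ('pending', 'approved', 'posted', 'rejected')),
      entered_on   text NOT NULL,   -- e.g. 'ServerA'
      approved_on  text,            -- e.g. 'ServerB'
      posted_on    text             -- only the authoritative server fills this in
  );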

There's definitely a delay in getting from point A to point Z across the
duplicate servers. But there's guaranteed financial integrity, users can
connect to any server the load balancer picks, and no server requires any
other server/connection to be up for individual user transactions to occur.


From: Chris Travers <chris(at)travelamericas(dot)com>
To: William Yu <wyu(at)talisys(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-27 18:03:10
Message-ID: 4310AADE.6090008@travelamericas.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

William Yu wrote:

> Chris Travers wrote:
>
>> 1) Efficiency of network throughput
>> 2) Tolerance to attempts at repeat transactions before replication
>> (emptying an account multiple times)
>> 3) Availability of a transaction.
>
>
> We ended up having to give up #1. It's possible to have our
> transactions routed to multiple servers before it becomes a final
> transaction. User1 might request a payment on ServerA. User2 then
> marks the payment as approved on ServerB. ServerC is authoritative and
> checks the bank/budget balances before posting as final. After each of
> these steps requires replication of the latest changes to all other
> servers. (In theory, the first 2 steps only require replication to the
> authoritative server but we do so anyways so all servers can act as
> backups for each other -- pending transactions still need to be
> restored in case of total DB failure.)

Ok. I see where you are going with this.

It is an interesting problem. Multimaster async replication gives you a
problem in that it allows the attack you are describing, due to the
asynchronous nature of the replication. If I were trying to prevent
this sort of attack, I would try to build in some sort of "account
authority" which can manage these transactions. Originally I was
thinking of the home server as the obvious place to start, if it is
available. But if it is not, then you would need some infrastructure to
track attempted withdrawals and handle them appropriately. Such servers
could cache requests, and if they see duplicates or many requests coming
from many servers on the same account, could flag that. One option might
be to have a rotational authority (i.e. home server, then next, then
next) in a ring, so that even with an unavailable server, reasonable
precautions can still be taken against emptying the account many times.

Basically, if the servers aren't talking to each other at the time of the
transaction, then they are not going to know about duplicates. You have
to have some setup beyond your replication to handle this. Otherwise
you will have some issues with delays causing the security risks you
deem unacceptable.

Your question seems to be "How do I set up multimaster async replication
such that a person cannot empty his account on each server" and the
answer is that this is an inherent limitation of multimaster async
replication. This also means that you will have to have some sort of
other verification process for such transactions beyond what is locally
available on the replicants.

> There's definitely a delay in terms of getting from point A to point
> Z; duplicate servers. But there's guaranteed financial integrity,
> users can connect to any server the load balancer picks and no server
> requires any other server/connection to be up for individual user
> tranactions to occur.

The delay will by definition defeat any guarantee of financial integrity
if you are allowing read-write operations to the replica without
checking with some sort of central authority. At the very least, the
central authority should look for suspicious patterns. Again, it may be
possible to do some failover here, but I don't think you can do without
*some* sort of centralized control.

(Note, here load balancing is handled by the distribution of accounts.
A down server simply means that the next server in the ring will take
over its remote verification role).

This doesn't make the security issue go away, but it may reduce it to an
acceptable level. I.e. it is still possible for duplicates to be
submitted just before and after a home server goes down, but this is a
lot better than being able to have one transaction repeated on each
server and then dealing with the massively overdrawn account.

Best Wishes,
Chris Travers
Metatron Technology Consulting


From: William Yu <wyu(at)talisys(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-27 19:25:22
Message-ID: deqen3$miv$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Our own personal IM :)

Chris Travers wrote:
> The delay will by definition defeat any guarantee of financial integrity
> if you are allowing read-write operations to the replica without
> checking with some sort of central authority. At very least, the
> central authority should look for suspicious patterns. Again, it may be
> possible to do some failover here, but I don't think you can do without
> *some* sort of centralized control.

Actually this is the easy part. When the home server finally issues
payments, it only issues what it knows about and what can be verified as
OK. Any transactions that are currently being entered on another server
will appear after the next replication cycle and it will be verified
afterwards. If the previous payment issuing cycle used up all the money,
the "new" requests are kept in pending until money is put in. This does
allow for newer requests that happen to be executed on home servers to
possibly take precedence over older requests, but there is no requirement
in the business process that payments must come out in any specific order.

> This doesn;t make the security issue go away, but it may reduce it to an
> acceptable level. I.e. it is still possible for duplicates to be
> submitted just before and after a home server goes down, but this is a
> lot better than being able to have one transaction repeated on each
> server and then dealing with the massively overdrawn account.

The "home" server going down is the trickiest issue. Because when a
server disappears, is that because it went down temporarily? For good? A
temporary internet problem where nobody can get access to it? Or an
internet routing issue where just the connection between those two
servers is severed? If it's the last, users might still be doing stuff
on ServerA, with ServerA posting financials, while ServerB thinks that
server is down and decides to take over ServerA's duties. Of course, in
ServerA's view, it's ServerB and ServerC that are down -- not itself.

Maybe we can mitigate this by having more servers at more data centers
around the world so everybody can monitor everybody. At some point, if
you have N servers and N-1 servers say ServerA is down, it probably is
down. With a high enough N, ServerA could probably conclude decisively
that it was the server severed from the internet and refuse to post any
financials until its connection to the outside world had been restored,
plus some extra threshold.

This last problem, which luckily occurs rarely, we do by hand right now.
We're not ready to run this on full auto because we only have 2 data
centers (with multiple servers within each data center). The servers do
not have enough info to know which server is actually down in order to
auto-promote/demote. It does require staff that's not just in one
location, though, because our primary office going down with our local
datacenter would mean nobody there could do the switchover. (Assuming a
major natural
disaster that kept us from using our laptops at the local Starbucks to
do the work.)


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-28 03:35:52
Message-ID: 60oe7iwx5z.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

William Yu <wyu(at)talisys(dot)com> writes:
> Chris Browne wrote:
>> I'm pretty sure that they _don't_ track balance updates for each
>> transaction that applies to a customer's account. You could, via one
>> form of trickery or another, "overdraw" your account by a fairly hefty
>> amount, and they probably won't notice for a day or even three. But
>> once they notice/estimate that the Gentle Caller has built up some
>> dangerously high balance, they'll warn of impending discontinuation of
>> service if some sum isn't paid by some date.
>
> This works for companies that have some degree of power over their
> customers. E.g. pay up or we disconnect your service. Return your
> overdrafts/pay your fees or we mess up your credit.
>
> This doesn't work if it's a small company who's account has been
> emptied. Sure the bank will refuse to honor the check but then that
> company will be hit with overdraw penalties and possible legal
> penalties to the payee for the bounced check.
>
> The warning threshold system is easy to implement but there will
> always be corner cases where the warning is not soon enough or a
> single payment wipes out ~ 100% of the account. Warn too often and
> it'll be ignored by people as a "boy crying wolf" alarm.

In a context where there is no "degree of power over their customers,"
I would hardly think that the presence/lack of automated controls or
presence/lack of balance synchronization is particularly material.

In other words, if trying to apply policies is going to forcibly
break, then building the data system may have been a futile exercise
in the first place. And trying to distribute the system may again be
a *business* mistake that admits no technical solution.

If that's the case, then we can conclude that replication is no
solution, and that the organization should improve connectivity for
their single centralized system.

That may be the case, but is the uninteresting case, as it amounts to
throwing our hands up, saying "no answers; you can't have a
distributed system," and giving up on trying anything further. No
point to further discussion...
--
let name="cbbrowne" and tld="acm.org" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/linux.html
"There is no reason anyone would want a computer in their home".
-- Ken Olson, Pres. and founder of Digital Equipment Corp. 1977


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-30 17:55:08
Message-ID: 20050830175508.GB28511@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Aug 24, 2005 at 11:54:42AM -0400, Chris Browne wrote:
> There is a Slony-II project ongoing that is trying to construct a
> more-or-less synchronous multimaster replication system (where part of
> the cleverness involves trying to get as much taking place in an
> asynchronous fashion as possible) that would almost certainly be of no
> use to your "use case."

Just to emphasise this point: assuming we ever get Slony-II to work,
it is all but guaranteed to be useless for cases like the one that
started this thread: it'll simply require very fast network
connections to work. I've had more than one person ask me when
multi-site multimaster is coming, and my answer is always, "Have you
started work on it yet?" I think there might be a way to hack up
Slony-I to do it -- Josh Berkus gave me a quick outline while at
OSCON that made me think it possible -- but AFAIK, nobody is actually
doing that work.

It's worth noting that single-origin master-slave async replication
is tricky, but by no means impossible. Multi-master _anything_ is
hard, no question about it; and it more or less always imposes some
overhead that you won't like. The question is merely whether you
want to pay that price.

A

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgresql replication
Date: 2005-08-30 17:56:25
Message-ID: 20050830175625.GC28511@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Aug 25, 2005 at 01:44:15PM +0200, Bohdan Linda wrote:
> there are some other db solutions which have good performance when doing
> this kind of replication across the world.

Bluntly, "No."

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
Information security isn't a technological problem. It's an economics
problem.
--Bruce Schneier