Re: cluster replication with intermezzo

Lists: pgsql-general
From: Robert Williams <bob(at)bob(dot)usuhs(dot)mil>
To: pgsql-general(at)postgresql(dot)org
Subject: cluster replication with intermezzo
Date: 2002-10-01 12:51:52
Message-ID: 3D999A68.4060209@bob.usuhs.mil

I'm running postgresql on a two machine
intermezzo cluster (www.inter-mezzo.org).
I haven't tested it with a heavy load yet -
maybe today, but it works under a light load, with
bidirectional replication of the file system containing
the database (/var/lib/pgsql). The goal
of course is to incorporate these real servers
into a scalable load balancing high availability
distributed cluster.

Each machine is running postmaster, a
seeming violation of the man page prohibition
against running more than one postmaster on
a postgres database. This is necessary,
because the system must be fully functional even
when one machine is taken off line.

I don't think this should be a problem,
since as I understand it, table and row
locking occurs at the postgres backend level
and lock files are kept in a database table,
itself replicated across the intermezzo file system.

Can anyone think of any reason why this might
not work under a heavy load? My load test
later today or tomorrow on my two node cluster
will only partly answer this question.

I'm using the latest CVS version with kernel 2.4.18.

Robert Williams


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: bob(at)bob(dot)usuhs(dot)mil
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: cluster replication with intermezzo
Date: 2002-10-01 14:07:28
Message-ID: 21495.1033481248@sss.pgh.pa.us
Lists: pgsql-general

Robert Williams <bob(at)bob(dot)usuhs(dot)mil> writes:
> I'm running postgresql on a two machine
> intermezzo cluster (www.inter-mezzo.org).

This *will* *not* *work*. Period.

> I haven't tested it with a heavy load yet -

Expect data corruption as soon as you exercise it at all.

> I don't think this should be a problem,
> since as I understand it, table and row
> locking occurs at the postgres backend level
> and lock files are kept in a database table,

No, the locking is all done in shared memory. Since you've got two
postmasters with two separate shared memory blocks, there is no
interlocking between the two sets of backends. There are more problems
here than I can easily enumerate :-(
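
A toy sketch of the missing interlock, under loud assumptions: `LockTable` is a hypothetical stand-in for one postmaster's shared memory block, not PostgreSQL's actual lock manager, and the backend and table names are made up. It only illustrates that a lock table private to each machine cannot block a backend on the other machine.

```python
# Toy model only: LockTable is a hypothetical stand-in, not PostgreSQL code.

class LockTable:
    """In-memory lock table, standing in for one postmaster's shared memory."""

    def __init__(self):
        self._holders = {}  # lock target -> backend holding it exclusively

    def try_exclusive(self, target, backend):
        """Grant the lock unless another backend in THIS table holds it."""
        holder = self._holders.get(target)
        if holder is not None and holder != backend:
            return False
        self._holders[target] = backend
        return True


# One lock table per machine: each postmaster allocates its own shared memory.
machine_a = LockTable()
machine_b = LockTable()
row = ("accounts", 42)  # hypothetical (table, row id) lock target

# Two backends under the same postmaster are interlocked as expected.
same_host_first = machine_a.try_exclusive(row, "a-backend-1")
same_host_second = machine_a.try_exclusive(row, "a-backend-2")

# A backend on the other machine consults a different table, so it is
# granted the "same" lock even though machine A already holds it.
other_host = machine_b.try_exclusive(row, "b-backend-1")

print(same_host_first, same_host_second, other_host)
```

Both machines now believe they hold the row exclusively, while the replicated file system underneath sees writes from both.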

Now, you could possibly make it work as a hot-failover setup, ie,
one machine can start running a postmaster after the other one crashes.
But two postmasters running simultaneously against the same file set
won't work.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: bob(at)bob(dot)usuhs(dot)mil, pgsql-general(at)postgresql(dot)org
Subject: Re: cluster replication with intermezzo
Date: 2002-10-01 15:23:28
Message-ID: 200210011523.g91FNSI18756@candle.pha.pa.us
Lists: pgsql-general

Tom Lane wrote:
> Robert Williams <bob(at)bob(dot)usuhs(dot)mil> writes:
> > I'm running postgresql on a two machine
> > intermezzo cluster (www.inter-mezzo.org).
>
> This *will* *not* *work*. Period.
>
> > I haven't tested it with a heavy load yet -
>
> Expect data corruption as soon as you exercise it at all.

No problem. He is only using it to play short pieces of music --- get
it, intermezzo? ;-)

$ dict intermezzo
in.ter.mez.zo \.int-*r-'met-so_-, -'med-zo_-\ n, pl -zi \-se_-,
-ze_-\ or -zos : a short movement connecting major sections of an
extended musical work (as a symphony); also : a short independent
instrumental composition

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: bob(at)bob(dot)usuhs(dot)mil, pgsql-general(at)postgresql(dot)org
Subject: Re: cluster replication with intermezzo
Date: 2002-10-01 17:55:19
Message-ID: 87heg6f3i0.fsf@mailbox.samurai.com
Lists: pgsql-general

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Robert Williams <bob(at)bob(dot)usuhs(dot)mil> writes:
> > I don't think this should be a problem,
> > since as I understand it, table and row
> > locking occurs at the postgres backend level
> > and lock files are kept in a database table,
>
> No, the locking is all done in shared memory. Since you've got two
> postmasters with two separate shared memory blocks, there is no
> interlocking between the two sets of backends.

Speaking of which, I vaguely recall the OpenMOSIX guys talking about
possibly implementing clusterable shared memory (i.e. "shared" across
machines in a cluster) at some point in the future. There would still
be some problems with using PostgreSQL in that environment (e.g. the
different semantics between NFS and normal filesystems), but it's an
interesting possibility, at any rate.

Cheers,

Neil

--
Neil Conway <neilc(at)samurai(dot)com> || PGP Key ID: DB3C29FC


From: Alvaro Herrera <alvherre(at)atentus(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <bob(at)bob(dot)usuhs(dot)mil>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: cluster replication with intermezzo
Date: 2002-10-01 18:04:12
Message-ID: Pine.LNX.4.33.0210011359390.19389-100000@polluelo.lab.protecne.cl
Lists: pgsql-general

On 1 Oct 2002, Neil Conway wrote:

> Speaking of which, I vaguely recall the OpenMOSIX guys talking about
> possibly implementing clusterable shared memory (i.e. "shared" across
> machines in a cluster) at some point in the future.

To make PostgreSQL _really_ work in an environment like that, there
would have to be some way to differentiate "local" shared memory from
"remote", because accessing remote shmem would be much slower than
accessing local shmem. What would be the gain versus having multi-master
replication?

ISTM horizontal partitioning of tables can give similar results without
such a different architecture.
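
A minimal sketch of what horizontal partitioning could look like at the routing level, assuming a made-up list of node names and a hash on a string key. `NODES`, `node_for`, and `partition` are hypothetical names for illustration, not part of PostgreSQL. Each row lives on exactly one node, so each node keeps its own independent postmaster and shared memory.

```python
import zlib

# Hypothetical node names; each one would run its own independent postmaster.
NODES = ["node0", "node1"]


def node_for(key):
    """Map a row key to the single node that owns it (stable across runs)."""
    return NODES[zlib.crc32(key.encode("utf-8")) % len(NODES)]


def partition(keys):
    """Group row keys by owning node."""
    placement = {node: [] for node in NODES}
    for key in keys:
        placement[node_for(key)].append(key)
    return placement


print(partition(["alice", "bob", "carol", "dave"]))
```

No two nodes ever write the same row, so no cross-machine locking is needed for single-row operations; queries spanning partitions are outside the scope of this sketch.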

--
Alvaro Herrera (<alvherre[(at)]dcc(dot)uchile(dot)cl>)


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)atentus(dot)com>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, bob(at)bob(dot)usuhs(dot)mil, pgsql-general(at)postgresql(dot)org
Subject: Re: cluster replication with intermezzo
Date: 2002-10-01 19:19:58
Message-ID: 200210011919.g91JJwf12742@candle.pha.pa.us
Lists: pgsql-general

Alvaro Herrera wrote:
> On 1 Oct 2002, Neil Conway wrote:
>
> > Speaking of which, I vaguely recall the OpenMOSIX guys talking about
> > possibly implementing clusterable shared memory (i.e. "shared" across
> > machines in a cluster) at some point in the future.
>
> To make PostgreSQL _really_ work in an environment like that, there
> would have to be some way to differentiate "local" shared memory from
> "remote", because accessing remote shmem would be much slower than
> accessing local shmem. What would be the gain versus having multi-master
> replication?
>
> ISTM horizontal partitioning of tables can give similar results without
> such a different architecture.

As I remember, to do locking, they transfer the shared memory to the
local machine, then do the locking --- seems kind of slow.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, bob(at)bob(dot)usuhs(dot)mil, pgsql-general(at)postgresql(dot)org
Subject: Re: cluster replication with intermezzo
Date: 2002-10-03 19:21:06
Message-ID: 3D9C98A2.666F5BF8@Yahoo.com
Lists: pgsql-general

Neil Conway wrote:
>
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> > Robert Williams <bob(at)bob(dot)usuhs(dot)mil> writes:
> > > I don't think this should be a problem,
> > > since as I understand it, table and row
> > > locking occurs at the postgres backend level
> > > and lock files are kept in a database table,
> >
> > No, the locking is all done in shared memory. Since you've got two
> > postmasters with two separate shared memory blocks, there is no
> > interlocking between the two sets of backends.
>
> Speaking of which, I vaguely recall the OpenMOSIX guys talking about
> possibly implementing clusterable shared memory (i.e. "shared" across
> machines in a cluster) at some point in the future. There would still
> be some problems with using PostgreSQL in that environment (e.g. the
> different semantics between NFS and normal filesystems), but it's an
> interesting possibility, at any rate.

Only if they implement cluster-shared memory supporting TAS (test-and-set).
Otherwise we would have to fall back to some sort of cluster-safe
implementation of semaphores for every single bit to lock ... and that, I
guess, would eat a lot of the neat performance someone expects to get from
that setup.
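
For readers unfamiliar with TAS, here is a rough sketch of a spinlock built on a test-and-set primitive. Python has no TAS instruction, so `TasCell` only emulates the atomic step with a `threading.Lock`; `TasCell`, `SpinLock`, and `run_workers` are hypothetical names, and none of this is PostgreSQL code. The point is only that the whole lock reduces to one atomic "set the flag and return its old value" operation, which is what cluster-shared memory would have to provide.

```python
import threading
import time


class TasCell:
    """Emulated test-and-set word.

    Real hardware performs this in one atomic instruction; the inner
    threading.Lock here is only a stand-in for that atomicity.
    """

    def __init__(self):
        self._flag = False
        self._atomic = threading.Lock()

    def test_and_set(self):
        """Atomically set the flag and return its previous value."""
        with self._atomic:
            old = self._flag
            self._flag = True
            return old

    def clear(self):
        with self._atomic:
            self._flag = False


class SpinLock:
    """Spinlock built only from test_and_set and clear."""

    def __init__(self):
        self._cell = TasCell()

    def acquire(self):
        # Spin until the previous value was False, i.e. we set the flag.
        while self._cell.test_and_set():
            time.sleep(0)  # yield so the current holder can make progress

    def release(self):
        self._cell.clear()


def run_workers(n_threads=2, n_increments=200):
    """Increment a shared counter under the spinlock; return the final count."""
    lock = SpinLock()
    total = [0]

    def worker():
        for _ in range(n_increments):
            lock.acquire()
            total[0] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


print(run_workers())
```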

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #