Re: [Sequoia] PostgreSQL Documentation of High Availability and Load
- From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
- To: Emmanuel Cecchet <emmanuel(dot)cecchet(at)continuent(dot)com>
- Cc: pgsql-docs(at)postgresql(dot)org, Sequoia general mailing list <sequoia(at)lists(dot)forge(dot)continuent(dot)org>, pgpool-hackers(at)pgfoundry(dot)org, pgcluster-general(at)pgfoundry(dot)org, community(at)gorda(dot)di(dot)uminho(dot)pt
- Subject: Re: [Sequoia] PostgreSQL Documentation of High Availability and Load
- Date: Tue, 21 Nov 2006 11:08:19 +0100
- Message-id: <4562D013(dot)2010504(at)bluegap(dot)ch>
Hello Emmanuel,
Emmanuel Cecchet wrote:
I just quickly went through the classification.
I don't think that your description of 'multi-master replication using
clustering' is correct. Oracle RAC is a shared disk approach and just
send cache invalidations to other nodes but not actual data. As the disk
is shared, data is only commited once to disk and there is a distributed
locking protocol to make nodes agree on a serializable transactional order.
I agree with that. AFAI understand, we are not entirely sure how much of
that we want to cover in our documentation.
You can have a look at the last ApacheCon presentation I gave
(http://www.continuent.org/uploads/sequoia/Resources/2006-08-15Cecchet_ApacheConAsia2006.pdf),
thank you for that link, I'll read through it ASAP.
there is a comparison of most solutions. I think that it would be cool
to have a comparison matrix with the different features that you may
need in an HA solution and see how the different solutions can provide
an answer to these problems.
We've already come to the consensus, that we don't want to list all
products in our documentation, but better do that on the website, where
it can easily be changed at any time. The documentation should explain
fundamentals.
What you describe as 'Statement-Based Replication Middleware' which is
in fact multi-master replication at the middleware level, is usually
more than just a simple proxy that broadcasts all queries. It must
provide 1-copy-serializability if you want the cluster to be consistent.
Yes, the current description is very much targeted at pgpool. Thank you
very much for your input, that helps to clarify things.
This means that write queries must be sent in the same serializable
order to every node. As SQL is interpreted, macros can be replaced on
the fly by the middleware with cluster-wide values. About sequences,
calls to sequences must be broadcast as if it were writes. This is
needed to update sequences the same way on all nodes. Note that
sequences are parts of unrollbackable changes of the database. This
means that even if a transaction rollbacks at one node, its sequence
will not be rollbacked meaning that all other nodes must also play
rollbacked transactions (at runtime or recovery time) if they were
accessing sequences or involving other operations that the database
cannot rollback.
From a performance standpoint, the middleware approach can balance
queries that come from the same client connection to different nodes
which is not the case when you are directly connected to a database
instance. Also failover can be made fully transparent at the middleware
level (any node failure can be completely hidden to the client). With
Sequoia, we even hide middleware failures by providing transparent
failover code in our driver.
The discussion should add a section describing real HA features (not
just load balancing):
- Do you lose data on failure?
- Will my transactions fail on a node failure?
- Is failover transparent? Does it need special support in the client
application?
- Is failback a manual process?
- How long does it take to restart/resynchronize a node?
- Can I add nodes on the fly to the cluster?
- Can I do maintenance operations without stopping the cluster?
- Can I upgrade the cluster without interruptions (e.g. migrate from
PostgreSQL 7.4 to 8.0 without interruption) ?
- WAN support? How network partitions are handled?
These are good questions to analyze a certain solution. As far as our
documentation is concerned, I think giving rough estimates for
categories of replication algorithms is sufficient (i.e. stating that
Multi Master Replication scales very good for reading transactions, but
not very well for writing ones).
When configured with RAIDb-1,
I know RAID-1, but what's a RAIBd-1?
Sequoia provide a multi-master (full)
replication at the middleware level. Note that Sequoia also allows for
partial replication or data partitioning (but the granularity is the
table meaning that you can only distribute tables on different nodes but
not slice a table and split it on different nodes as you describe in
Data Partitioning).
Thanks again for this great work and I hope this will help improve the
documentation.
Sure, thank you very much for your input. I'm glad we have the
possibility to cover sequoia, too.
Note that there is also Postgres-R that is another approach to
middleware-based replication.
Middleware? Mrs. Kemme also did some research with middleware
replication, but I wouldn't exactly call Postgres-R a middleware-based
replication solution. I've ported Postgres-R to PostgreSQL 8.2 and can
assure you that it's very well embedded into the backend of PostgreSQL.
Probably too well for some ;-) You might want to check out my (slightly
outdated) website www.postgres-r.org.
Regards
Markus
Home |
Main Index |
Thread Index