Re: [Sequoia] PostgreSQL Documentation of High Availability and Load
Hi,
Emmanuel Cecchet wrote:
I think that you can still have a matrix with the main
features (performance, data loss/failover/failback on node failure,
Disaster recovery, WAN, ...) and how each approach (master/slave, shared
disk, multi-master, ...) addresses the issue.
Yes, I certainly agree with that.
These are good questions to analyze a certain solution. As far as our
documentation is concerned, I think giving rough estimates for
categories of replication algorithms is sufficient (i.e. stating that
Multi Master Replication scales very well for read transactions,
but not as well for write transactions).
Even here I think that there is a common confusion between
performance and scalability. Most people think that by having multiple
nodes their query will run faster, which is obviously wrong if your
original workload does not saturate a single node.
Sure. Do you think that should be made clearer?
The replication
mechanisms are even adding overhead (usually perceived as increased
latency) to the query execution. It is ONLY when the workload increases
that you can see throughput going up (ideally somewhat close to the
workload increase) and query latency remaining stable. Unless you really
have a parallel query execution (that is only efficient for big queries
anyway), you will never see a performance improvement on a single query
execution since this is always the same database engine that executes
the query in the end.
I don't quite agree with that statement, but probably I'm just
misreading it. If you have enough concurrent transactions that you can
spread among the nodes, you'll certainly notice an improvement. After
all, it makes a huge difference whether your single node is processing
only ten or hundreds of concurrent transactions.
Of course, the number of concurrent transactions limits how far a
replication solution can scale. Having more nodes than concurrent
transactions does not make sense. (Of course with the exception of
parallel query execution.)
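
The limit described above can be put into a few lines of Python. This is
only a rough sketch under loud assumptions: each transaction runs on
exactly one node (no parallel query execution), the workload is
read-only, and every node delivers the same throughput. The names
busy_nodes, read_throughput and per_node_tps are made up for
illustration and do not come from Sequoia or Postgres-R.

```python
def busy_nodes(nodes: int, concurrent_txns: int) -> int:
    """Nodes that can have work at the same time.

    Assumption: one transaction is executed by one node only, so a
    cluster can never keep more nodes busy than it has transactions.
    """
    if nodes < 0 or concurrent_txns < 0:
        raise ValueError("counts must be non-negative")
    return min(nodes, concurrent_txns)


def read_throughput(nodes: int, concurrent_txns: int,
                    per_node_tps: float) -> float:
    """Idealized read-only throughput (hypothetical, no overhead)."""
    return busy_nodes(nodes, concurrent_txns) * per_node_tps


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, "nodes:", read_throughput(n, 10, 100.0), "tps")
```

With ten concurrent transactions, going from 4 to 16 nodes only helps up
to the tenth node in this model; the remaining nodes stay idle.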
But you are right that full replication (in shared nothing environments)
does not perform well with write-heavy workloads. At best it will run at
the speed of the fastest node in the cluster, but it will usually degrade
quickly. A good replication implementation will have a constant overhead
on query execution time (let's say a few milliseconds). Therefore the
impact will be quite different depending on whether this is a small
query or a long-running query. Adding a few milliseconds to a query that
takes seconds to execute is negligible, but adding the same time to a
sub-millisecond query will be a tremendous slowdown (in terms of latency).
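
To make the constant-overhead argument concrete, here is a small Python
sketch. The 3 ms overhead and the two query times are invented numbers
for illustration only, not measurements of any replication system, and
relative_slowdown is a hypothetical helper.

```python
def relative_slowdown(query_ms: float, overhead_ms: float) -> float:
    """Factor by which latency grows when a constant replication
    overhead is added to a query's own execution time."""
    if query_ms <= 0:
        raise ValueError("query_ms must be positive")
    if overhead_ms < 0:
        raise ValueError("overhead_ms must be non-negative")
    return (query_ms + overhead_ms) / query_ms


if __name__ == "__main__":
    overhead_ms = 3.0  # assumed constant overhead, illustration only
    for query_ms in (2000.0, 0.5):
        factor = relative_slowdown(query_ms, overhead_ms)
        print(f"{query_ms} ms query: latency x{factor:.4f}")
```

The same overhead is barely visible on the two-second query, while the
half-millisecond query becomes several times slower.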
To summarize, clustering solutions provide performance scalability
(stable latency, throughput increasing almost linearly with load) but
not performance improvement on individual query execution time.
Yes, for writing transactions, no for read-only ones (queries?). Or why
do you have to add overhead to read-only queries?
If the
client application is not multithreaded it is very unlikely that any
solution will improve the application performance.
Ehm.. I wouldn't refer to threading here. You can very well have
multiple single-process programs running on different nodes...
I'd keep referring to concurrency of transactions.
As an additional point, transactions including calls such as 'select
nextval' should be considered as write transactions with PostgreSQL.
Sure.
That might not be obvious for most users.
Agreed.
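
As an illustration of the nextval point, a load balancer that routes by
statement type would have to treat sequence calls as writes. The Python
sketch below is a naive regular-expression heuristic, not how Sequoia or
Postgres-R actually classify statements. It does not parse SQL, so it
misses user-defined functions with side effects and can be fooled by
comments or string literals. The helper names are hypothetical; nextval
and setval are the PostgreSQL sequence functions.

```python
import re

# Statements that obviously modify data or schema (incomplete list).
_WRITE_KEYWORDS = re.compile(
    r"^\s*(insert|update|delete|create|alter|drop|truncate)\b",
    re.IGNORECASE,
)

# Sequence functions that change sequence state even inside a SELECT.
_SEQUENCE_CALLS = re.compile(r"\b(nextval|setval)\s*\(", re.IGNORECASE)


def is_write_statement(sql: str) -> bool:
    """Naive check: does this statement have to run on every replica?"""
    return bool(_WRITE_KEYWORDS.search(sql) or _SEQUENCE_CALLS.search(sql))


def is_write_transaction(statements) -> bool:
    """A transaction is a write transaction if any statement writes."""
    return any(is_write_statement(s) for s in statements)


if __name__ == "__main__":
    print(is_write_statement("SELECT * FROM orders"))
    print(is_write_statement("SELECT nextval('orders_id_seq')"))
```
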
When configured with RAIDb-1,
I know RAID-1, but what's a RAIDb-1?
RAIDb is an acronym for Redundant Array of Inexpensive Databases.
You can find an article on this at
http://c-jdbc.objectweb.org/current/doc/RR-C-JDBC.pdf
Aha, thank you.
That's great that the work was revived in 8.2. Yes, Postgres-R is much
more embedded in Postgres but I was confusing it with Middle-R, which was
done later on with Bettina and Ricardo using a similar technique at the
middleware level.
Yeah, I thought you meant that one. I don't know Middle-R at all, sorry.
Seems similar to Sequoia. Did you base your work on Middle-R?
What are your development plans for Postgres-R?
To make it work and production ready as soon as possible. ;-) I'm
currently working on initialization and recovery.
Regards
Markus