Re: [Sequoia] PostgreSQL Documentation of High Availability and Load



Hi,

Emmanuel Cecchet wrote:
I think that you can still have a matrix with the main features (performance, data loss/failover/failback on node failure, disaster recovery, WAN, ...) and how each approach (master/slave, shared disk, multi-master, ...) addresses each of them.

Yes, I certainly agree with that.

These are good questions for analyzing a given solution. As far as our documentation is concerned, I think giving rough estimates for categories of replication algorithms is sufficient (i.e. stating that Multi Master Replication scales very well for reading transactions, but not very well for writing ones).
Even here I think that there is a common confusion between performance and scalability. Most people think that by having multiple nodes their query will run faster, which is obviously wrong if your original workload does not saturate a single node.
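To make the rough estimate for multi-master replication concrete, here is a minimal back-of-the-envelope sketch. It is not taken from any of the systems discussed in this thread; it assumes a simplified model in which reads are spread evenly over the nodes, every write is applied on every node, and all transactions cost the same. The function name and the default capacity are made up for illustration.

```python
def max_throughput(nodes: int, write_fraction: float,
                   node_capacity: float = 1000.0) -> float:
    """Upper bound on cluster throughput (transactions/s) under full replication.

    Assumed model (not from the thread): reads are balanced evenly,
    each write is executed on all nodes, all transactions cost the same.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    # Load that one client transaction puts on each node, on average:
    # a read lands on one node out of `nodes`, a write lands on all of them.
    per_node_load = (1.0 - write_fraction) / nodes + write_fraction
    return node_capacity / per_node_load


if __name__ == "__main__":
    for w in (0.0, 0.1, 0.5, 1.0):
        print(f"write fraction {w:.1f}: {max_throughput(4, w):.0f} txn/s on 4 nodes")
```

Under these assumptions a read-only workload scales with the number of nodes, while a write-only workload stays at the capacity of a single node.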

Sure. Do you think that should be made clearer?

The replication mechanisms even add overhead (usually perceived as increased latency) to the query execution. It is ONLY when the workload increases that you can see throughput going up (ideally somewhat close to the workload increase) and query latency remaining stable. Unless you really have parallel query execution (which is only efficient for big queries anyway), you will never see a performance improvement on a single query execution, since it is always the same database engine that executes the query in the end.

I don't quite agree with that statement, but probably I'm just misreading it. If you have enough concurrent transactions that you can spread among the nodes, you'll certainly notice an improvement. After all, it makes a huge difference whether your single node is processing only tens or hundreds of concurrent transactions.

Of course, the number of concurrent transactions limits how far a replication solution can scale. Having more nodes than concurrent transactions does not make sense. (Of course with the exception of parallel query execution.)
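The limit described above can be sketched in a few lines. This is only an illustration under the assumption that each transaction is executed by exactly one node (no parallel query execution); the function names are hypothetical.

```python
def busy_nodes(nodes: int, concurrent_transactions: int) -> int:
    """Nodes that can have work at the same time.

    Assumes each transaction runs on exactly one node,
    i.e. no parallel query execution.
    """
    if nodes < 0 or concurrent_transactions < 0:
        raise ValueError("arguments must be non-negative")
    return min(nodes, concurrent_transactions)


def idle_nodes(nodes: int, concurrent_transactions: int) -> int:
    """Nodes that cannot contribute because there is nothing left to spread."""
    return nodes - busy_nodes(nodes, concurrent_transactions)


if __name__ == "__main__":
    # 8 nodes but only 3 concurrent transactions: 5 nodes have nothing to do.
    print(busy_nodes(8, 3), idle_nodes(8, 3))
```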

But you are right that full replication (in shared nothing environments) does not perform well with write-heavy workloads. At best it will go at the speed of the fastest node in the cluster, but it will usually degrade quickly. A good replication implementation will have a constant overhead on query execution time (let's say a few milliseconds). Therefore the impact will be quite different depending on whether this is a small query or a long-running query. Adding a few milliseconds to a query that takes seconds to execute is negligible, but adding the same time to a sub-millisecond query will be a tremendous slowdown (in terms of latency). To summarize, clustering solutions provide performance scalability (stable latency, throughput increasing almost linearly with load) but not performance improvement on individual query execution time.
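The effect of a constant overhead on short versus long queries is simple arithmetic. The sketch below assumes an overhead of 2 ms purely for illustration; the thread only says "a few milliseconds", and the function name is made up.

```python
def relative_slowdown(query_ms: float, overhead_ms: float = 2.0) -> float:
    """Factor by which latency grows when a constant overhead is added.

    overhead_ms is an assumed value chosen for illustration only.
    """
    if query_ms <= 0:
        raise ValueError("query_ms must be positive")
    return (query_ms + overhead_ms) / query_ms


if __name__ == "__main__":
    for query_ms in (0.5, 10.0, 2000.0):
        factor = relative_slowdown(query_ms)
        print(f"{query_ms:8.1f} ms query becomes {factor:.3f}x slower")
```

With these numbers, a 2-second query is slowed down by about 0.1%, while a 0.5 ms query takes five times as long.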

Yes, for writing transactions, no for read-only ones (queries?). Or why do you have to add overhead to read-only queries?

If the client application is not multithreaded, it is very unlikely that any solution will improve the application performance.

Ehm.. I wouldn't refer to threading here. You can very well have multiple single-process programs running on different nodes...

I'd keep referring to concurrency of transactions.

As an additional point, transactions including calls such as 'select nextval' should be considered as write transactions with PostgreSQL.
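As a rough illustration of the point about 'select nextval', here is a naive sketch of how a load balancer might decide whether a statement has to be treated as a write. This is not how Sequoia or any other product actually classifies statements; the keyword list and the function name are assumptions, and simple keyword matching will misclassify statements that merely mention these words in string literals or that call other functions with side effects.

```python
import re

# Hypothetical, deliberately incomplete keyword list. nextval() and setval()
# change sequence state in PostgreSQL, so a SELECT calling them is not
# read-only from the point of view of replication.
_WRITE_PATTERN = re.compile(
    r"\b(insert|update|delete|nextval|setval)\b", re.IGNORECASE
)


def is_write_statement(sql: str) -> bool:
    """Return True if the statement must be treated as a write (naive check)."""
    return _WRITE_PATTERN.search(sql) is not None


if __name__ == "__main__":
    print(is_write_statement("SELECT nextval('order_id_seq')"))
    print(is_write_statement("SELECT * FROM orders WHERE id = 42"))
```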

Sure.

That might not be obvious for most users.

Agreed.

When configured with RAIDb-1,
I know RAID-1, but what's a RAIDb-1?
RAIDb is an acronym for Redundant Array of Inexpensive Databases.
You can find an article on this at http://c-jdbc.objectweb.org/current/doc/RR-C-JDBC.pdf

Aha, thank you.

It's great that the work was revived in 8.2. Yes, Postgres-R is much more embedded in Postgres, but I was confusing it with Middle-R, which was done later on with Bettina and Ricardo using a similar technique at the middleware level.

Yeah, I thought you meant that one. I don't know Middle-R at all, sorry. It seems similar to Sequoia. Did you base your work on Middle-R?

What are your development plans for Postgres-R?

To make it work and production ready as soon as possible. ;-) I'm currently working on initialization and recovery.

Regards

Markus




Copyright © 1996 – 2008 PostgreSQL Global Development Group