Replication Documentation

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-patches(at)postgresql(dot)org
Subject: Replication Documentation
Date: 2006-08-01 20:05:06
Message-ID: 60bqr4jk7h.fsf_-_@dba2.int.libertyrms.com
Lists: pgsql-hackers pgsql-patches

Here's a patch to add in the material on replication recently
discussed on pgsql.docs. I'm not thrilled that there were only a few
comments made; I'd be happy to see "slicing and dicing" to see this
made more useful.

Index: filelist.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/filelist.sgml,v
retrieving revision 1.44
diff -c -u -r1.44 filelist.sgml
--- filelist.sgml 12 Sep 2005 22:11:38 -0000 1.44
+++ filelist.sgml 1 Aug 2006 20:00:00 -0000
@@ -44,6 +44,7 @@
<!entity config SYSTEM "config.sgml">
<!entity user-manag SYSTEM "user-manag.sgml">
<!entity wal SYSTEM "wal.sgml">
+<!entity replication SYSTEM "replication.sgml">

<!-- programmer's guide -->
<!entity dfunc SYSTEM "dfunc.sgml">
Index: postgres.sgml
===================================================================
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/postgres.sgml,v
retrieving revision 1.77
diff -c -u -r1.77 postgres.sgml
--- postgres.sgml 10 Mar 2006 19:10:48 -0000 1.77
+++ postgres.sgml 1 Aug 2006 20:00:00 -0000
@@ -155,6 +155,7 @@
&diskusage;
&wal;
&regress;
+ &replication;

</part>

---- Then add the following as .../doc/src/sgml/replication.sgml

<!-- $PostgreSQL$ -->

<chapter id="replication"> <title> Replication </title>

<indexterm><primary>replication</primary></indexterm>

<para> People frequently ask what replication options are
available for <productname>PostgreSQL</productname>. Unfortunately,
there are so many approaches and models, each useful for
different purposes, that things tend to get confusing.
</para>

<para> At perhaps the most primitive level, one might use <xref
linkend="backup"> tools, whether <xref linkend="app-pgdump"> or
<xref linkend="continuous-archiving"> to create additional copies of
databases. This <emphasis>doesn't</emphasis> provide any way to
keep the replicas up to date; to bring the state of things to a
different point in time requires bringing up another copy. There is
no way, with these tools, for updates on a <quote>master</quote>
system to automatically propagate to the replicas.</para>

<sect1> <title> Categorization of Replication Systems </title>

<para> Looking at replication systems, there are a number of ways in
which they may be viewed:

<itemizedlist>

<listitem><para> Single master versus multimaster.</para>

<para> That is, whether there is a single database considered
<quote>master</quote>, where all update operations are required
to be submitted, or the alternative, multimaster, where updates
may be submitted to any of several databases.</para>

<para> Multimaster replication is vastly more complex and
expensive, because of the need to deal with the possibility of
conflicting updates. The simplest example of this is where a
replicated database manages inventory; the question is, what
happens when requests go to different database nodes requesting
a particular piece of inventory?</para>

<para> Synchronous multimaster replication introduces the need
to distribute locks across the systems, which, in research work
done with Postgres-R and Slony-II, has proven to be very
expensive. </para></listitem>

<listitem><para> Synchronous versus asynchronous</para>

<para>Synchronous systems are ones where updates must be
accepted on all the databases before they are permitted to
<command>COMMIT</command>. </para>

<para> Asynchronous systems propagate updates to the other
databases later. This permits the possibility that one database
may have data significantly behind others. Whether being
behind is acceptable will depend on the nature of the
application.</para>

<para> Asynchronous multimaster replication introduces the
possibility that conflicting updates will be accepted by
multiple nodes, as they don't know, at <command>COMMIT</command>
time, that the updates conflict. It is then necessary to have
some sort of conflict resolution system, which can't really be
generalized as a generic database facility. An instance of this
that is commonly seen is in the <productname>PalmOS
HotSync</productname> system; the <quote>general policy</quote>
when conflicts are noticed is to allow both conflicting records
to persist until a human can intervene. That may be quite
acceptable for an address book; it's <emphasis>not</emphasis>
fine for OLTP systems. </para>

</listitem>

<listitem><para> Update capture methods </para>

<para> Common methods include having triggers on tables,
capturing SQL statements, and capturing transaction log (WAL)
updates </para>

<itemizedlist>

<listitem><para> Triggers, as used in eRServer and Slony-I,
have the advantage of capturing updates at the end of
processing when all column values have been finalized. The use
of transaction visibility (MVCC) and ordering can provide
strong guarantees on consistency. </para>

<para> Of course, firing a trigger for each tuple update comes
at a considerable cost: a statement that touches 10,000
tuples will fire the trigger 10,000 times, and be transformed, on
the subscriber, into 10,000 SQL statements.</para></listitem>

<listitem><para> Statement capture almost exactly reverses the
issues, as compared to triggers.</para>

<para> There are no strong guarantees on consistency: any sort
of nondeterministic query can <quote>corrupt</quote> things by
introducing differences between nodes. Here are four examples
of cases where naive statement capture is sure to get things
wrong:</para>

<itemizedlist>
<listitem><para><command>INSERT INTO mytable (txntime,
product, quantity, taxes, total) values (now(), 'AB-275', 10,
45, 250.00);</command></para> <para> Some replication systems
parse the queries, replacing date requests with
timestamps. </para>
</listitem>
<listitem><para><command>INSERT INTO table2 VALUES (random() *
50);</command></para> <para> In this case, nondeterminism is
very much the point!</para>
</listitem>
<listitem><para>Any use of sequnce values as defaults,
particularly with per-connection value caching, will open up
occasions for values to diverge between
nodes.</para></listitem>

<listitem><para><command>INSERT INTO tab1 (txn_type, tdate,
quantity, units, price) SELECT * FROM tab2 ORDER BY txn_type
limit 50;</command></para>

<para> There are many variations on this which will turn out
badly: </para>
<itemizedlist>
<listitem><para>If there are default fields in tab1
that are set using sequences, the only way to even
hope for the same ordering is to have
an <command>ORDER BY</command> clause that ensures
identical ordering on both hosts.</para></listitem>
<listitem><para> If the ordering isn't a suitable
total ordering, the requests for data from tab2 may
find different data on different
hosts.</para></listitem>
<listitem><para>Columns with a default
of <function>now()</function> will be troublesome as
mentioned earlier, and this makes the problem harder
because unlike in the earlier query, where one might
substitute '2006-09-02 04:42:23-00'
for <function>now()</function>, this requires a
substantial rewriting of the query.</para></listitem>
</itemizedlist>
</listitem>
</itemizedlist>

</listitem>
</itemizedlist>

</listitem>

</itemizedlist>

</para>
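
The inventory question raised above can be illustrated with a toy
simulation. This is only a sketch, written in Python rather than SQL,
and everything in it (the Node class, reserve(), exchange()) is
hypothetical and not taken from any of the replication systems
discussed. It shows two asynchronous masters each accepting a request
for the last unit of an item, with the conflict becoming visible only
when the updates are exchanged.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for one master in an asynchronous multimaster setup."""
    name: str
    stock: int
    pending: list = field(default_factory=list)  # committed locally, not yet propagated

    def reserve(self, qty: int) -> bool:
        """Commit a reservation using only this node's local view of the stock."""
        if qty > self.stock:
            return False
        self.stock -= qty
        self.pending.append(qty)
        return True


def exchange(a: Node, b: Node) -> bool:
    """Propagate pending updates both ways; return True if stock was oversold."""
    for src, dst in ((a, b), (b, a)):
        for qty in src.pending:
            dst.stock -= qty
    a.pending.clear()
    b.pending.clear()
    return a.stock < 0


# Both nodes start with one unit of the same item.
node_a = Node("a", stock=1)
node_b = Node("b", stock=1)

# Each node accepts a request for that unit; neither knows about the other yet.
accepted_a = node_a.reserve(1)
accepted_b = node_b.reserve(1)

# Only at propagation time does the conflict show up.
oversold = exchange(node_a, node_b)
print(accepted_a, accepted_b, oversold, node_a.stock, node_b.stock)
```

A synchronous system would instead have to coordinate between the
nodes before either COMMIT, which is where the locking cost mentioned
above comes from.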

</sect1>
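
The difference between replaying statements and replaying finalized
row values can be sketched as follows. This is a minimal illustration,
not how any of the systems named here are implemented: it assumes
Python's sqlite3 module with in-memory databases standing in for
PostgreSQL nodes, and the helper names (make_node, contents) are
invented for the example. SQLite's random() returns a 64-bit integer,
so two independent evaluations will almost certainly differ.

```python
import sqlite3


def make_node():
    """Create an in-memory database standing in for one replication node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table2 (val INTEGER)")
    return conn


def contents(conn):
    """Return the table contents in a stable order for comparison."""
    return conn.execute("SELECT val FROM table2 ORDER BY val").fetchall()


master = make_node()
replica_stmt = make_node()   # fed by replaying the SQL text
replica_rows = make_node()   # fed by copying the resulting row values

# A nondeterministic statement: random() is evaluated afresh on every node.
stmt = "INSERT INTO table2 VALUES (random())"

# Naive statement capture: run the same SQL text on master and replica.
master.execute(stmt)
replica_stmt.execute(stmt)

# Value capture (roughly what a row-level trigger sees): copy final values.
for (val,) in master.execute("SELECT val FROM table2").fetchall():
    replica_rows.execute("INSERT INTO table2 VALUES (?)", (val,))

print("statement replay matches:", contents(master) == contents(replica_stmt))
print("value replay matches:    ", contents(master) == contents(replica_rows))
```

The same divergence applies to now() and to sequence defaults; the
sketch uses random() only because it is the easiest case to reproduce.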

<sect1 id="replicationsystems"> <title> PostgreSQL Replication Systems and Their Uses </title>

<para> Based on the preceding taxonomy, we may categorize various
replication systems, which should be helpful in determining what
they may be best used for, and whether they are compatible with
your <quote>use case.</quote></para>

<sect2><title> Slony-I</title>

<para> Slony-I is a single-master, multiple-subscriber
asynchronous replication system that captures updates using
triggers. </para>
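
To make the per-row nature of trigger capture concrete, here is a
small sketch. It assumes Python's sqlite3 module as a stand-in for
PostgreSQL, and the inventory and change_log tables and the log_update
trigger are hypothetical names invented for the illustration; Slony-I's
actual log tables and triggers are different. The point is only that a
single UPDATE statement produces one captured change per affected row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (id INTEGER PRIMARY KEY, quantity INTEGER);
    -- Hypothetical change log, standing in for a replication queue.
    CREATE TABLE change_log (id INTEGER, new_quantity INTEGER);
    CREATE TRIGGER log_update AFTER UPDATE ON inventory
    FOR EACH ROW
    BEGIN
        INSERT INTO change_log VALUES (NEW.id, NEW.quantity);
    END;
""")

# Populate 10,000 rows, matching the figure used earlier in the text.
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [(i, 100) for i in range(10_000)],
)

# One SQL statement that touches every row.
conn.execute("UPDATE inventory SET quantity = quantity - 1")

logged = conn.execute("SELECT count(*) FROM change_log").fetchone()[0]
print("rows logged by the trigger:", logged)
```

Each logged row carries the finalized column value, which is what lets
a subscriber apply the change without re-evaluating the original
statement.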

<para> For many systems, it is not clear how to initialize
replication on a new node some time after a system has been set up
in production. Slony-I was specifically designed to provide the
ability to introduce new nodes without the need to interrupt
activity on the master node. </para>

<para> It has the particular merit that, by using only components
internal to PostgreSQL, it is compatible with multiple versions of
PostgreSQL. This makes it especially well suited to upgrading
systems from one version of PostgreSQL to another without
requiring a long outage. </para>

<para> It suffers from three particular problems:</para>

<itemizedlist>
<listitem><para> Despite improvements from earlier versions, it
is fairly complex to configure and administer.</para></listitem>
<listitem><para> It can only replicate changes that can be
captured using triggers. </para>

<para> Sequences are handled, via polling, but
Slony-I <emphasis>does not</emphasis> provide an
automatic way to replicate other sorts of objects. </para></listitem>

<listitem><para> The handling of DDL changes is somewhat fragile,
and exists as something of a bag on the side. </para>

<para> There has been loose discussion as to how to address
that; useful comprehensive answers have not emerged. </para>
</listitem>
</itemizedlist>

<sect3> <title> Use Cases </title>

<para> Slony-I has proven useful for the following sorts of usages: </para>
<itemizedlist>

<listitem><para> Upgrading from one PostgreSQL release to
another with only brief downtime. </para></listitem>

<listitem><para> Providing extra database copies that are nearly
up to date that may be used to offload read activity from the
<quote>master</quote> database system. </para></listitem>

<listitem><para> Providing extra database copies that are nearly
up to date that may be used as failover targets. </para>
</listitem>

</itemizedlist>

</sect3>

</sect2>

<sect2><title> pgpool </title>

<para> <application>pgpool</application> was initially created by
Tatsuo Ishii as a portable alternative to Java connection pool
modules. He subsequently observed that it wouldn't take very much
effort to extend it to create a simple replication system: if it
is forwarding SQL queries to a PostgreSQL instance, extending that
to two databases is very straightforward. </para>

<para> It suffers, by nature, from the problems associated with
replicating using capture of SQL statements; any sort of
nondeterminism in the replicated statements will cause the
databases to diverge. </para>

<para> On the other hand, it is very easy to install and
configure; for users with simple requirements, that can
suffice. </para>

<para> A <application>pgpool-2</application> is under way which
introduces a more sophisticated query parser to try to address the
nondeterminism issues; that may limit ongoing support for the
legacy version.</para>

<sect3> <title> Use Cases </title>

<para> pgpool has proven useful for the following sorts of usages: </para>
<itemizedlist>

<listitem><para> Dividing read-only database activity between
two database instances. </para></listitem>

<listitem><para> Providing a simple replication system for
systems that do not make use of nondeterministic update
queries. </para></listitem>

</itemizedlist>

</sect3>

</sect2>

<sect2> <title> PITR - Point In Time Recovery </title>

<para> If you have a database cluster that supports a large number
of database instances (<emphasis>e.g.</emphasis> - varying values
for PGDATABASE), connection-managing systems like pgpool and
systems like Slony-I which require a manager process for each
database for each node that is replicated will turn out quite
badly.</para>

<para> For instance, if a database cluster hosts 300 databases, as
might be the case in a <quote>web hosting</quote> situation,
Slony-I would need 300 slon processes on each node to replicate
all of this data. </para>

<para> PITR is likely to be more suitable in this case; it
doesn't provide you with a running, usable replica, but it can
recover <emphasis>all</emphasis> of the tables in
<emphasis>all</emphasis> of the databases on the backend.</para>

</sect2>

<sect2> <title> Postgres-R </title>

<para> This has been a research project at McGill University,
building a multimaster synchronous replication system which uses a
group communications system (<emphasis>e.g.</emphasis> - <ulink
url="http://www.spread.org/"> Spread</ulink>) to control
propagation of update requests, which it captures via adding
<quote>hooks</quote> to the database engine to detect
changes. </para>

<para> Being a research project, the goal has been to learn about
replication rather than to provide a <quote>production grade</quote>
replication system. For a considerable period of time it
was only at all usable on rather old releases of PostgreSQL; it is
now available for recent releases. </para>

<para> The handling of DDL changes has long been somewhat
controversial; several attempts to implement DDL handlers have
been made, none of which has yet <quote>stuck.</quote> </para>

<para> The Slony-II project inherited directly from Postgres-R,
with an intent to create a multimaster synchronous replication
system atop a group communications system, but then to proceed to
something more of <quote>production grade</quote>. </para>

<para> The notable distinction from Postgres-R was that, in order
to find conflicts earlier, and to diminish the amount of work
needing to be done at the synchronization point, Slony-II would
try to publish and promote lock requests as soon as possible. (It
is possible for this to worsen behaviour in some cases.)</para>

<para> Unfortunately several problems emerged: </para>

<itemizedlist>

<listitem><para> The available open source group communications
systems turn out to be neither fast enough nor reliable enough
for the purpose. </para></listitem>

<listitem><para> One of the goals was for there to be as little
need as possible to modify applications to deal with
replication. </para>

<para> Unfortunately, there turn out to be some cases where
competing updates (e.g. - for updates to account balances) would
cause multimaster replication to reject transactions due to
concurrency problems with high frequency. </para>
</listitem>

</itemizedlist>

<para> As a result of those problems, Slony-II efforts have fallen
off somewhat. </para>

<para> The remaining developers plan to combine their efforts on
these two projects. There are working prototypes, but it is not
clear when <quote>production grade</quote> versions will
emerge. </para>

</sect2>

</sect1>

</chapter>

<!-- Keep this comment at the end of the file Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:postgres.sgml
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/catalog")
sgml-local-ecat-files:nil
End: -->

--
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/
What's another word for synonym?


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Replication Documentation
Date: 2006-08-01 20:40:28
Message-ID: 20060801204028.GB19514@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

Chris Browne wrote:
> Here's a patch to add in the material on replication recently
> discussed on pgsql.docs. I'm not thrilled that there were only a few
> comments made; I'd be happy to see "slicing and dicing" to see this
> made more useful.

s/e.g. -/e.g.,/
s/ - /&ndash;/

The indentation of the SGML file seems at odds with our conventions (we
don't use tabs, for one thing.)

You mention this:

> <para> Common methods include having triggers on tables,
> capturing SQL statements, and capturing transaction log (WAL)
> updates </para>

However you don't mention anything about WAL captures. Mentioning that
PITR is one of these would be good.

In the last few paragraphs, the title is about Postgres-R but then you
comment on Slony-II. Should the title mention both?

> <para> As a result of those problems, Slony-II efforts have fallen
> off somewhat. </para>

s/those/these/ ?

Otherwise looks good to my untrained eyes.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "korryd(at)enterprisedb(dot)com" <korryd(at)enterprisedb(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Replication Documentation
Date: 2006-08-01 21:36:33
Message-ID: 1154468193.6827.9.camel@sakai.localdomain
Lists: pgsql-hackers pgsql-patches

s/sequnce/sequence/

Nice work!

--
Korry Douglas korryd(at)enterprisedb(dot)com
EnterpriseDB http://www.enterprisedb.com


From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org, cbbrowne(at)acm(dot)org
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Replication Documentation
Date: 2006-08-01 22:34:57
Message-ID: 20060802.073457.85414937.t-ishii@sraoss.co.jp
Lists: pgsql-hackers pgsql-patches

Thanks for mentioning pgpool!

> <sect2><title> pgpool </title>
>
> <para> <application>pgpool</application> was initially created by
> Tatsuo Ishii as a portable alternative to Java connection pool
> modules. He subsequently observed that it wouldn't take very much
> effort to extend it to create a simple replication system: if it
> is forwarding SQL queries to a PostgreSQL instance, extending that
> to two databases is very straightforward. </para>
>
> <para> It suffers, by nature, from the problems associated with
> replicating using capture of SQL statements; any sort of
> nondeterminism in the replicated statements will cause the
> databases to diverge. </para>
>
> <para> On the other hand, it is very easy to install and
> configure; for users with simple requirements, that can
> suffice. </para>
>
> <para> A <application>pgpool-2</application> is under way which
> introduces a more sophisticated query parser to try to address the
> nondeterminism issues; that may limit ongoing support for the
> legacy version.</para>

pgpool-II (not pgpool-2, please) does not try to resolve
nondeterminism issues; rather, it tries to add parallel SELECT query
execution. Also, we will continue to support the legacy version until
pgpool-II becomes stable enough.

Also, you might want to add the pgpool development site URL.

FYI, pgpool-II presentation material for PostgreSQL Anniversary Summit
can be obtained from:
http://www.sraoss.co.jp/event_seminar/2006/pgpool_feat_and_devel.pdf
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: Replication Documentation
Date: 2006-08-01 23:45:53
Message-ID: 200608020145.53751.peter_e@gmx.net
Lists: pgsql-hackers pgsql-patches

Chris Browne wrote:
> Here's a patch to add in the material on replication recently
> discussed on pgsql.docs. I'm not thrilled that there were only a few
> comments made; I'd be happy to see "slicing and dicing" to see this
> made more useful.

The agreed-to process was

1. post information on pgsql-general
1.a. solicit comments
2. put information page on web site
3. link from documentation to web site

You seem to have short-circuited all that.

I don't think this sort of material belongs directly into the PostgreSQL
documentation.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: Replication Documentation
Date: 2006-08-02 00:22:17
Message-ID: 44CFF039.10408@commandprompt.com
Lists: pgsql-hackers pgsql-patches

>
> 1. post information on pgsql-general
> 1.a. solicit comments
> 2. put information page on web site
> 3. link from documentation to web site
>
> You seem to have short-circuited all that.
>
> I don't think this sort of material belongs directly into the PostgreSQL
> documentation.

It might be interesting to have some links in the external projects area
for replication, but a section of its own doesn't seem relevant.

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: [HACKERS] Replication Documentation
Date: 2006-08-02 00:39:08
Message-ID: 20060802003908.GD20401@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

Joshua D. Drake wrote:

> >I don't think this sort of material belongs directly into the PostgreSQL
> >documentation.

Why not?

> It might be interesting to have some links in the external projects area
> for replication, but a section of its own doesn't seem relevant.

I disagree about "having some links". Maybe we should consider adding
this as a section in the external projects chapter, instead of having a
chapter of its own, but "some links" seems a little short on actual
contents.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: [HACKERS] Replication Documentation
Date: 2006-08-02 00:46:58
Message-ID: 44CFF602.6050608@commandprompt.com
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>
>>> I don't think this sort of material belongs directly into the PostgreSQL
>>> documentation.
>
> Why not?

Well Peter said that, not me :)

>
>> It might be interesting to have some links in the external projects area
>> for replication, but a section of its own doesn't seem relevant.
>
> I disagree about "having some links". Maybe we should consider adding
> this as a section in the external projects chapter, instead of having a
> chapter of its own, but "some links" seems a little short on actual
> contents.

O.k. more specifically, I think that the content (even if it is a
section) probably deserves discussion in the external projects section.

Joshua D. Drake

>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: [HACKERS] Replication Documentation
Date: 2006-08-02 01:10:03
Message-ID: 20060802011003.GG20401@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

Joshua D. Drake wrote:
> Alvaro Herrera wrote:
> >Joshua D. Drake wrote:
> >
> >>>I don't think this sort of material belongs directly into the PostgreSQL
> >>>documentation.
> >
> >Why not?
>
> Well Peter said that, not me :)

I know, but I thought I'd post one message instead of two. (In fact I
didn't even think about it -- I just assume it's clear.)

> >>It might be interesting to have some links in the external projects area
> >>for replication, but a section of its own doesn't seem relevant.
> >
> >I disagree about "having some links". Maybe we should consider adding
> >this as a section in the external projects chapter, instead of having a
> >chapter of its own, but "some links" seems a little short on actual
> >contents.
>
> O.k. more specifically, I think that the content (even if it is a
> section) probably deserves discussion in the external projects section.

Sure, see my suggestion above.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: [PATCHES] Replication Documentation
Date: 2006-08-02 02:27:25
Message-ID: 200608020427.26195.peter_e@gmx.net
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> > >I don't think this sort of material belongs directly into the
> > > PostgreSQL documentation.
>
> Why not?

PostgreSQL documentation (or any product documentation) should be
factual: describe what the software does and give advice on its use.
This should be mostly independent of the external circumstances,
because people will still read that documentation three or four years
from now.

The proposed text is, at least partially, journalistic: it evaluates
competing ideas, gives historical and anecdotal information, reports on
current events, and makes speculations about the future. That is the
sort of material that is published in periodicals or other volatile
media.

At the summit, we resolved, for precisely these reasons, to keep the
journalistic parts on the web site, for clear separation from the
shipped product and for easier updates (and for easier reference as
well, because the PostgreSQL documentation is not the single obvious
place to look for it) and refer to it from the documentation.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: Replication Documentation
Date: 2006-08-02 02:27:33
Message-ID: 877j1rswh6.fsf@wolfe.cbbrowne.com
Lists: pgsql-hackers pgsql-patches

peter_e(at)gmx(dot)net (Peter Eisentraut) wrote:
> Chris Browne wrote:
>> Here's a patch to add in the material on replication recently
>> discussed on pgsql.docs. I'm not thrilled that there were only a few
>> comments made; I'd be happy to see "slicing and dicing" to see this
>> made more useful.
>
> The agreed-to process was
>
> 1. post information on pgsql-general
> 1.a. solicit comments
> 2. put information page on web site
> 3. link from documentation to web site
>
> You seem to have short-circuited all that.
>
> I don't think this sort of material belongs directly into the PostgreSQL
> documentation.

I don't recall that anyone agreed to do anything in particular, let
alone the process being formalized thus.

Bruce was looking for there to be some form of overview of the free
replication options so he'd have some kind of tale to tell about it.
Apparently the issue comes up fairly frequently.

1. I posted information on pgsql-docs
1.a. I solicited comments
2. There being not many of those, I have put together something that
could fit into the documentation.

I frankly don't care all that much where the material goes; if it
ought to be some place else other than in the documentation tree
proper, I'm fine with that.
--
select 'cbbrowne' || '@' || 'gmail.com';
http://linuxdatabases.info/info/postgresql.html
"How much more helpful could I be than to provide you with the
appropriate e-mail address? I could engrave it on a clue-by-four and
deliver it to you in Chicago, I suppose." -- Seen on Slashdot...


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: [PATCHES] Replication Documentation
Date: 2006-08-02 03:02:43
Message-ID: 200608020302.k7232h229060@momjian.us
Lists: pgsql-hackers pgsql-patches


I was thinking of something similar to our encryption section:

http://www.postgresql.org/docs/8.1/static/encryption-options.html

The idea being to define issues like multi/single master, async vs.
sync, and mention the projects which are in each category.

---------------------------------------------------------------------------

Peter Eisentraut wrote:
> Alvaro Herrera wrote:
> > > >I don't think this sort of material belongs directly into the
> > > > PostgreSQL documentation.
> >
> > Why not?
>
> PostgreSQL documentation (or any product documentation) should be
> factual: describe what the software does and give advice on its use.
> This should be mostly independent of the external circumstances,
> because people will still read that documentation three or four years
> from now.
>
> The proposed text is, at least partially, journalistic: it evaluates
> competing ideas, gives historical and anecdotal information, reports on
> current events, and makes speculations about the future. That is the
> sort of material that is published in periodicals or other volatile
> media.
>
> At the summit, we resolved, for precisely these reasons, to keep the
> journalistic parts on the web site, for clear separation from the
> shipped product and for easier updates (and for easier reference as
> well, because the PostgreSQL documentation is not the single obvious
> place to look for it) and refer to it from the documentation.
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/
>

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Replication Documentation
Date: 2006-08-02 03:03:35
Message-ID: 44D01607.9020008@commandprompt.com
Lists: pgsql-hackers pgsql-patches


>> I don't think this sort of material belongs directly into the PostgreSQL
>> documentation.
>
> I don't recall that anyone agreed to do anything in particular, let
> alone the process being formalized thus.
>
> Bruce was looking for there to be some form of overview of the free
> replication options so he'd have some kind of tale to tell about it.
> Apparently the issue comes up fairly frequently.

Then let's FAQ it.

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/


From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: [PATCHES] Replication Documentation
Date: 2006-08-02 13:07:03
Message-ID: 44D0A377.7080706@bluegap.ch
Lists: pgsql-hackers pgsql-patches

Hi,

Bruce Momjian wrote:
> The idea being to define issues like multi/single master, async vs.
> sync, and mention the projects which are in each category.

You could even add shared-nothing vs. shared-disk nodes.

Generally I'd say it makes sense to 'educate' people, but does it really
make sense to explain all that if there is no replication solution for
most of these combinations?

I'd vote for an external (not in the documentation) information site
about replication solutions. There we can put all the information we see
fit (even 'journalistic' ones).

I might change my mind once we have multiple replication solutions
covering most situations. ;-)

I like what Chris wrote [1], and how he wrote it: an overview of existing
and upcoming replication solutions.

Regards

Markus

[1]: I can't find Chris' original message. My answer to it is in the
archives, but not the original message. Why is that? (Thread view says
'message not available'). My answer contains Chris' text, though:
http://archives.postgresql.org/pgsql-docs/2006-07/msg00019.php


From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Replication Documentation
Date: 2006-08-02 13:28:25
Message-ID: 44D0A879.6000206@bluegap.ch
Lists: pgsql-hackers pgsql-patches

Hello,

Peter Eisentraut wrote:
> 1. post information on pgsql-general
> 1.a. solicit comments
> 2. put information page on web site
> 3. link from documentation to web site

I don't remember such a clear agreement either. I'm glad Chris has
written something. And posting it to -docs seems a much better fit, IMHO.

Also, I think we didn't really agree on where exactly to put what
information. See my previous mail on -hackers for my opinion on that.

> I don't think this sort of material belongs directly into the PostgreSQL
> documentation.

I agree with that.

Regards

Markus


From: "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replication Documentation
Date: 2006-08-02 15:52:20
Message-ID: 1154533940.291915.258900@75g2000cwc.googlegroups.com
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut wrote:
> Alvaro Herrera wrote:
> > > >I don't think this sort of material belongs directly into the
> > > > PostgreSQL documentation.
> >
> > Why not?
>
> PostgreSQL documentation (or any product documentation) should be
> factual: describe what the software does and give advice on its use.
> This should be mostly independent of the external circumstances,
> because people will still read that documentation three or four years
> from now.
>
> The proposed text is, at least partially, journalistic: it evaluates
> competing ideas, gives historical and anecdotal information, reports on
> current events, and makes speculations about the future. That is the
> sort of material that is published in periodicals or other volatile
> media.

I can see value in documenting what replication systems are known to
work (for some definition of work) with a given release in the
documentation for that release. Five years down the road when I'm
trying to implement replication for a client who's somehow locked into
postgres 8.2 (for whatever reason), it would be very helpful to know
that slony1.2 is an option. I don't know if this is sufficient
justification.

Including a separate page on the history of postgres replication to
date also makes some sense, at least to me. It should be relatively
easy to maintain.

If we do talk about replication, then including a probably separate and
presumably quite static page on the taxonomy of replication seems
necessary. As Chris notes, the term replication by itself can mean
quite a number of things.

> At the summit, we resolved, for precisely these reasons, to keep the
> journalistic parts on the web site, for clear separation from the
> shipped product and for easier updates (and for easier reference as
> well, because the PostgreSQL documentation is not the single obvious
> place to look for it) and refer to it from the documentation.
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/


From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Andrew Hammond <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Subject: Re: Replication Documentation
Date: 2006-08-02 16:42:40
Message-ID: 44D0D600.30201@bluegap.ch
Lists: pgsql-hackers pgsql-patches

Hi,

Andrew Hammond wrote:
> I can see value in documenting what replication systems are known to
> work (for some definition of work) with a given release in the
> documentation for that release. Five years down the road when I'm
> trying to implement replication for a client who's somehow locked into
> postgres 8.2 (for whatever reason), it would be very helpful to know
> that slony1.2 is an option. I don't know if this is sufficient
> justification.

Please keep in mind that most replication solutions (that I know of)
are quite independent of the PostgreSQL version used. Thus, which
version of PostgreSQL can be used with which version of a replication
system is better covered in the documentation of the replication
system. Otherwise you would have to update the PostgreSQL
documentation for new releases of your favorite replication system,
which seems likely to lead to confusion.

> Including a separate page on the history of postgres replication to
> date also makes some sense, at least to me. It should be relatively
> easy to maintain.

I agree that having such a 'replication guide for users of PostgreSQL'
is a good thing to have. But I think not much of that should be part of
the official PostgreSQL documentation - mainly because the replication
solutions are not part of PostgreSQL.

Regards

Markus


From: "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replication Documentation
Date: 2006-08-03 16:11:38
Message-ID: 1154621498.058687.178800@75g2000cwc.googlegroups.com
Lists: pgsql-hackers pgsql-patches

Markus Schiltknecht wrote:
> Hi,
>
> Andrew Hammond wrote:
> > I can see value in documenting what replication systems are known to
> > work (for some definition of work) with a given release in the
> > documentation for that release. Five years down the road when I'm
> > trying to implement replication for a client who's somehow locked into
> > postgres 8.2 (for whatever reason), it would be very helpful to know
> > that slony1.2 is an option. I don't know if this is sufficient
> > justification.
>
> Please keep in mind, that most replication solutions (that I know of)
> are quite independent from the PostgreSQL version used. Thus,
> documenting which version of PostgreSQL can be used with which version
> of a replication system should better be covered in the documentation of
> the replication system.

I would agree to this with the caveat that there needs to be something
in the postgres documentation that points people to the various
replication systems available.

> Otherwise you would have to update the
> PostgreSQL documentation for new releases of your favorite replication
> system - which seems to lead to confusion.

Yeah, updating the docs based on other software releases would suck.
How about "what works with a given release at the time of the release"?
Perhaps this could be limited to a pointer to the docs for such
replication systems, and maybe a very brief description (based on
Chris' taxonomy)?

> > Including a separate page on the history of postgres replication to
> > date also makes some sense, at least to me. It should be relatively
> > easy to maintain.
>
> I agree that having such a 'replication guide for users of PostgreSQL'
> is a good thing to have. But I think not much of that should be part of
> the official PostgreSQL documentation - mainly because the replication
> solutions are not part of PostgreSQL.

Arguably, neither are most of the procedural languages in the Server
Programming section of the documentation, and yet they're included. I
agree that it's important to keep the documentation from getting
cluttered up with stuff that's "not part of PostgreSQL". However, I
think the very fact that so many people assume there's no replication
for PostgreSQL simply because it's not mentioned in the documentation
shows that for many people replication is perceived as "part of" the
dbms. Even a single page in the documentation which consists of
something along the lines of the following would help these folks find
what they're looking for.

"There are a number of different approaches to solving the problem of
replication, each with strengths and weaknesses. As a result, there are
a number of different replication solutions available for PostgreSQL.
To find out more, please refer to the website."


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>
Subject: Re: Replication Documentation
Date: 2006-08-03 17:16:15
Message-ID: 200608031916.15621.peter_e@gmx.net
Lists: pgsql-hackers pgsql-patches

Andrew Hammond wrote:
> How about "what works with a given release at the time of the
> release"?

We just threw that idea out in the context of the procedural language
discussion because we do not have the resources to check what works.

> Arguably, neither are most of the procedural languages in the Server
> Programming section of the documentation, and yet they're included.

That is false. The documentation documents exactly those pieces of code
that we distribute.

> "There are a number of different approaches to solving the problem of
> replication, each with strengths and weaknesses. As a result, there
> are a number of different replication solutions available for
> PostgreSQL. To find out more, please refer to the website."

Well, that's what I've been talking about all along, and it has also
been the resolution at the Toronto meeting.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replication Documentation
Date: 2006-08-03 18:14:04
Message-ID: 1154628844.236574.48710@i42g2000cwa.googlegroups.com
Lists: pgsql-hackers pgsql-patches

> > "There are a number of different approaches to solving the problem of
> > replication, each with strengths and weaknesses. As a result, there
> > are a number of different replication solutions available for
> > PostgreSQL. To find out more, please refer to the website."
>
> Well, that's what I've been talking about all along, and it has also
> been the resolution at the Toronto meeting.

Great. Is the above text sufficient for the documentation then, or does
anyone have a suggestion on how to say this better?

Drew