Re: Core team statement on replication in PostgreSQL

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Core team statement on replication in PostgreSQL
Date: 2008-05-29 14:12:55
Message-ID: 26529.1212070375@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

The Postgres core team met at PGCon to discuss a few issues, the largest
of which is the need for simple, built-in replication for PostgreSQL.
Historically the project policy has been to avoid putting replication
into core PostgreSQL, so as to leave room for development of competing
solutions, recognizing that there is no "one size fits all" replication
solution. However, it is becoming clear that this policy is hindering
acceptance of PostgreSQL to too great an extent, compared to the benefit
it offers to the add-on replication projects. Users who might consider
PostgreSQL are choosing other database systems because our existing
replication options are too complex to install and use for simple cases.
In practice, simple asynchronous single-master-multiple-slave
replication covers a respectable fraction of use cases, so we have
concluded that we should allow such a feature to be included in the core
project. We emphasize that this is not meant to prevent continued
development of add-on replication projects that cover more complex use
cases.

We believe that the most appropriate base technology for this is
probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
We hope that such a feature can be completed for 8.4. Ideally this
would be coupled with the ability to execute read-only queries on the
slave servers, but we see technical difficulties that might prevent that
from being completed before 8.5 or even further out. (The big problem
is that long-running slave-side queries might still need tuples that are
vacuumable on the master, and so replication of vacuuming actions would
cause the slave's queries to deliver wrong answers.)
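
The conflict described in the parenthetical above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: visibility is reduced to comparing transaction ids against a single snapshot xid, and the names TupleVersion, visible and vacuum are hypothetical, not PostgreSQL internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TupleVersion:
    value: str
    xmin: int                   # xid that created this version
    xmax: Optional[int] = None  # xid that deleted/updated it, if any


def visible(t: TupleVersion, snapshot_xid: int) -> bool:
    """Toy rule: a snapshot sees the effects of all xids below snapshot_xid."""
    created = t.xmin < snapshot_xid
    deleted = t.xmax is not None and t.xmax < snapshot_xid
    return created and not deleted


def vacuum(heap: List[TupleVersion], oldest_snapshot_xid: int) -> List[TupleVersion]:
    """Drop versions already dead to every snapshot the caller knows about."""
    return [t for t in heap
            if not (t.xmax is not None and t.xmax < oldest_snapshot_xid)]


heap = [TupleVersion("old row", xmin=10, xmax=50),
        TupleVersion("new row", xmin=50)]

slave_snapshot = 40  # long-running slave query, started before xid 50
master_oldest = 60   # the master knows of no snapshot older than this

before = [t.value for t in heap if visible(t, slave_snapshot)]

# The master vacuums without knowing about the slave's snapshot,
# and the slave replays that cleanup.
slave_heap = vacuum(heap, master_oldest)
after = [t.value for t in slave_heap if visible(t, slave_snapshot)]

print(before)  # ['old row']
print(after)   # []
```

The same slave query sees the row before the cleanup is replayed and loses it afterwards, which is the "wrong answers" case.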

Again, this will not replace Slony, pgPool, Continuent, Londiste, or
other systems for many users, as it will not be highly scalable nor
support long-distance replication nor replicating less than an entire
installation. But it is time to include a simple, reliable basic
replication feature in the core system.

regards, tom lane


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 14:54:03
Message-ID: e51f66da0805290754p1b8d82b1m268e88b99b8cb1eb@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On 5/29/08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The Postgres core team met at PGCon to discuss a few issues, the largest
> of which is the need for simple, built-in replication for PostgreSQL.
> Historically the project policy has been to avoid putting replication
> into core PostgreSQL, so as to leave room for development of competing
> solutions, recognizing that there is no "one size fits all" replication
> solution. However, it is becoming clear that this policy is hindering
> acceptance of PostgreSQL to too great an extent, compared to the benefit
> it offers to the add-on replication projects. Users who might consider
> PostgreSQL are choosing other database systems because our existing
> replication options are too complex to install and use for simple cases.
> In practice, simple asynchronous single-master-multiple-slave
> replication covers a respectable fraction of use cases, so we have
> concluded that we should allow such a feature to be included in the core
> project. We emphasize that this is not meant to prevent continued
> development of add-on replication projects that cover more complex use
> cases.
>
> We believe that the most appropriate base technology for this is
> probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
> We hope that such a feature can be completed for 8.4.

+1

Although I would put it more briefly: we do need a solution for
lossless failover servers, and such a solution needs to live in the core backend.

> Ideally this
> would be coupled with the ability to execute read-only queries on the
> slave servers, but we see technical difficulties that might prevent that
> from being completed before 8.5 or even further out. (The big problem
> is that long-running slave-side queries might still need tuples that are
> vacuumable on the master, and so replication of vacuuming actions would
> cause the slave's queries to deliver wrong answers.)

Well, both Slony-I and the upcoming Skytools 3 have the same problem when
cleaning up events, and have solved it simply by having slaves report back
their lowest position on the event stream. I cannot see why that cannot be
applied in this case too. Each slave just needs to report its own
longest-running open tx as "open" to the master. Yes, it bloats the master,
but there is no way around it.
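
The feedback scheme proposed above can be sketched as follows. It is a minimal illustration, assuming each slave reports the xid of its oldest open transaction (or nothing when idle); vacuum_horizon and the slave names are hypothetical.

```python
from typing import Dict, Optional


def vacuum_horizon(master_oldest_xid: int,
                   slave_reports: Dict[str, Optional[int]]) -> int:
    """Oldest xid that any node still needs; vacuum must not clean past it.

    A report of None means that slave has no open transaction.
    """
    candidates = [master_oldest_xid]
    candidates += [xid for xid in slave_reports.values() if xid is not None]
    return min(candidates)


# One slave runs a long report that started at xid 450; the other is idle.
with_feedback = vacuum_horizon(600, {"slave_a": 450, "slave_b": None})

# "Inaccessible slave" option: no reports, so the master is never held back.
without_feedback = vacuum_horizon(600, {})

print(with_feedback)                     # 450
print(without_feedback - with_feedback)  # 150 xids of cleanup held back
```

The gap between the two horizons is the bloat cost on the master that the message accepts as unavoidable.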

The only problem could be the plan to vacuum tuples updated in between
long-running transactions and the regular ones, but such behaviour can simply
be turned off.

We could also have an option of "inaccessible slave", for those who
fear bloat on the master.

--
marko


From: David Fetter <david(at)fetter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 15:21:05
Message-ID: 20080529152105.GO16218@fetter.org
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote:
> The Postgres core team met at PGCon to discuss a few issues, the
> largest of which is the need for simple, built-in replication for
> PostgreSQL. Historically the project policy has been to avoid
> putting replication into core PostgreSQL, so as to leave room for
> development of competing solutions, recognizing that there is no
> "one size fits all" replication solution. However, it is becoming
> clear that this policy is hindering acceptance of PostgreSQL to too
> great an extent, compared to the benefit it offers to the add-on
> replication projects. Users who might consider PostgreSQL are
> choosing other database systems because our existing replication
> options are too complex to install and use for simple cases. In
> practice, simple asynchronous single-master-multiple-slave
> replication covers a respectable fraction of use cases, so we have
> concluded that we should allow such a feature to be included in the
> core project. We emphasize that this is not meant to prevent
> continued development of add-on replication projects that cover more
> complex use cases.
>
> We believe that the most appropriate base technology for this is
> probably real-time WAL log shipping, as was demoed by NTT OSS at
> PGCon. We hope that such a feature can be completed for 8.4.

> Ideally this would be coupled with the ability to execute read-only
> queries on the slave servers, but we see technical difficulties that
> might prevent that from being completed before 8.5 or even further
> out. (The big problem is that long-running slave-side queries might
> still need tuples that are vacuumable on the master, and so
> replication of vacuuming actions would cause the slave's queries to
> deliver wrong answers.)

This part is a deal-killer. It's a giant up-hill slog to sell warm
standby to those in charge of making resources available because the
warm standby machine consumes SA time, bandwidth, power, rack space,
etc., but provides no tangible benefit, and this feature would have
exactly the same problem.

IMHO, without the ability to do read-only queries on slaves, it's not
worth doing this feature at all.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "David Fetter" <david(at)fetter(dot)org>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 15:40:57
Message-ID: e51f66da0805290840j71bfaaam70ff3a4054f22440@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On 5/29/08, David Fetter <david(at)fetter(dot)org> wrote:
> On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote:
> > Ideally this would be coupled with the ability to execute read-only
> > queries on the slave servers, but we see technical difficulties that
> > might prevent that from being completed before 8.5 or even further
> > out. (The big problem is that long-running slave-side queries might
> > still need tuples that are vacuumable on the master, and so
> > replication of vacuuming actions would cause the slave's queries to
> > deliver wrong answers.)
>
> This part is a deal-killer. It's a giant up-hill slog to sell warm
> standby to those in charge of making resources available because the
> warm standby machine consumes SA time, bandwidth, power, rack space,
> etc., but provides no tangible benefit, and this feature would have
> exactly the same problem.
>
> IMHO, without the ability to do read-only queries on slaves, it's not
> worth doing this feature at all.

I would not be so harsh - I'd like to have the lossless standby even
without read-only slaves.

But Tom's mail gave me the impression that core wants to wait until we get a
"perfect" read-only slave implementation, so we would wait for it until 8.6,
which does not seem sensible. If we can do a slightly inefficient (but simple)
implementation right now, I see no reason to reject it; we can always improve
it later.

Especially as it can be switchable. And we could also have a
transaction_timeout parameter on slaves so the hit on the master is limited.

--
marko


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 15:46:22
Message-ID: 1212075982.26576.10.camel@jd-laptop
Lists: pgsql-advocacy pgsql-hackers

On Thu, 2008-05-29 at 08:21 -0700, David Fetter wrote:
> On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote:

> This part is a deal-killer. It's a giant up-hill slog to sell warm
> standby to those in charge of making resources available because the
> warm standby machine consumes SA time, bandwidth, power, rack space,
> etc., but provides no tangible benefit, and this feature would have
> exactly the same problem.
>
> IMHO, without the ability to do read-only queries on slaves, it's not
> worth doing this feature at all.

The only question I have is... what does this give us that PITR doesn't
give us?

Sincerely,

Joshua D. Drake


From: "Douglas McNaught" <doug(at)mcnaught(dot)org>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "David Fetter" <david(at)fetter(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 15:48:25
Message-ID: 5ded07e00805290848v4a539d15v67d2b57afe8f797e@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 11:46 AM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:

> The only question I have is... what does this give us that PITR doesn't
> give us?

I think the idea is that WAL records would be shipped (possibly via
socket) and applied as they're generated, rather than on a
file-by-file basis. At least that's what "real-time" implies to me...
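
The difference between file-by-file shipping and record streaming can be sketched by counting how much WAL has not left the master at the moment of a crash. This is an idealized illustration: it ignores network lag and any forced segment switch, and unshipped_bytes is a hypothetical helper. The 16 MB figure is PostgreSQL's default WAL segment size.

```python
MB = 1024 * 1024
SEGMENT_BYTES = 16 * MB  # PostgreSQL's default WAL segment size


def unshipped_bytes(wal_bytes_written: int, streaming: bool) -> int:
    """WAL still only on the master when it crashes (idealized)."""
    if streaming:
        # Records are sent as they are generated, so nothing is left behind.
        return 0
    # File-based shipping only sends completed segments; the partially
    # filled current segment is lost with the master.
    return wal_bytes_written % SEGMENT_BYTES


file_based = unshipped_bytes(40 * MB, streaming=False)
streamed = unshipped_bytes(40 * MB, streaming=True)

print(file_based // MB, streamed)  # 8 0
```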

-Doug


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 15:49:54
Message-ID: 483ED0A2.7060006@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

Marko,

> But Tom's mail gave me the impression that core wants to wait until we get a
> "perfect" read-only slave implementation, so we would wait for it until 8.6,
> which does not seem sensible. If we can do a slightly inefficient (but simple)
> implementation right now, I see no reason to reject it; we can always improve
> it later.

That's incorrect. We're looking for a workable solution. If we could
get one for 8.4, that would be brilliant but we think it's going to be
harder than that.

Publishing the XIDs back to the master is one possibility. We also
looked at using "spillover segments" for vacuumed rows, but that seemed
even less viable.

I'm also thinking, for *async replication*, that we could simply halt
replication on the slave whenever a transaction passes minxid on the
master. However, the main focus will be on synchronous hot standby.

--Josh


From: David Fetter <david(at)fetter(dot)org>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 15:53:03
Message-ID: 20080529155303.GQ16218@fetter.org
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 08:46:22AM -0700, Joshua D. Drake wrote:
> On Thu, 2008-05-29 at 08:21 -0700, David Fetter wrote:
> > This part is a deal-killer. It's a giant up-hill slog to sell
> > warm standby to those in charge of making resources available
> > because the warm standby machine consumes SA time, bandwidth,
> > power, rack space, etc., but provides no tangible benefit, and
> > this feature would have exactly the same problem.
> >
> > IMHO, without the ability to do read-only queries on slaves, it's
> > not worth doing this feature at all.
>
> The only question I have is... what does this give us that PITR
> doesn't give us?

It looks like a wrapper for PITR to me, so the gain would be ease of
use.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Rick Vernam <rickv(at)hobi(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 15:58:19
Message-ID: 200805291058.19443.rickv@hobi.com
Lists: pgsql-advocacy pgsql-hackers

On Thursday 29 May 2008 09:54:03 am Marko Kreen wrote:
> On 5/29/08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > The Postgres core team met at PGCon to discuss a few issues, the largest
> >  of which is the need for simple, built-in replication for PostgreSQL.
> >  Historically the project policy has been to avoid putting replication
> >  into core PostgreSQL, so as to leave room for development of competing
> >  solutions, recognizing that there is no "one size fits all" replication
> >  solution.  However, it is becoming clear that this policy is hindering
> >  acceptance of PostgreSQL to too great an extent, compared to the benefit
> >  it offers to the add-on replication projects.  Users who might consider
> >  PostgreSQL are choosing other database systems because our existing
> >  replication options are too complex to install and use for simple cases.
> >  In practice, simple asynchronous single-master-multiple-slave
> >  replication covers a respectable fraction of use cases, so we have
> >  concluded that we should allow such a feature to be included in the core
> >  project.  We emphasize that this is not meant to prevent continued
> >  development of add-on replication projects that cover more complex use
> >  cases.
> >
> >  We believe that the most appropriate base technology for this is
> >  probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
> >  We hope that such a feature can be completed for 8.4.
>
> +1
>
> Although I would put it more briefly: we do need a solution for
> lossless failover servers, and such a solution needs to live in the core backend.

+1 for lossless failover (ie, synchronous)


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 15:58:31
Message-ID: 200805291558.m4TFwVt27309@momjian.us
Lists: pgsql-advocacy pgsql-hackers

Josh Berkus wrote:
> Marko,
>
> > But Tom's mail gave me the impression that core wants to wait until we get a
> > "perfect" read-only slave implementation, so we would wait for it until 8.6,
> > which does not seem sensible. If we can do a slightly inefficient (but simple)
> > implementation right now, I see no reason to reject it; we can always improve
> > it later.
>
> That's incorrect. We're looking for a workable solution. If we could
> get one for 8.4, that would be brilliant but we think it's going to be
> harder than that.
>
> Publishing the XIDs back to the master is one possibility. We also
> looked at using "spillover segments" for vacuumed rows, but that seemed
> even less viable.
>
> I'm also thinking, for *async replication*, that we could simply halt
> replication on the slave whenever a transaction passes minxid on the
> master. However, the main focus will be on synchronous hot standby.

Another idea I discussed with Tom is having the slave _delay_ applying
WAL files until all slave snapshots are ready.
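
The delayed-apply idea can be sketched as a replay loop that stalls whenever the next record would remove tuples that an open slave snapshot may still need. This is a toy model, not a description of any actual implementation: DelayingSlave, the record shape (kind, xid) and the "cleanup" kind are all hypothetical.

```python
from collections import deque


class DelayingSlave:
    def __init__(self):
        self.pending = deque()       # WAL records received but not yet applied
        self.applied = []            # WAL records already replayed
        self.open_snapshots = set()  # snapshot xids held by running queries

    def receive(self, record):
        self.pending.append(record)
        self.replay()

    def finish_query(self, snapshot_xid):
        self.open_snapshots.discard(snapshot_xid)
        self.replay()

    def replay(self):
        while self.pending:
            kind, xid = self.pending[0]
            # A cleanup record removes tuples dead to all snapshots >= xid;
            # an older open snapshot might still need them, so wait.
            if kind == "cleanup" and any(s < xid for s in self.open_snapshots):
                return  # later records wait too, preserving WAL order
            self.applied.append(self.pending.popleft())


slave = DelayingSlave()
slave.open_snapshots.add(40)    # a long-running read-only query
slave.receive(("insert", 45))   # harmless, applied at once
slave.receive(("cleanup", 60))  # conflicts with snapshot 40, replay stalls
slave.receive(("insert", 61))   # queued behind the stalled record
stalled = (len(slave.applied), len(slave.pending))

slave.finish_query(40)          # query ends, replay catches up
caught_up = (len(slave.applied), len(slave.pending))

print(stalled, caught_up)  # (1, 2) (3, 0)
```

While replay is stalled the slave falls behind the master, which is why the idea fits asynchronous operation better than synchronous.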

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Dave Page" <dpage(at)pgadmin(dot)org>
To: "Douglas McNaught" <doug(at)mcnaught(dot)org>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "David Fetter" <david(at)fetter(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:00:55
Message-ID: 937d27e10805290900t216a1a77wd33f78f1bba53fd5@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 4:48 PM, Douglas McNaught <doug(at)mcnaught(dot)org> wrote:
> On Thu, May 29, 2008 at 11:46 AM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
>
>> The only question I have is... what does this give us that PITR doesn't
>> give us?
>
> I think the idea is that WAL records would be shipped (possibly via
> socket) and applied as they're generated, rather than on a
> file-by-file basis. At least that's what "real-time" implies to me...

Yes, we're talking real-time streaming (synchronous) log shipping.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


From: Mathias Brossard <mathias(dot)brossard(at)opentrust(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:01:44
Message-ID: 483ED368.5090408@opentrust.com
Lists: pgsql-advocacy pgsql-hackers

Tom Lane wrote:
> In practice, simple asynchronous single-master-multiple-slave
> replication covers a respectable fraction of use cases, so we have
> concluded that we should allow such a feature to be included in the core
> project. We emphasize that this is not meant to prevent continued
> development of add-on replication projects that cover more complex use
> cases.

IMHO, this will help PostgreSQL adoption, mindshare and even boost interest in
development for the other replication use cases.

> We believe that the most appropriate base technology for this is
> probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.

The slides are up at http://www.pgcon.org/2008/schedule/events/76.en.html
From what I gather from those slides it seems to me that the NTT solution is
synchronous not asynchronous. In my opinion it's even better, but I do
understand that others might prefer asynchronous. I'm going to speculate, but I
would think it should be possible (without a substantial rewrite) to support
both modes (or even some intermediate modes, like DRBD on Linux).

> We hope that such a feature can be completed for 8.4. Ideally this
> would be coupled with the ability to execute read-only queries on the
> slave servers, but we see technical difficulties that might prevent that
> from being completed before 8.5 or even further out. (The big problem
> is that long-running slave-side queries might still need tuples that are
> vacuumable on the master, and so replication of vacuuming actions would
> cause the slave's queries to deliver wrong answers.)

From the 8.4dev documentation, another problem for read-only slaves would be:
« Operations on hash indexes are not presently WAL-logged, so replay will not
update these indexes. The recommended workaround is to manually REINDEX each
such index after completing a recovery operation. ».

Sincerely,
--
Mathias Brossard


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:10:05
Message-ID: 20080529161005.GC27560@yugib.highrise.ca
Lists: pgsql-advocacy pgsql-hackers

* Josh Berkus <josh(at)agliodbs(dot)com> [080529 11:52]:
> Marko,
>
> >But Tom's mail gave me the impression that core wants to wait until we get a
> >"perfect" read-only slave implementation, so we would wait for it until 8.6,
> >which does not seem sensible. If we can do a slightly inefficient (but simple)
> >implementation right now, I see no reason to reject it; we can always improve
> >it later.
>
> That's incorrect. We're looking for a workable solution. If we could
> get one for 8.4, that would be brilliant but we think it's going to be
> harder than that.
>
> Publishing the XIDs back to the master is one possibility. We also
> looked at using "spillover segments" for vacuumed rows, but that seemed
> even less viable.
>
> I'm also thinking, for *async replication*, that we could simply halt
> replication on the slave whenever a transaction passes minxid on the
> master. However, the main focus will be on synchronous hot standby.

Or, instead of statement timeout killing statements on the RO slave,
simply kill any "old" transactions on the RO slave. "Old" in the sense
that the master's xmin has passed it. And it's just an exercise in
controlling the age of xmin on the master, which could even be done
user-side.

Doesn't fit all, but no one size does... It would work for where you're
hammering your slaves with a diverse set of high-velocity short queries
that you're trying to avoid on the master...

An option to "pause replay (making it async)" or "abort transactions
(for sync)" might make it possible to easily run an async slave for slow
reporting queries, and a sync slave for short queries.

a.

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:10:26
Message-ID: 483ED572.3000606@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

Joshua D. Drake wrote:
>
> On Thu, 2008-05-29 at 08:21 -0700, David Fetter wrote:
>> On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote:
>
>> This part is a deal-killer. It's a giant up-hill slog to sell warm
>> standby to those in charge of making resources available because the
>> warm standby machine consumes SA time, bandwidth, power, rack space,
>> etc., but provides no tangible benefit, and this feature would have
>> exactly the same problem.
>>
>> IMHO, without the ability to do read-only queries on slaves, it's not
>> worth doing this feature at all.
>
> The only question I have is... what does this give us that PITR doesn't
> give us?

Since people seem to be unclear on what we're proposing:

8.4 Synchronous Warm Standby: makes PostgreSQL more suitable for HA
systems by eliminating failover data loss and cutting failover time.

8.5 (probably) Synchronous & Asynchronous Hot Standby: adds read-only
queries on slaves to the above.

Again, if we can implement queries on slaves for 8.4, we're all for it.
However, after conversations in Core and with Simon we all think it's
going to be too big a task to complete in 4-5 months. We *don't* want
to end up delaying 8.4 for 5 months because we're debugging hot standby.

--Josh


From: Brian Hurt <bhurt(at)janestcapital(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:11:21
Message-ID: 483ED5A9.9000000@janestcapital.com
Lists: pgsql-advocacy pgsql-hackers

David Fetter wrote:

>
>This part is a deal-killer. It's a giant up-hill slog to sell warm
>standby to those in charge of making resources available because the
>warm standby machine consumes SA time, bandwidth, power, rack space,
>etc., but provides no tangible benefit, and this feature would have
>exactly the same problem.
>
>IMHO, without the ability to do read-only queries on slaves, it's not
>worth doing this feature at all.
>
>
>

I don't think I agree with this. There are a large number of situations
where it's positive expectancy to do precisely this- it's not unlike
buying a $1 lottery ticket with a 1 chance in 100 of winning $1000- the
vast majority of the time (99 times out of 100), you're going to lose
$1. But when you win, you win big, and make up for all the small losses
you incurred getting there and then some. Failover machines are like
that- most of the time they're negative value, as you said- taking up SA
time, bandwidth, power, rack space, money, etc. But every once in a
(great) while, they save you. If the cost of having the database down
for hours or days (as you madly try to get next-day replacement hardware)
isn't that great, then no, this isn't worthwhile- but in cases where
the database being down chalks up the lost money quickly, this is easy
to cost-justify.
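
The lottery arithmetic above works out as follows. The lottery figures come from the message; the standby figures are purely hypothetical numbers chosen to show the same calculation, and expected_net is an illustrative helper.

```python
from fractions import Fraction


def expected_net(cost, win_probability, payout):
    """Expected gain per play: probability-weighted payout minus the cost."""
    return win_probability * payout - cost


# The lottery from the message: $1 ticket, 1 in 100 chance of $1000.
lottery = expected_net(cost=1, win_probability=Fraction(1, 100), payout=1000)

# Hypothetical standby: $20,000 per year to run, 1 in 20 chance per year
# of an outage that would otherwise cost $1,000,000.
standby = expected_net(cost=20_000, win_probability=Fraction(1, 20),
                       payout=1_000_000)

print(lottery)  # 9
print(standby)  # 30000
```

With a small enough outage cost the same formula goes negative, which matches the caveat in the message that the standby is not always worthwhile.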

Being able to do read-only queries makes this feature more valuable in
more situations, but I disagree that it's a deal-breaker.

Brian


From: David Fetter <david(at)fetter(dot)org>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:12:02
Message-ID: 20080529161202.GR16218@fetter.org
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 11:58:31AM -0400, Bruce Momjian wrote:
> Josh Berkus wrote:
> > Publishing the XIDs back to the master is one possibility. We
> > also looked at using "spillover segments" for vacuumed rows, but
> > that seemed even less viable.
> >
> > I'm also thinking, for *async replication*, that we could simply
> > halt replication on the slave whenever a transaction passes minxid
> > on the master. However, the main focus will be on synchronous
> > hot standby.
>
> Another idea I discussed with Tom is having the slave _delay_
> applying WAL files until all slave snapshots are ready.

Either one of these would be great, but something that involves
machines that stay useless most of the time is just not going to work.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: David Fetter <david(at)fetter(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:13:20
Message-ID: 200805291613.m4TGDKs19087@momjian.us
Lists: pgsql-advocacy pgsql-hackers

David Fetter wrote:
> On Thu, May 29, 2008 at 11:58:31AM -0400, Bruce Momjian wrote:
> > Josh Berkus wrote:
> > > Publishing the XIDs back to the master is one possibility. We
> > > also looked at using "spillover segments" for vacuumed rows, but
> > > that seemed even less viable.
> > >
> > > I'm also thinking, for *async replication*, that we could simply
> > > halt replication on the slave whenever a transaction passes minxid
> > > on the master. However, the main focus will be on synchronous
> > > hot standby.
> >
> > Another idea I discussed with Tom is having the slave _delay_
> > applying WAL files until all slave snapshots are ready.
>
> Either one of these would be great, but something that involves
> machines that stay useless most of the time is just not going to work.

Right, the ultimate target is to have the slave be read-only, but we
need to get the streaming of WAL logs done first.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Douglas McNaught <doug(at)mcnaught(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:16:15
Message-ID: 20080529161615.GD27560@yugib.highrise.ca
Lists: pgsql-advocacy pgsql-hackers

* Dave Page <dpage(at)pgadmin(dot)org> [080529 12:03]:
> On Thu, May 29, 2008 at 4:48 PM, Douglas McNaught <doug(at)mcnaught(dot)org> wrote:

> > I think the idea is that WAL records would be shipped (possibly via
> > socket) and applied as they're generated, rather than on a
> > file-by-file basis. At least that's what "real-time" implies to me...
>
> Yes, we're talking real-time streaming (synchronous) log shipping.

But synchronous streaming doesn't mean the WAL has to be *applied* on
the slave yet. Just that it has to be "safely" on the slave (i.e. on
disk, not just in kernel buffers).

The whole single-threaded WAL replay problem is going to rear its ugly
head here too, and mean that a slave *won't* be able to keep up with a
busy master if it's actually trying to apply all the changes in
real-time. Well, actually, if it's synchronous, it will keep up, but it
just means that now your master's I/O capability is limited to the
speed of the slave's single-threaded WAL application.
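
A minimal sketch of the distinction above, in Python. It assumes a toy
standby that appends each WAL record to a local file and fsyncs it before
acknowledging; the names `StandbySketch`, `receive` and `replay_one` are
hypothetical and are not PostgreSQL APIs. The only point is that "flushed"
and "applied" are separate positions, and that a synchronous acknowledgement
could depend on the first one alone.

```python
import os
import tempfile


class StandbySketch:
    """Toy standby: acknowledges WAL once it is durable, replays it later."""

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.flushed = 0  # records safely on disk on the standby
        self.applied = 0  # records actually replayed into the data files

    def receive(self, record: bytes) -> int:
        # Append the record and force it to stable storage before acking,
        # so it is on disk and not just in kernel buffers.
        with open(self.wal_path, "ab") as f:
            f.write(record + b"\n")
            f.flush()
            os.fsync(f.fileno())
        self.flushed += 1
        return self.flushed  # acknowledgement position sent to the master

    def replay_one(self) -> bool:
        # Replay runs independently and may lag behind `flushed`.
        if self.applied < self.flushed:
            self.applied += 1
            return True
        return False


with tempfile.TemporaryDirectory() as d:
    standby = StandbySketch(os.path.join(d, "wal.log"))
    ack = standby.receive(b"commit xid=100")
    # The record is acknowledged as durable while nothing has been applied.
    print(ack, standby.flushed, standby.applied)
```

In this model the master would wait only for the value returned by
`receive`, so slow replay delays visibility on the standby but not commits
on the master.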

a.
--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:18:44
Message-ID: 483ED764.7050804@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

Bruce,

> Another idea I discussed with Tom is having the slave _delay_ applying
> WAL files until all slave snapshots are ready.
>

Well, again, that only works for async mode. I personally think that's
the correct solution for async. But for synch mode, I think we need to
push the xids back to the master; generally if a user is running in
synch mode they're concerned about failover time and zero data loss, so
holding back the WAL files doesn't make sense.

Also, if you did delay applying WAL files on an async slave, you'd reach
a point (perhaps after a 6-hour query) where it'd actually be cheaper to
rebuild the slave than to apply the pent-up WAL files.
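
A back-of-envelope sketch of that trade-off, in Python. Every number here
is hypothetical (WAL generation rate, replay rate, database size, copy
speed); the function names are made up for illustration. It assumes the
master keeps generating WAL at the same rate while the slave catches up.

```python
def catch_up_hours(query_hours, wal_mb_per_s, replay_mb_per_s):
    """Hours needed to drain WAL held back for `query_hours`."""
    backlog_mb = query_hours * 3600 * wal_mb_per_s
    # New WAL keeps arriving during catch-up, so only the surplus replay
    # capacity reduces the backlog.
    net_mb_per_s = replay_mb_per_s - wal_mb_per_s
    if net_mb_per_s <= 0:
        return float("inf")  # the slave never catches up
    return backlog_mb / net_mb_per_s / 3600


def rebuild_hours(db_size_mb, copy_mb_per_s):
    """Hours needed to re-copy the whole database from the master."""
    return db_size_mb / copy_mb_per_s / 3600


# Hypothetical figures: a 6-hour query, 8 MB/s of WAL, 10 MB/s replay,
# and a 200 GB database copied at 50 MB/s.
print(catch_up_hours(6, 8, 10))
print(rebuild_hours(200_000, 50))
```

With these made-up figures the catch-up takes far longer than a fresh copy,
which is the situation described above; different rates can easily reverse
the result.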

--Josh Berkus


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dave Page <dpage(at)pgadmin(dot)org>
Cc: Douglas McNaught <doug(at)mcnaught(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:19:48
Message-ID: 483ED7A4.50703@dunslane.net
Lists: pgsql-advocacy pgsql-hackers

Dave Page wrote:
> On Thu, May 29, 2008 at 4:48 PM, Douglas McNaught <doug(at)mcnaught(dot)org> wrote:
>
>> On Thu, May 29, 2008 at 11:46 AM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
>>
>>
>>> The only question I have is... what does this give us that PITR doesn't
>>> give us?
>>>
>> I think the idea is that WAL records would be shipped (possibly via
>> socket) and applied as they're generated, rather than on a
>> file-by-file basis. At least that's what "real-time" implies to me...
>>
>
> Yes, we're talking real-time streaming (synchronous) log shipping.
>

That's not what Tom's email said, AIUI. "Synchronous" replication surely
means that the master and slave always have the same set of transactions
applied. Streaming <> synchronous. But streaming log shipping will allow
us to get closer to synchronicity in some situations, i.e. the
window for missing transactions will be much smaller.

Some of us were discussing this late on Friday night after PGcon. ISTM
that we can have either 1) fairly hot failover slaves that are
guaranteed to be almost up to date, or 2) slaves that can support
read-only transactions but might get somewhat out of date if they run
long transactions. The big problem is in having slaves which are both
highly up to date and support arbitrary read-only transactions. Maybe in
the first instance, at least, we need to make slaves choose which role
they will play.

cheers

andrew


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "David Fetter" <david(at)fetter(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:20:37
Message-ID: e51f66da0805290920j5ced81f4i992b13aa3ce7d46@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On 5/29/08, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
> On Thu, 2008-05-29 at 08:21 -0700, David Fetter wrote:
> > On Thu, May 29, 2008 at 10:12:55AM -0400, Tom Lane wrote:
> > This part is a deal-killer. It's a giant up-hill slog to sell warm
> > standby to those in charge of making resources available because the
> > warm standby machine consumes SA time, bandwidth, power, rack space,
> > etc., but provides no tangible benefit, and this feature would have
> > exactly the same problem.
> >
> > IMHO, without the ability to do read-only queries on slaves, it's not
> > worth doing this feature at all.
>
> The only question I have is... what does this give us that PITR doesn't
> give us?

Tom is talking about synchronous WAL replication.

So you can do lossless failover. Currently there is no good
solution for this.

And it needs to live in the core backend. Yes, it could somehow be implemented
by filling the backend with hooks, but the question is how it would stay in sync
with changes in the core backend after a couple of releases. The WAL writing
and txid/snapshot handling receive heavy changes in each release.

No external project that needs deep hooks has been able to keep pace with
core changes thus far, unless heavily commercially backed, which means
not open source. Companies can tell you the price they pay for such syncing.

The other solution would indeed be to have fixed hooks guaranteed to be stable
between releases. (replica-hooks-discuss?) But that would mean limiting
the changes we can make to the WAL-writing/snapshot-handling code, and that
does not seem like an attractive solution.

By having such replication code that tightly ties into core code
included in main Postgres source, we are still free to do any changes
we feel like and not be tied into external API promises.

--
marko


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:20:55
Message-ID: 200805291620.m4TGKtY25944@momjian.us
Lists: pgsql-advocacy pgsql-hackers

Josh Berkus wrote:
> Bruce,
>
> > Another idea I discussed with Tom is having the slave _delay_ applying
> > WAL files until all slave snapshots are ready.
> >
>
> Well, again, that only works for async mode. I personally think that's
> the correct solution for async. But for synch mode, I think we need to
> push the xids back to the master; generally if a user is running in
> synch mode they're concerned about failover time and zero data loss, so
> holding back the WAL files doesn't make sense.

You send the WAL to the slave, but the slave doesn't apply them right
away --- it isn't related to async.

> Also, if you did delay applying WAL files on an async slave, you'd reach
> a point (perhaps after a 6-hour query) where it'd actually be cheaper to
> rebuild the slave than to apply the pent-up WAL files.

True.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Steve Atkins <steve(at)blighty(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:22:26
Message-ID: 4AE4C6AA-6F0C-4289-8F16-CFF46ED20DA0@blighty.com
Lists: pgsql-advocacy pgsql-hackers


On May 29, 2008, at 9:12 AM, David Fetter wrote:

> On Thu, May 29, 2008 at 11:58:31AM -0400, Bruce Momjian wrote:
>> Josh Berkus wrote:
>>> Publishing the XIDs back to the master is one possibility. We
>>> also looked at using "spillover segments" for vacuumed rows, but
>>> that seemed even less viable.
>>>
>>> I'm also thinking, for *async replication*, that we could simply
>>> halt replication on the slave whenever a transaction passes minxid
>>> on the master. However, the main focus will be on synchronous
>>> hot standby.
>>
>> Another idea I discussed with Tom is having the slave _delay_
>> applying WAL files until all slave snapshots are ready.
>
> Either one of these would be great, but something that involves
> machines that stay useless most of the time is just not going to work.

I have customers who are thinking about warm standby functionality, and
the only thing stopping them deploying it is complexity and maintenance,
not the cost of the HA hardware. If trivial-to-deploy replication that
didn't offer read-only access of the slaves were available today I'd bet
that most of them would be using it.

Read-only slaves would certainly be nice, but (for me) it's making it
trivial to deploy and maintain that's more interesting.

Cheers,
Steve


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Aidan Van Dyk" <aidan(at)highrise(dot)ca>
Cc: "Dave Page" <dpage(at)pgadmin(dot)org>, "Douglas McNaught" <doug(at)mcnaught(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "David Fetter" <david(at)fetter(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:27:38
Message-ID: e51f66da0805290927w2edd8318pdf717b76fec4c700@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On 5/29/08, Aidan Van Dyk <aidan(at)highrise(dot)ca> wrote:
> * Dave Page <dpage(at)pgadmin(dot)org> [080529 12:03]:
> > On Thu, May 29, 2008 at 4:48 PM, Douglas McNaught <doug(at)mcnaught(dot)org> wrote:
> > > I think the idea is that WAL records would be shipped (possibly via
> > > socket) and applied as they're generated, rather than on a
> > > file-by-file basis. At least that's what "real-time" implies to me...
> >
> > Yes, we're talking real-time streaming (synchronous) log shipping.
>
> But synchronous streaming doesn't mean the WAL has to be *applied* on
> the slave yet. Just that it has to be "safely" on the slave (i.e. on
> disk, not just in kernel buffers).
>
> The whole single-threaded WAL replay problem is going to rear its ugly
> head here too, and mean that a slave *won't* be able to keep up with a
> busy master if it's actually trying to apply all the changes in
> real-time. Well, actually, if it's synchronous, it will keep up, but it
> just means that now your master's I/O capability is limited to the
> speed of the slave's single-threaded WAL application.

I don't think that's a problem. If users run their servers at the
limit of write bandwidth, that's their problem.

IOW, with synchronous replication, we _want_ the server to lag behind
slaves.

About the single-threading problem - afaik, the replay is mostly I/O bound
so threading would not buy you much.

--
marko


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:29:58
Message-ID: 1212078598.27385.4.camel@jd-laptop
Lists: pgsql-advocacy pgsql-hackers

On Thu, 2008-05-29 at 09:10 -0700, Josh Berkus wrote:
> Joshua D. Drake wrote:
> >
> > The only question I have is... what does this give us that PITR doesn't
> > give us?
>
> Since people seem to be unclear on what we're proposing:
>
> 8.4 Synchronous Warm Standby: makes PostgreSQL more suitable for HA
> systems by eliminating failover data loss and cutting failover time.
>

What does this give us that Solaris Cluster, RedHat Cluster, DRBD, etc.
don't give us? I am not trying to be a poison pill, but I am just not
seeing the benefit over solutions that already exist. I could
probably argue, if I had more time, that this solution doesn't do
anything but make us look half-baked in implementation.

If the real goal is read-only slaves with synchronous capability, then
let's implement that. If we can't do that by 8.4 it gets pushed to 8.5.
We already have a dozen different utilities to give us what is being
currently proposed.

Sincerely,

Joshua D. Drake


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Douglas McNaught <doug(at)mcnaught(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:35:41
Message-ID: 20080529163541.GE27560@yugib.highrise.ca
Lists: pgsql-advocacy pgsql-hackers

* Marko Kreen <markokr(at)gmail(dot)com> [080529 12:27]:

> I don't think that's a problem. If users run their servers at the
> limit of write bandwidth, that's their problem.
>
> IOW, with synchronous replication, we _want_ the server to lag behind
> slaves.
>
> About the single-threading problem - afaik, the replay is mostly I/O bound
> so threading would not buy you much.

Right - the problem is that the master has N>1 backends working away,
preloading the modified heap pages into shared buffers, where they are
modified w/ WAL. This means the kernel/controller has many read requests
in flight at a time as the modifies/writes chug along. The slave has to
read/modify/write every buffer, one at a time, as WAL arrives, meaning
there is only ever one I/O request in flight at a time.

So the master has a queue of many parallel reads going on, while the slave
has a serialized stream of random reads going on.
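
Rough arithmetic for that difference, as a Python sketch. It assumes an
idealized disk subsystem that services up to `in_flight` random reads
concurrently at a fixed latency; the function name and all figures are
hypothetical and chosen only to show the shape of the effect.

```python
import math


def read_time_ms(pages, latency_ms, in_flight):
    """Time to fetch `pages` random pages with `in_flight` requests at once."""
    # Requests are issued in batches of `in_flight`; each batch costs one
    # latency period in this idealized model.
    return math.ceil(pages / in_flight) * latency_ms


# Master: 8 backends each keep one read outstanding.
master_ms = read_time_ms(pages=1000, latency_ms=5, in_flight=8)
# Slave: a single replay process issues one read at a time.
slave_ms = read_time_ms(pages=1000, latency_ms=5, in_flight=1)
print(master_ms, slave_ms)
```

Under these assumptions the slave needs `in_flight` times longer than the
master to touch the same pages, even though both are I/O bound.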

a.
--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Douglas McNaught <doug(at)mcnaught(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, David Fetter <david(at)fetter(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 16:40:14
Message-ID: 594.1212079214@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Dave Page wrote:
>> Yes, we're talking real-time streaming (synchronous) log shipping.

> That's not what Tom's email said, AIUI.

Sorry, I was a bit sloppy about that. If we go with a WAL-shipping
solution it would be pretty easy to support both synchronous and
asynchronous cases (synchronous == master doesn't report commit until
the WAL is down to disk on the slaves too). There are different
use-cases for both so it'd make sense to do both.
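
A minimal sketch of that definition, in Python. Plain lists stand in for
the master's WAL and for each slave's on-disk WAL, and appending to a
slave's list stands in for "send, flush on the slave, and receive the
acknowledgement". The function name `commit` and its parameters are
hypothetical, not an actual PostgreSQL interface.

```python
def commit(wal_record, local_log, standbys, synchronous):
    """Return once the commit may be reported to the client.

    The return value is the list of (standby, record) pairs that still
    have to be shipped after the commit was reported.
    """
    local_log.append(wal_record)  # stand-in for the local WAL flush
    pending = []
    for standby in standbys:
        if synchronous:
            # Wait until the record is durable on this slave before
            # reporting the commit.
            standby.append(wal_record)
        else:
            # Report the commit first and ship the record afterwards.
            pending.append((standby, wal_record))
    return pending


master_log, slave_a, slave_b = [], [], []
left_over = commit("xid 100 commit", master_log, [slave_a, slave_b], True)
# In synchronous mode nothing is left to ship once commit() returns.
print(left_over, slave_a, slave_b)
```

The asynchronous branch is where the loss window comes from: a crash of
the master before the pending records are shipped loses commits that were
already reported.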

regards, tom lane


From: Shane Ambler <pgsql(at)Sheeky(dot)Biz>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 17:11:08
Message-ID: 483EE3AC.9050309@Sheeky.Biz
Lists: pgsql-advocacy pgsql-hackers

David Fetter wrote:

> This part is a deal-killer. It's a giant up-hill slog to sell warm
> standby to those in charge of making resources available because the
> warm standby machine consumes SA time, bandwidth, power, rack space,
> etc., but provides no tangible benefit, and this feature would have
> exactly the same problem.
>
> IMHO, without the ability to do read-only queries on slaves, it's not
> worth doing this feature at all.

+1

I would think that a read-only WAL slave is more valuable than a
real-time backup. (especially as the topic is about adding slaves not
increasing the effectiveness of backups)

I also think that starting with a read-only WAL slave will ease the
transition between delayed slave updating and real-time slave updating.

--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Fetter <david(at)fetter(dot)org>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 17:37:14
Message-ID: 1524.1212082634@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

David Fetter <david(at)fetter(dot)org> writes:
> On Thu, May 29, 2008 at 08:46:22AM -0700, Joshua D. Drake wrote:
>> The only question I have is... what does this give us that PITR
>> doesn't give us?

> It looks like a wrapper for PITR to me, so the gain would be ease of
> use.

A couple of points about that:

* Yeah, ease of use is a huge concern here. We're getting beat up
because people have to go find a separate package (and figure out
which one they want), install it, learn how to use it, etc. It doesn't
help that the most mature package is Slony which is, um, not very
novice-friendly or low-admin-complexity. I personally got religion
on this about two months ago when Red Hat switched their bugzilla
from Postgres to MySQL because the admins didn't want to deal with Slony
any more. People want simple.

* The proposed approach is trying to get to "real" replication
incrementally. Getting rid of the loss window involved in file-by-file
log shipping is step one, and I suspect that step two is going to be
fixing performance issues in WAL replay to ensure that slaves can keep
up. After that we'd start thinking about how to let slaves run
read-only queries. But even without read-only queries, this will be
a useful improvement for HA/backup scenarios.

regards, tom lane


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 17:39:48
Message-ID: Pine.GSO.4.64.0805291328110.10679@westnet.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, 29 May 2008, David Fetter wrote:

> It's a giant up-hill slog to sell warm standby to those in charge of
> making resources available because the warm standby machine consumes SA
> time, bandwidth, power, rack space, etc., but provides no tangible
> benefit, and this feature would have exactly the same problem.

This is an interesting commentary on the priorities of the customers
you're selling to, but I don't think you can extrapolate from that too
much. The deployments I normally deal with won't run a system unless
there's a failover backup available, period, and the fact that such a
feature is not integrated into the core yet is a major problem for them.
Read-only slaves are very nice to have, but by no means a prerequisite
before core replication will be useful to some people. Hardware/machine
resources are only worth a tiny fraction of what the data is in some
environments, and in some of those downtime is really, really expensive.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 17:41:32
Message-ID: 1212082892.11065.19.camel@dogma.ljc.laika.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, 2008-05-29 at 09:18 -0700, Josh Berkus wrote:
> Bruce,
>
> > Another idea I discussed with Tom is having the slave _delay_ applying
> > WAL files until all slave snapshots are ready.
> >
>
> Well, again, that only works for async mode.

It depends on what we mean by synchronous. Do we mean "the WAL record
has made it to the disk on the slave system," or "the WAL record has
been applied on the slave system"?

With this type of replication there will always be a difference for some
small window, but most people would expect that window to be very small
for synchronous replication.

Regards,
Jeff Davis


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 18:05:12
Message-ID: 483EF058.7040408@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

Josh,

> What does this give us that Solaris Cluster, RedHat Cluster, DRBD etc..
> doesn't give us?

Actually, these solutions all have some serious drawbacks, not the least
of which is difficult administration (I speak from bitter personal
experience). Also, most of them require installation at the filesystem
level, something which often isn't available in a hosted environment.

--Josh Berkus


From: Andrew Sullivan <ajs(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 18:13:26
Message-ID: 20080529181326.GY40070@commandprompt.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 12:11:21PM -0400, Brian Hurt wrote:
>
> Being able to do read-only queries makes this feature more valuable in more
> situations, but I disagree that it's a deal-breaker.

Your managers are apparently more enlightened than some. ;-)

A

--
Andrew Sullivan
ajs(at)commandprompt(dot)com
+1 503 667 4564 x104
http://www.commandprompt.com/


From: Andrew Sullivan <ajs(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 18:20:04
Message-ID: 20080529182003.GZ40070@commandprompt.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 07:20:37PM +0300, Marko Kreen wrote:
>
> So you can do lossless failover. Currently there is no good
> solution for this.

Indeed. Getting lossless failover would be excellent.

I understand David's worry (having had those arguments more times than
I care to admit), but if people don't want to spend the money on the
extra machine that can't be queried, they can use another solution for
the time being.

The big missing piece is lossless failover. People are currently
doing it with DRBD, various clustering things, &c., and those are
complicated to set up and maintain. (As I've told more than one
person looking at it, there is a risk that you'll actually make your
installation complicated enough that you'll make it _less_ reliable.
I have some bitter personal experiences with this effect, and I know
some others on this list do as well.)

A

--
Andrew Sullivan
ajs(at)commandprompt(dot)com
+1 503 667 4564 x104
http://www.commandprompt.com/


From: David Fetter <david(at)fetter(dot)org>
To: Andrew Sullivan <ajs(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 18:28:08
Message-ID: 20080529182808.GV16218@fetter.org
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 02:13:26PM -0400, Andrew Sullivan wrote:
> On Thu, May 29, 2008 at 12:11:21PM -0400, Brian Hurt wrote:
> > Being able to do read-only queries makes this feature more
> > valuable in more situations, but I disagree that it's a
> > deal-breaker.
>
> Your managers are apparently more enlightened than some. ;-)

Than most managers, at least in my experience, and since this feature
is (IMHO rightly) based around broad adoption, it's a good thing to
bring up.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 18:49:14
Message-ID: 483EFAAA.9040200@sigaev.ru
Lists: pgsql-advocacy pgsql-hackers


> in this case too. So each slave just needs to report its own longest
> open tx as "open" to master. Yes, it bloats master but no way around it.

Slaves should not report it every time or for every transaction. Vacuum on the
master will ask them before doing any real work.
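
A minimal sketch of that idea, in Python. It assumes vacuum on the master
collects the oldest still-needed transaction id from each slave on demand
and uses the smallest value as its cutoff. The function name is
hypothetical, and xid wraparound is ignored to keep the example short.

```python
def vacuum_cutoff(master_oldest_xid, slave_oldest_xids):
    """Oldest xid whose row versions must be kept, across all nodes."""
    # Rows visible to any open transaction on the master or on a slave
    # must survive, so the most conservative (smallest) xid wins.
    return min([master_oldest_xid, *slave_oldest_xids])


# A slave with a long-running query holds the cutoff back.
print(vacuum_cutoff(1000, [950, 990]))
```

A long-running query on one slave therefore holds back cleanup on the
master, which is the bloat cost mentioned in the quoted text.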

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Dave Page" <dpage(at)pgadmin(dot)org>, "Douglas McNaught" <doug(at)mcnaught(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "David Fetter" <david(at)fetter(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 19:03:40
Message-ID: b42b73150805291203q7e2bc7f5ga551958556be532d@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 12:19 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> That's not what Tom's email said, AIUI. "Synchronous" replication surely
> means that the master and slave always have the same set of transactions
> applied. Streaming <> synchronous. But streaming log shipping will allow us
> to get closer to synchronicity in some situations, i.e. the window for
> missing transactions will be much smaller.
>
> Some of us were discussing this late on Friday night after PGcon. ISTM that
> we can have either 1) fairly hot failover slaves that are guaranteed to be
> almost up to date, or 2) slaves that can support read-only transactions but
> might get somewhat out of date if they run long transactions. The big
> problem is in having slaves which are both highly up to date and support
> arbitrary read-only transactions. Maybe in the first instance, at least, we
> need to make slaves choose which role they will play.

I personally would be thrilled to have slaves be query-able in any
fashion, even if 'wrong' under certain circumstances. Any
asynchronous solution by definition gives the wrong answer on the
slave. Read only slave is the #1 most anticipated feature in the
circles I run with. It would literally transform how the database
world thinks about postgres overnight. This, coupled with easier
standby setup (a pg_archive to mirror pg_restore) would be most
welcome!

merlin


From: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 19:05:18
Message-ID: C464BCFE.7E90%robert.hodges@continuent.com
Lists: pgsql-advocacy pgsql-hackers

Hi everyone,

First of all, I'm absolutely delighted that the PG community is thinking seriously about replication.

Second, having a solid, easy-to-use database availability solution that works more or less out of the box would be an enormous benefit to customers. Availability is the single biggest problem for customers in my experience and, as other people have commented, the alternatives are not nice. It's an excellent idea to build off an existing feature: PITR is already pretty useful and the proposed features are solid next steps. The fact that it does not solve all problems is not a drawback but means it's likely to get done in a reasonable timeframe.

Third, you can't stop with just this feature. (This is the BUT part of the post.) The use cases not covered by this feature are actually pretty large. Here are a few that concern me:

1.) Partial replication.
2.) WAN replication.
3.) Bi-directional replication. (Yes, this is evil but there are problems where it is indispensable.)
4.) Upgrade support. Aside from database upgrade (how would this ever really work between versions?), it would not support zero-downtime app upgrades, which depend on bi-directional replication tricks.
5.) Heterogeneous replication.
6.) Finally, performance scaling using scale-out over large numbers of replicas. I think it's possible to get tunnel vision on this; it's not a big requirement in the PG community because people don't use PG in the first place when they want to do this. They use MySQL, which has very good replication for performance scaling, though it's rather weak for availability.

As a consequence, I don't see how you can get around doing some sort of row-based replication like all the other databases. Now that people are starting to get religion on this issue I would strongly advocate a parallel effort to put in a change-set extraction API that would allow construction of comprehensive master/slave replication. (Another approach would be to make it possible for third party apps to read the logs and regenerate SQL.) There are existing models for how to do change set extraction; we have done it several times at my company already. There are also research projects like GORDA that have looked fairly comprehensively at this problem.

My company would be quite happy to participate in or even sponsor such an API. Between the proposed WAL-based approach and change-set-based replication it's not hard to see PG becoming the open source database of choice for a very large number of users.

Cheers, Robert

On 5/29/08 6:37 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

David Fetter <david(at)fetter(dot)org> writes:
> On Thu, May 29, 2008 at 08:46:22AM -0700, Joshua D. Drake wrote:
>> The only question I have is... what does this give us that PITR
>> doesn't give us?

> It looks like a wrapper for PITR to me, so the gain would be ease of
> use.

A couple of points about that:

* Yeah, ease of use is a huge concern here. We're getting beat up
because people have to go find a separate package (and figure out
which one they want), install it, learn how to use it, etc. It doesn't
help that the most mature package is Slony which is, um, not very
novice-friendly or low-admin-complexity. I personally got religion
on this about two months ago when Red Hat switched their bugzilla
from Postgres to MySQL because the admins didn't want to deal with Slony
any more. People want simple.

* The proposed approach is trying to get to "real" replication
incrementally. Getting rid of the loss window involved in file-by-file
log shipping is step one, and I suspect that step two is going to be
fixing performance issues in WAL replay to ensure that slaves can keep
up. After that we'd start thinking about how to let slaves run
read-only queries. But even without read-only queries, this will be
a useful improvement for HA/backup scenarios.

regards, tom lane
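
The "loss window" in the message above can be illustrated with a little arithmetic. This is only a sketch under stated assumptions: WAL positions are treated as plain byte offsets, the segment size is PostgreSQL's default of 16 MB, and the function names are made up for this example.

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16 MB)


def unshipped_bytes_file_based(wal_position: int,
                               segment_bytes: int = WAL_SEGMENT_BYTES) -> int:
    """WAL bytes at risk when only completed segment files are shipped.

    Everything written into the current, still-open segment exists
    only on the master until that segment is finished and copied.
    """
    return wal_position % segment_bytes


def unshipped_bytes_streaming(wal_position: int, last_sent_position: int) -> int:
    """WAL bytes at risk when records are sent as they are written."""
    return wal_position - last_sent_position
```

With file-by-file shipping the window can grow to nearly a whole segment on a quiet system; with record-level shipping it is bounded by whatever has been written but not yet sent.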

--
Robert Hodges, CTO, Continuent, Inc.
Email: robert(dot)hodges(at)continuent(dot)com
Mobile: +1-510-501-3728 Skype: hodgesrm


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Robert Hodges" <robert(dot)hodges(at)continuent(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David Fetter" <david(at)fetter(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 19:16:57
Message-ID: b42b73150805291216n6ea5c151u59a3e33671a4640e@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 3:05 PM, Robert Hodges
<robert(dot)hodges(at)continuent(dot)com> wrote:
> Third, you can't stop with just this feature. (This is the BUT part of the
> post.) The use cases not covered by this feature are actually pretty
> large. Here are a few that concern me:
>
> 1.) Partial replication.
> 2.) WAN replication.
> 3.) Bi-directional replication. (Yes, this is evil but there are problems
> where it is indispensable.)
> 4.) Upgrade support. Aside from database upgrade (how would this ever
> really work between versions?), it would not support zero-downtime app
> upgrades, which depend on bi-directional replication tricks.
> 5.) Heterogeneous replication.
> 6.) Finally, performance scaling using scale-out over large numbers of
> replicas. I think it's possible to get tunnel vision on this—it's not a big
> requirement in the PG community because people don't use PG in the first
> place when they want to do this. They use MySQL, which has very good
> replication for performance scaling, though it's rather weak for
> availability.

These types of things are what Slony is for. Slony is trigger-based.
This makes it more complex than log-shipping-style replication, but
it provides lots of functionality.

WAL-shipping-based replication is maybe the fastest possible
solution... you are already paying the overhead, so it comes virtually
for free from the point of view of the master.

MySQL replication is IMO nearly worthless from a backup standpoint.

merlin


From: Brian Hurt <bhurt(at)janestcapital(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 19:21:24
Message-ID: 483F0234.7070202@janestcapital.com
Lists: pgsql-advocacy pgsql-hackers

Andrew Sullivan wrote:

>On Thu, May 29, 2008 at 12:11:21PM -0400, Brian Hurt wrote:
>
>>Being able to do read-only queries makes this feature more valuable in more
>>situations, but I disagree that it's a deal-breaker.
>
>Your managers are apparently more enlightened than some. ;-)
>
>A
No doubt. But defining the minimum acceptable feature set by the
demands of the dumbest manager is a no-win proposition.

Brian


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Teodor Sigaev" <teodor(at)sigaev(dot)ru>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 19:22:09
Message-ID: e51f66da0805291222w5a0928ebk3d96dfe5bd713c0a@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On 5/29/08, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
> > in this case too. So each slave just needs to report its own longest
> > open tx as "open" to master. Yes, it bloats master but no way around it.
>
> Slaves should not report it every time or every transaction. Vacuum on
> master will ask them before doing a real work.

That is a bit too slow if we want to do some page-level cleanup with
regular transactions / bgwriter. But an update once or twice per minute
should work fine.

--
marko
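
The reporting scheme discussed above can be sketched as follows: each slave periodically reports its oldest open transaction, and the master treats the minimum over all reports as the point before which tuples must not be removed. This is an illustration only, not how PostgreSQL implements it. The function name is hypothetical, transaction IDs are treated as plain increasing integers (no wraparound handling), and `None` stands for a slave with no open transaction.

```python
from typing import Iterable, Optional


def vacuum_horizon(master_oldest_xid: int,
                   slave_oldest_xids: Iterable[Optional[int]]) -> int:
    """Oldest transaction ID whose visible tuples VACUUM must keep.

    master_oldest_xid: oldest transaction still open on the master.
    slave_oldest_xids: latest report from each slave; None means the
    slave currently has no open transaction.
    """
    horizon = master_oldest_xid
    for xid in slave_oldest_xids:
        if xid is not None and xid < horizon:
            horizon = xid
    return horizon
```

A long-running query on any one slave pulls the horizon back for the whole cluster, which is the bloat on the master that the thread accepts as the cost of this approach.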


From: Andrew Sullivan <ajs(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 19:30:07
Message-ID: 20080529193007.GB40070@commandprompt.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:

> people are starting to get religion on this issue I would strongly
> advocate a parallel effort to put in a change-set extraction API
> that would allow construction of comprehensive master/slave
> replication.

You know, I gave a talk in Ottawa just last week about how the last
effort to develop a comprehensive API for replication failed. I had
some ideas about why, the main one of which is something like this:
"Big features with a roadmap have not historically worked, so unless
we're willing to change the way we work, we won't get that."

I don't think an API is what's needed. What's needed is clear proposals
for particular features that can be delivered in small pieces. That's what
the current proposal offers. I think any kind of row-based approach
such as what you're proposing would need that kind of proposal too.

That isn't to say that I think an API is impossible or undesirable.
It is to say that the last few times we tried, it went nowhere; and
that I don't think the circumstances have changed.

A
--
Andrew Sullivan
ajs(at)commandprompt(dot)com
+1 503 667 4564 x104
http://www.commandprompt.com/


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David Fetter" <david(at)fetter(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 19:33:41
Message-ID: e51f66da0805291233j7ad69fbclbb01a344cab43437@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On 5/29/08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> * The proposed approach is trying to get to "real" replication
> incrementally. Getting rid of the loss window involved in file-by-file
> log shipping is step one, and I suspect that step two is going to be
> fixing performance issues in WAL replay to ensure that slaves can keep
> up. After that we'd start thinking about how to let slaves run
> read-only queries. But even without read-only queries, this will be
> a useful improvement for HA/backup scenarios.

I agree with this plan, but I think this also extends to read-only
queries - we don't need to have the perfect, no-overhead solution
as the first step; instead let's have a simple, working solution
with some overhead, then improve on it.

And for the first-step solution, I think letting VACUUM keep tuples
around based on slave queries is preferable to letting slaves lag.
This is useful in more situations.

--
marko


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: David Fetter <david(at)fetter(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 19:54:03
Message-ID: 200805292154.04547.peter_e@gmx.net
Lists: pgsql-advocacy pgsql-hackers

David Fetter wrote:
> Either one of these would be great, but something that involves
> machines that stay useless most of the time is just not going to work.

Lots of people do use warm standby already anyway, just not based on
mechanisms built into PostgreSQL. So defining away this need is completely
unrealistic based on my experience. Even if there were a read-only slave,
lots of applications couldn't make use of it.

Anyway, a common approach to making better use of the hardware is to put some
other service on the otherwise-standby machine, which in turn uses your
master database server machine as its failover target. Unless you run *only*
a database, there would usually be some candidate that you could set up that
way.

Another common approach is to realize that for some the costs of a downtime
risk are higher than buying some extra hardware.

I think the consensus in the core team was that having synchronous log
shipping in 8.4 would already be a worthwhile feature by itself.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 19:56:12
Message-ID: 200805292156.13976.peter_e@gmx.net
Lists: pgsql-advocacy pgsql-hackers

Jeff Davis wrote:
> It depends on what we mean by synchronous. Do we mean "the WAL record
> has made it to the disk on the slave system," or "the WAL record has
> been applied on the slave system"?

DRBD, which is a common warm standby solution for PostgreSQL at the moment,
provides various levels of synchronicity. I imagine we could also define, as
need arises, various levels, some of which may be the ones you listed.
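
One way to picture "various levels" of synchronicity is as an ordered scale, where a commit on the master may return once the slave has reached the required level. The sketch below is an assumption for illustration: `FLUSHED` and `APPLIED` correspond to the two definitions quoted above, while `SENT` and `RECEIVED` are extra hypothetical levels, and none of these names come from PostgreSQL or DRBD.

```python
from enum import IntEnum


class SyncLevel(IntEnum):
    """How far a WAL record has progressed toward the slave (hypothetical)."""
    SENT = 1       # handed to the network on the master
    RECEIVED = 2   # arrived in the slave's memory
    FLUSHED = 3    # written to disk on the slave
    APPLIED = 4    # replayed into the slave's database


def commit_may_return(required: SyncLevel, reached: SyncLevel) -> bool:
    """True once the slave has progressed at least to the required level."""
    return reached >= required
```

Stricter levels give stronger guarantees at failover but make each commit on the master wait longer.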


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Merlin Moncure" <mmoncure(at)gmail(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Dave Page" <dpage(at)pgadmin(dot)org>, "Douglas McNaught" <doug(at)mcnaught(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "David Fetter" <david(at)fetter(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 19:59:55
Message-ID: 200805292159.56607.peter_e@gmx.net
Lists: pgsql-advocacy pgsql-hackers

Merlin Moncure wrote:
> Read only slave is the #1 most anticipated feature in the
> circles I run with.

Do these circles not know about slony and londiste?


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:08:11
Message-ID: 200805292208.12701.peter_e@gmx.net
Lists: pgsql-advocacy pgsql-hackers

Joshua D. Drake wrote:
> What does this give us that Solaris Cluster, RedHat Cluster, DRBD etc..
> doesn't give us?

I personally think that DRBD is a fine solution. But it only runs on Linux.
And Solaris Cluster isn't the same as DRBD.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Sullivan <ajs(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:09:47
Message-ID: 5567.1212091787@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Andrew Sullivan <ajs(at)commandprompt(dot)com> writes:
> On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:
>> people are starting to get religion on this issue I would strongly
>> advocate a parallel effort to put in a change-set extraction API
>> that would allow construction of comprehensive master/slave
>> replication.

> You know, I gave a talk in Ottawa just last week about how the last
> effort to develop a comprehensive API for replication failed.

Indeed, core's change of heart on this issue was largely driven by
Andrew's talk and subsequent discussion. We had more or less been
waiting for the various external replication projects to tell us
what they wanted in this line, and it was only the realization that
no such thing was likely to happen that forced us to think seriously
about what could be done within the core project.

As I said originally, we have no expectation that the proposed features
will displace the existing replication projects for "high end"
replication problems ... and I'd characterize all of Robert's concerns
as "high end" problems. We are happy to let those be solved outside
the core project.

About the only thing that would make me want to consider row-based
replication in core would be if we determine that read-only slave
queries are impractical atop a WAL-log-shipping implementation.
Which could happen; in fact I think that's the main risk of the
proposed development plan. But I also think that the near-term
steps of the plan are worth doing anyway, for various other reasons,
and so we won't be out too much effort if the plan fails.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andrew Sullivan <ajs(at)commandprompt(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:10:13
Message-ID: 200805292210.13593.peter_e@gmx.net
Lists: pgsql-advocacy pgsql-hackers

Andrew Sullivan wrote:
> The big missing piece is lossless failover.  People are currently
> doing it with DRBD, various clustering things, &c., and those are
> complicated to set up and maintain.

Well, we'll see at the end of this (we hope) how a setup procedure of DRBD vs.
PG warm standby works from a standing start. I wouldn't be overly
optimistic. ;-)


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Mathias Brossard <mathias(dot)brossard(at)opentrust(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:12:54
Message-ID: 200805292212.55391.peter_e@gmx.net
Lists: pgsql-advocacy pgsql-hackers

Mathias Brossard wrote:
> From what I gather from those slides it seems to me that the NTT solution
> is synchronous, not asynchronous. In my opinion it's even better, but I do
> understand that others might prefer asynchronous. I'm going to speculate,
> but I would think it should be possible (without a substantial rewrite) to
> support both modes (or even some intermediate modes, like DRBD on Linux).

Well, we already have asynchronous. The whole point is adding synchronous.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:14:14
Message-ID: 200805292214.14450.peter_e@gmx.net
Lists: pgsql-advocacy pgsql-hackers

Tom Lane wrote:
> We believe that the most appropriate base technology for this is
> probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.

Now how do we get our hands on their code?


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Dave Page" <dpage(at)pgadmin(dot)org>, "Douglas McNaught" <doug(at)mcnaught(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "David Fetter" <david(at)fetter(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:22:13
Message-ID: b42b73150805291322m3b08a3a3o1bcdc1ecd22c0d47@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 3:59 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> Merlin Moncure wrote:
>> Read only slave is the #1 most anticipated feature in the
>> circles I run with.
>
> Do these circles not know about slony and londiste?

Sure.

For various reasons mentioned elsewhere on this thread, a PITR based
replication solution would have a lot of appeal, assuming it could be
made to work. The main thing is that it is completely non-invasive to
the master database and all the work is handled on the standby. It
also operates on a trivial mechanism to transfer the files.

merlin


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:35:41
Message-ID: 483F139D.7050901@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

Robert,

> 1.) Partial replication.
> 2.) WAN replication.
> 3.) Bi-directional replication. (Yes, this is evil but there are
> problems where it is indispensable.)
> 4.) Upgrade support. Aside from database upgrade (how would this ever
> really work between versions?), it would not support zero-downtime app
> upgrades, which depend on bi-directional replication tricks.
> 5.) Heterogeneous replication.
> 6.) Finally, performance scaling using scale-out over large numbers of
> replicas. I think it’s possible to get tunnel vision on this—it’s not a
> big requirement in the PG community because people don’t use PG in the
> first place when they want to do this. They use MySQL, which has very
> good replication for performance scaling, though it’s rather weak for
> availability.

Let's not try to boil the ocean, hey?

From my perspective, the above use cases are what complex tools like
Slony, Bucardo, Skytools, Continuent, pgCluster, pgPool2, etc., etc. are
for. Now, if you're saying that you want to develop row-based
replication so that Continuent will work better, I'm all for it; but
saying that we *shouldn't* implement the current spec which satisfies
large numbers of users because it doesn't support *all* users is a
recipe for self-defeat. We can't satisfy all users with one
implementation, and we shouldn't try.

I think, for that matter, that work on the common replication hooks
supporting the external replication packages should continue. We need
these for precisely the reasons you state. But ... single-master,
single-slave, synch or asynch, whole-installation local network
replication is a case which covers a *lot* of users' needs ... I'd argue
the numerical majority.

--Josh


From: David Fetter <david(at)fetter(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:39:29
Message-ID: 20080529203929.GA31478@fetter.org
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 09:54:03PM +0200, Peter Eisentraut wrote:
> David Fetter wrote:
> > Either one of these would be great, but something that involves
> > machines that stay useless most of the time is just not going to
> > work.
>
> Lots of people do use warm standby already anyway, just not based on
> mechanisms built into PostgreSQL. So defining away this need is
> completely unrealistic based on my experience. Even if there were a
> read-only slave, lots of applications couldn't make use of it.
>
> Anyway, a common approach to making better use of the hardware is to
> put some other service on the otherwise-standby machine, which in
> turn uses your master database server machine as its failover
> target. Unless you run *only* a database, there would usually be
> some candidate that you could set up that way.
>
> Another common approach is to realize that for some the costs of a
> downtime risk are higher than buying some extra hardware.

> I think the consensus in the core team was that having synchronous
> log shipping in 8.4 would already be a worthwhile feature by itself.

If that was in fact the consensus of the core team, and what I've been
seeing from several core members in this thread makes that idea
unclear, it's out of step with the stated goal of the feature. Having
some kind of half-way, doesn't-actually-quite-work-out-of-the-box
"replication" will make things worse and not better.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Fetter <david(at)fetter(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:44:19
Message-ID: 6136.1212093859@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

David Fetter <david(at)fetter(dot)org> writes:
> On Thu, May 29, 2008 at 09:54:03PM +0200, Peter Eisentraut wrote:
>> I think the consensus in the core team was that having synchronous
>> log shipping in 8.4 would already be a worthwhile feature by itself.

> If that was in fact the consensus of the core team, and what I've been
> seeing from several core members in this thread makes that idea
> unclear, it's out of step with the stated goal of the feature. Having
> some kind of half-way, doesn't-actually-quite-work-out-of-the-box
> "replication" will make things worse and not better.

What is your justification for denigrating this plan with that?
Or are you merely complaining because we know we won't be all the
way there in 8.4?

regards, tom lane


From: David Fetter <david(at)fetter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:50:51
Message-ID: 20080529205051.GB31478@fetter.org
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 04:44:19PM -0400, Tom Lane wrote:
> David Fetter <david(at)fetter(dot)org> writes:
> > On Thu, May 29, 2008 at 09:54:03PM +0200, Peter Eisentraut wrote:
> >> I think the consensus in the core team was that having
> >> synchronous log shipping in 8.4 would already be a worthwhile
> >> feature by itself.

I'm sorry if I was unclear on this. I don't disagree that it would be
a worthy feature. I do disagree with the idea, given this goal, that
we can expect to ship 8.4 without built-in replication and have that
be good enough by itself.

> > If that was in fact the consensus of the core team, and what I've
> > been seeing from several core members in this thread makes that
> > idea unclear, it's out of step with the stated goal of the
> > feature. Having some kind of half-way,
> > doesn't-actually-quite-work-out-of-the-box "replication" will make
> > things worse and not better.
>
> What is your justification for denigrating this plan with that? Or
> are you merely complaining because we know we won't be all the way
> there in 8.4?

Again, just my humble opinion, but given the stated goal, which I
agree with, I'd say it's worth holding up 8.4 until some kind of
out-of-the-box replication advances that goal, where Yet Another
Toolkit Suitable For People Who Are Already Database Kernel Hackers To
Build Replication Systems On Top Of does not.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:54:04
Message-ID: 200805292054.m4TKs4O26040@momjian.us
Lists: pgsql-advocacy pgsql-hackers

David Fetter wrote:
> > What is your justification for denigrating this plan with that? Or
> > are you merely complaining because we know we won't be all the way
> > there in 8.4?
>
> Again, just my humble opinion, but given the stated goal, which I
> agree with, I'd say it's worth holding up 8.4 until some kind of
> out-of-the-box replication advances that goal, where Yet Another
> Toolkit Suitable For People Who Are Already Database Kernel Hackers To
> Build Replication Systems On Top Of does not.

I am confused why we would hold 8.4 when no one said we should hold 8.3
because we didn't have built-in replication.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:55:42
Message-ID: 483F184E.1040007@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

David,

>> I think the consensus in the core team was that having synchronous
>> log shipping in 8.4 would already be a worthwhile feature by itself.
>
> If that was in fact the consensus of the core team,

It is.

> and what I've been
> seeing from several core members in this thread makes that idea
> unclear, it's out of step with the stated goal of the feature. Having
> some kind of half-way, doesn't-actually-quite-work-out-of-the-box
> "replication" will make things worse and not better.

So, you've got a better implementation up your sleeve?

I really don't get where you're coming from on this. Frankly, your
e-mails seem gauged to be merely disruptive without any intent of
constructive input. If you're opposed to working on replication in the
core, then just say so. If you think that there's an easier way to
develop M-S replication in the core than using WAL, then please present
your solution.

--Josh


From: David Fetter <david(at)fetter(dot)org>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 20:58:34
Message-ID: 20080529205833.GC31478@fetter.org
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 04:54:04PM -0400, Bruce Momjian wrote:
> David Fetter wrote:
> > > What is your justification for denigrating this plan with that?
> > > Or are you merely complaining because we know we won't be all
> > > the way there in 8.4?
> >
> > Again, just my humble opinion, but given the stated goal, which I
> > agree with, I'd say it's worth holding up 8.4 until some kind of
> > out-of-the-box replication advances that goal, where Yet Another
> > Toolkit Suitable For People Who Are Already Database Kernel
> > Hackers To Build Replication Systems On Top Of does not.
>
> I am confused why we would hold 8.4 when no one said we should hold
> 8.3 because we didn't have built-in replication.

If people on core had come to the idea that we needed to build in
replication *before* 8.3 came out, they certainly didn't announce it.

Now is a great time to mention this because it gives everybody time to:

1. Come to a consensus on what the out-of-the-box replication should
be, and

2. Build, test and debug whatever the consensus out-of-the-box
replication turns out to be.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: David Fetter <david(at)fetter(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 21:00:29
Message-ID: 6476.1212094829@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> David Fetter wrote:
>> Again, just my humble opinion, but given the stated goal, which I
>> agree with, I'd say it's worth holding up 8.4 until some kind of
>> out-of-the-box replication advances that goal, where Yet Another
>> Toolkit Suitable For People Who Are Already Database Kernel Hackers To
>> Build Replication Systems On Top Of does not.

> I am confused why we would hold 8.4 when no one said we should hold 8.3
> because we didn't have built-in replication.

Possibly I am misinterpreting what David said, but I think his opinion
is that synchronous log shipping in itself is useless, and that we
should hold 8.4 until we have that *and* slave-side query execution.
Suffice it to say that I disagree with the first position, and as for
the second, I will never again be willing to hold a release for one
single feature, no matter what it is. We've learned the folly of
managing the project that way several times over.

regards, tom lane


From: David Fetter <david(at)fetter(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 21:02:08
Message-ID: 20080529210208.GD31478@fetter.org
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 01:55:42PM -0700, Josh Berkus wrote:
> David,
>
>>> I think the consensus in the core team was that having synchronous
>>> log shipping in 8.4 would already be a worthwhile feature by itself.
>>
>> If that was in fact the consensus of the core team,
>
> It is.
>
>> and what I've been seeing from several core members in this thread
>> makes that idea unclear, it's out of step with the stated goal of
>> the feature. Having some kind of half-way,
>> doesn't-actually-quite-work-out-of-the-box "replication" will make
>> things worse and not better.
>
> So, you've got a better implementation up your sleeve?

Nope.

> I really don't get where you're coming from on this. Frankly, your
> e-mails seem gauged to be merely disruptive without any intent of
> constructive input.

I'm sorry they've come across that way. That was not my intent.

> If you're opposed to working on replication in the core, then just
> say so. If you think that there's an easier way to develop M-S
> replication in the core than using WAL, then please present your
> solution.

I think having master-slave replication in the core using WAL is a
*great* thing to do, doable, a good path to go on, etc., and I think
it's worth holding up 8.4 until we have at least one actual
out-of-the-box version of same.

People have hinted that we might be able to get both a synchronous one
and an asynchronous one based on WAL, which would be even better. :)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 21:09:28
Message-ID: 483F1B88.208@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

David,

> I think having master-slave replication in the core using WAL is a
> *great* thing to do, doable, a good path to go on, etc., and I think
> it's worth holding up 8.4 until we have at least one actual
> out-of-the-box version of same.

Ah, ok. Well, I can tell you that the core team is also united on the
value of time-based as opposed to feature-based release cycles. *If* we
can develop this in time for 8.4, everybody would be overjoyed, but I
also think we should be realistic.

> People have hinted that we might be able to get both a synchronous one
> and an asynchronous one based on WAL, which would be even better. :)

That's the idea, yes. No reason to dump asynch WAL copying when it's
already working.

--Josh


From: Andrew Sullivan <ajs(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 21:29:01
Message-ID: 20080529212900.GG40070@commandprompt.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 01:39:29PM -0700, David Fetter wrote:

> > I think the consensus in the core team was that having synchronous
> > log shipping in 8.4 would already be a worthwhile feature by itself.
>
> If that was in fact the consensus of the core team, and what I've been
> seeing from several core members in this thread makes that idea
> unclear, it's out of step with the stated goal of the feature. Having
> some kind of half-way, doesn't-actually-quite-work-out-of-the-box
> "replication" will make things worse and not better.

I don't agree with that. It seems to me that the stated goal is
fourfold:

1. Get something that is simple (out of the box).

2. Get something that is generally useful, even if not universally useful.

3. Get something that is implementable in stages, with each stage
representing an increase in functionality.

4. Aim for 8.4, but be realistic about chances.

It seems to me that sync. replication to a write-only standby actually
meets those goals. I can think of more than one case where I have
spent lots of time, money, or both attempting to achieve this goal
with various add-on hardware and software systems that mostly worked
except when they didn't. These additions were complicated to set up,
painful to use, and documented by sadists who couldn't write.

I agree that there are plenty of people unwilling to use a system that
requires additional hardware that mostly can't be used for other
things. But those people already have alternatives (maybe even more,
and simple ones, soon). The synchronous-needing crowd currently have
nothing. The proposed feature would be a huge improvement.

A

--
Andrew Sullivan
ajs(at)commandprompt(dot)com
+1 503 667 4564 x104
http://www.commandprompt.com/


From: Mathias Brossard <mathias(dot)brossard(at)opentrust(dot)com>
To:
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 21:37:44
Message-ID: 483F2228.9060100@opentrust.com
Lists: pgsql-advocacy pgsql-hackers

Peter Eisentraut wrote:
> Mathias Brossard wrote:
>> From what I gather from those slides it seems to me that the NTT solution
>> is synchronous not asynchronous. In my opinion it's even better, but I do
>> understand that others might prefer asynchronous. I'm going to speculate,
>> but I would think it should be possible (without a substantial rewrite) to
>> support both modes (or even some intermediate modes, like DRBD on Linux).
>
> Well, we already have asynchronous. The whole point is adding synchronous.

I don't think the current log shipping solution matches, as Tom puts it,
the « need for simple, built-in replication » (from a user perspective,
I wouldn't call it "simple").

I pointed out that the NTT solution is synchronous because Tom said in
the first part of his email that:

> In practice, simple asynchronous single-master-multiple-slave
> replication covers a respectable fraction of use cases, so we have
> concluded that we should allow such a feature to be included in the
> core project.

... and yet "the most appropriate base technology for this" is
synchronous. Maybe I should also have pointed out in my previous mail
that it doesn't support multiple slaves.

Also, as others have pointed out, there are different interpretations of
"synchronous" depending on whether the WAL data has reached the other end
of the network connection, a safe disk checkpoint, or the slave DB itself.
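
To make the three interpretations concrete, here is a minimal sketch in
Python. The level names (RECEIVED, FLUSHED, APPLIED) and the function
commit_may_return are hypothetical labels chosen for illustration; they
are not taken from the NTT patch or from any PostgreSQL release.

```python
from enum import IntEnum


class AckLevel(IntEnum):
    """Hypothetical names for how far WAL must travel before commit returns."""
    RECEIVED = 1   # WAL bytes reached the slave's end of the connection
    FLUSHED = 2    # WAL bytes are safely on the slave's disk
    APPLIED = 3    # WAL has been replayed into the slave database


def commit_may_return(required: AckLevel, reported: AckLevel) -> bool:
    """The master may acknowledge a commit once the slave has reported
    progress at least as far as the level the administrator requires."""
    return reported >= required


if __name__ == "__main__":
    # Same slave progress, three different definitions of "synchronous".
    for required in AckLevel:
        ok = commit_may_return(required, AckLevel.FLUSHED)
        print(f"require {required.name:8} slave reports FLUSHED -> {ok}")
```

The stricter the required level, the longer each commit waits, which is
why the choice of definition matters when people say "synchronous".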

Sincerely,
--
Mathias Brossard


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 21:42:16
Message-ID: 200805291742.17495.xzilla@users.sourceforge.net
Lists: pgsql-advocacy pgsql-hackers

On Thursday 29 May 2008 12:13:20 Bruce Momjian wrote:
> David Fetter wrote:
> > On Thu, May 29, 2008 at 11:58:31AM -0400, Bruce Momjian wrote:
> > > Josh Berkus wrote:
> > > > Publishing the XIDs back to the master is one possibility. We
> > > > also looked at using "spillover segments" for vacuumed rows, but
> > > > that seemed even less viable.
> > > >
> > > > I'm also thinking, for *async replication*, that we could simply
> > > > halt replication on the slave whenever a transaction passes minxid
> > > > on the master. However, the main focus will be on synchronous
> > > > hot standby.
> > >
> > > Another idea I discussed with Tom is having the slave _delay_
> > > applying WAL files until all slave snapshots are ready.
> >
> > Either one of these would be great, but something that involves
> > machines that stay useless most of the time is just not going to work.
>
> Right, the ultimate target is to have the slave be read-only, but we
> need to get the streaming of WAL logs done first.
>

I would have thought the read only piece would have been more important than
the synchronous piece. In my experience readable slaves is the big selling
point in both Oracle and MySQL's implementations, and people are not nearly
as concerned if there is a small asynchronous window.

It would also be easier to implement on some level; we have already solved the
asynchronous WAL shipping problem, so we would just need to solve the
read-only bits. For synchronous hot standby, you have to solve both the
synchronous shipping and the read-only bits. Seems like more work with less
upside than read-only slaves vs. the PITR warm standby we have now.

Interesting that core views this differently.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 21:52:37
Message-ID: 7290.1212097957@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> writes:
> I would have thought the read only piece would have been more important than
> the synchronous piece. In my experience readable slaves is the big selling
> point in both Oracle and MySQL's implementations, and people are not nearly
> as concerned if there is a small asynchronous window.

Actually, it's the "easy to use" piece that's near the front of my own
priority list. In any case, the expectation that we can get the log
shipping fixed sooner than we can get read-only slaves has nothing to do
with the relative desirability of the pieces; it's a technical judgment
based on what we can see of the problems' difficulty.

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 21:54:36
Message-ID: 1212098076.27385.73.camel@jd-laptop
Lists: pgsql-advocacy pgsql-hackers

On Thu, 2008-05-29 at 17:42 -0400, Robert Treat wrote:

>
> I would have thought the read only piece would have been more important than
> the synchronous piece. In my experience readable slaves is the big selling
> point in both Oracle and MySQL's implementations, and people are not nearly
> as concerned if there is a small asynchronous window.

The read only piece is the more important piece from a market
perspective.

Joshua D. Drake


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 22:06:39
Message-ID: 60bq2o22a8.fsf@dba2.int.libertyrms.com
Lists: pgsql-advocacy pgsql-hackers

tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane) writes:
> As I said originally, we have no expectation that the proposed features
> will displace the existing replication projects for "high end"
> replication problems ... and I'd characterize all of Robert's concerns
> as "high end" problems. We are happy to let those be solved outside
> the core project.

I expect that one of the effects of having an "improved PITR" as a
'core' replication solution will be that Slony-I will become less
popular amongst "naive users," and that is *not* a bad thing. I am
quite sure that there are a lot of cases today where users would be
better served by what falls out of this.

It will mean that those that are left will, despite being fewer, be
more cognizant that they are getting what they require.

We should see somewhat fewer "expectation gaps," and that should allow
more people to be more satisfied with what they are getting.
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/linux.html
Lisp stoppped itself
FEP Command:


From: Dimitri Fontaine <dim(at)hi-media(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 22:12:31
Message-ID: 5966F428-2B4C-4383-B123-077D23DF9A55@hi-media.com
Lists: pgsql-advocacy pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I'd first like to applaud the core team's decision: having bare PostgreSQL
offer a reliable and simple-to-set-up synchronous replication
solution is an excellent prospect! ...

On 29 May 2008, at 23:42, Robert Treat wrote:
> I would have thought the read only piece would have been more important than
> the synchronous piece. In my experience readable slaves is the big selling
> point in both Oracle and MySQL's implementations, and people are not nearly
> as concerned if there is a small asynchronous window.

... Even more so when you're confronted with this exact problem.
A fellow PG user ended up having both the WAL and the data replicated
by DRBD (protocol C) and some heartbeat scripts to do the automatic
failover. This wasn't easy to set up, and to some extent we're still
concerned about the reliability part of it. We know about the "easy to
use" part of it: we didn't get it.

While at it, would it be possible for the "simple" part of the core
team statement to include automatic failover?
That would mean that when the current master is about to stop on a
(fatal) error, it tells the slave to warm up. Of course, after a more severe
crash the slave would have to be started by other means, but covering
the fatal error path and having the master restart as a slave would only
add to the reliability... wouldn't it?

> It would also be easier to implement on some level; we have already solved the
> asynchronous WAL shipping problem, so we would just need to solve the
> read-only bits. For synchronous hot standby, you have to solve both the
> synchronous shipping and the read-only bits. Seems like more work with less
> upside than read-only slaves vs. the PITR warm standby we have now.
>
> Interesting that core views this differently.

Core seems to think a read-only slave is more complex than a synchronous
slave, for example because of how long-running read-only transactions on
the slave interact with vacuums on the master.
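
The conflict can be sketched as follows. This is only an illustration of
the problem, not a proposed implementation: the names cleanup_conflicts,
latest_removed_xid and slave_snapshot_xmins are hypothetical, and the
sketch ignores transaction ID wraparound. It assumes that a vacuum
record in the WAL carries the newest transaction ID whose deleted rows
it removes, and that each slave query holds a snapshot with an "xmin"
below which everything is known to be finished.

```python
from typing import Iterable


def cleanup_conflicts(latest_removed_xid: int,
                      slave_snapshot_xmins: Iterable[int]) -> bool:
    """Return True if replaying a vacuum cleanup record would remove row
    versions that some running slave query might still need to see.

    A snapshot with xmin <= latest_removed_xid may still consider the
    deleting transaction unfinished, so the old row version is visible
    to it and must not vanish underneath the query.
    """
    return any(xmin <= latest_removed_xid for xmin in slave_snapshot_xmins)


def may_replay_now(latest_removed_xid: int,
                   slave_snapshot_xmins: Iterable[int]) -> bool:
    """The 'delay WAL application' idea: only replay the cleanup record
    once no slave snapshot conflicts with it."""
    return not cleanup_conflicts(latest_removed_xid, slave_snapshot_xmins)


if __name__ == "__main__":
    # A long-running slave query (xmin 90) blocks a cleanup up to xid 100.
    print(may_replay_now(100, [90, 150]))
    # Once that query ends, the remaining snapshot no longer conflicts.
    print(may_replay_now(100, [150]))
```

Whether the slave waits, cancels the query, or reports its xmin back to
the master is exactly the design question being debated in this thread.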

Regards,
- --
dim
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkg/Kk8ACgkQlBXRlnbh1bneKACeMK+fSp8VExctndo46X76NTxV
atIAn2UYw1g/4RPddypqirrZcqg5C7gm
=JeA6
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dim(at)hi-media(dot)com>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 22:29:01
Message-ID: 7898.1212100141@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Dimitri Fontaine <dim(at)hi-media(dot)com> writes:
> While at it, would it be possible for the "simple" part of the core
> team statement to include automatic failover?

No, I think it would be a useless expenditure of energy. Failover
includes a lot of things that are not within our purview: switching
IP addresses to point to the new server, some kind of STONITH solution
to keep the original master from coming back to life, etc. Moreover
there are already projects/products concerned with those issues.

It might be useful to document where to find solutions to that problem,
but we can't take it on as part of core Postgres.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 22:39:53
Message-ID: 483F30B9.1080705@dunslane.net
Lists: pgsql-advocacy pgsql-hackers

Joshua D. Drake wrote:
> On Thu, 2008-05-29 at 17:42 -0400, Robert Treat wrote:
>
>
>> I would have thought the read only piece would have been more important than
>> the synchronous piece. In my experience readable slaves is the big selling
>> point in both Oracle and MySQL's implementations, and people are not nearly
>> as concerned if there is a small asynchronous window.
>>
>
> The read only piece is the more important piece from a market
> perspective.
>
>
>

You must be gauging a different market from the one I'm in. I have just
come back from a meeting with a (quite technically savvy) customer who
was quite excited by the news and saw the possibility of read-only
slaves as a nice to have extra rather than a must-have-or-it's-not-worth
anything feature.

I'm really quite astounded and rather saddened by the waves of
negativity I have seen today.

cheers

andrew


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 22:47:05
Message-ID: 1212101225.27385.80.camel@jd-laptop
Lists: pgsql-advocacy pgsql-hackers

On Thu, 2008-05-29 at 18:39 -0400, Andrew Dunstan wrote:
>
> Joshua D. Drake wrote:
> > On Thu, 2008-05-29 at 17:42 -0400, Robert Treat wrote:
> >
>
> You must be gauging a different market from the one I'm in. I have just
> come back from a meeting with a (quite technically savvy) customer who

One customer does not make a hundred. I am not saying that the shipping
isn't valid, just that those that I talk to are more interested in the
read only slave. Consider that we have any number of ways to solve the
problem we are considering implementing now. DRBD being just one of
them. Especially considering we aren't talking about failover, etc.

> was quite excited by the news and saw the possibility of read-only
> slaves as a nice to have extra rather than a must-have-or-it's-not-worth
> anything feature.
>
> I'm really quite astounded and rather saddened by the waves of
> negativity I have seen today.

I have seen much more positive than negative. I think most are just
trying to figure out exactly what we are talking about.

I, for example, am not really against the feature at all. I am suspicious of
the idea of only shipping 2 components of a 3 component feature though.

Sincerely,

Joshua D. Drake


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 23:02:56
Message-ID: 8400.1212102176@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> One customer does not make a hundred. I am not saying that the shipping
> isn't valid, just that those that I talk to are more interested in the
> read only slave. Consider that we have any number of ways to solve the
> problem we are considering implementing now. DRBD being just one of
> them.

In principle, sure, but there's a big gap between theory and practice
here. What you are arguing for is the "give the users a toolkit and
let them figure it out" approach that David was lambasting upthread.
People want the bits to go from point A to point B; they don't want
to have to research, design, test, and administer their own solution
for moving the bits. Furthermore, as several people have pointed out
from their own bitter experiences, getting a reliable and easy-to-use
solution is not as easy as it sounds. I think we have nontrivial
work in front of us to build a simple, reliable, community-tested
log shipping solution; and it's not very sexy work either. But it
needs to get done, and it really needs to get done first. There's
no point in having read-only slave queries if you don't have a
trustworthy method of getting the data to them.

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-29 23:12:22
Message-ID: 1212102742.27385.102.camel@jd-laptop
Lists: pgsql-advocacy pgsql-hackers

On Thu, 2008-05-29 at 19:02 -0400, Tom Lane wrote:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> > One customer does not make a hundred. I am not saying that the shipping
> > isn't valid, just that those that I talk to are more interested in the
> > read only slave. Consider that we have any number of ways to solve the
> > problem we are considering implementing now. DRBD being just one of
> > them.
>
> In principle, sure, but there's a big gap between theory and practice
> here. What you are arguing for is the "give the users a toolkit and
> let them figure it out" approach that David was lambasting upthread.

I think maybe my actual argument isn't coming through. What I am arguing
for is not shipping XY without Z. That is all. (and no, I don't think we
should hold up 8.4).

In short: let's do the feature, but not ship half of it. Ship the whole
thing. If that means it doesn't get into 8.4, then it doesn't get into
8.4.

> People want the bits to go from point A to point B; they don't want
> to have to research, design, test, and administer their own solution
> for moving the bits.

You don't have to convince me. I agree with you.

> I think we have nontrivial
> work in front of us to build a simple, reliable, community-tested
> log shipping solution; and it's not very sexy work either. But it
> needs to get done, and it really needs to get done first. There's
> no point in having read-only slave queries if you don't have a
> trustworthy method of getting the data to them.

O.k. I was with you until here. Log shipping ala pg_standby works fine
now sans read-only slave. No, it isn't out of the box which I can see an
argument for but it is certainly trustworthy. Or do you mean the
synchronous part?

Sincerely,

Joshua D. Drake


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 00:14:30
Message-ID: 21771.1212106470@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> I think maybe my actual argument isn't coming through. What I am arguing
> for is not shipping XY without Z. That is all. (and no, I don't think we
> should hold up 8.4).

So we should keep all the work out of the tree until every part is done?
No thanks; especially not when there is a perfectly respectable use-case
for parts X and Y alone (whether it suits *your* uses or not).

>> There's no point in having read-only slave queries if you don't have a
>> trustworthy method of getting the data to them.

> O.k. I was with you until here. Log shipping ala pg_standby works fine
> now sans read-only slave. No, it isn't out of the box which I can see an
> argument for but it is certainly trustworthy. Or do you mean the
> synchronous part?

How much testing has pg_standby really gotten? Some, sure, but it's a
contrib module that wasn't even there before 8.3. Even ignoring the lag
issue, I wouldn't trust it a whole lot if I were a DBA responsible for
valuable data. As much as some folk would like to think that contrib
is mainstream, it's not really in the same league as far as testing
coverage goes.

regards, tom lane


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Bruce Momjian" <bruce(at)momjian(dot)us>, "David Fetter" <david(at)fetter(dot)org>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Marko Kreen" <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 00:25:10
Message-ID: b42b73150805291725j1f0ccacfl779e758847b7aa2@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 7:12 PM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
> On Thu, 2008-05-29 at 19:02 -0400, Tom Lane wrote:
>> I think we have nontrivial
>> work in front of us to build a simple, reliable, community-tested
>> log shipping solution; and it's not very sexy work either. But it
>> needs to get done, and it really needs to get done first. There's
>> no point in having read-only slave queries if you don't have a
>> trustworthy method of getting the data to them.
>
> O.k. I was with you until here. Log shipping ala pg_standby works fine
> now sans read-only slave. No, it isn't out of the box which I can see an
> argument for but it is certainly trustworthy. Or do you mean the
> synchronous part?

I disagree...setting up pg_standby is more complex than it really has
to be. There are several examples in the archives of people getting
their standby solutions busted with partial wal files, etc. I helped
beta test pg_standby and there are a few 'gotchas' in getting it set
up properly.

pg_standby is not the problem (although there are some odd things
about it), it's getting files from point a to point b. It would be
nice to have 'pg_archive' which mirrors pg_standby and handles the
work on the client side for example.

While some of us can work magic with rsync, etc., it would be nice to
get things running with a few .conf settings and no external
dependencies.
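
As an illustration of the "partial WAL file" gotcha, here is a minimal
Python sketch of the copy step done so that the standby never sees a
half-written segment. The function name archive_segment and the
".partial" suffix are hypothetical; the sketch assumes it is called once
per completed segment (the way an archive_command is) and that the
archive directory is a locally mounted filesystem.

```python
import os
import shutil
import tempfile


def archive_segment(src_path: str, archive_dir: str) -> str:
    """Copy one completed WAL segment into archive_dir without ever
    exposing a partially written file under its final name."""
    final_path = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(final_path):
        # Refuse to overwrite: the existing copy may already be in use.
        # (Not race-free against a second concurrent archiver.)
        raise FileExistsError(final_path)

    # Write under a temporary name in the same directory, so the final
    # rename stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())        # force the bytes to disk
        os.replace(tmp_path, final_path)  # readers see all or nothing
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)           # leave no debris behind
        raise
    return final_path
```

The point of the temporary name is that a restore process polling for
the segment by its real name either finds the complete file or nothing.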

merlin


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 00:31:31
Message-ID: Pine.GSO.4.64.0805292012520.16207@westnet.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, 29 May 2008, Tom Lane wrote:

> There's no point in having read-only slave queries if you don't have a
> trustworthy method of getting the data to them.

This is a key statement that highlights the difference in how you're
thinking about this compared to some other people here. As far as some
are concerned, the already working log shipping *is* a trustworthy method
of getting data to the read-only slaves. There are plenty of applications
(web oriented ones in particular) where if you could direct read-only
queries against a slave, the resulting combination would be a giant
improvement over the status quo even if that slave was as much as
archive_timeout behind the master. That quantity of lag is perfectly fine
for a lot of the same apps that have read scalability issues.
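
As a back-of-the-envelope sketch of that lag, assuming file-based
shipping where a segment waits at most archive_timeout before being
switched out and is then copied and replayed: the function names and the
numbers below are made up for illustration, and polling delay on the
slave is folded into the transfer time.

```python
def worst_case_staleness(archive_timeout_s: float,
                         transfer_s: float,
                         replay_s: float) -> float:
    """Rough upper bound, in seconds, on how far a slave's data can lag:
    time waiting for the segment switch, plus copy time, plus replay time."""
    return archive_timeout_s + transfer_s + replay_s


def lag_is_acceptable(staleness_s: float, tolerance_s: float) -> bool:
    """True if the application tolerates data at least this stale."""
    return staleness_s <= tolerance_s


if __name__ == "__main__":
    # Hypothetical figures: 60 s archive_timeout, 5 s copy, 2 s replay.
    bound = worst_case_staleness(60.0, 5.0, 2.0)
    # A web app that tolerates five minutes of staleness is comfortable.
    print(bound, lag_is_acceptable(bound, tolerance_s=300.0))
```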

If you're someone who falls into that camp, the idea of putting the sync
replication job before the read-only slave one seems really backwards.

I fully accept that it may be the case that it doesn't make technical
sense to tackle them in any order besides sync->read-only slaves because
of dependencies in the implementation between the two. If that's the
case, it would be nice to explicitly spell out what that was to deflect
criticism of the planned prioritization.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Aidan Van Dyk <aidan(at)highrise(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 00:57:21
Message-ID: 20080530005720.GH27560@yugib.highrise.ca
Lists: pgsql-advocacy pgsql-hackers

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [080529 20:22]:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> > I think maybe my actual argument isn't coming through. What I am arguing
> > for is not shipping XY without Z. That is all. (and no, I don't think we
> > should hold up 8.4).
>
> So we should keep all the work out of the tree until every part is done?
> No thanks; especially not when there is a perfectly respectable use-case
> for parts X and Y alone (whether it suits *your* uses or not).

Thank you.

I would be really disappointed if some sort of synchronous WAL shipping
patch was done and considered good, but just not applied because some
semi-related "read only queries while WAL applying" patch wasn't yet
considered done and good...

Then again, the opposite is also true. If someone sat down and did the
hard work and made read-only queries runnable on a slave applying WAL
and the patch was considered done and good, I would be disappointed if it
wasn't applied because the synchronous WAL shipping wasn't done.

Both features are worthy on their own. But the first done should not be
held up because the other is not ready.

But because core specifically mentioned "ease of use" as one of the
goals, I really think the "built-in WAL shipping" should be *their*
focus first. Having read-only queries that no-one can use (well, sure,
some of us could set it up reliably) isn't that useful to the "general
PostgreSQL" community that I think they were thinking of when deciding
on this. But their focus doesn't force anybody interested in other
features to not work on their features. It's an OSS community...

And the fact that NTT had a nice presentation on it means it's
probably actually doable for 8.4:
http://www.pgcon.org/2008/schedule/attachments/61_Synchronous%20Log%20Shipping%20Replication.pdf

a.

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 01:25:49
Message-ID: 483F579D.9080307@commandprompt.com
Lists: pgsql-advocacy pgsql-hackers

Tom Lane wrote:

>>> There's no point in having read-only slave queries if you don't have a
>>> trustworthy method of getting the data to them.
>
>> O.k. I was with you until here. Log shipping ala pg_standby works fine
>> now sans read-only slave. No, it isn't out of the box which I can see an
>> argument for but it is certainly trustworthy. Or do you mean the
>> synchronous part?
>
> How much testing has pg_standby really gotten? Some, sure, but it's a
> contrib module that wasn't even there before 8.3. Even ignoring the lag
> issue, I wouldn't trust it a whole lot if I were a DBA responsible for
> valuable data. As much as some folk would like to think that contrib
> is mainstream, it's not really in the same league as far as testing
> coverage goes.

This is a pretty telling statement.

Sincerely,

Joshua D. Drake


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 01:26:36
Message-ID: 200805291826.36873.josh@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

Greg,

> I fully accept that it may be the case that it doesn't make technical
> sense to tackle them in any order besides sync->read-only slaves because
> of dependencies in the implementation between the two. If that's the
> case, it would be nice to explicitly spell out what that was to deflect
> criticism of the planned prioritization.

There's a very simple reason to prioritize the synchronous log shipping first;
NTT may open source their solution and we'll get it a lot sooner than the
other components.

That is, we expect that synch log shipping is *easier* than read-only slaves
and will get done sooner. Since there are quite a number of users who could
use this, whether or not they can run queries on the slaves, why not ship
that feature as soon as it's done?

There's also a number of issues with using the current log shipping method
for replication. In addition to the previously mentioned setup pains, there's
the 16MB chunk size for shipping log segments, which is fine for data
warehouses but kind of sucks for a web application with a 3GB database which
may take 2 hours to go through 16MB. So we have to change the shipping method
anyway, and if we're doing that, why not work on synch?

Mind you, if someone wanted to get started on read-only slaves *right now* I
can't imagine anyone would object. There's a number of problems to solve
with recovery mode, table locking etc. that can use some work even before we
deal with changes to log shipping, or XID writeback or any of the other
issues. So, volunteers?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Andrew Sullivan <ajs(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 02:38:28
Message-ID: 20080530023828.GA46626@commandprompt.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 07:02:56PM -0400, Tom Lane wrote:

> People want the bits to go from point A to point B; they don't want
> to have to research, design, test, and administer their own solution
> for moving the bits.

I agree with this. I think I probably know as well as most people --
perhaps less well than maybe two people in the world -- where most of
the bodies in Slony are hidden, and I still find it a pain to
administer. Other systems are only somewhat less troublesome; and I
really do know about the concepts involved. I'm not tripping on
important things. It's just some work to set up.

Other systems hide that work.

Given that (for instance) psql is really very easy to use once you
know a few things, the ongoing pain of simple replication in Postgres
is a big wart.

A

--
Andrew Sullivan
ajs(at)commandprompt(dot)com
+1 503 667 4564 x104
http://www.commandprompt.com/


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, "Greg Smith" <gsmith(at)gregsmith(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "David Fetter" <david(at)fetter(dot)org>, "Marko Kreen" <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 02:59:21
Message-ID: b42b73150805291959o7856c27cm99617f2edba9eb1@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 9:26 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> I fully accept that it may be the case that it doesn't make technical
>> sense to tackle them in any order besides sync->read-only slaves because
>> of dependencies in the implementation between the two. If that's the
>> case, it would be nice to explicitly spell out what that was to deflect
>> criticism of the planned prioritization.
>
> There's a very simple reason to prioritize the synchronous log shipping first;
> NTT may open source their solution and we'll get it a lot sooner than the
> other components.

That's a good argument. I just read the NTT document and the stuff
looks fantastic. You've convinced me...it just doesn't seem prudent
to forge ahead with hot standby without dealing with all the
synchronous changes to WAL logging first. I just want you guys to
understand how important hot standby is to a lot of people. Sync
logging may be less so, but having a proof of concept implementation
significantly alters the bang/buck ratio.

> That is, we expect that synch log shipping is *easier* than read-only slaves
> and will get done sooner. Since there are quite a number of users who could
> use this, whether or not they can run queries on the slaves, why not ship
> that feature as soon as it's done?

I think what dfetter, etc. were saying is that we should elevate the
hot standby stuff to a requirement, or at least a future requirement.
IOW, we should try and avoid doing anything which would make it harder
than it already is. Please understand that I don't think people on
the list were trying to be negative...the failure of hot standby to
materialize in the 8.3 cycle was a bitter pill for many people. I
personally see this new thinking as a hugely positive development.

> There's also a number of issues with using the current log shipping method
> for replication. In addition to the previously mentioned setup pains, there's
> the 16MB chunk size for shipping log segments, which is fine for data
> warehouses but kind of sucks for a web application with a 3GB database which
> may take 2 hours to go through 16MB. So we have to change the shipping method
> anyway, and if we're doing that, why not work on synch?

Well, there is the archive_timeout setting...but point taken. A big
use case for hot standby is OLTP environments where you get to combine
HA and reporting server into a single box.

> Mind you, if someone wanted to get started on read-only slaves *right now* I
> can't imagine anyone would object. There's a number of problems to solve
> with recovery mode, table locking etc. that can use some work even before we
> deal with changes to log shipping, or XID writeback or any of the other
> issues. So, volunteers?

As I see it, sync logging, hot standby, and improved setup features
are all mostly orthogonal. Florian took some pretty decent notes
during his analysis and outlined the problem areas pretty well. That
would be a starting point. It just strikes me that for all this stuff to
have even a remote chance of making 8.4, the work needs to be divided
up into teams and conquered separately.

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Greg Smith <gsmith(at)gregsmith(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 03:02:56
Message-ID: 483F6E60.1090109@dunslane.net
Lists: pgsql-advocacy pgsql-hackers

Josh Berkus wrote:
> Greg,
>
>
>> I fully accept that it may be the case that it doesn't make technical
>> sense to tackle them in any order besides sync->read-only slaves because
>> of dependencies in the implementation between the two. If that's the
>> case, it would be nice to explicitly spell out what that was to deflect
>> criticism of the planned prioritization.
>>
>
> There's a very simple reason to prioritize the synchronous log shipping first;
> NTT may open source their solution and we'll get it a lot sooner than the
> other components.
>

I have been reading the slides from the NTT presentation, and I now
really regret not having gone to that talk.

It does seem quite heavy, though, including new background processes,
heartbeat etc.

> That is, we expect that synch log shipping is *easier* than read-only slaves
> and will get done sooner. Since there are quite a number of users who could
> use this, whether or not they can run queries on the slaves, why not ship
> that feature as soon as it's done?
>

Indeed.

> There's also a number of issues with using the current log shipping method
> for replication. In addition to the previously mentioned setup pains, there's
> the 16MB chunk size for shipping log segments, which is fine for data
> warehouses but kind of sucks for a web application with a 3GB database which
> may take 2 hours to go through 16MB. So we have to change the shipping method
> anyway, and if we're doing that, why not work on synch?
>

Well, yes, but you do know about archive_timeout, right? No need to wait
2 hours.
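
To put rough numbers on the exchange above, here is a minimal sketch. It
assumes a steady WAL generation rate, which is an illustrative
simplification (real workloads are bursty), and the function names are
hypothetical. Only the 16MB segment size, the 2-hour example, and the
archive_timeout setting come from the thread.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # WAL segment size discussed in the thread


def seconds_to_fill_segment(wal_bytes_per_second: float) -> float:
    """Time for a steady write load to fill one 16MB WAL segment."""
    if wal_bytes_per_second <= 0:
        raise ValueError("WAL rate must be positive")
    return SEGMENT_BYTES / wal_bytes_per_second


def worst_case_shipping_delay(wal_bytes_per_second: float,
                              archive_timeout_s: float = 0) -> float:
    """Upper bound on how long a change waits before its segment ships.

    archive_timeout_s == 0 models the setting being disabled, so a
    segment is shipped only once it is full.
    """
    fill = seconds_to_fill_segment(wal_bytes_per_second)
    if archive_timeout_s <= 0:
        return fill
    return min(fill, archive_timeout_s)


# Josh's example: a quiet database that needs 2 hours to write 16MB of WAL.
slow_rate = SEGMENT_BYTES / (2 * 60 * 60)
print("no timeout :", worst_case_shipping_delay(slow_rate), "seconds")
print("60s timeout:", worst_case_shipping_delay(slow_rate, 60), "seconds")
```

The timeout bounds the delay but does not remove it, and as far as I
know a segment closed early is still archived as a full-size file, so
very short timeouts trade lag for archive volume.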

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 05:10:20
Message-ID: 25041.1212124220@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> I fully accept that it may be the case that it doesn't make technical
> sense to tackle them in any order besides sync->read-only slaves because
> of dependencies in the implementation between the two.

Well, it's certainly not been my intention to suggest that no one should
start work on read-only-slaves before we finish the other part. The
point is that I expect the log shipping issues will be done first
because they're easier, and it would be pointless to not release that
feature if we had it.

But since you mention it: one of the plausible answers for fixing the
vacuum problem for read-only slaves is to have the slaves push an xmin
back upstream to the master to prevent premature vacuuming. The current
design of pg_standby is utterly incapable of handling that requirement.
So there might be an implementation dependency there, depending on how
we want to solve that problem.
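
As a toy illustration of the xmin idea (not a description of any actual
or planned PostgreSQL code): assume each slave can somehow report the
oldest transaction ID its running queries still need, and the master
folds those values into its own vacuum cutoff. The function and
parameter names are hypothetical, XIDs are treated as plain increasing
integers, and wraparound is ignored.

```python
from typing import Iterable


def vacuum_horizon(master_xmins: Iterable[int],
                   slave_reported_xmins: Iterable[int],
                   next_xid: int) -> int:
    """Oldest transaction ID whose row versions must still be kept.

    Row versions that became dead before this horizon are safe to
    vacuum; anything newer may still be visible to some snapshot,
    either on the master or on a slave that reported its xmin.
    """
    candidates = list(master_xmins) + list(slave_reported_xmins)
    # With no open snapshots anywhere, everything before next_xid is fair game.
    return min(candidates, default=next_xid)


# Master sessions hold snapshots at 105 and 110; one slave runs a long
# report that still needs rows as of XID 90.
print(vacuum_horizon([105, 110], [], next_xid=120))
print(vacuum_horizon([105, 110], [90], next_xid=120))
```

The second call yields a lower horizon than the first: the slave's long
query holds back vacuum on the master. That is the cost of this
approach, and it needs a channel from slave to master that file-based
shipping does not provide.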

regards, tom lane


From: "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Greg Smith" <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Bruce Momjian" <bruce(at)momjian(dot)us>, "David Fetter" <david(at)fetter(dot)org>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Marko Kreen" <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 07:01:46
Message-ID: 65937bea0805300001q3ca1ddb7pa26fc71963cb64a9@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Fri, May 30, 2008 at 10:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> But since you mention it: one of the plausible answers for fixing the
> vacuum problem for read-only slaves is to have the slaves push an xmin
> back upstream to the master to prevent premature vacuuming. The current
> design of pg_standby is utterly incapable of handling that requirement.
> So there might be an implementation dependency there, depending on how
> we want to solve that problem.
>

I think it would be best to not make the slave interfere with the master's
operations; that's only going to increase the operational complexity of such
a solution.

There could be multiple slaves following a master, some serving
data-warehousing queries, some for load-balancing reads, some others just
for disaster recovery, and then some just to mitigate human errors by
re-applying the logs with a delay.

I don't think any one installation would see all of the above mentioned
scenarios, but we need to take care of multiple slaves operating off of a
single master; something similar to cascaded Slony-I.

My two cents.

Best regards,
--
gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 08:29:17
Message-ID: 1212136157.4120.5.camel@ebony.site
Lists: pgsql-advocacy pgsql-hackers

On Thu, 2008-05-29 at 10:12 -0400, Tom Lane wrote:
> The Postgres core team met at PGCon to discuss a few issues, the largest
> of which is the need for simple, built-in replication for PostgreSQL.
> Historically the project policy has been to avoid putting replication
> into core PostgreSQL, so as to leave room for development of competing
> solutions, recognizing that there is no "one size fits all" replication
> solution. However, it is becoming clear that this policy is hindering
> acceptance of PostgreSQL to too great an extent, compared to the benefit
> it offers to the add-on replication projects. Users who might consider
> PostgreSQL are choosing other database systems because our existing
> replication options are too complex to install and use for simple cases.
> In practice, simple asynchronous single-master-multiple-slave
> replication covers a respectable fraction of use cases, so we have
> concluded that we should allow such a feature to be included in the core
> project. We emphasize that this is not meant to prevent continued
> development of add-on replication projects that cover more complex use
> cases.
>
> We believe that the most appropriate base technology for this is
> probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
> We hope that such a feature can be completed for 8.4. Ideally this
> would be coupled with the ability to execute read-only queries on the
> slave servers, but we see technical difficulties that might prevent that
> from being completed before 8.5 or even further out. (The big problem
> is that long-running slave-side queries might still need tuples that are
> vacuumable on the master, and so replication of vacuuming actions would
> cause the slave's queries to deliver wrong answers.)
>
> Again, this will not replace Slony, pgPool, Continuent, Londiste, or
> other systems for many users, as it will not be highly scalable nor
> support long-distance replication nor replicating less than an entire
> installation. But it is time to include a simple, reliable basic
> replication feature in the core system.

I'm in full support of this and commend the work of the NTT team.

The goals and timescales are realistic and setting a timetable in this
way will help planning for many users.

I'm expecting to lead the charge on the Hot Standby project. The problem
mentioned is just one of the issues, though overall I'm now optimistic
about our eventual success in that area. I'm discussing this now with a
couple of sponsors and would welcome serious financial commitments to
this goal. Please contact me off-list if you agree also.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri Fontaine <dim(at)hi-media(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 08:29:30
Message-ID: 200805301029.33274.dfontaine@hi-media.com
Lists: pgsql-advocacy pgsql-hackers

Le vendredi 30 mai 2008, Tom Lane a écrit :
> No, I think it would be a useless expenditure of energy. Failover
> includes a lot of things that are not within our purview: switching
> IP addresses to point to the new server, some kind of STONITH solution
> to keep the original master from coming back to life, etc. Moreover
> there are already projects/products concerned with those issues.

Well, I forgot that there's in fact no active plan to put pgbouncer features
into core at the moment (I'm sure I've read something about this on the lists
though). If that were the case, the slave could proxy queries to the master, and
stop proxying but serve them itself if the master tells it it's dying.
This way, no need to switch IP addresses, the clients just connect as usual
and get results back and do not have to know whether the host they're querying
against is a slave or a master. This level of smartness would be in -core.

The STONITH part in case of known failure (fatal) does not seem that hard
either, as the master at fatal time could write somewhere that it's now a slave
and use this at next startup time (recovery.conf?). If it can't even do that,
well, we're back to a crash situation with no provided automatic failover
solution. Unhandled failure cases obviously will continue to exist.

I'm not asking for "all cases managed in -core please", just for "some level
of effort" on the topic. Of course, I'm just the one asking questions and
trying to raise ideas, so I'm perfectly fine with your current answer
(useless expenditure of energy) even if somewhat disagreeing on the useless
part of it :)

As for the integrated pgbouncer daemon part, I'm thinking this would allow the
infamous part 3 of the proposal (read-only slave) to get pretty simple to
set up when ready: the slave knows who its master is, and as soon as an XID is
needed the transaction queries are forwarded/proxied to it. Thanks again,
Florian!

> It might be useful to document where to find solutions to that problem,
> but we can't take it on as part of core Postgres.

Even the part where it makes sense (provided it does and I'm not completely off
track here)?

Regards,
--
dim


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Sullivan <ajs(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 08:56:45
Message-ID: 1212137805.4120.17.camel@ebony.site
Lists: pgsql-advocacy pgsql-hackers


On Fri, 2008-05-30 at 01:10 -0400, Tom Lane wrote:
> Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> > I fully accept that it may be the case that it doesn't make technical
> > sense to tackle them in any order besides sync->read-only slaves because
> > of dependencies in the implementation between the two.
>
> Well, it's certainly not been my intention to suggest that no one should
> start work on read-only-slaves before we finish the other part. The
> point is that I expect the log shipping issues will be done first
> because they're easier, and it would be pointless to not release that
> feature if we had it.

Agreed.

I'm arriving late to a thread that seems to have grown out of all
proportion.

AFAICS streaming WAL and hot standby are completely orthogonal features.
Streaming WAL is easier and if NTT can release their code to open source
we may get this in the Sept commit fest. Hot Standby is harder and it
was my viewpoint at PGCon that we may not have a perfect working version
of this by the end of 8.4. We are very likely to have something working,
but maybe not the whole feature set as we might wish to have. I expect
to be actively working on this "soon". I definitely do want to see WAL
streaming going in as early as possible and before end of 8.4, otherwise
code conflicts and other difficulties are likely to push out the 8.4
date and/or Hot Standby.

So as I see it, Tom has only passed on my comments on this, not added or
removed anything. The main part of the announcement was really about
bringing the WAL streaming into core and effectively favouring it over a
range of other projects.

Can we all back off a little on this for now? Various concerns have been
validly expressed, but it will all come good AFAICS.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri Fontaine <dim(at)hi-media(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 09:02:03
Message-ID: 200805301102.06063.dfontaine@hi-media.com
Lists: pgsql-advocacy pgsql-hackers

Le vendredi 30 mai 2008, Dimitri Fontaine a écrit :
> This way, no need to switch IP addresses, the clients just connect as usual
> and get results back and do not have to know whether the host they're
> querying against is a slave or a master. This level of smartness would be in
> -core.

Oh, and if you want clients to connect to a single IP and hit either the
master or the slave with some weights to choose one or the other, and a way
to remove from pool on failure etc, I think using haproxy in TCP mode would
do it. HAProxy is really nice for this purpose.
http://haproxy.1wt.eu/
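
For readers unfamiliar with the idea, the behaviour described here
(weighted choice between master and slave, with removal from the pool on
failure) can be sketched in a few lines. This is only an illustration of
the concept, not HAProxy's algorithm or configuration; the class, the
host names, and the weights are all hypothetical.

```python
import random
from typing import Dict, Optional, Set


class WeightedPool:
    """Pick a backend at random in proportion to its weight, skipping dead ones."""

    def __init__(self, weights: Dict[str, int],
                 rng: Optional[random.Random] = None) -> None:
        self._weights = dict(weights)
        self._down: Set[str] = set()
        self._rng = rng or random.Random()

    def mark_down(self, host: str) -> None:
        """Remove a backend from rotation, e.g. after a failed health check."""
        self._down.add(host)

    def mark_up(self, host: str) -> None:
        """Return a backend to rotation."""
        self._down.discard(host)

    def pick(self) -> str:
        """Return one live backend, chosen with probability proportional to weight."""
        live = {h: w for h, w in self._weights.items()
                if h not in self._down and w > 0}
        if not live:
            raise RuntimeError("no live backends")
        hosts = list(live)
        return self._rng.choices(hosts, weights=[live[h] for h in hosts], k=1)[0]


# Send roughly three read connections to the slave for each one to the master.
pool = WeightedPool({"master": 1, "slave": 3})
pool.mark_down("slave")   # slave fails its check
print(pool.pick())        # only the master is left to choose from
```

A real proxy also has to decide when a backend counts as failed and
when it may rejoin, which is where most of the operational care goes.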

Regards,
--
dim


From: Dirk Riehle <dirk(at)riehle(dot)org>
To: pgsql-advocacy(at)postgresql(dot)org
Subject: Feedback on blog post about Replication Feature decision and its impact
Date: 2008-05-30 10:21:53
Message-ID: 483FD541.9020809@riehle.org
Lists: pgsql-advocacy pgsql-hackers

Hello everyone, I was about to blog about the announcement of adding
replication to the core product but then thought I should ask for
feedback here. I moved it to advocacy (from hackers). Any thoughts would
be appreciated, thanks! --Dirk

----

Every software community has its peculiar challenges, and open source is
no exception. This post discusses the relationship between a core open
source product (PostgreSQL) and commercial offerings based on it (e.g.
EnterpriseDB).<p>

PostgreSQL is a relational database system (the most advanced open
source database in its own words); to many it is known as the other open
source database (next to MySQL). Unlike MySQL, PostgreSQL is not owned
by anyone, it is a true community project. What's more, PostgreSQL is
not based on the GPL (license) but rather on the more permissive BSD
license, which lets companies distribute the database plus enhancements
without having to contribute back their code.<p>

The core system until now does not contain support for database
replication. Naturally then, there are plenty of extensions, many of
them open source, that add these features. However, if you need
replication, it is rather cumbersome having to add some extension and
maintain it separately from the core distribution. So the pressure has
been mounting to add replication to the core product, and the core team
of seven committers finally made that decision at their recent users
conference. The <a href="
http://www.nabble.com/Core-team-statement-on-replication-in-PostgreSQL-td17537053.html">discussion
of these new features</a> is refreshingly unpolitical and focussed on
the task at hand, as to be expected of a mature open source community.<p>

<a href="http://www.enterprisedb.com/">EnterpriseDB</a> is a well-funded
database startup whose product builds on PostgreSQL. EnterpriseDB adds
many "enterprise-readiness" features to the basic PostgreSQL product,
including database replication, and much more. One might argue that it
is not in the interest of EnterpriseDB to have replication added to
PostgreSQL as it reduces the differentiation between the free community
product and the more advanced commercial offering. Why pay for
EnterpriseDB if you already get what you need from the free version?
Won't adding replication to the core product reduce EnterpriseDB's sales?
This tension seems only to get worse when you realize that EnterpriseDB
employs several of the core developers of PostgreSQL, suggesting a
direct conflict of interest when making decisions like whether to add
replication or not.<p>

So they finally made the decision to add replication, and it gives me
the opportunity to discuss what I believe are misunderstandings about
the open source business.<p>

<h3>Won't EnterpriseDB lose sales once replication is added to the core
PostgreSQL product?</h3>

I think the opposite will be the case. Officially, EnterpriseDB wants to
be a cheaper Oracle, but in the open source arena, its main competitor
is MySQL. EnterpriseDB the commercial offering is competing with MySQL
the commercial offering, and not with the free community version of
PostgreSQL. It is in EnterpriseDB's interest to have a free PostgreSQL
version installed and used in as many IT departments as possible,
because it is <a
href="http://www.riehle.org/2008/04/30/sdn-is-open-source-competing-unfairly/">the
first (and important) step to a later sale</a>, as I have discussed
elsewhere. Enhancing the free product achieves exactly this.<p>

<h3>Won't a reduced differentiation between EnterpriseDB and the core
product reduce their addressable market?</h3>

I'm pretty sure it doesn't. The addressable market size doesn't go down.
That's because EnterpriseDB is not only selling additional features, but
more importantly to many applications and customers, it is selling
"operational comfort". Operational comfort means that EnterpriseDB is
offering its throat (to choke) to customers should something go wrong.
For money obviously; this is a core part of their business. Once a
database system becomes mission-critical, few companies will want to go
without paying for support. What the reduced differentiation does,
however, is to increase the possible competition around selling such
operational comfort. Other companies may more easily enter this market
and compete with EnterpriseDB. However, as I have argued elsewhere, <a
href="http://www.riehle.org/computer-science/research/2007/computer-2007.html">by
employing core developers, EnterpriseDB is well positioned to make a
believable case that it is the go-to provider of operational comfort</a>.<p>

<h3>There is only one license, the GPL, and everyone should be using
it.</h3>

PostgreSQL is a good example of a community open source project that
does not use GPL and still is flourishing well. Whatever the ideological
background of this statement, the belief seems to be that people should
be forced to contribute back to a project rather than do so of their own
choosing. That's hardly a notion that increases freedom. More
importantly, the rationale behind it makes little sense to me. In
general, firms and individual contributors alike are motivated to
contribute back (non-differentiating) code to reduce their maintenance
overhead. If they don't, they'll only create more non-differentiating
work for themselves as they are trying to catch up with the evolving
codebase. What's more, every possible proprietary extension faces the
problem of possibly being contributed by someone else, if only there is
enough demand for it. If you wait too long to make your contribution,
someone else will do it, and you just created another maintenance and
migration problem for yourself. Which is exactly what we see happening
with the replication feature in PostgreSQL. Pressure had been mounting,
and now it will be included, to everyone's benefit.<p>

--
Into novel software paradigms, tools, processes?
Then submit a short paper to Onward! 2008 by July 2nd!
See http://www.oopsla.org/oopsla2008/cfp/cfp-onward.html
--
Phone: + 1 (650) 215 3459, Web: http://www.riehle.org


From: Andrew Sullivan <ajs(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 13:29:03
Message-ID: 20080530132903.GB46762@commandprompt.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 01:58:34PM -0700, David Fetter wrote:

> If people on core had come to the idea that we needed to build in
> replication *before* 8.3 came out, they certainly didn't announce it.
>
> Now is a great time to mention this because it gives everybody time to:
>
> 1. Come to a consensus on what the out-of-the-box replication should
> be, and
>
> 2. Build, test and debug whatever the consensus out-of-the-box
> replication turns out to be.

None of that is an argument for why this has to go in 8.4.

I argued in Ottawa that the idea that you have to plan a feature for
_the next release_ is getting less tenable with each release. This is
because major new features for Postgres are now often big and
complicated. The days of big gains from single victories are mostly
over (though there are exceptions, like HOT). Postgres is already
mature. As for the middle-aged person with a mortgage, longer-term
planning is simply a necessary part of life now.

There are two possibilities here. One is to have huge releases on
much longer timetables. I think this is unsustainable in a free
project, because people will get bored and go away if they don't get
to use the results of their work in a reasonably short time frame.
The other is to accept that sometimes, planning and development for
new features will have to start a long time before actual release --
maybe planning and some coding for 2 releases out. That allows large
features like the one we're discussing to be developed responsibly
without making everything else wait for it.

A

--
Andrew Sullivan
ajs(at)commandprompt(dot)com
+1 503 667 4564 x104
http://www.commandprompt.com/


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Bruce Momjian" <bruce(at)momjian(dot)us>, "David Fetter" <david(at)fetter(dot)org>, "Josh Berkus" <josh(at)agliodbs(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 13:31:06
Message-ID: e51f66da0805300631p2e187e1cmf9b1e2a5f4aa250b@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On 5/30/08, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com> wrote:
> On Fri, May 30, 2008 at 10:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > But since you mention it: one of the plausible answers for fixing the
> > vacuum problem for read-only slaves is to have the slaves push an xmin
> > back upstream to the master to prevent premature vacuuming. The current
> > design of pg_standby is utterly incapable of handling that requirement.
> > So there might be an implementation dependency there, depending on how
> > we want to solve that problem.
>
> I think it would be best to not make the slave interfere with the master's
> operations; that's only going to increase the operational complexity of such
> a solution.

I disagree - it's better to consider synchronized WAL-slaves
as equal to master, so having queries there affect master is ok.

You need to remember that this solution does not try to replace 100-node
Slony-I setups. You can run sanity checks on slaves or use them to
load-balance read-only OLTP queries, but not random stuff.

> There could be multiple slaves following a master, some serving
> data-warehousing queries, some for load-balancing reads, some others just
> for disaster recovery, and then some just to mitigate human errors by
> re-applying the logs with a delay.

To run warehousing queries you had better use Slony-I / Londiste. For
warehousing you want different / more indexes on tables anyway,
so I think it's quite ok to say "don't do it" for complex queries
on WAL-slaves.

> I don't think any one installation would see all of the above mentioned
> scenarios, but we need to take care of multiple slaves operating off of a
> single master; something similar to cascaded Slony-I.

Again, synchronized WAL replication is not a generic solution.
Use Slony/Londiste if you want to get totally independent slaves.

Thankfully -core has set concrete and limited goals, which
means it is possible to see working code in a reasonable time.
I think that should apply to read-only slaves too.

If we try to make it handle any load, it will never be finished.

Now if we limit the scope I've seen 2 variants thus far:

1) Keep slave max in sync, let the load there affect master (xmin).
- Slave can be used to load-balance OLTP load
- Slave should not be used for complex queries.

2) If long query is running, let slave lag (avoid applying WAL data).
- Slave cannot be used to load-balance OLTP load
- Slave can be used for complex queries (although no new indexes
or temp tables can be created).

I think 1) is the more important (and more easily implementable) case.

For 2) we already have solutions (Slony/Londiste/Bucardo, etc.),
so there is no point in making an effort to solve this here.
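
As a rough illustration of the two variants, here is a toy Python model.
It is only a sketch under simplifying assumptions: the names Master,
report_xmin, vacuum_horizon and can_apply_wal are made up for this
example and are not PostgreSQL APIs, and transaction IDs are treated as
plain increasing integers (no wraparound).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Master:
    """Toy master: its own oldest xmin plus the xmins reported by slaves."""
    own_xmin: int
    slave_xmins: Dict[str, int] = field(default_factory=dict)

    def report_xmin(self, slave: str, xmin: Optional[int]) -> None:
        # A slave with no running query reports None and stops holding
        # back vacuum on the master.
        if xmin is None:
            self.slave_xmins.pop(slave, None)
        else:
            self.slave_xmins[slave] = xmin

    def vacuum_horizon(self) -> int:
        # Variant 1: vacuum may only remove tuples that are dead to every
        # snapshot, including the oldest snapshot on any slave.
        return min([self.own_xmin, *self.slave_xmins.values()])


def can_apply_wal(latest_removed_xid: int,
                  oldest_query_xmin: Optional[int]) -> bool:
    # Variant 2: the slave delays replay of a cleanup record while a
    # running query might still need the tuples that record removes.
    return oldest_query_xmin is None or latest_removed_xid < oldest_query_xmin
```

In variant 1 a long query on the slave shows up as a low vacuum_horizon()
on the master (bloat there); in variant 2 it shows up as can_apply_wal()
returning False on the slave (replay lag there).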

--
marko


From: Andreas 'ads' Scherbaum <adsmail(at)wars-nicht(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 14:19:00
Message-ID: 20080530161900.074f218d@iridium.wars-nicht.de
Lists: pgsql-advocacy pgsql-hackers

On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:

> Well, yes, but you do know about archive_timeout, right? No need to wait
> 2 hours.

Then you ship 16 MB of binary data every 30 seconds or every minute, but
you only have a few kilobytes of real data in the logfile. This must be
taken into account, especially if you ship the logfile over the internet
(meaning: no high-speed connection, maybe even pay-per-traffic) to the
slave.
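
To put rough numbers on that, a back-of-the-envelope calculation in
Python. It assumes the default 16 MB segment size and that every
archive_timeout interval sees at least some write activity, so a full
segment is switched and shipped uncompressed each time; the function
names are made up for this sketch.

```python
SEGMENT_BYTES = 16 * 1024 * 1024  # default WAL segment size (16 MB)
SECONDS_PER_DAY = 24 * 60 * 60


def shipped_bytes_per_day(archive_timeout_s: int) -> int:
    """Worst-case daily traffic if every timeout forces a segment switch."""
    segments_per_day = SECONDS_PER_DAY // archive_timeout_s
    return segments_per_day * SEGMENT_BYTES


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 1024 ** 3


for timeout in (30, 60):
    gib = to_gib(shipped_bytes_per_day(timeout))
    print(f"archive_timeout={timeout}s: up to {gib:.1f} GiB/day")
```

Compressing the mostly empty segments before shipping would cut this
down a lot, but that is exactly the kind of extra scripting the simple
case should not need.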

Kind regards

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group


From: Andreas 'ads' Scherbaum <adsmail(at)wars-nicht(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 14:26:53
Message-ID: 20080530162653.3a243dd6@iridium.wars-nicht.de
Lists: pgsql-advocacy pgsql-hackers

On Thu, 29 May 2008 18:29:01 -0400 Tom Lane wrote:

> Dimitri Fontaine <dim(at)hi-media(dot)com> writes:
> > While at it, would it be possible for the "simple" part of the core
> > team statement to include automatic failover?
>
> No, I think it would be a useless expenditure of energy. Failover
> includes a lot of things that are not within our purview: switching
> IP addresses to point to the new server, some kind of STONITH solution
> to keep the original master from coming back to life, etc. Moreover
> there are already projects/products concerned with those issues.

True words. Failover is not and should not be part of PostgreSQL.

But PG can help the failover solution. For example, an easy-to-use
interface for the current slave status comes to mind. Other
ideas might also be possible.

> It might be useful to document where to find solutions to that problem,
> but we can't take it on as part of core Postgres.

Ack

Kind regards

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group


From: Andreas 'ads' Scherbaum <adsmail(at)wars-nicht(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 14:42:10
Message-ID: 20080530164210.38eee0a1@iridium.wars-nicht.de
Lists: pgsql-advocacy pgsql-hackers

On Thu, 29 May 2008 09:22:26 -0700 Steve Atkins wrote:
> On May 29, 2008, at 9:12 AM, David Fetter wrote:
>
> > Either one of these would be great, but something that involves
> > machines that stay useless most of the time is just not going to work.
>
> I have customers who are thinking about warm standby functionality, and
> the only thing stopping them deploying it is complexity and maintenance,
> not the cost of the HA hardware. If trivial-to-deploy replication that
> didn't offer read-only access of the slaves were available today I'd bet
> that most of them would be using it.

Sure, I have a similar customer. They are right now using a set of
Perl scripts which ship the logfiles to the slave, take care of the
status, apply the logfiles, validate checksums, etc. The whole thing
works very well in combination with RedHat cluster software, but it
took several weeks to implement the current solution.
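
Just to show what one small piece of such a setup looks like, here is a
minimal Python sketch of the "ship and validate checksum" step. This is
not the customer's Perl code; the function names and the ".part" naming
are assumptions made for this example, and a real setup would copy to
another host instead of a local directory.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ship_segment(src: Path, dest_dir: Path) -> Path:
    """Copy one logfile to dest_dir and verify the copy before exposing it."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    tmp = dest_dir / (src.name + ".part")
    shutil.copyfile(src, tmp)
    if sha256_of(tmp) != sha256_of(src):
        tmp.unlink()
        raise IOError(f"checksum mismatch while shipping {src.name}")
    final = dest_dir / src.name
    # Rename only after validation, so the slave never sees a partial file.
    tmp.replace(final)
    return final
```

Status tracking, applying the logfiles and the cluster integration all
come on top of this, which is where the weeks go.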

Not everyone wants to spend the time and the manpower to implement a
"simple" replication.

Kind regards

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 14:43:00
Message-ID: 1212158580.4120.106.camel@ebony.site
Lists: pgsql-advocacy pgsql-hackers


On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote:
> On Fri, May 30, 2008 at 10:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > But since you mention it: one of the plausible answers for fixing the
> > vacuum problem for read-only slaves is to have the slaves push an xmin
> > back upstream to the master to prevent premature vacuuming. The current
> > design of pg_standby is utterly incapable of handling that requirement.
> > So there might be an implementation dependency there, depending on how
> > we want to solve that problem.
>
> I think it would be best to not make the slave interfere with the
> master's operations; that's only going to increase the operational
> complexity of such a solution.
>
> There could be multiple slaves following a master, some serving
> data-warehousing queries, some for load-balancing reads, some others
> just for disaster recovery, and then some just to mitigate human
> errors by re-applying the logs with a delay.

Agreed.

We ruled that out as the-only-solution a while back. It does have the
beauty of simplicity, so it may exist as an option or possibly the only
way, for 8.4.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 14:46:19
Message-ID: 3595.1212158779@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote:
>> On Fri, May 30, 2008 at 10:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> But since you mention it: one of the plausible answers for fixing the
>>> vacuum problem for read-only slaves is to have the slaves push an xmin
>>> back upstream to the master to prevent premature vacuuming.
>>
>> I think it would be best to not make the slave interfere with the
>> master's operations; that's only going to increase the operational
>> complexity of such a solution.

> We ruled that out as the-only-solution a while back. It does have the
> beauty of simplicity, so it may exist as an option or possibly the only
> way, for 8.4.

Yeah. The point is that it's fairly clear that we could make that work.
A solution that doesn't impact the master at all would be nicer, but
it's not at all clear to me that one is possible, unless we abandon
WAL-shipping as the base technology.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 15:30:14
Message-ID: 48401D86.6030808@dunslane.net
Lists: pgsql-advocacy pgsql-hackers

Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>
>> On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote:
>>
>>> On Fri, May 30, 2008 at 10:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>>> But since you mention it: one of the plausible answers for fixing the
>>>> vacuum problem for read-only slaves is to have the slaves push an xmin
>>>> back upstream to the master to prevent premature vacuuming.
>>>>
>>> I think it would be best to not make the slave interfere with the
>>> master's operations; that's only going to increase the operational
>>> complexity of such a solution.
>>>
>
>
>> We ruled that out as the-only-solution a while back. It does have the
>> beauty of simplicity, so it may exist as an option or possibly the only
>> way, for 8.4.
>>
>
> Yeah. The point is that it's fairly clear that we could make that work.
> A solution that doesn't impact the master at all would be nicer, but
> it's not at all clear to me that one is possible, unless we abandon
> WAL-shipping as the base technology.
>
>
>

Quite. Before we start ruling things out let's know what we think we can
actually do.

I hope that NTT will release their code ASAP so we will have a better
idea of what we have and what we need.

cheers

andrew


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 15:58:03
Message-ID: 1212163083.4120.139.camel@ebony.site
Lists: pgsql-advocacy pgsql-hackers


On Fri, 2008-05-30 at 11:30 -0400, Andrew Dunstan wrote:
>
> Tom Lane wrote:
> > Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> >
> >> On Fri, 2008-05-30 at 12:31 +0530, Gurjeet Singh wrote:
> >>
> >>> On Fri, May 30, 2008 at 10:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>>
> >>>> But since you mention it: one of the plausible answers for fixing the
> >>>> vacuum problem for read-only slaves is to have the slaves push an xmin
> >>>> back upstream to the master to prevent premature vacuuming.
> >>>>
> >>> I think it would be best to not make the slave interfere with the
> >>> master's operations; that's only going to increase the operational
> >>> complexity of such a solution.
> >>>
> >> We ruled that out as the-only-solution a while back. It does have the
> >> beauty of simplicity, so it may exist as an option or possibly the only
> >> way, for 8.4.
> >>
> > Yeah. The point is that it's fairly clear that we could make that work.
> > A solution that doesn't impact the master at all would be nicer, but
> > it's not at all clear to me that one is possible, unless we abandon
> > WAL-shipping as the base technology.
> >
> Quite. Before we start ruling things out let's know what we think we can
> actually do.

Let me re-phrase: I'm aware of that possibility and believe we can and
could do it for 8.4. My assessment is that people won't find it
sufficient and I am looking at other alternatives also. There may be a
better one possible for 8.4, there may not. Hence I've said "something
in 8.4, something better later". There is no need to decide that is the
only way forward, yet.

I hope and expect to put some of these ideas into a more concrete form,
but this has not yet happened. Nothing has slipped, not having any
trouble getting on with it, just that my plans were to not start it yet.
I think having a detailed design ready for review by September commit
fest is credible.

> I hope that NTT will release their code ASAP so we will have a better
> idea of what we have and what we need.

That has very little to do with Hot Standby, though there could be patch
conflicts, which is why I'm aiming to get WAL streaming done first.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Marko Kreen" <markokr(at)gmail(dot)com>
Cc: "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Bruce Momjian" <bruce(at)momjian(dot)us>, "David Fetter" <david(at)fetter(dot)org>, "Josh Berkus" <josh(at)agliodbs(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 15:59:23
Message-ID: b42b73150805300859q54ffa0a8ta5ded202c7695ae9@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Fri, May 30, 2008 at 9:31 AM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> On 5/30/08, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com> wrote:
>>
>> I think it would be best to not make the slave interfere with the master's
>> operations; that's only going to increase the operational complexity of such
>> a solution.
>
> I disagree - it's better to consider synchronized WAL-slaves
> as equal to master, so having queries there affect master is ok.
>
> You need to remember that this solution does not try to replace 100-node
> Slony-I setups. You can run sanity checks on slaves or use them to
> load-balance read-only OLTP queries, but not random stuff.
>
>> There could be multiple slaves following a master, some serving
>> data-warehousing queries, some for load-balancing reads, some others just
>> for disaster recovery, and then some just to mitigate human errors by
>> re-applying the logs with a delay.
>
> To run warehousing queries you had better use Slony-I / Londiste. For
> warehousing you want different / more indexes on tables anyway,
> so I think it's quite ok to say "don't do it" for complex queries
> on WAL-slaves.
>
>> I don't think any one installation would see all of the above mentioned
>> scenarios, but we need to take care of multiple slaves operating off of a
>> single master; something similar to cascaded Slony-I.
>
> Again, synchronized WAL replication is not a generic solution.
> Use Slony/Londiste if you want to get totally independent slaves.

I strongly agree with Gurjeet. The warm standby replication mechanism
is pretty simple and is wonderfully flexible with the one big
requirement that your clusters have to be mirrors of each other.

Synchronous WAL replication obviously needs some communication channel
from the slave back to the master. Hopefully, it will be possible to
avoid this for asynchronous shipping.

merlin


From: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Sullivan <ajs(at)commandprompt(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 18:12:51
Message-ID: C4660233.7F96%robert.hodges@continuent.com
Lists: pgsql-advocacy pgsql-hackers

Hi Tom,

Thanks for the reasoned reply. As you saw from point #2 in my comments, I
think you should do this feature. I hope this answers Josh Berkus' concern
about my comments.

You make a very interesting comment which seems to go to the heart of this
design approach:

> About the only thing that would make me want to consider row-based
> replication in core would be if we determine that read-only slave
> queries are impractical atop a WAL-log-shipping implementation.

It's possible I'm misunderstanding some of the implementation issues, but it
is striking that the detailed responses to your proposal list a number of
low-level dependencies between master and slave states when replicating WAL
records. It appears that you are designing a replication mechanism that
works effectively between a master and a relatively small number of "nearby"
slaves. This is clearly an important use case but it also seems clear that
the WAL approach is not a general-purpose approach to replication. In other
words, you'll incrementally get to that limited end point I describe. This
will still leave a lot to be desired on read scaling, not to mention many
other cases.

Hence my original comments. However, rather than harp on that further I
will open up a separate thread to describe a relatively small set of
extensions to PostgreSQL that would be enabling for a wide range of
replication applications. Contrary to popular opinion these extensions are
actually well understood at the theory level and have been implemented as
prototypes as well as in commercial patches multiple times in different
databases. Those of us who are deeply involved in replication deserve just
condemnation for not stepping up and getting our thoughts out on the table.

Meanwhile, I would be interested in your reaction to these thoughts on the
scope of the real-time WAL approach. There's obviously tremendous interest
in this feature. A general description that goes beyond the NTT slides
would be most helpful for further discussions.

Cheers, Robert

P.s., The NTT slides were really great. Takahiro and Masao deserve
congratulations on an absolutely first-rate presentation.

On 5/29/08 9:09 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andrew Sullivan <ajs(at)commandprompt(dot)com> writes:
>> On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:
>>> people are starting to get religion on this issue I would strongly
>>> advocate a parallel effort to put in a change-set extraction API
>>> that would allow construction of comprehensive master/slave
>>> replication.
>
>> You know, I gave a talk in Ottawa just last week about how the last
>> effort to develop a comprehensive API for replication failed.
>
> Indeed, core's change of heart on this issue was largely driven by
> Andrew's talk and subsequent discussion. We had more or less been
> waiting for the various external replication projects to tell us
> what they wanted in this line, and it was only the realization that
> no such thing was likely to happen that forced us to think seriously
> about what could be done within the core project.
>
> As I said originally, we have no expectation that the proposed features
> will displace the existing replication projects for "high end"
> replication problems ... and I'd characterize all of Robert's concerns
> as "high end" problems. We are happy to let those be solved outside
> the core project.
>
> About the only thing that would make me want to consider row-based
> replication in core would be if we determine that read-only slave
> queries are impractical atop a WAL-log-shipping implementation.
> Which could happen; in fact I think that's the main risk of the
> proposed development plan. But I also think that the near-term
> steps of the plan are worth doing anyway, for various other reasons,
> and so we won't be out too much effort if the plan fails.
>
> regards, tom lane
>


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 19:16:29
Message-ID: 200805301516.30049.xzilla@users.sourceforge.net
Lists: pgsql-advocacy pgsql-hackers

On Friday 30 May 2008 01:10:20 Tom Lane wrote:
> Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> > I fully accept that it may be the case that it doesn't make technical
> > sense to tackle them in any order besides sync->read-only slaves because
> > of dependencies in the implementation between the two.
>
> Well, it's certainly not been my intention to suggest that no one should
> start work on read-only-slaves before we finish the other part. The
> point is that I expect the log shipping issues will be done first
> because they're easier, and it would be pointless to not release that
> feature if we had it.
>
> But since you mention it: one of the plausible answers for fixing the
> vacuum problem for read-only slaves is to have the slaves push an xmin
> back upstream to the master to prevent premature vacuuming. The current
> design of pg_standby is utterly incapable of handling that requirement.
> So there might be an implementation dependency there, depending on how
> we want to solve that problem.
>

Sure, but who's to say that after synchronous WAL shipping is "finished" it
won't need a serious rewrite due to new needs from the hot standby feature? I
think going either way carries some risk.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 19:16:34
Message-ID: 200805301516.34494.xzilla@users.sourceforge.net
Lists: pgsql-advocacy pgsql-hackers

On Thursday 29 May 2008 20:31:31 Greg Smith wrote:
> On Thu, 29 May 2008, Tom Lane wrote:
> > There's no point in having read-only slave queries if you don't have a
> > trustworthy method of getting the data to them.
>
> This is a key statement that highlights the difference in how you're
> thinking about this compared to some other people here. As far as some
> are concerned, the already working log shipping *is* a trustworthy method
> of getting data to the read-only slaves. There are plenty of applications
> (web oriented ones in particular) where if you could direct read-only
> queries against a slave, the resulting combination would be a giant
> improvement over the status quo even if that slave was as much as
> archive_timeout behind the master. That quantity of lag is perfectly fine
> for a lot of the same apps that have read scalability issues.
>
> If you're someone who falls into that camp, the idea of putting the sync
> replication job before the read-only slave one seems really backwards.
>

Just looking at it from an overall market perspective, synchronous log
shipping pretty much only addresses failover needs, whereas read-only slaves
address both failover and scaling issues. (Note I say address, not solve).

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, "Greg Smith" <gsmith(at)gregsmith(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "David Fetter" <david(at)fetter(dot)org>, "Marko Kreen" <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 19:16:57
Message-ID: 200805301516.58062.xzilla@users.sourceforge.net
Lists: pgsql-advocacy pgsql-hackers

On Thursday 29 May 2008 22:59:21 Merlin Moncure wrote:
> On Thu, May 29, 2008 at 9:26 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> >> I fully accept that it may be the case that it doesn't make technical
> >> sense to tackle them in any order besides sync->read-only slaves because
> >> of dependencies in the implementation between the two. If that's the
> >> case, it would be nice to explicitly spell out what that was to deflect
> >> criticism of the planned prioritization.
> >
> > There's a very simple reason to prioritize the synchronous log shipping
> > first; NTT may open source their solution and we'll get it a lot sooner
> > than the other components.
>
> That's a good argument. I just read the NTT document and the stuff
> looks fantastic. You've convinced me...

It would be a better argument if the NTT guys hadn't said that they estimated
6 months before the code would be released, which puts us beyond 8.4.
Now it is possible that the time frame could be shorter, but unless someone
already has the patch, this reminds me a little too much of the arguments for
including Windows support in a single release because we already had a working
port/patch set to start from.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Sullivan <ajs(at)commandprompt(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 19:57:28
Message-ID: 1212177448.4120.160.camel@ebony.site
Lists: pgsql-advocacy pgsql-hackers


On Fri, 2008-05-30 at 11:12 -0700, Robert Hodges wrote:
> This is clearly an important use case but it also seems clear that
> the WAL approach is not a general-purpose approach to replication.

I think we cannot make such a statement yet, if ever.

I would note that log-based replication is now the mainstay of
commercial database replication techniques for loosely-coupled groups of
servers. It would seem strange to assume that it should not be good for
us too, simply because we know it to be difficult.

IMHO the project has a pretty good track record of delivering
functionality that looked hard at first glance.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 20:21:28
Message-ID: 65937bea0805301321x668b8b1fxe15d86243ee8fcaf@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 7:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> The big problem
> is that long-running slave-side queries might still need tuples that are
> vacuumable on the master, and so replication of vacuuming actions would
> cause the slave's queries to deliver wrong answers.

Another issue with read-only slaves just popped up in my head.

How do we block the readers on the slave while it is replaying an ALTER
TABLE or similar command that requires an exclusive lock and potentially alters
the table's structure? Or does the WAL replay already take an x-lock on
such a table?

Best regards,
--
gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 20:22:41
Message-ID: Pine.GSO.4.64.0805301614580.15855@westnet.com
Lists: pgsql-advocacy pgsql-hackers

On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:

> Then you ship 16 MB binary stuff every 30 second or every minute but
> you only have some kbyte real data in the logfile.

Not if you use pg_clearxlogtail (
http://www.2ndquadrant.com/replication.htm ), which got lost in the giant
March commitfest queue but should probably wander into contrib as part of
8.4.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andreas 'ads' Scherbaum <adsmail(at)wars-nicht(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 21:05:57
Message-ID: 48406C35.9050409@dunslane.net
Lists: pgsql-advocacy pgsql-hackers

Andreas 'ads' Scherbaum wrote:
> On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
>
>
>> Well, yes, but you do know about archive_timeout, right? No need to wait
>> 2 hours.
>>
>
> Then you ship 16 MB binary stuff every 30 second or every minute but
> you only have some kbyte real data in the logfile. This must be taken
> into account, especially if you ship the logfile over the internet
> (means: no high-speed connection, maybe even pay-per-traffic) to the
> slave.
>
>
>
>

Sure there's a price to pay. But that doesn't mean the facility doesn't
exist. And I rather suspect that most of Josh's customers aren't too
concerned about traffic charges or affected by such bandwidth
restrictions. Certainly, none of my clients are, and they aren't in the
giant class. Shipping a 16 MB file, particularly if compressed, every
minute or so, is not such a huge problem for a great many commercial
users, and even many domestic users.

cheers

andrew


From: "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc: "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 21:18:51
Message-ID: 65937bea0805301418j294d8fc2g49a1559a5e8593ba@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Sat, May 31, 2008 at 1:52 AM, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:

> On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:
>
> Then you ship 16 MB binary stuff every 30 second or every minute but
>> you only have some kbyte real data in the logfile.
>>
>
> Not if you use pg_clearxlogtail (
> http://www.2ndquadrant.com/replication.htm ), which got lost in the giant
> March commitfest queue but should probably wander into contrib as part of
> 8.4.

This means we need to modify pg_standby to not check for filesize when
reading XLogs.

Best regards,

--
gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc: "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 22:11:20
Message-ID: Pine.GSO.4.64.0805301809050.9617@westnet.com
Lists: pgsql-advocacy pgsql-hackers

On Sat, 31 May 2008, Gurjeet Singh wrote:

>> Not if you use pg_clearxlogtail
>
> This means we need to modify pg_standby to not check for filesize when
> reading XLogs.

No, the idea is that you run the segments through pg_clearxlogtail | gzip,
which then compresses lightly used segments massively because all the
unused bytes are 0. The file comes out the same size at the other side, but
you didn't ship a full 16MB if only a few KB were used.
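
A rough way to see the effect is the sketch below. It does not call
pg_clearxlogtail itself; it only imitates the idea on a fake 16 MB buffer,
assuming 8 kB of fresh data and using random bytes as a pessimistic stand-in
for the old contents of a recycled segment (real leftover WAL data would
compress somewhat better than random bytes). All names are made up for
illustration.

```python
import gzip
import os

SEGMENT_SIZE = 16 * 1024 * 1024  # one WAL segment, 16 MB
USED_BYTES = 8 * 1024            # assumption: only 8 kB of fresh WAL data


def build_segment(used: bytes, tail_is_zeroed: bool) -> bytes:
    """Return a fake 16 MB segment: fresh data followed by its tail."""
    tail_len = SEGMENT_SIZE - len(used)
    # Random bytes imitate stale data left over in a recycled segment;
    # zero bytes imitate a tail that has been cleared before shipping.
    tail = bytes(tail_len) if tail_is_zeroed else os.urandom(tail_len)
    return used + tail


used = os.urandom(USED_BYTES)
recycled = build_segment(used, tail_is_zeroed=False)
cleared = build_segment(used, tail_is_zeroed=True)

recycled_gz = gzip.compress(recycled)
cleared_gz = gzip.compress(cleared)

print(f"recycled tail: {len(recycled_gz)} bytes after gzip")
print(f"zeroed tail:   {len(cleared_gz)} bytes after gzip")

# The receiving side gets back a full-size segment either way.
restored = gzip.decompress(cleared_gz)
print(f"restored size: {len(restored)} bytes")
```

The zeroed variant should shrink to a small fraction of the segment size,
while the restored file is still a full 16 MB, which is why pg_standby does
not need to change.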

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Andreas 'ads' Scherbaum <adsmail(at)wars-nicht(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 22:42:20
Message-ID: 20080531004220.7c3a9522@iridium.wars-nicht.de
Lists: pgsql-advocacy pgsql-hackers

On Fri, 30 May 2008 16:22:41 -0400 (EDT) Greg Smith wrote:

> On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:
>
> > Then you ship 16 MB binary stuff every 30 second or every minute but
> > you only have some kbyte real data in the logfile.
>
> Not if you use pg_clearxlogtail (
> http://www.2ndquadrant.com/replication.htm ), which got lost in the giant
> March commitfest queue but should probably wander into contrib as part of
> 8.4.

Yes, this topic has been discussed several times in the past, but solving
it needs a patch/solution which is integrated into PG itself, not into
contrib.

Kind regards

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group


From: Andreas 'ads' Scherbaum <adsmail(at)wars-nicht(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 22:47:19
Message-ID: 20080531004719.4b8206a0@iridium.wars-nicht.de
Lists: pgsql-advocacy pgsql-hackers

On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote:
> Andreas 'ads' Scherbaum wrote:
> > On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
> >
> >> Well, yes, but you do know about archive_timeout, right? No need to wait
> >> 2 hours.
> >
> > Then you ship 16 MB binary stuff every 30 second or every minute but
> > you only have some kbyte real data in the logfile. This must be taken
> > into account, especially if you ship the logfile over the internet
> > (means: no high-speed connection, maybe even pay-per-traffic) to the
> > slave.
>
> Sure there's a price to pay. But that doesn't mean the facility doesn't
> exist. And I rather suspect that most of Josh's customers aren't too
> concerned about traffic charges or affected by such bandwidth
> restrictions. Certainly, none of my clients are, and they aren't in the
> giant class. Shipping a 16Mb file, particularly if compressed, every
> minute or so, is not such a huge problem for a great many commercial
> users, and even many domestic users.

The real problem is not the 16 MB, the problem is: you can't compress
this file. If the logfile is rotated it still contains all the
old binary data, which is not a good starting point for compression.

So you may have some kB of changes in the WAL logfile every minute but you
still copy 16 MB of data. Sure, it's not so much - but if you rotate a
logfile every minute this still transfers 16*60*24 = ~23 GB a day.
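
The arithmetic can be checked with a few lines. This is only a sketch of the
calculation above; the function name and the 30-second variant are added for
illustration and assume every forced switch ships one full, uncompressed
16 MB segment.

```python
SEGMENT_MB = 16  # size of one WAL segment file in MB


def daily_volume_mb(segment_mb: int, switches_per_hour: int) -> int:
    """MB shipped per day if every switch sends one full segment."""
    return segment_mb * switches_per_hour * 24


per_minute = daily_volume_mb(SEGMENT_MB, 60)       # one switch per minute
per_30_seconds = daily_volume_mb(SEGMENT_MB, 120)  # one switch every 30 s

print(per_minute)             # 23040 MB, the "~23 GB" figure
print(per_minute / 1024)      # 22.5 when counted in units of 1024 MB
print(per_30_seconds / 1024)  # 45.0
```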

Kind regards

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Andreas 'ads' Scherbaum <adsmail(at)wars-nicht(dot)de>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-30 22:51:57
Message-ID: 200805301551.58092.josh@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

Andrew,

> Sure there's a price to pay. But that doesn't mean the facility doesn't
> exist. And I rather suspect that most of Josh's customers aren't too
> concerned about traffic charges or affected by such bandwidth
> restrictions. Certainly, none of my clients are, and they aren't in the
> giant class. Shipping a 16Mb file, particularly if compressed, every
> minute or so, is not such a huge problem for a great many commercial
> users, and even many domestic users.

The issue is that when you're talking about telecommunications companies
(and similar), once a minute isn't adequate. Those folks want at least
every second, or, even better, synchronous.

Anyway, this is a pretty pointless discussion given that we want both
capabilities, and stuff will get implemented in the order it makes
technical sense.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-31 00:00:17
Message-ID: 1212192017.12052.26.camel@jd-laptop
Lists: pgsql-advocacy pgsql-hackers

On Sat, 2008-05-31 at 02:48 +0530, Gurjeet Singh wrote:

>
> Not if you use pg_clearxlogtail
> ( http://www.2ndquadrant.com/replication.htm ), which got lost
> in the giant March commitfest queue but should probably wander
> into contrib as part of 8.4.
>
> This means we need to modify pg_standby to not check for filesize when
> reading XLogs.
>
> Best regards,
>

It does.

Joshua D. Drake



From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: andrew(at)dunslane(dot)net
Cc: adsmail(at)wars-nicht(dot)de, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-31 00:56:19
Message-ID: 20080531.095619.85415428.t-ishii@sraoss.co.jp
Lists: pgsql-advocacy pgsql-hackers

> Andreas 'ads' Scherbaum wrote:
> > On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
> >
> >
> >> Well, yes, but you do know about archive_timeout, right? No need to wait
> >> 2 hours.
> >>
> >
> > Then you ship 16 MB binary stuff every 30 second or every minute but
> > you only have some kbyte real data in the logfile. This must be taken
> > into account, especially if you ship the logfile over the internet
> > (means: no high-speed connection, maybe even pay-per-traffic) to the
> > slave.
> >
> >
> >
> >
>
> Sure there's a price to pay. But that doesn't mean the facility doesn't
> exist. And I rather suspect that most of Josh's customers aren't too
> concerned about traffic charges or affected by such bandwidth
> restrictions. Certainly, none of my clients are, and they aren't in the
> giant class. Shipping a 16Mb file, particularly if compressed, every
> minute or so, is not such a huge problem for a great many commercial
> users, and even many domestic users.

Sumitomo Electric Co., Ltd., a company in Japan with 20 billion dollars
in sales (parent company of Sumitomo Electric Information Systems Co.,
Ltd., which is one of the companies supporting Recursive SQL development)
uses 100 PostgreSQL servers. They are doing backups by using log
shipping to another data center and have problems with the amount of
log data being transferred. They said this is one of the big problems
they have with PostgreSQL and hope it will be solved in the near
future.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: adsmail(at)wars-nicht(dot)de, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-31 01:32:26
Message-ID: 4840AAAA.7010109@dunslane.net
Lists: pgsql-advocacy pgsql-hackers

Tatsuo Ishii wrote:
>> Andreas 'ads' Scherbaum wrote:
>>
>>> On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
>>>
>>>
>>>
>>>> Well, yes, but you do know about archive_timeout, right? No need to wait
>>>> 2 hours.
>>>>
>>>>
>>> Then you ship 16 MB binary stuff every 30 second or every minute but
>>> you only have some kbyte real data in the logfile. This must be taken
>>> into account, especially if you ship the logfile over the internet
>>> (means: no high-speed connection, maybe even pay-per-traffic) to the
>>> slave.
>>>
>>>
>>>
>>>
>>>
>> Sure there's a price to pay. But that doesn't mean the facility doesn't
>> exist. And I rather suspect that most of Josh's customers aren't too
>> concerned about traffic charges or affected by such bandwidth
>> restrictions. Certainly, none of my clients are, and they aren't in the
>> giant class. Shipping a 16Mb file, particularly if compressed, every
>> minute or so, is not such a huge problem for a great many commercial
>> users, and even many domestic users.
>>
>
> Sumitomo Electric Co., Ltd., a 20 billion dollars selling company in
> Japan (parent company of Sumitomo Electric Information Systems Co.,
> Ltd., which is one of the Recursive SQL development support company)
> uses 100 PostgreSQL servers. They are doing backups by using log
> shipping to another data center and have problems with the amount of
> the transferring log data. They said this is one of the big problems
> they have with PostgreSQL and hope it will be solved in the near
> future.
>
>

Excellent data point. Now, what I'd like to know is whether they are
getting into trouble simply because of the volume of log data generated
or because they have a short archive_timeout set. If it's the former
(which seems more likely) then none of the ideas I have seen so far in
this discussion seems likely to help, and that would indeed be a major
issue we should look at. Another question is this: are they being
overwhelmed by the amount of network traffic generated, or by the
difficulty postgres producers and/or consumers have in keeping up? If
it's network traffic, then perhaps compression would help us.

Maybe we need to set some goals for the level of log volumes we expect
to be able to create/send/consume.

cheers

andrew


From: "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc: "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-31 03:39:14
Message-ID: 65937bea0805302039n30083771r7598f20d1b4d448c@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Sat, May 31, 2008 at 3:41 AM, Greg Smith <gsmith(at)gregsmith(dot)com> wrote:

> On Sat, 31 May 2008, Gurjeet Singh wrote:
>
> Not if you use pg_clearxlogtail
>>>
>>
>> This means we need to modify pg_standby to not check for filesize when
>> reading XLogs.
>>
>
> No, the idea is that you run the segments through pg_clearxlogtail | gzip,
> which then compresses lightly used segments massively because all the unused
> bytes are 0. File comes out the same size at the other side, but you didn't
> ship a full 16MB if there was only a few KB used.

Got it. I remember reading about pg_clearxlogtail in these mailing lists;
but somehow forgot how it actually worked!

--
gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device


From: "Mike Rylander" <mrylander(at)gmail(dot)com>
To: "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-31 06:18:08
Message-ID: b918cf3d0805302318o2d1d79a7pc93f76190a734c40@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Fri, May 30, 2008 at 6:47 PM, Andreas 'ads' Scherbaum
<adsmail(at)wars-nicht(dot)de> wrote:
> On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote:
>> Andreas 'ads' Scherbaum wrote:
>> > On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
>> >
>> >> Well, yes, but you do know about archive_timeout, right? No need to wait
>> >> 2 hours.
>> >
>> > Then you ship 16 MB binary stuff every 30 second or every minute but
>> > you only have some kbyte real data in the logfile. This must be taken
>> > into account, especially if you ship the logfile over the internet
>> > (means: no high-speed connection, maybe even pay-per-traffic) to the
>> > slave.
>>
>> Sure there's a price to pay. But that doesn't mean the facility doesn't
>> exist. And I rather suspect that most of Josh's customers aren't too
>> concerned about traffic charges or affected by such bandwidth
>> restrictions. Certainly, none of my clients are, and they aren't in the
>> giant class. Shipping a 16Mb file, particularly if compressed, every
>> minute or so, is not such a huge problem for a great many commercial
>> users, and even many domestic users.
>
> The real problem is not the 16 MB, the problem is: you can't compress
> this file. If the logfile is rotated it still contains all the
> old binary data which is not a good starter for compression.

Using bzip2 in my archive_command script, my WAL files are normally
compressed to between 2MB and 5MB, depending on the write load
(larger, and more of them, in the middle of the day). bzip2
compression is more expensive and rotated WAL files are not
particularly compressible to be sure, but due to (and given) the
nature of the data bzip2 works pretty well, and much better than gzip.
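
For readers who have not written such a script, a minimal sketch follows. It
is not the script described above; the file name archive_wal.py, the
destination directory, and the function name are all hypothetical. It relies
only on the documented archive_command contract: %p is the path of the
segment to archive, and the command must exit with zero status only when the
file has been stored safely.

```python
import bz2
import shutil
import sys
from pathlib import Path


def archive_segment(wal_path: str, dest_dir: str) -> Path:
    """Compress one WAL segment into dest_dir and return the new path."""
    source = Path(wal_path)
    target = Path(dest_dir) / (source.name + ".bz2")
    if target.exists():
        # Never overwrite a segment that was archived earlier.
        raise FileExistsError(f"{target} is already archived")
    # Write to a temporary name first so a crash never leaves a
    # half-written file under the final name.
    partial = target.with_name(target.name + ".tmp")
    with source.open("rb") as src, bz2.open(partial, "wb") as dst:
        shutil.copyfileobj(src, dst)
    partial.rename(target)
    return target


# Hypothetical setting in postgresql.conf:
#   archive_command = 'python3 /path/to/archive_wal.py %p /mnt/wal_archive'
if __name__ == "__main__" and len(sys.argv) == 3:
    try:
        archive_segment(sys.argv[1], sys.argv[2])
    except Exception as exc:
        print(f"archive failed: {exc}", file=sys.stderr)
        sys.exit(1)  # non-zero status tells the server to retry later
```

The matching restore side would have to decompress the file back to its
original name before the standby can replay it.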

>
> So you may have some kB changes in the wal logfile every minute but you
> still copy 16 MB data. Sure, it's not so much - but if you rotate a
> logfile every minute this still transfers 16*60*24 = ~23 GB a day.
>

I archived 1965 logs yesterday on one instance of my app totalling
8.5GB ... not too bad, really.
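
A quick sanity check of those numbers, as a sketch: it assumes the 8.5GB is
the compressed total, that 1 GB means 1024 MB, and that each segment was
16 MB before compression.

```python
SEGMENT_MB = 16        # nominal size of one uncompressed WAL segment
archived_logs = 1965   # segments archived in one day (from the message)
archived_gb = 8.5      # total archived size (from the message)

archived_mb = archived_gb * 1024          # assumes 1 GB = 1024 MB
raw_mb = archived_logs * SEGMENT_MB       # volume if shipped uncompressed
average_mb = archived_mb / archived_logs  # average compressed segment
ratio = archived_mb / raw_mb              # compressed / uncompressed

print(f"uncompressed volume: {raw_mb} MB")
print(f"average archived segment: {average_mb:.1f} MB")
print(f"archived / uncompressed: {ratio:.0%}")
```

Under those assumptions the average lands inside the 2MB to 5MB range
quoted above, and the day's archive is well under a third of the
uncompressed volume.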

--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker(at)esilibrary(dot)com
| web: http://www.esilibrary.com


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Mike Rylander" <mrylander(at)gmail(dot)com>
Cc: "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-31 14:21:41
Message-ID: b42b73150805310721l20348338h1141939855ad52b0@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Sat, May 31, 2008 at 2:18 AM, Mike Rylander <mrylander(at)gmail(dot)com> wrote:
> On Fri, May 30, 2008 at 6:47 PM, Andreas 'ads' Scherbaum
> <adsmail(at)wars-nicht(dot)de> wrote:
>> On Fri, 30 May 2008 17:05:57 -0400 Andrew Dunstan wrote:
>>> Andreas 'ads' Scherbaum wrote:
>>> > On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
>>> >
>>> >> Well, yes, but you do know about archive_timeout, right? No need to wait
>>> >> 2 hours.
>>> >
>>> > Then you ship 16 MB binary stuff every 30 second or every minute but
>>> > you only have some kbyte real data in the logfile. This must be taken
>>> > into account, especially if you ship the logfile over the internet
>>> > (means: no high-speed connection, maybe even pay-per-traffic) to the
>>> > slave.
>>>
>>> Sure there's a price to pay. But that doesn't mean the facility doesn't
>>> exist. And I rather suspect that most of Josh's customers aren't too
>>> concerned about traffic charges or affected by such bandwidth
>>> restrictions. Certainly, none of my clients are, and they aren't in the
>>> giant class. Shipping a 16Mb file, particularly if compressed, every
>>> minute or so, is not such a huge problem for a great many commercial
>>> users, and even many domestic users.
>>
>> The real problem is not the 16 MB, the problem is: you can't compress
>> this file. If the logfile is rotated it still contains all the
>> old binary data which is not a good starter for compression.
>
> Using bzip2 in my archive_command script, my WAL files are normally
> compressed to between 2MB and 5MB, depending on the write load
> (larger, and more of them, in the middle of the day). bzip2
> compression is more expensive and rotated WAL files are not
> particularly compressable to be sure, but due to (and given) the
> nature of the data bzip2 works pretty well, and much better than gzip.

Compression especially is going to negate one of the big advantages of
WAL shipping, namely that it is a cheap investment in terms of load to
the main. A gigabit link can ship a lot of log files, you can always
bond, and 10GigE is coming. IMO the key trick is to make sure you
don't send the log file more than once from the same source, i.e.
cascading relay.

merlin


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Mike Rylander <mrylander(at)gmail(dot)com>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-05-31 15:14:48
Message-ID: 48416B68.1060708@commandprompt.com
Lists: pgsql-advocacy pgsql-hackers

Merlin Moncure wrote:
> On Sat, May 31, 2008 at 2:18 AM, Mike Rylander <mrylander(at)gmail(dot)com> wrote:
>> On Fri, May 30, 2008 at 6:47 PM, Andreas 'ads' Scherbaum

> Compression especially is going to negate one of the big advantages of
> wal shipping, namely that it is cheap investment in terms of load to
> the main. A gigabit link can ship a lot of log files, you can always

Who has a gigabit link between Dallas and Atlanta? That is the actual
problem here. Switch-to-switch compression is a waste of time (if you
aren't running GigE, what are you doing???).

Sincerely,

Joshua D. Drake


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-advocacy(at)postgresql(dot)org
Cc: Dirk Riehle <dirk(at)riehle(dot)org>
Subject: Re: Feedback on blog post about Replication Feature decision and its impact
Date: 2008-05-31 18:48:26
Message-ID: 200805311148.26975.josh@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

Dirk,

> <a href="http://www.enterprisedb.com/">EnterpriseDB</a> is a well-funded
> database startup whose product builds on PostgreSQL. EnterpriseDB adds
> many "enterprise-readiness" features to the basic PostgreSQL product,
> including database replication, and much more.

The replication-in-core vs. not-in-core has absolutely nothing to do with
EnterpriseDB either way. I think you'd be doing a disservice to your readers
by implying that it does. Or with the GPL. If you want to blog about these
things, maybe break them up into separate posts?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-advocacy(at)postgresql(dot)org
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Dirk Riehle <dirk(at)riehle(dot)org>
Subject: Re: Feedback on blog post about Replication Feature decision and its impact
Date: 2008-06-01 03:22:23
Message-ID: 200805312322.23291.xzilla@users.sourceforge.net
Lists: pgsql-advocacy pgsql-hackers

On Saturday 31 May 2008 14:48:26 Josh Berkus wrote:
> Dirk,
>
> > <a href="http://www.enterprisedb.com/">EnterpriseDB</a> is a well-funded
> > database startup whose product builds on PostgreSQL. EnterpriseDB adds
> > many "enterprise-readiness" features to the basic PostgreSQL product,
> > including database replication, and much more.
>
> The replication-in-core vs. not-in-core has absolutely nothing to do with
> EnterpriseDB either way. I think you'd be doing a disservice to your
> readers by implying that it does. Or with the GPL. If you want to blog
> about these things, maybe break them up into seperate posts?
>

Adding any feature into core Postgres can have a ripple effect on
EnterpriseDB, so I don't see any reason not to discuss these issues. I'd be
more concerned about the factual errors in the post (things like "core team of
seven committers") than exactly where the OP thinks the discussion should go.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Shane Ambler <pgsql(at)Sheeky(dot)Biz>
To: Dirk Riehle <dirk(at)riehle(dot)org>
Cc: pgsql-advocacy(at)postgresql(dot)org
Subject: Re: Feedback on blog post about Replication Feature decision and its impact
Date: 2008-06-01 05:23:21
Message-ID: 48423249.4090406@Sheeky.Biz
Lists: pgsql-advocacy pgsql-hackers

Dirk Riehle wrote:

> <h3>Won't EnterpriseDB lose sales once replication is added to the core
> PostgreSQL product?</h3>
>

> <h3>Won't a reduced differentiation between EnterpriseDB and the core
> product reduce their addressable market?</h3>
>

To these two I would mention that although the developers are planning
to add basic replication with a goal of easy config, this won't remove
the need for the more complex replication options that EDB offers.

--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz


From: Dirk Riehle <dirk(at)riehle(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-advocacy(at)postgresql(dot)org
Subject: Re: Feedback on blog post about Replication Feature decision and its impact
Date: 2008-06-01 13:31:44
Message-ID: 4842A4C0.5020506@riehle.org
Lists: pgsql-advocacy pgsql-hackers

Thanks for the feedback. I was a bit worried the post would be too
bland, but I think I'm seeing now where I wanted to go in the first
place, which is to discuss perceptions around conflict of interest.

I sometimes hear stuff like "in order to take over an open source
project you need to hire the committers". Ignoring the insult to the
integrity of the committers, I think this is also based on a wrong idea
of conflict of interest.

One typical perceived conflict of interest is that commercial companies
may use committers to keep commercially relevant features out of the
free community product in order to better facilitate an upsell. One
might have argued that replication in PostgreSQL is a case in point. So
in my blog post I argue that enhancing the core product is actually in
the interest of commercial offerings because "the enemy" is not the free
community edition but rather alternative products like Oracle or MySQL.
And making the free community product stronger beefs up the sales
process, because a cornerstone of open source based sales processes is
to get free versions into potential customer companies.

Now, I'm sure I'm a bit naive about this; however, the core argument I
make above seems right to me. It would be interesting to see where actual
conflicts of interest happen and where the last defense for the community
is actually the integrity of the committers, and not some economic reasoning.

Cheers,
Dirk

Josh Berkus wrote:
> Dirk,
>
>
>> <a href="http://www.enterprisedb.com/">EnterpriseDB</a> is a well-funded
>> database startup whose product builds on PostgreSQL. EnterpriseDB adds
>> many "enterprise-readiness" features to the basic PostgreSQL product,
>> including database replication, and much more.
>>
>
> The replication-in-core vs. not-in-core has absolutely nothing to do with
> EnterpriseDB either way. I think you'd be doing a disservice to your readers
> by implying that it does. Or with the GPL. If you want to blog about these
> things, maybe break them up into seperate posts?
>
>

--
Into novel software paradigms, tools, processes?
Then submit a short paper to Onward! 2008 by July 2nd!
See http://www.oopsla.org/oopsla2008/cfp/cfp-onward.html
--
Phone: + 1 (650) 215 3459, Web: http://www.riehle.org


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Dirk Riehle" <dirk(at)riehle(dot)org>
Cc: pgsql-advocacy(at)postgresql(dot)org
Subject: Re: Feedback on blog post about Replication Feature decision and its impact
Date: 2008-06-01 14:56:10
Message-ID: 36e682920806010756y2a43ad0cva8ed6ba551c89df4@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Fri, May 30, 2008 at 6:21 AM, Dirk Riehle <dirk(at)riehle(dot)org> wrote:
> <a href="http://www.enterprisedb.com/">EnterpriseDB</a> is a well-funded
> database startup whose product builds on PostgreSQL. EnterpriseDB adds many
> "enterprise-readiness" features to the basic PostgreSQL product, including
> database replication, and much more. One might argue that it is not in the
> interest of EnterpriseDB to have replication added to PostgreSQL as it
> reduces the differentiation between the free community product and the more
> advanced commercial offering. Why pay for EnterpriseDB if you already get
> what you need from the free version? Won't adding replication to the core
> product reduce EnterpriseDBs sales? This tension seems only to get worse
> when you realize that EnterpriseDB employs several of the core developers of
> PostgreSQL, suggesting a direct conflict of interest when making decisions
> like whether to add replication or not.<p>

Having worked for EnterpriseDB for about three years now, and
understanding that there are several facets of our business model, I
can say that adding these types of features wouldn't hurt us at all.
In fact, they would make some of our work easier, and would be
welcomed.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah(dot)harris(at)enterprisedb(dot)com
Edison, NJ 08837 | http://www.enterprisedb.com/


From: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-01 15:58:59
Message-ID: C46885D3.8024%robert.hodges@continuent.com
Lists: pgsql-advocacy pgsql-hackers

Hi Merlin,

My point here is that with reasonably small extensions to the core you can build products that are a lot better than SLONY. Triggers do not cover DDL, among other issues, and it's debatable whether they are the best way to implement quorum policies like Google's semi-synchronous replication. As I mentioned separately this topic deserves another thread which I promise to start.

It is of course possible to meet some of these needs with an appropriate client interface to WAL shipping. There's no a-priori reason why built-in PostgreSQL slaves need to be the only client. I would put a vote in for covering this possibility in the initial replication design. We are using a very similar approach in our own master/slave replication product.

Thanks, Robert

P.S., No offense intended to Jan Wieck et al. There are some pretty cool things in SLONY.

On 5/29/08 8:16 PM, "Merlin Moncure" <mmoncure(at)gmail(dot)com> wrote:

On Thu, May 29, 2008 at 3:05 PM, Robert Hodges
<robert(dot)hodges(at)continuent(dot)com> wrote:
> Third, you can't stop with just this feature. (This is the BUT part of the
> post.) The use cases not covered by this feature are actually pretty
> large. Here are a few that concern me:
>
> 1.) Partial replication.
> 2.) WAN replication.
> 3.) Bi-directional replication. (Yes, this is evil but there are problems
> where it is indispensable.)
> 4.) Upgrade support. Aside from database upgrade (how would this ever
> really work between versions?), it would not support zero-downtime app
> upgrades, which depend on bi-directional replication tricks.
> 5.) Heterogeneous replication.
> 6.) Finally, performance scaling using scale-out over large numbers of
> replicas. I think it's possible to get tunnel vision on this-it's not a big
> requirement in the PG community because people don't use PG in the first
> place when they want to do this. They use MySQL, which has very good
> replication for performance scaling, though it's rather weak for
> availability.

These types of things are what Slony is for. Slony is trigger based.
This makes it more complex than log shipping style replication, but
provides lots of functionality.

WAL-shipping-based replication is maybe the fastest possible
solution...you are already paying the overhead so it comes virtually
for free from the point of view of the master.

MySQL replication is IMO nearly worthless from a backup standpoint.

merlin

--
Robert Hodges, CTO, Continuent, Inc.
Email: robert(dot)hodges(at)continuent(dot)com
Mobile: +1-510-501-3728 Skype: hodgesrm


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Robert Hodges" <robert(dot)hodges(at)continuent(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David Fetter" <david(at)fetter(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-01 17:19:10
Message-ID: b42b73150806011019g34be3a34t346ab0ed621bca76@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Sun, Jun 1, 2008 at 11:58 AM, Robert Hodges
<robert(dot)hodges(at)continuent(dot)com> wrote:
> Hi Merlin,
>
> My point here is that with reasonably small extensions to the core you can
> build products that are a lot better than SLONY. Triggers do not cover
> DDL, among other issues, and it's debatable whether they are the best way to
> implement quorum policies like Google's semi-synchronous replication. As I
> mentioned separately this topic deserves another thread which I promise to
> start.

These issues are much discussed and well understood. At this point,
the outstanding points of discussion are technical...how to make this
thing work.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Cc: "Robert Hodges" <robert(dot)hodges(at)continuent(dot)com>, "David Fetter" <david(at)fetter(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-01 17:43:22
Message-ID: 23878.1212342202@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

"Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> On Sun, Jun 1, 2008 at 11:58 AM, Robert Hodges
> <robert(dot)hodges(at)continuent(dot)com> wrote:
>> My point here is that with reasonably small extensions to the core you can
>> build products that are a lot better than SLONY.

> These issues are much discussed and well understood.

Well, what we know is that previous attempts to define replication hooks
to be added to the core have died for lack of interest. Maybe Robert
can start a new discussion that will actually get somewhere; if so, more
power to him. (Is the replica-hooks-discuss list still working?) But
that is entirely orthogonal to what is proposed in this thread, which
is to upgrade the existing PITR support into a reasonably useful
replication feature.

regards, tom lane


From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-01 18:31:13
Message-ID: 758d5e7f0806011131t70ff9bbdl57926d22dd04a024@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On Thu, May 29, 2008 at 4:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The Postgres core team met at PGCon to discuss a few issues, the largest
> of which is the need for simple, built-in replication for PostgreSQL.
[...]
> We believe that the most appropriate base technology for this is
> probably real-time WAL log shipping, as was demoed by NTT OSS at PGCon.
> We hope that such a feature can be completed for 8.4. Ideally this
> would be coupled with the ability to execute read-only queries on the
> slave servers, but we see technical difficulties that might prevent that
> from being completed before 8.5 or even further out. (The big problem
> is that long-running slave-side queries might still need tuples that are
> vacuumable on the master, and so replication of vacuuming actions would
> cause the slave's queries to deliver wrong answers.)
>
> Again, this will not replace Slony, pgPool, Continuent, Londiste, or
> other systems for many users, as it will not be highly scalable nor
> support long-distance replication nor replicating less than an entire
> installation. But it is time to include a simple, reliable basic
> replication feature in the core system.

Hello!

I thought I would share a few thoughts of my own about the issue.
I have hands-on experience with Oracle and MySQL apart from
PostgreSQL so I hope it will be a bit interesting.

The former has a feature called "physical standby", which looks
quite like our WAL-shipping based replication. Archived logs are
simply replayed on the standby database. A primary database
and standby database are connected, and can stream the logs
directly. They either copy the log when it's finished (as we do now)
or can do it in a continuous manner (as I hope we will be able to).

It is possible to have synchronous replication (where "COMMIT"
on the primary database succeeds when the data is safely stored on
the standby database). I think such a feature would be a great
advantage for PostgreSQL (for cases where you cannot afford to lose
any transactions).

Their standby database is not accessible. It can be opened read-only,
but during that time replication stops. So PostgreSQL having
read-only and still replicating standby database would be great.

The other method is "logical standby" which works by dissecting
WAL-logs and recreating DDLs/DMLs from it. Never seen anyone
use it. ;-)

Then we have MySQL replication -- done by replaying actual DDLs/DMLs
on the slaves. This approach has issues, most notably when slaves are
highly loaded and lag behind the master -- so you end up with infrastructure
to monitor lags and turn off slaves which lag too much. Also it is painful
to set up -- you have to stop, copy, configure and run.

* Back to PostgreSQL world

As for PostgreSQL solutions we have Slony-I, which is great as long as
you don't have too many people managing the database and/or your
schema doesn't change too frequently. Perhaps it would be easier to
maintain if there were a way to capture DDLs (DDL triggers or similar).
Its main advantage for me is the ability to prepare complex setups and
easily add new slaves. The pgpool solution is quite nice but then
again adding a new slave is not so easy. And being a filtering
layer between client and server it feels a bit "fragile" (I know it is not,
but then again it is harder to convince someone that "yes it will work
100% right all the time").

* How I would like PostgreSQL WAL-replication to evolve:

First of all it would be great if a slave/standby would contact the master
and maintain the state with it (tell it its xmin, request a log to stream,
go online-streaming). Especially I hope that it should be possible
to make a switchover (where the two databases exchange roles),
and in this the direct connection between the two should help.

In detail, I think it should go like this:
* A slave database starts up, checks that it works as a "replica"
(hopefully it would not be a postgresql.conf constant, but rather
some file maintained by the database).
* It would connect to the master database, tell where in the WAL
it is now, and request a log N.
* If log N is not available, request the log from an externally supplied
script (so that it could be fetched from a log archive repository
somewhere, recovered from a backup tape, etc).
* Continue asking, until we get to the logs which are available
at master database.
* Continue replaying until we get within max_allowed_replication_lag
time, and open our slave for read-only queries.
* If we start lagging too much perhaps close the read-only access
to the database (perhaps configurable?).
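
The startup sequence listed above can be sketched as a toy model. This is only an illustration under stated assumptions, not a proposed implementation: the class, its fields and the callback are hypothetical names, WAL segments are modelled as a dict keyed by segment number, and max_allowed_replication_lag is measured in segments rather than time to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Standby:
    """Toy model of the proposed standby startup; every name is hypothetical."""
    next_log: int                        # first WAL segment still to be replayed
    master_logs: Dict[int, bytes]        # segments the master can still stream
    fetch_from_archive: Callable[[int], Optional[bytes]]  # stand-in for the external script
    max_allowed_replication_lag: int = 1  # assumption: counted in segments, not time
    read_only_open: bool = False
    replayed: List[int] = field(default_factory=list)

    def lag(self) -> int:
        """Number of master segments not yet replayed on this standby."""
        if not self.master_logs:
            return 0
        return max(0, max(self.master_logs) - self.next_log + 1)

    def step(self) -> bool:
        """Replay one segment; return False when no source can supply it."""
        data = self.master_logs.get(self.next_log)
        if data is None:
            # Not on the master any more: ask the externally supplied script.
            data = self.fetch_from_archive(self.next_log)
        if data is None:
            return False
        self.replayed.append(self.next_log)
        self.next_log += 1
        # Open for read-only queries only while the lag is acceptable,
        # and close again if the standby falls too far behind.
        self.read_only_open = self.lag() <= self.max_allowed_replication_lag
        return True

    def catch_up(self) -> None:
        while self.step():
            pass
```

With segments 3 and 4 only in the archive and 5 to 7 on the master, a standby starting at segment 3 replays all five in order and opens for read-only queries once it is within one segment of the master.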

I think that replication should be easy to set up. I think our
archive_command is quite easy, but many people come to it
with a lot of misconceptions about how it works (and it takes time
to explain to them how it actually works, especially what
archive_command is for, and that pg_start_backup() doesn't
actually _do_ a backup, but just tells PostgreSQL that a
backup is being done).

Easy to set up and easy to switchover (change the direction
of replication) and failover.

Also, it would be interesting to be able to have a shared-storage
solution where a single database would modify data and multiple
read-only slaves would access the data. The WAL-shipping would
be used then to "push" modified pages into read-only slaves' memory.
And each checkpoint would allow read-only slaves to release these
pages from memory. Kind of replication without actually doing any
writes.

* Highly available libpq?

Another thing worth considering is perhaps adding an HA capability
to libpq. Currently I use a setup where I have a machine with the
database and pg_bouncer (for connection pooling at the database
level), and on client machines I have a pgpool on localhost
(for transparent failover). I think some level of replication
awareness could be built into libpq. It would allow simpler,
but still HA, setups.

Perhaps ability to specify a standby in connect string, something like:

"host=foo1.bar;db=pgdb;failover=yes;stbyhost=foo2.bar"

...a failover="yes" would allow libpq to try connecting to stbyhost
if host is not available AND succeed if stbyhost announces itself
as a new master.

This would make switchover easier (clients would be told, that
we are closing, and that they should connect to a new master).

Also some way of supplying multiple hosts, where client should
pick a host at random, would be quite useful, say:

"host=foo1.bar,foo2.bar,foo3.bar;db=pgdb;loadbalance=random"

Why in libpq and not in [language/framework/application]? Well
it would help with not reinventing the wheel. And if we want
a painless switchover, client should be aware that there is
a standby ready. Also, being optional it wouldn't change anything
for those who don't want it.
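
The two connect strings proposed above could be interpreted by a client roughly as follows. This is a minimal sketch, assuming the semicolon-separated syntax from this message (which is not libpq's actual conninfo syntax) and a hypothetical probe() callback that returns the role a host announces, or None when the host is unreachable.

```python
import random
from typing import Callable, Dict, Optional


def parse_connect_string(conn: str) -> Dict[str, str]:
    """Split the hypothetical 'key=value;key=value' string into a dict."""
    return dict(part.split("=", 1) for part in conn.split(";") if part)


def pick_host(params: Dict[str, str],
              probe: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return the host to connect to, or None if no host is acceptable.

    probe(host) is a hypothetical helper: it returns "master" or
    "standby" for a reachable host and None for an unreachable one.
    """
    hosts = params["host"].split(",")
    if params.get("loadbalance") == "random":
        random.shuffle(hosts)          # pick among the listed hosts at random
    for host in hosts:
        if probe(host) is not None:
            return host
    # failover=yes: fall back to the standby, but only if it announces
    # itself as the new master.
    standby = params.get("stbyhost")
    if params.get("failover") == "yes" and standby and probe(standby) == "master":
        return standby
    return None
```

The standby is accepted only after it reports itself as master, which matches the "AND succeed if stbyhost announces itself as a new master" condition and avoids clients writing to a host that is still replaying.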

Regards,
Dawid

PS: As for pgpool/pg_bouncer -- I wish there would be a way for
them to tell the database that they are proxying for someone
else (so that I could check in pg_stat_activity who is on the far
end of the database connection, not "localhost" ;)).
--
Solving [site load issues] with [more database replication] is a lot
like solving your own personal problems with heroin - at first it
sorta works, but after a while things just get out of hand.


From: James Mansion <james(at)mansionfamily(dot)plus(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-01 19:57:00
Message-ID: 4842FF0C.1020604@mansionfamily.plus.com
Lists: pgsql-advocacy pgsql-hackers

David Fetter wrote:
> This part is a deal-killer. It's a giant up-hill slog to sell warm
> standby to those in charge of making resources available because the
> warm standby machine consumes SA time, bandwidth, power, rack space,
> etc., but provides no tangible benefit, and this feature would have
> exactly the same problem.
>
> IMHO, without the ability to do read-only queries on slaves, it's not
> worth doing this feature at all.
>
That's not something that squares with my experience *at all*, which
admittedly is entirely in investment banks. Business continuity is king,
and in some places the warm standby rep from the database vendor is
trusted more than block-level rep from the SAN vendor (though that may
be changing to some extent in favour of the SAN).

James


From: James Mansion <james(at)mansionfamily(dot)plus(dot)com>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Dave Page <dpage(at)pgadmin(dot)org>, Douglas McNaught <doug(at)mcnaught(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-01 20:16:51
Message-ID: 484303B3.10109@mansionfamily.plus.com
Lists: pgsql-advocacy pgsql-hackers

Aidan Van Dyk wrote:
> The whole single-threaded WAL replay problem is going to rear its ugly
> head here too, and mean that a slave *won't* be able to keep up with a
> busy master if it's actually trying to apply all the changes in
> real-time.
Is there a reason to commit at the same points that the master
committed? Wouldn't relaxing that mean that at least you would get
'big' commits and some economy of scale? It might not be too bad.
All I can say is that Sybase warm standby is useful, even though the
rep for an update that changes a hundred rows is a hundred updates
keyed on primary key, which is pretty sucky in terms of T-SQL
performance.


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-01 20:42:11
Message-ID: 1212352931.18365.17.camel@huvostro
Lists: pgsql-advocacy pgsql-hackers

On Fri, 2008-05-30 at 15:16 -0400, Robert Treat wrote:
> On Friday 30 May 2008 01:10:20 Tom Lane wrote:
> > Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> > > I fully accept that it may be the case that it doesn't make technical
> > > sense to tackle them in any order besides sync->read-only slaves because
> > > of dependencies in the implementation between the two.
> >
> > Well, it's certainly not been my intention to suggest that no one should
> > start work on read-only-slaves before we finish the other part. The
> > point is that I expect the log shipping issues will be done first
> > because they're easier, and it would be pointless to not release that
> > feature if we had it.
> >
> > But since you mention it: one of the plausible answers for fixing the
> > vacuum problem for read-only slaves is to have the slaves push an xmin
> > back upstream to the master to prevent premature vacuuming. The current
> > design of pg_standby is utterly incapable of handling that requirement.
> > So there might be an implementation dependency there, depending on how
> > we want to solve that problem.
> >
>
> Sure, but who's to say that after synchronous wal shipping is "finished" it
> won't need a serious re-write due to new needs from the hot standby feature. I
> think going either way carries some risk.

The simplest form of synchronous wal shipping would not even need
postgresql running on slave, just a small daemon which reports when wal
blocks are a) received and b) synced to disk.
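
Such a daemon can be sketched in a few lines. The example below is an illustration only, assuming a hypothetical WalReceiver class that appends incoming WAL blocks to a local file and reports the two events named above; networking is left out so the sketch stays self-contained.

```python
import os
from typing import Tuple


class WalReceiver:
    """Hypothetical stand-in for the small slave-side daemon."""

    def __init__(self, path: str) -> None:
        self._file = open(path, "ab")
        self.received = 0   # bytes accepted from the master
        self.synced = 0     # bytes known to be on stable storage

    def receive(self, block: bytes) -> Tuple[str, int]:
        """Accept one WAL block; report a) 'received' with the byte count so far."""
        self._file.write(block)
        self.received += len(block)
        return ("received", self.received)

    def sync(self) -> Tuple[str, int]:
        """Force the data to disk; report b) 'synced' with the byte count so far."""
        self._file.flush()
        os.fsync(self._file.fileno())
        self.synced = self.received
        return ("synced", self.synced)

    def close(self) -> None:
        self._file.close()
```

The master could then decide, per commit, whether to wait for the "received" report or the "synced" report before returning.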

This setup would just guarantee "no data loss on single machine
failure". From there on you could add various features, including
support for both switchover and failover, async replication to multiple
slaves, etc.

The only case that needs anything additional from the slave's
WAL-receiving daemon is when you want the kind of WAL sync which would
guarantee that a read-only query on the slave, issued after commit
returns from the master, sees the latest data. For this kind of
guarantee you need at least feedback about WAL replay, but possibly
also shared transaction numbers and shared snapshots, to be sure that
OLTP-type queries see the latest data and OLAP queries are not denied
rows that have been VACUUMed on the master.

--------------
Hannu


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Fetter <david(at)fetter(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-01 20:42:29
Message-ID: 1212352949.18365.18.camel@huvostro
Lists: pgsql-advocacy pgsql-hackers

On Thu, 2008-05-29 at 13:37 -0400, Tom Lane wrote:
> David Fetter <david(at)fetter(dot)org> writes:
> > On Thu, May 29, 2008 at 08:46:22AM -0700, Joshua D. Drake wrote:
> >> The only question I have is... what does this give us that PITR
> >> doesn't give us?
>
> > It looks like a wrapper for PITR to me, so the gain would be ease of
> > use.
>
> A couple of points about that:
>
> * Yeah, ease of use is a huge concern here. We're getting beat up
> because people have to go find a separate package (and figure out
> which one they want), install it, learn how to use it, etc. It doesn't
> help that the most mature package is Slony which is, um, not very
> novice-friendly or low-admin-complexity. I personally got religion
> on this about two months ago when Red Hat switched their bugzilla
> from Postgres to MySQL because the admins didn't want to deal with Slony
> any more. People want simple.
>
> * The proposed approach is trying to get to "real" replication
> incrementally. Getting rid of the loss window involved in file-by-file
> log shipping is step one,

Actually we can already do better than file-by-file by using
pg_xlogfile_name_offset() which was added sometime in 2006. SkyTools for
example does this to get no more than a few seconds failure window.

Doing this synchronously would be of course better.

probably we should use the same "modes/protocols" as DRBD when
determining when a "sync" wal write is "done"

quote from 
http://www.slackworks.com/~dkrovich/DRBD/usingdrbdsetup.html#AEN76

Table 1. DRBD Protocols

Protocol  Description
--------  -----------------------------------------------------
A         A write operation is complete as soon as the data is
          written to disk and sent to the network.
B         A write operation is complete as soon as a reception
          acknowledgement arrives.
C         A write operation is complete as soon as a write
          acknowledgement arrives.

There are also additional parameters you can pass to the disk and net
options. See the drbdsetup man page for additional information

/end quote

> and I suspect that step two is going to be
> fixing performance issues in WAL replay to ensure that slaves can keep
> up. After that we'd start thinking about how to let slaves run
> read-only queries. But even without read-only queries, this will be
> a useful improvement for HA/backup scenarios.
>
> regards, tom lane
>


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-01 21:14:39
Message-ID: 1212354879.18365.30.camel@huvostro
Lists: pgsql-advocacy pgsql-hackers

On Thu, 2008-05-29 at 12:05 -0700, Robert Hodges wrote:
> Hi everyone,
>
> First of all, I’m absolutely delighted that the PG community is
> thinking seriously about replication.
>
> Second, having a solid, easy-to-use database availability solution
> that works more or less out of the box would be an enormous benefit to
> customers. Availability is the single biggest problem for customers
> in my experience and as other people have commented the alternatives
> are not nice. It’s an excellent idea to build off an existing feature
> —PITR is already pretty useful and the proposed features are solid
> next steps. The fact that it does not solve all problems is not a
> drawback but means it’s likely to get done in a reasonable timeframe.
>
> Third, you can’t stop with just this feature. (This is the BUT part
> of the post.) The use cases not covered by this feature are actually
> pretty large. Here are a few that concern me:
>
> 1.) Partial replication.
> 2.) WAN replication.

1.) & 2.) are better done async, the domain of Slony-I/Londiste

> 3.) Bi-directional replication. (Yes, this is evil but there are
> problems where it is indispensable.)

Sure, it is also a lot harder and always has several dimensions
(performance/availability/locking) which play against each other

> 4.) Upgrade support. Aside from database upgrade (how would this ever
> really work between versions?), it would not support zero-downtime app
> upgrades, which depend on bi-directional replication tricks.

Or you could use zero-downtime app upgrades, which don't depend on
this :P

> 5.) Heterogeneous replication.
> 6.) Finally, performance scaling using scale-out over large numbers of
> replicas. I think it’s possible to get tunnel vision on this—it’s not
> a big requirement in the PG community because people don’t use PG in
> the first place when they want to do this. They use MySQL, which has
> very good replication for performance scaling, though it’s rather weak
> for availability.

Again, doing scale-out over large number of replicas should either be
async or for sync use some broadcast channel to all slaves (and still be
a performance problem on master, as it has to wait for slowest slave).

> As a consequence, I don’t see how you can get around doing some sort
> of row-based replication like all the other databases.

Isn't WAL-based replication "some sort of row-based replication"?

> Now that people are starting to get religion on this issue I would
> strongly advocate a parallel effort to put in a change-set extraction
> API that would allow construction of comprehensive master/slave
> replication.

Triggers. see pgQ's logtrigga()/logutrigga(). See slides for Marko
Kreen's presentation at pgCon08.

> (Another approach would be to make it possible for third party apps
> to read the logs and regenerate SQL.)

Which logs? WAL or SQL command logs?

> There are existing models for how to do change set extraction; we have
> done it several times at my company already. There are also research
> projects like GORDA that have looked fairly comprehensively at this
> problem.

pgQ with its triggers does a pretty good job of change-set extraction.

------------------
Hannu


From: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-02 06:37:30
Message-ID: C468E33A.8116%robert.hodges@continuent.com
Lists: pgsql-advocacy pgsql-hackers

Hi Hannu,

On 6/1/08 2:14 PM, "Hannu Krosing" <hannu(at)krosing(dot)net> wrote:

>
>> As a consequence, I don't see how you can get around doing some sort
>> of row-based replication like all the other databases.
>
> Isn't WAL-based replication "some sort of row-based replication"?
>
Yes, in theory. However, there's a big difference between replicating
physical WAL records and doing logical replication with SQL statements.
Logical replication requires extra information to reconstruct primary keys.
(Somebody tell me if this is already in the WAL; I'm learning the code as
fast as possible but assuming for now it's not.)

>
>> Now that people are starting to get religion on this issue I would
>> strongly advocate a parallel effort to put in a change-set extraction
>> API that would allow construction of comprehensive master/slave
>> replication.
>
> Triggers. see pgQ's logtrigga()/logutrigga(). See slides for Marko
> Kreen's presentation at pgCon08.
>
>
Thanks very much for the pointer. The slides look interesting.

Robert


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-02 15:52:05
Message-ID: 60ej7fkf6i.fsf@dba2.int.libertyrms.com
Lists: pgsql-advocacy pgsql-hackers

adsmail(at)wars-nicht(dot)de ("Andreas 'ads' Scherbaum") writes:
> On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
>
>> Well, yes, but you do know about archive_timeout, right? No need to wait
>> 2 hours.
>
> Then you ship 16 MB binary stuff every 30 seconds or every minute but
> you only have some kbyte real data in the logfile. This must be taken
> into account, especially if you ship the logfile over the internet
> (means: no high-speed connection, maybe even pay-per-traffic) to the
> slave.

If you have that kind of scenario, then you have painted yourself into
a corner, and there isn't anything that can be done to extract you
from it.

Consider: If you have so much update traffic that it is too much to
replicate via WAL-copying, why should we expect that other mechanisms
*wouldn't* also overflow the connection?

If you haven't got enough network bandwidth to use this feature, then
nobody is requiring that you use it. It seems like a perfectly
reasonable prerequisite to say "this requires that you have enough
bandwidth."
--
(reverse (concatenate 'string "ofni.secnanifxunil" "@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/
"There's nothing worse than having only one drunk head."
-- Zaphod Beeblebrox


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Mathias Brossard <mathias(dot)brossard(at)opentrust(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-02 17:37:59
Message-ID: 1212428279.8183.5.camel@huvostro
Lists: pgsql-advocacy pgsql-hackers

On Thu, 2008-05-29 at 23:37 +0200, Mathias Brossard wrote:

> I pointed out that the NTT solution is synchronous because Tom said in
> the first part of his email that:
>
> > In practice, simple asynchronous single-master-multiple-slave
> > replication covers a respectable fraction of use cases, so we have
> > concluded that we should allow such a feature to be included in the
> > core project.
>
> ... and yet "the most appropriate base technology for this" is
> synchronous and maybe I should have also pointed out in my previous mail
> is that it doesn't support multiple slaves.

I don't think that you need too many slaves in sync mode.

Probably the first slave in sync mode and the others async from there
on will be good enough.

> Also, as other have pointed out there are different interpretations of
> "synchronous" depending on whether the WAL data has reached the other end
> of the network connection, a safe disk checkpoint or the slave DB itself.

Probably all of DRBD's levels ( A) data sent to network, B) data received,
C) data written to disk) should be supported, plus C1) data replayed in
the slave DB, with C1 meaning that it can be done in parallel with C).

Then each DBA can set it up depending on what he trusts: the network,
the slave's power supply or the slave's disk.

Also, the case of slave failure should be addressed. I don't think that
the best solution is halting all ops on master if slave/network fails.

Maybe we should allow also a setup with 2-3 slaves, where operations can
continue when at least 1 slave is "syncing" ?
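
The acknowledgement levels and the "at least 1 slave" rule discussed above can be combined into one small check. This is a sketch under stated assumptions: the Ack flag names and commit_may_return() are hypothetical, and the levels are modelled as independent flags rather than an ordered scale because C1 (replay) is described as running in parallel with C (disk write).

```python
from enum import Flag, auto
from typing import Iterable


class Ack(Flag):
    """Milestones a slave can report for a piece of WAL (hypothetical names)."""
    SENT = auto()       # A:  data sent to the network
    RECEIVED = auto()   # B:  reception acknowledged by the slave
    WRITTEN = auto()    # C:  data written to the slave's disk
    REPLAYED = auto()   # C1: data replayed in the slave DB


def commit_may_return(required: Ack,
                      slave_acks: Iterable[Ack],
                      min_slaves: int = 1) -> bool:
    """True when at least min_slaves slaves have reached every required milestone."""
    satisfied = sum(1 for acks in slave_acks if (acks & required) == required)
    return satisfied >= min_slaves
```

With two or three slaves configured and min_slaves=1, the master keeps accepting commits as long as one slave still satisfies the required level, instead of halting when a single slave or network link fails.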

--------------
Hannu


From: Andreas 'ads' Scherbaum <adsmail(at)wars-nicht(dot)de>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-02 20:40:47
Message-ID: 20080602224047.6dc8a60b@iridium.wars-nicht.de
Lists: pgsql-advocacy pgsql-hackers

On Mon, 02 Jun 2008 11:52:05 -0400 Chris Browne wrote:

> adsmail(at)wars-nicht(dot)de ("Andreas 'ads' Scherbaum") writes:
> > On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
> >
> >> Well, yes, but you do know about archive_timeout, right? No need to wait
> >> 2 hours.
> >
> > Then you ship 16 MB binary stuff every 30 seconds or every minute but
> > you only have some kbyte real data in the logfile. This must be taken
> > into account, especially if you ship the logfile over the internet
> > (means: no high-speed connection, maybe even pay-per-traffic) to the
> > slave.
>
> If you have that kind of scenario, then you have painted yourself into
> a corner, and there isn't anything that can be done to extract you
> from it.

You are misunderstanding something. It's perfectly possible that you
have a low-traffic database with changes every now and then. But you
have to copy a full 16 MB logfile every 30 seconds or every minute just
to have the slave up-to-date.

> Consider: If you have so much update traffic that it is too much to
> replicate via WAL-copying, why should we expect that other mechanisms
> *wouldn't* also overflow the connection?

For some MB of real data you copy several GB of logfiles per day - that's
a lot of overhead, isn't it?
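
The arithmetic behind that claim is easy to check. The sketch below assumes a full 16 MB segment is shipped at every archive_timeout interval (that is, there is at least some activity in each interval); the function name is made up for this illustration.

```python
SEGMENT_MB = 16  # WAL segment size discussed in this thread


def shipped_mb_per_day(archive_timeout_s: int, segment_mb: int = SEGMENT_MB) -> float:
    """MB of WAL files shipped per day if one segment is sent per timeout."""
    switches_per_day = 24 * 60 * 60 / archive_timeout_s
    return switches_per_day * segment_mb


for timeout in (30, 60):
    mb = shipped_mb_per_day(timeout)
    print(f"archive_timeout={timeout}s: {mb:.0f} MB/day ({mb / 1024:.1f} GB/day)")
```

Under this assumption a 30-second timeout ships 46080 MB (45 GB) per day and a 60-second timeout ships 23040 MB (22.5 GB) per day, regardless of how little real data changed.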

> If you haven't got enough network bandwidth to use this feature, then
> nobody is requiring that you use it. It seems like a perfectly
> reasonable prerequisite to say "this requires that you have enough
> bandwidth."

If you have a high-traffic database, then of course you need a different
connection than if you only have a low-traffic or a mostly read-only
database. But that's not the point. Copying an almost unused 16 MB WAL
logfile is just overhead - especially because the logfile is not
very compressible because of all the leftovers from earlier use.

Kind regards

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Andreas 'ads' Scherbaum <adsmail(at)wars-nicht(dot)de>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-03 07:19:59
Message-ID: 1212477599.7267.3.camel@huvostro
Lists: pgsql-advocacy pgsql-hackers

On Mon, 2008-06-02 at 22:40 +0200, Andreas 'ads' Scherbaum wrote:
> On Mon, 02 Jun 2008 11:52:05 -0400 Chris Browne wrote:
>
> > adsmail(at)wars-nicht(dot)de ("Andreas 'ads' Scherbaum") writes:
> > > On Thu, 29 May 2008 23:02:56 -0400 Andrew Dunstan wrote:
> > >
> > >> Well, yes, but you do know about archive_timeout, right? No need to wait
> > >> 2 hours.
> > >
> > > Then you ship 16 MB binary stuff every 30 second or every minute but
> > > you only have some kbyte real data in the logfile. This must be taken
> > > into account, especially if you ship the logfile over the internet
> > > (means: no high-speed connection, maybe even pay-per-traffic) to the
> > > slave.
> >
> > If you have that kind of scenario, then you have painted yourself into
> > a corner, and there isn't anything that can be done to extract you
> > from it.
>
> You are misunderstanding something. It's perfectly possible that you
> have a low-traffic database with changes every now and then. But you
> have to copy a full 16 MB logfile every 30 seconds or every minute just
> to have the slave up-to-date.

To repeat my other post in this thread:

Actually we can already do better than file-by-file by using
pg_xlogfile_name_offset(), which was added sometime in 2006. walmgr.py
from the SkyTools package, for example, does this to get a failure window
of no more than a few seconds, and it copies just the changed part of the
WAL to the slave.

pg_xlogfile_name_offset() was added just for this purpose - to enable
WAL shipping scripts to query where inside the logfile the current write
pointer is.

It is not synchronous, but it can be made very close, to within less than
a second, if you poll it frequently enough.
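
A minimal sketch of the idea (this is not walmgr.py's actual code): the
shipping script remembers how far into the current segment it has already
copied, asks the server for the current write position, and copies only
the bytes in between. The helper name ship_wal_delta and the file layout
are hypothetical, and the query itself is left out so the sketch runs
without a server.

```python
import os


def ship_wal_delta(src_path, dst_path, last_offset, current_offset):
    """Copy bytes [last_offset, current_offset) of a WAL segment to the slave copy.

    Returns the offset to remember for the next poll.
    """
    if current_offset <= last_offset:
        return last_offset  # nothing new written since the last poll
    with open(src_path, "rb") as src:
        src.seek(last_offset)
        chunk = src.read(current_offset - last_offset)
    # Open for update without truncating; create the file on first use.
    mode = "r+b" if os.path.exists(dst_path) else "w+b"
    with open(dst_path, mode) as dst:
        dst.seek(last_offset)
        dst.write(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the slave copy is on disk
    return last_offset + len(chunk)
```

In a real script current_offset would come from something like
SELECT * FROM pg_xlogfile_name_offset(pg_current_xlog_location()),
polled every second or so. Handling the switch to the next segment file
is left out here.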

-------------------
Hannu


From: Andrew Sullivan <ajs(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 03:20:11
Message-ID: 20080604032011.GB40347@commandprompt.com
Lists: pgsql-advocacy pgsql-hackers

On Sun, Jun 01, 2008 at 01:43:22PM -0400, Tom Lane wrote:
> power to him. (Is the replica-hooks-discuss list still working?) But

Yes. And silent as ever. :-)

A

--
Andrew Sullivan
ajs(at)commandprompt(dot)com
+1 503 667 4564 x104
http://www.commandprompt.com/


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Hannu Krosing" <hannu(at)krosing(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 04:08:46
Message-ID: F0238EBA67824444BC1CB4700960CB480576691E@dmpeints002.isotach.com
Lists: pgsql-advocacy pgsql-hackers

Hannu Krosing wrote:
> The simplest form of synchronous wal shipping would not even need
> postgresql running on slave, just a small daemon which
> reports when wal
> blocks are a) received and b) synced to disk.

While that does sound simple, I'd presume that most people would want the guarantee of the same version of postgresql installed wherever the logs are ending up, with the log receiver speaking the same protocol version as the log sender. I imagine that would be most easily achieved through using something like the continuously restoring startup mode of current postgresql.

However variations on this kind of daemon can be used to perform testing, configuring it to work well, go slow, pause, not respond, disconnect, or fail in particular ways, emulating disk full, etc.

Regards,
Stephen Denne.


From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Andrew Sullivan <ajs(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 07:24:20
Message-ID: 48464324.1030307@bluegap.ch
Lists: pgsql-advocacy pgsql-hackers

Hello Andrew,

Andrew Sullivan wrote:
> Yes. And silent as ever. :-)

Are the slides of your PgCon talk available for download somewhere?

BTW: up until recently, there was yet another mailing list:
pgreplication-general(at)gborg(dot)postgresql(dot)org. It was less focused on hooks
and got at least some traffic. :-) Are those mails still archived somewhere?

Regards

Markus


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: "Hannu Krosing" <hannu(at)krosing(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 08:13:52
Message-ID: 48464EC0.5020307@enterprisedb.com
Lists: pgsql-advocacy pgsql-hackers

Stephen Denne wrote:
> Hannu Krosing wrote:
>> The simplest form of synchronous wal shipping would not even need
>> postgresql running on slave, just a small daemon which
>> reports when wal
>> blocks are a) received and b) synced to disk.
>
> While that does sound simple, I'd presume that most people would want the guarantee of the same version of postgresql installed wherever the logs are ending up, with the log receiver speaking the same protocol version as the log sender. I imagine that would be most easily achieved through using something like the continuously restoring startup mode of current postgresql.

Hmm, WAL version compatibility is an interesting question. Most minor
releases haven't changed the WAL format, and it would be nice to allow
running different minor versions in the master and slave in those cases.
But it's certainly not unheard of to change the WAL format. Perhaps we
should introduce a WAL version number, similar to catalog version?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, Hannu Krosing <hannu(at)krosing(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 10:51:26
Message-ID: 484673AE.7040809@sigaev.ru
Lists: pgsql-advocacy pgsql-hackers

>
> Hmm, WAL version compatibility is an interesting question. Most minor
> releases hasn't changed the WAL format, and it would be nice to allow

As I remember, a higher minor version should be able to read all WAL from
lower ones, but that isn't true for the opposite case or between different
major versions.

> running different minor versions in the master and slave in those cases.
> But it's certainly not unheard of to change the WAL format. Perhaps we
> should introduce a WAL version number, similar to catalog version?

Agreed. Right now it only affects warm-standby servers, but introducing
simple log shipping, and replication based on it, will cause a lot of
non-obvious errors/bugs. Is it possible to use the catalog version number
as the WAL version?

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, Hannu Krosing <hannu(at)krosing(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 10:58:53
Message-ID: 1212577133.8543.7.camel@PCD12478
Lists: pgsql-advocacy pgsql-hackers

On Wed, 2008-06-04 at 11:13 +0300, Heikki Linnakangas wrote:
> Hmm, WAL version compatibility is an interesting question. Most minor
> releases hasn't changed the WAL format, and it would be nice to allow
> running different minor versions in the master and slave in those cases.
> But it's certainly not unheard of to change the WAL format. Perhaps we
> should introduce a WAL version number, similar to catalog version?

Would that also cover possible differences in page size, 32-bit vs.
64-bit OS, different timestamp flavour, etc.? AFAIR, all these
things can influence how the data is written and possibly
make the WAL incompatible with other postgres instances, even ones of
the exact same version...

Cheers,
Csaba.


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Csaba Nagy" <nagy(at)ecircle-ag(dot)com>
Cc: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Hannu Krosing" <hannu(at)krosing(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 11:17:55
Message-ID: 484679E3.2040202@enterprisedb.com
Lists: pgsql-advocacy pgsql-hackers

Csaba Nagy wrote:
> On Wed, 2008-06-04 at 11:13 +0300, Heikki Linnakangas wrote:
>> Hmm, WAL version compatibility is an interesting question. Most minor
>> releases hasn't changed the WAL format, and it would be nice to allow
>> running different minor versions in the master and slave in those cases.
>> But it's certainly not unheard of to change the WAL format. Perhaps we
>> should introduce a WAL version number, similar to catalog version?
>
> Would that also cover possible differences in page size, 32bit OS vs.
> 64bit OS, different timestamp flavour, etc. issues ? AFAIR, all these
> things can have an influence on how the data is written and possibly
> make the WAL incompatible with other postgres instances, even if the
> exact same version...

These are already covered by the information in pg_control.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Teodor Sigaev" <teodor(at)sigaev(dot)ru>
Cc: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Hannu Krosing" <hannu(at)krosing(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 11:19:42
Message-ID: 48467A4E.5080101@enterprisedb.com
Lists: pgsql-advocacy pgsql-hackers

Teodor Sigaev wrote:
> Is it possible to use catalog version number as WAL version?

No, because we don't change the catalog version number in minor
releases, even though we might change WAL format.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andrew Sullivan <ajs(at)commandprompt(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 14:36:23
Message-ID: 20080604143622.GD47732@commandprompt.com
Lists: pgsql-advocacy pgsql-hackers

On Wed, Jun 04, 2008 at 09:24:20AM +0200, Markus Schiltknecht wrote:
>
> Are the slides of your PgCon talk available for download somewhere?

There weren't any slides, really (there were 4 that I put up in case
the cases I was discussing needed back-references, but they didn't).
Joshua tells me that I'm supposed to make the paper readable and put
it up on Command Prompt's website, so I will soon.

> BTW: up until recently, there was yet another mailing list:
> pgreplication-general(at)gborg(dot)postgresql(dot)org. It was less focused on hooks
> and got at least some traffic. :-) Are those mails still archived
> somewhere?

Unless whoever was operating that list moved it to pgfoundry, I doubt
it (except on backups somewhere).

A

--
Andrew Sullivan
ajs(at)commandprompt(dot)com
+1 503 667 4564 x104
http://www.commandprompt.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Hannu Krosing" <hannu(at)krosing(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 14:40:58
Message-ID: 24477.1212590458@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> Hmm, WAL version compatibility is an interesting question. Most minor
> releases hasn't changed the WAL format, and it would be nice to allow
> running different minor versions in the master and slave in those cases.
> But it's certainly not unheard of to change the WAL format. Perhaps we
> should introduce a WAL version number, similar to catalog version?

Yeah, perhaps. In the past we've changed the WAL page ID field for
this; I'm not sure if that's enough or not. It does seem like a good
idea to have a way to check that the slaves aren't trying to read a
WAL version they don't understand. Also, it's possible that the WAL
format doesn't change across a major update, but you still couldn't
work with say an 8.4 master and an 8.3 slave, so maybe we need the
catalog version ID in there too.

regards, tom lane


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 15:27:04
Message-ID: 1212593224.7526.1.camel@huvostro
Lists: pgsql-advocacy pgsql-hackers

On Wed, 2008-06-04 at 10:40 -0400, Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> > Hmm, WAL version compatibility is an interesting question. Most minor
> > releases hasn't changed the WAL format, and it would be nice to allow
> > running different minor versions in the master and slave in those cases.
> > But it's certainly not unheard of to change the WAL format. Perhaps we
> > should introduce a WAL version number, similar to catalog version?
>
> Yeah, perhaps. In the past we've changed the WAL page ID field for
> this; I'm not sure if that's enough or not. It does seem like a good
> idea to have a way to check that the slaves aren't trying to read a
> WAL version they don't understand. Also, it's possible that the WAL
> format doesn't change across a major update, but you still couldn't
> work with say an 8.4 master and an 8.3 slave, so maybe we need the
> catalog version ID in there too.

And something dependent on datetime being integer.

We probably won't need to encode the presence of user-defined types,
like PostGIS?

-----
Hannu


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 15:37:25
Message-ID: 25198.1212593845@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Hannu Krosing <hannu(at)krosing(dot)net> writes:
> On Wed, 2008-06-04 at 10:40 -0400, Tom Lane wrote:
>> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>>> Hmm, WAL version compatibility is an interesting question. Most minor
>>> releases hasn't changed the WAL format, and it would be nice to allow
>>> running different minor versions in the master and slave in those cases.
>>> But it's certainly not unheard of to change the WAL format. Perhaps we
>>> should introduce a WAL version number, similar to catalog version?
>>
>> Yeah, perhaps. In the past we've changed the WAL page ID field for
>> this; I'm not sure if that's enough or not. It does seem like a good
>> idea to have a way to check that the slaves aren't trying to read a
>> WAL version they don't understand. Also, it's possible that the WAL
>> format doesn't change across a major update, but you still couldn't
>> work with say an 8.4 master and an 8.3 slave, so maybe we need the
>> catalog version ID in there too.

> And something dependent on datetime being integer.

This thread is getting out of hand, actually.

Heikki's earlier comment about pg_control reminded me that we already
have a unique system identifier stored in pg_control and check that
against WAL headers. So I think we already have enough certainty that
the master and slaves have the same pg_control and hence are the same
for everything checked by pg_control.

However, since by definition pg_control doesn't change in a minor
upgrade, there isn't any easy way to enforce a rule like "slaves must be
same or newer minor version as the master". I'm not sure that we
actually *want* to enforce such a rule, though. Most of the time, the
other way around would work fine.
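
As a rough sketch of what such a policy could look like (all names here
are hypothetical, and this is not how the server actually checks things):
refuse a slave whose system identifier or major version differs from the
master's, but only note a minor-version mismatch instead of rejecting it.

```python
from dataclasses import dataclass


@dataclass
class ServerInfo:
    system_identifier: int  # unique ID stored in pg_control
    version: tuple          # e.g. (8, 3, 1); the first two parts are the major version


def check_pair(master, slave):
    """Return (ok, note) for a hypothetical master/slave compatibility check."""
    if master.system_identifier != slave.system_identifier:
        return False, "different system identifier"
    if master.version[:2] != slave.version[:2]:
        return False, "different major version"
    if slave.version < master.version:
        # Allowed, but worth flagging: usually fine, not guaranteed.
        return True, "slave runs an older minor version"
    return True, "ok"
```

For example, check_pair(ServerInfo(42, (8, 3, 3)), ServerInfo(42, (8, 3, 1)))
is accepted with a note, while an 8.4 master with an 8.3 slave is refused.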

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 16:16:40
Message-ID: 1212596200.4148.123.camel@ebony.site
Lists: pgsql-advocacy pgsql-hackers


On Wed, 2008-06-04 at 11:37 -0400, Tom Lane wrote:

> This thread is getting out of hand, actually.

Agreed. We should start new threads for specific things. Please.

> However, since by definition pg_control doesn't change in a minor
> upgrade, there isn't any easy way to enforce a rule like "slaves must be
> same or newer minor version as the master". I'm not sure that we
> actually *want* to enforce such a rule, though.

Definitely don't want to prevent minor version mismatches. We want to be
able to upgrade a standby, have it catch up with the master, then
switch over to the new version. Otherwise we'd have to take the whole
replicated system down to do minor upgrades/backouts. Ugh!

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, Hannu Krosing <hannu(at)krosing(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 17:18:17
Message-ID: 1212599897.3776.3.camel@dogma.ljc.laika.com
Lists: pgsql-advocacy pgsql-hackers

On Wed, 2008-06-04 at 14:17 +0300, Heikki Linnakangas wrote:
> > Would that also cover possible differences in page size, 32bit OS vs.
> > 64bit OS, different timestamp flavour, etc. issues ? AFAIR, all these
> > things can have an influence on how the data is written and possibly
> > make the WAL incompatible with other postgres instances, even if the
> > exact same version...
>
> These are already covered by the information in pg_control.

Another thing that can change between systems is the collation behavior,
which can corrupt indexes (and other bad things).

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, Hannu Krosing <hannu(at)krosing(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 18:23:38
Message-ID: 27600.1212603818@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Wed, 2008-06-04 at 14:17 +0300, Heikki Linnakangas wrote:
>> These are already covered by the information in pg_control.

> Another thing that can change between systems is the collation behavior,
> which can corrupt indexes (and other bad things).

That is covered by pg_control, at least to the extent of forcing the
same value of LC_COLLATE.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Stephen Denne <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, Hannu Krosing <hannu(at)krosing(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-04 18:26:02
Message-ID: 1212603962.3776.5.camel@dogma.ljc.laika.com
Lists: pgsql-advocacy pgsql-hackers

On Wed, 2008-06-04 at 14:23 -0400, Tom Lane wrote:
> That is covered by pg_control, at least to the extent of forcing the
> same value of LC_COLLATE.

But the same LC_COLLATE means different things on different systems.
Even "en_US" means something different on Mac versus Linux.

Regards,
Jeff Davis


From: "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Hannu Krosing" <hannu(at)krosing(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-05 00:10:26
Message-ID: a778a7260806041710v2c0c44abld1d14bbcf1edb11d@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

Well, the WAL format doesn't only depend on WAL itself, but also depends
on each resource manager. If we introduce WAL format version
identification, ISTM that we have to take care of matching the
resource managers in the master and the slave as well.

2008/6/4 Heikki Linnakangas <heikki(at)enterprisedb(dot)com>:
> Stephen Denne wrote:
>>
>> Hannu Krosing wrote:
>>>
>>> The simplest form of synchronous wal shipping would not even need
>>> postgresql running on slave, just a small daemon which reports when wal
>>> blocks are a) received and b) synced to disk.
>>
>> While that does sound simple, I'd presume that most people would want the
>> guarantee of the same version of postgresql installed wherever the logs are
>> ending up, with the log receiver speaking the same protocol version as the
>> log sender. I imagine that would be most easily achieved through using
>> something like the continuously restoring startup mode of current
>> postgresql.
>
> Hmm, WAL version compatibility is an interesting question. Most minor
> releases hasn't changed the WAL format, and it would be nice to allow
> running different minor versions in the master and slave in those cases. But
> it's certainly not unheard of to change the WAL format. Perhaps we should
> introduce a WAL version number, similar to catalog version?
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>

--
------
Koichi Suzuki


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Hannu Krosing" <hannu(at)krosing(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-05 00:30:34
Message-ID: 7565.1212625834@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

"Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com> writes:
> Well, WAL format doesn't only depend on WAL itself, but also depend on
> each resource manager. If we introduce WAL format version
> identification, ISTM that we have to take care of the matching of
> resource manager in the master and the slave as well.

That seems a bit overdesigned. What are the prospects that two builds
of the same Postgres version are going to have different sets of
resource managers in them?

regards, tom lane


From: "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Hannu Krosing" <hannu(at)krosing(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-05 04:21:25
Message-ID: a778a7260806042121n4db0d490wa329d476419729a7@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

If the versions of the master and the slave differ and we'd still
like to allow log-shipping replication, we need a negotiation to
determine whether the WAL formats of the two are compatible. I hope it
is not in our scope and I'm worrying too much.

2008/6/5 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com> writes:
>> Well, WAL format doesn't only depend on WAL itself, but also depend on
>> each resource manager. If we introduce WAL format version
>> identification, ISTM that we have to take care of the matching of
>> resource manager in the master and the slave as well.
>
> That seems a bit overdesigned. What are the prospects that two builds
> of the same Postgres version are going to have different sets of
> resource managers in them?
>
> regards, tom lane
>

--
------
Koichi Suzuki


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Csaba Nagy" <nagy(at)ecircle-ag(dot)com>, "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, "Hannu Krosing" <hannu(at)krosing(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-05 11:08:14
Message-ID: 87prqwkuld.fsf@oxford.xeocode.com
Lists: pgsql-advocacy pgsql-hackers

"Jeff Davis" <pgsql(at)j-davis(dot)com> writes:

> On Wed, 2008-06-04 at 14:17 +0300, Heikki Linnakangas wrote:
>> > Would that also cover possible differences in page size, 32bit OS vs.
>> > 64bit OS, different timestamp flavour, etc. issues ? AFAIR, all these
>> > things can have an influence on how the data is written and possibly
>> > make the WAL incompatible with other postgres instances, even if the
>> > exact same version...
>>
>> These are already covered by the information in pg_control.
>
> Another thing that can change between systems is the collation behavior,
> which can corrupt indexes (and other bad things).

Well, yes and no. It's entirely possible, for example, for a minor release of
an OS to tweak the collation rules for a collation without changing the name.
For the sake of argument they might just be fixing a bug in the collation
rules. From the point of view of the OS that's a minor bug fix that they might
not foresee causing data corruption problems.

Pegging pg_control to a particular release of the OS would be pretty terrible
though. I don't really see an out for this. But it's another roadblock to
consider akin to "not-really-immutable index expressions" for any proposal
which depends on re-finding index pointers :(

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-10 01:25:17
Message-ID: 200806100125.m5A1PHP08818@momjian.us
Lists: pgsql-advocacy pgsql-hackers

Gurjeet Singh wrote:
> On Fri, May 30, 2008 at 10:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > But since you mention it: one of the plausible answers for fixing the
> > vacuum problem for read-only slaves is to have the slaves push an xmin
> > back upstream to the master to prevent premature vacuuming. The current
> > design of pg_standby is utterly incapable of handling that requirement.
> > So there might be an implementation dependency there, depending on how
> > we want to solve that problem.
> >
>
> I think it would be best to not make the slave interfere with the master's
> operations; that's only going to increase the operational complexity of such
> a solution.
>
> There could be multiple slaves following a master, some serving

For the slave to not interfere with the master at all, we would need to
delay application of WAL files on each slave until visibility on that
slave allows the WAL to be applied, but in that case we would have
long-running transactions delay data visibility of all slave sessions.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-10 01:46:06
Message-ID: 200806100146.m5A1k6125570@momjian.us
Lists: pgsql-advocacy pgsql-hackers

Andreas 'ads' Scherbaum wrote:
> On Fri, 30 May 2008 16:22:41 -0400 (EDT) Greg Smith wrote:
>
> > On Fri, 30 May 2008, Andreas 'ads' Scherbaum wrote:
> >
> > > Then you ship 16 MB binary stuff every 30 second or every minute but
> > > you only have some kbyte real data in the logfile.
> >
> > Not if you use pg_clearxlogtail (
> > http://www.2ndquadrant.com/replication.htm ), which got lost in the giant
> > March commitfest queue but should probably wander into contrib as part of
> > 8.4.
>
> Yes, this topic was discussed several times in the past but to
> solve this it needs a patch/solution which is integrated into PG
> itself, not contrib.

Agreed. I realize why we are not zeroing those bytes (for performance),
but can't we have the archiver zero those bytes before calling the
'archive_command'?
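
A small sketch of why zeroing the tail would help, under stated
assumptions: random bytes stand in for the stale leftovers of a recycled
segment (a worst case for compression), only the first 64 kB hold new
records, and zero_tail is a hypothetical helper, not an existing archiver
function.

```python
import random
import zlib

SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size


def zero_tail(segment, used_bytes):
    """Return a copy of the segment with everything after used_bytes zeroed."""
    return segment[:used_bytes] + bytes(len(segment) - used_bytes)


# Simulate a recycled segment: 64 kB of new records, the rest stale leftovers.
rng = random.Random(0)
segment = rng.getrandbits(8 * SEGMENT_SIZE).to_bytes(SEGMENT_SIZE, "little")
used = 64 * 1024

before = len(zlib.compress(segment))
after = len(zlib.compress(zero_tail(segment, used)))
print(f"compressed size: {before} bytes as-is, {after} bytes with zeroed tail")
```

The zeroed segment is still 16 MB on disk, so the gain only shows up once
the archive_command compresses the file before shipping it.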

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-10 01:49:52
Message-ID: 20080610014952.GF10034@alvh.no-ip.org
Lists: pgsql-advocacy pgsql-hackers

Bruce Momjian wrote:

> Agreed. I realize why we are not zeroing those bytes (for performance),
> but can't we have the archiver zero those bytes before calling the
> 'archive_command'?

Perhaps make the zeroing user-settable.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, David Fetter <david(at)fetter(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-10 01:50:58
Message-ID: 5164.1213062658@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Gurjeet Singh wrote:
>> There could be multiple slaves following a master, some serving

> For the slave to not interfere with the master at all, we would need to
> delay application of WAL files on each slave until visibility on that
> slave allows the WAL to be applied, but in that case we would have
> long-running transactions delay data visibility of all slave sessions.

Right, but you could segregate out long-running queries to one slave
server that could be further behind than the others.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-10 01:55:31
Message-ID: 5245.1213062931@sss.pgh.pa.us
Lists: pgsql-advocacy pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Agreed. I realize why we are not zeroing those bytes (for performance),
> but can't we have the archiver zero those bytes before calling the
> 'archive_command'?

The archiver doesn't know any more about where the end-of-data is than
the archive_command does. Moreover, the archiver doesn't know whether
the archive_command cares. I think the separate module is a fine
solution.

It should also be pointed out that the whole thing becomes uninteresting
if we get real-time log shipping implemented. So I see absolutely no
point in spending time integrating pg_clearxlogtail now.

regards, tom lane


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-10 02:48:33
Message-ID: Pine.GSO.4.64.0806092243080.11286@westnet.com
Lists: pgsql-advocacy pgsql-hackers

On Mon, 9 Jun 2008, Tom Lane wrote:

> It should also be pointed out that the whole thing becomes uninteresting
> if we get real-time log shipping implemented. So I see absolutely no
> point in spending time integrating pg_clearxlogtail now.

There are remote replication scenarios over a WAN (mainly aimed at
disaster recovery) that want to keep a fairly updated database without
putting too much traffic over the link. People in that category really
want zeroed tail+compressed archives, but probably not the extra overhead
that comes with shipping smaller packets in a real-time implementation.
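
To illustrate why a zeroed tail matters for compressed archives, here is a minimal Python sketch. It is not pg_clearxlogtail itself: the `clear_tail` helper, the small demo segment size, and the assumption that the end-of-data offset is already known are all illustrative.

```python
import os
import zlib


def clear_tail(segment: bytes, end_of_data: int) -> bytes:
    """Return a same-sized copy of `segment` with bytes past `end_of_data` zeroed."""
    if not 0 <= end_of_data <= len(segment):
        raise ValueError("end_of_data must lie inside the segment")
    return segment[:end_of_data] + bytes(len(segment) - end_of_data)


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression at the default level."""
    return len(zlib.compress(data))


def demo(segment_size: int = 1024 * 1024, used: int = 64 * 1024):
    # Random bytes stand in for a recycled segment: `used` bytes of fresh
    # records followed by stale leftovers that compress poorly.
    # (A real segment is 16 MB by default; 1 MB keeps the demo quick.)
    segment = os.urandom(segment_size)
    cleared = clear_tail(segment, used)
    return compressed_size(segment), compressed_size(cleared)


if __name__ == "__main__":
    raw, cleared = demo()
    print(f"compressed as-is: {raw} bytes, with zeroed tail: {cleared} bytes")
```

The segment keeps its full length, so anything that expects fixed-size files still works; only the compressed size shrinks.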

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-10 03:16:11
Message-ID: a778a7260806092016s53a26a4co8902eee56dfc329b@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

Just for information.

As for archive compression, I have an archive log compression tool,
which can be found at http://pgfoundry.org/projects/pglesslog/

This feature is also included in NTT's synchronized log shipping
replication, presented at the last PGCon.

2008/6/10 Greg Smith <gsmith(at)gregsmith(dot)com>:
> On Mon, 9 Jun 2008, Tom Lane wrote:
>
>> It should also be pointed out that the whole thing becomes uninteresting
>> if we get real-time log shipping implemented. So I see absolutely no
>> point in spending time integrating pg_clearxlogtail now.
>
> There are remote replication scenarios over a WAN (mainly aimed at disaster
> recovery) that want to keep a fairly updated database without putting too
> much traffic over the link. People in that category really want zeroed
> tail+compressed archives, but probably not the extra overhead that comes
> with shipping smaller packets in a real-time implementation.
>
> --
> * Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD

--
------
Koichi Suzuki


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-10 09:43:51
Message-ID: 87y75dmxpk.fsf@oxford.xeocode.com
Lists: pgsql-advocacy pgsql-hackers

"Greg Smith" <gsmith(at)gregsmith(dot)com> writes:

> On Mon, 9 Jun 2008, Tom Lane wrote:
>
>> It should also be pointed out that the whole thing becomes uninteresting
>> if we get real-time log shipping implemented. So I see absolutely no
>> point in spending time integrating pg_clearxlogtail now.
>
> There are remote replication scenarios over a WAN (mainly aimed at disaster
> recovery) that want to keep a fairly updated database without putting too much
> traffic over the link. People in that category really want zeroed
> tail+compressed archives, but probably not the extra overhead that comes with
> shipping smaller packets in a real-time implementation.

Instead of zeroing bytes and depending on compression why not just pass an
extra parameter to the archive command with the offset to the logical end of
data. The archive_command could just copy from the start to that point and not
bother transferring the rest.
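
A minimal sketch of that idea in Python, assuming the end-of-data offset were somehow available to the archive command (the thread notes that the archiver does not currently know it). The function names, the `.part` file name and the restore-side zero padding are hypothetical, not part of any existing tool.

```python
import os
import tempfile

CHUNK = 1024 * 1024  # copy in 1 MB pieces


def archive_prefix(src_path, dst_path, end_offset):
    """Copy only the first `end_offset` bytes of a segment; return bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while copied < end_offset:
            chunk = src.read(min(CHUNK, end_offset - copied))
            if not chunk:  # source was shorter than expected
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


def restore_segment(src_path, dst_path, segment_size):
    """Rebuild a full-sized segment by padding the archived prefix with zeros."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
        if copied > segment_size:
            raise ValueError("archived prefix is larger than a segment")
        dst.write(bytes(segment_size - copied))


def roundtrip_demo(segment_size=4096, end_offset=600):
    """Archive a small fake segment, restore it, and return what was produced."""
    with tempfile.TemporaryDirectory() as tmp:
        seg = os.path.join(tmp, "segment")
        arch = os.path.join(tmp, "segment.part")
        rest = os.path.join(tmp, "segment.restored")
        data = os.urandom(end_offset)
        with open(seg, "wb") as f:
            f.write(data + b"\xff" * (segment_size - end_offset))  # stale tail
        archive_prefix(seg, arch, end_offset)
        archived_size = os.path.getsize(arch)
        restore_segment(arch, rest, segment_size)
        with open(rest, "rb") as f:
            restored = f.read()
    return archived_size, restored, data
```

Compared with zeroing plus compression, this moves the padding step to the restore side and needs no compressor at all, at the cost of a restore path that must know the segment size.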

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: "Greg Smith" <gsmith(at)gregsmith(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-10 10:03:08
Message-ID: 484E515C.5060309@enterprisedb.com
Lists: pgsql-advocacy pgsql-hackers

Gregory Stark wrote:
> Instead of zeroing bytes and depending on compression why not just pass an
> extra parameter to the archive command with the offset to the logical end of
> data.

Because the archiver process doesn't have that information.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, David Fetter <david(at)fetter(dot)org>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-10 16:01:05
Message-ID: 200806100901.06460.josh@agliodbs.com
Lists: pgsql-advocacy pgsql-hackers

All,

> > For the slave to not interfere with the master at all, we would need to
> > delay application of WAL files on each slave until visibility on that
> > slave allows the WAL to be applied, but in that case we would have
> > long-running transactions delay data visibility of all slave sessions.
>
> Right, but you could segregate out long-running queries to one slave
> server that could be further behind than the others.

I still see having 2 different settings:

Synchronous: XID visibility is pushed to the master. Maintains synchronous
failover, and users are expected to run *1* master to *1* slave for most
installations.

Asynchronous: replication stops on the slave whenever minxid gets out of
synch. Could have multiple slaves, but noticeable lag between master and
slave.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-11 01:11:07
Message-ID: 20080611095156.81B0.52131E4D@oss.ntt.co.jp
Lists: pgsql-advocacy pgsql-hackers


Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> I still see having 2 different settings:
>
> Synchronous: XID visibility is pushed to the master. Maintains synchronous
> failover, and users are expected to run *1* master to *1* slave for most
> installations.
>
> Asynchronous: replication stops on the slave whenever minxid gets out of
> synch. Could have multiple slaves, but noticeable lag between master and
> slave.

I agree that we should have a sync/async option in log shipping.
We could also have another setting: synchronous shipping with
asynchronous flushing. We would not lose transactions unless both servers
go down at once, and we could avoid waiting for WAL to be flushed to the
primary's disk.

As for multiple slaves, we could have a cascading configuration, in which
the WAL receiver also delivers WAL records to other servers.
I think the simplest approach is for the Postgres core to provide only
one-to-one replication, with multiple slaves supported by third-party
WAL receivers.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>,<pgsql-hackers(at)postgresql(dot)org>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-06-16 16:56:51
Message-ID: 485654B7.EE98.0025.0@wicourts.gov
Lists: pgsql-advocacy pgsql-hackers

>>> On Mon, Jun 9, 2008 at 9:48 PM, in message
<Pine(dot)GSO(dot)4(dot)64(dot)0806092243080(dot)11286(at)westnet(dot)com>, Greg Smith
<gsmith(at)gregsmith(dot)com> wrote:
> On Mon, 9 Jun 2008, Tom Lane wrote:
>
>> It should also be pointed out that the whole thing becomes uninteresting
>> if we get real-time log shipping implemented. So I see absolutely no
>> point in spending time integrating pg_clearxlogtail now.
>
> There are remote replication scenarios over a WAN (mainly aimed at
> disaster recovery) that want to keep a fairly updated database without
> putting too much traffic over the link. People in that category really
> want zeroed tail+compressed archives, but probably not the extra overhead
> that comes with shipping smaller packets in a real-time implementation.

We ship the WAL files over a (relatively) slow WAN for disaster
recovery purposes, and we would be fine with replacing our current
techniques with real-time log shipping as long as:

(1) We can do it asynchronously. (i.e., we don't have to wait for
WAN latency to commit transactions.)

(2) It can ship to multiple targets. (Management dictates that we
have backups at the site of origin as well as our central site. A
failure to replicate to one must not delay the other.)

(3) It doesn't consume substantially more WAN bandwidth overall.

A solution which fails to cover any of these leaves pg_clearxlogtail
interesting to us.
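
Requirements (1) and (2) can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the `AsyncShipper` class, the per-target `send` callables and the use of `OSError` to signal a failed link are hypothetical, and requirement (3), bandwidth, is not addressed here.

```python
from collections import deque


class AsyncShipper:
    """Queue WAL chunks per target so one failed target cannot delay another."""

    def __init__(self, targets):
        # targets: mapping of name -> callable(chunk) that raises OSError on failure
        self.targets = dict(targets)
        self.pending = {name: deque() for name in self.targets}

    def enqueue(self, chunk):
        """Called at commit time; only appends to in-memory queues and returns."""
        for queue in self.pending.values():
            queue.append(chunk)

    def flush(self):
        """Drain every queue independently; return names of targets still behind."""
        behind = []
        for name, send in self.targets.items():
            queue = self.pending[name]
            try:
                while queue:
                    send(queue[0])
                    queue.popleft()  # drop a chunk only after a successful send
            except OSError:
                behind.append(name)
        return behind
```

In a real implementation `flush` would run in a background process rather than being called by hand; the point of the sketch is only that the commit path never waits on the WAN and that each target keeps its own backlog.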

-Kevin


From: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Follow-up on replication hooks for PostgreSQL
Date: 2008-07-10 00:55:25
Message-ID: C49AAC0D.C43F%robert.hodges@continuent.com
Lists: pgsql-advocacy pgsql-hackers

Hi everyone,

This is a quick update on a promise I made early in June to suggest requirements as well as ways to add replication hooks that would support logical replication, as opposed to the physical replication work currently underway based on NTT's code.

Well, June was a pretty busy month, so it has taken a while to get back to this. However, we are now beginning to examine options for PostgreSQL logical replication. To make a long story short we are willing to commit resources to this problem or fund other people to do it for us. If you are interested please contact me directly. Meanwhile, we are quite serious about this problem and intend to work on helpful additions to PostgreSQL in this area. I will post more as we make progress.

Thanks, Robert

--
Robert Hodges, CTO, Continuent, Inc.
Email: robert(dot)hodges(at)continuent(dot)com

P.s., Happy 12th birthday everyone!


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Robert Hodges" <robert(dot)hodges(at)continuent(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Follow-up on replication hooks for PostgreSQL
Date: 2008-07-10 11:56:31
Message-ID: e51f66da0807100456s5b7b1062o2e7eab098d97cd8d@mail.gmail.com
Lists: pgsql-advocacy pgsql-hackers

On 7/10/08, Robert Hodges <robert(dot)hodges(at)continuent(dot)com> wrote:
> This is a quick update on a promise I made early in June to suggest
> requirements as well as ways to add replication hooks that would support
> logical replication, as opposed to the physical replication work currently
> underway based on NTT's code.
>
> Well, June was a pretty busy month, so it has taken a while to get back to
> this. However, we are now beginning to examine options for PostgreSQL
> logical replication. To make a long story short we are willing to commit
> resources to this problem or fund other people to do it for us. If you are
> interested please contact me directly. Meanwhile, we are quite serious
> about this problem and intend to work on helpful additions to PostgreSQL in
> this area. I will post more as we make progress.

Well, I'm not exactly sure what you are planning. It's OK to do the
draft design privately, but before actually starting to code, the
design should be discussed on -hackers.

And I'm not exactly sure what you mean by "logical replication".
A way to log DDL statements? Do you want to log DML also?

FWIW, here's a very rough draft design for functionality that could be
used to make current Slony-I/Londiste-like solutions replicate DDL also.

1. A CREATE DDL TRIGGER statement that allows calling a function
for all DDL statements.

The only filtering that makes sense here is filtering by area:
tables/functions/views/etc.

It must be possible to do an AFTER trigger. Whether a BEFORE
trigger for DDL makes sense or not, I'm not sure.

2. When the function is called, the following information is given:

- Object type the event was for (table/view/function)
- Array of object names.
- SQL statement as text.

The trigger function can filter further, based on object names,
whether it wants to log the event or not.

Trying to make the trigger run on only a subset of events is complex,
and parsing the SQL into pieces so the trigger can understand it better
is also complex, and neither is needed. Unless there is some common
situation where such a simple design fails to work, I would not make
the scheme more complex.

Also the design should be based on assumption that the target side
is exactly in sync. Eg. DROP CASCADE should be replicated as DROP CASCADE.
We should not make scheme more complex to survive cases where target
is not in sync. That way madness lies. The effect should be like
same SQL statements are applied to target by hand, no more, no less.
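
The shape of this draft design can be sketched in Python. This models only the proposal as described: CREATE DDL TRIGGER is a suggested statement here, not an existing one, and `DDLEvent`, `DDLTriggerRegistry` and `make_logging_trigger` are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DDLEvent:
    object_type: str     # area: "table", "view", "function", ...
    object_names: tuple  # names of the objects the statement touched
    sql: str             # statement text, to be replayed verbatim on the target


class DDLTriggerRegistry:
    """Stands in for the server side of the proposed CREATE DDL TRIGGER."""

    def __init__(self):
        self._triggers = []

    def create_ddl_trigger(self, areas, func):
        # The only server-side filtering is by area, as in point 1.
        self._triggers.append((frozenset(areas), func))

    def fire_after(self, event):
        # AFTER trigger: called once the DDL statement has completed.
        for areas, func in self._triggers:
            if event.object_type in areas:
                func(event)


def make_logging_trigger(log, replicated_names):
    """Build a trigger function that filters further by object name (point 2)."""
    def trigger(event):
        if any(name in replicated_names for name in event.object_names):
            log.append(event.sql)  # logged as-is, e.g. DROP ... CASCADE stays CASCADE
    return trigger
```

The log holds plain statement text, so replaying it on an in-sync target has the same effect as typing the same SQL there by hand.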

--
marko


From: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Follow-up on replication hooks for PostgreSQL
Date: 2008-07-10 14:44:20
Message-ID: C49B6E54.C4AD%robert.hodges@continuent.com
Lists: pgsql-advocacy pgsql-hackers

Hi Marko,

No fear, we definitely will discuss on pgsql-hackers. I just wanted to make sure that people understood we are still committed to solving this problem and will one way or another commit resources to help.

Just to be clear, by logical replication I mean replication based on sending SQL or near-SQL (e.g., generic DML events) between servers. Physical replication on the other hand uses internal formats to replicate changes without intervening conversion to SQL, for example by shipping WAL records. There are advantages to each for different applications. BTW, I heard this nomenclature from Simon Riggs. It seems quite helpful.

The DDL trigger proposal is interesting and would be a very useful feature addition to PostgreSQL. To execute correctly it may also be necessary to know which database you were using at the time the SQL was issued.

For our part we are looking for ways to replicate most or all data on a server as efficiently as possible. Generic call-outs at commit time or reading the log directly are attractive approaches. Depending on the implementation you can avoid double writes of replicated data on the master host. Also, it avoids the management headache of ensuring that triggers are correctly installed. It seems as if one of these generic approaches could hook into WAL record transport.

Cheers, Robert

--
Robert Hodges, CTO, Continuent, Inc.
Email: robert(dot)hodges(at)continuent(dot)com
Mobile: +1-510-501-3728 Skype: hodgesrm


From: chris <cbbrowne(at)ca(dot)afilias(dot)info>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Follow-up on replication hooks for PostgreSQL
Date: 2008-07-11 21:03:14
Message-ID: 87zloo16ct.fsf@dba2.int.libertyrms.com
Lists: pgsql-advocacy pgsql-hackers

markokr(at)gmail(dot)com ("Marko Kreen") writes:
> Also the design should be based on assumption that the target side
> is exactly in sync. Eg. DROP CASCADE should be replicated as DROP CASCADE.
> We should not make scheme more complex to survive cases where target
> is not in sync. That way madness lies. The effect should be like
> same SQL statements are applied to target by hand, no more, no less.

We have, already, in 8.4, a handling of triggers for TRUNCATE; the
reason why support hasn't made it into Slony-I yet relates quite
exactly to this...

The trouble comes in if you do TRUNCATE CASCADE; I'm not quite sure
how to collect together the multiple recordings of the trigger
functions that would be collected as a result; for it all to work,
safely, on the remote node, we'd need to apply all of those truncates
at once.

Note also that there is an issue with coordination of schemas; Slony-I
shuts off the RI triggers on subscribers, so that the target is fairly
certain to not be *entirely* in sync, by express intent.

Those are legitimate differences between source and target.
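
For the TRUNCATE CASCADE case above, one possible way to collect the separate trigger recordings is to group them by transaction and emit a single multi-table TRUNCATE per transaction. The sketch below assumes a hypothetical event log of `(txid, table_name)` pairs with names already safely quoted; it is not how Slony-I handles this.

```python
def coalesce_truncates(events):
    """Group (txid, table) truncate recordings into one statement per transaction.

    Transactions are returned in order of first appearance, and each table
    is listed once even if its trigger was recorded more than once.
    """
    by_txid = {}
    for txid, table in events:
        tables = by_txid.setdefault(txid, [])
        if table not in tables:
            tables.append(table)
    return [
        (txid, "TRUNCATE TABLE " + ", ".join(tables))
        for txid, tables in by_txid.items()
    ]
```

Listing all tables in one statement lets the remote node truncate them together, which is what tables linked by foreign keys need. It does not help when the target schema differs from the source, as in the RI trigger case.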
--
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/lsf.html
Rules of the Evil Overlord #145. "My dungeon cell decor will not
feature exposed pipes. While they add to the gloomy atmosphere, they
are good conductors of vibrations and a lot of prisoners know Morse
code." <http://www.eviloverlord.com/>


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-07-15 15:03:55
Message-ID: 200807151503.m6FF3t724558@momjian.us
Lists: pgsql-advocacy pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Agreed. I realize why we are not zeroing those bytes (for performance),
> > but can't we have the archiver zero those bytes before calling the
> > 'archive_command'?
>
> The archiver doesn't know any more about where the end-of-data is than
> the archive_command does. Moreover, the archiver doesn't know whether
> the archive_command cares. I think the separate module is a fine
> solution.
>
> It should also be pointed out that the whole thing becomes uninteresting
> if we get real-time log shipping implemented. So I see absolutely no
> point in spending time integrating pg_clearxlogtail now.

People doing PITR are still going to be saving these files, and for a
long time, so I think this is still something we should try to address.

Added to TODO:

o Reduce PITR WAL file size by removing full page writes and
by removing trailing bytes to improve compression

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-07-15 22:18:46
Message-ID: 20080715221846.GA9945@alvh.no-ip.org
Lists: pgsql-advocacy pgsql-hackers

Bruce Momjian wrote:

> Added to TODO:
>
> o Reduce PITR WAL file size by removing full page writes and
> by removing trailing bytes to improve compression

If we remove full page writes, how does hint bit setting get propagated
to the slave?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-07-15 22:24:17
Message-ID: 200807152224.m6FMOHO05685@momjian.us
Lists: pgsql-advocacy pgsql-hackers

Alvaro Herrera wrote:
> Bruce Momjian wrote:
>
> > Added to TODO:
> >
> > o Reduce PITR WAL file size by removing full page writes and
> > by removing trailing bytes to improve compression
>
> If we remove full page writes, how does hint bit setting get propagated
> to the slave?

We would remove full page writes that are needed for crash recovery, but
perhaps keep other full pages.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-07-15 22:32:08
Message-ID: 20080715223208.GB9945@alvh.no-ip.org
Lists: pgsql-advocacy pgsql-hackers

Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> >
> > > Added to TODO:
> > >
> > > o Reduce PITR WAL file size by removing full page writes and
> > > by removing trailing bytes to improve compression
> >
> > If we remove full page writes, how does hint bit setting get propagated
> > to the slave?
>
> We would remove full page writes that are needed for crash recovery, but
> perhaps keep other full pages.

How do you tell which is which?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andreas 'ads' Scherbaum" <adsmail(at)wars-nicht(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Core team statement on replication in PostgreSQL
Date: 2008-07-15 22:56:31
Message-ID: 200807152256.m6FMuVa21901@momjian.us
Lists: pgsql-advocacy pgsql-hackers

Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > >
> > > > Added to TODO:
> > > >
> > > > o Reduce PITR WAL file size by removing full page writes and
> > > > by removing trailing bytes to improve compression
> > >
> > > If we remove full page writes, how does hint bit setting get propagated
> > > to the slave?
> >
> > We would remove full page writes that are needed for crash recovery, but
> > perhaps keep other full pages.
>
> How do you tell which is which?

The WAL format would have to be modified to indicate which entries can
be discarded.
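
As a rough picture of what such a marker would enable, here is a Python sketch. Nothing in it reflects the actual WAL format: the `WalRecord` layout and the `crash_recovery_only` flag are hypothetical stand-ins for whatever indication the format would carry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WalRecord:
    lsn: int
    payload: bytes
    crash_recovery_only: bool = False  # hypothetical "can be discarded" marker


def records_for_archive(records):
    """Keep every record except those marked as needed only for crash recovery."""
    return [r for r in records if not r.crash_recovery_only]


def archived_bytes(records):
    """Total payload size that would go to the archive."""
    return sum(len(r.payload) for r in records_for_archive(records))
```

With a per-record marker, the decision about which full pages to drop is made where the record is written, so the archive-side filter needs no knowledge of why a page image was logged.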

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +