Re: Review: Patch for Synchronous Replication

Lists: pgsql-hackerspgsql-rrreviewers
From: Thom Brown <thom(at)linux(dot)com>
To: pgsql-rrreviewers(at)postgresql(dot)org
Subject: Review: Patch for Synchronous Replication
Date: 2010-09-29 22:29:24
Message-ID: AANLkTi=Bs8ZUjtXGqjqzaBT66SB=NknxtWRwWdkddVQG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-rrreviewers

This is a basic review of Fujii Masao's synchronous replication patch
from http://archives.postgresql.org/message-id/AANLkTik2c3kV7HgJnM4MjkCWVG-QvDJXD3iR9TqsCnpP@mail.gmail.com

Review Description
==================
This patch extends existing asynchronous streaming replication with
options to enable different levels of synchronous behaviour. On the
primary, the is a new standbys.conf file containing a manifest of
standbys it seeks a form a confirmation from before committing
transactions. These contain standby name, level of synchronicity (if
it wasn't a word, it is now), and a timeout value specifying how long
the primary is willing to wait for confirmation before aborting the
transaction. On the standby, the new option "standby_name" has been
added to publish a named identity of the standby server to the
primary.

Patch application
=================
The patch applies cleanly to HEAD. All regression tests pass
successfully as expected.

Testing
=======
I configured the primary's standby.conf to provide synchronous
replication to a single slave using fsync as it's replication level
and a timeout of 100ms. The wal_level was set to hot_standby in
postgresql.conf. The standby's recovery.conf had its standby_name
value set to match that expected by the primary's standby.conf. This
is summarised as follows:

## primary postgresql.conf:
wal_level = hot_standby
max_wal_senders = 2
wal_keep_segments = 2

## primary standbys.conf
# STANDBY NAME SYNCHRONOUS TIMEOUT
cougar fsync 100ms

## primary pg_hba.conf
# TYPE DATABASE USER CIDR-ADDRESS METHOD
host replication postgres 192.168.102.17/32 trust

## standby postgresql.conf
hot_standby = on

## standby recovery.conf
standby_name = 'cougar'
standby_mode = 'on'
primary_conninfo = 'host=192.168.102.125 port=5432'

Issues
======

The primary started up fine and was accepting connections. A base
backup was taken for the standby, and after configuring the standby, I
attempted to bring it up, but received the following error:

postgres(at)cougar:~/project/data$ pg_ctl start
server starting
postgres(at)cougar:~/project/data$ LOG: database system was shut down in
recovery at 2010-09-29 22:52:24 BST
LOG: entering standby mode
LOG: redo starts at 0/1000020
LOG: record with zero length at 0/10000B0
FATAL: could not connect to the primary server: invalid connection
option "standby_name"

I believe I am using the correct parameter as I followed the
additional documentation provided in the patch.

Conclusion
==========

It at least appears to me that it isn't functional in its current
state, or there is setup information missing from the
documentation.... or I've make a stupid mistake somewhere (likely).

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


From: Thom Brown <thom(at)linux(dot)com>
To: pgsql-rrreviewers(at)postgresql(dot)org
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Review: Patch for Synchronous Replication
Date: 2010-09-29 22:57:33
Message-ID: AANLkTimBcQ4Bjx=4g5bTeeeNiQo3TDMSuv2FmHYMZjtU@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-rrreviewers

On 29 September 2010 23:29, Thom Brown <thom(at)linux(dot)com> wrote:
> This is a basic review of Fujii Masao's synchronous replication patch
> from http://archives.postgresql.org/message-id/AANLkTik2c3kV7HgJnM4MjkCWVG-QvDJXD3iR9TqsCnpP@mail.gmail.com
>
> Review Description
> ==================
> This patch extends existing asynchronous streaming replication with
> options to enable different levels of synchronous behaviour.  On the
> primary, the is a new standbys.conf file containing a manifest of
> standbys it seeks a form a confirmation from before committing
> transactions.  These contain standby name, level of synchronicity (if
> it wasn't a word, it is now), and a timeout value specifying how long
> the primary is willing to wait for confirmation before aborting the
> transaction.  On the standby, the new option "standby_name" has been
> added to publish a named identity of the standby server to the
> primary.
>
>
> Patch application
> =================
> The patch applies cleanly to HEAD.  All regression tests pass
> successfully as expected.
>
>
> Testing
> =======
> I configured the primary's standby.conf to provide synchronous
> replication to a single slave using fsync as it's replication level
> and a timeout of 100ms.  The wal_level was set to hot_standby in
> postgresql.conf.  The standby's recovery.conf had its standby_name
> value set to match that expected by the primary's standby.conf.  This
> is summarised as follows:
>
> ## primary postgresql.conf:
> wal_level = hot_standby
> max_wal_senders = 2
> wal_keep_segments = 2
>
> ## primary standbys.conf
> # STANDBY NAME    SYNCHRONOUS   TIMEOUT
> cougar            fsync           100ms
>
> ## primary pg_hba.conf
> # TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
> host    replication     postgres        192.168.102.17/32       trust
>
> ## standby postgresql.conf
> hot_standby = on
>
> ## standby recovery.conf
> standby_name = 'cougar'
> standby_mode = 'on'
> primary_conninfo = 'host=192.168.102.125 port=5432'
>
>
> Issues
> ======
>
> The primary started up fine and was accepting connections.  A base
> backup was taken for the standby, and after configuring the standby, I
> attempted to bring it up, but received the following error:
>
> postgres(at)cougar:~/project/data$ pg_ctl start
> server starting
> postgres(at)cougar:~/project/data$ LOG:  database system was shut down in
> recovery at 2010-09-29 22:52:24 BST
> LOG:  entering standby mode
> LOG:  redo starts at 0/1000020
> LOG:  record with zero length at 0/10000B0
> FATAL:  could not connect to the primary server: invalid connection
> option "standby_name"
>
> I believe I am using the correct parameter as I followed the
> additional documentation provided in the patch.
>
>
> Conclusion
> ==========
>
> It at least appears to me that it isn't functional in its current
> state, or there is setup information missing from the
> documentation.... or I've make a stupid mistake somewhere (likely).

Quick back-peddle... it appears the patch was only successful on the
standby. Only doc changes appeared to make it to the primary.
Re-attempt tomorrow. Apologies.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


From: Thom Brown <thom(at)linux(dot)com>
To: pgsql-rrreviewers(at)postgresql(dot)org
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Review: Patch for Synchronous Replication
Date: 2010-09-29 23:29:03
Message-ID: AANLkTikTGQm99Wv-t7K8rNms2NUpaLBoG=Avq-c3Jc1F@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-rrreviewers

On 29 September 2010 23:57, Thom Brown <thom(at)linux(dot)com> wrote:
> On 29 September 2010 23:29, Thom Brown <thom(at)linux(dot)com> wrote:
>> This is a basic review of Fujii Masao's synchronous replication patch
>> from http://archives.postgresql.org/message-id/AANLkTik2c3kV7HgJnM4MjkCWVG-QvDJXD3iR9TqsCnpP@mail.gmail.com
>>
>> Review Description
>> ==================
>> This patch extends existing asynchronous streaming replication with
>> options to enable different levels of synchronous behaviour.  On the
>> primary, the is a new standbys.conf file containing a manifest of
>> standbys it seeks a form a confirmation from before committing
>> transactions.  These contain standby name, level of synchronicity (if
>> it wasn't a word, it is now), and a timeout value specifying how long
>> the primary is willing to wait for confirmation before aborting the
>> transaction.  On the standby, the new option "standby_name" has been
>> added to publish a named identity of the standby server to the
>> primary.
>>
>>
>> Patch application
>> =================
>> The patch applies cleanly to HEAD.  All regression tests pass
>> successfully as expected.
>>
>>
>> Testing
>> =======
>> I configured the primary's standby.conf to provide synchronous
>> replication to a single slave using fsync as it's replication level
>> and a timeout of 100ms.  The wal_level was set to hot_standby in
>> postgresql.conf.  The standby's recovery.conf had its standby_name
>> value set to match that expected by the primary's standby.conf.  This
>> is summarised as follows:
>>
>> ## primary postgresql.conf:
>> wal_level = hot_standby
>> max_wal_senders = 2
>> wal_keep_segments = 2
>>
>> ## primary standbys.conf
>> # STANDBY NAME    SYNCHRONOUS   TIMEOUT
>> cougar            fsync           100ms
>>
>> ## primary pg_hba.conf
>> # TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
>> host    replication     postgres        192.168.102.17/32       trust
>>
>> ## standby postgresql.conf
>> hot_standby = on
>>
>> ## standby recovery.conf
>> standby_name = 'cougar'
>> standby_mode = 'on'
>> primary_conninfo = 'host=192.168.102.125 port=5432'
>>
>>
>> Issues
>> ======
>>
>> The primary started up fine and was accepting connections.  A base
>> backup was taken for the standby, and after configuring the standby, I
>> attempted to bring it up, but received the following error:
>>
>> postgres(at)cougar:~/project/data$ pg_ctl start
>> server starting
>> postgres(at)cougar:~/project/data$ LOG:  database system was shut down in
>> recovery at 2010-09-29 22:52:24 BST
>> LOG:  entering standby mode
>> LOG:  redo starts at 0/1000020
>> LOG:  record with zero length at 0/10000B0
>> FATAL:  could not connect to the primary server: invalid connection
>> option "standby_name"
>>
>> I believe I am using the correct parameter as I followed the
>> additional documentation provided in the patch.
>>
>>
>> Conclusion
>> ==========
>>
>> It at least appears to me that it isn't functional in its current
>> state, or there is setup information missing from the
>> documentation.... or I've make a stupid mistake somewhere (likely).
>
> Quick back-peddle... it appears the patch was only successful on the
> standby.  Only doc changes appeared to make it to the primary.
> Re-attempt tomorrow.  Apologies.

Well, that doesn't seem to have made any difference. Confirmed the
patch was applied in both cases, rebuilt, base backup again etc... no
change. Same error as in original review. *shrug*

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: pgsql-rrreviewers(at)postgresql(dot)org
Subject: Re: Review: Patch for Synchronous Replication
Date: 2010-09-30 00:38:59
Message-ID: AANLkTinr53EV7tq0okdEavvaKmeGbXXY+ARe_9VMj9mt@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-rrreviewers

Thom -

Just as a logistical note, reviews should be posted to -hackers.
-rrreviewers is for discussion of assigning reviewers, coordinating
who is doing what, etc.

...Robert

On Wed, Sep 29, 2010 at 6:29 PM, Thom Brown <thom(at)linux(dot)com> wrote:
> This is a basic review of Fujii Masao's synchronous replication patch
> from http://archives.postgresql.org/message-id/AANLkTik2c3kV7HgJnM4MjkCWVG-QvDJXD3iR9TqsCnpP@mail.gmail.com
>
> Review Description
> ==================
> This patch extends existing asynchronous streaming replication with
> options to enable different levels of synchronous behaviour.  On the
> primary, the is a new standbys.conf file containing a manifest of
> standbys it seeks a form a confirmation from before committing
> transactions.  These contain standby name, level of synchronicity (if
> it wasn't a word, it is now), and a timeout value specifying how long
> the primary is willing to wait for confirmation before aborting the
> transaction.  On the standby, the new option "standby_name" has been
> added to publish a named identity of the standby server to the
> primary.
>
>
> Patch application
> =================
> The patch applies cleanly to HEAD.  All regression tests pass
> successfully as expected.
>
>
> Testing
> =======
> I configured the primary's standby.conf to provide synchronous
> replication to a single slave using fsync as it's replication level
> and a timeout of 100ms.  The wal_level was set to hot_standby in
> postgresql.conf.  The standby's recovery.conf had its standby_name
> value set to match that expected by the primary's standby.conf.  This
> is summarised as follows:
>
> ## primary postgresql.conf:
> wal_level = hot_standby
> max_wal_senders = 2
> wal_keep_segments = 2
>
> ## primary standbys.conf
> # STANDBY NAME    SYNCHRONOUS   TIMEOUT
> cougar            fsync           100ms
>
> ## primary pg_hba.conf
> # TYPE  DATABASE        USER            CIDR-ADDRESS            METHOD
> host    replication     postgres        192.168.102.17/32       trust
>
> ## standby postgresql.conf
> hot_standby = on
>
> ## standby recovery.conf
> standby_name = 'cougar'
> standby_mode = 'on'
> primary_conninfo = 'host=192.168.102.125 port=5432'
>
>
> Issues
> ======
>
> The primary started up fine and was accepting connections.  A base
> backup was taken for the standby, and after configuring the standby, I
> attempted to bring it up, but received the following error:
>
> postgres(at)cougar:~/project/data$ pg_ctl start
> server starting
> postgres(at)cougar:~/project/data$ LOG:  database system was shut down in
> recovery at 2010-09-29 22:52:24 BST
> LOG:  entering standby mode
> LOG:  redo starts at 0/1000020
> LOG:  record with zero length at 0/10000B0
> FATAL:  could not connect to the primary server: invalid connection
> option "standby_name"
>
> I believe I am using the correct parameter as I followed the
> additional documentation provided in the patch.
>
>
> Conclusion
> ==========
>
> It at least appears to me that it isn't functional in its current
> state, or there is setup information missing from the
> documentation.... or I've make a stupid mistake somewhere (likely).
>
> --
> Thom Brown
> Twitter: @darkixion
> IRC (freenode): dark_ixion
> Registered Linux user: #516935
>
> --
> Sent via pgsql-rrreviewers mailing list (pgsql-rrreviewers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-rrreviewers
>

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Patch for Synchronous Replication
Date: 2010-10-04 13:48:29
Message-ID: AANLkTimGSMApO5kYZEs-UYHK2PG_J3FT2EArK_2tiHbp@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-rrreviewers

On Thu, Sep 30, 2010 at 8:29 AM, Thom Brown <thom(at)linux(dot)com> wrote:
>>> This is a basic review of Fujii Masao's synchronous replication patch

Thanks for the review! And sorry for the delay.

>>> I configured the primary's standby.conf to provide synchronous
>>> replication to a single slave using fsync as it's replication level
>>> and a timeout of 100ms.

In the patch, standbys.conf still cannot accept a timeout setting.
So please specify the standby name and replication level in that file.
$PREFIX/share/standbys.conf.sample would be helpful.

>>> The primary started up fine and was accepting connections.  A base
>>> backup was taken for the standby, and after configuring the standby, I
>>> attempted to bring it up, but received the following error:
>>>
>>> postgres(at)cougar:~/project/data$ pg_ctl start
>>> server starting
>>> postgres(at)cougar:~/project/data$ LOG:  database system was shut down in
>>> recovery at 2010-09-29 22:52:24 BST
>>> LOG:  entering standby mode
>>> LOG:  redo starts at 0/1000020
>>> LOG:  record with zero length at 0/10000B0
>>> FATAL:  could not connect to the primary server: invalid connection
>>> option "standby_name"
>>>
>>> I believe I am using the correct parameter as I followed the
>>> additional documentation provided in the patch.
>>>
>>>
>>> Conclusion
>>> ==========
>>>
>>> It at least appears to me that it isn't functional in its current
>>> state, or there is setup information missing from the
>>> documentation.... or I've make a stupid mistake somewhere (likely).
>>
>> Quick back-peddle... it appears the patch was only successful on the
>> standby.  Only doc changes appeared to make it to the primary.
>> Re-attempt tomorrow.  Apologies.
>
> Well, that doesn't seem to have made any difference.  Confirmed the
> patch was applied in both cases, rebuilt, base backup again etc... no
> change.  Same error as in original review. *shrug*

That's strange. I cannot reproduce that error.

According to the error message, I guess that the patch has not been
applied successfully to the master yet. Could you try applying the
patch and re-installing the patch-applied-postgres again?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center