Re: Support for N synchronous standby servers - take 2

Lists:	pgsql-hackers

From:	"Amir Rohan" <amir(dot)rohan(at)mail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-09-23 04:11:15
Message-ID:	trinity-4258acb0-b75a-4bf1-9657-f8237883cd0d-1442981475318@3capp-mailcom-lxa02

>On 07/16/15, Robert Haas wrote:
>
>>> * Developers will immediately understand the format
>>
>>I doubt it.  I think any format that we pick will have to be carefully
>>documented.  People may know what JSON looks like in general, but they
>>will not immediately know what bells and whistles are available in
>>this context.
>>
>>> * Easy to programmatically manipulate in a range of languages
>>
>> <...> I think it will be rare to need to parse the postgresql.conf string,
>> manipulate it programmatically, and then put it back.
>
>On Sun, Jul 19, 2015 at 4:16 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>> On 07/17/2015 04:36 PM, Jim Nasby wrote:
>>>> I'm guessing it'd be really ugly/hard to support at least this GUC being
>>>> multi-line?
>>
>>> Mind you, multi-line GUCs would be useful otherwise, but we don't want
>>> to hinge this feature on making that work.
>>
>> Do we really want such a global reduction in friendliness to make this
>> feature easier?
>
>Maybe shoehorning this into the GUC mechanism is the wrong thing, and
>what we really need is a new config file for this.  The information
>we're proposing to store seems complex enough to justify that.

It seems like:

1) There's a need to support structured data in configuration for future
needs as well; it isn't specific to this feature.
2) There should/must be a better way to validate configuration than
restarting the server in search of syntax errors.

Creating a whole new configuration file for just one feature *and* in a
different format seems suboptimal.  What happens when the next 20 features
need structured config data, where does that go? Will there be additional
JSON config files *and* perhaps new mini-language values in .conf as
development continues?  How many dedicated configuration files is too many?

Now, about JSON... (earlier upthread):

On 07/01/15, Peter Eisentraut wrote:
> On 6/26/15 2:53 PM, Josh Berkus wrote:
> > I would also suggest that if I lose this battle and
> > we decide to go with a single stringy GUC, that we at least use JSON
> > instead of defining our own, proprietary, syntax?
>
> Does JSON have a natural syntax for a set without order?

No. Nor timestamps. It doesn't even distinguish integer from float
(though parsers do it for you in dynamic languages). It's all because
of its unsightly JavaScript roots.

The current patch is now forced by JSON to conflate sets and lists, so
un/ordered semantics are no longer tied to type but to the specific
configuration keys. So, if a feature ever needs a key where the difference
between set and list matters and needs to support both, you'll need
separate keys (both with lists, but meaning different things) or a separate
"mode" key or something. Not terrible, just iffy.

Others have found JSON unsatisfactory before. For example, the Clojure
community has made (at least) two attempts at alternatives, complete with
the meh adoption rates you'd expect despite being more capable formats:

http://blog.cognitect.com/blog/2014/7/22/transit
https://github.com/edn-format/edn

There's also YAML, TOML, etc., none as universal as JSON. But to reiterate,
JSON itself has lackluster type support (no sets, no timestamps), is
verbose, is easy to malform when editing (missed a curly brace, shouldn't
use a single quote), isn't extensible, and my personal pet peeve is that it
doesn't allow non-string or bare-string keys in maps (a.k.a. "death by
double-quotes").

Python has the very natural {1,2,3} syntax for sets, but of course that's
not part of JSON.

If JSON wins out despite all this, one alternative not discussed is to
extend the .conf parser to accept JSON dicts as a fundamental type, e.g.:

data_directory = 'ConfigDir'
port = 5432
work_mem = 4MB
hot_standby = off
client_min_messages = notice
log_error_verbosity = default
autovacuum_analyze_scale_factor = 0.1
synch_standby_config = {
  "sync_info": {
    "nodes": [
      {
        "priority": 1,
        "group": "cluster1"
      },
      "A"
    ],
    "quorum": 3
  },
  "groups": {
    "cluster1": [
      "B",
      "C"
    ]
  }
}

This *will* break someone's Perl, I would guess. Ironically, those scripts
wouldn't have broken if some structured format were in use for the
configuration data when they were written...

`postgres --describe-config` is also pretty much tied to a line-oriented
configuration.

The MIA configuration validation tool/switch should probably get a thread too.
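
To make the "JSON dicts as a fundamental type" idea concrete, here is a minimal sketch of a reader for such a file. It is an illustration only and not how PostgreSQL's own configuration parser works: the function name parse_conf is hypothetical, the '=' sign is treated as mandatory, and quoting and include directives are ignored. It relies on json.JSONDecoder.raw_decode, which parses one JSON value starting at a given offset and reports where that value ends, so a value opening with '{' may span several lines.

```python
import json


def parse_conf(text):
    """Read 'key = value' lines; a value starting with '{' is parsed as JSON.

    Hypothetical helper for illustration, not PostgreSQL's real parser.
    """
    decoder = json.JSONDecoder()
    settings = {}
    pos = 0
    while pos < len(text):
        eol = text.find("\n", pos)
        if eol == -1:
            eol = len(text)
        line = text[pos:eol]
        # Skip blank lines and comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            pos = eol + 1
            continue
        key, sep, _ = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'key = value', got {line!r}")
        # Offset of the first non-blank character after '='.
        start = pos + len(key) + 1
        while start < eol and text[start] in " \t":
            start += 1
        if start < eol and text[start] == "{":
            # raw_decode returns the object and the offset just past it,
            # so the JSON value is free to continue over later lines.
            value, end = decoder.raw_decode(text, start)
            eol = text.find("\n", end)
            if eol == -1:
                eol = len(text)
        else:
            value = text[start:eol].strip().strip("'")
        settings[key.strip()] = value
        pos = eol + 1
    return settings
```

A malformed JSON value makes raw_decode raise json.JSONDecodeError, so a checker built this way can report syntax errors without restarting a server.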


From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Amir Rohan <amir(dot)rohan(at)mail(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-09-23 23:57:08
Message-ID:	CA+TgmobxRHv-9SGa=ya41C=JcVgtYVCOWWsS8H7zcAddp3Tdcg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Sep 23, 2015 at 12:11 AM, Amir Rohan <amir(dot)rohan(at)mail(dot)com> wrote:
> It seems like:
> 1) There's a need to support structured data in configuration for future
> needs as well, it isn't specific to this feature.
> 2) There should/must be a better way to validate configuration than
> restarting the server in search of syntax errors.
>
> Creating a whole new configuration file for just one feature *and* in a
> different format seems suboptimal. What happens when the next 20 features
> need structured config data, where does that go? Will there be additional
> JSON config files *and* perhaps new mini-language values in .conf as
> development continues? How many dedicated configuration files is too many?

Well, I think that if we create our own mini-language, it may well be
possible to make the configuration for this compact enough to fit on
one line. If we use JSON, I think there's zap chance of that. But...
that's just what *I* think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amir Rohan <amir(dot)rohan(at)mail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-09-24 00:11:27
Message-ID:	5811.1443053487@sss.pgh.pa.us
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Well, I think that if we create our own mini-language, it may well be
> possible to make the configuration for this compact enough to fit on
> one line. If we use JSON, I think there's zap chance of that. But...
> that's just what *I* think.

Well, that depends on what you think the typical-case complexity is
and on how long a line will fit in your editor window ;-).

I think that we can't make much progress on this argument without a pretty
concrete idea of what typical and worst-case configurations would look
like. Would someone like to put forward examples? Then we could try them
in any specific syntax that's suggested and see how verbose it gets.

FWIW, I tend to agree that if we think common cases can be held to,
say, a hundred or two hundred characters, that we're best off avoiding
the challenges of dealing with multi-line postgresql.conf entries.
And I'm really not much in favor of a separate file; if we go that way
then we're going to have to reinvent a huge amount of infrastructure
that already exists for GUCs.

regards, tom lane

From:	"Amir Rohan" <amir(dot)rohan(at)mail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-09-24 02:26:38
Message-ID:	trinity-2596e0cb-af50-45e6-adf4-8f8c0ff8b19a-1443061598299@3capp-mailcom-lxa12
Lists:	pgsql-hackers

> Sent: Thursday, September 24, 2015 at 3:11 AM
>
> From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > Well, I think that if we create our own mini-language, it may well be
> > possible to make the configuration for this compact enough to fit on
> > one line. If we use JSON, I think there's zap chance of that. But...
> > that's just what *I* think.
>>

I've implemented a parser that reads your mini-language and dumps a JSON
equivalent. Once you start naming groups the line fills up quite quickly,
and on the other hand the JSON is verbose and fiddly.
But implementing a mechanism that can be used by other features in
the future seems the deciding factor here, rather than the brevity of a
bespoke mini-language.

>
> <...> we're best off avoiding the challenges of dealing with multi-line
> postgresql.conf entries.
>
> And I'm really not much in favor of a separate file; if we go that way
> then we're going to have to reinvent a huge amount of infrastructure
> that already exists for GUCs.
>
> regards, tom lane

Adding support for JSON objects (or some other kind of composite data type)
to the .conf parser would negate the need for one, and would also solve the
problem being discussed for future cases.
I don't know whether that would break some tooling you care about,
but if there's interest, I can probably do some of that work.

From:	Beena Emerson <memissemerson(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-08 13:59:54
Message-ID:	1444312794903-5869285.post@n5.nabble.com
Lists:	pgsql-hackers

Amir Rohan wrote:
> But implementing a mechanism that can be used by other features in
> the future seems the deciding factor here, rather than the brevity of a
> bespoke mini-language.

One decision to be made is whether JSON or the mini-language is better for
the SR setting.
The mini-language can fit on a single line in postgresql.conf.

For JSON, a separate file is currently used. But as said earlier, if
composite types are required in the future for other parameters, then having
multiple .conf files does not make sense. To avoid this we can:
- Support multi-line GUCs, which would also help other comma-separated
conf values along with s_s_names. (This can make the mini-language more
readable as well.)
- Allow JSON in postgresql.conf, so that other parameters in the future
can use JSON as well within postgresql.conf.

What are the chances of future data requiring JSON? I think it would be rare.

> > And I'm really not much in favor of a separate file; if we go that way
> > then we're going to have to reinvent a huge amount of infrastructure
> > that already exists for GUCs.
>
> Adding support for JSON objects (or some other kind of composite data
> type) to the .conf parser would negate the need for one, and would also
> solve the problem being discussed for future cases.

With the current pg_syncinfo file, the only code added was to look for the
pg_syncinfo file in the specified path and read its entire content into a
variable, which was then used for further parsing. That code could have
been avoided with a multi-line GUC.

-----
Beena Emerson

--
View this message in context: http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5869285.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

From:	Beena Emerson <memissemerson(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-08 14:01:12
Message-ID:	1444312872974-5869286.post@n5.nabble.com
Lists:	pgsql-hackers

Hello,

The JSON method was used in the patch because it seemed to be the group
consensus.

Requirement:
- Grouping: Specify a list of node names with the required number of
ACKs for the group. A group can be a priority group or a quorum group.
Quorum treats all the standbys as being at the same level, and ACKs from
any k of them are enough. In priority behavior, ACKs must be received from
the specified k lowest priority servers for a successful transaction.
- Group names, to enable easier status reporting per group. The
topmost group need not be named; it will be assigned a default name. All
the subgroups must be named.
- Not more than 3 groups with 1 level of nesting are expected.

Behavior in submitted patch:
- The topmost group is named 'Default Group'. All the
other standby_names or groups have to be listed within it.
- When more than 1 connected standby has the same name, the
highest LSN among them is chosen. Example: 2 priority in (X,Y,Z). If there
are 2 nodes named X connected, then even though both X nodes have returned
an ACK, the server will wait for an ACK from Y.
- There are no "potential" standbys. In quorum behavior, there are
no fixed standbys which must be in sync; all members are equal. ACKs from
any n of the specified nodes in a set are considered success.
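
The difference between the two behaviors can be sketched as a small predicate. This is an illustration under stated assumptions and not the patch's code: the function name and arguments are hypothetical, and "priority" is read here as "the first `count` connected standbys in list order must all acknowledge". Because standbys are tracked by name in sets, two connected nodes both named X count once, which matches the X/Y example above.

```python
def commit_can_proceed(mode, count, names, connected, acked):
    """Decide whether enough standbys have acknowledged (hypothetical sketch).

    mode      -- 'quorum' or 'priority'
    count     -- number of acknowledgements required
    names     -- configured standby names, in list order
    connected -- set of names currently connected
    acked     -- set of names that have acknowledged
    """
    if mode == "quorum":
        # All members are equal: any `count` listed names will do.
        return sum(1 for name in names if name in acked) >= count
    if mode == "priority":
        # Assumption: the synchronous standbys are the first `count`
        # connected names in list order, and every one of them must ACK.
        sync = [name for name in names if name in connected][:count]
        return len(sync) == count and all(name in acked for name in sync)
    raise ValueError(f"unknown mode: {mode!r}")


# "2 priority in (X,Y,Z)": an ACK from X alone is not enough.
print(commit_can_proceed("priority", 2, ["X", "Y", "Z"], {"X", "Y", "Z"}, {"X"}))
# prints False
```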

Further:
- improvements to pg_stat_replication to give the node tree and
status?
- Manipulate/Edit conf setting using functions.
- Regression tests

Mini-lang:
[] - to specify priority
() - to specify quorum
Format - <name> : <count> [<list>]
Not specifying count defaults to 1.
Ex: s_s_names = '2(cluster1: 1(A,B), cluster2: 2[X,Y,Z], U)'

JSON
It would contain 2 main keys: "sync_info" and "groups".
The "sync_info" would consist of "quorum"/"priority" with the count and
"nodes"/"group" with the group name or node list.
The optional "groups" key would list all the groups mentioned within
"sync_info" along with their node lists.
Ex:
{
  "sync_info": {
    "quorum": 2,
    "nodes": [
      {"quorum": 1, "group": "cluster1"},
      {"priority": 2, "group": "cluster2"},
      "U"
    ]
  },
  "groups": {
    "cluster1": ["A", "B"],
    "cluster2": ["X", "Y", "Z"]
  }
}
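
A generic JSON parser accepts any well-formed document, so the layout described above still needs its own check. A minimal sketch of such a check follows. The function name validate_sync_config and the exact rules are assumptions drawn from the description in this message, not code from the patch.

```python
def validate_sync_config(doc):
    """Return a list of problems; an empty list means the layout looks right.

    Hypothetical checker for the "sync_info"/"groups" layout.
    """
    if not isinstance(doc, dict) or not isinstance(doc.get("sync_info"), dict):
        return ["top level must be an object with a 'sync_info' object"]
    problems = []
    groups = doc.get("groups", {})
    if not isinstance(groups, dict):
        problems.append("'groups' must be an object")
        groups = {}
    for name, members in groups.items():
        if not (isinstance(members, list)
                and all(isinstance(m, str) for m in members)):
            problems.append(f"group {name!r}: must be a list of node names")

    def check_rule(rule, where):
        # Exactly one of "quorum"/"priority", carrying a positive count.
        kinds = [k for k in ("quorum", "priority") if k in rule]
        if len(kinds) != 1:
            problems.append(f"{where}: need exactly one of 'quorum' or 'priority'")
        else:
            count = rule[kinds[0]]
            if isinstance(count, bool) or not isinstance(count, int) or count < 1:
                problems.append(f"{where}: {kinds[0]!r} must be a positive integer")
        # Either a reference to a named group or an inline node list.
        if "group" in rule:
            group = rule["group"]
            if not isinstance(group, str) or group not in groups:
                problems.append(f"{where}: unknown group {group!r}")
        elif isinstance(rule.get("nodes"), list):
            for i, node in enumerate(rule["nodes"]):
                if isinstance(node, dict):
                    check_rule(node, f"{where}.nodes[{i}]")
                elif not isinstance(node, str):
                    problems.append(f"{where}.nodes[{i}]: must be a name or an object")
        else:
            problems.append(f"{where}: need 'nodes' (a list) or 'group' (a name)")

    check_rule(doc["sync_info"], "sync_info")
    return problems
```

A misspelled key such as "priorty" is valid JSON, so only a structural check of this kind would report it.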

JSON and mini-language:
- JSON is more verbose.
- You can define a group and use it multiple times in the sync settings,
but since not many levels of nesting are expected I am not sure how useful
this will be.
- Though a JSON parser is built in, additional code is required to check
for the required format of the JSON. For the mini-language, a new parser
will have to be written.

Despite all this, I feel the mini-language is better, mainly for its brevity.
Also, it will not require additional GUC parser support (multi-line).
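
To give a feel for the size of the "new parser" mentioned above, here is a minimal recursive-descent sketch that reads the mini-language and emits the JSON layout from this message. It is not the parser from any submitted patch. The names tokenize and to_json are hypothetical, node names are assumed to be plain identifiers, and only an unnamed topmost group is handled.

```python
import json
import re

# A token is a count, a name, or one punctuation character.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([()\[\],:]))")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character near offset {pos}")
        if m.group(1):
            tokens.append(("count", int(m.group(1))))
        elif m.group(2):
            tokens.append(("name", m.group(2)))
        else:
            tokens.append(("punct", m.group(3)))
        pos = m.end()
    return tokens


def to_json(setting):
    """Convert e.g. '2(cluster1: 1(A,B), U)' to the sync_info/groups dict."""
    tokens = tokenize(setting)
    groups = {}
    i = 0

    def peek():
        return tokens[i] if i < len(tokens) else (None, None)

    def take(kind, value=None):
        nonlocal i
        tok = peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise ValueError(f"expected {value or kind}, got {tok[1]!r}")
        i += 1
        return tok[1]

    def parse_rule():
        # [count] '(' list ')' is quorum, [count] '[' list ']' is priority.
        count = take("count") if peek()[0] == "count" else 1
        opener = take("punct")
        if opener not in ("(", "["):
            raise ValueError(f"expected '(' or '[', got {opener!r}")
        kind, closer = ("quorum", ")") if opener == "(" else ("priority", "]")
        members = [parse_item()]
        while peek() == ("punct", ","):
            take("punct", ",")
            members.append(parse_item())
        take("punct", closer)
        return kind, count, members

    def parse_item():
        # Either a bare node name or 'name: rule', which defines a group.
        name = take("name")
        if peek() == ("punct", ":"):
            take("punct", ":")
            kind, count, members = parse_rule()
            groups[name] = members
            return {kind: count, "group": name}
        return name

    kind, count, nodes = parse_rule()
    if i != len(tokens):
        raise ValueError("trailing input after the topmost group")
    return {"sync_info": {kind: count, "nodes": nodes}, "groups": groups}


print(json.dumps(to_json("2(cluster1: 1(A,B), cluster2: 2[X,Y,Z], U)"), indent=2))
```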

-----
Beena Emerson

--
View this message in context: http://postgresql.nabble.com/Support-for-N-synchronous-standby-servers-take-2-tp5849384p5869286.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Beena Emerson <memissemerson(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-09 04:00:12
Message-ID:	CAA4eK1LY+hypar5nVwHRnjAq5=XxiOOCeZQn8DppgJVWMGtMeg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Oct 8, 2015 at 7:31 PM, Beena Emerson <memissemerson(at)gmail(dot)com> wrote:
>
>
> Mini-lang:
> [] - to specify priority
> () - to specify quorum
> Format - <name> : <count> [<list>]
> Not specifying count defaults to 1.
> Ex: s_s_names = '2(cluster1: 1(A,B), cluster2: 2[X,Y,Z], U)'
>
> JSON
> It would contain 2 main keys: "sync_info" and "groups"
> The "sync_info" would consist of "quorum"/"priority" with the count and
> "nodes"/"group" with the group name or node list.
> The optional "groups" key would list out all the "group" mentioned within
> "sync_info" along with the node list.
> Ex: {
> "sync_info":
> {
> "quorum":2,
> "nodes":
> [
> {"quorum":1,"group":"cluster1"},
> {"priority":2,"group": "cluster2"},
> "U"
> ]
> },
> "groups":
> {
> "cluster1":["A","B"],
> "cluster2":["X","Y","z"]
> }
> }
>
> JSON and mini-language:
> - JSON is more verbose
> - You can define a group and use it multiple times in sync settings
> but since not many levels of nesting are expected I am not sure how useful
> this will be.
> - Though JSON parser is inbuilt, additional code is required to check
> for the required format of JSON. For mini-language, new parser will have
> to be written.
>

Sounds like both the approaches have some pros and cons, also there are
some people who prefer mini-language and others who prefer JSON. I think
one thing that might help, is to check how other databases support this
feature or somewhat similar to this feature (mainly with respect to User
Interface), as that can help us in knowing what users are already familiar
with.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Beena Emerson <memissemerson(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-09 19:35:18
Message-ID:	CA+TgmobGpGJd7=6Tf_NLqaqQBDR=PvcmpF-GuCY5rS2kV=zzbw@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Oct 9, 2015 at 12:00 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Sounds like both the approaches have some pros and cons, also there are
> some people who prefer mini-language and others who prefer JSON. I think
> one thing that might help, is to check how other databases support this
> feature or somewhat similar to this feature (mainly with respect to User
> Interface), as that can help us in knowing what users are already familiar
> with.

+1!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-13 18:02:58
Message-ID:	CAD21AoDGFmabpZHG1SwkoEVRwA7QOE7MeyviUsP10e8AmU2CwQ@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Oct 10, 2015 at 4:35 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Oct 9, 2015 at 12:00 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> Sounds like both the approaches have some pros and cons, also there are
>> some people who prefer mini-language and others who prefer JSON. I think
>> one thing that might help, is to check how other databases support this
>> feature or somewhat similar to this feature (mainly with respect to User
>> Interface), as that can help us in knowing what users are already familiar
>> with.
>
> +1!
>

For example, MySQL 5.7 has a similar feature, but it doesn't support
quorum commit and is simpler than the feature PostgreSQL is attempting.
MySQL 5.7 has one configuration parameter that sets the number of
synchronous replication nodes.
The primary server commits once it has received the specified number of
ACKs from standby servers, regardless of the standbys' names.

And IIRC, Oracle Database doesn't support quorum commit either.
Whether a standby server is sync or async is specified per standby
server in a configuration parameter on the primary node.

I think the JSON format approach and the dedicated language approach
suit different cases.
The dedicated language approach would be useful for simple
configurations, such as a single level of nesting without groups.
It would let us configure replication more simply and easily.
In contrast, the JSON format approach would be useful for complex configurations.

I think this feature should be simple in its first implementation
for PostgreSQL.
It would be fine even with some restrictions, such as on the
nesting level or the group setting.
Another approach that I came up with is:
* Add a new parameter synchronous_replication_method (say s_r_method)
which can take two values: 'priority', 'quorum'
* If s_r_method = 'priority', the value of s_s_names (e.g. 'n1,n2,n3')
is handled using priority. It's the same as '[n1,n2,n3]' in the dedicated
language.
* If s_r_method = 'quorum', the value of s_s_names is handled using
quorum commit. It's the same as '(n1,n2,n3)' in the dedicated language.
* The setting of synchronous_standby_names is the same as today. That is,
storing nested values is not supported.
* If we want to support a more complex syntax like the one we are
discussing, we can add new values to s_r_method, for example
'complex', 'json'.
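
The equivalence between the two-parameter form and the dedicated language can be written down in a few lines. This is only a sketch of the mapping described in the list above; the function name is hypothetical and no such helper exists in PostgreSQL.

```python
def to_dedicated_language(s_r_method, s_s_names):
    """Map (s_r_method, s_s_names) to the bracketed form (hypothetical)."""
    names = [name.strip() for name in s_s_names.split(",") if name.strip()]
    if not names:
        raise ValueError("s_s_names must list at least one standby")
    if s_r_method == "priority":
        return "[" + ",".join(names) + "]"
    if s_r_method == "quorum":
        return "(" + ",".join(names) + ")"
    raise ValueError(f"unsupported s_r_method: {s_r_method!r}")


print(to_dedicated_language("priority", "n1,n2,n3"))  # [n1,n2,n3]
print(to_dedicated_language("quorum", "n1, n2, n3"))  # (n1,n2,n3)
```

Note that neither parameter carries a count, so the mapping cannot produce forms such as '2[n1,n2,n3]'.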

Thoughts?

Regards,

--
Masahiko Sawada

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-13 18:16:08
Message-ID:	561D4A68.70307@agliodbs.com
Lists:	pgsql-hackers

On 10/13/2015 11:02 AM, Masahiko Sawada wrote:
> I thought that this feature for postgresql should be simple at first
> implementation.
> It would be good even if there are some restriction such as the
> nesting level, the group setting.
> The another new approach that I came up with is,
> * Add new parameter synchronous_replication_method (say s_r_method)
> which can have two names: 'priority', 'quorum'
> * If s_r_method = 'priority', the value of s_s_names (e.g. 'n1,n2,n3')
> is handled using priority. It's same as '[n1,n2,n3]' in dedicated
> language.
> * If s_r_method = 'quorum', the value of s_s_names is handled using
> quorum commit, It's same as '(n1,n2,n3)' in dedicated language.

Well, the first question is: can you implement both of these things for
9.6, realistically? If you can implement them, then we can argue about
configuration format later. It's even possible that the nature of your
implementation will enforce a particular syntax.

For example, if your implementation requires sync groups to be named,
then we have to include group names in the syntax. If you can't
implement nesting in the near future, there's no reason to have a syntax
for it.

> * Setting of synchronous_standby_names is same as today. That is, the
> storing the nesting value is not supported.
> * If we want to support more complex syntax like what we are
> discussing, we can add the new value to s_r_method, for example
> 'complex', 'json'.

I think having two different syntaxes is a bad idea. I'd rather have a
wholly proprietary configuration markup than deal with two alternate ones.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-13 18:28:57
Message-ID:	CAD21AoBF7Vbwo+FtDwqhJVB=20vmTYVxVx0ZR-n2M=b=NhdMHw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 14, 2015 at 3:16 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 10/13/2015 11:02 AM, Masahiko Sawada wrote:
>> I thought that this feature for postgresql should be simple at first
>> implementation.
>> It would be good even if there are some restriction such as the
>> nesting level, the group setting.
>> The another new approach that I came up with is,
>> * Add new parameter synchronous_replication_method (say s_r_method)
>> which can have two names: 'priority', 'quorum'
>> * If s_r_method = 'priority', the value of s_s_names (e.g. 'n1,n2,n3')
>> is handled using priority. It's same as '[n1,n2,n3]' in dedicated
>> language.
>> * If s_r_method = 'quorum', the value of s_s_names is handled using
>> quorum commit, It's same as '(n1,n2,n3)' in dedicated language.
>
> Well, the first question is: can you implement both of these things for
> 9.6, realistically?
> If you can implement them, then we can argue about
> configuration format later. It's even possible that the nature of your
> implementation will enforce a particular syntax.
>
> For example, if your implementation requires sync groups to be named,
> then we have to include group names in the syntax. If you can't
> implement nesting in the near future, there's no reason to have a syntax
> for it.

Yes, I can implement both without nesting.
The draft patch of replication using priority is already implemented
by Michael, so I need to implement simple quorum commit logic and
merge them.

>> * Setting of synchronous_standby_names is same as today. That is, the
>> storing the nesting value is not supported.
>> * If we want to support more complex syntax like what we are
>> discussing, we can add the new value to s_r_method, for example
>> 'complex', 'json'.
>
> I think having two different syntaxes is a bad idea. I'd rather have a
> wholly proprietary configuration markup than deal with two alternate ones.
>

I agree, we should choose one or the other.

Regards,

--
Masahiko Sawada

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-14 04:34:42
Message-ID:	CAB7nPqQN48KP6Gj471R2QkxOHrHOODdzWFuQABY40ofuO2_=TA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 14, 2015 at 3:28 AM, Masahiko Sawada wrote:
> The draft patch of replication using priority is already implemented
> by Michael, so I need to implement simple quorum commit logic and
> merge them.

The latest patch I know of is this one:
http://www.postgresql.org/message-id/CAB7nPqRFSLmHbYonra0=p-X8MJ-XTL7oxjP_QXDJGsjpvWRXPA@mail.gmail.com
It would surely need a rebase.
--
Michael

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-14 05:08:19
Message-ID:	CAB7nPqT4nGrp4XMi-RHdsm=vySqZ==CAsC=4qnRCiYG0R4xLng@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 14, 2015 at 3:02 AM, Masahiko Sawada wrote:
> On Sat, Oct 10, 2015 at 4:35 AM, Robert Haas wrote:
>> On Fri, Oct 9, 2015 at 12:00 AM, Amit Kapila wrote:
>>> Sounds like both the approaches have some pros and cons, also there are
>>> some people who prefer mini-language and others who prefer JSON. I think
>>> one thing that might help, is to check how other databases support this
>>> feature or somewhat similar to this feature (mainly with respect to User
>>> Interface), as that can help us in knowing what users are already familiar
>>> with.
>>
>> +1!

Thanks for having a look at that!

> For example, MySQL 5.7 has similar feature, but it doesn't support
> quorum commit, and is simpler than postgresql attempting feature.
> There is one configuration parameter in MySQL 5.7 which indicates the
> number of sync replication node.
> The primary server commit when the primary server receives the
> specified number of ACK from standby server regardless name of standby
> server.

Hm. This does not help much in the case we specifically mentioned
upthread, with 2 data centers: the first one has the master
and a sync standby, and the second one has a set of standbys. We need to
be sure that the standby in DC1 always acknowledges, and we
would only need to wait for one or more of those in DC2. I still
believe that this is the main use case for this feature: ensuring a
proper failover without data loss if one data center is blown away by a
meteorite.
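
A small sketch may make the gap visible. It compares a count-only rule of the kind described for MySQL 5.7 above with the two-data-center requirement. All names here (count_only_ok, two_dc_ok, dc1_s, dc2_a and so on) are hypothetical and chosen for illustration.

```python
def count_only_ok(acked, needed):
    """Count-only rule: any `needed` ACKs suffice, whatever the names."""
    return len(acked) >= needed


def two_dc_ok(acked, dc1_standby, dc2_standbys, dc2_needed=1):
    """DC1's standby must always ACK, plus at least dc2_needed from DC2."""
    dc2_acks = set(acked) & set(dc2_standbys)
    return dc1_standby in acked and len(dc2_acks) >= dc2_needed


# Two ACKs arrive, both from DC2; the DC1 standby has not answered yet.
acked = {"dc2_a", "dc2_b"}
print(count_only_ok(acked, 2))                               # True
print(two_dc_ok(acked, "dc1_s", ["dc2_a", "dc2_b", "dc2_c"]))  # False
```

The count-only rule lets the commit through although nothing in DC1 besides the master holds the data, which is the case the named, grouped syntax is meant to rule out.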

> And IIRC, Oracle database also doesn't support the quorum commit as well.
> The settings standby server sync or async is specified per standby
> server in configuration parameter in primary node.

And I guess that they manage standby nodes using a system catalog
then, being able to change the state of a node from async to sync with
something at SQL level? Is that right?

> I thought that this feature for postgresql should be simple at first
> implementation.

And extensible.

> It would be good even if there are some restriction such as the
> nesting level, the group setting.
> The another new approach that I came up with is,
> * Add new parameter synchronous_replication_method (say s_r_method)
> which can have two names: 'priority', 'quorum'
> * If s_r_method = 'priority', the value of s_s_names (e.g. 'n1,n2,n3')
> is handled using priority. It's same as '[n1,n2,n3]' in dedicated
> language.
> * If s_r_method = 'quorum', the value of s_s_names is handled using
> quorum commit, It's same as '(n1,n2,n3)' in dedicated language.
> * Setting of synchronous_standby_names is same as today. That is, the
> storing the nesting value is not supported.
> * If we want to support more complex syntax like what we are
> discussing, we can add the new value to s_r_method, for example
> 'complex', 'json'.

If we go that path, I think that we still would need an extra
parameter to control the number of nodes that need to be taken from
the set defined in s_s_names whichever of quorum or priority is used.
Let's not forget that in the current configuration the first node
listed in s_s_names and *connected* to the master will be used to
acknowledge the commit.
--
Michael

From:	Beena Emerson <memissemerson(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-14 08:58:13
Message-ID:	CAOG9ApFLrua08rRPqjGkhju+nw5v8UzfOV_Jtzd7CS7F79axoA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 14, 2015 at 10:38 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com
> wrote:

> On Wed, Oct 14, 2015 at 3:02 AM, Masahiko Sawada wrote:
>
> > It would be good even if there are some restriction such as the
> > nesting level, the group setting.
> > The another new approach that I came up with is,
> > * Add new parameter synchronous_replication_method (say s_r_method)
> > which can have two names: 'priority', 'quorum'
> > * If s_r_method = 'priority', the value of s_s_names (e.g. 'n1,n2,n3')
> > is handled using priority. It's same as '[n1,n2,n3]' in dedicated
> > language.
> > * If s_r_method = 'quorum', the value of s_s_names is handled using
> > quorum commit, It's same as '(n1,n2,n3)' in dedicated language.
> > * Setting of synchronous_standby_names is same as today. That is, the
> > storing the nesting value is not supported.
> > * If we want to support more complex syntax like what we are
> > discussing, we can add the new value to s_r_method, for example
> > 'complex', 'json'.
>
> If we go that path, I think that we still would need an extra
> parameter to control the number of nodes that need to be taken from
> the set defined in s_s_names whichever of quorum or priority is used.
> Let's not forget that in the current configuration the first node
> listed in s_s_names and *connected* to the master will be used to
> acknowledge the commit.
>

Would it be better to just use a simple language instead of 3 different
parameters?

s_s_names = 2[X,Y,Z] # 2 priority
s_s_names = 1(A,B,C) # 1 quorum
s_s_names = R,S,T # default behavior: 1 priority?
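
To make the three forms above concrete, here is a minimal C sketch of how such a flat specification might be parsed. It is only an illustration under stated assumptions: the names SyncSpec, SyncMethod and parse_sync_spec are hypothetical and do not come from any patch in this thread, a bare list is assumed to mean "1 priority" (which the message above only poses as a question), nesting is not handled, and standby names are assumed not to start with a digit.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES    8
#define MAX_NAME_LEN 64

typedef enum { SYNC_PRIORITY, SYNC_QUORUM } SyncMethod;

/* Hypothetical result structure; not taken from any posted patch. */
typedef struct
{
    SyncMethod  method;
    int         num_sync;   /* standbys that must acknowledge */
    int         nnames;
    char        names[MAX_NAMES][MAX_NAME_LEN];
} SyncSpec;

/*
 * Parse "N[a,b,c]" (priority), "N(a,b,c)" (quorum) or "a,b,c"
 * (assumed default: 1 priority).  Returns 0 on success, -1 on error.
 */
static int
parse_sync_spec(const char *s, SyncSpec *out)
{
    char        closer = '\0';

    memset(out, 0, sizeof(*out));
    out->method = SYNC_PRIORITY;
    out->num_sync = 1;

    while (isspace((unsigned char) *s))
        s++;

    /* Optional leading count followed by an opening bracket. */
    if (isdigit((unsigned char) *s))
    {
        char       *end;
        long        n = strtol(s, &end, 10);

        if (n < 1 || n > MAX_NAMES)
            return -1;
        if (*end == '[')
            closer = ']';
        else if (*end == '(')
        {
            out->method = SYNC_QUORUM;
            closer = ')';
        }
        else
            return -1;
        out->num_sync = (int) n;
        s = end + 1;
    }

    /* Comma-separated list of names. */
    for (;;)
    {
        size_t      len;

        while (isspace((unsigned char) *s))
            s++;
        len = strcspn(s, ",[]() \t\n");
        if (len == 0 || len >= MAX_NAME_LEN || out->nnames == MAX_NAMES)
            return -1;
        memcpy(out->names[out->nnames++], s, len);
        s += len;
        while (isspace((unsigned char) *s))
            s++;
        if (*s != ',')
            break;
        s++;
    }

    /* The bracket must be closed and nothing may follow it. */
    if (closer != '\0')
    {
        if (*s != closer)
            return -1;
        s++;
    }
    while (isspace((unsigned char) *s))
        s++;
    if (*s != '\0' || out->num_sync > out->nnames)
        return -1;
    return 0;
}
```

With this sketch, "2[X,Y,Z]" yields a priority spec requiring 2 of 3 names, "1(A,B,C)" a quorum spec requiring 1, and "R,S,T" falls back to the assumed default.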

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Beena Emerson <memissemerson(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-14 11:24:01
Message-ID:	CAD21AoA2847DMQLX2=S_PBSmfvqaeqLr-Do15ohyEqO5YKW_uw@mail.gmail.com
Lists:	pgsql-hackers

Replying to multiple members.

> Hm. This is not much helpful in the case we especially mentioned
> upthread at some point with 2 data centers, first one has the master
> and a sync standby, and second one has a set of standbys. We need to
> be sure that the standby in DC1 acknowledges all the time, and we
> would only need to wait for one or more of them in DC2. I still
> believe that this is the main use case for this feature to ensure a
> proper failover without data loss if one data center blows away with a
> meteorite.

Yes, I think so too.
In such a case, the idea I posted yesterday could handle it with the
following settings:
* s_r_method = 'quorum'
* s_s_names = 'tokyo, seattle'
* s_s_nums = 2
* application_name of the first standby, which is in DC1, is 'tokyo',
and application_name of other standbys, which are in DC2, is
'seattle'.

> And I guess that they manage standby nodes using a system catalog
> then, being able to change the state of a node from async to sync with
> something at SQL level? Is that right?

I think that's right.

>
> If we go that path, I think that we still would need an extra
> parameter to control the number of nodes that need to be taken from
> the set defined in s_s_names whichever of quorum or priority is used.
> Let's not forget that in the current configuration the first node
> listed in s_s_names and *connected* to the master will be used to
> acknowledge the commit.

Yeah, such a parameter is needed. I had forgotten to consider that.

>
>
> Would it be better to just use a simple language instead of 3 different
> parameters?
>
> s_s_names = 2[X,Y,Z] # 2 priority
> s_s_names = 1(A,B,C) # 1 quorum
> s_s_names = R,S,T # default behavior: 1 priority?

I think this means that we would be choosing the dedicated language
approach instead of the JSON format approach.
If we want to configure multi sync replication in more complex ways, we
would have no choice other than to extend the dedicated language.

Regards,

--
Masahiko Sawada

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Beena Emerson <memissemerson(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-16 07:08:29
Message-ID:	CAB7nPqSx8DzZH+Vq+GoA7KuxH1DemTLbdTkh5FVHwQ0xxY+hVQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 14, 2015 at 5:58 PM, Beena Emerson <memissemerson(at)gmail(dot)com> wrote:
>
>
> On Wed, Oct 14, 2015 at 10:38 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>
>> On Wed, Oct 14, 2015 at 3:02 AM, Masahiko Sawada wrote:
>>
>> > It would be good even if there are some restriction such as the
>> > nesting level, the group setting.
>> > The another new approach that I came up with is,
>> > * Add new parameter synchronous_replication_method (say s_r_method)
>> > which can have two names: 'priority', 'quorum'
>> > * If s_r_method = 'priority', the value of s_s_names (e.g. 'n1,n2,n3')
>> > is handled using priority. It's same as '[n1,n2,n3]' in dedicated
>> > language.
>> > * If s_r_method = 'quorum', the value of s_s_names is handled using
>> > quorum commit, It's same as '(n1,n2,n3)' in dedicated language.
>> > * Setting of synchronous_standby_names is same as today. That is, the
>> > storing the nesting value is not supported.
>> > * If we want to support more complex syntax like what we are
>> > discussing, we can add the new value to s_r_method, for example
>> > 'complex', 'json'.
>>
>> If we go that path, I think that we still would need an extra
>> parameter to control the number of nodes that need to be taken from
>> the set defined in s_s_names whichever of quorum or priority is used.
>> Let's not forget that in the current configuration the first node
>> listed in s_s_names and *connected* to the master will be used to
>> acknowledge the commit.
>
>
> Would it be better to just use a simple language instead of 3 different
> parameters?
>
> s_s_names = 2[X,Y,Z] # 2 priority
> s_s_names = 1(A,B,C) # 1 quorum
> s_s_names = R,S,T # default behavior: 1 priority?

Yeah, for most users the main use case for this feature would just be:
s_s_names = 2[dc1_standby,1(dc2_standby1, dc2_standby2)]
Meaning that we wait for dc1_standby, which is a standby on data
center 1, and one of the dc2_standby* set which are standbys in data
center 2.
So the following minimal characteristics would be needed:
- support for priority selectivity for N nodes
- support for quorum selectivity for N nodes
- support for nested sets of nodes, at least 2 levels deep.
The requirement to define a group of nodes also would not be needed.
If we have that, I would say that we already do better than OrXXXe and
MyXXL, to cite two of them. And if we can get that for 9.6 or even
9.7, that would be really great.
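
As a rough sketch of what the nested example above could mean at commit time, the following C fragment computes, for a tree of priority and quorum groups, the highest LSN that the whole tree can vouch for. Everything here is an assumption made for illustration: SyncNode, NodeKind and synced_lsn are hypothetical names, an LSN of 0 stands for "not connected", and a nested group that cannot satisfy its own requirement is treated like a disconnected member so that the next member in priority order takes its place.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr; 0 = not connected */

typedef enum { NODE_STANDBY, NODE_PRIORITY, NODE_QUORUM } NodeKind;

#define MAX_CHILDREN 4

/* Hypothetical tree node: either one standby or a group of nodes. */
typedef struct SyncNode
{
    NodeKind    kind;
    Lsn         flush_lsn;      /* used by NODE_STANDBY only */
    int         num_sync;       /* groups: members that must acknowledge */
    int         nchildren;
    const struct SyncNode *children[MAX_CHILDREN];
} SyncNode;

/*
 * Return the highest LSN this node can vouch for, or 0 if the node
 * cannot currently satisfy its own requirement.
 */
static Lsn
synced_lsn(const SyncNode *node)
{
    Lsn         lsns[MAX_CHILDREN];
    int         n = 0;

    if (node->kind == NODE_STANDBY)
        return node->flush_lsn;

    /* Collect usable members, keeping the listed (priority) order. */
    for (int i = 0; i < node->nchildren; i++)
    {
        Lsn         lsn = synced_lsn(node->children[i]);

        if (lsn != 0)
            lsns[n++] = lsn;
    }
    if (node->num_sync < 1 || n < node->num_sync)
        return 0;

    if (node->kind == NODE_PRIORITY)
    {
        /* The first num_sync usable members must all have reached it. */
        Lsn         min = lsns[0];

        for (int i = 1; i < node->num_sync; i++)
            if (lsns[i] < min)
                min = lsns[i];
        return min;
    }

    /* Quorum: the num_sync-th largest LSN (partial selection sort). */
    for (int i = 0; i < node->num_sync; i++)
    {
        int         best = i;
        Lsn         tmp;

        for (int j = i + 1; j < n; j++)
            if (lsns[j] > lsns[best])
                best = j;
        tmp = lsns[i];
        lsns[i] = lsns[best];
        lsns[best] = tmp;
    }
    return lsns[node->num_sync - 1];
}
```

For 2[dc1_standby,1(dc2_standby1, dc2_standby2)], the inner quorum group contributes the LSN of its most advanced standby, and the outer priority group then takes the minimum of that and dc1_standby.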
Regards,
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-19 15:17:53
Message-ID:	CAD21AoCau1kE=oaPoZ=pZ0iPhjgiwAxErPcum5xmOdrO0N0E1w@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 14, 2015 at 3:16 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 10/13/2015 11:02 AM, Masahiko Sawada wrote:
>> I thought that this feature for postgresql should be simple at first
>> implementation.
>> It would be good even if there are some restriction such as the
>> nesting level, the group setting.
>> The another new approach that I came up with is,
>> * Add new parameter synchronous_replication_method (say s_r_method)
>> which can have two names: 'priority', 'quorum'
>> * If s_r_method = 'priority', the value of s_s_names (e.g. 'n1,n2,n3')
>> is handled using priority. It's same as '[n1,n2,n3]' in dedicated
>> language.
>> * If s_r_method = 'quorum', the value of s_s_names is handled using
>> quorum commit, It's same as '(n1,n2,n3)' in dedicated language.
>
> Well, the first question is: can you implement both of these things for
> 9.6, realistically? If you can implement them, then we can argue about
> configuration format later. It's even possible that the nature of your
> implementation will enforce a particular syntax.
>

Hi,

The attached patch is a rough patch that supports multi sync replication
using the other approach I sent before.

The new GUC parameters are:
* synchronous_standby_num, which specifies the number of standby
servers using sync rep. (default is 0)
* synchronous_replication_method, which specifies replication method;
priority or quorum. (default is priority)

The behaviour of 'priority' and 'quorum' is the same as what we've been discussing,
but here is an overview of both again.

[Priority Method]
Each standby server has a different priority, and the active standby
servers with the top N priorities become sync standby servers.
If synchronous_standby_names = '*', all active standby servers
would be sync standby servers.
If you want to set up standbys as in 9.5 or before, you can set
synchronous_standby_num = 1.

[Quorum Method]
The standby servers all have the same priority 1, and all the active
standby servers will be sync standby servers.
The master server has to wait for ACKs from at least N sync standby
servers before COMMIT.
If synchronous_standby_names = '*', all active standby servers
would be sync standby servers.
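
The two methods can be contrasted with a small, self-contained C sketch. It does not reproduce the code in the attached patch: Standby, choose_sync_by_priority and quorum_reached are hypothetical names, priorities are assumed to be 1 for the first name in synchronous_standby_names (0 meaning "not listed"), and only flush positions are considered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of one connected or configured standby. */
typedef struct
{
    const char *name;
    int         priority;   /* 1 = highest; 0 = not listed in s_s_names */
    bool        active;     /* currently connected and streaming */
    uint64_t    flush_lsn;  /* last position the standby reported flushed */
} Standby;

/*
 * Priority method: pick the num_sync active standbys with the best
 * (lowest) priority.  Their indexes are written to chosen[], which must
 * have room for num_sync entries.  Returns how many were found.
 */
static int
choose_sync_by_priority(const Standby *sb, int nsb, int num_sync, int *chosen)
{
    int         nchosen = 0;

    while (nchosen < num_sync)
    {
        int         best = -1;

        for (int i = 0; i < nsb; i++)
        {
            bool        taken = false;

            if (!sb[i].active || sb[i].priority == 0)
                continue;
            for (int j = 0; j < nchosen; j++)
                if (chosen[j] == i)
                    taken = true;
            if (taken)
                continue;
            if (best < 0 || sb[i].priority < sb[best].priority)
                best = i;
        }
        if (best < 0)
            break;              /* fewer active standbys than requested */
        chosen[nchosen++] = best;
    }
    return nchosen;
}

/*
 * Quorum method: every active listed standby counts equally, and a commit
 * at commit_lsn may proceed once num_sync of them have flushed that far.
 */
static bool
quorum_reached(const Standby *sb, int nsb, int num_sync, uint64_t commit_lsn)
{
    int         acks = 0;

    for (int i = 0; i < nsb; i++)
        if (sb[i].active && sb[i].priority != 0 &&
            sb[i].flush_lsn >= commit_lsn)
            acks++;
    return acks >= num_sync;
}
```

With four listed standbys of which only the first and the last are connected, the priority method with N = 2 would select exactly those two, while the quorum method only asks whether any two active standbys have acknowledged the commit position.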

[Use case]
This patch can handle the main use case that Michael described:
There are 2 data centers, first one has the master and a sync standby,
and second one has a set of standbys.
We need to be sure that the standby in DC1 acknowledges all the time,
and we would only need to wait for one or more of them in DC2.

In order to handle this use case, you set up the standbys and GUC
parameters as follows.
* synchronous_standby_names = 'DC1, DC2'
* synchronous_standby_num = 2
* synchronous_replication_method = quorum
* The name of the standby server in DC1 is 'DC1', and the name of both
standby servers in DC2 is 'DC2'.

[Extensible]
By setting the same application_name on different standbys, we can set up
sync replication with grouped standbys.
If we want to set up replication in a more complex and flexible way, we
could add new syntax for s_s_names (e.g., JSON format or a dedicated
language), and increase the kinds of values of
synchronous_replication_method, e.g. s_r_method = 'complex'.

Also, this patch doesn't need a new parser for the GUC parameter.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
synchronous_replication_method_v1.patch	application/octet-stream	14.0 KB

From:	Beena Emerson <memissemerson(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-20 11:10:08
Message-ID:	CAOG9ApEawDVGrEn1Qi5hWqP1=qqyTkfH7osRA4wzj6RWCFi_GA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Oct 19, 2015 at 8:47 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

>
> Hi,
>
> Attached patch is a rough patch which supports multi sync replication
> by another approach I sent before.
>
> The new GUC parameters are:
> * synchronous_standby_num, which specifies the number of standby
> servers using sync rep. (default is 0)
> * synchronous_replication_method, which specifies replication method;
> priority or quorum. (default is priority)
>
> The behaviour of 'priority' and 'quorum' are same as what we've been
> discussing.
> But I write overview of these here again here.
>
> [Priority Method]
> Each standby server has a different priority, and the active standby
> servers with the top N priorities become sync standby servers.
> If synchronous_standby_names = '*', the all active standby server
> would be sync standby server.
> If you want to set up standby like 9.5 or before, you can set
> synchronous_standby_num = 1.
>
>

I used the following setting with 2 servers A and D connected:

synchronous_standby_names = 'A,B,C,D'
synchronous_standby_num = 2
synchronous_replication_method = 'priority'

Though s_r_m = 'quorum' worked fine, changing it to 'priority' caused
segmentation fault.

Regards,

Beena Emerson

Have a Great Day!

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Beena Emerson <memissemerson(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-21 15:47:02
Message-ID:	CAD21AoBff2ZD-cYFm+nhoZxigF5jeRzG958Pk4gjyG=mFVvChQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Oct 20, 2015 at 8:10 PM, Beena Emerson <memissemerson(at)gmail(dot)com> wrote:
>
> On Mon, Oct 19, 2015 at 8:47 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
>>
>>
>> Hi,
>>
>> Attached patch is a rough patch which supports multi sync replication
>> by another approach I sent before.
>>
>> The new GUC parameters are:
>> * synchronous_standby_num, which specifies the number of standby
>> servers using sync rep. (default is 0)
>> * synchronous_replication_method, which specifies replication method;
>> priority or quorum. (default is priority)
>>
>> The behaviour of 'priority' and 'quorum' are same as what we've been
>> discussing.
>> But I write overview of these here again here.
>>
>> [Priority Method]
>> Each standby server has a different priority, and the active standby
>> servers with the top N priorities become sync standby servers.
>> If synchronous_standby_names = '*', the all active standby server
>> would be sync standby server.
>> If you want to set up standby like 9.5 or before, you can set
>> synchronous_standby_num = 1.
>>
>
>
> I used the following setting with 2 servers A and D connected:
>
> synchronous_standby_names = 'A,B,C,D'
> synchronous_standby_num = 2
> synchronous_replication_method = 'priority'
>
> Though s_r_m = 'quorum' worked fine, changing it to 'priority' caused
> segmentation fault.
>

Thank you for taking a look!
This patch is a tool for discussion, so I'm not going to fix this bug
until we reach consensus.

We are still discussing to find a solution that can get consensus.
I feel that it's difficult to choose between the two approaches within
this development cycle, and there would not be time to implement such
a big feature even if we did choose.
But this feature is obviously needed by many users.
So I'm considering a simpler, extensible solution; the
idea I posted is one of them.
Another approach worth considering is just specifying the
number of sync standbys. It can also cover the main use cases in
some cases.

Regards,

--
Masahiko Sawada

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-10-29 14:16:32
Message-ID:	CAHGQGwE8N7f5dW-QQsb7ZP4-oZ+tqx-xyyC2cgFE_5EHLhqKrQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Oct 20, 2015 at 8:10 PM, Beena Emerson <memissemerson(at)gmail(dot)com> wrote:
>>
>> On Mon, Oct 19, 2015 at 8:47 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
>> wrote:
>>>
>>>
>>> Hi,
>>>
>>> Attached patch is a rough patch which supports multi sync replication
>>> by another approach I sent before.
>>>
>>> The new GUC parameters are:
>>> * synchronous_standby_num, which specifies the number of standby
>>> servers using sync rep. (default is 0)
>>> * synchronous_replication_method, which specifies replication method;
>>> priority or quorum. (default is priority)
>>>
>>> The behaviour of 'priority' and 'quorum' are same as what we've been
>>> discussing.
>>> But I write overview of these here again here.
>>>
>>> [Priority Method]
>>> Each standby server has a different priority, and the active standby
>>> servers with the top N priorities become sync standby servers.
>>> If synchronous_standby_names = '*', the all active standby server
>>> would be sync standby server.
>>> If you want to set up standby like 9.5 or before, you can set
>>> synchronous_standby_num = 1.
>>>
>>
>>
>> I used the following setting with 2 servers A and D connected:
>>
>> synchronous_standby_names = 'A,B,C,D'
>> synchronous_standby_num = 2
>> synchronous_replication_method = 'priority'
>>
>> Though s_r_m = 'quorum' worked fine, changing it to 'priority' caused
>> segmentation fault.
>>
>
> Thank you for taking a look!
> This patch is a tool for discussion, so I'm not going to fix this bug
> until getting consensus.
>
> We are still under the discussion to find solution that can get consensus.
> I felt that it's difficult to select from the two approaches within
> this development cycle, and there would not be time to implement such
> big feature even if we selected.
> But this feature is obviously needed by many users.
> So I'm considering more simple and extensible something solution, the
> idea I posted is one of them.
> The another worth considering approach is that just specifying the
> number of sync standby. It also can cover the main use cases in
> some-cases.

Yes, it covers the main, simple use case like "I want to have multiple
synchronous replicas!". Even if we miss quorum commit in the first
version, the feature is still very useful.

Regards,

--
Fujii Masao

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-11-13 00:07:01
Message-ID:	CAD21AoC9Vi8wOGtXio3Z1NwoVfXBJPNFtt7+5jadVHKn17uHOg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Tue, Oct 20, 2015 at 8:10 PM, Beena Emerson <memissemerson(at)gmail(dot)com> wrote:
>>>
>>> On Mon, Oct 19, 2015 at 8:47 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
>>> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> Attached patch is a rough patch which supports multi sync replication
>>>> by another approach I sent before.
>>>>
>>>> The new GUC parameters are:
>>>> * synchronous_standby_num, which specifies the number of standby
>>>> servers using sync rep. (default is 0)
>>>> * synchronous_replication_method, which specifies replication method;
>>>> priority or quorum. (default is priority)
>>>>
>>>> The behaviour of 'priority' and 'quorum' are same as what we've been
>>>> discussing.
>>>> But I write overview of these here again here.
>>>>
>>>> [Priority Method]
>>>> Each standby server has a different priority, and the active standby
>>>> servers with the top N priorities become sync standby servers.
>>>> If synchronous_standby_names = '*', the all active standby server
>>>> would be sync standby server.
>>>> If you want to set up standby like 9.5 or before, you can set
>>>> synchronous_standby_num = 1.
>>>>
>>>
>>>
>>> I used the following setting with 2 servers A and D connected:
>>>
>>> synchronous_standby_names = 'A,B,C,D'
>>> synchronous_standby_num = 2
>>> synchronous_replication_method = 'priority'
>>>
>>> Though s_r_m = 'quorum' worked fine, changing it to 'priority' caused
>>> segmentation fault.
>>>
>>
>> Thank you for taking a look!
>> This patch is a tool for discussion, so I'm not going to fix this bug
>> until getting consensus.
>>
>> We are still under the discussion to find solution that can get consensus.
>> I felt that it's difficult to select from the two approaches within
>> this development cycle, and there would not be time to implement such
>> big feature even if we selected.
>> But this feature is obviously needed by many users.
>> So I'm considering more simple and extensible something solution, the
>> idea I posted is one of them.
>> The another worth considering approach is that just specifying the
>> number of sync standby. It also can cover the main use cases in
>> some-cases.
>
> Yes, it covers main and simple use case like "I want to have multiple
> synchronous replicas!". Even if we miss quorum commit at the first
> version, the feature is still very useful.

It can cover not only the case you mentioned but also the main use case
Michael mentioned, by setting the same application_name.
And that first version patch is almost implemented, so it just needs to
be reviewed.

I think that it would be good to implement the simple feature in the
first version, and then refine the design based on opinions and
feedback from more users and use cases.

Regards,

--
Masahiko Sawada

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-11-13 03:52:12
Message-ID:	20151113.125212.102628436.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Fri, 13 Nov 2015 09:07:01 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC9Vi8wOGtXio3Z1NwoVfXBJPNFtt7+5jadVHKn17uHOg(at)mail(dot)gmail(dot)com>
> On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
...
> >> This patch is a tool for discussion, so I'm not going to fix this bug
> >> until getting consensus.
> >>
> >> We are still under the discussion to find solution that can get consensus.
> >> I felt that it's difficult to select from the two approaches within
> >> this development cycle, and there would not be time to implement such
> >> big feature even if we selected.
> >> But this feature is obviously needed by many users.
> >> So I'm considering more simple and extensible something solution, the
> >> idea I posted is one of them.
> >> The another worth considering approach is that just specifying the
> >> number of sync standby. It also can cover the main use cases in
> >> some-cases.
> >
> > Yes, it covers main and simple use case like "I want to have multiple
> > synchronous replicas!". Even if we miss quorum commit at the first
> > version, the feature is still very useful.

> It can cover not only the case you mentioned but also main use case
> Michael mentioned by setting same application_name.
> And that first version patch is almost implemented, so just needs to
> be reviewed.
>
> I think that it would be good to implement the simple feature at the
> first version, and then coordinate the design based on opinion and
> feed backs from more user, use-case.

Yeah. I agree with it. And I have two proposals in this
direction.

- Notation

There is probably no argument about keeping synchronous_standby_names
and adding synchronous_replication_method as a variable that selects
other syntaxes, except for its name. But I feel synchronous_standby_num
looks a bit too specific.

I'd like to propose the following, if this doesn't reopen the argument on
notation for replication definitions :p

The following two GUCs would be enough to accommodate future expansion
of notation syntax and/or method.

synchronous_standby_names : as it is

synchronous_replication_method:

default is "1-priority", which means the same as the current
behavior. Possible additional values so far would be:

"n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
where n is the number of required acknowledgements.

"n-quorum": the format of s_s_names is the same as above, but
it is read in quorum context.

These can be expanded in the future, for example, as follows.

"complex" : Michael's format.
"json" : JSON?
"json-ext": specify JSON in external file.

Even after we have complex notations, I suppose that many use
cases will be covered by the first three notations.
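
To illustrate one possible reading of this notation, here is a small C sketch that maps the method string to an enum and splits the leading count off s_s_names. It rests on assumptions that the proposal does not settle: the method values are taken to be the literal strings "1-priority", "n-priority" and "n-quorum", with the actual count carried only in s_s_names, and ReplMethod, parse_method and required_acks are hypothetical names.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

typedef enum
{
    M_ONE_PRIORITY,     /* "1-priority": current behaviour, no count */
    M_N_PRIORITY,       /* "n-priority": s_s_names is "n, name, ..." */
    M_N_QUORUM,         /* "n-quorum": same format, quorum context */
    M_UNKNOWN
} ReplMethod;

/* Map the (assumed literal) method string to an enum value. */
static ReplMethod
parse_method(const char *s)
{
    if (strcmp(s, "1-priority") == 0)
        return M_ONE_PRIORITY;
    if (strcmp(s, "n-priority") == 0)
        return M_N_PRIORITY;
    if (strcmp(s, "n-quorum") == 0)
        return M_N_QUORUM;
    return M_UNKNOWN;
}

/*
 * Return the number of required acknowledgements and point *names at the
 * start of the name list.  For "1-priority" the whole string is the name
 * list.  Returns -1 if the method is unknown or the count is missing.
 */
static int
required_acks(ReplMethod method, const char *s_s_names, const char **names)
{
    char       *end;
    long        n;

    if (method == M_UNKNOWN)
        return -1;
    if (method == M_ONE_PRIORITY)
    {
        *names = s_s_names;
        return 1;
    }

    n = strtol(s_s_names, &end, 10);
    if (end == s_s_names || n < 1 || n > INT_MAX)
        return -1;              /* no usable leading count */
    while (isspace((unsigned char) *end))
        end++;
    if (*end != ',')
        return -1;              /* the count must be followed by a comma */
    end++;
    while (isspace((unsigned char) *end))
        end++;
    *names = end;
    return (int) n;
}
```

Under this reading, s_s_names = '2, a, b, c' with "n-quorum" asks for 2 acknowledgements from a, b and c, while the same string under "1-priority" would be treated as a plain name list.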

- Internal design

What should be done in SyncRepReleaseWaiters() is to calculate a
pair of LSNs that can be regarded as synced, decide whether
*this* walsender has advanced the LSN pair, and then try to
release backends that wait for the LSNs *if* this walsender has
advanced them.

From that point of view, the proposed patch will make redundant attempts
to release backends.

In addition to that, the patch looks to be a mixture of the current
implementation and the new feature. These serve the same objective,
so they cannot coexist with each other, I think. As a result, code
for both quorum/priority judgement appears at multiple levels in the
call tree. This would be an obstacle for future (possible)
expansion.

So, I think this feature should be implemented as follows.

SyncRepInitConfig reads the configuration and stores the resulting
structure somewhere such as WalSnd->syncrepset_definition
instead of WalSnd->sync_standby_priority, which should be
removed. Nothing would be stored if the current wal sender is
not a member of the defined replication set. Storing a pointer
to a matching function there would increase the flexibility, but
such an implementation would in turn make the code difficult to
read.. (I often look for the entity of xlogreader->read_page()
;)

Then SyncRepSyncedLsnAdvancedTo(), instead of
SyncRepGetSynchronousStandbys(), returns an LSN pair that can be
regarded as 'synced' according to the specified definition of the
replication set, and whether this walsender has advanced the
LSNs.

Finally, SyncRepReleaseWaiters() uses it to release backends if
needed.

The differences among quorum/priority or others are confined to
SyncRepSyncedLsnAdvancedTo(). As a result,
SyncRepReleaseWaiters would look like the following.

| SyncRepReleaseWaiters(void)
| {
|   if (MyWalSnd->syncrepset_definition == NULL || ...)
|     return;
|   ...
|   if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
|   {
|     /* I haven't advanced the synced LSNs */
|     LWLockRelease(SyncRepLock);
|     return;
|   }
|   /* Set the lsn first so that when we wake backends they will release...

I haven't thought concretely about what SyncRepSyncedLsnAdvancedTo
does, but perhaps we can do it in an efficient manner :p
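
One possible shape for such a function, shown purely as a self-contained sketch and not as the design settled in this thread: keep the last synced write/flush pair in shared state, recompute the pair from the positions reported by the walsenders in the replication set, and report "advanced" only when either value moves forward. The types MockWalSnd and MockSyncRepCtl and the functions kth_largest and synced_lsn_advanced_to are hypothetical; the sketch uses the quorum reading (N-th largest position) and omits all locking, which real code would need around the shared state.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr */

/* Hypothetical, simplified walsender slot. */
typedef struct
{
    bool        in_set;         /* member of the defined replication set */
    Lsn         write;
    Lsn         flush;
} MockWalSnd;

/* Hypothetical shared state: the pair already regarded as synced. */
typedef struct
{
    int         num_sync;
    Lsn         synced_write;
    Lsn         synced_flush;
} MockSyncRepCtl;

/* k-th largest write or flush position among set members, 0 if too few. */
static Lsn
kth_largest(const MockWalSnd *snd, int nsnd, int k, bool use_flush)
{
    Lsn         result = 0;

    for (int i = 0; i < nsnd; i++)
    {
        Lsn         v;
        int         count = 0;

        if (!snd[i].in_set)
            continue;
        v = use_flush ? snd[i].flush : snd[i].write;
        /* Count members that have reached at least v. */
        for (int j = 0; j < nsnd; j++)
        {
            Lsn         w = use_flush ? snd[j].flush : snd[j].write;

            if (snd[j].in_set && w >= v)
                count++;
        }
        if (count >= k && v > result)
            result = v;
    }
    return result;
}

/*
 * Recompute the synced pair; return true only if it moved forward, so the
 * caller knows whether waking waiting backends is worthwhile.
 */
static bool
synced_lsn_advanced_to(MockSyncRepCtl *ctl, const MockWalSnd *snd, int nsnd,
                       Lsn *flush_pos, Lsn *write_pos)
{
    Lsn         w = kth_largest(snd, nsnd, ctl->num_sync, false);
    Lsn         f = kth_largest(snd, nsnd, ctl->num_sync, true);
    bool        advanced = false;

    if (w > ctl->synced_write)
    {
        ctl->synced_write = w;
        advanced = true;
    }
    if (f > ctl->synced_flush)
    {
        ctl->synced_flush = f;
        advanced = true;
    }
    *write_pos = ctl->synced_write;
    *flush_pos = ctl->synced_flush;
    return advanced;
}
```

In this sketch a report from the slowest standby leaves the pair unchanged, so the caller can return early without trying to release any backends, which is the redundancy the design above aims to avoid.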

What do you think about this?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-11-16 16:09:57
Message-ID:	CAD21AoDhqGB=EtBfqnkHxR8T53d+8qMs4DPm5HVyq4bA2oR5eQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Nov 13, 2015 at 12:52 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Fri, 13 Nov 2015 09:07:01 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC9Vi8wOGtXio3Z1NwoVfXBJPNFtt7+5jadVHKn17uHOg(at)mail(dot)gmail(dot)com>
>> On Thu, Oct 29, 2015 at 11:16 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> > On Thu, Oct 22, 2015 at 12:47 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> ...
>> >> This patch is a tool for discussion, so I'm not going to fix this bug
>> >> until getting consensus.
>> >>
>> >> We are still under the discussion to find solution that can get consensus.
>> >> I felt that it's difficult to select from the two approaches within
>> >> this development cycle, and there would not be time to implement such
>> >> big feature even if we selected.
>> >> But this feature is obviously needed by many users.
>> >> So I'm considering more simple and extensible something solution, the
>> >> idea I posted is one of them.
>> >> The another worth considering approach is that just specifying the
>> >> number of sync standby. It also can cover the main use cases in
>> >> some-cases.
>> >
>> > Yes, it covers main and simple use case like "I want to have multiple
>> > synchronous replicas!". Even if we miss quorum commit at the first
>> > version, the feature is still very useful.
>
> +1
>
>> It can cover not only the case you mentioned but also main use case
>> Michael mentioned by setting same application_name.
>> And that first version patch is almost implemented, so just needs to
>> be reviewed.
>>
>> I think that it would be good to implement the simple feature at the
>> first version, and then coordinate the design based on opinion and
>> feed backs from more user, use-case.
>
> Yeah. I agree with it. And I have two proposals in this
> direction.
>
> - Notation
>
> synchronous_standby_names, and synchronous_replication_method as
> a variable to provide other syntax is probably no argument
> except its name. But I feel synchronous_standby_num looks bit
> too specific.
>
> I'd like to propose if this doesn't reprise the argument on
> notation for replication definitions:p
>
> The following two GUCs would be enough to bear future expansion
> of notation syntax and/or method.
>
> synchronous_standby_names : as it is
>
> synchronous_replication_method:
>
> default is "1-priority", which means the same with the current
> meaning. possible additional values so far would be,
>
> "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
> where n is the number of required acknowledges.

One question: what is the difference between the leading "n" in
s_s_names and the leading "n" of "n-priority"?

>
> "n-quorum": the format of s_s_names is the same as above, but
> it is read in quorum context.
>
> These can be expanded, for example, as follows, but in future.
>
> "complex" : Michael's format.
> "json" : JSON?
> "json-ext": specify JSON in external file.
>
> Even after we have complex notations, I suppose that many use
> cases are covered by the first three notations.

I'm not sure it's desirable to implement all kinds of methods in core.
I think it's better to make replication more extensible, for example by
adding a hook function.
Other approaches could then be implemented as contrib modules.

>
> - Internal design
>
> What should be done in SyncRepReleaseWaiters() is calculating a
> pair of LSNs that can be regarded as synced and decide whether
> *this* walsender have advanced the LSN pair, then trying to
> release backends that wait for the LSNs *if* this walsender has
> advanced them.
>
> From that point of view, the proposed patch will make redundant
> attempts to release backends.
>
> Addition to that, the patch looks to be a mixture of the current
> implement and the new feature. These are for the same objective
> so they cannot coexist each other, I think. As the result, codes
> for both quorum/priority judgement appear at multiple level in
> call tree. This would be an obstacle for future (possible)
> expansion.
>
> So, I think this feature should be implemented as following,
>
> SyncRepInitConfig reads the configuration and stores the result
> structure into elsewhere such like WalSnd->syncrepset_definition
> instead of WalSnd->sync_standby_priority, which should be
> removed. Nothing would be stored if the current wal sender is
> not a member of the defined replication set. Storing a pointer
> to matching function there would increase the flexibility but
> such implement in contrast will make the code difficult to be
> read.. (I often look for the entity of xlogreader->read_page()
> ;)
>
> Then SyncRepSyncedLsnAdvancedTo() instead of
> SyncRepGetSynchronousStandbys() returns an LSN pair that can be
> regarded as 'synced' according to specified definition of
> replication set and whether this walsender have advanced the
> LSNs.
>
> Finally, SyncRepReleaseWaiters() uses it to release backends if
> needed.
>
> The differences among quorum/priority or others are confined in
> SyncRepSyncedLsnAdvancedTo(). As the result,
> SyncRepReleaseWaiters would look as following.
>
> | SyncRepReleaseWaiters(void)
> | {
> | if (MyWalSnd->syncrepset_definition == NULL || ...)
> | return;
> | ...
> | if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
> | {
> | /* I haven't advanced the synced LSNs */
> | LWLockRelease(SyncRepLock);
> | return;
> | }
> | /* Set the lsn first so that when we wake backends they will release...
>
> I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo
> does but perhaps yes we can:p in effective manner..
>
> What do you think about this?

I agree with this design.
What SyncRepSyncedLsnAdvancedTo() does would differ for each
method, so we can implement "n-priority" style multiple sync
replication in the first version.

Regards,

--
Masahiko Sawada

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-11-17 00:57:06
Message-ID:	20151117.095706.240836667.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Tue, 17 Nov 2015 01:09:57 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDhqGB=EtBfqnkHxR8T53d+8qMs4DPm5HVyq4bA2oR5eQ(at)mail(dot)gmail(dot)com>
> > - Notation
> >
> > synchronous_standby_names, and synchronous_replication_method as
> > a variable to provide other syntax is probably no argument
> > except its name. But I feel synchronous_standby_num looks bit
> > too specific.
> >
> > I'd like to propose if this doesn't reprise the argument on
> > notation for replication definitions:p
> >
> > The following two GUCs would be enough to bear future expansion
> > of notation syntax and/or method.
> >
> > synchronous_standby_names : as it is
> >
> > synchronous_replication_method:
> >
> > default is "1-priority", which means the same with the current
> > meaning. possible additional values so far would be,
> >
> > "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
> > where n is the number of required acknowledges.
>
> One question is that what is different between the leading "n" in
> s_s_names and the leading "n" of "n-priority"?

Ah. Sorry for the ambiguous description. The 'n' in s_s_names
represents an arbitrary integer, whereas the one in "n-priority"
is literally the letter "n", so that the name as a whole means
"a format with any number of priority hosts". For instance,

synchronous_replication_method = "n-priority"
synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"

I added the "n-" of "n-priority" to distinguish it from "1-priority",
so if we don't provide "1-priority" for backward compatibility,
"priority" would be enough to represent the type.

By the way, s_r_method is not strictly necessary, but it is
important for avoiding the complexity of autodetecting formats,
including ones that are not yet defined.
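
To make the "n, <name>, <name>, ..." format concrete, here is a minimal standalone sketch of how such a value could be split into the required acknowledgement count and the ordered name list. It is only an illustration: SyncRepSetDef, parse_n_priority(), copy_trimmed() and the two size limits are hypothetical names invented for this example, not anything from the patch or from PostgreSQL.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STANDBYS 16     /* arbitrary limit for this sketch */
#define MAX_NAME_LEN 64     /* arbitrary limit for this sketch */

/* Hypothetical container for a parsed "n-priority" definition. */
typedef struct SyncRepSetDef
{
    int  num_sync;                              /* required acknowledgements */
    int  num_names;                             /* number of standby names */
    char names[MAX_STANDBYS][MAX_NAME_LEN];     /* names in priority order */
} SyncRepSetDef;

/* Copy the trimmed token [start, end) into dst; 0 if empty or too long. */
static int
copy_trimmed(char *dst, const char *start, const char *end)
{
    while (start < end && isspace((unsigned char) *start))
        start++;
    while (end > start && isspace((unsigned char) end[-1]))
        end--;
    if (end == start || (size_t) (end - start) >= MAX_NAME_LEN)
        return 0;
    memcpy(dst, start, (size_t) (end - start));
    dst[end - start] = '\0';
    return 1;
}

/*
 * Parse "n, name, name, ..." into def.  Returns 1 on success and 0 if the
 * value is malformed (no leading positive integer, an empty element, too
 * many names, or no names at all).
 */
int
parse_n_priority(const char *value, SyncRepSetDef *def)
{
    char        token[MAX_NAME_LEN];
    const char *p = value;
    int         first = 1;

    assert(value != NULL && def != NULL);
    def->num_sync = 0;
    def->num_names = 0;

    for (;;)
    {
        const char *comma = strchr(p, ',');
        const char *end = comma ? comma : p + strlen(p);

        if (!copy_trimmed(token, p, end))
            return 0;

        if (first)
        {
            /* The first element must be the integer "n". */
            char *rest;
            long  n = strtol(token, &rest, 10);

            if (*rest != '\0' || n <= 0 || n > MAX_STANDBYS)
                return 0;
            def->num_sync = (int) n;
            first = 0;
        }
        else
        {
            if (def->num_names == MAX_STANDBYS)
                return 0;
            strcpy(def->names[def->num_names++], token);
        }

        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return def->num_names > 0;
}
```

With the example value above, this would yield num_sync = 2 and the five planet names in priority order. A value without the leading integer is rejected rather than guessed at, which is the autodetection problem that s_r_method is meant to avoid.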

> > "n-quorum": the format of s_s_names is the same as above, but
> > it is read in quorum context.

The "n" of this is the same as above.

> > These can be expanded, for example, as follows, but in future.
> >
> > "complex" : Michael's format.
> > "json" : JSON?
> > "json-ext": specify JSON in external file.
> >
> > Even after we have complex notations, I suppose that many use
> > cases are covered by the first three notations.
>
> I'm not sure it's desirable to implement the all kind of methods into core.
> I think it's better to extend replication in order to be more
> extensibility like adding hook function.
> And then other approach is implemented as a contrib module.

I agree with you. I proposed the following internal design with
that in mind.

> > - Internal design
> >
> > What should be done in SyncRepReleaseWaiters() is calculating a
> > pair of LSNs that can be regarded as synced and decide whether
> > *this* walsender have advanced the LSN pair, then trying to
> > release backends that wait for the LSNs *if* this walsender has
> > advanced them.
> >
> > From that point of view, the proposed patch will make redundant
> > attempts to release backends.
> >
> > Addition to that, the patch looks to be a mixture of the current
> > implement and the new feature. These are for the same objective
> > so they cannot coexist each other, I think. As the result, codes
> > for both quorum/priority judgement appear at multiple level in
> > call tree. This would be an obstacle for future (possible)
> > expansion.
> >
> > So, I think this feature should be implemented as following,
> >
> > SyncRepInitConfig reads the configuration and stores the result
> > structure into elsewhere such like WalSnd->syncrepset_definition
> > instead of WalSnd->sync_standby_priority, which should be
> > removed. Nothing would be stored if the current wal sender is
> > not a member of the defined replication set. Storing a pointer
> > to matching function there would increase the flexibility but
> > such implement in contrast will make the code difficult to be
> > read.. (I often look for the entity of xlogreader->read_page()
> > ;)
> >
> > Then SyncRepSyncedLsnAdvancedTo() instead of
> > SyncRepGetSynchronousStandbys() returns an LSN pair that can be
> > regarded as 'synced' according to specified definition of
> > replication set and whether this walsender have advanced the
> > LSNs.
> >
> > Finally, SyncRepReleaseWaiters() uses it to release backends if
> > needed.
> >
> > The differences among quorum/priority or others are confined in
> > SyncRepSyncedLsnAdvancedTo(). As the result,
> > SyncRepReleaseWaiters would look as following.
> >
> > | SyncRepReleaseWaiters(void)
> > | {
> > | if (MyWalSnd->syncrepset_definition == NULL || ...)
> > | return;
> > | ...
> > | if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
> > | {
> > | /* I haven't advanced the synced LSNs */
> > | LWLockRelease(SyncRepLock);
> > | return;
> > | }
> > | /* Set the lsn first so that when we wake backends they will release...
> >
> > I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo
> > does but perhaps yes we can:p in effective manner..
> >
> > What do you think about this?
>
> I agree with this design.
> What SyncRepSyncedLsnAdvancedTo() does would be different for each
> method, so we can implement "n-priority" style multiple sync
> replication at first version.

Maybe the first *additional* one, if we decide to keep backward
compatibility as discussed above.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-11-17 09:13:11
Message-ID:	CAD21AoC=AN+DKYNwsJp6COZ-6qmHXxuENxVPisxgPXcuXmPEvw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Nov 17, 2015 at 9:57 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Tue, 17 Nov 2015 01:09:57 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDhqGB=EtBfqnkHxR8T53d+8qMs4DPm5HVyq4bA2oR5eQ(at)mail(dot)gmail(dot)com>
>> > - Notation
>> >
>> > synchronous_standby_names, and synchronous_replication_method as
>> > a variable to provide other syntax is probably no argument
>> > except its name. But I feel synchronous_standby_num looks bit
>> > too specific.
>> >
>> > I'd like to propose if this doesn't reprise the argument on
>> > notation for replication definitions:p
>> >
>> > The following two GUCs would be enough to bear future expansion
>> > of notation syntax and/or method.
>> >
>> > synchronous_standby_names : as it is
>> >
>> > synchronous_replication_method:
>> >
>> > default is "1-priority", which means the same with the current
>> > meaning. possible additional values so far would be,
>> >
>> > "n-priority": the format of s_s_names is "n, <name>, <name>, <name>...",
>> > where n is the number of required acknowledges.
>>
>> One question is that what is different between the leading "n" in
>> s_s_names and the leading "n" of "n-priority"?
>
> Ah. Sorry for the ambiguous description. 'n' in s_s_names
> representing an arbitrary integer number and that in "n-priority"
> is literally an "n", meaning "a format with any number of
> priority hosts" as a whole. As an instance,
>
> synchronous_replication_method = "n-priority"
> synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>
> I added "n-" of "n-priority" to distinguish with "1-priority" so
> if we won't provide "1-priority" for backward compatibility,
> "priority" would be enough to represent the type.
>
> By the way, s_r_method is not essentially necessary but it would
> be important to avoid complexity of autodetection of formats
> including currently undefined ones.

Thank you for your explanation; I understand now.

That means the format of s_s_names would change, which would not be good.
So how about adding just the s_r_method parameter, with the number of
required ACKs given as the leading part of s_r_method?
For example, the following setting is the same as the one above.

synchronous_replication_method = "2-priority"
synchronous_standby_names = "mercury, venus, earth, mars, jupiter"

With the quorum method, we can set:
synchronous_replication_method = "2-quorum"
synchronous_standby_names = "mercury, venus, earth, mars, jupiter"

Thoughts?
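
For illustration only, a method string of this "<count>-<kind>" shape could be taken apart roughly as follows. This is a sketch under my own assumptions: parse_counted_method() and SyncRepKind are hypothetical names, and only the two kinds mentioned here are recognized.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical enumeration of the two kinds discussed in this mail. */
typedef enum SyncRepKind
{
    SYNC_REP_PRIORITY,
    SYNC_REP_QUORUM
} SyncRepKind;

/*
 * Split "<count>-<kind>" (e.g. "2-priority") into its parts.
 * Returns 1 on success, 0 if the count is missing or not positive,
 * or if the kind is not one of "priority" / "quorum".
 */
int
parse_counted_method(const char *value, int *num_sync, SyncRepKind *kind)
{
    char *rest;
    long  n;

    assert(value != NULL && num_sync != NULL && kind != NULL);

    /* strtol leaves rest == value when no digits were consumed. */
    n = strtol(value, &rest, 10);
    if (rest == value || n <= 0 || n > INT_MAX || *rest != '-')
        return 0;

    if (strcmp(rest + 1, "priority") == 0)
        *kind = SYNC_REP_PRIORITY;
    else if (strcmp(rest + 1, "quorum") == 0)
        *kind = SYNC_REP_QUORUM;
    else
        return 0;

    *num_sync = (int) n;
    return 1;
}
```

Note that with this notation the count lives in the method name, so s_s_names itself stays a plain list of names.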

>
>
>> > "n-quorum": the format of s_s_names is the same as above, but
>> > it is read in quorum context.
>
> The "n" of this is the same as above.
>
>> > These can be expanded, for example, as follows, but in future.
>> >
>> > "complex" : Michael's format.
>> > "json" : JSON?
>> > "json-ext": specify JSON in external file.
>> >
>> > Even after we have complex notations, I suppose that many use
>> > cases are covered by the first three notations.
>>
>> I'm not sure it's desirable to implement the all kind of methods into core.
>> I think it's better to extend replication in order to be more
>> extensibility like adding hook function.
>> And then other approach is implemented as a contrib module.
>
> I agree with you. I proposed the following internal design having
> that in mind.
>
>> > - Internal design
>> >
>> > What should be done in SyncRepReleaseWaiters() is calculating a
>> > pair of LSNs that can be regarded as synced and decide whether
>> > *this* walsender have advanced the LSN pair, then trying to
>> > release backends that wait for the LSNs *if* this walsender has
>> > advanced them.
>> >
>> > From that point of view, the proposed patch will make redundant
>> > attempts to release backends.
>> >
>> > Addition to that, the patch looks to be a mixture of the current
>> > implement and the new feature. These are for the same objective
>> > so they cannot coexist each other, I think. As the result, codes
>> > for both quorum/priority judgement appear at multiple level in
>> > call tree. This would be an obstacle for future (possible)
>> > expansion.
>> >
>> > So, I think this feature should be implemented as following,
>> >
>> > SyncRepInitConfig reads the configuration and stores the result
>> > structure into elsewhere such like WalSnd->syncrepset_definition
>> > instead of WalSnd->sync_standby_priority, which should be
>> > removed. Nothing would be stored if the current wal sender is
>> > not a member of the defined replication set. Storing a pointer
>> > to matching function there would increase the flexibility but
>> > such implement in contrast will make the code difficult to be
>> > read.. (I often look for the entity of xlogreader->read_page()
>> > ;)
>> >
>> > Then SyncRepSyncedLsnAdvancedTo() instead of
>> > SyncRepGetSynchronousStandbys() returns an LSN pair that can be
>> > regarded as 'synced' according to specified definition of
>> > replication set and whether this walsender have advanced the
>> > LSNs.
>> >
>> > Finally, SyncRepReleaseWaiters() uses it to release backends if
>> > needed.
>> >
>> > The differences among quorum/priority or others are confined in
>> > SyncRepSyncedLsnAdvancedTo(). As the result,
>> > SyncRepReleaseWaiters would look as following.
>> >
>> > | SyncRepReleaseWaiters(void)
>> > | {
>> > | if (MyWalSnd->syncrepset_definition == NULL || ...)
>> > | return;
>> > | ...
>> > | if (!SyncRepSyncedLsnAdvancedTo(&flush_pos, &write_pos))
>> > | {
>> > | /* I haven't advanced the synced LSNs */
>> > | LWLockRelease(SyncRepLock);
>> > | return;
>> > | }
>> > | /* Set the lsn first so that when we wake backends they will release...
>> >
>> > I'm not thought concretely about what SyncRepSyncedLsnAdvancedTo
>> > does but perhaps yes we can:p in effective manner..
>> >
>> > What do you think about this?
>>
>> I agree with this design.
>> What SyncRepSyncedLsnAdvancedTo() does would be different for each
>> method, so we can implement "n-priority" style multiple sync
>> replication at first version.
>
> Maybe the first *additional* one if we decide to keep backward
> compatibility, as the discussion above.
>

Regards,

--
Masahiko Sawada

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-11-17 10:40:10
Message-ID:	20151117.194010.17198448.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC=AN+DKYNwsJp6COZ-6qmHXxuENxVPisxgPXcuXmPEvw(at)mail(dot)gmail(dot)com>
> >> One question is that what is different between the leading "n" in
> >> s_s_names and the leading "n" of "n-priority"?
> >
> > Ah. Sorry for the ambiguous description. 'n' in s_s_names
> > representing an arbitrary integer number and that in "n-priority"
> > is literally an "n", meaning "a format with any number of
> > priority hosts" as a whole. As an instance,
> >
> > synchronous_replication_method = "n-priority"
> > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
> >
> > I added "n-" of "n-priority" to distinguish with "1-priority" so
> > if we won't provide "1-priority" for backward compatibility,
> > "priority" would be enough to represent the type.
> >
> > By the way, s_r_method is not essentially necessary but it would
> > be important to avoid complexity of autodetection of formats
> > including currently undefined ones.
>
> Than you for your explanation, I understood that.
>
> It means that the format of s_s_names will be changed, which would be not good.

I believe that the format of the "replication set"(?) definition
is not fixed, and that it will need a more complex format to support
nested definitions. That would be very different from the current
simple list of names. So this is a choice among three or possibly
more designs, made so that we can tolerate future changes, I suppose.

1. Additional definition formats will in future be stored
somewhere other than s_s_names.

2. Additional formats will be stored in s_s_names, and the format
will be detected automatically.

3. (ditto), but the format is designated by s_r_method.

4. Any other way?

I chose the third way. What do you think about future expansion
of the format?
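
As a rough sketch of what the third way amounts to, the method name alone can select how s_s_names is to be read, so nothing has to be guessed from the value itself. The enum, the table and format_for_method() below are hypothetical names for this illustration; the three method strings are the ones proposed earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of s_s_names layouts known to the server. */
typedef enum SyncRepFormat
{
    FORMAT_UNKNOWN,
    FORMAT_NAME_LIST,       /* "name, name, ..."    (current syntax) */
    FORMAT_COUNTED_LIST     /* "n, name, name, ..." (proposed syntax) */
} SyncRepFormat;

typedef struct MethodEntry
{
    const char   *method;   /* value of s_r_method */
    SyncRepFormat format;   /* how s_s_names must be parsed */
} MethodEntry;

/* Adding a future format means adding a row here, not a detection rule. */
static const MethodEntry method_table[] = {
    {"1-priority", FORMAT_NAME_LIST},
    {"n-priority", FORMAT_COUNTED_LIST},
    {"n-quorum", FORMAT_COUNTED_LIST},
};

/* Look up the s_s_names format designated by the given method name. */
SyncRepFormat
format_for_method(const char *method)
{
    assert(method != NULL);
    for (size_t i = 0; i < sizeof(method_table) / sizeof(method_table[0]); i++)
    {
        if (strcmp(method_table[i].method, method) == 0)
            return method_table[i].format;
    }
    return FORMAT_UNKNOWN;
}
```

An unknown method name is reported as such instead of falling back to a guess, which is the main difference from the second way.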

> So, how about the adding just s_r_method parameter and the number of
> required ACK is represented in the leading of s_r_method?
> For example, the following setting is same as above.
>
> synchronous_replication_method = "2-priority"
> synchronous_standby_names = "mercury, venus, earth, mars, jupiter"

I *feel* it is the same or worse as having the third parameter
s_s_num as your previous design.

> In quorum method, we can set;
> synchronous_replication_method = "2-quorum"
> synchronous_standby_names = "mercury, venus, earth, mars, jupiter"
>
> Thought?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-11-17 10:52:32
Message-ID:	20151117.195232.168237896.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Oops.

At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20151117(dot)194010(dot)17198448(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Hello,
>
> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC=AN+DKYNwsJp6COZ-6qmHXxuENxVPisxgPXcuXmPEvw(at)mail(dot)gmail(dot)com>
> > >> One question is that what is different between the leading "n" in
> > >> s_s_names and the leading "n" of "n-priority"?
> > >
> > > Ah. Sorry for the ambiguous description. 'n' in s_s_names
> > > representing an arbitrary integer number and that in "n-priority"
> > > is literally an "n", meaning "a format with any number of
> > > priority hosts" as a whole. As an instance,
> > >
> > > synchronous_replication_method = "n-priority"
> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
> > >
> > > I added "n-" of "n-priority" to distinguish with "1-priority" so
> > > if we won't provide "1-priority" for backward compatibility,
> > > "priority" would be enough to represent the type.
> > >
> > > By the way, s_r_method is not essentially necessary but it would
> > > be important to avoid complexity of autodetection of formats
> > > including currently undefined ones.
> >
> > Than you for your explanation, I understood that.
> >
> > It means that the format of s_s_names will be changed, which would be not good.
>
> I believe that the format of definition of "replication set"(?)
> is not fixed and it would be more complex format to support
> nested definition. This should be in very different format from
> the current simple list of names. This is a selection among three
> or possiblly more disigns in order to be tolerable for future
> changes, I suppose.
>
> 1. Additional formats of definition in future will be stored in
> elsewhere of s_s_names.
>
> 2. Additional format will be stored in s_s_names, the format will
> be automatically detected.
>
> 3. (ditto), the format is designated by s_r_method.
>
> 4. Any other way?
>
> I choosed the third way. What do you think about future expansion
> of the format?
>
> > So, how about the adding just s_r_method parameter and the number of
> > required ACK is represented in the leading of s_r_method?
> > For example, the following setting is same as above.
> >
> > synchronous_replication_method = "2-priority"
> > synchronous_standby_names = "mercury, venus, earth, mars, jupiter"
>
> I *feel* it is the same or worse as having the third parameter
> s_s_num as your previous design.

I feel it is the same or worse *than* having the third parameter
s_s_num as your previous design.

> > In quorum method, we can set;
> > synchronous_replication_method = "2-quorum"
> > synchronous_standby_names = "mercury, venus, earth, mars, jupiter"
> >
> > Thought?
>
>
> regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-11-18 08:36:36
Message-ID:	CAD21AoBvN+ZHwfD8XPUmr66X3utwtVBowdJWP0++rStVT9uvtg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Nov 17, 2015 at 7:52 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Oops.
>
> At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20151117(dot)194010(dot)17198448(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>> Hello,
>>
>> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC=AN+DKYNwsJp6COZ-6qmHXxuENxVPisxgPXcuXmPEvw(at)mail(dot)gmail(dot)com>
>> > >> One question is that what is different between the leading "n" in
>> > >> s_s_names and the leading "n" of "n-priority"?
>> > >
>> > > Ah. Sorry for the ambiguous description. 'n' in s_s_names
>> > > representing an arbitrary integer number and that in "n-priority"
>> > > is literally an "n", meaning "a format with any number of
>> > > priority hosts" as a whole. As an instance,
>> > >
>> > > synchronous_replication_method = "n-priority"
>> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>> > >
>> > > I added "n-" of "n-priority" to distinguish with "1-priority" so
>> > > if we won't provide "1-priority" for backward compatibility,
>> > > "priority" would be enough to represent the type.
>> > >
>> > > By the way, s_r_method is not essentially necessary but it would
>> > > be important to avoid complexity of autodetection of formats
>> > > including currently undefined ones.
>> >
>> > Than you for your explanation, I understood that.
>> >
>> > It means that the format of s_s_names will be changed, which would be not good.
>>
>> I believe that the format of definition of "replication set"(?)
>> is not fixed and it would be more complex format to support
>> nested definition. This should be in very different format from
>> the current simple list of names. This is a selection among three
>> or possiblly more disigns in order to be tolerable for future
>> changes, I suppose.
>>
>> 1. Additional formats of definition in future will be stored in
>> elsewhere of s_s_names.
>>
>> 2. Additional format will be stored in s_s_names, the format will
>> be automatically detected.
>>
>> 3. (ditto), the format is designated by s_r_method.
>>
>> 4. Any other way?
>>
>> I choosed the third way. What do you think about future expansion
>> of the format?
>>

I agree with option #3 and the s_s_names format you suggested.
I think it's extensible and can tolerate future changes.
I'm going to implement the patch based on this idea if other hackers
agree with this design.

Regards,

--
Masahiko Sawada

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-12-09 15:29:20
Message-ID:	CAD21AoDcn1fToCcYRqpU6fMY1xnpDdAKDTcbhW1R9M1mPM0kZg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Nov 17, 2015 at 7:52 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Oops.
>>
>> At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20151117(dot)194010(dot)17198448(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>>> Hello,
>>>
>>> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC=AN+DKYNwsJp6COZ-6qmHXxuENxVPisxgPXcuXmPEvw(at)mail(dot)gmail(dot)com>
>>> > >> One question is that what is different between the leading "n" in
>>> > >> s_s_names and the leading "n" of "n-priority"?
>>> > >
>>> > > Ah. Sorry for the ambiguous description. 'n' in s_s_names
>>> > > representing an arbitrary integer number and that in "n-priority"
>>> > > is literally an "n", meaning "a format with any number of
>>> > > priority hosts" as a whole. As an instance,
>>> > >
>>> > > synchronous_replication_method = "n-priority"
>>> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>>> > >
>>> > > I added "n-" of "n-priority" to distinguish with "1-priority" so
>>> > > if we won't provide "1-priority" for backward compatibility,
>>> > > "priority" would be enough to represent the type.
>>> > >
>>> > > By the way, s_r_method is not essentially necessary but it would
>>> > > be important to avoid complexity of autodetection of formats
>>> > > including currently undefined ones.
>>> >
>>> > Than you for your explanation, I understood that.
>>> >
>>> > It means that the format of s_s_names will be changed, which would be not good.
>>>
>>> I believe that the format of definition of "replication set"(?)
>>> is not fixed and it would be more complex format to support
>>> nested definition. This should be in very different format from
>>> the current simple list of names. This is a selection among three
>>> or possiblly more disigns in order to be tolerable for future
>>> changes, I suppose.
>>>
>>> 1. Additional formats of definition in future will be stored in
>>> elsewhere of s_s_names.
>>>
>>> 2. Additional format will be stored in s_s_names, the format will
>>> be automatically detected.
>>>
>>> 3. (ditto), the format is designated by s_r_method.
>>>
>>> 4. Any other way?
>>>
>>> I choosed the third way. What do you think about future expansion
>>> of the format?
>>>
>
> I agree with #3 way and the s_s_name format you suggested.
> I think that It's extensible and is tolerable for future changes.
> I'm going to implement the patch based on this idea if other hackers
> agree with this design.
>

Please find attached a draft patch that supports multiple synchronous
replication.
This patch adds a GUC parameter, synchronous_replication_method, which
represents the method of synchronous replication.

[Design of replication method]
synchronous_replication_method has two possible values for now:
'priority' and '1-priority'.
We can add more values (e.g., 'quorum', 'json', etc.) in the future.

* s_r_method = '1-priority'
This method is for backward compatibility, so the syntax of s_s_names
is the same as today.
The behavior is the same as well.

* s_r_method = 'priority'
This method is for multiple synchronous replication using the priority
method.
The syntax of s_s_names is:
<number of sync standbys>, <standby name> [, ...]

For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
node3' means that the master waits for acknowledgement from at least the
2 standbys with the lowest priority values (that is, the highest priority).
If 4 standbys (node1 - node4) are available, the master server waits for
acknowledgement from 'node1' and 'node2'.
The status of each wal sender is:

After 'node2' crashes, the master will wait for acknowledgement from
'node1' and 'node3'.
The status of each wal sender is:
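
As a rough illustration of the selection rule in the two scenarios above (this is neither the status output nor code from the patch), the choice of synchronous standbys can be sketched as a walk over the configured names in priority order. choose_sync_standbys() and its parameters are hypothetical, and matching names with a plain strcmp() is an assumption made for the example.

```c
#include <assert.h>
#include <string.h>

/*
 * Pick up to num_sync connected standbys, walking the configured names in
 * priority order (an earlier name means a higher priority).  The indexes
 * into `configured` of the chosen standbys are written to `chosen`, which
 * must have room for num_sync entries.  Returns how many were chosen.
 */
int
choose_sync_standbys(const char *const *configured, int num_configured,
                     const char *const *connected, int num_connected,
                     int num_sync, int *chosen)
{
    int nchosen = 0;

    assert(configured != NULL && connected != NULL && chosen != NULL);

    for (int i = 0; i < num_configured && nchosen < num_sync; i++)
    {
        /* A configured name only counts if that standby is connected. */
        for (int j = 0; j < num_connected; j++)
        {
            if (strcmp(configured[i], connected[j]) == 0)
            {
                chosen[nchosen++] = i;
                break;
            }
        }
    }
    return nchosen;
}
```

With s_s_names = '2, node1, node2, node3' and node1 to node4 connected, this picks node1 and node2; once node2 is gone it picks node1 and node3. node4 is never picked because it is not listed.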

[Changing replication method]
When we want to change the replication method, we have to change
s_r_method first, and then call pg_reload_conf().
After changing the replication method, we can change s_s_names.

[Expanding replication method]
If we want to add a new replication method, we need to
implement two functions for each replication method:
* int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
This function obtains the list of standbys considered synchronous
at that time, and returns its length.
* bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
This function obtains the LSNs (write, flush) considered synced.

Also, debug code still remains in this patch; you can debug this behavior
by enabling the DEBUG_REPLICATION macro.

Please give me feedback.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v1.patch	application/octet-stream	20.6 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-12-11 17:03:09
Message-ID:	CAD21AoApriUVxUvtUGvt9fgMo=fxYkk8iv8McccwGRBsxprStA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Dec 9, 2015 at 8:59 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Tue, Nov 17, 2015 at 7:52 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> Oops.
>>>
>>> At Tue, 17 Nov 2015 19:40:10 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20151117(dot)194010(dot)17198448(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>>>> Hello,
>>>>
>>>> At Tue, 17 Nov 2015 18:13:11 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC=AN+DKYNwsJp6COZ-6qmHXxuENxVPisxgPXcuXmPEvw(at)mail(dot)gmail(dot)com>
>>>> > >> One question is that what is different between the leading "n" in
>>>> > >> s_s_names and the leading "n" of "n-priority"?
>>>> > >
>>>> > > Ah. Sorry for the ambiguous description. 'n' in s_s_names
>>>> > > representing an arbitrary integer number and that in "n-priority"
>>>> > > is literally an "n", meaning "a format with any number of
>>>> > > priority hosts" as a whole. As an instance,
>>>> > >
>>>> > > synchronous_replication_method = "n-priority"
>>>> > > synchronous_standby_names = "2, mercury, venus, earth, mars, jupiter"
>>>> > >
>>>> > > I added "n-" of "n-priority" to distinguish with "1-priority" so
>>>> > > if we won't provide "1-priority" for backward compatibility,
>>>> > > "priority" would be enough to represent the type.
>>>> > >
>>>> > > By the way, s_r_method is not essentially necessary but it would
>>>> > > be important to avoid complexity of autodetection of formats
>>>> > > including currently undefined ones.
>>>> >
>>>> > Than you for your explanation, I understood that.
>>>> >
>>>> > It means that the format of s_s_names will be changed, which would be not good.
>>>>
>>>> I believe that the format of definition of "replication set"(?)
>>>> is not fixed and it would be more complex format to support
>>>> nested definition. This should be in very different format from
>>>> the current simple list of names. This is a selection among three
>>>> or possiblly more disigns in order to be tolerable for future
>>>> changes, I suppose.
>>>>
>>>> 1. Additional formats of definition in future will be stored in
>>>> elsewhere of s_s_names.
>>>>
>>>> 2. Additional format will be stored in s_s_names, the format will
>>>> be automatically detected.
>>>>
>>>> 3. (ditto), the format is designated by s_r_method.
>>>>
>>>> 4. Any other way?
>>>>
>>>> I choosed the third way. What do you think about future expansion
>>>> of the format?
>>>>
>>
>> I agree with #3 way and the s_s_name format you suggested.
>> I think that It's extensible and is tolerable for future changes.
>> I'm going to implement the patch based on this idea if other hackers
>> agree with this design.
>>
>
> Please find the attached draft patch which supports multi sync replication.
> This patch adds a GUC parameter synchronous_replication_method, which
> represent the method of synchronous replication.
>
> [Design of replication method]
> synchronous_replication_method has two values; 'priority' and
> '1-priority' for now.
> We can expand the kind of its value (e.g, 'quorum', 'json' etc) in the future.
>
> * s_r_method = '1-priority'
> This method is for backward compatibility, so the syntax of s_s_names
> is same as today.
> The behavior is same as well.
>
> * s_r_method = 'priority'
> This method is for multiple synchronous replication using priority method.
> The syntax of s_s_names is,
> <number of sync standbys>, <standby name> [, ...]
>
> For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
> node3' means that the master waits for acknowledge from at least 2
> lowest priority servers.
> If 4 standbys(node1 - node4) are available, the master server waits
> acknowledge from 'node1' and 'node2.
> The each status of wal senders are;
>
> =# select application_name, sync_state from pg_stat_replication order
> by application_name;
> application_name | sync_state
> ------------------+------------
> node1 | sync
> node2 | sync
> node3 | potential
> node4 | async
> (4 rows)
>
> After 'node2' crashed, the master will wait for acknowledge from
> 'node1' and 'node3'.
> The each status of wal senders are;
>
> =# select application_name, sync_state from pg_stat_replication order
> by application_name;
> application_name | sync_state
> ------------------+------------
> node1 | sync
> node3 | sync
> node4 | async
> (3 rows)
>
> [Changing replication method]
> When we want to change the replication method, we have to change the
> s_r_method at first, and then do pg_reload_conf().
> After changing replication method, we can change the s_s_names.
>
> [Expanding replication method]
> If we want to expand new replication method additionally, we need to
> implement two functions for each replication method:
> * int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
> This function obtains the list of standbys considered as synchronous
> at that time, and return its length.
> * bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
> This function obtains LSNs(write, flush) considered as synced.
>
> Also, this patch debug code is remain yet, you can debug this behavior
> using by enable DEBUG_REPLICATION macro.
>
> Please give me feedbacks.
>

I've attached an updated patch.
Feedback is welcome.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v2.patch	application/octet-stream	20.6 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-12-14 09:27:38
Message-ID:	20151214.182738.130827803.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Thank you for the new patch.

At Wed, 9 Dec 2015 20:59:20 +0530, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDcn1fToCcYRqpU6fMY1xnpDdAKDTcbhW1R9M1mPM0kZg(at)mail(dot)gmail(dot)com>
> On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > I agree with #3 way and the s_s_name format you suggested.
> > I think that It's extensible and is tolerable for future changes.
> > I'm going to implement the patch based on this idea if other hackers
> > agree with this design.
>
> Please find the attached draft patch which supports multi sync replication.
> This patch adds a GUC parameter synchronous_replication_method, which
> represent the method of synchronous replication.
>
> [Design of replication method]
> synchronous_replication_method has two values; 'priority' and
> '1-priority' for now.
> We can expand the kind of its value (e.g, 'quorum', 'json' etc) in the future.
>
> * s_r_method = '1-priority'
> This method is for backward compatibility, so the syntax of s_s_names
> is same as today.
> The behavior is same as well.
>
> * s_r_method = 'priority'
> This method is for multiple synchronous replication using priority method.
> The syntax of s_s_names is,
> <number of sync standbys>, <standby name> [, ...]

Is there anyone opposed to this?

> For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
> node3' means that the master waits for acknowledge from at least 2
> lowest priority servers.
> If 4 standbys(node1 - node4) are available, the master server waits
> acknowledge from 'node1' and 'node2.
> The each status of wal senders are;
>
> =# select application_name, sync_state from pg_stat_replication order
> by application_name;
> application_name | sync_state
> ------------------+------------
> node1 | sync
> node2 | sync
> node3 | potential
> node4 | async
> (4 rows)
>
> After 'node2' crashed, the master will wait for acknowledge from
> 'node1' and 'node3'.
> The each status of wal senders are;
>
> =# select application_name, sync_state from pg_stat_replication order
> by application_name;
> application_name | sync_state
> ------------------+------------
> node1 | sync
> node3 | sync
> node4 | async
> (3 rows)
>
> [Changing replication method]
> When we want to change the replication method, we have to change the
> s_r_method at first, and then do pg_reload_conf().
> After changing replication method, we can change the s_s_names.

Mmm. It should be possible to change them at once, because s_r_method
and s_s_names contradict each other during the intermediate
state.

> [Expanding replication method]
> If we want to expand new replication method additionally, we need to
> implement two functions for each replication method:
> * int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
> This function obtains the list of standbys considered as synchronous
> at that time, and return its length.
> * bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
> This function obtains LSNs(write, flush) considered as synced.
>
> Also, this patch debug code is remain yet, you can debug this behavior
> using by enable DEBUG_REPLICATION macro.
>
> Please give me feedbacks.

I haven't looked into this fully (sorry) but I'm concerned about
several points.

- I feel that some function names look too long. For example
SyncRepGetSynchronousStandbysOnePriority occupies more than
half of a line. (However, the replication code already has many
long function names..)

- The comment below of SyncRepGetSynchronousStandbyOnePriority,
> /* Find lowest priority standby */

The code that the comment describes is doing the correct
thing. However, the comment is confusing. A lower priority
*value* means a higher priority.

- SyncRepGetSynchronousStandbys checks all if()s even when the
first one matches. Use switch or "else if" there if they
are mutually exclusive.

- Do you intend the DEBUG_REPLICATION code in
SyncRepGetSynchronousStandbys*() to be the final shape? The
same code blocks which can work for both methods should be in
their common caller, but SyncRepGetSyncLsns*() are a
headache. Although it might need more refactoring, I'm sorry
but I don't see a desirable shape for now.

By the way, palloc(20)/free() over such a short lifetime looks
inefficient.

- SyncRepGetSyncLsnsPriority

For the comment "/* Find lowest XLogRecPtr of both write and
flush from sync_nodes */", LSNs are compared as earlier or later, so
the comment would be better as something like "Keep/Collect
the earliest write and flush LSNs among prioritized standbys".

And what is more important, this block mixes up the write and flush
LSNs, which results in missing the earliest (= most
delayed) LSN in certain cases. The following is an example.

Standby 1: write LSN = 10, flush LSN = 5
Standby 2: write LSN = 8 , flush LSN = 6

In this case, we finally get tmp_write = 10 and tmp_flush = 5
from the current code, where tmp_write has the wrong value since
LSN = 10 has *not* been written yet on standby 2. (The names
"tmp_*" don't seem appropriate here.)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-12-17 18:38:44
Message-ID:	CAD21AoAqaZOHqE9_ALhQ7D3NStJmwDg1MK91opYDiDDd4r7HUg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Dec 14, 2015 at 2:57 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Thank you for the new patch.
>
> At Wed, 9 Dec 2015 20:59:20 +0530, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDcn1fToCcYRqpU6fMY1xnpDdAKDTcbhW1R9M1mPM0kZg(at)mail(dot)gmail(dot)com>
>> On Wed, Nov 18, 2015 at 2:06 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> > I agree with #3 way and the s_s_name format you suggested.
>> > I think that It's extensible and is tolerable for future changes.
>> > I'm going to implement the patch based on this idea if other hackers
>> > agree with this design.
>>
>> Please find the attached draft patch which supports multi sync replication.
>> This patch adds a GUC parameter synchronous_replication_method, which
>> represent the method of synchronous replication.
>>
>> [Design of replication method]
>> synchronous_replication_method has two values; 'priority' and
>> '1-priority' for now.
>> We can expand the kind of its value (e.g, 'quorum', 'json' etc) in the future.
>>
>> * s_r_method = '1-priority'
>> This method is for backward compatibility, so the syntax of s_s_names
>> is same as today.
>> The behavior is same as well.
>>
>> * s_r_method = 'priority'
>> This method is for multiple synchronous replication using priority method.
>> The syntax of s_s_names is,
>> <number of sync standbys>, <standby name> [, ...]
>
> Is there anyone opposed to this?
>
>> For example, s_r_method = 'priority' and s_s_names = '2, node1, node2,
>> node3' means that the master waits for acknowledge from at least 2
>> lowest priority servers.
>> If 4 standbys(node1 - node4) are available, the master server waits
>> acknowledge from 'node1' and 'node2.
>> The each status of wal senders are;
>>
>> =# select application_name, sync_state from pg_stat_replication order
>> by application_name;
>> application_name | sync_state
>> ------------------+------------
>> node1 | sync
>> node2 | sync
>> node3 | potential
>> node4 | async
>> (4 rows)
>>
>> After 'node2' crashed, the master will wait for acknowledge from
>> 'node1' and 'node3'.
>> The each status of wal senders are;
>>
>> =# select application_name, sync_state from pg_stat_replication order
>> by application_name;
>> application_name | sync_state
>> ------------------+------------
>> node1 | sync
>> node3 | sync
>> node4 | async
>> (3 rows)
>>
>> [Changing replication method]
>> When we want to change the replication method, we have to change the
>> s_r_method at first, and then do pg_reload_conf().
>> After changing replication method, we can change the s_s_names.

Thank you for reviewing the patch!
Please find attached the latest patch.

> Mmm. I should be able to be changed at once, because s_r_method
> and s_s_names contradict each other during the intermediate
> state.

Sorry to confuse you. I meant the case where we want to change the
replication method using ALTER SYSTEM.

>> [Expanding replication method]
>> If we want to expand new replication method additionally, we need to
>> implement two functions for each replication method:
>> * int SyncRepGetSynchronousStandbysXXX(int *sync_standbys)
>> This function obtains the list of standbys considered as synchronous
>> at that time, and return its length.
>> * bool SyncRepGetSyncLsnXXX(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>> This function obtains LSNs(write, flush) considered as synced.
>>
>> Also, this patch debug code is remain yet, you can debug this behavior
>> using by enable DEBUG_REPLICATION macro.
>>
>> Please give me feedbacks.
>
> I haven't looked into this fully (sorry) but I'm concerned about
> several points.
>
>
> - I feel that some function names looks too long. For example
> SyncRepGetSynchronousStandbysOnePriority occupies more than the
> half of a line. (However, the replication code alrady has many
> long function names..)

Yeah, it would be better to change 'Synchronous' to 'Sync' at least.

> - The comment below of SyncRepGetSynchronousStandbyOnePriority,
> > /* Find lowest priority standby */
>
> The code where the comment is for is doing the correct
> thing. Howerver, the comment is confusing. A lower priority
> *value* means a higher priority.

Fixed.

> - SyncRepGetSynchronousStandbys checks all if()s even when the
> first one matches. Use switch or "else if" there if you they
> are exclusive each other.

Fixed.

> - Do you intende the DEBUG_REPLICATION code in
> SyncRepGetSynchronousStandbys*() to be the final shape? The
> same code blocks which can work for both method should be in
> their common caller but SyncRepGetSyncLsns*() are
> headache. Although it might need more refactoring, I'm sorry
> but I don't see a desirable shape for now.

I don't intend the DEBUG_REPLICATION code to be the final shape.
That code has been removed from this version of the patch.

> By the way, palloc(20)/free() in such short term looks
> ineffective.
>
> - SyncRepGetSyncLsnsPriority
>
> For the comment "/* Find lowest XLogRecPtr of both write and
> flush from sync_nodes */", LSN is compared as early or late so
> the comment would be better to be something like "Keep/Collect
> the earliest write and flush LSNs among prioritized standbys".

Fixed.

> And what is more important, this block handles write and flush
> LSN jumbled and it reults in missing the earliest(= most
> delayed) LSN for certain cases. The following is an example.
>
> Standby 1: write LSN = 10, flush LSN = 5
> Standby 2: write LSN = 8 , flush LSN = 6
>
> For this case, finally we get tmp_write = 10 and tmp_flush = 5
> from the current code, where tmp_write has wrong value since
> LSN = 10 has *not* been written yet on standby 2. (the names
> "tmp_*" don't seem appropriate here)
>

You are right.
We have to handle the write and flush LSNs individually, and take the lowest of each.
For example, in this case we have to get write = 8, flush = 5.
I've changed the logic accordingly.
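
To make the point concrete, here is a minimal standalone sketch of tracking the two minimums independently. It is not the code from the patch: the Lsn typedef stands in for XLogRecPtr, and StandbyProgress and earliest_sync_lsns are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (assumption for this sketch). */
typedef uint64_t Lsn;

/* Hypothetical per-standby progress report. */
typedef struct StandbyProgress
{
    Lsn write;
    Lsn flush;
} StandbyProgress;

/*
 * Compute the earliest write LSN and the earliest flush LSN among the
 * given standbys. The two minimums are tracked independently, so they
 * may come from different standbys.
 *
 * Returns 1 and fills *write_pos / *flush_pos when n > 0, else returns 0
 * and leaves the outputs untouched.
 */
int
earliest_sync_lsns(const StandbyProgress *standbys, size_t n,
                   Lsn *write_pos, Lsn *flush_pos)
{
    Lsn    min_write = 0;
    Lsn    min_flush = 0;
    size_t i;

    if (n == 0)
        return 0;

    for (i = 0; i < n; i++)
    {
        /* The first standby seeds both minimums. */
        if (i == 0 || standbys[i].write < min_write)
            min_write = standbys[i].write;
        if (i == 0 || standbys[i].flush < min_flush)
            min_flush = standbys[i].flush;
    }

    *write_pos = min_write;
    *flush_pos = min_flush;
    return 1;
}
```

With the two standbys from the example (write/flush of 10/5 and 8/6), this yields write = 8 and flush = 5.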

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v3.patch	application/octet-stream	17.7 KB

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-12-23 02:50:23
Message-ID:	CAEepm=3HQq0CiQw+beox-wUFk-=k0HC5GjkdFfdi==ZTKPhi1w@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Dec 18, 2015 at 7:38 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
[000-_multi_sync_replication_v3.patch]

Hi Masahiko,

I haven't tested this version of the patch but I have some comments on the code.

+/* Is this wal sender considerable one? */
+bool
+SyncRepActiveListedWalSender(int num)

Maybe "Is this wal sender managing a standby that is streaming and
listed as a synchronous standby?"

+/*
+ * Obtain three palloc'd arrays containing position of standbys currently
+ * considered as synchronous, and its length.
+ */
+int
+SyncRepGetSyncStandbys(int *sync_standbys)

This comment seems to be out of date. I would say "Populate a
caller-supplied array which must have enough space for ... Returns
...".

+/*
+ * Obtain standby currently considered as synchronous using
+ * '1-priority' method.
+ */
+int
+SyncRepGetSyncStandbysOnePriority(int *sync_standbys)
+ ... code ...

Why do we need a separate function and code path for this case? If
you used SyncRepGetSyncStandbysPriority with a size of 1, should it
not produce the same result in the same time complexity?

+/*
+ * Obtain standby currently considered as synchronous using
+ * 'priority' method.
+ */
+int
+SyncRepGetSyncStandbysPriority(int *sync_standbys)

I would say something more descriptive, maybe like this: "Populates a
caller-supplied buffer with the walsnds indexes of the highest
priority active synchronous standbys, up to a limit of
'synchronous_standby_num'. The order of the results is undefined.
Returns the number of results actually written."

If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
above, then this function could be renamed to SyncRepGetSyncStandbys.
I think it would be a tiny bit nicer if it also took a Size n argument
along with the output buffer pointer.

As for the body of that function (which I won't paste here), it
contains an algorithm to find the top K elements in an array of N
elements. It does that with a linear search through the top K seen so
far for each value in the input array, so its worst case is O(KN)
comparisons. Some of the sorting gurus on this list might have
something to say about that but my take is that it seems fine for the
tiny values of K and N that we're dealing with here, and it's nice
that it doesn't need any space other than the output buffer, unlike
some other top-K algorithms which would win for larger inputs.
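
For readers without the patch at hand, the kind of O(KN) selection described above can be sketched roughly as follows. This is an illustration only, not the function body from the patch: top_k_by_priority is a hypothetical name, priorities are passed as a plain int array, and a value of 0 is assumed to mean "not a listed standby".

```c
#include <stddef.h>

/*
 * Fill out[] with the indexes of up to k entries of priority[] that have
 * the smallest positive values (a lower number means a higher priority;
 * values <= 0 are skipped as "not a candidate").
 *
 * The order of the results is undefined. Returns the number of indexes
 * written. Uses no space other than the output buffer; worst case is
 * O(k * n) comparisons.
 */
size_t
top_k_by_priority(const int *priority, size_t n, size_t *out, size_t k)
{
    size_t count = 0;
    size_t i;
    size_t j;

    if (k == 0)
        return 0;

    for (i = 0; i < n; i++)
    {
        size_t worst = 0;

        if (priority[i] <= 0)
            continue;               /* not a candidate */

        if (count < k)
        {
            out[count++] = i;       /* buffer not full yet, just keep it */
            continue;
        }

        /* Linear scan of the kept entries for the lowest-priority one. */
        for (j = 1; j < count; j++)
        {
            if (priority[out[j]] > priority[out[worst]])
                worst = j;
        }

        /* Replace it if the new candidate has a higher priority. */
        if (priority[i] < priority[out[worst]])
            out[worst] = i;
    }

    return count;
}
```

For priorities {3, 0, 1, 4, 2} and k = 2, the buffer ends up holding indexes 2 and 4 (in some order), i.e. the entries with priority values 1 and 2.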

+ /* Found sync standby */

This comment would be clearer as "Found lowest priority standby, so replace it".

+ if (walsndloc->sync_standby_priority == priority &&
+ walsnd->sync_standby_priority < priority)
+ sync_standbys[j] = i;

In this case, couldn't you also update 'priority' directly, and break
out of the loop immediately? Wouldn't "lowest_priority" be a better
variable name than "priority"? It might be good to say "lowest"
rather than "highest" in the nearby comments, to be consistent with
other parts of the code including the function name (lower priority
number means higher priority!).

+/*
+ * Obtain currently synced LSN: write and flush,
+ * using '1-prioirty' method.

s/prioirty/priority/

+ */
+bool
+SyncRepGetSyncLsnsOnePriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)

Similar to the earlier case, why have a special case for 1-priority?
Wouldn't SyncRepGetSyncLsnsPriority produce the same result when
synchronous_standby_num == 1?

+/*
+ * Obtain currently synced LSN: write and flush,
+ * using 'prioirty' method.

s/prioirty/priority/

+SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
+{
+ int *sync_standbys = NULL;
+ int num_sync;
+ int i;
+ XLogRecPtr synced_write = InvalidXLogRecPtr;
+ XLogRecPtr synced_flush = InvalidXLogRecPtr;
+
+ sync_standbys = (int *) palloc(sizeof(int) * synchronous_standby_num);

Would a fixed size buffer on the stack (of compile time constant size)
be better than palloc/free in here and elsewhere?

+ /*
+ for (i = 0; i < num_sync; i++)
+ {
+ volatile WalSnd *walsndloc = &WalSndCtl->walsnds[sync_standbys[i]];
+ if (walsndloc == MyWalSnd)
+ {
+ found = true;
+ break;
+ }
+ }
+ */

Dead code.

+ if (synchronous_replication_method == SYNC_REP_METHOD_1_PRIORITY)
+ synchronous_standby_num = 1;
+ else
+ synchronous_standby_num = pg_atoi(lfirst(list_head(elemlist)),
sizeof(int), 0);

Should we detect if synchronous_standby_num > the number of listed
servers, which would be a nonsensical configuration? Should we also
impose some other kind of constant limits, like must be >= 0 (I
haven't tried but I wonder if -1 leads to very large palloc) and must
be <= MAX_XXX (smallish sanity check number like 256, rather than the
INT_MAX limit imposed by pg_atoi), so that we could use that constant
to size stack buffers in the places where you currently palloc?
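
A rough sketch of the kind of validation being asked about, written as standalone C rather than against the GUC machinery. MAX_SYNC_STANDBYS and parse_sync_standby_num are hypothetical names, the limit of 256 is just the example figure mentioned above, and strtol is used here in place of pg_atoi so the snippet compiles on its own.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical sanity limit; it could also size fixed stack buffers. */
#define MAX_SYNC_STANDBYS 256

/*
 * Parse the leading count of a standby list and validate it.
 *
 * token      - the first element of the list, e.g. "2"
 * num_listed - how many standby names follow it
 *
 * Returns the count, or -1 if the token is not a plain integer, is
 * negative, exceeds MAX_SYNC_STANDBYS, or exceeds the number of listed
 * standbys.
 */
int
parse_sync_standby_num(const char *token, int num_listed)
{
    char *end;
    long  value;

    errno = 0;
    value = strtol(token, &end, 10);

    if (errno != 0 || end == token || *end != '\0')
        return -1;              /* not a plain integer */
    if (value < 0 || value > MAX_SYNC_STANDBYS)
        return -1;              /* outside the sanity range */
    if (value > num_listed)
        return -1;              /* more sync standbys than names listed */

    return (int) value;
}
```

With a compile-time bound like this, a caller could declare `int sync_standbys[MAX_SYNC_STANDBYS];` on the stack instead of using palloc.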

Could 1-priority mode be inferred from the use of a non-number in the
leading position, and if so, does the mode concept even need to exist,
especially if SyncRepGetSyncLsnsOnePriority and
SyncRepGetSyncStandbysOnePriority aren't really needed either way? Is
there any difference in behaviour between the following
configurations? (Sorry if that particular question has already been
duked out in the long thread about GUCs.)

synchronous_replication_method = 1-priority
synchronous_standby_names = foo, bar

synchronous_replication_method = priority
synchronous_standby_names = 1, foo, bar
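
To illustrate what inferring the mode from the leading element could look like, here is a small sketch. It only shows the idea under discussion, not a decided design: infer_sync_standby_num is a hypothetical helper, and it assumes that an all-digit first element is always a count (so a standby whose name consists only of digits would be ambiguous).

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Decide how many synchronous standbys a list asks for by looking only
 * at its first element.
 *
 * If the first element consists solely of digits it is taken as the
 * count, and *first_name_index is set to 1 (names start at the second
 * element). Otherwise the list is treated as an old-style list of names
 * with an implied count of 1, and *first_name_index is set to 0.
 *
 * No range checking is done here; that is assumed to happen elsewhere.
 */
int
infer_sync_standby_num(const char *first_token, int *first_name_index)
{
    const char *p = first_token;

    if (*p == '\0')
    {
        *first_name_index = 0;
        return 1;
    }

    for (; *p != '\0'; p++)
    {
        if (!isdigit((unsigned char) *p))
        {
            *first_name_index = 0;  /* a name, so old-style list */
            return 1;
        }
    }

    *first_name_index = 1;
    return (int) strtol(first_token, NULL, 10);
}
```

Under this reading, both configurations above would resolve to a count of 1 with the names foo and bar.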

(Apologies for the missing leading whitespace in patch fragments
pasted above, it seems that my mail client has eaten it).

--
Thomas Munro
http://www.enterprisedb.com

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-12-23 03:15:05
Message-ID:	CAEepm=1vMLKTwwq6NToRGEyoKZVc7yrOOXxGTfmmusXMN6zxMg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Fri, Dec 18, 2015 at 7:38 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> [000-_multi_sync_replication_v3.patch]
>
> Hi Masahiko,
>
> I haven't tested this version of the patch but I have some comments on the code.
>
> +/* Is this wal sender considerable one? */
> +bool
> +SyncRepActiveListedWalSender(int num)
>
> Maybe "Is this wal sender managing a standby that is streaming and
> listed as a synchronous standby?"
>
> +/*
> + * Obtain three palloc'd arrays containing position of standbys currently
> + * considered as synchronous, and its length.
> + */
> +int
> +SyncRepGetSyncStandbys(int *sync_standbys)
>
> This comment seems to be out of date. I would say "Populate a
> caller-supplied array which much have enough space for ... Returns
> ...".
>
> +/*
> + * Obtain standby currently considered as synchronous using
> + * '1-priority' method.
> + */
> +int
> +SyncRepGetSyncStandbysOnePriority(int *sync_standbys)
> + ... code ...
>
> Why do we need a separate function and code path for this case? If
> you used SyncRepGetSyncStandbysPriority with a size of 1, should it
> not produce the same result in the same time complexity?
>
> +/*
> + * Obtain standby currently considered as synchronous using
> + * 'priority' method.
> + */
> +int
> +SyncRepGetSyncStandbysPriority(int *sync_standbys)
>
> I would say something more descriptive, maybe like this: "Populates a
> caller-supplied buffer with the walsnds indexes of the highest
> priority active synchronous standbys, up to the a limit of
> 'synchronous_standby_num'. The order of the results is undefined.
> Returns the number of results actually written."
>
> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
> above, then this function could be renamed to SyncRepGetSyncStandbys.
> I think it would be a tiny bit nicer if it also took a Size n argument
> along with the output buffer pointer.
>
> As for the body of that function (which I won't paste here), it
> contains an algorithm to find the top K elements in an array of N
> elements. It does that with a linear search through the top K seen so
> far for each value in the input array, so its worst case is O(KN)
> comparisons. Some of the sorting gurus on this list might have
> something to say about that but my take is that it seems fine for the
> tiny values of K and N that we're dealing with here, and it's nice
> that it doesn't need any space other than the output buffer, unlike
> some other top-K algorithms which would win for larger inputs.
>
> + /* Found sync standby */
>
> This comment would be clearer as "Found lowest priority standby, so replace it".
>
> + if (walsndloc->sync_standby_priority == priority &&
> + walsnd->sync_standby_priority < priority)
> + sync_standbys[j] = i;
>
> In this case, couldn't you also update 'priority' directly, and break
> out of the loop immediately?

Oops, I didn't think that through: you can't break from the loop, you
still need to find the new lowest priority, so I retract that bit.

> Wouldn't "lowest_priority" be a better
> variable name than "priority"? It might be good to say "lowest"
> rather than "highest" in the nearby comments, to be consistent with
> other parts of the code including the function name (lower priority
> number means higher priority!).
>
> +/*
> + * Obtain currently synced LSN: write and flush,
> + * using '1-prioirty' method.
>
> s/prioirty/priority/
>
> + */
> +bool
> +SyncRepGetSyncLsnsOnePriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>
> Similar to the earlier case, why have a special case for 1-priority?
> Wouldn't SyncRepGetSyncLsnsPriority produce the same result when is
> synchronous_standby_num == 1?
>
> +/*
> + * Obtain currently synced LSN: write and flush,
> + * using 'prioirty' method.
>
> s/prioirty/priority/
>
> +SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
> +{
> + int *sync_standbys = NULL;
> + int num_sync;
> + int i;
> + XLogRecPtr synced_write = InvalidXLogRecPtr;
> + XLogRecPtr synced_flush = InvalidXLogRecPtr;
> +
> + sync_standbys = (int *) palloc(sizeof(int) * synchronous_standby_num);
>
> Would a fixed size buffer on the stack (of compile time constant size)
> be better than palloc/free in here and elsewhere?
>
> + /*
> + for (i = 0; i < num_sync; i++)
> + {
> + volatile WalSnd *walsndloc = &WalSndCtl->walsnds[sync_standbys[i]];
> + if (walsndloc == MyWalSnd)
> + {
> + found = true;
> + break;
> + }
> + }
> + */
>
> Dead code.
>
> + if (synchronous_replication_method == SYNC_REP_METHOD_1_PRIORITY)
> + synchronous_standby_num = 1;
> + else
> + synchronous_standby_num = pg_atoi(lfirst(list_head(elemlist)),
> sizeof(int), 0);
>
> Should we detect if synchronous_standby_num > the number of listed
> servers, which would be a nonsensical configuration? Should we also
> impose some other kind of constant limits, like must be >= 0 (I
> haven't tried but I wonder if -1 leads to very large palloc) and must
> be <= MAX_XXX (smallish sanity check number like 256, rather than the
> INT_MAX limit imposed by pg_atoi), so that we could use that constant
> to size stack buffers in the places where you currently palloc?
>
> Could 1-priority mode be inferred from the use of a non-number in the
> leading position, and if so, does the mode concept even need to exist,
> especially if SyncRepGetSyncLsnsOnePriority and
> SyncRepGetSyncStandbysOnePriority aren't really needed either way? Is
> there any difference in behaviour between the following
> configurations? (Sorry if that particular question has already been
> duked out in the long thread about GUCs.)
>
> synchronous_replication_method = 1-priority
> synchronous_standby_names = foo, bar
>
> synchronous_replication_method = priority
> synchronous_standby_names = 1, foo, bar
>
> (Apologies for the missing leading whitespace in patch fragments
> pasted above, it seems that my mail client has eaten it).

--
Thomas Munro
http://www.enterprisedb.com

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-12-24 03:00:11
Message-ID:	CAB7nPqRT7A2Azb3NmD_Qzaj8faPJ3R_iue=hS5ErXy71RjV7YQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Dec 23, 2015 at 12:15 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Review stuff

I have moved this entry to the next CF as the review is quite recent.
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-12-24 19:50:16
Message-ID:	CAD21AoB_RYHpg5h_W1H9P9u+DRicDxhRGOAdybdKhf_hBuGMzw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Dec 23, 2015 at 8:45 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Fri, Dec 18, 2015 at 7:38 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> [000-_multi_sync_replication_v3.patch]
>>
>> Hi Masahiko,
>>
>> I haven't tested this version of the patch but I have some comments on the code.
>>
>> +/* Is this wal sender considerable one? */
>> +bool
>> +SyncRepActiveListedWalSender(int num)
>>
>> Maybe "Is this wal sender managing a standby that is streaming and
>> listed as a synchronous standby?"

Fixed.

>> +/*
>> + * Obtain three palloc'd arrays containing position of standbys currently
>> + * considered as synchronous, and its length.
>> + */
>> +int
>> +SyncRepGetSyncStandbys(int *sync_standbys)
>>
>> This comment seems to be out of date. I would say "Populate a
>> caller-supplied array which much have enough space for ... Returns
>> ...".

Fixed.

>> +/*
>> + * Obtain standby currently considered as synchronous using
>> + * '1-priority' method.
>> + */
>> +int
>> +SyncRepGetSyncStandbysOnePriority(int *sync_standbys)
>> + ... code ...
>>
>> Why do we need a separate function and code path for this case? If
>> you used SyncRepGetSyncStandbysPriority with a size of 1, should it
>> not produce the same result in the same time complexity?

I was thinking that we could add a new function named like
SyncRepGetSyncStandbysXXXXX (where XXXXX is the replication method
name) if we want to add more kinds of replication method.
That is why I included the replication method name in the function name.
But one function is enough for the two replication methods,
priority and 1-priority.

>> +/*
>> + * Obtain standby currently considered as synchronous using
>> + * 'priority' method.
>> + */
>> +int
>> +SyncRepGetSyncStandbysPriority(int *sync_standbys)
>>
>> I would say something more descriptive, maybe like this: "Populates a
>> caller-supplied buffer with the walsnds indexes of the highest
>> priority active synchronous standbys, up to the a limit of
>> 'synchronous_standby_num'. The order of the results is undefined.
>> Returns the number of results actually written."

Fixed.

>> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
>> above, then this function could be renamed to SyncRepGetSyncStandbys.
>> I think it would be a tiny bit nicer if it also took a Size n argument
>> along with the output buffer pointer.

Sorry, I could not get your point. The SyncRepGetSyncStandbysPriority()
function uses synchronous_standby_num, which is a global variable.
Do you mean that the number of synchronous standbys should be given as
a function argument?

>> As for the body of that function (which I won't paste here), it
>> contains an algorithm to find the top K elements in an array of N
>> elements. It does that with a linear search through the top K seen so
>> far for each value in the input array, so its worst case is O(KN)
>> comparisons. Some of the sorting gurus on this list might have
>> something to say about that but my take is that it seems fine for the
>> tiny values of K and N that we're dealing with here, and it's nice
>> that it doesn't need any space other than the output buffer, unlike
>> some other top-K algorithms which would win for larger inputs.

Yeah, that's a point for improvement.
But I assumed that the number of synchronous standbys is not
large, so I used this algorithm for the first version.
And I think that its worst case is O(K(N-K)). Am I missing something?

>> + /* Found sync standby */
>>
>> This comment would be clearer as "Found lowest priority standby, so replace it".

Fixed.

>> + if (walsndloc->sync_standby_priority == priority &&
>> + walsnd->sync_standby_priority < priority)
>> + sync_standbys[j] = i;
>>
>> In this case, couldn't you also update 'priority' directly, and break
>> out of the loop immediately?
>
> Oops, I didn't think that though: you can't break from the loop, you
> still need to find the new lowest priority, so I retract that bit.
>
>> Wouldn't "lowest_priority" be a better
>> variable name than "priority"? It might be good to say "lowest"
>> rather than "highest" in the nearby comments, to be consistent with
>> other parts of the code including the function name (lower priority
>> number means higher priority!).
>>
>> +/*
>> + * Obtain currently synced LSN: write and flush,
>> + * using '1-prioirty' method.
>>
>> s/prioirty/priority/
>>
>> + */
>> +bool
>> +SyncRepGetSyncLsnsOnePriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>>
>> Similar to the earlier case, why have a special case for 1-priority?
>> Wouldn't SyncRepGetSyncLsnsPriority produce the same result when is
>> synchronous_standby_num == 1?
>>
>> +/*
>> + * Obtain currently synced LSN: write and flush,
>> + * using 'prioirty' method.
>>
>> s/prioirty/priority/
>>
>> +SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>> +{
>> + int *sync_standbys = NULL;
>> + int num_sync;
>> + int i;
>> + XLogRecPtr synced_write = InvalidXLogRecPtr;
>> + XLogRecPtr synced_flush = InvalidXLogRecPtr;
>> +
>> + sync_standbys = (int *) palloc(sizeof(int) * synchronous_standby_num);
>>
>> Would a fixed size buffer on the stack (of compile time constant size)
>> be better than palloc/free in here and elsewhere?
>>
>> + /*
>> + for (i = 0; i < num_sync; i++)
>> + {
>> + volatile WalSnd *walsndloc = &WalSndCtl->walsnds[sync_standbys[i]];
>> + if (walsndloc == MyWalSnd)
>> + {
>> + found = true;
>> + break;
>> + }
>> + }
>> + */
>>
>> Dead code.
>>
>> + if (synchronous_replication_method == SYNC_REP_METHOD_1_PRIORITY)
>> + synchronous_standby_num = 1;
>> + else
>> + synchronous_standby_num = pg_atoi(lfirst(list_head(elemlist)),
>> sizeof(int), 0);

Fixed.

>> Should we detect if synchronous_standby_num > the number of listed
>> servers, which would be a nonsensical configuration? Should we also
>> impose some other kind of constant limits, like must be >= 0 (I
>> haven't tried but I wonder if -1 leads to very large palloc) and must
>> be <= MAX_XXX (smallish sanity check number like 256, rather than the
>> INT_MAX limit imposed by pg_atoi), so that we could use that constant
>> to size stack buffers in the places where you currently palloc?

Yeah, I added a validation check for s_s_num.
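
A minimal sketch of the kind of check being discussed, not the code from the patch. The function name check_sync_standby_num, the SyncNumCheck enum and the value 256 for SYNC_REP_MAX_SYNC_STANDBY_NUM are assumptions made for illustration (256 is only the "smallish sanity check number" floated in the review); the real patch reports problems through ereport() rather than a return code.

```c
#include <assert.h>

/* Assumed value: the review only suggests "smallish", e.g. 256. */
#define SYNC_REP_MAX_SYNC_STANDBY_NUM 256

/* Hypothetical result codes, used here instead of ereport(). */
typedef enum
{
    SYNC_NUM_OK,
    SYNC_NUM_OUT_OF_RANGE,  /* not between 1 and the compile-time maximum */
    SYNC_NUM_EXCEEDS_LIST   /* more sync standbys requested than listed */
} SyncNumCheck;

/*
 * Validate the requested number of synchronous standbys against the
 * compile-time maximum and against the number of names actually listed.
 */
static SyncNumCheck
check_sync_standby_num(int num, int listed)
{
    if (num < 1 || num > SYNC_REP_MAX_SYNC_STANDBY_NUM)
        return SYNC_NUM_OUT_OF_RANGE;
    if (num > listed)
        return SYNC_NUM_EXCEEDS_LIST;
    return SYNC_NUM_OK;
}
```

Rejecting negative values up front also addresses the concern that -1 might turn into a very large allocation.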

>> Could 1-priority mode be inferred from the use of a non-number in the
>> leading position, and if so, does the mode concept even need to exist,
>> especially if SyncRepGetSyncLsnsOnePriority and
>> SyncRepGetSyncStandbysOnePriority aren't really needed either way? Is
>> there any difference in behaviour between the following
>> configurations? (Sorry if that particular question has already been
>> duked out in the long thread about GUCs.)
>>
>> synchronous_replication_method = 1-priority
>> synchronous_standby_names = foo, bar
>>
>> synchronous_replication_method = priority
>> synchronous_standby_names = 1, foo, bar

The behaviour under both configurations is the same.
I added the '1-priority' method for backward compatibility. The default
value of s_r_method is '1-priority', so users who are already using sync
replication can continue to use it smoothly after upgrading.

>> (Apologies for the missing leading whitespace in patch fragments
>> pasted above, it seems that my mail client has eaten it).

No problem. Thank you for reviewing!

> I have moved this entry to next CF as review is quite recent.
Thanks!

Attached is the latest version of the patch.
Please review it.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v4.patch	application/octet-stream	17.2 KB

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2015-12-24 22:21:29
Message-ID:	CAEepm=1RaBeQprHw28cickrPzRmBpYP5Dn3u-jMigq_EcoHE3Q@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Dec 25, 2015 at 8:50 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Dec 23, 2015 at 8:45 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
>>> above, then this function could be renamed to SyncRepGetSyncStandbys.
>>> I think it would be a tiny bit nicer if it also took a Size n argument
>>> along with the output buffer pointer.
>
> Sorry, I could not get your point. SyncRepGetSyncStandbysPriority()
> function uses synchronous_standby_num which is global variable.
> But you mean that the number of synchronous standbys is given as
> function argument?

Yeah, I was thinking of it as the output buffer size which I would be
inclined to make more explicit (I am still coming to terms with the
use of global variables in Postgres) but it doesn't matter, please
disregard that suggestion.

>>> As for the body of that function (which I won't paste here), it
>>> contains an algorithm to find the top K elements in an array of N
>>> elements. It does that with a linear search through the top K seen so
>>> far for each value in the input array, so its worst case is O(KN)
>>> comparisons. Some of the sorting gurus on this list might have
>>> something to say about that but my take is that it seems fine for the
>>> tiny values of K and N that we're dealing with here, and it's nice
>>> that it doesn't need any space other than the output buffer, unlike
>>> some other top-K algorithms which would win for larger inputs.
>
> Yeah, it's improvement point.
> But I'm assumed that the number of synchronous replication is not
> large, so I use this algorithm as first version.
> And I think that its worst case is O(K(N-K)). Am I missing something?

You're right, I was dropping that detail, in the tradition of the
hand-wavy school of big-O notation. (I suppose you could skip the
inner loop when the priority is lower than the current lowest
priority, giving an O(N) best case when the walsenders are perfectly
ordered by coincidence. Probably a bad idea or just not worth
worrying about.)
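
For readers following along, here is a rough sketch of the selection loop under discussion, including the skip of the inner loop. It is not the code from the patch: the function name select_sync_standbys and the plain priority[] array (standing in for the walsender slots, with 0 meaning "not a candidate") are assumptions for illustration. A lower priority number means a higher priority, as in the patch.

```c
#include <assert.h>

/*
 * Fill out[] with the indexes of up to k candidates having the smallest
 * positive priority numbers. The order of the results is undefined.
 * Returns the number of results written.
 */
static int
select_sync_standbys(const int *priority, int n, int *out, int k)
{
    int count = 0;        /* entries written to out[] so far */
    int lowest_pos = -1;  /* slot in out[] holding the lowest priority */
    int i;
    int j;

    assert(n >= 0);
    if (k <= 0)
        return 0;

    for (i = 0; i < n; i++)
    {
        if (priority[i] <= 0)
            continue;           /* not an active, listed standby */

        if (count < k)
        {
            /* Buffer not full yet: append and track the lowest priority. */
            out[count] = i;
            if (lowest_pos < 0 || priority[i] > priority[out[lowest_pos]])
                lowest_pos = count;
            count++;
            continue;
        }

        /* Buffer full: skip the inner loop unless this entry beats the worst. */
        if (priority[i] >= priority[out[lowest_pos]])
            continue;

        /* Replace the lowest-priority entry, then rescan for the new lowest. */
        out[lowest_pos] = i;
        lowest_pos = 0;
        for (j = 1; j < k; j++)
        {
            if (priority[out[j]] > priority[out[lowest_pos]])
                lowest_pos = j;
        }
    }
    return count;
}
```

The inner rescan runs only after the first K slots are filled and only when a replacement happens, which is where the O(K(N-K)) worst case comes from. When candidates arrive already ordered by priority no replacement ever happens, which gives the O(N) best case.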

> Attached latest version patch.

+/*
+ * Obtain currently synced LSN location: write and flush, using priority
- * In 9.1 we support only a single synchronous standby, chosen from a
- * priority list of synchronous_standby_names. Before it can become the
+ * In 9.6 we support multiple synchronous standby, chosen from a priority

s/standby/standbys/

+ * list of synchronous_standby_names. Before it can become the

s/Before it can become the/Before any standby can become a/

* synchronous standby it must have caught up with the primary; that may
* take some time. Once caught up, the current highest priority standby

s/standby/standbys/

* will release waiters from the queue.

+bool
+SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
+{
+ int sync_standbys[synchronous_standby_num];

I think this should be sync_standbys[SYNC_REP_MAX_SYNC_STANDBY_NUM].
(Variable sized arrays are a feature of C99 and PostgreSQL is written
in C89.)
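
A small sketch of the fixed-size alternative, under stated assumptions: the value 256 for SYNC_REP_MAX_SYNC_STANDBY_NUM is a placeholder, and fill_sync_standbys and count_sync_standbys are hypothetical stand-ins for the patch's functions, written only to show the buffer declaration.

```c
#include <assert.h>

/* Assumed upper bound; the thread names the constant but not its value. */
#define SYNC_REP_MAX_SYNC_STANDBY_NUM 256

/* Stand-in for the GUC-derived global variable discussed in the thread. */
static int synchronous_standby_num = 2;

/* Hypothetical filler: writes synchronous_standby_num entries into the buffer. */
static int
fill_sync_standbys(int *sync_standbys)
{
    int i;

    for (i = 0; i < synchronous_standby_num; i++)
        sync_standbys[i] = i;
    return synchronous_standby_num;
}

static int
count_sync_standbys(void)
{
    /* C89-compatible: the array size is a compile-time constant, not a VLA. */
    int sync_standbys[SYNC_REP_MAX_SYNC_STANDBY_NUM];

    /* Safe only if the configured value was validated against the maximum. */
    assert(synchronous_standby_num >= 0);
    assert(synchronous_standby_num <= SYNC_REP_MAX_SYNC_STANDBY_NUM);

    return fill_sync_standbys(sync_standbys);
}
```

The stack buffer is only safe if the configured number was range-checked when the setting was parsed, which is why the constant limit and the validation go together.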

+/*
+ * Populate a caller-supplied array which much have enough space for
+ * synchronous_standby_num. Returns position of standbys currently
+ * considered as synchronous, and its length.
+ */
+int
+SyncRepGetSyncStandbys(int *sync_standbys)

s/much/must/ (my bad, in previous email).

+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("The number of synchronous standbys must be smaller than the
number of listed : %d",
+ synchronous_standby_num)));

How about "the number of synchronous standbys exceeds the length of
the standby list: %d"? Error messages usually start with lower case,
':' is not usually preceded by a space.

+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("The number of synchronous standbys must be between 1 and %d : %d",

s/The/the/, s/ : /: /

--
Thomas Munro
http://www.enterprisedb.com

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-03 13:26:44
Message-ID:	CAD21AoCCaekkm7uHTkh=LzEmKFNbrWkUVZCbyYGYxjwwLENx6w@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Dec 25, 2015 at 7:21 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Fri, Dec 25, 2015 at 8:50 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Wed, Dec 23, 2015 at 8:45 AM, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
>>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>>> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
>>>> above, then this function could be renamed to SyncRepGetSyncStandbys.
>>>> I think it would be a tiny bit nicer if it also took a Size n argument
>>>> along with the output buffer pointer.
>>
>> Sorry, I could not get your point. SyncRepGetSyncStandbysPriority()
>> function uses synchronous_standby_num which is global variable.
>> But you mean that the number of synchronous standbys is given as
>> function argument?
>
> Yeah, I was thinking of it as the output buffer size which I would be
> inclined to make more explicit (I am still coming to terms with the
> use of global variables in Postgres) but it doesn't matter, please
> disregard that suggestion.
>
>>>> As for the body of that function (which I won't paste here), it
>>>> contains an algorithm to find the top K elements in an array of N
>>>> elements. It does that with a linear search through the top K seen so
>>>> far for each value in the input array, so its worst case is O(KN)
>>>> comparisons. Some of the sorting gurus on this list might have
>>>> something to say about that but my take is that it seems fine for the
>>>> tiny values of K and N that we're dealing with here, and it's nice
>>>> that it doesn't need any space other than the output buffer, unlike
>>>> some other top-K algorithms which would win for larger inputs.
>>
>> Yeah, it's improvement point.
>> But I'm assumed that the number of synchronous replication is not
>> large, so I use this algorithm as first version.
>> And I think that its worst case is O(K(N-K)). Am I missing something?
>
> You're right, I was dropping that detail, in the tradition of the
> hand-wavy school of big-O notation. (I suppose you could skip the
> inner loop when the priority is lower than the current lowest
> priority, giving a O(N) best case when the walsenders are perfectly
> ordered by coincidence. Probably a bad idea or just not worth
> worrying about.)

Thank you for reviewing the patch.
Yeah, I added the logic that skips the inner loop.

>
>> Attached latest version patch.
>
> +/*
> + * Obtain currently synced LSN location: write and flush, using priority
> - * In 9.1 we support only a single synchronous standby, chosen from a
> - * priority list of synchronous_standby_names. Before it can become the
> + * In 9.6 we support multiple synchronous standby, chosen from a priority
>
> s/standby/standbys/
>
> + * list of synchronous_standby_names. Before it can become the
>
> s/Before it can become the/Before any standby can become a/
>
> * synchronous standby it must have caught up with the primary; that may
> * take some time. Once caught up, the current highest priority standby
>
> s/standby/standbys/
>
> * will release waiters from the queue.
>
> +bool
> +SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
> +{
> + int sync_standbys[synchronous_standby_num];
>
> I think this should be sync_standbys[SYNC_REP_MAX_SYNC_STANDBY_NUM].
> (Variable sized arrays are a feature of C99 and PostgreSQL is written
> in C89.)
>
> +/*
> + * Populate a caller-supplied array which much have enough space for
> + * synchronous_standby_num. Returns position of standbys currently
> + * considered as synchronous, and its length.
> + */
> +int
> +SyncRepGetSyncStandbys(int *sync_standbys)
>
> s/much/must/ (my bad, in previous email).
>
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("The number of synchronous standbys must be smaller than the
> number of listed : %d",
> + synchronous_standby_num)));
>
> How about "the number of synchronous standbys exceeds the length of
> the standby list: %d"? Error messages usually start with lower case,
> ':' is not usually preceded by a space.
>
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("The number of synchronous standbys must be between 1 and %d : %d",
>
> s/The/the/, s/ : /: /

Fixed the points you mentioned.

Attached is the latest v5 patch.
Please review it.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v5.patch	text/x-patch	17.3 KB

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-04 06:29:34
Message-ID:	CAB7nPqTp5RoHxcp8YxejGMjRjjtLaXCa8=-BEr7ZnBNbPzPdWA@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jan 3, 2016 at 10:26 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Fixed you mentioned.
>
> Attached latest v5 patch.
> Please review it.

Something that I find rather scary with this patch: could it be
possible to get actual regression tests now that there is more
machinery with PostgresNode.pm? As syncrep code paths get more and
more complex, so do debugging and maintenance.
--
Michael

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, masao(dot)fujii(at)gmail(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, robertmhaas(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-08 04:53:22
Message-ID:	20160108.135322.121365974.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Mon, 4 Jan 2016 15:29:34 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqTp5RoHxcp8YxejGMjRjjtLaXCa8=-BEr7ZnBNbPzPdWA(at)mail(dot)gmail(dot)com>
> > Attached latest v5 patch.
> > Please review it.
>
> Something that I find rather scary with this patch: could it be
> possible to get actual regression tests now that there is more
> machinery with PostgresNode.pm? As syncrep code paths get more and
> more complex, so are debugging and maintenance.

A test on the whole replication system is very likely to be
too complex and hard to stabilize, and would be
disproportionately large compared to other tests.

This patch mainly changes the logic that chooses the next syncrep
standbys and calculates the 'synced' LSNs. Testing that logic
separately at the module level, and then testing the resulting
behavior with, perhaps, PostgresNode.pm, would remarkably reduce
the labor of testing.

Could we have some tapping point for testing the logic
individually in an appropriate way?

In order to do so, it should be possible to feed the logic an
arbitrary, complete set of parameters; in other words, we would
define a kind of API for using the logic from the core side, even
though it is not an extension. Then we would *somehow* kick the API
with some set of parameters in the regression tests.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	sawada(dot)mshk(at)gmail(dot)com, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, memissemerson(at)gmail(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, amit(dot)kapila16(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-08 06:51:03
Message-ID:	CAB7nPqT9rsxyV1UTiW-1o-O5BTTwxTj3uGmnuL-Byj_bZ3OCdg@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Jan 8, 2016 at 1:53 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Mon, 4 Jan 2016 15:29:34 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqTp5RoHxcp8YxejGMjRjjtLaXCa8=-BEr7ZnBNbPzPdWA(at)mail(dot)gmail(dot)com>
>> > Attached latest v5 patch.
>> > Please review it.
>>
>> Something that I find rather scary with this patch: could it be
>> possible to get actual regression tests now that there is more
>> machinery with PostgresNode.pm? As syncrep code paths get more and
>> more complex, so are debugging and maintenance.
>
> The test on the whole replication system will very likely to be
> too complex and hard to stabilize, and would be
> disproportionately large to other tests.

I don't buy that much. Mind you, there is in this commit fest a patch
introducing a basic regression test suite for recovery using the new
infrastructure that was committed last month. You may want to
look at it.

> This patch mainly changes the logic to choose the next syncrep
> standbys and calculate the 'synched' LSNs, so performing separate
> module tests for the logics, then perform the test for the
> behavior according to the result of that by, perhaps,
> PostgresNode.pm would remarkably reduce the labor for
> testing.
> Could we have some tapping point for individual testing of the
> logics in appropriate way?

Isn't pg_stat_replication enough for this purpose? What you basically
need to do is set up a master, a set of slaves and then look at the
WAL sender status. Am I getting that wrong?

> In order to do so, the logics should be able to be fed arbitrary
> complete set of parameters, in other words, defining a kind of
> API to use the logics from the core side, even though it is not
> an extension. Then we will *somehow* kick the API with some set
> of parameters in regest.

Well, in the syncrep test suite associated with this patch you will
need to craft a set of routines that appropriately set up
s_s_names and the other parameters that this patch introduces. It does
not sound like a barrier impossible to cross.
--
Michael

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, sawada(dot)mshk(at)gmail(dot)com, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, memissemerson(at)gmail(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, amit(dot)kapila16(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-12 16:54:28
Message-ID:	20160112165428.GA828165@alvherre.pgsql
Lists:	pgsql-hackers

Michael Paquier wrote:
> On Fri, Jan 8, 2016 at 1:53 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Hello,
> >
> > At Mon, 4 Jan 2016 15:29:34 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqTp5RoHxcp8YxejGMjRjjtLaXCa8=-BEr7ZnBNbPzPdWA(at)mail(dot)gmail(dot)com>
> >>
> >> Something that I find rather scary with this patch: could it be
> >> possible to get actual regression tests now that there is more
> >> machinery with PostgresNode.pm? As syncrep code paths get more and
> >> more complex, so are debugging and maintenance.
> >
> > The test on the whole replication system will very likely to be
> > too complex and hard to stabilize, and would be
> > disproportionately large to other tests.
>
> I don't buy that much. Mind you, there is in this commit fest a patch
> introducing a basic regression test suite for recovery using the new
> infrastructure that has been committed last month. You may want to
> look at it.

Kyotaro, please have a look at this patch:
https://commitfest.postgresql.org/8/438/
which is the recovery test framework Michael is talking about. Is it
possible to use that framework to write tests for this feature? If so,
then my preferred course of action would be to commit that patch and
then introduce in this patch some additional tests for the N-sync-standby
feature. Can you please have a look at the test framework patch and
provide your feedback on how usable it is for this?

Thanks,

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-17 14:09:22
Message-ID:	CAD21AoAczcg8kvBFK2gYd1M3j1SUtUKO=4QKs+qs9-_agJHdWQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Michael Paquier wrote:
>> On Fri, Jan 8, 2016 at 1:53 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Hello,
>> >
>> > At Mon, 4 Jan 2016 15:29:34 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqTp5RoHxcp8YxejGMjRjjtLaXCa8=-BEr7ZnBNbPzPdWA(at)mail(dot)gmail(dot)com>
>> >>
>> >> Something that I find rather scary with this patch: could it be
>> >> possible to get actual regression tests now that there is more
>> >> machinery with PostgresNode.pm? As syncrep code paths get more and
>> >> more complex, so are debugging and maintenance.
>> >
>> > The test on the whole replication system will very likely to be
>> > too complex and hard to stabilize, and would be
>> > disproportionately large to other tests.
>>
>> I don't buy that much. Mind you, there is in this commit fest a patch
>> introducing a basic regression test suite for recovery using the new
>> infrastructure that has been committed last month. You may want to
>> look at it.
>
> Kyotaro, please have a look at this patch:
> https://commitfest.postgresql.org/8/438/
> which is the recovery test framework Michael is talking about. Is it
> possible to use that framework to write tests for this feature? If so,
> then my preferred course of action would be to commit that patch and
> then introduce in this patch some additional tests for the N-sync-standby
> feature. Can you please have a look at the test framework patch and
> provide your feedback on how usable it is for this?
>

I had a look at that patch.
I'm planning to have at least the following tests for multiple synchronous
replication.

* Confirm the value of pg_stat_replication.sync_state (sync, async or potential)
* Confirm that the data is synchronously replicated to multiple
  standbys in the following cases:
  * case 1 : A standby that is not listed in s_s_names is down
  * case 2 : A standby that is listed in s_s_names but is only a potential
    standby is down
  * case 3 : A standby that is considered a sync standby is down
* Standby promotion

In order to confirm that in case #3 the commit never completes
unless a new sync standby comes up, I think we need a framework that
can cancel an executing query.
That is, what I'm planning is:

1. Set up the master server (s_s_names = '2, standby1, standby2')
2. Set up two standby servers
3. Standby1 goes down
4. Create some content on the master (the transaction does not get committed)
5. Cancel the #4 query. (Also confirm that only the flush location of
   standby2 makes progress)
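
The expected values for these checks can be worked out without a running
cluster. Below is a minimal sketch in C of the classification the test plan
implies, assuming a priority-ordered list and a required count as in
'2, standby1, standby2'. SyncState, name_in(), expected_sync_state() and
commit_would_wait() are hypothetical names for illustration only and are not
part of the patch. The rule that a commit waits while fewer than the required
number of listed standbys are connected is an assumption taken from case 3.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model for illustration; none of these names exist in PostgreSQL. */
typedef enum
{
    STATE_DOWN,         /* not connected, so no row in pg_stat_replication */
    STATE_ASYNC,        /* connected but not listed in s_s_names */
    STATE_POTENTIAL,    /* listed, but enough higher-priority standbys are up */
    STATE_SYNC          /* among the first num_sync connected listed standbys */
} SyncState;

/* Return 1 if name appears in names[0..n-1], else 0. */
int
name_in(const char *name, const char *const *names, int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (strcmp(names[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Expected sync_state of one standby, given the priority-ordered list from
 * s_s_names, the number of sync standbys required, and the standbys that are
 * currently connected.
 */
SyncState
expected_sync_state(const char *name,
                    const char *const *listed, int nlisted, int num_sync,
                    const char *const *connected, int nconnected)
{
    int i;
    int ahead = 0;      /* connected standbys with higher priority */

    assert(num_sync >= 1);

    if (!name_in(name, connected, nconnected))
        return STATE_DOWN;

    for (i = 0; i < nlisted; i++)
    {
        if (strcmp(listed[i], name) == 0)
            return (ahead < num_sync) ? STATE_SYNC : STATE_POTENTIAL;
        if (name_in(listed[i], connected, nconnected))
            ahead++;
    }
    return STATE_ASYNC; /* connected but not listed */
}

/* Return 1 if fewer than num_sync listed standbys are connected. */
int
commit_would_wait(const char *const *listed, int nlisted, int num_sync,
                  const char *const *connected, int nconnected)
{
    int i;
    int up = 0;

    for (i = 0; i < nlisted; i++)
        if (name_in(listed[i], connected, nconnected))
            up++;
    return up < num_sync;
}
```

With the list standby1, standby2, standby3 and a count of 2, this model gives
sync for standby1 and standby2, potential for standby3, and async for an
unlisted standby. In the two-standby setup of the numbered plan, with only
standby2 connected, commit_would_wait() returns 1.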

Regards,

--
Masahiko Sawada

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-18 04:20:52
Message-ID:	CAB7nPqQtJJiVAz9-X-wQjN5a2129+F8s4Tv1i5C-eNZ02Srv0w@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jan 17, 2016 at 11:09 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera wrote:
> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
> * Confirm that the data is synchronously replicated to multiple
> standbys in same cases.
> * case 1 : The standby which is not listed in s_s_name, is down
> * case 2 : The standby which is listed in s_s_names but potential
> standby, is down
> * case 3 : The standby which is considered as sync standby, is down.
> * Standby promotion
>
> In order to confirm that the commit isn't done in case #3 forever
> unless new sync standby is up, I think we need the framework that
> cancels executing query.
> That is, what I'm planning is,
> 1. Set up master server (s_s_name = '2, standby1, standby2)
> 2. Set up two standby servers
> 3. Standby1 is down
> 4. Create some contents on master (But transaction is not committed)
> 5. Cancel the #4 query. (Also confirm that the flush location of only
> standby2 makes progress)

This will need some thinking and is not as easy as it sounds. There is
no way to hold on to a connection after executing a query in the current
TAP infrastructure. You only mention case 3, but cases 1 and 2 actually
have the same need: if there is a failure, we do not want to be stuck in
the test forever, so we need a way to cancel a query execution at will.
TAP uses psql -c to execute any SQL queries, but we would need something
far lower-level, which would basically mean using the Perl driver for
Postgres or an equivalent here.

Honestly, for those tests I just thought that we could get something
reliable by looking at how each sync replication setup is reflected
in pg_stat_replication. The flow is getting really complicated, so
giving the user a clear representation at SQL level of what is
actually occurring in the server, depending on the configuration used,
is important here.
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-18 16:40:04
Message-ID:	CAD21AoBFJa74VS+fJ_f4Mp+JzGBWwLSWZc26poMKtbUnyayKTg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jan 18, 2016 at 1:20 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Sun, Jan 17, 2016 at 11:09 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera wrote:
>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>> * Confirm that the data is synchronously replicated to multiple
>> standbys in same cases.
>> * case 1 : The standby which is not listed in s_s_name, is down
>> * case 2 : The standby which is listed in s_s_names but potential
>> standby, is down
>> * case 3 : The standby which is considered as sync standby, is down.
>> * Standby promotion
>>
>> In order to confirm that the commit isn't done in case #3 forever
>> unless new sync standby is up, I think we need the framework that
>> cancels executing query.
>> That is, what I'm planning is,
>> 1. Set up master server (s_s_name = '2, standby1, standby2)
>> 2. Set up two standby servers
>> 3. Standby1 is down
>> 4. Create some contents on master (But transaction is not committed)
>> 5. Cancel the #4 query. (Also confirm that the flush location of only
>> standby2 makes progress)
>
> This will need some thinking and is not as easy as it sounds. There is
> no way to hold on a connection after executing a query in the current
> TAP infrastructure. You are just mentioning case 3, but actually cases
> 1 and 2 are falling into the same need: if there is a failure we want
> to be able to not be stuck in the test forever and have a way to
> cancel a query execution at will. TAP uses psql -c to execute any sql
> queries, but we would need something that is far lower-level, and that
> would be basically using the perl driver for Postgres or an equivalent
> here.
>
> Honestly for those tests I just thought that we could get to something
> reliable by just looking at how each sync replication setup reflects
> in pg_stat_replication as the flow is really getting complicated,
> giving to the user a clear representation at SQL level of what is
> actually occurring in the server depending on the configuration used
> being important here.

I see.
We could check the transition of sync_state in pg_stat_replication.
I think that means testing the state switching of each replication
method rather than the synchronization of replication itself.

What I'm planning to have is:
* Confirm the value of pg_stat_replication.sync_state (sync, async or potential)
* Standby promotion
* Standby catching up with the master
Each replication method would have the above tests.

Are these enough?

Regards,

--
Masahiko Sawada

From:	Thom Brown <thom(at)linux(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-18 16:52:42
Message-ID:	CAA-aLv5tm0OR=V_YX2xQHDDjLWfsC33e-W7i1W6HY==38C7kbg@mail.gmail.com
Lists:	pgsql-hackers

On 3 January 2016 at 13:26, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Fri, Dec 25, 2015 at 7:21 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Fri, Dec 25, 2015 at 8:50 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Wed, Dec 23, 2015 at 8:45 AM, Thomas Munro
>>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>>> On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
>>>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>>>> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
>>>>> above, then this function could be renamed to SyncRepGetSyncStandbys.
>>>>> I think it would be a tiny bit nicer if it also took a Size n argument
>>>>> along with the output buffer pointer.
>>>
>>> Sorry, I could not get your point. SyncRepGetSyncStandbysPriority()
>>> function uses synchronous_standby_num which is global variable.
>>> But you mean that the number of synchronous standbys is given as
>>> function argument?
>>
>> Yeah, I was thinking of it as the output buffer size which I would be
>> inclined to make more explicit (I am still coming to terms with the
>> use of global variables in Postgres) but it doesn't matter, please
>> disregard that suggestion.
>>
>>>>> As for the body of that function (which I won't paste here), it
>>>>> contains an algorithm to find the top K elements in an array of N
>>>>> elements. It does that with a linear search through the top K seen so
>>>>> far for each value in the input array, so its worst case is O(KN)
>>>>> comparisons. Some of the sorting gurus on this list might have
>>>>> something to say about that but my take is that it seems fine for the
>>>>> tiny values of K and N that we're dealing with here, and it's nice
>>>>> that it doesn't need any space other than the output buffer, unlike
>>>>> some other top-K algorithms which would win for larger inputs.
>>>
>>> Yeah, it's improvement point.
>>> But I'm assumed that the number of synchronous replication is not
>>> large, so I use this algorithm as first version.
>>> And I think that its worst case is O(K(N-K)). Am I missing something?
>>
>> You're right, I was dropping that detail, in the tradition of the
>> hand-wavy school of big-O notation. (I suppose you could skip the
>> inner loop when the priority is lower than the current lowest
>> priority, giving a O(N) best case when the walsenders are perfectly
>> ordered by coincidence. Probably a bad idea or just not worth
>> worrying about.)
>
> Thank you for reviewing the patch.
> Yeah, I added the logic that skip the inner loop.
>
>>
>>> Attached latest version patch.
>>
>> +/*
>> + * Obtain currently synced LSN location: write and flush, using priority
>> - * In 9.1 we support only a single synchronous standby, chosen from a
>> - * priority list of synchronous_standby_names. Before it can become the
>> + * In 9.6 we support multiple synchronous standby, chosen from a priority
>>
>> s/standby/standbys/
>>
>> + * list of synchronous_standby_names. Before it can become the
>>
>> s/Before it can become the/Before any standby can become a/
>>
>> * synchronous standby it must have caught up with the primary; that may
>> * take some time. Once caught up, the current highest priority standby
>>
>> s/standby/standbys/
>>
>> * will release waiters from the queue.
>>
>> +bool
>> +SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>> +{
>> + int sync_standbys[synchronous_standby_num];
>>
>> I think this should be sync_standbys[SYNC_REP_MAX_SYNC_STANDBY_NUM].
>> (Variable sized arrays are a feature of C99 and PostgreSQL is written
>> in C89.)
>>
>> +/*
>> + * Populate a caller-supplied array which much have enough space for
>> + * synchronous_standby_num. Returns position of standbys currently
>> + * considered as synchronous, and its length.
>> + */
>> +int
>> +SyncRepGetSyncStandbys(int *sync_standbys)
>>
>> s/much/must/ (my bad, in previous email).
>>
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + errmsg("The number of synchronous standbys must be smaller than the
>> number of listed : %d",
>> + synchronous_standby_num)));
>>
>> How about "the number of synchronous standbys exceeds the length of
>> the standby list: %d"? Error messages usually start with lower case,
>> ':' is not usually preceded by a space.
>>
>> + ereport(ERROR,
>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + errmsg("The number of synchronous standbys must be between 1 and %d : %d",
>>
>> s/The/the/, s/ : /: /
>
> Fixed you mentioned.
>
> Attached latest v5 patch.
> Please review it.
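
For reference, the top-K selection described in the quoted review can be
sketched as follows. This is only an illustration under stated assumptions,
not the code from the patch: select_top_k() and MAX_SYNC_STANDBYS are
hypothetical names, a priority of 0 is assumed to mean "not a sync
candidate", and a smaller positive number is assumed to mean a higher
priority. The sketch keeps the best K seen so far sorted in the caller's
buffer, skips the inner loop when the buffer is full and the new entry cannot
rank, and relies on a fixed upper bound so the caller does not need a
variable-length array.

```c
#include <assert.h>

/* Hypothetical upper bound on the number of sync standbys. */
#define MAX_SYNC_STANDBYS 8

/*
 * Store in out[] the indexes of the k highest-priority candidates among
 * priority[0..n-1], best first, and return how many were found.
 * out[] must have room for at least k entries.
 */
int
select_top_k(const int *priority, int n, int k, int *out)
{
    int count = 0;
    int i;
    int j;

    assert(k >= 1 && k <= MAX_SYNC_STANDBYS);

    for (i = 0; i < n; i++)
    {
        if (priority[i] <= 0)
            continue;           /* not a sync candidate */

        /* Buffer is full and this entry ranks no better than the worst kept. */
        if (count == k && priority[i] >= priority[out[count - 1]])
            continue;

        /* Slot to fill: the next free one, or the worst one if full. */
        j = (count < k) ? count : k - 1;

        /* Linear search from the end, shifting worse entries down by one. */
        while (j > 0 && priority[out[j - 1]] > priority[i])
        {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = i;

        if (count < k)
            count++;
    }
    return count;
}
```

A caller would declare int out[MAX_SYNC_STANDBYS] and pass the number of
sync standbys as k. For priorities {0, 3, 1, 0, 2, 5} and k = 2 the buffer
ends up holding indexes 2 and 4.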

synchronous_standby_num doesn't appear to be a valid GUC name:

LOG: unrecognized configuration parameter "synchronous_standby_num"
in file "/home/thom/Development/test/primary/postgresql.conf" line 244

All I did was uncomment it and set it to a value.

Thom

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-19 05:55:22
Message-ID:	CAB7nPqRaCyO99tgBuZ4Ox6Abg85+mCTG8o_2Y4a_2rtEa9bhkw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 19, 2016 at 1:40 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Mon, Jan 18, 2016 at 1:20 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Sun, Jan 17, 2016 at 11:09 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera wrote:
>>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>>> * Confirm that the data is synchronously replicated to multiple
>>> standbys in same cases.
>>> * case 1 : The standby which is not listed in s_s_name, is down
>>> * case 2 : The standby which is listed in s_s_names but potential
>>> standby, is down
>>> * case 3 : The standby which is considered as sync standby, is down.
>>> * Standby promotion
>>>
>>> In order to confirm that the commit isn't done in case #3 forever
>>> unless new sync standby is up, I think we need the framework that
>>> cancels executing query.
>>> That is, what I'm planning is,
>>> 1. Set up master server (s_s_name = '2, standby1, standby2)
>>> 2. Set up two standby servers
>>> 3. Standby1 is down
>>> 4. Create some contents on master (But transaction is not committed)
>>> 5. Cancel the #4 query. (Also confirm that the flush location of only
>>> standby2 makes progress)
>>
>> This will need some thinking and is not as easy as it sounds. There is
>> no way to hold on a connection after executing a query in the current
>> TAP infrastructure. You are just mentioning case 3, but actually cases
>> 1 and 2 are falling into the same need: if there is a failure we want
>> to be able to not be stuck in the test forever and have a way to
>> cancel a query execution at will. TAP uses psql -c to execute any sql
>> queries, but we would need something that is far lower-level, and that
>> would be basically using the perl driver for Postgres or an equivalent
>> here.
>>
>> Honestly for those tests I just thought that we could get to something
>> reliable by just looking at how each sync replication setup reflects
>> in pg_stat_replication as the flow is really getting complicated,
>> giving to the user a clear representation at SQL level of what is
>> actually occurring in the server depending on the configuration used
>> being important here.
>
> I see.
> We could check the transition of sync_state in pg_stat_replication.
> I think it means that it tests for each replication method (switching
> state) rather than synchronization of replication.
>
> What I'm planning to have are,
> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
> * Standby promotion
> * Standby catching up master
> And each replication method has above tests.
>
> Are these enough?

Does promoting the standby and checking that it caught up really have
value in the context of this patch? What we just want to know is, on a
master, which nodes need to be waited for when s_s_names or any other
method is used, no?
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Thom Brown <thom(at)linux(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-20 05:35:02
Message-ID:	CAD21AoDAeErHj+v-gJzmyi9P+AY=Ds_54MfpJJhngnkETTntFw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 19, 2016 at 1:52 AM, Thom Brown <thom(at)linux(dot)com> wrote:
> synchronous_standby_num doesn't appear to be a valid GUC name:
>
> LOG: unrecognized configuration parameter "synchronous_standby_num"
> in file "/home/thom/Development/test/primary/postgresql.conf" line 244
>
> All I did was uncomment it and set it to a value.
>

Thank you for having a look at it.

Yeah, synchronous_standby_num should not exist in postgresql.conf.
Please test multiple sync replication with the latest patch.

Regards,

--
Masahiko Sawada

Attachment: 000_multi_sync_replication_v6.patch (binary/octet-stream, 17.3 KB)

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-20 06:05:01
Message-ID:	CAD21AoCtpS6Pz4BqjnTp_hhp6ioVCzp0_7Db9HVH=JKtRAYDMw@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 19, 2016 at 2:55 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Tue, Jan 19, 2016 at 1:40 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Mon, Jan 18, 2016 at 1:20 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On Sun, Jan 17, 2016 at 11:09 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> On Wed, Jan 13, 2016 at 1:54 AM, Alvaro Herrera wrote:
>>>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>>>> * Confirm that the data is synchronously replicated to multiple
>>>> standbys in same cases.
>>>> * case 1 : The standby which is not listed in s_s_name, is down
>>>> * case 2 : The standby which is listed in s_s_names but potential
>>>> standby, is down
>>>> * case 3 : The standby which is considered as sync standby, is down.
>>>> * Standby promotion
>>>>
>>>> In order to confirm that the commit isn't done in case #3 forever
>>>> unless new sync standby is up, I think we need the framework that
>>>> cancels executing query.
>>>> That is, what I'm planning is,
>>>> 1. Set up master server (s_s_name = '2, standby1, standby2)
>>>> 2. Set up two standby servers
>>>> 3. Standby1 is down
>>>> 4. Create some contents on master (But transaction is not committed)
>>>> 5. Cancel the #4 query. (Also confirm that the flush location of only
>>>> standby2 makes progress)
>>>
>>> This will need some thinking and is not as easy as it sounds. There is
>>> no way to hold on a connection after executing a query in the current
>>> TAP infrastructure. You are just mentioning case 3, but actually cases
>>> 1 and 2 are falling into the same need: if there is a failure we want
>>> to be able to not be stuck in the test forever and have a way to
>>> cancel a query execution at will. TAP uses psql -c to execute any sql
>>> queries, but we would need something that is far lower-level, and that
>>> would be basically using the perl driver for Postgres or an equivalent
>>> here.
>>>
>>> Honestly for those tests I just thought that we could get to something
>>> reliable by just looking at how each sync replication setup reflects
>>> in pg_stat_replication as the flow is really getting complicated,
>>> giving to the user a clear representation at SQL level of what is
>>> actually occurring in the server depending on the configuration used
>>> being important here.
>>
>> I see.
>> We could check the transition of sync_state in pg_stat_replication.
>> I think it means that it tests for each replication method (switching
>> state) rather than synchronization of replication.
>>
>> What I'm planning to have are,
>> * Confirm value of pg_stat_replication.sync_state (sync, async or potential)
>> * Standby promotion
>> * Standby catching up master
>> And each replication method has above tests.
>>
>> Are these enough?
>
> Does promoting the standby and checking that it caught really have
> value in this context of this patch? What we just want to know is on a
> master, which nodes need to be waited for when s_s_names or any other
> method is used, no?

Yeah, these 2 tests are outside the context of this patch.
If the test framework had a facility that allowed us to execute a
query (psql) as another process, we could call pg_cancel_backend()
on the waiting process while the master server is waiting for standbys.
In order to check whether the master server waits for the standby
or not, we need the test framework to have such a facility, I think.

Regards,

--
Masahiko Sawada

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-28 11:05:28
Message-ID:	CAHGQGwHc838dvraSD2MynnpnVUpwrGWxRUmu7_oe8YaUSegPHQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 20, 2016 at 2:35 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Jan 19, 2016 at 1:52 AM, Thom Brown <thom(at)linux(dot)com> wrote:
>> On 3 January 2016 at 13:26, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Fri, Dec 25, 2015 at 7:21 AM, Thomas Munro
>>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>>> On Fri, Dec 25, 2015 at 8:50 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>> On Wed, Dec 23, 2015 at 8:45 AM, Thomas Munro
>>>>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>>>>> On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
>>>>>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>>>>>> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
>>>>>>> above, then this function could be renamed to SyncRepGetSyncStandbys.
>>>>>>> I think it would be a tiny bit nicer if it also took a Size n argument
>>>>>>> along with the output buffer pointer.
>>>>>
>>>>> Sorry, I could not get your point. SyncRepGetSyncStandbysPriority()
>>>>> function uses synchronous_standby_num which is global variable.
>>>>> But you mean that the number of synchronous standbys is given as
>>>>> function argument?
>>>>
>>>> Yeah, I was thinking of it as the output buffer size which I would be
>>>> inclined to make more explicit (I am still coming to terms with the
>>>> use of global variables in Postgres) but it doesn't matter, please
>>>> disregard that suggestion.
>>>>
>>>>>>> As for the body of that function (which I won't paste here), it
>>>>>>> contains an algorithm to find the top K elements in an array of N
>>>>>>> elements. It does that with a linear search through the top K seen so
>>>>>>> far for each value in the input array, so its worst case is O(KN)
>>>>>>> comparisons. Some of the sorting gurus on this list might have
>>>>>>> something to say about that but my take is that it seems fine for the
>>>>>>> tiny values of K and N that we're dealing with here, and it's nice
>>>>>>> that it doesn't need any space other than the output buffer, unlike
>>>>>>> some other top-K algorithms which would win for larger inputs.
>>>>>
>>>>> Yeah, it's improvement point.
>>>>> But I'm assumed that the number of synchronous replication is not
>>>>> large, so I use this algorithm as first version.
>>>>> And I think that its worst case is O(K(N-K)). Am I missing something?
>>>>
>>>> You're right, I was dropping that detail, in the tradition of the
>>>> hand-wavy school of big-O notation. (I suppose you could skip the
>>>> inner loop when the priority is lower than the current lowest
>>>> priority, giving a O(N) best case when the walsenders are perfectly
>>>> ordered by coincidence. Probably a bad idea or just not worth
>>>> worrying about.)
>>>
>>> Thank you for reviewing the patch.
>>> Yeah, I added the logic that skip the inner loop.
>>>
>>>>
>>>>> Attached latest version patch.
>>>>
>>>> +/*
>>>> + * Obtain currently synced LSN location: write and flush, using priority
>>>> - * In 9.1 we support only a single synchronous standby, chosen from a
>>>> - * priority list of synchronous_standby_names. Before it can become the
>>>> + * In 9.6 we support multiple synchronous standby, chosen from a priority
>>>>
>>>> s/standby/standbys/
>>>>
>>>> + * list of synchronous_standby_names. Before it can become the
>>>>
>>>> s/Before it can become the/Before any standby can become a/
>>>>
>>>> * synchronous standby it must have caught up with the primary; that may
>>>> * take some time. Once caught up, the current highest priority standby
>>>>
>>>> s/standby/standbys/
>>>>
>>>> * will release waiters from the queue.
>>>>
>>>> +bool
>>>> +SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>>>> +{
>>>> + int sync_standbys[synchronous_standby_num];
>>>>
>>>> I think this should be sync_standbys[SYNC_REP_MAX_SYNC_STANDBY_NUM].
>>>> (Variable sized arrays are a feature of C99 and PostgreSQL is written
>>>> in C89.)
>>>>
>>>> +/*
>>>> + * Populate a caller-supplied array which much have enough space for
>>>> + * synchronous_standby_num. Returns position of standbys currently
>>>> + * considered as synchronous, and its length.
>>>> + */
>>>> +int
>>>> +SyncRepGetSyncStandbys(int *sync_standbys)
>>>>
>>>> s/much/must/ (my bad, in previous email).
>>>>
>>>> + ereport(ERROR,
>>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>>> + errmsg("The number of synchronous standbys must be smaller than the
>>>> number of listed : %d",
>>>> + synchronous_standby_num)));
>>>>
>>>> How about "the number of synchronous standbys exceeds the length of
>>>> the standby list: %d"? Error messages usually start with lower case,
>>>> ':' is not usually preceded by a space.
>>>>
>>>> + ereport(ERROR,
>>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>>> + errmsg("The number of synchronous standbys must be between 1 and %d : %d",
>>>>
>>>> s/The/the/, s/ : /: /
>>>
>>> Fixed you mentioned.
>>>
>>> Attached latest v5 patch.
>>> Please review it.
>>
>> synchronous_standby_num doesn't appear to be a valid GUC name:
>>
>> LOG: unrecognized configuration parameter "synchronous_standby_num"
>> in file "/home/thom/Development/test/primary/postgresql.conf" line 244
>>
>> All I did was uncomment it and set it to a value.
>>
>
> Thank you for having a look it.
>
> Yeah, synchronous_standby_num should not exists in postgresql.conf.
> Please test for multiple sync replication with latest patch.
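
For readers following the top-K discussion quoted above, here is a minimal standalone sketch of that kind of selection, including the "skip the inner loop" shortcut. It is not the code from the patch: the function name, the plain int priorities and the "smaller number means higher priority" rule are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Select the k highest-priority entries out of n candidates.
 * Assumption for this sketch: a smaller number means a higher priority.
 * out[] receives candidate indexes ordered from best to worst priority;
 * the return value is the number of slots filled, i.e. min(k, n).
 * No memory is needed beyond the caller-supplied output buffer.
 */
size_t
select_top_k(const int *priority, size_t n, size_t *out, size_t k)
{
    size_t      filled = 0;
    size_t      i;

    assert(priority != NULL && out != NULL);

    for (i = 0; i < n && k > 0; i++)
    {
        size_t      pos;

        /*
         * Shortcut: the buffer is full and this candidate is no better
         * than the worst one kept so far, so skip the inner loop.
         */
        if (filled == k && priority[i] >= priority[out[k - 1]])
            continue;

        /* Insert into the sorted buffer, shifting worse entries right. */
        pos = (filled < k) ? filled : k - 1;
        while (pos > 0 && priority[out[pos - 1]] > priority[i])
        {
            out[pos] = out[pos - 1];
            pos--;
        }
        out[pos] = i;
        if (filled < k)
            filled++;
    }
    return filled;
}
```

With input that already arrives best-first, every candidate after the first k is rejected by the single comparison in the shortcut, which is the O(N) best case mentioned in the review.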

In the synchronous_replication_method = 'priority' case, when I set
synchronous_standby_names to an invalid value like 'hoge,foo' and
reloaded the configuration file, the server crashed with
the following error. This crash should not happen.

FATAL: invalid input syntax for integer: "hoge"
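
One way to avoid turning a bad value into a fatal error is to validate the leading count and report failure through the return value, so the caller can reject the new setting and keep the old one. The sketch below is standalone C, not patch code; parse_sync_count and its max_count argument are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse the leading count of an "N,name,name" style list.
 * Returns false on malformed input instead of raising an error, so a
 * reload with a bad value can simply be refused.
 */
bool
parse_sync_count(const char *value, int max_count, int *count)
{
    char       *end;
    long        n;

    assert(value != NULL && count != NULL);

    errno = 0;
    n = strtol(value, &end, 10);

    if (end == value)                   /* no digits, e.g. "hoge,foo" */
        return false;
    if (errno == ERANGE || n < 1 || n > max_count)
        return false;
    if (*end != ',' && *end != '\0')    /* trailing junk such as "2x" */
        return false;

    *count = (int) n;
    return true;
}
```

In the server itself this kind of check would presumably belong in the GUC check hook, which reports an invalid value by returning false.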

+ /*
+ * After read all synchronous replication configuration parameter, we apply
+ * settings according to replication method.
+ */
+ ProcessSynchronousReplicationConfig();

Why does the above function need to be called in ProcessConfigFile(), i.e.,
by every postgres process? I was thinking that only the walsender should
call it, to check which walsender is synchronous according to the setting.

When synchronous_replication_method = '1-priority' and
synchronous_standby_names = '*', I started one synchronous standby.
Then, when I ran "SELECT * FROM pg_stat_replication", I got the
following WARNING message.

WARNING: detected write past chunk end in ExprContext 0x2acb3c0

I don't think it's good design to specify the number of sync replicas
to wait for in synchronous_standby_names. It's confusing for users.
It would be better to add a separate parameter (synchronous_standby_num)
for specifying that number, though that increases the number of GUC
parameters.

Are we really planning to implement synchronous_replication_method=quorum
in the first version? If not, I'd like to remove the s_r_method parameter
because it's meaningless. We can add it later when we implement "quorum".

Regards,

--
Fujii Masao

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-28 13:10:49
Message-ID:	CAD21AoB5wQFO492WD2w=QsioR3_A9FbrP06GvGW9z6UmpS0JGQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jan 28, 2016 at 8:05 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Jan 20, 2016 at 2:35 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Tue, Jan 19, 2016 at 1:52 AM, Thom Brown <thom(at)linux(dot)com> wrote:
>>> On 3 January 2016 at 13:26, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> On Fri, Dec 25, 2015 at 7:21 AM, Thomas Munro
>>>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>>>> On Fri, Dec 25, 2015 at 8:50 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>>> On Wed, Dec 23, 2015 at 8:45 AM, Thomas Munro
>>>>>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>>>>>> On Wed, Dec 23, 2015 at 3:50 PM, Thomas Munro
>>>>>>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>>>>>>> If you got rid of SyncRepGetSyncStandbysOnePriority as suggested
>>>>>>>> above, then this function could be renamed to SyncRepGetSyncStandbys.
>>>>>>>> I think it would be a tiny bit nicer if it also took a Size n argument
>>>>>>>> along with the output buffer pointer.
>>>>>>
>>>>>> Sorry, I could not get your point. SyncRepGetSyncStandbysPriority()
>>>>>> function uses synchronous_standby_num which is global variable.
>>>>>> But you mean that the number of synchronous standbys is given as
>>>>>> function argument?
>>>>>
>>>>> Yeah, I was thinking of it as the output buffer size which I would be
>>>>> inclined to make more explicit (I am still coming to terms with the
>>>>> use of global variables in Postgres) but it doesn't matter, please
>>>>> disregard that suggestion.
>>>>>
>>>>>>>> As for the body of that function (which I won't paste here), it
>>>>>>>> contains an algorithm to find the top K elements in an array of N
>>>>>>>> elements. It does that with a linear search through the top K seen so
>>>>>>>> far for each value in the input array, so its worst case is O(KN)
>>>>>>>> comparisons. Some of the sorting gurus on this list might have
>>>>>>>> something to say about that but my take is that it seems fine for the
>>>>>>>> tiny values of K and N that we're dealing with here, and it's nice
>>>>>>>> that it doesn't need any space other than the output buffer, unlike
>>>>>>>> some other top-K algorithms which would win for larger inputs.
>>>>>>
>>>>>> Yeah, it's improvement point.
>>>>>> But I'm assumed that the number of synchronous replication is not
>>>>>> large, so I use this algorithm as first version.
>>>>>> And I think that its worst case is O(K(N-K)). Am I missing something?
>>>>>
>>>>> You're right, I was dropping that detail, in the tradition of the
>>>>> hand-wavy school of big-O notation. (I suppose you could skip the
>>>>> inner loop when the priority is lower than the current lowest
>>>>> priority, giving a O(N) best case when the walsenders are perfectly
>>>>> ordered by coincidence. Probably a bad idea or just not worth
>>>>> worrying about.)
>>>>
>>>> Thank you for reviewing the patch.
>>>> Yeah, I added the logic that skip the inner loop.
>>>>
>>>>>
>>>>>> Attached latest version patch.
>>>>>
>>>>> +/*
>>>>> + * Obtain currently synced LSN location: write and flush, using priority
>>>>> - * In 9.1 we support only a single synchronous standby, chosen from a
>>>>> - * priority list of synchronous_standby_names. Before it can become the
>>>>> + * In 9.6 we support multiple synchronous standby, chosen from a priority
>>>>>
>>>>> s/standby/standbys/
>>>>>
>>>>> + * list of synchronous_standby_names. Before it can become the
>>>>>
>>>>> s/Before it can become the/Before any standby can become a/
>>>>>
>>>>> * synchronous standby it must have caught up with the primary; that may
>>>>> * take some time. Once caught up, the current highest priority standby
>>>>>
>>>>> s/standby/standbys/
>>>>>
>>>>> * will release waiters from the queue.
>>>>>
>>>>> +bool
>>>>> +SyncRepGetSyncLsnsPriority(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
>>>>> +{
>>>>> + int sync_standbys[synchronous_standby_num];
>>>>>
>>>>> I think this should be sync_standbys[SYNC_REP_MAX_SYNC_STANDBY_NUM].
>>>>> (Variable sized arrays are a feature of C99 and PostgreSQL is written
>>>>> in C89.)
>>>>>
>>>>> +/*
>>>>> + * Populate a caller-supplied array which much have enough space for
>>>>> + * synchronous_standby_num. Returns position of standbys currently
>>>>> + * considered as synchronous, and its length.
>>>>> + */
>>>>> +int
>>>>> +SyncRepGetSyncStandbys(int *sync_standbys)
>>>>>
>>>>> s/much/must/ (my bad, in previous email).
>>>>>
>>>>> + ereport(ERROR,
>>>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>>>> + errmsg("The number of synchronous standbys must be smaller than the
>>>>> number of listed : %d",
>>>>> + synchronous_standby_num)));
>>>>>
>>>>> How about "the number of synchronous standbys exceeds the length of
>>>>> the standby list: %d"? Error messages usually start with lower case,
>>>>> ':' is not usually preceded by a space.
>>>>>
>>>>> + ereport(ERROR,
>>>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>>>> + errmsg("The number of synchronous standbys must be between 1 and %d : %d",
>>>>>
>>>>> s/The/the/, s/ : /: /
>>>>
>>>> Fixed you mentioned.
>>>>
>>>> Attached latest v5 patch.
>>>> Please review it.
>>>
>>> synchronous_standby_num doesn't appear to be a valid GUC name:
>>>
>>> LOG: unrecognized configuration parameter "synchronous_standby_num"
>>> in file "/home/thom/Development/test/primary/postgresql.conf" line 244
>>>
>>> All I did was uncomment it and set it to a value.
>>>
>>
>> Thank you for having a look it.
>>
>> Yeah, synchronous_standby_num should not exists in postgresql.conf.
>> Please test for multiple sync replication with latest patch.
>
> In synchronous_replication_method = 'priority' case, when I set
> synchronous_standby_names to invalid value like 'hoge,foo' and
> reloaded the configuration file, the server crashed with
> the following error. This crash should not happen.
>
> FATAL: invalid input syntax for integer: "hoge"
>
> + /*
> + * After read all synchronous replication configuration parameter, we apply
> + * settings according to replication method.
> + */
> + ProcessSynchronousReplicationConfig();
>
> Why does the above function need to be called in ProcessConfigFile(), i.e.,
> by every postgres processes? I was thinking that only walsender should
> call that to check which walsender is synchronous according to the setting.
>
> When synchronous_replication_method = '1-priority' and
> synchronous_standby_names = '*', I started one synchronous standby.
> Then, when I ran "SELECT * FROM pg_stat_replication", I got the
> following WARNING message.
>
> WARNING: detected write past chunk end in ExprContext 0x2acb3c0
>
> I don't think that it's good design to specify the number of sync replicas
> to wait for, in synchronous_standby_names. It's confusing for the users.
> It's better to add separate parameter (synchronous_standby_num) for
> specifying that number. Which increases the number of GUC parameters,
> though.
>
> Are we really planning to implement synchronous_replication_method=quorum
> at the first version? If not, I'd like to remove s_r_method parameter
> because it's meaningless. We can add it later when we implement "quorum".

Thank you for your comment.

Based on the discussions so far, I'm planning to have several replication
methods such as 'quorum' and 'complex' in the future, and each
replication method specifies the syntax of s_s_names.
That means s_s_names could contain the number of sync standbys, as
the current patch does.
If we also had an additional GUC like synchronous_standby_num, it would
look odd, I think.

Even if we don't have the 'quorum' method in the first version, the syntax of
s_s_names is completely different between 'priority' and '1-priority'.
So we will need a new GUC parameter like s_r_method in order to
specify the syntax of s_s_names, I think.

Regards,

--
Masahiko Sawada

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-31 04:17:07
Message-ID:	CAB7nPqTn00tMxN1Jumjd+SiZ7wPQ0UNVNo9X_UN7dSxgt9inyQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
> By the discussions so far, I'm planning to have several replication
> methods such as 'quorum', 'complex' in the feature, and the each
> replication method specifies the syntax of s_s_names.
> It means that s_s_names could have the number of sync standbys like
> what current patch does.

What if the application_name of a standby node has the format of an integer?
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-31 08:08:38
Message-ID:	CAD21AoAJZa-cUhdFEOYbm5GXO+XyN+GQNeqdz_-cSpOTtgYxcQ@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>> By the discussions so far, I'm planning to have several replication
>> methods such as 'quorum', 'complex' in the feature, and the each
>> replication method specifies the syntax of s_s_names.
>> It means that s_s_names could have the number of sync standbys like
>> what current patch does.
>
> What if the application_name of a standby node has the format of an integer?

Even if the standby has an integer as its application_name, we can set
s_s_names like '2,1,2,3'.
The leading '2' is always handled as the number of sync standbys when
s_r_method = 'priority', and the remaining '1', '2', '3' are standby names.
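
The rule described here can be sketched as follows: the first element is always taken as the count, and every later element is kept as a name even if it looks like an integer. This is a standalone illustration, not the patch's parser; PriorityList, split_priority_list and the size limits are hypothetical, and the final check only mirrors the "exceeds the length of the standby list" error quoted earlier in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES     8
#define MAX_NAME_LEN  64

typedef struct
{
    int         num_sync;       /* leading element of the list */
    int         num_names;      /* number of elements after it */
    char        names[MAX_NAMES][MAX_NAME_LEN];
} PriorityList;

/*
 * Split "N,name1,name2,...". Returns 0 on success, -1 on malformed input.
 * Uses strtok(), so this sketch is not reentrant.
 */
int
split_priority_list(const char *value, PriorityList *out)
{
    char        buf[MAX_NAMES * MAX_NAME_LEN];
    char       *tok;
    char       *end;
    long        n;

    assert(value != NULL && out != NULL);

    if (strlen(value) >= sizeof(buf))
        return -1;
    strcpy(buf, value);

    /* First element: always the number of sync standbys. */
    tok = strtok(buf, ",");
    if (tok == NULL)
        return -1;
    n = strtol(tok, &end, 10);
    if (end == tok || *end != '\0' || n < 1 || n > MAX_NAMES)
        return -1;
    out->num_sync = (int) n;

    /* Remaining elements: always names, even ones such as "1". */
    out->num_names = 0;
    while ((tok = strtok(NULL, ",")) != NULL)
    {
        while (*tok == ' ')     /* skip spaces after a comma */
            tok++;
        if (out->num_names == MAX_NAMES || strlen(tok) >= MAX_NAME_LEN)
            return -1;
        strcpy(out->names[out->num_names++], tok);
    }

    /* The count must not exceed the number of listed names. */
    if (out->num_sync > out->num_names)
        return -1;
    return 0;
}
```

The parse is unambiguous, but a reader of postgresql.conf still has to know that the two '2's in '2,1,2,3' mean different things, which is the readability concern raised in this thread.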

Regards,

--
Masahiko Sawada

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-31 08:18:17
Message-ID:	CAB7nPqTaR1hg=5kMXpDiPDfGSLr3=KWhGyEBvO8nPSkV=Eft1w@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>>> By the discussions so far, I'm planning to have several replication
>>> methods such as 'quorum', 'complex' in the feature, and the each
>>> replication method specifies the syntax of s_s_names.
>>> It means that s_s_names could have the number of sync standbys like
>>> what current patch does.
>>
>> What if the application_name of a standby node has the format of an integer?
>
> Even if the standby has an integer as application_name, we can set
> s_s_names like '2,1,2,3'.
> The leading '2' is always handled as the number of sync standbys when
> s_r_method = 'priority'.

Hm. I agree with Fujii-san here: having the number of sync standbys
defined in a parameter that should hold a list of names is a bit
confusing. I'd rather have a separate GUC, which brings us back to one
of the first patches that I came up with; a couple of people,
including Josh, were not happy with that because it did not support
real quorum. Perhaps the final answer would really be to get a set of
hooks, and a contrib module making use of them.
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-31 08:28:47
Message-ID:	CAD21AoCc10na_9TB_d3pBbTJw_51dz+rXtzeiGU3FtyQ9TP5rw@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>>>> By the discussions so far, I'm planning to have several replication
>>>> methods such as 'quorum', 'complex' in the feature, and the each
>>>> replication method specifies the syntax of s_s_names.
>>>> It means that s_s_names could have the number of sync standbys like
>>>> what current patch does.
>>>
>>> What if the application_name of a standby node has the format of an integer?
>>
>> Even if the standby has an integer as application_name, we can set
>> s_s_names like '2,1,2,3'.
>> The leading '2' is always handled as the number of sync standbys when
>> s_r_method = 'priority'.
>
> Hm. I agree with Fujii-san here, having the number of sync standbys
> defined in a parameter that should have a list of names is a bit
> confusing. I'd rather have a separate GUC, which brings us back to one
> of the first patches that I came up with, and a couple of people,
> including Josh were not happy with that because this did not support
> real quorum. Perhaps the final answer would be really to get a set of
> hooks, and a contrib module making use of that.

Yeah, I agree with having a set of hooks, with postgres core providing a
simple multi-sync replication mechanism like the one you suggested in the
first version.

Regards,

--
Masahiko Sawada

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-01-31 11:58:26
Message-ID:	CAB7nPqR4EoR9w+TLamU34aojkthU_c==Yrvnix+TjBO5K81ZXw@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jan 31, 2016 at 5:28 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>>>>> By the discussions so far, I'm planning to have several replication
>>>>> methods such as 'quorum', 'complex' in the feature, and the each
>>>>> replication method specifies the syntax of s_s_names.
>>>>> It means that s_s_names could have the number of sync standbys like
>>>>> what current patch does.
>>>>
>>>> What if the application_name of a standby node has the format of an integer?
>>>
>>> Even if the standby has an integer as application_name, we can set
>>> s_s_names like '2,1,2,3'.
>>> The leading '2' is always handled as the number of sync standbys when
>>> s_r_method = 'priority'.
>>
>> Hm. I agree with Fujii-san here, having the number of sync standbys
>> defined in a parameter that should have a list of names is a bit
>> confusing. I'd rather have a separate GUC, which brings us back to one
>> of the first patches that I came up with, and a couple of people,
>> including Josh were not happy with that because this did not support
>> real quorum. Perhaps the final answer would be really to get a set of
>> hooks, and a contrib module making use of that.
>
> Yeah, I agree with having set of hooks, and postgres core has simple
> multi sync replication mechanism like you suggested at first version.

If there are hooks, I don't think that we should really bother about
having in core anything more complicated than what we have now. The
trick will be to come up with a hook design modular enough to support
the kind of configurations mentioned on this thread. Roughly, perhaps: a
refactoring of the syncrep code so that it is possible to wait for
multiple targets, some of them being optional; one modular way in
pg_stat_get_wal_senders to represent the status of a node to the user; and
another hook to decide which nodes to wait for. Some
of the nodes being waited for may be based on conditions for quorum
support. It is a hard problem to do that in a flexible enough way.
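
To make the "hook to decide which nodes to wait for" idea a bit more concrete, here is a rough standalone sketch in the usual function-pointer hook style. Nothing in it exists in PostgreSQL: StandbyInfo, choose_sync_standbys_hook and both policies are hypothetical names invented for this illustration, and a real design would also have to cover optional targets and status reporting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of one connected standby, for illustration only. */
typedef struct
{
    const char *name;
    int         priority;       /* 0 means "not a sync candidate" */
} StandbyInfo;

/*
 * Hypothetical hook type: fill wait_for[] with the indexes of the
 * standbys to wait for and return how many were chosen.
 */
typedef size_t (*choose_sync_standbys_hook_type) (const StandbyInfo *standbys,
                                                  size_t n,
                                                  size_t *wait_for,
                                                  size_t max);

/* NULL means no external module has installed a policy. */
choose_sync_standbys_hook_type choose_sync_standbys_hook = NULL;

/* Built-in policy: wait for the single best-priority candidate. */
size_t
default_choose_sync_standbys(const StandbyInfo *standbys, size_t n,
                             size_t *wait_for, size_t max)
{
    size_t      best = n;
    size_t      i;

    for (i = 0; i < n; i++)
    {
        if (standbys[i].priority == 0)
            continue;
        if (best == n || standbys[i].priority < standbys[best].priority)
            best = i;
    }
    if (best == n || max == 0)
        return 0;
    wait_for[0] = best;
    return 1;
}

/* Example module policy: wait for every candidate, up to max. */
size_t
wait_for_all_candidates(const StandbyInfo *standbys, size_t n,
                        size_t *wait_for, size_t max)
{
    size_t      chosen = 0;
    size_t      i;

    for (i = 0; i < n && chosen < max; i++)
        if (standbys[i].priority != 0)
            wait_for[chosen++] = i;
    return chosen;
}

/* Core entry point: defer to the hook when one is installed. */
size_t
choose_sync_standbys(const StandbyInfo *standbys, size_t n,
                     size_t *wait_for, size_t max)
{
    assert(standbys != NULL && wait_for != NULL);

    if (choose_sync_standbys_hook != NULL)
        return choose_sync_standbys_hook(standbys, n, wait_for, max);
    return default_choose_sync_standbys(standbys, n, wait_for, max);
}
```

A contrib module would set choose_sync_standbys_hook when it is loaded; core keeps only the simple default.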
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-01 08:36:36
Message-ID:	CAD21AoCmOk_RM6SsMcDs3etXpHEXnz7pYqBdNoiXRwJRXjAn+Q@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jan 31, 2016 at 8:58 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Sun, Jan 31, 2016 at 5:28 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
>>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>>> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>>>>>> By the discussions so far, I'm planning to have several replication
>>>>>> methods such as 'quorum', 'complex' in the feature, and the each
>>>>>> replication method specifies the syntax of s_s_names.
>>>>>> It means that s_s_names could have the number of sync standbys like
>>>>>> what current patch does.
>>>>>
>>>>> What if the application_name of a standby node has the format of an integer?
>>>>
>>>> Even if the standby has an integer as application_name, we can set
>>>> s_s_names like '2,1,2,3'.
>>>> The leading '2' is always handled as the number of sync standbys when
>>>> s_r_method = 'priority'.
>>>
>>> Hm. I agree with Fujii-san here, having the number of sync standbys
>>> defined in a parameter that should have a list of names is a bit
>>> confusing. I'd rather have a separate GUC, which brings us back to one
>>> of the first patches that I came up with, and a couple of people,
>>> including Josh were not happy with that because this did not support
>>> real quorum. Perhaps the final answer would be really to get a set of
>>> hooks, and a contrib module making use of that.
>>
>> Yeah, I agree with having set of hooks, and postgres core has simple
>> multi sync replication mechanism like you suggested at first version.
>
> If there are hooks, I don't think that we should really bother about
> having in core anything more complicated than what we have now. The
> trick will be to come up with a hook design modular enough to support
> the kind of configurations mentioned on this thread. Roughly perhaps a
> refactoring of the syncrep code so as it is possible to wait for
> multiple targets some of them being optional,, one modular way in
> pg_stat_get_wal_senders to represent the status of a node to user, and
> another hook to return to decide which are the nodes to wait for. Some
> of the nodes being waited for may be based on conditions for quorum
> support. That's a hard problem to do that in a flexible enough way.

Hm, I think non-nested quorum and priority are not complicated, and we
should support at least one of these simple methods, or both, in postgres
core.
More complicated methods, like a JSON-style syntax or a dedicated language,
would be supported by an external module.

Regards,

--
Masahiko Sawada

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-01 14:28:05
Message-ID:	CAHGQGwFrVq09+vYACZJdzGAFDkwXt-Hw6ffZgKO12cSKri4-+Q@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 1, 2016 at 5:36 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Sun, Jan 31, 2016 at 8:58 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Sun, Jan 31, 2016 at 5:28 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>> On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
>>>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>>>> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>>>>>>> By the discussions so far, I'm planning to have several replication
>>>>>>> methods such as 'quorum', 'complex' in the feature, and the each
>>>>>>> replication method specifies the syntax of s_s_names.
>>>>>>> It means that s_s_names could have the number of sync standbys like
>>>>>>> what current patch does.
>>>>>>
>>>>>> What if the application_name of a standby node has the format of an integer?
>>>>>
>>>>> Even if the standby has an integer as application_name, we can set
>>>>> s_s_names like '2,1,2,3'.
>>>>> The leading '2' is always handled as the number of sync standbys when
>>>>> s_r_method = 'priority'.
>>>>
>>>> Hm. I agree with Fujii-san here, having the number of sync standbys
>>>> defined in a parameter that should have a list of names is a bit
>>>> confusing. I'd rather have a separate GUC, which brings us back to one
>>>> of the first patches that I came up with, and a couple of people,
>>>> including Josh were not happy with that because this did not support
>>>> real quorum. Perhaps the final answer would be really to get a set of
>>>> hooks, and a contrib module making use of that.
>>>
>>> Yeah, I agree with having set of hooks, and postgres core has simple
>>> multi sync replication mechanism like you suggested at first version.
>>
>> If there are hooks, I don't think that we should really bother about
>> having in core anything more complicated than what we have now. The
>> trick will be to come up with a hook design modular enough to support
>> the kind of configurations mentioned on this thread. Roughly perhaps a
>> refactoring of the syncrep code so as it is possible to wait for
>> multiple targets some of them being optional,, one modular way in
>> pg_stat_get_wal_senders to represent the status of a node to user, and
>> another hook to return to decide which are the nodes to wait for. Some
>> of the nodes being waited for may be based on conditions for quorum
>> support. That's a hard problem to do that in a flexible enough way.
>
> Hm, I think not-nested quorum and priority are not complicated, and we
> should support at least both or either simple method in core of
> postgres.
> More complicated method like using json-style, or dedicated language
> would be supported by external module.

So what about the following plan?

[first version]
Add only synchronous_standby_num which specifies the number of standbys
that the master must wait for before marking sync replication as completed.
This version supports simple use cases like "I want to have two synchronous
standbys".

[second version]
Add synchronous_replication_method: 'priority' and 'quorum'. This version
additionally supports a simple quorum commit case like "I want to ensure
that WAL is replicated synchronously to at least two standbys out of the
five listed in s_s_names".

Or

Add something like quorum_replication_num and quorum_standby_names, i.e.,
the master must wait for at least q_r_num standbys from the ones listed in
q_s_names before marking sync replication as completed. The master
must also wait for sync replication according to s_s_num and s_s_names.
That is, this approach separates 'priority' and 'quorum' into their own
parameters. This increases the number of GUC parameters, but ISTM it is less
confusing, and it supports slightly more complicated cases like "there is one
local standby and three remote standbys, and I want to ensure that WAL is
replicated synchronously to the local standby and at least two of the remote
ones", e.g.,

s_s_num = 1, s_s_names = 'local'
q_r_num = 2, q_s_names = 'remote1, remote2, remote3'

[third version]
Add the hooks for more complicated sync replication cases.
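
As a rough illustration of the second alternative in the plan above (a priority group plus a separate quorum group), the commit condition could be checked along these lines. This is a simplified standalone sketch, not proposed code: the function names are hypothetical, and it assumes every listed standby is connected, so the priority group is just the first few names of its list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Count how many of the first nlist names also appear in acked[]. */
size_t
count_acked(const char *const *list, size_t nlist,
            const char *const *acked, size_t nacked)
{
    size_t      hits = 0;
    size_t      i;
    size_t      j;

    for (i = 0; i < nlist; i++)
    {
        for (j = 0; j < nacked; j++)
        {
            if (strcmp(list[i], acked[j]) == 0)
            {
                hits++;
                break;
            }
        }
    }
    return hits;
}

/*
 * A commit may complete only when both groups are satisfied:
 *  - priority group: the first s_num names of s_names have all acked;
 *  - quorum group: at least q_num of the names in q_names have acked.
 */
bool
sync_commit_ok(const char *const *s_names, size_t s_count, size_t s_num,
               const char *const *q_names, size_t q_count, size_t q_num,
               const char *const *acked, size_t nacked)
{
    assert(s_num <= s_count && q_num <= q_count);

    if (count_acked(s_names, s_num, acked, nacked) < s_num)
        return false;
    return count_acked(q_names, q_count, acked, nacked) >= q_num;
}
```

With the 'local' plus three remotes example, acknowledgements from local, remote1 and remote3 satisfy the condition, while acknowledgements from all three remotes without local do not.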

I'm thinking that the realistic target for 9.6 might be the first one.

Regards,

--
Fujii Masao

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-02 03:18:14
Message-ID:	CAB7nPqTwEqZ8V+To4aKLnOtQzAYp4w9_rPF5uBtB6gJPwiqnYg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 1, 2016 at 11:28 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> [first version]
> Add only synchronous_standby_num which specifies the number of standbys
> that the master must wait for before marking sync replication as completed.
> This version supports simple use cases like "I want to have two synchronous
> standbys".
>
> [second version]
> Add synchronous_replication_method: 'priority' and 'quorum'. This version
> additionally supports simple quorum commit case like "I want to ensure
> that WAL is replicated synchronously to at least two standbys from five
> ones listed in s_s_names".
>
> Or
>
> Add something like quorum_replication_num and quorum_standby_names, i.e.,
> the master must wait for at least q_r_num standbys from ones listed in
> q_s_names before marking sync replication as completed. Also the master
> must wait for sync replication according to s_s_num and s_s_names.
> That is, this approach separates 'priority' and 'quorum' to each
> parameters.
> This increases the number of GUC parameters, but ISTM less confusing, and
> it supports a bit complicated case like "there is one local standby and
> three
> remote standbys, then I want to ensure that WAL is replicated synchronously
> to the local standby and at least two remote one", e.g.,
>
> s_s_num = 1, s_s_names = 'local'
> q_s_num = 2, q_s_names = 'remote1, remote2, remote3'
>
> [third version]
> Add the hooks for more complicated sync replication cases.
>
> I'm thinking that the realistic target for 9.6 might be the first one.
>

If we want to get something out for this release, clearly yes, and being
able to specify 2 sync targets is already a win when the two sync standbys
are not exactly at the same location. FWIW, I don't mind doing coding and/or
review work. That's basically my first patch, which needs a bit more love
and polishing, *and* test cases, but I am used enough to Perl and
PostgresNode these days to produce something based on sanity checks of
pg_stat_replication and my other set of patches that have more basic
routines.

Now I would not mind if we actually jump into the 3rd case if we are fine
with doing nothing for this release, but this requires a lot of design and
background work, so that's not plausible for 9.6. Of course, if there are
voices against the scenario proposed by Fujii-san, feel free to speak
up.
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-02 15:13:07
Message-ID:	CAD21AoDONwiXEaj0proCxjq9G4VCS9-JMO2-=V6E_7r6=7Dy5w@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 1, 2016 at 11:28 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Feb 1, 2016 at 5:36 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Sun, Jan 31, 2016 at 8:58 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On Sun, Jan 31, 2016 at 5:28 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
>>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>>> On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>>> On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
>>>>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>>>>> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>>>>>>>> By the discussions so far, I'm planning to have several replication
>>>>>>>> methods such as 'quorum', 'complex' in the feature, and the each
>>>>>>>> replication method specifies the syntax of s_s_names.
>>>>>>>> It means that s_s_names could have the number of sync standbys like
>>>>>>>> what current patch does.
>>>>>>>
>>>>>>> What if the application_name of a standby node has the format of an integer?
>>>>>>
>>>>>> Even if the standby has an integer as application_name, we can set
>>>>>> s_s_names like '2,1,2,3'.
>>>>>> The leading '2' is always handled as the number of sync standbys when
>>>>>> s_r_method = 'priority'.
>>>>>
>>>>> Hm. I agree with Fujii-san here, having the number of sync standbys
>>>>> defined in a parameter that should have a list of names is a bit
>>>>> confusing. I'd rather have a separate GUC, which brings us back to one
>>>>> of the first patches that I came up with, and a couple of people,
>>>>> including Josh were not happy with that because this did not support
>>>>> real quorum. Perhaps the final answer would be really to get a set of
>>>>> hooks, and a contrib module making use of that.
>>>>
>>>> Yeah, I agree with having set of hooks, and postgres core has simple
>>>> multi sync replication mechanism like you suggested at first version.
>>>
>>> If there are hooks, I don't think that we should really bother about
>>> having in core anything more complicated than what we have now. The
>>> trick will be to come up with a hook design modular enough to support
>>> the kind of configurations mentioned on this thread. Roughly perhaps a
>>> refactoring of the syncrep code so as it is possible to wait for
>>> multiple targets, some of them being optional, one modular way in
>>> pg_stat_get_wal_senders to represent the status of a node to user, and
>>> another hook to return to decide which are the nodes to wait for. Some
>>> of the nodes being waited for may be based on conditions for quorum
>>> support. That's a hard problem to do that in a flexible enough way.
>>
>> Hm, I think not-nested quorum and priority are not complicated, and we
>> should support at least both or either simple method in core of
>> postgres.
>> More complicated method like using json-style, or dedicated language
>> would be supported by external module.
>
> So what about the following plan?
>
> [first version]
> Add only synchronous_standby_num which specifies the number of standbys
> that the master must wait for before marking sync replication as completed.
> This version supports simple use cases like "I want to have two synchronous
> standbys".
>
> [second version]
> Add synchronous_replication_method: 'priority' and 'quorum'. This version
> additionally supports simple quorum commit case like "I want to ensure
> that WAL is replicated synchronously to at least two standbys from five
> ones listed in s_s_names".
>
> Or
>
> Add something like quorum_replication_num and quorum_standby_names, i.e.,
> the master must wait for at least q_r_num standbys from ones listed in
> q_s_names before marking sync replication as completed. Also the master
> must wait for sync replication according to s_s_num and s_s_names.
> That is, this approach separates 'priority' and 'quorum' to each parameters.
> This increases the number of GUC parameters, but ISTM less confusing, and
> it supports a bit complicated case like "there is one local standby and three
> remote standbys, then I want to ensure that WAL is replicated synchronously
> to the local standby and at least two remote one", e.g.,
>
> s_s_num = 1, s_s_names = 'local'
> q_s_num = 2, q_s_names = 'remote1, remote2, remote3'
>
> [third version]
> Add the hooks for more complicated sync replication cases.
>
> I'm thinking that the realistic target for 9.6 might be the first one.
>

Thank you for the suggestion.

I agree with the first version, and I have attached an updated patch,
modified so that it supports the simple multiple sync replication you
suggested.
(Test cases are not included yet.)

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v7.patch	text/x-patch	13.5 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-02 22:33:20
Message-ID:	CA+TgmoYnXbsBe4Ueq1PmCWzDR9jjHmSfg8Wo+=mofKyZVJDsNA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 1, 2016 at 9:28 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> So what about the following plan?
>
> [first version]
> Add only synchronous_standby_num which specifies the number of standbys
> that the master must wait for before marking sync replication as completed.
> This version supports simple use cases like "I want to have two synchronous
> standbys".
>
> [second version]
> Add synchronous_replication_method: 'priority' and 'quorum'. This version
> additionally supports simple quorum commit case like "I want to ensure
> that WAL is replicated synchronously to at least two standbys from five
> ones listed in s_s_names".
>
> Or
>
> Add something like quorum_replication_num and quorum_standby_names, i.e.,
> the master must wait for at least q_r_num standbys from ones listed in
> q_s_names before marking sync replication as completed. Also the master
> must wait for sync replication according to s_s_num and s_s_names.
> That is, this approach separates 'priority' and 'quorum' to each parameters.
> This increases the number of GUC parameters, but ISTM less confusing, and
> it supports a bit complicated case like "there is one local standby and three
> remote standbys, then I want to ensure that WAL is replicated synchronously
> to the local standby and at least two remote one", e.g.,
>
> s_s_num = 1, s_s_names = 'local'
> q_s_num = 2, q_s_names = 'remote1, remote2, remote3'
>
> [third version]
> Add the hooks for more complicated sync replication cases.

-1. We're wrapping ourselves around the axle here and ending up with
a design that will not let someone say "the local standby and at least
one remote standby" without writing C code. I understand nobody likes
the mini-language I proposed and nobody likes a JSON configuration
file either. I also understand that either of those things would
allow ridiculously complicated configurations that nobody will ever
need in the real world. But I think "one local and one remote" is a
fairly common case and that you shouldn't need a PhD in
PostgreSQLology to configure it.

Also, to be frank, I think we ought to be putting more effort into
another patch in this same area, specifically Thomas Munro's causal
reads patch. I think a lot of people today are trying to use
synchronous replication to build load-balancing clusters and avoid the
problem where you write some data and then read back stale data from a
standby server. Of course, our current synchronous replication
facilities make no such guarantees - his patch does, and I think
that's pretty important. I'm not saying that we shouldn't do this
too, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-03 01:48:36
Message-ID:	CAHGQGwFHhhOMB4tAxCoJAPqJyKyE+upenqLAj53oDiDHxED36Q@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Feb 3, 2016 at 7:33 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Feb 1, 2016 at 9:28 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> So what about the following plan?
>>
>> [first version]
>> Add only synchronous_standby_num which specifies the number of standbys
>> that the master must wait for before marking sync replication as completed.
>> This version supports simple use cases like "I want to have two synchronous
>> standbys".
>>
>> [second version]
>> Add synchronous_replication_method: 'priority' and 'quorum'. This version
>> additionally supports simple quorum commit case like "I want to ensure
>> that WAL is replicated synchronously to at least two standbys from five
>> ones listed in s_s_names".
>>
>> Or
>>
>> Add something like quorum_replication_num and quorum_standby_names, i.e.,
>> the master must wait for at least q_r_num standbys from ones listed in
>> q_s_names before marking sync replication as completed. Also the master
>> must wait for sync replication according to s_s_num and s_s_names.
>> That is, this approach separates 'priority' and 'quorum' to each parameters.
>> This increases the number of GUC parameters, but ISTM less confusing, and
>> it supports a bit complicated case like "there is one local standby and three
>> remote standbys, then I want to ensure that WAL is replicated synchronously
>> to the local standby and at least two remote one", e.g.,
>>
>> s_s_num = 1, s_s_names = 'local'
>> q_s_num = 2, q_s_names = 'remote1, remote2, remote3'
>>
>> [third version]
>> Add the hooks for more complicated sync replication cases.
>
> -1. We're wrapping ourselves around the axle here and ending up with
> a design that will not let someone say "the local standby and at least
> one remote standby" without writing C code. I understand nobody likes
> the mini-language I proposed and nobody likes a JSON configuration
> file either. I also understand that either of those things would
> allow ridiculously complicated configurations that nobody will ever
> need in the real world. But I think "one local and one remote" is a
> fairly common case and that you shouldn't need a PhD in
> PostgreSQLology to configure it.

So you disagree with only the third version that I proposed, i.e.,
adding some hooks for sync replication? If yes and you're OK
with the first and second versions, ISTM that we have almost reached
consensus on the direction of the multiple sync replication feature.
The first version can cover the "one local and one remote sync standby" case,
and the second can cover the "one local and at least one from several remote
standbys" case. I'm thinking of focusing on the first version now,
and then we can work on the second to support quorum commit.

Regards,

--
Fujii Masao

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-03 02:00:35
Message-ID:	CA+TgmoZvf3sBQZqq_R6myYfkVo2Zm+PMK6o7vvCdqYbz6p7Sxg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Feb 2, 2016 at 8:48 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> So you disagree with only third version that I proposed, i.e.,
> adding some hooks for sync replication? If yes and you're OK
> with the first and second versions, ISTM that we almost reached
> consensus on the direction of multiple sync replication feature.
> The first version can cover "one local and one remote sync standbys" case,
> and the second can cover "one local and at least one from several remote
> standbys" case. I'm thinking to focus on the first version now,
> and then we can work on the second to support the quorum commit

Well, I think the only hard part of the third problem is deciding on
what syntax to use. It seems like a waste of time to me to go to a
bunch of trouble to implement #1 and #2 using one syntax and then have
to invent a whole new syntax for #3. Seriously, this isn't that hard:
it's not a technical problem. It's just that we've got a bunch of
people who can't agree on what syntax to use. IMO, you should just
pick something. You're presumably the committer for this patch, and I
think you should just decide which of the 47,123 things proposed so
far is best and insist on that. I trust that you will make a good
decision even if it's different than the decision that I would have
made.

Now, if it's easier to implement a subset of that syntax first and
then extend it later, fine. But it makes no sense to me to implement
the easy cases without having some idea of how you're going to extend
that to the hard cases. Then you'll just end up with a mishmash.
Pick something that can be extended to handle all of the plausible
cases, whether it's a mini-language or a JSON blob or a
pg_hba.conf-type file or some other crazy thing that you invent, and
just do it and be done with it. We've wasted far too much time trying
to reach consensus on this: it's time for you to exercise your vast
dictatorial power.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-04 14:34:24
Message-ID:	CAHGQGwFmoYSh_feORd7hzy6BvW8DydKSWmqT2ScsUs3Bf5CXjA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Feb 3, 2016 at 11:00 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Feb 2, 2016 at 8:48 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> So you disagree with only third version that I proposed, i.e.,
>> adding some hooks for sync replication? If yes and you're OK
>> with the first and second versions, ISTM that we almost reached
>> consensus on the direction of multiple sync replication feature.
>> The first version can cover "one local and one remote sync standbys" case,
>> and the second can cover "one local and at least one from several remote
>> standbys" case. I'm thinking to focus on the first version now,
>> and then we can work on the second to support the quorum commit
>
> Well, I think the only hard part of the third problem is deciding on
> what syntax to use. It seems like a waste of time to me to go to a
> bunch of trouble to implement #1 and #2 using one syntax and then have
> to invent a whole new syntax for #3. Seriously, this isn't that hard:
> it's not a technical problem. It's just that we've got a bunch of
> people who can't agree on what syntax to use. IMO, you should just
> pick something. You're presumably the committer for this patch, and I
> think you should just decide which of the 47,123 things proposed so
> far is best and insist on that. I trust that you will make a good
> decision even if it's different than the decision that I would have
> made.

If we use one syntax for every case, the possible approaches that we can
choose are a mini-language, JSON, etc. Since my previous proposal covers only
very simple cases, extra syntax needs to be supported for more complicated
cases. My plan was to add the hooks so that developers can choose their own
syntax. But that might confuse users.

Now I'm thinking that the mini-language is the better choice. JSON has some
good points, but its big problem is that the setting value is likely to be
very long. For example, when the master needs to wait for one local standby
and at least one of three remote standbys in the London data center, the
setting value (synchronous_standby_names) would be

s_s_names = '{"priority":2, "nodes":["local1", {"quorum":1,
"nodes":["london1", "london2", "london3"]}]}'

OTOH, the value with the mini-language is simple and not so long, as follows.

s_s_names = '2[local1, 1(london1, london2, london3)]'
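
The two values above encode the same tree, which a short Python sketch can show by rendering the JSON form as mini-language text. The helper name `to_mini_language` is hypothetical, and the mapping of "priority" groups to `N[...]` and "quorum" groups to `N(...)` is inferred from the two examples in this mail rather than from any agreed grammar.

```python
import json


def to_mini_language(node):
    """Render a JSON-style group (or a plain standby name) as mini-language text."""
    if isinstance(node, str):
        return node  # a leaf is just a standby name
    members = ", ".join(to_mini_language(member) for member in node["nodes"])
    if "priority" in node:
        # Assumed mapping: priority group -> <number>[...]
        return f'{node["priority"]}[{members}]'
    # Assumed mapping: quorum group -> <number>(...)
    return f'{node["quorum"]}({members})'


json_value = ('{"priority":2, "nodes":["local1", {"quorum":1, '
              '"nodes":["london1", "london2", "london3"]}]}')
mini_value = to_mini_language(json.loads(json_value))

print(mini_value)
print(len(json_value), "characters as JSON versus", len(mini_value), "as mini-language")
```

The first line printed is the mini-language string shown above, so the length difference is purely a matter of notation.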

This is why I'm now thinking that the mini-language is better. But it's not
easy to implement the mini-language completely. There seem to be many problems
that we need to resolve. For example, imagine the case where the master
needs to wait for at least one of two standbys, "tokyo1" and "tokyo2",
in the Tokyo data center. If the Tokyo data center fails, the master needs to
wait for at least one of two standbys, "london1" and "london2", in the London
data center instead. This case can be configured as follows in the mini-language.

s_s_names = '1[1(tokyo1, tokyo2), 1(london1, london2)]'

One problem here is: what pg_stat_replication.sync_state value should be
shown for each standby? Which standby should be marked as sync? As potential?
Or with some other value like quorum? The current design of pg_stat_replication
doesn't fit complicated sync replication cases, so maybe we need to separate
it into several views. It's almost impossible to resolve all of those problems.

My current plan for 9.6 is to support a minimal subset of the mini-language:
the simple syntax "<number>[name, ...]". "<number>" specifies the number of
sync standbys that the master needs to wait for. "[name, ...]" specifies
the priorities of the listed standbys. This first version supports neither
quorum commit nor nested sync replication configuration like
"<number>[name, <number>[name, ...]]". It just supports very simple
"1-level" configuration.
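
As a rough illustration of how small this "1-level" subset is, the Python sketch below parses "<number>[name, ...]" and rejects quorum groups and nesting. The function name `parse_one_level`, the regular expression, and the error messages are assumptions made for this sketch; the real parser in the patch may look nothing like it.

```python
import re

# <number>[name, ...] with exactly one level. Names may not contain brackets
# or parentheses, which is what rules out nested and quorum groups here.
_ONE_LEVEL = re.compile(r"^\s*(\d+)\s*\[([^\[\]()]*)\]\s*$")


def parse_one_level(value):
    """Return (num_sync, names) for a value such as '2[local1, london1]'."""
    match = _ONE_LEVEL.match(value)
    if match is None:
        raise ValueError(f"unsupported synchronous_standby_names value: {value!r}")
    num_sync = int(match.group(1))
    names = [name.strip() for name in match.group(2).split(",")]
    if num_sync < 1 or any(not name for name in names):
        raise ValueError(f"invalid synchronous_standby_names value: {value!r}")
    return num_sync, names


print(parse_one_level("2[local1, london1, london2]"))

try:
    parse_one_level("2[local1, 1(london1, london2, london3)]")
except ValueError as exc:
    print("rejected:", exc)
```

Because the count sits outside the brackets, a standby whose application_name is itself an integer is unambiguous in this form: '2[1, 2, 3]' parses as two sync standbys chosen from the names '1', '2' and '3'.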

Regards,

--
Fujii Masao

From:	Thom Brown <thom(at)linux(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-04 14:51:25
Message-ID:	CAA-aLv6aOuN=cc8j7VcxbBKSqhaO4PsQTr1Y6yGHf6VZmUKtLw@mail.gmail.com
Lists:	pgsql-hackers

On 4 February 2016 at 14:34, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Feb 3, 2016 at 11:00 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Feb 2, 2016 at 8:48 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> So you disagree with only third version that I proposed, i.e.,
>>> adding some hooks for sync replication? If yes and you're OK
>>> with the first and second versions, ISTM that we almost reached
>>> consensus on the direction of multiple sync replication feature.
>>> The first version can cover "one local and one remote sync standbys" case,
>>> and the second can cover "one local and at least one from several remote
>>> standbys" case. I'm thinking to focus on the first version now,
>>> and then we can work on the second to support the quorum commit
>>
>> Well, I think the only hard part of the third problem is deciding on
>> what syntax to use. It seems like a waste of time to me to go to a
>> bunch of trouble to implement #1 and #2 using one syntax and then have
>> to invent a whole new syntax for #3. Seriously, this isn't that hard:
>> it's not a technical problem. It's just that we've got a bunch of
>> people who can't agree on what syntax to use. IMO, you should just
>> pick something. You're presumably the committer for this patch, and I
>> think you should just decide which of the 47,123 things proposed so
>> far is best and insist on that. I trust that you will make a good
>> decision even if it's different than the decision that I would have
>> made.
>
> If we use one syntax for every cases, possible approaches that we can choose
> are mini-language, json, etc. Since my previous proposal covers only very
> simple cases, extra syntax needs to be supported for more complicated cases.
> My plan was to add the hooks so that the developers can choose their own
> syntax. But which might confuse users.
>
> Now I'm thinking that mini-language is better choice. A json has some good
> points, but its big problem is that the setting value is likely to be very long.
> For example, when the master needs to wait for one local standby and
> at least one from three remote standbys in London data center, the setting
> value (synchronous_standby_names) would be
>
> s_s_names = '{"priority":2, "nodes":["local1", {"quorum":1,
> "nodes":["london1", "london2", "london3"]}]}'
>
> OTOH, the value with mini-language is simple and not so long as follows.
>
> s_s_names = '2[local1, 1(london1, london2, london3)]'
>
> This is why I'm now thinking that mini-language is better. But it's not easy
> to completely implement mini-language. There seems to be many problems
> that we need to resolve. For example, please imagine the case where
> the master needs to wait for at least one from two standbys "tokyo1", "tokyo2"
> in Tokyo data center. If Tokyo data center fails, the master needs to
> wait for at least one from two standbys "london1", "london2" in London
> data center, instead. This case can be configured as follows in mini-language.
>
> s_s_names = '1[1(tokyo1, tokyo2), 1(london1, london2)]'
>
> One problem here is; what pg_stat_replication.sync_state value should be
> shown for each standbys? Which standby should be marked as sync? potential?
> any other value like quorum? The current design of pg_stat_replication
> doesn't fit complicated sync replication cases, so maybe we need to separate
> it into several views. It's almost impossible to complete those problems.
>
> My current plan for 9.6 is to support the minimal subset of mini-language;
> simple syntax of "<number>[name, ...]". "<number>" specifies the number of
> sync standbys that the master needs to wait for. "[name, ...]" specifies
> the priorities of the listed standbys. This first version supports neither
> quorum commit nor nested sync replication configuration like
> "<number>[name, <number>[name, ...]]". It just supports very simple
> "1-level" configuration.

Whatever the solution, I really don't like the idea of changing the
definition of s_s_names based on the value of another GUC, mainly
because it seems hacky, but also because the name of the GUC stops
making sense.

Thom

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-04 16:27:38
Message-ID:	CA+TgmoZjZ_MMo0E1eoZhbR+8ozV4CvwCfFvwtuvSbCj7aT_tiw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Feb 4, 2016 at 9:34 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Now I'm thinking that mini-language is better choice. A json has some good
> points, but its big problem is that the setting value is likely to be very long.
> For example, when the master needs to wait for one local standby and
> at least one from three remote standbys in London data center, the setting
> value (synchronous_standby_names) would be
>
> s_s_names = '{"priority":2, "nodes":["local1", {"quorum":1,
> "nodes":["london1", "london2", "london3"]}]}'
>
> OTOH, the value with mini-language is simple and not so long as follows.
>
> s_s_names = '2[local1, 1(london1, london2, london3)]'

Yeah, that was my thought also. Another idea which was suggested is
to create a completely new configuration file for this. Most people
would only have simple stuff in there, of course, but then you could
have the information spread across multiple lines.

I don't in the end care very much about how we solve this problem.
But I'm glad you agree that whatever we do to solve the simple problem
should be a logical subset of what the full solution will eventually
look like, not a completely different design. I think that's
important.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-04 19:21:13
Message-ID:	CAB7nPqTQN1Y9ByX-Xrc7ynFCG8g1_5H0U3G26Lyqr+r6NacwFg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Feb 4, 2016 at 7:27 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I don't in the end care very much about how we solve this problem.
> But I'm glad you agree that whatever we do to solve the simple problem
> should be a logical subset of what the full solution will eventually
> look like, not a completely different design. I think that's
> important.

Yes, please let's use the custom language, and let's not care of not
more than 1 level of nesting so as it is possible to represent
pg_stat_replication in a simple way for the user.
--
Michael

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-04 19:40:11
Message-ID:	CA+TgmoZrv1YSe7ZzA6A3eb=NZ0TdbsmxS30eRTvNZ2Tah=+eZw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Yes, please let's use the custom language, and let's not care of not
> more than 1 level of nesting so as it is possible to represent
> pg_stat_replication in a simple way for the user.

"not" is used twice in this sentence in a way that renders me not able
to be sure that I'm not understanding it not properly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-04 19:49:11
Message-ID:	CAB7nPqR+TV9BtLTEomAu-ExnKaBbKORKV1JXyK_nyF0ORKP4EQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> Yes, please let's use the custom language, and let's not care of not
>> more than 1 level of nesting so as it is possible to represent
>> pg_stat_replication in a simple way for the user.
>
> "not" is used twice in this sentence in a way that renders me not able
> to be sure that I'm not understanding it not properly.

4 times here. Score beaten.

Sorry. Perhaps I am tired... I was just wondering if it would be fine
to only support configurations up to one level of nested objects, like
that:
2[node1, node2, node3]
node1, 2[node2, node3], node3
In short, we could restrict things so as we cannot define a group of
nodes within an existing group.
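
To make the restriction concrete, here is a rough sketch (not taken from any posted patch) of a parser for the bracket syntax in the two examples above that rejects a group defined inside another group. The Group class, the max_depth parameter, the token rules, and the default count of 1 for a group written without a leading number are all assumptions made for illustration only.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical token set: integers, node names, brackets, and commas.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_][A-Za-z0-9_]*|[\[\],])")


@dataclass
class Group:
    wait_num: int  # number of members that must confirm
    members: List[Union[str, "Group"]] = field(default_factory=list)


def tokenize(text):
    """Split the configuration string into a flat list of tokens."""
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_list(tokens, depth, max_depth):
    """Parse comma-separated items until ']' or the end of input."""
    items = []
    while True:
        items.append(parse_item(tokens, depth, max_depth))
        if tokens and tokens[0] == ",":
            tokens.pop(0)
            continue
        return items


def parse_item(tokens, depth, max_depth):
    """Parse a node name or a (possibly counted) bracketed group."""
    if not tokens:
        raise ValueError("unexpected end of input")
    tok = tokens.pop(0)
    if tok.isdigit() or tok == "[":
        wait_num = 1  # assumed default when no count is written
        if tok.isdigit():
            wait_num = int(tok)
            if not tokens or tokens.pop(0) != "[":
                raise ValueError("expected '[' after count")
        if depth + 1 > max_depth:
            raise ValueError("groups nested too deeply")
        members = parse_list(tokens, depth + 1, max_depth)
        if not tokens or tokens.pop(0) != "]":
            raise ValueError("expected ']'")
        return Group(wait_num, members)
    if tok in ("]", ","):
        raise ValueError(f"unexpected {tok!r}")
    return tok


def parse(text, max_depth=1):
    """Return the top-level list of nodes and groups."""
    tokens = tokenize(text)
    items = parse_list(tokens, 0, max_depth)
    if tokens:
        raise ValueError(f"unexpected {tokens[0]!r}")
    return items
```

With max_depth=1 both example strings parse, while something like 2[node1, [node2, node3]] raises ValueError. Raising max_depth is the only change needed to allow deeper nesting, which is in line with the remark that handling n levels is often not much harder than handling 2.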
--
Michael

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-04 20:06:36
Message-ID:	CA+TgmoYRWrNWYqsNBnsG-4zmop0+Y1bBGjtEEbkSRjCsM98DNw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Feb 4, 2016 at 2:49 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> Yes, please let's use the custom language, and let's not care of not
>>> more than 1 level of nesting so as it is possible to represent
>>> pg_stat_replication in a simple way for the user.
>>
>> "not" is used twice in this sentence in a way that renders me not able
>> to be sure that I'm not understanding it not properly.
>
> 4 times here. Score beaten.
>
> Sorry. Perhaps I am tired... I was just wondering if it would be fine
> to only support configurations up to one level of nested objects, like
> that:
> 2[node1, node2, node3]
> node1, 2[node2, node3], node3
> In short, we could restrict things so as we cannot define a group of
> nodes within an existing group.

I see. Such a restriction doesn't seem likely to me to prevent people
from doing anything actually useful. But I don't know that it buys
very much either. It's often not very much simpler to handle 2 levels
than n levels. However, I ain't writing the code so...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-04 20:06:45
Message-ID:	CAB7nPqTMV5sZkemGf=SWMyA8QpzV2VW9bRrysXtKzuSVk99ocw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> Yes, please let's use the custom language, and let's not care of not
>>> more than 1 level of nesting so as it is possible to represent
>>> pg_stat_replication in a simple way for the user.
>>
>> "not" is used twice in this sentence in a way that renders me not able
>> to be sure that I'm not understanding it not properly.
>
> 4 times here. Score beaten.
>
> Sorry. Perhaps I am tired... I was just wondering if it would be fine
> to only support configurations up to one level of nested objects, like
> that:
> 2[node1, node2, node3]
> node1, 2[node2, node3], node3
> In short, we could restrict things so as we cannot define a group of
> nodes within an existing group.

No, actually, that's stupid. Having up to two nested levels makes more
sense, a quite common case for this feature being something like that:
2{node1,[node2,node3]}
In short, sync confirmation is awaited from node1 and (node2 or node3).
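
As an illustration of that reading, here is a minimal sketch that checks whether a nested configuration is satisfied by a given set of confirming standbys. It treats every group as a plain "at least wait_num members" count and ignores the quorum/priority distinction; the Group class and is_satisfied function are hypothetical names, not part of any patch.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Group:
    wait_num: int  # number of members that must confirm
    members: List[Union[str, "Group"]]


def is_satisfied(member, confirmed: Set[str]) -> bool:
    """A node is satisfied once it has confirmed; a group is satisfied
    once at least wait_num of its members are satisfied."""
    if isinstance(member, str):
        return member in confirmed
    done = sum(1 for m in member.members if is_satisfied(m, confirmed))
    return done >= member.wait_num


# 2{node1,[node2,node3]} read as: node1 and (node2 or node3).
# The inner group is assumed to need one confirmation.
config = Group(2, ["node1", Group(1, ["node2", "node3"])])
```

Under these assumptions, confirmations from node1 and node3 satisfy the configuration, while confirmations from node2 and node3 alone do not.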

Flattening groups of nodes with a new catalog will be necessary to
ease the view of this data to users:
- group name?
- array of members with nodes/groups
- group type: quorum or priority
- number of items to wait for in this group
--
Michael

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	robertmhaas(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-05 07:40:21
Message-ID:	20160205.164021.30733364.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Thu, 4 Feb 2016 23:06:45 +0300, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqTMV5sZkemGf=SWMyA8QpzV2VW9bRrysXtKzuSVk99ocw(at)mail(dot)gmail(dot)com>
> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > Sorry. Perhaps I am tired... I was just wondering if it would be fine
> > to only support configurations up to one level of nested objects, like
> > that:
> > 2[node1, node2, node3]
> > node1, 2[node2, node3], node3
> > In short, we could restrict things so as we cannot define a group of
> > nodes within an existing group.
>
> No, actually, that's stupid. Having up to two nested levels makes more
> sense, a quite common case for this feature being something like that:
> 2{node1,[node2,node3]}
> In short, sync confirmation is waited from node1 and (node2 or node3).
>
> Flattening groups of nodes with a new catalog will be necessary to
> ease the view of this data to users:
> - group name?
> - array of members with nodes/groups
> - group type: quorum or priority
> - number of items to wait for in this group

Though I personally love the format, I don't fully understand what
the emerging consensus is, and the discussion looks to be looping
back to the past, so please forgive me for confirming the current
status of the discussion.

We are coming to agree on a configuration method, including its
syntax, that is compatible with possible future use. I think this
is correct.

(Though I haven't seen it explicitly written upthread,) we
regard it as important to keep existing s_s_names settings valid,
interpreted as the 1-priority method. Is this correct?

The most promising syntax is now considered to be n-level
quorum/priority nesting, as in Michael's proposal above. Correct?

But aiming at 9.6, we are to support a (1 or 2)-level quorum *or*
priority setup with a subset of the syntax. I don't think this
is fully agreed yet.

We are not considering an extension or some plugin mechanism as an
additional configuration method for this feature, at least as of
9.6. Correct?

I proposed s_s_method for backward compatibility, but there
is a voice that changing the semantics of s_s_names in that way
is confusing. I can sympathize with that. If so, do we add
another variable (named standbys_definition or the like?) which
is to be set as an alternative to s_s_names? Or do we take another way?

Sorry for the maybe-noise in advance.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-05 08:36:38
Message-ID:	CAB7nPqSwO3qC_-JVRDbK++=D_vHPFJV0nOStjMxQ+4XbXgAzTQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>> Yes, please let's use the custom language, and let's not care of not
>>>> more than 1 level of nesting so as it is possible to represent
>>>> pg_stat_replication in a simple way for the user.
>>>
>>> "not" is used twice in this sentence in a way that renders me not able
>>> to be sure that I'm not understanding it not properly.
>>
>> 4 times here. Score beaten.
>>
>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>> to only support configurations up to one level of nested objects, like
>> that:
>> 2[node1, node2, node3]
>> node1, 2[node2, node3], node3
>> In short, we could restrict things so as we cannot define a group of
>> nodes within an existing group.
>
> No, actually, that's stupid. Having up to two nested levels makes more
> sense, a quite common case for this feature being something like that:
> 2{node1,[node2,node3]}
> In short, sync confirmation is waited from node1 and (node2 or node3).
>
> Flattening groups of nodes with a new catalog will be necessary to
> ease the view of this data to users:
> - group name?
> - array of members with nodes/groups
> - group type: quorum or priority
> - number of items to wait for in this group

So, here are some thoughts to make that more user-friendly. I think
that the critical issue here is to properly flatten the meta data in
the custom language and represent it properly in a new catalog,
without messing up too much with the existing pg_stat_replication that
people are now used to for 5 releases since 9.0. So, I would think
that we will need to have a new catalog, say
pg_stat_replication_groups with the following things:
- One line of this catalog represents the status of a group or of a single node.
- The status of a node/group is either sync or potential. If a
node/group is specified more than once, it may be both sync and
potential depending on where it is defined, in which case setting its
status to 'sync' makes the most sense (provided it is in sync state,
I guess).
- Move sync_priority and sync_state, actually an equivalent from
pg_stat_replication into this new catalog, because those represent the
status of a node or group of nodes.
- group name, and by that I think that we had perhaps better make
mandatory the need to append a name with a quorum or priority group.
The group at the highest level is forcibly named as 'top', 'main', or
whatever if not directly specified by the user. If the entry is
directly a node, use the application_name.
- Type of group, quorum or priority
- Elements in this group, an element can be a group name or a node
name, aka application_name. If group is of type priority, the elements
are listed in increasing order, so the elements with lower priority
come first, etc. We could have one column listing explicitly a list of
integers that map to the elements of a group, but it does not seem
worth it; what users would like to know is which nodes are
prioritized. This covers the former 'priority' field of
pg_stat_replication.
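
As a rough illustration of the flattening described in the list above, the following sketch turns a nested definition into one row per group or node. The Group and Row classes and their field names are hypothetical, and the handling of a node or group that is specified more than once is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Group:
    name: str          # mandatory group name; 'main' for the top level
    kind: str          # 'quorum' or 'priority'
    wait_num: int      # number of elements to wait for
    members: List[Union[str, "Group"]]


@dataclass
class Row:
    name: str                      # group name or application_name
    kind: Optional[str]            # None for a plain node
    wait_num: Optional[int]        # None for a plain node
    elements: Optional[List[str]]  # member names in definition order


def flatten(entry) -> List[Row]:
    """Emit one row for the entry itself, then rows for its members."""
    if isinstance(entry, str):
        return [Row(entry, None, None, None)]
    names = [m if isinstance(m, str) else m.name for m in entry.members]
    rows = [Row(entry.name, entry.kind, entry.wait_num, names)]
    for member in entry.members:
        rows.extend(flatten(member))
    return rows


top = Group("main", "priority", 2,
            ["node1", Group("g1", "quorum", 1, ["node2", "node3"])])
rows = flatten(top)
```

Each group row refers to its members by name only, which is why a mandatory group name helps: nested groups can be looked up as their own rows instead of being embedded in the parent.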

We may have a good idea of how to define a custom language, still we
are going to need to design a clean interface at catalog level more or
less close to what is written here. If we can get a clean interface,
the custom language implemented, and TAP tests that take advantage of
this user interface to check the node/group statuses, I guess that we
would be in good shape for this patch.

Anyway that's not a small project, and perhaps I am over-complicating
the whole thing.

Thoughts?
--
Michael

From:	Joshua Berkus <josh(at)agliodbs(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-05 09:10:22
Message-ID:	309589265.42726.1454663422737.JavaMail.zimbra@agliodbs.com
Lists:	pgsql-hackers

> We may have a good idea of how to define a custom language, still we
> are going to need to design a clean interface at catalog level more or
> less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
>
> Anyway that's not a small project, and perhaps I am over-complicating
> the whole thing.

Yes. The more I look at this, the worse the idea of custom syntax looks. Yes, I realize there are drawbacks to using JSON, but this is worse.

Further, there's a lot of horse-cart inversion here. This proposal involves letting the syntax for sync_list configuration determine the feature set for N-sync. That's backwards; we should decide the total list of features we want to support, and then adopt a syntax which will make it possible to have them.

--
Josh Berkus
Red Hat OSAS
(opinions are my own)

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-05 09:19:24
Message-ID:	CAD21AoA9UqcbTnDKi0osd0yhN4FPgTrg6wuZeTtvpSYy2LqL5Q@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>>> Yes, please let's use the custom language, and let's not care of not
>>>>> more than 1 level of nesting so as it is possible to represent
>>>>> pg_stat_replication in a simple way for the user.
>>>>
>>>> "not" is used twice in this sentence in a way that renders me not able
>>>> to be sure that I'm not understanding it not properly.
>>>
>>> 4 times here. Score beaten.
>>>
>>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>>> to only support configurations up to one level of nested objects, like
>>> that:
>>> 2[node1, node2, node3]
>>> node1, 2[node2, node3], node3
>>> In short, we could restrict things so as we cannot define a group of
>>> nodes within an existing group.
>>
>> No, actually, that's stupid. Having up to two nested levels makes more
>> sense, a quite common case for this feature being something like that:
>> 2{node1,[node2,node3]}
>> In short, sync confirmation is waited from node1 and (node2 or node3).
>>
>> Flattening groups of nodes with a new catalog will be necessary to
>> ease the view of this data to users:
>> - group name?
>> - array of members with nodes/groups
>> - group type: quorum or priority
>> - number of items to wait for in this group
>
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the meta data in
> the custom language and represent it properly in a new catalog,
> without messing up too much with the existing pg_stat_replication that
> people are now used to for 5 releases since 9.0. So, I would think
> that we will need to have a new catalog, say
> pg_stat_replication_groups with the following things:
> - One line of this catalog represents the status of a group or of a single node.
> - The status of a node/group is either sync or potential, if a
> node/group is specified more than once, it may be possible that it
> would be sync and potential depending on where it is defined, in which
> case setting its status to 'sync' has the most sense. If it is in sync
> state I guess.
> - Move sync_priority and sync_state, actually an equivalent from
> pg_stat_replication into this new catalog, because those represent the
> status of a node or group of nodes.
> - group name, and by that I think that we had perhaps better make
> mandatory the need to append a name with a quorum or priority group.
> The group at the highest level is forcibly named as 'top', 'main', or
> whatever if not directly specified by the user. If the entry is
> directly a node, use the application_name.
> - Type of group, quorum or priority
> - Elements in this group, an element can be a group name or a node
> name, aka application_name. If group is of type priority, the elements
> are listed in increasing order. So the elements with lower priority
> get first, etc. We could have one column listing explicitly a list of
> integers that map with the elements of a group but it does not seem
> worth it, what users would like to know is what are the nodes that are
> prioritized. This covers the former 'priority' field of
> pg_stat_replication.
>
> We may have a good idea of how to define a custom language, still we
> are going to need to design a clean interface at catalog level more or
> less close to what is written here. If we can get a clean interface,
> the custom language implemented, and TAP tests that take advantage of
> this user interface to check the node/group statuses, I guess that we
> would be in good shape for this patch.
>
> Anyway that's not a small project, and perhaps I am over-complicating
> the whole thing.
>

I agree with adding a new system catalog so that users can easily check
the replication status, and a group name will be needed for this.
What about adding the group name with ":" immediately after the set of
standbys, as follows?

2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]

Also, for showing the sync replication status according to the
configuration, the view I have in mind has the following definition.

- "name" : node name or group name, or "main" meaning top level node.
- "sync_type" : 'priority' or 'quorum' for group node, otherwise NULL.
- "wait_num" : number of nodes/groups to wait for in this group.
- "sync_priority" : priority of node/group in this group. "main" node has "0".
    - a standby in a quorum group always has priority 1.
    - a standby in a priority group has priority according to definition order.
- "sync_state" : 'sync' or 'potential' or 'quorum'.
    - a standby in a quorum group is always 'quorum'.
    - a standby in a priority group is 'sync' / 'potential'.
- "member" : array of members for group node, otherwise NULL.
- "level" : nested level. "main" node is level 0.
- "write/flush/apply_location" : group/node calculated LSN according
  to configuration.
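
How a group-level LSN would be calculated is not spelled out here, so the following is only one possible interpretation, written as a sketch. It assumes that a quorum group reports the furthest position confirmed by at least wait_num members, and that a priority group reports the minimum over its first wait_num members in definition order. The function names are hypothetical and LSNs are modelled as plain integers.

```python
from typing import List, Optional


def quorum_group_lsn(member_lsns: List[int], wait_num: int) -> Optional[int]:
    """Assumed rule: the wait_num-th largest LSN, i.e. the furthest
    position that at least wait_num members have reached."""
    if wait_num < 1 or wait_num > len(member_lsns):
        return None  # not enough members to form the quorum
    return sorted(member_lsns, reverse=True)[wait_num - 1]


def priority_group_lsn(lsns_in_priority_order: List[int],
                       wait_num: int) -> Optional[int]:
    """Assumed rule: the slowest of the first wait_num members,
    taken in definition (priority) order."""
    if wait_num < 1 or wait_num > len(lsns_in_priority_order):
        return None
    return min(lsns_in_priority_order[:wait_num])
```

For example, with member positions 100, 300 and 200 and wait_num = 2, the quorum rule gives 200 while the priority rule gives 100, so the two group types can report different locations for the same members.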

When sync replication is set as above, the new system view shows,

=# select * from pg_stat_replication_group;
 name    | sync_type | wait_num | sync_priority | sync_state | member                    | level | write_location | flush_location | apply_location
---------+-----------+----------+---------------+------------+---------------------------+-------+----------------+----------------+----------------
 main    | priority  |        2 |             0 | sync       | {local,london,tokyo}      |     0 |                |                |
 local   |           |        0 |             1 | sync       |                           |     1 |                |                |
 london  | quorum    |        2 |             2 | potential  | {london1,london2,london3} |     1 |                |                |
 london1 |           |        0 |             1 | potential  |                           |     2 |                |                |
 london2 |           |        0 |             2 | potential  |                           |     2 |                |                |
 london3 |           |        0 |             3 | potential  |                           |     2 |                |                |
 tokyo   | quorum    |        1 |             3 | potential  | {tokyo1,tokyo2}           |     1 |                |                |
 tokyo1  |           |        0 |             1 | quorum     |                           |     2 |                |                |
 tokyo2  |           |        0 |             1 | quorum     |                           |     2 |                |                |
(9 rows)

Thoughts?

Regards,

--
Masahiko Sawada

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-05 09:59:42
Message-ID:	CAB7nPqQY1gvSPtd6pU_AMnMVLL3DZUDVPdwMAUME6qUEAKg1Aw@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Feb 5, 2016 at 12:19 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> I agree with adding new system catalog to easily checking replication
> status for user. And group name will needed for this.
> What about adding group name with ":" to immediately after set of
> standbys like follows?

This way is fine for me.

Check.

> - "sync_type" : 'priority' or 'quorum' for group node, otherwise NULL.

That would be one or the other.

> - "wait_num" : number of nodes/groups to wait for in this group.

Check. This is taken directly from the meta data.

> - "sync_priority" : priority of node/group in this group. "main" node has "0".
> - the standby is in quorum group always has
> priority 1.
> - the standby is in priority group has
> priority according to definition order.

This is a bit confusing if the same node or group is in multiple
groups. My previous suggestion was to list the elements of the group
in increasing order of priority. That's an important point.

> - "sync_state" : 'sync' or 'potential' or 'quorum'.
> - the standby is in quorum group is always 'quorum'.
> - the standby is in priority group is 'sync'
> / 'potential'.

potential and quorum are the same thing, no? The only difference is
based on the group type here.

> - "member" : array of members for group node, otherwise NULL.

This can be NULL only when the entry is a node.

> - "level" : nested level. "main" node is level 0.

Not sure this one is necessary.

> - "write/flush/apply_location" : group/node calculated LSN according
> to configuration.

This does not need to be part of this catalog, that's a representation
of the data that is part of the WAL sender.
--
Michael

From:	kharagesuraj <suraj(dot)kharage(at)nttdata(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-08 05:58:21
Message-ID:	1454911101210-5886259.post@n5.nabble.com
Lists:	pgsql-hackers

hello,

I have tested the v7 patch, but I think you forgot to remove some
debug output from the patch in the
src/backend/replication/syncrep.c file.

for (i = 0; i < num_sync; i++)
+ {
+ elog(WARNING, "sync_standbys[%d] = %d", i, sync_standbys[i]);
+ }
+ elog(WARNING, "num_sync = %d, s_s_num = %d", num_sync, synchronous_standby_num);

Please correct me if my understanding is wrong.

Regards
Suraj Kharage


From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-08 15:48:57
Message-ID:	CAHGQGwHnTKmd90Vu19Swu0C+2mnWxvAH=1FE=-xUbo3s94pRRg@mail.gmail.com
Lists:	pgsql-hackers

I agree that we would need something like this new view in the future,
however it unfortunately seems too late to work on that for 9.6.
There is only one CommitFest left. Let's focus on the very simple case,
i.e., a 1-level priority list, now; then we can extend it to cover other cases.

If we can commit the simple version early enough and there is enough
time before the date of feature freeze, of course I'm happy to review
the extended version like you proposed, for 9.6.

Regards,

--
Fujii Masao

From:	kharagesuraj <suraj(dot)kharage(at)nttdata(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-09 03:16:01
Message-ID:	23982A0EFC8B464EA444A66145FC34158EBEEFE3@MAIL222
Lists:	pgsql-hackers

Hello,

>> I agree with first version, and attached the updated patch which are
>> modified so that it supports simple multiple sync replication you
>>suggested.
>> (but test cases are not included yet.)

I have tried some basic built-in test cases for multisync rep.
I have created one patch on top of Michael's patch
(http://www.postgresql.org/message-id/CAB7nPqTEqou=xrYrGSgA13QW1xxsSD6tFHz-Sm_J3EgDvSOCHw@mail.gmail.com).
It is still in progress.
Please have a look, correct me if I am wrong, and suggest remaining test cases.

Regards
Suraj Kharage

recovery_test_suite_with_multisync.patch (36K) <http://postgresql.nabble.com/attachment/5886503/0/recovery_test_suite_with_multisync.patch>


From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	kharagesuraj <suraj(dot)kharage(at)nttdata(dot)com>
Cc:	PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-09 04:15:15
Message-ID:	CAB7nPqRfx9YFnKT1T0TP5AvKTaECwr+WST2y0bNox91O1n_rxQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Feb 9, 2016 at 12:16 PM, kharagesuraj <suraj(dot)kharage(at)nttdata(dot)com>
wrote:

> Hello,
>
>
>
>
>
> >> I agree with first version, and attached the updated *patch* which are
> >> modified so that it supports simple multiple sync replication you
> >>suggested.
> >> (but test cases are not included yet.)
>
>
>
> I have tried for some basic in-built test cases for multisync rep.
>
> I have created one patch over Michael's patch
> <http://www.postgresql.org/message-id/CAB7nPqTEqou=[hidden email]>.
>
> Still it is in progress.
>
> Please have look and correct me if i am wrong and suggest remaining test
> cases.
>

So the interesting part of this patch is 006_sync_rep.pl. I think that you
had better build something on top of my patch as a separate patch. This
would make things clearer.

+my $result = $node_master->psql('postgres', "select application_name, sync_state from pg_stat_replication;");
+print "$result \n";
+is($result, "standby_1|sync\nstandby_2|sync\nstandby_3|potential", 'checked for sync standbys state initially');

Regarding the test, you clearly got the idea, though I think that we'd
want to tweak the parameters in postgresql.conf and re-run those
queries a couple of times. That's cheaper than re-creating new cluster
nodes all the time: just create a base, then switch s_s_names a bit and
query pg_stat_replication, which you are already doing.

Also, please attach patches directly to your emails. Something uploaded
to nabble is stored only there and not within postgresql.org, which
would be annoying if nabble disappears at some point. You would also want
to use an email client directly and interact with the community mailing
lists that way, instead of going through nabble's forum-like interface
(I have never used it and am not really willing to, but I guess that is
what it is like).

I am attaching what you posted on this email for the archive's sake.
--
Michael

Attachment	Content-Type	Size
recovery_test_suite_with_multisync.patch	application/x-download	26.7 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-09 04:16:21
Message-ID:	20160209.131621.54420844.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Tue, 9 Feb 2016 00:48:57 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwHnTKmd90Vu19Swu0C+2mnWxvAH=1FE=-xUbo3s94pRRg(at)mail(dot)gmail(dot)com>
> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
> >> <michael(dot)paquier(at)gmail(dot)com> wrote:
> >>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
> >>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
> >>>>> Yes, please let's use the custom language, and let's not care of not
> >>>>> more than 1 level of nesting so as it is possible to represent
> >>>>> pg_stat_replication in a simple way for the user.
> >>>>
> >>>> "not" is used twice in this sentence in a way that renders me not able
> >>>> to be sure that I'm not understanding it not properly.
> >>>
> >>> 4 times here. Score beaten.
> >>>
> >>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
> >>> to only support configurations up to one level of nested objects, like
> >>> that:
> >>> 2[node1, node2, node3]
> >>> node1, 2[node2, node3], node3
> >>> In short, we could restrict things so as we cannot define a group of
> >>> nodes within an existing group.
> >>
> >> No, actually, that's stupid. Having up to two nested levels makes more
> >> sense, a quite common case for this feature being something like that:
> >> 2{node1,[node2,node3]}
> >> In short, sync confirmation is waited from node1 and (node2 or node3).
> >>
> >> Flattening groups of nodes with a new catalog will be necessary to
> >> ease the view of this data to users:
> >> - group name?
> >> - array of members with nodes/groups
> >> - group type: quorum or priority
> >> - number of items to wait for in this group
> >
> > So, here are some thoughts to make that more user-friendly. I think
> > that the critical issue here is to properly flatten the meta data in
> > the custom language and represent it properly in a new catalog,
> > without messing up too much with the existing pg_stat_replication that
> > people are now used to for 5 releases since 9.0. So, I would think
> > that we will need to have a new catalog, say
> > pg_stat_replication_groups with the following things:
> > - One line of this catalog represents the status of a group or of a single node.
> > - The status of a node/group is either sync or potential, if a
> > node/group is specified more than once, it may be possible that it
> > would be sync and potential depending on where it is defined, in which
> > case setting its status to 'sync' has the most sense. If it is in sync
> > state I guess.
> > - Move sync_priority and sync_state, actually an equivalent from
> > pg_stat_replication into this new catalog, because those represent the
> > status of a node or group of nodes.
> > - group name, and by that I think that we had perhaps better make
> > mandatory the need to append a name with a quorum or priority group.
> > The group at the highest level is forcibly named as 'top', 'main', or
> > whatever if not directly specified by the user. If the entry is
> > directly a node, use the application_name.
> > - Type of group, quorum or priority
> > - Elements in this group, an element can be a group name or a node
> > name, aka application_name. If group is of type priority, the elements
> > are listed in increasing order. So the elements with lower priority
> > get first, etc. We could have one column listing explicitly a list of
> > integers that map with the elements of a group but it does not seem
> > worth it, what users would like to know is what are the nodes that are
> > prioritized. This covers the former 'priority' field of
> > pg_stat_replication.
> >
> > We may have a good idea of how to define a custom language, still we
> > are going to need to design a clean interface at catalog level more or
> > less close to what is written here. If we can get a clean interface,
> > the custom language implemented, and TAP tests that take advantage of
> > this user interface to check the node/group statuses, I guess that we
> > would be in good shape for this patch.
> >
> > Anyway that's not a small project, and perhaps I am over-complicating
> > the whole thing.
> >
> > Thoughts?
>
> I agree that we would need something like such new view in the future,
> however it seems too late to work on that for 9.6 unfortunately.
> There is only one CommitFest left. Let's focus on very simple case, i.e.,
> 1-level priority list, now, then we can extend it to cover other cases.
>
> If we can commit the simple version too early and there is enough
> time before the date of feature freeze, of course I'm happy to review
> the extended version like you proposed, for 9.6.

I agree with Fujii-san. Many convenient gadgets could be built
around this, and they are all welcome, but having the
fundamental functionality in 9.6 seems far more beneficial for
most of us.

At least the extensible syntax is fixed, so the internal structures
can be extended gradually along with syntactic enhancements.
Definitions of three or more levels, and group names, are
syntactically reserved and may be left out for now. JSON could
be added, but it is too complicated for simple cases.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	kharagesuraj <suraj(dot)kharage(at)nttdata(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-09 04:18:40
Message-ID:	56B968A0.5060804@lab.ntt.co.jp
Lists:	pgsql-hackers

Hi Suraj,

On 2016/02/09 12:16, kharagesuraj wrote:
> Hello,
>
>
>>> I agree with first version, and attached the updated patch which are
>>> modified so that it supports simple multiple sync replication you
>>> suggested.
>>> (but test cases are not included yet.)
>
> I have tried for some basic in-built test cases for multisync rep.
> I have created one patch over Michael's patch <http://www.postgresql.org/message-id/CAB7nPqTEqou=xrYrGSgA13QW1xxsSD6tFHz-Sm_J3EgDvSOCHw@mail.gmail.com>.
> Still it is in progress.
> Please have look and correct me if i am wrong and suggest remaining test cases.
>
> recovery_test_suite_with_multisync.patch (36K) <http://postgresql.nabble.com/attachment/5886503/0/recovery_test_suite_with_multisync.patch>

Thanks for creating the patch. Sorry to nitpick, but as has been brought up
before, it's better to send patches as email attachments (that is, not as
links to external sites).

Also, it would be helpful if your patch were submitted as a diff on top of
Michael's patch. That is, include only the parts specific to testing the
multiple sync feature, and let the rest be taken care of by Michael's base
patch.

Thanks,
Amit

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, sawada(dot)mshk(at)gmail(dot)com, thom(at)linux(dot)com, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, memissemerson(at)gmail(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, amit(dot)kapila16(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-09 04:31:46
Message-ID:	CAB7nPqSJgDLLsVk_Et-O=NBfJNqx3GbHszCYGvuTLRxHaZV3xQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Feb 9, 2016 at 1:16 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Tue, 9 Feb 2016 00:48:57 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwHnTKmd90Vu19Swu0C+2mnWxvAH=1FE=-xUbo3s94pRRg(at)mail(dot)gmail(dot)com>
>> On Fri, Feb 5, 2016 at 5:36 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> > On Thu, Feb 4, 2016 at 11:06 PM, Michael Paquier
>> > <michael(dot)paquier(at)gmail(dot)com> wrote:
>> >> On Thu, Feb 4, 2016 at 10:49 PM, Michael Paquier
>> >> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> >>> On Thu, Feb 4, 2016 at 10:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> >>>> On Thu, Feb 4, 2016 at 2:21 PM, Michael Paquier
>> >>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> >>>>> Yes, please let's use the custom language, and let's not care of not
>> >>>>> more than 1 level of nesting so as it is possible to represent
>> >>>>> pg_stat_replication in a simple way for the user.
>> >>>>
>> >>>> "not" is used twice in this sentence in a way that renders me not able
>> >>>> to be sure that I'm not understanding it not properly.
>> >>>
>> >>> 4 times here. Score beaten.
>> >>>
>> >>> Sorry. Perhaps I am tired... I was just wondering if it would be fine
>> >>> to only support configurations up to one level of nested objects, like
>> >>> that:
>> >>> 2[node1, node2, node3]
>> >>> node1, 2[node2, node3], node3
>> >>> In short, we could restrict things so as we cannot define a group of
>> >>> nodes within an existing group.
>> >>
>> >> No, actually, that's stupid. Having up to two nested levels makes more
>> >> sense, a quite common case for this feature being something like that:
>> >> 2{node1,[node2,node3]}
>> >> In short, sync confirmation is waited from node1 and (node2 or node3).
>> >>
>> >> Flattening groups of nodes with a new catalog will be necessary to
>> >> ease the view of this data to users:
>> >> - group name?
>> >> - array of members with nodes/groups
>> >> - group type: quorum or priority
>> >> - number of items to wait for in this group
>> >
>> > So, here are some thoughts to make that more user-friendly. I think
>> > that the critical issue here is to properly flatten the meta data in
>> > the custom language and represent it properly in a new catalog,
>> > without messing up too much with the existing pg_stat_replication that
>> > people are now used to for 5 releases since 9.0. So, I would think
>> > that we will need to have a new catalog, say
>> > pg_stat_replication_groups with the following things:
>> > - One line of this catalog represents the status of a group or of a single node.
>> > - The status of a node/group is either sync or potential, if a
>> > node/group is specified more than once, it may be possible that it
>> > would be sync and potential depending on where it is defined, in which
>> > case setting its status to 'sync' has the most sense. If it is in sync
>> > state I guess.
>> > - Move sync_priority and sync_state, actually an equivalent from
>> > pg_stat_replication into this new catalog, because those represent the
>> > status of a node or group of nodes.
>> > - group name, and by that I think that we had perhaps better make
>> > mandatory the need to append a name with a quorum or priority group.
>> > The group at the highest level is forcibly named as 'top', 'main', or
>> > whatever if not directly specified by the user. If the entry is
>> > directly a node, use the application_name.
>> > - Type of group, quorum or priority
>> > - Elements in this group, an element can be a group name or a node
>> > name, aka application_name. If group is of type priority, the elements
>> > are listed in increasing order. So the elements with lower priority
>> > get first, etc. We could have one column listing explicitly a list of
>> > integers that map with the elements of a group but it does not seem
>> > worth it, what users would like to know is what are the nodes that are
>> > prioritized. This covers the former 'priority' field of
>> > pg_stat_replication.
>> >
>> > We may have a good idea of how to define a custom language, still we
>> > are going to need to design a clean interface at catalog level more or
>> > less close to what is written here. If we can get a clean interface,
>> > the custom language implemented, and TAP tests that take advantage of
>> > this user interface to check the node/group statuses, I guess that we
>> > would be in good shape for this patch.
>> >
>> > Anyway that's not a small project, and perhaps I am over-complicating
>> > the whole thing.
>> >
>> > Thoughts?
>>
>> I agree that we would need something like such new view in the future,
>> however it seems too late to work on that for 9.6 unfortunately.
>> There is only one CommitFest left. Let's focus on very simple case, i.e.,
>> 1-level priority list, now, then we can extend it to cover other cases.
>>
>> If we can commit the simple version too early and there is enough
>> time before the date of feature freeze, of course I'm happy to review
>> the extended version like you proposed, for 9.6.
>
> I agree to Fujii-san. There would be many of convenient gadgets
> around this and they are completely welcome, but having
> fundamental functionality in 9.6 seems to be far benetifical for
> most of us.

Hm. Rushing features in because we need them now is not really
community-like. I'd rather not have us make decisions like that
knowing that we may pay a certain price in the long term, while it
pays off in the short term, aka the 9.6 release. However, having a base
in place for the mini-language would give enough room for future
improvements, so I am fine with having only 1 level of nesting, with
{} and [] supported. This can also be represented simply within
pg_stat_replication because we'd have basically only one group of
nodes for now (if I got the idea correctly), and the status of each
entry in pg_stat_replication would just need to reflect either
potential or sync, which is something that users are used to now.

So, if I got the vibe correctly, we would basically just allow that in
a first shot:
N{node_list}, to define a priority group
N[node_list], to define a quorum group
There can be only one group, and elements in a node list cannot be a
group. No need of group names either.
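
To make the difference between the two group types concrete, here is a
minimal C sketch of when a commit could be released. It assumes that a
priority group N{...} waits for the first N connected standbys in list
order, while a quorum group N[...] is satisfied by ACKs from any N
connected standbys. The names GroupKind, Standby and group_satisfied are
hypothetical and are not taken from any patch on this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
typedef enum { GROUP_PRIORITY, GROUP_QUORUM } GroupKind;

typedef struct
{
    const char *name;       /* application_name of the standby */
    bool        connected;  /* is the standby currently streaming? */
    bool        acked;      /* has it confirmed the commit record? */
} Standby;

/*
 * Return true if the group's condition is met.
 * Priority: the first wait_num connected standbys must all have acked.
 * Quorum:   any wait_num connected standbys must have acked.
 */
bool
group_satisfied(GroupKind kind, int wait_num, const Standby *nodes, size_t n)
{
    int counted = 0;

    assert(wait_num > 0);

    for (size_t i = 0; i < n; i++)
    {
        if (!nodes[i].connected)
            continue;           /* skipped; the next standby moves up */

        if (kind == GROUP_PRIORITY)
        {
            if (!nodes[i].acked)
                return false;   /* a required standby has not acked yet */
            counted++;
        }
        else if (nodes[i].acked)
            counted++;

        if (counted >= wait_num)
            return true;
    }
    return false;               /* not enough connected standbys */
}
```

With node1 connected but not yet acked, 2[node1,node2,node3] would already
be satisfied by node2 and node3, whereas 2{node1,node2,node3} would keep
waiting for node1.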
--
Michael

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-09 13:32:17
Message-ID:	CAB7nPqQs79wTHFhB03kbkt+5Hc-iczFp7xTdo6T6u8TFrCeFBw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Feb 3, 2016 at 7:33 AM, Robert Haas wrote:
> Also, to be frank, I think we ought to be putting more effort into
> another patch in this same area, specifically Thomas Munro's causal
> reads patch. I think a lot of people today are trying to use
> synchronous replication to build load-balancing clusters and avoid the
> problem where you write some data and then read back stale data from a
> standby server. Of course, our current synchronous replication
> facilities make no such guarantees - his patch does, and I think
> that's pretty important. I'm not saying that we shouldn't do this
> too, of course.

Yeah, sure. Each of those patches is trying to solve a different
problem where Postgres is deficient. Here we'd like to be sure a
commit WAL record is correctly flushed on multiple standbys, while the
patch of Thomas is trying to ensure that there is no need to scan for
the replay position of a standby using some GUC parameters and a
validation/sanity layer in syncrep.c to do that. Surely the patch of
this thread has got more attention than Thomas', and both of them have
merits and try to address real problems. FWIW, the patch of Thomas is
a topic that I find rather interesting, and I am planning to look at
it as well, perhaps for next CF or even before that. We'll see how
other things move on.
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-09 16:36:54
Message-ID:	CAD21AoAiEeEM2NcQ6dQv7xf4pHi6FUZ0QYfb-0rwgrL9zcKEvQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Feb 9, 2016 at 10:32 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Feb 3, 2016 at 7:33 AM, Robert Haas wrote:
>> Also, to be frank, I think we ought to be putting more effort into
>> another patch in this same area, specifically Thomas Munro's causal
>> reads patch. I think a lot of people today are trying to use
>> synchronous replication to build load-balancing clusters and avoid the
>> problem where you write some data and then read back stale data from a
>> standby server. Of course, our current synchronous replication
>> facilities make no such guarantees - his patch does, and I think
>> that's pretty important. I'm not saying that we shouldn't do this
>> too, of course.
>
> Yeah, sure. Each one of those patches is trying to solve a different
> problem where Postgres is deficient, here we'd like to be sure a
> commit WAL record is correctly flushed on multiple standbys, while the
> patch of Thomas is trying to ensure that there is no need to scan for
> the replay position of a standby using some GUC parameters and a
> validation/sanity layer in syncrep.c to do that. Surely the patch of
> this thread has got more attention than Thomas', and both of them have
> merits and try to address real problems. FWIW, the patch of Thomas is
> a topic that I find rather interesting, and I am planning to look at
> it as well, perhaps for next CF or even before that. We'll see how
> other things move on.

Attached is the first version of the dedicated-language patch (the
documentation patch is not ready yet).

This patch supports only the priority method with one level of nesting,
but the feature will be expanded later by adding the quorum method and
more than one level of nesting, so the patch was implemented with
extensibility in mind.
I have also implemented the new system view we discussed on this thread,
but it is not included in this patch (because it is not necessary yet).

== Syntax ==
s_s_names accepts two kinds of syntax, as follows:

1. s_s_names = 'node1, node2, node3'
2. s_s_names = '2[node1, node2, node3]'

Syntax #1 is kept for backward compatibility; it implies that the master
server waits for only 1 server.
Syntax #2 is the new syntax using the dedicated language.

With setting #2 above, standby node1 has the lowest priority value (so
it is preferred first) and standby node3 has the highest.
The master server makes COMMIT wait until at least the 2 standbys with
the lowest priority values have sent an ACK to the master.
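
As a rough illustration of the two forms, here is a minimal C sketch
that parses either of them into a wait count and a list of names. It is
not the parser from the attached patch: SyncSpec, parse_sync_names and
the fixed MAX_NODES/MAX_NAME limits are hypothetical, and the sketch
simply treats an empty value or an empty name as an error.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES 16            /* arbitrary limits for this sketch */
#define MAX_NAME  64

typedef struct
{
    int  wait_num;                      /* number of standbys to wait for */
    int  n_names;                       /* number of names parsed */
    char names[MAX_NODES][MAX_NAME];    /* names in definition order */
} SyncSpec;

/* Parse 'a, b, c' (wait_num = 1) or 'N[a, b, c]'. Returns 0 or -1. */
int
parse_sync_names(const char *value, SyncSpec *spec)
{
    const char *p = value;
    const char *end;

    assert(value != NULL && spec != NULL);

    end = value + strlen(value);
    spec->wait_num = 1;         /* legacy form waits for one standby */
    spec->n_names = 0;

    while (isspace((unsigned char) *p))
        p++;

    if (isdigit((unsigned char) *p))
    {
        char *after;
        long  n = strtol(p, &after, 10);

        while (isspace((unsigned char) *after))
            after++;
        if (*after == '[')      /* new syntax: N[ ... ] */
        {
            const char *rbracket = strrchr(after, ']');

            if (rbracket == NULL || n < 1 || n > MAX_NODES)
                return -1;
            for (const char *q = rbracket + 1; *q != '\0'; q++)
                if (!isspace((unsigned char) *q))
                    return -1;  /* trailing garbage after ']' */
            spec->wait_num = (int) n;
            p = after + 1;
            end = rbracket;
        }
    }

    for (;;)                    /* split [p, end) on commas */
    {
        const char *comma = memchr(p, ',', (size_t) (end - p));
        const char *s = p;
        const char *e = comma ? comma : end;
        size_t      len;

        while (s < e && isspace((unsigned char) *s))
            s++;
        while (e > s && isspace((unsigned char) e[-1]))
            e--;
        len = (size_t) (e - s);
        if (len == 0 || len >= MAX_NAME || spec->n_names >= MAX_NODES)
            return -1;
        memcpy(spec->names[spec->n_names], s, len);
        spec->names[spec->n_names][len] = '\0';
        spec->n_names++;
        if (comma == NULL)
            break;
        p = comma + 1;
    }
    return 0;
}
```

A name that merely starts with a digit (say, '1node') still falls into
the legacy branch here, because the digits are only taken as a count
when they are followed by '['.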

== Memory Structure ==
Previously, the master server kept the value of s_s_names as a string
and used it when determining standby priority.
This patch gives the master server a new in-memory structure (called
SyncGroupNode) so that it can handle multiple (and, in the future,
nested) standby nodes flexibly.
All SyncGroupNode information is set while parsing s_s_names.

The memory structure is:

struct SyncGroupNode
{
    /* Common information */
    int            type;
    char          *name;
    SyncGroupNode *next;        /* next name node in the same group */

    /* For group node */
    int            sync_method; /* priority */
    int            wait_num;
    SyncGroupNode *member;      /* members of this group */
    bool           (*SyncRepGetSyncedLsnsFn) (SyncGroupNode *group,
                                              XLogRecPtr *write_pos,
                                              XLogRecPtr *flush_pos);
    int            (*SyncRepGetSyncStandbysFn) (SyncGroupNode *group, int *list);
};

A SyncGroupNode can be one of two types, name node or group node. Each
node has a pointer to the next name/group node in the same group, and a
group node also has a list of its members.
A name node represents a synchronous standby.
A group node represents a group of name nodes; it holds the list of
group members, the synchronous method, and the number of nodes to wait
for.
The members are linked in a singly linked list, in the order in which
they are defined in s_s_names.
E.g., with setting #2 above, the member list would be:

"main".member -> "node1".next -> "node2".next -> "node3".next -> NULL

The top-level node is always the "main" group node. I.e., in this
version of the patch, only 1 group (the "main" group) is created, and it
contains only name nodes (no group nodes).
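
The member list described above can be sketched in C as follows. This is
only an illustration under simplifying assumptions: the struct is a cut-
down version without sync_method and the function pointers, and
SyncNodeType, group_init and group_member_position are hypothetical
helpers that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag for the two node kinds. */
typedef enum { SYNC_NAME_NODE, SYNC_GROUP_NODE } SyncNodeType;

typedef struct SyncGroupNode SyncGroupNode;
struct SyncGroupNode
{
    SyncNodeType   type;
    const char    *name;
    SyncGroupNode *next;        /* next node in the same group */
    int            wait_num;    /* group nodes only */
    SyncGroupNode *member;      /* group nodes only: first member */
};

/*
 * Build the single "main" group over caller-provided storage, linking
 * the name nodes in definition order.
 */
void
group_init(SyncGroupNode *group, int wait_num,
           SyncGroupNode *nodes, const char *const *names, size_t n)
{
    assert(group != NULL && wait_num > 0);

    group->type = SYNC_GROUP_NODE;
    group->name = "main";
    group->next = NULL;
    group->wait_num = wait_num;
    group->member = (n > 0) ? &nodes[0] : NULL;

    for (size_t i = 0; i < n; i++)
    {
        nodes[i].type = SYNC_NAME_NODE;
        nodes[i].name = names[i];
        nodes[i].next = (i + 1 < n) ? &nodes[i + 1] : NULL;
        nodes[i].wait_num = 0;
        nodes[i].member = NULL;
    }
}

/* 1-based position of a name in the member list, or 0 if absent. */
int
group_member_position(const SyncGroupNode *group, const char *name)
{
    int pos = 1;

    for (const SyncGroupNode *node = group->member; node != NULL;
         node = node->next, pos++)
        if (strcmp(node->name, name) == 0)
            return pos;
    return 0;
}
```

Because the list keeps definition order, the position returned by the
lookup doubles as the priority value in this sketch.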
A group node has two function pointers:

* SyncRepGetSyncedLsnsFn
This function decides the group's write/flush LSNs at that moment.
For example, with the priority method, the lowest LSNs among the
standbys that are considered synchronous should be selected.
If there are not enough synchronous standbys to decide the LSNs, this
function returns false.

* SyncRepGetSyncStandbysFn
This function obtains an array of the walsnd positions of the group's
standby members that are considered synchronous.
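
The two callbacks can be illustrated for the priority method with the
following C sketch. It is a simplification, not the code in the patch:
it works on a plain array instead of walsender slots, StandbyState and
both function names are hypothetical, and XLogRecPtr is declared locally
as a 64-bit integer stand-in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SYNC_STANDBYS 8     /* arbitrary limit for this sketch */

typedef uint64_t XLogRecPtr;    /* stand-in for the server's WAL position type */

typedef struct
{
    const char *name;
    bool        connected;
    XLogRecPtr  write;          /* last WAL position written by the standby */
    XLogRecPtr  flush;          /* last WAL position flushed by the standby */
} StandbyState;

/*
 * Fill list[] with the indexes of the first wait_num connected standbys
 * (standbys[] is in definition order) and return how many were found.
 */
int
get_sync_standbys(const StandbyState *standbys, size_t n, int wait_num, int *list)
{
    int found = 0;

    for (size_t i = 0; i < n && found < wait_num; i++)
        if (standbys[i].connected)
            list[found++] = (int) i;
    return found;
}

/*
 * Priority method: report the lowest write/flush positions among the
 * synchronous standbys, or return false if there are too few of them.
 */
bool
get_synced_lsns(const StandbyState *standbys, size_t n, int wait_num,
                XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
{
    int list[MAX_SYNC_STANDBYS];
    int found;

    assert(wait_num > 0 && wait_num <= MAX_SYNC_STANDBYS);

    found = get_sync_standbys(standbys, n, wait_num, list);
    if (found < wait_num)
        return false;

    *write_pos = standbys[list[0]].write;
    *flush_pos = standbys[list[0]].flush;
    for (int i = 1; i < found; i++)
    {
        const StandbyState *s = &standbys[list[i]];

        if (s->write < *write_pos)
            *write_pos = s->write;
        if (s->flush < *flush_pos)
            *flush_pos = s->flush;
    }
    return true;
}
```

Taking the minimum means a commit is only reported as safe up to the
point that every chosen synchronous standby has reached.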

This implementation might not be good for some reason, so please give me
feedback. I will create a new commitfest entry for this patch in CF5.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v8.patch	text/x-patch	27.1 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-09 17:57:54
Message-ID:	CAHGQGwHR1MNpAgRMh9T0oy0OnydkGaymcNgVOE-1VLZ8Z9twjA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Feb 10, 2016 at 1:36 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Feb 9, 2016 at 10:32 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Wed, Feb 3, 2016 at 7:33 AM, Robert Haas wrote:
>>> Also, to be frank, I think we ought to be putting more effort into
>>> another patch in this same area, specifically Thomas Munro's causal
>>> reads patch. I think a lot of people today are trying to use
>>> synchronous replication to build load-balancing clusters and avoid the
>>> problem where you write some data and then read back stale data from a
>>> standby server. Of course, our current synchronous replication
>>> facilities make no such guarantees - his patch does, and I think
>>> that's pretty important. I'm not saying that we shouldn't do this
>>> too, of course.
>>
>> Yeah, sure. Each one of those patches is trying to solve a different
>> problem where Postgres is deficient, here we'd like to be sure a
>> commit WAL record is correctly flushed on multiple standbys, while the
>> patch of Thomas is trying to ensure that there is no need to scan for
>> the replay position of a standby using some GUC parameters and a
>> validation/sanity layer in syncrep.c to do that. Surely the patch of
>> this thread has got more attention than Thomas', and both of them have
>> merits and try to address real problems. FWIW, the patch of Thomas is
>> a topic that I find rather interesting, and I am planning to look at
>> it as well, perhaps for next CF or even before that. We'll see how
>> other things move on.
>
> Attached first version dedicated language patch (document patch is not yet.)

Thanks for the patch! Will review it.

I think that it's time to write the documentation patch.

Though I've not read the patch yet, I found that your patch
changed s_s_names so that it rejects non-alphabetic characters
such as *, according to my simple test. It should accept any
application_name that we can use.
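
For reference, the matching behaviour I would expect is sketched below in
C. It assumes that names are compared case-insensitively and that the
special entry * matches any application_name; names_equal_ci and
standby_priority are hypothetical helpers written for this mail, not
functions from the server or from the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-insensitive comparison of two NUL-terminated names. */
static bool
names_equal_ci(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
        a++;
        b++;
    }
    return *a == *b;            /* both must end at the same time */
}

/*
 * Return the 1-based priority of application_name in the configured
 * list, or 0 if it matches no entry. The entry "*" matches anything.
 */
int
standby_priority(const char *const *names, size_t n,
                 const char *application_name)
{
    assert(application_name != NULL);

    for (size_t i = 0; i < n; i++)
    {
        bool wildcard = (names[i][0] == '*' && names[i][1] == '\0');

        if (wildcard || names_equal_ci(names[i], application_name))
            return (int) i + 1;
    }
    return 0;
}
```

With the list 'node1, *', a standby called anything other than node1
would get priority 2 rather than being rejected.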

Regards,

--
Fujii Masao

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-10 00:18:22
Message-ID:	CAB7nPqQjj37iF16sDNFg93Fe1cSiDzxggoC4FYMANiqLw-1h1Q@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Feb 10, 2016 at 2:57 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Feb 10, 2016 at 1:36 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Attached first version dedicated language patch (document patch is not yet.)
>
> Thanks for the patch! Will review it.
>
> I think that it's time to write the documentation patch.
>
> Though I've not read the patch yet, I found that your patch
> changed s_s_names so that it rejects non-alphabet character
> like *, according to my simple test. It should accept any
> application_name which we can use.

Cool. Planning to look at it as well. Could you as well submit a
regression test based on the recovery infrastructure and submit it as
a separate patch? There is a version upthread of such a test but it
would be good to extract it properly.
--
Michael

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-10 00:39:22
Message-ID:	20160210.093922.252747962.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Tue, 9 Feb 2016 13:31:46 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqSJgDLLsVk_Et-O=NBfJNqx3GbHszCYGvuTLRxHaZV3xQ(at)mail(dot)gmail(dot)com>
> On Tue, Feb 9, 2016 at 1:16 PM, Kyotaro HORIGUCHI
> >> > Anyway that's not a small project, and perhaps I am over-complicating
> >> > the whole thing.
> >> >
> >> > Thoughts?
> >>
> >> I agree that we would need something like such new view in the future,
> >> however it seems too late to work on that for 9.6 unfortunately.
> >> There is only one CommitFest left. Let's focus on very simple case, i.e.,
> >> 1-level priority list, now, then we can extend it to cover other cases.
> >>
> >> If we can commit the simple version too early and there is enough
> >> time before the date of feature freeze, of course I'm happy to review
> >> the extended version like you proposed, for 9.6.
> >
> > I agree to Fujii-san. There would be many of convenient gadgets
> > around this and they are completely welcome, but having
> > fundamental functionality in 9.6 seems to be far benetifical for
> > most of us.
>
> Hm. Rushing features in because we need them now is not really
> community-like. I'd rather not have us taking decisions like that
> knowing that we may pay a certain price in the long-term, while it
> pays in the short term, aka the 9.6 release. However, having a base in
> place for the mini-language would give enough room for future
> improvements, so I am fine with having only 1-level of nesting, with
> {} and [] supported. This can as well be simply represented within
> pg_stat_replication because we'd have basically only one group of
> nodes for now (if I got the idea correctly), the and status of each
> entry in pg_stat_replication would just need to reflect either
> potential or sync, which is something that now users are used to.

I agree that we should be more prudent about 'stiff',
hard-to-modify-later things. But once we decide to use the []{}
format from the beginning for this feature (as I believe we have),
it is surely extensible enough, and one level of replication sets
is sufficient to cover many new cases and keeps the implementation
simple. Internal structure can evolve, in contrast to the user
interface. I don't think this way of development is
un-community-like in cases like this.

Anyway thank you very much for understanding.

> So, if I got the vibe correctly, we would basically just allow that in
> a first shot:
> N{node_list}, to define a priority group
> N[node_list], to define a quorum group
> There can be only one group, and elements in a node list cannot be a
> group. No need of group names either.
> --

That's quite reasonable for the first release of this feature. We
can/should consider the extensibility of the implementation of this
feature during review.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-10 02:25:49
Message-ID:	CAD21AoCHytB88ZdC0899J7PLNTKWTg0gczC2M7dqLmK71vdY0w@mail.gmail.com

On Wed, Feb 10, 2016 at 9:18 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Feb 10, 2016 at 2:57 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Feb 10, 2016 at 1:36 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> Attached first version dedicated language patch (document patch is not yet.)
>>
>> Thanks for the patch! Will review it.
>>
>> I think that it's time to write the documentation patch.
>>
>> Though I've not read the patch yet, I found that your patch
>> changed s_s_names so that it rejects non-alphabet character
>> like *, according to my simple test. It should accept any
>> application_name which we can use.
>
> Cool. Planning to look at it as well. Could you as well submit a
> regression test based on the recovery infrastructure and submit it as
> a separate patch? There is a version upthread of such a test but it
> would be good to extract it properly.

Yes, I will implement the regression test patch and the documentation patch as well.

Attached is the latest version of the patch, which supports s_s_names = '*'.
Slightly unlike the current behaviour, s_s_names can contain only one '*'
character, and only as the sole element.
For example, the following settings result in a syntax error.

s_s_names = '*, node1,node2'
s_s_names = '2[node1, *, node2]'

When the '*' character is used as an s_s_names element, s_s_names must be set
as follows.

s_s_names = '*'
s_s_names = '2[*]'

BTW, we've discussed the mini-language syntax.
IIRC, the syntax uses [] and () like this:
'N[node1, node2, ...]' to define priority standbys.
'N(node1, node2, ...)' to define quorum standbys.
The current patch behaves that way.
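
To make the rules above concrete, here is a minimal hand-written sketch of a parser for this syntax. It is not the scanner or grammar from the attached patch; SyncSpec, parse_sync_spec and the size limits are hypothetical names chosen for illustration, and the only rules it encodes are the ones stated in this mail (optional N[...] or N(...) wrapper, comma-separated names, '*' allowed only on its own).

```c
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES 16
#define MAX_NAME_LEN 64

/* Hypothetical parse result; not a structure from the patch. */
typedef struct SyncSpec
{
    int  wait_num;   /* N; defaults to 1 for a bare list */
    char kind;       /* '[' priority, '(' quorum, 0 for a bare list */
    int  nnames;
    char names[MAX_NAMES][MAX_NAME_LEN];
} SyncSpec;

/* Returns true if s is accepted, false on any syntax error. */
bool
parse_sync_spec(const char *s, SyncSpec *out)
{
    char close = 0;
    bool saw_star = false;

    memset(out, 0, sizeof(*out));
    out->wait_num = 1;

    while (isspace((unsigned char) *s))
        s++;

    /* Optional "N[" or "N(" prefix. */
    if (isdigit((unsigned char) *s))
    {
        char *end;
        long  n = strtol(s, &end, 10);

        while (isspace((unsigned char) *end))
            end++;
        if (n < 1 || n > INT_MAX || (*end != '[' && *end != '('))
            return false;
        out->wait_num = (int) n;
        out->kind = *end;
        close = (*end == '[') ? ']' : ')';
        s = end + 1;
    }

    for (;;)
    {
        size_t len;

        while (isspace((unsigned char) *s))
            s++;

        /* A name runs up to the next separator, bracket, blank or end. */
        len = strcspn(s, ",[]() \t");
        if (len == 0 || len >= MAX_NAME_LEN || out->nnames == MAX_NAMES)
            return false;
        memcpy(out->names[out->nnames], s, len);
        out->names[out->nnames][len] = '\0';
        if (strcmp(out->names[out->nnames], "*") == 0)
            saw_star = true;
        out->nnames++;
        s += len;

        while (isspace((unsigned char) *s))
            s++;
        if (*s != ',')
            break;
        s++;
    }

    /* '*' is accepted only as the sole element. */
    if (saw_star && out->nnames != 1)
        return false;

    if (close != 0)
    {
        if (*s != close)
            return false;
        s++;
        while (isspace((unsigned char) *s))
            s++;
    }
    return *s == '\0';
}
```

With this sketch, '*', '2[*]' and '2(node1, node2)' are accepted, while '*, node1,node2' and '2[node1, *, node2]' are rejected. A bare list whose first name starts with a digit is also rejected, which is a limitation of the sketch rather than a stated rule.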

Which type of brackets should be used to make this syntax clearer?
Or should other characters be used, such as <> or //?

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v9.patch	binary/octet-stream	28.8 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-10 03:20:48
Message-ID:	20160210.122048.202439289.horiguchi.kyotaro@lab.ntt.co.jp

Hello,

At Wed, 10 Feb 2016 02:57:54 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwHR1MNpAgRMh9T0oy0OnydkGaymcNgVOE-1VLZ8Z9twjA(at)mail(dot)gmail(dot)com>
> On Wed, Feb 10, 2016 at 1:36 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > On Tue, Feb 9, 2016 at 10:32 PM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> On Wed, Feb 3, 2016 at 7:33 AM, Robert Haas wrote:
> >>> Also, to be frank, I think we ought to be putting more effort into
> >>> another patch in this same area, specifically Thomas Munro's causal
> >>> reads patch. I think a lot of people today are trying to use
> >>> synchronous replication to build load-balancing clusters and avoid the
> >>> problem where you write some data and then read back stale data from a
> >>> standby server. Of course, our current synchronous replication
> >>> facilities make no such guarantees - his patch does, and I think
> >>> that's pretty important. I'm not saying that we shouldn't do this
> >>> too, of course.
> >>
> >> Yeah, sure. Each one of those patches is trying to solve a different
> >> problem where Postgres is deficient, here we'd like to be sure a
> >> commit WAL record is correctly flushed on multiple standbys, while the
> >> patch of Thomas is trying to ensure that there is no need to scan for
> >> the replay position of a standby using some GUC parameters and a
> >> validation/sanity layer in syncrep.c to do that. Surely the patch of
> >> this thread has got more attention than Thomas', and both of them have
> >> merits and try to address real problems. FWIW, the patch of Thomas is
> >> a topic that I find rather interesting, and I am planning to look at
> >> it as well, perhaps for next CF or even before that. We'll see how
> >> other things move on.
> >
> > Attached first version dedicated language patch (document patch is not yet.)
>
> Thanks for the patch! Will review it.
>
> I think that it's time to write the documentation patch.
>
> Though I've not read the patch yet, I found that your patch
> changed s_s_names so that it rejects non-alphabet character
> like *, according to my simple test. It should accept any
> application_name which we can use.

Thanks for the quick response. At a glance, I'd like to offer
some random suggestions, mainly on writing conventions.

===
Running postgresql with s_s_names = '*' fails, as Fujii-san
said, and it yields the following message.

| $ postgres
| FATAL: syntax error: unexpected character "*"

Mmm.. It would be tough to find out what has happened..

===

check_synchronous_standby_names frees parsed SyncRepStandbyNames
immediately but no reason is explained there. The following
comment looks to be saying something related to this but it
doesn't explain the reason to free.

+ /*
+ * Any additional validation of standby names should go here.
+ *
+ * Don't attempt to set WALSender priority because this is executed by
+ * postmaster at startup, not WALSender, so the application_name is not
+ * yet correctly set.
+ */

In addition to that, I'd like to see a description like
'syncgroup_yyparse sets the global SyncRepStandbyNames as a side
effect' around it.

===
malloc/free are used in create_name_node and other functions to
be used in scanner, but syncgroup_gram.y is said to use
palloc/pfree. Maybe they should use the same memory
allocation/freeing functions.

===
The variable name SyncRepStandbyNames holds the list of
SyncGroupNode*. This is somewhat confusing. How about
SyncRepStandbys?

===
+static void
+SyncRepClearStandbyGroupList(SyncGroupNode *group)
+{
+ SyncGroupNode *n = group->member;

The name 'n' is a bit confusing. I believe that one-letter
variable names should be used only where they follow an implicit
(and ancient?) convention, or in pretty short-lived and obvious
cases. 'name' or 'group_name' might be better instead. There are
similar usages of 'n' in other places.

===
+ * Find active walsender position of WalSnd by name. Returns index of walsnds
+ * array if found, otherwise return -1.

I didn't get what 'walsender position' means in this
comment. And as discussed upthread, there can be multiple
walsenders with the same name. So it might read like this.

> * Finds the first active synchronous walsender with given name
> * in WalSndCtl->walsnds and returns the index of that. Returns
> * -1 if not found.

===
+ * Get both synced LSNS: write and flush, using its group function and check
+ * whether each LSN has advanced to, or not.

This is a question for all. Which should we use: synced, synched or
synchronized? Maybe we should use non-abbreviated spellings
unless the description becomes too long to read easily.

> * Return true if we have enough synchronized standbys and the 'safe'
> * written and flushed LSNs, which are LSNs assured in all standbys
> * considered should be synchronized.

# Please rewrite me.

===
+SyncRepSyncedLsnAdvancedTo(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
+{
+ XLogRecPtr cur_write_pos;
+ XLogRecPtr cur_flush_pos;
+ bool ret;

The names cur_*_pos are a bit confusing. They hold the LSNs that
all of the standbys chosen as synchronous ones have reached. So
how about safe_*_pos? And 'ret' is not the return value of this
function, so it could have a more specific name, such as...
satisfied? or something else..

===
+SyncRepSyncedLsnAdvancedTo(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
...
+ /* Check whether each LSN has advanced to */
+ if (ret)
+ {
...
+ return true;
+ }
+
+ return false;

This might be a matter of taste, but it would be simpler if written
with the reversed condition.

===
+ SyncRepSyncedLsnAdvancedTo(XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
...
+ ret = SyncRepStandbyNames->SyncRepGetSyncedLsnsFn(SyncRepStandbyNames,
+ &cur_write_pos,
+ &cur_flush_pos);
...
+ if (MyWalSnd->write >= cur_write_pos)

I suppose SyncRepGetSyncedLsnsFn, or SyncRepGetSyncedLsnsPriority
can return InvalidXLogRecPtr as cur_*_pos even when it returns
true. And, I suppose comparison of LSN values with
InvalidXLogRecPtr is not well-defined. Anyway the condition goes
wrong when cur_write_pos = InvalidXLogRecPtr (but ret = true).
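
As a rough illustration of the last two remarks (reversed condition, and treating InvalidXLogRecPtr explicitly), a minimal standalone sketch might look like the following. It is not code from the patch: lsn_advanced_to and its parameters are hypothetical, the typedef and macros are local stand-ins mirroring PostgreSQL's definitions (InvalidXLogRecPtr is 0), and it collapses the write and flush checks into one result for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define XLogRecPtrIsInvalid(r) ((r) == InvalidXLogRecPtr)

/*
 * Return true only if enough synchronous standbys were found (satisfied),
 * both "safe" LSNs are valid, and this walsender's own positions have
 * reached them.
 */
bool
lsn_advanced_to(bool satisfied,
                XLogRecPtr safe_write, XLogRecPtr safe_flush,
                XLogRecPtr my_write, XLogRecPtr my_flush)
{
    /* Reversed condition: bail out early instead of nesting the body. */
    if (!satisfied)
        return false;

    /*
     * Any position compares >= 0, so an invalid safe LSN must be rejected
     * explicitly rather than left to the comparison below.
     */
    if (XLogRecPtrIsInvalid(safe_write) || XLogRecPtrIsInvalid(safe_flush))
        return false;

    return my_write >= safe_write && my_flush >= safe_flush;
}
```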

===
+ * Obtain a array containing positions of standbys of specified group
+ * currently considered as synchronous up to wait_num of its group.
+ * Caller is respnsible for allocating the data obtained.

# Anyone please reedit my rewriting below.. Perhaps my writing is
# quite unreadable..

> * Return the positions of the first group->wait_num
> * synchronized standbys in group->member list into
> * sync_list. sync_list is assumed to have enough space for
> * at least group->wait_num elements.

===
+bool
+SyncRepGetSyncedLsnsPriority(SyncGroupNode *group, XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
+{
...
+ for(n = group->member; n != NULL; n = n->next)

group->member holds two or more items, so the name would be
better as group->members, or member_list.

===
+ /* We already got enough synchronous standbys, return */
+ if (num == group->wait_num)

As a convention for safety, this kind of comparison should use
inequality operators.

> if (num >= group->wait_num)

===
At a glance, SyncRepGetSyncedLsnsPriority and
SyncRepGetSyncStandbysPriority do almost the same thing and both
run loops over the group members. Couldn't they be done in a single pass?
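
To sketch what such a single pass could look like: the function below walks the member names in priority order, records the first wait_num active standbys, and tracks the minimum write and flush LSNs among them in the same loop. This is only an illustration under simplifying assumptions, not the patch's code; MockStandby, get_synced_lsns_priority and the flat arrays are hypothetical stand-ins for the walsender array and the SyncGroupNode list, and locking is ignored.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t XLogRecPtr;    /* local stand-in for the PostgreSQL type */

/* Hypothetical, simplified view of one connected standby. */
typedef struct MockStandby
{
    const char *name;           /* application_name */
    bool        active;
    XLogRecPtr  write;
    XLogRecPtr  flush;
} MockStandby;

/*
 * members[] is in priority order and wait_num must be >= 1.  On success,
 * sync_list[0..wait_num-1] holds indexes into standbys[] and the two
 * output LSNs hold the minimum positions among those standbys.
 * sync_list must have room for at least wait_num elements.
 */
bool
get_synced_lsns_priority(const char **members, int nmembers, int wait_num,
                         const MockStandby *standbys, int nstandbys,
                         int *sync_list,
                         XLogRecPtr *write_pos, XLogRecPtr *flush_pos)
{
    int        num = 0;
    XLogRecPtr min_write = UINT64_MAX;
    XLogRecPtr min_flush = UINT64_MAX;

    /* Stop as soon as enough standbys are found (">=" style guard). */
    for (int m = 0; m < nmembers && num < wait_num; m++)
    {
        for (int i = 0; i < nstandbys; i++)
        {
            if (!standbys[i].active ||
                strcmp(standbys[i].name, members[m]) != 0)
                continue;

            sync_list[num++] = i;
            if (standbys[i].write < min_write)
                min_write = standbys[i].write;
            if (standbys[i].flush < min_flush)
                min_flush = standbys[i].flush;
            break;              /* only the first standby with this name */
        }
    }

    if (num < wait_num)
        return false;

    *write_pos = min_write;
    *flush_pos = min_flush;
    return true;
}
```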

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	michael(dot)paquier(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-10 03:40:51
Message-ID:	20160210.124051.88302213.horiguchi.kyotaro@lab.ntt.co.jp

Hello,

At Wed, 10 Feb 2016 11:25:49 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCHytB88ZdC0899J7PLNTKWTg0gczC2M7dqLmK71vdY0w(at)mail(dot)gmail(dot)com>
> On Wed, Feb 10, 2016 at 9:18 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > On Wed, Feb 10, 2016 at 2:57 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >> On Wed, Feb 10, 2016 at 1:36 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >>> Attached first version dedicated language patch (document patch is not yet.)
> >>
> >> Thanks for the patch! Will review it.
> >>
> >> I think that it's time to write the documentation patch.
> >>
> >> Though I've not read the patch yet, I found that your patch
> >> changed s_s_names so that it rejects non-alphabet character
> >> like *, according to my simple test. It should accept any
> >> application_name which we can use.
> >
> > Cool. Planning to look at it as well. Could you as well submit a
> > regression test based on the recovery infrastructure and submit it as
> > a separate patch? There is a version upthread of such a test but it
> > would be good to extract it properly.
>
> Yes, I will implement regression test patch and documentation patch as well.
>
> Attached latest version patch supporting s_s_names = '*'.
> Unlike currently behaviour a bit, s_s_names can have only one '*' character.
> e.g, The following setting will get syntax error.
>
> s_s_names = '*, node1,node2'
> s_s_names = `2[node1, *, node2]`

We could use the setting s_s_names = 'node1, node2, *' as an
extended representation of the old s_s_names. It tests node1 and
node2 first and tries any name if they fail. Similarly, '2[node1,
node2, *]' is also meaningful.
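
A minimal sketch of that matching rule, assuming priority is simply the 1-based position of the first list entry that matches and that '*' matches any name. standby_priority is a hypothetical helper written for this mail, not a function from the patch, and it compares names case-sensitively for simplicity.

```c
#include <string.h>

/*
 * Return the 1-based priority of application_name within list[], where a
 * "*" entry matches any name, or 0 if the standby is not a candidate.
 */
int
standby_priority(const char **list, int nlist, const char *application_name)
{
    for (int i = 0; i < nlist; i++)
    {
        if (strcmp(list[i], "*") == 0 ||
            strcmp(list[i], application_name) == 0)
            return i + 1;
    }
    return 0;
}
```

With the list 'node1, node2, *', node1 gets priority 1, node2 gets 2, and any other standby gets 3, so named nodes are preferred and the wildcard acts as a fallback.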

> when we use '*' character as s_s_names element, we must set s_s_names
> like follows.
>
> s_s_names = '*'
> s_s_names = '2[*]'
>
> BTW, we've discussed about mini language syntax.
> IIRC, the syntax uses [] and () like,
> 'N[node1, node2, ...]', to define priority standbys.
> 'N(node1, node2, ...)', to define quorum standbys.
> And current patch behaves so.
>
> Which type of parentheses should be used for this syntax to be more clarity?
> Or other character should be used such as <>, // ?

I believed that [] and {} were each chosen for no particular
reason. I think a symmetrical pair of characters is preferable for
readability. The candidate pairs among ASCII characters are:

(), {}, [], <>

{} might be a bit difficult to distinguish from [] on unclear
consoles :p

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-10 06:13:53
Message-ID:	CAB7nPqRmqXHTqiRk_-ru2Ox8VimjquNRYC_d+HpMu2t+UA4yEg@mail.gmail.com

On Wed, Feb 10, 2016 at 11:25 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Yes, I will implement regression test patch and documentation patch as well.

Cool, now that we have a clear picture of where we want to move, that would
be an excellent thing to have. Having the docs in place is clearly
mandatory.

> Attached latest version patch supporting s_s_names = '*'.
> Unlike currently behaviour a bit, s_s_names can have only one '*' character.
> e.g, The following setting will get syntax error.
>
> s_s_names = '*, node1,node2'
> s_s_names = '2[node1, *, node2]'
>
> when we use '*' character as s_s_names element, we must set s_s_names
> like follows.
>
> s_s_names = '*'
> s_s_names = '2[*]'
>
> BTW, we've discussed about mini language syntax.
> IIRC, the syntax uses [] and () like,
> 'N[node1, node2, ...]', to define priority standbys.
> 'N(node1, node2, ...)', to define quorum standbys.
> And current patch behaves so.
>
> Which type of parentheses should be used for this syntax to be more clarity?
> Or other character should be used such as <>, // ?

I am personally fine with () and [] as you mention; we could even consider
{}. Each of them has a different meaning mathematically.

I have not started a detailed review yet (waiting for the docs), but the
patch looks brittle. I have been able to crash the server just by querying
pg_stat_replication:
* thread #1: tid = 0x0000, 0x0000000105eb36c2 postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at walsender.c:2783, stop reason = signal SIGSTOP
  * frame #0: 0x0000000105eb36c2 postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at walsender.c:2783
    frame #1: 0x0000000105d4277d postgres`ExecMakeTableFunctionResult(funcexpr=0x00007fea128f3838, econtext=0x00007fea128f1b58, argContext=0x00007fea128c8ea8, expectedDesc=0x00007fea128f4710, randomAccess='\0') + 1005 at execQual.c:2211
    frame #2: 0x0000000105d70c24 postgres`FunctionNext(node=0x00007fea128f2f78) + 180 at nodeFunctionscan.c:95
* thread #1: tid = 0x0000, 0x0000000105eb36c2 postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at walsender.c:2783, stop reason = signal SIGSTOP
    frame #0: 0x0000000105eb36c2 postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at walsender.c:2783
   2780     /*
   2781      * Get the currently active synchronous standby.
   2782      */
-> 2783     sync_standbys = (int *) palloc(sizeof(int) * SyncRepStandbyNames->wait_num);
   2784     LWLockAcquire(SyncRepLock, LW_SHARED);
   2785     num_sync = SyncRepGetSyncStandbysPriority(SyncRepStandbyNames, sync_standbys);
   2786     LWLockRelease(SyncRepLock);
(lldb) p SyncRepStandbyNames
(SyncGroupNode *) $0 = 0x0000000000000000
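
The debugger output shows SyncRepStandbyNames is NULL at the point where wait_num is read. One possible defensive shape, shown as a standalone sketch only: check the pointer before sizing the array. MockSyncGroup and alloc_sync_standbys are hypothetical names, malloc stands in for palloc, and whether a NULL group should mean "no synchronous standbys" is an assumption, not something the patch states.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parsed group; only the field used here. */
typedef struct MockSyncGroup
{
    int wait_num;
} MockSyncGroup;

/*
 * Allocate room for group->wait_num standby indexes.  Returns NULL and
 * sets *capacity to 0 when no group is configured (or on allocation
 * failure), so the caller can skip the sync-standby lookup entirely.
 */
int *
alloc_sync_standbys(const MockSyncGroup *group, int *capacity)
{
    int *list;

    *capacity = 0;

    /* Guard: never dereference an unset configuration. */
    if (group == NULL || group->wait_num <= 0)
        return NULL;

    list = malloc(sizeof(int) * (size_t) group->wait_num);
    if (list != NULL)
        *capacity = group->wait_num;
    return list;
}
```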

+sync_node_group:
+ sync_list { $$ = create_group_node(1, $1); }
+ | sync_element_ast { $$ = create_group_node(1, $1);}
+ | INT '[' sync_list ']' { $$ = create_group_node($1, $3);}
+ | INT '[' sync_element_ast ']' { $$ = create_group_node($1, $3); }
We may want to be careful with the use of '[' in application_name. I am not
much thrilled with forbidding the use of []() in application_name, so we
may want to recommend that users escape these characters with a backslash
in s_s_names when a group is defined.

+void
+yyerror(const char *message)
+{
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg_internal("%s", message)));
+}
whitespace errors here.
--
Michael

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-10 06:22:44
Message-ID:	CAB7nPqRk4ZjoQfs4rmF6Di1zp=b4eA=hk0L4GFzUj47GwhgM7g@mail.gmail.com

On Wed, Feb 10, 2016 at 3:13 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Feb 10, 2016 at 11:25 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> I am personally fine with () and [] as you mention, we could even consider
> {}, each one of them has a different meaning mathematically..
>
> I am not entered into a detailed review yet (waiting for the docs), but the
> patch looks brittle. I have been able to crash the server just by querying
> pg_stat_replication:
> * thread #1: tid = 0x0000, 0x0000000105eb36c2
> postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> walsender.c:2783, stop reason = signal SIGSTOP
> * frame #0: 0x0000000105eb36c2
> postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> walsender.c:2783
> frame #1: 0x0000000105d4277d
> postgres`ExecMakeTableFunctionResult(funcexpr=0x00007fea128f3838,
> econtext=0x00007fea128f1b58, argContext=0x00007fea128c8ea8,
> expectedDesc=0x00007fea128f4710, randomAccess='\0') + 1005 at
> execQual.c:2211
> frame #2: 0x0000000105d70c24
> postgres`FunctionNext(node=0x00007fea128f2f78) + 180 at
> nodeFunctionscan.c:95
> * thread #1: tid = 0x0000, 0x0000000105eb36c2
> postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> walsender.c:2783, stop reason = signal SIGSTOP
> frame #0: 0x0000000105eb36c2
> postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> walsender.c:2783
> 2780 /*
> 2781 * Get the currently active synchronous standby.
> 2782 */
> -> 2783 sync_standbys = (int *) palloc(sizeof(int) *
> SyncRepStandbyNames->wait_num);
> 2784 LWLockAcquire(SyncRepLock, LW_SHARED);
> 2785 num_sync =
> SyncRepGetSyncStandbysPriority(SyncRepStandbyNames, sync_standbys);
> 2786 LWLockRelease(SyncRepLock);
> (lldb) p SyncRepStandbyNames
> (SyncGroupNode *) $0 = 0x0000000000000000
>
> +sync_node_group:
> + sync_list { $$ = create_group_node(1, $1);
> }
> + | sync_element_ast { $$ = create_group_node(1,
> $1);}
> + | INT '[' sync_list ']' { $$ = create_group_node($1,
> $3);}
> + | INT '[' sync_element_ast ']' { $$ = create_group_node($1,
> $3); }
> We may want to be careful with the use of '[' in application_name. I am not
> much thrilled with forbidding the use of []() in application_name, so we may
> want to recommend user to use a backslash when using s_s_names when a group
> is defined.
>
> +void
> +yyerror(const char *message)
> +{
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg_internal("%s", message)));
> +}
> whitespace errors here.

+#define MAX_WALSENDER_NAME 8192
+
typedef enum WalSndState
{
WALSNDSTATE_STARTUP = 0,
@@ -62,6 +64,11 @@ typedef struct WalSnd
* SyncRepLock.
*/
int sync_standby_priority;
+
+ /*
+ * Corresponding standby's application_name.
+ */
+ const char name[MAX_WALSENDER_NAME];
} WalSnd;
NAMEDATALEN instead?
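
For illustration, a standalone sketch of what a NAMEDATALEN-sized field could look like, assuming application_name is already limited to fewer than NAMEDATALEN characters (64 in a default build). MockSenderSlot and set_walsender_name are hypothetical names, not from the patch. Note that the field is declared without const here, since a const array member could not be filled in after the struct is created.

```c
#include <stdio.h>

#define NAMEDATALEN 64          /* PostgreSQL's default value */

/* Hypothetical slot holding only the field under discussion. */
typedef struct MockSenderSlot
{
    char name[NAMEDATALEN];     /* corresponding standby's application_name */
} MockSenderSlot;

/* Copy the name, truncating to NAMEDATALEN - 1 bytes plus the terminator. */
void
set_walsender_name(MockSenderSlot *slot, const char *application_name)
{
    snprintf(slot->name, sizeof(slot->name), "%s", application_name);
}
```
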
--
Michael

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-10 08:34:50
Message-ID:	20160210.173450.119211447.horiguchi.kyotaro@lab.ntt.co.jp

Hello,

At Wed, 10 Feb 2016 15:22:44 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqRk4ZjoQfs4rmF6Di1zp=b4eA=hk0L4GFzUj47GwhgM7g(at)mail(dot)gmail(dot)com>
> On Wed, Feb 10, 2016 at 3:13 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
> > On Wed, Feb 10, 2016 at 11:25 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> > wrote:
> > I am personally fine with () and [] as you mention, we could even consider
> > {}, each one of them has a different meaning mathematically..
> >
> > I am not entered into a detailed review yet (waiting for the docs), but the
> > patch looks brittle. I have been able to crash the server just by querying
> > pg_stat_replication:
> > * thread #1: tid = 0x0000, 0x0000000105eb36c2
> > postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> > walsender.c:2783, stop reason = signal SIGSTOP
> > * frame #0: 0x0000000105eb36c2
> > postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> > walsender.c:2783
> > frame #1: 0x0000000105d4277d
> > postgres`ExecMakeTableFunctionResult(funcexpr=0x00007fea128f3838,
> > econtext=0x00007fea128f1b58, argContext=0x00007fea128c8ea8,
> > expectedDesc=0x00007fea128f4710, randomAccess='\0') + 1005 at
> > execQual.c:2211
> > frame #2: 0x0000000105d70c24
> > postgres`FunctionNext(node=0x00007fea128f2f78) + 180 at
> > nodeFunctionscan.c:95
> > * thread #1: tid = 0x0000, 0x0000000105eb36c2
> > postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> > walsender.c:2783, stop reason = signal SIGSTOP
> > frame #0: 0x0000000105eb36c2
> > postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> > walsender.c:2783
> > 2780 /*
> > 2781 * Get the currently active synchronous standby.
> > 2782 */
> > -> 2783 sync_standbys = (int *) palloc(sizeof(int) *
> > SyncRepStandbyNames->wait_num);
> > 2784 LWLockAcquire(SyncRepLock, LW_SHARED);
> > 2785 num_sync =
> > SyncRepGetSyncStandbysPriority(SyncRepStandbyNames, sync_standbys);
> > 2786 LWLockRelease(SyncRepLock);
> > (lldb) p SyncRepStandbyNames
> > (SyncGroupNode *) $0 = 0x0000000000000000
> >
> > +sync_node_group:
> > + sync_list { $$ = create_group_node(1, $1);
> > }
> > + | sync_element_ast { $$ = create_group_node(1,
> > $1);}
> > + | INT '[' sync_list ']' { $$ = create_group_node($1,
> > $3);}
> > + | INT '[' sync_element_ast ']' { $$ = create_group_node($1,
> > $3); }
> > We may want to be careful with the use of '[' in application_name. I am not
> > much thrilled with forbidding the use of []() in application_name, so we may
> > want to recommend user to use a backslash when using s_s_names when a group
> > is defined.

Mmmm. I found that application_name can contain
commas. Furthermore, there seems to be almost no restriction on
the characters in the name.

postgres=# set application_name='ho,ge';
postgres=# select application_name from pg_stat_activity;
application_name
------------------
ho,ge

check_application_name() allows all characters in the ASCII range
32 to 126. All other characters are replaced
with '?'.
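
The rule just described can be written as a few lines of standalone C. This is a sketch of the described behaviour only, with a hypothetical function name (clean_ascii), not a copy of the backend's implementation.

```c
/*
 * Replace every byte outside the printable ASCII range 32..126 with '?',
 * in place.  Commas, brackets and parentheses all fall inside the range,
 * so they pass through unchanged.
 */
void
clean_ascii(char *s)
{
    for (; *s != '\0'; s++)
    {
        unsigned char c = (unsigned char) *s;

        if (c < 32 || c > 126)
            *s = '?';
    }
}
```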

> > +void
> > +yyerror(const char *message)
> > +{
> > + ereport(ERROR,
> > + (errcode(ERRCODE_SYNTAX_ERROR),
> > + errmsg_internal("%s", message)));
> > +}
> > whitespace errors here.
>
> +#define MAX_WALSENDER_NAME 8192
> +
> typedef enum WalSndState
> {
> WALSNDSTATE_STARTUP = 0,
> @@ -62,6 +64,11 @@ typedef struct WalSnd
> * SyncRepLock.
> */
> int sync_standby_priority;
> +
> + /*
> + * Corresponding standby's application_name.
> + */
> + const char name[MAX_WALSENDER_NAME];
> } WalSnd;
> NAMEDATALEN instead?

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	sawada(dot)mshk(at)gmail(dot)com, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, thom(at)linux(dot)com, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, memissemerson(at)gmail(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, amit(dot)kapila16(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-10 09:36:43
Message-ID:	CAB7nPqTHmuuDdKWmoaY1ZAi-gRnT_HRdHGyiqpNfFFr15qc5uA@mail.gmail.com

On Wed, Feb 10, 2016 at 5:34 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hello,
>
> At Wed, 10 Feb 2016 15:22:44 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqRk4ZjoQfs4rmF6Di1zp=b4eA=hk0L4GFzUj47GwhgM7g(at)mail(dot)gmail(dot)com>
> > On Wed, Feb 10, 2016 at 3:13 PM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> > > On Wed, Feb 10, 2016 at 11:25 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> > > wrote:
> > > I am personally fine with () and [] as you mention, we could even consider
> > > {}, each one of them has a different meaning mathematically..
> > >
> > > I am not entered into a detailed review yet (waiting for the docs), but the
> > > patch looks brittle. I have been able to crash the server just by querying
> > > pg_stat_replication:
> > > * thread #1: tid = 0x0000, 0x0000000105eb36c2
> > > postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> > > walsender.c:2783, stop reason = signal SIGSTOP
> > > * frame #0: 0x0000000105eb36c2
> > > postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> > > walsender.c:2783
> > > frame #1: 0x0000000105d4277d
> > > postgres`ExecMakeTableFunctionResult(funcexpr=0x00007fea128f3838,
> > > econtext=0x00007fea128f1b58, argContext=0x00007fea128c8ea8,
> > > expectedDesc=0x00007fea128f4710, randomAccess='\0') + 1005 at
> > > execQual.c:2211
> > > frame #2: 0x0000000105d70c24
> > > postgres`FunctionNext(node=0x00007fea128f2f78) + 180 at
> > > nodeFunctionscan.c:95
> > > * thread #1: tid = 0x0000, 0x0000000105eb36c2
> > > postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> > > walsender.c:2783, stop reason = signal SIGSTOP
> > > frame #0: 0x0000000105eb36c2
> > > postgres`pg_stat_get_wal_senders(fcinfo=0x00007fff5a156290) + 498 at
> > > walsender.c:2783
> > > 2780 /*
> > > 2781 * Get the currently active synchronous standby.
> > > 2782 */
> > > -> 2783 sync_standbys = (int *) palloc(sizeof(int) *
> > > SyncRepStandbyNames->wait_num);
> > > 2784 LWLockAcquire(SyncRepLock, LW_SHARED);
> > > 2785 num_sync =
> > > SyncRepGetSyncStandbysPriority(SyncRepStandbyNames, sync_standbys);
> > > 2786 LWLockRelease(SyncRepLock);
> > > (lldb) p SyncRepStandbyNames
> > > (SyncGroupNode *) $0 = 0x0000000000000000
> > >
> > > +sync_node_group:
> > > + sync_list { $$ = create_group_node(1, $1);
> > > }
> > > + | sync_element_ast { $$ = create_group_node(1,
> > > $1);}
> > > + | INT '[' sync_list ']' { $$ = create_group_node($1,
> > > $3);}
> > > + | INT '[' sync_element_ast ']' { $$ = create_group_node($1,
> > > $3); }
> > > We may want to be careful with the use of '[' in application_name. I am not
> > > much thrilled with forbidding the use of []() in application_name, so we may
> > > want to recommend user to use a backslash when using s_s_names when a group
> > > is defined.
>
> Mmmm. I found that application_name can contain
> commas. Furthermore, there seems to be no limitation for
> character in the name.
>
> postgres=# set application_name='ho,ge';
> postgres=# select application_name from pg_stat_activity;
> application_name
> ------------------
> ho,ge
>
> check_application_name() allows all characters in the range
> between 32 to 126 in ascii. All other characters are replaced
> with '?'.

Actually I was thinking about that a couple of hours ago. If the
application_name of a node contains a comma, it cannot become a sync
replica, no? Wouldn't we need special handling in s_s_names, such as
'\,' to make a comma part of an application name? Or should we just ban
commas from the list of supported characters in the application name?
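
To make the first option concrete, here is a small sketch of splitting a list on unescaped commas, where a backslash makes the following character literal. split_escaped and the size limits are hypothetical, nothing like this exists in the patch as posted, and whitespace trimming is left out to keep it short.

```c
#include <stddef.h>

#define MAX_TOKENS 8
#define MAX_TOKEN_LEN 64

/*
 * Split s on unescaped commas into out[].  "\," yields a literal comma
 * and "\\" a literal backslash.  Returns the number of tokens, or -1 if
 * a token is too long or there are more than max_tokens tokens.
 */
int
split_escaped(const char *s, char out[][MAX_TOKEN_LEN], int max_tokens)
{
    int    ntok = 0;
    size_t len = 0;

    if (max_tokens < 1)
        return -1;

    for (;; s++)
    {
        if (*s == '\\' && s[1] != '\0')
        {
            s++;                        /* take the next char literally */
            if (len + 1 >= MAX_TOKEN_LEN)
                return -1;
            out[ntok][len++] = *s;
        }
        else if (*s == ',' || *s == '\0')
        {
            out[ntok][len] = '\0';      /* finish the current token */
            ntok++;
            len = 0;
            if (*s == '\0')
                return ntok;
            if (ntok == max_tokens)
                return -1;
        }
        else
        {
            if (len + 1 >= MAX_TOKEN_LEN)
                return -1;
            out[ntok][len++] = *s;
        }
    }
}
```

Given the input node1,ho\,ge,node2 this yields three names, the second being ho,ge.
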
--
Michael

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-11 17:56:09
Message-ID:	CA+TgmoYo-HJ6F6+jY9ov9kBYg7U+=LycxoPCcng8B2QHb9YPkQ@mail.gmail.com

On Fri, Feb 5, 2016 at 3:36 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> So, here are some thoughts to make that more user-friendly. I think
> that the critical issue here is to properly flatten the meta data in
> the custom language and represent it properly in a new catalog,
> without messing up too much with the existing pg_stat_replication that
> people are now used to for 5 releases since 9.0.

Putting the metadata in a catalog doesn't seem great because that only
can ever work on the master. Maybe there's no need to configure this
on the slaves and therefore it's OK, but I feel nervous about putting
cluster configuration in catalogs. Another reason for that is that if
synchronous replication is broken, then you need a way to change the
catalog, which involves committing a write transaction; there's a
danger that your efforts to do this will be tripped up by the broken
synchronous replication configuration.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-11 22:40:54
Message-ID:	CAB7nPqSg2DtEn0e8ajnXhMkZmVvf1KPCBD8MhzPrOzuexHEnTw@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Feb 12, 2016 at 2:56 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Feb 5, 2016 at 3:36 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> So, here are some thoughts to make that more user-friendly. I think
>> that the critical issue here is to properly flatten the meta data in
>> the custom language and represent it properly in a new catalog,
>> without messing up too much with the existing pg_stat_replication that
>> people are now used to for 5 releases since 9.0.
>
> Putting the metadata in a catalog doesn't seem great because that only
> can ever work on the master. Maybe there's no need to configure this
> on the slaves and therefore it's OK, but I feel nervous about putting
> cluster configuration in catalogs. Another reason for that is that if
> synchronous replication is broken, then you need a way to change the
> catalog, which involves committing a write transaction; there's a
> danger that your efforts to do this will be tripped up by the broken
> synchronous replication configuration.

I was referring to a catalog view that parses the information related
to groups of s_s_names in a flattened way to show each group sync
status. Perhaps my words should have been clearer.
--
Michael

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-13 04:22:10
Message-ID:	CA+TgmobtBNLmp_K0q4j_=UdR+gZyqfmoJWj3Fi0cX9Po-nNwdA@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Feb 11, 2016 at 5:40 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Fri, Feb 12, 2016 at 2:56 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, Feb 5, 2016 at 3:36 AM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> So, here are some thoughts to make that more user-friendly. I think
>>> that the critical issue here is to properly flatten the meta data in
>>> the custom language and represent it properly in a new catalog,
>>> without messing up too much with the existing pg_stat_replication that
>>> people are now used to for 5 releases since 9.0.
>>
>> Putting the metadata in a catalog doesn't seem great because that only
>> can ever work on the master. Maybe there's no need to configure this
>> on the slaves and therefore it's OK, but I feel nervous about putting
>> cluster configuration in catalogs. Another reason for that is that if
>> synchronous replication is broken, then you need a way to change the
>> catalog, which involves committing a write transaction; there's a
>> danger that your efforts to do this will be tripped up by the broken
>> synchronous replication configuration.
>
> I was referring to a catalog view that parses the information related
> to groups of s_s_names in a flattened way to show each group sync
> status. Perhaps my words should have been clearer.

Ah. Well, that's different, then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-15 05:11:02
Message-ID:	20160215.141102.28792569.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Wed, 10 Feb 2016 18:36:43 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqTHmuuDdKWmoaY1ZAi-gRnT_HRdHGyiqpNfFFr15qc5uA(at)mail(dot)gmail(dot)com>
> On Wed, Feb 10, 2016 at 5:34 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > > > +sync_node_group:
> > > > + sync_list { $$ = create_group_node(1, $1);
> > > > }
> > > > + | sync_element_ast { $$ = create_group_node(1,
> > > > $1);}
> > > > + | INT '[' sync_list ']' { $$ = create_group_node($1,
> > > > $3);}
> > > > + | INT '[' sync_element_ast ']' { $$ = create_group_node($1,
> > > > $3); }
> > > > We may want to be careful with the use of '[' in application_name. I am not
> > > > much thrilled with forbidding the use of []() in application_name, so we may
> > > > want to recommend user to use a backslash when using s_s_names when a group
> > > > is defined.
> >
> > Mmmm. I found that application_name can contain
> > commas. Furthermore, there seems to be no limitation for
> > character in the name.
> >
> > postgres=# set application_name='ho,ge';
> > postgres=# select application_name from pg_stat_activity;
> > application_name
> > ------------------
> > ho,ge
> >
> > check_application_name() allows all characters in the range
> > between 32 to 126 in ascii. All other characters are replaced
> > with '?'.
>
> Actually I was thinking about that a couple of hours ago. If the
> application_name of a node has a comma, it cannot become a sync
> replica, no? Wouldn't we need a special handling in s_s_names like
> '\,' make a comma part of an application name? Or just ban commas from
> the list of supported characters in the application name?

Surprisingly, yes. The list is handled as an identifier list and
parsed by SplitIdentifierString, so it can accept double-quoted
names.

s_s_names='abc, def, " abc,""def"'

Result list is ["abc", "def", " abc,\"def"]

Simply supporting the same notation addresses the problem and
accepts strings like the following.

s_s_names='2["comma,name", "foo[bar,baz]"]'

This behavior is currently undocumented, but I doubt that it
needs an explicit mention.
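
To illustrate the quoting rule, here is a minimal Python sketch of that
splitting behavior. It is not the C implementation of
SplitIdentifierString; the function name split_identifier_list is made up
for this example, and details such as identifier truncation are left out.
It assumes that unquoted names are folded to lower case and that "" inside
a quoted name stands for one double quote.

```python
def split_identifier_list(raw, sep=","):
    """Split a separator-delimited list, honoring double-quoted names."""
    if not raw.strip():
        return []
    names = []
    i, n = 0, len(raw)
    while True:
        # Skip whitespace in front of the next name.
        while i < n and raw[i].isspace():
            i += 1
        if i < n and raw[i] == '"':
            # Quoted name: keep everything up to the closing quote,
            # treating a doubled quote as one literal quote.
            i += 1
            buf = []
            while True:
                if i >= n:
                    raise ValueError("unterminated double-quoted name")
                if raw[i] == '"':
                    if i + 1 < n and raw[i + 1] == '"':
                        buf.append('"')
                        i += 2
                        continue
                    i += 1
                    break
                buf.append(raw[i])
                i += 1
            name = "".join(buf)
        else:
            # Unquoted name: runs until the separator or whitespace.
            start = i
            while i < n and raw[i] != sep and not raw[i].isspace():
                i += 1
            if i == start:
                raise ValueError("empty name in list")
            name = raw[start:i].lower()  # assumption: case-folded
        names.append(name)
        # Skip whitespace after the name, then expect a separator or the end.
        while i < n and raw[i].isspace():
            i += 1
        if i >= n:
            return names
        if raw[i] != sep:
            raise ValueError("expected separator at offset %d" % i)
        i += 1


print(split_identifier_list('abc, def, " abc,""def"'))
```

With the example string above, the third element keeps its leading space,
its comma and one embedded double quote.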

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	sawada(dot)mshk(at)gmail(dot)com, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, thom(at)linux(dot)com, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, memissemerson(at)gmail(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, amit(dot)kapila16(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-15 05:54:37
Message-ID:	CAB7nPqRrp2wGq_Qnv_MFTddPY5T1bSJkKVEAsf_LetTwy=sYjg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
> Surprizingly yes. The list is handled as an identifier list and
> parsed by SplitIdentifierString thus it can accept double-quoted
> names.

Good point. I was not aware of this trick.
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-16 07:19:16
Message-ID:	CAD21AoBT9ctJjymC+d0W3SXgdR+tF5zsqGxfGUqW9Uj9VjHkKw@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Feb 15, 2016 at 2:54 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
>> Surprizingly yes. The list is handled as an identifier list and
>> parsed by SplitIdentifierString thus it can accept double-quoted
>> names.
>

Attached is the latest version of the patch, which contains only the
feature logic so far. I'm writing the documentation patch for this
feature now, so this version does not include documentation or
regression tests.

> | $ postgres
> | FATAL: syntax error: unexpected character "*"
> Mmm.. It should be tough to find what has happened..

I'm trying to implement a better error message, but that change is not
included in this version of the patch yet.

> malloc/free are used in create_name_node and other functions to
> be used in scanner, but syncgroup_gram.y is said to use
> palloc/pfree. Maybe they should use the same memory
> allocation/freeing functions.

As it stands, I think we use malloc/free when we allocate and free
memory for the SyncRepStandbys variables.
On the other hand, we use palloc/pfree while parsing SyncRepStandbyString.
Am I missing something?

> I suppose SyncRepGetSyncedLsnsFn, or SyncRepGetSyncedLsnsPriority
> can return InvalidXLogRecPtr as cur_*_pos even when it returns
> true. And, I suppose comparison of LSN values with
> InvalidXLogRecPtr is not well-defined. Anyway the condition goes
> wrong when cur_write_pos = InvalidXLogRecPtr (but ret = true).

In this version of the patch, it's not possible to return InvalidXLogRecPtr
with got_lsns = false (was ret = false).
So we can be sure that we got valid LSNs when got_lsns = true.

> At a glance, SyncRepGetSyncedLsnsPriority and
> SyncRepGetSyncStandbysPriority does almost the same thing and both
> runs loops over group members. Couldn't they run at once?

Yeah, I've optimized that logic.

> We may want to be careful with the use of '[' in application_name.
> I am not much thrilled with forbidding the use of []() in application_name, so we may
> want to recommend user to use a backslash when using s_s_names when a
> group is defined.
> s_s_names='abc, def, " abc,""def"'
>
> Result list is ["abc", "def", " abc,\"def"]
>
> Simplly supporting the same notation addresses the problem and
> accepts strings like the following.
>
> s_s_names='2["comma,name", "foo[bar,baz]"]'

I've changed the s_s_names parser so that it handles the four special
characters (',', ' ', '[' and ']') and double-quoted strings in the same
way as SplitIdentifierString does.
These four special characters cannot be used outside a double-quoted
string. Also, to use a double-quote character inside a double-quoted
string, write it as two double quotes ("").
For example,
if application_name = 'hoge " bar', s_s_names = '"hoge "" bar"' would match it.
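
The doubling rule can be sketched in a few lines of Python. The helper
names quote_standby_name and unquote_standby_name are hypothetical and
are not part of the patch; they only show the "" convention described
above.

```python
def quote_standby_name(application_name):
    """Wrap a name in double quotes, doubling any embedded double quote."""
    return '"' + application_name.replace('"', '""') + '"'


def unquote_standby_name(quoted):
    """Reverse quote_standby_name, rejecting malformed input."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("name is not double-quoted")
    inner = quoted[1:-1]
    # After removing every doubled quote, no lone quote may remain.
    if '"' in inner.replace('""', ''):
        raise ValueError("embedded double quote is not doubled")
    return inner.replace('""', '"')


print(quote_standby_name('hoge " bar'))
```

Quoting 'hoge " bar' this way gives the s_s_names entry from the example,
and unquoting it returns the original application_name.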

The other comments have been addressed.

Remaining tasks are:
- Documentation patch.
- Regression test patch.
- Better syntax error messages for s_s_names.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v10.patch	binary/octet-stream	31.9 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-22 13:52:29
Message-ID:	CAHGQGwENujogaQvcc=u0tffNfFGtwXNb1yFcphdTYCJdG1_j1A@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Feb 16, 2016 at 4:19 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Mon, Feb 15, 2016 at 2:54 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
>>> Surprizingly yes. The list is handled as an identifier list and
>>> parsed by SplitIdentifierString thus it can accept double-quoted
>>> names.
>>
>
> Attached latest version patch which has only feature logic so far.
> I'm writing document patch about this feature now, so this version
> patch doesn't have document and regression test patch.

Thanks for updating the patch!

When I changed s_s_names to 'hoge*' and reloaded the configuration file,
the server crashed unexpectedly with the following error message.
This is obviously a bug.

FATAL: syntax error

Regards,

--
Fujii Masao

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-23 08:44:44
Message-ID:	20160223.174444.178687579.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Mon, 22 Feb 2016 22:52:29 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwENujogaQvcc=u0tffNfFGtwXNb1yFcphdTYCJdG1_j1A(at)mail(dot)gmail(dot)com>
> On Tue, Feb 16, 2016 at 4:19 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > On Mon, Feb 15, 2016 at 2:54 PM, Michael Paquier
> > <michael(dot)paquier(at)gmail(dot)com> wrote:
> >> On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
> >>> Surprizingly yes. The list is handled as an identifier list and
> >>> parsed by SplitIdentifierString thus it can accept double-quoted
> >>> names.
> >>
> >
> > Attached latest version patch which has only feature logic so far.
> > I'm writing document patch about this feature now, so this version
> > patch doesn't have document and regression test patch.
>
> Thanks for updating the patch!
>
> When I changed s_s_names to 'hoge*' and reloaded the configuration file,
> the server crashed unexpectedly with the following error message.
> This is obviously a bug.
>
> FATAL: syntax error

I had a glance at the lexer part of the new patch. It would be
better to design the lexer from the beginning according to the
required behavior.

The documentation describes the syntax as follows,

http://www.postgresql.org/docs/current/static/runtime-config-logging.html

> application_name (string)
>
> The application_name can be any string of less than NAMEDATALEN
> characters (64 characters in a standard build). <snip> Only
> printable ASCII characters may be used in the application_name
> value. Other characters will be replaced with question marks (?).

And judging from what the functions mentioned so far do, I suppose an
application_name is, overall, treated as follows.

- check_application_name() currently allows [\x20-\x7e], which
  differs from the definition of SQL identifiers.

- SplitIdentifierString() and the syncrep code:

  - Any byte except a double quote is allowed in the double-quoted
    representation. A double quote just after a delimiter opens the
    quoted representation.

  - A non-quoted name can contain any character, including double
    quotes, except ',' and whitespace.

  - The syncrep code does case-insensitive matching against the
    application_name.
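
The first and last points can be pictured with a small Python sketch. This
is only an illustration under stated assumptions, not the server code: the
function names are invented here, the cleaning works on characters rather
than bytes, and the comparison is assumed to fold ASCII letters only.

```python
def clean_application_name(name):
    """Replace every character outside printable ASCII (32..126) with '?'."""
    return "".join(c if 32 <= ord(c) <= 126 else "?" for c in name)


def _ascii_lower(text):
    # Fold only A-Z, leaving every other character untouched.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def standby_matches(configured_name, application_name):
    """Case-insensitive comparison of a configured name and a standby name."""
    return _ascii_lower(configured_name) == _ascii_lower(application_name)


print(clean_application_name("ho,ge"))
print(standby_matches("Node1", "node1"))
```

A comma survives the cleaning step unchanged, which is why the list syntax
has to deal with it.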

So, to preserve the current behavior except for the last point, the
following pattern definitions would do. The lexer/grammar for the
new format of s_s_names could be simpler than it currently is.

space                [ \n\r\f\t\v]   /* See the definition of isspace(3) */
whitespace           {space}+
dquote               \"
app_name_chars       [\x21-\x2b\x2d-\x7e]   /* excluding ' ', ',' */
app_name_indq_chars  [\x20\x21\x23-\x7e]    /* excluding '"' */
app_name_dq_chars    ({app_name_indq_chars}|{dquote}{dquote})
delimiter            {whitespace}*,{whitespace}*
app_name             ({app_name_chars}+|{dquote}{app_name_dq_chars}+{dquote})
s_s_names            {app_name}({delimiter}{app_name})*
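
As a rough cross-check, these definitions can be transcribed into a Python
regular expression and tried against sample strings. This is a sketch, not
part of any patch: the constant names simply mirror the definitions above,
and {whitespace}* is written as {space}*, which matches the same strings.

```python
import re

SPACE = r"[ \n\r\f\t\v]"
DQUOTE = '"'
APP_NAME_CHARS = r"[\x21-\x2b\x2d-\x7e]"       # excluding ' ' and ','
APP_NAME_INDQ_CHARS = r"[\x20\x21\x23-\x7e]"   # excluding '"'
APP_NAME_DQ_CHARS = f"(?:{APP_NAME_INDQ_CHARS}|{DQUOTE}{DQUOTE})"
DELIMITER = f"{SPACE}*,{SPACE}*"
APP_NAME = f"(?:{APP_NAME_CHARS}+|{DQUOTE}{APP_NAME_DQ_CHARS}+{DQUOTE})"
S_S_NAMES = re.compile(f"{APP_NAME}(?:{DELIMITER}{APP_NAME})*")


def is_valid_s_s_names(value):
    """True if the whole string matches the s_s_names pattern."""
    return S_S_NAMES.fullmatch(value) is not None


for sample in ['abc, def, " abc,""def"', "hoge*", "abc,,def", "a b"]:
    print(repr(sample), is_valid_s_s_names(sample))
```

The first two samples are accepted, while an empty list element or a space
between unquoted names is rejected.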

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-24 08:37:57
Message-ID:	20160224.173757.38720623.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

OK, I think we should concentrate on the parser part for now.

So I made a hasty standalone parser for the syntax, including group
names, to make separate testing convenient. The parser takes input
from stdin and prints the resulting structure.

It accepts both the old s_s_names format and the new list format. We
haven't discussed how to add group names, but I added them as
"<grpname>" just before the number of synchronous standbys of [] and
{} lists.

Is this usable for further discussion?

The sources can be compiled with the following command line.

$ bison -v test.y; flex -l test.l; gcc -g -DYYDEBUG=1 -DYYERROR_VERBOSE -o ltest test.tab.c

and it produces output like the following.

[horiguti(at)drain tmp]$ echo '123[1,3,<x>3{a,b,e},4,*]' | ./ltest

TYPE: PRIO_LIST
GROUPNAME: <none>
NSYNC: 123
NEST: 2
CHILDREN {
{
TYPE: HOSTNAME
HOSTNAME: 1
QUOTED: No
NEST: 1
}
{
TYPE: HOSTNAME
HOSTNAME: 3
QUOTED: No
NEST: 0
}
TYPE: QUORUM_LIST
GROUPNAME: x
NSYNC: 3
NEST: 1
CHILDREN {
{
TYPE: HOSTNAME
HOSTNAME: a
QUOTED: No
NEST: 0
}
{
TYPE: HOSTNAME
HOSTNAME: b
QUOTED: No
NEST: 0
}
{
TYPE: HOSTNAME
HOSTNAME: e
QUOTED: No
NEST: 0
}
}
{
TYPE: HOSTNAME
HOSTNAME: 4
QUOTED: No
NEST: 0
}
{
TYPE: HOSTNAME
HOSTNAME: *
QUOTED: No
NEST: 0
}
}
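
For readers without the attachments, the shape of such a parser can be
sketched in Python. This is not the attached test.l/test.y: the Node class
and the function names are invented for illustration, quoted names are
left out for brevity, and a run of digits is treated as a list prefix only
when '[' or '{' follows it.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

PUNCT = "[]{},<>"
TOKEN = re.compile(r'\s*(?:(?P<punct>[\[\]{},<>])|(?P<name>[^\s\[\]{},<>"]+))')


@dataclass
class Node:
    type: str                      # "HOSTNAME", "PRIO_LIST" or "QUORUM_LIST"
    name: Optional[str] = None     # host name, or group name for a list
    nsync: Optional[int] = None    # number of synchronous standbys
    children: List["Node"] = field(default_factory=list)


def tokenize(text):
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(m.group("punct") or m.group("name"))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError("expected %r, got %r" % (expected, tok))
        self.pos += 1
        return tok

    def take_name(self):
        tok = self.take()
        if tok in PUNCT:
            raise ValueError("expected a name, got %r" % tok)
        return tok

    def parse_list(self):
        items = [self.parse_element()]
        while self.peek() == ",":
            self.take(",")
            items.append(self.parse_element())
        return items

    def parse_element(self):
        group = None
        if self.peek() == "<":
            self.take("<")
            group = self.take_name()
            self.take(">")
        tok = self.take_name()
        opener = self.peek()
        # Digits count as a list prefix only if a list opener follows.
        if opener in ("[", "{") and tok.isascii() and tok.isdigit():
            self.take()
            children = self.parse_list()
            self.take("]" if opener == "[" else "}")
            kind = "PRIO_LIST" if opener == "[" else "QUORUM_LIST"
            return Node(kind, group, int(tok), children)
        if group is not None:
            raise ValueError("a group name must be followed by a list")
        return Node("HOSTNAME", tok)


def parse(text):
    parser = Parser(tokenize(text))
    nodes = parser.parse_list()
    if parser.peek() is not None:
        raise ValueError("unexpected token %r" % parser.peek())
    return nodes


print(parse("123[1,3,<x>3{a,b,e},4,*]"))
```

The old format, such as 'abc, def', comes back as a flat list of HOSTNAME
nodes, and the new format as a single list node with nested children.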

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
unknown_filename	text/plain	1.4 KB
unknown_filename	text/plain	4.9 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-24 09:01:59
Message-ID:	CAD21AoCetS5BMcTpXXtMwG0hyszZgNn=zK1U73GcWTgJ-Wn3pQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Feb 24, 2016 at 5:37 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> Ok, I think we should concentrate the parser part for now.
>
> So I made a hasty independent parser for the syntax including the
> group names for the convenience for separate testing. The parser
> takes input from stdin and prints the result structure.
>
> It can take old s_s_name format and new list format. We haven't
> discussed how to add gruop names but I added it as "<grpname>"
> just before the # of syncronous standbys of [] and {} lists.
>
> Is this usable for further discussions?

Thank you for your suggestion.

Another option is to add the group name with ":" immediately after the
set of standbys, as I said earlier.
<http://www.postgresql.org/message-id/CAD21AoA9UqcbTnDKi0osd0yhN4FPgTrg6wuZeTtvpSYy2LqL5Q@mail.gmail.com>

s_s_names with group names would be as follows.
s_s_names = '2[local, 2[london1, london2, london3]:london, (tokyo1, tokyo2):tokyo]'

Thoughts?

Regards,

--
Masahiko Sawada

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-25 09:46:28
Message-ID:	20160225.184628.47734634.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Wed, 24 Feb 2016 18:01:59 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCetS5BMcTpXXtMwG0hyszZgNn=zK1U73GcWTgJ-Wn3pQ(at)mail(dot)gmail(dot)com>
> On Wed, Feb 24, 2016 at 5:37 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Hello,
> >
> > Ok, I think we should concentrate the parser part for now.
> >
> > At Tue, 23 Feb 2016 17:44:44 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20160223(dot)174444(dot)178687579(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> >> Hello,
...
> >> So, to preserve or following the current behavior expct the last
> >> one, the following pattern definitions would do. The
> >> lexer/grammer for the new format of s_s_names could be simpler
> >> than what it is.
> >>
> >> space [ \n\r\f\t\v] /* See the definition of isspace(3) */
> >> whitespace {space}+
> >> dquote \"
> >> app_name_chars [\x21-\x2b\x2d-\x7e] /* excluding ' ', ',' */
> >> app_name_indq_chars [\x20\x21\x23-\x7e] /* excluding '"' */
> >> app_name_dq_chars ({app_name_indq_chars}|{dquote}{dquote})
> >> delimiter {whitespace}*,{whitespace}*
> >> app_name ({app_name_chars}+|{dquote}{app_name_dq_chars}+{dquote})
> >> s_s_names {app_name}({delimiter}{app_name})*
> >
> >
> > So I made a hasty independent parser for the syntax including the
> > group names for the convenience for separate testing. The parser
> > takes input from stdin and prints the result structure.
> >
> > It can take old s_s_name format and new list format. We haven't
> > discussed how to add gruop names but I added it as "<grpname>"
> > just before the # of syncronous standbys of [] and {} lists.
> >
> > Is this usable for further discussions?
>
> Thank you for your suggestion.
>
> Another option is to add group name with ":" to immediately after set
> of standbys as I said earlier.
> <http://www.postgresql.org/message-id/CAD21AoA9UqcbTnDKi0osd0yhN4FPgTrg6wuZeTtvpSYy2LqL5Q@mail.gmail.com>
>
> s_s_names with group name would be as follows.
> s_s_names = '2[local, 2[london1, london2, london3]:london, (tokyo1,
> tokyo2):tokyo]'
>
> Though?

I have no problem with it. The attached new sample parser does
so.

By the way, your parser also complains about an example I've seen
somewhere upthread, "1[2,3,4]". This is because '2', '3' and '4'
are regarded as INT, not NAME. Whether a sequence of digits is the
prefix number of a list or a host name cannot be determined until
some of the following characters have been read. So my previous
test.l defined NAME_OR_INTEGER, and the grammar side distinguishes
the two cases to resolve this problem.

If you want them identified on the lexer side, the lexer has to
look ahead, as <NAME_OR_PREFIX>{prefix} does in the attached
test.l. This makes the lexer a bit more complex but, in contrast,
makes test.y simpler. The attached test.l and test.y have been
refactored, but the .l file gets a bit tricky.
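
The look-ahead idea can be shown with a short Python sketch. It does not
reproduce the attached test.l: the token names PREFIX, NAME and PUNCT are
chosen for this example only, and the rule assumed here is that digits
form a prefix number exactly when '[' or '{' follows them, possibly after
whitespace.

```python
import re

TOKEN = re.compile(r"""
    (?P<PREFIX>[0-9]+(?=\s*[\[{]))   # digits followed by a list opener
  | (?P<NAME>[^\s\[\]{},]+)          # anything else up to a delimiter
  | (?P<PUNCT>[\[\]{},])             # list brackets and the comma
""", re.VERBOSE)


def lex(text):
    """Return (kind, value) pairs, deciding PREFIX vs NAME by look-ahead."""
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for kind, value in lex("1[2,3,4]"):
    print(kind, value)
```

Here only the leading 1 is reported as a PREFIX, while 2, 3 and 4 come out
as NAME tokens, so the grammar no longer has to tell them apart.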

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
unknown_filename	text/plain	2.3 KB
unknown_filename	text/plain	4.3 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-25 16:23:26
Message-ID:	CAD21AoDJFyCmd+Jw7LxS+qCT+JT90=1mzyvPwYXXsSpQNbXc7A@mail.gmail.com
Lists:	pgsql-hackers

The attached latest patch includes the documentation patch.

> When I changed s_s_names to 'hoge*' and reloaded the configuration file,
> the server crashed unexpectedly with the following error message.
> This is obviously a bug.

Fixed.

> - allows any byte except a double quote in double-quoted
> representation. A double-quote just after a delimiter can open
> quoted representation.

No. A double quote is also allowed inside the double-quoted
representation, written as two consecutive double quotes.
For example, if s_s_names = '"node""hoge"' then the standby name will be 'node"hoge'.
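The quoting rule above can be sketched in C as follows. This is only an illustration under stated assumptions: unquote_name is a hypothetical helper, not a function from the patch, and it writes into a caller-supplied fixed buffer rather than using the patch's string handling.

```c
#include <stddef.h>

/*
 * Copy the contents of a double-quoted name from src into dst, turning
 * each doubled quote ("") into one literal quote. src must start with '"'.
 * Returns the length written, or -1 on a missing opening or closing quote
 * or a result that does not fit in dstsize bytes.
 */
static int
unquote_name(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    if (dstsize == 0 || *src++ != '"')
        return -1;
    for (;;)
    {
        if (*src == '\0')
            return -1;              /* unterminated quoted name */
        if (*src == '"')
        {
            if (src[1] != '"')
                break;              /* closing quote */
            src++;                  /* skip first quote of the pair */
        }
        if (n + 1 >= dstsize)
            return -1;              /* no room for char plus terminator */
        dst[n++] = *src++;
    }
    dst[n] = '\0';
    return (int) n;
}
```

Given the input "node""hoge" (including the surrounding quotes), the buffer ends up holding node"hoge.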

>
> I have no problem with it. The attached new sample parser does
> so.
>
> By the way, your parser also complains for an example I've seen
> somewhere upthread "1[2,3,4]". This is because '2', '3' and '4'
> are regarded as INT, not NAME. Whether a sequence of digits is a
> prefix number of a list or a host name cannot be identified until
> reading some following characters. So my previous test.l defined
> NAME_OR_INTEGER and it is distinguished in the grammar side to
> resolve this problem.
>
> If you want them identified in the lexer side, it should do
> looking-forward as <NAME_OR_PREFIX>{prefix} in the attached
> test.l does. This makes the lexer a bit complex but in contrast
> test.y simpler. The test.l, test.y attached got refactored but .l
> gets a bit tricky..

I think that the lexer can pass both INT and NAME to the parser as
char*, and the parser can then treat each one as an integer or a char*.
That would be simpler.
Thoughts?
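As a rough sketch of that approach, the parser side could decide whether a token's text is a count using strtol. The helper token_as_count is hypothetical and does not appear in the patch; it only shows the conversion step, assuming the lexer hands over every token as a NUL-terminated string.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Try to interpret a token's text as a non-negative count. Returns true
 * and stores the value only if the whole text is a decimal number that
 * fits in an int; otherwise the caller should treat the text as a name.
 */
static bool
token_as_count(const char *text, int *count)
{
    char *end;
    long  val;

    if (text[0] < '0' || text[0] > '9')
        return false;           /* rejects "", "+1", " 1" and plain names */
    errno = 0;
    val = strtol(text, &end, 10);
    if (*end != '\0' || errno == ERANGE || val > INT_MAX)
        return false;           /* trailing characters or out of range */
    *count = (int) val;
    return true;
}
```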

Thank you for the lexer and parser example, but I'm not sure that it
makes things easier.
It seems to make things more complex.

The attached patch handles the parameter in a way similar to how postgres parses SQL.
Please have a look at it and give me feedback.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v11.patch	application/octet-stream	39.0 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-25 23:52:54
Message-ID:	CAD21AoAZKFVu8-MVhkJ3ywAiJmb=P-HSbJTGi=gK1La73KjS6Q@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Feb 26, 2016 at 1:23 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Attached latest patch includes document patch.
>
>> When I changed s_s_names to 'hoge*' and reloaded the configuration file,
>> the server crashed unexpectedly with the following error message.
>> This is obviously a bug.
>
> Fixed.
>
>> - allows any byte except a double quote in double-quoted
>> representation. A double-quote just after a delimiter can open
>> quoted representation.
>
> No. double quote is also allowed in double-quoted representation using
> by two double-quotes.
> if s_s_names = '"node""hoge"' then standby name will be 'node"hoge'.
>
>>
>> I have no problem with it. The attached new sample parser does
>> so.
>>
>> By the way, your parser also complains for an example I've seen
>> somewhere upthread "1[2,3,4]". This is because '2', '3' and '4'
>> are regarded as INT, not NAME. Whether a sequence of digits is a
>> prefix number of a list or a host name cannot be identified until
>> reading some following characters. So my previous test.l defined
>> NAME_OR_INTEGER and it is distinguished in the grammar side to
>> resolve this problem.
>>
>> If you want them identified in the lexer side, it should do
>> looking-forward as <NAME_OR_PREFIX>{prefix} in the attached
>> test.l does. This makes the lexer a bit complex but in contrast
>> test.y simpler. The test.l, test.y attached got refactored but .l
>> gets a bit tricky..
>
> I think that lexer can pass both INT and NAME as char* to parser, and
> then parser regards them as integer or char*.
> It would be more simple.
> Thoughts?
>
> Thank you for giving lexer and parser example but I'm not sure that it
> makes thing more easier.
> It seems to make thing more complex.
>
> Attached patch handles parameter using similar way as postgres parses SQL.
> Please having a look it and give me feedbacks.
>

The previous patch could not parse a one-character standby name correctly.
Attached is the latest patch.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v12.patch	application/octet-stream	39.0 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-26 01:38:22
Message-ID:	20160226.103822.12680005.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello, Thanks for the new patch.

At Fri, 26 Feb 2016 08:52:54 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAZKFVu8-MVhkJ3ywAiJmb=P-HSbJTGi=gK1La73KjS6Q(at)mail(dot)gmail(dot)com>
> Previous patch could not parse one character standby name correctly.
> Attached latest patch.

I haven't looked at it in detail, but it won't work as you
expect. flex complains as follows for the v12 patch.

syncgroup_scanner.l:80: warning, rule cannot be matched
syncgroup_scanner.l:84: warning, rule cannot be matched

These are warnings about the patterns [1-9][0-9]* and {asterisk},
because everything they match is already matched by {node_name}+.
The latter does no harm (the pattern is merely useless), but the
former will make '1[a,b,c]' fail.
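To illustrate why the integer rule is shadowed: flex picks the longest match and, on a tie, the rule listed first. The following C sketch simulates that policy for two rules, assuming the name rule appears before the integer rule in the scanner file. The functions match_name, match_int and pick_rule are hypothetical and exist only for this illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length matched by a name rule: one or more non-delimiter characters. */
static size_t
match_name(const char *s)
{
    return strcspn(s, " ,[]");
}

/* Length matched by the integer rule [1-9][0-9]*. */
static size_t
match_int(const char *s)
{
    size_t n = 0;

    if (s[0] < '1' || s[0] > '9')
        return 0;
    while (isdigit((unsigned char) s[n]))
        n++;
    return n;
}

/*
 * Pick a rule the way flex does: longest match wins, and on a tie the
 * rule listed first wins. Returns 0 for the name rule, 1 for the integer
 * rule, -1 if neither matches.
 */
static int
pick_rule(const char *s)
{
    size_t name_len = match_name(s);    /* rule 0, listed first */
    size_t int_len = match_int(s);      /* rule 1, listed second */

    if (name_len == 0 && int_len == 0)
        return -1;
    return (int_len > name_len) ? 1 : 0;
}
```

Because every digit is also a name character, the name rule always matches at least as much as the integer rule, so the integer rule can never win.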

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-26 01:53:25
Message-ID:	20160226.105325.168253404.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Fri, 26 Feb 2016 10:38:22 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20160226(dot)103822(dot)12680005(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Hello, Thanks for the new patch.
>
>
> At Fri, 26 Feb 2016 08:52:54 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAZKFVu8-MVhkJ3ywAiJmb=P-HSbJTGi=gK1La73KjS6Q(at)mail(dot)gmail(dot)com>
> > Previous patch could not parse one character standby name correctly.
> > Attached latest patch.
>
> I haven't looked it in detail but it won't work as you
> expected. flex compains as the following for v12 patch.
>
> syncgroup_scanner.l:80: warning, rule cannot be matched
> syncgroup_scanner.l:84: warning, rule cannot be matched

Making it independent of the postgres body, compiling it with
-DYYDEBUG and setting yydebug = 1 would give you valuable information
and make testing the parser far easier.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
unknown_filename	text/plain	3.4 KB
unknown_filename	text/plain	5.5 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-27 19:04:37
Message-ID:	CAD21AoB69-tNLVzKRZ0Opzsr6LcLY36GJ2tHGohW33Btq3yRsw@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Feb 26, 2016 at 10:53 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Fri, 26 Feb 2016 10:38:22 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20160226(dot)103822(dot)12680005(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>> Hello, Thanks for the new patch.
>>
>>
>> At Fri, 26 Feb 2016 08:52:54 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAZKFVu8-MVhkJ3ywAiJmb=P-HSbJTGi=gK1La73KjS6Q(at)mail(dot)gmail(dot)com>
>> > Previous patch could not parse one character standby name correctly.
>> > Attached latest patch.
>>
>> I haven't looked it in detail but it won't work as you
>> expected. flex compains as the following for v12 patch.
>>
>> syncgroup_scanner.l:80: warning, rule cannot be matched
>> syncgroup_scanner.l:84: warning, rule cannot be matched
>
> Making it independent from postgres body then compile it with
> -DYYDEBUG and set yydebug = 1 would give you valuable information
> and make testing of the parser far easier.

Thank you for your suggestion.
Attached is the latest version of the patch.

The changes from the previous version are:
- Fix parser, lexer bugs.
- Add regression test patch based on patch Suraji submitted.

Please review it.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v13.patch	application/octet-stream	43.1 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-02-29 10:24:14
Message-ID:	20160229.192414.166736236.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Sorry, I misread the previous patch. It actually worked.

At Sun, 28 Feb 2016 04:04:37 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoB69-tNLVzKRZ0Opzsr6LcLY36GJ2tHGohW33Btq3yRsw(at)mail(dot)gmail(dot)com>
> The changes from previous version are,
> - Fix parser, lexer bugs.
> - Add regression test patch based on patch Suraji submitted.

Thank you for the new patch. The parser looks to work almost as
expected, but the following warnings were seen during the build.

> In file included from syncgroup_gram.y:138:0:
> syncgroup_scanner.l:23:12: warning: ‘xd_size’ defined but not used [-Wunused-variable]
> static int xd_size; /* actual size of xd_string */
> ^
> syncgroup_scanner.l:24:12: warning: ‘xd_len’ defined but not used [-Wunused-variable]
> static int xd_len; /* string length of xd_string */

Some random comments follow.

Comments on the lexer part.

===
> +node_name [^\ \,\[\]]

This accepts 'abc^Id' as a name, which is wrong behavior (though
such application names are not allowed anyway; if you are relying on
that, I'd like to see a comment saying so). The excessive escaping
also makes it a bit hard to read. The pattern can be written more
precisely as follows (though I don't know whether it is generally
easier to read):

| node_name [\x20-\x7f]{-}[ \[\],]
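Read as a C predicate, the character class above amounts to the following. This is a sketch only; is_name_char is a hypothetical name. It follows the pattern literally, so 0x7f is accepted because the range ends at \x7f, and control characters such as TAB are rejected.

```c
#include <stdbool.h>

/*
 * True if c belongs to [\x20-\x7f]{-}[ \[\],]: a byte in the range
 * 0x20..0x7f that is not a space, bracket or comma.
 */
static bool
is_name_char(unsigned char c)
{
    if (c < 0x20 || c > 0x7f)
        return false;
    return c != ' ' && c != '[' && c != ']' && c != ',';
}
```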

===
The pattern name {node_name} makes me a bit uneasy. node_name_cont
or name_chars would be preferable.

===
> [1-9][0-9]* {

I see no need to prohibit 0-prefixed integers as NUM. Would
you mind allowing [0-9]+ there?

===
addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
char ychar) take different character types. Is there any reason
for that?

===
I personally don't like the addlit*string() machinery for such a
simple syntax, but it is acceptable enough to me in itself. However,
it uses StringInfo to hold double-quoted names, which pallocs a
1024-byte memory chunk for every double-quoted name. The chunks
pile up uncollected until the current memory context is deleted or
reset (it is deleted just after config file processing finishes).
In addition, setting s_s_names runs the parser twice. This seems
too greedy to me; a static char [NAMEDATALEN] buffer, as in v12,
seems sufficient without palloc/repalloc.

Comments on the parser part.

===
The rule "result" in syncgroup_gram.y assigns a malloc'ed chunk to
SyncRepStandbys, ignoring any existing content, so repeatedly setting
the GUC s_s_names causes a memory leak. Calling
SyncRepClearStandbyGroupList first would be enough.

===
The meaning of SyncGroupNode.type seems obscure. The member seems
to be consulted to decide how to treat the node, but the following
code breaks that assumption.

> group_node->type = SYNC_REP_GROUP_GROUP | SYNC_REP_GROUP_MAIN;

It seems to me that *_MAIN is equivalent to *_GROUP &&
sync_method = *_PRIORITY. If so, *_MAIN is useless. A reader of
SyncGroupNode need not know whether it came from the traditional
s_s_names format or the new one.

===
Bare names in s_s_names are down-cased and double-quoted ones are
not. The parser in this patch does not down-case in either case.

===
Contrary to its name, xd_stringdup() doesn't make a copy of the
string. That is error-prone.

===
I found that the name SyncGroupName.wait_num is not
intuitive. How about sync_num, sync_member_num or
sync_standby_num? If the last is preferred, .members should also
be .standbys .

Comment on the quorum commit body part.
===
I am quite uncomfortable with the existence of
WalSnd.sync_standby_priority. It represented the priority in the
old linear s_s_names format, but nested groups, or even a
single-level quorum list, obviously don't fit it. Can we get rid
of sync_standby_priority, even though we implement at most
n-priority for now?

===
The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
have code specific to each prioritizing method (priority, quorum,
nested and so on). Is there any reason to use it as
a callback of SyncGroupNode?

Others - random comments
===
SyncRepClearStandbyGroupList is defined in syncrep.c but the
other related functions are defined in syncgroup_gram.y. It would
be better to place them together.

===
SyncRepStandbys is meant to be multilevel, and the struct
naturally allows that, but SyncRepClearStandbyGroupList
assumes a single level. Make the function free multilevel
structures, or explicitly prohibit multiple levels with an assertion.
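A multilevel free could look roughly like the following C sketch. The GroupNode layout, new_node and free_group_tree are hypothetical stand-ins, since the patch's SyncGroupNode may be laid out differently, and nodes_freed is a counter kept only for demonstration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; the patch's SyncGroupNode may differ. */
typedef struct GroupNode
{
    char             *name;      /* malloc'ed name */
    struct GroupNode *members;   /* first child when this is a group */
    struct GroupNode *next;      /* next sibling in the parent's list */
} GroupNode;

static int nodes_freed = 0;      /* counter for demonstration only */

/* Allocate a node with a private copy of name; aborts on failure. */
static GroupNode *
new_node(const char *name)
{
    GroupNode *n = calloc(1, sizeof(GroupNode));

    if (n == NULL)
        abort();
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL)
        abort();
    strcpy(n->name, name);
    return n;
}

/* Free a node, all of its following siblings and every nested group. */
static void
free_group_tree(GroupNode *node)
{
    while (node != NULL)
    {
        GroupNode *next = node->next;

        free_group_tree(node->members);   /* recurse into nested group */
        free(node->name);
        free(node);
        nodes_freed++;
        node = next;
    }
}
```

Calling such a function on the old tree before the parser stores a new one would also address the leak on repeated setting of s_s_names.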

===
- errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
+ errdetail("The transaction has already committed locally, but might not have been replicated to the standby(s).")));

The message doesn't contain a specific number of standbys, so just
using the plural seems enough to me. Besides, the message
should describe the situation more precisely. Word correction is
left to anyone else:)

+ errdetail("The transaction has already committed locally, but might not have been replicated to some of the required standbys.")));

===
+ * Check whether specified standby is active, which means not only having
+ * pid but also having any priority.

"active" means not only having a defined priority but also having
reported a WAL flush position.

+ * Check whether specified standby is active, which means not only having
+ * pid but also having any priority and valid flush position reported.

===
If there's no reason for SyncRepStandbyIsSync not to take WalSnd
directly, taking walsnd is simpler.

static bool SyncRepStandbyIsSync(volatile WalSnd *walsnd);

===
> * Update the LSNs on each queue based upon our latest state. This
> * implements a simple policy of first-valid-standby-releases-waiter.
> *
> * Other policies are possible, which would change what we do here and what
> * perhaps also which information we store as well.
> */
> void
> SyncRepReleaseWaiters(void)

This comment looks wrong for the new code.

===
> * Select low priority standbys from walsnds array. If there are same
> * priority standbys, first defined standby is selected. It's possible
> * to have same priority different standbys, so we can not break loop
> * even when standby having target_prioirty priority is found.

"low priority" here seems to be a mistake for "high priority
standbys" or "standbys with low priority value".

> * Returns the list of standbys in sync up to the number that
> * required to satisfy synchronous_standby_names. If there
> * are standbys with the same priority values, the first
> * defined ones are selected. It's possible for multiple
> * standbys to have a same priority value when multiple
> * walreceiver gives the same name, so we do not break the
> * inner loop just by finding a standby with the
> * target_priority.
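The selection rule described in that comment can be sketched in C as below. This is an illustration under assumptions, not the patch's code: select_sync_standbys is a hypothetical function, priorities are passed in as a plain array (0 meaning "not a candidate", 1 the highest priority), and max_priority is supplied by the caller.

```c
/*
 * priorities[i] is the sync priority of walsender slot i; 0 means the
 * slot is not a candidate, 1 is the highest priority. Writes up to
 * sync_num slot indexes into sync_list and returns the number found.
 */
static int
select_sync_standbys(const int *priorities, int nslots,
                     int max_priority, int sync_num, int *sync_list)
{
    int found = 0;

    for (int target = 1; target <= max_priority && found < sync_num; target++)
    {
        /* Keep scanning after a hit: several slots may share a priority. */
        for (int i = 0; i < nslots && found < sync_num; i++)
        {
            if (priorities[i] == target)
                sync_list[found++] = i;
        }
    }
    return found;
}
```

Slots with equal priority are picked in array order, which corresponds to "the first defined ones are selected".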

===
> /* Got enough synchronous stnadby */

"stnadby" => "standbys"

===
This is a comment from the viewpoint of object abstraction.
The callers of SyncRepGetSyncStandbysUsingPriority() need to care
about the internals of SyncGroupNode, but what the function should
return seems to be just the list of walsnds elements. An element
number is useless when the SyncGroupNode nests.

> int
> SyncRepGetSyncStandbysUsingPriority(SyncGroupNode *group, volatile WalSnd **sync_list)

This might need to expose 'volatile WalSnd*' (only pointer type)
outside of walsender.

Or it should return the list of index number of
*WalSndCtl->walsnds*.

===
The dependency definition in the Makefile seems to be wrong, so
editing related files won't trigger the appropriate
recompilation. syncgroup_gram.h and syncgroup_gram.c are generated
together from the .y file, and syncgroup_gram.o is generated from
syncgroup_gram.c and syncgroup_scanner.c.

-syncgroup_gram.o: syncgroup_scanner.c
-
-syncgroup_gram.h: syncgroup_gram.c ;
+syncgroup_gram.o: syncgroup_scanner.c syncgroup_gram.c

===
In pg_stat_get_wal_senders, num_sync looks like it could be
used uninitialized, but I don't know why the compiler doesn't
complain about it.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-03 01:10:08
Message-ID:	CAEepm=2YF137rbbC9D86VEz=boXod23qRriO4g+BQdd8MJYXuA@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Feb 28, 2016 at 8:04 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Attached latest version patch.
>
> The changes from previous version are,
> - Fix parser, lexer bugs.
> - Add regression test patch based on patch Suraji submitted.
>
> Please review it.
>
> [000_multi_sync_replication_v13.patch]

Hi Masahiko,

I have a couple of small suggestions for the documentation and comments:

+ Specifies a standby names that can support
<firstterm>synchronous replication</> using
+ either two types of syntax; comma-separated list or dedicated
language, as
+ described in <xref linkend="synchronous-replication">.
+ Transcations waiting for commit will be allowed to proceed after the
+ specified number of standby servers confirms receipt of their data.

Suggestion: Specifies the standby names that can support
<firstterm>synchronous replication</> using either of two syntaxes: a
comma-separated list, or a more flexible syntax described in <xref
linkend="synchronous-replication">. Transactions waiting for commit
will be allowed to proceed after a configurable subset of standby
servers confirms receipt of their data. For the simple
comma-separated list syntax, it is one server.

+ If the current any of synchronous standbys disconnects for
whatever reason,

s/the current any of/any of the current/

+ no mechanism to enforce uniqueness. For each specified standby name,
+ only the specified count of standbys will be chosen to be synchronous
+ standbys, though exactly which one is indeterminate, the rest will
+ represent potential synchronous standbys.

s/one/ones/
s/indeterminate, the/indeterminate. The/

+ made by a transcation have been transferred to one or more
synchronous standby
+ server. This extends that standard levelof durability

s/transcation/transaction/
s/that standard levelof/the standard level of/

offered by a transaction commit. This level of protection is referred
to as 2-safe replication in computer science theory.

Is this still called "2-safe" or does this patch make it "N-safe",
"group-safe", or something else?

- The minimum wait time is the roundtrip time between primary to standby.
+ The minimum wait time is the roundtrip time between primary to standbys.

Suggestion: The minimum wait time is the roundtrip time between the
primary and the slowest synchronous standby.

+ Multiple synchronous replication is set up by setting <xref
linkend="guc-synchronous-standby-names">
+ using dedicated language. The syntax of dedicated language is following.

Suggestion: Multiple synchronous replication is set up by setting
<xref linkend="guc-synchronous-standby-names"> using the following
syntax.

+ Using dedicated language, we can define a synchronous group with
a number N.
+ synchronous group can have some members which are consdiered as
synchronous standby using comma-separated list.
+ Any standby name is accepted at any position of its list, but '*'
is accepted at only tailing of the standby list.
+ The leading N is a number which specifies that how many standbys
the master server waits to commit for. This number
+ must be less than actual number of members of its group.
+ The listed standby are given highest priority from left defined
starting with 1.

Suggestion: This syntax allows us to define a synchronous group that
will wait for at least N standbys, and a comma-separated list of group
members. The special value <literal>*</> is accepted at the tail of
the member list, and matches any standby. The number N must not be
greater than the number of members listed in the group, unless
<literal>*</> is used. Priority is given to servers in the order that
they appear in the list. The first named server has the highest
priority.

+ All ASCII characters except for special characters(',', '&quot',
'[', ']', ' ') are allowed as standby name.
+ When these special characters are used as standby name, whole
standby name string need to be written in
+ double-quoted representation.

Suggestion: ... are allowed in unquoted standby names. To use these
special characters, the standby name should be enclosed in double
quotes.

+ * In 9.5 we support the possibility to have multiple synchronous standbys,

s/9.5/9.6/

+ * as defined in synchronous_standby_names. Before on standby can become a

s/ on / a /

+ * Waiters will be released from the queue once the number of standbys
+ * specified in synchronous_standby_names have caught.

s/caught/processed the commit record/

+ * Check whether specified standby is active, which means not only having
+ * pid but also having any priority.

s/having any priority/having a non-zero priority (meaning it is
configured as potential sync standby)./

- announce_next_takeover = true;

By removing this, haven't we lost the ability to announce takeover
more than once per walsender? I'm not sure exactly where this should
go now but the walsender needs to detect its own transition from
potential to sync state. Also, that message, where it appears below
should probably be tweaked slightly s/the/a/, so "standby \"%s\" is
now a synchronous standby with priority %u", not "... the synchronous
standby ...".

/*
+ * Return true if we have enough synchrononized standbys and the 'safe' written
+ * flushed LSNs, which are LSNs assured in all standbys considered should be
+ * synchronized.
+ */

Suggestion: Return true if we have enough synchronous standbys. If
true, also store the 'safe' write and flush position in the output
parameters write_pos and flush_pos, but only if the standby managed by
this walsender is one of the standbys that has reached each safe
position respectively.

+ /* Check whether each LSN has advanced to */

Suggestion: /* Check whether this standby has reached the safe positions. */

+/*
+ * Decide synced LSNs at this moment using priority method.
+ * If there are not active standbys enough to determine LSNs, return false.

s/not active standbys enough/not enough active standbys/

+/*
+ * Return the positions of the first group->wait_num synchronized standbys
+ * in group->member list into sync_list. sync_list is assumed to have enough
+ * space for at least group->wait_num elements.
+ */

s/Return/Write/
s/synchronized/synchronous/
Then add: "Return the number found."

+int
+SyncRepGetSyncStandbysUsingPriority(SyncGroupNode *group, int *sync_list)
+{
+ int target_priority = 1; /* lowest priority is 1 */

1 is actually the *highest* priority standby.

+ /*
+ * Select low priority standbys from walsnds array. If there are same
+ * priority standbys, first defined standby is selected. It's possible
+ * to have same priority different standbys, so we can not break loop
+ * even when standby having target_prioirty priority is found.

s/target_prioirty/target_priority/

+ /* Got enough synchronous stnadby */

s/stnadby/standbys/

+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ (errmsg_internal("The number of group memebers must be less than its
group waits."))));

I'm not sure what the right error code is, but this isn't a syntax
error. Maybe ERRCODE_CONFIG_FILE_ERROR or
ERRCODE_INVALID_PARAMETER_VALUE? Suggestion for the message: "the
configured number of synchronous standbys exceeds the length of the
group of standby names: %d"
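The check behind that error can be sketched in C as follows, using the rule suggested earlier in this message: N must not exceed the number of listed members unless the list ends with '*'. The function sync_num_is_valid is hypothetical, and rejecting N below 1 is an additional assumption made for this sketch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if sync_num is acceptable for the given member list.
 * A trailing "*" matches any standby, so no upper bound applies then.
 */
static bool
sync_num_is_valid(int sync_num, const char *const *members, int nmembers)
{
    if (sync_num < 1)
        return false;           /* assumption: at least one standby required */
    if (nmembers > 0 && strcmp(members[nmembers - 1], "*") == 0)
        return true;            /* wildcard at the tail of the list */
    return sync_num <= nmembers;
}
```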

+ /*
+ * syncgroup_yyparse sets the global SyncRepStandbys as side effect.
+ * But this function is required to just check, so frees SyncRepStandbyNanes

s/SyncRepStandbyNanes/SyncRepStandbys/ ???

+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ (errmsg_internal("Invalid syntax. synchronous_standby_names parse
returned %d",
+ parse_rc))));

Looking at other error messages I see that they always start with
lower case and then put extra details after ':' rather than using a
'.'. Maybe this could be "could not parse synchronous_standby_names:
error code %d"?

+#define MAX_WALSENDER_NAME 8192

Seems to be unused.

Thanks!

--
Thomas Munro
http://www.enterprisedb.com

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-03 14:30:49
Message-ID:	CAD21AoD3XGZtuvgc5uKJdvcoJP5S0rvGQQCJLRL4rLsruRch5Q@mail.gmail.com
Lists:	pgsql-hackers

Hi,

Thank you so much for reviewing this patch!

All review comments regarding the documentation and code comments have been addressed.
Attached is the latest v14 patch.

> This accepts 'abc^Id' as a name, which is wrong behavior (but
> such appliction names are not allowed anyway. If you assume so,
> I'd like to see a comment for that.).

'abc^Id' is accepted as application_name, no?
postgres(1)=# set application_name to 'abc^Id';
SET
postgres(1)=# show application_name ;
application_name
------------------
abc^Id
(1 row)

> addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
> char ychar) requires differnt character types. Is there any reason
> for that?

Because addlit_xd_string() is for adding a string (char *) to xd_string,
whereas addlit_xd_char() is for adding just one character to xd_string.

> I personally don't like addlit*string() things for such simple
> syntax but itself is acceptble enough for me. However it uses
> StringInfo to hold double-quoted names, which pallocs 1024 bytes
> of memory chunk for every double-quoted name. The chunks are
> finally stacked up left uncollected until the current
> memorycontext is deleted or reset (It is deleted just after
> finishing config file processing). Addition to that, setting
> s_s_names runs the parser twice. It seems to me too greedy and
> seems that static char [NAMEDATALEN] is enough using the v12 way
> without palloc/repalloc.

I thought that the length of a group name could exceed NAMEDATALEN, so
I used StringInfo.
Is it not necessary?

> I found that the name SyncGroupName.wait_num is not
> instinctive. How about sync_num, sync_member_num or
> sync_standby_num? If the last is preferable, .members also should
> be .standbys .

Thanks, sync_num is preferable to me.

===
> I am quite uncomfortable with the existence of
> WanSnd.sync_standby_priority. It represented the pirority in the
> old linear s_s_names format but nested groups or even
> single-level quarum list obviously doesn't fit it. Can we get rid
> of sync_standby_priority, even though we realize atmost
> n-priority for now?

We could get rid of sync_standby_priority.
But if we do, we will not be able to see the next sync standby in
the pg_stat_replication system view.
Regarding each node's priority, I was thinking that standbys in a quorum
list have the same priority, and that in a nested group each standby is
given a priority starting from 1.

===
> The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
> have specific code for every prioritizing method (which are
> priority, quorum, nested and so). Is there any reson to use it as
> a callback of SyncGroupNode?

The current code is written this way because it only supports the
priority method.
For the first version of this feature, I'd like to keep the implementation simple.

Aside from this, of course I'm planning to have specific code for the nested design.
- The group can have name nodes or group nodes as members.
- The group can use either of two methods: priority or quorum.
- The group has SyncRepGetSyncedLsnsFn() and SyncRepGetStandbysFn().
- SyncRepGetSyncedLsnsFn() recursively determines the synced LSN
  at that moment using the group's method.
- SyncRepGetStandbysFn() returns the standbys of its group
  that are considered sync according to the group's method.

For example, with s_s_names = '3(a, b, 2[c,d]::group1)', the SyncRepStandbys
memory structure will be,

"main(quorum)" --- "a"
               |
               -- "b"
               |
               -- "group1(priority)" --- "c"
                                     |
                                     -- "d"

When determining synced LSNs, we first need to determine group1's LSN
using the priority method, and then we can determine main's LSN using
the quorum method with the LSNs of "a", "b" and "group1".
So the SyncRepGetSyncedLsnsUsingPriority() function would be,

bool
SyncRepGetSyncedLsnsUsingPriority(*group, *write_lsn, *flush_lsn)
{
    sync_num = group->SyncRepGetSyncStandbysFn(group, sync_list);

    if (sync_num < group->sync_num)
        return false;

    for (each member of sync_list)
    {
        if (member->type == group node)
            call SyncRepGetSyncedLsnsFn(member, w, f) and store w and
            f into lsn_list.
        else
            store name node LSNs into lsn_list.
    }

    Determine synced LSNs of this group using lsn_list and priority method.
    Store synced LSNs into write_lsn and flush_lsn.
    return true;
}
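
To make the recursion concrete, here is a minimal, self-contained C sketch of the idea. It is not code from the patch: the SketchNode type, its fields, the fixed member limit and the function sketch_synced_lsn() are all hypothetical, and only a single flush LSN per standby is tracked. It also assumes that the priority method takes the oldest LSN among the first sync_num available members in list order, and that the quorum method takes the sync_num-th newest LSN among all available members.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_MEMBERS 8            /* arbitrary limit for this sketch */

typedef uint64_t SketchLsn;             /* hypothetical stand-in for XLogRecPtr */

typedef enum { SKETCH_NAME, SKETCH_GROUP } SketchNodeType;
typedef enum { SKETCH_PRIORITY, SKETCH_QUORUM } SketchMethod;

typedef struct SketchNode
{
    SketchNodeType type;
    /* name nodes only */
    bool        connected;
    SketchLsn   flush_lsn;
    /* group nodes only */
    SketchMethod method;
    int         sync_num;
    int         nmembers;
    const struct SketchNode *members[SKETCH_MAX_MEMBERS];
} SketchNode;

/* qsort comparator: newest (largest) LSN first */
static int
cmp_lsn_desc(const void *a, const void *b)
{
    SketchLsn   x = *(const SketchLsn *) a;
    SketchLsn   y = *(const SketchLsn *) b;

    return (x < y) - (x > y);
}

/*
 * Compute the synced LSN of a node. Returns false when the node cannot
 * currently supply enough synchronous members.
 */
static bool
sketch_synced_lsn(const SketchNode *node, SketchLsn *result)
{
    SketchLsn   lsns[SKETCH_MAX_MEMBERS];
    int         found = 0;

    if (node->type == SKETCH_NAME)
    {
        if (!node->connected)
            return false;
        *result = node->flush_lsn;
        return true;
    }

    assert(node->sync_num >= 1 && node->nmembers <= SKETCH_MAX_MEMBERS);

    for (int i = 0; i < node->nmembers; i++)
    {
        SketchLsn   lsn;

        /* Recurse: a member may itself be a group. */
        if (!sketch_synced_lsn(node->members[i], &lsn))
            continue;
        lsns[found++] = lsn;

        /* Priority method: only the first sync_num available members count. */
        if (node->method == SKETCH_PRIORITY && found == node->sync_num)
            break;
    }

    if (found < node->sync_num)
        return false;

    /*
     * After sorting newest-first, element sync_num - 1 is the oldest LSN of
     * the chosen members (priority) or the sync_num-th newest LSN (quorum).
     */
    qsort(lsns, (size_t) found, sizeof(SketchLsn), cmp_lsn_desc);
    *result = lsns[node->sync_num - 1];
    return true;
}
```

For the example setting above, with flush LSNs a=100, b=90, c=80 and d=70, group1 would report 70 and main would then also report 70, because all three of its members are required.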

> SyncRepClearStandbyGroupList is defined in syncrep.c but the
> other related functions are defined in syncgroup_gram.y. It would
> be better to place them together.

SyncRepClearStandbyGroupList() is used by
check_synchronous_standby_names(), so I put this function in syncrep.c.

> SyncRepStandbys are to be in multilevel and the struct is
> naturally allowed to be so but SyncRepClearStandbyGroupList
> assumes it in single level.

Because I think we don't need to fully support the
nested style in the first version.
We have to design this feature carefully with extensibility in mind,
but an overly ambitious implementation could cause crashes.
Considering the time remaining for 9.6, I feel we could implement the quorum
method at best.

> This is a comment from the standpoint of object abstraction.
> The callers of SyncRepGetSyncStandbysUsingPriority() need to care about
> the inside of SyncGroupNode, but what the function should
> return seems to be just the list of walsnds elements. The element number is
> useless when the SyncGroupNode nests.
> > int
> > SyncRepGetSyncStandbysUsingPriority(SyncGroupNode *group, volatile WalSnd **sync_list)
> This might need to expose 'volatile WalSnd*' (only the pointer type)
> outside of walsender.
> Or it should return the list of index numbers of
> *WalSndCtl->walsnds*.

SyncRepGetSyncStandbysUsingPriority() already returns the list of
index numbers of "WalSndCtl->walsnds" as sync_list, no?
As I mentioned above, SyncRepGetSyncStandbysFn() doesn't need to care about
the inside of SyncGroupNode in my design.
Selecting sync nodes from its group doesn't depend on the type of node.
What SyncRepGetSyncStandbysFn() should do is select the sync nodes from
*its* group.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v14.patch	binary/octet-stream	44.2 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-03 18:40:31
Message-ID:	CAD21AoAr=-ZECe-95tO+KrK8iNSdpn0r2qYKJtoYiZFXFembBA@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Mar 3, 2016 at 11:30 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Hi,
>
> Thank you so much for reviewing this patch!
>
> All review comments regarding document and comment are fixed.
> Attached latest v14 patch.
>
>> This accepts 'abc^Id' as a name, which is wrong behavior (but
>> such appliction names are not allowed anyway. If you assume so,
>> I'd like to see a comment for that.).
>
> 'abc^Id' is accepted as application_name, no?
> postgres(1)=# set application_name to 'abc^Id';
> SET
> postgres(1)=# show application_name ;
> application_name
> ------------------
> abc^Id
> (1 row)
>
>> addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
>> char ychar) requires differnt character types. Is there any reason
>> for that?
>
> Because addlit_xd_string() is for adding string(char *) to xd_string,
> OTOH addlit_xd_char() is for adding just one character to xd_string.
>
>> I personally don't like addlit*string() things for such simple
>> syntax but itself is acceptble enough for me. However it uses
>> StringInfo to hold double-quoted names, which pallocs 1024 bytes
>> of memory chunk for every double-quoted name. The chunks are
>> finally stacked up left uncollected until the current
>> memorycontext is deleted or reset (It is deleted just after
>> finishing config file processing). Addition to that, setting
>> s_s_names runs the parser twice. It seems to me too greedy and
>> seems that static char [NAMEDATALEN] is enough using the v12 way
>> without palloc/repalloc.
>
> I though that length of group name could be more than NAMEDATALEN, so
> I use StringInfo.
> Is it not necessary?
>
>> I found that the name SyncGroupName.wait_num is not
>> instinctive. How about sync_num, sync_member_num or
>> sync_standby_num? If the last is preferable, .members also should
>> be .standbys .
>
> Thanks, sync_num is preferable to me.
>
> ===
>> I am quite uncomfortable with the existence of
>> WanSnd.sync_standby_priority. It represented the pirority in the
>> old linear s_s_names format but nested groups or even
>> single-level quarum list obviously doesn't fit it. Can we get rid
>> of sync_standby_priority, even though we realize atmost
>> n-priority for now?
>
> We could get rid of sync_standby_priority.
> But if so, we will not be able to see the next sync standby in
> pg_stat_replication system view.
> Regarding each node priority, I was thinking that standbys in quorum
> list have same priority, and in nested group each standbys are given
> the priority starting from 1.
>
> ===
>> The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
>> have specific code for every prioritizing method (which are
>> priority, quorum, nested and so). Is there any reson to use it as
>> a callback of SyncGroupNode?
>
> The reason why the current code is so is that current code is for only
> priority method supporting.
> At first version of this feature, I'd like to implement it more simple.
>
> Aside from this, of course I'm planning to have specific code for nested design.
> - The group can have some name nodes or group nodes.
> - The group can use either 2 types of method: priority or quorum.
> - The group has SyncRepGetSyncedLsnFn() and SyncRepGetStandbysFn()
> - SyncRepGetSyncedLsnsFn() function recursively determine synced LSN
> at that moment using group's method.
> - SyncRepGetStandbysFn() function returns standbys of its group,
> which are considered as sync using group's method.
>
> For example, s_s_name = '3(a, b, 2[c,d]::group1)', SyncRepStandbys
> memory structure will be,
>
> "main(quorum)" --- "a"
> |
> -- "b"
> |
> -- "group1(priority)" --- "c"
> |
> -- "d"
>
> When determine synced LSNs, we need to consider group1's LSN using by
> priority method at first, and then we can determine main's LSN using
> by quorum method with "a" LSNs, "b" LSNs and "group1" LSNs.
> So SyncRepGetSyncedLsnsUsingPriority() function would be,
>
> bool
> SyncRepGetSyncedLsnsUsingPriority(*group, *write_lsn, *flush_lsn)
> {
> sync_num = group->SynRepGetSyncstandbysFn(group, sync_list);
>
> if (sync_num < group->sync_num)
> return false;
>
> for (each member of sync_list)
> {
> if (member->type == group node)
> call SyncRepGetSyncedLsnsFn(member, w, f) and store w and
> f into lsn_list.
> else
> Store name node LSNs into lsn_list.
> }
>
> Determine synced LSNs of this group using lsn_list and priority method.
> Store synced LSNs into write_lsn and flush_lsn.
> return true;
> }
>
>> SyncRepClearStandbyGroupList is defined in syncrep.c but the
>> other related functions are defined in syncgroup_gram.y. It would
>> be better to place them together.
>
> SyncRepClearStandbyGroupList() is used by
> check_synchronous_standby_names(), so I put this function syncrep.c.
>
>> SyncRepStandbys are to be in multilevel and the struct is
>> naturally allowed to be so but SyncRepClearStandbyGroupList
>> assumes it in single level.
>
> Because I think that we don't need to implement to fully support
> nested style at first version.
> We have to carefully design this feature while considering
> expandability, but overkill implementation could be cause of crash.
> Consider remaining time for 9.6, I feel we could implement quorum
> method at best.
>
>> This is a comment from the aspect of abstractness of objects.
>> The callers of SyncRepGetSyncStandbysUsingPriority() need to care
>> the inside of SyncGroupNode but what the function should just
>> return seems to be the list of wansnds element. Element number is
>> useless when the SyncGroupNode nests.
>> > int
>> > SyncRepGetSyncStandbysUsingPriority(SyncGroupNode *group, volatile WalSnd **sync_list)
>> This might need to expose 'volatile WalSnd*' (only pointer type)
>> outside of walsender.
>> Or it should return the list of index number of
>> *WalSndCtl->walsnds*.
>
> SyncRepGetSyncStandbysUsingPriority() already returns the list of
> index number of "WalSndCtl->walsnd" as sync_list, no?
> As I mentioned above, SyncRepGetSyncStandbysFn() doesn't need care the
> inside of SyncGroupNode in my design.
> Selecting sync nodes from its group doesn't depend on the type of node.
> What SyncRepGetSyncStandbyFn() should do is to select sync node from
> *its* group.
>

The previous patch had a bug in GUC parameter handling.
Attached is an updated version.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
000_multi_sync_replication_v15.patch	application/octet-stream	44.1 KB

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-03 19:37:24
Message-ID:	CAEepm=2ubiu2tWBXgfeiDJuiBnzUtDbdmkE+0PSbn4MzopYfJQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Mar 4, 2016 at 7:40 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Previous patch has bug around GUC parameter handling.
> Attached updated version.

I spotted a couple of typos:

+ used. Priority is given to servers in the order that the appear
in the list.

s/the appear/they appear/

- The minimum wait time is the roundtrip time between primary to standby.
+ The minimum wait time is the roundtrip time between the primary and the
+ almost synchronous standby.

s/almost/slowest/

--
Thomas Munro
http://www.enterprisedb.com

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	thomas(dot)munro(at)enterprisedb(dot)com, masao(dot)fujii(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-04 08:22:28
Message-ID:	20160304.172228.29892605.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

Sorry in advance for the long, hard-to-read message.

At Thu, 3 Mar 2016 23:30:49 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD3XGZtuvgc5uKJdvcoJP5S0rvGQQCJLRL4rLsruRch5Q(at)mail(dot)gmail(dot)com>
> Hi,
>
> Thank you so much for reviewing this patch!
>
> All review comments regarding document and comment are fixed.
> Attached latest v14 patch.
>
> > This accepts 'abc^Id' as a name, which is wrong behavior (but
> > such appliction names are not allowed anyway. If you assume so,
> > I'd like to see a comment for that.).
>
> 'abc^Id' is accepted as application_name, no?
> postgres(1)=# set application_name to 'abc^Id';
> SET
> postgres(1)=# show application_name ;
> application_name
> ------------------
> abc^Id
> (1 row)

Sorry, I implicitly used "^" to mean the Ctrl key. So
"^I" is the so-called Ctrl-I, that is, horizontal tab or 0x09. The
following in psql shows the behavior.

=# set application_name to E'abc\td';
=# show application_name ;
application_name
------------------
abc?d
(1 row)

The <tab> is replaced with '?' (literally) at the time of
GUC assignment.
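
The kind of replacement described here can be sketched in a few lines of C. This is only an illustration of the behavior, not PostgreSQL's actual code; the function name clean_ascii_sketch() is hypothetical, and the printable range 0x20-0x7e is taken from the discussion in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: replace every byte outside printable ASCII
 * (0x20-0x7e) with a literal '?', modifying the string in place.
 */
static void
clean_ascii_sketch(char *str)
{
    assert(str != NULL);

    for (char *p = str; *p != '\0'; p++)
    {
        /* Compare as unsigned so bytes >= 0x80 are handled consistently. */
        unsigned char c = (unsigned char) *p;

        if (c < 0x20 || c > 0x7e)
            *p = '?';
    }
}
```

With this sketch, a name containing a tab character keeps its length, and only the tab byte is turned into '?'.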

> > addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
> > char ychar) requires differnt character types. Is there any reason
> > for that?
>
> Because addlit_xd_string() is for adding string(char *) to xd_string,
> OTOH addlit_xd_char() is for adding just one character to xd_string.

Umm. My question might have been a bit off the point.

addlitchar_xd_string(str, unsigned char c) calls
appendStringInfoChar(, c). On the other hand, the signature of
that StringInfo function is the following.

appendStringInfoChar(StringInfo str, char ch);

Plain "char" behaves like "signed char" by default on most platforms
(its signedness is implementation-defined). addlitchar_xd_string passes
the given character, held in an "unsigned char", to the "char" parameter
of appendStringInfoChar.

These two types differ in signedness. Consider the
following snippet:

#include <stdio.h>

void hoge(signed char c){
    int ch = c;
    fprintf(stderr, "char = %d\n", ch);
}

int main(void)
{
    unsigned char u;

    u = 200;
    hoge(u);
    return 0;
}

The result is -56 on typical platforms. So in general we should avoid
mixing signedness like this for no particular reason.

In this case, the domain of the variable is 0x20-0x7e, so no
problem will actually occur, but there's also no reason for the
signedness mixture.

> > I personally don't like addlit*string() things for such simple
> > syntax but itself is acceptble enough for me. However it uses
> > StringInfo to hold double-quoted names, which pallocs 1024 bytes
> > of memory chunk for every double-quoted name. The chunks are
> > finally stacked up left uncollected until the current
> > memorycontext is deleted or reset (It is deleted just after
> > finishing config file processing). Addition to that, setting
> > s_s_names runs the parser twice. It seems to me too greedy and
> > seems that static char [NAMEDATALEN] is enough using the v12 way
> > without palloc/repalloc.
>
> I though that length of group name could be more than NAMEDATALEN, so
> I use StringInfo.
> Is it not necessary?

Such long names don't seem to be necessary. Overly long identifiers
no longer work as identifiers for human eyes. We limit
the length of identifiers throughout the database system to
NAMEDATALEN-1, which seems to have been enough, so I don't see any
reason to have a group name longer than that.
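
As a rough illustration of the fixed-size alternative, the following C sketch accumulates a name into a NAMEDATALEN-sized buffer without any allocation. The NameBufSketch type and both functions are hypothetical and not part of the patch. NAMEDATALEN is redefined here (64 is PostgreSQL's default) only so the sketch stands alone, and silently dropping excess characters is an assumption; a real scanner might report an error instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NAMEDATALEN 64      /* PostgreSQL's default; repeated for the sketch */

/* Hypothetical fixed-size accumulator for one double-quoted name. */
typedef struct NameBufSketch
{
    char        data[NAMEDATALEN];
    size_t      len;
} NameBufSketch;

static void
namebuf_reset(NameBufSketch *buf)
{
    assert(buf != NULL);
    buf->len = 0;
    buf->data[0] = '\0';
}

/*
 * Append one character. Takes plain "char", so no signedness conversion
 * happens at the call boundary. Returns false (dropping the character)
 * once the name has reached NAMEDATALEN - 1 bytes.
 */
static bool
namebuf_addchar(NameBufSketch *buf, char ch)
{
    assert(buf != NULL);

    if (buf->len >= NAMEDATALEN - 1)
        return false;

    buf->data[buf->len++] = ch;
    buf->data[buf->len] = '\0';
    return true;
}
```

Because the buffer lives inside the struct, nothing needs to be freed or collected by a memory context after parsing.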

> > I found that the name SyncGroupName.wait_num is not
> > instinctive. How about sync_num, sync_member_num or
> > sync_standby_num? If the last is preferable, .members also should
> > be .standbys .
>
> Thanks, sync_num is preferable to me.
>
> ===
> > I am quite uncomfortable with the existence of
> > WanSnd.sync_standby_priority. It represented the pirority in the
> > old linear s_s_names format but nested groups or even
> > single-level quarum list obviously doesn't fit it. Can we get rid
> > of sync_standby_priority, even though we realize atmost
> > n-priority for now?
>
> We could get rid of sync_standby_priority.
> But if so, we will not be able to see the next sync standby in
> pg_stat_replication system view.
> Regarding each node priority, I was thinking that standbys in quorum
> list have same priority, and in nested group each standbys are given
> the priority starting from 1.

As far as I can see, the variable is used only as a boolean to
indicate whether a walsender is connected to a candidate
synchronous standby. So the value itself is totally useless, at least
for now. However, SyncRepReleaseWaiters uses the value to check whether
the synced LSNs can be advanced by a walsender, so the variable is
useful as a boolean.

In the previous versions, the reason WalSnd had the priority
value is that a pair of synchronized LSNs was determined by only
one walsender, the one with the highest priority among active
walsenders. So even if a walsender receives a response from
a walreceiver, it doesn't need to do anything if it is not at the
highest priority. It's a simple world.

In the quorum commit world, in contrast, what
SyncRepGetSyncStandbysFn should do is return certain private
information to be used to calculate a pair of safe/synced LSNs
in SyncRepGetSyncedLsnsFn, looking into the WalSndCtl->walsnds
list. The latter passes a pair of safe/synced LSNs to the upper
level list, or to SyncRepSyncedLsnAdvancedTo as the topmost
caller. There's no room for sync_standby_priority to serve its
original purpose.

Even if we assign the value in the way you explained, the values are
always 1 for the quorum method and duplicated for the multiple
priority method. What do you want to show to users with the value?

> ===
> > The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
> > have specific code for every prioritizing method (which are
> > priority, quorum, nested and so). Is there any reson to use it as
> > a callback of SyncGroupNode?
>
> The reason why the current code is so is that current code is for only
> priority method supporting.
> At first version of this feature, I'd like to implement it more simple.
>
> Aside from this, of course I'm planning to have specific code for nested design.
> - The group can have some name nodes or group nodes.
> - The group can use either 2 types of method: priority or quorum.
> - The group has SyncRepGetSyncedLsnFn() and SyncRepGetStandbysFn()
> - SyncRepGetSyncedLsnsFn() function recursively determine synced LSN
> at that moment using group's method.
> - SyncRepGetStandbysFn() function returns standbys of its group,
> which are considered as sync using group's method.
>
> For example, s_s_name = '3(a, b, 2[c,d]::group1)', SyncRepStandbys
> memory structure will be,
>
> "main(quorum)" --- "a"
> |
> -- "b"
> |
> -- "group1(priority)" --- "c"
> |
> -- "d"
>
> When determine synced LSNs, we need to consider group1's LSN using by
> priority method at first, and then we can determine main's LSN using
> by quorum method with "a" LSNs, "b" LSNs and "group1" LSNs.
> So SyncRepGetSyncedLsnsUsingPriority() function would be,

Thank you for the explanation. I *recalled* that.

> > SyncRepClearStandbyGroupList is defined in syncrep.c but the
> > other related functions are defined in syncgroup_gram.y. It would
> > be better to place them together.
>
> SyncRepClearStandbyGroupList() is used by
> check_synchronous_standby_names(), so I put this function syncrep.c.

Thanks.

> > SyncRepStandbys are to be in multilevel and the struct is
> > naturally allowed to be so but SyncRepClearStandbyGroupList
> > assumes it in single level.
>
> Because I think that we don't need to implement to fully support
> nested style at first version.
> We have to carefully design this feature while considering
> expandability, but overkill implementation could be cause of crash.
> Consider remaining time for 9.6, I feel we could implement quorum
> method at best.

Yes, so I proposed adding an Assert() to the function.
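
A minimal sketch of what such an assertion could look like, using the standard C assert() and hypothetical stand-in types (SketchMember, SketchGroup) rather than the patch's SyncGroupNode. It assumes members form a singly linked list and that "single level" means no member is itself a group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_NAME_NODE, SKETCH_GROUP_NODE } SketchKind;

/* Hypothetical stand-ins for the patch's node structures. */
typedef struct SketchMember
{
    SketchKind  kind;
    struct SketchMember *next;
} SketchMember;

typedef struct SketchGroup
{
    SketchMember *members;
} SketchGroup;

/* True when no member of the group is itself a group. */
static bool
sketch_is_single_level(const SketchGroup *group)
{
    for (const SketchMember *m = group->members; m != NULL; m = m->next)
    {
        if (m->kind == SKETCH_GROUP_NODE)
            return false;
    }
    return true;
}

/*
 * Unlink all members and return how many were removed. The assertion
 * documents (and checks in debug builds) the single-level assumption.
 */
static int
sketch_clear_group(SketchGroup *group)
{
    int         removed = 0;

    assert(sketch_is_single_level(group));

    while (group->members != NULL)
    {
        SketchMember *m = group->members;

        group->members = m->next;
        m->next = NULL;
        removed++;
    }
    return removed;
}
```

The point of the check is that a later change introducing nested groups would trip the assertion in debug builds instead of silently leaving child groups uncleared.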

> > This is a comment from the aspect of abstractness of objects.
> > The callers of SyncRepGetSyncStandbysUsingPriority() need to care
> > the inside of SyncGroupNode but what the function should just
> > return seems to be the list of wansnds element. Element number is
> > useless when the SyncGroupNode nests.
> > > int
> > > SyncRepGetSyncStandbysUsingPriority(SyncGroupNode *group, volatile WalSnd **sync_list)
> > This might need to expose 'volatile WalSnd*' (only pointer type)
> > outside of walsender.
> > Or it should return the list of index number of
> > *WalSndCtl->walsnds*.
>
> SyncRepGetSyncStandbysUsingPriority() already returns the list of
> index number of "WalSndCtl->walsnd" as sync_list, no?

Yes, I myself don't understand what I was trying to say here :( Maybe
I mistook what sync_list returns for an index list of
SyncGroupNode. Anyway, sorry for the noise.

> As I mentioned above, SyncRepGetSyncStandbysFn() doesn't need care the
> inside of SyncGroupNode in my design.
> Selecting sync nodes from its group doesn't depend on the type of node.
> What SyncRepGetSyncStandbyFn() should do is to select sync node from
> *its* group.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-07 07:55:30
Message-ID:	CAD21AoCfSX_MbOze8C--zoDJatUfaJcF5PzZ-ELqpigNBQ_ERw@mail.gmail.com
Lists:	pgsql-hackers

Reply to multiple hackers.
Thank you for reviewing this patch.

> + used. Priority is given to servers in the order that the appear
> in the list.
>
> s/the appear/they appear/
>
> - The minimum wait time is the roundtrip time between primary to standby.
> + The minimum wait time is the roundtrip time between the primary and the
> + almost synchronous standby.
>
> s/almost/slowest/

Will fix this typo. Thanks!

On Fri, Mar 4, 2016 at 5:22 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> Sorry for long, hard-to-read writings in advance..
>
> At Thu, 3 Mar 2016 23:30:49 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD3XGZtuvgc5uKJdvcoJP5S0rvGQQCJLRL4rLsruRch5Q(at)mail(dot)gmail(dot)com>
>> Hi,
>>
>> Thank you so much for reviewing this patch!
>>
>> All review comments regarding document and comment are fixed.
>> Attached latest v14 patch.
>>
>> > This accepts 'abc^Id' as a name, which is wrong behavior (but
>> > such appliction names are not allowed anyway. If you assume so,
>> > I'd like to see a comment for that.).
>>
>> 'abc^Id' is accepted as application_name, no?
>> postgres(1)=# set application_name to 'abc^Id';
>> SET
>> postgres(1)=# show application_name ;
>> application_name
>> ------------------
>> abc^Id
>> (1 row)
>
> Sorry, I implicitly used "^" in the meaning of "ctrl key". So
> "^I" is so-called Ctrl-I, that is horizontal tab or 0x09. So the
> following in psql shows that.
>
> =# set application_name to E'abc\td';
> =# show application_name ;
> application_name
> ------------------
> ab?d
> (1 row)
>
> The <tab> is replaced with '?' (literally) at the time of
> guc assinment.

Oh, I see.
I will add a comment about that.

>> > addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
>> > char ychar) requires differnt character types. Is there any reason
>> > for that?
>>
>> Because addlit_xd_string() is for adding string(char *) to xd_string,
>> OTOH addlit_xd_char() is for adding just one character to xd_string.
>
> Umm. My qustion might have been a bit out of the point.
>
> The addlitchar_xd_string(str,unsigned char c) does
> appendStringInfoChar(, c). On the other hand, the signature of
> the function of stringinfo is the following.
>
> AppendStringInfoChar(StringInfo str, char ch);
>
> Of course "char" is equivalent of "signed char" as
> default. addlitchar_xd_string assigns the given character in
> "unsigned char" to the parameter of AppendStringInfoChar of
> "signed char".
>
> These two are incompatible types. Imagine the
> following codelet,
>
> #include <stdio.h>
>
> void hoge(signed char c){
> int ch = c;
> fprintf(stderr, "char = %d\n", ch);
> }
>
> int main(void)
> {
> unsigned char u;
>
> u = 200;
> hoge(u);
> return 0;
> }
>
> The result is -56. So we generally should get rid of such type of
> mixture of signedness for no particular reason.
>
> In this case, the domain of the variable is 0x20-0x7e so no
> problem won't be actualized but also there's no reason for the
> signedness mixture.

Thank you for the explanation.
I will fix this.

>> > I personally don't like addlit*string() things for such simple
>> > syntax but itself is acceptble enough for me. However it uses
>> > StringInfo to hold double-quoted names, which pallocs 1024 bytes
>> > of memory chunk for every double-quoted name. The chunks are
>> > finally stacked up left uncollected until the current
>> > memorycontext is deleted or reset (It is deleted just after
>> > finishing config file processing). Addition to that, setting
>> > s_s_names runs the parser twice. It seems to me too greedy and
>> > seems that static char [NAMEDATALEN] is enough using the v12 way
>> > without palloc/repalloc.
>>
>> I though that length of group name could be more than NAMEDATALEN, so
>> I use StringInfo.
>> Is it not necessary?
>
> Such long names doesn't seem to necessary. Too long identifiers
> no longer act as identifier for human eyeballs. We are limiting
> the length of identifiers of the whole database system to
> NAMEDATALEN-1, which seems to have been enough so I don't see any
> reason to have a group name longer than that.
>

I see. I will fix this.

>> > I found that the name SyncGroupName.wait_num is not
>> > instinctive. How about sync_num, sync_member_num or
>> > sync_standby_num? If the last is preferable, .members also should
>> > be .standbys .
>>
>> Thanks, sync_num is preferable to me.
>>
>> ===
>> > I am quite uncomfortable with the existence of
>> > WanSnd.sync_standby_priority. It represented the pirority in the
>> > old linear s_s_names format but nested groups or even
>> > single-level quarum list obviously doesn't fit it. Can we get rid
>> > of sync_standby_priority, even though we realize atmost
>> > n-priority for now?
>>
>> We could get rid of sync_standby_priority.
>> But if so, we will not be able to see the next sync standby in
>> pg_stat_replication system view.
>> Regarding each node priority, I was thinking that standbys in quorum
>> list have same priority, and in nested group each standbys are given
>> the priority starting from 1.
>
> As far as I can see the varialbe is referred to as a boolean to
> indicate whether a walsernder is connected to a candidate
> synchronous standby. So the value is totally useless, at least
> for now. However, SyncRepRelaseWaiters uses the value to check if
> the synced LSNs can be advaned by a walsender so the variable is
> useful as a boolean.
>
> In the previous versions, the reason why WanSnd had the priority
> value is that a pair of synchronized LSNs is determined only by
> one wansender, which has the highest priority among active
> wansenders. So even if a walsender receives a response from
> walreceiver, it doesn't need to do nothing if it is not at the
> highest priority. It's a simple world.
>
> In the quorum commit word, in contrast, what
> SyncRepGetSyncStandbysFn shoud do is returning certain private
> information to be used to calculate a pair of safe/synched LSNs
> in SyncRepGetSYncedLsnsFn looking into WalSndCtl->wansnds
> list. The latter passes a pair of safe/synced LSNs to the upper
> level list or SyncRepSyncedLsnAdvancedTo as the topmost
> caller. There's no room for sync_standby_priority to work as the
> original objective.
>
> Even if we assign the value in the explained way, the values are
> always 1 for quorum method and duplicate values for multiple
> priority method. What do you want to show by the value to users?

I agree with you.
When we implement the nested style of multiple sync replication, it would be
tough to present it to users with sync_standby_priority.
But for our current first goal (implementing the 1-nest style), it doesn't
seem necessary to remove sync_standby_priority from WalSnd so far,
no?
For the multiple nested style, I'm roughly planning to define a new system
view as follows.

- The new system view shows information on all groups and nodes.
- Move sync_state from pg_stat_replication to the new system view.
- Get rid of sync_priority from pg_stat_replication.
- Add a new sync_state 'quorum' that indicates candidate sync standbys
of a group using the quorum method.
- If the parent group's state is potential, a 'potential:' prefix is added to
the child standby's sync_state.

* s_s_names = '2[a, 1(b,c):group1, 1[d,e]:group2]'
  name  | sync_method |      member       | sync_num |     sync_state      | parent_group
--------+-------------+-------------------+----------+---------------------+--------------
 main   | priority    | {a,group1,group2} |        2 |                     |
 a      |             |                   |          | sync                | main
 group1 | quorum      | {b,c}             |        1 | sync                | main
 b      |             |                   |          | sync                | group1
 c      |             |                   |          | potential           | group1
 group2 | priority    | {d,e}             |        1 | potential           | main
 d      |             |                   |          | potential:sync      | group2
 e      |             |                   |          | potential:potential | group2
(8 rows)

* s_s_names = '2(a, 1[b,c]:group1, 1(d,e):group2)'
  name  | sync_method |      member       | sync_num | sync_state | parent_group
--------+-------------+-------------------+----------+------------+--------------
 main   | quorum      | {a,group1,group2} |        2 |            |
 a      |             |                   |          | quorum     | main
 group1 | priority    | {b,c}             |        1 | quorum     | main
 b      |             |                   |          | sync       | group1
 c      |             |                   |          | potential  | group1
 group2 | quorum      | {d,e}             |        1 | quorum     | main
 d      |             |                   |          | quorum     | group2
 e      |             |                   |          | quorum     | group2
(8 rows)

>> > SyncRepStandbys are to be in multilevel and the struct is
>> > naturally allowed to be so but SyncRepClearStandbyGroupList
>> > assumes it in single level.
>>
>> Because I think that we don't need to implement to fully support
>> nested style at first version.
>> We have to carefully design this feature while considering
>> expandability, but overkill implementation could be cause of crash.
>> Consider remaining time for 9.6, I feel we could implement quorum
>> method at best.
>
> Yes, so I proposed to ass Aseert() in the function.

Will add it.

Regards,

--
Masahiko Sawada

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-10 10:21:01
Message-ID:	CAHGQGwEu=KWZ7YxRtGM6=Q7ZPT3sf7H6XEQtiVxxRToqq33nnQ@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Mar 4, 2016 at 3:40 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Thu, Mar 3, 2016 at 11:30 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Hi,
>>
>> Thank you so much for reviewing this patch!
>>
>> All review comments regarding document and comment are fixed.
>> Attached latest v14 patch.
>>
>>> This accepts 'abc^Id' as a name, which is wrong behavior (but
>>> such appliction names are not allowed anyway. If you assume so,
>>> I'd like to see a comment for that.).
>>
>> 'abc^Id' is accepted as application_name, no?
>> postgres(1)=# set application_name to 'abc^Id';
>> SET
>> postgres(1)=# show application_name ;
>> application_name
>> ------------------
>> abc^Id
>> (1 row)
>>
>>> addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
>>> char ychar) require different character types. Is there any reason
>>> for that?
>>
>> Because addlit_xd_string() is for adding string(char *) to xd_string,
>> OTOH addlit_xd_char() is for adding just one character to xd_string.
>>
>>> I personally don't like the addlit*string() functions for such a simple
>>> syntax, but that in itself is acceptable to me. However, it uses
>>> StringInfo to hold double-quoted names, which pallocs a 1024-byte
>>> memory chunk for every double-quoted name. The chunks
>>> pile up uncollected until the current
>>> memory context is deleted or reset (it is deleted just after
>>> config file processing finishes). In addition to that, setting
>>> s_s_names runs the parser twice. This seems too greedy to me, and
>>> a static char [NAMEDATALEN] seems to be enough, using the v12 way
>>> without palloc/repalloc.
>>
>> I thought that the length of a group name could be more than NAMEDATALEN, so
>> I use StringInfo.
>> Is it not necessary?
>>
>>> I found that the name SyncGroupName.wait_num is not
>>> intuitive. How about sync_num, sync_member_num or
>>> sync_standby_num? If the last is preferable, .members should also
>>> be .standbys .
>>
>> Thanks, sync_num is preferable to me.
>>
>> ===
>>> I am quite uncomfortable with the existence of
>>> WalSnd.sync_standby_priority. It represented the priority in the
>>> old linear s_s_names format, but nested groups or even a
>>> single-level quorum list obviously don't fit it. Can we get rid
>>> of sync_standby_priority, even though we implement at most
>>> n-priority for now?
>>
>> We could get rid of sync_standby_priority.
>> But if so, we will not be able to see the next sync standby in
>> pg_stat_replication system view.
>> Regarding each node's priority, I was thinking that standbys in a quorum
>> list have the same priority, and in a nested group each standby is given
>> a priority starting from 1.
>>
>> ===
>>> The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
>>> have specific code for every prioritizing method (which are
>>> priority, quorum, nested and so on). Is there any reason to use it as
>>> a callback of SyncGroupNode?
>>
>> The current code is that way because it supports only the
>> priority method.
>> For the first version of this feature, I'd like to keep the implementation simple.
>>
>> Aside from this, of course I'm planning to have specific code for the nested design.
>> - The group can have some name nodes or group nodes.
>> - The group can use either of 2 methods: priority or quorum.
>> - The group has SyncRepGetSyncedLsnFn() and SyncRepGetStandbysFn()
>> - SyncRepGetSyncedLsnsFn() function recursively determines the synced LSN
>> at that moment using the group's method.
>> - SyncRepGetStandbysFn() function returns the standbys of its group
>> that are considered sync under the group's method.
>>
>> For example, with s_s_names = '3(a, b, 2[c,d]::group1)', the SyncRepStandbys
>> memory structure will be,
>>
>> "main(quorum)" --- "a"
>>                |
>>                 -- "b"
>>                |
>>                 -- "group1(priority)" --- "c"
>>                                       |
>>                                        -- "d"
>>
>> When determining synced LSNs, we need to first work out group1's LSNs using
>> the priority method, and then we can determine main's LSNs using
>> the quorum method with "a" LSNs, "b" LSNs and "group1" LSNs.
>> So the SyncRepGetSyncedLsnsUsingPriority() function would be,
>>
>> bool
>> SyncRepGetSyncedLsnsUsingPriority(*group, *write_lsn, *flush_lsn)
>> {
>>     sync_num = group->SyncRepGetSyncStandbysFn(group, sync_list);
>>
>>     if (sync_num < group->sync_num)
>>         return false;
>>
>>     for (each member of sync_list)
>>     {
>>         if (member->type == group node)
>>             call SyncRepGetSyncedLsnsFn(member, w, f) and store w and
>>             f into lsn_list.
>>         else
>>             Store name node LSNs into lsn_list.
>>     }
>>
>>     Determine synced LSNs of this group using lsn_list and priority method.
>>     Store synced LSNs into write_lsn and flush_lsn.
>>     return true;
>> }
>>
>>> SyncRepClearStandbyGroupList is defined in syncrep.c but the
>>> other related functions are defined in syncgroup_gram.y. It would
>>> be better to place them together.
>>
>> SyncRepClearStandbyGroupList() is used by
>> check_synchronous_standby_names(), so I put this function in syncrep.c.
>>
>>> SyncRepStandbys is meant to be multilevel, and the struct
>>> naturally allows that, but SyncRepClearStandbyGroupList
>>> assumes a single level.
>>
>> Because I think that we don't need to fully support
>> the nested style in the first version.
>> We have to design this feature carefully with extensibility in mind,
>> but an overkill implementation could cause crashes.
>> Considering the remaining time for 9.6, I feel we could implement the
>> quorum method at best.
>>
>>> This is a comment from the standpoint of object abstraction.
>>> The callers of SyncRepGetSyncStandbysUsingPriority() need to care about
>>> the inside of SyncGroupNode, but what the function should
>>> return seems to be just the list of walsnds elements. The element number is
>>> useless when the SyncGroupNode nests.
>>> > int
>>> > SyncRepGetSyncStandbysUsingPriority(SyncGroupNode *group, volatile WalSnd **sync_list)
>>> This might need to expose 'volatile WalSnd*' (only pointer type)
>>> outside of walsender.
>>> Or it should return the list of index number of
>>> *WalSndCtl->walsnds*.
>>
>> SyncRepGetSyncStandbysUsingPriority() already returns the list of
>> index numbers of "WalSndCtl->walsnd" as sync_list, no?
>> As I mentioned above, SyncRepGetSyncStandbysFn() doesn't need to care about the
>> inside of SyncGroupNode in my design.
>> Selecting sync nodes from its group doesn't depend on the type of node.
>> What SyncRepGetSyncStandbyFn() should do is select sync nodes from
>> *its* group.
>>
>
> The previous patch has a bug around GUC parameter handling.
> Attached is an updated version.
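
To make the nested evaluation quoted above concrete, here is a minimal C sketch. It is only an illustration under stated assumptions, not code from the patch: the names (SyncNode, synced_lsn, MAX_MEMBERS) are hypothetical, it tracks a single flush LSN per standby rather than separate write and flush positions, and it caps a group at 16 members to avoid allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MEMBERS 16          /* assumption: small fixed cap per group */

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr */

typedef enum { NODE_NAME, NODE_GROUP } NodeType;
typedef enum { METHOD_PRIORITY, METHOD_QUORUM } Method;

typedef struct SyncNode
{
    NodeType    type;
    /* used when type == NODE_NAME */
    bool        connected;
    Lsn         flush;
    /* used when type == NODE_GROUP */
    Method      method;
    size_t      sync_num;       /* how many members must confirm */
    const struct SyncNode **members;
    size_t      nmembers;
} SyncNode;

/*
 * Recursively compute the LSN a node can vouch for.
 * Returns false when the node (or group) cannot currently satisfy sync_num.
 */
bool
synced_lsn(const SyncNode *node, Lsn *out)
{
    Lsn     acked[MAX_MEMBERS]; /* kept sorted in descending order */
    size_t  nacked = 0;

    if (node->type == NODE_NAME)
    {
        if (!node->connected)
            return false;
        *out = node->flush;
        return true;
    }

    assert(node->sync_num > 0 && node->nmembers <= MAX_MEMBERS);

    for (size_t i = 0; i < node->nmembers; i++)
    {
        Lsn     lsn;
        size_t  pos;

        /* priority: only the first sync_num usable members count */
        if (node->method == METHOD_PRIORITY && nacked == node->sync_num)
            break;
        /* skip members that are down or whose own group is short */
        if (!synced_lsn(node->members[i], &lsn))
            continue;

        /* insertion sort step, largest LSN first */
        pos = nacked++;
        while (pos > 0 && acked[pos - 1] < lsn)
        {
            acked[pos] = acked[pos - 1];
            pos--;
        }
        acked[pos] = lsn;
    }

    if (nacked < node->sync_num)
        return false;

    /*
     * priority: the lowest LSN among the chosen sync_num members;
     * quorum: the sync_num-th highest LSN among all usable members.
     */
    *out = acked[node->sync_num - 1];
    return true;
}
```

For s_s_names = '3(a, b, 2[c,d]::group1)', group1 is resolved first with the priority method and its result then takes part in main's quorum alongside a and b.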

Thanks for updating the patch!

Now I'm fixing some problems (e.g., the current patch doesn't work
in an EXEC_BACKEND environment) and revising the patch.
I will post the revised version this weekend or the first half
of next week.

Regards,

--
Fujii Masao

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-16 01:13:48
Message-ID:	CAEepm=3Ye+Ax_5=MZeHMkx9DFn25QoRzs362sQGNvGcVWx+18w@mail.gmail.com
Lists:	pgsql-hackers

<para>
Synchronous replication offers the ability to confirm that all changes
- made by a transaction have been transferred to one synchronous standby
- server. This extends the standard level of durability
+ made by a transaction have been transferred to one or more synchronous standby
+ server. This extends that standard level of durability
offered by a transaction commit. This level of protection is referred
- to as 2-safe replication in computer science theory.
+ to as group-safe replication in computer science theory.
</para>

A message on the -general list today pointed me to some earlier
discussion[1] which quoted and referenced definitions of these
academic terms[2]. I think the above documentation should say:

"This level of protection is referred to as 2-safe replication in
computer science literature when <variable>synchronous_commit</> is
set to <literal>on</>, and group-1-safe (group-safe and 1-safe) when
<variable>synchronous_commit</> is set to <literal>remote_write</>."

By my reading, the situation doesn't actually change with this patch.
It doesn't matter whether you need 1 or 42 synchronous standbys to
make a quorum: 2-safe means durable (fsync) on all of them,
group-1-safe means durable on one server and received (implied by
remote_write) by all of them.
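
As a small illustration of that proposed wording, the mapping could be written down as the C sketch below. The enum and function names are hypothetical, it assumes at least one synchronous standby is configured, and settings the proposed sentence does not mention are deliberately left unclassified.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical labels for the terms used in the proposed doc sentence. */
typedef enum
{
    SAFETY_UNCLASSIFIED,    /* setting not covered by the proposed wording */
    SAFETY_GROUP_1_SAFE,    /* durable on one server, received by the standbys */
    SAFETY_2_SAFE           /* durable (fsync) on the standbys as well */
} SafetyLevel;

/*
 * Map a synchronous_commit value to the label suggested above.
 * Assumes synchronous_standby_names names at least one standby.
 */
SafetyLevel
safety_level_for(const char *synchronous_commit)
{
    assert(synchronous_commit != NULL);

    if (strcmp(synchronous_commit, "on") == 0)
        return SAFETY_2_SAFE;
    if (strcmp(synchronous_commit, "remote_write") == 0)
        return SAFETY_GROUP_1_SAFE;

    return SAFETY_UNCLASSIFIED;
}
```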

I think we should be using those definitions because Gray's earlier
definition of 2-safe from Transaction Processing 12.6.3 doesn't really
fit: It can optionally mean remote receipt or remote durable storage,
but it doesn't wait if the 'backup' is down, so it's not the same type
of guarantee. (He also has 'very safe' which might describe our
syncrep, I'm not sure.)

[1] http://www.postgresql.org/message-id/603c8f070812132142n5408e7ddk899e83cddd4cb0b2@mail.gmail.com
[2] http://infoscience.epfl.ch/record/33053/files/EPFL_TH2577.pdf page 76

--
Thomas Munro
http://www.enterprisedb.com

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	thomas(dot)munro(at)enterprisedb(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-16 07:48:33
Message-ID:	20160316.164833.188624159.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

It seems to me a matter of definition of "available replicas".

At Wed, 16 Mar 2016 14:13:48 +1300, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote in <CAEepm=3Ye+Ax_5=MZeHMkx9DFn25QoRzs362sQGNvGcVWx+18w(at)mail(dot)gmail(dot)com>
> <para>
> Synchronous replication offers the ability to confirm that all changes
> - made by a transaction have been transferred to one synchronous standby
> - server. This extends the standard level of durability
> + made by a transaction have been transferred to one or more synchronous standby
> + server. This extends that standard level of durability
> offered by a transaction commit. This level of protection is referred
> - to as 2-safe replication in computer science theory.
> + to as group-safe replication in computer science theory.
> </para>
>
> A message on the -general list today pointed me to some earlier
> discussion[1] which quoted and referenced definitions of these
> academic terms[2]. I think the above documentation should say:
>
> "This level of protection is referred to as 2-safe replication in
> computer science literature when <variable>synchronous_commit</> is
> set to <literal>on</>, and group-1-safe (group-safe and 1-safe) when
> <variable>synchronous_commit</> is set to <literal>remote_write</>."

I suppose that the "available replica" in the paper is equivalent
to the "one chosen synchronous server" at the top of the queue of
living standbys specified by s_s_names. The original description
is true under this interpretation.

> By my reading, the situation doesn't actually change with this patch.
> It doesn't matter whether you need 1 or 42 synchronous standbys to
> make a quorum: 2-safe means durable (fsync) on all of them,
> group-1-safe means durable on one server and received (implied by
> remote_write) by all of them.

Likewise, "the first two of the living standbys" (2[r01, ..r42])
plus the master translate to "three replicas". So 2-safe still
holds in that case.
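
A minimal C sketch of that priority-style choice, assuming standbys are listed in priority order and each reports a flush LSN. Standby, Lsn and priority_synced_lsn are hypothetical names for illustration, not PostgreSQL's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;       /* stand-in for XLogRecPtr */

typedef struct
{
    bool    alive;          /* currently connected and streaming */
    Lsn     flush;          /* last flushed position reported */
} Standby;

/*
 * Pick the first n living standbys in list (priority) order and report
 * the LSN that all of them have flushed, i.e. the lowest of their LSNs.
 * Returns false if fewer than n standbys are alive.
 */
bool
priority_synced_lsn(const Standby *standbys, size_t count, size_t n,
                    Lsn *synced)
{
    size_t  chosen = 0;
    Lsn     lowest = UINT64_MAX;

    assert(n > 0);

    for (size_t i = 0; i < count && chosen < n; i++)
    {
        if (!standbys[i].alive)
            continue;       /* a dead standby is skipped, the next one moves up */
        if (standbys[i].flush < lowest)
            lowest = standbys[i].flush;
        chosen++;
    }

    if (chosen < n)
        return false;

    *synced = lowest;
    return true;
}
```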

> I think we should be using those definitions because Gray's earlier
> definition of 2-safe from Transaction Processing 12.6.3 doesn't really
> fit: It can optionally mean remote receipt or remote durable storage,
> but it doesn't wait if the 'backup' is down, so it's not the same type
> of guarantee. (He also has 'very safe' which might describe our
> syncrep, I'm not sure.)

If the discussion above is true, the description doesn't seem to
need to be amended in view of the safety criteria.

> <para>
> Synchronous replication offers the ability to confirm that all changes
> - made by a transaction have been transferred to one synchronous standby
> - server. This extends the standard level of durability
> + made by a transaction have been transferred to one or more synchronous standby
> + server. This extends that standard level of durability
> offered by a transaction commit. This level of protection is referred
> to as 2-safe replication in computer science theory.
> </para>

But some additional explanation might be needed.

For true quorum commit, a client will be notified when the
master and any n of all standbys have committed. This won't fit
the criteria in the paper exactly.
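
To illustrate, under true quorum commit the position that n standbys have all reached is the n-th largest flush LSN, whichever standbys those are. A rough C sketch, with hypothetical names (Lsn, quorum_synced_lsn) and the flush LSNs passed in as a plain array:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint64_t Lsn;       /* stand-in for XLogRecPtr */

/* qsort comparator: larger LSNs sort first */
static int
lsn_cmp_desc(const void *a, const void *b)
{
    Lsn x = *(const Lsn *) a;
    Lsn y = *(const Lsn *) b;

    return (x < y) - (x > y);
}

/*
 * Report the highest LSN that at least n of the given standbys have
 * flushed. Returns false if there are fewer than n standbys or if the
 * scratch allocation fails.
 */
bool
quorum_synced_lsn(const Lsn *flush, size_t count, size_t n, Lsn *synced)
{
    Lsn *sorted;

    assert(n > 0);

    if (count < n)
        return false;

    sorted = malloc(count * sizeof(Lsn));
    if (sorted == NULL)
        return false;

    memcpy(sorted, flush, count * sizeof(Lsn));
    qsort(sorted, count, sizeof(Lsn), lsn_cmp_desc);

    *synced = sorted[n - 1];    /* n-th largest: n standbys are at or past it */
    free(sorted);
    return true;
}
```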

In regard to Gray's definition, "2-safe" looks like PG's syncrep
with an automatic release mechanism, such as what pgsql-RA
offers. And "high availability" doesn't seem to fit
PostgreSQL's behavior because the master virtually commits a
transaction before reaching an agreement to commit among all of
the replicas.

# I'm reading it in Japanese so some words may be incorrect.

Thoughts?

> [1] http://www.postgresql.org/message-id/603c8f070812132142n5408e7ddk899e83cddd4cb0b2@mail.gmail.com
> [2] http://infoscience.epfl.ch/record/33053/files/EPFL_TH2577.pdf page 76

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-22 07:02:39
Message-ID:	CAHGQGwGnvuX8wR-FYH+TrNi_TWunZzU=nJFMdXkO6O8M4GbNvQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Mar 10, 2016 at 7:21 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, Mar 4, 2016 at 3:40 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Thu, Mar 3, 2016 at 11:30 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> Hi,
>>>
>>> Thank you so much for reviewing this patch!
>>>
>>> All review comments regarding document and comment are fixed.
>>> Attached latest v14 patch.
>>>
>>>> This accepts 'abc^Id' as a name, which is wrong behavior (but
>>>> such application names are not allowed anyway. If you assume so,
>>>> I'd like to see a comment for that.).
>>>
>>> 'abc^Id' is accepted as application_name, no?
>>> postgres(1)=# set application_name to 'abc^Id';
>>> SET
>>> postgres(1)=# show application_name ;
>>> application_name
>>> ------------------
>>> abc^Id
>>> (1 row)
>>>
>>>> addlit_xd_string(char *ytext) and addlitchar_xd_string(unsigned
>>>> char ychar) require different character types. Is there any reason
>>>> for that?
>>>
>>> Because addlit_xd_string() is for adding string(char *) to xd_string,
>>> OTOH addlit_xd_char() is for adding just one character to xd_string.
>>>
>>>> I personally don't like the addlit*string() functions for such a simple
>>>> syntax, but that in itself is acceptable to me. However, it uses
>>>> StringInfo to hold double-quoted names, which pallocs a 1024-byte
>>>> memory chunk for every double-quoted name. The chunks
>>>> pile up uncollected until the current
>>>> memory context is deleted or reset (it is deleted just after
>>>> config file processing finishes). In addition to that, setting
>>>> s_s_names runs the parser twice. This seems too greedy to me, and
>>>> a static char [NAMEDATALEN] seems to be enough, using the v12 way
>>>> without palloc/repalloc.
>>>
>>> I thought that the length of a group name could be more than NAMEDATALEN, so
>>> I use StringInfo.
>>> Is it not necessary?
>>>
>>>> I found that the name SyncGroupName.wait_num is not
>>>> intuitive. How about sync_num, sync_member_num or
>>>> sync_standby_num? If the last is preferable, .members should also
>>>> be .standbys .
>>>
>>> Thanks, sync_num is preferable to me.
>>>
>>> ===
>>>> I am quite uncomfortable with the existence of
>>>> WalSnd.sync_standby_priority. It represented the priority in the
>>>> old linear s_s_names format, but nested groups or even a
>>>> single-level quorum list obviously don't fit it. Can we get rid
>>>> of sync_standby_priority, even though we implement at most
>>>> n-priority for now?
>>>
>>> We could get rid of sync_standby_priority.
>>> But if so, we will not be able to see the next sync standby in
>>> pg_stat_replication system view.
>>> Regarding each node's priority, I was thinking that standbys in a quorum
>>> list have the same priority, and in a nested group each standby is given
>>> a priority starting from 1.
>>>
>>> ===
>>>> The function SyncRepGetSyncedLsnsUsingPriority doesn't seem to
>>>> have specific code for every prioritizing method (which are
>>>> priority, quorum, nested and so on). Is there any reason to use it as
>>>> a callback of SyncGroupNode?
>>>
>>> The current code is that way because it supports only the
>>> priority method.
>>> For the first version of this feature, I'd like to keep the implementation simple.
>>>
>>> Aside from this, of course I'm planning to have specific code for the nested design.
>>> - The group can have some name nodes or group nodes.
>>> - The group can use either of 2 methods: priority or quorum.
>>> - The group has SyncRepGetSyncedLsnFn() and SyncRepGetStandbysFn()
>>> - SyncRepGetSyncedLsnsFn() function recursively determines the synced LSN
>>> at that moment using the group's method.
>>> - SyncRepGetStandbysFn() function returns the standbys of its group
>>> that are considered sync under the group's method.
>>>
>>> For example, s_s_name = '3(a, b, 2[c,d]::group1)', SyncRepStandbys
>>> memory structure will be,
>>>
>>> "main(quorum)" --- "a"
>>> |
>>> -- "b"
>>> |
>>> -- "group1(priority)" --- "c"
>>> |
>>> -- "d"
>>>
>>> When determine synced LSNs, we need to consider group1's LSN using by
>>> priority method at first, and then we can determine main's LSN using
>>> by quorum method with "a" LSNs, "b" LSNs and "group1" LSNs.
>>> So SyncRepGetSyncedLsnsUsingPriority() function would be,
>>>
>>> bool
>>> SyncRepGetSyncedLsnsUsingPriority(*group, *write_lsn, *flush_lsn)
>>> {
>>> sync_num = group->SyncRepGetSyncStandbysFn(group, sync_list);
>>>
>>> if (sync_num < group->sync_num)
>>> return false;
>>>
>>> for (each member of sync_list)
>>> {
>>> if (member->type == group node)
>>> call SyncRepGetSyncedLsnsFn(member, w, f) and store w and
>>> f into lsn_list.
>>> else
>>> Store name node LSNs into lsn_list.
>>> }
>>>
>>> Determine synced LSNs of this group using lsn_list and priority method.
>>> Store synced LSNs into write_lsn and flush_lsn.
>>> return true;
>>> }
>>>
>>>> SyncRepClearStandbyGroupList is defined in syncrep.c but the
>>>> other related functions are defined in syncgroup_gram.y. It would
>>>> be better to place them together.
>>>
>>> SyncRepClearStandbyGroupList() is used by
>>> check_synchronous_standby_names(), so I put this function syncrep.c.
>>>
>>>> SyncRepStandbys are to be in multilevel and the struct is
>>>> naturally allowed to be so but SyncRepClearStandbyGroupList
>>>> assumes it in single level.
>>>
>>> Because I think that we don't need to implement to fully support
>>> nested style at first version.
>>> We have to carefully design this feature while considering
>>> expandability, but overkill implementation could be cause of crash.
>>> Consider remaining time for 9.6, I feel we could implement quorum
>>> method at best.
>>>
>>>> This is a comment from the aspect of abstractness of objects.
>>>> The callers of SyncRepGetSyncStandbysUsingPriority() need to care
>>>> the inside of SyncGroupNode but what the function should just
>>>> return seems to be the list of wansnds element. Element number is
>>>> useless when the SyncGroupNode nests.
>>>> > int
>>>> > SyncRepGetSyncStandbysUsingPriority(SyncGroupNode *group, volatile WalSnd **sync_list)
>>>> This might need to expose 'volatile WalSnd*' (only pointer type)
>>>> outside of walsender.
>>>> Or it should return the list of index number of
>>>> *WalSndCtl->walsnds*.
>>>
>>> SyncRepGetSyncStandbysUsingPriority() already returns the list of
>>> index number of "WalSndCtl->walsnd" as sync_list, no?
>>> As I mentioned above, SyncRepGetSyncStandbysFn() doesn't need care the
>>> inside of SyncGroupNode in my design.
>>> Selecting sync nodes from its group doesn't depend on the type of node.
>>> What SyncRepGetSyncStandbyFn() should do is to select sync node from
>>> *its* group.
>>>
>>
>> Previous patch has bug around GUC parameter handling.
>> Attached updated version.
>
> Thanks for updating the patch!
>
> Now I'm fixing some problems (e.g., current patch doesn't work
> with EXEC_BACKEND environment) and revising the patch.

Sorry for the delay... Here is the revised version of the patch.
Please review and test this version!
BTW, I've not revised the documentation and regression test yet.
I will do that during the review and test of the patch.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
multi_sync_replication_v16.patch	text/x-patch	44.0 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-22 12:58:48
Message-ID:	20160322.215848.19286227.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Thank you for the revised patch.

At Tue, 22 Mar 2016 16:02:39 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwGnvuX8wR-FYH+TrNi_TWunZzU=nJFMdXkO6O8M4GbNvQ(at)mail(dot)gmail(dot)com>
> On Thu, Mar 10, 2016 at 7:21 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Sorry for the delay... Here is the revised version of the patch.
> Please review and test this version!
> BTW, I've not revised the documentation and regression test yet.
> I will do that during the review and test of the patch.

This version appears to focus on the n-priority method. The code for
the other methods, such as n-quorum, has been removed. That is fine
with me.

So using WalSnd->sync_standby_priority is reasonable.

SyncRepGetSyncStandbys seems to work as expected: it collects n
standbys in order of priority, even when multiple standbys share
the same priority. Among standbys with the same priority, however,
the order is (pseudo) random rather than LSN order. This is the
difference from a true quorum method.
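
To make the selection rule above concrete, here is a minimal
standalone sketch. It is not code from the patch: StandbyInfo and
choose_sync_standbys() are hypothetical names, and a plain array
stands in for the walsender slots. It only illustrates "take up to n
active standbys in ascending priority order, with ties broken
arbitrarily (here by array position, never by LSN)".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-walsender state; not the real WalSnd. */
typedef struct StandbyInfo
{
    int active;     /* nonzero if the walsender is connected */
    int priority;   /* 0 = not a sync candidate, 1 = highest priority */
} StandbyInfo;

/*
 * Pick up to num_sync active standbys in ascending priority order.
 * Stores their array indexes in out[] and returns how many were chosen.
 * Equal priorities are resolved by array position, not by LSN.
 */
int
choose_sync_standbys(const StandbyInfo *standbys, int nstandbys,
                     int num_sync, int *out)
{
    int chosen = 0;

    assert(standbys != NULL && out != NULL);

    while (chosen < num_sync)
    {
        int best = -1;

        for (int i = 0; i < nstandbys; i++)
        {
            int taken = 0;

            if (!standbys[i].active || standbys[i].priority <= 0)
                continue;

            /* Skip standbys that were already selected. */
            for (int j = 0; j < chosen; j++)
                if (out[j] == i)
                    taken = 1;
            if (taken)
                continue;

            /* Strict "<" keeps the first of several equal priorities. */
            if (best < 0 || standbys[i].priority < standbys[best].priority)
                best = i;
        }

        if (best < 0)
            break;          /* fewer candidates than num_sync */
        out[chosen++] = best;
    }

    return chosen;
}
```

A true quorum method would instead have to compare the LSNs reported
by all candidates, which this sketch deliberately does not do.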

About the announcement of takeover,

> if (announce_next_takeover && am_sync)
> {
> announce_next_takeover = false;
> ereport(LOG,
> (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
> application_name, MyWalSnd->sync_standby_priority)));

This can announce what looks like the same standby repeatedly if
standbys with the same application_name keep connecting and
disconnecting. But this is the same as the current behavior.

Otherwise, as far as I can see, SyncRepReleaseWaiters seems to
work correctly.

SyncRepInitConfig parses s_s_names and then prioritizes all
walsenders based on the result. This runs at the start of a
walsender and when the config is reloaded. Terminated walsenders
are excluded when collecting sync standbys. All of this seems to
work properly (as before).

The parser became far simpler by getting rid of the code for
future expansion. It accepts only '<n>[name, ...]' and the
old s_s_names format.
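
As an illustration of the two accepted forms, the following sketch
classifies a value as either '<n>[name, ...]' or the old plain list.
It is not the flex/bison code from the patch: split_num_sync() is a
hypothetical helper, and tolerating whitespace before '[' is an
assumption made here. It relies only on the fact that the old format
means exactly one synchronous standby.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: classify a synchronous_standby_names value.
 * Returns 1 and sets *num_sync for the new "<n>[name, ...]" form;
 * returns 0 and sets *num_sync = 1 for the old comma-separated form.
 * *list receives a pointer to the first character of the name list.
 */
int
split_num_sync(const char *value, int *num_sync, const char **list)
{
    const char *p = value;
    char       *end;
    long        n;

    assert(value != NULL && num_sync != NULL && list != NULL);

    while (isspace((unsigned char) *p))
        p++;

    if (isdigit((unsigned char) *p))
    {
        n = strtol(p, &end, 10);
        while (isspace((unsigned char) *end))
            end++;

        /* Only a positive count directly followed by '[' is the new form. */
        if (*end == '[' && n > 0 && n <= INT_MAX)
        {
            *num_sync = (int) n;
            *list = end + 1;
            return 1;
        }
    }

    /* Old format: a plain list, exactly one synchronous standby. */
    *num_sync = 1;
    *list = p;
    return 0;
}
```

For example, '2[standby1,standby2,standby3]' yields num_sync = 2,
while 'standby1,standby2' falls through to the old format with
num_sync = 1.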

StringInfo for double-quoted names seems to me to be overkill,
since it allocates a 1024-byte block for every such name. A static
buffer seems sufficient for this usage, as I said.

The parser is called not only on SIGHUP but also at the start of
every walsender. The latter is not necessary, but it is a matter
of trade-off between simplicity and efficiency. The same can be
said for check_synchronous_standby_names().

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-22 14:08:36
Message-ID:	CAHGQGwFYG829=2r4mxV0ULeBNaUuG0ek_10yymx8Cu-gLYcLng@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Mar 22, 2016 at 9:58 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Thank you for the revised patch.

Thanks for reviewing the patch!

> This version looks to focus on n-priority method. Stuffs for the
> other methods like n-quorum has been removed. It is okay for me.

I don't think it's so difficult to extend this version so that
it also supports quorum commit.

> StringInfo for double-quoted names seems to me to be overkill,
> since it allocates 1024 byte block for every such name. A static
> buffer seems enough for the usage as I said.

So, what about changing the scanner code as follows?

<xd>{xdstop} {
    /* copy the accumulated name, then release the StringInfo buffer */
    yylval.str = pstrdup(xdbuf.data);
    pfree(xdbuf.data);
    BEGIN(INITIAL);
    return NAME;
}

> The parser is called for not only for SIGHUP, but also for
> starting of every walsender. The latter is not necessary but it
> is the matter of trade-off between simplisity and
> effectiveness.

Could you elaborate on why you think that's not necessary?

BTW, in the previous patch, s_s_names was parsed by the postmaster
during server startup. A child process inherited the internal data
struct for the parsed s_s_names when it was forked by the postmaster.
This is what the previous patch expected. However, this doesn't work
in an EXEC_BACKEND environment. In that environment, the data struct
must either be passed to a child process via a special file (like
write_nondefault_variables() does), or be constructed during
walsender startup (like the latest version of the patch does).
IMO the latter is simpler.

Regards,

--
Fujii Masao

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-22 17:28:39
Message-ID:	CAD21AoBHa59i_XsjK-aeo=DWPP3S9xuMeFNHzGJf7U01D8qaCQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Mar 22, 2016 at 11:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Mar 22, 2016 at 9:58 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Thank you for the revised patch.
>
> Thanks for reviewing the patch!
>
>> This version looks to focus on n-priority method. Stuffs for the
>> other methods like n-quorum has been removed. It is okay for me.
>
> I don't think it's so difficult to extend this version so that
> it supports also quorum commit.

Yeah, a 1-nest-level implementation would not be so difficult.

>> StringInfo for double-quoted names seems to me to be overkill,
>> since it allocates 1024 byte block for every such name. A static
>> buffer seems enough for the usage as I said.
>
> So, what about changing the scanner code as follows?
>
> <xd>{xdstop} {
> yylval.str = pstrdup(xdbuf.data);
> pfree(xdbuf.data);
> BEGIN(INITIAL);
> return NAME;
>> The parser is called for not only for SIGHUP, but also for
>> starting of every walsender. The latter is not necessary but it
>> is the matter of trade-off between simplisity and
>> effectiveness.
>
> Could you elaborate why you think that's not necessary?
>
> BTW, in previous patch, s_s_names is parsed by postmaster during the server
> startup. A child process takes over the internal data struct for the parsed
> s_s_names when it's forked by the postmaster. This is what the previous
> patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
> In that environment, the data struct should be passed to a child process via
> the special file (like write_nondefault_variables() does), or it should
> be constructed during walsender startup (like latest version of the patch
> does). IMO the latter is simpler.

Thank you for updating the patch.

The following are random review comments.

==
+ for (cell = list_head(pending); cell; cell = next)

Can we use foreach() instead?
==
+ pending = list_delete_cell(pending, cell, prev);
+
+ if (list_length(pending) == 0)
+ {
+ list_free(pending);
+ return result; /* Exit if pending list is empty */
+ }

If the pending list becomes empty after deleting an element, we can
return. It's a small optimisation.
==
If num_sync is greater than the number of members in the sync standby
list, we should return an error message immediately.
Thoughts?
==
I got an assertion failure when the master server was set up with an
empty s_s_names, because the current patch always tries to parse
s_s_names and use it regardless of the parameter's value.

The attached patch incorporates the above comments.
Please take a look.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
multi_sync_replication_v17.patch	application/octet-stream	40.0 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-23 04:21:02
Message-ID:	CAHGQGwEQws1brZkcKv8khgcsBK5hNeauJkcEH6AH+fsfQ06q5g@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Mar 23, 2016 at 2:28 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Mar 22, 2016 at 11:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Tue, Mar 22, 2016 at 9:58 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> Thank you for the revised patch.
>>
>> Thanks for reviewing the patch!
>>
>>> This version looks to focus on n-priority method. Stuffs for the
>>> other methods like n-quorum has been removed. It is okay for me.
>>
>> I don't think it's so difficult to extend this version so that
>> it supports also quorum commit.
>
> Yeah, 1-nest level implementation would not so difficult.
>
>>> StringInfo for double-quoted names seems to me to be overkill,
>>> since it allocates 1024 byte block for every such name. A static
>>> buffer seems enough for the usage as I said.
>>
>> So, what about changing the scanner code as follows?
>>
>> <xd>{xdstop} {
>> yylval.str = pstrdup(xdbuf.data);
>> pfree(xdbuf.data);
>> BEGIN(INITIAL);
>> return NAME;

I applied this change to the latest version of the patch.
Please check that.

Also I changed syncrep.c so that it uses list_free_deep() to free the
list of the parsed s_s_names, because the data in the list is palloc'd
by syncrep_scanner.l.

>>> The parser is called for not only for SIGHUP, but also for
>>> starting of every walsender. The latter is not necessary but it
>>> is the matter of trade-off between simplisity and
>>> effectiveness.
>>
>> Could you elaborate why you think that's not necessary?
>>
>> BTW, in previous patch, s_s_names is parsed by postmaster during the server
>> startup. A child process takes over the internal data struct for the parsed
>> s_s_names when it's forked by the postmaster. This is what the previous
>> patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
>> In that environment, the data struct should be passed to a child process via
>> the special file (like write_nondefault_variables() does), or it should
>> be constructed during walsender startup (like latest version of the patch
>> does). IMO the latter is simpler.
>
> Thank you for updating patch.
>
> Followings are random review comments.
>
> ==
> + for (cell = list_head(pending); cell; cell = next)
>
> Can we use foreach() instead?

Yes.

> ==
> + pending = list_delete_cell(pending, cell, prev);
> +
> + if (list_length(pending) == 0)
> + {
> + list_free(pending);
> + return result; /*
> Exit if pending list is empty */
> + }
>
> If pending list become empty after deleting element, we can return.
> It's a small optimisation.

I don't think this is necessary because currently we can get out of
the loop immediately after that deletion.

But I found a bug in the calculation of the next-highest priority.
It could cause an extra, unnecessary loop iteration. I fixed that in
the latest version of the patch.

> ==
> If num_sync is greater than the number of members of sync standby
> list, we'd rather return error message immediately.
> Thoughts?

No. For example, imagine the case where s_s_names is set to '*'
and more than one sync standby is connecting to the master.
That's a valid setting.

> ==
> I got assertion error when master server is set up with empty s_s_names.
> Because current patch always tries to parse s_s_names and use it
> regardless value of parameter.

Yeah, you're right.

>
> Attached patch incorporates above comments.
> Please find it.

Attached is the latest version of the patch based on your patch.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
multi_sync_replication_v18.patch	text/x-patch	44.1 KB

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-23 04:36:39
Message-ID:	CAB7nPqQCB5fWedQKALsKBeTmMMtjcuYJb1o5XFZ4eULJeQ2pnw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Mar 23, 2016 at 1:21 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Mar 23, 2016 at 2:28 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Attached patch incorporates above comments.
>> Please find it.
>
> Attached is the latest version of the patch based on your patch.

Not really having a look at the core patch yet...

+ my $result = $node_master->psql('postgres', "SELECT
application_name, sync_priority, sync_state FROM
pg_stat_replication;");
+ print "$result \n";
Having ORDER BY application_name would be good for those queries, and
the result outputs could be made more consistent as a result.

+ # Change the s_s_names = '2[standby1,standby2,standby3]' and check sync state
+ $node_master->psql('postgres', "ALTER SYSTEM SET
synchronous_standby_names = '2[standby1,standby2,standby3]';");
+ $node_master->psql('postgres', "SELECT pg_reload_conf();");
Let's add a reload routine in PostgresNode.pm; this patch is not the
only one that would use it.

--- b/src/test/recovery/t/006_multisync_rep.pl
***************
*** 0 ****
--- 1,106 ----
+ use strict;
+ use warnings;
You may want to add a short description of this test as a header.

$postgres->AddFiles('src/backend/replication', 'repl_scanner.l',
'repl_gram.y');
+ $postgres->AddFiles('src/backend/replication', 'syncrep_scanner.l',
+ 'syncrep_gram.y');
There is no need for a new routine call here; you can just append the
new files to the existing call.
--
Michael

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-23 08:32:00
Message-ID:	20160323.173200.20902715.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Tue, 22 Mar 2016 23:08:36 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFYG829=2r4mxV0ULeBNaUuG0ek_10yymx8Cu-gLYcLng(at)mail(dot)gmail(dot)com>
> On Tue, Mar 22, 2016 at 9:58 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Thank you for the revised patch.
>
> Thanks for reviewing the patch!
>
> > This version looks to focus on n-priority method. Stuffs for the
> > other methods like n-quorum has been removed. It is okay for me.
>
> I don't think it's so difficult to extend this version so that
> it supports also quorum commit.

Mmm. I think I understand this only now. As Sawada-san said
before, if all standbys in a single-level quorum set have the same
sync_standby_priority, the current algorithm works as it is. This
is also true for the case where some quorum sets are in a priority
set.

What about some priority sets in a quorum set?

Sorry, the start of a walsender is not such a big problem; 1024 bytes
of memory are just abandoned once. SIGHUP is the real problem.

This part is called under two kinds of memory context: "config
file processing" and then "Replication command context". The former
is deleted just after reading the config file, so it does no harm,
but the latter is a quite long-lived context, and every reload bloats
it with abandoned memory blocks. The memory needs to be pfreed, or a
memory context with a shorter lifetime should be used, or static
storage of 64 bytes should be used, even though the bloat becomes
visible only after a great many config reloads.

> BTW, in previous patch, s_s_names is parsed by postmaster during the server
> startup. A child process takes over the internal data struct for the parsed
> s_s_names when it's forked by the postmaster. This is what the previous
> patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
> In that environment, the data struct should be passed to a child process via
> the special file (like write_nondefault_variables() does), or it should
> be constructed during walsender startup (like latest version of the patch
> does). IMO the latter is simpler.

Ah, I hadn't noticed that, but I agree with it.

As per my previous comment, syncrep_scanner.l doesn't reject some
(nonprintable and multibyte) characters in a name, whereas such
characters are silently replaced with '?' in application_name. This
would not be a problem for almost all of us, but it might need to be
documented if we don't change the behavior to match
application_name.

By the way, the following documentation fix mentioned by Thomas,

- to as 2-safe replication in computer science theory.
+ to as group-safe replication in computer science theory.

should be restored if the discussion in the following message is
true. And some supplemental description would be needed.

http://www.postgresql.org/message-id/20160316.164833.188624159.horiguchi.kyotaro@lab.ntt.co.jp

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-24 02:34:33
Message-ID:	CAHGQGwGPTs+oyYKGspsiosn69RMZRnnx1pAT+GuqwFCNCGeafA@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Mar 23, 2016 at 5:32 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Tue, 22 Mar 2016 23:08:36 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFYG829=2r4mxV0ULeBNaUuG0ek_10yymx8Cu-gLYcLng(at)mail(dot)gmail(dot)com>
>> On Tue, Mar 22, 2016 at 9:58 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Thank you for the revised patch.
>>
>> Thanks for reviewing the patch!
>>
>> > This version looks to focus on n-priority method. Stuffs for the
>> > other methods like n-quorum has been removed. It is okay for me.
>>
>> I don't think it's so difficult to extend this version so that
>> it supports also quorum commit.
>
> Mmm. I think I understand this just now. As Sawada-san said
> before, all standbys in a single-level quorum set having the same
> sync_standby_prioirity, the current algorithm works as it is. It
> also true for the case that some quorum sets are in a priority
> set.
>
> What about some priority sets in a quorum set?
>
>> > StringInfo for double-quoted names seems to me to be overkill,
>> > since it allocates 1024 byte block for every such name. A static
>> > buffer seems enough for the usage as I said.
>>
>> So, what about changing the scanner code as follows?
>>
>> <xd>{xdstop} {
>> yylval.str = pstrdup(xdbuf.data);
>> pfree(xdbuf.data);
>> BEGIN(INITIAL);
>> return NAME;
>>
>> > The parser is called for not only for SIGHUP, but also for
>> > starting of every walsender. The latter is not necessary but it
>> > is the matter of trade-off between simplisity and
>> > effectiveness.
>>
>> Could you elaborate why you think that's not necessary?
>
> Sorry, starting of walsender is not so large problem, 1024 bytes
> memory is just abandoned once. SIGHUP is rather a problem.
>
> The part is called under two kinds of memory context, "config
> file processing" then "Replication command context". The former
> is deleted just after reading the config file so no harm but the
> latter is a quite long-lasting context and every reloading bloats
> the context with abandoned memory blocks. It is needed to be
> pfreed or to use a memory context with shorter lifetime, or use
> static storage of 64 byte-length, even though the bloat become
> visible after very many times of conf reloads.

SyncRepInitConfig()->SyncRepFreeConfig() has already pfree'd that
in the patch. Or am I missing something?
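
For illustration only, the free-before-reparse pattern being discussed
can be sketched as below. This is not the patch's code: plain
malloc()/free() stand in for palloc()/pfree(), and ParsedNames,
free_parsed_names() and reload_names() are hypothetical names. The
point is that each reload releases the previous parsed result
(including every separately allocated name, in the spirit of
list_free_deep()) before building the new one, so a long-lived
context does not accumulate abandoned blocks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of s_s_names; not the struct from the patch. */
typedef struct ParsedNames
{
    int    num_sync;
    int    nnames;
    char **names;       /* each entry is allocated separately */
} ParsedNames;

/* Deep free: release every name, then the array, then the struct. */
void
free_parsed_names(ParsedNames *cfg)
{
    if (cfg == NULL)
        return;
    for (int i = 0; i < cfg->nnames; i++)
        free(cfg->names[i]);
    free(cfg->names);
    free(cfg);
}

/*
 * Build a new parsed config, releasing the old one first so that
 * repeated reloads do not leak. Returns NULL on allocation failure.
 */
ParsedNames *
reload_names(ParsedNames *old, int num_sync,
             const char *const *names, int nnames)
{
    ParsedNames *cfg;

    assert(nnames >= 0 && (nnames == 0 || names != NULL));

    free_parsed_names(old);

    cfg = malloc(sizeof(*cfg));
    if (cfg == NULL)
        return NULL;
    cfg->num_sync = num_sync;
    cfg->nnames = 0;
    cfg->names = calloc(nnames > 0 ? (size_t) nnames : 1, sizeof(char *));
    if (cfg->names == NULL)
    {
        free(cfg);
        return NULL;
    }

    for (int i = 0; i < nnames; i++)
    {
        size_t len = strlen(names[i]) + 1;

        cfg->names[i] = malloc(len);
        if (cfg->names[i] == NULL)
        {
            free_parsed_names(cfg);     /* frees only the filled entries */
            return NULL;
        }
        memcpy(cfg->names[i], names[i], len);
        cfg->nnames++;
    }

    return cfg;
}
```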

>> BTW, in previous patch, s_s_names is parsed by postmaster during the server
>> startup. A child process takes over the internal data struct for the parsed
>> s_s_names when it's forked by the postmaster. This is what the previous
>> patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
>> In that environment, the data struct should be passed to a child process via
>> the special file (like write_nondefault_variables() does), or it should
>> be constructed during walsender startup (like latest version of the patch
>> does). IMO the latter is simpler.
>
> Ah, I haven't notice that but I agree with it.
>
>
> As per my previous comment, syncrep_scanner.l doesn't reject some
> (nonprintable and multibyte) characters in a name, which is to be
> silently replaced with '?' for application_name. It would not be
> a problem for almost all of us but might be needed to be
> documented if we won't change the behavior to be the same as
> application_name.

There are three options:

1. Replace nonprintable and non-ASCII characters in s_s_names with ?
2. Emit an error if s_s_names contains nonprintable or non-ASCII characters
3. Do nothing (9.5 and earlier behave this way)

Are you implying that we should choose #1 or #2?

> By the way, the following documentation fix mentioned by Thomas,
>
> - to as 2-safe replication in computer science theory.
> + to as group-safe replication in computer science theory.
>
> should be restored if the discussion in the following message is
> true. And some supplemental description would be needed.
>
> http://www.postgresql.org/message-id/20160316.164833.188624159.horiguchi.kyotaro@lab.ntt.co.jp

Yeah, the documentation needs to be updated.

Regards,

--
Fujii Masao

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-24 04:04:49
Message-ID:	CAD21AoBVn3_5qC_CKeKSXTu963mM=n9-GxzF7KCPreTTMS+JGQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Mar 24, 2016 at 11:34 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Mar 23, 2016 at 5:32 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Hello,
>>
>> At Tue, 22 Mar 2016 23:08:36 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFYG829=2r4mxV0ULeBNaUuG0ek_10yymx8Cu-gLYcLng(at)mail(dot)gmail(dot)com>
>>> On Tue, Mar 22, 2016 at 9:58 PM, Kyotaro HORIGUCHI
>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> > Thank you for the revised patch.
>>>
>>> Thanks for reviewing the patch!
>>>
>>> > This version looks to focus on n-priority method. Stuffs for the
>>> > other methods like n-quorum has been removed. It is okay for me.
>>>
>>> I don't think it's so difficult to extend this version so that
>>> it supports also quorum commit.
>>
>> Mmm. I think I understand this just now. As Sawada-san said
>> before, all standbys in a single-level quorum set having the same
>> sync_standby_prioirity, the current algorithm works as it is. It
>> also true for the case that some quorum sets are in a priority
>> set.
>>
>> What about some priority sets in a quorum set?

We should surely consider that when we support configurations with
more than one nesting level.
IMO, we could have another piece of information that indicates the
current sync standbys instead of sync_priority.
For now, we aren't even trying to support the quorum method, so we
could consider it once we can support both the priority method and
the quorum method without problems.

>>> > StringInfo for double-quoted names seems to me to be overkill,
>>> > since it allocates 1024 byte block for every such name. A static
>>> > buffer seems enough for the usage as I said.
>>>
>>> So, what about changing the scanner code as follows?
>>>
>>> <xd>{xdstop} {
>>> yylval.str = pstrdup(xdbuf.data);
>>> pfree(xdbuf.data);
>>> BEGIN(INITIAL);
>>> return NAME;
>>>
>>> > The parser is called for not only for SIGHUP, but also for
>>> > starting of every walsender. The latter is not necessary but it
>>> > is the matter of trade-off between simplisity and
>>> > effectiveness.
>>>
>>> Could you elaborate why you think that's not necessary?
>>
>> Sorry, starting of walsender is not so large problem, 1024 bytes
>> memory is just abandoned once. SIGHUP is rather a problem.
>>
>> The part is called under two kinds of memory context, "config
>> file processing" then "Replication command context". The former
>> is deleted just after reading the config file so no harm but the
>> latter is a quite long-lasting context and every reloading bloats
>> the context with abandoned memory blocks. It is needed to be
>> pfreed or to use a memory context with shorter lifetime, or use
>> static storage of 64 byte-length, even though the bloat become
>> visible after very many times of conf reloads.
>
> SyncRepInitConfig()->SyncRepFreeConfig() has already pfree'd that
> in the patch. Or am I missing something?
>
>>> BTW, in previous patch, s_s_names is parsed by postmaster during the server
>>> startup. A child process takes over the internal data struct for the parsed
>>> s_s_names when it's forked by the postmaster. This is what the previous
>>> patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
>>> In that environment, the data struct should be passed to a child process via
>>> the special file (like write_nondefault_variables() does), or it should
>>> be constructed during walsender startup (like latest version of the patch
>>> does). IMO the latter is simpler.
>>
>> Ah, I haven't notice that but I agree with it.
>>
>>
>> As per my previous comment, syncrep_scanner.l doesn't reject some
>> (nonprintable and multibyte) characters in a name, which is to be
>> silently replaced with '?' for application_name. It would not be
>> a problem for almost all of us but might be needed to be
>> documented if we won't change the behavior to be the same as
>> application_name.
>
> There are three options:
>
> 1. Replace nonprintable and non-ASCII characters in s_s_names with ?
> 2. Emit an error if s_s_names contains nonprintable and non-ASCII characters
> 3. Do nothing (9.5 or before behave in this way)
>
> You implied that we should choose #1 or #2?

The previous s_s_names (9.5 or before) also accepts non-ASCII and
non-printable characters, and can show them without replacing them
with '?'.
From a backward-compatibility perspective, we should not choose #1 or #2.
The difference between the previous and the current s_s_names is that
the previous one doesn't accept a node name containing a white-space
character, i.e. one for which isspace() returns true,
whereas the current s_s_names allows us to specify such a node name.
I guess that changing only that behaviour is enough to fix this issue.
Thoughts?
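
To make the two behaviours being compared concrete, here is a minimal standalone sketch. It is not code from the patch: name_has_space() and clean_ascii() are hypothetical helpers written only for illustration. The first shows the isspace() check mentioned above, the second shows what option #1 (replacing bytes outside printable ASCII with '?') would do to a name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if the node name contains any character for
 * which isspace() returns true.
 */
bool
name_has_space(const char *name)
{
    assert(name != NULL);
    for (const unsigned char *p = (const unsigned char *) name; *p != '\0'; p++)
    {
        if (isspace(*p))
            return true;
    }
    return false;
}

/*
 * Hypothetical helper for option #1: overwrite every byte outside the
 * printable ASCII range (32..126) with '?', in place.  Returns the number
 * of bytes replaced.
 */
size_t
clean_ascii(char *name)
{
    size_t replaced = 0;

    assert(name != NULL);
    for (unsigned char *p = (unsigned char *) name; *p != '\0'; p++)
    {
        if (*p < 32 || *p > 126)
        {
            *p = '?';
            replaced++;
        }
    }
    return replaced;
}
```

Under option #3 neither helper would be applied. Under the narrower change suggested above, only a check like name_has_space() would matter for unquoted names.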

>
>> By the way, the following documentation fix mentioned by Thomas,
>>
>> - to as 2-safe replication in computer science theory.
>> + to as group-safe replication in computer science theory.
>>
>> should be restored if the discussion in the following message is
>> true. And some supplemental description would be needed.
>>
>> http://www.postgresql.org/message-id/20160316.164833.188624159.horiguchi.kyotaro@lab.ntt.co.jp
>
> Yeah, the document needs to be updated.

I will do that.

Regards,

--
Masahiko Sawada

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-24 05:26:40
Message-ID:	20160324.142640.33417187.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Thu, 24 Mar 2016 13:04:49 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoBVn3_5qC_CKeKSXTu963mM=n9-GxzF7KCPreTTMS+JGQ(at)mail(dot)gmail(dot)com>
> On Thu, Mar 24, 2016 at 11:34 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > On Wed, Mar 23, 2016 at 5:32 PM, Kyotaro HORIGUCHI
> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >>> I don't think it's so difficult to extend this version so that
> >>> it supports also quorum commit.
> >>
> >> Mmm. I think I understand this just now. As Sawada-san said
> >> before, all standbys in a single-level quorum set having the same
> >> sync_standby_prioirity, the current algorithm works as it is. It
> >> also true for the case that some quorum sets are in a priority
> >> set.
> >>
> >> What about some priority sets in a quorum set?
>
> We should surely consider it that when we support more than 1 nest
> level configuration.
> IMO, we can have another information which indicates current sync
> standbys instead of sync_priority.
> For now, we are'nt trying to support even quorum method, so we could
> consider it after we can support both priority method and quorum
> method without incident.

Fine with me.

> >>> > StringInfo for double-quoted names seems to me to be overkill,
> >>> > since it allocates 1024 byte block for every such name. A static
> >>> > buffer seems enough for the usage as I said.
> >>>
> >>> So, what about changing the scanner code as follows?
> >>>
> >>> <xd>{xdstop} {
> >>> yylval.str = pstrdup(xdbuf.data);
> >>> pfree(xdbuf.data);
> >>> BEGIN(INITIAL);
> >>> return NAME;
> >>>
> >>> > The parser is called for not only for SIGHUP, but also for
> >>> > starting of every walsender. The latter is not necessary but it
> >>> > is the matter of trade-off between simplisity and
> >>> > effectiveness.
> >>>
> >>> Could you elaborate why you think that's not necessary?
> >>
> >> Sorry, starting of walsender is not so large problem, 1024 bytes
> >> memory is just abandoned once. SIGHUP is rather a problem.
> >>
> >> The part is called under two kinds of memory context, "config
> >> file processing" then "Replication command context". The former
> >> is deleted just after reading the config file so no harm but the
> >> latter is a quite long-lasting context and every reloading bloats
> >> the context with abandoned memory blocks. It is needed to be
> >> pfreed or to use a memory context with shorter lifetime, or use
> >> static storage of 64 byte-length, even though the bloat become
> >> visible after very many times of conf reloads.
> >
> > SyncRepInitConfig()->SyncRepFreeConfig() has already pfree'd that
> > in the patch. Or am I missing something?

Sorry; rather, the memory from strdup() is abandoned at the
upper level. (Thinking for a while...) Ah, I think the
problem is here.

> SyncRepFreeConfig(SyncRepConfigData *config)
> {
...
!> list_free(config->members);
> pfree(config);
> }

list_free() *doesn't* free the memory blocks pointed to by
lfirst(cell), which were pstrdup'ed. It should be
list_free_deep(config->members) instead, to free everything.
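
The difference between a shallow and a deep free can be sketched outside PostgreSQL as follows. This is an illustration only: Cell, push_name(), free_shallow(), free_deep() and the live_blocks counter are hypothetical stand-ins, not the pg_list API. The counter plays the role of a long-lived memory context in which abandoned blocks accumulate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of blocks currently allocated and not yet freed. */
int live_blocks = 0;

void *
track_alloc(size_t n)
{
    void *p = malloc(n);

    assert(p != NULL);
    live_blocks++;
    return p;
}

void
track_free(void *p)
{
    if (p != NULL)
    {
        free(p);
        live_blocks--;
    }
}

/* Hypothetical list cell holding a separately allocated name. */
typedef struct Cell
{
    char        *name;
    struct Cell *next;
} Cell;

/* Prepend a copy of name: two allocations, one cell and one string. */
Cell *
push_name(Cell *head, const char *name)
{
    Cell *c = track_alloc(sizeof(Cell));

    c->name = track_alloc(strlen(name) + 1);
    strcpy(c->name, name);
    c->next = head;
    return c;
}

/* Shallow free: releases the cells only, so every name is abandoned. */
void
free_shallow(Cell *head)
{
    while (head != NULL)
    {
        Cell *next = head->next;

        track_free(head);
        head = next;
    }
}

/* Deep free: releases each name and then its cell. */
void
free_deep(Cell *head)
{
    while (head != NULL)
    {
        Cell *next = head->next;

        track_free(head->name);
        track_free(head);
        head = next;
    }
}
```

With two names in the list, free_shallow() leaves two blocks live while free_deep() leaves none, which is the kind of per-reload leftover described above.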

> >>> BTW, in previous patch, s_s_names is parsed by postmaster during the server
> >>> startup. A child process takes over the internal data struct for the parsed
> >>> s_s_names when it's forked by the postmaster. This is what the previous
> >>> patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
> >>> In that environment, the data struct should be passed to a child process via
> >>> the special file (like write_nondefault_variables() does), or it should
> >>> be constructed during walsender startup (like latest version of the patch
> >>> does). IMO the latter is simpler.
> >>
> >> Ah, I haven't notice that but I agree with it.
> >>
> >>
> >> As per my previous comment, syncrep_scanner.l doesn't reject some
> >> (nonprintable and multibyte) characters in a name, which is to be
> >> silently replaced with '?' for application_name. It would not be
> >> a problem for almost all of us but might be needed to be
> >> documented if we won't change the behavior to be the same as
> >> application_name.
> >
> > There are three options:
> >
> > 1. Replace nonprintable and non-ASCII characters in s_s_names with ?
> > 2. Emit an error if s_s_names contains nonprintable and non-ASCII characters
> > 3. Do nothing (9.5 or before behave in this way)
> >
> > You implied that we should choose #1 or #2?
>
> Previous(9.5 or before) s_s_names also accepts non-ASCII character and
> non-printable character, and can show it without replacing these
> character to '?'.

Thank you for pointing it out (it had completely slipped my
mind..). I have no objection to keeping the previous behavior.

> From backward compatibility perspective, we should not choose #1 or #2.
> Different behaviour between previous and current s_s_names is that
> previous s_s_names doesn't accept the node name having the sort of
> white-space character that isspace() returns true with.
> But current s_s_names allows us to specify such a node name.
> I guess that changing such behaviour is enough for fixing this issue.
> Thoughts?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-24 10:00:48
Message-ID:	CAHGQGwGM8Fa1SLW6U6qHgnJvXP3w2WW8wF471R1j7rtq_RLtHA@mail.gmail.com
Lists:	pgsql-hackers

Yep, but SyncRepFreeConfig() already uses list_free_deep() in the latest patch.
Could you read the latest version that I posted upthread?

Regards,

--
Fujii Masao

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-24 13:29:01
Message-ID:	CAD21AoCxwezOTf9kLQRhuf2y=1c_fGjCormqJfqHOmQW8EgaDg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Mar 24, 2016 at 2:26 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Thu, 24 Mar 2016 13:04:49 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoBVn3_5qC_CKeKSXTu963mM=n9-GxzF7KCPreTTMS+JGQ(at)mail(dot)gmail(dot)com>
>> On Thu, Mar 24, 2016 at 11:34 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> > On Wed, Mar 23, 2016 at 5:32 PM, Kyotaro HORIGUCHI
>> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> >>> I don't think it's so difficult to extend this version so that
>> >>> it supports also quorum commit.
>> >>
>> >> Mmm. I think I understand this just now. As Sawada-san said
>> >> before, all standbys in a single-level quorum set having the same
>> >> sync_standby_prioirity, the current algorithm works as it is. It
>> >> also true for the case that some quorum sets are in a priority
>> >> set.
>> >>
>> >> What about some priority sets in a quorum set?
>>
>> We should surely consider it that when we support more than 1 nest
>> level configuration.
>> IMO, we can have another information which indicates current sync
>> standbys instead of sync_priority.
>> For now, we are'nt trying to support even quorum method, so we could
>> consider it after we can support both priority method and quorum
>> method without incident.
>
> Fine with me.
>
>> >>> > StringInfo for double-quoted names seems to me to be overkill,
>> >>> > since it allocates 1024 byte block for every such name. A static
>> >>> > buffer seems enough for the usage as I said.
>> >>>
>> >>> So, what about changing the scanner code as follows?
>> >>>
>> >>> <xd>{xdstop} {
>> >>> yylval.str = pstrdup(xdbuf.data);
>> >>> pfree(xdbuf.data);
>> >>> BEGIN(INITIAL);
>> >>> return NAME;
>> >>>
>> >>> > The parser is called for not only for SIGHUP, but also for
>> >>> > starting of every walsender. The latter is not necessary but it
>> >>> > is the matter of trade-off between simplisity and
>> >>> > effectiveness.
>> >>>
>> >>> Could you elaborate why you think that's not necessary?
>> >>
>> >> Sorry, starting of walsender is not so large problem, 1024 bytes
>> >> memory is just abandoned once. SIGHUP is rather a problem.
>> >>
>> >> The part is called under two kinds of memory context, "config
>> >> file processing" then "Replication command context". The former
>> >> is deleted just after reading the config file so no harm but the
>> >> latter is a quite long-lasting context and every reloading bloats
>> >> the context with abandoned memory blocks. It is needed to be
>> >> pfreed or to use a memory context with shorter lifetime, or use
>> >> static storage of 64 byte-length, even though the bloat become
>> >> visible after very many times of conf reloads.
>> >
>> > SyncRepInitConfig()->SyncRepFreeConfig() has already pfree'd that
>> > in the patch. Or am I missing something?
>
> Sorry, instead, the memory from strdup() will be abandoned in
> upper level. (Thinking for some time..) Ah, I found that the
> problem should be here.
>
> > SyncRepFreeConfig(SyncRepConfigData *config)
> > {
> ...
> !> list_free(config->members);
> > pfree(config);
> > }
>
> The list_free *doesn't* free the memory blocks pointed by
> lfirst(cell), which has been pstrdup'ed. It should be
> list_free_deep(config->members) instead to free it completely.
>> >>> BTW, in previous patch, s_s_names is parsed by postmaster during the server
>> >>> startup. A child process takes over the internal data struct for the parsed
>> >>> s_s_names when it's forked by the postmaster. This is what the previous
>> >>> patch was expecting. However, this doesn't work in EXEC_BACKEND environment.
>> >>> In that environment, the data struct should be passed to a child process via
>> >>> the special file (like write_nondefault_variables() does), or it should
>> >>> be constructed during walsender startup (like latest version of the patch
>> >>> does). IMO the latter is simpler.
>> >>
>> >> Ah, I haven't notice that but I agree with it.
>> >>
>> >>
>> >> As per my previous comment, syncrep_scanner.l doesn't reject some
>> >> (nonprintable and multibyte) characters in a name, which is to be
>> >> silently replaced with '?' for application_name. It would not be
>> >> a problem for almost all of us but might be needed to be
>> >> documented if we won't change the behavior to be the same as
>> >> application_name.
>> >
>> > There are three options:
>> >
>> > 1. Replace nonprintable and non-ASCII characters in s_s_names with ?
>> > 2. Emit an error if s_s_names contains nonprintable and non-ASCII characters
>> > 3. Do nothing (9.5 or before behave in this way)
>> >
>> > You implied that we should choose #1 or #2?
>>
>> Previous(9.5 or before) s_s_names also accepts non-ASCII character and
>> non-printable character, and can show it without replacing these
>> character to '?'.
>
> Thank you for pointint it out (it was completely out of my
> mind..). I have no objection to keep the previous behavior.
>
>> From backward compatibility perspective, we should not choose #1 or #2.
>> Different behaviour between previous and current s_s_names is that
>> previous s_s_names doesn't accept the node name having the sort of
>> white-space character that isspace() returns true with.
>> But current s_s_names allows us to specify such a node name.
>> I guess that changing such behaviour is enough for fixing this issue.
>> Thoughts?
>

Attached is the latest patch, incorporating all review comments so far.

Aside from the review comments, I made the following changes:
- Add logic to avoid fatal exit in yy_fatal_error().
- Improve regression test cases.

Also, I feel uncomfortable about using [ and ] as the special
characters for the priority method.
Because ( ) and [ ] look somewhat similar to each other, they would
easily lead to many syntax errors once the nested style is supported.
And the synopsis in the documentation looks odd:
synchronous_standby_names = 'N [ node_name [, ...] ]'

This topic has already been discussed before, but might we want to
change them to other characters such as < and >?

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
muti_sync_replication_v19.patch	text/x-diff	41.8 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-25 12:20:15
Message-ID:	CA+TgmoZqFs9pGjVTVbJfxKpMdR86TtvYT9h0y1guAN6BTBG57Q@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Mar 24, 2016 at 9:29 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Also I felt a sense of discomfort regarding using [ and ] as a special
> character for priority method.
> Because (, ) and [, ] are a little similar each other, so it would
> easily make many syntax errors when nested style is supported.
> And the synopsis of that in documentation is odd;
> synchronous_standby_names = 'N [ node_name [, ...] ]'
>
> This topic has been already discussed before but, we might want to
> change it to other characters such as < and >?

I personally would recommend against <>. Those should mean less-than
and greater-than, not grouping. I think you could use parentheses,
(). There's nothing saying that has to mean any particular thing, so
you may as well use it for the first thing implemented, perhaps. Or
you could use [] or {}. It *is* important that you don't create
confusing syntax summaries, but I don't think that's a reason to pick
a nonstandard syntax for grouping.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-28 06:59:21
Message-ID:	CAD21AoAM2c-D3=LRtOckoChNni68eCU3x-aUCwyHbF5FNxc+5Q@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Mar 25, 2016 at 9:20 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Mar 24, 2016 at 9:29 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> Also I felt a sense of discomfort regarding using [ and ] as a special
>> character for priority method.
>> Because (, ) and [, ] are a little similar each other, so it would
>> easily make many syntax errors when nested style is supported.
>> And the synopsis of that in documentation is odd;
>> synchronous_standby_names = 'N [ node_name [, ...] ]'
>>
>> This topic has been already discussed before but, we might want to
>> change it to other characters such as < and >?
>
> I personally would recommend against <>. Those should mean less-than
> and greater-than, not grouping. I think you could use parentheses,
> (). There's nothing saying that has to mean any particular thing, so
> you may as well use it for the first thing implemented, perhaps. Or
> you could use [] or {}. It *is* important that you don't create
> confusing syntax summaries, but I don't think that's a reason to pick
> a nonstandard syntax for grouping.
>

I agree with you.
I've changed it to use parentheses.
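
For readers following along, a value of the parenthesized form 'N (node_name [, ...])' could be parsed roughly as below. This is only a standalone sketch, not the grammar from the patch: SyncSpec, parse_sync_spec(), MAX_NAMES and MAX_NAME_LEN are hypothetical names, and double-quoted names and '*' handling are deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES    8
#define MAX_NAME_LEN 64

/* Hypothetical parse result: required count plus the listed names. */
typedef struct
{
    int  num_sync;
    int  nmembers;
    char members[MAX_NAMES][MAX_NAME_LEN];
} SyncSpec;

static const char *
skip_ws(const char *p)
{
    while (isspace((unsigned char) *p))
        p++;
    return p;
}

/* Parse "N (name, name, ...)".  Returns 0 on success, -1 on syntax error. */
int
parse_sync_spec(const char *s, SyncSpec *out)
{
    char *end;
    long  n;

    assert(s != NULL && out != NULL);
    out->nmembers = 0;

    s = skip_ws(s);
    if (!isdigit((unsigned char) *s))
        return -1;
    n = strtol(s, &end, 10);
    if (n <= 0 || n > 1000)     /* arbitrary sanity limit for this sketch */
        return -1;
    out->num_sync = (int) n;

    s = skip_ws(end);
    if (*s != '(')
        return -1;
    s++;

    for (;;)
    {
        size_t len = 0;

        s = skip_ws(s);
        /* An unquoted name ends at whitespace, a comma, or a parenthesis. */
        while (s[len] != '\0' && s[len] != ',' && s[len] != ')' &&
               s[len] != '(' && !isspace((unsigned char) s[len]))
            len++;
        if (len == 0 || len >= MAX_NAME_LEN || out->nmembers >= MAX_NAMES)
            return -1;
        memcpy(out->members[out->nmembers], s, len);
        out->members[out->nmembers][len] = '\0';
        out->nmembers++;

        s = skip_ws(s + len);
        if (*s == ',')
        {
            s++;
            continue;
        }
        if (*s == ')')
        {
            s++;
            break;
        }
        return -1;
    }

    /* Only trailing whitespace may follow the closing parenthesis. */
    return (*skip_ws(s) == '\0') ? 0 : -1;
}
```

For example, "2 (standby1, standby2, standby3)" yields num_sync = 2 with three members, while the bracketed form "2 [a, b]" is rejected by this sketch.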

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
muti_sync_replication_v20.patch	text/x-diff	41.8 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-28 08:50:19
Message-ID:	20160328.175019.231062657.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Thank you for the new patch. Sorry to have overlooked some
versions. I'm looking at the v19 patch now.

make complains about an unused variable.

| syncrep.c: In function ‘SyncRepGetSyncStandbys’:
| syncrep.c:601:13: warning: variable ‘next’ set but not used [-Wunused-but-set-variable]
| ListCell *next;

At Thu, 24 Mar 2016 22:29:01 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCxwezOTf9kLQRhuf2y=1c_fGjCormqJfqHOmQW8EgaDg(at)mail(dot)gmail(dot)com>
> >> > SyncRepInitConfig()->SyncRepFreeConfig() has already pfree'd that
> >> > in the patch. Or am I missing something?
> >
> > Sorry, instead, the memory from strdup() will be abandoned in
> > upper level. (Thinking for some time..) Ah, I found that the
> > problem should be here.
> >
> > > SyncRepFreeConfig(SyncRepConfigData *config)
> > > {
> > ...
> > !> list_free(config->members);
> > > pfree(config);
> > > }
> >
> > The list_free *doesn't* free the memory blocks pointed by
> > lfirst(cell), which has been pstrdup'ed. It should be
> > list_free_deep(config->members) instead to free it completely.

Fujii> Yep, but SyncRepFreeConfig() already uses list_free_deep
Fujii> in the latest patch. Could you read the latest version
Fujii> that I posted upthread.

Sorry for having overlooked that version. Every pair of parse (or
SyncRepUpdateConfig) and SyncRepFreeConfig calls runs in the same
memory context, so it seems safe (but it might be fragile, since it
relies on the caller doing so).

> >> Previous(9.5 or before) s_s_names also accepts non-ASCII character and
> >> non-printable character, and can show it without replacing these
> >> character to '?'.
> >
> > Thank you for pointint it out (it was completely out of my
> > mind..). I have no objection to keep the previous behavior.
> >
> >> From backward compatibility perspective, we should not choose #1 or #2.
> >> Different behaviour between previous and current s_s_names is that
> >> previous s_s_names doesn't accept the node name having the sort of
> >> white-space character that isspace() returns true with.
> >> But current s_s_names allows us to specify such a node name.
> >> I guess that changing such behaviour is enough for fixing this issue.
> >> Thoughts?
> >
>
> Attached latest patch incorporating all review comments so far.
>
> Aside from the review comments, I did following changes;
> - Add logic to avoid fatal exit in yy_fatal_error().

Maybe good catch, but..

> syncrep_scanstr(const char *str)
..
> * Regain control after a fatal, internal flex error. It may have
> * corrupted parser state. Consequently, abandon the file, but trust
~~~~~~~~~~~~~~~~
> * that the state remains sane enough for syncrep_yy_delete_buffer().
~~~~~~~~~~~~~~~~~~~~~~~~

guc-file.l actually abandons the config file, but syncrep_scanner
reads only the value of a single item in it. And the latter
statement is ultimately true but a bit hard to understand.

The patch will emit a mysterious error message like this.

> invalid value for parameter "synchronous_standby_names": "2[a,b,c]"
> configuration file ".../postgresql.conf" contains errors

This is utterly wrong. Somewhat related to that, it seems to me that
syncrep_scanner.l doesn't need the same mechanism as
guc-file.l. In essence, the modification would make
call_*_check_hook tri-state instead of boolean. So just
catching errors in call_*_check_hook and ereport()'ing as the SQL
parser does seems enough, but either will do for me.

> - Improve regression test cases.

I forgot to mention it, but the additional ORDER BY makes the test
robust.

I doubt the validity of the behavior in the following test.

> # Change the synchronous_standby_names = '2[standby1,*,standby2]' and check sync_state

Is this regarded as a correct value for it?

> Also I felt a sense of discomfort regarding using [ and ] as a special
> character for priority method.
> Because (, ) and [, ] are a little similar each other, so it would
> easily make many syntax errors when nested style is supported.
> And the synopsis of that in documentation is odd;
> synchronous_standby_names = 'N [ node_name [, ...] ]'
>
> This topic has been already discussed before but, we might want to
> change it to other characters such as < and >?

I don't mind either, but as Robert said, characters that are
inherently used to enclose something should be preferred to
other characters. The distinguishability of the glyphs is
less significant, perhaps.

# LISPers don't hesitate to dive into Sea of Parens.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-28 09:25:47
Message-ID:	56F8F89B.3080906@lab.ntt.co.jp
Lists:	pgsql-hackers

On 2016/03/28 17:50, Kyotaro HORIGUCHI wrote:
>
> # LISPers don't hesitate to dive into Sea of Parens.

Sorry in advance to be off-topic: https://xkcd.com/297 :)

Thanks,
Amit

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-28 09:38:22
Message-ID:	CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Thank you for the new patch. Sorry to have overlooked some
> versions. I'm looking the v19 patch now.
>
> make complains for an unused variable.
>
> | syncrep.c: In function ‘SyncRepGetSyncStandbys’:
> | syncrep.c:601:13: warning: variable ‘next’ set but not used [-Wunused-but-set-variable]
> | ListCell *next;
>
>
> At Thu, 24 Mar 2016 22:29:01 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCxwezOTf9kLQRhuf2y=1c_fGjCormqJfqHOmQW8EgaDg(at)mail(dot)gmail(dot)com>
>> >> > SyncRepInitConfig()->SyncRepFreeConfig() has already pfree'd that
>> >> > in the patch. Or am I missing something?
>> >
>> > Sorry, instead, the memory from strdup() will be abandoned in
>> > upper level. (Thinking for some time..) Ah, I found that the
>> > problem should be here.
>> >
>> > > SyncRepFreeConfig(SyncRepConfigData *config)
>> > > {
>> > ...
>> > !> list_free(config->members);
>> > > pfree(config);
>> > > }
>> >
>> > The list_free *doesn't* free the memory blocks pointed by
>> > lfirst(cell), which has been pstrdup'ed. It should be
>> > list_free_deep(config->members) instead to free it completely.
>
> Fujii> Yep, but SyncRepFreeConfig() already uses list_free_deep
> Fujii> in the latest patch. Could you read the latest version
> Fujii> that I posted upthread.
>
> Sorry for overlooked the version. Every pair of parse(or
> SyncRepUpdateConfig) and SyncRepFreeConfig is on the same memory
> context so it seems safe (but might be fragile since it relies on
> that the caller does so.).
>
>> >> Previous(9.5 or before) s_s_names also accepts non-ASCII character and
>> >> non-printable character, and can show it without replacing these
>> >> character to '?'.
>> >
>> > Thank you for pointint it out (it was completely out of my
>> > mind..). I have no objection to keep the previous behavior.
>> >
>> >> From backward compatibility perspective, we should not choose #1 or #2.
>> >> Different behaviour between previous and current s_s_names is that
>> >> previous s_s_names doesn't accept the node name having the sort of
>> >> white-space character that isspace() returns true with.
>> >> But current s_s_names allows us to specify such a node name.
>> >> I guess that changing such behaviour is enough for fixing this issue.
>> >> Thoughts?
>> >
>>
>> Attached latest patch incorporating all review comments so far.
>>
>> Aside from the review comments, I did following changes;
>> - Add logic to avoid fatal exit in yy_fatal_error().
>
> Maybe good catch, but..
>
>> syncrep_scanstr(const char *str)
> ..
>> * Regain control after a fatal, internal flex error. It may have
>> * corrupted parser state. Consequently, abandon the file, but trust
> ~~~~~~~~~~~~~~~~
>> * that the state remains sane enough for syncrep_yy_delete_buffer().
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> guc-file.l actually abandones the config file but syncrep_scanner
> reads only a value of an item in it. And, the latter is
> eventually true but a bit hard to understand.
>
> The patch will emit a mysterious error message like this.
>
>> invalid value for parameter "synchronous_standby_names": "2[a,b,c]"
>> configuration file ".../postgresql.conf" contains errors
>
> This is utterly wrong. A bit related to that, it seems to me that
> syncrep_scan.l doesn't need the same mechanism with
> guc-file.l. The nature of the modification would be making
> call_*_check_hook to be tri-state instead of boolean. So just
> cathing errors in call_*_check_hook and ereport()'ing as SQL
> parser does seems enough, but either will do for me.

Well, I think that call_*_check_hook cannot catch such a fatal error.
If yy_fatal_error() is called without the preventing logic while the
configuration file is being reloaded, the postmaster process will exit
abnormally right away, and so will the WAL sender process.
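
The idea of regaining control from a fatal scanner error, instead of letting the process exit, can be sketched with setjmp()/longjmp(). This is a standalone illustration under stated assumptions, not the code in the patch or in guc-file.l: scanner_fatal(), scan_body() and scan_string() are hypothetical names, and the "scanner" here simply treats '!' as an internal error.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Jump target for fatal scanner errors; set only while a scan is running. */
static jmp_buf *fatal_jmp = NULL;

/*
 * Stand-in for the scanner's fatal-error hook.  Rather than exiting the
 * process, it jumps back to the caller that installed fatal_jmp.
 */
void
scanner_fatal(const char *msg)
{
    (void) msg;                 /* a real hook would report the message */
    assert(fatal_jmp != NULL);
    longjmp(*fatal_jmp, 1);
}

/* Hypothetical scanner body: '!' simulates an internal scanner failure. */
static void
scan_body(const char *str)
{
    for (; *str != '\0'; str++)
    {
        if (*str == '!')
            scanner_fatal("internal scanner error");
    }
}

/* Returns 0 on success, -1 if a fatal scanner error was intercepted. */
int
scan_string(const char *str)
{
    jmp_buf env;

    fatal_jmp = &env;
    if (setjmp(env) != 0)
    {
        /* Arrived here via longjmp(): clean up and report failure. */
        fatal_jmp = NULL;
        return -1;
    }

    scan_body(str);
    fatal_jmp = NULL;
    return 0;
}
```

A caller such as a check hook can then turn the -1 result into an ordinary "invalid value" report, and the process keeps running and can scan the next value normally.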

>
>> - Improve regression test cases.
>
> I forgot to mention that, but additionalORDER BY makes the test
> robust.
>
> I doubt the validity of the behavior in the following test.
>
>> # Change the synchronous_standby_names = '2[standby1,*,standby2]' and check sync_state
>
> Is this regarded as a correct as a value for it?

Since the previous s_s_names (9.5 or before) accepts this value, I
didn't change the behaviour.
I added this test case to check backward compatibility more closely.

Regards,

--
Masahiko Sawada

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	masao(dot)fujii(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-29 07:23:13
Message-ID:	20160329.162313.82540751.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Thank you for the new patch. Sorry to have overlooked some
> > versions. I'm looking the v19 patch now.
> >
> > make complains for an unused variable.

Thank you. I'll take a closer look at it a bit later.

> >> Attached latest patch incorporating all review comments so far.
> >>
> >> Aside from the review comments, I did following changes;
> >> - Add logic to avoid fatal exit in yy_fatal_error().
> >
> > Maybe good catch, but..
> >
> >> syncrep_scanstr(const char *str)
> > ..
> >> * Regain control after a fatal, internal flex error. It may have
> >> * corrupted parser state. Consequently, abandon the file, but trust
> > ~~~~~~~~~~~~~~~~
> >> * that the state remains sane enough for syncrep_yy_delete_buffer().
> > ~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > guc-file.l actually abandones the config file but syncrep_scanner
> > reads only a value of an item in it. And, the latter is
> > eventually true but a bit hard to understand.
> >
> > The patch will emit a mysterious error message like this.
> >
> >> invalid value for parameter "synchronous_standby_names": "2[a,b,c]"
> >> configuration file ".../postgresql.conf" contains errors
> >
> > This is utterly wrong. A bit related to that, it seems to me that
> > syncrep_scan.l doesn't need the same mechanism with
> > guc-file.l. The nature of the modification would be making
> > call_*_check_hook to be tri-state instead of boolean. So just
> > cathing errors in call_*_check_hook and ereport()'ing as SQL
> > parser does seems enough, but either will do for me.
>
> Well, I think that call_*_check_hook can not catch such a fatal error.

As mentioned in my comment, the SQL parser converts yy_fatal_error
into ereport(ERROR), which can be caught by an enclosing PG_TRY (it does
so by #define'ing fprintf). So it is doable if you are worried about exit().
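
A minimal standalone sketch of that control flow, for illustration only. It
uses plain setjmp/longjmp rather than PostgreSQL's PG_TRY and ereport(), and
every name in it (recovery_point, raise_error, toy_scan, check_value,
TOY_BUFFER_SIZE) is hypothetical, not taken from the patch or from scan.l.
The point is only that an error raised deep inside a scanner can unwind to
the caller before any exit() is reached.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

#define TOY_BUFFER_SIZE 16      /* arbitrary limit that triggers the "fatal" path */

/* Hypothetical stand-in for the innermost PG_TRY frame. */
static jmp_buf *recovery_point = NULL;

/* Message recorded by the most recent raise_error() call. */
char last_error[128];

/* Stand-in for ereport(ERROR): record the message, then unwind. */
static void raise_error(const char *msg)
{
    /* With no handler installed, a real process would simply exit. */
    assert(recovery_point != NULL);
    snprintf(last_error, sizeof(last_error), "%s", msg);
    longjmp(*recovery_point, 1);
}

/* Toy scanner: input longer than the buffer plays the role of a flex fatal error. */
static int toy_scan(const char *input)
{
    size_t len = strlen(input);

    if (len > TOY_BUFFER_SIZE)
        raise_error("fatal scanner error: input too long");
    return (int) len;
}

/* Returns 1 if the value was scanned, 0 if a scanner error was caught. */
int check_value(const char *value)
{
    jmp_buf local;
    jmp_buf *saved = recovery_point;    /* remember the outer handler */
    int ok;

    recovery_point = &local;
    if (setjmp(local) == 0)
    {
        toy_scan(value);                /* may longjmp back to the setjmp above */
        ok = 1;
    }
    else
        ok = 0;                         /* reached only through raise_error() */
    recovery_point = saved;             /* restore the outer handler */
    return ok;
}
```

Whether unwinding like this is acceptable in the postmaster is a separate
question; the sketch does not model the promotion of ERROR to FATAL there.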

> Because if yy_fatal_error() is called without preventing logic when
> reloading configuration file, postmaster process will abnormal exit
> immediately as well as wal sender process.

> >> - Improve regression test cases.
> >
> > I forgot to mention that, but an additional ORDER BY makes the test
> > robust.
> >
> > I doubt the validity of the behavior in the following test.
> >
> >> # Change the synchronous_standby_names = '2[standby1,*,standby2]' and check sync_state
> >
> > Is this regarded as a correct as a value for it?
>
> Since previous s_s_names (9.5 or before) can accept this value, I
> didn't change behaviour.
> And I added this test case for checking backward compatibility more finely.

I understand that and it's fine. But we need an explanation of
the reason above in the test case or somewhere else.
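
To illustrate why that value is accepted but looks odd, here is a small
sketch that assumes first-match priority semantics: a standby's priority is
the 1-based position of the first list entry that is either its name or "*".
The function name standby_priority and the exact-match strcmp() comparison
are simplifications made for this example, not the patch's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the 1-based position of the first entry in names[] that matches
 * application_name, or 0 if nothing matches. The entry "*" matches any name.
 */
int standby_priority(const char *const *names, size_t n_names,
                     const char *application_name)
{
    assert(names != NULL && application_name != NULL);

    for (size_t i = 0; i < n_names; i++)
    {
        if (strcmp(names[i], "*") == 0 ||
            strcmp(names[i], application_name) == 0)
            return (int) i + 1;
    }
    return 0;
}
```

Under that assumption, with the list standby1,*,standby2 a standby named
standby2 is matched by "*" first and gets priority 2, the same as any other
unlisted standby, so the third entry never takes effect.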

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-29 07:51:02
Message-ID:	CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello,
>
> At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
> sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Thank you for the new patch. Sorry to have overlooked some
>> > versions. I'm looking the v19 patch now.
>> >
>> > make complains for an unused variable.
>
> Thank you. I'll have a closer look on it a bit later.
>
>> >> Attached latest patch incorporating all review comments so far.
>> >>
>> >> Aside from the review comments, I did following changes;
>> >> - Add logic to avoid fatal exit in yy_fatal_error().
>> >
>> > Maybe good catch, but..
>> >
>> >> syncrep_scanstr(const char *str)
>> > ..
>> >> * Regain control after a fatal, internal flex error. It may have
>> >> * corrupted parser state. Consequently, abandon the file, but trust
>> > ~~~~~~~~~~~~~~~~
>> >> * that the state remains sane enough for syncrep_yy_delete_buffer().
>> > ~~~~~~~~~~~~~~~~~~~~~~~~
>> >
>> > guc-file.l actually abandones the config file but syncrep_scanner
>> > reads only a value of an item in it. And, the latter is
>> > eventually true but a bit hard to understand.
>> >
>> > The patch will emit a mysterious error message like this.
>> >
>> >> invalid value for parameter "synchronous_standby_names": "2[a,b,c]"
>> >> configuration file ".../postgresql.conf" contains errors
>> >
>> > This is utterly wrong. A bit related to that, it seems to me that
>> > syncrep_scan.l doesn't need the same mechanism with
>> > guc-file.l. The nature of the modification would be making
>> > call_*_check_hook to be tri-state instead of boolean. So just
>> > cathing errors in call_*_check_hook and ereport()'ing as SQL
>> > parser does seems enough, but either will do for me.
>>
>> Well, I think that call_*_check_hook can not catch such a fatal error.
>
> As mentioned in my comment, SQL parser converts yy_fatal_error
> into ereport(ERROR), which can be caught by the upper PG_TRY (by
> #define'ing fprintf). So it is doable if you mind exit().

I'm afraid that your idea doesn't work in the postmaster, because ereport(ERROR)
is implicitly promoted to ereport(FATAL) there. IOW, when an internal flex fatal
error occurs, the postmaster just exits instead of jumping out of the parser.

ISTM that, when an internal flex fatal error occurs, it's better to elog(FATAL)
and terminate the problematic process. This might lead to a server crash
(e.g., if the postmaster emits a FATAL error, it and all its child processes
will exit soon). But we can probably live with this because such a fatal error
rarely happens.

OTOH, if we make the process keep running even after it gets an internal
fatal error (as Sawada's patch or your idea does), this might cause a more
serious problem. Imagine the case where one walsender gets the fatal
error (e.g., because of OOM), abandons the new setting value of
synchronous_standby_names, and keeps running with the previous value,
while the other walsender processes successfully parse the setting and
keep running with the new value. The walsenders would then be working from
inconsistent settings, which would completely mess up synchronous
replication.

Therefore, I think that it's better to make the problematic process exit
with FATAL error rather than ignore the error and keep it running.

Regards,

--
Fujii Masao

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-29 08:36:44
Message-ID:	20160329.173644.22077026.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

I personally don't think it needs such a survival measure. The
syntax is very small and the parser reads very short text. If the
parser fails in such a situation, something more serious must have
occurred.

At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g(at)mail(dot)gmail(dot)com>
> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Hello,
> >
> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > As mentioned in my comment, SQL parser converts yy_fatal_error
> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
> > #define'ing fprintf). So it is doable if you mind exit().
>
> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
> flex fatal error occurs, postmaster just exits instead of jumping out of parser.

The ERROR could instead be LOG or DEBUG2, if we think the parser's
fatal errors are recoverable. guc-file.l does so.

> ISTM that, when an internal flex fatal error occurs, it's
> better to elog(FATAL) and terminate the problematic
> process. This might lead to the server crash (e.g., if
> postmaster emits a FATAL error, it and its all child processes
> will exit soon). But probably we can live with this because the
> fatal error basically rarely happens.

I agree with this.

> OTOH, if we make the process keep running even after it gets an internal
> fatal error (like Sawada's patch or your idea do), this might cause more
> serious problem. Please imagine the case where one walsender gets the fatal
> error (e.g., because of OOM), abandon new setting value of
> synchronous_standby_names, and keep running with the previous setting value.
> OTOH, the other walsender processes successfully parse the setting and
> keep running with new setting. In this case, the inconsistency of the setting
> which each walsender is based on happens. This completely will mess up the
> synchronous replication.

On the other hand, guc-file.l seems to ignore parser errors during
normal operation, even though that may cause a similar inconsistency,
if any.

| LOG: received SIGHUP, reloading configuration files
| LOG: input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
| LOG: configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied

> Therefore, I think that it's better to make the problematic process exit
> with FATAL error rather than ignore the error and keep it running.

+1. Restarting a walsender would be far less harmful than keeping
it running in a doubtful state.

Should I wait for the next version or have a look at the latest one?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-30 14:43:36
Message-ID:	CAD21AoBG7p3zzS4jYPz1d9oh+s+kU-Gen_zw7Q_tGrmFmF5FfQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I personally don't think it needs such a survive measure. It is
> very small syntax and the parser reads very short text. If the
> parser failes in such mode, something more serious should have
> occurred.
>
> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g(at)mail(dot)gmail(dot)com>
>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Hello,
>> >
>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>> > #define'ing fprintf). So it is doable if you mind exit().
>>
>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>> flex fatal error occurs, postmaster just exits instead of jumping out of parser.
>
> If The ERROR may be LOG or DEBUG2 either, if we think the parser
> fatal erros are recoverable. guc-file.l is doing so.
>
>> ISTM that, when an internal flex fatal error occurs, it's
>> better to elog(FATAL) and terminate the problematic
>> process. This might lead to the server crash (e.g., if
>> postmaster emits a FATAL error, it and its all child processes
>> will exit soon). But probably we can live with this because the
>> fatal error basically rarely happens.
>
> I agree to this
>
>> OTOH, if we make the process keep running even after it gets an internal
>> fatal error (like Sawada's patch or your idea do), this might cause more
>> serious problem. Please imagine the case where one walsender gets the fatal
>> error (e.g., because of OOM), abandon new setting value of
>> synchronous_standby_names, and keep running with the previous setting value.
>> OTOH, the other walsender processes successfully parse the setting and
>> keep running with new setting. In this case, the inconsistency of the setting
>> which each walsender is based on happens. This completely will mess up the
>> synchronous replication.
>
> On the other hand, guc-file.l seems ignoring parser errors under
> normal operation, even though it may cause similar inconsistency,
> if any..
>
> | LOG: received SIGHUP, reloading configuration files
> | LOG: input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
> | LOG: configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied
>
>> Therefore, I think that it's better to make the problematic process exit
>> with FATAL error rather than ignore the error and keep it running.
>
> +1. Restarting walsender would be far less harmful than keeping
> it running in doubtful state.
>
> Sould I wait for the next version or have a look on the latest?
>

The attached latest patch incorporates some of the review comments so far
and is rebased against current HEAD.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
multi_sync_replication_v21.patch	application/octet-stream	41.8 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-30 14:55:36
Message-ID:	CAD21AoDdC9Ek3E=_dfLi-yHLDYZpitid_Z=b+Z62uqATHbaZJQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> I personally don't think it needs such a survive measure. It is
>> very small syntax and the parser reads very short text. If the
>> parser failes in such mode, something more serious should have
>> occurred.
>>
>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g(at)mail(dot)gmail(dot)com>
>>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> > Hello,
>>> >
>>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
>>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>>> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>>> > #define'ing fprintf). So it is doable if you mind exit().
>>>
>>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
>>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>>> flex fatal error occurs, postmaster just exits instead of jumping out of parser.
>>
>> If The ERROR may be LOG or DEBUG2 either, if we think the parser
>> fatal erros are recoverable. guc-file.l is doing so.
>>
>>> ISTM that, when an internal flex fatal error occurs, it's
>>> better to elog(FATAL) and terminate the problematic
>>> process. This might lead to the server crash (e.g., if
>>> postmaster emits a FATAL error, it and its all child processes
>>> will exit soon). But probably we can live with this because the
>>> fatal error basically rarely happens.
>>
>> I agree to this
>>
>>> OTOH, if we make the process keep running even after it gets an internal
>>> fatal error (like Sawada's patch or your idea do), this might cause more
>>> serious problem. Please imagine the case where one walsender gets the fatal
>>> error (e.g., because of OOM), abandon new setting value of
>>> synchronous_standby_names, and keep running with the previous setting value.
>>> OTOH, the other walsender processes successfully parse the setting and
>>> keep running with new setting. In this case, the inconsistency of the setting
>>> which each walsender is based on happens. This completely will mess up the
>>> synchronous replication.
>>
>> On the other hand, guc-file.l seems ignoring parser errors under
>> normal operation, even though it may cause similar inconsistency,
>> if any..
>>
>> | LOG: received SIGHUP, reloading configuration files
>> | LOG: input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
>> | LOG: configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied
>>
>>> Therefore, I think that it's better to make the problematic process exit
>>> with FATAL error rather than ignore the error and keep it running.
>>
>> +1. Restarting walsender would be far less harmful than keeping
>> it running in doubtful state.
>>
>> Sould I wait for the next version or have a look on the latest?
>>
>
> Attached latest patch incorporate some review comments so far, and is
> rebased against current HEAD.
>

Sorry, I attached the wrong patch.
The attached patch is the correct one.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
multi_sync_replication_v21.patch	application/octet-stream	42.3 KB

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-03-31 04:11:40
Message-ID:	CAEepm=0Gufk1p1Mk6-2y361hpQ6cpqq3+sKckfam=RbGhY-HvQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> I personally don't think it needs such a survive measure. It is
>>> very small syntax and the parser reads very short text. If the
>>> parser failes in such mode, something more serious should have
>>> occurred.
>>>
>>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g(at)mail(dot)gmail(dot)com>
>>>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> > Hello,
>>>> >
>>>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
>>>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>>>> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>>>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>>>> > #define'ing fprintf). So it is doable if you mind exit().
>>>>
>>>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
>>>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>>>> flex fatal error occurs, postmaster just exits instead of jumping out of parser.
>>>
>>> If The ERROR may be LOG or DEBUG2 either, if we think the parser
>>> fatal erros are recoverable. guc-file.l is doing so.
>>>
>>>> ISTM that, when an internal flex fatal error occurs, it's
>>>> better to elog(FATAL) and terminate the problematic
>>>> process. This might lead to the server crash (e.g., if
>>>> postmaster emits a FATAL error, it and its all child processes
>>>> will exit soon). But probably we can live with this because the
>>>> fatal error basically rarely happens.
>>>
>>> I agree to this
>>>
>>>> OTOH, if we make the process keep running even after it gets an internal
>>>> fatal error (like Sawada's patch or your idea do), this might cause more
>>>> serious problem. Please imagine the case where one walsender gets the fatal
>>>> error (e.g., because of OOM), abandon new setting value of
>>>> synchronous_standby_names, and keep running with the previous setting value.
>>>> OTOH, the other walsender processes successfully parse the setting and
>>>> keep running with new setting. In this case, the inconsistency of the setting
>>>> which each walsender is based on happens. This completely will mess up the
>>>> synchronous replication.
>>>
>>> On the other hand, guc-file.l seems ignoring parser errors under
>>> normal operation, even though it may cause similar inconsistency,
>>> if any..
>>>
>>> | LOG: received SIGHUP, reloading configuration files
>>> | LOG: input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
>>> | LOG: configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied
>>>
>>>> Therefore, I think that it's better to make the problematic process exit
>>>> with FATAL error rather than ignore the error and keep it running.
>>>
>>> +1. Restarting walsender would be far less harmful than keeping
>>> it running in doubtful state.
>>>
>>> Sould I wait for the next version or have a look on the latest?
>>>
>>
>> Attached latest patch incorporate some review comments so far, and is
>> rebased against current HEAD.
>>
>
> Sorry I attached wrong patch.
> Attached patch is correct patch.
>
> [multi_sync_replication_v21.patch]

Here are some TPS numbers from some quick tests I ran on a set of
Amazon EC2 m3.large instances ("2 vCPU" virtual machines) configured
as primary + 3 standbys, to try out different combinations of
synchronous_commit levels and synchronous_standby_names numbers. They
were run for a short time only and these are of course systems with
limited and perhaps uneven IO and CPU, but they still give some idea
of the trends. And reassuringly, the trends are travelling in the
expected directions.

All default settings except shared_buffers = 1GB, and the GUCs
required for replication.

pgbench postgres -j2 -c2 -N bench2 -T 600

               1(*)  2(*)  3(*)
               ====  ====  ====
off          = 4056  4096  4092
local        = 1323  1299  1312
remote_write = 1130  1046   958
on           =  860   744   701
remote_apply =  785   725   604

pgbench postgres -j16 -c16 -N bench2 -T 600

               1(*)  2(*)  3(*)
               ====  ====  ====
off          = 3952  3943  3933
local        = 2964  2984  3026
remote_write = 2790  2724  2675
on           = 2731  2627  2523
remote_apply = 2627  2501  2432
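
To make the trend easier to read, the numbers in the two tables above can be
expressed as a percentage of the "off" row in the same column. The sketch
below only restates the measured TPS values; the names percent_of_off,
print_relative, tps_2_clients and tps_16_clients are made up for this
example. By this measure, remote_apply with 3(*) keeps roughly 15% of the
"off" throughput at 2 clients and roughly 62% at 16 clients.

```c
#include <assert.h>
#include <stdio.h>

#define N_LEVELS 5      /* synchronous_commit levels, one per table row */
#define N_COUNTS 3      /* 1(*), 2(*), 3(*), one per table column */

const char *const level_names[N_LEVELS] = {
    "off", "local", "remote_write", "on", "remote_apply"
};

/* TPS from the "-j2 -c2" run. */
const int tps_2_clients[N_LEVELS][N_COUNTS] = {
    {4056, 4096, 4092},
    {1323, 1299, 1312},
    {1130, 1046, 958},
    {860, 744, 701},
    {785, 725, 604},
};

/* TPS from the "-j16 -c16" run. */
const int tps_16_clients[N_LEVELS][N_COUNTS] = {
    {3952, 3943, 3933},
    {2964, 2984, 3026},
    {2790, 2724, 2675},
    {2731, 2627, 2523},
    {2627, 2501, 2432},
};

/* Throughput of one cell as a percentage of the "off" row in the same column. */
double percent_of_off(const int tps[N_LEVELS][N_COUNTS], int level, int column)
{
    assert(level >= 0 && level < N_LEVELS);
    assert(column >= 0 && column < N_COUNTS);
    assert(tps[0][column] > 0);
    return 100.0 * tps[level][column] / tps[0][column];
}

/* Print a whole table in relative terms. */
void print_relative(const char *title, const int tps[N_LEVELS][N_COUNTS])
{
    printf("%s\n", title);
    for (int level = 0; level < N_LEVELS; level++)
    {
        printf("%-12s", level_names[level]);
        for (int column = 0; column < N_COUNTS; column++)
            printf(" %6.1f%%", percent_of_off(tps, level, column));
        printf("\n");
    }
}
```

Calling print_relative("2 clients", tps_2_clients) and
print_relative("16 clients", tps_16_clients) prints both tables in this form.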

One thing I noticed is that there are LOG messages telling me when a
standby becomes a synchronous standby, but nothing to tell me if a
standby stops being a synchronous standby (i.e. because a higher priority
one has taken its place in the quorum). Would that be interesting?

Also, I spotted some tiny mistakes:

+ <indexterm zone="high-availability">
+ <primary>Dedicated language for multiple synchornous replication</primary>
+ </indexterm>

s/synchornous/synchronous/

+ /*
+ * If we are managing the sync standby, though we weren't
+ * prior to this, then announce we are now the sync standby.
+ */

s/ the / a / (two occurrences)

+ ereport(LOG,
+ (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
+ application_name, MyWalSnd->sync_standby_priority)));

s/ the / a /

offered by a transaction commit. This level of protection is referred
- to as 2-safe replication in computer science theory.
+ to as 2-safe replication in computer science theory, and group-1-safe
+ (group-safe and 1-safe) when <varname>synchronous_commit</> is set to
+ more than <literal>remote_write</>.

Why "more than"? I think those two words should be changed to "at
least", or removed.

+ <para>
+ This syntax allows us to define a synchronous group that will wait for at
+ least N standbys of them, and a comma-separated list of group members that are surrounded by
+ parantheses. The special value <literal>*</> for server name matches any standby.
+ By surrounding list of group members using parantheses, synchronous standbys are chosen from
+ that group using priority method.
+ </para>

s/parantheses/parentheses/ (two occurrences)

+ <sect2 id="dedicated-language-for-multi-sync-replication-priority">
+ <title>Prioirty Method</title>

s/Prioirty Method/Priority Method/

--
Thomas Munro
http://www.enterprisedb.com

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-02 01:20:55
Message-ID:	CAEepm=2sdDL2hs3XbWb5FORegNHBObLJ-8C2=aaeG-riZTd2Rw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Mar 31, 2016 at 5:11 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> I personally don't think it needs such a survive measure. It is
>>>> very small syntax and the parser reads very short text. If the
>>>> parser failes in such mode, something more serious should have
>>>> occurred.
>>>>
>>>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g(at)mail(dot)gmail(dot)com>
>>>>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>> > Hello,
>>>>> >
>>>>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
>>>>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>>>>> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>>>>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>>>>> > #define'ing fprintf). So it is doable if you mind exit().
>>>>>
>>>>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
>>>>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>>>>> flex fatal error occurs, postmaster just exits instead of jumping out of parser.
>>>>
>>>> If The ERROR may be LOG or DEBUG2 either, if we think the parser
>>>> fatal erros are recoverable. guc-file.l is doing so.
>>>>
>>>>> ISTM that, when an internal flex fatal error occurs, it's
>>>>> better to elog(FATAL) and terminate the problematic
>>>>> process. This might lead to the server crash (e.g., if
>>>>> postmaster emits a FATAL error, it and its all child processes
>>>>> will exit soon). But probably we can live with this because the
>>>>> fatal error basically rarely happens.
>>>>
>>>> I agree to this
>>>>
>>>>> OTOH, if we make the process keep running even after it gets an internal
>>>>> fatal error (like Sawada's patch or your idea do), this might cause more
>>>>> serious problem. Please imagine the case where one walsender gets the fatal
>>>>> error (e.g., because of OOM), abandon new setting value of
>>>>> synchronous_standby_names, and keep running with the previous setting value.
>>>>> OTOH, the other walsender processes successfully parse the setting and
>>>>> keep running with new setting. In this case, the inconsistency of the setting
>>>>> which each walsender is based on happens. This completely will mess up the
>>>>> synchronous replication.
>>>>
>>>> On the other hand, guc-file.l seems ignoring parser errors under
>>>> normal operation, even though it may cause similar inconsistency,
>>>> if any..
>>>>
>>>> | LOG: received SIGHUP, reloading configuration files
>>>> | LOG: input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
>>>> | LOG: configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied
>>>>
>>>>> Therefore, I think that it's better to make the problematic process exit
>>>>> with FATAL error rather than ignore the error and keep it running.
>>>>
>>>> +1. Restarting walsender would be far less harmful than keeping
>>>> it running in doubtful state.
>>>>
>>>> Should I wait for the next version or have a look at the latest?
>>>>
>>>
>>> Attached latest patch incorporate some review comments so far, and is
>>> rebased against current HEAD.
>>>
>>
>> Sorry I attached wrong patch.
>> Attached patch is correct patch.
>>
>> [mulit_sync_replication_v21.patch]
>
> Here are some TPS numbers from some quick tests I ran on a set of
> Amazon EC2 m3.large instances ("2 vCPU" virtual machines) configured
> as primary + 3 standbys, to try out different combinations of
> synchronous_commit levels and synchronous_standby_names numbers. They
> were run for a short time only and these are of course systems with
> limited and perhaps uneven IO and CPU, but they still give some idea
> of the trends. And reassuringly, the trends are travelling in the
> expected directions.
>
> All default settings except shared_buffers = 1GB, and the GUCs
> required for replication.
>
> pgbench postgres -j2 -c2 -N bench2 -T 600
>
>                1(*)  2(*)  3(*)
>                ====  ====  ====
> off          = 4056  4096  4092
> local        = 1323  1299  1312
> remote_write = 1130  1046   958
> on           =  860   744   701
> remote_apply =  785   725   604
>
> pgbench postgres -j16 -c16 -N bench2 -T 600
>
>                1(*)  2(*)  3(*)
>                ====  ====  ====
> off          = 3952  3943  3933
> local        = 2964  2984  3026
> remote_write = 2790  2724  2675
> on           = 2731  2627  2523
> remote_apply = 2627  2501  2432
>
> One thing I noticed is that there are LOG messages telling me when a
> standby becomes a synchronous standby, but nothing to tell me if a
> standby stops being a standby (ie because a higher priority one has
> taken its place in the quorum). Would that be interesting?
>
> Also, I spotted some tiny mistakes:
>
> + <indexterm zone="high-availability">
> + <primary>Dedicated language for multiple synchornous replication</primary>
> + </indexterm>
>
> s/synchornous/synchronous/
>
> + /*
> + * If we are managing the sync standby, though we weren't
> + * prior to this, then announce we are now the sync standby.
> + */
>
> s/ the / a / (two occurrences)
>
> + ereport(LOG,
> + (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
> + application_name, MyWalSnd->sync_standby_priority)));
>
> s/ the / a /
>
> offered by a transaction commit. This level of protection is referred
> - to as 2-safe replication in computer science theory.
> + to as 2-safe replication in computer science theory, and group-1-safe
> + (group-safe and 1-safe) when <varname>synchronous_commit</> is set to
> + more than <literal>remote_write</>.
>
> Why "more than"? I think those two words should be changed to "at
> least", or removed.
>
> + <para>
> + This syntax allows us to define a synchronous group that will wait for at
> + least N standbys of them, and a comma-separated list of group
> members that are surrounded by
> + parantheses. The special value <literal>*</> for server name
> matches any standby.
> + By surrounding list of group members using parantheses,
> synchronous standbys are chosen from
> + that group using priority method.
> + </para>
>
> s/parantheses/parentheses/ (two occurrences)
>
> + <sect2 id="dedicated-language-for-multi-sync-replication-priority">
> + <title>Prioirty Method</title>
>
> s/Prioirty Method/Priority Method/

A couple more comments:

     /*
-     * If we aren't managing the highest priority standby then just leave.
+     * If the number of sync standbys is less than requested or we aren't
+     * managing the sync standby then just leave.
      */
-    if (syncWalSnd != MyWalSnd)
+    if (!got_oldest || !am_sync)

s/ the sync / a sync /

+    /*
+     * Consider all pending standbys as sync if the number of them plus
+     * already-found sync ones is lower than the configuration requests.
+     */
+    if (list_length(result) + list_length(pending) <= SyncRepConfig->num_sync)
+        return list_concat(result, pending);

The cells from 'pending' will be attached to 'result', and 'result'
will be freed by the caller. But won't the List header object from
'pending' be leaked?
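
To make the question concrete, here is a minimal, self-contained C sketch
of a toy list. It is not PostgreSQL's List implementation: ToyList,
toy_append, toy_concat, toy_free and live_headers are all hypothetical
names. It assumes that concatenation splices the second list's cells onto
the first and, unless told otherwise, does nothing with the second list's
header.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a header-plus-cells list; NOT PostgreSQL's List. */
typedef struct Cell
{
    int          value;
    struct Cell *next;
} Cell;

typedef struct
{
    int   length;
    Cell *head;
    Cell *tail;
} ToyList;

/* Number of list headers allocated and not yet freed. */
static int live_headers = 0;

/* Append a value, allocating the header on first use (NULL = empty list). */
static ToyList *toy_append(ToyList *list, int value)
{
    Cell *c = malloc(sizeof(Cell));

    assert(c != NULL);
    c->value = value;
    c->next = NULL;
    if (list == NULL)
    {
        list = calloc(1, sizeof(ToyList));
        assert(list != NULL);
        live_headers++;
    }
    if (list->tail != NULL)
        list->tail->next = c;
    else
        list->head = c;
    list->tail = c;
    list->length++;
    return list;
}

/*
 * Splice b's cells onto a. With free_b_header == 0 the header of b is
 * left allocated, which models the suspected leak.
 */
static ToyList *toy_concat(ToyList *a, ToyList *b, int free_b_header)
{
    if (a == NULL)
        return b;
    if (b == NULL)
        return a;
    a->tail->next = b->head;
    a->tail = b->tail;
    a->length += b->length;
    if (free_b_header)
    {
        free(b);
        live_headers--;
    }
    return a;
}

/* Free every cell reachable from the list, then the header itself. */
static void toy_free(ToyList *list)
{
    Cell *c;

    if (list == NULL)
        return;
    c = list->head;
    while (c != NULL)
    {
        Cell *next = c->next;

        free(c);
        c = next;
    }
    free(list);
    live_headers--;
}
```

With result = (1) and pending = (2, 3), calling toy_concat(result, pending, 0)
and then toy_free(result) leaves live_headers at 1: the header of 'pending'
is the object that nobody frees. Passing 1 for free_b_header brings the
count back to 0. When 'result' is empty (NULL), the header of 'pending' is
simply handed back to the caller, so nothing is left behind in that case.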

+    result = lappend_int(result, i);
+    if (list_length(result) == SyncRepConfig->num_sync)
+    {
+        list_free(pending);
+        return result; /* Exit if got enough sync standbys */
+    }

If we didn't take the early return in the list-not-long-enough case
mentioned above, we should *always* exit via this return statement,
right? Since we know that the pending list had enough elements to
reach num_sync. I think that is worth a comment, and also a "not
reached" comment at the bottom of the function, if it is true.
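
The control flow in question can be sketched as follows. This is a
simplified model in C, not the patch's function: choose_sync and its
array-based interface are hypothetical, and it assumes priorities are
small positive integers with 1 being the highest and 0 meaning "not a
candidate".

```c
#include <assert.h>
#include <stddef.h>

/*
 * prio[i] is the priority of standby i (1 = highest, 0 = not a candidate).
 * Stores the indexes of the chosen sync standbys in out[], which must have
 * room for n entries, and returns how many were chosen.
 */
static size_t choose_sync(const int *prio, size_t n, size_t num_sync,
                          size_t *out)
{
    size_t candidates = 0;
    size_t chosen = 0;
    int    max_prio = 0;
    size_t i;
    int    p;

    assert(num_sync > 0);

    for (i = 0; i < n; i++)
    {
        if (prio[i] > 0)
        {
            candidates++;
            if (prio[i] > max_prio)
                max_prio = prio[i];
        }
    }

    /* List-not-long-enough case: every candidate is treated as sync. */
    if (candidates <= num_sync)
    {
        for (i = 0; i < n; i++)
            if (prio[i] > 0)
                out[chosen++] = i;
        return chosen;
    }

    /* More candidates than requested: take them in priority order. */
    for (p = 1; p <= max_prio; p++)
    {
        for (i = 0; i < n; i++)
        {
            if (prio[i] == p)
            {
                out[chosen++] = i;
                if (chosen == num_sync)
                    return chosen;  /* the only exit on this path */
            }
        }
    }

    /* Not reached: candidates > num_sync guarantees the return above. */
    assert(0);
    return chosen;
}
```

For priorities {3, 1, 2} and num_sync = 2, the function picks indexes 1 and
2 and leaves through the inner return. For {0, 2, 0} there is only one
candidate, so it leaves through the early return with index 1.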

As a future improvement, I wonder if we could avoid recomputing the
current set of sync standbys in every walsender every time we call
SyncRepReleaseWaiters, perhaps by maintaining that set incrementally
in shmem when walsender states change etc.

I don't have any other comments, other than to say: thank you to all
the people who have contributed to this feature so far and I really
really hope it goes into 9.6!

--
Thomas Munro
http://www.enterprisedb.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-04 08:28:07
Message-ID:	CAHGQGwG2Ze0YD=U35bZFQxLFU1cA_=+5v864mLHuvhKER8MkpQ@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Apr 2, 2016 at 10:20 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Thu, Mar 31, 2016 at 5:11 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> On Thu, Mar 31, 2016 at 3:55 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Wed, Mar 30, 2016 at 11:43 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> On Tue, Mar 29, 2016 at 5:36 PM, Kyotaro HORIGUCHI
>>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>> I personally don't think it needs such a survival measure. It is
>>>>> a very small syntax and the parser reads very short text. If the
>>>>> parser fails in such a mode, something more serious must have
>>>>> occurred.
>>>>>
>>>>> At Tue, 29 Mar 2016 16:51:02 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwFth8pnYhaLBx0nF8o4qmwctdzEOcWRqEu7HOwgdJGa3g(at)mail(dot)gmail(dot)com>
>>>>>> On Tue, Mar 29, 2016 at 4:23 PM, Kyotaro HORIGUCHI
>>>>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>>> > Hello,
>>>>>> >
>>>>>> > At Mon, 28 Mar 2016 18:38:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoAJMDV1EUKMfeyaV24arx4pzUjGHYbY4ZxzKpkiCUvh0Q(at)mail(dot)gmail(dot)com>
>>>>>> > sawada.mshk> On Mon, Mar 28, 2016 at 5:50 PM, Kyotaro HORIGUCHI
>>>>>> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>>>> > As mentioned in my comment, SQL parser converts yy_fatal_error
>>>>>> > into ereport(ERROR), which can be caught by the upper PG_TRY (by
>>>>>> > #define'ing fprintf). So it is doable if you mind exit().
>>>>>>
>>>>>> I'm afraid that your idea doesn't work in postmaster. Because ereport(ERROR) is
>>>>>> implicitly promoted to ereport(FATAL) in postmaster. IOW, when an internal
>>>>>> flex fatal error occurs, postmaster just exits instead of jumping out of parser.
>>>>>
>>>>> The ERROR could also be LOG or DEBUG2 instead, if we think the parser
>>>>> fatal errors are recoverable. guc-file.l does so.
>>>>>
>>>>>> ISTM that, when an internal flex fatal error occurs, it's
>>>>>> better to elog(FATAL) and terminate the problematic
>>>>>> process. This might lead to the server crash (e.g., if
>>>>>> postmaster emits a FATAL error, it and its all child processes
>>>>>> will exit soon). But probably we can live with this because the
>>>>>> fatal error basically rarely happens.
>>>>>
>>>>> I agree to this
>>>>>
>>>>>> OTOH, if we make the process keep running even after it gets an internal
>>>>>> fatal error (like Sawada's patch or your idea do), this might cause more
>>>>>> serious problem. Please imagine the case where one walsender gets the fatal
>>>>>> error (e.g., because of OOM), abandon new setting value of
>>>>>> synchronous_standby_names, and keep running with the previous setting value.
>>>>>> OTOH, the other walsender processes successfully parse the setting and
>>>>>> keep running with new setting. In this case, the inconsistency of the setting
>>>>>> which each walsender is based on happens. This completely will mess up the
>>>>>> synchronous replication.
>>>>>
>>>>> On the other hand, guc-file.l seems ignoring parser errors under
>>>>> normal operation, even though it may cause similar inconsistency,
>>>>> if any..
>>>>>
>>>>> | LOG: received SIGHUP, reloading configuration files
>>>>> | LOG: input in flex scanner failed at file "/home/horiguti/data/data_work/postgresql.conf" line 1
>>>>> | LOG: configuration file "/home/horiguti/data/data_work/postgresql.conf" contains errors; no changes were applied
>>>>>
>>>>>> Therefore, I think that it's better to make the problematic process exit
>>>>>> with FATAL error rather than ignore the error and keep it running.
>>>>>
>>>>> +1. Restarting walsender would be far less harmful than keeping
>>>>> it running in doubtful state.
>>>>>
>>>>> Should I wait for the next version or have a look at the latest?
>>>>>
>>>>
>>>> Attached latest patch incorporate some review comments so far, and is
>>>> rebased against current HEAD.
>>>>
>>>
>>> Sorry I attached wrong patch.
>>> Attached patch is correct patch.

Thanks for updating the patch!

I applied the following changes to the patch.
Attached is the revised version of the patch.

- Changed syncrep_flex_fatal() so that it just calls ereport(FATAL), based on
the recent discussion with Horiguchi-san.
- Improved the documentation.
- Fixed some bugs.
- Removed the changes for recovery testing framework. I'd like to commit
those changes later separately from the main patch of multiple sync rep.

Barring any objections, I'll commit this patch.

>> One thing I noticed is that there are LOG messages telling me when a
>> standby becomes a synchronous standby, but nothing to tell me if a
>> standby stops being a standby (ie because a higher priority one has
>> taken its place in the quorum). Would that be interesting?

Confirmed that there is no typo "synchornous" in the latest patch.

>> + /*
>> + * If we are managing the sync standby, though we weren't
>> + * prior to this, then announce we are now the sync standby.
>> + */
>>
>> s/ the / a / (two occurrences)

Fixed.

>> + ereport(LOG,
>> + (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
>> + application_name, MyWalSnd->sync_standby_priority)));
>>
>> s/ the / a /

I have no objection to this change itself. But this message already
exists in 9.5 and earlier releases, so if we apply this change we will
probably need to back-patch it.

>>
>> offered by a transaction commit. This level of protection is referred
>> - to as 2-safe replication in computer science theory.
>> + to as 2-safe replication in computer science theory, and group-1-safe
>> + (group-safe and 1-safe) when <varname>synchronous_commit</> is set to
>> + more than <literal>remote_write</>.
>>
>> Why "more than"? I think those two words should be changed to "at
>> least", or removed.

Removed.

>> + <para>
>> + This syntax allows us to define a synchronous group that will wait for at
>> + least N standbys of them, and a comma-separated list of group
>> members that are surrounded by
>> + parantheses. The special value <literal>*</> for server name
>> matches any standby.
>> + By surrounding list of group members using parantheses,
>> synchronous standbys are chosen from
>> + that group using priority method.
>> + </para>
>>
>> s/parantheses/parentheses/ (two occurrences)

Confirmed that this typo doesn't exist in the latest patch.

>>
>> + <sect2 id="dedicated-language-for-multi-sync-replication-priority">
>> + <title>Prioirty Method</title>
>>
>> s/Prioirty Method/Priority Method/

Confirmed that this typo doesn't exist in the latest patch.

> A couple more comments:
>
> /*
> - * If we aren't managing the highest priority standby then just leave.
> + * If the number of sync standbys is less than requested or we aren't
> + * managing the sync standby then just leave.
> */
> - if (syncWalSnd != MyWalSnd)
> + if (!got_oldest || !am_sync)
>
> s/ the sync / a sync /

Fixed.

> + /*
> + * Consider all pending standbys as sync if the number of them plus
> + * already-found sync ones is lower than the configuration requests.
> + */
> + if (list_length(result) + list_length(pending) <= SyncRepConfig->num_sync)
> + return list_concat(result, pending);
>
> The cells from 'pending' will be attached to 'result', and 'result'
> will be freed by the caller. But won't the List header object from
> 'pending' be leaked?

Yes if 'result' is not NIL. I added pfree(pending) for that case.

> + result = lappend_int(result, i);
> + if (list_length(result) == SyncRepConfig->num_sync)
> + {
> + list_free(pending);
> + return result; /* Exit if got enough sync standbys */
> + }
>
> If we didn't take the early return in the list-not-long-enough case
> mentioned above, we should *always* exit via this return statement,
> right? Since we know that the pending list had enough elements to
> reach num_sync. I think that is worth a comment, and also a "not
> reached" comment at the bottom of the function, if it is true.

Good catch! I added the comments. Also added Assert(false) at
the bottom of the function.

> As a future improvement, I wonder if we could avoid recomputing the
> current set of sync standbys in every walsender every time we call
> SyncRepReleaseWaiters, perhaps by maintaining that set incrementally
> in shmem when walsender states change etc.

> I don't have any other comments, other than to say: thank you to all
> the people who have contributed to this feature so far and I really
> really hope it goes into 9.6!

+1000

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
multi_sync_replication_v22.patch	text/x-patch	47.0 KB

From:	Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-04 08:59:20
Message-ID:	20160404085920.GA7426@toroid.org
Lists:	pgsql-hackers

At 2016-04-04 17:28:07 +0900, masao(dot)fujii(at)gmail(dot)com wrote:
>
> Barring any objections, I'll commit this patch.

No objections, just a minor wording tweak:

doc/src/sgml/config.sgml:

"The synchronous standbys will be the standbys that their names appear
early in this list" should be "The synchronous standbys will be those
whose names appear earlier in this list".

doc/src/sgml/high-availability.sgml:

"The standbys that their names appear early in this list are given
higher priority and will be considered as synchronous" should be "The
standbys whose names appear earlier in the list are given higher
priority and will be considered as synchronous".

"The standbys that their names appear early in the list will be used as
the synchronous standby" should be "The standbys whose names appear
earlier in the list will be used as synchronous standbys".

You may prefer to reword this in some other way, but the current "that
their names appear" wording should be changed.

-- Abhijit

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	thomas(dot)munro(at)enterprisedb(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-04 09:03:41
Message-ID:	20160404.180341.148979338.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello, thank you for testing.

At Sat, 2 Apr 2016 14:20:55 +1300, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote in <CAEepm=2sdDL2hs3XbWb5FORegNHBObLJ-8C2=aaeG-riZTd2Rw(at)mail(dot)gmail(dot)com>
> >>> Attached latest patch incorporate some review comments so far, and is
> >>> rebased against current HEAD.
> >>>
> >>
> >> Sorry I attached wrong patch.
> >> Attached patch is correct patch.
> >>
> >> [mulit_sync_replication_v21.patch]
> >
> > Here are some TPS numbers from some quick tests I ran on a set of
> > Amazon EC2 m3.large instances ("2 vCPU" virtual machines) configured
> > as primary + 3 standbys, to try out different combinations of
> > synchronous_commit levels and synchronous_standby_names numbers. They
> > were run for a short time only and these are of course systems with
> > limited and perhaps uneven IO and CPU, but they still give some idea
> > of the trends. And reassuringly, the trends are travelling in the
> > expected directions.
> >
> > All default settings except shared_buffers = 1GB, and the GUCs
> > required for replication.
> >
> > pgbench postgres -j2 -c2 -N bench2 -T 600
> >
> >                1(*)  2(*)  3(*)
> >                ====  ====  ====
> > off          = 4056  4096  4092
> > local        = 1323  1299  1312
> > remote_write = 1130  1046   958
> > on           =  860   744   701
> > remote_apply =  785   725   604
> >
> > pgbench postgres -j16 -c16 -N bench2 -T 600
> >
> >                1(*)  2(*)  3(*)
> >                ====  ====  ====
> > off          = 3952  3943  3933
> > local        = 2964  2984  3026
> > remote_write = 2790  2724  2675
> > on           = 2731  2627  2523
> > remote_apply = 2627  2501  2432
> >
> > One thing I noticed is that there are LOG messages telling me when a
> > standby becomes a synchronous standby, but nothing to tell me if a
> > standby stops being a standby (ie because a higher priority one has
> > taken its place in the quorum). Would that be interesting?

A walsender exits by proc_exit() for any operational
termination so wrapping proc_exit() should work. (Attached file 1)

For the setting "2(Sby1, Sby2, Sby3)", the master reports all
of the standbys as sync standbys, and no message is emitted when
Sby1 fails, which should cause the promotion of Sby3.

> standby "Sby3" is now the synchronous standby with priority 3
> standby "Sby2" is now the synchronous standby with priority 2
> standby "Sby1" is now the synchronous standby with priority 1
..<Sby 1 failure>
> standby "Sby3" is now the synchronous standby with priority 3

Sby3 becomes the sync standby twice :p

This behavior was inherited from the single-sync-rep era, but
it is confusing under the new sync-rep selection mechanism.
With the second attached diff, the messages become the following.

> 17:48:21.969 LOG: standby "Sby3" is now a synchronous standby with priority 3
> 17:48:23.087 LOG: standby "Sby2" is now a synchronous standby with priority 2
> 17:48:25.617 LOG: standby "Sby1" is now a synchronous standby with priority 1
> 17:48:31.990 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
> 17:48:43.905 LOG: standby "Sby3" is now a synchronous standby with priority 3
> 17:49:10.262 LOG: standby "Sby1" is now a synchronous standby with priority 1
> 17:49:13.865 LOG: standby "Sby3" is now a potential synchronous standby with priority 3

Since this status check takes place on every reply from the
standbys, the message about downgrading to "potential" may be
deferred or may even fail to appear, but that should not be a problem.
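
The reason the downgrade message can be late is that nothing is logged
until the walsender next re-evaluates its own state. A minimal sketch in
C, assuming a hypothetical per-walsender record of the last announced
state (SbyState and note_state are illustrative names and are not taken
from the attached diffs):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum
{
    SBY_UNKNOWN,    /* nothing announced yet */
    SBY_POTENTIAL,
    SBY_SYNC
} SbyState;

/*
 * Called whenever a standby reply is processed. Formats a message into
 * buf and returns 1 only when the state differs from the last announced
 * one; otherwise leaves buf empty and returns 0.
 */
static int note_state(SbyState *prev, SbyState now, const char *name,
                      int priority, char *buf, size_t buflen)
{
    assert(now != SBY_UNKNOWN);
    assert(buflen > 0);

    buf[0] = '\0';
    if (*prev == now)
        return 0;

    *prev = now;
    snprintf(buf, buflen,
             "standby \"%s\" is now a %ssynchronous standby with priority %d",
             name, now == SBY_POTENTIAL ? "potential " : "", priority);
    return 1;
}
```

In this model a standby that is displaced from the sync set but sends no
further reply never reaches note_state() again, so its "potential" message
is never produced.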

With both of the above patches applied, the messages would look like
the following.

> 17:54:08.367 LOG: standby "Sby3" is now a synchronous standby with priority 3
> 17:54:08.564 LOG: standby "Sby1" is now a synchronous standby with priority 1
> 17:54:08.565 LOG: standby "Sby2" is now a synchronous standby with priority 2
> 17:54:18.387 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
> 17:54:28.887 LOG: synchronous standby "Sby1" with priority 1 exited
> 17:54:31.359 LOG: standby "Sby3" is now a synchronous standby with priority 3
> 17:54:39.008 LOG: standby "Sby1" is now a synchronous standby with priority 1
> 17:54:41.382 LOG: standby "Sby3" is now a potential synchronous standby with priority 3

Does this make sense?

By the way, Sawada-san, you have changed the brackets for the
priority method from '[]' to '()'. I mistakenly defined
s_s_names as '2[Sby1, Sby2, Sby3]' and got the wrong behavior, that
is, only Sby2 was registered as a mandatory synchronous standby.

In this case, the three members of SyncRepConfig are '2[Sby1,',
'Sby2', 'Sby3]'. This syntax is valid under the current
specification but will surely take on a different meaning with future
changes. We should reject this known-to-be-wrong-in-future syntax
starting now.
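
As an illustration of how a bracketed value can slip through as a plain
name list, here is a small C sketch. It is not PostgreSQL's parser:
split_names and has_reserved_bracket are hypothetical helpers, and the
sketch assumes the value is simply split on commas with surrounding
whitespace trimmed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Split src on commas and trim whitespace around each piece.
 * Modifies src in place; stores at most max pointers in names[]
 * and returns how many were stored.
 */
static size_t split_names(char *src, char **names, size_t max)
{
    size_t n = 0;
    char  *p = src;

    assert(src != NULL);

    while (p != NULL && n < max)
    {
        char *comma = strchr(p, ',');
        char *end;

        if (comma != NULL)
            *comma = '\0';
        while (isspace((unsigned char) *p))
            p++;
        end = p + strlen(p);
        while (end > p && isspace((unsigned char) end[-1]))
            *--end = '\0';
        names[n++] = p;
        p = (comma != NULL) ? comma + 1 : NULL;
    }
    return n;
}

/* Hypothetical guard: treat '[' and ']' as reserved for future syntax. */
static int has_reserved_bracket(const char *name)
{
    return strpbrk(name, "[]") != NULL;
}
```

Splitting "2[Sby1, Sby2, Sby3]" this way yields three pieces, of which the
first and last still carry a bracket and only the middle one is a clean
standby name. A check like has_reserved_bracket() applied to each piece
would let the setting be rejected up front instead of being silently
accepted.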

Also, this error was very hard to notice: pg_settings only shows the
string itself.

=# select name, setting from pg_settings where name = 'synchronous_standby_names';
           name            |       setting
---------------------------+---------------------
 synchronous_standby_names | 2[Sby1, Sby2, Sby3]
(1 row)

Since the syntax is no longer so simple, we may need some way
to see the current standby-group setting clearly, though that won't be
necessary if we reject the known-to-be-wrong-in-future syntax now.


regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
multi_sync_exit_msg.diff	text/x-patch	3.1 KB
multi_sync_potential_msg.diff	text/x-patch	1.7 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-04 09:35:34
Message-ID:	CANP8+jJt_AVMoEUYGCdO8XXm1vPoNLgDxkUBiGRZoNhdBxSt_w@mail.gmail.com
Lists:	pgsql-hackers

On 4 April 2016 at 09:28, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> Barring any objections, I'll commit this patch.
>

That sounds good.

May I have one more day to review this? Actually more like 3-4 hours.

I have no comments on an initial read, so I'm hopeful of having nothing at
all to say on it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-04 09:45:46
Message-ID:	20160404094546.GB21257@awork2.anarazel.de
Lists:	pgsql-hackers

On 2016-04-04 10:35:34 +0100, Simon Riggs wrote:
> On 4 April 2016 at 09:28, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > Barring any objections, I'll commit this patch.

No objection here either, just one question: Has anybody thought about
the ability to extend this to do per-database syncrep? Logical decoding
works on a database level, and that can cause some problems with global
configuration.

> That sounds good.
>
> May I have one more day to review this? Actually more like 3-4 hours.

> I have no comments on an initial read, so I'm hopeful of having nothing at
> all to say on it.

Simon, perhaps you could hold the above question in your mind while
looking through this?

Thanks,

Andres

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-04 13:00:24
Message-ID:	CAD21AoDoq1ubY4KkKhrA9jzaVXekwAT7gV5pQJbS+wj98b9-3A@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 4, 2016 at 6:03 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello, thank you for testing.
>
> At Sat, 2 Apr 2016 14:20:55 +1300, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote in <CAEepm=2sdDL2hs3XbWb5FORegNHBObLJ-8C2=aaeG-riZTd2Rw(at)mail(dot)gmail(dot)com>
>> >>> Attached latest patch incorporate some review comments so far, and is
>> >>> rebased against current HEAD.
>> >>>
>> >>
>> >> Sorry I attached wrong patch.
>> >> Attached patch is correct patch.
>> >>
>> >> [mulit_sync_replication_v21.patch]
>> >
>> > Here are some TPS numbers from some quick tests I ran on a set of
>> > Amazon EC2 m3.large instances ("2 vCPU" virtual machines) configured
>> > as primary + 3 standbys, to try out different combinations of
>> > synchronous_commit levels and synchronous_standby_names numbers. They
>> > were run for a short time only and these are of course systems with
>> > limited and perhaps uneven IO and CPU, but they still give some idea
>> > of the trends. And reassuringly, the trends are travelling in the
>> > expected directions.
>> >
>> > All default settings except shared_buffers = 1GB, and the GUCs
>> > required for replication.
>> >
>> > pgbench postgres -j2 -c2 -N bench2 -T 600
>> >
>> >                1(*)  2(*)  3(*)
>> >                ====  ====  ====
>> > off          = 4056  4096  4092
>> > local        = 1323  1299  1312
>> > remote_write = 1130  1046   958
>> > on           =  860   744   701
>> > remote_apply =  785   725   604
>> >
>> > pgbench postgres -j16 -c16 -N bench2 -T 600
>> >
>> >                1(*)  2(*)  3(*)
>> >                ====  ====  ====
>> > off          = 3952  3943  3933
>> > local        = 2964  2984  3026
>> > remote_write = 2790  2724  2675
>> > on           = 2731  2627  2523
>> > remote_apply = 2627  2501  2432
>> >
>> > One thing I noticed is that there are LOG messages telling me when a
>> > standby becomes a synchronous standby, but nothing to tell me if a
>> > standby stops being a standby (ie because a higher priority one has
>> > taken its place in the quorum). Would that be interesting?
>
> A walsender exits by proc_exit() for any operational
> termination so wrapping proc_exit() should work. (Attached file 1)
>
> For the setting "2(Sby1, Sby2, Sby3)", the master reports all
> of the standbys as sync standbys, and no message is emitted when
> Sby1 fails, which should cause the promotion of Sby3.
>
>> standby "Sby3" is now the synchronous standby with priority 3
>> standby "Sby2" is now the synchronous standby with priority 2
>> standby "Sby1" is now the synchronous standby with priority 1
> ..<Sby 1 failure>
>> standby "Sby3" is now the synchronous standby with priority 3
>
> Sby3 becomes sync standby twice:p
>
> This behavior was carried over from the single-sync-rep era, but
> it is confusing under the new sync-rep selection mechanism.
> The second attached diff changes the messages as follows.
>
>
>> 17:48:21.969 LOG: standby "Sby3" is now a synchronous standby with priority 3
>> 17:48:23.087 LOG: standby "Sby2" is now a synchronous standby with priority 2
>> 17:48:25.617 LOG: standby "Sby1" is now a synchronous standby with priority 1
>> 17:48:31.990 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>> 17:48:43.905 LOG: standby "Sby3" is now a synchronous standby with priority 3
>> 17:49:10.262 LOG: standby "Sby1" is now a synchronous standby with priority 1
>> 17:49:13.865 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>
> Since this status check takes place on every reply from
> standbys, the message about downgrading to "potential" may be
> deferred or even fail to appear, but that should be no problem.
>
> With both of the above patches applied, the messages would look
> like the following.
>
>> 17:54:08.367 LOG: standby "Sby3" is now a synchronous standby with priority 3
>> 17:54:08.564 LOG: standby "Sby1" is now a synchronous standby with priority 1
>> 17:54:08.565 LOG: standby "Sby2" is now a synchronous standby with priority 2
>> 17:54:18.387 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>> 17:54:28.887 LOG: synchronous standby "Sby1" with priority 1 exited
>> 17:54:31.359 LOG: standby "Sby3" is now a synchronous standby with priority 3
>> 17:54:39.008 LOG: standby "Sby1" is now a synchronous standby with priority 1
>> 17:54:41.382 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>
> Does this make sense?
>
> By the way, Sawada-san, you have changed the parentheses for the
> priority method from '[]' to '()'. I mistakenly defined
> s_s_names as '2[Sby1, Sby2, Sby3]' and got the wrong behavior, that
> is, only Sby2 is registered as a mandatory synchronous standby.
>
> For this case, the three members of SyncRepConfig are '2[Sby1,',
> 'Sby2', 'Sby3]'. This syntax is valid under the current
> specification but will surely get a different meaning with future
> changes. We should reject this known-to-be-wrong-in-future syntax
> from now on.
>

I have no objection to the current version of the patch.
But one optimisation I came up with is to return false before
calculating the lowest LSN among the sync standbys if MyWalSnd is not
listed in sync_standbys.
For example, in SyncRepGetOldestSyncRecPtr():

==
sync_standbys = SyncRepGetSyncStandbys();

if (list_length(sync_standbys) < SyncRepConfig->num_sync)
{
(snip)
}

/* Here if MyWalSnd is not listed in sync_standbys, quick exit. */
if (!list_member_int(sync_standbys, MyWalSnd->slotno))
    return false;

foreach(cell, sync_standbys)
{
(snip)
}
==

> For this case, the three members of SyncRepConfig are '2[Sby1,',
> 'Sby2', 'Sby3]'. This syntax is valid under the current
> specification but will surely get a different meaning with future
> changes. We should reject this known-to-be-wrong-in-future syntax
> from now on.

I couldn't get your point. Why would the meaning of the above syntax
change with a future change?
I thought that another method would use another kind of parentheses.

Regards,

--
Masahiko Sawada

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-04 13:03:55
Message-ID:	CANP8+jKHT1=7unXWJv_pisriHbPQRE7joAPCa1PTbJLouCiyZw@mail.gmail.com
Lists:	pgsql-hackers

On 4 April 2016 at 10:45, Andres Freund <andres(at)anarazel(dot)de> wrote:

>
> Simon, perhaps you could hold the above question in your mind while
> looking through this?
>

Sure, np.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 07:31:22
Message-ID:	CAA4eK1L55ZgqZKFDQrtR7dnxuY+EYU3aPh+FV1avHgrxWierhg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 4, 2016 at 1:58 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>
> Thanks for updating the patch!
>
> I applied the following changes to the patch.
> Attached is the revised version of the patch.
>

1.
{
{"synchronous_standby_names", PGC_SIGHUP, REPLICATION_MASTER,
gettext_noop("List of names of potential synchronous standbys."),
NULL,
GUC_LIST_INPUT
},
&SyncRepStandbyNames,
"",
check_synchronous_standby_names, NULL, NULL
},

Isn't it better to modify the description of synchronous_standby_names in
guc.c based on new usage?

2.
pg_stat_get_wal_senders()
{
..
/*
! * Allocate and update the config data of synchronous replication,
! * and then get the currently active synchronous standbys.
*/
+ SyncRepUpdateConfig();
LWLockAcquire(SyncRepLock, LW_SHARED);
! sync_standbys = SyncRepGetSyncStandbys();
LWLockRelease(SyncRepLock);
..
}

Why is it important to update the config here with this patch? Earlier, too,
any update to the config between calls wouldn't have been visible.

3.
<title>Planning for High Availability</title>

<para>
! <varname>synchronous_standby_names</> specifies the number of
! synchronous standbys that transaction commits made when

Would it be better to say: <varname>synchronous_standby_names</> specifies
the number and names of

4.
+ /*
+ * Return the list of sync standbys, or NIL if no sync standby is connected.
+ *
+ * If there are multiple standbys with the same priority,
+ * the first one found is considered as higher priority.

Here line indentation of second line can be improved.

5.
! /*
! * syncrep_yyparse sets the global syncrep_parse_result as side effect.
! * But this function is required to just check, so frees it
! * once parsing parameter.
! */
! SyncRepFreeConfig(syncrep_parse_result);

How about below change in comment?
/so frees it once parsing parameter/so frees it after parsing the parameter

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 07:35:53
Message-ID:	CANP8+jLHfBVv_pW6grASNUpW+bdk5DcTu7GWpNAP-+-ZWvKT6w@mail.gmail.com
Lists:	pgsql-hackers

On 4 April 2016 at 10:35, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> On 4 April 2016 at 09:28, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>
>> Barring any objections, I'll commit this patch.
>>
>
> That sounds good.
>
> May I have one more day to review this? Actually more like 3-4 hours.
>

What we have here is useful and elegant. I love the simplicity and
backwards compatibility of the design. Very nice, chef.

I am in favour of committing something for 9.6, though I do have some
objective comments

1. Header comments in syncrep.c need changes, not just additions.

2. We need tests to ensure that k >=1 and k<=N

3. There should be a WARNING if k == N to say that we don't yet provide a
level to give Apply consistency. (I mean if we specify 2 (n1, n2) or 3 (n1,
n2, n3), etc.)

4. How does it work?
It's pretty strange, but that isn't documented anywhere. It took me a while
to figure it out even though I know that code. My thought is it's a lot
slower than before, which is a concern when we know by definition that k
>=2 for the new feature. I was going to mention the fact that this code
only needs to be executed by standbys mentioned in s_s_n, so we can avoid
overhead and contention for async standbys (But Masahiko just mentioned
that also).

5. Timing – k > 1 will be slower by definition and more complex to
configure, yet there is no timing facility to measure the effect of this,
even though we have a new timing facility in 9.6. It would be useful to
have a track_syncrep option to keep track of typical response times from
nodes.

6. Meaning of k (n1, n2, n3) with N servers

It's clearly documented that this means k replies IN SEQUENCE. I believe
the typical meaning would be “any k out of N”, which would be faster
than what we have, e.g.
2 (n1, n2, n3) would release as soon as (n1, n2) or (n2, n3) or (n1, n3)
acknowledge.

The “any k” option is not currently possible, but could be added fairly
easily. The syntax should also be easily extensible.

I would call what we have now “first” semantics, and we could have both of
these...

* first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
* any k (n1, n2, n3) – would release waiters as soon as we have the
responses from k out of N standbys. “any k” would be faster, so is
desirable for performance and resilience

>>> So I am suggesting we put an extra keyword in front of the “k”, to
explain how the k responses should be gathered as an extension to the
syntax. I also think implementing “any k” is actually fairly trivial and
could be done for 9.6 (rather than just "first k").

Future thoughts that relate to syntax choices now, not for 9.6

Eventually I would want to be able to specify this…
2 ( any (london1, london2), any (nyc1, nyc2))
meaning I want a response from at least 1 London server and at least one
NYC server, but whichever one responds first doesn't matter.

And I also want to be able to specify node groups in there. So elsewhere we
would specify London node group as (London1, London2) and NYC node group as
(NYC1, NYC2) and then specify

any 2 (London, NYC, Tokyo).

Good work

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 07:58:46
Message-ID:	57037036.3070105@lab.ntt.co.jp
Lists:	pgsql-hackers

On 2016/04/05 16:35, Simon Riggs wrote:
> 6. Meaning of k (n1, n2, n3) with N servers
>
> It's clearly documented that this means k replies IN SEQUENCE. I believe
> the typical meaning would be “any k out of N”, which would be faster
> than what we have, e.g.
> 2 (n1, n2, n3) would release as soon as (n1, n2) or (n2, n3) or (n1, n3)
> acknowledge.
>
> The “any k” option is not currently possible, but could be added fairly
> easily. The syntax should also be easily extensible.
>
> I would call what we have now “first” semantics, and we could have both of
> these...
>
> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
> * any k (n1, n2, n3) – would release waiters as soon as we have the
> responses from k out of N standbys. “any k” would be faster, so is
> desirable for performance and resilience
>
>>>> So I am suggesting we put an extra keyword in front of the “k”, to
> explain how the k responses should be gathered as an extension to the
> syntax. I also think implementing “any k” is actually fairly trivial and
> could be done for 9.6 (rather than just "first k").

+1 for 'first/any k (...)', with possibly only 'first' supported for now,
if the 'any' case is more involved than we would like to spend time on,
given the time considerations. IMHO, the extra keyword adds to clarity of
the syntax.

Thanks,
Amit

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 08:48:47
Message-ID:	CAHGQGwEkuUe5XHkjWkK6hznBjVA+Kg_aNCxDA8iAZCVC+3j-9A@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 4, 2016 at 5:59 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
> At 2016-04-04 17:28:07 +0900, masao(dot)fujii(at)gmail(dot)com wrote:
>>
>> Barring any objections, I'll commit this patch.
>
> No objections, just a minor wording tweak:
>
> doc/src/sgml/config.sgml:
>
> "The synchronous standbys will be the standbys that their names appear
> early in this list" should be "The synchronous standbys will be those
> whose names appear earlier in this list".
>
> doc/src/sgml/high-availability.sgml:
>
> "The standbys that their names appear early in this list are given
> higher priority and will be considered as synchronous" should be "The
> standbys whose names appear earlier in the list are given higher
> priority and will be considered as synchronous".
>
> "The standbys that their names appear early in the list will be used as
> the synchronous standby" should be "The standbys whose names appear
> earlier in the list will be used as synchronous standbys".
>
> You may prefer to reword this in some other way, but the current "that
> their names appear" wording should be changed.

Thanks for the review! Will apply these comments to new patch.

Regards,

--
Fujii Masao

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 09:08:20
Message-ID:	CAHGQGwG+DM=LCctG6q_Uxkgk17CbLKrHBwtPfUN3+Hu9QbvNuQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 4, 2016 at 10:00 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Mon, Apr 4, 2016 at 6:03 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Hello, thank you for testing.
>>
>> At Sat, 2 Apr 2016 14:20:55 +1300, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote in <CAEepm=2sdDL2hs3XbWb5FORegNHBObLJ-8C2=aaeG-riZTd2Rw(at)mail(dot)gmail(dot)com>
>>> >>> Attached latest patch incorporate some review comments so far, and is
>>> >>> rebased against current HEAD.
>>> >>>
>>> >>
>>> >> Sorry I attached wrong patch.
>>> >> Attached patch is correct patch.
>>> >>
>>> >> [mulit_sync_replication_v21.patch]
>>> >
>>> > Here are some TPS numbers from some quick tests I ran on a set of
>>> > Amazon EC2 m3.large instances ("2 vCPU" virtual machines) configured
>>> > as primary + 3 standbys, to try out different combinations of
>>> > synchronous_commit levels and synchronous_standby_names numbers. They
>>> > were run for a short time only and these are of course systems with
>>> > limited and perhaps uneven IO and CPU, but they still give some idea
>>> > of the trends. And reassuringly, the trends are travelling in the
>>> > expected directions.
>>> >
>>> > All default settings except shared_buffers = 1GB, and the GUCs
>>> > required for replication.
>>> >
>>> > pgbench postgres -j2 -c2 -N bench2 -T 600
>>> >
>>> >                  1(*)  2(*)  3(*)
>>> >                  ====  ====  ====
>>> > off          =   4056  4096  4092
>>> > local        =   1323  1299  1312
>>> > remote_write =   1130  1046   958
>>> > on           =    860   744   701
>>> > remote_apply =    785   725   604
>>> >
>>> > pgbench postgres -j16 -c16 -N bench2 -T 600
>>> >
>>> >                  1(*)  2(*)  3(*)
>>> >                  ====  ====  ====
>>> > off          =   3952  3943  3933
>>> > local        =   2964  2984  3026
>>> > remote_write =   2790  2724  2675
>>> > on           =   2731  2627  2523
>>> > remote_apply =   2627  2501  2432
>>> >
>>> > One thing I noticed is that there are LOG messages telling me when a
>>> > standby becomes a synchronous standby, but nothing to tell me if a
>>> > standby stops being a standby (ie because a higher priority one has
>>> > taken its place in the quorum). Would that be interesting?
>>
>> A walsender exits by proc_exit() for any operational
>> termination so wrapping proc_exit() should work. (Attached file 1)
>>
>> For the setting "2(Sby1, Sby2, Sby3)", the master says that all
>> of the standbys are sync-standbys and no message is emitted on
>> failure of Sby1, which should cause a promotion of Sby3.
>>
>>> standby "Sby3" is now the synchronous standby with priority 3
>>> standby "Sby2" is now the synchronous standby with priority 2
>>> standby "Sby1" is now the synchronous standby with priority 1
>> ..<Sby 1 failure>
>>> standby "Sby3" is now the synchronous standby with priority 3
>>
>> Sby3 becomes sync standby twice:p
>>
>> This behavior was carried over from the single-sync-rep era, but
>> it is confusing under the new sync-rep selection mechanism.
>> The second attached diff changes the messages as follows.
>>
>>
>>> 17:48:21.969 LOG: standby "Sby3" is now a synchronous standby with priority 3
>>> 17:48:23.087 LOG: standby "Sby2" is now a synchronous standby with priority 2
>>> 17:48:25.617 LOG: standby "Sby1" is now a synchronous standby with priority 1
>>> 17:48:31.990 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>>> 17:48:43.905 LOG: standby "Sby3" is now a synchronous standby with priority 3
>>> 17:49:10.262 LOG: standby "Sby1" is now a synchronous standby with priority 1
>>> 17:49:13.865 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>>
>> Since this status check takes place on every reply from
>> standbys, the message about downgrading to "potential" may be
>> deferred or even fail to appear, but that should be no problem.
>>
>> With both of the above patches applied, the messages would look
>> like the following.
>>
>>> 17:54:08.367 LOG: standby "Sby3" is now a synchronous standby with priority 3
>>> 17:54:08.564 LOG: standby "Sby1" is now a synchronous standby with priority 1
>>> 17:54:08.565 LOG: standby "Sby2" is now a synchronous standby with priority 2
>>> 17:54:18.387 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>>> 17:54:28.887 LOG: synchronous standby "Sby1" with priority 1 exited
>>> 17:54:31.359 LOG: standby "Sby3" is now a synchronous standby with priority 3
>>> 17:54:39.008 LOG: standby "Sby1" is now a synchronous standby with priority 1
>>> 17:54:41.382 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>>
>> Does this make sense?
>>
>> By the way, Sawada-san, you have changed the parentheses for the
>> priority method from '[]' to '()'. I mistakenly defined
>> s_s_names as '2[Sby1, Sby2, Sby3]' and got the wrong behavior, that
>> is, only Sby2 is registered as a mandatory synchronous standby.
>>
>> For this case, the three members of SyncRepConfig are '2[Sby1,',
>> 'Sby2', 'Sby3]'. This syntax is valid under the current
>> specification but will surely get a different meaning with future
>> changes. We should reject this known-to-be-wrong-in-future syntax
>> from now on.
>>
>
> I have no objection to the current version of the patch.
> But one optimisation I came up with is to return false before
> calculating the lowest LSN among the sync standbys if MyWalSnd is not
> listed in sync_standbys.
> For example, in SyncRepGetOldestSyncRecPtr():
>
> ==
> sync_standbys = SyncRepGetSyncStandbys();
>
> if (list_length(sync_standbys) < SyncRepConfig->num_sync)
> {
> (snip)
> }
>
> /* Here if MyWalSnd is not listed in sync_standbys, quick exit. */
> if (!list_member_int(sync_standbys, MyWalSnd->slotno))
>     return false;

list_member_int() performs a loop internally, so I'm not sure how much
adding an extra list_member_int() here would optimize this processing.
Another idea is to make SyncRepGetSyncStandbys() check whether I'm a sync
standby or not. With this idea, without adding an extra loop, we can exit
earlier in the case where I'm not a sync standby. Does this make sense?
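
To make the idea concrete, here is a rough standalone sketch, not the patch's
code. The Standby struct, select_sync_standbys() and its parameters are all
hypothetical names made up for illustration; the real function works on
walsender slots under SyncRepLock. It picks up to num_sync standbys in priority
order in a single pass and tracks whether slot "me" ends up among them, so the
caller needs no separate membership lookup afterwards.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a walsender slot; not a PostgreSQL type. */
typedef struct
{
    int priority;   /* 1 = highest priority, 0 = not a sync candidate */
} Standby;

/*
 * Choose up to num_sync standbys with the best (smallest) priority.
 * chosen[] must have room for num_sync indexes and is kept sorted by
 * priority; on ties the first standby found stays ahead.  *am_sync is
 * maintained during the same pass: set when slot "me" is inserted and
 * cleared again if "me" is later pushed out by a better candidate.
 * Returns the number of standbys chosen.
 */
int
select_sync_standbys(const Standby *s, int n, int num_sync, int me,
                     int *chosen, bool *am_sync)
{
    int count = 0;

    *am_sync = false;
    if (num_sync < 1)
        return 0;

    for (int i = 0; i < n; i++)
    {
        int pos = count;
        int last;

        if (s[i].priority <= 0)
            continue;           /* not listed in s_s_names */

        /* Find where this standby belongs among those chosen so far. */
        while (pos > 0 && s[chosen[pos - 1]].priority > s[i].priority)
            pos--;
        if (pos >= num_sync)
            continue;           /* worse than everything already chosen */

        /* A full array means the last entry is about to be pushed out. */
        if (count == num_sync && chosen[num_sync - 1] == me)
            *am_sync = false;

        last = (count < num_sync) ? count : num_sync - 1;
        for (int j = last; j > pos; j--)
            chosen[j] = chosen[j - 1];
        chosen[pos] = i;
        if (i == me)
            *am_sync = true;
        if (count < num_sync)
            count++;
    }
    return count;
}
```

With priorities {3, 1, 2} and num_sync = 2, slots 1 and 2 are chosen; a caller
running as slot 0 sees am_sync == false and can return before computing the
lowest LSN.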

Regards,

--
Fujii Masao

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 09:09:39
Message-ID:	CANP8+jKfnKM9QZUKmj04Pxjjkty+HR45sx_1tddPegtT57yj9Q@mail.gmail.com
Lists:	pgsql-hackers

On 5 April 2016 at 08:58, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
wrote:

> >>>> So I am suggesting we put an extra keyword in front of the “k”, to
> > explain how the k responses should be gathered as an extension to the
> > syntax. I also think implementing “any k” is actually fairly trivial and
> > could be done for 9.6 (rather than just "first k").
>
> +1 for 'first/any k (...)', with possibly only 'first' supported for now,
> if the 'any' case is more involved than we would like to spend time on,
> given the time considerations. IMHO, the extra keyword adds to clarity of
> the syntax.
>

Further thoughts:

I said "any k" was faster, though what I mean is both faster and more
robust. With the current ordering, if you have a network peak on any of the k
sync standbys then the user will wait longer. With "any k", if a network peak
occurs, then another standby's response will work just as well. So "any k"
will be faster, more consistent and less prone to misconfiguration.

I also didn't explain why I think it is easy to implement "any k".

All we need to do is change SyncRepGetOldestSyncRecPtr() so that it returns
the k'th oldest pointer of any named standby. Then use that to wake up user
backends. So the change requires only slightly modified logic in a very
isolated part of the code, almost all of which would be code inserts to
cope with the new option. The syntax and doc changes would take a couple of
hours.
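
A minimal standalone sketch of the difference, under stated assumptions: it
reads "k'th" as the k-th most advanced position reported by any named standby,
which is one interpretation of the above. XLogRecPtr is typedef'd locally as a
stand-in, and first_k_release_ptr() and any_k_release_ptr() are hypothetical
names, not functions in syncrep.c.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for PostgreSQL's XLogRecPtr; an assumption of this sketch. */
typedef uint64_t XLogRecPtr;

/* qsort comparator: descending, so the most advanced position comes first. */
static int
cmp_lsn_desc(const void *a, const void *b)
{
    XLogRecPtr x = *(const XLogRecPtr *) a;
    XLogRecPtr y = *(const XLogRecPtr *) b;

    return (x < y) - (x > y);
}

/*
 * "first k": pos[] is in priority order and we wait for the first k
 * standbys, so waiters can be released up to the minimum of pos[0..k-1].
 * Returns 0 when k is out of range.
 */
XLogRecPtr
first_k_release_ptr(const XLogRecPtr *pos, int n, int k)
{
    XLogRecPtr oldest;

    if (k < 1 || k > n)
        return 0;
    oldest = pos[0];
    for (int i = 1; i < k; i++)
        if (pos[i] < oldest)
            oldest = pos[i];
    return oldest;
}

/*
 * "any k": waiters can be released up to the k-th most advanced position,
 * because at least k standbys have reached it, whichever ones they are.
 * Returns 0 when k is out of range or memory cannot be allocated.
 */
XLogRecPtr
any_k_release_ptr(const XLogRecPtr *pos, int n, int k)
{
    XLogRecPtr *sorted;
    XLogRecPtr  result;

    if (k < 1 || k > n)
        return 0;
    sorted = malloc((size_t) n * sizeof(*sorted));
    if (sorted == NULL)
        return 0;
    memcpy(sorted, pos, (size_t) n * sizeof(*sorted));
    qsort(sorted, (size_t) n, sizeof(*sorted), cmp_lsn_desc);
    result = sorted[k - 1];
    free(sorted);
    return result;
}
```

For positions {100, 300, 200} in priority order and k = 2, "first k" can only
release up to 100 because n1 is lagging, while "any k" can release up to 200.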

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 09:10:24
Message-ID:	CAHGQGwEBpmD7M4e_-LiP8F47EwLrgat3naNqVuAErn=B3tSDNQ@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 4, 2016 at 6:45 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-04-04 10:35:34 +0100, Simon Riggs wrote:
>> On 4 April 2016 at 09:28, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> > Barring any objections, I'll commit this patch.
>
> No objection here either, just one question: Has anybody thought about
> the ability to extend this to do per-database syncrep?

Nope at least for me... You'd like to extend synchronous_standby_names
so that users can specify that per-database?

Regards,

--
Fujii Masao

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 09:13:50
Message-ID:	CANP8+jJb6f43xvwKRaMx7fpBPB4F-zDcaS1wqyMDu2HNDojD=g@mail.gmail.com
Lists:	pgsql-hackers

On 5 April 2016 at 10:10, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Mon, Apr 4, 2016 at 6:45 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2016-04-04 10:35:34 +0100, Simon Riggs wrote:
> >> On 4 April 2016 at 09:28, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >> > Barring any objections, I'll commit this patch.
> >
> > No objection here either, just one question: Has anybody thought about
> > the ability to extend this to do per-database syncrep?
>
> Nope at least for me... You'd like to extend synchronous_standby_names
> so that users can specify that per-database?
>

As requested, I did consider whether we could have syntax for per-database
settings.

ISTM that it is already possible to have one database in async mode and
another in sync mode, using settings of synchronous_commit.

The easiest way to have per-database settings, if you want more, is to use
different instances. Adding a dbname into the syntax would complicate it
significantly and even if we agreed on that, I don't think it would happen for
9.6. The lack of per-database settings is not a blocker for me.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 09:26:41
Message-ID:	20160405092641.2yz6xfsulk2o6rwa@alap3.anarazel.de
Lists:	pgsql-hackers

On 2016-04-05 10:13:50 +0100, Simon Riggs wrote:
> The lack of per-database settings is not a blocker for me.

Just to clarify: Neither is it for me.

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 09:45:46
Message-ID:	CAHGQGwGi4wCJAro9z3G6L8=-TTn25=oGgn7gXYU3gqRfh3CwtQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 5, 2016 at 4:31 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Mon, Apr 4, 2016 at 1:58 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>>
>> Thanks for updating the patch!
>>
>> I applied the following changes to the patch.
>> Attached is the revised version of the patch.
>>
>
> 1.
> {
> {"synchronous_standby_names", PGC_SIGHUP, REPLICATION_MASTER,
> gettext_noop("List of names of potential synchronous standbys."),
> NULL,
> GUC_LIST_INPUT
> },
> &SyncRepStandbyNames,
> "",
> check_synchronous_standby_names, NULL, NULL
> },
>
> Isn't it better to modify the description of synchronous_standby_names in
> guc.c based on new usage?

What about "Number of synchronous standbys and list of names of
potential synchronous ones"? Better idea?

> 2.
> pg_stat_get_wal_senders()
> {
> ..
> /*
> ! * Allocate and update the config data of synchronous replication,
> ! * and then get the currently active synchronous standbys.
> */
> + SyncRepUpdateConfig();
> LWLockAcquire(SyncRepLock, LW_SHARED);
> ! sync_standbys = SyncRepGetSyncStandbys();
> LWLockRelease(SyncRepLock);
> ..
> }
>
> Why is it important to update the config with patch? Earlier also any
> update to config between calls wouldn't have been visible.

Because if SyncRepUpdateConfig() is not called here, a backend has no
chance to parse the latest value of s_s_names. This means that
pg_stat_replication may return information based on the old value of
s_s_names.

> 3.
> <title>Planning for High Availability</title>
>
> <para>
> ! <varname>synchronous_standby_names</> specifies the number of
> ! synchronous standbys that transaction commits made when
>
> Is it better to say like: <varname>synchronous_standby_names</> specifies
> the number and names of

Precisely, s_s_names specifies a list of names of potential sync standbys,
not of sync ones.

> 4.
> + /*
> + * Return the list of sync standbys, or NIL if no sync standby is
> connected.
> + *
> + * If there are multiple standbys with the same priority,
> + * the first one found is considered as higher priority.
>
> Here line indentation of second line can be improved.

What about "the first one found is selected first"? Or better idea?

>
> ! /*
> ! * syncrep_yyparse sets the global syncrep_parse_result as side effect.
> ! * But this function is required to just check, so frees it
> ! * once parsing parameter.
> ! */
> ! SyncRepFreeConfig(syncrep_parse_result);
>
> How about below change in comment?
> /so frees it once parsing parameter/so frees it after parsing the parameter

Will apply this to the patch.

Thanks for the review!

Regards,

--
Fujii Masao

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 10:17:27
Message-ID:	20160405.191727.58728480.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Tue, 5 Apr 2016 18:08:20 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwG+DM=LCctG6q_Uxkgk17CbLKrHBwtPfUN3+Hu9QbvNuQ(at)mail(dot)gmail(dot)com>
> On Mon, Apr 4, 2016 at 10:00 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > On Mon, Apr 4, 2016 at 6:03 PM, Kyotaro HORIGUCHI
> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >> Hello, thank you for testing.
> >>
> >> At Sat, 2 Apr 2016 14:20:55 +1300, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote in <CAEepm=2sdDL2hs3XbWb5FORegNHBObLJ-8C2=aaeG-riZTd2Rw(at)mail(dot)gmail(dot)com>
> >>> > One thing I noticed is that there are LOG messages telling me when a
> >>> > standby becomes a synchronous standby, but nothing to tell me if a
> >>> > standby stops being a standby (ie because a higher priority one has
> >>> > taken its place in the quorum). Would that be interesting?
> >>
> >> A walsender exits by proc_exit() for any operational
> >> termination so wrapping proc_exit() should work. (Attached file 1)
> >>
> >> For the setting "2(Sby1, Sby2, Sby3)", the master says that all
> >> of the standbys are sync-standbys and no message is emitted on
> >> failure of Sby1, which should cause a promotion of Sby3.
> >>
> >>> standby "Sby3" is now the synchronous standby with priority 3
> >>> standby "Sby2" is now the synchronous standby with priority 2
> >>> standby "Sby1" is now the synchronous standby with priority 1
> >> ..<Sby 1 failure>
> >>> standby "Sby3" is now the synchronous standby with priority 3
> >>
> >> Sby3 becomes sync standby twice:p
> >>
> >> This was a behavior taken over from the single-sync-rep era but
> >> it should be confusing for the new sync-rep selection mechanism.
> >> The second attached diff makes this as the following.
...
> >> Applying both of the above patches, the messages would look like
> >> the following.
> >>
> >>> 17:54:08.367 LOG: standby "Sby3" is now a synchronous standby with priority 3
> >>> 17:54:08.564 LOG: standby "Sby1" is now a synchronous standby with priority 1
> >>> 17:54:08.565 LOG: standby "Sby2" is now a synchronous standby with priority 2
> >>> 17:54:18.387 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
> >>> 17:54:28.887 LOG: synchronous standby "Sby1" with priority 1 exited
> >>> 17:54:31.359 LOG: standby "Sby3" is now a synchronous standby with priority 3
> >>> 17:54:39.008 LOG: standby "Sby1" is now a synchronous standby with priority 1
> >>> 17:54:41.382 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
> >>
> >> Does this make sense?
> >>
> >> By the way, Sawada-san, you have changed the parentheses for the
> >> priority method from '[]' to '()'. And I mistakenly defined
> >> s_s_names as '2[Sby1, Sby2, Sby3]' and got wrong behavior, that
> >> is, only Sby2 is registered as a mandatory synchronous standby.
> >>
> >> For this case, the three members of SyncRepConfig are '2[Sby1,',
> >> 'Sby2', 'Sby3]'. This syntax is valid for the current
> >> specification but will surely get a different meaning with future
> >> changes. We should refuse this known-to-be-wrong-in-future syntax
> >> from now on.
> >>
> >
> > I have no objection to the current version of the patch.
> > But one optimization idea I came up with is to return false before
> > calculating the lowest LSN from the sync standbys if MyWalSnd is not listed
> > in sync_standbys.
> > For example in SyncRepGetOldestSyncRecPtr(),
> >
> > ==
> > sync_standbys = SyncRepGetSyncStandbys();
> >
> > if (list_length(sync_standbys) < SyncRepConfig->num_sync)
> > {
> > (snip)
> > }
> >
> > /* Here if MyWalSnd is not listed in sync_standbys, quick exit. */
> > if (!list_member_int(sync_standbys, MyWalSnd->slotno))
> > return false;
>
> list_member_int() performs the loop internally. So I'm not sure how much
> adding an extra list_member_int() here can optimize this processing.
> Another idea is to make SyncRepGetSyncStandbys() check whether I'm a sync
> standby or not. With this idea, without adding an extra loop, we can exit earlier
> in the case where I'm not a sync standby. Does this make sense?

list_member_int() is also performed in the "(snip)" part. So having
SyncRepGetSyncStandbys() return am_sync seems to make sense.

sync_standbys = SyncRepGetSyncStandbys(am_sync);

/*
 * Quick exit if I am not synchronous or there are not
 * enough synchronous standbys
 */
if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
{
    list_free(sync_standbys);
    return false;
}
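
To make the idea concrete, here is a minimal standalone sketch, not the
patch's code. It assumes a plain array of slots instead of the shared
WalSnd array and a List; StandbySlot, collect_sync_standbys(),
can_release_waiters() and MAX_SYNC are all hypothetical names. The
selection is a small insertion into a priority-ordered array, and the
caller learns whether it is synchronous from the same call.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SYNC 8              /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a walsender slot; not the real WalSnd. */
typedef struct StandbySlot
{
    bool        active;         /* is a standby connected on this slot? */
    int         priority;       /* 0 = async, 1 = highest sync priority */
} StandbySlot;

/*
 * Choose up to num_sync active slots with the smallest non-zero priority
 * values and store their indexes in chosen[], best first.  Among equal
 * priorities the slot found first is kept first.  *am_sync reports whether
 * slot "me" was chosen.  Returns the number of slots chosen.
 */
static int
collect_sync_standbys(const StandbySlot *slots, int nslots, int num_sync,
                      int me, int *chosen, bool *am_sync)
{
    int         nchosen = 0;

    assert(num_sync >= 1);
    *am_sync = false;

    for (int i = 0; i < nslots; i++)
    {
        int         pos;

        if (!slots[i].active || slots[i].priority == 0)
            continue;           /* not connected, or asynchronous */

        /* Find the insert position; ties keep the earlier slot first. */
        pos = nchosen;
        while (pos > 0 && slots[chosen[pos - 1]].priority > slots[i].priority)
            pos--;
        if (pos >= num_sync)
            continue;           /* no better than what is already kept */

        /* Shift worse entries down, dropping the last one if full. */
        if (nchosen < num_sync)
            nchosen++;
        for (int j = nchosen - 1; j > pos; j--)
            chosen[j] = chosen[j - 1];
        chosen[pos] = i;
    }

    /* At most num_sync entries to check, not the whole slot array. */
    for (int j = 0; j < nchosen; j++)
        if (chosen[j] == me)
            *am_sync = true;

    return nchosen;
}

/* Quick exit if I am not synchronous or there are too few sync standbys. */
static bool
can_release_waiters(const StandbySlot *slots, int nslots, int num_sync, int me)
{
    int         chosen[MAX_SYNC];
    bool        am_sync;
    int         n;

    assert(num_sync <= MAX_SYNC);
    n = collect_sync_standbys(slots, nslots, num_sync, me, chosen, &am_sync);
    return am_sync && n >= num_sync;
}
```

With slots for Sby3 (priority 3), Sby1 (1) and Sby2 (2) and num_sync = 2,
the walsenders for Sby1 and Sby2 proceed while the one for Sby3 exits
early.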

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 10:18:09
Message-ID:	CAHGQGwFSPPL1Y5EF9mnDg27wyVo_fR57nuvYBCEwRE8kW0uctg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 5, 2016 at 4:35 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 4 April 2016 at 10:35, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>
>> On 4 April 2016 at 09:28, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>>>
>>> Barring any objections, I'll commit this patch.
>>
>>
>> That sounds good.
>>
>> May I have one more day to review this? Actually more like 3-4 hours.
>
>
> What we have here is useful and elegant. I love the simplicity and backwards
> compatibility of the design. Very nice, chef.
>
> I am in favour of committing something for 9.6, though I do have some
> objective comments

Thanks for the review!

> 1. Header comments in syncrep.c need changes, not just additions.

Okay, will consider this later. And I'd appreciate it if you could elaborate
on which changes are necessary specifically.

> 2. We need tests to ensure that k >=1 and k<=N

The changes to the replication test framework were included in the patch before,
but I excluded them from the patch because I'd like to commit the core part of
the patch first. Will review the test part later.

>
> 3. There should be a WARNING if k == N to say that we don't yet provide a
> level to give Apply consistency. (I mean if we specify 2 (n1, n2) or 3(n1,
> n2, n3) etc

Sorry, I failed to get your point. Could you tell me what Apply consistency
is and why we cannot provide it when k = N?

> 4. How does it work?
> It's pretty strange, but that isn't documented anywhere. It took me a while
> to figure it out even though I know that code. My thought is its a lot
> slower than before, which is a concern when we know by definition that k >=2
> for the new feature. I was going to mention the fact that this code only
> needs to be executed by standbys mentioned in s_s_n, so we can avoid
> overhead and contention for async standbys (But Masahiko just mentioned that
> also).

Unless I'm missing something, the patch already avoids the overhead
of async standbys. Please see the top of SyncRepReleaseWaiters().
Since async standbys exit at the beginning of SyncRepReleaseWaiters(),
they don't need to perform any operations that the patch adds
(e.g., find out which standbys are synchronous).

> 5. Timing – k > 1 will be slower by definition and more complex to
> configure, yet there is no timing facility to measure the effect of this,
> even though we have a new timing facility in 9.6. It would be useful to have
> a track_syncrep option to keep track of typical response times from nodes.

Maybe it's useful. But it seems to be a completely new feature, so I'm not sure
if we have enough time to push it to 9.6. Probably it's for 9.7.

> 6. Meaning of k (n1, n2, n3) with N servers
>
> It's clearly documented that this means k replies IN SEQUENCE. I believe the
> typical meaning of would be “any k out of N”, which would be faster than
> what we have, e.g.
> 3 (n1, n2, n3) would release as soon as (n1, n2) or (n2, n3) or (n1, n3)
> acknowledge.
>
> The “any k” option is not currently possible, but could be fairly easily.
> The syntax should also be easily extensible.
>
> I would call what we have now “first” semantics, and we could have both of
> these...
>
> * first k (n1, n2, n3) – does the same as k (n1, n2, n3) does now
> * any k (n1, n2, n3) – would release waiters as soon as we have the
> responses from k out of N standbys. “any k” would be faster, so is desirable
> for performance and resilience

We discussed the syntax for a very long time, so restarting the discussion
and keeping the patch uncommitted is not good. We might fail to commit
anything about N-sync rep in 9.6. So let's commit the current patch first
and restart the discussion later.
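
For reference, the "first k" behavior discussed above can be sketched as
follows. This is only an illustration under assumptions: Lsn stands in for
XLogRecPtr, first_k_release_point() is a hypothetical name, and the input
is assumed to be already ordered by priority with only connected standbys
included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr in this sketch */

/*
 * "First k" release point: the oldest LSN acknowledged by the k
 * highest-priority connected standbys.  acked[] must be ordered by
 * priority (index 0 = highest).  Returns false when fewer than k standbys
 * are connected, in which case no waiter can be released.
 */
static bool
first_k_release_point(const Lsn *acked, int nconnected, int k, Lsn *result)
{
    Lsn         oldest;

    assert(k >= 1);
    if (nconnected < k)
        return false;

    oldest = acked[0];
    for (int i = 1; i < k; i++)
        if (acked[i] < oldest)
            oldest = acked[i];

    *result = oldest;
    return true;
}
```

For acknowledgements of 300, 100 and 500 from n1, n2 and n3, "2 (n1, n2,
n3)" releases waiters only up to 100, because n2 is one of the first two
standbys. Under the proposed "any 2" the same acknowledgements would allow
a release up to 300.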

Regards,

--
Fujii Masao

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	thomas(dot)munro(at)enterprisedb(dot)com, masao(dot)fujii(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 10:23:05
Message-ID:	20160405.192305.57944288.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Mon, 4 Apr 2016 22:00:24 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDoq1ubY4KkKhrA9jzaVXekwAT7gV5pQJbS+wj98b9-3A(at)mail(dot)gmail(dot)com>
> > For this case, the three members of SyncRepConfig are '2[Sby1,',
> > 'Sby2', 'Sby3]'. This syntax is valid for the current
> > specification but will surely get a different meaning with future
> > changes. We should refuse this known-to-be-wrong-in-future syntax
> > from now on.
>
> I couldn't get your point but why will the above syntax meaning be
> different from current meaning by future change?
> I thought that another method uses another kind of parentheses.

If the 'another kind of parentheses' is a pair of brackets, an
application_name such as 'tokyo[A]' is currently allowed to
occur unquoted in the list but will become disallowed by the
syntax change.
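
A rough sketch of the kind of check I have in mind, as standalone C rather
than a change to the syncrep scanner. The function name and the exact set
of reserved characters are assumptions for illustration only.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: may this standby name appear unquoted in the list?
 * The reserved set is an assumption: parentheses (used by the priority
 * method), brackets (which another method might use later), the list
 * separator, double quotes and whitespace.
 */
static bool
standby_name_ok_unquoted(const char *name)
{
    return name[0] != '\0' && strpbrk(name, "()[],\" \t") == NULL;
}
```

With such a rule, 'Sby1' passes, while 'tokyo[A]' and the accidental
member '2[Sby1' are refused unless they are quoted.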

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 10:23:41
Message-ID:	CAHGQGwGHTHFEgoyDFpzjPeDpTfkW+xzb3VYSGSn9CGHrUsYqAg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 5, 2016 at 6:09 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 5 April 2016 at 08:58, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
> wrote:
>
>>
>> > So I am suggesting we put an extra keyword in front of the “k”, to
>> > explain how the k responses should be gathered as an extension to the
>> > syntax. I also think implementing “any k” is actually fairly trivial and
>> > could be done for 9.6 (rather than just "first k").
>>
>> +1 for 'first/any k (...)', with possibly only 'first' supported for now,
>> if the 'any' case is more involved than we would like to spend time on,
>> given the time considerations. IMHO, the extra keyword adds to clarity of
>> the syntax.
>
>
> Further thoughts:
>
> I said "any k" was faster, though what I mean is both faster and more
> robust. If you have network peaks from any of the k sync standbys then the
> user will wait longer. With "any k", if a network peak occurs, then another
> standby response will work just as well. So the performance of "any k" will
> be both faster, more consistent and less prone to misconfiguration.
>
> I also didn't explain why I think it is easy to implement "any k".
>
> All we need to do is change SyncRepGetOldestSyncRecPtr() so that it returns
> the k'th oldest pointer of any named standby.

s/oldest/newest ?

> Then use that to wake up user
> backends. So the change requires only slightly modified logic in a very
> isolated part of the code, almost all of which would be code inserts to cope
> with the new option.

Yes. Probably we need to spend some time finding which algorithm is the best
for searching for the k'th newest pointer.

> The syntax and doc changes would take a couple of
> hours.

Yes, the updates of documentation would need more time.

Regards,

--
Fujii Masao

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 11:08:25
Message-ID:	CANP8+jLo4cojuxfb5yChqowtpZ0B29MVLPyRZ_PJMOBMwPWF=w@mail.gmail.com
Lists:	pgsql-hackers

On 5 April 2016 at 11:18, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> > 1. Header comments in syncrep.c need changes, not just additions.
>
> Okay, will consider this later. And I'd appreciate if you elaborate what
> changes are necessary specifically.

Some of the old header comments are now wrong.

> > 2. We need tests to ensure that k >=1 and k<=N
>
> The changes to the replication test framework were included in the patch before,
> but I excluded them from the patch because I'd like to commit the core part of
> the patch first. Will review the test part later.

I meant tests of setting the parameters, not tests of the feature itself.

> >
> > 3. There should be a WARNING if k == N to say that we don't yet provide a
> > level to give Apply consistency. (I mean if we specify 2 (n1, n2) or 3(n1,
> > n2, n3) etc
>
> Sorry, I failed to get your point. Could you tell me what Apply consistency
> is and why we cannot provide it when k = N?
>
> > 4. How does it work?
> > It's pretty strange, but that isn't documented anywhere. It took me a while
> > to figure it out even though I know that code. My thought is its a lot
> > slower than before, which is a concern when we know by definition that k >=2
> > for the new feature. I was going to mention the fact that this code only
> > needs to be executed by standbys mentioned in s_s_n, so we can avoid
> > overhead and contention for async standbys (But Masahiko just mentioned that
> > also).
>
> Unless I'm missing something, the patch already avoids the overhead
> of async standbys. Please see the top of SyncRepReleaseWaiters().
> Since async standbys exit at the beginning of SyncRepReleaseWaiters(),
> they don't need to perform any operations that the patch adds
> (e.g., find out which standbys are synchronous).
>

I was thinking about the overhead of scanning through the full list of
WALSenders for each message, when it is a sync standby.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 11:17:21
Message-ID:	CAHGQGwE8_F79BUpC5TmJ7aazXU=Uju0VznFCCKDK57-wNpHV-g@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 5, 2016 at 7:17 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Tue, 5 Apr 2016 18:08:20 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwG+DM=LCctG6q_Uxkgk17CbLKrHBwtPfUN3+Hu9QbvNuQ(at)mail(dot)gmail(dot)com>
>> On Mon, Apr 4, 2016 at 10:00 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> > On Mon, Apr 4, 2016 at 6:03 PM, Kyotaro HORIGUCHI
>> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> >> Hello, thank you for testing.
>> >>
>> >> At Sat, 2 Apr 2016 14:20:55 +1300, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote in <CAEepm=2sdDL2hs3XbWb5FORegNHBObLJ-8C2=aaeG-riZTd2Rw(at)mail(dot)gmail(dot)com>
>> >>> > One thing I noticed is that there are LOG messages telling me when a
>> >>> > standby becomes a synchronous standby, but nothing to tell me if a
>> >>> > standby stops being a standby (ie because a higher priority one has
>> >>> > taken its place in the quorum). Would that be interesting?
>> >>
>> >> A walsender exits by proc_exit() for any operational
>> >> termination so wrapping proc_exit() should work. (Attached file 1)
>> >>
>> >> For the setting "2(Sby1, Sby2, Sby3)", the master says that all
>> >> of the standbys are sync-standbys and no message is emitted on
>> >> failure of Sby1, which should cause a promotion of Sby3.
>> >>
>> >>> standby "Sby3" is now the synchronous standby with priority 3
>> >>> standby "Sby2" is now the synchronous standby with priority 2
>> >>> standby "Sby1" is now the synchronous standby with priority 1
>> >> ..<Sby 1 failure>
>> >>> standby "Sby3" is now the synchronous standby with priority 3
>> >>
>> >> Sby3 becomes sync standby twice:p
>> >>
>> >> This was a behavior taken over from the single-sync-rep era but
>> >> it should be confusing for the new sync-rep selection mechanism.
>> >> The second attached diff makes this as the following.
> ...
>> >> Applying both of the above patches, the messages would look like
>> >> the following.
>> >>
>> >>> 17:54:08.367 LOG: standby "Sby3" is now a synchronous standby with priority 3
>> >>> 17:54:08.564 LOG: standby "Sby1" is now a synchronous standby with priority 1
>> >>> 17:54:08.565 LOG: standby "Sby2" is now a synchronous standby with priority 2
>> >>> 17:54:18.387 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>> >>> 17:54:28.887 LOG: synchronous standby "Sby1" with priority 1 exited
>> >>> 17:54:31.359 LOG: standby "Sby3" is now a synchronous standby with priority 3
>> >>> 17:54:39.008 LOG: standby "Sby1" is now a synchronous standby with priority 1
>> >>> 17:54:41.382 LOG: standby "Sby3" is now a potential synchronous standby with priority 3
>> >>
>> >> Does this make sense?
>> >>
>> >> By the way, Sawada-san, you have changed the parentheses for the
>> >> priority method from '[]' to '()'. And I mistakenly defined
>> >> s_s_names as '2[Sby1, Sby2, Sby3]' and got wrong behavior, that
>> >> is, only Sby2 is registered as a mandatory synchronous standby.
>> >>
>> >> For this case, the three members of SyncRepConfig are '2[Sby1,',
>> >> 'Sby2', 'Sby3]'. This syntax is valid for the current
>> >> specification but will surely get a different meaning with future
>> >> changes. We should refuse this known-to-be-wrong-in-future syntax
>> >> from now on.
>> >>
>> >
>> > I have no objection to the current version of the patch.
>> > But one optimization idea I came up with is to return false before
>> > calculating the lowest LSN from the sync standbys if MyWalSnd is not listed
>> > in sync_standbys.
>> > For example in SyncRepGetOldestSyncRecPtr(),
>> >
>> > ==
>> > sync_standbys = SyncRepGetSyncStandbys();
>> >
>> > if (list_length(sync_standbys) < SyncRepConfig->num_sync)
>> > {
>> > (snip)
>> > }
>> >
>> > /* Here if MyWalSnd is not listed in sync_standbys, quick exit. */
>> > if (!list_member_int(sync_standbys, MyWalSnd->slotno))
>> > return false;
>>
>> list_member_int() performs the loop internally. So I'm not sure how much
>> adding an extra list_member_int() here can optimize this processing.
>> Another idea is to make SyncRepGetSyncStandbys() check whether I'm a sync
>> standby or not. With this idea, without adding an extra loop, we can exit earlier
>> in the case where I'm not a sync standby. Does this make sense?
>
> list_member_int() is also performed in the "(snip)" part. So having
> SyncRepGetSyncStandbys() return am_sync seems to make sense.
>
> sync_standbys = SyncRepGetSyncStandbys(am_sync);
>
> /*
>  * Quick exit if I am not synchronous or there are not
>  * enough synchronous standbys
>  */
> if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
> {
>     list_free(sync_standbys);
>     return false;
> }

Thanks for the comment! I changed SyncRepGetSyncStandbys() so that
it checks whether we're managing a sync standby or not.
Attached is the updated version of the patch. I also applied several
review comments to the patch.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
multi_sync_replication_v23.patch	text/x-patch	48.3 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 11:19:12
Message-ID:	CANP8+j+4PhCtmCxtZPiM0dXn11daoW-u8-ri87vbVLUDjdy+tg@mail.gmail.com
Lists:	pgsql-hackers

On 5 April 2016 at 11:23, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Tue, Apr 5, 2016 at 6:09 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On 5 April 2016 at 08:58, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
> > wrote:
> >
> >>
> >> > So I am suggesting we put an extra keyword in front of the “k”, to
> >> > explain how the k responses should be gathered as an extension to the
> >> > syntax. I also think implementing “any k” is actually fairly trivial and
> >> > could be done for 9.6 (rather than just "first k").
> >>
> >> +1 for 'first/any k (...)', with possibly only 'first' supported for now,
> >> if the 'any' case is more involved than we would like to spend time on,
> >> given the time considerations. IMHO, the extra keyword adds to clarity of
> >> the syntax.
> >
> >
> > Further thoughts:
> >
> > I said "any k" was faster, though what I mean is both faster and more
> > robust. If you have network peaks from any of the k sync standbys then the
> > user will wait longer. With "any k", if a network peak occurs, then another
> > standby response will work just as well. So the performance of "any k" will
> > be both faster, more consistent and less prone to misconfiguration.
> >
> > I also didn't explain why I think it is easy to implement "any k".
> >
> > All we need to do is change SyncRepGetOldestSyncRecPtr() so that it returns
> > the k'th oldest pointer of any named standby.
>
> s/oldest/newest ?
>

Sure

> > Then use that to wake up user
> > backends. So the change requires only slightly modified logic in a very
> > isolated part of the code, almost all of which would be code inserts to cope
> > with the new option.
>
> Yes. Probably we need to use some time to find what algorithm is the best
> for searching the k'th newest pointer.
>

I think we would all agree an insertion sort would be the fastest for k ~
2-5, not much discussion there.

We do already use that in this section of code, namely SHMQueue.
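
A minimal sketch of what I mean, assuming a plain array of acknowledged
LSNs rather than the shared walsender array. Lsn stands in for XLogRecPtr,
and kth_newest_lsn() and MAX_K are hypothetical names, not anything in the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr in this sketch */

#define MAX_K 8                 /* arbitrary limit for this sketch */

/*
 * "Any k" release point: the k'th newest LSN among n acknowledgements,
 * i.e. the newest position that at least k standbys have reached.  Keeps
 * only the k largest values seen so far, ordered by insertion sort.
 * Returns false when fewer than k standbys have reported.
 */
static bool
kth_newest_lsn(const Lsn *acked, int n, int k, Lsn *result)
{
    Lsn         top[MAX_K];     /* largest values so far, newest first */
    int         ntop = 0;

    assert(k >= 1 && k <= MAX_K);
    if (n < k)
        return false;

    for (int i = 0; i < n; i++)
    {
        int         pos = ntop;

        /* Walk back past every kept value that is older than this one. */
        while (pos > 0 && top[pos - 1] < acked[i])
            pos--;
        if (pos >= k)
            continue;           /* older than all k values already kept */

        /* Make room, dropping the oldest kept value if the array is full. */
        if (ntop < k)
            ntop++;
        for (int j = ntop - 1; j > pos; j--)
            top[j] = top[j - 1];
        top[pos] = acked[i];
    }

    *result = top[k - 1];
    return true;
}
```

The inner loops touch at most k entries per standby, which is why the
choice of sort hardly matters for small k.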

> > The syntax and doc changes would take a couple of
> > hours.
>
> Yes, the updates of documentation would need more time.
>

I can help, if you wish.

"any k" is in my mind what people would be expecting us to deliver with
this feature, which is why I suggest it now, especially since it is a small
additional item.

Please don't see these comments as blocking your progress to commit.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 11:26:39
Message-ID:	CAHGQGwH-abFrgKW8hmaVv_rFv-3c4f_sJ-gUtxErVBpBnXpVzQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 5, 2016 at 8:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 5 April 2016 at 11:18, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>>
>> > 1. Header comments in syncrep.c need changes, not just additions.
>>
>> Okay, will consider this later. And I'd appreciate if you elaborate what
>> changes are necessary specifically.
>
>
> Some of the old header comments are now wrong.

Okay, will check.

>> > 2. We need tests to ensure that k >=1 and k<=N
>>
>> The changes to the replication test framework were included in the patch
>> before, but I excluded them from the patch because I'd like to commit the
>> core part of the patch first. Will review the test part later.
>
>
> I meant tests of setting the parameters, not tests of the feature itself.

k<=0 causes an error while parsing s_s_names in the current patch.

Regarding the test of k<=N, do you mean that an error should be emitted
when k is larger than or equal to the number of standby names in the list?
Multiple standbys with the same name may connect to the master.
In this case, users might want to specify k<=N. So k<=N does not seem to be
an invalid setting.

>> > 3. There should be a WARNING if k == N to say that we don't yet provide a
>> > level to give Apply consistency. (I mean if we specify 2 (n1, n2) or 3(n1,
>> > n2, n3) etc
>>
>> Sorry, I failed to get your point. Could you tell me what Apply consistency
>> is and why we cannot provide it when k = N?
>>
>> > 4. How does it work?
>> > It's pretty strange, but that isn't documented anywhere. It took me a while
>> > to figure it out even though I know that code. My thought is its a lot
>> > slower than before, which is a concern when we know by definition that k >=2
>> > for the new feature. I was going to mention the fact that this code only
>> > needs to be executed by standbys mentioned in s_s_n, so we can avoid
>> > overhead and contention for async standbys (But Masahiko just mentioned that
>> > also).
>>
>> Unless I'm missing something, the patch already avoids the overhead
>> of async standbys. Please see the top of SyncRepReleaseWaiters().
>> Since async standbys exit at the beginning of SyncRepReleaseWaiters(),
>> they don't need to perform any operations that the patch adds
>> (e.g., find out which standbys are synchronous).
>
>
> I was thinking about the overhead of scanning through the full list of
> WALSenders for each message, when it is a sync standby.

This is true even in current release or before.

Regards,

--
Fujii Masao

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 11:51:53
Message-ID:	CANP8+jJzr_AooGxt0WUAv7q1D8p69xCKa0f1_Ksao5R9XPZTyg@mail.gmail.com

On 5 April 2016 at 12:26, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> Multiple standbys with the same name may connect to the master.
> In this case, users might want to specify k<=N. So k<=N does not seem to be
> an invalid setting.

Confusing as that is, it is already the case; k > N could make sense. ;-(

However, in most cases, k > N would not make sense and we should issue a
WARNING.
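
To make the proposal concrete, here is a minimal sketch of such a check. It is not code from the patch: the struct, the function names, and the message wording are all hypothetical, and it assumes k and N are already available as plain integers after parsing. The setting is still accepted when k > N, since several standbys may connect under the same name; only a warning is produced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the parsed synchronous_standby_names value:
 * num_sync is k, nmembers is N (the number of names listed).
 */
typedef struct SyncConfigSketch
{
    int num_sync;   /* k: standbys that must confirm a commit */
    int nmembers;   /* N: standby names listed in the setting */
} SyncConfigSketch;

/*
 * Return true if the setting deserves a WARNING, i.e. k > N.
 * The setting itself is not rejected.
 */
bool
sync_config_needs_warning(const SyncConfigSketch *conf)
{
    assert(conf != NULL);
    return conf->num_sync > conf->nmembers;
}

/* Format the warning text into buf; returns false if nothing to report. */
bool
sync_config_warning(const SyncConfigSketch *conf, char *buf, size_t buflen)
{
    if (!sync_config_needs_warning(conf))
        return false;
    snprintf(buf, buflen,
             "configured number of synchronous standbys (%d) "
             "exceeds the number of names listed (%d)",
             conf->num_sync, conf->nmembers);
    return true;
}
```

In the server, the message would presumably go through ereport(WARNING, ...) from the GUC check hook rather than into a buffer; the buffer is used here only to keep the sketch standalone.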

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 14:40:14
Message-ID:	CAA4eK1+bpAmxNG_TDu1aydXbW38cKcOdpJysd0213EPrBXGMfA@mail.gmail.com

On Tue, Apr 5, 2016 at 3:15 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Tue, Apr 5, 2016 at 4:31 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> > On Mon, Apr 4, 2016 at 1:58 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
wrote:
> >>
> >>
> >> Thanks for updating the patch!
> >>
> >> I applied the following changes to the patch.
> >> Attached is the revised version of the patch.
> >>
> >
> > 1.
> > {
> > {"synchronous_standby_names", PGC_SIGHUP, REPLICATION_MASTER,
> > gettext_noop("List of names of potential synchronous standbys."),
> > NULL,
> > GUC_LIST_INPUT
> > },
> > &SyncRepStandbyNames,
> > "",
> > check_synchronous_standby_names, NULL, NULL
> > },
> >
> > Isn't it better to modify the description of synchronous_standby_names
in
> > guc.c based on new usage?
>
> What about "Number of synchronous standbys and list of names of
> potential synchronous ones"? Better idea?
>

Looks good.

>
> > 2.
> > pg_stat_get_wal_senders()
> > {
> > ..
> > /*
> > ! * Allocate and update the config data of synchronous replication,
> > ! * and then get the currently active synchronous standbys.
> > */
> > + SyncRepUpdateConfig();
> > LWLockAcquire(SyncRepLock, LW_SHARED);
> > ! sync_standbys = SyncRepGetSyncStandbys();
> > LWLockRelease(SyncRepLock);
> > ..
> > }
> >
> > Why is it important to update the config with patch? Earlier also any
> > update to config between calls wouldn't have been visible.
>
> Because a backend has no chance to call SyncRepUpdateConfig() and
> parse the latest value of s_s_names if SyncRepUpdateConfig() is not
> called here. This means that pg_stat_replication may return the
information
> based on the old value of s_s_names.
>

That's right, but even without this patch, can't pg_stat_replication show
old information? If not, why not?

> > 3.
> > <title>Planning for High Availability</title>
> >
> > <para>
> > ! <varname>synchronous_standby_names</> specifies the number of
> > ! synchronous standbys that transaction commits made when
> >
> > Is it better to say like: <varname>synchronous_standby_names</>
specifies
> > the number and names of
>
> Precisely s_s_names specifies a list of names of potential sync standbys
> not sync ones.
>

Okay, but you don't seem to have updated this in your latest patch.

> > 4.
> > + /*
> > + * Return the list of sync standbys, or NIL if no sync standby is
> > connected.
> > + *
> > + * If there are multiple standbys with the same priority,
> > + * the first one found is considered as higher priority.
> >
> > Here line indentation of second line can be improved.
>
> What about "the first one found is selected first"? Or better idea?
>

What I was pointing out is that a few words from the second line can be moved
to the previous line, but maybe pgindent will take care of that, so no need to
worry.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 14:47:45
Message-ID:	CA+TgmoaL35mUQBu5+9zKCNeSL-vfXn=Q0oict579epBKcVRQWw@mail.gmail.com

On Mon, Apr 4, 2016 at 4:28 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> + ereport(LOG,
>>> + (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
>>> + application_name, MyWalSnd->sync_standby_priority)));
>>>
>>> s/ the / a /
>
> I have no objection to this change itself. But we have used this message
> in 9.5 or before, so if we apply this change, probably we need
> back-patching.

"the" implies that there can be only one synchronous standby at that
priority, while "a" implies that there could be more than one. So the
situation might be different with this patch than previously. (I
haven't read the patch so I don't know whether this is actually true,
but it might be what Thomas was going for.)

Also, I'd like to associate myself with the general happiness about
the prospect of having this feature in 9.6 (but without specifically
endorsing the code, since I have not read it).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-05 15:22:52
Message-ID:	CAD21AoDhLN4zK4MZeE3Vx7g=qO0cyqHHn6bRMbYoC+ANf-=9VA@mail.gmail.com

On Tue, Apr 5, 2016 at 7:23 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Mon, 4 Apr 2016 22:00:24 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDoq1ubY4KkKhrA9jzaVXekwAT7gV5pQJbS+wj98b9-3A(at)mail(dot)gmail(dot)com>
>> > For this case, the three members of SyncRepConfig are '2[Sby1,',
>> > 'Sby2', 'Sby3]'. This syntax is valid for the current
>> > specification but will surely get different meaning by the future
>> > changes. We should refuse this known-to-be-wrong-in-future syntax
>> > from now.
>>
>> I couldn't get your point but why will the above syntax meaning be
>> different from current meaning by future change?
>> I thought that another method uses another kind of parentheses.
>
> If the 'another kind of parentheses' is a pair of brackets, an
> application_name 'tokyo[A]', for example, is currently allowed to
> occur unquoted in the list but will become disallowed by the
> syntax change.
>
>

Thank you for explaining.
I understand, but since the future syntax has yet to reach consensus,
I thought it would be difficult to refuse a particular kind of
parentheses for now.

> > list_member_int() performs the loop internally. So I'm not sure how much
> > adding extra list_member_int() here can optimize this processing.
> > Another idea is to make SyncRepGetSyncStandby() check whether I'm sync
> > standby or not. In this idea, without adding extra loop, we can exit earlier
> > in the case where I'm not a sync standby. Does this make sense?
> The list_member_int() is also performed in the "(snip)" part. So
> SyncRepGetSyncStandbys() returning am_sync seems making sense.
>
> sync_standbys = SyncRepGetSyncStandbys(am_sync);
>
> /*
> * Quick exit if I am not synchronous or there's not
> * enough synchronous standbys
> */
> if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
> {
> list_free(sync_standbys);
> return false;

I meant that it can at least skip acquiring the spinlock, so it would
optimise that logic.
But anyway, I agree with making SyncRepGetSyncStandbys() return the am_sync variable.
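
For reference, the idea of reporting am_sync from the same pass that selects the sync standbys can be sketched standalone as follows. This is a simplified model, not the patch: the walsender array is reduced to an array of priorities (0 meaning "not a sync candidate"), no locks are taken, and the function names and MAX_SLOTS_SKETCH are hypothetical. Ties between equal priorities go to the first slot found.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS_SKETCH 16

/*
 * Pick up to num_sync standbys in ascending priority order (1 is highest;
 * 0 means "not a synchronous candidate"). Ties go to the lower slot index.
 * Chosen slot indexes are written to chosen[]; the count is returned.
 * *am_sync reports whether my_slot was chosen, with no second lookup.
 */
int
pick_sync_standbys(const int *priority, int nslots, int num_sync,
                   int my_slot, int *chosen, bool *am_sync)
{
    int nchosen = 0;
    int max_priority = 0;

    assert(nslots <= MAX_SLOTS_SKETCH);
    *am_sync = false;

    for (int i = 0; i < nslots; i++)
        if (priority[i] > max_priority)
            max_priority = priority[i];

    for (int p = 1; p <= max_priority && nchosen < num_sync; p++)
    {
        for (int i = 0; i < nslots && nchosen < num_sync; i++)
        {
            if (priority[i] != p)
                continue;
            chosen[nchosen++] = i;
            if (i == my_slot)
                *am_sync = true;
        }
    }
    return nchosen;
}

/*
 * Caller-side quick exit, mirroring the quoted snippet: proceed only if
 * this walsender is synchronous and enough synchronous standbys exist.
 */
bool
should_release_waiters(const int *priority, int nslots, int num_sync,
                       int my_slot)
{
    int  chosen[MAX_SLOTS_SKETCH];
    bool am_sync;
    int  nchosen = pick_sync_standbys(priority, nslots, num_sync,
                                      my_slot, chosen, &am_sync);

    return am_sync && nchosen >= num_sync;
}
```

With priorities {2, 0, 1, 1} and num_sync = 2, slots 2 and 3 are chosen, so a walsender in slot 0 exits early while one in slot 2 goes on to release waiters.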

--
Regards,

--
Masahiko Sawada

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 05:18:44
Message-ID:	20160406.141844.70860176.horiguchi.kyotaro@lab.ntt.co.jp

At Tue, 5 Apr 2016 20:17:21 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwE8_F79BUpC5TmJ7aazXU=Uju0VznFCCKDK57-wNpHV-g(at)mail(dot)gmail(dot)com>
> >> list_member_int() performs the loop internally. So I'm not sure how much
> >> adding extra list_member_int() here can optimize this processing.
> >> Another idea is to make SyncRepGetSyncStandby() check whether I'm sync
> >> standby or not. In this idea, without adding extra loop, we can exit earlier
> >> in the case where I'm not a sync standby. Does this make sense?
> >
> > The list_member_int() is also performed in the "(snip)" part. So
> > SyncRepGetSyncStandbys() returning am_sync seems making sense.
> >
> > sync_standbys = SyncRepGetSyncStandbys(am_sync);
> >
> > /*
> > * Quick exit if I am not synchronous or there's not
> > * enough synchronous standbys
> > */
> > if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
> > {
> > list_free(sync_standbys);
> > return false;
> > }
>
> Thanks for the comment! I changed SyncRepGetSyncStandbys() so that
> it checks whether we're managing a sync standby or not.
> Attached is the updated version of the patch. I also applied several
> review comments to the patch.

It still calls list_member_int(), but that call can be removed, as in the
attached patch.

regards,

Attachment	Content-Type	Size
multi_sync_replication_v23_list_member_int.diff	text/x-patch	1.5 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 05:21:03
Message-ID:	CAHGQGwFkPaco=tu2gzZrjGnk-KkRfL21mznzeDgagCOaPL=a8w@mail.gmail.com

On Tue, Apr 5, 2016 at 8:51 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 5 April 2016 at 12:26, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>>
>> Multiple standbys with the same name may connect to the master.
>> In this case, users might want to specify k<=N. So k<=N does not seem to be
>> an invalid setting.
>
>
> Confusing as that is, it is already the case; k > N could make sense. ;-(
>
> However, in most cases, k > N would not make sense and we should issue a
> WARNING.

Somebody (maybe Horiguchi-san and Sawada-san) commented on this upthread,
and the code for that test was included in the old patch (but I excluded it).
Now the majority seems to prefer adding that test, so I just revived and
revised that test code.

Attached is the updated version of the patch. I also addressed Amit's
and Robert's comments.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
multi_sync_replication_v24.patch	text/x-patch	53.1 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 05:47:13
Message-ID:	CAHGQGwEXUO_2rDnPK72a85gFE64ADY+Q4QXyYZOMzh_cJKFQbw@mail.gmail.com

On Tue, Apr 5, 2016 at 11:40 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, Apr 5, 2016 at 3:15 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> On Tue, Apr 5, 2016 at 4:31 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> > On Mon, Apr 4, 2016 at 1:58 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>> > wrote:
>> >>
>> >>
>> >> Thanks for updating the patch!
>> >>
>> >> I applied the following changes to the patch.
>> >> Attached is the revised version of the patch.
>> >>
>> >
>> > 1.
>> > {
>> > {"synchronous_standby_names", PGC_SIGHUP, REPLICATION_MASTER,
>> > gettext_noop("List of names of potential synchronous standbys."),
>> > NULL,
>> > GUC_LIST_INPUT
>> > },
>> > &SyncRepStandbyNames,
>> > "",
>> > check_synchronous_standby_names, NULL, NULL
>> > },
>> >
>> > Isn't it better to modify the description of synchronous_standby_names
>> > in
>> > guc.c based on new usage?
>>
>> What about "Number of synchronous standbys and list of names of
>> potential synchronous ones"? Better idea?
>>
>
> Looks good.
>
>>
>> > 2.
>> > pg_stat_get_wal_senders()
>> > {
>> > ..
>> > /*
>> > ! * Allocate and update the config data of synchronous replication,
>> > ! * and then get the currently active synchronous standbys.
>> > */
>> > + SyncRepUpdateConfig();
>> > LWLockAcquire(SyncRepLock, LW_SHARED);
>> > ! sync_standbys = SyncRepGetSyncStandbys();
>> > LWLockRelease(SyncRepLock);
>> > ..
>> > }
>> >
>> > Why is it important to update the config with patch? Earlier also any
>> > update to config between calls wouldn't have been visible.
>>
>> Because a backend has no chance to call SyncRepUpdateConfig() and
>> parse the latest value of s_s_names if SyncRepUpdateConfig() is not
>> called here. This means that pg_stat_replication may return the
>> information
>> based on the old value of s_s_names.
>>
>
> That's right, but even without this patch, can't pg_stat_replication show
> old information? If not, why not?

Without the patch, when s_s_names is changed and SIGHUP is sent,
a backend calls ProcessConfigFile(), parses the configuration file, and
sets the global variable SyncRepStandbyNames to the latest value of
s_s_names. When pg_stat_replication is accessed, the backend calculates
which standby is synchronous based on that latest value in SyncRepStandbyNames,
and then displays the information about sync replication.

With the patch, basically the same steps are executed when s_s_names is
changed. The difference is that, with the patch, SyncRepUpdateConfig()
must be called after ProcessConfigFile() and before the calculation of
sync standbys. So I just added the call of SyncRepUpdateConfig() to
pg_stat_get_wal_senders().

BTW, we could move the SyncRepUpdateConfig() call from
pg_stat_get_wal_senders() to just after ProcessConfigFile(), so that every
backend always parses the value of s_s_names when the setting is changed.
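
The difference can be modelled with a small standalone sketch. It is only an illustration under assumptions: reload_setting() stands in for what ProcessConfigFile() does to the raw string, update_parsed_conf() stands in for SyncRepUpdateConfig(), and detecting staleness by comparing strings is my assumption, not necessarily what the patch does. Only the leading count is parsed; the name list is ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Raw setting text, as a config reload would store it (hypothetical name). */
static char raw_names[128] = "";

/* Parsed form, plus a copy of the text it was parsed from. */
static struct
{
    char parsed_from[128];
    int  num_sync;
} parsed_conf;

/* Stand-in for the reload step: only the raw string changes. */
void
reload_setting(const char *newval)
{
    assert(strlen(newval) < sizeof(raw_names));
    strcpy(raw_names, newval);
}

/* Stand-in for the re-parse step: refresh parsed_conf if the text changed. */
void
update_parsed_conf(void)
{
    if (strcmp(parsed_conf.parsed_from, raw_names) == 0)
        return;                 /* already up to date */
    strcpy(parsed_conf.parsed_from, raw_names);
    /* Only the leading count is parsed here; the name list is ignored. */
    parsed_conf.num_sync = (int) strtol(raw_names, NULL, 10);
}

/* What a monitoring query would see without the extra re-parse ... */
int
num_sync_without_update(void)
{
    return parsed_conf.num_sync;
}

/* ... and with it. */
int
num_sync_with_update(void)
{
    update_parsed_conf();
    return parsed_conf.num_sync;
}
```

After a reload from "2 (s1, s2, s3)" to "3 (s1, s2, s3)", the reader that skips the re-parse still sees 2, which is the stale pg_stat_replication output the added call avoids.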

>> > 3.
>> > <title>Planning for High Availability</title>
>> >
>> > <para>
>> > ! <varname>synchronous_standby_names</> specifies the number of
>> > ! synchronous standbys that transaction commits made when
>> >
>> > Is it better to say like: <varname>synchronous_standby_names</>
>> > specifies
>> > the number and names of
>>
>> Precisely s_s_names specifies a list of names of potential sync standbys
>> not sync ones.
>>
>
> Okay, but you don't seem to have updated this in your latest patch.

I applied the change you suggested to the patch. Thanks!

Regards,

--
Fujii Masao

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 05:50:06
Message-ID:	CAHGQGwF4=H81tuXmvZzsJhHvMyBBX6b0K-YEjTf4Jvm6HfOwiA@mail.gmail.com

On Tue, Apr 5, 2016 at 11:47 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Apr 4, 2016 at 4:28 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> + ereport(LOG,
>>>> + (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
>>>> + application_name, MyWalSnd->sync_standby_priority)));
>>>>
>>>> s/ the / a /
>>
>> I have no objection to this change itself. But we have used this message
>> in 9.5 or before, so if we apply this change, probably we need
>> back-patching.
>
> "the" implies that there can be only one synchronous standby at that
> priority, while "a" implies that there could be more than one. So the
> situation might be different with this patch than previously. (I
> haven't read the patch so I don't know whether this is actually true,
> but it might be what Thomas was going for.)

Thanks for the explanation!
I applied that change to the latest patch I posted upthread.

Regards,

--
Fujii Masao

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 05:51:15
Message-ID:	CAD21AoDv8k7z7DgGyc0Vuxu697+Vw4Y9sE6F99t9Pt5ZPb=tHQ@mail.gmail.com

On Wed, Apr 6, 2016 at 2:21 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Apr 5, 2016 at 8:51 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 5 April 2016 at 12:26, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>>>
>>> Multiple standbys with the same name may connect to the master.
>>> In this case, users might want to specify k<=N. So k<=N does not seem to be
>>> an invalid setting.
>>
>>
>> Confusing as that is, it is already the case; k > N could make sense. ;-(
>>
>> However, in most cases, k > N would not make sense and we should issue a
>> WARNING.
>
> Somebody (maybe Horiguchi-san and Sawada-san) commented on this upthread,
> and the code for that test was included in the old patch (but I excluded it).
> Now the majority seems to prefer adding that test, so I just revived and
> revised that test code.

The regression test code does not seem to be included in the latest patch, no?

Regards,

--
Masahiko Sawada

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 06:05:23
Message-ID:	CAB7nPqQML7z9o9_roJYUvuSH+OZQthjgKLtjjvk8tTRLcS7mQQ@mail.gmail.com

On Wed, Apr 6, 2016 at 2:51 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Apr 6, 2016 at 2:21 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Tue, Apr 5, 2016 at 8:51 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> On 5 April 2016 at 12:26, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>>>
>>>> Multiple standbys with the same name may connect to the master.
>>>> In this case, users might want to specify k<=N. So k<=N does not seem to be
>>>> an invalid setting.
>>>
>>>
>>> Confusing as that is, it is already the case; k > N could make sense. ;-(
>>>
>>> However, in most cases, k > N would not make sense and we should issue a
>>> WARNING.
>>
>> Somebody (maybe Horiguchi-san and Sawada-san) commented on this upthread,
>> and the code for that test was included in the old patch (but I excluded it).
>> Now the majority seems to prefer adding that test, so I just revived and
>> revised that test code.
>
> The regression test code does not seem to be included in the latest patch, no?

I am looking at the latest patch now, and they are not included. It
would be good to get those tests bundled in for a last look, I think.
--
Michael

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 06:29:12
Message-ID:	CAHGQGwHGQEwH2c9buiZ=G7Ko8PQYwiU7=NsDkvCjRKUPSN8j7A@mail.gmail.com

On Wed, Apr 6, 2016 at 2:18 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Tue, 5 Apr 2016 20:17:21 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwE8_F79BUpC5TmJ7aazXU=Uju0VznFCCKDK57-wNpHV-g(at)mail(dot)gmail(dot)com>
>> >> list_member_int() performs the loop internally. So I'm not sure how much
>> >> adding extra list_member_int() here can optimize this processing.
>> >> Another idea is to make SyncRepGetSyncStandby() check whether I'm sync
>> >> standby or not. In this idea, without adding extra loop, we can exit earlier
>> >> in the case where I'm not a sync standby. Does this make sense?
>> >
>> > The list_member_int() is also performed in the "(snip)" part. So
>> > SyncRepGetSyncStandbys() returning am_sync seems making sense.
>> >
>> > sync_standbys = SyncRepGetSyncStandbys(am_sync);
>> >
>> > /*
>> > * Quick exit if I am not synchronous or there's not
>> > * enough synchronous standbys
>> > */
>> > if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
>> > {
>> > list_free(sync_standbys);
>> > return false;
>> > }
>>
>> Thanks for the comment! I changed SyncRepGetSyncStandbys() so that
>> it checks whether we're managing a sync standby or not.
>> Attached is the updated version of the patch. I also applied several
>> review comments to the patch.
>
> It still calls list_member_int(), but that call can be removed, as in the
> attached patch.

Thanks for the review!

>
> regards,
>
> diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
> index 9b2137a..6998bb8 100644
> --- a/src/backend/replication/syncrep.c
> +++ b/src/backend/replication/syncrep.c
> @@ -590,6 +590,10 @@ SyncRepGetSyncStandbys(bool *am_sync)
> if (XLogRecPtrIsInvalid(walsnd->flush))
> continue;
>
> + /* Notify myself as 'synchonized' if I am */
> + if (am_sync != NULL && walsnd == MyWalSnd)
> + *am_sync = true;
> +
> /*
> * If the priority is equal to 1, consider this standby as sync
> * and append it to the result. Otherwise append this standby
> @@ -598,8 +602,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
> if (this_priority == 1)
> {
> result = lappend_int(result, i);
> - if (am_sync != NULL && walsnd == MyWalSnd)
> - *am_sync = true;
> if (list_length(result) == SyncRepConfig->num_sync)
> {
> list_free(pending);
> @@ -630,9 +632,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
> {
> bool needfree = (result != NIL && pending != NIL);
>
> - if (am_sync != NULL && !(*am_sync))
> - *am_sync = list_member_int(pending, MyWalSnd->slotno);
> -
> result = list_concat(result, pending);
> if (needfree)
> pfree(pending);
> @@ -640,6 +639,13 @@ SyncRepGetSyncStandbys(bool *am_sync)
> }
>
> /*
> + * The pending list contains eventually potentially-synchronized standbys
> + * and this walsender may be one of them. So once reset am_sync.
> + */
> + if (am_sync != NULL)
> + *am_sync = false;
> +
> + /*

This code seems wrong in the case where this walsender is in the result list.
So I adopted different logic. Attached is the updated version of the patch.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
multi_sync_replication_v25.patch	text/x-patch	53.3 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 06:32:18
Message-ID:	CAHGQGwHS09i4Zt5_bd7nn9t2Mxhpqzk=5ok3EWWWsXvYxsRWFw@mail.gmail.com

I intentionally excluded the regression test from the patch because
I'd like to review and commit it separately from the main part of the feature.

I'd appreciate it if you could read through the regression test that was included
in the previous patch and update it if required.

Regards,

--
Fujii Masao

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 07:08:49
Message-ID:	CAB7nPqTY3xgqeT3_WxnAG0A2f5fWzPiO0fDbwLgE4qZJaEpaUQ@mail.gmail.com

On Wed, Apr 6, 2016 at 3:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Apr 6, 2016 at 2:18 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> At Tue, 5 Apr 2016 20:17:21 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwE8_F79BUpC5TmJ7aazXU=Uju0VznFCCKDK57-wNpHV-g(at)mail(dot)gmail(dot)com>
>>> >> list_member_int() performs the loop internally. So I'm not sure how much
>>> >> adding extra list_member_int() here can optimize this processing.
>>> >> Another idea is to make SyncRepGetSyncStandby() check whether I'm sync
>>> >> standby or not. In this idea, without adding extra loop, we can exit earlier
>>> >> in the case where I'm not a sync standby. Does this make sense?
>>> >
>>> > The list_member_int() is also performed in the "(snip)" part. So
>>> > SyncRepGetSyncStandbys() returning am_sync seems making sense.
>>> >
>>> > sync_standbys = SyncRepGetSyncStandbys(am_sync);
>>> >
>>> > /*
>>> > * Quick exit if I am not synchronous or there's not
>>> > * enough synchronous standbys
>>> > */
>>> > if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
>>> > {
>>> > list_free(sync_standbys);
>>> > return false;
>>> > }
>>>
>>> Thanks for the comment! I changed SyncRepGetSyncStandbys() so that
>>> it checks whether we're managing a sync standby or not.
>>> Attached is the updated version of the patch. I also applied several
>>> review comments to the patch.
>>
>> It still calls list_member_int(), but that call can be removed, as in the
>> attached patch.
>
> Thanks for the review!
>
>>
>> regards,
>>
>> diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
>> index 9b2137a..6998bb8 100644
>> --- a/src/backend/replication/syncrep.c
>> +++ b/src/backend/replication/syncrep.c
>> @@ -590,6 +590,10 @@ SyncRepGetSyncStandbys(bool *am_sync)
>> if (XLogRecPtrIsInvalid(walsnd->flush))
>> continue;
>>
>> + /* Notify myself as 'synchonized' if I am */
>> + if (am_sync != NULL && walsnd == MyWalSnd)
>> + *am_sync = true;
>> +
>> /*
>> * If the priority is equal to 1, consider this standby as sync
>> * and append it to the result. Otherwise append this standby
>> @@ -598,8 +602,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
>> if (this_priority == 1)
>> {
>> result = lappend_int(result, i);
>> - if (am_sync != NULL && walsnd == MyWalSnd)
>> - *am_sync = true;
>> if (list_length(result) == SyncRepConfig->num_sync)
>> {
>> list_free(pending);
>> @@ -630,9 +632,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
>> {
>> bool needfree = (result != NIL && pending != NIL);
>>
>> - if (am_sync != NULL && !(*am_sync))
>> - *am_sync = list_member_int(pending, MyWalSnd->slotno);
>> -
>> result = list_concat(result, pending);
>> if (needfree)
>> pfree(pending);
>> @@ -640,6 +639,13 @@ SyncRepGetSyncStandbys(bool *am_sync)
>> }
>>
>> /*
>> + * The pending list contains eventually potentially-synchronized standbys
>> + * and this walsender may be one of them. So once reset am_sync.
>> + */
>> + if (am_sync != NULL)
>> + *am_sync = false;
>> +
>> + /*
>
> This code seems wrong in the case where this walsender is in the result list.
> So I adopted another logic. Attached is the updated version of the patch.

To be honest, this is a nice patch that we have here, and it received
a fair amount of work. I have been playing with it a bit but I could
not break it.

Here are a few things I have noticed:
+ for (i = 0; i < max_wal_senders; i++)
+ {
+ walsnd = &WalSndCtl->walsnds[i];
No volatile pointer to prevent code reordering?

*/
typedef struct WalSnd
{
+ int slotno; /* index of this slot in WalSnd array */
pid_t pid; /* this walsender's process id, or 0 */
slotno is used nowhere.

I'll grab the tests and look at them.
--
Michael

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 08:01:51
Message-ID:	20160406.170151.246853881.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Wed, 6 Apr 2016 15:29:12 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwHGQEwH2c9buiZ=G7Ko8PQYwiU7=NsDkvCjRKUPSN8j7A(at)mail(dot)gmail(dot)com>
> > @@ -640,6 +639,13 @@ SyncRepGetSyncStandbys(bool *am_sync)
> > }
> >
> > /*
> > + * The pending list contains eventually potentially-synchronized standbys
> > + * and this walsender may be one of them. So once reset am_sync.
> > + */
> > + if (am_sync != NULL)
> > + *am_sync = false;
> > +
> > + /*
>
> This code seems wrong in the case where this walsender is in the result list.
> So I adopted another logic. Attached is the updated version of the patch.

You must have misread the patch. For that case, am_sync is set in the
loop just after that.

! while (priority <= lowest_priority)
! {
..
! for (cell = list_head(pending); cell != NULL; cell = next)
! {
...
! if (this_priority == priority)
! {
! result = lappend_int(result, i);
! if (am_sync != NULL && walsnd == MyWalSnd)
! *am_sync = true;

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 08:04:33
Message-ID:	CAB7nPqTeSCjCBBOapsCaBymHDjkhuiNjG=KSdr1m7UOHdPUkng@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 4:08 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Here are a few things I have noticed:
> + for (i = 0; i < max_wal_senders; i++)
> + {
> + walsnd = &WalSndCtl->walsnds[i];
> No volatile pointer to prevent code reordering?
>
> */
> typedef struct WalSnd
> {
> + int slotno; /* index of this slot in WalSnd array */
> pid_t pid; /* this walsender's process id, or 0 */
> slotno is used nowhere.
>
> I'll grab the tests and look at them.

So I had a look at those tests and finished with the attached:
- patch 1 adds a reload routine to PostgresNode
- patch 2 adds the list of tests.

I took the tests from patch 21 and did many tweaks on them:
- Use of qq() instead of quotes
- Removal of hardcoded newlines
- typo fixes and sanity fixes
- etc.
Regards,
--
Michael

Attachment	Content-Type	Size
2_n_sync_tests.patch	invalid/octet-stream	4.7 KB
1_add_reload_routine.patch	invalid/octet-stream	590 bytes

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 08:07:47
Message-ID:	CAHGQGwH57b1MROYZuc6GLN_=YDxajNKBnLKT9zU6qQPMqw9+kg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 4:08 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Apr 6, 2016 at 3:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Apr 6, 2016 at 2:18 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> At Tue, 5 Apr 2016 20:17:21 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwE8_F79BUpC5TmJ7aazXU=Uju0VznFCCKDK57-wNpHV-g(at)mail(dot)gmail(dot)com>
>>>> >> list_member_int() performs the loop internally. So I'm not sure how much
>>>> >> adding extra list_member_int() here can optimize this processing.
>>>> >> Another idea is to make SyncRepGetSyncStandby() check whether I'm sync
>>>> >> standby or not. In this idea, without adding extra loop, we can exit earlier
>>>> >> in the case where I'm not a sync standby. Does this make sense?
>>>> >
>>>> > The list_member_int() is also performed in the "(snip)" part. So
>>>> > SyncRepGetSyncStandbys() returning am_sync seems making sense.
>>>> >
>>>> > sync_standbys = SyncRepGetSyncStandbys(am_sync);
>>>> >
>>>> > /*
>>>> > * Quick exit if I am not synchronous or there's not
>>>> > * enough synchronous standbys
>>>> > */
>>>> > if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
>>>> > {
>>>> > list_free(sync_standbys);
>>>> > return false;
>>>> > }
>>>>
>>>> Thanks for the comment! I changed SyncRepGetSyncStandbys() so that
>>>> it checks whether we're managing a sync standby or not.
>>>> Attached is the updated version of the patch. I also applied several
>>>> review comments to the patch.
>>>
>>> It still does list_member_int but it can be gotten rid of as the
>>> attached patch.
>>
>> Thanks for the review!
>>
>>>
>>> regards,
>>>
>>> diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
>>> index 9b2137a..6998bb8 100644
>>> --- a/src/backend/replication/syncrep.c
>>> +++ b/src/backend/replication/syncrep.c
>>> @@ -590,6 +590,10 @@ SyncRepGetSyncStandbys(bool *am_sync)
>>> if (XLogRecPtrIsInvalid(walsnd->flush))
>>> continue;
>>>
>>> + /* Notify myself as 'synchronized' if I am */
>>> + if (am_sync != NULL && walsnd == MyWalSnd)
>>> + *am_sync = true;
>>> +
>>> /*
>>> * If the priority is equal to 1, consider this standby as sync
>>> * and append it to the result. Otherwise append this standby
>>> @@ -598,8 +602,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
>>> if (this_priority == 1)
>>> {
>>> result = lappend_int(result, i);
>>> - if (am_sync != NULL && walsnd == MyWalSnd)
>>> - *am_sync = true;
>>> if (list_length(result) == SyncRepConfig->num_sync)
>>> {
>>> list_free(pending);
>>> @@ -630,9 +632,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
>>> {
>>> bool needfree = (result != NIL && pending != NIL);
>>>
>>> - if (am_sync != NULL && !(*am_sync))
>>> - *am_sync = list_member_int(pending, MyWalSnd->slotno);
>>> -
>>> result = list_concat(result, pending);
>>> if (needfree)
>>> pfree(pending);
>>> @@ -640,6 +639,13 @@ SyncRepGetSyncStandbys(bool *am_sync)
>>> }
>>>
>>> /*
>>> + * The pending list contains eventually potentially-synchronized standbys
>>> + * and this walsender may be one of them. So once reset am_sync.
>>> + */
>>> + if (am_sync != NULL)
>>> + *am_sync = false;
>>> +
>>> + /*
>>
>> This code seems wrong in the case where this walsender is in the result list.
>> So I adopted another logic. Attached is the updated version of the patch.
>
> To be honest, this is a nice patch that we have here, and it received
> a fair amount of work. I have been playing with it a bit but I could
> not break it.
>
> Here are a few things I have noticed:

Thanks for the review!

> + for (i = 0; i < max_wal_senders; i++)
> + {
> + walsnd = &WalSndCtl->walsnds[i];
> No volatile pointer to prevent code reordering?

Yes. Since the spinlock is not taken there, volatile is necessary.
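
As a minimal sketch of that point (not the code from the patch), the loop
below reads each slot through a volatile-qualified pointer, so the compiler
has to perform every load rather than reuse a value it read earlier. The
Slot type and count_active() are hypothetical stand-ins for WalSnd and the
real loop. Note that volatile only constrains the compiler; by itself it
does not make the reads atomic or ordered with respect to other CPUs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a walsender slot in shared memory. */
typedef struct Slot
{
    int           pid;    /* owning process id, or 0 if the slot is unused */
    unsigned long flush;  /* last flushed position, or 0 if not yet valid */
} Slot;

/*
 * Count slots that are in use and have reported a valid flush position.
 * No lock is held, so each slot is accessed through a volatile pointer,
 * which forces the compiler to emit a real load for every field access.
 */
int
count_active(Slot *slots, int n)
{
    int active = 0;

    assert(slots != NULL || n == 0);

    for (int i = 0; i < n; i++)
    {
        volatile Slot *slot = &slots[i];

        if (slot->pid != 0 && slot->flush != 0)
            active++;
    }
    return active;
}
```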

> */
> typedef struct WalSnd
> {
> + int slotno; /* index of this slot in WalSnd array */
> pid_t pid; /* this walsender's process id, or 0 */
> slotno is used nowhere.

Yep. Attached is the updated version of the patch.

> I'll grab the tests and look at them.

Many thanks!

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
multi_sync_replication_v26.patch	text/x-patch	52.7 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 08:10:12
Message-ID:	CAHGQGwH9SJHxCfZYbRDtOHeMNDpm2+gbJYKrJCgBtAhUj4Wb5A@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 5:01 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Wed, 6 Apr 2016 15:29:12 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwHGQEwH2c9buiZ=G7Ko8PQYwiU7=NsDkvCjRKUPSN8j7A(at)mail(dot)gmail(dot)com>
>> > @@ -640,6 +639,13 @@ SyncRepGetSyncStandbys(bool *am_sync)
>> > }
>> >
>> > /*
>> > + * The pending list contains eventually potentially-synchronized standbys
>> > + * and this walsender may be one of them. So once reset am_sync.
>> > + */
>> > + if (am_sync != NULL)
>> > + *am_sync = false;
>> > +
>> > + /*
>>
>> This code seems wrong in the case where this walsender is in the result list.
>> So I adopted another logic. Attached is the updated version of the patch.
>
> You must misread the patch. am_sync is originally set in the loop
> just after that for the case.
>
> ! while (priority <= lowest_priority)
> ! {
> ..
> ! for (cell = list_head(pending); cell != NULL; cell = next)
> ! {
> ...
> ! if (this_priority == priority)
> ! {
> ! result = lappend_int(result, i);
> ! if (am_sync != NULL && walsnd == MyWalSnd)
> ! *am_sync = true;

But if this walsender has priority 1, *am_sync is set to true in
the first loop, not the second one. No?
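
To make the objection concrete, here is a simplified model of the two-pass
selection (a sketch under stated assumptions, not the patch itself). Standby,
select_sync() and the reset_before_second_pass flag are hypothetical names;
the model also leaves out the flush-position check and the shortcut taken
when there are not more candidates than num_sync.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STANDBYS 8

/* Hypothetical stand-in for a walsender: 1 is the highest priority,
 * 0 means "not a synchronous candidate". */
typedef struct Standby
{
    int priority;
} Standby;

/*
 * Pick up to num_sync standbys in ascending priority order and return how
 * many were picked. *am_sync reports whether slot 'self' was picked.
 * reset_before_second_pass models clearing *am_sync between the two passes.
 */
int
select_sync(const Standby *s, int n, int num_sync, int self,
            bool reset_before_second_pass, bool *am_sync)
{
    int pending[MAX_STANDBYS];
    int npending = 0;
    int nresult = 0;
    int lowest = 0;

    assert(n <= MAX_STANDBYS);
    *am_sync = false;

    /* First pass: take priority-1 standbys at once, remember the rest. */
    for (int i = 0; i < n; i++)
    {
        if (s[i].priority == 0)
            continue;
        if (s[i].priority == 1)
        {
            nresult++;
            if (i == self)
                *am_sync = true;
            if (nresult == num_sync)
                return nresult;
        }
        else
        {
            pending[npending++] = i;
            if (s[i].priority > lowest)
                lowest = s[i].priority;
        }
    }

    if (reset_before_second_pass)
        *am_sync = false;   /* forgets a priority-1 'self' picked above */

    /* Second pass: fill the remaining places from the pending list. */
    for (int prio = 2; prio <= lowest && nresult < num_sync; prio++)
    {
        for (int k = 0; k < npending && nresult < num_sync; k++)
        {
            int i = pending[k];

            if (s[i].priority == prio)
            {
                nresult++;
                if (i == self)
                    *am_sync = true;
            }
        }
    }
    return nresult;
}
```

With priorities {1, 2, 3}, num_sync = 2 and self at priority 1, the reset
leaves *am_sync false even though self is among the selected standbys,
because the second pass only looks at the pending list.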

Regards,

--
Fujii Masao

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, michael(dot)paquier(at)gmail(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 08:16:12
Message-ID:	20160406.171612.81865787.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Sorry, my code was wrong in the case where the total number of
synchronous standbys exceeds the required number and the walsender is
at priority 1.

Sorry for the noise.

At Wed, 06 Apr 2016 17:01:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20160406(dot)170151(dot)246853881(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> You must misread the patch. am_sync is originally set in the loop
> just after that for the case.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 08:23:39
Message-ID:	CAHGQGwEWVCiW3sqZAtYeidLj=ROLtbSkcNqU_54p8SPL+VSqug@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 5:07 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Apr 6, 2016 at 4:08 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Wed, Apr 6, 2016 at 3:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Wed, Apr 6, 2016 at 2:18 PM, Kyotaro HORIGUCHI
>>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> At Tue, 5 Apr 2016 20:17:21 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwE8_F79BUpC5TmJ7aazXU=Uju0VznFCCKDK57-wNpHV-g(at)mail(dot)gmail(dot)com>
>>>>> >> list_member_int() performs the loop internally. So I'm not sure how much
>>>>> >> adding extra list_member_int() here can optimize this processing.
>>>>> >> Another idea is to make SyncRepGetSyncStandby() check whether I'm sync
>>>>> >> standby or not. In this idea, without adding extra loop, we can exit earlier
>>>>> >> in the case where I'm not a sync standby. Does this make sense?
>>>>> >
>>>>> > The list_member_int() is also performed in the "(snip)" part. So
>>>>> > SyncRepGetSyncStandbys() returning am_sync seems making sense.
>>>>> >
>>>>> > sync_standbys = SyncRepGetSyncStandbys(am_sync);
>>>>> >
>>>>> > /*
>>>>> > * Quick exit if I am not synchronous or there's not
>>>>> > * enough synchronous standbys
>>>>> > */
>>>>> > if (!*am_sync || list_length(sync_standbys) < SyncRepConfig->num_sync)
>>>>> > {
>>>>> > list_free(sync_standbys);
>>>>> > return false;
>>>>> > }
>>>>>
>>>>> Thanks for the comment! I changed SyncRepGetSyncStandbys() so that
>>>>> it checks whether we're managing a sync standby or not.
>>>>> Attached is the updated version of the patch. I also applied several
>>>>> review comments to the patch.
>>>>
>>>> It still does list_member_int but it can be gotten rid of as the
>>>> attached patch.
>>>
>>> Thanks for the review!
>>>
>>>>
>>>> regards,
>>>>
>>>> diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
>>>> index 9b2137a..6998bb8 100644
>>>> --- a/src/backend/replication/syncrep.c
>>>> +++ b/src/backend/replication/syncrep.c
>>>> @@ -590,6 +590,10 @@ SyncRepGetSyncStandbys(bool *am_sync)
>>>> if (XLogRecPtrIsInvalid(walsnd->flush))
>>>> continue;
>>>>
>>>> + /* Notify myself as 'synchronized' if I am */
>>>> + if (am_sync != NULL && walsnd == MyWalSnd)
>>>> + *am_sync = true;
>>>> +
>>>> /*
>>>> * If the priority is equal to 1, consider this standby as sync
>>>> * and append it to the result. Otherwise append this standby
>>>> @@ -598,8 +602,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
>>>> if (this_priority == 1)
>>>> {
>>>> result = lappend_int(result, i);
>>>> - if (am_sync != NULL && walsnd == MyWalSnd)
>>>> - *am_sync = true;
>>>> if (list_length(result) == SyncRepConfig->num_sync)
>>>> {
>>>> list_free(pending);
>>>> @@ -630,9 +632,6 @@ SyncRepGetSyncStandbys(bool *am_sync)
>>>> {
>>>> bool needfree = (result != NIL && pending != NIL);
>>>>
>>>> - if (am_sync != NULL && !(*am_sync))
>>>> - *am_sync = list_member_int(pending, MyWalSnd->slotno);
>>>> -
>>>> result = list_concat(result, pending);
>>>> if (needfree)
>>>> pfree(pending);
>>>> @@ -640,6 +639,13 @@ SyncRepGetSyncStandbys(bool *am_sync)
>>>> }
>>>>
>>>> /*
>>>> + * The pending list contains eventually potentially-synchronized standbys
>>>> + * and this walsender may be one of them. So once reset am_sync.
>>>> + */
>>>> + if (am_sync != NULL)
>>>> + *am_sync = false;
>>>> +
>>>> + /*
>>>
>>> This code seems wrong in the case where this walsender is in the result list.
>>> So I adopted another logic. Attached is the updated version of the patch.
>>
>> To be honest, this is a nice patch that we have here, and it received
>> a fair amount of work. I have been playing with it a bit but I could
>> not break it.
>>
>> Here are a few things I have noticed:
>
> Thanks for the review!
>
>> + for (i = 0; i < max_wal_senders; i++)
>> + {
>> + walsnd = &WalSndCtl->walsnds[i];
>> No volatile pointer to prevent code reordering?
>
> Yes. Since spin lock is not taken there, volatile is necessary.
>
>> */
>> typedef struct WalSnd
>> {
>> + int slotno; /* index of this slot in WalSnd array */
>> pid_t pid; /* this walsender's process id, or 0 */
>> slotno is used nowhere.
>
> Yep. Attached is the updated version of the patch.

Okay, I pushed the patch!
Many thanks to all involved in the development of this feature!

Regards,

--
Fujii Masao

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 09:18:08
Message-ID:	CAB7nPqRFWoUo8Fwe80gVTsNfTfNW0d2ETZ5wdB3VQkqk+EzGZg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 5:23 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Okay, I pushed the patch!
> Many thanks to all involved in the development of this feature!

I think that I am crying... Really cool to see this milestone accomplished.
--
Michael

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 09:26:20
Message-ID:	CANP8+j+RndoRTWZ-7QHhH=MYY-KUaZkw9KxrWMdLF-Ku6Od=Ag@mail.gmail.com
Lists:	pgsql-hackers

On 6 April 2016 at 09:23, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> Okay, I pushed the patch!
> Many thanks to all involved in the development of this feature!
>

Very good.

I think the description in the commit message that we don't support "quorum
commit" is sufficient to cover my concerns about what others might expect
from this feature. Could we add similar wording to the docs?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 11:59:23
Message-ID:	CAA4eK1JMYBBjLs_D0bdX_8kfy4rV74ZNdX-sfCbJDhtqCSrUOw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 11:17 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Tue, Apr 5, 2016 at 11:40 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >>
> >> > 2.
> >> > pg_stat_get_wal_senders()
> >> > {
> >> > ..
> >> > /*
> >> > ! * Allocate and update the config data of synchronous replication,
> >> > ! * and then get the currently active synchronous standbys.
> >> > */
> >> > + SyncRepUpdateConfig();
> >> > LWLockAcquire(SyncRepLock, LW_SHARED);
> >> > ! sync_standbys = SyncRepGetSyncStandbys();
> >> > LWLockRelease(SyncRepLock);
> >> > ..
> >> > }
> >> >
> >> > Why is it important to update the config with patch? Earlier also any
> >> > update to config between calls wouldn't have been visible.
> >>
> >> Because a backend has no chance to call SyncRepUpdateConfig() and
> >> parse the latest value of s_s_names if SyncRepUpdateConfig() is not
> >> called here. This means that pg_stat_replication may return the
> >> information
> >> based on the old value of s_s_names.
> >>
> >
> > Thats right, but without this patch also won't pg_stat_replication can show
> > old information? If no, why so?
>
> Without the patch, when s_s_names is changed and SIGHUP is sent,
> a backend calls ProcessConfigFile(), parse the configuration file and
> set the global variable SyncRepStandbyNames to the latest value of
> s_s_names. When pg_stat_replication is accessed, a backend calculates
> which standby is synchronous based on that latest value in SyncRepStandbyNames,
> and then displays the information of sync replication.
>
> With the patch, basically the same steps are executed when s_s_names is
> changed. But the difference is that, with the patch, SyncRepUpdateConfig()
> must be called after ProcessConfigFile() is called before the calculation of
> sync standbys. So I just added the call of SyncRepUpdateConfig() to
> pg_stat_get_wal_senders().
>

Then why call it only in pg_stat_get_wal_senders()? Wouldn't it be better to
always call it after ProcessConfigFile(), once SyncRepStandbyNames has been
set?

> BTW, we can move SyncRepUpdateConfig() just after ProcessConfigFile()
> from pg_stat_get_wal_senders() and every backends always parse the value
> of s_s_names when the setting is changed.
>

That sounds appropriate, but I'm not sure what the exact place to call it is.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 13:33:45
Message-ID:	CAHGQGwF20iE92HAVhGHBPvroLOVCjRVmPNd4x3zbZf4rLfiajQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 8:59 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Apr 6, 2016 at 11:17 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> On Tue, Apr 5, 2016 at 11:40 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> >>
>> >> > 2.
>> >> > pg_stat_get_wal_senders()
>> >> > {
>> >> > ..
>> >> > /*
>> >> > ! * Allocate and update the config data of synchronous replication,
>> >> > ! * and then get the currently active synchronous standbys.
>> >> > */
>> >> > + SyncRepUpdateConfig();
>> >> > LWLockAcquire(SyncRepLock, LW_SHARED);
>> >> > ! sync_standbys = SyncRepGetSyncStandbys();
>> >> > LWLockRelease(SyncRepLock);
>> >> > ..
>> >> > }
>> >> >
>> >> > Why is it important to update the config with patch? Earlier also
>> >> > any
>> >> > update to config between calls wouldn't have been visible.
>> >>
>> >> Because a backend has no chance to call SyncRepUpdateConfig() and
>> >> parse the latest value of s_s_names if SyncRepUpdateConfig() is not
>> >> called here. This means that pg_stat_replication may return the
>> >> information
>> >> based on the old value of s_s_names.
>> >>
>> >
>> > Thats right, but without this patch also won't pg_stat_replication can
>> > show
>> > old information? If no, why so?
>>
>> Without the patch, when s_s_names is changed and SIGHUP is sent,
>> a backend calls ProcessConfigFile(), parse the configuration file and
>> set the global variable SyncRepStandbyNames to the latest value of
>> s_s_names. When pg_stat_replication is accessed, a backend calculates
>> which standby is synchronous based on that latest value in
>> SyncRepStandbyNames,
>> and then displays the information of sync replication.
>>
>> With the patch, basically the same steps are executed when s_s_names is
>> changed. But the difference is that, with the patch, SyncRepUpdateConfig()
>> must be called after ProcessConfigFile() is called before the calculation
>> of
>> sync standbys. So I just added the call of SyncRepUpdateConfig() to
>> pg_stat_get_wal_senders().
>>
>
> Then why to call it just in pg_stat_get_wal_senders(), isn't it better if we
> call it always after ProcessConfigFile() (after setting SyncRepStandbyNames)
>
>> BTW, we can move SyncRepUpdateConfig() just after ProcessConfigFile()
>> from pg_stat_get_wal_senders() and every backends always parse the value
>> of s_s_names when the setting is changed.
>>
>
> That sounds appropriate, but not sure what is exact place to call it.

Maybe just after the following ProcessConfigFile().

-----------------------------------------
/*
* (6) check for any other interesting events that happened while we
* slept.
*/
if (got_SIGHUP)
{
got_SIGHUP = false;
ProcessConfigFile(PGC_SIGHUP);
}
-----------------------------------------

If we do the move, we also need to either (1) make the postmaster call
SyncRepUpdateConfig() and pass the parsed result to any forked backends
via a file, as write_nondefault_variables() does in an EXEC_BACKEND
environment, or (2) make a backend call SyncRepUpdateConfig() during
its initialization phase so that the first call of pg_stat_replication
can use the parsed result. (1) seems complicated and overkill.
(2) may add a very small overhead to the fork of a backend, though it
would be almost negligible. So which logic should we adopt?
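
To illustrate option (2) together with the move, here is a rough,
self-contained model (a sketch, not PostgreSQL code). All names are
hypothetical stand-ins: process_config_file() plays the role of
ProcessConfigFile(), update_parsed_config() the role of
SyncRepUpdateConfig(), and the parser only extracts a leading count from
the setting string.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

const char *file_value = "";       /* what the config file currently says */
const char *raw_setting = "";      /* models the SyncRepStandbyNames string */
int parsed_num_sync = 0;           /* models the per-process parsed result */
volatile sig_atomic_t got_sighup = 0;

/* Re-read the raw setting from the (modelled) configuration file. */
void
process_config_file(void)
{
    raw_setting = file_value;
}

/* Derive the parsed form from the raw string: "N (a, b)" gives N,
 * a bare list counts as 1, and an empty string as 0. */
void
update_parsed_config(void)
{
    char *end;
    long  n;

    assert(raw_setting != NULL);
    n = strtol(raw_setting, &end, 10);
    if (end == raw_setting)
        n = (raw_setting[0] != '\0') ? 1 : 0;
    parsed_num_sync = (int) n;
}

/* Option (2): parse once while the backend initializes. */
void
backend_init(void)
{
    update_parsed_config();
}

/* The move: re-parse right after every reload of the config file. */
void
handle_pending_reload(void)
{
    if (got_sighup)
    {
        got_sighup = 0;
        process_config_file();
        update_parsed_config();
    }
}
```

In this model the parsed value changes only when a reload has actually
been requested, and a freshly initialized backend never sees an unparsed
setting.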

Regards,

--
Fujii Masao

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 14:14:19
Message-ID:	CAA4eK1L0Ad1+iFaot0q3qN6jg0yx-8tcz1s6wwaCwrj11QdHyg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 7:03 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Wed, Apr 6, 2016 at 8:59 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >> BTW, we can move SyncRepUpdateConfig() just after ProcessConfigFile()
> >> from pg_stat_get_wal_senders() and every backends always parse the value
> >> of s_s_names when the setting is changed.
> >>
> >
> > That sounds appropriate, but not sure what is exact place to call it.
>
> Maybe just after the following ProcessConfigFile().
>
> -----------------------------------------
> /*
> * (6) check for any other interesting events that happened while we
> * slept.
> */
> if (got_SIGHUP)
> {
> got_SIGHUP = false;
> ProcessConfigFile(PGC_SIGHUP);
> }
> -----------------------------------------
>
> If we do the move, we also need to either (1) make postmaster call
> SyncRepUpdateConfig() and pass the parsed result to any forked backends
> via a file like write_nondefault_variables() does for EXEC_BACKEND
> environment, or (2) make a backend call SyncRepUpdateConfig() during
> its initialization phase so that the first call of pg_stat_replication
> can use the parsed result. (1) seems complicated and overkill.
> (2) may add very small overhead into the fork of a backend. It would
> be almost negligible, though. So which logic should we adopt?
>

Wouldn't it be possible to have an assign_* function for
synchronous_standby_names, as we have for some of the other settings
(assign_XactIsoLevel, for example), and then call SyncRepUpdateConfig()
in that function?
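
Roughly, the idea is a callback that runs whenever the setting's value is
assigned, so the parsed form can never lag behind the string. The sketch
below is a simplified model, not the real GUC machinery: StringSetting,
AssignHook, set_string_setting() and assign_standby_names() are
hypothetical names, and PostgreSQL's actual hook signatures differ.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical assign-hook type: called with the value being assigned. */
typedef void (*AssignHook) (const char *newval);

/* Hypothetical, simplified string setting with an optional hook. */
typedef struct StringSetting
{
    const char *value;
    AssignHook  assign_hook;
} StringSetting;

int parsed_num_sync = 0;    /* parsed form kept in step by the hook */

/* Hook for the standby-names setting: "N (a, b)" gives N, a bare
 * list counts as 1, and an empty string as 0. */
void
assign_standby_names(const char *newval)
{
    char *end;
    long  n;

    assert(newval != NULL);
    n = strtol(newval, &end, 10);
    if (end == newval)
        n = (newval[0] != '\0') ? 1 : 0;
    parsed_num_sync = (int) n;
}

/* Store the new value, then let the hook refresh any derived state. */
void
set_string_setting(StringSetting *s, const char *newval)
{
    s->value = newval;
    if (s->assign_hook != NULL)
        s->assign_hook(newval);
}
```

Here every path that changes the setting goes through
set_string_setting(), so callers such as a statistics function would not
need to trigger the parse themselves.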

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-06 14:41:58
Message-ID:	CAHGQGwFM1kgTaRnpJ9x6bwJhpY6=iZ1C7mW5r28k=TtBJJv+9g@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 11:14 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Apr 6, 2016 at 7:03 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> On Wed, Apr 6, 2016 at 8:59 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> >
>> >> BTW, we can move SyncRepUpdateConfig() just after ProcessConfigFile()
>> >> from pg_stat_get_wal_senders() and every backends always parse the
>> >> value
>> >> of s_s_names when the setting is changed.
>> >>
>> >
>> > That sounds appropriate, but not sure what is exact place to call it.
>>
>> Maybe just after the following ProcessConfigFile().
>>
>> -----------------------------------------
>> /*
>> * (6) check for any other interesting events that happened while we
>> * slept.
>> */
>> if (got_SIGHUP)
>> {
>> got_SIGHUP = false;
>> ProcessConfigFile(PGC_SIGHUP);
>> }
>> -----------------------------------------
>>
>> If we do the move, we also need to either (1) make postmaster call
>> SyncRepUpdateConfig() and pass the parsed result to any forked backends
>> via a file like write_nondefault_variables() does for EXEC_BACKEND
>> environment, or (2) make a backend call SyncRepUpdateConfig() during
>> its initialization phase so that the first call of pg_stat_replication
>> can use the parsed result. (1) seems complicated and overkill.
>> (2) may add very small overhead into the fork of a backend. It would
>> be almost negligible, though. So which logic should we adopt?
>>
>
> Won't it be possible to have assign_* function for synchronous_standby_names
> as we have for some of the other settings like assign_XactIsoLevel and then
> call SyncRepUpdateConfig() in that function?

It's possible, but it still seems to need (1), i.e., the variable that the
assign_XXX function sets would have to be passed to a backend via a file in
the EXEC_BACKEND environment.

Regards,

--
Fujii Masao

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-07 04:22:43
Message-ID:	CAA4eK1+gwGXnkh=8jTP23j_caOpVVfjqgnpXg4GjNiudTWUhkg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 8:11 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Wed, Apr 6, 2016 at 11:14 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Wed, Apr 6, 2016 at 7:03 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >>
> >> On Wed, Apr 6, 2016 at 8:59 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> >> wrote:
> >> >
> >> >> BTW, we can move SyncRepUpdateConfig() just after ProcessConfigFile()
> >> >> from pg_stat_get_wal_senders() and every backends always parse the
> >> >> value
> >> >> of s_s_names when the setting is changed.
> >> >>
> >> >
> >> > That sounds appropriate, but not sure what is exact place to call it.
> >>
> >> Maybe just after the following ProcessConfigFile().
> >>
> >> -----------------------------------------
> >> /*
> >> * (6) check for any other interesting events that happened while we
> >> * slept.
> >> */
> >> if (got_SIGHUP)
> >> {
> >> got_SIGHUP = false;
> >> ProcessConfigFile(PGC_SIGHUP);
> >> }
> >> -----------------------------------------
> >>
> >> If we do the move, we also need to either (1) make postmaster call
> >> SyncRepUpdateConfig() and pass the parsed result to any forked backends
> >> via a file like write_nondefault_variables() does for EXEC_BACKEND
> >> environment, or (2) make a backend call SyncRepUpdateConfig() during
> >> its initialization phase so that the first call of pg_stat_replication
> >> can use the parsed result. (1) seems complicated and overkill.
> >> (2) may add very small overhead into the fork of a backend. It would
> >> be almost negligible, though. So which logic should we adopt?
> >>
> >
> > Won't it be possible to have assign_* function for synchronous_standby_names
> > as we have for some of the other settings like assign_XactIsoLevel and then
> > call SyncRepUpdateConfig() in that function?
>
> It's possible, but still seems to need (1), i.e., the variable that assign_XXX
> function assigned needs to be passed to a backend via file for EXEC_BACKEND
> environment.
>

But for that, I don't think we need to do anything extra.
write_nondefault_variables() will automatically write the variable's
non-default value, and during backend initialization
read_nondefault_variables() will call set_config_option() for the
non-default parameters. That should set the required value, provided an
assign_* function is defined for the variable.
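
A rough standalone sketch of the round trip I have in mind follows. It only imitates the write/read mechanism; demo_set_option(), demo_write_nondefault(), demo_read_nondefault() and the "count the names" hook are hypothetical stand-ins, not the real guc.c functions or file format. What it shows is that only the raw string crosses the file, and the derived state is rebuilt on the reading side because the setter fires the hook.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char demo_standby_names[256];   /* raw string setting */
static int  demo_parsed_count;         /* derived state rebuilt by the hook */

/* Hypothetical assign hook: derive state from the raw string (counts names). */
static void
demo_assign_standby_names(const char *newval)
{
    const char *p;

    demo_parsed_count = (newval[0] != '\0') ? 1 : 0;
    for (p = newval; *p != '\0'; p++)
        if (*p == ',')
            demo_parsed_count++;
}

/* Stand-in for the setter: store the string, then fire the hook. */
static void
demo_set_option(const char *newval)
{
    assert(newval != NULL && newval != demo_standby_names);
    snprintf(demo_standby_names, sizeof(demo_standby_names), "%s", newval);
    demo_assign_standby_names(demo_standby_names);
}

/* Parent side: dump only the raw string, never the parsed structure. */
static void
demo_write_nondefault(FILE *fp)
{
    assert(fp != NULL);
    fprintf(fp, "%s\n", demo_standby_names);
}

/* Child side: read the raw string back and re-apply it via the setter. */
static int
demo_read_nondefault(FILE *fp)
{
    char line[256];

    assert(fp != NULL);
    if (fgets(line, sizeof(line), fp) == NULL)
        return 0;
    line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
    demo_set_option(line);
    return 1;
}
```

Whether the real hook can safely do the parsing at that point in backend startup is a separate question that this sketch does not answer.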

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-07 04:32:24
Message-ID:	CAHGQGwH5AUd+4aEYjvdaNLnhPwvhWPEZKzJkFzGBjdqg3vxkbQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 7, 2016 at 1:22 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Apr 6, 2016 at 8:11 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> On Wed, Apr 6, 2016 at 11:14 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> > On Wed, Apr 6, 2016 at 7:03 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>> > wrote:
>> >>
>> >> On Wed, Apr 6, 2016 at 8:59 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> >> wrote:
>> >> >
>> >> >> BTW, we can move SyncRepUpdateConfig() just after
>> >> >> ProcessConfigFile()
>> >> >> from pg_stat_get_wal_senders() and every backends always parse the
>> >> >> value
>> >> >> of s_s_names when the setting is changed.
>> >> >>
>> >> >
>> >> > That sounds appropriate, but not sure what is exact place to call it.
>> >>
>> >> Maybe just after the following ProcessConfigFile().
>> >>
>> >> -----------------------------------------
>> >> /*
>> >> * (6) check for any other interesting events that happened while we
>> >> * slept.
>> >> */
>> >> if (got_SIGHUP)
>> >> {
>> >> got_SIGHUP = false;
>> >> ProcessConfigFile(PGC_SIGHUP);
>> >> }
>> >> -----------------------------------------
>> >>
>> >> If we do the move, we also need to either (1) make postmaster call
>> >> SyncRepUpdateConfig() and pass the parsed result to any forked backends
>> >> via a file like write_nondefault_variables() does for EXEC_BACKEND
>> >> environment, or (2) make a backend call SyncRepUpdateConfig() during
>> >> its initialization phase so that the first call of pg_stat_replication
>> >> can use the parsed result. (1) seems complicated and overkill.
>> >> (2) may add very small overhead into the fork of a backend. It would
>> >> be almost negligible, though. So which logic should we adopt?
>> >>
>> >
>> > Won't it be possible to have assign_* function for
>> > synchronous_standby_names
>> > as we have for some of the other settings like assign_XactIsoLevel and
>> > then
>> > call SyncRepUpdateConfig() in that function?
>>
>> It's possible, but still seems to need (1), i.e., the variable that
>> assign_XXX
>> function assigned needs to be passed to a backend via file for
>> EXEC_BACKEND
>> environment.
>>
>
> But for that, I think we don't need to do anything extra. I mean
> write_nondefault_variables() will automatically write the non-default value
> of variable and then during backend initialization, it will call
> read_nondefault_variables which will call set_config_option for non-default
> parameters and that should set the required value if we have assign_*
> function defined for the variable.

Yes, if the variable that we'd like to pass to a backend is a BOOL, INT,
REAL, STRING or ENUM. But the SyncRepConfig variable is a bit more
complicated. So ISTM that write_one_nondefault_variable() would need to
be updated so that SyncRepConfig is written to a file.

Regards,

--
Fujii Masao

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-07 05:48:21
Message-ID:	CAA4eK1JwxnOda50WkmeN4XAacOy+ET2cdNy1-pnOjvxSFxVY=w@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 7, 2016 at 10:02 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Thu, Apr 7, 2016 at 1:22 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > But for that, I think we don't need to do anything extra. I mean
> > write_nondefault_variables() will automatically write the non-default value
> > of variable and then during backend initialization, it will call
> > read_nondefault_variables which will call set_config_option for non-default
> > parameters and that should set the required value if we have assign_*
> > function defined for the variable.
>
> Yes if the variable that we'd like to pass to a backend is BOOL, INT,
> REAL, STRING or ENUM. But SyncRepConfig variable is a bit more
> complicated.
>

SyncRepConfig is the parsed result of SyncRepStandbyNames, so why do you
want to pass that? I assume that the current non-default value of
SyncRepStandbyNames will be passed via write_nondefault_variables(), so we
can use that to regenerate SyncRepConfig.

>
> So ISTM that write_one_nondefault_variable() needs to
> be updated so that SyncRepConfig is written to a file.
>

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-07 06:26:42
Message-ID:	CAHGQGwG4Mw70LM+Nx-9=1+-udU6BOQ9kqCkXZwPxA9FV4H1jaw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 7, 2016 at 2:48 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Apr 7, 2016 at 10:02 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> On Thu, Apr 7, 2016 at 1:22 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> >
>> > But for that, I think we don't need to do anything extra. I mean
>> > write_nondefault_variables() will automatically write the non-default
>> > value
>> > of variable and then during backend initialization, it will call
>> > read_nondefault_variables which will call set_config_option for
>> > non-default
>> > parameters and that should set the required value if we have assign_*
>> > function defined for the variable.
>>
>> Yes if the variable that we'd like to pass to a backend is BOOL, INT,
>> REAL, STRING or ENUM. But SyncRepConfig variable is a bit more
>> complicated.
>>
>
> SyncRepConfig is a parsed result of SyncRepStandbyNames, why you want to
> pass that? I assume that current non-default value of SyncRepStandbyNames
> will be passed via write_nondefault_variables(), so we can use that to
> regenerate SyncRepConfig.

Yes, so SyncRepUpdateConfig() needs to be called by a backend after fork,
to regenerate SyncRepConfig from the passed value of SyncRepStandbyNames.
This is approach (2), which I explained upthread. An assign_XXX function
doesn't seem to be helpful in this case.

Regards,

--
Fujii Masao

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-07 06:34:01
Message-ID:	CAA4eK1+LF3is9x1ZpyH8ARYGcWV=ses5tuH0iu7joQrzmeEWOw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 7, 2016 at 11:56 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Thu, Apr 7, 2016 at 2:48 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Thu, Apr 7, 2016 at 10:02 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >>
> >> On Thu, Apr 7, 2016 at 1:22 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> >> wrote:
> >> >
> >> > But for that, I think we don't need to do anything extra. I mean
> >> > write_nondefault_variables() will automatically write the non-default
> >> > value
> >> > of variable and then during backend initialization, it will call
> >> > read_nondefault_variables which will call set_config_option for
> >> > non-default
> >> > parameters and that should set the required value if we have assign_*
> >> > function defined for the variable.
> >>
> >> Yes if the variable that we'd like to pass to a backend is BOOL, INT,
> >> REAL, STRING or ENUM. But SyncRepConfig variable is a bit more
> >> complicated.
> >>
> >
> > SyncRepConfig is a parsed result of SyncRepStandbyNames, why you want to
> > pass that? I assume that current non-default value of SyncRepStandbyNames
> > will be passed via write_nondefault_variables(), so we can use that to
> > regenerate SyncRepConfig.
>
> Yes, so SyncRepUpdateConfig() needs to be called by a backend after fork,
> to regenerate SyncRepConfig from the passed value of SyncRepStandbyNames.
> This is the approach of (2) which I explained upthread. assign_XXX function
> doesn't seem to be helpful for this case.
>

Then where do you want to call it? Also, this is only required for
EXEC_BACKEND builds.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-07 08:00:00
Message-ID:	57061380.6040500@lab.ntt.co.jp
Lists:	pgsql-hackers

On 2016/04/07 15:26, Fujii Masao wrote:
> On Thu, Apr 7, 2016 at 2:48 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Thu, Apr 7, 2016 at 10:02 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Yes if the variable that we'd like to pass to a backend is BOOL, INT,
>>> REAL, STRING or ENUM. But SyncRepConfig variable is a bit more
>>> complicated.
>> SyncRepConfig is a parsed result of SyncRepStandbyNames, why you want to
>> pass that? I assume that current non-default value of SyncRepStandbyNames
>> will be passed via write_nondefault_variables(), so we can use that to
>> regenerate SyncRepConfig.
>
> Yes, so SyncRepUpdateConfig() needs to be called by a backend after fork,
> to regenerate SyncRepConfig from the passed value of SyncRepStandbyNames.
> This is the approach of (2) which I explained upthread. assign_XXX function
> doesn't seem to be helpful for this case.

I don't see why there is a need to call SyncRepUpdateConfig() after every
fork, or anywhere outside syncrep.c/walsender.c for that matter. AIUI, only
a walsender or a backend that runs pg_stat_get_wal_senders() ever needs to
run SyncRepUpdateConfig() to get the parsed synchronous standby info from
the string that is SyncRepStandbyNames. For the rest of the world, it's just
a string GUC, and it is written to and read from any external file as one
(e.g. the file that write_nondefault_variables() writes to in the
EXEC_BACKEND case). I hope I'm not entirely missing the point of the
discussion you and Amit are having.

Thanks,
Amit

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-07 10:29:25
Message-ID:	CAA4eK1+K=m0T8ba_B=dKw-BngPW2Lp8v7jUdDomdT6rFkzu0=A@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 7, 2016 at 1:30 PM, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> On 2016/04/07 15:26, Fujii Masao wrote:
> > On Thu, Apr 7, 2016 at 2:48 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >> On Thu, Apr 7, 2016 at 10:02 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >>> Yes if the variable that we'd like to pass to a backend is BOOL, INT,
> >>> REAL, STRING or ENUM. But SyncRepConfig variable is a bit more
> >>> complicated.
> >> SyncRepConfig is a parsed result of SyncRepStandbyNames, why you want to
> >> pass that? I assume that current non-default value of SyncRepStandbyNames
> >> will be passed via write_nondefault_variables(), so we can use that to
> >> regenerate SyncRepConfig.
> >
> > Yes, so SyncRepUpdateConfig() needs to be called by a backend after fork,
> > to regenerate SyncRepConfig from the passed value of SyncRepStandbyNames.
> > This is the approach of (2) which I explained upthread. assign_XXX function
> > doesn't seem to be helpful for this case.
>
> I don't see why there is need to SyncRepUpdateConfig() after every fork or
> anywhere outside syncrep.c/walsender.c for that matter. AIUI, only
> walsender or a backend that runs pg_stat_get_wal_senders() ever needs to
> run SyncRepUpdateConfig() to get parsed synchronous standbys info from the
> string that is SyncRepStandbyNames.
>

If we go that way, then each time a backend calls pg_stat_get_wal_senders()
it has to parse the string to build SyncRepConfig, whether or not the value
has changed since the previous call. I understand that this is not a
performance-critical path, but if we can do it in a better way that doesn't
hurt any other path, that would be preferable.
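
One way to avoid the repeated parsing, sketched here under my own assumptions rather than taken from the patch, is to remember which raw string the parsed result was built from and re-parse only when the string differs. All names (demo_get_member_count(), demo_parse(), the counters) are hypothetical, and the "parser" merely counts comma-separated names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char demo_last_parsed[256];   /* raw string the cache was built from */
static int  demo_cache_valid = 0;    /* has anything been parsed yet? */
static int  demo_member_count = 0;   /* the cached "parsed" result */
static int  demo_parse_calls = 0;    /* instrumentation for the example */

/* Stand-in parser: counts comma-separated names in the raw string. */
static void
demo_parse(const char *raw)
{
    const char *p;

    demo_parse_calls++;
    demo_member_count = (raw[0] != '\0') ? 1 : 0;
    for (p = raw; *p != '\0'; p++)
        if (*p == ',')
            demo_member_count++;
}

/*
 * Return the parsed result, re-parsing only if the raw string differs from
 * the one the cache was built from. A string longer than the cache buffer
 * is stored truncated, so it never matches and is simply re-parsed on each
 * call.
 */
static int
demo_get_member_count(const char *current_raw)
{
    assert(current_raw != NULL);
    if (!demo_cache_valid || strcmp(demo_last_parsed, current_raw) != 0)
    {
        demo_parse(current_raw);
        snprintf(demo_last_parsed, sizeof(demo_last_parsed), "%s",
                 current_raw);
        demo_cache_valid = 1;
    }
    return demo_member_count;
}
```

The comparison costs a strcmp() per call, which seems acceptable for a statistics function; doing the parse in an assign hook instead would remove even that.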

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-07 12:19:12
Message-ID:	CAHGQGwHiHP06N6G4Qkoz9NJVotC-C8h7Mu0E12Y94KStiePhhA@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 7, 2016 at 7:29 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Apr 7, 2016 at 1:30 PM, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
> wrote:
>>
>> On 2016/04/07 15:26, Fujii Masao wrote:
>> > On Thu, Apr 7, 2016 at 2:48 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> > wrote:
>> >> On Thu, Apr 7, 2016 at 10:02 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>> >> wrote:
>> >>> Yes if the variable that we'd like to pass to a backend is BOOL, INT,
>> >>> REAL, STRING or ENUM. But SyncRepConfig variable is a bit more
>> >>> complicated.
>> >> SyncRepConfig is a parsed result of SyncRepStandbyNames, why you want
>> >> to
>> >> pass that? I assume that current non-default value of
>> >> SyncRepStandbyNames
>> >> will be passed via write_nondefault_variables(), so we can use that to
>> >> regenerate SyncRepConfig.
>> >
>> > Yes, so SyncRepUpdateConfig() needs to be called by a backend after
>> > fork,
>> > to regenerate SyncRepConfig from the passed value of
>> > SyncRepStandbyNames.
>> > This is the approach of (2) which I explained upthread. assign_XXX
>> > function
>> > doesn't seem to be helpful for this case.
>>
>> I don't see why there is need to SyncRepUpdateConfig() after every fork or
>> anywhere outside syncrep.c/walsender.c for that matter. AIUI, only
>> walsender or a backend that runs pg_stat_get_wal_senders() ever needs to
>> run SyncRepUpdateConfig() to get parsed synchronous standbys info from the
>> string that is SyncRepStandbyNames.
>>
>
> So if we go by this each time backend calls pg_stat_get_wal_senders, it
> needs to do parsing to form SyncRepConfig whether it's changed or not from
> previous time. I understand that this is not a performance critical path,
> but still if we can do it in some other optimal way which doesn't hurt any
> other path, then it will be better.

So, will you write the patch? Either the current implementation or
the approach you're suggesting works for me. If you really want
to change the current one, I'm happy to review that.

Regards,

--
Fujii Masao

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-07 14:43:57
Message-ID:	CAHGQGwGEe33r65P4hsiWZzM0tKDi9uAt0EbdMmkzde5-aJEByw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 5:04 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Apr 6, 2016 at 4:08 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> Here are few things I have noticed:
>> + for (i = 0; i < max_wal_senders; i++)
>> + {
>> + walsnd = &WalSndCtl->walsnds[i];
>> No volatile pointer to prevent code reordering?
>>
>> */
>> typedef struct WalSnd
>> {
>> + int slotno; /* index of this slot in WalSnd array */
>> pid_t pid; /* this walsender's process id, or 0 */
>> slotno is used nowhere.
>>
>> I'll grab the tests and look at them.
>
> So I had a look at those tests and finished with the attached:
> - patch 1 adds a reload routine to PostgresNode
> - patch 2 the list of tests.

Thanks for updating the patches!

Attached is the refactored version of the patch.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
test-n-syncrep.patch	text/x-patch	5.7 KB

From:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-08 03:55:26
Message-ID:	CAEepm=1791g=nfxy+8h=6wKfSEdqD4eaVEMUcqa7TSC_S9wU3w@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 8:23 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> Okay, I pushed the patch!
> Many thanks to all involved in the development of this feature!
>

Hi,

I spotted a couple of places in the documentation that still implied there
was only one synchronous standby. Please see suggested fixes attached.

--
Thomas Munro
http://www.enterprisedb.com

Attachment	Content-Type	Size
doc.patch	application/octet-stream	1.8 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-08 04:26:01
Message-ID:	CAHGQGwEiUeEArF0eCEY5Q5M+zcdnoRO0FLg7BcBDoCLwy9NBWw@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Apr 8, 2016 at 12:55 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Wed, Apr 6, 2016 at 8:23 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> Okay, I pushed the patch!
>> Many thanks to all involved in the development of this feature!
>
>
> Hi,
>
> I spotted a couple of places in the documentation that still implied there
> was only one synchronous standby. Please see suggested fixes attached.

Thanks! Applied.

Regards,

--
Fujii Masao

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-08 05:26:57
Message-ID:	CAB7nPqRQKSGuRfGEOs0zKCDEupkYDpSyjH=cFfme=NBOoCOwCw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 7, 2016 at 11:43 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Apr 6, 2016 at 5:04 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Wed, Apr 6, 2016 at 4:08 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> Here are few things I have noticed:
>>> + for (i = 0; i < max_wal_senders; i++)
>>> + {
>>> + walsnd = &WalSndCtl->walsnds[i];
>>> No volatile pointer to prevent code reordering?
>>>
>>> */
>>> typedef struct WalSnd
>>> {
>>> + int slotno; /* index of this slot in WalSnd array */
>>> pid_t pid; /* this walsender's process id, or 0 */
>>> slotno is used nowhere.
>>>
>>> I'll grab the tests and look at them.
>>
>> So I had a look at those tests and finished with the attached:
>> - patch 1 adds a reload routine to PostgresNode
>> - patch 2 the list of tests.
>
> Thanks for updating the patches!
>
> Attached is the refactored version of the patch.

Thanks. This looks good to me.

.gitattributes complains a bit:
$ git diff n_sync --check
src/test/recovery/t/007_sync_rep.pl:22: trailing whitespace.
+ $self->reload;
--
Michael

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-08 07:50:23
Message-ID:	CAHGQGwHkQy0hSc2gv=ucP1JgiDwq9Zhoz09cb58zTKxghATiLw@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Apr 8, 2016 at 2:26 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Apr 7, 2016 at 11:43 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Apr 6, 2016 at 5:04 PM, Michael Paquier
>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On Wed, Apr 6, 2016 at 4:08 PM, Michael Paquier
>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>> Here are few things I have noticed:
>>>> + for (i = 0; i < max_wal_senders; i++)
>>>> + {
>>>> + walsnd = &WalSndCtl->walsnds[i];
>>>> No volatile pointer to prevent code reordering?
>>>>
>>>> */
>>>> typedef struct WalSnd
>>>> {
>>>> + int slotno; /* index of this slot in WalSnd array */
>>>> pid_t pid; /* this walsender's process id, or 0 */
>>>> slotno is used nowhere.
>>>>
>>>> I'll grab the tests and look at them.
>>>
>>> So I had a look at those tests and finished with the attached:
>>> - patch 1 adds a reload routine to PostgresNode
>>> - patch 2 the list of tests.
>>
>> Thanks for updating the patches!
>>
>> Attached is the refactored version of the patch.
>
> Thanks. This looks good to me.
>
> .gitattributes complains a bit:
> $ git diff n_sync --check
> src/test/recovery/t/007_sync_rep.pl:22: trailing whitespace.
> + $self->reload;

Thanks for the review! I've finally pushed the patch.

Regards,

--
Fujii Masao

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-08 08:07:21
Message-ID:	CAD21AoA5SQ7RS32UB3P1YOzJ=UECQHXK5-pRj4rnN07MkONrvg@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Apr 8, 2016 at 4:50 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, Apr 8, 2016 at 2:26 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Apr 7, 2016 at 11:43 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Wed, Apr 6, 2016 at 5:04 PM, Michael Paquier
>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>> On Wed, Apr 6, 2016 at 4:08 PM, Michael Paquier
>>>> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>>>> Here are few things I have noticed:
>>>>> + for (i = 0; i < max_wal_senders; i++)
>>>>> + {
>>>>> + walsnd = &WalSndCtl->walsnds[i];
>>>>> No volatile pointer to prevent code reordering?
>>>>>
>>>>> */
>>>>> typedef struct WalSnd
>>>>> {
>>>>> + int slotno; /* index of this slot in WalSnd array */
>>>>> pid_t pid; /* this walsender's process id, or 0 */
>>>>> slotno is used nowhere.
>>>>>
>>>>> I'll grab the tests and look at them.
>>>>
>>>> So I had a look at those tests and finished with the attached:
>>>> - patch 1 adds a reload routine to PostgresNode
>>>> - patch 2 the list of tests.
>>>
>>> Thanks for updating the patches!
>>>
>>> Attached is the refactored version of the patch.
>>
>> Thanks. This looks good to me.
>>
>> .gitattributes complains a bit:
>> $ git diff n_sync --check
>> src/test/recovery/t/007_sync_rep.pl:22: trailing whitespace.
>> + $self->reload;
>
> Thanks for the review! I've finally pushed the patch.
>

Thank you!

Regards,

--
Masahiko Sawada

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-08 11:02:55
Message-ID:	CAA4eK1KUi=ps5H1UEOJB+40Kcah-whjTOezmzWQ2ZNB5oJyq7A@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 7, 2016 at 5:49 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Thu, Apr 7, 2016 at 7:29 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > So if we go by this each time backend calls pg_stat_get_wal_senders, it
> > needs to do parsing to form SyncRepConfig whether it's changed or not from
> > previous time. I understand that this is not a performance critical path,
> > but still if we can do it in some other optimal way which doesn't hurt any
> > other path, then it will be better.
>
> So, will you write the patch? Either current implementation or
> the approach you're suggesting works to me. If you really want
> to change the current one, I'm happy to review that.
>

Sorry, I don't have time to complete the patch, but I have written an
initial patch to show what I have in mind; something along these lines
should work. With this approach, you don't need to parse s_s_names twice
(once in the check_* function and once in the syncupdate* function). You
can refer to check_temp_tablespaces() and assign_temp_tablespaces() to see
how to use the work done by the check_* function in the assign_* function.
Also, right now I have used TopMemoryContext for allocation in
assign_synchronous_standby_names; it is better to use guc_malloc or
something similar for allocation, as is done in other check_* and assign_*
functions.
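
The check_*/assign_* hand-off described above can be sketched in plain C.
This is a stand-alone simulation, not PostgreSQL code: ParsedNames,
check_names(), assign_names() and set_names() are hypothetical names, and
the comma-splitting parser is a placeholder for the real s_s_names grammar.
The point it illustrates is that the check step parses once and passes the
result to the assign step as a single malloc'd chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a comma-separated list of standby names. */
typedef struct ParsedNames
{
    int  nmembers;   /* number of names found */
    char names[];    /* names stored back to back, each NUL-terminated */
} ParsedNames;

/* Counts parser invocations, to show one parse per setting change. */
static int parse_calls = 0;

/* Currently active parsed configuration, owned by set_names(). */
static ParsedNames *current_config = NULL;

/* Check step: validate newval and return the parsed form through *extra. */
static bool
check_names(const char *newval, void **extra)
{
    size_t       len = strlen(newval);
    ParsedNames *parsed;

    parse_calls++;
    *extra = NULL;
    if (len == 0)
        return true;                /* empty setting: nothing to store */

    /* One malloc'd chunk holds the header and all the names. */
    parsed = malloc(sizeof(ParsedNames) + len + 1);
    if (parsed == NULL)
        return false;
    memcpy(parsed->names, newval, len + 1);
    parsed->nmembers = 1;

    for (size_t i = 0; i < len; i++)
    {
        if (parsed->names[i] != ',')
            continue;
        /* Reject empty members such as ",a", "a," or "a,,b". */
        if (i == 0 || i == len - 1 || parsed->names[i - 1] == '\0')
        {
            free(parsed);
            return false;
        }
        parsed->names[i] = '\0';
        parsed->nmembers++;
    }
    *extra = parsed;
    return true;
}

/* Assign step: no parsing here, just adopt what the check step built. */
static void
assign_names(const char *newval, void *extra)
{
    (void) newval;
    current_config = (ParsedNames *) extra;
}

/* Stand-in for the machinery that drives both hooks and frees old state. */
static bool
set_names(const char *newval)
{
    void        *extra;
    ParsedNames *old = current_config;

    assert(newval != NULL);
    if (!check_names(newval, &extra))
        return false;               /* invalid value: old setting stays */
    assign_names(newval, extra);
    free(old);                      /* free(NULL) is a no-op */
    return true;
}
```

Calling set_names("standby1,standby2") leaves current_config with two
members after a single parse. Because the old state is released with
free(), the parsed data in this sketch has to come from malloc, which is
the same reason for preferring guc_malloc over a memory context here.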

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
fix_sync_rep_update_conf_v1.patch	application/octet-stream	3.3 KB

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-09 02:27:02
Message-ID:	CAMkU=1zsmU2FYkeRm2Nhx6WzhcJKBf3CnomB3+nLDS=SW7EnJQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 6, 2016 at 1:23 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> Okay, I pushed the patch!
> Many thanks to all involved in the development of this feature!

Thanks, a nice feature.

When I compile now without cassert, I get the compiler warning:

syncrep.c: In function 'SyncRepUpdateConfig':
syncrep.c:878:6: warning: variable 'parse_rc' set but not used
[-Wunused-but-set-variable]

Cheers,

Jeff

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-09 03:32:50
Message-ID:	2116.1460172770@sss.pgh.pa.us
Lists:	pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> When I compile now without cassert, I get the compiler warning:

> syncrep.c: In function 'SyncRepUpdateConfig':
> syncrep.c:878:6: warning: variable 'parse_rc' set but not used
> [-Wunused-but-set-variable]

If there's a good reason for that to be an Assert, I don't see it.
There are no callers of SyncRepUpdateConfig that look like they
need to, or should expect not to have to, tolerate errors.
I think the way to fix this is to turn the Assert into a plain
old test-and-ereport-ERROR.
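
A minimal sketch of the difference, in plain C rather than PostgreSQL
code: the standard assert() stands in for Assert, a returned error code
stands in for ereport(ERROR), and parse_setting() plus both update
functions are hypothetical names. When assertions are compiled out, the
first variant stores rc and never reads it, which is the pattern the
warning complains about; the second variant always tests the result.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parser: returns 0 on success, nonzero on failure. */
static int
parse_setting(const char *s)
{
    return (s != NULL && s[0] != '\0') ? 0 : 1;
}

/*
 * Assert-only variant. With NDEBUG defined, assert() expands to nothing,
 * so rc is set but not used and a parse failure goes unnoticed.
 */
static void
update_config_with_assert(const char *s)
{
    int rc;

    rc = parse_setting(s);
    assert(rc == 0);
}

/*
 * Test-and-report variant. The check is part of every build, so rc is
 * always read and the caller always learns about a failure.
 */
static int
update_config_checked(const char *s)
{
    int rc = parse_setting(s);

    if (rc != 0)
    {
        fprintf(stderr, "could not parse setting\n");
        return -1;
    }
    return 0;
}
```

Compiling this with assertions disabled (for example with -DNDEBUG) and
warnings enabled should show the warning only for the first function.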

regards, tom lane

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-11 01:58:06
Message-ID:	CAD21AoBQ2-EmF58Nmzh14u-VjKN5wPVbg5DMTkhg_ZY5_kfnTw@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Apr 9, 2016 at 12:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>> When I compile now without cassert, I get the compiler warning:
>
>> syncrep.c: In function 'SyncRepUpdateConfig':
>> syncrep.c:878:6: warning: variable 'parse_rc' set but not used
>> [-Wunused-but-set-variable]
>
> If there's a good reason for that to be an Assert, I don't see it.
> There are no callers of SyncRepUpdateConfig that look like they
> need to, or should expect not to have to, tolerate errors.
> I think the way to fix this is to turn the Assert into a plain
> old test-and-ereport-ERROR.
>

I've changed the draft patch Amit implemented so that it doesn't parse
twice (in both check_hook and assign_hook).
So the assertion that was in assign_hook is no longer necessary.

Please find attached.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
fix_sync_rep_update_conf_v2.patch	text/x-patch	5.2 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-11 04:31:38
Message-ID:	CAHGQGwFwu6COyxK4ZqUv3CpNj7f949SvEzPPWiGKSBvFa3owKg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 11, 2016 at 10:58 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Sat, Apr 9, 2016 at 12:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>>> When I compile now without cassert, I get the compiler warning:
>>
>>> syncrep.c: In function 'SyncRepUpdateConfig':
>>> syncrep.c:878:6: warning: variable 'parse_rc' set but not used
>>> [-Wunused-but-set-variable]
>>
>> If there's a good reason for that to be an Assert, I don't see it.
>> There are no callers of SyncRepUpdateConfig that look like they
>> need to, or should expect not to have to, tolerate errors.
>> I think the way to fix this is to turn the Assert into a plain
>> old test-and-ereport-ERROR.
>>
>
> I've changed the draft patch Amit implemented so that it doesn't parse
> twice(check_hook and assign_hook).
> So assertion that was in assign_hook is no longer necessary.
>
> Please find attached.

Thanks for the patch!

When I emptied s_s_names, reloaded the configuration file, set it to 'standby1'
and reloaded the configuration file again, the master crashed with
the following error.

*** glibc detected *** postgres: wal sender process postgres [local]
streaming 0/3015F18: munmap_chunk(): invalid pointer:
0x00000000024d9a40 ***
======= Backtrace: =========
*** glibc detected *** postgres: wal sender process postgres [local]
streaming 0/3015F18: munmap_chunk(): invalid pointer:
0x00000000024d9a40 ***
/lib64/libc.so.6[0x3be8e75f4e]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
======= Backtrace: =========
/lib64/libc.so.6[0x3be8e75f4e]
postgres: wal sender process postgres [local] streaming
0/3015F18(set_config_option+0x12cb)[0x982242]
postgres: wal sender process postgres [local] streaming
0/3015F18(SetConfigOption+0x4b)[0x9827ff]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
postgres: wal sender process postgres [local] streaming
0/3015F18(set_config_option+0x12cb)[0x982242]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
postgres: wal sender process postgres [local] streaming
0/3015F18(SetConfigOption+0x4b)[0x9827ff]
postgres: wal sender process postgres [local] streaming
0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
postgres: wal sender process postgres [local] streaming
0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
postgres: wal sender process postgres [local] streaming
0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
postgres: wal sender process postgres [local] streaming
0/3015F18(PostgresMain+0x772)[0x8141b6]
postgres: wal sender process postgres [local] streaming
0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
postgres: wal sender process postgres [local] streaming
0/3015F18(PostgresMain+0x772)[0x8141b6]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
postgres: wal sender process postgres [local] streaming
0/3015F18(PostmasterMain+0x1134)[0x784c08]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x6ce12e]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3be8e1ed5d]
postgres: wal sender process postgres [local] streaming
0/3015F18(PostmasterMain+0x1134)[0x784c08]
postgres: wal sender process postgres [local] streaming 0/3015F18[0x467e99]

Regards,

--
Fujii Masao

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-11 06:55:29
Message-ID:	CAHGQGwH0F2VdZMegKQ+mCfAn6vh3YpQSs1NdZPBsK_u56csLfg@mail.gmail.com
Lists:	pgsql-hackers

Thanks for the report!

> If there's a good reason for that to be an Assert, I don't see it.
> There are no callers of SyncRepUpdateConfig that look like they
> need to, or should expect not to have to, tolerate errors.
> I think the way to fix this is to turn the Assert into a plain
> old test-and-ereport-ERROR.

Okay, I pushed that change. Thanks for the suggestion!

Regards,

--
Fujii Masao

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-11 08:52:24
Message-ID:	CAD21AoBqft7H3fVJJP3tkXL4GVkyUeyFVLUwdKHmh7vCCYH4vA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 11, 2016 at 1:31 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Apr 11, 2016 at 10:58 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Sat, Apr 9, 2016 at 12:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>>>> When I compile now without cassert, I get the compiler warning:
>>>
>>>> syncrep.c: In function 'SyncRepUpdateConfig':
>>>> syncrep.c:878:6: warning: variable 'parse_rc' set but not used
>>>> [-Wunused-but-set-variable]
>>>
>>> If there's a good reason for that to be an Assert, I don't see it.
>>> There are no callers of SyncRepUpdateConfig that look like they
>>> need to, or should expect not to have to, tolerate errors.
>>> I think the way to fix this is to turn the Assert into a plain
>>> old test-and-ereport-ERROR.
>>>
>>
>> I've changed the draft patch Amit implemented so that it doesn't parse
>> twice(check_hook and assign_hook).
>> So assertion that was in assign_hook is no longer necessary.
>>
>> Please find attached.
>
> Thanks for the patch!
>
> When I emptied s_s_names, reloaded the configuration file, set it to 'standby1'
> and reloaded the configuration file again, the master crashed with
> the following error.
>
> *** glibc detected *** postgres: wal sender process postgres [local]
> streaming 0/3015F18: munmap_chunk(): invalid pointer:
> 0x00000000024d9a40 ***
> ======= Backtrace: =========
> *** glibc detected *** postgres: wal sender process postgres [local]
> streaming 0/3015F18: munmap_chunk(): invalid pointer:
> 0x00000000024d9a40 ***
> /lib64/libc.so.6[0x3be8e75f4e]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
> ======= Backtrace: =========
> /lib64/libc.so.6[0x3be8e75f4e]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(set_config_option+0x12cb)[0x982242]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(SetConfigOption+0x4b)[0x9827ff]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(set_config_option+0x12cb)[0x982242]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(SetConfigOption+0x4b)[0x9827ff]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(PostgresMain+0x772)[0x8141b6]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(PostgresMain+0x772)[0x8141b6]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(PostmasterMain+0x1134)[0x784c08]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x6ce12e]
> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3be8e1ed5d]
> postgres: wal sender process postgres [local] streaming
> 0/3015F18(PostmasterMain+0x1134)[0x784c08]
> postgres: wal sender process postgres [local] streaming 0/3015F18[0x467e99]
>

Thank you for reviewing.

SyncRepUpdateConfig() seems to be no longer necessary.
Attached is an updated version.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
fix_sync_rep_update_conf_v3.patch	text/x-patch	5.9 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-11 11:47:30
Message-ID:	CAHGQGwHEAbDfYjyM5jSSJVs-LYsm25n1-5HMUprYS9PCg5Xczg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 11, 2016 at 5:52 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Mon, Apr 11, 2016 at 1:31 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Mon, Apr 11, 2016 at 10:58 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Sat, Apr 9, 2016 at 12:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>>>>> When I compile now without cassert, I get the compiler warning:
>>>>
>>>>> syncrep.c: In function 'SyncRepUpdateConfig':
>>>>> syncrep.c:878:6: warning: variable 'parse_rc' set but not used
>>>>> [-Wunused-but-set-variable]
>>>>
>>>> If there's a good reason for that to be an Assert, I don't see it.
>>>> There are no callers of SyncRepUpdateConfig that look like they
>>>> need to, or should expect not to have to, tolerate errors.
>>>> I think the way to fix this is to turn the Assert into a plain
>>>> old test-and-ereport-ERROR.
>>>>
>>>
>>> I've changed the draft patch Amit implemented so that it doesn't parse
>>> twice(check_hook and assign_hook).
>>> So assertion that was in assign_hook is no longer necessary.
>>>
>>> Please find attached.
>>
>> Thanks for the patch!
>>
>> When I emptied s_s_names, reloaded the configuration file, set it to 'standby1'
>> and reloaded the configuration file again, the master crashed with
>> the following error.
>>
>> *** glibc detected *** postgres: wal sender process postgres [local]
>> streaming 0/3015F18: munmap_chunk(): invalid pointer:
>> 0x00000000024d9a40 ***
>> ======= Backtrace: =========
>> *** glibc detected *** postgres: wal sender process postgres [local]
>> streaming 0/3015F18: munmap_chunk(): invalid pointer:
>> 0x00000000024d9a40 ***
>> /lib64/libc.so.6[0x3be8e75f4e]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
>> ======= Backtrace: =========
>> /lib64/libc.so.6[0x3be8e75f4e]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(set_config_option+0x12cb)[0x982242]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(SetConfigOption+0x4b)[0x9827ff]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(set_config_option+0x12cb)[0x982242]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(SetConfigOption+0x4b)[0x9827ff]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(PostgresMain+0x772)[0x8141b6]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(PostgresMain+0x772)[0x8141b6]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(PostmasterMain+0x1134)[0x784c08]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x6ce12e]
>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3be8e1ed5d]
>> postgres: wal sender process postgres [local] streaming
>> 0/3015F18(PostmasterMain+0x1134)[0x784c08]
>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x467e99]
>>
>
> Thank you for reviewing.
>
> SyncRepUpdateConfig() seems to be no longer necessary.

Really? I was thinking that something like that function needs to
be called at the beginning of a backend and walsender in the
EXEC_BACKEND case. No?

Regards,

--
Fujii Masao

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-12 00:04:20
Message-ID:	CAD21AoDSh4eVUVARf47zUFk1Bq57pUB+C=9NZtLh8Hw6Ru_KWg@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 11, 2016 at 8:47 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Apr 11, 2016 at 5:52 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Mon, Apr 11, 2016 at 1:31 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Mon, Apr 11, 2016 at 10:58 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> On Sat, Apr 9, 2016 at 12:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>>>>>> When I compile now without cassert, I get the compiler warning:
>>>>>
>>>>>> syncrep.c: In function 'SyncRepUpdateConfig':
>>>>>> syncrep.c:878:6: warning: variable 'parse_rc' set but not used
>>>>>> [-Wunused-but-set-variable]
>>>>>
>>>>> If there's a good reason for that to be an Assert, I don't see it.
>>>>> There are no callers of SyncRepUpdateConfig that look like they
>>>>> need to, or should expect not to have to, tolerate errors.
>>>>> I think the way to fix this is to turn the Assert into a plain
>>>>> old test-and-ereport-ERROR.
>>>>>
>>>>
>>>> I've changed the draft patch Amit implemented so that it doesn't parse
>>>> twice(check_hook and assign_hook).
>>>> So assertion that was in assign_hook is no longer necessary.
>>>>
>>>> Please find attached.
>>>
>>> Thanks for the patch!
>>>
>>> When I emptied s_s_names, reloaded the configuration file, set it to 'standby1'
>>> and reloaded the configuration file again, the master crashed with
>>> the following error.
>>>
>>> *** glibc detected *** postgres: wal sender process postgres [local]
>>> streaming 0/3015F18: munmap_chunk(): invalid pointer:
>>> 0x00000000024d9a40 ***
>>> ======= Backtrace: =========
>>> *** glibc detected *** postgres: wal sender process postgres [local]
>>> streaming 0/3015F18: munmap_chunk(): invalid pointer:
>>> 0x00000000024d9a40 ***
>>> /lib64/libc.so.6[0x3be8e75f4e]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
>>> ======= Backtrace: =========
>>> /lib64/libc.so.6[0x3be8e75f4e]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(set_config_option+0x12cb)[0x982242]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(SetConfigOption+0x4b)[0x9827ff]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(set_config_option+0x12cb)[0x982242]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(SetConfigOption+0x4b)[0x9827ff]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(PostgresMain+0x772)[0x8141b6]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(PostgresMain+0x772)[0x8141b6]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(PostmasterMain+0x1134)[0x784c08]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x6ce12e]
>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3be8e1ed5d]
>>> postgres: wal sender process postgres [local] streaming
>>> 0/3015F18(PostmasterMain+0x1134)[0x784c08]
>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x467e99]
>>>
>>
>> Thank you for reviewing.
>>
>> SyncRepUpdateConfig() seems to be no longer necessary.
>
> Really? I was thinking that something like that function needs to
> be called at the beginning of a backend and walsender in
> EXEC_BACKEND case. No?
>

Hmm, in the EXEC_BACKEND case, I guess that each child process calls
read_nondefault_variables in SubPostmasterMain, which parses and
validates these configuration parameters.
The previous patch didn't apply cleanly to HEAD, so I've attached an updated version.

Regards,

--
Masahiko Sawada

Attachment	Content-Type	Size
fix_sync_rep_update_conf_v4.patch	text/x-patch	6.0 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-12 19:43:35
Message-ID:	CAHGQGwEmZhBdjb1x3+KtUU9VV5xnhgCBO4TejibOXF_VHaeVXg@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 12, 2016 at 9:04 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Mon, Apr 11, 2016 at 8:47 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Mon, Apr 11, 2016 at 5:52 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Mon, Apr 11, 2016 at 1:31 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> On Mon, Apr 11, 2016 at 10:58 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>>> On Sat, Apr 9, 2016 at 12:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>>>>>>> When I compile now without cassert, I get the compiler warning:
>>>>>>
>>>>>>> syncrep.c: In function 'SyncRepUpdateConfig':
>>>>>>> syncrep.c:878:6: warning: variable 'parse_rc' set but not used
>>>>>>> [-Wunused-but-set-variable]
>>>>>>
>>>>>> If there's a good reason for that to be an Assert, I don't see it.
>>>>>> There are no callers of SyncRepUpdateConfig that look like they
>>>>>> need to, or should expect not to have to, tolerate errors.
>>>>>> I think the way to fix this is to turn the Assert into a plain
>>>>>> old test-and-ereport-ERROR.
>>>>>>
>>>>>
>>>>> I've changed the draft patch Amit implemented so that it doesn't parse
>>>>> twice(check_hook and assign_hook).
>>>>> So assertion that was in assign_hook is no longer necessary.
>>>>>
>>>>> Please find attached.
>>>>
>>>> Thanks for the patch!
>>>>
>>>> When I emptied s_s_names, reloaded the configuration file, set it to 'standby1'
>>>> and reloaded the configuration file again, the master crashed with
>>>> the following error.
>>>>
>>>> *** glibc detected *** postgres: wal sender process postgres [local]
>>>> streaming 0/3015F18: munmap_chunk(): invalid pointer:
>>>> 0x00000000024d9a40 ***
>>>> ======= Backtrace: =========
>>>> *** glibc detected *** postgres: wal sender process postgres [local]
>>>> streaming 0/3015F18: munmap_chunk(): invalid pointer:
>>>> 0x00000000024d9a40 ***
>>>> /lib64/libc.so.6[0x3be8e75f4e]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
>>>> ======= Backtrace: =========
>>>> /lib64/libc.so.6[0x3be8e75f4e]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(set_config_option+0x12cb)[0x982242]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(SetConfigOption+0x4b)[0x9827ff]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x97dae2]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(set_config_option+0x12cb)[0x982242]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(SetConfigOption+0x4b)[0x9827ff]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x988b4e]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x98af40]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(ProcessConfigFile+0x9f)[0x98a98b]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b50fd]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7b359c]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(PostgresMain+0x772)[0x8141b6]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(exec_replication_command+0x1a7)[0x7b47b6]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(PostgresMain+0x772)[0x8141b6]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x7896f7]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x788e62]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(PostmasterMain+0x1134)[0x784c08]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x785544]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x6ce12e]
>>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x3be8e1ed5d]
>>>> postgres: wal sender process postgres [local] streaming
>>>> 0/3015F18(PostmasterMain+0x1134)[0x784c08]
>>>> postgres: wal sender process postgres [local] streaming 0/3015F18[0x467e99]
>>>>
>>>
>>> Thank you for reviewing.
>>>
>>> SyncRepUpdateConfig() seems to be no longer necessary.
>>
>> Really? I was thinking that something like that function needs to
>> be called at the beginning of a backend and walsender in
>> EXEC_BACKEND case. No?
>>
>
> Hmm, in EXEC_BACKEND case, I guess that each child process calls
> read_nondefault_variables that parses and validates these
> configuration parameters in SubPostmasterMain.

SyncRepStandbyNames is passed but SyncRepConfig is not, I think.

Regards,

--
Fujii Masao

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, jeff(dot)janes(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-13 08:14:22
Message-ID:	20160413.171422.221669162.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Wed, 13 Apr 2016 04:43:35 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwEmZhBdjb1x3+KtUU9VV5xnhgCBO4TejibOXF_VHaeVXg(at)mail(dot)gmail(dot)com>
> >>> Thank you for reviewing.
> >>>
> >>> SyncRepUpdateConfig() seems to be no longer necessary.
> >>
> >> Really? I was thinking that something like that function needs to
> >> be called at the beginning of a backend and walsender in
> >> EXEC_BACKEND case. No?
> >>
> >
> > Hmm, in EXEC_BACKEND case, I guess that each child process calls
> > read_nondefault_variables that parses and validates these
> > configuration parameters in SubPostmasterMain.
>
> SyncRepStandbyNames is passed but SyncRepConfig is not, I think.

SyncRepStandbyNames is passed to exec'ed backends by
read_nondefault_variables, which calls set_config_option, which
calls check/assign_s_s_names then syncrep_yyparse, which sets
SyncRepConfig.

Since guessing back and forth is a waste of time, I actually built and
ran it on Windows 7 and observed that SyncRepConfig is set before
WalSndLoop starts.

> LOG: check_s_s_names(pid=20596, newval=)
> LOG: assign_s_s_names(pid=20596, newval=, SyncRepConfig=00000000)
> LOG: read_nondefault_variables(pid=20596)
> LOG: set_config_option(synchronous_standby_names)(pid=20596)
> LOG: check_s_s_names(pid=20596, newval=2[standby,sby2,sby3])
> LOG: assign_s_s_names(pid=20596, newval=2[standby,sby2,sby3], SyncRepConfig=01383598)
> LOG: WalSndLoop(pid=20596)

By the way, the patch assumes that one check_s_s_names is
followed by exactly one assign_s_s_names. I suppose that myextra
should be handled without such assumption.
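
[Editorial sketch] The hook chain described above, and one way to avoid relying on a strict check/assign pairing, might look roughly like the following toy model. It assumes that myextra in the patch is file-level state filled by the check hook and consumed by the assign hook; the patch itself is not shown here, so that is a guess. All names (ToyConfig, toy_check_names, toy_assign_names, toy_set_option) are hypothetical and are not PostgreSQL APIs, and the "parser" only reads the leading count.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for SyncRepConfigData; not the real struct. */
typedef struct ToyConfig
{
    int num_sync;
} ToyConfig;

/* Plays the role of the SyncRepConfig global. */
ToyConfig *toy_config = NULL;

/*
 * Check hook: validate newval and return the parse result through *extra.
 * Nothing is stored in file-level state, so a check call that is never
 * followed by an assign call cannot clobber an earlier result.
 */
int
toy_check_names(const char *newval, void **extra)
{
    char       *end;
    long        n;
    ToyConfig  *parsed;

    assert(newval != NULL && extra != NULL);
    *extra = NULL;
    if (newval[0] == '\0')
        return 1;               /* empty setting is valid, nothing parsed */

    n = strtol(newval, &end, 10);
    if (end == newval || *end != '[' || n <= 0)
        return 0;               /* reject anything not shaped like N[...] */

    parsed = malloc(sizeof(ToyConfig));
    if (parsed == NULL)
        return 0;
    parsed->num_sync = (int) n;
    *extra = parsed;
    return 1;
}

/* Assign hook: only installs what the matching check call produced. */
void
toy_assign_names(void *extra)
{
    toy_config = (ToyConfig *) extra;
}

/* Plays the role of set_config_option for this one setting. */
int
toy_set_option(const char *value)
{
    void       *extra;
    ToyConfig  *old = toy_config;

    if (!toy_check_names(value, &extra))
        return 0;               /* rejected: previous setting stays */
    toy_assign_names(extra);
    free(old);                  /* the caller owns and frees the old extra */
    return 1;
}
```

In this model, calling toy_check_names twice in a row simply yields two independent results, and whichever one is later assigned becomes the active configuration.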

Plus, myextra should be given a saner name.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-14 02:45:14
Message-ID:	CAA4eK1K0UJzYWojr4JSFWC9QXYY+sw2kiDR5amjXFmGoQH5M7g@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 13, 2016 at 1:44 PM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> At Wed, 13 Apr 2016 04:43:35 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
wrote in <CAHGQGwEmZhBdjb1x3+KtUU9VV5xnhgCBO4TejibOXF_VHaeVXg(at)mail(dot)gmail(dot)com
>
> > >>> Thank you for reviewing.
> > >>>
> > >>> SyncRepUpdateConfig() seems to be no longer necessary.
> > >>
> > >> Really? I was thinking that something like that function needs to
> > >> be called at the beginning of a backend and walsender in
> > >> EXEC_BACKEND case. No?
> > >>
> > >
> > > Hmm, in EXEC_BACKEND case, I guess that each child process calls
> > > read_nondefault_variables that parses and validates these
> > > configuration parameters in SubPostmasterMain.
> >
> > SyncRepStandbyNames is passed but SyncRepConfig is not, I think.
>
> SyncRepStandbyNames is passed to exec'ed backends by
> read_nondefault_variables, which calls set_config_option, which
> calls check/assign_s_s_names then syncrep_yyparse, which sets
> SyncRepConfig.
>
> Since guess battle is a waste of time, I actually built and ran
> on Windows7 and observed that SyncRepConfig has been set before
> WalSndLoop starts.
>

Yes, this is what I was trying to explain to Fujii-san upthread and I have
also verified that the same works on Windows. I think one point which we
should try to ensure in this patch is whether it is good to use
TopMemoryContext to allocate the memory in the check or assign function or
should we allocate some temporary context (like we do in load_tzoffsets())
to perform parsing and then delete the same at end.
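
[Editorial sketch] The temporary-context idea can be illustrated with a toy arena: every scratch allocation made while parsing is tied to a short-lived context, and the whole context is released in one step at the end. This is not PostgreSQL's MemoryContext API; ToyContext, toy_alloc, toy_context_delete and toy_count_names are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Each allocation is prefixed by a header linking it into its context. */
typedef struct ToyChunk
{
    struct ToyChunk *next;
} ToyChunk;

typedef struct ToyContext
{
    ToyChunk   *chunks;         /* all chunks still owned by the context */
    int         live;           /* number of chunks not yet released */
} ToyContext;

/* Allocate size bytes that will be released along with the context. */
void *
toy_alloc(ToyContext *cxt, size_t size)
{
    ToyChunk   *c = malloc(sizeof(ToyChunk) + size);

    if (c == NULL)
        return NULL;
    c->next = cxt->chunks;
    cxt->chunks = c;
    cxt->live++;
    return c + 1;               /* usable memory starts after the header */
}

/* Release everything allocated in the context in one pass. */
void
toy_context_delete(ToyContext *cxt)
{
    while (cxt->chunks != NULL)
    {
        ToyChunk   *next = cxt->chunks->next;

        free(cxt->chunks);
        cxt->chunks = next;
        cxt->live--;
    }
}

/*
 * Count the non-empty names in a comma-separated list.  Per-name scratch
 * copies live only in a temporary context, so nothing leaks even though
 * no copy is freed individually.  Returns -1 on out-of-memory.
 */
int
toy_count_names(const char *list)
{
    ToyContext  scratch = {NULL, 0};
    const char *p;
    int         count = 0;

    for (p = list; *p != '\0';)
    {
        size_t      len = strcspn(p, ",");
        char       *name = toy_alloc(&scratch, len + 1);

        if (name == NULL)
        {
            count = -1;
            break;
        }
        memcpy(name, p, len);
        name[len] = '\0';
        if (len > 0)
            count++;
        p += len;
        if (*p == ',')
            p++;
    }

    toy_context_delete(&scratch);
    assert(scratch.live == 0);
    return count;
}
```

The point of the design is that the parser never needs a matching free for each allocation; only the result that must outlive parsing has to be copied elsewhere before the context is deleted.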

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-14 03:42:06
Message-ID:	CAHGQGwH7F5gWfdCT71Ucix_w+8ipR1Owzv9k4VnA1fcMYyfr6w@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 14, 2016 at 11:45 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Apr 13, 2016 at 1:44 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>
>> At Wed, 13 Apr 2016 04:43:35 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>> wrote in
>> <CAHGQGwEmZhBdjb1x3+KtUU9VV5xnhgCBO4TejibOXF_VHaeVXg(at)mail(dot)gmail(dot)com>
>> > >>> Thank you for reviewing.
>> > >>>
>> > >>> SyncRepUpdateConfig() seems to be no longer necessary.
>> > >>
>> > >> Really? I was thinking that something like that function needs to
>> > >> be called at the beginning of a backend and walsender in
>> > >> EXEC_BACKEND case. No?
>> > >>
>> > >
>> > > Hmm, in EXEC_BACKEND case, I guess that each child process calls
>> > > read_nondefault_variables that parses and validates these
>> > > configuration parameters in SubPostmasterMain.
>> >
>> > SyncRepStandbyNames is passed but SyncRepConfig is not, I think.
>>
>> SyncRepStandbyNames is passed to exec'ed backends by
>> read_nondefault_variables, which calls set_config_option, which
>> calls check/assign_s_s_names then syncrep_yyparse, which sets
>> SyncRepConfig.
>>
>> Since guess battle is a waste of time, I actually built and ran
>> on Windows7 and observed that SyncRepConfig has been set before
>> WalSndLoop starts.
>>
>
> Yes, this is what I was trying to explain to Fujii-san upthread and I have
> also verified that the same works on Windows.

Oh, okay, understood. Thanks for explaining that!

> I think one point which we
> should try to ensure in this patch is whether it is good to use
> TopMemoryContext to allocate the memory in the check or assign function or
> should we allocate some temporary context (like we do in load_tzoffsets())
> to perform parsing and then delete the same at end.

It seems so, if some memory is allocated by palloc and is not
freed while parsing s_s_names.

Here is another comment on the patch.

-SyncRepFreeConfig(SyncRepConfigData *config)
+SyncRepFreeConfig(SyncRepConfigData *config, bool itself)

SyncRepFreeConfig() was extended so that it accepts the second boolean
argument. But it's always called with the second argument = false. So,
I just wonder why that second argument is required.

SyncRepConfigData *config =
- (SyncRepConfigData *) palloc(sizeof(SyncRepConfigData));
+ (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));

Why should we use malloc instead of palloc here?

*If* we use malloc, its return value must be checked.

Regards,

--
Fujii Masao

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	masao(dot)fujii(at)gmail(dot)com
Cc:	amit(dot)kapila16(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, jeff(dot)janes(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-14 04:11:10
Message-ID:	20160414.131110.145073440.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Thu, 14 Apr 2016 12:42:06 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwH7F5gWfdCT71Ucix_w+8ipR1Owzv9k4VnA1fcMYyfr6w(at)mail(dot)gmail(dot)com>
> > Yes, this is what I was trying to explain to Fujii-san upthread and I have
> > also verified that the same works on Windows.
>
> Oh, okay, understood. Thanks for explaining that!
>
> > I think one point which we
> > should try to ensure in this patch is whether it is good to use
> > TopMemoryContext to allocate the memory in the check or assign function or
> > should we allocate some temporary context (like we do in load_tzoffsets())
> > to perform parsing and then delete the same at end.
>
> Seems yes if some memories are allocated by palloc and they are not
> free'd while parsing s_s_names.
>
> Here is another comment on the patch.
>
> -SyncRepFreeConfig(SyncRepConfigData *config)
> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself)
>
> SyncRepFreeConfig() was extended so that it accepts the second boolean
> argument. But it's always called with the second argument = false. So,
> I just wonder why that second argument is required.
>
> SyncRepConfigData *config =
> - (SyncRepConfigData *) palloc(sizeof(SyncRepConfigData));
> + (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));
>
> Why should we use malloc instead of palloc here?
>
> *If* we use malloc, its return value must be checked.

Because it should live independently of any memory context, as GUC
values do. guc.c provides guc_malloc for this purpose, which
is a malloc with some simple error handling, so having a
walsender_malloc would be reasonable.
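
[Editorial sketch] A checked malloc wrapper in the spirit of what is described above might look like this. It is not the guc_malloc from guc.c; toy_checked_alloc, toy_alloc_fn and toy_failing_alloc are hypothetical names. The allocator is passed in as a function pointer purely so that the failure path can be exercised without exhausting memory, and the error is counted and printed rather than raised through ereport.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Signature shared by malloc and the failing test allocator below. */
typedef void *(*toy_alloc_fn) (size_t);

/*
 * Allocate size bytes through alloc and report a failure instead of
 * handing back an unchecked NULL.  A real backend would raise an error
 * at some level here; this sketch only counts and prints.
 */
void *
toy_checked_alloc(toy_alloc_fn alloc, size_t size, int *error_count)
{
    void       *p;

    assert(alloc != NULL && error_count != NULL);
    if (size == 0)
        size = 1;               /* avoid the ambiguous malloc(0) result */
    p = alloc(size);
    if (p == NULL)
    {
        fprintf(stderr, "out of memory\n");
        (*error_count)++;
    }
    return p;
}

/* Allocator that always fails, used to exercise the error path. */
void *
toy_failing_alloc(size_t size)
{
    (void) size;
    return NULL;
}
```

Callers still have to test the returned pointer, but the failure is at least reported in one place instead of being silently dereferenced later.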

I don't think it's good to use TopMemoryContext for syncrep
parser. syncrep_scanner.l uses palloc. This basically causes a
memory leak on all postgres processes.

It might be better if the parser worked in the current memory
context and the caller copied the result into malloc'ed
memory. But some list-creation functions use palloc, and changing
SyncRepConfigData.members to be char** would be messier.
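
[Editorial sketch] One way to copy a parse result out of a memory context without a char** array is to flatten it into a single malloc'ed block, with the member names stored back to back. This is only an illustration of the copy-out idea, not what the patch does; ToyFlatConfig, toy_flatten and toy_member are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Whole configuration in one allocation; a single free() releases it. */
typedef struct ToyFlatConfig
{
    size_t      total_size;     /* size of the whole block in bytes */
    int         num_sync;       /* number of sync standbys requested */
    int         nmembers;       /* number of names that follow */
    char        member_names[]; /* nmembers NUL-terminated strings */
} ToyFlatConfig;

/* Copy num_sync and the given names into one malloc'ed block. */
ToyFlatConfig *
toy_flatten(int num_sync, const char *const *names, int nmembers)
{
    size_t      size = offsetof(ToyFlatConfig, member_names);
    ToyFlatConfig *cfg;
    char       *dst;
    int         i;

    for (i = 0; i < nmembers; i++)
        size += strlen(names[i]) + 1;

    cfg = malloc(size);
    if (cfg == NULL)
        return NULL;
    cfg->total_size = size;
    cfg->num_sync = num_sync;
    cfg->nmembers = nmembers;

    dst = cfg->member_names;
    for (i = 0; i < nmembers; i++)
    {
        size_t      len = strlen(names[i]) + 1;

        memcpy(dst, names[i], len);
        dst += len;
    }
    return cfg;
}

/* Return the idx'th name by walking past the preceding strings. */
const char *
toy_member(const ToyFlatConfig *cfg, int idx)
{
    const char *p = cfg->member_names;

    assert(idx >= 0 && idx < cfg->nmembers);
    while (idx-- > 0)
        p += strlen(p) + 1;
    return p;
}
```

The trade-off is that lookups walk the strings sequentially, which is acceptable for short standby lists but means code that expects a list structure would need an adapter.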

Any idea?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-14 04:24:34
Message-ID:	CAB7nPqThcdv+CrWyWbFQGYL0GJFZeWVGXs5K9x65WWgbqkJ7YQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 14, 2016 at 11:45 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Yes, this is what I was trying to explain to Fujii-san upthread and I have
> also verified that the same works on Windows.

If you could, it would be nice as well to check that nothing breaks
with VS when using vcregress recoverycheck.
--
Michael

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-14 07:40:56
Message-ID:	CAD21AoA9GWKn73cvu950=RRrnbrKgKrxOcUFPiENyDB7Q=zW4w@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 14, 2016 at 1:11 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Thu, 14 Apr 2016 12:42:06 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote in <CAHGQGwH7F5gWfdCT71Ucix_w+8ipR1Owzv9k4VnA1fcMYyfr6w(at)mail(dot)gmail(dot)com>
>> > Yes, this is what I was trying to explain to Fujii-san upthread and I have
>> > also verified that the same works on Windows.
>>
>> Oh, okay, understood. Thanks for explaining that!
>>
>> > I think one point which we
>> > should try to ensure in this patch is whether it is good to use
>> > TopMemoryContext to allocate the memory in the check or assign function or
>> > should we allocate some temporary context (like we do in load_tzoffsets())
>> > to perform parsing and then delete the same at end.
>>
>> Seems yes if some memories are allocated by palloc and they are not
>> free'd while parsing s_s_names.
>>
>> Here is another comment on the patch.
>>
>> -SyncRepFreeConfig(SyncRepConfigData *config)
>> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself)
>>
>> SyncRepFreeConfig() was extended so that it accepts the second boolean
>> argument. But it's always called with the second argument = false. So,
>> I just wonder why that second argument is required.
>>
>> SyncRepConfigData *config =
>> - (SyncRepConfigData *) palloc(sizeof(SyncRepConfigData));
>> + (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));
>>
>> Why should we use malloc instead of palloc here?
>>
>> *If* we use malloc, its return value must be checked.
>
> Because it should live irrelevant to any memory context, as guc
> values are so. guc.c provides guc_malloc for this purpose, which
> is a malloc having some simple error handling, so having
> walsender_malloc would be reasonable.
>
> I don't think it's good to use TopMemoryContext for syncrep
> parser. syncrep_scanner.l uses palloc. This basically causes a
> memory leak on all postgres processes.
>
> It might be better if the parser works on the current memory
> context and the caller copies the result on the malloc'ed
> memory. But some list-creation functions using palloc.. Changing
> SyncRepConfigData.members to be char** would be messier..

The SyncRepGetSyncStandby logic deeply assumes that the sync standby names
are constructed as a list.
I think that changing this would entail a radical change in SyncRepGetStandby.
Another idea is to prepare some functions that allocate and free
list elements using malloc and free.
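
[Editorial sketch] The malloc-based list helpers suggested above could be as small as the following. This is not PostgreSQL's List API; ToyNameCell, toy_list_append and toy_list_free are hypothetical names, and the list holds only standby names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One list element, allocated with malloc instead of palloc. */
typedef struct ToyNameCell
{
    char       *name;
    struct ToyNameCell *next;
} ToyNameCell;

/* Append a copy of name to the list; returns 1 on success, 0 on OOM. */
int
toy_list_append(ToyNameCell **head, const char *name)
{
    ToyNameCell *cell;
    ToyNameCell **tail;

    assert(head != NULL && name != NULL);
    cell = malloc(sizeof(ToyNameCell));
    if (cell == NULL)
        return 0;
    cell->name = malloc(strlen(name) + 1);
    if (cell->name == NULL)
    {
        free(cell);
        return 0;
    }
    strcpy(cell->name, name);
    cell->next = NULL;

    /* Walk to the end so that insertion order is preserved. */
    for (tail = head; *tail != NULL; tail = &(*tail)->next)
        ;
    *tail = cell;
    return 1;
}

/* Free every cell and its name; returns the number of cells freed. */
int
toy_list_free(ToyNameCell *head)
{
    int         freed = 0;

    while (head != NULL)
    {
        ToyNameCell *next = head->next;

        free(head->name);
        free(head);
        head = next;
        freed++;
    }
    return freed;
}
```

Because the cells never touch a memory context, such a list can stay alive for the lifetime of the process, at the cost of maintaining a second list implementation alongside the palloc-based one.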

Regards,

--
Masahiko Sawada

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	amit(dot)kapila16(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-14 08:25:39
Message-ID:	20160414.172539.34325458.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Thu, 14 Apr 2016 13:24:34 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqThcdv+CrWyWbFQGYL0GJFZeWVGXs5K9x65WWgbqkJ7YQ(at)mail(dot)gmail(dot)com>
> On Thu, Apr 14, 2016 at 11:45 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > Yes, this is what I was trying to explain to Fujii-san upthread and I have
> > also verified that the same works on Windows.
>
> If you could, it would be nice as well to check that nothing breaks
> with VS when using vcregress recoverycheck.

I couldn't run the test because my environment is not set up for TAP
tests. Instead, I noticed that vcregress.pl shows a slightly wrong help
message.

The new message in the following diff matches the regexp
used to check the parameter of vcregress.

======
diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
index 3d14544..08e2acc 100644
--- a/src/tools/msvc/vcregress.pl
+++ b/src/tools/msvc/vcregress.pl
@@ -548,6 +548,6 @@ sub usage
{
print STDERR
"Usage: vcregress.pl ",
-"<check|installcheck|plcheck|contribcheck|isolationcheck|ecpgcheck|upgradecheck> [schedule]\n";
+"<check|installcheck|plcheck|contribcheck|modulescheck|ecpgcheck|isolationcheck|upgradecheck|bincheck|recoverycheck> [schedule]\n";
exit(1);
}
=====

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	amit(dot)kapila16(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-14 08:48:42
Message-ID:	20160414.174842.42247402.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Thu, 14 Apr 2016 17:25:39 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20160414(dot)172539(dot)34325458(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Hello,
>
> At Thu, 14 Apr 2016 13:24:34 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqThcdv+CrWyWbFQGYL0GJFZeWVGXs5K9x65WWgbqkJ7YQ(at)mail(dot)gmail(dot)com>
> > On Thu, Apr 14, 2016 at 11:45 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > Yes, this is what I was trying to explain to Fujii-san upthread and I have
> > > also verified that the same works on Windows.
> >
> > If you could, it would be nice as well to check that nothing breaks
> > with VS when using vcregress recoverycheck.

IPC::Run is not installed with ActivePerl in my environment, and
ActiveState seems to be saying that IPC-Run cannot be compiled
on Windows. ppm doesn't show IPC-Run. Is there any way to run the
TAP tests other than this?

https://code.activestate.com/ppm/IPC-Run/

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	amit(dot)kapila16(at)gmail(dot)com, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, sawada(dot)mshk(at)gmail(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jeff(dot)janes(at)gmail(dot)com, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-14 12:04:29
Message-ID:	CAB7nPqQ3CHDGZqSsNpbw-Tyk0h=cK+zc0i-GKFrTTWk95nR1GQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 14, 2016 at 5:25 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> diff --git a/src/tools/msvc/vcregress.pl b/src/tools/msvc/vcregress.pl
> index 3d14544..08e2acc 100644
> --- a/src/tools/msvc/vcregress.pl
> +++ b/src/tools/msvc/vcregress.pl
> @@ -548,6 +548,6 @@ sub usage
> {
> print STDERR
> "Usage: vcregress.pl ",
> -"<check|installcheck|plcheck|contribcheck|isolationcheck|ecpgcheck|upgradecheck> [schedule]\n";
> +"<check|installcheck|plcheck|contribcheck|modulescheck|ecpgcheck|isolationcheck|upgradecheck|bincheck|recoverycheck> [schedule]\n";
> exit(1);
> }

Right, this is missing modulescheck, bincheck and recoverycheck. All 3
are actually mainly my fault, or perhaps Andrew scored once on
bincheck. Honestly, this is unreadable and it is always tiring to
decipher, so why not change it to something more explicit like the
attached? See for yourself:
$ perl vcregress.pl
Usage: vcregress.pl <mode> [ <schedule> ]

Options for <mode>:
  bincheck       run tests of utilities in src/bin/
  check          deploy instance and run regression tests on it
  contribcheck   run tests of modules in contrib/
  ecpgcheck      run regression tests of ECPG driver
  installcheck   run regression tests on existing instance
  isolationcheck run isolation tests
  modulescheck   run tests of modules in src/test/modules
  plcheck        run tests of PL languages
  recoverycheck  run recovery test suite
  upgradecheck   run tests of pg_upgrade

Options for <schedule>:
  serial         serial mode
  parallel       parallel mode
--
Michael

Attachment	Content-Type	Size
msvc-vcregress-help.patch	text/x-patch	1.1 KB

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	amit(dot)kapila16(at)gmail(dot)com, Masao Fujii <masao(dot)fujii(at)gmail(dot)com>, sawada(dot)mshk(at)gmail(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, jeff(dot)janes(at)gmail(dot)com, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-14 12:05:40
Message-ID:	CAB7nPqSWLyP5ObQz_9Y=kezi0oGeZHaCPn6FT9BYK9tB3HbiVg@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 14, 2016 at 5:48 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Thu, 14 Apr 2016 17:25:39 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20160414(dot)172539(dot)34325458(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>> Hello,
>>
>> At Thu, 14 Apr 2016 13:24:34 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqThcdv+CrWyWbFQGYL0GJFZeWVGXs5K9x65WWgbqkJ7YQ(at)mail(dot)gmail(dot)com>
>> > On Thu, Apr 14, 2016 at 11:45 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> > > Yes, this is what I was trying to explain to Fujii-san upthread and I have
>> > > also verified that the same works on Windows.
>> >
>> > If you could, it would be nice as well to check that nothing breaks
>> > with VS when using vcregress recoverycheck.
>
> IPC::Run is not installed on Active Perl on my environment and
> Active state seems to be saying that IPC-Run cannot be compiled
> on Windows. ppm doesn't show IPC-Run. Is there any means to do
> TAP test other than this way?
>
> https://code.activestate.com/ppm/IPC-Run/

IPC::Run is a mandatory dependency, I am afraid. You could just
download it from CPAN and install it manually in your PERL5LIB path.
That's what I did, and it works just fine.
--
Michael

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-15 03:22:56
Message-ID:	CAA4eK1+Qsw2hLEhrEBvveKC91uZQhDce9i-4dB8VPz87Ciz+OQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Apr 14, 2016 at 1:10 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:
>
> On Thu, Apr 14, 2016 at 1:11 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > At Thu, 14 Apr 2016 12:42:06 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
wrote in <CAHGQGwH7F5gWfdCT71Ucix_w+8ipR1Owzv9k4VnA1fcMYyfr6w(at)mail(dot)gmail(dot)com
>
> >> > Yes, this is what I was trying to explain to Fujii-san upthread and
I have
> >> > also verified that the same works on Windows.
> >>
> >> Oh, okay, understood. Thanks for explaining that!
> >>
> >> > I think one point which we
> >> > should try to ensure in this patch is whether it is good to use
> >> > TopMemoryContext to allocate the memory in the check or assign
function or
> >> > should we allocate some temporary context (like we do in
load_tzoffsets())
> >> > to perform parsing and then delete the same at end.
> >>
> >> Seems yes if some memories are allocated by palloc and they are not
> >> free'd while parsing s_s_names.
> >>
> >> Here are another comment for the patch.
> >>
> >> -SyncRepFreeConfig(SyncRepConfigData *config)
> >> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself)
> >>
> >> SyncRepFreeConfig() was extended so that it accepts the second boolean
> >> argument. But it's always called with the second argument = false. So,
> >> I just wonder why that second argument is required.
> >>
> >> SyncRepConfigData *config =
> >> - (SyncRepConfigData *) palloc(sizeof(SyncRepConfigData));
> >> + (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));
> >>
> >> Why should we use malloc instead of palloc here?
> >>
> >> *If* we use malloc, its return value must be checked.
> >
> > Because it should live irrelevant to any memory context, as guc
> > values are so. guc.c provides guc_malloc for this purpose, which
> > is a malloc having some simple error handling, so having
> > walsender_malloc would be reasonable.
> >
> > I don't think it's good to use TopMemoryContext for syncrep
> > parser. syncrep_scanner.l uses palloc. This basically causes a
> > memory leak on all postgres processes.
> >
> > It might be better if the parser works on the current memory
> > context and the caller copies the result on the malloc'ed
> > memory. But some list-creation functions using palloc..

How about if we do all the parsing stuff in temporary context and then copy
the results using TopMemoryContext? I don't think it will be a leak in
TopMemoryContext, because next time we try to check/assign s_s_names, it
will free the previous result.
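
As a rough, standalone illustration of the parse-then-copy idea above (not
the patch itself): the sketch below uses plain malloc/free instead of
PostgreSQL memory contexts so that it compiles outside the backend.
DemoConfig, current_config, demo_free_config and demo_assign_names are
hypothetical names invented for this example. The scratch buffer stands in
for the temporary context, and the long-lived copy stands in for data kept
in TopMemoryContext.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for SyncRepConfigData; not the real struct. */
typedef struct DemoConfig
{
    int    nmembers;
    char **members;     /* each name is allocated separately */
} DemoConfig;

/* Long-lived result, playing the role of data kept in TopMemoryContext. */
DemoConfig *current_config = NULL;

/* Release a config and everything it points to. NULL is accepted. */
void
demo_free_config(DemoConfig *config)
{
    if (config == NULL)
        return;
    for (int i = 0; i < config->nmembers; i++)
        free(config->members[i]);
    free(config->members);
    free(config);
}

/*
 * Parse a comma-separated list in temporary storage, copy the result into
 * long-lived storage, then free the previously assigned result.
 * Returns 0 on success and -1 on allocation failure; on failure the
 * previous configuration is left untouched.
 */
int
demo_assign_names(const char *value)
{
    char       *scratch;
    char       *tok;
    DemoConfig *newconfig;
    int         count = 0;

    assert(value != NULL);

    /* Temporary workspace: strtok() modifies its input, so parse a copy. */
    scratch = malloc(strlen(value) + 1);
    if (scratch == NULL)
        return -1;
    strcpy(scratch, value);

    newconfig = calloc(1, sizeof(DemoConfig));
    if (newconfig == NULL)
    {
        free(scratch);
        return -1;
    }

    for (tok = strtok(scratch, ","); tok != NULL; tok = strtok(NULL, ","))
    {
        char **grown = realloc(newconfig->members,
                               (count + 1) * sizeof(char *));
        char  *copy;

        if (grown == NULL)
            goto fail;
        newconfig->members = grown;

        /* Copy the token out of the scratch buffer into long-lived memory. */
        copy = malloc(strlen(tok) + 1);
        if (copy == NULL)
            goto fail;
        strcpy(copy, tok);
        newconfig->members[count++] = copy;
        newconfig->nmembers = count;
    }

    free(scratch);                      /* the "temporary context" goes away */
    demo_free_config(current_config);   /* previous result is freed here */
    current_config = newconfig;
    return 0;

fail:
    free(scratch);
    demo_free_config(newconfig);
    return -1;
}
```

Calling demo_assign_names("node_a,node_b") and then
demo_assign_names("node_c") leaves only the second result allocated; the
first is released during the second call, which is why the long-lived
storage does not grow across reassignments.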

>
> Changing
> > SyncRepConfigData.members to be char** would be messier..
>
> SyncRepGetSyncStandby logic assumes deeply that the sync standby names
> are constructed as a list.
> I think that it would entail a radical change in SyncRepGetStandby
> Another idea is to prepare the some functions that allocate/free
> element of list using by malloc, free.
>

Yeah, that could be another way of doing it, but seems like much more work.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	amit(dot)kapila16(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-15 04:45:00
Message-ID:	20160415.134500.134856973.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Thu, 14 Apr 2016 21:05:40 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqSWLyP5ObQz_9Y=kezi0oGeZHaCPn6FT9BYK9tB3HbiVg(at)mail(dot)gmail(dot)com>
> > IPC::Run is not installed on Active Perl on my environment and
> > Active state seems to be saying that IPC-Run cannot be compiled
> > on Windows. ppm doesn't show IPC-Run. Is there any means to do
> > TAP test other than this way?
> >
> > https://code.activestate.com/ppm/IPC-Run/
>
> IPC::Run is a mandatory dependency I am afraid. You could just
> download it from cpan and install it manually in your PERL5LIB path.
> That's what I did, and it proves to work just fine.

Hmm. The first time, I got an error that dmake was not found, but
this time I could install it successfully. Thank you for
prompting me to retry.

I confirmed that fix_sync_rep_update_conf_v4.patch doesn't break
anything in vcregress recoverycheck. I will also be able to
recheck revised versions.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	amit(dot)kapila16(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, jeff(dot)janes(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-15 06:00:05
Message-ID:	20160415.150005.201575622.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Fri, 15 Apr 2016 08:52:56 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in <CAA4eK1+Qsw2hLEhrEBvveKC91uZQhDce9i-4dB8VPz87Ciz+OQ(at)mail(dot)gmail(dot)com>
> On Thu, Apr 14, 2016 at 1:10 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> >
> > On Thu, Apr 14, 2016 at 1:11 PM, Kyotaro HORIGUCHI
> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > > At Thu, 14 Apr 2016 12:42:06 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> wrote in <CAHGQGwH7F5gWfdCT71Ucix_w+8ipR1Owzv9k4VnA1fcMYyfr6w(at)mail(dot)gmail(dot)com
> >
> > >> > Yes, this is what I was trying to explain to Fujii-san upthread and
> I have
> > >> > also verified that the same works on Windows.
> > >>
> > >> Oh, okay, understood. Thanks for explaining that!
> > >>
> > >> > I think one point which we
> > >> > should try to ensure in this patch is whether it is good to use
> > >> > TopMemoryContext to allocate the memory in the check or assign
> function or
> > >> > should we allocate some temporary context (like we do in
> load_tzoffsets())
> > >> > to perform parsing and then delete the same at end.
> > >>
> > >> Seems yes if some memories are allocated by palloc and they are not
> > >> free'd while parsing s_s_names.
> > >>
> > >> Here are another comment for the patch.
> > >>
> > >> -SyncRepFreeConfig(SyncRepConfigData *config)
> > >> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself)
> > >>
> > >> SyncRepFreeConfig() was extended so that it accepts the second boolean
> > >> argument. But it's always called with the second argument = false. So,
> > >> I just wonder why that second argument is required.
> > >>
> > >> SyncRepConfigData *config =
> > >> - (SyncRepConfigData *) palloc(sizeof(SyncRepConfigData));
> > >> + (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));
> > >>
> > >> Why should we use malloc instead of palloc here?
> > >>
> > >> *If* we use malloc, its return value must be checked.
> > >
> > > Because it should live irrelevant to any memory context, as guc
> > > values are so. guc.c provides guc_malloc for this purpose, which
> > > is a malloc having some simple error handling, so having
> > > walsender_malloc would be reasonable.
> > >
> > > I don't think it's good to use TopMemoryContext for syncrep
> > > parser. syncrep_scanner.l uses palloc. This basically causes a
> > > memory leak on all postgres processes.
> > >
> > > It might be better if the parser works on the current memory
> > > context and the caller copies the result on the malloc'ed
> > > memory. But some list-creation functions using palloc..
>
> How about if we do all the parsing stuff in temporary context and then copy
> the results using TopMemoryContext? I don't think it will be a leak in
> TopMemoryContext, because next time we try to check/assign s_s_names, it
> will free the previous result.

I agree with you. A temporary context for the parser seems
reasonable. TopMemoryContext is created very early in main(), so
palloc in it is effectively the same as malloc.

One problem is that guc_set_extra assumes only the top memory
block will be free()'d, not pfree()'d. That makes this quite
ugly.

Maybe we shouldn't use the extra for this purpose.

Thoughts?

> > Changing
> > > SyncRepConfigData.members to be char** would be messier..
> >
> > SyncRepGetSyncStandby logic assumes deeply that the sync standby names
> > are constructed as a list.
> > I think that it would entail a radical change in SyncRepGetStandby
> > Another idea is to prepare the some functions that allocate/free
> > element of list using by malloc, free.
> >
>
> Yeah, that could be another way of doing it, but seems like much more work.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
fix_sync_rep_update_conf_v5.patch	text/x-patch	7.8 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-15 08:36:57
Message-ID:	CAD21AoCOL6BCC+FWNCZH_XPgtWc_otnvShMx6_uAcU7Bwb16Rw@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Apr 15, 2016 at 3:00 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Fri, 15 Apr 2016 08:52:56 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in <CAA4eK1+Qsw2hLEhrEBvveKC91uZQhDce9i-4dB8VPz87Ciz+OQ(at)mail(dot)gmail(dot)com>
>> On Thu, Apr 14, 2016 at 1:10 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
>> wrote:
>> >
>> > On Thu, Apr 14, 2016 at 1:11 PM, Kyotaro HORIGUCHI
>> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > > At Thu, 14 Apr 2016 12:42:06 +0900, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>> wrote in <CAHGQGwH7F5gWfdCT71Ucix_w+8ipR1Owzv9k4VnA1fcMYyfr6w(at)mail(dot)gmail(dot)com
>> >
>> > >> > Yes, this is what I was trying to explain to Fujii-san upthread and
>> I have
>> > >> > also verified that the same works on Windows.
>> > >>
>> > >> Oh, okay, understood. Thanks for explaining that!
>> > >>
>> > >> > I think one point which we
>> > >> > should try to ensure in this patch is whether it is good to use
>> > >> > TopMemoryContext to allocate the memory in the check or assign
>> function or
>> > >> > should we allocate some temporary context (like we do in
>> load_tzoffsets())
>> > >> > to perform parsing and then delete the same at end.
>> > >>
>> > >> Seems yes if some memories are allocated by palloc and they are not
>> > >> free'd while parsing s_s_names.
>> > >>
>> > >> Here are another comment for the patch.
>> > >>
>> > >> -SyncRepFreeConfig(SyncRepConfigData *config)
>> > >> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself)
>> > >>
>> > >> SyncRepFreeConfig() was extended so that it accepts the second boolean
>> > >> argument. But it's always called with the second argument = false. So,
>> > >> I just wonder why that second argument is required.
>> > >>
>> > >> SyncRepConfigData *config =
>> > >> - (SyncRepConfigData *) palloc(sizeof(SyncRepConfigData));
>> > >> + (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));
>> > >>
>> > >> Why should we use malloc instead of palloc here?
>> > >>
>> > >> *If* we use malloc, its return value must be checked.
>> > >
>> > > Because it should live irrelevant to any memory context, as guc
>> > > values are so. guc.c provides guc_malloc for this purpose, which
>> > > is a malloc having some simple error handling, so having
>> > > walsender_malloc would be reasonable.
>> > >
>> > > I don't think it's good to use TopMemoryContext for syncrep
>> > > parser. syncrep_scanner.l uses palloc. This basically causes a
>> > > memory leak on all postgres processes.
>> > >
>> > > It might be better if the parser works on the current memory
>> > > context and the caller copies the result on the malloc'ed
>> > > memory. But some list-creation functions using palloc..
>>
>> How about if we do all the parsing stuff in temporary context and then copy
>> the results using TopMemoryContext? I don't think it will be a leak in
>> TopMemoryContext, because next time we try to check/assign s_s_names, it
>> will free the previous result.
>
> I agree with you. A temporary context for the parser seems
> reasonable. TopMemoryContext is created very early in main() so
> palloc on it is effectively the same with malloc.
> One problem is that only the top memory block is assumed to be
> free()'d, not pfree()'d by guc_set_extra. It makes this quite
> ugly..
>
> Maybe we shouldn't use the extra for this purpose.
>
> Thoughts?
>

How about if check_hook just parses the parameter in
CurrentMemoryContext (i.e., a T_AllocSetContext), and then the
assign_hook copies syncrep_parse_result to TopMemoryContext?
Because syncrep_parse_result is a global variable, both hooks can see it.
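
To make the hand-off between the two hooks concrete, here is a minimal
standalone sketch. It is only an analogy: malloc/free replace the memory
contexts, the "parse" step merely duplicates the string, and
demo_parse_result, demo_active_config, demo_check_hook and demo_assign_hook
are hypothetical names, not the real GUC hook signatures. It also guards
against the assign step running when no parse result is available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Short-lived parse output shared between the two hooks (a global). */
char *demo_parse_result = NULL;

/* Long-lived copy, playing the role of data kept in TopMemoryContext. */
char *demo_active_config = NULL;

/*
 * Check step: validate and "parse" the new value into the shared global.
 * Returning false means the value is rejected and nothing is assigned.
 */
bool
demo_check_hook(const char *newval)
{
    /* Drop any stale result from an earlier check that was never assigned. */
    free(demo_parse_result);
    demo_parse_result = NULL;

    if (newval == NULL)
        return false;

    demo_parse_result = malloc(strlen(newval) + 1);
    if (demo_parse_result == NULL)
        return false;
    strcpy(demo_parse_result, newval);
    return true;
}

/*
 * Assign step: copy the shared parse result into long-lived storage and
 * release the previously assigned copy.
 */
void
demo_assign_hook(void)
{
    char *copy;

    /* Guard: assign may run without a preceding successful check. */
    if (demo_parse_result == NULL)
        return;

    copy = malloc(strlen(demo_parse_result) + 1);
    if (copy == NULL)
        return;                 /* keep the previous configuration */
    strcpy(copy, demo_parse_result);

    free(demo_active_config);
    demo_active_config = copy;
    assert(demo_active_config != NULL);
}
```

In this sketch, calling demo_assign_hook() before any successful
demo_check_hook() is a no-op rather than a crash, and each later
check/assign pair replaces the long-lived copy.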

Here are some comments.

-SyncRepUpdateConfig(void)
+SyncRepFreeConfig(SyncRepConfigData *config, bool itself, MemoryContext cxt)

Sorry, my bad. The itself parameter is no longer needed because
SyncRepFreeConfig is called by only one function.

-void
-SyncRepFreeConfig(SyncRepConfigData *config)
+SyncRepConfigData *
+SyncRepCopyConfig(SyncRepConfigData *oldconfig, MemoryContext targetcxt)

I'm not sure targetcxt argument is necessary.

Regards,

--
Masahiko Sawada

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-16 07:20:30
Message-ID:	CAA4eK1LzC=6-EEVuCZhoYnKDHSqKUptV6F+5SavSR5P6jHdfXw@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Apr 15, 2016 at 11:30 AM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> At Fri, 15 Apr 2016 08:52:56 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote :
> >
> > How about if we do all the parsing stuff in temporary context and then
copy
> > the results using TopMemoryContext? I don't think it will be a leak in
> > TopMemoryContext, because next time we try to check/assign s_s_names, it
> > will free the previous result.
>
> I agree with you. A temporary context for the parser seems
> reasonable. TopMemoryContext is created very early in main() so
> palloc on it is effectively the same with malloc.
>
> One problem is that only the top memory block is assumed to be
> free()'d, not pfree()'d by guc_set_extra. It makes this quite
> ugly..
>

+ newconfig = (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));
Is there a reason to use malloc here, can't we use palloc directly? Also
for both the functions SyncRepCopyConfig() and SyncRepFreeConfig(), if we
directly use TopMemoryContext inside the function (if required) rather than
taking it as argument, then it will simplify the code a lot.

+SyncRepFreeConfig(SyncRepConfigData *config, bool itself, MemoryContext cxt)

Do we really need 'bool itself' parameter in above function?

+ if (cxt)
+ oldcxt = MemoryContextSwitchTo(cxt);
+ list_free_deep(config->members);
+ if(oldcxt)
+ MemoryContextSwitchTo(oldcxt);
Why do you need MemoryContextSwitchTo for freeing members?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	amit(dot)kapila16(at)gmail(dot)com
Cc:	sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, jeff(dot)janes(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-18 04:24:08
Message-ID:	20160418.132408.196641495.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Sat, 16 Apr 2016 12:50:30 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in <CAA4eK1LzC=6-EEVuCZhoYnKDHSqKUptV6F+5SavSR5P6jHdfXw(at)mail(dot)gmail(dot)com>
> On Fri, Apr 15, 2016 at 11:30 AM, Kyotaro HORIGUCHI <
> horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >
> > At Fri, 15 Apr 2016 08:52:56 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote :
> > >
> > > How about if we do all the parsing stuff in temporary context and then
> copy
> > > the results using TopMemoryContext? I don't think it will be a leak in
> > > TopMemoryContext, because next time we try to check/assign s_s_names, it
> > > will free the previous result.
> >
> > I agree with you. A temporary context for the parser seems
> > reasonable. TopMemoryContext is created very early in main() so
> > palloc on it is effectively the same with malloc.
> >
> > One problem is that only the top memory block is assumed to be
> > free()'d, not pfree()'d by guc_set_extra. It makes this quite
> > ugly..
> >
>
> + newconfig = (SyncRepConfigData *) malloc(sizeof(SyncRepConfigData));
> Is there a reason to use malloc here, can't we use palloc directly?

The reason is that the memory block is to be released using free() in
guc_extra_field (not guc_set_extra). Even if we allocate and
deallocate it using palloc/pfree, the 'extra' pointer to the
block in gconf cannot be set to NULL there, so guc_extra_field
tries to free it again using free(), and then bang.
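
For illustration only, one layout that is compatible with "released by a
single free()" is a flat block holding both the header and the member
names. This is a sketch under that assumption, not what the patch under
review does; FlatConfig, flat_config_build and flat_config_member are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat layout: header followed by packed member names. */
typedef struct FlatConfig
{
    size_t total_size;      /* size of the whole block in bytes */
    int    nmembers;        /* number of names stored below */
    char   member_names[];  /* nmembers consecutive NUL-terminated strings */
} FlatConfig;

/*
 * Build the whole configuration in one malloc'ed block, so that a single
 * free() by the owner releases everything. Returns NULL on allocation
 * failure.
 */
FlatConfig *
flat_config_build(const char *const *names, int n)
{
    size_t      names_len = 0;
    size_t      total;
    FlatConfig *config;
    char       *dst;

    assert(n >= 0);

    for (int i = 0; i < n; i++)
        names_len += strlen(names[i]) + 1;

    total = sizeof(FlatConfig) + names_len;
    config = malloc(total);
    if (config == NULL)
        return NULL;

    config->total_size = total;
    config->nmembers = n;

    /* Pack the names back to back, each with its terminating NUL. */
    dst = config->member_names;
    for (int i = 0; i < n; i++)
    {
        size_t len = strlen(names[i]) + 1;

        memcpy(dst, names[i], len);
        dst += len;
    }
    return config;
}

/* Return the index'th name, or NULL if index is out of range. */
const char *
flat_config_member(const FlatConfig *config, int index)
{
    const char *p = config->member_names;

    if (index < 0 || index >= config->nmembers)
        return NULL;
    for (int i = 0; i < index; i++)
        p += strlen(p) + 1;
    return p;
}
```

The trade-off in such a layout is that members are found by walking the
packed strings rather than by following list pointers, so code that
expects a list would need to change.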

> Also
> for both the functions SyncRepCopyConfig() and SyncRepFreeConfig(), if we
> directly use TopMemoryContext inside the function (if required) rather than
> taking it as argument, then it will simplify the code a lot.

Either is fine. I added the parameter to emphasize where the
memory block is placed (neither the current memory context nor
the bare heap), rather than for any practical reason.

> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself, MemoryContext
> cxt)
>
> Do we really need 'bool itself' parameter in above function?
>
> + if (cxt)
>
> + oldcxt = MemoryContextSwitchTo(cxt);
>
> + list_free_deep(config->members);
>
> +
>
> + if(oldcxt)
>
> + MemoryContextSwitchTo(oldcxt);
> Why do you need MemoryContextSwitchTo for freeing members?

Ah, sorry. It's just a slip of my fingers.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	amit(dot)kapila16(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, jeff(dot)janes(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-18 05:15:21
Message-ID:	20160418.141521.96343354.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Fri, 15 Apr 2016 17:36:57 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCOL6BCC+FWNCZH_XPgtWc_otnvShMx6_uAcU7Bwb16Rw(at)mail(dot)gmail(dot)com>
> >> How about if we do all the parsing stuff in temporary context and then copy
> >> the results using TopMemoryContext? I don't think it will be a leak in
> >> TopMemoryContext, because next time we try to check/assign s_s_names, it
> >> will free the previous result.
> >
> > I agree with you. A temporary context for the parser seems
> > reasonable. TopMemoryContext is created very early in main() so
> > palloc on it is effectively the same with malloc.
> > One problem is that only the top memory block is assumed to be
> > free()'d, not pfree()'d by guc_set_extra. It makes this quite
> > ugly..
> >
> > Maybe we shouldn't use the extra for this purpose.
> >
> > Thoughts?
> >
>
> How about if check_hook just parses parameter in
> CurrentMemoryContext(i.g., T_AllocSetContext), and then the
> assign_hook copies syncrep_parse_result to TopMemoryContext.
> Because syncrep_parse_result is a global variable, these hooks can see it.

Hmm. Somewhat uneasy but should work. The attached patch does it.

> Here are some comments.
>
> -SyncRepUpdateConfig(void)
> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself, MemoryContext cxt)
>
> Sorry, it's my bad. itself variables is no longer needed because
> SyncRepFreeConfig is called by only one function.
>
> -void
> -SyncRepFreeConfig(SyncRepConfigData *config)
> +SyncRepConfigData *
> +SyncRepCopyConfig(SyncRepConfigData *oldconfig, MemoryContext targetcxt)
>
> I'm not sure targetcxt argument is necessary.

Yes, these are just for signalling, so removing them does no harm.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
fix_sync_rep_update_conf_v6.patch	text/x-patch	6.6 KB

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-20 02:51:09
Message-ID:	CAD21AoC5rrWSk-V79xjVfYr2UqQYrrCKsXkSxZrN9p5YAaeKJA@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Apr 18, 2016 at 2:15 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Fri, 15 Apr 2016 17:36:57 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCOL6BCC+FWNCZH_XPgtWc_otnvShMx6_uAcU7Bwb16Rw(at)mail(dot)gmail(dot)com>
>> >> How about if we do all the parsing stuff in temporary context and then copy
>> >> the results using TopMemoryContext? I don't think it will be a leak in
>> >> TopMemoryContext, because next time we try to check/assign s_s_names, it
>> >> will free the previous result.
>> >
>> > I agree with you. A temporary context for the parser seems
>> > reasonable. TopMemoryContext is created very early in main() so
>> > palloc on it is effectively the same with malloc.
>> > One problem is that only the top memory block is assumed to be
>> > free()'d, not pfree()'d by guc_set_extra. It makes this quite
>> > ugly..
>> >
>> > Maybe we shouldn't use the extra for this purpose.
>> >
>> > Thoughts?
>> >
>>
>> How about if check_hook just parses parameter in
>> CurrentMemoryContext(i.g., T_AllocSetContext), and then the
>> assign_hook copies syncrep_parse_result to TopMemoryContext.
>> Because syncrep_parse_result is a global variable, these hooks can see it.
>
> Hmm. Somewhat uneasy but should work. The attached patch does it.
>
>> Here are some comments.
>>
>> -SyncRepUpdateConfig(void)
>> +SyncRepFreeConfig(SyncRepConfigData *config, bool itself, MemoryContext cxt)
>>
>> Sorry, it's my bad. itself variables is no longer needed because
>> SyncRepFreeConfig is called by only one function.
>>
>> -void
>> -SyncRepFreeConfig(SyncRepConfigData *config)
>> +SyncRepConfigData *
>> +SyncRepCopyConfig(SyncRepConfigData *oldconfig, MemoryContext targetcxt)
>>
>> I'm not sure targetcxt argument is necessary.
>
> Yes, these are just for signalling so removal doesn't harm.
>

Thank you for updating the patch.

Here are some comments.

+ Assert(syncrep_parse_result == NULL);
+

Why do we need Assert at this point?
It's possible that syncrep_parse_result is not NULL after setting
s_s_names by ALTER SYSTEM.

+ /*
+ * this memory block will be freed as a part of the memory contxt for
+ * config file processing.
+ */

s/contxt/context/

Regards,

--
Masahiko Sawada

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	sawada(dot)mshk(at)gmail(dot)com
Cc:	amit(dot)kapila16(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, jeff(dot)janes(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-20 07:16:37
Message-ID:	20160420.161637.109686478.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Wed, 20 Apr 2016 11:51:09 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoC5rrWSk-V79xjVfYr2UqQYrrCKsXkSxZrN9p5YAaeKJA(at)mail(dot)gmail(dot)com>
> On Mon, Apr 18, 2016 at 2:15 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > At Fri, 15 Apr 2016 17:36:57 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoCOL6BCC+FWNCZH_XPgtWc_otnvShMx6_uAcU7Bwb16Rw(at)mail(dot)gmail(dot)com>
> >> How about if check_hook just parses parameter in
> >> CurrentMemoryContext(i.g., T_AllocSetContext), and then the
> >> assign_hook copies syncrep_parse_result to TopMemoryContext.
> >> Because syncrep_parse_result is a global variable, these hooks can see it.
> >
> > Hmm. Somewhat uneasy but should work. The attached patch does it.
..
> Thank you for updating the patch.
>
> Here are some comments.
>
> + Assert(syncrep_parse_result == NULL);
> +
>
> Why do we need Assert at this point?
> It's possible that syncrep_parse_result is not NULL after setting
> s_s_names by ALTER SYSTEM.

Thank you for pointing it out. It is just a leftover from an
assumption that no longer holds.

> + /*
> + * this memory block will be freed as a part of the
> memory contxt for
> + * config file processing.
> + */
>
> s/contxt/context/

Thanks. I removed the whole comment and the corresponding code
since it's meaningless.

assign_s_s_names causes a SEGV when it is called without
check_s_s_names having been called first. I think that cannot
happen for this variable because it is unresettable amid a
session. I am very uneasy about it, but I don't see a proper means
to reset syncrep_parse_result. A MemoryContext deletion hook would
work, but it seems to be overkill for this single use.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
fix_sync_rep_update_conf_v7.patch	text/x-patch	7.1 KB

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-23 10:44:59
Message-ID:	CAA4eK1LBA-xvns-c8YbaOueG95d+VJ1PFy-He3n-OOCF5ReZDw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 20, 2016 at 12:46 PM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
>
> assign_s_s_names causes SEGV when it is called without calling
> check_s_s_names. I think that's not the case for this varialbe
> because it is unresettable amid a session. It is very uneasy for
> me but I don't see a proper means to reset
> syncrep_parse_result.
>

Is it because syncrep_parse_result is not freed after creating a copy of it
in assign_synchronous_standby_names()? If so, then I think we need to
call SyncRepFreeConfig(syncrep_parse_result); in
assign_synchronous_standby_names at the place below:

+ /* Copy the parsed config into TopMemoryContext if exists */
+ if (syncrep_parse_result)
+ SyncRepConfig = SyncRepCopyConfig(syncrep_parse_result);

Could you please explain how to trigger the scenario where you have seen
SEGV?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-23 11:50:22
Message-ID:	CAB7nPqTP8CjMykaGrWUKDpYytqFDDAyUPuVtZH92GT-LHG1+fA@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Apr 23, 2016 at 7:44 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Apr 20, 2016 at 12:46 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>
>>
>> assign_s_s_names causes SEGV when it is called without calling
>> check_s_s_names. I think that's not the case for this varialbe
>> because it is unresettable amid a session. It is very uneasy for
>> me but I don't see a proper means to reset
>> syncrep_parse_result.
>>
>
> Is it because syncrep_parse_result is not freed after creating a copy of it
> in assign_synchronous_standby_names()? If so, then I think we need to
> call SyncRepFreeConfig(syncrep_parse_result); in
> assign_synchronous_standby_names at the place below:
>
> + /* Copy the parsed config into TopMemoryContext if exists */
> + if (syncrep_parse_result)
> +     SyncRepConfig = SyncRepCopyConfig(syncrep_parse_result);
>
> Could you please explain how to trigger the scenario where you have seen
> SEGV?

Seeing this discussion moving on, I am wondering if we should not
discuss those improvements for 9.7. We are getting close to beta 1,
and this is clearly not a bug, and it's not like HEAD is broken. So I
think that we should not risk destabilizing the code at this stage.
--
Michael

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-23 12:11:07
Message-ID:	CAA4eK1J3iJ90USJC0Vfmt-TLKpmkVQ1PVWdcTAFVdhbR8bfaBw@mail.gmail.com
Lists:	pgsql-hackers

On Sat, Apr 23, 2016 at 5:20 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>
> On Sat, Apr 23, 2016 at 7:44 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Wed, Apr 20, 2016 at 12:46 PM, Kyotaro HORIGUCHI
> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >>
> >>
> >> assign_s_s_names causes SEGV when it is called without calling
> >> check_s_s_names. I think that's not the case for this variable
> >> because it is unresettable amid a session. It is very uneasy for
> >> me but I don't see a proper means to reset
> >> syncrep_parse_result.
> >>
> >
> > Is it because syncrep_parse_result is not freed after creating a copy of it
> > in assign_synchronous_standby_names()? If so, then I think we need to
> > call SyncRepFreeConfig(syncrep_parse_result); in
> > assign_synchronous_standby_names at the place below:
> >
> > + /* Copy the parsed config into TopMemoryContext if exists */
> > + if (syncrep_parse_result)
> > +     SyncRepConfig = SyncRepCopyConfig(syncrep_parse_result);
> >
> > Could you please explain how to trigger the scenario where you have seen
> > SEGV?
>
> Seeing this discussion moving on, I am wondering if we should not
> discuss those improvements for 9.7.
>

The main point of this improvement is that the handling of the s_s_names
GUC differs from what we do for other, somewhat similar GUCs, which
causes inefficiency in a non-hot (rarely used) code path. So we can
push this improvement to 9.7, but OTOH we can also treat it as a
non-beta-blocker issue and see if we can make this code path better in the
meantime.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-23 14:12:03
Message-ID:	476.1461420723@sss.pgh.pa.us
Lists:	pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> The main point for this improvement is that the handling for guc s_s_names
> is not similar to what we do for other somewhat similar guc's and which
> causes inefficiency in a non-hot code path (less used code).

This is not about efficiency, this is about correctness. The proposed
v7 patch is flat out not acceptable, not now and not for 9.7 either,
because it introduces a GUC assign hook that can easily fail (eg, through
out-of-memory for the copy step). Assign hook functions need to be
incapable of failure. I do not see any good reason why this one cannot
satisfy that requirement, either. It just needs to make use of the
"extra" mechanism to pass back an already-suitably-long-lived result from
check_synchronous_standby_names. See check_timezone_abbreviations/
assign_timezone_abbreviations for a model to follow. You are going to
need to find a way to package the parse result into a single malloc'd
blob, though, because that's as much as guc.c can keep track of for an
"extra" value.
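
A minimal standalone sketch of the check/assign split described above. It
does not use PostgreSQL's headers, and DemoConfig, demo_check, demo_assign
and active_config are hypothetical names chosen for illustration. The point
it tries to show is only that everything that can fail (validation and
allocation) lives in the check step, which hands back a single malloc'd
blob, while the assign step is reduced to a pointer store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed setting: one malloc'd blob with no inner pointers. */
typedef struct DemoConfig
{
    size_t  size;       /* total size of this blob in bytes */
    int     num_sync;   /* number of synchronous standbys requested */
    char    names[];    /* copy of the names string, NUL-terminated */
} DemoConfig;

/* Currently active configuration (hypothetical global). */
DemoConfig *active_config = NULL;

/*
 * Check step: all work that can fail happens here.  On success, *extra
 * receives a single malloc'd blob that the caller now owns.
 */
bool
demo_check(int num_sync, const char *names, void **extra)
{
    size_t      len;
    DemoConfig *conf;

    assert(extra != NULL);

    if (num_sync < 0 || names == NULL)
        return false;           /* reject invalid input */

    len = strlen(names) + 1;
    conf = malloc(sizeof(DemoConfig) + len);
    if (conf == NULL)
        return false;           /* out of memory is reported here, not in assign */

    conf->size = sizeof(DemoConfig) + len;
    conf->num_sync = num_sync;
    memcpy(conf->names, names, len);

    *extra = conf;
    return true;
}

/* Assign step: only a pointer store, so there is nothing that can fail. */
void
demo_assign(void *extra)
{
    active_config = (DemoConfig *) extra;
}
```

Because the blob contains no pointers to other allocations, whoever tracks
the "extra" value can release it with a single free().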

regards, tom lane

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	amit(dot)kapila16(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-26 02:02:25
Message-ID:	20160426.110225.35506931.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

At Sat, 23 Apr 2016 10:12:03 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <476(dot)1461420723(at)sss(dot)pgh(dot)pa(dot)us>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > The main point for this improvement is that the handling for guc s_s_names
> > is not similar to what we do for other somewhat similar guc's and which
> > causes inefficiency in a non-hot code path (less used code).
>
> This is not about efficiency, this is about correctness. The proposed
> v7 patch is flat out not acceptable, not now and not for 9.7 either,
> because it introduces a GUC assign hook that can easily fail (eg, through
> out-of-memory for the copy step). Assign hook functions need to be
> incapable of failure. I do not see any good reason why this one cannot
> satisfy that requirement, either. It just needs to make use of the
> "extra" mechanism to pass back an already-suitably-long-lived result from
> check_synchronous_standby_names. See check_timezone_abbreviations/
> assign_timezone_abbreviations for a model to follow.

I had already looked at that before v7 and had the same idea in
mind, but packing into a blob requires something other than a List
to hold the name list (it should just be an array), which in turn
requires many changes where the list is accessed. But the result
is hopeless, as you mentioned :(

> You are going to
> need to find a way to package the parse result into a single malloc'd
> blob, though, because that's as much as guc.c can keep track of for an
> "extra" value.

Ok, I'll post v8 with the blob solution soon.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	amit(dot)kapila16(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-26 03:45:46
Message-ID:	20160426.124546.237896223.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello, attached is the new version v8.

At Tue, 26 Apr 2016 11:02:25 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20160426(dot)110225(dot)35506931(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> At Sat, 23 Apr 2016 10:12:03 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <476(dot)1461420723(at)sss(dot)pgh(dot)pa(dot)us>
> > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > > The main point for this improvement is that the handling for guc s_s_names
> > > is not similar to what we do for other somewhat similar guc's and which
> > > causes inefficiency in a non-hot code path (less used code).
> >
> > This is not about efficiency, this is about correctness. The proposed
> > v7 patch is flat out not acceptable, not now and not for 9.7 either,
> > because it introduces a GUC assign hook that can easily fail (eg, through
> > out-of-memory for the copy step). Assign hook functions need to be
> > incapable of failure. I do not see any good reason why this one cannot
> > satisfy that requirement, either. It just needs to make use of the
> > "extra" mechanism to pass back an already-suitably-long-lived result from
> > check_synchronous_standby_names. See check_timezone_abbreviations/
> > assign_timezone_abbreviations for a model to follow.
>
> I had already looked at that before v7 and had the same idea in
> mind, but packing into a blob requires something other than a List
> to hold the name list (it should just be an array), which in turn
> requires many changes where the list is accessed. But the result
> is hopeless, as you mentioned :(
>
> > You are going to
> > need to find a way to package the parse result into a single malloc'd
> > blob, though, because that's as much as guc.c can keep track of for an
> > "extra" value.
>
> Ok, I'll post v8 with the blob solution soon.

Hmm. It was way easier than I thought. The attached v8 patch does the following:

- Changes SyncRepConfigData from a struct using a linked list to a
blob. Since the former struct is useful in parsing, it is still
used and converted into the latter form in check_s_s_names.

- Makes assign_s_s_names do nothing other than assign
SyncRepConfig.

- Changes SyncRepGetSyncStandbys to read the latter form of
configuration.

- Removes SyncRepFreeConfig, since it is no longer needed.
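
The list-to-blob conversion in the items above can be sketched roughly as
follows. This is a standalone illustration, not the code in the attached
patch: NameNode, PackedConfig, pack_config and find_member are hypothetical
names, and the layout (a small header followed by the names stored back to
back as NUL-terminated strings) is an assumption about what such a packed
form could look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Parse-time form: a simple linked list of names (hypothetical). */
typedef struct NameNode
{
    const char      *name;
    struct NameNode *next;
} NameNode;

/* Packed form: header plus names stored back to back. */
typedef struct PackedConfig
{
    size_t  config_size;    /* total bytes, header included */
    int     num_sync;
    int     nmembers;
    char    member_names[]; /* nmembers consecutive NUL-terminated strings */
} PackedConfig;

/* Convert the list into one malloc'd blob; returns NULL on allocation failure. */
PackedConfig *
pack_config(int num_sync, const NameNode *members)
{
    size_t          size = offsetof(PackedConfig, member_names);
    int             n = 0;
    const NameNode *node;
    PackedConfig   *conf;
    char           *dst;

    /* First pass: compute the total size needed. */
    for (node = members; node != NULL; node = node->next)
    {
        size += strlen(node->name) + 1;
        n++;
    }

    conf = malloc(size);
    if (conf == NULL)
        return NULL;

    conf->config_size = size;
    conf->num_sync = num_sync;
    conf->nmembers = n;

    /* Second pass: copy each name, terminator included, into the blob. */
    dst = conf->member_names;
    for (node = members; node != NULL; node = node->next)
    {
        size_t len = strlen(node->name) + 1;

        memcpy(dst, node->name, len);
        dst += len;
    }
    assert((size_t) (dst - (char *) conf) == size);
    return conf;
}

/* Return the 0-based position of name in the packed list, or -1 if absent. */
int
find_member(const PackedConfig *conf, const char *name)
{
    const char *p = conf->member_names;
    int         i;

    for (i = 0; i < conf->nmembers; i++)
    {
        if (strcmp(p, name) == 0)
            return i;
        p += strlen(p) + 1;     /* step over this name and its terminator */
    }
    return -1;
}
```

A reader of the packed form walks the names with strlen() instead of
following list pointers, which is the kind of change the third item refers
to.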

It passes both make check and recovery/make check.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
fix_sync_rep_update_conf_v8.patch	text/x-patch	8.8 KB

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-26 04:27:50
Message-ID:	CAA4eK1KGVrQTueP2Rijjg_FNQ_TU3n5rt8-X5a0LaEzUQ-+i-Q@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 26, 2016 at 9:15 AM, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Hello, attached is the new version v8.
>
> At Tue, 26 Apr 2016 11:02:25 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20160426(dot)110225(dot)35506931(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > At Sat, 23 Apr 2016 10:12:03 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <476(dot)1461420723(at)sss(dot)pgh(dot)pa(dot)us>
> > > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > > > The main point for this improvement is that the handling for guc s_s_names
> > > > is not similar to what we do for other somewhat similar guc's and which
> > > > causes inefficiency in a non-hot code path (less used code).
> > >
> > > This is not about efficiency, this is about correctness. The proposed
> > > v7 patch is flat out not acceptable, not now and not for 9.7 either,
> > > because it introduces a GUC assign hook that can easily fail (eg, through
> > > out-of-memory for the copy step). Assign hook functions need to be
> > > incapable of failure.

It seems to me that a similar problem may exist in
assign_pgstat_temp_directory(), as it can also lead to an "out of memory"
error. However, in general I understand your concern and I think we should
avoid any such failure in assign functions.

> > > I do not see any good reason why this one cannot
> > > satisfy that requirement, either. It just needs to make use of the
> > > "extra" mechanism to pass back an already-suitably-long-lived result from
> > > check_synchronous_standby_names. See check_timezone_abbreviations/
> > > assign_timezone_abbreviations for a model to follow.
> >
> > I had already looked at that before v7 and had the same idea in
> > mind, but packing into a blob requires something other than a List
> > to hold the name list (it should just be an array), which in turn
> > requires many changes where the list is accessed. But the result
> > is hopeless, as you mentioned :(
> >
> > > You are going to
> > > need to find a way to package the parse result into a single malloc'd
> > > blob, though, because that's as much as guc.c can keep track of for an
> > > "extra" value.
> >
> > Ok, I'll post v8 with the blob solution soon.
>
> Hmm. It was way easier than I thought. The attached v8 patch does the following:
>
> - Changes SyncRepConfigData from a struct using a linked list to a
> blob. Since the former struct is useful in parsing, it is still
> used and converted into the latter form in check_s_s_names.
>
> - Makes assign_s_s_names do nothing other than assign
> SyncRepConfig.
>
> - Changes SyncRepGetSyncStandbys to read the latter form of
> configuration.
>
> - Removes SyncRepFreeConfig, since it is no longer needed.
>
>
+ /* Convert SyncRepConfig into the packed struct fit to guc extra */
+ pconf = (SyncRepConfigData *)
+     malloc(SizeOfSyncRepConfig(
+         list_length(syncrep_parse_result->members)));

I think there should be a check for malloc failure in above code.

+ /* No further need for syncrep_parse_result */
+ syncrep_parse_result = NULL;

Isn't this a memory leak? Shouldn't we free the corresponding
memory as well?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	amit(dot)kapila16(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, michael(dot)paquier(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-27 01:14:34
Message-ID:	20160427.101434.186007757.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Tue, 26 Apr 2016 09:57:50 +0530, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote in <CAA4eK1KGVrQTueP2Rijjg_FNQ_TU3n5rt8-X5a0LaEzUQ-+i-Q(at)mail(dot)gmail(dot)com>
> > > > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> > > > > The main point for this improvement is that the handling for guc s_s_names
> > > > > is not similar to what we do for other somewhat similar guc's and which
> > > > > causes inefficiency in a non-hot code path (less used code).
> > > >
> > > > This is not about efficiency, this is about correctness. The proposed
> > > > v7 patch is flat out not acceptable, not now and not for 9.7 either,
> > > > because it introduces a GUC assign hook that can easily fail (eg, through
> > > > out-of-memory for the copy step). Assign hook functions need to be
> > > > incapable of failure.
>
>
> It seems to me that a similar problem may exist in
> assign_pgstat_temp_directory(), as it can also lead to an "out of memory"
> error. However, in general I understand your concern and I think we should
> avoid any such failure in assign functions.

I noticed the missing malloc error handling, then searched for
the callers of guc_malloc just now and found the same thing.
This should be addressed as a separate issue.

> > > > You are going to
> > > > need to find a way to package the parse result into a single malloc'd
> > > > blob, though, because that's as much as guc.c can keep track of for an
> > > > "extra" value.
> > >
> > > Ok, I'll post v8 with the blob solution soon.
> >
> > Hmm. It was way easier than I thought. The attached v8 patch does,
...
> + /* Convert SyncRepConfig into the packed struct fit to guc extra */
> + pconf = (SyncRepConfigData *)
> +     malloc(SizeOfSyncRepConfig(
> +         list_length(syncrep_parse_result->members)));
>
> I think there should be a check for malloc failure in above code.

Yes, I'm ashamed to have forgotten what I mentioned just
before. I added the same handling as guc_malloc. The error level is
ERROR, since parsing of GUC files should continue on parse errors
(see also check_log_destination).
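
For readers following along, the kind of check being discussed might look
like the standalone sketch below. It is not the patch's code and does not
reproduce guc_malloc itself: checked_malloc and check_value are hypothetical
names, and fprintf to stderr stands in for the server's error reporting. It
only shows the allocation failure being detected in the check step so that
the new value can be rejected there.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical checked allocator: report the failure and hand NULL back
 * so that the caller can reject the new value.
 */
void *
checked_malloc(size_t size)
{
    void *data;

    if (size == 0)
        size = 1;       /* malloc(0) may legally return NULL; avoid that ambiguity */

    data = malloc(size);
    if (data == NULL)
        fprintf(stderr, "out of memory\n");
    return data;
}

/* Hypothetical check step: returns 0 (reject) if the allocation fails. */
int
check_value(size_t nbytes, void **extra)
{
    void *blob;

    assert(extra != NULL);

    blob = checked_malloc(nbytes);
    if (blob == NULL)
        return 0;

    *extra = blob;
    return 1;
}
```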

> + /* No further need for syncrep_parse_result */
> + syncrep_parse_result = NULL;
>
> Isn't this a memory leak? Shouldn't we free the corresponding
> memory as well?

It is palloc'ed in the current memory context, which AFAICS would be
'config file processing', or 'PortalHeapMemory' in the ALTER
SYSTEM case. Both of them are rather short-lived. I don't think
leaving the memory allocated is a problem in either case, and there's
no point in freeing only it among the allocations (if any) made in the
code generated by bison and flex... I suppose.

I just added a comment in the v9.

| * No further need for syncrep_parse_result. The memory blocks are
| * released along with the deletion of the current context.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
fix_sync_rep_update_conf_v9.patch	text/x-patch	0 bytes

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	amit(dot)kapila16(at)gmail(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, michael(dot)paquier(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-27 03:58:35
Message-ID:	CAM103DtAHXJr=nNriSKwoQ1HC4BAnf6qyCXgxSxD07F2k6Cj2Q@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 27, 2016 at 10:14 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I just added a comment in the v9.

Sorry, I attached an empty patch. Here is another one, this time
with content.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
fix_sync_rep_update_conf_v9.patch	application/octet-stream	9.0 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	amit(dot)kapila16(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-27 15:10:45
Message-ID:	27732.1461769845@sss.pgh.pa.us
Lists:	pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> Sorry, I attached an empty patch. Here is another one, this time
> with content.

I started to review this, and in passing came across this gem in
syncrep_scanner.l:

/*
 * flex emits a yy_fatal_error() function that it calls in response to
 * critical errors like malloc failure, file I/O errors, and detection of
 * internal inconsistency. That function prints a message and calls exit().
 * Mutate it to instead call ereport(FATAL), which terminates this process.
 *
 * The process that causes this fatal error should be terminated.
 * Otherwise it has to abandon the new setting value of
 * synchronous_standby_names and keep running with the previous one
 * while the other processes switch to the new one.
 * This inconsistency of the setting that each process is based on
 * can cause a serious problem. Though it's basically not good idea to
 * use FATAL here because it can take down the postmaster,
 * we should do that in order to avoid such an inconsistency.
 */
#undef fprintf
#define fprintf(file, fmt, msg) syncrep_flex_fatal(fmt, msg)

static void
syncrep_flex_fatal(const char *fmt, const char *msg)
{
	ereport(FATAL, (errmsg_internal("%s", msg)));
}

This is the faultiest reasoning possible. There are a hundred reasons why
a process might fail to absorb a GUC setting, and causing just one such
code path to FATAL out is not going to improve system stability one bit.

If you think it is absolutely imperative that all processes in the system
have identical synchronous_standby_names settings, then we need to make
it be PGC_POSTMASTER, not indulge in half-baked non-solutions like this.
But I'd like to know why that is so essential. It looks to me like what
matters is only whether each individual walsender thinks its client is
a sync standby, and so inconsistent settings between different walsenders
don't really matter. Which is a good thing, because if it's to remain
SIGHUP, you can't promise that they'll all absorb a new value at the same
instant anyway.

In short, I don't see any good reason not to make this be a plain ERROR
like it is in every other scanner in the backend.
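
To make the ERROR-versus-FATAL distinction concrete, here is a small
standalone sketch. It is only an analogy: setjmp/longjmp stands in for the
backend's error recovery, which is not reproduced here, and scanner_error,
parse_setting, apply_setting and current_setting are hypothetical names. A
recoverable error abandons the new value and leaves the process running
with the previous one, instead of terminating the process.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

jmp_buf error_env;                      /* where a recoverable error unwinds to */
char    current_setting[64] = "old";    /* value currently in effect */

/* Hypothetical scanner failure hook: unwind instead of terminating the process. */
static void
scanner_error(void)
{
    longjmp(error_env, 1);
}

/* Hypothetical parser: fails on an empty string, otherwise accepts. */
static void
parse_setting(const char *value)
{
    if (value[0] == '\0')
        scanner_error();
}

/* Try to apply a new value; on error keep the previous one.  Returns 1 on success. */
int
apply_setting(const char *value)
{
    if (setjmp(error_env) != 0)
        return 0;       /* error path: current_setting is left untouched */

    parse_setting(value);
    strncpy(current_setting, value, sizeof(current_setting) - 1);
    current_setting[sizeof(current_setting) - 1] = '\0';
    return 1;
}
```

With this shape, a failed parse of the new value is just a rejected update;
the process keeps the old setting, as it would for any other GUC it fails
to absorb.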

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	amit(dot)kapila16(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-27 22:05:26
Message-ID:	3167.1461794726@sss.pgh.pa.us
Lists:	pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> Sorry, I attached an empty patch. Here is another one, this time
> with content.

I pushed this after whacking it around some, and cleaning up some
sort-of-related problems in the syncrep parser/lexer.

There remains a point that I'm not very happy about, which is the code
in check_synchronous_standby_names to emit a WARNING if the num_sync
setting is too large. That's a pretty bad compromise: we should either
decide that the case is legal or that it is not. If it's legal, people
who are correctly using the case will not thank us for logging a WARNING
every single time the postmaster gets a SIGHUP (and those who aren't using
it correctly will have their systems freezing up, warning or no warning).
If it's not legal, we should make it an error not a warning.

My inclination is to just rip out the warning. But I wonder whether the
desire to have one doesn't imply that the semantics are poorly chosen
and should be revisited.

regards, tom lane

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	amit(dot)kapila16(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-28 08:39:07
Message-ID:	20160428.173907.219222048.horiguchi.kyotaro@lab.ntt.co.jp
Lists:	pgsql-hackers

Hello,

At Wed, 27 Apr 2016 18:05:26 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <3167(dot)1461794726(at)sss(dot)pgh(dot)pa(dot)us>
> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> > Sorry, I attached an empty patch. Here is another one, this time
> > with content.
>
> I pushed this after whacking it around some, and cleaning up some
> sort-of-related problems in the syncrep parser/lexer.

Thank you for pushing this (with improvements), and for the
improvements to synchronous_standby_names. I agree with the
conclusion of the discussion that standby names should be restricted
so as not to break possible extensions in the near future.

> There remains a point that I'm not very happy about, which is the code
> in check_synchronous_standby_names to emit a WARNING if the num_sync
> setting is too large. That's a pretty bad compromise: we should either
> decide that the case is legal or that it is not. If it's legal, people
> who are correctly using the case will not thank us for logging a WARNING
> every single time the postmaster gets a SIGHUP (and those who aren't using
> it correctly will have their systems freezing up, warning or no warning).
> If it's not legal, we should make it an error not a warning.

This specification makes the code a bit complex and makes the
documentation a bit harder to understand. It seems doubtful to me
that allowing duplicate (potentially synchronous) walreceivers is
useful enough to justify such disadvantages.

In spite of this, my inclination is also the same as the
following :p, rather than making the behavior consistent and clear.

> My inclination is to just rip out the warning.

Does anyone object to removing the warning?

> But I wonder whether the
> desire to have one doesn't imply that the semantics are poorly chosen
> and should be revisited.

We already have abandoned a bit of backward compatibility in this
feature.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	amit(dot)kapila16(at)gmail(dot)com, michael(dot)paquier(at)gmail(dot)com, sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)gmail(dot)com, jeff(dot)janes(at)gmail(dot)com, thomas(dot)munro(at)enterprisedb(dot)com, robertmhaas(at)gmail(dot)com, thom(at)linux(dot)com, memissemerson(at)gmail(dot)com, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Support for N synchronous standby servers - take 2
Date:	2016-04-30 14:55:31
Message-ID:	28039.1462028131@sss.pgh.pa.us
Lists:	pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> At Wed, 27 Apr 2016 18:05:26 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <3167(dot)1461794726(at)sss(dot)pgh(dot)pa(dot)us>
>> My inclination is to just rip out the warning.

> Does anyone object to removing the warning?

Hearing no objections, done.

regards, tom lane