Coding TODO for 8.4: Synch Rep

From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Coding TODO for 8.4: Synch Rep
Date: 2008-12-16 05:27:47
Message-ID: 3f0b79eb0812152127s463e8600u569242b971523bae@mail.gmail.com
Lists: pgsql-hackers

Hi,

I'd like to clarify the coding TODO list for Synch Rep in 8.4. If an
indispensable TODO item is missing, please feel free to let me know.

1. replication_timeout_action (GUC)

This is a new GUC to specify the reaction to replication_timeout. In the
latest patch, the user cannot configure the reaction, and the primary
always continues processing after the timeout occurs. In the next patch,
the user can choose the reaction:

- standalone
When the backend waits for replication much longer than the
specified time (replication_timeout), that is, when the timeout occurs, the
backend sends a replication_timeout interrupt to walsender. Then
walsender closes the connection to the standby, wakes all waiting
backends, and exits. All processing continues on the standalone
primary.

- down
When the timeout occurs, walsender sends SIGQUIT to the
postmaster instead of waking all backends, and the primary shuts
down immediately.

2. log_min_duration_replication (GUC)

If the backend waits much longer than log_min_duration_replication,
a warning message is logged, as with the other log_min_duration_xxx
parameters. The unit is not a percentage of the timeout but msec,
because "msec" is more convenient.

3. recovery.conf

I need to change the recovery.conf patch to work with EXEC_BACKEND.
Someone advised me locally to move the replication options to
postgresql.conf for convenience. That is, in order to start replication,
the only configuration file the user would have to care about is
postgresql.conf. Which do you think is best?

The options which I'm going to use for replication are the following.

- host of the primary (new)
- port of the primary (new)
- username to connect to the primary (new)
- restore_command
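
For illustration, a standby's recovery.conf covering the four options above
might look like this; the three new parameter names are placeholders, not
settled names:

```
# recovery.conf -- hypothetical example
primary_host    = 'primary.example.com'      # host of the primary (new)
primary_port    = '5432'                     # port of the primary (new)
primary_user    = 'replicator'               # username to connect as (new)
restore_command = 'cp /mnt/archive/%f "%p"'
```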

4. sleeping
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00438.php

I'm looking for a better idea. How should we resolve that problem?
Only reduce the timeout of pq_wait to 100ms? Or get rid of
SA_RESTART only during pq_wait, as follows?

remove SA_RESTART
pq_wait()
add SA_RESTART

5. Extend archive_mode
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00718.php

6. Continuous recovery without pg_standby
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00296.php

7. Switch modes on the standby
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00503.php

8. Define new trigger to promote the standby

In the latest patch, since the standby always recovers with pg_standby,
the standby can be promoted only by the trigger file of pg_standby. But the
architecture should be changed as indicated in #6 and #7. We need to define
a new trigger to promote the standby to the primary. I have two ideas:

- Trigger based on a file
Like pg_standby, the startup process also periodically checks whether the
trigger file exists. The path of the trigger file is specified in recovery.conf.
The advantage of this idea is that one trigger file can easily promote the
standby, whether it's in FLS or SLS mode.

- Trigger based on a signal
If the postmaster receives SIGTERM during recovery, the standby stops
walreceiver, completes recovery, and becomes the primary. In current
HEAD, SIGTERM (Smart Shutdown) during recovery is not used yet.

Which idea is better? Or do you have any other, better idea?

In my design, a trigger is always required to promote the standby. That is,
the standby is not permitted to complete recovery and become the
primary without a trigger. Even if the standby finds a corrupt WAL
record, it waits for the trigger before ending recovery. This is because
postgres cannot make a correct decision about whether to end recovery,
and a wrong decision might cause split brain and an undesirable increment
of the timeline. Is this design OK?

9. New synchronous option on the standby
http://archives.postgresql.org/pgsql-hackers/2008-12/msg01160.php

Pending for now. Are these features indispensable for 8.4?

10. Hang all connections until everything is set up for "sync rep"
http://archives.postgresql.org/pgsql-hackers/2008-12/msg00868.php

Since there are many TODO items, I'm worried about the deadline.
When is the deadline of this commit fest? December 31st? first half
of January? ...etc?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Coding TODO for 8.4: Synch Rep
Date: 2008-12-16 13:01:49
Message-ID: 20081216130149.GD4741@alvh.no-ip.org
Lists: pgsql-hackers

Fujii Masao wrote:

> Since there are many TODO items, I'm worried about the deadline.
> When is the deadline of this commit fest? December 31st? first half
> of January? ...etc?

November 1st was the deadline. We're now in feature freeze.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Coding TODO for 8.4: Synch Rep
Date: 2008-12-17 01:06:01
Message-ID: 200812162006.01319.xzilla@users.sourceforge.net
Lists: pgsql-hackers

On Tuesday 16 December 2008 08:01:49 Alvaro Herrera wrote:
> Fujii Masao wrote:
> > Since there are many TODO items, I'm worried about the deadline.
> > When is the deadline of this commit fest? December 31st? first half
> > of January? ...etc?
>
> November 1st was the deadline. We're now in feature freeze.
>

November 1st was when the commitfest started; I think he was wondering when
the commitfest was going to end. This being the last commitfest, it runs
differently than the others; as Alvaro mentioned, we're now in feature freeze,
but when that will end is still undetermined. In other words, if you have a
patch for 8.4 that has already been submitted but not committed, keep hacking!

--
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Coding TODO for 8.4: Synch Rep
Date: 2008-12-18 00:55:58
Message-ID: 1229561758.4793.234.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Tue, 2008-12-16 at 14:27 +0900, Fujii Masao wrote:

> I'd like to clarify the coding TODO list for Synch Rep in 8.4. If an
> indispensable TODO item is missing, please feel free to let me know.

> Since there are many TODO items, I'm worried about the deadline.
> When is the deadline of this commit fest? December 31st? first half
> of January? ...etc?

I think we're in a difficult position. The changes I've requested are
major architecture changes, but not that difficult to implement. I would
have to say *not* doing them leaves us in a situation with a fairly
awful architecture, and it really doesn't make sense to sacrifice the
long-term design for a few weeks.

I don't think the review or scale of change is any different to other
major patches in recent times. If people want to spend time discussing
the points again, we can. Changes always seem like heavy lifting, but
there's nothing I've asked for that is difficult, it's all
straightforward stuff.

In all honesty, I didn't think you were going to make the deadline. But
you did, though with significantly reduced discussion on the key issues.
That's definitely not a problem for me; sure, we're a few weeks behind
where we wanted to be, but that's nothing when you look at what we're
dealing with and what we will gain.

> 1. replication_timeout_action (GUC)
>
> This is a new GUC to specify the reaction to replication_timeout. In the
> latest patch, the user cannot configure the reaction, and the primary
> always continues processing after the timeout occurs. In the next patch,
> the user can choose the reaction:
>
> - standalone
> When the backend waits for replication much longer than the
> specified time (replication_timeout), that is, when the timeout occurs, the
> backend sends a replication_timeout interrupt to walsender. Then
> walsender closes the connection to the standby, wakes all waiting
> backends, and exits. All processing continues on the standalone
> primary.
>
> - down
> When the timeout occurs, walsender sends SIGQUIT to the
> postmaster instead of waking all backends, and the primary shuts
> down immediately.

I'd put this at a much lower priority than the other changes. It might still
be required, but let's get it out there as soon as possible and see. If
that means we have to punt on it entirely, so be it.

> 2. log_min_duration_replication (GUC)
>
> If the backend waits much longer than log_min_duration_replication,
> a warning message is logged, as with the other log_min_duration_xxx
> parameters. The unit is not a percentage of the timeout but msec,
> because "msec" is more convenient.

Yes, but low priority.

> 3. recovery.conf
>
> I need to change the recovery.conf patch to work with EXEC_BACKEND.
> Someone advised me locally to move the replication options to
> postgresql.conf for convenience. That is, in order to start replication,
> the only configuration file the user would have to care about is
> postgresql.conf. Which do you think is best?
>
> The options which I'm going to use for replication are the following.
>
> - host of the primary (new)
> - port of the primary (new)
> - username to connect to the primary (new)
> - restore_command

Why not just have walreceiver explicitly read recovery.conf? That's what
Startup process does. (It's only those two processes, right?)

Reworking everything in the way described above would take ages and
introduce lots of bugs.

> 4. sleeping
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00438.php
>
> I'm looking for a better idea. How should we resolve that problem?
> Only reduce the timeout of pq_wait to 100ms? Or get rid of
> SA_RESTART only during pq_wait, as follows?
>
> remove SA_RESTART
> pq_wait()
> add SA_RESTART

Not sure, will consider. Ask others as well.

> 5. Extend archive_mode
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00718.php

Yes, definitely.

> 6. Continuous recovery without pg_standby
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00296.php

Yes, definitely.

> 7. Switch modes on the standby
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00503.php

This is a consequence of 5 and 6, not an additional feature. It's part
of the same thing. So yes, definitely.

> 8. Define new trigger to promote the standby
>
> In the latest patch, since the standby always recovers with pg_standby,
> the standby can be promoted only by the trigger file of pg_standby. But the
> architecture should be changed as indicated in #6 and #7. We need to define
> a new trigger to promote the standby to the primary. I have two ideas:
>
> - Trigger based on a file
> Like pg_standby, the startup process also periodically checks whether the
> trigger file exists. The path of the trigger file is specified in recovery.conf.
> The advantage of this idea is that one trigger file can easily promote the
> standby, whether it's in FLS or SLS mode.
>
> - Trigger based on a signal
> If the postmaster receives SIGTERM during recovery, the standby stops
> walreceiver, completes recovery, and becomes the primary. In current
> HEAD, SIGTERM (Smart Shutdown) during recovery is not used yet.
>
> Which idea is better? Or do you have any other, better idea?
>
> In my design, a trigger is always required to promote the standby. That is,
> the standby is not permitted to complete recovery and become the
> primary without a trigger. Even if the standby finds a corrupt WAL
> record, it waits for the trigger before ending recovery. This is because
> postgres cannot make a correct decision about whether to end recovery,
> and a wrong decision might cause split brain and an undesirable increment
> of the timeline. Is this design OK?

We don't need this change now because of (7). We aren't using pg_standby
except for the initial stage, so it's much less important to do this for
failover. So low priority, if at all.

> 9. New synchronous option on the standby
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg01160.php
>
>
> Pending for now. Are these features indispensable for 8.4?

Given comments, yes.

I don't see that as hard. Is there a problem in the implementation? This
seems the easiest thing to implement; just sneak in an fsync().

> 10. Hang all connections until everything is set up for "sync rep"
> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00868.php

IMHO, I don't really think we can do this sensibly until we can support
multiple standby nodes. If we did this, it would imply that if the
standby were down then we should stop processing transactions, which is
just a recipe for low availability, not high availability.

ISTM we should offer a simple boolean function which says whether
streaming replication is connected or not. If people want to defer
connections until replication is connected, they can create a more
complex startup script, just as they already do to ensure the correct
startup sequence of all the required services.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Coding TODO for 8.4: Synch Rep
Date: 2008-12-18 03:04:06
Message-ID: 3f0b79eb0812171904x220b5f2cw3b639320c7473b43@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Thu, Dec 18, 2008 at 9:55 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Tue, 2008-12-16 at 14:27 +0900, Fujii Masao wrote:
>
>> I'd like to clarify the coding TODO list for Synch Rep in 8.4. If an
>> indispensable TODO item is missing, please feel free to let me know.
>
>> Since there are many TODO items, I'm worried about the deadline.
>> When is the deadline of this commit fest? December 31st? first half
>> of January? ...etc?
>
> I think we're in a difficult position. The changes I've requested are
> major architecture changes, but not that difficult to implement. I would
> have to say *not* doing them leaves us in a situation with a fairly
> awful architecture, and it really doesn't make sense to sacrifice the
> long-term design for a few weeks.
>
> I don't think the review or scale of change is any different to other
> major patches in recent times. If people want to spend time discussing
> the points again, we can. Changes always seem like heavy lifting, but
> there's nothing I've asked for that is difficult, it's all
> straightforward stuff.

You are right. But I'm afraid my coding speed is not as high as that of
some great hackers, including you ;-) Yeah, I'm ready for a happy Coding Xmas!

>
> In all honesty, I didn't think you were going to make the deadline. But
> you did, though with significantly reduced discussion on the key issues.
> That's definitely not a problem for me; sure, we're a few weeks behind
> where we wanted to be, but that's nothing when you look at what we're
> dealing with and what we will gain.
>
>> 1. replication_timeout_action (GUC)
>>
>> This is a new GUC to specify the reaction to replication_timeout. In the
>> latest patch, the user cannot configure the reaction, and the primary
>> always continues processing after the timeout occurs. In the next patch,
>> the user can choose the reaction:
>>
>> - standalone
>> When the backend waits for replication much longer than the
>> specified time (replication_timeout), that is, when the timeout occurs, the
>> backend sends a replication_timeout interrupt to walsender. Then
>> walsender closes the connection to the standby, wakes all waiting
>> backends, and exits. All processing continues on the standalone
>> primary.
>>
>> - down
>> When the timeout occurs, walsender sends SIGQUIT to the
>> postmaster instead of waking all backends, and the primary shuts
>> down immediately.
>
> I'd put this at a much lower priority than the other changes. It might still
> be required, but let's get it out there as soon as possible and see. If
> that means we have to punt on it entirely, so be it.

Okay.

>
>> 2. log_min_duration_replication (GUC)
>>
>> If the backend waits much longer than log_min_duration_replication,
>> a warning message is logged, as with the other log_min_duration_xxx
>> parameters. The unit is not a percentage of the timeout but msec,
>> because "msec" is more convenient.
>
> Yes, but low priority.

Okay.

>
>> 3. recovery.conf
>>
>> I need to change the recovery.conf patch to work with EXEC_BACKEND.
>> Someone advised me locally to move the replication options to
>> postgresql.conf for convenience. That is, in order to start replication,
>> the only configuration file the user would have to care about is
>> postgresql.conf. Which do you think is best?
>>
>> The options which I'm going to use for replication are the following.
>>
>> - host of the primary (new)
>> - port of the primary (new)
>> - username to connect to the primary (new)
>> - restore_command
>
> Why not just have walreceiver explicitly read recovery.conf? That's what
> Startup process does. (It's only those two processes, right?)
>
> Reworking everything in the way described above would take ages and
> introduce lots of bugs.

Yes, I will make the startup process and walreceiver each read recovery.conf separately.

>
>> 4. sleeping
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00438.php
>>
>> I'm looking for a better idea. How should we resolve that problem?
>> Only reduce the timeout of pq_wait to 100ms? Or get rid of
>> SA_RESTART only during pq_wait, as follows?
>>
>> remove SA_RESTART
>> pq_wait()
>> add SA_RESTART
>
> Not sure, will consider. Ask others as well.
>
>> 5. Extend archive_mode
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00718.php
>
> Yes, definitely.

Okay.

>
>> 6. Continuous recovery without pg_standby
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00296.php
>
> Yes, definitely.

Okay.

>
>> 7. Switch modes on the standby
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00503.php
>
> This is a consequence of 5 and 6, not an additional feature. It's part
> of the same thing. So yes, definitely.

Yes.

>
>> 8. Define new trigger to promote the standby
>>
>> In the latest patch, since the standby always recovers with pg_standby,
>> the standby can be promoted only by the trigger file of pg_standby. But the
>> architecture should be changed as indicated in #6 and #7. We need to define
>> a new trigger to promote the standby to the primary. I have two ideas:
>>
>> - Trigger based on a file
>> Like pg_standby, the startup process also periodically checks whether the
>> trigger file exists. The path of the trigger file is specified in recovery.conf.
>> The advantage of this idea is that one trigger file can easily promote the
>> standby, whether it's in FLS or SLS mode.
>>
>> - Trigger based on a signal
>> If the postmaster receives SIGTERM during recovery, the standby stops
>> walreceiver, completes recovery, and becomes the primary. In current
>> HEAD, SIGTERM (Smart Shutdown) during recovery is not used yet.
>>
>> Which idea is better? Or do you have any other, better idea?
>>
>> In my design, a trigger is always required to promote the standby. That is,
>> the standby is not permitted to complete recovery and become the
>> primary without a trigger. Even if the standby finds a corrupt WAL
>> record, it waits for the trigger before ending recovery. This is because
>> postgres cannot make a correct decision about whether to end recovery,
>> and a wrong decision might cause split brain and an undesirable increment
>> of the timeline. Is this design OK?
>
> We don't need this change now because of (7). We aren't using pg_standby
> except for the initial stage, so it's much less important to do this for
> failover. So low priority, if at all.

I think that this feature is required. Otherwise, the startup process might
wait for the next WAL record forever. And since this is a question of
interface, I wanted to hear from users before coding it.

>
>> 9. New synchronous option on the standby
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg01160.php
>>
>>
>> Pending for now. Are these features indispensable for 8.4?
>
> Given comments, yes.
>
> I don't see that as hard. Is there a problem in the implementation? This
> seems the easiest thing to implement; just sneak in an fsync().

Oops! Sorry for my confusing writing.
"Pending now" covers the following item, that is, (10). Of course,
I will add the new synchronous option (fsync mode).

>
>> 10. Hang all connections until everything is set up for "sync rep"
>> http://archives.postgresql.org/pgsql-hackers/2008-12/msg00868.php
>
> IMHO, I don't really think we can do this sensibly until we can support
> multiple standby nodes. If we did this, it would imply that if the
> standby were down then we should stop processing transactions, which is
> just a recipe for low availability, not high availability.
>
> ISTM we should offer a simple boolean function which says whether
> streaming replication is connected or not. If people want to defer
> connections until replication is connected, they can create a more
> complex startup script, just as they already do to ensure the correct
> startup sequence of all the required services.

OK, I will add that function.

Name: pg_is_in_replication
Args: None
Returns: boolean
Description: whether replication is in progress

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "pgsql-hackers list" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Coding TODO for 8.4: Synch Rep
Date: 2008-12-18 03:25:58
Message-ID: 20081218120728.AB0A.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers


"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com> wrote:

> > ISTM we should offer a simple boolean function which says whether
> > streaming replication is connected or not. If people want to defer
> > connection until replication is connected then they can create a more
> > complex startup script, just as they do to ensure correct sequence of
> > all the required services already.
>
> OK, I will add that function.
>
> Name: pg_is_in_replication
> Args: None
> Returns: boolean
> Description: whether replication is in progress

It might not be an item for 8.4, but we'd better provide a method
to query information about standby servers, something like:

- IP address of the standby server
- Time the connection was established
- Replication statistics:
  - # of bytes sent
  - average response time
etc...

This information will span two or more rows when we support
multiple standby servers. So the method should be a system
view (like pg_standby_servers), not a scalar function.
If there were such a view, pg_is_in_replication() could be defined
as "SELECT count(*) > 0 FROM pg_standby_servers".

However, pg_is_in_replication() is enough for 8.4, so I think
this has low priority. The IP address of the standby can already
be retrieved with the ps command.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center