Should we remove "not fast" promotion at all?

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tomonari Katsumata <t(dot)katsumata1122(at)gmail(dot)com>
Cc: Tomonari Katsumata <katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Should we remove "not fast" promotion at all?
Date: 2013-08-05 18:24:58
Message-ID: CAHGQGwGYkF+CvpOMdxaO=+aNAzc1Oo9O4LqWo50MxpvFj+0VOw@mail.gmail.com
Lists: pgsql-hackers

Hi all,

We discussed the $SUBJECT in the following threads:
http://www.postgresql.org/message-id/CA+TgmoZbR+WL8E7MF_KRp6fY4FD2pMr11TPiuyjMFX_Vtg1Wrw@mail.gmail.com
http://www.postgresql.org/message-id/CAHGQGwEBUvgcx8X+Z0Hh+VdwYcJ8KCuRuLt1jSsxeLxPcX=0_w@mail.gmail.com

The consensus seems to be to remove "not fast" promotion entirely,
because there is no use case for it.

The attached patch removes "not fast" promotion. Barring any objections,
I will commit this patch.

Regards,

On Sat, Aug 3, 2013 at 4:31 PM, Tomonari Katsumata
<t(dot)katsumata1122(at)gmail(dot)com> wrote:
> Hi,
>
> I made a patch for REL9_3_STABLE which gets rid of
> the old promote processing. Please check it.
> This patch makes PostgreSQL always do fast promotion (*).
> (*) That is, it skips the long checkpoint before incrementing the
> timeline.
>
> After this, I'll make another patch for unlinking the files that are
> created by the user as a trigger_file or by the "pg_ctl promote" command.
>
> ---------------
> Tomonari Katsumata
> 2013/7/30 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
>>
>> On Sat, Jul 27, 2013 at 6:57 PM, Tomonari Katsumata
>> <t(dot)katsumata1122(at)gmail(dot)com> wrote:
>> > Hi,
>> >
>> >
>> >>>> Yes, it prevents PROMOTE_SIGNAL_FILE from remaining even if
>> >>>> both promote files exist.
>> >>>>
>> >>> The command ("unlink(PROMOTE_SIGNAL_FILE)") here is for an
>> >>> unusual case,
>> >>> namely when the user does both of the following:
>> >>> - creates a "promote" file in PGDATA
>> >>> - issues "pg_ctl promote"
>> >>>
>> >>> I understand the reason.
>> >>> But I think it's better to unlink(PROMOTE_SIGNAL_FILE) before
>> >>> unlink(FAST_PROMOTE_SIGNAL_FILE),
>> >>> because FAST_PROMOTE_SIGNAL_FILE is definitely there but
>> >>> PROMOTE_SIGNAL_FILE may or may not be there.
>> >>
>> >> I could not understand why that's better. Could you elaborate on that?
>> >>
>> > I'm sorry for the insufficient explanation.
>> >
>> > I thought that errno would be set to ENOENT and
>> > that this might cause something to go wrong.
>> > I checked this, and now I know it's not a problem.
>> >
>> > Sorry for confusing you.
>> >
>> >
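A minimal sketch (not taken from the thread's patch) of the point settled above: unlink() on a file that does not exist fails with errno set to ENOENT, which is harmless, so the two signal files can be removed in either order. The helper names are hypothetical, and the file-name values are assumptions standing in for the real definitions in the PostgreSQL source.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Macro names come from the thread; the values here are assumptions. */
#define PROMOTE_SIGNAL_FILE      "promote"
#define FAST_PROMOTE_SIGNAL_FILE "fast_promote"

/*
 * Hypothetical helper: remove a file, treating "already gone" as success.
 * Returns 1 if the file was removed, 0 if it did not exist, -1 on any
 * other error.
 */
static int
remove_if_exists(const char *path)
{
    assert(path != NULL);

    if (unlink(path) == 0)
        return 1;               /* file existed and was removed */
    if (errno == ENOENT)
        return 0;               /* nothing to remove; not an error */

    fprintf(stderr, "could not remove \"%s\": %s\n", path, strerror(errno));
    return -1;
}

/*
 * Hypothetical helper: remove both signal files, relative to the current
 * directory. Because ENOENT is tolerated, the order does not matter.
 * Returns the number of files removed, or -1 if either removal failed.
 */
static int
remove_promote_signal_files(void)
{
    int normal = remove_if_exists(PROMOTE_SIGNAL_FILE);
    int fast = remove_if_exists(FAST_PROMOTE_SIGNAL_FILE);

    if (normal < 0 || fast < 0)
        return -1;
    return normal + fast;
}
```

Called from a directory containing only one of the two files, remove_promote_signal_files() returns 1; calling it again returns 0 rather than reporting an error.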
>> >
>> >>> And I have another question related to this behavior.
>> >>> I think TriggerFile should be removed too.
>> >>> This is a corner case, but it will happen.
>> >>> What do you think?
>> >>
>> >> I don't have a strong opinion about that. I've never heard any complaints
>> >> about the current behavior so far.
>> >>
>> > For example, please imagine a cascading replication environment in which
>> > the old master is reused as a standby without copying the timeline
>> > history file to the new standby.
>> >
>> > -------
>> > 1. replicate across 3 servers (A, B, C)
>> > A->B->C
>> > ("trigger_file = /tmp/trig" is set in recovery.conf on B and
>> > C.)
>> >
>> > 2. stop server A and promote server B with
>> > "touch /tmp/trig;pg_ctl promote"
>>
>> Why do you need to both create the trigger file and run pg_ctl promote?
>>
>> Anyway, if the patch is useful as a fail-safe and it doesn't break the
>> current behavior, I'd be happy to apply it. You are suggesting that we
>> should remove the trigger file in CheckForStandbyTrigger() even if
>> pg_ctl promote is executed. But there can be some cases where we can get
>> out of the WAL replay loop, for example, by reaching the
>> recovery_target_xxx. So ISTM we should try to remove both the trigger
>> file and the "promote" file at the end of recovery instead. Thoughts?
>>
>> > B->C
>> > (/tmp/trig file remains on server B)
>> >
>> > 4. stop server B and promote server C with "pg_ctl promote"
>> > C
>> >
>> > 5. make server B connect as a standby of server C
>> > C->B
>> > ---------
>> >
>> > In step 5, server B will promote as soon as it starts,
>> > because "/tmp/trig" is still there.
>> >
>> >
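A minimal sketch (not from the thread or the attached patch) of the scenario above and of the cleanup suggested for the end of recovery. It assumes a standby decides to promote simply because its trigger file exists; both function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical check: a standby promotes when its trigger file exists.
 * A stale file left over from an earlier promotion therefore triggers
 * promotion again the next time the server starts as a standby.
 */
static bool
trigger_file_present(const char *trigger_file)
{
    struct stat st;

    assert(trigger_file != NULL);
    return stat(trigger_file, &st) == 0;
}

/*
 * Hypothetical end-of-recovery cleanup: remove the trigger file no matter
 * how promotion was requested. A missing file (ENOENT) counts as success.
 */
static bool
cleanup_trigger_file(const char *trigger_file)
{
    assert(trigger_file != NULL);

    if (unlink(trigger_file) == 0 || errno == ENOENT)
        return true;

    perror(trigger_file);
    return false;
}
```

With such a cleanup in place, the trigger file from step 2 would be gone by the time server B is restarted as a standby in step 5, so trigger_file_present() would return false there.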
>> >
>> >>>> One question is: do we really still need to support normal promotion?
>> >>>> pg_ctl promote provides only a way to do fast promotion. If we want to
>> >>>> do normal promotion, we need to create PROMOTE_SIGNAL_FILE
>> >>>> and send the SIGUSR1 signal to the postmaster by hand. This seems messy.
>> >>>>
>> >>>> I think that we should either remove normal promotion entirely, or change
>> >>>> pg_ctl promote so that it also provides a way to do normal promotion.
>> >>>>
>> >>> I think the merit of "fast promote" is
>> >>> - allowing quick connection by skipping the checkpoint
>> >>> and its demerit is
>> >>> - taking a little bit longer in crash recovery
>> >>>
>> >>> If a crash soon after promotion is rare
>> >>> and "fast promote" never breaks the consistency of the database cluster,
>> >>> I think we don't need normal promotion.
>> >>
>> >> You can execute a checkpoint after fast promotion for that.
>> >>
>> > OK.
>> > Then I think we should do the following:
>> > - remove normal promotion entirely from the source
>> > - add the know-how you suggested to the documentation
>>
>> IMO either is necessary.
>>
>> Regards,
>>
>> --
>> Fujii Masao
>
>

--
Fujii Masao

Attachment Content-Type Size
remove_not_fast_promote_v1.patch application/octet-stream 7.0 KB
