Immediate standby promotion

Lists: pgsql-hackers
From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Immediate standby promotion
Date: 2014-08-14 08:19:06
Message-ID: CAHGQGwHtvyDqKZaYWYA9zyyLEcAKiF5P0KpcpuNE_tsrGTFtQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I'd like to propose to add new option "--immediate" to pg_ctl promote.
When this option is set, recovery ignores any WAL data which have not
been replayed yet and exits immediately. Patch attached.

This promotion is faster than normal one but can cause data loss. So
it's useful if we want to switch the server to normal operation as
soon as possible at the expense of durability. Also imagine the case
where, while recovery is being delayed (by a time-delayed standby
which was introduced in 9.4) or paused (by pg_xlog_replay_pause),
you find that subsequent WAL data can cause a disaster to happen
(for example, WAL data indicating an unexpected deletion of
important database). In this case, this immediate promotion can be
used to ignore such problematic WAL data.

With the patch, "--immediate" option controls whether immediate
promotion is performed or not. OTOH, we can extend "-m" option
that pg_ctl stop has already used so that it controls also the mode of
promotion. But when we discussed this feature before, someone
disagreed to do that because shutdown mode and promotion mode
are different concepts. For example, if "smart" is specified as mode,
how should the promotion work? I agree with him, and just added
new separate option. Thought?

Regards,

--
Fujii Masao

Attachment Content-Type Size
immediate_promote_v1.patch text/x-patch 8.6 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-08-14 14:12:17
Message-ID: 2345.1408025537@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> I'd like to propose to add new option "--immediate" to pg_ctl promote.
> When this option is set, recovery ignores any WAL data which have not
> been replayed yet and exits immediately. Patch attached.

> This promotion is faster than normal one but can cause data loss.

TBH, I cannot imagine a situation where that would be a sane thing to do.
If you have WAL, why would you not replay what you have? The purpose
of a database is to preserve your data, not randomly throw it away.

> Also imagine the case
> where, while recovery is being delayed (by a time-delayed standby
> which was introduced in 9.4) or paused (by pg_xlog_replay_pause),
> you find that subsequent WAL data can cause a disaster to happen
> (for example, WAL data indicating an unexpected deletion of
> important database). In this case, this immediate promotion can be
> used to ignore such problematic WAL data.

That example does not demonstrate that a patch like this is useful.
What you'd presumably want is a way to stop replay at a defined place
(comparable to the PITR facilities). Not to just abandon the WAL stream
at whatever point replay has reached.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-08-14 19:27:32
Message-ID: CA+TgmoYLwKJ4LAgOGOB_gMOVQohicw0gOJ-9ETbwH_i4EM47Dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 14, 2014 at 10:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> I'd like to propose to add new option "--immediate" to pg_ctl promote.
>> When this option is set, recovery ignores any WAL data which have not
>> been replayed yet and exits immediately. Patch attached.
>
>> This promotion is faster than normal one but can cause data loss.
>
> TBH, I cannot imagine a situation where that would be a sane thing to do.
> If you have WAL, why would you not replay what you have? The purpose
> of a database is to preserve your data, not randomly throw it away.

I've wanted this a number of times, so I think it's quite sane.

>> Also imagine the case
>> where, while recovery is being delayed (by a time-delayed standby
>> which was introduced in 9.4) or paused (by pg_xlog_replay_pause),
>> you find that subsequent WAL data can cause a disaster to happen
>> (for example, WAL data indicating an unexpected deletion of
>> important database). In this case, this immediate promotion can be
>> used to ignore such problematic WAL data.
>
> That example does not demonstrate that a patch like this is useful.
> What you'd presumably want is a way to stop replay at a defined place
> (comparable to the PITR facilities). Not to just abandon the WAL stream
> at whatever point replay has reached.

We already have the facilities to stop replay at a defined place. But
then what? Without this patch, do well tell the customer to stop
replay, do a pg_dump of the whole database, and restore it into a new
database? Because that's crazy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-08-14 19:36:42
Message-ID: CAFcNs+ofS8aqDN+TbnvZ5JBb2NTfWOo+7uNTF67ebO432W=hHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 14, 2014 at 4:27 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> We already have the facilities to stop replay at a defined place. But
> then what? Without this patch, do well tell the customer to stop
> replay, do a pg_dump of the whole database, and restore it into a new
> database? Because that's crazy.
>

Yeah... and as Fujji already said another case is when some operation error
occurs in the master (like a wrong "drop database") and we have a
time-delayed standby that can be used to recover the mistake quickly.

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: "fabriziomello(at)gmail(dot)com" <fabriziomello(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-08-14 20:07:38
Message-ID: 1408046858.61496.YahooMailNeo@web122305.mail.ne1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> We already have the facilities to stop replay at a defined
>> place.  But then what?  Without this patch, do well tell the
>> customer to stop replay, do a pg_dump of the whole database, and
>> restore it into a new database?  Because that's crazy.
>
> Yeah... and as Fujji already said another case is when some
> operation error occurs in the master (like a wrong "drop
> database") and we have a time-delayed standby that can be used to
> recover the mistake quickly.

I have been in the position of having an ad hoc data fix by someone
running raw SQL where they forgot the WHERE clause on a DELETE from
the table that just about everything joins to (the "Case" table
for a court system).  Since we had both PITR backups and logical
replication we were able to recover by kicking the users out, doing
a PITR recovery up to shortly before the mistake was made, and then
replaying the logical transaction stream from that point to the
end, excluding the bad transaction.

On the face of it, that doesn't sound like a big deal, right?  But
we had to kick out seven state Supreme Court justices, 16 Court of
Appeals judges, and the related support staff for a couple hours.
Trust me, with a delayed replica and the option of an immediate
promotion of the standby, I would have had a less stressful day.
Instead of telling all those people they couldn't use a key tool in
their workflow for two hours, I could have told them that there
would be a one or two minute outage after which any entries in the
last "n" minutes would be delayed in appearing in their view of the
data for two hours.  The justices would have been a lot happier,
and when they are happier, so is everyone else.  :-)

The suggested feature seems useful to me.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-01 06:23:18
Message-ID: CAA4eK1+FRQP_UurDZSP0=dviZHGG7zcpUQP_TY==ZFE29MTiUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 14, 2014 at 1:49 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> Hi,
>
> I'd like to propose to add new option "--immediate" to pg_ctl promote.
> When this option is set, recovery ignores any WAL data which have not
> been replayed yet and exits immediately. Patch attached.
>
> This promotion is faster than normal one but can cause data loss.

I think there is one downside as well for this proposal that
apart from data loss, it can lead to uncommitted data occupying
space in database which needs to be later cleaned by vacuum.
This can happen with non-immediate promote as well, but the
chances with immediate are more. So the gain we got by doing
immediate promotion can lead to slow down of operations in some
cases. It might be useful if we mention this in docs.

Few comments about patch:

1.
On standby we will see below message:

LOG: received promote request

User will always see above message irrespective of whether it
is immediate promote or any other mode of promote. I think it will
be better to distinguish between different modes and display the
appropriate message.

2.
StartupXLOG()
{
..
+ if (immediate_promote)
+ break;
..
}

Why are you doing this check after pause
(recoveryApplyDelay/recoveryPausesHere) for recovery?

Why can't we do it after ReadRecord()?

3.
! * of "promote" and "immediate_promote"
shouldn't in above sentence 'or' is more appropriate?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-01 11:14:41
Message-ID: CAHGQGwF02Jrsv1m=GOiNJCRykSi+xk9CcXdGTu=38uEU2VYTAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 1, 2014 at 3:23 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Aug 14, 2014 at 1:49 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> Hi,
>>
>> I'd like to propose to add new option "--immediate" to pg_ctl promote.
>> When this option is set, recovery ignores any WAL data which have not
>> been replayed yet and exits immediately. Patch attached.
>>
>> This promotion is faster than normal one but can cause data loss.
>

Thanks for reviewing the patch!

> I think there is one downside as well for this proposal that
> apart from data loss, it can lead to uncommitted data occupying
> space in database which needs to be later cleaned by vacuum.
> This can happen with non-immediate promote as well, but the
> chances with immediate are more. So the gain we got by doing
> immediate promotion can lead to slow down of operations in some
> cases. It might be useful if we mention this in docs.

Yep, the immediate promotion might be more likely to cause
the recovery to end before replaying WAL data of VACUUM. But, OTOH,
I think that the immediate promotion might be more likely to cause
the recovery to end before replaying WAL data which will generate
garbage data. So I'm not sure if it's worth adding that note to the doc.

>
> Few comments about patch:
>
> 1.
> On standby we will see below message:
>
> LOG: received promote request
>
> User will always see above message irrespective of whether it
> is immediate promote or any other mode of promote. I think it will
> be better to distinguish between different modes and display the
> appropriate message.

Agreed. So I'm thinking to change the code as follows.

if (immediate_promote)
ereport(LOG, (errmsg("received immediate promote request")));
else
ereport(LOG, (errmsg("received promote request")));

Or we should name the normal promotion?

>
> 2.
> StartupXLOG()
> {
> ..
> + if (immediate_promote)
> + break;
> ..
> }
>
> Why are you doing this check after pause
> (recoveryApplyDelay/recoveryPausesHere) for recovery?
>
> Why can't we do it after ReadRecord()?

We can do that check either after ReadRecord() or after pause.
I preferred to add the check after pause because immediate promotion
would be likely to be requested while recovery is being paused.
In this case, if we do that check after ReadRecord(), we need to read
one more WAL record that actually we don't need.

BTW, in the current patch, when immediate promotion is requested while
recovery is being paused, the recovery keeps being paused until it's
manually resumed. But immediate promotion should cause even paused
recovery to end immediately?

> 3.
> ! * of "promote" and "immediate_promote"
> shouldn't in above sentence 'or' is more appropriate?

Yep.

Regards,

--
Fujii Masao


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-02 06:53:33
Message-ID: CAA4eK1LtY73E3SX5CR2CvMKkT=K+5rDXOoD1vi5x6GUUX08c5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 1, 2014 at 4:44 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Sep 1, 2014 at 3:23 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> > I think there is one downside as well for this proposal that
> > apart from data loss, it can lead to uncommitted data occupying
> > space in database which needs to be later cleaned by vacuum.
> > This can happen with non-immediate promote as well, but the
> > chances with immediate are more. So the gain we got by doing
> > immediate promotion can lead to slow down of operations in some
> > cases. It might be useful if we mention this in docs.
>
> Yep, the immediate promotion might be more likely to cause
> the recovery to end before replaying WAL data of VACUUM. But, OTOH,
> I think that the immediate promotion might be more likely to cause
> the recovery to end before replaying WAL data which will generate
> garbage data.

This seems arguable, because immediate promotion won't allow
WAL data to be replayed completely which means more chance
that only partial data of transactions will be replayed and commit
for those transactions won't get replayed, so it can lead to garbage
data.

> So I'm not sure if it's worth adding that note to the doc.

No issues, I just want to bring this point to your notice so that if
you think it is important enough that we can mention it then we
can update the docs else leave it.

> >
> > Few comments about patch:
> >
> > 1.
> > On standby we will see below message:
> >
> > LOG: received promote request
> >
> > User will always see above message irrespective of whether it
> > is immediate promote or any other mode of promote. I think it will
> > be better to distinguish between different modes and display the
> > appropriate message.
>
> Agreed. So I'm thinking to change the code as follows.
>
> if (immediate_promote)
> ereport(LOG, (errmsg("received immediate promote request")));
> else
> ereport(LOG, (errmsg("received promote request")));

This seems fine to me.

> Or we should name the normal promotion?

No need.

> >
> > 2.
> > StartupXLOG()
> > {
> > ..
> > + if (immediate_promote)
> > + break;
> > ..
> > }
> >
> > Why are you doing this check after pause
> > (recoveryApplyDelay/recoveryPausesHere) for recovery?
> >
> > Why can't we do it after ReadRecord()?
>
> We can do that check either after ReadRecord() or after pause.
> I preferred to add the check after pause because immediate promotion
> would be likely to be requested while recovery is being paused.
> In this case, if we do that check after ReadRecord(), we need to read
> one more WAL record that actually we don't need.

Okay, but for that you need to make sure that pause can detect
promotion request.

> BTW, in the current patch, when immediate promotion is requested while
> recovery is being paused, the recovery keeps being paused until it's
> manually resumed. But immediate promotion should cause even paused
> recovery to end immediately?

Yeap, I also think so.

Another issue with immediate promotion is that currently if primary server
is continuously sending the data, then standby could not detect --immediate
promote request and the reason seems to be below code:
WaitForWALToBecomeAvailable()
{
...
{
/* just make sure source info is correct... */
readSource = XLOG_FROM_STREAM;
XLogReceiptSource = XLOG_FROM_STREAM;
return true;
}
..
if (CheckForStandbyTrigger())
}

Basically we won't check for promote request if the data is available.

I have even reproduced this by below test case:
Primary (session-1) -
1. Create table t1 (c1 int, c2 char(500)) with (fillfactor = 10);

Standby -
2. Configure and start standby
3. Just connect with one client

Primary (session-1) -
4. insert into t1 values (generate_series(1,100000), 'aaaaa');

From another window, run command:
5. pg_ctl promote --immediate -D ..\..\Database1

Run step-4 and step-5 at the same time.

Currently standby is promoted only after insert operation
in step-4 is finished which seems to be wrong.

Apart from above issue, I have one question for you regarding
this feature, currently the patch supports immediate promotion
via pg_ctl promote, however we have another mechanism (trigger_file)
which you have not enhanced to support this new feature. Is there
any reason for same?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-03 16:56:38
Message-ID: CA+Tgmoa5JU_UU3v7T_aTvgLSvrOHFgUsqaVpUixMAG1MzJ5X5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 1, 2014 at 7:14 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> I think there is one downside as well for this proposal that
>> apart from data loss, it can lead to uncommitted data occupying
>> space in database which needs to be later cleaned by vacuum.
>> This can happen with non-immediate promote as well, but the
>> chances with immediate are more. So the gain we got by doing
>> immediate promotion can lead to slow down of operations in some
>> cases. It might be useful if we mention this in docs.
>
> Yep, the immediate promotion might be more likely to cause
> the recovery to end before replaying WAL data of VACUUM. But, OTOH,
> I think that the immediate promotion might be more likely to cause
> the recovery to end before replaying WAL data which will generate
> garbage data. So I'm not sure if it's worth adding that note to the doc.

-1 for documenting that. This is mostly a general PostgreSQL
phenomenon and has little to do with immediate promotion specifically.
I think anything we write here is likely to be more confusing than
helpful.

> Agreed. So I'm thinking to change the code as follows.
>
> if (immediate_promote)
> ereport(LOG, (errmsg("received immediate promote request")));
> else
> ereport(LOG, (errmsg("received promote request")));

+1 for that version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-17 11:23:41
Message-ID: CA+U5nMKam02HEtDNxVLiu83Jj+-uJsbFUKjgq32hOrTsX4s5sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14 August 2014 20:27, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Aug 14, 2014 at 10:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>> I'd like to propose to add new option "--immediate" to pg_ctl promote.
>>> When this option is set, recovery ignores any WAL data which have not
>>> been replayed yet and exits immediately. Patch attached.
>>
>>> This promotion is faster than normal one but can cause data loss.
>>
>> TBH, I cannot imagine a situation where that would be a sane thing to do.
>> If you have WAL, why would you not replay what you have? The purpose
>> of a database is to preserve your data, not randomly throw it away.
>
> I've wanted this a number of times, so I think it's quite sane.

"fast" promotion was actually a supported option in r8 of Postgres but
this option was removed when we implemented streaming replication in
r9.0

The *rough* requirement is sane, but that's not the same thing as
saying this exact patch makes sense.

If you are paused and you can see that WAL up ahead is damaged, then
YES, you do want to avoid applying it. That is possible by setting a
PITR target so that recovery stops at a precise location specified by
you. As an existing option is it better than the blunt force trauma
suggested here.

If you really don't care, just shutdown server, resetxlog and start
her up - again, no need for new option.

Anyway, my view is that this is a blunt instrument to do something we
can already do.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-18 00:22:48
Message-ID: CA+TgmoZg0K9csZ1KPXLos2cpe3pea+vuhsku9_5evx=ZzQ1LjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 17, 2014 at 7:23 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 14 August 2014 20:27, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Aug 14, 2014 at 10:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>> I'd like to propose to add new option "--immediate" to pg_ctl promote.
>>>> When this option is set, recovery ignores any WAL data which have not
>>>> been replayed yet and exits immediately. Patch attached.
>>>
>>>> This promotion is faster than normal one but can cause data loss.
>>>
>>> TBH, I cannot imagine a situation where that would be a sane thing to do.
>>> If you have WAL, why would you not replay what you have? The purpose
>>> of a database is to preserve your data, not randomly throw it away.
>>
>> I've wanted this a number of times, so I think it's quite sane.
>
> "fast" promotion was actually a supported option in r8 of Postgres but
> this option was removed when we implemented streaming replication in
> r9.0
>
> The *rough* requirement is sane, but that's not the same thing as
> saying this exact patch makes sense.

Granted. Fair point.

> If you are paused and you can see that WAL up ahead is damaged, then
> YES, you do want to avoid applying it. That is possible by setting a
> PITR target so that recovery stops at a precise location specified by
> you. As an existing option is it better than the blunt force trauma
> suggested here.

You can pause at a recovery target, but then what if you want to go
read/write at that point? Or what if you've got a time-delayed
standby and you want to break replication so that it doesn't replay
the DROP TABLE students that somebody ran on the master? It doesn't
have to be that WAL is unreadable or corrupt; it's enough for it to
contain changes you wish to avoid replaying.

> If you really don't care, just shutdown server, resetxlog and start
> her up - again, no need for new option.

To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it"
seems like a friendlier interface than making somebody shut down the
server, run pg_resetxlog, and start it up again.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-24 20:36:50
Message-ID: CA+U5nMKOsigzvYbe7f7k0k-15Z5RMj3DRhMSko1hNVQBWWGomA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18 September 2014 01:22, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> "fast" promotion was actually a supported option in r8 of Postgres but
>> this option was removed when we implemented streaming replication in
>> r9.0
>>
>> The *rough* requirement is sane, but that's not the same thing as
>> saying this exact patch makes sense.
>
> Granted. Fair point.
>
>> If you are paused and you can see that WAL up ahead is damaged, then
>> YES, you do want to avoid applying it. That is possible by setting a
>> PITR target so that recovery stops at a precise location specified by
>> you. As an existing option is it better than the blunt force trauma
>> suggested here.
>
> You can pause at a recovery target, but then what if you want to go
> read/write at that point? Or what if you've got a time-delayed
> standby and you want to break replication so that it doesn't replay
> the DROP TABLE students that somebody ran on the master? It doesn't
> have to be that WAL is unreadable or corrupt; it's enough for it to
> contain changes you wish to avoid replaying.
>
>> If you really don't care, just shutdown server, resetxlog and start
>> her up - again, no need for new option.
>
> To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it"
> seems like a friendlier interface than making somebody shut down the
> server, run pg_resetxlog, and start it up again.

It makes sense to go from paused --> promoted.

It doesn't make sense to go from normal running --> promoted, since
that is just random data loss. I very much understand the case where
somebody is shouting "get the web site up, we are losing business".
Implementing a feature that allows people to do exactly what they
asked (go live now), but loses business transactions that we thought
had been safely recorded is not good. It implements only the exact
request, not its actual intention.

Any feature that lumps both cases together is wrongly designed and
will cause data loss.

We go to a lot of trouble to ensure data is successfully on disk and
in WAL. I won't give that up, nor do I want to make it easier to lose
data than it already is.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-25 02:03:11
Message-ID: CAB7nPqTEyEnCk+MTm2G9LaSMmQJEWfC=FYweS_6nKqV6E3KQ7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 25, 2014 at 5:36 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> We go to a lot of trouble to ensure data is successfully on disk and
> in WAL. I won't give that up, nor do I want to make it easier to lose
> data than it already is.
+1.
--
Michael


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-25 15:13:50
Message-ID: CA+TgmobB-XkO_bnXL4eBUzOtC8PCdxb=BKjgJmimfPAujdNEJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 24, 2014 at 4:36 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it"
>> seems like a friendlier interface than making somebody shut down the
>> server, run pg_resetxlog, and start it up again.
>
> It makes sense to go from paused --> promoted.

Well, we could certainly require that people call
pg_xlog_replay_pause() before they call pg_promote_now(). I don't
think I like that better, but it beats not having pg_promote_now().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-25 15:29:34
Message-ID: 20140925152934.GA21746@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-09-24 21:36:50 +0100, Simon Riggs wrote:
> On 18 September 2014 01:22, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> >> "fast" promotion was actually a supported option in r8 of Postgres but
> >> this option was removed when we implemented streaming replication in
> >> r9.0
> >>
> >> The *rough* requirement is sane, but that's not the same thing as
> >> saying this exact patch makes sense.
> >
> > Granted. Fair point.
> >
> >> If you are paused and you can see that WAL up ahead is damaged, then
> >> YES, you do want to avoid applying it. That is possible by setting a
> >> PITR target so that recovery stops at a precise location specified by
> >> you. As an existing option is it better than the blunt force trauma
> >> suggested here.
> >
> > You can pause at a recovery target, but then what if you want to go
> > read/write at that point? Or what if you've got a time-delayed
> > standby and you want to break replication so that it doesn't replay
> > the DROP TABLE students that somebody ran on the master? It doesn't
> > have to be that WAL is unreadable or corrupt; it's enough for it to
> > contain changes you wish to avoid replaying.
> >
> >> If you really don't care, just shutdown server, resetxlog and start
> >> her up - again, no need for new option.

I think that should pretty much never be something an admin has to
run. It's just about impossible to get this right. In all likelihood
just running pg_resetxlog on a database in recovery will have corrupted
your database.
Which is why pg_resetxlog won't even let you proceed without using -f
because it checks for DB_SHUTDOWNED. Rightly so.

pg_resetxlog *removes* *all* existing WAL and sets the current control
file state to DB_SHUTDOWNED. Thus there will be no recovery when
starting afterwards.

> > To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it"
> > seems like a friendlier interface than making somebody shut down the
> > server, run pg_resetxlog, and start it up again.
>
> It makes sense to go from paused --> promoted.
>
> It doesn't make sense to go from normal running --> promoted, since
> that is just random data loss.

Why? I don't see what's random in promoting a node in the current state
*iff* it's currently consistent.

Just imagine something like promoting a current standby to a full node
because you want to run some tests on it that require writes. There's
absolutely no need to investigate the current state for that.

> I very much understand the case where
> somebody is shouting "get the web site up, we are losing business".
> Implementing a feature that allows people to do exactly what they
> asked (go live now), but loses business transactions that we thought
> had been safely recorded is not good. It implements only the exact
> request, not its actual intention.

That seems to be a problem of massively understanding on the part of the
user. And I don't see how this is going to be safer by requiring the
user to first issue a pause reuest.

I think we should attempt to solve this by naming the command
appropriately. Something like 'abort_replay_and_promote'. Long,
nontrivial to type, and descriptive.

> Any feature that lumps both cases together is wrongly designed and
> will cause data loss.
>
> We go to a lot of trouble to ensure data is successfully on disk and
> in WAL. I won't give that up, nor do I want to make it easier to lose
> data than it already is.

I think that's not really related. Such a promotion doesn't cause data
loss in the sense of loosing data a *clueful* operator wanted to
keep. Yes, it can be used wrongly, but it's far from alone in that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-25 15:34:48
Message-ID: 20140925153448.GI9633@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-09-25 11:13:50 -0400, Robert Haas wrote:
> On Wed, Sep 24, 2014 at 4:36 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >> To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it"
> >> seems like a friendlier interface than making somebody shut down the
> >> server, run pg_resetxlog, and start it up again.
> >
> > It makes sense to go from paused --> promoted.
>
> Well, we could certainly require that people call
> pg_xlog_replay_pause() before they call pg_promote_now(). I don't
> think I like that better, but it beats not having pg_promote_now().

FWIW I think it's a mistake to only allow this on a hot standby. This
should be doable using pg_ctl alone. a) works with wal_level=archive, b)
sometimes hot_standby=off is a good bit more efficient c) sometimes you
don't want to allow connections.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-25 16:09:15
Message-ID: CA+TgmoaXrHe6pS_4aJAVv5E0MH9LYOKRv5GhJQ8h3oh-qgAwgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 25, 2014 at 11:34 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-09-25 11:13:50 -0400, Robert Haas wrote:
>> On Wed, Sep 24, 2014 at 4:36 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> >> To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it"
>> >> seems like a friendlier interface than making somebody shut down the
>> >> server, run pg_resetxlog, and start it up again.
>> >
>> > It makes sense to go from paused --> promoted.
>>
>> Well, we could certainly require that people call
>> pg_xlog_replay_pause() before they call pg_promote_now(). I don't
>> think I like that better, but it beats not having pg_promote_now().
>
> FWIW I think it's a mistake to only allow this on a hot standby. This
> should be doable using pg_ctl alone. a) works with wal_level=archive, b)
> sometimes hot_standby=off is a good bit more efficient c) sometimes you
> don't want to allow connections.

Good point. Also, a pg_ctl command is more friendly to cluster-ware,
of which there is a lot these days.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-25 17:18:09
Message-ID: CA+U5nM+XRdGpH1Mrk-S+9k5j7ExCVq33vcAb4r3EVkd=x0auRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 25 September 2014 16:29, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

>> > To me, being able to say "pg_ctl promote_right_now -m yes_i_mean_it"
>> > seems like a friendlier interface than making somebody shut down the
>> > server, run pg_resetxlog, and start it up again.
>>
>> It makes sense to go from paused --> promoted.
>>
>> It doesn't make sense to go from normal running --> promoted, since
>> that is just random data loss.
>
> Why? I don't see what's random in promoting a node in the current state
> *iff* it's currently consistent.
>
> Just imagine something like promoting a current standby to a full node
> because you want to run some tests on it that require writes. There's
> absolutely no need to investigate the current state for that.
>
>> I very much understand the case where
>> somebody is shouting "get the web site up, we are losing business".
>> Implementing a feature that allows people to do exactly what they
>> asked (go live now), but loses business transactions that we thought
>> had been safely recorded is not good. It implements only the exact
>> request, not its actual intention.
>
> That seems to be a problem of massively understanding on the part of the
> user. And I don't see how this is going to be safer by requiring the
> user to first issue a pause reuest.
>
> I think we should attempt to solve this by naming the command
> appropriately. Something like 'abort_replay_and_promote'. Long,
> nontrivial to type, and descriptive.
>
>> Any feature that lumps both cases together is wrongly designed and
>> will cause data loss.
>>
>> We go to a lot of trouble to ensure data is successfully on disk and
>> in WAL. I won't give that up, nor do I want to make it easier to lose
>> data than it already is.
>
> I think that's not really related. Such a promotion doesn't cause data
> loss in the sense of loosing data a *clueful* operator wanted to
> keep. Yes, it can be used wrongly, but it's far from alone in that.

Yes it does cause data loss. The clueful operator has no idea where
they are so there is no "used rightly" in that case.

If I were to give this feature a name it would be --discard or
--random-data-loss, or --reset-hard

The point of pausing is misunderstood. That is close but not quite relevant.

If you are at a known location and request promotion, we can presume
you know what you are doing, so it is simply Promote.

If you are at an unknown location and therefore have clearly not
verified any state before promotion, you are clearly making an
uninformed decision that will likely result in data loss, for which
there is no way of knowing the impact and no mechanism for recovering
from. Trying to promote something while it is still recovering proves
we don't know the state, we're just picking a random LSN.

So if you have a time delayed standby and the master breaks, or there
is a bad transaction then the correct action would be to
* pause the delayed standby
* discover where the master broke, or the xid of the bad transaction
* restart recovery to go up to the correct time/xid/lsn
* promote standby

That is already possible in 9.4

The original patch for pausing contained code to reset the PITR target
with functions, which would make the above even easier.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-25 17:30:03
Message-ID: 20140925173003.GA1169@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-09-25 18:18:09 +0100, Simon Riggs wrote:
> On 25 September 2014 16:29, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I think that's not really related. Such a promotion doesn't cause data
> > loss in the sense of loosing data a *clueful* operator wanted to
> > keep. Yes, it can be used wrongly, but it's far from alone in that.
>
> Yes it does cause data loss. The clueful operator has no idea where
> they are so there is no "used rightly" in that case.

What? There definitely are cases where you don't need to know that to
the T. Just think of the - quite frequently happening - need to promote
a standby to run tests or reporting queries that can't be run on a
standby.

Sure, you shouldn't use it if you expect a very specific set of the data
being there, but that's not always necessary. And that's why it should
never, ever be the default.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-25 18:00:30
Message-ID: CA+TgmoZYC7RCHLUzuSzmz2BYnnfzwxbasbMzLoGt0VjvdMPFJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 25, 2014 at 1:30 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> Yes it does cause data loss. The clueful operator has no idea where
>> they are so there is no "used rightly" in that case.
>
> What? There definitely are cases where you don't need to know that to
> the T. Just think of the - quite frequently happening - need to promote
> a standby to run tests or reporting queries that can't be run on a
> standby.
>
> Sure, you shouldn't use it if you expect a very specific set of the data
> being there, but that's not always necessary.

Very well put. +1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate standby promotion
Date: 2014-09-25 21:04:18
Message-ID: CA+U5nM+5__0n-52GM1TSmjfk_3uBpTMBYgOYftEfKTE1RWx_hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 25 September 2014 18:30, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-09-25 18:18:09 +0100, Simon Riggs wrote:
>> On 25 September 2014 16:29, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > I think that's not really related. Such a promotion doesn't cause data
>> > loss in the sense of loosing data a *clueful* operator wanted to
>> > keep. Yes, it can be used wrongly, but it's far from alone in that.
>>
>> Yes it does cause data loss. The clueful operator has no idea where
>> they are so there is no "used rightly" in that case.
>
> What? There definitely are cases where you don't need to know that to
> the T. Just think of the - quite frequently happening - need to promote
> a standby to run tests or reporting queries that can't be run on a
> standby.

What do they do with the standby afterwards?

Perhaps for testing, but I'd hope that Business Intelligence is done
by freezing databases at known target times. So at least you can say,
"using a database snapshot of 9am, we had the following results".

We seem to be trying to justify something that is dangerous and will
destroy data for incautious users. Of course it has uses, but thats
not the point, its the danger that is the problem, not the lack of
use. We go to a lot of trouble to avoid footguns elsewhere across many
years, so I can't see why you'd want to have the --footgun option
added here. recovery-target = 'vague'

> Sure, you shouldn't use it if you expect a very specific set of the data
> being there, but that's not always necessary. And that's why it should
> never, ever be the default.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services