Re: gitlab post-mortem: pg_basebackup waiting for checkpoint

Lists: pgsql-hackers
From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-11 09:38:09
Message-ID: 1486805889.24568.96.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

one take-away from the Gitlab Post-Mortem[1] appears to be that after
their secondary lost replication, they were confused about what
pg_basebackup was doing when they tried to rebuild it. It just sat there
and did nothing (even with --verbose), so they assumed something was
wrong with either the primary or the connection, and restarted it
several times.

AFAICT, it turns out the checkpoint was written on the master (they
probably did not use -c fast), but this wasn't obvious to them:

"One of the engineers went to the secondary and wiped the data
directory, then ran pg_basebackup. Unfortunately pg_basebackup would
hang, producing no meaningful output, despite the --verbose option being
set."

[...]

"Unfortunately this did not resolve the problem of pg_basebackup not
starting replication immediately. One of the engineers decided to run it
with strace to see what it was blocking on. strace showed that
pg_basebackup was hanging in a poll call, but that did not provide any
other meaningful information that might have explained why."

[...]

"It would later be revealed by another engineer (who wasn't around at
the time) that this is normal behavior: pg_basebackup will wait for the
primary to start sending over replication data and it will sit and wait
silently until that time. Unfortunately this was not clearly documented
in our engineering runbooks nor in the official pg_basebackup document."

ISTM that even with WAL streaming, nothing would be written on the
client server until the checkpoint is complete, as do_pg_start_backup()
runs the checkpoint and only returns the starting WAL location
afterwards.

The attached (untested) patch is to kick of a discussion on how to
improve the situation, it is supposed to mention the checkpoint when
--verbose is used and adds a paragraph about the checkpoint being run to
the Notes section of the documentation.

Michael

[1]https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachment Content-Type Size
pg_basebackup.patch text/x-patch 1.5 KB

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-11 10:07:59
Message-ID: CABUevExpVYuLUgoNgYNNHxFmZqo3PuuaKgcVwYE5B5wCGScZkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 11, 2017 at 10:38 AM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
wrote:

> Hi,
>
> one take-away from the Gitlab Post-Mortem[1] appears to be that after
> their secondary lost replication, they were confused about what
> pg_basebackup was doing when they tried to rebuild it. It just sat there
> and did nothing (even with --verbose), so they assumed something was
> wrong with either the primary or the connection, and restarted it
> several times.
>
> AFAICT, it turns out the checkpoint was written on the master (they
> probably did not use -c fast), but this wasn't obvious to them:
>

Yeah, I've seen this happen to a number of people. I think that sounds like
what's happened here as well. I've considered things in the line of the
patch you posted, but never got around to actually doing anything about it.

> ISTM that even with WAL streaming, nothing would be written on the
> client server until the checkpoint is complete, as do_pg_start_backup()
> runs the checkpoint and only returns the starting WAL location
> afterwards.
>
> The attached (untested) patch is to kick of a discussion on how to
> improve the situation, it is supposed to mention the checkpoint when
> --verbose is used and adds a paragraph about the checkpoint being run to
> the Notes section of the documentation.
>
>
Docs look good to me, other than claiming that pg_basebackup runs on a
server (it can run anywhere). I would just say "during which pg_basebackup
will appear idle". How does that sound to you?

As for the code, while I haven't tested it, isn't the "checkpoint
completed" message in the wrong place? Doesn't PQsendQuery() complete
immediately, and the check needs to be put *after* the PQgetResult() call?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-11 10:25:20
Message-ID: 1486808720.24568.103.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Am Samstag, den 11.02.2017, 11:07 +0100 schrieb Magnus Hagander:

> As for the code, while I haven't tested it, isn't the "checkpoint
> completed" message in the wrong place? Doesn't PQsendQuery() complete
> immediately, and the check needs to be put *after* the PQgetResult()
> call?

I guess you're right, I've moved it further down. There is in fact a
message about the xlog location (unless you switch off wal entirely),
but having another one right before that mentioning the completed
checkpoint sounds ok to me.

There's also some inconsistencies around which messages are prepended
with "pg_basebackup: " and which are translatable; I guess all messages
printed on --verbose should be translatable? Also, as almost all
messages have a "pg_basebackup: " prefix, I've added it to the rest.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachment Content-Type Size
pg_basebackup_v2.patch text/x-patch 2.6 KB

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-11 10:36:24
Message-ID: 1486809384.14426.1.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Am Samstag, den 11.02.2017, 11:25 +0100 schrieb Michael Banck:
> Am Samstag, den 11.02.2017, 11:07 +0100 schrieb Magnus Hagander:
> > As for the code, while I haven't tested it, isn't the "checkpoint
> > completed" message in the wrong place? Doesn't PQsendQuery() complete
> > immediately, and the check needs to be put *after* the PQgetResult()
> > call?
>
> I guess you're right, I've moved it further down. There is in fact a
> message about the xlog location (unless you switch off wal entirely),
> but having another one right before that mentioning the completed
> checkpoint sounds ok to me.
>
> There's also some inconsistencies around which messages are prepended
> with "pg_basebackup: " and which are translatable; I guess all messages
> printed on --verbose should be translatable? Also, as almost all
> messages have a "pg_basebackup: " prefix, I've added it to the rest.

Sorry, there were two typoes in the last patch, I've attached a fixed
one.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachment Content-Type Size
pg_basebackup_v3.patch text/x-patch 2.6 KB

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-13 02:29:25
Message-ID: f444d1a6-fb97-3283-cfb5-b71b77b791b8@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2/11/17 4:36 AM, Michael Banck wrote:
> I guess you're right, I've moved it further down. There is in fact a
> message about the xlog location (unless you switch off wal entirely),
> but having another one right before that mentioning the completed
> checkpoint sounds ok to me.

1) I don't think this should be verbose output. Having a program sit
there "doing nothing" for no apparent reason is just horrible UI design.

2) I think it'd be useful to have a way to get the status of a running
checkpoint. The checkpointer already has that info, and I think it might
even be in shared memory already. If there was a function that reported
checkpoint status pg_basebackup could poll that to provide users with
live status. That should be a separate patch though.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-13 08:31:03
Message-ID: CABUevEyAHBmsnkUOLDGLfAqL7Utn2v6v-cCaRmnATjwq3LcvPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:

> On 2/11/17 4:36 AM, Michael Banck wrote:
>
>> I guess you're right, I've moved it further down. There is in fact a
>> message about the xlog location (unless you switch off wal entirely),
>> but having another one right before that mentioning the completed
>> checkpoint sounds ok to me.
>>
>
> 1) I don't think this should be verbose output. Having a program sit there
> "doing nothing" for no apparent reason is just horrible UI design.
>

That would include much of Unix then.. For example if I run "cp" on a large
file it sits around "doing nothing". Same if I do "tar". No?

> 2) I think it'd be useful to have a way to get the status of a running
> checkpoint. The checkpointer already has that info, and I think it might
> even be in shared memory already. If there was a function that reported
> checkpoint status pg_basebackup could poll that to provide users with live
> status. That should be a separate patch though.

I agree that this would definitely be useful. But it might be something
that's better exposed as a server-side view?

(and if pg_basebackup could poll it it would probably still not be included
by default -- only if -P was given).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-13 09:33:47
Message-ID: 1486978427.21010.6.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Am Montag, den 13.02.2017, 09:31 +0100 schrieb Magnus Hagander:
> On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
> wrote:
> On 2/11/17 4:36 AM, Michael Banck wrote:
> I guess you're right, I've moved it further down.
> There is in fact a
> message about the xlog location (unless you switch off
> wal entirely),
> but having another one right before that mentioning
> the completed
> checkpoint sounds ok to me.
>
> 1) I don't think this should be verbose output. Having a
> program sit there "doing nothing" for no apparent reason is
> just horrible UI design.
>
>
> That would include much of Unix then.. For example if I run "cp" on a
> large file it sits around "doing nothing". Same if I do "tar". No?

The expectation for all three commands is that, even if there is no
output on stdout, they will write data to the local machine. So you can
easily monitor the progress of cp and tar by running du or something in
a different terminal.

With pg_basebackup, nothing is happening on the local machine until the
checkpoint on the remote is finished; while this is obvious to somebody
familiar with how basebackups work internally, it appears to be not
clear at all to some users.

So I think notifying the user that something is happening remotely while
the local process waits would be useful, but on the other hand,
pg_basebackup does not print anything unless (i) --verbose is set or
(ii) there is an error, so I think having it mention the checkpoint in
--verbose mode only is acceptable.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-14 17:06:48
Message-ID: CABUevEzQh1Mo1rrb5pknL6=bAn_agK-1YxHHyJNPZ9FOS=iHcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 13, 2017 at 10:33 AM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
wrote:

> Hi,
>
> Am Montag, den 13.02.2017, 09:31 +0100 schrieb Magnus Hagander:
> > On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
> > wrote:
> > On 2/11/17 4:36 AM, Michael Banck wrote:
> > I guess you're right, I've moved it further down.
> > There is in fact a
> > message about the xlog location (unless you switch off
> > wal entirely),
> > but having another one right before that mentioning
> > the completed
> > checkpoint sounds ok to me.
> >
> > 1) I don't think this should be verbose output. Having a
> > program sit there "doing nothing" for no apparent reason is
> > just horrible UI design.
> >
> >
> > That would include much of Unix then.. For example if I run "cp" on a
> > large file it sits around "doing nothing". Same if I do "tar". No?
>
> The expectation for all three commands is that, even if there is no
> output on stdout, they will write data to the local machine. So you can
> easily monitor the progress of cp and tar by running du or something in
> a different terminal.
>
> With pg_basebackup, nothing is happening on the local machine until the
> checkpoint on the remote is finished; while this is obvious to somebody
> familiar with how basebackups work internally, it appears to be not
> clear at all to some users.
>

True.

However, outputing this info by default will make it show up in things like
everybodys cronjobs by default. Right now a successful pg_basebackup run
will come out with no output at all, which is how most Unix commands work,
and brings it's own advantages. If we change that people will have to send
all the output to /dev/null, resulting in missing the things that are
actually important in any regard.

>
> So I think notifying the user that something is happening remotely while
> the local process waits would be useful, but on the other hand,
> pg_basebackup does not print anything unless (i) --verbose is set or
> (ii) there is an error, so I think having it mention the checkpoint in
> --verbose mode only is acceptable.
>

Yeah, that's my view as well. I'm all for including it in verbose mode.

*Iff* we can get a progress indicator through the checkpoint we could
include that in --progress mode. But that's a different patch, of course,
but it shouldn't be included in the default output even if we find it.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-14 18:41:33
Message-ID: CA+TgmoZ2Ru__LmeNQJm3AjGPHBHjfhU0D+X5pTw7UdZtvK2qhQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 14, 2017 at 12:06 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> However, outputing this info by default will make it show up in things like
> everybodys cronjobs by default. Right now a successful pg_basebackup run
> will come out with no output at all, which is how most Unix commands work,
> and brings it's own advantages. If we change that people will have to send
> all the output to /dev/null, resulting in missing the things that are
> actually important in any regard.

I agree with that. I think having this show up in verbose mode is a
really good idea - when something just hangs, users don't know what's
going on, and that's bad. But showing it all the time seems like a
bridge too far. As the postmortem linked above shows, people will
think of things like "hey, let's try --verbose mode" when the obvious
thing doesn't work. What is really irritating to them is when
--verbose mode fails to be, uh, verbose.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-14 21:06:51
Message-ID: 20170214210651.ldwzk6frxxclbomw@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> On Tue, Feb 14, 2017 at 12:06 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > However, outputing this info by default will make it show up in things like
> > everybodys cronjobs by default. Right now a successful pg_basebackup run
> > will come out with no output at all, which is how most Unix commands work,
> > and brings it's own advantages. If we change that people will have to send
> > all the output to /dev/null, resulting in missing the things that are
> > actually important in any regard.
>
> I agree with that. I think having this show up in verbose mode is a
> really good idea - when something just hangs, users don't know what's
> going on, and that's bad. But showing it all the time seems like a
> bridge too far. As the postmortem linked above shows, people will
> think of things like "hey, let's try --verbose mode" when the obvious
> thing doesn't work. What is really irritating to them is when
> --verbose mode fails to be, uh, verbose.

I'd rather have a --quiet mode instead. If you're running it by hand,
you're likely to omit the switch, whereas when writing the cron job
you're going to notice lack of switch even before you let the job run
once.

I think progress reporting ought to go to stderr anyway.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-14 23:18:24
Message-ID: CA+TgmoYeTytkZdSQqX8Ck+ZHSQkW5dyY8-0=rCdofxUbjdya_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> I'd rather have a --quiet mode instead. If you're running it by hand,
> you're likely to omit the switch, whereas when writing the cron job
> you're going to notice lack of switch even before you let the job run
> once.

Well, that might've been a better way to design it, but changing it
now would break backward compatibility and I'm not really sure that's
a good idea. Even if it is, it's a separate concern from whether or
not in the less-quiet mode we should point out that we're waiting for
a checkpoint on the server side.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-15 23:05:28
Message-ID: CAMkU=1wV11t2X6XQgK3G6B5zLGNZ95PsMzVsjsnXhmyP-3hzmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 14, 2017 at 9:06 AM, Magnus Hagander <magnus(at)hagander(dot)net>
wrote:

Yeah, that's my view as well. I'm all for including it in verbose mode.
>
> *Iff* we can get a progress indicator through the checkpoint we could
> include that in --progress mode. But that's a different patch, of course,
> but it shouldn't be included in the default output even if we find it.
>
>
I think it should show up in --progress mode. It would be great if we
could show fine-grained progress reports on the checkpoint, but if we can't
do that we should still report as fine as we are able to, which is that a
checkpoint is in progress. Otherwise we are setting the perfect as the
enemy of the good.

Cheers,

Jeff


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-17 19:17:02
Message-ID: c48f49ef-a3e1-5315-447b-84f8fa7ccab0@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2/14/17 5:18 PM, Robert Haas wrote:
> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
>> I'd rather have a --quiet mode instead. If you're running it by hand,
>> you're likely to omit the switch, whereas when writing the cron job
>> you're going to notice lack of switch even before you let the job run
>> once.
>
> Well, that might've been a better way to design it, but changing it
> now would break backward compatibility and I'm not really sure that's

Meh... it's really only going to affect cronjobs or scripts, which are
easy enough to fix, and you're not going to have that many of them (or
if you do you certainly have an automated way to push the update).

> a good idea. Even if it is, it's a separate concern from whether or
> not in the less-quiet mode we should point out that we're waiting for
> a checkpoint on the server side.

Well, --quite was suggested because of confusion from pg_basebackup
twiddling it's thumbs...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)


From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-17 23:22:20
Message-ID: c869c5ba-0aa2-06df-4e1d-60169cf7e230@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/17/2017 08:17 PM, Jim Nasby wrote:
> On 2/14/17 5:18 PM, Robert Haas wrote:
>> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
>> <alvherre(at)2ndquadrant(dot)com> wrote:
>>> I'd rather have a --quiet mode instead. If you're running it by hand,
>>> you're likely to omit the switch, whereas when writing the cron job
>>> you're going to notice lack of switch even before you let the job run
>>> once.
>>
>> Well, that might've been a better way to design it, but changing it
>> now would break backward compatibility and I'm not really sure that's
>
> Meh... it's really only going to affect cronjobs or scripts, which are
> easy enough to fix, and you're not going to have that many of them (or
> if you do you certainly have an automated way to push the update).
>

I think you're underestimating the breakage and overestimating how easy
it's going to be to it. It's true we'd only change this in a major
version, so people should assume possible breakage and test.

>> a good idea. Even if it is, it's a separate concern from whether or
>> not in the less-quiet mode we should point out that we're waiting for
>> a checkpoint on the server side.
>
> Well, --quite was suggested because of confusion from pg_basebackup
> twiddling it's thumbs...

I'm in favor of the '--verbose' route. People are used to that when
investigating issues, and it does not break existing cron jobs. I can
live with --quiet though, as long as we don't resort to some craziness
along the lines "if there's tty be verbose, otherwise be quiet".

I have my doubts about this actually addressing gitlab-like mistakes,
though, because it's a helluva jump from "It's waiting and not doing
anything," to "We need to remove the datadir." (One of the reasons being
that non-empty directory is a local issue, and there's no reason why the
tool should wait instead of just reporting an error.)

FWIW before messing with the pg_basebackup code, perhaps we should
improve the documentation and explain clearly the meaning of 'fast' and
'spread' checkpoint modes. Right now, pg_basebackup docs only say this:

Sets checkpoint mode to fast or spread (default) (see Section 24.3.3).

which is pretty damn useless, when you're investigating an issue. And
the referenced section (Making a Base Backup Using the Low Level API)
does not clearly explain how this maps to pg_start_backup(_,?).

What about adding a paragraph into pg_basebackup docs, explaining that
with 'fast' it does immediate checkpoint, while with 'spread' it'll wait
for a spread checkpoint.

regards

-- Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-18 00:43:27
Message-ID: CAKFQuwZQU6z6ei99oAjMQJJx5t1qphJkF-PjiEMp=hAUJV053Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 17, 2017 at 4:22 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> What about adding a paragraph into pg_basebackup docs, explaining that
> with 'fast' it does immediate checkpoint, while with 'spread' it'll wait
> for a spread checkpoint.
>

I agree that a better, and self-contained, explanation of the behaviors
that fast and spread invoke on the server should be included directly in
the pg_basebackup docs.

Additionally, a primary benefit of pg_basebackup is hiding the low-level
details from the user and in that spirit the cross-reference link to
Section 25.3.3 "Making a Base Backup Using the Low Level API" should be
removed. If there is specific information there that a user of
pg_basebackup needs it should be presented properly in the application
documentation.

The top of pg_basebackup points to the entire 25.3 chapter but the flow
from there is solid - coverage of pg_basebackup occurs and points out the
low level API for those whose needs are not fully served by the bundled
application. If one uses pg_basebackup they should be able to stop at that
point, go back to the app page, and continue reading and skip all of 25.3.3

The term "spread checkpoint" isn't actually a defined term in our
docs...and aside from the word spread itself describing out a checkpoint
works, it isn't used outside of pg_basebackup docs. So "it will wait for a
spread checkpoint" doesn't really work - "it will start and then wait for a
normal checkpoint to complete" does.

More holistically (i.e., feel free to skip)

This paragraph from 25.3.3:

"""
This is because it performs a checkpoint, and the I/O required for the
checkpoint will be spread out over a significant period of time, by default
half your inter-checkpoint interval (see the configuration parameter
checkpoint_completion_target). This is usually what you want, because it
minimizes the impact on query processing. If you want to start the backup
as soon as possible, change the second parameter to true.
"""

is good but buried and seems like it would be more visible in Chapter 30.
Reliability and the Write-Ahead Log. To there both the internals and
backbackup pages could point the reader. There isn't a chapter dedicated
to checkpoints - nor does there need to be - but a section in 30 seems
warranted as being the official reference. Right now you have to skim the
configuration variables and "WAL Configuration" and "CHECKPOINT" and "base
backup API and pg_basebackup" to cover everything. A checkpoint chapter
with that paragraph as a focus would allow the other items to simply say
"immediate or normal checkpoint" as needed and redirect the reader for
additional context as to the trade-offs of each - whether done manually or
during some form of backup script.

David J.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-19 10:21:06
Message-ID: CA+Tgmoa6tRN-dDAEygmmr9K5HiiZ6xt=g0e-3E7o=TGAuAwm+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 18, 2017 at 4:52 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> I have my doubts about this actually addressing gitlab-like mistakes,
> though, because it's a helluva jump from "It's waiting and not doing
> anything," to "We need to remove the datadir." (One of the reasons being
> that non-empty directory is a local issue, and there's no reason why the
> tool should wait instead of just reporting an error.)

It's pretty clear that the gitlab postmortem involves multiple people
making multiple serious errors, including failing to test that the
ostensible backups could actually be restored. I was taught that rule
#1 as far as backups are concerned is to test that you can restore
them, so that seems like a big miss. However, I don't think the fact
they made other mistakes is a reason not to improve the things we can
improve and, certainly, having some way for pg_basebackup to tell you
that it's waiting for the master to checkpoint will help the next
person who is confused by that particular thing. That person may go
on to be confused by something else, but then again maybe not.
Improving the reporting in this case stands on its own merits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-26 19:27:03
Message-ID: 1488137223.24065.7.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Am Dienstag, den 14.02.2017, 18:18 -0500 schrieb Robert Haas:
> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
> > I'd rather have a --quiet mode instead. If you're running it by hand,
> > you're likely to omit the switch, whereas when writing the cron job
> > you're going to notice lack of switch even before you let the job run
> > once.
>
> Well, that might've been a better way to design it, but changing it
> now would break backward compatibility and I'm not really sure that's
> a good idea. Even if it is, it's a separate concern from whether or
> not in the less-quiet mode we should point out that we're waiting for
> a checkpoint on the server side.

ISTM the consensus is that there should be no output in regular mode,
but a message should be displayed in verbose and progress mode.

So I went forth and also added a message in progress mode (unless
verbose messages are requested anyway).

Regarding the documentation, I tried to clarify the difference between
the checkpoint types a bit more, but I think any further action is
probably a larger rewrite of the documentation on this topic.

So attached are two patches, I've split it up in the documentation and
the code output part. I'll add it as one commitfest entry in the
"Clients" section though, as it's not really a big patch, unless
somebody thinks it should have a secon entry in "Documentation"?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachment Content-Type Size
0001-Documentation-updates-regarding-checkpoints-for-base.patch text/x-patch 2.5 KB
0002-Mention-initial-checkpoint-in-pg_basebackup-for-verb.patch text/x-patch 2.7 KB

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-26 20:32:30
Message-ID: CABUevEzxrh0tFeUof=S-AL2t5aNVAqbL2jpaSwpMGSXazgV1sQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
wrote:

> Hi,
>
> Am Dienstag, den 14.02.2017, 18:18 -0500 schrieb Robert Haas:
> > On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
> > <alvherre(at)2ndquadrant(dot)com> wrote:
> > > I'd rather have a --quiet mode instead. If you're running it by hand,
> > > you're likely to omit the switch, whereas when writing the cron job
> > > you're going to notice lack of switch even before you let the job run
> > > once.
> >
> > Well, that might've been a better way to design it, but changing it
> > now would break backward compatibility and I'm not really sure that's
> > a good idea. Even if it is, it's a separate concern from whether or
> > not in the less-quiet mode we should point out that we're waiting for
> > a checkpoint on the server side.
>
> ISTM the consensus is that there should be no output in regular mode,
> but a message should be displayed in verbose and progress mode.
>
> So I went forth and also added a message in progress mode (unless
> verbose messages are requested anyway).
>
> Regarding the documentation, I tried to clarify the difference between
> the checkpoint types a bit more, but I think any further action is
> probably a larger rewrite of the documentation on this topic.
>
> So attached are two patches, I've split it up in the documentation and
> the code output part. I'll add it as one commitfest entry in the
> "Clients" section though, as it's not really a big patch, unless
> somebody thinks it should have a secon entry in "Documentation"?

Agreed, and applied as one patch. Except I noticed you also fixed a couple
of entries which were missing the progname in the messages -- I broke those
out to a separate patch instead.

Made a small change to "using as much I/O as available" rather than "as
possible", which I think is a better wording, along with the change of the
idle wording I suggested before. (but feel free to point it out to me if
that's wrong).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-26 20:53:46
Message-ID: 1488142426.24065.10.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Am Sonntag, den 26.02.2017, 21:32 +0100 schrieb Magnus Hagander:

> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck
> <michael(dot)banck(at)credativ(dot)de> wrote:

> Agreed, and applied as one patch. Except I noticed you also fixed a
> couple of entries which were missing the progname in the messages -- I
> broke those out to a separate patch instead.

Thanks!

> Made a small change to "using as much I/O as available" rather than
> "as possible", which I think is a better wording, along with the
> change of the idle wording I suggested before. (but feel free to point
> it out to me if that's wrong).

LGTM, I apparently missed your suggestion when I re-read the thread.

I am just wondering whether this could/should be back-patched, maybe? It
is not a bug fix, of course, but OTOH is rather small and probably
helpful to some users on current releases.

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-26 20:55:18
Message-ID: CABUevEzLNQt--kA5uD2QFEi2M7R23QDq2mNOiy+uhE-GH-E6=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 26, 2017 at 9:53 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
wrote:

> Hi,
>
> Am Sonntag, den 26.02.2017, 21:32 +0100 schrieb Magnus Hagander:
>
> > On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck
> > <michael(dot)banck(at)credativ(dot)de> wrote:
>
> > Agreed, and applied as one patch. Except I noticed you also fixed a
> > couple of entries which were missing the progname in the messages -- I
> > broke those out to a separate patch instead.
>
> Thanks!
>
> > Made a small change to "using as much I/O as available" rather than
> > "as possible", which I think is a better wording, along with the
> > change of the idle wording I suggested before. (but feel free to point
> > it out to me if that's wrong).
>
> LGTM, I apparently missed your suggestion when I re-read the thread.
>
> I am just wondering whether this could/should be back-patched, maybe? It
> is not a bug fix, of course, but OTOH is rather small and probably
> helpful to some users on current releases.
>
>
Good point. We should definitely back-patch the documentation updates.

Not 100% sure about the others, as it's a small behaviour change. But since
it's only in verbose mode, I doubt it is very likely to break anybodys
scripts relying on certain output or so.

What do others think?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-26 20:59:24
Message-ID: 21444.1488142764@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
> wrote:
>> ISTM the consensus is that there should be no output in regular mode,
>> but a message should be displayed in verbose and progress mode.

> Agreed, and applied as one patch.

Is there an argument for back-patching this?

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-27 15:20:47
Message-ID: CABUevEwVsphx0vnKPOBjRHxhMZ0yrbmZjurLOuh+kWa91sP96Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <
> michael(dot)banck(at)credativ(dot)de>
> > wrote:
> >> ISTM the consensus is that there should be no output in regular mode,
> >> but a message should be displayed in verbose and progress mode.
>
> > Agreed, and applied as one patch.
>
> Is there an argument for back-patching this?
>

Seems you were typing that at the same time as we did.

I'm considering it, but not swayed in either direction. Should I take your
comment as a vote that we should back-patch it?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-27 15:22:43
Message-ID: 23988.1488208963@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Is there an argument for back-patching this?

> I'm considering it, but not swayed in either direction. Should I take your
> comment as a vote that we should back-patch it?

Yeah, I'd vote for it.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-27 17:51:18
Message-ID: CANP8+j+TDYEX1jQt11kkSomCrQ=UvR_q4=bMZtGsnNO4oeku=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26 February 2017 at 20:55, Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> What do others think?

Changing the output behaviour of a command isn't something we usually
do as a backpatch.

This change doesn't affect the default behaviour so probably wouldn't
make a difference to the outcome of the situation that generated this
thread.

Having said that, if it helps others to avoid mistakes in the future
then its worth doing, so +1 to backpatch.

I've looked into changing the actual underlying behaviour and I don't
think its feasible, so making this change will at least allow some
responsiveness from us. Thanks Michael, Magnus.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-02-27 18:46:18
Message-ID: CAMkU=1zVisFuD-O4WSe8fUcA1gE3b0DewzgKYT7kTk7TZehYKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 26, 2017 at 12:32 PM, Magnus Hagander <magnus(at)hagander(dot)net>
wrote:

>
> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
> wrote:
>
>> Hi,
>>
>> Am Dienstag, den 14.02.2017, 18:18 -0500 schrieb Robert Haas:
>> > On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
>> > <alvherre(at)2ndquadrant(dot)com> wrote:
>> > > I'd rather have a --quiet mode instead. If you're running it by hand,
>> > > you're likely to omit the switch, whereas when writing the cron job
>> > > you're going to notice lack of switch even before you let the job run
>> > > once.
>> >
>> > Well, that might've been a better way to design it, but changing it
>> > now would break backward compatibility and I'm not really sure that's
>> > a good idea. Even if it is, it's a separate concern from whether or
>> > not in the less-quiet mode we should point out that we're waiting for
>> > a checkpoint on the server side.
>>
>> ISTM the consensus is that there should be no output in regular mode,
>> but a message should be displayed in verbose and progress mode.
>>
>> So I went forth and also added a message in progress mode (unless
>> verbose messages are requested anyway).
>>
>> Regarding the documentation, I tried to clarify the difference between
>> the checkpoint types a bit more, but I think any further action is
>> probably a larger rewrite of the documentation on this topic.
>>
>> So attached are two patches, I've split it up in the documentation and
>> the code output part. I'll add it as one commitfest entry in the
>> "Clients" section though, as it's not really a big patch, unless
>> somebody thinks it should have a secon entry in "Documentation"?
>
>
> Agreed, and applied as one patch. Except I noticed you also fixed a couple
> of entries which were missing the progname in the messages -- I broke those
> out to a separate patch instead.
>
> Made a small change to "using as much I/O as available" rather than "as
> possible", which I think is a better wording, along with the change of the
> idle wording I suggested before. (but feel free to point it out to me if
> that's wrong).
>

Should the below fprintf end in a \r rather than a \n, so that the the
progress message gets over-written once the checkpoint is done and we have
moved on?

if (showprogress && !verbose)
fprintf(stderr, "waiting for checkpoint\n");

That would seem more in keeping with how the other progress messages
operate.

Cheers,

Jeff


From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-03-29 11:05:54
Message-ID: 1490785554.18436.18.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Am Montag, den 27.02.2017, 16:20 +0100 schrieb Magnus Hagander:
> On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Is there an argument for back-patching this?
>
>
> Seems you were typing that at the same time as we did.
>
>
> I'm considering it, but not swayed in either direction. Should I take
> your comment as a vote that we should back-patch it?

I've checked back into this thread, and there seems to be a +1 from Tom
and a +(0.5-1) from Simon for backpatching, and no obvious -1s. Did you
decide against it in the end, or is this still an open item?

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-03-31 06:59:18
Message-ID: CABUevEwHSy=oaimevRryg=sFC-0CSkyksFFt85BCM-uCAgfDsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 29, 2017 at 1:05 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
wrote:

> Hi,
>
> Am Montag, den 27.02.2017, 16:20 +0100 schrieb Magnus Hagander:
> > On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Is there an argument for back-patching this?
> >
> >
> > Seems you were typing that at the same time as we did.
> >
> >
> > I'm considering it, but not swayed in either direction. Should I take
> > your comment as a vote that we should back-patch it?
>
> I've checked back into this thread, and there seems to be a +1 from Tom
> and a +(0.5-1) from Simon for backpatching, and no obvious -1s. Did you
> decide against it in the end, or is this still an open item?

No, I plan to work on it, so it's still an open item. I've been backlogged
with other things, but I will try to get too it soon.

(This also includes considering Jeff's note)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-04-01 15:12:11
Message-ID: CABUevEzC8qwdMv+1q6Zsk5Dr2TKipvrRgNV_hxg+z_Ao4kLSPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 27, 2017 at 7:46 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> On Sun, Feb 26, 2017 at 12:32 PM, Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
>
>>
>> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de
>> > wrote:
>>
>>> Hi,
>>>
>>> Am Dienstag, den 14.02.2017, 18:18 -0500 schrieb Robert Haas:
>>> > On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
>>> > <alvherre(at)2ndquadrant(dot)com> wrote:
>>> > > I'd rather have a --quiet mode instead. If you're running it by
>>> hand,
>>> > > you're likely to omit the switch, whereas when writing the cron job
>>> > > you're going to notice lack of switch even before you let the job run
>>> > > once.
>>> >
>>> > Well, that might've been a better way to design it, but changing it
>>> > now would break backward compatibility and I'm not really sure that's
>>> > a good idea. Even if it is, it's a separate concern from whether or
>>> > not in the less-quiet mode we should point out that we're waiting for
>>> > a checkpoint on the server side.
>>>
>>> ISTM the consensus is that there should be no output in regular mode,
>>> but a message should be displayed in verbose and progress mode.
>>>
>>> So I went forth and also added a message in progress mode (unless
>>> verbose messages are requested anyway).
>>>
>>> Regarding the documentation, I tried to clarify the difference between
>>> the checkpoint types a bit more, but I think any further action is
>>> probably a larger rewrite of the documentation on this topic.
>>>
>>> So attached are two patches, I've split it up in the documentation and
>>> the code output part. I'll add it as one commitfest entry in the
>>> "Clients" section though, as it's not really a big patch, unless
>>> somebody thinks it should have a secon entry in "Documentation"?
>>
>>
>> Agreed, and applied as one patch. Except I noticed you also fixed a
>> couple of entries which were missing the progname in the messages -- I
>> broke those out to a separate patch instead.
>>
>> Made a small change to "using as much I/O as available" rather than "as
>> possible", which I think is a better wording, along with the change of the
>> idle wording I suggested before. (but feel free to point it out to me if
>> that's wrong).
>>
>
> Should the below fprintf end in a \r rather than a \n, so that the the
> progress message gets over-written once the checkpoint is done and we have
> moved on?
>
> if (showprogress && !verbose)
> fprintf(stderr, "waiting for checkpoint\n");
>
> That would seem more in keeping with how the other progress messages
> operate.
>
>
Agreed, that makes more sense. I've pushed a patch that does this.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-04-01 15:29:42
Message-ID: CABUevEzp4a_wYuT_YKsmOPONENAy5aHyE-06RutYcPO5Twwa=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 31, 2017 at 8:59 AM, Magnus Hagander <magnus(at)hagander(dot)net>
wrote:

>
> On Wed, Mar 29, 2017 at 1:05 PM, Michael Banck <michael(dot)banck(at)credativ(dot)de>
> wrote:
>
>> Hi,
>>
>> Am Montag, den 27.02.2017, 16:20 +0100 schrieb Magnus Hagander:
>> > On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > Is there an argument for back-patching this?
>> >
>> >
>> > Seems you were typing that at the same time as we did.
>> >
>> >
>> > I'm considering it, but not swayed in either direction. Should I take
>> > your comment as a vote that we should back-patch it?
>>
>> I've checked back into this thread, and there seems to be a +1 from Tom
>> and a +(0.5-1) from Simon for backpatching, and no obvious -1s. Did you
>> decide against it in the end, or is this still an open item?
>
>
> No, I plan to work on it, so it's still an open item. I've been backlogged
> with other things, but I will try to get too it soon.
>
> (This also includes considering Jeff's note)
>
>
I've applied a backpatch to 9.4. Prior to that pretty much the entire patch
is a conflict, so it would need a full rewrite.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gitlab post-mortem: pg_basebackup waiting for checkpoint
Date: 2017-04-03 07:17:44
Message-ID: 1491203864.25270.1.camel@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Am Samstag, den 01.04.2017, 17:29 +0200 schrieb Magnus Hagander:

> I've applied a backpatch to 9.4. Prior to that pretty much the entire
> patch is a conflict, so it would need a full rewrite.

Thanks!

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer