Re: pg_start_backup() takes too long

Lists: pgsql-general
From: "Ivan Zolotukhin" <ivan(dot)zolotukhin(at)gmail(dot)com>
To: "PostgreSQL General" <pgsql-general(at)postgresql(dot)org>
Subject: pg_start_backup() takes too long
Date: 2008-09-10 10:20:01
Message-ID: 751e56400809100320t68128c22q3d7b0afb526d8dc6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello,

What is the reason for

select pg_start_backup('label');

taking 10 minutes on not so loaded system even right after manual checkpoint?


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-23 02:18:52
Message-ID: 200809230218.m8N2Iqx25311@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Ivan Zolotukhin wrote:
> Hello,
>
> What is the reason for
>
> select pg_start_backup('label');
>
> taking 10 minutes on not so loaded system even right after manual checkpoint?

No idea; something is seriously wrong if that is happening. Do the
database server logs or kernel logs show anything unusual?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Ivan Zolotukhin" <ivan(dot)zolotukhin(at)gmail(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "PostgreSQL General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-28 15:27:14
Message-ID: 751e56400809280827v6dd2ac81t3a8e5581fd3404d0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello,

Nothing bad both in system and postgres logs :( No serious activity
during backup. I've had to change statement_timeout for backup user to
make it work. But I cannot reproduce this case unfortunately.

Regards,
Ivan

On Tue, Sep 23, 2008 at 6:18 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Ivan Zolotukhin wrote:
>> Hello,
>>
>> What is the reason for
>>
>> select pg_start_backup('label');
>>
>> taking 10 minutes on not so loaded system even right after manual checkpoint?
>
> No idea; something is seriously wrong if that is happening. Do the
> database server logs or kernel logs show anything unusual?
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +
>


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-28 15:35:49
Message-ID: 48DFA455.5050808@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Ivan Zolotukhin wrote:
> Hello,
>
> Nothing bad both in system and postgres logs :( No serious activity
> during backup. I've had to change statement_timeout for backup user to
> make it work. But I cannot reproduce this case unfortunately.

This is actually not uncommon and PostgreSQL shows exactly nothing in
terms of why it is taking so long. The only assumption I have come up
with is that start_backup does cause a checkpoint.

Sincerely,

Joshua D. Drake


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-28 16:18:05
Message-ID: 1222618685.4445.1040.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Sun, 2008-09-28 at 08:35 -0700, Joshua D. Drake wrote:
> Ivan Zolotukhin wrote:
> > Hello,
> >
> > Nothing bad both in system and postgres logs :( No serious activity
> > during backup. I've had to change statement_timeout for backup user to
> > make it work. But I cannot reproduce this case unfortunately.
>
> This is actually not uncommon and PostgreSQL shows exactly nothing in
> terms of why it is taking so long. The only assumption I have come up
> with is that start_backup does cause a checkpoint.

Yes, it does a normal checkpoint and writes a file. No reason for it to
take longer than any other checkpoint.

At 8.2 and below checkpoints were frequently delayed on busy systems.
This was because of lwlock starvation during commit phase of
transactions. That was fixed in 8.3.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Ivan Zolotukhin" <ivan(dot)zolotukhin(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-29 09:39:58
Message-ID: 751e56400809290239o4ed25292o13d023bd14753b0d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Guys,

This is all not about checkpoints. As I've mentioned in the first
message, even right after manual run of CHECKPOINT command in psql
pg_start_backup() takes same time (~10 minutes).

Regards,
Ivan

On Sun, Sep 28, 2008 at 8:18 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Sun, 2008-09-28 at 08:35 -0700, Joshua D. Drake wrote:
>> Ivan Zolotukhin wrote:
>> > Hello,
>> >
>> > Nothing bad both in system and postgres logs :( No serious activity
>> > during backup. I've had to change statement_timeout for backup user to
>> > make it work. But I cannot reproduce this case unfortunately.
>>
>> This is actually not uncommon and PostgreSQL shows exactly nothing in
>> terms of why it is taking so long. The only assumption I have come up
>> with is that start_backup does cause a checkpoint.
>
> Yes, it does a normal checkpoint and writes a file. No reason for it to
> take longer than any other checkpoint.
>
> At 8.2 and below checkpoints were frequently delayed on busy systems.
> This was because of lwlock starvation during commit phase of
> transactions. That was fixed in 8.3.
>
> --
> Simon Riggs www.2ndQuadrant.com
> PostgreSQL Training, Services and Support
>
>


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-29 10:12:20
Message-ID: 1222683140.4445.1120.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Mon, 2008-09-29 at 13:39 +0400, Ivan Zolotukhin wrote:

> This is all not about checkpoints. As I've mentioned in the first
> message, even right after manual run of CHECKPOINT command in psql
> pg_start_backup() takes same time (~10 minutes).

As explained, there's not very much going on apart from the checkpoint
and that can be problematic.

What version are you running?
What are your checkpoint_timeout and checkpoint_completion_target
settings?

My guesses are 8.3, 20 minutes, and default.

pg_start_backup() doesn't do an immediate checkpoint, it does a smooth
one, so doing a CHECKPOINT beforehand should make no difference in 8.3.

(Looks at code...)

I'm surprised that checkpoint smoothing moves slowly even when it has so
little to do. ISTM checkpoint completion target should set its write
rate according to the thought that if shared_buffers were all dirty it
would write them out in checkpoint_timeout *
checkpoint_completion_target seconds. However, what it does is write
them *all* out in that time, no matter how many dirty blocks there are.
If there is just a few blocks to write, we take the *same* time to write
them as if it were all dirty. Which looks fairly ludicrous to me, but
the only time that is possible in current code is pg_start_backup()
since no other code requests a checkpoint exactly when you ask, but
slowly. It makes more sense to have a constant write rate during
checkpoint, or at very least a minimum rate during checkpoint.

IMHO the checkpoint smoothing code is wrong, but since it only shows
itself for pg_start_backup() I think people will say we should change
that instead. If we do, then we'll get people saying "how come
pg_start_backup() causes such a performance drop?" because we start
doing an immediate checkpoint. The idea of this is that "online backup"
should have as little effect as possible on normal running. So I suggest
we change the checkpoint code instead.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-29 12:35:37
Message-ID: 10113.1222691737@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> I'm surprised that checkpoint smoothing moves slowly even when it has so
> little to do.

AFAIK that's operating as designed. The point being that we shouldn't
create any more I/O load than we absolutely have to.

It's not clear to me that it's a bug for pg_start_backup to take awhile.
If it is a bug then I'd vote for just making it do an immediate
checkpoint --- that might cause big I/O load but it's hardly likely to
be worse than what will happen when you start taking the subsequent
filesystem backup. The checkpoint code is too complicated already;
I don't want to make it support a third behavior. And I'd vote against
taking out the current default behavior.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-29 14:03:51
Message-ID: 1222697031.4445.1238.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Mon, 2008-09-29 at 08:35 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > I'm surprised that checkpoint smoothing moves slowly even when it has so
> > little to do.
>
> AFAIK that's operating as designed. The point being that we shouldn't
> create any more I/O load than we absolutely have to.
>
> It's not clear to me that it's a bug for pg_start_backup to take awhile.

> If it is a bug then I'd vote for just making it do an immediate
> checkpoint --- that might cause big I/O load but it's hardly likely to
> be worse than what will happen when you start taking the subsequent
> filesystem backup.

It was a clear intention for it to *not* cause a spike if we could avoid
it. The idea was if you wanted it to happen quickly then you could do a
checkpoint command first... oh well.

People might want to I/O limit the backup also, which they can do
without needing to let us know.

I'm happy to put an option in for this, so we have another function:
pg_start_backup(label text, immediate_chkpt boolean). I'll not be
rushing to do this though given my current TODO.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-29 21:49:26
Message-ID: Pine.GSO.4.64.0809291725170.16949@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 29 Sep 2008, Simon Riggs wrote:

> I'm surprised that checkpoint smoothing moves slowly even when it has so
> little to do. ISTM checkpoint completion target should set its write
> rate according to the thought that if shared_buffers were all dirty it
> would write them out in checkpoint_timeout *
> checkpoint_completion_target seconds. However, what it does is write
> them *all* out in that time, no matter how many dirty blocks there are.
> If there is just a few blocks to write, we take the *same* time to write
> them as if it were all dirty.

The checkpoint smoothing code that made it into 8.3 missed a couple of
nice to have features that just didn't make the schedule cut-off.
Enforcing a minimum rate was one, another was smoothing fsync calls.

Back when we were talking about the patch to sort writes at checkpoint
time, someone (I think you actually) commented that it might be worthwile
to create some sort of hook for making behavior of checkpoint-time dirty
buffer processing easy to change with a custom strategy. The sorted
behavior would then be the first such strategy available. Another one I
was thinking of was something that specified min+max write writes, which
would make this problem go away--might even auto-tune checkpoint_segments
or replace it altogether with an implementation based on those inputs.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-29 23:06:46
Message-ID: 200809292306.m8TN6kX20231@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Simon Riggs wrote:
> > If it is a bug then I'd vote for just making it do an immediate
> > checkpoint --- that might cause big I/O load but it's hardly likely to
> > be worse than what will happen when you start taking the subsequent
> > filesystem backup.
>
> It was a clear intention for it to *not* cause a spike if we could avoid
> it. The idea was if you wanted it to happen quickly then you could do a
> checkpoint command first... oh well.
>
> People might want to I/O limit the backup also, which they can do
> without needing to let us know.
>
> I'm happy to put an option in for this, so we have another function:
> pg_start_backup(label text, immediate_chkpt boolean). I'll not be
> rushing to do this though given my current TODO.

I agree with Tom; either we make the pg_start_backup() checkpoint
immediate or leave the behavior unchanged.

Personally I think immediate makes more sense because issuing
pg_start_backup() seems like it should behave like a manual CHECKPOINT
command.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Joshua Drake <jd(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-29 23:26:28
Message-ID: 20080929162628.1a05852b@jd-laptop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 29 Sep 2008 19:06:46 -0400 (EDT)
Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> I agree with Tom; either we make the pg_start_backup() checkpoint
> immediate or leave the behavior unchanged.
>
> Personally I think immediate makes more sense because issuing
> pg_start_backup() seems like it should behave like a manual CHECKPOINT
> command.
>

I am not actually that concerned that it takes a long time now that I
know why. I am more concerned that it isn't obvious why it takes so
long. Something like this would be more than sufficient:

elog(NOTICE, "pg_start_backup called, issuing CHECKPOINT");
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT);

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/


From: "Ivan Zolotukhin" <ivan(dot)zolotukhin(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_start_backup() takes too long
Date: 2008-09-30 08:58:58
Message-ID: 751e56400809300158j6703ba40q849af8bd61440015@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Sep 29, 2008 at 2:12 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Mon, 2008-09-29 at 13:39 +0400, Ivan Zolotukhin wrote:
>
>> This is all not about checkpoints. As I've mentioned in the first
>> message, even right after manual run of CHECKPOINT command in psql
>> pg_start_backup() takes same time (~10 minutes).
>
> As explained, there's not very much going on apart from the checkpoint
> and that can be problematic.
>
> What version are you running?
> What are your checkpoint_timeout and checkpoint_completion_target
> settings?
>
> My guesses are 8.3, 20 minutes, and default.

You're dead right: 8.3.3, 30 minutes (it's intended), and default.

> pg_start_backup() doesn't do an immediate checkpoint, it does a smooth
> one, so doing a CHECKPOINT beforehand should make no difference in 8.3.

Aham, now I see.

Just a few points on pg_start_backup() from user point of view. I
personally would prefer to have some control over the process, e.g. it
would be nice to have proposed pg_start_backup(label text,
immediate_chkpt boolean). But if that's not what you want by some
other reason, it would be nice to add at least log notice telling that
system is going to make a spreaded checkpoint and it's gonna take this
approximate time (one can estimate a time limit from config constants,
right?).


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Ivan Zolotukhin <ivan(dot)zolotukhin(at)gmail(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg_start_backup() takes too long
Date: 2008-10-01 06:27:05
Message-ID: 1222842425.4445.1403.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Tue, 2008-09-30 at 12:58 +0400, Ivan Zolotukhin wrote:

> Just a few points on pg_start_backup() from user point of view. I
> personally would prefer to have some control over the process, e.g. it
> would be nice to have proposed pg_start_backup(label text,
> immediate_chkpt boolean).

I've added this function to Infrastructure Changes for Recovery (v8)
patch, with a NOTICE message as suggested also. Seemed easier to work on
it while I was there.

No changes to bgwriter smoothing.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support