
Re: [ADMIN] pg_basebackup blocking all queries with horrible performance

Lists:	pgsql-admin pgsql-hackers

From:	Lonni J Friedman <netllama(at)gmail(dot)com>
To:	pgsql-admin(at)postgresql(dot)org
Subject:	pg_basebackup blocking all queries with horrible performance
Date:	2012-06-07 17:41:36
Message-ID:	CAP=oouEy9xxa6biSnqNSJKMf0fuKsufiYjbz5SJyvj0gc0Qkiw@mail.gmail.com

Greetings,
I have a 4 server postgresql-9.1.3 cluster (one master doing streaming
replication to 3 hot standby servers). All of them are running
Fedora-16-x86_64.

http://wiki.postgresql.org/wiki/Lock_Monitoring

I'm finding that I cannot run pg_basebackup at all, or it blocks all
SQL queries from running until pg_basebackup has completed (and the
load on the box just takes off to over 30.00). By "blocks" I mean
that any query that is submitted just hangs and does not return for
seconds or sometimes even minutes, until pg_basebackup has stopped.
I'm assuming that this isn't expected behavior, so I'm rather confused
as to what is going on. The
command that I'm issuing is:
pg_basebackup -v -D /mnt/backups/backups/tmp0 -x -Ft -U postgres

Can someone provide some guidance on how to debug this? Or is there
some way to reduce the performance/priority of pg_basebackup so that
it has much less impact on overall performance?

thanks!

From:	Fabricio <fabrixio1(at)hotmail(dot)com>
To:	<pgsql-admin(at)postgresql(dot)org>
Subject:	could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat": No such file or directory
Date:	2012-06-07 18:01:37
Message-ID:	SNT139-W522649589173A052C32010FEF30@phx.gbl

Hi.

I have this problem:

I have PostgreSQL 9.1.3, and last night it crashed.

This was the first error after an autovacuum (the night before last):

<2012-06-06 00:59:07 MDT 814 4fceffbb.32e >LOG: autovacuum: found orphan temp table "(null)"."tmpmuestadistica" in database "dbRX"
<2012-06-06 01:05:26 MDT 1854 4fc7d1eb.73e >LOG: could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat": No such file or directory
<2012-06-06 01:05:28 MDT 1383 4fcf0136.567 >ERROR: tuple concurrently updated
<2012-06-06 01:05:28 MDT 1383 4fcf0136.567 >CONTEXT: automatic vacuum of table "global.pg_catalog.pg_attrdef"
<2012-06-06 01:06:09 MDT 1851 4fc7d1eb.73b >ERROR: xlog flush request 4/E29EE490 is not satisfied --- flushed only to 3/13527A10
<2012-06-06 01:06:09 MDT 1851 4fc7d1eb.73b >CONTEXT: writing block 0 of relation base/311360/12244_vm
<2012-06-06 01:06:10 MDT 1851 4fc7d1eb.73b >ERROR: xlog flush request 4/E29EE490 is not satisfied --- flushed only to 3/13527A10
<2012-06-06 01:06:10 MDT 1851 4fc7d1eb.73b >CONTEXT: writing block 0 of relation base/311360/12244_vm
<2012-06-06 01:06:10 MDT 1851 4fc7d1eb.73b >WARNING: could not write block 0 of base/311360/12244_vm
<2012-06-06 01:06:10 MDT 1851 4fc7d1eb.73b >DETAIL: Multiple failures --- write error might be permanent.

Last night it was terminated by signal 6.

<2012-06-07 01:36:44 MDT 2509 4fd05a0c.9cd >LOG: startup process (PID 2525) was terminated by signal 6: Aborted
<2012-06-07 01:36:44 MDT 2509 4fd05a0c.9cd >LOG: aborting startup due to startup process failure
<2012-06-07 01:37:37 MDT 2680 4fd05a41.a78 >LOG: database system shutdown was interrupted; last known up at 2012-06-07 01:29:40 MDT
<2012-06-07 01:37:37 MDT 2680 4fd05a41.a78 >LOG: could not open file "pg_xlog/000000010000000300000013" (log file 3, segment 19): No such file or directory
<2012-06-07 01:37:37 MDT 2680 4fd05a41.a78 >LOG: invalid primary checkpoint record

And the only option was pg_resetxlog.

After this, many queries failed with this error:
<2012-06-07 09:24:22 MDT 1306 4fd0c7a6.51a >ERROR: missing chunk number 0 for toast value 393330 in pg_toast_2619
<2012-06-07 09:24:31 MDT 1306 4fd0c7a6.51a >ERROR: missing chunk number 0 for toast value 393332 in pg_toast_2619

I lost some databases.

I recreated the cluster with initdb and then restored the databases I was able to back up (for the others, I restored an old backup).

There was no disk-space or permissions problem, and no filesystem or disk error.

Can you help me figure out what happened?

Thanks and regards...

From:	Lonni J Friedman <netllama(at)gmail(dot)com>
To:	pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-07 18:04:53
Message-ID:	CAP=oouEu0sRCTvk+99wr-k3HG1vN3pyL4PXvVPQ+UZcAz6x2_g@mail.gmail.com

On Thu, Jun 7, 2012 at 10:41 AM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:
> Greetings,
> I have a 4 server postgresql-9.1.3 cluster (one master doing streaming
> replication to 3 hot standby servers). All of them are running
> Fedora-16-x86_64.
>
> http://wiki.postgresql.org/wiki/Lock_Monitoring

err, i included that URL but neglected to explain why. On a different
list someone suggested that I verify that there were no locks that
were blocking things, and I did so, and found no locks.

So I'm still at a loss why pg_basebackup is killing perf, and would
appreciate pointers on how to debug it or at least reduce its impact
on performance if that is possible.

thanks

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Lonni J Friedman <netllama(at)gmail(dot)com>
Cc:	pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-07 19:40:45
Message-ID:	CABUevEw9ax5SLaQYjf56gFM82SOfROMHHtWXQiMkbxVPrvxAFw@mail.gmail.com

On Thu, Jun 7, 2012 at 8:04 PM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:
> err, i included that URL but neglected to explain why. On a different
> list someone suggested that I verify that there were no locks that
> were blocking things, and I did so, and found no locks.
>
> So I'm still at a loss why pg_basebackup is killing perf, and would
> appreciate pointers on how to debug it or at least reduce its impact
> on performance if that is possible.
>

My guess would be that you are overloading your I/O system. You should
look at values from iostat and vmstat from when the system works fine
and when you run pg_basebackup, that should give you a hint in the
right direction.
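To make the before/after vmstat comparison concrete, here is a minimal sketch, assuming a Linux kernel whose `/proc/stat` aggregate `cpu` line carries the iowait counter as its fifth number. It computes the share of CPU time spent waiting on I/O between two samples, which is roughly what vmstat shows in its `wa` column. The function names are hypothetical, not part of any tool.

```python
import time
from typing import List


def read_cpu_fields(stat_text: str) -> List[int]:
    """Return the numeric fields of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(x) for x in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before: List[int], after: List[int]) -> float:
    """Percentage of CPU time spent in iowait between two samples.

    Only the first eight fields (user, nice, system, idle, iowait, irq,
    softirq, steal) are summed, because guest time is already counted in user.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval: float = 5.0, path: str = "/proc/stat") -> float:
    """Sample /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        first = read_cpu_fields(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_cpu_fields(f.read())
    return iowait_percent(first, second)
```

Calling `sample_iowait()` once during normal load and once while pg_basebackup runs gives two numbers to compare; a large jump would support the I/O-overload theory.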

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Lonni J Friedman <netllama(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-07 20:08:27
Message-ID:	CAP=oouGeFVeDvEFa6DXf1YBwv-23rAh0PZ-5xOcezAFspC+gGg@mail.gmail.com

On Thu, Jun 7, 2012 at 12:40 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> My guess would be that you are overloading your I/O system. You should
> look at values from iostat and vmstat from when the system works fine
> and when you run pg_basebackup, that should give you a hint in the
> right direction.

ok, thanks. i'll take a look at that. If this turns out to be the
issue, is there some way to get pg_basebackup to run more slowly, so
that it has less impact? Or could I do this with ionice on the
pg_basebackup process?

From:	Jerry Sievers <gsievers19(at)comcast(dot)net>
To:	Lonni J Friedman <netllama(at)gmail(dot)com>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-08 00:07:24
Message-ID:	874nqmy8cj.fsf@comcast.net

Lonni J Friedman <netllama(at)gmail(dot)com> writes:

> ok, thanks. i'll take a look at that. If this turns out to be the
> issue, is there some way to get pg_basebackup to run more slowly, so
> that it has less impact? Or could I do this with ionice on the
> pg_basebackup process?

You might try stopping pg_basebackup in place with SIGSTOP and checking
whether the problem goes away. After a SIGCONT you should start seeing
the sluggishness again.

If verified, then any sort of throttling mechanism should work.


--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres(dot)consulting(at)comcast(dot)net
p: 732.216.7255

From:	Lonni J Friedman <netllama(at)gmail(dot)com>
To:	Jerry Sievers <gsievers19(at)comcast(dot)net>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-08 01:01:46
Message-ID:	CAP=oouF_gTLJvxQ8tK8czZ_wRF3Xt63j40LJnBfbHE26qfOYgg@mail.gmail.com

On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sievers <gsievers19(at)comcast(dot)net> wrote:
> You might try stopping pg_basebackup in place with SIGSTOP and check
> if problem goes away. SIGCONT and you should start having
> sluggishness again.
>
> If verified, then any sort of throttling mechanism should work.

I'm certain that the problem is triggered only when pg_basebackup is
running. It's very predictable, and goes away as soon as pg_basebackup
finishes running. What do you mean by a throttling mechanism?

From:	Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To:	Lonni J Friedman <netllama(at)gmail(dot)com>
Cc:	Jerry Sievers <gsievers19(at)comcast(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-08 06:04:41
Message-ID:	4FD195F9.8080108@ringerc.id.au

On 06/08/2012 09:01 AM, Lonni J Friedman wrote:
> On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sievers<gsievers19(at)comcast(dot)net> wrote:
>> You might try stopping pg_basebackup in place with SIGSTOP and check
>> if problem goes away. SIGCONT and you should start having
>> sluggishness again.
>>
>> If verified, then any sort of throttling mechanism should work.
>
> I'm certain that the problem is triggered only when pg_basebackup is
> running. Its very predictable, and goes away as soon as pg_basebackup
> finishes running. What do you mean by a throttling mechanism?

Sure, it only happens when pg_basebackup is running. But if you *pause*
pg_basebackup, so it's still running but not currently doing work, does
the problem go away? Does it come back when you unpause pg_basebackup?
That's what Jerry was telling you to try.

If the problem goes away when you pause pg_basebackup and comes back
when you unpause it, it's probably a system load problem.

If it doesn't go away, it's more likely to be a locking issue or
something _other_ than simple load.

SIGSTOP ("kill -STOP") pauses a process, and SIGCONT ("kill -CONT")
resumes it, so on Linux you can use these to try and find out. When you
SIGSTOP pg_basebackup then the postgres backend associated with it
should block shortly afterwards as its buffers fill up and it can't send
more data, so the load should come off the server.
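The pause-and-resume test can be scripted. This sketch assumes Python 3 on Linux and that you already know the PID of the running pg_basebackup process (for example from `ps`); `pause_process` is a hypothetical helper, not part of PostgreSQL. It stops the process with SIGSTOP, keeps it stopped for a fixed window while you watch load and query latency, and always sends SIGCONT afterwards.

```python
import os
import signal
import time


def pause_process(pid: int, seconds: float) -> None:
    """Stop `pid` with SIGSTOP for `seconds`, then resume it with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)
    try:
        # While stopped, the process runs no code and issues no new I/O,
        # so the backend feeding it should stall once its buffers fill.
        time.sleep(seconds)
    finally:
        # Resume even if the sleep is interrupted (e.g. by Ctrl-C), so the
        # backup is never left frozen by accident.
        os.kill(pid, signal.SIGCONT)
```

For example, `pause_process(12345, 60)`, with 12345 standing in for the real PID. If load and latency recover during that minute and degrade again afterwards, that points at system load rather than locking, per the reasoning above.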

A "throttling mechanism" refers to anything that limits the rate or
speed of a thing. In this case, what you want to do if your problem is
system overload is to limit the speed at which pg_basebackup does its
work so other things can still get work done. In other words you want to
throttle it. Typical throttling mechanisms include the "ionice" and
"renice" commands to change I/O and CPU priority, respectively.

Note that you may need to change the priority of the *backend* that
pg_basebackup is using, not necessarily the pg_basebackup command
itself. I haven't done enough with Pg's replication to know how that
works, so someone else will have to fill that bit in.
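As a sketch of that kind of throttling, assuming Linux, Python 3.3+ (for `os.setpriority`) and the util-linux `ionice` tool: the helper below lowers a process's CPU priority (the same effect as `renice -n 19 -p PID`) and, when `ionice` is installed, moves it to the lowest best-effort I/O level. Which PID to pass, the pg_basebackup client or the server backend serving it, is the open question raised just above. I/O priorities are also only honored by I/O schedulers that support them, such as CFQ. The function names are hypothetical.

```python
import os
import shutil
import subprocess
from typing import List


def ionice_command(pid: int, io_class: int = 2, level: int = 7) -> List[str]:
    """argv for util-linux ionice: class 2 is best-effort, level 7 its lowest."""
    return ["ionice", "-c", str(io_class), "-n", str(level), "-p", str(pid)]


def deprioritize(pid: int, niceness: int = 19) -> bool:
    """Lower the CPU priority of `pid`, and its I/O priority if ionice exists.

    Returns True only if the ionice call was made and succeeded.
    """
    # Equivalent to `renice -n <niceness> -p <pid>`. Raising niceness is
    # allowed on your own processes; lowering it again usually needs root.
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    if shutil.which("ionice") is None:
        return False
    result = subprocess.run(ionice_command(pid))
    return result.returncode == 0
```

Lowering priority this way does not cap throughput outright; it only makes the process yield when other work competes for CPU or disk, which is usually what you want for a background backup.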

--
Craig Ringer

From:	Lonni J Friedman <netllama(at)gmail(dot)com>
To:	Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc:	Jerry Sievers <gsievers19(at)comcast(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-08 19:30:48
Message-ID:	CAP=oouGZ4=B8fg6694urDoL-E7SsksOuTZGXmP_6UQ3T67vj5w@mail.gmail.com

On Thu, Jun 7, 2012 at 11:04 PM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
> Sure, it only happens when pg_basebackup is running. But if you *pause*
> pg_basebackup, so it's still running but not currently doing work, does the
> problem go away? Does it come back when you unpause pg_basebackup? That's
> what Jerry was telling you to try.
>
> If the problem goes away when you pause pg_basebackup and comes back when
> you unpause it, it's probably a system load problem.
>
> If it doesn't go away, it's more likely to be a locking issue or something
> _other_ than simple load.
>
> SIGSTOP ("kill -STOP") pauses a process, and SIGCONT ("kill -CONT") resumes
> it, so on Linux you can use these to try and find out. When you SIGSTOP
> pg_basebackup then the postgres backend associated with it should block
> shortly afterwards as its buffers fill up and it can't send more data, so
> the load should come off the server.
>
> A "throttling mechanism" refers to anything that limits the rate or speed of
> a thing. In this case, what you want to do if your problem is system
> overload is to limit the speed at which pg_basebackup does its work so other
> things can still get work done. In other words you want to throttle it.
> Typical throttling mechanisms include the "ionice" and "renice" commands to
> change I/O and CPU priority, respectively.
>
> Note that you may need to change the priority of the *backend* that
> pg_basebackup is using, not necessarily the pg_basebackup command its self.
> I haven't done enough with Pg's replication to know how that works, so
> someone else will have to fill that bit in.

Thanks for your reply. I've confirmed that issuing a SIGSTOP does
eliminate the thrashing, and issuing a SIGCONT resumes the thrash.

I've looked at iostat output both before & during pg_basebackup runs,
and I'm not seeing any indication that the problem is due to disk IO
bottlenecks. The numbers don't vary very much at all between the good
& bad times. This is typical when pg_basebackup is running:
########
Device: rrqm/s wrqm/s    r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
md0       0.00   0.00  67.76  68.62  4.42  1.46    88.34     0.00  0.00    0.00    0.00  0.00  0.00
########

and this is when the system is ok:
########
Device: rrqm/s wrqm/s    r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
md0       0.00   0.00  68.04  68.56  4.44  1.46    88.39     0.00  0.00    0.00    0.00  0.00  0.00
########

I looked at vmstat output, but nothing is jumping out at me as being
dramatically different when pg_basebackup is running. swap in and
swap out are zero 100% of the time for the good & bad perf cases. I
can post example output if someone is interested, or if there's
something specific that I should be looking at as a potential problem,
let me know.

thanks

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Lonni J Friedman <netllama(at)gmail(dot)com>
Cc:	Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-09 02:29:36
Message-ID:	CAHGQGwERsw_mmXcEktbkSC01cUs3-SXfQbNq5y5JDbMe8B=9RA@mail.gmail.com

On Sat, Jun 9, 2012 at 4:30 AM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:
> Thanks for your reply. I've confirmed that issuing a SIGSTOP does
> eliminate the thrashing, and issuing a SIGCONT resumes the thrash.
>
> I've looked at iostat output both before & during pg_basebackup runs,
> and I'm not seeing any indication that the problem is due to disk IO
> bottlenecks. The numbers don't vary very much at all between the good
> & bad times. This is typical when pg_basebackup is running:
> ########
> Device: rrqm/s wrqm/s    r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> md0       0.00   0.00  67.76  68.62  4.42  1.46    88.34     0.00  0.00    0.00    0.00  0.00  0.00
> ########
>
> and this is when the system is ok:
> ########
> Device: rrqm/s wrqm/s    r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> md0       0.00   0.00  68.04  68.56  4.44  1.46    88.39     0.00  0.00    0.00    0.00  0.00  0.00
> ########
>
>
> I looked at vmstat output, but nothing is jumping out at me as being
> dramatically different when pg_basebackup is running. swap in and
> swap out are zero 100% of the time for the good & bad perf cases. I
> can post example output if someone is interested, or if there's
> something specific that I should be looking at as a potential problem,
> let me know.

Did you set synchronous_standby_names to '*'? If so, the problem you
encountered can happen.

When synchronous_standby_names is '*', you cannot control which
standbys take the role of synchronous standby. A standby that you
expect to run as an asynchronous one might become the synchronous one.
So my guess is that at first one of your three standbys was running as
the synchronous standby, and all queries were executed normally. But
when you started pg_basebackup, pg_basebackup unexpectedly took over
the role of synchronous standby from that standby. Since
pg_basebackup doesn't send information about replication progress
back to the master, all queries (more precisely, transaction
commits) got stuck, waiting for a reply from the synchronous
standby.

You can avoid this problem by setting synchronous_standby_names
to the names (application_name values) of your standbys instead of '*'.
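The failure mode can be illustrated with a deliberately simplified, hypothetical model of synchronous-standby selection; this is not PostgreSQL's actual code. Each connected walsender gets a priority from its position in synchronous_standby_names, where '*' matches any application_name; the lowest non-zero priority wins, and the model assumes ties go to whichever walsender appears first. The client names "pg_basebackup" and "standby1".."standby3" are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WalSender:
    name: str             # application_name the client connected with
    sends_feedback: bool  # False for a base-backup connection in this model


def priority(name: str, sync_names: List[str]) -> int:
    """1-based position of the first matching entry; 0 means never sync."""
    for position, entry in enumerate(sync_names, start=1):
        if entry == "*" or entry == name:
            return position
    return 0


def pick_sync(senders: List[WalSender], sync_names: List[str]) -> Optional[WalSender]:
    """Lowest non-zero priority wins; ties go to the earliest sender (assumed)."""
    best: Optional[WalSender] = None
    best_priority = 0
    for sender in senders:
        p = priority(sender.name, sync_names)
        if p > 0 and (best is None or p < best_priority):
            best, best_priority = sender, p
    return best


def commits_block(senders: List[WalSender], sync_names: List[str]) -> bool:
    """Commits wait indefinitely if the chosen sync standby never acknowledges WAL."""
    chosen = pick_sync(senders, sync_names)
    return chosen is not None and not chosen.sends_feedback
```

With '*', whether the backup connection wins depends only on its position among the walsenders, which matches the point that you cannot control which connection becomes synchronous. With explicit names, the backup connection gets priority 0 and is never picked. In 9.1 a standby's name is the application_name set in the primary_conninfo line of its recovery.conf.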

This seems a bug. I think we should prevent pg_basebackup from
becoming synchronous standby. Thought?

Regards,

--
Fujii Masao

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Lonni J Friedman <netllama(at)gmail(dot)com>
Cc:	Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-09 06:53:35
Message-ID:	CAOR=d=1vLOBPXcBhM+7kkUMFAb4oJpJREOxYqepHZX9o+ZyDuQ@mail.gmail.com

On Fri, Jun 8, 2012 at 1:30 PM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:
> I've looked at iostat output both before & during pg_basebackup runs,
> and I'm not seeing any indication that the problem is due to disk IO
> bottlenecks. The numbers don't vary very much at all between the good
> & bad times. This is typical when pg_basebackup is running:
> ########
> Device: rrqm/s wrqm/s    r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> md0       0.00   0.00  67.76  68.62  4.42  1.46    88.34     0.00  0.00    0.00    0.00  0.00  0.00
> ########
>
> and this is when the system is ok:
> ########
> Device: rrqm/s wrqm/s    r/s    w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> md0       0.00   0.00  68.04  68.56  4.44  1.46    88.39     0.00  0.00    0.00    0.00  0.00  0.00
> ########

Two points. 1: md0 doesn't show things like %util; only the physical
drives will have that output, and that is what you want to watch, to see
whether it's hopping up to 100%. 2: you need to run it with an interval
argument and look at the reports AFTER the first one, because the first
is the average since the machine was booted.
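Both points can be applied mechanically. This sketch assumes the `iostat -x` text layout quoted above (one "Device:" header line per report, one row per device, a %util column) and saved output from something like `iostat -dxm 5 3`; column names and layout vary between sysstat versions, so check it against your own output. It skips the first report, which covers the time since boot, and returns the physical devices whose %util exceeds a threshold. The function names are hypothetical.

```python
from typing import Dict, List


def parse_util_reports(text: str) -> List[Dict[str, float]]:
    """Split `iostat -x` text into one {device: %util} dict per report."""
    reports: List[Dict[str, float]] = []
    columns: List[str] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields  # each header line starts a new report
            reports.append({})
        elif reports and len(fields) == len(columns):
            row = dict(zip(columns, fields))
            if "%util" in row:
                reports[-1][fields[0]] = float(row["%util"])
    return reports


def busiest_devices(text: str, threshold: float = 90.0) -> Dict[str, float]:
    """Peak %util above `threshold`, ignoring the first (since-boot) report."""
    hot: Dict[str, float] = {}
    for report in parse_util_reports(text)[1:]:
        for device, util in report.items():
            if util > threshold:
                hot[device] = max(util, hot.get(device, 0.0))
    return hot
```

An md device such as md0 will typically report 0.00 here, as in the output above, so the useful signal comes from its member disks.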

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-09 12:51:10
Message-ID:	26788.1339246270@sss.pgh.pa.us

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> This seems a bug. I think we should prevent pg_basebackup from
> becoming synchronous standby. Thought?

Absolutely. If we have replication clients that are not actually
capable of being standbys, there *must* be a way for the master
to know that.

regards, tom lane

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-10 10:43:32
Message-ID:	CABUevEwYxpqA0EJ4pdcpLY6QZuK4DbmoRy_PC+NDCWKvCXwwkw@mail.gmail.com

On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> This seems a bug. I think we should prevent pg_basebackup from
>> becoming synchronous standby. Thought?
>
> Absolutely. If we have replication clients that are not actually
> capable of being standbys, there *must* be a way for the master
> to know that.

I thought we fixed this already by sending InvalidXlogRecPtr as flush
location? And that this only applied in 9.2?

Are you saying we picked pg_basebackup *in backup mode* (not log
streaming) as synchronous standby? If so then yes, that is
*definitely* a bug that should be fixed. We should never select a
connection that's not even streaming log as standby!

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-10 12:25:38
Message-ID:	CAHGQGwHaLwySPXNb+vcEL2Vay1UN36b8NkURLwV54d_4QriHzA@mail.gmail.com

On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>> This seems a bug. I think we should prevent pg_basebackup from
>>> becoming synchronous standby. Thought?
>>
>> Absolutely. If we have replication clients that are not actually
>> capable of being standbys, there *must* be a way for the master
>> to know that.
>
> I thought we fixed this already by sending InvalidXlogRecPtr as flush
> location? And that this only applied in 9.2?
>
> Are you saying we picked pg_basebackup *in backup mode* (not log
> streaming) as synchronous standby?

Yes.

> If so then yes, that is
> *definitely* a bug that should be fixed. We should never select a
> connection that's not even streaming log as standby!

Agreed. The attached patch prevents pg_basebackup from becoming the sync
standby. This patch also fixes another problem: currently only a walsender
which has reached the STREAMING state can become the sync walsender. On the
other hand, the sync walsender assumes that a walsender with higher priority
will become the sync one whether or not it is in the STREAMING state, and
switches itself to potential sync walsender. So when a standby with higher
priority connects to the master, we might have no sync standby until it
reaches the STREAMING state. To fix this problem, the patch switches the
walsender's state from sync to potential only *after* the walsender with
higher priority has reached the STREAMING state.
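The second fix can be pictured with a simplified, hypothetical rule for when the current sync walsender gives up its role; this models the behavior described above and is not the patch's code. Unpatched, it steps down as soon as any walsender with a higher priority (a smaller positive number) is connected, whatever that walsender's state; patched, it waits until that walsender is streaming. The state strings are placeholders.

```python
from typing import List, Tuple

# One entry per *other* walsender: (priority, state). Priority 0 means the
# walsender is never synchronous; a smaller positive number ranks higher.
Peer = Tuple[int, str]


def should_step_down(my_priority: int, peers: List[Peer], patched: bool) -> bool:
    """Should the current sync walsender demote itself to potential sync?

    Unpatched: any connected higher-priority peer triggers the switch.
    Patched: the peer must already be in the "streaming" state.
    """
    for priority, state in peers:
        outranks_me = 0 < priority < my_priority
        if outranks_me and (not patched or state == "streaming"):
            return True
    return False
```

With a higher-priority peer still catching up, the unpatched rule returns True even though that peer cannot acknowledge commits yet; that is the window with no sync standby that the patch closes.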

We also should not select (1) the background streaming process forked by
pg_basebackup or (2) pg_receivexlog as sync standby, because they
don't send back replication progress. To address this, I'm thinking of
introducing a new option "NOSYNC" to the "START_REPLICATION" command,
as follows, and changing (1) and (2) so that they specify NOSYNC.

START_REPLICATION XXX/XXX [NOSYNC]

If a standby specifies the NOSYNC option, it is never assigned as sync
standby, even if its name is in synchronous_standby_names. Thoughts?
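To make the proposed syntax concrete, here is a hypothetical parser for the `START_REPLICATION XXX/XXX [NOSYNC]` form above. NOSYNC is only a proposal in this message, not an existing option. The XXX/XXX start position is a WAL location written as two hexadecimal halves, like the 4/E29EE490 values in the log earlier on this page; packing them into one 64-bit integer (high half shifted left by 32) is this sketch's own choice.

```python
import re
from typing import Tuple

# Hypothetical grammar from the proposal: START_REPLICATION <hi>/<lo> [NOSYNC]
_COMMAND = re.compile(
    r"^START_REPLICATION\s+([0-9A-Fa-f]{1,8})/([0-9A-Fa-f]{1,8})(\s+NOSYNC)?\s*$"
)


def parse_start_replication(command: str) -> Tuple[int, bool]:
    """Return (start position as a 64-bit integer, nosync flag)."""
    match = _COMMAND.match(command.strip())
    if match is None:
        raise ValueError(f"not a START_REPLICATION command: {command!r}")
    high, low = int(match.group(1), 16), int(match.group(2), 16)
    return (high << 32) | low, match.group(3) is not None
```

Under the proposal, a walsender that receives a True flag would never take a non-zero sync priority, so it could not be chosen whatever synchronous_standby_names contains.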

BTW, in another thread we are discussing changing pg_receivexlog so that
it sends back replication progress. If that change is applied, we probably
won't need to change pg_receivexlog to use the NOSYNC option.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
prevent_pgbasebackup_from_becoming_sync_standby_v1.patch	application/octet-stream	404 bytes

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-10 13:34:06
Message-ID:	CAHGQGwHPK2qvgjyHX_HwUN8n_cQighYGL7Z5e0jVst7zpbAj9g@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>> becoming synchronous standby. Thought?
>>>
>>> Absolutely. If we have replication clients that are not actually
>>> capable of being standbys, there *must* be a way for the master
>>> to know that.
>>
>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>> location? And that this only applied in 9.2?
>>
>> Are you saying we picked pg_basebackup *in backup mode* (not log
>> streaming) as synchronous standby?
>
> Yes.
>
>> If so then yes, that is
>> *definitely* a bug that should be fixed. We should never select a
>> connection that's not even streaming log as standby!
>
> Agreed. Attached patch prevents pg_basebackup from becoming sync
> standby. Also this patch fixes another problem: currently only walsender
> which reaches STREAMING state can become sync walsender. OTOH,
> sync walsender thinks that walsender with higher priority will be sync one
> whether its state is STREAMING, and switches to potential sync walsender.
> So when the standby with higher priority connects to the master, we
> might have no sync standby until it reaches the STREAMING state.
> To fix this problem, the patch switches walsender's state from sync to
> potential *after* walsender with higher priority has reached the
> STREAMING state.
>
> We also should not select (1) background stream process forked from
> pg_basebackup and (2) pg_receivexlog as sync standby because they
> don't send back replication progress. To address this, I'm thinking to
> introduce new option "NOSYNC" in "START_REPLICATION" command
> as follows, and to change (1) and (2) so that they specify NOSYNC.
>
> START_REPLICATION XXX/XXX [NOSYNC]
>
> If the standby specifies NOSYNC option, it's never assigned as sync
> standby even if its name is in synchronous_standby_names. Thought?

A standby that always sends back InvalidXLogRecPtr should never
become the sync one. So instead of a NOSYNC option, we can avoid a
problematic sync standby by checking whether InvalidXLogRecPtr is sent.

Regards,

--
Fujii Masao

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-10 14:08:58
Message-ID:	CAHGQGwFF74FEJPQhp8wWFptAxOVwa3QkxiY+WCcDcrCd+tD4YQ@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>>> becoming synchronous standby. Thought?
>>>>
>>>> Absolutely. If we have replication clients that are not actually
>>>> capable of being standbys, there *must* be a way for the master
>>>> to know that.
>>>
>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>>> location? And that this only applied in 9.2?
>>>
>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>> streaming) as synchronous standby?
>>
>> Yes.
>>
>>> If so then yes, that is
>>> *definitely* a bug that should be fixed. We should never select a
>>> connection that's not even streaming log as standby!
>>
>> Agreed. Attached patch prevents pg_basebackup from becoming sync
>> standby. Also this patch fixes another problem: currently only walsender
>> which reaches STREAMING state can become sync walsender. OTOH,
>> sync walsender thinks that walsender with higher priority will be sync one
>> whether its state is STREAMING, and switches to potential sync walsender.
>> So when the standby with higher priority connects to the master, we
>> might have no sync standby until it reaches the STREAMING state.
>> To fix this problem, the patch switches walsender's state from sync to
>> potential *after* walsender with higher priority has reached the
>> STREAMING state.
>>
>> We also should not select (1) background stream process forked from
>> pg_basebackup and (2) pg_receivexlog as sync standby because they
>> don't send back replication progress. To address this, I'm thinking to
>> introduce new option "NOSYNC" in "START_REPLICATION" command
>> as follows, and to change (1) and (2) so that they specify NOSYNC.
>>
>> START_REPLICATION XXX/XXX [NOSYNC]
>>
>> If the standby specifies NOSYNC option, it's never assigned as sync
>> standby even if its name is in synchronous_standby_names. Thought?
>
> The standby which always sends InvalidXLogRecPtr back should not
> become sync one. So instead of NOSYNC option, by checking whether
> InvalidXLogRecPtr is sent, we can avoid problematic sync standby.

We should not do this, because Magnus is proposing a patch
(http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
that would break the above assumption entirely. So we should introduce
something like the NOSYNC option.

Regards,

--
Fujii Masao

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-10 14:10:46
Message-ID:	CABUevEzGVfcciovM+i4wQQXN2YqLbG3v5k_r+J0OY=OhiaM5cg@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>>>> becoming synchronous standby. Thought?
>>>>>
>>>>> Absolutely. If we have replication clients that are not actually
>>>>> capable of being standbys, there *must* be a way for the master
>>>>> to know that.
>>>>
>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>>>> location? And that this only applied in 9.2?
>>>>
>>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>>> streaming) as synchronous standby?
>>>
>>> Yes.
>>>
>>>> If so then yes, that is
>>>> *definitely* a bug that should be fixed. We should never select a
>>>> connection that's not even streaming log as standby!
>>>
>>> Agreed. Attached patch prevents pg_basebackup from becoming sync
>>> standby. Also this patch fixes another problem: currently only walsender
>>> which reaches STREAMING state can become sync walsender. OTOH,
>>> sync walsender thinks that walsender with higher priority will be sync one
>>> whether its state is STREAMING, and switches to potential sync walsender.
>>> So when the standby with higher priority connects to the master, we
>>> might have no sync standby until it reaches the STREAMING state.
>>> To fix this problem, the patch switches walsender's state from sync to
>>> potential *after* walsender with higher priority has reached the
>>> STREAMING state.
>>>
>>> We also should not select (1) background stream process forked from
>>> pg_basebackup and (2) pg_receivexlog as sync standby because they
>>> don't send back replication progress. To address this, I'm thinking to
>>> introduce new option "NOSYNC" in "START_REPLICATION" command
>>> as follows, and to change (1) and (2) so that they specify NOSYNC.
>>>
>>> START_REPLICATION XXX/XXX [NOSYNC]
>>>
>>> If the standby specifies NOSYNC option, it's never assigned as sync
>>> standby even if its name is in synchronous_standby_names. Thought?
>>
>> The standby which always sends InvalidXLogRecPtr back should not
>> become sync one. So instead of NOSYNC option, by checking whether
>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
>
> We should not do this because Magnus is proposing the patch
> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
> which breaks the above assumption at all. So we should introduce
> something like NOSYNC option.

Wouldn't the better choice there in that case be to give a switch to
pg_receivexlog if you *want* it to be able to become a sync replica,
and by default disallow it? And then keep the backend just treating
InvalidXlogRecPtr as don't-become-sync-replica.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-10 14:29:47
Message-ID:	CAHGQGwF_OKN38_WDO_ap1dyp7QP0ajOuMLNxKsL14HbN6M8RWw@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>>>>> becoming synchronous standby. Thought?
>>>>>>
>>>>>> Absolutely. If we have replication clients that are not actually
>>>>>> capable of being standbys, there *must* be a way for the master
>>>>>> to know that.
>>>>>
>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>>>>> location? And that this only applied in 9.2?
>>>>>
>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>>>> streaming) as synchronous standby?
>>>>
>>>> Yes.
>>>>
>>>>> If so then yes, that is
>>>>> *definitely* a bug that should be fixed. We should never select a
>>>>> connection that's not even streaming log as standby!
>>>>
>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync
>>>> standby. Also this patch fixes another problem: currently only walsender
>>>> which reaches STREAMING state can become sync walsender. OTOH,
>>>> sync walsender thinks that walsender with higher priority will be sync one
>>>> whether its state is STREAMING, and switches to potential sync walsender.
>>>> So when the standby with higher priority connects to the master, we
>>>> might have no sync standby until it reaches the STREAMING state.
>>>> To fix this problem, the patch switches walsender's state from sync to
>>>> potential *after* walsender with higher priority has reached the
>>>> STREAMING state.
>>>>
>>>> We also should not select (1) background stream process forked from
>>>> pg_basebackup and (2) pg_receivexlog as sync standby because they
>>>> don't send back replication progress. To address this, I'm thinking to
>>>> introduce new option "NOSYNC" in "START_REPLICATION" command
>>>> as follows, and to change (1) and (2) so that they specify NOSYNC.
>>>>
>>>> START_REPLICATION XXX/XXX [NOSYNC]
>>>>
>>>> If the standby specifies NOSYNC option, it's never assigned as sync
>>>> standby even if its name is in synchronous_standby_names. Thought?
>>>
>>> The standby which always sends InvalidXLogRecPtr back should not
>>> become sync one. So instead of NOSYNC option, by checking whether
>>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
>>
>> We should not do this because Magnus is proposing the patch
>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
>> which breaks the above assumption at all. So we should introduce
>> something like NOSYNC option.
>
> Wouldn't the better choice there in that case be to give a switch to
> pg_receivexlog if you *want* it to be able to become a sync replica,
> and by default disallow it? And then keep the backend just treating
> InvalidXlogRecPtr as don't-become-sync-replica.

I don't object to pg_receivexlog becoming a sync standby at all, so at least
for me that switch is not necessary. What I'm worried about is the
background stream process forked from pg_basebackup. I think that
it should not run as a sync standby, but having it send back its replication
progress seems helpful, because a user can then see the progress in
pg_stat_replication. So I think something like the NOSYNC option is required.

Regards,

--
Fujii Masao

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-10 14:45:57
Message-ID:	CABUevEyDfaMwEPxQMkhBqTci0sTQwEV2WTu-NANEKtivw_FTOQ@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>>>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>>>>>> becoming synchronous standby. Thought?
>>>>>>>
>>>>>>> Absolutely. If we have replication clients that are not actually
>>>>>>> capable of being standbys, there *must* be a way for the master
>>>>>>> to know that.
>>>>>>
>>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>>>>>> location? And that this only applied in 9.2?
>>>>>>
>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>>>>> streaming) as synchronous standby?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> If so then yes, that is
>>>>>> *definitely* a bug that should be fixed. We should never select a
>>>>>> connection that's not even streaming log as standby!
>>>>>
>>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync
>>>>> standby. Also this patch fixes another problem: currently only walsender
>>>>> which reaches STREAMING state can become sync walsender. OTOH,
>>>>> sync walsender thinks that walsender with higher priority will be sync one
>>>>> whether its state is STREAMING, and switches to potential sync walsender.
>>>>> So when the standby with higher priority connects to the master, we
>>>>> might have no sync standby until it reaches the STREAMING state.
>>>>> To fix this problem, the patch switches walsender's state from sync to
>>>>> potential *after* walsender with higher priority has reached the
>>>>> STREAMING state.
>>>>>
>>>>> We also should not select (1) background stream process forked from
>>>>> pg_basebackup and (2) pg_receivexlog as sync standby because they
>>>>> don't send back replication progress. To address this, I'm thinking to
>>>>> introduce new option "NOSYNC" in "START_REPLICATION" command
>>>>> as follows, and to change (1) and (2) so that they specify NOSYNC.
>>>>>
>>>>> START_REPLICATION XXX/XXX [NOSYNC]
>>>>>
>>>>> If the standby specifies NOSYNC option, it's never assigned as sync
>>>>> standby even if its name is in synchronous_standby_names. Thought?
>>>>
>>>> The standby which always sends InvalidXLogRecPtr back should not
>>>> become sync one. So instead of NOSYNC option, by checking whether
>>>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
>>>
>>> We should not do this because Magnus is proposing the patch
>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
>>> which breaks the above assumption at all. So we should introduce
>>> something like NOSYNC option.
>>
>> Wouldn't the better choice there in that case be to give a switch to
>> pg_receivexlog if you *want* it to be able to become a sync replica,
>> and by default disallow it? And then keep the backend just treating
>> InvalidXlogRecPtr as don't-become-sync-replica.
>
> I don't object to making pg_receivexlog as sync standby at all. So at least
> for me, that switch is not necessary. What I'm worried about is the
> background stream process forked from pg_basebackup. I think that
> it should not run as sync standby but sending back its replication progress
> seems helpful because a user can see the progress from pg_stat_replication.
> So I'm thinking that something like NOSYNC option is required.

On principle, no. By default, yes.

How about:
pg_basebackup background: *never* sends flush location, and therefore
won't become sync replica
pg_receivexlog *optionally* sends flush location. By default won't
become sync replica, but can be made so with a switch

(this is on top of the "make sure pg_basebackup in *non-streaming*
mode can never be picked" of course)
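Magnus's proposal can be sketched from the client side: which flush position each tool would put in its status message. The client-kind enum, `reported_flush()` and the `sync_switch` parameter (standing in for the not-yet-named pg_receivexlog switch) are hypothetical; 0 again stands in for InvalidXLogRecPtr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogPosSketch;              /* simplified XLOG position */
#define INVALID_XLOG_POS ((XLogPosSketch) 0)  /* stands in for InvalidXLogRecPtr */

/* Hypothetical labels for the two WAL-streaming clients discussed. */
typedef enum { CLIENT_BASEBACKUP_STREAM, CLIENT_RECEIVEXLOG } ClientKindSketch;

/* Flush position a client would report under this proposal. */
static XLogPosSketch reported_flush(ClientKindSketch kind, bool sync_switch,
                                    XLogPosSketch flushed)
{
    switch (kind) {
    case CLIENT_BASEBACKUP_STREAM:
        return INVALID_XLOG_POS;                /* never reports, never sync */
    case CLIENT_RECEIVEXLOG:
        return sync_switch ? flushed            /* opted in: may become sync */
                           : INVALID_XLOG_POS;  /* default: never sync */
    }
    assert(!"unknown client kind");
    return INVALID_XLOG_POS;
}
```

With this split, the server side only needs to keep treating an invalid flush position as "never become sync replica".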

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-11 13:19:12
Message-ID:	CABUevEyRmeVrJetsoeiHCXZn98oFnKKaxFMKm-s1y3KfzE9Pzw@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Sun, Jun 10, 2012 at 2:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>> becoming synchronous standby. Thought?
>>>
>>> Absolutely. If we have replication clients that are not actually
>>> capable of being standbys, there *must* be a way for the master
>>> to know that.
>>
>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>> location? And that this only applied in 9.2?
>>
>> Are you saying we picked pg_basebackup *in backup mode* (not log
>> streaming) as synchronous standby?
>
> Yes.
>
>> If so then yes, that is
>> *definitely* a bug that should be fixed. We should never select a
>> connection that's not even streaming log as standby!
>
> Agreed. Attached patch prevents pg_basebackup from becoming sync
> standby. Also this patch fixes another problem: currently only walsender
> which reaches STREAMING state can become sync walsender. OTOH,
> sync walsender thinks that walsender with higher priority will be sync one
> whether its state is STREAMING, and switches to potential sync walsender.
> So when the standby with higher priority connects to the master, we
> might have no sync standby until it reaches the STREAMING state.
> To fix this problem, the patch switches walsender's state from sync to
> potential *after* walsender with higher priority has reached the
> STREAMING state.

This fix needs to be applied independently of the other discussions,
since it affects 9.1 and needs to be backpatched.

So - applied, and backpatched.

The issues wrt the pg_basebackup background process and pg_receivexlog
are only for 9.2...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-06-11 15:47:00
Message-ID:	CABUevEx+oitecsYAX7+7R5cAD5hriUF4R21Skiv9NSiDr+m2oA@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Mon, Jun 11, 2012 at 5:37 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Jun 11, 2012 at 3:24 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Sun, Jun 10, 2012 at 6:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Sun, Jun 10, 2012 at 11:45 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>>>>>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>>>>>>>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>>>>>>>>>> becoming synchronous standby. Thought?
>>>>>>>>>>>
>>>>>>>>>>> Absolutely. If we have replication clients that are not actually
>>>>>>>>>>> capable of being standbys, there *must* be a way for the master
>>>>>>>>>>> to know that.
>>>>>>>>>>
>>>>>>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>>>>>>>>>> location? And that this only applied in 9.2?
>>>>>>>>>>
>>>>>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>>>>>>>>> streaming) as synchronous standby?
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>> If so then yes, that is
>>>>>>>>>> *definitely* a bug that should be fixed. We should never select a
>>>>>>>>>> connection that's not even streaming log as standby!
>>>>>>>>>
>>>>>>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync
>>>>>>>>> standby. Also this patch fixes another problem: currently only walsender
>>>>>>>>> which reaches STREAMING state can become sync walsender. OTOH,
>>>>>>>>> sync walsender thinks that walsender with higher priority will be sync one
>>>>>>>>> whether its state is STREAMING, and switches to potential sync walsender.
>>>>>>>>> So when the standby with higher priority connects to the master, we
>>>>>>>>> might have no sync standby until it reaches the STREAMING state.
>>>>>>>>> To fix this problem, the patch switches walsender's state from sync to
>>>>>>>>> potential *after* walsender with higher priority has reached the
>>>>>>>>> STREAMING state.
>>>>>>>>>
>>>>>>>>> We also should not select (1) background stream process forked from
>>>>>>>>> pg_basebackup and (2) pg_receivexlog as sync standby because they
>>>>>>>>> don't send back replication progress. To address this, I'm thinking to
>>>>>>>>> introduce new option "NOSYNC" in "START_REPLICATION" command
>>>>>>>>> as follows, and to change (1) and (2) so that they specify NOSYNC.
>>>>>>>>>
>>>>>>>>> START_REPLICATION XXX/XXX [NOSYNC]
>>>>>>>>>
>>>>>>>>> If the standby specifies NOSYNC option, it's never assigned as sync
>>>>>>>>> standby even if its name is in synchronous_standby_names. Thought?
>>>>>>>>
>>>>>>>> The standby which always sends InvalidXLogRecPtr back should not
>>>>>>>> become sync one. So instead of NOSYNC option, by checking whether
>>>>>>>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
>>>>>>>
>>>>>>> We should not do this because Magnus is proposing the patch
>>>>>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
>>>>>>> which breaks the above assumption at all. So we should introduce
>>>>>>> something like NOSYNC option.
>>>>>>
>>>>>> Wouldn't the better choice there in that case be to give a switch to
>>>>>> pg_receivexlog if you *want* it to be able to become a sync replica,
>>>>>> and by default disallow it? And then keep the backend just treating
>>>>>> InvalidXlogRecPtr as don't-become-sync-replica.
>>>>>
>>>>> I don't object to making pg_receivexlog as sync standby at all. So at least
>>>>> for me, that switch is not necessary. What I'm worried about is the
>>>>> background stream process forked from pg_basebackup. I think that
>>>>> it should not run as sync standby but sending back its replication progress
>>>>> seems helpful because a user can see the progress from pg_stat_replication.
>>>>> So I'm thinking that something like NOSYNC option is required.
>>>>
>>>> On principle, no. By default, yes.
>>>>
>>>> How about:
>>>> pg_basebackup background: *never* sends flush location, and therefore
>>>> won't become sync replica
>>>> pg_receivexlog *optionally* sends flush location. By default won't
>>>> become sync replica, but can be made so with a switch
>>>
>>> Wouldn't a user who sees NULL in flush_location from pg_stat_replication
>>> misunderstand that pg_receivexlog (in default mode) and pg_basebackup
>>> background don't flush WAL files at all?
>>
>> That sounds like a "documentable issue".
>>
>> But maybe you're right, and we need the "never become sync" as a flag.
>
> You agreed to add something like NOSYNC option into START_REPLICATION command?

I'm on the fence. I was hoping somebody else would chime in with an
opinion as well.

I just realized this thread is on -admin. Moving it to -hackers so
more of the right people will spot it.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-06-11 16:06:58
Message-ID:	CAHGQGwHAb1KdipmdTJsXKPp3H4oqiJFdn02jrgnnrfKMLY08Xw@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Tue, Jun 12, 2012 at 12:47 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Mon, Jun 11, 2012 at 5:37 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Mon, Jun 11, 2012 at 3:24 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> On Sun, Jun 10, 2012 at 6:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> On Sun, Jun 10, 2012 at 11:45 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>> On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>>>>>>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>>>>>>>>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>>>>>>>>>>> becoming synchronous standby. Thought?
>>>>>>>>>>>>
>>>>>>>>>>>> Absolutely. If we have replication clients that are not actually
>>>>>>>>>>>> capable of being standbys, there *must* be a way for the master
>>>>>>>>>>>> to know that.
>>>>>>>>>>>
>>>>>>>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>>>>>>>>>>> location? And that this only applied in 9.2?
>>>>>>>>>>>
>>>>>>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>>>>>>>>>> streaming) as synchronous standby?
>>>>>>>>>>
>>>>>>>>>> Yes.
>>>>>>>>>>
>>>>>>>>>>> If so then yes, that is
>>>>>>>>>>> *definitely* a bug that should be fixed. We should never select a
>>>>>>>>>>> connection that's not even streaming log as standby!
>>>>>>>>>>
>>>>>>>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync
>>>>>>>>>> standby. Also this patch fixes another problem: currently only walsender
>>>>>>>>>> which reaches STREAMING state can become sync walsender. OTOH,
>>>>>>>>>> sync walsender thinks that walsender with higher priority will be sync one
>>>>>>>>>> whether its state is STREAMING, and switches to potential sync walsender.
>>>>>>>>>> So when the standby with higher priority connects to the master, we
>>>>>>>>>> might have no sync standby until it reaches the STREAMING state.
>>>>>>>>>> To fix this problem, the patch switches walsender's state from sync to
>>>>>>>>>> potential *after* walsender with higher priority has reached the
>>>>>>>>>> STREAMING state.
>>>>>>>>>>
>>>>>>>>>> We also should not select (1) the background stream process forked from
>>>>>>>>>> pg_basebackup or (2) pg_receivexlog as sync standby, because they
>>>>>>>>>> don't send back replication progress. To address this, I'm thinking of
>>>>>>>>>> introducing a new option "NOSYNC" in the "START_REPLICATION" command
>>>>>>>>>> as follows, and changing (1) and (2) so that they specify NOSYNC.
>>>>>>>>>>
>>>>>>>>>> START_REPLICATION XXX/XXX [NOSYNC]
>>>>>>>>>>
>>>>>>>>>> If the standby specifies the NOSYNC option, it is never assigned as sync
>>>>>>>>>> standby, even if its name is in synchronous_standby_names. Thoughts?
>>>>>>>>>
>>>>>>>>> A standby that always sends back InvalidXLogRecPtr should not
>>>>>>>>> become the sync one. So instead of a NOSYNC option, we can avoid a
>>>>>>>>> problematic sync standby by checking whether InvalidXLogRecPtr is sent.
>>>>>>>>
>>>>>>>> We should not do this, because Magnus is proposing a patch
>>>>>>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
>>>>>>>> which would break the above assumption entirely. So we should introduce
>>>>>>>> something like a NOSYNC option.
>>>>>>>
>>>>>>> Wouldn't the better choice there in that case be to give a switch to
>>>>>>> pg_receivexlog if you *want* it to be able to become a sync replica,
>>>>>>> and by default disallow it? And then keep the backend just treating
>>>>>>> InvalidXlogRecPtr as don't-become-sync-replica.
>>>>>>
>>>>>> I don't object to making pg_receivexlog a sync standby at all, so at least
>>>>>> for me that switch is not necessary. What I'm worried about is the
>>>>>> background stream process forked from pg_basebackup. I think that
>>>>>> it should not run as sync standby, but having it send back its replication
>>>>>> progress seems helpful, because a user can then see the progress in
>>>>>> pg_stat_replication. So I think something like a NOSYNC option is required.
>>>>>
>>>>> On principle, no. By default, yes.
>>>>>
>>>>> How about:
>>>>> pg_basebackup background: *never* sends flush location, and therefore
>>>>> won't become sync replica
>>>>> pg_receivexlog *optionally* sends flush location. By default it won't
>>>>> become sync replica, but can be made so with a switch
>>>>
>>>> Wouldn't a user who sees NULL in flush_location in pg_stat_replication
>>>> misread it as meaning that pg_receivexlog (in default mode) and the
>>>> pg_basebackup background process don't flush WAL files at all?
>>>
>>> That sounds like a "documentable issue".
>>>
>>> But maybe you're right, and we need the "never become sync" as a flag.
>>
>> Do you agree with adding something like a NOSYNC option to the START_REPLICATION command?
>
> I'm on the fence. I was hoping somebody else would chime in with an
> opinion as well.
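The candidate rules in this exchange (only a STREAMING walsender may be sync, a client that never reports a flush location is skipped, and the proposed NOSYNC opt-out) can be compared side by side in a small model. The following Python sketch is illustrative only: the WalSender record, its field names, the "0/0" stand-in for InvalidXLogRecPtr and the nosync flag are assumptions, and this is not PostgreSQL's walsender code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Assumption: InvalidXLogRecPtr is modelled as the location "0/0".
INVALID_LSN: Tuple[int, int] = (0, 0)


def parse_lsn(text: str) -> Tuple[int, int]:
    """Parse a 9.1-style "XLOGID/XRECOFF" location such as "3/13527A10"."""
    xlogid, xrecoff = text.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


@dataclass
class WalSender:
    """Hypothetical, simplified view of one walsender slot."""
    name: str             # application_name of the connected client
    priority: int         # 0 = asynchronous; 1 = highest sync priority
    state: str            # e.g. "BACKUP", "CATCHUP", "STREAMING"
    flush: str = "0/0"    # last flush location the client reported
    nosync: bool = False  # the proposed (hypothetical) NOSYNC option


def eligible(ws: WalSender) -> bool:
    """Apply every exclusion rule discussed in the thread."""
    return (
        ws.priority > 0
        and ws.state == "STREAMING"             # never a backup-mode sender
        and not ws.nosync                        # client opted out explicitly
        and parse_lsn(ws.flush) != INVALID_LSN  # client never reports flush
    )


def choose_sync_standby(senders: Sequence[WalSender]) -> Optional[WalSender]:
    """Return the eligible sender with the lowest priority number, or None.

    Ties keep the earliest sender, like a scan over the walsender array.
    Because ineligible senders are skipped, a higher-priority standby that
    is still catching up cannot take the sync role until it is STREAMING.
    """
    best: Optional[WalSender] = None
    for ws in senders:
        if eligible(ws) and (best is None or ws.priority < best.priority):
            best = ws
    return best


if __name__ == "__main__":
    senders = [
        WalSender("standby1", 1, "CATCHUP", "3/10000000"),
        WalSender("standby2", 2, "STREAMING", "3/13527A10"),
        WalSender("pg_basebackup", 1, "BACKUP"),
    ]
    print(choose_sync_standby(senders).name)  # standby2
```

In the example the backup-mode client and the still-catching-up standby1 are both passed over, so standby2 keeps the sync role even though its priority number is higher.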

Regards,

--
Fujii Masao

From:	Lonni J Friedman <netllama(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-11 17:37:41
Message-ID:	CAP=oouE43rQMs=hyGTiOZKPYFHoRvucvqiBfsSDYjf9uMU0kNg@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Fri, Jun 8, 2012 at 7:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Sat, Jun 9, 2012 at 4:30 AM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:
>> On Thu, Jun 7, 2012 at 11:04 PM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
>>> On 06/08/2012 09:01 AM, Lonni J Friedman wrote:
>>>>
>>>> On Thu, Jun 7, 2012 at 5:07 PM, Jerry Sievers <gsievers19(at)comcast(dot)net>
>>>> wrote:
>>>>>
>>>>> You might try stopping pg_basebackup in place with SIGSTOP and checking
>>>>> if the problem goes away. Send SIGCONT and you should start seeing the
>>>>> sluggishness again.
>>>>>
>>>>> If verified, then any sort of throttling mechanism should work.
>>>>
>>>>
>>>> I'm certain that the problem is triggered only when pg_basebackup is
>>>> running. It's very predictable, and goes away as soon as pg_basebackup
>>>> finishes running. What do you mean by a throttling mechanism?
>>>
>>>
>>> Sure, it only happens when pg_basebackup is running. But if you *pause*
>>> pg_basebackup, so it's still running but not currently doing work, does the
>>> problem go away? Does it come back when you unpause pg_basebackup? That's
>>> what Jerry was telling you to try.
>>>
>>> If the problem goes away when you pause pg_basebackup and comes back when
>>> you unpause it, it's probably a system load problem.
>>>
>>> If it doesn't go away, it's more likely to be a locking issue or something
>>> _other_ than simple load.
>>>
>>> SIGSTOP ("kill -STOP") pauses a process, and SIGCONT ("kill -CONT") resumes
>>> it, so on Linux you can use these to try and find out. When you SIGSTOP
>>> pg_basebackup then the postgres backend associated with it should block
>>> shortly afterwards as its buffers fill up and it can't send more data, so
>>> the load should come off the server.
>>>
>>> A "throttling mechanism" refers to anything that limits the rate or speed of
>>> a thing. In this case, what you want to do if your problem is system
>>> overload is to limit the speed at which pg_basebackup does its work so other
>>> things can still get work done. In other words you want to throttle it.
>>> Typical throttling mechanisms include the "ionice" and "renice" commands to
>>> change I/O and CPU priority, respectively.
>>>
>>> Note that you may need to change the priority of the *backend* that
>>> pg_basebackup is using, not necessarily the pg_basebackup command itself.
>>> I haven't done enough with Pg's replication to know how that works, so
>>> someone else will have to fill that bit in.
>>
>> Thanks for your reply. I've confirmed that issuing a SIGSTOP does
>> eliminate the thrashing, and issuing a SIGCONT resumes the thrash.
>>
>> I've looked at iostat output both before & during pg_basebackup runs,
>> and I'm not seeing any indication that the problem is due to disk IO
>> bottlenecks. The numbers don't vary very much at all between the good
>> & bad times. This is typical when pg_basebackup is running:
>> ########
>> Device:  rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> md0        0.00    0.00  67.76  68.62   4.42   1.46     88.34      0.00   0.00     0.00     0.00   0.00   0.00
>> ########
>>
>> and this is when the system is ok:
>> ########
>> Device:  rrqm/s  wrqm/s    r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> md0        0.00    0.00  68.04  68.56   4.44   1.46     88.39      0.00   0.00     0.00     0.00   0.00   0.00
>> ########
>>
>>
>> I looked at vmstat output, but nothing is jumping out at me as being
>> dramatically different when pg_basebackup is running. Swap-in and
>> swap-out are zero 100% of the time for the good & bad perf cases. I
>> can post example output if someone is interested, or if there's
>> something specific that I should be looking at as a potential problem,
>> let me know.
>
> Did you set synchronous_standby_names to '*'? If so, the problem you
> encountered can happen.
>
> When synchronous_standby_names is '*', you cannot control which
> standbys take the role of synchronous standby. A standby that you
> expect to run asynchronously might become the synchronous one. So
> my guess is that at first one of your three standbys was running as
> synchronous standby, and all queries were executed normally. But
> when you started pg_basebackup, pg_basebackup unexpectedly
> got the role of synchronous standby from another standby. Since
> pg_basebackup doesn't send the information about replication
> progress back to the master, all queries (more precisely, transaction
> commit) got stuck, and kept waiting for the reply from synchronous
> standby.
>
> You can avoid this problem by setting synchronous_standby_names
> to the names of your standbys instead of '*'.

I don't have synchronous_standby_names set at all. I'm only doing
asynchronous replication.
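Fujii's explanation turns on how synchronous_standby_names becomes a per-standby priority. The Python sketch below models that mapping under stated assumptions: an entry's 1-based position in the list is its priority, '*' matches any application_name, names are compared case-insensitively, and a standby that matches nothing gets 0 and stays asynchronous. The client name basebackup_client is hypothetical, and the real parser's quoting rules are ignored.

```python
from typing import Dict, Iterable


def standby_priority(setting: str, application_name: str) -> int:
    """Return the sync priority of one standby; 0 means asynchronous.

    Simplified model: entries are split on commas and trimmed, the first
    matching entry's 1-based position is the priority, '*' matches any
    name, and other names are compared case-insensitively (an assumption).
    """
    names = [n.strip() for n in setting.split(",") if n.strip()]
    for position, name in enumerate(names, start=1):
        if name == "*" or name.lower() == application_name.lower():
            return position
    return 0


def priorities(setting: str, apps: Iterable[str]) -> Dict[str, int]:
    """Priority of each connected client under one setting."""
    return {app: standby_priority(setting, app) for app in apps}


if __name__ == "__main__":
    clients = ["standby1", "standby2", "basebackup_client"]
    for setting in ("", "*", "standby1, standby2"):
        print(repr(setting), priorities(setting, clients))
    # ''                   -> every client gets 0: all asynchronous
    # '*'                  -> every client gets 1, including the backup client
    # 'standby1, standby2' -> 1 and 2 for the named standbys, 0 for the rest
```

With the setting left empty, as Lonni reports, every client comes out at 0. No connection (pg_basebackup included) can then be picked as a synchronous standby, so the '*' scenario does not fit his setup.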

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fabricio <fabrixio1(at)hotmail(dot)com>
Cc:	pgsql-admin(at)postgresql(dot)org
Subject:	Re: could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat": No such file or directory
Date:	2012-06-12 14:27:43
Message-ID:	CABUevEzpyuxxZRV+6g09gppvnNEnxSBZDoKpYoRvoSrnQKtHkw@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Thu, Jun 7, 2012 at 8:01 PM, Fabricio <fabrixio1(at)hotmail(dot)com> wrote:
> Hi.
>
> I have this problem:
>
> I have PostgreSQL 9.1.3, and last night it crashed.
>
> This was the first error after an autovacuum (the night before last):
>
> <2012-06-06 00:59:07 MDT    814 4fceffbb.32e >LOG: autovacuum: found orphan temp table "(null)"."tmpmuestadistica" in database "dbRX"
> <2012-06-06 01:05:26 MDT    1854 4fc7d1eb.73e >LOG: could not rename temporary statistics file "pg_stat_tmp/pgstat.tmp" to "pg_stat_tmp/pgstat.stat": No such file or directory
> <2012-06-06 01:05:28 MDT    1383 4fcf0136.567 >ERROR: tuple concurrently updated
> <2012-06-06 01:05:28 MDT    1383 4fcf0136.567 >CONTEXT: automatic vacuum of table "global.pg_catalog.pg_attrdef"
> <2012-06-06 01:06:09 MDT    1851 4fc7d1eb.73b >ERROR: xlog flush request 4/E29EE490 is not satisfied --- flushed only to 3/13527A10
> <2012-06-06 01:06:09 MDT    1851 4fc7d1eb.73b >CONTEXT: writing block 0 of relation base/311360/12244_vm
> <2012-06-06 01:06:10 MDT    1851 4fc7d1eb.73b >ERROR: xlog flush request 4/E29EE490 is not satisfied --- flushed only to 3/13527A10
> <2012-06-06 01:06:10 MDT    1851 4fc7d1eb.73b >CONTEXT: writing block 0 of relation base/311360/12244_vm
> <2012-06-06 01:06:10 MDT    1851 4fc7d1eb.73b >WARNING: could not write block 0 of base/311360/12244_vm
> <2012-06-06 01:06:10 MDT    1851 4fc7d1eb.73b >DETAIL: Multiple failures --- write error might be permanent.
>
>
> Last night it was terminated by signal 6.
>
> <2012-06-07 01:36:44 MDT    2509 4fd05a0c.9cd >LOG: startup process (PID 2525) was terminated by signal 6: Aborted
> <2012-06-07 01:36:44 MDT    2509 4fd05a0c.9cd >LOG: aborting startup due to startup process failure
> <2012-06-07 01:37:37 MDT    2680 4fd05a41.a78 >LOG: database system shutdown was interrupted; last known up at 2012-06-07 01:29:40 MDT
> <2012-06-07 01:37:37 MDT    2680 4fd05a41.a78 >LOG: could not open file "pg_xlog/000000010000000300000013" (log file 3, segment 19): No such file or directory
> <2012-06-07 01:37:37 MDT    2680 4fd05a41.a78 >LOG: invalid primary checkpoint record
>
> And the only option was pg_resetxlog.
>
> After this, many queries failed with this error:
> <2012-06-07 09:24:22 MDT 1306 4fd0c7a6.51a >ERROR: missing chunk number 0 for toast value 393330 in pg_toast_2619
> <2012-06-07 09:24:31 MDT 1306 4fd0c7a6.51a >ERROR: missing chunk number 0 for toast value 393332 in pg_toast_2619
>
> I lost some databases.
>
> I recreated the cluster with initdb and then restored the databases
> that I could back up (for the others I restored an old backup).
>
> There were no disk space or permissions problems, and no filesystem or disk errors.
>
> Can you help me figure out what happened?

I'd say that everything still points to a filesystem error. Have you
tried unmounting it and running an offline check?
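The WAL locations in the quoted log can be cross-checked against the file reported missing at startup. This minimal Python sketch rests on three assumptions: the 9.1-era layout (a location X/Y is a log id plus a byte offset, and a segment file name is the timeline, log id and segment number printed as 8 hex digits each), the default 16 MB segment size, and timeline 1, as in the file name above.

```python
# Assumption: default 16 MB WAL segments (not changed at build time).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(lsn: str, timeline: int = 1) -> str:
    """Map a 9.1-era "LOGID/OFFSET" location to its WAL segment file name."""
    log_id_hex, offset_hex = lsn.split("/")
    log_id = int(log_id_hex, 16)
    segment = int(offset_hex, 16) // WAL_SEGMENT_SIZE
    return f"{timeline:08X}{log_id:08X}{segment:08X}"


if __name__ == "__main__":
    print(wal_file_name("3/13527A10"))  # 000000010000000300000013
    print(wal_file_name("4/E29EE490"))  # 0000000100000004000000E2
```

Under those assumptions, the "flushed only to 3/13527A10" position falls in segment 000000010000000300000013, the same file the startup process could not open. The requested 4/E29EE490 lies further ahead, in log file 4.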

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Lonni J Friedman <netllama(at)gmail(dot)com>
Cc:	Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-12 17:49:11
Message-ID:	CAHGQGwHEz+B9SpxBNwRyr83YYJ-SugZ1tQFZY2g9n29x4a_Crw@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Tue, Jun 12, 2012 at 2:37 AM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:
> I don't have synchronous_standby_names set at all. I'm only doing
> asynchronous replication.

Hmm... For now, I have no idea what happened in your environment.
Could you provide a self-contained test case?
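For anyone who wants to script the pause test discussed earlier in this thread, the SIGSTOP/SIGCONT check and renice-style throttling can be wrapped in a few helpers. This is a Linux-only Python sketch (it reads process state from /proc). You must look up the PID separately, whether it is pg_basebackup's or that of the walsender backend serving it. I/O priority (ionice) is not covered, because the Python standard library has no call for it.

```python
import os
import signal
import subprocess
import time


def process_state(pid: int) -> str:
    """Return the one-letter state field from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        # The command name is wrapped in parentheses and may contain spaces,
        # so take the text after the last ')' and read its first field.
        return f.read().rsplit(")", 1)[1].split()[0]


def pause(pid: int) -> None:
    """Equivalent of `kill -STOP <pid>`."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Equivalent of `kill -CONT <pid>`."""
    os.kill(pid, signal.SIGCONT)


def lower_cpu_priority(pid: int, niceness: int = 19) -> int:
    """Set the nice value of pid (like `renice`) and return the new value.

    Without privileges you can normally only raise the nice value,
    i.e. lower the process's CPU priority.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)


if __name__ == "__main__":
    # Demo on a harmless stand-in process instead of a real pg_basebackup.
    child = subprocess.Popen(["sleep", "30"])
    try:
        pause(child.pid)
        time.sleep(0.2)
        print("after SIGSTOP:", process_state(child.pid))  # 'T' means stopped
        resume(child.pid)
        print("nice value now:", lower_cpu_priority(child.pid))
    finally:
        child.kill()
        child.wait()
```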

Regards,

--
Fujii Masao

From:	Lonni J Friedman <netllama(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-12 18:37:42
Message-ID:	CAP=oouGu=Pcdk3s4ceVZkdcpTQdt3LAiGo1ukkdYBibbc1+iWQ@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Tue, Jun 12, 2012 at 10:49 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hmm... For now, I have no idea what happened in your environment.
> Could you provide a self-contained test case?

I'm running the following, which gets piped over ssh to a remote
server (at gigabit ethernet speed):
pg_basebackup -v -D - -x -Ft -U postgres

One thing that I've discovered is that the load on the server
correlates directly with the speed at which the output is piped to the
remote server: if I throttle that speed back, the load drops.
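Since the load tracks the rate of the pipe, capping that rate is one way to throttle the backup. An existing tool such as pv (with its -L rate limit) can do this. As a self-contained illustration, the Python sketch below copies stdin to stdout at no more than a given average rate. The script name and rate in the usage line are hypothetical. Whether this relieves the server depends on the server side blocking once socket buffers fill, as Craig described earlier in the thread.

```python
import sys
import time
from typing import BinaryIO


def throttled_copy(src: BinaryIO, dst: BinaryIO, bytes_per_sec: int,
                   chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst without exceeding bytes_per_sec on average.

    After each chunk, sleep until the elapsed time matches the time the
    bytes copied so far should have taken at the target rate.
    Returns the number of bytes copied.
    """
    start = time.monotonic()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        delay = copied / bytes_per_sec - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
    dst.flush()
    return copied


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: ... | python3 throttle.py BYTES_PER_SEC | ...
    throttled_copy(sys.stdin.buffer, sys.stdout.buffer, int(sys.argv[1]))
```

For example, with hypothetical host and file names:
    pg_basebackup -v -D - -x -Ft -U postgres | python3 throttle.py 8000000 | ssh backuphost 'cat > base.tar'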

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Lonni J Friedman <netllama(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-12 18:39:23
Message-ID:	CABUevEzcJNNRHQNn=USd9McPShLuR4UT41ycKQJG6356ifti5A@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Tue, Jun 12, 2012 at 8:37 PM, Lonni J Friedman <netllama(at)gmail(dot)com> wrote:
> I'm running the following, which gets piped over ssh to a remote
> server (at gigabit ethernet speed):
> pg_basebackup -v -D - -x -Ft -U postgres
>
> One thing that I've discovered is that if I throttle back the speed of
> what is getting piped to the remote server, that directly correlates
> to the load on the server.

That seems to indicate that you're overloading the I/O system... Or
the CPU, but more likely I/O.
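One way to tell whether the bottleneck is the CPU or the I/O system is to sample the aggregate cpu line of /proc/stat twice and see how the elapsed time splits between busy, iowait and idle. This is a Linux-specific, illustrative Python sketch. The field order (user, nice, system, idle, iowait, irq, softirq, steal) follows proc(5), and the sketch assumes a kernel recent enough to report all eight fields.

```python
import time
from typing import Dict, List


def read_cpu_times() -> List[int]:
    """First eight fields of the aggregate 'cpu' line in /proc/stat (Linux).

    Order per proc(5): user, nice, system, idle, iowait, irq, softirq, steal.
    """
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(v) for v in line.split()[1:9]]
    raise RuntimeError("no aggregate 'cpu' line in /proc/stat")


def cpu_breakdown(interval: float = 1.0) -> Dict[str, float]:
    """Sample /proc/stat twice; return each category's share of the interval."""
    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()
    # Clamp at 0: proc(5) warns that iowait can occasionally decrease.
    user, nice, system, idle, iowait, irq, softirq, steal = (
        max(0, a - b) for a, b in zip(after, before)
    )
    total = user + nice + system + idle + iowait + irq + softirq + steal
    if total == 0:
        raise RuntimeError("no CPU time elapsed; use a longer interval")
    return {
        "busy": (user + nice + system + irq + softirq) / total,
        "iowait": iowait / total,
        "idle": idle / total,
        "steal": steal / total,
    }


if __name__ == "__main__":
    # Run once while pg_basebackup is active and once while it is paused.
    for name, share in cpu_breakdown(1.0).items():
        print(f"{name:>6}: {share:6.1%}")
```

A large iowait share while pg_basebackup runs, falling back when it is paused, would point to I/O. A large busy share would point to the CPU.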

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-06-20 11:18:23
Message-ID:	CABUevEzpPTMxCNrRK7yFqeVF7YgFQ3dX7JRACitjE9Ji9LNE4A@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Mon, Jun 11, 2012 at 6:06 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Jun 12, 2012 at 12:47 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Mon, Jun 11, 2012 at 5:37 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Mon, Jun 11, 2012 at 3:24 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> On Sun, Jun 10, 2012 at 6:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>> On Sun, Jun 10, 2012 at 11:45 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>> On Sun, Jun 10, 2012 at 4:29 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>> On Sun, Jun 10, 2012 at 11:10 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>>> On Sun, Jun 10, 2012 at 4:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>>>> On Sun, Jun 10, 2012 at 10:34 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>>>>> On Sun, Jun 10, 2012 at 9:25 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>>>>>>>> On Sun, Jun 10, 2012 at 7:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>>>>>>> On Sat, Jun 9, 2012 at 2:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>>>>>>>>>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>>>>>>>>>>>>> This seems a bug. I think we should prevent pg_basebackup from
>>>>>>>>>>>>>> becoming synchronous standby. Thought?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Absolutely. If we have replication clients that are not actually
>>>>>>>>>>>>> capable of being standbys, there *must* be a way for the master
>>>>>>>>>>>>> to know that.
>>>>>>>>>>>>
>>>>>>>>>>>> I thought we fixed this already by sending InvalidXlogRecPtr as flush
>>>>>>>>>>>> location? And that this only applied in 9.2?
>>>>>>>>>>>>
>>>>>>>>>>>> Are you saying we picked pg_basebackup *in backup mode* (not log
>>>>>>>>>>>> streaming) as synchronous standby?
>>>>>>>>>>>
>>>>>>>>>>> Yes.
>>>>>>>>>>>
>>>>>>>>>>>> If so then yes, that is
>>>>>>>>>>>> *definitely* a bug that should be fixed. We should never select a
>>>>>>>>>>>> connection that's not even streaming log as standby!
>>>>>>>>>>>
>>>>>>>>>>> Agreed. Attached patch prevents pg_basebackup from becoming sync
>>>>>>>>>>> standby. Also this patch fixes another problem: currently only walsender
>>>>>>>>>>> which reaches STREAMING state can become sync walsender. OTOH,
>>>>>>>>>>> sync walsender thinks that walsender with higher priority will be sync one
>>>>>>>>>>> regardless of whether its state is STREAMING, and switches to potential sync walsender.
>>>>>>>>>>> So when the standby with higher priority connects to the master, we
>>>>>>>>>>> might have no sync standby until it reaches the STREAMING state.
>>>>>>>>>>> To fix this problem, the patch switches walsender's state from sync to
>>>>>>>>>>> potential *after* walsender with higher priority has reached the
>>>>>>>>>>> STREAMING state.
>>>>>>>>>>>
>>>>>>>>>>> We also should not select (1) background stream process forked from
>>>>>>>>>>> pg_basebackup and (2) pg_receivexlog as sync standby because they
>>>>>>>>>>> don't send back replication progress. To address this, I'm thinking to
>>>>>>>>>>> introduce new option "NOSYNC" in "START_REPLICATION" command
>>>>>>>>>>> as follows, and to change (1) and (2) so that they specify NOSYNC.
>>>>>>>>>>>
>>>>>>>>>>> START_REPLICATION XXX/XXX [NOSYNC]
>>>>>>>>>>>
>>>>>>>>>>> If the standby specifies NOSYNC option, it's never assigned as sync
>>>>>>>>>>> standby even if its name is in synchronous_standby_names. Thought?
>>>>>>>>>>
>>>>>>>>>> The standby which always sends InvalidXLogRecPtr back should not
>>>>>>>>>> become sync one. So instead of NOSYNC option, by checking whether
>>>>>>>>>> InvalidXLogRecPtr is sent, we can avoid problematic sync standby.
>>>>>>>>>
>>>>>>>>> We should not do this because Magnus is proposing the patch
>>>>>>>>> (http://archives.postgresql.org/pgsql-hackers/2012-06/msg00348.php)
>>>>>>>>> which breaks the above assumption at all. So we should introduce
>>>>>>>>> something like NOSYNC option.
>>>>>>>>
>>>>>>>> Wouldn't the better choice there in that case be to give a switch to
>>>>>>>> pg_receivexlog if you *want* it to be able to become a sync replica,
>>>>>>>> and by default disallow it? And then keep the backend just treating
>>>>>>>> InvalidXlogRecPtr as don't-become-sync-replica.
>>>>>>>
>>>>>>> I don't object to making pg_receivexlog as sync standby at all. So at least
>>>>>>> for me, that switch is not necessary. What I'm worried about is the
>>>>>>> background stream process forked from pg_basebackup. I think that
>>>>>>> it should not run as sync standby but sending back its replication progress
>>>>>>> seems helpful because a user can see the progress from pg_stat_replication.
>>>>>>> So I'm thinking that something like NOSYNC option is required.
>>>>>>
>>>>>> On principle, no. By default, yes.
>>>>>>
>>>>>> How about:
>>>>>> pg_basebackup background: *never* sends flush location, and therefore
>>>>>> won't become sync replica
>>>>>> pg_receivexlog *optionally* sends flush location. By default won't
>>>>>> become sync replica, but can be made so with a switch
>>>>>
>>>>> Wouldn't a user who sees NULL in flush_location from pg_stat_replication
>>>>> misunderstand that pg_receivexlog (in default mode) and pg_basebackup
>>>>> background don't flush WAL files at all?
>>>>
>>>> That sounds like a "documentable issue".
>>>>
>>>> But maybe you're right, and we need the "never become sync" as a flag.
>>>
>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>
>> I'm on the fence. I was hoping somebody else would chime in with an
>> opinion as well.
>
> +1

Nobody else with any opinion on this? :(
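For readers following the quoted proposal: the `XXX/XXX` argument to `START_REPLICATION` is a WAL location written as two hexadecimal halves. The sketch below parses that form plus the *proposed* `NOSYNC` keyword. `NOSYNC` was only a proposal in this thread (it is never adopted below), `parse_start_replication` is a hypothetical helper, and combining the halves into one 64-bit number follows how 9.3 and later treat positions (9.2 and earlier skip the last segment of each logical xlog file, so the number is not a plain byte offset there).

```python
import re
from typing import NamedTuple


class StartReplication(NamedTuple):
    lsn: int      # WAL position, combining the hi/lo halves as 9.3+ does
    nosync: bool  # the NOSYNC keyword proposed (not adopted) in this thread


# START_REPLICATION hi/lo [NOSYNC]; hi and lo are up to 8 hex digits each.
_START_REPLICATION = re.compile(
    r"START_REPLICATION\s+([0-9A-Fa-f]{1,8})/([0-9A-Fa-f]{1,8})(?:\s+(NOSYNC))?\s*"
)


def parse_start_replication(command: str) -> StartReplication:
    """Parse the command form discussed in the thread, including the hypothetical NOSYNC."""
    m = _START_REPLICATION.fullmatch(command.strip())
    if m is None:
        raise ValueError(f"unrecognized command: {command!r}")
    hi, lo, nosync = m.groups()
    return StartReplication(lsn=(int(hi, 16) << 32) | int(lo, 16), nosync=nosync is not None)
```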

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-06-20 11:58:32
Message-ID:	CA+U5nM+SGGr02xLLCCppjw2Y4q1Lnb35rtuGyNkoCjdo=aUsog@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On 11 June 2012 23:47, Magnus Hagander <magnus(at)hagander(dot)net> wrote:

>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>
> I'm on the fence. I was hoping somebody else would chime in with an
> opinion as well.

Why would you add it to synchronous_standby_names and then explicitly ignore it?

I don't see why you'd want this.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Lonni J Friedman <netllama(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-20 12:02:57
Message-ID:	CA+U5nMJmuYsFD35SvjhKat0Fcd=8DiALxB8JUdY9kFvF3=bStw@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On 13 June 2012 02:39, Magnus Hagander <magnus(at)hagander(dot)net> wrote:

>> I'm running the following, which gets piped over ssh to a remote
>> server (at gigabit ethernet speed):
>> pg_basebackup -v -D - -x -Ft -U postgres
>>
>> One thing that I've discovered is that if I throttle back the speed of
>> what is getting piped to the remote server, that directly correlates
>> to the load on the server.
>
> That seems to indicate that you're overloading the I/O system... Or
> the CPU, but more likely I/O.

CPU utilisation of ssl connections is bad. If network bandwidth is
good, perhaps running WALSender at full speed with encryption can tank
the server.

An effect related to caching of WAL files? Perhaps we need to mark
them as FADV_DONTNEED at some point.

Hard to say without detailed analysis.
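The FADV_DONTNEED idea refers to the POSIX `posix_fadvise()` hint telling the kernel that cached pages of a file range are no longer needed. A minimal sketch using Python's `os.posix_fadvise` (available on Linux and some other Unix systems, not on macOS or Windows); `drop_from_page_cache` is a hypothetical helper, and whether this would actually help here was left open in the thread. Note that dirty pages are generally not dropped until they have been written back.

```python
import os


def drop_from_page_cache(path: str) -> bool:
    """Hint that cached pages of `path` may be evicted; return False if unsupported."""
    if not hasattr(os, "posix_fadvise"):
        return False  # e.g. macOS/Windows: os has no posix_fadvise
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 together mean "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True
```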

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Lonni J Friedman <netllama(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-20 13:27:17
Message-ID:	CAP=oouFaJ82i+GZPCqLFmByhP3TY0XXYG7qPyiAnUF6h712HBA@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Wed, Jun 20, 2012 at 5:02 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 13 June 2012 02:39, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
>>> I'm running the following, which gets piped over ssh to a remote
>>> server (at gigabit ethernet speed):
>>> pg_basebackup -v -D - -x -Ft -U postgres
>>>
>>> One thing that I've discovered is that if I throttle back the speed of
>>> what is getting piped to the remote server, that directly correlates
>>> to the load on the server.
>>
>> That seems to indicate that you're overloading the I/O system... Or
>> the CPU, but more likely I/O.
>
> CPU utilisation of ssl connections is bad. If network bandwidth is
> good, perhaps running WALSender at full speed with encryption can tank
> the server.

I'm not using SSL.

From:	"Welty, Richard" <rwelty(at)ltionline(dot)com>
To:	"Lonni J Friedman" <netllama(at)gmail(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc:	"Magnus Hagander" <magnus(at)hagander(dot)net>, "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, "Craig Ringer" <ringerc(at)ringerc(dot)id(dot)au>, "Jerry Sievers" <gsievers19(at)comcast(dot)net>, <pgsql-admin(at)postgresql(dot)org>
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-20 13:56:38
Message-ID:	C35FDD5FDF7E584991A6AAF8F1A8425F0226AD75@ltischx01.lti.int
Lists:	pgsql-admin pgsql-hackers

Lonni J Friedman writes:

>I'm not using SSL.

ummm, ssh uses ssl.

richard

From:	Lonni J Friedman <netllama(at)gmail(dot)com>
To:	"Welty, Richard" <rwelty(at)ltionline(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject:	Re: pg_basebackup blocking all queries with horrible performance
Date:	2012-06-20 14:06:30
Message-ID:	CAP=oouHXYFjohM0dxVp8Qw=4C7BPmdVAHYf-gQ=aCkSmyw8_nA@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Wed, Jun 20, 2012 at 6:56 AM, Welty, Richard <rwelty(at)ltionline(dot)com> wrote:
> Lonni J Friedman writes:
>
>>I'm not using SSL.
>
> ummm, ssh uses ssl.
>

Sure, although I thought that Simon was referring to the database
itself. However, I don't think ssh is the problem, as I can scp a
file from the server and the load doesn't go crazy, nor does the
database ground to a halt.
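Lonni reported above that throttling what is piped to the remote server lowers the load in direct proportion. A minimal rate limiter for such a pipe is sketched below; `throttled_copy` is a hypothetical helper, not a PostgreSQL tool. Because the pipe blocks when the limiter sleeps, pg_basebackup upstream is slowed by backpressure as well.

```python
import time
from typing import BinaryIO


def throttled_copy(src: BinaryIO, dst: BinaryIO, bytes_per_sec: int,
                   chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst, sleeping so the average rate stays at or below bytes_per_sec."""
    if bytes_per_sec <= 0:
        raise ValueError("bytes_per_sec must be positive")
    start = time.monotonic()
    total = 0
    while True:
        buf = src.read(chunk_size)
        if not buf:
            break
        dst.write(buf)
        total += len(buf)
        # Earliest time at which `total` bytes may have been sent under the cap.
        delay = start + total / bytes_per_sec - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    dst.flush()
    return total
```

Wrapped in a small script that calls `throttled_copy(sys.stdin.buffer, sys.stdout.buffer, rate)`, it could sit between `pg_basebackup -v -D - -x -Ft -U postgres` and `ssh` in the pipeline.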

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-06-20 18:18:39
Message-ID:	CA+TgmoYrrx32Qg5BuLFAh+jUrWbLWcZMDg0jgU-Z3BGTKPqg=w@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>
>>> I'm on the fence. I was hoping somebody else would chime in with an
>>> opinion as well.
>>
>> +1
>
> Nobody else with any opinion on this? :(

I don't think we really need a NOSYNC flag at this point. Just not
setting the flush location in clients that make a point of flushing in
a timely fashion seems fine.
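The suggestion relies on the standby status reply carrying separate write, flush and apply positions, with an invalid (zero) position meaning "not reported". The sketch below packs such a reply following the "Standby status update" ('r') message layout documented for the streaming replication protocol in 9.3 and later; the 9.2-era on-wire layout differs in details (for instance it has no reply-requested byte). `build_status_update` is a hypothetical helper.

```python
import struct
from datetime import datetime, timezone

INVALID_XLOG_REC_PTR = 0  # a zero position means "not reported"
PG_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)  # protocol clock epoch


def build_status_update(write: int, flush: int, apply: int, now: datetime,
                        reply_requested: bool = False) -> bytes:
    """Pack a standby status update: 'r', three Int64 positions, Int64 clock, Byte1 flag."""
    delta = now - PG_EPOCH
    clock_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return struct.pack("!cQQQqB", b"r", write, flush, apply, clock_us,
                       1 if reply_requested else 0)
```

A client that does not flush in a timely fashion would report its real write position and pass `INVALID_XLOG_REC_PTR` as `flush`.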

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-06-27 17:24:05
Message-ID:	CAHGQGwEPBrECq9ht1MnEYPK5Bpy4ozv1VYyLO6LJY_6OFi2SYQ@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>
>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>> opinion as well.
>>>
>>> +1
>>
>> Nobody else with any opinion on this? :(
>
> I don't think we really need a NOSYNC flag at this point. Just not
> setting the flush location in clients that make a point of flushing in
> a timely fashion seems fine.

Okay, I'm in the minority, so I'm writing the patch that way. WIP
patch attached.

In the patch, pg_basebackup background process and pg_receivexlog always
return invalid location as flush one, and will never become sync standby even
if their name is in synchronous_standby_names. The timing of their sending
the reply depends on the standby_message_timeout specified in -s option. So
the write position may lag behind the true position.

pg_receivexlog accepts new option -S (better option character?). If this option
is specified, pg_receivexlog returns true flush position, and can become sync
standby. It sends back the reply to the master each time the write position
changes or the timeout passes. If synchronous_commit is set to remote_write,
synchronous replication to pg_receivexlog would work well.
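The reply timing described above (periodic replies driven by the `-s` standby message timeout, plus, with the proposed `-S`, a reply whenever the write position moves) can be sketched as a small decision function. `ReplyState` and `should_send_reply` are hypothetical names, not taken from the patch.

```python
from dataclasses import dataclass


@dataclass
class ReplyState:
    last_sent_write: int = 0      # write position included in the last reply
    last_reply_at: float = 0.0    # time (seconds) the last reply was sent


def should_send_reply(state: ReplyState, current_write: int, now: float,
                      timeout_s: float, sync_mode: bool) -> bool:
    """Decide whether a status reply should be sent to the master now."""
    # Periodic reply, as with the -s standby message timeout (0 disables it).
    if timeout_s > 0 and now - state.last_reply_at >= timeout_s:
        return True
    # Proposed -S mode: reply as soon as the write position has changed,
    # which is why the last reported write position must be tracked.
    return sync_mode and current_write != state.last_sent_write
```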

The patch needs more documentation. But I think that it's worth reviewing the
code in advance, so I attached the WIP patch. Comments? Objections?

The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied,
we need to write the backport version of the patch for 9.2.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
pg_receivexlog_syncstandby_v1.patch	application/octet-stream	10.0 KB

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-06-27 17:58:41
Message-ID:	CA+U5nMJ+4uMC+gVS0t4ZKyD9U-yC+YOy-bMWf+E5TpB=v1o4Cg@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On 27 June 2012 18:24, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> will never become sync standby even
> if their name is in synchronous_standby_names.

I don't understand why you'd want that.

What is wrong with removing the name from synchronous_standby_names if
you don't like it?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-06-29 10:04:39
Message-ID:	CABUevEwjhq1==Zqo904KrvuydgTVGbG1em3E9S+ekBtEkE__JQ@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Wed, Jun 27, 2012 at 7:58 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 27 June 2012 18:24, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
>> will never become sync standby even
>> if their name is in synchronous_standby_names.
>
> I don't understand why you'd want that.
>
> What is wrong with removing the name from synchronous_standby_names if
> you don't like it?

I believe that's just fallout; the main point is that we don't want to
become a sync standby when synchronous_standby_names is set to '*'.
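The problem with `'*'` follows from how a connecting client's `application_name` is matched against `synchronous_standby_names`: the first matching entry gives its priority, lower numbers win, and `*` matches anything. This simplified sketch assumes case-insensitive comparison and ignores the identifier quoting rules the server applies when splitting the list; `standby_priority` is a hypothetical helper.

```python
def standby_priority(application_name: str, synchronous_standby_names: str) -> int:
    """Return the 1-based priority of a standby, or 0 if it is not a sync candidate."""
    for position, entry in enumerate(synchronous_standby_names.split(","), start=1):
        name = entry.strip()
        if not name:
            continue
        # '*' matches every client, including pg_receivexlog or the
        # WAL-streaming connection of pg_basebackup.
        if name == "*" or name.lower() == application_name.lower():
            return position
    return 0
```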

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-06-29 10:22:18
Message-ID:	CABUevEwEfd1+md8kqgetP6wxe6yP3yFUOa4tixSmFvZRiQRvfg@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>
>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>> opinion as well.
>>>>
>>>> +1
>>>
>>> Nobody else with any opinion on this? :(
>>
>> I don't think we really need a NOSYNC flag at this point. Just not
>> setting the flush location in clients that make a point of flushing in
>> a timely fashion seems fine.
>
> Okay, I'm in the minority, so I'm writing the patch that way. WIP
> patch attached.
>
> In the patch, pg_basebackup background process and pg_receivexlog always
> return invalid location as flush one, and will never become sync standby even
> if their name is in synchronous_standby_names. The timing of their sending

That doesn't match with the patch, afaics. The patch always sets the
correct write location, which means it can become a remote_write
synchronous standby, no? It will only send it back when timeout
expires, but it will be sent back.

I wonder if that might actually be a more reasonable mode of operation
in general:

* always send back the write position, at the write interval
* always send back the flush position, when we're flushing (meaning
when we switch xlog)

have an option that makes it possible to:
* always send back the write position as soon as it changes (making
for a reasonable remote_write sync standby)
* actually flush the log after each write instead of end of file
(making for a reasonable full sync standby)

meaning you'd have something like "pg_receivexlog --sync=write" and
"pg_receivexlog --sync=flush" controlling it instead.

And deal with the "user put * in synchronous_standby_names and
accidentally got pg_receivexlog as the sync standby" by more clearly
warning people not to use * for that parameter... Since it's simply
dangerous :)
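The two modes proposed above amount to a choice of when positions are reported and when data is fsynced. A sketch of that policy follows. `--sync=write` and `--sync=flush` are only the option values floated in this message, not options of any released pg_receivexlog; `SyncMode` and `WalReceiverModel` are illustrative, and positions are modelled as byte counts.

```python
from enum import Enum


class SyncMode(Enum):
    NONE = "none"    # default: write position at the interval, flush at file end
    WRITE = "write"  # hypothetical --sync=write
    FLUSH = "flush"  # hypothetical --sync=flush


class WalReceiverModel:
    def __init__(self, mode: SyncMode) -> None:
        self.mode = mode
        self.write_pos = 0  # bytes written to the local WAL file
        self.flush_pos = 0  # bytes known to be fsynced

    def on_data(self, nbytes: int) -> bool:
        """Record received WAL; return True if a reply should go out immediately."""
        self.write_pos += nbytes
        if self.mode is SyncMode.FLUSH:
            self.flush_pos = self.write_pos  # stands in for an fsync after each write
        return self.mode is not SyncMode.NONE

    def on_segment_complete(self) -> bool:
        """At the end of an xlog file the data is flushed in every mode."""
        self.flush_pos = self.write_pos
        return True
```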

> the reply depends on the standby_message_timeout specified in -s option. So
> the write position may lag behind the true position.
>
> pg_receivexlog accepts new option -S (better option character?). If this option
> is specified, pg_receivexlog returns true flush position, and can become sync
> standby. It sends back the reply to the master each time the write position
> changes or the timeout passes. If synchronous_commit is set to remote_write,
> synchronous replication to pg_receivexlog would work well.

Yeah, I hadn't considered the remote_write mode, but I guess that's
why you have to track the current write position across loads, which
first confused me.

Looking at some other usecases for this, I wonder if we should also
force a status message whenever we switch xlog files, even if we
aren't running in sync mode, even if the timeout hasn't expired. I
think that would be a reasonable thing to do, since you often want to
track things based on files.
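Forcing a status message on an xlog file switch only requires the receiver to notice that a new position falls in a different WAL segment. This sketch assumes the default 16 MB segment size and the 64-bit position layout used from 9.3 onward (9.2 and earlier skip the last segment of each logical xlog file, so the arithmetic differs there); the helper names are hypothetical.

```python
WAL_SEG_SIZE = 16 * 1024 * 1024                  # default segment size (assumed)
SEGS_PER_XLOGID = 0x100000000 // WAL_SEG_SIZE    # 256 with 16 MB segments


def segment_number(lsn: int) -> int:
    """Segment containing the byte at `lsn`."""
    return lsn // WAL_SEG_SIZE


def wal_file_name(timeline: int, lsn: int) -> str:
    """Name of the WAL segment file containing `lsn` (9.3+ layout)."""
    segno = segment_number(lsn)
    return f"{timeline:08X}{segno // SEGS_PER_XLOGID:08X}{segno % SEGS_PER_XLOGID:08X}"


def crossed_segment(old_lsn: int, new_lsn: int) -> bool:
    """True when new_lsn lies in a different segment: time to force a status message."""
    return segment_number(new_lsn) != segment_number(old_lsn)
```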

> The patch needs more documentation. But I think that it's worth reviewing the
> code in advance, so I attached the WIP patch. Comments? Objections?

Looking at the code, what exactly prompts the changes to the backend
side? That seems unrelated? Are we actually considering picking a
standby with InvalidXlogRecPtr as a sync standby today?

Isn't it enough to just send the proper write and flush locations from
the frontend?

> The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied,
> we need to write the backport version of the patch for 9.2.

Oh, conflicts with Heikki's xlog patches, right? Ugh. But yeah.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-07-01 17:14:25
Message-ID:	CAHGQGwFfQA4X2M=vPkaoGfpN7t19=GhHZ0gFGHfy7_p=1Cka4g@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

Thanks for the review!

On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>
>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>> opinion as well.
>>>>>
>>>>> +1
>>>>
>>>> Nobody else with any opinion on this? :(
>>>
>>> I don't think we really need a NOSYNC flag at this point. Just not
>>> setting the flush location in clients that make a point of flushing in
>>> a timely fashion seems fine.
>>
>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>> patch attached.
>>
>> In the patch, pg_basebackup background process and pg_receivexlog always
>> return invalid location as flush one, and will never become sync standby even
>> if their name is in synchronous_standby_names. The timing of their sending
>
> That doesn't match with the patch, afaics. The patch always sets the
> correct write location, which means it can become a remote_write
> synchronous standby, no? It will only send it back when timeout
> expires, but it will be sent back.

No. Though correct write location is sent back, they don't become sync standby
because flush location is always invalid. While flush location is invalid,
the master will never regard the remote server as sync one even if
synchronous_commit is set to remote_write.

>
> I wonder if that might actually be a more reasonable mode of operation
> in general:
>
> * always send back the write position, at the write interval
> * always send back the flush position, when we're flushing (meaning
> when we switch xlog)
>
> have an option that makes it possible to:
> * always send back the write position as soon as it changes (making
> for a reasonable remote_write sync standby)
> * actually flush the log after each write instead of end of file
> (making for a reasonable full sync standby)
>
> meaning you'd have something like "pg_receivexlog --sync=write" and
> "pg_receivexlog --sync=flush" controlling it instead.

Yeah, in this way, pg_receivexlog can become sync even if
synchronous_commit is on, which seems more useful. But
I'm thinking that the synchronous pg_receivexlog stuff should
be postponed to 9.3 because its patch seems to become too
big to apply at this beta stage. So, in 9.2, to fix the problem,
what about just applying the simple patch which prevents
pg_basebackup background process and pg_receivexlog from
becoming sync standby whatever synchronous_standby_names
and synchronous_commit are set to?

> And deal with the "user put * in synchronous_standby_names and
> accidentally got pg_receivexlog as the sync standby" by more clearly
> warning people not to use * for that parameter... Since it's simply
> dangerous :)

Yep.

>> the reply depends on the standby_message_timeout specified in -s option. So
>> the write position may lag behind the true position.
>>
>> pg_receivexlog accepts new option -S (better option character?). If this option
>> is specified, pg_receivexlog returns true flush position, and can become sync
>> standby. It sends back the reply to the master each time the write position
>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>> synchronous replication to pg_receivexlog would work well.
>
> Yeah, I hadn't considered the remote_write mode, but I guess that's
> why you have to track the current write position across loads, which
> first confused me.

The patch has to track the current write location to decide whether to send
back the reply to the master, IOW to know whether the write location
has changed, IOW to know whether we've already sent the reply about
the latest write location.

> Looking at some other usecases for this, I wonder if we should also
> force a status message whenever we switch xlog files, even if we
> aren't running in sync mode, even if the timeout hasn't expired. I
> think that would be a reasonable thing to do, since you often want to
> track things based on files.

You mean that the pg_receivexlog should send back the correct flush
location whenever it switches xlog files?

>> The patch needs more documentation. But I think that it's worth reviewing the
>> code in advance, so I attached the WIP patch. Comments? Objections?
>
> Looking at the code, what exactly prompts the changes to the backend
> side? That seems unrelated? Are we actually considering picking a
> standby with InvalidXlogRecPtr as a sync standby today?
>
> Isn't it enough to just send the proper write and flush locations from
> the frontend?

No, unless I'm missing something.

The problem that we should address first is that the master can pick up
pg_basebackup background process and pg_receivexlog as a sync
standby even if they always return invalid write and flush positions.
Since they don't send any correct write and flush positions, if they are
accidentally regarded as sync standby, all transactions can get blocked
infinitely. So the patch needed to change the walsender code so that it
doesn't pick up the remote server as sync one while its flush position
is invalid.
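The walsender-side rule described here (never pick a connection as the synchronous standby while its reported flush position is invalid), together with the STREAMING-state requirement raised earlier in the thread, can be sketched as follows. `WalSenderInfo` and `pick_sync_standby` are hypothetical simplifications, not the actual walsender code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

INVALID_XLOG_REC_PTR = 0


@dataclass
class WalSenderInfo:
    name: str
    state: str     # e.g. "STARTUP", "BACKUP", "CATCHUP", "STREAMING"
    priority: int  # 0 = not listed in synchronous_standby_names
    flush: int     # last flush position reported by the client


def pick_sync_standby(senders: Sequence[WalSenderInfo]) -> Optional[WalSenderInfo]:
    """Return the eligible walsender with the lowest nonzero priority, if any."""
    best = None
    for ws in senders:
        if ws.priority == 0 or ws.state != "STREAMING":
            continue
        # A client that never reports a flush position (pg_basebackup's WAL
        # stream, default pg_receivexlog) must never be chosen.
        if ws.flush == INVALID_XLOG_REC_PTR:
            continue
        if best is None or ws.priority < best.priority:
            best = ws
    return best
```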

>> The patch is based on current HEAD, i.e., 9.3dev. If the patch is applied,
>> we need to write the backport version of the patch for 9.2.
>
> Oh, conflicts with Heikki's xlog patches, right? Ugh. But yeah.

Yep.

Regards,

--
Fujii Masao

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-07-01 19:01:12
Message-ID:	CABUevEx7xWv8pFVWTuA2WM157AEWPB5Gkc=p6BhEytJE3aEa8w@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>
> On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>>
>>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>>> opinion as well.
>>>>>>
>>>>>> +1
>>>>>
>>>>> Nobody else with any opinion on this? :(
>>>>
>>>> I don't think we really need a NOSYNC flag at this point. Just not
>>>> setting the flush location in clients that make a point of flushing in
>>>> a timely fashion seems fine.
>>>
>>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>>> patch attached.
>>>
>>> In the patch, pg_basebackup background process and pg_receivexlog always
>>> return invalid location as flush one, and will never become sync standby even
>>> if their name is in synchronous_standby_names. The timing of their sending
>>
>> That doesn't match with the patch, afaics. The patch always sets the
>> correct write location, which means it can become a remote_write
>> synchronous standby, no? It will only send it back when timeout
>> expires, but it will be sent back.
>
> No. Though correct write location is sent back, they don't become sync standby
> because flush location is always invalid. While flush location is invalid,
> the master will never regard the remote server as sync one even if
> synchronous_commit is set to remote_write.

Oh. I wasn't aware of that part.

>> I wonder if that might actually be a more reasonable mode of operation
>> in general:
>>
>> * always send back the write position, at the write interval
>> * always send back the flush position, when we're flushing (meaning
>> when we switch xlog)
>>
>> have an option that makes it possible to:
>> * always send back the write position as soon as it changes (making
>> for a reasonable remote_write sync standby)
>> * actually flush the log after each write instead of end of file
>> (making for a reasonable full sync standby)
>>
>> meaning you'd have something like "pg_receivexlog --sync=write" and
>> "pg_receivexlog --sync=flush" controlling it instead.
>
> Yeah, in this way, pg_receivexlog can become sync even if
> synchronous_commit is on, which seems more useful. But
> I'm thinking that the synchronous pg_receivexlog stuff should
> be postponed to 9.3 because its patch seems to become too
> big to apply at this beta stage. So, in 9.2, to fix the problem,
> what about just applying the simple patch which prevents
> pg_basebackup background process and pg_receivexlog from
> becoming sync standby whatever synchronous_standby_names
> and synchronous_commit are set to?

Agreed.

With the addition that we should set the write location, because
that's very useful and per what you said above should be perfectly
safe.

>> And deal with the "user put * in synchronous_standby_names and
>> accidentally got pg_receivexlog as the sync standby" by more clearly
>> warning people not to use * for that parameter... Since it's simply
>> dangerous :)
>
> Yep.

What would be good wording? Something along the line of "Using the *
entry is not recommended since it can lead to unexpected results when
new standbys are added" or something like that?

>>> the reply depends on the standby_message_timeout specified in -s option. So
>>> the write position may lag behind the true position.
>>>
>>> pg_receivexlog accepts new option -S (better option character?). If this option
>>> is specified, pg_receivexlog returns true flush position, and can become sync
>>> standby. It sends back the reply to the master each time the write position
>>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>>> synchronous replication to pg_receivexlog would work well.
>>
>> Yeah, I hadn't considered the remote_write mode, but I guess that's
>> why you have to track the current write position across loads, which
>> first confused me.
>
> The patch has to track the current write location to decide whether to send
> back the reply to the master, IOW to know whether the write location
> has changed, IOW to know whether we've already sent the reply about
> the latest write location.

Yeah, makes perfect sense.

>> Looking at some other usecases for this, I wonder if we should also
>> force a status message whenever we switch xlog files, even if we
>> aren't running in sync mode, even if the timeout hasn't expired. I
>> think that would be a reasonable thing to do, since you often want to
>> track things based on files.
>
> You mean that the pg_receivexlog should send back the correct flush
> location whenever it switches xlog files?

No, I mean just send back a status message. Meaning that without
specifying the sync modes per above, it would send back the *write*
location. This would be useful for tracking xlog filenames between
master and pg_receivexlog, without extra delay.
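The trigger for that forced status message is the receiver moving to a new xlog file. Here is a minimal C sketch of detecting that switch. It assumes the default 16 MB segment size and treats WAL positions as plain byte offsets. `crossed_segment_boundary` is a hypothetical helper, not pg_receivexlog code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t WalPos;   /* hypothetical: WAL position as a byte offset */

/* Assumed default WAL segment size (16 MB); it is a build-time setting. */
#define WAL_SEGMENT_SIZE ((WalPos) 16 * 1024 * 1024)

/*
 * Return true if advancing the write position from prev_write to new_write
 * lands in a different segment, i.e. the receiver has switched xlog files
 * and could force a status message carrying the write position.
 */
bool
crossed_segment_boundary(WalPos prev_write, WalPos new_write)
{
    assert(new_write >= prev_write);
    return prev_write / WAL_SEGMENT_SIZE != new_write / WAL_SEGMENT_SIZE;
}
```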

>>> The patch needs more documentation. But I think that it's worth reviewing the
>>> code in advance, so I attached the WIP patch. Comments? Objections?
>>
>> Looking at the code, what exactly prompts the changes to the backend
>> side? That seems unrelated? Are we actually considering picking a
>> standby with InvalidXlogRecPtr as a sync standby today?
>>
>> Isn't it enough to just send the proper write and flush locations from
>> the frontend?
>
> No, unless I'm missing something.
>
> The problem that we should address first is that the master can pick up
> pg_basebackup background process and pg_receivexlog as a sync
> standby even if they always return invalid write and flush positions.
> Since they don't send any correct write and flush positions, if they are
> accidentally regarded as sync standby, all transactions can get blocked
> indefinitely. So the patch needed to change the walsender code so that it
> doesn't pick up the remote server as sync one while its flush position
> is invalid.

Yeah, that is clearly wrong. I think I missed this behaviour, and got
confused by the fact that the patch was trying to fix two different
things - only one of which I was aware of.
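The walsender-side rule is simple: never treat a client with an invalid flush position as the synchronous standby. It can be sketched in C as below. The struct, its field names, and the convention that 0 means "invalid" are hypothetical simplifications with positions reduced to 64-bit integers. This illustrates the idea; it does not reproduce the walsender code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t WalPos;               /* hypothetical simplification */
#define INVALID_WAL_POS ((WalPos) 0)   /* stands in for InvalidXLogRecPtr */

/* Hypothetical view of one connected client, as the master sees it. */
typedef struct
{
    const char *application_name;
    int         sync_priority;  /* 0 = not in synchronous_standby_names */
    WalPos      write;          /* last reported write position */
    WalPos      flush;          /* last reported flush position */
} StandbyState;

/*
 * Return the index of the client to treat as the synchronous standby, or -1
 * if none qualifies. The lowest non-zero priority wins, but a client that
 * reports an invalid flush position (as pg_receivexlog and the pg_basebackup
 * background process do) is skipped, so commits never wait on it.
 */
int
choose_sync_standby(const StandbyState *standbys, size_t n)
{
    int best = -1;

    assert(standbys != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        const StandbyState *s = &standbys[i];

        if (s->sync_priority == 0)
            continue;           /* not listed as a sync candidate */
        if (s->flush == INVALID_WAL_POS)
            continue;           /* never confirms a flush: skip it */
        if (best < 0 || s->sync_priority < standbys[best].sync_priority)
            best = (int) i;
    }
    return best;
}
```

Without the flush check, a WAL-only client that matched first (for instance via `*`) would be chosen, and commits would block indefinitely.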

So yes, per above, let's isolate out this part as one patch and get
that into 9.2, along with the "set the proper write location", but
leave everything else for 9.3.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-07-02 18:17:16
Message-ID:	CAHGQGwHhUZf=vMhhAp+HBsYEEvxjTk_krL+s7H0n_eiXn4zqDA@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Mon, Jul 2, 2012 at 4:01 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>> On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>>>
>>>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>>>> opinion as well.
>>>>>>>
>>>>>>> +1
>>>>>>
>>>>>> Nobody else with any opinion on this? :(
>>>>>
>>>>> I don't think we really need a NOSYNC flag at this point. Just not
>>>>> setting the flush location in clients that make a point of flushing in
>>>>> a timely fashion seems fine.
>>>>
>>>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>>>> patch attached.
>>>>
>>>> In the patch, pg_basebackup background process and pg_receivexlog always
>>>> return invalid location as flush one, and will never become sync standby even
>>>> if their name is in synchronous_standby_names. The timing of their sending
>>>
>>> That doesn't match with the patch, afaics. The patch always sets the
>>> correct write location, which means it can become a remote_write
>>> synchronous standby, no? It will only send it back when timeout
>>> expires, but it will be sent back.
>>
>> No. Though correct write location is sent back, they don't become sync standby
>> because flush location is always invalid. While flush location is
>> invalid, the master
>> will never regard the remote server as sync one even if synchronous_commit is
>> set to remote_write.
>
> Oh. I wasn't aware of that part.
>
>
>>> I wonder if that might actually be a more reasonable mode of operation
>>> in general:
>>>
>>> * always send back the write position, at the write interval
>>> * always send back the flush position, when we're flushing (meaning
>>> when we switch xlog)
>>>
>>> have an option that makes it possible to:
>>> * always send back the write position as soon as it changes (making
>>> for a reasonable remote_write sync standby)
>>> * actually flush the log after each write instead of end of file
>>> (making for a reasonable full sync standby)
>>>
>>> meaning you'd have something like "pg_receivexlog --sync=write" and
>>> "pg_receivexlog --sync=flush" controlling it instead.
>>
>> Yeah, in this way, pg_receivexlog can become sync even if
>> synchronous_commit is on, which seems more useful. But
>> I'm thinking that the synchronous pg_receivexlog stuff should
>> be postponed to 9.3 because its patch seems to become too
>> big to apply at this beta stage. So, in 9.2, to fix the problem,
>> what about just applying the simple patch which prevents
>> pg_basebackup background process and pg_receivexlog from
>> becoming sync standby whatever synchronous_standby_names
>> and synchronous_commit are set to?
>
> Agreed.
>
> With the addition that we should set the write location, because
> that's very useful and per what you said above should be perfectly
> safe.
>
>
>>> And deal with the "user put * in synchronous_standby_names and
>>> accidentally got pg_receivexlog as the sync standby" by more clearly
>>> warning people not to use * for that parameter... Since it's simply
>>> dangerous :)
>>
>> Yep.
>
> What would be good wording? Something along the lines of "Using the *
> entry is not recommended since it can lead to unexpected results when
> new standbys are added" or something like that?
>
>
>>>> the reply depends on the standby_message_timeout specified by the -s option. So
>>>> the write position may lag behind the true position.
>>>>
>>>> pg_receivexlog accepts new option -S (better option character?). If this option
>>>> is specified, pg_receivexlog returns true flush position, and can become sync
>>>> standby. It sends back the reply to the master each time the write position
>>>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>>>> synchronous replication to pg_receivexlog would work well.
>>>
>>> Yeah, I hadn't considered the remote_write mode, but I guess that's
>>> why you have to track the current write position across loads, which
>>> first confused me.
>>
>> The patch has to track the current write location to decide whether to send
>> back the reply to the master, IOW to know whether the write location
>> has changed, IOW to know whether we've already sent the reply about
>> the latest write location.
>
> Yeah, makes perfect sense.
>
>
>>> Looking at some other use cases for this, I wonder if we should also
>>> force a status message whenever we switch xlog files, even if we
>>> aren't running in sync mode, even if the timeout hasn't expired. I
>>> think that would be a reasonable thing to do, since you often want to
>>> track things based on files.
>>
>> You mean that the pg_receivexlog should send back the correct flush
>> location whenever it switches xlog files?
>
> No, I mean just send back a status message. Meaning that without
> specifying the sync modes per above, it would send back the *write*
> location. This would be useful for tracking xlog filenames between
> master and pg_receivexlog, without extra delay.
>
>>>> The patch needs more documentation. But I think that it's worth reviewing the
>>>> code in advance, so I attached the WIP patch. Comments? Objections?
>>>
>>> Looking at the code, what exactly prompts the changes to the backend
>>> side? That seems unrelated? Are we actually considering picking a
>>> standby with InvalidXlogRecPtr as a sync standby today?
>>>
>>> Isn't it enough to just send the proper write and flush locations from
>>> the frontend?
>>
>> No, unless I'm missing something.
>>
>> The problem that we should address first is that the master can pick up
>> pg_basebackup background process and pg_receivexlog as a sync
>> standby even if they always return invalid write and flush positions.
>> Since they don't send any correct write and flush positions, if they are
>> accidentally regarded as sync standby, all transactions can get blocked
>> indefinitely. So the patch needed to change the walsender code so that it
>> doesn't pick up the remote server as sync one while its flush position
>> is invalid.
>
> Yeah, that is clearly wrong. I think I missed this behaviour, and got
> confused by the fact that the patch was trying to fix two different
> things - only one of which I was aware of.
>
> So yes, per above, let's isolate out this part as one patch and get
> that into 9.2, along with the "set the proper write location", but
> leave everything else for 9.3.

Agreed. The attached patch always sets the correct write location and
prevents a remote server that sends back an invalid flush location from
becoming a sync standby.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
prevent_pgreceivexlog_becoming_syncstandby_v1.patch	application/octet-stream	4.7 KB

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Lonni J Friedman <netllama(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Jerry Sievers <gsievers19(at)comcast(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ADMIN] pg_basebackup blocking all queries with horrible performance
Date:	2012-07-04 13:26:00
Message-ID:	CABUevEwh70t0t19PNpfGxPmXdcT82G61_pyZ5LtWvdf0Ptci3A@mail.gmail.com
Lists:	pgsql-admin pgsql-hackers

On Mon, Jul 2, 2012 at 8:17 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, Jul 2, 2012 at 4:01 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Sun, Jul 1, 2012 at 7:14 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> On Fri, Jun 29, 2012 at 7:22 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> On Wed, Jun 27, 2012 at 7:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>> On Thu, Jun 21, 2012 at 3:18 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>>>> On Wed, Jun 20, 2012 at 7:18 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>>>>>> You agreed to add something like NOSYNC option into START_REPLICATION command?
>>>>>>>>>
>>>>>>>>> I'm on the fence. I was hoping somebody else would chime in with an
>>>>>>>>> opinion as well.
>>>>>>>>
>>>>>>>> +1
>>>>>>>
>>>>>>> Nobody else with any opinion on this? :(
>>>>>>
>>>>>> I don't think we really need a NOSYNC flag at this point. Just not
>>>>>> setting the flush location in clients that make a point of flushing in
>>>>>> a timely fashion seems fine.
>>>>>
>>>>> Okay, I'm in the minority, so I'm writing the patch that way. WIP
>>>>> patch attached.
>>>>>
>>>>> In the patch, pg_basebackup background process and pg_receivexlog always
>>>>> return invalid location as flush one, and will never become sync standby even
>>>>> if their name is in synchronous_standby_names. The timing of their sending
>>>>
>>>> That doesn't match with the patch, afaics. The patch always sets the
>>>> correct write location, which means it can become a remote_write
>>>> synchronous standby, no? It will only send it back when timeout
>>>> expires, but it will be sent back.
>>>
>>> No. Though correct write location is sent back, they don't become sync standby
>>> because flush location is always invalid. While flush location is
>>> invalid, the master
>>> will never regard the remote server as sync one even if synchronous_commit is
>>> set to remote_write.
>>
>> Oh. I wasn't aware of that part.
>>
>>
>>>> I wonder if that might actually be a more reasonable mode of operation
>>>> in general:
>>>>
>>>> * always send back the write position, at the write interval
>>>> * always send back the flush position, when we're flushing (meaning
>>>> when we switch xlog)
>>>>
>>>> have an option that makes it possible to:
>>>> * always send back the write position as soon as it changes (making
>>>> for a reasonable remote_write sync standby)
>>>> * actually flush the log after each write instead of end of file
>>>> (making for a reasonable full sync standby)
>>>>
>>>> meaning you'd have something like "pg_receivexlog --sync=write" and
>>>> "pg_receivexlog --sync=flush" controlling it instead.
>>>
>>> Yeah, in this way, pg_receivexlog can become sync even if
>>> synchronous_commit is on, which seems more useful. But
>>> I'm thinking that the synchronous pg_receivexlog stuff should
>>> be postponed to 9.3 because its patch seems to become too
>>> big to apply at this beta stage. So, in 9.2, to fix the problem,
>>> what about just applying the simple patch which prevents
>>> pg_basebackup background process and pg_receivexlog from
>>> becoming sync standby whatever synchronous_standby_names
>>> and synchronous_commit are set to?
>>
>> Agreed.
>>
>> With the addition that we should set the write location, because
>> that's very useful and per what you said above should be perfectly
>> safe.
>>
>>
>>>> And deal with the "user put * in synchronous_standby_names and
>>>> accidentally got pg_receivexlog as the sync standby" by more clearly
>>>> warning people not to use * for that parameter... Since it's simply
>>>> dangerous :)
>>>
>>> Yep.
>>
>> What would be good wording? Something along the lines of "Using the *
>> entry is not recommended since it can lead to unexpected results when
>> new standbys are added" or something like that?
>>
>>
>>>>> the reply depends on the standby_message_timeout specified by the -s option. So
>>>>> the write position may lag behind the true position.
>>>>>
>>>>> pg_receivexlog accepts new option -S (better option character?). If this option
>>>>> is specified, pg_receivexlog returns true flush position, and can become sync
>>>>> standby. It sends back the reply to the master each time the write position
>>>>> changes or the timeout passes. If synchronous_commit is set to remote_write,
>>>>> synchronous replication to pg_receivexlog would work well.
>>>>
>>>> Yeah, I hadn't considered the remote_write mode, but I guess that's
>>>> why you have to track the current write position across loads, which
>>>> first confused me.
>>>
>>> The patch has to track the current write location to decide whether to send
>>> back the reply to the master, IOW to know whether the write location
>>> has changed, IOW to know whether we've already sent the reply about
>>> the latest write location.
>>
>> Yeah, makes perfect sense.
>>
>>
>>>> Looking at some other use cases for this, I wonder if we should also
>>>> force a status message whenever we switch xlog files, even if we
>>>> aren't running in sync mode, even if the timeout hasn't expired. I
>>>> think that would be a reasonable thing to do, since you often want to
>>>> track things based on files.
>>>
>>> You mean that the pg_receivexlog should send back the correct flush
>>> location whenever it switches xlog files?
>>
>> No, I mean just send back a status message. Meaning that without
>> specifying the sync modes per above, it would send back the *write*
>> location. This would be useful for tracking xlog filenames between
>> master and pg_receivexlog, without extra delay.
>>
>>>>> The patch needs more documentation. But I think that it's worth reviewing the
>>>>> code in advance, so I attached the WIP patch. Comments? Objections?
>>>>
>>>> Looking at the code, what exactly prompts the changes to the backend
>>>> side? That seems unrelated? Are we actually considering picking a
>>>> standby with InvalidXlogRecPtr as a sync standby today?
>>>>
>>>> Isn't it enough to just send the proper write and flush locations from
>>>> the frontend?
>>>
>>> No, unless I'm missing something.
>>>
>>> The problem that we should address first is that the master can pick up
>>> pg_basebackup background process and pg_receivexlog as a sync
>>> standby even if they always return invalid write and flush positions.
>>> Since they don't send any correct write and flush positions, if they are
>>> accidentally regarded as sync standby, all transactions can get blocked
>>> indefinitely. So the patch needed to change the walsender code so that it
>>> doesn't pick up the remote server as sync one while its flush position
>>> is invalid.
>>
>> Yeah, that is clearly wrong. I think I missed this behaviour, and got
>> confused by the fact that the patch was trying to fix two different
>> things - only one of which I was aware of.
>>
>> So yes, per above, let's isolate out this part as one patch and get
>> that into 9.2, along with the "set the proper write location", but
>> leave everything else for 9.3.
>
> Agreed. The attached patch always sets the correct write location and
> prevents a remote server that sends back an invalid flush location from
> becoming a sync standby.

Thanks, applied.

I also backpatched to 9.1 the part that prevents a client reporting an
invalid flush location from becoming a sync standby (it required only a
minor change: GetXLogReplayRecPtr() didn't take an argument back then).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/