Archiver not picking up changes to archive_command

Lists: pgsql-general pgsql-hackers
From: bricklen <bricklen(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Archiver not picking up changes to archive_command
Date: 2010-05-11 00:01:03
Message-ID: AANLkTinmg9-gtu9NUxebe-NKfk5dkhNY2-sa1H0-WxCr@mail.gmail.com

Hi,

I'm stumped by an issue we are experiencing at the moment. We have
been successfully archiving logs to two standby sites for many months
now using the following command:

rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1250 -az %p postgres@14.121.70.98:/WAL_Archive/

Due to some heavy processing today, we have fallen behind on
shipping log files (by about 1,000 logs or so), so we wanted to raise
our bwlimit like so:

rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

The database shows the change. SHOW archive_command returns:
rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

Yet the running processes never get above the original bwlimit of
1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
(I'm leery of trying that untested, though.)

ps aux | grep rsync
postgres 27704  0.0  0.0  63820  1068 ?        S    16:55   0:00 sh -c rsync -a pg_xlog/000000010000071700000070 postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1250 -az pg_xlog/000000010000071700000070 postgres@14.121.70.98:/WAL_Archive/
postgres 27714 37.2  0.0  68716  1612 ?        S    16:55   0:01 rsync --bwlimit=1250 -az pg_xlog/000000010000071700000070 postgres@14.121.70.98:/WAL_Archive/
postgres 27715  3.0  0.0  60764  5648 ?        S    16:55   0:00 ssh -l postgres 14.121.70.98 rsync --server -logDtprz --bwlimit=1250 . /WAL_Archive/
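One way to measure how far behind the archiver is: PostgreSQL creates a <segment>.ready file in pg_xlog/archive_status for every segment waiting for archive_command and renames it to <segment>.done once the command succeeds, so counting the .ready files gives the backlog. A minimal sketch, assuming an 8.4-style layout (the directory is pg_wal from PostgreSQL 10 on) and that it runs as a user who can read the data directory; the script and function name wal_backlog are hypothetical.

```bash
#!/bin/bash
# wal_backlog: count WAL segments still waiting for archive_command.
# Usage: wal_backlog.sh /path/to/data/pg_xlog   (pg_wal on 10 and later)

wal_backlog() {
    local status_dir="$1/archive_status" count=0 f
    if [ ! -d "$status_dir" ]; then
        echo "wal_backlog: $status_dir not found" >&2
        return 1
    fi
    # One <segment>.ready file exists per segment not yet archived;
    # the archiver renames it to <segment>.done on success.
    for f in "$status_dir"/*.ready; do
        [ -e "$f" ] && count=$((count + 1))  # skip the literal glob if none match
    done
    echo "$count"
}

if [ "$#" -gt 0 ]; then
    wal_backlog "$1"
fi
```

Running it every minute or so (for example under watch) shows whether a bandwidth change is actually letting the archiver catch up.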

Thanks,

bricklen


From: bricklen <bricklen(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Archiver not picking up changes to archive_command
Date: 2010-05-11 00:04:53
Message-ID: AANLkTim4uSu7Rqgi7gluFZo8ODZ1DlWkhtI0D7ZJP-F7@mail.gmail.com

Sorry, I forgot the version: PostgreSQL 8.4.2 on x86_64-redhat-linux-gnu,
compiled by GCC gcc (GCC) 4.1.2 20071124 (Red Hat 4.1.2-42), 64-bit



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: bricklen <bricklen(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Archiver not picking up changes to archive_command
Date: 2010-05-11 00:50:53
Message-ID: 21571.1273539053@sss.pgh.pa.us

bricklen <bricklen(at)gmail(dot)com> writes:
> Due to some heavy processing today, we have been falling behind on
> shipping log files (by about a 1000 logs or so), so wanted to up our
> bwlimit like so:

> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

> The db is showing the change.
> SHOW archive_command:
> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/

> Yet, the running processes never get above the original bwlimit of
> 1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
> (I'm leery of trying that untested though)

A look at the code shows that the archiver only notices SIGHUP once
per outer loop, so the change would only take effect once you catch up,
which is not going to help much in this case. Possibly we should change
it to check for SIGHUP after each archive_command execution.
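A toy bash model may make the difference between the two behaviours concrete. It is an illustration under stated assumptions, not PostgreSQL's code: the SIGHUP handler only sets a flag, a one-number file (CONF_FILE, hypothetical) stands in for postgresql.conf, and drain_backlog, load_config and the caller-supplied archive_one are hypothetical names. With check_each=0 the flag is examined only before the backlog is drained; with check_each=1 it is examined before every segment.

```bash
#!/bin/bash
# Toy model of an archiver-style loop; this is NOT PostgreSQL's code.
# The SIGHUP handler only sets a flag; the loop decides when to act on it.

got_sighup=0
trap 'got_sighup=1' HUP

# Hypothetical stand-in for re-reading postgresql.conf.
load_config() { read -r limit < "$CONF_FILE"; }

# drain_backlog CHECK_EACH SEGMENT...
# Calls archive_one SEGMENT LIMIT (supplied by the caller) per segment.
drain_backlog() {
    local check_each=$1 seg
    shift
    if [ "$got_sighup" -eq 1 ]; then      # once-per-outer-loop check
        got_sighup=0
        load_config
    fi
    for seg in "$@"; do
        if [ "$check_each" -eq 1 ] && [ "$got_sighup" -eq 1 ]; then
            got_sighup=0                   # per-segment check
            load_config
        fi
        archive_one "$seg" "$limit"
    done
}
```

If the configuration changes and a SIGHUP arrives while the first segment is being archived, the per-segment variant uses the new value from the second segment on, while the outer-loop variant keeps the old value for the entire backlog and leaves the flag set for the next pass, which matches the symptom in the original report.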

If you kill -9 the archiver process, the postmaster will just start
a new one, but realize that that would result in two concurrent
rsyncs. It might work OK to kill -9 both the archiver and the current
rsync in the same command.

regards, tom lane


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: bricklen <bricklen(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Archiver not picking up changes to archive_command
Date: 2010-05-11 01:12:00
Message-ID: 4BE8AEE0.7080701@2ndquadrant.com

Tom Lane wrote:
> A look at the code shows that the archiver only notices SIGHUP once
> per outer loop, so the change would only take effect once you catch up,
> which is not going to help much in this case. Possibly we should change
> it to check for SIGHUP after each archive_command execution.
>

I never considered this a really important issue to sort out, because I
tell everybody it's unwise to put something complicated directly into
archive_command. It's much better to call a script that is passed %f and
%p and let that script do all the work; then you don't even have to touch
the server config if you need to fix something. The lack of error
checking you get when writing shell commands directly into
archive_command horrifies me in a production environment.
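A wrapper along these lines can be sketched as follows. It is a minimal sketch, assuming bash and working passwordless ssh to both standbys; the destinations and rsync flags are the ones from the original command, while the script path, the ARCHIVE_CONF file (holding a single number in KB/s), and the variable names are hypothetical. RSYNC is overridable only so the script can be exercised with a stub.

```bash
#!/bin/bash
# archive_wal.sh: hypothetical wrapper for archive_command.
# postgresql.conf: archive_command = '/usr/local/bin/archive_wal.sh %p %f'
# Exit status 0 tells PostgreSQL the segment is archived; anything else
# makes the archiver keep the segment and retry it later.

RSYNC=${RSYNC:-rsync}                                # overridable, e.g. for tests
ARCHIVE_CONF=${ARCHIVE_CONF:-/etc/archive_wal.conf}  # hypothetical; one number (KB/s)
DEFAULT_BWLIMIT=1250
STANDBY1_DEST='postgres@192.168.80.174:/WAL_Archive/'
STANDBY2_DEST='postgres@14.121.70.98:/WAL_Archive/'

archive_wal() {
    if [ "$#" -ne 2 ]; then
        echo "usage: archive_wal.sh %p %f" >&2
        return 2
    fi
    local wal_path=$1 wal_name=$2 bwlimit=$DEFAULT_BWLIMIT line=''

    if [ ! -f "$wal_path" ]; then
        echo "archive_wal: $wal_name: $wal_path is not a regular file" >&2
        return 1
    fi

    # Re-read the limit on every call, so an edit applies to the very next
    # segment without editing postgresql.conf or sending a SIGHUP.
    if [ -r "$ARCHIVE_CONF" ]; then
        read -r line < "$ARCHIVE_CONF"
        case $line in
            ''|*[!0-9]*) echo "archive_wal: ignoring bad limit '$line' in $ARCHIVE_CONF" >&2 ;;
            *) bwlimit=$line ;;
        esac
    fi

    # Same order as the original '&&' chain, but each failure is logged.
    if ! "$RSYNC" -a "$wal_path" "$STANDBY1_DEST"; then
        echo "archive_wal: $wal_name: copy to $STANDBY1_DEST failed" >&2
        return 1
    fi
    if ! "$RSYNC" --bwlimit="$bwlimit" -az "$wal_path" "$STANDBY2_DEST"; then
        echo "archive_wal: $wal_name: copy to $STANDBY2_DEST failed" >&2
        return 1
    fi
}

# archive_command always passes %p %f; with no arguments the file only
# defines the function (e.g. when loaded for testing).
if [ "$#" -gt 0 ]; then
    archive_wal "$@"
    exit $?
fi
```

Because the limit lives outside postgresql.conf, raising it is just a matter of writing a new number into that file, and the archiver's once-per-loop SIGHUP handling no longer matters for this setting. The messages written to stderr end up in the server log, which provides the error reporting a bare inline command lacks.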

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: bricklen <bricklen(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Archiver not picking up changes to archive_command
Date: 2010-05-11 01:44:09
Message-ID: AANLkTinHuEBR1GuARRcJk-HHC-dnhobAILP_Q2c3QaB4@mail.gmail.com

On Mon, May 10, 2010 at 5:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> A look at the code shows that the archiver only notices SIGHUP once
> per outer loop, so the change would only take effect once you catch up,
> which is not going to help much in this case.  Possibly we should change
> it to check for SIGHUP after each archive_command execution.
>
> If you kill -9 the archiver process, the postmaster will just start
> a new one, but realize that that would result in two concurrent
> rsync's.  It might work ok to kill -9 the archiver and the current
> rsync in the same command.

I think I'll just wait it out, then send a SIGHUP.

Thanks for looking into this.


From: bricklen <bricklen(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: Archiver not picking up changes to archive_command
Date: 2010-05-11 01:45:38
Message-ID: AANLkTilD1DXbigG2bRVALhcEIvCg63X3n2nN5CQ-4Y68@mail.gmail.com

On Mon, May 10, 2010 at 6:12 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Tom Lane wrote:
>>
>> A look at the code shows that the archiver only notices SIGHUP once
>> per outer loop, so the change would only take effect once you catch up,
>> which is not going to help much in this case.  Possibly we should change
>> it to check for SIGHUP after each archive_command execution.
>>
>
> I never considered this a really important issue to sort out because I tell
> everybody it's unwise to put something complicated directly into
> archive_command.  Much better to call a script that gets passed %f/%p, then
> let that script do all the work; don't even have to touch the server config
> if you need to fix something then.  The lack of error checking that you get
> when just writing some shell commands directly in the archive_command itself
> horrifies me in a production environment.

Thanks Greg, that's a good idea. I'll revise that series of commands
into a script, and add some error handling as you suggest.

Cheers,

Bricklen


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: bricklen <bricklen(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archiver not picking up changes to archive_command
Date: 2010-05-11 04:21:16
Message-ID: AANLkTiniEcM9kHs0mOs8RUIoGkhipEJ7PR0DzDPd39qS@mail.gmail.com

On Tue, May 11, 2010 at 9:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> bricklen <bricklen(at)gmail(dot)com> writes:
>> Due to some heavy processing today, we have been falling behind on
>> shipping log files (by about a 1000 logs or so), so wanted to up our
>> bwlimit like so:
>
>> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/
>
>> The db is showing the change.
>> SHOW archive_command:
>> rsync -a %p postgres@192.168.80.174:/WAL_Archive/ && rsync --bwlimit=1875 -az %p postgres@14.121.70.98:/WAL_Archive/
>
>> Yet, the running processes never get above the original bwlimit of
>> 1250. Have I missed a step? Would "kill -HUP <archiver pid>" help?
>> (I'm leery of trying that untested though)
>
> A look at the code shows that the archiver only notices SIGHUP once
> per outer loop, so the change would only take effect once you catch up,
> which is not going to help much in this case.  Possibly we should change
> it to check for SIGHUP after each archive_command execution.

+1

Here is a simple patch to do so.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: pgarch_check_sighup_v1.patch (application/octet-stream, 457 bytes)