Re: Possible bug in cascaded standby

Lists: pgsql-hackers
From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Possible bug in cascaded standby
Date: 2013-06-05 16:03:08
Message-ID: CABOikdN1=zjSL7U6ykzgia8ExPqLYfcBY2Pne__80CWVrrA11g@mail.gmail.com

Hello,

I am experimenting with cascaded standbys and hit a problem that is
reproducible with the current HEAD. I haven't tried other branches, and I'm
not sure whether my test setup even works on older releases because of
the timeline ID issue.

Anyways, I set up a cascaded standby such that it streams from the first
standby and then stopped the original master and promoted the first standby
to be the new master. If I then try to smart shutdown the cascaded standby,
it fails after waiting for the walreceiver to terminate. What's worse, the
walsender on the first standby gets into an infinite loop consuming 100%
CPU.

I tried to investigate this a bit, but haven't made progress worth
reporting. I can spend more time, but just wanted to make sure that I'm not
trying something which is a known issue or limitation. BTW, this is on my
MacBook Pro. Attached is the script that I used to set up the environment.
You will need to modify it for your setup though.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Attachment Content-Type Size
test_cascade_stdby.sh application/x-sh 1.6 KB
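
The attached script is not reproduced in this archive. As a rough sketch of the topology described above, and not the author's actual script, the two standbys could be configured as below. The helper name, ports, and data directory names are hypothetical; the settings are the recovery.conf parameters used by releases of that era, and recovery_target_timeline = 'latest' is an assumption about how the cascaded standby would follow the promoted server.

```shell
#!/bin/sh
# Sketch only: the helper name, ports, and directory names are hypothetical
# and are not taken from the attached test_cascade_stdby.sh.

# Print a recovery.conf for a standby that streams from the server
# listening on the given port.
standby_recovery_conf() {
    upstream_port=$1
    cat <<EOF
standby_mode = 'on'
primary_conninfo = 'host=localhost port=${upstream_port}'
recovery_target_timeline = 'latest'
EOF
}

# Assumed topology: master (5432) -> sb1 (5433) -> sb2 (5434).
standby_recovery_conf 5432   # contents for sb1/recovery.conf
standby_recovery_conf 5433   # contents for sb2/recovery.conf (cascaded)

# Steps matching the reported scenario, left commented out so that the
# sketch has no side effects:
#   pg_ctl -D master stop -m fast
#   pg_ctl -D sb1 promote
#   pg_ctl -D sb2 stop -m smart
```

Redirecting each call into the matching data directory before starting the standbys would give the master -> sb1 -> sb2 chain; the commented pg_ctl commands then correspond to stopping the master, promoting the first standby, and attempting the smart shutdown of the cascaded one.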

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in cascaded standby
Date: 2013-06-05 17:27:44
Message-ID: CAHGQGwEEjT8VkvjP941erL930oXUe8YhqiLXuud=Da_sptd6Xw@mail.gmail.com

On Thu, Jun 6, 2013 at 1:03 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> Hello,
>
> I am experimenting with cascaded standbys and hit a problem that is
> reproducible with the current HEAD. I haven't tried other branches, and I'm
> not sure whether my test setup even works on older releases because of
> the timeline ID issue.
>
> Anyways, I set up a cascaded standby such that it streams from the first
> standby and then stopped the original master and promoted the first standby
> to be the new master. If I then try to smart shutdown the cascaded standby,
> it fails after waiting for the walreceiver to terminate. What's worse, the
> walsender on the first standby gets into an infinite loop consuming 100%
> CPU.
>
> I tried to investigate this a bit, but haven't made progress worth
> reporting. I can spend more time, but just wanted to make sure that I'm not
> trying something which is a known issue or limitation. BTW, this is on my
> MacBook Pro. Attached is the script that I used to set up the environment.
> You will need to modify it for your setup though.

I was not able to reproduce the problem. Maybe this is a timing problem.
Could you share the server log of each server from the time when the problem
happened? Just in case, I have attached the server logs that I got when I ran
the script in an attempt to reproduce the problem.

Regards,

--
Fujii Masao

Attachment Content-Type Size
master.log application/octet-stream 283 bytes
sb1.log application/octet-stream 808 bytes
sb2.log application/octet-stream 1.0 KB

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in cascaded standby
Date: 2013-06-06 05:00:20
Message-ID: CABOikdPnNsoFS7ED_0f7g34i56fOm10YYNZJ3B_OGovx1mSDtg@mail.gmail.com

On Wed, Jun 5, 2013 at 10:57 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> I was not able to reproduce the problem. Maybe this is a timing problem.

Hmm. I can't reproduce this on my Ubuntu box either. I will retry on the
Mac machine in the evening. Surprisingly, I could reproduce it very easily
on that box. What I'd observed is that the walreceiver on the cascaded
standby was stuck at walreceiver.c:447, which in turn was waiting
indefinitely at libpqwalreceiver.c:501, i.e. in the PQgetResult() call.

I'll retry and report back if I see the problem on the offending platform.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in cascaded standby
Date: 2013-06-08 08:06:46
Message-ID: CABOikdOQEh=n-nbb6xDg_7bmB4W1Ncabh2eeH6_aspo4exbkbw@mail.gmail.com

> I'll retry and report back if I see the problem on the offending platform.
Just to close out this thread, I can't reproduce this on Mac OS either.
While I had only run "make clean" earlier, "make distclean" did the trick.
Sorry for the noise.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
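
For context on the resolution above: in a PostgreSQL source tree, "make clean" removes built files but keeps what configure generated, while "make distclean" also removes the configure output, so a full reconfigure and rebuild is needed afterwards. A minimal sketch of such a rebuild follows; the SRC_DIR and PREFIX defaults and the function names are hypothetical and not taken from the thread.

```shell
#!/bin/sh
# Sketch only: SRC_DIR and PREFIX are hypothetical defaults.
SRC_DIR=${SRC_DIR:-$HOME/postgresql}
PREFIX=${PREFIX:-$HOME/pg-head}

# Echo each command; execute it only when DRY_RUN is unset or empty.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# Reset the tree to its distributed state, then reconfigure and rebuild.
rebuild_from_scratch() {
    run cd "$SRC_DIR" &&
    run make distclean &&
    run ./configure --prefix="$PREFIX" &&
    run make &&
    run make install
}
```

Setting DRY_RUN=1 before calling rebuild_from_scratch prints the commands without running them, which is a cheap way to check the paths first.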