Possible alpha5 SR bug

Lists: pgsql-bugs
From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Cc: quinn(at)fairpath(dot)com, Josh Berkus <josh(at)postgresql(dot)org>
Subject: Possible alpha5 SR bug
Date: 2010-04-13 04:36:02
Message-ID: 1271133362.5250.103.camel@jdavis

During the testing day organized a week ago, Quinn
Weaver ran into what looks like a bug. I attached the log output at
the end of this email. Note that he was running on a Mac, but replicating
from a Linux machine (both 64-bit). I know this is not a supported
configuration, but a segfault seems like a problem anyway.

Quinn helpfully provided a tarball of his data directory here:

http://fairpath.com/QuinnPgBug.tar.gz

and described his machine:

"My machine is a Mac with an Intel Core 2 Duo processor (64-bit)
running Mac OS X 10.6.3. It has 2 GB of RAM, which should be plenty
for the config we used."

I was trying to sort this bug out somewhat before posting, but we
weren't able to reproduce it (it happened near the end of testing, and
people were leaving), and I didn't have much chance to investigate in
the last week.

Regards,
Jeff Davis

postgres(at)tao:/usr/local/pgsql-9.0alpha5-build1/data/data9.0$ ../../bin/postmaster -D /usr/local/pgsql-9.0alpha5-build1/data/data9.0
LOG: database system was interrupted; last known up at 2010-04-03 16:55:20 PDT
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG: entering standby mode
LOG: redo starts at 0/BC0000B8
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG: unexpected pageaddr 0/9B000000 in log file 0, segment 189, offset 0
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
FATAL: could not load library "/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so": dlopen(/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so, 10): Library not loaded: /usr/local/pgsql/lib/libpq.5.dylib
Referenced from: /usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so
Reason: no suitable image found. Did find:
/Users/quinn/lib/libpq.5.dylib: stat() failed with errno=13
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG: unexpected pageaddr 0/9B000000 in log file 0, segment 189, offset 0
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG: WAL receiver process (PID 1011) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
postgres(at)tao:/usr/local/pgsql-9.0alpha5-build1/data/data9.0$ fg
bash: fg: current: no such job


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org, quinn(at)fairpath(dot)com, Josh Berkus <josh(at)postgresql(dot)org>
Subject: Re: Possible alpha5 SR bug
Date: 2010-04-13 06:22:53
Message-ID: o2g3f0b79eb1004122322t2e5ab435x37793b36e8ad3a9b@mail.gmail.com

Thanks for the test and report!

On Tue, Apr 13, 2010 at 1:36 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> FATAL:  could not load library "/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so": dlopen(/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so, 10): Library not loaded: /usr/local/pgsql/lib/libpq.5.dylib
>          Referenced from: /usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so
>          Reason: no suitable image found.  Did find:
>                /Users/quinn/lib/libpq.5.dylib: stat() failed with errno=13

Seems to have failed in loading libpq.5.dylib. I guess that "errno=13" means
"permission denied". Please ensure that the permission is appropriate.

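To make the errno reading concrete, here is a minimal C sketch that maps an errno value to its symbolic meaning. The helper names describe_errno and is_permission_denied are hypothetical and only for illustration; the one assumption about the platform is that EACCES has the value 13, which holds on Linux and Mac OS X.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: return the C library's text for an errno value. */
const char *describe_errno(int err)
{
    return strerror(err);
}

/* Hypothetical helper: report whether an errno value is EACCES,
 * the code that stat() sets when access to the path is denied. */
int is_permission_denied(int err)
{
    return err == EACCES;
}
```

Calling is_permission_denied(13) should return nonzero on these platforms, which matches the "stat() failed with errno=13" line in the log. Since the failure is in stat() rather than open(), the directories leading to the file are worth checking as well as the file itself.
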
> LOG:  WAL receiver process (PID 1011) was terminated by signal 11: Segmentation fault

Oops! I guess that this happened because walrcv_disconnect() was called in
WalRcvDie() even though libpqwalreceiver.so couldn't be loaded. The attached
patch would fix the problem.
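
For illustration only, the kind of guard this implies can be sketched as below. This is not the attached patch. It assumes the disconnect routine is reached through a function pointer that stays NULL when the library fails to load, and the names disconnect_hook, example_disconnect and cleanup_on_exit are hypothetical stand-ins rather than the real identifiers.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hook that a loadable module fills in. */
typedef void (*disconnect_hook_type)(void);

/* Stays NULL if the module could not be loaded. */
disconnect_hook_type disconnect_hook = NULL;

/* Counts calls so the behaviour can be observed. */
int disconnect_calls = 0;

/* Example routine that a successfully loaded module might install. */
void example_disconnect(void)
{
    disconnect_calls++;
}

/* Exit-time cleanup: call the hook only if it was installed.
 * Returns 1 if the hook was called, 0 if it was skipped. */
int cleanup_on_exit(void)
{
    if (disconnect_hook != NULL)
    {
        disconnect_hook();
        return 1;
    }
    return 0;
}
```

Without the NULL check, cleanup_on_exit() would jump through a null pointer whenever the load failed, which is the usual way to get a signal 11 in an exit path like this one.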

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment             Content-Type              Size
walrcv_segv_v1.patch   application/octet-stream  580 bytes

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, quinn <quinn(at)fairpath(dot)com>, Josh Berkus <josh(at)postgresql(dot)org>
Subject: Re: Possible alpha5 SR bug
Date: 2010-04-13 08:16:25
Message-ID: k2i9837222c1004130116m9941c70ejbd2a464bd5935dbf@mail.gmail.com

On Tue, Apr 13, 2010 at 08:22, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Thanks for the test and report!
>
> On Tue, Apr 13, 2010 at 1:36 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> FATAL:  could not load library "/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so": dlopen(/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so, 10): Library not loaded: /usr/local/pgsql/lib/libpq.5.dylib
>>          Referenced from: /usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so
>>          Reason: no suitable image found.  Did find:
>>                /Users/quinn/lib/libpq.5.dylib: stat() failed with errno=13
>
> Seems to have failed in loading libpq.5.dylib. I guess that "errno=13" means
> "permission denied". Please ensure that the permission is appropriate.
>
>> LOG:  WAL receiver process (PID 1011) was terminated by signal 11: Segmentation fault
>
> Oops! I guess that this happened because walrcv_disconnect() was called in
> WalRcvDie() even though libpqwalreceiver.so couldn't be loaded. The attached
> patch would fix the problem.

Applied, thanks.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/