Re: 9.4 beta1 crash on Debian sid/i386

From: Christoph Berg <cb(at)df7cb(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.4 beta1 crash on Debian sid/i386
Date: 2014-05-17 22:40:42
Message-ID: 20140517224042.GF9148@msgid.df7cb.de
Lists: pgsql-hackers

Re: Tom Lane 2014-05-14 <1357(dot)1400028161(at)sss(dot)pgh(dot)pa(dot)us>
> Christoph Berg <cb(at)df7cb(dot)de> writes:
> > Building 9.4 beta1 on Debian sid/i386 fails during the regression
> > tests. amd64 works fine, as does i386 on the released distributions.
>
> It would appear that something is wrong with check_stack_depth(),
> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
> None of that logic has changed in awhile. You might try checking what
> the max_stack_depth GUC gets set to by default in each build, and how that
> compares to what "ulimit -s" has to say. If it looks sane, try tracing
> through check_stack_depth.

ulimit -s is 8192 (kB); max_stack_depth is 2MB.
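
For reference, the comparison between the two numbers can be sketched in C as below. This is only an illustration, not PostgreSQL's code: the helper names and the safety margin parameter are assumptions made up for the example; only getrlimit(RLIMIT_STACK) and the 8192 kB / 2 MB figures come from the discussion.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: soft RLIMIT_STACK in kB (what "ulimit -s" prints).
 * Returns 0 if the limit is unlimited, -1 if getrlimit() fails.
 */
static long stack_limit_kb(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (long) (rl.rlim_cur / 1024);
}

/*
 * Hypothetical check: does a configured depth limit (in kB) fit under the
 * kernel limit with margin_kb to spare?  The margin is an assumed value
 * chosen by the caller, not a PostgreSQL constant.
 */
static int depth_limit_fits(long max_depth_kb, long rlimit_kb, long margin_kb)
{
    if (rlimit_kb == 0)
        return 1;               /* unlimited stack: any setting fits */
    if (rlimit_kb < 0)
        return 0;               /* limit unknown: be conservative */
    return max_depth_kb + margin_kb <= rlimit_kb;
}
```

With the values above, depth_limit_fits(2048, 8192, 512) is true, so the 2MB setting is well below what the kernel claims to allow.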

check_stack_depth looks right: max_stack_depth_bytes there is 2097152,
and I can see stack_base_ptr - &stack_top_loc grow over repeated
invocations of the function (stack_depth itself is optimized out).
Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".
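
To make the check being traced concrete, here is a minimal stand-alone sketch of the technique: record a base address near the bottom of the stack, then compare the address of a fresh local against it. It is a simplified illustration, not the PostgreSQL source. The variable names mirror the ones mentioned above, while current_stack_depth, stack_is_too_deep and recurse_until_limit are hypothetical helpers invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Address recorded near the bottom of the stack; set by the caller. */
static char *stack_base_ptr = NULL;

/* Depth limit in bytes; 2097152 is the 2MB value reported above. */
static long max_stack_depth_bytes = 2097152;

/* Hypothetical helper: distance between the base and a local here. */
static long current_stack_depth(void)
{
    char stack_top_loc;
    long depth = (long) ((intptr_t) stack_base_ptr -
                         (intptr_t) &stack_top_loc);

    /* The stack may grow up or down, so take the absolute value. */
    return depth < 0 ? -depth : depth;
}

/* Hypothetical helper: nonzero once the depth exceeds the limit. */
static int stack_is_too_deep(void)
{
    return stack_base_ptr != NULL &&
           current_stack_depth() > max_stack_depth_bytes;
}

/*
 * Recurse, burning about 1 kB of stack per level, until the check fires
 * or calls_left runs out.  Returns the number of levels below this one.
 */
static int recurse_until_limit(int calls_left)
{
    volatile char pad[1024];
    int levels;

    pad[0] = 1;
    if (calls_left <= 0 || stack_is_too_deep())
        return 0;
    levels = recurse_until_limit(calls_left - 1);
    pad[1] = (char) (levels & 0x7f);    /* use pad after the call so this
                                         * is not a tail call */
    return levels + pad[0];
}
```

In this sketch the check can only fire if the recursing code actually calls it before the kernel limit is reached. A signal arriving first would suggest either a very large frame between two checks or less usable stack than the limit reports.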

Using "b check_stack_depth if (stack_base_ptr - &stack_top_loc) >
1000000", I could see the stack depth at 1000264, but when I then
tried with > 1900000, it ran into SIGBUS again.

In the meantime, the problem has also shown up on other
architectures: armel, armhf, and mipsel (the build logs are at [1],
though they don't contain anything except a "FATAL: the database
system is in recovery mode").

[1] https://buildd.debian.org/status/logs.php?pkg=postgresql-9.4&ver=9.4~beta1-1

Interestingly, the Debian buildd managed to run the testsuite for
i386, while I could reproduce the problem on the pgapt build machine
and on my notebook, so there must be some system difference. Possibly
the reason is that these two machines are running a 64-bit kernel and
I'm building in a 32-bit chroot, though that hasn't been a problem before.

Christoph
--
cb(at)df7cb(dot)de | http://www.df7cb.de/
