From: | Christoph Berg <cb(at)df7cb(dot)de> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: 9.4 beta1 crash on Debian sid/i386 |
Date: | 2014-05-18 09:08:34 |
Message-ID: | 20140518090834.GA18253@msgid.df7cb.de |
Lists: | pgsql-hackers |
Re: Tom Lane 2014-05-18 <9058(dot)1400385611(at)sss(dot)pgh(dot)pa(dot)us>
> Christoph Berg <cb(at)df7cb(dot)de> writes:
> > Re: Tom Lane 2014-05-14 <1357(dot)1400028161(at)sss(dot)pgh(dot)pa(dot)us>
> >> It would appear that something is wrong with check_stack_depth(),
> >> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
>
> > ulimit -s is 8192 (kB); max_stack_depth is 2MB.
>
> > check_stack_depth looks right, max_stack_depth_bytes there is 2097152
> > and I can see stack_base_ptr - &stack_top_loc grow over repeated
> > invocations of the function (stack_depth itself is optimized out).
> > Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".
>
> Hm. Did you check that stack_base_ptr is non-NULL? If it were somehow
> not getting set, that would disable the error report. But on most
> architectures that would also result in silly values for the pointer
> difference, so I doubt this is the issue.
stack_base_ptr was non-NULL. The stack depth started at around 3 or 5 kB
(I don't remember exactly) and grew by a few hundred bytes with
each iteration, so this looked sane.
> > Interestingly, the Debian buildd managed to run the testsuite for
> > i386, while I could reproduce the problem on the pgapt build machine
> > and on my notebook, so there must be some system difference. Possibly
> > the reason is these two machines are running a 64bit kernel and I'm
> > building in a 32bit chroot, though that hasn't been a problem before.
>
> I'm suspicious that something has changed in your build environment,
> because that stack-checking logic hasn't changed since these commits:
It's something in the combination of build and runtime environment. I
can reproduce the problem with the package that the Debian
i386/experimental buildd compiled, even though that package passed the
regression tests on the buildd. Possibly a change in libc is involved
there. I'll try asking some kernel/libc people whether they have an idea.
My current bet is on the gcc hardening flags we are using.
> The lack of reports from the buildfarm or other users is also evidence
> against there being a widespread issue here.
The only buildfarm animal I can see running Debian testing/unstable is
dugong, which is ia64, an architecture that was removed from Debian some
months ago. I guess I should look into setting up a new animal for this.
> A different thought: I have heard of environments in which the available
> stack depth is much less than what ulimit would suggest because the ulimit
> space gets split up for multiple per-thread stacks. That should not be
> happening in a Postgres backend, since we don't do threading, but I'm
> running out of ideas to investigate ...
I've done some builds now, and there's no clear picture yet of when the
problem occurs. Still trying...
Christoph
--
cb(at)df7cb(dot)de | http://www.df7cb.de/