Re: BUG #1473: Backend bus error, possibly due to ANALYZE

Lists: pgsql-bugs
From: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-11 22:13:36
Message-ID: 20050211221336.GA25636@bbdab.org

On Fri, Feb 11, 2005 at 04:56:16PM -0500, Tom Lane wrote:
> regression=# create function infinite_recurse() returns int as '
> regression'# select infinite_recurse()' language sql;
> regression=# \set VERBOSITY terse
> regression=# select infinite_recurse();
> ERROR: stack depth limit exceeded
>
> and see if you get the proper error or a core dump?

This makes the backend core dump in about 5-10 seconds. It looks like your
analysis is correct. Excellent work. I am not sure if it's my particular
FreeBSD installation that is screwing up or if it's FreeBSD in general.
Did you happen to test with FreeBSD on one of your test machines?

> If it dumps core, then the thing to look at is check_stack_depth() in
> src/backend/tcop/postgres.c. Maybe your compiler is bogusly optimizing
> the address arithmetic there?

It is possible. Using gcc 2.95.4. I will eliminate all the optimization
options when I recompile PostgreSQL and see what happens.

Thanks,

Brian B.
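
For reference, the kind of address arithmetic under discussion can be sketched roughly as follows. This is a minimal illustration in C, not the code in src/backend/tcop/postgres.c: the function names, the 256-byte padding, and the idea of passing the base address in as a parameter are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: distance in bytes between a reference address taken
 * near the top of the stack and a local variable in the current frame.
 * Works whether the stack grows up or down.
 */
size_t stack_depth_from(const char *base)
{
    char here;
    uintptr_t a = (uintptr_t) base;
    uintptr_t b = (uintptr_t) &here;

    return (size_t) (a > b ? a - b : b - a);
}

/*
 * Hypothetical stand-in for a runaway recursive function: recurse until the
 * measured depth exceeds the budget, then return the level reached instead
 * of running off the end of the stack.
 */
int recurse_until_limit(const char *base, size_t limit, int level)
{
    volatile char pad[256];   /* make each frame use a noticeable amount of stack */
    int result;

    pad[0] = (char) level;
    if (stack_depth_from(base) > limit)
        return level;         /* a real server would raise an error here */

    result = recurse_until_limit(base, limit, level + 1);
    pad[1] = (char) result;   /* work after the call keeps this from being a tail call */
    return result;
}
```

A caller would take the address of one of its own locals, for example `char base;`, and call `recurse_until_limit(&base, 64 * 1024, 0)`. The check only protects the process if the budget is smaller than the stack the operating system actually provides.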


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-11 22:17:43
Message-ID: 15512.1108160263@sss.pgh.pa.us

"Brian B." <brian-pgsql(at)bbdab(dot)org> writes:
> Did you happen to test with FreeBSD on one of your test machines?

No, I don't have any BSD machines here. However, I've added this test
case to the regression tests, so in a few hours we'll have a spectrum
of results from the PG build farm.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-11 23:46:55
Message-ID: 16235.1108165615@sss.pgh.pa.us

>> Did you happen to test with FreeBSD on one of your test machines?

> No, I don't have any BSD machines here. However, I've added this test
> case to the regression tests, so in a few hours we'll have a spectrum
> of results from the PG build farm.

FWIW, I see a "pass" from buildfarm member cockatoo, which claims to be
FreeBSD 4.10-STABLE gcc 2.95.4 x86 ... that's at least pretty close to
your setup, no?

http://www.pgbuildfarm.org/cgi-bin/show_status.pl

regards, tom lane


From: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-11 23:54:31
Message-ID: 20050211235431.GA54211@bbdab.org

On Fri, Feb 11, 2005 at 06:46:55PM -0500, Tom Lane wrote:
> >> Did you happen to test with FreeBSD on one of your test machines?
>
> > No, I don't have any BSD machines here. However, I've added this test
> > case to the regression tests, so in a few hours we'll have a spectrum
> > of results from the PG build farm.
>
> FWIW, I see a "pass" from buildfarm member cockatoo, which claims to be
> FreeBSD 4.10-STABLE gcc 2.95.4 x86 ... that's at least pretty close to
> your setup, no?

Yup ... that's really close. I'm going to do a few tests to try to figure
out what's going on with my environment and let you know if I come up with
anything odd.

Thanks for your time!

Brian B.


From: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-12 01:55:24
Message-ID: 20050212015524.GA15442@bbdab.org

On Fri, Feb 11, 2005 at 06:54:31PM -0500, I wrote:
> Yup ... that's really close. I'm going to do a few tests to try to figure
> out what's going on with my environment and let you know if I come up with
> anything odd.

I think I have figured out the culprit.

I use the FreeBSD PostgreSQL port and I set the option to use threads so
that pl/python will work with PostgreSQL. If I unset this option and rebuild
the port, I receive the normal error message and no crash. Just to
make sure it was not the port's fault, I built the source by hand without
any port-specific patches. The same behavior occurs.

So I am not sure if, again, it's my libc_r or something gets messed up in
the PostgreSQL code dealing with the stack/recursion, when I have pthread
libs linked in. I am guessing it is just my particular environment, though.

Thanks,

Brian B.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-12 02:46:50
Message-ID: 17594.1108176410@sss.pgh.pa.us

"Brian B." <brian-pgsql(at)bbdab(dot)org> writes:
> I use the FreeBSD PostgreSQL port and I set the option to use threads so
> that pl/python will work with PostgreSQL.

What option is that, exactly?

It's entirely possible that something has decided that the backend is
going to be multithreaded and is only giving the "main" thread a
1MB-or-so stack. If so, I would regard this as a build error. We do
not want threading libraries linked into the backend.

regards, tom lane


From: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-12 03:02:01
Message-ID: 20050212030201.GA25804@bbdab.org

On Fri, Feb 11, 2005 at 09:46:50PM -0500, Tom Lane wrote:
> "Brian B." <brian-pgsql(at)bbdab(dot)org> writes:
> > I use the FreeBSD PostgreSQL port and I set the option to use threads so
> > that pl/python will work with PostgreSQL.
>
> What option is that, exactly?
>
> It's entirely possible that something has decided that the backend is
> going to be multithreaded and is only giving the "main" thread a
> 1MB-or-so stack. If so, I would regard this as a build error. We do
> not want threading libraries linked into the backend.

Apologies, due to message revising, I forgot to include the explanation of what
this port setting entails.

When setting the "LIBC_R" option, the FreeBSD port essentially sets CFLAGS to
-D_THREAD_SAFE and LDFLAGS to -pthread. This is probably due to the Python
procedural handler not being able to link with PostgreSQL until PostgreSQL
is built pthread-aware. I could be wrong on all of this, but it seems to
work as such. Unfortunately, this has some unforeseen broken behavior for
PostgreSQL that was sorta hard to debug. :)

Perhaps there is another way around it. I think the situation now is that I
should converse with the FreeBSD port developer rather than using up your time.

Thank you,

Brian B.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-12 03:05:36
Message-ID: 17777.1108177536@sss.pgh.pa.us

"Brian B." <brian-pgsql(at)bbdab(dot)org> writes:
> When setting the "LIBC_R" option, the FreeBSD port essentially sets CFLAGS to
> -D_THREAD_SAFE and LDFLAGS to -pthread. This is probably due to the Python
> procedural handler not being able to link with PostgreSQL until PostgreSQL
> is built pthread-aware.

Hmm, is that a FreeBSD-specific restriction? I've not had any such
trouble on Linux or Mac OS X.

regards, tom lane


From: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-12 03:24:46
Message-ID: 20050212032446.GA25933@bbdab.org

On Fri, Feb 11, 2005 at 10:05:36PM -0500, Tom Lane wrote:
> "Brian B." <brian-pgsql(at)bbdab(dot)org> writes:
> > When setting the "LIBC_R" option, the FreeBSD port essentially sets CFLAGS to
> > -D_THREAD_SAFE and LDFLAGS to -pthread. This is probably due to the Python
> > procedural handler not being able to link with PostgreSQL until PostgreSQL
> > is built pthread-aware.
>
> Hmm, is that a FreeBSD-specific restriction? I've not had any such
> trouble on Linux or Mac OS X.

I was citing that behavior from memory. It actually builds/installs OK until
one decides to add the handler to a database:

$ createlang plpythonu
createlang: language installation failed: ERROR: could not load library
"/usr/local/lib/postgresql/plpython.so": dlopen
'/usr/local/lib/postgresql/plpython.so' failed.
(/usr/local/lib/python2.4/config/libpython2.4.so: Undefined symbol
"pthread_attr_destroy")

If I build the backend with the pthread stuff, this step succeeds.

Thanks,

Brian B.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-12 03:33:00
Message-ID: 17981.1108179180@sss.pgh.pa.us

BTW, something that would be interesting is to figure out what the
thread stack size actually is (I assume this is available in the FreeBSD
docs) and experiment to find what is the maximum value max_stack_depth
can be set to without letting infinite_recurse() dump core.

regards, tom lane
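
The experiment suggested above comes down to simple arithmetic: the configured depth budget plus some headroom has to fit inside whatever stack the process really has. A rough sketch in C, where the function names and the headroom parameter are assumptions for illustration and not anything PostgreSQL provides:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Soft limit on the process stack in bytes, or 0 if it is unlimited or
 * cannot be determined. Note that a threading library may give the initial
 * thread less than this, so treat the result as an upper bound only.
 */
size_t process_stack_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (size_t) rl.rlim_cur;
}

/*
 * Hypothetical check: does a max_stack_depth value (in kB) plus a safety
 * margin (in bytes) fit within a stack of stack_bytes?
 */
int depth_setting_fits(size_t max_stack_depth_kb, size_t stack_bytes,
                       size_t margin_bytes)
{
    size_t needed = max_stack_depth_kb * 1024 + margin_bytes;

    return needed <= stack_bytes;
}
```

With the "1MB-or-so" stack guessed at earlier in the thread and an assumed 64KB of headroom, `depth_setting_fits(2048, 1024 * 1024, 64 * 1024)` is false while `depth_setting_fits(512, 1024 * 1024, 64 * 1024)` is true.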


From: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-12 04:13:37
Message-ID: 20050212041337.GA43654@bbdab.org

On Fri, Feb 11, 2005 at 10:33:00PM -0500, Tom Lane wrote:
> BTW, something that would be interesting is to figure out what the
> thread stack size actually is (I assume this is available in the FreeBSD
> docs) and experiment to find what is the maximum value max_stack_depth
> can be set to without letting infinite_recurse() dump core.

You may be onto something, there. After doing some searching, I have
found the FreeBSD thread stack size as a default of 64KB! After Googling
for things like "freebsd thread stack size set", it seems other projects are
running into this situation with FreeBSD, and there is FreeBSD mailing-list
chatter about the philosophy behind proper stack usage and whether to just
match Linux's settings for this.

Some notable threads on the matter:

(Question about our default pthread stack size)
http://lists.freebsd.org/pipermail/freebsd-threads/2004-November/002699.html

([PATCH] Dynamic thread stack size)
http://lists.freebsd.org/pipermail/freebsd-threads/2005-January/002793.html

Search results showed several other projects having this issue. Some
try to work around the problem by calling pthread_attr_setstacksize with
a value anywhere from double the default (128KB) up to around 1MB.

Thanks,

Brian B.
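
The pthread_attr_setstacksize workaround mentioned above looks roughly like this in C. It is a sketch of what a genuinely threaded program might do, not something the PostgreSQL backend does; run_with_stack and touch_big_buffer are hypothetical names, and the 256KB buffer is just an example of a frame that would not fit in a 64KB stack.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Hypothetical helper: run fn(arg) in a new thread whose stack size is set
 * explicitly, and wait for it to finish. Returns 0 on success or the error
 * number reported by the failing pthread call.
 */
int run_with_stack(size_t stack_bytes, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;

    return pthread_join(tid, NULL);
}

/*
 * Example worker: touches one byte per 4KB of a 256KB local buffer and
 * stores the number of bytes touched in the size_t that arg points to.
 */
void *touch_big_buffer(void *arg)
{
    volatile char buf[256 * 1024];
    size_t i, touched = 0;

    for (i = 0; i < sizeof buf; i += 4096)
        buf[i] = 1;
    for (i = 0; i < sizeof buf; i += 4096)
        touched += (size_t) buf[i];

    *(size_t *) arg = touched;
    return NULL;
}
```

Calling `run_with_stack(1024 * 1024, touch_big_buffer, &n)` with a `size_t n` requests a 1MB stack, the upper end of the range quoted above. On most systems the program has to be linked with the platform's thread library for this to build.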


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-12 19:49:54
Message-ID: 23066.1108237794@sss.pgh.pa.us

"Brian B." <brian-pgsql(at)bbdab(dot)org> writes:
> You may be onto something, there. After doing some searching, I have
> found the FreeBSD thread stack size as a default of 64KB!

Ugh :-(. That might be reasonable for a program that's actually using
multiple threads, but a program that is not thread-aware at all
shouldn't be forced into that model IMHO.

As of now we are seeing one similar failure in the PG build farm,
member osprey: http://www.pgbuildfarm.org/cgi-bin/show_status.pl
It would seem that NetBSD 2.0 also has an unreasonably small default
stack size. Can anyone check on what NetBSD is using?

regards, tom lane


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Brian B(dot)" <brian-pgsql(at)bbdab(dot)org>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1473: Backend bus error, possibly due to ANALYZE
Date: 2005-02-12 21:16:10
Message-ID: 20050212211610.GA24118@winnie.fuhr.org

I haven't been following this thread closely, but if I understand
correctly then the problems are the result of linking PostgreSQL
against libc_r instead of libc on FreeBSD (at least FreeBSD 4.x),
which was done in an attempt to make plpythonu work. Otherwise
attempting to createlang plpythonu results in the following error:

createlang: language installation failed:
ERROR: could not load library "/usr/local/pgsql80/lib/plpython.so":
dlopen '/usr/local/pgsql80/lib/plpython.so' failed.
(/usr/local/lib/python2.4/config/libpython2.4.so: Undefined symbol "pthread_attr_destroy")

Is that right?

What I'm about to describe is a hack and it's probably wrong and
dangerous, but I did it as an experiment anyway to see what would
happen. I'm in no way suggesting that anybody should try it except
on a test server that can tolerate failure and data corruption.

I created a .so file with stub versions of the missing functions
and used preload_libraries to load it into the PostgreSQL backend.
I built PostgreSQL "normally", i.e., without linking against libc_r.

It worked, at least in simple tests.

Pthreads functions generally return 0 on success or some errno value
on failure. Most functions have EINVAL as a documented return value,
so I wrote all of the stub versions to return EINVAL -- I figured that
they should report failure instead of success because they don't
actually do anything. I also used ereport() to log the fact that a
function was called.

CREATE FUNCTION foo(integer) RETURNS integer AS $$
    return args[0]
$$ LANGUAGE plpythonu IMMUTABLE STRICT;

SELECT foo(1234);
NOTICE: sem_init() called
NOTICE: sem_wait() called
NOTICE: sem_post() called
NOTICE: pthread_self() called
...
 foo
------
 1234
(1 row)

The stub functions are called only when the language handler is
first loaded -- subsequent calls to plpythonu functions don't print
any of the notices, at least not that I've seen so far:

SELECT foo(5678);
 foo
------
 5678
(1 row)

It's interesting that although all of the stub functions report
failure, the code runs anyway. It makes one wonder how thorough
the error checking is.

My pthread_phony.so file contains stub versions of the following
functions; all were required to stop the linker from complaining:

pthread_attr_destroy
pthread_attr_init
pthread_attr_setstacksize
pthread_create
pthread_detach
pthread_self
sem_destroy
sem_init
sem_post
sem_trywait
sem_wait

Again, this was nothing more than an experiment, and so far I've
done only a few simple tests. It could very well cause the system
to crash and burn. Don't try it unless you can afford to have your
database trashed.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
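
The stub pattern described in this message can be sketched in C along the following lines. This is not the pthread_phony.so source, which is not shown in the thread. The stubs here carry a phony_ prefix so the example can be compiled and run without shadowing the real library, whereas the experiment would have exported the real names; fprintf stands in for ereport(); the void * parameters are simplified placeholders for the real pthread and semaphore types; and the call counter is an addition for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

static int phony_calls = 0;   /* how many stubs have been called so far */

/* Log that a stub was called and report failure with EINVAL. */
static int phony_fail(const char *name)
{
    phony_calls++;
    fprintf(stderr, "NOTICE: %s() called\n", name);
    return EINVAL;
}

/* A few of the stubs; the rest of the list would follow the same shape. */
int phony_pthread_attr_init(void *attr)
{
    (void) attr;
    return phony_fail("pthread_attr_init");
}

int phony_pthread_attr_setstacksize(void *attr, size_t stacksize)
{
    (void) attr;
    (void) stacksize;
    return phony_fail("pthread_attr_setstacksize");
}

int phony_pthread_attr_destroy(void *attr)
{
    (void) attr;
    return phony_fail("pthread_attr_destroy");
}

int phony_sem_wait(void *sem)
{
    (void) sem;
    return phony_fail("sem_wait");
}

int phony_sem_post(void *sem)
{
    (void) sem;
    return phony_fail("sem_post");
}

/* Number of stub calls observed, useful when experimenting. */
int phony_call_count(void)
{
    return phony_calls;
}
```

Every stub does nothing except log and fail, which is why it is surprising that the caller carries on regardless. The same warning applies to this sketch as to the experiment itself: it is not something to load into a server that matters.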