Re: backend process terminates

Lists: pgsql-general
From: Geoffrey Myers <geof(at)serioustechnology(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: backend process terminates
Date: 2007-08-03 15:12:43
Message-ID: 46B345EB.1070906@serioustechnology.com

We've been wrestling with a problem where the backend process terminates
with a SIGSEGV. We are having a hard time tracking it down, so I ran a
batch gdb process that single-steps through the code until it crashes,
and I'm posting the output to the list to ask for assistance. The output
file is 324k, so rather than send such a large file as an attachment,
I've put it on a website. We would appreciate any help folks can offer
in determining what is going on here. The following query generated
this segfault:

select pcm_getmiles_s('sparta, nc', 'buffalo, ny', 0);

We are building pcm_getmiles_s() into the backend process.

This is PostgreSQL 7.4.17 on Red Hat Enterprise Linux 4.

The output from the gdb batch process may be found here:

http://www.serioustechnology.com/gdbbatch.txt

Any help will be greatly appreciated.

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
- Benjamin Franklin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Geoffrey Myers <geof(at)serioustechnology(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: backend process terminates
Date: 2007-08-04 20:14:18
Message-ID: 1222.1186258458@sss.pgh.pa.us

Geoffrey Myers <geof(at)serioustechnology(dot)com> writes:
> The output from the gdb batch process may be found here:
> http://www.serioustechnology.com/gdbbatch.txt

gdb isn't telling you the whole truth, evidently --- how'd control get
from line 781 to 912 with nothing in between? Recompiling the backend
with -O0 or at most -O1 would be a good idea to get a more trustworthy
gdb trace.

regards, tom lane


From: Geoffrey <lists(at)serioustechnology(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: backend process terminates
Date: 2007-08-04 23:46:14
Message-ID: 46B50FC6.50107@serioustechnology.com

Tom Lane wrote:
> Geoffrey Myers <geof(at)serioustechnology(dot)com> writes:
>> The output from the gdb batch process may be found here:
>> http://www.serioustechnology.com/gdbbatch.txt
>
> gdb isn't telling you the whole truth, evidently --- how'd control get
> from line 781 to 912 with nothing in between? Recompiling the backend
> with -O0 or at most -O1 would be a good idea to get a more trustworthy
> gdb trace.

Well, there are some third-party libraries we've built into the backend
that we don't have the source for. We suspect some memory corruption is
going on in there.

--
Until later, Geoffrey


From: Geoffrey <lists(at)serioustechnology(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: backend process terminates
Date: 2007-08-07 11:46:45
Message-ID: 46B85BA5.3080103@serioustechnology.com

Tom Lane wrote:
> Geoffrey Myers <geof(at)serioustechnology(dot)com> writes:
>> The output from the gdb batch process may be found here:
>> http://www.serioustechnology.com/gdbbatch.txt
>
> gdb isn't telling you the whole truth, evidently --- how'd control get
> from line 781 to 912 with nothing in between? Recompiling the backend
> with -O0 or at most -O1 would be a good idea to get a more trustworthy
> gdb trace.

As previously noted, we are building some third-party code into the
backend. We don't have the source code, so it's difficult to know what
might be going on there.

I don't know all the idiosyncrasies of how this works, so bear with me
on this. The developer at the vendor indicated that he's narrowed down
the problem to a set of wrapper routines in their code. They are named
OpenFile(), CloseFile() and ReadFile(). He inquired as to whether there
might be routines in the PostgreSQL code with the same names that might
be causing a conflict. Sure enough, I searched the PostgreSQL source
code and found routines with the same names. I don't see how this could
pose a problem, though, as it is my understanding that the compiler will
properly address this issue.

Anyone think this might be a problem?

--
Until later, Geoffrey


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Geoffrey <lists(at)serioustechnology(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: backend process terminates
Date: 2007-08-08 08:47:13
Message-ID: 20070808084713.GA27465@svana.org

On Tue, Aug 07, 2007 at 07:46:45AM -0400, Geoffrey wrote:
> I don't know all the idiosyncrasies of how this works, so bear with me
> on this. The developer at the vendor indicated that he's narrowed down
> the problem to a set of wrapper routines in their code. They are named
> OpenFile(), CloseFile() and ReadFile(). He inquired as to whether there
> might be routines in the PostgreSQL code with the same names that might
> be causing a conflict. Sure enough, I searched the PostgreSQL source
> code and found routines with the same names. I don't see how this could
> pose a problem, though, as it is my understanding that the compiler will
> properly address this issue.

Yes, this could cause a problem. In general, when loading a library,
any external references are first resolved against the main
executable, then already loaded libraries, then the library being
loaded. It's all in the ELF standard, if you're interested.
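To make that lookup order concrete, here is a toy model in C; it is not the real dynamic linker. Each loaded object is reduced to a list of the global symbols it defines, and a reference binds to the first object in search order that defines the name. The symbol tables are illustrative assumptions built from the OpenFile()/CloseFile()/ReadFile() names mentioned earlier in the thread, and "libvendor.so" is a made-up library name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one loaded ELF object: its name and the global
 * symbols it defines (NULL-terminated). Real objects use hash tables. */
struct module {
    const char *name;
    const char *const *defines;
};

/* Return the name of the first module in scope[0..n-1] that defines
 * sym, or NULL if none does. Array order models the lookup scope:
 * main executable first, then already loaded libraries, then the
 * library being loaded. */
static const char *resolve(const struct module *scope, size_t n,
                           const char *sym)
{
    assert(scope != NULL && sym != NULL);
    for (size_t i = 0; i < n; i++) {
        for (const char *const *d = scope[i].defines; *d != NULL; d++) {
            if (strcmp(*d, sym) == 0)
                return scope[i].name;
        }
    }
    return NULL;
}

/* Illustrative symbol tables (assumptions, not real nm output). */
static const char *const postgres_syms[] = {"OpenFile", "ReadFile", NULL};
static const char *const vendor_syms[] = {"OpenFile", "CloseFile",
                                          "ReadFile", NULL};

/* Search order for the hypothetical vendor library "libvendor.so". */
static const struct module load_order[] = {
    {"postgres", postgres_syms},
    {"libvendor.so", vendor_syms},
};
```

With this ordering, resolve(load_order, 2, "OpenFile") yields "postgres": in the model, a call to OpenFile() made inside the vendor library lands in the backend's function rather than the vendor's own wrapper. If the two functions expect different arguments, that is one plausible route to the SIGSEGV.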

As for solutions:
1. In your third-party library, have the library built in such a way
that the symbols are explicitly bound to the internal library version.
There are various methods for dealing with that, it all depends on the
toolchain used to build it. I suppose this product is actually several
libraries that call each other? A namespace prefix would help here.

2. Make sure that any externally visible symbols in libraries are
always prefixed by a tag, like libpq does (almost all symbols are pq*).
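For option 1, one common approach with GCC or Clang on ELF platforms is symbol visibility. The sketch below marks the portability wrappers hidden, so they stay out of the library's dynamic symbol table and calls from inside the library bind to the library's own definitions; only an entry point marked default is exported. The wrapper signatures and the vendor_file_exists() entry point are hypothetical, since the vendor's real code isn't available. Compiling with -fvisibility=hidden, or linking with GNU ld's -Bsymbolic option, are alternatives worth comparing.

```c
#include <stdio.h>

/* Assumes GCC or Clang on an ELF platform. "hidden" keeps a symbol out
 * of the shared library's dynamic symbol table, so the main executable
 * cannot interpose a same-named function; "default" exports it. */
#define VENDOR_LOCAL  __attribute__((visibility("hidden")))
#define VENDOR_EXPORT __attribute__((visibility("default")))

/* Hypothetical portability wrappers; real vendor signatures unknown. */
VENDOR_LOCAL FILE *OpenFile(const char *path, const char *mode)
{
    return fopen(path, mode);
}

VENDOR_LOCAL int CloseFile(FILE *fp)
{
    return fclose(fp);
}

/* Hypothetical exported entry point: internal calls to OpenFile() and
 * CloseFile() bind to the hidden definitions above. Returns 1 if path
 * can be opened for reading, 0 otherwise. */
VENDOR_EXPORT int vendor_file_exists(const char *path)
{
    FILE *fp = OpenFile(path, "r");

    if (fp == NULL)
        return 0;
    CloseFile(fp);
    return 1;
}
```

When this is built as a shared library, nm -D over it should list vendor_file_exists but not OpenFile or CloseFile.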

Running "nm -D" over the main postgres executable and your libraries
should give you an idea of the scope of the problem.
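If you'd rather do that comparison mechanically, the following C sketch takes lines of nm -D output in the default BSD format for two objects, for example the postgres executable and one vendor library, and reports names that both define. Treating the type letters T, D, B, R, W and V as "defined" is a heuristic, not an exhaustive reading of the nm documentation, and stripping any @VERSION suffix before comparing is an assumption about how you want versioned symbols matched.

```c
#include <stdio.h>
#include <string.h>

/* Parse one line of `nm -D` output, e.g.
 *   "00000000004a1b20 T OpenFile"   (defined: address, type, name)
 *   "                 U fopen"      (undefined: no address column)
 * Returns 1 and copies the name (minus any "@VERSION" suffix) into
 * name if the symbol is defined in this object, else returns 0. */
static int nm_defined_symbol(const char *line, char *name, size_t cap)
{
    char t1[64], t2[64], t3[256];
    const char *type, *sym;
    int n = sscanf(line, "%63s %63s %255s", t1, t2, t3);

    if (n == 3) {
        type = t2;
        sym = t3;
    } else if (n == 2) {
        type = t1;
        sym = t2;
    } else {
        return 0;
    }
    /* Heuristic: these letters mean "defined here" for our purposes. */
    if (strlen(type) != 1 || strchr("TDBRWV", type[0]) == NULL)
        return 0;
    snprintf(name, cap, "%.*s", (int) strcspn(sym, "@"), sym);
    return 1;
}

/* Print and count every symbol defined in both listings. Quadratic,
 * which is acceptable for a one-off check. */
static int count_collisions(const char *const *a, size_t na,
                            const char *const *b, size_t nb)
{
    char x[256], y[256];
    int found = 0;

    for (size_t i = 0; i < na; i++) {
        if (!nm_defined_symbol(a[i], x, sizeof x))
            continue;
        for (size_t j = 0; j < nb; j++) {
            if (nm_defined_symbol(b[j], y, sizeof y) && strcmp(x, y) == 0) {
                printf("collision: %s\n", x);
                found++;
                break;
            }
        }
    }
    return found;
}
```

In practice you would save the nm -D output for each object to a file, read it line by line (fgets() works), and pass the two arrays of lines to count_collisions().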

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.


From: Geoffrey <lists(at)serioustechnology(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: backend process terminates
Date: 2007-08-08 12:50:41
Message-ID: 46B9BC21.40206@serioustechnology.com

Martijn van Oosterhout wrote:
> On Tue, Aug 07, 2007 at 07:46:45AM -0400, Geoffrey wrote:
>> I don't know all the idiosyncrasies of how this works, so bear with me
>> on this. The developer at the vendor indicated that he's narrowed down
>> the problem to a set of wrapper routines in their code. They are named
>> OpenFile(), CloseFile() and ReadFile(). He inquired as to whether there
>> might be routines in the PostgreSQL code with the same names that might
>> be causing a conflict. Sure enough, I searched the PostgreSQL source
>> code and found routines with the same names. I don't see how this could
>> pose a problem, though, as it is my understanding that the compiler will
>> properly address this issue.
>
> Yes, this could cause a problem. In general, when loading a library,
> any external references are first resolved against the main
> executable, then already loaded libraries, then the library being
> loaded. It's all in the ELF standard, if you're interested.

I will be checking it out. My compiler knowledge is a bit rusty,
circa SVR4... ;)

> As for solutions:
> 1. In your third-party library, have the library built in such a way
> that the symbols are explicitly bound to the internal library version.
> There are various methods for dealing with that, it all depends on the
> toolchain used to build it. I suppose this product is actually several
> libraries that call each other? A namespace prefix would help here.

Correct on both counts. Many of the routines are wrapper routines used
to assist in code portability.

> 2. Make sure that any externally visible symbols in libraries are
> always prefixed by a tag, like libpq does (almost all symbols are pq*).
>
> Running "nm -D" over the main postgres executable and your libraries
> should give you an idea of the scope of the problem.
>
> Hope this helps,

It appears that the common routine names were causing the problem. We
are currently testing new versions of these libraries where they have
renamed the common routines with unique names.
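One low-effort way to do such a rename, assuming the vendor wants to leave its existing call sites untouched, is a private header that maps each wrapper name to a prefixed one at compile time, similar in spirit to zlib's Z_PREFIX option. The pcmv_ prefix and the wrapper signatures below are made up for illustration; the vendor's actual approach isn't described in this thread.

```c
#include <stdio.h>

/* Hypothetical private vendor header: rename every portability wrapper
 * with a vendor prefix in the preprocessor. Call sites keep writing
 * OpenFile(), but the object file defines pcmv_OpenFile. */
#define OpenFile  pcmv_OpenFile
#define CloseFile pcmv_CloseFile

FILE *OpenFile(const char *path, const char *mode);
int CloseFile(FILE *fp);

/* Unchanged vendor source: after preprocessing these definitions are
 * pcmv_OpenFile() and pcmv_CloseFile(), so they no longer share a name
 * with anything in the backend. */
FILE *OpenFile(const char *path, const char *mode)
{
    return fopen(path, mode);
}

int CloseFile(FILE *fp)
{
    return fclose(fp);
}
```

Because the rename happens before compilation, nm -D on the rebuilt library would show pcmv_OpenFile and pcmv_CloseFile instead of names that collide with the backend's.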

Thanks for the insights.

--
Until later, Geoffrey


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Geoffrey <lists(at)serioustechnology(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: backend process terminates
Date: 2007-08-08 22:38:00
Message-ID: 20070808223800.GB14445@svana.org

On Wed, Aug 08, 2007 at 08:50:41AM -0400, Geoffrey wrote:
> Correct on both counts. Many of the routines are wrapper routines used
> to assist in code portability.

That's OK in programs, but shared libraries need to be careful not to use
names likely to be used by the programs that use them. FWIW, this document
has lots of information about ELF shared libraries.

http://people.redhat.com/drepper/dsohowto.pdf

There's a lot of technical material you can skip, but it also has plenty
of information about scopes and how they are resolved, common problems,
and how to fix them.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/