libxml incompatibility

Lists: pgsql-hackers
From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: libxml incompatibility
Date: 2009-03-06 19:14:04
Message-ID: 20090306191404.GK3901@alvh.no-ip.org

Hi,

It seems that if you load libxml into a backend for whatever reason (say
you create a table with a column of type xml) and then create a plperlu
function that does "use XML::LibXML", we get a segmentation fault.

This sequence reproduces the problem for me in 8.3:

create table xmlcrash (a xml);
insert into xmlcrash values ('<a />');
create function xmlcrash() returns void language plperlu as $$ use XML::LibXML; $$;

The problem is reported as

TRAP: BadArgument("!(((context) != ((void *)0) && (((((Node*)((context)))->type) == T_AllocSetContext))))", File: "/pgsql/source/83_rel/src/backend/utils/mmgr/mcxt.c", Line: 507)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libxml incompatibility
Date: 2009-03-06 19:30:46
Message-ID: 20090306193046.GB13289@it.is.rice.edu

This looks like a problem caused by two different libxml versions:
the one used for the perl XML::LibXML wrappers and the one used to
build PostgreSQL. They really need to be the same. Does it still
segfault if they are identical?

Regards,
Ken

On Fri, Mar 06, 2009 at 04:14:04PM -0300, Alvaro Herrera wrote:
> Hi,
>
> It seems that if you load libxml into a backend for whatever reason (say
> you create a table with a column of type xml) and then create a plperlu
> function that does "use XML::LibXML", we get a segmentation fault.
>
> This sequence reproduces the problem for me in 8.3:
>
> create table xmlcrash (a xml);
> insert into xmlcrash values ('<a />');
> create function xmlcrash() returns void language plperlu as $$ use XML::LibXML; $$;
>
> The problem is reported as
>
> TRAP: BadArgument("!(((context) != ((void *)0) && (((((Node*)((context)))->type) == T_AllocSetContext))))", File: "/pgsql/source/83_rel/src/backend/utils/mmgr/mcxt.c", Line: 507)
>
>
> --
> Alvaro Herrera http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libxml incompatibility
Date: 2009-03-06 19:58:30
Message-ID: 49B18066.50202@dunslane.net

Alvaro Herrera wrote:
> Hi,
>
> It seems that if you load libxml into a backend for whatever reason (say
> you create a table with a column of type xml) and then create a plperlu
> function that does "use XML::LibXML", we get a segmentation fault.
>
>
>

Yes, I discovered this a few weeks ago. It looks like libxml is not
reentrant, so for perl you need to use some other XML library. Very
annoying.

cheers

andrew


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libxml incompatibility
Date: 2009-03-06 20:23:45
Message-ID: 20090306202345.GB3161@alvh.no-ip.org

Kenneth Marshall wrote:
> This looks like a problem caused by two different libxml versions:
> the one used for the perl XML::LibXML wrappers and the one used to
> build PostgreSQL. They really need to be the same. Does it still
> segfault if they are identical?

Unlikely, because AFAICT there's a single libxml installed on my system.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libxml incompatibility
Date: 2009-03-06 20:32:25
Message-ID: 20090306203225.GC13289@it.is.rice.edu

On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:
>
>
> Alvaro Herrera wrote:
>> Hi,
>>
>> It seems that if you load libxml into a backend for whatever reason (say
>> you create a table with a column of type xml) and then create a plperlu
>> function that does "use XML::LibXML", we get a segmentation fault.
>>
>>
>>
>
> Yes, I discovered this a few weeks ago. It looks like libxml is not
> reentrant, so for perl you need to use some other XML library. Very
> annoying.
>
> cheers
>
> andrew
>
Ugh! That is worse than a simple library link incompatibility.

Ken


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libxml incompatibility
Date: 2009-03-06 21:14:34
Message-ID: 20090306211434.GD3161@alvh.no-ip.org

Kenneth Marshall wrote:
> On Fri, Mar 06, 2009 at 05:23:45PM -0300, Alvaro Herrera wrote:
> > Kenneth Marshall wrote:
> > > This looks like a problem caused by two different libxml versions:
> > > the one used for the perl XML::LibXML wrappers and the one used to
> > > build PostgreSQL. They really need to be the same. Does it still
> > > segfault if they are identical?
> >
> > Unlikely, because AFAICT there's a single libxml installed on my system.
> >
> Yes, I saw Andrew's comment and I have had that problem myself with
> Apache/PHP and perl with libxml. A simple library mismatch would at
> least be easy to resolve. :)

Agreed :-(

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libxml incompatibility
Date: 2009-03-06 21:26:12
Message-ID: 20090306212612.GE13289@it.is.rice.edu

On Fri, Mar 06, 2009 at 05:23:45PM -0300, Alvaro Herrera wrote:
> Kenneth Marshall wrote:
> > This looks like a problem caused by two different libxml versions:
> > the one used for the perl XML::LibXML wrappers and the one used to
> > build PostgreSQL. They really need to be the same. Does it still
> > segfault if they are identical?
>
> Unlikely, because AFAICT there's a single libxml installed on my system.
>
Yes, I saw Andrew's comment and I have had that problem myself with
Apache/PHP and perl with libxml. A simple library mismatch would at
least be easy to resolve. :)

Regards,
Ken


From: "Holger Hoffstaette" <holger(at)wizards(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: libxml incompatibility
Date: 2009-03-06 23:48:56
Message-ID: pan.2009.03.06.23.48.56.645875@wizards.de

On Fri, 06 Mar 2009 14:32:25 -0600, Kenneth Marshall wrote:

> On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:
>> Yes, I discovered this a few weeks ago. It looks like libxml is not
>> reentrant, so for perl you need to use some other XML library. Very
>> annoying.
>>
> Ugh! That is worse than a simple library link incompatibility.

http://www.nabble.com/New-libxml-which-is-reentrant---to18329452.html

Seems to me that Perl (?) is calling functions it is not supposed to call
- I'm guessing due to assumptions about mismatching lifecycles. The
parsing functions themselves are supposedly reentrant.

-h


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Holger Hoffstaette <holger(at)wizards(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: libxml incompatibility
Date: 2009-03-07 02:44:53
Message-ID: 49B1DFA5.3050905@dunslane.net

Holger Hoffstaette wrote:
> On Fri, 06 Mar 2009 14:32:25 -0600, Kenneth Marshall wrote:
>
>
>> On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:
>>
>>> Yes, I discovered this a few weeks ago. It looks like libxml is not
>>> reentrant, so for perl you need to use some other XML library. Very
>>> annoying.
>>>
>>>
>> Ugh! That is worse than a simple library link incompatibility.
>>
>
> http://www.nabble.com/New-libxml-which-is-reentrant---to18329452.html
>
> Seems to me that Perl (?) is calling functions it is not supposed to call
> - I'm guessing due to assumptions about mismatching lifecycles. The
> parsing functions themselves are supposedly reentrant.
>
>
>

Maybe someone can trace the libxml calls ... not sure how exactly ...
given Alvaro's example, it doesn't seem likely to me that this is due to
a call to xmlCleanupParser(), but maybe the perl code, simply by
doing "use XML::LibXML;", calls that for some perverse reason.

My interest wasn't so high that I wanted to spend a lot of time on it.
If it didn't work I was just going to move on.

cheers

andrew


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Holger Hoffstaette <holger(at)wizards(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: libxml incompatibility
Date: 2009-03-07 03:55:17
Message-ID: 20090307035517.GN3901@alvh.no-ip.org

Andrew Dunstan wrote:
>
> Holger Hoffstaette wrote:
>
>> http://www.nabble.com/New-libxml-which-is-reentrant---to18329452.html
>>
>> Seems to me that Perl (?) is calling functions it is not supposed to call
>> - I'm guessing due to assumptions about mismatching lifecycles. The
>> parsing functions themselves are supposedly reentrant.
>
> Maybe someone can trace the libxml calls ... not sure how exactly ...
> given Alvaro's example, it doesn't seem likely to me that this is due to
> a call to xmlCleanupParser(), but maybe the perl code, simply by
> doing "use XML::LibXML;", calls that for some perverse reason.

Something that came to my mind was that maybe the change of memory
management (to make it use palloc) could be confusing libxml somehow.
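One plausible reading of this hypothesis, consistent with the "context != NULL" TRAP in the first message, is that the backend's allocator hooks outlive the memory context they allocate in, so a later, unrelated caller of libxml allocates through a hook whose context is already gone. The toy model below sketches that failure mode in C. Every toy_* name is hypothetical; none of this is PostgreSQL's xml.c or libxml2's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy "memory context": a fixed arena. Hypothetical, not PostgreSQL's. */
typedef struct ToyContext {
    char arena[256];
    size_t used;
} ToyContext;

/* Stand-in for LibxmlContext: NULL when no transaction is using libxml. */
static ToyContext *toy_libxml_context = NULL;

/* Stand-ins for the library's process-wide allocator hooks. */
static void *(*toy_hook_malloc)(size_t) = malloc;
static void (*toy_hook_free)(void *) = free;

/* Stand-in for xml_palloc(): allocate in the current libxml context.
 * Returns NULL where the real backend would fail an assertion such as
 * the "context != NULL" TRAP in the original report. */
static void *toy_xml_palloc(size_t n)
{
    ToyContext *cxt = toy_libxml_context;
    void *p;

    if (cxt == NULL || n > sizeof cxt->arena - cxt->used)
        return NULL;
    p = cxt->arena + cxt->used;
    cxt->used += n;
    return p;
}

/* Context memory is released wholesale, so individual frees are no-ops. */
static void toy_xml_pfree(void *p) { (void) p; }

/* Backend starts using libxml: set up the context, install its hooks. */
static void toy_begin_xml_use(ToyContext *cxt)
{
    cxt->used = 0;
    toy_libxml_context = cxt;
    toy_hook_malloc = toy_xml_palloc;
    toy_hook_free = toy_xml_pfree;
}

/* End of transaction: the context goes away, but the hooks stay put. */
static void toy_end_of_xact(void)
{
    toy_libxml_context = NULL;
}

/* Another user of the same library (think: a Perl module's boot code)
 * whose init routine allocates through the global hook. Returns 0 where
 * the backend would crash. */
static int toy_other_client_init(void)
{
    void *p = toy_hook_malloc(32);

    if (p == NULL)
        return 0;
    toy_hook_free(p);
    return 1;
}
```

In this model the other client's init succeeds before any XML use and during the transaction that uses XML, but fails once that transaction has ended and the hooks point at a context that no longer exists. That roughly mirrors the reproduction: the INSERT into the xml column and the later CREATE FUNCTION run as separate transactions.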

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Holger Hoffstaette <holger(at)wizards(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: libxml incompatibility
Date: 2009-03-07 04:11:10
Message-ID: 49B1F3DE.7060003@dunslane.net

Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>
>> Holger Hoffstaette wrote:
>>
>>
>>> http://www.nabble.com/New-libxml-which-is-reentrant---to18329452.html
>>>
>>> Seems to me that Perl (?) is calling functions it is not supposed to call
>>> - I'm guessing due to assumptions about mismatching lifecycles. The
>>> parsing functions themselves are supposedly reentrant.
>>>
>> Maybe someone can trace the libxml calls ... not sure how exactly ...
>> given Alvaro's example, it doesn't seem likely to me that this is due to
>> a call to xmlCleanupParser(), but maybe the perl code, simply by
>> doing "use XML::LibXML;", calls that for some perverse reason.
>>
>
> Something that came to my mind was that maybe the change of memory
> management (to make it use palloc) could be confusing libxml somehow.
>
>

Seems very possible. But what would perl be doing just as a result of
loading the module, not even doing anything, that would cause a segfault
because of that?

cheers

andrew


From: David Lee Lambert <davidl(at)lmert(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: libxml incompatibility
Date: 2009-03-09 08:55:55
Message-ID: 806a7b82-ee81-465c-ab4a-da297d465c99@r18g2000vbi.googlegroups.com

On 6 Mar, 22:44, and(dot)(dot)(dot)(at)dunslane(dot)net (Andrew Dunstan) wrote:
> Holger Hoffstaette wrote:
> > On Fri, 06 Mar 2009 14:32:25 -0600, Kenneth Marshall wrote:
> >> On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:
> >>> Yes, I discovered this a few weeks ago. [...]
>
> Maybe someone can trace the libxml calls ... not sure how exactly ...
> given Alvaro's example, it doesn't seem likely to me that this is due to
> a call to xmlCleanupParser(), but maybe the perl code, simply by
> doing "use XML::LibXML;", calls that for some perverse reason.

I'm able to duplicate this on Postgres 8.4 (Debian Etch, XML::LibXML
from CPAN). Here's the backtrace from the crash:

#0 0x082f3cf1 in MemoryContextAlloc ()
#1 0x082c3f8a in xml_palloc ()
#2 0xb7dfa548 in xmlInitCharEncodingHandlers () from /usr/lib/libxml2.so.2
#3 0xb7e0195e in xmlInitParser () from /usr/lib/libxml2.so.2
#4 0xb7dff2ef in xmlCheckVersion () from /usr/lib/libxml2.so.2
#5 0xb573af2e in boot_XML__LibXML () from /usr/local/lib/perl/5.8.8/auto/XML/LibXML/LibXML.so
#6 0xb587981b in Perl_pp_entersub () from /usr/lib/libperl.so.5.8
#7 0xb5877f19 in Perl_runops_standard () from /usr/lib/libperl.so.5.8
#8 0xb5819b6e in Perl_magicname () from /usr/lib/libperl.so.5.8
#9 0xb581a844 in Perl_call_sv () from /usr/lib/libperl.so.5.8
...

Is it supposed to be OK to call xmlCheckVersion() more than once?

--
DLL


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: David Lee Lambert <davidl(at)lmert(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: libxml incompatibility
Date: 2009-03-09 12:20:50
Message-ID: 49B509A2.1040406@dunslane.net

David Lee Lambert wrote:
> On 6 Mar, 22:44, and(dot)(dot)(dot)(at)dunslane(dot)net (Andrew Dunstan) wrote:
>
>> Holger Hoffstaette wrote:
>>
>>> On Fri, 06 Mar 2009 14:32:25 -0600, Kenneth Marshall wrote:
>>>
>>>> On Fri, Mar 06, 2009 at 02:58:30PM -0500, Andrew Dunstan wrote:
>>>>
>>>>> Yes, I discovered this a few weeks ago. [...]
>>>>>
>> Maybe someone can trace the libxml calls ... not sure how exactly ...
>> given Alvaro's example, it doesn't seem likely to me that this is due to
>> a call to xmlCleanupParser(), but maybe the perl code, simply by
>> doing "use XML::LibXML;", calls that for some perverse reason.
>>
>
> I'm able to duplicate this on Postgres 8.4 (Debian Etch, XML::LibXML
> from CPAN). Here's the backtrace from the crash:
>
> #0 0x082f3cf1 in MemoryContextAlloc ()
> #1 0x082c3f8a in xml_palloc ()
> #2 0xb7dfa548 in xmlInitCharEncodingHandlers () from /usr/lib/libxml2.so.2
> #3 0xb7e0195e in xmlInitParser () from /usr/lib/libxml2.so.2
> #4 0xb7dff2ef in xmlCheckVersion () from /usr/lib/libxml2.so.2
> #5 0xb573af2e in boot_XML__LibXML () from /usr/local/lib/perl/5.8.8/auto/XML/LibXML/LibXML.so
> #6 0xb587981b in Perl_pp_entersub () from /usr/lib/libperl.so.5.8
> #7 0xb5877f19 in Perl_runops_standard () from /usr/lib/libperl.so.5.8
> #8 0xb5819b6e in Perl_magicname () from /usr/lib/libperl.so.5.8
> #9 0xb581a844 in Perl_call_sv () from /usr/lib/libperl.so.5.8
> ...
>
> Is it supposed to be OK to call xmlCheckVersion() more than once?
>
>
>

You are certainly not supposed to call xmlInitParser more than once -
see <http://xmlsoft.org/html/libxml-parser.html#xmlInitParser>

Since this is being called by xmlCheckVersion(), that looks like a bug
in libxml2.

Even if this were fixed, however, I'm still not convinced that we'll be
able to call libxml2 from perl after we've installed our memory handler
(xml_palloc).

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: David Lee Lambert <davidl(at)lmert(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: libxml incompatibility
Date: 2009-03-10 20:14:24
Message-ID: 2983.1236716064@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> David Lee Lambert wrote:
>> Is it supposed to be OK to call xmlCheckVersion() more than once?

> You are certainly not supposed to call xmlInitParser more than once -
> see <http://xmlsoft.org/html/libxml-parser.html#xmlInitParser>

No, what that says is that it can't be called concurrently by more
than one thread. If there were such a restriction then our own code
wouldn't work at all, because we call it every time through xml_parse()
or xpath().
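The distinction drawn here (safe to call repeatedly from one thread, unsafe to call concurrently) is the usual behavior of a one-time initializer guarded by a static flag. A minimal sketch of that pattern follows. It is a hypothetical model with made-up toy_* names, not libxml2's source.

```c
#include <assert.h>

/* Hypothetical one-time initializer guarded by a static flag. */
static int toy_parser_initialized = 0;
static int toy_init_work_done = 0;   /* counts how often setup really ran */

static void toy_init_parser(void)
{
    /* Repeated calls from a single thread are cheap no-ops... */
    if (toy_parser_initialized)
        return;
    /* ...but two threads could both pass the check above before either
     * sets the flag, so concurrent first calls are not safe. */
    toy_init_work_done++;
    toy_parser_initialized = 1;
}

/* Stand-in for xml_parse()/xpath(): every call runs the initializer. */
static void toy_xml_parse(void)
{
    toy_init_parser();
    /* ... parsing would happen here ... */
}
```

Calling toy_xml_parse() any number of times performs the setup exactly once, which is why a single-threaded backend can call the initializer on every parse without harm.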

> Even if this were fixed, however, I'm still not convinced that we'll be
> able to call libxml2 from perl after we've installed our memory handler
> (xml_palloc).

Yeah, I'm wondering about that too. It certainly wouldn't have the
behavior that perl is expecting.

We could possibly use xmlMemGet() to fetch the prior settings and then
restore them after we are done, but making sure that happens after an
error would be a bit tricky.
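A minimal sketch of the save/restore idea follows, assuming an error path that works by setjmp/longjmp in the way PG_TRY/PG_CATCH does. To stay self-contained it models the library's hooks with hypothetical toy_mem_get()/toy_mem_setup() functions instead of calling libxml2; the real xmlMemGet() and xmlMemSetup() also cover realloc and strdup. The tricky part is the error path: an error longjmps past ordinary cleanup code, so the restore has to run after the handler catches it as well.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the library's global allocator hooks. */
typedef void *(*toy_malloc_fn)(size_t);
typedef void (*toy_free_fn)(void *);

static toy_malloc_fn toy_lib_malloc = malloc;
static toy_free_fn toy_lib_free = free;

/* Modeled on xmlMemGet(): report the currently installed hooks. */
static void toy_mem_get(toy_malloc_fn *m, toy_free_fn *f)
{
    *m = toy_lib_malloc;
    *f = toy_lib_free;
}

/* Modeled on xmlMemSetup(): install new hooks. */
static void toy_mem_setup(toy_malloc_fn m, toy_free_fn f)
{
    toy_lib_malloc = m;
    toy_lib_free = f;
}

/* Backend-side allocator (stand-in for xml_palloc); counts its uses. */
static int backend_allocs = 0;
static void *backend_alloc(size_t n) { backend_allocs++; return malloc(n); }
static void backend_free(void *p) { free(p); }

/* Stand-in for the backend's error stack: an ERROR longjmps here. */
static jmp_buf *toy_exception_stack = NULL;

static void toy_raise_error(void)
{
    assert(toy_exception_stack != NULL);
    longjmp(*toy_exception_stack, 1);
}

/* Run work() with the backend hooks installed and restore the saved
 * hooks on both normal and error exit. Returns 1 if work() raised. */
static int run_with_backend_hooks(void (*work)(void))
{
    toy_malloc_fn saved_m;
    toy_free_fn saved_f;
    jmp_buf *save_stack = toy_exception_stack;
    jmp_buf local;
    int failed = 0;

    toy_mem_get(&saved_m, &saved_f);
    toy_mem_setup(backend_alloc, backend_free);

    toy_exception_stack = &local;
    if (setjmp(local) == 0)
        work();
    else
        failed = 1;
    toy_exception_stack = save_stack;

    /* Skipping this on the error path would leave the hooks dangling. */
    toy_mem_setup(saved_m, saved_f);
    return failed;
}

static void ok_work(void) { toy_lib_free(toy_lib_malloc(16)); }

static void failing_work(void)
{
    toy_lib_free(toy_lib_malloc(16));
    toy_raise_error();
}
```

Here run_with_backend_hooks(failing_work) returns 1 and still leaves plain malloc/free installed afterwards. Note that it restores exactly what it saved, which is only correct if nobody else changed the hooks in the meantime.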

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, David Lee Lambert <davidl(at)lmert(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: libxml incompatibility
Date: 2009-03-22 03:16:46
Message-ID: 4223.1237691806@sss.pgh.pa.us

I wrote:
> We could possibly use xmlMemGet() to fetch the prior settings and then
> restore them after we are done, but making sure that happens after an
> error would be a bit tricky.

I experimented with this a bit, and came up with the attached patch.
Basically what it does is revert libxml to its native memory management
methods anytime LibxmlContext doesn't exist. It fixes Alvaro's original
test case and some variants that I stumbled across, but I can't say that
I have a lot of faith in it. I see at least a couple of risk factors:

* it doesn't scale to the case where some other code is doing the same
kind of thing --- the pointers we saved during xml_init might or might
not still be appropriate to restore at end of transaction.

* suppose that a plperl function does some Perlish XML stuff, then calls
a SQL function that calls something in xml.c. When we start up use of
LibxmlContext we'll wipe the internal state of libxml (which we *have*
to do; this still crashes trivially without the added xmlCleanupParser
call). Can this break anything that the perl XML code is expecting to
still be valid when control gets back to it?
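The first risk factor can be illustrated with a small toy model: hooks snapshotted once at init time are blindly restored at end of transaction, silently undoing whatever another user of the library installed in between. All names are hypothetical, and it is only an assumption for illustration that some other code installs hooks of its own.

```c
#include <assert.h>
#include <stdlib.h>

typedef void *(*toy_malloc_fn)(size_t);

/* Stand-in for the library's process-wide allocation hook. */
static toy_malloc_fn toy_lib_malloc = malloc;

/* Backend side: hooks snapshotted once at init time (cf. xml_init). */
static toy_malloc_fn backend_saved_at_init;
static void *backend_alloc(size_t n) { return malloc(n); }

static void backend_init(void) { backend_saved_at_init = toy_lib_malloc; }
static void backend_begin_xml_use(void) { toy_lib_malloc = backend_alloc; }
static void backend_end_of_xact(void) { toy_lib_malloc = backend_saved_at_init; }

/* Hypothetical second user of the library that installs its own hook. */
static void *other_alloc(size_t n) { return malloc(n); }
static void other_install(void) { toy_lib_malloc = other_alloc; }
```

Running backend_init(), other_install(), backend_begin_xml_use() and backend_end_of_xact() in that order leaves the hook at plain malloc rather than other_alloc, so the second user's setting is lost without any error being reported.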

If this doesn't work then I'm afraid we'll need some radical rethinking
of the way we handle libxml memory management...

Please test. I'm not much with either Perl or XML and have little
idea of how to stress this.

regards, tom lane

Attachment: unknown_filename (text/plain, 3.2 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libxml incompatibility
Date: 2009-05-13 20:30:19
Message-ID: 17598.1242246619@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> It seems that if you load libxml into a backend for whatever reason (say
> you create a table with a column of type xml) and then create a plperlu
> function that does "use XML::LibXML", we get a segmentation fault.

I've applied a patch for this in HEAD. It fixes the reported case,
but since I'm not a big user of either Perl or XML, it would be good
to get some more testing done ...

regards, tom lane