Re: process crash when a plpython function returns unicode

Lists: pgsql-bugspgsql-hackers
From: Zac <zaccheob(at)inwind(dot)it>
To: pgsql-bugs(at)postgresql(dot)org
Subject: process crash when a plpython function returns unicode
Date: 2005-06-15 12:29:27
Message-ID: d8p70s$2jch$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi,
I found this bug:
if you define a plpythonu function that returns a unicode python string
the server process crashes calling that function.

The example below refers to a PostgreSQL 8.1devel installation
downloaded today from nightly snapshot but I found the same problem on
PostgreSQL 8.0.1.

Bye
Zac

Example:
OS: SuSE Linux 9.1
Python version: Python 2.3.3

test=# select version();
version
----------------------------------------------------------------------------------------
PostgreSQL 8.1devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
3.3.3 (SuSE Linux)
(1 row)

test=# CREATE FUNCTION test_unicode() RETURNS text AS
test-# $$
test$# return u'\xe0'
test$# $$ LANGUAGE plpythonu;
CREATE FUNCTION
test=# SELECT test_unicode();
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

Server log (debug level 5):

DEBUG: invoking IpcMemoryCreate(size=10387456)
DEBUG: max_safe_fds = 985, usable_fds = 1019, already_open = 5
LOG: database system was shut down at 2005-06-15 14:05:00 CEST
LOG: checkpoint record is at 0/359FE4
LOG: redo record is at 0/359FE4; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 638; next OID: 24578
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: database system is ready
LOG: transaction ID wrap limit is 2147484134, limited by database "test"
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: reaping dead processes
LOG: connection received: host=[local] port=
DEBUG: forked new backend, pid=7842 socket=6
LOG: connection authorized: user=postgres database=test
DEBUG: postmaster child[7842]: starting with (
DEBUG: postgres
DEBUG: -v196608
DEBUG: -p
DEBUG: test
DEBUG: )
DEBUG: InitPostgres
DEBUG: StartTransaction
DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR,
xid/subid/cid: 638/1/0, nestlvl: 1, children: <>
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR,
xid/subid/cid: 638/1/0, nestlvl: 1, children: <>
DEBUG: StartTransactionCommand
DEBUG: StartTransaction
DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR,
xid/subid/cid: 639/1/0, nestlvl: 1, children: <>
LOG: statement: select version();
DEBUG: parse tree:
DETAIL: {QUERY :commandType 1 :querySource 0 :canSetTag true
:utilityStmt <>
:resultRelation 0 :into <> :hasAggs false :hasSubLinks false
:rtable <>
:jointree {FROMEXPR :fromlist <> :quals <>} :rowMarks <>
:forUpdate false
:targetList ({TARGETENTRY :expr {FUNCEXPR :funcid 89
:funcresulttype 25
:funcretset false :funcformat 0 :args <>} :resno 1 :resname version
:ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false})
:groupClause
<> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <>
:limitCount <> :setOperations <> :resultRelations <>}

DEBUG: rewritten parse tree:
DETAIL: ({QUERY :commandType 1 :querySource 0 :canSetTag true
:utilityStmt <>
:resultRelation 0 :into <> :hasAggs false :hasSubLinks false
:rtable <>
:jointree {FROMEXPR :fromlist <> :quals <>} :rowMarks <>
:forUpdate false
:targetList ({TARGETENTRY :expr {FUNCEXPR :funcid 89
:funcresulttype 25
:funcretset false :funcformat 0 :args <>} :resno 1 :resname version
:ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false})
:groupClause
<> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <>
:limitCount <> :setOperations <> :resultRelations <>})

DEBUG: plan:
DETAIL: {RESULT :startup_cost 0.00 :total_cost 0.01 :plan_rows 1
:plan_width 0
:targetlist ({TARGETENTRY :expr {FUNCEXPR :funcid 89
:funcresulttype 25
:funcretset false :funcformat 0 :args <>} :resno 1 :resname version
:ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false})
:qual <>
:lefttree <> :righttree <> :initPlan <> :extParam (b) :allParam (b)
:nParamExec 0 :resconstantqual <>}

DEBUG: CommitTransactionCommand
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR,
xid/subid/cid: 639/1/0, nestlvl: 1, children: <>
DEBUG: checkpoint starting
DEBUG: checkpoint complete; 0 transaction log file(s) added, 0 removed,
0 recycled
DEBUG: StartTransactionCommand
DEBUG: StartTransaction
DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR,
xid/subid/cid: 640/1/0, nestlvl: 1, children: <>
LOG: statement: CREATE FUNCTION test_unicode() RETURNS text AS
$$
return u'\xe0'
$$ LANGUAGE plpythonu;
DEBUG: parse tree:
DETAIL: {QUERY :commandType 5 :querySource 0 :canSetTag true :utilityStmt ?
:resultRelation 0 :into <> :hasAggs false :hasSubLinks false
:rtable <>
:jointree <> :rowMarks <> :forUpdate false :targetList <>
:groupClause <>
:havingQual <> :distinctClause <> :sortClause <> :limitOffset
<> :limitCount
<> :setOperations <> :resultRelations <>}

DEBUG: rewritten parse tree:
DETAIL: ({QUERY :commandType 5 :querySource 0 :canSetTag true
:utilityStmt ?
:resultRelation 0 :into <> :hasAggs false :hasSubLinks false
:rtable <>
:jointree <> :rowMarks <> :forUpdate false :targetList <>
:groupClause <>
:havingQual <> :distinctClause <> :sortClause <> :limitOffset
<> :limitCount
<> :setOperations <> :resultRelations <>})

DEBUG: ProcessUtility
DEBUG: CommitTransactionCommand
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR,
xid/subid/cid: 640/1/0, nestlvl: 1, children: <>
DEBUG: StartTransactionCommand
DEBUG: StartTransaction
DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR,
xid/subid/cid: 641/1/0, nestlvl: 1, children: <>
LOG: statement: SELECT test_unicode();
DEBUG: parse tree:
DETAIL: {QUERY :commandType 1 :querySource 0 :canSetTag true
:utilityStmt <>
:resultRelation 0 :into <> :hasAggs false :hasSubLinks false
:rtable <>
:jointree {FROMEXPR :fromlist <> :quals <>} :rowMarks <>
:forUpdate false
:targetList ({TARGETENTRY :expr {FUNCEXPR :funcid 24578
:funcresulttype 25
:funcretset false :funcformat 0 :args <>} :resno 1 :resname
test_unicode
:ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false})
:groupClause
<> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <>
:limitCount <> :setOperations <> :resultRelations <>}

DEBUG: rewritten parse tree:
DETAIL: ({QUERY :commandType 1 :querySource 0 :canSetTag true
:utilityStmt <>
:resultRelation 0 :into <> :hasAggs false :hasSubLinks false
:rtable <>
:jointree {FROMEXPR :fromlist <> :quals <>} :rowMarks <>
:forUpdate false
:targetList ({TARGETENTRY :expr {FUNCEXPR :funcid 24578
:funcresulttype 25
:funcretset false :funcformat 0 :args <>} :resno 1 :resname
test_unicode
:ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false})
:groupClause
<> :havingQual <> :distinctClause <> :sortClause <> :limitOffset <>
:limitCount <> :setOperations <> :resultRelations <>})

DEBUG: plan:
DETAIL: {RESULT :startup_cost 0.00 :total_cost 0.01 :plan_rows 1
:plan_width 0
:targetlist ({TARGETENTRY :expr {FUNCEXPR :funcid 24578
:funcresulttype 25
:funcretset false :funcformat 0 :args <>} :resno 1 :resname
test_unicode
:ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false})
:qual <>
:lefttree <> :righttree <> :initPlan <> :extParam (b) :allParam (b)
:nParamExec 0 :resconstantqual <>}

DEBUG: reaping dead processes
DEBUG: server process (PID 7842) was terminated by signal 11
LOG: server process (PID 7842) was terminated by signal 11
LOG: terminating any other active server processes
DEBUG: sending SIGQUIT to process 7396
DEBUG: sending SIGQUIT to process 7397
LOG: connection received: host=[local] port=
FATAL: the database system is in recovery mode
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: forked new backend, pid=8613 socket=6
DEBUG: reaping dead processes
DEBUG: server process (PID 8613) exited with exit code 0
LOG: all server processes terminated; reinitializing
DEBUG: shmem_exit(0)
DEBUG: invoking IpcMemoryCreate(size=10387456)
LOG: database system was interrupted at 2005-06-15 14:10:18 CEST
LOG: checkpoint record is at 0/35A03C
LOG: redo record is at 0/35A03C; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 640; next OID: 24578
LOG: next MultiXactId: 1; next MultiXactOffset: 0
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 0/35A080
LOG: record with zero length at 0/360524
LOG: redo done at 0/3604FC
LOG: database system is ready
LOG: transaction ID wrap limit is 2147484134, limited by database "test"
DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: reaping dead processes


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Zac <zaccheob(at)inwind(dot)it>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: process crash when a plpython function returns unicode
Date: 2005-06-15 19:11:06
Message-ID: 20050615191106.GA59738@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, Jun 15, 2005 at 02:29:27PM +0200, Zac wrote:

> if you define a plpythonu function that returns a unicode python string
> the server process crashes calling that function.

The crash happens if the Unicode string has the high bit set. For
example, u'\x7f' doesn't cause a crash, but u'\x80' does. Here's
a stack trace from HEAD and Python 2.4.1:

#0 0xfec6967c in PyString_AsString (op=0x0) at Objects/stringobject.c:698
#1 0xfed76a38 in PLy_function_handler (fcinfo=0xffbfde28, proc=0x47c610)
at plpython.c:777
#2 0xfed77df4 in plpython_call_handler (fcinfo=0xffbfde28) at plpython.c:352

Lines 776-77 in plpython.c are:

plrv_so = PyObject_Str(plrv);
plrv_sc = PyString_AsString(plrv_so);

PyObject_Str() is documented to return NULL on failure:

http://www.python.org/doc/2.4.1/api/object.html

Apparently PyString_AsString() isn't expecting a NULL argument, so
the code should probably check the return value of PyObject_Str()
before calling PyString_AsString().

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: process crash when a plpython function returns unicode
Date: 2005-06-18 14:41:15
Message-ID: 20050618144115.GA91757@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I've moved this thread from pgsql-bugs to pgsql-hackers; here are
the original messages:

http://archives.postgresql.org/pgsql-bugs/2005-06/msg00105.php
http://archives.postgresql.org/pgsql-bugs/2005-06/msg00107.php

As I mentioned in my followup to the bug report, a simple fix would
appear to be to check for a NULL return value from PyObject_Str()
before calling PyString_AsString() at the following location:

/* Lines 776-77 in plpython.c */
plrv_so = PyObject_Str(plrv);
plrv_sc = PyString_AsString(plrv_so);

I was going to submit a patch, but I don't know enough about the
Python API or how Python and PostgreSQL handle Unicode to know
whether adding that simple check is the appropriate solution (I was
planning to raise an error if PyObject_Str() returned NULL). Can
anybody think of a better fix?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


From: Tino Wildenhain <tino(at)wildenhain(dot)de>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: process crash when a plpython function returns
Date: 2005-06-18 15:27:28
Message-ID: 1119108448.1183.71.camel@Andrea.peacock.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Am Samstag, den 18.06.2005, 08:41 -0600 schrieb Michael Fuhr:
> I've moved this thread from pgsql-bugs to pgsql-hackers; here are
> the original messages:
>
> http://archives.postgresql.org/pgsql-bugs/2005-06/msg00105.php
> http://archives.postgresql.org/pgsql-bugs/2005-06/msg00107.php
>
> As I mentioned in my followup to the bug report, a simple fix would
> appear to be to check for a NULL return value from PyObject_Str()
> before calling PyString_AsString() at the following location:
>
> /* Lines 776-77 in plpython.c */
> plrv_so = PyObject_Str(plrv);
> plrv_sc = PyString_AsString(plrv_so);
>
> I was going to submit a patch, but I don't know enough about the
> Python API or how Python and PostgreSQL handle Unicode to know
> whether adding that simple check is the appropriate solution (I was
> planning to raise an error if PyObject_Str() returned NULL). Can
> anybody think of a better fix?

raise error would be a correct solution since this is what
python does in this case:

http://docs.python.org/api/exceptions.html

also in this context it would be helpful
if sys.defaultencoding would be set to
the database encoding so strings get encoded
to utf-8 when postgres works in unicode mode
rather then the default encoding of ascii.
This could avoid most of the PyObject_Str()
exeptions in the first place.


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Tino Wildenhain <tino(at)wildenhain(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: process crash when a plpython function returns unicode
Date: 2005-06-27 14:12:04
Message-ID: 20050627141204.GA94894@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Sat, Jun 18, 2005 at 05:27:28PM +0200, Tino Wildenhain wrote:
> Am Samstag, den 18.06.2005, 08:41 -0600 schrieb Michael Fuhr:
> >
> > I was going to submit a patch, but I don't know enough about the
> > Python API or how Python and PostgreSQL handle Unicode to know
> > whether adding that simple check is the appropriate solution (I was
> > planning to raise an error if PyObject_Str() returned NULL). Can
> > anybody think of a better fix?
>
> raise error would be a correct solution since this is what
> python does in this case:

I just submitted a patch that checks for NULL and raises an error
via PLy_elog().

> also in this context it would be helpful
> if sys.defaultencoding would be set to
> the database encoding so strings get encoded
> to utf-8 when postgres works in unicode mode
> rather then the default encoding of ascii.
> This could avoid most of the PyObject_Str()
> exeptions in the first place.

I haven't looked at doing that yet and probably won't before feature
freeze. Gerrit van Dyk has expressed an interest in hacking on
PL/Python (he recently submitted a SETOF patch) so maybe he'll work
on it.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


From: James William Pye <pgsql(at)jwp(dot)name>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org, Tino Wildenhain <tino(at)wildenhain(dot)de>
Subject: Re: process crash when a plpython function returns
Date: 2005-07-08 04:50:23
Message-ID: 1120798223.832.26.camel@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Mon, 2005-06-27 at 08:12 -0600, Michael Fuhr wrote:
> > also in this context it would be helpful
> > if sys.defaultencoding would be set to
> > the database encoding so strings get encoded
> > to utf-8 when postgres works in unicode mode
> > rather then the default encoding of ascii.
> > This could avoid most of the PyObject_Str()
> > exeptions in the first place.
>
> I haven't looked at doing that yet and probably won't before feature
> freeze. Gerrit van Dyk has expressed an interest in hacking on
> PL/Python (he recently submitted a SETOF patch) so maybe he'll work
> on it.

I have this fixed, for the most part, in PL/Py. What I have done might
be a good starting place for someone who wants to get it fixed in core.

http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/python/be/src/encoding.c

This file makes using PostgreSQL encodings in Python a more friendly
experience by setting up some aliases. (u"óäæ".encode('UNICODE') would
work in 8.0)

Also, to set the default encoding used by Python's Unicode strings:
PyUnicode_SetDefaultEncoding(PyEncoding_FromPgEncoding(GetDatabaseEncoding()))

PyEncoding_FromPgEncoding is defined in encoding.c.

Also, it should be noted that to get the interpreter to read the
function code as a specific encoding, one must use, afaik, the # -*-
encoding: utf-8 -*- magic.
--
Regards, James William Pye