Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

From: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-05 19:33:38
Message-ID: E1Smro2-0006gR-JJ@gemulon.postgresql.org
Lists: pgsql-committers pgsql-hackers

Fix mapping of PostgreSQL encodings to Python encodings.

Windows encodings, "win1252" and so forth, are named differently in Python,
like "cp1252". Also, if the PyUnicode_AsEncodedString() function call fails
for some reason, use a plain ereport(), not a PLy_elog(), to report that
error. That avoids recursion and a crash if PLy_elog() tries to call
PLyUnicode_Bytes() again.
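
For illustration, the name mapping described above might look roughly like
this (a hedged sketch, not the committed code; the real logic lives in
PLyUnicode_Bytes() in plpy_util.c and the helper name here is invented):

#include "postgres.h"
#include "mb/pg_wchar.h"

/*
 * Illustrative only: translate a PostgreSQL encoding into the codec name
 * that Python expects.  PostgreSQL spells the Windows code pages "winNNNN",
 * while Python registers them as "cpNNNN".
 */
static const char *
PLy_python_codec_name(int encoding)
{
    switch (encoding)
    {
        case PG_WIN1250:
            return "cp1250";
        case PG_WIN1252:
            return "cp1252";
        case PG_WIN866:
            return "cp866";
        /* ... the remaining winNNNN encodings follow the same pattern ... */
        default:
            /* for most other encodings the PostgreSQL name works as-is */
            return pg_encoding_to_char(encoding);
    }
}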

This fixes a bug reported by Asif Naeem. Backpatch down to 9.0; before that,
plpython didn't even try these conversions.

Jan Urbański, with minor comment improvements by me.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/b66de4c6d7208d9ec420b912758377a3533c7a7d

Modified Files
--------------
src/pl/plpython/plpy_util.c | 69 ++++++++++++++++++++++++++++++++++++++----
1 files changed, 62 insertions(+), 7 deletions(-)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-05 20:31:19
Message-ID: 25303.1341520279@sss.pgh.pa.us
Lists: pgsql-committers pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> writes:
> Fix mapping of PostgreSQL encodings to Python encodings.

The buildfarm doesn't like this --- did you check for side effects on
regression test results?

regards, tom lane


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-05 20:37:19
Message-ID: 4FF5FAFF.7020403@iki.fi
Lists: pgsql-committers pgsql-hackers

On 05.07.2012 23:31, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)iki(dot)fi> writes:
>> Fix mapping of PostgreSQL encodings to Python encodings.
>
> The buildfarm doesn't like this --- did you check for side effects on
> regression test results?

Hmm, I ran the regression tests, but not with C encoding. With the
patch, you no longer get the errdetail you used to, when an encoding
conversion fails:

> ***************
> *** 41,47 ****
>
> SELECT unicode_plan1();
> ERROR: spiexceptions.InternalError: could not convert Python Unicode object to PostgreSQL server encoding
> - DETAIL: UnicodeEncodeError: 'ascii' codec can't encode character u'\x80' in position 0: ordinal not in range(128)
> CONTEXT: Traceback (most recent call last):
> PL/Python function "unicode_plan1", line 3, in <module>
> rv = plpy.execute(plan, [u"\x80"], 1)
> --- 39,44 ----

We could just update the expected output; there are two expected outputs
for this test case and one of them is now wrong. But it'd actually be
quite a shame to lose that extra information, that's quite valuable.
Perhaps we should go back to using PLy_elog() here, and find some other
way to avoid the recursion.

- Heikki


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: hlinnaka(at)iki(dot)fi
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-05 20:53:24
Message-ID: 4FF5FEC4.5090908@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 05/07/12 22:37, Heikki Linnakangas wrote:
> On 05.07.2012 23:31, Tom Lane wrote:
>> Heikki Linnakangas<heikki(dot)linnakangas(at)iki(dot)fi> writes:
>>> Fix mapping of PostgreSQL encodings to Python encodings.
>>
>> The buildfarm doesn't like this --- did you check for side effects on
>> regression test results?
>
> Hmm, I ran the regression tests, but not with C encoding. With the
> patch, you no longer get the errdetail you used to, when an encoding
> conversion fails:
>
>> ***************
>> *** 41,47 ****
>>
>> SELECT unicode_plan1();
>> ERROR: spiexceptions.InternalError: could not convert Python Unicode
>> object to PostgreSQL server encoding
>> - DETAIL: UnicodeEncodeError: 'ascii' codec can't encode character
>> u'\x80' in position 0: ordinal not in range(128)
>> CONTEXT: Traceback (most recent call last):
>> PL/Python function "unicode_plan1", line 3, in <module>
>> rv = plpy.execute(plan, [u"\x80"], 1)
>> --- 39,44 ----
>
> We could just update the expected output; there are two expected outputs
> for this test case and one of them is now wrong. But it'd actually be
> quite a shame to lose that extra information, that's quite valuable.
> Perhaps we should go back to using PLy_elog() here, and find some other
> way to avoid the recursion.

It seems that the problem is that LC_ALL=C makes Postgres use SQL_ASCII
as the database encoding and, as the comment states, translating PG's
SQL_ASCII to Python's "ascii" is not ideal.

The problem is that PLyUnicode_Bytes is (via an ifdef) used as
PyString_ToString on Python 3, which means that there are numerous call
sites and new ones might appear at any moment. I'm not that keen on
invoking the traceback machinery on low-level encoding errors.

Hm, since PLyUnicode_Bytes should get a unicode object and return bytes
in the server encoding, we might just say that for SQL_ASCII we
arbitrarily choose UTF-8 to encode the unicode codepoints, so we'd just
set serverenc = "utf-8" in the first switch case.
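
For illustration, that first switch case might then look something like this
(just a sketch of the idea, not necessarily the exact code):

    const char *serverenc;

    switch (GetDatabaseEncoding())
    {
        case PG_SQL_ASCII:
            /*
             * SQL_ASCII has no Python codec; arbitrarily encode the unicode
             * codepoints as UTF-8 rather than handing Python's "ascii" codec
             * data it may be unable to encode.
             */
            serverenc = "utf-8";
            break;

        /* ... the win125x -> cp125x and other cases as before ... */

        default:
            serverenc = pg_encoding_to_char(GetDatabaseEncoding());
            break;
    }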

That doesn't solve the problem of the missing error detail, though.

Jan


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: hlinnaka(at)iki(dot)fi, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-05 21:30:52
Message-ID: 1341523852.16957.0.camel@vanquo.pezone.net
Lists: pgsql-committers pgsql-hackers

On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote:
> The problem is that PLyUnicode_Bytes is (via an ifdef) used as
> PyString_ToString on Python 3, which means that there are numerous call
> sites and new ones might appear at any moment. I'm not that keen on
> invoking the traceback machinery on low-level encoding errors.

Why not?


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: hlinnaka(at)iki(dot)fi, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-05 21:54:26
Message-ID: 4FF60D12.2030806@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 05/07/12 23:30, Peter Eisentraut wrote:
> On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote:
>> The problem is that PLyUnicode_Bytes is (via an ifdef) used as
>> PyString_ToString on Python 3, which means that there are numerous call
>> sites and new ones might appear at any moment. I'm not that keen on
>> invoking the traceback machinery on low-level encoding errors.
>
> Why not?

Because it can lead to recursion errors, like the one this patch was
supposed to fix. The traceback machinery calls into the encoding
functions, because it converts Python strings (like function names) into
C strings.


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-06 08:05:10
Message-ID: 4FF69C36.2090803@enterprisedb.com
Lists: pgsql-committers pgsql-hackers

On 06.07.2012 00:54, Jan Urbański wrote:
> On 05/07/12 23:30, Peter Eisentraut wrote:
>> On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote:
>>> The problem is that PLyUnicode_Bytes is (via an ifdef) used as
>>> PyString_ToString on Python 3, which means that there are numerous call
>>> sites and new ones might appear at any moment. I'm not that keen on
>>> invoking the traceback machinery on low-level encoding errors.
>>
>> Why not?
>
> Because it can lead to recursion errors, like the one this patch was
> supposed to fix. The traceback machinery calls into the encoding
> functions, because it converts Python strings (like function names) into
> C strings.

In the backend elog routines, there is a global variable
'recursion_depth', which is incremented when an error-handling routine
is entered, and decremented afterwards. Can we use a similar mechanism
in PLy_elog() to detect and stop recursion?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-06 08:14:55
Message-ID: 4FF69E7F.4040700@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 06/07/12 10:05, Heikki Linnakangas wrote:
> On 06.07.2012 00:54, Jan Urbański wrote:
>> On 05/07/12 23:30, Peter Eisentraut wrote:
>>> On tor, 2012-07-05 at 22:53 +0200, Jan Urbański wrote:
>>>> The problem is that PLyUnicode_Bytes is (via an ifdef) used as
>>>> PyString_ToString on Python 3, which means that there are numerous call
>>>> sites and new ones might appear at any moment. I'm not that keen on
>>>> invoking the traceback machinery on low-level encoding errors.
>>>
>>> Why not?
>>
>> Because it can lead to recursion errors, like the one this patch was
>> supposed to fix. The traceback machinery calls into the encoding
>> functions, because it converts Python strings (like function names) into
>> C strings.
>
> In the backend elog routines, there is a global variable
> 'recursion_depth', which is incremented when an error-handling routine
> is entered, and decremented afterwards. Can we use a similar mechanism
> in PLy_elog() to detect and stop recursion?

I guess we can, I'll try to do some tests in order to see if there's an
easy user-triggerable way of causing PLy_elog to recurse and if not
then a guard like this should be enough as a safety measure against as
yet unknown conditions (as opposed to something we expect to happen
regularly).

Cheers,
Jan


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-06 15:01:52
Message-ID: 4FF6FDE0.1000800@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 06/07/12 10:14, Jan Urbański wrote:
> On 06/07/12 10:05, Heikki Linnakangas wrote:
>> In the backend elog routines, there is a global variable
>> 'recursion_depth', which is incremented when an error-handling routine
>> is entered, and decremented afterwards. Can we use a similar mechanism
>> in PLy_elog() to detect and stop recursion?
>
> I guess we can, I'll try to do some tests in order to see if there's an
> easy user-triggerable way of causing PLy_elog to recurse and if not
> then a guard like this should be enough as a safety measure against as
> yet unknown conditions (as opposed to something we expect to happen
> regularly).

Attached is a patch that stores the recursion level of PLy_traceback and
prevents it from running if it's too deep (PLy_traceback is the one
doing the heavy lifting, which is why I chose to put the logic to skip
running there).
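
Roughly, the guard looks like this (an illustrative sketch only; the attached
patch is authoritative, and the variable name, limit, and the PLy_traceback
signature are written from memory):

static int traceback_level = 0;

static void
PLy_traceback(char **xmsg, char **tbmsg, int *tb_depth)
{
    /* refuse to build a traceback if we are already nested too deeply */
    if (traceback_level >= 2)
    {
        *xmsg = NULL;
        *tbmsg = NULL;
        *tb_depth = 0;
        return;
    }

    traceback_level++;
    /* ... the existing traceback-building code runs here ... */
    traceback_level--;
}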

I tried a few things and was not able to easily invoke the infinite
recursion condition, but I did notice that there are two more encodings
that have different names in Postgres and in Python (KOI8-R and KOI8-U)
and added them to the switch.

There's still trouble with EUC_TW and MULE_INTERNAL which don't have
Python equivalents. EUC-TW has been discussed in
http://bugs.python.org/issue2066 and rejected (see
http://bugs.python.org/issue2066#msg113731).

If you use any of these encodings, you *will* get into the recursion
trouble described earlier, just as before the patch you'd get into it
with CP1252 as your encoding.

What shall we do about those? Ignore them? Document that if you're using
one of these encodings then PL/Python with Python 2 will be crippled and
with Python 3 just won't work?

Cheers,
Jan

Attachment Content-Type Size
plpython-encodings-fix.patch text/x-diff 2.9 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-06 15:53:55
Message-ID: 4FF70A13.7040207@enterprisedb.com
Lists: pgsql-committers pgsql-hackers

On 06.07.2012 18:01, Jan Urbański wrote:
> There's still trouble with EUC_TW and MULE_INTERNAL which don't have
> Python equivalents. EUC-TW has been discussed in
> http://bugs.python.org/issue2066 and rejected (see
> http://bugs.python.org/issue2066#msg113731).
>
> If you use any of these encodings, you *will* get into the recursion
> trouble described earlier, just as before the patch you'd get into it
> with CP1252 as your encoding.
>
> What shall we do about those? Ignore them? Document that if you're using
> one of these encodings then PL/Python with Python 2 will be crippled and
> with Python 3 just won't work?

We could convert to UTF-8, and use the PostgreSQL functions to convert
from UTF-8 to the server encoding. Double conversion might be slow, but
I think it would be better than failing.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-06 20:47:54
Message-ID: 1341607674.7092.3.camel@vanquo.pezone.net
Lists: pgsql-committers pgsql-hackers

On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote:
> > What shall we do about those? Ignore them? Document that if you're using
> > one of these encodings then PL/Python with Python 2 will be crippled and
> > with Python 3 just won't work?
>
> We could convert to UTF-8, and use the PostgreSQL functions to convert
> from UTF-8 to the server encoding. Double conversion might be slow, but
> I think it would be better than failing.

Actually, we already do the other direction that way
(PLyUnicode_FromStringAndSize), so maybe it would be more consistent to
always use this.

I would hesitate to use this as a kind of fallback, because then we
would sometimes be using PostgreSQL's recoding tables and sometimes
Python's recoding tables, which could become confusing.


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-06 21:12:44
Message-ID: 4FF754CC.3030401@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 06/07/12 22:47, Peter Eisentraut wrote:
> On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote:
>>> What shall we do about those? Ignore them? Document that if you're using
>>> one of these encodings then PL/Python with Python 2 will be crippled and
>>> with Python 3 just won't work?
>>
>> We could convert to UTF-8, and use the PostgreSQL functions to convert
>> from UTF-8 to the server encoding. Double conversion might be slow, but
>> I think it would be better than failing.
>
> Actually, we already do the other direction that way
> (PLyUnicode_FromStringAndSize), so maybe it would be more consistent to
> always use this.
>
> I would hesitate to use this as a kind of fallback, because then we
> would sometimes be using PostgreSQL's recoding tables and sometimes
> Python's recoding tables, which could become confusing.

So you're in favour of doing unicode -> bytes by encoding with UTF-8 and
then using the server's encoding functions?


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-12 09:08:18
Message-ID: 4FFE9402.7060404@enterprisedb.com
Lists: pgsql-committers pgsql-hackers

On 07.07.2012 00:12, Jan Urbański wrote:
> On 06/07/12 22:47, Peter Eisentraut wrote:
>> On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote:
>>>> What shall we do about those? Ignore them? Document that if you're using
>>>> one of these encodings then PL/Python with Python 2 will be crippled
>>>> and
>>>> with Python 3 just won't work?
>>>
>>> We could convert to UTF-8, and use the PostgreSQL functions to convert
>>> from UTF-8 to the server encoding. Double conversion might be slow, but
>>> I think it would be better than failing.
>>
>> Actually, we already do the other direction that way
>> (PLyUnicode_FromStringAndSize), so maybe it would be more consistent to
>> always use this.
>>
>> I would hesitate to use this as a kind of fallback, because then we
>> would sometimes be using PostgreSQL's recoding tables and sometimes
>> Python's recoding tables, which could become confusing.
>
> So you're in favour of doing unicode -> bytes by encoding with UTF-8 and
> then using the server's encoding functions?

Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2
should be quite fast, and it would be good to be consistent in the way
we do conversions in both directions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-13 11:38:07
Message-ID: 5000089F.1090304@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 12/07/12 11:08, Heikki Linnakangas wrote:
> On 07.07.2012 00:12, Jan Urbański wrote:
>> On 06/07/12 22:47, Peter Eisentraut wrote:
>>> On fre, 2012-07-06 at 18:53 +0300, Heikki Linnakangas wrote:
>>>>> What shall we do about those? Ignore them? Document that if you're
>>>>> using
>>>>> one of these encodings then PL/Python with Python 2 will be crippled
>>>>> and
>>>>> with Python 3 just won't work?
>>>>
>>>> We could convert to UTF-8, and use the PostgreSQL functions to convert
>>>> from UTF-8 to the server encoding. Double conversion might be slow, but
>>>> I think it would be better than failing.
>>>
>>> Actually, we already do the other direction that way
>>> (PLyUnicode_FromStringAndSize), so maybe it would be more consistent to
>>> always use this.
>>>
>>> I would hesitate to use this as a kind of fallback, because then we
>>> would sometimes be using PostgreSQL's recoding tables and sometimes
>>> Python's recoding tables, which could become confusing.
>>
>> So you're in favour of doing unicode -> bytes by encoding with UTF-8 and
>> then using the server's encoding functions?
>
> Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2
> should be quite fast, and it would be good to be consistent in the way
> we do conversions in both directions.
>

I'll implement that then (sorry for not following up on that earlier).

J


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-14 14:50:06
Message-ID: 5001871E.9050308@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 13/07/12 13:38, Jan Urbański wrote:
> On 12/07/12 11:08, Heikki Linnakangas wrote:
>> On 07.07.2012 00:12, Jan Urbański wrote:
>>> So you're in favour of doing unicode -> bytes by encoding with UTF-8 and
>>> then using the server's encoding functions?
>>
>> Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2
>> should be quite fast, and it would be good to be consistent in the way
>> we do conversions in both directions.
>>
>
> I'll implement that then (sorry for not following up on that earlier).

Here's a patch that always encodes Python unicode objects using UTF-8
and then uses Postgres's internal functions to produce bytes in the
server encoding.
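
The patch is the authoritative version; purely as a sketch of the approach
(using the Python 3 spelling of the bytes API and glossing over error-path
cleanup for now), PLyUnicode_Bytes() would do roughly:

/* in plpy_util.c, with the usual plpython and pg_wchar includes */
static PyObject *
PLyUnicode_Bytes(PyObject *unicode)
{
    PyObject   *bytes,
               *rv;
    char       *utf8string,
               *encoded;

    /* first encode the Python unicode object with UTF-8 */
    bytes = PyUnicode_AsUTF8String(unicode);
    if (bytes == NULL)
        PLy_elog(ERROR, "could not convert Python Unicode object to bytes");

    utf8string = PyBytes_AsString(bytes);

    /* then let the server convert from UTF-8 to the database encoding */
    if (GetDatabaseEncoding() != PG_UTF8)
        encoded = (char *) pg_do_encoding_conversion((unsigned char *) utf8string,
                                                     strlen(utf8string),
                                                     PG_UTF8,
                                                     GetDatabaseEncoding());
    else
        encoded = utf8string;

    /* finally build a bytes object holding the server-encoded text */
    rv = PyBytes_FromStringAndSize(encoded, strlen(encoded));

    if (encoded != utf8string)
        pfree(encoded);        /* pg_do_encoding_conversion palloc'd a copy */
    Py_DECREF(bytes);

    return rv;
}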

Cheers,
Jan

Attachment Content-Type Size
plpython-use-server-encoding-functions.patch text/x-diff 4.4 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-18 15:17:20
Message-ID: 5006D380.1060202@enterprisedb.com
Lists: pgsql-committers pgsql-hackers

On 14.07.2012 17:50, Jan Urbański wrote:
> On 13/07/12 13:38, Jan Urbański wrote:
>> On 12/07/12 11:08, Heikki Linnakangas wrote:
>>> On 07.07.2012 00:12, Jan Urbański wrote:
>>>> So you're in favour of doing unicode -> bytes by encoding with UTF-8
>>>> and
>>>> then using the server's encoding functions?
>>>
>>> Sounds reasonable to me. The extra conversion between UTF-8 and UCS-2
>>> should be quite fast, and it would be good to be consistent in the way
>>> we do conversions in both directions.
>>>
>>
>> I'll implement that then (sorry for not following up on that earlier).
>
> Here's a patch that always encodes Python unicode objects using UTF-8
> and then uses Postgres's internal functions to produce bytes in the
> server encoding.

Thanks.

If pg_do_encoding_conversion() throws an error, you don't get a chance
to call Py_DECREF() to release the string. Is that a problem?

If an error occurs in PLy_traceback(), after incrementing
recursion_depth, you don't get a chance to decrement it again. I'm not
sure if the Py* function calls can fail, but at least seemingly trivial
things like initStringInfo() can throw an out-of-memory error.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-20 06:59:38
Message-ID: 500901DA.1080307@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 18/07/12 17:17, Heikki Linnakangas wrote:
> On 14.07.2012 17:50, Jan Urbański wrote:
>
> If pg_do_encoding_conversion() throws an error, you don't get a chance
> to call Py_DECREF() to release the string. Is that a problem?
>
> If an error occurs in PLy_traceback(), after incrementing
> recursion_depth, you don't get a chance to decrement it again. I'm not
> sure if the Py* function calls can fail, but at least seemingly trivial
> things like initStringInfo() can throw an out-of-memory error.

Of course you're right (on both accounts).

Here's a version with a bunch of PG_TRies thrown in.
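
For illustration, the conversion part would now look roughly like this, with a
similar wrapper around the PLy_traceback body so the recursion counter is
always decremented (just a sketch; the attached v2 patch is what counts):

    PG_TRY();
    {
        encoded = (char *) pg_do_encoding_conversion((unsigned char *) utf8string,
                                                     strlen(utf8string),
                                                     PG_UTF8,
                                                     GetDatabaseEncoding());
    }
    PG_CATCH();
    {
        /* don't leak the intermediate bytes object if the conversion fails */
        Py_DECREF(bytes);
        PG_RE_THROW();
    }
    PG_END_TRY();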

Cheers,
Jan

Attachment Content-Type Size
plpython-use-server-encoding-functions-v2.patch text/x-diff 4.5 KB

From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-20 07:13:14
Message-ID: 5009050A.3040101@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 20/07/12 08:59, Jan Urbański wrote:
> On 18/07/12 17:17, Heikki Linnakangas wrote:
>> On 14.07.2012 17:50, Jan Urbański wrote:
>>
>> If pg_do_encoding_conversion() throws an error, you don't get a chance
>> to call Py_DECREF() to release the string. Is that a problem?
>>
>> If an error occurs in PLy_traceback(), after incrementing
>> recursion_depth, you don't get a chance to decrement it again. I'm not
>> sure if the Py* function calls can fail, but at least seemingly trivial
>> things like initStringInfo() can throw an out-of-memory error.
>
> Of course you're right (on both accounts).
>
> Here's a version with a bunch of PG_TRies thrown in.

Silly me, playing tricks with postincrements before fully waking up.

Here's v3, with a correct inequality test for exceeding the traceback
recursion test.

J

Attachment Content-Type Size
plpython-use-server-encoding-functions-v3.patch text/x-diff 4.5 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-08-06 11:59:43
Message-ID: 501FB1AF.8030103@enterprisedb.com
Lists: pgsql-committers pgsql-hackers

On 20.07.2012 10:13, Jan Urbański wrote:
> On 20/07/12 08:59, Jan Urbański wrote:
>> On 18/07/12 17:17, Heikki Linnakangas wrote:
>>> On 14.07.2012 17:50, Jan Urbański wrote:
>>>
>>> If pg_do_encoding_conversion() throws an error, you don't get a chance
>>> to call Py_DECREF() to release the string. Is that a problem?
>>>
>>> If an error occurs in PLy_traceback(), after incrementing
>>> recursion_depth, you don't get a chance to decrement it again. I'm not
>>> sure if the Py* function calls can fail, but at least seemingly trivial
>>> things like initStringInfo() can throw an out-of-memory error.
>>
>> Of course you're right (on both accounts).
>>
>> Here's a version with a bunch of PG_TRies thrown in.
>
> Silly me, playing tricks with postincrements before fully waking up.
>
> Here's v3, with a correct inequality test for exceeding the traceback
> recursion test.

Committed the convert-via-UTF-8 part of this. I'll take a closer look at
the recursion check next.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-08-09 09:55:00
Message-ID: 502388F4.5070503@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 06/08/12 13:59, Heikki Linnakangas wrote:
> On 20.07.2012 10:13, Jan Urbański wrote:
>> On 20/07/12 08:59, Jan Urbański wrote:
>>> On 18/07/12 17:17, Heikki Linnakangas wrote:
>>>> On 14.07.2012 17:50, Jan Urbański wrote:
>>>>
>>>> If pg_do_encoding_conversion() throws an error, you don't get a chance
>>>> to call Py_DECREF() to release the string. Is that a problem?
>>>>
>>>> If an error occurs in PLy_traceback(), after incrementing
>>>> recursion_depth, you don't get a chance to decrement it again. I'm not
>>>> sure if the Py* function calls can fail, but at least seemingly trivial
>>>> things like initStringInfo() can throw an out-of-memory error.
>>>
>>> Of course you're right (on both accounts).
>>>
>>> Here's a version with a bunch of PG_TRies thrown in.
>>
>> Silly me, playing tricks with postincrements before fully waking up.
>>
>> Here's v3, with a correct inequality test for exceeding the traceback
>> recursion test.
>
> Committed the convert-via-UTF-8 part of this. I'll take a closer look at
> the recursion check next.

Thanks!

Jan