Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.

From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: hlinnaka(at)iki(dot)fi
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Asif Naeem <asif(dot)naeem(at)enterprisedb(dot)com>
Subject: Re: [COMMITTERS] pgsql: Fix mapping of PostgreSQL encodings to Python encodings.
Date: 2012-07-05 20:53:24
Message-ID: 4FF5FEC4.5090908@wulczer.org
Lists: pgsql-committers pgsql-hackers

On 05/07/12 22:37, Heikki Linnakangas wrote:
> On 05.07.2012 23:31, Tom Lane wrote:
>> Heikki Linnakangas<heikki(dot)linnakangas(at)iki(dot)fi> writes:
>>> Fix mapping of PostgreSQL encodings to Python encodings.
>>
>> The buildfarm doesn't like this --- did you check for side effects on
>> regression test results?
>
> Hmm, I ran the regression tests, but not with C encoding. With the
> patch, you no longer get the errdetail you used to, when an encoding
> conversion fails:
>
>> ***************
>> *** 41,47 ****
>>
>> SELECT unicode_plan1();
>> ERROR: spiexceptions.InternalError: could not convert Python Unicode
>> object to PostgreSQL server encoding
>> - DETAIL: UnicodeEncodeError: 'ascii' codec can't encode character
>> u'\x80' in position 0: ordinal not in range(128)
>> CONTEXT: Traceback (most recent call last):
>> PL/Python function "unicode_plan1", line 3, in <module>
>> rv = plpy.execute(plan, [u"\x80"], 1)
>> --- 39,44 ----
>
> We could just update the expected output; there are two expected outputs
> for this test case and one of them is now wrong. But it'd actually be
> quite a shame to lose that extra information, which is quite valuable.
> Perhaps we should go back to using PLy_elog() here, and find some other
> way to avoid the recursion.

It seems the problem is that LC_ALL=C makes Postgres use SQL_ASCII as
the database encoding and, as the comment states, translating PG's
SQL_ASCII to Python's "ascii" is not ideal.
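
To make the failure concrete, here is a minimal Python 3 sketch (not
PL/Python code, just the codec behaviour) of what happens when the server
encoding is mapped to Python's "ascii" codec and the value contains
u"\x80", as in the regression test:

```python
# Illustration only: reproduce the codec error behind the DETAIL line
# in the regression diff, using plain Python 3 rather than PL/Python.
try:
    "\x80".encode("ascii")
    detail = None
except UnicodeEncodeError as exc:
    # This message is what the old errdetail surfaced to the user.
    detail = str(exc)

print(detail)
```

The text of the exception is the information that the DETAIL line used to
carry.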

The further problem is that PLyUnicode_Bytes is (via an ifdef) used as
PyString_ToString on Python 3, which means that there are numerous call
sites and new ones might appear at any moment. I'm not that keen on
invoking the traceback machinery on low-level encoding errors.

Hm, since PLyUnicode_Bytes should get a unicode object and return bytes
in the server encoding, we might just say that for SQL_ASCII we
arbitrarily choose UTF-8 to encode the unicode codepoints, so we'd just
set serverenc = "utf-8" in the first switch case.
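
A rough sketch of that idea in Python, assuming a hypothetical lookup table
and helper name (PG_TO_PYTHON and unicode_to_server_bytes are made up for
illustration; the real code is a C switch statement, and the table below is
only a small illustrative subset of encodings):

```python
# Hypothetical mapping from PostgreSQL encoding names to Python codec names.
# SQL_ASCII is arbitrarily mapped to UTF-8, as proposed above.
PG_TO_PYTHON = {
    "UTF8": "utf-8",
    "LATIN1": "latin-1",
    "WIN1252": "cp1252",
    "SQL_ASCII": "utf-8",  # assumption: the arbitrary choice being suggested
}


def unicode_to_server_bytes(text, server_encoding):
    """Encode a Python str into bytes for the given server encoding."""
    codec = PG_TO_PYTHON[server_encoding]
    return text.encode(codec)


# With SQL_ASCII mapped to UTF-8, u"\x80" no longer fails to encode.
encoded = unicode_to_server_bytes("\x80", "SQL_ASCII")
print(encoded)
```

With this choice the test input encodes to the two-byte UTF-8 sequence
instead of raising, so the error path is simply never reached for
SQL_ASCII.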

That doesn't solve the problem of the missing error detail, though.

Jan
