Re: REVIEW: PL/Python table functions

Lists: pgsql-hackers
From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: REVIEW: PL/Python table functions
Date: 2011-01-22 10:15:24
Message-ID: AANLkTimaL-rVmFQrn+7k3aCCXwwRUjwvzrV+OKZEMGUv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This is a review for https://commitfest.postgresql.org/action/patch_view?id=460

== Submission ==
The patch applies and compiles with success, on the top of the general
refactor patch. It is possible it cannot in HEAD now that the part of
the refactor patch applied in the core. I'll check it after the whole
of refactor gets in the core.
make installcheck passes all 18 tests.

== Usability and Feature ==
The patch works fine in general. I created some functions for my tests like:

CREATE OR REPLACE FUNCTION twitter_response(q text, OUT from_user
text, OUT message text) RETURNS SETOF record AS $$
import sys
import urllib2
import simplejson

def fetch(q):
url = 'http://search.twitter.com/search.json?q=%s' % (q)
response = urllib2.urlopen(url)
return response.read()

root = simplejson.loads(fetch(q))
for result in root['results']:
from_user = result['from_user']
message = result['text']
yield from_user, message
$$ LANGUAGE plpythonu STRICT;

and other versions that modify parameter types, return type and the
code body itself.

One issue is typmod of record type.

regression=# create or replace function func1(t text) returns record
as $$ return {'name': t, 'value': 0} $$ language plpythonu;
regression=# select * from func1('jan') as (name text, value bigint);
name | value
------+-------
jan | 0
(1 row)

regression=# select * from func1('jan') as (name text, value int);
ERROR: function return row and query-specified return row do not match
DETAIL: Returned type bigint at ordinal position 2, but query expects integer.

It seems that typmod of PLyTypeInfo is not updated over statements
within the session. I saw the error doesn't occur when I re-connect to
the backend after the error.

== Performance ==
I didn't test performance regression. My static code analysis doesn't
tell it has critical performance issue.

I mark this as "Waiting on Author" for the typmod issue.

Regards,

--
Hitoshi Harada


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: PL/Python table functions
Date: 2011-01-22 15:12:00
Message-ID: 4D3AF3C0.4090302@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22/01/11 11:15, Hitoshi Harada wrote:
> This is a review for https://commitfest.postgresql.org/action/patch_view?id=460

Thanks,

> One issue is typmod of record type.
>
> regression=# create or replace function func1(t text) returns record
> as $$ return {'name': t, 'value': 0} $$ language plpythonu;
> regression=# select * from func1('jan') as (name text, value bigint);
> name | value
> ------+-------
> jan | 0
> (1 row)
>
> regression=# select * from func1('jan') as (name text, value int);
> ERROR: function return row and query-specified return row do not match
> DETAIL: Returned type bigint at ordinal position 2, but query expects integer.

That's a bug, thanks for spotting it.

> It seems that typmod of PLyTypeInfo is not updated over statements
> within the session. I saw the error doesn't occur when I re-connect to
> the backend after the error.

I tracked it down to caching the I/O functions for the return type. Your
example shows that you can't just discover the output record type once,
but you have to recheck whether the returned record's structure has not
changed between calls.

I implemented it by looping over the attributes of the output record and
checking if type for which we have already cached the I/O function is
binary coercible to the type that's actually in the record. This allows
the code to skip recaching the functions in the common case when the
record stays the same, and fixes the bug you found.

Attached is a new patch for table function support and an incremental
patch with the change I did. The new patch for table functions does not
apply to HEAD, it's a refinement of the previous table-functions patch.
After the refactor patches are all applied or rejected, I'll post a
frech patch that applies cleanly to HEAD.

Thanks again,
Jan

Attachment Content-Type Size
plpython-table-functions-incremental.patch text/x-patch 5.6 KB
plpython-table-functions.patch text/x-patch 33.8 KB

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: PL/Python table functions
Date: 2011-01-24 04:42:28
Message-ID: AANLkTi=5x1nZCKRaxpQWQvAEeTURcNfS_WiN_papBRzj@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/1/23 Jan Urbański <wulczer(at)wulczer(dot)org>:
> On 22/01/11 11:15, Hitoshi Harada wrote:
>> This is a review for https://commitfest.postgresql.org/action/patch_view?id=460
>
> Thanks,
>
>> One issue is typmod of record type.
>>
>> regression=# create or replace function func1(t text) returns record
>> as $$ return {'name': t, 'value': 0} $$ language plpythonu;
>> regression=# select * from func1('jan') as (name text, value bigint);
>>  name | value
>> ------+-------
>>  jan  |     0
>> (1 row)
>>
>> regression=# select * from func1('jan') as (name text, value int);
>> ERROR:  function return row and query-specified return row do not match
>> DETAIL:  Returned type bigint at ordinal position 2, but query expects integer.
>
> That's a bug, thanks for spotting it.
>
>> It seems that typmod of PLyTypeInfo is not updated over statements
>> within the session. I saw the error doesn't occur when I re-connect to
>> the backend after the error.
>
> I tracked it down to caching the I/O functions for the return type. Your
> example shows that you can't just discover the output record type once,
> but you have to recheck whether the returned record's structure has not
> changed between calls.
>
> I implemented it by looping over the attributes of the output record and
> checking if type for which we have already cached the I/O function is
> binary coercible to the type that's actually in the record. This allows
> the code to skip recaching the functions in the common case when the
> record stays the same, and fixes the bug you found.
>
> Attached is a new patch for table function support and an incremental
> patch with the change I did. The new patch for table functions does not
> apply to HEAD, it's a refinement of the previous table-functions patch.
> After the refactor patches are all applied or rejected, I'll post a
> frech patch that applies cleanly to HEAD.

I tested the new incremental patch and the previous example works
fine. I don't know if this can be handled properly but another example
is:

regression=# create function func1(t text) returns record as $$ return
{'name':t, 'value':0}; $$ language plpythonu ;
CREATE FUNCTION
regression=# select * from func1('jan') as (name text, value bigint);
name | value
------+-------
jan | 0
(1 row)

regression=# select * from func1('jan') as (value text, name bigint);
value | name
-------+------
jan | 0
(1 row)

where value and name don't point to the correct data. Your
data-type-check logic might not be enough.

Regards,

--
Hitoshi Harada


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: PL/Python table functions
Date: 2011-01-26 23:41:51
Message-ID: 4D40B13F.7040401@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24/01/11 05:42, Hitoshi Harada wrote:
> 2011/1/23 Jan Urbański <wulczer(at)wulczer(dot)org>:
>> On 22/01/11 11:15, Hitoshi Harada wrote:
> I tested the new incremental patch and the previous example works
> fine. I don't know if this can be handled properly but another example
> is:
>
> regression=# create function func1(t text) returns record as $$ return
> {'name':t, 'value':0}; $$ language plpythonu ;
> CREATE FUNCTION
> regression=# select * from func1('jan') as (name text, value bigint);
> name | value
> ------+-------
> jan | 0
> (1 row)
>
> regression=# select * from func1('jan') as (value text, name bigint);
> value | name
> -------+------
> jan | 0
> (1 row)
>
> where value and name don't point to the correct data. Your
> data-type-check logic might not be enough.

I have to admit that I don't 100% understand what's happening when you
have a function that returns RECORD and has no OUT parameters, but I did
a quick check with PL/pgSQL and it behaves the same:

create or replace function rec(t text) returns record as $$
declare
r record;
begin
select t as name, 0 as value into r;
return r;
end;
$$ language plpgsql;

pl_regression=# select * from rec('jan') as (value text, name integer);
value | name
-------+------
jan | 0
(1 row)

pl_regression=# select * from rec('jan') as (name text, value integer);
name | value
------+-------
jan | 0
(1 row)

So PL/pgSQL seems to work positionally, whereas PL/Python uses the
variable names when fetching them from the mapping Python returned. All
in all, it works the same as in other PLs, so at least it's consistent.

I'm also attaching an updated version that should apply on top of my
github refactor branch (or incrementally over the new set of refactor
patches that I will post shortly to the refactor thread).

Cheers,
Jan

Attachment Content-Type Size
plpython-table-functions.patch text/x-patch 34.1 KB

From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: PL/Python table functions
Date: 2011-01-27 21:39:55
Message-ID: 4D41E62B.40202@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 27/01/11 00:41, Jan Urbański wrote:
> I'm also attaching an updated version that should apply on top of my
> github refactor branch (or incrementally over the new set of refactor
> patches that I will post shortly to the refactor thread).

Attached is a patch for master, as the refactorings have already been
merged.

Jan

Attachment Content-Type Size
plpython-table-functions.patch text/x-patch 34.1 KB

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: PL/Python table functions
Date: 2011-02-04 15:26:07
Message-ID: AANLkTin-pOZN_TP08BpS1XN8kxw55FZ_xFaL7N+7Ehkv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/1/28 Jan Urbański <wulczer(at)wulczer(dot)org>:
> On 27/01/11 00:41, Jan Urbański wrote:
>> I'm also attaching an updated version that should apply on top of my
>> github refactor branch (or incrementally over the new set of refactor
>> patches that I will post shortly to the refactor thread).
>
> Attached is a patch for master, as the refactorings have already been
> merged.

Sorry, but could you update your patch? Patching it against HEAD today
makes rej.

patching file `src/pl/plpython/Makefile'
patching file `src/pl/plpython/expected/plpython_composite.out'
patching file `src/pl/plpython/expected/plpython_record.out'
Hunk #1 succeeded at 42 with fuzz 2.
Hunk #2 FAILED at 296.
1 out of 2 hunks FAILED -- saving rejects to
src/pl/plpython/expected/plpython_record.out.rej
patching file `src/pl/plpython/expected/plpython_trigger.out'
patching file `src/pl/plpython/plpython.c'
Hunk #1 succeeded at 100 with fuzz 2.
Hunk #2 succeeded at 132 (offset 1 line).
Hunk #4 succeeded at 346 (offset 4 lines).
Hunk #6 succeeded at 1166 (offset 40 lines).
Hunk #8 succeeded at 1485 (offset 40 lines).
Hunk #10 succeeded at 1892 (offset 40 lines).
Hunk #12 succeeded at 1938 (offset 40 lines).
Hunk #14 succeeded at 2011 (offset 40 lines).
Hunk #16 succeeded at 2300 (offset 40 lines).
Hunk #18 succeeded at 2456 (offset 40 lines).
Hunk #20 succeeded at 2531 (offset 40 lines).
patching file `src/pl/plpython/sql/plpython_composite.sql'
patching file `src/pl/plpython/sql/plpython_record.sql'
patching file `src/pl/plpython/sql/plpython_trigger.sql'

Regards,

--
Hitoshi Harada


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: PL/Python table functions
Date: 2011-02-06 16:14:10
Message-ID: 4D4EC8D2.2010404@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/02/11 16:26, Hitoshi Harada wrote:
> 2011/1/28 Jan Urbański <wulczer(at)wulczer(dot)org>:
>> On 27/01/11 00:41, Jan Urbański wrote:
>>> I'm also attaching an updated version that should apply on top of my
>>> github refactor branch (or incrementally over the new set of refactor
>>> patches that I will post shortly to the refactor thread).
>>
>> Attached is a patch for master, as the refactorings have already been
>> merged.
>
> Sorry, but could you update your patch? Patching it against HEAD today
> makes rej.

Sure, here's an updated patch.

Jan

Attachment Content-Type Size
plpython-table-functions.patch text/x-patch 34.4 KB

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: PL/Python table functions
Date: 2011-02-07 05:10:45
Message-ID: AANLkTikdot8y27YVwLhkfrpJFra=ZyrytxJVUkxZB9ma@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/2/7 Jan Urbański <wulczer(at)wulczer(dot)org>:
> On 04/02/11 16:26, Hitoshi Harada wrote:
>> 2011/1/28 Jan Urbański <wulczer(at)wulczer(dot)org>:
>>> On 27/01/11 00:41, Jan Urbański wrote:
>>>> I'm also attaching an updated version that should apply on top of my
>>>> github refactor branch (or incrementally over the new set of refactor
>>>> patches that I will post shortly to the refactor thread).
>>>
>>> Attached is a patch for master, as the refactorings have already been
>>> merged.
>>
>> Sorry, but could you update your patch? Patching it against HEAD today
>> makes rej.
>
> Sure, here's an updated patch.

Thanks,

I revisited the problem of typeinfo cache, and I guess this is not
what you want;

db1=# create function func1(t text) returns record as $$ return
{'v1':1,'v2':2,t:3} $$ language plpythonu;
CREATE FUNCTION
db1=# select * from func1('v3') as (v3 int, v2 int, v1 int);
v3 | v2 | v1
----+----+----
3 | 2 | 1
(1 row)

db1=# select * from func1('v3') as (v1 int, v2 int, v3 int);
v1 | v2 | v3
----+----+----
3 | 2 | 1
(1 row)

db1=# select * from func1('v4') as (v1 int, v2 int, v3 int);
ERROR: key "v3" not found in mapping
HINT: To return null in a column, add the value None to the mapping
with the key named after the column.
CONTEXT: while creating return value
PL/Python function "func1"
db1=# select * from func1('v4') as (v1 int, v2 int, v4 int);
ERROR: key "v3" not found in mapping
HINT: To return null in a column, add the value None to the mapping
with the key named after the column.
CONTEXT: while creating return value
PL/Python function "func1"

The PL/pgSQL case you pointed earlier is consistent because it fetches
the values positionally. The column name is only an on-demand
labeling. However, for mapping dict of python into the table row
should always map it by key. At least the function author (including
me :P) expects it.

Regards,

--
Hitoshi Harada


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: PL/Python table functions
Date: 2011-02-08 22:35:06
Message-ID: 4D51C51A.4030400@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/02/11 06:10, Hitoshi Harada wrote:
> 2011/2/7 Jan Urbański <wulczer(at)wulczer(dot)org>:
>> On 04/02/11 16:26, Hitoshi Harada wrote:
>>> 2011/1/28 Jan Urbański <wulczer(at)wulczer(dot)org>:
>>>> On 27/01/11 00:41, Jan Urbański wrote:
>>>>> I'm also attaching an updated version that should apply on top of my
>>>>> github refactor branch (or incrementally over the new set of refactor
>>>>> patches that I will post shortly to the refactor thread).
>>>>
>>>> Attached is a patch for master, as the refactorings have already been
>>>> merged.
>>>
>>> Sorry, but could you update your patch? Patching it against HEAD today
>>> makes rej.
>>
>> Sure, here's an updated patch.
>
> Thanks,
>
> I revisited the problem of typeinfo cache, and I guess this is not
> what you want;
>
> [problems, now reflected in new regression tests]
>
> The PL/pgSQL case you pointed earlier is consistent because it fetches
> the values positionally. The column name is only an on-demand
> labeling. However, for mapping dict of python into the table row
> should always map it by key. At least the function author (including
> me :P) expects it.

Yes, you're right. I tried to be too cute checking if the arguments are
binary coercible to the fit the new record description. This time I'm
just checking if the record descriptor changed at all, and if so
recaching the I/O funcs.

I hope this version does the right thing, while still avoiding the
performance hit of looking up I/O funcs every time a row is returned.
Actually, PL/Perl *does* look up the I/O funcs every time, so in the
worst case I can just drop this optimisation. But let's hope I got it
right this time :)

Thanks again for the review,
Jan

Attachment Content-Type Size
plpython-table-functions.patch text/x-patch 35.6 KB

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: PL/Python table functions
Date: 2011-02-09 03:22:38
Message-ID: AANLkTikJrDoPJcuTX-tfU=PX1cLGE7Qs_G1w9mivtj0_@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/2/9 Jan Urbański <wulczer(at)wulczer(dot)org>:
> I hope this version does the right thing, while still avoiding the
> performance hit of looking up I/O funcs every time a row is returned.
> Actually, PL/Perl *does* look up the I/O funcs every time, so in the
> worst case I can just drop this optimisation. But let's hope I got it
> right this time :)

I tested it on the issue above and things around trigger, and looked
good to me. Although I'm not sure if I understand the code overall,
but the modification where I'm unclear seems covered by the regression
tests.

I mark this "Ready for Committer."

Regards,

--
Hitoshi Harada


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REVIEW: PL/Python table functions
Date: 2011-02-26 14:56:50
Message-ID: 1298732210.26135.0.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Committed the last version.

I updated the documentation which previously claimed that what you
implemented wasn't supported.