Re: plpython SPI cursors

From: Steve Singer <ssinger_pg(at)sympatico(dot)ca>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: plpython SPI cursors
Date: 2011-11-20 18:14:17
Message-ID: BLU0-SMTP4530C1907601DFB9AEAE778ECA0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11-10-15 07:28 PM, Jan Urbański wrote:
> Hi,
>
> attached is a patch implementing the usage of SPI cursors in PL/Python.
> Currently when trying to process a large table in PL/Python you have
> slurp it all into memory (that's what plpy.execute does).
>
> J

I found a few bugs (see my testing section below) that will need fixing
+ a few questions about the code

Overview & Feature Review
-----------
This patch adds cursor support to plpython. SPI cursors will allow
a plpython function to read process a large results set without having to
read it all into memory at once. This is a good thing. Without this
patch I think you could accomplish the same with using SQL DECLARE CURSOR
and SQL fetch. This feature allows you to use a python cursor as
an iterator resulting in much cleaner python code than the SQL FETCH
approach. I think the feature is worth having

Usability Review
----------------------
The patch adds the user methods
cursor=plpy.cursor(query_or_plan)
cursor.fetch(100)
cursor.close()

Do we like the name plpy.cursor or would we rather call it something like
plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...)
Since we will be mostly stuck with the API once we release 9.2 this is worth
some opinions on. I like cursor() but if anyone disagrees now is the time.

This patch does not provide a wrapper around SPI_cursor_move. The patch
is useful without that and I don't see anything that preculdes someone else
adding that later if they see a need.

Documentation Review
---------------------

The patch includes documentation updates that describes the new feature.
The Database Access page doesn't provide a API style list of database access
functions like the plperl
http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page
does. I think the organization of the perl page is
clearer than the python one and we should think about a doing some
documentaiton refactoring. That should be done as a seperate patch and
shouldn't be a barrier to committing this one.

Code Review
----------------

in PLy_cursor_plan line 4080
+ PG_TRY();
+ {
+ Portal portal;
+ char *volatile nulls;
+ volatile int j;
+
+ if (nargs > 0)
+ nulls = palloc(nargs * sizeof(char));
+ else
+ nulls = NULL;
+
+ for (j = 0; j < nargs; j++)
+ {
+ PyObject *elem;
I am probably not seeing a code path or misunderstanding something
about the setjmp/longjump usages but I don't see why nulls and j need to be
volatile here.

line 444
PLy_cursor(PyObject *self, PyObject *args)
+ {
+ char *query;
+ PyObject *plan;
+ PyObject *planargs = NULL;
+
+ if (PyArg_ParseTuple(args, "s", &query))
+ return PLy_cursor_query(query);
+

Should query be freed with PyMem_free()

Testing
-----------

I tested both python 2.6 and 3 on a Linux system

create or replace function x() returns text as $$
cur=None
try:
with plpy.subtransaction():
cur=plpy.cursor('select generate_series(1,1000)')
rows=cur.fetch(10);
plpy.execute('select f()')

except plpy.SPIError:
rows=cur.fetch(10);
return rows[0]['generate_series']
return 'none'
$$ language plpythonu;
select x();

crashes the backend
test=# select x();
The connection to the server was lost. Attempting reset: LOG: server
process (PID 3166) was terminated by signal 11: Segmentation fault

The below test gives me a strange error message:

create or replace function x1() returns text as $$
plan=None
try:
with plpy.subtransaction():
plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)')
plan=plpy.prepare('select * FROM z')
plpy.execute('select * FROM does_not_exist')
except plpy.SPIError, e:
cur=plpy.cursor(plan)
rows=cur.fetch(10)
return rows[0]['generate_series']
return '1'
$$ language plpythonu;
select x1();

test=# select x1()
test-# ;
ERROR: TypeError: Expected sequence of 82187072 arguments, got 0: <NULL>
CONTEXT: Traceback (most recent call last):
PL/Python function "x1", line 9, in <module>
cur=plpy.cursor(plan)
PL/Python function "x1"
STATEMENT: select x1()

I was expecting an error from the function just a bit more useful one.

Performance
-------------------
I did not do any specific performance testing but I don't see this patch
as having any impact to performance

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Urbański 2011-11-20 18:22:23 Re: plpython SPI cursors
Previous Message Kevin Grittner 2011-11-20 16:33:47 Re: testing ProcArrayLock patches