Lists: | pgsql-hackers |
---|
From: | Jan Urbański <wulczer(at)wulczer(dot)org> |
---|---|
To: | Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | plpython SPI cursors |
Date: | 2011-10-15 23:28:56 |
Message-ID: | 4E9A1738.6090108@wulczer.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
attached is a patch implementing the usage of SPI cursors in PL/Python.
Currently when trying to process a large table in PL/Python you have
slurp it all into memory (that's what plpy.execute does).
This patch allows reading the result set in smaller chunks, using a SPI
cursor behind the scenes.
Example usage:
cursor = plpy.cursor("select a, b from hugetable")
for row in cursor:
plpy.info("a is %s and b is %s" % (row['a'], row['b']))
The patch itself is simple, but there's a lot of boilerplate dedicated
to opening a subtransaction and handling prepared plans. I'd like to do
some refactoring of they way PL/Python uses SPI to reduce the amount of
boilerplate needed, but that'll come as a separate patch (just before
the patch to split plpython.c in smaller chunks).
This feature has been sponsored by Nomao.
Cheers,
Jan
PS: I already added it to the November CF.
J
Attachment | Content-Type | Size |
---|---|---|
0001-Add-cursor-support-to-plpythonu.patch | text/x-diff | 0 bytes |
From: | Steve Singer <ssinger_pg(at)sympatico(dot)ca> |
---|---|
To: | Jan Urbański <wulczer(at)wulczer(dot)org> |
Cc: | Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-11-20 18:14:17 |
Message-ID: | BLU0-SMTP4530C1907601DFB9AEAE778ECA0@phx.gbl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 11-10-15 07:28 PM, Jan Urbański wrote:
> Hi,
>
> attached is a patch implementing the usage of SPI cursors in PL/Python.
> Currently when trying to process a large table in PL/Python you have
> slurp it all into memory (that's what plpy.execute does).
>
> J
I found a few bugs (see my testing section below) that will need fixing
+ a few questions about the code
Overview & Feature Review
-----------
This patch adds cursor support to plpython. SPI cursors will allow
a plpython function to read process a large results set without having to
read it all into memory at once. This is a good thing. Without this
patch I think you could accomplish the same with using SQL DECLARE CURSOR
and SQL fetch. This feature allows you to use a python cursor as
an iterator resulting in much cleaner python code than the SQL FETCH
approach. I think the feature is worth having
Usability Review
----------------------
The patch adds the user methods
cursor=plpy.cursor(query_or_plan)
cursor.fetch(100)
cursor.close()
Do we like the name plpy.cursor or would we rather call it something like
plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...)
Since we will be mostly stuck with the API once we release 9.2 this is worth
some opinions on. I like cursor() but if anyone disagrees now is the time.
This patch does not provide a wrapper around SPI_cursor_move. The patch
is useful without that and I don't see anything that preculdes someone else
adding that later if they see a need.
Documentation Review
---------------------
The patch includes documentation updates that describes the new feature.
The Database Access page doesn't provide a API style list of database access
functions like the plperl
http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page
does. I think the organization of the perl page is
clearer than the python one and we should think about a doing some
documentaiton refactoring. That should be done as a seperate patch and
shouldn't be a barrier to committing this one.
Code Review
----------------
in PLy_cursor_plan line 4080
+ PG_TRY();
+ {
+ Portal portal;
+ char *volatile nulls;
+ volatile int j;
+
+ if (nargs > 0)
+ nulls = palloc(nargs * sizeof(char));
+ else
+ nulls = NULL;
+
+ for (j = 0; j < nargs; j++)
+ {
+ PyObject *elem;
I am probably not seeing a code path or misunderstanding something
about the setjmp/longjump usages but I don't see why nulls and j need to be
volatile here.
line 444
PLy_cursor(PyObject *self, PyObject *args)
+ {
+ char *query;
+ PyObject *plan;
+ PyObject *planargs = NULL;
+
+ if (PyArg_ParseTuple(args, "s", &query))
+ return PLy_cursor_query(query);
+
Should query be freed with PyMem_free()
Testing
-----------
I tested both python 2.6 and 3 on a Linux system
create or replace function x() returns text as $$
cur=None
try:
with plpy.subtransaction():
cur=plpy.cursor('select generate_series(1,1000)')
rows=cur.fetch(10);
plpy.execute('select f()')
except plpy.SPIError:
rows=cur.fetch(10);
return rows[0]['generate_series']
return 'none'
$$ language plpythonu;
select x();
crashes the backend
test=# select x();
The connection to the server was lost. Attempting reset: LOG: server
process (PID 3166) was terminated by signal 11: Segmentation fault
The below test gives me a strange error message:
create or replace function x1() returns text as $$
plan=None
try:
with plpy.subtransaction():
plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)')
plan=plpy.prepare('select * FROM z')
plpy.execute('select * FROM does_not_exist')
except plpy.SPIError, e:
cur=plpy.cursor(plan)
rows=cur.fetch(10)
return rows[0]['generate_series']
return '1'
$$ language plpythonu;
select x1();
test=# select x1()
test-# ;
ERROR: TypeError: Expected sequence of 82187072 arguments, got 0: <NULL>
CONTEXT: Traceback (most recent call last):
PL/Python function "x1", line 9, in <module>
cur=plpy.cursor(plan)
PL/Python function "x1"
STATEMENT: select x1()
I was expecting an error from the function just a bit more useful one.
Performance
-------------------
I did not do any specific performance testing but I don't see this patch
as having any impact to performance
From: | Jan Urbański <wulczer(at)wulczer(dot)org> |
---|---|
To: | Steve Singer <ssinger_pg(at)sympatico(dot)ca> |
Cc: | Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-11-20 18:22:23 |
Message-ID: | 4EC9455F.5070209@wulczer.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 20/11/11 19:14, Steve Singer wrote:
> On 11-10-15 07:28 PM, Jan Urbański wrote:
>> Hi,
>>
>> attached is a patch implementing the usage of SPI cursors in PL/Python.
> I found a few bugs (see my testing section below) that will need fixing
> + a few questions about the code
Hi Steve,
thanks a lot for the review, I'll investigate the errors you were
getting and post a follow-up.
Good catch on trying cursors with explicit subtransactions, I didn't
think about how they would interact.
Cheers,
Jan
From: | Jan Urbański <wulczer(at)wulczer(dot)org> |
---|---|
To: | Steve Singer <ssinger_pg(at)sympatico(dot)ca> |
Cc: | Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-11-23 18:58:55 |
Message-ID: | 4ECD426F.5060702@wulczer.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 20/11/11 19:14, Steve Singer wrote:
> On 11-10-15 07:28 PM, Jan Urbański wrote:
>> Hi,
>>
>> attached is a patch implementing the usage of SPI cursors in PL/Python.
>> Currently when trying to process a large table in PL/Python you have
>> slurp it all into memory (that's what plpy.execute does).
>>
>> J
>
> I found a few bugs (see my testing section below) that will need fixing
> + a few questions about the code
Responding now to all questions and attaching a revised patch based on
your comments.
> Do we like the name plpy.cursor or would we rather call it something like
> plpy.execute_cursor(...) or plpy.cursor_open(...) or
> plpy.create_cursor(...)
> Since we will be mostly stuck with the API once we release 9.2 this is
> worth
> some opinions on. I like cursor() but if anyone disagrees now is the time.
We use plpy.subtransaction() to create Subxact objects, so I though
plpy.cursor() would be most appropriate.
> This patch does not provide a wrapper around SPI_cursor_move. The patch
> is useful without that and I don't see anything that preculdes someone else
> adding that later if they see a need.
My idea is to add keyword arguments to plpy.cursor() that will allow you
to decide whether you want a scrollable cursor and after that provide a
move() method.
> The patch includes documentation updates that describes the new feature.
> The Database Access page doesn't provide a API style list of database
> access
> functions like the plperl
> http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page
> does. I think the organization of the perl page is
> clearer than the python one and we should think about a doing some
> documentaiton refactoring. That should be done as a seperate patch and
> shouldn't be a barrier to committing this one.
Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet
summoned force to overhaul them.
> in PLy_cursor_plan line 4080
> + PG_TRY();
> + {
> + Portal portal;
> + char *volatile nulls;
> + volatile int j;
> I am probably not seeing a code path or misunderstanding something
> about the setjmp/longjump usages but I don't see why nulls and j need to be
> volatile here.
It looked like you could drop volatile there (and in
PLy_spi_execute_plan, where this is copied from (did I mention there's
quite some code duplication in PL/Python?)) but digging in git I found
this commit:
that added the original volatile qualification, so I guess there's a reason.
> line 444
> PLy_cursor(PyObject *self, PyObject *args)
> + {
> + char *query;
> + PyObject *plan;
> + PyObject *planargs = NULL;
> +
> + if (PyArg_ParseTuple(args, "s", &query))
> + return PLy_cursor_query(query);
> +
>
> Should query be freed with PyMem_free()
No, PyArg_ParseTuple returns a string on the stack, I check that
repeatedly creating a cursor with a plan argument does not leak memory
and that adding PyMem_Free there promptly leads to a segfault.
> I tested both python 2.6 and 3 on a Linux system
>
> [test cases demonstrating bugs]
Turns out it's a really bad idea to store pointers to Portal structures,
because they get invalidated by the subtransaction abort hooks.
I switched to storing the cursor name and looking it up in the
appropriate hash table every time it's used. The examples you sent
(which I included as regression tests) now cause a ValueError to be
raised with a message stating that the cursor has been created in an
aborted subtransaction.
Not sure about the wording of the error message, though.
Thanks again for the review!
Cheers,
Jan
Attachment | Content-Type | Size |
---|---|---|
cursor-support-for-plpython-v2.patch | text/x-diff | 40.8 KB |
From: | Steve Singer <ssinger_pg(at)sympatico(dot)ca> |
---|---|
To: | Jan Urbański <wulczer(at)wulczer(dot)org> |
Cc: | Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-11-26 16:21:14 |
Message-ID: | BLU0-SMTP2363BD56F6BB1AD3E704448ECC0@phx.gbl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 11-11-23 01:58 PM, Jan Urbański wrote:
> On 20/11/11 19:14, Steve Singer wrote:
>> On 11-10-15 07:28 PM, Jan Urbański wrote:
>>> Hi,
>>>
>>> attached is a patch implementing the usage of SPI cursors in PL/Python.
>>> Currently when trying to process a large table in PL/Python you have
>>> slurp it all into memory (that's what plpy.execute does).
>>>
>>> J
>>
>> I found a few bugs (see my testing section below) that will need fixing
>> + a few questions about the code
>
> Responding now to all questions and attaching a revised patch based on
> your comments.
>
I've looked over the revised version of the patch and it now seems fine.
Ready for committer.
>> Do we like the name plpy.cursor or would we rather call it something
>> like
>> plpy.execute_cursor(...) or plpy.cursor_open(...) or
>> plpy.create_cursor(...)
>> Since we will be mostly stuck with the API once we release 9.2 this is
>> worth
>> some opinions on. I like cursor() but if anyone disagrees now is the
>> time.
>
> We use plpy.subtransaction() to create Subxact objects, so I though
> plpy.cursor() would be most appropriate.
>
>> This patch does not provide a wrapper around SPI_cursor_move. The patch
>> is useful without that and I don't see anything that preculdes
>> someone else
>> adding that later if they see a need.
>
> My idea is to add keyword arguments to plpy.cursor() that will allow
> you to decide whether you want a scrollable cursor and after that
> provide a move() method.
>
>> The patch includes documentation updates that describes the new feature.
>> The Database Access page doesn't provide a API style list of database
>> access
>> functions like the plperl
>> http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page
>> does. I think the organization of the perl page is
>> clearer than the python one and we should think about a doing some
>> documentaiton refactoring. That should be done as a seperate patch and
>> shouldn't be a barrier to committing this one.
>
> Yeah, the PL/Python docs are a bit chaotic right now. I haven't yet
> summoned force to overhaul them.
>
>> in PLy_cursor_plan line 4080
>> + PG_TRY();
>> + {
>> + Portal portal;
>> + char *volatile nulls;
>> + volatile int j;
>
>> I am probably not seeing a code path or misunderstanding something
>> about the setjmp/longjump usages but I don't see why nulls and j need
>> to be
>> volatile here.
>
> It looked like you could drop volatile there (and in
> PLy_spi_execute_plan, where this is copied from (did I mention there's
> quite some code duplication in PL/Python?)) but digging in git I found
> this commit:
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2789b7278c11785750dd9d2837856510ffc67000
>
>
> that added the original volatile qualification, so I guess there's a
> reason.
>
>> line 444
>> PLy_cursor(PyObject *self, PyObject *args)
>> + {
>> + char *query;
>> + PyObject *plan;
>> + PyObject *planargs = NULL;
>> +
>> + if (PyArg_ParseTuple(args, "s", &query))
>> + return PLy_cursor_query(query);
>> +
>>
>> Should query be freed with PyMem_free()
>
> No, PyArg_ParseTuple returns a string on the stack, I check that
> repeatedly creating a cursor with a plan argument does not leak memory
> and that adding PyMem_Free there promptly leads to a segfault.
>
>
>> I tested both python 2.6 and 3 on a Linux system
>>
>> [test cases demonstrating bugs]
>
> Turns out it's a really bad idea to store pointers to Portal
> structures, because they get invalidated by the subtransaction abort
> hooks.
>
> I switched to storing the cursor name and looking it up in the
> appropriate hash table every time it's used. The examples you sent
> (which I included as regression tests) now cause a ValueError to be
> raised with a message stating that the cursor has been created in an
> aborted subtransaction.
>
> Not sure about the wording of the error message, though.
>
> Thanks again for the review!
>
> Cheers,
> Jan
>
>
>
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Steve Singer <ssinger_pg(at)sympatico(dot)ca> |
Cc: | Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-11-29 05:01:41 |
Message-ID: | 1322542901.4595.7.camel@vanquo.pezone.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On lör, 2011-11-26 at 11:21 -0500, Steve Singer wrote:
> I've looked over the revised version of the patch and it now seems
> fine.
>
> Ready for committer.
I can take it from here.
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Jan Urbański <wulczer(at)wulczer(dot)org> |
Cc: | Steve Singer <ssinger_pg(at)sympatico(dot)ca>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-12-05 17:58:07 |
Message-ID: | 1323107887.10992.10.camel@vanquo.pezone.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On ons, 2011-11-23 at 19:58 +0100, Jan Urbański wrote:
> On 20/11/11 19:14, Steve Singer wrote:
> > On 11-10-15 07:28 PM, Jan Urbański wrote:
> >> Hi,
> >>
> >> attached is a patch implementing the usage of SPI cursors in PL/Python.
> >> Currently when trying to process a large table in PL/Python you have
> >> slurp it all into memory (that's what plpy.execute does).
> >>
> >> J
> >
> > I found a few bugs (see my testing section below) that will need fixing
> > + a few questions about the code
>
> Responding now to all questions and attaching a revised patch based on
> your comments.
Committed
Please refresh the other patch.
From: | Jan Urbański <wulczer(at)wulczer(dot)org> |
---|---|
To: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
Cc: | Steve Singer <ssinger_pg(at)sympatico(dot)ca>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-12-05 17:59:58 |
Message-ID: | 4EDD069E.8090006@wulczer.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 05/12/11 18:58, Peter Eisentraut wrote:
> On ons, 2011-11-23 at 19:58 +0100, Jan Urbański wrote:
>> On 20/11/11 19:14, Steve Singer wrote:
>> Responding now to all questions and attaching a revised patch based on
>> your comments.
>
> Committed
>
> Please refresh the other patch.
Great, thanks!
I'll try to send an updated version of the other patch this evening.
Cheers,
Jan
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Jan Urbański <wulczer(at)wulczer(dot)org> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, Steve Singer <ssinger_pg(at)sympatico(dot)ca>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-12-05 18:12:19 |
Message-ID: | 201112051812.pB5ICJC22418@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Jan Urbaski wrote:
> On 05/12/11 18:58, Peter Eisentraut wrote:
> > On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote:
> >> On 20/11/11 19:14, Steve Singer wrote:
> >> Responding now to all questions and attaching a revised patch based on
> >> your comments.
> >
> > Committed
> >
> > Please refresh the other patch.
>
> Great, thanks!
>
> I'll try to send an updated version of the other patch this evening.
I assume this is _not_ related to this TODO item:
Add a DB-API compliant interface on top of the SPI interface
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
From: | Jan Urbański <wulczer(at)wulczer(dot)org> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Peter Eisentraut <peter_e(at)gmx(dot)net>, Steve Singer <ssinger_pg(at)sympatico(dot)ca>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-12-05 18:20:34 |
Message-ID: | 4EDD0B72.6090804@wulczer.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 05/12/11 19:12, Bruce Momjian wrote:
> Jan Urbański wrote:
>> On 05/12/11 18:58, Peter Eisentraut wrote:
>>> On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote:
>>>> On 20/11/11 19:14, Steve Singer wrote:
>>>> Responding now to all questions and attaching a revised patch based on
>>>> your comments.
>>>
>>> Committed
>>>
>>> Please refresh the other patch.
>>
>> Great, thanks!
>>
>> I'll try to send an updated version of the other patch this evening.
>
> I assume this is _not_ related to this TODO item:
>
> Add a DB-API compliant interface on top of the SPI interface
No, not related.
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Jan Urbański <wulczer(at)wulczer(dot)org>, Steve Singer <ssinger_pg(at)sympatico(dot)ca>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-12-10 17:23:32 |
Message-ID: | 1323537812.20837.0.camel@vanquo.pezone.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On mån, 2011-12-05 at 13:12 -0500, Bruce Momjian wrote:
> Jan Urbański wrote:
> > On 05/12/11 18:58, Peter Eisentraut wrote:
> > > On ons, 2011-11-23 at 19:58 +0100, Jan Urba?ski wrote:
> > >> On 20/11/11 19:14, Steve Singer wrote:
> > >> Responding now to all questions and attaching a revised patch based on
> > >> your comments.
> > >
> > > Committed
> > >
> > > Please refresh the other patch.
> >
> > Great, thanks!
> >
> > I'll try to send an updated version of the other patch this evening.
>
> I assume this is _not_ related to this TODO item:
>
> Add a DB-API compliant interface on top of the SPI interface
No, but this is:
http://petereisentraut.blogspot.com/2011/11/plpydbapi-db-api-for-plpython.html