pl/python improvements

From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: pl/python improvements
Date: 2010-12-07 19:17:57
Message-ID: 4CFE8865.7050506@wulczer.org
Lists: pgsql-hackers

Hi,

no, no patch(es) yet. I'm going through plpython.c trying as best I can
to improve things there. I'll have a patch (or patches) ready for the
January commitfest, but I thought I'd open up a discussion already to
spare me having to redo features because the way I attacked the problem
is a dead end. Be warned, this ended up being a very long email...

The code is on https://github.com/wulczer/postgres, in the plpython
branch. I'll be rebasing it regularly, so don't be surprised by commit
hashes changing.

Currently the improvements are:
* added a validator, so syntax errors are caught on function creation,
and you don't pay compile overhead when first calling the function
* use HTABs instead of Python dictionaries to cache procedures,
maintaining two separate HTABs for procedures and triggers (this way you
can use the procedure's OID as the hash key)
* get rid of the global PLy_error_in_progress, hopefully things that
cause errors are reported immediately
* use palloc(TopMemoryContext) instead of malloc() for things that need
to live until the end of the session
* execute SPI calls in a subtransaction, report errors back to Python
as exceptions that can be caught etc. (see the sketch after this list)
* do not prefix error messages with "PL/Python", just use the error
string as errmsg
* if a function has composite type inputs, check if the underlying
types have not changed when calling the function and recompile it if
necessary (fixes a TODO item)
* support tracebacks and make them look just like Python tracebacks (as
much as possible that is)
* fix plpy.error and plpy.fatal to actually raise a
plpy.Error/plpy.Fatal exception, that can be caught etc.
* remove volatile modifiers from variables (AIUI they are not
necessary anywhere in plpython.c)
* general small refactorings, removing unnecessary code, splitting big
procedures into smaller ones

Planned improvements include:
* explicit subxact starting, so you can issue consecutive plpy.execute
calls atomically (a sketch of one possible API follows below)
* split plpython.c into smaller modules, like plpython_traceback.c,
plpython_type_conversion.c, plpython_plpy.c etc.
* variadic argument handling (is this possible for a non-plpgsql pl?)
* multiple OUT parameters support (a TODO item, I think "table function
support" meant to say that)
* add a DB-API interface on top of SPI (a TODO item)

I'm not sure I'll make it in time with all of them; I have put them in
more-or-less priority order.
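
For the explicit subxact item, one possible shape would be a context
manager (purely hypothetical at this point; plpy.subtransaction is just
an illustrative name, nothing is settled):

CREATE FUNCTION atomic_updates() RETURNS void AS $$
# hypothetical plpy.subtransaction() context manager: if any
# plpy.execute inside the block fails, they all roll back together
with plpy.subtransaction():
    plpy.execute("update accounts set balance = balance - 100 where id = 1")
    plpy.execute("update accounts set balance = balance + 100 where id = 2")
$$ LANGUAGE plpythonu;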

Now that you're excited, the questions.

Q1

To check for composite types changing after the procedure I/O functions
have been cached, I store the relid of the base relation for the type
(typeidTypeRelid(desc->tdtypeid), where desc is the type's TupleDesc),
as well as the xmin and tid of the relation's tuple from pg_class. Then
when the function is invoked, for each composite parameter I fetch the
relation's tuple from pg_class again and compare xmin and tid. This
means that functions with composite parameters are slower, because
there's a syscache search for each parameter for each invocation. I
roughly measured the slowdown for a function with one composite param to
be 4%. OTOH this works now:

CREATE TABLE employee (
    name text,
    basesalary integer,
    bonus integer
);

INSERT INTO employee VALUES ('John', 100, 10), ('Mary', 200, 10);

CREATE OR REPLACE FUNCTION test_composite_table_input(e employee)
RETURNS integer AS $$
return e['basesalary'] + e['bonus']
$$ LANGUAGE plpythonu;

SELECT name, test_composite_table_input(employee.*) FROM employee;

ALTER TABLE employee DROP bonus;

ALTER TABLE employee ADD bonus integer;
UPDATE employee SET bonus = 10;

-- will recompile the function and recache I/O functions
SELECT name, test_composite_table_input(employee.*) FROM employee;

PL/Perl has the same problem, BTW. The question is: am I going the right
way checking if the composite type is up-to-date? The TODO item says:
(is this possible, easy, quick?). My answer is: I guess, kinda, dunno.

Q2

The format of error messages changes, because the "PL/Python" prefix
gets removed (I want raise plpy.Error("foo") to result in "ERROR:
plpy.ERROR: foo", not in "PL/Python: plpy.ERROR: foo") and because of
other things. For instance, since plpy.error(msg) now just raises a
plpy.Error exception, you will get "ERROR: plpy.Error: msg" and not just
"ERROR: msg". This potentially breaks clients that parse error messages,
but I don't really buy that argument. Still, is it acceptable?

Q3

Tracebacks! The way I implemented them (based on the patch linked from
the TODO item) they look like this:

CREATE FUNCTION nested_error() RETURNS text AS
'def fun1():
    plpy.error("boom")

def fun2():
    fun1()

def fun3():
    fun2()

fun3()
return "not reached"
' LANGUAGE plpythonu;
SELECT nested_error();
ERROR: plpy.Error: boom
DETAIL: Traceback (most recent call last):
  PL/Python function "nested_error", line 10, in <module>
    fun3()
  PL/Python function "nested_error", line 8, in fun3
    fun2()
  PL/Python function "nested_error", line 5, in fun2
    fun1()
  PL/Python function "nested_error", line 2, in fun1
    plpy.error("boom")
CONTEXT: PL/Python function "nested_error"

Tracebacks are added to every error, even if their depth is 1, making
them a bit redundant, so you get things like:

CREATE FUNCTION sql_syntax_error() RETURNS text AS
'plpy.execute("syntax error")'
LANGUAGE plpythonu;
SELECT sql_syntax_error();
ERROR: plpy.SPIError: syntax error at or near "syntax"
DETAIL: Traceback (most recent call last):
  PL/Python function "sql_syntax_error", line 1, in <module>
    plpy.execute("syntax error")
CONTEXT: PL/Python function "sql_syntax_error"

Notice that they look almost exactly like a traceback Python would give
you. I had to cheat there a bit, to hide the ugly generated Python
function name and remove the topmost frame from the traceback that was
pointing at the toplevel interpreter module.
Another oddity is that Python tracebacks have line numbers, so I am
storing the entire text of the function in the cache to print the
corresponding line of each frame, just like the real Python traceback.py
module. I guess that's negligible overhead? Or should I refetch prosrc
every time a traceback is printed? Note that traceback computation costs
some cycles, but I think it's OK, since we're not optimizing for that
case, obviously. The question is: is this how you'd like plpython
tracebacks to look?

Q4

Splitting into smaller modules - should I do it? I find a ~4k loc module
to be too big, but then again, if I do the split, the resulting diff
will be difficult to review. Or won't it?

Q5

Passing function arguments as globals is IMHO ugly. Not sure what the
TODO item wants to say by "Passing them as globals means functions
cannot be called recursively.". I tried and recursion worked just fine.
But it's still quite ugly IMHO... the problem is that I don't know how
to fix it. Creating a Python function with a signature mimicking that
of the SQL function is tricky because of unnamed arguments. Also,
passing arguments as real arguments would break code that uses the
'global' trick suggested in the manual (see the example below). Even
though the manual says you should not rely on it, I still feel bad
about breaking people's code. Does anyone have any bright ideas on how
to improve argument passing in plpython?
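
For reference, the trick in question looks like this (essentially the
pystrip example from the manual): because arguments arrive as globals,
reassigning one requires a global declaration first, and code like this
would break if arguments became real function parameters:

CREATE FUNCTION pystrip(x text) RETURNS text AS $$
global x          # x is injected as a global variable, so it must be
x = x.strip()     # declared global before it can be reassigned here
return x
$$ LANGUAGE plpythonu;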

Q6 (last one)

Does anyone have any more things on their PL/Python Christmas wishlist?
plpy.make_coffee()?

Thanks for reading all the way here,
Jan
