Using real libpq parameters

Lists: psycopg
From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Using real libpq parameters
Date: 2011-02-26 23:40:59
Message-ID: AANLkTi=ym3SCQKCQBtp8RJHUswwAPOpjXYKTXS=aHWzp@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

Hello,

I've tried to make order in the ideas about pros and cons of using
real query parameters in libpq. Here is my analysis and a possible
plan.

Currently Psycopg uses only the PQexec [1] libpq function to send
queries to the backend: when a query with parameters is executed,
psycopg adapts every parameter value to a sql literal (the protocol
name is ISQLQuote) [2], merges the parameters to the query and sends
everything to the backend using PQexec. This strategy has largely
proven satisfying, providing the greatest flexibility for the adapter
to write the query.

Using PQexecParams [3] instead of PQexec would have several problems
and the behavior of Psycopg would change in a non-backward compatible
way. PQexecParams only accepts one non-empty command while
cur.execute() allows any number of queries separated by semicolons.
Also passing complex data types seems more... complex. An example is
with the array: because psycopg sends a complete query to the backend,
it can use the array[...] syntax where every element is escaped in a
standard way [4] but the array[] constructor doesn't seem accepted as
a parameter. In other places psycopg uses constructors that wouldn't
be valid as parameter values (e.g. '2010-01-01'::date for date
representation): in some cases the python type could be converted into
the proper paramTypes entry for disambiguation (in execute, but
probably not in executemany). The "default" literal is another handy
feature not accepted as parameter. The bottom line is that adapting
Python values for PQexec and for PQexecParams are two entirely
different businesses and adapters currently implemented for ISQLQuote
wouldn't be compatible.

So we have incompatibilities in the queries that can be executed and
in the adapters that users may have extended. I think this rules out
entirely the idea of using PQexecParams as the default exec function,
at least until psycopg3000. What would be the advantages? One is that
large strings or binary values would be treated more efficiently in
the backend, as they wouldn't bloat the parser. Another is that we may
start supporting prepared statements: I would avoid an explicit
.prepare() method exposed to Python (as the PREPARE/EXECUTE sql
statements can be already used for this task), but for instance
.executemany() could use internally PQprepare/PQexecPrepared, whose
interface is similar to PQexecParams.

Luckily (well, it wasn't luck at all: thanks to Federico's design)
psycopg already has two extension mechanisms to cope with having both
the execution methods: one is the possibility to subclass
connections/cursors and thus have a cursor supporting PQexecParams in
execute() and PQprepare in executemany() (let's call it ParamCursor).
Another one is the completely generic adaptation mechanism of Python
objects [5]: there could be a separate protocol (let's say ISQLParam)
to which python object passed to the query should be conformed:
conforming object may expose different methods from the ones required
by ISQLQuote, that may be needed to interface with PQexecParam.

Keeping the two protocols well separated would also allow us to use
the PQexec-based side in ways we preferred to avoid before. For
instance there's often been the request of a way to pass identifiers
to a query e.g. to represents table or field names: I remember the
request being discarded precisely on the basis that switching to
"proper parameters" the method would have broke. Keeping two protocols
we may have, for instance, an "Identifier" wrapper allowing
execute("insert into %s values (%s)", [Identifier('foo'), 'bar']): the
method would only work on the classic cursor, not the param-based one,
and this would be implemented by making the Identifier object conform
to ISQLQuote but not to ISQLParam.

PQexecParams also allows binary representation for the parameters. I
know they would be more efficient than textual types, but it is an
implementation leak probably useful for a specific program talking to
a specific server, but not for a generic library. So I would rule out
its usage for every data type except for bytea.

This of course will not be implemented tonight and won't be released
in Psycopg 2.4 :)

Comments and discussion are appreciated. Cheers.

-- Daniele

[1] http://www.postgresql.org/docs/9.0/static/libpq-exec.html#LIBPQ-PQEXEC
[2] http://initd.org/psycopg/docs/extensions.html#sql-adaptation-protocol-objects
[3] http://www.postgresql.org/docs/9.0/static/libpq-exec.html#LIBPQ-PQEXECPARAMS
[4] http://www.postgresql.org/docs/9.0/static/arrays.html#ARRAYS-IO
[5] http://www.python.org/dev/peps/pep-0246/


From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Using real libpq parameters
Date: 2011-02-26 23:47:22
Message-ID: E5201389-F7A5-48A7-B2DC-683AEC1BB2AE@themactionfaction.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg


On Feb 26, 2011, at 6:40 PM, Daniele Varrazzo wrote:

> Hello,
>
> I've tried to make order in the ideas about pros and cons of using
> real query parameters in libpq. Here is my analysis and a possible
> plan.
>
<snip>
> Comments and discussion are appreciated. Cheers.

Have you looked at libpqtypes? It makes dealing with complex datatypes trivial.

http://libpqtypes.esilo.com/

Cheers,
M


From: "P(dot) Christeas" <p_christ(at)hol(dot)gr>
To: psycopg(at)postgresql(dot)org
Subject: Re: Using real libpq parameters
Date: 2011-02-27 10:30:45
Message-ID: 201102271230.46901.p_christ@hol.gr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Sunday 27 February 2011, Daniele Varrazzo wrote:
> Hello,
>
> I've tried to make order in the ideas about pros and cons of using
> real query parameters in libpq. Here is my analysis and a possible
> plan.
> ...

Sorry not to have answered this thread earlier, I've had some trouble with
kmail+akonadi which kept me silent for a couple of days :S

Yes, we have /already/ done a few experiments on using binary data types with
psycopg2. A few remarks (merely what Daniele has said):
1. When used properly, binary datatypes have significant speed gain over text
ones. But not in all cases.
2. Not all datatypes can go binary. That's not a problem in the send
direction, because we can control that with a boolean flag.
3. There is NO universally backwards compatible way of switching from text
params to binary ones. If we ever support binary, it will need an explicit
attribute/method, like the ISQLParam adaptation that Daniele has proposed.
Since breaking compatibility of psycopg2 is out of the question, so is moving
to binary mode by default.
4. In the receive direction, things are even more complex. It wouldn't be
recommended except for cases of extreme data transfers (always with the option
to fall back to the old text code).
5. libpqtypes doesn't seem to help as you suggested. It does text->net-bin
adaptations, which we don't need, since Python internal representation is
already binary. We want a bin->net-bin library (where net-bin is the network
Postgres-native format of data, heavily dependent on server's version)..
OTOH, reading the code of libpqtypes /could/ help us write ourselves similar
adapters.

Code at:
http://git.hellug.gr/?p=xrg/psycopg2;a=shortlog;h=refs/heads/execparams

--
Say NO to spam and viruses. Stop using Microsoft Windows!


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Using real libpq parameters
Date: 2011-02-27 10:49:46
Message-ID: AANLkTimjh3nHLabzk_NyEW_6B4hnKn0uY=FD7mUv5Mn3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Sat, Feb 26, 2011 at 11:47 PM, A.M. <agentm(at)themactionfaction(dot)com> wrote:

> Have you looked at libpqtypes? It makes dealing with complex datatypes trivial.
>
> http://libpqtypes.esilo.com/

Hi A.M.,

David Blewett had suggested the same just a few hours before you :)

Maybe I'm missing something about libpqtypes: in this case some
explanation would be appreciated.

What I see is that it offers printf/scanf style parameters passing.
This is greatly helpful if you have C variables containing the
parameters: it saves building the arrays to be passed to PQexecParams.
>From their example:

PGresult *res = PQexecf(conn,
"INSERT INTO t VALUES (%int4, %text)", 654321, "some text");

this seems saving getting the oid of the types, converting the
non-string parameters to string and handling the arrays PQexecParams
requires. But I don't see how psycopg could use this handy function
because there isn't a C function containing the parameters. To
simplify what we have, let's say there is a Python tuple 't'
containing the values all as strings: using such structure would be
something like:

char *paramValues[PySequence_Size(t)];
for (i = 0; i < PySequence_Size(t); i++) {
paramValues[i]
}
PQexecParams(query, ..., paramValues, ...);

(omitting everything about types handling). How could PQexecf be
called given the parameters in the Python tuple t instead of in
separate C variables? Would it be a pattern easier than the one above?

About the types, using the bare libpq, psycopg should iterate over the
python types in input and map every python type into a numeric OID
(about which psycopg already has knowledge), pack them into a C array
and send it as paramTypes. Using libpqtypes instead psycopg should
map the python types into a string - the Postgres name of the types -
and then mangle the type names into the query string. I don't see
particular saving in doing the latter instead of the former: there is
still types mapping to do, and the result should be sprintf'd into a
new query instead of put into a C array - it seems more clumsy.

So, I think libpqtypes is a huge saving for a program that would need
otherwise a lot of bureaucracy to operate with the libpq (knowledge of
the Postgres types etc). But psycopg has already sorted out this kind
of layer.

Then, you may be referring to some different libpqtypes functions and
I may have misunderstood both David's and your advice. In this case I
would be grateful if you could illuminate me about what part of
libpqtypes would be great for psycopg to use: I may be too focused on
PQexecf and completely missing the forest for the trees.

Thank you very much.

-- Daniele


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Using real libpq parameters
Date: 2011-02-27 10:58:53
Message-ID: AANLkTi=dVyvrDqAU9T1w+bX=JNYB2qaHJwQVn3==LDA-@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Sun, Feb 27, 2011 at 10:49 AM, Daniele Varrazzo
<daniele(dot)varrazzo(at)gmail(dot)com> wrote:

>  char *paramValues[PySequence_Size(t)];
>  for (i = 0; i < PySequence_Size(t); i++) {
>    paramValues[i]

That should have been something like:

paramValues[i] = PyString_AsString(PySequence_GetItem(t, i));

>  }
>  PQexecParams(query, ..., paramValues, ...);

-- Daniele


From: Federico Di Gregorio <federico(dot)digregorio(at)dndg(dot)it>
To: psycopg(at)postgresql(dot)org
Subject: Re: Using real libpq parameters
Date: 2011-02-27 11:40:00
Message-ID: 4D6A3810.5060300@dndg.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On 27/02/11 00:40, Daniele Varrazzo wrote:
[snip]
> Luckily (well, it wasn't luck at all: thanks to Federico's design)
> psycopg already has two extension mechanisms to cope with having both
> the execution methods: one is the possibility to subclass
> connections/cursors and thus have a cursor supporting PQexecParams in
> execute() and PQprepare in executemany() (let's call it ParamCursor).
> Another one is the completely generic adaptation mechanism of Python
> objects [5]: there could be a separate protocol (let's say ISQLParam)
> to which python object passed to the query should be conformed:
> conforming object may expose different methods from the ones required
> by ISQLQuote, that may be needed to interface with PQexecParam.

If you have a look at ISQLQuote empty implementation you'll see a
getbinary() method that was put there exactly for the reason of
supporting PQexecParams & co. At the time the idea was that an adapter
*should* provide a method to convert the adapted object to PostgreSQL
binary representation.

As Daniele explained the main problem is that not every value can be
converted to binary representation so, probably, a separate protocol
would be better.

About the API I'd probably go for complete transparency: if every
parameter supports the new protocol, then adapt to binary, else just
fall back to the current implementation. Checking the parameters should
be quite fast (the code don't really adapt anything yet).

federico

--
Federico Di Gregorio federico(dot)digregorio(at)dndg(dot)it
Studio Associato Di Nunzio e Di Gregorio http://dndg.it
Lord, defend me from my friends; I can account for my enemies.
-- Charles D'Hericault


From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Using real libpq parameters
Date: 2011-02-27 16:11:25
Message-ID: 36393852-402D-483D-B394-6CBDD07387C2@themactionfaction.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg


On Feb 27, 2011, at 5:49 AM, Daniele Varrazzo wrote:

> On Sat, Feb 26, 2011 at 11:47 PM, A.M. <agentm(at)themactionfaction(dot)com> wrote:
>
>> Have you looked at libpqtypes? It makes dealing with complex datatypes trivial.
>>
>> http://libpqtypes.esilo.com/
>
> Hi A.M.,
>
> David Blewett had suggested the same just a few hours before you :)
>
> Maybe I'm missing something about libpqtypes: in this case some
> explanation would be appreciated.
>
> What I see is that it offers printf/scanf style parameters passing.
> This is greatly helpful if you have C variables containing the
> parameters: it saves building the arrays to be passed to PQexecParams.
> From their example:
>
> PGresult *res = PQexecf(conn,
> "INSERT INTO t VALUES (%int4, %text)", 654321, "some text");

One uses PQparamExec after constructing a PQparam with PQputf- certainly not elegant and easy to mess up, but hey, it's C. There's an example here: http://libpqtypes.esilo.com/man3/PQputf.html

PQexecf is just a convenience wrapper around these functions.

>
> About the types, using the bare libpq, psycopg should iterate over the
> python types in input and map every python type into a numeric OID
> (about which psycopg already has knowledge), pack them into a C array
> and send it as paramTypes. Using libpqtypes instead psycopg should
> map the python types into a string - the Postgres name of the types -
> and then mangle the type names into the query string. I don't see
> particular saving in doing the latter instead of the former: there is
> still types mapping to do, and the result should be sprintf'd into a
> new query instead of put into a C array - it seems more clumsy.

If psycopg2 doesn't know the type, then it can pass it as text and set the OID to 0 so the server will try to figure it out. And yes, psycopg2 would be able to delete the OID juggling code, but I see that as a positive step. For backwards compatibility, the register_type function and friends could hook into the libpqtypes OID handler table: http://libpqtypes.esilo.com/browse_source.html?file=handler.c

For numeric types, psycopg would add ints and doubles to the PQParam, saving the server that parsing step.

>
> So, I think libpqtypes is a huge saving for a program that would need
> otherwise a lot of bureaucracy to operate with the libpq (knowledge of
> the Postgres types etc). But psycopg has already sorted out this kind
> of layer.

Postgresql has a problem with prepared statements in that the execution plan is determined before the values are bound, so I suspect the old escape-string methods will need to be preserved and pumped through the prepare for '_'. (I would hope that all queries would go through the extended query protocol if libpqtypes were to be used.) Clearly, some runtime option would need to be provided through Python.

However, I believe (as a matter of opinion) that the multi-statement execution string is the worst misfeature of the v1 protocol. As you mentioned, there is no such option in v2, but I don't believe that to be a loss. However, it would definitely be a backwards-compatibility issue.

I suspect that using libpqtypes would allow psycopg2 to delete a lot of its bureacracy.

>
> Then, you may be referring to some different libpqtypes functions and
> I may have misunderstood both David's and your advice. In this case I
> would be grateful if you could illuminate me about what part of
> libpqtypes would be great for psycopg to use: I may be too focused on
> PQexecf and completely missing the forest for the trees.

What is most exciting about libpqtypes is that is uses the binary protocol which can bring orders-of-magnitude performance benefits for some workloads.

So, to summarize, libpqtypes:
- is a utility wrapper around libpq
- would allow psycopg to delete a bunch of code surrounding escaping and special type handling while supporting C implementations of user-defined types (fast!)
- is actively developed and maintained (as a license-compatible project) with developers who would be receptive to assisting this project
- can offer surprising performance benefits
- would give psycopg2 a strong competitive advantage over the other 3000 python postgresql client libraries
- would include some backwards compatibility issues for this project (though nothing insurmountable)

Cheers,
M


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: "P(dot) Christeas" <p_christ(at)hol(dot)gr>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Using real libpq parameters
Date: 2011-02-27 17:49:23
Message-ID: AANLkTim7ft6n4YugJu0Xr+fdrYWOndzEZfzT=4tcBT6F@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Sun, Feb 27, 2011 at 10:30 AM, P. Christeas <p_christ(at)hol(dot)gr> wrote:

>  5. libpqtypes doesn't seem to help as you suggested. It does text->net-bin
> adaptations, which we don't need, since Python internal representation is
> already binary. We want a bin->net-bin library (where net-bin is the network
> Postgres-native format of data, heavily dependent on server's version)..

I have re-(re-re-)read the comments about the binary types in the
libpq documentation, and it's not as dire as I thought it was:

"""
Values passed in binary format require knowledge of the internal
representation expected by the backend. For example, integers must be
passed in network byte order. Passing numeric values requires
knowledge of the server storage format, as implemented in
src/backend/utils/adt/numeric.c::numeric_send() and
src/backend/utils/adt/numeric.c::numeric_recv().
"""

So *it is* an implementation leak, but it doesn't promise breakage at
every version. I will ask for further details on the postgres MLs to
try to get a better picture.

> OTOH, reading the code of libpqtypes /could/ help us write ourselves similar
> adapters.

This is true of course: even if we decided not to use libpqtypes
there's surely a lot to learn from it.

-- Daniele


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Using real libpq parameters
Date: 2011-02-27 23:35:43
Message-ID: AANLkTi=SUS-Zj=PJiW1DF2W+M3pn4jage01+jG0yYGUq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Sun, Feb 27, 2011 at 4:11 PM, A.M. <agentm(at)themactionfaction(dot)com> wrote:
>
> On Feb 27, 2011, at 5:49 AM, Daniele Varrazzo wrote:

>> What I see is that it offers printf/scanf style parameters passing.
>> This is greatly helpful if you have C variables containing the
>> parameters: it saves building the arrays to be passed to PQexecParams.
>> From their example:
>>
>>  PGresult *res = PQexecf(conn,
>>    "INSERT INTO t VALUES (%int4, %text)", 654321, "some text");
>
> One uses PQparamExec after constructing a PQparam with PQputf- certainly not elegant and easy to mess up, but hey, it's C. There's an example here: http://libpqtypes.esilo.com/man3/PQputf.html
>
> PQexecf is just a convenience wrapper around these functions.

I see, but PQexecf is another variadic function, so my point is the
same. If you have arguments in variables a, b, c, you can call
PQputf("%text, %text, %text", a, b, c) and that's handy. But if you
have an array containing three variables v[] and its length L, how do
you pass them to PQputf?

> Postgresql has a problem with prepared statements in that the execution plan is determined before the values are bound, so I suspect the old escape-string methods will need to be preserved and pumped through the prepare for '_'. (I would hope that all queries would go through the extended query protocol if libpqtypes were to be used.) Clearly, some runtime option would need to be provided through Python.

I know about the suboptimal plan Postgres generates in case of
prepared statements. Does it apply for PQexecParams too or just for
PQprepare? I've asked on the -general about this.

> What is most exciting about libpqtypes is that is uses the binary protocol which can bring orders-of-magnitude performance benefits for some workloads.

If there is guarantee it is stable as much as the textual
representation of the data types we'll want to have plenty of it. But
maintainibility has a priority over performance in psycopg and I don't
want to cause problem in future client-server interoperability to gain
some performance.

> So, to summarize, libpqtypes:
> - is a utility wrapper around libpq
> - would allow psycopg to delete a bunch of code surrounding escaping and special type handling while supporting C implementations of user-defined types (fast!)
> - is actively developed and maintained (as a license-compatible project) with developers who would be receptive to assisting this project
> - can offer surprising performance benefits
> - would give psycopg2 a strong competitive advantage over the other 3000 python postgresql client libraries
> - would include some backwards compatibility issues for this project (though nothing insurmountable)

A few things of the library are really interesting, and I have a lot
to learn about the binary protocol. I will study it closer to see if
it can be helpful: I see its utility from the PoV of an end-user
program, but because psycopg is a generic library, and doesn't deal
directly with C variables (but with Python values in C structures rich
of metadata) I have to understand if interfacing to it is really an
improvement respect to interfacing directly to the libpq.

Cheers,

-- Daniele


From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: Using real libpq parameters
Date: 2011-02-28 18:49:38
Message-ID: 82D2B5BA-19EA-4DB5-9AF0-3FC93EDBF138@themactionfaction.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg


On Feb 27, 2011, at 6:35 PM, Daniele Varrazzo wrote:

> On Sun, Feb 27, 2011 at 4:11 PM, A.M. <agentm(at)themactionfaction(dot)com> wrote:
>>
>> On Feb 27, 2011, at 5:49 AM, Daniele Varrazzo wrote:
>
> I see, but PQexecf is another variadic function, so my point is the
> same. If you have arguments in variables a, b, c, you can call
> PQputf("%text, %text, %text", a, b, c) and that's handy. But if you
> have an array containing three variables v[] and its length L, how do
> you pass them to PQputf?

One calls it multiple times. Hopefully this example will clear up the confusion:

#include <libpq-fe.h>
#include <libpqtypes.h>
#include <assert.h>

int main(int argc,char *argv[])
{
PGconn *conn = PQconnectdb("dbname=test");
assert(conn!=NULL);

PQtypesRegister(conn);

PGparam *param = PQparamCreate(conn);

PQputf(param,"%int4",5); //calling PQputf multiple times
PQputf(param,"%text","spam");

PGresult *res=PQparamExec(conn,param,"SELECT $1,$2;",0);
PGint4 val1;
PGtext val2;

assert(PQgetf(res,0,"%int4",0,&val1)>0);
assert(PQgetf(res,0,"%text",1,&val2)>0);

printf("%d %s\n",val1,val2);

PQclear(res);
}

> I know about the suboptimal plan Postgres generates in case of
> prepared statements. Does it apply for PQexecParams too or just for
> PQprepare? I've asked on the -general about this.

It applies to all prepared statements regardless of the API. However, as a special case, the special "" (empty string) prepared statement follows this rule:
"The unnamed prepared statement is likewise planned during Parse processing if the Parse message defines no parameters. But if there are parameters, query planning occurs every time Bind parameters are supplied. This allows the planner to make use of the actual values of the parameters provided by each Bind message, rather than use generic estimates."
http://www.postgresql.org/docs/9.0/interactive/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

This allows one to better mimic the behavior of the simple query protocol.

>
>> What is most exciting about libpqtypes is that is uses the binary protocol which can bring orders-of-magnitude performance benefits for some workloads.
>
> If there is guarantee it is stable as much as the textual
> representation of the data types we'll want to have plenty of it. But
> maintainibility has a priority over performance in psycopg and I don't
> want to cause problem in future client-server interoperability to gain
> some performance.

libpqtypes supports ALL possible PostgreSQL types (user-defined as well). Right now, creating pycopg2 types involves parsing the textual representations from the result, however, using libpqtypes, psycopg2 could offer a much more natural API like this:

class Point(object):
def __init__(self,x,y):
self.x = x
self.y = y

psycopg2.register_type(Point,["x","y"])
or
psycopg2.register_type(Point,{"x":int,"y":int})
or even
psycopg2.register_type(Point) #psycopg2 extracts all properties as a tuple to pass to libpqtypes

or something like that- I'm just brainstorming here. Obviously, this wouldn't work if a Point instance needed to be interpolated into a query string.

>
>
>> So, to summarize, libpqtypes:
>> - is a utility wrapper around libpq
>> - would allow psycopg to delete a bunch of code surrounding escaping and special type handling while supporting C implementations of user-defined types (fast!)
>> - is actively developed and maintained (as a license-compatible project) with developers who would be receptive to assisting this project
>> - can offer surprising performance benefits
>> - would give psycopg2 a strong competitive advantage over the other 3000 python postgresql client libraries
>> - would include some backwards compatibility issues for this project (though nothing insurmountable)
>
> A few things of the library are really interesting, and I have a lot
> to learn about the binary protocol. I will study it closer to see if
> it can be helpful: I see its utility from the PoV of an end-user
> program, but because psycopg is a generic library, and doesn't deal
> directly with C variables (but with Python values in C structures rich
> of metadata) I have to understand if interfacing to it is really an
> improvement respect to interfacing directly to the libpq.

The best thing about libpqtypes is that you don't need to learn about the binary protocol because it is all cleanly wrapped up. For example, the library has to account for endianness of the server and other hurdles. It is all handled transparently.

Cheers,
M


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Using real libpq parameters
Date: 2011-02-28 21:27:46
Message-ID: AANLkTikamsFRkyTexx+Hkin7FMWMcXgd00W9QZC528U1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: psycopg

On Mon, Feb 28, 2011 at 6:49 PM, A.M. <agentm(at)themactionfaction(dot)com> wrote:
>
> On Feb 27, 2011, at 6:35 PM, Daniele Varrazzo wrote:

>> I see, but PQexecf is another variadic function, so my point is the
>> same. If you have arguments in variables a, b, c, you can call
>> PQputf("%text, %text, %text", a, b, c) and that's handy. But if you
>> have an array containing three variables v[] and its length L, how do
>> you pass them to PQputf?
>
> One calls it multiple times. Hopefully this example will clear up the confusion:
> [...]
>  PQputf(param,"%int4",5);    //calling PQputf multiple times
>  PQputf(param,"%text","spam");
>
>  PGresult *res=PQparamExec(conn,param,"SELECT $1,$2;",0);
> [...]

Yes it does, thank you.

>> I know about the suboptimal plan Postgres generates in case of
>> prepared statements. Does it apply for PQexecParams too or just for
>> PQprepare? I've asked on the -general about this.
>
> It applies to all prepared statements regardless of the API. However, as a special case, the special "" (empty string) prepared statement follows this rule:
> [...]
> http://www.postgresql.org/docs/9.0/interactive/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

This is interesting. This means that if we wanted to use PQprepare to
avoid repeated planning in executemany() we should set a non-empty
name for the statement. Good to know...

> libpqtypes supports ALL possible PostgreSQL types (user-defined as well). Right now, creating pycopg2 types involves parsing the textual representations from the result, however, using libpqtypes, psycopg2 could offer a much more natural API like this:
>
> class Point(object):,
>        def __init__(self,x,y):
>                self.x = x
>                self.y = y
> [...]
> psycopg2.register_type(Point) #psycopg2 extracts all properties as a tuple to pass to libpqtypes

FYI, in psycopg 2.4 we have added register_composite() that does about
the same, converting user-defined types into [named]tuples. Of course
it parses the textual representation. The Point was just an example
for a simple adapter, it wasn't meant as an example for composite
(e.g. if you wanted to write ad adapter for *yuck* money, that's the
recipe)

>>> So, to summarize, libpqtypes:

> The best thing about libpqtypes is that you don't need to learn about the binary protocol because it is all cleanly wrapped up. For example, the library has to account for endianness of the server and other hurdles. It is all handled transparently.

Yup, it seems the good way to go indeed if we wanted to introduce
binary communication. Note anyway that using PQexecParams doesn't
imply automatically using binary parameters, and not for all the data
types the benefit of binary is proven. It is definitely for bytea. I'd
argue that for text is the same. I've chatted with Christeas at FOSDEM
and he said (correct me If I'm wrong) after some benchmark he has made
that the performance with numbers were mixed: sending 8 bytes to
represent '3' is slower than sending and parsing the textual
representation. Datetime types would definitely benefit from binary,
but probably they are the only other types for which there is a sure
gain.

I see two sequential steps in this development: the first is to lay
out the ParamCursor (that should convert the python query into the
correct representation for PQexecParams)* and define the ISQLParam
protocol. Within this framework we could introduce the binary
protocol. So we could proceed:

1. text for everything except bytea (first step)
2a. binary only for selected data types, with our code, or
2b. binary for everything using libpqtypes

in case we went for 2b help from people "into" libpqtypes would be
greatly appreciated of course :)

* what query conversion? To make ParamCursor use easier (migrating
previous code, using python dicts) I envise ParamCursor to use the
current format ("%s") and pyformat ("%(name)s") not the libpq $1, $2
format. This will require some funny conversion of the query string:

- "SELECT %s, %s, %s" would become "SELECT $1, $2, $3"
- "SELECT %(foo)s, %(bar)s, %(foo)s" would become "SELECT $1, $2, $1"
plus a remapping of the parameters used in dict into a sequence of 2
items.

Cheers,

-- Daniele