Re: Proper query implementation for Postgresql driver

Lists: pgsql-hackers
From: Shay Rojansky <roji(at)roji(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Proper query implementation for Postgresql driver
Date: 2014-09-28 09:53:40
Message-ID: CADT4RqBgnpJg+b2QrKyuomKQ_u9MDcmshwKgniRBH0rAho1zqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi.

I'm a core developer on npgsql (the Postgresql .NET provider), we're hard
at work on a major 3.0 version. I have a fundamental question that's been
bugging me for a while and would like to get your opinion on it.

Npgsql currently supports three basic query modes: simple, client-side
parameters and prepared. The first two use the Postgresql simple query flow
(client-side parameters means the user specifies parameters
programmatically, just like with prepared queries, but the actual
substitution work is done client-side). Prepared uses the Postgresql
extended query flow.

According to the Postgresql docs (49.2.2), the simple query flow, "the
format of the retrieved values is always text". This creates a burden where
npgsql needs to parse textual (and locale-specific!) info (e.g. dates,
money). The situation is even worse when doing client-side parameters,
since npgsql has to *create* textual representations that match what
Postgresql is expecting. The workaround for this issue up to now has been
to switch to culture-invariant formatting (e.g. lc_monetary=C), but this
approach imposes the setting on users and affects functions in ways they
don't necessarily want.

I would, in theory, love to switch the entire thing to binary and thereby
avoid all textual parsing once and for all. If I understand correctly, this
means all queries must be implemented as extended queries, with numerous
extra client-server roundtrips - which are a bit hard to stomach. Section
49.1.2 of the manual also states that the unnamed prepared statement and
portal are optimized for the case of executing a query only once, hinting
that this is the proper way to do things - but this optimization still
cannot not eliminate the extra roundtrips mentioned above (PREPARE, BIND,
EXECUTE).

Can someone please let me know what the recommended/best practice here
would be?

Thanks,

Shay


From: Marko Tiikkaja <marko(at)joh(dot)to>
To: Shay Rojansky <roji(at)roji(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proper query implementation for Postgresql driver
Date: 2014-09-28 10:40:41
Message-ID: 5427E5A9.5060109@joh.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9/28/14, 11:53 AM, Shay Rojansky wrote:
> I would, in theory, love to switch the entire thing to binary and thereby
> avoid all textual parsing once and for all. If I understand correctly, this
> means all queries must be implemented as extended queries, with numerous
> extra client-server roundtrips - which are a bit hard to stomach. Section
> 49.1.2 of the manual also states that the unnamed prepared statement and
> portal are optimized for the case of executing a query only once, hinting
> that this is the proper way to do things - but this optimization still
> cannot not eliminate the extra roundtrips mentioned above (PREPARE, BIND,
> EXECUTE).

You don't have to do multiple round-trips for that; you can just send
all the messages in one go. See how e.g. libpq does it in PQexecParams().

.marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Tiikkaja <marko(at)joh(dot)to>
Cc: Shay Rojansky <roji(at)roji(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proper query implementation for Postgresql driver
Date: 2014-09-28 14:35:08
Message-ID: 1843.1411914908@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marko Tiikkaja <marko(at)joh(dot)to> writes:
> On 9/28/14, 11:53 AM, Shay Rojansky wrote:
>> [ complaint about multiple round trips in extended protocol ]

> You don't have to do multiple round-trips for that; you can just send
> all the messages in one go. See how e.g. libpq does it in PQexecParams().

Right. The key thing to understand is that after an error, the server
skips messages until it sees a Sync. So you can send out Parse, Bind,
Execute, Sync in one packet and not have to worry that the server will
attempt to execute a query that failed parsing or whatever.

regards, tom lane


From: Shay Rojansky <roji(at)roji(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Tiikkaja <marko(at)joh(dot)to>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proper query implementation for Postgresql driver
Date: 2014-09-28 15:12:35
Message-ID: CADT4RqD11D8PJtRvkQ71RkGmzd+s48pa=a8bbsZ-k8oEhyMX4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks guys, that makes perfect sense to me...

Am Sonntag, 28. September 2014 schrieb Tom Lane :

> Marko Tiikkaja <marko(at)joh(dot)to <javascript:;>> writes:
> > On 9/28/14, 11:53 AM, Shay Rojansky wrote:
> >> [ complaint about multiple round trips in extended protocol ]
>
> > You don't have to do multiple round-trips for that; you can just send
> > all the messages in one go. See how e.g. libpq does it in
> PQexecParams().
>
> Right. The key thing to understand is that after an error, the server
> skips messages until it sees a Sync. So you can send out Parse, Bind,
> Execute, Sync in one packet and not have to worry that the server will
> attempt to execute a query that failed parsing or whatever.
>
> regards, tom lane
>


From: Shay Rojansky <roji(at)roji(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Fwd: Proper query implementation for Postgresql driver
Date: 2014-09-30 05:09:38
Message-ID: CADT4RqD5u6owNwEoxOevAjc4YzHE+uMrjYosKoy5vuNgS2ZB5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi again, a few days ago I sent a question (below) about using the extended
query protocol efficiently (especially for non-prepared statements). Your
responses spawned a good discussion between the npgsql developers, it's
here if you wish to look/weigh in:
https://github.com/npgsql/npgsql/issues/370

The idea of using extended query protocol for non-prepared queries raised
another "best practices" question, which I'm hoping you can help with. To
get rid of text encoding (which is locale-dependent, inefficient etc.) for
certain fields, it seems that we have to get rid of it for *all* fields.
This is because we send queries without knowing their result columns in
advance, and would like to pack all messages into a single packet to avoid
roundtrips. In other words, when we send a Bind message we don't yet have a
RowDescription, and so we can' specify on an individual columnar basis
which fields should be returned as binary and which as text. We're left
with the option of either going "full binary" or having to support text
encoding for *all* fields - the abolition of which was the whole point.

We can split the query process into two roundtrips - wait for the
RowDescription and only then send Bind - in order to pick-and-choose
text/binary on a columnar basis, but this is a non-trivial performance hit
which we want to avoid.

Regarding the option of going "full binary"; selecting all types from
pg_type shows hundreds of types. We'd have to provide binary encode/decode
implementation for most (?) of them. In addition, say a new type is added
(via an extension for example); text encoding at least had the advantage of
not forcing us to support everything: the unknown type would be transferred
as text and we'd provide that text to the user as an unprocessed string.
Going full binary seems to eliminate this possibility.

I'd like to understand what we're supposed to do, as a Postgresql driver.
Are we supposed to:
1) Go full binary and implement all types (but what to do about unknown
ones)?
2) Do two roundtrips for queries, thereby hurting performance?
3) Support text encoding for all fields, and manage somehow with issues
such as locale variations (by forcing the locale to be culture invariant,
as we do now)?

Thanks again for your help!

Shay

---------- Forwarded message ----------
From: Shay Rojansky <roji(at)roji(dot)org>
Date: Sun, Sep 28, 2014 at 11:53 AM
Subject: Proper query implementation for Postgresql driver
To: pgsql-hackers(at)postgresql(dot)org

Hi.

I'm a core developer on npgsql (the Postgresql .NET provider), we're hard
at work on a major 3.0 version. I have a fundamental question that's been
bugging me for a while and would like to get your opinion on it.

Npgsql currently supports three basic query modes: simple, client-side
parameters and prepared. The first two use the Postgresql simple query flow
(client-side parameters means the user specifies parameters
programmatically, just like with prepared queries, but the actual
substitution work is done client-side). Prepared uses the Postgresql
extended query flow.

According to the Postgresql docs (49.2.2), the simple query flow, "the
format of the retrieved values is always text". This creates a burden where
npgsql needs to parse textual (and locale-specific!) info (e.g. dates,
money). The situation is even worse when doing client-side parameters,
since npgsql has to *create* textual representations that match what
Postgresql is expecting. The workaround for this issue up to now has been
to switch to culture-invariant formatting (e.g. lc_monetary=C), but this
approach imposes the setting on users and affects functions in ways they
don't necessarily want.

I would, in theory, love to switch the entire thing to binary and thereby
avoid all textual parsing once and for all. If I understand correctly, this
means all queries must be implemented as extended queries, with numerous
extra client-server roundtrips - which are a bit hard to stomach. Section
49.1.2 of the manual also states that the unnamed prepared statement and
portal are optimized for the case of executing a query only once, hinting
that this is the proper way to do things - but this optimization still
cannot not eliminate the extra roundtrips mentioned above (PREPARE, BIND,
EXECUTE).

Can someone please let me know what the recommended/best practice here
would be?

Thanks,

Shay


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Shay Rojansky <roji(at)roji(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proper query implementation for Postgresql driver
Date: 2014-09-30 05:20:29
Message-ID: 542A3D9D.30403@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/28/2014 05:53 PM, Shay Rojansky wrote:
> Hi.
>
> I'm a core developer on npgsql (the Postgresql .NET provider), we're
> hard at work on a major 3.0 version. I have a fundamental question
> that's been bugging me for a while and would like to get your opinion on it.
>
> Npgsql currently supports three basic query modes: simple, client-side
> parameters and prepared. The first two use the Postgresql simple query
> flow (client-side parameters means the user specifies parameters
> programmatically, just like with prepared queries, but the actual
> substitution work is done client-side). Prepared uses the Postgresql
> extended query flow.

Frankly, I suggest dropping "simple" entirely and using only the
parse/bind/describe/execute flow in the v3 protocol.

You can use this for server-side parameter binding *without* storing a
prepared statement by using unnamed statements.

Client-side parameter binding remains useful if you want to support
parameterisation where the PostgreSQL server its self does not, e.g. in
DDL. If you don't care about that, you could reasonably just drop client
side parameter support entirely.

> I would, in theory, love to switch the entire thing to binary and
> thereby avoid all textual parsing once and for all. If I understand
> correctly, this means all queries must be implemented as extended
> queries, with numerous extra client-server roundtrips - which are a bit
> hard to stomach.

What round-trips?

You can and should send parse/bind/describe/execute messages
back-to-back without waiting for a server response. Just Sync and wait
for server response at the end.

You can even send a parse then a stream of bind/describe/execute
messages for batch execution of a prepared statement against a list of
params, then a single Sync at the end.

> Can someone please let me know what the recommended/best practice here
> would be?

You might want to check out what PgJDBC does; it's fairly sane in this area.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Shay Rojansky <roji(at)roji(dot)org>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proper query implementation for Postgresql driver
Date: 2014-09-30 05:24:51
Message-ID: CADT4RqCBw0aK0QgP_U25KWNAibh5-A1Mwixx1=zYuGp0Lvcq8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for the answer Craig.

The remaining point of confusion is not really about simple vs. extended,
it's about text vs. binary. Sending parse/bind/describe/execute messages
back-to-back means the results have to be all text or all binary.

So the question is: are we supposed to transfer all types to and from the
backend in binary? If so, that raises some difficulties (see my previous
message) which I wanted to get your opinion on.

Thanks for the suggestion to look at PgJDBC, I'll do that.

Shay

On Tue, Sep 30, 2014 at 7:20 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 09/28/2014 05:53 PM, Shay Rojansky wrote:
> > Hi.
> >
> > I'm a core developer on npgsql (the Postgresql .NET provider), we're
> > hard at work on a major 3.0 version. I have a fundamental question
> > that's been bugging me for a while and would like to get your opinion on
> it.
> >
> > Npgsql currently supports three basic query modes: simple, client-side
> > parameters and prepared. The first two use the Postgresql simple query
> > flow (client-side parameters means the user specifies parameters
> > programmatically, just like with prepared queries, but the actual
> > substitution work is done client-side). Prepared uses the Postgresql
> > extended query flow.
>
> Frankly, I suggest dropping "simple" entirely and using only the
> parse/bind/describe/execute flow in the v3 protocol.
>
> You can use this for server-side parameter binding *without* storing a
> prepared statement by using unnamed statements.
>
> Client-side parameter binding remains useful if you want to support
> parameterisation where the PostgreSQL server its self does not, e.g. in
> DDL. If you don't care about that, you could reasonably just drop client
> side parameter support entirely.
>
> > I would, in theory, love to switch the entire thing to binary and
> > thereby avoid all textual parsing once and for all. If I understand
> > correctly, this means all queries must be implemented as extended
> > queries, with numerous extra client-server roundtrips - which are a bit
> > hard to stomach.
>
> What round-trips?
>
> You can and should send parse/bind/describe/execute messages
> back-to-back without waiting for a server response. Just Sync and wait
> for server response at the end.
>
> You can even send a parse then a stream of bind/describe/execute
> messages for batch execution of a prepared statement against a list of
> params, then a single Sync at the end.
>
> > Can someone please let me know what the recommended/best practice here
> > would be?
>
> You might want to check out what PgJDBC does; it's fairly sane in this
> area.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Shay Rojansky <roji(at)roji(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: Proper query implementation for Postgresql driver
Date: 2014-09-30 05:31:08
Message-ID: 30470.1412055068@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Shay Rojansky <roji(at)roji(dot)org> writes:
> The idea of using extended query protocol for non-prepared queries raised
> another "best practices" question, which I'm hoping you can help with. To
> get rid of text encoding (which is locale-dependent, inefficient etc.) for
> certain fields, it seems that we have to get rid of it for *all* fields.
> This is because we send queries without knowing their result columns in
> advance, and would like to pack all messages into a single packet to avoid
> roundtrips.

FWIW, I'd go with text results, especially if you already have code to
deal with that. PG's on-the-wire binary formats are more efficient to
process in some absolute sense, but they're hardly free: you need to
consider byte endianness for integers and floats, integer vs float
encoding for timestamps, the difference between PG's timestamp
representation and whatever native timestamps are on your platform,
etc etc. It's not a trivial amount of code to deal with. And in the
end I think the efficiency gain is pretty marginal compared to the raw
costs of data transfer, especially if you're not on the same physical
machine as the server.

Binary formats are a good tradeoff for individual applications that know
exactly which data types they need to deal with. It's harder to make the
case that they're worth the trouble in general-purpose client libraries.

Having said that, there has been some talk of letting client libraries
supply a "whitelist" of data types that they'd like to receive in binary.
We could do that (modulo questions of whether it's worth incurring a
protocol version change for), but I'm unclear on what the use case really
is for that. Wouldn't you then have to provide an inconsistent API to
users of your driver, that is some things are presented in text and others
not? Is that really a great thing?

regards, tom lane


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Shay Rojansky <roji(at)roji(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: Proper query implementation for Postgresql driver
Date: 2014-09-30 05:34:29
Message-ID: 20140930053429.GA21712@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2014-09-30 07:09:38 +0200, roji(at)roji(dot)org wrote:
>
> the unknown type would be transferred as text and we'd provide that
> text to the user as an unprocessed string. Going full binary seems to
> eliminate this possibility.

If you're willing to hand the user an unprocessed string, why can't that
be the binary encoding just as well as text?

-- Abhijit


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Shay Rojansky <roji(at)roji(dot)org>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proper query implementation for Postgresql driver
Date: 2014-09-30 05:36:17
Message-ID: 30612.1412055377@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Shay Rojansky <roji(at)roji(dot)org> writes:
> Thanks for the suggestion to look at PgJDBC, I'll do that.

BTW, libpqtypes (http://libpqtypes.esilo.com) might be worth
studying as well. I've not used it myself, but it claims to
offer datatype-extensible processing of binary formats.

regards, tom lane


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Shay Rojansky <roji(at)roji(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proper query implementation for Postgresql driver
Date: 2014-09-30 05:40:42
Message-ID: CAOeZVietZvW3JnUaPGC+OmWtqCLFg80dZnjVvZp39No2yPNHSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 30, 2014 at 11:06 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Shay Rojansky <roji(at)roji(dot)org> writes:
> > Thanks for the suggestion to look at PgJDBC, I'll do that.
>
> BTW, libpqtypes (http://libpqtypes.esilo.com) might be worth
> studying as well. I've not used it myself, but it claims to
> offer datatype-extensible processing of binary formats.
>
> regards, tom lane
>
>
> --
>
>
It does offer that, and is indeed a good idea in the current context.


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Shay Rojansky <roji(at)roji(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: Proper query implementation for Postgresql driver
Date: 2014-09-30 05:46:11
Message-ID: 542A43A3.500@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/30/2014 01:09 PM, Shay Rojansky wrote:

> The idea of using extended query protocol for non-prepared queries
> raised another "best practices" question, which I'm hoping you can help
> with. To get rid of text encoding (which is locale-dependent,
> inefficient etc.) for certain fields, it seems that we have to get rid
> of it for *all* fields. This is because we send queries without knowing
> their result columns in advance, and would like to pack all messages
> into a single packet to avoid roundtrips. In other words, when we send a
> Bind message we don't yet have a RowDescription, and so we can' specify
> on an individual columnar basis which fields should be returned as
> binary and which as text. We're left with the option of either going
> "full binary" or having to support text encoding for *all* fields - the
> abolition of which was the whole point.

Even if you can't get rid of text support, dropping simple query
protocol support and the need to support client-side parameter binding
may well be a pleasant improvement.

> We can split the query process into two roundtrips - wait for the
> RowDescription and only then send Bind - in order to pick-and-choose
> text/binary on a columnar basis, but this is a non-trivial performance
> hit which we want to avoid.

It may be worth doing exactly this if you're doing large batches where
it might be a real win to use as many binary parameters as possible, but
I agree that you wouldn't want to do it for one-shot queries.

> Regarding the option of going "full binary"; selecting all types from
> pg_type shows hundreds of types. We'd have to provide binary
> encode/decode implementation for most (?) of them.

All of them except 'internal' and 'unknown', really. There are some you
may not see in common queries, but you'll still run into them when
you're looking at the system catalogs.

Quite a few are binary compatible with each other, though, so you'll
need fewer individual implementations than you might expect. Take a look
at the castmethod in pg_cast to identify groups of binary compatible types.

> In addition, say a
> new type is added (via an extension for example); text encoding at least
> had the advantage of not forcing us to support everything: the unknown
> type would be transferred as text and we'd provide that text to the user
> as an unprocessed string. Going full binary seems to eliminate this
> possibility.

It does.

> I'd like to understand what we're supposed to do, as a Postgresql
> driver. Are we supposed to:
> 1) Go full binary and implement all types (but what to do about unknown
> ones)?

It's also possible for a type not to have send/recv functions, i.e. to
support text-only use.

From the docs
(http://www.postgresql.org/docs/9.3/static/sql-createtype.html):

"The support functions input_function and output_function are required,
while the functions receive_function, send_function, ... are optional."

However, no built-in type lack binary I/O functions.

You could reasonably require that all user defined extension types must
support binary I/O. This will probably be fine in practice. As you said,
though, users would then have to install plugin for nPgSQL for each
custom type they wished to use because nPgSQL won't otherwise know what
to do with the binary data.

> 2) Do two roundtrips for queries, thereby hurting performance?
> 3) Support text encoding for all fields, and manage somehow with issues
> such as locale variations (by forcing the locale to be culture
> invariant, as we do now)?

That's pretty much what PgJDBC does, playing with extra_float_digits,
client_encoding, TimeZone, etc.

It's not lovely.

I would like to be able to specify a top-level option at Bind/Execute
time that asks the server to send binary for built-in types only, or for
a list of oids that we register ourselves as understanding binary for at
a session level. That would require a protocol change, though.

It might be worth taking some time to think about how we can help
clients get best use out of the binary protocol and noting it on the
TODO page for when we do protocol revision 4.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Shay Rojansky <roji(at)roji(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: Proper query implementation for Postgresql driver
Date: 2014-09-30 06:15:40
Message-ID: CADT4RqB2x-YLw8ciVA06_6w=9p9g3JzEyBweVW6osU+Di7M8=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for all the answers.

Tom:

> FWIW, I'd go with text results, especially if you already have code to
> deal with that. PG's on-the-wire binary formats are more efficient to
> process in some absolute sense, but they're hardly free: you need to
> consider byte endianness for integers and floats, integer vs float
> encoding for timestamps, the difference between PG's timestamp
> representation and whatever native timestamps are on your platform,
> etc etc. It's not a trivial amount of code to deal with. And in the
> end I think the efficiency gain is pretty marginal compared to the raw
> costs of data transfer, especially if you're not on the same physical
> machine as the server.

In my mind data transfer was actually also a good reason to switch to
binary. I don't know how PG's binary timestamp looks like, but isn't it a
safe assumption that it's much more economical (and easy to parse) than any
textual representation? By the way, we already encode numbers and floats as
binary (an handle endianness) in certain contexts (parameters of prepared
statements). I don't underestimate the effort of binary implementation, but
if it's a one-time systematic effort it seems to be worth it?

> Having said that, there has been some talk of letting client libraries
> supply a "whitelist" of data types that they'd like to receive in binary.
> We could do that (modulo questions of whether it's worth incurring a
> protocol version change for), but I'm unclear on what the use case really
> is for that. Wouldn't you then have to provide an inconsistent API to
> users of your driver, that is some things are presented in text and others
> not? Is that really a great thing?

Whitelisting binary types would solve the issue entirely for us here. As a
driver, I wouldn't be exposing the whitelist to users in any way; I would
simply use it to tell Postgresql (on a session level ideally) which types
we want to receive as binary. The user would access the data in the usual
way, no perceivable API change as far as I can see.

Abhijit:

> If you're willing to hand the user an unprocessed string, why can't that
> be the binary encoding just as well as text?

I might be mistaken, but in the case of textual encoding I can just hand
over the text, as I got it from PG, to the user and let them deal with it.
With binary, I get a blob of bytes that has no meaning to anyone. I can
hand it over to the user, but they can't be expected to be able to do
anything with it...

Tom and Atri:

> TW, libpqtypes (http://libpqtypes.esilo.com) might be worth
> studying as well. I've not used it myself, but it claims to
> offer datatype-extensible processing of binary formats.

Thanks for the suggestion, I'll take a look. Since we're a pure .NET
implementation actual use of libpqtypes won't be possible, but it's
definitely possible to learn here. Although given how the protocol
currently looks like, I can't really see what could be done to support
"magical" support of binary encoding of arbitrary, unknown types...?

Craig:

> Even if you can't get rid of text support, dropping simple query
> protocol support and the need to support client-side parameter binding
> may well be a pleasant improvement.

I definitely agree when it comes to dropping client-side parameter binding
(and there seems to be an agreement on that between the devs). But for the
case non-parameterized queries, there doesn't seem to be any benefit of
using the extended protocol over the simple one (if you're still doing
text), is there?

> It's also possible for a type not to have send/recv functions, i.e. to
> support text-only use.

In that case, what would be the behavior of selecting such a type with an
extended query that specifies "all results in binary"? A PG error?

> You could reasonably require that all user defined extension types must
> support binary I/O. This will probably be fine in practice. As you said,
> though, users would then have to install plugin for nPgSQL for each
> custom type they wished to use because nPgSQL won't otherwise know what
> to do with the binary data.

This is the only true remaining point of difficulty for me. We could bite
the bullet, sit down and implement binary for everything built-in; but
eliminating the free use of extensions seems like a no-go.

> That's pretty much what PgJDBC does, playing with extra_float_digits,
> client_encoding, TimeZone, etc.
> It's not lovely.

It definitely isn't... And we have user complaints on several counts as
well. The problem is that messing around with extra_float_digits,
lc_monetary and the rest also affect some Postgresql functions which do
text conversion... With extra_float_digits, "SELECT format('%s',
0.28::double precision)" returns 0.280000000000000027 rather than 0.28. The
point is that the hacks we're doing to support textual *wire* encoding also
impact non-wire functionality and affecting users in unwanted ways.

> I would like to be able to specify a top-level option at Bind/Execute
> time that asks the server to send binary for built-in types only, or for
> a list of oids that we register ourselves as understanding binary for at
> a session level. That would require a protocol change, though.

I absolutely agree something is missing. I'd go more with whitelisting
approach as Tom said above: a session-level list of OIDs for which binary
encoding is requested for all cases throughout the session. It would cause
somewhat less breakage - no actual message format is changed (although you
suddenly start to get binary returns values in "unexpected" places, i.e.
results of extended queries which requested "all text"). It would solve the
problem in an ideal way.

Shay

On Tue, Sep 30, 2014 at 7:46 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 09/30/2014 01:09 PM, Shay Rojansky wrote:
>
> > The idea of using extended query protocol for non-prepared queries
> > raised another "best practices" question, which I'm hoping you can help
> > with. To get rid of text encoding (which is locale-dependent,
> > inefficient etc.) for certain fields, it seems that we have to get rid
> > of it for *all* fields. This is because we send queries without knowing
> > their result columns in advance, and would like to pack all messages
> > into a single packet to avoid roundtrips. In other words, when we send a
> > Bind message we don't yet have a RowDescription, and so we can' specify
> > on an individual columnar basis which fields should be returned as
> > binary and which as text. We're left with the option of either going
> > "full binary" or having to support text encoding for *all* fields - the
> > abolition of which was the whole point.
>
> Even if you can't get rid of text support, dropping simple query
> protocol support and the need to support client-side parameter binding
> may well be a pleasant improvement.
>
> > We can split the query process into two roundtrips - wait for the
> > RowDescription and only then send Bind - in order to pick-and-choose
> > text/binary on a columnar basis, but this is a non-trivial performance
> > hit which we want to avoid.
>
> It may be worth doing exactly this if you're doing large batches where
> it might be a real win to use as many binary parameters as possible, but
> I agree that you wouldn't want to do it for one-shot queries.
>
> > Regarding the option of going "full binary"; selecting all types from
> > pg_type shows hundreds of types. We'd have to provide binary
> > encode/decode implementation for most (?) of them.
>
> All of them except 'internal' and 'unknown', really. There are some you
> may not see in common queries, but you'll still run into them when
> you're looking at the system catalogs.
>
> Quite a few are binary compatible with each other, though, so you'll
> need fewer individual implementations than you might expect. Take a look
> at the castmethod in pg_cast to identify groups of binary compatible types.
>
> > In addition, say a
> > new type is added (via an extension for example); text encoding at least
> > had the advantage of not forcing us to support everything: the unknown
> > type would be transferred as text and we'd provide that text to the user
> > as an unprocessed string. Going full binary seems to eliminate this
> > possibility.
>
> It does.
>
> > I'd like to understand what we're supposed to do, as a Postgresql
> > driver. Are we supposed to:
> > 1) Go full binary and implement all types (but what to do about unknown
> > ones)?
>
> It's also possible for a type not to have send/recv functions, i.e. to
> support text-only use.
>
> From the docs
> (http://www.postgresql.org/docs/9.3/static/sql-createtype.html):
>
> "The support functions input_function and output_function are required,
> while the functions receive_function, send_function, ... are optional."
>
> However, no built-in type lack binary I/O functions.
>
> You could reasonably require that all user defined extension types must
> support binary I/O. This will probably be fine in practice. As you said,
> though, users would then have to install plugin for nPgSQL for each
> custom type they wished to use because nPgSQL won't otherwise know what
> to do with the binary data.
>
> > 2) Do two roundtrips for queries, thereby hurting performance?
> > 3) Support text encoding for all fields, and manage somehow with issues
> > such as locale variations (by forcing the locale to be culture
> > invariant, as we do now)?
>
> That's pretty much what PgJDBC does, playing with extra_float_digits,
> client_encoding, TimeZone, etc.
>
> It's not lovely.
>
> I would like to be able to specify a top-level option at Bind/Execute
> time that asks the server to send binary for built-in types only, or for
> a list of oids that we register ourselves as understanding binary for at
> a session level. That would require a protocol change, though.
>
> It might be worth taking some time to think about how we can help
> clients get best use out of the binary protocol and noting it on the
> TODO page for when we do protocol revision 4.
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Shay Rojansky <roji(at)roji(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fwd: Proper query implementation for Postgresql driver
Date: 2014-09-30 06:27:47
Message-ID: 31640.1412058467@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ too tired to respond to the other points, but: ]

Shay Rojansky <roji(at)roji(dot)org> writes:
>> It's also possible for a type not to have send/recv functions, i.e. to
>> support text-only use.

> In that case, what would be the behavior of selecting such a type with an
> extended query that specifies "all results in binary"? A PG error?

Yup.

if (!OidIsValid(pt->typsend))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_FUNCTION),
errmsg("no binary output function available for type %s",
format_type_be(type))));

There's an exactly parallel error if you try to send a parameter in
binary when its datatype hasn't got a typreceive function.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Shay Rojansky <roji(at)roji(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proper query implementation for Postgresql driver
Date: 2014-10-02 19:30:16
Message-ID: CA+TgmoYJ382Lte95w1XZN_mz--d_xUE1iUQ=7ZT7faOjH_r6NA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 30, 2014 at 1:20 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> Frankly, I suggest dropping "simple" entirely and using only the
> parse/bind/describe/execute flow in the v3 protocol.

The last time I checked, that was significantly slower.

http://www.postgresql.org/message-id/CA+TgmoYJKfnMrtMhODwhNoj1jwcgzs_H1R70erCEcrWJM65DUQ@mail.gmail.com

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company