Re: Slowness of extended protocol

From: Shay Rojansky <roji(at)roji(dot)org>
To: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Slowness of extended protocol
Date: 2016-08-10 19:39:49
Message-ID: CADT4RqALgnR0WaomyNWiJ0hZS+C=YXQ6u5aOjKsaUkRX_JmoGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Some comments...

For the record, I do find implicit/transparent driver-level query
preparation interesting and potentially useful, and have opened
https://github.com/npgsql/npgsql/issues/1237 to think about it - mostly
based on arguments on this thread. One big question mark I have is whether
this behavior should be opt-out as in pgjdbc, rather than opt-in. Like
others in this thread (I think), I prefer my libraries and drivers very
simple, following my exact API calls to the letter and doing minimal magic
under the hood. As Tom said, if an application can benefit from preparing,
the developer has the responsibility (and also the best knowledge) to
manage preparation, not the driver. Magical behavior under the hood causes
surprises, hard-to-diagnose bugs etc.

One interesting case where implicit preparation is unbeatable, though, is
when users aren't coding against the driver directly but are using some
layer, e.g. an ORM. For example, the Entity Framework ORM simply does not
prepare statements (see
https://github.com/aspnet/EntityFramework/issues/5459 which I opened). This
is one reason I find it a valuable feature to have, although probably as
opt-in and not opt-out.

Regarding driver-level SQL parsing, Vladimir wrote:

> Unfortunately, client-side SQL parsing is inevitable evil (see below on
'?'), so any sane driver would cache queries any way. The ones that do not
have query cache will perform badly anyway.

First, it's a very different thing to have a query cache in the driver, and
to implicitly prepare statements.

Second, I personally hate the idea that drivers parse SQL and rewrite
queries. It's true that in many languages database APIs have "standardized"
parameter placeholders, and therefore drivers have no choice but to
rewrite. ADO.NET (the .NET database API) actually does not do this -
parameter placeholders are completely database-dependent (see
https://msdn.microsoft.com/en-us/library/yy6y35y8%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396).
Npgsql does rewrite queries to provide named parameters, but the next
version is going to have a "raw query" mode in which no parsing/rewriting
happens whatsoever. It simply takes your string, packs it into a Parse (or
Query), and sends it off. This is not necessarily going to be the main way
users use Npgsql, but it's important to note that query parsing and
rewriting isn't an "inevitable evil".

Vladimir wrote:

> This new message would be slower than proper query cache, so why should
we all spend time on a half-baked solution?

The cases where the prepared statement path doesn't work have already been
listed many times - pgbouncer (and other pools), drivers which don't
support persisting prepared statement across pooled connection open/close,
and people (like many in this conversation) who don't appreciate their
drivers doing magic under the hood. This is why I thing both proposals are
great - there's nothing wrong with query caching (or more precisely,
implicit statement preparation by the driver), especially if it's opt-in,
but that doesn't mean we shouldn't optimize things for people who don't,
can't or won't go for that.

> To be fair, implementing a cache is a trivial thing when compared with
hard-coding binary/text formats for all the datatypes in each and every
language.
> Remember, each driver has to implement its own set of procedures to
input/output values in text/binary format, and that is a way harder than
implementing the cache we are talking about.

I agree with that, but it's not a question of how easy it is to implement
implicit preparation. It's a question of whether driver developers choose
to do this, based on other considerations as well - many people here think
it's not the job of the driver to decide for you whether to prepare or not,
for example. It has nothing to do with implementation complexity.

> "cached plan cannot change result type" -- PostgreSQL just fails to
execute the server-prepared statement if a table was altered.

That alone seems to be a good reason to make implicit preparation opt-in at
best. How exactly do users cope with this in pgjdbc? Do they have some
special API to flush (i.e. deallocate) prepared statements which they're
supposed to use after a DDL? Do you look at the command tag in the
CommandComplete to know whether a command was DDL or not?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2016-08-10 19:41:04 Re: Heap WARM Tuples - Design Draft
Previous Message Simon Riggs 2016-08-10 19:37:07 Re: Heap WARM Tuples - Design Draft