Re: Slowness of extended protocol

From: Shay Rojansky <roji(at)roji(dot)org>
To: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Slowness of extended protocol
Date: 2016-08-09 19:18:40
Message-ID: CADT4RqDOhgjOgKEK4qQUrJctar5extfpqfmPuxxDqd_q76=fUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 9, 2016 at 3:42 PM, Vladimir Sitnikov <
sitnikov(dot)vladimir(at)gmail(dot)com> wrote:

> Shay>But here's the more important general point. We're driver
> developers, not application developers. I don't really know what
> performance is "just fine" for each of my users, and what is not worth
> optimizing further. Users may follow best practices, or they may not for
> various reasons.
>
> Of course we cannot benchmark all the existing applications, however we
> should at lest try to use "close to production" benchmarks.
>
> Let me recap: even "select 1" shows clear advantage of reusing
> server-prepared statements.
> My machine shows the following results for "select 1 pgbench":
> simple: 38K ops/sec (~26us/op)
> extended: 32K ops/sec (~31us/op)
> prepared: 47K ops/sec (~21us/op)
>
> Note: reusing server-prepared statements shaves 10us (out of 31us), while
> "brand new ParseBindExecDeallocate" message would not able to perform
> better than 26us/op (that is 5 us worse than the prepared one). So it makes
> much more sense investing time in "server-prepared statement reuse" at the
> client side and "improving Bind/Exec performance" at the backend side.
>
> For more complex queries the gap (prepared vs simple) would be much bigger
> since parse/validate/plan for a complex query is much harder operation than
> the one for "select 1"
>

You seem to be misunderstanding the fundamental point here. Nobody is
saying that prepared statements aren't a huge performance booster - they
are. I recommend them to anyone who asks. But you're saying "let's not
optimize anything else", whereas there are many programs out there *not*
using prepared statements for various reasons (e.g. pgbouncer, or simply an
existing codebase). If your opinion is that nothing should be done for
these users, fine - nobody's forcing you to do anything. I simply don't see
why *not* optimize the very widely-used single-statement execution path.

> Note: I do not mean "look, prepared always win". I mean: "if your
> environment does not allow reuse of prepared statements for some reason,
> you lose huge amount of time on re-parsing the queries, and it is worth
> fixing that obvious issue first".
>

Maybe it's worth fixing it, maybe not - that's going to depend on the
application. Some applications may be huge/legacy and hard to change,
others may depend on something like pgbouncer which doesn't allow it. Other
drivers out there probably don't persist prepared statements across
close/open, making prepared statements useless for short-lived scenarios.
Does the Python driver persist prepared statements? Does the Go driver do
so? If not the single-execution flow is very relevant for optimization.

> Shay>I don't see how reusing SQL text affects security in any way.
>
> Reusing SQL text makes application more secure as "build SQL on the fly"
> is prone to SQL injection security issues.
> So reusing SQL text makes application more secure and it enables
> server-prepared statements that improve performance considerably. It is a
> win-win.
>

We've all understood that server-prepared statements are good for
performance. But they aren't more or less vulnerable to SQL injection -
developers can just as well concatenate user-provided strings into a
prepared SQL text.

[deleting more comments trying to convince that prepared statements are
great for performance, which they are]

Shay>Again, in a world where prepared statements aren't persisted across
> connections (e.g. pgbouncer)
>
> pgbouncer does not properly support named statements, and that is
> pbgouncer's issue.
>
> Here's the issue for pgbouncer project: https://github.com/
> pgbouncer/pgbouncer/issues/126#issuecomment-200900171
> The response from pgbouncer team is "all the protocol bits are there, it
> is just implementation from pgbouncer that is missing".
>
> By the way: I do not use pgbouncer, thus there's no much interest for me
> to invest time in fixing pgbouncer's issues.
>

Um, OK... So you're not at all bothered by the fact that the (probably)
most popular PostgreSQL connection pool is incompatible with your argument?
I'm trying to think about actual users and the actual software they use, so
pgbouncer is very relevant.

Shay>Any scenario where you open a relatively short-lived connection and
> execute something once is problematic - imagine a simple web service which
> needs to insert a single record into a table.
>
> I would assume the application does not use random string for a table name
> (and columns/aliases), thus it would result in typical SQL text reuse, thus
> it should trigger "server-side statement prepare" logic. In other way, that
> kind of application does not need the "new ParseBindExecDeallocate message
> we are talking about".
>

I wouldn't assume anything. Maybe the application does want to change
column names (which can't be parameterized). Maybe it needs the extended
protocol simply because it wants to do binary encoding, which isn't
possible with the simple protocol (that's the case with Npgsql). There are
many types of applications out there, I try in general not to assume things
about them.

It would be great to get some external opinions on this debate (and
apologies about the email volume here too).

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2016-08-09 19:35:06 phrase search TS_phrase_execute code readability patch
Previous Message Ryan Pedela 2016-08-09 19:01:11 Re: 9.6 phrase search distance specification