Re: Slowness of extended protocol

From: Shay Rojansky <roji(at)roji(dot)org>
To: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Greg Stark <stark(at)mit(dot)edu>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Slowness of extended protocol
Date: 2016-08-11 08:11:38
Message-ID: CADT4RqBCMvRzZHO6iomBpdOT_=9nztNsYw-4rDmA4qtBbegjpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Vladimir wrote:

Shay> As Tom said, if an application can benefit from preparing, the
> developer has the responsibility (and also the best knowledge) to manage
> preparation, not the driver. Magical behavior under the hood causes
> surprises, hard-to-diagnose bugs etc.
>
> Why do you do C# then?
> Aren't you supposed to love machine codes as the least magical thing?
> Even x86 does not describe "the exact way the code should be executed".
> All the CPUs shuffle the instructions to make it faster.
>

These are valid questions. I have several answers to that.

First, it depends on the extent to which the "under the hood" activity may
have visible side-effects. CPU instruction reordering is 100% invisible,
having no effect whatsoever on the application apart from speeding it up.
Prepared statements can have very visible effects apart from the speedup
they provide (e.g. failure because of schema changes). It's not that these
effects can't be worked around - they can be - but programmers can be
surprised by these effects, which can cause difficult-to-diagnose issues.
Nobody needs to be really aware of CPU instruction reordering because it
will never cause an application issue (barring a CPU bug).

Another important visible effect of preparing a statement is server-side
cost (i.e. memory). I'm not sure what the overhead is, but it is there and
an irresponsible "prepare everything everywhere" can create a significant
resource drain on your server. I know pgjdbc has knobs for controlling how
many statements are prepared, but in a large-scale performance-sensitive
app a programmer may want to manually decide, on a per-query basis, whether
they want to prepare or not.

But the most important question is that this is a choice that should be
left to the developer, rather than imposed. In some cases developers *do*
drop down to assembly, or prefer to call system calls directly to be in
control of exactly what's happening. They should be able to do this. Now,
of course you can provide an option which disables implicit preparing, but
if you refuse to optimize non-prepared paths you're effectively forcing the
programmer to go down the prepared statement path.

Shay>As Tom said, if an application can benefit from preparing, the
> developer has the responsibility
>
> Does developer have the responsibility to choose between "index scan" and
> "table seq scan"? So your "developer has the responsibility" is like
> building on sand.
>

I'm very glad you raised that point. The programmers indeed outsources that
decision to the database, because the database has the most *knowledge* on
optimal execution (e.g. which indices are defined). Implicit preparing, on
the other hand, has nothing to do with knowledge: as Tom said, the driver
knows absolutely nothing about the application, and doesn't have any
advantage over the programmer in making that decision.

But note that the DBA *does* have to decide which indices exist and which
don't. Would you accept a database that automatically creates indices based
on use, i.e. "x queries caused full table scans, so I'll silently create an
index here"? I wouldn't, it would be utter nonsense and create lots of
unpredictability in terms of performance and resource drain.

My experience shows, that people are very bad at predicting where the
> performance problem would be.
> For 80% (or even 99%) of the cases, they just do not care thinking if a
> particular statement should be server-prepared or not.
> They have absolutely no idea how much resources it would take and so on.
>

And for those people it's great to have an implicit preparation feature,
most probably opt-in. But that shouldn't mean we shouldn't optimize things
for the rest.

> ORMs have no that notion of "this query must be server-prepared, while
> that one must not be".
> And so on.
>
> It is somewhat insane to assume people would use naked SQL. Of course they
> would use ORMs and alike, so they just would not be able to pass that
> additional parameter telling if a particular query out of thousands should
> be server-prepared or not.
>

Uh, I really wouldn't make statements like that if you want to be taken
seriously. Tons of applications use naked SQL and avoid ORMs for many
reasons (e.g. performance benefits of hand-crafted SQL), such general
statements on what applications do in the world are, well, silly. Again,
ORMs are an argument for why implicit preparation should exist (I made that
argument myself above), but they're not an argument for why optimizing
other paths should be excluded.

> Vladimir> "cached plan cannot change result type" -- PostgreSQL just fails
> to execute the server-prepared statement if a table was altered.
>
> Shay>How exactly do users cope with this in pgjdbc? Do they have some
> special API to flush (i.e. deallocate) prepared statements which they're
> supposed to use after a DDL?
>
> First of all, pgjdbc does report those problems to hackers.
> Unfortunately, it is still "not implemented".
> Then, a workaround at pgjdbc side is made.
> Here's relevant pgjdbc fix: https://github.com/pgjdbc/pgjdbc/pull/451
>
> It analyzes error code, and if it finds "not_implemented
> from RevalidateCachedQuery", then it realizes it should re-prepare.
> Unfortunately, there is no dedicated error code, but at least there's a
> routine name.
>

That's exactly the kind of behavior I don't like in libraries/drivers -
implicit behavior which may trigger errors, which the library than has to
(transparently!) recover from. It makes applications very brittle. What
happens if you happen to be in a transaction when this error occurs? When
an error occurs, PostgreSQL puts the ongoing transaction in state "failed"
and requires you to rollback - how can you possibly recover from that?

Shay>it have been listed many times - pgbouncer
>
> Let's stop discussing pgbouncer issue here?
> It has absolutely nothing to do with pgsql-hackers.
>

You have the right to think so, but I disagree. PostgreSQL is part of an
ecosystem, and pgbouncer is a big part of that ecosystem (as are other
potential pools).

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Sitnikov 2016-08-11 08:39:05 Re: Slowness of extended protocol
Previous Message Michael Paquier 2016-08-11 07:21:51 Re: new autovacuum criterion for visible pages