Cached Query Plans (was: global prepared statements)

From: PFC <lists(at)peufeu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Cached Query Plans (was: global prepared statements)
Date: 2008-04-11 16:34:41
Message-ID: op.t9ggb3wacigqcu@apollo13.peufeu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Well, I realized the idea of global prepared statements actually sucked,
so I set on another approach thanks to ideas from this list, this is
caching query plans.

First, let's see if there is low hanging fruit with the typical small,
often-executed queries that are so frequent on websites.
Tables test, test2 and test3 contain id (integer primary key) and another
integer field. There are 100K rows in each.

First, the simplest query :
SELECT * FROM test WHERE id = $1

110 us : Send query as text (PHP:pg_query -> PQexec)
125 us : Parse+Bind (PHP:pg_query_params -> PQexecParams)
67 us : Execute a previously prepared statement (PHP:pg_execute ->
PQexecPrepared)

A slightly more complex one but still pretty classic :
SELECT * FROM (SELECT * FROM test WHERE id>$1 ORDER BY id LIMIT 5) AS a
NATURAL LEFT JOIN test2 NATURAL LEFT JOIN test3 ORDER BY id

523 us : Send query as text (PHP:pg_query -> PQexec)
580 us : Parse+Bind (PHP:pg_query_params -> PQexecParams)
148 us : Execute a previously prepared statement (PHP:pg_execute ->
PQexecPrepared)

OK, so there is low hanging fruit since the parsing+planning time of those
is longer than doing the query itself.

Since the Parse message includes a $-parameterized query that is to be
prepared, it seems logical to put the caching logic there : the query
string (without parameters) makes a nice cache key.

So I made a really quick and really dirty experimentation without changing
the wire protocol between client and server. This is only "proof of
concept".

Try #1 : in exec_parse_message(), if the statement is named, look it up in
the prepared statements cache, if it is found, return at once and do
nothing else.
To exploit this, I issue a pg_prepare() followed by pg_execute() at every
query, wether or not the statement exists. If it already exists,
pg_prepare() now does nothing (except losing a little time).

Results :
88 us : simple query
173 us : complex query

So, the timings are between a simple execute and a plan+execute. It
provides a nice performance gain versus replanning every time, but not
perfect.

Try #2 : again, in exec_parse_message(), if the statement is unnamed, I
use the query string as the statement name, search the plan in the
prepared statements hash table. If it is not found, then it is prepared.
Then I make the unnamed statement plan point to this. Of course, this is
dangerous since it probably introduces a dozen crash bugs, but for this
proof of concept, it's OK.
Client code is unchanged, PQexecParams will benefit from the plan caching,
since it always sends a Parse+Bind message using the unnamed statement.

Results are identical to executing an execute on a prepared statement,
modulo a few microseconds.
This means the overhead of sending the Parse message, and of the server
ignoring it when the statement is cached, is negligible.

So, where to go from that ? I don't see a way to implement this without a
(backwards-compatible) change to the wire protocol, because the clients
will want to specify when a plan should be cached or not. Since the user
should not have to name each and every one of the statements they want to
use plan caching, I see the following choices :

- Add a new Parse+Bind command, which gets the $-parameterized SQL and
the parameters. If the plan is cached, grab it and execute, else prepare
and execute. Add a flag to allow the client to specify if he wants caching
or not.
Pros : Only one message, faster
Cons : SQL is transmitted in full, useless most of the time, but this
overhead is rather small.

- Send the SQL with Bind as statement name, add a flag to Bind telling it
to report a cache miss instead of raising an error, then have the client
send a Parse and Bind again.

- Should there be one common hashtable for named prepared statements and
cached plans, or two hashtables ? Using the SQL string as the statement
name is not clean.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Dunstan 2008-04-11 16:35:26 Re: Commit fest queue
Previous Message Tom Lane 2008-04-11 16:23:06 Re: Index AM change proposals, redux