Patch : Global Prepared Statements

From: PFC <lists(at)peufeu(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: first time hacker ;) messing with prepared statements
Date: 2008-03-30 07:16:34
Message-ID: op.t8tihwutcigqcu@apollo13.peufeu.com
Lists: pgsql-hackers

Hello,
So, I embarked (yesterday) on a weekend project to add a new feature to
Postgres...
I use PHP with persistent connections and have always been bothered that
those very small AJAX queries (usually simple selects returning one row)
take more CPU in Postgres to parse and plan than to actually execute.
Since those small queries usually come in great numbers, I would like to
PREPARE them beforehand and run them with PHP's pg_exec(), which is faster
than SQL EXECUTE and saves about 50% of server CPU time on those small queries.
However, with persistent connections there is a problem : you never know
whether the query has already been prepared or not.
Ideally a PHP process would open a persistent connection and find all
queries already prepared, ready to execute...
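
For illustration, here is roughly what the application has to do today on
every request before it can safely EXECUTE (a minimal SQL sketch ; the
statement name matches the examples below) :

SELECT 1 FROM pg_prepared_statements WHERE name = 'test_plan_pk';
-- no row returned ? then :
PREPARE test_plan_pk (INTEGER) AS SELECT * FROM test WHERE id = $1;
-- and only then :
EXECUTE test_plan_pk(1);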

So :

- Added a system catalog "pg_global_prepared" (one per database actually)
which contains :
- oid of user who created the row
- name of statement
- SQL command for preparing statement

example :

test=# SELECT * FROM pg_global_prepared ;
 stmt_owner | stmt_name       | stmt_sql
------------+-----------------+---------------------------------------------------------------------------------------------------
         10 | test            | PREPARE test (INTEGER) AS SELECT $1+3;
         10 | test_plan_pk    | PREPARE test_plan_pk (INTEGER) AS SELECT * FROM test WHERE id = $1;
         10 | test_plan_order | PREPARE test_plan_order (INTEGER) AS SELECT * FROM test WHERE value < $1 ORDER BY value DESC LIMIT 1;

- Added sql command GLOBAL PREPARE foo (arg types) AS sql query
This inserts a row in the above catalog after having run a standard
"prepare" on the query to test its validity

- Added sql command GLOBAL DEALLOCATE
This removes row(s) from the above catalog, (only those owned by the
current user)

- Messed with EXECUTE (ExecuteQuery) so that :
- if the requested statement is found in session cache, use it (as usual)
- if not, look into pg_global_prepared to see if there is one of the same
name and created by same user
- if found, use this to PREPARE, then store in session cache, then
execute it

After that I moved this logic into FetchPreparedStatement instead, so that
if it is asked to fetch a non-existent statement for which there is a row
in pg_global_prepared, it will create it.

test=# EXPLAIN ANALYZE EXECUTE test_plan_pk(1);
NOTICE: prepared statement "test_plan_pk" does not exist in local session cache, now searching pg_global_prepared for a template to create it.
NOTICE: found template for requested statement, executing : "test_plan_pk" :
NOTICE: PREPARE test_plan_pk (INTEGER) AS SELECT * FROM test WHERE id = $1;

                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using test_pkey on test (cost=0.00..8.28 rows=1 width=8) (actual time=19.476..19.478 rows=1 loops=1)
   Index Cond: (id = $1)
 Total runtime: 0.079 ms
(3 rows)

So, you take whatever persistent connection from a pool and issue an
EXECUTE without worries.

***** Now, the problem :

- EXECUTE, EXPLAIN EXECUTE, EXPLAIN ANALYZE EXECUTE all work
- pg_exec() from php makes it crash

Actually pg_exec() does not use SQL EXECUTE, I think it uses the new
extended query protocol and just sends a message to execute a named
prepared query.
In that case, my code in FetchPreparedStatement crashes :

NOTICE: prepared statement "test_plan_pk" does not exist in local session
cache, now searching pg_global_prepared for a template to create it.
LOG: server process (PID 30692) was terminated by signal 11: Segmentation
fault
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing

GDB says it is because CurrentResourceOwner is NULL. Did I forget to
initialize something ? lol.

I'll post more details and a complete backtrace this afternoon, but here is
the problematic bit of code ; it is the code that finds the SQL needed to
prepare a statement.
Thanks for any suggestion ;)

Relation    mycatalog;
HeapTuple   tup;
TupleDesc   dsc;
NameData    stmt_name_data;
ScanKeyData skey[2];
SysScanDesc scan;
Datum       datum;
bool        found = false;
bool        isnull;
const char *sql = "";

namestrcpy(&stmt_name_data, stmt_name);
mycatalog = heap_open(GlobalPreparedRelationId, RowExclusiveLock);  /* <==== crashes here */
dsc = RelationGetDescr(mycatalog);
ScanKeyInit(&skey[0],
            Anum_pg_global_prepared_stmt_owner,
            BTEqualStrategyNumber, F_OIDEQ,
            GetUserId());
ScanKeyInit(&skey[1],
            Anum_pg_global_prepared_stmt_name,
            BTEqualStrategyNumber, F_NAMEEQ,
            NameGetDatum(&stmt_name_data));
scan = systable_beginscan(mycatalog, GlobalPreparedIndexId, true,
                          SnapshotNow, 2, skey);
if (HeapTupleIsValid(tup = systable_getnext(scan)))
{
    datum = heap_getattr(tup, Anum_pg_global_prepared_stmt_sql, dsc, &isnull);
    if (!isnull)
        found = true;
}
systable_endscan(scan);
heap_close(mycatalog, RowExclusiveLock);
if (found)
{
    int er;

    /* found the statement, now prepare it, so this session will have it
     * in cache for the next EXECUTEs */
    sql = DatumGetCString(DirectFunctionCall1(textout, datum));
    ereport(NOTICE, (errmsg("found template for requested statement, executing : \"%s\" :\n%s",
                            stmt_name, sql)));

    if (SPI_connect() != SPI_OK_CONNECT)
        elog(ERROR, "SPI_connect failed");

    er = SPI_execute(sql, false, 0);

    if (SPI_finish() != SPI_OK_FINISH)
        elog(ERROR, "SPI_finish failed");

    if (er != SPI_OK_UTILITY)
        elog(ERROR, "failed to prepare statement, SPI_execute code %d", er);

    /* find it again */
    if (prepared_queries)
        entry = (PreparedStatement *) hash_search(prepared_queries, stmt_name,
                                                  HASH_FIND, NULL);
}


From: James Mansion <james(at)mansionfamily(dot)plus(dot)com>
To: PFC <lists(at)peufeu(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: first time hacker ;) messing with prepared statements
Date: 2008-03-30 07:36:01
Message-ID: 47EF42E1.1080507@mansionfamily.plus.com
Lists: pgsql-hackers

PFC wrote:
> Hello,
> So, I embarked (yesterday) on a weekend project to add a new
> feature to Postgres...
> I use PHP with persistent connections and always have been
> bothered that those very small AJAX queries (usually simple selects
> returning 1 row) take more CPU in postgres to parse & plan than to
> actually execute.
Microsoft's answer to this issue with SQLServer appears to have been to
introduce a smart cache for all statement plans. It seems to be very
effective. I guess you're doing much the same thing but with more user
intervention, in effect.

Are you sure that your application wouldn't benefit more from a MOM
solution with persisted database connections? Have you looked at
http://safmq.sourceforge.net/?

James


From: PFC <lists(at)peufeu(dot)com>
To: "James Mansion" <james(at)mansionfamily(dot)plus(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: first time hacker ;) messing with prepared statements
Date: 2008-03-30 14:41:51
Message-ID: op.t8t231kycigqcu@apollo13.peufeu.com
Lists: pgsql-hackers


> PFC wrote:
>> Hello,
>> So, I embarked (yesterday) on a weekend project to add a new
>> feature to Postgres...
>> I use PHP with persistent connections and always have been bothered
>> that those very small AJAX queries (usually simple selects returning 1
>> row) take more CPU in postgres to parse & plan than to actually execute.
> Microsoft's answer to this issue with SQLServer appears to have been to
> introduce a smart
> cache for all statement plans. It seems to be very effective. I guess
> you're doing much the
> same thing but with more user intervention, in effect.

Actually, the main purpose was to 1) have fun hacking Postgres, and 2)
perhaps something useful would come of it...
And I did find it very interesting, probably due to the fact that
Postgres source code is so... impressively clean... well organized...
readable... it took two hours from downloading the source to having an
extra functional system catalog, and it worked on the first compile, all
due to the concise but to-the-point comments in include/catalog; I
couldn't believe it.

Anyway, Microsoft's solution is cool, too, but you need to reparse the
entire query to then detect "I've planned this query before, with other
parameters, so I'll reuse that prepared plan", so it adds another parsing
step, which is less efficient.
Postgres could also do that with a small modification, by the way : use
the entire query string (with $1-style parameters) instead of the
statement name as the cache key, and send the parameters separately ; but
I think it would be less clean than, say, a statement called
"get_user_by_id" or something.
Also I like the idea of named prepared queries, which feel a bit like
procedures, because many of those small, often-used queries would end up
being defined in the same place, which makes schema changes (and having to
modify queries in your application) slightly less painful.
PHP can also use pg_exec() which sends the parameters separately,
automagically converted to postgres format, so you save time and hassle on
quoting and de-quoting versus a SQL EXECUTE.
Since pg_exec() is almost twice as fast as plain SQL, and PHP scripts
tend to use quite a lot of these queries, it also means a free database
server performance upgrade (i.e. the same DB server can handle more
webservers, for instance).

Another possible implementation would be to use a connection pooler
which, when opening a new connection, can be configured to send a SQL
script containing all the PREPARE statements.
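
For instance, the pooler could run a small startup script on each new
backend connection, something like this (just a sketch, reusing the
statements from the examples above) :

PREPARE test_plan_pk (INTEGER) AS SELECT * FROM test WHERE id = $1;
PREPARE test_plan_order (INTEGER) AS SELECT * FROM test WHERE value < $1 ORDER BY value DESC LIMIT 1;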

> Are you sure that you application wouldn't benefit more from a MOM
> solution with
> persisted database connections? Have you looked at
> http://safmq.sourceforge.net/?

Dunno. Is this related to Postgres ?


From: PFC <lists(at)peufeu(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Patch : Global Prepared Statements
Date: 2008-03-30 17:57:13
Message-ID: op.t8ub5nkxcigqcu@apollo13.peufeu.com
Lists: pgsql-hackers


Here is the result of my little experiment, for your viewing pleasure,
lol.
Now it works. Well, it hasn't crashed yet... so I guess I can show it to
people ;)

- Purpose :

Allow PHP (or other languages) users to use prepared statements
(pg_exec()) together with persistent connections, while not bothering
about preparing all those statements when the persistent connection is
first created (you have no way of knowing, from PHP, if your persistent
connection is new or second-hand, by the way).

- What it does :

New commands :
GLOBAL PREPARE does the same as PREPARE, except it applies to all current
and future sessions FROM THE SAME USER instead of just the current session.
GLOBAL DEALLOCATE does almost what you'd expect.
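
For instance (a usage sketch ; the statement name is just an example) :

GLOBAL PREPARE get_user_by_id (INTEGER) AS SELECT * FROM test WHERE id = $1;
-- any current or future session of the same user can then simply do :
EXECUTE get_user_by_id(1);
-- and to remove the template from the catalog :
GLOBAL DEALLOCATE get_user_by_id;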

- Results

Here is a comparison between sending the query as a string with the
parameters in it (oldskool), and executing a prepared statement via
pg_exec() (which emits a BIND, not a SQL EXECUTE) :

CREATE TABLE test( id SERIAL PRIMARY KEY, value INTEGER );
INSERT INTO test (value) SELECT random()*1000000 FROM generate_series( 1,100000 );
CREATE INDEX test_value ON test( value );
CREATE TABLE test2 (id INTEGER PRIMARY KEY REFERENCES test( id ) ON DELETE CASCADE, value2 TEXT );
INSERT INTO test2 SELECT id, (random()*1000000)::TEXT FROM test;
CREATE TABLE test3 (id INTEGER PRIMARY KEY REFERENCES test( id ) ON DELETE CASCADE, value3 INTEGER );
INSERT INTO test3 SELECT id, random()*1000000 FROM test;

- simple SELECT : SELECT * FROM test WHERE id = $1
=> Server CPU load : -46% (almost cut in half)
=> Queries/s from PHP including PHP overhead : +71%

- three table JOIN with order by/limit :
SELECT * FROM test a NATURAL JOIN test2 b NATURAL JOIN test3 c WHERE
a.value < $1 ORDER BY value DESC LIMIT 1

=> Server CPU load : -84% (ie uses 6 times less CPU)
=> Queries/s from PHP including PHP overhead : +418% (ie 5 times faster)

(By the way, with pg_exec on those simple queries, Postgres beats InnoDB
and MyISAM thoroughly in both queries/s (seen by client) and server CPU
used by query).

Note that the patch has no influence on these results whatsoever.
It just allows a PHP coder to easily use the pure goodness of pg_exec()
without hassle.

- Implementation :

* a new system catalog (I snipped the query texts, but they are stored
entirely).

test=# SELECT * FROM pg_global_prepared ;
 stmt_owner | stmt_name         | stmt_sql
------------+-------------------+---------------------------------------------
         10 | test_plan_order_3 | PREPARE test_plan_order_3 (INTEGER) AS SEL
         10 | test_plan_pk_3    | PREPARE test_plan_pk_3 (INTEGER) AS SELECT
         10 | test_plan_pk      | PREPARE test_plan_pk (INTEGER) AS SELECT *
         10 | test_plan_order   | PREPARE test_plan_order (INTEGER) AS SELEC

* GLOBAL PREPARE foo AS SELECT ...
Checks syntax, does a PREPARE, if it works, inserts row in
pg_global_prepared with user OID, statement name, and PREPARE command.

* GLOBAL DEALLOCATE foo
Deletes the line from pg_global_prepared
Note that this does not make all other open connections forget the plan ;)

* In the code :

FetchPreparedStatement() gets a new sister,
FetchOrCreatePreparedStatement().
- try to find statement in the session cache (usual behaviour)
found -> ok, we're done
- if transaction is aborted, byebye
- look in pg_global_prepared for the requested statement name and the
current user ID.
not found -> byebye
- SPI_execute the PREPARE statement
- return the prepared statement to caller

SQL EXECUTE, the BIND handler in postgres.c, and some others got calls to
FetchOrCreatePreparedStatement().
So if you pg_exec() or EXECUTE or EXPLAIN EXECUTE, etc, a statement that
does not exist, the backend will PREPARE it for you if it can find it in
pg_global_prepared.

The reason for the user OID is that independent users must not be able to
modify each other's prepared statements...

Patch is here (against 8.3.1)
http://home.peufeu.com/pg/

This is certainly not ready for production, lol. But since I had fun
writing this, I'd rather share it.

Regards,
Pierre


From: Joris Dobbelsteen <joris(at)familiedobbelsteen(dot)nl>
To: PFC <lists(at)peufeu(dot)com>
Cc: James Mansion <james(at)mansionfamily(dot)plus(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: first time hacker ;) messing with prepared statements
Date: 2008-03-30 19:23:58
Message-ID: 47EFE8CE.7030407@familiedobbelsteen.nl
Lists: pgsql-hackers

PFC wrote:
>> PFC wrote:
>>> Hello,
>>> So, I embarked (yesterday) on a weekend project to add a new
>>> feature to Postgres...
>>> I use PHP with persistent connections and always have been
>>> bothered that those very small AJAX queries (usually simple selects
>>> returning 1 row) take more CPU in postgres to parse & plan than to
>>> actually execute.
>> Microsoft's answer to this issue with SQLServer appears to have been
>> to introduce a smart
>> cache for all statement plans. It seems to be very effective. I guess
>> you're doing much the
>> same thing but with more user intervention, in effect.
> Actually, the main purpose was to 1) have fun hacking Postgres, and 2)
> perhaps something useful would come of it...
Nice job so far!
Since you have achieved (1), now it's time to get to (2), and I think it's
very well possible. In general it's quite nice to have parts of your work
integrated.
> And I did find it very interesting, probably due to the fact that
> Postgres source code is so... impressively clean... well organized...
> readable... it took two hours from downloading the source to having an
> extra functional system catalog, and it worked at the first compile,
> all due to the concise but to the point comments in include/catalog, I
> couldn't believe it.
> Anyway, Microsoft's solution is cool, too, but you need to reparse
> the entire query to then detect "I've planned this query before, with
> other parameters, so I'll reuse that prepared plan", so it adds
> another parsing step, which is less efficient.
The MAJOR benefit of Microsoft's approach is that it works on existing
applications and, most importantly, makes NO assumptions about the
"volatile" server state. A few cases where the Microsoft solution works
while yours will fail are:

* Server restart and assorted like failover (you need to redo a
global prepare).
* Cleanup and instantiation of a prepared statement.

> Postgres could also do that with a small modification, by the way
> : like by using the entire string (with $1 style parameters) instead
> of the statement name, use that as a cache key, and send parameters
> separately, but I think it would be less clean than, say, a statement
> called "get_user_by_id" or something.
> Also I like the idea of named prepared queries, which feel a bit
> like procedures, because many of those small, often-used queries would
> end up being defined in the same place, which makes schema changes
> (and having to modify queries in your application) slightly less painful.
> PHP can also use pg_exec() which sends the parameters separately,
> automagically converted to postgres format, so you save time and
> hassle on quoting and de-quoting versus a SQL EXECUTE.
> Since the performance of pg_exec() is almost twice as fast as
> plain SQL, and PHP scripts tend to use quite a lot of these queries,
> it also means a free database server performance upgrade (ie. the same
> DB server can handle more webservers, for instance).
What you are doing for a global query cache is already under consideration,
and having a plan invalidation mechanism on schema changes or, maybe,
statistics updates was a step in that direction. Your code mostly
contributed the other parts already.

I think a good middle ground, to address the "persistence" problems I think
there are while keeping close to your patch, might be using "stored
procedures" and having these cached globally.

Another consideration is whether most tasks are getting CPU bound or IO
bound. A better, per-query plan might reduce IO load due to better use
of statistics in that single case, while for CPU-bound workloads it is
very nice to reduce the planning overhead significantly.
> Another possible implementation would be to use a connection pooler
> which, when opening a new connection, can be configured to send a SQL
> script containing all the PREPARE statements.
This is, IMHO, an application side feature that might be a good addition
to PHP and other languages that provide the "persistent connection" feature.

- Joris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: PFC <lists(at)peufeu(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch : Global Prepared Statements
Date: 2008-03-30 21:04:30
Message-ID: 28755.1206911070@sss.pgh.pa.us
Lists: pgsql-hackers

PFC <lists(at)peufeu(dot)com> writes:
> Here is the result of my little experiment, for your viewing pleasure,

I'm having a problem with the terminology here, since AFAICT what your
patch does is exactly not a global "prepare" --- there is no permanently
stored cached plan. That's a good thing probably, but it seems like
the feature needs to be described differently.

I'm also pretty dubious about storing raw text in that catalog. In the
first place, while I've not looked at your patch, I expect you are
pulling the raw text from debug_query_string. That won't work in cases
where multiple SQL commands were submitted in one query string. In the
second place, raw-text SQL commands will be subject to a whole lot of
ambiguity at parse time. If for instance another session tries to use
the command with a different search_path or standard_conforming_strings
setting, it'll get different results. While I can think of use-cases
for that sort of behavior, it seems like mostly a bad idea.

I'm thinking that a more appropriate representation would use stored
parse trees, the same as we do in pg_rewrite, and with the same
dependency information so that a stored statement couldn't outlive the
objects it depends on.

Another area that could do with more thought is the hard-wired
association between statement ownership and accessibility. That's
likely to be pretty inconvenient in a lot of cases, particularly
systems that use role membership heavily.

I also wonder whether statements should belong to schemas...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joris Dobbelsteen <joris(at)familiedobbelsteen(dot)nl>
Cc: PFC <lists(at)peufeu(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: first time hacker ;) messing with prepared statements
Date: 2008-03-30 21:09:17
Message-ID: 28827.1206911357@sss.pgh.pa.us
Lists: pgsql-hackers

Joris Dobbelsteen <joris(at)familiedobbelsteen(dot)nl> writes:
> The MAJOR benefit of Microsoft's approach is that it works on existing
> application, and, most importantly makes NO assumptions on the
> "volatile" server state. A few cases where the Microsoft solution works,
> while yours will fail is:

> * Server restart and assorted like failover (you need to redo a
> global prepare).

Hmm? He's proposing storing the info in a system catalog. That hardly
seems "volatile"; it'll certainly survive a server restart.

> * Cleanup and instantiation of a prepared statement.

Again, it's not clear what you've got in mind.

I agree with the point that this isn't completely transparent to
applications, but if an app is already using named prepared statements
it would surely be a pretty small matter to make it use this feature.
The app code would likely get simpler instead of more complex, since
you'd stop worrying about whether a given statement had been prepared
yet in the current session.

regards, tom lane


From: PFC <lists(at)peufeu(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: first time hacker ;) messing with prepared statements
Date: 2008-03-31 13:07:54
Message-ID: op.t8vtfgeocigqcu@apollo13.peufeu.com
Lists: pgsql-hackers

>> * Server restart and assorted like failover (you need to redo a
>> global prepare).
>
> Hmm? He's proposing storing the info in a system catalog. That hardly
> seems "volatile"; it'll certainly survive a server restart.

Yes, it's in a system catalog.

> I agree with the point that this isn't completely transparent to
> applications, but if an app is already using named prepared statements
> it would surely be a pretty small matter to make it use this feature.
> The app code would likely get simpler instead of more complex, since
> you'd stop worrying about whether a given statement had been prepared
> yet in the current session.

Thanks. That was the idea behind this hack...

> I'm having a problem with the terminology here, since AFAICT what your
> patch does is exactly not a global "prepare" --- there is no permanently
> stored cached plan. That's a good thing probably, but it seems like
> the feature needs to be described differently.

Sure, but I couldn't come up with a suitable name at the time... perhaps
CREATE STATEMENT (and DROP STATEMENT) ? This would describe it better
(especially the DROP, because GLOBAL DEALLOCATE is a rather bad name,
since it doesn't actually deallocate anything...)
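
Something like this, with purely hypothetical syntax (nothing of this is
implemented) :

CREATE STATEMENT get_user_by_id (INTEGER) AS SELECT * FROM test WHERE id = $1;
DROP STATEMENT get_user_by_id;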

> I'm also pretty dubious about storing raw text in that catalog. In the
> first place, while I've not looked at your patch, I expect you are
> pulling the raw text from debug_query_string. That won't work in cases
> where multiple SQL commands were submitted in one query string.

LOL, you are right. I had tested with multiple queries on the same line
from psql, but psql apparently splits the queries ; when I feed multiple
queries from PHP, one of them being GLOBAL PREPARE, it fails.

> In the
> second place, raw-text SQL commands will be subject to a whole lot of
> ambiguity at parse time. If for instance another session tries to use
> the command with a different search_path or standard_conforming_string
> setting, it'll get different results. While I can think of use-cases
> for that sort of behavior, it seems like mostly a bad idea.

You're right.

> I'm thinking that a more appropriate representation would use stored
> parse trees, the same as we do in pg_rewrite, and with the same
> dependency information so that a stored statement couldn't outlive the
> objects it depends on.

Do the parse trees store fully qualified "schema.table" or
"schema.function" ?
I mean, if table T is mentioned in a parse tree which is stored, and the
table is later dropped and recreated... or a column dropped... what
happens ? Dropping the statement would seem more logical, since it would
probably no longer be valid...

> Another area that could do with more thought is the hard-wired
> association between statement ownership and accessibility. That's
> likely to be pretty inconvenient in a lot of cases, particularly
> systems that use role membership heavily.

Yes, need to think about that.

> I also wonder whether statements should belong to schemas...

Since they are basically an extremely simple form of a function, why not ?
(but since part of the goodness on prepared statements is that they are
stored in a fast hash cache, wouldn't that add too much overhead ?)

Thanks for the helpful advice.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: PFC <lists(at)peufeu(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: first time hacker ;) messing with prepared statements
Date: 2008-03-31 14:58:46
Message-ID: 27845.1206975526@sss.pgh.pa.us
Lists: pgsql-hackers

PFC <lists(at)peufeu(dot)com> writes:
> Do the parse trees store fully qualified "schema.table" or
> "schema.function" ?

They store OIDs.

> I mean, if table T is mentioned in a parse tree which is stored, and the
> table is later dropped and recreated... or a column dropped... what
> happens ?

Dependencies take care of that --- if you drop the table, the statement
goes away too.

>> I also wonder whether statements should belong to schemas...

> Since they are basically an extremely simple form of a function, why not ?
> (but since part of the goodness on prepared statements is that they are
> stored in a fast hash cache, wouldn't that add too much overhead ?)

The lookup overhead would be trivial, I expect, compared to everything
else involved in a query. But what you'd have to work out is the
interaction between that and ordinary prepared statements, which
traditionally haven't had a schema name attached to the statement name.

(Come to think of it, if there's a statement FOO and I explicitly do
PREPARE FOO, what happens? Should the result depend on whether I've
used FOO earlier in the session?)
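
For concreteness, the ambiguous sequence would be something like this
(hypothetical example, GLOBAL PREPARE being the patch's syntax):

GLOBAL PREPARE foo (INTEGER) AS SELECT $1 + 1;
-- later, in a session belonging to the same user:
PREPARE foo (INTEGER) AS SELECT $1 + 2;   -- error, or does it shadow the stored template?
EXECUTE foo(1);                           -- which definition runs?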

regards, tom lane


From: James Mansion <james(at)mansionfamily(dot)plus(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PFC <lists(at)peufeu(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: first time hacker ;) messing with prepared statements
Date: 2008-03-31 19:26:01
Message-ID: 47F13AC9.1080901@mansionfamily.plus.com
Lists: pgsql-hackers

Tom Lane wrote:
> PFC <lists(at)peufeu(dot)com> writes:
>
>> Do the parse trees store fully qualified "schema.table" or
>> "schema.function" ?
>>
>
> They store OIDs.
>
So, what happens if we reference a temporary table or something else that
requires resolution down a search path?

I believe Microsoft and Sybase have to defer some optimisation because of
this.

James


From: PFC <lists(at)peufeu(dot)com>
To: "Joris Dobbelsteen" <joris(at)familiedobbelsteen(dot)nl>
Cc: "James Mansion" <james(at)mansionfamily(dot)plus(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: first time hacker ;) messing with prepared statements
Date: 2008-04-02 17:11:20
Message-ID: op.t8uhqaz4cigqcu@apollo13.peufeu.com
Lists: pgsql-hackers


> The MAJOR benefit of Microsoft's approach is that it works on existing
> application,

Yes, that is a nice benefit !
Is there a way to turn it on/off ? Or is it smart enough to only cache
plans for cases where it is relevant ?
For instance, I absolutely want some queries to be planned according to
the real parameters (it makes a huge difference on some search queries, as
expected), whereas most simple queries, like the proverbial select by ID,
etc., could be cached without problems...
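
For example, with the test table from earlier in this thread, the
distinction would be roughly (a sketch) :

-- plan choice (index scan vs. seq scan) depends heavily on the actual value of $1 :
SELECT * FROM test WHERE value < $1;
-- plan is the same whatever $1 is, safe to prepare once and reuse :
SELECT * FROM test WHERE id = $1;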

> and, most importantly makes NO assumptions on the "volatile" server
> state. A few cases where the Microsoft solution works, while yours will
> fail is:
>
> * Server restart and assorted like failover (you need to redo a
> global prepare).
> * Cleanup and instantiation of a prepared statement.

Hehe, actually, mine does work after a restart, since the statements are
stored in a database-specific system catalog, which is persistent.
Actually, what I store is not the result of PREPARE (a plan) but the text
of the SQL query "PREPARE foo.....", that is, I just cut the GLOBAL from
"GLOBAL PREPARE" and store the rest. The actual PREPARE is performed by
each connection when it encounters an EXECUTE request and doesn't find the
cached plan. It is actually extremely simple ;) did you expect a fancy
shared memory cache (ahem...) ? No, no, it's very basic.
This way, if a table was dropped and recreated, or anything else happened
since the GLOBAL PREPARE was issued that could invalidate a plan, there is
no problem, since there never was a globally stored plan anyway, just some
SQL text.
Also, if a needed table was dropped, the user will get the same error
message he would have got when issuing a PREPARE with the associated SQL
query string.
The overhead of each connection doing its own PREPARE is negligible,
since, if you use that feature, you intend to issue this query many, many
times during the life of the persistent connection.

> What you are doing for a global query cache is already in consideration
> and having plan invalidation mechanism on schema changes or, maybe,
> statistic updates was a step into that direction. You code mostly
> contributed the other parts already.

As I said it is much simpler than that : I store no plans ;)
Of course this means it only works with persistent connections.

> Another considerations is whether most task are getting CPU bound or IO
> bound. A better, per query, plan might reduce IO load due to better use
> of statistics on that single case, while for CPU bound it is very nice
> to reduce the planning overhead significantly.

Well, if it is IO bound, then this thing is useless. However, since the
purpose is to optimize often-used, simple queries, the likes of which
abound in web applications, it is relevant... because if this kind of
simple select becomes IO bound, and you have a few on each page, you're
really in trouble...

>> Another possible implementation would be to use a connection pooler
>> which, when opening a new connection, can be configured to send a SQL
>> script containing all the PREPARE statements.
> This is, IMHO, an application side feature that might be a good addition
> to PHP and other languages that provide the "persistent connection"
> feature.

On second thought, if it is not in Postgres, I'd rather put this in the
connection pooler, because this way it can be used by different
applications. But then you have to use a connection pooler. Also, by
putting it in Postgres, statements are only prepared as needed, whereas
the pooler would have to issue a lot of PREPAREs at connection startup,
making new connection startup slower.