Re: Cached Query Plans (was: global prepared statements)

From: PFC <lists(at)peufeu(dot)com>
To: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cached Query Plans (was: global prepared statements)
Date: 2008-04-14 00:44:55
Message-ID: op.t9ksc5mjcigqcu@apollo13.peufeu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Why limit ourselves with Oracle? How all major proprietary RDBMSs do it.

Thanks for the links. Very interesting.
The DB2 document especially mentions an important point : in order to make
their planner/optimizer smarter, they had to make it slower, hence it
became crucial to cache the plans.
Contrast this with MySQL where using prepared statements gains nothing :
the "optimizer" does so little work that it actually doesn't matter.

So, basically, Orcale :
- Parses the query every time (identifies tables, permissions etc) (soft
parse)
- From that parsed query it looks up a cached plan (the lookup key could
then be different depending on the schema etc)
- If not, it must plan the query (hard parse).
Also the Oracle doc mentions that the soft parsing should be avoided by
using prepared statements in the application (ie Parse once and Bind lots
of times)
So, Oracle will redo the parsing + permissions check each time, unless
prepared statements are used, in which case it's direct execution.

And DB2 :
Er, the document is not very clear about what it actually does, but the
stats look nice ;)

> I liked your global prepared statements idea much better. Named the
> statements is no problem: DB frontends do that for you anyway
> sometimes.

Hm. The "global statements" and the cache would complement each other
actually. Why not.

When the user wants to name the statements, he can do so (and perhaps
control who can execute what, etc, like with stored procs)
Permission checking overhead will be there at each execution.
Should the plan be cached locally ? (RAM consumption times N bakends...)
Cached per user once permissions have been checked ? (avoids the overhead
of rechecking permissions)
What about the search path ?
(I'd force the global statements to use the default search path no matter
what, being explicit is better than "why does it stop working ?")

Can the application or the database library name the statements ?
I'm not so sure. This could work for compiled languages (what about when
you run several applications ? or several versions of the same application
? do we need a uniqueness of statement names from all developers all over
the world ?) Solution : make each application use a different user name,
and global prepared statements only visible to the user that created them,
perhaps. This conflicts with some desirable features, though. It needs
more thinking.

What about non-compiled languages ? It will not be possible to generate a
list of statements beforehands... And queries are also constructed
dynamically by frameworks such as Rails, which makes naming them
impossible, but caching the plans would work well.

So, some situations would benefit from a plan cache,

> Frankly, I think you're better off storing them in a table. Shared
> memory is a limited resource and you cannot change how much you've

I'd say that unless you have a perverse application that will try all the
permutations of column names just to make sure the query is different
every time, how many different queries would you want to cache ?...
probably less than 1000... so it wouldn't take more than a couple
megabytes...

> allocated after the server has started. It does mean you'll have to
> serialise/deserialise them, but this will be cheaper than replanning,
> right?

What would be the overhead of a catalog lookup to get a cached plan for a
statement that returns 1 row ? Would the catalog cache make it fast enough
?
And what about deserialization ?...

> I am not too sure that plans and statistical counters should be stored
> together...

Not sure either.

> Probably plans should go in one place, and statistics should go to the
> stats collector (I know he's not quite ready for this ;)).

That's the problem...

> Hm, a limit on how much memory can be used for plans
> (query_plan_cache_size GUC?), and a LRU/LFU expiration
> of old plans?

Now it gets hairy ;)
Yes memory size should be limited. But how to make a LRU cleaner which
doesn't create lots of contention ?... Luckily, with a hash having a fixed
number of buckets, it is easier (clean a bucket every N seconds for
instance).

> Perhaps a GUC for controlling query cache should heve three values:
> none -- don't cache any statement
> smart -- use heuristics for deciding whether to cache it
> all -- force caching all queries -- for uncommon/statistical/testing
> purposes.

I would not volunteer to write that heuristic ;)
Although there would be a very simple solution : if time to parse > some
percentage of time to execute then cache.
The hairiness is in the plan dependence (or independence) on parameter
values, ideally we only want to cache plans that would be good for all
parameter values, only the user knows that precisely. Although it could be
possible to examine the column histograms...

>> (like mysql, /* flags */ SELECT blah )
>
> I don't like the hint flags. They tend to haunt later on (when the
> database gets smarter, but application forces it to be dumb). I would
> say a GUC.

I don't like them either... needs a better solution like a flag in
PQexecParams, but this would cause lots of trouble, so it's not really
possible...

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-04-14 02:12:41 Re: [HACKERS] Remove lossy-operator RECHECK flag?
Previous Message Tom Lane 2008-04-13 23:11:03 Re: pgwin32_safestat weirdness