Restructuring plancache.c API

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Restructuring plancache.c API
Date: 2010-11-11 22:21:34
Message-ID: 22791.1289514094@sss.pgh.pa.us

I've been thinking about supporting automatic replan of cached plans
using specific parameter values, as has been discussed several times,
at greatest length in this thread:
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00607.php
There doesn't seem to be full consensus about what the control method
ought to be, but right at the moment I'm thinking about mechanism not
policy. I think that what we need to do is restructure the API of
plancache.c to make it more amenable to returning "throwaway" plans.
It can already do that to some extent using the fully_planned = false
code path, but that's not the design center and it was shoehorned in
in perhaps a less than clean fashion. I want to rearrange it so there's
an explicit notion of three levels of cacheable object:

1. Raw parse tree + source string. These obviously never change.

2. The result tree of parsing and rewriting (ie, the output of
pg_analyze_and_rewrite applied to level 1). This can change, but
only as a result of schema changes on the tables and other objects
referenced in the query. We already have entirely adequate mechanisms
for recognizing when this has to be rebuilt.

3. The finished plan (ie, the output of pg_plan_queries applied to level
2). This might be either cached for reuse, or a throwaway object,
depending on the control mechanism's decisions.

I think we could get rid of the fully_planned switch and instead design
the API around caching levels 1 and 2. Then there's a GetCachedPlan
function (replacing RevalidateCachedPlan) that returns a finished plan,
but it's unspecified whether you get a persistent cached plan or a
throwaway one. The control mechanism would execute inside this
function. We'd still have ReleaseCachedPlan, which would take care of
throwing away the plan if it's throwaway.
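
As a rough illustration of the API shape described above, the following toy
model in C shows a "get" call that may return either a persistent or a
throwaway plan, and a "release" call that frees only the throwaway kind. It is
a sketch under stated assumptions, not plancache.c code: every type and
function name here (ToyPlan, ToyPlanSource, toy_get_cached_plan, and so on) is
hypothetical, the want_custom flag merely stands in for the still-undecided
control mechanism, and a schema-generation counter stands in for the real
invalidation machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy types; none of these are real PostgreSQL structures. */
typedef struct ToyPlan {
    int  generation;    /* schema generation the plan was built against */
    bool is_throwaway;  /* true: freed on last release; false: owned by cache */
    int  refcount;      /* number of callers currently holding the plan */
} ToyPlan;

typedef struct ToyPlanSource {
    const char *query_string;         /* level 1: never changes */
    int         rewritten_generation; /* level 2: -1 means "not built yet" */
    ToyPlan    *cached_plan;          /* level 3: optional persistent plan */
} ToyPlanSource;

static int toy_schema_generation = 0; /* bumped whenever the schema changes */
static int toy_plans_alive = 0;       /* instrumentation for the example only */

static ToyPlan *toy_build_plan(bool is_throwaway)
{
    ToyPlan *plan = malloc(sizeof(ToyPlan));

    assert(plan != NULL);
    plan->generation = toy_schema_generation;
    plan->is_throwaway = is_throwaway;
    plan->refcount = 0;
    toy_plans_alive++;
    return plan;
}

static void toy_free_plan(ToyPlan *plan)
{
    free(plan);
    toy_plans_alive--;
}

/* Rebuild level 2 if the schema moved on; that also invalidates level 3. */
static void toy_revalidate(ToyPlanSource *src)
{
    if (src->rewritten_generation == toy_schema_generation)
        return;
    src->rewritten_generation = toy_schema_generation;
    if (src->cached_plan != NULL)
    {
        ToyPlan *old = src->cached_plan;

        src->cached_plan = NULL;
        if (old->refcount == 0)
            toy_free_plan(old);
        else
            old->is_throwaway = true;   /* the last holder frees it */
    }
}

/* The caller cannot tell, and need not care, which kind of plan it gets. */
ToyPlan *toy_get_cached_plan(ToyPlanSource *src, bool want_custom)
{
    ToyPlan *plan;

    toy_revalidate(src);
    if (want_custom)
        plan = toy_build_plan(true);
    else
    {
        if (src->cached_plan == NULL)
            src->cached_plan = toy_build_plan(false);
        plan = src->cached_plan;
    }
    plan->refcount++;
    return plan;
}

/* Drops the caller's reference; frees the plan only if it is throwaway. */
void toy_release_cached_plan(ToyPlan *plan)
{
    assert(plan->refcount > 0);
    plan->refcount--;
    if (plan->is_throwaway && plan->refcount == 0)
        toy_free_plan(plan);
}
```

In this model a caller always pairs toy_get_cached_plan with
toy_release_cached_plan, so the same calling code works whichever kind of plan
the policy hands back.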

Right now the API is structured so that the initial creator of a
cacheable plan has to build levels 2 and 3 first, and the plancache.c
code just copies that data into persistent storage. I'm thinking that
might have been a mistake. Maybe we should just have the caller hand
over the data for level 1, with parse analysis + rewrite done solely
internally within plancache.c. The level-2 data wouldn't be exposed
outside plancache.c at all.
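
To make the "caller hands over only level 1" idea concrete, here is a small
hypothetical sketch in C. The caller passes a raw parse tree and source string;
the module copies both and runs its stand-in for parse analysis + rewrite
lazily and privately. All names (ToyRawParseTree, ToyQuerySource,
toy_create_source, and the rest) are invented for illustration and are not
taken from PostgreSQL; the analyze_calls counter exists only so the behaviour
can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a raw parse tree. */
typedef struct ToyRawParseTree {
    int node_count;
} ToyRawParseTree;

typedef struct ToyQuerySource {
    ToyRawParseTree raw;            /* level 1, copied from the caller */
    char           *query_string;   /* level 1, copied from the caller */
    int             analyzed_valid; /* level 2 state, private to the module */
    int             analyze_calls;  /* instrumentation for the example only */
} ToyQuerySource;

/* The caller supplies level 1 only; nothing is analyzed yet. */
ToyQuerySource *toy_create_source(const ToyRawParseTree *raw,
                                  const char *query_string)
{
    ToyQuerySource *src = malloc(sizeof(ToyQuerySource));
    size_t          len = strlen(query_string) + 1;

    assert(src != NULL);
    src->raw = *raw;
    src->query_string = malloc(len);
    assert(src->query_string != NULL);
    memcpy(src->query_string, query_string, len);
    src->analyzed_valid = 0;
    src->analyze_calls = 0;
    return src;
}

/* Internal only: stands in for parse analysis + rewrite (level 2). */
static void toy_ensure_analyzed(ToyQuerySource *src)
{
    if (!src->analyzed_valid)
    {
        src->analyze_calls++;
        src->analyzed_valid = 1;
    }
}

/* Called when a referenced object changes; level 1 is untouched. */
void toy_invalidate(ToyQuerySource *src)
{
    src->analyzed_valid = 0;
}

/* Stand-in for planning: level 2 is (re)built on demand, never exposed. */
int toy_plan(ToyQuerySource *src)
{
    toy_ensure_analyzed(src);
    return src->raw.node_count;
}

void toy_drop_source(ToyQuerySource *src)
{
    free(src->query_string);
    free(src);
}
```

Because the level-2 state is touched only by toy_ensure_analyzed, a caller has
no way to depend on it, which is the property the paragraph above is after.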

With this focus, the name "plancache" becomes a little bit of a
misnomer, but I am inclined to stick with it because a better name
isn't apparent. "rewritecache" isn't an improvement really.

Comments?

regards, tom lane


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Restructuring plancache.c API
Date: 2010-11-12 08:52:13
Message-ID: 4CDD003D.4090300@gmail.com

On 2010-11-11 23:21, Tom Lane wrote:
> I've been thinking about supporting automatic replan of cached plans
> using specific parameter values, as has been discussed several times,
> at greatest length in this thread:
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00607.php
..
> I want to rearrange it so there's
> an explicit notion of three levels of cacheable object:
>
> 1. Raw parse tree + source string. These obviously never change.
In the context of cached plans and specific parameter values, an idea for
the future might be to also consider a cached plan when planning
simple queries. A way to do this is by regarding all constants in a
simple query as parameters, and looking for a cached plan for that
parameterized query. To lower the chance of choosing a bad plan for the
actual parameter values, a cached plan could also store the actual
parameter values used during planning (where planning was done with
constants, not parameters, this would require substituting the actual
values back as constants in the parameterized query). Based on an exact
match on the raw parse tree of the parameterized query and the closeness
of the actual parameter values of the cached and current query, a cached
plan could be chosen or not. If replanning was chosen, the new plan could
also be stored as a new cached plan of the same query but with different
parameter values.

It would require one more level in the plan cache:
1. raw parse tree of the parameterized query
2. one or more "source string + actual parameter values" (these were the
replaced constants)
then for each entry in level 2 the remaining levels.
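
The lookup described above could be sketched roughly as follows in C. This is
a hypothetical toy, not a proposed implementation: it assumes a query with a
single integer constant, measures "neighbourhood" as an absolute difference
against a caller-supplied tolerance (a real system would more likely compare
estimated selectivities), and uses invented names throughout (ToyVariant,
ToyParamEntry, toy_choose_plan).

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_VARIANTS 8

/* One stored plan together with the constant it was planned for. */
typedef struct ToyVariant {
    long planned_value;  /* actual parameter value used during planning */
    int  plan_id;        /* stand-in for the finished plan */
} ToyVariant;

/* One cache entry per parameterized query (hypothetical layout). */
typedef struct ToyParamEntry {
    const char *param_query;  /* e.g. "SELECT * FROM t WHERE x = $1" */
    ToyVariant  variants[TOY_MAX_VARIANTS];
    int         nvariants;
    int         next_plan_id;
} ToyParamEntry;

/*
 * Return the id of a plan to use for "value". An existing plan is reused
 * when its planning-time value lies within "tolerance" of the new value;
 * otherwise a new plan is "built" and stored if there is room.
 * *replanned is set to 1 when a new plan was built, else 0.
 */
int toy_choose_plan(ToyParamEntry *entry, long value, long tolerance,
                    int *replanned)
{
    int i;
    int id;

    for (i = 0; i < entry->nvariants; i++)
    {
        long diff = labs(value - entry->variants[i].planned_value);

        if (diff <= tolerance)
        {
            *replanned = 0;
            return entry->variants[i].plan_id;
        }
    }

    /* No stored plan is close enough: replan with the actual value. */
    *replanned = 1;
    id = entry->next_plan_id++;
    if (entry->nvariants < TOY_MAX_VARIANTS)
    {
        entry->variants[entry->nvariants].planned_value = value;
        entry->variants[entry->nvariants].plan_id = id;
        entry->nvariants++;
    }
    return id;
}
```

With a tolerance of 10, for example, a plan built for the value 100 would be
reused for 105, while 5000 would trigger a replan and a second stored variant.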

regards,
Yeb Havinga


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restructuring plancache.c API
Date: 2010-11-12 21:47:32
Message-ID: 1289598256-sup-1823@alvh.no-ip.org

Excerpts from Tom Lane's message of Thu Nov 11 19:21:34 -0300 2010:
> I've been thinking about supporting automatic replan of cached plans
> using specific parameter values, as has been discussed several times,
> at greatest length in this thread:
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00607.php
> There doesn't seem to be full consensus about what the control method
> ought to be, but right at the moment I'm thinking about mechanism not
> policy. I think that what we need to do is restructure the API of
> plancache.c to make it more amenable to returning "throwaway" plans.
> It can already do that to some extent using the fully_planned = false
> code path, but that's not the design center and it was shoehorned in
> in perhaps a less than clean fashion. I want to rearrange it so there's
> an explicit notion of three levels of cacheable object:

I was wondering if this could help with the separation of labour of
functions in postgres.c that we were talking about a couple of weeks
ago. The main impedance mismatch, so to speak, is that those functions
aren't at all related to caching of any sort; but then, since you're
looking for a new name for the source file, I return to my earlier
suggestion of a generic "queries.c" or some such, which could handle all
these issues. (Of course, querycache.c doesn't make any sense.)

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Restructuring plancache.c API
Date: 2010-11-12 21:58:12
Message-ID: 3366.1289599092@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of Thu Nov 11 19:21:34 -0300 2010:
>> I think that what we need to do is restructure the API of
>> plancache.c to make it more amenable to returning "throwaway" plans.

> I was wondering if this could help with the separation of labour of
> functions in postgres.c that we were talking about a couple of weeks
> ago.

Yeah, it was in the back of my mind that this patch might create some
merge conflicts for that one, but I figured we could deal with that when
the time came. I wasn't intending to refactor the behavior of
pg_analyze_and_rewrite or pg_plan_queries, just change where they might
get called from, so I think any conflict will be inessential and easily
resolved.

> The main impedance mismatch, so to speak, is that those functions
> aren't at all related to caching of any sort; but then, since you're
> looking for a new name for the source file, I return to my earlier
> suggestion of a generic "queries.c" or some such, which could handle all
> these issues. (Of course, querycache.c doesn't make any sense.)

I thought about querycache.c too, but it seems to carry the wrong
connotations --- in mysql-land I believe they use that term to imply
caching a query's *results*. But queries.c seems so generic as to
convey no information at all.

regards, tom lane