Re: Caching for stable expressions with constant arguments v6

Lists: pgsql-hackers
From: Marti Raudsepp <marti(at)juffo(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Subject: Caching for stable expressions with constant arguments v6
Date: 2012-01-16 17:06:45
Message-ID: CABRT9RA-RomVS-yzQ2wUtZ=m-eV61LcbrL1P1J3jydPStTfc6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi list,

Here's v6 of my expression caching patch. The only change in v6 is
added expression cost estimation in costsize.c. I'm setting per-tuple
cost of CacheExpr to 0 and moving sub-expression tuple costs into the
startup cost.

As always, this work is also available from my Github "cache" branch:
https://github.com/intgr/postgres/commits/cache

This patch was marked as "Returned with Feedback" from the 2011-11
commitfest. I expected to get to tweak the patch in response to
feedback before posting to the next commitfest. But said feedback
didn't arrive, and with me being on vacation, I missed the 2012-01 CF
deadline. :(

I will add it to the 2012-01 commitfest now, I hope that's OK. If not,
feel free to remove it and I'll put it in 2012-Next.

PS: Jaime, have you had a chance to look at the patch? Even if you're
not done with the review, I'd be glad to get some comments earlier.
And thanks for reviewing.

Regards,
Marti


From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-01-16 17:48:28
Message-ID: CAJKUy5iKTxmDYx8jnehfzwbSRcQC3gUkcSAHyVjzLKo4U0yjTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 16, 2012 at 12:06 PM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>
> I will add it to the 2012-01 commitfest now, I hope that's OK. If not,
> feel free to remove it and I'll put it in 2012-Next.
>

i'm not the CF manager so he can disagree with me...
but IMHO your patch has been almost complete since last CF and seems
something we want... i made some performance test that didn't give me
good results last time but didn't show them and wasn't able to
reproduce so maybe it was something in my machine...

so, i would say add it and if the CF manager disagrees he can say so

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-01-27 14:49:10
Message-ID: CA+Tgmoaq41R=3_9SLpt=5UGMdip7WYVeFsZne+veGmCx6OGzeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 16, 2012 at 12:06 PM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> Here's v6 of my expression caching patch.

The patch is not attached.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-01-27 14:53:09
Message-ID: CABRT9RBa=a-MWyiC9_9VfyVMZOJdMWL81=5U2Emj-sS2tUrYSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 27, 2012 at 16:49, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> The patch is not attached.

Very sorry, I keep doing that :(

Attached now.

Regards,
Marti

Attachment Content-Type Size
cacheexpr-v6.patch text/x-patch 119.4 KB

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-02-03 16:28:06
Message-ID: CABRT9RDAgEL+KpErbOAfmTpQZCFp-UnWQa2QUHKr8kUCNUC5iA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 16, 2012 at 19:06, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> Here's v6 of my expression caching patch. The only change in v6 is
> added expression cost estimation in costsize.c.

Ok, I found another omission in my patch. Up until v4, the behavior of
estimate_expression_value() was to never insert CacheExpr nodes, and
always strip them when found. In v5, I ripped out all the conditional
CacheExpr insertion and stripping so expression trees would always be
normalized the same way.

This had an unintended consequence; operator selectivity estimation
functions aren't prepared to deal with CacheExpr nodes. E.g. STABLE
function calls and expressions with Params are constant-folded in
estimation mode, but the constant would still remain as a child of a
CacheExpr node.

The solutions I considered:
1. Selectivity functions and other estimation callers could be changed
to strip CacheExpr themself. This doesn't seem very attractive since
there are quite a lot of them and it seems like duplication of code.

2. Strip CacheExpr nodes in the estimation pass. This has a potential
hazard that estimate_expression_value() is no longer idempotent; a
second pass over the same expr tree might re-insert CacheExpr to some
places. However, these cases are unlikely to be problematic for
selectivity estimation anyway -- currently selectivity estimation (and
other callers) can only deal with trees that get folded to a single
Const, and those don't get cached.

3. Strip CacheExprs *and* restrict CacheExpr insertion in estimation
mode (like v4 patch). This would also add a larger amount of code to
the patch, but it would keep estimate_expression_value() idempotent.

----

The attached patch implements solution #2 as it's the least amount of
code and the downside doesn't seem problematic (as explained above). I
also rebased the patch to current Postgres master (no conflicts, just
some fuzz). Note that the create_index test fails in master, this
problem isn't introduced by my patch.

As always, this work is also available from my Github "cache" branch:
https://github.com/intgr/postgres/commits/cache

Regards,
Marti

Attachment Content-Type Size
cacheexpr-v7.patch text/x-patch 156.3 KB

From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-02-04 07:49:31
Message-ID: CAJKUy5jSX4erms358uS1N1OtW6MmFZiOQzuAKNcXM=1TOH-uzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 16, 2012 at 12:06 PM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>
> Here's v6 of my expression caching patch.

i little review...

first, i notice a change of behaviour... i'm not sure if i can say
this is good or not.

with this function:
"""
create or replace function cached_random() returns numeric as $$
begin
raise notice 'cached';
return random();
end;
$$ language plpgsql stable;
"""
if you execute: select *, cached_random() from (select
generate_series(1, 10) ) i;

on head you get 10 random numbers, with your patch you get 10 times
the same random number... wich means your patch make stable promise a
hard one.
personally i think that's good but i know there are people using,
mistakenly, volatile functions inside stable ones

---

seems you are moving code in simplify_function(), do you think is
useful to do that independently? at least if it provides some clarity
to the code

---

benchmark. i run a few tests in my laptop (which is not very performant but...)
from what i see there is no too much gain for the amount of complexity
added... i can see there should be cases which a lot more gain (for
example if you use a function to hide a select and you use such a
function several times in the select... but i guess it should work the
same with a CTE)

configuration:

name | setting
----------------------------------+--------------------
shared_buffers | 4096
synchronous_commit | off

filesystem: xfs

-- This is from the bench_cache.sh but with -T 150
select * from ts where ts between to_timestamp('2005-01-01',
'YYYY-MM-DD') and to_timestamp('2005-01-01', 'YYYY-MM-DD');
head 0.423855
cache 1.949242
select * from ts where ts>now();
head 2.420200
cache 2.580885

/*uncachable*/ select * from one where ts >=
to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') and ts <
(to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') + interval '1
year')
head 955.007129
cache 846.917163

/*cachable*/ select * from one where ts >= to_date(now()::date::text,
'YYYY-MM-DD') and ts < (to_date(now()::date::text, 'YYYY-MM-DD') +
interval '1 year')
head 827.484067
cache 801.743863

a benchmark with pgbench scale 1 (average of 3 runs, -T 300 clients
=1, except on second run)
-scale 1
== simple ==
head 261.833794
cache 250.22167
== simple (10 clients) ==
head 244.075592
cache 233.815389
== extended ==
head 194.676093
cache 202.778580
== prepared ==
head 300.460328
cache 302.061739
== select only ==
head 886.207252
cache 909.832986

a benchmark with pgbench scale 20 (average of 3 runs, -T 300 clients
=1, except on second run)
-scale 20
== simple ==
head 19.890278
cache 19.536342
== simple (10 clients) ==
head 40.864455
cache 44.457357
== extended ==
head 21.372751
cache 19.992955
== prepared ==
head 19.543434
cache 20.226981
== select only ==
head 31.780529
cache 36.410658

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-02-04 10:40:59
Message-ID: CABRT9RBbJ5gV3GZK9NCiiZqQTUOD986LFB5W8funSz42vcJLWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 4, 2012 at 09:49, Jaime Casanova <jaime(at)2ndquadrant(dot)com> wrote:
> i little review...

Thanks! By the way, you should update to the v7 patch.

> first, i notice a change of behaviour... i'm not sure if i can say
> this is good or not.

> if you execute: select *, cached_random() from (select
> generate_series(1, 10) ) i;

Yeah, this is pretty much expected.

> seems you are moving code in simplify_function(), do you think is
> useful to do that independently? at least if it provides some clarity
> to the code

I think so, yeah, but separating it from the patch would be quite a bit of work.

> shared_buffers | 4096

Note that the "ts" table is 36MB so it doesn't fit in your
shared_buffers now. If you increase shared_buffers, you will see
better gains for tests #1 and #2

> from what i see there is no too much gain for the amount of complexity
> added... i can see there should be cases which a lot more gain

Yeah, the goal of this test script wasn't to demonstrate the best
cases of the patch, but to display how it behaves in different
scenarios. Here's what they do:

Test case #1 is a typical "real world" query that benefits from this
patch. With a higher shared_buffers you should see a 6-7x improvement
there, which I think is a compelling speedup for a whole class of
queries.

Test #2 doesn't benefit much from the patch since now() is already a
very fast function, but the point is that even saving the function
call overhead helps noticeably.

Tests #3 and #4 show the possible *worst* case of the patch -- it's
processing a complex expression, but there's just one row in the
table, so no opportunity for caching.

----
Besides stable functions, this patch also improves the performance of
expressions involving placeholders parameters, such as variable
references from PL/pgSQL, since these are not constant-folded. E.g:

DECLARE
foo timestamptz = '2012-02-04';
BEGIN
SELECT * FROM ts WHERE ts>(foo - interval '1 day');

Before this patch, the interval subtraction was executed once per row;
now it's once per execution.

> a benchmark with pgbench scale 20 (average of 3 runs, -T 300 clients
> =1, except on second run)
> -scale 20

I think the differences here are mostly noise because the queries
pgbench generates aren't affected by caching.

Regards,
Marti


From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-03-01 22:34:48
Message-ID: CAJKUy5huJwStHzfBiT9mU1Uq3=HhQKs9TBcDGqz-04SrUHsigg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 4, 2012 at 5:40 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> On Sat, Feb 4, 2012 at 09:49, Jaime Casanova <jaime(at)2ndquadrant(dot)com> wrote:
>> i little review...
>
> Thanks! By the way, you should update to the v7 patch.
>

just tried it and it fail when initializing on make check
"""
creating information schema ... TRAP: FailedAssertion("!(*cachable ==
((bool) 1))", File: "clauses.c", Line: 2295)
Aborted (core dumped)
child process exited with exit code 134
"""

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-03-02 00:12:09
Message-ID: CABRT9RA23KLmYnV7Xtv6iV84vEizHrhhKD11iA2G3Qnn396kCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Mar 2, 2012 at 00:34, Jaime Casanova <jaime(at)2ndquadrant(dot)com> wrote:
> just tried it and it fail when initializing on make check

> creating information schema ... TRAP: FailedAssertion("!(*cachable ==
> ((bool) 1))", File: "clauses.c", Line: 2295)

Sorry, my bad, here's a fixed patch (also rebased to current git master).

Regards,
Marti

Attachment Content-Type Size
cacheexpr-v8.patch text/x-patch 156.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-03-10 00:05:28
Message-ID: 8448.1331337928@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> [ cacheexpr-v8.patch ]

A few comments:

* I believe the unconditional datumCopy call in ExecEvalCacheExpr will
dump core if the value is null and the type is pass-by-reference.
Beyond just skipping it, it seems like you could skip the type
properties lookup as well if the value to be cached is null.

* I do not see any logic in here for resetting the already-cached status
in an existing ExprState tree. You're going to need that for cases
where Params are given a new value. I don't think I trust the
assumption that this is never true for EXTERN parameters, and in any
case you're leaving a whole lot of the potential benefit of the patch on
the table by ignoring the possibility of caching executor-internal
parameters. I would rather see, for example, a ReScan call on a plan
node running around and resetting caching of all expression trees
attached to the plan node if its chgParam isn't empty.

* I think you should consider getting rid of the extra argument to
ExecInitExpr, as well as the enable flag in CacheExprState, and instead
driving cache-enable off a new flag added to EState, which should
probably initialize to "caching OK" since that's what the majority of
callers seem to want. That would get rid of a fair amount of the
boilerplate changes, as well as being at least as convenient to use,
maybe more so.

* In the same vein, I think it would be better to avoid adding
the extra argument to eval_const_expressions_mutator and instead pass
that state around in the existing "context" struct argument. In
particular I am entirely unimpressed with the fact that you can't pass
the extra argument through expression_tree_mutator and thus have to dumb
the code down to fail to cache underneath any node type for which
there's not explicit coding in eval_const_expressions_mutator.

* This is pretty horrible:

+ * NOTE: This function is not indempotent. Calling it twice over an expression
+ * tree causes CacheExpr nodes to be removed in the first pass, then re-added
+ * in the 2nd pass. Make sure it only gets called once.

I don't think we can positively guarantee that. The function API should
be fixed to not need such an assumption.

* It seems like some of the ugliness here is due to thinking that
selectivity functions can't cope with CacheExprs. Wouldn't it be a lot
better to make them cope? I don't think there are that many places.
I'd suggest looking for places that strip RelabelType and fixing them to
strip CacheExpr instead (in fact, most likely any that don't are broken
anyway ...)

* There's a lot of stuff that seems wrong in detail in
eval_const_expressions_mutator, eg it'll try to wrap a plain vanilla
Const with a CacheExpr. I see you've hacked that case inside
insert_cache itself, but that seems like evidence of a poorly thought
out recursion invariant. The function should have a better notion than
that of whether caching is useful for a given subtree. In general I'd
not expect it to insert a CacheExpr unless there is a Param or stable
function call somewhere below the current point, but it seems not to be
tracking that. You probably need at least a three-way indicator
returned from each recursion level: subexpression is not cacheable
(eg it contains a Var or volatile function); subexpression is cacheable
but there is no reason to do so (default situation); or subexpression is
cacheable and should be cached (it contains a Param or stable function).

* I don't think it's a good idea for
simplify_and_arguments/simplify_or_arguments to arbitrarily reorder the
arguments based on cacheability. I know that we tell people not to
depend on order of evaluation in AND/OR lists, but that doesn't mean
they listen, and this will introduce reordering into an awful lot of
places where it never happened before. If you have a mix of
cacheability situations, just stick CacheExprs on each arm (or not).

* I'm having a hard time understanding what you did to simplify_function
and friends; that's been whacked around quite a bit, in ways that are
neither clearly correct nor clearly related to the goal of the patch.
It might be best to present this change as a series of two patches,
one that just refactors and one that adds the caching logic.

* I don't believe CacheExprs can be treated as always simple by
ruleutils' isSimpleNode, at least not unless you always print them with
parentheses (which might be a good idea anyway).

* I don't think I believe either of the changes you made to plpgsql's
exec_simple_check_node. I'm not convinced that only PARAM_EXTERN params
are possible, and even if that's true it doesn't seem to be this patch's
business to add an assert for it. Also, immediately returning TRUE for
a CacheExpr amounts to assuming that we can never wrap a CacheExpr
around something that plpgsql would consider non-simple. I'm not sure
that's true now, and even if it is true it seems a mighty fragile
assumption.

regards, tom lane


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-03-10 17:00:37
Message-ID: CABRT9RBsdztmXY6y6NYT1MCsgEKE3ix9ujpMegEHLHyxFHY8UQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 10, 2012 at 02:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marti Raudsepp <marti(at)juffo(dot)org> writes:
>> [ cacheexpr-v8.patch ]
>
> A few comments

Whoa, that's quite a few. Thanks for the review.

> * There's a lot of stuff that seems wrong in detail in
> eval_const_expressions_mutator, eg it'll try to wrap a plain vanilla
> Const with a CacheExpr.  I see you've hacked that case inside
> insert_cache itself, but that seems like evidence of a poorly thought
> out recursion invariant.  The function should have a better notion than
> that of whether caching is useful for a given subtree.

Well my logic was this: accessing params and consts is just as fast as
accessing the cache -- there's no evaluation involved. So there's no
point in inserting CacheExpr in front of those. All other kinds of
expressions require some sort of evaluation, so they are cached, if
they weren't already constant-folded.

> In general I'd
> not expect it to insert a CacheExpr unless there is a Param or stable
> function call somewhere below the current point, but it seems not to be
> tracking that.

I think if you continue on your "Param or stable function" train of
thought, then it boils down to "expressions that can't be
constant-folded". And that's how it already works now --
constant-foldable expressions get reduced to a Const, which isn't
cached. There's no need to track it specially since we can check
whether we've got a Const node.

> You probably need at least a three-way indicator
> returned from each recursion level: subexpression is not cacheable
> (eg it contains a Var or volatile function); subexpression is cacheable
> but there is no reason to do so (default situation); or subexpression is
> cacheable and should be cached (it contains a Param or stable function).

That could be done -- give a special cachability result in code paths
that return a constant -- but that seems like complicating the logic
and doesn't tell us anything we don't already know.

----

> * I'm having a hard time understanding what you did to simplify_function
> and friends; that's been whacked around quite a bit, in ways that are
> neither clearly correct nor clearly related to the goal of the patch.

Sure, I'll split this out as a separate patch and send it up.

> * I believe the unconditional datumCopy call in ExecEvalCacheExpr will
> dump core if the value is null and the type is pass-by-reference.

datumCopy already has a check for NULL pointer. But good point about
skipping type properties lookup -- will change.

> * I think you should consider getting rid of the extra argument to
> ExecInitExpr, as well as the enable flag in CacheExprState, and instead
> driving cache-enable off a new flag added to EState

Will do.

> * In the same vein, I think it would be better to avoid adding
> the extra argument to eval_const_expressions_mutator and instead pass
> that state around in the existing "context" struct argument.

I'll have to see how ugly it gets to save & restore the cachability
field in recursive calls.

> I am entirely unimpressed with the fact that you can't pass
> the extra argument through expression_tree_mutator and thus have to dumb
> the code down to fail to cache underneath any node type for which
> there's not explicit coding in eval_const_expressions_mutator.

Well I deliberately chose a whitelisting approach. Expression types
that aren't important enough to be constant-folded probably aren't
that important for caching either.

Do you think it's safe to assume that expression types we don't know
about are inherently cachable?

> + * NOTE: This function is not indempotent. Calling it twice over an expression
> * It seems like some of the ugliness here is due to thinking that
> selectivity functions can't cope with CacheExprs. Wouldn't it be a lot
> better to make them cope?

I thought centralizing this CacheExpr-stripping to one place was a
better idea than spraying it all around the code base. We can skip the
copy *and* the check this way.

Hmm, if I passed the context to insert_cache then we could avoid this problem.

We could do the same for RelabelType in estimation mode, but I don't
know how safe that is.

> * I don't think it's a good idea for
> simplify_and_arguments/simplify_or_arguments to arbitrarily reorder the
> arguments based on cacheability.  I know that we tell people not to
> depend on order of evaluation in AND/OR lists, but that doesn't mean
> they listen

Fair enough, will change.

> * I don't believe CacheExprs can be treated as always simple by
> ruleutils' isSimpleNode

Ok.

> * I don't think I believe either of the changes you made to plpgsql's
> exec_simple_check_node.

Will change.

Regards,
Marti


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-03-10 17:39:33
Message-ID: 23193.1331401173@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> On Sat, Mar 10, 2012 at 02:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> * There's a lot of stuff that seems wrong in detail in
>> eval_const_expressions_mutator, eg it'll try to wrap a plain vanilla
>> Const with a CacheExpr. I see you've hacked that case inside
>> insert_cache itself, but that seems like evidence of a poorly thought
>> out recursion invariant. The function should have a better notion than
>> that of whether caching is useful for a given subtree.

> Well my logic was this: accessing params and consts is just as fast as
> accessing the cache -- there's no evaluation involved. So there's no
> point in inserting CacheExpr in front of those. All other kinds of
> expressions require some sort of evaluation, so they are cached, if
> they weren't already constant-folded.

I think this is overkill. Inserting a CacheExpr is far from free:
it costs cycles to make that node, cycles to copy it around, cycles
to recurse through it during subsequent processing. We should not
be putting them in to save trivial amounts of execution time. So
I don't believe your argument that there is no such thing as a
non-Const, non-volatile expression that shouldn't be cached.
Examples of things that clearly are not expensive enough to deserve
a cache node are RelabelType and PlaceHolderVar (the latter won't
even be there at runtime).

More generally, though, I think that caching something like, say,
"Param int4pl 1" is probably a net loss once you consider all the
added overhead. I'm not going to argue for putting explicit cost
considerations into the first version, but I don't want the
infrastructure to be such that it's impossible to bolt that on later.

>> * I believe the unconditional datumCopy call in ExecEvalCacheExpr will
>> dump core if the value is null and the type is pass-by-reference.

> datumCopy already has a check for NULL pointer.

You're assuming that a null Datum necessarily has an all-zero value,
which is not a safe assumption; in many situations the datum word
will be uninitialized. The null-pointer check in datumCopy is not
particularly useful IMO. It's probably a hangover from fifteen years
ago when the system's null-handling was a lot shakier than it is now.

> Do you think it's safe to assume that expression types we don't know
> about are inherently cachable?

Presumably, anything that doesn't contain a Var nor set off
contain_volatile_functions() should be safe to cache. I do not
care for the assumption that this set of expression types is identical
to the set that eval_const_expressions bothers to deal with.

>> * It seems like some of the ugliness here is due to thinking that
>> selectivity functions can't cope with CacheExprs. Wouldn't it be a lot
>> better to make them cope?

> I thought centralizing this CacheExpr-stripping to one place was a
> better idea than spraying it all around the code base.

How exactly do you figure that this avoids needing to know about them
elsewhere? Not everything in the selectivity code starts out by doing
estimate_expression_value. Perhaps more to the point, the stuff that
*does* do that is generally going to punt if it doesn't get a plain
Const back, so eliminating CacheExprs from non-Const cases isn't helping
it anyway.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-08-27 13:50:22
Message-ID: 20120827135022.GF11088@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Where are we on this?

---------------------------------------------------------------------------

On Mon, Jan 16, 2012 at 07:06:45PM +0200, Marti Raudsepp wrote:
> Hi list,
>
> Here's v6 of my expression caching patch. The only change in v6 is
> added expression cost estimation in costsize.c. I'm setting per-tuple
> cost of CacheExpr to 0 and moving sub-expression tuple costs into the
> startup cost.
>
> As always, this work is also available from my Github "cache" branch:
> https://github.com/intgr/postgres/commits/cache
>
> This patch was marked as "Returned with Feedback" from the 2011-11
> commitfest. I expected to get to tweak the patch in response to
> feedback before posting to the next commitfest. But said feedback
> didn't arrive, and with me being on vacation, I missed the 2012-01 CF
> deadline. :(
>
> I will add it to the 2012-01 commitfest now, I hope that's OK. If not,
> feel free to remove it and I'll put it in 2012-Next.
>
> PS: Jaime, have you had a chance to look at the patch? Even if you're
> not done with the review, I'd be glad to get some comments earlier.
> And thanks for reviewing.
>
> Regards,
> Marti
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-08-27 14:44:32
Message-ID: CABRT9RAJfrMxh1A+3WxNLDhWac6o9VTg27n7gwPsX37XuVhitg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 27, 2012 at 4:50 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Where are we on this?

TL;DR: Got a review, requires substantial work, current github branch
is slightly broken, will get back to this soon.

Tom Lane sent a thorough review of the patch here:
http://archives.postgresql.org/pgsql-hackers/2012-03/msg00655.php
(very much appreciated!)

I have addressed some smaller points from that list in my github
branch, but it still requires a substantial amount of work (in
particular, the bulk of this patch which is the recursion logic in
eval_const_expressions_mutator, needs to be changed to prevent
unnecessary CacheExpr insertions and to store intermediate state in
the context struct).

I got a small fragment of this into PostgreSQL 9.2 as commit
81a646febe87964725647a36d839f6b4b405f3ae. I rebased my github branch
on top of this commit, but the rebase introduced some test failures
that I have not tracked down yet. I don't know if it applies to git
HEAD any more.

Sadly some other things intervened and I have not had the time to
return to hacking on this patch. But I am hopeful I can get it into
shape during the 9.3 cycle.

Regards,
Marti


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-08-27 14:52:51
Message-ID: 20120827145251.GI11088@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 27, 2012 at 05:44:32PM +0300, Marti Raudsepp wrote:
> On Mon, Aug 27, 2012 at 4:50 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Where are we on this?
>
> TL;DR: Got a review, requires substantial work, current github branch
> is slightly broken, will get back to this soon.
>
> Tom Lane sent a thorough review of the patch here:
> http://archives.postgresql.org/pgsql-hackers/2012-03/msg00655.php
> (very much appreciated!)
>
> I have addressed some smaller points from that list in my github
> branch, but it still requires a substantial amount of work (in
> particular, the bulk of this patch which is the recursion logic in
> eval_const_expressions_mutator, needs to be changed to prevent
> unnecessary CacheExpr insertions and to store intermediate state in
> the context struct).
>
> I got a small fragment of this into PostgreSQL 9.2 as commit
> 81a646febe87964725647a36d839f6b4b405f3ae. I rebased my github branch
> on top of this commit, but the rebase introduced some test failures
> that I have not tracked down yet. I don't know if it applies to git
> HEAD any more.
>
> Sadly some other things intervened and I have not had the time to
> return to hacking on this patch. But I am hopeful I can get it into
> shape during the 9.3 cycle.

OK, thanks for the update, and your work on this.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "marti(at)juffo(dot)org >> Marti Raudsepp" <marti(at)juffo(dot)org>
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-10-31 00:21:55
Message-ID: 50906F23.7020502@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marti,

> Sadly some other things intervened and I have not had the time to
> return to hacking on this patch. But I am hopeful I can get it into
> shape during the 9.3 cycle.

Hey, are you going to work on this for 9.3? I really could use the
feature ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Caching for stable expressions with constant arguments v6
Date: 2012-10-31 19:04:15
Message-ID: CABRT9RB+=5=5BQjxAfbz--1UVnv5j4z2iHSCvatx9yes4BJaAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 31, 2012 at 2:21 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Hey, are you going to work on this for 9.3?

Yes, I do plan to get back to it. Thanks for the push :)

> I really could use the feature ...

If you're not aware already, you can work around the limitation using
a subquery.

Instead of: WHERE foo_column > expensive_expression()
Write: WHERE foo_column > (SELECT expensive_expression())

Regards,
Marti