[PATCH] Caching for stable expressions with constant arguments v3

Lists: pgsql-hackers
From: Marti Raudsepp <marti(at)juffo(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-09-25 02:09:13
Message-ID: CABRT9RAsQbPuSmTAXAo77a=UduXd36nCFPQRpS9r+3hqYFCfjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi list!

This is the third version of my CacheExpr patch. The patch is now
feature-complete -- meaning that it's now capable of caching all
expression types that are currently constant-foldable. Simple queries
don't get CacheExprs added at all now. Code comments should also be
sufficient for review. New regression tests about caching behavior are
included.

I'm now at a point where I can't think of anything significant to
improve about the patch so feedback would be very welcome.

For explanation about design decisions, please read these earlier messages:
http://archives.postgresql.org/pgsql-hackers/2011-09/msg00579.php
http://archives.postgresql.org/pgsql-hackers/2011-09/msg00812.php
http://archives.postgresql.org/pgsql-hackers/2011-09/msg00833.php

Should I remove the enable_cacheexpr GUC? The overhead of this doesn't
seem big enough to warrant people turning it off.

I'm not happy with some of the duplication added to
const_expressions_mutator, particularly CaseExpr. However, moving that
code to another function or macro would probably make it harder to
understand (every parameter would only be used once). Thoughts?

Performance
-----------
There's no question that this patch improves performance for large
sequential scan filters, so I'm mostly exploring how much overhead is
added in the worst cases here.

For benchmarking, I built and ran two parallel PostgreSQL
installations on different ports, one with my latest patche and
another with the base git revision where I branched off my development

Built with GCC 4.6.1, configured with --disable-cassert --disable-debug
Default PostgreSQL configuration running on Phenom II X4. CPU
frequency scaling is disabled during test and PostgreSQL is isolated
to a certain CPU core to reduce variance.

For testing I used 'pgbench -T 10 -n -f $SCRIPT -p $PORT'
This command was ran 20 times (10 for patched, 10 for unpatched) and
the runs were interleaved.

All result differences are statistically significant according to
t-test with p<0.05

First of all, some very simple queries to get some idea of the
overhead of the patch.
----
select now()

base: avg=22088.9 stdev=180.0
patch: avg=22233.8 stdev=134.1

In this case the patch turns off CacheExpr insertion in order not to
break PL/pgSQL simple expressions. Why performance improves by 0.6%, I
don't know; it may be due to the code rearranging in simplify_function

----
select * from now()

base: avg=17097.1 stdev=48.2
patch: avg=16929.2 stdev=47.8

Caching is on here, and hurts the patch because it's only evaluated
once. Performance loss of 1.0%

----
Now we'll try some more complicated expressions on a table with two rows:
create table two (ts timestamptz);
insert into two values (timestamptz '2011-09-24 20:07:56.641322+03'),
('2011-09-24 20:07:56.642743+03');

----
two rows with cachable WHERE
select * from two where ts >= to_date(now()::date::text, 'YYYY-MM-DD')
and ts < (to_date(now()::date::text, 'YYYY-MM-DD') + interval '1
year')

base: avg=5054.2 stdev=47.1
patch: avg=4900.6 stdev=24.4

Clearly for two rows, caching the expression doesn't pay off.
Performance loss of 3.1%

----
two rows with uncachable WHERE
select * from two where ts >= to_date(clock_timestamp()::date::text,
'YYYY-MM-DD') and ts < (to_date(clock_timestamp()::date::text,
'YYYY-MM-DD') + interval '1 year')

base: avg=6236.6 stdev=88.3
patched: avg=6118.7 stdev=50.1

No part of this expression is cachable because clock_timestamp() is a
volatile function. Performance loss is 1.9% in the patched version,
probably due to the overhead added by searching for cachable
expressions.

----
Now repeating the last two tests with a 50-row table:
create table fifty as select generate_series(timestamptz '2001-01-01',
timestamptz '2001-01-01' + '49 days', '1 day') ts;

----
50 rows with cachable WHERE
select * from fifty where ts >= to_date(now()::date::text,
'YYYY-MM-DD') and ts < (to_date(now()::date::text, 'YYYY-MM-DD') +
interval '1 year')

base: avg=3136.6 stdev=22.3
patch: avg=4397.3 stdev=35.5

As expected, performance is much better with more rows due to caching;
improvement of 28.7%

----
50 rows with uncachable WHERE
select * from fifty where ts >= to_date(clock_timestamp()::date::text,
'YYYY-MM-DD') and ts < (to_date(clock_timestamp()::date::text,
'YYYY-MM-DD') + interval '1 year')

base: avg=3514.0 stdev=9.8
patch: avg=3500.1 stdev=18.4

Since planning overhead is less significant with more rows, the
regression is just 0.4% here.

----
As always, my latest progress can be seen in my GitHub 'cahce' branch:
https://github.com/intgr/postgres/commits/cache

Regards,
Marti

Attachment Content-Type Size
cacheexpr-v3.patch text/x-patch 83.4 KB

From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-09-25 19:08:55
Message-ID: 1663063044.50711.1316977735321.JavaMail.root@mail-1.01.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All,

I'd love to see someone evaluate the impact of Marti's patch on JDBC applications which use named prepared statements. Anyone have a benchmark handy?

--Josh


From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-04 14:27:06
Message-ID: CAJKUy5ifQhnXNa15dxr_FVzUUtC_4=YE91TtOis43et9iNUv6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 24, 2011 at 9:09 PM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> Hi list!
>
> This is the third version of my CacheExpr patch.

i wanted to try this, but it no longer applies in head... specially
because of the change of IF's for a switch in
src/backend/optimizer/util/clauses.c it also needs adjustment for the
rangetypes regress test

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-04 20:53:37
Message-ID: 4EDBDDD1.4050003@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 25.09.2011 05:09, Marti Raudsepp wrote:
> This is the third version of my CacheExpr patch.

This seems to have bitrotted, thanks to the recent refactoring in
eval_const_expressions().

> For explanation about design decisions, please read these earlier messages:
> http://archives.postgresql.org/pgsql-hackers/2011-09/msg00579.php
> http://archives.postgresql.org/pgsql-hackers/2011-09/msg00812.php
> http://archives.postgresql.org/pgsql-hackers/2011-09/msg00833.php

I wonder if it would be better to add the CacheExpr nodes to the tree as
a separate pass, instead of shoehorning it into eval_const_expressions?
I think would be more readable that way, even though a separate pass
would be more expensive. And there are callers of
eval_const_expressions() that have no use for the caching, like
process_implied_equality().

This comment in RelationGetExpressions() also worries me:

> /*
> * Run the expressions through eval_const_expressions. This is not just an
> * optimization, but is necessary, because the planner will be comparing
> * them to similarly-processed qual clauses, and may fail to detect valid
> * matches without this. We don't bother with canonicalize_qual, however.
> */
> result = (List *) eval_const_expressions(NULL, (Node *) result);

Do the injected CacheExprs screw up that equality? Or the constraint
exclusion logic in predtest.c?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-05 16:16:51
Message-ID: 28529.1323101811@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I wonder if it would be better to add the CacheExpr nodes to the tree as
> a separate pass, instead of shoehorning it into eval_const_expressions?
> I think would be more readable that way, even though a separate pass
> would be more expensive.

A separate pass would be very considerably more expensive, because
(1) it would require making a whole new copy of each expression tree,
and (2) it would require looking up the volatility status of each
function and operator. eval_const_expressions already has to do the
latter, or has to do it in a lot of cases anyway, so I think it's
probably the best place to add this. If it weren't for (2) I would
suggest adding the work to setrefs.c instead, but as it is I think
we'd better suck it up and deal with any fallout in the later stages
of the planner.

regards, tom lane


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-05 17:16:24
Message-ID: CABRT9RB9mVUBoB+U+hUEpAeZsTaGnRjRBfjxaXcQi8y1AvQs+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 4, 2011 at 22:53, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> This seems to have bitrotted, thanks to the recent refactoring in
> eval_const_expressions().

Luckily the conflicts are mostly whitespace changes, so shouldn't be
hard to fix. I'll try to come up with an updated patch today or
tomorrow.

> I wonder if it would be better to add the CacheExpr nodes to the tree as a
> separate pass, instead of shoehorning it into eval_const_expressions? I
> think would be more readable that way, even though a separate pass would be
> more expensive. And there are callers of eval_const_expressions() that have
> no use for the caching, like process_implied_equality().

Per Tom's comment, I won't split out the cache insertion for now.

The context struct has a boolean 'cache' attribute that controls
whether caching is desired or not. I think this could be exposed to
the caller as an eval_const_expressions() argument.

> This comment in RelationGetExpressions() also worries me:
[...]
> Do the injected CacheExprs screw up that equality? Or the constraint
> exclusion logic in predtest.c?

I suspect these cases are guaranteed not to produce any CacheExprs.
They're always immutable expressions. If they contain Var references
they're stored as is (not cachable); if not, they're folded to a
constant.

But I will have to double-check all the callers; it might be a good
idea to disable caching anyway in these cases.

Regards,
Marti


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-05 17:31:10
Message-ID: 136.1323106270@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> On Sun, Dec 4, 2011 at 22:53, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> This comment in RelationGetExpressions() also worries me:
> [...]
>> Do the injected CacheExprs screw up that equality? Or the constraint
>> exclusion logic in predtest.c?

> I suspect these cases are guaranteed not to produce any CacheExprs.
> They're always immutable expressions. If they contain Var references
> they're stored as is (not cachable); if not, they're folded to a
> constant.

> But I will have to double-check all the callers; it might be a good
> idea to disable caching anyway in these cases.

I think if you have some call sites inject CacheExprs and others not,
it will get more difficult to match up expressions that should be
considered equal. On the whole this seems like a bad idea. What is
the reason for having such a control boolean in the first place?

regards, tom lane


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-05 18:53:11
Message-ID: CABRT9RD=3GSroPO2xmOhZ-8DVvrmUNuDr8P9U34jY33WeO8AJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 5, 2011 at 19:31, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I think if you have some call sites inject CacheExprs and others not,
> it will get more difficult to match up expressions that should be
> considered equal.  On the whole this seems like a bad idea.  What is
> the reason for having such a control boolean in the first place?

It's needed for correctness with PL/pgSQL simple expressions.

This seems a bit of a kludge, but I considered it the "least bad"
solution. Here's what I added to planner.c standard_planner():

/*
* glob->isSimple is a hint to eval_const_expressions() and PL/pgSQL that
* this statement is potentially a simple expression -- it contains no
* table references, no subqueries and no join clauses.
*
* We need this here because this prevents insertion of CacheExpr, which
* would break simple expressions in PL/pgSQL. Such queries wouldn't
* benefit from constant caching anyway.
*
* The actual definition of a simple statement is more strict, but we
* don't want to spend that checking overhead here.
*
* Caveat: Queries with set-returning functions in SELECT list could
* still potentially benefit from caching, but we don't check that now.
*/
glob->isSimple = (parse->commandType == CMD_SELECT &&
parse->jointree->fromlist == NIL &&
parse->hasSubLinks == FALSE &&
parse->cteList == NIL);

----

I considered stripping CacheExpr nodes later in PL/pgSQL, but I can't
remember right now why I rejected that approach (sorry, it's been 2
months).

Currently I'm also disabling CacheExpr nodes in
estimate_expression_value() since we know for a fact that the planner
only evaluates it once. But that probably doesn't make much of a
difference.

Regards,
Marti


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-05 19:04:03
Message-ID: 4EDD15A3.5020703@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05.12.2011 20:53, Marti Raudsepp wrote:
> I considered stripping CacheExpr nodes later in PL/pgSQL, but I can't
> remember right now why I rejected that approach (sorry, it's been 2
> months).

Yet another idea would be to leave the CacheExprs there, but provide a
way to reset the caches. PL/pgSQL could then reset the caches between
every invocation. Or pass a flag to ExecInitExpr() to skip through the
CacheExprs.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-05 19:36:14
Message-ID: 2433.1323113774@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 05.12.2011 20:53, Marti Raudsepp wrote:
>> I considered stripping CacheExpr nodes later in PL/pgSQL, but I can't
>> remember right now why I rejected that approach (sorry, it's been 2
>> months).

> Yet another idea would be to leave the CacheExprs there, but provide a
> way to reset the caches. PL/pgSQL could then reset the caches between
> every invocation.

We're likely to need a way to reset these caches anyway, at some point...

> Or pass a flag to ExecInitExpr() to skip through the CacheExprs.

Not sure what you mean by that --- are you imagining that the ExprState
tree would have structure not matching the Expr tree? That seems just
about guaranteed to break something somewhere.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-05 19:42:19
Message-ID: 4EDD1E9B.5090507@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05.12.2011 21:36, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Or pass a flag to ExecInitExpr() to skip through the CacheExprs.
>
> Not sure what you mean by that --- are you imagining that the ExprState
> tree would have structure not matching the Expr tree?

Yes. Or it could just set a flag in the CacheExprState nodes to not do
the caching.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-05 23:05:42
Message-ID: CABRT9RD224HSRRNn8YTW8Po5O23vYfPptUZVwBUT7BzT9z-72A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 4, 2011 at 22:53, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> This seems to have bitrotted, thanks to the recent refactoring in
> eval_const_expressions().

So here's the rebased version of this. It probably took me longer to
rebase this than it took Andres to write his patch ;)

Passes all regression tests and diffing clauses.c seems to have
introduced no functional differences.

But the good news is that I learned how to use git rebase and
pgindent, so it was worth it.

I will look into addressing some of the comments tomorrow, much thanks
for the feedback everyone!

Regards,
Marti

Attachment Content-Type Size
cacheexpr-v3-rebased.patch text/x-patch 84.0 KB

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-06 22:29:22
Message-ID: CABRT9RB7rdvke7JHg853fnJO42DyxagS0EuX1fojMfY7=tAxVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 5, 2011 at 21:04, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Yet another idea would be to leave the CacheExprs there, but provide a way
> to reset the caches. PL/pgSQL could then reset the caches between every
> invocation. Or pass a flag to ExecInitExpr() to skip through the CacheExprs.

Great idea, I've ripped out all the conditional CacheExpr generation
logic from the planner and added a new boolean to CacheExprState
instead. This makes me happy as I'm now rid of most kludgy bits and
reduced the patch size somewhat. This also solves Tom's concern.

ExecInitExpr enables the cache when its 'PlanState *parent' attribute
isn't NULL. On the one hand this works out well since PlanState always
has a predictable life cycle thus caching is always safe.

On the other hand, a few places lose caching support this way since
they don't go through the planner:
* Column defaults in a COPY FROM operation. Common use case is
'timestamp default now()'
This might be a significant loss in some data-loading scenarios.
* ALTER TABLE t ALTER col TYPE x USING some_expr(); No big loss here.

Regards,
Marti

Attachment Content-Type Size
cacheexpr-v4-wip.patch text/x-patch 79.2 KB

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-07 21:58:23
Message-ID: CABRT9RDJiQKcbbYWzykVFiacDv9nBxzBJGSDLPMqVe5ohiSi2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 7, 2011 at 00:29, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> ExecInitExpr enables the cache when its 'PlanState *parent' attribute
> isn't NULL
[...]
> On the other hand, a few places lose caching support this way since
> they don't go through the planner:
> * Column defaults in a COPY FROM operation. Common use case is
> 'timestamp default now()'
> This might be a significant loss in some data-loading scenarios.
> * ALTER TABLE t ALTER col TYPE x USING some_expr(); No big loss here.

Let me rephrase that as a question: Does it seem worthwhile to add a
new argument to ExecInitExpr to handle those two cases? Does relying
on the PlanState argument being NULL seem like a bad idea for any
reason?

PS: I forgot to mention that 2 test cases covering the two above query
types are deliberately left failing in the v4-wip patch.

Regards,
Marti


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-07 22:24:32
Message-ID: 22021.1323296672@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> Let me rephrase that as a question: Does it seem worthwhile to add a
> new argument to ExecInitExpr to handle those two cases?

Possibly. Another way would be to keep its API as-is and introduce a
different function name for the other behavior. I would think that
we'd always know for any given caller which behavior we need, so a
flag as such isn't notationally helpful.

> Does relying
> on the PlanState argument being NULL seem like a bad idea for any
> reason?

Yes, that seemed like a pretty horrid idea when I read your description,
but I hadn't got round to looking at just how awful it might be.

regards, tom lane


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-10 14:50:12
Message-ID: 4EE371A4.7090703@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/07/2011 04:58 PM, Marti Raudsepp wrote:
> PS: I forgot to mention that 2 test cases covering the two above query
> types are deliberately left failing in the v4-wip patch.
>

It's not completely clear what happens next with this. Are you hoping
code churn here has calmed down enough for Jaime or someone else to try
and look at this more already? Or should we wait for a new update based
on the feedback that Heikki and Tom have already provided first, perhaps
one that proposes fixes for these two test cases?

One general suggestion about the fight with upstream changes you've run
into here. Now that you're settling into the git workflow, you might
consider publishing updates to a site like Github in the future too.
That lets testing of the code at the point you wrote it always
possible. Given just the patch, reviewers normally must reconcile any
bit rot before they can even compile your code to try it. That gets
increasingly sketchy the longer your patch waits before the next
CommitFest considers it. With a published git working tree, reviewers
can pull that for some hands-on testing whenever, even if a merge
wouldn't actually work out at that point. You just need to be careful
not to push an update that isn't self-consistent to the world. I
normally attach the best formatted patch I can and publish to Github.
Then reviewers can use whichever they find easier, and always have the
option of postponing a look at merge issues if they just want to play
with the program.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-10 19:05:08
Message-ID: CABRT9RA8mTMrBtw_Z0MG0t1=3YnmvbxTG9vPt2cNzSpmWrO++A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 10, 2011 at 16:50, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Or should we wait for a new update based on the
> feedback that Heikki and Tom have already provided first, perhaps one that
> proposes fixes for these two test cases?

Yes, I will post and updated and rebased version tomorrow.

> Now that you're settling into the git workflow, you might consider
> publishing updates to a site like Github in the future too.

It's in the 'cache' branch on my Github:
https://github.com/intgr/postgres/commits/cache
(I linked to it in the v3 posting, but forgot to mention it later)

Regards,
Marti


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Greg Smith <greg(at)2ndquadrant(dot)com>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-12 19:32:52
Message-ID: CABRT9RA9SYhQa4+rWXv223vJpkhOo0Z48xdgyoOrOGv4m0zfbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 10, 2011 at 21:05, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> Yes, I will post and updated and rebased version tomorrow.
(Sorry, I'm 1 day late :)

Here's v5 of the patch. Changes from v4:
* Rebased over Thomas Munro's constifying patch
* ExecInitExpr is now split in two:
* ExecInitExprMutator, which now takes a context struct argument, to
prevent having to pass around 2 arguments through recursion every
time.
* ExecInitExpr, which is the public function, now has a new boolean
argument to enable/disable cache. All callers have been converted.
* A new test case for a VALUES() expression.

I don't know of anything lacking in the current patch so I'm not
planning to make changes until further feedback. I'm also not very
confident about the refactoring I made with ExecInitExpr, so
bikeshedding is welcome. :)

Patch attached. Also available in the 'cache' branch on my Github:
https://github.com/intgr/postgres/commits/cache

Regards,
Marti

Attachment Content-Type Size
cacheexpr-v5.patch text/x-patch 118.5 KB

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-12 20:42:38
Message-ID: CABRT9RBSa-L3ps9pP2uYsoFDxVO=VpX3Lhp6=NM9FPk0HGSEgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 4, 2011 at 22:53, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I wonder if it would be better to add the CacheExpr nodes to the tree as a
> separate pass, instead of shoehorning it into eval_const_expressions? I
> think would be more readable that way, even though a separate pass would be
> more expensive. And there are callers of eval_const_expressions() that have
> no use for the caching, like process_implied_equality().

I had an idea how to implement this approach with minimal performance
impact. It might well be a dumb idea, but I wanted to get it out
there.

The 'struct Expr' type could have another attribute that stores
whether its sub-expressions contain stable/volatile functions, and
whether it only contains of constant arguments. This attribute would
be filled in for every Expr by eval_const_expressions.

Based on this information, ExecInitExpr (which is pretty simple for
now) could decide where to inject CacheExprState nodes. It's easier to
make that decision there, since by that stage the cachability of the
parent node with each argument is already known. (Currently I have to
stick all cachable arguments in a temporary list until I've determined
whether all arguments are cachable, or just some of them are)

This would decouple expression caching from the planner entirely;
CacheExpr wouldn't exist anymore. AFAICT this would also let us get
rid of contain_mutable_functions_walker() and possibly others.

Regards,
Marti


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-12 21:08:04
Message-ID: 9104.1323724084@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> The 'struct Expr' type could have another attribute that stores
> whether its sub-expressions contain stable/volatile functions, and
> whether it only contains of constant arguments. This attribute would
> be filled in for every Expr by eval_const_expressions.

Hmm. I'm a bit hesitant to burn the extra space that would be required
for that. On the other hand it is a relatively clean solution to
postponing the whether-to-actually-cache decisions until runtime.
But on the third hand, I don't know whether this is really much cleaner
than always injecting CacheExpr and having some runtime flag that
prevents it from caching. The argument that CacheExpr could confuse
equal() checks applies just as much to this, unless you make some ad-hoc
decision that equal() should ignore these flag bits.

> This would decouple expression caching from the planner entirely;
> CacheExpr wouldn't exist anymore. AFAICT this would also let us get
> rid of contain_mutable_functions_walker() and possibly others.

Yeah, you could imagine getting rid of tree walks for both mutability
and returns-set tests, and maybe other things too --- once you've paid
the price of an expression attributes flag word, there's room for quite
a few bits in there. Don't know offhand if that's attractive enough to
tip the scales.

regards, tom lane


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-16 07:13:21
Message-ID: 4EEAEF91.6070400@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I think what would be helpful here next is a self-contained benchmarking
script. You posted an outline of what you did for the original version
at
http://archives.postgresql.org/message-id/CABRT9RC-1wGxZC_Z5mwkdk70fgY2DRX3sLXzdP4voBKuKPZDow@mail.gmail.com

It doesn't sound like it would be hard to turn that into a little shell
script. I'd rather see one from you mainly so I don't have to worry
about whether I duplicated your procedure correctly or not. Don't even
worry about "best of 3", just run it three times, grep out the line that
shows the TPS number, and show them all. One of the first questions I
have is whether the performance changed between there and your v5. A
self-contained and fully automated performance test case would ease
review, and give some performance regression testing for other suggested
changes to use as a reference.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-16 11:07:41
Message-ID: CABRT9RA_eH7Zru+Q1W+ovW3n=nC3=pW3Azmp=--=nLUMvThyOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 16, 2011 at 09:13, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> I think what would be helpful here next is a self-contained benchmarking
> script.

Alright, a simple script is attached.

> One of the first questions I have is whether
> the performance changed between there and your v5.

Not those testcases anyway, since the executor side changed very
little since my original patch. I'm more worried about the planner, as
that's where the bulk of the code is. There is a small but measurable
regression there (hence the latter 2 tests).

I'm running with shared_buffers=256MB

Unpatched Postgres (git commit 6d09b210):

select * from ts where ts between to_timestamp('2005-01-01',
'YYYY-MM-DD') and to_timestamp('2005-01-01', 'YYYY-MM-DD');
tps = 1.194965 (excluding connections establishing)
tps = 1.187269 (excluding connections establishing)
tps = 1.192899 (excluding connections establishing)
select * from ts where ts>now();
tps = 8.986270 (excluding connections establishing)
tps = 8.974889 (excluding connections establishing)
tps = 8.976249 (excluding connections establishing)
/*uncachable*/ select * from one where ts >=
to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') and ts <
(to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') + interval '1
year')
tps = 11647.167712 (excluding connections establishing)
tps = 11836.858624 (excluding connections establishing)
tps = 11658.372658 (excluding connections establishing)
/*cachable*/ select * from one where ts >= to_date(now()::date::text,
'YYYY-MM-DD') and ts < (to_date(now()::date::text, 'YYYY-MM-DD') +
interval '1 year')
tps = 9762.035996 (excluding connections establishing)
tps = 9695.627270 (excluding connections establishing)
tps = 9791.141908 (excluding connections establishing)

Patched Postgres (v5 applied on top of above commit):

select * from ts where ts between to_timestamp('2005-01-01',
'YYYY-MM-DD') and to_timestamp('2005-01-01', 'YYYY-MM-DD');
tps = 8.580669 (excluding connections establishing)
tps = 8.583070 (excluding connections establishing)
tps = 8.544887 (excluding connections establishing)
select * from ts where ts>now();
tps = 10.467226 (excluding connections establishing)
tps = 10.429396 (excluding connections establishing)
tps = 10.441230 (excluding connections establishing)
/*uncachable*/ select * from one where ts >=
to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') and ts <
(to_date(clock_timestamp()::date::text, 'YYYY-MM-DD') + interval '1
year')
tps = 11578.768193 (excluding connections establishing)
tps = 11594.920258 (excluding connections establishing)
tps = 11667.866443 (excluding connections establishing)
/*cachable*/ select * from one where ts >= to_date(now()::date::text,
'YYYY-MM-DD') and ts < (to_date(now()::date::text, 'YYYY-MM-DD') +
interval '1 year')
tps = 9505.943115 (excluding connections establishing)
tps = 9502.316023 (excluding connections establishing)
tps = 9546.047208 (excluding connections establishing)

Regards,
Marti

Attachment Content-Type Size
bench_cache.sh application/x-sh 1.2 KB

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-16 16:08:58
Message-ID: 4EEB6D1A.4000306@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

That looks easy enough to try out now, thanks for the script.

Jaime was interested in trying this out, and I hope he still finds time
to do that soon. Now that things have settled down and it's obvious how
to start testing, that should be a project he can take on. I don't
think this is going to reach ready to commit in the next few days
though, so I'm going to mark it as returned for this CommitFest. The
results you're showing are quite interesting, perhaps someone can help
sorting out the parsing area that's accidentally being decelerated.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-16 16:54:51
Message-ID: CABRT9RC46y4cTSM-=EbfaNscE+XrCZDJ66hboO8MgPJEuZR3nA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 16, 2011 at 18:08, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> I don't think this
> is going to reach ready to commit in the next few days though, so I'm going
> to mark it as returned for this CommitFest.

Fair enough, I just hope this doesn't get dragged into the next
commitfest without feedback.

> perhaps someone can help sorting out the parsing area
> that's accidentally being decelerated.

Well the slowdown isn't "accidental", I think it's expected since I'm
adding a fair bit of code to expression processing (which isn't all
pretty).

It could be reduced by doing the caching decisions in a 2nd pass,
inside ExecInitExpr, but it would mean adding an extra field to
'struct Expr' and require a significant rewrite of the patch. I'm not
sure if it's worthwhile to attempt that approach:
http://archives.postgresql.org/pgsql-hackers/2011-12/msg00483.php

Regards,
Marti


From: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-16 19:32:02
Message-ID: CAJKUy5iQZ59pW-mX7hn1U-XoHs3Jp5LdMs0HL_YctVYy7t6qDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 16, 2011 at 11:08 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> That looks easy enough to try out now, thanks for the script.
>
> Jaime was interested in trying this out, and I hope he still finds time to
> do that soon.

Actually i tried some benchmarks with the original version of the
patch and saw some regression with normal pgbench runs, but it wasn't
much... so i was trying to found out some queries that show benefit
now that we have it, that will be a lot more easier

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Caching for stable expressions with constant arguments v3
Date: 2011-12-17 17:24:02
Message-ID: CABRT9RD=2zDnLzQEYU7_SsgR7nJ5ZOazDm6uAg4_V1FxFPxPNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 16, 2011 at 21:32, Jaime Casanova <jaime(at)2ndquadrant(dot)com> wrote:
> Actually i tried some benchmarks with the original version of the
> patch and saw some regression with normal pgbench runs, but it wasn't
> much... so i was trying to found out some queries that show benefit
> now that we have it, that will be a lot more easier

Ah, one more trick for testing this patch: if you build with
-DDEBUG_CACHEEXPR=1 then EXPLAIN VERBOSE displays cached
subexpressions between a CACHE[...] marker.

Regards,
Marti


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Caching for stable expressions with constant arguments v3
Date: 2021-04-19 15:53:18
Message-ID: CA+Tgmobi-GRMPjWeFYFZ6NeuJaKOtdy1W9Vt8dG1TTzOzDLNXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 17, 2011 at 12:25 PM Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> Ah, one more trick for testing this patch: if you build with
> -DDEBUG_CACHEEXPR=1 then EXPLAIN VERBOSE displays cached
> subexpressions between a CACHE[...] marker.

Unless I missed something, this 2011 thread is the latest one about
this patch, and the patch was never applied, and nothing similar ever
got done either. Is that correct? If so, was there any particular
reason why this wasn't judged to be a good idea, or did it just not
get enough attention?

This was on my mind today because of a FDW pushdown question that came
up. Someone observed that a predicate of the form foreign_table_column
pushable_operator hairy_but_stable_expression is not pushed down. In
theory, it could be: just compute the value of
hairy_but_stable_expression locally, and then send the result across.
The trick is that it requires identifying a maximal stable
subexpression. In this case, the whole expression isn't stable,
because every row will provide a different value for
foreign_table_column, but the subexpression which is the right child
of the toplevel OpExpr is stable. Identifying that seems like it could
be somewhat expensive and I seem to vaguely recall some discussion of
that issue, but I think not on this thread. But if this patch - or
some other one - happened to inject a CacheExpr at the right place in
the plan tree, then postgres_fdw could leverage that existing
determination in a pretty straightforward way, precomputing the cached
value locally and then pushing the clause if otherwise safe.

That's also just a fringe benefit. Marti's test results show
significant speedup in all-local cases, which seem likely to benefit a
pretty broad class of queries. We'd probably have to give some thought
to how the technique would work with Andres's rewrite of expression
evaluation, but I imagine that's a solvable problem.

--
Robert Haas
EDB: http://www.enterprisedb.com