Reducing expression evaluation overhead

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Reducing expression evaluation overhead
Date: 2004-03-15 23:39:57
Message-ID: 8715.1079393997@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been looking at an example provided by Arjen van der Meijden in
which practically all the runtime goes into evaluating very trivial
comparison expressions (it's basically a CASE statement with a huge
number of arms). Whether or not you think that a CASE with a huge
number of arms is a particularly sane thing to do, it does seem that
the overhead is uncomfortably large. I get profiles like this:

% cumulative self self total
time seconds seconds calls s/call s/call name
38.15 41.92 41.92 229646 0.00 0.00 nconc
21.76 65.84 23.92 199054454 0.00 0.00 ExecEvalExpr
11.38 78.34 12.50 10000 0.00 0.00 ExecEvalCase
8.43 87.61 9.27 66348151 0.00 0.00 ExecEvalFuncArgs
8.12 96.54 8.93 66348151 0.00 0.00 ExecMakeFunctionResult
2.96 99.78 3.25 66348151 0.00 0.00 ExecEvalVar
1.23 101.14 1.36 10058 0.00 0.00 AllocSetCheck
1.23 102.49 1.35 66348151 0.00 0.00 ExecEvalOper
1.12 103.72 1.24 76537 0.00 0.00 OpernameGetCandidates
0.85 104.66 0.94 66424693 0.00 0.00 int4eq

% cumulative self self total
time seconds seconds calls ms/call ms/call name
25.24 161.92 161.92 51488 3.14 3.14 nconc
20.99 296.57 134.65 172163502 0.00 0.00 ExecEvalExpr
14.90 392.18 95.61 _mcount
9.66 454.16 61.98 57381167 0.00 0.00 ExecMakeFunctionResult
7.32 501.12 46.96 10000 4.70 4.70 ExecEvalCase
7.24 547.60 46.48 57381167 0.00 0.00 ExecEvalFuncArgs
6.47 589.09 41.49 57381167 0.00 0.00 ExecEvalVar
2.90 607.69 18.60 57381167 0.00 0.00 ExecEvalOper
1.29 615.94 8.25 int4eq

The above are from different-size test cases on different hardware,
just to make sure that the data is reasonably trustworthy.

The nconc entry may be ignored; it's lappend() overhead from the parser,
and will drop into the noise once Neil gets his list rewrite done. The
thing that jumps out at me is that ExecEvalExpr is taking way more
cycles than it has any right to do.

The test conditions in the CASE are of the form "var = constant", and so
for each tested condition, ExecEvalExpr is invoked three times: one call
is passed off to ExecEvalVar, one to ExecEvalOper, and the constant is
handled inline by fairly trivial code, which is surely a good bit
cheaper than ExecEvalVar. So it seems that a very large fraction of the
runtime is being spent just in ExecEvalExpr's switching function, plus
the overhead of an extra level of function call to reach the routines
that actually do the work.

I am thinking about attacking this by expanding ExprState nodes to
include a function pointer to the appropriate evaluation subroutine
(ExecEvalVar, ExecEvalOper, etc). ExecEvalExpr would be replaced by
a macro like this:

#define ExecEvalExpr(expr, econtext, isNull, isDone) \
((*(expr)->evalfunc) (expr, econtext, isNull, isDone))

This would eliminate the overhead of the switch() in ExecEvalExpr, and
reduce two levels of function call to one for most node types, at the
cost of whatever the delta is between a direct and an indirect function
call. In my experience that delta is pretty small and we should come
out ahead.

The main downside I can see is adding another four or eight bytes to
each ExprState node, which might be a noticeable memory burden with
complex expressions. A lesser objection is that unless we wanted to
complicate and slow down the ExecEvalExpr macro, we'd have to assume
that the macro is never passed a null expression pointer. Right now
ExecEvalExpr treats a NULL input as if it were a null constant node.
However, I'm fairly sure that that test is useless because no one ever
tries to evaluate a null expression tree.

We could make some other marginal hacks too once we did this. For
instance, ExecEvalOper just hands off control to ExecMakeFunctionResult
once it's made sure the fcache has been initialized. We could have it
do that initialization and then change the evalfunc pointer to point
directly to ExecMakeFunctionResult, thereby eliminating another level of
function-call overhead on all subsequent calls. There are other tests
that could be reduced from every-time to one-time overhead with similar
tricks.

I'm not sure that this would let us catch up to what Arjen reports as
MySQL's expression evaluation speed, but it should at least speed things
up a bit with only fairly localized changes.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Philip Warner 2004-03-16 00:04:30 Re: Custom format for pg_dumpall
Previous Message Tom Lane 2004-03-15 23:14:42 Re: listening addresses