Re: Patch for removng unused targets

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, "'Hitoshi Harada'" <umi(dot)tanuki(at)gmail(dot)com>, "'Alexander Korotkov'" <aekorotkov(at)gmail(dot)com>, "'pgsql-hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for removng unused targets
Date: 2013-08-02 19:13:49
Message-ID: 6543.1375470829@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> Thank you for the adjustments and comments! In addition to adding comments to
> the function, I've improved the code in the function a little bit. Please find
> attached an updated version of the patch.

I started looking at this patch (finally). I'm not terribly satisfied
with it, because it addresses only a very small part of what we really
need to do in this area, and I'm afraid we might end up throwing it away
in toto once we make the larger changes needed. I carped about this
a bit back in <15642(dot)1354650764(at)sss(dot)pgh(dot)pa(dot)us>, but to try to fill in
some background, consider a query like

select expensive_function(x) from t;

where, since the DBA is smart, t has an index on expensive_function(x).
Ideally we'd just scan the index and return values out of it, without
recomputing the expensive_function(). The planner is physically able
to produce such a plan, but only in very limited cases, and an even
bigger problem is that its cost accounting doesn't recognize the potential
savings from not evaluating expensive_function(x); therefore, even if it
can generate the right plan, it might discard it in favor of a plan that
doesn't use the index. This patch has got that same problem: it makes
a useful improvement in the finished plan if that plan is of the right
form, but it does nothing to push the planner to produce that form in
the first place.

Basically these problems stem from the assumption that we can treat all
scan/join paths as producing the same "flat" tlist (containing only Vars)
and only worry about tlist evaluation at the top level. I think the fix
will have to involve recognizing that certain paths can produce some
expressions more cheaply than others can, and explicitly including those
expressions in the returned tlists in such cases. That's going to be a
pretty invasive change. (Of course, the executor already works that way,
but the planner has never included such considerations at the Path stage.)

Now, the connection to the patch at hand is that if the query is

select x,y,z from t order by expensive_function(x);

this patch will successfully suppress calculation of the expensive
function, *if* we were lucky enough to make the right choice of plan
without considering the cost of the function. It's perfectly capable
of making the wrong choice though. This will lead to bug reports about
"the planner chooses a dumb plan, even though it knows the right plan is
cheaper when I force it to choose that one". I think it's possible to
revise the patch so that we do take the cost savings into account, at
least at the point in grouping_planner where it chooses between the
cheapest_path and the sorted_path returned by query_planner. (I'm not
sure if there are cases where query_planner would discard the best choice
at an earlier stage, but that seems possible in join queries.) But this
won't do anything for cases where the expensive function appears in the
SELECT list.

So as I said, I'm worried that this will be mostly bogus once we address
the larger problem. With the larger fix in place, the expensive_function
value could come out of the indexscan, and then the resjunk expression
would be nothing more than a Var referencing it, and hence hardly worth
suppressing.

Having said all that, there is one situation where this type of approach
might still be useful even after such a fix, and that's KNNGist-style
queries:

select a,b,c from t order by col <-> constant limit 10;

In a KNNGist search, there's no provision for the index AM to return the
actual value of the ORDER BY expression, and in fact it's theoretically
possible that that value is never even explicitly computed inside the
index AM. So we couldn't suppress the useless evaluation of <-> by dint
of requiring the physical scan to return that value as a Var.

Reading between the lines of the original submission at
<CAPpHfdtG5qoHoD+w=Tz3wC3fZ=b8i21=V5xandBFM=DTo-Yg=Q(at)mail(dot)gmail(dot)com>,
I gather that it's the KNNGist-style case that worries you, so maybe
it's worth applying this type of patch anyway. I'd want to rejigger
it to be aware of the cost implications though, at least for
grouping_planner's choices.

Comments?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-08-02 19:23:29 Re: Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: Proposal for Allow postgresql.conf values to be changed via SQL [review])
Previous Message Stephen Frost 2013-08-02 18:14:25 Re: Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: Proposal for Allow postgresql.conf values to be changed via SQL [review])