Re: Composite Datums containing toasted fields are a bad idea(?)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Composite Datums containing toasted fields are a bad idea(?)
Date: 2014-04-27 18:18:46
Message-ID: 6517.1398622726@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On 2014-04-25 12:05:17 -0400, Tom Lane wrote:
>>> Meh ... is it likely that the columns involved in an ordering comparison
>>> would be so wide as to be toasted out-of-line? Such a query would only be
>>> fast if the row value were indexed, which would pretty much preclude use
>>> of wide columns.

> Just for some clarity, that also happens with expressions like:
> WHERE
> ROW(ev_class, rulename, ev_action) >= ROW('pg_rewrite'::regclass, '_RETURN', NULL)
> ORDER BY ROW(ev_class, rulename, ev_action);

> which is what is generated by such query generators - where the leading
> columns *are* indexed but not necessarily unique.

Ah, I see. Well, we're pretty darn stupid about such queries anyway :-(.
Your first example could be greatly improved by expanding the whole-row
Var into a ROW() construct (so that RowCompareExpr could be used), and
the second one by exploding the ROW() order-by into separate order-by
columns. Maybe someday we can do that, or persuade the query generators
not to generate such brain-dead SQL in the first place. But in the
meantime these coding techniques lead to highly suboptimal plans anyway,
with or without TOAST considerations. It's also worth noting that
it's merest luck that the existing code isn't *slower* about such
queries; if there were any significant number of comparisons of the
toasted columns occurring during the sort step, it could come out far
behind. So I'm not finding myself terribly concerned here.

Also, I did a bit more research and verified that my patch doesn't cause
any extra detoasting activity for simple set-returning-function cases,
for example:

regression=# create or replace function pgr() returns setof pg_rewrite as
'declare r pg_rewrite;
begin
for r in select * from pg_rewrite loop
return next r;
end loop;
end' language plpgsql;
CREATE FUNCTION

regression=# explain (analyze, buffers) select r.* from pgr() r;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Function Scan on pgr r (cost=0.25..10.25 rows=1000 width=135) (actual time=0.881..0.911 rows=177 loops=1)
Buffers: shared hit=36
Planning time: 0.059 ms
Execution time: 0.986 ms

The same for SQL-language functions, either inlined or not. It's not so
good if you insist on putting the SRF call in the targetlist:

explain (analyze, buffers) select pgr();
QUERY PLAN
------------------------------------------------------------------------------------------
Result (cost=0.00..5.25 rows=1000 width=0) (actual time=0.941..10.575 rows=177 loops=1)
Buffers: shared hit=179
Planning time: 0.029 ms
Execution time: 10.677 ms

On the other hand, in real-world usage (not EXPLAIN), a query like that is
certainly going to be detoasting all the fields anyway to return them to
the client.

On the whole I feel fairly good about the opinion that this change won't
be disastrous for mainstream usages, and will be beneficial for
performance some of the time. Since I'm not hearing any volunteers to
try to convert the other approach into a complete patch, I plan to push
forward with this one.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-04-27 19:19:32 Re: So why is EXPLAIN printing only *plan* time?
Previous Message Andreas Karlsson 2014-04-27 17:31:14 Re: So why is EXPLAIN printing only *plan* time?