Re: Performance penalty when using WITH

From: Robert Klemme <shortcutter(at)googlemail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Li Jin <ljin(at)tripadvisor(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance penalty when using WITH
Date: 2011-08-03 17:30:46
Message-ID: CAM9pMnNpZ4Sq5okZYGJdojki16a1nsLXXr-UuFDWJT5UfAoAHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Aug 3, 2011 at 6:24 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Wed, Aug 3, 2011 at 2:18 AM, Robert Klemme
> <shortcutter(at)googlemail(dot)com> wrote:
>>>> Another observation: That criterion looks suspicious to me. I would
>>>> expect any RDBMS to be better able to optimize this:
>>>>
>>>> WHERE firstname = 'Eddie' AND lastname like 'T%'
>>>>
>>>> I know it's semantically not the same but I would assume this is good
>>>> enough for the common usecase.  Plus, if there is an index on
>>>> (firstname, lastname) then that could be used.
>>>
>>> disagree. just one of the ways that could be stymied would to change
>>> the function behind the '||' operator.
>>
>> I don't understand what you mean.  Can you please elaborate?
>>
>> To explain my point a bit: I meant that by querying individual fields
>> separately instead of applying a criterion on a function of the two
>> the RDBMS has a better chance to use indexes and come up with a better
>> plan for this part of the query.
>
> Yes, but your assuming that it is safe and generally advantageous to
> do that.  Both assumptions I think are false.

I am not sure why you say I assume this is _safe_. I said it is "good
enough for the common usecase". And it is certainly good enough for
this particular query.

As for the "generally advantageous" I'd say that an index on "raw"
column values is usually useful for more queries than an index on a
specific function. That's why I'd say generally an index on column
values is more versatile and I would prefer it. Of course you might
achieve orders of magnitude of speedup for individual queries with an
index on a function tailored to that particular query but if you need
to do that for multiple queries you pay a higher penalty for updates.

> The || operator is trivially hacked:
> create or replace function funky_concat(l text, r text) returns text as
> $$
>  select textcat(textcat($1, 'abc'), $2);
> $$ language sql immutable ;
>
> update pg_operator set oprcode = 'funky_concat' where oid = 654;
>
> postgres=# select 'a' || 'b';
> ?column?
> ----------
>  aabcb
> (1 row)
>
> Also even ignoring the above it's not free to have the database try
> and analyze every instance of the || operator to see if it can be
> decomposed to boolean field operations.

Even with your hacked operator you would need an index on the
expression to make it efficient. That could be done with the original
|| as well. But my point was to query

WHERE a = 'foo' and b like 'b%'
instead of WHERE a || ' ' || substring(b, 1, 1) = 'foo b'

to use an index on (a,b). That index would also be useful for queries like

WHERE a = 'foo'
WHERE a like 'fo%'
WHERE a = 'foo' and b = 'bar'

and probably also

WHERE a > 'foo'
WHERE a > 'foo' and b like 'b%'
WHERE a > 'foo' and b = 'bar'

Kind regards

robert

PS: Sorry for the earlier duplicate. Gmail had a hickup.

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Clem Dickey 2011-08-04 01:53:19 Re: Re: GROUP BY with reasonable timings in PLAN but unreasonable execution time
Previous Message Kevin Grittner 2011-08-03 17:29:47 Re: Postgres performance on Linux and Windows