Re: immutable functions vs. join for lookups ?

Lists: pgsql-performance
From: Enrico Weigelt <weigelt(at)metux(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: immutable functions vs. join for lookups ?
Date: 2005-04-15 20:55:11
Message-ID: 20050415205511.GB20345@nibiru.borg.metux.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


Hi folks,

I like to use (immutable) functions for looking up serveral
(almost constant) things, i.e fetching a username by id.
This makes my queries more clear.

But is this really performant ?

Lets imagine:

We've got an table with user accounts (uid,name,...). Then we've
got another one which contains some items assigned to users, and
so are linked to them by an uid field.
Now want to view the items with usernames instead of just uid:

a) SELECT items.a, items.b, ..., users.username FROM items, users
WHERE items.uid = users.uid;

c) CREATE FUNCTION id2username(oid) RETURNS text
LANGUAGE 'SQL' IMMUTABLE AS '
SELECT username AS RESULT FROM users WHERE uid = $1';

SELECT items.a, items.b, ..., id2username(users.uid);

Which one is faster with
a) only a few users (<50)
b) many users ( >1k )
while we have several 10k of items ?

thx
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service

phone: +49 36207 519931 www: http://www.metux.de/
fax: +49 36207 519932 email: contact(at)metux(dot)de
cellphone: +49 174 7066481
---------------------------------------------------------------------
-- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: weigelt(at)metux(dot)de
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-15 21:12:47
Message-ID: 6776.1113599567@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Enrico Weigelt <weigelt(at)metux(dot)de> writes:
> c) CREATE FUNCTION id2username(oid) RETURNS text
> LANGUAGE 'SQL' IMMUTABLE AS '
> SELECT username AS RESULT FROM users WHERE uid = $1';

This is simply dangerous. The function is *NOT* immutable (it is
stable though). When ... not if ... your application breaks because
you got the wrong answers, you'll get no sympathy from anyone.

The correct question to ask was "if I make a stable function like
this, is it likely to be faster than the join?". The answer is
"probably not; at best it will be equal to the join". The best the
planner is likely to be able to do with the function-based query
is equivalent to a nestloop with inner indexscan (assuming there is
an index on users.uid). If that's the best plan then the join case
should find it too ... but if you are selecting a lot of items rows
then it won't be the best plan.

regards, tom lane


From: Enrico Weigelt <weigelt(at)metux(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-17 06:06:04
Message-ID: 20050417060603.GA23283@nibiru.borg.metux.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Enrico Weigelt <weigelt(at)metux(dot)de> writes:
> > c) CREATE FUNCTION id2username(oid) RETURNS text
> > LANGUAGE 'SQL' IMMUTABLE AS '
> > SELECT username AS RESULT FROM users WHERE uid = $1';
>
> This is simply dangerous. The function is *NOT* immutable (it is
> stable though). When ... not if ... your application breaks because
> you got the wrong answers, you'll get no sympathy from anyone.

In my case it is immutable. The username never changes.

cu
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service

phone: +49 36207 519931 www: http://www.metux.de/
fax: +49 36207 519932 email: contact(at)metux(dot)de
cellphone: +49 174 7066481
---------------------------------------------------------------------
-- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------


From: Jaime Casanova <systemguards(at)gmail(dot)com>
To: weigelt(at)metux(dot)de, pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-17 08:37:07
Message-ID: c2d9e70e05041701372aee3a2c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 4/17/05, Enrico Weigelt <weigelt(at)metux(dot)de> wrote:
> * Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Enrico Weigelt <weigelt(at)metux(dot)de> writes:
> > > c) CREATE FUNCTION id2username(oid) RETURNS text
> > > LANGUAGE 'SQL' IMMUTABLE AS '
> > > SELECT username AS RESULT FROM users WHERE uid = $1';
> >
> > This is simply dangerous. The function is *NOT* immutable (it is
> > stable though). When ... not if ... your application breaks because
> > you got the wrong answers, you'll get no sympathy from anyone.
>
> In my case it is immutable. The username never changes.
>
Even if your data never changes it *can* change so the function should
be at most stable not immutable.

regards,
Jaime Casanova


From: Dawid Kuroczko <qnex42(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-18 09:00:38
Message-ID: 758d5e7f05041802004ee56be4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 4/15/05, Enrico Weigelt <weigelt(at)metux(dot)de> wrote:
> a) SELECT items.a, items.b, ..., users.username FROM items, users
> WHERE items.uid = users.uid;
>
> c) CREATE FUNCTION id2username(oid) RETURNS text
> LANGUAGE 'SQL' IMMUTABLE AS '
> SELECT username AS RESULT FROM users WHERE uid = $1';

You will be told that this function is not immutable but stable, and this
is quite right. But consider such a function:

CREATE OR REPLACE FUNCTION id2username (oid int) RETURNS TEXT AS $$
BEGIN
IF oid = 0 THEN RETURN 'foo';
ELSIF oid = 1 THEN RETURN 'bar';
END IF;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

versus a lookup table with similar data. Logic suggests it should be faster
than a table... It got me worried when I added: "RAISE WARNING 'Called'"
after begin and I got lots of "Called" warnings when using this IMMUTABLE
function in select... And the timings for ~6000 values in aaa table
(and two values in lookup table) are:

There is a query, output of the EXPLAIN ANALYZE, Time of EXPLAIN
ANALYZE and "Real time" of SELECT (without EXPLAIN ANALYZE):

a) simple select from temp table, and a lookup cost:
EXPLAIN ANALYZE SELECT n FROM aaa;
Seq Scan on aaa (cost=0.00..87.92 rows=5992 width=4) (actual
time=0.011..24.849 rows=6144 loops=1)
Total runtime: 51.881 ms
(2 rows)
Time: 52,882 ms
Real time: 16,261 ms

EXPLAIN ANALYZE SELECT id2username(n) FROM aaa limit 2;
Limit (cost=0.00..0.03 rows=2 width=4) (actual time=0.111..0.150
rows=2 loops=1)
-> Seq Scan on aaa (cost=0.00..104.80 rows=6144 width=4) (actual
time=0.102..0.129 rows=2 loops=1)
Total runtime: 0.224 ms
(3 rows)
Time: 1,308 ms
Real time: 1,380 ms

b) natural join with lookup table:
EXPLAIN ANALYZE SELECT username FROM aaa NATURAL JOIN lookup;
Hash Join (cost=2.45..155.09 rows=3476 width=32) (actual
time=0.306..83.677 rows=6144 loops=1)
Hash Cond: ("outer".n = "inner".n)
-> Seq Scan on aaa (cost=0.00..87.92 rows=5992 width=4) (actual
time=0.006..25.517 rows=6144 loops=1)
-> Hash (cost=2.16..2.16 rows=116 width=36) (actual
time=0.237..0.237 rows=0 loops=1)
-> Seq Scan on lookup (cost=0.00..2.16 rows=116 width=36)
(actual time=0.016..0.034 rows=2 loops=1)
Total runtime: 107.378 ms
(6 rows)
Time: 109,040 ms
Real time: 25,364 ms

c) IMMUTABLE "static" lookup function:
EXPLAIN ANALYZE SELECT id2username(n) FROM aaa;
Seq Scan on aaa (cost=0.00..104.80 rows=6144 width=4) (actual
time=0.089..116.397 rows=6144 loops=1)
Total runtime: 143.800 ms
(2 rows)
Time: 144,869 ms
Real time: 102,428 ms

d) self-join with a function ;)
EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
aaa USING (n);
Hash Left Join (cost=506.82..688.42 rows=6144 width=36) (actual
time=102.382..182.661 rows=6144 loops=1)
Hash Cond: ("outer".n = "inner".n)
-> Seq Scan on aaa (cost=0.00..89.44 rows=6144 width=4) (actual
time=0.012..24.360 rows=6144 loops=1)
-> Hash (cost=506.82..506.82 rows=2 width=36) (actual
time=102.217..102.217 rows=0 loops=1)
-> Subquery Scan v_lookup (cost=476.05..506.82 rows=2
width=36) (actual time=53.626..102.057 rows=2 loops=1)
-> Subquery Scan "values" (cost=476.05..506.80 rows=2
width=4) (actual time=53.613..102.023 rows=2 loops=1)
-> Unique (cost=476.05..506.77 rows=2 width=4)
(actual time=53.456..101.772 rows=2 loops=1)
-> Sort (cost=476.05..491.41 rows=6144
width=4) (actual time=53.440..76.710 rows=6144 loops=1)
Sort Key: n
-> Seq Scan on aaa
(cost=0.00..89.44 rows=6144 width=4) (actual time=0.013..26.626
rows=6144 loops=1)
Total runtime: 209.378 ms
(11 rows)
Time: 211,460 ms
Real time: 46,682 ms

...so this IMMUTABLE is twice as slow (~100 ms) as the query joining
itself with a SELECT DISTINCT on an IMMUTABLE function (~50 ms),
which is twice as slow as JOIN against lookup table (~25 ms), and I feel
this IMMUTABLE function could be around ~20 ms (~16 ms plus
calling the function two times plus giving the values).

Ah, and this is PostgreSQL 8.0.1 running under FreeBSD on a
CPU: Intel(R) Celeron(R) CPU 2.40GHz (2400.10-MHz 686-class CPU).

Regards,
Dawid

PS: I have a feeling that IMMUTABLE functions worked better in 7.4,
yet I am unable to confirm this.


From: Enrico Weigelt <weigelt(at)metux(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-21 19:22:26
Message-ID: 20050421192226.GB16548@nibiru.borg.metux.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

* Jaime Casanova <systemguards(at)gmail(dot)com> wrote:

<snip>
> Even if your data never changes it *can* change so the function should
> be at most stable not immutable.

okay, the planner sees that the table could potentionally change.
but - as the dba - I'd like to tell him, this table *never* changes
in practise (or at most there will be an insert once a year)

isnt there any way to enforce the function to be really immutable ?

cu
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service

phone: +49 36207 519931 www: http://www.metux.de/
fax: +49 36207 519932 email: contact(at)metux(dot)de
cellphone: +49 174 7066481
---------------------------------------------------------------------
-- DSL ab 0 Euro. -- statische IP -- UUCP -- Hosting -- Webshops --
---------------------------------------------------------------------


From: Dawid Kuroczko <qnex42(at)gmail(dot)com>
To: weigelt(at)metux(dot)de, pgsql-performance(at)postgresql(dot)org
Subject: Re: immutable functions vs. join for lookups ?
Date: 2005-04-22 10:15:24
Message-ID: 758d5e7f050422031579a04963@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 4/21/05, Enrico Weigelt <weigelt(at)metux(dot)de> wrote:
> <snip>
> > Even if your data never changes it *can* change so the function should
> > be at most stable not immutable.
>
> okay, the planner sees that the table could potentionally change.
> but - as the dba - I'd like to tell him, this table *never* changes
> in practise (or at most there will be an insert once a year)
>
> isnt there any way to enforce the function to be really immutable ?

Never say never. :)

And to answer your question -- your IMMUTABLE function may reference
other functions (even VOLATILE). So you may create a "caller" immutable
function which just calls your non-immutable function. But from
performance standpoint there is not much difference (probably your
STABLE function will be faster than STABLE inside IMMUTABLE function).

Ah, and please note that some time in future PostgreSQL may require
that IMMUTABLE function calls only IMMUTABLE functions.

Regards,
Dawid