Re: Index + mismatching datatypes [WAS: index on custom

Lists: pgsql-performance
From: "jan(dot)aerts(at)bbsrc(dot)ac(dot)uk" <jan(dot)aerts(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: index on custom function; explain
Date: 2005-10-03 15:14:11
Message-ID: 1128352451.312090.37640@g44g2000cwa.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi,

I'm trying to include a custom function in my SQL-queries, which
unfortunately leaves the server hanging...

I basically search through two tables:
* TABLE_MAPPING: lists that 'abc' is mapped to 'def'
id1 | name1 | id2 | name2
-------------------------
1 | abc | 2 | def
3 | uvw | 4 | xyz
This data means that 'abc' is mapped_to 'def', and 'uvw' is mapped_to
'xyz'. About 1,500,000 records in total.

* TABLE ALIASES: lists different aliases of the same thing
id1 | name1 | id2 | name2
-------------------------
3 | uvw | 2 | def
This data means that 'uvw' and 'def' are essentially the same thing.
About 820,000 records in total.

I have indexes on all columns of the above tables.

Based on the two tables above, 'abc' is indirectly mapped_to 'xyz' as
well (through 'def' also-known-as 'uvw').

I wrote this little function: aliases_of
CREATE FUNCTION aliases_of(INTEGER) RETURNS SETOF integer
AS 'SELECT $1
UNION
SELECT id1 FROM aliases WHERE id2 = $1
UNION
SELECT id2 FROM aliases WHERE id1 = $1
'
LANGUAGE SQL
STABLE;

A simple SELECT aliases_of(2) shows:
aliases_of
----------
2
3

Now, when I want to traverse the aliases, I would write a query as
follows:
SELECT m1.name1, m1.name2, m2.name1, m2.name2
FROM mappings m1, mappings m2
WHERE m1.name1 = 'abc'
AND m2.name1 IN (SELECT aliases_of(m1.name2));

Unfortunately, this query seems to keep running and to never stop...

An EXPLAIN of the above query shows:
QUERY PLAN
-----------------------------------------------------------------------------
Nested Loop (cost=0.00..118379.45 rows=1384837 width=80)
Join Filter: (subplan)
-> Index Scan using ind_cmappings_object1_id on c_mappings m1
(cost=0.00..7.08 rows=2 width=40)
Index Cond: (name1 = 'abc')
-> Seq Scan on c_mappings m2 (cost=0.00..35935.05 rows=1423805
width=40)
SubPlan
-> Result (cost=0.00..0.01 rows=1 width=0)
(7 rows)

Strangely enough, I _do_ get output when I type the following query:
SELECT m1.name1, m1.name2, m2.name1, m2.name2
FROM mappings m1, mappings m2
WHERE m1.name1 = 'abc'
AND m2.name1 IN (
SELECT m1.name2
UNION
SELECT name2 FROM aliases WHERE name1 = m1.name2
UNION
SELECT name1 FROM aliases WHERE name2 = m2.name1
);

The EXPLAIN for this query is:
QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (cost=0.00..36712030.90 rows=1384837 width=80)
Join Filter: (subplan)
-> Index Scan using ind_cmappings_object1_id on c_mappings m1
(cost=0.00..7.08 rows=2 width=40)
Index Cond: (object1_id = 16575564)
-> Seq Scan on c_mappings m2 (cost=0.00..35935.05 rows=1423805
width=40)
SubPlan
-> Unique (cost=13.21..13.23 rows=1 width=4)
-> Sort (cost=13.21..13.22 rows=3 width=4)
Sort Key: object2_id
-> Append (cost=0.00..13.18 rows=3 width=4)
-> Subquery Scan "*SELECT* 1" (cost=0.00..0.01
rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1
width=0)
-> Subquery Scan "*SELECT* 2" (cost=0.00..5.92
rows=1 width=4)
-> Index Scan using
ind_caliases_object2_id on c_aliases (cost=0.00..5.92 rows=1 width=4)
Index Cond: (object2_id = $0)
-> Subquery Scan "*SELECT* 3" (cost=0.00..7.25
rows=1 width=4)
-> Index Scan using
ind_caliases_object1_id on c_aliases (cost=0.00..7.25 rows=1 width=4)
Index Cond: (object1_id = $0)
(18 rows)

So my questions are:
* Does anyone have any idea how I can integrate a function that lists
all aliases for a given name into such a mapping query?
* Does the STABLE keyword in the function definition make the function
to read all its data into memory?
* Is there a way to let postgres use an "Index scan" on that function
instead of a "seq scan"?

Any help very much appreciated,
Jan Aerts


From: "Jan Aerts" <jan(dot)aerts(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: index on custom function; explain
Date: 2005-10-04 10:10:28
Message-ID: 1128420628.326795.158780@g47g2000cwa.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Some additional thoughts: what appears to take the most time (i.e.
account for the highest cost in the explain), is _not_ running the
function itself (cost=0.00..0.01), but comparing the result from that
function with the name1 column in the mappings table
(cost=0.00..35935.05). Am I right? (See EXPLAIN in previous post.) If
so: that's pretty strange, because the name1-column in the mappings
table is indexed...

jan.


From: Yann Michel <yann-postgresql(at)spline(dot)de>
To: "jan(dot)aerts(at)bbsrc(dot)ac(dot)uk" <jan(dot)aerts(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: index on custom function; explain
Date: 2005-10-06 08:19:19
Message-ID: 20051006081919.GA17081@zoom.spline.inf.fu-berlin.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi,

On Mon, Oct 03, 2005 at 08:14:11AM -0700, jan(dot)aerts(at)bbsrc(dot)ac(dot)uk wrote:
> So my questions are:
> * Does anyone have any idea how I can integrate a function that lists
> all aliases for a given name into such a mapping query?

what version are you using?

TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Cheers,
Yann


From: Andrew McMillan <andrew(at)catalyst(dot)net(dot)nz>
To: Jan Aerts <jan(dot)aerts(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: index on custom function; explain
Date: 2005-10-06 18:50:01
Message-ID: 1128624601.17294.86.camel@lamb.mcmillan.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Tue, 2005-10-04 at 03:10 -0700, Jan Aerts wrote:
> Some additional thoughts: what appears to take the most time (i.e.
> account for the highest cost in the explain), is _not_ running the
> function itself (cost=0.00..0.01), but comparing the result from that
> function with the name1 column in the mappings table
> (cost=0.00..35935.05). Am I right? (See EXPLAIN in previous post.) If
> so: that's pretty strange, because the name1-column in the mappings
> table is indexed...

35935.05 is for the loop, 0.01 is for the operation within the loop.

What version of PostgreSQL is this? Some old versions were not good at
handling the IN ( ... ) clause.

Also, PostgreSQL doesn't always do a wonderful job of considering the
activities of a function into the design of the query plans. Sometimes
this can be a blessing, but not in this case.

Cheers,
Andrew.

-------------------------------------------------------------------------
Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/ PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201 MOB: +64(272)DEBIAN OFFICE: +64(4)499-2267
It is truth which you cannot contradict; you can without any difficulty
contradict Socrates. - Plato

-------------------------------------------------------------------------


From: Enrico Weigelt <weigelt(at)metux(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Index + mismatching datatypes [WAS: index on custom function; explain]
Date: 2005-11-07 18:07:38
Message-ID: 20051107180738.GC15990@nibiru.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

* Yann Michel <yann-postgresql(at)spline(dot)de> wrote:

> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match

I've got a similar problem: I have to match different datatypes,
ie. bigint vs. integer vs. oid.

Of course I tried to use casted index (aka ON (foo::oid)), but
it didn't work.

What am I doing wrong ?

cu
--
---------------------------------------------------------------------
Enrico Weigelt == metux IT service
phone: +49 36207 519931 www: http://www.metux.de/
fax: +49 36207 519932 email: contact(at)metux(dot)de
---------------------------------------------------------------------
Realtime Forex/Stock Exchange trading powered by postgreSQL :))
http://www.fxignal.net/
---------------------------------------------------------------------


From: Neil Conway <neilc(at)samurai(dot)com>
To: weigelt(at)metux(dot)de
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Index + mismatching datatypes [WAS: index on custom
Date: 2005-11-07 20:45:57
Message-ID: 1131396357.6884.110.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, 2005-07-11 at 19:07 +0100, Enrico Weigelt wrote:
> I've got a similar problem: I have to match different datatypes,
> ie. bigint vs. integer vs. oid.
>
> Of course I tried to use casted index (aka ON (foo::oid)), but
> it didn't work.

Don't include the cast in the index definition, include it in the query
itself:

SELECT ... FROM foo WHERE int8col = 5::int8

for example. Alternatively, upgrade to 8.0 or better, which doesn't
require this workaround.

-Neil