subselect instead of a view...

Lists: pgsql-sql
From: "Dan Langille" <dan(at)langille(dot)org>
To: pgsql-sql(at)postgresql(dot)org
Subject: subselect instead of a view...
Date: 2002-11-26 03:15:41
Message-ID: 3DE2A10D.5202.D543B95A@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Create view WLE_pathnames as
SELECT E.name, EP.pathname
FROM element E,
element_pathnames EP,
watch_list_element WLE
WHERE WLE.watch_list_id = 3724
and WLE.element_id = E.id
and E.id = EP.id;
name | pathname
-------------+-----------------------------
euchre | /ports/games/euchre
reallyslick | /ports/graphics/reallyslick

The above query is pretty well optimized:

Nested Loop (cost=0.00..647.08 rows=62 width=61) (actual
time=0.99..1.19 rows=2 loops=1)
-> Nested Loop (cost=0.00..437.06 rows=62 width=20) (actual
time=0.66..0.78 rows=2 loops=1)
-> Index Scan using watch_list_element_pkey on
watch_list_element wle (cost=0.00..229.64 rows=62 width=4) (actual
time=0.34..0.36 rows=2 loops=1)
-> Index Scan using element_pkey on element e
(cost=0.00..3.34 rows=1 width=16) (actual time=0.16..0.17 rows=1
loops=2)
-> Index Scan using element_pathnames_pkey on element_pathnames ep
(cost=0.00..3.38 rows=1 width=41) (actual time=0.16..0.17 rows=1
loops=2)
Total runtime: 1.44 msec

Now I want to get all the stuff from element_pathnames like
pathname || '/%'. Does that make sense?

Essentially, I want this using the above view:

explain analyze
SELECT E.id, CLE.commit_log_id, E.name, EP.pathname
FROM element E,
element_pathnames EP,
commit_log_elements CLE,
WLE_pathnames WLEP
WHERE E.id = EP.id
AND EP.pathname like WLEP.pathname || '/%'
AND CLE.element_id = E.id
ORDER BY EP.pathname;

I know this can be done better, I just can't figure out how. I keep
thinking of a subselect but I'm totally blocked. It must be bed
time.

Sort (cost=285579.85..285579.85 rows=67012 width=114) (actual
time=9463.95..9464.01 rows=11 loops=1)
-> Hash Join (cost=264060.42..272748.13 rows=67012 width=114)
(actual time=9154.69..9463.55 rows=11 loops=1)
-> Seq Scan on commit_log_elements cle (cost=0.00..3936.75
rows=216575 width=8) (actual time=0.18..1762.38 rows=216575 loops=1)
-> Hash (cost=263370.92..263370.92 rows=36997 width=106)
(actual time=5716.62..5716.62 rows=0 loops=1)
-> Hash Join (cost=258032.99..263370.92 rows=36997
width=106) (actual time=5524.78..5695.47 rows=10 loops=1)
-> Seq Scan on element e (cost=0.00..2286.70
rows=119570 width=16) (actual time=0.15..892.40 rows=119570 loops=1)
-> Hash (cost=257416.50..257416.50 rows=36997
width=90) (actual time=3481.05..3481.05 rows=0 loops=1)
-> Nested Loop (cost=0.00..257416.50
rows=36997 width=90) (actual time=1847.01..3465.54 rows=10 loops=1)
-> Nested Loop (cost=0.00..647.08
rows=62 width=49) (actual time=0.96..1.50 rows=2 loops=1)
-> Nested Loop
(cost=0.00..437.06 rows=62 width=8) (actual time=0.64..0.94 rows=2
loops=1)
-> Index Scan using
watch_list_element_pkey on watch_list_element wle (cost=0.00..229.64
rows=62 width=4) (actual time=0.34..0.37 rows=2 loops=1)
-> Index Scan using
element_pkey on element e (cost=0.00..3.34 rows=1 width=4) (actual
time=0.21..0.22 rows=1 loops=2)
-> Index Scan using
element_pathnames_pkey on element_pathnames ep (cost=0.00..3.38
rows=1 width=41) (actual time=0.21..0.23 rows=1 loops=2)
-> Seq Scan on element_pathnames ep
(cost=0.00..2355.70 rows=119570 width=41) (actual time=0.08..858.74
rows=119570 loops=2)
Total runtime: 9464.51 msec

Clues please?
--
Dan Langille : http://www.langille.org/


From: "Dan Langille" <dan(at)langille(dot)org>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: subselect instead of a view...
Date: 2002-11-26 05:07:29
Message-ID: 3DE2BB41.18930.D5AA1552@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 25 Nov 2002 at 22:15, Dan Langille wrote:

> I know this can be done better, I just can't figure out how. I keep
> thinking of a subselect but I'm totally blocked. It must be bed time.

It's odd what reading the paper, relaxing with a book, and then
trying to sleep can generate. There I was, almost dropping off, when
I realised I needed this:

SELECT E.name, EP.pathname, E.id, EP2.pathname, CL.id
FROM element E,
element_pathnames EP,
watch_list_element WLE,
element_pathnames EP2,
element E2,
commit_log_elements CLE,
commit_log CL
WHERE WLE.watch_list_id = 3724
and WLE.element_id = E.id
and E.id = EP.id
and EP2.pathname like EP.pathname || '/%'
AND EP2.id = E2.id
AND E2.id = CLE.element_id
AND CLE.commit_log_id = CL.id;

I am still suspicous of that like. It seems to be the performance
killer here. There is an index which can be used:

# explain select * from element_pathnames WHERE pathname like 'abc%';
NOTICE: QUERY PLAN:

Index Scan using element_pathnames_pathname on element_pathnames
(cost=0.00..5.80 rows=1 width=41)

But in the main query, it doesn't get picked up. The explain appears
below (and at http://www.freshports.org/tmp/explain.txt which will be
easier to read than this text-wrapped version). There are quite a
few sequential scans there. I'm confused as to why the indexes are
not being used. A "vacuum analyze" has been run.

Thanks.

Hash Join (cost=266574.28..279596.82 rows=67012 width=118)
-> Hash Join (cost=263685.03..272372.74 rows=67012 width=114)
-> Seq Scan on commit_log_elements cle (cost=0.00..3936.75
rows=216575 width=8)
-> Hash (cost=262995.54..262995.54 rows=36997 width=106)
-> Hash Join (cost=2994.62..262995.54 rows=36997
width=106)
-> Nested Loop (cost=0.00..257416.50 rows=36997
width=102)
-> Nested Loop (cost=0.00..647.08 rows=62
width=61)
-> Nested Loop (cost=0.00..437.06
rows=62 width=20)
-> Index Scan using
watch_list_element_pkey on watch_list_element wle (cost=0.00..229.64
rows=62 width=4)
-> Index Scan using
element_pkey on element e (cost=0.00..3.34 rows=1 width=16)
-> Index Scan using
element_pathnames_pkey on element_pathnames ep (cost=0.00..3.38
rows=1 width=41)
-> Seq Scan on element_pathnames ep2
(cost=0.00..2355.70 rows=119570 width=41)
-> Hash (cost=2286.70..2286.70 rows=119570
width=4)
-> Seq Scan on element e2
(cost=0.00..2286.70 rows=119570 width=4)
-> Hash (cost=2543.20..2543.20 rows=58420 width=4)
-> Seq Scan on commit_log cl (cost=0.00..2543.20 rows=58420
width=4)
--
Dan Langille : http://www.langille.org/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dan Langille" <dan(at)langille(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: subselect instead of a view...
Date: 2002-11-26 05:29:29
Message-ID: 6003.1038288569@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

"Dan Langille" <dan(at)langille(dot)org> writes:
> SELECT E.name, EP.pathname, E.id, EP2.pathname, CL.id
> ...
> and EP2.pathname like EP.pathname || '/%'

> I am still suspicous of that like. It seems to be the performance
> killer here. There is an index which can be used:

It won't be, though. The LIKE-to-indexscan transformation happens at
plan time, and that means it can only happen if the pattern is a
constant. Which it surely will not be in your example.

regards, tom lane


From: "Dan Langille" <dan(at)langille(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: subselect instead of a view...
Date: 2002-11-26 15:17:26
Message-ID: 3DE34A36.29690.D7D88D0C@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On 26 Nov 2002 at 0:29, Tom Lane wrote:

> "Dan Langille" <dan(at)langille(dot)org> writes:
> > SELECT E.name, EP.pathname, E.id, EP2.pathname, CL.id
> > ...
> > and EP2.pathname like EP.pathname || '/%'
>
> > I am still suspicous of that like. It seems to be the performance
> > killer here. There is an index which can be used:
>
> It won't be, though. The LIKE-to-indexscan transformation happens at
> plan time, and that means it can only happen if the pattern is a
> constant. Which it surely will not be in your example.

Thanks. I'll see if I can come up with something else to do this.
--
Dan Langille : http://www.langille.org/