Re: help with getting index scan

Lists: pgsql-general
From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: help with getting index scan
Date: 2002-02-23 22:10:22
Message-ID: Pine.NEB.4.43.0202231600570.21797-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

i can't seem to get index scan to work on table phone_cat_address. here
are my schemas:

CREATE TABLE "phone_address" (
"id" integer DEFAULT nextval('"phone_address_id_seq"'::text) NOT
NULL,
"aid" bigint,
"name" character varying(96),
"address" character varying(60),
...
"nameftx" txtidx
);
CREATE UNIQUE INDEX phone_address_id_key
ON phone_address USING btree (id);
CREATE UNIQUE INDEX phone_address_aid_key
ON phone_address USING btree (aid);
CREATE INDEX phone_address_name_idx
ON phone_address USING btree (lower(name));
CREATE INDEX phone_address_nameftx_idx
ON phone_address USING gist (nameftx);

CREATE TABLE "phone_cat" (
"id" integer DEFAULT nextval('"phone_cat_id_seq"'::text) NOT NULL,
"cid" integer,
"name" character varying(96),
"popular" character(1) DEFAULT 'N',
"nameftx" txtidx
);

CREATE UNIQUE INDEX phone_cat_id_key ON phone_cat USING btree (id);
CREATE UNIQUE INDEX phone_cat_cid_key ON phone_cat USING btree (cid);
CREATE INDEX phone_cat_name_idx ON phone_cat USING btree (lower(name));
CREATE INDEX phone_cat_nameftx_idx ON phone_cat USING gist (nameftx);

CREATE TABLE "phone_cat_address" (
"cid" integer NOT NULL,
"aid" bigint NOT NULL
);

CREATE UNIQUE INDEX phone_cat_address_cid_key
ON phone_cat_address USING btree (cid, aid);

----

here is the explain:

yellowpages=# explain SELECT p.name,p.address,p.city,p.state
yellowpages-# FROM phone_address AS p, phone_cat AS pFROM phone_address AS
p, phone_cat AS pc, ph
one_cat_address AS pca
yellowpages-# WHERE pc.nameftx ## 'automobile&repair' AND pc.cid=pca.cid
AND pca.aid=p.aid
yellowpages-# ;
NOTICE: QUERY PLAN:

Nested Loop (cost=44.12..9272.76 rows=337 width=83)
-> Hash Join (cost=44.12..7243.86 rows=337 width=16)
-> Seq Scan on phone_cat_address pca (cost=0.00..5512.02
rows=336702 width=12)
-> Hash (cost=44.09..44.09 rows=11 width=4)
-> Index Scan using phone_cat_nameftx_idx on phone_cat pc
(cost=0.00..44.09 rows=
11 width=4)
-> Index Scan using phone_address_aid_key on phone_address p
(cost=0.00..6.01 rows=1 width=67
)

NOTICE: QUERY PLAN:

Nested Loop (cost=44.12..9272.76 rows=337 width=83)
-> Hash Join (cost=44.12..7243.86 rows=337 width=16)
-> Seq Scan on phone_cat_address pca (cost=0.00..5512.02
rows=336702 width=12)
-> Hash (cost=44.09..44.09 rows=11 width=4)
-> Index Scan using phone_cat_nameftx_idx on phone_cat pc
(cost=0.00..44.09 rows=
11 width=4)
-> Index Scan using phone_address_aid_key on phone_address p
(cost=0.00..6.01 rows=1 width=67
)

EXPLAIN


From: Doug McNaught <doug(at)wireboard(dot)com>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 15:10:15
Message-ID: m3y9hhzjfs.fsf@varsoon.denali.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas T. Thai" <tom(at)minnesota(dot)com> writes:

> i can't seem to get index scan to work on table phone_cat_address. here
> are my schemas:

Standard question: have you run VACUUM ANALYZE?

Also, the estimate of rows returned from the phone_cat_address scan is
pretty large--how large is the table itself? Sequential scan is
actually faster if you're going to end up returning most of the rows
in the table...

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Doug McNaught <doug(at)wireboard(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 15:22:06
Message-ID: Pine.NEB.4.43.0202250919490.26699-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 25 Feb 2002, Doug McNaught wrote:

> "Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
>
> > i can't seem to get index scan to work on table phone_cat_address. here
> > are my schemas:
>
> Standard question: have you run VACUUM ANALYZE?

yes :) before each test.

> Also, the estimate of rows returned from the phone_cat_address scan is
> pretty large--how large is the table itself? Sequential scan is
> actually faster if you're going to end up returning most of the rows
> in the table...

yellowpages=# select count(*) from phone_cat_address;
count
--------
336702
(1 row)

type typical results should be a tiny fraction of that number.

---
Thomas T. Thai | Minnesota.com | tom(at)minnesota(dot)com | 612.220.6220
Visit http://www.minnesota.com/


From: Doug McNaught <doug(at)wireboard(dot)com>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 15:38:13
Message-ID: m3pu2tzi56.fsf@varsoon.denali.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas T. Thai" <tom(at)minnesota(dot)com> writes:

> > > i can't seem to get index scan to work on table phone_cat_address. here
> > > are my schemas:
> >
> > Standard question: have you run VACUUM ANALYZE?
>
> yes :) before each test.

Excellent. ;)

>
> > Also, the estimate of rows returned from the phone_cat_address scan is
> > pretty large--how large is the table itself? Sequential scan is
> > actually faster if you're going to end up returning most of the rows
> > in the table...
>
> yellowpages=# select count(*) from phone_cat_address;
> count
> --------
> 336702
> (1 row)
>
> type typical results should be a tiny fraction of that number.

Well, EXPLAIN is indicating (unless I misread it) that the estimate of
rows returned is 336702, so it's not surprising that it opts for a
sequential scan. Is this under 7.1 or 7.2? The latter keeps much
better statistics about table populations...

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Doug McNaught <doug(at)wireboard(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 15:55:25
Message-ID: Pine.NEB.4.43.0202250943330.26810-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 25 Feb 2002, Doug McNaught wrote:

> > > Also, the estimate of rows returned from the phone_cat_address scan is
> > > pretty large--how large is the table itself? Sequential scan is
> > > actually faster if you're going to end up returning most of the rows
> > > in the table...
> >
> > yellowpages=# select count(*) from phone_cat_address;
> > count
> > --------
> > 336702
> > (1 row)
> >
> > type typical results should be a tiny fraction of that number.
>
> Well, EXPLAIN is indicating (unless I misread it) that the estimate of
> rows returned is 336702, so it's not surprising that it opts for a
> sequential scan. Is this under 7.1 or 7.2? The latter keeps much
> better statistics about table populations...

this is under 7.2. is there away to force it to use index scan? cause
right now when i'm searching using a cat reference, it's taking a few
seconds.

---
Thomas T. Thai | Minnesota.com | tom(at)minnesota(dot)com | 612.220.6220
Visit http://www.minnesota.com/


From: Doug McNaught <doug(at)wireboard(dot)com>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 16:00:51
Message-ID: m3lmdhzh3g.fsf@varsoon.denali.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas T. Thai" <tom(at)minnesota(dot)com> writes:

> On 25 Feb 2002, Doug McNaught wrote:
> > Well, EXPLAIN is indicating (unless I misread it) that the estimate of
> > rows returned is 336702, so it's not surprising that it opts for a
> > sequential scan. Is this under 7.1 or 7.2? The latter keeps much
> > better statistics about table populations...
>
> this is under 7.2. is there away to force it to use index scan? cause
> right now when i'm searching using a cat reference, it's taking a few
> seconds.

I'm still suspicious that something is wrong, but you can do

SET enable_seqscan TO off;

before your query and see if it helps your performance. If it makes a
significant difference let us know--Tom may be interested in trying to
improve the statistics.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 16:04:17
Message-ID: 21306.1014653057@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
> i can't seem to get index scan to work on table phone_cat_address.

The planner seems to think that the cid column alone isn't very
selective, and thus indexscanning on it wouldn't be useful.
How many distinct cid values do you have? Also, which PG version is
this?

regards, tom lane


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 16:15:34
Message-ID: Pine.NEB.4.43.0202251009010.26912-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 25 Feb 2002, Tom Lane wrote:

> "Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
> > i can't seem to get index scan to work on table phone_cat_address.
>
> The planner seems to think that the cid column alone isn't very
> selective, and thus indexscanning on it wouldn't be useful.
> How many distinct cid values do you have? Also, which PG version is
> this?

yellowpages=# select count(distinct(cid)) from phone_cat_address;
count
-------
5139
(1 row)

this is 7.2. i'm just trying to find ways to narrow the search time down.
it's currently taking several seconds.


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Doug McNaught <doug(at)wireboard(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 16:23:05
Message-ID: Pine.NEB.4.43.0202251021520.26912-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 25 Feb 2002, Doug McNaught wrote:

> "Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
>
> > On 25 Feb 2002, Doug McNaught wrote:
> > > Well, EXPLAIN is indicating (unless I misread it) that the estimate of
> > > rows returned is 336702, so it's not surprising that it opts for a
> > > sequential scan. Is this under 7.1 or 7.2? The latter keeps much
> > > better statistics about table populations...
> >
> > this is under 7.2. is there away to force it to use index scan? cause
> > right now when i'm searching using a cat reference, it's taking a few
> > seconds.
>
> I'm still suspicious that something is wrong, but you can do
>
> SET enable_seqscan TO off;
>
> before your query and see if it helps your performance. If it makes a
> significant difference let us know--Tom may be interested in trying to
> improve the statistics.

i just tried turning off seq scan and the query still takes up to 8
seconds which is 7 seconds too long. btw, how do you clean the cache
from the last query?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 16:30:41
Message-ID: 21476.1014654641@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
> On Mon, 25 Feb 2002, Tom Lane wrote:
>> How many distinct cid values do you have? Also, which PG version is
>> this?

> 5139

Hmm, seems like that ought to be selective enough. What does pg_stats
show for phone_cat_address? (And phone_cat, for that matter.)

If you set enable_seqscan to off, do you get a plan you like better?
If so, what is it?

regards, tom lane


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 16:38:19
Message-ID: Pine.NEB.4.43.0202251036430.27000-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 25 Feb 2002, Tom Lane wrote:

> "Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
> > On Mon, 25 Feb 2002, Tom Lane wrote:
> >> How many distinct cid values do you have? Also, which PG version is
> >> this?
>
> > 5139
>
> Hmm, seems like that ought to be selective enough. What does pg_stats
> show for phone_cat_address? (And phone_cat, for that matter.)

sorry tom, i'm still new to PostgreSQL. what is pg_stats and how do i use
it in the way you've asked?

> If you set enable_seqscan to off, do you get a plan you like better?
> If so, what is it?

with seqscan off, the query still takes about the same about of time
(around 8 secs). i'd like to get it down to 1 if possible.


From: Doug McNaught <doug(at)wireboard(dot)com>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 16:41:03
Message-ID: m3g03pzf8g.fsf@varsoon.denali.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas T. Thai" <tom(at)minnesota(dot)com> writes:

> i just tried turning off seq scan and the query still takes up to 8
> seconds which is 7 seconds too long. btw, how do you clean the cache
> from the last query?

I took a detailed look at your schema and I'm not enough of a guru to
offer any more advice (I've never used gist indexes or anything like
that). Maybe Tom or someone with more knowledge than I will weigh
in.

The only way I know of to "clear the cache" is to restart the
database.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Doug McNaught <doug(at)wireboard(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 16:48:02
Message-ID: Pine.NEB.4.43.0202251046150.27000-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 25 Feb 2002, Doug McNaught wrote:

> "Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
>
> > i just tried turning off seq scan and the query still takes up to 8
> > seconds which is 7 seconds too long. btw, how do you clean the cache
> > from the last query?
>
> I took a detailed look at your schema and I'm not enough of a guru to
> offer any more advice (I've never used gist indexes or anything like
> that). Maybe Tom or someone with more knowledge than I will weigh
> in.

oh don't worry about the gist index. that is actually really fast.

> The only way I know of to "clear the cache" is to restart the
> database.

ouch. lol. cause when i try to time the same query again, the cached time
isn't what i want.

thanks though doug.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 16:50:10
Message-ID: 21664.1014655810@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
> sorry tom, i'm still new to PostgreSQL. what is pg_stats and how do i use
> it in the way you've asked?

select * from pg_stats where tablename = 'foo';

>> If you set enable_seqscan to off, do you get a plan you like better?
>> If so, what is it?

> with seqscan off, the query still takes about the same about of time
> (around 8 secs). i'd like to get it down to 1 if possible.

I wanted to know what the plan and cost estimates are. Also, it'd be
good to show EXPLAIN ANALYZE results, so that we can compare reality
to planner cost estimates ...

regards, tom lane


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 16:51:15
Message-ID: Pine.NEB.4.43.0202251048120.27000-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 25 Feb 2002, Tom Lane wrote:

> "Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
> > On Mon, 25 Feb 2002, Tom Lane wrote:
> >> How many distinct cid values do you have? Also, which PG version is
> >> this?
>
> > 5139
>
> Hmm, seems like that ought to be selective enough. What does pg_stats
> show for phone_cat_address? (And phone_cat, for that matter.)
>
> If you set enable_seqscan to off, do you get a plan you like better?
> If so, what is it?

it does seem that the index scan is slightly faster, but the overall
results feels roughly about the same. here are the explains:

explain SELECT *
yellowpages-# FROM
yellowpages-# (SELECT p.name,p.address,p.city,p.state,
yellowpages(#
geo_distance(point(z.longitude,z.latitude),point(p.long,p.lat)) as dist
yellowpages(# FROM phone_address AS p, phone_cat AS pc,
phone_cat_address AS pca, zipcodes AS
z
yellowpages(# WHERE z.zip_code='55404'
yellowpages(# AND (pc.nameftx ## 'salon' AND pc.cid=pca.cid AND
pca.aid=p.aid)
yellowpages(# ) AS ss
yellowpages-# WHERE dist < 35
yellowpages-# ORDER BY dist LIMIT 20;
NOTICE: QUERY PLAN:

Limit (cost=10799.67..10799.67 rows=20 width=115)
-> Sort (cost=10799.67..10799.67 rows=112 width=115)
-> Nested Loop (cost=0.00..10795.85 rows=112 width=115)
-> Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.01 rows=1 width=1
6)
-> Materialize (cost=10786.10..10786.10 rows=337 width=99)
-> Nested Loop (cost=0.00..10786.10 rows=337
width=99)
-> Nested Loop (cost=0.00..8757.20 rows=337
width=16)
-> Index Scan using phone_cat_nameftx_idx
on phone_cat pc (cost
=0.00..44.09 rows=11 width=4)
-> Index Scan using
phone_cat_address_cid_key on phone_cat_addre
ss pca (cost=0.00..812.56 rows=286 width=12)
-> Index Scan using phone_address_aid_key on
phone_address p (cost=0.
00..6.01 rows=1 width=83)

EXPLAIN
yellowpages=# set enable_seqscan to on;
SET VARIABLE
yellowpages=# explain SELECT *
yellowpages-# FROM
yellowpages-# (SELECT p.name,p.address,p.city,p.state,
yellowpages(#
geo_distance(point(z.longitude,z.latitude),point(p.long,p.lat)) as dist
yellowpages(# FROM phone_address AS p, phone_cat AS pc,
phone_cat_address AS pca, zipcodes AS
z
yellowpages(# WHERE z.zip_code='55404'
yellowpages(# AND (pc.nameftx ## 'salon' AND pc.cid=pca.cid AND
pca.aid=p.aid)
yellowpages(# ) AS ss
yellowpages-# WHERE dist < 35
yellowpages-# ORDER BY dist LIMIT 20;
NOTICE: QUERY PLAN:

Limit (cost=9286.33..9286.33 rows=20 width=115)
-> Sort (cost=9286.33..9286.33 rows=112 width=115)
-> Nested Loop (cost=44.12..9282.51 rows=112 width=115)
-> Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.01 rows=1 width=1
6)
-> Materialize (cost=9272.76..9272.76 rows=337 width=99)
-> Nested Loop (cost=44.12..9272.76 rows=337
width=99)
-> Hash Join (cost=44.12..7243.86 rows=337
width=16)
-> Seq Scan on phone_cat_address pca
(cost=0.00..5512.02 rows=3
36702 width=12)
-> Hash (cost=44.09..44.09 rows=11
width=4)
-> Index Scan using
phone_cat_nameftx_idx on phone_cat pc
(cost=0.00..44.09 rows=11 width=4)
-> Index Scan using phone_address_aid_key on
phone_address p (cost=0.
00..6.01 rows=1 width=83)

EXPLAIN
yellowpages=#


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 17:01:20
Message-ID: Pine.NEB.4.43.0202251052570.27000-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 25 Feb 2002, Tom Lane wrote:

This is getting rather long with the explains etc. should i continue to
cc: to the list?

> "Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
> > sorry tom, i'm still new to PostgreSQL. what is pg_stats and how do i use
> > it in the way you've asked?
>
> select * from pg_stats where tablename = 'foo';

yellowpages=# select * from pg_stats where tablename =
'phone_cat_address';
tablename | attname | null_frac | avg_width | n_distinct |
most_common_vals |
most_common_freqs
|
histogram_bounds
| correlation
-------------------+---------+-----------+-----------+------------+------------------------------
-----------------------------------------------------+-------------------------------------------
--------------------------------------------------------------------------------+----------------
-------------------------------------------------------------------------------+-------------
phone_cat_address | cid | 0 | 4 | 1176 |
{2,10,3,12,11,16,6,56,18293,7
5} |
{0.035,0.0283333,0.021,0.0206667,0.0133333
,0.0123333,0.00933333,0.00933333,0.009,0.00833333}
| {4,43,98,177,43
2,1603,2076,11212,15979,18262,18775}
| 1
phone_cat_address | aid | 0 | 8 | -0.526973 |
{873766,1468933,3316338,39146
78,4955422,6073155,6369270,8431670,9012026,10684620} |
{0.000666667,0.000666667,0.000666667,0.000
666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}
| {10755,1321415,
2288476,3469515,4445287,5650291,7029439,8344730,9662520,11016908,100000851228}
| 0.011702
(2 rows)

yellowpages=# select * from pg_stats where tablename = 'phone_cat';
tablename | attname | null_frac | avg_width | n_distinct |
most_common_vals | most_common_freqs
|
histogram_bounds

| correlation
-----------+---------+-----------+-----------+------------+------------------+-------------------
+------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------
------------------------+-------------
phone_cat | id | 0 | 4 | -1 |
|
| {2,543,1050,1594,2106,2641,3892,5566,7244,8979,10673}

| 0.861298
phone_cat | cid | 0 | 4 | -1 |
|
| {3,1189,2182,10960,12145,13215,14410,15649,16799,18058,19247}

| 0.00397399
phone_cat | name | 0 | 29 | -1 |
|
| {"ABRASIVE CUTTING",Auditors,"Boat Covers Tops &
Upholstery-Wholesale","Cash Registers & Suppli
es (Wholesale)","Communication Equipment-Manufacturers","Decoration
Supplies-Wholesale","Framing
Contractors-Buildings","Livestock Commission",Plants-Horticultural,"State
Government-General Offi
ces",Zippers-Repairing} | 0.860659
phone_cat | popular | 0 | 5 | 2 | {N,Y}
| {0.988,0.012}
|

| 0.996101
(4 rows)

>
> >> If you set enable_seqscan to off, do you get a plan you like better?
> >> If so, what is it?
>
> > with seqscan off, the query still takes about the same about of time
> > (around 8 secs). i'd like to get it down to 1 if possible.
>
> I wanted to know what the plan and cost estimates are. Also, it'd be
> good to show EXPLAIN ANALYZE results, so that we can compare reality
> to planner cost estimates ...

yellowpages=# set enable_seqscan to on;
SET VARIABLE
yellowpages=# explain analyze SELECT *
yellowpages-# FROM
yellowpages-# (SELECT p.name,p.address,p.city,p.state,
yellowpages(#
geo_distance(point(z.longitude,z.latitude),point(p.long,p.lat))
as dist
yellowpages(# FROM phone_address AS p, phone_cat AS pc,
phone_cat_address AS
pca, zipcodes AS z
yellowpages(# WHERE z.zip_code='55404'
yellowpages(# AND (pc.nameftx ## 'salon' AND pc.cid=pca.cid AND
pca.aid=p.a
id)
yellowpages(# ) AS ss
yellowpages-# WHERE dist < 35
yellowpages-# ORDER BY dist LIMIT 20;ORDER BY dist LIMIT 20;
NOTICE: QUERY PLAN:

Limit (cost=9286.33..9286.33 rows=20 width=115) (actual
time=6748.11..6748.17 r
ows=20 loops=1)
-> Sort (cost=9286.33..9286.33 rows=112 width=115) (actual
time=6748.10..674
8.12 rows=21 loops=1)
-> Nested Loop (cost=44.12..9282.51 rows=112 width=115) (actual
time=6
505.91..6701.65 rows=1745 loops=1)
-> Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.
01 rows=1 width=16) (actual time=0.97..0.97 rows=1 loops=1)
-> Materialize (cost=9272.76..9272.76 rows=337 width=99)
(actual
time=6504.94..6558.83 rows=4217 loops=1)
-> Nested Loop (cost=44.12..9272.76 rows=337
width=99) (ac
tual time=555.67..6398.70 rows=4217 loops=1)
-> Hash Join (cost=44.12..7243.86 rows=337
width=16)
(actual time=555.67..5440.17 rows=4217 loops=1)
-> Seq Scan on phone_cat_address pca
(cost=0.0
0..5512.02 rows=336702 width=12) (actual time=0.00..3376.45 rows=336702
loops=1)
-> Hash (cost=44.09..44.09 rows=11
width=4) (a
ctual time=5.86..5.86 rows=0 loops=1)
-> Index Scan using
phone_cat_nameftx_idx
on phone_cat pc (cost=0.00..44.09 rows=11 width=4) (actual
time=2.93..5.85 row
s=8 loops=1)
-> Index Scan using phone_address_aid_key on
phone_ad
dress p (cost=0.00..6.01 rows=1 width=83) (actual time=0.16..0.18 rows=1
loops=
4217)
Total runtime: 6786.19 msec

EXPLAIN
yellowpages=#

----
yellowpages=# set enable_seqscan to off;
SET VARIABLE
yellowpages=# explain analyze -> Index Scan
using pho
ne_address_aid_key on phone_ad
yellowpages-# dress p (cost=0.00..6.01 rows=1 width=83) (actual
time=0.16..0.18
rows=1 loops=
yellowpages(# 4217)
yellowpages-# Total runtime: 6786.19 msec
yellowpages-#
yellowpages-# EXPLAIN
yellowpages-# yellowpages=#
yellowpages-# ;
ERROR: parser: parse error at or near "->"
yellowpages=# set enable_seqscan to off;
SET VARIABLE
yellowpages=# explain analyze SELECT *
yellowpages-# FROM
yellowpages-# (SELECT p.name,p.address,p.city,p.state,
yellowpages(#
geo_distance(point(z.longitude,z.latitude),point(p.long,p.lat))
as dist
yellowpages(# FROM phone_address AS p, phone_cat AS pc,
phone_cat_address AS
pca, zipcodes AS z
yellowpages(# WHERE z.zip_code='55404'
yellowpages(# AND (pc.nameftx ## 'salon' AND pc.cid=pca.cid AND
pca.aid=p.a
id)
yellowpages(# ) AS ss
yellowpages-# WHERE dist < 35
yellowpages-# ORDER BY dist LIMIT 20;
NOTICE: QUERY PLAN:

Limit (cost=10799.67..10799.67 rows=20 width=115) (actual
time=1564.47..1564.53
rows=20 loops=1)
-> Sort (cost=10799.67..10799.67 rows=112 width=115) (actual
time=1564.47..1
564.49 rows=21 loops=1)
-> Nested Loop (cost=0.00..10795.85 rows=112 width=115) (actual
time=1
323.26..1517.14 rows=1745 loops=1)
-> Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.
01 rows=1 width=16) (actual time=0.00..0.00 rows=1 loops=1)
-> Materialize (cost=10786.10..10786.10 rows=337 width=99)
(actu
al time=1322.28..1372.30 rows=4217 loops=1)
-> Nested Loop (cost=0.00..10786.10 rows=337
width=99) (ac
tual time=3.90..1209.10 rows=4217 loops=1)
-> Nested Loop (cost=0.00..8757.20 rows=337
width=16
) (actual time=2.93..245.72 rows=4217 loops=1)
-> Index Scan using phone_cat_nameftx_idx
on ph
one_cat pc (cost=0.00..44.09 rows=11 width=4) (actual time=2.92..5.79
rows=8 lo
ops=1)
-> Index Scan using
phone_cat_address_cid_key o
n phone_cat_address pca (cost=0.00..812.56 rows=286 width=12) (actual
time=0.60
..21.64 rows=527 loops=8)
-> Index Scan using phone_address_aid_key on
phone_ad
dress p (cost=0.00..6.01 rows=1 width=83) (actual time=0.17..0.19 rows=1
loops=
4217)
Total runtime: 1596.69 msec

EXPLAIN


From: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 21:28:02
Message-ID: 3C7AAC62.5A127420@nsd.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thomas,

Can you try:

SELECT name, address, city, state, dist
FROM
(SELECT aid, name, address, city, state,
geo_distance(
(select point( longitude, latitude) from zipcodes WHERE zip_code =
'55404'),
point(long, lat)) as dist
FROM phone_address WHERE dist < 35) AS ss, phone_cat AS pc,
phone_cat_address AS pca
WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid AND pca.aid=ss.aid
ORDER BY dist LIMIT 20;

you may have to replace dist in dist < 35 by the whole thing
geo_distance(...);

JLL

"Thomas T. Thai" wrote:
>
> On Mon, 25 Feb 2002, Tom Lane wrote:
>
> > "Thomas T. Thai" <tom(at)minnesota(dot)com> writes:
> > > On Mon, 25 Feb 2002, Tom Lane wrote:
> > >> How many distinct cid values do you have? Also, which PG version is
> > >> this?
> >
> > > 5139
> >
> > Hmm, seems like that ought to be selective enough. What does pg_stats
> > show for phone_cat_address? (And phone_cat, for that matter.)
> >
> > If you set enable_seqscan to off, do you get a plan you like better?
> > If so, what is it?
>
> it does seem that the index scan is slightly faster, but the overall
> results feels roughly about the same. here are the explains:
>
> explain SELECT *
> yellowpages-# FROM
> yellowpages-# (SELECT p.name,p.address,p.city,p.state,
> yellowpages(#
> geo_distance(point(z.longitude,z.latitude),point(p.long,p.lat)) as dist
> yellowpages(# FROM phone_address AS p, phone_cat AS pc,
> phone_cat_address AS pca, zipcodes AS
> z
> yellowpages(# WHERE z.zip_code='55404'
> yellowpages(# AND (pc.nameftx ## 'salon' AND pc.cid=pca.cid AND
> pca.aid=p.aid)
> yellowpages(# ) AS ss
> yellowpages-# WHERE dist < 35
> yellowpages-# ORDER BY dist LIMIT 20;
> NOTICE: QUERY PLAN:
>
> Limit (cost=10799.67..10799.67 rows=20 width=115)
> -> Sort (cost=10799.67..10799.67 rows=112 width=115)
> -> Nested Loop (cost=0.00..10795.85 rows=112 width=115)
> -> Index Scan using zipcodes_zc_idx on zipcodes z
> (cost=0.00..3.01 rows=1 width=1
> 6)
> -> Materialize (cost=10786.10..10786.10 rows=337 width=99)
> -> Nested Loop (cost=0.00..10786.10 rows=337
> width=99)
> -> Nested Loop (cost=0.00..8757.20 rows=337
> width=16)
> -> Index Scan using phone_cat_nameftx_idx
> on phone_cat pc (cost
> =0.00..44.09 rows=11 width=4)
> -> Index Scan using
> phone_cat_address_cid_key on phone_cat_addre
> ss pca (cost=0.00..812.56 rows=286 width=12)
> -> Index Scan using phone_address_aid_key on
> phone_address p (cost=0.
> 00..6.01 rows=1 width=83)
>
> EXPLAIN
> yellowpages=# set enable_seqscan to on;
> SET VARIABLE
> yellowpages=# explain SELECT *
> yellowpages-# FROM
> yellowpages-# (SELECT p.name,p.address,p.city,p.state,
> yellowpages(#
> geo_distance(point(z.longitude,z.latitude),point(p.long,p.lat)) as dist
> yellowpages(# FROM phone_address AS p, phone_cat AS pc,
> phone_cat_address AS pca, zipcodes AS
> z
> yellowpages(# WHERE z.zip_code='55404'
> yellowpages(# AND (pc.nameftx ## 'salon' AND pc.cid=pca.cid AND
> pca.aid=p.aid)
> yellowpages(# ) AS ss
> yellowpages-# WHERE dist < 35
> yellowpages-# ORDER BY dist LIMIT 20;
> NOTICE: QUERY PLAN:
>
> Limit (cost=9286.33..9286.33 rows=20 width=115)
> -> Sort (cost=9286.33..9286.33 rows=112 width=115)
> -> Nested Loop (cost=44.12..9282.51 rows=112 width=115)
> -> Index Scan using zipcodes_zc_idx on zipcodes z
> (cost=0.00..3.01 rows=1 width=1
> 6)
> -> Materialize (cost=9272.76..9272.76 rows=337 width=99)
> -> Nested Loop (cost=44.12..9272.76 rows=337
> width=99)
> -> Hash Join (cost=44.12..7243.86 rows=337
> width=16)
> -> Seq Scan on phone_cat_address pca
> (cost=0.00..5512.02 rows=3
> 36702 width=12)
> -> Hash (cost=44.09..44.09 rows=11
> width=4)
> -> Index Scan using
> phone_cat_nameftx_idx on phone_cat pc
> (cost=0.00..44.09 rows=11 width=4)
> -> Index Scan using phone_address_aid_key on
> phone_address p (cost=0.
> 00..6.01 rows=1 width=83)
>
> EXPLAIN
> yellowpages=#
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-25 21:51:48
Message-ID: Pine.NEB.4.43.0202251538310.28214-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 25 Feb 2002, Jean-Luc Lachance wrote:

> Thomas,
>
> Can you try:
>
> SELECT name, address, city, state, dist
> FROM
> (SELECT aid, name, address, city, state,
> geo_distance(
> (select point( longitude, latitude) from zipcodes WHERE zip_code =
> '55404'),
> point(long, lat)) as dist
> FROM phone_address WHERE dist < 35) AS ss, phone_cat AS pc,
> phone_cat_address AS pca
> WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid AND pca.aid=ss.aid
> ORDER BY dist LIMIT 20;
>
> you may have to replace dist in dist < 35 by the whole thing
> geo_distance(...);

to get it to work, the query had to be changed to:
SELECT ss.name, address, city, state, dist
FROM
(SELECT aid, name, address, city, state,
geo_distance(
(SELECT point( longitude, latitude)
FROM zipcodes WHERE zip_code ='55404'),
point(long, lat)
) AS dist
FROM phone_address
WHERE geo_distance(
(SELECT point( longitude, latitude)
FROM zipcodes WHERE zip_code ='55404'),
point(long, lat)
) < 35
) AS ss,
phone_cat AS pc,
phone_cat_address AS pca
WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid AND pca.aid=ss.aid
ORDER BY dist LIMIT 20;

Summary: not much difference from the original query. I'm still not
understanding why they plan chose to use seqscan. it takes 6 times longer.

---
SET enable_seqscan TO on:

Limit (cost=9279.11..9279.11 rows=20 width=99)
(actual time=6518.61..6518.67 rows=20 loops=1)
InitPlan
-> Index Scan using zipcodes_zc_idx on zipcodes
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.00..0.00 rows=1 loops=1)
-> Index Scan using zipcodes_zc_idx on zipcodes
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.00..0.00 rows=1 loops=1)
-> Sort (cost=9279.11..9279.11 rows=112 width=99)
(actual time=6518.61..6518.63 rows=21 loops=1)
-> Nested Loop (cost=44.12..9275.29 rows=112 width=99)
(actual time=556.65..6470.21 rows=1745 loops=1)
-> Hash Join (cost=44.12..7243.86 rows=337 width=16)
(actual time=554.75..5418.58 rows=4217 loops=1)
-> Seq Scan on phone_cat_address pca
(cost=0.00..5512.02 rows=336702 width=12)
(actual time=0.00..3329.21 rows=336702 loops=1)
-> Hash (cost=44.09..44.09 rows=11 width=4)
(actual time=4.88..4.88 rows=0 loops=1)
-> Index Scan
using phone_cat_nameftx_idx on phone_cat pc
(cost=0.00..44.09 rows=11 width=4)
(actual time=1.95..4.87 rows=8 loops=1)
-> Index Scan using phone_address_aid_key on phone_address
(cost=0.00..6.02 rows=1 width=83)
(actual time=0.20..0.21 rows=0 loops=4217)
Total runtime: 6521.54 msec

---
SET enable_seqscan TO off:

Limit (cost=10792.45..10792.45 rows=20 width=99)
(actual time=1316.42..1316.48rows=20 loops=1)
InitPlan
-> Index Scan using zipcodes_zc_idx on zipcodes
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.00..0.00 rows=1 loops=1)
-> Index Scan using zipcodes_zc_idx on zipcodes
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.97..0.98 rows=1 loops=1)
-> Sort (cost=10792.45..10792.45 rows=112 width=99)
(actual time=1316.42..1316.44 rows=21 loops=1)
-> Nested Loop (cost=0.00..10788.63 rows=112 width=99)
(actual time=6.84..1263.21 rows=1745 loops=1)
-> Nested Loop (cost=0.00..8757.20 rows=337 width=16)
(actual time=2.93..239.25 rows=4217 loops=1)
-> Index Scan using phone_cat_nameftx_idx
on phone_cat pc
(cost=0.00..44.09 rows=11 width=4)
(actual time=2.93..6.75 rows=8 loops=1)
-> Index Scan using phone_cat_address_cid_key
on phone_cat_address pca
(cost=0.00..812.56 rows=286 width=12)
(actual time=0.36..21.94 rows=527 loops=8)
-> Index Scan using phone_address_aid_key
on phone_address (cost=0.00..6.02 rows=1 width=83)
(actual time=0.20..0.21 rows=0 loops=4217)
Total runtime: 1318.37 msec


From: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-26 16:37:07
Message-ID: 3C7BB9B3.571F14CE@nsd.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I think it is because of the SS intermediate result.
PG does not have an index on ss.aid to do the join, so it it has to sort
the result.

Let's try it the other way around; take the ss out of the picture.

SELECT p.name, p.address, p.city, p.state,
geo_distance(
( SELECT point( longitude, latitude)
FROM zipcodes WHERE zip_code ='55404'),
point(long, lat)
) AS dist
FROM phone_address p, phone_cat AS pc, phone_cat_address AS pca
WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid AND pca.aid=p.aid AND
geo_distance(
(SELECT point( longitude, latitude)
FROM zipcodes WHERE zip_code ='55404'),
point(long, lat)
) < 35
ORDER BY dist LIMIT 20;

JLL

"Thomas T. Thai" wrote:
>
[...]
> to get it to work, the query had to be changed to:
> SELECT ss.name, address, city, state, dist
> FROM
> (SELECT aid, name, address, city, state,
> geo_distance(
> (SELECT point( longitude, latitude)
> FROM zipcodes WHERE zip_code ='55404'),
> point(long, lat)
> ) AS dist
> FROM phone_address
> WHERE geo_distance(
> (SELECT point( longitude, latitude)
> FROM zipcodes WHERE zip_code ='55404'),
> point(long, lat)
> ) < 35
> ) AS ss,
> phone_cat AS pc,
> phone_cat_address AS pca
> WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid AND pca.aid=ss.aid
> ORDER BY dist LIMIT 20;
>
> Summary: not much difference from the original query. I'm still not
> understanding why they plan chose to use seqscan. it takes 6 times longer.
>
> ---
> SET enable_seqscan TO on:
>
> Limit (cost=9279.11..9279.11 rows=20 width=99)
> (actual time=6518.61..6518.67 rows=20 loops=1)
> InitPlan
> -> Index Scan using zipcodes_zc_idx on zipcodes
> (cost=0.00..3.01 rows=1 width=16)
> (actual time=0.00..0.00 rows=1 loops=1)
> -> Index Scan using zipcodes_zc_idx on zipcodes
> (cost=0.00..3.01 rows=1 width=16)
> (actual time=0.00..0.00 rows=1 loops=1)
> -> Sort (cost=9279.11..9279.11 rows=112 width=99)
> (actual time=6518.61..6518.63 rows=21 loops=1)
> -> Nested Loop (cost=44.12..9275.29 rows=112 width=99)
> (actual time=556.65..6470.21 rows=1745 loops=1)
> -> Hash Join (cost=44.12..7243.86 rows=337 width=16)
> (actual time=554.75..5418.58 rows=4217 loops=1)
> -> Seq Scan on phone_cat_address pca
> (cost=0.00..5512.02 rows=336702 width=12)
> (actual time=0.00..3329.21 rows=336702 loops=1)
> -> Hash (cost=44.09..44.09 rows=11 width=4)
> (actual time=4.88..4.88 rows=0 loops=1)
> -> Index Scan
> using phone_cat_nameftx_idx on phone_cat pc
> (cost=0.00..44.09 rows=11 width=4)
> (actual time=1.95..4.87 rows=8 loops=1)
> -> Index Scan using phone_address_aid_key on phone_address
> (cost=0.00..6.02 rows=1 width=83)
> (actual time=0.20..0.21 rows=0 loops=4217)
> Total runtime: 6521.54 msec
>
> ---
> SET enable_seqscan TO off:
>
> Limit (cost=10792.45..10792.45 rows=20 width=99)
> (actual time=1316.42..1316.48rows=20 loops=1)
> InitPlan
> -> Index Scan using zipcodes_zc_idx on zipcodes
> (cost=0.00..3.01 rows=1 width=16)
> (actual time=0.00..0.00 rows=1 loops=1)
> -> Index Scan using zipcodes_zc_idx on zipcodes
> (cost=0.00..3.01 rows=1 width=16)
> (actual time=0.97..0.98 rows=1 loops=1)
> -> Sort (cost=10792.45..10792.45 rows=112 width=99)
> (actual time=1316.42..1316.44 rows=21 loops=1)
> -> Nested Loop (cost=0.00..10788.63 rows=112 width=99)
> (actual time=6.84..1263.21 rows=1745 loops=1)
> -> Nested Loop (cost=0.00..8757.20 rows=337 width=16)
> (actual time=2.93..239.25 rows=4217 loops=1)
> -> Index Scan using phone_cat_nameftx_idx
> on phone_cat pc
> (cost=0.00..44.09 rows=11 width=4)
> (actual time=2.93..6.75 rows=8 loops=1)
> -> Index Scan using phone_cat_address_cid_key
> on phone_cat_address pca
> (cost=0.00..812.56 rows=286 width=12)
> (actual time=0.36..21.94 rows=527 loops=8)
> -> Index Scan using phone_address_aid_key
> on phone_address (cost=0.00..6.02 rows=1 width=83)
> (actual time=0.20..0.21 rows=0 loops=4217)
> Total runtime: 1318.37 msec


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-26 20:15:37
Message-ID: Pine.NEB.4.43.0202261408590.2625-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 26 Feb 2002, Jean-Luc Lachance wrote:

> I think it is because of the SS intermediate result.
> PG does not have an index on ss.aid to do the join, so it it has to sort
> the result.
>
> Let's try it the other way around; take the ss out of the picture.
>
> SELECT p.name, p.address, p.city, p.state,
> geo_distance(
> ( SELECT point( longitude, latitude)
> FROM zipcodes WHERE zip_code ='55404'),
> point(long, lat)
> ) AS dist
> FROM phone_address p, phone_cat AS pc, phone_cat_address AS pca
> WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid AND pca.aid=p.aid AND
> geo_distance(
> (SELECT point( longitude, latitude)
> FROM zipcodes WHERE zip_code ='55404'),
> point(long, lat)
> ) < 35
> ORDER BY dist LIMIT 20;

this still uses seq scan if i have 'set enable_seqscan to on;'

Limit (cost=9279.49..9279.49 rows=20 width=98)
(actual time=19257.96..19258.02 rows=20 loops=1)
InitPlan
-> Index Scan using zipcodes_zc_idx on zipcodes
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.00..0.00 rows=1 loops=1)
-> Index Scan using zipcodes_zc_idx on zipcodes
(cost=0.00..3.01 rows=1 width=16)
(actual time=26.37..26.37 rows=1 loops=1)
-> Sort (cost=9279.49..9279.49 rows=112 width=98)
(actual time=19257.96..19257.98 rows=21 loops=1)
-> Nested Loop (cost=44.52..9275.67 rows=112 width=98)
(actual time=873.05..19203.65 rows=1745 loops=1)
-> Hash Join (cost=44.52..7244.26 rows=337 width=16)
(actual time=778.33..8155.07 rows=4217 loops=1)
-> Seq Scan on phone_cat_address pca
(cost=0.00..5512.02 rows=336702 width=12)
(actual time=6.83..5012.23 rows=336702 loops=1)
-> Hash (cost=44.50..44.50 rows=11 width=4)
(actual time=186.52..186.52 rows=0 loops=1)
-> Index Scan using phone_cat_nameftx_idx
on phone_cat pc
(cost=0.00..44.50 rows=11 width=4)
(actual time=67.38..186.51 rows=8 loops=1)
-> Index Scan using phone_address_aid_key on phone_address
p
(cost=0.00..6.02 rows=1 width=82)
(actual time=2.57..2.58 rows=0 loops=4217)
Total runtime: 19258.94 msec

Limit (cost=9279.49..9279.49 rows=20 width=98)
(actual time=19257.96..19258.02 rows=20 loops=1)
InitPlan
-> Index Scan using zipcodes_zc_idx on zipcodes
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.00..0.00 rows=1 loops=1)
-> Index Scan using zipcodes_zc_idx on zipcodes
(cost=0.00..3.01 rows=1 width=16)
(actual time=26.37..26.37 rows=1 loops=1)
-> Sort (cost=9279.49..9279.49 rows=112 width=98)
(actual time=19257.96..19257.98 rows=21 loops=1)
-> Nested Loop (cost=44.52..9275.67 rows=112 width=98)
(actual time=873.05..19203.65 rows=1745 loops=1)
-> Hash Join (cost=44.52..7244.26 rows=337 width=16)
(actual time=778.33..8155.07 rows=4217 loops=1)
-> Seq Scan on phone_cat_address pca
(cost=0.00..5512.02 rows=336702 width=12)
(actual time=6.83..5012.23 rows=336702 loops=1)
-> Hash (cost=44.50..44.50 rows=11 width=4)
(actual time=186.52..186.52 rows=0 loops=1)
-> Index Scan using phone_cat_nameftx_idx
on phone_cat pc
(cost=0.00..44.50 rows=11 width=4)
(actual time=67.38..186.51 rows=8 loops=1)
-> Index Scan using phone_address_aid_key on phone_address
p
(cost=0.00..6.02 rows=1 width=82)
(actual time=2.57..2.58 rows=0 loops=4217)
Total runtime: 19258.94 msec


From: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-26 21:54:28
Message-ID: 3C7C0414.D1DDECC5@nsd.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thomas,

Maybe the where clause logical expression is not being reduced.
If there is only one pca record that will satisfy
pc.nameftx ## 'salon' AND pc.cid=pca.cid

try:

WHERE aid = ( SELECT aid FROM phone_cat_address AS pca, phone_cat AS pc
WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid)

"Thomas T. Thai" wrote:
>
> On Tue, 26 Feb 2002, Jean-Luc Lachance wrote:
>
> > I think it is because of the SS intermediate result.
> > PG does not have an index on ss.aid to do the join, so it it has to sort
> > the result.
> >
> > Let's try it the other way around; take the ss out of the picture.
> >
> > SELECT p.name, p.address, p.city, p.state,
> > geo_distance(
> > ( SELECT point( longitude, latitude)
> > FROM zipcodes WHERE zip_code ='55404'),
> > point(long, lat)
> > ) AS dist
> > FROM phone_address p, phone_cat AS pc, phone_cat_address AS pca
> > WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid AND pca.aid=p.aid AND
> > geo_distance(
> > (SELECT point( longitude, latitude)
> > FROM zipcodes WHERE zip_code ='55404'),
> > point(long, lat)
> > ) < 35
> > ORDER BY dist LIMIT 20;
>
> this still uses seq scan if i have 'set enable_seqscan to on;'
>


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-27 03:16:36
Message-ID: Pine.NEB.4.43.0202262115360.3558-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 26 Feb 2002, Jean-Luc Lachance wrote:

> Thomas,
>
> Maybe the where clause logical expression is not being reduced.
> If there is only one pca record that will satisfy
> pc.nameftx ## 'salon' AND pc.cid=pca.cid
>
> try:
>
> WHERE aid = ( SELECT aid FROM phone_cat_address AS pca, phone_cat AS pc
> WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid)

SELECT p.name, p.address, p.city, p.state,
geo_distance(
( SELECT point( longitude, latitude)
FROM zipcodes WHERE zip_code ='55404'),
point(long, lat)
) AS dist
FROM phone_address p, phone_cat AS pc, phone_cat_address AS pca
WHERE p.aid = (SELECT aid
FROM phone_cat_address AS pca, phone_cat AS pc
WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid
) AND
geo_distance(
(SELECT point( longitude, latitude)
FROM zipcodes WHERE zip_code ='55404'),
point(long, lat)
) < 35
ORDER BY dist LIMIT 20;

ERROR: More than one tuple returned by a subselect used as an expression.
ERROR: More than one tuple returned by a subselect used as an expression.

> > > SELECT p.name, p.address, p.city, p.state,
> > > geo_distance(
> > > ( SELECT point( longitude, latitude)
> > > FROM zipcodes WHERE zip_code ='55404'),
> > > point(long, lat)
> > > ) AS dist
> > > FROM phone_address p, phone_cat AS pc, phone_cat_address AS pca
> > > WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid AND pca.aid=p.aid AND
> > > geo_distance(
> > > (SELECT point( longitude, latitude)
> > > FROM zipcodes WHERE zip_code ='55404'),
> > > point(long, lat)
> > > ) < 35
> > > ORDER BY dist LIMIT 20;
> >
> > this still uses seq scan if i have 'set enable_seqscan to on;'


From: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-02-27 16:20:17
Message-ID: 3C7D0741.44DC61AB@nsd.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thomas,

Obviously, my "If" statement was not satisfied...
Any chance that "SELECT DISTINCT aid..." might work?

JLL

P.S. I can't explain why you get the error twice.

"Thomas T. Thai" wrote:
>
> On Tue, 26 Feb 2002, Jean-Luc Lachance wrote:
>
> > Thomas,
> >
> > Maybe the where clause logical expression is not being reduced.
> > If there is only one pca record that will satisfy
> > pc.nameftx ## 'salon' AND pc.cid=pca.cid
> >
> > try:
> >
> > WHERE aid = ( SELECT aid FROM phone_cat_address AS pca, phone_cat AS pc
> > WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid)
>
> SELECT p.name, p.address, p.city, p.state,
> geo_distance(
> ( SELECT point( longitude, latitude)
> FROM zipcodes WHERE zip_code ='55404'),
> point(long, lat)
> ) AS dist
> FROM phone_address p, phone_cat AS pc, phone_cat_address AS pca
> WHERE p.aid = (SELECT aid
> FROM phone_cat_address AS pca, phone_cat AS pc
> WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid
> ) AND
> geo_distance(
> (SELECT point( longitude, latitude)
> FROM zipcodes WHERE zip_code ='55404'),
> point(long, lat)
> ) < 35
> ORDER BY dist LIMIT 20;
>
> ERROR: More than one tuple returned by a subselect used as an expression.
> ERROR: More than one tuple returned by a subselect used as an expression.
>
> > > > SELECT p.name, p.address, p.city, p.state,
> > > > geo_distance(
> > > > ( SELECT point( longitude, latitude)
> > > > FROM zipcodes WHERE zip_code ='55404'),
> > > > point(long, lat)
> > > > ) AS dist
> > > > FROM phone_address p, phone_cat AS pc, phone_cat_address AS pca
> > > > WHERE pc.nameftx ## 'salon' AND pc.cid=pca.cid AND pca.aid=p.aid AND
> > > > geo_distance(
> > > > (SELECT point( longitude, latitude)
> > > > FROM zipcodes WHERE zip_code ='55404'),
> > > > point(long, lat)
> > > > ) < 35
> > > > ORDER BY dist LIMIT 20;
> > >
> > > this still uses seq scan if i have 'set enable_seqscan to on;'


From: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: help with getting index scan
Date: 2002-03-02 22:28:00
Message-ID: 20020303063920.5A38.RK73@sea.plala.or.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 25 Feb 2002 10:51:15 -0600 (CST)
"Thomas T. Thai" <tom(at)minnesota(dot)com> wrote:

>
> it does seem that the index scan is slightly faster, but the overall
> results feels roughly about the same. here are the explains:
>
> explain SELECT *
> yellowpages-# FROM
> yellowpages-# (SELECT p.name,p.address,p.city,p.state,
> yellowpages(#
> geo_distance(point(z.longitude,z.latitude),point(p.long,p.lat)) as dist
> yellowpages(# FROM phone_address AS p, phone_cat AS pc,
> phone_cat_address AS pca, zipcodes AS
> z
> yellowpages(# WHERE z.zip_code='55404'
> yellowpages(# AND (pc.nameftx ## 'salon' AND pc.cid=pca.cid AND
> pca.aid=p.aid)
> yellowpages(# ) AS ss
> yellowpages-# WHERE dist < 35
> yellowpages-# ORDER BY dist LIMIT 20;

Does a setting of the sort_mem still have a default value ?
Could you try a series of your challenges again after rewriting the
postgresql.conf if so ?

Could you, in addition, execute the following queries in stead of the
original and show us the explain output for them ? But I'm not sure
they work faster than the original

set enable_seqscan to on;
explain analyze --- (1)
SELECT *
FROM (SELECT p.name, p.address, p.city, p.state,
geo_distance(point(z.longitude, z.latitude),
point(p.long, p.lat)) as dist
FROM phone_address AS p,
(SELECT * FROM phone_cat WHERE nameftx ## 'salon') AS pc,
phone_cat_address AS pca,
zipcodes AS z
WHERE z.zip_code='55404' AND pc.cid=pca.cid AND pca.aid=p.aid
) AS ss
WHERE ss.dist < 35
ORDER BY ss.dist
LIMIT 20;

explain analyze --- (2)
SELECT *
FROM (SELECT p.name, p.address, p.city, p.state,
geo_distance(point(z.longitude, z.latitude),
point(p.long, p.lat)) as dist
FROM phone_address AS p,
(SELECT * FROM phone_cat WHERE nameftx ## 'salon'
ORDER BY cid) AS pc,
phone_cat_address AS pca,
zipcodes AS z
WHERE z.zip_code='55404' AND pc.cid=pca.cid AND pca.aid=p.aid
) AS ss
WHERE ss.dist < 35
ORDER BY ss.dist
LIMIT

Regards,
Masaru Sugawara


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: help with getting index scan
Date: 2002-03-04 06:33:10
Message-ID: Pine.NEB.4.43.0203040018520.451-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sun, 3 Mar 2002, Masaru Sugawara wrote:
> Does a setting of the sort_mem still have a default value ?
> Could you try a series of your challenges again after rewriting the
> postgresql.conf if so ?

i changed the default values to:

shared_buffers = 15200
sort_mem = 32168
vacuum_mem = 8192
fsync = false

> Could you, in addition, execute the following queries in stead of the
> original and show us the explain output for them ? But I'm not sure
> they work faster than the original

the two queries below are the same except for the 'set enable_seqscan
to on' right? here are the results:

> set enable_seqscan to on;
> explain analyze --- (1)
> SELECT *
> FROM (SELECT p.name, p.address, p.city, p.state,
> geo_distance(point(z.longitude, z.latitude),
> point(p.long, p.lat)) as dist
> FROM phone_address AS p,
> (SELECT * FROM phone_cat WHERE nameftx ## 'salon') AS pc,
> phone_cat_address AS pca,
> zipcodes AS z
> WHERE z.zip_code='55404' AND pc.cid=pca.cid AND pca.aid=p.aid
> ) AS ss
> WHERE ss.dist < 35
> ORDER BY ss.dist
> LIMIT 20;

Limit (cost=107.13..107.13 rows=1 width=109)
(actual time=9851.64..9851.70 rows=20 loops=1)
-> Sort (cost=107.13..107.13 rows=1 width=109)
(actual time=9851.64..9851.66 rows=21 loops=1)
-> Nested Loop (cost=43.77..107.12 rows=1 width=109)
(actual time=1462.90..9803.26 rows=1745 loops=1)
-> Nested Loop (cost=43.77..104.08 rows=1 width=93)
(actual time=1461.97..9234.44 rows=4217 loops=1)
-> Merge Join (cost=43.77..98.31 rows=1 width=12)
(actual time=1461.97..8623.90 rows=4217 loops=1)
-> Index Scan using
phone_cat_address_cidaid_key
on phone_cat_address pca
(cost=0.00..52.00 rows=1000 width=8)
(actual time=0.00..6447.81 rows=310533
loops=1)
-> Sort (cost=43.77..43.77 rows=11 width=4)
(actualtime=3.91..22.98 rows=4214 loops=1)
-> Index Scan using phone_cat_nameftx
on phone_cat
(cost=0.00..43.58 rows=11 width=4)
(actual time=0.97..3.90 rows=8
loops=1)
-> Index Scan using phone_address_aid_key
on phone_address p (cost=0.00..5.77 rows=1
width=81)
(actual time=0.09..0.11 rows=1 loops=4217)
-> Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.07..0.09 rows=1 loops=4217)
Total runtime: 9853.59 msec

for this next one, i assumed you wanted 'set enable_seqscan to off;'

> explain analyze --- (2)
> SELECT *
> FROM (SELECT p.name, p.address, p.city, p.state,
> geo_distance(point(z.longitude, z.latitude),
> point(p.long, p.lat)) as dist
> FROM phone_address AS p,
> (SELECT * FROM phone_cat WHERE nameftx ## 'salon'
> ORDER BY cid) AS pc,
> phone_cat_address AS pca,
> zipcodes AS z
> WHERE z.zip_code='55404' AND pc.cid=pca.cid AND pca.aid=p.aid
> ) AS ss
> WHERE ss.dist < 35
> ORDER BY ss.dist
> LIMIT

Limit (cost=107.13..107.13 rows=1 width=109)
(actual time=10228.59..10228.65 rows=20 loops=1)
-> Sort (cost=107.13..107.13 rows=1 width=109)
(actual time=10228.59..10228.61 rows=21 loops=1)
-> Nested Loop (cost=43.77..107.12 rows=1 width=109)
(actual time=1466.80..10180.24 rows=1745 loops=1)
-> Nested Loop (cost=43.77..104.08 rows=1 width=93)
(actual time=1465.86..9598.50 rows=4217 loops=1)
-> Merge Join (cost=43.77..98.31 rows=1 width=12)
(actual time=1465.86..9025.95 rows=4217 loops=1)
-> Index Scan using
phone_cat_address_cidaid_key
on phone_cat_address pca
(cost=0.00..52.00 rows=1000 width=8)
(actual time=0.00..6862.98 rows=310533
loops=1)
-> Sort (cost=43.77..43.77 rows=11 width=4)
(actual time=3.91..13.69 rows=4214 loops=1)
-> Index Scan using phone_cat_nameftx
on phone_cat
(cost=0.00..43.58 rows=11 width=4)
(actual time=0.97..3.90 rows=8
loops=1)
-> Index Scan using phone_address_aid_key
on phone_address p
(cost=0.00..5.77 rows=1 width=81)
(actual time=0.08..0.10 rows=1 loops=4217)
-> Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.07..0.08 rows=1 loops=4217)
Total runtime: 10230.54 msec

--
Thomas T. Thai
Minnesota.com, Inc.


From: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: help with getting index scan
Date: 2002-03-05 16:30:57
Message-ID: 20020306012943.3BDB.RK73@sea.plala.or.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 4 Mar 2002 00:33:10 -0600 (CST)
"Thomas T. Thai" <tom(at)minnesota(dot)com> wrote:

> i changed the default values to:
>
> shared_buffers = 15200
> sort_mem = 32168
> vacuum_mem = 8192
> fsync = false
>
> > Could you, in addition, execute the following queries in stead of the
> > original and show us the explain output for them ? But I'm not sure
> > they work faster than the original
>
> the two queries below are the same except for the 'set enable_seqscan
> to on' right? here are the results:

No, there is a difference -- it is an "ORDER BY".

...

> for this next one, i assumed you wanted 'set enable_seqscan to off;'

I didn't mean to let you set it to off. But both of them don't seem to
become as fast as I have thought.

> Limit (cost=107.13..107.13 rows=1 width=109)
> -> Sort (cost=107.13..107.13 rows=1 width=109)
> -> Nested Loop (cost=43.77..107.12 rows=1 width=109)
> -> Nested Loop (cost=43.77..104.08 rows=1 width=93)
> -> Merge Join (cost=43.77..98.31 rows=1 width=12)
> -> Index Scan using
> phone_cat_address_cidaid_key
> on phone_cat_address pca
> -> Sort (cost=43.77..43.77 rows=11 width=4)
> -> Index Scan using phone_cat_nameftx
> on phone_cat
> -> Index Scan using phone_address_aid_key
> on phone_address p
> -> Index Scan using zipcodes_zc_idx on zipcodes z

I would think there is obviously room for more research. To force the planner
use the InitPlan, my two queries are changed a bit:

set enable_seqscan to on;
explain analyze --- (1')
SELECT *
FROM (SELECT p.name, p.address, p.city, p.state,
geo_distance((SELECT point(z.longitude, z.latitude)
FROM zipcodes AS z
WHERE z.zip_code='55404'),
point(p.long, p.lat)) as dist
FROM phone_address AS p,
(SELECT * FROM phone_cat WHERE nameftx ## 'salon') AS pc,
phone_cat_address AS pca,
WHERE pc.cid = pca.cid AND pca.aid = p.aid
) AS ss
WHERE ss.dist < 35
ORDER BY ss.dist
LIMIT 20;

set enable_seqscan to on;
explain analyze --- (2')
SELECT *
FROM (SELECT p.name, p.address, p.city, p.state,
geo_distance((SELECT point(z.longitude, z.latitude)
FROM zipcodes AS z
WHERE z.zip_code='55404'),
point(p.long, p.lat)) as dist
FROM phone_address AS p,
(SELECT * FROM phone_cat WHERE nameftx ## 'salon'
ORDER BY cid) AS pc,
phone_cat_address AS pca,
WHERE pc.cid = pca.cid AND pca.aid = p.aid
) AS ss
WHERE ss.dist < 35
ORDER BY ss.dist
LIMIT 20;

Regards,
Masaru Sugawara


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-03-06 06:08:44
Message-ID: Pine.NEB.4.43.0203060006580.8525-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 6 Mar 2002, Masaru Sugawara wrote:

Both of your queries generated an error:

ERROR: parser: parse error at or near "WHERE"

I can't see which where it is though.

[...]
> I would think there is obviously room for more research. To force the planner
> use the InitPlan, my two queries are changed a bit:
>
> set enable_seqscan to on;
> explain analyze --- (1')
> SELECT *
> FROM (SELECT p.name, p.address, p.city, p.state,
> geo_distance((SELECT point(z.longitude, z.latitude)
> FROM zipcodes AS z
> WHERE z.zip_code='55404'),
> point(p.long, p.lat)) as dist
> FROM phone_address AS p,
> (SELECT * FROM phone_cat WHERE nameftx ## 'salon') AS pc,
> phone_cat_address AS pca,
> WHERE pc.cid = pca.cid AND pca.aid = p.aid
> ) AS ss
> WHERE ss.dist < 35
> ORDER BY ss.dist
> LIMIT 20;
>
>
> set enable_seqscan to on;
> explain analyze --- (2')
> SELECT *
> FROM (SELECT p.name, p.address, p.city, p.state,
> geo_distance((SELECT point(z.longitude, z.latitude)
> FROM zipcodes AS z
> WHERE z.zip_code='55404'),
> point(p.long, p.lat)) as dist
> FROM phone_address AS p,
> (SELECT * FROM phone_cat WHERE nameftx ## 'salon'
> ORDER BY cid) AS pc,
> phone_cat_address AS pca,
> WHERE pc.cid = pca.cid AND pca.aid = p.aid
> ) AS ss
> WHERE ss.dist < 35
> ORDER BY ss.dist
> LIMIT 20;

--
Thomas T. Thai
Minnesota.com, Inc.


From: Oliver Elphick <olly(at)lfix(dot)co(dot)uk>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-03-06 06:34:58
Message-ID: 1015396500.1406.93.camel@linda
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 2002-03-06 at 06:08, Thomas T. Thai wrote:
> On Wed, 6 Mar 2002, Masaru Sugawara wrote:
>
> Both of your queries generated an error:
>
> ERROR: parser: parse error at or near "WHERE"
>
> I can't see which where it is though.
>
> [...]
> > I would think there is obviously room for more research. To force the planner
> > use the InitPlan, my two queries are changed a bit:
> >
> > set enable_seqscan to on;
> > explain analyze --- (1')
> > SELECT *
> > FROM (SELECT p.name, p.address, p.city, p.state,
> > geo_distance((SELECT point(z.longitude, z.latitude)
> > FROM zipcodes AS z
> > WHERE z.zip_code='55404'),
> > point(p.long, p.lat)) as dist
> > FROM phone_address AS p,
> > (SELECT * FROM phone_cat WHERE nameftx ## 'salon') AS pc,
> > phone_cat_address AS pca,

probably that comma after pca

> > WHERE pc.cid = pca.cid AND pca.aid = p.aid
> > ) AS ss
> > WHERE ss.dist < 35
> > ORDER BY ss.dist
> > LIMIT 20;

--
Oliver Elphick Oliver(dot)Elphick(at)lfix(dot)co(dot)uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C

"The LORD is my light and my salvation; whom shall I
fear? the LORD is the strength of my life; of whom
shall I be afraid?" Psalms 27:1


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: help with getting index scan
Date: 2002-03-06 06:51:15
Message-ID: Pine.NEB.4.43.0203060049210.8525-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 6 Mar 2002, Thomas T. Thai wrote:

> ERROR: parser: parse error at or near "WHERE"

ok found it. comma before where. see explain below.

> > set enable_seqscan to on;
> > explain analyze --- (1')
> > SELECT *
> > FROM (SELECT p.name, p.address, p.city, p.state,
> > geo_distance((SELECT point(z.longitude, z.latitude)
> > FROM zipcodes AS z
> > WHERE z.zip_code='55404'),
> > point(p.long, p.lat)) as dist
> > FROM phone_address AS p,
> > (SELECT * FROM phone_cat WHERE nameftx ## 'salon') AS pc,
> > phone_cat_address AS pca,
> > WHERE pc.cid = pca.cid AND pca.aid = p.aid
> > ) AS ss
> > WHERE ss.dist < 35
> > ORDER BY ss.dist
> > LIMIT 20;

Limit (cost=9537.20..9537.20 rows=20 width=93)
(actual time=5849.67..5849.73 rows=20 loops=1)
InitPlan
-> Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.01 rows=1width=16)
(actual time=0.00..0.00 rows=1 loops=1)
-> Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.00..0.93 rows=1 loops=1)
-> Sort (cost=9537.20..9537.20 rows=112 width=93)
(actual time=5849.66..5849.68 rows=21 loops=1)
-> Nested Loop (cost=43.61..9533.38 rows=112 width=93)
(actual time=383.80..5798.66 rows=1745 loops=1)
-> Hash Join (cost=43.61..7574.33 rows=337 width=12)
(actual time=382.86..5214.98 rows=4217 loops=1)
-> Seq Scan on phone_cat_address pca
(cost=0.00..5843.01 rows=336701 width=8)
(actual time=0.00..3441.76 rows=336701 loops=1)
-> Hash (cost=43.58..43.58 rows=11 width=4)
(actual time=3.91..3.91 rows=0 loops=1)
-> Index Scan using phone_cat_name_fts_idx
on phone_cat (cost=0.00..43.58 rows=11 width=4)
(actual time=0.00..3.90 rows=8 loops=1)
-> Index Scan using phone_address_aid_key
on phone_address p (cost=0.00..5.81 rows=1 width=81)
(actual time=0.10..0.11 rows=0 loops=4217)
Total runtime: 5851.62 msec

> > set enable_seqscan to on;
> > explain analyze --- (2')
> > SELECT *
> > FROM (SELECT p.name, p.address, p.city, p.state,
> > geo_distance((SELECT point(z.longitude, z.latitude)
> > FROM zipcodes AS z
> > WHERE z.zip_code='55404'),
> > point(p.long, p.lat)) as dist
> > FROM phone_address AS p,
> > (SELECT * FROM phone_cat WHERE nameftx ## 'salon'
> > ORDER BY cid) AS pc,
> > phone_cat_address AS pca,
> > WHERE pc.cid = pca.cid AND pca.aid = p.aid
> > ) AS ss
> > WHERE ss.dist < 35
> > ORDER BY ss.dist
> > LIMIT 20;

Limit (cost=20934.34..20934.34 rows=20 width=93)
(actual time=12176.87..12177.84 rows=20 loops=1)
InitPlan
-> Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.00..0.00 rows=1 loops=1)
-> Index Scan using zipcodes_zc_idx on zipcodes z
(cost=0.00..3.01 rows=1 width=16)
(actual time=0.00..0.00 rows=1 loops=1)
-> Sort (cost=20934.34..20934.34 rows=959 width=93)
(actual time=12176.87..12177.80 rows=21 loops=1)
-> Hash Join (cost=8455.23..20886.82 rows=959 width=93)
(actual time=5344.78..12127.58 rows=1745 loops=1)
-> Seq Scan on phone_address p
(cost=0.00..11207.20 rows=69280 width=81)
(actual time=0.01..5821.06 rows=105214 loops=1)
-> Hash (cost=8448.04..8448.04 rows=2878 width=12)
(actual time=5336.00..5336.00 rows=0 loops=1)
-> Hash Join (cost=43.79..8448.04 rows=2878
width=12)
(actual time=388.78..5305.09 rows=4217 loops=1)
-> Seq Scan on phone_cat_address pca
(cost=0.00..5843.01 rows=336701 width=8)
(actual time=0.00..3475.08 rows=336701
loops=1)
-> Hash (cost=43.77..43.77 rows=11 width=74)
(actual time=4.92..4.92 rows=0 loops=1)
-> Subquery Scan pc
(cost=43.77..43.77 rows=11 width=74)
(actual time=4.88..4.91 rows=8 loops=1)
-> Sort (cost=43.77..43.77
rows=11 width=74)
(actual time=4.88..4.89
rows=8 loops=1)
-> Index Scan using
phone_cat_name_fts_idx on phone_cat
(cost=0.00..43.58 rows=11 width=74)
(actual time=0.97..3.90 rows=8
loops=1)
Total runtime: 12178.84 msec

--
Thomas T. Thai
Minnesota.com, Inc.


From: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: help with getting index scan
Date: 2002-03-06 14:22:32
Message-ID: 20020306232030.C159.RK73@sea.plala.or.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 6 Mar 2002 00:51:15 -0600 (CST)
"Thomas T. Thai" <tom(at)minnesota(dot)com> wrote:

> On Wed, 6 Mar 2002, Thomas T. Thai wrote:
>
> > ERROR: parser: parse error at or near "WHERE"
>
> ok found it. comma before where. see explain below.
>
> > > set enable_seqscan to on;
> > > explain analyze --- (1')
> > > SELECT *
> > > FROM (SELECT p.name, p.address, p.city, p.state,
> > > geo_distance((SELECT point(z.longitude, z.latitude)
> > > FROM zipcodes AS z
> > > WHERE z.zip_code='55404'),
> > > point(p.long, p.lat)) as dist
> > > FROM phone_address AS p,
> > > (SELECT * FROM phone_cat WHERE nameftx ## 'salon') AS pc,
> > > phone_cat_address AS pca,
> > > WHERE pc.cid = pca.cid AND pca.aid = p.aid
> > > ) AS ss
> > > WHERE ss.dist < 35
> > > ORDER BY ss.dist
> > > LIMIT 20;
>
> Limit (cost=9537.20..9537.20 rows=20 width=93)
> (actual time=5849.67..5849.73 rows=20 loops=1)
> InitPlan
> -> Index Scan using zipcodes_zc_idx on zipcodes z
> (cost=0.00..3.01 rows=1width=16)
> (actual time=0.00..0.00 rows=1 loops=1)
> -> Index Scan using zipcodes_zc_idx on zipcodes z
> (cost=0.00..3.01 rows=1 width=16)
> (actual time=0.00..0.93 rows=1 loops=1)
> -> Sort (cost=9537.20..9537.20 rows=112 width=93)
> (actual time=5849.66..5849.68 rows=21 loops=1)
> -> Nested Loop (cost=43.61..9533.38 rows=112 width=93)
> (actual time=383.80..5798.66 rows=1745 loops=1)
> -> Hash Join (cost=43.61..7574.33 rows=337 width=12)
> (actual time=382.86..5214.98 rows=4217 loops=1)
> -> Seq Scan on phone_cat_address pca
> (cost=0.00..5843.01 rows=336701 width=8)
> (actual time=0.00..3441.76 rows=336701 loops=1)
> -> Hash (cost=43.58..43.58 rows=11 width=4)
> (actual time=3.91..3.91 rows=0 loops=1)
> -> Index Scan using phone_cat_name_fts_idx
> on phone_cat (cost=0.00..43.58 rows=11 width=4)
> (actual time=0.00..3.90 rows=8 loops=1)
> -> Index Scan using phone_address_aid_key
> on phone_address p (cost=0.00..5.81 rows=1 width=81)
> (actual time=0.10..0.11 rows=0 loops=4217)
> Total runtime: 5851.62 msec

That's not bat, I guess. The query below is a try to manage to get
a pattern like this:

-> Nested Loop
-> Index Scan using phone_cat_nameftx_idx on phone_cat pc
-> Index Scan using phone_cat_address_cid_key on phone_cat_address pca

But I have no idea any more.

set enable_seqscan to on;
explain analyze --- (1'')
SELECT *
FROM (SELECT p.name, p.address, p.city, p.state,
geo_distance((SELECT point(z.longitude, z.latitude)
FROM zipcodes AS z
WHERE z.zip_code='55404'),
point(p.long, p.lat)) as dist
FROM phone_address AS p,
(SELECT pca.aid
FROM (SELECT * FROM phone_cat
WHERE nameftx ## 'salon') AS pc,
phone_cat_address AS pca
WHERE pc.cid = pca.cid
ORDER BY 1
) AS pc_pca
WHERE pc_pca.aid = p.aid
) AS ss
WHERE ss.dist < 35
ORDER BY ss.dist
LIMIT 20;

Regards,
Masaru Sugawara


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: help with getting index scan
Date: 2002-03-06 14:42:43
Message-ID: Pine.NEB.4.43.0203060836040.9616-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 6 Mar 2002, Masaru Sugawara wrote:
[...]
> That's not bat, I guess. The query below is a try to manage to get
> a pattern like this:
>
> -> Nested Loop
> -> Index Scan using phone_cat_nameftx_idx on phone_cat pc
> -> Index Scan using phone_cat_address_cid_key on phone_cat_address pca

the query below still results in a seq scan:
...
-> Seq Scan on phone_cat_address pca
(cost=0.00..5843.01 rows=336701 width=8)
(actual time=0.97..2875.06 rows=336701 loops=1)
-> Hash (cost=43.58..43.58 rows=11 width=4)
(actual time=3.91..3.91 rows=0 loops=1)
...
Total runtime: 5240.28 msec

something is not right about with the planner. using my original query and
with seqscan off, i can get the query to drop to around 600 ms. can't
understand why it would choose to use seqscan on phone_cat_address.

having tuned postgresql, there has been a huge speed increase. however,
the seq scan is still costing me. doing a 'set enable_seqscan to off;'
before the actual query seems like a kludge.

> set enable_seqscan to on;
> explain analyze --- (1'')
> SELECT *
> FROM (SELECT p.name, p.address, p.city, p.state,
> geo_distance((SELECT point(z.longitude, z.latitude)
> FROM zipcodes AS z
> WHERE z.zip_code='55404'),
> point(p.long, p.lat)) as dist
> FROM phone_address AS p,
> (SELECT pca.aid
> FROM (SELECT * FROM phone_cat
> WHERE nameftx ## 'salon') AS pc,
> phone_cat_address AS pca
> WHERE pc.cid = pca.cid
> ORDER BY 1
> ) AS pc_pca
> WHERE pc_pca.aid = p.aid
> ) AS ss
> WHERE ss.dist < 35
> ORDER BY ss.dist
> LIMIT 20;

--
Thomas T. Thai
Minnesota.com, Inc.


From: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>
To: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: help with getting index scan
Date: 2002-03-06 17:35:02
Message-ID: 20020307023431.DEC1.RK73@sea.plala.or.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 6 Mar 2002 08:42:43 -0600 (CST)
"Thomas T. Thai" <tom(at)minnesota(dot)com> wrote:

> On Wed, 6 Mar 2002, Masaru Sugawara wrote:
> [...]
> > That's not bat, I guess. The query below is a try to manage to get
> > a pattern like this:
> >
> > -> Nested Loop
> > -> Index Scan using phone_cat_nameftx_idx on phone_cat pc
> > -> Index Scan using phone_cat_address_cid_key on phone_cat_address pca
>
> the query below still results in a seq scan:
> ...
> -> Seq Scan on phone_cat_address pca
> (cost=0.00..5843.01 rows=336701 width=8)
> (actual time=0.97..2875.06 rows=336701 loops=1)
> -> Hash (cost=43.58..43.58 rows=11 width=4)
> (actual time=3.91..3.91 rows=0 loops=1)

It's a pity that the query use no index on phone_cat_address.

> ...
> Total runtime: 5240.28 msec
>
> something is not right about with the planner. using my original query and
> with seqscan off, i can get the query to drop to around 600 ms. can't
> understand why it would choose to use seqscan on phone_cat_address.

Since phone_cat_address isn't limited by a WHERE cluse, etc., most of its
rows will be selected. Therefore the planner seems to judge that a
sequential scan is better/faster than an index scan.

Regards,
Masaru Sugawara


From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: help with getting index scan
Date: 2002-03-06 18:11:17
Message-ID: Pine.NEB.4.43.0203061209450.10216-100000@ns01.minnesota.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, 7 Mar 2002, Masaru Sugawara wrote:

> > the query below still results in a seq scan:
> > ...
> > -> Seq Scan on phone_cat_address pca
> > (cost=0.00..5843.01 rows=336701 width=8)
> > (actual time=0.97..2875.06 rows=336701 loops=1)
> > -> Hash (cost=43.58..43.58 rows=11 width=4)
> > (actual time=3.91..3.91 rows=0 loops=1)
>
>
> It's a pity that the query use no index on phone_cat_address.

when i force it to use index scan, time drops down to ~800 ms.

> > Total runtime: 5240.28 msec
[...]
> Since phone_cat_address isn't limited by a WHERE cluse, etc., most of its
> rows will be selected. Therefore the planner seems to judge that a
> sequential scan is better/faster than an index scan.

--
Thomas T. Thai
Minnesota.com, Inc.