Re: help needed -- sequential scan problem

Lists: pgsql-performance
From: sarlav kumar <sarlavk(at)yahoo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: help needed -- sequential scan problem
Date: 2004-11-19 22:31:22
Message-ID: 20041119223122.49470.qmail@web51301.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi All,

I am new to Postgres.

I have a query which does not use index scan unless I force postgres to use index scan. I dont want to force postgres, unless there is no way of optimizing this query.

The query :

select m.company_name,m.approved,cu.account_no,mbt.business_name,cda.country,
(select count(*) from merchant_purchase mp left join data d on mp.data_id=d.id where mp.merchant_id=m.id and d.status=5) as Trans_count,
(select sum(total * 0.01) from merchant_purchase mp left join data d on mp.data_id=d.id where mp.merchant_id=m.id and d.status=5) as Trans_amount,
(select count(*) from merchant_purchase mp left join data d on mp.data_id=d.id where d.what=15 and d.status=5 and d.flags=7 and mp.merchant_id=m.id) as Reversal_count
from merchant m
left join customer cu on cu.id=m.uid
left join customerdata cda on cda.uid=cu.id
left join merchant_business_types mbt on mbt.id=m.businesstype and
exists (select distinct(merchant_id) from merchant_purchase where m.id=merchant_id);


First Question: I know the way I have written the first two sub-selects is really bad, as they have the same conditions in the where clause. But I am not sure if there is a way to select two columns in a single sub-select query. When I tried to combine the two sub-select queries, I got an error saying that the sub-select can have only one column. Does anyone know any other efficient way of doing it?

Second Question: The query plan is as follows:

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=901.98..17063.67 rows=619 width=88) (actual time=52.01..5168.09 rows=619 loops=1)
Hash Cond: ("outer".businesstype = "inner".id)
Join Filter: (subplan)
-> Merge Join (cost=900.34..1276.04 rows=619 width=62) (actual time=37.00..97.58 rows=619 loops=1)
Merge Cond: ("outer".id = "inner".uid)
-> Merge Join (cost=900.34..940.61 rows=619 width=52) (actual time=36.91..54.66 rows=619 loops=1)
Merge Cond: ("outer".id = "inner".uid)
-> Sort (cost=795.45..810.32 rows=5949 width=17) (actual time=32.59..36.59 rows=5964 loops=1)
Sort Key: cu.id
-> Seq Scan on customer cu (cost=0.00..422.49 rows=5949 width=17) (actual time=0.02..15.69 rows=5964 loops=1)
-> Sort (cost=104.89..106.44 rows=619 width=35) (actual time=4.27..5.10 rows=619 loops=1)
Sort Key: m.uid
-> Seq Scan on merchant m (cost=0.00..76.19 rows=619 width=35) (actual time=0.04..2.65 rows=619 loops=1)
-> Index Scan using customerdata_uid_idx on customerdata cda (cost=0.00..311.85 rows=5914 width=10) (actual time=0.09..27.70 rows=5
919 loops=1)
-> Hash (cost=1.51..1.51 rows=51 width=26) (actual time=0.19..0.19 rows=0 loops=1)
-> Seq Scan o n merchant_business_types mbt (cost=0.00..1.51 rows=51 width=26) (actual time=0.04..0.12 rows=51 loops=1)
SubPlan
-> Aggregate (cost=269.89..269.89 rows=1 width=12) (actual time=2.70..2.70 rows=1 loops=619)
-> Nested Loop (cost=0.00..269.78 rows=44 width=12) (actual time=2.40..2.69 rows=4 loops=619)
Filter: ("inner".status = 5)
-> Seq Scan on merchant_purchase mp (cost=0.00..95.39 rows=44 width=4) (actual time=2.37..2.58 rows=6 loops=619)
Filter: (merchant_id = $0)
-> Index Scan using data_pkey on data d (cost=0.00..3.91 rows=1 width=8) (actual time=0.01..0.01 rows=1 loops=3951)
Index Cond: ("outer".data_id = d.id)
-> Aggregate (cost=269.89..269.89 rows=1 width=16) (actual time=2.73..2.73 rows=1 loops=619)
-> Nested Loop (cost=0.00..269.78 rows=44 width=16) (actual time=2.42..2.70 rows=4 loops=619)
Filter: ("inner".status = 5)
-> Seq Scan on merchant_purchase m p (cost=0.00..95.39 rows=44 width=8) (actual time=2.39..2.60 rows=6 loops=619)
Filter: (merchant_id = $0)
-> Index Scan using data_pkey on data d (cost=0.00..3.91 rows=1 width=8) (actual time=0.01..0.01 rows=1 loops=3951)
Index Cond: ("outer".data_id = d.id)
-> Aggregate (cost=270.12..270.12 rows=1 width=20) (actual time=2.72..2.72 rows=1 loops=619)
-> Nested Loop (cost=0.00..270.00 rows=44 width=20) (actual time=2.63..2.72 rows=0 loops=619)
Filter: (("inner".what = 15) AND ("inner".status = 5) AND ("inner".flags = 7))
-> Seq Scan on merchant_purchase mp (cost=0.00..95.39 rows=44 width=4) (actual time=2.40..2.62 rows=6 loops=619)
Filter: (merchant_id = $0)
-> Index Scan using data_pkey on data d (cost=0.00..3.91 rows=1 width=16) (actual time=0.01..0.01 rows=1 loops=3951)
Index Cond: ("outer".data_id = d.id)
-> Unique (cost=0.00..113.14 rows=4 width=4) (actual time=0.02..0.02 rows=0 loops=598)
-> Index Scan using merchant_purchase_merchant_id_idx on merchant_purchase (cost=0.00..113.02 rows=44 width=4) (actual time=0.01.
.0.01 rows=0 loops=598)
Index Cond: ($0 = merchant_id)
Total runtime: 5170.37 msec (5.170 sec)
(42 rows)


As you can see, there are many sequential scans in the query plan. Postgres is not using the index defined, even though it leads to better performance(0.2 sec!! when i force index scan)

Is there something wrong in my query that makes postgres use seq scan as opposed to index scan?? Any help would be really appreciated.

Thanks for you time and help!
Saranya


---------------------------------
Do you Yahoo!?
Discover all thats new in My Yahoo!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: sarlav kumar <sarlavk(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: help needed -- sequential scan problem
Date: 2004-11-20 05:19:46
Message-ID: 16679.1100927986@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

sarlav kumar <sarlavk(at)yahoo(dot)com> writes:
> I have a query which does not use index scan unless I force postgres to use index scan. I dont want to force postgres, unless there is no way of optimizing this query.

The major issue seems to be in the sub-selects:

> -> Seq Scan on merchant_purchase mp (cost=0.00..95.39 rows=44 width=4) (actual time=2.37..2.58 rows=6 loops=619)
> Filter: (merchant_id = $0)

where the estimated row count is a factor of 7 too high. If the
estimated row count were even a little lower, it'd probably have gone
for an indexscan. You might get some results from increasing the
statistics target for merchant_purchase.merchant_id. If that doesn't
help, I'd think about reducing random_page_cost a little bit.

regards, tom lane


From: sarlav kumar <sarlavk(at)yahoo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: help needed -- sequential scan problem
Date: 2004-11-22 19:41:25
Message-ID: 20041122194125.8040.qmail@web51302.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi Tom,

Thanks for the help, Tom.

>The major issue seems to be in the sub-selects:

> -> Seq Scan on merchant_purchase mp (cost=0.00..95.39 rows=44 width=4) (actual time=2.37..2.58 rows=6 loops=619)
> Filter: (merchant_id = $0)
>where the estimated row count is a factor of 7 too high. If the
>estimated row count were even a little lower, it'd probably have gone
>for an indexscan.

I understand that the sub-selects are taking up most of the time as they do a sequential scan on the tables.

>You might get some results from increasing the
>statistics target for merchant_purchase.merchant_id.

Do I have to use vacuum analyze to update the statistics? If so, I have already tried that and it doesn't seem to help.

>If that doesn't help, I'd think about reducing random_page_cost a little bit.

I am sorry, I am not aware of what random_page_cost is, as I am new to Postgres. What does it signify and how do I reduce random_page_cost?

Thanks,
Saranya

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: sarlav kumar <sarlavk(at)yahoo(dot)com>
Subject: Re: help needed -- sequential scan problem
Date: 2004-11-22 19:57:38
Message-ID: 200411221157.38900.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Sarlav,

> I am sorry, I am not aware of what random_page_cost is, as I am new to
> Postgres. What does it signify and how do I reduce random_page_cost?

It's a parameter in your postgresql.conf file. After you test it, you will
want to change it there and reload the server (pg_ctl reload).

However, you can test it on an individual connection:
SET random_page_cost=2.5
(the default is 4.0)

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


From: sarlav kumar <sarlavk(at)yahoo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: sarlav kumar <sarlavk(at)yahoo(dot)com>
Subject: Re: help needed -- sequential scan problem
Date: 2004-11-22 19:58:01
Message-ID: 20041122195801.14066.qmail@web51302.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi Josh,

Can you tell me in what way it affects performance? And How do I decide what value to set for the random_page_cost? Does it depend on any other factors?

Thanks,
Saranya

Josh Berkus <josh(at)agliodbs(dot)com> wrote:
Sarlav,

> I am sorry, I am not aware of what random_page_cost is, as I am new to
> Postgres. What does it signify and how do I reduce random_page_cost?

It's a parameter in your postgresql.conf file. After you test it, you will
want to change it there and reload the server (pg_ctl reload).

However, you can test it on an individual connection:
SET random_page_cost=2.5
(the default is 4.0)

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com


From: "gnari" <gnari(at)simnet(dot)is>
To: "sarlav kumar" <sarlavk(at)yahoo(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: help needed -- sequential scan problem
Date: 2004-11-22 20:01:42
Message-ID: 013401c4d0ce$15e276e0$0100000a@wp2000
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

From: "sarlav kumar" <sarlavk(at)yahoo(dot)com>

> [Tom:]
> >You might get some results from increasing the
> >statistics target for merchant_purchase.merchant_id.
>
> Do I have to use vacuum analyze to update the statistics? If so, I have
already tried that and it doesn't seem to help.

alter table merchant_purchase alter column merchant_id set statistics 500;
analyze merchant_purchase;

>
> >If that doesn't help, I'd think about reducing random_page_cost a little
bit.
>
> I am sorry, I am not aware of what random_page_cost is, as I am new to
Postgres. What does it signify and how do I reduce random_page_cost?

set random_page_cost = 3;
explain analyse <query>

if it is an improvement, consider setting the value in your postgresql.conf,
but remember that this may affect other queries too.

gnari