Is dump-reload the only cure?

Lists: pgsql-adminpgsql-performance
From: <mallah(at)trade-india(dot)com>
To: <pgsql-admin(at)postgresql(dot)org>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Is dump-reload the only cure?
Date: 2002-11-01 11:15:43
Message-ID: 1584.203.145.130.142.1036149343.squirrel@mail.trade-india.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-performance


Hi ,

For a particular table it was only dump and reload of the table that
helped in enabling index usage.

I tried VACUUM ANALYZE and even recreating the indexes but it
did not work.

why does the planner use the index like a miser?
below are the details

was there anything bettwer i could have done for indexes getting used?

regds
mallah.

Query:

explain SELECT count( email_id ) from email_bank_mailing_lists where query_id='499';
NOTICE: QUERY PLAN:

Aggregate (cost=4330.48..4330.48 rows=1 width=4)
-> Index Scan using email_bank_ml_qid on email_bank_mailing_lists (cost=0.00..4327.28
rows=1282 width=4)
EXPLAIN

distribution of query_id in table:
total: 256419

query_id | count(*)
----------------------
298 | 6167
328 | 2083
354 | 9875
404 | 6974
432 | 5059
437 | 2497
440 | 2837
448 | 14624
449 | 13053
454 | 409
455 | 3725
456 | 560
458 | 3477
460 | 5561
486 | 41842
488 | 63642
492 | 2244
493 | 6047
494 | 37415
499 | 25010
501 | 3318

before dump reload:
tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists;
NOTICE: --Relation email_bank_mailing_lists--
NOTICE: Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
NOTICE: Analyzing email_bank_mailing_lists
VACUUM
tradein_clients=# explain SELECT count( email_id ) from email_bank_mailing_lists where
query_id=499;NOTICE: QUERY PLAN:

Aggregate (cost=6863.24..6863.24 rows=1 width=4)
-> Seq Scan on email_bank_mailing_lists (cost=0.00..6788.24 rows=30001 width=4)

EXPLAIN

-----------------------------------------
Get your free web based email at trade-india.com.
"India's Leading B2B eMarketplace.!"
http://www.trade-india.com/


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: mallah(at)trade-india(dot)com
Cc: pgsql-admin(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [pgsql-performance] Is dump-reload the only cure?
Date: 2002-11-01 12:52:40
Message-ID: 1036155160.3550.8.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-performance

On Fri, 2002-11-01 at 06:15, mallah(at)trade-india(dot)com wrote:

Looks like a borderline case. See the costs of the index scan and
sequential scan are very similar. Since 499 covers nearly 1 in 10
tuples, it's likely found on nearly every page. This should make a
sequential scan much cheaper.

However, if the data is clumped together (not distributed throughout the
table) than an index scan may be preferable. So... CLUSTER may be
useful to you.

In the future please 'explain analyze' the queries you're looking at to
see actual costs as compared to the estimated cost.

> 499 | 25010
> 501 | 3318
>
>
> before dump reload:
> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists;
> NOTICE: --Relation email_bank_mailing_lists--
> NOTICE: Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
> Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
> NOTICE: Analyzing email_bank_mailing_lists
> VACUUM
> tradein_clients=# explain SELECT count( email_id ) from email_bank_mailing_lists where
> query_id=499;NOTICE: QUERY PLAN:
>
> Aggregate (cost=6863.24..6863.24 rows=1 width=4)
> -> Seq Scan on email_bank_mailing_lists (cost=0.00..6788.24 rows=30001 width=4)
>
> EXPLAIN

--
Rod Taylor


From: <mallah(at)trade-india(dot)com>
To: <rbt(at)rbt(dot)ca>
Cc: <mallah(at)trade-india(dot)com>, <pgsql-admin(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [pgsql-performance] Is dump-reload the only cure?
Date: 2002-11-01 13:03:36
Message-ID: 1062.203.145.130.142.1036155816.squirrel@mail.trade-india.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-performance


Hi Rod ,

Does it means that index scan is used for less frequenlty occuring data?
yes my table was not clustered.

can u tell me what does 0.00..6788.24 and rows and width means?

in explain out put cost=0.00..6788.24 rows=30001 width=4

I have one more table where i face the similar problem , i have not dump - reloaded
it yet , will post again if i face the problem.

thanks

Regds
Mallah.

> On Fri, 2002-11-01 at 06:15, mallah(at)trade-india(dot)com wrote:
>
>
> Looks like a borderline case. See the costs of the index scan and sequential scan are very
> similar. Since 499 covers nearly 1 in 10 tuples, it's likely found on nearly every page. This
> should make a sequential scan much cheaper.
>
> However, if the data is clumped together (not distributed throughout the table) than an index
> scan may be preferable. So... CLUSTER may be useful to you.
>
> In the future please 'explain analyze' the queries you're looking at to see actual costs as
> compared to the estimated cost.
>
>
>> 499 | 25010
>> 501 | 3318
>>
>>
>> before dump reload:
>> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists; NOTICE: --Relation
>> email_bank_mailing_lists--
>> NOTICE: Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
>> Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
>> NOTICE: Analyzing email_bank_mailing_lists
>> VACUUM
>> tradein_clients=# explain SELECT count( email_id ) from email_bank_mailing_lists where
>> query_id=499;NOTICE: QUERY PLAN:
>>
>> Aggregate (cost=6863.24..6863.24 rows=1 width=4)
>> -> Seq Scan on email_bank_mailing_lists (cost=0.00..6788.24 rows=30001 width=4)
>>
>> EXPLAIN
>
> --
> Rod Taylor

-----------------------------------------
Get your free web based email at trade-india.com.
"India's Leading B2B eMarketplace.!"
http://www.trade-india.com/


From: <mallah(at)trade-india(dot)com>
To: <rbt(at)rbt(dot)ca>
Cc: <pgsql-admin(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [pgsql-performance] Is dump-reload the only cure?
Date: 2002-11-01 13:15:22
Message-ID: 1068.203.145.130.142.1036156522.squirrel@mail.trade-india.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-performance

Rod ,

Clustering did work for my other case ;-)

tradein_clients=> explain analyze SELECT count(*) from email_source where source_id=173;
NOTICE: QUERY PLAN:

Aggregate (cost=13042.91..13042.91 rows=1 width=0) (actual time=1415.32..1415.32 rows=1 loops=1)
-> Seq Scan on email_source (cost=0.00..12964.48 rows=31375 width=0) (actual
time=1.19..1368.58 rows=32851 loops=1)Total runtime: 1415.42 msec

EXPLAIN
tradein_clients=> \d email_source
Table "email_source"
Column | Type | Modifiers
-----------+---------+-----------
email_id | integer |
source_id | integer |
Indexes: email_source_sid
Unique keys: email_source_idx
tradein_clients=> CLUSTER email_source_sid on email_source ;
CLUSTER
tradein_clients=>
tradein_clients=> explain analyze SELECT count(*) from email_source where source_id=173;
NOTICE: QUERY PLAN:

Aggregate (cost=11458.83..11458.83 rows=1 width=0) (actual time=207.73..207.73 rows=1 loops=1)
-> Index Scan using email_source_sid on email_source (cost=0.00..11449.76 rows=3627 width=0)
(actual time=0.27..161.04 rows=32851 loops=1)Total runtime: 207.90 msec
EXPLAIN

Does it Mean that clustered indexes are guarrented to be used for index scan?
one more thing does clustering means that all future data addition will happen
in the ordered manner only i mean consecutively in terms of source_id?

Regds
MALLAH.

> On Fri, 2002-11-01 at 06:15, mallah(at)trade-india(dot)com wrote:
>
>
> Looks like a borderline case. See the costs of the index scan and sequential scan are very
> similar. Since 499 covers nearly 1 in 10 tuples, it's likely found on nearly every page. This
> should make a sequential scan much cheaper.
>
> However, if the data is clumped together (not distributed throughout the table) than an index
> scan may be preferable. So... CLUSTER may be useful to you.
>
> In the future please 'explain analyze' the queries you're looking at to see actual costs as
> compared to the estimated cost.
>
>
>> 499 | 25010
>> 501 | 3318
>>
>>
>> before dump reload:
>> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists; NOTICE: --Relation
>> email_bank_mailing_lists--
>> NOTICE: Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
>> Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
>> NOTICE: Analyzing email_bank_mailing_lists
>> VACUUM
>> tradein_clients=# explain SELECT count( email_id ) from email_bank_mailing_lists where
>> query_id=499;NOTICE: QUERY PLAN:
>>
>> Aggregate (cost=6863.24..6863.24 rows=1 width=4)
>> -> Seq Scan on email_bank_mailing_lists (cost=0.00..6788.24 rows=30001 width=4)
>>
>> EXPLAIN
>
> --
> Rod Taylor

-----------------------------------------
Get your free web based email at trade-india.com.
"India's Leading B2B eMarketplace.!"
http://www.trade-india.com/


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: mallah(at)trade-india(dot)com
Cc: pgsql-admin(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [pgsql-performance] Is dump-reload the only cure?
Date: 2002-11-01 14:07:41
Message-ID: 1036159661.5794.0.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-performance

See Paragraph 2 of the description section:
http://www.postgresql.org/idocs/index.php?sql-explain.html

In the above is a good explanation of 'cost'. Rows is the number of
rows estimated to be returned, and width is the expected number of
columns it needs to deal with at that point.

On Fri, 2002-11-01 at 08:03, mallah(at)trade-india(dot)com wrote:
>
> Hi Rod ,
>
> Does it means that index scan is used for less frequenlty occuring data?
> yes my table was not clustered.
>
> can u tell me what does 0.00..6788.24 and rows and width means?
>
> in explain out put cost=0.00..6788.24 rows=30001 width=4
>
>
> I have one more table where i face the similar problem , i have not dump - reloaded
> it yet , will post again if i face the problem.
>
>
> thanks
>
> Regds
> Mallah.
>
>
> > On Fri, 2002-11-01 at 06:15, mallah(at)trade-india(dot)com wrote:
> >
> >
> > Looks like a borderline case. See the costs of the index scan and sequential scan are very
> > similar. Since 499 covers nearly 1 in 10 tuples, it's likely found on nearly every page. This
> > should make a sequential scan much cheaper.
> >
> > However, if the data is clumped together (not distributed throughout the table) than an index
> > scan may be preferable. So... CLUSTER may be useful to you.
> >
> > In the future please 'explain analyze' the queries you're looking at to see actual costs as
> > compared to the estimated cost.
> >
> >
> >> 499 | 25010
> >> 501 | 3318
> >>
> >>
> >> before dump reload:
> >> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists; NOTICE: --Relation
> >> email_bank_mailing_lists--
> >> NOTICE: Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
> >> Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
> >> NOTICE: Analyzing email_bank_mailing_lists
> >> VACUUM
> >> tradein_clients=# explain SELECT count( email_id ) from email_bank_mailing_lists where
> >> query_id=499;NOTICE: QUERY PLAN:
> >>
> >> Aggregate (cost=6863.24..6863.24 rows=1 width=4)
> >> -> Seq Scan on email_bank_mailing_lists (cost=0.00..6788.24 rows=30001 width=4)
> >>
> >> EXPLAIN
> >
> > --
> > Rod Taylor
>
>
>
> -----------------------------------------
> Get your free web based email at trade-india.com.
> "India's Leading B2B eMarketplace.!"
> http://www.trade-india.com/
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
--
Rod Taylor


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: mallah(at)trade-india(dot)com
Cc: pgsql-admin(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [pgsql-performance] Is dump-reload the only cure?
Date: 2002-11-01 14:11:03
Message-ID: 1036159864.5794.5.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-performance

On Fri, 2002-11-01 at 08:03, mallah(at)trade-india(dot)com wrote:
>
> Hi Rod ,
>
> Does it means that index scan is used for less frequenlty occuring data?
> yes my table was not clustered.
>
> can u tell me what does 0.00..6788.24 and rows and width means?
>
> in explain out put cost=0.00..6788.24 rows=30001 width=4
>
>
> I have one more table where i face the similar problem , i have not dump - reloaded
> it yet , will post again if i face the problem.

Keep in mind that an index scan is very expensive in regards to a single
tuple. It has to run through (fetch) the index pages, then fetch the
pages from the table. Since the table fetches are random, the harddrive
will probably incur a seek for each tuple found in the index. The seeks
add up much quicker than a sequential scan (without nearly as many seeks
or drive head movements).

--
Rod Taylor


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: mallah(at)trade-india(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Is dump-reload the only cure?
Date: 2002-11-01 14:38:51
Message-ID: 1036161531.5794.18.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-performance

> Does it Mean that clustered indexes are guarrented to be used for index scan?
> one more thing does clustering means that all future data addition will happen
> in the ordered manner only i mean consecutively in terms of source_id?

No, but clustering a table allows an index scan to visit less pages, and
make less disk seeks. This in turn makes it a better choice for some
queries due to the current layout of tuples on the disk. However, there
are new borderline cases -- just in different places than before.

--
Rod Taylor


From: <mallah(at)trade-india(dot)com>
To: <rbt(at)rbt(dot)ca>
Cc: <pgsql-admin(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [pgsql-performance] Is dump-reload the only cure?
Date: 2002-11-01 14:41:22
Message-ID: 1040.203.145.130.142.1036161682.squirrel@mail.trade-india.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-performance

Thanks for the insight Rod.
is there any other place i can know more abt these principles?

But if the table is clustered then the pages are stored catagiously
with respect to that column rite?

> On Fri, 2002-11-01 at 08:03, mallah(at)trade-india(dot)com wrote:
>>
>> Hi Rod ,
>>
>> Does it means that index scan is used for less frequenlty occuring data? yes my table was not
>> clustered.
>>
>> can u tell me what does 0.00..6788.24 and rows and width means?
>>
>> in explain out put cost=0.00..6788.24 rows=30001 width=4
>>
>>
>> I have one more table where i face the similar problem , i have not dump - reloaded it yet ,
>> will post again if i face the problem.
>
> Keep in mind that an index scan is very expensive in regards to a single tuple. It has to run
> through (fetch) the index pages, then fetch the pages from the table. Since the table fetches
> are random, the harddrive will probably incur a seek for each tuple found in the index. The
> seeks add up much quicker than a sequential scan (without nearly as many seeks or drive head
> movements).
>
> --
> Rod Taylor

-----------------------------------------
Get your free web based email at trade-india.com.
"India's Leading B2B eMarketplace.!"
http://www.trade-india.com/


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: mallah(at)trade-india(dot)com
Cc: pgsql-admin(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [pgsql-performance] Is dump-reload the only cure?
Date: 2002-11-01 19:54:05
Message-ID: 1036180445.20463.22.camel@camel
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-performance

On Fri, 2002-11-01 at 06:15, mallah(at)trade-india(dot)com wrote:
> tradein_clients=# VACUUM VERBOSE ANALYZE email_bank_mailing_lists;
> NOTICE: --Relation email_bank_mailing_lists--
> NOTICE: Pages 3583: Changed 0, Empty 0; Tup 256419: Vac 0, Keep 0, UnUsed 44822.
> Total CPU 0.24s/0.04u sec elapsed 0.30 sec.
> NOTICE: Analyzing email_bank_mailing_lists
> VACUUM

I'd suggest running a vacuum full and then running vacuum analyze more
often. 44822 unused tuples seems quite excessive to me...

Robert Treat