Fastest way to drop an index?

From: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
To: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Fastest way to drop an index?
Date: 2009-02-03 12:18:58
Message-ID: e373d31e0902030418h4c029595ubc89abfdd804b5f7@mail.gmail.com
Lists: pgsql-general

I have a table with two fields:

user_id
col2

There is quite a pivotal SQL for our site that issues the query:

...WHERE user_id = 'xyz' and col2 = 'xyz'

Until now, I have had two indexes, one for user_id and one for col2.

Now I have built a compound index on (user_id, col2) concurrently. After
an error on the previous attempt, it has now been built successfully, but
the query above is still using the old individual indexes.

So I want to drop the index on col2 alone. (Col2 is never used by
itself, so no use keeping the index. We only created it originally
because an index on this field alone would be smaller than a compound
index, and because PG manual said individual indexes tend to work
better.)

But when I try to drop the col2 index, it takes forever, and eats up a
lot of memory to the extent that all other stuff stops. Why should
dropping an index be so tedious?
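For context, DROP INDEX needs an ACCESS EXCLUSIVE lock on the table, so it first waits for every running query to finish, and every new query then queues behind it, which can look like the whole system stalling. One way to bound the damage is to cap how long the DROP may wait, as in this sketch (the index name is hypothetical; adjust to your schema):

```sql
-- statement_timeout aborts the DROP if it cannot acquire its lock and
-- finish within the window, instead of queueing indefinitely behind
-- long-running transactions. Retry later if it times out.
SET statement_timeout = '5s';
DROP INDEX col2_idx;
RESET statement_timeout;
```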

Would appreciate any thoughts on this. Thanks!


From: Filip RembiaƂkowski <plk(dot)zuber(at)gmail(dot)com>
To: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fastest way to drop an index?
Date: 2009-02-03 15:14:07
Message-ID: 92869e660902030714t3936b82v828018a9d05eb082@mail.gmail.com
Lists: pgsql-general

2009/2/3 Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>

> I have a table with two fields:
>
> user_id
> col2
>
> There is quite a pivotal SQL for our site that issues the query:
>
> ...WHERE user_id = 'xyz' and col2 = 'xyz'
>
> Until now, I have had two indexes, one for user_id and one for col2.
>
> Now I have built a compound index on (user_id, col2) concurrently. After
> an error on the previous attempt, it has now been built successfully, but
> the query above is still using the old individual indexes.
>
> So I want to drop the index on col2 alone. (Col2 is never used by
> itself, so no use keeping the index. We only created it originally
> because an index on this field alone would be smaller than a compound
> index, and because PG manual said individual indexes tend to work
> better.)
>
> But when I try to drop the col2 index, it takes forever, and eats up a
> lot of memory to the extent that all other stuff stops. Why should
> dropping an index be so tedious?
>
> Would appreciate any thoughts on this. Thanks!
>

Is the system very write-intensive? Under heavy writes, some DDL operations
are really a pain... If so, you could write a script that copies the data to
another table and switches the tables after the operation, but this will
require a brief application downtime.
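A rough sketch of that copy-and-swap idea (all names here are made up, and any writes that land on the old table between the copy and the swap are lost, hence the need for a short downtime):

```sql
BEGIN;
-- clone the structure and copy the rows
CREATE TABLE t_new (LIKE t INCLUDING DEFAULTS);
INSERT INTO t_new SELECT * FROM t;
-- build only the indexes you actually want on the new table
CREATE INDEX t_new_user_id_col2_idx ON t_new (user_id, col2);
-- swap the tables
ALTER TABLE t RENAME TO t_old;
ALTER TABLE t_new RENAME TO t;
COMMIT;
```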

BTW, what's your PG version and what are your basic memory-related settings?
Is it the same system that you describe in your other email?

And what is the full definition of this table? Only two fields? What's the
primary key?

cheers,
Filip

--
Filip RembiaƂkowski
JID,mailto:filip(dot)rembialkowski(at)gmail(dot)com


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fastest way to drop an index?
Date: 2009-02-03 17:30:10
Message-ID: dcc563d10902030930y55ef0ec5qb3cf8a03bc724227@mail.gmail.com
Lists: pgsql-general

On Tue, Feb 3, 2009 at 5:18 AM, Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> wrote:
> I have a table with two fields:
>
> user_id
> col2
>
> There is quite a pivotal SQL for our site that issues the query:
>
> ...WHERE user_id = 'xyz' and col2 = 'xyz'
>
> Until now, I have had two indexes, one for user_id and one for col2.
>
> Now I have built a compound index on (user_id, col2) concurrently. After
> an error on the previous attempt, it has now been built successfully, but
> the query above is still using the old individual indexes.
>
> So I want to drop the index on col2 alone. (Col2 is never used by
> itself, so no use keeping the index. We only created it originally
> because an index on this field alone would be smaller than a compound
> index, and because PG manual said individual indexes tend to work
> better.)

PostgreSQL can use more than one index at a time, so it's possible that this
index IS getting used. Or does EXPLAIN rule that out? If you have a
maintenance window, you can test how the db behaves with different indexes
by dropping them in a transaction and then rolling back. Note that this
locks the table for most other users, so don't do it midday, and don't hold
the transaction open for half an hour.

BEGIN;
DROP INDEX index1;
EXPLAIN ANALYZE SELECT ...;  -- the query under test goes here
ROLLBACK;

> But when I try to drop the col2 index, it takes forever, and eats up a
> lot of memory to the extent that all other stuff stops. Why should
> dropping an index be so tedious?

It shouldn't. Is this database under a very heavy load when you're
trying to remove the index? Are you sure the drop index is what's
eating up the memory and you're not just looking at memory usage due
to shared memory allocations?


From: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fastest way to drop an index?
Date: 2009-02-03 19:02:39
Message-ID: e373d31e0902031102o3512c714xb88c197331b79b97@mail.gmail.com
Lists: pgsql-general

Thanks for the suggestions.

Following is the SQL query. Actually, this is not merely a DROP INDEX
question. I am also surprised that this straight index query takes more
time than it used to! It used to run in under 1 second because it's a
one-row conditional match, but now it takes anywhere between 5 and 10
seconds for just one row!

The only change I have made recently is to increase the statistics target
for user_id to 300, but not for title_encrypted. user_id is varchar(35)
and title_encrypted is varchar(40).

Will this difference in statistics between two columns in a WHERE clause
affect query speed? I wonder.
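For reference, a per-column statistics target is set like this (a sketch using the table and column names from this thread; the new target only takes effect at the next ANALYZE):

```sql
ALTER TABLE testimonials ALTER COLUMN user_id SET STATISTICS 300;
ANALYZE testimonials;  -- refresh pg_statistic with the new target
```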

pguser=# explain analyze select title_alias from testimonials where
user_id = 'superman' and title_encrypted = md5('MY TITLE COMES HERE')
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using new_idx_testimonials_userid on testimonials
(cost=0.00..157.78 rows=1 width=9) (actual time=8809.715..8809.715
rows=0 loops=1)
Index Cond: ((user_id)::text = 'superman'::text)
Filter: ((title_encrypted)::text = 'b333dc1b0992cb8c70b58a418211389a'::text)
Total runtime: 8809.750 ms
(4 rows)

Time: 8811.817 ms

pguser=# explain analyze select title_alias from testimonials where
user_id = 'superman' and title_encrypted = md5('MY TITLE COMES HERE')
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using new_idx_testimonials_userid on testimonials
(cost=0.00..157.78 rows=1 width=9) (actual time=1.426..1.426 rows=0
loops=1)
Index Cond: ((user_id)::text = 'superman'::text)
Filter: ((title_encrypted)::text = 'b333dc1b0992cb8c70b58a418211389a'::text)
Total runtime: 1.462 ms
(4 rows)

Time: 2.289 ms


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Fastest way to drop an index?
Date: 2009-02-03 21:13:24
Message-ID: 15129.1233695604@sss.pgh.pa.us
Lists: pgsql-general

Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> writes:
> Index Scan using new_idx_testimonials_userid on testimonials
> (cost=0.00..157.78 rows=1 width=9) (actual time=8809.715..8809.715
> rows=0 loops=1)
> Index Cond: ((user_id)::text = 'superman'::text)
> Filter: ((title_encrypted)::text = 'b333dc1b0992cb8c70b58a418211389a'::text)
> Total runtime: 8809.750 ms

This is using the index to fetch the rows that match user_id = 'superman',
and then testing each fetched row to see if it has the desired value of
title_encrypted. The fact that hardly any rows pass the filter test
tells nearly nothing about how long this should be expected to run.
The rather high estimated cost suggests that the planner thinks there
are several dozen rows matching the index condition, and the actual
runtime suggests that there are actually hundreds or thousands of 'em.
If so, your problem is that you need a different index. I'd bet on an
index on title_encrypted being more useful for this query than the one
on user_id; or you could experiment with a two-column index.
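Either option can be built without blocking writes against the live table; a sketch with hypothetical index names:

```sql
CREATE INDEX CONCURRENTLY testimonials_title_enc_idx
    ON testimonials (title_encrypted);
-- or the two-column experiment:
CREATE INDEX CONCURRENTLY testimonials_user_title_idx
    ON testimonials (user_id, title_encrypted);
```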

regards, tom lane