Re: Index bloat problem?

Lists: pgsql-performance
From: Bill Chandler <billybobc1210(at)yahoo(dot)com>
To: pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Subject: Index bloat problem?
Date: 2005-04-21 17:00:18
Message-ID: 20050421170018.2998.qmail@web51403.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

All,

Running PostgreSQL 7.4.2, Solaris.

Client is reporting that the size of an index is
greater than the number of rows in the table (1.9
million vs. 1.5 million). Index was automatically
created from a 'bigserial unique' column.

Database contains several tables with exactly the same
columns (including 'bigserial unique' column). This
is the only table where this index is out of line with
the actual # of rows.

Queries on this table take 40 seconds to retrieve 2000
rows as opposed to 1-2 seconds on the other tables.

We have been running 'VACUUM ANALYZE' very regularly.
In fact, our vacuum schedule has probably been
overkill. We have been running on a per-table basis
after every update (many per day, only inserts
occurring) and after every purge (one per day,
deleting a day's worth of data).

It is theoretically possible that at some time a
process was run that deleted all rows in the table
followed by a VACUUM FULL. In this case we would have
dropped/recreated our own indexes on the table but not
the index automatically created for the bigserial
column. If that happened, could that cause these
symptoms?

What about if an out-of-the-ordinary number of rows
were deleted (say 75% of rows in the table, as opposed
to normal 5%) followed by a 'VACUUM ANALYZE'? Could
things get out of whack because of that situation?

thanks,

Bill

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Bill Chandler <billybobc1210(at)yahoo(dot)com>
Cc: pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index bloat problem?
Date: 2005-04-21 17:22:03
Message-ID: 200504211022.03541.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bill,

> What about if an out-of-the-ordinary number of rows
> were deleted (say 75% of rows in the table, as opposed
> to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
> things get out of whack because of that situation?

Yes. You'd want to run REINDEX after and event like that. As you should now.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Alex Turner <armtuk(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bill Chandler <billybobc1210(at)yahoo(dot)com>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index bloat problem?
Date: 2005-04-21 17:33:28
Message-ID: 33c6269f050421103311a3f3f8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Is:

REINDEX DATABASE blah

supposed to rebuild all indices in the database, or must you specify
each table individualy? (I'm asking because I just tried it and it
only did system tables)

Alex Turner
netEconomist

On 4/21/05, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Bill,
>
> > What about if an out-of-the-ordinary number of rows
> > were deleted (say 75% of rows in the table, as opposed
> > to normal 5%) followed by a 'VACUUM ANALYZE'? Could
> > things get out of whack because of that situation?
>
> Yes. You'd want to run REINDEX after and event like that. As you should now.
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Index bloat problem?
Date: 2005-04-21 17:47:24
Message-ID: 60fyxk2f9v.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

josh(at)agliodbs(dot)com (Josh Berkus) writes:
> Bill,
>
>> What about if an out-of-the-ordinary number of rows
>> were deleted (say 75% of rows in the table, as opposed
>> to normal 5%) followed by a 'VACUUM ANALYZE'?  Could
>> things get out of whack because of that situation?
>
> Yes. You'd want to run REINDEX after and event like that. As you should now.

Based on Tom's recent comments, I'd be inclined to handle this via
doing a CLUSTER, which has the "triple heroism effect" of:

a) Reorganizing the entire table to conform with the relevant index order,
b) Having the effect of VACUUM FULL, and
c) Having the effect of REINDEX

all in one command.

It has all of the "oops, that blocked me for 20 minutes" effect of
REINDEX and VACUUM FULL, but at least it doesn't have the effect
twice...
--
(format nil "~S(at)~S" "cbbrowne" "acm.org")
http://www.ntlug.org/~cbbrowne/sap.html
Rules of the Evil Overlord #78. "I will not tell my Legions of Terror
"And he must be taken alive!" The command will be: ``And try to take
him alive if it is reasonably practical.''"
<http://www.eviloverlord.com/>


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Alex Turner <armtuk(at)gmail(dot)com>
Cc: Bill Chandler <billybobc1210(at)yahoo(dot)com>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index bloat problem?
Date: 2005-04-21 17:50:29
Message-ID: 200504211050.29365.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Alex,

> REINDEX DATABASE blah
>
> supposed to rebuild all indices in the database, or must you specify
> each table individualy? (I'm asking because I just tried it and it
> only did system tables)

"DATABASE

Recreate all system indexes of a specified database. Indexes on user tables
are not processed. Also, indexes on shared system catalogs are skipped except
in stand-alone mode (see below). "

http://www.postgresql.org/docs/8.0/static/sql-reindex.html

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: a3a18850(at)telus(dot)net
To: Bill Chandler <billybobc1210(at)yahoo(dot)com>
Cc: pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index bloat problem?
Date: 2005-04-21 18:41:48
Message-ID: 1114108908.4267f3ec2e0ea@webmail.telus.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Quoting Bill Chandler <billybobc1210(at)yahoo(dot)com>:

> Running PostgreSQL 7.4.2, Solaris.
> Client is reporting that the size of an index is
> greater than the number of rows in the table (1.9
> million vs. 1.5 million). Index was automatically
> created from a 'bigserial unique' column.

> We have been running 'VACUUM ANALYZE' very regularly.
> In fact, our vacuum schedule has probably been
> overkill. We have been running on a per-table basis
> after every update (many per day, only inserts
> occurring) and after every purge (one per day,
> deleting a day's worth of data).
>
> What about if an out-of-the-ordinary number of rows
> were deleted (say 75% of rows in the table, as opposed
> to normal 5%) followed by a 'VACUUM ANALYZE'? Could
> things get out of whack because of that situation?

I gather you mean, out-of-the-ordinary for most apps, but not for this client?

In case nobody else has asked: is your max_fsm_pages big enough to handle all
the deleted pages, across ALL tables hit by the purge? If not, you're
haemorrhaging pages, and VACUUM is probably warning you about exactly that.

If that's not a problem, you might want to consider partitioning the data.
Take a look at inherited tables. For me, they're a good approximation of
clustered indexes (sigh, miss'em) and equivalent to table spaces.

My app is in a similar boat to yours: up to 1/3 of a 10M-row table goes away
every day. For each of the child tables that is a candidate to be dropped, there
is a big prologue txn, whichs moves (INSERT then DELETE) the good rows into a
child table that is NOT to be dropped. Then BANG pull the plug on the tables you
don't want. MUCH faster than DELETE: the dropped tables' files' disk space goes
away in one shot, too.

Just my 2c.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bill Chandler <billybobc1210(at)yahoo(dot)com>
Cc: pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index bloat problem?
Date: 2005-04-22 05:57:11
Message-ID: 27013.1114149431@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bill Chandler <billybobc1210(at)yahoo(dot)com> writes:
> Client is reporting that the size of an index is
> greater than the number of rows in the table (1.9
> million vs. 1.5 million).

This thread seems to have wandered away without asking the critical
question "what did you mean by that?"

It's not possible for an index to have more rows than there are in
the table unless something is seriously broken. And there aren't
any SQL operations that let you inspect an index directly anyway.
So: what is the actual observation that led you to the above
conclusion? Facts, please, not inferences.

regards, tom lane


From: David Roussel <pgsql-performance(at)diroussel(dot)xsmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bill Chandler <billybobc1210(at)yahoo(dot)com>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index bloat problem?
Date: 2005-04-22 08:44:05
Message-ID: 869771629a1d24a91b8df779f7cddfe9@diroussel.xsmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 22 Apr 2005, at 06:57, Tom Lane wrote:
> Bill Chandler <billybobc1210(at)yahoo(dot)com> writes:
>> Client is reporting that the size of an index is
>> greater than the number of rows in the table (1.9
>> million vs. 1.5 million).
>
> This thread seems to have wandered away without asking the critical
> question "what did you mean by that?"
>
> It's not possible for an index to have more rows than there are in
> the table unless something is seriously broken. And there aren't
> any SQL operations that let you inspect an index directly anyway.
> So: what is the actual observation that led you to the above
> conclusion? Facts, please, not inferences.

I work for the client in question. Glad you picked up on that point. I
covered the detail in my my post "How can an index be larger than a
table" on 21 Apr. 2005. I guess I was too detailed, and too much info
put people off.
http://archives.postgresql.org/pgsql-performance/2005-04/msg00553.php

quoting from there...

|
|SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE
relname LIKE 'dave_data%';
|
|relname relkind reltuples relpages
|======================================= ======= ========= ========
|dave_data_update_events r 1593600.0 40209
|dave_data_update_events_event_id_key i 1912320.0 29271
|dave_data_update_events_event_id_seq S 1.0 1
|dave_data_update_events_lds_idx i 1593600.0 6139
|dave_data_update_events_obj_id_idx i 1593600.0 6139
|iso_pjm_data_update_events_obj_id_idx i 1593600.0 6139
|

Note that there are only 1593600 rows in the table, so why the 1912320
figure?

Of course I checked that the row count was correct...

|
|EXPLAIN ANALYZE
|select count(*) from iso_pjm_data_update_events
|
|QUERY PLAN
|Aggregate (cost=60129.00..60129.00 rows=1 width=0) (actual
time=35933.292..35933.293 rows=1 loops=1)
| -> Seq Scan on iso_pjm_data_update_events (cost=0.00..56145.00
rows=1593600 width=0) (actual time=0.213..27919.497 rows=1593600
loops=1)
|Total runtime: 35933.489 ms
|

and...

|
|select count(*) from iso_pjm_data_update_events
|
|count
|1593600
|

so it's not that there are any undeleted rows lying around


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Roussel <pgsql-performance(at)diroussel(dot)xsmail(dot)com>
Cc: Bill Chandler <billybobc1210(at)yahoo(dot)com>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index bloat problem?
Date: 2005-04-22 14:06:33
Message-ID: 329.1114178793@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

David Roussel <pgsql-performance(at)diroussel(dot)xsmail(dot)com> writes:
> |dave_data_update_events r 1593600.0 40209
> |dave_data_update_events_event_id_key i 1912320.0 29271

Hmm ... what PG version is this, and what does VACUUM VERBOSE on
that table show?

regards, tom lane


From: "David Roussel" <pgsql-performance(at)diroussel(dot)xsmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bill Chandler" <billybobc1210(at)yahoo(dot)com>, "pgsql-perform" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index bloat problem?
Date: 2005-04-22 16:16:36
Message-ID: 1114186596.23462.232500712@webmail.messagingengine.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, 22 Apr 2005 10:06:33 -0400, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> said:
> David Roussel <pgsql-performance(at)diroussel(dot)xsmail(dot)com> writes:
> > |dave_data_update_events r 1593600.0 40209
> > |dave_data_update_events_event_id_key i 1912320.0 29271
>
> Hmm ... what PG version is this, and what does VACUUM VERBOSE on
> that table show?

PG 7.4

The disparity seems to have sorted itself out now, so hampering futher
investigations. I guess the regular inserts of new data, and the nightly
deletion and index recreation did it. However, we did suffer reduced
performance and the strange cardinality for several days before it went
away. For what it's worth..

ndb=# vacuum verbose iso_pjm_data_update_events;
INFO: vacuuming "public.iso_pjm_data_update_events"
INFO: index "iso_pjm_data_update_events_event_id_key" now contains
1912320 row versions in 29271 pages
DETAIL: 21969 index pages have been deleted, 20000 are currently
reusable.
CPU 6.17s/0.88u sec elapsed 32.55 sec.
INFO: index "iso_pjm_data_update_events_lds_idx" now contains 1912320
row versions in 7366 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 3.52s/0.57u sec elapsed 14.35 sec.
INFO: index "iso_pjm_data_update_events_obj_id_idx" now contains
1912320 row versions in 7366 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 3.57s/0.58u sec elapsed 12.87 sec.
INFO: "iso_pjm_data_update_events": found 0 removable, 1912320
nonremovable row versions in 40209 pages
DETAIL: 159384 dead row versions cannot be removed yet.
There were 745191 unused item pointers.
0 pages are entirely empty.
CPU 18.26s/3.62u sec elapsed 74.35 sec.
VACUUM

After each insert is does this...

VACUUM ANALYZE iso_pjm_DATA_UPDATE_EVENTS
VACUUM ANALYZE iso_pjm_CONTROL

Each night it does this...

BEGIN
DROP INDEX iso_pjm_control_obj_id_idx
DROP INDEX iso_pjm_control_real_name_idx
DROP INDEX iso_pjm_data_update_events_lds_idx
DROP INDEX iso_pjm_data_update_events_obj_id_idx
CREATE UNIQUE INDEX iso_pjm_control_obj_id_idx ON
iso_pjm_control(obj_id)
CLUSTER iso_pjm_control_obj_id_idx ON iso_pjm_control
CREATE UNIQUE INDEX iso_pjm_control_real_name_idx ON
iso_pjm_control(real_name)
CREATE INDEX iso_pjm_data_update_events_lds_idx ON
iso_pjm_data_update_events(lds)
CREATE INDEX iso_pjm_data_update_events_obj_id_idx ON
iso_pjm_data_update_events(obj_id)
COMMIT

Note there is no reference to iso_pjm_data_update_events_event_id_key
which is the index that went wacky on us. Does that seem weird to you?

Thanks

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David Roussel" <pgsql-performance(at)diroussel(dot)xsmail(dot)com>
Cc: "Bill Chandler" <billybobc1210(at)yahoo(dot)com>, "pgsql-perform" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index bloat problem?
Date: 2005-04-22 17:28:32
Message-ID: 3033.1114190912@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"David Roussel" <pgsql-performance(at)diroussel(dot)xsmail(dot)com> writes:
> Note there is no reference to iso_pjm_data_update_events_event_id_key
> which is the index that went wacky on us. Does that seem weird to you?

What that says is that that index doesn't belong to that table. You
sure it wasn't a chance coincidence of names that made you think it did?

regards, tom lane