Quick Links

Index bloat of 4x

Lists:	pgsql-general

From:	Bill Moran <wmoran(at)collaborativefusion(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Index bloat of 4x
Date:	2007-01-17 14:32:23
Message-ID:	20070117093223.bcdb302e.wmoran@collaborativefusion.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

We just did a bunch of maintenance on one of our production databases that
involved a lot of alter tables and moving records about and the like.

Afterwards, I did a vacuum full and analyze to get the database back on
track -- autovac maintains it under normal operations.

Today I decided to run reindex during a slow period, and was shocked to
find the database size drop from 165M to 30M. Keep in mind that the
165M is after vacuum full. So, apparently, there was 135M of index bloat?
That seems a little excessive to me, especially when the docs claim that
reindexing is usually not necessary.

This is PostgreSQL 8.1.4. We've got upgrades to 8.2 planned, but it's
going to be a few months before we can squeak that into a maintenance
window. Additionally, I thought all the big index improvements were
added in 7.4.

I guess my question is whether or not this is expected. It's obviously
not a good thing -- I've noticed that shared buffer usage has dropped
dramatically as well (from 28,000 to 7000). I hadn't expected index
bloat of this magnitude, and I'm concerned about when the database hits
2 or 3 G in size and has 12G just in indexes that take hours to rebuild.

--
Bill Moran
Collaborative Fusion Inc.

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Index bloat of 4x
Date:	2007-01-17 15:42:51
Message-ID:	20070117154251.GG26080@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Bill Moran wrote:
>
> We just did a bunch of maintenance on one of our production databases that
> involved a lot of alter tables and moving records about and the like.
>
> Afterwards, I did a vacuum full and analyze to get the database back on
> track -- autovac maintains it under normal operations.
>
> Today I decided to run reindex during a slow period, and was shocked to
> find the database size drop from 165M to 30M. Keep in mind that the
> 165M is after vacuum full. So, apparently, there was 135M of index bloat?
> That seems a little excessive to me, especially when the docs claim that
> reindexing is usually not necessary.

It's been said that vacuum full does not fix index bloat -- in fact,
it's a problem it worsens. However, I very much doubt that it would be
this serious. I guess the question is, how large was the index *before*
all the alter tables?

I'd expect that it was the ALTER TABLEs that caused this much index
growth, which VACUUM FULL was subsequently unable to fix.

I don't expect you kept a log of index sizes throughout the operation
however :-(

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Bill Moran <wmoran(at)collaborativefusion(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Index bloat of 4x
Date:	2007-01-17 16:32:16
Message-ID:	20070117113216.a6d72185.wmoran@collaborativefusion.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

In response to Alvaro Herrera <alvherre(at)commandprompt(dot)com>:

> Bill Moran wrote:
> >
> > We just did a bunch of maintenance on one of our production databases that
> > involved a lot of alter tables and moving records about and the like.
> >
> > Afterwards, I did a vacuum full and analyze to get the database back on
> > track -- autovac maintains it under normal operations.
> >
> > Today I decided to run reindex during a slow period, and was shocked to
> > find the database size drop from 165M to 30M. Keep in mind that the
> > 165M is after vacuum full. So, apparently, there was 135M of index bloat?
> > That seems a little excessive to me, especially when the docs claim that
> > reindexing is usually not necessary.
>
> It's been said that vacuum full does not fix index bloat -- in fact,
> it's a problem it worsens. However, I very much doubt that it would be
> this serious. I guess the question is, how large was the index *before*
> all the alter tables?

I don't have details on the various indexes. I do keep an mrtg graph of
pg_database_size(), so I can track the overall size of the database and
correlate it to events. I'm not tracking individual relations, indexes,
etc though.

The entire database was around 28M prior to the upgrades, etc. Immediately
after the upgrades, it was ~270M. Following a vacuum full, it dropped to
165M. Following a database-wide reindex, it dropped to 30M.

> I'd expect that it was the ALTER TABLEs that caused this much index
> growth, which VACUUM FULL was subsequently unable to fix.
>
> I don't expect you kept a log of index sizes throughout the operation
> however :-(

Not index size, specifically, no.

I can probably reproduce the issue, however. I have access to the scripts
that were run to update the database, and I can pull a pre-upgrade version
from backup.

I guess my question is whether or not this behaviour is strange enough to
warrant me taking the time to do so. Just because I've never seen it
before doesn't mean that it's unheard of. ;)

Is this level of index bloat known? Would it be worthwhile for me to
investigate it and report any details on what's going on or is this a known
factor that folks don't need any additional details on?

--
Bill Moran
Collaborative Fusion Inc.

From:	"Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Cc:	bruce(at)momjian(dot)us
Subject:	Re: Index bloat of 4x
Date:	2007-01-17 16:42:20
Message-ID:	19dcd19103d2e8b6656dac44879e2fcb@biglumber.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Today I decided to run reindex during a slow period, and was shocked to
> find the database size drop from 165M to 30M. Keep in mind that the
> 165M is after vacuum full. So, apparently, there was 135M of index bloat?
> That seems a little excessive to me, especially when the docs claim that
> reindexing is usually not necessary.

For what its worth, I've seen far worse.

> I guess my question is whether or not this is expected. It's obviously
> not a good thing -- I've noticed that shared buffer usage has dropped
> dramatically as well (from 28,000 to 7000). I hadn't expected index
> bloat of this magnitude, and I'm concerned about when the database hits
> 2 or 3 G in size and has 12G just in indexes that take hours to rebuild.

Regular reindexing is so inexpensive compared to vacuum, I recommend
adding it in as part of your regular maintenance. At the very least, it's
unlikely to ever be that severe again unless you don't reindex for an
equally long period of time.

Come to think of it, an auto-reindex option might be nice in core someday.
TODO item?

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200701171129
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFFrlDivJuQZxSWSsgRAvNnAJ9fJ+U6cyyO382HiZtp8LE5drcpOgCgwlW5
EbOS7Gbg/DYOgXeG7vUIlhY=
=9E8g
-----END PGP SIGNATURE-----

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Greg Sabino Mullane <greg(at)turnstep(dot)com>
Cc:	Postgres general mailing list <pgsql-general(at)postgresql(dot)org>, bruce(at)momjian(dot)us
Subject:	Re: Index bloat of 4x
Date:	2007-01-17 16:48:15
Message-ID:	1169052495.22307.283.camel@coppola.muc.ecircle.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

[snip]
> Come to think of it, an auto-reindex option might be nice in core someday.
> TODO item?

Marry it with autovacuum + online index build, and it will be cool ;-)

BTW, having a privileged background thread doing the reindex could be a
solution to most of the objections regarding online reindex, as the
thread would be privileged enough already to be able to back out if
something fails (part of the objections), and it could stay around long
enough to only lock tentatively in a loop in order to avoid deadlocks
(another part of the objections).

If it would be also marked the same as the vacuum threads not to block
other vacuums, then it would also not be a problem that it runs 2
days...

Cheers,
Csaba.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Index bloat of 4x
Date:	2007-01-17 16:56:14
Message-ID:	24871.1169052974@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Bill Moran <wmoran(at)collaborativefusion(dot)com> writes:
> The entire database was around 28M prior to the upgrades, etc. Immediately
> after the upgrades, it was ~270M. Following a vacuum full, it dropped to
> 165M. Following a database-wide reindex, it dropped to 30M.

As Alvaro said, vacuum full doesn't shrink indexes but in fact bloats them.
(Worst case, they could double in size, if the vacuum moves every row;
there's an intermediate state where there have to be index entries for
both old and new copies of each moved row, to ensure things are
consistent if the vacuum crashes right there.)

So the above doesn't sound too unlikely. Perhaps we should recommend
vac full + reindex as standard cleanup procedure. Longer term, maybe
teach vac full to do an automatic reindex if it's moved more than X% of
the rows. Or forget the current vac full implementation entirely, and
go over to something acting more like CLUSTER ...

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Index bloat of 4x
Date:	2007-01-17 17:12:51
Message-ID:	20070117171251.GH26080@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Bill Moran wrote:

> The entire database was around 28M prior to the upgrades, etc. Immediately
> after the upgrades, it was ~270M. Following a vacuum full, it dropped to
> 165M. Following a database-wide reindex, it dropped to 30M.

Oh, so it was clearly the upgrade procedure that caused the bloat ...
Reindexing seems the expected course.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Bill Moran <wmoran(at)collaborativefusion(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Index bloat of 4x
Date:	2007-01-17 17:52:03
Message-ID:	20070117125203.899f6660.wmoran@collaborativefusion.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

In response to Alvaro Herrera <alvherre(at)commandprompt(dot)com>:

> Bill Moran wrote:
>
> > The entire database was around 28M prior to the upgrades, etc. Immediately
> > after the upgrades, it was ~270M. Following a vacuum full, it dropped to
> > 165M. Following a database-wide reindex, it dropped to 30M.
>
> Oh, so it was clearly the upgrade procedure that caused the bloat ...
> Reindexing seems the expected course.

Right. Sorry if I didn't explain that properly.

It wasn't the fact that it bloated that surprised me. It was the
_magnitude_ of bloat that I wasn't expecting, as well as the fact that
it was _all_ _index_ bloat.

--
Bill Moran
Collaborative Fusion Inc.

From:	Bill Moran <wmoran(at)collaborativefusion(dot)com>
To:	Ben <bench(at)silentmedia(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Index bloat of 4x
Date:	2007-01-17 18:01:43
Message-ID:	20070117130143.9171d2df.wmoran@collaborativefusion.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

In response to Ben <bench(at)silentmedia(dot)com>:

> Hey Bill. How do you monitor your shared buffer usage? My understanding
> was that there wasn't a good way to see what was used vs. allocated.

echo "select count(*) from pg_buffercache where reldatabase is not null;" | $PSQL_BIN -P tuples_only -U pgsql postgres | head -1

Of course, you have to install the pg_buffercache contrib module first.

> On Wed, 17 Jan 2007, Bill Moran wrote:
>
> >
> > We just did a bunch of maintenance on one of our production databases that
> > involved a lot of alter tables and moving records about and the like.
> >
> > Afterwards, I did a vacuum full and analyze to get the database back on
> > track -- autovac maintains it under normal operations.
> >
> > Today I decided to run reindex during a slow period, and was shocked to
> > find the database size drop from 165M to 30M. Keep in mind that the
> > 165M is after vacuum full. So, apparently, there was 135M of index bloat?
> > That seems a little excessive to me, especially when the docs claim that
> > reindexing is usually not necessary.
> >
> > This is PostgreSQL 8.1.4. We've got upgrades to 8.2 planned, but it's
> > going to be a few months before we can squeak that into a maintenance
> > window. Additionally, I thought all the big index improvements were
> > added in 7.4.
> >
> > I guess my question is whether or not this is expected. It's obviously
> > not a good thing -- I've noticed that shared buffer usage has dropped
> > dramatically as well (from 28,000 to 7000). I hadn't expected index
> > bloat of this magnitude, and I'm concerned about when the database hits
> > 2 or 3 G in size and has 12G just in indexes that take hours to rebuild.
> >
> > --
> > Bill Moran
> > Collaborative Fusion Inc.
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: explain analyze is your friend
> >
>
>
>
>
>
>

--
Bill Moran
Collaborative Fusion Inc.

wmoran(at)collaborativefusion(dot)com
Phone: 412-422-3463x4023

****************************************************************
IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.
****************************************************************

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bill Moran <wmoran(at)collaborativefusion(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Index bloat of 4x
Date:	2007-01-17 20:12:43
Message-ID:	45AE833B.60407@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Tom Lane wrote:
> Bill Moran <wmoran(at)collaborativefusion(dot)com> writes:
>> The entire database was around 28M prior to the upgrades, etc. Immediately
>> after the upgrades, it was ~270M. Following a vacuum full, it dropped to
>> 165M. Following a database-wide reindex, it dropped to 30M.
>
> As Alvaro said, vacuum full doesn't shrink indexes but in fact bloats them.
> (Worst case, they could double in size, if the vacuum moves every row;
> there's an intermediate state where there have to be index entries for
> both old and new copies of each moved row, to ensure things are
> consistent if the vacuum crashes right there.)
>
> So the above doesn't sound too unlikely. Perhaps we should recommend
> vac full + reindex as standard cleanup procedure. Longer term, maybe
> teach vac full to do an automatic reindex if it's moved more than X% of
> the rows. Or forget the current vac full implementation entirely, and
> go over to something acting more like CLUSTER ...

we have to recommend the CLUSTER "way" to fix overly bloated databases
quite often to people on IRC because vacuum full is unreasonably slow on
highly fragmented databases.
Doing something like that internally for vacuum full sounds like a
reasonable idea except for the additional disk usage during the process
which might cause issues for people ...

Stefan

From:	Vivek Khera <vivek(at)khera(dot)org>
To:	pgsql general <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Index bloat of 4x
Date:	2007-01-17 20:51:52
Message-ID:	B5B4F1CD-86E3-4C7D-9738-5F559E2F3268@khera.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Jan 17, 2007, at 11:56 AM, Tom Lane wrote:

> So the above doesn't sound too unlikely. Perhaps we should recommend
> vac full + reindex as standard cleanup procedure. Longer term, maybe
> teach vac full to do an automatic reindex if it's moved more than X
> % of

a vac full + reindex is a waste of effort.

whenever i need a big cleanup, I drop indexes (other than PK), vac
full, re-create indexes.

however, usually a reindex does a sufficient job if vacuum has been
run with any sort of regularity.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bill Moran <wmoran(at)collaborativefusion(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Index bloat of 4x
Date:	2007-01-18 07:08:25
Message-ID:	19674.1169104105@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Bill Moran <wmoran(at)collaborativefusion(dot)com> writes:
> It wasn't the fact that it bloated that surprised me. It was the
> _magnitude_ of bloat that I wasn't expecting, as well as the fact that
> it was _all_ _index_ bloat.

Um, no, you had plenty of table *and* index bloat before. The problem
here is that VACUUM FULL fixed all the table bloat whilst making the
index situation worse :-(

regards, tom lane

From:	Bill Moran <wmoran(at)collaborativefusion(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Index bloat of 4x
Date:	2007-01-18 13:07:15
Message-ID:	20070118080715.479b9315.wmoran@collaborativefusion.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

In response to Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Bill Moran <wmoran(at)collaborativefusion(dot)com> writes:
> > It wasn't the fact that it bloated that surprised me. It was the
> > _magnitude_ of bloat that I wasn't expecting, as well as the fact that
> > it was _all_ _index_ bloat.
>
> Um, no, you had plenty of table *and* index bloat before. The problem
> here is that VACUUM FULL fixed all the table bloat whilst making the
> index situation worse :-(

Right. It doesn't _look_ that way from the graph, but that's because I only
graph total DB size. I expect if I graphed data and index size separately,
it would be evident.

At this point, I'm going to assume that my question of, "Is this 4x bloat
strange enough to warrant further investigation" is "no". It seems like
this amount of bloat isn't terribly unusual, and that the people working on
improving this sort of thing already have enough examples of it.

Thanks to everyone for the replies.

--
Bill Moran
Collaborative Fusion Inc.

From:	"Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To:	pgsql-general(at)postgresql(dot)org
Cc:	Bill Moran <wmoran(at)collaborativefusion(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-01-19 01:42:00
Message-ID:	200701181842.00696.pgsql@bluepolka.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Thursday January 18 2007 6:07 am, Bill Moran wrote:
> Right. It doesn't _look_ that way from the graph, but that's
> because I only graph total DB size. I expect if I graphed
> data and index size separately, it would be evident.

pg_total_relation_size() might give you what you want there.

> At this point, I'm going to assume that my question of, "Is
> this 4x bloat strange enough to warrant further investigation"
> is "no". It seems like this amount of bloat isn't terribly
> unusual, and that the people working on improving this sort of
> thing already have enough examples of it.

I afraid I don't see how any of the answers I saw discussed fit a
24x7 operation. Reindex, drop index, vacuum full, ... they all
block production queries of one sort or another for significant
periods of time (minutes) on large (multi/tens of GB) tables,
and thus are infeasible for true 24x7 operations. What it seems
we really need is something to remove the bloat without blocking
production DML queries, while under significant query load, with
very large tables. This bloat issue is by far our biggest
headache on the DB side.

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	"Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc:	Postgres general mailing list <pgsql-general(at)postgresql(dot)org>, Bill Moran <wmoran(at)collaborativefusion(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-01-19 09:11:17
Message-ID:	1169197877.11526.10.camel@coppola.muc.ecircle.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

[snip]
> I afraid I don't see how any of the answers I saw discussed fit a
> 24x7 operation. Reindex, drop index, vacuum full, ... they all
> block production queries of one sort or another for significant
> periods of time (minutes) on large (multi/tens of GB) tables,
> and thus are infeasible for true 24x7 operations.[snip]

This is not completely true, as of 8.2 there is an online index build,
and if that could be used in a background thread to rebuild the index
and replace the bloated one once it's finished, that would be a
non-blocking operation which could be done in 24x7 situations.

There are some issues with using the online index build for replacing an
existing index, but if those could be solved it would be a viable
solution I think...

Cheers,
Csaba.

From:	"Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc:	Postgres general mailing list <pgsql-general(at)postgresql(dot)org>, Bill Moran <wmoran(at)collaborativefusion(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-01-19 17:19:52
Message-ID:	200701191019.52768.pgsql@bluepolka.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Friday January 19 2007 2:11 am, Csaba Nagy wrote:
>
> > I afraid I don't see how any of the answers I saw discussed
> > fit a 24x7 operation. Reindex, drop index, vacuum full, ...
> > they all block production queries of one sort or another for
> > significant periods of time (minutes) on large (multi/tens
> > of GB) tables, and thus are infeasible for true 24x7
> > operations.[snip]
>
> This is not completely true, as of 8.2 there is an online
> index build, and if that could be used in a background thread
> to rebuild the index and replace the bloated one once it's
> finished, that would be a non-blocking operation which could
> be done in 24x7 situations.

Online index creation definitely helps us toward 24x7. But
wouldn't we still have to drop the old index, thus blocking
production queries?

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Postgres general mailing list <pgsql-general(at)postgresql(dot)org>, Bill Moran <wmoran(at)collaborativefusion(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-01-19 17:45:03
Message-ID:	17158.1169228703@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

"Ed L." <pgsql(at)bluepolka(dot)net> writes:
> Online index creation definitely helps us toward 24x7. But
> wouldn't we still have to drop the old index, thus blocking
> production queries?

Yes, but only for a very short period.

regards, tom lane

From:	"Jeremy Haile" <jhaile(at)fastmail(dot)fm>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc:	"Csaba Nagy" <nagy(at)ecircle-ag(dot)com>, "Postgres general mailing list" <pgsql-general(at)postgresql(dot)org>, "Bill Moran" <wmoran(at)collaborativefusion(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-01-19 19:03:28
Message-ID:	1169233408.21922.1170154897@webmail.messagingengine.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Is it feasible to add a "reindex concurrently" that doesn't lock the
table for the rebuild, then locks the table when doing a second pass to
pickup rows that were changed after the first pass? Or something like
that....

On Fri, 19 Jan 2007 12:45:03 -0500, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> said:
> "Ed L." <pgsql(at)bluepolka(dot)net> writes:
> > Online index creation definitely helps us toward 24x7. But
> > wouldn't we still have to drop the old index, thus blocking
> > production queries?
>
> Yes, but only for a very short period.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Jeremy Haile <jhaile(at)fastmail(dot)fm>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Ed L(dot)" <pgsql(at)bluepolka(dot)net>, Postgres general mailing list <pgsql-general(at)postgresql(dot)org>, Bill Moran <wmoran(at)collaborativefusion(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-01-22 09:24:10
Message-ID:	1169457850.2735.20.camel@coppola.muc.ecircle.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Fri, 2007-01-19 at 20:03, Jeremy Haile wrote:
> Is it feasible to add a "reindex concurrently" that doesn't lock the
> table for the rebuild, then locks the table when doing a second pass to
> pickup rows that were changed after the first pass? Or something like
> that....

IIRC, the objection was the deadlock potential of any lock upgrade, and
the problems of impossible cleanup on failure if something changed the
permissions of the executing user in the meantime. That's why I think it
would make sense if it could be done by a privileged background thread
like the autovacuum ones, so the lock upgrade can be tried without
blocking, as it can take quite some time till it succeeds, and the
cleanup is possible due to the privileged nature of the executor.

If there would be such a facility it would also need some policies to
control time windows and priorities just as for autovacuum, that's why I
connect it in my usage-focused mind to autovacuum.

Cheers,
Csaba.

From:	"Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc:	Jeremy Haile <jhaile(at)fastmail(dot)fm>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgres general mailing list <pgsql-general(at)postgresql(dot)org>, Bill Moran <wmoran(at)collaborativefusion(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-01-22 20:32:37
Message-ID:	200701221332.37953.pgsql@bluepolka.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

We have a large number (50+) of pre-8.2 clusters. How can I
best/most easily identify those indices most bloated and in need
of reindex/rebuilding?

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc:	Jeremy Haile <jhaile(at)fastmail(dot)fm>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Ed L(dot)" <pgsql(at)bluepolka(dot)net>, Postgres general mailing list <pgsql-general(at)postgresql(dot)org>, Bill Moran <wmoran(at)collaborativefusion(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-01-31 03:18:32
Message-ID:	200701310318.l0V3IWU08970@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Added to TODO:

* Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY

This is difficult because you must upgrade to an exclusive table lock
to replace the existing index file. CREATE INDEX CONCURRENTLY does not
have this complication. This would allow index compaction without
downtime.

I understand the problems, but the need for this seems pretty clear.

---------------------------------------------------------------------------

Csaba Nagy wrote:
> On Fri, 2007-01-19 at 20:03, Jeremy Haile wrote:
> > Is it feasible to add a "reindex concurrently" that doesn't lock the
> > table for the rebuild, then locks the table when doing a second pass to
> > pickup rows that were changed after the first pass? Or something like
> > that....
>
> IIRC, the objection was the deadlock potential of any lock upgrade, and
> the problems of impossible cleanup on failure if something changed the
> permissions of the executing user in the meantime. That's why I think it
> would make sense if it could be done by a privileged background thread
> like the autovacuum ones, so the lock upgrade can be tried without
> blocking, as it can take quite some time till it succeeds, and the
> cleanup is possible due to the privileged nature of the executor.
>
> If there would be such a facility it would also need some policies to
> control time windows and priorities just as for autovacuum, that's why I
> connect it in my usage-focused mind to autovacuum.
>
> Cheers,
> Csaba.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	"Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc:	pgsql-general(at)postgresql(dot)org, Bill Moran <wmoran(at)collaborativefusion(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-01-31 04:14:52
Message-ID:	200701310414.l0V4EqF18895@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

I found this thread quite depressing because I had forgotten the VACUUM
FULL only reclaims totally empty pages. I have applied the following
documentation patch to recommend periodic REINDEX, and backpatched to
8.2.X docs. I also added some TODO items so hopefully at least we will
keep track of this limitation.

---------------------------------------------------------------------------

Ed L. wrote:
> On Thursday January 18 2007 6:07 am, Bill Moran wrote:
> > Right. It doesn't _look_ that way from the graph, but that's
> > because I only graph total DB size. I expect if I graphed
> > data and index size separately, it would be evident.
>
> pg_total_relation_size() might give you what you want there.
>
> > At this point, I'm going to assume that my question of, "Is
> > this 4x bloat strange enough to warrant further investigation"
> > is "no". It seems like this amount of bloat isn't terribly
> > unusual, and that the people working on improving this sort of
> > thing already have enough examples of it.
>
> I afraid I don't see how any of the answers I saw discussed fit a
> 24x7 operation. Reindex, drop index, vacuum full, ... they all
> block production queries of one sort or another for significant
> periods of time (minutes) on large (multi/tens of GB) tables,
> and thus are infeasible for true 24x7 operations. What it seems
> we really need is something to remove the bloat without blocking
> production DML queries, while under significant query load, with
> very large tables. This bloat issue is by far our biggest
> headache on the DB side.
>
> Ed
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment	Content-Type	Size
/rtmp/diff	text/x-diff	2.1 KB

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bill Moran <wmoran(at)collaborativefusion(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Index bloat of 4x
Date:	2007-01-31 04:15:55
Message-ID:	200701310415.l0V4FtP19125@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Tom Lane wrote:
> Bill Moran <wmoran(at)collaborativefusion(dot)com> writes:
> > The entire database was around 28M prior to the upgrades, etc. Immediately
> > after the upgrades, it was ~270M. Following a vacuum full, it dropped to
> > 165M. Following a database-wide reindex, it dropped to 30M.
>
> As Alvaro said, vacuum full doesn't shrink indexes but in fact bloats them.
> (Worst case, they could double in size, if the vacuum moves every row;
> there's an intermediate state where there have to be index entries for
> both old and new copies of each moved row, to ensure things are
> consistent if the vacuum crashes right there.)
>
> So the above doesn't sound too unlikely. Perhaps we should recommend
> vac full + reindex as standard cleanup procedure. Longer term, maybe
> teach vac full to do an automatic reindex if it's moved more than X% of
> the rows. Or forget the current vac full implementation entirely, and
> go over to something acting more like CLUSTER ...

TODO already has:

* Improve speed with indexes

For large table adjustments during VACUUM FULL, it is faster to
reindex rather than update the index. Also, index updates can
bloat the index.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Jim Nasby <decibel(at)decibel(dot)org>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Jeremy Haile <jhaile(at)fastmail(dot)fm>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Ed L(dot)" <pgsql(at)bluepolka(dot)net>, Postgres general mailing list <pgsql-general(at)postgresql(dot)org>, Bill Moran <wmoran(at)collaborativefusion(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-02-02 03:38:07
Message-ID:	93FBF786-18A2-4BCC-AB5C-118B5451663F@decibel.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Is there no way to change the index code to allow for moving index
tuples from one page to another? If we could do that then presumably
we could free up substantially more pages.

On Jan 30, 2007, at 10:18 PM, Bruce Momjian wrote:

>
> Added to TODO:
>
> * Add REINDEX CONCURRENTLY, like CREATE INDEX CONCURRENTLY
>
> This is difficult because you must upgrade to an exclusive table
> lock
> to replace the existing index file. CREATE INDEX CONCURRENTLY
> does not
> have this complication. This would allow index compaction without
> downtime.
>
> I understand the problems, but the need for this seems pretty clear.
>
> ----------------------------------------------------------------------
> -----
>
> Csaba Nagy wrote:
>> On Fri, 2007-01-19 at 20:03, Jeremy Haile wrote:
>>> Is it feasible to add a "reindex concurrently" that doesn't lock the
>>> table for the rebuild, then locks the table when doing a second
>>> pass to
>>> pickup rows that were changed after the first pass? Or something
>>> like
>>> that....
>>
>> IIRC, the objection was the deadlock potential of any lock
>> upgrade, and
>> the problems of impossible cleanup on failure if something changed
>> the
>> permissions of the executing user in the meantime. That's why I
>> think it
>> would make sense if it could be done by a privileged background
>> thread
>> like the autovacuum ones, so the lock upgrade can be tried without
>> blocking, as it can take quite some time till it succeeds, and the
>> cleanup is possible due to the privileged nature of the executor.
>>
>> If there would be such a facility it would also need some policies to
>> control time windows and priorities just as for autovacuum, that's
>> why I
>> connect it in my usage-focused mind to autovacuum.
>>
>> Cheers,
>> Csaba.
>>
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>> choose an index scan if your joining column's datatypes do not
>> match
>
> --
> Bruce Momjian bruce(at)momjian(dot)us
> EnterpriseDB http://www.enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Jim Nasby <decibel(at)decibel(dot)org>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Jeremy Haile <jhaile(at)fastmail(dot)fm>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Ed L(dot)" <pgsql(at)bluepolka(dot)net>, Postgres general mailing list <pgsql-general(at)postgresql(dot)org>, Bill Moran <wmoran(at)collaborativefusion(dot)com>
Subject:	Re: Index bloat of 4x
Date:	2007-02-05 15:11:06
Message-ID:	20070205151106.GA4092@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Jim Nasby wrote:
> Is there no way to change the index code to allow for moving index
> tuples from one page to another? If we could do that then presumably
> we could free up substantially more pages.

This paper

@inproceedings{DBLP:conf/sigmod/ZouS96,
author = {C. Zou and B. Salzberg},
editor = {H. V. Jagadish and Inderpal Singh Mumick},
title = {On-line Reorganization of Sparsely-populated B+trees},
booktitle = {Proceedings of the 1996 ACM SIGMOD International Conference on
Management of Data, Montreal, Quebec, Canada, June 4-6, 1996},
publisher = {ACM Press},
year = {1996},
pages = {115-124},
bibsource = {DBLP, \url{http://dblp.uni-trier.de}}
}

may be of some use here.

http://citeseer.ist.psu.edu/zou96line.html

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.