Re: Subject: bool / vacuum full bug followup part 2

From: Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Subject: bool / vacuum full bug followup part 2
Date: 2002-05-06 16:23:53
Message-ID: Pine.LNX.4.33.0205061007040.13079-100000@css120.ihs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, 3 May 2002, Tom Lane wrote:

> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > On Fri, 3 May 2002, Tom Lane wrote:
> >> Scott Marlowe <scott(dot)marlowe(at)ihs(dot)com> writes:
> > Well, my keys aren't changing and the index is growing like they are.
> >>
> >> Could we see the exact details of your test case?
>
> > Sure. I think I posted most of it here already...
>
> Okay, what I see is that the index on the integer column behaves like I
> would expect: you can update, vacuum, update, vacuum, and it doesn't get
> bigger.

Yes, it does get bigger, but only with use, not vacuum full.

It doesn't look like the index on the text column is getting reused
either. Is that because I'm update a lot of rows with a single update
statement? would it be reused if I was changing one row at a time and
commiting it?

I'll test that theory, but I'm gonna bet right now that it won't.

> But the index on the boolean column does grow.

Not only that, but it grows from a vacuum full. I would expect it to at
least stay the same.

> I believe the
> problem is that there are so many equal keys. The reinserted index
> entries are always inserted at the end of the range of matching keys,
> and so there's no opportunity to re-use space within other pages of the
> index. There are only two leaf pages getting the insertions, and so
> nothing to do but split them over and over.
>
> What this really points up, of course, is that making a btree index on
> a boolean column is a pretty foolish thing to do.

Since postgresql 7.2.1 refuses to make an index of any kind other than
btree, what is the answer? no indexes? While single column indexes may
seem wasteful, remember that the boole may be stored in a table that has
very large tuples, and a sequential scan of such a table could be quite
slow, or there may be a situation where a tiny percentage of the booles
are one setting while most are the other, like an approval system for
online content. Either way, creating an index shouldn't result in a
database directory that grows to 100X it's original size, gets slower, and
doesn't give back space to vacuum;

scott.marlowe=# create index test_yn_dx on test using hash (yn);
ERROR: data type boolean has no default operator class for access method
"hash"
You must specify an operator class for the index or define a
default operator class for the data type
scott.marlowe=# create index test_yn_dx on test using rtree (yn);
ERROR: data type boolean has no default operator class for access method
"rtree"
You must specify an operator class for the index or define a
default operator class for the data type
scott.marlowe=# create index test_yn_dx on test using gist (yn);
ERROR: data type boolean has no default operator class for access method
"gist"
You must specify an operator class for the index or define a
default operator class for the data type
scott.marlowe=# create index test_yn_dx on test using btree (yn);
CREATE
scott.marlowe=# drop index tesT_yn_dx;
DROP

> I'm not particularly
> unhappy about the performance being bad with respect to space usage,
> because the fact of the matter is that performance is going to be bad
> by any measure.

If the database gets slow, that's forgiveable. If it runs out of space in
the middle of the day because indexes just grow and grow, that's not. It
makes postgresql look like a toy database.

A growing index that vacuum doesn't shrink is a serious issue for people
who expect to reclaim lost space with vacuum. We at least need to let
people know of this behavior in the admin docs, as right now they (the
docs) seem to imply that vacuum frees up all unused space. For indexes,
this isn't true, and people who are getting started don't need this kind
of gotcha waiting to kill a production database 2 or 6 months into use.

Is it maybe at least possible to make reindex either transaction safe or
have an option that pretty much drops and recreates the index in a
transactionally safe mode or something?

I never knew about this problem until now, and I've found that I had
indexes that were 180Megs that reindexed to 48k in size. My database had
been getting slower and slower, and now it flies again. This was true of
ALL types of indexes, on ints, text, everything. The indexes were HUGE.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Nigel J. Andrews 2002-05-06 16:32:27 Re: pgaccess
Previous Message david blood 2002-05-06 16:08:03 Re: pgaccess