Re: when to reindex?

Lists: pgsql-general
From: "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com>
To: "Pgsql General list" <pgsql-general(at)postgresql(dot)org>
Subject: when to reindex?
Date: 2008-06-06 15:36:01
Message-ID: 1A6E6D554222284AB25ABE3229A92762E9A49F@nrtexcus702.int.asurion.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

In Oracle, there is a method to determine when it is advisable to
rebuild indexes. Are there any guidelines for this in PostgreSQL?

I found this but it doesn't indicate at which point an index should be
rebuilt other than corruption.

http://www.postgresql.org/docs/8.3/interactive/routine-reindex.html

Jon


From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com>
Cc: "Pgsql General list" <pgsql-general(at)postgresql(dot)org>
Subject: Re: when to reindex?
Date: 2008-06-06 17:01:24
Message-ID: dcc563d10806061001ybffa60ewb96d0757ca3635db@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Jun 6, 2008 at 9:36 AM, Roberts, Jon <Jon(dot)Roberts(at)asurion(dot)com> wrote:
> In Oracle, there is a method to determine when it is advisable to
> rebuild indexes. Are there any guidelines for this in PostgreSQL?
>
> I found this but it doesn't indicate at which point an index should be
> rebuilt other than corruption.
>
> http://www.postgresql.org/docs/8.3/interactive/routine-reindex.html

PostgreSQL isn't quite as finicky about indexes as oracle can be. If
you've ever rebuilt a table and forgot to rebuild the indexes in
oracle you know what I'm talking about.

PostgreSQL generally takes care of indexes pretty well. There are two
reasons to reindex in pgsql. The first one is a corrupted index.
Note that if you're running on quality hardware, and a properly
configured db (i.e. fsync isn't off, etc...) then you shouldn't get
corrupted indexes. If you get them quite often, then you've got worse
problems than just figuring out when to reindex. The second common
situation that requires a reindex is when you suffer from index bloat.
This can be caused by certain out of the ordinary update patterns and
by vacuum full.


From: "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com>
To: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
Cc: "Pgsql General list" <pgsql-general(at)postgresql(dot)org>
Subject: Re: when to reindex?
Date: 2008-06-06 18:02:58
Message-ID: 1A6E6D554222284AB25ABE3229A92762E9A4A3@nrtexcus702.int.asurion.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> On Fri, Jun 6, 2008 at 9:36 AM, Roberts, Jon <Jon(dot)Roberts(at)asurion(dot)com>
> wrote:
> > In Oracle, there is a method to determine when it is advisable to
> > rebuild indexes. Are there any guidelines for this in PostgreSQL?
> >
> > I found this but it doesn't indicate at which point an index should
be
> > rebuilt other than corruption.
> >
> > http://www.postgresql.org/docs/8.3/interactive/routine-reindex.html
>
> PostgreSQL isn't quite as finicky about indexes as oracle can be. If
> you've ever rebuilt a table and forgot to rebuild the indexes in
> oracle you know what I'm talking about.
>
> PostgreSQL generally takes care of indexes pretty well. There are two
> reasons to reindex in pgsql. The first one is a corrupted index.
> Note that if you're running on quality hardware, and a properly
> configured db (i.e. fsync isn't off, etc...) then you shouldn't get
> corrupted indexes. If you get them quite often, then you've got worse
> problems than just figuring out when to reindex. The second common
> situation that requires a reindex is when you suffer from index bloat.
> This can be caused by certain out of the ordinary update patterns and
> by vacuum full.

I am concerned about index bloat. I have an index on a table that is
updated with new data frequently and according to this:
http://www.postgresql.org/docs/8.3/interactive/routine-reindex.html

"Any situation in which the range of index keys changed over time"

I will eventually get index bloat.

Based on this, I have the fillfactor set lower than the default 90 but
this will fill up and it will run slower over time. I want to automate
the reindex process but only reindex when needed. I have a pretty large
database so I can't reindex everything regardless if it needs it or not.

Jon


From: Kevin Hunter <hunteke(at)earlham(dot)edu>
To: Jon Roberts <Jon(dot)Roberts(at)asurion(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Postgres General List <pgsql-general(at)postgresql(dot)org>
Subject: Re: when to reindex?
Date: 2008-06-06 19:10:23
Message-ID: 48498B9F.2070602@earlham.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

At 2:02p -0400 on Fri, 06 Jun 2008, Jon Roberts wrote:
> Based on this, I have the fillfactor set lower than the default 90 but
> this will fill up and it will run slower over time. I want to automate
> the reindex process but only reindex when needed. I have a pretty large
> database so I can't reindex everything regardless if it needs it or not.

Not knowing the core Postgres internals, this is perhaps a naive
question: would it be possible to engineer REINDEX so that it could do
the job in increments? Rather than recreating the entire index at once,
say only create the index for 2% of the table? I suppose analogous to a
partial index somehow?

Or, assuming the REINDEX is for speed/bloat, not for corruption, perhaps
an option to use the old index as a basis, rather than scanning the
entire table multiple times as with a CREATE INDEX CONCURRENTLY.

I do see REINDEX CONCURRENTLY on the TODO list though. Not exactly the
same idea, but ...

Kevin


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Pgsql General list <pgsql-general(at)postgresql(dot)org>
Subject: Re: when to reindex?
Date: 2008-06-06 20:23:19
Message-ID: 48499CB7.1060407@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Roberts, Jon wrote:

> Based on this, I have the fillfactor set lower than the default 90 but
> this will fill up and it will run slower over time. I want to automate
> the reindex process but only reindex when needed. I have a pretty large
> database so I can't reindex everything regardless if it needs it or not.

I'll try to offer a few pointers, but be aware that there might be a
MUCH better way of doing things that I'm just not aware of. I'm lucky
enough not to need to worry too much about either index bloat or
occasional index rebuilds.

Anyway, the following query:

select relname, relpages from pg_class where relkind = 'i';

will get you a list of your indexes and the associated page counts.
Unfortunately it gives you no indication of the actual use of those pages.

You might be able to put something together using the `pageinspect'
module. For example, this query:

select (stats).* FROM
(select
pageno,
bt_page_stats('INDEXNAME', pageno)
AS stats
from generate_series(
1,
(select relpages from pg_class
where relname = 'INDEXNAME')
- 1
) as pageno
) AS x;

will return details about each page.

Using some simple aggregates should then let you get some idea of the
space use in the index. For example, replacing the:

SELECT (stats).* FROM

line in the above query with:

SELECT
SUM((x).free_size) AS totalfree,
SUM((x).page_size) AS totalsize,
SUM((x).free_size)::numeric / SUM((x).page_size)::numeric
* 100 AS freepercent
FROM -- ... rest of prior query ...

should return the percentage of free space in the index. This *might* be
a good metric for whether a reindex is appropriate. You can probably
obtain the configured fill factor on the index from the catalogs
somewhere (I don't know how off the top of my head) and compare that to
the free space to see if it's excessive.

I'm not at all sure that this is correct, so please point out if I've
made some incorrect assumption or just misunderstood something.

By the way, I also just run into this message:
http://unixadmintalk.com/f48/measuring-table-index-bloat-352483/
when doing a quick search. It might be useful.

--
Craig Ringer


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Kevin Hunter" <hunteke(at)earlham(dot)edu>
Cc: "Jon Roberts" <Jon(dot)Roberts(at)asurion(dot)com>, "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>, "Postgres General List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: when to reindex?
Date: 2008-06-06 20:42:15
Message-ID: 87hcc6cn2w.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Kevin Hunter" <hunteke(at)earlham(dot)edu> writes:

> Or, assuming the REINDEX is for speed/bloat, not for corruption, perhaps
> an option to use the old index as a basis, rather than scanning the
> entire table multiple times as with a CREATE INDEX CONCURRENTLY.

That's been mentioned, it ought to be on the TODO. The trick is determining
*when* to use the index and when to use the table -- but that's something the
planner already does quite well and we could hopefully leverage that.

Note that in typical cases it would be slower. REINDEX scans the table
precisely once and sorts it. The sorting will probably have to do multiple
passes through temporary files which is presumably what you're referring to.
But those passes are still at least sequential. A full index scan has to do
random access reads and in many cases read the same page many times to get
different records.

The cases where it would win would be where you have a lot of dead space in
the table (but not in the index), partial indexes which don't cover much of
the table, or a table which is already very well clustered (possibly,
depending on other factors).

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: Postgres General List <pgsql-general(at)postgresql(dot)org>
Subject: Re: when to reindex?
Date: 2008-06-07 14:37:12
Message-ID: 484A9D18.2080103@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Gregory Stark wrote:
> REINDEX scans the table
> precisely once and sorts it.

For the bloat, as opposed to corruption, case -
what information is needed from the table that
is not in the old index? Why would a sequential
read of the old index alone (then some processing)
not suffice?

Thanks,
Jeremy Harris


From: "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>
To: "Jeremy Harris" <jgh(at)wizmail(dot)org>
Cc: "Postgres General List" <pgsql-general(at)postgresql(dot)org>
Subject: Re: when to reindex?
Date: 2008-06-08 08:05:34
Message-ID: 65937bea0806080105x266154f7h23880700b7b8b509@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sat, Jun 7, 2008 at 8:07 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:

> Gregory Stark wrote:
>
>> REINDEX scans the table
>> precisely once and sorts it.
>>
>
> For the bloat, as opposed to corruption, case -
> what information is needed from the table that
> is not in the old index? Why would a sequential
> read of the old index alone (then some processing)
> not suffice?

In Postgres, an index does not hold the livliness information of the rows it
is pointing it; that is, there may be a thousand row-pointers in the index,
but not all the rows pointed to by those pointers are known to be live. This
is an implication of MVCC in Postgres.

So every index lookup has to look at the corresponding heap (aka table) row
and decide if that row should be visible to the querying session.

Best regards,

--
gurjeet[(dot)singh](at)EnterpriseDB(dot)com
singh(dot)gurjeet(at){ gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device