Re: Index creation takes more time?

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Index creation takes more time?
Date: 2012-09-09 09:43:37
Message-ID: e87a2f7a91ce1fca7143bcadc4553a0b@fuzzy.cz
Lists: pgsql-general

Hi,

On 09.09.2012 11:25, Herouth Maoz wrote:
> We have tables which we archive and shorten every day. That is, the
> main table that has daily inserts and updates is kept small, and there
> is a parallel table with all the old data up to a year ago.
>
> In the past we noticed that the bulk transfer from the main table to
> the archive table takes a very long time, so we decided to do this in
> three steps: (1) drop the indexes on the archive table, (2) insert a
> week's worth of data into the archive table, (3) recreate the indexes.
> This proved to take much less time than having each row update the
> indexes.
>
> However, this week we finally upgraded from PG 8.3 to 9.1, and
> suddenly the archiving process takes a lot more time than it used to:
> 14:30 hours for the most important table, to be exact, spent only on
> index creation.
>
> The same work running on the same data in 8.3, on a much weaker PC,
> took merely 4:30 hours.
>
> There are 8 indexes on the archive table.
>
> The size of the main table is currently (after archive) 7,805,009
> records.
> The size of the archive table is currently 177,328,412 records.
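
For reference, the three-step approach described above corresponds to
something like the following (table, column and index names are
hypothetical; the real schema will differ):

```sql
-- (1) Drop the indexes on the archive table.
DROP INDEX archive_sent_at_idx;

-- (2) Bulk-insert a week's worth of rows from the main table.
INSERT INTO archive
SELECT * FROM main_table
WHERE sent_at < now() - interval '7 days';

-- (3) Recreate the indexes in a single pass over the data.
CREATE INDEX archive_sent_at_idx ON archive (sent_at);
```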

What amount of data are we talking about? What data types? Show us the
table structure and the definitions of the indexes.

What hardware is this running on? Try to collect some system stats
(vmstat, iostat and sar are your friends) while the indexes are being
rebuilt.

> Has there been a major change in index creation that would cause 9.1
> to do it this much slower? Should I go back to simply copying over the
> data, or is the whole concept breaking down?

Not really; index creation is usually expected to be as fast as, or
faster than, in previous releases.

Now, when upgrading to 9.1, did you keep the same configuration, or are
there differences (in postgresql.conf, I mean)? I'm particularly
interested in the maintenance_work_mem value: building an index sorts
the data, and if maintenance_work_mem is too low the sort spills to
disk, which is much slower.
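
You can check the current value and raise it just for the session that
rebuilds the indexes; the '1GB' figure below is only an illustration,
size it to the RAM you actually have:

```sql
-- Check the current setting (the 8.3 and 9.1 configs may well differ).
SHOW maintenance_work_mem;

-- Raise it only for the session doing the index rebuilds; each index
-- build may use up to this much memory for its sort.
SET maintenance_work_mem = '1GB';
```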

Also, it might be useful to look at partitioning in this case, i.e.
splitting the large archive table into smaller child tables (e.g. one
per week) and loading the data into those. But it depends on how you use
the old data; with some queries this may cause issues (worse
performance).
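
On 9.1 that means inheritance-based partitioning (declarative
partitioning does not exist yet). A minimal sketch, with hypothetical
table and column names:

```sql
-- Parent table; the columns here are just placeholders.
CREATE TABLE archive (
    id      bigint,
    sent_at timestamp NOT NULL,
    payload text
);

-- One child table per week; the CHECK constraint lets
-- constraint_exclusion skip irrelevant children at query time.
CREATE TABLE archive_2012_w36 (
    CHECK (sent_at >= '2012-09-03' AND sent_at < '2012-09-10')
) INHERITS (archive);

-- The weekly job then loads straight into the new child and indexes
-- only that child, instead of reindexing the whole 177M-row archive.
CREATE INDEX archive_2012_w36_sent_at_idx ON archive_2012_w36 (sent_at);
```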

Tomas
