Re: 7 hrs for a pg_restore?

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Douglas J Hunley" <doug(at)hunley(dot)homeip(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: 7 hrs for a pg_restore?
Date: 2008-02-20 09:01:09
Message-ID: 2e78013d0802200101q44dbe6edvac067c39629ed9cd@mail.gmail.com
Lists: pgsql-performance

On Feb 19, 2008 11:53 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> Keep in mind, if you have several GB worth of indexes, they take up
> basically no space in the logical dump (just the "CREATE INDEX" command,
> and that's it). But they can take a lot of processor time to build up
> again, especially with localized text.

I think it would be interesting if we could build these indexes in parallel.
Each index build requires a sequential scan of the table; if the table does
not fit in shared buffers, each index build will most likely incur
a lot of I/O.
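To illustrate the idea from the client side, here is a minimal sketch of building several indexes concurrently rather than serially. This is not PostgreSQL code: `run_sql` is a hypothetical stand-in for executing a statement over its own database connection, so the sketch is self-contained.

```python
# Hypothetical sketch: build the dump's indexes concurrently instead of
# one after the other. run_sql stands in for a real per-connection call.
from concurrent.futures import ThreadPoolExecutor

def run_sql(statement):
    # A real worker would open its own connection and execute the DDL;
    # each CREATE INDEX then drives its own seq scan of the table.
    return f"done: {statement}"

index_ddl = [
    "CREATE INDEX t_a_idx ON t (a)",
    "CREATE INDEX t_b_idx ON t (b)",
]

# Both index builds run at the same time, so their seq scans overlap.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_sql, index_ddl))
```

The overlapping scans are exactly where the I/O question arises: without help from the server, two concurrent builds may each read the whole table independently.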

One option would be to teach the backend to build multiple indexes with
a single seq scan of the table. In theory this should be possible, but
it might be tricky given the way index build works (it calls the index's
ambuild method, which internally performs the seq scan).

The other option is to make pg_restore multi-threaded/multi-process. The
synchronized_scans facility would then synchronize the multiple heap
scans. ISTM that if we can make pg_restore multi-process, then
we can possibly add more parallelism to the restore process.
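The multi-process restore described above could be driven roughly like this. Again a hedged sketch, not pg_restore itself: `restore_item` is a hypothetical stand-in for replaying one archive entry over a worker's own connection, and the item names are illustrative.

```python
# Hypothetical sketch of a parallel restore driver: hand the dump's
# post-data items out to a pool of workers. restore_item stands in
# for replaying one archive entry over a dedicated connection.
from concurrent.futures import ThreadPoolExecutor

def restore_item(item):
    # In a real driver each worker holds its own connection; concurrent
    # seq scans of the same table could then be kept together by the
    # synchronized_scans facility rather than each reading independently.
    return f"restored: {item}"

# Illustrative post-data entries from the archive's table of contents.
items = ["INDEX t_a_idx", "INDEX t_b_idx", "CONSTRAINT t_pkey"]

with ThreadPoolExecutor(max_workers=3) as pool:
    done = list(pool.map(restore_item, items))
```

A real driver would also have to respect dependencies between items (e.g. an index before the constraint that uses it), so the work queue cannot be a flat list in the general case.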

My two cents.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
