Re: Further pg_upgrade analysis for many tables

From: Ants Aasma <ants(at)cybertec(dot)at>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Further pg_upgrade analysis for many tables
Date: 2012-11-14 04:11:27
Message-ID: CA+CSw_uMBjy6+J7MN1onUSweNCjYOaHj+ijTwwqFfCFQNj+TOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 14, 2012 at 2:03 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> At 64k I see pg_upgrade taking 12% of the duration time, if I subtract
> out the dump/restore times.

My percentage numbers only included CPU time and I used SSD storage.
For the most part there was no IO wait to speak of, but it's
completely expected that thousands of link calls are not free.

>> Postgres time itself breaks down with 10% for shutdown checkpoint and
>> 90% for regular running, consisting of 16% parsing, 13% analyze, 20%
>> plan, 30% execute, 11% commit (AtEOXact_RelationCache) and 6% network.
>
> That SVG graph was quite impressive.

I used perf and Gprof2Dot for this. I will probably do a blog post on
how to generate these graphs. It's much more useful for me than a
plain flat profile as I don't know by heart which functions are called
by which.

>> It looks to me that most benefit could be had from introducing more
>> parallelism. Are there any large roadblocks to pipelining the dump and
>> restore to have them happen in parallel?
>
> I talked to Andrew Dustan about parallelization in pg_restore. First,
> we currently use pg_dumpall, which isn't in the custom format required
> for parallel restore, but if we changed to custom format, create table
> isn't done in parallel, only create index/check constraints, and trigger
> creation, etc. Not sure if it worth perusing this just for pg_upgrade.

I agree that parallel restore for schemas is a hard problem. But I
didn't mean parallelism within the restore, I meant that we could
start both postmasters and pipe the output from dump directly to
restore. This way the times for dumping and restoring can overlap.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2012-11-14 04:32:15 recursive view syntax
Previous Message Andrew Dunstan 2012-11-14 03:56:16 Re: [HACKERS] pg_dump and thousands of schemas