Re: Further pg_upgrade analysis for many tables

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: Further pg_upgrade analysis for many tables
Date: 2012-11-10 00:23:40
Message-ID: CAMkU=1zLgye1V7fxHcto-Z8eTw4kPSJJi8bxYrAZ10W9KSBOVw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 9, 2012 at 3:06 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Thu, Nov 8, 2012 at 08:59:21PM -0800, Jeff Janes wrote:
>> On Thu, Nov 8, 2012 at 4:33 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> >
>> > I am actually now dumping git head/9.3, so I assume all the problems we
>> > know about should be fixed.
>>
>> Are you sure the server you are dumping out of is head?
>
> Well, I tested again with 9.2 dumping/loading 9.2 and the same for git
> head, and got these results:
>
> tables    pg_dump         restore
>            9.2     git     9.2     git
>
>      1    0.13    0.11    0.07    0.07
>   1000    4.37    3.98    4.32    5.28
>   2000   12.98   12.19   13.64   14.25
>   4000   47.85   50.14   61.31   70.97
>   8000  210.39  183.00  302.67  294.20
>  16000  901.53  769.83 1399.25 1359.09

For pg_dump, there are four possible combinations, not just two: you
can use 9.2's pg_dump to dump from a 9.2 server, use git's pg_dump to
dump from a 9.2 server, use git's pg_dump to dump from a git server,
or use 9.2's pg_dump to dump from a git server (although that last one
isn't very relevant).
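
A quick doubling-ratio check on the 9.2 pg_dump column of the table
above makes the scaling easier to see: with linear behavior, doubling
the table count should roughly double the time (ratio ~2), while
quadratic behavior quadruples it (ratio ~4). A small sketch, using the
numbers copied from the table:

```python
# Doubling-ratio check on the 9.2 pg_dump times from the table above.
# Linear scaling -> ratio ~2 per doubling; quadratic -> ratio ~4.
times = {1000: 4.37, 2000: 12.98, 4000: 47.85, 8000: 210.39, 16000: 901.53}

counts = sorted(times)
ratios = [times[b] / times[a] for a, b in zip(counts, counts[1:])]
for a, b, r in zip(counts, counts[1:], ratios):
    print(f"{a:>6} -> {b:>6}: {r:.2f}x")
```

The later ratios sit near 4, which is what a dominant quadratic term
looks like.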

>
> As you can see, there is very little difference between 9.2 and git
> head, except maybe at the 16k level for pg_dump.
>
> Is there some slowdown with a mismatched version dump/reload? I am
> attaching my test script.

Sorry, from the script I can't really tell what versions are being
used for what.

>
>> Using head's pg_dump, but 9.2.1 server, it takes me 179.11 seconds to
>> dump 16,000 tables (schema only) like your example, and it is
>> definitely quadratic.
>
> Are you using a SERIAL column for the tables? I am, and Peter
> Eisentraut reported that was a big slowdown.

Yes, I'm using the same table definition as your example.

>
>> But using head's pg_dump to dump tables out of head's server, it only
>> took 24.95 seconds, and the quadratic term is not yet important;
>> things still look linear.
>
> Again, using SERIAL?

Yep.

>> Is the next value, increment, etc. for a sequence stored in a catalog,
>> or are they stored in the 8kb file associated with each sequence? If
>
> Each sequence is stored in its own 1-row 8k table:
>
> test=> CREATE SEQUENCE seq;
> CREATE SEQUENCE
>
> test=> SELECT * FROM seq;
> -[ RECORD 1 ]-+--------------------
> sequence_name | seq
> last_value | 1
> start_value | 1
> increment_by | 1
> max_value | 9223372036854775807
> min_value | 1
> cache_value | 1
> log_cnt | 0
> is_cycled | f
> is_called | f
>
>> they are stored in the file, then it is a shame that pg_dump goes to
>> the effort of extracting that info if pg_upgrade is just going to
>> overwrite it anyway.
>
> Actually, pg_upgrade needs pg_dump to restore all those sequence values.

I did an experiment where I had pg_dump output dummy sequence values
rather than hitting the database. Once pg_upgrade moves the relation
files over, the dummy values disappear and the originals are back. So
I think pg_upgrade depends on pg_dump here only in a trivial way: the
values need to be there, but it doesn't matter what they are.
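
For context, the sequence portion of a schema dump looks roughly like
this (a sketch of 9.2-era pg_dump output; exact formatting varies by
version). The setval() value is the part my experiment replaced with a
dummy:

```sql
CREATE SEQUENCE seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

-- This is the per-sequence value pg_dump fetches from the server; after
-- pg_upgrade copies the old relation files over, the on-disk last_value
-- wins, so a dummy value here would be harmless.
SELECT pg_catalog.setval('seq', 1, false);
```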

Cheers,

Jeff
