Re: pg_upgrade (was: 8.2 features status)

From: Rick Gigger <rick(at)alpinenetworking(dot)com>
To: Jim C(dot) Nasby <jnasby(at)pervasive(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Subject: Re: pg_upgrade (was: 8.2 features status)
Date: 2006-08-05 05:28:04
Message-ID: E882D0B5-CFA7-4B49-8A78-8E8922AACF31@alpinenetworking.com
Lists: pgsql-hackers

I had a few thoughts on this issue:

The objective is to smoothly upgrade to the new version with minimal
downtime.

The different proposals as far as I can see are as follows:

Proposal A - the big one time reformatting
1) shutdown the db
2) run a command that upgrades the data directory to the new format
3) start up the new postgres version with the new data dir

Pros: only pg_upgrade (or whatever it's called) needs to know about
the old and new formats; each version of postgres knows about its own
format and that's it. The postgres code stays clean.
Cons: your database is down while the upgrade takes place. This
sucks because the people who need this are the same people who are
trying to avoid downtime. It's faster than a dump/reload, but it
doesn't completely solve the problem, it just mitigates it.
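Step 2 is the heart of this proposal. A toy sketch of what that one-time
reformatting pass might look like, assuming an invented one-byte
page-version header (this is NOT the real PostgreSQL page layout, just an
illustration of the walk-every-page idea):

```python
# Toy sketch of Proposal A, step 2: an offline tool walks every page of
# every relation file and rewrites it into the new format. The page
# layout here (a one-byte version header) is invented for illustration.
PAGE_SIZE = 8192
OLD_VERSION = 3
NEW_VERSION = 4

def upgrade_page(page: bytes) -> bytes:
    """Rewrite a single page from the old layout to the new one."""
    assert page[0] == OLD_VERSION, "page is not in the old format"
    # A real tool would reshuffle tuple headers, line pointers, etc.
    return bytes([NEW_VERSION]) + page[1:]

def upgrade_file(data: bytes) -> bytes:
    """Walk a relation file page by page and convert each one."""
    out = bytearray()
    for off in range(0, len(data), PAGE_SIZE):
        out += upgrade_page(data[off:off + PAGE_SIZE])
    return bytes(out)
```

The whole database is offline while this loop runs over every file, which
is exactly the downtime problem described above.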

Proposal B - the gradual upgrade
1) shutdown the db
2) start it back up with the new version of postgres
3) the new postgres version upgrades things in place as needed

Pros: very short downtime; only the time it takes to shut down the
old postgres version and start up the new one.
Cons: the postgres code gets filled with cruft. Each version has to
know about the older versions' on-disk data formats and how to
upgrade them. Until the conversion is finished you are left with a
database that is part old format, part new format. This could
introduce bugs for people who never needed the feature in the first
place.
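The "upgrade in place as needed" idea might look something like this toy
sketch, where a page is converted only when it is dirtied; the page format
(a one-byte version header) is invented for illustration and is not the
real PostgreSQL layout:

```python
# Toy sketch of Proposal B: the server converts pages lazily, only when
# it dirties them. Note that the read path must understand BOTH formats
# forever -- that is the "cruft" the cons above refer to.
PAGE_SIZE = 8192
OLD_VERSION, NEW_VERSION = 3, 4

class BufferManager:
    def __init__(self, pages):
        self.pages = pages  # page number -> raw page bytes

    def read(self, pageno: int) -> bytes:
        # Readers see a mix of old-format and new-format pages.
        return self.pages[pageno]

    def write(self, pageno: int, body: bytes) -> None:
        # Dirtying a page forces it into the new format; pages that are
        # never written stay in the old format indefinitely.
        self.pages[pageno] = bytes([NEW_VERSION]) + body.ljust(PAGE_SIZE - 1, b"\0")
```

After a write, that one page carries the new version byte while its
neighbors may still carry the old one, which is the mixed-format state
described above.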

Here is another proposal that I haven't heard anyone else suggest.
My apologies in advance if it's obviously not workable or has already
been discussed.

Proposal C - PITR with on-the-fly disk upgrades
1) setup PITR
2) run pg_upgrade on your latest backed up data directories
3) start up the new pg on that data directory in restartable
recovery / read-only / hot-standby mode
4) update the recovery log importer so that it can update the log
files on the fly as it applies them
5) failover to the hot standby as you normally would

Pros: essentially no downtime, just any incidental time needed for
the failover to occur.
Cruft in the postgres main codebase is minimized; it's limited to the
log importer. All other parts of postgres are unaffected.
Cons: requires another server, or double the disk space on the
original server. Is this a problem for people with databases so
large that a dump/reload is unacceptable?
Perhaps there are technical issues with postgres that I don't
understand that would make this too hard.
Maybe it would take too long to update each log file as it's applied,
so the standby wouldn't be able to catch up.

Oh yeah, there's one other way:
Proposal D - use Slony
But I figured that since Slony has been available for a long time, if
it solved their problem they wouldn't be looking for something else.

I have no need for this feature, as a dump/reload is not a problem
for me. I've always wondered, though, whether C was a feasible
answer to this problem. Each time the issue crops up people propose
solutions A and B, but never C.

On Aug 4, 2006, at 1:30 PM, Jim C. Nasby wrote:

> On Fri, Aug 04, 2006 at 02:12:16PM -0400, Stephen Frost wrote:
>> * Jim C. Nasby (jnasby(at)pervasive(dot)com) wrote:
>>> On Thu, Aug 03, 2006 at 11:20:48PM -0700, Josh Berkus wrote:
>>>>> * In-place upgrades (pg_upgrade)
>>>>
>>>> BTW, I may get Sun to contribute an engineer for this; will get
>>>> you posted.
>>>
>>> How would such a thing handle changes to page formats?
>>
>> Couldn't this be done by converting a table/partial-table at a time?
>> It wouldn't be something which could run while the system is live,
>> but
>> it'd probably take less time than dump/restore and wouldn't require
>> double the disk space of the whole database... no?
>
> True, but if you're going to go about creating code that can deal
> with 2
> different versions of on-disk data, why not go one better: put that
> code
> into the database itself, so that pages are converted on-the-fly as
> they're dirtied. That way you have *no* downtime (or almost no,
> anyway).
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>
