Re: Proposal: Multiversion page api (inplace upgrade)

Lists: pgsql-hackers
From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-11 14:11:59
Message-ID: 484FDD2F.8090709@sun.com

1) Overview

This proposal is part of the in-place upgrade project. PostgreSQL should be able to
read any page written by an older version. This is the basis for every possible
upgrade method.

2) Background

We have several macros for manipulating page structures, but the set is not
complete: many parts of the code access these structures directly, and several
parts do not use the existing macros. The idea is to use only a specified API
for manipulating and accessing data structures on a page. This API will recognize
the page layout version and process the data correctly.

3) API

The proposed API is an extended version of the current macros, which do not cover
all Page Header manipulation. I plan to use functions in the first implementation,
because they offer better type checking and debugging capability, but some
functions could be converted into macros (or inline functions) in the final
solution to improve performance. All changes are confined to bufpage.h and page.c.

4) Implementation

The main point of the implementation is to have several versions of the PageHeader
structure (e.g. PageHeader_04, PageHeader_03 ...), with the correct structure
handled in its own branch (see examples).

A possible improvement is to use a union that combines the different PageHeader
versions. Because most PageHeader items are the same in all page layout versions,
this would reduce the number of switches. But I am not sure whether a union has
the same data layout as the separate structures on all supported platforms.
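
One way to probe the layout worry is to let the compiler check it. The sketch
below is only an illustration under stated assumptions, not the proposed patch:
DemoHeaderV3, DemoHeaderV4 and DemoHeaderAny are hypothetical, simplified
structures (not the real PostgreSQL definitions), and the C11 static_assert
lines fail the build on any platform where the union-based header does not
line up with the separate per-version structures.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical, simplified headers; NOT the real PostgreSQL definitions. */
typedef struct DemoHeaderV3
{
    uint64_t lsn;
    uint32_t tli;       /* one 32-bit field in the assumed old layout */
    uint16_t lower;
    uint16_t upper;
} DemoHeaderV3;

typedef struct DemoHeaderV4
{
    uint64_t lsn;
    uint16_t tli;       /* the same four bytes split into two fields */
    uint16_t flags;
    uint16_t lower;
    uint16_t upper;
} DemoHeaderV4;

/* Combined header: only the part that differs between versions is a union. */
typedef struct DemoHeaderAny
{
    uint64_t lsn;
    union
    {
        uint32_t v3_tli;
        struct { uint16_t tli; uint16_t flags; } v4;
    } u;
    uint16_t lower;
    uint16_t upper;
} DemoHeaderAny;

/* Compile-time checks: the build fails wherever the layouts diverge. */
static_assert(sizeof(DemoHeaderAny) == sizeof(DemoHeaderV3), "size differs from v3");
static_assert(sizeof(DemoHeaderAny) == sizeof(DemoHeaderV4), "size differs from v4");
static_assert(offsetof(DemoHeaderAny, lower) == offsetof(DemoHeaderV3, lower), "v3 lower moved");
static_assert(offsetof(DemoHeaderAny, lower) == offsetof(DemoHeaderV4, lower), "v4 lower moved");
static_assert(offsetof(DemoHeaderAny, u.v4.flags) == offsetof(DemoHeaderV4, flags), "v4 flags moved");

/* Shared field: no version switch needed. */
uint16_t demo_get_lower(const DemoHeaderAny *hdr)
{
    return hdr->lower;
}

/* Version-specific field: only here does the layout version matter. */
uint16_t demo_get_flags(const DemoHeaderAny *hdr, int layout_version)
{
    return (layout_version == 4) ? hdr->u.v4.flags : 0;
}
```

With checks like these, a platform with an unexpected layout is caught at
compile time rather than showing up as corrupted header reads.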

Here are some examples:

void PageSetFull(Page page)
{
    switch (PageGetPageLayoutVersion(page))
    {
        case 4:
            ((PageHeader_04) (page))->pd_flags |= PD_PAGE_FULL;
            break;
        default:
            elog(PANIC, "PageSetFull is not supported on page layout version %i",
                 PageGetPageLayoutVersion(page));
    }
}

LocationIndex PageGetLower(Page page)
{
    switch (PageGetPageLayoutVersion(page))
    {
        case 4:
            return ((PageHeader_04) (page))->pd_lower;
    }
    elog(PANIC, "Unsupported page layout in function PageGetLower.");
    return 0;    /* not reached; keeps the compiler quiet */
}

5) Issues

a) The hash index has PageHeader hardcoded into its meta page structure -> the
hash index implementation needs to be rewritten to be multi-header-version friendly.
b) All *ItemSize macros (plus the toast chunk size) depend on sizeof(PageHeader) ->
a separate proposal will follow soon.
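
To make issue b) concrete, the following sketch shows how a size limit derived
from the header size stops fitting once the header grows. Everything in it is
an assumption for illustration: the 8192-byte page, the per-chunk overhead
parameter and the demo_* helpers are hypothetical, and alignment padding is
ignored.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 8192   /* assumed page size in bytes */

/* Bytes left on a page after the page header (header_size < page size). */
size_t demo_usable(size_t header_size)
{
    return DEMO_PAGE_SIZE - header_size;
}

/*
 * Largest chunk payload such that chunks_per_page chunks, each carrying
 * per_chunk_overhead bytes of bookkeeping, still fit on one page.
 */
size_t demo_max_chunk(size_t header_size, size_t per_chunk_overhead,
                      size_t chunks_per_page)
{
    return demo_usable(header_size) / chunks_per_page - per_chunk_overhead;
}

/* Number of chunks of a given payload size that fit on one page. */
size_t demo_chunks_that_fit(size_t header_size, size_t per_chunk_overhead,
                            size_t chunk_size)
{
    return demo_usable(header_size) / (chunk_size + per_chunk_overhead);
}
```

For example, with made-up values of a 20-byte old header and 36 bytes of
per-chunk overhead, demo_max_chunk(20, 36, 4) gives 2007 bytes; the same
chunk size passed to demo_chunks_that_fit(24, 36, 2007) for a 24-byte header
yields only three chunks per page instead of four.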

All comments are welcome.

Zdenek


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-11 14:56:08
Message-ID: 7193.1213196168@sss.pgh.pa.us

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> There are examples:

> void PageSetFull(Page page)
> {
>     switch (PageGetPageLayoutVersion(page))
>     {
>         case 4:
>             ((PageHeader_04) (page))->pd_flags |= PD_PAGE_FULL;
>             break;
>         default:
>             elog(PANIC, "PageSetFull is not supported on page layout version %i",
>                  PageGetPageLayoutVersion(page));
>     }
> }

> LocationIndex PageGetLower(Page page)
> {
>     switch (PageGetPageLayoutVersion(page))
>     {
>         case 4:
>             return ((PageHeader_04) (page))->pd_lower;
>     }
>     elog(PANIC, "Unsupported page layout in function PageGetLower.");
> }

I'm fairly concerned about the performance impact of turning what had
been simple field accesses into function calls. I argue also that since
none of the PageHeader fields have actually moved in any version that's
likely to be supported, the above functions are actually of exactly
zero value.

The proposed PANIC in PageSetFull seems like it requires more thought as
well: surely we don't want that ever to happen. Which means that
callers need to be careful not to invoke such an operation on an
un-updated page, but this proposed coding offers no aid in making sure
that won't happen. What is needed there, I think, is some more global
policy about what operations are permitted on old (un-converted) pages
and a high-level approach to ensuring that unsafe operations aren't
attempted.

regards, tom lane


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Zdenek Kotala" <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-11 14:59:06
Message-ID: 484FE83A.1000602@enterprisedb.com

Zdenek Kotala wrote:
> 4) Implementation
>
> The main point of implementation is to have several version of
> PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...) and correct
> structure will be handled in special branch (see examples).

(this won't come as a surprise as we talked about this in PGCon, but) I
think we should rather convert the page structure to new format in
ReadBuffer the first time a page is read in. That would keep the changes
a lot more isolated.

Note that you need to handle not only page header changes, but changes
to internal representations of different data types, and changes like
varvarlen and combocid. Those are things that have happened in the past;
in the future, I'm foreseeing changes to the toast header, for example,
as there's been a lot of ideas related to toast options compression.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Zdenek Kotala" <Zdenek(dot)Kotala(at)Sun(dot)COM>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-11 15:15:06
Message-ID: 7463.1213197306@sss.pgh.pa.us

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> (this won't come as a surprise as we talked about this in PGCon, but) I
> think we should rather convert the page structure to new format in
> ReadBuffer the first time a page is read in. That would keep the changes
> a lot more isolated.

The problem is that ReadBuffer is an extremely low-level environment,
and it's not clear that it's possible (let alone practical) to do a
conversion at that level in every case. In particular it hardly seems
sane to expect ReadBuffer to do tuple content conversion, which is going
to be practically impossible to perform without any catalog accesses.

Another issue is that it might not be possible to update a page for
lack of space. Are we prepared to assume that there will never be a
transformation we need to apply that makes the data bigger? (Likely
counterexample: adding collation info to text values.) In such a
situation an in-place update might be impossible, and that certainly
takes it outside the bounds of what ReadBuffer can be expected to manage.
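
The space concern can be stated as a simple inequality: conversion in place
is only possible when the page's free space covers the growth of the header
plus the growth of every tuple. A minimal sketch, assuming hypothetical
DemoPageStats and DemoGrowth summaries (neither exists in PostgreSQL) and a
fixed per-tuple growth:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one old-format page; names are illustrative. */
typedef struct DemoPageStats
{
    size_t free_bytes;       /* unused space currently on the page */
    size_t ntuples;          /* tuples stored on the page */
} DemoPageStats;

/* Assumed cost of a format change, in bytes. */
typedef struct DemoGrowth
{
    size_t header_extra;     /* extra bytes needed by the new page header */
    size_t per_tuple_extra;  /* extra bytes needed by each converted tuple */
} DemoGrowth;

/* Extra bytes the page needs in order to hold the converted data. */
size_t demo_bytes_needed(const DemoPageStats *page, const DemoGrowth *growth)
{
    return growth->header_extra + page->ntuples * growth->per_tuple_extra;
}

/* True if the page can be converted without moving any tuple elsewhere. */
bool demo_conversion_fits(const DemoPageStats *page, const DemoGrowth *growth)
{
    return demo_bytes_needed(page, growth) <= page->free_bytes;
}
```

When demo_conversion_fits() returns false, some tuples would have to move to
another page, which is exactly the case that goes beyond a purely local
rewrite of one buffer.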

regards, tom lane


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-11 15:35:17
Message-ID: 484FF0B5.5030005@sun.com

Tom Lane wrote:
> Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
>> There are examples:
>
>> void PageSetFull(Page page)
>> {
>>     switch (PageGetPageLayoutVersion(page))
>>     {
>>         case 4:
>>             ((PageHeader_04) (page))->pd_flags |= PD_PAGE_FULL;
>>             break;
>>         default:
>>             elog(PANIC, "PageSetFull is not supported on page layout version %i",
>>                  PageGetPageLayoutVersion(page));
>>     }
>> }
>
>> LocationIndex PageGetLower(Page page)
>> {
>>     switch (PageGetPageLayoutVersion(page))
>>     {
>>         case 4:
>>             return ((PageHeader_04) (page))->pd_lower;
>>     }
>>     elog(PANIC, "Unsupported page layout in function PageGetLower.");
>> }
>
> I'm fairly concerned about the performance impact of turning what had
> been simple field accesses into function calls.

I use functions for now because it is easy to track what is going on. In the end
they should (mostly) be macros.

> I argue also that since
> none of the PageHeader fields have actually moved in any version that's
> likely to be supported, the above functions are actually of exactly
> zero value.

Yeah, that is why I am thinking of using a page header with unions inside (for
example for the TSL/flag field) and using a switch only in cases like the TSL or
flags fields. What I don't know is whether the fields in this structure will be
placed at the same offsets on all platforms.

> The proposed PANIC in PageSetFull seems like it requires more thought as
> well: surely we don't want that ever to happen. Which means that
> callers need to be careful not to invoke such an operation on an
> un-updated page, but this proposed coding offers no aid in making sure
> that won't happen. What is needed there, I think, is some more global
> policy about what operations are permitted on old (un-converted) pages
> and a high-level approach to ensuring that unsafe operations aren't
> attempted.

ad) PANIC
PANIC shouldn't happen, because page validation in ReadBuffer should check that
the page version is supported.

ad) policy - good catch. I think all page read operations should be allowed
on an old page version. Only tuple, LSN, TSL, and special space modifications
should be allowed for writing. PageAddItem should invoke page conversion before
anything else happens (there may be free space for the tuple and it may be
possible to convert the page to the new format, but after conversion the
remaining space could be smaller than the tuple).
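
A policy like this could be captured in one place rather than scattered across
callers. The sketch below is a hypothetical illustration, not existing
PostgreSQL code: the DemoPageOp and DemoVerdict enums, the supported version
range 3..4, and the choice to answer "convert first" for flag changes and item
insertion on old pages are all assumptions.

```c
#include <assert.h>

#define DEMO_OLDEST_LAYOUT  3   /* assumed oldest readable layout version */
#define DEMO_CURRENT_LAYOUT 4   /* assumed current layout version */

/* Hypothetical operations a caller might request on a page. */
typedef enum DemoPageOp
{
    DEMO_OP_READ,           /* any read-only access */
    DEMO_OP_SET_LSN,        /* header write allowed on old pages */
    DEMO_OP_SET_FULL_FLAG,  /* touches a field old layouts lack */
    DEMO_OP_ADD_ITEM        /* inserting a new item */
} DemoPageOp;

typedef enum DemoVerdict
{
    DEMO_ALLOWED,        /* safe on this page as it is */
    DEMO_CONVERT_FIRST,  /* page must be converted before the operation */
    DEMO_UNSUPPORTED     /* layout version cannot be handled at all */
} DemoVerdict;

/* Central policy check: what may be done to a page of a given version? */
DemoVerdict demo_check_op(int layout_version, DemoPageOp op)
{
    if (layout_version < DEMO_OLDEST_LAYOUT ||
        layout_version > DEMO_CURRENT_LAYOUT)
        return DEMO_UNSUPPORTED;

    if (layout_version == DEMO_CURRENT_LAYOUT)
        return DEMO_ALLOWED;

    /* Old but supported layout: reads and a few header writes only. */
    switch (op)
    {
        case DEMO_OP_READ:
        case DEMO_OP_SET_LSN:
            return DEMO_ALLOWED;
        case DEMO_OP_SET_FULL_FLAG:
        case DEMO_OP_ADD_ITEM:
            return DEMO_CONVERT_FIRST;
    }
    return DEMO_UNSUPPORTED;
}
```

A caller would consult demo_check_op() before modifying a page and trigger
conversion on DEMO_CONVERT_FIRST, so an unsupported operation is reported as
an ordinary result instead of ending in a PANIC deep inside an accessor.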

Zdenek


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-11 15:42:54
Message-ID: 484FF27E.8040904@sun.com

Heikki Linnakangas wrote:
> Zdenek Kotala wrote:
>> 4) Implementation
>>
>> The main point of implementation is to have several version of
>> PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...) and
>> correct structure will be handled in special branch (see examples).
>
> (this won't come as a surprise as we talked about this in PGCon, but) I
> think we should rather convert the page structure to new format in
> ReadBuffer the first time a page is read in. That would keep the changes
> a lot more isolated.

I agree with Tom's reply. In any case, this approach will be mostly isolated in
page.c, and you need to be able to read old pages in both cases.

> Note that you need to handle not only page header changes, but changes
> to internal representations of different data types, and changes like
> varvarlen and combocid. Those are things that have happened in the past;
> in the future, I'm foreseeing changes to the toast header, for example,
> as there's been a lot of ideas related to toast options compression.

I know; this is a first small step toward in-place upgrade. The tuple header will
follow. The page structure is the foundation. I want to split development into
small steps, because they are easier to review.

Zdenek


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Zdenek Kotala" <Zdenek(dot)Kotala(at)Sun(dot)COM>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-11 15:42:57
Message-ID: 484FF281.3030104@enterprisedb.com

Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> (this won't come as a surprise as we talked about this in PGCon, but) I
>> think we should rather convert the page structure to new format in
>> ReadBuffer the first time a page is read in. That would keep the changes
>> a lot more isolated.
>
> The problem is that ReadBuffer is an extremely low-level environment,
> and it's not clear that it's possible (let alone practical) to do a
> conversion at that level in every case.

Well, we can't predict the future, and can't guarantee that it's
possible or practical to do the things we need to do in the future no
matter what approach we choose.

> In particular it hardly seems
> sane to expect ReadBuffer to do tuple content conversion, which is going
> to be practically impossible to perform without any catalog accesses.

ReadBuffer has access to Relation, which has information about what kind
of a relation it's dealing with, and TupleDesc. That should get us
pretty far. It would be a modularity violation, for sure, but I could
live with that for the purpose of page version conversion.

> Another issue is that it might not be possible to update a page for
> lack of space. Are we prepared to assume that there will never be a
> transformation we need to apply that makes the data bigger?

We do need some solution to that. One idea is to run a pre-upgrade
script in the old version that scans the database and moves tuples that
would no longer fit on their pages in the new version. This could be run
before the upgrade, while the old database is still running, so it would
be acceptable for that to take some time.

No doubt people would prefer something better than that. Another idea
would be to have some over-sized buffers that can be used as the target
of conversion, until some tuples are moved off to another page. Perhaps
the over-sized buffer wouldn't need to be in shared memory, if they're
read-only until some tuples are moved.

This is pretty hand-wavy, I know. The point is, I don't think these
problems are insurmountable.

> (Likely counterexample: adding collation info to text values.)

I doubt it, as collation is not a property of text values, but
operations. But that's off-topic...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-11 16:02:22
Message-ID: 484FF70E.9060407@sun.com

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>>> (this won't come as a surprise as we talked about this in PGCon, but)
>>> I think we should rather convert the page structure to new format in
>>> ReadBuffer the first time a page is read in. That would keep the
>>> changes a lot more isolated.
>>
>> The problem is that ReadBuffer is an extremely low-level environment,
>> and it's not clear that it's possible (let alone practical) to do a
>> conversion at that level in every case.
>
> Well, we can't predict the future, and can't guarantee that it's
> possible or practical to do the things we need to do in the future no
> matter what approach we choose.
>
>> In particular it hardly seems
>> sane to expect ReadBuffer to do tuple content conversion, which is going
>> to be practically impossible to perform without any catalog accesses.
>
> ReadBuffer has access to Relation, which has information about what kind
> of a relation it's dealing with, and TupleDesc. That should get us
> pretty far. It would be a modularity violation, for sure, but I could
> live with that for the purpose of page version conversion.

But if you look at the hash implementation, for example, some pages are not in the
regular format, and conversion could need more information than is necessarily
available in ReadBuffer.

>> Another issue is that it might not be possible to update a page for
>> lack of space. Are we prepared to assume that there will never be a
>> transformation we need to apply that makes the data bigger?
>
> We do need some solution to that. One idea is to run a pre-upgrade
> script in the old version that scans the database and moves tuples that
> would no longer fit on their pages in the new version. This could be run
> before the upgrade, while the old database is still running, so it would
> be acceptable for that to take some time.

It would not work for indexes, and do not forget TOAST chunks. I think in some
cases you could end up with an unused quarter of each page in a TOAST table.

> No doubt people would prefer something better than that. Another idea
> would be to have some over-sized buffers that can be used as the target
> of conversion, until some tuples are moved off to another page. Perhaps
> the over-sized buffer wouldn't need to be in shared memory, if they're
> read-only until some tuples are moved.

Anyway, you need a mechanism to mark such a page as read-only, which also
requires a lot of modification, and some mechanism to decide when the page
gets converted. I guess this approach would require modifications similar to
convert-on-write.

> This is pretty hand-wavy, I know. The point is, I don't think these
> problems are insurmountable.
>
>> (Likely counterexample: adding collation info to text values.)
>
> I doubt it, as collation is not a property of text values, but
> operations. But that's off-topic...

Yes, it is off-topic; however, I think Tom is right :-).

Zdenek


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Zdenek Kotala" <Zdenek(dot)Kotala(at)Sun(dot)COM>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-12 00:17:59
Message-ID: 87wskv4ibs.fsf@oxford.xeocode.com


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> (Likely counterexample: adding collation info to text values.)

I don't think the argument really needs an example, but I would be pretty
upset if we proposed tagging every text datum with a collation. Encoding
perhaps, though that seems like a bad idea to me on performance grounds, but
collation is not a property of the data at all.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Zdenek Kotala" <Zdenek(dot)Kotala(at)Sun(dot)COM>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-12 01:26:20
Message-ID: F0238EBA67824444BC1CB4700960CB480588CC68@dmpeints002.isotach.com

Gregory Stark wrote:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > (Likely counterexample: adding collation info to text values.)
>
> I don't think the argument really needs an example, but I
> would be pretty
> upset if we proposed tagging every text datum with a
> collation. Encoding
> perhaps, though that seems like a bad idea to me on
> performance grounds, but
> collation is not a property of the data at all.

Again not directly related to difficulties upgrading pages...

The recent discussion ...
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00102.php
... mentions keeping collation information together with text data,
however it is referring to keeping it together when processing it,
not when storing the text.

Regards,
Stephen Denne.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-12 17:43:51
Message-ID: 200806121743.m5CHhpJ03188@momjian.us

Heikki Linnakangas wrote:
> Zdenek Kotala wrote:
> > 4) Implementation
> >
> > The main point of implementation is to have several version of
> > PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...) and correct
> > structure will be handled in special branch (see examples).
>
> (this won't come as a surprise as we talked about this in PGCon, but) I
> think we should rather convert the page structure to new format in
> ReadBuffer the first time a page is read in. That would keep the changes
> a lot more isolated.
>
> Note that you need to handle not only page header changes, but changes
> to internal representations of different data types, and changes like
> varvarlen and combocid. Those are things that have happened in the past;
> in the future, I'm foreseeing changes to the toast header, for example,
> as there's been a lot of ideas related to toast options compression.

I understand the goal of having good modularity (not having ReadBuffer
modify the page), but I am worried that doing multi-version page
processing in a modular way is going to spread version-specific
information all over the backend code, making it harder to understand.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com



From: Decibel! <decibel(at)decibel(dot)org>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Zdenek Kotala" <Zdenek(dot)Kotala(at)Sun(dot)COM>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-12 21:39:12
Message-ID: 2803D35A-ED9D-4914-BC4D-7097D6A46DF3@decibel.org

On Jun 11, 2008, at 10:42 AM, Heikki Linnakangas wrote:

>> Another issue is that it might not be possible to update a page for
>> lack of space. Are we prepared to assume that there will never be a
>> transformation we need to apply that makes the data bigger?
>
> We do need some solution to that. One idea is to run a pre-upgrade
> script in the old version that scans the database and moves tuples
> that would no longer fit on their pages in the new version. This
> could be run before the upgrade, while the old database is still
> running, so it would be acceptable for that to take some time.

That means old versions have to have some knowledge of new versions.
There's also a big race condition unless the old version starts
taking size requirements into account every time a page is dirtied.

> No doubt people would prefer something better than that. Another
> idea would be to have some over-sized buffers that can be used as
> the target of conversion, until some tuples are moved off to
> another page. Perhaps the over-sized buffer wouldn't need to be in
> shared memory, if they're read-only until some tuples are moved.
>
> This is pretty hand-wavy, I know. The point is, I don't think these
> problems are insurmountable.

--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-12 21:53:07
Message-ID: 48519AC3.1010209@cheapcomplexdevices.com

Tom Lane wrote:
> Another issue is that it might not be possible to update a page for
> lack of space. Are we prepared to assume that there will never be a
> transformation we need to apply that makes the data bigger? In such a
> situation an in-place update might be impossible, and that certainly
> takes it outside the bounds of what ReadBuffer can be expected to manage.

Would a possible solution to this be that you could

1. Upgrade to the newest minor-version of the old release
(which has knowledge of the space requirements of the
new one).

2. Run some new maintenance command like "vacuum expand" or
"vacuum prepare_for_upgrade" or something that would split
any too-full pages, leaving only pages with enough space.

3. Only then shutdown the old server and start the
new major-version server.


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-13 08:27:19
Message-ID: 48522F67.7080809@sun.com

Ron Mayer wrote:
> Tom Lane wrote:
>> Another issue is that it might not be possible to update a page for
>> lack of space. Are we prepared to assume that there will never be a
>> transformation we need to apply that makes the data bigger? In such a
>> situation an in-place update might be impossible, and that certainly
>> takes it outside the bounds of what ReadBuffer can be expected to manage.
>
> Would a possible solution to this be that you could
>

<snip>

>
> 2. Run some new maintenance command like "vacuum expand" or
> "vacuum prepare_for_upgrade" or something that would split
> any too-full pages, leaving only pages with enough space.

It does not solve the problem with TOAST tables, for example. If the chunks do not
fit in the new page layout, one of the chunk tuples has to be moved to a free page.
That means you get a lot of pages with ~2kB of unused free space. And if the max
chunk size differs between versions, you have another problem as well.

There is also an idea to change the compression algorithm for 8.4 (or to offer more
variants). That also means you need to understand the old algorithm in the new
version, or you need to repack everything on the old version.

Zdenek


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-13 09:26:07
Message-ID: 48523D2F.5030305@sun.com

Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>> Zdenek Kotala wrote:
>>> 4) Implementation
>>>
>>> The main point of implementation is to have several version of
>>> PageHeader structure (e.g. PageHeader_04, PageHeader_03 ...) and correct
>>> structure will be handled in special branch (see examples).
>> (this won't come as a surprise as we talked about this in PGCon, but) I
>> think we should rather convert the page structure to new format in
>> ReadBuffer the first time a page is read in. That would keep the changes
>> a lot more isolated.
>>
>> Note that you need to handle not only page header changes, but changes
>> to internal representations of different data types, and changes like
>> varvarlen and combocid. Those are things that have happened in the past;
>> in the future, I'm foreseeing changes to the toast header, for example,
>> as there's been a lot of ideas related to toast options compression.
>
> I understand the goal of having good modularity (not having ReadBuffer
> modify the page), but I am worried that doing multi-version page
> processing in a modular way is going to spread version-specific
> information all over the backend code, making it harder to understand.

I don't think so. A page already carries its page version information, and
we currently have macros like PageSetLSN. The caller need not know anything about
the PageHeader representation. It is the responsibility of the page API to handle
multiple versions correctly.

We can use the same approach for tuple access. It is more complicated, but I think
it is possible. Currently we have several macros (e.g. HeapTupleGetOid) which work
on the TupleData structure. The "only" thing we need is to extend this API as well.

I think in the end we will get more readable code.

Zdenek


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-13 16:18:37
Message-ID: 12071.1213373917@sss.pgh.pa.us

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> It does not solve the problem with TOAST tables, for example. If the chunks do not
> fit in the new page layout, one of the chunk tuples has to be moved to a free page.
> That means you get a lot of pages with ~2kB of unused free space. And if the max
> chunk size differs between versions, you have another problem as well.

> There is also an idea to change the compression algorithm for 8.4 (or to offer more
> variants). That also means you need to understand the old algorithm in the new
> version, or you need to repack everything on the old version.

I don't have any problem at all with the idea that in-place update isn't
going to support arbitrary changes of parameters, such as modifying the
toast chunk size. In particular anything that is locked down by
pg_control isn't a problem.

regards, tom lane