[Patch] Space reservation (pgupgrade)

Lists: pgsql-hackers
From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: [Patch] Space reservation (pgupgrade)
Date: 2008-12-12 11:43:48
Message-ID: 49424E74.2060605@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I attached patch which add capability to reserve space on page for future
upgrade. It is mandatory for future in-place upgrade implementation. This patch
contains basic infrastructure not preupgrade script itself. I'm going to send
WIP preupgrade script today in separate mail.

This patch contains following modifications:

1) I added datpreupgstatus and relpreupgstatus attribute into pg_database and
pg_class. Original idea was to use only flag, but I need more info for tracking
several process status (like 0 - not set, 1 - reserved space set, 2 -
reservation is finished and so on).

I'm not sure if datpreupgstatus will be useful, but I think better is to have it
here.

2) I added two reloption rs_perpage and rs_pertuple for setup amount of reserved
space. I think these two attributes are enough for configure all case. Keep in
mind that for each relation could be these parameters different.

3) I adapted source code to respect new reloptions. Basic idea of it is that
before someone call PageAddItem it checks free space on a page
(PageGetFreeSpace...). I modify PageGetFreeSpace function to count reserved
space. Unfortunately, it requires additional parameters.

It works, but I'm not sure if any external enhancement cannot shortcut this and
call PageAddItem without PageFreeSpace call.

I'm thinking now about refactoring it and replace PageGetFreeSpace(Heap)
functions with RelPageGetFreeSpace and add new function RelPageAddItem.
RelPageAddItem will replace all direct call of PageAddItem.

Comments, ideas?

thanks Zdenek

Attachment Content-Type Size
spacereserve.patch text/x-diff 43.0 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Space reservation (pgupgrade)
Date: 2008-12-12 12:45:59
Message-ID: 49425D07.6030701@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala wrote:
> I attached patch which add capability to reserve space on page for
> future upgrade. It is mandatory for future in-place upgrade
> implementation. This patch contains basic infrastructure not preupgrade
> script itself. I'm going to send WIP preupgrade script today in separate
> mail.

Is that a preupgrade script for upgrading from 8.3 to 8.4? As such, it
can't take advantage of any of the changes in this patch.

> This patch contains following modifications:
>
> 1) I added datpreupgstatus and relpreupgstatus attribute into
> pg_database and pg_class. Original idea was to use only flag, but I need
> more info for tracking several process status (like 0 - not set, 1 -
> reserved space set, 2 - reservation is finished and so on).
>
> I'm not sure if datpreupgstatus will be useful, but I think better is to
> have it here.

I think this is too flexible and not flexible enough at the same time.
It's too flexible, because we don't know what information we'd need to
store about relations in a future script. Those fields might go unused
for many releases, just confusing people. It's not flexible enough, if
it turns out that we need to store more information about relations.

Predicting features that have not yet been written is hard, isn't it.
Trying to do it too precisely will just lead to failure. The generic
approach of using a pre-upgrade script is good, but I don't think it's
wise to prepare anything more specific than that.

It seems that those flags were meant to keep track of what databases and
relations have already been processed by the pre-upgrade script. I don't
think the script needs to be restartable. If we accept that the whole
cluster must be processed in one go, we don't need so much bookkeeping.
Remember that this is a tool that we'll need to backport to people's
production systems, so better keep it as simple as possible.

> 2) I added two reloption rs_perpage and rs_pertuple for setup amount of
> reserved space. I think these two attributes are enough for configure
> all case. Keep in mind that for each relation could be these parameters
> different.

I'm afraid these too are too flexible and not flexible enough.

For example, if we change the representation of a data type so that some
values become longer, some shorter, how much space would you reserve per
page and per tuple?

In the future release, when we know exactly what the new on-disk format
looks like, we can backpatch a patch that reserves the right amount of
space on pages.

Note that from a testing point of view, those reloptions would go unused
until it's time to upgrade to the next release, so it wouldn't be
significantly less risky to just backpatch code to do the calculation at
that point, vs. implementing some generic formula based on per-page and
per-tuple reservation earlier.

> 3) I adapted source code to respect new reloptions. Basic idea of it is
> that before someone call PageAddItem it checks free space on a page
> (PageGetFreeSpace...). I modify PageGetFreeSpace function to count
> reserved space. Unfortunately, it requires additional parameters.

Yeah, I think that's the right place to do it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Space reservation (pgupgrade)
Date: 2008-12-12 13:38:21
Message-ID: 4942694D.3000604@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> I attached patch which add capability to reserve space on page for
>> future upgrade. It is mandatory for future in-place upgrade
>> implementation. This patch contains basic infrastructure not
>> preupgrade script itself. I'm going to send WIP preupgrade script
>> today in separate mail.
>
> Is that a preupgrade script for upgrading from 8.3 to 8.4? As such, it
> can't take advantage of any of the changes in this patch.

No, it is preupgrade script for 8.4->8.5, better is say it is code which try to
make space on page - part of future preupgrade script. It will be useful when
8.5 will be out. But probably we will need to make some improvements in 8.4 code.

>
>> This patch contains following modifications:
>>
>> 1) I added datpreupgstatus and relpreupgstatus attribute into
>> pg_database and pg_class. Original idea was to use only flag, but I
>> need more info for tracking several process status (like 0 - not set,
>> 1 - reserved space set, 2 - reservation is finished and so on).
>>
>> I'm not sure if datpreupgstatus will be useful, but I think better is
>> to have it here.
>
> I think this is too flexible and not flexible enough at the same time.
> It's too flexible, because we don't know what information we'd need to
> store about relations in a future script. Those fields might go unused
> for many releases, just confusing people. It's not flexible enough, if
> it turns out that we need to store more information about relations.

Yes, it is reason why I use int instead of bool, which should be use differently
in each version. Of course nobody know what will happen in the future
development, but I currently implement what I know that we need. If we will need
more than ...

> Predicting features that have not yet been written is hard, isn't it.
> Trying to do it too precisely will just lead to failure. The generic
> approach of using a pre-upgrade script is good, but I don't think it's
> wise to prepare anything more specific than that.
>
> It seems that those flags were meant to keep track of what databases and
> relations have already been processed by the pre-upgrade script. I don't
> think the script needs to be restartable. If we accept that the whole
> cluster must be processed in one go, we don't need so much bookkeeping.
> Remember that this is a tool that we'll need to backport to people's
> production systems, so better keep it as simple as possible.

The problem is with CREATE TABLE/INDEX command. If somebody creates table then
table will not have correctly set attributes and new table is not prepared for
upgrade. There are of course more solution like forbidden create table/index
command during preupgrade or set reserved space for relation based on already
know constants. I prefer now easiest way and it is to have three statuses.

The idea of preupgrade script is following:

1) for each relation calculate reserved space per page and per tuple. Per page
is just difference between page headers size and maybe special size. But per
tuple it requires to calculate per attribute difference - potential null values
will be ignored. The calculation will be know when next release, but we need
basic support for it.

2) when reserved space is set then preupgrade script start to process all
relation and perform appropriate operation on the block.

4) upgrade check in single mode if all relations have correct status.

>> 2) I added two reloption rs_perpage and rs_pertuple for setup amount
>> of reserved space. I think these two attributes are enough for
>> configure all case. Keep in mind that for each relation could be these
>> parameters different.
>
> I'm afraid these too are too flexible and not flexible enough.
>
> For example, if we change the representation of a data type so that some
> values become longer, some shorter, how much space would you reserve per
> page and per tuple?

As I mention several times, data type change should be handled differently. It
should invoke to create new datatype and keep old datatype implementation (for
example in separate legacy library).

Tuple size difference is calculate on following structures:

HeapTupleHeader (including DatumTupleHeader for composite datatypes)
OID size (+Security tag)
Array structures
Varlena implementation
Toast pointer

preupgrade script should know difference in size in these structures and takes
all attributes and count maximal additional size.

> In the future release, when we know exactly what the new on-disk format
> looks like, we can backpatch a patch that reserves the right amount of
> space on pages.

Yes but you need to have infrastructure. Better and safer is backport just a
constants then big amount of code. Probably we will be sometime in situation
when we will need to backport more but every time we can postpone problematic
feature and update inplace upgrade infrastructure in current development
release. See CRC example.

> Note that from a testing point of view, those reloptions would go unused
> until it's time to upgrade to the next release, so it wouldn't be
> significantly less risky to just backpatch code to do the calculation at
> that point, vs. implementing some generic formula based on per-page and
> per-tuple reservation earlier.

As I mention, It is what i want. Create basic infrastructure and constants will
be set and backported when 8.5 will be out.

thanks Zdenek