Re: pg_upgrade: How to deal with toast

Lists: pgsql-hackers
From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_upgrade: How to deal with toast
Date: 2008-11-19 21:45:21
Message-ID: 492488F1.4000009@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

We are now in discussion about toast table upgrading. I try to collect ideas and
figure out how it should work and where are problems.

Overview:
---------

A few weeks ago we made a decision to use convert on read. We already made a
decision how to solve problem with overflow data after conversion. Now we need
to make decision how to deal with toast table.

Toasted data are split into chunks and these chunks are stored into toasted
table as a record with following structure:

(valueid oid, residx int32, chunk varlena)

Chunk size is defined by TOAST_MAX_CHUNK_SIZE.

Toast table could contain different datatypes and one page can contain different
type chunks.

toast_fetch_datum, toast_fetch_datum_slice and functions are low level function
which do a main job.

How to upgrade it:
------------------

Toasted values is processed on demand when some part of postgresql needs to have
detosted value (see for example pg_detoast_datum()). The toast_fetch_datum
function starts index scan where valueid is a search key. This scan invokes a
toast index and toasttable page upgrade provided by hook in ReadBuffer and
toast_fetch_datum gets already converted tuple, but chunked data will stay
untouched.

Toasted datum can be converted only when is completely connected together. It of
course invokes lot of page conversions.

The idea is to read toasted datum, convert it, store it back and old chunks mark
as deleted. Same method will be use for slice access, because we cannot access
selected slice until the toasted datum is not converted.

the implementation will add hook in toast_fetch_datum, toast_fetch_datum_slice
functions which handle conversion (similar to hook in ReadBuffer).

Issues:
-------

1) different chunk size in old and new format.

It is not issue because, old format is read and connected in the conversion
function and this function can accept different chunk size. (originally I
supposed to replace residx with offset, but I think it is not necessary now for
upgrade)

2) data type is unknown

Unfortunately, in low function is no clue what data type is really stored in a
chunks. One idea how to solve it is to add attno to chunk record structure. By
my opinion, It is best solution, but it disallow to upgrade from 8.3->8.4!

3) How to detect which toasted datum need conversion

One idea is to make magic with XMIN. Everything older then "upgraded"
transaction needs conversion. But vacuum probably can freeze some old or new
tuples and after that we lost information. Another possible solution is probably
mark tuples on converted page in info mask.

Ideas, comments?

Zdenek


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 06:16:38
Message-ID: 492500C6.2080602@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala wrote:
> 2) data type is unknown
>
> Unfortunately, in low function is no clue what data type is really
> stored in a chunks.

Do we need to know? We haven't changed the on-disk format of any data
types between 8.3 and 8.4, have we?

The other idea that was suggested earlier is to retoast all toast datums
referenced from the heap page, when the heap page is read in. We know
the data types at that point. That generates a lot more random I/O on
the first access to the unconverted heap page, but I think it would be ok.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 07:25:15
Message-ID: 492510DB.6040400@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> 2) data type is unknown
>>
>> Unfortunately, in low function is no clue what data type is really
>> stored in a chunks.
>
> Do we need to know? We haven't changed the on-disk format of any data
> types between 8.3 and 8.4, have we?

Yeah, there is no change, but we should prepare on-disk format for any potential
future change.

> The other idea that was suggested earlier is to retoast all toast datums
> referenced from the heap page, when the heap page is read in. We know
> the data types at that point. That generates a lot more random I/O on
> the first access to the unconverted heap page, but I think it would be ok.

Thanks to remind me this approach. Yeah it is solution which should work. I
don't like much this because random I/O, however this could be used for chunk
record transformation between 8.3->8.4. And after that we will be able converted
it on detoast request.

The question is if we should do toast modification now to avoid potential future
problems or if we will solve it when any on-disk format change requires it?

I prefer do it now, because there could be small risk that it will not possible
to do it in the future. When in-place upgrade will be implemented nobody will
want to perform backup/restore.

thanks Zdenek


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 09:29:04
Message-ID: 49252DE0.6070204@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala wrote:
> The question is if we should do toast modification now to avoid
> potential future problems or if we will solve it when any on-disk format
> change requires it?

Perhaps we should just add the new attid attribute to the toast table,
but mark it as nullable? We wouldn't need to fill it in in the 8.3->8.4
conversion but new tuples would include it.

In the future release that we actually need it, we'll make it
non-nullable, and write a pre-upgrade script to retoast tuples that
don't have it yet.

Hmm. That would change TOAST_MAX_CHUNK_SIZE, though.

> I prefer do it now, because there could be small risk that it will not
> possible to do it in the future. When in-place upgrade will be
> implemented nobody will want to perform backup/restore.

I feel we should avoid doing anything extra, risking that we get nothing
finished.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 10:21:07
Message-ID: 49253A13.4000501@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> The question is if we should do toast modification now to avoid
>> potential future problems or if we will solve it when any on-disk
>> format change requires it?
>
> Perhaps we should just add the new attid attribute to the toast table,
> but mark it as nullable? We wouldn't need to fill it in in the 8.3->8.4
> conversion but new tuples would include it.
>
> In the future release that we actually need it, we'll make it
> non-nullable, and write a pre-upgrade script to retoast tuples that
> don't have it yet.

Hmm, It seems to me as a good idea. It will complicated preupgrade script for
8.4->8.5 script but we will have one year to developed it :-).

What we need to do is during conversion to add nullbit array for each tuple in
toasttable. I think there is enough space on all platform to reuse gap between
tuple header and data.

> Hmm. That would change TOAST_MAX_CHUNK_SIZE, though.

Yes, it change it. :(

but I can convert easily residx to offset end (residx*TOAST_MAX_CHUNK_SIZE -+
something) during page conversion.

>> I prefer do it now, because there could be small risk that it will not
>> possible to do it in the future. When in-place upgrade will be
>> implemented nobody will want to perform backup/restore.
>
> I feel we should avoid doing anything extra, risking that we get nothing
> finished.

Agree.

Zdenek


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 10:37:06
Message-ID: 49253DD2.2080004@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala wrote:
> Heikki Linnakangas napsal(a):
>> Perhaps we should just add the new attid attribute to the toast table,
>> but mark it as nullable? We wouldn't need to fill it in in the
>> 8.3->8.4 conversion but new tuples would include it.
> >
>> In the future release that we actually need it, we'll make it
>> non-nullable, and write a pre-upgrade script to retoast tuples that
>> don't have it yet.
>
> Hmm, It seems to me as a good idea. It will complicated preupgrade
> script for 8.4->8.5 script but we will have one year to developed it :-).
>
> What we need to do is during conversion to add nullbit array for each
> tuple in toasttable. I think there is enough space on all platform to
> reuse gap between tuple header and data.

It the new attribute is added to the end, the old tuples will be
compatible as is. If there's no null bitmap (or it's shorter than
expectd), attributes missing from a tuple are implicitly NULL. That's
how we support ALTER TABLE ADD COLUMN without rewriting all the data.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 11:36:57
Message-ID: 49254BD9.1050001@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas napsal(a):

> It the new attribute is added to the end, the old tuples will be
> compatible as is. If there's no null bitmap (or it's shorter than
> expectd), attributes missing from a tuple are implicitly NULL. That's
> how we support ALTER TABLE ADD COLUMN without rewriting all the data.
>

Yeah, but better is to put int column before chunk data. It save space
(alignment). I prefer to put attno as a third column.

Zdenek


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 15:50:32
Message-ID: 2449.1227196232@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> Heikki Linnakangas napsal(a):
>> Perhaps we should just add the new attid attribute to the toast table,
>> but mark it as nullable?

> Hmm, It seems to me as a good idea.

No, it's a really horrid idea. Nullable attributes complicate the C
code, and what in the world are we buying with it anyway? Just decide
what the field should contain and put it in there.

regards, tom lane


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 18:21:23
Message-ID: 4925AAA3.80302@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane napsal(a):
> Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
>> Heikki Linnakangas napsal(a):
>>> Perhaps we should just add the new attid attribute to the toast table,
>>> but mark it as nullable?
>
>> Hmm, It seems to me as a good idea.
>
> No, it's a really horrid idea. Nullable attributes complicate the C
> code, and what in the world are we buying with it anyway? Just decide
> what the field should contain and put it in there.

The problem what we try to solve is to perform this change during upgrade from
8.3->8.4. Extra value is a problem because it requires extra space and there is
not free space. It is temporal solution(hack) for 8.3->8.4.

Another thing what we can do is to perform "fake" page conversion of heap which
will retoast a toasted value which are present on heap tuples. The toasted table
will contains two kind of tuples, but in normal situation only converted tuples
should be accessed.

Zdenek


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 18:29:13
Message-ID: 200811201829.mAKITDS27567@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala wrote:
> Tom Lane napsal(a):
> > Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> >> Heikki Linnakangas napsal(a):
> >>> Perhaps we should just add the new attid attribute to the toast table,
> >>> but mark it as nullable?
> >
> >> Hmm, It seems to me as a good idea.
> >
> > No, it's a really horrid idea. Nullable attributes complicate the C
> > code, and what in the world are we buying with it anyway? Just decide
> > what the field should contain and put it in there.
>
> The problem what we try to solve is to perform this change during upgrade from
> 8.3->8.4. Extra value is a problem because it requires extra space and there is
> not free space. It is temporal solution(hack) for 8.3->8.4.

Once we have the 'require free space' capability in a major Postgres
release, can't we use that to make space for the new TOAST field we will
need?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 18:34:52
Message-ID: 4925ADCC.6030307@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian napsal(a):
> Zdenek Kotala wrote:
>> Tom Lane napsal(a):
>>> Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
>>>> Heikki Linnakangas napsal(a):
>>>>> Perhaps we should just add the new attid attribute to the toast table,
>>>>> but mark it as nullable?
>>>> Hmm, It seems to me as a good idea.
>>> No, it's a really horrid idea. Nullable attributes complicate the C
>>> code, and what in the world are we buying with it anyway? Just decide
>>> what the field should contain and put it in there.
>> The problem what we try to solve is to perform this change during upgrade from
>> 8.3->8.4. Extra value is a problem because it requires extra space and there is
>> not free space. It is temporal solution(hack) for 8.3->8.4.
>
> Once we have the 'require free space' capability in a major Postgres
> release, can't we use that to make space for the new TOAST field we will
> need?
>

The problem is between 8.3 and 8.4. Unfortunately 8.3 does not have this
capability. And if it will be backported then space reservation on toast table
will be too expensive - you need to move one tuple which usually has BLCKSZ/4).

Zdenek


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_upgrade: How to deal with toast
Date: 2008-11-20 20:16:54
Message-ID: 7249.1227212214@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> Tom Lane napsal(a):
>> No, it's a really horrid idea. Nullable attributes complicate the C
>> code, and what in the world are we buying with it anyway? Just decide
>> what the field should contain and put it in there.

> The problem what we try to solve is to perform this change during upgrade from
> 8.3->8.4. Extra value is a problem because it requires extra space and there is
> not free space. It is temporal solution(hack) for 8.3->8.4.

Solution of what? The original concern you had was about
TOAST_MAX_CHUNK_SIZE changing from 8.3 to 8.4. If that's a problem then
it has to be solved anyway, and the solution has to involve being able
to push some chunks off-page to make room. If it's not a problem then
let's just leave the toast representation as-is till later.

regards, tom lane