[PATCH] Extending pg_class info + more flexible TOAST chunk size

Lists: pgsql-hackers
From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-10-12 13:32:46
Message-ID: 48F1FC7E.1060406@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bacause MAX_TOAST_CHUNK_SIZE is related on page layout version I need have toast
chunk size more flexible.

Attached patch add three new columns into pg_class

relblocksize - which is always BLCKSZ. I put it there for fullness, but i could
be use in future development to specify different BLCKSZ per relation.

relsegsize - currently it is always RELSEG_FILE. I performed basic adjustment in
smgr.c and md.c. Now only smgropen contains reference to RELSEG_FILE. The
problem how to do it fully dynamic is how to pass information rel_rd->relsegsize
down into smgropen. One idea is to extend relfilenode, but I'm not sure about it.

relmaxitemsize - it is set to TOAST_MAX_CHUNK_SIZE. Other relation has this
value set to zero for now. toast functions are fully aware about this setting
and use it. This column will be convert to int2vector during pg_upgrade
development (I need to track information for each page version).

Additional work could be add CRATE TABLE ... WITH chunksize=xxxx

Patch requires increase control file version and catalog version.

thanks for your comments
Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql

Attachment Content-Type Size
pg_class.patch.gz application/x-gzip 6.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-10-13 01:00:02
Message-ID: 18603.1223859602@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> Bacause MAX_TOAST_CHUNK_SIZE is related on page layout version I need have toast
> chunk size more flexible.

> Attached patch add three new columns into pg_class

> relblocksize - which is always BLCKSZ. I put it there for fullness, but i could
> be use in future development to specify different BLCKSZ per relation.

> relsegsize - currently it is always RELSEG_FILE. I performed basic adjustment in
> smgr.c and md.c. Now only smgropen contains reference to RELSEG_FILE. The
> problem how to do it fully dynamic is how to pass information rel_rd->relsegsize
> down into smgropen. One idea is to extend relfilenode, but I'm not sure about it.

> relmaxitemsize - it is set to TOAST_MAX_CHUNK_SIZE. Other relation has this
> value set to zero for now. toast functions are fully aware about this setting
> and use it. This column will be convert to int2vector during pg_upgrade
> development (I need to track information for each page version).

There is not one of these things that we have any intention of being
allowed to vary on a per-relation basis. Why don't you read them out of
pg_control?

regards, tom lane


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-10-13 01:36:59
Message-ID: 20081013100331.8793.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> wrote:

> Bacause MAX_TOAST_CHUNK_SIZE is related on page layout version I need have toast
> chunk size more flexible.

I agree that flexible toast threshold is useful, but I have
several questions in your implementations.

> relblocksize - which is always BLCKSZ. I put it there for fullness, but i could
> be use in future development to specify different BLCKSZ per relation.

We need many infrastructural changes for making it work:

1. Change BLCKSZ to be a variable from a constant value.
It allows us to have "initdb --blocksize=16kB".
2. Make bufmgr handle different sizes of pages in a server instance.
I think it is a difficult task.
3. Have a SQL interface for 2.

You suggested only 3, but 1 and 2 are more important.

> relsegsize - currently it is always RELSEG_FILE. I performed basic adjustment in
> smgr.c and md.c. Now only smgropen contains reference to RELSEG_FILE.

I'm not sure why relsegsize should be adjusted per-relation basis.
If there are no limitations in filesystem and in backup
utilities, large relsegsize is always better, no?
Is it enough to have "initdb --segsize=4GB" ?

> relmaxitemsize - it is set to TOAST_MAX_CHUNK_SIZE.

You added new columns in pg_class, but we have another choice to use
pg_class.reloptions. What is the reason you choose new-columns-way?

Another question is that we should have per-column toast control
options not only per-relation ones. Do we also need to modify
pg_attribute in the same way?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-10-13 07:04:37
Message-ID: 48F2F305.1040507@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane napsal(a):
> Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
>> Bacause MAX_TOAST_CHUNK_SIZE is related on page layout version I need have toast
>> chunk size more flexible.
>
>> Attached patch add three new columns into pg_class
>
>> relblocksize - which is always BLCKSZ. I put it there for fullness, but i could
>> be use in future development to specify different BLCKSZ per relation.
>
>> relsegsize - currently it is always RELSEG_FILE. I performed basic adjustment in
>> smgr.c and md.c. Now only smgropen contains reference to RELSEG_FILE. The
>> problem how to do it fully dynamic is how to pass information rel_rd->relsegsize
>> down into smgropen. One idea is to extend relfilenode, but I'm not sure about it.
>
>> relmaxitemsize - it is set to TOAST_MAX_CHUNK_SIZE. Other relation has this
>> value set to zero for now. toast functions are fully aware about this setting
>> and use it. This column will be convert to int2vector during pg_upgrade
>> development (I need to track information for each page version).
>
> There is not one of these things that we have any intention of being
> allowed to vary on a per-relation basis. Why don't you read them out of
> pg_control?

The problem what I need to solve is how to handle different TOAST chunk size
which becomes with upgrade. if you insert new record into TOAST table it will be
created on the new page which has different max chunk size, because it depends
on page header size. It means that one TOAST table will have more chunk size.
Use old value from previous version is possible but it could invoke waste of
space in situation when pageheader in a new version is bigger.

Another solution is to take first chunk size and say - it is max chunk size.

Relsegsize is related to tablespace but when you upgrade you could want to use
new size for new tables. But it is not important for pg_upgrade project.

Blocksize is more nice to have int this moment, but It makes me sense to have
different block size for toast table and heap. I know that this idea requires
lot of changes including buffer cache and so on.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-10-13 07:25:18
Message-ID: 48F2F7DE.8010604@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala wrote:
> The problem what I need to solve is how to handle different TOAST chunk
> size which becomes with upgrade. if you insert new record into TOAST
> table it will be created on the new page which has different max chunk
> size, because it depends on page header size. It means that one TOAST
> table will have more chunk size.
> Use old value from previous version is possible but it could invoke
> waste of space in situation when pageheader in a new version is bigger.
>
> Another solution is to take first chunk size and say - it is max chunk
> size.

Not all chunks need to be the same size. We do currently require that,
but AFAICS it's only because that allows random access to a given offset
within a datum. That's of course nice, but I think we could live without
it. Or try random access with the new toast size first, and if the
chunks turn out to be different size, fall back to reading all chunks
sequentially. And if we have to retoast all values before they're
accessed, per the other thread, then we can just assume that all toast
chunks that we need to random access are of the new size.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-10-13 07:38:19
Message-ID: 48F2FAEB.5020404@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

ITAGAKI Takahiro napsal(a):
> Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> wrote:
>
>> Bacause MAX_TOAST_CHUNK_SIZE is related on page layout version I need have toast
>> chunk size more flexible.
>
> I agree that flexible toast threshold is useful, but I have
> several questions in your implementations.
>
>> relblocksize - which is always BLCKSZ. I put it there for fullness, but i could
>> be use in future development to specify different BLCKSZ per relation.
>
> We need many infrastructural changes for making it work:
>
> 1. Change BLCKSZ to be a variable from a constant value.
> It allows us to have "initdb --blocksize=16kB".
> 2. Make bufmgr handle different sizes of pages in a server instance.
> I think it is a difficult task.
> 3. Have a SQL interface for 2.
>
> You suggested only 3, but 1 and 2 are more important.

I know about problem complexity. If you find on how many places BLCKSZ is used
you get basic picture.

>> relsegsize - currently it is always RELSEG_FILE. I performed basic adjustment in
>> smgr.c and md.c. Now only smgropen contains reference to RELSEG_FILE.
>
> I'm not sure why relsegsize should be adjusted per-relation basis.
> If there are no limitations in filesystem and in backup
> utilities, large relsegsize is always better, no?
> Is it enough to have "initdb --segsize=4GB" ?

It is related to tablespace. Usually one table space is stored on different
volume which can have different FS. The question is how to handle relation
migration from one tablespace to another. In most cases it requires physical
copy which could change seg size anyway. Maybe extend pg_tablespace makes more
sense.

>> relmaxitemsize - it is set to TOAST_MAX_CHUNK_SIZE.
>
> You added new columns in pg_class, but we have another choice to use
> pg_class.reloptions. What is the reason you choose new-columns-way?

Good point I will look on it.

> Another question is that we should have per-column toast control
> options not only per-relation ones. Do we also need to modify
> pg_attribute in the same way?

Currently we have one TOAST table per heap table (if it contains varlena
attribute). Data in the TOAST table are stored without any idea what data type
it is.

Thanks Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-10-13 08:45:14
Message-ID: 48F30A9A.5050804@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas napsal(a):
> Zdenek Kotala wrote:
>> The problem what I need to solve is how to handle different TOAST
>> chunk size which becomes with upgrade. if you insert new record into
>> TOAST table it will be created on the new page which has different max
>> chunk size, because it depends on page header size. It means that one
>> TOAST table will have more chunk size.
>> Use old value from previous version is possible but it could invoke
>> waste of space in situation when pageheader in a new version is bigger.
>>
>> Another solution is to take first chunk size and say - it is max chunk
>> size.
>
> Not all chunks need to be the same size. We do currently require that,
> but AFAICS it's only because that allows random access to a given offset
> within a datum. That's of course nice, but I think we could live without
> it.

Good point. I think it is good to keep this feature.

> Or try random access with the new toast size first, and if the
> chunks turn out to be different size, fall back to reading all chunks
> sequentially.

I think it is not necessary to read it sequentially, only you need to recompute
new position.

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-10-13 09:15:17
Message-ID: 48F311A5.9060809@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala wrote:
> Heikki Linnakangas napsal(a):
>> Not all chunks need to be the same size. We do currently require that,
>> but AFAICS it's only because that allows random access to a given
>> offset within a datum. That's of course nice, but I think we could
>> live without it.
>
> Good point. I think it is good to keep this feature.

Yeah. At the moment, it's only used for substring(), I think.

>> Or try random access with the new toast size first, and if the chunks
>> turn out to be different size, fall back to reading all chunks
>> sequentially.
>
> I think it is not necessary to read it sequentially, only you need to
> recompute new position.

Yeah, true.

It occurs to me that instead of storing a chunk id, we could store a
byte offset of the chunk. That would allow random access even if every
chunk was of different size. You probably don't want any new changes you
need to deal with in the upgrade, though :-).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-10-13 13:39:58
Message-ID: 48F34FAE.9020408@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas napsal(a):

>
> It occurs to me that instead of storing a chunk id, we could store a
> byte offset of the chunk. That would allow random access even if every
> chunk was of different size. You probably don't want any new changes you
> need to deal with in the upgrade, though :-).

Yes, no change please :-)

Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Zdenek Kotala" <Zdenek(dot)Kotala(at)sun(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-11-04 02:47:38
Message-ID: 603c8f070811031847u1fab34d0v62655bd2664cf94e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek,

It seems like there is general agreement that this patch needs some
changes before being considered for application. Is that correct?

http://archives.postgresql.org/pgsql-hackers/2008-11/msg00049.php

Are you planning to send a new version for this CommitFest, or is this
8.5 material at this point?

Thanks,

...Robert


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Extending pg_class info + more flexible TOAST chunk size
Date: 2008-11-04 13:32:44
Message-ID: 49104EFC.3080901@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas napsal(a):
> Zdenek,
>
> It seems like there is general agreement that this patch needs some
> changes before being considered for application. Is that correct?
>
> http://archives.postgresql.org/pgsql-hackers/2008-11/msg00049.php
>
> Are you planning to send a new version for this CommitFest, or is this
> 8.5 material at this point?

Yes, I plan to do it. 8.5 is too late for this change.

Thanks Zdenek

--
Zdenek Kotala Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql