Vertical Partitioning with TOAST

Lists: pgsql-hackers
From: Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Vertical Partitioning with TOAST
Date: 2005-12-01 08:59:08
Message-ID: 438EBB5C.9070304@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

I wrote a experimental patch for a vertical partitioning
function.

I decided to use the code of TOAST to create the function
easily. In a word, the row that the user specified is forcedly
driven out with TOAST.

The performance gain of 10% was seen by driving out c_data of the
customer table in the DBT-2 benchmark in our environment.

The mechanism of TOAST is an overdesigned system to use it for a
vertical partitioning. Because the overhead of processing is large,
the performance might down according to the environment.

There are seriously a lot of things that should be considered if
a vertical partitioning is mounted.
For instance, TOAST index is omitted, and ctid is used for link.

Your comments are welcome. Thanks.

---
How To Use
---
Use "ALTER TABLE" command.
http://www.postgresql.org/docs/8.1/static/sql-altertable.html

ALTER TABLE name ALTER COLUMN column SET STRAGE FORCEEXTERNAL;

I do not understand whether "FORCEEXTERNAL" is an appropriate
word. Please teach when there is a better word...

--
Junji Teramoto

Attachment Content-Type Size
patch_forceexternal.diff text/plain 4.6 KB

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vertical Partitioning with TOAST
Date: 2005-12-08 05:02:27
Message-ID: 20051208050227.GH16053@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This seems like a useful feature to add, allowing for easy built-in
verticle partitioning. Are there issues with the patch as-is? (Other
than it probably should have gone to -patches...)

On Thu, Dec 01, 2005 at 05:59:08PM +0900, Junji TERAMOTO wrote:
> Hi all,
>
> I wrote a experimental patch for a vertical partitioning
> function.
>
> I decided to use the code of TOAST to create the function
> easily. In a word, the row that the user specified is forcedly
> driven out with TOAST.
>
> The performance gain of 10% was seen by driving out c_data of the
> customer table in the DBT-2 benchmark in our environment.
>
> The mechanism of TOAST is an overdesigned system to use it for a
> vertical partitioning. Because the overhead of processing is large,
> the performance might down according to the environment.
>
> There are seriously a lot of things that should be considered if
> a vertical partitioning is mounted.
> For instance, TOAST index is omitted, and ctid is used for link.
>
> Your comments are welcome. Thanks.
>
> ---
> How To Use
> ---
> Use "ALTER TABLE" command.
> http://www.postgresql.org/docs/8.1/static/sql-altertable.html
>
> ALTER TABLE name ALTER COLUMN column SET STRAGE FORCEEXTERNAL;
>
> I do not understand whether "FORCEEXTERNAL" is an appropriate
> word. Please teach when there is a better word...
>
>
> --
> Junji Teramoto

> diff -purN postgresql-8.1.0.org/src/backend/access/heap/heapam.c postgresql-8.1.0/src/backend/access/heap/heapam.c
> --- postgresql-8.1.0.org/src/backend/access/heap/heapam.c 2005-10-15 11:49:08.000000000 +0900
> +++ postgresql-8.1.0/src/backend/access/heap/heapam.c 2005-12-01 15:31:38.307713257 +0900
> @@ -1070,6 +1070,36 @@ heap_get_latest_tid(Relation relation,
> } /* end of loop */
> }
>
> +// Add by junji from here
> +/*
> + * has_rel_forceexternal - Is there "SET STORAGE FORCEEXTERNAL"ed rows?
> + */
> +bool
> +has_rel_forceexternal(Relation relation)
> +{
> + TupleDesc tupleDesc;
> + Form_pg_attribute *att;
> + int numAttrs;
> + int i;
> +
> + /*
> + * Get the tuple descriptor and break down the tuple(s) into fields.
> + */
> + tupleDesc = relation->rd_att;
> + att = tupleDesc->attrs;
> + numAttrs = tupleDesc->natts;
> +
> + for (i = 0; i < numAttrs; i++)
> + {
> + if (att[i]->attstorage == 'f')
> + return true;
> + }
> +
> + return false;
> +}
> +// Add by junji to here
> +
> +
> /*
> * heap_insert - insert tuple into a heap
> *
> @@ -1130,6 +1160,9 @@ heap_insert(Relation relation, HeapTuple
> * out-of-line attributes from some other relation, invoke the toaster.
> */
> if (HeapTupleHasExternal(tup) ||
> +// Add by junji from here
> + (has_rel_forceexternal(relation)) ||
> +// Add by junji to here
> (MAXALIGN(tup->t_len) > TOAST_TUPLE_THRESHOLD))
> heap_tuple_toast_attrs(relation, tup, NULL);
>
> @@ -1762,6 +1795,9 @@ l2:
> */
> need_toast = (HeapTupleHasExternal(&oldtup) ||
> HeapTupleHasExternal(newtup) ||
> +// Add by junji from here
> + (has_rel_forceexternal(relation)) ||
> +// Add by junji to here
> (MAXALIGN(newtup->t_len) > TOAST_TUPLE_THRESHOLD));
>
> newtupsize = MAXALIGN(newtup->t_len);
> diff -purN postgresql-8.1.0.org/src/backend/access/heap/tuptoaster.c postgresql-8.1.0/src/backend/access/heap/tuptoaster.c
> --- postgresql-8.1.0.org/src/backend/access/heap/tuptoaster.c 2005-10-15 11:49:09.000000000 +0900
> +++ postgresql-8.1.0/src/backend/access/heap/tuptoaster.c 2005-12-01 15:29:29.722579466 +0900
> @@ -512,6 +512,46 @@ toast_insert_or_update(Relation rel, Hea
> }
> }
>
> +// Add by junji from here
> + /*
> + * We look for attributes of attstorage 'f'.
> + */
> + if (rel->rd_rel->reltoastrelid != InvalidOid)
> + {
> + Datum old_value;
> +
> + /*------
> + * Search for the biggest yet inlined attribute with
> + * attstorage equals 'x' or 'e'
> + *------
> + */
> + for (i = 0; i < numAttrs; i++)
> + {
> + if (toast_action[i] == 'p')
> + continue;
> + if (VARATT_IS_EXTERNAL(toast_values[i]))
> + continue;
> + if (att[i]->attstorage != 'f')
> + continue;
> +
> + /*
> + * Store this external
> + */
> + old_value = toast_values[i];
> + toast_action[i] = 'p';
> + toast_values[i] = toast_save_datum(rel, toast_values[i]);
> + if (toast_free[i])
> + pfree(DatumGetPointer(old_value));
> +
> + toast_free[i] = true;
> + toast_sizes[i] = VARATT_SIZE(toast_values[i]);
> +
> + need_change = true;
> + need_free = true;
> + }
> + }
> +// Add by junji to here
> +
> /* ----------
> * Compress and/or save external until data fits into target length
> *
> diff -purN postgresql-8.1.0.org/src/backend/commands/tablecmds.c postgresql-8.1.0/src/backend/commands/tablecmds.c
> --- postgresql-8.1.0.org/src/backend/commands/tablecmds.c 2005-10-15 11:49:15.000000000 +0900
> +++ postgresql-8.1.0/src/backend/commands/tablecmds.c 2005-12-01 15:29:29.726577573 +0900
> @@ -3439,6 +3439,10 @@ ATExecSetStorage(Relation rel, const cha
> newstorage = 'x';
> else if (pg_strcasecmp(storagemode, "main") == 0)
> newstorage = 'm';
> +// Add by junji from here
> + else if (pg_strcasecmp(storagemode, "forceexternal") == 0)
> + newstorage = 'f';
> +// Add by junji to here
> else
> {
> ereport(ERROR,
> @@ -6045,8 +6049,14 @@ needs_toast_table(Relation rel)
> tupdesc = rel->rd_att;
> att = tupdesc->attrs;
>
> +
> +
> for (i = 0; i < tupdesc->natts; i++)
> {
> +// Add by junji from here
> + if (att[i]->attstorage == 'f')
> + return true;
> +// Add by junji to here
> if (att[i]->attisdropped)
> continue;
> data_length = att_align(data_length, att[i]->attalign);
> diff -purN postgresql-8.1.0.org/src/include/access/heapam.h postgresql-8.1.0/src/include/access/heapam.h
> --- postgresql-8.1.0.org/src/include/access/heapam.h 2005-10-15 11:49:42.000000000 +0900
> +++ postgresql-8.1.0/src/include/access/heapam.h 2005-12-01 15:29:29.726577573 +0900
> @@ -155,6 +155,10 @@ extern void heap_get_latest_tid(Relation
> ItemPointer tid);
> extern void setLastTid(const ItemPointer tid);
>
> +// Add by junji from here
> +extern bool has_rel_forceexternal(Relation relation);
> +// Add by junji to here
> +
> extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
> bool use_wal, bool use_fsm);
> extern HTSU_Result heap_delete(Relation relation, ItemPointer tid,

>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2005-12-08 05:59:47
Message-ID: 7027.1134021587@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> This seems like a useful feature to add, allowing for easy built-in
> verticle partitioning. Are there issues with the patch as-is?

Other than the ones mentioned by the poster?

It seemed to me more like a not-too-successful experiment than something
ready for application. If you take the viewpoint that this is just
another TOAST storage strategy, I think it's pretty useless. A large
field value is going to get toasted anyway with the regular strategy,
and if your column happens to contain some values that are not large,
forcing them out-of-line anyway is simply silly. (You could make a case
for making the threshold size user-controllable, but I don't see the
case for setting the threshold to zero, which is what this amounts to.)

The poster was not actually suggesting applying it in the form of a
force-external TOAST strategy; he was using this as a prototype to try
to interest people in the idea of out-of-line storage mechanisms with
lower overhead than TOAST. But that part is all speculation not code.

Personally, I'd rather look into whether we couldn't speed up TOAST
without changing any of its basic assumptions. The current
implementation isn't awful, but it was built to allow the existing table
and index mechanisms to be re-used for TOAST data. Now that we know for
certain TOAST is a good idea, it would be reasonable to take a second
look at whether we could improve the performance with another round of
implementation effort.

regards, tom lane


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2005-12-08 07:34:23
Message-ID: 20051208073423.GP16053@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 08, 2005 at 12:59:47AM -0500, Tom Lane wrote:
> "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > This seems like a useful feature to add, allowing for easy built-in
> > verticle partitioning. Are there issues with the patch as-is?
>
> Other than the ones mentioned by the poster?
>
> It seemed to me more like a not-too-successful experiment than something
> ready for application. If you take the viewpoint that this is just
> another TOAST storage strategy, I think it's pretty useless. A large
> field value is going to get toasted anyway with the regular strategy,
> and if your column happens to contain some values that are not large,
> forcing them out-of-line anyway is simply silly. (You could make a case
> for making the threshold size user-controllable, but I don't see the
> case for setting the threshold to zero, which is what this amounts to.)

Valid point. I do think there's a lot of benefit to being able to set
the limit much lower than what it currently defaults to today. We have a
client that has a queue-type table that is updated very frequently. One
of the fields is text, that is not updated as frequently. Keeping this
table vacuumed well enough has proven to be problematic, because any
delay to vacuuming quickly results in a very large amount of bloat.
Moving that text field into a seperate table would most likely be a win.

Presumably this would need to be settable on at least a per-table basis.

Would adding such a variable be a good beginner TODO, or is it too
invasive?

> Personally, I'd rather look into whether we couldn't speed up TOAST
> without changing any of its basic assumptions. The current
> implementation isn't awful, but it was built to allow the existing table
> and index mechanisms to be re-used for TOAST data. Now that we know for
> certain TOAST is a good idea, it would be reasonable to take a second
> look at whether we could improve the performance with another round of
> implementation effort.

I've often wondered about all the overhead of storing toast data in what
amounts to a regular table. Sounds like another TODO...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2005-12-08 08:51:23
Message-ID: 4397F40B.6000209@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello all,

Thank you for having the interest.

Jim C. Nasby wrote:
> Valid point. I do think there's a lot of benefit to being able to set
> the limit much lower than what it currently defaults to today. We have a
> client that has a queue-type table that is updated very frequently. One
> of the fields is text, that is not updated as frequently. Keeping this
> table vacuumed well enough has proven to be problematic, because any
> delay to vacuuming quickly results in a very large amount of bloat.
> Moving that text field into a seperate table would most likely be a win.

Yes, our team think that this patch is effective that the tuple can be
partially updated.
For instance, DBT-2 updates frequently contents excluding c_data in the
customer table. Because c_data(about 400bytes: The size of the entire
tuple is 500bytes.) is copied together in every case, it is thought that
it has decreased the performance.
That is more important than the vertical partitioning function.

Of course, it is important to change DDL of the table. However, I think
it might be useful when it is not possible to change.

As pointed out by Tom, this is a patch to verify the idea.
I want to know that community is how much interested in a partial update.
Of course, it is interested whether to want the vertical partitioning
function in PostgreSQL, too. :-)

By the way, should I send the patch to -patches again?

--
Junji Teramoto / teramoto.junji (a) lab.ntt.co.jp


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2005-12-08 15:03:43
Message-ID: 200512081503.jB8F3hc09241@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim C. Nasby wrote:
> On Thu, Dec 08, 2005 at 12:59:47AM -0500, Tom Lane wrote:
> > "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > > This seems like a useful feature to add, allowing for easy built-in
> > > verticle partitioning. Are there issues with the patch as-is?
> >
> > Other than the ones mentioned by the poster?
> >
> > It seemed to me more like a not-too-successful experiment than something
> > ready for application. If you take the viewpoint that this is just
> > another TOAST storage strategy, I think it's pretty useless. A large
> > field value is going to get toasted anyway with the regular strategy,
> > and if your column happens to contain some values that are not large,
> > forcing them out-of-line anyway is simply silly. (You could make a case
> > for making the threshold size user-controllable, but I don't see the
> > case for setting the threshold to zero, which is what this amounts to.)
>
> Valid point. I do think there's a lot of benefit to being able to set
> the limit much lower than what it currently defaults to today. We have a
> client that has a queue-type table that is updated very frequently. One
> of the fields is text, that is not updated as frequently. Keeping this
> table vacuumed well enough has proven to be problematic, because any
> delay to vacuuming quickly results in a very large amount of bloat.
> Moving that text field into a seperate table would most likely be a win.
>
> Presumably this would need to be settable on at least a per-table basis.
>
> Would adding such a variable be a good beginner TODO, or is it too
> invasive?

Well, we have now:

ALTER TABLE ALTER [ COLUMN ] column
SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

What else is needed?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2005-12-08 18:42:57
Message-ID: 20051208184256.GA58449@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 08, 2005 at 10:03:43AM -0500, Bruce Momjian wrote:
> Jim C. Nasby wrote:
> > On Thu, Dec 08, 2005 at 12:59:47AM -0500, Tom Lane wrote:
> > > "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > > > This seems like a useful feature to add, allowing for easy built-in
> > > > verticle partitioning. Are there issues with the patch as-is?
> > >
> > > Other than the ones mentioned by the poster?
> > >
> > > It seemed to me more like a not-too-successful experiment than something
> > > ready for application. If you take the viewpoint that this is just
> > > another TOAST storage strategy, I think it's pretty useless. A large
> > > field value is going to get toasted anyway with the regular strategy,
> > > and if your column happens to contain some values that are not large,
> > > forcing them out-of-line anyway is simply silly. (You could make a case
> > > for making the threshold size user-controllable, but I don't see the
> > > case for setting the threshold to zero, which is what this amounts to.)
> >
> > Valid point. I do think there's a lot of benefit to being able to set
> > the limit much lower than what it currently defaults to today. We have a
> > client that has a queue-type table that is updated very frequently. One
> > of the fields is text, that is not updated as frequently. Keeping this
> > table vacuumed well enough has proven to be problematic, because any
> > delay to vacuuming quickly results in a very large amount of bloat.
> > Moving that text field into a seperate table would most likely be a win.
> >
> > Presumably this would need to be settable on at least a per-table basis.
> >
> > Would adding such a variable be a good beginner TODO, or is it too
> > invasive?
>
> Well, we have now:
>
> ALTER TABLE ALTER [ COLUMN ] column
> SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
>
> What else is needed?

As Tom suggested, I think it would be best to be able to change the size
at which a field gets stored externally. I think it also makes sense to
have this reverse the normal order of compress first, then if it still
doesn't fit store it externally. I forsee this typically being useful
when you have fields that are between ~100 and 1000 bytes in size, and
I'm doubtful that compression would do much good there. But I wouldn't
rule out this being useful on fields that can also sometimes contain
much larger amounts of data, so I don't think it makes sense to disable
compression completely. So, I think this leaves two new options:

SET STORAGE EXTERNAL [THRESHOLD x]
If a field is over x in size, it's stored externally.

SET STORAGE EXTENDED [THRESHOLD x]
If a field is over x in size, it's stored externally. If it's over
BLCKSZ/4 it will also be compressed (I think that's how things work
now).

Actually, that's rather ugly. I think it would be better to just break
external storage and compression out into their own attributes:

SET STORAGE [ ALLOW EXTERNAL [THRESHOLD x] ] [ ALLOW COMPRESSION [THRESHOLD x] ]

ALLOW EXTERNAL: if a field is larger than x bytes (BLCKSZ/4 by default)
then it will be stored externally. May be specified along with ALLOW
COMPRESSION.

ALLOW COMPRESSION: if a field is larger than x bytes (BLCKSZ/4 by default)
then it will be COMPRESSED. May be specified along with ALLOW EXTERNAL.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vertical Partitioning with TOAST
Date: 2005-12-08 19:04:22
Message-ID: 439883B6.6000502@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/8/2005 1:42 PM, Jim C. Nasby wrote:

> On Thu, Dec 08, 2005 at 10:03:43AM -0500, Bruce Momjian wrote:
>> Jim C. Nasby wrote:
>> > On Thu, Dec 08, 2005 at 12:59:47AM -0500, Tom Lane wrote:
>> > > "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
>> > > > This seems like a useful feature to add, allowing for easy built-in
>> > > > verticle partitioning. Are there issues with the patch as-is?
>> > >
>> > > Other than the ones mentioned by the poster?
>> > >
>> > > It seemed to me more like a not-too-successful experiment than something
>> > > ready for application. If you take the viewpoint that this is just
>> > > another TOAST storage strategy, I think it's pretty useless. A large
>> > > field value is going to get toasted anyway with the regular strategy,
>> > > and if your column happens to contain some values that are not large,
>> > > forcing them out-of-line anyway is simply silly. (You could make a case
>> > > for making the threshold size user-controllable, but I don't see the
>> > > case for setting the threshold to zero, which is what this amounts to.)
>> >
>> > Valid point. I do think there's a lot of benefit to being able to set
>> > the limit much lower than what it currently defaults to today. We have a
>> > client that has a queue-type table that is updated very frequently. One
>> > of the fields is text, that is not updated as frequently. Keeping this
>> > table vacuumed well enough has proven to be problematic, because any
>> > delay to vacuuming quickly results in a very large amount of bloat.
>> > Moving that text field into a seperate table would most likely be a win.
>> >
>> > Presumably this would need to be settable on at least a per-table basis.
>> >
>> > Would adding such a variable be a good beginner TODO, or is it too
>> > invasive?
>>
>> Well, we have now:
>>
>> ALTER TABLE ALTER [ COLUMN ] column
>> SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
>>
>> What else is needed?
>
> As Tom suggested, I think it would be best to be able to change the size
> at which a field gets stored externally. I think it also makes sense to
> have this reverse the normal order of compress first, then if it still
> doesn't fit store it externally. I forsee this typically being useful
> when you have fields that are between ~100 and 1000 bytes in size, and
> I'm doubtful that compression would do much good there. But I wouldn't
> rule out this being useful on fields that can also sometimes contain
> much larger amounts of data, so I don't think it makes sense to disable
> compression completely. So, I think this leaves two new options:

It's not the size of a field that triggers toasting. It is the size of
the entire tuple. As long as that is > BLKSIZE/4, the toaster will pick
the currently largest inline value and do "something" with it.
"something" is either compressing or (if not allowed or already done)
moving external.

Jan

>
> SET STORAGE EXTERNAL [THRESHOLD x]
> If a field is over x in size, it's stored externally.
>
> SET STORAGE EXTENDED [THRESHOLD x]
> If a field is over x in size, it's stored externally. If it's over
> BLCKSZ/4 it will also be compressed (I think that's how things work
> now).
>
> Actually, that's rather ugly. I think it would be better to just break
> external storage and compression out into their own attributes:
>
> SET STORAGE [ ALLOW EXTERNAL [THRESHOLD x] ] [ ALLOW COMPRESSION [THRESHOLD x] ]
>
> ALLOW EXTERNAL: if a field is larger than x bytes (BLCKSZ/4 by default)
> then it will be stored externally. May be specified along with ALLOW
> COMPRESSION.
>
> ALLOW COMPRESSION: if a field is larger than x bytes (BLCKSZ/4 by default)
> then it will be COMPRESSED. May be specified along with ALLOW EXTERNAL.
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2006-03-03 03:15:19
Message-ID: 200603030315.k233FJA04699@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Is there still interst in this idea for TODO?

---------------------------------------------------------------------------

Jim C. Nasby wrote:
> On Thu, Dec 08, 2005 at 10:03:43AM -0500, Bruce Momjian wrote:
> > Jim C. Nasby wrote:
> > > On Thu, Dec 08, 2005 at 12:59:47AM -0500, Tom Lane wrote:
> > > > "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > > > > This seems like a useful feature to add, allowing for easy built-in
> > > > > verticle partitioning. Are there issues with the patch as-is?
> > > >
> > > > Other than the ones mentioned by the poster?
> > > >
> > > > It seemed to me more like a not-too-successful experiment than something
> > > > ready for application. If you take the viewpoint that this is just
> > > > another TOAST storage strategy, I think it's pretty useless. A large
> > > > field value is going to get toasted anyway with the regular strategy,
> > > > and if your column happens to contain some values that are not large,
> > > > forcing them out-of-line anyway is simply silly. (You could make a case
> > > > for making the threshold size user-controllable, but I don't see the
> > > > case for setting the threshold to zero, which is what this amounts to.)
> > >
> > > Valid point. I do think there's a lot of benefit to being able to set
> > > the limit much lower than what it currently defaults to today. We have a
> > > client that has a queue-type table that is updated very frequently. One
> > > of the fields is text, that is not updated as frequently. Keeping this
> > > table vacuumed well enough has proven to be problematic, because any
> > > delay to vacuuming quickly results in a very large amount of bloat.
> > > Moving that text field into a seperate table would most likely be a win.
> > >
> > > Presumably this would need to be settable on at least a per-table basis.
> > >
> > > Would adding such a variable be a good beginner TODO, or is it too
> > > invasive?
> >
> > Well, we have now:
> >
> > ALTER TABLE ALTER [ COLUMN ] column
> > SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }
> >
> > What else is needed?
>
> As Tom suggested, I think it would be best to be able to change the size
> at which a field gets stored externally. I think it also makes sense to
> have this reverse the normal order of compress first, then if it still
> doesn't fit store it externally. I forsee this typically being useful
> when you have fields that are between ~100 and 1000 bytes in size, and
> I'm doubtful that compression would do much good there. But I wouldn't
> rule out this being useful on fields that can also sometimes contain
> much larger amounts of data, so I don't think it makes sense to disable
> compression completely. So, I think this leaves two new options:
>
> SET STORAGE EXTERNAL [THRESHOLD x]
> If a field is over x in size, it's stored externally.
>
> SET STORAGE EXTENDED [THRESHOLD x]
> If a field is over x in size, it's stored externally. If it's over
> BLCKSZ/4 it will also be compressed (I think that's how things work
> now).
>
> Actually, that's rather ugly. I think it would be better to just break
> external storage and compression out into their own attributes:
>
> SET STORAGE [ ALLOW EXTERNAL [THRESHOLD x] ] [ ALLOW COMPRESSION [THRESHOLD x] ]
>
> ALLOW EXTERNAL: if a field is larger than x bytes (BLCKSZ/4 by default)
> then it will be stored externally. May be specified along with ALLOW
> COMPRESSION.
>
> ALLOW COMPRESSION: if a field is larger than x bytes (BLCKSZ/4 by default)
> then it will be COMPRESSED. May be specified along with ALLOW EXTERNAL.
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

--
Bruce Momjian http://candle.pha.pa.us
SRA OSS, Inc. http://www.sraoss.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2006-03-03 22:16:39
Message-ID: 20060303221639.GK82012@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

If this would be accepted I might actually be able to accomplish this.
Maybe. :) But having a TODO wouldn't be a bad idea as well...

Would this require 2 new fields in pg_attribute, or is there a better
way to store the thresholds? I'm thinking that each field would need two
special values; 0 for 'no external/compression' and -1 for 'use default'
(hrm, I guess this means we should add at least one GUC to control that
default...)

I suspect there's folks on -general who would express interest if you
want me to ask there...

On Thu, Mar 02, 2006 at 10:15:19PM -0500, Bruce Momjian wrote:
>
> Is there still interst in this idea for TODO?
>
> ---------------------------------------------------------------------------
> > As Tom suggested, I think it would be best to be able to change the size
> > at which a field gets stored externally. I think it also makes sense to
> > have this reverse the normal order of compress first, then if it still
> > doesn't fit store it externally. I forsee this typically being useful
> > when you have fields that are between ~100 and 1000 bytes in size, and
> > I'm doubtful that compression would do much good there. But I wouldn't
> > rule out this being useful on fields that can also sometimes contain
> > much larger amounts of data, so I don't think it makes sense to disable
> > compression completely. So, I think this leaves two new options:
> >
> > SET STORAGE EXTERNAL [THRESHOLD x]
> > If a field is over x in size, it's stored externally.
> >
> > SET STORAGE EXTENDED [THRESHOLD x]
> > If a field is over x in size, it's stored externally. If it's over
> > BLCKSZ/4 it will also be compressed (I think that's how things work
> > now).
> >
> > Actually, that's rather ugly. I think it would be better to just break
> > external storage and compression out into their own attributes:
> >
> > SET STORAGE [ ALLOW EXTERNAL [THRESHOLD x] ] [ ALLOW COMPRESSION [THRESHOLD x] ]
> >
> > ALLOW EXTERNAL: if a field is larger than x bytes (BLCKSZ/4 by default)
> > then it will be stored externally. May be specified along with ALLOW
> > COMPRESSION.
> >
> > ALLOW COMPRESSION: if a field is larger than x bytes (BLCKSZ/4 by default)
> > then it will be COMPRESSED. May be specified along with ALLOW EXTERNAL.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2006-03-04 10:15:46
Message-ID: 1141467346.3772.2.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ühel kenal päeval, N, 2006-03-02 kell 22:15, kirjutas Bruce Momjian:
> Is there still interst in this idea for TODO?

Just to voice my support - Yes, I think that being able to set lower
thresolds for TOAST is very useful in several cases.

Also getting rid of toast index and start using ctids directly would be
a big bonus.

When using direct ctids we could use either ctid chains or some sort of
skiplist for access to N-th TOAST chunk.

------------
Hannu


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Hannu Krosing <hannu(at)skype(dot)net>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2006-03-04 10:21:19
Message-ID: 20060304102119.GB29180@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 04, 2006 at 12:15:46PM +0200, Hannu Krosing wrote:
> Also getting rid of toast index and start using ctids directly would be
> a big bonus.
>
> When using direct ctids we could use either ctid chains or some sort of
> skiplist for access to N-th TOAST chunk.

I suppose this would mean that you couldn't use vacuum on the toast
table anymore. Or teach vacuum that everytime it moves a tuple it needs
to update the original table (sequential scan). What exactly are you
trying to save here?

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2006-03-04 15:31:37
Message-ID: 14597.1141486297@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Sat, Mar 04, 2006 at 12:15:46PM +0200, Hannu Krosing wrote:
>> Also getting rid of toast index and start using ctids directly would be
>> a big bonus.
>> When using direct ctids we could use either ctid chains or some sort of
>> skiplist for access to N-th TOAST chunk.

> I suppose this would mean that you couldn't use vacuum on the toast
> table anymore.

Another problem with it is that it'd destroy the current optimizations
that allow partial fetches of uncompressed TOASTed fields to be fast.
You couldn't fetch page N of a TOAST datum without reading all the pages
before it.

I suppose the objection that toast tables wouldn't be regular tables
anymore might not be fatal, but you'll certainly get some pushback if
you try to take away the direct-access optimizations.

regards, tom lane


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, Junji TERAMOTO <teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Vertical Partitioning with TOAST
Date: 2006-03-04 18:40:49
Message-ID: 1141497650.3772.12.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ühel kenal päeval, L, 2006-03-04 kell 10:31, kirjutas Tom Lane:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > On Sat, Mar 04, 2006 at 12:15:46PM +0200, Hannu Krosing wrote:
> >> Also getting rid of toast index and start using ctids directly would be
> >> a big bonus.
> >> When using direct ctids we could use either ctid chains or some sort of
> >> skiplist for access to N-th TOAST chunk.
>
> > I suppose this would mean that you couldn't use vacuum on the toast
> > table anymore.
>
> Another problem with it is that it'd destroy the current optimizations
> that allow partial fetches of uncompressed TOASTed fields to be fast.
> You couldn't fetch page N of a TOAST datum without reading all the pages
> before it.
>
> I suppose the objection that toast tables wouldn't be regular tables
> anymore might not be fatal, but you'll certainly get some pushback if
> you try to take away the direct-access optimizations.

That's why I was suggesting skiplist instead on simple linked lists.

Another way would be to put a list of all toast ctids for your whole
toasted field in the first page(s) of the toast.

That way you will still have option of fast access to any partial of the
field, most likely even faster than with current implementation, as you
have to touch less pages. And you can have it also for compressed fields
if you store uncompressed offsets.

-----------------
Hannu