Re: Equivalent praxis to CLUSTERED INDEX?

Lists: pgsql-performance
From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>, <mischa(dot)sandberg(at)telus(dot)net>
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-26 19:30:24
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCE475B39@algol.sollentuna.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

>> How do vendors actually implement auto-clustering? I assume
>they move
>> rows around during quiet periods or have lots of empty space in each
>> value bucket.
>
>
>As far as I know, Oracle does it by having a B-Tree organized heap (a
>feature introduced around v8 IIRC), basically making the primary key
>index and the heap the same physical structure. Any non-index columns
>are stored in the index along with the index columns. Implementing it
>is slightly weird because searching the index and selecting the rows
>from the heap are not separate operations.

Almost the same for MSSQL. The clustered index is always forced unique.
If you create a non-unique clustered index, SQLServer will internally
pad it with random (or is it sequential? Can't remember right now) data
to make each key unique. The clustered index contains all the data
fields - both the index key and the other columns from the database.

It does support non-clustered indexes as well on the same table. Any
"secondary index" will then contain the index key and the primary key
value. This means a lookup in a non-clustered index means a two-step
index lookup: First look in the non-clustered index for the clustered
key. Then look in the clustered index for the rest of the data.

Naturally a non-clustered index needs better selectivity before it's
actually used than a clustered index does.

IIRC, SQL Server always creates clustered indexes by default for primary
keys.

//Magnus


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Magnus Hagander" <mha(at)sollentuna(dot)net>, "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>, <mischa(dot)sandberg(at)telus(dot)net>
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-26 19:48:25
Message-ID: 200408261248.25661.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Magnus,

> IIRC, SQL Server always creates clustered indexes by default for primary
> keys.

I think that's a per-database setting; certainly the ones I admin do not.

However, since SQL Server orders its data pages, those data pages tend to be
in the order of the primary key regardless if there is no clustered index.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, mischa(dot)sandberg(at)telus(dot)net
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-26 20:44:03
Message-ID: 1093553043.349.171.camel@vulture.corp.neopolitan.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, 2004-08-26 at 12:30, Magnus Hagander wrote:
> Almost the same for MSSQL. The clustered index is always forced unique.
> If you create a non-unique clustered index, SQLServer will internally
> pad it with random (or is it sequential? Can't remember right now) data
> to make each key unique. The clustered index contains all the data
> fields - both the index key and the other columns from the database.
>
> It does support non-clustered indexes as well on the same table. Any
> "secondary index" will then contain the index key and the primary key
> value. This means a lookup in a non-clustered index means a two-step
> index lookup: First look in the non-clustered index for the clustered
> key. Then look in the clustered index for the rest of the data.

Ah, okay. I see how that would work for a secondary index, though it
would make for a slow secondary index. Neat workaround. For all I
know, current versions of Oracle may support secondary indexes on
index-organized tables; all this Postgres usage over the last couple
years has made my Oracle knowledge rusty.

> IIRC, SQL Server always creates clustered indexes by default for primary
> keys.

That would surprise me actually. For some types of tables, e.g. ones
with multiple well-used indexes or large rows, index-organizing the heap
could easily give worse performance than a normal index/heap pair
depending on access patterns. It also tends to be more prone to having
locking contention under some access patterns. This is one of those
options that needs to be used knowledgeably; it is not a general
architectural improvement that you would want to apply to every table
all the time.

J. Andrew Rogers


From: Mischa Sandberg <ischamay(dot)andbergsay(at)activestateway(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-26 21:41:04
Message-ID: QNsXc.56331$X12.35308@edtnps84
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

J. Andrew Rogers wrote:
> On Thu, 2004-08-26 at 12:30, Magnus Hagander wrote:
>>IIRC, SQL Server always creates clustered indexes by default for primary
>>keys.
>
> That would surprise me actually.

Yaz, it should. It doesn't ALWAYS create clustered (unique) index for
primary keys, but clustered is the default if you just specify

CREATE TABLE Foo (col1, ...
,PRIMARY KEY(col1, ...)
)

Saying PRIMARY KEY NONCLUSTERED(...) is how you override the default.

((Weird to be discussing so much MSSQL here))


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-performance(at)postgresql(dot)org, mischa(dot)sandberg(at)telus(dot)net
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 01:45:50
Message-ID: 200408270145.i7R1jop04185@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


Updated TODO item:

o Automatically maintain clustering on a table

This would require some background daemon to maintain clustering
during periods of low usage. It might also require tables to be only
paritally filled for easier reorganization. It also might require
creating a merged heap/index data file so an index lookup would
automatically access the heap data too.

---------------------------------------------------------------------------

J. Andrew Rogers wrote:
> On Thu, 2004-08-26 at 12:30, Magnus Hagander wrote:
> > Almost the same for MSSQL. The clustered index is always forced unique.
> > If you create a non-unique clustered index, SQLServer will internally
> > pad it with random (or is it sequential? Can't remember right now) data
> > to make each key unique. The clustered index contains all the data
> > fields - both the index key and the other columns from the database.
> >
> > It does support non-clustered indexes as well on the same table. Any
> > "secondary index" will then contain the index key and the primary key
> > value. This means a lookup in a non-clustered index means a two-step
> > index lookup: First look in the non-clustered index for the clustered
> > key. Then look in the clustered index for the rest of the data.
>
>
> Ah, okay. I see how that would work for a secondary index, though it
> would make for a slow secondary index. Neat workaround. For all I
> know, current versions of Oracle may support secondary indexes on
> index-organized tables; all this Postgres usage over the last couple
> years has made my Oracle knowledge rusty.
>
>
> > IIRC, SQL Server always creates clustered indexes by default for primary
> > keys.
>
>
> That would surprise me actually. For some types of tables, e.g. ones
> with multiple well-used indexes or large rows, index-organizing the heap
> could easily give worse performance than a normal index/heap pair
> depending on access patterns. It also tends to be more prone to having
> locking contention under some access patterns. This is one of those
> options that needs to be used knowledgeably; it is not a general
> architectural improvement that you would want to apply to every table
> all the time.
>
>
> J. Andrew Rogers
>
>
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-performance(at)postgresql(dot)org, mischa(dot)sandberg(at)telus(dot)net
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 03:39:42
Message-ID: 87pt5dnta9.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> Updated TODO item:
>
> o Automatically maintain clustering on a table
>
> This would require some background daemon to maintain clustering
> during periods of low usage. It might also require tables to be only
> paritally filled for easier reorganization. It also might require
> creating a merged heap/index data file so an index lookup would
> automatically access the heap data too.

Fwiw, I would say the first "would" is also a "might". None of the previous
discussions here presumed a maintenance daemon. The discussions before talked
about a mechanism to try to place new tuples as close as possible to the
proper index position.

I would also suggest making some distinction between a cluster system similar
to what we have now but improved to maintain the clustering continuously, and
an actual index-organized-table where the tuples are actually only stored in a
btree structure.

They're two different approaches to similar problems. But they might both be
useful to have, and have markedly different implementation details.

--
greg


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-performance(at)postgresql(dot)org, mischa(dot)sandberg(at)telus(dot)net
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 04:35:07
Message-ID: 200408270435.i7R4Z7N25250@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


OK, new wording:

o Automatically maintain clustering on a table

This might require some background daemon to maintain clustering
during periods of low usage. It might also require tables to be only
paritally filled for easier reorganization. Another idea would
be to create a merged heap/index data file so an index lookup would
automatically access the heap data too.

---------------------------------------------------------------------------

Greg Stark wrote:
>
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
> > Updated TODO item:
> >
> > o Automatically maintain clustering on a table
> >
> > This would require some background daemon to maintain clustering
> > during periods of low usage. It might also require tables to be only
> > paritally filled for easier reorganization. It also might require
> > creating a merged heap/index data file so an index lookup would
> > automatically access the heap data too.
>
> Fwiw, I would say the first "would" is also a "might". None of the previous
> discussions here presumed a maintenance daemon. The discussions before talked
> about a mechanism to try to place new tuples as close as possible to the
> proper index position.
>
> I would also suggest making some distinction between a cluster system similar
> to what we have now but improved to maintain the clustering continuously, and
> an actual index-organized-table where the tuples are actually only stored in a
> btree structure.
>
> They're two different approaches to similar problems. But they might both be
> useful to have, and have markedly different implementation details.
>
> --
> greg
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 08:26:26
Message-ID: cgmr7s$59d$1@floppy.pyrenet.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Greg Stark wrote:

> The discussions before talked about a mechanism to try to place new
> tuples as close as possible to the proper index position.

Means this that an index shall have a "fill factor" property, similar to
Informix one ?

From the manual:

The FILLFACTOR option takes effect only when you build an index on a table
that contains more than 5,000 rows and uses more than 100 table pages, when
you create an index on a fragmented table, or when you create a fragmented
index on a nonfragmented table.
Use the FILLFACTOR option to provide for expansion of an index at a later
date or to create compacted indexes.
When the index is created, the database server initially fills only that
percentage of the nodes specified with the FILLFACTOR value.

# Providing a Low Percentage Value
If you provide a low percentage value, such as 50, you allow room for growth
in your index. The nodes of the index initially fill to a certain percentage and
contain space for inserts. The amount of available space depends on the
number of keys in each page as well as the percentage value.
For example, with a 50-percent FILLFACTOR value, the page would be half
full and could accommodate doubling in size. A low percentage value can
result in faster inserts and can be used for indexes that you expect to grow.

# Providing a High Percentage Value
If you provide a high percentage value, such as 99, your indexes are
compacted, and any new index inserts result in splitting nodes. The
maximum density is achieved with 100 percent. With a 100-percent
FILLFACTOR value, the index has no room available for growth; any
additions to the index result in splitting the nodes.
A 99-percent FILLFACTOR value allows room for at least one insertion per
node. A high percentage value can result in faster selects and can be used for
indexes that you do not expect to grow or for mostly read-only indexes.

Regards
Gaetano Mendola


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Gaetano Mendola <mendola(at)bigfoot(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 16:23:57
Message-ID: 200408271623.i7RGNvK05464@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


I had FILLFACTOR in the TODO list until just a few months ago, but
because no one had discussed it in 3-4 years, I removed the item. I
have added mention now in the auto-cluster section because that actually
seems like the only good reason for a non-100% fillfactor. I don't
think our ordinary btrees have enough of a penalty for splits to make a
non-full fillfactor worthwhile, but having a non-full fillfactor for
autocluster controls how often items have to be shifted around.

---------------------------------------------------------------------------

Gaetano Mendola wrote:
> Greg Stark wrote:
>
> > The discussions before talked about a mechanism to try to place new
> > tuples as close as possible to the proper index position.
>
> Means this that an index shall have a "fill factor" property, similar to
> Informix one ?
>
> From the manual:
>
>
> The FILLFACTOR option takes effect only when you build an index on a table
> that contains more than 5,000 rows and uses more than 100 table pages, when
> you create an index on a fragmented table, or when you create a fragmented
> index on a nonfragmented table.
> Use the FILLFACTOR option to provide for expansion of an index at a later
> date or to create compacted indexes.
> When the index is created, the database server initially fills only that
> percentage of the nodes specified with the FILLFACTOR value.
>
> # Providing a Low Percentage Value
> If you provide a low percentage value, such as 50, you allow room for growth
> in your index. The nodes of the index initially fill to a certain percentage and
> contain space for inserts. The amount of available space depends on the
> number of keys in each page as well as the percentage value.
> For example, with a 50-percent FILLFACTOR value, the page would be half
> full and could accommodate doubling in size. A low percentage value can
> result in faster inserts and can be used for indexes that you expect to grow.
>
>
> # Providing a High Percentage Value
> If you provide a high percentage value, such as 99, your indexes are
> compacted, and any new index inserts result in splitting nodes. The
> maximum density is achieved with 100 percent. With a 100-percent
> FILLFACTOR value, the index has no room available for growth; any
> additions to the index result in splitting the nodes.
> A 99-percent FILLFACTOR value allows room for at least one insertion per
> node. A high percentage value can result in faster selects and can be used for
> indexes that you do not expect to grow or for mostly read-only indexes.
>
>
>
>
> Regards
> Gaetano Mendola
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Gaetano Mendola <mendola(at)bigfoot(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 16:31:06
Message-ID: 200408270931.06191.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bruce,

What happened to the B-Tree Table patch discussed on Hackers ad nauseum last
winter?

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Gaetano Mendola <mendola(at)bigfoot(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 16:32:43
Message-ID: 200408271632.i7RGWh206951@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Josh Berkus wrote:
> Bruce,
>
> What happened to the B-Tree Table patch discussed on Hackers ad nauseum last
> winter?

I don't remember that. The only issue I remember is sorting btree index
by heap tid on creation. We eventually got that into CVS for 8.0.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Adi Alurkar <adi(at)sf(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 16:51:00
Message-ID: 4661B1AE-F849-11D8-8B8C-000A95C4BD7A@sf.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Greetings,

I am not sure if this applies only to clustering but for storage in
general,

IIRC Oracle has 2 parameters that can be set at table creation :
from Oracle docs

PCTFREE integer :
Specify the percentage of space in each data block of the table, object
table OID index, or partition reserved for future updates to the
table's rows. The value of PCTFREE must be a value from 0 to 99. A
value of 0 allows the entire block to be filled by inserts of new rows.
The default value is 10. This value reserves 10% of each block for
updates to existing rows and allows inserts of new rows to fill a
maximum of 90% of each block.
PCTFREE has the same function in the PARTITION description and in the
statements that create and alter clusters, indexes, materialized views,
and materialized view logs. The combination of PCTFREE and PCTUSED
determines whether new rows will be inserted into existing data blocks
or into new blocks.

PCTUSED integer
Specify the minimum percentage of used space that Oracle maintains for
each data block of the table, object table OID index, or
index-organized table overflow data segment. A block becomes a
candidate for row insertion when its used space falls below PCTUSED.
PCTUSED is specified as a positive integer from 0 to 99 and defaults to
40.
PCTUSED has the same function in the PARTITION description and in the
statements that create and alter clusters, materialized views, and
materialized view logs.
PCTUSED is not a valid table storage characteristic for an
index-organized table (ORGANIZATION INDEX).
The sum of PCTFREE and PCTUSED must be equal to or less than 100. You
can use PCTFREE and PCTUSED together to utilize space within a table
more efficiently.

PostgreSQL could take some hints from the above.

On Aug 27, 2004, at 1:26 AM, Gaetano Mendola wrote:

> Greg Stark wrote:
>
>> The discussions before talked about a mechanism to try to place new
> > tuples as close as possible to the proper index position.
>
> Means this that an index shall have a "fill factor" property, similar
> to
> Informix one ?
>
> From the manual:
>
>
> The FILLFACTOR option takes effect only when you build an index on a
> table
> that contains more than 5,000 rows and uses more than 100 table pages,
> when
> you create an index on a fragmented table, or when you create a
> fragmented
> index on a nonfragmented table.
> Use the FILLFACTOR option to provide for expansion of an index at a
> later
> date or to create compacted indexes.
> When the index is created, the database server initially fills only
> that
> percentage of the nodes specified with the FILLFACTOR value.
>
> # Providing a Low Percentage Value
> If you provide a low percentage value, such as 50, you allow room for
> growth
> in your index. The nodes of the index initially fill to a certain
> percentage and
> contain space for inserts. The amount of available space depends on the
> number of keys in each page as well as the percentage value.
> For example, with a 50-percent FILLFACTOR value, the page would be half
> full and could accommodate doubling in size. A low percentage value can
> result in faster inserts and can be used for indexes that you expect
> to grow.
>
>
> # Providing a High Percentage Value
> If you provide a high percentage value, such as 99, your indexes are
> compacted, and any new index inserts result in splitting nodes. The
> maximum density is achieved with 100 percent. With a 100-percent
> FILLFACTOR value, the index has no room available for growth; any
> additions to the index result in splitting the nodes.
> A 99-percent FILLFACTOR value allows room for at least one insertion
> per
> node. A high percentage value can result in faster selects and can be
> used for
> indexes that you do not expect to grow or for mostly read-only indexes.
>
>
>
>
> Regards
> Gaetano Mendola
>
>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>
>
--
Adi Alurkar (DBA sf.NET) <adi(at)vasoftware(dot)com>
1024D/79730470 A491 5724 74DE 956D 06CB D844 6DF1 B972 7973 0470


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Adi Alurkar <adi(at)sf(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 17:27:07
Message-ID: 200408271727.i7RHR7Z13191@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


But what is the advantage of non-full pages in Oracle?

---------------------------------------------------------------------------

Adi Alurkar wrote:
> Greetings,
>
> I am not sure if this applies only to clustering but for storage in
> general,
>
> IIRC Oracle has 2 parameters that can be set at table creation :
> from Oracle docs
>
> PCTFREE integer :
> Specify the percentage of space in each data block of the table, object
> table OID index, or partition reserved for future updates to the
> table's rows. The value of PCTFREE must be a value from 0 to 99. A
> value of 0 allows the entire block to be filled by inserts of new rows.
> The default value is 10. This value reserves 10% of each block for
> updates to existing rows and allows inserts of new rows to fill a
> maximum of 90% of each block.
> PCTFREE has the same function in the PARTITION description and in the
> statements that create and alter clusters, indexes, materialized views,
> and materialized view logs. The combination of PCTFREE and PCTUSED
> determines whether new rows will be inserted into existing data blocks
> or into new blocks.
>
> PCTUSED integer
> Specify the minimum percentage of used space that Oracle maintains for
> each data block of the table, object table OID index, or
> index-organized table overflow data segment. A block becomes a
> candidate for row insertion when its used space falls below PCTUSED.
> PCTUSED is specified as a positive integer from 0 to 99 and defaults to
> 40.
> PCTUSED has the same function in the PARTITION description and in the
> statements that create and alter clusters, materialized views, and
> materialized view logs.
> PCTUSED is not a valid table storage characteristic for an
> index-organized table (ORGANIZATION INDEX).
> The sum of PCTFREE and PCTUSED must be equal to or less than 100. You
> can use PCTFREE and PCTUSED together to utilize space within a table
> more efficiently.
>
> PostgreSQL could take some hints from the above.
>
> On Aug 27, 2004, at 1:26 AM, Gaetano Mendola wrote:
>
> > Greg Stark wrote:
> >
> >> The discussions before talked about a mechanism to try to place new
> > > tuples as close as possible to the proper index position.
> >
> > Means this that an index shall have a "fill factor" property, similar
> > to
> > Informix one ?
> >
> > From the manual:
> >
> >
> > The FILLFACTOR option takes effect only when you build an index on a
> > table
> > that contains more than 5,000 rows and uses more than 100 table pages,
> > when
> > you create an index on a fragmented table, or when you create a
> > fragmented
> > index on a nonfragmented table.
> > Use the FILLFACTOR option to provide for expansion of an index at a
> > later
> > date or to create compacted indexes.
> > When the index is created, the database server initially fills only
> > that
> > percentage of the nodes specified with the FILLFACTOR value.
> >
> > # Providing a Low Percentage Value
> > If you provide a low percentage value, such as 50, you allow room for
> > growth
> > in your index. The nodes of the index initially fill to a certain
> > percentage and
> > contain space for inserts. The amount of available space depends on the
> > number of keys in each page as well as the percentage value.
> > For example, with a 50-percent FILLFACTOR value, the page would be half
> > full and could accommodate doubling in size. A low percentage value can
> > result in faster inserts and can be used for indexes that you expect
> > to grow.
> >
> >
> > # Providing a High Percentage Value
> > If you provide a high percentage value, such as 99, your indexes are
> > compacted, and any new index inserts result in splitting nodes. The
> > maximum density is achieved with 100 percent. With a 100-percent
> > FILLFACTOR value, the index has no room available for growth; any
> > additions to the index result in splitting the nodes.
> > A 99-percent FILLFACTOR value allows room for at least one insertion
> > per
> > node. A high percentage value can result in faster selects and can be
> > used for
> > indexes that you do not expect to grow or for mostly read-only indexes.
> >
> >
> >
> >
> > Regards
> > Gaetano Mendola
> >
> >
> >
> >
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 7: don't forget to increase your free space map settings
> >
> >
> --
> Adi Alurkar (DBA sf.NET) <adi(at)vasoftware(dot)com>
> 1024D/79730470 A491 5724 74DE 956D 06CB D844 6DF1 B972 7973 0470
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Jeremy Dunn" <jdunn(at)autorevenue(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 17:39:35
Message-ID: 007e01c48c5c$d15f4c60$4f01a8c0@jeremydunn
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org
> [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of
> Bruce Momjian
> Sent: Friday, August 27, 2004 1:27 PM
> To: Adi Alurkar
> Cc: pgsql-performance(at)postgresql(dot)org
> Subject: Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?
>
>
>
> But what is the advantage of non-full pages in Oracle?
>

One advantage has to do with updates of variable-length columns, e.g.
varchars.

If the block is fully packed with data, an update to a varchar column
that makes the column wider, causes "row-chaining". This means that a
portion of the row is stored in a different data block, which may be
somewhere completely different in the storage array. Retrieving that
row (or even just that column from that row) as a unit may now require
additional disk seek(s).

Leaving some space for updates in each data block doesn't prevent this
problem completely, but mitigates it to a certain extent. If for
instance a row is typically inserted with a null value for a varchar
column, but the application developer knows it will almost always get
updated with some value later on, then leaving a certain percentage of
empty space in each block allocated to that table makes sense.

Conversely, if you know that your data is never going to get updated
(e.g. a data warehousing application), you might specify to pack the
blocks as full as possible. This makes for the most efficient data
retrieval performance.

- Jeremy


From: Adi Alurkar <adi(at)sf(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 17:39:38
Message-ID: 118B3D82-F850-11D8-8B8C-000A95C4BD7A@sf.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

IIRC it it to reduce the "overflow" of data or what oracle calls
chained rows. i.e if a table has variable length columns and 10 rows
get inserted into a datapage, if this datapage is full and one of the
variable length field gets updated the row will now "overflow" into
another datapage, but if the datapage is created with an appropriate
amount of free space the updated row will be stored in one single
datapage.

On Aug 27, 2004, at 10:27 AM, Bruce Momjian wrote:

>
> But what is the advantage of non-full pages in Oracle?
>
> -----------------------------------------------------------------------
> ----
>
> Adi Alurkar wrote:
>> Greetings,
>>
>> I am not sure if this applies only to clustering but for storage in
>> general,
>>
>> IIRC Oracle has 2 parameters that can be set at table creation :
>> from Oracle docs
>>
>> PCTFREE integer :
>> Specify the percentage of space in each data block of the table,
>> object
>> table OID index, or partition reserved for future updates to the
>> table's rows. The value of PCTFREE must be a value from 0 to 99. A
>> value of 0 allows the entire block to be filled by inserts of new
>> rows.
>> The default value is 10. This value reserves 10% of each block for
>> updates to existing rows and allows inserts of new rows to fill a
>> maximum of 90% of each block.
>> PCTFREE has the same function in the PARTITION description and in the
>> statements that create and alter clusters, indexes, materialized
>> views,
>> and materialized view logs. The combination of PCTFREE and PCTUSED
>> determines whether new rows will be inserted into existing data blocks
>> or into new blocks.
>>
>> PCTUSED integer
>> Specify the minimum percentage of used space that Oracle maintains for
>> each data block of the table, object table OID index, or
>> index-organized table overflow data segment. A block becomes a
>> candidate for row insertion when its used space falls below PCTUSED.
>> PCTUSED is specified as a positive integer from 0 to 99 and defaults
>> to
>> 40.
>> PCTUSED has the same function in the PARTITION description and in the
>> statements that create and alter clusters, materialized views, and
>> materialized view logs.
>> PCTUSED is not a valid table storage characteristic for an
>> index-organized table (ORGANIZATION INDEX).
>> The sum of PCTFREE and PCTUSED must be equal to or less than 100. You
>> can use PCTFREE and PCTUSED together to utilize space within a table
>> more efficiently.
>>
>> PostgreSQL could take some hints from the above.
>>
>> On Aug 27, 2004, at 1:26 AM, Gaetano Mendola wrote:
>>
>>> Greg Stark wrote:
>>>
>>>> The discussions before talked about a mechanism to try to place new
>>>> tuples as close as possible to the proper index position.
>>>
>>> Means this that an index shall have a "fill factor" property, similar
>>> to
>>> Informix one ?
>>>
>>> From the manual:
>>>
>>>
>>> The FILLFACTOR option takes effect only when you build an index on a
>>> table
>>> that contains more than 5,000 rows and uses more than 100 table
>>> pages,
>>> when
>>> you create an index on a fragmented table, or when you create a
>>> fragmented
>>> index on a nonfragmented table.
>>> Use the FILLFACTOR option to provide for expansion of an index at a
>>> later
>>> date or to create compacted indexes.
>>> When the index is created, the database server initially fills only
>>> that
>>> percentage of the nodes specified with the FILLFACTOR value.
>>>
>>> # Providing a Low Percentage Value
>>> If you provide a low percentage value, such as 50, you allow room for
>>> growth
>>> in your index. The nodes of the index initially fill to a certain
>>> percentage and
>>> contain space for inserts. The amount of available space depends on
>>> the
>>> number of keys in each page as well as the percentage value.
>>> For example, with a 50-percent FILLFACTOR value, the page would be
>>> half
>>> full and could accommodate doubling in size. A low percentage value
>>> can
>>> result in faster inserts and can be used for indexes that you expect
>>> to grow.
>>>
>>>
>>> # Providing a High Percentage Value
>>> If you provide a high percentage value, such as 99, your indexes are
>>> compacted, and any new index inserts result in splitting nodes. The
>>> maximum density is achieved with 100 percent. With a 100-percent
>>> FILLFACTOR value, the index has no room available for growth; any
>>> additions to the index result in splitting the nodes.
>>> A 99-percent FILLFACTOR value allows room for at least one insertion
>>> per
>>> node. A high percentage value can result in faster selects and can be
>>> used for
>>> indexes that you do not expect to grow or for mostly read-only
>>> indexes.
>>>
>>>
>>>
>>>
>>> Regards
>>> Gaetano Mendola
>>>
>>>
>>>
>>>
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 7: don't forget to increase your free space map settings
>>>
>>>
>> --
>> Adi Alurkar (DBA sf.NET) <adi(at)vasoftware(dot)com>
>> 1024D/79730470 A491 5724 74DE 956D 06CB D844 6DF1 B972 7973 0470
>>
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 6: Have you searched our list archives?
>>
>> http://archives.postgresql.org
>>
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania
> 19073
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
> joining column's datatypes do not match
>
>
--
Adi Alurkar (DBA sf.NET) <adi(at)vasoftware(dot)com>
1024D/79730470 A491 5724 74DE 956D 06CB D844 6DF1 B972 7973 0470


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Adi Alurkar <adi(at)sf(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 17:48:41
Message-ID: 200408271748.i7RHmfJ15766@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Adi Alurkar wrote:
> IIRC it it to reduce the "overflow" of data or what oracle calls
> chained rows. i.e if a table has variable length columns and 10 rows
> get inserted into a datapage, if this datapage is full and one of the
> variable length field gets updated the row will now "overflow" into
> another datapage, but if the datapage is created with an appropriate
> amount of free space the updated row will be stored in one single
> datapage.

Agreed. What I am wondering is with our system where every update gets
a new row, how would this help us? I know we try to keep an update on
the same row as the original, but is there any significant performance
benefit to doing that which would offset the compaction advantage?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Adi Alurkar <adi(at)sf(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 18:19:29
Message-ID: 1082.1093630769@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Agreed. What I am wondering is with our system where every update gets
> a new row, how would this help us? I know we try to keep an update on
> the same row as the original, but is there any significant performance
> benefit to doing that which would offset the compaction advantage?

Because Oracle uses overwrite-in-place (undoing from an UNDO log on
transaction abort), while we always write a whole new row, it would take
much larger PCTFREE wastage to get a useful benefit in PG than it does
in Oracle. That wastage translates directly into increased I/O costs,
so I'm a bit dubious that we should assume there is a win to be had here
just because Oracle offers the feature.

regards, tom lane


From: Mischa Sandberg <ischamay(dot)andbergsay(at)activestateway(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 18:26:39
Message-ID: z1LXc.66099$X12.19104@edtnps84
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

I think you've probably fingered the kicker of why PG doesn't have this
kind of clustering already. Hence perhaps the need for other approaches
to the issue (the disk-IO efficiency of reading groups of rows related
by a common key) that other DB's (with in-place update) address with
synchronous clustering ('heap rebalancing' ?).

Bruce Momjian wrote:
> Adi Alurkar wrote:
>
>>IIRC it it to reduce the "overflow" of data or what oracle calls
>>chained rows. i.e if a table has variable length columns and 10 rows
>>get inserted into a datapage, if this datapage is full and one of the
>>variable length field gets updated the row will now "overflow" into
>>another datapage, but if the datapage is created with an appropriate
>>amount of free space the updated row will be stored in one single
>>datapage.
>
>
> Agreed. What I am wondering is with our system where every update gets
> a new row, how would this help us? I know we try to keep an update on
> the same row as the original, but is there any significant performance
> benefit to doing that which would offset the compaction advantage?
>


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Adi Alurkar <adi(at)sf(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 19:31:22
Message-ID: 87fz68ml85.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> Agreed. What I am wondering is with our system where every update gets
> a new row, how would this help us? I know we try to keep an update on
> the same row as the original, but is there any significant performance
> benefit to doing that which would offset the compaction advantage?

Hm. Posit a system where all transactions are short updates executed in
autocommit mode.

In such a system as soon as a transaction commits it would take a very short
time before the previous record was a dead tuple.

If every backend kept a small list of tuples it had marked deleted and
whenever it was idle checked to see if they were dead yet, it might avoid much
of the need for vacuum. And in such a circumstance I think you wouldn't need
more than a pctfree of 50% even on a busy table. Every tuple would need about
one extra slot.

This would only be a reasonable idea if a) if the list of potential dead
tuples is short and if it overflows it just forgets them leaving them for
vacuum to deal with. and b) It only checks the potentially dead tuples when
the backend is otherwise idle.

Even so it would be less efficient than a batch vacuum, and it would be taking
up i/o bandwidth (to maintain indexes even if the heap buffer is in ram), even
if that backend is idle it doesn't mean other backends couldn't have used that
i/o bandwidth.

But I think it would deal with a lot of the complaints about vacuum and it
would make it more feasible to use a pctfree parameter to make clustering more
effective.

--
greg


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Adi Alurkar <adi(at)sf(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 19:34:57
Message-ID: 87acwgml26.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> but is there any significant performance benefit to doing that which would
> offset the compaction advantage?

Just as a side comment. Setting PCTFREE 0 PCTUSED 100 on tables that have no
updates on them has an astonishingly big effect on speed. So the penalty for
leaving some space free really is substantial.

I think the other poster is right. Oracle really needs pctfree because of the
way it handles updates. Postgres doesn't really need as much because it
doesn't try to squeeze the new tuple in the space the old one took up. If it
doesn't fit on the page the worst that happens is it has to store it on some
other page, whereas oracle has to do its strange row chaining thing.

--
greg


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Adi Alurkar <adi(at)sf(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 19:42:50
Message-ID: 200408271942.i7RJgoR03090@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Greg Stark wrote:
>
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
> > but is there any significant performance benefit to doing that which would
> > offset the compaction advantage?
>
> Just as a side comment. Setting PCTFREE 0 PCTUSED 100 on tables that have no
> updates on them has an astonishingly big effect on speed. So the penalty for
> leaving some space free really is substantial.
>
> I think the other poster is right. Oracle really needs pctfree because of the
> way it handles updates. Postgres doesn't really need as much because it
> doesn't try to squeeze the new tuple in the space the old one took up. If it
> doesn't fit on the page the worst that happens is it has to store it on some
> other page, whereas oracle has to do its strange row chaining thing.

Oracle also does that chain thing so moving updates to different pages
might have more of an impact than it does on PostgreSQL. We have chains
too but just for locking. Not sure on Oracle.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Mischa Sandberg <ischamay(dot)andbergsay(at)activestateway(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-27 19:50:12
Message-ID: UfMXc.66117$X12.27750@edtnps84
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

This discussion is starting to sound like the split in HEAP memory
management evolution, into garbage-collecting (e.g. Java) and
non-garbage-collecting (e.g. C++).

Reclamation by GC's these days has become seriously sophisticated.
CLUSTER resembles the first generation of GC's, which were
single-big-pass hold-everything-else threads.

Perhaps the latest in incremental GC algorithms would be worth scouting,
for the next step in PG page management.

Greg Stark wrote:

> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
>>but is there any significant performance benefit to doing that which would
>>offset the compaction advantage?
>
> Just as a side comment. Setting PCTFREE 0 PCTUSED 100 on tables that have no
> updates on them has an astonishingly big effect on speed. So the penalty for
> leaving some space free really is substantial.
>
> I think the other poster is right. Oracle really needs pctfree because of the
> way it handles updates. Postgres doesn't really need as much because it
> doesn't try to squeeze the new tuple in the space the old one took up. If it
> doesn't fit on the page the worst that happens is it has to store it on some
> other page, whereas oracle has to do its strange row chaining thing.


From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-28 01:02:47
Message-ID: 412FD9B7.4080207@bigfoot.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tom Lane wrote:

> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
>>Agreed. What I am wondering is with our system where every update gets
>>a new row, how would this help us? I know we try to keep an update on
>>the same row as the original, but is there any significant performance
>>benefit to doing that which would offset the compaction advantage?
>
>
> Because Oracle uses overwrite-in-place (undoing from an UNDO log on
> transaction abort), while we always write a whole new row, it would take
> much larger PCTFREE wastage to get a useful benefit in PG than it does
> in Oracle. That wastage translates directly into increased I/O costs,
> so I'm a bit dubious that we should assume there is a win to be had here
> just because Oracle offers the feature.

Mmmm. Consider this scenario:

ctid datas
(0,1) yyy-xxxxxxxxxxxxxxxxxxx
(0,2) -------- EMPTY --------
(0,3) -------- EMPTY --------
(0,4) -------- EMPTY --------
(0,5) -------- EMPTY --------
(0,6) yyy-xxxxxxxxxxxxxxxxxxx
(0,7) -------- EMPTY --------
.... -------- EMPTY --------
(0,11) yyy-xxxxxxxxxxxxxxxxxxx

the row (0,2) --> (0,5) are space available for the (0,1) updates.
This will help a table clustered ( for example ) to mantain his
own correct cluster order.

Regards
Gaetano Mendola


From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "J(dot) Andrew Rogers" <jrogers(at)neopolitan(dot)com>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-performance(at)postgresql(dot)org, mischa(dot)sandberg(at)telus(dot)net
Subject: Re: Equivalent praxis to CLUSTERED INDEX?
Date: 2004-08-31 22:05:03
Message-ID: 20040831220503.GK78395@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, Aug 26, 2004 at 11:39:42PM -0400, Greg Stark wrote:
>
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>
> > Updated TODO item:
> >
> > o Automatically maintain clustering on a table
> >
> > This would require some background daemon to maintain clustering
> > during periods of low usage. It might also require tables to be only
> > paritally filled for easier reorganization. It also might require
> > creating a merged heap/index data file so an index lookup would
> > automatically access the heap data too.
>
> Fwiw, I would say the first "would" is also a "might". None of the previous
> discussions here presumed a maintenance daemon. The discussions before talked
> about a mechanism to try to place new tuples as close as possible to the
> proper index position.
>
> I would also suggest making some distinction between a cluster system similar
> to what we have now but improved to maintain the clustering continuously, and
> an actual index-organized-table where the tuples are actually only stored in a
> btree structure.
>
> They're two different approaches to similar problems. But they might both be
> useful to have, and have markedly different implementation details.

There's a third approach that I think is worth considering. Half of the
benefit to clustered tables is limiting the number of pages you need to
access when scanning the primary key. The order of tuples in the pages
themselves isn't nearly as important as ordering of the pages. This
means you can get most of the benefit of an index-organized table just
by being careful about what page you place a tuple on. What I'm thinking
of is some means to ensure all the tuples on a page are within some PK
range, but not worrying about the exact order within the page since it's
relatively cheap to scan through the page in memory.

Some pros:
This would probably mean less change to the code that inserts tuples.

No need for a background daemon.

No need to create a new B-Tree table structure.

Ideally, there won't be need to move tuples around, which should mean
that current indexing code doesn't need to change.

Cons:
Need to have some way to deal with pages that fill up.

To gain full benefit some means of indicating what range of PK values
are on a page might be needed.

It's not as beneficial as a true IOT since you don't get the benefit of
storing your tuples inline with your B-Tree.

I'm sure there's a ton of things I'm missing, especially since I'm not
familiar with the postgresql code, but hopefully others can explore this
further.
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"