Re: Index Tuple Compression Approach?

Lists: pgsql-hackers
From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Index Tuple Compression Approach?
Date: 2007-08-14 21:21:16
Message-ID: 604pj1n777.fsf_-_@dba2.int.libertyrms.com

I recently had a chat with someone who was pretty intimate with Adabas
for a number of years and who's now in the process of figuring things
out about PostgreSQL. We poked at bits of the respective
implementations, seeing some similarities and differences. He pointed
out one aspect of index handling that could (in principle) be an
interesting optimization.

Evidently, in Adabas, index leaf nodes were not simply tuples, but
lists where the index value would not be repeated.

In PostgreSQL, if you have the index value 'abc', and there are 10
tuples with that value, then you'll have a page full of tuples of the
following form:

|abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...

Now, the Adabas approach was rather different. It would only have the
index value once, and then have the list of tuple pointers:

|abc|ptr[rec1],ptr[rec2],ptr[rec3],...,ptr[rec10]|

This could allow a fair bit of compression, for cases where the index
value is not unique.
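
To put rough numbers on it, here is a back-of-the-envelope sketch in
Python. The 6-byte tuple pointer matches PostgreSQL's ItemPointerData
(4-byte block number plus 2-byte offset); per-tuple headers and
alignment padding are deliberately ignored, so the figures are
illustrative only.

```python
# Rough space estimate for N duplicates of one key on a leaf page.
# Assumes a 6-byte tuple pointer; ignores per-tuple headers and
# alignment, so the numbers are illustrative, not exact.

def repeated_layout_bytes(key_len, n_dups, ptr_len=6):
    """Current layout: every entry repeats the key."""
    return n_dups * (key_len + ptr_len)

def posting_list_bytes(key_len, n_dups, ptr_len=6):
    """Collapsed layout: key stored once, then a list of pointers."""
    return key_len + n_dups * ptr_len

# 10 duplicates of the 3-byte key 'abc', as in the example above:
# repeated: 10 * (3 + 6) = 90 bytes; collapsed: 3 + 10 * 6 = 63 bytes.
```

With longer keys the ratio improves quickly, since only the pointer
list grows with the number of duplicates.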

There is a concomitant downside: concurrent updates may contend for a
page, and, since the density would be higher, there would be more
contention over pages.

Does this seem pretty much like madness? Or is it a plausible "some
day ToDo"?
--
"cbbrowne","@","acm.org"
http://linuxfinances.info/info/postgresql.html
"I don't do drugs anymore 'cause I find I get the same effect just by
standing up really fast." -- Jonathan Katz


From: Decibel! <decibel(at)decibel(dot)org>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-14 21:27:39
Message-ID: 20070814212739.GS54135@nasby.net

Isn't this what Grouped Index Tuples is?

On Tue, Aug 14, 2007 at 05:21:16PM -0400, Chris Browne wrote:
> I recently had a chat with someone who was pretty intimate with Adabas
> for a number of years who's in the process of figuring things out
> about PostgreSQL. We poked at bits of the respective implementations,
> seeing some similarities and differences. He pointed out one aspect
> of index handling that could (in principle) be an interesting
> optimization.
>
> Evidently, in Adabas, index leaf nodes were not simply tuples, but
> lists where the index value would not be repeated.
>
> In PostgreSQL, if you have the index value 'abc', and there are 10
> tuples with that value, then you'll have a page full of tuples of the
> following form:
>
> |abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...
>
> Now, the Adabas approach was rather different. It would only have the
> index value once, and then have the list of tuple pointers:
>
> |abc|ptr[rec1],ptr[rec2],ptr[rec3],...,ptr[rec10]|
>
> This could allow a fair bit of compression, for cases where the index
> value is not unique.
>
> There is a concomitant downside: concurrent updates may contend for a
> page, and, since the density would be higher, there would be more
> contention over pages.
>
> Does this seem pretty much like madness? Or is it a plausible "some
> day ToDo"?
> --
> "cbbrowne","@","acm.org"
> http://linuxfinances.info/info/postgresql.html
> "I don't do drugs anymore 'cause I find I get the same effect just by
> standing up really fast." -- Jonathan Katz
>

--
Decibel!, aka Jim Nasby decibel(at)decibel(dot)org
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Decibel! <decibel(at)decibel(dot)org>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org, heikki(at)enterprisedb(dot)com
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-14 23:58:27
Message-ID: 1187135907.24264.44.camel@dogma.ljc.laika.com

On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
> Isn't this what Grouped Index Tuples is?
>

http://community.enterprisedb.com/git/git-readme.txt

It looks like GIT is a little different.

GIT actually stores a lower-bound key of a contiguous* range of keys
that all point to the same page, and for each of those ranges stores a
bitmap of page offsets. A search searches first for an exact match in
the index, and failing that, looks to see if the previous index tuple
happens to be one of these ranges.

The algorithm Chris is talking about stores a set of tuple ids (which
include page and offset) for each distinct key.

Both could be helpful, although I don't think they can work together
very well.
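
A toy model of that GIT lookup, as I understand it from the readme (a
sketch under assumptions, not the actual patch; the entry layout
`(lower_bound_key, heap_page, offset_bitmap)` is invented for
illustration):

```python
import bisect

# Index entries are kept sorted by lower-bound key. An exact hit is a
# candidate; failing that, the nearest preceding entry may be a group
# whose range covers the probe key, so it is a candidate too.

def git_candidates(index, key):
    """Return candidate (heap_page, offset_bitmap) pairs for `key`."""
    keys = [entry[0] for entry in index]
    i = bisect.bisect_right(keys, key)
    if i == 0:
        return []   # key sorts before every group's lower bound
    # Entry i-1 matches exactly, or may be a range covering `key`;
    # the caller must recheck the heap tuples, since matches are lossy.
    return [index[i - 1][1:]]

index = [("apple", 1, 0b0101), ("melon", 2, 0b0011)]
# "cherry" has no exact entry, so the "apple" group is the candidate.
```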

GIT has the disadvantage that it's "lossy". It doesn't even store every
key in the index, so it can't be sure that the match actually is a
match. Thus, it returns candidate matches. That also has implications
for enforcing UNIQUE (although it's not impossible, according to the
readme). However, GIT can be used effectively on an index that happens
to be unique. GIT also assumes a tree structure, and makes no sense for
a hash index or for types without ordering. GIT's
space savings is dependent on the clustering of the table.

Chris's suggestion would work on a UNIQUE index, but would be no help at
all, because there would be no duplicate keys to collapse. However, it
could be used for non-tree indexes. The space savings for this strategy
is dependent on how repetitive the keys are.

I guess the ultimate deciding factor is which can save you more space.
If you have lots of duplicates, Chris's suggestion might work better,
because you don't have to try to maintain cluster order. If you have a
wider distribution of data, GIT is probably better, although you have to
keep some degree of clustering (HOT may help with that).

Heikki is the authority on GIT, so I'm including him in the CC so he can
correct me :)

Regards,
Jeff Davis

*: I'm not 100% sure I'm using "contiguous" correctly, but the range of
keys can contain gaps or duplicates, so long as every key in the range
points to that same page. That is, if keys 1,1,2,3,5 all point to page
P, they can be grouped into just "1" so long as there doesn't exist a
key 4 that points to a page other than P.


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: "Decibel!" <decibel(at)decibel(dot)org>, Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-15 05:51:28
Message-ID: 46C29460.8070002@enterprisedb.com

Jeff Davis wrote:
> On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
>> Isn't this what Grouped Index Tuples is?
>
> http://community.enterprisedb.com/git/git-readme.txt
>
> It looks like GIT is a little different.
>
> GIT actually stores a lower-bound key of a contiguous* range of keys
> that all point to the same page, and for each of those ranges stores a
> bitmap of page offsets. A search searches first for an exact match in
> the index, and failing that, looks to see if the previous index tuple
> happens to be one of these ranges.
>
> The algorithm Chris is talking about stores a set of tuple ids (which
> include page and offset) for each distinct key.

Right.

> Both could be helpful, although I don't think they can work together
> very well.

What Chris is suggesting is basically a special case of GIT, where all
the heap tuples represented by an index tuple have the same key. I was
actually thinking of adding a flag to index tuples to indicate that
special case in GIT. We could effectively do both.

> GIT has the disadvantage that it's "lossy". It doesn't even store every
> key in the index, so it can't be sure that the match actually is a
> match. Thus, it returns candidate matches. That also has implications
> for enforcing UNIQUE (although it's not impossible, according to the
> readme). However, GIT can be used effectively on an index that happens
> to be unique. GIT also assumes a tree structure, and makes no sense for
> a hash index or for types without ordering. GIT's
> space savings is dependent on the clustering of the table.
>
> Chris's suggestion would work on a UNIQUE index, but would be no help at
> all, because there would be no duplicate keys to collapse. However, it
> could be used for non-tree indexes. The space savings for this strategy
> is dependent on how repetitive the keys are.

Right. I wasn't concerned about the case of a lot of duplicate keys,
because the bitmap index is more efficient for that. And also because
HOT should reduce the number of index tuples with duplicate keys
pointing to different versions of the same row.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Decibel!" <decibel(at)decibel(dot)org>, "Chris Browne" <cbbrowne(at)acm(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-15 10:34:08
Message-ID: 1187174048.4157.42.camel@ebony.site

On Wed, 2007-08-15 at 06:51 +0100, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
> >> Isn't this what Grouped Index Tuples is?
> >
> > http://community.enterprisedb.com/git/git-readme.txt
> >
> > It looks like GIT is a little different.
> >
> > GIT actually stores a lower-bound key of a contiguous* range of keys
> > that all point to the same page, and for each of those ranges stores a
> > bitmap of page offsets. A search searches first for an exact match in
> > the index, and failing that, looks to see if the previous index tuple
> > happens to be one of these ranges.
> >
> > The algorithm Chris is talking about stores a set of tuple ids (which
> > include page and offset) for each distinct key.
>
> Right.
>
> > Both could be helpful, although I don't think they can work together
> > very well.
>
> What Chris is suggesting is basically a special case of GIT, where all
> the heap tuples represented by an index tuple have the same key. I was
> actually thinking of adding a flag to index tuples to indicate that
> special case in GIT. We could effectively do both.

A few additional thoughts...

The approach suggested by Chris is also used by Teradata Non-Unique
Secondary Indexes, known as NUSIs (but not named by me!). The following
link is a public domain description that is detailed enough:
http://teradata.uark.edu/research/wang/indexes.html

Replicating this approach directly isn't that useful because it would
interact badly with the way we handle updates. Thinking about the basic
design pattern however, we can envisage a type of index that changes the
1:1 mapping between index and heap tuple into the concept of a "tuple
set index" where we have a 1:Many mapping between index and heap.

In general, the tuple set index approach can significantly reduce index
size. This won't be of interest to anyone unless all of your data
overflows RAM and you need to do I/O to constantly page back in pieces
of your data. If your database does fit in RAM, reducing the number of
index blocks might just increase contention. This means that the tuple
set approach is only useful when you have very large databases, but when
you do it is very, very useful.

GIT is a tuple set index with two important extensions:

1. GIT allows a range of values to be indexed, not just a single value,
so this allows it to be useful for both Unique and Non-Unique cases.

2. GIT restricts the set of tuples to a small range of blocks within the
table. Making the range of blocks = 1 means that GIT is designed to work
well with HOT, which also stays on the same block. Keeping the range of
blocks small means GIT degrades as slowly as possible in the face of
"cold" UPDATEs. If the values are inserted in roughly ordered/clustered
sequence then this doesn't increase index size at all, so the most
common/highest volume use cases are covered.

So from my perspective, GIT is very close to the optimal design for a
tuple set index that addresses the need for high concurrency and
unique/near-uniqueness with PostgreSQL. There are certainly many other
options for a tuple set index, and bitmap indexes are simply another
version of a tuple set index but with different behaviours tuned to a
different use case. There may be other use cases that require more than
two kinds of tuple set index, and we have discussed implementing the
tuple set behaviour as part of the other main index types.

As an aside, it turns out, after further research, that GIT is similar
to a clustered index in SQL Server, as described by Louis Davidson, "Pro
SQL Server 2005 Database Design and Optimization", p.405. SQL Server
creates a clustered index by default on each PK, so it says.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Index Tuple Compression Approach?
Date: 2007-08-15 16:39:46
Message-ID: 758d5e7f0708150939w1a08590aw1abcfcde402122df@mail.gmail.com

On 8/14/07, Chris Browne <cbbrowne(at)acm(dot)org> wrote:
> I recently had a chat with someone who was pretty intimate with Adabas
> for a number of years who's in the process of figuring things out
> about PostgreSQL. We poked at bits of the respective implementations,
> seeing some similarities and differences. He pointed out one aspect
> of index handling that could (in principle) be an interesting
> optimization.
>
> Evidently, in Adabas, index leaf nodes were not simply tuples, but
> lists where the index value would not be repeated.
>
> In PostgreSQL, if you have the index value 'abc', and there are 10
> tuples with that value, then you'll have a page full of tuples of the
> following form:
>
> |abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...
>
> Now, the Adabas approach was rather different. It would only have the
> index value once, and then have the list of tuple pointers:
>
> |abc|ptr[rec1],ptr[rec2],ptr[rec3],...,ptr[rec10]|
>
> This could allow a fair bit of compression, for cases where the index
> value is not unique.

Interesting. Some time ago I played a little with quite a big table
which contained a path (file path) as a primary key. It made sense to
have that structure (SELECTs were mostly ORDER BY path WHERE path >
'/foo' LIMIT n).
The actual index was only a little smaller than the table (there were
maybe 4 or 5 columns there).

Some time ago I had an idea that it might be possible to compress
the index, even if it is a unique index. Take the path example.
My idea would be to split the indexed value into 8-byte chunks.
For example: /var/lib/postgresql/8.2/main would be split into:
"/var/lib"
"/postgre"
"sql/8.2/" -- these would be inserted into the tree as a "scaffold",
and only vacuum should remove them.
"main" -- this would be a leaf node. It could be repeated in non-unique
indexes.

[/etc/pas] -- scaffold-node
|-"swd" -- leaf node
[/etc/sha]
|-"dow"
[/var/lib] -- a problematic mixed scaffold/leaf node.
[/postgre]
|-"sql"
|-"sql/8.2"
[sql/8.2/]
|-"main"
|-"foobar"

The scaffold nodes would be there to guarantee that there is some
place to attach leaves to. They should not be removed by DELETE
(unless we are sure no other node depends on them).

Advantages? The repeated values (such as "/var/lib/postgresql/8.2")
are not repeated -- they are embedded into the tree as a "scaffold",
while the nodes that are significant (files, not directories, in my
example) are stored as actual leaves.

I guess it would reduce the size of large indexes and at the same time
could remove the limitation that a B-tree index cannot index values
larger than 1/3 of a database page. 8-byte chunks were given
as an example here; perhaps a larger value would be better.
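
A minimal sketch of the chunking step (illustrative only; note that
with strict 8-byte chunks the third piece actually comes out as
"sql/8.2/", with the trailing slash):

```python
# Split an indexed string into fixed-size chunks: all but the last
# become "scaffold" edges, the last one is the leaf. The 8-byte
# chunk size is just the example value from above.

def split_key(key, chunk=8):
    pieces = [key[i:i + chunk] for i in range(0, len(key), chunk)]
    return pieces[:-1], pieces[-1]   # (scaffold chunks, leaf chunk)

scaffold, leaf = split_key("/var/lib/postgresql/8.2/main")
# scaffold == ["/var/lib", "/postgre", "sql/8.2/"], leaf == "main"
```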

(Of course redesigning the schema to put directories separate from
files would be useful, but it would not help with ORDER BY .. LIMIT
queries -- they would have to be JOIN-ed and re-sorted in memory,
I'm afraid.)

Regards,
Dawid


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Decibel! <decibel(at)decibel(dot)org>, Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-15 17:41:08
Message-ID: 1187199668.5203.17.camel@dogma.ljc.laika.com

On Wed, 2007-08-15 at 06:51 +0100, Heikki Linnakangas wrote:
> What Chris is suggesting is basically a special case of GIT, where all
> the heap tuples represented by an index tuple have the same key. I was
> actually thinking of adding a flag to index tuples to indicate that
> special case in GIT. We could effectively do both.
>

The bigger difference that I see is that GIT doesn't just group together
ranges of keys, it also groups by heap page number (or a small range of
page numbers, as Simon pointed out).

Regards,
Jeff Davis


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-15 19:31:24
Message-ID: 46C3548C.90100@enterprisedb.com

Dawid Kuroczko wrote:
> Some time ago I had an idea that it might be possible to compress
> the index, even if it is a unique index. Take the path example.
> My idea would be to split the indexed value into 8-byte chunks.
> For example: /var/lib/postgresql/8.2/main would be split into:
> "/var/lib"
> "/postgre"
> "sql/8.2/" -- these would be inserted into the tree as a "scaffold",
> and only vacuum should remove them.
> "main" -- this would be a leaf node. It could be repeated in non-unique
> indexes.

That general approach of storing a common leading part just once is
called prefix compression. Yeah, it helps a lot on long text fields,
tree structures like file paths in particular.

It's been discussed before. One big problem is extracting the common
leading part. You could only do it for text, but it should be done in a
datatype neutral way.
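
For text keys, a page-level version of the idea can be sketched like
this (a toy, not a proposed on-disk format):

```python
import os.path

# Store the longest common leading part of a page's keys once, and
# keep only each key's remainder in the individual entries. Works on
# byte/character strings -- which is exactly the datatype-neutrality
# problem mentioned above.

def compress_page(keys):
    prefix = os.path.commonprefix(keys)
    return prefix, [key[len(prefix):] for key in keys]

prefix, remainders = compress_page(
    ["/var/lib/postgresql/8.2/base", "/var/lib/postgresql/8.2/main"])
# prefix == "/var/lib/postgresql/8.2/", remainders == ["base", "main"]
```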

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-15 20:03:37
Message-ID: 60r6m4pnty.fsf@dba2.int.libertyrms.com

qnex42(at)gmail(dot)com ("Dawid Kuroczko") writes:
> On 8/14/07, Chris Browne <cbbrowne(at)acm(dot)org> wrote:
>> I recently had a chat with someone who was pretty intimate with Adabas
>> for a number of years who's in the process of figuring things out
>> about PostgreSQL. We poked at bits of the respective implementations,
>> seeing some similarities and differences. He pointed out one aspect
>> of index handling that could (in principle) be an interesting
>> optimization.
>>
>> Evidently, in Adabas, index leaf nodes were not simply tuples, but
>> lists where the index value would not be repeated.
>>
>> In PostgreSQL, if you have the index value 'abc', and there are 10
>> tuples with that value, then you'll have a page full of tuples of the
>> following form:
>>
>> |abc|ptr[rec1]|abc|ptr[rec2]|abc|ptr[rec3]| ...and so forth...
>>
>> Now, the Adabas approach was rather different. It would only have the
>> index value once, and then have the list of tuple pointers:
>>
>> |abc|ptr[rec1],ptr[rec2],ptr[rec3],...,ptr[rec10]|
>>
>> This could allow a fair bit of compression, for cases where the index
>> value is not unique.
>
> Interesting. Some time ago I played a little with quite a big table
> which contained a path (file path) as a primary key. It made sense to
> have that structure (SELECTs were mostly ORDER BY path WHERE path >
> '/foo' LIMIT n).
> The actual index was only a little smaller than the table (there were
> maybe 4 or 5 columns there).
>
> Some time ago I had an idea that it might be possible to compress
> the index, even if it is a unique index. Take the path example.
> My idea would be to split the indexed value into 8-byte chunks.
> For example: /var/lib/postgresql/8.2/main would be split into:
> "/var/lib"
> "/postgre"
> "sql/8.2/" -- these would be inserted into the tree as a "scaffold",
> and only vacuum should remove them.
> "main" -- this would be a leaf node. It could be repeated in non-unique
> indexes.
>
> [/etc/pas] -- scaffold-node
> |-"swd" -- leaf node
> [/etc/sha]
> |-"dow"
> [/var/lib] -- a problematic mixed scaffold/leaf node.
> [/postgre]
> |-"sql"
> |-"sql/8.2"
> [sql/8.2/]
> |-"main"
> |-"foobar"
>
> The scaffold nodes would be there to guarantee that there is some
> place to attach leaves to. They should not be removed by DELETE
> (unless we are sure no other node depends on them).

Note that there is a well-understood name for this; it is variously
known as a "Radix tree" or a "Patricia trie".

<http://en.wikipedia.org/wiki/Radix_tree>

As you observe, the tree/trie edges consist not of individual
characters, but rather of sequences of characters.

> Advantages? The repeated values (such as "/var/lib/postgresql/8.2")
> are not repeated -- they are embedded into the tree as a "scaffold",
> while the nodes that are significant (files, not directories, in my
> example) are stored as actual leaves.

Certainly as compared to a traditional trie, this representation leads
to a whole lot fewer nodes and a whole lot less branching.

The Radix/Patricia tree compresses things two ways:

- As you observe, there can be fewer, larger components

- By combining common prefixes together, this makes cases of
strings that are highly patterned much, much, much cheaper, as the
tree branches at (and only at) the places where they tend to
differ.

It could conceivably make it workable to have indexes on big, highly
repeating things such as blobs of XML. (Although it *doesn't* get you
the ability to search on substrings, which is probably what you'd also
want...)

> I guess it would reduce the size of large indexes and at the same time
> could remove the limitation that a B-tree index cannot index values
> larger than 1/3 of a database page. 8-byte chunks were given as an
> example here; perhaps a larger value would be better.
>
> (Of course redesigning the schema to put directories separate from
> files would be useful, but it would not help with ORDER BY .. LIMIT
> queries -- they would have to be JOIN-ed and re-sorted in memory,
> I'm afraid.)

I'll gleefully ignore the nature of the example, as it's kind of
beside the point. The point is to try to compress what's in the
column. If it's being abused somewhat, and has more crud in it, that
gives a potential for a *bigger* win.
--
output = reverse("gro.mca" "@" "enworbbc")
http://linuxdatabases.info/info/spiritual.html
"Documentation wants to be obsolete."
-- Bruce R. Lewis


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-15 20:54:17
Message-ID: 87bqd8y0w6.fsf@oxford.xeocode.com


"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:

> That general approach of storing a common leading part just once is
> called prefix compression. Yeah, it helps a lot on long text fields,
> tree structures like file paths in particular.

You kind of want to avoid storing both the prefix and the suffix, no?

> It's been discussed before. One big problem is extracting the common
> leading part. You could only do it for text,

Or for multi-column indexes

I could see this being especially useful if you have some columns in the index
key which are small and some that are quite large. So if you have an event
table with an index on <userid,timestamp> you wouldn't have to store lots of
timestamps on the upper level tree nodes. You would only store them for the
leaf nodes.

> but it should be done in a datatype neutral way.

I wonder if there's an analogous operation for other data types though.
Numeric could store a value relative to the parent value. Arrays could
store only the elements needed. bytea of course works just as well as text (or
better in the face of i18n).
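
The numeric case could be sketched as simple delta coding against a
per-page base value (an illustration of the idea, not a format anyone
has proposed in detail):

```python
# Keep the first key of a page in full; store the rest as (small)
# differences from it. Decoding reverses the process losslessly.

def delta_encode(keys):
    base = keys[0]
    return base, [key - base for key in keys[1:]]

def delta_decode(base, deltas):
    return [base] + [base + d for d in deltas]

base, deltas = delta_encode([1000003, 1000007, 1000019])
# base == 1000003, deltas == [4, 16]: the deltas need far fewer bytes
assert delta_decode(base, deltas) == [1000003, 1000007, 1000019]
```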

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Dawid Kuroczko <qnex42(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-16 06:48:00
Message-ID: 46C3F320.1010809@enterprisedb.com

Gregory Stark wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>
>> That general approach of storing a common leading part just once is
>> called prefix compression. Yeah, it helps a lot on long text fields,
>> tree structures like file paths in particular.
>
> You kind of want to avoid storing both the prefix and the suffix, no?

You're much more likely to find common prefixes than suffixes in an
index page, because of the ordering. I suppose compressing the suffix
would be useful in some cases as well. You might be better off with some
generic compression algorithm at that point, though.

>> It's been discussed before. One big problem is extracting the common
>> leading part. You could only do it for text,
>
> Or for multi-column indexes

Oh yeah, that you could do more easily.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-16 07:48:31
Message-ID: 873ayjyl68.fsf@oxford.xeocode.com

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:

> Gregory Stark wrote:
>> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>>
>>> That general approach of storing a common leading part just once is
>>> called prefix compression. Yeah, it helps a lot on long text fields,
>>> tree structures like file paths in particular.
>>
>> You kind of want to avoid storing both the prefix and the suffix, no?
>
> You're much more likely to find common prefixes than suffixes in an
> index page, because of the ordering. I suppose compressing the suffix
> would be useful in some cases as well. You might be better off with some
> generic compression algorithm at that point, though.

Sorry, by "suffix" I don't mean common suffixes; I mean the bits of the
key following the point which discriminates between the left and right
sides of the tree.

So for example if you're indexing a text field and have a
tree structure like:

            Redhat Fedora Core 7
           /                    \
Debian Etch (Unstable)      Ubuntu hoary

We don't really need the whole of "Redhat Fedora Core 7" in the index node. We
could actually get by with just "R". Everything before "R" is on the left and
everything after "R" is on the right.
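
That truncation step can be sketched as picking the shortest prefix of
the right-hand key that still sorts strictly after the left-hand key (a
toy for plain text keys; a real implementation would have to respect
the type's collation rules):

```python
# Inner nodes only need a separator, not a full key: the shortest
# prefix of the right key that is still greater than the left key.

def shortest_separator(left_key, right_key):
    for i in range(1, len(right_key) + 1):
        if right_key[:i] > left_key:
            return right_key[:i]
    return right_key

# "R" already sorts after "Debian Etch (Unstable)":
assert shortest_separator("Debian Etch (Unstable)",
                          "Redhat Fedora Core 7") == "R"
```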

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index Tuple Compression Approach?
Date: 2007-08-16 14:50:36
Message-ID: 608x8bpm83.fsf@dba2.int.libertyrms.com

stark(at)enterprisedb(dot)com (Gregory Stark) writes:

> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>
>> Gregory Stark wrote:
>>> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>>>
>>>> That general approach of storing a common leading part just once is
>>>> called prefix compression. Yeah, it helps a lot on long text fields,
>>>> tree structures like file paths in particular.
>>>
>>> You kind of want to avoid storing both the prefix and the suffix, no?
>>
>> You're much more likely to find common prefixes than suffixes in an
>> index page, because of the ordering. I suppose compressing the suffix
>> would be useful in some cases as well. You might be better off with some
>> generic compression algorithm at that point, though.
>
> Sorry, by "suffix" I don't mean common suffixes; I mean the bits of the
> key following the point which discriminates between the left and right
> sides of the tree.
>
> So for example if you're indexing a text field and have a
> tree structure like:
>
>             Redhat Fedora Core 7
>            /                    \
> Debian Etch (Unstable)      Ubuntu hoary
>
> We don't really need the whole of "Redhat Fedora Core 7" in the index node. We
> could actually get by with just "R". Everything before "R" is on the left and
> everything after "R" is on the right.

Right. The case where you get more characters than just "R" is when
you introduce extra entries that have the same prefix. Say, "Redhat
Fedora Core 4", "Redhat Fedora Core 5", "Redhat Fedora Core 6". And,
for good measure, let's throw in "Redhat RHAS 3", "Redhat RHAS 4", and
"Redhat RHAS 5".

In that case, you'd have substrings:
"R"
"edhat "
"Fedora Core "
"RHAS "
as discriminators.
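
Those discriminators are just the edge labels of the radix tree
mentioned earlier. Here is a recursive toy that extracts them
(illustrative only; note that a path-compressed tree keeps "Redhat " as
one merged edge rather than splitting out the bare "R" as in the
hand-written list above):

```python
import os.path
from itertools import groupby

# Edge labels of a radix tree over `keys`: group keys by first
# character, emit each group's common prefix as an edge, and recurse
# on the remainders.

def radix_edges(keys):
    edges = []
    keys = sorted(k for k in keys if k)
    for _, group in groupby(keys, key=lambda k: k[0]):
        group = list(group)
        prefix = os.path.commonprefix(group)
        edges.append(prefix)
        edges.extend(radix_edges([k[len(prefix):] for k in group]))
    return edges

edges = radix_edges(["Debian Etch (Unstable)", "Ubuntu hoary",
                     "Redhat Fedora Core 4", "Redhat Fedora Core 7",
                     "Redhat RHAS 3", "Redhat RHAS 5"])
# edges include "Redhat ", "Fedora Core ", and "RHAS "
```
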
--
"cbbrowne","@","cbbrowne.com"
http://www3.sympatico.ca/cbbrowne/spreadsheets.html
If a mute swears, does his mother wash his hands with soap?