Re: BTree index

From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: vjanand(at)uwm(dot)edu
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: BTree index
Date: 2003-11-09 21:57:28
Message-ID: 20031109215728.GA29601@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 05, 2003 at 09:08:31AM -0600, vjanand(at)uwm(dot)edu wrote:

> I am trying to find information regarding creation of B-tree index in postgres
> for variable length character data (Char/varchar type). Specifically, what
> pagination policy is used, does it use prefix BTree, or any other form of
> compression (encoding)?

I was very surprised while writing this answer:

The whole key is stored (no prefix, no pagination). If the key is too
big, it won't fit into the index and the insertion will be rejected:

regression=# create table test (a text);
CREATE TABLE
regression=# create index test_idx on test(a);
CREATE INDEX
regression=# insert into test values ('hello world');
INSERT 17115 1
regression=# insert into test select repeat(a,10) from test;
INSERT 17116 1
regression=# insert into test select repeat(a,10) from test;
INSERT 0 2
regression=# insert into test select repeat(a,10) from test;
INSERT 0 4
regression=# insert into test select repeat(a,10) from test;
INSERT 0 8
regression=# insert into test select repeat(a,10) from test;
ERROR: fila de índice requiere 12624 bytes, tamaño máximo es 8191
-- oops
regression=# set lc_messages to 'C';
SET
regression=# insert into test select repeat(a,10) from test;
ERROR: index row requires 12624 bytes, maximum size is 8191

So, what size were the tuples inserted:
regression=# select max(length(a)) from test;
max
--------
110000
(1 fila)

What!? 110000 bytes? I have always had the idea that the tuples were
uncompressed, so how can 110000 bytes be stored in 8191 bytes? After
tracking into the sourcecode, I found that in
src/backend/access/common/indextuple.c the index_formtuple routine seems
to compress the key before insertion. In src/include/tuptoaster.h there
is a symbol for activation of this feature that is set at least on my
sources (TOAST_INDEX_HACK).

So, there you are: the compression used is the same "lousy fast" LZ
algorithm used elsewhere in the TOAST code (toast_compress_datum()).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
FOO MANE PADME HUM

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Larry Rosenman 2003-11-09 22:18:43 Re: OSR5: Passes without the setsockopt() calls...
Previous Message Hannu Krosing 2003-11-09 21:56:51 Re: UPPER()/LOWER() and UTF-8