GiST seems to drop left-branch leaf tuples

Lists: pgsql-hackers
From: Peter Tanski <ptanski(at)raditaz(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: GiST seems to drop left-branch leaf tuples
Date: 2010-11-22 21:18:36
Message-ID: 5D31F5EB-A473-4176-B4BD-77DF184409FD@raditaz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I have been working on a plugin for GiST that has some unusual features:

* The data type for both Node and Leaf keys is large (typically 4222
bytes on 32-bit; 4230 bytes on 64-bit).

* Due to the large size the storage class is EXTENDED (main would only
degrade to EXTENDED in any case). Tests using EXTERNAL show the same
results so I do not believe compression is an issue.

* This is a hash-type index: the values are large hashes for an audio
fingerprinting application and the sole relationship between keys is a
float match probability between two values. For example, if A matches
B with a score of 0.18 but A matches C with a score of 0.61, A is
closer to C than to B. The distance metric is calculated in reverse
(1.0 - match_score) so the penalty for inserting A under the same Node
as B would be 0.82 (scaled by 1000 to 820).

* The Leaf Keys contain the same data as the rows themselves.

* The Node (union) Keys are:

- a bitwise OR of the lower-level Leaf Keys

- a penalty or consistency comparison of Leaf or Query (L) value
against a Node union-key (N) is effectively a scaled hamming distance
of:
L ^ (L & N)
so if L is under N the check will always return 1.0 (true for
consistency; 0.0 for penalty);
by the same operation, newly inserted values compare closer to
Node (union) keys where the corresponding Leaf keys are closer

- Comparisons between two Nodes, N1 and N2 (used in Same(), for
example) have used a:
-- Tanimoto bit distance popcount(N1 & N2) / popcount(N1 | N2);
-- the same scaled-hamming distance check used for a Leaf against
another Leaf;
-- simple memcmp for identity.

Whatever test I use for Same(), Penalty() and Consistent() does not
seem to affect the problem significantly. For now I am only using
Consistent() as a check for retrieval.

I have developed this under debug source-builds of postgresql 8.4.5
and 9.0.1 on Mac OS X 10.6 (Snow Leopard, x86 64-bit) and Ubuntu Linux
10.04 (Lucid, x86 64-bit and 32-bit). There is no difference between
the platforms or architectures. I am using the standard PostgreSQL
Makefile build setup so compiler flags are the same as used by
PostgreSQL.

The problem may be my own understanding of Picksplit(), Penalty() and
Same(). Using gevel, on a table with 500 newly-inserted rows:

postgres=# \d fps2
Table "public.fps2"
Column | Type | Modifiers
-------------+---------------+-----------
soid | character(24) | not null
fingerprint | fprint | not null
Indexes:
"fps2_fingerprint_ix" gist (fingerprint)

postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
-----------------------------------------
Number of levels: 4 +
Number of pages: 61 +
Number of leaf pages: 42 +
Number of tuples: 193 +
Number of invalid tuples: 0 +
Number of leaf tuples: 133 +
Total size of tuples: 271704 bytes+
Total size of leaf tuples: 187236 bytes+
Total size of index: 499712 bytes+

Note that there are only 133 leaf tuples -- for 500 rows. If the
index process were operating correctly, there should have been 500
leaf tuples there. If I REINDEX the table the number of leaf tuples
may change slightly but not by much. This closely corresponds to a
query:

postgres=# select count(*) from fps2 f1 join fps2 f2 on f1.fingerprint
= f2.fingerprint;
count
-------
133

where = is a match operator that returns rows where the Leaf key
comparison is > .98 (on the scaled probability score pretty much
exactly equal). The above query is using the index:

postgres=# explain select count(*) from fps2 f1 join fps2 f2 on
f1.fingerprint ~= f2.fingerprint;
QUERY PLAN
------------------------------------------------------------------------------------------------
Aggregate (cost=1173.67..1173.68 rows=1 width=0)
-> Nested Loop (cost=0.00..1170.54 rows=1250 width=0)
-> Seq Scan on fps2 f1 (cost=0.00..105.00 rows=500 width=32)
-> Index Scan using fps2_fingerprint_ix on fps2 f2
(cost=0.00..2.11 rows=2 width=32)
Index Cond: (f1.fingerprint ~= f2.fingerprint)

If I use a table scan instead of the index, the query would return:

postgres=# select count(*) from fps2 f1 join fps2 f2 on
fprint_cmp(f1.fingerprint, f2.fingerprint) > .98;
count
-------
500

The GiST tree looks like:

postgres=# select gist_tree('fps2_fingerprint_ix');
gist_tree
----------------------------------------------------------------------------------------------
0(l:0) blk: 0 numTuple: 4 free: 2532b(68.97%) rightlink:4294967295
(InvalidBlockNumber) +
1(l:1) blk: 37 numTuple: 2 free: 5340b(34.56%) rightlink:105
(OK) +
1(l:2) blk: 90 numTuple: 3 free: 3936b(51.76%) rightlink:109
(OK) +
1(l:3) blk: 9 numTuple: 5 free: 1128b(86.18%) rightlink:
94 (OK) +
2(l:3) blk: 114 numTuple: 5 free: 1128b(86.18%)
rightlink:89 (OK) +
3(l:3) blk: 106 numTuple: 5 free: 1128b(86.18%)
rightlink:114 (OK) +
2(l:2) blk: 109 numTuple: 3 free: 3936b(51.76%) rightlink:24
(OK) +
1(l:3) blk: 55 numTuple: 5 free: 1128b(86.18%) rightlink:
45 (OK) +
2(l:3) blk: 94 numTuple: 1 free: 6744b(17.35%) rightlink:
121 (OK) +
3(l:3) blk: 121 numTuple: 5 free: 1128b(86.18%)
rightlink:108 (OK) +
2(l:1) blk: 38 numTuple: 4 free: 2532b(68.97%) rightlink:77
(OK) +
1(l:2) blk: 84 numTuple: 1 free: 6744b(17.35%) rightlink:120
(OK) +
1(l:3) blk: 88 numTuple: 4 free: 2532b(68.97%) rightlink:
110 (OK) +
2(l:2) blk: 120 numTuple: 3 free: 3936b(51.76%) rightlink:42
(OK) +
1(l:3) blk: 5 numTuple: 3 free: 3936b(51.76%) rightlink:
88 (OK) +
2(l:3) blk: 110 numTuple: 4 free: 2532b(68.97%)
rightlink:71 (OK) +
3(l:3) blk: 72 numTuple: 4 free: 2532b(68.97%) rightlink:
119 (OK) +
3(l:2) blk: 96 numTuple: 5 free: 1128b(86.18%) rightlink:84
(OK) +
1(l:3) blk: 93 numTuple: 4 free: 2532b(68.97%) rightlink:
95 (OK) +
2(l:3) blk: 87 numTuple: 1 free: 6744b(17.35%) rightlink:
99 (OK) +
3(l:3) blk: 102 numTuple: 3 free: 3936b(51.76%)
rightlink:93 (OK) +
4(l:3) blk: 91 numTuple: 3 free: 3936b(51.76%) rightlink:
102 (OK) +
5(l:3) blk: 99 numTuple: 5 free: 1128b(86.18%) rightlink:
91 (OK) +
4(l:2) blk: 12 numTuple: 3 free: 3936b(51.76%) rightlink:96
(OK) +
1(l:3) blk: 85 numTuple: 1 free: 6744b(17.35%) rightlink:
98 (OK) +
2(l:3) blk: 98 numTuple: 2 free: 5340b(34.56%) rightlink:
125 (OK) +
3(l:3) blk: 125 numTuple: 3 free: 3936b(51.76%)
rightlink:87 (OK) +
3(l:1) blk: 105 numTuple: 3 free: 3936b(51.76%) rightlink:38
(OK) +
1(l:2) blk: 8 numTuple: 2 free: 5340b(34.56%) rightlink:70
(OK) +
1(l:3) blk: 66 numTuple: 1 free: 6744b(17.35%) rightlink:
97 (OK) +
2(l:3) blk: 97 numTuple: 5 free: 1128b(86.18%) rightlink:
67 (OK) +
2(l:2) blk: 70 numTuple: 3 free: 3936b(51.76%) rightlink:104
(OK) +
1(l:3) blk: 59 numTuple: 1 free: 6744b(17.35%) rightlink:
115 (OK) +
2(l:3) blk: 115 numTuple: 1 free: 6744b(17.35%)
rightlink:116 (OK) +
3(l:3) blk: 116 numTuple: 4 free: 2532b(68.97%)
rightlink:64 (OK) +
3(l:2) blk: 46 numTuple: 5 free: 1128b(86.18%) rightlink:90
(OK) +
1(l:3) blk: 65 numTuple: 5 free: 1128b(86.18%) rightlink:
107 (OK) +
2(l:3) blk: 112 numTuple: 4 free: 2532b(68.97%)
rightlink:113 (OK) +
3(l:3) blk: 107 numTuple: 5 free: 1128b(86.18%)
rightlink:112 (OK) +
4(l:3) blk: 113 numTuple: 1 free: 6744b(17.35%)
rightlink:126 (OK) +
5(l:3) blk: 126 numTuple: 4 free: 2532b(68.97%)
rightlink:57 (OK) +
4(l:1) blk: 77 numTuple: 5 free: 1128b(86.18%) rightlink:
4294967295 (InvalidBlockNumber)+
1(l:2) blk: 53 numTuple: 5 free: 1128b(86.18%) rightlink:76
(OK) +
1(l:3) blk: 81 numTuple: 2 free: 5340b(34.56%) rightlink:
82 (OK) +
2(l:3) blk: 92 numTuple: 4 free: 2532b(68.97%) rightlink:
81 (OK) +
3(l:3) blk: 82 numTuple: 1 free: 6744b(17.35%) rightlink:
100 (OK) +
4(l:3) blk: 100 numTuple: 4 free: 2532b(68.97%)
rightlink:74 (OK) +
5(l:3) blk: 73 numTuple: 3 free: 3936b(51.76%) rightlink:
92 (OK) +
2(l:2) blk: 24 numTuple: 1 free: 6744b(17.35%) rightlink:118
(OK) +
1(l:3) blk: 39 numTuple: 4 free: 2532b(68.97%) rightlink:
47 (OK) +
3(l:2) blk: 118 numTuple: 4 free: 2532b(68.97%) rightlink:36
(OK) +
1(l:3) blk: 34 numTuple: 4 free: 2532b(68.97%) rightlink:
39 (OK) +
2(l:3) blk: 47 numTuple: 1 free: 6744b(17.35%) rightlink:
111 (OK) +
3(l:3) blk: 111 numTuple: 1 free: 6744b(17.35%)
rightlink:122 (OK) +
4(l:3) blk: 122 numTuple: 4 free: 2532b(68.97%)
rightlink:117 (OK) +
4(l:2) blk: 7 numTuple: 1 free: 6744b(17.35%) rightlink:124
(OK) +
1(l:3) blk: 54 numTuple: 1 free: 6744b(17.35%) rightlink:
86 (OK) +
5(l:2) blk: 124 numTuple: 3 free: 3936b(51.76%) rightlink:53
(OK) +
1(l:3) blk: 80 numTuple: 4 free: 2532b(68.97%) rightlink:
54 (OK) +
2(l:3) blk: 79 numTuple: 2 free: 5340b(34.56%) rightlink:
80 (OK) +
3(l:3) blk: 86 numTuple: 4 free: 2532b(68.97%) rightlink:
123 (OK) +

There are no memory errors in the code and everything seems to work
correctly but GiST seems to discard Leaf tuples. In implementing
Picksplit() I have followed the code in the contrib/intarray/
_int_gist.c, contrib/hstore/hstore_gist.c modules and used Guttman's
poly-time split algorithm. Note that a raw split (just split the page
in half) does not seem to make a difference in the number of Leaf keys
that are discarded. The assignments are correct, such that on return
from Picksplit() the GIST_SPLITVEC *v, contains something like:

v->spl_left = {1, 2, 3}
v->spl_nleft = 3
v->spl_ldatum = (Datum)union_of_1_2_3

v->spl_right = {4,5}
v->spl_nright = 2
v->spl_rdatum = (Datum)union_of_4_5

Since the default page size is 4, splits are generally when inserting
a 5th element and a split is 3/2, left or right. I used a simple
example above but when using Guttman's poly-time split algorithm the
output might be:

v->spl_left = {5,3}
v->spl_right = {2,4,1}

As I understand it, the union process for the parent Node key should
"contain" each Leaf or Node key under it. The bitwise OR and
comparison by (L ^ (L & N)) shows that is the case. By the same
token, Same() on two Leaf keys should never return true (indeed,
changing Same() to always return false makes no difference in testing).

I stepped through code from Picksplit() through gistplacetopage() in
gcc but could not detect any errors or where the Leaf tuples may be
discarded. If you have any ideas or would like test code, please let
me know.

Cheers,
Pete Tanski


From: Peter Tanski <ptanski(at)raditaz(dot)com>
To:
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-22 21:26:05
Message-ID: 2498A05C-E766-4E08-83BE-1A0987E64C80@raditaz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

One minor correction:

postgres=# explain select count(*) from fps2 f1 join fps2 f2 on
f1.fingerprint = f2.fingerprint;
QUERY PLAN
------------------------------------------------------------------------------------------------
Aggregate (cost=1173.67..1173.68 rows=1 width=0)
-> Nested Loop (cost=0.00..1170.54 rows=1250 width=0)
-> Seq Scan on fps2 f1 (cost=0.00..105.00 rows=500 width=32)
-> Index Scan using fps2_fingerprint_ix on fps2 f2
(cost=0.00..2.11 rows=2 width=32)
Index Cond: (f1.fingerprint = f2.fingerprint)

(The previous query example used the ~= operator which was defined to
match at > .5 but in this case there no matches in the table so ~= is
the same as =.)

On Nov 22, 2010, at 4:18 PM, Peter Tanski wrote:

> I have been working on a plugin for GiST that has some unusual
> features:
>
> * The data type for both Node and Leaf keys is large (typically 4222
> bytes on 32-bit; 4230 bytes on 64-bit).
>
> * Due to the large size the storage class is EXTENDED (main would
> only degrade to EXTENDED in any case). Tests using EXTERNAL show
> the same results so I do not believe compression is an issue.
>
> * This is a hash-type index: the values are large hashes for an
> audio fingerprinting application and the sole relationship between
> keys is a float match probability between two values. For example,
> if A matches B with a score of 0.18 but A matches C with a score of
> 0.61, A is closer to C than to B. The distance metric is calculated
> in reverse (1.0 - match_score) so the penalty for inserting A under
> the same Node as B would be 0.82 (scaled by 1000 to 820).
>
> * The Leaf Keys contain the same data as the rows themselves.
>
> * The Node (union) Keys are:
>
> - a bitwise OR of the lower-level Leaf Keys
>
> - a penalty or consistency comparison of Leaf or Query (L) value
> against a Node union-key (N) is effectively a scaled hamming
> distance of:
> L ^ (L & N)
> so if L is under N the check will always return 1.0 (true for
> consistency; 0.0 for penalty);
> by the same operation, newly inserted values compare closer to
> Node (union) keys where the corresponding Leaf keys are closer
>
> - Comparisons between two Nodes, N1 and N2 (used in Same(), for
> example) have used a:
> -- Tanimoto bit distance popcount(N1 & N2) / popcount(N1 | N2);
> -- the same scaled-hamming distance check used for a Leaf against
> another Leaf;
> -- simple memcmp for identity.
>
> Whatever test I use for Same(), Penalty() and Consistent() does not
> seem to affect the problem significantly. For now I am only using
> Consistent() as a check for retrieval.
>
> I have developed this under debug source-builds of postgresql 8.4.5
> and 9.0.1 on Mac OS X 10.6 (Snow Leopard, x86 64-bit) and Ubuntu
> Linux 10.04 (Lucid, x86 64-bit and 32-bit). There is no difference
> between the platforms or architectures. I am using the standard
> PostgreSQL Makefile build setup so compiler flags are the same as
> used by PostgreSQL.
>
> The problem may be my own understanding of Picksplit(), Penalty()
> and Same(). Using gevel, on a table with 500 newly-inserted rows:
>
> postgres=# \d fps2
> Table "public.fps2"
> Column | Type | Modifiers
> -------------+---------------+-----------
> soid | character(24) | not null
> fingerprint | fprint | not null
> Indexes:
> "fps2_fingerprint_ix" gist (fingerprint)
>
> postgres=# select gist_stat('fps2_fingerprint_ix');
> gist_stat
> -----------------------------------------
> Number of levels: 4 +
> Number of pages: 61 +
> Number of leaf pages: 42 +
> Number of tuples: 193 +
> Number of invalid tuples: 0 +
> Number of leaf tuples: 133 +
> Total size of tuples: 271704 bytes+
> Total size of leaf tuples: 187236 bytes+
> Total size of index: 499712 bytes+
>
>
> Note that there are only 133 leaf tuples -- for 500 rows. If the
> index process were operating correctly, there should have been 500
> leaf tuples there. If I REINDEX the table the number of leaf tuples
> may change slightly but not by much. This closely corresponds to a
> query:
>
> postgres=# select count(*) from fps2 f1 join fps2 f2 on
> f1.fingerprint = f2.fingerprint;
> count
> -------
> 133
>
> where = is a match operator that returns rows where the Leaf key
> comparison is > .98 (on the scaled probability score pretty much
> exactly equal). The above query is using the index:
>
> postgres=# explain select count(*) from fps2 f1 join fps2 f2 on
> f1.fingerprint ~= f2.fingerprint;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------
> Aggregate (cost=1173.67..1173.68 rows=1 width=0)
> -> Nested Loop (cost=0.00..1170.54 rows=1250 width=0)
> -> Seq Scan on fps2 f1 (cost=0.00..105.00 rows=500 width=32)
> -> Index Scan using fps2_fingerprint_ix on fps2 f2
> (cost=0.00..2.11 rows=2 width=32)
> Index Cond: (f1.fingerprint ~= f2.fingerprint)
>
>
> If I use a table scan instead of the index, the query would return:
>
> postgres=# select count(*) from fps2 f1 join fps2 f2 on
> fprint_cmp(f1.fingerprint, f2.fingerprint) > .98;
> count
> -------
> 500
>
>
> The GiST tree looks like:
>
> postgres=# select gist_tree('fps2_fingerprint_ix');
> gist_tree
> ----------------------------------------------------------------------------------------------
> 0(l:0) blk: 0 numTuple: 4 free: 2532b(68.97%) rightlink:4294967295
> (InvalidBlockNumber) +
> 1(l:1) blk: 37 numTuple: 2 free: 5340b(34.56%) rightlink:105
> (OK) +
> 1(l:2) blk: 90 numTuple: 3 free: 3936b(51.76%) rightlink:109
> (OK) +
> 1(l:3) blk: 9 numTuple: 5 free: 1128b(86.18%) rightlink:
> 94 (OK) +
> 2(l:3) blk: 114 numTuple: 5 free: 1128b(86.18%)
> rightlink:89 (OK) +
> 3(l:3) blk: 106 numTuple: 5 free: 1128b(86.18%)
> rightlink:114 (OK) +
> 2(l:2) blk: 109 numTuple: 3 free: 3936b(51.76%) rightlink:24
> (OK) +
> 1(l:3) blk: 55 numTuple: 5 free: 1128b(86.18%) rightlink:
> 45 (OK) +
> 2(l:3) blk: 94 numTuple: 1 free: 6744b(17.35%) rightlink:
> 121 (OK) +
> 3(l:3) blk: 121 numTuple: 5 free: 1128b(86.18%)
> rightlink:108 (OK) +
> 2(l:1) blk: 38 numTuple: 4 free: 2532b(68.97%) rightlink:77
> (OK) +
> 1(l:2) blk: 84 numTuple: 1 free: 6744b(17.35%) rightlink:120
> (OK) +
> 1(l:3) blk: 88 numTuple: 4 free: 2532b(68.97%) rightlink:
> 110 (OK) +
> 2(l:2) blk: 120 numTuple: 3 free: 3936b(51.76%) rightlink:42
> (OK) +
> 1(l:3) blk: 5 numTuple: 3 free: 3936b(51.76%) rightlink:
> 88 (OK) +
> 2(l:3) blk: 110 numTuple: 4 free: 2532b(68.97%)
> rightlink:71 (OK) +
> 3(l:3) blk: 72 numTuple: 4 free: 2532b(68.97%) rightlink:
> 119 (OK) +
> 3(l:2) blk: 96 numTuple: 5 free: 1128b(86.18%) rightlink:84
> (OK) +
> 1(l:3) blk: 93 numTuple: 4 free: 2532b(68.97%) rightlink:
> 95 (OK) +
> 2(l:3) blk: 87 numTuple: 1 free: 6744b(17.35%) rightlink:
> 99 (OK) +
> 3(l:3) blk: 102 numTuple: 3 free: 3936b(51.76%)
> rightlink:93 (OK) +
> 4(l:3) blk: 91 numTuple: 3 free: 3936b(51.76%) rightlink:
> 102 (OK) +
> 5(l:3) blk: 99 numTuple: 5 free: 1128b(86.18%) rightlink:
> 91 (OK) +
> 4(l:2) blk: 12 numTuple: 3 free: 3936b(51.76%) rightlink:96
> (OK) +
> 1(l:3) blk: 85 numTuple: 1 free: 6744b(17.35%) rightlink:
> 98 (OK) +
> 2(l:3) blk: 98 numTuple: 2 free: 5340b(34.56%) rightlink:
> 125 (OK) +
> 3(l:3) blk: 125 numTuple: 3 free: 3936b(51.76%)
> rightlink:87 (OK) +
> 3(l:1) blk: 105 numTuple: 3 free: 3936b(51.76%) rightlink:38
> (OK) +
> 1(l:2) blk: 8 numTuple: 2 free: 5340b(34.56%) rightlink:70
> (OK) +
> 1(l:3) blk: 66 numTuple: 1 free: 6744b(17.35%) rightlink:
> 97 (OK) +
> 2(l:3) blk: 97 numTuple: 5 free: 1128b(86.18%) rightlink:
> 67 (OK) +
> 2(l:2) blk: 70 numTuple: 3 free: 3936b(51.76%) rightlink:104
> (OK) +
> 1(l:3) blk: 59 numTuple: 1 free: 6744b(17.35%) rightlink:
> 115 (OK) +
> 2(l:3) blk: 115 numTuple: 1 free: 6744b(17.35%)
> rightlink:116 (OK) +
> 3(l:3) blk: 116 numTuple: 4 free: 2532b(68.97%)
> rightlink:64 (OK) +
> 3(l:2) blk: 46 numTuple: 5 free: 1128b(86.18%) rightlink:90
> (OK) +
> 1(l:3) blk: 65 numTuple: 5 free: 1128b(86.18%) rightlink:
> 107 (OK) +
> 2(l:3) blk: 112 numTuple: 4 free: 2532b(68.97%)
> rightlink:113 (OK) +
> 3(l:3) blk: 107 numTuple: 5 free: 1128b(86.18%)
> rightlink:112 (OK) +
> 4(l:3) blk: 113 numTuple: 1 free: 6744b(17.35%)
> rightlink:126 (OK) +
> 5(l:3) blk: 126 numTuple: 4 free: 2532b(68.97%)
> rightlink:57 (OK) +
> 4(l:1) blk: 77 numTuple: 5 free: 1128b(86.18%) rightlink:
> 4294967295 (InvalidBlockNumber)+
> 1(l:2) blk: 53 numTuple: 5 free: 1128b(86.18%) rightlink:76
> (OK) +
> 1(l:3) blk: 81 numTuple: 2 free: 5340b(34.56%) rightlink:
> 82 (OK) +
> 2(l:3) blk: 92 numTuple: 4 free: 2532b(68.97%) rightlink:
> 81 (OK) +
> 3(l:3) blk: 82 numTuple: 1 free: 6744b(17.35%) rightlink:
> 100 (OK) +
> 4(l:3) blk: 100 numTuple: 4 free: 2532b(68.97%)
> rightlink:74 (OK) +
> 5(l:3) blk: 73 numTuple: 3 free: 3936b(51.76%) rightlink:
> 92 (OK) +
> 2(l:2) blk: 24 numTuple: 1 free: 6744b(17.35%) rightlink:118
> (OK) +
> 1(l:3) blk: 39 numTuple: 4 free: 2532b(68.97%) rightlink:
> 47 (OK) +
> 3(l:2) blk: 118 numTuple: 4 free: 2532b(68.97%) rightlink:36
> (OK) +
> 1(l:3) blk: 34 numTuple: 4 free: 2532b(68.97%) rightlink:
> 39 (OK) +
> 2(l:3) blk: 47 numTuple: 1 free: 6744b(17.35%) rightlink:
> 111 (OK) +
> 3(l:3) blk: 111 numTuple: 1 free: 6744b(17.35%)
> rightlink:122 (OK) +
> 4(l:3) blk: 122 numTuple: 4 free: 2532b(68.97%)
> rightlink:117 (OK) +
> 4(l:2) blk: 7 numTuple: 1 free: 6744b(17.35%) rightlink:124
> (OK) +
> 1(l:3) blk: 54 numTuple: 1 free: 6744b(17.35%) rightlink:
> 86 (OK) +
> 5(l:2) blk: 124 numTuple: 3 free: 3936b(51.76%) rightlink:53
> (OK) +
> 1(l:3) blk: 80 numTuple: 4 free: 2532b(68.97%) rightlink:
> 54 (OK) +
> 2(l:3) blk: 79 numTuple: 2 free: 5340b(34.56%) rightlink:
> 80 (OK) +
> 3(l:3) blk: 86 numTuple: 4 free: 2532b(68.97%) rightlink:
> 123 (OK) +
>
> There are no memory errors in the code and everything seems to work
> correctly but GiST seems to discard Leaf tuples. In implementing
> Picksplit() I have followed the code in the contrib/intarray/
> _int_gist.c, contrib/hstore/hstore_gist.c modules and used Guttman's
> poly-time split algorithm. Note that a raw split (just split the
> page in half) does not seem to make a difference in the number of
> Leaf keys that are discarded. The assignments are correct, such
> that on return from Picksplit() the GIST_SPLITVEC *v, contains
> something like:
>
> v->spl_left = {1, 2, 3}
> v->spl_nleft = 3
> v->spl_ldatum = (Datum)union_of_1_2_3
>
> v->spl_right = {4,5}
> v->spl_nright = 2
> v->spl_rdatum = (Datum)union_of_4_5
>
> Since the default page size is 4, splits are generally when
> inserting a 5th element and a split is 3/2, left or right. I used a
> simple example above but when using Guttman's poly-time split
> algorithm the output might be:
>
> v->spl_left = {5,3}
> v->spl_right = {2,4,1}
>
> As I understand it, the union process for the parent Node key should
> "contain" each Leaf or Node key under it. The bitwise OR and
> comparison by (L ^ (L & N)) shows that is the case. By the same
> token, Same() on two Leaf keys should never return true (indeed,
> changing Same() to always return false makes no difference in
> testing).
>
> I stepped through code from Picksplit() through gistplacetopage() in
> gcc but could not detect any errors or where the Leaf tuples may be
> discarded. If you have any ideas or would like test code, please
> let me know.
>
> Cheers,
> Pete Tanski
>
>


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Peter Tanski <ptanski(at)raditaz(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-23 07:29:14
Message-ID: 4CEB6D4A.2090201@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22.11.2010 23:18, Peter Tanski wrote:
> Whatever test I use for Same(), Penalty() and Consistent() does not seem
> to affect the problem significantly. For now I am only using
> Consistent() as a check for retrieval.

I believe it's not possible to lose leaf tuples with incorrectly defined
gist support functions. You might get completely bogus results, but the
tuples should be there when you look at gist_tree() output. So this
sounds like a gist bug to me.

> Note that there are only 133 leaf tuples -- for 500 rows. If the index
> process were operating correctly, there should have been 500 leaf tuples
> there. If I REINDEX the table the number of leaf tuples may change
> slightly but not by much.

One idea for debugging is to insert the rows to the table one by one,
and run the query after each insertion. When do the leaf tuples disappear?

If you can put together a small self-contained test case and post it to
the list, I can take a look.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Peter Tanski <ptanski(at)raditaz(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-23 15:00:52
Message-ID: EE47BAA8-76F3-46F9-ABA5-703BCF9C9061@raditaz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for the advice. I ran a row-by-row test, including debug output. I'll put a test case together as well but I believe I have narrowed down the problem somewhat. The first split occurrs when the 6th row is inserted and there are 6 calls to Compress(), however picksplit only receives 4 of those 6 tuples and the other two are dropped.

postgres=# \i xaa
psql:xaa:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 1 +
Number of invalid tuples: 0 +
Number of leaf tuples: 1 +
Total size of tuples: 1416 bytes+
Total size of leaf tuples: 1416 bytes+
Total size of index: 8192 bytes+

postgres=# \i xab
psql:xab:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 2 +
Number of invalid tuples: 0 +
Number of leaf tuples: 2 +
Total size of tuples: 2820 bytes+
Total size of leaf tuples: 2820 bytes+
Total size of index: 8192 bytes+

postgres=# \i xac
psql:xac:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 3 +
Number of invalid tuples: 0 +
Number of leaf tuples: 3 +
Total size of tuples: 4224 bytes+
Total size of leaf tuples: 4224 bytes+
Total size of index: 8192 bytes+

postgres=# \i xad
psql:xad:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 4 +
Number of invalid tuples: 0 +
Number of leaf tuples: 4 +
Total size of tuples: 5628 bytes+
Total size of leaf tuples: 5628 bytes+
Total size of index: 8192 bytes+

postgres=# \i xae
psql:xae:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 5 +
Number of invalid tuples: 0 +
Number of leaf tuples: 5 +
Total size of tuples: 7032 bytes+
Total size of leaf tuples: 7032 bytes+
Total size of index: 8192 bytes+

postgres=# \i xaf
psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_picksplit:660] entered picksplit
psql:xaf:1: NOTICE: [pgfprint.c:fprint_picksplit:812] split: 2 left, 2 right
psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
----------------------------------------
Number of levels: 2 +
Number of pages: 3 +
Number of leaf pages: 2 +
Number of tuples: 6 +
Number of invalid tuples: 0 +
Number of leaf tuples: 4 +
Total size of tuples: 8460 bytes +
Total size of leaf tuples: 5640 bytes +
Total size of index: 24576 bytes+

postgres=#

There are checks inside the Picksplit() function for the number of entries:

OffsetNumber maxoff = entryvec->n - 1;
int n_entries, j;
n_entries = Max(maxoff, 1) - 1;

j = 0;
for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
FPrint* v = deserialize_fprint(entv[i].key);
if (!GIST_LEAF(&entv[i])) {
leaf_split = false;
}
if (v == NULL) {
elog(ERROR, "entry %d is invalid", i);
}
raw_vec[j] = v;
vec_ixs[j++] = i;
}
if (n_entries > j) {
elog(WARNING, "[%s:%s:%d]: " SIZE_T_FMT " bad entries",
__FILE__, __func__, __LINE__, n_entries - j);
n_entries = j;
} else if (n_entries < j) {
elog(ERROR, "skipping %d entries", j-n_entries);
}

So I know the number of entries sent to Picksplit() is 4, for 6 calls to decompress.

Note that Decompress() returns the input unchanged and entries are untoasted in the deserialize_fprint() function, which malloc's each value:

Datum fprint_decompress(PG_FUNCTION_ARGS) {
GISTENTRY* entry = (GISTENTRY*)PG_GETARG_POINTER(0);

FPDEBUG("entered decompress");

if (!entry) {
elog(ERROR, "fprint_decompress: entry is NULL");
}

// cut out here -- we handle the memory
PG_RETURN_POINTER(entry);
}

I'll put together a test case and send that on.

On Nov 23, 2010, at 2:29 AM, Heikki Linnakangas wrote:

> On 22.11.2010 23:18, Peter Tanski wrote:
>> Whatever test I use for Same(), Penalty() and Consistent() does not seem
>> to affect the problem significantly. For now I am only using
>> Consistent() as a check for retrieval.
>
> I believe it's not possible to lose leaf tuples with incorrectly defined gist support functions. You might get completely bogus results, but the tuples should be there when you look at gist_tree() output. So this sounds like a gist bug to me.
>
>> Note that there are only 133 leaf tuples -- for 500 rows. If the index
>> process were operating correctly, there should have been 500 leaf tuples
>> there. If I REINDEX the table the number of leaf tuples may change
>> slightly but not by much.
>
> One idea for debugging is to insert the rows to the table one by one, and run the query after each insertion. When do the leaf tuples disappear?
>
> If you can put together a small self-contained test case and post it to the list, I can take a look.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Peter Tanski <ptanski(at)raditaz(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-23 15:22:59
Message-ID: 1290525717-sup-8003@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Peter Tanski's message of mar nov 23 12:00:52 -0300 2010:

> There are checks inside the Picksplit() function for the number of entries:
>
> OffsetNumber maxoff = entryvec->n - 1;
> int n_entries, j;
> n_entries = Max(maxoff, 1) - 1;
>
> j = 0;
> for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
> FPrint* v = deserialize_fprint(entv[i].key);

Isn't this off by one? Offset numbers are 1-based, so the maxoff
computation is wrong.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Peter Tanski <ptanski(at)raditaz(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-23 16:39:33
Message-ID: 0B1250AB-55F8-4475-9A0B-982B567A923B@raditaz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Picksplit() seems to be an exceptional case here: the first and last
numbers in entryvec are invalid so

entryvec->vector[entryvec->n - 1]

is invalid. All the other GiST code Picksplit() functions use the
same convention. For example, see the btree_gist picksplit function, at
http://doxygen.postgresql.org/btree__utils__num_8c-source.html#l00241

OffsetNumber i,
maxoff = entryvec->n - 1;

On Nov 23, 2010, at 10:22 AM, Alvaro Herrera wrote:

> Excerpts from Peter Tanski's message of mar nov 23 12:00:52 -0300
> 2010:
>
>> There are checks inside the Picksplit() function for the number of
>> entries:
>>
>> OffsetNumber maxoff = entryvec->n - 1;
>> int n_entries, j;
>> n_entries = Max(maxoff, 1) - 1;
>>
>> j = 0;
>> for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
>> FPrint* v = deserialize_fprint(entv[i].key);
>
> Isn't this off by one? Offset numbers are 1-based, so the maxoff
> computation is wrong.
>
> --
> Álvaro Herrera <alvherre(at)commandprompt(dot)com>
> The PostgreSQL Company - Command Prompt, Inc.
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Peter Tanski <ptanski(at)raditaz(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-23 16:44:39
Message-ID: 4C90DBBA-45E8-463F-851D-587F0D96BD89@raditaz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I should correct what I just wrote: the first and last entries in
entryvec->vector are invalid.

On Nov 23, 2010, at 11:39 AM, Peter Tanski wrote:

> Picksplit() seems to be an exceptional case here: the first and last
> numbers in entryvec are invalid so
>
> entryvec->vector[entryvec->n - 1]
>
> is invalid. All the other GiST code Picksplit() functions use the
> same convention. For example, see the btree_gist picksplit
> function, at
> http://doxygen.postgresql.org/btree__utils__num_8c-source.html#l00241
>
> OffsetNumber i,
> maxoff = entryvec->n - 1;
>
>
> On Nov 23, 2010, at 10:22 AM, Alvaro Herrera wrote:
>
>> Excerpts from Peter Tanski's message of mar nov 23 12:00:52 -0300
>> 2010:
>>
>>> There are checks inside the Picksplit() function for the number of
>>> entries:
>>>
>>> OffsetNumber maxoff = entryvec->n - 1;
>>> int n_entries, j;
>>> n_entries = Max(maxoff, 1) - 1;
>>>
>>> j = 0;
>>> for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
>>> FPrint* v = deserialize_fprint(entv[i].key);
>>
>> Isn't this off by one? Offset numbers are 1-based, so the maxoff
>> computation is wrong.
>>
>> --
>> Álvaro Herrera <alvherre(at)commandprompt(dot)com>
>> The PostgreSQL Company - Command Prompt, Inc.
>> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>


From: Peter Tanski <ptanski(at)raditaz(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-23 19:54:11
Message-ID: 6A6EDF2E-770F-4A15-84F8-4A4B2D04C15E@raditaz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote:
>>>> j = 0;
>>>> for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
>>>> FPrint* v = deserialize_fprint(entv[i].key);
>>>
>>> Isn't this off by one? Offset numbers are 1-based, so the maxoff
>>> computation is wrong.
> The first for loop of all others compare with i <= maxoff instead of i < maxoff.

You are right: I am missing the last one, there. (During a memory-debugging phase entv[entryvec-n - 1] was always invalid, probably as a memory overwrite error but I fixed that later and never changed it back.)

On the other hand, there are two problems:

1. the maximum size on a GiST page is 4240 bytes, so I cannot add a full-size Datum using this kind of hash-key setup (the base Datum size is 4230 bytes on a 64-bit machine). The example test cases I used were smaller in order to get around that issue: they are 2326 bytes base size.

2. Even after fixing the Picksplit() loop, the dropped-leaf problem still manifests itself:

postgres=# set enable_seqscan=false;
SET
postgres=# set enable_indexscan=true;
SET
postgres=# create table fps2 (id serial, soid character(24) not null, fingerprint fprint not null);
NOTICE: CREATE TABLE will create implicit sequence "fps2_id_seq" for serial column "fps2.id"
CREATE TABLE
postgres=# create index fps2_fingerprint_ix on fps2 using gist (fingerprint fprint_gist_ops);
CREATE INDEX
postgres=# \i xaa
psql:xaa:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# \i xab
psql:xab:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# \i xac
psql:xac:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# \i xad
psql:xad:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 4 +
Number of invalid tuples: 0 +
Number of leaf tuples: 4 +
Total size of tuples: 5628 bytes+
Total size of leaf tuples: 5628 bytes+
Total size of index: 8192 bytes+

postgres=# \i xae
psql:xae:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 5 +
Number of invalid tuples: 0 +
Number of leaf tuples: 5 +
Total size of tuples: 7032 bytes+
Total size of leaf tuples: 7032 bytes+
Total size of index: 8192 bytes+

postgres=# \i xaf
psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_picksplit:659] entered picksplit
psql:xaf:1: NOTICE: [pgfprint.c:fprint_picksplit:838] split: 3 left, 2 right
psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
----------------------------------------
Number of levels: 2 +
Number of pages: 3 +
Number of leaf pages: 2 +
Number of tuples: 7 +
Number of invalid tuples: 0 +
Number of leaf tuples: 5 +
Total size of tuples: 9864 bytes +
Total size of leaf tuples: 7044 bytes +
Total size of index: 24576 bytes+

postgres=# select id, soid from fps2;
id | soid
----+--------------------------
1 | 4c65a39d4d9bca2c33000082
2 | 4c65a39d4d9bca2c3300008a
3 | 4c65a39d4d9bca2c33000090
4 | 4c65a39d4d9bca2c33000099
5 | 4c65a39d4d9bca2c330000a5
6 | 4c65a39d4d9bca2c330000a8

postgres=# select f1.id, f2.id, fprint_cmp(f1.fingerprint, f2.fingerprint) from fps2 f1 join fps2 f2 on f1.fingerprint=f2.fingerprint;
id | id | fprint_cmp
----+----+------------------
1 | 1 | 1.00031467691569
2 | 2 | 1.00031467691569
4 | 4 | 1.00031467691569
5 | 5 | 1.00031467691569
6 | 6 | 1.00031467691569

So GiST does not include a tuple for row 3; one of the old tuples.

After inserting a few more rows to trigger another Picksplit():

postgres=# \i xag
psql:xag:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xag:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xag:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xag:1: NOTICE: [pgfprint.c:fprint_penalty:913] entered penalty
psql:xag:1: NOTICE: [pgfprint.c:fprint_penalty:935] penalty: 703.4312133789062500
psql:xag:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xag:1: NOTICE: [pgfprint.c:fprint_penalty:913] entered penalty
psql:xag:1: NOTICE: [pgfprint.c:fprint_penalty:935] penalty: 832.1127319335937500
psql:xag:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xag:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xag:1: NOTICE: [pgfprint.c:fprint_union:453] entered union
psql:xag:1: NOTICE: [pgfprint.c:fprint_same:951] entered same
psql:xag:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xag:1: NOTICE: [pgfprint.c:fprint_compress:384] returning non-leafkey entry raw
INSERT 0 1
postgres=# \i xah
psql:xah:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xah:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xah:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xah:1: NOTICE: [pgfprint.c:fprint_penalty:913] entered penalty
psql:xah:1: NOTICE: [pgfprint.c:fprint_penalty:935] penalty: 880.7246093750000000
psql:xah:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xah:1: NOTICE: [pgfprint.c:fprint_penalty:913] entered penalty
psql:xah:1: NOTICE: [pgfprint.c:fprint_penalty:935] penalty: 903.4860839843750000
psql:xah:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xah:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xah:1: NOTICE: [pgfprint.c:fprint_union:453] entered union
psql:xah:1: NOTICE: [pgfprint.c:fprint_same:951] entered same
psql:xah:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xah:1: NOTICE: [pgfprint.c:fprint_compress:384] returning non-leafkey entry raw
INSERT 0 1
postgres=# \i xai
psql:xai:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xai:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xai:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xai:1: NOTICE: [pgfprint.c:fprint_penalty:913] entered penalty
psql:xai:1: NOTICE: [pgfprint.c:fprint_penalty:935] penalty: 904.7127075195312500
psql:xai:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xai:1: NOTICE: [pgfprint.c:fprint_penalty:913] entered penalty
psql:xai:1: NOTICE: [pgfprint.c:fprint_penalty:935] penalty: 907.4243164062500000
psql:xai:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xai:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xai:1: NOTICE: [pgfprint.c:fprint_union:453] entered union
psql:xai:1: NOTICE: [pgfprint.c:fprint_same:951] entered same
psql:xai:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xai:1: NOTICE: [pgfprint.c:fprint_compress:384] returning non-leafkey entry raw
INSERT 0 1
postgres=# \i xaj
psql:xaj:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_penalty:913] entered penalty
psql:xaj:1: NOTICE: [pgfprint.c:fprint_penalty:935] penalty: 910.3089599609375000
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_penalty:913] entered penalty
psql:xaj:1: NOTICE: [pgfprint.c:fprint_penalty:935] penalty: 906.1793212890625000
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_picksplit:659] entered picksplit
psql:xaj:1: NOTICE: [pgfprint.c:fprint_picksplit:838] split: 2 left, 3 right
psql:xaj:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_compress:384] returning non-leafkey entry raw
psql:xaj:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_compress:384] returning non-leafkey entry raw
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_decompress:419] entered decompress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_union:453] entered union
psql:xaj:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xaj:1: NOTICE: [pgfprint.c:fprint_compress:384] returning non-leafkey entry raw
INSERT 0 1
postgres=# select id, soid from fps2;
id | soid
----+--------------------------
1 | 4c65a39d4d9bca2c33000082
2 | 4c65a39d4d9bca2c3300008a
3 | 4c65a39d4d9bca2c33000090
4 | 4c65a39d4d9bca2c33000099
5 | 4c65a39d4d9bca2c330000a5
6 | 4c65a39d4d9bca2c330000a8
7 | 4c65a39d4d9bca2c330000b0
8 | 4c65a39d4d9bca2c330000be
9 | 4c65a39d4d9bca2c330000c8
10 | 4c65a39d4d9bca2c330000d3
(10 rows)

postgres=# select f1.id, f2.id, fprint_cmp(f1.fingerprint, f2.fingerprint) from fps2 f1 join fps2 f2 on f1.fingerprint=f2.fingerprint;
id | id | fprint_cmp
----+----+------------------
1 | 1 | 1.00031467691569
4 | 4 | 1.00031467691569
5 | 5 | 1.00031467691569
6 | 6 | 1.00031467691569
7 | 7 | 1.00031467691569
8 | 8 | 1.00031467691569
9 | 9 | 1.00031467691569
10 | 10 | 1.00031467691569
(8 rows)

Index tuples for rows 3 and 2 have been dropped.


From: Peter Tanski <ptanski(at)raditaz(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-24 03:12:18
Message-ID: 218BEF96-3524-41EB-A15C-67CA3DAD4B58@raditaz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I found another off-by-one error in my Picksplit() algorithm and the GiST index contains one leaf tuple for each row in the table now. The error was to start from 1 instead of 0 when assigning the entries. Thanks to everyone for your help.

For the record, this is the only GiST index I know of where the keys are over 2000 bytes in size. So GiST definitely handles large keys. Perhaps the maximum size for intarray could be increased.

On Nov 23, 2010, at 4:01 PM, Yeb Havinga wrote:

> On 2010-11-23 20:54, Peter Tanski wrote:
>> On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote:
>>>>>> j = 0;
>>>>>> for (i = FirstOffsetNumber; i< maxoff; i = OffsetNumberNext(i)) {
>>>>>> FPrint* v = deserialize_fprint(entv[i].key);
>>>>> Isn't this off by one? Offset numbers are 1-based, so the maxoff
>>>>> computation is wrong.
>>> The first for loop of all others compare with i<= maxoff instead of i< maxoff.
>> You are right: I am missing the last one, there. (During a memory-debugging phase entv[entryvec-n - 1] was always invalid, probably as a memory overwrite error but I fixed that later and never changed it back.)
>>
>> On the other hand, there are two problems:
>>
>> 1. the maximum size on a GiST page is 4240 bytes, so I cannot add a full-size Datum using this kind of hash-key setup (the base Datum size is 4230 bytes on a 64-bit machine). The example test cases I used were smaller in order to get around that issue: they are 2326 bytes base size.
>>
>> 2. Even after fixing the Picksplit() loop, the dropped-leaf problem still manifests itself:
> I noticed an n_entries intialization in one of your earlier mails that might also be a source of trouble. I was under the impression that gistentryvectors have n-1 entries (not n-2 as you say), because the first element (0 / InvalidOffsetNumber) must be skipped. E.g. entryvec->n = 5. This means that there are 4 entries, which are in array positions 1,2,3,4.
>
> btw: interesting topic, audio fingerprinting!
>
> regards,
> Yeb Havinga
>


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Peter Tanski <ptanski(at)raditaz(dot)com>
Cc: Yeb Havinga <yebhavinga(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-24 06:09:20
Message-ID: Pine.LNX.4.64.1011240907350.12632@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter, glad to know you succeeded. FYI, a year ago we developed GiST
extension for rdkit.org.

Oleg
On Tue, 23 Nov 2010, Peter Tanski wrote:

> I found another off-by-one error in my Picksplit() algorithm and the GiST index contains one leaf tuple for each row in the table now. The error was to start from 1 instead of 0 when assigning the entries. Thanks to everyone for your help.
>
> For the record, this is the only GiST index I know of where the keys are over 2000 bytes in size. So GiST definitely handles large keys. Perhaps the maximum size for intarray could be increased.
>
> On Nov 23, 2010, at 4:01 PM, Yeb Havinga wrote:
>
>> On 2010-11-23 20:54, Peter Tanski wrote:
>>> On Nov 23, 2010, at 1:37 PM, Yeb Havinga wrote:
>>>>>>> j = 0;
>>>>>>> for (i = FirstOffsetNumber; i< maxoff; i = OffsetNumberNext(i)) {
>>>>>>> FPrint* v = deserialize_fprint(entv[i].key);
>>>>>> Isn't this off by one? Offset numbers are 1-based, so the maxoff
>>>>>> computation is wrong.
>>>> The first for loop of all others compare with i<= maxoff instead of i< maxoff.
>>> You are right: I am missing the last one, there. (During a memory-debugging phase entv[entryvec-n - 1] was always invalid, probably as a memory overwrite error but I fixed that later and never changed it back.)
>>>
>>> On the other hand, there are two problems:
>>>
>>> 1. the maximum size on a GiST page is 4240 bytes, so I cannot add a full-size Datum using this kind of hash-key setup (the base Datum size is 4230 bytes on a 64-bit machine). The example test cases I used were smaller in order to get around that issue: they are 2326 bytes base size.
>>>
>>> 2. Even after fixing the Picksplit() loop, the dropped-leaf problem still manifests itself:
>> I noticed an n_entries intialization in one of your earlier mails that might also be a source of trouble. I was under the impression that gistentryvectors have n-1 entries (not n-2 as you say), because the first element (0 / InvalidOffsetNumber) must be skipped. E.g. entryvec->n = 5. This means that there are 4 entries, which are in array positions 1,2,3,4.
>>
>> btw: interesting topic, audio fingerprinting!
>>
>> regards,
>> Yeb Havinga
>>
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83