Re: GiST seems to drop left-branch leaf tuples

From: Peter Tanski <ptanski(at)raditaz(dot)com>
To:
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-22 21:26:05
Message-ID: 2498A05C-E766-4E08-83BE-1A0987E64C80@raditaz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

One minor correction:

postgres=# explain select count(*) from fps2 f1 join fps2 f2 on
f1.fingerprint = f2.fingerprint;
QUERY PLAN
------------------------------------------------------------------------------------------------
Aggregate (cost=1173.67..1173.68 rows=1 width=0)
-> Nested Loop (cost=0.00..1170.54 rows=1250 width=0)
-> Seq Scan on fps2 f1 (cost=0.00..105.00 rows=500 width=32)
-> Index Scan using fps2_fingerprint_ix on fps2 f2
(cost=0.00..2.11 rows=2 width=32)
Index Cond: (f1.fingerprint = f2.fingerprint)

(The previous query example used the ~= operator which was defined to
match at > .5 but in this case there no matches in the table so ~= is
the same as =.)

On Nov 22, 2010, at 4:18 PM, Peter Tanski wrote:

> I have been working on a plugin for GiST that has some unusual
> features:
>
> * The data type for both Node and Leaf keys is large (typically 4222
> bytes on 32-bit; 4230 bytes on 64-bit).
>
> * Due to the large size the storage class is EXTENDED (main would
> only degrade to EXTENDED in any case). Tests using EXTERNAL show
> the same results so I do not believe compression is an issue.
>
> * This is a hash-type index: the values are large hashes for an
> audio fingerprinting application and the sole relationship between
> keys is a float match probability between two values. For example,
> if A matches B with a score of 0.18 but A matches C with a score of
> 0.61, A is closer to C than to B. The distance metric is calculated
> in reverse (1.0 - match_score) so the penalty for inserting A under
> the same Node as B would be 0.82 (scaled by 1000 to 820).
>
> * The Leaf Keys contain the same data as the rows themselves.
>
> * The Node (union) Keys are:
>
> - a bitwise OR of the lower-level Leaf Keys
>
> - a penalty or consistency comparison of Leaf or Query (L) value
> against a Node union-key (N) is effectively a scaled hamming
> distance of:
> L ^ (L & N)
> so if L is under N the check will always return 1.0 (true for
> consistency; 0.0 for penalty);
> by the same operation, newly inserted values compare closer to
> Node (union) keys where the corresponding Leaf keys are closer
>
> - Comparisons between two Nodes, N1 and N2 (used in Same(), for
> example) have used a:
> -- Tanimoto bit distance popcount(N1 & N2) / popcount(N1 | N2);
> -- the same scaled-hamming distance check used for a Leaf against
> another Leaf;
> -- simple memcmp for identity.
>
> Whatever test I use for Same(), Penalty() and Consistent() does not
> seem to affect the problem significantly. For now I am only using
> Consistent() as a check for retrieval.
>
> I have developed this under debug source-builds of postgresql 8.4.5
> and 9.0.1 on Mac OS X 10.6 (Snow Leopard, x86 64-bit) and Ubuntu
> Linux 10.04 (Lucid, x86 64-bit and 32-bit). There is no difference
> between the platforms or architectures. I am using the standard
> PostgreSQL Makefile build setup so compiler flags are the same as
> used by PostgreSQL.
>
> The problem may be my own understanding of Picksplit(), Penalty()
> and Same(). Using gevel, on a table with 500 newly-inserted rows:
>
> postgres=# \d fps2
> Table "public.fps2"
> Column | Type | Modifiers
> -------------+---------------+-----------
> soid | character(24) | not null
> fingerprint | fprint | not null
> Indexes:
> "fps2_fingerprint_ix" gist (fingerprint)
>
> postgres=# select gist_stat('fps2_fingerprint_ix');
> gist_stat
> -----------------------------------------
> Number of levels: 4 +
> Number of pages: 61 +
> Number of leaf pages: 42 +
> Number of tuples: 193 +
> Number of invalid tuples: 0 +
> Number of leaf tuples: 133 +
> Total size of tuples: 271704 bytes+
> Total size of leaf tuples: 187236 bytes+
> Total size of index: 499712 bytes+
>
>
> Note that there are only 133 leaf tuples -- for 500 rows. If the
> index process were operating correctly, there should have been 500
> leaf tuples there. If I REINDEX the table the number of leaf tuples
> may change slightly but not by much. This closely corresponds to a
> query:
>
> postgres=# select count(*) from fps2 f1 join fps2 f2 on
> f1.fingerprint = f2.fingerprint;
> count
> -------
> 133
>
> where = is a match operator that returns rows where the Leaf key
> comparison is > .98 (on the scaled probability score pretty much
> exactly equal). The above query is using the index:
>
> postgres=# explain select count(*) from fps2 f1 join fps2 f2 on
> f1.fingerprint ~= f2.fingerprint;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------
> Aggregate (cost=1173.67..1173.68 rows=1 width=0)
> -> Nested Loop (cost=0.00..1170.54 rows=1250 width=0)
> -> Seq Scan on fps2 f1 (cost=0.00..105.00 rows=500 width=32)
> -> Index Scan using fps2_fingerprint_ix on fps2 f2
> (cost=0.00..2.11 rows=2 width=32)
> Index Cond: (f1.fingerprint ~= f2.fingerprint)
>
>
> If I use a table scan instead of the index, the query would return:
>
> postgres=# select count(*) from fps2 f1 join fps2 f2 on
> fprint_cmp(f1.fingerprint, f2.fingerprint) > .98;
> count
> -------
> 500
>
>
> The GiST tree looks like:
>
> postgres=# select gist_tree('fps2_fingerprint_ix');
> gist_tree
> ----------------------------------------------------------------------------------------------
> 0(l:0) blk: 0 numTuple: 4 free: 2532b(68.97%) rightlink:4294967295
> (InvalidBlockNumber) +
> 1(l:1) blk: 37 numTuple: 2 free: 5340b(34.56%) rightlink:105
> (OK) +
> 1(l:2) blk: 90 numTuple: 3 free: 3936b(51.76%) rightlink:109
> (OK) +
> 1(l:3) blk: 9 numTuple: 5 free: 1128b(86.18%) rightlink:
> 94 (OK) +
> 2(l:3) blk: 114 numTuple: 5 free: 1128b(86.18%)
> rightlink:89 (OK) +
> 3(l:3) blk: 106 numTuple: 5 free: 1128b(86.18%)
> rightlink:114 (OK) +
> 2(l:2) blk: 109 numTuple: 3 free: 3936b(51.76%) rightlink:24
> (OK) +
> 1(l:3) blk: 55 numTuple: 5 free: 1128b(86.18%) rightlink:
> 45 (OK) +
> 2(l:3) blk: 94 numTuple: 1 free: 6744b(17.35%) rightlink:
> 121 (OK) +
> 3(l:3) blk: 121 numTuple: 5 free: 1128b(86.18%)
> rightlink:108 (OK) +
> 2(l:1) blk: 38 numTuple: 4 free: 2532b(68.97%) rightlink:77
> (OK) +
> 1(l:2) blk: 84 numTuple: 1 free: 6744b(17.35%) rightlink:120
> (OK) +
> 1(l:3) blk: 88 numTuple: 4 free: 2532b(68.97%) rightlink:
> 110 (OK) +
> 2(l:2) blk: 120 numTuple: 3 free: 3936b(51.76%) rightlink:42
> (OK) +
> 1(l:3) blk: 5 numTuple: 3 free: 3936b(51.76%) rightlink:
> 88 (OK) +
> 2(l:3) blk: 110 numTuple: 4 free: 2532b(68.97%)
> rightlink:71 (OK) +
> 3(l:3) blk: 72 numTuple: 4 free: 2532b(68.97%) rightlink:
> 119 (OK) +
> 3(l:2) blk: 96 numTuple: 5 free: 1128b(86.18%) rightlink:84
> (OK) +
> 1(l:3) blk: 93 numTuple: 4 free: 2532b(68.97%) rightlink:
> 95 (OK) +
> 2(l:3) blk: 87 numTuple: 1 free: 6744b(17.35%) rightlink:
> 99 (OK) +
> 3(l:3) blk: 102 numTuple: 3 free: 3936b(51.76%)
> rightlink:93 (OK) +
> 4(l:3) blk: 91 numTuple: 3 free: 3936b(51.76%) rightlink:
> 102 (OK) +
> 5(l:3) blk: 99 numTuple: 5 free: 1128b(86.18%) rightlink:
> 91 (OK) +
> 4(l:2) blk: 12 numTuple: 3 free: 3936b(51.76%) rightlink:96
> (OK) +
> 1(l:3) blk: 85 numTuple: 1 free: 6744b(17.35%) rightlink:
> 98 (OK) +
> 2(l:3) blk: 98 numTuple: 2 free: 5340b(34.56%) rightlink:
> 125 (OK) +
> 3(l:3) blk: 125 numTuple: 3 free: 3936b(51.76%)
> rightlink:87 (OK) +
> 3(l:1) blk: 105 numTuple: 3 free: 3936b(51.76%) rightlink:38
> (OK) +
> 1(l:2) blk: 8 numTuple: 2 free: 5340b(34.56%) rightlink:70
> (OK) +
> 1(l:3) blk: 66 numTuple: 1 free: 6744b(17.35%) rightlink:
> 97 (OK) +
> 2(l:3) blk: 97 numTuple: 5 free: 1128b(86.18%) rightlink:
> 67 (OK) +
> 2(l:2) blk: 70 numTuple: 3 free: 3936b(51.76%) rightlink:104
> (OK) +
> 1(l:3) blk: 59 numTuple: 1 free: 6744b(17.35%) rightlink:
> 115 (OK) +
> 2(l:3) blk: 115 numTuple: 1 free: 6744b(17.35%)
> rightlink:116 (OK) +
> 3(l:3) blk: 116 numTuple: 4 free: 2532b(68.97%)
> rightlink:64 (OK) +
> 3(l:2) blk: 46 numTuple: 5 free: 1128b(86.18%) rightlink:90
> (OK) +
> 1(l:3) blk: 65 numTuple: 5 free: 1128b(86.18%) rightlink:
> 107 (OK) +
> 2(l:3) blk: 112 numTuple: 4 free: 2532b(68.97%)
> rightlink:113 (OK) +
> 3(l:3) blk: 107 numTuple: 5 free: 1128b(86.18%)
> rightlink:112 (OK) +
> 4(l:3) blk: 113 numTuple: 1 free: 6744b(17.35%)
> rightlink:126 (OK) +
> 5(l:3) blk: 126 numTuple: 4 free: 2532b(68.97%)
> rightlink:57 (OK) +
> 4(l:1) blk: 77 numTuple: 5 free: 1128b(86.18%) rightlink:
> 4294967295 (InvalidBlockNumber)+
> 1(l:2) blk: 53 numTuple: 5 free: 1128b(86.18%) rightlink:76
> (OK) +
> 1(l:3) blk: 81 numTuple: 2 free: 5340b(34.56%) rightlink:
> 82 (OK) +
> 2(l:3) blk: 92 numTuple: 4 free: 2532b(68.97%) rightlink:
> 81 (OK) +
> 3(l:3) blk: 82 numTuple: 1 free: 6744b(17.35%) rightlink:
> 100 (OK) +
> 4(l:3) blk: 100 numTuple: 4 free: 2532b(68.97%)
> rightlink:74 (OK) +
> 5(l:3) blk: 73 numTuple: 3 free: 3936b(51.76%) rightlink:
> 92 (OK) +
> 2(l:2) blk: 24 numTuple: 1 free: 6744b(17.35%) rightlink:118
> (OK) +
> 1(l:3) blk: 39 numTuple: 4 free: 2532b(68.97%) rightlink:
> 47 (OK) +
> 3(l:2) blk: 118 numTuple: 4 free: 2532b(68.97%) rightlink:36
> (OK) +
> 1(l:3) blk: 34 numTuple: 4 free: 2532b(68.97%) rightlink:
> 39 (OK) +
> 2(l:3) blk: 47 numTuple: 1 free: 6744b(17.35%) rightlink:
> 111 (OK) +
> 3(l:3) blk: 111 numTuple: 1 free: 6744b(17.35%)
> rightlink:122 (OK) +
> 4(l:3) blk: 122 numTuple: 4 free: 2532b(68.97%)
> rightlink:117 (OK) +
> 4(l:2) blk: 7 numTuple: 1 free: 6744b(17.35%) rightlink:124
> (OK) +
> 1(l:3) blk: 54 numTuple: 1 free: 6744b(17.35%) rightlink:
> 86 (OK) +
> 5(l:2) blk: 124 numTuple: 3 free: 3936b(51.76%) rightlink:53
> (OK) +
> 1(l:3) blk: 80 numTuple: 4 free: 2532b(68.97%) rightlink:
> 54 (OK) +
> 2(l:3) blk: 79 numTuple: 2 free: 5340b(34.56%) rightlink:
> 80 (OK) +
> 3(l:3) blk: 86 numTuple: 4 free: 2532b(68.97%) rightlink:
> 123 (OK) +
>
> There are no memory errors in the code and everything seems to work
> correctly but GiST seems to discard Leaf tuples. In implementing
> Picksplit() I have followed the code in the contrib/intarray/
> _int_gist.c, contrib/hstore/hstore_gist.c modules and used Guttman's
> poly-time split algorithm. Note that a raw split (just split the
> page in half) does not seem to make a difference in the number of
> Leaf keys that are discarded. The assignments are correct, such
> that on return from Picksplit() the GIST_SPLITVEC *v, contains
> something like:
>
> v->spl_left = {1, 2, 3}
> v->spl_nleft = 3
> v->spl_ldatum = (Datum)union_of_1_2_3
>
> v->spl_right = {4,5}
> v->spl_nright = 2
> v->spl_rdatum = (Datum)union_of_4_5
>
> Since the default page size is 4, splits are generally when
> inserting a 5th element and a split is 3/2, left or right. I used a
> simple example above but when using Guttman's poly-time split
> algorithm the output might be:
>
> v->spl_left = {5,3}
> v->spl_right = {2,4,1}
>
> As I understand it, the union process for the parent Node key should
> "contain" each Leaf or Node key under it. The bitwise OR and
> comparison by (L ^ (L & N)) shows that is the case. By the same
> token, Same() on two Leaf keys should never return true (indeed,
> changing Same() to always return false makes no difference in
> testing).
>
> I stepped through code from Picksplit() through gistplacetopage() in
> gcc but could not detect any errors or where the Leaf tuples may be
> discarded. If you have any ideas or would like test code, please
> let me know.
>
> Cheers,
> Pete Tanski
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-11-22 21:59:30 Re: final patch - plpgsql: for-in-array
Previous Message Peter Tanski 2010-11-22 21:18:36 GiST seems to drop left-branch leaf tuples