Re: GiST seems to drop left-branch leaf tuples

From: Peter Tanski <ptanski(at)raditaz(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST seems to drop left-branch leaf tuples
Date: 2010-11-23 15:00:52
Message-ID: EE47BAA8-76F3-46F9-ABA5-703BCF9C9061@raditaz.com
Lists: pgsql-hackers

Thanks for the advice. I ran a row-by-row test, including debug output. I'll put a test case together as well, but I believe I have narrowed down the problem somewhat. The first split occurs when the 6th row is inserted, and there are 6 calls to Compress(); however, Picksplit() only receives 4 of those 6 tuples and the other two are dropped.

postgres=# \i xaa
psql:xaa:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 1 +
Number of invalid tuples: 0 +
Number of leaf tuples: 1 +
Total size of tuples: 1416 bytes+
Total size of leaf tuples: 1416 bytes+
Total size of index: 8192 bytes+

postgres=# \i xab
psql:xab:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 2 +
Number of invalid tuples: 0 +
Number of leaf tuples: 2 +
Total size of tuples: 2820 bytes+
Total size of leaf tuples: 2820 bytes+
Total size of index: 8192 bytes+

postgres=# \i xac
psql:xac:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 3 +
Number of invalid tuples: 0 +
Number of leaf tuples: 3 +
Total size of tuples: 4224 bytes+
Total size of leaf tuples: 4224 bytes+
Total size of index: 8192 bytes+

postgres=# \i xad
psql:xad:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 4 +
Number of invalid tuples: 0 +
Number of leaf tuples: 4 +
Total size of tuples: 5628 bytes+
Total size of leaf tuples: 5628 bytes+
Total size of index: 8192 bytes+

postgres=# \i xae
psql:xae:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
---------------------------------------
Number of levels: 1 +
Number of pages: 1 +
Number of leaf pages: 1 +
Number of tuples: 5 +
Number of invalid tuples: 0 +
Number of leaf tuples: 5 +
Total size of tuples: 7032 bytes+
Total size of leaf tuples: 7032 bytes+
Total size of index: 8192 bytes+

postgres=# \i xaf
psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_decompress:421] entered decompress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_picksplit:660] entered picksplit
psql:xaf:1: NOTICE: [pgfprint.c:fprint_picksplit:812] split: 2 left, 2 right
psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
psql:xaf:1: NOTICE: [pgfprint.c:fprint_compress:379] entered compress
INSERT 0 1
postgres=# select gist_stat('fps2_fingerprint_ix');
gist_stat
----------------------------------------
Number of levels: 2 +
Number of pages: 3 +
Number of leaf pages: 2 +
Number of tuples: 6 +
Number of invalid tuples: 0 +
Number of leaf tuples: 4 +
Total size of tuples: 8460 bytes +
Total size of leaf tuples: 5640 bytes +
Total size of index: 24576 bytes+

postgres=#

There are checks inside the Picksplit() function for the number of entries:

OffsetNumber maxoff = entryvec->n - 1;
int n_entries, j;
n_entries = Max(maxoff, 1) - 1;

j = 0;
for (i = FirstOffsetNumber; i < maxoff; i = OffsetNumberNext(i)) {
    FPrint* v = deserialize_fprint(entv[i].key);
    if (!GIST_LEAF(&entv[i])) {
        leaf_split = false;
    }
    if (v == NULL) {
        elog(ERROR, "entry %d is invalid", i);
    }
    raw_vec[j] = v;
    vec_ixs[j++] = i;
}
if (n_entries > j) {
    elog(WARNING, "[%s:%s:%d]: " SIZE_T_FMT " bad entries",
         __FILE__, __func__, __LINE__, n_entries - j);
    n_entries = j;
} else if (n_entries < j) {
    elog(ERROR, "skipping %d entries", j-n_entries);
}

So I know that Picksplit() receives only 4 entries, even though Decompress() is called 6 times.

Note that Decompress() returns its input unchanged; entries are detoasted in the deserialize_fprint() function, which mallocs each value:

Datum fprint_decompress(PG_FUNCTION_ARGS) {
    GISTENTRY* entry = (GISTENTRY*)PG_GETARG_POINTER(0);

    FPDEBUG("entered decompress");

    if (!entry) {
        elog(ERROR, "fprint_decompress: entry is NULL");
    }

    // cut out here -- we handle the memory
    PG_RETURN_POINTER(entry);
}

I'll put together a test case and send that on.

On Nov 23, 2010, at 2:29 AM, Heikki Linnakangas wrote:

> On 22.11.2010 23:18, Peter Tanski wrote:
>> Whatever test I use for Same(), Penalty() and Consistent() does not seem
>> to affect the problem significantly. For now I am only using
>> Consistent() as a check for retrieval.
>
> I believe it's not possible to lose leaf tuples with incorrectly defined gist support functions. You might get completely bogus results, but the tuples should be there when you look at gist_tree() output. So this sounds like a gist bug to me.
>
>> Note that there are only 133 leaf tuples -- for 500 rows. If the index
>> process were operating correctly, there should have been 500 leaf tuples
>> there. If I REINDEX the table the number of leaf tuples may change
>> slightly but not by much.
>
> One idea for debugging is to insert the rows to the table one by one, and run the query after each insertion. When do the leaf tuples disappear?
>
> If you can put together a small self-contained test case and post it to the list, I can take a look.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
