Understanding and implementing a GiST Index

From: Connor Wolf <wolf(at)imaginaryindustries(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Understanding and implementing a GiST Index
Date: 2014-10-09 07:09:58
Message-ID: 543634C6.5000005@imaginaryindustries.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi there!

I'm trying to implement a custom indexing system using the GiST index
framework, and I have a few questions.
Basically, I'm trying to implement a system that allows me to search
across a set of 64 bit integers by hamming distance. This is intended to
be used in searching for similar images, where the 64 bit integer is
actually a phash
<http://www.hackerfactor.com/blog/?/archives/432-Looks-Like-It.html> of
an image, and similarity between two images is reflected in the hamming
distance between two integers.

Anyways, The appropriate approach here is to use something called a BK
tree
<http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees>,
for which I've written some test code
<https://github.com/fake-name/MangaCMS/blob/master/deduplicator/hamDb.py#L27>
and I think I have a decent grip of (my test code seems to work, in any
event).

That said:

* Is there a /simple/ piece of example-code for implementing a GiST
index for a single index? I've been looking through the /contrib/
directory, particularly the /contrib/btree_gist/ files, but it looks
like 90% of the complexity there is related to supporting all the
column types Postgres has, rather then actually tied to producing a
functional index.
* Once I have something compiling, how can I check to be sure that I'm
actually using the indexing module I created? I can enable my
compiled extension using `CREATE EXTENSION`, but is there an option
for `CREATE INDEX test_index ON testing USING gist (val);` that lets
me specify *which* GiST index is actually employed? Is this even a
valid question?
This is probably something that's available in one of the system
tables somewhere, but I haven't had much luck with google finding
out where.
* Testing: What's the appropriate way to examine the generated tree
structure of the index? I certainly went through a few bugs with my
test tree system (and that was in python!). Is there any way to
examine the index structure for debugging purposes?
* Also, it looks like `ereport()` is the proper way to emit debugging
information. Is this correct?
* In that vein, is there any way to have information that is on the
operations of an entire query? Looking at the number of tree nodes
touched for a scan would be nice (and I would not be surprised if
there is already a facility for it).

Project code is here <https://github.com/fake-name/pg-spgist_hamming> if
anyone is interested, any help would be great. I have very little idea
what I'm doing.

Thanks,
Connor

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Andreas Joseph Krogh 2014-10-09 07:58:43 Re: Importing large object (with lo_import) to table in separate tablespaces takes up lots of space in $PGDATA
Previous Message Andrus 2014-10-09 05:41:52 Re: Converting char to varchar automatically

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2014-10-09 07:38:01 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message Heikki Linnakangas 2014-10-09 06:47:45 Re: What exactly is our CRC algorithm?