Re: combined indexes with Gist - planner issues?

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Hans-Juergen Schoenig -- PostgreSQL <postgres(at)cybertec(dot)at>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Zoltan Boszormenyi <zb(at)cybertec(dot)at>
Subject: Re: combined indexes with Gist - planner issues?
Date: 2009-09-03 05:33:39
Message-ID: 4A9F5533.2010105@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Hans-Juergen Schoenig -- PostgreSQL wrote:
> we did some experiments with doing such a table.
> the problem is if you want to allow arbitrary combinations of words
> which can be modeled perfectly with FTI.
> you would instantly end up with a self join with 5 relations or so -
> which is again bad.
>
> there are too many common words to consider doing with partly with gist
> and partly with a btree.

How about an inverted index, either via GIN or with a custom table, such
that you have the cheapest price per existing word. (That's pretty close
to how full text searching itself works). Either reduce the number of
words with tsearch2's stemming algorithms. Or go for trigrams right
away. Split a word or search query in all its trigrams, then look up the
(cheapest) price(s) per trigram and return the n least expensive ones.

I've done somethings pretty similar for a customer, using the custom
table approach, as integration of GIN just started back then. Even now,
you need to consider the downside of that index lacking visibility
information and having to recreate the index from time to time. OTOH a
custom table needs a lot more manual twiddling with triggers and bulk
"index" rebuilding.

I guess I'd still go for a custom table today, as it's simply more
flexible. Something like:

CREATE TABLE cheapest_product_by_word (
trgm TEXT,
cheapest_products bigint[]
);

However, all of this is assuming that data (i.e. prices, products)
change very rarely and it's beneficial to calculate such an intermediate
lookup-table in advance. Not sure how much that's the case for you.

Regards

Markus Wanner

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message KaiGai Kohei 2009-09-03 06:50:28 [PATCH] Reworks for Access Control facilities (r2277)
Previous Message Tom Lane 2009-09-03 05:10:00 Re: Triggers on columns