Re: trgm regex index peculiarity

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: trgm regex index peculiarity
Date: 2014-04-06 00:52:09
Message-ID: 29409.1396745529@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> Next revision of patch is attached. Changes are so:
> 1) Notion "penalty" is used instead of "size".
> 2) We try to reduce total penalty to WISH_TRGM_PENALTY, but restriction is
> MAX_TRGM_COUNT total trigrams count.
> 3) Penalties are assigned to particular color trigram classes. I.e.
> separate penalties for __a, _aa, _a_, aa_. It's based on analysis of
> trigram frequencies in Oscar Wilde writings. We can end up with different
> numbers, but I don't think they will be dramatically different.

Committed with cosmetic improvements (adjusting the comments mostly).

The new whitespace penalties look reasonably sane to me. I wonder though
if WISH_TRGM_PENALTY is too small --- it seems like this code will tend to
select many fewer trigrams than the old code did. What testing did you do
that led you to select the specific value of 16?

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2014-04-06 06:15:59 Re: [BUG FIX] Compare returned value by socket() against PGINVALID_SOCKET instead of < 0
Previous Message Alvaro Herrera 2014-04-06 00:22:40 Re: Another assert failure from no-palloc-in-critical-sections