Re: Index AM change proposals, redux

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Index AM change proposals, redux
Date: 2008-04-11 18:33:14
Message-ID: Pine.LNX.4.64.0804112228380.21547@sn.sai.msu.ru
Lists: pgsql-hackers

Slightly off-topic: how could we get this benefit at the tuple level? For example,
we mark the GiST tsearch index as lossy, even though for documents that are not very
big it is actually exact, so we could save a lot by not rechecking them.

Oleg

On Fri, 11 Apr 2008, Teodor Sigaev wrote:

>> Teodor, do you have any thoughts about exactly how you'd fix @@@?
>> I suppose that, in that case, the need for a recheck is not really a
>> property of specific tuples but of a particular query. Where would you
>> want to detect that?
>
> A tsquery may include a restriction on the weight of search terms: 'sea & port:A'.
> The GIN index doesn't store information about weights, so the only difference
> between @@ and @@@ is that @@@ is marked with the RECHECK flag. I think the
> better way is to set the recheck-required flag by looking at the values found
> in the index, not at the tsquery alone. That gives us more flexibility.
>
> So I plan to add a pointer to bool to the consistent method, so the signature
> will be:
>
>     bool consistent(bool check[], StrategyNumber n, Datum query,
>                     bool *needRecheck)
>
> The value returned in needRecheck should be ignored for operators not marked
> with the RECHECK flag in the opclass. needRecheck should be initialized to true
> before calling the consistent method, to keep compatibility with old opclasses.
>
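A minimal standalone sketch of the calling convention described above, assuming simplified stand-ins for the PostgreSQL types: the caller presets needRecheck to true, an old-style opclass never touches it, and the caller ignores it for operators not marked RECHECK. The typedefs and the names `old_consistent`, `new_consistent` and `tuple_needs_recheck` are hypothetical; in the real backend the call goes through fmgr and the opclass reads the pointer with PG_GETARG_POINTER(3).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for PostgreSQL's types; the real ones come from
 * the backend headers and the call goes through fmgr. */
typedef uint16_t StrategyNumber;
typedef uintptr_t Datum;

/* The proposed four-argument consistent signature. */
typedef bool (*ConsistentFn) (bool check[], StrategyNumber n, Datum query,
                              bool *needRecheck);

/* An "old" opclass written before the extra argument existed:
 * it never writes to *needRecheck. */
static bool
old_consistent(bool check[], StrategyNumber n, Datum query, bool *needRecheck)
{
    (void) n;
    (void) query;
    (void) needRecheck;
    return check[0];
}

/* A "new" opclass that decides per call. For illustration only, key 1 is
 * assumed to be a weight-restricted term, so seeing it (conservatively)
 * requests a recheck. */
static bool
new_consistent(bool check[], StrategyNumber n, Datum query, bool *needRecheck)
{
    (void) n;
    (void) query;
    *needRecheck = check[1];
    return check[0] || check[1];
}

/* Caller side: returns whether the heap tuple must be rechecked and
 * stores the index-level verdict in *match. */
static bool
tuple_needs_recheck(ConsistentFn consistent, bool check[],
                    bool opMarkedRecheck, bool *match)
{
    bool needRecheck = true;    /* preset: old opclasses leave it untouched */

    *match = consistent(check, 1, (Datum) 0, &needRecheck);

    /* The flag is meaningless for operators without RECHECK in the opclass. */
    return opMarkedRecheck ? needRecheck : false;
}
```

With this arrangement an old opclass always yields a recheck for a RECHECK operator, exactly as before, while a new opclass can clear the flag when the index answer is already exact.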
> To decide whether a recheck is needed, the better way is to check the values
> that are actually needed. For example, let the tsquery be 'foo | bar | qq:A'
> and the tsvector be 'foo:1,2,3 asdasdasd:4'. Obviously no recheck is needed,
> since the match comes from 'foo', which carries no weight restriction. So the
> patch is close to trivial:
>
> *** tsginidx.c.orig 2008-04-11 17:08:37.000000000 +0400
> --- tsginidx.c 2008-04-11 17:18:45.000000000 +0400
> ***************
> *** 109,114 ****
> --- 109,115 ----
> {
> QueryItem *frst;
> bool *mapped_check;
> + bool *needRecheck;
> } GinChkVal;
>
> static bool
> ***************
> *** 116,121 ****
> --- 117,125 ----
> {
> GinChkVal *gcv = (GinChkVal *) checkval;
>
> + if ( val->weight )
> + *(gcv->needRecheck) = true;
> +
> return gcv->mapped_check[((QueryItem *) val) - gcv->frst];
> }
>
> ***************
> *** 144,149 ****
> --- 148,155 ----
>
> gcv.frst = item = GETQUERY(query);
> gcv.mapped_check = (bool *) palloc(sizeof(bool) *
> query->size);
> + gcv.needRecheck = (bool *) PG_GETARG_POINTER(3);
> + *(gcv.needRecheck) = false;
>
> for (i = 0; i < query->size; i++)
> if (item[i].type == QI_VAL)

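The patch works because the check callback raises the flag only for query items it actually consults. The sketch below models that idea standalone, assuming a hypothetical, simplified query tree evaluated left to right with short-circuit OR/AND; `Node`, `GinChkValSketch`, `check_val`, `eval_node` and `consistent_sketch` are illustrative names, not the real tsquery representation or TS_execute, and whether the real evaluator visits operands in this order is not claimed here.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { N_VAL, N_OR, N_AND } NodeType;

/* Hypothetical simplified query node (not the real tsquery layout). */
typedef struct Node
{
    NodeType    type;
    int         item;       /* N_VAL: index into mapped_check[] */
    bool        weighted;   /* N_VAL: term has a weight restriction, e.g. qq:A */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Mirrors the patched GinChkVal: per-term index results plus the flag. */
typedef struct
{
    const bool *mapped_check;   /* did the index find term i? */
    bool       *needRecheck;
} GinChkValSketch;

/* Analogue of the patched callback: consulting a weighted term means the
 * index answer alone cannot be trusted, because GIN stores no weights. */
static bool
check_val(const Node *val, GinChkValSketch *gcv)
{
    if (val->weighted)
        *(gcv->needRecheck) = true;
    return gcv->mapped_check[val->item];
}

/* Short-circuit evaluation: the right operand of OR is never visited
 * once the left operand is true. */
static bool
eval_node(const Node *n, GinChkValSketch *gcv)
{
    switch (n->type)
    {
        case N_VAL:
            return check_val(n, gcv);
        case N_OR:
            return eval_node(n->left, gcv) || eval_node(n->right, gcv);
        case N_AND:
            return eval_node(n->left, gcv) && eval_node(n->right, gcv);
    }
    return false;
}

/* Entry point: start from "no recheck", as the patch does. */
static bool
consistent_sketch(const Node *root, const bool *mapped_check, bool *needRecheck)
{
    GinChkValSketch gcv = { mapped_check, needRecheck };

    *needRecheck = false;
    return eval_node(root, &gcv);
}
```

For 'foo | bar | qq:A' built as ((foo | bar) | qq:A) and a document containing only foo, the OR is satisfied before qq:A is reached, so no recheck is requested; a document that matches only through qq still sets the flag and gets rechecked.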
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
