Re: GIN improvements part2: fast scan

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part2: fast scan
Date: 2013-06-19 08:30:56
Message-ID: CAPpHfdtMPreArtJpJNKuYCiURvt5=G+UJjx5buHVSnM01Cbi8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 18.06.2013 23:59, Alexander Korotkov wrote:
>
>> I would like to illustrate that on example. Imagine you have fulltext
>> query
>> "rare_term& frequent_term". Frequent term has large posting tree while
>>
>> rare term has only small posting list containing iptr1, iptr2 and iptr3.
>> At
>> first we get iptr1 from posting list of rare term, then we would like to
>> check whether we have to scan part of frequent term posting tree where
>> iptr
>> < iptr1. So we call pre_consistent([false, true]), because we know that
>> rare term is not present for iptr< iptr2. pre_consistent returns false.
>> So
>> we can start scanning frequent term posting tree from iptr1. Similarly we
>> can skip lags between iptr1 and iptr2, iptr2 and iptr3, from iptr3 to
>> maximum possible pointer.
>>
>
> Thanks, now I understand the rare-term & frequent-term problem. Couldn't
> you do that with the existing consistent function? I don't see why you need
> the new pre-consistent function for this.

In the case of two entries I can. But in the case of n entries things
becomes more complicated. Imagine you have "term_1 & term_2 & ... & term_n"
query. When you get some item pointer from term_1 you can skip all the
lesser item pointers from term_2, term_3 ... term_n. But if all you have
for it is consistent function you have to call it with following check
arguments:
1) [false, false, false, ... , false]
2) [false, true, false, ... , false]
3) [false, false, true, ... , false]
4) [false, true, true, ..., false]
......
i.e. you have to call it 2^(n-1) times. But if you know the query specific
(i.e. in opclass) it's typically easy to calculate exactly what we need in
single pass. That's why I introduced pre_consistent.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2013-06-19 08:35:02 Re: GIN improvements part2: fast scan
Previous Message Kyotaro HORIGUCHI 2013-06-19 08:19:07 Re: Add visibility map information to pg_freespace.