Re: Understanding GIN posting trees

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Understanding GIN posting trees
Date: 2011-07-14 12:09:22
Message-ID: 4E1EDC72.4020608@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I have a couple of questions on GIN:

The code seems to assume that it's possible for the same TID to appear
twice for a single key (see addItemPointersToTuple()). I understand that
it's possible for a single heap tuple to contain the same key twice. For
example if you index an array of integers like [1,2,1]. But once you've
inserted all the keys for a single heap item, you never try to insert
the same TID again, so no duplicates should occur.

Looking at the history, it looks like pre-8.4 we assumed that no such
duplicates are possible. Duplicates of a single key for one column are
eliminated in extractEntriesSU(), but apparently when the multi-column
support was added, we didn't make the de-duplication to run across the
keys extracted from all columns. Now that the posting tree/list
insertion code has to deal with duplicates anyway, the de-duplication
performed in extractEntriesSU() seems pointless. But I wonder if it
would be better to make extractEntriesSU() remove duplicates across all
columns, so that the insertion code wouldn't need to deal with duplicates.

Dealing with the duplicates in the insertion code isn't particularly
difficult. And in fact, now that we only support the getbitmap method,
we wouldn't really need to eliminate duplicates anyway. But I have an
ulterior motive:

Why is the posting tree a tree? AFAICS, we never search it using the
TID, it's always scanned in whole. It would be simpler to store the TIDs
in a posting list in no particular order. This could potentially make
insertions cheaper, as you could just append to the last posting list
page for the key, instead of traversing the posting tree to a particular
location. You could also pack the tids denser, as you wouldn't need to
reserve free space for additions in the middle.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Understanding GIN posting trees
Date: 2011-07-14 14:28:48
Message-ID: 4E1EFD20.4040006@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I have a couple of questions on GIN:
>
> The code seems to assume that it's possible for the same TID to appear
> twice for a single key (see addItemPointersToTuple()). I understand that
> it's possible for a single heap tuple to contain the same key twice. For
> example if you index an array of integers like [1,2,1]. But once you've
> inserted all the keys for a single heap item, you never try to insert
> the same TID again, so no duplicates should occur.
>
> Looking at the history, it looks like pre-8.4 we assumed that no such
> duplicates are possible. Duplicates of a single key for one column are
> eliminated in extractEntriesSU(), but apparently when the multi-column
> support was added, we didn't make the de-duplication to run across the
> keys extracted from all columns. Now that the posting tree/list
> insertion code has to deal with duplicates anyway, the de-duplication
> performed in extractEntriesSU() seems pointless. But I wonder if it
> would be better to make extractEntriesSU() remove duplicates across all
> columns, so that the insertion code wouldn't need to deal with duplicates.

During vacuuming of pending list we could get a powerloss and some data will be
in tree and pending list both, after restart pgsql will try to insert the same
data to tree.

> Dealing with the duplicates in the insertion code isn't particularly
> difficult. And in fact, now that we only support the getbitmap method,
> we wouldn't really need to eliminate duplicates anyway. But I have an
> ulterior motive:

> Why is the posting tree a tree? AFAICS, we never search it using the
> TID, it's always scanned in whole. It would be simpler to store the TIDs
> in a posting list in no particular order. This could potentially make
> insertions cheaper, as you could just append to the last posting list
> page for the key, instead of traversing the posting tree to a particular
> location. You could also pack the tids denser, as you wouldn't need to
> reserve free space for additions in the middle.
For consistentFn call we need to collect all data for current tid. We do that by
scanning over logical ordered arrays of tids and trees allows to do that by
scanning a leafs pages.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Understanding GIN posting trees
Date: 2011-07-14 19:10:35
Message-ID: 7534.1310670635@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Why is the posting tree a tree? AFAICS, we never search it using the
> TID, it's always scanned in whole. It would be simpler to store the TIDs
> in a posting list in no particular order. This could potentially make
> insertions cheaper, as you could just append to the last posting list
> page for the key, instead of traversing the posting tree to a particular
> location. You could also pack the tids denser, as you wouldn't need to
> reserve free space for additions in the middle.

Surely VACUUM would like to search it by TID for deletion purposes?

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Understanding GIN posting trees
Date: 2011-07-14 19:58:05
Message-ID: 4E1F4A4D.9040609@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14.07.2011 22:10, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Why is the posting tree a tree? AFAICS, we never search it using the
>> TID, it's always scanned in whole. It would be simpler to store the TIDs
>> in a posting list in no particular order. This could potentially make
>> insertions cheaper, as you could just append to the last posting list
>> page for the key, instead of traversing the posting tree to a particular
>> location. You could also pack the tids denser, as you wouldn't need to
>> reserve free space for additions in the middle.
>
> Surely VACUUM would like to search it by TID for deletion purposes?

It doesn't, it scans all the tid lists in whole. I guess it could search
by TID, it could be a win if there's only a few deleted tuples, in a
small range of pages.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Understanding GIN posting trees
Date: 2011-07-14 21:10:00
Message-ID: 4E1F5B28.1060506@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14.07.2011 17:28, Teodor Sigaev wrote:
>> Why is the posting tree a tree? AFAICS, we never search it using the
>> TID, it's always scanned in whole. It would be simpler to store the TIDs
>> in a posting list in no particular order. This could potentially make
>> insertions cheaper, as you could just append to the last posting list
>> page for the key, instead of traversing the posting tree to a particular
>> location. You could also pack the tids denser, as you wouldn't need to
>> reserve free space for additions in the middle.
> For consistentFn call we need to collect all data for current tid. We do
> that by scanning over logical ordered arrays of tids and trees allows to
> do that by scanning a leafs pages.

Oh, I see. You essentially do a merge join of all the posting trees of
query keys.

Hmm, but we do need to scan all the posting trees of all the matched
keys in whole anyway. We could collect all TIDs in the posting lists of
all the keys into separate TIDBitmaps, and then combine the bitmaps,
calling consistentFn for each TID that was present in at least one
bitmap. I guess the performance characteristics of that would be
somewhat different from what we have now, and you'd need to keep a lot
of in-memory bitmaps if the query contains a lot of keys.

While we're at it, it just occurred to me that it if the number of query
keys is limited, say <= 16, you could build a lookup table for each
combination of keys either occurring or not. You could use then use that
instead of calling consistentFn for each possible match. You could even
use the table to detect common cases like "all/any keys must match",
perhaps opening some optimization opportunities elsewhere.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: Understanding GIN posting trees
Date: 2011-07-15 13:59:22
Message-ID: 4E2047BA.1070602@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Oh, I see. You essentially do a merge join of all the posting trees of
> query keys.
>
> Hmm, but we do need to scan all the posting trees of all the matched
> keys in whole anyway. We could collect all TIDs in the posting lists of
> all the keys into separate TIDBitmaps, and then combine the bitmaps,
> calling consistentFn for each TID that was present in at least one
> bitmap. I guess the performance characteristics of that would be
> somewhat different from what we have now, and you'd need to keep a lot
> of in-memory bitmaps if the query contains a lot of keys.
I hope to reimplement amgettuple interface someday and this interface is
designed for small startup cost. With bitmaps per search key it will be impossible.

> While we're at it, it just occurred to me that it if the number of query
> keys is limited, say <= 16, you could build a lookup table for each
> combination of keys either occurring or not. You could use then use that
> instead of calling consistentFn for each possible match. You could even
> use the table to detect common cases like "all/any keys must match",
> perhaps opening some optimization opportunities elsewhere.
I'm afraid that it becomes looking as a separate optimizer/planner :)

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/