Re: GIN, partial matches, lossy bitmaps

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: GIN, partial matches, lossy bitmaps
Date: 2009-03-02 19:14:23
Message-ID: 49AC300F.1050903@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While reading the GIN code, I just rediscovered that the GIN partial
match support suffers from the same problem that I criticized the "fast
insert" patch about, and searching the archives I found that I already
complained about that back in April:

http://archives.postgresql.org/pgsql-patches/2008-04/msg00157.php

If I'm reading the code correctly, item pointers of all matching heap
tuples are first collected into a TIDBitmap, and then amgetnext returns
tuples from that one by one. If the bitmap becomes lossy, an error is
thrown. gingetbitmap is a dummy implementation: it creates a new
TIDBitmap and inserts all the tuples from the other TIDBitmap into it
one by one, and then returns the new TIDBitmap.

If we remove the support for regular, non-bitmap, index scans with GIN,
that could be cleaned up as well. Even if we don't do that, gingetbitmap
should not error when the bitmap becomes lossy, but just return the
lossy bitmap.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN, partial matches, lossy bitmaps
Date: 2009-03-02 20:02:35
Message-ID: 1236024155.12092.14.camel@dell.linuxdev.us.dell.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2009-03-02 at 21:14 +0200, Heikki Linnakangas wrote:
> If I'm reading the code correctly, item pointers of all matching heap
> tuples are first collected into a TIDBitmap, and then amgetnext returns
> tuples from that one by one. If the bitmap becomes lossy, an error is
> thrown. gingetbitmap is a dummy implementation: it creates a new
> TIDBitmap and inserts all the tuples from the other TIDBitmap into it
> one by one, and then returns the new TIDBitmap.

Do you think that might be the cause of the extra startup overhead that
Robert Haas observed for bitmap scans?

> If we remove the support for regular, non-bitmap, index scans with GIN,
> that could be cleaned up as well. Even if we don't do that, gingetbitmap
> should not error when the bitmap becomes lossy, but just return the
> lossy bitmap.

That sounds reasonable to me.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN, partial matches, lossy bitmaps
Date: 2009-03-03 01:57:29
Message-ID: 603c8f070903021757w2f4b736dlf712a0a3ae5c225a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 2, 2009 at 2:14 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> While reading the GIN code, I just rediscovered that the GIN partial match
> support suffers from the same problem that I criticized the "fast insert"
> patch about, and searching the archives I found that I already complained
> about that back in April:
>
> http://archives.postgresql.org/pgsql-patches/2008-04/msg00157.php
>
> If I'm reading the code correctly, item pointers of all matching heap tuples
> are first collected into a TIDBitmap, and then amgetnext returns tuples from
> that one by one. If the bitmap becomes lossy, an error is thrown.
> gingetbitmap is a dummy implementation: it creates a new TIDBitmap and
> inserts all the tuples from the other TIDBitmap into it one by one, and then
> returns the new TIDBitmap.

The latest version of the path no longer does this - instead, it
flushes the pending list to the main index if the bitmap becomes
lossy. That strikes me as more tolerable than throwing an error, but
I agree with your criticism: I'm not sure why we are insisting on
using a TIDBitmap (which is designed to be lossy at times) in a
situation where we actually can't tolerate lossiness. In fact, this
was the main point of my original review of this patch.

http://archives.postgresql.org/message-id/603c8f070902101859j91fb78eg7e0228afe8f2fd36@mail.gmail.com

> If we remove the support for regular, non-bitmap, index scans with GIN, that
> could be cleaned up as well. Even if we don't do that, gingetbitmap should
> not error when the bitmap becomes lossy, but just return the lossy bitmap.

Make sense to me.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN, partial matches, lossy bitmaps
Date: 2009-03-03 02:03:24
Message-ID: 603c8f070903021803y145b1433vfd9a73ee45edffaa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 2, 2009 at 3:02 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Mon, 2009-03-02 at 21:14 +0200, Heikki Linnakangas wrote:
>> If I'm reading the code correctly, item pointers of all matching heap
>> tuples are first collected into a TIDBitmap, and then amgetnext returns
>> tuples from that one by one. If the bitmap becomes lossy, an error is
>> thrown. gingetbitmap is a dummy implementation: it creates a new
>> TIDBitmap and inserts all the tuples from the other TIDBitmap into it
>> one by one, and then returns the new TIDBitmap.
>
> Do you think that might be the cause of the extra startup overhead that
> Robert Haas observed for bitmap scans?

I don't think this is the same thing. My point was that an index scan
wins big time over a bitmap index scan when the index scan doesn't
need to be run to completion - that is, when the query is a semi-join
or an anti-join, or when using LIMIT without ORDER BY.
This is true with or without Teodor's patch, and is the reason why I'm
not sure that removing index scan support is such a great idea.

...Robert


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN, partial matches, lossy bitmaps
Date: 2009-03-05 13:58:23
Message-ID: 49AFDA7F.2000707@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> If we remove the support for regular, non-bitmap, index scans with GIN,
> that could be cleaned up as well. Even if we don't do that, gingetbitmap
> should not error when the bitmap becomes lossy, but just return the
> lossy bitmap.
Changes since 28.2
(http://archives.postgresql.org/message-id/499B0FFA.8040608@sigaev.ru)

- fixes/changes pointed by Robert
(http://archives.postgresql.org/pgsql-hackers/2009-02/msg00987.php)
- gingetbitmap will never throw error about lossiness of bitmap, it will return
lossy bitmap even it was a prefix search.
- remove tbm_check_tuple/tbm_has_lossy/tbm_max_non_lossy methods because they
become unused
- add new method tbm_add_page(TIDBitmap*, BlockNumber) to add the whole page to
the TIDBitmap.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
fast_insert_gin-0.29.2.gz application/x-tar 26.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN, partial matches, lossy bitmaps
Date: 2009-03-05 23:35:24
Message-ID: 17706.1236296124@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> Changes since 28.2
> (http://archives.postgresql.org/message-id/499B0FFA.8040608@sigaev.ru)

> - fixes/changes pointed by Robert
> (http://archives.postgresql.org/pgsql-hackers/2009-02/msg00987.php)
> - gingetbitmap will never throw error about lossiness of bitmap, it will return
> lossy bitmap even it was a prefix search.
> - remove tbm_check_tuple/tbm_has_lossy/tbm_max_non_lossy methods because they
> become unused
> - add new method tbm_add_page(TIDBitmap*, BlockNumber) to add the whole page to
> the TIDBitmap.

I cleaned up and applied the planner part of this, since that seems
reasonably useful in its own right for experimental index AMs,
regardless of where we settle out for GIN. (The "cleanup" mostly
consisted of fixing it to not make extra calls to find_usable_indexes
--- that's an expensive function, and there's no very good reason to
run it another time rather than separating out the indexes afterwards.)

Attached is the remainder of the patch with relatively minor fixes.
The main change I made is to get rid of the changes in gincostestimate;
I agree with Robert that it's probably inappropriate to consider the
current pending-list size during planning. I haven't really reviewed
any of the rest of it; this is just to have a clean patch against HEAD.

regards, tom lane

Attachment Content-Type Size
fast_insert_gin-0.30.gz application/octet-stream 22.1 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN, partial matches, lossy bitmaps
Date: 2009-03-06 02:50:45
Message-ID: 603c8f070903051850u5c5460dai1b90a6cae175e855@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 5, 2009 at 6:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Attached is the remainder of the patch with relatively minor fixes.
> The main change I made is to get rid of the changes in gincostestimate;
> I agree with Robert that it's probably inappropriate to consider the
> current pending-list size during planning.  I haven't really reviewed
> any of the rest of it; this is just to have a clean patch against HEAD.

The changes to config.sgml are not good English and contain
typographical errors. It could also be a bit more informatiave, maybe
something like:

This parameter also specifies the number of insert or updated tuples
needed to trigger <command>VACUUM</> on a <acronym>GIN</acronym>
index. <acronym>GIN</acronym> indexes require <command>VACUUM</>
after insert or update operations because newly inserted tuples are
initially stored in an unsorted pending list.

I still think removing index scans entirely is short-sighted - but I
may be outvoted (then again, no one other than Tom has really
expressed an opinion one way or the other, and I initially agreed with
him until I thought about the performance aspects some more).

...Robert


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN, partial matches, lossy bitmaps
Date: 2009-03-13 11:56:18
Message-ID: Pine.LNX.4.64.0903131437160.31919@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 5 Mar 2009, Robert Haas wrote:

> On Thu, Mar 5, 2009 at 6:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Attached is the remainder of the patch with relatively minor fixes.
>> The main change I made is to get rid of the changes in gincostestimate;
>> I agree with Robert that it's probably inappropriate to consider the
>> current pending-list size during planning.  I haven't really reviewed
>> any of the rest of it; this is just to have a clean patch against HEAD.
>
> The changes to config.sgml are not good English and contain
> typographical errors. It could also be a bit more informatiave, maybe
> something like:
>
> This parameter also specifies the number of insert or updated tuples
> needed to trigger <command>VACUUM</> on a <acronym>GIN</acronym>
> index. <acronym>GIN</acronym> indexes require <command>VACUUM</>
> after insert or update operations because newly inserted tuples are
> initially stored in an unsorted pending list.

thanks, will update docs.

>
> I still think removing index scans entirely is short-sighted - but I
> may be outvoted (then again, no one other than Tom has really
> expressed an opinion one way or the other, and I initially agreed with
> him until I thought about the performance aspects some more).

I'm also wonder if we're on the right way, since the only serious
issue with indexscan was possible problem with slaves, but read-only slaves
delayed to 8.5, so this is not an issue now. In 8.5 development cycle we'll
certainly return to this issue, so why do we disable index scan for 8.4 ?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN, partial matches, lossy bitmaps
Date: 2009-03-13 12:25:30
Message-ID: 25428.1236947130@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
> I'm also wonder if we're on the right way, since the only serious
> issue with indexscan was possible problem with slaves, but read-only slaves
> delayed to 8.5, so this is not an issue now. In 8.5 development cycle we'll
> certainly return to this issue, so why do we disable index scan for 8.4 ?

So that we have a trouble-free feature in 8.4. I have no confidence
in the solution that was proposed, and am more than willing to accept
the loss of plain-indexscan support to avoid the risk of worse bugs.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN, partial matches, lossy bitmaps
Date: 2009-03-23 15:38:05
Message-ID: 15127.1237822685@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Attached is the remainder of the patch with relatively minor fixes.
> The main change I made is to get rid of the changes in gincostestimate;
> I agree with Robert that it's probably inappropriate to consider the
> current pending-list size during planning. I haven't really reviewed
> any of the rest of it; this is just to have a clean patch against HEAD.

Is that patch (fast_insert_gin-0.30.gz) still the latest version of
the fast-insert patch? Has anyone else done any further work?

regards, tom lane