Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)

Lists: pgsql-hackerspgsql-patches
From: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
To: pgsql-patches(at)postgresql(dot)org
Subject: Updated bitmap index patch
Date: 2007-05-03 03:51:29
Message-ID: Pine.LNX.4.58.0705031335240.10105@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi all,

Attached is an updated bitmap index patch. It contains bug fixes, API
changes, binary changes (page identifier to distinguish it from other
indexes) and has been brought up to HEAD.

I worked on a few approaches to VACUUM, none very satisfactory. The
problem is, breaking a compressed word representing matches can have
serious consequences -- at the least, creation of new words, at the worst,
creation of a new page. If a lot of this were to happen, REINDEX would be
much more efficient (this is what earlier patches did).

One approach I looked at was modifying the existing read API to be able to
do something like "kill prior tuple". This, I think, made the API quite
complex and it was hard to implement, since the existing mechanism
decompresses words on the fly and it would be hard to identify which TID
is no longer a match. So, I dropped this idea pretty quickly.

The second approach is to just manually traverse each vector and change
matches to non-matches where necessary. The complexity then is in managing
the consequences of breaking compressed words, doing WAL (efficiently) and
calculating free space. I've only partially implemented this approach. At
this stage, I don't have time to finish it due to other commitments.

Thanks,

Gavin

Attachment Content-Type Size
bitmap-2007-05-03.diff.gz application/x-gzip 82.7 KB

From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-05-03 04:56:59
Message-ID: 46396B9B.1090007@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.
>

I have applied this to todays HEAD performed some quick tests - looks
good! I have to re-create a TPC-H dataset to test one of the previous
bugs, so I'll probably look at that tomorrow or so.

> I worked on a few approaches to VACUUM, none very satisfactory. The
> problem is, breaking a compressed word representing matches can have
> serious consequences -- at the least, creation of new words, at the worst,
> creation of a new page. If a lot of this were to happen, REINDEX would be
> much more efficient (this is what earlier patches did).
>
> One approach I looked at was modifying the existing read API to be able to
> do something like "kill prior tuple". This, I think, made the API quite
> complex and it was hard to implement, since the existing mechanism
> decompresses words on the fly and it would be hard to identify which TID
> is no longer a match. So, I dropped this idea pretty quickly.
>
> The second approach is to just manually traverse each vector and change
> matches to non-matches where necessary. The complexity then is in managing
> the consequences of breaking compressed words, doing WAL (efficiently) and
> calculating free space. I've only partially implemented this approach. At
> this stage, I don't have time to finish it due to other commitments.
>

The second approach seems like better the way to go (as far as I
understand the issues...). How much work is remaining on this? - not
sure that I'll have time to look at it either ... but may as well know
the size to the job :-) !

Cheers

Mark


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-05-03 11:57:26
Message-ID: 200705031157.l43BvQx26436@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.
>
> I worked on a few approaches to VACUUM, none very satisfactory. The
> problem is, breaking a compressed word representing matches can have
> serious consequences -- at the least, creation of new words, at the worst,
> creation of a new page. If a lot of this were to happen, REINDEX would be
> much more efficient (this is what earlier patches did).
>
> One approach I looked at was modifying the existing read API to be able to
> do something like "kill prior tuple". This, I think, made the API quite
> complex and it was hard to implement, since the existing mechanism
> decompresses words on the fly and it would be hard to identify which TID
> is no longer a match. So, I dropped this idea pretty quickly.
>
> The second approach is to just manually traverse each vector and change
> matches to non-matches where necessary. The complexity then is in managing
> the consequences of breaking compressed words, doing WAL (efficiently) and
> calculating free space. I've only partially implemented this approach. At
> this stage, I don't have time to finish it due to other commitments.
>
> Thanks,
>
> Gavin
Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-05-04 06:55:19
Message-ID: 463AD8D7.9000402@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Mark Kirkwood wrote:

>
> I have applied this to todays HEAD performed some quick tests - looks
> good! I have to re-create a TPC-H dataset to test one of the previous
> bugs, so I'll probably look at that tomorrow or so.
>

The TPC-H query query that previously produced a SIGSEGV now runs and
gives the correct answer.

Cheers

Mark


From: Finlay Thompson <finlay(at)catalyst(dot)net(dot)nz>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-05-04 12:41:02
Message-ID: 1178282462.12996.5.camel@korora.catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi Gavin

Thanks for the new patch!

I ran some address matching on the patched code and have generated
another "ERROR: out of memory" problem.

The strange thing is that it runs over 150 queries without problem and
then crashes.

I have attached the logfile (well some of it).

If you want I can send you the schema as well....

Finlay

On Thu, 2007-05-03 at 13:51 +1000, Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.
>
> I worked on a few approaches to VACUUM, none very satisfactory. The
> problem is, breaking a compressed word representing matches can have
> serious consequences -- at the least, creation of new words, at the worst,
> creation of a new page. If a lot of this were to happen, REINDEX would be
> much more efficient (this is what earlier patches did).
>
> One approach I looked at was modifying the existing read API to be able to
> do something like "kill prior tuple". This, I think, made the API quite
> complex and it was hard to implement, since the existing mechanism
> decompresses words on the fly and it would be hard to identify which TID
> is no longer a match. So, I dropped this idea pretty quickly.
>
> The second approach is to just manually traverse each vector and change
> matches to non-matches where necessary. The complexity then is in managing
> the consequences of breaking compressed words, doing WAL (efficiently) and
> calculating free space. I've only partially implemented this approach. At
> this stage, I don't have time to finish it due to other commitments.
>
> Thanks,
>
> Gavin
> ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster

Attachment Content-Type Size
logfile text/plain 16.0 KB

From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Finlay Thompson" <finlay(at)catalyst(dot)net(dot)nz>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-05-04 18:15:36
Message-ID: C260C658.F416%jzhang@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Finlay,

Thanks for testing. If you can send me the schema, that would be great.

Thanks,
Jie

On 5/4/07 5:41 AM, "Finlay Thompson" <finlay(at)catalyst(dot)net(dot)nz> wrote:

> Hi Gavin
>
> Thanks for the new patch!
>
> I ran some address matching on the patched code and have generated
> another "ERROR: out of memory" problem.
>
> The strange thing is that it runs over 150 queries without problem and
> then crashes.
>
> I have attached the logfile (well some of it).
>
> If you want I can send you the schema as well....
>
>
> Finlay
>
>
> On Thu, 2007-05-03 at 13:51 +1000, Gavin Sherry wrote:
>> Hi all,
>>
>> Attached is an updated bitmap index patch. It contains bug fixes, API
>> changes, binary changes (page identifier to distinguish it from other
>> indexes) and has been brought up to HEAD.
>>
>> I worked on a few approaches to VACUUM, none very satisfactory. The
>> problem is, breaking a compressed word representing matches can have
>> serious consequences -- at the least, creation of new words, at the worst,
>> creation of a new page. If a lot of this were to happen, REINDEX would be
>> much more efficient (this is what earlier patches did).
>>
>> One approach I looked at was modifying the existing read API to be able to
>> do something like "kill prior tuple". This, I think, made the API quite
>> complex and it was hard to implement, since the existing mechanism
>> decompresses words on the fly and it would be hard to identify which TID
>> is no longer a match. So, I dropped this idea pretty quickly.
>>
>> The second approach is to just manually traverse each vector and change
>> matches to non-matches where necessary. The complexity then is in managing
>> the consequences of breaking compressed words, doing WAL (efficiently) and
>> calculating free space. I've only partially implemented this approach. At
>> this stage, I don't have time to finish it due to other commitments.
>>
>> Thanks,
>>
>> Gavin
>> ---------------------------(end of broadcast)--------------------------- TIP
>> 2: Don't 'kill -9' the postmaster
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-05-17 17:17:07
Message-ID: 20070517171707.GI28701@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.

Very minor nitpick: I accidentally noticed that the Makefile in this
patch uses the "depend" target. We don't use that anymore.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-05-17 21:35:01
Message-ID: 200705172135.l4HLZ1i11501@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Due to unfinished VACUUM:

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.
>
> I worked on a few approaches to VACUUM, none very satisfactory. The
> problem is, breaking a compressed word representing matches can have
> serious consequences -- at the least, creation of new words, at the worst,
> creation of a new page. If a lot of this were to happen, REINDEX would be
> much more efficient (this is what earlier patches did).
>
> One approach I looked at was modifying the existing read API to be able to
> do something like "kill prior tuple". This, I think, made the API quite
> complex and it was hard to implement, since the existing mechanism
> decompresses words on the fly and it would be hard to identify which TID
> is no longer a match. So, I dropped this idea pretty quickly.
>
> The second approach is to just manually traverse each vector and change
> matches to non-matches where necessary. The complexity then is in managing
> the consequences of breaking compressed words, doing WAL (efficiently) and
> calculating free space. I've only partially implemented this approach. At
> this stage, I don't have time to finish it due to other commitments.
>
> Thanks,
>
> Gavin
Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Gavin Sherry <swm(at)alcove(dot)com(dot)au>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-05-17 22:20:27
Message-ID: 20070517222027.GU28701@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian wrote:
>
> Due to unfinished VACUUM:
>
> This has been saved for the 8.4 release:
>
> http://momjian.postgresql.org/cgi-bin/pgpatches_hold

While we're at this, let's consider Heikki's patch for the streaming
indexscan API stuff. That patch was supposed to come from the bitmap
index patch, but it was also supposed to help the Grouped Index Tuples
(GIT) patch.

In fact, as far as I understood the discussion, GIT would be helped by
the streaming indexscan API patch, because it would help de-uglify
certain parts of that patch. It is my impression that we would not want
to commit the ugly GIT patch; so it would mean that either we commit the
streaming indexscan patch first, and a beautiful GIT patch shortly later,
or we don't commit any of them.

So, the question is: do we want the GIT patch in 8.3? If we do, then we
need the streaming indexscan API patch. And thus we would need an
updated (beautiful) GIT patch.

Is there a resolution on this? Alexey Kluykin and I would be interested
in helping review this set of patches, if it is decided that they are
wanted for 8.3.

I want to point out that the GIT patch is one of the things sitting in
the reviewing queue from very early.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Gavin Sherry <swm(at)alcove(dot)com(dot)au>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-05-17 22:40:05
Message-ID: 200705172240.l4HMe5629468@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


I though the stream bitmaps was a cleanup of on-disk bitmaps, but I
think you are right that it was for GII too:

http://archives.postgresql.org/pgsql-patches/2007-03/msg00163.php

Heikki, can you clarify this and send us an updated version?

---------------------------------------------------------------------------

Alvaro Herrera wrote:
> Bruce Momjian wrote:
> >
> > Due to unfinished VACUUM:
> >
> > This has been saved for the 8.4 release:
> >
> > http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>
> While we're at this, let's consider Heikki's patch for the streaming
> indexscan API stuff. That patch was supposed to come from the bitmap
> index patch, but it was also supposed to help the Grouped Index Tuples
> (GIT) patch.
>
> In fact, as far as I understood the discussion, GIT would be helped by
> the streaming indexscan API patch, because it would help de-uglify
> certain parts of that patch. It is my impression that we would not want
> to commit the ugly GIT patch; so it would mean that either we commit the
> streaming indexscan patch first, and a beautiful GIT patch shortly later,
> or we don't commit any of them.
>
> So, the question is: do we want the GIT patch in 8.3? If we do, then we
> need the streaming indexscan API patch. And thus we would need an
> updated (beautiful) GIT patch.
>
> Is there a resolution on this? Alexey Kluykin and I would be interested
> in helping review this set of patches, if it is decided that they are
> wanted for 8.3.
>
> I want to point out that the GIT patch is one of the things sitting in
> the reviewing queue from very early.
>
> --
> Alvaro Herrera http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Sherry <swm(at)alcove(dot)com(dot)au>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-05-18 09:47:19
Message-ID: 464D7627.60807@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> While we're at this, let's consider Heikki's patch for the streaming
> indexscan API stuff. That patch was supposed to come from the bitmap
> index patch, but it was also supposed to help the Grouped Index Tuples
> (GIT) patch.
>
> In fact, as far as I understood the discussion, GIT would be helped by
> the streaming indexscan API patch, because it would help de-uglify
> certain parts of that patch. It is my impression that we would not want
> to commit the ugly GIT patch; so it would mean that either we commit the
> streaming indexscan patch first, and a beautiful GIT patch shortly later,
> or we don't commit any of them.

Right, for GIT the relevant part of the patch is the support for
returning candidate matches.

However, that's just one source of ugliness in the GIT patch. The
amgettuple interface needs to be changed as well, here's the proposal
for that:

http://archives.postgresql.org/pgsql-hackers/2007-03/msg01079.php

The internals aren't that beautiful either, but I wanted to start from
the API.

> So, the question is: do we want the GIT patch in 8.3? If we do, then we
> need the streaming indexscan API patch. And thus we would need an
> updated (beautiful) GIT patch.
>
> Is there a resolution on this? Alexey Kluykin and I would be interested
> in helping review this set of patches, if it is decided that they are
> wanted for 8.3.

Thanks!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-06-01 14:38:48
Message-ID: 46602F78.9050502@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gavin Sherry wrote:
> Hi all,
>
> Attached is an updated bitmap index patch. It contains bug fixes, API
> changes, binary changes (page identifier to distinguish it from other
> indexes) and has been brought up to HEAD.
>
> I worked on a few approaches to VACUUM, none very satisfactory. The
> problem is, breaking a compressed word representing matches can have
> serious consequences -- at the least, creation of new words, at the worst,
> creation of a new page. If a lot of this were to happen, REINDEX would be
> much more efficient (this is what earlier patches did).
>
> One approach I looked at was modifying the existing read API to be able to
> do something like "kill prior tuple". This, I think, made the API quite
> complex and it was hard to implement, since the existing mechanism
> decompresses words on the fly and it would be hard to identify which TID
> is no longer a match. So, I dropped this idea pretty quickly.
>
> The second approach is to just manually traverse each vector and change
> matches to non-matches where necessary. The complexity then is in managing
> the consequences of breaking compressed words, doing WAL (efficiently) and
> calculating free space. I've only partially implemented this approach. At
> this stage, I don't have time to finish it due to other commitments.
>
>
>

What exactly is the state of this patch? Reading this email it looks to
me like something that should wait for 8.4. Or is there some part of it
that is ready for 8.3? If so, which part?

cheers

andrew


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Updated bitmap index patch
Date: 2007-07-21 11:20:24
Message-ID: 1185016824.4284.48.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 2007-05-17 at 18:20 -0400, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> >
> > Due to unfinished VACUUM:
> >
> > This has been saved for the 8.4 release:
> >
> > http://momjian.postgresql.org/cgi-bin/pgpatches_hold
>
> While we're at this, let's consider Heikki's patch for the streaming
> indexscan API stuff. That patch was supposed to come from the bitmap
> index patch, but it was also supposed to help the Grouped Index Tuples
> (GIT) patch.
>
> In fact, as far as I understood the discussion, GIT would be helped by
> the streaming indexscan API patch, because it would help de-uglify
> certain parts of that patch. It is my impression that we would not want
> to commit the ugly GIT patch; so it would mean that either we commit the
> streaming indexscan patch first, and a beautiful GIT patch shortly later,
> or we don't commit any of them.
>
> So, the question is: do we want the GIT patch in 8.3? If we do, then we
> need the streaming indexscan API patch. And thus we would need an
> updated (beautiful) GIT patch.
>
> Is there a resolution on this? Alexey Kluykin and I would be interested
> in helping review this set of patches, if it is decided that they are
> wanted for 8.3.
>
> I want to point out that the GIT patch is one of the things sitting in
> the reviewing queue from very early.

Alvaro,

As you note above there is some linkage between bit map indexes and
clustered indexes, so it seems like we'll either get both or neither.

I notice the GIT patch is being listed as under review by Alexey and
yourself. Are you actively working on this, or is anyone else planning
on working on this review?

I'd like to help where I can if nobody else is currently doing this. I
would focus initially on some analysis of the various use cases to give
a better view on what we would need B-tree, clustered indexes and bitmap
indexes to do for us.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Gavin Sherry <swm(at)alcove(dot)com(dot)au>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Updated bitmap index patch
Date: 2007-07-23 17:16:14
Message-ID: 20070723171614.GM2540@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Simon Riggs wrote:

> Alvaro,
>
> As you note above there is some linkage between bit map indexes and
> clustered indexes, so it seems like we'll either get both or neither.
>
> I notice the GIT patch is being listed as under review by Alexey and
> yourself. Are you actively working on this, or is anyone else planning
> on working on this review?

Sorry, no, I'm currently stuck elsewhere.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
Date: 2007-07-23 21:02:31
Message-ID: 1185224551.4284.373.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sat, 2007-07-21 at 12:20 +0100, Simon Riggs wrote:

> I'd like to help where I can if nobody else is currently doing this. I
> would focus initially on some analysis of the various use cases to give
> a better view on what we would need B-tree, clustered indexes and bitmap
> indexes to do for us.

I've done some further analysis of bitmap indexes in preparation for a
comparison with clustered indexes (GIT), to help understand the use
cases for each.

Overall, my conclusion is that BMI and GIT have separate use cases,
almost opposite use cases or at least orthogonal ones. I would
eventually like both. BMI optimises for high numbers of rows per value,
whilst GIT optimises for clustering of values. BMI is not useful at all
for PKs, whilst GIT is specifically designed to handle them. Both handle
INSERTs well, though GIT handles growing numbers of values easily, BMI
prefers to keep the distribution more constant. GIT needs HOT to
continue to operate effectively for long periods, whereas BMI doesn't
seem to handle UPDATEs well at all (but more testing required on that
one).

---

Neither the latest bitmap index nor the latest GIT patch applied
cleanly. The bitmap patch was close, but GIT needs an update yet to
integrate Alexey's recent work.

My test case was a table with 10 million rows, with columns with varying
numbers of unique values. So Ndistinct = 100 means 100,000 rows per
value.

BITMAP INDEXES

Ndistinct Best time Size in blocks
1 10.6s 100
10 10.4s 102
100 11.7s 2002
1000 15.1s 6006
10000 19.8s 10046
100000 82.1s 100442
1000000 - >450000

Size exactly equivalent for both Integer and Text (same values). Build
time was similar also.

The test for 1 million distinct values didn't return after over 4 CPU
minutes expended with the disk going crazy. After a number of minutes I
decided to cancel the index build. Multiple cancels didn't stop the
build, so after some more time I decided to kill it, which then crashed
the server. Automatic restart crashed as well with a "could not find
transaction id 0" error. Clearly some WAL-weirdness to investigate...

Overall, I'd have to say that's quite enough for me to say bitmap is not
quite ready yet without clear health warnings. I had hopes...

B-TREE INDEXES (Integers)

Rows/value Best time Size in blocks
10000000 49s 21899
1000000 49s 21899
100000 49s 21899
10000 47s 21899
1000 43s 21899
100 38s 21899
10 38s 21899
1 33s 21899

Build time for Integers shown. Build time for Text ~x5-6 times as long.

Testing against equivalent b-tree builds, the fastest b-tree build I
could get was 33s on a unique integer index. So BMI build time is
certainly optimised for low numbers of distinct values, but doesn't have
any optimisation for when the BMI is built on a poor candidate column.
GIT does degrade down to a normal b-tree when clustering isn't
sufficient to give reduction in index size.

The cross-over point was between 10^4 and 10^5 distinct values for both
size and build time; on that test around 100-1000 rows per value. So
BMIs are probably still useful with varying number of rows per value,
but overall high Ndistinct proves inefficient in both build time and
space allocation. This isn't such a surprise since we know that b-tree
build uses a sort-based plan whereas BMI uses a hash based plan; neither
will win all of the time, we know that from the executor.

GIT works well even with unique indexes, since each grouped tuple covers
a range of values. I'll re-run the tests when I can to get timings. GIT
can compress typically down to 1-5% with clustered data, not quite as
good as bitmap's 0.5% best.

GIT's design was to have an index that was tuned for clustered data, yet
degrades cleanly to a standard b-tree when conditions are not right.
This makes me think that a hybrid b-tree should be possible, even
desirable. When the data is clustered, use the grouping technique to
reduce he number of tuples stored and when the data is highly non-unique
use the bitmap technique to reduce numbers of tuples. Using both
techniques in the same index would offer even wider flexibility, since
we'd be able to cater for real-world data more easily. Both GIT and BMI
use bitmaps, just in different ways.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
Date: 2007-07-23 21:19:28
Message-ID: 15822.1185225568@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> ... BMI is not useful at all
> for PKs, whilst GIT is specifically designed to handle them.

This seems a strange statement, because GIT doesn't look particularly
efficient for unique indexes AFAICS. In the worst case you'd have to
look individually at each tuple on a heap page to check for uniqueness
conflict (no binary search, because you couldn't assume they are
ordered).

> B-TREE INDEXES (Integers)

> Rows/value Best time Size in blocks
> 10000000 49s 21899
> 1000000 49s 21899
> 100000 49s 21899
> 10000 47s 21899
> 1000 43s 21899
> 100 38s 21899
> 10 38s 21899
> 1 33s 21899

Surely the GIT code failed to kick in at all here? That's just about
exactly the index size I'd expect for 10 million integers with the
existing btree code (at least when MAXALIGN=4).

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing new index types (was Re: [PATCHES] Updatedbitmap indexpatch)
Date: 2007-07-23 22:11:55
Message-ID: 1185228715.4284.433.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 2007-07-23 at 17:19 -0400, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > ... BMI is not useful at all
> > for PKs, whilst GIT is specifically designed to handle them.
>
> This seems a strange statement, because GIT doesn't look particularly
> efficient for unique indexes AFAICS. In the worst case you'd have to
> look individually at each tuple on a heap page to check for uniqueness
> conflict (no binary search, because you couldn't assume they are
> ordered).

That is one of a few heuristics about the patch that need some active
discussion, so I'm glad you asked.

The main use case is nearly-unique, so for cases where we have a
Master:Detail relationship, e.g. Order:OrderItem. The Order index is a
PK, with the OrderItem index as a nearly unique key. The index is not
brilliant for the Order index, but is good for the OrderItem index.

Heikki designed the grouping so that there is a state change between
non-grouped and non-grouped (normal) index entries. By default the patch
uses a threshold of non-grouped -> grouped at N=2 index entries and then
no limit on the number of rows/block. Currently you can tune N, but we
might also envisage setting a limit on the width of the range of values
to limit the number of tids stored in a grouped index entry. That could
control the uniqueness overhead.

On an I/O bound workload the space saving on the index outweighs the CPU
loss from uniqueness checking. When I/O is not an issue then
unfortunately there is a CPU overhead.

For GIT it would appear that the summary is that it gives a slight loss
on medium sized PK indexes and an increasing win as index size
increases. We struggled to come up with ways of making it Just Work with
as few parameters as possible.

> > B-TREE INDEXES (Integers)
>
> > Rows/value Best time Size in blocks
> > 10000000 49s 21899
> > 1000000 49s 21899
> > 100000 49s 21899
> > 10000 47s 21899
> > 1000 43s 21899
> > 100 38s 21899
> > 10 38s 21899
> > 1 33s 21899
>
> Surely the GIT code failed to kick in at all here? That's just about
> exactly the index size I'd expect for 10 million integers with the
> existing btree code (at least when MAXALIGN=4).

That was the b-tree test, i.e. the control. The GIT patch has bitrot, so
not able to test just yet.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing new index types (was Re: [PATCHES]Updatedbitmap indexpatch)
Date: 2007-07-24 06:41:39
Message-ID: 1185259299.4263.42.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 2007-07-23 at 23:11 +0100, Simon Riggs wrote:
> On Mon, 2007-07-23 at 17:19 -0400, Tom Lane wrote:
> > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > > ... BMI is not useful at all
> > > for PKs, whilst GIT is specifically designed to handle them.
> >
> > This seems a strange statement, because GIT doesn't look particularly
> > efficient for unique indexes AFAICS. In the worst case you'd have to
> > look individually at each tuple on a heap page to check for uniqueness
> > conflict (no binary search, because you couldn't assume they are
> > ordered).
>
> That is one of a few heuristics about the patch that need some active
> discussion, so I'm glad you asked.
>
> The main use case is nearly-unique, so for cases where we have a
> Master:Detail relationship, e.g. Order:OrderItem. The Order index is a
> PK, with the OrderItem index as a nearly unique key. The index is not
> brilliant for the Order index, but is good for the OrderItem index.
>
> Heikki designed the grouping so that there is a state change between
> non-grouped and non-grouped (normal) index entries. By default the patch
> uses a threshold of non-grouped -> grouped at N=2 index entries and then
> no limit on the number of rows/block. Currently you can tune N, but we
> might also envisage setting a limit on the width of the range of values
> to limit the number of tids stored in a grouped index entry. That could
> control the uniqueness overhead.

Possibly Heikki might add more here, but it occurs to me that I didn't
mention two other things about uniqueness checking.

The state change occurs when the block fills, so up to that point all
the index entries are separate, so no additional uniqueness checking
cost. When the state change does occur the highest value is always left
as a singleton index entry, again to speed uniqueness checking. This
copes with INSERTs, since the dominant use case is to have a
similar-to-the-last-high-value or increasing key (for PKs).

Lastly, GIT is designed to work in conjunction with HOT. When doing HOT
updates there are no index insertions, so far fewer uniqueness checks
need to be performed anyway.

So overall, GIT is reasonably well suited to unique indexes. But I think
you can see that these behaviours influence the performance
considerably, even though they are just small parts of the patch.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
Date: 2007-07-24 14:10:11
Message-ID: 46A60843.2010308@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>> ... BMI is not useful at all
>> for PKs, whilst GIT is specifically designed to handle them.
>
> This seems a strange statement, because GIT doesn't look particularly
> efficient for unique indexes AFAICS. In the worst case you'd have to
> look individually at each tuple on a heap page to check for uniqueness
> conflict (no binary search, because you couldn't assume they are
> ordered).

It handles them in the sense that a clustered PK index is way smaller
than a normal PK index. Unlike the bitmap index, which is not suitable
for highly distinct columns.

Inserting and performing a uniqueness check is more expensive on a
clustered index, because as you said it needs to scan the heap page
looking for conflicts. It's alleviated by the heuristics Simon
mentioned; a page is "groupified" when only when it gets full, which
means there'll usually be a mixture of normal and groupified tuples on a
leaf page. In particular, if there's hot key values that are repeatedly
inserted, the index tuples corresponding those key values are likely to
stay as normal index tuples, and are therefore cheaper to check
uniqueness against.

Also IIRC, the patch tries to keep the last index tuple on a page as a
normal index tuple, which catches the important special case of
inserting monotonically increasing keys, like with a sequence-generated PK.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reviewing new index types (was Re: [PATCHES] Updated bitmap indexpatch)
Date: 2007-09-26 08:36:30
Message-ID: 200709260836.l8Q8aUd11470@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Sat, 2007-07-21 at 12:20 +0100, Simon Riggs wrote:
>
> > I'd like to help where I can if nobody else is currently doing this. I
> > would focus initially on some analysis of the various use cases to give
> > a better view on what we would need B-tree, clustered indexes and bitmap
> > indexes to do for us.
>
> I've done some further analysis of bitmap indexes in preparation for a
> comparison with clustered indexes (GIT), to help understand the use
> cases for each.
>
> Overall, my conclusion is that BMI and GIT have separate use cases,
> almost opposite use cases or at least orthogonal ones. I would
> eventually like both. BMI optimises for high numbers of rows per value,
> whilst GIT optimises for clustering of values. BMI is not useful at all
> for PKs, whilst GIT is specifically designed to handle them. Both handle
> INSERTs well, though GIT handles growing numbers of values easily, BMI
> prefers to keep the distribution more constant. GIT needs HOT to
> continue to operate effectively for long periods, whereas BMI doesn't
> seem to handle UPDATEs well at all (but more testing required on that
> one).
>
> ---
>
> Neither the latest bitmap index nor the latest GIT patch applied
> cleanly. The bitmap patch was close, but GIT needs an update yet to
> integrate Alexey's recent work.
>
> My test case was a table with 10 million rows, with columns with varying
> numbers of unique values. So Ndistinct = 100 means 100,000 rows per
> value.
>
> BITMAP INDEXES
>
> Ndistinct Best time Size in blocks
> 1 10.6s 100
> 10 10.4s 102
> 100 11.7s 2002
> 1000 15.1s 6006
> 10000 19.8s 10046
> 100000 82.1s 100442
> 1000000 - >450000
>
> Size exactly equivalent for both Integer and Text (same values). Build
> time was similar also.
>
> The test for 1 million distinct values didn't return after over 4 CPU
> minutes expended with the disk going crazy. After a number of minutes I
> decided to cancel the index build. Multiple cancels didn't stop the
> build, so after some more time I decided to kill it, which then crashed
> the server. Automatic restart crashed as well with a "could not find
> transaction id 0" error. Clearly some WAL-weirdness to investigate...
>
> Overall, I'd have to say that's quite enough for me to say bitmap is not
> quite ready yet without clear health warnings. I had hopes...
>
> B-TREE INDEXES (Integers)
>
> Rows/value Best time Size in blocks
> 10000000 49s 21899
> 1000000 49s 21899
> 100000 49s 21899
> 10000 47s 21899
> 1000 43s 21899
> 100 38s 21899
> 10 38s 21899
> 1 33s 21899
>
> Build time for Integers shown. Build time for Text ~x5-6 times as long.
>
> Testing against equivalent b-tree builds, the fastest b-tree build I
> could get was 33s on a unique integer index. So BMI build time is
> certainly optimised for low numbers of distinct values, but doesn't have
> any optimisation for when the BMI is built on a poor candidate column.
> GIT does degrade down to a normal b-tree when clustering isn't
> sufficient to give reduction in index size.
>
> The cross-over point was between 10^4 and 10^5 distinct values for both
> size and build time; on that test around 100-1000 rows per value. So
> BMIs are probably still useful with varying number of rows per value,
> but overall high Ndistinct proves inefficient in both build time and
> space allocation. This isn't such a surprise since we know that b-tree
> build uses a sort-based plan whereas BMI uses a hash based plan; neither
> will win all of the time, we know that from the executor.
>
> GIT works well even with unique indexes, since each grouped tuple covers
> a range of values. I'll re-run the tests when I can to get timings. GIT
> can compress typically down to 1-5% with clustered data, not quite as
> good as bitmap's 0.5% best.
>
> GIT's design was to have an index that was tuned for clustered data, yet
> degrades cleanly to a standard b-tree when conditions are not right.
> This makes me think that a hybrid b-tree should be possible, even
> desirable. When the data is clustered, use the grouping technique to
> reduce he number of tuples stored and when the data is highly non-unique
> use the bitmap technique to reduce numbers of tuples. Using both
> techniques in the same index would offer even wider flexibility, since
> we'd be able to cater for real-world data more easily. Both GIT and BMI
> use bitmaps, just in different ways.
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +