Quick Links

WIP: bitmap indexes

Lists:	pgsql-hackerspgsql-patches

From:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To:	pgsql-patches(at)postgresql(dot)org
Subject:	WIP: bitmap indexes
Date:	2006-08-01 04:01:04
Message-ID:	Pine.LNX.4.58.0608011342001.1411@linuxworld.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Hi all,

Attached is an update to the patch implementing bitmap indexes Jie sent
last week.

This patch tidies up some coding style issues, the system catalogs, adds
some basic docs and regression tests, as well as additional
functionality.

There are still outstanding bugs and problems. These are:

a) The planner doesn't really know about bitmaps. The code cheats. As
such, bitmap index access is not costed correctly.

b) There is, as Tom pointed out, a lot of code duplication around
BitmapHeapNext(), MultiExecBitmapIndexScan() and related routines. This
needs to be tidied up and would probably benefit from Tom's proposal to
change the behaviour of amgetmulti.

c) Related to this is the fact that the current on-disk bitmap cannot
handle the ScalarArrayOpExpr optimisation that normal bitmap scans can.
(The patch introduces some regression tests for bitmaps and one of these
fails with an invalid row count. This displays the problem that needs to
be solved).

d) Also related to this, in() subqueries are causing us to hit some
uninitialised memory. I haven't had time to explore this but it is related
to the architectural issue above.

e) Jie is hunting down a bug in multi-column support.

f) I haven't tested concurrency

I will continue to send in matches as we we make progress on these issues.
Feed back, in particular on (a) and (b), are most welcome.

Thanks,

Attachment	Content-Type	Size
bitmap-2.diff	text/plain	252.8 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, "Jie Zhang" <jzhang(at)greenplum(dot)com>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-15 01:38:36
Message-ID:	27266.1155605916@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> Attached is an update to the patch implementing bitmap indexes Jie sent
> last week.

What's the current status of this patch ... has any work been done since
the first of the month?

I suppose the patch as given here no longer applies to HEAD, because of
recent changes in rmgr for Simon's restartable-recovery patch (should be
easy enough to fix) and consumption of OIDs by other patches (also easy
to fix, in principle, but doubtless tedious).

If you have to renumber OIDs I'd suggest starting at 3000 for your added
OIDs, to leave some daylight for any other patches that go in during the
next couple weeks.

regards, tom lane

From:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-15 01:50:11
Message-ID:	Pine.LNX.4.58.0608151147300.22006@linuxworld.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Mon, 14 Aug 2006, Tom Lane wrote:

> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > Attached is an update to the patch implementing bitmap indexes Jie sent
> > last week.
>
> What's the current status of this patch ... has any work been done since
> the first of the month?

Yes. I am tidying up the executor code at the moment so that the on
disk bitmap support is more or less hidden from the executor itself. and
Jie has been fixing multicolumn support and working on the planner. I've
also brought the code into line with PostgreSQL style, etc.

>
> I suppose the patch as given here no longer applies to HEAD, because of
> recent changes in rmgr for Simon's restartable-recovery patch (should be
> easy enough to fix) and consumption of OIDs by other patches (also easy
> to fix, in principle, but doubtless tedious).

Right.

> If you have to renumber OIDs I'd suggest starting at 3000 for your added
> OIDs, to leave some daylight for any other patches that go in during the
> next couple weeks.

Thanks.

I will post an updated patch in a few days time.

Gavin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc:	Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-15 02:20:25
Message-ID:	27537.1155608425@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> I will post an updated patch in a few days time.

OK. Do you want me to work on the discussed amgetmulti change, or would
that just be joggling your elbow at the moment?

regards, tom lane

From:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-15 02:36:13
Message-ID:	Pine.LNX.4.58.0608151223500.22335@linuxworld.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Mon, 14 Aug 2006, Tom Lane wrote:

> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > I will post an updated patch in a few days time.
>
> OK. Do you want me to work on the discussed amgetmulti change, or would
> that just be joggling your elbow at the moment?

Yes, that would be joggling ;).

The approach I am taking is more or less the one you proposed a few weeks
ago. Namely, to pass the bitmap object as an argument to amgetmulti and
have the routine fill it in as it sees fit.

There is some trickiness, however.

One of the main reasons for the uglification of the executor in Jie's
original patch was that she wanted to avoid the inefficiency of
translating the on disk bitmap representation to the TID bitmap
representation. If the plan calls for a straight on disk
bitmap read or AND/ORing together a few on-disk bitmaps this is justified.
If we're AND/ORing together an ondisk bitmap read and a TID bitmap scan of
a btree, we're in trouble. In this case, the existing code implements the
current semantics of amgetmulti.

What I am doing is treating the bitmap object as opaque, storing the data
in the format the AM wants, and providing a 'translate to TID bitmap'
routine (trivial for btree).

Do you have any thoughts on this?

Thanks,

Gavin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc:	Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-15 03:37:54
Message-ID:	27950.1155613074@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> One of the main reasons for the uglification of the executor in Jie's
> original patch was that she wanted to avoid the inefficiency of
> translating the on disk bitmap representation to the TID bitmap
> representation.

Offhand that seems like micro-optimization, at least as stated -- you
want to think about number of page fetches rather than exactly how a
bitmap is represented. I've still not got round to reading the patch
in detail, but after thinking about it in the shower on a few mornings,
it seems to me that the key concept we'd like to implement is that a
bitmap index can provide its data in a page-by-page fashion without the
overhead of fabricating the bitmap in-memory as btree forces us to do.
That is, the current implementation involves doing the whole index scan
and building a bitmap in memory, which we then read out page-wise for
the heap scan. The weak spot of this approach is that the in-memory
bitmap can get unreasonably large. With a bitmap index, you can pass
data back in a page-by-page fashion: here are the TID indexes to hit on
this page, then here are the TID indexes to hit on the next page, etc.
Correct me if I'm wrong, but isn't the patch's present hacking on the
executor intended to make it happen like this?

The main architectural idea I have at the moment is that this should all
be private between tidbitmap.c and the bitmap index AM. I think we
could perhaps have a flavor of tidbitmap that is "serial access" as
opposed to the present random-access, ie, instead of keeping the whole
bitmap in memory, you have a function pointer that you can call to
demand the next bitmap page in sequence. tidbitmap.c will need to
manage both kinds of bitmaps and be prepared to do ANDs and ORs across
both types, but just at the moment I see no showstopper reason why this
can't work.

Or is that exactly what you said already? It's late here and I'm a
bit tired...

regards, tom lane

From:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-15 04:49:41
Message-ID:	Pine.LNX.4.58.0608151414160.23379@linuxworld.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On Mon, 14 Aug 2006, Tom Lane wrote:

> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > One of the main reasons for the uglification of the executor in Jie's
> > original patch was that she wanted to avoid the inefficiency of
> > translating the on disk bitmap representation to the TID bitmap
> > representation.
>
> Offhand that seems like micro-optimization, at least as stated -- you
> want to think about number of page fetches rather than exactly how a
> bitmap is represented. I've still not got round to reading the patch
> in detail, but after thinking about it in the shower on a few mornings,
> it seems to me that the key concept we'd like to implement is that a
> bitmap index can provide its data in a page-by-page fashion without the
> overhead of fabricating the bitmap in-memory as btree forces us to do.
> That is, the current implementation involves doing the whole index scan
> and building a bitmap in memory, which we then read out page-wise for
> the heap scan. The weak spot of this approach is that the in-memory
> bitmap can get unreasonably large. With a bitmap index, you can pass
> data back in a page-by-page fashion: here are the TID indexes to hit on
> this page, then here are the TID indexes to hit on the next page, etc.
> Correct me if I'm wrong, but isn't the patch's present hacking on the
> executor intended to make it happen like this?

Not really. It reads ahead on the bitmap index and passes back the bitmap
words. The other executor routines are hacked up to process the data in
this format.

If I understand your idea correctly, we could change this to read, say, a
page of the index at a time, store this internally in the state object we
pass around, and we can then read out of this the TIDs on a given heap
page which match the query. Once we process all the bitmap data, we just
pull more.

>
> The main architectural idea I have at the moment is that this should all
> be private between tidbitmap.c and the bitmap index AM. I think we
> could perhaps have a flavor of tidbitmap that is "serial access" as
> opposed to the present random-access, ie, instead of keeping the whole
> bitmap in memory, you have a function pointer that you can call to
> demand the next bitmap page in sequence. tidbitmap.c will need to
> manage both kinds of bitmaps and be prepared to do ANDs and ORs across
> both types, but just at the moment I see no showstopper reason why this
> can't work.
>
> Or is that exactly what you said already? It's late here and I'm a
> bit tired...

This is better than what I had in mind. It seems to me that part of this
which will be a litle ugly is dealing with "key in (1,2,3,...)"
transformation. Let me think about it...

Thanks,

Gavin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc:	Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-15 13:18:49
Message-ID:	2273.1155647929@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> On Mon, 14 Aug 2006, Tom Lane wrote:
>> Correct me if I'm wrong, but isn't the patch's present hacking on the
>> executor intended to make it happen like this?

> Not really. It reads ahead on the bitmap index and passes back the bitmap
> words. The other executor routines are hacked up to process the data in
> this format.

Well, as I said, I don't think there's justification for exposing a
bitmap index's internal data formats to the rest of the system like
that: it's not very future-proof and I don't see that it's buying any
significant performance gain. At some point you have to convert to TIDs
anyway, at least in the sense of knowing what page and line number you
are at, so passing the data as an array of TIDs really isn't going to
hurt much. So my advice is to rip out all those changes and go back to
the existing tidbitmap.c readout API. There's nothing wrong with
the TBMIterateResult data structure.

What I do find interesting to think about is whether, strictly within
tidbitmap.c, there could be an alternate kind of bitmap object that
doesn't have to materialize the whole bitmap for an indexscan in memory
because it knows it can fetch the data on-demand, ie, build the next
page TBMIterateResult data structure on-the-fly from the index when it's
requested. Call it a "stream bitmap" in contrast to the present
"materialized bitmaps". The trick here is to be able to AND and OR a
stream bitmap with another stream bitmap or a materialized bitmap.
I don't see any reason in principle why that couldn't be done: the
output of the AND/OR would be a stream of TBMIterateResults just like
the inputs. That is, it's another stream bitmap, but with a different
generating function and some internal state that includes its source
bitmaps. You'd have to sort a materialized bitmap into order before
starting to AND/OR it with a stream bitmap, but that code is there
already.

You'd probably need to break the abstraction enough for nodeBitmapAnd
and nodeBitmapOr to know which kinds of bitmaps they are dealing with
and do things a little differently in different cases --- in particular,
the optimization nodeBitmapOr understands about how to "OR" things
together probably doesn't work for stream bitmaps. But I see no reason
to be changing nodeBitmapHeapscan.

> ... It seems to me that part of this
> which will be a litle ugly is dealing with "key in (1,2,3,...)"
> transformation. Let me think about it...

You're worrying about the wrong problem if you focus solely on
ScalarArrayOpExpr. The general problem you need to be solving is
"how do I AND and OR these bitmaps efficiently"? Once you've fixed
that, the ScalarArrayOpExpr code is just one user of that behavior.

regards, tom lane

From:	"Jie Zhang" <jzhang(at)greenplum(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-16 07:03:26
Message-ID:	C108114E.A55E%jzhang@greenplum.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On 8/15/06 6:18 AM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
>> On Mon, 14 Aug 2006, Tom Lane wrote:
>>> Correct me if I'm wrong, but isn't the patch's present hacking on the
>>> executor intended to make it happen like this?
>
>> Not really. It reads ahead on the bitmap index and passes back the bitmap
>> words. The other executor routines are hacked up to process the data in
>> this format.
>
> Well, as I said, I don't think there's justification for exposing a
> bitmap index's internal data formats to the rest of the system like
> that: it's not very future-proof and I don't see that it's buying any
> significant performance gain. At some point you have to convert to TIDs
> anyway, at least in the sense of knowing what page and line number you
> are at, so passing the data as an array of TIDs really isn't going to
> hurt much. So my advice is to rip out all those changes and go back to
> the existing tidbitmap.c readout API. There's nothing wrong with
> the TBMIterateResult data structure.
>

The bitmap words in the bitmap index are very simple and can be very
generic. You can think about them as one bit per tuple along with some
padding bits between heap pages. The problem I have is that I do not know a
good way to construct an in-memory version of this for other index
structures, like b-tree. To be able to handle both cases nicely, you are
right -- TBMIterateResult is better. Or, PagetableEntry may be better since
it will make AND/OR easier.

> What I do find interesting to think about is whether, strictly within
> tidbitmap.c, there could be an alternate kind of bitmap object that
> doesn't have to materialize the whole bitmap for an indexscan in memory
> because it knows it can fetch the data on-demand, ie, build the next
> page TBMIterateResult data structure on-the-fly from the index when it's
> requested. Call it a "stream bitmap" in contrast to the present
> "materialized bitmaps". The trick here is to be able to AND and OR a
> stream bitmap with another stream bitmap or a materialized bitmap.
> I don't see any reason in principle why that couldn't be done: the
> output of the AND/OR would be a stream of TBMIterateResults just like
> the inputs. That is, it's another stream bitmap, but with a different
> generating function and some internal state that includes its source
> bitmaps. You'd have to sort a materialized bitmap into order before
> starting to AND/OR it with a stream bitmap, but that code is there
> already.

I like this idea. I think that we can define a new TBMStatus to be
TBM_STREAM in TIDBitmap. *getmulti functions will remain the same, except
that we add a new returning bool argument, stating if this is a stream
bitmap. If this is a stream bitmap, nodeBitmapIndexScan simply fills spages,
and passes it upstream. When nodeBitmapAnd or nodeBitmapOr ANDs/ORs several
bitmaps, the result bitmap is a stream bitmap if there is at least one
bitmap is a stream bitmap. Then we add another loop in nodeBitmapHeapscan to
be able to pull more data from its subnode.

Thanks,
Jie

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jie Zhang" <jzhang(at)greenplum(dot)com>
Cc:	"Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-16 13:15:41
Message-ID:	13399.1155734141@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
> On 8/15/06 6:18 AM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, as I said, I don't think there's justification for exposing a
>> bitmap index's internal data formats to the rest of the system like
>> that: it's not very future-proof and I don't see that it's buying any
>> significant performance gain.

> The bitmap words in the bitmap index are very simple and can be very
> generic.

They're not generic in the least: there's a compression scheme involved
that you might want to whack around at any time. So I disagree with the
idea that it's OK to expose the format outside the access/bitmap/ module.

> I like this idea. I think that we can define a new TBMStatus to be
> TBM_STREAM in TIDBitmap.

It occurs to me that what tbm_begin_iterate really is is a constructor
for a stream bitmap object that reads out the contents of a tbm bitmap
(we need a decent name for the non-stream data structure ... maybe
hash bitmap?). If we think of it like that then we can unify the
ideas some more.

My proposal at this point would be to invent two different Node types,
one for stream bitmaps and one for hash bitmaps. The initial input to
nodeBitmapHeapscan can be either, but if it's given a hash bitmap then
it stream-ifies it for use. amgetmulti can return either kind, and
nodeBitmapAnd and nodeBitmapOr can use IsA tests to decide what to do.
Preserving the existing optimization for ORing hash bitmaps is a bit
tricky but I think it's doable. Consider this API for amgetmulti:

amgetmulti takes an argument which can be either a hash bitmap or NULL.
It returns an object that must be either a hash or stream bitmap.
If it wants to return a stream bitmap, it simply disregards the argument
and returns a constructed stream-bitmap object. If it wants to return
a hash bitmap, then if the argument is not NULL, OR the additional bits
into the argument object and return it; if the argument is NULL,
construct a fresh hash-bitmap object, set bits in it, and return it.

Assume that we have the existing hash-bitmap AND/OR functions as well as
constructors for AND and OR stream bitmaps that take lists of input
stream objects. Then the algorithm for nodeBitmapOr looks like this:

HashBitmap *hashBitmap = NULL;
List *streamBitmaps = NIL;

foreach(input plan)
{
Node *newBitmap = amgetmulti(hashBitmap);

if (IsA(newBitmap, HashBitmap))
{
// any OR-ing required was done implicitly
hashBitmap = newBitmap;
}
else
{
Assert(IsA(newBitmap, StreamBitmap));
streamBitmaps = lappend(streamBitmaps, newBitmap);
}
}

if (streamBitmaps == NIL)
{
// all inputs returned hash, so we're done
return hashBitmap;
}
else
{
// need a stream OR operation atop the inputs
if (hashBitmap)
streamBitmaps = lappend(streamBitmaps,
HashToStreamBitmap(hashBitmap));
return ConstructStreamOr(streamBitmaps);
}

nodeBitmapAnd is a bit different but not any harder.

regards, tom lane

From:	"Jie Zhang" <jzhang(at)greenplum(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-17 07:36:01
Message-ID:	C1096A71.A59F%jzhang@greenplum.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

> It occurs to me that what tbm_begin_iterate really is is a constructor
> for a stream bitmap object that reads out the contents of a tbm bitmap
> (we need a decent name for the non-stream data structure ... maybe
> hash bitmap?). If we think of it like that then we can unify the
> ideas some more.
>
> My proposal at this point would be to invent two different Node types,
> one for stream bitmaps and one for hash bitmaps. The initial input to
> nodeBitmapHeapscan can be either, but if it's given a hash bitmap then
> it stream-ifies it for use. amgetmulti can return either kind, and
> nodeBitmapAnd and nodeBitmapOr can use IsA tests to decide what to do.
> Preserving the existing optimization for ORing hash bitmaps is a bit
> tricky but I think it's doable. Consider this API for amgetmulti:
>
> amgetmulti takes an argument which can be either a hash bitmap or NULL.
> It returns an object that must be either a hash or stream bitmap.
> If it wants to return a stream bitmap, it simply disregards the argument
> and returns a constructed stream-bitmap object. If it wants to return
> a hash bitmap, then if the argument is not NULL, OR the additional bits
> into the argument object and return it; if the argument is NULL,
> construct a fresh hash-bitmap object, set bits in it, and return it.

This sounds great. One thing I am concern about is that this will add the
dependency of node types into the access methods. If we still keep
nodeBitmapIndexscan and let it do the bitmap construction for tids returned
by amgetmulti.

Otherwise, I think that I get a pretty good picture about how to do this.

Thanks,
Jie

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jie Zhang" <jzhang(at)greenplum(dot)com>
Cc:	"Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-17 12:54:00
Message-ID:	5097.1155819240@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
> This sounds great. One thing I am concern about is that this will add the
> dependency of node types into the access methods. If we still keep
> nodeBitmapIndexscan and let it do the bitmap construction for tids returned
> by amgetmulti.

No, I'm assuming the other proposal that was on the table, namely to get
rid of amgetmulti in its current form and instead have an AM call that
delivers a bitmap in one step. (Probably should rename the pg_am column
to something like amgetbitmap.) nodeBitmapIndexscan would become pretty
trivial. For the existing AMs this just means that they call
tbm_add_tuple(s) for themselves, which is no big problem, especially
considering that they probably get to save some code by not having to
stop the indexscan when the buffer array gets full.

regards, tom lane

From:	"Jie Zhang" <jzhang(at)greenplum(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-17 19:13:17
Message-ID:	C10A0DDD.A600%jzhang@greenplum.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On 8/17/06 5:54 AM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
>> This sounds great. One thing I am concern about is that this will add the
>> dependency of node types into the access methods. If we still keep
>> nodeBitmapIndexscan and let it do the bitmap construction for tids returned
>> by amgetmulti.
>
> No, I'm assuming the other proposal that was on the table, namely to get
> rid of amgetmulti in its current form and instead have an AM call that
> delivers a bitmap in one step. (Probably should rename the pg_am column
> to something like amgetbitmap.) nodeBitmapIndexscan would become pretty
> trivial. For the existing AMs this just means that they call
> tbm_add_tuple(s) for themselves, which is no big problem, especially
> considering that they probably get to save some code by not having to
> stop the indexscan when the buffer array gets full.

This sounds good. Another problem is about ScalarArrayOpExpr support in
current nodeBitmapIndexscan. This will not work for stream bitmaps. We have
to disable it in the optimizer.

Thanks,
Jie

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jie Zhang" <jzhang(at)greenplum(dot)com>
Cc:	"Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-17 19:29:23
Message-ID:	18654.1155842963@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

"Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
> This sounds good. Another problem is about ScalarArrayOpExpr support in
> current nodeBitmapIndexscan. This will not work for stream bitmaps.

Sure it will; it's just an OR.

> We have to disable it in the optimizer.

That's not happening ;-)

regards, tom lane

From:	"Jie Zhang" <jzhang(at)greenplum(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-08-17 19:42:20
Message-ID:	C10A14AC.A608%jzhang@greenplum.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

On 8/17/06 12:29 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
>> This sounds good. Another problem is about ScalarArrayOpExpr support in
>> current nodeBitmapIndexscan. This will not work for stream bitmaps.
>
> Sure it will; it's just an OR.
>

Yes, it is just an OR. But we have to introduce an OR logic in
nodeBitmapIndexscan node for stream bitmaps. We can't do what it is done
right now: scanning the underline index one key at a time. We have to scan
the underline index for all keys at the same time. It seems simply that we
just introduce another BitmapOr in the path.

Thanks,
Jie

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Jie Zhang <jzhang(at)greenplum(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [PATCHES] WIP: bitmap indexes
Date:	2006-09-05 22:30:06
Message-ID:	200609052230.k85MU6P03691@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers pgsql-patches

This has been saved for the 8.3 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------
Jie Zhang wrote:
>
>
> On 8/15/06 6:18 AM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> >> On Mon, 14 Aug 2006, Tom Lane wrote:
> >>> Correct me if I'm wrong, but isn't the patch's present hacking on the
> >>> executor intended to make it happen like this?
> >
> >> Not really. It reads ahead on the bitmap index and passes back the bitmap
> >> words. The other executor routines are hacked up to process the data in
> >> this format.
> >
> > Well, as I said, I don't think there's justification for exposing a
> > bitmap index's internal data formats to the rest of the system like
> > that: it's not very future-proof and I don't see that it's buying any
> > significant performance gain. At some point you have to convert to TIDs
> > anyway, at least in the sense of knowing what page and line number you
> > are at, so passing the data as an array of TIDs really isn't going to
> > hurt much. So my advice is to rip out all those changes and go back to
> > the existing tidbitmap.c readout API. There's nothing wrong with
> > the TBMIterateResult data structure.
> >
>
> The bitmap words in the bitmap index are very simple and can be very
> generic. You can think about them as one bit per tuple along with some
> padding bits between heap pages. The problem I have is that I do not know a
> good way to construct an in-memory version of this for other index
> structures, like b-tree. To be able to handle both cases nicely, you are
> right -- TBMIterateResult is better. Or, PagetableEntry may be better since
> it will make AND/OR easier.
>
> > What I do find interesting to think about is whether, strictly within
> > tidbitmap.c, there could be an alternate kind of bitmap object that
> > doesn't have to materialize the whole bitmap for an indexscan in memory
> > because it knows it can fetch the data on-demand, ie, build the next
> > page TBMIterateResult data structure on-the-fly from the index when it's
> > requested. Call it a "stream bitmap" in contrast to the present
> > "materialized bitmaps". The trick here is to be able to AND and OR a
> > stream bitmap with another stream bitmap or a materialized bitmap.
> > I don't see any reason in principle why that couldn't be done: the
> > output of the AND/OR would be a stream of TBMIterateResults just like
> > the inputs. That is, it's another stream bitmap, but with a different
> > generating function and some internal state that includes its source
> > bitmaps. You'd have to sort a materialized bitmap into order before
> > starting to AND/OR it with a stream bitmap, but that code is there
> > already.
>
> I like this idea. I think that we can define a new TBMStatus to be
> TBM_STREAM in TIDBitmap. *getmulti functions will remain the same, except
> that we add a new returning bool argument, stating if this is a stream
> bitmap. If this is a stream bitmap, nodeBitmapIndexScan simply fills spages,
> and passes it upstream. When nodeBitmapAnd or nodeBitmapOr ANDs/ORs several
> bitmaps, the result bitmap is a stream bitmap if there is at least one
> bitmap is a stream bitmap. Then we add another loop in nodeBitmapHeapscan to
> be able to pull more data from its subnode.
>
> Thanks,
> Jie
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +