Re: Best way to scan on-disk bitmaps

Lists: pgsql-hackers
From: "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>
To: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Best way to scan on-disk bitmaps
Date: 2005-05-12 19:21:08
Message-ID: 20050512192107.GA18096@mits.lv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greetings.

I have questions on how to implement on-disk bitmap scan.

I've been working on on-disk bitmaps as an ordinary Index Access Method, but
now it's clear to me, that it'll lose all it's strengths this way. One on-disk
bitmap has exactly one list of indexed table's TIDs and potentially unlimited
number of bitmaps (number of index attributes multiplied by attribute's
cardinality, to be precise).

So, for better performance, one should first retrieve all the needed bitmaps
from the index, then join all bitmaps according to the query clauses and, as
the last phase, retrieve TIDs from index, that matches final bitmap.

According to the docs "the index access method is responsible for
regurgitating the TIDs ...", but for on-disk bitmaps index scan is devided
into 3 phases. So, to perform the scan in phases, to my mind, executor
should be involved. (I'd like to mention again, that this is the first time
I got so deep inside postgres code).

I wanted to use Tom's nodeBitmap* stuff, but it's problematic a bit. Bitmaps
in nodeBitmap* are built upon list of TIDs retrieved during relation scan.
For on-disk bitmap indexes, there's no need for that, as all bitmaps are
already inside the index.

The question is: Is it possible to extend nodeBitmap functionality in such a
way, that Executor can either build bitmap after list of TIDs, obtained from
RelationScan, or ask index access method to give bitmaps it contain (and TIDs
at given position in the map later)?
This will, probably, require more functions in the pg_am catalog.

Or should I create a completely new node for on-disk bitmaps?

--

Victor Y. Yegorov


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>
Cc: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-12 20:10:15
Message-ID: 12035.1115928615@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Victor Y. Yegorov" <viy(at)mits(dot)lv> writes:
> I have questions on how to implement on-disk bitmap scan.

I think your best plan might be

1. Be sure that all the indexable WHERE conditions are passed to the
indexscan as indexquals. This might be, say,
WHERE a = 42 and b = 'foo'

2. Within the index AM, form the AND of the relevant bitmaps (here the
ones for a = 42 and b = 'foo').

3. Within the index AM, pick up the TIDs for the remaining one-bits,
and pass them back.

4. Let the existing machinery handle the OR-ing problem as well as
actual fetching of the heap rows.

This can be done without any restructuring of the index AM API.

regards, tom lane


From: "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-12 21:27:01
Message-ID: 20050512212701.GB18096@mits.lv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> [12.05.2005 23:09]:
> 1. Be sure that all the indexable WHERE conditions are passed to the
> indexscan as indexquals. This might be, say,
> WHERE a = 42 and b = 'foo'

If I have on-disk bitmap
ON (a, b, c)
will the planner pick an index scan for
WHERE a = 42 AND b = 'foo'
(i.e. only part of the index attributes are involved)? Any modifications
needed to achieve this functionality?

To my mind, bitmap scan even for 1 attribute of a multi-column index would be
a win, though I haven't tested this yet.

--

Victor Y. Yegorov


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>
Cc: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-12 21:40:06
Message-ID: 12737.1115934006@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Victor Y. Yegorov" <viy(at)mits(dot)lv> writes:
> If I have on-disk bitmap
> ON (a, b, c)
> will the planner pick an index scan for
> WHERE a = 42 AND b = 'foo'
> (i.e. only part of the index attributes are involved)? Any modifications
> needed to achieve this functionality?

Hmm. That particular case will work, but the planner believes that only
consecutive columns in the index are usable --- that is, if you have
quals for a and c but not for b, it will think that the condition for c
isn't usable with the index. This is true for btree and gist indexes,
so I suppose we'd need to introduce a pg_am column that tells what to
do.

[ thinks some more ... ]

Plan B would be to remove that restriction and teach btree and gist to
cope. While a btree couldn't use a nonconsecutive restriction as part
of its where-to-scan logic, I don't see any good reason why it couldn't
still perform the test before returning the TID, thus possibly saving a
trip to the heap. Offhand it seems this should be true of gist as well,
but I don't know that code well enough to be sure.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-13 05:46:17
Message-ID: 87br7foeza.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Plan B would be to remove that restriction and teach btree and gist to
> cope. While a btree couldn't use a nonconsecutive restriction as part
> of its where-to-scan logic, I don't see any good reason why it couldn't
> still perform the test before returning the TID, thus possibly saving a
> trip to the heap. Offhand it seems this should be true of gist as well,
> but I don't know that code well enough to be sure.

Not long ago there was some discussion about how gist indexes don't really
handle multicolumn indexes usefully currently. They use only the first column
to determine page splits. The discussion wandered and it became clear that it
wasn't even clear what a multicolumn gist index should mean.

I suggested treating each column as independent axes. Independently ask each
column's datatype for the "distance" value and then calculate the inner
product as a kind of geometric n-dimensional distance. There was some
resistance since this limits gist indexes to always basing page splits on a
single "distance" based algorithm. (Though all current gist index methods in
the postgres source tree do work this way, mostly with copy-pasted code in
fact.)

In this model the columns listed in the gist index are unordered. Any subset
of columns can be used to perform an index lookup. Making it more like the
bitmap index behaviour you're looking at than the btree index behaviour.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-13 06:08:56
Message-ID: 20622.1115964536@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Plan B would be to remove that restriction and teach btree and gist to
>> cope. While a btree couldn't use a nonconsecutive restriction as part
>> of its where-to-scan logic, I don't see any good reason why it couldn't
>> still perform the test before returning the TID, thus possibly saving a
>> trip to the heap.

> [ snip ]

> In this model the columns listed in the gist index are unordered. Any subset
> of columns can be used to perform an index lookup. Making it more like the
> bitmap index behaviour you're looking at than the btree index behaviour.

I thought some more about this since sending my earlier message. As far
as I can recall at the moment, there really isn't anything fundamental
that depends on the consecutive-columns rule. The one place where the
rubber meets the road is in the index cost estimation functions: if we
were to relax that rule, then btcostestimate would have to be taught to
include only the consecutive columns when estimating how much of a btree
index is going to be touched.

And more than that: if you've studied the btree code at all, you realize
that that's only an incomplete heuristic anyway. For instance, if the
leading key is a > xxx, second keys like b > yyy and b < yyy act
completely differently in terms of indexscan cost, but btcostestimate
doesn't presently know that.

I wonder if we shouldn't migrate the amcostestimate functions into the
individual index AMs (which would mean adding a column to pg_am, but so
what). btcostestimate could be much less phony about this if it had
access to the same infrastructure that _bt_first uses to examine the
index clauses.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-14 16:31:22
Message-ID: 200505141631.j4EGVMm16956@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> "Victor Y. Yegorov" <viy(at)mits(dot)lv> writes:
> > If I have on-disk bitmap
> > ON (a, b, c)
> > will the planner pick an index scan for
> > WHERE a = 42 AND b = 'foo'
> > (i.e. only part of the index attributes are involved)? Any modifications
> > needed to achieve this functionality?
>
> Hmm. That particular case will work, but the planner believes that only
> consecutive columns in the index are usable --- that is, if you have
> quals for a and c but not for b, it will think that the condition for c
> isn't usable with the index. This is true for btree and gist indexes,
> so I suppose we'd need to introduce a pg_am column that tells what to
> do.

We do have a TODO for this:

* Use index to restrict rows returned by multi-key index when used with
non-consecutive keys to reduce heap accesses

For an index on col1,col2,col3, and a WHERE clause of col1 = 5 and
col3 = 9, spin though the index checking for col1 and col3 matches,
rather than just col1; also called skip-scanning.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-15 19:19:32
Message-ID: 877ji047qz.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:

> > Hmm. That particular case will work, but the planner believes that only
> > consecutive columns in the index are usable --- that is, if you have
> > quals for a and c but not for b, it will think that the condition for c
> > isn't usable with the index. This is true for btree and gist indexes,
> > so I suppose we'd need to introduce a pg_am column that tells what to
> > do.
>
> We do have a TODO for this:
>
> * Use index to restrict rows returned by multi-key index when used with
> non-consecutive keys to reduce heap accesses
>
> For an index on col1,col2,col3, and a WHERE clause of col1 = 5 and
> col3 = 9, spin though the index checking for col1 and col3 matches,
> rather than just col1; also called skip-scanning.

That TODO is something else.

Though it is related in that it is another example of why the existing code is
too simplistic and will eventually need to be enhanced. Not only would bitmap
indexes and (possibly) gist indexes, but even btree indexes would need to do
so if this TODO were implemented.

--
greg


From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-15 19:22:59
Message-ID: ja8f815ef3o3o6eab4gu6fo223p43iqq1a@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 12 May 2005 17:40:06 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
wrote:
>the planner believes that only
>consecutive columns in the index are usable --- that is, if you have
>quals for a and c but not for b, it will think that the condition for c
>isn't usable with the index. This is true for btree [...]

It's not difficult to setup a test case where an index is used, but
only with a=42 as an index condition, and c='foo' is applied as a
filter condition. Now add a redundant condition on b
... AND b BETWEEN minb AND maxb ...
and watch how c='foo' moves into the index condition.

I did this test some time ago and I believe that adding the condition
on b did not change the estimated cost, only the actual execution
time.

Servus
Manfred


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-15 19:27:51
Message-ID: 19469.1116185271@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>>> Hmm. That particular case will work, but the planner believes that only
>>> consecutive columns in the index are usable --- that is, if you have
>>> quals for a and c but not for b, it will think that the condition for c
>>> isn't usable with the index.
>>
>> We do have a TODO for this:
>>
>> * Use index to restrict rows returned by multi-key index when used with
>> non-consecutive keys to reduce heap accesses

> That TODO is something else.

No, I think Bruce is right --- it's essentially the same thing. It
certainly would be a good idea to change btrees to work like that,
if we are going to modify the planner to relax the restriction for
other index types.

I think it would be easy to change the planner and btree to handle
this (where "easy" means "I remember where all the skeletons are
buried"). But I don't know the gist code hardly at all. Can anyone
offer an informed opinion on whether gist can handle this now, and
if not what it would take to handle it?

(hash and rtree are not at issue since they don't support multi-key
indexes.)

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-15 20:11:31
Message-ID: 87vf5k2qrw.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> I think it would be easy to change the planner and btree to handle
> this (where "easy" means "I remember where all the skeletons are
> buried"). But I don't know the gist code hardly at all. Can anyone
> offer an informed opinion on whether gist can handle this now, and
> if not what it would take to handle it?

Currently gist indexes only use the first column for page splits, making
multi-key gist indexes basically useless. The problem is it's hard to imagine
an API for a pickSplit function that could handle multi-key indexes with
disparate data types and operator classes. I had an idea of a new way to deal
with gist indexes that simplified the API and side-stepped the whole issue but
you raised concerns that it might be too limiting.

Unfortunately the mailing list archive seems to have missed this discussion.
I've attached three messages from the discussion at the time.


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-15 20:54:49
Message-ID: Pine.GSO.4.62.0505160002130.22318@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 15 May 2005, Tom Lane wrote:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>>>> Hmm. That particular case will work, but the planner believes that only
>>>> consecutive columns in the index are usable --- that is, if you have
>>>> quals for a and c but not for b, it will think that the condition for c
>>>> isn't usable with the index.
>>>
>>> We do have a TODO for this:
>>>
>>> * Use index to restrict rows returned by multi-key index when used with
>>> non-consecutive keys to reduce heap accesses
>
>> That TODO is something else.
>
> No, I think Bruce is right --- it's essentially the same thing. It
> certainly would be a good idea to change btrees to work like that,
> if we are going to modify the planner to relax the restriction for
> other index types.
>
> I think it would be easy to change the planner and btree to handle
> this (where "easy" means "I remember where all the skeletons are
> buried"). But I don't know the gist code hardly at all. Can anyone
> offer an informed opinion on whether gist can handle this now, and
> if not what it would take to handle it?

I think that handling this in GiST is depends solely on how users consistent
function designed to handle NULLs in keys. Other words, it should works as
soon as users consistent function "know" what to do with NULLs in internal keys.

Teodor will comment multi-key GiST tomorrow.

We used Paul Aoki paper "Generalizing ''Search'' in Generalized Search Trees",
(available from http://www.sai.msu.su/~megera/postgres/gist/papers/csd-97-950.pdf )
for implementation of multi-key GiST index support. It's true, that first
key is used for splitting, but elements with duplicated first key could
be shuffled to get better clustering on second key.

>
> (hash and rtree are not at issue since they don't support multi-key
> indexes.)
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-16 13:14:57
Message-ID: 42889CD1.6070001@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

About page splitting algorithm in GiST in multikey case. For the beginning, page
is splitted by calling pickSplit method of key of first column (pickSplit method
is defined for opclass and it is a user function), then it try to find equal
values of first column in left and right pages ( gist.c lines 1264-1901 ). If
it is, then GiST core will try to resort tuples with first equal keys between
left and right pages using penalty method for second and higher column's key. If
it's not, it leave pages untouched. But unions for parent page of second and
higher column's keys will be formed.

So, if index is defined as 'using gist (a,b,c)' then, in principle, GiST index
can speed up queries like 'a=V1 and c=V2'. But it will not usable for queries
( b=V3 and c=V2 ). By the way, instead of '=' operation may be used other
operations. Number of supported operations by GiST is indefinite unlike, for
example, btree which supported only five: <, <=, =, =>, >.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-16 22:53:43
Message-ID: 200505162253.j4GMrhZ16436@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


If people have GIST TODOs, please post them.

---------------------------------------------------------------------------

Teodor Sigaev wrote:
> About page splitting algorithm in GiST in multikey case. For the beginning, page
> is splitted by calling pickSplit method of key of first column (pickSplit method
> is defined for opclass and it is a user function), then it try to find equal
> values of first column in left and right pages ( gist.c lines 1264-1901 ). If
> it is, then GiST core will try to resort tuples with first equal keys between
> left and right pages using penalty method for second and higher column's key. If
> it's not, it leave pages untouched. But unions for parent page of second and
> higher column's keys will be formed.
>
> So, if index is defined as 'using gist (a,b,c)' then, in principle, GiST index
> can speed up queries like 'a=V1 and c=V2'. But it will not usable for queries
> ( b=V3 and c=V2 ). By the way, instead of '=' operation may be used other
> operations. Number of supported operations by GiST is indefinite unlike, for
> example, btree which supported only five: <, <=, =, =>, >.
>
>
>
> --
> Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
> WWW: http://www.sigaev.ru/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-17 01:39:20
Message-ID: 42894B48.4000203@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> If people have GIST TODOs, please post them.

Concurrency :)


From: Alvaro Herrera <alvherre(at)surnet(dot)cl>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-17 03:57:13
Message-ID: 20050517035713.GB11360@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 17, 2005 at 09:39:20AM +0800, Christopher Kings-Lynne wrote:
>
>
> Bruce Momjian wrote:
> >If people have GIST TODOs, please post them.
>
> Concurrency :)

And WAL support.

--
Alvaro Herrera (<alvherre[a]surnet.cl>)
"No necesitamos banderas
No reconocemos fronteras" (Jorge González)


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)surnet(dot)cl>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-05-17 03:58:41
Message-ID: 200505170358.j4H3wfW01998@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera wrote:
> On Tue, May 17, 2005 at 09:39:20AM +0800, Christopher Kings-Lynne wrote:
> >
> >
> > Bruce Momjian wrote:
> > >If people have GIST TODOs, please post them.
> >
> > Concurrency :)
>
> And WAL support.

Already there:

* Add WAL index reliability improvement to non-btree indexes

and this too:

* Add concurrency to GIST

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-06-13 23:20:21
Message-ID: 10044.1118704821@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
> ... So, if index is defined as 'using gist (a,b,c)' then, in
> principle, GiST index can speed up queries like 'a=V1 and c=V2'. But
> it will not usable for queries ( b=V3 and c=V2 ). By the way, instead
> of '=' operation may be used other operations. Number of supported
> operations by GiST is indefinite unlike, for example, btree which
> supported only five: <, <=, =, =>, >.

I have committed changes to the planner to arrange that a GiST indexscan
must supply at least one restriction clause for the first index column,
and can supply restriction clauses for any, all, or none of the
remaining columns; the old left-to-right heuristic is gone.

As far as I can tell, this doesn't require any changes to the GiST code,
but please take another look if you aren't too sure about it.

regards, tom lane


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Victor Y(dot) Yegorov" <viy(at)mits(dot)lv>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Best way to scan on-disk bitmaps
Date: 2005-06-14 10:04:59
Message-ID: Pine.GSO.4.63.0506141403510.7546@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 13 Jun 2005, Tom Lane wrote:

> Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
>> ... So, if index is defined as 'using gist (a,b,c)' then, in
>> principle, GiST index can speed up queries like 'a=V1 and c=V2'. But
>> it will not usable for queries ( b=V3 and c=V2 ). By the way, instead
>> of '=' operation may be used other operations. Number of supported
>> operations by GiST is indefinite unlike, for example, btree which
>> supported only five: <, <=, =, =>, >.
>
> I have committed changes to the planner to arrange that a GiST indexscan
> must supply at least one restriction clause for the first index column,
> and can supply restriction clauses for any, all, or none of the
> remaining columns; the old left-to-right heuristic is gone.
>
> As far as I can tell, this doesn't require any changes to the GiST code,
> but please take another look if you aren't too sure about it.

I did quick test and found no problem with gist(a,b,c) and index does used for
(a,*,c) case

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83