Re: Minmax indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minmax indexes
Date: 2014-06-17 16:14:00
Message-ID: CA+Tgmob5h61BK_snKJjb8_kZYjJ-eBzRw26ANsX7amKB9up73g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 17, 2014 at 12:04 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> Well, I'm not the guy who does things with geometric data, but I don't
>> want to ignore the significant percentage of our users who are. As
>> you must surely know, the GIST implementations for geometric data
>> types store bounding boxes on internal pages, and that seems to be
>> useful to people. What is your reason for thinking that it would be
>> any less useful in this context?
>
> For me minmax indexes are helpful because they allow generating *small*
> 'coarse' indexes over large volumes of data. From my pov that's
> possible because they don't contain item pointers for every contained
> row.
> That'll imo work well if there are consecutive rows in the table that
> can be summarized into one min/max range. That's quite likely to happen
> for common applications of a number of scalar datatypes. But the
> likelihood of placing sufficiently many rows with very similar bounding
> boxes close together seems much less relevant in practice. And I think
> that's generally likely for operations which can't be well represented
> as btree opclasses - the substructure that implies inside a Datum will
> make correlation between consecutive rows less likely.

Well, I don't know: suppose you're loading geospatial data showing the
location of every building in some country. It might easily be the
case that the data is or can be loaded in an order that provides
pretty good spatial locality, leading to tight bounding boxes over
physically consecutive data ranges.
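
To make the locality argument concrete, here is a minimal sketch in Python. It is illustrative only and is not PostgreSQL code: the grid of "buildings", the 100-row range size, and the helper names range_boxes and ranges_to_scan are all assumptions made up for this example. It summarizes each run of physically consecutive rows as one bounding box and counts how many ranges a small query box would have to visit, once for data loaded tile by tile and once for the same data in shuffled order.

```python
import random

def range_boxes(points, rows_per_range):
    """Summarize each run of physically consecutive rows as one bounding box."""
    boxes = []
    for start in range(0, len(points), rows_per_range):
        chunk = points[start:start + rows_per_range]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

def ranges_to_scan(boxes, query):
    """Count the ranges whose bounding box overlaps the query box."""
    qx1, qy1, qx2, qy2 = query
    return sum(
        1
        for (x1, y1, x2, y2) in boxes
        if x1 <= qx2 and qx1 <= x2 and y1 <= qy2 and qy1 <= y2
    )

# Hypothetical data set: one building at every point of a 100 x 100 grid.
grid = [(x, y) for x in range(100) for y in range(100)]

# Load order with spatial locality: one 10 x 10 tile after another.
clustered = sorted(grid, key=lambda p: (p[0] // 10, p[1] // 10, p[0], p[1]))

# Same rows, but in an order with no spatial locality (fixed seed).
shuffled = grid[:]
random.Random(0).shuffle(shuffled)

query = (12, 12, 17, 17)  # small box lying inside a single tile
clustered_hits = ranges_to_scan(range_boxes(clustered, 100), query)
shuffled_hits = ranges_to_scan(range_boxes(shuffled, 100), query)

print("ranges to scan, clustered load:", clustered_hits)
print("ranges to scan, shuffled load:", shuffled_hits)
```

With the tile-ordered load each range's box covers exactly one tile, so the query touches a single range; with the shuffled load the boxes grow toward the whole grid and stop excluding anything. How close real geospatial loads come to the first case is exactly the open question in this exchange.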

But I'm not trying to say that we absolutely have to support that kind
of thing; what I am trying to say is that there should be a README or
a mailing list post or some such that says: "We thought about how
generic to make this. We considered A, B, and C. We rejected C as
too narrow, and A because if we made it that general it would have
greatly enlarged the disk footprint for the following reasons.
Therefore we selected B." Basically, I think Heikki asked a good
question - which was "could we abstract this more?" - and I can't
recall seeing a clear answer explaining why we could or couldn't and
what the trade-offs would be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
