High-Concurrency GiST in PostgreSQL

From: "C(dot) Mundi" <cmundi(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: High-Concurrency GiST in postgreSQL
Date: 2011-12-05 18:31:09
Message-ID: CAPvS8WZNQ8ysY=hyij5EJscrZZyG9V7uZqAxsQfu8tWDpQNvBg@mail.gmail.com
Lists: pgsql-general

Hello. This is my first post. As such, feedback on style and choice of
venue is especially welcome.

I am a regular but not especially expert user of a variety of databases,
including PostgreSQL. I have only modest experience with spatial databases.

I have a new project [1] in which GiST could be very useful, provided I can
achieve high concurrency. I have some empirical evidence that R* would be a
good starting point, and after reading "High-Concurrency Locking in
R-Trees" [2], I went looking for an implementation of R-link trees
extended to R*. So I was very interested to read what Hellerstein et al.
wrote [3]:

*High concurrency, recoverability, and degree-3 consistency are critical
factors in a full-fledged database system. We are considering extending the
results of Kornacker and Banks for R-trees [KB95] to our implementation of
GiSTs.*

Since this information may be somewhat dated, and GiST has obviously come a
long way in PostgreSQL, I am looking for current information and advice on
the state of concurrency in GiST in PostgreSQL. If someone has already
done an R*-link tree then that could really help me. (I can wish, no?)

Thanks for reading and thanks for advice or pointers.

Carlos

[1] It's not a GIS project, but it has some similarities:
(a) I need to manage up to 10 million three-dimensional "boxes" or as few
as 1000 "boxes"
(b) The distributions of sizes, aspect ratios and locations in R^3 are all
unknown a priori and may change during execution under insert/delete.
(c) Queries may arrive asynchronously and at a high rate from hundreds (or
more?) of compute nodes.
(d) Successive queries from any node, viewed as a time-sequence, may have
very low (or at best sporadic) spatial correlation -- lots of page jumps.
(e) R* will be advantageous over R, but Priority R is probably not
especially useful since turnover may be greater than 20% during a "job."
(f) I would like to avoid the complications of distributed databases, again
because of the high turnover.
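
To make the workload in [1] concrete, here is a minimal sketch in Python of
the kind of query involved: find every stored 3D box that overlaps a query
box. The names Box3 and window_query are hypothetical, chosen only for
illustration, and the linear scan is a stand-in for the index lookup that an
R*-tree or GiST index would perform. It says nothing about locking or
concurrency.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Point3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box3:
    """Axis-aligned 3D box given by its minimum and maximum corners.

    Hypothetical type for illustration; not part of PostgreSQL or GiST.
    """
    lo: Point3
    hi: Point3

    def overlaps(self, other: "Box3") -> bool:
        # Two axis-aligned boxes overlap iff their extents overlap on
        # every axis (boxes that merely touch count as overlapping here).
        return all(
            self.lo[i] <= other.hi[i] and other.lo[i] <= self.hi[i]
            for i in range(3)
        )


def window_query(boxes: Iterable[Box3], window: Box3) -> List[Box3]:
    # Linear scan: the O(n) baseline that a spatial index would replace.
    return [b for b in boxes if b.overlaps(window)]


boxes = [
    Box3((0, 0, 0), (1, 1, 1)),
    Box3((2, 2, 2), (3, 3, 3)),
    Box3((0.5, 0.5, 0.5), (2.5, 2.5, 2.5)),
]
hits = window_query(boxes, Box3((0.9, 0.9, 0.9), (1.1, 1.1, 1.1)))
```

With up to 10 million boxes and queries arriving from hundreds of nodes, a
scan like this is exactly what I need the index to avoid.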

[2] Marcel Kornacker and Douglas Banks. High-Concurrency Locking in
R-Trees. (1995)

[3] Hellerstein, Naughton, and Pfeffer. Generalized Search Trees for
Database Systems. (1995)
