Re: WIP: SP-GiST, Space-Partitioned GiST

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: SP-GiST, Space-Partitioned GiST
Date: 2011-09-24 21:29:09
Message-ID: Pine.LNX.4.64.1109250119300.26195@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 22 Sep 2011, Heikki Linnakangas wrote:

> On 06.09.2011 20:34, Oleg Bartunov wrote:
>> Here is the latest spgist patch, which has all planned features as well as
>> all overhead, introduced by concurrency and recovery, so performance
>> measurement should be realistic now.
>
> I'm ignoring the text suffix-tree part of this for now, because of the issue
> with non-C locales that Alexander pointer out.
>
> Regarding the quadtree, have you compared the performance of that with
> Alexander's improved split algorithm? I ran some tests using the test harness
> I still had lying around from the fast GiST index build tests:
>
> testname | time | accesses | indexsize
> -------------------------+-----------------+----------+-----------
> points unordered auto | 00:03:58.188866 | 378779 | 522 MB
> points ordered auto | 00:07:14.362355 | 177534 | 670 MB
> points unordered auto | 00:02:59.130176 | 46561 | 532 MB
> points ordered auto | 00:04:00.50756 | 45066 | 662 MB
> points unordered spgist | 00:03:05.569259 | 78871 | 394 MB
> points ordered spgist | 00:01:46.06855 | 422104 | 417 MB
> (8 rows)
>
> These tests were with a table with 7500000 random points. In the
> ordered-tests, the table is sorted by x,y coordinates. 'time' is the time
> used to build the index on it, and 'accesses' is the total number of index
> blocks hit by a series of 10000 bounding box queries, measured from
> pg_statio_user_indexes.idx_blks_hit + idx_blks_read.
>
> The first two tests in the list are with a GiST index on unpatched
> PostgreSQL. The next six tests are with Alexander's double-sorting split
> patch. The last two tests are with an SP-GiST index.
>
> It looks like the query performance with GiST using the double-sorting split
> is better than SP-GiST, although the SP-GiST index is somewhat smaller. The
> ordered case seems pathologically bad, is that some sort of a worst-case
> scenario for quadtrees?

random points are not good use-case, on real data sp-gist is much faster
than GiST. I attached the latest version of patch, which descrease wal-traffic
and introduce kd-tree example.

real-life example from geonames database can be downloaded from
http://mira.sai.msu.su/~megera/geo-all.dump.gz

Below is a log of test run on my notebook:

zcat /home/megera/app/pgsql/knn/geo-all.dump.gz | psql test
create index spg_kd_idx on geo using spgist(point kd_point_ops);
CREATE INDEX
Time: 97180.047 ms
test=# select pg_total_relation_size('spg_quad_idx');
pg_total_relation_size
------------------------
363126784

test=# create index spg_quad_idx on geo using spgist(point);
LOG: checkpoints are occurring too frequently (22 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (19 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (20 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
CREATE INDEX
Time: 68565.279 ms
test=# select pg_total_relation_size('spg_quad_idx');
pg_total_relation_size
------------------------
363126784

test=# create index gist_idx on geo using gist(point);
CREATE INDEX
Time: 354446.965 ms
test=# select pg_total_relation_size('gist_idx');
pg_total_relation_size
------------------------
542793728

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Attachment Content-Type Size
spgist_patch-0.101.gz application/octet-stream 39.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2011-09-24 21:37:52 Re: [v9.2] Fix Leaky View Problem
Previous Message Cédric Villemain 2011-09-24 21:26:16 Re: posix_fadvsise in base backups