Quick Links

Re: Cube extension improvement, GSoC

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Stas Kelvich <stanconn(at)gmail(dot)com>
Subject:	Re: Cube extension improvement, GSoC
Date:	2013-03-30 11:55:33
Message-ID:	CAPpHfdtsKPx7Kxy8eAAdD=H1V5Eh5H+TEz5sxUvF=twURgBAnw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi, Jey!

I remember I've started couple of threads related to cube extension:
http://www.postgresql.org/message-id/4F30616D.3030509@gmail.com
http://www.postgresql.org/message-id/4F3C16E9.90808@gmail.com
Could you provide some feedback to GSoC proposal of Stas?

On Sat, Mar 23, 2013 at 3:10 AM, Stas Kelvich <stanconn(at)gmail(dot)com> wrote:

> Hello,
>
> some time ago I started working on the data search system (about 100-200M
> of records) with queries consisted of several diapason and equality
> conditions, e.g.:
>
> WHERE dim1 BETWEEN 128 AND 137 AND
> WHERE dim2 BETWEEN 4815 AND 162342 AND
> WHERE dim3 = 42
> ORDER BY dim1 ASC
>
> There are 6 or 7 search criteria in my task. In order to avoid full table
> scan I started using R-Tree over cube data type:
>
> CREATE INDEX ON my_table USING GiST(cube(array[dim1, dim2, dim3]))
>
> For fast sorting I used Alexander Korotkov's patch for knngist (
> http://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/msg153676.html),
> which changes metric for nearest neighbors search and allows to obtain
> cubes ordered by a specific coordinate. Having changed some data types and
> function/operator definitions I ported this to 9.2 (
> https://github.com/kelvich/postgres/commit/bb372). Then, after this I
> could select ordered records right from the index:
>
> SELECT * FROM my_table
> WHERE cube(array[dim1, dim2, dim3]) <@ cube
> '(128,4815,42),(137,162342,42)'
> ORDER BY cube(array[dim1, dim2, dim3]) <*> 5;
>
> The main issue of such approach is the index size. In my case it was about
> 100GB for 100M of records. Therefore index building becomes very expensive
> disk-related operation. For the same reason reading a large number of
> records from the index is slow too.
>
> I came to Oleg Bartunov, Theodor Sigaev and after a while to Alexander
> Korotkov for any help. (I'm very thankful to them and glad that they agreed
> to meet, listen to me and give useful advices). Having discussed it we
> decided that there was several possible improvements for the cube extension:
>
> * Adding point data type support to the cube extension in order to avoid
> storing of coincident upper left and lower right vertices, which may reduce
> the volume that leaf nodes occupy almost twice.
> * Checking the split algorithm with big datasets and, if possible,
> improving it.
> * Learning cube extension to store dimensions with different data types.
> Such index would be good alternative to compound key B-Tree multi-index
> (suitable for diapason queries and data ordering).
> * Providing support for KNN with metrics induced by the different norms:
> euclidean, taxicab norm, p-norm. This can be useful in fields where we can
> extract signature: similar images search, similar audio search.
>
> I'd like to participate in GSoC with this improvements, and I'm very
> interested in any comments or suggestions about this feature list.
>

------
With best regards,
Alexander Korotkov.

Responses

Re: Cube extension improvement, GSoC at 2013-03-30 16:06:33 from Alexander Korotkov

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	赖文豫	2013-03-30 14:08:44	By now, why PostgreSQL 9.2 don't support SSDs?
Previous Message	Alexander Korotkov	2013-03-30 08:29:52	Re: Getting to 9.3 beta