Re: Red-black tree for GIN

Lists: pgsql-hackers
From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Red-black tree for GIN
Date: 2009-11-23 13:28:26
Message-ID: 4B0A8DFA.7050009@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi there,

attached is a patch, which contains implementation of a red-black
tree, a self-balanced implementation of binary tree. The main goal of
this patch is to improve creation time of GIN index in the corner cases.
While creation, GIN collects data in memory in binary tree until it
reach some limit and then it flush tree to disk. Some data could
produces unbalanced binary tree, for example, sorted data, so the tree
degenerates to the list with O(N^2) processing time (see thread
http://archives.postgresql.org/pgsql-performance/2009-03/msg00340.php)
), which cause very slow index creation. Tom has fixed that by limiting
depth of tree (committed to 8.3 and 8.4), but we found it's not enough
and propose to use red-black tree, which is very good for skewed data and has
almost the same performance for unsorted data, see
http://www.sai.msu.su/~megera/wiki/2009-07-27 and
http://www.sai.msu.su/~megera/wiki/2009-04-03 for more information.

Implementation of red-black tree has several currently unused methods,
but they will be used in next patches.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
rbtree-0.5.gz application/x-tar 6.2 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2009-12-31 21:19:11
Message-ID: 603c8f070912311319q4fb1ffffrbfc0adef1d237db5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/11/23 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
> Hi there,
>
> attached is a patch, which contains implementation of a  red-black
> tree,  a self-balanced implementation of binary tree.  The main goal of
> this patch is to improve creation time of GIN index in the corner cases.
> While creation, GIN collects data in memory in binary tree until it
> reach some limit and then it flush tree to disk. Some data could
> produces unbalanced binary tree,  for example, sorted data, so the tree
> degenerates to the list with O(N^2) processing time (see thread
> http://archives.postgresql.org/pgsql-performance/2009-03/msg00340.php)
> ), which cause very slow index creation.  Tom has fixed that by limiting
> depth of tree  (committed to 8.3 and 8.4),  but we found it's not enough
> and propose to use red-black tree, which is very good for skewed data and
> has almost the same performance for unsorted data, see
> http://www.sai.msu.su/~megera/wiki/2009-07-27 and
> http://www.sai.msu.su/~megera/wiki/2009-04-03 for more information.
>
> Implementation of red-black tree has several currently unused  methods,
> but they will be used in next patches.

I did a quick read-through of this, and one question that immediately
occurred to me is that rbtree.c says that it is adopted from
http://algolist.manual.ru/ds/rbtree.php. But I'm not sure what
license that code is under, so I'm not sure whether it's OK for us to
use it.

My other question is as related to performance. Can you provide a
test case that shows the performance improvement with this patch?

...Robert


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-05 01:12:09
Message-ID: 20100105011209.GQ3778@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas escribió:

> I did a quick read-through of this, and one question that immediately
> occurred to me is that rbtree.c says that it is adopted from
> http://algolist.manual.ru/ds/rbtree.php. But I'm not sure what
> license that code is under, so I'm not sure whether it's OK for us to
> use it.

This code comes from Thomas Niemann's "Sorting and Searching Algorithms:
A Cookbook",
http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_man.htm

specifically
http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_rbt.htm

The code is in the public domain; that web page says
"Source code, when part of a software project, may be used
freely without reference to the author."

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-05 01:30:13
Message-ID: 603c8f071001041730m47917a6btfd8e04ad6550231d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 4, 2010 at 8:12 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Robert Haas escribió:
>> I did a quick read-through of this, and one question that immediately
>> occurred to me is that rbtree.c says that it is adopted from
>> http://algolist.manual.ru/ds/rbtree.php.  But I'm not sure what
>> license that code is under, so I'm not sure whether it's OK for us to
>> use it.
>
> This code comes from Thomas Niemann's "Sorting and Searching Algorithms:
> A Cookbook",
> http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_man.htm
>
> specifically
> http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_rbt.htm
>
> The code is in the public domain; that web page says
>        "Source code, when part of a software project, may be used
>        freely without reference to the author."

That is excellent. Perhaps we should document that information in the
code comments where the URL is currently mentioned.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-11 02:42:19
Message-ID: 603c8f071001101842w23f909e7v51c9f15cb523e10c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 31, 2009 at 4:19 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> My other question is as related to performance.  Can you provide a
> test case that shows the performance improvement with this patch?

So, we still don't have a test case for this patch. During the
November CommitFest, Greg Smith griped a bit about the lack of a
reproducible performance benchmark for the XLogInsert patch:

http://archives.postgresql.org/pgsql-hackers/2009-12/msg00816.php

...and I would say the same logic applies to this patch, maybe even
moreso. Tom has already applied a partial workaround for this
problem, and I'm feeling like it won't be trivial to figure out what
to measure to see the remaining issue and measure how much this new
implementation helps.

The coding pattern that this patch uses also merits some discussion.
Basically, rbtree.c is a generic implementation of red-black trees -
from a textbook - which ginbulk.c then uses for GIN. One possible
advantage of this implementation is that it might make it possible for
us to use the rbtree.c logic in other places, if we have other data
structures that need similar treatment. But I'm not sure if that's
the way we want to go. The other alternative is to drop the
generalized implementation and incorporate the logic directly into
ginbulk.c. I really don't know which is better, but I'd like to hear
some other opinions...

...Robert


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-11 02:54:05
Message-ID: 407d949e1001101854o55464134u7092db9a8a5a3e82@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 11, 2010 at 2:42 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> The coding pattern that this patch uses also merits some discussion.
> Basically, rbtree.c is a generic implementation of red-black trees -
> from a textbook - which ginbulk.c then uses for GIN.  One possible
> advantage of this implementation is that it might make it possible for
> us to use the rbtree.c logic in other places, if we have other data
> structures that need similar treatment.  But I'm not sure if that's
> the way we want to go.  The other alternative is to drop the
> generalized implementation and incorporate the logic directly into
> ginbulk.c.  I really don't know which is better, but I'd like to hear
> some other opinions...
>

So, code reuse is not the only advantage of abstraction. It's also
just plain easier to understand and test code written with clear
abstract interfaces. The way you describe it someone with no knowledge
could look at rbtree.c and see if it's done correctly and maybe
improve it. And someone reading ginbulk only has to understand the
operations red-black trees support and no understand how they're
implemented to follow the code.

Is there any advantage of integrating the code with ginbulk.c? Does it
let us get away with finer grained locks or do any tricks doing
gin-specific changes while we're in the middle of rebalancing or
anything like that?

--
greg


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-11 06:49:12
Message-ID: Pine.LNX.4.64.1001110945570.10030@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert,

we have benchmark for rbtree
http://www.sai.msu.su/~megera/wiki/2009-07-27
rbtree, actually, fix corner cases with ordered input, with little
overhead.

As you may see from knngist patch, rbtree used in gist code, so, please,
leave rbtree code as is.

Oleg
On Sun, 10 Jan 2010, Robert Haas wrote:

> On Thu, Dec 31, 2009 at 4:19 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > My other question is as related to performance. =A0Can you provide a
> > test case that shows the performance improvement with this patch?
>
> So, we still don't have a test case for this patch. During the
> November CommitFest, Greg Smith griped a bit about the lack of a
> reproducible performance benchmark for the XLogInsert patch:
>
> http://archives.postgresql.org/pgsql-hackers/2009-12/msg00816.php
>
> ...and I would say the same logic applies to this patch, maybe even
> moreso. Tom has already applied a partial workaround for this
> problem, and I'm feeling like it won't be trivial to figure out what
> to measure to see the remaining issue and measure how much this new
> implementation helps.
>
> The coding pattern that this patch uses also merits some discussion.
> Basically, rbtree.c is a generic implementation of red-black trees -
> from a textbook - which ginbulk.c then uses for GIN. One possible
> advantage of this implementation is that it might make it possible for
> us to use the rbtree.c logic in other places, if we have other data
> structures that need similar treatment. But I'm not sure if that's
> the way we want to go. The other alternative is to drop the
> generalized implementation and incorporate the logic directly into
> ginbulk.c. I really don't know which is better, but I'd like to hear
> some other opinions...
>
> ...Robert
>
> --=20
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-11 18:08:43
Message-ID: 4B4B692B.4050401@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> ...and I would say the same logic applies to this patch, maybe even
> moreso. Tom has already applied a partial workaround for this
> problem, and I'm feeling like it won't be trivial to figure out what

That partial workaround is not work for some corner cases:
http://www.sai.msu.su/~megera/wiki/2009-07-27 (8.4 already has that hask!)

> The coding pattern that this patch uses also merits some discussion.
> Basically, rbtree.c is a generic implementation of red-black trees -
> from a textbook - which ginbulk.c then uses for GIN. One possible
> advantage of this implementation is that it might make it possible for
> us to use the rbtree.c logic in other places, if we have other data
> structures that need similar treatment. But I'm not sure if that's
> the way we want to go. The other alternative is to drop the
> generalized implementation and incorporate the logic directly into
> ginbulk.c. I really don't know which is better, but I'd like to hear
> some other opinions...

knngist uses that implementation of rb-tree. One more candidate is a ts_stat
which now uses unbalanced binary tree.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-11 18:18:09
Message-ID: 603c8f071001111018t36d157bobe145ff29cc18e66@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/1/11 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
> knngist uses that implementation of rb-tree. One more candidate is a ts_stat
> which now uses unbalanced binary tree.

Ah, OK. That's great if we can reuse that code in 2 or 3 places.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-21 03:35:19
Message-ID: 603c8f071001201935p7ae1c7felbffbfc0571ea8ed4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 11, 2010 at 1:18 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> 2010/1/11 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> knngist uses that implementation of rb-tree. One more candidate is a ts_stat
>> which now uses unbalanced binary tree.
>
> Ah, OK.  That's great if we can reuse that code in 2 or 3 places.

Some preliminary thoughts on this patch:

1. I think rb_free_recursive is missing a pfree().

2. We already have a definition of NIL in the PG source base. I think
this one needs to be named something else. RBNIL, maybe.

3. This code could really use some more comments, and maybe some of
the variable names could be better chosen, too. It's far from obvious
what is going on here. I studied rbtrees in college and I still
remember more or less how they work, but, boy, this is hard to follow.
The names of the iterator states are truly horrible, at least IMO.

4. It would be nice if you could do a better job conforming to project
indentation style.

...Robert


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-24 14:43:42
Message-ID: 4B5C5C9E.7010109@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> 1. I think rb_free_recursive is missing a pfree().

Fixed, thank you
>
> 2. We already have a definition of NIL in the PG source base. I think
> this one needs to be named something else. RBNIL, maybe.
fixed

>
> 3. This code could really use some more comments, and maybe some of
> the variable names could be better chosen, too. It's far from obvious
> what is going on here. I studied rbtrees in college and I still
> remember more or less how they work, but, boy, this is hard to follow.
> The names of the iterator states are truly horrible, at least IMO.
I changed names and slightly improve comments.

>
> 4. It would be nice if you could do a better job conforming to project
> indentation style.
Did pgindent.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
rbtree-0.6.gz application/x-tar 6.3 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-24 15:26:00
Message-ID: 603c8f071001240726p27d2ca1ds4dad0aa51588feef@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/1/24 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> 3. This code could really use some more comments, and maybe some of
>> the variable names could be better chosen, too.  It's far from obvious
>> what is going on here.  I studied rbtrees in college and I still
>> remember more or less how they work, but, boy, this is hard to follow.
>>  The names of the iterator states are truly horrible, at least IMO.
>
> I changed names and slightly improve comments.

Would it be OK if I went through here and hacked on the comments and
sent this back to you?

...Robert


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-24 15:55:04
Message-ID: 4B5C6D58.1070403@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Would it be OK if I went through here and hacked on the comments and
> sent this back to you?
Feel free to edit the patch.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Red-black tree for GIN
Date: 2010-01-26 03:24:22
Message-ID: 603c8f071001251924h164b3d7fjced9e2b0142e5b5c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/1/24 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> Would it be OK if I went through here and hacked on the comments and
>> sent this back to you?
>
> Feel free to edit the patch.

Here's an edited version, which I've now reviewed more fully. Some
more substantive review comments:

1. I'm pretty satisfied that the rbtree code is generally sane,
although I wonder if we should think about putting it in access/common
rather than utils/misc. I'm not sure that I have a sufficiently
clearly defined notion of what each subdirectory is for to draw a
definitive conclusion on this point; hopefully someone else will weigh
in.

2. I'm a little concerned about the performance implications of this
patch. Looking at http://www.sai.msu.su/~megera/wiki/2009-04-03, it's
clear that the patch is a huge win in some cases. But I'm also
surprised by how much performance is lost in some of the other cases
that you tested. I suspect, on balance, that it's probably still a
good idea to put this in, but I wonder if you've profiled this at all
to see where the extra time is going and/or explored possible ways of
squashing that overhead down a little more.

3. In ginInsertEntry(), the "else" branch of the if statement really
looks like magic when you first read it. I wonder if it would make
sense to pull the contents of EAAllocate() directly into this
function, since there's only one call site anyhow.

There are a lot of whitespace changes in the version I'm attaching
compared to your last one - your pg_indent run didn't use a typedef
list that included the new typedefs, so some of the spacing came out
kind of wacky. I also fleshed out the comments a bit, deleted one
comment that was no longer true, and changed some of the function
names in rbtree.c to make them more consistent, but the changes are
all pretty trivial.

I still have not done any performance testing of my own on this code,
and it probably needs that. If anyone else has time to step up and
help with that, it would be much appreciated. It would be useful to
have some plain old benchmarks as well as some profiling data as
mentioned above.

...Robert

Attachment Content-Type Size
rbtree-0.6-rmh application/octet-stream 65.1 KB