Re: gistchoose vs. bloat

Lists: pgsql-hackers
From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: gistchoose vs. bloat
Date: 2012-06-18 11:12:29
Message-ID: CAPpHfduhRjJ3WQ3F2yac+UZ98XFoGW4iXfontLZYkBC+dqpFFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hackers,

While experimenting with gistchoose I achieve interesting results about
relation of gistchoose behaviour and gist index bloat.
I've created following testcase for reproducing gist index bloating:
1) Create test table with 1000000 points from geonames
# create table geotest (id serial, point point);
# insert into geotest (point) (select * from geonames order by random()
limit 1000000);
2) Create gist index and statistics table. Insert initial statistics (rows
count, index size) into table.
# create index geotest_idx on geotest using gist (point) with (buffering =
off);
# create table geotest_stats (id serial, count bigint, size bigint);
# insert into geotest_stats (count, size) values ((select count(*) from
geotest), pg_relation_size('geotest_idx'));
3) Delete 10% of geotest and vacuum it.
delete from geotest where id in (select id from geotest order by random()
limit 100000);
vacuum geotest;
4) Insert new 100000 points fromg eonames.
insert into geotest (point) (select * from geonames order by random() limit
100000);
5) Insert current statistics (rows count, index size) into table.
insert into geotest_stats (count, size) values ((select count(*) from
geotest), pg_relation_size('geotest_idx'));
6) Repeat 3-5 steps 50 times.

I get following results with current gistchoose implementation:
test=# select * from geotest1_stats;
id | count | size
----+---------+-----------
1 | 1000000 | 75767808
2 | 1000000 | 76046336
3 | 1000000 | 76840960
4 | 1000000 | 77799424
5 | 1000000 | 78766080
6 | 1000000 | 79757312
7 | 1000000 | 80494592
8 | 1000000 | 81125376
9 | 1000000 | 81985536
10 | 1000000 | 82804736
11 | 1000000 | 83378176
12 | 1000000 | 84115456
13 | 1000000 | 84819968
14 | 1000000 | 85598208
15 | 1000000 | 86302720
16 | 1000000 | 87023616
17 | 1000000 | 87703552
18 | 1000000 | 88342528
19 | 1000000 | 88915968
20 | 1000000 | 89513984
21 | 1000000 | 90152960
22 | 1000000 | 90742784
23 | 1000000 | 91324416
24 | 1000000 | 91742208
25 | 1000000 | 92258304
26 | 1000000 | 92758016
27 | 1000000 | 93241344
28 | 1000000 | 93683712
29 | 1000000 | 93970432
30 | 1000000 | 94396416
31 | 1000000 | 94740480
32 | 1000000 | 95068160
33 | 1000000 | 95444992
34 | 1000000 | 95780864
35 | 1000000 | 96313344
36 | 1000000 | 96731136
37 | 1000000 | 96968704
38 | 1000000 | 97222656
39 | 1000000 | 97509376
40 | 1000000 | 97779712
41 | 1000000 | 98074624
42 | 1000000 | 98344960
43 | 1000000 | 98639872
44 | 1000000 | 99000320
45 | 1000000 | 99229696
46 | 1000000 | 99459072
47 | 1000000 | 99672064
48 | 1000000 | 99901440
49 | 1000000 | 100114432
50 | 1000000 | 100261888
51 | 1000000 | 100458496
(51 rows)

Index grow about from 75 MB to 100 MB.

Current implementation of gistchoose select first index tuple which have
minimal penalty. It is possible for several tuples to have same minimal
penalty. I've created simple patch which selects random from them. I then
I've following results for same testcase.

test=# select * from geotest_stats;
id | count | size
----+---------+----------
1 | 1000000 | 74694656
2 | 1000000 | 74743808
3 | 1000000 | 74891264
4 | 1000000 | 75120640
5 | 1000000 | 75374592
6 | 1000000 | 75612160
7 | 1000000 | 75833344
8 | 1000000 | 76144640
9 | 1000000 | 76333056
10 | 1000000 | 76595200
11 | 1000000 | 76718080
12 | 1000000 | 76939264
13 | 1000000 | 77070336
14 | 1000000 | 77332480
15 | 1000000 | 77520896
16 | 1000000 | 77750272
17 | 1000000 | 77996032
18 | 1000000 | 78127104
19 | 1000000 | 78307328
20 | 1000000 | 78512128
21 | 1000000 | 78610432
22 | 1000000 | 78774272
23 | 1000000 | 78929920
24 | 1000000 | 79060992
25 | 1000000 | 79216640
26 | 1000000 | 79331328
27 | 1000000 | 79454208
28 | 1000000 | 79593472
29 | 1000000 | 79708160
30 | 1000000 | 79822848
31 | 1000000 | 79921152
32 | 1000000 | 80035840
33 | 1000000 | 80076800
34 | 1000000 | 80175104
35 | 1000000 | 80207872
36 | 1000000 | 80322560
37 | 1000000 | 80363520
38 | 1000000 | 80445440
39 | 1000000 | 80494592
40 | 1000000 | 80576512
41 | 1000000 | 80666624
42 | 1000000 | 80764928
43 | 1000000 | 80805888
44 | 1000000 | 80912384
45 | 1000000 | 80994304
46 | 1000000 | 81027072
47 | 1000000 | 81100800
48 | 1000000 | 81174528
49 | 1000000 | 81297408
50 | 1000000 | 81371136
51 | 1000000 | 81420288
(51 rows)

Now index grow about from 75 MB to 81 MB. It is almost 4 times less index
bloat!
I've following explanation of that. If index tuples are overlapping then
possibility of insertion into last tuple is much lower than possibility of
insertion to the first tuple. Thereby, freed space underlying to last
tuples of inner page is not likely to be reused. Selecting random tuple
increase that chances.
Obviously, it makes insertion more expensive. I need some more benchmarks
to measure it. Probably, we would need a user-visible option for cheaper
insert or less bloat.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gist_choose_bloat-0.1.patch application/octet-stream 2.6 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-08-20 03:13:37
Message-ID: 1345432417.20987.93.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2012-06-18 at 15:12 +0400, Alexander Korotkov wrote:
> Hackers,
>
>
> While experimenting with gistchoose I achieve interesting results
> about relation of gistchoose behaviour and gist index bloat.

...
>
> Current implementation of gistchoose select first index tuple which
> have minimal penalty. It is possible for several tuples to have same
> minimal penalty. I've created simple patch which selects random from
> them. I then I've following results for same testcase.
>
I took a look at this patch. The surrounding code is pretty messy (not
necessarily because of your patch). A few comments would go a long way.

The 'which_grow' array is initialized as it goes, first using pointer
notations ("*which_grows = -1.0") and then using subscript notation. As
far as I can tell, the first r->rd_att->natts of the array (the only
elements that matter) need to be written the first time through anyway.
Why not just replace "which_grow[j] < 0" with "i == FirstOffsetNumber"
and add a comment that we're initializing the penalties with the first
index tuple?

The 'sum_grow' didn't make any sense, thank you for getting rid of that.

Also, we should document that the earlier attributes always take
precedence, which is why we break out of the inner loop as soon as we
encounter an attribute with a higher penalty.

Please add a comment indicating why you are randomly choosing among the
equal penalties.

I think that there might be a problem with the logic, as well. Let's say
you have two attributes and there are two index tuples, it1 and it2;
with penalties [10,10] and [10,100] respectively. The second time
through the outer loop, with i = 2, you might (P=0.5) assign 2 to the
'which' variable in the first iteration of the inner loop, before it
realizes that it2 actually has a higher penalty. I think you need to
finish out the inner loop and have a flag that indicates that all
attributes are equal before you do the probabilistic replacement.

Also, I think you should use random() rather than rand().

Regards,
Jeff Davis


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-08-20 17:13:16
Message-ID: CAPpHfdsXH5eDOaMndbONL4VJM5pbOSO5gXOZWmYS7m98pXBcUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 20, 2012 at 7:13 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> I took a look at this patch. The surrounding code is pretty messy (not
> necessarily because of your patch). A few comments would go a long way.
>
> The 'which_grow' array is initialized as it goes, first using pointer
> notations ("*which_grows = -1.0") and then using subscript notation. As
> far as I can tell, the first r->rd_att->natts of the array (the only
> elements that matter) need to be written the first time through anyway.
> Why not just replace "which_grow[j] < 0" with "i == FirstOffsetNumber"
> and add a comment that we're initializing the penalties with the first
> index tuple?
>
> The 'sum_grow' didn't make any sense, thank you for getting rid of that.
>
> Also, we should document that the earlier attributes always take
> precedence, which is why we break out of the inner loop as soon as we
> encounter an attribute with a higher penalty.
>
> Please add a comment indicating why you are randomly choosing among the
> equal penalties.
>
> I think that there might be a problem with the logic, as well. Let's say
> you have two attributes and there are two index tuples, it1 and it2;
> with penalties [10,10] and [10,100] respectively. The second time
> through the outer loop, with i = 2, you might (P=0.5) assign 2 to the
> 'which' variable in the first iteration of the inner loop, before it
> realizes that it2 actually has a higher penalty. I think you need to
> finish out the inner loop and have a flag that indicates that all
> attributes are equal before you do the probabilistic replacement.
>

Current gistchoose code has a bug. I've started separate thread about it.
http://archives.postgresql.org/pgsql-hackers/2012-08/msg00544.php
Also, it obviously needs more comments.

Current state of patch is more proof of concept than something ready. I'm
going to change it in following ways:
1) We don't know how expensive user penalty function is. So, I'm going to
change randomization algorithm so that it doesn't increase number of
penalty calls in average.
2) Since, randomization could produce additional IO, there are probably no
optimal solution for all the cases. We could introduce user-visible option
which enables or disables randomization. However, default value of this
option is another question.

> Also, I think you should use random() rather than rand().
>

Thanks, will fix.

------
With best regards,
Alexander Korotkov.


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-09-04 15:21:49
Message-ID: CAPpHfduz2QvqxbpTNbOJk6MOY_p551m1yQVFBAiFac1y1Du4EA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 20, 2012 at 9:13 PM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:

> Current gistchoose code has a bug. I've started separate thread about it.
> http://archives.postgresql.org/pgsql-hackers/2012-08/msg00544.php
> Also, it obviously needs more comments.
>
> Current state of patch is more proof of concept than something ready. I'm
> going to change it in following ways:
> 1) We don't know how expensive user penalty function is. So, I'm going to
> change randomization algorithm so that it doesn't increase number of
> penalty calls in average.
> 2) Since, randomization could produce additional IO, there are probably no
> optimal solution for all the cases. We could introduce user-visible option
> which enables or disables randomization. However, default value of this
> option is another question.
>
>
>> Also, I think you should use random() rather than rand().
>>
>
> Thanks, will fix.
>

New version of patch is attached. Parameter "randomization" was introduced.
It controls whether to randomize choose. Choose algorithm was rewritten.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gist_choose_bloat-0.2.patch application/octet-stream 9.3 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-09-11 06:35:26
Message-ID: 1347345326.22899.51.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-09-04 at 19:21 +0400, Alexander Korotkov wrote:

> New version of patch is attached. Parameter "randomization" was
> introduced. It controls whether to randomize choose. Choose algorithm
> was rewritten.
>
Do you expect it to be bad in any reasonable situations? I'm inclined to
just make it always randomize if it's better. I think it would be hard
for a user to guess when it's better and when not.

Maybe it's useful to turn randomization off for testing purposes, e.g.
to ensure determinism?

Regards,
Jeff Davis


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-09-11 06:43:20
Message-ID: CAPpHfdsBQ8O3pv5cvWFj8U5MEzF1xCdyRFgpY_VXgSGQOsfRww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 11, 2012 at 10:35 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Tue, 2012-09-04 at 19:21 +0400, Alexander Korotkov wrote:
>
> > New version of patch is attached. Parameter "randomization" was
> > introduced. It controls whether to randomize choose. Choose algorithm
> > was rewritten.
> >
> Do you expect it to be bad in any reasonable situations? I'm inclined to
> just make it always randomize if it's better. I think it would be hard
> for a user to guess when it's better and when not.
>

Randomization should increase IO when index doesn't entirely fit to cache.
Without randomization only fraction of the tree would be used for actual
insertions. While with randomization whole tree would be potentially used
for insertions.

> Maybe it's useful to turn randomization off for testing purposes, e.g.
> to ensure determinism?
>

Yes, that's another good point. For example, randomization impede
reproducing of bugs.

------
With best regards,
Alexander Korotkov.


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-10-01 01:15:06
Message-ID: 1349054106.15580.31.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-09-04 at 19:21 +0400, Alexander Korotkov wrote:

> New version of patch is attached. Parameter "randomization" was
> introduced. It controls whether to randomize choose. Choose algorithm
> was rewritten.
>
Review comments:

1. Comment above while loop in gistRelocateBuildBuffersOnSplit needs to
be updated.

2. Typo in two places: "if randomization id required".

3. In gistRelocateBuildBuffersOnSplit, shouldn't that be:
splitPageInfo = &relocationBuffersInfos[bufferIndex];
not:
splitPageInfo = &relocationBuffersInfos[i];

4. It looks like the randomization is happening while trying to compare
the penalties. I think it may be more readable to separate those two
steps; e.g.

/* create a mapping whether randomization is on or not */
for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
offsets[i - FirstOffsetNumber] = i;

if (randomization)
/* randomize offsets array */

for (i = 0; i < maxoff; i++)
{
offset = offsets[i];
...
}

That's just an idea; if you think it's more readable as-is (or if I am
misunderstanding) then let me know.

Regards,
Jeff Davis


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-10-03 19:41:05
Message-ID: CAPpHfds2DifG0wRc79R4ZJ8438q8n5pjbcQk238h+Qp_0PnGBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 1, 2012 at 5:15 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Tue, 2012-09-04 at 19:21 +0400, Alexander Korotkov wrote:
>
> > New version of patch is attached. Parameter "randomization" was
> > introduced. It controls whether to randomize choose. Choose algorithm
> > was rewritten.
> >
> Review comments:
>
> 1. Comment above while loop in gistRelocateBuildBuffersOnSplit needs to
> be updated.
>

Actually, I didn't realize what exact comment you expect. Check if added
commend meets you expectations.

>
> 2. Typo in two places: "if randomization id required".
>
> 3. In gistRelocateBuildBuffersOnSplit, shouldn't that be:
> splitPageInfo = &relocationBuffersInfos[bufferIndex];
> not:
> splitPageInfo = &relocationBuffersInfos[i];
>

Fixed.

> 4. It looks like the randomization is happening while trying to compare
> the penalties. I think it may be more readable to separate those two
> steps; e.g.
>
> /* create a mapping whether randomization is on or not */
> for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
> offsets[i - FirstOffsetNumber] = i;
>
> if (randomization)
> /* randomize offsets array */
>
> for (i = 0; i < maxoff; i++)
> {
> offset = offsets[i];
> ...
> }
>
> That's just an idea; if you think it's more readable as-is (or if I am
> misunderstanding) then let me know.
>

Actually, current implementation comes from idea of creating possible less
overhead when randomization is off. I'll try to measure overhead in worst
case. If it is low enough then you proposal looks reasonable to me.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gist_choose_bloat-0.3.patch application/octet-stream 10.0 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-10-18 18:09:30
Message-ID: 20121018180930.GI3763@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexander Korotkov escribió:

> > 4. It looks like the randomization is happening while trying to compare
> > the penalties. I think it may be more readable to separate those two
> > steps; e.g.
> >
> > /* create a mapping whether randomization is on or not */
> > for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
> > offsets[i - FirstOffsetNumber] = i;
> >
> > if (randomization)
> > /* randomize offsets array */
> >
> > for (i = 0; i < maxoff; i++)
> > {
> > offset = offsets[i];
> > ...
> > }
> >
> > That's just an idea; if you think it's more readable as-is (or if I am
> > misunderstanding) then let me know.
>
> Actually, current implementation comes from idea of creating possible less
> overhead when randomization is off. I'll try to measure overhead in worst
> case. If it is low enough then you proposal looks reasonable to me.

Were you able to do these measurements? If not, I'll defer to your and
Jeff's judgement on what's the best next step here.

Jeff, do you think we need more review of this patch?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-10-21 07:03:36
Message-ID: 1350803016.27983.20.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2012-10-18 at 15:09 -0300, Alvaro Herrera wrote:
> Jeff, do you think we need more review of this patch?

In the patch, it refers to rd_options without checking for NULL first,
which needs to be fixed.

There's actually still one place where it says "id" rather than "is".
Just a nitpick.

Regarding my point 4 from the previous email, I mildly disagree with the
style, but I don't see a correctness problem there.

If the first two items are fixed, then the patch is fine with me.

Regards,
Jeff Davis


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-11-02 08:54:33
Message-ID: CAPpHfdvnW03Vs4SRK1LpF6dLWa9Z7MzkEidhAyNb6AuexwG87g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 21, 2012 at 11:03 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Thu, 2012-10-18 at 15:09 -0300, Alvaro Herrera wrote:
> > Jeff, do you think we need more review of this patch?
>
> In the patch, it refers to rd_options without checking for NULL first,
> which needs to be fixed.
>
> There's actually still one place where it says "id" rather than "is".
> Just a nitpick.
>
> Regarding my point 4 from the previous email, I mildly disagree with the
> style, but I don't see a correctness problem there.
>
> If the first two items are fixed, then the patch is fine with me.
>

First two items are fixed in attached version of the patch.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gist_choose_bloat-0.4.patch application/octet-stream 10.1 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-12-08 15:05:09
Message-ID: 20121208150509.GC15668@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2012-11-02 12:54:33 +0400, Alexander Korotkov wrote:
> On Sun, Oct 21, 2012 at 11:03 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> > On Thu, 2012-10-18 at 15:09 -0300, Alvaro Herrera wrote:
> > > Jeff, do you think we need more review of this patch?
> >
> > In the patch, it refers to rd_options without checking for NULL first,
> > which needs to be fixed.
> >
> > There's actually still one place where it says "id" rather than "is".
> > Just a nitpick.
> >
> > Regarding my point 4 from the previous email, I mildly disagree with the
> > style, but I don't see a correctness problem there.
> >
> > If the first two items are fixed, then the patch is fine with me.
> >
>
> First two items are fixed in attached version of the patch.

So the patch is ready for committer now?

I notice there's no documentation about the new reloption at all?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-12-13 21:03:39
Message-ID: CAPpHfdt+7z0muee0Fd6qg4bQkLE9wX0-YR0Ef+bn9L64KkxAFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

On Sat, Dec 8, 2012 at 7:05 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> I notice there's no documentation about the new reloption at all?
>

Thanks for notice! I've added small description to docs in the attached
patch.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gist_choose_bloat-0.5.patch application/octet-stream 11.5 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-12-14 08:46:23
Message-ID: 1355474783.11945.14.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-12-14 at 01:03 +0400, Alexander Korotkov wrote:
> Hi!
>
> On Sat, Dec 8, 2012 at 7:05 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
> I notice there's no documentation about the new reloption at
> all?
>
>
> Thanks for notice! I've added small description to docs in the
> attached patch.

Here is an edited version of the documentation note. Please review to
see if you like my version.

Also, I fixed a compiler warning.

My tests showed a significant reduction in the size of a gist index with
many of the same penalty values. The run times showed mixed results,
however, and I didn't dig in much further because you've already done
significant testing.

Marking this one ready again.

Regards,
Jeff Davis

Attachment Content-Type Size
gist_choose_bloat-0.5A.patch.gz application/x-gzip 3.1 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-12-14 11:38:29
Message-ID: CAPpHfdu1XufRen8JNY7d-ZJVOPuUVBRAB4s+8Oy0SpJu3qEDcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 14, 2012 at 12:46 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> > Thanks for notice! I've added small description to docs in the
> > attached patch.
>
> Here is an edited version of the documentation note. Please review to
> see if you like my version.
>

Edited version looks good for me.

> Also, I fixed a compiler warning.
>

Thanks!

------
With best regards,
Alexander Korotkov.


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-12-14 16:36:50
Message-ID: 50CB55A2.7080001@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

One question: does the randomization ever help when building a new
index? In the original test case, you repeatedly delete and insert
tuples, and I can see how the index can get bloated in that case. But I
don't see how bloat would occur when building the index from scratch.

BTW, I don't much like the option name "randomization". It's not clear
what's been randomized. I'd prefer something like
"distribute_on_equal_penalty", although that's really long. Better ideas?

- Heikki


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-12-14 18:12:09
Message-ID: 1355508729.11945.26.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-12-14 at 18:36 +0200, Heikki Linnakangas wrote:
> One question: does the randomization ever help when building a new
> index? In the original test case, you repeatedly delete and insert
> tuples, and I can see how the index can get bloated in that case. But I
> don't see how bloat would occur when building the index from scratch.

When building an index on a bunch of identical int4range values (in my
test, [1,10) ), the resulting index was about 17% smaller.

If the current algorithm always chooses to insert on the left-most page,
then it seems like there would be a half-filled right page for every
split that occurs. Is that reasoning correct?

However, I'm having some second thoughts about the run time for index
builds. Maybe we should have a few more tests to determine if this
should really be the default or just an option?

> BTW, I don't much like the option name "randomization". It's not clear
> what's been randomized. I'd prefer something like
> "distribute_on_equal_penalty", although that's really long. Better ideas?

I agree that "randomization" is vague, but I can't think of anything
better.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-21 05:48:22
Message-ID: 9994.1358747302@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Fri, 2012-12-14 at 18:36 +0200, Heikki Linnakangas wrote:
>> BTW, I don't much like the option name "randomization". It's not clear
>> what's been randomized. I'd prefer something like
>> "distribute_on_equal_penalty", although that's really long. Better ideas?

> I agree that "randomization" is vague, but I can't think of anything
> better.

I looked at this patch. ISTM we should not have the option at all but
just do it always. I cannot believe that always-go-left is ever a
preferable strategy in the long run; the resulting imbalance in the
index will surely kill any possible benefit. Even if there are some
cases where it miraculously fails to lose, how many users are going to
realize that applies to their case and make use of the option?

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-21 06:57:42
Message-ID: 1358751462.992.80.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:
> I looked at this patch. ISTM we should not have the option at all but
> just do it always. I cannot believe that always-go-left is ever a
> preferable strategy in the long run; the resulting imbalance in the
> index will surely kill any possible benefit. Even if there are some
> cases where it miraculously fails to lose, how many users are going to
> realize that applies to their case and make use of the option?

Sounds good to me.

If I remember correctly, there was also an argument that it may be
useful for repeatable test results. That's a little questionable for
performance (except in those cases where few penalties are identical
anyway), but could plausibly be useful for a crash report or something.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-21 13:06:09
Message-ID: 1445.1358773569@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:
>> I looked at this patch. ISTM we should not have the option at all but
>> just do it always. I cannot believe that always-go-left is ever a
>> preferable strategy in the long run; the resulting imbalance in the
>> index will surely kill any possible benefit. Even if there are some
>> cases where it miraculously fails to lose, how many users are going to
>> realize that applies to their case and make use of the option?

> Sounds good to me.

> If I remember correctly, there was also an argument that it may be
> useful for repeatable test results. That's a little questionable for
> performance (except in those cases where few penalties are identical
> anyway), but could plausibly be useful for a crash report or something.

Meh. There's already a random decision, in the equivalent place and for
a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever
complained about that being indeterminate, so I'm unconvinced that
there's a market for it with gist.

regards, tom lane


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-21 13:19:35
Message-ID: 50FD4067.2000902@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21.01.2013 15:06, Tom Lane wrote:
> Jeff Davis<pgsql(at)j-davis(dot)com> writes:
>> On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:
>>> I looked at this patch. ISTM we should not have the option at all but
>>> just do it always. I cannot believe that always-go-left is ever a
>>> preferable strategy in the long run; the resulting imbalance in the
>>> index will surely kill any possible benefit. Even if there are some
>>> cases where it miraculously fails to lose, how many users are going to
>>> realize that applies to their case and make use of the option?
>
>> Sounds good to me.
>
>> If I remember correctly, there was also an argument that it may be
>> useful for repeatable test results. That's a little questionable for
>> performance (except in those cases where few penalties are identical
>> anyway), but could plausibly be useful for a crash report or something.
>
> Meh. There's already a random decision, in the equivalent place and for
> a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever
> complained about that being indeterminate, so I'm unconvinced that
> there's a market for it with gist.

I wonder if it would work for gist to do something similar to
_bt_findinsertloc, and have a bias towards the left page, but sometimes
descend to one of the pages to the right. You would get the cache
locality of usually descending down the same subtree, but also fill the
pages to the right. Jeff / Alexander, want to give that a shot?

When building an index from scratch, using the new buffered index build,
you could do a lot better than fill each page like with regular inserts
and split when one fills up. You could e.g buffer 10 pages worth of
tuples, and perform a 10-way split of all the buffered tuples together,
distributing them equally to 10 pages (or 10+something, to leave some
room for updates). But that's clearly a separate and much larger patch.

- Heikki


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-24 18:44:39
Message-ID: 51018117.4040400@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21.01.2013 15:19, Heikki Linnakangas wrote:
> On 21.01.2013 15:06, Tom Lane wrote:
>> Jeff Davis<pgsql(at)j-davis(dot)com> writes:
>>> On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:
>>>> I looked at this patch. ISTM we should not have the option at all but
>>>> just do it always. I cannot believe that always-go-left is ever a
>>>> preferable strategy in the long run; the resulting imbalance in the
>>>> index will surely kill any possible benefit. Even if there are some
>>>> cases where it miraculously fails to lose, how many users are going to
>>>> realize that applies to their case and make use of the option?
>>
>>> Sounds good to me.
>>
>>> If I remember correctly, there was also an argument that it may be
>>> useful for repeatable test results. That's a little questionable for
>>> performance (except in those cases where few penalties are identical
>>> anyway), but could plausibly be useful for a crash report or something.
>>
>> Meh. There's already a random decision, in the equivalent place and for
>> a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever
>> complained about that being indeterminate, so I'm unconvinced that
>> there's a market for it with gist.
>
> I wonder if it would work for gist to do something similar to
> _bt_findinsertloc, and have a bias towards the left page, but sometimes
> descend to one of the pages to the right. You would get the cache
> locality of usually descending down the same subtree, but also fill the
> pages to the right. Jeff / Alexander, want to give that a shot?

I did some experimenting with that. I used the same test case Alexander
did, with geonames data, and compared unpatched version, the original
patch, and the attached patch that biases the first "best" tuple found,
but still sometimes chooses the other equally good ones.

testname | initsize | finalsize | idx_blks_read | idx_blks_hit
----------------+----------+-----------+---------------+--------------
patched-10-4mb | 75497472 | 90202112 | 5853604 | 10178331
unpatched-4mb | 75145216 | 94863360 | 5880676 | 10185647
unpatched-4mb | 75587584 | 97165312 | 5903107 | 10183759
patched-2-4mb | 74768384 | 81403904 | 5768124 | 10193738
origpatch-4mb | 74883072 | 82182144 | 5783412 | 10185373

All these tests were performed with shared_buffers=4MB, so that the
index won't fit completely in shared buffers, and I could use the
idx_blk_read/hit ratio as a measure of cache-friendliness. The
"origpath" test was with a simplified version of Alexander's patch, see
attached. It's the same as the original, but with the
randomization-option and gist-build related code removed. The patched-10
and patched-2 tests are two variants with my patch, with 1/10 and 1/2
chance of moving to the next equal tuple, respectively. The differences
in cache hit ratio might be just a result of different index sizes. I
included two unpatched runs above to show that there's a fair amount of
noise in these tests. That's because of the random nature of the test
case; it picks rows to delete and insert at random.

I think the conclusion is that all of these patches are effective. The
1/10 variant is less effective, as expected, as it's closer in behavior
to the unpatched behavior than the others. The 1/2 variant seems as good
as the original patch.

I performed another test, by inserting 1000000 duplicate values on an
empty table.

testname | finalsize | idx_blks_read | idx_blks_hit
------------+-----------+---------------+--------------
unpatched | 89030656 | 21350 | 2972033
patched-10 | 88973312 | 21450 | 2971920
patched-2 | 88481792 | 22947 | 2970260
origpatch | 61186048 | 761817 | 2221500

The original patch produces a much smaller index in this test, but since
the inserts are distributed randomly, the pages are accessed randomly
which is bad for caching.

A table full of duplicates isn't very realistic, but overall, I'm
leaning towards my version of this patch (gistchoose-2.patch). It has
less potential for causing a regression in existing applications, but is
just as effective in the original scenario of repeated delete+insert.

- Heikki

Attachment Content-Type Size
duplicatetest.sh application/x-sh 625 bytes
gistbloat.sh application/x-sh 1.9 KB
gistchoose-2.patch text/x-diff 2.7 KB
origpatch-simplified.patch text/x-diff 2.5 KB

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-24 19:26:15
Message-ID: 51018AD7.5040402@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21.01.2013 15:19, Heikki Linnakangas wrote:
> On 21.01.2013 15:06, Tom Lane wrote:
>> Jeff Davis<pgsql(at)j-davis(dot)com> writes:
>>> On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:
>>>> I looked at this patch. ISTM we should not have the option at all but
>>>> just do it always. I cannot believe that always-go-left is ever a
>>>> preferable strategy in the long run; the resulting imbalance in the
>>>> index will surely kill any possible benefit. Even if there are some
>>>> cases where it miraculously fails to lose, how many users are going to
>>>> realize that applies to their case and make use of the option?
>>
>>> Sounds good to me.
>>
>>> If I remember correctly, there was also an argument that it may be
>>> useful for repeatable test results. That's a little questionable for
>>> performance (except in those cases where few penalties are identical
>>> anyway), but could plausibly be useful for a crash report or something.
>>
>> Meh. There's already a random decision, in the equivalent place and for
>> a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever
>> complained about that being indeterminate, so I'm unconvinced that
>> there's a market for it with gist.
>
> I wonder if it would work for gist to do something similar to
> _bt_findinsertloc, and have a bias towards the left page, but sometimes
> descend to one of the pages to the right. You would get the cache
> locality of usually descending down the same subtree, but also fill the
> pages to the right. Jeff / Alexander, want to give that a shot?

I did some experimenting with that. I used the same test case Alexander
did, with geonames data, and compared unpatched version, the original
patch, and the attached patch that biases the first "best" tuple found,
but still sometimes chooses the other equally good ones.

testname | initsize | finalsize | idx_blks_read | idx_blks_hit
----------------+----------+-----------+---------------+--------------
patched-10-4mb | 75497472 | 90202112 | 5853604 | 10178331
unpatched-4mb | 75145216 | 94863360 | 5880676 | 10185647
unpatched-4mb | 75587584 | 97165312 | 5903107 | 10183759
patched-2-4mb | 74768384 | 81403904 | 5768124 | 10193738
origpatch-4mb | 74883072 | 82182144 | 5783412 | 10185373

All these tests were performed with shared_buffers=4MB, so that the
index won't fit completely in shared buffers, and I could use the
idx_blk_read/hit ratio as a measure of cache-friendliness. The
"origpath" test was with a simplified version of Alexander's patch, see
attached. It's the same as the original, but with the
randomization-option and gist-build related code removed. The patched-10
and patched-2 tests are two variants with my patch, with 1/10 and 1/2
chance of moving to the next equal tuple, respectively. The differences
in cache hit ratio might be just a result of different index sizes. I
included two unpatched runs above to show that there's a fair amount of
noise in these tests. That's because of the random nature of the test
case; it picks rows to delete and insert at random.

I think the conclusion is that all of these patches are effective. The
1/10 variant is less effective, as expected, as it's closer in behavior
to the unpatched behavior than the others. The 1/2 variant seems as good
as the original patch.

I performed another test, by inserting 1000000 duplicate values on an
empty table.

testname | finalsize | idx_blks_read | idx_blks_hit
------------+-----------+---------------+--------------
unpatched | 89030656 | 21350 | 2972033
patched-10 | 88973312 | 21450 | 2971920
patched-2 | 88481792 | 22947 | 2970260
origpatch | 61186048 | 761817 | 2221500

The original patch produces a much smaller index in this test, but since
the inserts are distributed randomly, the pages are accessed randomly
which is bad for caching.

A table full of duplicates isn't very realistic, but overall, I'm
leaning towards my version of this patch (gistchoose-2.patch). It has
less potential for causing a regression in existing applications, but is
just as effective in the original scenario of repeated delete+insert.

- Heikki

Attachment Content-Type Size
duplicatetest.sh application/x-sh 625 bytes
gistbloat.sh application/x-sh 1.9 KB
gistchoose-2.patch text/x-diff 2.7 KB
origpatch-simplified.patch text/x-diff 2.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-24 19:44:59
Message-ID: 23619.1359056699@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> I did some experimenting with that. I used the same test case Alexander
> did, with geonames data, and compared unpatched version, the original
> patch, and the attached patch that biases the first "best" tuple found,
> but still sometimes chooses the other equally good ones.

> testname | initsize | finalsize | idx_blks_read | idx_blks_hit
> ----------------+----------+-----------+---------------+--------------
> patched-10-4mb | 75497472 | 90202112 | 5853604 | 10178331
> unpatched-4mb | 75145216 | 94863360 | 5880676 | 10185647
> unpatched-4mb | 75587584 | 97165312 | 5903107 | 10183759
> patched-2-4mb | 74768384 | 81403904 | 5768124 | 10193738
> origpatch-4mb | 74883072 | 82182144 | 5783412 | 10185373

> I think the conclusion is that all of these patches are effective. The
> 1/10 variant is less effective, as expected, as it's closer in behavior
> to the unpatched behavior than the others. The 1/2 variant seems as good
> as the original patch.

At least on this example, it seems a tad better, if you look at index
size.

> A table full of duplicates isn't very realistic, but overall, I'm
> leaning towards my version of this patch (gistchoose-2.patch). It has
> less potential for causing a regression in existing applications, but is
> just as effective in the original scenario of repeated delete+insert.

+1 for this patch, but I think the comments could use more work. I was
convinced it was wrong on first examination, mainly because it's hard to
follow the underdocumented look_further_on_equal logic. I propose the
attached, which is the same logic with better comments (I also chose to
rename and invert the sense of the state variable, because it seemed
easier to follow this way ... YMMV on that though).

regards, tom lane

Attachment Content-Type Size
gistchoose-3.patch text/x-patch 4.0 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-24 20:08:39
Message-ID: CAPpHfduF7taDg=eNgAVQtU6bGrRRnWDN1Z2YOENtmPTS1Do91w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 24, 2013 at 11:26 PM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 21.01.2013 15:19, Heikki Linnakangas wrote:
>
>> On 21.01.2013 15:06, Tom Lane wrote:
>>
>>> Jeff Davis<pgsql(at)j-davis(dot)com> writes:
>>>
>>>> On Mon, 2013-01-21 at 00:48 -0500, Tom Lane wrote:
>>>>
>>>>> I looked at this patch. ISTM we should not have the option at all but
>>>>> just do it always. I cannot believe that always-go-left is ever a
>>>>> preferable strategy in the long run; the resulting imbalance in the
>>>>> index will surely kill any possible benefit. Even if there are some
>>>>> cases where it miraculously fails to lose, how many users are going to
>>>>> realize that applies to their case and make use of the option?
>>>>>
>>>>
>>> Sounds good to me.
>>>>
>>>
>>> If I remember correctly, there was also an argument that it may be
>>>> useful for repeatable test results. That's a little questionable for
>>>> performance (except in those cases where few penalties are identical
>>>> anyway), but could plausibly be useful for a crash report or something.
>>>>
>>>
>>> Meh. There's already a random decision, in the equivalent place and for
>>> a comparable reason, in btree (cf _bt_findinsertloc). Nobody's ever
>>> complained about that being indeterminate, so I'm unconvinced that
>>> there's a market for it with gist.
>>>
>>
>> I wonder if it would work for gist to do something similar to
>> _bt_findinsertloc, and have a bias towards the left page, but sometimes
>> descend to one of the pages to the right. You would get the cache
>> locality of usually descending down the same subtree, but also fill the
>> pages to the right. Jeff / Alexander, want to give that a shot?
>>
>
> I did some experimenting with that. I used the same test case Alexander
> did, with geonames data, and compared unpatched version, the original
> patch, and the attached patch that biases the first "best" tuple found, but
> still sometimes chooses the other equally good ones.
>
> testname | initsize | finalsize | idx_blks_read | idx_blks_hit
> ----------------+----------+--**---------+---------------+----**----------
> patched-10-4mb | 75497472 | 90202112 | 5853604 | 10178331
> unpatched-4mb | 75145216 | 94863360 | 5880676 | 10185647
> unpatched-4mb | 75587584 | 97165312 | 5903107 | 10183759
> patched-2-4mb | 74768384 | 81403904 | 5768124 | 10193738
> origpatch-4mb | 74883072 | 82182144 | 5783412 | 10185373
>
> All these tests were performed with shared_buffers=4MB, so that the index
> won't fit completely in shared buffers, and I could use the
> idx_blk_read/hit ratio as a measure of cache-friendliness. The "origpath"
> test was with a simplified version of Alexander's patch, see attached. It's
> the same as the original, but with the randomization-option and gist-build
> related code removed. The patched-10 and patched-2 tests are two variants
> with my patch, with 1/10 and 1/2 chance of moving to the next equal tuple,
> respectively. The differences in cache hit ratio might be just a result of
> different index sizes. I included two unpatched runs above to show that
> there's a fair amount of noise in these tests. That's because of the random
> nature of the test case; it picks rows to delete and insert at random.
>
> I think the conclusion is that all of these patches are effective. The
> 1/10 variant is less effective, as expected, as it's closer in behavior to
> the unpatched behavior than the others. The 1/2 variant seems as good as
> the original patch.
>

Does two unpatched-4mb lines are different by coincidence? If so then
variance is significant and we need more experiments to actually compare
patched-2-4mb and origpatch-4mb lines.
There is another cause of overhead when use randomization in gistchoose:
extra penalty calls. It could be significant when index fits to cache. In
order evade it I especially change behaviour of my patch from "look
sequentially and choose random" to "look in random order". I think we need
to include comparison of CPU time.

------
With best regards,
Alexander Korotkov.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-24 20:35:24
Message-ID: 24647.1359059724@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> There is another cause of overhead when use randomization in gistchoose:
> extra penalty calls. It could be significant when index fits to cache. In
> order evade it I especially change behaviour of my patch from "look
> sequentially and choose random" to "look in random order". I think we need
> to include comparison of CPU time.

Hmm ... actually, isn't that an argument in favor of Heikki's method?
The way he's doing it, we can exit without making additional penalty
calls once we've found a zero-penalty tuple and decided not to look
further (which is something with a pretty high probability).

regards, tom lane


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-24 20:53:21
Message-ID: 51019F41.1060204@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24.01.2013 22:35, Tom Lane wrote:
> Alexander Korotkov<aekorotkov(at)gmail(dot)com> writes:
>> There is another cause of overhead when use randomization in gistchoose:
>> extra penalty calls. It could be significant when index fits to cache. In
>> order evade it I especially change behaviour of my patch from "look
>> sequentially and choose random" to "look in random order". I think we need
>> to include comparison of CPU time.
>
> Hmm ... actually, isn't that an argument in favor of Heikki's method?
> The way he's doing it, we can exit without making additional penalty
> calls once we've found a zero-penalty tuple and decided not to look
> further (which is something with a pretty high probability).

No, I think Alexander is right, although I believe the difference is
minimal in practice.

If we assume that the there are no zero-penalty tuples on the page, with
both approaches you have to call penalty on every tuple to know which is
best. If there are zero-penalty tuples, then there is a small
difference. With Alexander's method, you can stop looking as soon as you
find a zero-penalty tuple (the order you check the tuples is random).
With my method, you can stop looking as soon as you find a zero-penalty
tuple, *and* you decide to not look further. With the 1/2 probability to
stop looking further, you give up on average after 2 tuples.

So if I'm doing my math right, my patch does on average between 1x - 2x
as many penalty calls as Alexander's, depending on how many zero-penalty
tuples there are. The 2x case is when the page is full of zero-penalty
tuples, in which case the difference isn't big in absolute terms; 2
penalty calls per page versus 1.

BTW, one thing that I wondered about this: How expensive is random()?
I'm assuming not very, but I don't really know. Alexander's patch called
random() for every tuple on the page, while I call it only once for each
equal-penalty tuple. If it's really cheap, I think my/Tom's patch could
be slightly simplified by always initializing keep_current_best with
random() at top of the function, and eliminating the -1 "haven't decided
yet" state. OTOH, if random() is expensive, I note that we only need one
bit of randomness, so we could avoid calling random() so often if we
utilized all the bits from each random() call.

- Heikki


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2013-01-24 21:07:36
Message-ID: 25278.1359061656@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> BTW, one thing that I wondered about this: How expensive is random()?
> I'm assuming not very, but I don't really know. Alexander's patch called
> random() for every tuple on the page, while I call it only once for each
> equal-penalty tuple. If it's really cheap, I think my/Tom's patch could
> be slightly simplified by always initializing keep_current_best with
> random() at top of the function, and eliminating the -1 "haven't decided
> yet" state.

I thought about that too, and concluded that random() is probably
expensive enough that we don't want to call it unnecessarily.

> OTOH, if random() is expensive, I note that we only need one
> bit of randomness, so we could avoid calling random() so often if we
> utilized all the bits from each random() call.

Meh. That would hard-wire the decision that the probability of keeping
a best tuple is exactly 0.5. I'd rather keep the flexibility to tune it
later. The way your patch is set up, it seems unlikely that it will
call random() very many times per page, so I'm not that concerned about
minimizing the number of calls further. (Also, in the probably-common
case that there are no exactly equally good alternatives, this would
actually be a net loss since it would add one unnecessary call.)

So basically, Alexander's method requires more random() calls and fewer
penalty() calls than yours, at least in typical cases. It's hard to say
which is faster without benchmarking, and it would matter which opclass
you were testing on.

regards, tom lane