Re: gistchoose vs. bloat

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: gistchoose vs. bloat
Date: 2012-08-20 17:13:16
Message-ID: CAPpHfdsXH5eDOaMndbONL4VJM5pbOSO5gXOZWmYS7m98pXBcUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 20, 2012 at 7:13 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> I took a look at this patch. The surrounding code is pretty messy (not
> necessarily because of your patch). A few comments would go a long way.
>
> The 'which_grow' array is initialized as it goes, first using pointer
> notations ("*which_grows = -1.0") and then using subscript notation. As
> far as I can tell, the first r->rd_att->natts of the array (the only
> elements that matter) need to be written the first time through anyway.
> Why not just replace "which_grow[j] < 0" with "i == FirstOffsetNumber"
> and add a comment that we're initializing the penalties with the first
> index tuple?
>
> The 'sum_grow' didn't make any sense, thank you for getting rid of that.
>
> Also, we should document that the earlier attributes always take
> precedence, which is why we break out of the inner loop as soon as we
> encounter an attribute with a higher penalty.
>
> Please add a comment indicating why you are randomly choosing among the
> equal penalties.
>
> I think that there might be a problem with the logic, as well. Let's say
> you have two attributes and there are two index tuples, it1 and it2;
> with penalties [10,10] and [10,100] respectively. The second time
> through the outer loop, with i = 2, you might (P=0.5) assign 2 to the
> 'which' variable in the first iteration of the inner loop, before it
> realizes that it2 actually has a higher penalty. I think you need to
> finish out the inner loop and have a flag that indicates that all
> attributes are equal before you do the probabilistic replacement.
>

Current gistchoose code has a bug. I've started separate thread about it.
http://archives.postgresql.org/pgsql-hackers/2012-08/msg00544.php
Also, it obviously needs more comments.

Current state of patch is more proof of concept than something ready. I'm
going to change it in following ways:
1) We don't know how expensive user penalty function is. So, I'm going to
change randomization algorithm so that it doesn't increase number of
penalty calls in average.
2) Since, randomization could produce additional IO, there are probably no
optimal solution for all the cases. We could introduce user-visible option
which enables or disables randomization. However, default value of this
option is another question.

> Also, I think you should use random() rather than rand().
>

Thanks, will fix.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-08-20 17:26:54 Re: bug of pg_trgm?
Previous Message Alexander Korotkov 2012-08-20 16:57:20 Fix for gistchoose