Re: Statistics and selectivity estimation for ranges

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Statistics and selectivity estimation for ranges
Date: 2012-09-04 13:27:00
Message-ID: CAPpHfds=cxF13Zg3WBzbqBy35djGSbKpTve_i73RMQKY_B-08g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 27, 2012 at 5:00 PM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> On 24.08.2012 18:51, Heikki Linnakangas wrote:
>
>> On 20.08.2012 00:31, Alexander Korotkov wrote:
>>
>>> New version of patch.
>>> * Collect new stakind STATISTIC_KIND_BOUNDS_HISTOGRAM, which is lower
>>> and upper bounds histograms combined into single ranges array, instead
>>> of STATISTIC_KIND_HISTOGRAM.
>>>
>>
>> One worry I have about that format for the histogram is that you
>> deserialize all the values in the histogram, before you do the binary
>> searches. That seems expensive if stats target is very high. I guess you
>> could deserialize them lazily to alleviate that, though.
>>
>>> * Selectivity estimations for >, >=, <, <=, <<, >>, &<, &> using this
>>> histogram.
>>>
>>
>> Thanks!
>>
>> I'm going to do the same for this that I did for the sp-gist patch, and
>> punt on the more complicated parts for now, and review them separately.
>> Attached is a heavily edited version that doesn't include the length
>> histogram, and consequently doesn't do anything smart for the &< and &>
>> operators. && is estimated using the bounds histograms. There's now a
>> separate stakind for the empty range fraction, since it's not included
>> in the length-histogram.
>>
>> I tested this on a dataset containing birth and death dates of persons
>> that have a wikipedia page, obtained from the dbpedia.org project. I can
>> send a copy if someone wants it. The estimates seem pretty accurate.
>>
>> Please take a look, to see if I messed up something.
>>
>
> Committed this with some further changes.

An add-on patch is attached. Actually, I don't understand your intention in
introducing the STATISTIC_KIND_RANGE_EMPTY_FRAC stakind. Did you plan to keep
the empty fraction in a separate stakind, or to replace this stakind
with STATISTIC_KIND_LENGTH_HISTOGRAM? In the attached patch,
STATISTIC_KIND_RANGE_EMPTY_FRAC is replaced
with STATISTIC_KIND_LENGTH_HISTOGRAM.
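
For anyone following the thread, here is a rough standalone sketch of how a
bounds histogram can be used to estimate the fraction of ranges whose lower
(or upper) bound falls below a constant. It is not the code from the patch:
the function name hist_fraction_below is hypothetical, bounds are simplified
to plain doubles, and the histogram is assumed to be an equi-depth array of
sorted bound values.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * hist[0..n-1] is assumed to be an equi-depth histogram of bound values in
 * ascending order.  Returns the estimated fraction of bounds that are less
 * than "value", using binary search to find the bin and linear
 * interpolation inside it.
 */
static double
hist_fraction_below(const double *hist, int n, double value)
{
    int     lo = 0;
    int     hi = n - 1;
    double  frac;

    assert(hist != NULL);

    /* A histogram needs at least two boundaries to define one bin. */
    if (n < 2)
        return 0.5;

    /* Constants outside the histogram get the trivial answers. */
    if (value <= hist[0])
        return 0.0;
    if (value > hist[n - 1])
        return 1.0;

    /*
     * Invariant: hist[lo] < value <= hist[hi].  Only O(log n) entries are
     * inspected, so entries could be deserialized on demand.
     */
    while (hi - lo > 1)
    {
        int mid = lo + (hi - lo) / 2;

        if (hist[mid] < value)
            lo = mid;
        else
            hi = mid;
    }

    /* The invariant guarantees hist[hi] - hist[lo] > 0 here. */
    frac = (value - hist[lo]) / (hist[hi] - hist[lo]);

    /* lo full bins lie below, plus a fraction of the current bin. */
    return (lo + frac) / (n - 1);
}
```

With boundaries {0, 10, 20, 30, 40} and the constant 25, the search stops in
the bin [20, 30], and the estimate is (2 + 0.5) / 4. Since the search visits
only a logarithmic number of entries, this also shows why lazy
deserialization would help when the stats target is very high.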

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
range_stat-addon-0.1.patch.gz application/x-gzip 6.9 KB
