From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Statistics and selectivity estimation for ranges |
Date: | 2013-03-01 14:22:18 |
Message-ID: | CAPpHfdt92Fh5Jbovn-EXM_7Yv4Mhr6CvaVo=37jE85h1G2RGkA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Feb 13, 2013 at 5:55 PM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:
> On Wed, Feb 13, 2013 at 5:28 PM, Heikki Linnakangas <
> hlinnakangas(at)vmware(dot)com> wrote:
>
>> On 04.01.2013 10:42, Alexander Korotkov wrote:
>>
>>> /*
>>> * Calculate selectivity of "&&" operator using histograms of range
>>> lower bounds
>>> * and histogram of range lengths.
>>> */
>>> static double
>>> calc_hist_selectivity_overlap(**TypeCacheEntry *typcache, RangeBound
>>> *lower,
>>> RangeBound *upper, RangeBound
>>> *hist_lower, int hist_nvalues,
>>> Datum
>>> *length_hist_values, int length_hist_nvalues)
>>>
>>
>> We already have code to estimate &&, based on the lower and upper bound
>> histograms:
>>
>> case OID_RANGE_OVERLAP_OP:
>>> case OID_RANGE_CONTAINS_ELEM_OP:
>>> /*
>>> * A && B <=> NOT (A << B OR A >> B).
>>> *
>>> * "range @> elem" is equivalent to "range &&
>>> [elem,elem]". The
>>> * caller already constructed the singular range
>>> from the element
>>> * constant, so just treat it the same as &&.
>>> */
>>> hist_selec =
>>> calc_hist_selectivity_scalar(**typcache,
>>> &const_lower, hist_upper,
>>>
>>> nhist, false);
>>> hist_selec +=
>>> (1.0 - calc_hist_selectivity_scalar(**typcache,
>>> &const_upper, hist_lower,
>>>
>>> nhist, true));
>>> hist_selec = 1.0 - hist_selec;
>>> break;
>>>
>>
>> I don't think the method based on lower bound and length histograms is
>> any better. In fact, my gut feeling is that it's less accurate. I'd suggest
>> dropping that part of the patch.
>>
>
> Right. This estimation has an accuracy of histogram, while estimation
> based on lower bound and length histograms rely on additional assumption
> about independence of lower bound and length histogram. We can sum A << B
> and A >> B probabilities because they are mutually exclusive. It's pretty
> evident but I would like to mention it in the comments, because typical
> assumption about events in statistics calculation is their independence.
>
These changes were made in attached patch.
------
With best regards,
Alexander Korotkov.
Attachment | Content-Type | Size |
---|---|---|
range_stat-0.11.patch.gz | application/x-gzip | 8.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Ants Aasma | 2013-03-01 15:01:27 | Re: Materialized views WIP patch |
Previous Message | Kevin Grittner | 2013-03-01 14:18:33 | Re: Materialized views WIP patch |