Re: GiST support for inet datatypes

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: emre(at)hasegeli(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-01-31 01:58:09
Message-ID: 52EB0331.5090500@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/23/2014 11:22 PM, Emre Hasegeli wrote:
> inet-gist
> ---------
> I realized that create extension btree_gist fails with the patch.
>
> ERROR: could not make operator class "gist_inet_ops" be default for type inet
> DETAIL: Operator class "inet_ops" already is the default.
>
> I think making the new operator class default makes more sense. Name conflict
> can be a bigger problem. Should I rename the new operator class?

I agree that the new operator class should be the default one, it is
more useful and also not buggy. No idea about the naming.

>> The only thing is that I think your index would do poorly on the case where
>> all values share a prefix before the netmask but have different values after
>> the netmask, since gist_union and gist_penalty does not care about the bits
>> after the netmask. Am I correct? Have you thought anything about this?
>
> Yes, I have tried to construct the index with ip_maxbits() instead of ip_bits()
> but I could not make it work with the cidr type. It can be done by adding
> operator as a parameter for union, penalty and picksplit functions on the
> GiST framework. I am not sure it worths the trouble. It would only help
> the basic comparison operators and it would slow down other operators because
> network length comparison before network address would not be possible for
> the intermediate nodes. I mentioned this behavior of the operator class on
> the paragraph added to the documentation.

I think this is fine since it does not seem like a very useful case to
support to me. Would be worth doing only if it is easy to do.

A separate concern: we still have not come to a decision about operators
for inet here. I do not like the fact that the operators differ between
ranges and inet. But I feel this might be out of scope for this patch.

Does any third party want to chime in with an opinion?

Current inet/cidr
<< is contained within
<<= is contained within or equals
>> contains
>>= contains or equals

Range types
@> contains range
<@ range is contained by
&& overlap (have points in common)
<< strictly left of
>> strictly right of

>> inet-selfuncs
>> -------------
>>
>> Here I have to honestly admit I am not sure if I can tell if your
>> selectivity function is correct. Your explanation sounds convincing and the
>> general idea of looking at the histogram is correct. I am just have no idea
>> if the details are right.
>
> I tried to improve it in this version. I hope it is more readable now. I used
> the word inclusion instead of overlap, made some changes on the algorithm
> to make it more suitable to the other operators.
>
> Using the histogram for inclusion which is originally for basic comparison,
> is definitely not correct. It is an empirical approach. I have tried several
> versions of it, tested them with different data sets and thought it is better
> than nothing.

Thanks for the updates. The code in this version is cleaner and easier
to follow.

I am not convinced of your approach to calculating the selectivity from
the histogram. The thing I have the problem with is the clever trickery
involved with how you handle different operator types. I prefer the
clearer code of the range types with how calc_hist_selectivity_scalar is
used. Is there any reason for why that approach would not work here or
result in worse code?

I have attached a patch where I improved the language of your comment
describing the workings of the selectivity estimation. Could you please
check it so I did not change the meaning of any of it?

I see from the tests that you still are missing selectivity functions
for operators, what is your plan for this?

--
Andreas Karlsson

Attachment Content-Type Size
inet-selfuncs-commentfix.patch text/x-patch 6.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2014-01-31 02:33:25 Re: Making strxfrm() blobs in indexes work
Previous Message Merlin Moncure 2014-01-31 01:13:08 Re: jsonb and nested hstore