Re: GiST support for inet datatypes

Lists: pgsql-hackers
From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: GiST support for inet datatypes
Date: 2013-12-17 18:58:13
Message-ID: CAE2gYzyUESd188j0b290Gf16502H9B-LwNRS3rFi1SwDb9Qcgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Attached patch adds GiST support to the inet datatypes with two new
operators. Overlaps operator can be used with exclusion constraints.
Is adjacent to operator is just the negator of it. Index uses only
the network bits of the addresses. Except for the new operators and
is contained within, contains; basic comparison operators are also
supported.

Query planner never chooses to use the index for the operators which
the index is particularly useful because selectivity estimation functions
are missing. I am planning to work on them.

I also wanted to add strictly left of and strictly right of operators
but I did not want to introduce new symbols. I think we need style
guidelines for them. Range types use <@ and @> for is contained within
and contains operators; << and >> for strictly left of and strictly right of
operators. It would be nice if we could change the symbols for contains
and is contained within operators of the inet datatypes. Then we could
use the old ones for strictly left of and strictly right of operators.

I did not touch opr_sanity regression tests as I did not decide
how to solve these problems. I did not add documentation except
the new operators. It would be nice to mention the index and exclusion
constraints for inet datatypes somewhere. I did not know which page
would be more suitable.

Attachment Content-Type Size
inet-gist-v1.patch application/octet-stream 39.1 KB

From: David Fetter <david(at)fetter(dot)org>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2013-12-23 05:20:49
Message-ID: 20131223052049.GC15630@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 17, 2013 at 08:58:13PM +0200, Emre Hasegeli wrote:
> Hi,
>
> Attached patch adds GiST support to the inet datatypes with two new
> operators. Overlaps operator can be used with exclusion constraints.
> Is adjacent to operator is just the negator of it. Index uses only
> the network bits of the addresses. Except for the new operators and
> is contained within, contains; basic comparison operators are also
> supported.

Please add this patch to the upcoming Commitfest at
https://commitfest.postgresql.org/action/commitfest_view?id=21

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-01-05 20:36:08
Message-ID: CAE2gYzw+Qp8W5WP-s_G0phkXm+AnyX8uf0NCUa+t=UznUoBZLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013-12-17 Emre Hasegeli <emre(at)hasegeli(dot)com>:
> Query planner never chooses to use the index for the operators which
> the index is particularly useful because selectivity estimation functions
> are missing. I am planning to work on them.

Attached patch adds selectivity estimation functions for the overlap and
adjacent operators. Other operators need a bit more work. I want to send it
before to get some feedback.

Attachment Content-Type Size
inet-selfuncs-v1.patch application/octet-stream 13.0 KB

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: emre(at)hasegeli(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-01-19 01:45:06
Message-ID: 52DB2E22.7080009@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I will review your two patches (gist support + selectivity). This is
part 1 of my review. I will look more into the actual GiST
implementation in a couple of days, but thought I could provide you with
my initial input right away.

inet-gist
---------

General:

I like the idea of the patch and think the && operator is useful for
exclusion constraints, and that indexing the contains operator is useful
for IP-address lookups. There is an extension, ip4r, which adds a GiST
indexed type for IP ranges but since we have the inet type in core I
think it should have GiST indexes.

I am not convinced an adjacent operator is useful for the inet type, but
if it is included it should be indexed just like -|- of ranges. We
should try to keep these lists of indexed operators the same.

Compilation:

Compiled without errors.

Regression tests:

One of the broken regression tests seems unrelated to the selectivity
functions.

-- Make a list of all the distinct operator names being used in particular
-- strategy slots.

I think it would be fine just to add the newly indexed operators here,
but the more indexed operators we get in the core the less useful this
test becomes.

Source code:

The is adjacent to operator, -|-, does not seem to be doing the right
thing. In the code it is implemented as the inverse of && which is not
true. The below comparison should return false.

SELECT '10.0.0.1'::inet -|- '127.0.0.1'::inet;
?column?
----------
t
(1 row)

I am a bit suspicious about your memcmp based optimization in
bitncommon, but it could be good. Have you benchmarked it compared to
doing the same thing with a loop?

Documentation:

Please use examples in the operator table which will evaluate to true,
and for the && case an example where not both sides are the same.

I have not found a place either in the documentation where it is
documented which operators are documented. I would suggest just adding a
short note after the operators table.

inet-selfuncs
-------------

Compilation:

There were some new warnings caused by this patch (with gcc 4.8.2). The
warnings were all about the use of uninitialized variables (right,
right_min_bits, right_order) in inet_hist_overlap_selectivity. Looking
at the code I see that they are harmless but you should still rewrite
the loop to silence the warnings.

Regression tests:

There are two broken

-- Insist that all built-in pg_proc entries have descriptions

Here you should just add descriptions for the new functions in pg_proc.h.

-- Check that all opclass search operators have selectivity estimators.

Fails due to missing selectivity functions for the operators. I think
you should at least for now do like the range types and use areajoinsel,
contjoinsel, etc for the join selectivity estimation.

Source code:

The selectivity estimation functions, network_overlap_selectivity and
network_adjacent_selectivity, should follow the naming conventions of
the other selectivity estimation functions. They are named things like
tsmatchsel, tsmatchjoinsel, and rangesel.

--
Andreas Karlsson


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-01-19 10:10:12
Message-ID: CAE2gYzwKR2ib9JfV9U7fh4mAEPWEnQcUjd8H8zNV7JnuP9EgmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-01-19 Andreas Karlsson <andreas(at)proxel(dot)se>:
> Hi,
>
> I will review your two patches (gist support + selectivity). This is part 1
> of my review. I will look more into the actual GiST implementation in a
> couple of days, but thought I could provide you with my initial input right
> away.

Thank you for looking at it.

>
> inet-gist
> ---------
>
> General:
>
> I like the idea of the patch and think the && operator is useful for
> exclusion constraints, and that indexing the contains operator is useful for
> IP-address lookups. There is an extension, ip4r, which adds a GiST indexed
> type for IP ranges but since we have the inet type in core I think it should
> have GiST indexes.
>
> I am not convinced an adjacent operator is useful for the inet type, but if
> it is included it should be indexed just like -|- of ranges. We should try
> to keep these lists of indexed operators the same.

I added it just not to leave negotor field empty. It can also be useful with
exclusion constraints but not with GiST support. I did not add GiST support
for it and the not equals operator because they do not fit the index
structure. I can just remove the operator for now.

>
> Compilation:
>
> Compiled without errors.
>
> Regression tests:
>
> One of the broken regression tests seems unrelated to the selectivity
> functions.
>
> -- Make a list of all the distinct operator names being used in particular
> -- strategy slots.
>
> I think it would be fine just to add the newly indexed operators here, but
> the more indexed operators we get in the core the less useful this test
> becomes.

I did not add the new operators to the list because I do not feel right
about supporting <<, <<=, >>, >>= symbols for these operators.
They should be <@, <@=, @>, @>= to be consistent with other types.

>
> I am a bit suspicious about your memcmp based optimization in bitncommon,
> but it could be good. Have you benchmarked it compared to doing the same
> thing with a loop?

I did, when I was writing that part. I will be happy to do it again. I will
post the results.

>
> Documentation:
>
> Please use examples in the operator table which will evaluate to true, and
> for the && case an example where not both sides are the same.

I will change in the next version.

>
> I have not found a place either in the documentation where it is documented
> which operators are documented. I would suggest just adding a short note
> after the operators table.

I will add in the next version.

>
> inet-selfuncs
> -------------
>
> Compilation:
>
> There were some new warnings caused by this patch (with gcc 4.8.2). The
> warnings were all about the use of uninitialized variables (right,
> right_min_bits, right_order) in inet_hist_overlap_selectivity. Looking at
> the code I see that they are harmless but you should still rewrite the loop
> to silence the warnings.

I will fix in the next version.

>
> Regression tests:
>
> There are two broken
>
> -- Insist that all built-in pg_proc entries have descriptions
>
> Here you should just add descriptions for the new functions in pg_proc.h.

I will add in the next version.

>
> -- Check that all opclass search operators have selectivity estimators.
>
> Fails due to missing selectivity functions for the operators. I think you
> should at least for now do like the range types and use areajoinsel,
> contjoinsel, etc for the join selectivity estimation.

I did not support them in the first version because I could not decide
the way. I have better options than using the the geo_selfuncs for these
operators. The options are from simple to complex:

0) just use network_overlap_selectivity
1) fix network_overlap_selectivity with a constant between 0 and 1
2) calculate this constant by using masklens of all buckets of the histogram
3) calculate this constant by using masklens of matching buckets of
the histogram
4) store STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for this
types, calculate this constant with it
5) create another kind of histogram for masklens, calculate this constant
with it

I do not know how to do 4 or 5, so I will try 3 for the next version. Do you
think it is a good idea?

>
> Source code:
>
> The selectivity estimation functions, network_overlap_selectivity and
> network_adjacent_selectivity, should follow the naming conventions of the
> other selectivity estimation functions. They are named things like
> tsmatchsel, tsmatchjoinsel, and rangesel.

I will rename it to inetoverlapsel, then.


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: emre(at)hasegeli(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-01-19 20:29:28
Message-ID: 52DC35A8.5000404@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/19/2014 11:10 AM, Emre Hasegeli wrote:
>> I am not convinced an adjacent operator is useful for the inet type, but if
>> it is included it should be indexed just like -|- of ranges. We should try
>> to keep these lists of indexed operators the same.
>
> I added it just not to leave negotor field empty. It can also be useful with
> exclusion constraints but not with GiST support. I did not add GiST support
> for it and the not equals operator because they do not fit the index
> structure. I can just remove the operator for now.

Sounds good. None of the other && implementations have a negator.

>> I think it would be fine just to add the newly indexed operators here, but
>> the more indexed operators we get in the core the less useful this test
>> becomes.
>
> I did not add the new operators to the list because I do not feel right
> about supporting <<, <<=, >>, >>= symbols for these operators.
> They should be <@, <@=, @>, @>= to be consistent with other types.

I agree, but I am not sure if it is ok to break backward compatibility here.

>> -- Check that all opclass search operators have selectivity estimators.
>>
>> Fails due to missing selectivity functions for the operators. I think you
>> should at least for now do like the range types and use areajoinsel,
>> contjoinsel, etc for the join selectivity estimation.
>
> I did not support them in the first version because I could not decide
> the way. I have better options than using the the geo_selfuncs for these
> operators. The options are from simple to complex:
>
> 0) just use network_overlap_selectivity
> 1) fix network_overlap_selectivity with a constant between 0 and 1
> 2) calculate this constant by using masklens of all buckets of the histogram
> 3) calculate this constant by using masklens of matching buckets of
> the histogram
> 4) store STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for this
> types, calculate this constant with it
> 5) create another kind of histogram for masklens, calculate this constant
> with it
>
> I do not know how to do 4 or 5, so I will try 3 for the next version. Do you
> think it is a good idea?

Solutions 0 and 1 does not really sound better than using geo_selfuncs.

--
Andreas Karlsson


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: emre(at)hasegeli(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-01-19 23:17:04
Message-ID: 52DC5CF0.6070908@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here comes part 2 of 2 of the review.

inet-gist
---------

In general the code looks good and I think your approach makes sense.
Not an expert on GiST though so I would like a second opinion on this.
Maybe there is some clever trick which would make the index more
efficient, but the design I see is simple and logical. Like this much
more than the incorrect GiST index for inet in btree_gist.

The only thing is that I think your index would do poorly on the case
where all values share a prefix before the netmask but have different
values after the netmask, since gist_union and gist_penalty does not
care about the bits after the netmask. Am I correct? Have you thought
anything about this?

To create such data:

CREATE TABLE a (ip inet);
INSERT INTO a SELECT '127.0.0.0/16'::inet + s FROM generate_series(1,
pow(2, 16)::int) s;
CREATE INDEX aidx ON a USING gist (ip);

Saw also some minor things too.

Typo in comment, weird sentence:

* The main question to calculate the union is that they have how many
* bits in common. [...]

I do not like how you called the result key i inet_union_gist() "tmp".
Something like "result" or "result_ip" would be better.

Typo in comment, "reverse" should be "inverse":

* Penalty is reverse of the common IP bits of the two addresses.

Typo in comment, missing "be":

* Values bigger than 1 are used when the common IP bits cannot
* calculated.

Language can be improved in this comment. Both ways to split are by the
union of the keys, it is more of a split by address prefix.

* The second and the important way is to split by the union of the keys.
* Union of the keys calculated same way with the inet_gist_union function.
* The first and the last biggest subnets created from the calculated
* union. In this case addresses contained by the first subnet will be put
* to the left bucket, address contained by the last subnet will be put to
* the right bucket.

Could you not just use memcmp in inet_gist_same() instead of bitncmp()
since it only cares about equality?

There is an extra empty line at the end of network_gist.c

inet-selfuncs
-------------

Here I have to honestly admit I am not sure if I can tell if your
selectivity function is correct. Your explanation sounds convincing and
the general idea of looking at the histogram is correct. I am just have
no idea if the details are right.

DEFAULT_NETWORK_OVERLAP_SELECTIVITY should be renamed
DEFAULT_NETWORK_OVERLAP_SEL and moved to the .c file.

Typo in comment, "constrant" -> "constant".

There is an extra empty line at the end of network_selfuncs.c

--
Andreas Karlsson


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-01-23 22:22:56
Message-ID: CAE2gYzzgY96OPrU7nn2jrh2G0_57Qj_XZRkrVyOaGVwRv+Qarg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014/1/20 Andreas Karlsson <andreas(at)proxel(dot)se>:
> Here comes part 2 of 2 of the review.

Second versions of the patches attached. They address both of your reviews.

>
> inet-gist
> ---------
>
> In general the code looks good and I think your approach makes sense. Not an
> expert on GiST though so I would like a second opinion on this. Maybe there
> is some clever trick which would make the index more efficient, but the
> design I see is simple and logical. Like this much more than the incorrect
> GiST index for inet in btree_gist.

I realized that create extension btree_gist fails with the patch.

ERROR: could not make operator class "gist_inet_ops" be default for type inet
DETAIL: Operator class "inet_ops" already is the default.

I think making the new operator class default makes more sense. Name conflict
can be a bigger problem. Should I rename the new operator class?

>
> The only thing is that I think your index would do poorly on the case where
> all values share a prefix before the netmask but have different values after
> the netmask, since gist_union and gist_penalty does not care about the bits
> after the netmask. Am I correct? Have you thought anything about this?

Yes, I have tried to construct the index with ip_maxbits() instead of ip_bits()
but I could not make it work with the cidr type. It can be done by adding
operator as a parameter for union, penalty and picksplit functions on the
GiST framework. I am not sure it worths the trouble. It would only help
the basic comparison operators and it would slow down other operators because
network length comparison before network address would not be possible for
the intermediate nodes. I mentioned this behavior of the operator class on
the paragraph added to the documentation.

> Saw also some minor things too.
>
> Typo in comment, weird sentence:
>
> * The main question to calculate the union is that they have how many
> * bits in common. [...]

I rewrite it. I am sorry about my English.

>
> I do not like how you called the result key i inet_union_gist() "tmp".
> Something like "result" or "result_ip" would be better.
>
> Typo in comment, "reverse" should be "inverse":
>
> * Penalty is reverse of the common IP bits of the two addresses.
>
> Typo in comment, missing "be":
>
> * Values bigger than 1 are used when the common IP bits cannot
> * calculated.
>
> Language can be improved in this comment. Both ways to split are by the
> union of the keys, it is more of a split by address prefix.
>
> * The second and the important way is to split by the union of the keys.
> * Union of the keys calculated same way with the inet_gist_union function.
> * The first and the last biggest subnets created from the calculated
> * union. In this case addresses contained by the first subnet will be put
> * to the left bucket, address contained by the last subnet will be put to
> * the right bucket.
>
> Could you not just use memcmp in inet_gist_same() instead of bitncmp() since
> it only cares about equality?
>
> There is an extra empty line at the end of network_gist.c

I changed them.

>
> inet-selfuncs
> -------------
>
> Here I have to honestly admit I am not sure if I can tell if your
> selectivity function is correct. Your explanation sounds convincing and the
> general idea of looking at the histogram is correct. I am just have no idea
> if the details are right.

I tried to improve it in this version. I hope it is more readable now. I used
the word inclusion instead of overlap, made some changes on the algorithm
to make it more suitable to the other operators.

Using the histogram for inclusion which is originally for basic comparison,
is definitely not correct. It is an empirical approach. I have tried several
versions of it, tested them with different data sets and thought it is better
than nothing.

>
> DEFAULT_NETWORK_OVERLAP_SELECTIVITY should be renamed
> DEFAULT_NETWORK_OVERLAP_SEL and moved to the .c file.
>
> Typo in comment, "constrant" -> "constant".
>
> There is an extra empty line at the end of network_selfuncs.c

I changed them.

Attachment Content-Type Size
inet-gist-v2.patch application/octet-stream 39.9 KB
inet-selfuncs-v2.patch application/octet-stream 17.4 KB

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: emre(at)hasegeli(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-01-31 01:58:09
Message-ID: 52EB0331.5090500@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/23/2014 11:22 PM, Emre Hasegeli wrote:
> inet-gist
> ---------
> I realized that create extension btree_gist fails with the patch.
>
> ERROR: could not make operator class "gist_inet_ops" be default for type inet
> DETAIL: Operator class "inet_ops" already is the default.
>
> I think making the new operator class default makes more sense. Name conflict
> can be a bigger problem. Should I rename the new operator class?

I agree that the new operator class should be the default one, it is
more useful and also not buggy. No idea about the naming.

>> The only thing is that I think your index would do poorly on the case where
>> all values share a prefix before the netmask but have different values after
>> the netmask, since gist_union and gist_penalty does not care about the bits
>> after the netmask. Am I correct? Have you thought anything about this?
>
> Yes, I have tried to construct the index with ip_maxbits() instead of ip_bits()
> but I could not make it work with the cidr type. It can be done by adding
> operator as a parameter for union, penalty and picksplit functions on the
> GiST framework. I am not sure it worths the trouble. It would only help
> the basic comparison operators and it would slow down other operators because
> network length comparison before network address would not be possible for
> the intermediate nodes. I mentioned this behavior of the operator class on
> the paragraph added to the documentation.

I think this is fine since it does not seem like a very useful case to
support to me. Would be worth doing only if it is easy to do.

A separate concern: we still have not come to a decision about operators
for inet here. I do not like the fact that the operators differ between
ranges and inet. But I feel this might be out of scope for this patch.

Does any third party want to chime in with an opinion?

Current inet/cidr
<< is contained within
<<= is contained within or equals
>> contains
>>= contains or equals

Range types
@> contains range
<@ range is contained by
&& overlap (have points in common)
<< strictly left of
>> strictly right of

>> inet-selfuncs
>> -------------
>>
>> Here I have to honestly admit I am not sure if I can tell if your
>> selectivity function is correct. Your explanation sounds convincing and the
>> general idea of looking at the histogram is correct. I am just have no idea
>> if the details are right.
>
> I tried to improve it in this version. I hope it is more readable now. I used
> the word inclusion instead of overlap, made some changes on the algorithm
> to make it more suitable to the other operators.
>
> Using the histogram for inclusion which is originally for basic comparison,
> is definitely not correct. It is an empirical approach. I have tried several
> versions of it, tested them with different data sets and thought it is better
> than nothing.

Thanks for the updates. The code in this version is cleaner and easier
to follow.

I am not convinced of your approach to calculating the selectivity from
the histogram. The thing I have the problem with is the clever trickery
involved with how you handle different operator types. I prefer the
clearer code of the range types with how calc_hist_selectivity_scalar is
used. Is there any reason for why that approach would not work here or
result in worse code?

I have attached a patch where I improved the language of your comment
describing the workings of the selectivity estimation. Could you please
check it so I did not change the meaning of any of it?

I see from the tests that you still are missing selectivity functions
for operators, what is your plan for this?

--
Andreas Karlsson

Attachment Content-Type Size
inet-selfuncs-commentfix.patch text/x-patch 6.2 KB

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-02-06 10:03:00
Message-ID: CAE2gYzwtGdw+SS_4Zx327eNq8EnfJLx5KX=9Za4Qv1b0mcJR1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-01-19 12:10, Emre Hasegeli <emre(at)hasegeli(dot)com>:
> 2014-01-19 Andreas Karlsson <andreas(at)proxel(dot)se>:
>
>> I am a bit suspicious about your memcmp based optimization in bitncommon,
>> but it could be good. Have you benchmarked it compared to doing the same
>> thing with a loop?
>
> I did, when I was writing that part. I will be happy to do it again. I will
> post the results.

I was testing it by creating GiST indexes. I realized that these test are
inconsistent when BUFFERING = AUTO. I repeated them with BUFFERING = ON.
The function without memcmp was faster in this case. I will change
the function in the next version of the patch.

The test case:

Create table Network as
select (a || '.' || b || '.' || c || '/24')::cidr
from generate_series(0, 255) as a,
generate_series(0, 255) as b,
generate_series(0, 255) as c;

Drop index if exists N;
Create index N on Network using gist(cidr) with (buffering = on);

Create table Network6 as
select ('::' || to_hex(a) || ':' || to_hex(b))::inet
from generate_series(0, 255) as a,
generate_series(0, 65535) as b;

Drop index if exists N6;
Create index N6 on Network6 using gist(inet) with (buffering = on);

What I could not understand is the tests with IP version 6 was much faster.
The row count is same, the index size is bigger.


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-02-06 17:14:55
Message-ID: CAE2gYzxc0FXEwe59sFdUZNQ24c+fRBDMGXwwvbYvmeaNateidw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Third versions of the patches attached. They are rebased to the HEAD. In
this versions, the bitncommon function is changed. <sys/socket.h> included
to network_gist.c to be able to compile it on FreeBSD. Geometric mean
calculation for partial bucket match on the function
inet_hist_inclusion_selectivity
reverted back. It was something I changed without enough testing on
the second revision of the patch. This version uses the maximum divider
calculated from the boundaries of the bucket, like the first version. It is
simpler and more reliable.

2014-01-31 3:58, Andreas Karlsson <andreas(at)proxel(dot)se>:

> I agree that the new operator class should be the default one, it is more
> useful and also not buggy. No idea about the naming.

I have misread the name, rename was not necessary. I removed the DEFAULT
keywords for the inet and the cidr operator classes from btree_gist--1.0.sql
on the inet-gist patch.

> A separate concern: we still have not come to a decision about operators for
> inet here. I do not like the fact that the operators differ between ranges
> and inet. But I feel this might be out of scope for this patch.

I attached a separate optional patch to rename the inclusion operators. It
leaves the old names, changes the GiST supported operators and the operator
names in the documentation. It would be better to apply it with the GiST
support, in my opinion.

> I am not convinced of your approach to calculating the selectivity from the
> histogram. The thing I have the problem with is the clever trickery involved
> with how you handle different operator types. I prefer the clearer code of
> the range types with how calc_hist_selectivity_scalar is used. Is there any
> reason for why that approach would not work here or result in worse code?

Currently we do not have histogram of the lower and upper bounds as
the range types. Current histogram can be used nicely as the lower bound,
but not the upper bound because the comparison is first on the common bits
of the network part, then on the length of the network part. For example,
10.0/16 is defined as greater than 10/8.

Using the histogram as the lower bounds of the networks is not enough to
calculate selectivity for any of these operators. Using it also as the upper
bounds is still not enough for the inclusion operators. The lengths of
the network parts should taken into consideration in a way and it is
what this patch does. Using separate histograms for the lower bounds,
the upper bounds and the lengths of the network parts can solve all of these
problems, but it is a lot of work.

> I have attached a patch where I improved the language of your comment
> describing the workings of the selectivity estimation. Could you please
> check it so I did not change the meaning of any of it?

Thank you. I only added the "for" to the sentence "This ratio can be big
enough to not disregard for addresses with small masklens." I merged
the changes to this version of the patch.

> I see from the tests that you still are missing selectivity functions for
> operators, what is your plan for this?

This was because the join selectivity estimation functions. I set
the geo_selfuncs for the missing ones. All tests pass with them. I want
to develop the join selectivity function too, but not for this commit fest.

Attachment Content-Type Size
inet-gist-v3.patch application/octet-stream 43.0 KB
inet-selfuncs-v3.patch application/octet-stream 17.7 KB
inet-operator-v1.patch application/octet-stream 20.6 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-07 20:41:39
Message-ID: CA+TgmoZU_X5Haw053NXPpcPqWkojicfQ_M2_JHZj6uxpvPhwBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 6, 2014 at 12:14 PM, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:
> I have misread the name, rename was not necessary. I removed the DEFAULT
> keywords for the inet and the cidr operator classes from btree_gist--1.0.sql
> on the inet-gist patch.

Generally, modifying already-release .sql files for extensions is a no-no...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: GiST support for inet datatypes
Date: 2014-02-17 12:40:07
Message-ID: CAE2gYzwVhGay1ipUoQdmY51egpsYYqQR89CPbegZBtezwxia1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-02-07 22:41, Robert Haas <robertmhaas(at)gmail(dot)com>:

> Generally, modifying already-release .sql files for extensions is a no-no...

I prepared separate patches for btree_gist extension with more options.
First one (btree-gist-drop-default-inet-v1.patch) removes DEFAULT keyword
only from the inet and the cidr operator classes. Second one
(btree-gist-drop-default-all-v1.patch) removes DEFAULT keyword for all
operator classes. I think it is more consistent to remove it from all.
Third one (btree-gist-drop-inet-v1.patch) removes the inet and the cidr
operator classes altogether. It is suggested by Tom Lane [1] on bug #5705.
The new GiST operator class includes basic comparison operators except !=
so it may be the right time to remove support from btree_gist. Fourth one
(btree-gist-drop-inet-and-default-v1.patch) is the second one and the third
one together.

[1] http://www.postgresql.org/message-id/10183.1287526949@sss.pgh.pa.us

Attachment Content-Type Size
btree-gist-drop-default-inet-v1.patch application/octet-stream 78.9 KB
btree-gist-drop-default-all-v1.patch application/octet-stream 80.1 KB
btree-gist-drop-inet-v1.patch application/octet-stream 124.2 KB
btree-gist-drop-inet-and-default-v1.patch application/octet-stream 125.6 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: GiST support for inet datatypes
Date: 2014-02-17 12:54:41
Message-ID: 20140217125441.GA18388@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-02-17 14:40:07 +0200, Emre Hasegeli wrote:
> 2014-02-07 22:41, Robert Haas <robertmhaas(at)gmail(dot)com>:
>
> > Generally, modifying already-release .sql files for extensions is a no-no...
>
> I prepared separate patches for btree_gist extension with more options.
> First one (btree-gist-drop-default-inet-v1.patch) removes DEFAULT keyword
> only from the inet and the cidr operator classes. Second one
> (btree-gist-drop-default-all-v1.patch) removes DEFAULT keyword for all
> operator classes. I think it is more consistent to remove it from all.
> Third one (btree-gist-drop-inet-v1.patch) removes the inet and the cidr
> operator classes altogether. It is suggested by Tom Lane [1] on bug #5705.
> The new GiST operator class includes basic comparison operators except !=
> so it may be the right time to remove support from btree_gist. Fourth one
> (btree-gist-drop-inet-and-default-v1.patch) is the second one and the third
> one together.
>
> [1] http://www.postgresql.org/message-id/10183.1287526949@sss.pgh.pa.us

> diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile
> index ba4af14..d5b1fd7 100644
> --- a/contrib/btree_gist/Makefile
> +++ b/contrib/btree_gist/Makefile
> @@ -9,7 +9,7 @@ OBJS = btree_gist.o btree_utils_num.o btree_utils_var.o btree_int2.o \
> btree_numeric.o
>
> EXTENSION = btree_gist
> -DATA = btree_gist--1.0.sql btree_gist--unpackaged--1.0.sql
> +DATA = btree_gist--1.1.sql btree_gist--unpackaged--1.0.sql

You need to add a file for going from 1.0 to 1.1.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, tgl <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-17 19:57:42
Message-ID: CAE2gYzz=-1tyQUAVQTUZ2=4MhLu91k40eZBinvNgLaLQBxfXwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-02-17 14:54, Andres Freund <andres(at)2ndquadrant(dot)com>:
> You need to add a file for going from 1.0 to 1.1.

Thank you for the notice. I added them to the patches which touch only two
of the operator classes. It drops and re-creates operator classes as there
is not ALTER OPERATOR CLASS DROP DEFAULT command. I do not apply it to
the patch to remove the DEFAULT keyword from all of the operator classes
because it would nearly be same as dropping and re-creating the extension.

Attachment Content-Type Size
btree-gist-drop-default-inet-v2.patch application/octet-stream 80.6 KB
btree-gist-drop-inet-v2.patch application/octet-stream 82.3 KB
btree-gist-drop-inet-and-default-v2.patch application/octet-stream 83.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: emre(at)hasegeli(dot)com
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-17 20:16:21
Message-ID: 9225.1392668181@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Emre Hasegeli <emre(at)hasegeli(dot)com> writes:
> 2014-02-17 14:54, Andres Freund <andres(at)2ndquadrant(dot)com>:
>> You need to add a file for going from 1.0 to 1.1.

> Thank you for the notice. I added them to the patches which touch only two
> of the operator classes. It drops and re-creates operator classes as there
> is not ALTER OPERATOR CLASS DROP DEFAULT command.

Dropping an operator class is quite unacceptable, as it will cause indexes
based on that class to go away (or more likely, cause the upgrade to fail,
if you didn't use CASCADE). What we've done in the past for changes that
are nominally unsupported is to make the upgrade scripts tweak the system
catalogs directly.

More generally, it doesn't look to me like these upgrade scripts are
complete; shouldn't they be creating some new objects, not just replacing
old ones?

We need to have a discussion as to whether it's actually sane for an
upgrade to remove the DEFAULT marking on a pre-existing opclass. It
strikes me that this would for instance break pg_dump dumps, in the sense
that the reloaded index would probably now have a different opclass
than before (since pg_dump would see no need to have put an explicit
opclass name into CREATE INDEX if it was the default in the old database).
Even if the new improved opclass is in all ways better, that would be
catastrophic for pg_upgrade I suspect. Unless the new opclass is
on-disk-compatible with the old; in which case we shouldn't be creating
a new opclass at all, but just modifying the definition of the old one.

In short we probably need to think a bit harder about what this patch is
proposing to do. It seems fairly likely to me that some other approach
would be a better idea.

regards, tom lane


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-17 20:41:44
Message-ID: CAE2gYzxWUx_R1BExfHOBBkZxNVMRpnqLMNBq07Gb34EH7t=yKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-02-17 22:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> More generally, it doesn't look to me like these upgrade scripts are
> complete; shouldn't they be creating some new objects, not just replacing
> old ones?

The actual patches are on the previous mail [1]. I was just trying
to solve the problem that btree_gist cannot be loaded because of
the new operator class.

> In short we probably need to think a bit harder about what this patch is
> proposing to do. It seems fairly likely to me that some other approach
> would be a better idea.

How about only removing the inet and the cidr operator classes
from btree_gist. btree-gist-drop-inet-v2.patch does that.

[1] http://www.postgresql.org/message-id/CAE2gYzxc0FXEwe59sFdUZNQ24c+fRBDMGXwwvbYvmeaNateidw@mail.gmail.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: emre(at)hasegeli(dot)com
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-17 20:56:00
Message-ID: 6245.1392670560@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Emre Hasegeli <emre(at)hasegeli(dot)com> writes:
> How about only removing the inet and the cidr operator classes
> from btree_gist. btree-gist-drop-inet-v2.patch does that.

I'm not sure which part of "no" you didn't understand, but to be
clear: you don't get to break existing installations.

Assuming that this opclass is sufficiently better than the existing one,
it would sure be nice if it could become the default; but I've not seen
any proposal in this thread that would allow that without serious upgrade
problems. I think the realistic alternatives so far are (1) new opclass
is not the default, or (2) this patch gets rejected.

We should probably expend some thought on a general approach to
replacing the default opclass for a datatype, because I'm sure this
will come up again. Right now I don't see a feasible way.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Emre Hasegeli <emre(at)hasegeli(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-19 12:48:23
Message-ID: CA+TgmobYWNhfX4-jvrL2rh8Zp98U5UYpPg2FM+GUgcQ81m0Q0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 17, 2014 at 3:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Emre Hasegeli <emre(at)hasegeli(dot)com> writes:
>> How about only removing the inet and the cidr operator classes
>> from btree_gist. btree-gist-drop-inet-v2.patch does that.
>
> I'm not sure which part of "no" you didn't understand, but to be
> clear: you don't get to break existing installations.
>
> Assuming that this opclass is sufficiently better than the existing one,
> it would sure be nice if it could become the default; but I've not seen
> any proposal in this thread that would allow that without serious upgrade
> problems. I think the realistic alternatives so far are (1) new opclass
> is not the default, or (2) this patch gets rejected.
>
> We should probably expend some thought on a general approach to
> replacing the default opclass for a datatype, because I'm sure this
> will come up again. Right now I don't see a feasible way.

It seems odd for "default" to be a property of the opclass rather than
a separate knob. What comes to mind is something like:

ALTER TYPE sniffle SET DEFAULT OPERATOR CLASS FOR btree TO achoo_ops;

Not sure how much that helps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Emre Hasegeli <emre(at)hasegeli(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-19 14:52:56
Message-ID: 31281.1392821576@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Feb 17, 2014 at 3:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> We should probably expend some thought on a general approach to
>> replacing the default opclass for a datatype, because I'm sure this
>> will come up again. Right now I don't see a feasible way.

> It seems odd for "default" to be a property of the opclass rather than
> a separate knob. What comes to mind is something like:
> ALTER TYPE sniffle SET DEFAULT OPERATOR CLASS FOR btree TO achoo_ops;
> Not sure how much that helps.

Not at all, AFAICS. If it were okay to decide that some formerly-default
opclass is no longer default, then having such a command would be better
than manually manipulating pg_opclass.opcdefault --- but extension upgrade
scripts could certainly do the latter if they had to. The problem here
is whether it's upgrade-safe to make such a change at all; having
syntactic sugar doesn't make it safer.

regards, tom lane


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-19 21:39:41
Message-ID: CAE2gYzzx_fgx-GuAb4z9R-_DH=QzJTkJaGN5=je7Pm3trfen1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-02-19 16:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Not at all, AFAICS. If it were okay to decide that some formerly-default
> opclass is no longer default, then having such a command would be better
> than manually manipulating pg_opclass.opcdefault --- but extension upgrade
> scripts could certainly do the latter if they had to. The problem here
> is whether it's upgrade-safe to make such a change at all; having
> syntactic sugar doesn't make it safer.

It is not hard to support != operator on the new operator class. That would
make it functionally compatible with the ones on btree_gist. That way,
changing the default would not break pg_dump dumps, it would only upgrade
the index to the new one.

pg_upgrade dumps current objects in the extension. It fails to restore them,
if the new operator class exists as the default. I do not really understand
how pg_upgrade handle extension upgrades. I do not have a solution to this.

It would be sad if we cannot make the new operator class default because of
the btree_gist implementation which is not useful for inet data types. You
wrote on 2010-10-11 about it [1]:

> Well, actually the btree_gist implementation for inet is a completely
> broken piece of junk: it thinks that convert_network_to_scalar is 100%
> trustworthy and can be used as a substitute for the real comparison
> functions, which isn't even approximately true. I'm not sure why
> Teodor implemented it like that instead of using the type's real
> comparison functions, but it's pretty much useless if you want the
> same sort order that the type itself defines.

[1] http://www.postgresql.org/message-id/8973.1286841006@sss.pgh.pa.us


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: emre(at)hasegeli(dot)com
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-19 23:37:42
Message-ID: 17751.1392853062@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Emre Hasegeli <emre(at)hasegeli(dot)com> writes:
> [ cites bug #5705 ]

Hm. I had forgotten how thoroughly broken btree_gist's inet and cidr
opclasses are. There was discussion at the time of just ripping them
out despite the compatibility hit. We didn't do it, but if we had
then life would be simpler now.

Perhaps it would be acceptable to drop the btree_gist implementation
and teach pg_upgrade to refuse to upgrade if the old database contains
any such indexes. I'm not sure that solves the problem, though, because
I think pg_upgrade will still fail if the opclass is in the old database
even though unused (because you'll get the complaint about multiple
default opclasses anyway). The only simple way to avoid that is to not
have btree_gist loaded at all in the old DB, and I doubt that's an
acceptable upgrade restriction. It would require dropping indexes of
the other types supported by btree_gist, which would be a pretty hard
sell since those aren't broken.

Really what we'd need here is for btree_gist to be upgraded to a version
that doesn't define gist_inet_ops (or at least doesn't mark it as default)
*before* pg_upgrading to a server version containing the proposed new
implementation. Not sure how workable that is. Could we get away with
having pg_upgrade unset the default flag in the old database?
(Ick, but there are no very good alternatives here ...)

BTW, I'd not been paying any attention to this thread, but now that
I scan through it, I think the proposal to change the longstanding
names of the inet operators has zero chance of being accepted.
Consistency with the names chosen for range operators is a mighty
weak argument compared to backwards compatibility.

regards, tom lane


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-20 13:30:48
Message-ID: CAE2gYzz5hpu_xWVyK0C2TmqvAPxtr_SEnziE+truOd+umUf8Uw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-02-20 01:37, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Perhaps it would be acceptable to drop the btree_gist implementation
> and teach pg_upgrade to refuse to upgrade if the old database contains
> any such indexes. I'm not sure that solves the problem, though, because
> I think pg_upgrade will still fail if the opclass is in the old database
> even though unused (because you'll get the complaint about multiple
> default opclasses anyway). The only simple way to avoid that is to not
> have btree_gist loaded at all in the old DB, and I doubt that's an
> acceptable upgrade restriction. It would require dropping indexes of
> the other types supported by btree_gist, which would be a pretty hard
> sell since those aren't broken.
>
> Really what we'd need here is for btree_gist to be upgraded to a version
> that doesn't define gist_inet_ops (or at least doesn't mark it as default)
> *before* pg_upgrading to a server version containing the proposed new
> implementation. Not sure how workable that is. Could we get away with
> having pg_upgrade unset the default flag in the old database?
> (Ick, but there are no very good alternatives here ...)

Upgrading btree_gist on the old installation would be almost impossible
for the majority of the users who use package managers, in my opinion.
I cannot think of a better solution than your suggestion. I can try to
prepare a patch to execute the following query on pg_upgrade before
dumping the old database, if that is the final decision.

UPDATE pg_opclass SET opcdefault = false
WHERE opcname IN ('gist_inet_ops', 'gist_cidr_ops');

> BTW, I'd not been paying any attention to this thread, but now that
> I scan through it, I think the proposal to change the longstanding
> names of the inet operators has zero chance of being accepted.
> Consistency with the names chosen for range operators is a mighty
> weak argument compared to backwards compatibility.

That is why I prepared it as a separate patch on top of the others. It is
not only consistency with the range types: <@ and @> symbols used for
containment everywhere except the inet data types, particularly on
the geometric types, arrays; cube, hstore, intaray, ltree extensions.
The patch does not just change the operator names, it leaves the old ones,
adds new operators with GiST support and changes the documentation, like
your commit ba920e1c9182eac55d5f1327ab0d29b721154277 back in 2006. I could
not find why did you leave the inet operators unchanged on that commit,
in the mailing list archives [1]. GiST support will be a promotion for
users to switch to the new operators, if we make this change with it.

This change will also indirectly deprecate the undocumented non-transparent
btree index support that works sometimes for some of the subnet inclusion
operators [2].

[1] http://www.postgresql.org/message-id/14277.1157304939@sss.pgh.pa.us

[2] http://www.postgresql.org/message-id/389.1042992616@sss.pgh.pa.us


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: emre(at)hasegeli(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST support for inet datatypes
Date: 2014-02-24 00:44:19
Message-ID: 530A95E3.2080702@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/06/2014 06:14 PM, Emre Hasegeli wrote:
> Third versions of the patches attached. They are rebased to the HEAD. In
> this versions, the bitncommon function is changed. <sys/socket.h> included
> to network_gist.c to be able to compile it on FreeBSD. Geometric mean
> calculation for partial bucket match on the function
> inet_hist_inclusion_selectivity
> reverted back. It was something I changed without enough testing on
> the second revision of the patch. This version uses the maximum divider
> calculated from the boundaries of the bucket, like the first version. It is
> simpler and more reliable.

Thanks for the updated patch.

About the discussions about upgrading PostgreSQL, extensions and
defaults I do not have any strong opinion. I think that this patch is
useful even if it does not end up the default, but it would be a pity
since the BTree GiST index is broken.

Note: The patches do not apply anymore due to changes to
src/backend/utils/adt/Makefile.

>> I am not convinced of your approach to calculating the selectivity from the
>> histogram. The thing I have the problem with is the clever trickery involved
>> with how you handle different operator types. I prefer the clearer code of
>> the range types with how calc_hist_selectivity_scalar is used. Is there any
>> reason for why that approach would not work here or result in worse code?
>
> Currently we do not have histogram of the lower and upper bounds as
> the range types. Current histogram can be used nicely as the lower bound,
> but not the upper bound because the comparison is first on the common bits
> of the network part, then on the length of the network part. For example,
> 10.0/16 is defined as greater than 10/8.
>
> Using the histogram as the lower bounds of the networks is not enough to
> calculate selectivity for any of these operators. Using it also as the upper
> bounds is still not enough for the inclusion operators. The lengths of
> the network parts should taken into consideration in a way and it is
> what this patch does. Using separate histograms for the lower bounds,
> the upper bounds and the lengths of the network parts can solve all of these
> problems, but it is a lot of work.

I see, thanks for the explanation. But I am still not very fond of how
that code is written since I find it hard to verify the correctness of
it, but have no better suggestions.

>> I see from the tests that you still are missing selectivity functions for
>> operators, what is your plan for this?
>
> This was because the join selectivity estimation functions. I set
> the geo_selfuncs for the missing ones. All tests pass with them. I want
> to develop the join selectivity function too, but not for this commit fest.

All tests pass now. Excellent!

Do you think the new index is useful even if you use the basic
geo_selfuncs? Or should we wait with committing the patches until all
selfuncs are implemented?

--
Andreas Karlsson


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: emre(at)hasegeli(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-24 15:55:46
Message-ID: 20140224155546.GA21747@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 19, 2014 at 06:37:42PM -0500, Tom Lane wrote:
> Emre Hasegeli <emre(at)hasegeli(dot)com> writes:
> > [ cites bug #5705 ]
>
> Hm. I had forgotten how thoroughly broken btree_gist's inet and cidr
> opclasses are. There was discussion at the time of just ripping them
> out despite the compatibility hit. We didn't do it, but if we had
> then life would be simpler now.
>
> Perhaps it would be acceptable to drop the btree_gist implementation
> and teach pg_upgrade to refuse to upgrade if the old database contains
> any such indexes. I'm not sure that solves the problem, though, because
> I think pg_upgrade will still fail if the opclass is in the old database
> even though unused (because you'll get the complaint about multiple
> default opclasses anyway). The only simple way to avoid that is to not
> have btree_gist loaded at all in the old DB, and I doubt that's an
> acceptable upgrade restriction. It would require dropping indexes of
> the other types supported by btree_gist, which would be a pretty hard
> sell since those aren't broken.
>
> Really what we'd need here is for btree_gist to be upgraded to a version
> that doesn't define gist_inet_ops (or at least doesn't mark it as default)
> *before* pg_upgrading to a server version containing the proposed new
> implementation. Not sure how workable that is. Could we get away with
> having pg_upgrade unset the default flag in the old database?
> (Ick, but there are no very good alternatives here ...)

pg_upgrade has _never_ modified the old cluster, and I am hesitant to do
that now. Can we make the changes in the new cluster, or in pg_dump
when in binary upgrade mode?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-26 10:49:10
Message-ID: CAE2gYzzP+MGLz2eXbdobzrnfDxMOJ1zVGiXVmXBPc60p7mcoiw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-02-24 02:44, Andreas Karlsson <andreas(at)proxel(dot)se>:

> Note: The patches do not apply anymore due to changes to
> src/backend/utils/adt/Makefile.

I will fix it on the next version.

> I see, thanks for the explanation. But I am still not very fond of how that
> code is written since I find it hard to verify the correctness of it, but
> have no better suggestions.

We can split the selectivity estimation patch from the GiST support, add
it to the next commit fest for more review. In the mean time, I can
improve it with join selectivity estimation. Testing with different datasets
would help the most to reveal how true the algorithm is.

> Do you think the new index is useful even if you use the basic geo_selfuncs?
> Or should we wait with committing the patches until all selfuncs are
> implemented?

My motivation was to be able to use the cidr datatype with exclusion
constraints. I think the index would also be useful without proper
selectivity estimation functions. Range types also use geo_selfuncs for
join selectivity estimation.


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-27 10:39:05
Message-ID: CAE2gYzzYkthDHc0M61Tm+fGbRAV9dQgk19A=kR7N3F5XbCabXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-02-24 17:55, Bruce Momjian <bruce(at)momjian(dot)us>:

> pg_upgrade has _never_ modified the old cluster, and I am hesitant to do
> that now. Can we make the changes in the new cluster, or in pg_dump
> when in binary upgrade mode?

It can be possible to update the new operator class in the new cluster
as not default, before restore. After the restore, pg_upgrade can upgrade
the btree_gist extension and reset the operator class as the default.
pg_upgrade suggests to re-initdb the new cluster if it fails, anyway. Do
you think it is a better solution? I think it will be more complicated
to do in pg_dump when in binary upgrade mode.


From: Florian Pflug <fgp(at)phlo(dot)org>
To: emre(at)hasegeli(dot)com
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-27 16:15:32
Message-ID: 2366213C-6E48-483A-8CAF-C14E000EAC3E@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Feb27, 2014, at 11:39 , Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:
> 2014-02-24 17:55, Bruce Momjian <bruce(at)momjian(dot)us>:
>
>> pg_upgrade has _never_ modified the old cluster, and I am hesitant to do
>> that now. Can we make the changes in the new cluster, or in pg_dump
>> when in binary upgrade mode?
>
> It can be possible to update the new operator class in the new cluster
> as not default, before restore. After the restore, pg_upgrade can upgrade
> the btree_gist extension and reset the operator class as the default.
> pg_upgrade suggests to re-initdb the new cluster if it fails, anyway. Do
> you think it is a better solution? I think it will be more complicated
> to do in pg_dump when in binary upgrade mode.

Maybe I'm missing something, but isn't the gist of the problem here that
pg_dump won't explicitly state the operator class if it's the default?

If so, can't we just make pg_dump always spell out the operator class
explicitly? Then changing the default will never change the meaning of
database dumps, so upgraded clusters will simply continue to use the
old (now non-default) opclass, no?

best regards,
Florian Pflug


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: emre(at)hasegeli(dot)com, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-27 16:56:21
Message-ID: 1397.1393520181@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> writes:
> Maybe I'm missing something, but isn't the gist of the problem here that
> pg_dump won't explicitly state the operator class if it's the default?

That's not a bug, it's a feature, for much the same reasons that pg_dump
tries to minimize explicit schema-qualification.

At least, it's a feature for ordinary dumps. I'm not sure whether it
might be a good idea for binary_upgrade dumps. That would depend on
our policy about whether a new opclass has to have a new (and necessarily
weird) name. If we try to make the new opclass still have the nicest
name then it won't help at all for pg_dump to dump the old name.

regards, tom lane


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: emre(at)hasegeli(dot)com, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-27 17:11:25
Message-ID: 5CD6A309-FC0F-4C52-A0B9-CCEB68AC8948@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Feb27, 2014, at 17:56 , Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> Maybe I'm missing something, but isn't the gist of the problem here that
>> pg_dump won't explicitly state the operator class if it's the default?
>
> That's not a bug, it's a feature, for much the same reasons that pg_dump
> tries to minimize explicit schema-qualification.

I fail to see the value in this for opclasses. It's certainly nice for
schema qualifications, because dumping one schema and restoring into a
different schema is probably quite common. But how often does anyone dump
a database and restore it into a database with a different default opclass
for some type?

Or is the idea just to keep the dump as readable as possible?

> At least, it's a feature for ordinary dumps. I'm not sure whether it
> might be a good idea for binary_upgrade dumps. That would depend on
> our policy about whether a new opclass has to have a new (and necessarily
> weird) name. If we try to make the new opclass still have the nicest
> name then it won't help at all for pg_dump to dump the old name.

Well, given the choice between not ever being able to change the default
opclass of a type, and not being able to re-use an old opclass' name,
I'd pick the latter. Especially because for default opclasses, users don't
usually have to know the name anyway.

best regards,
Florian Pflug


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: emre(at)hasegeli(dot)com, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-27 17:30:57
Message-ID: 2311.1393522257@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> writes:
> On Feb27, 2014, at 17:56 , Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> That's not a bug, it's a feature, for much the same reasons that pg_dump
>> tries to minimize explicit schema-qualification.

> I fail to see the value in this for opclasses. It's certainly nice for
> schema qualifications, because dumping one schema and restoring into a
> different schema is probably quite common.

The value in it is roughly the same as the reason we don't include a
version number when dumping CREATE EXTENSION. If you had a default
opclass in the source database, you probably want a default opclass
in the destination, even if that's not bitwise the same as what you
had before. The implication is that you want best practice for the
current PG version.

Another reason for not doing it is that unique-constraint syntax doesn't
even support it. Constraints *have to* use the default opclass.

> But how often does anyone dump
> a database and restore it into a database with a different default opclass
> for some type?

Indeed. The root of the problem here is that we've never really thought
about changing a type's default opclass before. I don't think that a
two-line change in pg_dump fixes all the issues this will bring up.

I remind you also that even if you had a 100% bulletproof argument for
changing the behavior now, it won't affect what's in existing dump files.
The only real wiggle room we have is to change the --binary-upgrade
behavior, since we can plausibly insist that people use a current pg_dump
while doing an upgrade.

regards, tom lane


From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, emre(at)hasegeli(dot)com, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-28 14:07:25
Message-ID: CAM-w4HOZknf+d8PxHQu4T_uPjZBNhznJEaqaX6-=r0GyOwRspA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 27, 2014 at 5:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Indeed. The root of the problem here is that we've never really thought
> about changing a type's default opclass before. I don't think that a
> two-line change in pg_dump fixes all the issues this will bring up.

I think we did actually do this once. When you replaced rtree with
gist as the default opclass for the rtree method. IIRC you did it by
making "rtree" a synonym for "gist" and since the opclass wasn't
specified the default gist opclass kicked in automatically.

--
greg


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-28 14:45:15
Message-ID: CAE2gYzy8TCx5pyn6xYPWnfH9QMyV0zo6EfUo27Q_Wyj3zCug+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-02-27 18:15, Florian Pflug <fgp(at)phlo(dot)org>:
>> It can be possible to update the new operator class in the new cluster
>> as not default, before restore. After the restore, pg_upgrade can upgrade
>> the btree_gist extension and reset the operator class as the default.
>> pg_upgrade suggests to re-initdb the new cluster if it fails, anyway. Do
>> you think it is a better solution? I think it will be more complicated
>> to do in pg_dump when in binary upgrade mode.
>
> Maybe I'm missing something, but isn't the gist of the problem here that
> pg_dump won't explicitly state the operator class if it's the default?

No, the problem is pg_upgrade. We can even benefit from this behavior of
pg_dump, if we would like to remove the operator classes on btree_gist.
Because of that behavior; users, who would upgrade by dump and restore,
will upgrade to the better default operator class without noticing. I am
not sure not notifying is the best think to do, though.

The problem is that pg_dump --binary-upgrade dumps objects in the extension
on the old cluster, not just the CREATE EXTENSION statement. pg_upgrade
fails to restore them, if the new operator class already exists on the new
cluster as the default. It effects all users who use the extension, even
if they are not using the inet and cidr operator classes in it.


From: Florian Pflug <fgp(at)phlo(dot)org>
To: emre(at)hasegeli(dot)com
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-28 15:30:37
Message-ID: 85B75D9F-CC17-41FC-B9FE-FC2237ABF16A@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Feb28, 2014, at 15:45 , Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:
> The problem is that pg_dump --binary-upgrade dumps objects in the extension
> on the old cluster, not just the CREATE EXTENSION statement. pg_upgrade
> fails to restore them, if the new operator class already exists on the new
> cluster as the default. It effects all users who use the extension, even
> if they are not using the inet and cidr operator classes in it.

Hm, what if we put the new opclass into an extension of its own, say inet_gist,
instead of into core?

Couldn't we then remove the inet support from the latest version of btree_gist
(the one we'd ship with 9.4)? People who don't use the old inet opclass could
then simply upgrade the extension after running pg_upgrade to get rid of the
old, broken version. People who *do* use the old inet opclass would need to
drop their indices before doing that, then install the extension inet_gist,
and finally re-create their indices.

People who do nothing would continue to use the old inet opclass.

inet_gist's SQL script could check whether btree_gist has been upgrade, and
if not fail with an error like "btree_gist must be upgraded to at least version
x.y before inet_gist can be installed". That would avoid failing with a rather
cryptic error later.

best regards,
Florian Pflug


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-02-28 16:25:00
Message-ID: CAE2gYzyfHdRPZKpbgp7Pi1B1d6Mz1NkLtFVWGWCvfeYNqxkfOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-02-28 17:30, Florian Pflug <fgp(at)phlo(dot)org>:
> Hm, what if we put the new opclass into an extension of its own, say inet_gist,
> instead of into core?

It will work but I do not think it is better than adding it in core as
not default.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Emre Hasegeli <emre(at)hasegeli(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-03-03 16:16:47
Message-ID: CA+Tgmobjbz-ciZ6zN2afWvTWDqForZvOA9cvvddG_UF0E8d-Uw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 27, 2014 at 12:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> On Feb27, 2014, at 17:56 , Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> That's not a bug, it's a feature, for much the same reasons that pg_dump
>>> tries to minimize explicit schema-qualification.
>
>> I fail to see the value in this for opclasses. It's certainly nice for
>> schema qualifications, because dumping one schema and restoring into a
>> different schema is probably quite common.
>
> The value in it is roughly the same as the reason we don't include a
> version number when dumping CREATE EXTENSION. If you had a default
> opclass in the source database, you probably want a default opclass
> in the destination, even if that's not bitwise the same as what you
> had before. The implication is that you want best practice for the
> current PG version.

I don't think that argument holds a lot of water in this instance.
The whole reason for having multiple opclasses that each one can
implement different comparison behavior. It's unlikely that you want
an upgrade to change the comparison behavior you had before; you'd be
sad if, for example, the dump/restore process failed to preserve your
existing collation settings.

But even if that were desirable in general, suppressing it for binary
upgrade dumps certainly seems more than sane.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Emre Hasegeli <emre(at)hasegeli(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST support for inet datatypes
Date: 2014-03-03 16:43:37
Message-ID: 9356.1393865017@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Feb 27, 2014 at 12:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The value in it is roughly the same as the reason we don't include a
>> version number when dumping CREATE EXTENSION. If you had a default
>> opclass in the source database, you probably want a default opclass
>> in the destination, even if that's not bitwise the same as what you
>> had before. The implication is that you want best practice for the
>> current PG version.

> I don't think that argument holds a lot of water in this instance.
> The whole reason for having multiple opclasses that each one can
> implement different comparison behavior.

Well, I doubt we'd accept a proposal to change the default opclass
of a datatype to something that had incompatible behavior --- but
we might well accept one to change it to something that had improved
behavior, such as more operators.

The first couple dozen lines in GetIndexOpClass() make for interesting
reading in this context. That approach to cross-version compatibility
probably doesn't work in the pg_upgrade universe, of course; but what I'd
like to point out here is that those kluges wouldn't have been necessary
in the first place if pg_dump had had the suppress-default-opclasses
behavior at the time. (No, it didn't always do that; cf commits
e5bbf1965 and 1929a90b6.)

regards, tom lane


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-03-08 21:40:31
Message-ID: CAE2gYzxMoOioG8LiD8=nbnQW_CQDcbw89KZxTpd9P+GbC8Jc0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fourth version of the patch attached. It is rebased to the HEAD (8879fa0).
Operator name formatting patch rebased on top of it. I will put
the selectivity estimation patch to the next commit fest.

This version of the patch does not touch to the btree_gist extension,
does not set the operator class as the default. It adds support to
the not equals operator to make the operator class compatible with
the btree_gist extension.

Attachment Content-Type Size
inet-gist-v4.patch application/octet-stream 43.7 KB
inet-gist-v4-operator-rename.patch application/octet-stream 19.6 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-04 14:01:13
Message-ID: 20140404140113.GC14419@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2014-03-08 23:40:31 +0200, Emre Hasegeli wrote:
> Fourth version of the patch attached. It is rebased to the HEAD (8879fa0).
> Operator name formatting patch rebased on top of it. I will put
> the selectivity estimation patch to the next commit fest.
>
> This version of the patch does not touch to the btree_gist extension,
> does not set the operator class as the default. It adds support to
> the not equals operator to make the operator class compatible with
> the btree_gist extension.

This patch looks like it can be applied much more realistically, but it
looks too late for 9.4. I suggest moving it to the next CF?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-04 14:50:36
Message-ID: 533EC6BC.4050205@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/04/2014 04:01 PM, Andres Freund wrote:
> This patch looks like it can be applied much more realistically, but it
> looks too late for 9.4. I suggest moving it to the next CF?

If it does not change the default operator class I do not see anything
preventing it from being applied to 9.4, as long as the committers have
the time to look at this. My review is done and I think the first patch
is ok and useful by itself.

--
Andreas


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Emre Hasegeli <emre(at)hasegeli(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-04 15:07:26
Message-ID: 20140404150726.GH14419@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-04-04 16:50:36 +0200, Andreas Karlsson wrote:
> On 04/04/2014 04:01 PM, Andres Freund wrote:
> >This patch looks like it can be applied much more realistically, but it
> >looks too late for 9.4. I suggest moving it to the next CF?
>
> If it does not change the default operator class I do not see anything
> preventing it from being applied to 9.4, as long as the committers have the
> time to look at this. My review is done and I think the first patch is ok
> and useful by itself.

Well, the patch was marked as needs review, not ready for committer. So,
if that's your position, re-check the latest version and mark it as
ready.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: emre(at)hasegeli(dot)com
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-04 15:34:09
Message-ID: 533ED0F1.3070005@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/08/2014 10:40 PM, Emre Hasegeli wrote:
> Fourth version of the patch attached. It is rebased to the HEAD (8879fa0).
> Operator name formatting patch rebased on top of it. I will put
> the selectivity estimation patch to the next commit fest.
>
> This version of the patch does not touch to the btree_gist extension,
> does not set the operator class as the default. It adds support to
> the not equals operator to make the operator class compatible with
> the btree_gist extension.

The patch looks good but it does not apply anymore against master. If
you could fix that and the duplicate OIDs I think this is ready for a
committer.

Andreas


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-05 10:30:45
Message-ID: CAE2gYzzioHNxdZXyWz0waruJuw7wKpEJ-2xPTihjd6Rv8YJF_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-04-04 18:34, Andreas Karlsson <andreas(at)proxel(dot)se>:

> The patch looks good but it does not apply anymore against master. If you
> could fix that and the duplicate OIDs I think this is ready for a committer.

Rebased version attached. I am marking it as Ready for Committer.

I will be happy to rebase the operator rename patch, too, if there is any
interest committing it.

Attachment Content-Type Size
inet-gist-v5.patch application/octet-stream 43.7 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: emre(at)hasegeli(dot)com
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-08 04:52:14
Message-ID: 12597.1396932734@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Emre Hasegeli <emre(at)hasegeli(dot)com> writes:
> 2014-04-04 18:34, Andreas Karlsson <andreas(at)proxel(dot)se>:
>> The patch looks good but it does not apply anymore against master. If you
>> could fix that and the duplicate OIDs I think this is ready for a committer.

> Rebased version attached. I am marking it as Ready for Committer.

I've been hacking on this patch over the weekend. I wasn't that thrilled
with the design of the index representation: it seemed to me that by
allowing both the minimum netmask width and the number of common address
bits to limit what you store in union values, you were giving up a lot.
I tried recoding things to separate netmask width from number of common
bits. Initially I'd thought that this would allow for smarter
tree-descent logic in the consistent function, but after several failures
I realized that that was more easily said than done. (I've not totally
lost hope about it, but it's not easy given the existing inet comparison
rules.) Nonetheless, I found that doing it like this led to substantially
faster index searches --- better than two-to-one in many cases, on random
data as per the attached test script. I believe the reason is that
decoupling netmask and common address bits makes the picksplit function
more effective at choosing good splits.

I also thought that we should at least put in dummy selectivity functions
for the now-indexable inet operators. This will allow real selectivity
functions to be patched in without forcing initdb. It's probably unlikely
that we'd back-patch your selectivity-function patch into 9.4, but why
foreclose the option?

So attached is an updated patch with these things taken care of. It's
still not quite ready to commit (in particular I've not looked at the
documentation changes yet), but if anyone wants to try to break it or
do their own performance testing, now's the time.

I also attach the script I was using for testing. It runs too long
to be a plausible candidate for the regression tests, but perhaps
someone else would like to use it anyway. There's a test that makes
sure that index searches get the same answers as the base operators
over a bunch of random data, and then a bunch of straight queries that
can be timed for performance testing. I thought that maybe I was
overemphasizing the problem of trimming common address bits to the
minimum netmask width, so the performance part of the script also tests
probes into a table that has a uniform netmask width of 32. (It still
wins to do it like this.)

Comments?

regards, tom lane

Attachment Content-Type Size
inet-gist-v6.patch text/x-diff 58.2 KB
inet-index-test.sql text/plain 5.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: emre(at)hasegeli(dot)com
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-08 19:51:54
Message-ID: 26232.1396986714@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> [ inet-gist-v6.patch ]

Committed with some additional documentation work. Thanks for
submitting this!

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Emre Hasegeli <emre(at)hasegeli(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-08 19:56:52
Message-ID: CA+TgmoYNMVbiK1tV_R5FuTpzwKGeS6ja5X88pc+LcSDyWFrtqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 8, 2014 at 3:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> [ inet-gist-v6.patch ]
>
> Committed with some additional documentation work. Thanks for
> submitting this!

NICE. I'd like to tell you how excited I am about this part:

# It also handles a new network
# operator inet && inet (overlaps, a/k/a "is supernet or subnet of"),
# which is expected to be useful in exclusion constraints.

...but I can't, because my mouth is too full of drool. Wouldn't you
really want that more for cidr than for inet, though?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Emre Hasegeli <emre(at)hasegeli(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-08 20:02:48
Message-ID: 26488.1396987368@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> NICE. I'd like to tell you how excited I am about this part:

> # It also handles a new network
> # operator inet && inet (overlaps, a/k/a "is supernet or subnet of"),
> # which is expected to be useful in exclusion constraints.

> ...but I can't, because my mouth is too full of drool. Wouldn't you
> really want that more for cidr than for inet, though?

Probably, but it works with either since they're binary-compatible.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Emre Hasegeli <emre(at)hasegeli(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-08 21:04:53
Message-ID: 20140408210453.GF2556@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Tue, Apr 8, 2014 at 3:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I wrote:
> >> [ inet-gist-v6.patch ]
> >
> > Committed with some additional documentation work. Thanks for
> > submitting this!
>
> NICE. I'd like to tell you how excited I am about this part:
>
> # It also handles a new network
> # operator inet && inet (overlaps, a/k/a "is supernet or subnet of"),
> # which is expected to be useful in exclusion constraints.
>
> ...but I can't, because my mouth is too full of drool. Wouldn't you
> really want that more for cidr than for inet, though?

You realize ip4r has had all this and more for, like, forever, right?
It's also more efficient wrt storage and generally more 'sane' regarding
operators, etc..

Just thought I'd share.. If you have a use-case, ip4r is what
everyone's been using for quite some time. Also, yes, it supports both
ipv4 and ipv6, despite the name.

Thanks,

Stephen


From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: GiST support for inet datatypes
Date: 2014-04-09 16:43:23
Message-ID: CAE2gYzz-0SO_yNWz1EyF_j3=M_RS8acz+6PzYg8dddKaY3=uJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Committed with some additional documentation work. Thanks for
> submitting this!

Thank you for committing. I had not thought of using different structure
for the index. It works faster with my test case, too.

I am sending rebased version of the consistent operator names patch
in case you would like to include it to the same release. This is what
I wrote about this change before:

> That is why I prepared it as a separate patch on top of the others. It is
> not only consistency with the range types: <@ and @> symbols used for
> containment everywhere except the inet data types, particularly on
> the geometric types, arrays; cube, hstore, intaray, ltree extensions.
> The patch does not just change the operator names, it leaves the old ones,
> adds new operators with GiST support and changes the documentation, like
> your commit ba920e1c9182eac55d5f1327ab0d29b721154277 back in 2006. I could
> not find why did you leave the inet operators unchanged on that commit,
> in the mailing list archives [1]. GiST support will be a promotion for
> users to switch to the new operators, if we make this change with it.
>
> This change will also indirectly deprecate the undocumented non-transparent
> btree index support that works sometimes for some of the subnet inclusion
> operators [2].
>
> [1] http://www.postgresql.org/message-id/14277.1157304939@sss.pgh.pa.us
>
> [2] http://www.postgresql.org/message-id/389.1042992616@sss.pgh.pa.us

Attachment Content-Type Size
inet-operator-v2.patch application/octet-stream 12.8 KB