Re: Operator class group proposal

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Operator class group proposal
Date: 2006-12-13 21:27:09
Message-ID: 11695.1166045229@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

After further thought about the mergejoinable-operators issue and some
other longstanding planner problems, I have a modest proposal to make:
we should invent the notion of "operator class groups", which identify
sets of compatible operator classes. (I'm not wedded to the name "class
group"; it seems a bit redundant. Anyone have a better idea?) We've
touched on related ideas in past threads but never really put together
a concrete proposal. Here is one.

A class group is associated with a specific index AM and can contain only
opclasses for that AM. We might for instance invent "numeric" and
"numeric_reverse" groups for btree, to contain the default opclasses and
reverse-sort opclasses for the standard arithmetic types.

We further require that any given opclass be a member of at most one class
group (this simplifies matters, and there isn't any application I can see
for one opclass being in more than one group), and that a class group
contain at most one opclass for a given datatype (ditto). Recall that
the member operators of an opclass all have the opclass' indexable
datatype as their lefthand inputs, but the righthand input datatype can
vary. These rules imply that any given operator can stand in at most one
relationship to a class group: there is only one potential class member it
could be in. It's still possible for an operator to appear in different
groups (e.g., within both normal and reverse-sort groups).

For btree opclasses, if two opclasses are in the same group, it means that
they have compatible sort ordering and equality rules. In particular, for
any "=" operators of member opclasses and values A, B, C of the
appropriate datatypes, we require
A = B and B = C implies A = C
Likewise for "<" operators:
A < B and B < C implies A < C
and likewise for <=, >, >= (I think these cases are implied if the first
two must hold).

I believe these rules are sufficient to guarantee that any "=" operator in
the opclass group can be mergejoined using "<" and ">" operators taken
from the appropriate member opclasses. Therefore we can entirely eliminate
the oprlsortop etc. fields of pg_operator and rely on looking into the
opclass catalogs to determine how to do mergejoins. This is an
improvement over the current situation (and also my proposal of yesterday)
because it will be possible to support both ascending and descending
mergejoins. In the current structure we can only associate one lsortop
("<") with any "=", but if there are reverse-sort opclasses in existence
it would be nice to be able to perform a mergejoin using ">" as well.

Furthermore, the class group structure provides a solid foundation for the
planner's transitive-equality deductions, which are currently pretty bogus
as discussed yesterday.

Lastly, the group structure fixes a nasty problem in predicate testing:
currently, we tend to fail to prove implications involving cross-datatype
operators because we can't find operators that are in the same opclass and
compare the two types we need to compare. With class groups, we can use
any two operators in the same class group to make a deduction. (This also
cleans up btree scankey analysis for cross-type situations; it has the
same problem of not knowing how to determine which key values are lesser
or greater.)

For hash opclasses, placing two opclasses in the same group means that
they have compatible hash functions. This allows us to support hashing of
cross-type comparisons. As an example, "int4 eq int8" can be hashed if
the int8 hash function guarantees that for any int8 value that could fit
in int4, it will produce the same hash code as the int4 hash function
does. (Which, in fact, our current integer hash functions are designed to
do.) Currently we cannot exploit this because we have no way to identify
which two hash functions to use. With opclass groups, we could put
cross-type equality operators into hash opclasses the same way as we do
for btree opclasses, and the rule would be to use the hash functions for
the group's opclasses matching the left and right datatypes.

I don't have any idea whether opclass groups would be useful for GiST or
GIN indexes, but maybe Oleg and Teodor can think of applications.

As far as syntax goes, I was envisioning

CREATE/DROP OPERATOR CLASS GROUP name USING index_method;

and adding an optional "GROUP name" clause to CREATE OPERATOR CLASS.
We could imagine adding a lot of additional stuff such as being able to
ALTER an existing opclass' group membership, but at least for the first
cut I see no need for anything more than this. (In particular, since we'd
probably only allow these commands to superusers, we needn't bother with a
concept of ownership for class groups.)

In the system catalogs, we'd need to add a "pg_opclassgroup" catalog
with OIDs and columns for name, namespace, and index AM's OID; and
pg_opclass would get a new column for the OID of the class's group,
or 0 if not in any group (which really effectively means it's a group
of one opclass).

BTW, I'm also pretty strongly tempted to denormalize the catalogs a bit
by adding group OID and index AM OID to individual pg_amop entries,
thereby reducing lookup time when trying to identify an operator's
properties. Currently we need an extra probe into pg_opclass to see
which index AM the opclass belongs to, and with this proposal we'd also
be looking at the group OID quite a lot.

Comments?

regards, tom lane


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-13 21:46:46
Message-ID: 20061213214646.GE15546@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 13, 2006 at 04:27:09PM -0500, Tom Lane wrote:
> After further thought about the mergejoinable-operators issue and some
> other longstanding planner problems, I have a modest proposal to make:
> we should invent the notion of "operator class groups", which identify
> sets of compatible operator classes. (I'm not wedded to the name "class
> group"; it seems a bit redundant. Anyone have a better idea?) We've
> touched on related ideas in past threads but never really put together
> a concrete proposal. Here is one.

<snip>

I think it's a good idea, though I would point out that in the examples
given it's the underlying types that are compatable, not the classes.
But I'm unsure if there's a way to use that. These groups seem a
reasonable addition to the existing system.

Other names I can think of:

- type class
- type group
- compatability class
- operator class set

None of which sound any good :(

It's good is that this provides more information about the underlying
types to the system, which improves the possibility of optimisation
(and correctness).

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-13 22:21:54
Message-ID: 12397.1166048514@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Wed, Dec 13, 2006 at 04:27:09PM -0500, Tom Lane wrote:
>> we should invent the notion of "operator class groups", which identify
>> sets of compatible operator classes.

> I think it's a good idea, though I would point out that in the examples
> given it's the underlying types that are compatable, not the classes.

Well, I didn't try very hard to provide a complete set of examples,
but here is one addressing that point: the string types have one set of
opclasses involving < = > and one set involving ~<~ ~=~ ~>~ (the
"pattern_ops" opclasses). These would need to be distinct class groups
since in fact they have incompatible semantics. Reverse-sort opclasses
would be another example.

> Other names I can think of:
> - type class
> - type group
> - compatability class
> - operator class set
> None of which sound any good :(

Yeah, I'm drawing a blank on good names too. In the absence of any
better idea I'm inclined to re-use some word that's already a keyword,
rather than invent a new one. (GROUP is already a fully reserved word,
I think because it's used in GROUP BY, so it wouldn't pose any parsing
problems.)

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-13 22:42:37
Message-ID: 12615.1166049757@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> We further require that any given opclass be a member of at most one class
> group (this simplifies matters, and there isn't any application I can see
> for one opclass being in more than one group), and that a class group
> contain at most one opclass for a given datatype (ditto).

BTW, I forgot to mention one of the motivations for that last
restriction: I'm thinking it would be convenient to allow index
declarations to accept either an opclass name or a class group name.
Thus you could say "pattern_ops" instead of being specific about
"varchar_pattern_ops" or "text_pattern_ops". Not sure whether
there's a need to worry about name collisions ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-14 00:28:20
Message-ID: 13537.1166056100@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ replying to myself again, how tacky :-( ]

I wrote:
> BTW, I forgot to mention one of the motivations for that last
> restriction: I'm thinking it would be convenient to allow index
> declarations to accept either an opclass name or a class group name.
> Thus you could say "pattern_ops" instead of being specific about
> "varchar_pattern_ops" or "text_pattern_ops".

Some more thought about that yielded what might or might not be a good
idea: why not just collapse all the members of a "class group" into one
opclass? In other words, not make a distinction between groups and
individual opclasses? So for example, int2_ops int4_ops and int8_ops
would all collapse into one big opclass.

In SQL terms this would mean allowing several "FOR TYPE foo" clauses in
a single CREATE OPERATOR CLASS command, and we'd likely have to give up
the shorthand where you can omit argument types for operators (or at
least, it'd only work in the case where you say FOR TYPE just once).

At the system catalog level this'd require some rearrangement of
pg_opclass, and the unique keys for pg_amop and pg_amproc would probably
need another column, but it certainly seems doable.

One rather large objection is that existing pg_dump files may contain
references to specific opclasses such as "varchar_pattern_ops". How
could we deal with converting that to a world where there's only one
"pattern_ops" opclass? In the past we've resorted to hard-wired
substitutions in gram.y, but that got too bletcherous to bear quite
some time ago. I'm thinking about allowing opclasses to have "aliases"
recorded in the system catalogs as a more general solution to the
problem; it'd let us get rid of those old hacks in gram.y, too.

The other thing that'd have to be dealt with is that for GIST and GIN,
the STORAGE clause needs to be associated with a specific indexable
type. I'm thinking the syntax might be like

CREATE OPERATOR CLASS name USING index_method AS
{ [ DEFAULT ] FOR TYPE data_type [ STORAGE storage_type ] [ ALIAS name ]
| OPERATOR strategy_number operator_name [ (op_type, op_type) ] [ RECHECK ]
| FUNCTION support_number funcname (argument_type [, ...])
} [, ... ]

plus a backwards-compatible variant syntax where there's just one FOR TYPE
clause before the USING part.

With either this or the original proposal, or for that matter our
existing releases, there's a bit of an issue with needing to modify
existing opclasses. Consider the situation of someone adding say
"tinyint" as a contrib module, and wanting it to interact reasonably
with the existing datatypes. He has to create some cross-type
operators, which is no big deal, but then he needs to plug them into
already-existing opclasses, and there's no provision for doing that.
I had been thinking that the scheme of adding opclasses to a
pre-existing group would fix that, but it doesn't really: you still need
to add more cross-type operators to the existing group member opclasses.

So this leads me to think that we need to allow
ALTER OPERATOR CLASS foo ADD [ TYPE/OPERATOR/FUNCTION ]
ALTER OPERATOR CLASS foo DROP [ TYPE/OPERATOR/FUNCTION ]
and the process for the would-be tinyint implementor is to add on
entries to an existing "numeric_ops" operator class, instead of
starting his own class.

Among other things this would require finer granularity in the
pg_depends entries, since dropping a type would involve deleting
elements of an opclass but not necessarily the whole thing.
(That may well mean that we still need an object equivalent to the
current per-datatype opclass, as something to be named in pg_depends;
in which case the above restructuring becomes largely syntactic sugar.)

It may be that the separate-GROUP approach is about as clean as this
once all the dust settles, but we'd still need to do something about
allowing additions to an existing operator class.

Thoughts?

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-14 10:32:21
Message-ID: 45812835.1010109@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> Some more thought about that yielded what might or might not be a good
> idea: why not just collapse all the members of a "class group" into one
> opclass? In other words, not make a distinction between groups and
> individual opclasses? So for example, int2_ops int4_ops and int8_ops
> would all collapse into one big opclass.

IMHO, this is better than groups, but in any case exact opclass will be defined
by pair: (opgroup, opclass) in first case or (opclass, alias) in second. Pair is
needed to find support function(s). For instance, contrib/intarray makes two
GiST opclasses with different storage types.

> I don't have any idea whether opclass groups would be useful for GiST or
> GIN indexes, but maybe Oleg and Teodor can think of applications.

I'm afraid it isn't useful for GiST/GIN - strategies are not defined at all and
planner can't known about semantic meaning of operation/strategies.

Examples of opclasses which can be joined in one group:
1) geometry opclasses
2) gist__int_ops and gist__intbig_ops for GiST from contrib/intarray (BTW,
gin__int_ops for GIN supports the same operations and the same strategies
as gist__int*_ops)
3) some of opclasses from btree_gist.
4) builtin GIN oplasses for arrays and it possible to add cross-type operations
for arrays ( ie int4[] && int8[] )

BTW, unpleasant behaviour of postgres:
postgres=# CREATE TABLE foo (a varchar);
postgres=# CREATE INDEX fooidx ON foo (a varchar_pattern_ops);
postgres=# ALTER OPERATOR CLASS varchar_pattern_ops USING btree RENAME TO bar;

% pg_dump postgres | psql wow
...
ERROR: operator class "bar" does not exist for access method "btree"
...

May be, it will be useful to forbids altering/renaming of builtin objects...

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Operator class group proposal
Date: 2006-12-14 11:28:41
Message-ID: E1539E0ED7043848906A8FF995BDA5790198F001@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> CREATE OPERATOR CLASS name USING index_method AS
> { [ DEFAULT ] FOR TYPE data_type [ STORAGE storage_type ] [
> ALIAS name ]
> | OPERATOR strategy_number operator_name [ (op_type,
> op_type) ] [ RECHECK ]
> | FUNCTION support_number funcname (argument_type [, ...])
> } [, ... ]
>
> plus a backwards-compatible variant syntax where there's just
> one FOR TYPE
> clause before the USING part.
>
> With either this or the original proposal, or for that matter our
> existing releases, there's a bit of an issue with needing to modify
> existing opclasses. Consider the situation of someone adding say
> "tinyint" as a contrib module, and wanting it to interact reasonably
> with the existing datatypes. He has to create some cross-type
> operators, which is no big deal, but then he needs to plug them into
> already-existing opclasses, and there's no provision for doing that.
> I had been thinking that the scheme of adding opclasses to a
> pre-existing group would fix that, but it doesn't really: you
> still need
> to add more cross-type operators to the existing group member
> opclasses.
>
> So this leads me to think that we need to allow
> ALTER OPERATOR CLASS foo ADD [ TYPE/OPERATOR/FUNCTION ]
> ALTER OPERATOR CLASS foo DROP [ TYPE/OPERATOR/FUNCTION ]
> and the process for the would-be tinyint implementor is to add on
> entries to an existing "numeric_ops" operator class, instead of
> starting his own class.

I think it would be easier to understand if we do not merge the
opclasses
for different types into one statement.

We would add type to the key and thus have:
CREATE OPERATOR CLASS name USING index_method FOR TYPE data_type ...
DROP OPERATOR CLASS name USING index_method FOR TYPE data_type ...

I would then name the opclasses btree_ops (for the default btree
opclass)
with no reference to type, but only to functionality. (maybe we need to
diff text and number)
Classes with the same name and index_method would implicitly be a class
group.

I generally like the idea to simplify this area.

As an aside, Informix decided to name compatible operator functions
identically
and define an opclass as those names:
opclassname owner amid opclassid ops support
btree_ops informix 1 1
lessthan;lessthanorequal;equal;greaterthanorequal;greaterthan; compare;
rtree_ops informix 2 2
overlap;equal;contains;within; union;size;inter;
This is also simple to use and understand.

Andreas


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-14 13:09:49
Message-ID: 20061214130949.GC24370@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 13, 2006 at 07:28:20PM -0500, Tom Lane wrote:
> Some more thought about that yielded what might or might not be a good
> idea: why not just collapse all the members of a "class group" into one
> opclass? In other words, not make a distinction between groups and
> individual opclasses? So for example, int2_ops int4_ops and int8_ops
> would all collapse into one big opclass.

I think it may be useful to maintain the distinction between groups and
classes for users, because at that level the whole concept is easier to
understand. Dropping and recreating operator classes is easier to
handle than playing strange tricks with ALTER OPERATOR CLASS GROUP. And
probably easier to get right/harder to screw up.

How the backend implements it may be easier as one single large
opclass. Essentially the operater class table becomes merely a
placeholder for the name (maybe also aliases?), which can still be used
in index declarations, but is never used by the backend otherwise.

That would preserve backward compatability while still allowing all the
benefits.

BTW, in some aspects this resembles some of the changes needed for SQL
collate support. There also I tried to merge operator classes into
something larger. However, there the point was that the group becomes a
common pathkey identifier for all the underlying types, which is not in
your proposal.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-14 15:36:05
Message-ID: 20687.1166110565@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at> writes:
> I think it would be easier to understand if we do not merge the
> opclasses for different types into one statement.

Agreed, huge CREATE OPERATOR CLASS commands would be no fun, which
is one reason for my recommendation to improve ALTER OPERATOR CLASS.
I think that in practice people would use ALTER to add one type at
a time to an opclass.

> Classes with the same name and
> index_method would implicitly be a class group.

[ itch... ] I've never cared for the idea that semantics should depend
fundamentally on the mere name of something. I think we want class
groups to be real objects in one form or another, not chance
associations. As a specific objection, under this rule it would never
become possible to allow unprivileged users to create opclasses, because
they could break the behavior of someone else's opclass just by creating
another one of the same name with not-really-compatible behavior.

> As an aside, Informix decided to name compatible operator functions
> identically and define an opclass as those names:

Interesting. Probably too much water under the bridge now for us to
consider forcing function/operator renames, though.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-14 15:58:56
Message-ID: 21036.1166111936@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> I think it may be useful to maintain the distinction between groups and
> classes for users, because at that level the whole concept is easier to
> understand. Dropping and recreating operator classes is easier to
> handle than playing strange tricks with ALTER OPERATOR CLASS GROUP. And
> probably easier to get right/harder to screw up.

I'm not sure. The problem that I'm seeing is that currently, cross-type
comparisons go into the opclass associated with their left-hand argument
type. Therefore, if say you want to add "tinyint" to an opclass group,
you not only need to add an opclass for tinyint (containing tinyint vs
tinyint as well as tinyint vs other-type operators), but you also need
to add other-type vs tinyint operators to the *other* members of the
group. So the notion of the classes being separate objects seems a bit
artificial to me. I think that "if I want to make tinyint part of the
numeric_ops index opclass, I just add the type and all these operators to
that opclass" is at least as clear.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Operator class group proposal
Date: 2006-12-14 17:00:00
Message-ID: 87mz5qifan.fsf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> I'm not sure. The problem that I'm seeing is that currently, cross-type
> comparisons go into the opclass associated with their left-hand argument
> type. Therefore, if say you want to add "tinyint" to an opclass group,
> you not only need to add an opclass for tinyint (containing tinyint vs
> tinyint as well as tinyint vs other-type operators), but you also need
> to add other-type vs tinyint operators to the *other* members of the
> group. So the notion of the classes being separate objects seems a bit
> artificial to me. I think that "if I want to make tinyint part of the
> numeric_ops index opclass, I just add the type and all these operators to
> that opclass" is at least as clear.

Hm, would we still need all the cross-data-type btree operators? Could the
planner be taught that as long as the two types belong to the same
opclassclass that it's ok to use a btree operator that requires a cast first
before being used?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-14 17:14:42
Message-ID: 21930.1166116482@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Hm, would we still need all the cross-data-type btree operators?

Yes, I think so; remember all the pain we had when we didn't have
indexable cross-type operators and spent years looking for a non-broken
way of introducing casts to solve the problem. Those were fundamental
semantic problems and AFAICS we'd be right back into that if we take
cross-type operators out of the opclasses again.

Basically what I'm on about here is that the way we shoehorned
cross-type operators into opclasses was a kluge. Which was not a bad
idea when we weren't yet sure it would solve the problem. But now it's
looking better and better to take the next step and allow opclasses to
support multiple types explicitly.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-15 01:31:59
Message-ID: 29947.1166146319@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> I think it may be useful to maintain the distinction between groups and
> classes for users, because at that level the whole concept is easier to
> understand. Dropping and recreating operator classes is easier to
> handle than playing strange tricks with ALTER OPERATOR CLASS GROUP. And
> probably easier to get right/harder to screw up.

After further thought I'm beginning to see the merits of this
suggestion. The consideration that drives it is this: what is an index
going to depend on in pg_depends? If we have a massive operator class
handling multiple types, we certainly don't want to have to drop every
index using that opclass if we want to drop just one of the datatypes.
(This would be especially true for things like "numeric_ops" which would
support both user indexes and system catalogs ... after adding "tinyint"
to that, what if you want to back it out again?) So it seems like in a
multi-type-operator-class system there'd still have to be an explicit
notion of an opclass-and-indexed-datatype combination --- which we might
as well keep calling an operator class for old times' sake. So we're
back to calling the larger structure an "operator class group" unless
someone's got a better idea.

However, thinking about the effects of DROP leads me to suggest that an
operator class in this sense should ideally contain only the "core"
stuff that you must have to support an index on a column of its
datatype. In particular, cross-type comparisons are generally not
"core", and could be dropped without breaking indexes that use the
class. So I suggest that cross-type comparison operators could be
declared to be members of an operator class group but not directly
members of any one of its operator classes.

In other words the object hierarchy looks like

Operator class group

Operator class

"core" ops and support functions for class

Operator class

"core" ops and support functions for class

...

Cross-type ops and support functions

The dependencies would be set up so that dropping a "core" operator
forces dropping its operator class and any dependent indexes, same
as today. But dropping a cross-type op doesn't force anything,
and an operator class group per se never goes away unless explicitly
destroyed. (It could validly exist with no contained objects, in
fact that would be its initial state.)

I think the CREATE OPERATOR CLASS GROUP command becomes just

CREATE OPERATOR CLASS GROUP name USING am;

and then you add opclasses to it by creating them with a GROUP
option used in CREATE OPERATOR CLASS, and/or add individual cross-type
ops to it with an ALTER command that would look like

ALTER OPERATOR CLASS GROUP name ADD { OPERATOR ... | FUNCTION ... } ...

As now, there's no ALTER OPERATOR CLASS, you can only drop the whole
thing (with any dependent indexes) and re-create it. This seems a good
idea because we lack any infrastructure for rebuilding dependent indexes
after an opclass redefinition. That doesn't seem like a problem for
fooling with individual cross-type ops in a class group though.

For backwards compatibility, we'd still accept cross-type ops listed in
CREATE OPERATOR CLASS, but this would be deprecated for the reason that
dropping such an op would force dropping the class (as indeed it does
today) and obviously you'd rather not have indexes dependent on non-core
operators. Note that which operators are "core" to an opclass is not
hardwired into this structure, it's the class author's choice.

> How the backend implements it may be easier as one single large
> opclass. Essentially the operater class table becomes merely a
> placeholder for the name (maybe also aliases?), which can still be used
> in index declarations, but is never used by the backend otherwise.

To represent this cleanly in the catalogs, we'd probably want nominally
stand-alone opclasses to belong to implicitly created single-member
groups, because the individual entries in pg_amop and pg_amproc are
always going to be shown as belonging to groups --- the membership
hierarchy above is only interesting in pg_depends, not for index
operations. I think that's about the same thing you're saying here.

Comments?

regards, tom lane


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-15 10:13:45
Message-ID: 20061215101345.GB958@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 14, 2006 at 08:31:59PM -0500, Tom Lane wrote:
> > How the backend implements it may be easier as one single large
> > opclass. Essentially the operater class table becomes merely a
> > placeholder for the name (maybe also aliases?), which can still be used
> > in index declarations, but is never used by the backend otherwise.
>
> To represent this cleanly in the catalogs, we'd probably want nominally
> stand-alone opclasses to belong to implicitly created single-member
> groups, because the individual entries in pg_amop and pg_amproc are
> always going to be shown as belonging to groups --- the membership
> hierarchy above is only interesting in pg_depends, not for index
> operations. I think that's about the same thing you're saying here.

I think we're on the same page. I thought of another motivation also:
protections/permissions. We don't currently allow people to mess with
definitions needed by system catalogs, so you don't want to allow users
to mess with the btree(int4) class, but you want to allow them to
modify the group as a whole to add new things and remove things they've
added.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-15 19:38:56
Message-ID: 20879.1166211536@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> I think we're on the same page. I thought of another motivation also:
> protections/permissions. We don't currently allow people to mess with
> definitions needed by system catalogs, so you don't want to allow users
> to mess with the btree(int4) class, but you want to allow them to
> modify the group as a whole to add new things and remove things they've
> added.

Yeah. We're a long way from letting non-superusers manipulate opclass
definitions, but it'd be good if the structure were not one that would
forbid using permissions in future.

I'm still not real happy with the phrase "operator class group"; it
seems unwieldy, and it's not even technically accurate in this design,
since the structure can contain operators directly not only operator
classes. What do people think about calling it just an "operator
group"? That's sort of reasonable since really what we're doing is
saying that all these operators are compatible. (A lot of the things
the planner wants to do with this info have nothing to do with indexes
anyway.)

The main objection I can think of is that a novice seeing the terms
"operator class" and "operator group" isn't going to have any context
to know which is which, but I'm not sure that we can devise any terms
that would help much.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Operator class group proposal
Date: 2006-12-15 23:23:15
Message-ID: 87ejr0iw0s.fsf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> The main objection I can think of is that a novice seeing the terms
> "operator class" and "operator group" isn't going to have any context
> to know which is which, but I'm not sure that we can devise any terms
> that would help much.

Operator Superclass ?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-15 23:44:10
Message-ID: 22644.1166226250@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Operator Superclass ?

Yeah, I thought about that too, but I don't like it much ... can't
entirely put my finger on why not, except that class/superclass usually
implies that the objects you're talking about are all the same kind of
thing, whereas what we have here is a very definite distinction between
two kinds of objects. On the same grounds, I'd object to calling
schemas "directories" or "folders", unless they could be nested.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Operator class group proposal
Date: 2006-12-15 23:57:22
Message-ID: 87ac1oiufx.fsf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> Operator Superclass ?
>
> Yeah, I thought about that too, but I don't like it much ... can't
> entirely put my finger on why not, except that class/superclass usually
> implies that the objects you're talking about are all the same kind of
> thing, whereas what we have here is a very definite distinction between
> two kinds of objects.

I'm kind of glad to hear that since I had similar discomfort with it myself. I
thought it might be the least worst option though so I figured I would say it
out loud.

Perhaps something like

Operator Class
and
Data Type Class

Data type classes happens to involve operator classes but it sounds like
you're looking for them to specify other behaviours of how data types
inter-relate than just their operator classes anyways.

> On the same grounds, I'd object to calling schemas "directories" or
> "folders", unless they could be nested.

(Actually that's a bit of an odd case since real-world folders aren't
generally nestable and actually many early operating systems didn't allow them
to be nested either. We're just so used to the computer metaphor that we
forget what it's originally supposed to represent.)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 00:04:17
Message-ID: 22804.1166227457@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Perhaps something like

> Operator Class
> and
> Data Type Class

> Data type classes happens to involve operator classes but it sounds like
> you're looking for them to specify other behaviours of how data types
> inter-relate than just their operator classes anyways.

Well, actually I think of this more as something that tells me how a
bunch of *operators* relate. As an example, "=" and friends over the
string datatypes will be one group, and "~=~" and friends will be
another group over the very same types. So to me "data type class"
would really miss the point.

The alternatives I'd been thinking about were "operator set" and
"operator collection", but I don't really see any advantage over
"operator group" for either ...

>> On the same grounds, I'd object to calling schemas "directories" or
>> "folders", unless they could be nested.

> (Actually that's a bit of an odd case since real-world folders aren't
> generally nestable

True, ya got me ;-)

regards, tom lane


From: tomas(at)tuxteam(dot)de
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 05:39:37
Message-ID: 20061216053936.GC27519@www.trapp.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, Dec 15, 2006 at 06:44:10PM -0500, Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > Operator Superclass ?
>
> Yeah, I thought about that too, but I don't like it much ... can't
> entirely put my finger on why not [...]

I think I can ;-)

"Operator class group", unwieldy as it is, conveys the meaning that we
are talking about _sets of operator classes_. The nicer terms I have
seen all lose a bit of that ring to me.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFFg4aYBcgs9XrR2kYRAp1mAJ9+ISc7Ex1qCBV2dKgNJSUAOSmR/ACeKt6O
KKp1DI9OkSrlO4VpJbb2xFM=
=KDb3
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: tomas(at)tuxteam(dot)de
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 16:14:02
Message-ID: 29449.1166285642@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

tomas(at)tuxteam(dot)de writes:
> "Operator class group", unwieldy as it is, conveys the meaning that we
> are talking about _sets of operator classes_. The nicer terms I have
> seen all lose a bit of that ring to me.

The thing is that in the proposal as it currently stands, we're *not*
talking about sets of operator classes, because a group can contain
"free standing" operators as well. So the apparent technical accuracy
is really a bit misleading.

As I'm currently thinking about it, a group is a collection of
compatible operators, and the fact that it has some of those operators
in common with an opclass is almost incidental --- not from the index
AM's point of view maybe, but there will be large chunks of the system
that work with groups without ever thinking about opclasses.

Another thing that struck me this morning is that we make very heavy
use of the shorthand "opclass" in both the code and the docs. If we
call these things "operator groups" then we can use the parallel
construction "opgroup", which seems to read OK to me. But if they're
"operator class groups" we'll be stuck with something like
"opclassgroup" ... ugh.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 16:23:13
Message-ID: 20061216162313.GB14237@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > Perhaps something like
>
> > Operator Class
> > and
> > Data Type Class
>
> > Data type classes happens to involve operator classes but it sounds like
> > you're looking for them to specify other behaviours of how data types
> > inter-relate than just their operator classes anyways.
>
> Well, actually I think of this more as something that tells me how a
> bunch of *operators* relate. As an example, "=" and friends over the
> string datatypes will be one group, and "~=~" and friends will be
> another group over the very same types. So to me "data type class"
> would really miss the point.
>
> The alternatives I'd been thinking about were "operator set" and
> "operator collection", but I don't really see any advantage over
> "operator group" for either ...

I guess the problem is finding a term that makes it very obvious that is
"bigger than a class", but at the same time conveying the fact that you
can have lone operators in there, as well as whole classes.

So "collection", "group" and "set" fail because they can be easily seen
as being in the same category as a "class": a bunch of things. How
would anybody tell which one is the bigger one?

The "superclass" term passes that test, but fails because it is somehow
expected that you can't put a class inside a superclass.

How about "hyperclass" then? :-) Maybe "school" :-D

Maybe somebody should ask a philologist about a greek or latin prefix
with that meaning.


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 16:44:44
Message-ID: 2890.24.211.165.134.1166287484.squirrel@www.dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera wrote:

> The "superclass" term passes that test, but fails because it is somehow
> expected that you can't put a class inside a superclass.
>
> How about "hyperclass" then? :-) Maybe "school" :-D
>
> Maybe somebody should ask a philologist about a greek or latin prefix
> with that meaning.
>

I did wonder if "operator metaclass" might fit the bill ... finding a term
with meaning that is both adequate and accurate isn't easy.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 16:48:48
Message-ID: 420.1166287728@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> How about "hyperclass" then? :-) Maybe "school" :-D

So I finally went and looked at a thesaurus, and came up with some
possibilities:

category variety family crowd cluster flock pack troop gang clique
circle cabal association society union ensemble

"Operator family" seems like it might be a good choice.
One small problem with it is that in biology, "family" is a smaller
group than "class" ... but if we want to follow the biologists,
the only bigger groups than classes are "phylums" and "kingdoms",
neither of which seem like good words to use.

"Operator cluster" isn't too bad either, except that it might create
confusion with the other index-related meaning of "cluster".

regards, tom lane


From: tomas(at)tuxteam(dot)de
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 17:18:17
Message-ID: 20061216171817.GA29699@www.trapp.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, Dec 16, 2006 at 11:14:02AM -0500, Tom Lane wrote:
> tomas(at)tuxteam(dot)de writes:
> > "Operator class group", unwieldy as it is, conveys the meaning that we
> > are talking about _sets of operator classes_. The nicer terms I have
> > seen all lose a bit of that ring to me.
>
> The thing is that in the proposal as it currently stands, we're *not*
> talking about sets of operator classes, because a group can contain
> "free standing" operators as well. So the apparent technical accuracy
> is really a bit misleading.

Hm. Singleton classes?

> As I'm currently thinking about it, a group is a collection of
> compatible operators, and the fact that it has some of those operators
> in common with an opclass is almost incidental --- not from the index
> AM's point of view maybe, but there will be large chunks of the system
> that work with groups without ever thinking about opclasses.

Can you imagine a class straddling two groups?

[...]
> "opclassgroup" ... ugh.

Indeed.

regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFFhCpZBcgs9XrR2kYRArzKAJ46mOwDkfW+bIC+HEKBROCYwHbk7wCfQCu+
yc0pj2yMXf+HUdJiVwq3Q/o=
=gA/y
-----END PGP SIGNATURE-----


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 17:28:26
Message-ID: 20061216172826.GB2244@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 16, 2006 at 11:48:48AM -0500, Tom Lane wrote:
> So I finally went and looked at a thesaurus, and came up with some
> possibilities:
>
> category variety family crowd cluster flock pack troop gang clique
> circle cabal association society union ensemble
>
> "Operator family" seems like it might be a good choice.

I like this one more than opcluster. opfamily has a good ring to it
also.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 17:32:10
Message-ID: 1166290330.31897.0.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2006-12-16 at 11:48 -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > How about "hyperclass" then? :-) Maybe "school" :-D
>
> So I finally went and looked at a thesaurus, and came up with some
> possibilities:
>
> category variety family crowd cluster flock pack troop gang clique
> circle cabal association society union ensemble
>
> "Operator family" seems like it might be a good choice.
> One small problem with it is that in biology, "family" is a smaller
> group than "class" ... but if we want to follow the biologists,
> the only bigger groups than classes are "phylums" and "kingdoms",
> neither of which seem like good words to use.
>
> "Operator cluster" isn't too bad either, except that it might create
> confusion with the other index-related meaning of "cluster".

group?

Joshua D. Drake

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>
--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 19:43:05
Message-ID: 20061216194305.GB26451@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Martijn van Oosterhout wrote:
> On Sat, Dec 16, 2006 at 11:48:48AM -0500, Tom Lane wrote:
> > So I finally went and looked at a thesaurus, and came up with some
> > possibilities:
> >
> > category variety family crowd cluster flock pack troop gang clique
> > circle cabal association society union ensemble
> >
> > "Operator family" seems like it might be a good choice.
>
> I like this one more than opcluster. opfamily has a good ring to it
> also.

+1 on opfamily.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 19:53:55
Message-ID: 7451.1166298835@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Martijn van Oosterhout wrote:
>> On Sat, Dec 16, 2006 at 11:48:48AM -0500, Tom Lane wrote:
>>> "Operator family" seems like it might be a good choice.
>>
>> I like this one more than opcluster. opfamily has a good ring to it also.

> +1 on opfamily.

Done, unless anyone really doesn't like it.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-16 20:14:14
Message-ID: 7598.1166300054@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>> "Operator family" seems like it might be a good choice.
>>> I like this one more than opcluster. opfamily has a good ring to it also.
>> +1 on opfamily.
> Done, unless anyone really doesn't like it.

So I started revising my notes on what to change the catalog structure
to, and immediately came up with

pg_opfamily

OID
opfamid access method OID
opfname family name
opfnamespace family namespace
opfowner owner's OID

But "opfamid" is a really unfortunate name, because it looks way too
much like it's supposed to be the ID of the family itself, rather than
of the index AM it's associated with.

I'm thinking of changing it to "opfmethod", and similarly rename
pg_opclass.opcamid to opcmethod. That might break a few bits of code
that needn't be broken, but on the whole this patch will whack the
opclass-related catalogs around enough that most code looking at them
will need fixes anyway.

Comments?

regards, tom lane


From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Operator class group proposal
Date: 2006-12-18 10:32:53
Message-ID: E1539E0ED7043848906A8FF995BDA5790198F2D4@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> pg_opfamily
>
> OID
> opfamid access method OID
> opfname family name
> opfnamespace family namespace
> opfowner owner's OID
>
> But "opfamid" is a really unfortunate name, because it looks way too
> much like it's supposed to be the ID of the family itself, rather than
> of the index AM it's associated with.

I think by convention it is clear that we are talking about opf -- amid,
opfmethod imho removes the connex to amid so I wouldn't do it.

Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-18 16:49:30
Message-ID: 9671.1166460570@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at> writes:
>> But "opfamid" is a really unfortunate name, because it looks way too
>> much like it's supposed to be the ID of the family itself, rather than
>> of the index AM it's associated with.

> I think by convention it is clear that we are talking about opf -- amid,
> opfmethod imho removes the connex to amid so I wouldn't do it.

I think that's fairly clear when you look at the catalog definition by
itself, but when you come across the name in running code it's likely to
be confusing. And it's not like "amid" is widely used to refer to index
access method OIDs.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Operator class group proposal
Date: 2006-12-22 15:48:23
Message-ID: 878xh00w54.fsf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> A class group is associated with a specific index AM and can contain only
> opclasses for that AM. We might for instance invent "numeric" and
> "numeric_reverse" groups for btree, to contain the default opclasses and
> reverse-sort opclasses for the standard arithmetic types.

I thought that would just be formalizing what we currently have. But I just
discovered to my surprise tat it's not. I don't see any cross-data-type
operators between any of the integer types and numeric, or between any of the
floating point types and numeric, or between any of the integers and the
floating point types.

So does that mean we currently have three separate arithmetic "operator class
groups" such as they currently exist and you can't currently do merge joins
between some combinations of these arithmetic types?

What puzzles me is that we used to have problems with bigint columns where
people just did "WHERE bigint_col = 1". But my testing shows similar constructs
between integer and numeric or other types with no cross-data-type comparator
don't lead to similar problems. The system happily introduces casts now and
uses the btree operator. So I must have missed another change that was also
relevant to this in addition to the cross datatype operators.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-22 18:06:57
Message-ID: 26149.1166810817@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> I thought that would just be formalizing what we currently have. But I just
> discovered to my surprise tat it's not. I don't see any cross-data-type
> operators between any of the integer types and numeric, or between any of the
> floating point types and numeric, or between any of the integers and the
> floating point types.

Correct.

> So does that mean we currently have three separate arithmetic "operator class
> groups" such as they currently exist and you can't currently do merge joins
> between some combinations of these arithmetic types?

No, what you'll get is something like

int4var::float8 float8eq float8var

which is perfectly mergejoinable ... however, it's not clear that the
planner will make very good estimates about the value of the cast
expression. I'm not sure if it's worth introducing a pile more
crosstype operators to change that situation --- improving
the selectivity functions to handle casts better might be a wiser
approach.

regards, tom lane


From: Gregory Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-22 21:21:25
Message-ID: 87tzznd3u2.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> No, what you'll get is something like
>
> int4var::float8 float8eq float8var
>
> which is perfectly mergejoinable ... however, it's not clear that the
> planner will make very good estimates about the value of the cast
> expression. I'm not sure if it's worth introducing a pile more
> crosstype operators to change that situation --- improving
> the selectivity functions to handle casts better might be a wiser
> approach.

So the only reason we needed the cross-data-type operators was to get better
estimates? I thought without them you couldn't get an index-based plan at all.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2006-12-23 03:42:23
Message-ID: 6347.1166845343@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <gsstark(at)mit(dot)edu> writes:
> So the only reason we needed the cross-data-type operators was to get better
> estimates? I thought without them you couldn't get an index-based plan at all.

Oh, hm, there is that --- you won't get a nestloop with inner indexscan
unless the join expression uses the unmodified inner variable (unless
you do something weird like provide an index on the casted value...)

However, we need to be pretty wary about widening the families unless
we're sure that the semantics are right. In particular, I think that
numeric-vs-float crosstype operators would violate the transitive law:
you could have values for which A=B and B=C but A!=C. This is because
we smash numerics to float for comparison, and so there are distinct
numeric values that can compare equal to the same float. bigint against
float same problem. It'd be OK to integrate integers and numeric into
one class, but how much real value is there in that?

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <gsstark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Operator class group proposal
Date: 2007-01-03 17:47:30
Message-ID: 87d55wvw7h.fsf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hope you had a nice holiday.

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <gsstark(at)mit(dot)edu> writes:
>> So the only reason we needed the cross-data-type operators was to get better
>> estimates? I thought without them you couldn't get an index-based plan at all.
>
> Oh, hm, there is that --- you won't get a nestloop with inner indexscan
> unless the join expression uses the unmodified inner variable (unless
> you do something weird like provide an index on the casted value...)

Hm, so you're saying this case:

postgres=# create table a (a numeric);
CREATE TABLE
postgres=# create table b (b integer);
CREATE TABLE
postgres=# create index aa on a(a);
CREATE INDEX
postgres=# create index bb on b(b);
CREATE INDEX
postgres=# set enable_seqscan = off;
SET

So this case works (though I find the actual plan chosen to be a bit
surprising here):

postgres=# explain select * from a where a = 1;
QUERY PLAN
-----------------------------------------------------------------
Bitmap Heap Scan on a (cost=4.05..13.51 rows=6 width=32)
Recheck Cond: (a = 1::numeric)
-> Bitmap Index Scan on aa (cost=0.00..4.05 rows=6 width=0)
Index Cond: (a = 1::numeric)
(4 rows)

But this case doesn't work:

postgres=# explain select * from a join b on (a=b);
QUERY PLAN
--------------------------------------------------------------------------------
Merge Join (cost=100000149.78..100000448.70 rows=13161 width=36)
Merge Cond: (a.a = "inner"."?column2?")
-> Index Scan using aa on a (cost=0.00..62.45 rows=1230 width=32)
-> Sort (cost=100000149.78..100000155.13 rows=2140 width=4)
Sort Key: (b.b)::numeric
-> Seq Scan on b (cost=100000000.00..100000031.40 rows=2140 width=4)
(6 rows)

That doesn't seem to even fit your description; the join is in fact on the
unmodified inner variable here, no? Is this a bug?

> However, we need to be pretty wary about widening the families unless
> we're sure that the semantics are right. In particular, I think that
> numeric-vs-float crosstype operators would violate the transitive law:
> you could have values for which A=B and B=C but A!=C. This is because
> we smash numerics to float for comparison, and so there are distinct
> numeric values that can compare equal to the same float. bigint against
> float same problem. It'd be OK to integrate integers and numeric into
> one class, but how much real value is there in that?

I'm not sure. In all of my schemas I've rarely used NUMERIC and never for an
indexed column. But I'm sure there are users out there with databases
(probably existing databases from other database products) which use NUMERIC
extensively.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Gregory Stark" <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Operator class group proposal
Date: 2007-01-03 18:17:32
Message-ID: 16067.1167848252@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Merge Join (cost=100000149.78..100000448.70 rows=13161 width=36)
> Merge Cond: (a.a = "inner"."?column2?")
> -> Index Scan using aa on a (cost=0.00..62.45 rows=1230 width=32)
> -> Sort (cost=100000149.78..100000155.13 rows=2140 width=4)
> Sort Key: (b.b)::numeric
> -> Seq Scan on b (cost=100000000.00..100000031.40 rows=2140 width=4)

> That doesn't seem to even fit your description; the join is in fact on the
> unmodified inner variable here, no? Is this a bug?

No, the join condition is "a.a = b.b::numeric", because the only usable =
operator is numeric-eq-numeric. If we were to create a numeric-eq-int
operator then use of an indexscan on b would be possible. Whether this
is an issue in real-world usage is debatable, though. Certainly my
advice to anyone worried about the performance of such a join would be
to change the numeric column to an integer type --- without that you're
going to have performance problems anyway, just because numeric
arithmetic is slow.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <gsstark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Operator class group proposal
Date: 2007-01-03 19:17:22
Message-ID: 878xgjx6m5.fsf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> Merge Join (cost=100000149.78..100000448.70 rows=13161 width=36)
>> Merge Cond: (a.a = "inner"."?column2?")
>> -> Index Scan using aa on a (cost=0.00..62.45 rows=1230 width=32)
>> -> Sort (cost=100000149.78..100000155.13 rows=2140 width=4)
>> Sort Key: (b.b)::numeric
>> -> Seq Scan on b (cost=100000000.00..100000031.40 rows=2140 width=4)
>
>> That doesn't seem to even fit your description; the join is in fact on the
>> unmodified inner variable here, no? Is this a bug?
>
> No, the join condition is "a.a = b.b::numeric", because the only usable =
> operator is numeric-eq-numeric. If we were to create a numeric-eq-int
> operator then use of an indexscan on b would be possible.

Hm, that's a thought. We could add some choice operators such as = and < even
if we didn't complete the set of operators to make a full transitively
merge-joinable set of operators. Not sure that buys much except some brain
cells dealing with the pg_operator columns though.

It does seem like those merge-joinable operator columns ought to be
automatically filled in based on the btree strategy numbers.

> is an issue in real-world usage is debatable, though. Certainly my
> advice to anyone worried about the performance of such a join would be
> to change the numeric column to an integer type --- without that you're
> going to have performance problems anyway, just because numeric
> arithmetic is slow.

Well I don't think those performance problems are comparable. Numeric
arithmetic is slow, and numeric storage is inefficient but both of those costs
are proportional. A user may well find it entirely tolerable that his database
runs 50% slower than his DBA tells him it can. But if his application runs
1000x slower or worse then effectively Postgres just doesn't work for his
application at all.

To be honest I'm just as happy to be told it's something we're not too
concerned about. pg_operator entries are a pain to maintain. We haven't seen
many of the "why is my query using a sequential scan" type queries since the
cross-data-type operators went in so perhaps my concern is misplaced.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com