Domains and subtypes, a brief proposal

Lists: pgsql-hackers
From: elein <elein(at)varlena(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Domains and subtypes, a brief proposal
Date: 2006-09-07 22:36:01
Message-ID: 20060907223601.GO25669@varlena.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


As many of you know I've been contemplating the implementation
of Domains and subtypes.

DISCLAIMER: This is a proposal only. The actual work needs to
be picked up by someone in a better place to work on the code
than I am. For various reasons, I can only be an active reference
and tester on this projects.

DISCLAIMER: This is a proposal, not a detailed spec. A more detailed
spec can be written with the help of a developer.

Domains and subtypes.

* Create new child type from values in parent type.
* Maintain only checks for constraints
* Create implicit casts from child to parent

With this model all of the type checking that exists should
work and domain checking of types in any place except for
constraint checking can be removed.

* Add proper SQL standard CREATE childtype UNDER parenttype
This should include all of the ordinary create type options.

Constraints on types:

* Change the pg_types to hold a NULLABLE constraint text column
OR add a type constraint lookup table (pg_domains?)

This is a big deal, I know. Constraints for domains would be in
that field. All domain checking should be done from that source.

This would theoretically enable type constraint checking for any type
if we chose to add this feature to create type. This would be appropriate
for base types only, including UDTs. There may be an argument for complex
types, though, for example certain types of integer-ish arrays may only contain
unique values. Usually types have the constraint definition encoded in
their input routines. This gives the type developers an easier option
to validate their types since constraints can be written in pl languages.

Problems and Issues:

There are a few known issues with the current implementation of domains
with overriding operators.

* Creating the table with an domain PRIMARY KEY did not use the subtype comparison function.
It was necessary to create a unique index which explicitly used the domain operator class.
This should be fixed by having the domain as a proper type. No domain checking should be necessary.
* ORDER BY requires USING op clause.
This may be fixed by having the domain as a proper type. No domain checking should be necessary.
* LIKE requires explicit casting of the second argument to text.
(I need to double check this. It may work OK on cvs head.)
* COPY has a problem which still need investigating.

Why do we want this.

* We need subtypes
* Domains are *almost* subtypes because you can override the operators
as you can with any type.
* Use the types as designed

The current code jumps through hoops to check for domain types. Ideally
this change should remove a lot of that code in favor of checking just
for the existence of a constraint and then only where constraint checking
is needed.

By using this technique Illustra easily added subtypes, but did not add the
domains with their additional complexities. No special type checking for
simple subtypes was necessary.

I may have missed some stuff here. Obviously. For example how to divide and
conquer the various aspects of the issues raised here. But this is a high, high
level proposal at this time.

Comments, volunteers are welcome.

--elein

--------------------------------------------------------------
elein(at)varlena(dot)com Varlena, LLC www.varlena.com
(510)655-2584(o) (510)543-6079(c)
PostgreSQL Consulting, Support & Training
--------------------------------------------------------------


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: elein <elein(at)varlena(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Domains and subtypes, a brief proposal
Date: 2006-09-08 02:12:17
Message-ID: 200609071912.17221.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Elein,

> I may have missed some stuff here. Obviously.  For example how to divide
> and conquer the various aspects of the issues raised here. But this is a
> high, high level proposal at this time.

I'm not quite clear on what in your proposal is different from current Domain
behavior. Or are you just looking to remove the limitations on where Domains
can be used?

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: elein <elein(at)varlena(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, elein <elein(at)varlena(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Domains and subtypes, a brief proposal
Date: 2006-09-08 17:51:39
Message-ID: 20060908175138.GB24798@varlena.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 07, 2006 at 07:12:17PM -0700, Josh Berkus wrote:
> Elein,
>
> > I may have missed some stuff here. Obviously.  For example how to divide
> > and conquer the various aspects of the issues raised here. But this is a
> > high, high level proposal at this time.
>
> I'm not quite clear on what in your proposal is different from current Domain
> behavior. Or are you just looking to remove the limitations on where Domains
> can be used?
>
> --
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco
>

I'm looking to make domains proper types and eliminate the need for
special domain checking in the case where it is not a check constraint
check. I'm trying to influence a more logical implementation of
domains that removes some code and gains us features. Also it
should eliminate some (all?) of the current limitations with regards
to using domains as subtypes. It will also pave the way for
implementation of create type under type.

--elein
elein(at)varlena(dot)com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: elein <elein(at)varlena(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Domains and subtypes, a brief proposal
Date: 2006-09-08 18:33:13
Message-ID: 21444.1157740393@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

elein <elein(at)varlena(dot)com> writes:
> Domains and subtypes.

> * Create new child type from values in parent type.
> * Maintain only checks for constraints
> * Create implicit casts from child to parent

This seems a bit content-free, because it's not clear how it differs
from what we do now. We already have implicit child-to-parent casts.

> Constraints on types:

> * Change the pg_types to hold a NULLABLE constraint text column
> OR add a type constraint lookup table (pg_domains?)

I understand that you are arguing to allow constraints to be associated
with any type not only domains, but
(a) I don't see why we should want to add that overhead, and
(b) I don't see what that has to do with the problem you actually
need to solve, specifically limiting the application of implicit
domain-to-base-type casts.

regards, tom lane


From: elein <elein(at)varlena(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: elein <elein(at)varlena(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Domains and subtypes, a brief proposal
Date: 2006-09-08 19:41:02
Message-ID: 20060908194102.GC24798@varlena.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 08, 2006 at 02:33:13PM -0400, Tom Lane wrote:
> elein <elein(at)varlena(dot)com> writes:
> > Domains and subtypes.
>
> > * Create new child type from values in parent type.
> > * Maintain only checks for constraints
> > * Create implicit casts from child to parent
>
> This seems a bit content-free, because it's not clear how it differs
> from what we do now. We already have implicit child-to-parent casts.

I guess the key point here was to treat the domains as proper udt types
except where constraint checking is required. And yes, this is already
done, but it needed to be included for context.

>
> > Constraints on types:
>
> > * Change the pg_types to hold a NULLABLE constraint text column
> > OR add a type constraint lookup table (pg_domains?)
>
> I understand that you are arguing to allow constraints to be associated
> with any type not only domains, but
> (a) I don't see why we should want to add that overhead, and
> (b) I don't see what that has to do with the problem you actually
> need to solve, specifically limiting the application of implicit
> domain-to-base-type casts.

This is a new feature idea, derived from the implementation of domains.
Usually people store type checking in the input functions, but this is
a nice addition to UDTs that require a constraint checking model.
It allows the constraints to be in plperl which is nice for parsing
complex object stored at strings. (My example was email and the constraint
was a plperl function that validated the format and legitimacy of the value.)

Other complex objects (stored as strings) such as key value lists and
your ordinary weirdly constructed values can use the (more expensive)
constraint at constraint time only instead of the input function which
should remain fast and may be a borrowed or inherited input function.

This is not a drop dead required feature but it should flow from the
cleaner implementation of domains. Changing the check from domain type
to constraint exists on any type should be cleaner. Changing
the SQL for CREATE TYPE should be the added work to get this feature
available.

It just seems simpler and cleaner. We want to treat all types the
same and maintain a type blind database server.

a) if subtypes/domains can have constraints then the model should
not be different for domains only but for all types. Constraint
checking would only
occur at check constraint points--and there for any type. You
already check for the existance of a domain. Change that test
to the existence of a constraint only and eliminate domain specific
code.
b) It is not part of the problem but a logical stretch given the
changes required. It will also reduce the domain checking.

Other than my existing tests (published previously) I do not have a good
idea code wise the extent of the changes. This discussion may help
us get to that point.

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

elein
elein(at)varlena(dot)com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: elein <elein(at)varlena(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Domains and subtypes, a brief proposal
Date: 2006-09-08 19:47:23
Message-ID: 22711.1157744843@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

elein <elein(at)varlena(dot)com> writes:
> a) if subtypes/domains can have constraints then the model should
> not be different for domains only but for all types. Constraint
> checking would only
> occur at check constraint points--and there for any type. You
> already check for the existance of a domain. Change that test
> to the existence of a constraint only and eliminate domain specific
> code.

Au contraire, the test whether a constraint actually exists occurs at
runtime, not at the time we check for domain-ness. Your proposal would
force such checks to be introduced into every single expression
evaluation. It's not feasible at all without plan invalidation, and
even with that I foresee fairly enormous added overhead. Our experience
with domains so far is that looking up those constraints is *expensive*.

regards, tom lane


From: elein <elein(at)varlena(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: elein <elein(at)varlena(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Domains and subtypes, a brief proposal
Date: 2006-09-08 21:02:28
Message-ID: 20060908210228.GE24798@varlena.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 08, 2006 at 03:47:23PM -0400, Tom Lane wrote:
> elein <elein(at)varlena(dot)com> writes:
> > a) if subtypes/domains can have constraints then the model should
> > not be different for domains only but for all types. Constraint
> > checking would only
> > occur at check constraint points--and there for any type. You
> > already check for the existance of a domain. Change that test
> > to the existence of a constraint only and eliminate domain specific
> > code.
>
> Au contraire, the test whether a constraint actually exists occurs at
> runtime, not at the time we check for domain-ness. Your proposal would
> force such checks to be introduced into every single expression
> evaluation. It's not feasible at all without plan invalidation, and
> even with that I foresee fairly enormous added overhead. Our experience
> with domains so far is that looking up those constraints is *expensive*.

For domain checking isn't expression evaluation required anyway?
email := email_value || email_value should fail on a constraint check for
result value at assignment time.

I think what you are saying is that the domain checking (proposed constraint
existence checking) would need to be done in more places and I'm not sure I
understand this.

I believe constraints checking should done less often than input
types. And checking for constraint <> NULL should be equivalent to the
current check *typtype != 'd'. I could be wrong base on the current
implementation. There may be more to it, but I suspect making sure the
constraint value is available when you fetch a type would be necessary.

Turn the thing around a bit. The contraint is an attribute on anytype.
All type code, only where appropriate, should check for existence of
the constraint attribute. This is different from saying domains as
special types and need special casing in places (other than constraint checking).
I'm trying to remove the specialness from domains so that the type
code can pretty well work as is in all places execpt checking for the
constraint attribute of a type. This should solve some of the existing
domain problems.

We're pretty close to this as is, but there is still a lot of special
casing going on. The ability to add constraints to any type should
only be considered as a logical extension made easier by the change
in what you test when you test for constraints.

elein
elein(at)varlena(dot)com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: elein <elein(at)varlena(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Domains and subtypes, a brief proposal
Date: 2006-09-08 21:20:18
Message-ID: 23683.1157750418@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

elein <elein(at)varlena(dot)com> writes:
> I think what you are saying is that the domain checking (proposed constraint
> existence checking) would need to be done in more places and I'm not sure I
> understand this.

What I'm complaining about is the need to search the catalogs to see if
a datatype has constraints. At the moment we need to do that only
for operations yielding domain types. Your proposal appears to require
that it be done for *every* operation on *every* datatype, right down
to int4 and bool (which at the very least creates some interesting
circularity issues). I'm not willing to accept that much overhead on
the strength of what is frankly a pretty weak case. If you want a
constraint, what's wrong with putting a domain on your base type to
enforce it?

> And checking for constraint <> NULL should be equivalent to the
> current check *typtype != 'd'.

Not without an amazingly complicated substructure to the "constraint"
column (multiple entries, names as well as expressions, etc). At the
very least that's a violation of relational theory, and I'm not sure how
we're going to handle dependencies of the constraint expressions at all
if they aren't separate catalog entries.

regards, tom lane


From: elein <elein(at)varlena(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: elein <elein(at)varlena(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Domains and subtypes, a brief proposal
Date: 2006-09-08 23:44:49
Message-ID: 20060908234449.GG24798@varlena.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 08, 2006 at 05:20:18PM -0400, Tom Lane wrote:
> elein <elein(at)varlena(dot)com> writes:
> > I think what you are saying is that the domain checking (proposed constraint
> > existence checking) would need to be done in more places and I'm not sure I
> > understand this.
>
> What I'm complaining about is the need to search the catalogs to see if
> a datatype has constraints. At the moment we need to do that only
> for operations yielding domain types. Your proposal appears to require
> that it be done for *every* operation on *every* datatype, right down
> to int4 and bool (which at the very least creates some interesting
> circularity issues). I'm not willing to accept that much overhead on
> the strength of what is frankly a pretty weak case. If you want a
> constraint, what's wrong with putting a domain on your base type to
> enforce it?
>
> > And checking for constraint <> NULL should be equivalent to the
> > current check *typtype != 'd'.
>
> Not without an amazingly complicated substructure to the "constraint"
> column (multiple entries, names as well as expressions, etc). At the
> very least that's a violation of relational theory, and I'm not sure how
> we're going to handle dependencies of the constraint expressions at all
> if they aren't separate catalog entries.
>
> regards, tom lane

I'm seeing the constraint column as an attribute of the type. It would
at least flag existence of a constraint on a type. This is necessary for
domains as they work now (but we check typtype). This would also be a
catalog change, i.e. non-trivial. It would link types to constraints only
and replace the 'd' value of typtype. Perhaps my ignorance is showing
and I'm missing something. How you have the ability to select the constraint
from the domain name now is fuzzy to me. But I'm trying to move "domains"
from a type of type to just an attribute of type with the goal of simplifying
the behaviour.

The single value result of an expression, on assignment is the only candidate
for constraint checking. Other expression evaluation would behave as is
until assignment to a final result. Theoretically, you would not have to add
constraints other than those defined now. And this should already be working
though the implementation would change slightly with my proposal.

elein
elein(at)varlena(dot)com