Re: VARIANT / ANYTYPE datatype

From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VARIANT / ANYTYPE datatype
Date: 2011-05-06 21:54:36
Message-ID: 4DC46E1C.3070405@darrenduncan.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for picking up on my mentioning union types; I wasn't sure if anyone did.

Merlin Moncure wrote:
> On Fri, May 6, 2011 at 3:18 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>
>> On 05/06/2011 04:08 PM, Alvaro Herrera wrote:
>>> Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:
>>>
>>>> I see VARIANT/ANYTYPE as the most general case of supporting union types,
>>>> which,
>>>> say, could have more specific examples of "allow any number or date here
>>>> but
>>>> nothing else". If VARIANT is supported, unions in general ought to be
>>>> also.
>>> Okay, so aside from the performance (storage reduction) gained, there's
>>> this argument for having variant/union types. It seems to me that this
>>> is indeed possible to build. Completely general VARIANT, though, is
>>> rather complex. A declared union, where you specify exactly which types
>>> can be part of the union, can be catalogued, so that the system knows
>>> exactly where to look when a type needs to be modified. A general
>>> VARIANT however looks complex to me to solve.
>>>
>>> The problem is this: if an user attempts to drop a type, and this type
>>> is used in a variant somewhere, we would lose the stored data. So the
>>> drop needs to be aborted. Similarly, if we alter a type (easy example:
>>> a composite type) used in a variant, we need to cascade to modify all
>>> rows using that composite.
>>>
>>> If the unions that use a certain type are catalogued, we at least know
>>> what tables to scan to cascade.
>>>
>>> In a general variant, the system catalogs do not have the information of
>>> what type each variant masquerades as. We would need to examine the
>>> variant's masqueraded types on each insert; if the current type is not
>>> found, add it. This seems a bit expensive.
>> So how is a declared union going to look and operate? Something like this?
>>
>> CREATE TYPE foo AS UNION (ival int, tval text, tsval timestamptz):
>> CREATE TABLE bar (myunion foo);
>> INSERT INTO bar (myunion) VALUES (ival=>1), (tval=>'some text');
>> UPDATE bar SET myunion.tsval = CURRENT_TIMESTAMP;
>>
>>
>> Something like that could actually be quite nice for a number of purposes.
>
> using your hypothetical example, could you cast types to the union?
>
> select 1::int::foo;
>
> record types would presumably work as well? you could do a lot of
> *really* neat stuff that way...

Like with other respondents to this topic, I consider it much more useful and
important, as well as practical, to support explicitly defined type unions than
a wide-open ANYTYPE.

As for how it would operate: The example above implies a union type
implementation that is like C's concept of such, where you have to explicitly
state how you want the value bit pattern to be interpreted, by naming ival/etc,
rather than the DBMS just knowing that a particular value is of a specific type,
because per-value stored metadata says so (like with SQLite).

While that might be best in practice for implementation, I had envisioned
something more like set unions, so you could instead do it like this:

CREATE TYPE foo AS UNION (int, text, timestamptz):
CREATE TABLE bar (myunion foo);
INSERT INTO bar (myunion) VALUES (1), ('some text');
UPDATE bar SET myunion = CURRENT_TIMESTAMP;

Unlike a record type, where multiple attributes may have the same time,
presumably with a union, they would all be distinct, and so you could use the
type name itself to refer to each option; you don't have to make up "ival" or
whatever ... unless there are situations where types don't have names.

When doing operations that are type-generic, such as equality test or
assignment, especially assignment, between 2 things that are both declared to be
of type foo, you could just do it with no verbosity same as if you were doing 2
int or text or whatever.

When doing operations specific to int or text or whatever, or if you are
assigning a foo-declared thing to an int/text-declared thing, then you'd want an
explicit cast or assertion, such as "select myunion::int + 3 as answer from bar
where is_a(myunion,'int')".

Whether you want an explicit cast to go the other way, I would assume you don't
need to, like when you have DOMAINs; eg, I would expect the 4th line above to
just work, because the system knows the type of CURRENT_TIMESTAMP and it knows
that this is a member of the union type of myunion. I see a UNION type as being
like a DOMAIN type in reverse.

-- Darren Duncan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-05-06 21:58:04 Re: patch for new feature: Buffer Cache Hibernation
Previous Message Robert Haas 2011-05-06 21:54:09 Re: Prefered Types