VARIANT / ANYTYPE datatype

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: VARIANT / ANYTYPE datatype
Date: 2011-05-04 16:58:13
Message-ID: 1304527158-sup-5344@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

A customer came to us with this request: a way to store "any" data in a
column. We've gone back and forth trying to determine reasonable
implementation restrictions, safety and useful semantics for them.
I note that this has been requested in the past:
http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php
and both Oracle and MS-SQL have it and apparently people find them
useful. I didn't find any indication that SQL contains anything
resembling this.

The main idea is to be able to store column values in an audit table
like this:

change_time timestamptz
table_name name
column_name name
old_value variant
new_value variant
So per-column changes, which is much more convenient than the regular
idea of storing the whole NEW and/or OLD record(s).

Currently, they use text for old_value and new_value, but this is, of
course, not very satisfactory.

My thought was that a variant type would store the datum as
<typid><data> so that it would be possible to identify the datatype
stored in each column/row and interpret adequately, calling the
appropriate output function etc. On input it would be limited to come
only from inside the system, not from the outside world, as that would
have obvious security implications; so it'd be similar to pg_node_tree
in that regard.

Now this has obvious limitations: first, any query that tries to extract
data would need to include a cast of the variant value to the
appropriate type, so that the type can be resolved early. Thus,
trying to extract rows of different types would be forbidden.

Also, there would be a security problem with a datum storing something
whose datatype later changes (consider a user-defined record type or
things like that). My first reaction was to do something like
CREATE TYPE foo VARIANT OF (int, text, timestamptz);
and then you could declare old_value with type foo, which would only
allow values of the declared types. This makes it easy to catalogue
used types in any variant, and thus easy to restrict modifying or
dropping types that are used in some variant. However, this idea was
rejected by the customer due to the unusability: you would have to
remember to edit the variant to add the new type anytime you added a new
column to a table, which would be cumbersome.

What the customer suggested was to have a new fork, which stores
type OIDs of datatypes used in the variant. Then when a type is to be
altered, all forks would be scanned to determine if the type is used,
and raise an error if so. I rejected that idea as unworkable.

However, as a middle ground we agreed that we could allow a declared
variant to store any pinned type without restrictions; those can't be
changed or dropped so there's no safety concern. Storing other types
(user-defined types, records and so on) would require some safety net.

Before spending too much time exploring a detailed design, I'd like to
hear some opinions on the basic idea.

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2011-05-04 17:04:51 Re: patch for new feature: Buffer Cache Hibernation
Previous Message Jeff Janes 2011-05-04 16:55:39 Re: patch for new feature: Buffer Cache Hibernation