Re: Better support for whole-row operations and composite

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Better support for whole-row operations and composite types
Date: 2004-03-29 19:18:32
Message-ID: 3072.1080587912@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

We have a number of issues revolving around the fact that composite types
(row types) aren't first-class objects. I think it's past time to fix
that. Here are some notes about doing it. I am not sure all these ideas
are fully-baked ... comments appreciated.

When represented as a Datum, the format of a row-type object needs to be
something like this:

* overall length: int4 (this makes the Datum a valid varlena item)
* row type id: Oid (either a composite type id or RECORDOID)
* row type typmod: int4 (see below for usage)
-- pad if needed to MAXALIGN boundary
* heap tuple representation, beginning with a HeapTupleHeaderData struct

If we do it exactly as above then we will be wasting some space, because
the xmin/xmax/cmax and ctid fields of HeapTupleHeaderData are of no use
in a row that isn't actually a table member row. It is very tempting to
overlay the length and rowtype fields with the HeapTupleHeaderData struct.
This would save some code as well as space --- see discussion below.

Only named composite types, not RECORD, will be allowed to be used as
table column types. This ensures that any row object stored on disk will
have a valid composite type ID embedded in it, so that the row structure
can be retrieved when the row is read. However, we want to be able to
support row objects in memory that are of transient record types (for
example, the output of a function returning RECORD will have a record type
determined by the query itself). I propose that we handle this case by
setting the type id to RECORDOID and using the typmod to identify the
particular record type --- the typmod will essentially be an index into
a backend-local cache of record types. More detail below.

We'll add "tdtypeid" and "tdtypmod" fields to TupleDesc structs. This
will make it easy to set the embedded type information correctly when
manufacturing a row datum using a TupleDesc. For TupleDescs associated
with relations, tdtypeid is just the relation's row type OID, and tdtypmod
is -1. For TupleDescs representing transient row types, we initially set
tdtypeid to RECORDOID and tdtypmod to -1 (indicating a completely
anonymous row type). If the row type actually needs to be identifiable
then we establish a cache entry for it and set the typmod to an index for
the cache entry. I think this will only need to happen when the query
contains a function-returning-RECORD or a whole-row variable referencing
what would otherwise be an anonymous row type, such as a JOIN result.

Composite types, as well as the RECORD type, will be marked in pg_type as
pass-by-ref, varlena (typlen -1), typalign 'd'. (We will use the maximum
alignment always to avoid any dependency on types of the contained
columns.)

The present function call and return conventions involving TupleTableSlots
will be replaced by simply passing and returning these row objects as
pass-by-reference Datums. In the case of functions returning rowtypes,
we'll continue to support the present ReturnSetInfo convention for
returning a separate TupleDesc describing the result type --- but this
will just be a crosscheck.

We will be able to make generic I/O routines for composite types,
comparable to those used now for arrays. Not sure what a convenient
external format would look like. (Possibly use the same conventions as
for a 1-D array?) We will need to make the convention that the type OID
of a composite type is passed to the input routine, in the same way that
an array input routine gets the typelem OID; else the input routine won't
know what to do.

We could also think about allowing functions that are declared as
accepting RECORD (ie, polymorphic-across-row-types functions). They would
use the same methods already used by polymorphic functions to find out the
true types of their inputs. (Might be best to invent a separate
pseudotype, say ANYRECORD, rather than overloading RECORD for this purpose.)

The recently developed SRF API is a bit unfortunate since it exposes the
assumption that a TupleTableSlot must be involved in returning a tuple.
If we don't overlay the Datum header with HeapTupleHeader then I think we
have to make TupleGetDatum copy the passed tuple and insert the row type
info from the slot's tupledesc, which'd be pretty inefficient because it
means making an extra copy of the row data. But if we do overlay the
header fields, then I think we can set up backwards-compatibility
definitions in which the slot is simply ignored. Specifically:

TupleDescGetSlot: no-op, returns NULL
TupleGetDatum: ignore slot, return tuple t_data pointer as datum

This will work because heap_formtuple and BuildTupleFromCStrings can
return a HeapTuple whose t_data part is already a valid row Datum, simply
by setting the appropriate length and type fields in it. (If the tuple is
ever stored to disk as a regular table row, these fields will be
overwritten with xmin/cmin info at that time.)

To convert a row Datum into something that can be passed to heap_getattr,
one could use a local variable of type HeapTupleData and set its t_data
field to the datum's pointer value. t_len is copied from the datum
contents, while the other fields of HeapTupleData can just be set to
zeroes.

ExecEvalVar for a whole-row reference will need to copy the scan tuple so
that it can insert the correct length and tuple type fields. (We cannot
scribble on the tuple as it sits in the disk buffer, of course.)
Fortunately this shouldn't be a major memory leak anymore since the copy
can be made in the current short-lived memory context.

Handling anonymous RECORD types
-------------------------------

I envision expanding typcache.c to be able to store TupleDesc structures
for composite and record types. In the case of regular composite types
this is not especially difficult. For record types, we are essentially
trying to make a backend-local mapping from typmod values to TupleDescs.
There are a couple of interesting points:

* We have to be able to re-use an already-existing cache entry if it
matches a requested TupleDesc. This avoids indefinite growth of the type
cache over many queries. There could still be issues with memory leakage
if a single backend session uses a huge number of distinct record types
over its lifetime, but that doesn't seem likely to be an issue in
practice. (We could avoid this problem by recycling no-longer-needed
cache entries, but what with plan caching I'm not sure there's any
pleasant way to do that. For the moment I intend that cache entries for
record types will live for the life of the backend.)

* Since record typmod values are backend-local, they aren't meaningful in
query structures stored on disk. When a stored rule is read in, we'll
need to be able to replace any embedded typmod values with correct
assignments for the current backend.

Safely storing composite types on disk
--------------------------------------

If a composite row value contains any out-of-line TOAST references, we'd
have to expand those references before we could safely store the value on
disk. This can be handled by the same tuptoaster.c routines that are
already concerned with replacing unsafe references.

ALTER TABLE issues
------------------

If an ALTER TABLE command does something that requires examining or
changing every row of a table, it would presumably have to do the same to
all entries in any composite-type column of the table's rowtype. To avoid
surprises and interesting debates about who has permissions to do this,
it might be wise to restrict on-disk composite columns to be only of
standalone composite types (ie, those made with CREATE TYPE AS). This
restriction would also avoid debates about whether table constraints apply
to composite-type columns.

Notes
-----

While doing this, we should once and for all rip out the last vestiges of
the "attisset" feature.

Add an Assert to ExecEvalVar that checks that whole-row vars (and, I
guess, any system column as well) are fetched from a scan tuple, never the
inner or outer side of a join. If they've not been converted into
ordinary field references in a join, it's too late.

The current API for TypeGetTupleDesc is somewhat bogus --- I don't think
the "column alias" option is really appropriate, and it is lacking a
typmod argument so it can't be used with record types. We shall have to
deprecate it in favor of a new routine.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Better support for whole-row operations and composite types
Date: 2004-03-29 22:54:05
Message-ID: 87ad1zs1rm.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> We have a number of issues revolving around the fact that composite types
> (row types) aren't first-class objects. I think it's past time to fix
> that.
...
> Only named composite types, not RECORD, will be allowed to be used as
> table column types.

If I understand what you're talking about, you would be allowed to CREATE TYPE
a composite type, like say, "address" and then use that as a datatype all over
your database? And then if you find "address" needs a new field you can add it
to the type and automatically have it added all over your database to any
table column using that type?

Speaking as a user, that would be **very** nice. I've often found myself
wishing for just such a feature. It would simplify data model maintenance a
whole heck of a lot.

How will client programs see the data if i do a "select *"? In my ideal world
it would be shipped over in a binary representation that a driver would
translate to a perl hash / php array / whatever. But maybe it would be simpler
to just ship them over the subcolumns with names like "shipping.line_1" and
"shipping.country".

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Better support for whole-row operations and composite types
Date: 2004-03-29 23:15:38
Message-ID: 14491.1080602138@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> If I understand what you're talking about, you would be allowed to
> CREATE TYPE a composite type, like say, "address" and then use that as
> a datatype all over your database? And then if you find "address"
> needs a new field you can add it to the type and automatically have it
> added all over your database to any table column using that type?

I believe that would work, though you might have some issues with cached
plans.

> How will client programs see the data if i do a "select *"?

TBD.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Postgresql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support for whole-row operations and composite
Date: 2004-03-30 15:19:21
Message-ID: 40698FF9.3090503@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

>We have a number of issues revolving around the fact that composite types
>(row types) aren't first-class objects. I think it's past time to fix
>that. Here are some notes about doing it. I am not sure all these ideas
>are fully-baked ... comments appreciated.
>
>
>
[snip]

>Only named composite types, not RECORD, will be allowed to be used as
>table column types.
>
[snip]

Interesting. I'm slightly curious to know if there's an external driver
for this.

Will this apply recursively (an a has a b which has an array of c's)?
Are there indexing implications? Could one index on a subfield?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Postgresql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support for whole-row operations and composite
Date: 2004-03-30 19:09:21
Message-ID: 11089.1080673761@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> Only named composite types, not RECORD, will be allowed to be used as
>> table column types.

> Interesting. I'm slightly curious to know if there's an external driver
> for this.

There's noplace to store a permanent record of an anonymous rowtype's
structure. To do otherwise would amount to executing an implicit CREATE
TYPE AS for the user, so we might as well just say up front that you
have to create the type.

> Will this apply recursively (an a has a b which has an array of c's)?

Yup.

> Are there indexing implications? Could one index on a subfield?

Using an expression index, sure. I don't think we need to support it as
a "primitive" index type.

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite
Date: 2004-04-02 18:08:20
Message-ID: 406DAC14.1080408@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> We have a number of issues revolving around the fact that composite
> types (row types) aren't first-class objects. I think it's past time
> to fix that. Here are some notes about doing it. I am not sure all
> these ideas are fully-baked ... comments appreciated.

[Sorry for the delay in responding]

Nice work, and in general it makes sense to me. A few comments below.

> We will be able to make generic I/O routines for composite types,
> comparable to those used now for arrays. Not sure what a convenient
> external format would look like. (Possibly use the same conventions
> as for a 1-D array?)

So you mean like an array, but with possibly mixed datatypes?

'{1 , "abc def", 2.3}'

Seems to make sense.

Another option might be to use the ROW keyword, something like:

ROW[1 , 'abc', 2.3]

> We could also think about allowing functions that are declared as
> accepting RECORD (ie, polymorphic-across-row-types functions). They
> would use the same methods already used by polymorphic functions to
> find out the true types of their inputs. (Might be best to invent a
> separate pseudotype, say ANYRECORD, rather than overloading RECORD
> for this purpose.)

Check. I really like this idea.

> TupleDescGetSlot: no-op, returns NULL TupleGetDatum: ignore slot,
> return tuple t_data pointer as datum
>
> This will work because heap_formtuple and BuildTupleFromCStrings can
> return a HeapTuple whose t_data part is already a valid row Datum,
> simply by setting the appropriate length and type fields in it. (If
> the tuple is ever stored to disk as a regular table row, these fields
> will be overwritten with xmin/cmin info at that time.)

Is this the way you did things in your recent commit?

> To convert a row Datum into something that can be passed to
> heap_getattr, one could use a local variable of type HeapTupleData
> and set its t_data field to the datum's pointer value. t_len is
> copied from the datum contents, while the other fields of
> HeapTupleData can just be set to zeroes.

I think I understand this, but an example would help.

> * We have to be able to re-use an already-existing cache entry if it
> matches a requested TupleDesc.

For anonymous record types, how will that lookup be done efficiently?
Can the hash key be an array of attribute oids?

> If an ALTER TABLE command does something that requires examining or
> changing every row of a table, it would presumably have to do the
> same to all entries in any composite-type column of the table's
> rowtype. To avoid surprises and interesting debates about who has
> permissions to do this, it might be wise to restrict on-disk
> composite columns to be only of standalone composite types (ie, those
> made with CREATE TYPE AS). This restriction would also avoid debates
> about whether table constraints apply to composite-type columns.

I agree.

As an aside, it would be quite useful to have support for arrays of
tuples. Any idea on how to do that without needing to define an explicit
array type for each tuple type?

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite types
Date: 2004-04-02 19:28:02
Message-ID: 3934.1080934082@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> Tom Lane wrote:
>> We will be able to make generic I/O routines for composite types,
>> comparable to those used now for arrays. Not sure what a convenient
>> external format would look like. (Possibly use the same conventions
>> as for a 1-D array?)

> So you mean like an array, but with possibly mixed datatypes?
> '{1 , "abc def", 2.3}'
> Seems to make sense.

The unresolved question in my mind is how to represent NULL elements.
However, we have to solve that sooner or later for arrays too. Any
thoughts?

> Another option might be to use the ROW keyword, something like:
> ROW[1 , 'abc', 2.3]

This is a separate issue, just as the ARRAY[] constructor has different
uses from the array I/O representation. I do want some kind of runtime
constructor, but ROW[...] doesn't get the job done because it doesn't
provide any place to specify the rowtype name. Maybe we could combine
ROW[...] with some sort of cast notation?

ROW[1 , 'abc', 2.3] :: composite_type_name
CAST(ROW[1 , 'abc', 2.3] AS composite_type_name)

Does SQL99 provide any guidance here?

>> TupleDescGetSlot: no-op, returns NULL TupleGetDatum: ignore slot,
>> return tuple t_data pointer as datum
>>
>> This will work because heap_formtuple and BuildTupleFromCStrings can
>> return a HeapTuple whose t_data part is already a valid row Datum,
>> simply by setting the appropriate length and type fields in it. (If
>> the tuple is ever stored to disk as a regular table row, these fields
>> will be overwritten with xmin/cmin info at that time.)

> Is this the way you did things in your recent commit?

Almost. I ended up keeping TupleDescGetSlot as a live function, but its
true purpose is only to ensure that the tupledesc gets registered with
the type cache (see BlessTupleDesc() in CVS tip). The slot per se never
gets used. I believe that CVS tip is source-code-compatible with
existing SRFs, even though I adjusted all the ones in the distribution
to stop using the TupleTableSlot stuff.

The main point though is that row Datums now contain sufficient info
embedded in them to allow runtime type lookup the same as we do for arrays.

>> To convert a row Datum into something that can be passed to
>> heap_getattr, one could use a local variable of type HeapTupleData
>> and set its t_data field to the datum's pointer value. t_len is
>> copied from the datum contents, while the other fields of
>> HeapTupleData can just be set to zeroes.

> I think I understand this, but an example would help.

There are several in the PL sources now, for instance plpgsql does this
with an incoming rowtype argument:

if (!fcinfo->argnull[i])
{
HeapTupleHeader td;
Oid tupType;
int32 tupTypmod;
TupleDesc tupdesc;
HeapTupleData tmptup;

td = DatumGetHeapTupleHeader(fcinfo->arg[i]);

/* Extract rowtype info and find a tupdesc */
tupType = HeapTupleHeaderGetTypeId(td);
tupTypmod = HeapTupleHeaderGetTypMod(td);
tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);

/* Build a temporary HeapTuple control structure */
tmptup.t_len = HeapTupleHeaderGetDatumLength(td);
ItemPointerSetInvalid(&(tmptup.t_self));
tmptup.t_tableOid = InvalidOid;
tmptup.t_data = td;

exec_move_row(&estate, NULL, row, &tmptup, tupdesc);
}

This is okay because the HeapTupleData is not needed after the call to
exec_move_row.

>> * We have to be able to re-use an already-existing cache entry if it
>> matches a requested TupleDesc.

> For anonymous record types, how will that lookup be done efficiently?
> Can the hash key be an array of attribute oids?

Right, that's the way I did it. See src/backend/utils/cache/typcache.c

> As an aside, it would be quite useful to have support for arrays of
> tuples. Any idea on how to do that without needing to define an explicit
> array type for each tuple type?

Hmm, messy ...

I wonder now whether we still really need a separate pg_type entry for
every array type. The original motivation for doing that has been at
least partly subsumed by storing element type OIDs inside the arrays
themselves. I wonder if we could go over to a scheme where, say,
atttypid is the base type ID and attndims being nonzero is what you
check to find out it's really an array of atttypid. Not sure how we
could map that idea into function and expression args/results, though.

Plan B would be to go ahead and create array types. Not sure I would
want to do this for table rowtypes, but if we did it only for CREATE
TYPE AS then it doesn't sound like an unreasonable amount of overhead.

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite
Date: 2004-04-03 01:18:59
Message-ID: 406E1103.8050009@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
>>So you mean like an array, but with possibly mixed datatypes?
>>'{1 , "abc def", 2.3}'
>>Seems to make sense.
>
> The unresolved question in my mind is how to represent NULL elements.
> However, we have to solve that sooner or later for arrays too. Any
> thoughts?

Good point. What's really ugly is that the external representation of
string types differs depending on whether quotes are needed or not. If
strings were *always* surrounded by quotes, we could just use the word
NULL, without the quotes.

>>Another option might be to use the ROW keyword, something like:
>>ROW[1 , 'abc', 2.3]
>
>
> This is a separate issue, just as the ARRAY[] constructor has different
> uses from the array I/O representation. I do want some kind of runtime
> constructor, but ROW[...] doesn't get the job done because it doesn't
> provide any place to specify the rowtype name. Maybe we could combine
> ROW[...] with some sort of cast notation?
>
> ROW[1 , 'abc', 2.3] :: composite_type_name
> CAST(ROW[1 , 'abc', 2.3] AS composite_type_name)
>
> Does SQL99 provide any guidance here?

The latter seems to agree with 6.12 (<cast specification>) of SQL2003.
I'd think we'd want the former supported anyway as an extension to standard.

> Almost. I ended up keeping TupleDescGetSlot as a live function, but its
> true purpose is only to ensure that the tupledesc gets registered with
> the type cache (see BlessTupleDesc() in CVS tip). The slot per se never
> gets used. I believe that CVS tip is source-code-compatible with
> existing SRFs, even though I adjusted all the ones in the distribution
> to stop using the TupleTableSlot stuff.

Almost compatible. I found that, to my surprise, PL/R compiles with no
changes after your commit. However it no segfaults (as I expected) on
composite type arguments. Should be easy to fix though (I think, really
haven't looked at it hard yet).

> The main point though is that row Datums now contain sufficient info
> embedded in them to allow runtime type lookup the same as we do for arrays.

Sounds good to me.

> There are several in the PL sources now, for instance plpgsql does this
> with an incoming rowtype argument:

Perfect -- thanks.

>>As an aside, it would be quite useful to have support for arrays of
>>tuples. Any idea on how to do that without needing to define an explicit
>>array type for each tuple type?
>
> Hmm, messy ...
>
> I wonder now whether we still really need a separate pg_type entry for
> every array type. The original motivation for doing that has been at
> least partly subsumed by storing element type OIDs inside the arrays
> themselves. I wonder if we could go over to a scheme where, say,
> atttypid is the base type ID and attndims being nonzero is what you
> check to find out it's really an array of atttypid. Not sure how we
> could map that idea into function and expression args/results, though.

Hmmm. I had thought maybe we could use a single datatype (anyarray?)
with in/out functions that would need to do the right thing based on the
element type. This would also allow, for example, arrays-of-arrays,
which is the way that SQL99/2003 seem to allow for multidimensional arrays.

> Plan B would be to go ahead and create array types. Not sure I would
> want to do this for table rowtypes, but if we did it only for CREATE
> TYPE AS then it doesn't sound like an unreasonable amount of overhead.

I was hoping we wouldn't need to do that.

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite types
Date: 2004-04-03 03:13:15
Message-ID: 11055.1080961995@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> Tom Lane wrote:
>> ... I believe that CVS tip is source-code-compatible with
>> existing SRFs, even though I adjusted all the ones in the distribution
>> to stop using the TupleTableSlot stuff.

> Almost compatible. I found that, to my surprise, PL/R compiles with no
> changes after your commit. However it no segfaults (as I expected) on
> composite type arguments. Should be easy to fix though (I think, really
> haven't looked at it hard yet).

Let me know what you find out --- if I missed a trick on compatibility,
there's still plenty of time to fix it.

>> ... I wonder if we could go over to a scheme where, say,
>> atttypid is the base type ID and attndims being nonzero is what you
>> check to find out it's really an array of atttypid. Not sure how we
>> could map that idea into function and expression args/results, though.

> Hmmm. I had thought maybe we could use a single datatype (anyarray?)
> with in/out functions that would need to do the right thing based on the
> element type.

If we have just one datatype, how will the parser determine the type of
a "foo[subscript]" expression? After thinking a bit, I don't see how to
do that except by adding an out-of-line decoration to the underlying
type, somewhat like we do for "setof" or atttypmod. This is doable as
far as the backend itself is concerned, but the compatibility
implications for clients and user-written extensions seem daunting :-(

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite
Date: 2004-04-03 04:28:29
Message-ID: 406E3D6D.6080902@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
>>Almost compatible. I found that, to my surprise, PL/R compiles with no
>>changes after your commit. However it no segfaults (as I expected) on
>>composite type arguments. Should be easy to fix though (I think, really
>>haven't looked at it hard yet).
>
> Let me know what you find out --- if I missed a trick on compatibility,
> there's still plenty of time to fix it.

I still haven't had time to look closely, and well may have been doing
something non-standard all along, but in any case this is the current
failing code:

else if (function->arg_is_rel[i])
{
/* for tuple args, convert to a one row data.frame */
TupleTableSlot *slot = (TupleTableSlot *) arg[i];
HeapTuple tuples = slot->val;
TupleDesc tupdesc = slot->ttc_tupleDescriptor;

PROTECT(el = pg_tuple_get_r_frame(1, &tuples, tupdesc));
}

The problem was (I think -- I'll check a little later) that
slot->ttc_tupleDescriptor is now '\0'.

>>Hmmm. I had thought maybe we could use a single datatype (anyarray?)
>>with in/out functions that would need to do the right thing based on the
>>element type.
>
> If we have just one datatype, how will the parser determine the type of
> a "foo[subscript]" expression? After thinking a bit, I don't see how to
> do that except by adding an out-of-line decoration to the underlying
> type, somewhat like we do for "setof" or atttypmod. This is doable as
> far as the backend itself is concerned, but the compatibility
> implications for clients and user-written extensions seem daunting :-(

I'll think-about/play-with this some more, hopefully this weekend.

Thanks,

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite types
Date: 2004-04-03 05:02:27
Message-ID: 11811.1080968547@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> I still haven't had time to look closely, and well may have been doing
> something non-standard all along, but in any case this is the current
> failing code:

> /* for tuple args, convert to a one row data.frame */
> TupleTableSlot *slot = (TupleTableSlot *) arg[i];
> HeapTuple tuples = slot->val;
> TupleDesc tupdesc = slot->ttc_tupleDescriptor;

Um. Well, the arg is not a TupleTableSlot * anymore, so this is
guaranteed to fail. This isn't part of what I thought the documented
SRF API was though. If you take the arg[i] value and pass it to
GetAttributeByName or GetAttributeByNum it will work (with some compiler
warnings) and AFAICS we never documented more than that.

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite
Date: 2004-04-03 06:00:54
Message-ID: 406E5316.2080808@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
>> /* for tuple args, convert to a one row data.frame */
>> TupleTableSlot *slot = (TupleTableSlot *) arg[i]; HeapTuple tuples
>> = slot->val; TupleDesc tupdesc = slot->ttc_tupleDescriptor;
>
> Um. Well, the arg is not a TupleTableSlot * anymore, so this is
> guaranteed to fail. This isn't part of what I thought the documented
> SRF API was though.

I'm sure you're correct. The SRF API was for user defined functions, not
procedural languages anyway. I'll look at how the other procedural
languages handle tuple arguments.

> If you take the arg[i] value and pass it to GetAttributeByName or
> GetAttributeByNum it will work (with some compiler warnings) and
> AFAICS we never documented more than that.

OK, thanks,

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite types
Date: 2004-04-03 06:25:40
Message-ID: 12365.1080973540@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> ... The SRF API was for user defined functions, not
> procedural languages anyway. I'll look at how the other procedural
> languages handle tuple arguments.

It was a dozen-or-so-lines change in each of the PL languages AFAIR.
You will probably also want to look at what you do to return tuple
results.

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite
Date: 2004-04-03 06:33:49
Message-ID: 406E5ACD.2020509@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
>>... The SRF API was for user defined functions, not
>>procedural languages anyway. I'll look at how the other procedural
>>languages handle tuple arguments.
>
> It was a dozen-or-so-lines change in each of the PL languages AFAIR.
> You will probably also want to look at what you do to return tuple
> results.

OK, thanks.

Just for reference, what is arg[i] if it isn't a (TupleTableSlot *)
anymore -- is it just a HeapTuple?

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite types
Date: 2004-04-03 06:42:44
Message-ID: 12527.1080974564@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> Just for reference, what is arg[i] if it isn't a (TupleTableSlot *)
> anymore -- is it just a HeapTuple?

No, it's a HeapTupleHeader pointer. You need to reconstruct a HeapTuple
on top of that to work with heap_getattr and most other core backend
routines. Also don't forget to ensure that you detoast the datum;
this is not useful at the moment but will be important Real Soon Now.
I added standard argument-fetch macros to fmgr.h to help with the
detoasting bit.

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite
Date: 2004-04-03 19:42:09
Message-ID: 406F1391.9090006@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> No, it's a HeapTupleHeader pointer. You need to reconstruct a
> HeapTuple on top of that to work with heap_getattr and most other
> core backend routines.

Thanks.

For triggers, I was previously building up the arguments thus:

slot = TupleDescGetSlot(tupdesc);
slot->val = trigdata->tg_trigtuple;
arg[7] = PointerGetDatum(slot);

I suppose now I should do this instead?

arg[7] = PointerGetDatum(trigdata->tg_trigtuple->t_data);

> Also don't forget to ensure that you detoast the datum; this is not
> useful at the moment but will be important Real Soon Now. I added
> standard argument-fetch macros to fmgr.h to help with the detoasting
> bit.

OK. This is the net result:

#ifdef PG_VERSION_75_COMPAT
Oid tupType;
int32 tupTypmod;
TupleDesc tupdesc;
HeapTuple tuple = palloc(sizeof(HeapTupleData));
HeapTupleHeader tuple_hdr = DatumGetHeapTupleHeader(arg[i]);

tupType = HeapTupleHeaderGetTypeId(tuple_hdr);
tupTypmod = HeapTupleHeaderGetTypMod(tuple_hdr);
tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);

tuple->t_len = HeapTupleHeaderGetDatumLength(tuple_hdr);
ItemPointerSetInvalid(&(tuple->t_self));
tuple->t_tableOid = InvalidOid;
tuple->t_data = tuple_hdr;

PROTECT(el = pg_tuple_get_r_frame(1, &tuple, tupdesc));

pfree(tuple);
#else
TupleTableSlot *slot = (TupleTableSlot *) arg[i];
HeapTuple tuple = slot->val;
TupleDesc tupdesc = slot->ttc_tupleDescriptor;

PROTECT(el = pg_tuple_get_r_frame(1, &tuple, tupdesc));
#endif /* PG_VERSION_75_COMPAT */

Given the above changes, it's almost working now -- only problem left is
with triggers:

insert into foo values(11,'cat99',1.89);
+ ERROR: record type has not been registered
+ CONTEXT: In PL/R function rejectfoo

delete from foo;
+ ERROR: cache lookup failed for type 0
+ CONTEXT: In PL/R function rejectfoo

(and a few other similar failures)

Any ideas why the trigger tuple type isn't registered, or what I'm doing
wrong?

Thanks,

Joe


From: Joe Conway <mail(at)joeconway(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Better support for whole-row operations and composite
Date: 2004-04-03 23:05:40
Message-ID: 406F4344.8060301@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway wrote:
> Given the above changes, it's almost working now -- only problem left is
> with triggers:
>
>
> insert into foo values(11,'cat99',1.89);
> + ERROR: record type has not been registered
> + CONTEXT: In PL/R function rejectfoo
>
> delete from foo;
> + ERROR: cache lookup failed for type 0
> + CONTEXT: In PL/R function rejectfoo
>
> (and a few other similar failures)
>
> Any ideas why the trigger tuple type isn't registered, or what I'm doing
> wrong?

A little more info on this. It appears that the tuple type is set to
either 2249 (RECORDOID) or 0. In the case of RECORDOID this traces all
the way back to here:

/* ----------------------------------------------------------------
* CreateTemplateTupleDesc
*
* This function allocates and zeros a tuple descriptor structure.
*
* Tuple type ID information is initially set for an anonymous record
* type; caller can overwrite this if needed.
* ----------------------------------------------------------------
*/

But the type id is never overwritten for a BEFORE INSERT trigger. It
appears that somewhere it is explictly set to InvalidOid for both BEFORE
DELETE and AFTER INSERT triggers (and possibly others). My take is that
we now need to explicitly set the tuple type id for INSERT/UPDATE/DELETE
statements -- not sure where the best place to do that is though. Does
this sound correct?

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite types
Date: 2004-04-04 04:05:36
Message-ID: 20442.1081051536@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
> For triggers, I was previously building up the arguments thus:
> slot = TupleDescGetSlot(tupdesc);
> slot->val = trigdata->tg_trigtuple;
> arg[7] = PointerGetDatum(slot);

> I suppose now I should do this instead?
> arg[7] = PointerGetDatum(trigdata->tg_trigtuple->t_data);

Hm, no, that won't work because a tuple being passed to a trigger
probably isn't going to contain valid type information. The API for
calling triggers is different from calling ordinary functions, so
I never thought about trying to make it look the same. At what point
are you trying to do the above, anyway?

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Better support for whole-row operations and composite
Date: 2004-04-04 04:25:28
Message-ID: 406F8E38.9030302@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

> Joe Conway <mail(at)joeconway(dot)com> writes:
>
>>For triggers, I was previously building up the arguments thus:
>> slot = TupleDescGetSlot(tupdesc);
>> slot->val = trigdata->tg_trigtuple;
>> arg[7] = PointerGetDatum(slot);
>
>
>>I suppose now I should do this instead?
>> arg[7] = PointerGetDatum(trigdata->tg_trigtuple->t_data);
>
>
> Hm, no, that won't work because a tuple being passed to a trigger
> probably isn't going to contain valid type information. The API for
> calling triggers is different from calling ordinary functions, so
> I never thought about trying to make it look the same. At what point
> are you trying to do the above, anyway?

That's a shame -- it used to work fine -- done this way so the same
function could handle tuple arguments to regular functions, and old/new
tuples to trigger functions. It is in plr_trigger_handler(); vaguely
similar to pltcl_trigger_handler(). I'll have to figure out a workaround
I guess.

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Better support for whole-row operations and composite
Date: 2004-04-04 04:25:33
Message-ID: 20581.1081052733@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joe Conway <mail(at)joeconway(dot)com> writes:
>> Any ideas why the trigger tuple type isn't registered, or what I'm doing
>> wrong?

> A little more info on this. It appears that the tuple type is set to
> either 2249 (RECORDOID) or 0.

After further thought, we could possibly make it work for BEFORE
triggers, but there's just no way for AFTER triggers: in that case what
you are getting is an image of what went to disk, which is going to
contain transaction info not type info.

If you really want the trigger API for PL/R to be indistinguishable from
the function-call API, then I think you will need to copy the passed
tuple and insert type information. This is more or less what
ExecEvalVar does now in the whole-tuple case (the critical code is
actually in heap_getsysattr though):

HeapTupleHeader dtup;

dtup = (HeapTupleHeader) palloc(tup->t_len);
memcpy((char *) dtup, (char *) tup->t_data, tup->t_len);

HeapTupleHeaderSetDatumLength(dtup, tup->t_len);
HeapTupleHeaderSetTypeId(dtup, tupleDesc->tdtypeid);
HeapTupleHeaderSetTypMod(dtup, tupleDesc->tdtypmod);

result = PointerGetDatum(dtup);

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Better support for whole-row operations and composite
Date: 2004-04-04 05:10:51
Message-ID: 406F98DB.6040304@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> If you really want the trigger API for PL/R to be indistinguishable from
> the function-call API, then I think you will need to copy the passed
> tuple and insert type information. This is more or less what
> ExecEvalVar does now in the whole-tuple case (the critical code is
> actually in heap_getsysattr though):

That got me there. It may not be the best in terms of pure speed, but it
is easier and simpler than refactoring, at least at the moment. And I
don't think the reason people will choose PL/R for triggers is speed in
any case ;-)

Thanks!

Joe