Re: libpq support for arrays and composites

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: libpq support for arrays and composites
Date: 2008-06-08 23:59:46
Message-ID: 484C7272.5000907@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


One of the areas where libpq seems to be severely lacking is in handling
arrays and composites in query results. I'd like to set about rectifying
that.

Ideally this would mean that drivers using libpq could easily and
reliably deliver such objects suitably structured in their particular
languages (e.g. lists and hashes in Perl).

One complicating factor I see is that there is no protocol level support
for anything other than simple objects - each data value is simply a
stream of bytes of a known length. We would therefore need some pretty
robust processing to pick apart structured objects.

We'll need a new API to handle such objects. I'm thinking of something like:

PQarray * PQgetArray( const PGresult *res, int row_number, int
column_number);
int PQgetArrayNDims(PQarray * array);
int PQgetArrayLower(PQarray * array, int dim);
int PQgetArrayUpper(PQarray * array, int dim);
int PQgetArrayElementLength(PQarray * array, int dim1, ...);
bool PQgetArrayElementIsNull(PQarray * array, int dim1, ...);
char * PQgetArrayElement(PQarray * array, int dim1, ...);

PQcomposite * PQgetComposite(const PGresult *res, int row_number, int
column_number);
PQcomposite * PQgetArrayElementComposite(PQarray * array, int dim1, ...);
int PQgetCompositeNFields(PQcomposite * composite);
char * PQgetCompositeFName(PQcomposite * composite, int fnumber);
int PQgetCompositeFNumber(PQcomposite * composite, char * fname);
Oid PQgetCOmpositeFType(PQcomposite * composite, int fnumber);
int PQgetCompositeFieldLength(PQcomposite * , int fnumber);
bool PQgetCompositeFieldIsNull(PQcomposite * composite, int fnumber);
char * PQgetCompositeField(PQcomposite * composite, int fnumber);

Not sure if we need analogs for PQfformat, PQfmod or PQfsize - I suspect
not, but right now I'm just thinking out loud.

Thoughts? Is this worth doing?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq support for arrays and composites
Date: 2008-06-09 00:10:16
Message-ID: 24946.1212970216@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> One complicating factor I see is that there is no protocol level support
> for anything other than simple objects - each data value is simply a
> stream of bytes of a known length. We would therefore need some pretty
> robust processing to pick apart structured objects.

Well, it's not that hard, the quoting/escaping rules for array_out and
record_out are well defined.

Are you intending that these operations support both text and binary
results?

The array accessors with ... parameter lists strike me as a bit
dangerous, because there is no way at all to verify that the caller is
passing the expected number of dimensions. Can't that be made tighter?
Also you need to spell out the error handling conventions for each of
these.

I think you missed some "free()" operations.

It might also be useful to provide some functions that form an array or
composite value from per-element strings, ie, the converse of the
de-construction routines. Here I'd be happy to skip the binary case.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq support for arrays and composites
Date: 2008-06-09 00:33:52
Message-ID: 484C7A70.3030503@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> One complicating factor I see is that there is no protocol level support
>> for anything other than simple objects - each data value is simply a
>> stream of bytes of a known length. We would therefore need some pretty
>> robust processing to pick apart structured objects.
>>
>
> Well, it's not that hard, the quoting/escaping rules for array_out and
> record_out are well defined.
>
> Are you intending that these operations support both text and binary
> results?
>

I'm a bit open on that.

> The array accessors with ... parameter lists strike me as a bit
> dangerous, because there is no way at all to verify that the caller is
> passing the expected number of dimensions. Can't that be made tighter?
>

Well, the only alternative I can think of is to make the client walk the
array one dimension at a time. Something like:

PQarray * PQgetInnerArray(PQarray * array, int dim);

then when we're down to the leaf level, we could have:

int PQgetArrayElementLength(PQarray * array, int dim);
bool PQgetArrayElementIsNull(PQarray * array, int dim);
char * PQgetArrayElement(PQarray * array, int dim);

That strikes me as somewhat more cumbersome, so I guess the question is
whether it's worth it. It probably fits the slightly clunky feel of libpq.

> Also you need to spell out the error handling conventions for each of
> these.
>
> I think you missed some "free()" operations.
>

Oh, yes, both of these are certainly true. This isn't really even a
proposal yet, more a sketch that would lead to a proposal. I'm hoping to
get some other input too, before settling this down, especially from
driver writers.
> It might also be useful to provide some functions that form an array or
> composite value from per-element strings, ie, the converse of the
> de-construction routines. Here I'd be happy to skip the binary case.
>
>
>

Yeah, that had occurred to me. Will think about it more, although it
could possibly be done as a separate project, too.

cheers

andrew


From: Andrew Chernow <ac(at)esilo(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq support for arrays and composites
Date: 2008-06-09 02:36:41
Message-ID: 484C9739.6060108@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
> One of the areas where libpq seems to be severely lacking is in handling
> arrays and composites in query results. I'd like to set about rectifying
> that.
>
> Ideally this would mean that drivers using libpq could easily and
> reliably deliver such objects suitably structured in their particular
> languages (e.g. lists and hashes in Perl).
>
> One complicating factor I see is that there is no protocol level support
> for anything other than simple objects - each data value is simply a
> stream of bytes of a known length. We would therefore need some pretty
> robust processing to pick apart structured objects.
>
> We'll need a new API to handle such objects. I'm thinking of something
> like:
>
> PQarray * PQgetArray( const PGresult *res, int row_number, int
> column_number);
> int PQgetArrayNDims(PQarray * array);
> int PQgetArrayLower(PQarray * array, int dim);
> int PQgetArrayUpper(PQarray * array, int dim);
> int PQgetArrayElementLength(PQarray * array, int dim1, ...);
> bool PQgetArrayElementIsNull(PQarray * array, int dim1, ...);
> char * PQgetArrayElement(PQarray * array, int dim1, ...);
>
> PQcomposite * PQgetComposite(const PGresult *res, int row_number, int
> column_number);
> PQcomposite * PQgetArrayElementComposite(PQarray * array, int dim1, ...);
> int PQgetCompositeNFields(PQcomposite * composite);
> char * PQgetCompositeFName(PQcomposite * composite, int fnumber);
> int PQgetCompositeFNumber(PQcomposite * composite, char * fname);
> Oid PQgetCOmpositeFType(PQcomposite * composite, int fnumber);
> int PQgetCompositeFieldLength(PQcomposite * , int fnumber);
> bool PQgetCompositeFieldIsNull(PQcomposite * composite, int fnumber);
> char * PQgetCompositeField(PQcomposite * composite, int fnumber);
>
> Not sure if we need analogs for PQfformat, PQfmod or PQfsize - I suspect
> not, but right now I'm just thinking out loud.
>
> Thoughts? Is this worth doing?
>
> cheers
>
> andrew
>
>
>

libpqtypes already implemented this. It is a different approach but provides
the same functionality; with the inclusion of being able to handle every data
type. libpqtypes uses the PGresult API for composites and arrays, rather than
adding a new set of functions.

To support this, one must be able to convert all data types (unless you are only
supporting text results) because composites can be made up of any data type.

Simple arrays:
http://libpqtypes.esilo.com/man3/pqt-specs.html#array

Composite arrays:
http://libpqtypes.esilo.com/man3/pqt-composites.html

EXAMPLE OF GETTING A COMPOSITE:
(taken from http://libpqtypes.esilo.com/ home page)

/* Let's get a composite.
* CREATE TYPE simple AS (a int4, t text);
*/
PGint4 i4;
PGtext text;
PGresult *res, *simple;
int resultFormat = 1;

/* Your composites need to be registered */
PQregisterTypeHandler(conn, "simple", NULL, NULL);

/* 2nd arg, PGparam, can be NULL if there are no query params.
* Composites require binary results, so we can't use PQexec().
*/
res = PQparamExec(conn, NULL, "SELECT my_simple FROM t", resultFormat);
if(!res)
fprintf(stderr, "ERROR: %s\n", PQgeterror());

/* Get the simple composite, which is exposed as a PGresult. */
PQgetf(res, 0, "%simple", 0, &simple);
PQclear(res); /* no longer needed */

/* Get the simple composite attributes from the simple result.
* Reference fields by name by using a '#' rather than a '%'.
* The field names are the composite attribute names.
*/
PQgetf(simple, 0, "#int4 #text", "a", &i4, "t", &text);
PQclear(simple);

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andrew Chernow <ac(at)esilo(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq support for arrays and composites
Date: 2008-06-09 04:31:57
Message-ID: 484CB23D.8010706@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Chernow wrote:
>
> libpqtypes already implemented this. It is a different approach but
> provides the same functionality; with the inclusion of being able to
> handle every data type. libpqtypes uses the PGresult API for
> composites and arrays, rather than adding a new set of functions.
>
>

Yes, I thought you'd say that :-)

This has some significant limitations - for example (quoting from your docs)

Arrays are only handled using binary format. This means that any type
used as an array element must be put and gotten in binary format. If a
user-defined type does not implement a send and recv function in the
backend, it can not be used as an array element.

That makes it quite useless for my intended purpose.

I also am not particularly enamoured of the libpqtypes way of doing things, which feels rather foreign to me.

Lastly, the idea is to provide extra facilities to libpq clients without requiring any extra library.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Andrew Chernow <ac(at)esilo(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq support for arrays and composites
Date: 2008-06-09 07:00:15
Message-ID: 5994.1212994815@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Lastly, the idea is to provide extra facilities to libpq clients
> without requiring any extra library.

Or more to the point, without requiring boatloads of new code that
only some libpq users would have any use for.

To my mind, the point of the present proposal is to provide some
client-side code that understands how to invert the data
formatting/escaping rules implemented by array_out, record_out,
and perhaps array_in/record_in. We can make that happen without
taking a quantum jump in libpq's API complexity --- and offhand
it seems that Andrew D's proposal is at about the right level of
complication. libpqtypes has its place also, but I think it's
addressing a different level of problem complexity.

regards, tom lane


From: Andrew Chernow <ac(at)esilo(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq support for arrays and composites
Date: 2008-06-09 10:02:05
Message-ID: 484CFF9D.4050506@esilo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>That makes it quite useless for my intended purpose.

I found no more use cases for text results after libpqtypes started to
take shape, eventhough libpqtypes supports all data types in text &
binary excluding arrays and composites. Because of this, adding a text
parser for arrays and composites felt like a lot of work for a little
gain. libpqtypes is really designed to be a binary interface. The text
support offered allows existing applications to use the new interface
with results generated by PQexec(), meaning you can use PQgetf w/o
having to change code to use PQputf().

If you take another glance at libpqtypes, you may see that result format
decisions are pretty well abstracted and there really is no need for
text results anymore (okay, I'll catagorize that as an opinion).

> I also am not particularly enamoured of the libpqtypes way of doing
> things, which feels rather foreign to me.

Not sure we can fix this issue. We made every attempt to keep things
familiar ... printf/scanf style. It's a new approach for libpq but an
old one for C hacks.

--
Andrew Chernow
eSilo, LLC
every bit counts
http://www.esilo.com/


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq support for arrays and composites
Date: 2008-06-10 14:11:20
Message-ID: b42b73150806100711g75b9033amfdb1d441518702f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6/8/08, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Tom Lane wrote:
> > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > > One complicating factor I see is that there is no protocol level support
> for anything other than simple objects - each data value is simply a stream
> of bytes of a known length. We would therefore need some pretty robust
> processing to pick apart structured objects.
> >
> > Are you intending that these operations support both text and binary
> > results?
>
> I'm a bit open on that.

IMO, support for binary is critical. Because of the interplay of the
array and composite out formats, the number of backslashes grows
exponentially (!) with nesting levels. This makes text format arrays
unsuitable for any non-trivial operations involving arrays of
composites.

> > The array accessors with ... parameter lists strike me as a bit
> > dangerous, because there is no way at all to verify that the caller is
> > passing the expected number of dimensions. Can't that be made tighter?
>
> Well, the only alternative I can think of is to make the client walk the
> array one dimension at a time. Something like:
>
> PQarray * PQgetInnerArray(PQarray * array, int dim);

Dimension walking is probably the best approach. The pure varargs
version which Andrew D is suggesting has too many arguments based
strictly on convention IMO (meaning, I agree with Tom).

One alternative is to do a MAXDIM (6) argument 'getter' also taking
the requested dimension with perhaps some wrapping macros for
simplicity. One issue with this is that it seems to suggest array
slicing etc. which seems more complicated than it's worth.

One small style note here: the structs in libpq generally have a 'PG'
prefix...PGarray, etc.

> then when we're down to the leaf level, we could have:
>
> int PQgetArrayElementLength(PQarray * array, int dim);
> bool PQgetArrayElementIsNull(PQarray * array, int dim);
> char * PQgetArrayElement(PQarray * array, int dim);
>
> That strikes me as somewhat more cumbersome, so I guess the question is
> whether it's worth it. It probably fits the slightly clunky feel of libpq.

Have you considered using a PGresult to present the array/composite?
The idea is to arrange it so: one column per composite field and one
row per array row. I think this is a pretty elegant fit over the
requirements of getting information out of an array, and you have to
add a lot less convetions to libpq. This is a semantic enhancement
of some of the existing functions but

Tom Lane wrote:
> It might also be useful to provide some functions that form an array or
> composite value from per-element strings, ie, the converse of the
> de-construction routines. Here I'd be happy to skip the binary case.

That seems a little inconsistent to me: (we allow sending binary query
parameters except when sending arrays). ISTM that arrays are one of
the cases where sending in binary could be a big win in terms of
performance.

Andrew Dunstan wrote:
> Yeah, that had occurred to me. Will think about it more, although it could
> possibly be done as a separate project, too.

I can't help but notice you proposing some overlapping behavior with
libpqtypes when we could be making better progress working together
and extending libpq in a way that:
*) doesn't introduce new behavior to libpq which is an overreach of
the library (particularly things which can be achieved via simple
wrapping strategies)
*) provides a reasonable foundation for the objectives which
libpqtypes is trying to achieve.

I have always thought that array and composite presentation belongs in
libpq. For example, since the 'inner result' is constructed on the
libpqtype side, the result has to be built with a deep copy. This is
fine in most cases but a potential optimzation target if everything is
done in libpq. Also we could drop some portions of our hooking patch,
namely PQsetValue, PQcopyResult, and possibly PQresultAlloc.

While we didn't introduce text/composite array parsing into
libpqtypes, there is no reason why we couldn't (certain functions are
undefined

I understand that some portion of libpqtypes are unlikely to make it
into libpq. Is it feasible come up with an array composite approach
that is useful from libpqtypes perspective so we can layer on top of
what andrew d is proposing.

For example, in libpq:
*) array/composite get/put (in text/binary)
*) 'PGparam' argument stacker building arrays/composites for
*) introduce PGarray struct. If PGresult is used to present data,
AIUI the 'composite' family of functions, outside of what is needed to
present the result from the composite, does not need to exist.

But not:
*) type serializing/deserializing from binary (except in the special
case of arrays/composites).
*) printf/scanf style query functions.

Which we would presumably keep in the libpqtypes library. I haven't
thought it completely through, but I'm guessing it's possible to
adjust the libpqtype library so that it gives the interface we like
but allow extending libpq to naturally allow better handling of
arrays/composites which we need and libpq users obviously want.

Our propsosed libpqtypes library solves all the use cases regarding
arrays/composites with the exception of pulling out arrays in text
(but it could be easily adjusted to do that, too). So, maybe it's
worthwhile seeing if we can adapt some of the work which we have
already done and is working quite well.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq support for arrays and composites
Date: 2008-06-10 14:23:11
Message-ID: 16909.1213107791@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> On 6/8/08, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Tom Lane wrote:
>>> Are you intending that these operations support both text and binary
>>> results?
>>
>> I'm a bit open on that.

> IMO, support for binary is critical. Because of the interplay of the
> array and composite out formats, the number of backslashes grows
> exponentially (!) with nesting levels. This makes text format arrays
> unsuitable for any non-trivial operations involving arrays of
> composites.

Um ... but who cares, as long as you've got functions to wrap and unwrap
the data for you? Personally I wouldn't object if these were text-only;
they'd be a whole lot more future-proof that way.

> One alternative is to do a MAXDIM (6) argument 'getter' also taking
> the requested dimension with perhaps some wrapping macros for
> simplicity. One issue with this is that it seems to suggest array
> slicing etc. which seems more complicated than it's worth.

Let's not embed MAXDIM in libpq's ABI :-(

regards, tom lane


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq support for arrays and composites
Date: 2008-06-10 15:29:17
Message-ID: b42b73150806100829l22c9aecat65e746fe746a24a9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6/10/08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> > IMO, support for binary is critical. Because of the interplay of the
> > array and composite out formats, the number of backslashes grows
> > exponentially (!) with nesting levels. This makes text format arrays
> > unsuitable for any non-trivial operations involving arrays of
> > composites.
>
> Um ... but who cares, as long as you've got functions to wrap and unwrap
> the data for you? Personally I wouldn't object if these were text-only;
> they'd be a whole lot more future-proof that way.

Are you truly suggesting that a format where potentially hundreds of
backslashes are sent per character of useful data is something you
would encourage users to use in a data-centric application? This is
bloated and wasteful in the extreme! I took one look at the output
the database produced in one application we are using with a couple of
levels of nesting and determined that the text approach was not
workable. This was in fact the early genesis of libpqtypes.

Granted, it doesn't impact a wide range of use cases. It's not like
the array and composite container formats are particularly
complicated. The 'future proofing' requirements should be fairly
easily solved (as opposed to a general solution which covers the
entire range of types).

> > One alternative is to do a MAXDIM (6) argument 'getter' also taking
> > the requested dimension with perhaps some wrapping macros for
> > simplicity. One issue with this is that it seems to suggest array
> > slicing etc. which seems more complicated than it's worth.
>
> Let's not embed MAXDIM in libpq's ABI :-(

right. For this and other reasons I think recursing into the array
structure is the best approach...ideally using a PGresult to present
the array data.

merlin