plperl vs. bytea

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: plperl vs. bytea
Date: 2007-05-06 01:46:11
Message-ID: 463D3363.8070900@dunslane.net


I have been talking with Theo some more about his recent problem with
bytea arguments and results (see recent discussion on -bugs and also
recent docs patch). What he needs is a way to have bytea (and possibly
other unknown types) passed as binary data to and from plperl. The
conversion overhead is too big both computationally and in increased
memory usage. After discussing some possibilities, we decided that maybe
the best approach would be to allow a custom GUC variable that would
specify a list of types to be passed in binary form with no conversion, e.g.

plperl.pass_as_binary = 'bytea, other-type'

This would affect function args, trigger data, return results, and I
think it should also apply to arguments for SPI prepared queries and to
SPI returned results.
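
As a rough illustration of the proposal, the sketch below shows how a PL call handler might interpret such a setting. It is written in Python purely for readability. `parse_pass_as_binary` and `wants_binary` are hypothetical helper names, and `plperl.pass_as_binary` itself is only a proposal in this thread, not an existing setting.

```python
def parse_pass_as_binary(setting: str) -> frozenset:
    """Split a comma-separated setting value into normalized type names."""
    names = (part.strip().lower() for part in setting.split(","))
    return frozenset(name for name in names if name)


def wants_binary(type_name: str, binary_types: frozenset) -> bool:
    """True if values of this type should skip the text conversion."""
    return type_name.strip().lower() in binary_types


# The example value from the proposal above.
binary_types = parse_pass_as_binary("bytea, other-type")
for arg_type in ("bytea", "int4", "other-type"):
    mode = "binary" if wants_binary(arg_type, binary_types) else "text"
    print(f"{arg_type}: pass as {mode}")
```

The same lookup would presumably be consulted for function arguments, trigger data, return values, and SPI parameters and results, which is why the proposal treats it as one list.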

If this seems like a good idea maybe it should go on the TODO list in
whatever is the current incarnation.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-06 01:59:57
Message-ID: 549.1178416797@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> After discussing some possibilities, we decided that maybe
> the best approach would be to allow a custom GUC variable that would
> specify a list of types to be passed in binary form with no conversion, e.g.

> plperl.pass_as_binary = 'bytea, other-type'

At minimum this GUC would have to be superuser-only, and even then the
security risks seem a bit high. But the real problem with this thinking
is the same one I already pointed out to Theo: why do you think this
issue is plperl-specific?

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-06 02:19:36
Message-ID: 463D3B38.40607@dunslane.net

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> After discussing some possibilities, we decided that maybe
>> the best approach would be to allow a custom GUC variable that would
>> specify a list of types to be passed in binary form with no conversion, e.g.
>>
>
>
>> plperl.pass_as_binary = 'bytea, other-type'
>>
>
> At minimum this GUC would have to be superuser-only, and even then the
> security risks seem a bit high. But the real problem with this thinking
> is the same one I already pointed out to Theo: why do you think this
> issue is plperl-specific?
>
>
>

It's not. If we really want to tackle this root and branch without
upsetting legacy code, I think we'd need to have a way of marking data
items as binary in the grammar, e.g.

create function myfunc(myarg binary bytea) returns binary bytea
language plperl as $$ ...$$;

That's what I originally suggested to Theo. It would be a lot more work,
though :-)

cheers

andrew


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-06 13:17:45
Message-ID: 200705061517.46632.peter_e@gmx.net

Andrew Dunstan wrote:
> It's not. If we really want to tackle this root and branch without
> upsetting legacy code, I think we'd need to have a way of marking
> data items as binary in the grammar, e.g.
>
> create function myfunc(myarg binary bytea) returns binary bytea
> language plperl as $$ ...$$;

This ought to be a property of data type plus language, not a property
of a function.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-06 13:24:36
Message-ID: 463DD714.6090406@dunslane.net

Peter Eisentraut wrote:
> Andrew Dunstan wrote:
>
>> It's not. If we really want to tackle this root and branch without
>> upsetting legacy code, I think we'd need to have a way of marking
>> data items as binary in the grammar, e.g.
>>
>> create function myfunc(myarg binary bytea) returns binary bytea
>> language plperl as $$ ...$$;
>>
>
> This ought to be a property of data type plus language, not a property
> of a function.
>
>

Why should it?

And how would you do it in such a way that it didn't break legacy code?

My GUC proposal would have made it language+type specific, but Tom
didn't like that approach.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 00:48:28
Message-ID: 18422.1178498908@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Peter Eisentraut wrote:
>> This ought to be a property of data type plus language, not a property
>> of a function.

> Why should it?

> And how would you do it in such a way that it didn't break legacy code?

> My GUC proposal would have made it language+type specific, but Tom
> didn't like that approach.

It may indeed need to be language+type specific; what I was objecting to
was the proposal of an ad-hoc plperl-specific solution without any
consideration for other languages (or other data types for that matter).
I think that's working at the wrong level of detail, at least for
initial design.

What we've basically got here is a complaint that the default
textual-representation-based method for transmitting PL function
parameters and results is awkward and inefficient for bytea.
So the first question is whether this is really localized to only
bytea, and if not which other types have got similar issues.
(Even if you make the case that no other scalar types need help,
what of bytea[] and composite types containing bytea or bytea[]?)
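
To make the cost concrete, here is a minimal Python sketch of a text round trip for binary data. It assumes an escape-style text form in which a backslash is doubled and non-printable bytes are written as three-digit octal escapes. The function names are hypothetical and this is not the server's code, only an approximation of the kind of conversion being discussed.

```python
def bytea_to_text(data: bytes) -> str:
    """Encode bytes roughly the way an escape-style text output would."""
    out = []
    for b in data:
        if b == 0x5C:                 # backslash is doubled
            out.append("\\\\")
        elif 32 <= b <= 126:          # printable ASCII passes through
            out.append(chr(b))
        else:                         # everything else becomes \ooo (octal)
            out.append("\\%03o" % b)
    return "".join(out)


def text_to_bytea(text: str) -> bytes:
    """Reverse bytea_to_text: parse doubled backslashes and octal escapes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1] == "\\":
            out.append(0x5C)
            i += 2
        else:
            out.append(int(text[i + 1:i + 4], 8))
            i += 4
    return bytes(out)


payload = bytes(range(256)) * 4       # 1024 bytes of arbitrary binary data
encoded = bytea_to_text(payload)
assert text_to_bytea(encoded) == payload
print(len(payload), "bytes become", len(encoded), "characters of text")
```

Both directions walk every byte and the intermediate text is larger than the payload, which is the computational and memory overhead the complaint is about.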

After that we have to look at which PLs have the issue. I think
this is largely driven by what the PL's internal type system is
like, in particular does it have a datatype that is a natural
conversion target for bytea, or other types with the same issue?
(Tcl for instance once did not have 8-bit-clean strings, though
I think it does today.)

After we've got a handle on the scope of the problem we can start
to think about solutions.

regards, tom lane


From: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Theo Schlossnagle" <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 05:38:36
Message-ID: 162867790705062238x54a895bx2412ac1678384d61@mail.gmail.com

> What we've basically got here is a complaint that the default
> textual-representation-based method for transmitting PL function
> parameters and results is awkward and inefficient for bytea.
> So the first question is whether this is really localized to only
> bytea, and if not which other types have got similar issues.
> (Even if you make the case that no other scalar types need help,
> what of bytea[] and composite types containing bytea or bytea[]?)
>

It could be a solution for known issues. The current textual representation
is more of an ugly hack than anything else.

Regards
Pavel Stehule


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 10:20:09
Message-ID: 20070507102009.GA9410@svana.org

On Sun, May 06, 2007 at 08:48:28PM -0400, Tom Lane wrote:
> What we've basically got here is a complaint that the default
> textual-representation-based method for transmitting PL function
> parameters and results is awkward and inefficient for bytea.
> So the first question is whether this is really localized to only
> bytea, and if not which other types have got similar issues.
> (Even if you make the case that no other scalar types need help,
> what of bytea[] and composite types containing bytea or bytea[]?)

I must say I was indeed surprised by the idea that bytea is passed by
text, since Perl handles embedded nulls in strings without any problem
at all. Does this mean integers are passed as text also? I would have
expected an array argument to be passed as an array, but now I'm not so
sure.

So I'm with Tom on this one: there needs to be a serious discussion
about how types are passed to Perl and the costs associated with it.

I do have one problem though: for bytea/integers/floats Perl has
appropriate internal representations. But what about other user-defined
types? Say the user-defined UUID type, it should probably also be passed
as a byte string, yet how could Perl know that? That would imply that
user-defined types need to be able to specify how they are passed to
PLs, to *any* PL.

So fixing it for bytea is one thing, but there's a bigger issue here
that needs discussion.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tino Wildenhain <tino(at)wildenhain(dot)de>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 11:18:25
Message-ID: 463F0B01.2050506@wildenhain.de

Martijn van Oosterhout wrote:
...
> I do have one problem though: for bytea/integers/floats Perl has
> appropriate internal representations. But what about other user-defined
> types? Say the user-defined UUID type, it should probably also be passed
> as a byte string, yet how could Perl know that? That would imply that
> user-defined types need to be able to specify how they are passed to
> PLs, to *any* PL.
>
Yes exactly. One way could be to pass the type binary and provide
a hull class for the PL/languages which then call the input/output
routines on the string boundaries of the type unless overridden by
user implementation. So default handling could be done in string
representation of the type whatever that is and for a defined set
of types every pl/language could implement special treatment like
mapping to natural types.

This handling can be done independently for every pl implementation
since it would for the most types just move the current type treatment
just a bit closer to the user code instead of doing all of it
in the call handler.
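
One way to read this proposal is as a per-language conversion table with a text fallback. The Python sketch below is an interpretation under that assumption, not a worked-out design: `TypeBridge`, `register`, and `to_pl` are hypothetical names, and `text_output` stands in for a type's text output routine.

```python
from typing import Any, Callable, Dict


class TypeBridge:
    """Per-PL conversion table: native mapping if registered, else text."""

    def __init__(self, text_output: Callable[[str, Any], str]) -> None:
        self._text_output = text_output          # default: the type's text form
        self._native: Dict[str, Callable[[Any], Any]] = {}

    def register(self, type_name: str, convert: Callable[[Any], Any]) -> None:
        """Let a PL (or a user) override the handling of one type."""
        self._native[type_name] = convert

    def to_pl(self, type_name: str, value: Any) -> Any:
        convert = self._native.get(type_name)
        if convert is not None:
            return convert(value)                # natural PL representation
        return self._text_output(type_name, value)   # fallback: text


bridge = TypeBridge(text_output=lambda type_name, value: str(value))
bridge.register("bytea", bytes)      # hand raw bytes to the PL unchanged
bridge.register("float8", float)     # no text round trip for floats

print(bridge.to_pl("bytea", b"\x00\x01"))   # stays a bytes object
print(bridge.to_pl("point", (1.5, 2.5)))    # unregistered type falls back to text
```

Types with no registered mapping keep today's behaviour, so existing functions would see no change unless a mapping is added for a type they use.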

A second problem is the language interface for scripting outside of the database.
Efficient and lossless type handling there would improve some
situations - maybe a similar approach could be taken here.

Regards
Tino


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 13:02:55
Message-ID: 463F237F.7040106@dunslane.net

Tom Lane wrote:
>
>> My GUC proposal would have made it language+type specific, but Tom
>> didn't like that approach.
>>
>
> It may indeed need to be language+type specific; what I was objecting to
> was the proposal of an ad-hoc plperl-specific solution without any
> consideration for other languages (or other data types for that matter).
> I think that's working at the wrong level of detail, at least for
> initial design.
>
> What we've basically got here is a complaint that the default
> textual-representation-based method for transmitting PL function
> parameters and results is awkward and inefficient for bytea.
> So the first question is whether this is really localized to only
> bytea, and if not which other types have got similar issues.
> (Even if you make the case that no other scalar types need help,
> what of bytea[] and composite types containing bytea or bytea[]?)
>

Well, the proposal would have allowed the user to specify the types to
be passed binary, so it wouldn't have been bytea only.

Array types are currently passed as text. This item used to be on the
TODO list but it disappeared at some stage:

. Pass arrays natively instead of as text between plperl and postgres

(Perhaps it's naughty of me to observe that if we had a tracker we might
know why it disappeared). Arrays can be returned as arrayrefs, and
plperl has a little postprocessing magic that turns that into text which
will in turn be parsed back into a postgres array. Not very efficient
but it's a placeholder until we get better array support.
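
The postprocessing step described above amounts to serializing a nested list into array literal text, which the server then parses again. A minimal Python sketch of that serialization follows, assuming the usual `{...}` literal syntax with double-quoted string elements. `to_array_literal` is a hypothetical name and this is not the plperl code.

```python
def to_array_literal(value) -> str:
    """Render a (possibly nested) Python list as array literal text."""
    if isinstance(value, list):
        return "{" + ",".join(to_array_literal(v) for v in value) + "}"
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Quote strings, escaping backslashes and double quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(to_array_literal([[1, 2], [3, 4]]))
print(to_array_literal(["a b", None, 'say "hi"']))
```

Every element is formatted on the way out and parsed again on the way in, which is why this is workable as a placeholder but not efficient.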

Composites are in fact passed as hashrefs and can be returned as
hashrefs. Unfortunately, this is not true recursively - a composite
within a composite will be received as text.

Another aspect of this is how we deal with SPI arguments and results. I
need to look into that, but sufficient unto the day ...

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tino Wildenhain <tino(at)wildenhain(dot)de>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 13:08:24
Message-ID: 463F24C8.5020205@dunslane.net

Tino Wildenhain wrote:
> Martijn van Oosterhout wrote:
> ...
> > I do have one problem though: for bytea/integers/floats Perl has
> > appropriate internal representations. But what about other user-defined
> > types? Say the user-defined UUID type, it should probably also be passed
> > as a byte string, yet how could Perl know that? That would imply that
> > user-defined types need to be able to specify how they are passed to
> > PLs, to *any* PL.
> >
> Yes exactly. One way could be to pass the type binary and provide
> a hull class for the PL/languages which then call the input/output
> routines on the string boundaries of the type unless overridden by
> user implementation. So default handling could be done in string
> representation of the type whatever that is and for a defined set
> of types every pl/language could implement special treatment like
> mapping to natural types.
>
> This handling can be done independently for every pl implementation
> since it would for the most types just move the current type treatment
> just a bit closer to the user code instead of doing all of it
> in the call handler.
>
> A second problem is the language interface for scripting outside of the database.
> Efficient and lossless type handling there would improve some
> situations - maybe a similar approach could be taken here.
>
>

This seems like an elaborate piece of scaffolding for a relatively small
problem.

This does not need to be over-engineered, IMNSHO.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 14:48:14
Message-ID: 5569.1178549294@sss.pgh.pa.us

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Sun, May 06, 2007 at 08:48:28PM -0400, Tom Lane wrote:
>> What we've basically got here is a complaint that the default
>> textual-representation-based method for transmitting PL function
>> parameters and results is awkward and inefficient for bytea.

> I must say I was indeed surprised by the idea that bytea is passed by
> text, since Perl handles embedded nulls in strings without any problem
> at all. Does this mean integers are passed as text also?

Pretty much everything is passed as text. This is a historical
accident, in part: our first PL with an external interpreter was pltcl,
and Tcl of the day had no other variable type besides "text string".
(They've gotten smarter since then, but from a user's-eye point of view
it's still true that every value in Tcl is a string.) So it was natural
to decree that the value transmission protocol was just to convert to
text and back with the SQL datatype I/O functions. Later PLs copied
that decision without thinking hard about it. We've wedged a few bits
of custom transmission protocol into plperl for arrays and records, but
it's been pretty ad-hoc each time. Seems it's time to take a step back
and question the assumptions.

regards, tom lane


From: Tino Wildenhain <tino(at)wildenhain(dot)de>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 16:09:17
Message-ID: 463F4F2D.4040705@wildenhain.de

Andrew Dunstan wrote:
>
>
> Tino Wildenhain wrote:
>> Martijn van Oosterhout wrote:
>> ...
>> > I do have one problem though: for bytea/integers/floats Perl has
>> > appropriate internal representations. But what about other user-defined
>> > types? Say the user-defined UUID type, it should probably also be passed
>> > as a byte string, yet how could Perl know that? That would imply that
>> > user-defined types need to be able to specify how they are passed to
>> > PLs, to *any* PL.
>> >
>> Yes exactly. One way could be to pass the type binary and provide
>> a hull class for the PL/languages which then call the input/output
>> routines on the string boundaries of the type unless overridden by
>> user implementation. So default handling could be done in string
>> representation of the type whatever that is and for a defined set
>> of types every pl/language could implement special treatment like
>> mapping to natural types.
>>
>> This handling can be done independently for every pl implementation
>> since it would for the most types just move the current type treatment
>> just a bit closer to the user code instead of doing all of it
>> in the call handler.
>>
>> A second problem is the language interface for scripting outside of the database.
>> Efficient and lossless type handling there would improve some
>> situations - maybe a similar approach could be taken here.
>>
>>
>
> This seems like an elaborate piece of scaffolding for a relatively small
> problem.
>
> This does not need to be over-engineered, IMNSHO.

Well could you explain where it would appear over-engineered?
All I was proposing is to move the rather hard-coded
type mapping to a softer approach where the language
is able to support it.

Is there any deficiency in Perl that makes it harder to
do in a clean way?

Regards
Tino


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tino Wildenhain <tino(at)wildenhain(dot)de>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 17:02:54
Message-ID: 463F5BBE.9030509@dunslane.net

Tino Wildenhain wrote:
> Andrew Dunstan wrote:
>>
>>
>> Tino Wildenhain wrote:
>>> Martijn van Oosterhout wrote:
>>> ...
>>> > I do have one problem though: for bytea/integers/floats Perl has
>>> > appropriate internal representations. But what about other user-defined
>>> > types? Say the user-defined UUID type, it should probably also be passed
>>> > as a byte string, yet how could Perl know that? That would imply that
>>> > user-defined types need to be able to specify how they are passed to
>>> > PLs, to *any* PL.
>>> >
>>> Yes exactly. One way could be to pass the type binary and provide
>>> a hull class for the PL/languages which then call the input/output
>>> routines on the string boundaries of the type unless overridden by
>>> user implementation. So default handling could be done in string
>>> representation of the type whatever that is and for a defined set
>>> of types every pl/language could implement special treatment like
>>> mapping to natural types.
>>>
>>> This handling can be done independently for every pl implementation
>>> since it would for the most types just move the current type treatment
>>> just a bit closer to the user code instead of doing all of it
>>> in the call handler.
>>>
>>> A second problem is the language interface for scripting outside of the
>>> database.
>>> Efficient and lossless type handling there would improve some
>>> situations - maybe a similar approach could be taken here.
>>>
>>>
>>
>> This seems like an elaborate piece of scaffolding for a relatively
>> small problem.
>>
>> This does not need to be over-engineered, IMNSHO.
>
> Well could you explain where it would appear over-engineered?
> All I was proposing is to move the rather hard-coded
> type mapping to a softer approach where the language
> is able to support it.
>
> Is there any deficiency in Perl that makes it harder to
> do in a clean way?
>
>

Anything that imposes extra requirements on type creators seems undesirable.

I'm not sure either that the UUID example is a very good one. This whole
problem arose because of performance problems handling large gobs of
data, not just anything that happens to be binary.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tino Wildenhain <tino(at)wildenhain(dot)de>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 17:34:42
Message-ID: 15091.1178559282@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tino Wildenhain wrote:
>> Andrew Dunstan wrote:
>>> This does not need to be over-engineered, IMNSHO.
>>
>> Well could you explain where it would appear over-engineered?

> Anything that imposes extra requirements on type creators seems undesirable.

> I'm not sure either that the UUID example is a very good one. This whole
> problem arose because of performance problems handling large gobs of
> data, not just anything that happens to be binary.

Well, we realize that bytea has got a performance problem, but are we so
sure that nothing else does? I don't want to stick in a one-purpose
wart only to find later that we need a few more warts of the same kind.

An example of something else we ought to be considering is binary
transmission of float values. The argument in favor of that is not
so much performance (although text-and-back conversion is hardly cheap)
as it is that the conversion is potentially lossy, since float8out
doesn't by default generate enough digits to ensure a unique
back-conversion.
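
The lossiness is easy to reproduce outside the server. The Python sketch below assumes a text form with 15 significant digits, which is an assumption about the default output precision rather than something stated in this thread, and compares it with a 17-digit text form and with a plain 8-byte binary copy.

```python
import struct

value = 0.1 + 0.2                      # 0.30000000000000004 as a double

# Text with 15 significant digits (assumed default) loses the low bits.
as_text = "%.15g" % value
print(as_text, float(as_text) == value)

# 17 significant digits are always enough for an IEEE 754 double.
as_long_text = "%.17g" % value
print(as_long_text, float(as_long_text) == value)

# Binary transmission copies the 8 bytes and is exact by construction.
as_binary = struct.pack("<d", value)
print(struct.unpack("<d", as_binary)[0] == value)
```

Emitting more digits fixes the round trip but still pays for formatting and parsing, whereas passing the binary value avoids both problems.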

ISTM there are three reasons for considering non-text-based
transmission:

1. Performance, as in the bytea case
2. Avoidance of information loss, as for float
3. Providing a natural/convenient mapping to the PL's internal data types,
as we already do --- but incompletely --- for arrays and records

It's clear that the details of #3 have to vary across PLs, but I'd
like it not to vary capriciously. For instance plperl currently has
special treatment for returning perl arrays as SQL arrays, but AFAICS
from the manual not for going in the other direction; plpython and
pltcl overlook arrays entirely, even though there are natural mappings
they could and should be using.

I don't know to what extent we should apply point #3 to situations other
than arrays and records, but now is the time to think about it. An
example: working with the geometric types in a PL function is probably
going to be pretty painful for lack of simple access to the constituent
float values (not to mention the lossiness problem).

We should also be considering some non-core PLs such as PL/Ruby and
PL/R; they might provide additional examples to influence our thinking.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tino Wildenhain <tino(at)wildenhain(dot)de>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-07 17:57:25
Message-ID: 463F6885.8000809@dunslane.net

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tino Wildenhain wrote:
>>
>>> Andrew Dunstan wrote:
>>>
>>>> This does not need to be over-engineered, IMNSHO.
>>>>
>>> Well could you explain where it would appear over-engineered?
>>>
>
>
>> Anything that imposes extra requirements on type creators seems undesirable.
>>
>
>
>> I'm not sure either that the UUID example is a very good one. This whole
>> problem arose because of performance problems handling large gobs of
>> data, not just anything that happens to be binary.
>>
>
> Well, we realize that bytea has got a performance problem, but are we so
> sure that nothing else does? I don't want to stick in a one-purpose
> wart only to find later that we need a few more warts of the same kind.
>
> An example of something else we ought to be considering is binary
> transmission of float values. The argument in favor of that is not
> so much performance (although text-and-back conversion is hardly cheap)
> as it is that the conversion is potentially lossy, since float8out
> doesn't by default generate enough digits to ensure a unique
> back-conversion.
>
> ISTM there are three reasons for considering non-text-based
> transmission:
>
> 1. Performance, as in the bytea case
> 2. Avoidance of information loss, as for float
> 3. Providing a natural/convenient mapping to the PL's internal data types,
> as we already do --- but incompletely --- for arrays and records
>
> It's clear that the details of #3 have to vary across PLs, but I'd
> like it not to vary capriciously. For instance plperl currently has
> special treatment for returning perl arrays as SQL arrays, but AFAICS
> from the manual not for going in the other direction; plpython and
> pltcl overlook arrays entirely, even though there are natural mappings
> they could and should be using.
>
> I don't know to what extent we should apply point #3 to situations other
> than arrays and records, but now is the time to think about it. An
> example: working with the geometric types in a PL function is probably
> going to be pretty painful for lack of simple access to the constituent
> float values (not to mention the lossiness problem).
>
> We should also be considering some non-core PLs such as PL/Ruby and
> PL/R; they might provide additional examples to influence our thinking.
>

OK, we have a lot of work to do here, then.

I can really only speak with any significant knowledge on the perl
front. Fundamentally, it has 3 types of scalars: IV, NV and PV (integer,
float, string). IV can accommodate at least the largest integer or
pointer type on the platform, NV a double, and PV an arbitrary string of
bytes.
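
Read as a conversion table, that gives each SQL scalar type a natural target slot. The Python sketch below is only an illustration of that idea: the table contents and the `perl_slot_for` helper are hypothetical, and int8 would fit an IV only on builds where IVs are 64 bits wide.

```python
# Hypothetical mapping from SQL type names to the Perl scalar slot that
# would be the natural conversion target (IV, NV, PV as described above).
NATURAL_SLOT = {
    "int2": "IV", "int4": "IV", "int8": "IV",
    "float4": "NV", "float8": "NV",
    "text": "PV", "bytea": "PV",
}


def perl_slot_for(sql_type: str) -> str:
    """Pick a scalar slot; anything unlisted falls back to a string (PV)."""
    return NATURAL_SLOT.get(sql_type, "PV")


for sql_type in ("int4", "float8", "bytea", "uuid"):
    print(sql_type, "->", perl_slot_for(sql_type))
```

The PV fallback covers both cases discussed in this thread: a type's text form, as today, or raw bytes for something like bytea, since a PV can hold arbitrary bytes.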

As for structured types, as I noted elsewhere we have some of the work
done for plperl. My suggestion would be to complete it for plperl and
get it fully orthogonal and then retrofit that to plpython/pltcl.

I've actually been worried for some time that the conversion glue was
probably imposing significant penalties on the non-native PLs, so I'm
glad to see this getting some attention.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tino Wildenhain <tino(at)wildenhain(dot)de>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-12 22:06:56
Message-ID: 200705122206.l4CM6uA03621@momjian.us


Added to TODO:

o Allow data to be passed in native language formats, rather
than only text

http://archives.postgresql.org/pgsql-hackers/2007-05/msg00289$

---------------------------------------------------------------------------

Andrew Dunstan wrote:
>
>
> Tom Lane wrote:
> > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> >
> >> Tino Wildenhain wrote:
> >>
> >>> Andrew Dunstan wrote:
> >>>
> >>>> This does not need to be over-engineered, IMNSHO.
> >>>>
> >>> Well could you explain where it would appear over-engineered?
> >>>
> >
> >
> >> Anything that imposes extra requirements on type creators seems undesirable.
> >>
> >
> >
> >> I'm not sure either that the UUID example is a very good one. This whole
> >> problem arose because of performance problems handling large gobs of
> >> data, not just anything that happens to be binary.
> >>
> >
> > Well, we realize that bytea has got a performance problem, but are we so
> > sure that nothing else does? I don't want to stick in a one-purpose
> > wart only to find later that we need a few more warts of the same kind.
> >
> > An example of something else we ought to be considering is binary
> > transmission of float values. The argument in favor of that is not
> > so much performance (although text-and-back conversion is hardly cheap)
> > as it is that the conversion is potentially lossy, since float8out
> > doesn't by default generate enough digits to ensure a unique
> > back-conversion.
> >
> > ISTM there are three reasons for considering non-text-based
> > transmission:
> >
> > 1. Performance, as in the bytea case
> > 2. Avoidance of information loss, as for float
> > 3. Providing a natural/convenient mapping to the PL's internal data types,
> > as we already do --- but incompletely --- for arrays and records
> >
> > It's clear that the details of #3 have to vary across PLs, but I'd
> > like it not to vary capriciously. For instance plperl currently has
> > special treatment for returning perl arrays as SQL arrays, but AFAICS
> > from the manual not for going in the other direction; plpython and
> > pltcl overlook arrays entirely, even though there are natural mappings
> > they could and should be using.
> >
> > I don't know to what extent we should apply point #3 to situations other
> > than arrays and records, but now is the time to think about it. An
> > example: working with the geometric types in a PL function is probably
> > going to be pretty painful for lack of simple access to the constituent
> > float values (not to mention the lossiness problem).
> >
> > We should also be considering some non-core PLs such as PL/Ruby and
> > PL/R; they might provide additional examples to influence our thinking.
> >
>
> OK, we have a lot of work to do here, then.
>
> I can really only speak with any significant knowledge on the perl
> front. Fundamentally, it has 3 types of scalars: IV, NV and PV (integer,
> float, string). IV can accommodate at least the largest integer or
> pointer type on the platform, NV a double, and PV an arbitrary string of
> bytes.
>
> As for structured types, as I noted elsewhere we have some of the work
> done for plperl. My suggestion would be to complete it for plperl and
> get it fully orthogonal and then retrofit that to plpython/pltcl.
>
> I've actually been worried for some time that the conversion glue was
> probably imposing significant penalties on the non-native PLs, so I'm
> glad to see this getting some attention.
>
>
> cheers
>
> andrew
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tino Wildenhain <tino(at)wildenhain(dot)de>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Theo Schlossnagle <jesus(at)omniti(dot)com>
Subject: Re: plperl vs. bytea
Date: 2007-05-14 10:57:32
Message-ID: 1179140252.14897.113.camel@hannu-laptop
Lists: pgsql-hackers

On Mon, 2007-05-07 at 13:57, Andrew Dunstan wrote:
>
> Tom Lane wrote:
> > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> >
> >> Tino Wildenhain wrote:
> >>
> >>> Andrew Dunstan wrote:
> >>>
> >>>> This does not need to be over-engineered, IMNSHO.
> >>>>
> >>> Well could you explain where it would appear over-engineered?
> >>>
> >
> >
> >> Anything that imposes extra requirements on type creators seems undesirable.
> >>
> >
> >
> >> I'm not sure either that the UUID example is a very good one. This whole
> >> problem arose because of performance problems handling large gobs of
> >> data, not just anything that happens to be binary.
> >>
> >
> > Well, we realize that bytea has got a performance problem, but are we so
> > sure that nothing else does? I don't want to stick in a one-purpose
> > wart only to find later that we need a few more warts of the same kind.
> >
> > An example of something else we ought to be considering is binary
> > transmission of float values. The argument in favor of that is not
> > so much performance (although text-and-back conversion is hardly cheap)
> > as it is that the conversion is potentially lossy, since float8out
> > doesn't by default generate enough digits to ensure a unique
> > back-conversion.
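
To make the lossiness point concrete, here is a minimal sketch in plain
Python (not PL code). It assumes the text form carries 15 significant
digits, which is what I understand float8out to produce by default; the
helper name roundtrip is made up for illustration.

```python
# Sketch: round-trip a double through text at a given number of
# significant digits and check whether the original value survives.

def roundtrip(value: float, digits: int) -> float:
    """Format value with `digits` significant digits, then parse it back."""
    text = "%.*g" % (digits, value)
    return float(text)

x = 0.1 + 0.2                 # a double that is not exactly 0.3

lossy = roundtrip(x, 15)      # assumed default text precision
exact = roundtrip(x, 17)      # 17 digits always identify an IEEE double

print("%.15g" % x)            # the 15-digit text form
print(lossy == x, exact == x)
```

With 15 digits the value comes back as a different double; with 17 it
comes back unchanged. Binary transmission sidesteps the question
entirely.
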
> >
> > ISTM there are three reasons for considering non-text-based
> > transmission:
> >
> > 1. Performance, as in the bytea case
> > 2. Avoidance of information loss, as for float
> > 3. Providing a natural/convenient mapping to the PL's internal data types,
> > as we already do --- but incompletely --- for arrays and records
> >
> > It's clear that the details of #3 have to vary across PLs, but I'd
> > like it not to vary capriciously. For instance plperl currently has
> > special treatment for returning perl arrays as SQL arrays, but AFAICS
> > from the manual not for going in the other direction; plpython and
> > pltcl overlook arrays entirely, even though there are natural mappings
> > they could and should be using.

plpy (from http://python.projects.postgresql.org/project/be.html ) goes
to the other extreme and exposes the whole postgresql type system to
the embedded python interpreter.

> > I don't know to what extent we should apply point #3 to situations other
> > than arrays and records, but now is the time to think about it.

If we can avoid copying/converting large(ish) values between postgresql
and the embedded language, we should try to do it. The main problems seem
to be the differences in memory management (alloc/free, palloc,
refcounting/GC) between pg and the embedded languages.

> > An
> > example: working with the geometric types in a PL function is probably
> > going to be pretty painful for lack of simple access to the constituent
> > float values (not to mention the lossiness problem).

Of course we should provide access to subparts of pg types, either by
writing some wrapper class/accessor functions or by providing access
through postgresql's existing functions.
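
As a rough sketch of the wrapper-class idea, the following plain Python
class gives access to the two floats of a point without going through
text. Everything here is hypothetical: the class name Point, and the
assumption that the binary form is two 8-byte doubles in network byte
order. It is not an existing PL/Python API.

```python
import struct


class Point:
    """Hypothetical read-only wrapper over a binary point value.

    Assumed layout: x then y, each an 8-byte IEEE double in network
    (big-endian) byte order.
    """

    _LAYOUT = struct.Struct("!dd")

    def __init__(self, raw: bytes):
        if len(raw) != self._LAYOUT.size:
            raise ValueError("expected %d bytes, got %d"
                             % (self._LAYOUT.size, len(raw)))
        self._raw = raw

    @property
    def x(self) -> float:
        # Unpack on access; no text conversion is involved.
        return self._LAYOUT.unpack(self._raw)[0]

    @property
    def y(self) -> float:
        return self._LAYOUT.unpack(self._raw)[1]


# Usage: build a binary value by hand and read the parts back.
raw = struct.pack("!dd", 0.1 + 0.2, -1.5)
p = Point(raw)
print(p.x == 0.1 + 0.2, p.y)
```

Because the doubles never pass through a text form, the accessors
return exactly the values that were stored.
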

> > We should also be considering some non-core PLs such as PL/Ruby and
> > PL/R; they might provide additional examples to influence our thinking.
> >
>
> OK, we have a lot of work to do here, then.
>
> I can really only speak with any significant knowledge on the perl
> front. Fundamentally, it has 3 types of scalars: IV, NV and PV (integer,
> float, string). IV can accommodate at least the largest integer or
> pointer type on the platform, NV a double, and PV an arbitrary string of
> bytes.

OTOH python has had an extensible type system from the start (i.e.
everything is an object), and thus could be painlessly (just a SMOP)
extended to use postgresql's native types when there is no 1:1 match
with existing python types.

> As for structured types, as I noted elsewhere we have some of the work
> done for plperl. My suggestion would be to complete it for plperl and
> get it fully orthogonal and then retrofit that to plpython/pltcl.
>
> I've actually been worried for some time that the conversion glue was
> probably imposing significant penalties on the non-native PLs, so I'm
> glad to see this getting some attention.
>
>
> cheers
>
> andrew
>