Re: new procedural language - PL/R

Lists: pgsql-hackers
From: Joe Conway <mail(at)joeconway(dot)com>
To: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: new procedural language - PL/R
Date: 2003-02-03 15:51:11
Message-ID: 3E3E8FEF.4080702@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm nearing completion of a new procedural language, PL/R. It provides an
interface to the R Statistical Computing language. R is similar to the
commercial package S-Plus; for more on R see:
http://www.r-project.org/

Here is the first paragraph of their intro:
"R is a language and environment for statistical computing and graphics. It is
a GNU project which is similar to the S language and environment which was
developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by
John Chambers and colleagues. R can be considered as a different
implementation of S. There are some important differences, but much code
written for S runs unaltered under R."

Before I post the source somewhere, I have a few questions:

1) R itself is under GPL, and as far as I can tell the shared library libR.so
is also under GPL, not LGPL. I assume that means I need to release plr under
GPL -- does that sound correct?

2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and
assuming plr is released under GPL, is there any chance that it would be
accepted into src/pl or contrib, or should I start a gborg project (I'd prefer
if it could at least live in contrib)? If I am somehow able to release it
under a BSD license, would that change the answer (if so, I'll at least ask
the r-devel list about LGPL on the shared library)?

3) The only major feature not yet developed is the ability to handle triggers.
Any strong feelings on whether this is necessary for a first release? I see
that pl/perl doesn't handle triggers. It seems like using a plpgsql trigger to
call a plr function is a reasonable workaround.

Thanks for any thoughts or comments.

Joe


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new procedural language - PL/R
Date: 2003-02-03 19:13:58
Message-ID: 3E3EBF76.9060104@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
>>2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and
>>assuming plr is released under GPL, is there any chance that it would be
>>accepted into src/pl or contrib, or should I start a gborg project (I'd prefer
>>if it could at least live in contrib)?
>
> I think we'd have to insist on gborg. The only reason there are any
> non-BSD-license items left in contrib is that I haven't finished my TODO
> item to get their licenses changed or remove 'em.

Thanks for the confirmation. That's what I suspected.

>>If I am somehow able to release it
>>under a BSD license, would that change the answer (if so, I'll at least ask
>>the r-devel list about LGPL on the shared library)?
>
> BSD would be good. I agree that it'll be a pain in the neck to
> maintain a PL that is not in the main tree, so I'd support accepting it
> if we can get the license right.

OK -- I'll see what they have to say about it over on r-devel.

>>3) The only major feature not yet developed is the ability to handle triggers.
>>Any strong feelings on whether this is necessary for a first release?
>
> No. I'm not sure you'd really need triggers written in R ever ;-)

Yeah, that's what I figured too.

Thanks for the feedback!

Joe


From: cbbrowne(at)cbbrowne(dot)com
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new procedural language - PL/R
Date: 2003-02-03 20:56:12
Message-ID: 20030203205612.4EC3C51DC4@cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> >>Any strong feelings on whether this is necessary for a first release?
> >
> > No. I'm not sure you'd really need triggers written in R ever ;-)
>
> Yeah, that's what I figured too.

Indeed. R sounds like it might be an interesting platform from which to
do "data mining," and in that sort of context, you're almost exclusively
reading data, so the loss of triggers would be of no great importance.

What might be "nifty" would be to have some mappings that did Clever
Transformations of Queries Into Views, particularly if that allowed
harnessing the DBMS to do some of the statistical analysis behind your
back...
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://www3.sympatico.ca/cbbrowne/rdbms.html
"After you've heard two eyewitness accounts of an auto accident it
makes you wonder about history." -- Bits and Pieces


From: Joe Conway <mail(at)joeconway(dot)com>
To: cbbrowne(at)cbbrowne(dot)com
Cc: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new procedural language - PL/R
Date: 2003-02-03 21:39:06
Message-ID: 3E3EE17A.4030107@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

cbbrowne(at)cbbrowne(dot)com wrote:
> What might be "nifty" would be to have some mappings that did Clever
> Transformations of Queries Into Views, particularly if that allowed
> harnessing the DBMS to do some of the statistical analysis behind your
> back...

I'm not quite sure what you mean here, but it does support pulling data into
the R interpreter as a "data.frame" via SPI, and returning R
matricies/vectors/data.frames as either Postgres arrays or as rows and columns
of a table function. Here's two contrived, but illustrative, examples:

create or replace function test_dtup() returns record as
'data.frame(letters[1:10],1:10)' language 'plr';
select * from test_dtup() as t(f1 text, f2 int);
f1 | f2
----+----
a | 1
b | 2
c | 3
d | 4
e | 5
f | 6
g | 7
h | 8
i | 9
j | 10
(10 rows)

create or replace function test_spi_tup(text) returns record as
'pg.spi.exec(arg1)' language 'plr';
select * from test_spi_tup('select oid, typname from pg_type where typname =
''oid'' or typname = ''text''') as t(typeid oid, typename name);
typeid | typename
--------+----------
25 | text
26 | oid
(2 rows)

You could easily perform a parameterized query via SPI, retrieve the results
into an R data.frame, do some statistical manipulations, and then return the
results as a table function. The table function itself could be wrapped in a
view to hide the whole thing from the end-user.

You can also create custom aggregates. There has been at least one thread not
too long ago regarding an aggregate to calculate median, for instance. Here it
is in plr:

create table foo(f1 text, f2 float8);
insert into foo values('cat1',1.21);
insert into foo values('cat1',1.24);
insert into foo values('cat1',1.18);
insert into foo values('cat1',1.26);
insert into foo values('cat1',1.15);
insert into foo values('cat2',1.15);
insert into foo values('cat2',1.26);
insert into foo values('cat2',1.32);
insert into foo values('cat2',1.30);
create or replace function r_median(_float8) returns float as 'median(arg1)'
language 'plr';
CREATE AGGREGATE median (sfunc = array_accum, basetype = float8, stype =
_float8, finalfunc = r_median);
select f1, median(f2) from foo group by f1 order by f1;
f1 | median
------+--------
cat1 | 1.21
cat2 | 1.28
(2 rows)

It's not as fast as the native PostgreSQL functions if you just need average
or standard deviation, but it's alot easier and faster than writing your own
for something more out-of-the-ordinary.

Joe


From: Joe Conway <mail(at)joeconway(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new procedural language - PL/R
Date: 2003-02-11 21:32:18
Message-ID: 3E496BE2.2020502@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Joe Conway <mail(at)joeconway(dot)com> writes:
>>2) Knowing the trend to move stuff *out* of the PostgreSQL source tarball, and
>>assuming plr is released under GPL, is there any chance that it would be
>>accepted into src/pl or contrib, or should I start a gborg project (I'd prefer
>>if it could at least live in contrib)?
>
> I think we'd have to insist on gborg. The only reason there are any
> non-BSD-license items left in contrib is that I haven't finished my TODO
> item to get their licenses changed or remove 'em.

[...snip...]

> BSD would be good. I agree that it'll be a pain in the neck to
> maintain a PL that is not in the main tree, so I'd support accepting it
> if we can get the license right.

I finally got a response from one of the core R developers: "libR is
GPL-ed, and that is unlikely to change" -- so I guess gborg it is :-(
(not that I have anything against gborg ;-))

Before making any release announcements, I'd be interested in feedback
if anyone feels so inclined. The source is currently available here:
http://www.joeconway.com/plr/plr.0.1.1.alpha.tar.gz

The documentation, including preprocessed html, is in the tar ball. I've
also posted the html docs here:
http://www.joeconway.com/plr/index.html

From the README (more or less):
-------------------------------
Installation:
Place tar file in 'contrib' in the PostgreSQL source tree and untar.
Then run:

make
make install
make installcheck

You can use plr.sql to create the language and functions in your
database of choice, e.g.

psql mydatabase < plr.sql
-------------------------------

In addition to the documentation, the plr.out file in plr/expected is a
good source of usage examples.

PL/R should build cleanly with PostgreSQL 7.3.x and cvs HEAD. It was
developed using libR from R 1.6.2 under Red Hat 7.3 & 8.0 -- I've not
tested against other versions of R or different OSes.

Please let me know how it goes.

Thanks,

Joe