Re: Array of composite types returned from python

Lists: pgsql-hackers
From: Sim Zacks <sim(at)compulab(dot)co(dot)il>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Array of composite types returned from python
Date: 2014-06-28 19:41:23
Message-ID: 1440287423.479.1403984483224.JavaMail.root@compulab.co.il

Am I missing anything (i.e., a memory leak or undesirable behavior elsewhere)?

-Ed

I applied the patch and it looks like it is working well. As a longtime plpython user, I appreciate the fix.

I have a few comments:
1) I would remove the error message from the PO files as well.

2) You removed the comment:
- /*
- * We don't support arrays of row types yet, so the first argument
- * can be NULL.
- */

But didn't change the code there.
I haven't delved deep enough into the code yet to understand the full meaning, but the comment would indicate that if arrays of row types are supported, the first argument cannot be null.

3) This is such a simple change with no new infrastructure code (PLyObject_ToComposite already exists). Can you think of a reason why this wasn't done until now? Was it a simple miss or purposefully excluded?


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Sim Zacks <sim(at)compulab(dot)co(dot)il>
Cc: ebehn(at)arinc(dot)com, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Array of composite types returned from python
Date: 2014-06-29 12:38:53
Message-ID: 20140629123853.GA650@toroid.org

Hi.

When this patch was first added to a CF, I had a quick look at it, but
left it for a proper review by someone more familiar with PL/Python
internals for precisely this reason:

> 2) You removed the comment:
> - /*
> - * We don't support arrays of row types yet, so the first argument
> - * can be NULL.
> - */
>
> But didn't change the code there.
> I haven't delved deep enough into the code yet to understand the full
> meaning, but the comment would indicate that if arrays of row types
> are supported, the first argument cannot be null.

I had another look now, and I think removing the comment is fine. It
actually made no sense to me in context, so I went digging a little.

After following a plpython.c → plpy_*.c refactoring (#147c2482) and a
pgindent run (#65e806cb), I found that the comment was added along with
the code by this commit:

commit db7386187f78dfc45b86b6f4f382f6b12cdbc693
Author: Peter Eisentraut <peter_e(at)gmx(dot)net>
Date: Thu Dec 10 20:43:40 2009 +0000

    PL/Python array support

    Support arrays as parameters and return values of PL/Python functions.

At the time, the code looked like this:

+    else
+    {
+        nulls[i] = false;
+        /* We don't support arrays of row types yet, so the first
+         * argument can be NULL. */
+        elems[i] = arg->elm->func(NULL, arg->elm, obj);
+    }

Note that the first argument was actually NULL, so the comment made
sense when it was written. But the code was subsequently changed to
pass in arg->elm by the following commit:

commit 09130e5867d49c72ef0f11bef30c5385d83bf194
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Mon Oct 11 22:16:40 2010 -0400

    Fix plpython so that it again honors typmod while assigning to tuple fields.

    This was broken in 9.0 while improving plpython's conversion behavior for
    bytea and boolean. Per bug report from maizi.

The comment should have been removed at the same time. So I don't think
there's a problem here.

> 3) This is such a simple change with no new infrastructure code
> (PLyObject_ToComposite already exists). Can you think of a reason
> why this wasn't done until now? Was it a simple miss or purposefully
> excluded?

This is not an authoritative answer: I think the infrastructure was
originally missing, but was later added in #bc411f25 for OUT parameters.
Perhaps it was overlooked at the time that the same code would suffice
for this earlier-missing case. (I've Cc:ed Peter E. in case he has any
comments.)

I think the patch is ready for committer.

-- Abhijit

P.S. I'm a wee bit confused by this mail I'm replying to, because it's
signed "Ed" and looks like a response, but it's "From: Sim Zacks". I've
added the original author's address to the Cc: in case I misunderstood
something.


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Sim Zacks <sim(at)compulab(dot)co(dot)il>
Cc: ebehn(at)arinc(dot)com, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>, Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>
Subject: Re: Array of composite types returned from python
Date: 2014-06-29 12:43:14
Message-ID: 20140629124314.GS31357@toroid.org

At 2014-06-29 18:08:53 +0530, ams(at)2ndQuadrant(dot)com wrote:
>
> I think the patch is ready for committer.

That's based on my earlier quick look and the current archaeology. But
I'm not a PL/Python user, and Ronan signed up to review the patch, so I
haven't changed the status.

Ronan, did you get a chance to look at it?

-- Abhijit


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Sim Zacks <sim(at)compulab(dot)co(dot)il>, ebehn(at)arinc(dot)com, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Array of composite types returned from python
Date: 2014-06-29 14:06:39
Message-ID: 7739.1404050799@sss.pgh.pa.us

Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> writes:
> I had another look now, and I think removing the comment is fine. It
> actually made no sense to me in context, so I went digging a little.
> ...
> Note that the first argument was actually NULL, so the comment made
> sense when it was written. But the code was subsequently changed to
> pass in arg->elm by the following commit:

> commit 09130e5867d49c72ef0f11bef30c5385d83bf194
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date: Mon Oct 11 22:16:40 2010 -0400

> Fix plpython so that it again honors typmod while assigning to tuple fields.

> This was broken in 9.0 while improving plpython's conversion behavior for
> bytea and boolean. Per bug report from maizi.

> The comment should have been removed at the same time. So I don't think
> there's a problem here.

Yeah, you're right: the comment is referring to the struct PLyTypeInfo *
argument, which isn't there at all anymore. Mea culpa --- that's the same
sort of failure-to-update-nearby-comments thinko that I regularly mutter
about other people making :-(

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Sim Zacks <sim(at)compulab(dot)co(dot)il>, ebehn(at)arinc(dot)com, pgsql-hackers(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Array of composite types returned from python
Date: 2014-06-29 20:54:03
Message-ID: 4575.1404075243@sss.pgh.pa.us

Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> writes:
>> 3) This is such a simple change with no new infrastructure code
>> (PLyObject_ToComposite already exists). Can you think of a reason
>> why this wasn't done until now? Was it a simple miss or purposefully
>> excluded?

> This is not an authoritative answer: I think the infrastructure was
> originally missing, but was later added in #bc411f25 for OUT parameters.
> Perhaps it was overlooked at the time that the same code would suffice
> for this earlier-missing case. (I've Cc:ed Peter E. in case he has any
> comments.)

> I think the patch is ready for committer.

I took a quick look at this; not really a review either, but I have
a couple comments.

1. While I think the patch does what it intends to, it's a bit distressing
that it will invoke the information lookups in PLyObject_ToComposite over
again for *each element* of the array. We probably ought to quantify that
overhead to see if it's bad enough that we need to do something about
improving caching, as speculated in the comment in PLyObject_ToComposite.

2. I wonder whether the no-composites restriction in plpy.prepare
(see lines 133ff in plpy_spi.c) could be removed as easily.

regards, tom lane


From: "Behn, Edward (EBEHN)" <EBEHN(at)arinc(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Sim Zacks <sim(at)compulab(dot)co(dot)il>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Array of composite types returned from python
Date: 2014-06-30 15:37:20
Message-ID: 93F16B4BD93A7840AC75EB16E9494C7B153C0C83@EXANPMB2.arinc.com

Just writing to check in.

I haven't done anything to look into allowing arrays of composites as input
to PL/Python functions. I made the submitted modification for a specific
project that I'm working on that involves Python code that returns data
structures.

I also have no idea about a more efficient way to convert composite
elements.
-Ed


From: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Sim Zacks <sim(at)compulab(dot)co(dot)il>, ebehn(at)arinc(dot)com, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Array of composite types returned from python
Date: 2014-07-01 09:47:03
Message-ID: 1781180.dBSBzifV3P@ronan.dunklau.fr

On Sunday, 29 June 2014 16:54:03, Tom Lane wrote:
> Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> writes:
> >> 3) This is such a simple change with no new infrastructure code
> >> (PLyObject_ToComposite already exists). Can you think of a reason
> >> why this wasn't done until now? Was it a simple miss or purposefully
> >> excluded?
> >
> > This is not an authoritative answer: I think the infrastructure was
> > originally missing, but was later added in #bc411f25 for OUT parameters.
> > Perhaps it was overlooked at the time that the same code would suffice
> > for this earlier-missing case. (I've Cc:ed Peter E. in case he has any
> > comments.)
> >
> > I think the patch is ready for committer.

Sorry for being this late.

I've tested the patch, and everything seems to work as expected, including
complex nesting of composite and array types.
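
For a concrete picture of the kind of nesting meant here, below is a minimal
sketch of a function body that returns a list of composite values, each of
which contains an array of another composite. The type names shape and
point2d, their fields, and the function name make_shapes are hypothetical;
they are not taken from the patch or its regression tests. In PL/Python the
body would sit inside a CREATE FUNCTION definition, which appears here only
as a comment. It relies on composite values being returned as mappings keyed
by field name, which is one of the forms PL/Python documents.

```python
# Hypothetical SQL context (assumed for illustration, not from the patch):
#   CREATE TYPE point2d AS (x integer, y integer);
#   CREATE TYPE shape AS (name text, vertices point2d[]);
#   CREATE FUNCTION make_shapes() RETURNS shape[]
#       AS $$ ...body below... $$ LANGUAGE plpythonu;


def make_shapes():
    """Return a list of mappings, one per element of the shape[] result."""
    triangle = {
        "name": "triangle",
        # Inner array of composites: a list of mappings keyed by field name.
        "vertices": [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 0, "y": 1}],
    }
    segment = {
        "name": "segment",
        "vertices": [{"x": 0, "y": 0}, {"x": 2, "y": 2}],
    }
    return [triangle, segment]


shapes = make_shapes()
```

Before the patch, a result of this shape would presumably have been rejected
because arrays of row types were not supported as return values.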

No documentation changes are needed, since the limitation wasn't even
mentioned before.

Regression tests are ok, and the patch seems simple enough. Formatting looks
OK too.

>
> I took a quick look at this; not really a review either, but I have
> a couple comments.
>
> 1. While I think the patch does what it intends to, it's a bit distressing
> that it will invoke the information lookups in PLyObject_ToComposite over
> again for *each element* of the array. We probably ought to quantify that
> overhead to see if it's bad enough that we need to do something about
> improving caching, as speculated in the comment in PLyObject_ToComposite.

I don't know how to do that without implementing the cache itself.

>
> 2. I wonder whether the no-composites restriction in plpy.prepare
> (see lines 133ff in plpy_spi.c) could be removed as easily.

Hmm, I tried that, but it's not that easy: lifting the restriction results in a
segfault when trying to pfree the parameters given to SPI_ExecutePlan (line
320 in plpy_spi.c).

Correct me if I'm wrong, but I think the problem is that HeapTupleGetDatum
returns the t_data field, whereas heap_form_tuple allocation returns the
address of the HeapTuple itself. Then, the datum itself has not been palloced.

Replacing the HeapTupleGetDatum call with heap_copy_tuple_as_datum fixes this
issue, but I'm not sure this is the best way to do it.

The attached patch implements this.
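
The failure mode described above can be modelled in a few lines. This is a
toy sketch in Python, not PostgreSQL code: ToyAllocator, form_tuple,
copy_as_datum, try_free, and the value of HEAPTUPLESIZE are all invented for
illustration, and the real pfree does not raise a tidy error when handed a
pointer that is not the start of an allocated chunk. The sketch only shows
why a tuple body that lives inside a larger allocation cannot be freed on
its own, while a separately allocated copy can.

```python
HEAPTUPLESIZE = 24  # assumed size of the tuple wrapper, for illustration only


class ToyAllocator:
    """Tracks chunk start addresses, as a very simplified palloc/pfree."""

    def __init__(self):
        self._next = 1000
        self._chunks = {}  # start address -> size

    def palloc(self, size):
        addr = self._next
        self._chunks[addr] = size
        self._next += size
        return addr

    def pfree(self, addr):
        # Only addresses returned by palloc are valid arguments.
        if addr not in self._chunks:
            raise ValueError(f"pfree of non-chunk address {addr}")
        del self._chunks[addr]


def form_tuple(alloc, data_len):
    """Model of forming a tuple: one chunk holds the wrapper plus the body."""
    tuple_addr = alloc.palloc(HEAPTUPLESIZE + data_len)
    t_data = tuple_addr + HEAPTUPLESIZE  # interior pointer, not a chunk start
    return tuple_addr, t_data


def copy_as_datum(alloc, data_len):
    """Model of copying the tuple body into a chunk of its own."""
    return alloc.palloc(data_len)


def try_free(alloc, addr):
    """Return True if the address could be freed, False otherwise."""
    try:
        alloc.pfree(addr)
        return True
    except ValueError:
        return False


alloc = ToyAllocator()
tuple_addr, t_data = form_tuple(alloc, 64)

interior_free_ok = try_free(alloc, t_data)      # body pointer: rejected
copied_free_ok = try_free(alloc, copy_as_datum(alloc, 64))  # own chunk: fine
whole_free_ok = try_free(alloc, tuple_addr)     # the original chunk start
```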

>
> regards, tom lane

--
Ronan Dunklau
http://dalibo.com - http://dalibo.org

Attachment                         Content-Type   Size
PLPythonCompositeArrays_v2.patch   text/x-patch   7.5 KB

From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sim Zacks <sim(at)compulab(dot)co(dot)il>, ebehn(at)arinc(dot)com, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Array of composite types returned from python
Date: 2014-07-01 19:48:39
Message-ID: 20140701194839.GB17428@toroid.org

Hi Ronan.

Based on your review, I'm marking this as ready for committer.

> The attached patch implements this.

Your patch looks sensible enough (thanks for adding tests), but I guess
we'll let the reviewer sort out whether to commit the original or your
extended version.

Thanks.

-- Abhijit


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Sim Zacks <sim(at)compulab(dot)co(dot)il>, ebehn(at)arinc(dot)com, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Array of composite types returned from python
Date: 2014-07-02 05:55:14
Message-ID: 21954.1404280514@sss.pgh.pa.us

Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com> writes:
> On Sunday, 29 June 2014 16:54:03, Tom Lane wrote:
>> 1. While I think the patch does what it intends to, it's a bit distressing
>> that it will invoke the information lookups in PLyObject_ToComposite over
>> again for *each element* of the array. We probably ought to quantify that
>> overhead to see if it's bad enough that we need to do something about
>> improving caching, as speculated in the comment in PLyObject_ToComposite.

> I don't know how to do that without implementing the cache itself.

I don't either, but my thought was that we could hack up a simple
one-element cache pretty trivially, eg static info and desc variables
in PLyObject_ToComposite that are initialized the first time through.
You could only test one composite-array type per session with that
sort of kluge, but that would be good enough for doing some simple
performance testing.
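
As a rough illustration of that idea (a Python model, not the C code in
plpython): the sketch below compares a conversion routine that repeats its
type lookup for every array element with one that keeps the result in a
single module-level variable filled in on first use. The names
lookup_type_info, to_composite_uncached, to_composite_cached, and the
"pair" type are hypothetical stand-ins. As with the kluge described above,
the cached variant is only valid for one composite type per session.

```python
lookup_calls = 0  # counts how often the "expensive" lookup runs


def lookup_type_info(type_name):
    """Stand-in for the per-type information lookups."""
    global lookup_calls
    lookup_calls += 1
    # Hypothetical catalog: type name -> ordered field names.
    catalog = {"pair": ("name", "value")}
    return catalog[type_name]


def to_composite_uncached(type_name, obj):
    """Look the type up again on every call, once per array element."""
    fields = lookup_type_info(type_name)
    return tuple(obj[f] for f in fields)


_cached_fields = None  # plays the role of the static variables


def to_composite_cached(type_name, obj):
    """One-element cache: filled the first time through, never refreshed."""
    global _cached_fields
    if _cached_fields is None:
        _cached_fields = lookup_type_info(type_name)
    return tuple(obj[f] for f in _cached_fields)


def convert_array(convert, type_name, objs):
    """Convert each element of the array with the given routine."""
    return [convert(type_name, o) for o in objs]


rows = [{"name": str(i), "value": i} for i in range(1000)]

lookup_calls = 0
uncached = convert_array(to_composite_uncached, "pair", rows)
uncached_lookups = lookup_calls

lookup_calls = 0
cached = convert_array(to_composite_cached, "pair", rows)
cached_lookups = lookup_calls
```

Both variants produce the same converted rows; the difference is one lookup
per element in the first case and a single lookup in the second.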

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Sim Zacks <sim(at)compulab(dot)co(dot)il>, ebehn(at)arinc(dot)com, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Array of composite types returned from python
Date: 2014-07-03 20:43:21
Message-ID: 1774.1404420201@sss.pgh.pa.us

I wrote:
> Ronan Dunklau <ronan(dot)dunklau(at)dalibo(dot)com> writes:
>> I don't know how to do that without implementing the cache itself.

> I don't either, but my thought was that we could hack up a simple
> one-element cache pretty trivially, eg static info and desc variables
> in PLyObject_ToComposite that are initialized the first time through.
> You could only test one composite-array type per session with that
> sort of kluge, but that would be good enough for doing some simple
> performance testing.

I did that, and found that building and returning a million-element
composite array took about 4.2 seconds without any optimization, and 3.2
seconds with the hacked-up cache (as of HEAD, asserts off). I'd say that
means we might want to do something about it eventually, but it's hardly
the first order of business.
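
To put those two figures in per-element terms, the arithmetic can be
sketched as follows. The variable names are invented here; the inputs are
just the numbers quoted above, and the result is a back-of-the-envelope
estimate rather than a new measurement.

```python
elements = 1_000_000   # array size used in the test above
uncached_s = 4.2       # seconds without any optimization
cached_s = 3.2         # seconds with the hacked-up one-element cache

saved_s = uncached_s - cached_s
per_element_us = saved_s / elements * 1e6   # microseconds saved per element
fraction_saved = saved_s / uncached_s       # share of the total runtime

print(f"{per_element_us:.2f} us per element, {fraction_saved:.0%} of total")
# prints "1.00 us per element, 24% of total"
```

That is, the repeated lookups account for roughly a microsecond per element,
or about a quarter of the total time in this one test.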

I've committed the patch with a bit of additional cleanup. I credited
Ronan and Ed equally as authors, since I'd say the fix for plpy.prepare
was at least as complex as the original patch.

regards, tom lane