row literal problem

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: row literal problem
Date: 2012-07-18 18:58:22
Message-ID: 5007074E.7000308@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm chasing up an issue from a client who has this problem (in 9.1):

with q as
(
some query here
)
select q.* from q

yields:

job_scope | checked_col
-----------------------------------------------+------------------------------------------
Co Revenues: Co Revenues $100 to $999 Million | <input panel=data
type=checkbox checked>
Metropolitan Area: Austin-Round Rock | <input panel=data
type=checkbox checked>

which is as expected.

However,

with q as
(
same query here
)
select q from q

yields:

q
-----------------------------------------------------------------------------------------------
("Co Revenues: Co Revenues $100 to $999 Million","<input panel=data
type=checkbox checked>",)
("Metropolitan Area: Austin-Round Rock","<input panel=data
type=checkbox checked>",)

Note the trailing comma inside the (), which certainly looks bogus to
me. If I replace "select q" with "select row(q.*)" it disappears.

It doesn't happen in all cases, and I'm trying to work out a minimal
reproducible example. But it sure is puzzling.

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-18 19:18:27
Message-ID: CAHyXU0y2ixy4B+0vmn03sx5YHh8a_yPwsQZDBfHGH4MgspZ+bA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 18, 2012 at 1:58 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> I'm chasing up an issue from a client who has this problem (in 9.1):
>
> with q as
> (
> some query here
> )
> select q.* from q
>
> yields:
>
> job_scope | checked_col
> -----------------------------------------------+------------------------------------------
> Co Revenues: Co Revenues $100 to $999 Million | <input panel=data
> type=checkbox checked>
> Metropolitan Area: Austin-Round Rock | <input panel=data
> type=checkbox checked>
>
> which is as expected.
>
> However,
>
> with q as
> (
> same query here
> )
> select q from q
>
> yields:
>
> q
> -----------------------------------------------------------------------------------------------
> ("Co Revenues: Co Revenues $100 to $999 Million","<input panel=data
> type=checkbox checked>",)
> ("Metropolitan Area: Austin-Round Rock","<input panel=data type=checkbox
> checked>",)
>
>
> Note the trailing comma inside the (), which certainly looks bogus to me. If
> I replace "select q" with "select row(q.*)" it disappears.
>
> It doesn't happen in all cases, and I'm trying to work out a minimal
> reproducible example. But it sure is puzzling.

there are no null fields, right? if the last field is sometimes null
you'd see that (you probably ruled that out though). when you say
'sometimes', do you mean for some rows and not others? or for some
queries?

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-18 19:27:45
Message-ID: 50070E31.8010006@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 07/18/2012 03:18 PM, Merlin Moncure wrote:
> On Wed, Jul 18, 2012 at 1:58 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> I'm chasing up an issue from a client who has this problem (in 9.1):
>>
>> with q as
>> (
>> some query here
>> )
>> select q.* from q
>>
>> yields:
>>
>> job_scope | checked_col
>> -----------------------------------------------+------------------------------------------
>> Co Revenues: Co Revenues $100 to $999 Million | <input panel=data
>> type=checkbox checked>
>> Metropolitan Area: Austin-Round Rock | <input panel=data
>> type=checkbox checked>
>>
>> which is as expected.
>>
>> However,
>>
>> with q as
>> (
>> same query here
>> )
>> select q from q
>>
>> yields:
>>
>> q
>> -----------------------------------------------------------------------------------------------
>> ("Co Revenues: Co Revenues $100 to $999 Million","<input panel=data
>> type=checkbox checked>",)
>> ("Metropolitan Area: Austin-Round Rock","<input panel=data type=checkbox
>> checked>",)
>>
>>
>> Note the trailing comma inside the (), which certainly looks bogus to me. If
>> I replace "select q" with "select row(q.*)" it disappears.
>>
>> It doesn't happen in all cases, and I'm trying to work out a minimal
>> reproducible example. But it sure is puzzling.
> there are no null fields, right? if the last field is sometimes null
> you'd see that (you probably ruled that out though). when you say
> 'sometimes', do you mean for some rows and not others? or for some
> queries?
>

No, the inner query has two fields.

It happens for all rows, but not for all two-field-resulting queries as
q. I'm trying to find a simple case rather than the rather complex query
my customer is using.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-18 19:30:05
Message-ID: 29564.1342639805@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 07/18/2012 03:18 PM, Merlin Moncure wrote:
>> there are no null fields, right? if the last field is sometimes null
>> you'd see that (you probably ruled that out though). when you say
>> 'sometimes', do you mean for some rows and not others? or for some
>> queries?

> No, the inner query has two fields.

> It happens for all rows, but not for all two-field-resulting queries as
> q. I'm trying to find a simple case rather than the rather complex query
> my customer is using.

I'm wondering about a rowtype with a third, dropped column.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-18 20:42:48
Message-ID: 50071FC8.1060806@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 07/18/2012 03:30 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 07/18/2012 03:18 PM, Merlin Moncure wrote:
>>> there are no null fields, right? if the last field is sometimes null
>>> you'd see that (you probably ruled that out though). when you say
>>> 'sometimes', do you mean for some rows and not others? or for some
>>> queries?
>> No, the inner query has two fields.
>> It happens for all rows, but not for all two-field-resulting queries as
>> q. I'm trying to find a simple case rather than the rather complex query
>> my customer is using.
> I'm wondering about a rowtype with a third, dropped column.

As usual Tom has hit the nail on the head. Here's a simple test case
that demonstrates the problem. I could probably have cut it down more
but I was following the structure of the original somewhat:

# with q as
(
select max(nspname) as nspname, sum(allind.count) as indices
from (select indrelid, count(*)
from pg_index
group by indrelid) allind
left outer join pg_class on pg_class.oid = allind.indrelid
left outer join pg_namespace on pg_class.relnamespace =
pg_namespace.oid
group by pg_namespace.oid
)
select q from q;

q
--------------------
(pg_catalog,91,11)
(pg_toast,18,99)
(2 rows)

cheers

andrew

>
> regards, tom lane
>


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-18 20:56:06
Message-ID: CAHyXU0wMF=CLaxqdJFiSkFX-Y=5m14Oar4d0_pRTT_O=1=w7rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 18, 2012 at 3:42 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 07/18/2012 03:30 PM, Tom Lane wrote:
>>
>> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>>>
>>> On 07/18/2012 03:18 PM, Merlin Moncure wrote:
>>>>
>>>> there are no null fields, right? if the last field is sometimes null
>>>> you'd see that (you probably ruled that out though). when you say
>>>> 'sometimes', do you mean for some rows and not others? or for some
>>> queries?
>>>
>>> No, the inner query has two fields.
>>> It happens for all rows, but not for all two-field-resulting queries as
>>> q. I'm trying to find a simple case rather than the rather complex query
>>> my customer is using.
>>
>> I'm wondering about a rowtype with a third, dropped column.
>
>
>
> As usual Tom has hit the nail on the head. Here's a simple test case that
> demonstrates the problem. I could probably have cut it down more but I was
> following the structure of the original somewhat:
>
> # with q as
> (
> select max(nspname) as nspname, sum(allind.count) as indices
> from (select indrelid, count(*)
> from pg_index
> group by indrelid) allind
> left outer join pg_class on pg_class.oid = allind.indrelid
> left outer join pg_namespace on pg_class.relnamespace =
> pg_namespace.oid
> group by pg_namespace.oid
> )
> select q from q;
>
> q
> --------------------
> (pg_catalog,91,11)
> (pg_toast,18,99)
> (2 rows)

hm, it's the 'group by' -- for example if you add group by
pg_namespace.oid, group by pg_namespace.oid || 'abc', you can invent
columns that come back into the rowtype.

merlin


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-18 20:58:26
Message-ID: CAHyXU0ywPPKEcppvJt+WWRn_rkfm6Wz3r4id-+snvGURRhbfFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 18, 2012 at 3:56 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> hm, it's the 'group by' -- for example if you add group by
> pg_namespace.oid, group by pg_namespace.oid || 'abc', you can invent
> columns that come back into the rowtype.

here's a cut down example:
with q as (select max(v) from (select 1 as v) q group by v) select q from q;

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-18 21:31:06
Message-ID: 2603.1342647066@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> hm, it's the 'group by' -- for example if you add group by
> pg_namespace.oid, group by pg_namespace.oid || 'abc', you can invent
> columns that come back into the rowtype.

Yeah, the whole-row variable is evidently picking up "resjunk" columns
from the inner query. Haven't looked to see exactly where.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-20 02:35:42
Message-ID: 21986.1342751742@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> here's a cut down example:
> with q as (select max(v) from (select 1 as v) q group by v) select q from q;

I traced through this example, and find that it's another issue in an
area we've hacked at before. ExecEvalVar, when it finds that it's
dealing with a whole-row Var of type RECORD, just assumes that the tuple
descriptor of the source TupleTableSlot is the correct descriptor for
the whole-row result. Now, in the case where the whole-row Var has a
named composite type, we have found that we have to go to quite some
lengths to deal with possible discrepancies in the desired and actual
rowtypes; in particular notice this comment:

* ... Also, we have to allow the case that the slot has
* more columns than the Var's type, because we might be looking
* at the output of a subplan that includes resjunk columns. (XXX
* it would be nice to verify that the extra columns are all
* marked resjunk, but we haven't got access to the subplan
* targetlist here...)

I think the way to solve this is to do whatever it takes to get access
to the subplan targetlist. We could then do something a bit cleaner
than what the named-rowtype code is currently doing: if there are
resjunk columns in the subplan targetlist, use the tlist to create a
JunkFilter, and then pass the tuples through that. After that we can
insist that the tuples don't have any extra columns.

Getting hold of that tlist is going to be a bit messy, though. I think
what we can do is create a special ExprState variant for whole-row Vars
(which we'd need anyway to hold the JunkFilter), and have ExecInitExpr
store the "parent" PlanState pointer into it. Then at the first call
of ExecEvalVar, dig down to the appropriate tlist depending on what
type of PlanState we find ourselves running in. This shouldn't be
too painful because a whole-row Var can only appear in a simple scan
node, not an upper-level plan node, so there are not as many cases
to deal with as you might think.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-20 06:31:33
Message-ID: 8365.1342765893@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I think the way to solve this is to do whatever it takes to get access
> to the subplan targetlist. We could then do something a bit cleaner
> than what the named-rowtype code is currently doing: if there are
> resjunk columns in the subplan targetlist, use the tlist to create a
> JunkFilter, and then pass the tuples through that. After that we can
> insist that the tuples don't have any extra columns.

Here's a draft patch for that. It wasn't quite as ugly as I feared.
A lot of the apparent bulk of the patch is because I chose to split
ExecEvalVar into separate functions for the scalar and whole-row
cases, which seemed appropriate because they now get different
ExprState node types.

regards, tom lane

Attachment Content-Type Size
whole-row-junk-cols.patch text/x-patch 27.9 KB

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-20 14:21:32
Message-ID: CAHyXU0zmJ6N6AGsZrzXcrHprrOgEoUF8GFamne-MZYCPRXFhqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 20, 2012 at 1:31 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> I think the way to solve this is to do whatever it takes to get access
>> to the subplan targetlist. We could then do something a bit cleaner
>> than what the named-rowtype code is currently doing: if there are
>> resjunk columns in the subplan targetlist, use the tlist to create a
>> JunkFilter, and then pass the tuples through that. After that we can
>> insist that the tuples don't have any extra columns.
>
> Here's a draft patch for that. It wasn't quite as ugly as I feared.
> A lot of the apparent bulk of the patch is because I chose to split
> ExecEvalVar into separate functions for the scalar and whole-row
> cases, which seemed appropriate because they now get different
> ExprState node types.

Thanks for that! Applying the patch and confirming the fix turned up
no issues. I did a perfunctory review and it all looks pretty good:
maybe ExecInitExpr could use a comment describing the
InvalidAttrNumber check though...it's somewhat common knowledge that
InvalidAttrNumber means row variables but it's also used to initialize
variables before loops scans and things like that.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row literal problem
Date: 2012-07-20 17:13:21
Message-ID: 26635.1342804401@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Fri, Jul 20, 2012 at 1:31 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Here's a draft patch for that. It wasn't quite as ugly as I feared.
>> A lot of the apparent bulk of the patch is because I chose to split
>> ExecEvalVar into separate functions for the scalar and whole-row
>> cases, which seemed appropriate because they now get different
>> ExprState node types.

> Thanks for that! Applying the patch and confirming the fix turned up
> no issues. I did a perfunctory review and it all looks pretty good:
> maybe ExecInitExpr could use a comment describing the
> InvalidAttrNumber check though...it's somewhat common knowledge that
> InvalidAttrNumber means row variables but it's also used to initialize
> variables before loops scans and things like that.

Thanks for testing. I added the suggested comment and made some other
cosmetic improvements, and have committed this.

regards, tom lane