Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them

Lists: pgsql-bugs
From: webmaster(at)dhs-club(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
Date: 2012-10-15 16:01:21
Message-ID: E1TNn6X-0003vO-1x@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 7604
Logged by: Bill MacArthur
Email address: webmaster(at)dhs-club(dot)com
PostgreSQL version: 9.2.1
Operating system: CentOS 5.8
Description:

vip_declines_mailers_base is a VIEW that itself uses another VIEW alongside
several other joined tables. nop_seed is a 1 column table that contains 1
date as a reference. members_cancel_pending is a VIEW.

CREATE OR REPLACE VIEW vip_decline_mailers_base AS
SELECT m.id, m.alias, m.firstname, m.lastname, m.emailaddress,
CASE
WHEN s.void = false THEN s.end_date
ELSE (s.end_date - '1 mon'::interval)::date
END AS paid_thru,
mop.payment_method, m.mail_option_lvl, now()::date AS "current_date"
FROM nop_seed,
subscriptions s
JOIN mop ON mop.id = s.member_id
JOIN members_cancel_pending m ON m.id = s.member_id AND
m.membertype::text = 'v'::text
JOIN subscription_types st ON s.subscription_type = st.subscription_type
WHERE (s.end_date < nop_seed.paid_thru OR s.void = true) AND
st.sub_class::text = 'VM'::text;

Then executing a query against vip_declines_mailers_base with no
constraining arguments, the complete result set counts as this:
network=# select count(*) from vip_decline_mailers_base;
count
-------
358

vip_declines_mailers_base is another VIEW that merely adds some criteria to
limit the result set of the 'base' VIEW. In versions 9.0 and back it did
just that. After upgrading to 9.2, the criteria actually expand the result
set.
CREATE OR REPLACE VIEW vip_mailer_unpaid_current AS
SELECT vip_decline_mailers_base.id, vip_decline_mailers_base.alias,
vip_decline_mailers_base.firstname, vip_decline_mailers_base.lastname,
vip_decline_mailers_base.emailaddress,
vip_decline_mailers_base.paid_thru,
vip_decline_mailers_base.payment_method,
vip_decline_mailers_base.mail_option_lvl,
vip_decline_mailers_base."current_date",
current_month_text(now()) AS current_month_text
FROM vip_decline_mailers_base
WHERE vip_decline_mailers_base.mail_option_lvl > 0 AND
vip_decline_mailers_base.paid_thru >= first_of_another_month((now()::date -
'1 mon'::interval)::date) AND vip_decline_mailers_base.paid_thru <=
(first_of_month() - 1);

network=# select count(*) from vip_mailer_unpaid_current;
count
-------
391

How can this be? What's worse, is that adding the criteria somehow mangles
the inner workings of the 'base' VIEW and causes it to return results where
the membertype does not even match the join criteria which should be 'v'
only.

I could create a self contained test case, but the number of tables and
scrubbing the data could be tedious. Perhaps there is enough here to help
pinpoint a trouble spot. I should restate, also, that these VIEWS have been
working fine with 9.0 and earlier versions.

FWIW, here are the EXPLAINs on the two queries.

network=# explain select count(*) from vip_decline_mailers_base;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------
Aggregate (cost=82439.33..82439.34 rows=1 width=0)
-> Hash Right Join (cost=78647.01..82439.21 rows=47 width=0)
Hash Cond: (c.id = m.id)
Filter: ((CASE WHEN (c.id IS NULL) THEN m.membertype WHEN
((c.status = 'tr'::text) OR (c.status = 'p'::text) OR (c.status =
'cd'::text) OR (c.status = 'sp'::text) OR (c.status = 'sa'::text)) THEN
m.membertype ELSE 'c'::character
varying END)::text = 'v'::text)
-> Seq Scan on cancellations c (cost=0.00..3325.35 rows=119735
width=6)
-> Hash (cost=78529.91..78529.91 rows=9368 width=6)
-> Hash Join (cost=6128.57..78529.91 rows=9368 width=6)
Hash Cond: (m.id = mop.id)
-> Seq Scan on members m (cost=0.00..66417.39
rows=1570739 width=6)
-> Hash (cost=6011.47..6011.47 rows=9368 width=8)
-> Hash Join (cost=3129.39..6011.47 rows=9368
width=8)
Hash Cond: (mop.id = s.member_id)
-> Seq Scan on mop (cost=0.00..2158.67
rows=71967 width=4)
-> Hash (cost=3012.28..3012.28 rows=9369
width=4)
-> Nested Loop (cost=2.67..3012.28
rows=9369 width=4)
Join Filter: ((s.end_date <
nop_seed.paid_thru) OR s.void)
-> Seq Scan on nop_seed
(cost=0.00..1.01 rows=1 width=4)
-> Hash Join
(cost=2.67..2723.56 rows=23017 width=9)
Hash Cond:
(s.subscription_type = st.subscription_type)
-> Seq Scan on
subscriptions s (cost=0.00..2188.61 rows=80561 width=11)
-> Hash
(cost=2.52..2.52 rows=12 width=2)
-> Seq Scan on
subscription_types st (cost=0.00..2.52 rows=12 width=2)
Filter:
((sub_class)::text = 'VM'::text)
(23 rows)

network=# explain select count(*) from vip_mailer_unpaid_current;


QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------
Aggregate (cost=25954.57..25954.58 rows=1 width=0)
-> Nested Loop (cost=0.00..25954.57 rows=1 width=0)
-> Nested Loop Left Join (cost=0.00..25948.57 rows=1 width=8)
-> Nested Loop (cost=0.00..25741.69 rows=34 width=10)
-> Nested Loop (cost=0.00..25186.62 rows=47 width=4)
Join Filter: (s.subscription_type =
st.subscription_type)
-> Nested Loop (cost=0.00..25154.54 rows=164
width=6)
Join Filter: ((s.end_date <
nop_seed.paid_thru) OR s.void)
-> Seq Scan on nop_seed (cost=0.00..1.01
rows=1 width=4)
-> Seq Scan on subscriptions s
(cost=0.00..25148.50 rows=403 width=11)
Filter: ((CASE WHEN (NOT void) THEN
end_date ELSE ((end_date - '1 mon'::interval))::date END <=
((date_trunc('month'::text, now()))::date - 1)) AND (CASE WHEN (NOT void)
THEN end_date ELSE ((end_date
- '1 mon'::interval))::date END >= first_of_another_month((((now())::date -
'1 mon'::interval))::date)))
-> Materialize (cost=0.00..2.58 rows=12
width=2)
-> Seq Scan on subscription_types st
(cost=0.00..2.52 rows=12 width=2)
Filter: ((sub_class)::text =
'VM'::text)
-> Index Scan using members_pkey on members m
(cost=0.00..11.80 rows=1 width=6)
Index Cond: (id = s.member_id)
Filter: (mail_option_lvl > 0)
-> Index Scan using cancellations_id_key on cancellations c
(cost=0.00..6.07 rows=1 width=6)
Index Cond: (id = m.id)
Filter: ((CASE WHEN (id IS NULL) THEN m.membertype WHEN
((status = 'tr'::text) OR (status = 'p'::text) OR (status = 'cd'::text) OR
(status = 'sp'::text) OR (status = 'sa'::text)) THEN m.membertype ELSE
'c'::character
varying END)::text = 'v'::text)
-> Index Only Scan using mop_pkey on mop (cost=0.00..5.99 rows=1
width=4)
Index Cond: (id = m.id)
(22 rows)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: webmaster(at)dhs-club(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
Date: 2012-10-15 16:18:16
Message-ID: 13773.1350317896@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

webmaster(at)dhs-club(dot)com writes:
> vip_declines_mailers_base is another VIEW that merely adds some criteria to
> limit the result set of the 'base' VIEW. In versions 9.0 and back it did
> just that. After upgrading to 9.2, the criteria actually expand the result
> set.

This doesn't seem to match any of the known bugs in 9.2, so I'm afraid
there's no help for it: you need to create a self-contained test case.

regards, tom lane


From: Bill MacArthur <webmaster(at)dhs-club(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
Date: 2012-10-15 18:20:35
Message-ID: 507C53F3.70606@dhs-club.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 10/15/2012 12:18 PM, Tom Lane wrote:
> webmaster(at)dhs-club(dot)com writes:
>> vip_declines_mailers_base is another VIEW that merely adds some criteria to
>> limit the result set of the 'base' VIEW. In versions 9.0 and back it did
>> just that. After upgrading to 9.2, the criteria actually expand the result
>> set.
>
> This doesn't seem to match any of the known bugs in 9.2, so I'm afraid
> there's no help for it: you need to create a self-contained test case.
>
> regards, tom lane
>

Tom, in preparation for a test case I created a new schema (testcase) and copied 6 tables to that, including only the columns significant to enable the VIEWs to be created. I took the 3 VIEWs involved and tweaked them into the new schema (just renamed to testcase.viewname and referencing testcase.relation). However, when run from in there, the results are as expected rather than erroneous. The live data and VIEWs still produce erroneous results. Any clues??


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bill MacArthur <webmaster(at)dhs-club(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
Date: 2012-10-15 18:29:43
Message-ID: 29762.1350325783@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Bill MacArthur <webmaster(at)dhs-club(dot)com> writes:
> Tom, in preparation for a test case I created a new schema (testcase) and copied 6 tables to that, including only the columns significant to enable the VIEWs to be created. I took the 3 VIEWs involved and tweaked them into the new schema (just renamed to testcase.viewname and referencing testcase.relation). However, when run from in there, the results are as expected rather than erroneous. The live data and VIEWs still produce erroneous results. Any clues??

Is the query plan the same according to EXPLAIN?

If not, you may have forgotten to vacuum/analyze the new tables, or
forgotten some relevant index. Or it might be that the total table size
is affecting the plan choice, in which case you need dummy data in the
"irrelevant" columns rather than removing them altogether.

regards, tom lane


From: Bill MacArthur <webmaster(at)dhs-club(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
Date: 2012-10-15 18:48:17
Message-ID: 507C5A71.50807@dhs-club.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 10/15/2012 2:29 PM, Tom Lane wrote:
> Bill MacArthur <webmaster(at)dhs-club(dot)com> writes:
>> Tom, in preparation for a test case I created a new schema (testcase) and copied 6 tables to that, including only the columns significant to enable the VIEWs to be created. I took the 3 VIEWs involved and tweaked them into the new schema (just renamed to testcase.viewname and referencing testcase.relation). However, when run from in there, the results are as expected rather than erroneous. The live data and VIEWs still produce erroneous results. Any clues??
>
> Is the query plan the same according to EXPLAIN?
>
> If not, you may have forgotten to vacuum/analyze the new tables, or
> forgotten some relevant index. Or it might be that the total table size
> is affecting the plan choice, in which case you need dummy data in the
> "irrelevant" columns rather than removing them altogether.
>
> regards, tom lane
>
Update, I started placing primary keys on the testcase tables and watched the planner output. Once I put a PK on one of the tables in particular, the planner revised the plan to use the PK. At that point, the results become erroneous as the planner also moves another filter evaluation to an earlier point at which time I don't think it has the data to make the decision.

I can still finish up with the test case if you like, but here is a highlight. The first VIEW, which I did not originally post is this:

CREATE OR REPLACE VIEW testcase.members_cancel_pending AS
SELECT m.id, m.alias, m.emailaddress, m.firstname, m.lastname,
m.mail_option_lvl,
CASE
WHEN c.id IS NULL THEN m.membertype
WHEN c.status = 'tr'::text OR c.status = 'p'::text OR c.status = 'cd'::text OR c.status = 'sp'::text OR c.status = 'sa'::text THEN m.membertype
ELSE 'c'::character varying
END AS membertype
FROM testcase.members m
LEFT JOIN testcase.cancellations c ON c.id = m.id;

The membertype column is actually calculated.
Then I have the two VIEWs I previous posted:

CREATE OR REPLACE VIEW testcase.vip_decline_mailers_base AS
SELECT m.id, m.alias, m.firstname, m.lastname, m.emailaddress,
CASE
WHEN s.void = false THEN s.end_date
ELSE (s.end_date - '1 mon'::interval)::date
END AS paid_thru,
mop.payment_method, m.mail_option_lvl, now()::date AS "current_date"
FROM testcase.nop_seed,
testcase.subscriptions s
JOIN testcase.mop mop ON mop.id = s.member_id
JOIN testcase.members_cancel_pending m ON m.id = s.member_id AND m.membertype::text = 'v'::text
JOIN testcase.subscription_types st ON s.subscription_type = st.subscription_type
WHERE (s.end_date < nop_seed.paid_thru OR s.void = true) AND st.sub_class::text = 'VM'::text;

This should only be looking for membertype='v'

From here another VIEW is built:

CREATE OR REPLACE VIEW testcase.vip_mailer_unpaid_current AS
SELECT vip_decline_mailers_base.id, vip_decline_mailers_base.alias,
vip_decline_mailers_base.firstname, vip_decline_mailers_base.lastname,
vip_decline_mailers_base.emailaddress, vip_decline_mailers_base.paid_thru,
vip_decline_mailers_base.payment_method,
vip_decline_mailers_base.mail_option_lvl,
vip_decline_mailers_base."current_date",
current_month_text(now()) AS current_month_text
FROM testcase.vip_decline_mailers_base vip_decline_mailers_base
WHERE vip_decline_mailers_base.mail_option_lvl > 0 AND vip_decline_mailers_base.paid_thru >= first_of_another_month((now()::date - '1 mon'::interval)::date) AND vip_decline_mailers_base.paid_thru <= (first_of_month() - 1);

It is assuming that there will only be membertype 'v' is the basic results and is only applying date filters.

Here is the planner output before I put a PK on testcase.cancellations

network=# explain select count(*) from testcase.vip_mailer_unpaid_current;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------
Aggregate (cost=29784.07..29784.08 rows=1 width=0)
-> Nested Loop (cost=25947.06..29783.08 rows=395 width=0)
Join Filter: ((s.end_date < nop_seed.paid_thru) OR s.void)
-> Hash Right Join (cost=25947.06..29720.65 rows=1 width=5)
Hash Cond: (c.id = m.id)
Filter: ((CASE WHEN (c.id IS NULL) THEN m.membertype WHEN ((c.status = 'tr'::text) OR (c.status = 'p'::text) OR (c.status = 'cd'::text) OR (c.status = 'sp'::text) OR (c.status = 'sa'::text)) THEN m.membertype ELSE 'c'::char
acter varying END)::text = 'v'::text)
-> Seq Scan on cancellations c (cost=0.00..3324.41 rows=119741 width=6)
-> Hash (cost=25946.03..25946.03 rows=83 width=11)
-> Nested Loop (cost=0.00..25946.03 rows=83 width=11)
-> Nested Loop (cost=0.00..25315.99 rows=115 width=13)
-> Nested Loop (cost=0.00..25053.20 rows=115 width=9)
Join Filter: (s.subscription_type = st.subscription_type)
-> Seq Scan on subscriptions s (cost=0.00..24979.10 rows=403 width=11)
Filter: ((CASE WHEN (NOT void) THEN end_date ELSE ((end_date - '1 mon'::interval))::date END <= ((date_trunc('month'::text, now()))::date - 1)) AND (CASE WHEN (NOT void) THEN end_date ELSE ((en
d_date - '1 mon'::interval))::date END >= first_of_another_month((((now())::date - '1 mon'::interval))::date)))
-> Materialize (cost=0.00..1.58 rows=12 width=2)
-> Seq Scan on subscription_types st (cost=0.00..1.52 rows=12 width=2)
Filter: ((sub_class)::text = 'VM'::text)
-> Index Only Scan using tcmopid on mop (cost=0.00..2.28 rows=1 width=4)
Index Cond: (id = s.member_id)
-> Index Scan using tcmembersid on members m (cost=0.00..5.47 rows=1 width=6)
Index Cond: (id = mop.id)
Filter: (mail_option_lvl > 0)
-> Seq Scan on nop_seed (cost=0.00..33.30 rows=2330 width=4)
(23 rows)

network=# select count(*) from testcase.vip_mailer_unpaid_current;
count
-------
331
(1 row)

Here is the output after putting a PK on testcase.cancellations:
ALTER TABLE testcase.cancellations
ADD CONSTRAINT cancellations_pkey PRIMARY KEY(id);

network=# explain select count(*) from testcase.vip_mailer_unpaid_current;
QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------
Aggregate (cost=26233.61..26233.62 rows=1 width=0)
-> Nested Loop (cost=0.00..26232.63 rows=395 width=0)
Join Filter: ((s.end_date < nop_seed.paid_thru) OR s.void)
-> Nested Loop Left Join (cost=0.00..26170.20 rows=1 width=5)
-> Nested Loop (cost=0.00..25946.03 rows=83 width=11)
-> Nested Loop (cost=0.00..25315.99 rows=115 width=13)
-> Nested Loop (cost=0.00..25053.20 rows=115 width=9)
Join Filter: (s.subscription_type = st.subscription_type)
-> Seq Scan on subscriptions s (cost=0.00..24979.10 rows=403 width=11)
Filter: ((CASE WHEN (NOT void) THEN end_date ELSE ((end_date - '1 mon'::interval))::date END <= ((date_trunc('month'::text, now()))::date - 1)) AND (CASE WHEN (NOT void) THEN end_date ELSE ((end_date
- '1 mon'::interval))::date END >= first_of_another_month((((now())::date - '1 mon'::interval))::date)))
-> Materialize (cost=0.00..1.58 rows=12 width=2)
-> Seq Scan on subscription_types st (cost=0.00..1.52 rows=12 width=2)
Filter: ((sub_class)::text = 'VM'::text)
-> Index Only Scan using tcmopid on mop (cost=0.00..2.28 rows=1 width=4)
Index Cond: (id = s.member_id)
-> Index Scan using tcmembersid on members m (cost=0.00..5.47 rows=1 width=6)
Index Cond: (id = mop.id)
Filter: (mail_option_lvl > 0)
-> Index Scan using cancellations_pkey on cancellations c (cost=0.00..2.69 rows=1 width=6)
Index Cond: (id = m.id)
Filter: ((CASE WHEN (id IS NULL) THEN m.membertype WHEN ((status = 'tr'::text) OR (status = 'p'::text) OR (status = 'cd'::text) OR (status = 'sp'::text) OR (status = 'sa'::text)) THEN m.membertype ELSE 'c'::character
varying END)::text = 'v'::text)
-> Seq Scan on nop_seed (cost=0.00..33.30 rows=2330 width=4)
(22 rows)

network=# select count(*) from testcase.vip_mailer_unpaid_current;
count
-------
390
(1 row)

Would you still like a test case to run?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bill MacArthur <webmaster(at)dhs-club(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7604: adding criteria to a query against a view in 9.2 expands the results instead of constraining them
Date: 2012-10-15 19:01:44
Message-ID: 2890.1350327704@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Bill MacArthur <webmaster(at)dhs-club(dot)com> writes:
> Update, I started placing primary keys on the testcase tables and watched the planner output. Once I put a PK on one of the tables in particular, the planner revised the plan to use the PK. At that point, the results become erroneous as the planner also moves another filter evaluation to an earlier point at which time I don't think it has the data to make the decision.

Curious ...

> I can still finish up with the test case if you like, but here is a highlight. The first VIEW, which I did not originally post is this:

Test case, please.

regards, tom lane