Re: View vs. Statement Query Plan

Lists: pgsql-general
From: Curt Sampson <cjs(at)cynic(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: View vs. Statement Query Plan
Date: 2002-06-03 09:18:52
Message-ID: Pine.NEB.4.43.0206031810060.489-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


It seems that my server is happy to use some indices to optimize
access when I do a specific query involving a UNION, but when I
make a view and then query on that view, it doesn't use the indices
any more.

I have two tables that look like this:

CREATE TABLE data (
rec_no INT PRIMARY KEY,
day DATE NOT NULL,
user_id INT NOT NULL,
value INT NOT NULL
) WITHOUT OIDS;
CREATE INDEX data_day ON data (day);
CREATE INDEX data_user_id ON data (user_id);
CREATE INDEX data_value ON data (value);

data_4 has about 10 Mrows, data_4a has about 100 Krows. I created a view,
data, combining these two tables:

CREATE VIEW data AS
SELECT * FROM data_4 UNION ALL SELECT * FROM data_4a

But for some reason this view doesn't use the indices that an
equivalant query uses:

test=# explain select * from data_4 where user_id = 12345 union all select * from data_4a where user_id = 12345;
NOTICE: QUERY PLAN:

Append (cost=0.00..4334.59 rows=1080 width=16)
-> Subquery Scan *SELECT* 1 (cost=0.00..4325.05 rows=1078 width=16)
-> Index Scan using data_4_user_id on data_4 (cost=0.00..4325.05 rows=1078 width=16)
-> Subquery Scan *SELECT* 2 (cost=0.00..9.54 rows=2 width=16)
-> Index Scan using data_4a_user_id on data_4a (cost=0.00..9.54 rows=2 width=16)

EXPLAIN
test=# explain select * from data where user_id = 12345;
NOTICE: QUERY PLAN:

Subquery Scan data (cost=0.00..1638580.00 rows=100100000 width=16)
-> Append (cost=0.00..1638580.00 rows=100100000 width=16)
-> Subquery Scan *SELECT* 1 (cost=0.00..1636943.00 rows=100000000 width=16)
-> Seq Scan on data_4 (cost=0.00..1636943.00 rows=100000000 width=16)
-> Subquery Scan *SELECT* 2 (cost=0.00..1637.00 rows=100000 width=16)
-> Seq Scan on data_4a (cost=0.00..1637.00 rows=100000 width=16)

Any idea why this is? Should I be creating the view in a different way?

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: View vs. Statement Query Plan
Date: 2002-06-04 05:15:38
Message-ID: 14790.1023167738@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Curt Sampson <cjs(at)cynic(dot)net> writes:
> But for some reason this view doesn't use the indices that an
> equivalant query uses:

You're essentially supposing that

select * from (select * from a union select * from b) where foo;

may be transformed into

(select * from a where foo) union (select * from b where foo);

I don't doubt that this transformation is valid in some cases ... but
I do doubt that it is valid in all cases. If someone can supply a
rigorous proof about when it is valid, I'd be willing to look into
doing the necessary programming.

regards, tom lane


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: View vs. Statement Query Plan
Date: 2002-06-04 07:53:34
Message-ID: 20020604175334.A2471@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Jun 04, 2002 at 01:15:38AM -0400, Tom Lane wrote:
> Curt Sampson <cjs(at)cynic(dot)net> writes:
> > But for some reason this view doesn't use the indices that an
> > equivalant query uses:
>
> You're essentially supposing that
>
> select * from (select * from a union select * from b) where foo;
>
> may be transformed into
>
> (select * from a where foo) union (select * from b where foo);
>
> I don't doubt that this transformation is valid in some cases ... but
> I do doubt that it is valid in all cases. If someone can supply a
> rigorous proof about when it is valid, I'd be willing to look into
> doing the necessary programming.

IIRC, union does an implicit DISTINCT (there's UNION ALL, right). So if what
is being selected is anything other than a simple statement, it'll be very
hard to prove equivalence (i guess this is what the iscachable is for).

HTH,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: View vs. Statement Query Plan
Date: 2002-06-04 23:15:59
Message-ID: 19935.1023232559@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Tue, Jun 04, 2002 at 01:15:38AM -0400, Tom Lane wrote:
>> I don't doubt that this transformation is valid in some cases ... but
>> I do doubt that it is valid in all cases. If someone can supply a
>> rigorous proof about when it is valid, I'd be willing to look into
>> doing the necessary programming.

> IIRC, union does an implicit DISTINCT (there's UNION ALL, right). So if what
> is being selected is anything other than a simple statement, it'll be very
> hard to prove equivalence (i guess this is what the iscachable is
> for).

Yeah, the UNION vs. UNION ALL difference is one of the things that would
need to be thought about. I think it's more likely that the
transformation would work for UNION ALL than for UNION ... but I have
not had the time to try to work it out.

regards, tom lane


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: View vs. Statement Query Plan
Date: 2002-06-05 08:47:16
Message-ID: 20020605184716.B9784@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Jun 04, 2002 at 07:15:59PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > On Tue, Jun 04, 2002 at 01:15:38AM -0400, Tom Lane wrote:
> >> I don't doubt that this transformation is valid in some cases ... but
> >> I do doubt that it is valid in all cases. If someone can supply a
> >> rigorous proof about when it is valid, I'd be willing to look into
> >> doing the necessary programming.
>
> > IIRC, union does an implicit DISTINCT (there's UNION ALL, right). So if what
> > is being selected is anything other than a simple statement, it'll be very
> > hard to prove equivalence (i guess this is what the iscachable is
> > for).
>
> Yeah, the UNION vs. UNION ALL difference is one of the things that would
> need to be thought about. I think it's more likely that the
> transformation would work for UNION ALL than for UNION ... but I have
> not had the time to try to work it out.

One thing I don't know and that is how closely SQL follows relational
algebra. Is it close enough that you can prove results in relational algebra
and have them work in SQL. Or there enough special cases to make that
tricky.
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: View vs. Statement Query Plan
Date: 2002-06-05 09:35:10
Message-ID: Pine.NEB.4.43.0206051749150.454-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 5 Jun 2002, Martijn van Oosterhout wrote:

> One thing I don't know and that is how closely SQL follows relational
> algebra. Is it close enough that you can prove results in relational algebra
> and have them work in SQL. Or there enough special cases to make that
> tricky.

Ha! Nowhere near close enough. SQL actually breaks the relational model
in many ways; C. J. Date's books (particularly _The Third Manifesto_ and
his _Introduction to Database Systems_) go into this in detail.

I've not really had time to sit down and think about the characteristics
of the problem of whether or not a WHERE clause on a pair of SELECT
statements joined by UNION is distributitive or not. The easiest
thing would be if someone could find a case that proves by example
that the WHERE clause is not distributitive.

Providing a formal proof that it is distributive seems to me rather
difficult, because I can't even really see how to translate SQL
into relational algebra....

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: View vs. Statement Query Plan
Date: 2002-06-05 11:24:36
Message-ID: 20020605212436.C9784@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Jun 05, 2002 at 06:35:10PM +0900, Curt Sampson wrote:
> On Wed, 5 Jun 2002, Martijn van Oosterhout wrote:
>
> > One thing I don't know and that is how closely SQL follows relational
> > algebra. Is it close enough that you can prove results in relational algebra
> > and have them work in SQL. Or there enough special cases to make that
> > tricky.
>
> Ha! Nowhere near close enough. SQL actually breaks the relational model
> in many ways; C. J. Date's books (particularly _The Third Manifesto_ and
> his _Introduction to Database Systems_) go into this in detail.

Interesting. I don't know these books, I may see if I can find them.

> I've not really had time to sit down and think about the characteristics
> of the problem of whether or not a WHERE clause on a pair of SELECT
> statements joined by UNION is distributitive or not. The easiest
> thing would be if someone could find a case that proves by example
> that the WHERE clause is not distributitive.

Well, I think UNION ALL may be doable, but straight UNION could be tricky.
You need something stricter than iscachable do be able to use functions.
I've got a nifty example below.

> Providing a formal proof that it is distributive seems to me rather
> difficult, because I can't even really see how to translate SQL
> into relational algebra....

Most places I've looked at on the web that relate SQL and relational algebra
seem good, but tend to ignore outer joins and group by, which kinda blows a
hole in most things.

Nifty example:

select lower(f) from (select f from a union select f from b);
(select lower(f) from a) union (select lower(f) from b);

If table a has "HELLO" and table b has "hello", the first will produce two
rows of output, whereas the second produce only one. To work, you require
the function to be *invertable* (1-to-1 mapping) which is pretty strict. Off
the top of my head I'd say the vast majority would fail that test.

UNION ALL would be fine I think, as long as the functions ware cachable and
there were no aggregates. Basically, as long all you're pushing down is a
cacheable projection.

--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.


From: Gregory Seidman <gss+pg(at)cs(dot)brown(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: View vs. Statement Query Plan
Date: 2002-06-05 14:49:46
Message-ID: 20020605144946.GA19448@jamaica.cs.brown.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Martijn van Oosterhout sez:
[...]
} Nifty example:
}
} select lower(f) from (select f from a union select f from b);
} (select lower(f) from a) union (select lower(f) from b);
}
} If table a has "HELLO" and table b has "hello", the first will produce two
} rows of output, whereas the second produce only one. To work, you require
} the function to be *invertable* (1-to-1 mapping) which is pretty strict. Off
} the top of my head I'd say the vast majority would fail that test.
}
} UNION ALL would be fine I think, as long as the functions ware cachable and
} there were no aggregates. Basically, as long all you're pushing down is a
} cacheable projection.

As long as you aren't messing with the columns (no projections, no value
modifications, no aggregation), it is $\sigma_p(A) \cup \sigma_p(B)$ vs.
$\sigma_p(A \cup B)$. These are equivalent, regardless of whether it is a
proper UNION or a UNION ALL.

} Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
--Greg


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Gregory Seidman <gss+pg(at)cs(dot)brown(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: View vs. Statement Query Plan
Date: 2002-06-06 00:50:05
Message-ID: 20020606105005.A13084@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Jun 05, 2002 at 10:49:46AM -0400, Gregory Seidman wrote:
> As long as you aren't messing with the columns (no projections, no value
> modifications, no aggregation), it is $\sigma_p(A) \cup \sigma_p(B)$ vs.
> $\sigma_p(A \cup B)$. These are equivalent, regardless of whether it is a
> proper UNION or a UNION ALL.

Well, i think in this users case he was using a view. Unless you are using
all the columns from the view you're going to have a problem, since dropping
columns is also a projection.

select a from (select a,b from tableA union select a,b from tableB)
(select a from tableA) union (select a from tableB)

are not equivalent. I don't think it's possible to simplify the first
statement at all with UNION. OTOH, UNION ALL would be fine.

Anyhow, for a programmtical point of view, all that would to be done would
be to allow certain types of projections to be distributed across Append
nodes. Basically, projections that drop fields and only involve cachable
functions. But you need to think harder about what can be passed through
Distinct nodes.

Proving that it works for straight Append is easy. First it does one query
and then the other. So any projections done just after can also be done
just before with the same effect. Although fields get renamed, don't they.

So when people ask why their view is slow, ask whether they mean UNION or
UNION ALL and change it if it makes no difference.
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.