Re: Help with a subselect inside a view

Lists: pgsql-general
From: Bill Moseley <moseley(at)hank(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Help with a subselect inside a view
Date: 2005-08-25 06:12:17
Message-ID: 20050825061217.GA5388@hank.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I need a little SQL help:

I'm trying to get a subselect working inside a view.

I have a table "class" that has related tables (a class has a
location, a location has an address with columns city, state, zip).
I want to use a VIEW to display columns related to a given class.

But a class can also have one or more instructors. So I have a link
table:

Table "public.instructors"
Column | Type | Modifiers
--------+---------+-----------
person | integer | not null
class | integer | not null

Foreign-key constraints:
"$1" FOREIGN KEY (person) REFERENCES person(id)
"$2" FOREIGN KEY ("class") REFERENCES "class"(id)

I can do the following, but in the (very rare) case where there may be
two instructors assigned to the class I will get two rows back.

CREATE VIEW class_list
(
id, name, class_time, location, location_name,
address, city, state, zip,
instructor_name
)
AS
SELECT class.id, class.name, class.class_time, class.location,
location.name,
address.id, address.city, address.state, address.zip,
person.last_name

FROM class, location, address,
instructors, person

WHERE class.location = location.id
AND location.address = address.id
AND location.region = region.id
-- Not what I want
AND instructors.person = person.id
AND instructors.class = class.id;

I'm completely happy to just fetch just one of the instructors, and
don't care which one. I just need only one row per class. (I assume
that's my hint right there.)

I can select a single instructor from a given class like:

SELECT person.id
FROM instructors, person
WHERE instructors.class = 555
AND person.id = instructors.person
LIMIT 1;

So I thought I might be able to add that as a subselect to the VIEW,
but I have not been able to make it work. I suspect I'm missing
something obvious.

Thanks,

Oh BTW -- If I do a count(*) and a WHERE that only includes columns in
the "class" table on the VIEW, will Postgresql still do the joins? Or
will it only do the select on the "class" table. I suspect it will
do the joins to make sure the relations can be found.

--
Bill Moseley
moseley(at)hank(dot)org


From: David Fetter <david(at)fetter(dot)org>
To: Bill Moseley <moseley(at)hank(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Help with a subselect inside a view
Date: 2005-08-25 08:22:02
Message-ID: 20050825082202.GB21940@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Aug 24, 2005 at 11:12:17PM -0700, Bill Moseley wrote:
> I need a little SQL help:
>
> I'm trying to get a subselect working inside a view.
>
> I have a table "class" that has related tables (a class has a
> location, a location has an address with columns city, state, zip).
> I want to use a VIEW to display columns related to a given class.
>
> But a class can also have one or more instructors. So I have a link
> table:
>
> Table "public.instructors"
> Column | Type | Modifiers
> --------+---------+-----------
> person | integer | not null
> class | integer | not null
>
> Foreign-key constraints:
> "$1" FOREIGN KEY (person) REFERENCES person(id)
> "$2" FOREIGN KEY ("class") REFERENCES "class"(id)
>
> I can do the following, but in the (very rare) case where there may be
> two instructors assigned to the class I will get two rows back.
>
> CREATE VIEW class_list
> (
> id, name, class_time, location, location_name,
> address, city, state, zip,
> instructor_name
> )
> AS
> SELECT class.id, class.name, class.class_time, class.location,
> location.name,
> address.id, address.city, address.state, address.zip,
> person.last_name
>
> FROM class, location, address,
> instructors, person
>
> WHERE class.location = location.id
> AND location.address = address.id
> AND location.region = region.id
> -- Not what I want
> AND instructors.person = person.id
> AND instructors.class = class.id;
>
> I'm completely happy to just fetch just one of the instructors, and
> don't care which one. I just need only one row per class. (I assume
> that's my hint right there.)

This sounds like a case for PostgreSQL's nifty DISTINCT ON functionality.

http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

HTH :)

Cheers,
D
--
David Fetter david(at)fetter(dot)org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Bill Moseley <moseley(at)hank(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Help with a subselect inside a view
Date: 2005-08-25 13:05:36
Message-ID: 20050825130536.GB15100@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Aug 24, 2005 at 23:12:17 -0700,
Bill Moseley <moseley(at)hank(dot)org> wrote:
> I need a little SQL help:
>
> I'm trying to get a subselect working inside a view.

Unfortunately you didn't show us what you tried. My guess would be that
you didn't enclose the subselect in parenthesis.

The distinct on solution that was suggested is probably a better way to
go anyway.


From: Bill Moseley <moseley(at)hank(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Help with a subselect inside a view
Date: 2005-08-25 15:01:26
Message-ID: 20050825150126.GA14559@hank.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi David,

On Thu, Aug 25, 2005 at 01:22:02AM -0700, David Fetter wrote:
> This sounds like a case for PostgreSQL's nifty DISTINCT ON functionality.
>
> http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

The DISTINCT ON expression(s) must match the leftmost ORDER BY
expression(s). The ORDER BY clause will normally contain additional
expression(s) that determine the desired precedence of rows within
each DISTINCT ON group.

I read that and thought it wasn't a drop-in replacement for my code
due to the leftmost ORDER BY requirement. But, it seems to work even
if that requirement is not met.

Perhaps I not understanding the wording above? Or is Postgresql
adding in the order automatically?

My original VIEWS with duplicates:

DROP VIEW cl;
CREATE VIEW cl (id, class_time, instructor)
AS
SELECT class.id, class.class_time, person.first_name
FROM class, instructors, person
WHERE instructors.person = person.id
AND class.id = instructors.class;

select * from cl where id = 555;
id | class_time | instructor
-----+------------------------+------------
555 | 2005-09-30 09:00:00-07 | Cheryl
555 | 2005-09-30 09:00:00-07 | Bob
(2 rows)

And with DISTINCT ON():

DROP VIEW cl;
CREATE VIEW cl (id, class_time, instructor)
AS
SELECT DISTINCT ON(class.id)
class.id, class.class_time, person.first_name
FROM class, instructors, person
WHERE instructors.person = person.id
AND class.id = instructors.class;

select * from cl where id = 555;
id | class_time | instructor
-----+------------------------+------------
555 | 2005-09-30 09:00:00-07 | Cheryl
(1 row)

Here where the leftmost ORDER BY doesn't match the DISTINCT ON, which I thought
was not possible:

select * from cl where class_time > now() order by instructor limit 3;
id | class_time | instructor
-----+------------------------+------------
544 | 2005-08-31 09:00:00-07 | Cheryl
555 | 2005-09-30 09:00:00-07 | Cheryl
737 | 2005-08-30 09:00:00-07 | Cynthia

--
Bill Moseley
moseley(at)hank(dot)org


From: Bill Moseley <moseley(at)hank(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Help with a subselect inside a view
Date: 2005-08-25 15:19:25
Message-ID: 20050825151925.GB14559@hank.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Aug 25, 2005 at 08:05:36AM -0500, Bruno Wolff III wrote:
> On Wed, Aug 24, 2005 at 23:12:17 -0700,
> Bill Moseley <moseley(at)hank(dot)org> wrote:
> > I need a little SQL help:
> >
> > I'm trying to get a subselect working inside a view.
>
> Unfortunately you didn't show us what you tried. My guess would be that
> you didn't enclose the subselect in parenthesis.

No, it wasn't that. I just didn't want to look too foolish. ;)

DROP VIEW cl;
CREATE VIEW cl (id, instructor)
AS
SELECT class.id, person.first_name
FROM class, instructors, person
WHERE instructors.person = person.id
AND class.id = (
SELECT instructors.id
FROM instructors, person
WHERE instructors.class = class.id
AND person.id = instructors.person
LIMIT 1
);

Which returns a row for every row in "instructors" table.

> The distinct on solution that was suggested is probably a better way to
> go anyway.

Turns out it is, Thanks.

--
Bill Moseley
moseley(at)hank(dot)org


From: Bill Moseley <moseley(at)hank(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Help with a subselect inside a view
Date: 2005-08-25 16:06:40
Message-ID: 20050825160640.GC14559@hank.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

And about being efficient:

On Thu, Aug 25, 2005 at 08:01:26AM -0700, Bill Moseley wrote:
> DROP VIEW cl;
> CREATE VIEW cl (id, class_time, instructor)
> AS
> SELECT DISTINCT ON(class.id)
> class.id, class.class_time, person.first_name
> FROM class, instructors, person
> WHERE instructors.person = person.id
> AND class.id = instructors.class;

And in a case like above, I'm displaying the list a page at a time.
So I first do a count to find total rows and then a select:

select count(*) from cl where class_time >= now();
select * from cl where class_time >= now() LIMIT 20 OFFSET 40;

I looked at the EXPLAIN ANALYZE for both and both do the join, it
seems. I guess it has to be that way.

So would it be smart to do the initial count on "class" instead
of the view first?

select count(*) from class where class_time >= now();
select * from cl where class_time >= now() LIMIT 20 OFFSET 40;

That is, Postgresql won't figure out that it only need to look at one
table, right?

--
Bill Moseley
moseley(at)hank(dot)org


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bill Moseley <moseley(at)hank(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Help with a subselect inside a view
Date: 2005-08-25 16:14:31
Message-ID: 29633.1124986471@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Bill Moseley <moseley(at)hank(dot)org> writes:
>> http://www.postgresql.org/docs/current/static/sql-select.html#SQL-DISTINCT

> The DISTINCT ON expression(s) must match the leftmost ORDER BY
> expression(s). The ORDER BY clause will normally contain additional
> expression(s) that determine the desired precedence of rows within
> each DISTINCT ON group.

> I read that and thought it wasn't a drop-in replacement for my code
> due to the leftmost ORDER BY requirement. But, it seems to work even
> if that requirement is not met.

> CREATE VIEW cl (id, class_time, instructor)
> AS
> SELECT DISTINCT ON(class.id)
> class.id, class.class_time, person.first_name
> FROM class, instructors, person
> WHERE instructors.person = person.id
> AND class.id = instructors.class;

This is allowed because the code automatically adds "ORDER BY class.id"
within the view (as you would see if you examined the view with \d).
It's fairly pointless though, because as the manual notes, you can't get
any well-defined behavior without additional ORDER BY columns to
prioritize the rows within class.id groups. As is, you're getting
random choices of class_time and first_name within the groups.
(Though maybe in this application, you don't care.)

regards, tom lane


From: Bill Moseley <moseley(at)hank(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Help with a subselect inside a view
Date: 2005-08-25 16:27:23
Message-ID: 20050825162723.GF14559@hank.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Aug 25, 2005 at 12:14:31PM -0400, Tom Lane wrote:
> > CREATE VIEW cl (id, class_time, instructor)
> > AS
> > SELECT DISTINCT ON(class.id)
> > class.id, class.class_time, person.first_name
> > FROM class, instructors, person
> > WHERE instructors.person = person.id
> > AND class.id = instructors.class;
>
> This is allowed because the code automatically adds "ORDER BY class.id"
> within the view (as you would see if you examined the view with \d).

I see that now. Might be helpful for the docs to say that for folks
like me.

> It's fairly pointless though, because as the manual notes, you can't get
> any well-defined behavior without additional ORDER BY columns to
> prioritize the rows within class.id groups. As is, you're getting
> random choices of class_time and first_name within the groups.
> (Though maybe in this application, you don't care.)

I'm not sure I follow what you are saying. I understand that I have
no control over which "first_name" I end up with (and I don't really
care), but class_time is a column in the "class" table which I'm using
DISTINCT ON on, so that should be unique as well. So I assume you
meant random choice of first_name, not class_time.

Thanks,

--
Bill Moseley
moseley(at)hank(dot)org


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bill Moseley <moseley(at)hank(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Help with a subselect inside a view
Date: 2005-08-25 16:50:11
Message-ID: 29837.1124988611@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Bill Moseley <moseley(at)hank(dot)org> writes:
> On Thu, Aug 25, 2005 at 12:14:31PM -0400, Tom Lane wrote:
>> It's fairly pointless though, because as the manual notes, you can't get
>> any well-defined behavior without additional ORDER BY columns to
>> prioritize the rows within class.id groups. As is, you're getting
>> random choices of class_time and first_name within the groups.
>> (Though maybe in this application, you don't care.)

> I'm not sure I follow what you are saying. I understand that I have
> no control over which "first_name" I end up with (and I don't really
> care), but class_time is a column in the "class" table which I'm using
> DISTINCT ON on, so that should be unique as well. So I assume you
> meant random choice of first_name, not class_time.

Sorry, I meant that the query by itself doesn't guarantee anything about
which values you will get. If you know a-priori that there is only one
value of class_time per class id, then of course you don't care which
row it's selected from. But from the point of view of this query,
you're getting an unspecified one of the possible values.

In most of the applications I've seen for DISTINCT ON, people *do* care.
For instance, if you wanted to further constrain what you were getting,
you might wish that the returned first_name were the alphabetically
first among the class's instructors. You could get that with
SELECT DISTINCT ON(class.id)
...
ORDER BY class.id, person.first_name;

regards, tom lane


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Bill Moseley <moseley(at)hank(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Help with a subselect inside a view
Date: 2005-08-25 20:13:00
Message-ID: 20050825201300.GA22712@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Aug 25, 2005 at 08:19:25 -0700,
Bill Moseley <moseley(at)hank(dot)org> wrote:
>
> DROP VIEW cl;
> CREATE VIEW cl (id, instructor)
> AS
> SELECT class.id, person.first_name
> FROM class, instructors, person
> WHERE instructors.person = person.id
> AND class.id = (
> SELECT instructors.id
> FROM instructors, person
> WHERE instructors.class = class.id
> AND person.id = instructors.person
> LIMIT 1
> );
>
> Which returns a row for every row in "instructors" table.

I think if you were to use this approach you would do something more like:

DROP VIEW cl;
CREATE VIEW cl (id, instructor)
AS
SELECT class.id,
(SELECT person.first_name
FROM instructors, person
WHERE instructors.class = class.id
AND person.id = instructors.person
LIMIT 1)
FROM class;