Re: Simple OUTER JOIN doubt

Lists: pgsql-general
From: Sandro Dentella <sandro(at)e-den(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Simple OUTER JOIN doubt
Date: 2006-10-27 18:48:30
Message-ID: 20061027184830.GA21813@casa.e-den.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I've a very simple task. I thought I knew how to solve it but there's
something I'm surely missing.

I got film and scores for the film. In table "film" and "punteggio"
(score). I want a join returning all the films and the votes
expressed by the user(s). If the user did not vote I want a NULL.

I only get films for which a vote was expressed. My query:

SELECT f.id, f.titolo, p.voto
FROM film_film f
LEFT OUTER JOIN vota_punteggio p
ON (f.id = p.film_id)
WHERE
(p.user_id = 2 OR p.user_id IS NULL)

Can you help me understanding what is wrong?
TIA
sandro
*:-)

cinemino=# \d film_film
Tabella "public.film_film"
Colonna | Tipo | Modificatori
-----------------+------------------------+--------------------------------------------------------
id | integer | not null default nextval('film_film_id_seq'::regclass)
titolo | character varying(100) | not null
regista | character varying(40) |
url_iann | character varying(200) |
url_altri | character varying(200) |
anno | integer |
image | character varying(100) |
durata | integer |
genere_id | integer |
data_proiezione | date |
proposto_da | integer |

cinemino=# \d vota_punteggio
Tabella "public.vota_punteggio"
Colonna | Tipo | Modificatori
---------+---------+-------------------------------------------------------------
id | integer | not null default nextval('vota_punteggio_id_seq'::regclass)
voto | integer | not null
user_id | integer | not null
film_id | integer | not null

--
Sandro Dentella *:-)
e-mail: sandro(at)e-den(dot)it
http://www.tksql.org TkSQL Home page - My GPL work


From: Russ Brown <pickscrape(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Simple OUTER JOIN doubt
Date: 2006-10-27 19:42:06
Message-ID: 4542610E.9020302@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sandro Dentella wrote:
> I've a very simple task. I thought I knew how to solve it but there's
> something I'm surely missing.
>
> I got film and scores for the film. In table "film" and "punteggio"
> (score). I want a join returning all the films and the votes
> expressed by the user(s). If the user did not vote I want a NULL.
>
> I only get films for which a vote was expressed. My query:
>
> SELECT f.id, f.titolo, p.voto
> FROM film_film f
> LEFT OUTER JOIN vota_punteggio p
> ON (f.id = p.film_id)
> WHERE
> (p.user_id = 2 OR p.user_id IS NULL)
>
>
> Can you help me understanding what is wrong?
> TIA
> sandro
> *:-)
>

Looks to me like it is because you're referencing vota_punteggio in the
WHERE clause.

Try something like this:

SELECT f.id, f.titolo, p.voto
FROM film_film f
LEFT OUTER JOIN vota_punteggio p
ON f.id = p.film_id
AND p.user_id = 2

>
>
> cinemino=# \d film_film
> Tabella "public.film_film"
> Colonna | Tipo | Modificatori
> -----------------+------------------------+--------------------------------------------------------
> id | integer | not null default nextval('film_film_id_seq'::regclass)
> titolo | character varying(100) | not null
> regista | character varying(40) |
> url_iann | character varying(200) |
> url_altri | character varying(200) |
> anno | integer |
> image | character varying(100) |
> durata | integer |
> genere_id | integer |
> data_proiezione | date |
> proposto_da | integer |
>
> cinemino=# \d vota_punteggio
> Tabella "public.vota_punteggio"
> Colonna | Tipo | Modificatori
> ---------+---------+-------------------------------------------------------------
> id | integer | not null default nextval('vota_punteggio_id_seq'::regclass)
> voto | integer | not null
> user_id | integer | not null
> film_id | integer | not null
>
>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sandro Dentella <sandro(at)e-den(dot)it>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Simple OUTER JOIN doubt
Date: 2006-10-27 19:55:35
Message-ID: 8937.1161978935@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sandro Dentella <sandro(at)e-den(dot)it> writes:
> I only get films for which a vote was expressed. My query:

> SELECT f.id, f.titolo, p.voto
> FROM film_film f
> LEFT OUTER JOIN vota_punteggio p
> ON (f.id = p.film_id)
> WHERE
> (p.user_id = 2 OR p.user_id IS NULL)

> Can you help me understanding what is wrong?

[ scratches head... ] Looks all right to me. Are you sure you copied
what you typed accurately? Does EXPLAIN show that a left join is being
used?

regards, tom lane


From: Sandro Dentella <sandro(at)e-den(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Simple OUTER JOIN doubt
Date: 2006-10-27 20:11:09
Message-ID: 20061027201109.GA24515@casa.e-den.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Oct 27, 2006 at 02:42:06PM -0500, Russ Brown wrote:
> Looks to me like it is because you're referencing vota_punteggio in the
> WHERE clause.
>
> Try something like this:
>
> SELECT f.id, f.titolo, p.voto
> FROM film_film f
> LEFT OUTER JOIN vota_punteggio p
> ON f.id = p.film_id
> AND p.user_id = 2
>

this works in fact, and it's simpler. But I don't really understard why I
should put it in this way.

On Fri, Oct 27, 2006 at 03:55:35PM -0400, Tom Lane wrote:
> [ scratches head... ] Looks all right to me. Are you sure you copied
> what you typed accurately? Does EXPLAIN show that a left join is being
> used?

here is the explain for both queries:

cinemino=# explain SELECT f.id, f.titolo, p.voto
FROM film_film f
LEFT OUTER JOIN vota_punteggio p
ON (f.id = p.film_id)
WHERE
(p.user_id = 2 OR p.user_id IS NULL)
;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Merge Right Join (cost=18.00..106.26 rows=170 width=76)
Merge Cond: ("outer".film_id = "inner".id)
Filter: (("outer".user_id = 2) OR ("outer".user_id IS NULL))
-> Index Scan using vota_punteggio_film_id on vota_punteggio p (cost=0.00..59.93 rows=1630 width=12)
-> Sort (cost=18.00..18.42 rows=170 width=72)
Sort Key: f.id
-> Seq Scan on film_film f (cost=0.00..11.70 rows=170 width=72)
(7 righe)

cinemino=# explain SELECT f.id, f.titolo, p.voto
FROM film_film f
LEFT OUTER JOIN vota_punteggio p
ON (f.id = p.film_id AND p.user_id = 2)
;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Hash Left Join (cost=12.05..24.67 rows=170 width=76)
Hash Cond: ("outer".id = "inner".film_id)
-> Seq Scan on film_film f (cost=0.00..11.70 rows=170 width=72)
-> Hash (cost=12.03..12.03 rows=8 width=8)
-> Bitmap Heap Scan on vota_punteggio p (cost=2.03..12.03 rows=8 width=8)
Recheck Cond: (user_id = 2)
-> Bitmap Index Scan on vota_punteggio_user_id (cost=0.00..2.03 rows=8 width=0)
Index Cond: (user_id = 2)
(8 righe)

BTW: I'm no able to read explain output, but it's a long time I want to
start studying them. I think I should start studying chapter 13, other
hints on this subject?

--
Sandro Dentella *:-)
e-mail: sandro(at)e-den(dot)it
http://www.tksql.org TkSQL Home page - My GPL work


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sandro Dentella <sandro(at)e-den(dot)it>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Simple OUTER JOIN doubt
Date: 2006-10-27 21:08:57
Message-ID: 10222.1161983337@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sandro Dentella <sandro(at)e-den(dot)it> writes:
> this works in fact, and it's simpler. But I don't really understard why I
> should put it in this way.

The other way seems to work for me:

regression=# create table film_film (id int, titolo text);
CREATE TABLE
regression=# create table vota_punteggio(film_id int, user_id int, voto int);
CREATE TABLE
regression=# insert into film_film values(1, 'one');
INSERT 0 1
regression=# insert into film_film values(2, 'two');
INSERT 0 1
regression=# insert into vota_punteggio values(1, 2, 10);
INSERT 0 1
regression=# SELECT f.id, f.titolo, p.voto
regression-# FROM film_film f LEFT OUTER JOIN vota_punteggio p
regression-# ON (f.id = p.film_id)
regression-# WHERE (p.user_id = 2 OR p.user_id IS NULL);
id | titolo | voto
----+--------+------
1 | one | 10
2 | two |
(2 rows)

Can you put together a self-contained test case showing the problem?
What PG version are you running, anyway? (There's a known bug in merge
right join in 8.1.0-8.1.3, but AFAIR the symptom is too many output rows
not too few.)

regards, tom lane


From: Russ Brown <pickscrape(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Simple OUTER JOIN doubt
Date: 2006-10-27 21:22:36
Message-ID: 4542789C.1020205@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sandro Dentella wrote:
> On Fri, Oct 27, 2006 at 02:42:06PM -0500, Russ Brown wrote:
>> Looks to me like it is because you're referencing vota_punteggio in the
>> WHERE clause.
>>
>> Try something like this:
>>
>> SELECT f.id, f.titolo, p.voto
>> FROM film_film f
>> LEFT OUTER JOIN vota_punteggio p
>> ON f.id = p.film_id
>> AND p.user_id = 2
>>

My thinking was that by putting the conditions in the WHERE clause you
were restricting the rows returned by the entire query, while putting
them in the ON clause only restricts the matching in the JOIN itself.

>
> this works in fact, and it's simpler. But I don't really understard why I
> should put it in this way.
>
> On Fri, Oct 27, 2006 at 03:55:35PM -0400, Tom Lane wrote:
>> [ scratches head... ] Looks all right to me. Are you sure you copied
>> what you typed accurately? Does EXPLAIN show that a left join is being
>> used?
>
> here is the explain for both queries:
>
> cinemino=# explain SELECT f.id, f.titolo, p.voto
> FROM film_film f
> LEFT OUTER JOIN vota_punteggio p
> ON (f.id = p.film_id)
> WHERE
> (p.user_id = 2 OR p.user_id IS NULL)
> ;
> QUERY PLAN
> ----------------------------------------------------------------------------------------------------------
> Merge Right Join (cost=18.00..106.26 rows=170 width=76)
> Merge Cond: ("outer".film_id = "inner".id)
> Filter: (("outer".user_id = 2) OR ("outer".user_id IS NULL))
> -> Index Scan using vota_punteggio_film_id on vota_punteggio p (cost=0.00..59.93 rows=1630 width=12)
> -> Sort (cost=18.00..18.42 rows=170 width=72)
> Sort Key: f.id
> -> Seq Scan on film_film f (cost=0.00..11.70 rows=170 width=72)
> (7 righe)
>
> cinemino=# explain SELECT f.id, f.titolo, p.voto
> FROM film_film f
> LEFT OUTER JOIN vota_punteggio p
> ON (f.id = p.film_id AND p.user_id = 2)
> ;
> QUERY PLAN
> -------------------------------------------------------------------------------------------------
> Hash Left Join (cost=12.05..24.67 rows=170 width=76)
> Hash Cond: ("outer".id = "inner".film_id)
> -> Seq Scan on film_film f (cost=0.00..11.70 rows=170 width=72)
> -> Hash (cost=12.03..12.03 rows=8 width=8)
> -> Bitmap Heap Scan on vota_punteggio p (cost=2.03..12.03 rows=8 width=8)
> Recheck Cond: (user_id = 2)
> -> Bitmap Index Scan on vota_punteggio_user_id (cost=0.00..2.03 rows=8 width=0)
> Index Cond: (user_id = 2)
> (8 righe)
>
>
>
>
> BTW: I'm no able to read explain output, but it's a long time I want to
> start studying them. I think I should start studying chapter 13, other
> hints on this subject?
>

I really need to read up more on that myself. Looking at the above I
really couldn't tell you which is the 'better' query plan.

>


From: Sandro Dentella <sandro(at)e-den(dot)it>
To: pgsql-general(at)postgresql(dot)org
Cc: Vittorino Parenti <vparenti(at)thundersystems(dot)it>
Subject: Re: Simple OUTER JOIN doubt
Date: 2006-10-27 23:14:34
Message-ID: 20061027231434.GA28409@casa.e-den.it
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Oct 27, 2006 at 04:22:36PM -0500, Russ Brown wrote:
> Sandro Dentella wrote:
> > On Fri, Oct 27, 2006 at 02:42:06PM -0500, Russ Brown wrote:
> >> Looks to me like it is because you're referencing vota_punteggio in the
> >> WHERE clause.
> >>
> >> Try something like this:
> >>
> >> SELECT f.id, f.titolo, p.voto
> >> FROM film_film f
> >> LEFT OUTER JOIN vota_punteggio p
> >> ON f.id = p.film_id
> >> AND p.user_id = 2
> >>
>
> My thinking was that by putting the conditions in the WHERE clause you
> were restricting the rows returned by the entire query, while putting
> them in the ON clause only restricts the matching in the JOIN itself.

It took me a while to understand this sentence (here is 1 past midnight...if
that's an excuse... ;-). I went back to wikipedia

http://en.wikipedia.org/wiki/Outer_join#Left_outer_join

Left outer join

A left outer join is very different from an inner join. Instead of limiting
results to those in both tables, it limits results to those in the "left"
table (A). This means that if the ON clause matches 0 records in B, a row in
the result will still be returned but with NULL values for each column
from B.

in my case 'film' is my left side, 'score' the right. When ON clause match 0
records on B the row is returned but in can be filtered by following WHERE
conditions. Putting it inside the ON clause, garantees that the row is added
and not filtered (and doesn't require the condition p_user_id IS NULL).

If this is correct it would be nice to add this case to the wikipedia
description about outer joins. I think it is not at all self evident. I
don't think I'm skilled enought to do that thought...

Thanks a lot to you and Tom for you help
sandro
*:-)

--
Sandro Dentella *:-)
http://www.tksql.org TkSQL Home page - My GPL work