Re: OUTER JOIN with filter

Lists: pgsql-sql
From: "Nicolas Fertig" <nfertig(at)swissnow(dot)ch>
To: pgsql-sql(at)postgresql(dot)org
Subject: OUTER JOIN with filter
Date: 2003-03-01 18:53:27
Message-ID: 3e61011d$0$1220$5402220f@news.sunrise.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Hello,

Can anyone help me with the problem bellow ?

I want to have all the row in table "table_main" with the value in the table
"table_slave" (value or null if not exist)

It is possible to have the same result without sub-select in OUTER JOIN
(speed problem on big table) ?

Actualy this is what I make...

CREATE TABLE "table_main" (
"id" int4,
"some_field" varchar(100)
);

CREATE TABLE "table_slave" (
"id" int4,
"name" varchar(100),
"c1" varchar(30)
);

SELECT TM.some_field, TS.name
FROM table_main TM
LEFT OUTER JOIN (SELECT name FROM table_slave WHERE c1 = 'myvalue') TS
USING(id)

the request bellow is not correct:

SELECT TM.some_filed, TS.name
FROM table_main TM
LEFT OUTER JOIN table_slave TS USING(id)
WHERE (TS.c1 = 'myc1' OR TS.c1 IS NULL)

Many thanks,

Nicolas


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Nicolas Fertig <nfertig(at)swissnow(dot)ch>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: OUTER JOIN with filter
Date: 2003-03-02 01:07:09
Message-ID: 20030301170535.U21117-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Sat, 1 Mar 2003, Nicolas Fertig wrote:

> Can anyone help me with the problem bellow ?
>
> SELECT TM.some_field, TS.name
> FROM table_main TM
> LEFT OUTER JOIN (SELECT name FROM table_slave WHERE c1 = 'myvalue') TS
> USING(id)

Maybe,

select tm.some_field, ts.name
from table_tmain tm left outer join table_slave ts
on (tm.id=ts.id and ts.c1='myvalue');


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Nicolas Fertig <nfertig(at)swissnow(dot)ch>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: OUTER JOIN with filter
Date: 2003-03-02 01:54:06
Message-ID: 20030302015406.GA17308@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Sat, Mar 01, 2003 at 19:53:27 +0100,
Nicolas Fertig <nfertig(at)swissnow(dot)ch> wrote:
>
> I want to have all the row in table "table_main" with the value in the table
> "table_slave" (value or null if not exist)
>
> It is possible to have the same result without sub-select in OUTER JOIN
> (speed problem on big table) ?

Why do you think there will be a speed problem? With proper statistics
a good plan for the query below should be possible. As long as both tables
have indexes on id a merge join with a filter can be used. If there are
only a few rows with c1 = 'myvalue' out of a large number of rows and
there is an index on c1, then an alternative plan that does a merge join
with the sorted output from the subselect (done with an index scan) might
be faster. The planner should be able to choose between these plans as
long as has good statistics for the tables.

> SELECT TM.some_field, TS.name
> FROM table_main TM
> LEFT OUTER JOIN (SELECT name FROM table_slave WHERE c1 = 'myvalue') TS
> USING(id)

Assuming that the above query gives you the results you want, then I expect
that it is the most efficient way to write it. You could use a case
statement to handle the where c1 = 'myvalue' clause, but doing this is probably
going to be slower than doing a join to the subselect.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
Cc: Nicolas Fertig <nfertig(at)swissnow(dot)ch>, pgsql-sql(at)postgresql(dot)org
Subject: Re: OUTER JOIN with filter
Date: 2003-03-02 07:09:17
Message-ID: 16210.1046588957@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:
> On Sat, 1 Mar 2003, Nicolas Fertig wrote:
>> Can anyone help me with the problem bellow ?
>>
>> SELECT TM.some_field, TS.name
>> FROM table_main TM
>> LEFT OUTER JOIN (SELECT name FROM table_slave WHERE c1 = 'myvalue') TS
>> USING(id)

> select tm.some_field, ts.name
> from table_main tm left outer join table_slave ts
> on (tm.id=ts.id and ts.c1='myvalue');

Offhand I believe that these queries should generate identical plans.
They do not at the moment --- the second one generates a worse plan
(sorry Stephan ;-)) --- because the planner does not realize it could
push down the ts.c1='myvalue' JOIN condition into the scan of ts, even
though the join is OUTER. But AFAICS it would not change the results to
do so; ts rows failing ts.c1='myvalue' will not produce join output
anyway, but would allow outer-joined lefthand rows to be produced. (Can
anyone see a hole in that logic? It's on my to-do list to change it...)

I concur with the other comments that there's nothing wrong with the
given query ... other than such trivial details as it doesn't work ...
the sub-select doesn't produce an ID column. So I think Nicolas is
giving us an over-censored description of his real problem; but without
the omitted details, we're not likely to offer much useful help.

regards, tom lane


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Nicolas Fertig <nfertig(at)swissnow(dot)ch>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: OUTER JOIN with filter
Date: 2003-03-02 07:33:00
Message-ID: 20030301232815.Q24208-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


On Sun, 2 Mar 2003, Tom Lane wrote:

> Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> writes:
> > On Sat, 1 Mar 2003, Nicolas Fertig wrote:
> >> Can anyone help me with the problem bellow ?
> >>
> >> SELECT TM.some_field, TS.name
> >> FROM table_main TM
> >> LEFT OUTER JOIN (SELECT name FROM table_slave WHERE c1 = 'myvalue') TS
> >> USING(id)
>
> > select tm.some_field, ts.name
> > from table_main tm left outer join table_slave ts
> > on (tm.id=ts.id and ts.c1='myvalue');
>
> Offhand I believe that these queries should generate identical plans.
> They do not at the moment --- the second one generates a worse plan
> (sorry Stephan ;-)) --- because the planner does not realize it could

I wasn't really sure if it would or not (wrote it without testing on a
running server), actually I didn't realize it wouldn't push down, and
figured it'd give pretty much the same plan, but it is less verbose. :)

> push down the ts.c1='myvalue' JOIN condition into the scan of ts, even
> though the join is OUTER. But AFAICS it would not change the results to
> do so; ts rows failing ts.c1='myvalue' will not produce join output
> anyway, but would allow outer-joined lefthand rows to be produced. (Can
> anyone see a hole in that logic? It's on my to-do list to change it...)

I don't see an obvious case where it'd be different either.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Nicolas Fertig <nfertig(at)swissnow(dot)ch>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: OUTER JOIN with filter
Date: 2003-03-03 18:30:11
Message-ID: 200303031030.11785.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Tom, Stephan,

> > Offhand I believe that these queries should generate identical plans.
> > They do not at the moment --- the second one generates a worse plan
> > (sorry Stephan ;-)) --- because the planner does not realize it could
>
> I wasn't really sure if it would or not (wrote it without testing on a
> running server), actually I didn't realize it wouldn't push down, and
> figured it'd give pretty much the same plan, but it is less verbose. :)

Yeah. I've found that when I'm doing a left outer join to a large table with
criteria, it pays to try the query both ways. The way Stephan did it is
frequently very speedy with small tables but not so speedy with large ones.

For that matter, in a few queries I've found that it pays to force the
planner's hand by repeating some of the clauses in the WHERE clause in the
JOIN as well, as:

SELECT a.x, b.y, c.z
FROM a JOIN b ON (a.id = b.a_id AND b.type = 'm')
LEFT OUTER JOIN c ON (b.id = c.b_id)
WHERE b.type = 'm'

This is only useful, of course, when the presense of outer joins forces you
into an explicit join order ( And not always then ) -- otherwise the planner
will generally do a good job given a free hand.

> > push down the ts.c1='myvalue' JOIN condition into the scan of ts, even
> > though the join is OUTER. But AFAICS it would not change the results to
> > do so; ts rows failing ts.c1='myvalue' will not produce join output
> > anyway, but would allow outer-joined lefthand rows to be produced. (Can
> > anyone see a hole in that logic? It's on my to-do list to change it...)

The logic is fine. The issue comes where the outer joined table is several
times larger than the main queried table. The planner should recognize the
possibility of filtering the records in the joined table before joining in
order to minimize the join operation. If that's what you're asking, please
fix it!

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, Nicolas Fertig <nfertig(at)swissnow(dot)ch>, pgsql-sql(at)postgresql(dot)org
Subject: Re: OUTER JOIN with filter
Date: 2003-03-03 19:17:19
Message-ID: 15484.1046719039@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> For that matter, in a few queries I've found that it pays to force the
> planner's hand by repeating some of the clauses in the WHERE clause in the
> JOIN as well, as:

> SELECT a.x, b.y, c.z
> FROM a JOIN b ON (a.id = b.a_id AND b.type = 'm')
> LEFT OUTER JOIN c ON (b.id = c.b_id)
> WHERE b.type = 'm'

> The logic is fine. The issue comes where the outer joined table is several
> times larger than the main queried table. The planner should recognize the
> possibility of filtering the records in the joined table before joining in
> order to minimize the join operation.

AFAIR, it's always done that; certainly the above query looks quite
redundant to me. Note though that pushing down the WHERE clause is only
legal because b is on the outer side of the join.

A comparable query in the 7.3 regression database is

regression=# explain select * from
regression-# tenk1 a join tenk1 b on (a.unique1 = b.unique1)
regression-# left join tenk1 c on (b.unique2 = c.unique2)
regression-# where b.thousand = 999;
QUERY PLAN
-------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..572.49 rows=10 width=732)
-> Nested Loop (cost=0.00..542.55 rows=10 width=488)
-> Seq Scan on tenk1 b (cost=0.00..483.00 rows=10 width=244)
Filter: (thousand = 999)
-> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..6.01 rows=1 width=244)
Index Cond: (a.unique1 = "outer".unique1)
-> Index Scan using tenk1_unique2 on tenk1 c (cost=0.00..3.01 rows=1 width=244)
Index Cond: ("outer".unique2 = c.unique2)
(8 rows)

You can see that the condition on b.thousand does get pushed down to the
bottom scan. (There's no index on b.thousand, so we don't get an
indexscan --- but we would if there were an index...)

regards, tom lane