Re: How would I write this query...

Lists: pgsql-general
From: "Guy Rouillier" <guyr(at)masergy(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Re: How would I write this query...
Date: 2006-05-01 16:51:57
Message-ID: D4D1632DC736E74AB95FE78CD609007901260D@mtxexch01.add0.masergy.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jim Fitzgerald wrote:
> Hi -
>
> I have two tables, one of them has names of people and an associated
> integer ID. The other table is a list of the people (from the first
> table) by their ID number that have signed up for a class. How would
> I write a query that would list all the people from the first table
> that do not have any entries in the second table? Basically, I want
> a listing of all my people who have not signed up for the class.

select *
from people
where id not in
(
select id
from class_registration
)

--
Guy Rouillier


From: "John D(dot) Burger" <john(at)mitre(dot)org>
To: PostgreSQL-general general <pgsql-general(at)postgresql(dot)org>
Subject: Re: How would I write this query...
Date: 2006-05-01 21:23:41
Message-ID: 08c9bd676d30ce46237d7469263f8ad1@mitre.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> select *
> from people
> where id not in
> (
> select id
> from class_registration
> )

In my experience, queries like the OUTER LEFT JOIN version posted
earlier are usually much more efficient than NOT IN queries like the
above. The planner seems to be pretty smart about turning (positive)
IN queries into joins, but NOT IN queries usually turn into nested
table scans, in my experience.

- John D. Burger
MITRE


From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: Guy Rouillier <guyr(at)masergy(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: How would I write this query...
Date: 2006-05-02 09:09:40
Message-ID: 445721D4.3040806@magproductions.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Guy Rouillier wrote:
> Jim Fitzgerald wrote:
>
>>Hi -
>>
>> I have two tables, one of them has names of people and an associated
>>integer ID. The other table is a list of the people (from the first
>>table) by their ID number that have signed up for a class. How would
>>I write a query that would list all the people from the first table
>>that do not have any entries in the second table? Basically, I want
>>a listing of all my people who have not signed up for the class.
>
>
> select *
> from people
> where id not in
> (
> select id
> from class_registration
> )

Wouldn't a NOT EXISTS be faster? After all, the current record can be
disposed of as soon as there's any reference to it from class_registration.

For example:
select *
from people
where not exists (
select 1
from class_registration
where id = people.id
);

It may be faster to use * or a specific column name in the subquery
instead of the constant value 1. EXPLAIN ANALYZE will tell ;)

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: "John D(dot) Burger" <john(at)mitre(dot)org>
Cc: PostgreSQL-general general <pgsql-general(at)postgresql(dot)org>
Subject: Re: How would I write this query...
Date: 2006-05-02 09:15:01
Message-ID: 20060502091501.GA7820@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, May 01, 2006 at 05:23:41PM -0400, John D. Burger wrote:
> In my experience, queries like the OUTER LEFT JOIN version posted
> earlier are usually much more efficient than NOT IN queries like the
> above. The planner seems to be pretty smart about turning (positive)
> IN queries into joins, but NOT IN queries usually turn into nested
> table scans, in my experience.

That's because they're not equivalent. IN/NOT IN have special semantics
w.r.t. NULLs that make them a bit more difficult to optimise. OUTER
JOINs on the other hand is easier since in a join condition anything =
NULL evaluates to NULL -> FALSE.

I think there's been some discussion about teaching the planner about
columns that cannot be NULL (like primary keys) thus allowing it to
perform this transformation safely. I don't know if anyone has done it
though...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: "John D(dot) Burger" <john(at)mitre(dot)org>
To: PostgreSQL-general general <pgsql-general(at)postgresql(dot)org>
Subject: Re: How would I write this query...
Date: 2006-05-02 14:15:02
Message-ID: a115999fa9479251bed1581c6e3ed478@mitre.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Martijn van Oosterhout wrote:

> That's because they're not equivalent. IN/NOT IN have special semantics
> w.r.t. NULLs that make them a bit more difficult to optimise. OUTER
> JOINs on the other hand is easier since in a join condition anything =
> NULL evaluates to NULL -> FALSE.

Which is why Hash IN Joins were added, presumably. But there's nothing
analogous for NOT IN, I guess, perhaps there can't be.

> I think there's been some discussion about teaching the planner about
> columns that cannot be NULL (like primary keys) thus allowing it to
> perform this transformation safely. I don't know if anyone has done it
> though...

Yeah, I've noticed cases where I've thought "Ah, the planner doesn't
know that column can't be null". Similarly, it has seemed to me that
knowing that a column was UNIQUE could have made for a better plan,
although I can't think of any examples off-hand. Maybe where I saw it
using a Hash aggregate on a unique column, and I thought it could just
use the index, although that may not make sense either.

- John D. Burger
MITRE