Re: I don't understand something...

Lists: pgsql-general
From: Alexander Pyhalov <alp(at)rsu(dot)ru>
To: pgsql-general(at)postgresql(dot)org
Subject: I don't understand something...
Date: 2011-10-03 06:33:12
Message-ID: 4E895728.8000809@rsu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello.
I was asked a simple question. We have table employees:
\d employees
Table "public.employees"
Column | Type |
Modifiers
----------------+-----------------------------+-----------------------------------------------------------------
employee_id | integer | not null default
nextval('employees_employee_id_seq'::regclass)
first_name | character varying(20) |
last_name | character varying(25) | not null
email | character varying(25) | not null
phone_number | character varying(20) |
hire_date | timestamp without time zone | not null
job_id | character varying(10) | not null
salary | numeric(8,2) |
commission_pct | numeric(2,2) |
manager_id | integer |
department_id | integer |
Indexes:
"employees_pkey" PRIMARY KEY, btree (employee_id)
"emp_email_uk" UNIQUE, btree (email)
"emp_department_ix" btree (department_id)
"emp_job_ix" btree (job_id)
"emp_manager_ix" btree (manager_id)
"emp_name_ix" btree (last_name, first_name)
Check constraints:
"emp_salary_min" CHECK (salary > 0::numeric)
Foreign-key constraints:
"employees_department_id_fkey" FOREIGN KEY (department_id)
REFERENCES departments(department_id)
"employees_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobs(job_id)
"employees_manager_id_fkey" FOREIGN KEY (manager_id) REFERENCES
employees(employee_id)
Referenced by:
TABLE "departments" CONSTRAINT "dept_mgr_fk" FOREIGN KEY
(manager_id) REFERENCES employees(employee_id)
TABLE "employees" CONSTRAINT "employees_manager_id_fkey" FOREIGN
KEY (manager_id) REFERENCES employees(employee_id)
TABLE "job_history" CONSTRAINT "job_history_employee_id_fkey"
FOREIGN KEY (employee_id) REFERENCES employees(employee_id)

Now we want to select count of all employees who doesn't have any
subordinates (query 1):
SELECT count(employee_id) from employees o where not exists (select 1
from employees where manager_id=o.employee_id);
count
-------
89
(1 row)

We can select count of all managers (query 2):
SELECT count(employee_id) from employees where employee_id in (select
manager_id from employees);
count
-------
18
(1 row)

But if we reformulate the first query in the same way, answer is
different (query 3):
SELECT count(employee_id) from employees where employee_id not in
(select manager_id from employees) (query 3);
count
-------
0
(1 row)

I don't understand why queries 1 and 3 give different results. They
seems to be the same... Could someone explain the difference?

--
Best regards,
Alexander Pyhalov,
system administrator of Computer Center of Southern Federal University


From: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: I don't understand something...
Date: 2011-10-03 07:19:52
Message-ID: 201110031019.52919.achill@matrix.gatewaynet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Alexander, that's a classic one,
rewrite your last query as :

SELECT count(employee_id) from employees where employee_id not in
(select manager_id from employees WHERE manager_id IS NOT NULL);

NULLS semantics are sometimes not so obvious.

Στις Monday 03 October 2011 09:33:12 ο/η Alexander Pyhalov έγραψε:
> Hello.
> I was asked a simple question. We have table employees:
> \d employees
> Table "public.employees"
> Column | Type |
> Modifiers
> ----------------+-----------------------------+-----------------------------------------------------------------
> employee_id | integer | not null default
> nextval('employees_employee_id_seq'::regclass)
> first_name | character varying(20) |
> last_name | character varying(25) | not null
> email | character varying(25) | not null
> phone_number | character varying(20) |
> hire_date | timestamp without time zone | not null
> job_id | character varying(10) | not null
> salary | numeric(8,2) |
> commission_pct | numeric(2,2) |
> manager_id | integer |
> department_id | integer |
> Indexes:
> "employees_pkey" PRIMARY KEY, btree (employee_id)
> "emp_email_uk" UNIQUE, btree (email)
> "emp_department_ix" btree (department_id)
> "emp_job_ix" btree (job_id)
> "emp_manager_ix" btree (manager_id)
> "emp_name_ix" btree (last_name, first_name)
> Check constraints:
> "emp_salary_min" CHECK (salary > 0::numeric)
> Foreign-key constraints:
> "employees_department_id_fkey" FOREIGN KEY (department_id)
> REFERENCES departments(department_id)
> "employees_job_id_fkey" FOREIGN KEY (job_id) REFERENCES jobs(job_id)
> "employees_manager_id_fkey" FOREIGN KEY (manager_id) REFERENCES
> employees(employee_id)
> Referenced by:
> TABLE "departments" CONSTRAINT "dept_mgr_fk" FOREIGN KEY
> (manager_id) REFERENCES employees(employee_id)
> TABLE "employees" CONSTRAINT "employees_manager_id_fkey" FOREIGN
> KEY (manager_id) REFERENCES employees(employee_id)
> TABLE "job_history" CONSTRAINT "job_history_employee_id_fkey"
> FOREIGN KEY (employee_id) REFERENCES employees(employee_id)
>
> Now we want to select count of all employees who doesn't have any
> subordinates (query 1):
> SELECT count(employee_id) from employees o where not exists (select 1
> from employees where manager_id=o.employee_id);
> count
> -------
> 89
> (1 row)
>
> We can select count of all managers (query 2):
> SELECT count(employee_id) from employees where employee_id in (select
> manager_id from employees);
> count
> -------
> 18
> (1 row)
>
> But if we reformulate the first query in the same way, answer is
> different (query 3):
> SELECT count(employee_id) from employees where employee_id not in
> (select manager_id from employees) (query 3);
> count
> -------
> 0
> (1 row)
>
> I don't understand why queries 1 and 3 give different results. They
> seems to be the same... Could someone explain the difference?
>
> --
> Best regards,
> Alexander Pyhalov,
> system administrator of Computer Center of Southern Federal University
>

--
Achilleas Mantzios


From: Alexander Pyhalov <alp(at)rsu(dot)ru>
To: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: I don't understand something...
Date: 2011-10-03 07:48:45
Message-ID: 4E8968DD.2060207@rsu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 10/03/2011 11:19, Achilleas Mantzios wrote:
> Alexander, that's a classic one,
> rewrite your last query as :
>
> SELECT count(employee_id) from employees where employee_id not in
> (select manager_id from employees WHERE manager_id IS NOT NULL);
>
> NULLS semantics are sometimes not so obvious.

Thanks.
It's confusing, that IN check works as expected, but NOT IN works this
way...

--
Best regards,
Alexander Pyhalov,
system administrator of Computer Center of Southern Federal University


From: hubert depesz lubaczewski <depesz(at)depesz(dot)com>
To: Alexander Pyhalov <alp(at)rsu(dot)ru>
Cc: Achilleas Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: I don't understand something...
Date: 2011-10-03 08:02:16
Message-ID: 20111003080216.GC4824@depesz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Oct 03, 2011 at 11:48:45AM +0400, Alexander Pyhalov wrote:
> On 10/03/2011 11:19, Achilleas Mantzios wrote:
> >Alexander, that's a classic one,
> >rewrite your last query as :
> >
> >SELECT count(employee_id) from employees where employee_id not in
> > (select manager_id from employees WHERE manager_id IS NOT NULL);
> >
> >NULLS semantics are sometimes not so obvious.
>
> Thanks.
> It's confusing, that IN check works as expected, but NOT IN works
> this way...

If I might plug:
http://www.depesz.com/index.php/2008/08/13/nulls-vs-not-in/

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/


From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Alexander Pyhalov <alp(at)rsu(dot)ru>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: I don't understand something...
Date: 2011-10-03 12:18:56
Message-ID: CAF-3MvPgzbF8fvDsCqDocs1u+Q-6RVXMHt05sw0cwmt9fUFxqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 3 October 2011 08:33, Alexander Pyhalov <alp(at)rsu(dot)ru> wrote:
> Now we want to select count of all employees who doesn't have any
> subordinates (query 1):
>  SELECT count(employee_id) from employees o where not exists  (select 1 from
> employees  where manager_id=o.employee_id);
>  count
> -------
>    89
> (1 row)
>
> We can select count of all managers (query 2):
> SELECT count(employee_id) from employees where employee_id  in (select
> manager_id from employees);
>  count
> -------
>    18
> (1 row)
>
> But if we reformulate the first query in the same way, answer is different
> (query 3):
> SELECT count(employee_id) from employees where employee_id not in (select
> manager_id from employees) (query 3);
>  count
> -------
>     0
> (1 row)
>
> I don't understand why queries 1 and 3 give different results. They seems to
> be the same... Could someone explain the difference?

That's because NOT IN returns NULL if there are any NULLs in the list.
As the WHERE-clause requires something to evaluate to either true or
false (NULL won't do), you (correctly) get false if someone is a
manager, but also if _anyone_ is NOT a manager.

That's an artefact of how 3-valued logic is implemented in the SQL standard.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.