Some of us while writing an SQL select statement with a join between two tables or more may get confused with whether to add a certain condition to the "ON" clause of the join or just add it to the "WHERE" clause as if it is just a filtering condition.
Some people may think that both approaches will return the same results every time. This is not completely true. Yes in some cases both approaches will return the same results but in other cases they won't. Let's check the case below and we will get the whole thing at the end.
Assume that we have two tables:
- Departments table which includes all departments which exist at a certain company
- Employees table which includes all employees in the same company
The two tables can be as follows
As you can see the "DepartmentId" column in the "Employees" table is a foreign key which references the "deptID" column of the "Departments" table.
Now, before going deep into code let's introduce a very useful online tool which is simply a SQL online simulator/compiler. Using this tool you can write some SQL statements, run them and get results as if you are running on an SQL management studio. I used this tool to prepare and test my queries while writing this post and I really encourage you to give it a try.
So, to use this tool:
- Browse to http://sqlfiddle.com
- In the site tool bar on the upper left side there is a drop down list to choose the SQL engine you wish to use, so choose "MS SQL Server 2008" ...... I was so happy to find "MS SQL Server 2012" in the list :)
Now we are ready to start testing. You will find that you have two wide text areas. The one on the left is where you write your schema building code while the one on the right is where you write your SQL selects.
So, first let's build our schema. To do so, paste the code below into the text area on the left and hit the "Build Schema" button.
create table Departments
(
[deptID] [int] NOT NULL
, Name nvarchar(max)
, CONSTRAINT pk_departments_pid PRIMARY KEY(deptID)
);
create table Employees
(
[empID] [int] IDENTITY(1,1) NOT NULL
, Name nvarchar(max)
, DepartmentId int
, CONSTRAINT pk_employees_pid PRIMARY KEY(empID)
, CONSTRAINT FK_employees_Department FOREIGN KEY (DepartmentId)
REFERENCES Departments (deptID)
);
So now we have our two tables as in the picture
Now, let's test our two approaches and see what happens. Assume that for some reason we only care about all departments except department with deptID = 3.
Adding Condition To The "WHERE" Clause
Copy and paste the code below into the text area on the right, on the "Run SQL" button there is a small arrow pointing down, hit this arrow and choose "Tabular Output" and finally hit "Run SQL".
insert into Departments (deptID, Name)
values(1, 'Dept1'),(2, 'Dept2'),(3, 'Dept3'),(4, 'Dept4'),(5, 'Dept5');
insert into Employees (Name, DepartmentId)
values('Ahmed', 1),('Tarek', 2),('Hasan', 2),('Saleh', 2),('Selim', 3);
select e.Name as empName
, d.Name as deptName
from Employees as e
left outer join Departments as d
on e.DepartmentId = d.deptID
where d.deptID <> 3;
After running the code you will get the results as in the picture below
As you can see there should be the record (Selim, Dept3) but it is filtered out because we added a condition to the "WHERE" clause which states that we only need records matching to all departments except the departments which have the "deptID" column equal to "3".
Adding Condition To The "ON" Clause
Copy and paste the code below into the text area on the right, on the
"Run SQL" button there is a small arrow pointing down, hit this arrow
and choose "Tabular Output" and finally hit "Run SQL".
insert into Departments (deptID, Name)
values(1, 'Dept1'),(2, 'Dept2'),(3, 'Dept3'),(4, 'Dept4'),(5, 'Dept5');
insert into Employees (Name, DepartmentId)
values('Ahmed', 1),('Tarek', 2),('Hasan', 2),('Saleh', 2),('Selim', 3);
select e.Name as empName
, d.Name as deptName
from Employees as e
left outer join Departments as d
on (e.DepartmentId = d.deptID and d.deptID <> 3);
After running the code you will get the results as in the picture below
As you can see there should be the record (Selim, Dept3) but instead we have the record (Selim, NULL) and this is because we added the condition to the "ON" clause to filter out the departments with "deptID" equal to "3". So, just before the joining the table "Departments" is filtered according to the condition and then the joining is performed. Since the joining is outer left joining (knowing that the "Employees" table is the one on the left), then whatever values in the "Departments" table are all records in the "Employees" table will return. This time the employee "Selim" has no corresponding department in the "Departments" table because it is already filtered out by the condition just before the joining occurred.
Conclusion
When adding the condition to the "WHERE" clause, the joining is executed first and the filtering based on the condition is the last thing to happen. While when adding the condition to the "ON" clause the filtering is applied to the tables first then the joining happens.
So, in which cases do both approaches return the same results? This happens in case of inner joins as in inner joins if a record in the "Employees" table doesn't have a corresponding record in the "Departments" table then this record will be filtered out. So, even if the condition is added to the "ON" clause, the department record is filtered out before joining, joining is applied, then the joining itself will filter out the whole employee record because it has no corresponding department.
That's it. Hope you find this post useful.