How To Apply Recursive SQL Selections On Hierarchical Data
Posted by 5/27/2013 10:05:00 PM with No comments
on Sometimes we work on systems where a hierarchical data structure exists on some entities like employees and their managers. Both employees and managers can be called employees but there is a self join relation between them as each employee must have a manager. I think we are all familiar with this pattern.
Also, I think we all faced the situation when we need to get the info about each employee and his/her direct manager. At this point we used to join between the employees (child) table and itself (parent) on the condition that the parent id of the child is equal to the id of the parent. This is good.
But, what about if we need to get the hierarchical tree of managers of a certain employee not just his direct manager. It seems as we just need to the same join but more than one time till we are up all the way to the head manager. This is somehow logical but how can we do this number of joins and we don't know the number of levels up to the head manager?!!!
This introduces the problem or the challenge we are going to find a solution to right now. The answer is simply recursive.
Before going into more details, let's simplify the meaning of the expression "Recursive". It means that we have some logic and we want this logic to be repeated more than one time till a certain condition is satisfied (or not satisfied), at this point the execution should be stopped to get the result of the whole process.
So, is this what we really need to find a solution for what we have on hand right now? I think it is so let's now see some code and try to simplify things a little bit.
First, let's assume that we have a hierarchical tree of departments in a certain company. The tree is as in the image below
Now, we need to write a select statement which will be converted to a stored procedure later. This select statement will select all parent departments from a certain department up to the head department which is department #1 as in the picture.
So, if we assume that the certain department we want to investigate is department #6, then we should have a table having the results as follows
6, 5
5, 2
2, 1
1, 1
As explained before this can't be achieved using single join operations, so lets now check the proposed solution.
Note
You can see the code already written and ready for execution on this link
First we will create our "Departments" table. This table includes each department and a reference to its parent (self join).
CREATE TABLE Departments ( [ID] [int] NOT NULL , Name nvarchar(max) , ParentID int );
Second, let's fill some data into the "Departments" table to work on.
INSERT INTO Departments (ID, Name, ParentID) VALUES(1, 'Dept1', 1) ,(2, 'Dept2', 1) ,(3, 'Dept3', 1) ,(4, 'Dept4', 1) ,(5, 'Dept5', 2) ,(6, 'Dept6', 5);
Now, we are ready to work on the select statement.
WITH AllDepartments([ChosenDept], [ChildID], [ChildName], [ParentID], [ParentName]) AS ( SELECT Child.ID AS [ChosenDept] , Child.ID AS [ChildID] , Child.Name AS [ChildName] , Parent.ID AS [ParentID] , Parent.Name AS [ParentName] FROM Departments AS Child LEFT OUTER JOIN Departments AS Parent ON Child.ParentID = Parent.ID UNION ALL SELECT AllDepartments.ChosenDept AS [ChosenDept] , AllDepartments.ParentID AS [ChildID] , AllDepartments.ParentName AS [ChildName] , NewParent.ParentID AS [ParentID] , NewParentInfo.Name AS [ParentName] FROM AllDepartments INNER JOIN Departments AS NewParent ON AllDepartments.ParentID = NewParent.ID AND AllDepartments.ParentID <> AllDepartments.ChildID INNER JOIN Departments as NewParentInfo ON NewParent.ParentID = NewParentInfo.ID ) SELECT AllDepartments.[ChildID] , AllDepartments.[ChildName] , AllDepartments.[ParentID] , AllDepartments.[ParentName] FROM AllDepartments WHERE ChosenDept = 6;
As you can see in the code above it seems to be a bit complicated but believe me it is not that hard to understand, so let's go through the analysis.
Analysis
- As you can see in the first line we are using the "with" statement to create what is known as "Common Table Expression" (aka: CTE)
- The CTE is an expression by which we a create a table using a select statement to be used as a common table on which other later select statements depend
- CTE is not only used with SQL recursion but it is one of the most important applications
- CTE is defined as "WITH tableName (columnAlias1, columnAlias2, columnAlias3, .....)
- Inside the CTE body, there is two select blocks with a "UNION ALL" operator in the middle
- The select statement on the top, before the "UNION ALL", is a select statement which provides the data to start with as a seed for the recursion process to start with
- This select statement is only called once at first then the recursion process will depend on its results to work on
- In our case, we need this select to get all the info we need about each department like
- Department ID
- Department Name
- Parent Department ID
- Parent Department Name (needs a self join)
- So, it is a simple self join as we see but you can notice that we added an extra column over the 4 columns above, so what about this column?
- This column represents the department we choose for investigation;
- We need the final result (after the recursion is fully executed) to include each department and all parent departments up to the head one
- So, for each department upon investigation we have multiple records (as in our example above, to investigate department #6, we had records {(6, 5), (5, 2), (2, 1), (1, 1)}, for #5 we had records {(5, 2), (2, 1), (1, 1)}, ............. so to consolidate all these results into one result set, we should add the department to investigate with each result so for #6 it should be {(6, 6, 5), (6, 5, 2), (6, 2, 1), (6, 1, 1)} and for #5 it should be {(5, 5, 2), (5, 2, 1), (5, 1, 1)} and so on)
- Then at last when we decide to choose certain department to investigate, we can use this extra column to filter with, so if we need to investigate department #6, we can filter the final result set as follows "WHERE ChosenDept = 6"
- For the second select statement below, this is the select which represents the recursive opertation
- To understand what is going on let's imagine that we have the results of the first select statement stored int the CTE we created and try to go through the operation manually just using our minds
- The results will be
- {ChosenDept, ChildID, ChildName, ParentID, ParentName}
- {1, 1, 'Dept1', 1, 'Dept1'}
- {2, 2, 'Dept2', 1, 'Dept1'}
- {3, 3, 'Dept3', 1, 'Dept1'}
- {4, 4, 'Dept4', 1, 'Dept1'}
- {5, 5, 'Dept5', 2, 'Dept2'}
- {6, 6, 'Dept6', 5, 'Dept5'}
- Let's assume that we are going to investigate "Dept6"
- Then we will check which department is its parent, it is "Dept5"
- So now we try to find the department which is parent of "Dept5"
- This piece of info exists in the "Departments" table
- This means that we need to join the CTE result set above with the "Departments" table (let's call it "NewParent") on the condition that the "ParentName" column of the CTE result set is equal to the "ID" of the "NewParent" table. All of this to be able to get the parent department of "Dept5"
- So now in the new select the selected columns will be as follows:
- "ChildID" will be the "ParentID" of the CTE (because now this is the new child we are trying to find its parent, in this example, it is "Dept5")
- "ChildName" will be the "ParentName" of the CTE
- "ParentID" will be the "ParentID" of the "NewParent" table (because this is the direct parent of the current child which is "Dept5" in this example)
- "ParentName" will be the name of the new parent we got in the column above (this required an extra join cause the name of the parent doesn't exist in the same table)
- "ChosenDept" will be got from the CTE cause this is the same department we are still investigating from the main CTE
- Now after we have finished the CTE including the recursion, we will write our final select statement which will get only the results when the "ChosenDept" column of the CTE is equal to 6
That's it. I know that it is not easy to understand it the first time but believe me if you read it one more time you will get it. Try to imagine the whole process in your head as if you are going to do it manual. After short time you will get the common sense of the whole operation and you will be able to understand every detail.
I encourage you to repeat reading the code and the steps more than once and for sure you can search the internet for any other tutorials or resources talking about "Recursive SQL using Common Table Expression".
Other Resources
- Recursive Queries Using Common Table Expressions
- SQL SERVER – Simple Example of Recursive CTE | Journey to SQL Authority with Pinal Dave
- SQL SERVER – SQL SERVER – Simple Example of Recursive CTE – Part 2 – MAXRECURSION – Prevent CTE Infinite Loop | Journey to SQL Authority with Pinal Dave
- SQL Anywhere: Example: RECURSIVE UNION
Hope you find this post useful someday :)
Categories: Reusable Code, SQL