For SQL Hierarchical Data, How To Show Only Tree Branches Including Certain Type Of Entities And Discard Others

Posted by Ahmed Tarek Hasan on 1/13/2014 02:20:00 AM with No comments

SQL Hierarchical Data

If you have a tree of entities represented into a SQL database into parent-child relation as in the image above, you may need to be able to view only the tree branches which include a certain type of entities
and discard the branches which don't include this type.

For example, let's have a look on the tree above and assume that we need to show only the branches which include the "J" type of entities. For this to happen, then we need to transfer the above tree into the one below.

SQL Hierarchical Data

How to do this in SQL?
To be able to answer this question, we first need to know how our mind does it. Our mind does a lot of processing and computing in an efficient way which makes us think that it is so easy, but is it that easy?

Assume that we have these types of entities in our tree.

And the real entities instances in our database are as in the image below.

Which means that our entities are in the tree form as in the image below.

Now, let's assume that we need to view only the tree branches including the "J" type. To do this manually just by a piece of paper and a pen, we look at the leaf entities in the tree. By leaf I mean the entities which have no children. Then, up from there we trace each branch and if a branch doesn't include a "J", we erase this branch till we finally get the final tree.

To do this in a more systematic approach, we can first locate our leaf entities. Then, we go up one level for each one of the leaf entities, then another one level up and so on till we reach the top of the tree. This could be illustrated as in the table below.

As you can see, we first started with the leaf entities and it took us five iterations to reach the top most of the tree. But why do we need to do that?

We did that because we already know the sequence of the system entities types, for example we know that if the leaf entity of a branch is "T", then this branch will not have a "J" as "J" comes after "T". So, we needed to trace back each branch to decide which branch to keep and which to discard.

Does that mean that every branch doesn't end with "J" should be completely deleted up to the top of the tree? No, as you can see in our tree above, there is a branch which ends with "T4", we are sure that this branch will not include a "J" but if we delete this branch up to the "C" entity, we will lose "P2" which has valid "J" children.

So, to get it right you can think of it as a voting system. Each entity in the tree should vote on whether its parent should be kept or no. Finally, we sum the voting on each parent and know if at least one child needs this parent, if not, we can discard this parent without any problems or loosing any valid branches.

So, each entity of the leaf entities should already know if its parent should be kept or not, so in our example, if the leaf entity is a "J", then its parent should be voted a "1". But if the leaf entity is a "S" or "T" or "P" or "C" then the parent should be voted a "0".

So, this leads us to the result in the image below.

Now, we need to sum the votes of each entity to know which entities to keep and which to discard. After doing the summation, we will get the result as in the image below.

So, to make sure the logic we applied here is valid, let's return to the tree diagram and highlight the entities which got a sum of "0" votes. These entities are decided by all child entities to be useless and disposable.

SQL Hierarchical Data

Are these the entities we can delete from the tree to make sure every branch includes a "J" entity? Yes, those are the ones and by deleting them we get the final tree as in the image below.

So, now we are sure that the logic we applied is valid but can we apply this complex logic in SQL?

Applying the same logic in SQL

--Creating table structure
DECLARE @AllEntities AS Table
 , ParentId INT
 , [Type] NVARCHAR(10)

--Inserting sample data as in the example supplied
INSERT INTO @AllEntities
 , ParentId
 , [Type]
(1, 'C', NULL, 'C')
, (2, 'P1', 1, 'P')
, (3, 'P2', 1, 'P')
, (4, 'T1', 2, 'T')
, (5, 'T2', 2, 'T')
, (6, 'T3', 2, 'T')
, (7, 'T4', 3, 'T')
, (8, 'T5', 3, 'T')
, (9, 'S1', 4, 'S')
, (10, 'S2', 4, 'S')
, (11, 'S3', 5, 'S')
, (12, 'S4', 6, 'S')
, (13, 'S5', 6, 'S')
, (14, 'S6', 8, 'S')
, (15, 'S7', 8, 'S')
, (16, 'J1', 10, 'J')
, (17, 'J2', 10, 'J')
, (18, 'J3', 11, 'J')
, (19, 'J4', 11, 'J')
, (20, 'J5', 14, 'J')
, (21, 'J6', 14, 'J')

--Declaring a variable to hold the required ensured entity type
--Each branch in the tree should include this entity type, otherwise,
--the whole branch will be excluded from the final result
DECLARE @EnsuredEntityType AS NVARCHAR(10)
SET @EnsuredEntityType = 'J'

;WITH EnsuredEntityTree(Id, ParentId, Voting) AS
 --Strating with the leaf entities on the tree as the seed
 SELECT parent.ObjectID
 , parent.ParentObjectID
  (@EnsuredEntityType = 'J' AND (parent.Type = 'J'))
  (@EnsuredEntityType = 'S' AND (parent.Type = 'S' OR parent.Type = 'J'))
  (@EnsuredEntityType = 'T' AND (parent.Type = 'T' OR parent.Type = 'S' OR parent.Type = 'J'))
  (@EnsuredEntityType = 'P' AND (parent.Type = 'P' OR parent.Type = 'T' OR parent.Type = 'S' OR parent.Type = 'J'))
  (@EnsuredEntityType = 'C' AND (parent.Type = 'C'))
 ) THEN 1
 ELSE 0 END AS Voting
 FROM @AllEntities as parent
 LEFT OUTER JOIN @AllEntities as children
 ON children.ParentId = parent.Id
 WHERE children.Id IS NULL


 SELECT parent.Id
 , parent.ParentId
 , et.Voting --the same voting from the original seed object
 FROM @AllEntities AS parent
 INNER JOIN EnsuredEntityTree AS et
 ON et.ParentId = parent.Id

FROM EnsuredEntityTree
GROUP BY Id, ParentId
HAVING SUM(Voting) > 0

Finally, I hope you find this useful someday.
Good luck.