January 2015 - Development Simply Put

A blog simplifies main concepts in IT development and provides tips, hints, advices and some re-usable code. "If you can't explain it simply, you don't understand it well enough" -Albert Einstein

  • Development Simply Put

    If you can't explain it simply, you don't understand it well enough.

    Read More
  • Integrant

    Based in the U.S. with offshore development centers in Jordan and Egypt, Integrant builds quality custom software since 1992. Our staff becomes an extension of your team to eliminate the risks of software development outsourcing and our processes...

    Read More
  • ITWorx

    ITWorx is a global software professional services organization. Headquartered in Egypt, the company offers Portals, Business Intelligence, Enterprise Application Integration and Application Development Outsourcing services to Global 2000 companies.

    Read More
  • Information Technology Institute

    ITI is a leading national institute established in 1993 by the Information and Decision Support Centre.

    Read More

2015-01-30

A Guide For Dealing With Hierarchical, Parent-Child And Tree Form Data Operations



Lately I have been working on more than one project dealing with hierarchical data structures. This encouraged me to try to sum up my experience on this type of structures and corresponding operations.

From time to time I will revisit this post to update it with whatever new I found related to this topic. Please find the assembled posts below and let me know if you have any comments.


How To Apply Recursive SQL Selections On Hierarchical Data
Sometimes we work on systems where a hierarchical data structure exists on some entities like employees and their managers. Both employees and managers can be called employees but there is a self join relation between them as each employee must have a manager. We all faced the situation when we need to get the info about each employee and his/her direct manager. At this point we used to join between the employees (child) table and itself (parent) on the condition that the parent id of the child is equal to the id of the parent. This is good. But, what about if we need to get the hierarchical tree of managers of a certain employee not just his direct manager. It seems as we just need to the same join but more than one time till we are up all the way to the head manager. This is somehow logical but how can we do this number of joins and we don't know the number of levels up to the head manager?!!! If you want to know more, you can read this article.


How To Transform Unsorted Flat Hierarchical Data Structures Into Nested Parent-Child Or Tree Form Objects
Sometimes you have hierarchical data structure presented into an SQL database table where there is a parent-child relation between rows. You may need to transform this flat hierarchical data structure into a parent-child or tree form entity so that you can use it in more advanced or complex business like binding to a tree control or whatever. If you want to know how to do this transformation process, you can read this article.


How To Copy SQL Hierarchical Data At Run-time While Keeping Valid Internal References And Self Joins
Sometimes when you deal with hierarchical data structures you may need to perform internal copy operations. These copy operations should be handled in a smart way as just using INSERT-SELECT statements will mess up the internal references and self joins of the newly inserted records. If you want to know more about this, you can read this article.


For SQL Hierarchical Data, How To Show Only Tree Branches Including Certain Type Of Entities And Discard Others
If you have a tree of entities represented into a SQL database into parent-child relation, you may need to be able to view only the tree branches which include a certain type of entities and discard the branches which don't include this type. If you ever faced such situation or even curious to know how to manage such situation, you can read this article.


Knockout Advanced Tree Library & Control
This post shows you how to fully implement a tree control using knockout. This tree control has great features like; 01.Flat input data, 02.Dynamic node object properties, 03.Sorting by node object properties, 04.Searching by node object properties, 05.Searching by like patterns (case sensitive/insensitive) or whole words, 06.Searching by regular expressions, 07.Expanding to matching nodes, 08.Highlighting matching nodes, 09.Expand/Collapse, 10.Adding nodes, 11.Extensibility. It was taken into consideration while writing the code to separate the business-related code from the core implementation as far as possible without adding too much complexity to the whole solution. You can take this library as a base which you can modify to be adapted to your needs. Feel free to do it and for sure if you have any comments or ideas I am all ears.


That's it. Hope this will help you someday.


2015-01-23

Splitting Daytime Into Chunks To Enhance SQL Bulk Time-based Operations Performance





The best way to understand what this post is about is to start with a real scenario.

One of my colleagues was building a system which controls some motors using some readings coming from electronic sensors. The problem was that the frequency of the sensors readings was so high that the system had to insert a few reading records into the SQL database in few milliseconds. This means that in just a second the system had to insert hundreds or even thousands of records into the database.

This for sure was a big challenge and due to some customer needs and business requirements the approved solution was to compress the sensor readings per a fixed time range at the end of the day. This means that at the end of each day the sensor readings should be divided into groups where each group is limited by a time range of half an hour for example and then the readings of each group should be aggregated by taking the average. This way we will end up having hundreds of readings records into the database instead of thousands or even more. Till now everything is somehow good and logical.

A new challenge aroused while working on the SQL routine which will carry out the bulk aggregation process. The performance of this process was not promising. The process used to take a huge amount of time due to the heavy date-time comparisons and groupings. This was what encouraged me to jump in and try to help.

Before applying any changes the process was going like this; getting all sensor readings which were created between 00:00:00 and 00:15:00 and then taking the average for these readings, then, getting all sensor readings which were created between 00:15:00 and 00:30:00 and then taking the average for these readings, and so on......

As you can see this amount of processing and grouping based on time is huge and this was causing the whole process to be a nightmare performance wise.

One of the best ways to enhance the performance of bulk actions is to try to break the whole one big process into small controllable ones. Then, observe each small process and check whether it, for one record, depends only on the record itself and doesn't need to know anything about other records or not. If this happens, then this small process could be held on single record basis at early stage which could be the moment the record was created or something else based on the business requirements. This will help split the huge processing effort on different moments and stages of the system business life cycle and workflow which will make it more controllable and less recognizable.

Now, to apply the same concept on the problem we have on hand right now we should notice that:
  1. The chunks of time ranges (15 minutes) are the same for all readings and they do not change for any reason unless requested by system user which is done manually or through user interference
  2. The creation date of each sensor reading record is known at the moment the record is created
  3. Most of the processing effort is consumed on the date-time comparisons followed by the groupings
This helped me to decide that:
  1. The chunks of time ranges should be defined only once at the system launch and the results should be kept physically in a table to be used as a quick cached reference
  2. The time chunk to which each sensor reading record belongs should be decided at the moment the record is created
  3. This way each record can have an id of the time chunk it belongs to which will cause the grouping process to be much more easier and effortless


Create Database
USE [master]
GO

CREATE DATABASE [DayChunks] ON  PRIMARY 
( NAME = N'DayChunks', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\DayChunks.mdf' , SIZE = 4096KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
 LOG ON 
( NAME = N'DayChunks_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\DayChunks_log.ldf' , SIZE = 1280KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO

ALTER DATABASE [DayChunks] SET COMPATIBILITY_LEVEL = 100
GO

IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [DayChunks].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO

ALTER DATABASE [DayChunks] SET ANSI_NULL_DEFAULT OFF 
GO

ALTER DATABASE [DayChunks] SET ANSI_NULLS OFF 
GO

ALTER DATABASE [DayChunks] SET ANSI_PADDING OFF 
GO

ALTER DATABASE [DayChunks] SET ANSI_WARNINGS OFF 
GO

ALTER DATABASE [DayChunks] SET ARITHABORT OFF 
GO

ALTER DATABASE [DayChunks] SET AUTO_CLOSE OFF 
GO

ALTER DATABASE [DayChunks] SET AUTO_CREATE_STATISTICS ON 
GO

ALTER DATABASE [DayChunks] SET AUTO_SHRINK OFF 
GO

ALTER DATABASE [DayChunks] SET AUTO_UPDATE_STATISTICS ON 
GO

ALTER DATABASE [DayChunks] SET CURSOR_CLOSE_ON_COMMIT OFF 
GO

ALTER DATABASE [DayChunks] SET CURSOR_DEFAULT  GLOBAL 
GO

ALTER DATABASE [DayChunks] SET CONCAT_NULL_YIELDS_NULL OFF 
GO

ALTER DATABASE [DayChunks] SET NUMERIC_ROUNDABORT OFF 
GO

ALTER DATABASE [DayChunks] SET QUOTED_IDENTIFIER OFF 
GO

ALTER DATABASE [DayChunks] SET RECURSIVE_TRIGGERS OFF 
GO

ALTER DATABASE [DayChunks] SET  DISABLE_BROKER 
GO

ALTER DATABASE [DayChunks] SET AUTO_UPDATE_STATISTICS_ASYNC OFF 
GO

ALTER DATABASE [DayChunks] SET DATE_CORRELATION_OPTIMIZATION OFF 
GO

ALTER DATABASE [DayChunks] SET TRUSTWORTHY OFF 
GO

ALTER DATABASE [DayChunks] SET ALLOW_SNAPSHOT_ISOLATION OFF 
GO

ALTER DATABASE [DayChunks] SET PARAMETERIZATION SIMPLE 
GO

ALTER DATABASE [DayChunks] SET READ_COMMITTED_SNAPSHOT OFF 
GO

ALTER DATABASE [DayChunks] SET HONOR_BROKER_PRIORITY OFF 
GO

ALTER DATABASE [DayChunks] SET  READ_WRITE 
GO

ALTER DATABASE [DayChunks] SET RECOVERY FULL 
GO

ALTER DATABASE [DayChunks] SET  MULTI_USER 
GO

ALTER DATABASE [DayChunks] SET PAGE_VERIFY CHECKSUM  
GO

ALTER DATABASE [DayChunks] SET DB_CHAINING OFF 
GO


Create Tables
USE [DayChunks]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[Settings](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [Name] [nvarchar](max) NOT NULL,
 [Value] [nvarchar](max) NULL,
 CONSTRAINT [PK_Settings] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO





CREATE TABLE [dbo].[DayChunks](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [ChunkNumOfMinutes] [int] NOT NULL,
 [ChunkNumOfSeconds] [int] NOT NULL,
 [ChunkStart] [time](0) NOT NULL,
 [ChunkEnd] [time](0) NOT NULL,
 [IsLastChunk] [bit] NOT NULL,
 CONSTRAINT [PK_DayChunks] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[DayChunks] ADD  CONSTRAINT [DF_DayChunks_IsLastChunk]  DEFAULT (0) FOR [IsLastChunk]
GO




CREATE TABLE [dbo].[SensorReadings](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [CreationDateTime] [datetime] NOT NULL,
 [Reading] [float] NOT NULL,
 [DayChunkID] [int] NULL,
 [DayDate] [date] NOT NULL,
 CONSTRAINT [PK_SensorReadings] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[SensorReadings] ADD  CONSTRAINT [DF_SensorReadings_CreationDateTime]  DEFAULT (getdate()) FOR [CreationDateTime]
GO

ALTER TABLE [dbo].[SensorReadings] ADD  CONSTRAINT [DF_SensorReadings_DayDate]  DEFAULT (getdate()) FOR [DayDate]
GO





CREATE TABLE [dbo].[AggregatedSensorReadings](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [DayChunkID] [int] NULL,
 [Reading] [float] NOT NULL,
 [DayDate] [date] NOT NULL,
 CONSTRAINT [PK_AggregatedSensorReadings] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

Create Routines
USE [DayChunks]
GO


SET ANSI_NULLS ON
GO

IF EXISTS
(
 SELECT * FROM dbo.sysobjects
 WHERE id = object_id(N'[dbo].[SetDayChunksSettings]')
 AND OBJECTPROPERTY(id, N'IsProcedure') = 1
)
 
DROP PROCEDURE [dbo].[SetDayChunksSettings]

GO
CREATE PROCEDURE [dbo].[SetDayChunksSettings]
(
    @ChunksNumOfMinutes INT
 , @ChunksNumOfSeconds INT
 , @ChunksStartTime TIME(0)
)
AS
BEGIN
 DELETE FROM Settings
 WHERE Name IN
 (
  'DayChunksStartTime'
  , 'DayChunksNumOfMinutes'
  , 'DayChunksNumOfSeconds'
 )
 
 INSERT INTO Settings
 (
  Name
  , Value
 )
 VALUES
 (
  'DayChunksStartTime'
  , CAST(@ChunksStartTime AS NVARCHAR(MAX))
 )
 ,
 (
  'DayChunksNumOfMinutes'
  , CAST(@ChunksNumOfMinutes AS NVARCHAR(MAX))
 )
 ,
 (
  'DayChunksNumOfSeconds'
  , CAST(@ChunksNumOfSeconds AS NVARCHAR(MAX))
 )
END
GO





IF EXISTS
(
 SELECT * FROM dbo.sysobjects
 WHERE id = object_id(N'[dbo].[CreateDayChunksBasedOnInput]')
 AND OBJECTPROPERTY(id, N'IsProcedure') = 1
)
 
DROP PROCEDURE [dbo].[CreateDayChunksBasedOnInput]

GO
CREATE PROCEDURE [dbo].[CreateDayChunksBasedOnInput]
(
    @ChunksNumOfMinutes INT
 , @ChunksNumOfSeconds INT
 , @ChunksStartTime TIME(0)
)
AS
BEGIN
 DECLARE @MaxEnd TIME(0) = DATEADD(minute, (-1 * @ChunksNumOfMinutes), @ChunksStartTime)
 SET @MaxEnd = DATEADD(second, (-1 * @ChunksNumOfSeconds), @MaxEnd)

 DECLARE @ChunkStart TIME(0) = @ChunksStartTime

 DECLARE @ChunkEnd TIME(0) = DATEADD(minute, @ChunksNumOfMinutes, @ChunksStartTime)
 SET @ChunkEnd = DATEADD(second, @ChunksNumOfSeconds, @ChunkEnd)
 
 TRUNCATE TABLE DayChunks
 
 WHILE (@ChunkEnd < @MaxEnd)
 BEGIN
  INSERT INTO DayChunks
  (
   ChunkNumOfMinutes
   , ChunkNumOfSeconds
   , ChunkStart
   , ChunkEnd
  )
  VALUES
  (
   @ChunksNumOfMinutes
   , @ChunksNumOfSeconds
   , @ChunkStart
   , @ChunkEnd
  )
     
  SET @ChunkStart = DATEADD(minute, @ChunksNumOfMinutes, @ChunkStart)
  SET @ChunkStart = DATEADD(second, @ChunksNumOfSeconds, @ChunkStart)
     
  SET @ChunkEnd = DATEADD(minute, @ChunksNumOfMinutes, @ChunkEnd)
  SET @ChunkEnd = DATEADD(second, @ChunksNumOfSeconds, @ChunkEnd)
 END

 IF(@ChunkEnd = @MaxEnd OR @ChunkEnd > @MaxEnd)
 BEGIN
  IF(@ChunkEnd > @MaxEnd)
  BEGIN
   SET @ChunkEnd = @MaxEnd
  END
  
  INSERT INTO DayChunks
  (
   ChunkNumOfMinutes
   , ChunkNumOfSeconds
   , ChunkStart
   , ChunkEnd
  )
  VALUES
  (
   @ChunksNumOfMinutes
   , @ChunksNumOfSeconds
   , @ChunkStart
   , @ChunkEnd
  )
  
  INSERT INTO DayChunks
  (
   ChunkNumOfMinutes
   , ChunkNumOfSeconds
   , ChunkStart
   , ChunkEnd
   , IsLastChunk
  )
  VALUES
  (
   @ChunksNumOfMinutes
   , @ChunksNumOfSeconds
   , @MaxEnd
   , @ChunksStartTime
   , 1
  )
 END
END
GO





IF EXISTS
(
 SELECT * FROM dbo.sysobjects
 WHERE id = object_id(N'[dbo].[CreateDayChunksBasedOnSettings]')
 AND OBJECTPROPERTY(id, N'IsProcedure') = 1
)
 
DROP PROCEDURE [dbo].[CreateDayChunksBasedOnSettings]
GO

CREATE PROCEDURE [dbo].[CreateDayChunksBasedOnSettings]
AS
BEGIN
 DECLARE @ChunksNumOfMinutes INT
 DECLARE @ChunksNumOfSeconds INT
 DECLARE @ChunksStartTime TIME(0)
 
 IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksNumOfMinutes')
 BEGIN
  SELECT TOP 1 @ChunksNumOfMinutes = 
   CASE
    WHEN Value IS NULL THEN CAST(0 AS INT)
    ELSE CAST (Value AS INT)
   END
  FROM Settings
  WHERE Name = 'DayChunksNumOfMinutes'
 END
 ELSE
 BEGIN
  SET @ChunksNumOfMinutes = 0
 END

 IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksNumOfSeconds')
 BEGIN
  SELECT TOP 1 @ChunksNumOfSeconds = 
   CASE
    WHEN Value IS NULL THEN CAST(0 AS INT)
    ELSE CAST (Value AS INT)
   END
  FROM Settings
  WHERE Name = 'DayChunksNumOfSeconds'
 END
 ELSE
 BEGIN
  SET @ChunksNumOfSeconds = 0
 END

 IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksStartTime')
 BEGIN
  SELECT TOP 1 @ChunksStartTime = 
   CASE
    WHEN Value IS NULL THEN CAST('00:00:00' AS TIME(0))
    ELSE CAST (Value AS TIME(0))
   END
  FROM Settings
  WHERE Name = 'DayChunksStartTime'
 END
 ELSE
 BEGIN
  SET @ChunksStartTime = CAST('00:00:00' AS TIME(0))
 END
 
 EXEC [dbo].[CreateDayChunksBasedOnInput] @ChunksNumOfMinutes, @ChunksNumOfSeconds, @ChunksStartTime
END
GO





SET ANSI_NULLS ON
GO

IF EXISTS
(
 SELECT *
 FROM INFORMATION_SCHEMA.ROUTINES
 WHERE ROUTINE_NAME = 'GetDateTimeDayChunkID'
 AND ROUTINE_SCHEMA = 'dbo'
 AND ROUTINE_TYPE = 'FUNCTION'
)
 
DROP FUNCTION [dbo].[GetDateTimeDayChunkID]
GO

CREATE FUNCTION [dbo].[GetDateTimeDayChunkID] 
(
 @InputDateTime DATETIME
)
RETURNS INT
AS
BEGIN
 DECLARE @Result INT
 DECLARE @InputTime TIME(0) = @InputDateTime
 
 SELECT TOP 1 @Result = ID
 FROM DayChunks
 WHERE @InputTime BETWEEN ChunkStart AND ChunkEnd
 
 IF(@Result IS NULL)
 BEGIN
  SET @Result = (SELECT TOP 1 ID FROM DayChunks WHERE IsLastChunk = 1)
 END
 
 RETURN @Result
END
GO

Testing Solution
The script below tests the solution by simulating sensor readings every one second and the bulk aggregation process is held every one minute.
USE [DayChunks]
GO


-- Setting day chunks settings
-- DayChunksStartTime = '00:00:00'
-- DayChunksNumOfMinutes = 1
-- DayChunksNumOfSeconds = 0
-- This means that the day starts at 00:00:00 and is splitted into chunks each is 1 minute long
EXEC [dbo].[SetDayChunksSettings] 1, 0, '00:00:00'

-- Splitting the day into proper chunks based on the settings set in the previous step
EXEC [dbo].[CreateDayChunksBasedOnSettings]
GO


-- Simulating sensor readings every one second
TRUNCATE TABLE SensorReadings
GO

DECLARE @i INT;
DECLARE @SensorReading FLOAT
DECLARE @Now DATETIME

SET @i = 1;
WHILE (@i <= 90)
BEGIN
 WAITFOR DELAY '00:00:01'
 
 DECLARE @CreationDateTime1 DATETIME = GETDATE()
 DECLARE @RandomReading1 FLOAT
 SELECT TOP 1 @RandomReading1 = RAND()
 
 INSERT INTO SensorReadings
 (
  Reading
  , CreationDateTime
  , DayChunkID
 )
 VALUES
 (
  @RandomReading1
  , @CreationDateTime1
  , [dbo].[GetDateTimeDayChunkID](@CreationDateTime1)
 )
 
 SET  @i = @i + 1;
END


SELECT *
FROM SensorReadings


-- Applying aggregation on sensor readings
TRUNCATE TABLE AggregatedSensorReadings
GO

INSERT INTO AggregatedSensorReadings
(
 DayChunkID
 , Reading
 , DayDate
)
SELECT DayChunkID
, AVG(Reading)
, DayDate
FROM SensorReadings
GROUP BY DayDate, DayChunkID


SELECT AggregatedSensorReadings.*
, DayChunks.ChunkStart
, DayChunks.ChunkEnd
FROM AggregatedSensorReadings
LEFT OUTER JOIN DayChunks
ON DayChunks.ID = AggregatedSensorReadings.DayChunkID

Result

 Settings


DayChunks 1




DayChunks 2


SensorReadings 1


SensorReadings 2


AggregatedSensorReadings



That's it. This is just an example of an application of the main concept of splitting day time into chunks. You can still use this concept and adapt the code to your business needs.


Hope this will help you someday.
Good Luck.