Splitting Daytime Into Chunks To Enhance SQL Bulk Time-based Operations Performance
Posted by 1/23/2015 03:36:00 AM with No comments
on The best way to understand what this post is about is to start with a real scenario.
One of my colleagues was building a system which controls some motors using some readings coming from electronic sensors. The problem was that the frequency of the sensors readings was so high that the system had to insert a few reading records into the SQL database in few milliseconds. This means that in just a second the system had to insert hundreds or even thousands of records into the database.
This for sure was a big challenge and due to some customer needs and business requirements the approved solution was to compress the sensor readings per a fixed time range at the end of the day. This means that at the end of each day the sensor readings should be divided into groups where each group is limited by a time range of half an hour for example and then the readings of each group should be aggregated by taking the average. This way we will end up having hundreds of readings records into the database instead of thousands or even more. Till now everything is somehow good and logical.
A new challenge aroused while working on the SQL routine which will carry out the bulk aggregation process. The performance of this process was not promising. The process used to take a huge amount of time due to the heavy date-time comparisons and groupings. This was what encouraged me to jump in and try to help.
Before applying any changes the process was going like this; getting all sensor readings which were created between 00:00:00 and 00:15:00 and then taking the average for these readings, then, getting all sensor readings which were created between 00:15:00 and 00:30:00 and then taking the average for these readings, and so on......
As you can see this amount of processing and grouping based on time is huge and this was causing the whole process to be a nightmare performance wise.
One of the best ways to enhance the performance of bulk actions is to try to break the whole one big process into small controllable ones. Then, observe each small process and check whether it, for one record, depends only on the record itself and doesn't need to know anything about other records or not. If this happens, then this small process could be held on single record basis at early stage which could be the moment the record was created or something else based on the business requirements. This will help split the huge processing effort on different moments and stages of the system business life cycle and workflow which will make it more controllable and less recognizable.
Now, to apply the same concept on the problem we have on hand right now we should notice that:
- The chunks of time ranges (15 minutes) are the same for all readings and they do not change for any reason unless requested by system user which is done manually or through user interference
- The creation date of each sensor reading record is known at the moment the record is created
- Most of the processing effort is consumed on the date-time comparisons followed by the groupings
- The chunks of time ranges should be defined only once at the system launch and the results should be kept physically in a table to be used as a quick cached reference
- The time chunk to which each sensor reading record belongs should be decided at the moment the record is created
- This way each record can have an id of the time chunk it belongs to which will cause the grouping process to be much more easier and effortless
Create Database
USE [master] GO CREATE DATABASE [DayChunks] ON PRIMARY ( NAME = N'DayChunks', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\DayChunks.mdf' , SIZE = 4096KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ) LOG ON ( NAME = N'DayChunks_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\DayChunks_log.ldf' , SIZE = 1280KB , MAXSIZE = 2048GB , FILEGROWTH = 10%) GO ALTER DATABASE [DayChunks] SET COMPATIBILITY_LEVEL = 100 GO IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled')) begin EXEC [DayChunks].[dbo].[sp_fulltext_database] @action = 'enable' end GO ALTER DATABASE [DayChunks] SET ANSI_NULL_DEFAULT OFF GO ALTER DATABASE [DayChunks] SET ANSI_NULLS OFF GO ALTER DATABASE [DayChunks] SET ANSI_PADDING OFF GO ALTER DATABASE [DayChunks] SET ANSI_WARNINGS OFF GO ALTER DATABASE [DayChunks] SET ARITHABORT OFF GO ALTER DATABASE [DayChunks] SET AUTO_CLOSE OFF GO ALTER DATABASE [DayChunks] SET AUTO_CREATE_STATISTICS ON GO ALTER DATABASE [DayChunks] SET AUTO_SHRINK OFF GO ALTER DATABASE [DayChunks] SET AUTO_UPDATE_STATISTICS ON GO ALTER DATABASE [DayChunks] SET CURSOR_CLOSE_ON_COMMIT OFF GO ALTER DATABASE [DayChunks] SET CURSOR_DEFAULT GLOBAL GO ALTER DATABASE [DayChunks] SET CONCAT_NULL_YIELDS_NULL OFF GO ALTER DATABASE [DayChunks] SET NUMERIC_ROUNDABORT OFF GO ALTER DATABASE [DayChunks] SET QUOTED_IDENTIFIER OFF GO ALTER DATABASE [DayChunks] SET RECURSIVE_TRIGGERS OFF GO ALTER DATABASE [DayChunks] SET DISABLE_BROKER GO ALTER DATABASE [DayChunks] SET AUTO_UPDATE_STATISTICS_ASYNC OFF GO ALTER DATABASE [DayChunks] SET DATE_CORRELATION_OPTIMIZATION OFF GO ALTER DATABASE [DayChunks] SET TRUSTWORTHY OFF GO ALTER DATABASE [DayChunks] SET ALLOW_SNAPSHOT_ISOLATION OFF GO ALTER DATABASE [DayChunks] SET PARAMETERIZATION SIMPLE GO ALTER DATABASE [DayChunks] SET READ_COMMITTED_SNAPSHOT OFF GO ALTER DATABASE [DayChunks] SET HONOR_BROKER_PRIORITY OFF GO ALTER DATABASE [DayChunks] SET READ_WRITE GO ALTER DATABASE [DayChunks] SET RECOVERY FULL GO ALTER DATABASE [DayChunks] SET MULTI_USER GO ALTER DATABASE [DayChunks] SET PAGE_VERIFY CHECKSUM GO ALTER DATABASE [DayChunks] SET DB_CHAINING OFF GO
Create Tables
USE [DayChunks] GO SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Settings]( [ID] [int] IDENTITY(1,1) NOT NULL, [Name] [nvarchar](max) NOT NULL, [Value] [nvarchar](max) NULL, CONSTRAINT [PK_Settings] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE TABLE [dbo].[DayChunks]( [ID] [int] IDENTITY(1,1) NOT NULL, [ChunkNumOfMinutes] [int] NOT NULL, [ChunkNumOfSeconds] [int] NOT NULL, [ChunkStart] [time](0) NOT NULL, [ChunkEnd] [time](0) NOT NULL, [IsLastChunk] [bit] NOT NULL, CONSTRAINT [PK_DayChunks] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO ALTER TABLE [dbo].[DayChunks] ADD CONSTRAINT [DF_DayChunks_IsLastChunk] DEFAULT (0) FOR [IsLastChunk] GO CREATE TABLE [dbo].[SensorReadings]( [ID] [int] IDENTITY(1,1) NOT NULL, [CreationDateTime] [datetime] NOT NULL, [Reading] [float] NOT NULL, [DayChunkID] [int] NULL, [DayDate] [date] NOT NULL, CONSTRAINT [PK_SensorReadings] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO ALTER TABLE [dbo].[SensorReadings] ADD CONSTRAINT [DF_SensorReadings_CreationDateTime] DEFAULT (getdate()) FOR [CreationDateTime] GO ALTER TABLE [dbo].[SensorReadings] ADD CONSTRAINT [DF_SensorReadings_DayDate] DEFAULT (getdate()) FOR [DayDate] GO CREATE TABLE [dbo].[AggregatedSensorReadings]( [ID] [int] IDENTITY(1,1) NOT NULL, [DayChunkID] [int] NULL, [Reading] [float] NOT NULL, [DayDate] [date] NOT NULL, CONSTRAINT [PK_AggregatedSensorReadings] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO
Create Routines
USE [DayChunks] GO SET ANSI_NULLS ON GO IF EXISTS ( SELECT * FROM dbo.sysobjects WHERE id = object_id(N'[dbo].[SetDayChunksSettings]') AND OBJECTPROPERTY(id, N'IsProcedure') = 1 ) DROP PROCEDURE [dbo].[SetDayChunksSettings] GO CREATE PROCEDURE [dbo].[SetDayChunksSettings] ( @ChunksNumOfMinutes INT , @ChunksNumOfSeconds INT , @ChunksStartTime TIME(0) ) AS BEGIN DELETE FROM Settings WHERE Name IN ( 'DayChunksStartTime' , 'DayChunksNumOfMinutes' , 'DayChunksNumOfSeconds' ) INSERT INTO Settings ( Name , Value ) VALUES ( 'DayChunksStartTime' , CAST(@ChunksStartTime AS NVARCHAR(MAX)) ) , ( 'DayChunksNumOfMinutes' , CAST(@ChunksNumOfMinutes AS NVARCHAR(MAX)) ) , ( 'DayChunksNumOfSeconds' , CAST(@ChunksNumOfSeconds AS NVARCHAR(MAX)) ) END GO IF EXISTS ( SELECT * FROM dbo.sysobjects WHERE id = object_id(N'[dbo].[CreateDayChunksBasedOnInput]') AND OBJECTPROPERTY(id, N'IsProcedure') = 1 ) DROP PROCEDURE [dbo].[CreateDayChunksBasedOnInput] GO CREATE PROCEDURE [dbo].[CreateDayChunksBasedOnInput] ( @ChunksNumOfMinutes INT , @ChunksNumOfSeconds INT , @ChunksStartTime TIME(0) ) AS BEGIN DECLARE @MaxEnd TIME(0) = DATEADD(minute, (-1 * @ChunksNumOfMinutes), @ChunksStartTime) SET @MaxEnd = DATEADD(second, (-1 * @ChunksNumOfSeconds), @MaxEnd) DECLARE @ChunkStart TIME(0) = @ChunksStartTime DECLARE @ChunkEnd TIME(0) = DATEADD(minute, @ChunksNumOfMinutes, @ChunksStartTime) SET @ChunkEnd = DATEADD(second, @ChunksNumOfSeconds, @ChunkEnd) TRUNCATE TABLE DayChunks WHILE (@ChunkEnd < @MaxEnd) BEGIN INSERT INTO DayChunks ( ChunkNumOfMinutes , ChunkNumOfSeconds , ChunkStart , ChunkEnd ) VALUES ( @ChunksNumOfMinutes , @ChunksNumOfSeconds , @ChunkStart , @ChunkEnd ) SET @ChunkStart = DATEADD(minute, @ChunksNumOfMinutes, @ChunkStart) SET @ChunkStart = DATEADD(second, @ChunksNumOfSeconds, @ChunkStart) SET @ChunkEnd = DATEADD(minute, @ChunksNumOfMinutes, @ChunkEnd) SET @ChunkEnd = DATEADD(second, @ChunksNumOfSeconds, @ChunkEnd) END IF(@ChunkEnd = @MaxEnd OR @ChunkEnd > @MaxEnd) BEGIN IF(@ChunkEnd > @MaxEnd) BEGIN SET @ChunkEnd = @MaxEnd END INSERT INTO DayChunks ( ChunkNumOfMinutes , ChunkNumOfSeconds , ChunkStart , ChunkEnd ) VALUES ( @ChunksNumOfMinutes , @ChunksNumOfSeconds , @ChunkStart , @ChunkEnd ) INSERT INTO DayChunks ( ChunkNumOfMinutes , ChunkNumOfSeconds , ChunkStart , ChunkEnd , IsLastChunk ) VALUES ( @ChunksNumOfMinutes , @ChunksNumOfSeconds , @MaxEnd , @ChunksStartTime , 1 ) END END GO IF EXISTS ( SELECT * FROM dbo.sysobjects WHERE id = object_id(N'[dbo].[CreateDayChunksBasedOnSettings]') AND OBJECTPROPERTY(id, N'IsProcedure') = 1 ) DROP PROCEDURE [dbo].[CreateDayChunksBasedOnSettings] GO CREATE PROCEDURE [dbo].[CreateDayChunksBasedOnSettings] AS BEGIN DECLARE @ChunksNumOfMinutes INT DECLARE @ChunksNumOfSeconds INT DECLARE @ChunksStartTime TIME(0) IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksNumOfMinutes') BEGIN SELECT TOP 1 @ChunksNumOfMinutes = CASE WHEN Value IS NULL THEN CAST(0 AS INT) ELSE CAST (Value AS INT) END FROM Settings WHERE Name = 'DayChunksNumOfMinutes' END ELSE BEGIN SET @ChunksNumOfMinutes = 0 END IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksNumOfSeconds') BEGIN SELECT TOP 1 @ChunksNumOfSeconds = CASE WHEN Value IS NULL THEN CAST(0 AS INT) ELSE CAST (Value AS INT) END FROM Settings WHERE Name = 'DayChunksNumOfSeconds' END ELSE BEGIN SET @ChunksNumOfSeconds = 0 END IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksStartTime') BEGIN SELECT TOP 1 @ChunksStartTime = CASE WHEN Value IS NULL THEN CAST('00:00:00' AS TIME(0)) ELSE CAST (Value AS TIME(0)) END FROM Settings WHERE Name = 'DayChunksStartTime' END ELSE BEGIN SET @ChunksStartTime = CAST('00:00:00' AS TIME(0)) END EXEC [dbo].[CreateDayChunksBasedOnInput] @ChunksNumOfMinutes, @ChunksNumOfSeconds, @ChunksStartTime END GO SET ANSI_NULLS ON GO IF EXISTS ( SELECT * FROM INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_NAME = 'GetDateTimeDayChunkID' AND ROUTINE_SCHEMA = 'dbo' AND ROUTINE_TYPE = 'FUNCTION' ) DROP FUNCTION [dbo].[GetDateTimeDayChunkID] GO CREATE FUNCTION [dbo].[GetDateTimeDayChunkID] ( @InputDateTime DATETIME ) RETURNS INT AS BEGIN DECLARE @Result INT DECLARE @InputTime TIME(0) = @InputDateTime SELECT TOP 1 @Result = ID FROM DayChunks WHERE @InputTime BETWEEN ChunkStart AND ChunkEnd IF(@Result IS NULL) BEGIN SET @Result = (SELECT TOP 1 ID FROM DayChunks WHERE IsLastChunk = 1) END RETURN @Result END GO
Testing Solution
The script below tests the solution by simulating sensor readings every one second and the bulk aggregation process is held every one minute.
USE [DayChunks] GO -- Setting day chunks settings -- DayChunksStartTime = '00:00:00' -- DayChunksNumOfMinutes = 1 -- DayChunksNumOfSeconds = 0 -- This means that the day starts at 00:00:00 and is splitted into chunks each is 1 minute long EXEC [dbo].[SetDayChunksSettings] 1, 0, '00:00:00' -- Splitting the day into proper chunks based on the settings set in the previous step EXEC [dbo].[CreateDayChunksBasedOnSettings] GO -- Simulating sensor readings every one second TRUNCATE TABLE SensorReadings GO DECLARE @i INT; DECLARE @SensorReading FLOAT DECLARE @Now DATETIME SET @i = 1; WHILE (@i <= 90) BEGIN WAITFOR DELAY '00:00:01' DECLARE @CreationDateTime1 DATETIME = GETDATE() DECLARE @RandomReading1 FLOAT SELECT TOP 1 @RandomReading1 = RAND() INSERT INTO SensorReadings ( Reading , CreationDateTime , DayChunkID ) VALUES ( @RandomReading1 , @CreationDateTime1 , [dbo].[GetDateTimeDayChunkID](@CreationDateTime1) ) SET @i = @i + 1; END SELECT * FROM SensorReadings -- Applying aggregation on sensor readings TRUNCATE TABLE AggregatedSensorReadings GO INSERT INTO AggregatedSensorReadings ( DayChunkID , Reading , DayDate ) SELECT DayChunkID , AVG(Reading) , DayDate FROM SensorReadings GROUP BY DayDate, DayChunkID SELECT AggregatedSensorReadings.* , DayChunks.ChunkStart , DayChunks.ChunkEnd FROM AggregatedSensorReadings LEFT OUTER JOIN DayChunks ON DayChunks.ID = AggregatedSensorReadings.DayChunkID
Result
Settings
DayChunks 1
DayChunks 2
SensorReadings 1
SensorReadings 2
AggregatedSensorReadings
That's it. This is just an example of an application of the main concept of splitting day time into chunks. You can still use this concept and adapt the code to your business needs.
Hope this will help you someday.
Good Luck.