The best way to understand what this post is about is to start with a real scenario.
One of my colleagues was building a system that controls motors based on readings coming from electronic sensors. The problem was that the sensor readings arrived so fast that the system had to insert several reading records into the SQL database every few milliseconds. In other words, within a single second the system had to insert hundreds or even thousands of records into the database.
This was certainly a big challenge, and due to customer needs and business requirements the approved solution was to compress the sensor readings per fixed time range at the end of the day. In other words, at the end of each day the sensor readings are divided into groups, each covering a fixed time range (half an hour, for example), and the readings of each group are then aggregated by taking their average. This way we end up with hundreds of reading records in the database instead of thousands or more. So far, everything is reasonable and logical.
A new challenge arose while working on the SQL routine that carries out this bulk aggregation process. Its performance was not promising: the process took a huge amount of time because of the heavy date-time comparisons and groupings. This is what encouraged me to jump in and try to help.
Before applying any changes, the process went like this: get all sensor readings created between 00:00:00 and 00:15:00 and take their average, then get all sensor readings created between 00:15:00 and 00:30:00 and take their average, and so on for the rest of the day.
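To make the cost concrete, the original routine was doing something along the lines of the sketch below. This is a rough reconstruction, not the actual production code; it uses the SensorReadings table defined later in this post, and the real routine repeated this pattern for every time range of the day.
DECLARE @DayDate DATE = '2016-01-01' -- example day only
-- One query per time range, each with its own date-time comparisons
SELECT AVG(Reading)
FROM SensorReadings
WHERE DayDate = @DayDate
AND CAST(CreationDateTime AS TIME(0)) >= '00:00:00'
AND CAST(CreationDateTime AS TIME(0)) < '00:15:00'
-- ...then again for 00:15:00 to 00:30:00, 00:30:00 to 00:45:00, and so on,
-- for all 96 fifteen-minute ranges of a full day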
As you can see, this amount of time-based filtering and grouping is huge, and it turned the whole process into a performance nightmare.
One of the best ways to improve the performance of bulk actions is to break the one big process into small, controllable ones. Then look at each small process and check whether, for a single record, it depends only on that record and needs no knowledge of any other records. If so, that small process can be carried out per record at an early stage, which could be the moment the record is created or some other point dictated by the business requirements. This spreads the huge processing effort across different moments and stages of the system's business life cycle and workflow, which makes it more controllable and far less noticeable.
Now, to apply the same concept to the problem at hand, we should notice that:
- The time-range chunks (15 minutes each) are the same for all readings, and they never change unless a system user explicitly requests it through manual intervention
- The creation date of each sensor reading record is known at the moment the record is created
- Most of the processing effort is consumed by the date-time comparisons, followed by the groupings
This helped me to decide that:
- The time-range chunks should be defined only once, at system launch, and the results should be stored physically in a table to serve as a quick cached reference
- The time chunk to which each sensor reading record belongs should be decided at the moment the record is created
- This way each record carries the ID of the time chunk it belongs to, which makes the end-of-day grouping far easier and almost effortless (see the sketch after this list)
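To see the effect of this last decision, once each reading already carries its DayChunkID, the end-of-day aggregation collapses to a plain GROUP BY with no date-time comparisons at all. This is the same query the test script at the end of this post runs:
SELECT DayDate
, DayChunkID
, AVG(Reading) AS AverageReading
FROM SensorReadings
GROUP BY DayDate, DayChunkID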
Create Database
USE [master]
GO
CREATE DATABASE [DayChunks] ON PRIMARY
( NAME = N'DayChunks', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\DayChunks.mdf' , SIZE = 4096KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB )
LOG ON
( NAME = N'DayChunks_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\DayChunks_log.ldf' , SIZE = 1280KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO
ALTER DATABASE [DayChunks] SET COMPATIBILITY_LEVEL = 100
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [DayChunks].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO
ALTER DATABASE [DayChunks] SET ANSI_NULL_DEFAULT OFF
GO
ALTER DATABASE [DayChunks] SET ANSI_NULLS OFF
GO
ALTER DATABASE [DayChunks] SET ANSI_PADDING OFF
GO
ALTER DATABASE [DayChunks] SET ANSI_WARNINGS OFF
GO
ALTER DATABASE [DayChunks] SET ARITHABORT OFF
GO
ALTER DATABASE [DayChunks] SET AUTO_CLOSE OFF
GO
ALTER DATABASE [DayChunks] SET AUTO_CREATE_STATISTICS ON
GO
ALTER DATABASE [DayChunks] SET AUTO_SHRINK OFF
GO
ALTER DATABASE [DayChunks] SET AUTO_UPDATE_STATISTICS ON
GO
ALTER DATABASE [DayChunks] SET CURSOR_CLOSE_ON_COMMIT OFF
GO
ALTER DATABASE [DayChunks] SET CURSOR_DEFAULT GLOBAL
GO
ALTER DATABASE [DayChunks] SET CONCAT_NULL_YIELDS_NULL OFF
GO
ALTER DATABASE [DayChunks] SET NUMERIC_ROUNDABORT OFF
GO
ALTER DATABASE [DayChunks] SET QUOTED_IDENTIFIER OFF
GO
ALTER DATABASE [DayChunks] SET RECURSIVE_TRIGGERS OFF
GO
ALTER DATABASE [DayChunks] SET DISABLE_BROKER
GO
ALTER DATABASE [DayChunks] SET AUTO_UPDATE_STATISTICS_ASYNC OFF
GO
ALTER DATABASE [DayChunks] SET DATE_CORRELATION_OPTIMIZATION OFF
GO
ALTER DATABASE [DayChunks] SET TRUSTWORTHY OFF
GO
ALTER DATABASE [DayChunks] SET ALLOW_SNAPSHOT_ISOLATION OFF
GO
ALTER DATABASE [DayChunks] SET PARAMETERIZATION SIMPLE
GO
ALTER DATABASE [DayChunks] SET READ_COMMITTED_SNAPSHOT OFF
GO
ALTER DATABASE [DayChunks] SET HONOR_BROKER_PRIORITY OFF
GO
ALTER DATABASE [DayChunks] SET READ_WRITE
GO
ALTER DATABASE [DayChunks] SET RECOVERY FULL
GO
ALTER DATABASE [DayChunks] SET MULTI_USER
GO
ALTER DATABASE [DayChunks] SET PAGE_VERIFY CHECKSUM
GO
ALTER DATABASE [DayChunks] SET DB_CHAINING OFF
GO
Create Tables
USE [DayChunks]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Settings](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](max) NOT NULL,
[Value] [nvarchar](max) NULL,
CONSTRAINT [PK_Settings] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DayChunks](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ChunkNumOfMinutes] [int] NOT NULL,
[ChunkNumOfSeconds] [int] NOT NULL,
[ChunkStart] [time](0) NOT NULL,
[ChunkEnd] [time](0) NOT NULL,
[IsLastChunk] [bit] NOT NULL,
CONSTRAINT [PK_DayChunks] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DayChunks] ADD CONSTRAINT [DF_DayChunks_IsLastChunk] DEFAULT (0) FOR [IsLastChunk]
GO
CREATE TABLE [dbo].[SensorReadings](
[ID] [int] IDENTITY(1,1) NOT NULL,
[CreationDateTime] [datetime] NOT NULL,
[Reading] [float] NOT NULL,
[DayChunkID] [int] NULL,
[DayDate] [date] NOT NULL,
CONSTRAINT [PK_SensorReadings] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[SensorReadings] ADD CONSTRAINT [DF_SensorReadings_CreationDateTime] DEFAULT (getdate()) FOR [CreationDateTime]
GO
ALTER TABLE [dbo].[SensorReadings] ADD CONSTRAINT [DF_SensorReadings_DayDate] DEFAULT (getdate()) FOR [DayDate]
GO
CREATE TABLE [dbo].[AggregatedSensorReadings](
[ID] [int] IDENTITY(1,1) NOT NULL,
[DayChunkID] [int] NULL,
[Reading] [float] NOT NULL,
[DayDate] [date] NOT NULL,
CONSTRAINT [PK_AggregatedSensorReadings] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Create Routines
USE [DayChunks]
GO
SET ANSI_NULLS ON
GO
IF EXISTS
(
SELECT * FROM dbo.sysobjects
WHERE id = object_id(N'[dbo].[SetDayChunksSettings]')
AND OBJECTPROPERTY(id, N'IsProcedure') = 1
)
DROP PROCEDURE [dbo].[SetDayChunksSettings]
GO
CREATE PROCEDURE [dbo].[SetDayChunksSettings]
(
@ChunksNumOfMinutes INT
, @ChunksNumOfSeconds INT
, @ChunksStartTime TIME(0)
)
AS
BEGIN
DELETE FROM Settings
WHERE Name IN
(
'DayChunksStartTime'
, 'DayChunksNumOfMinutes'
, 'DayChunksNumOfSeconds'
)
INSERT INTO Settings
(
Name
, Value
)
VALUES
(
'DayChunksStartTime'
, CAST(@ChunksStartTime AS NVARCHAR(MAX))
)
,
(
'DayChunksNumOfMinutes'
, CAST(@ChunksNumOfMinutes AS NVARCHAR(MAX))
)
,
(
'DayChunksNumOfSeconds'
, CAST(@ChunksNumOfSeconds AS NVARCHAR(MAX))
)
END
GO
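As a quick usage note, the 15-minute ranges described in the scenario above would be configured as shown below. The testing section at the end of this post uses one-minute chunks instead, just so the demo finishes quickly.
-- 15-minute chunks, no extra seconds, day starting at midnight
EXEC [dbo].[SetDayChunksSettings] 15, 0, '00:00:00'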
IF EXISTS
(
SELECT * FROM dbo.sysobjects
WHERE id = object_id(N'[dbo].[CreateDayChunksBasedOnInput]')
AND OBJECTPROPERTY(id, N'IsProcedure') = 1
)
DROP PROCEDURE [dbo].[CreateDayChunksBasedOnInput]
GO
CREATE PROCEDURE [dbo].[CreateDayChunksBasedOnInput]
(
@ChunksNumOfMinutes INT
, @ChunksNumOfSeconds INT
, @ChunksStartTime TIME(0)
)
AS
BEGIN
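-- @MaxEnd is the start of the last chunk of the day: the configured start time
-- minus one chunk length (the subtraction wraps around midnight)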
DECLARE @MaxEnd TIME(0) = DATEADD(minute, (-1 * @ChunksNumOfMinutes), @ChunksStartTime)
SET @MaxEnd = DATEADD(second, (-1 * @ChunksNumOfSeconds), @MaxEnd)
DECLARE @ChunkStart TIME(0) = @ChunksStartTime
DECLARE @ChunkEnd TIME(0) = DATEADD(minute, @ChunksNumOfMinutes, @ChunksStartTime)
SET @ChunkEnd = DATEADD(second, @ChunksNumOfSeconds, @ChunkEnd)
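-- Clear any previously generated chunks before regenerating the whole day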
TRUNCATE TABLE DayChunks
WHILE (@ChunkEnd < @MaxEnd)
BEGIN
INSERT INTO DayChunks
(
ChunkNumOfMinutes
, ChunkNumOfSeconds
, ChunkStart
, ChunkEnd
)
VALUES
(
@ChunksNumOfMinutes
, @ChunksNumOfSeconds
, @ChunkStart
, @ChunkEnd
)
SET @ChunkStart = DATEADD(minute, @ChunksNumOfMinutes, @ChunkStart)
SET @ChunkStart = DATEADD(second, @ChunksNumOfSeconds, @ChunkStart)
SET @ChunkEnd = DATEADD(minute, @ChunksNumOfMinutes, @ChunkEnd)
SET @ChunkEnd = DATEADD(second, @ChunksNumOfSeconds, @ChunkEnd)
END
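-- Handle the final two chunks: the regular chunk that ends at @MaxEnd and the
-- wrap-around chunk from @MaxEnd back to the day start time, flagged IsLastChunk = 1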
IF(@ChunkEnd >= @MaxEnd)
BEGIN
IF(@ChunkEnd > @MaxEnd)
BEGIN
SET @ChunkEnd = @MaxEnd
END
INSERT INTO DayChunks
(
ChunkNumOfMinutes
, ChunkNumOfSeconds
, ChunkStart
, ChunkEnd
)
VALUES
(
@ChunksNumOfMinutes
, @ChunksNumOfSeconds
, @ChunkStart
, @ChunkEnd
)
INSERT INTO DayChunks
(
ChunkNumOfMinutes
, ChunkNumOfSeconds
, ChunkStart
, ChunkEnd
, IsLastChunk
)
VALUES
(
@ChunksNumOfMinutes
, @ChunksNumOfSeconds
, @MaxEnd
, @ChunksStartTime
, 1
)
END
END
GO
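To get a feel for what this procedure produces, a quick sanity check (not part of the original scripts) could look like this:
-- Generate 15-minute chunks starting at midnight and inspect the first rows
EXEC [dbo].[CreateDayChunksBasedOnInput] 15, 0, '00:00:00'
SELECT TOP 5 ID, ChunkStart, ChunkEnd, IsLastChunk
FROM DayChunks
ORDER BY ID
-- Expect 00:00:00-00:15:00, 00:15:00-00:30:00, and so on, with the very last row
-- of the table running from 23:45:00 back to 00:00:00 and flagged IsLastChunk = 1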
IF EXISTS
(
SELECT * FROM dbo.sysobjects
WHERE id = object_id(N'[dbo].[CreateDayChunksBasedOnSettings]')
AND OBJECTPROPERTY(id, N'IsProcedure') = 1
)
DROP PROCEDURE [dbo].[CreateDayChunksBasedOnSettings]
GO
CREATE PROCEDURE [dbo].[CreateDayChunksBasedOnSettings]
AS
BEGIN
DECLARE @ChunksNumOfMinutes INT
DECLARE @ChunksNumOfSeconds INT
DECLARE @ChunksStartTime TIME(0)
IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksNumOfMinutes')
BEGIN
SELECT TOP 1 @ChunksNumOfMinutes =
CASE
WHEN Value IS NULL THEN CAST(0 AS INT)
ELSE CAST (Value AS INT)
END
FROM Settings
WHERE Name = 'DayChunksNumOfMinutes'
END
ELSE
BEGIN
SET @ChunksNumOfMinutes = 0
END
IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksNumOfSeconds')
BEGIN
SELECT TOP 1 @ChunksNumOfSeconds =
CASE
WHEN Value IS NULL THEN CAST(0 AS INT)
ELSE CAST (Value AS INT)
END
FROM Settings
WHERE Name = 'DayChunksNumOfSeconds'
END
ELSE
BEGIN
SET @ChunksNumOfSeconds = 0
END
IF EXISTS (SELECT TOP 1 ID FROM Settings WHERE Name = 'DayChunksStartTime')
BEGIN
SELECT TOP 1 @ChunksStartTime =
CASE
WHEN Value IS NULL THEN CAST('00:00:00' AS TIME(0))
ELSE CAST (Value AS TIME(0))
END
FROM Settings
WHERE Name = 'DayChunksStartTime'
END
ELSE
BEGIN
SET @ChunksStartTime = CAST('00:00:00' AS TIME(0))
END
EXEC [dbo].[CreateDayChunksBasedOnInput] @ChunksNumOfMinutes, @ChunksNumOfSeconds, @ChunksStartTime
END
GO
SET ANSI_NULLS ON
GO
IF EXISTS
(
SELECT *
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_NAME = 'GetDateTimeDayChunkID'
AND ROUTINE_SCHEMA = 'dbo'
AND ROUTINE_TYPE = 'FUNCTION'
)
DROP FUNCTION [dbo].[GetDateTimeDayChunkID]
GO
CREATE FUNCTION [dbo].[GetDateTimeDayChunkID]
(
@InputDateTime DATETIME
)
RETURNS INT
AS
BEGIN
DECLARE @Result INT
DECLARE @InputTime TIME(0) = @InputDateTime
SELECT TOP 1 @Result = ID
FROM DayChunks
WHERE @InputTime BETWEEN ChunkStart AND ChunkEnd
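-- A time that falls in the wrap-around range (whose end time is earlier than its
-- start time) will not match the BETWEEN above, so fall back to the chunk flagged
-- as the last one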
IF(@Result IS NULL)
BEGIN
SET @Result = (SELECT TOP 1 ID FROM DayChunks WHERE IsLastChunk = 1)
END
RETURN @Result
END
GO
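A quick way to see the function in action, using an illustrative timestamp:
-- With 15-minute chunks starting at midnight, this returns the ID of the
-- chunk covering 10:00:00 to 10:15:00
SELECT [dbo].[GetDateTimeDayChunkID](CAST('2016-01-01T10:07:30' AS DATETIME)) AS DayChunkID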
Testing Solution
The script below tests the solution by splitting the day into one-minute chunks, simulating one sensor reading per second for 90 seconds, and then running the bulk aggregation, which produces one averaged record per one-minute chunk.
USE [DayChunks]
GO
-- Setting day chunks settings
-- DayChunksStartTime = '00:00:00'
-- DayChunksNumOfMinutes = 1
-- DayChunksNumOfSeconds = 0
-- This means that the day starts at 00:00:00 and is split into chunks, each 1 minute long
EXEC [dbo].[SetDayChunksSettings] 1, 0, '00:00:00'
-- Splitting the day into proper chunks based on the settings set in the previous step
EXEC [dbo].[CreateDayChunksBasedOnSettings]
GO
-- Simulating sensor readings every one second
TRUNCATE TABLE SensorReadings
GO
DECLARE @i INT;
DECLARE @SensorReading FLOAT
DECLARE @Now DATETIME
SET @i = 1;
WHILE (@i <= 90)
BEGIN
WAITFOR DELAY '00:00:01'
DECLARE @CreationDateTime1 DATETIME = GETDATE()
DECLARE @RandomReading1 FLOAT
SELECT TOP 1 @RandomReading1 = RAND()
INSERT INTO SensorReadings
(
Reading
, CreationDateTime
, DayChunkID
)
VALUES
(
@RandomReading1
, @CreationDateTime1
, [dbo].[GetDateTimeDayChunkID](@CreationDateTime1)
)
SET @i = @i + 1;
END
SELECT *
FROM SensorReadings
-- Applying aggregation on sensor readings
TRUNCATE TABLE AggregatedSensorReadings
GO
INSERT INTO AggregatedSensorReadings
(
DayChunkID
, Reading
, DayDate
)
SELECT DayChunkID
, AVG(Reading)
, DayDate
FROM SensorReadings
GROUP BY DayDate, DayChunkID
SELECT AggregatedSensorReadings.*
, DayChunks.ChunkStart
, DayChunks.ChunkEnd
FROM AggregatedSensorReadings
LEFT OUTER JOIN DayChunks
ON DayChunks.ID = AggregatedSensorReadings.DayChunkID
Result
(Screenshots: the resulting Settings, DayChunks, SensorReadings, and AggregatedSensorReadings tables after running the test script.)
That's it. This is just one example of applying the core concept of splitting the day into time chunks; you can adapt the concept and the code to your own business needs.
I hope this helps you someday.
Good Luck.