Archive for October, 2012

DWH basics – columnstore index for fast DW

Posted: October 28, 2012 in MSSQL
Tags:
Task: to make performance tests to show a performance impact of the columnstore index (SQL Server 2012)


Recommendations:
  • to use on very large tables only (77 628 600 rows in my example; fact tables are good candidates)
  • when the columnstore index is created, the table becomes read-only (despite of this fact, update/insert/delete operations are possible, e.g. by means of disabling the columnstore index)

The process of testing a performance impact of the columnstore index consists of the following steps:
1. Create the testing table


USE [AdventureWorksDW2012]
GO

CREATE TABLE [dbo].[MyFactProductInventory](
  [ProductKey] [int] NOT NULL,
  [DateKey] [int] NOT NULL,
  [MovementDate] [date] NOT NULL,
  [UnitCost] [money] NOT NULL,
  [UnitsIn] [int] NOT NULL,
  [UnitsOut] [int] NOT NULL,
  [UnitsBalance] [int] NOT NULL,
) ON [PRIMARY]
GO

2. Create the clustered regular index


CREATE CLUSTERED INDEX [IX_MyFactProductInventory_Clustered]
ON [dbo].[MyFactProductInventory]([ProductKey])
GO

3. Create the sample data table (WARNING: this query may run up to 2-10 minutes based on your systems resources)


INSERT INTO [dbo].[MyFactProductInventory]
SELECT [Fact].* FROM [dbo].[FactProductInventory] AS [Fact]
GO 100

4. Performance tests (NOTE: comparing the regular (clustered / non-clustered) index with the columnstore index)


SET STATISTICS IO ON
GO
SET STATISTICS TIME ON
GO

5. Select the table with the clustered regular index


SELECT ProductKey, SUM(UnitCost) AS 'SumUnitCost',
  AVG(UnitsBalance) AS 'AvgUnitsBalance'
FROM [dbo].[MyFactProductInventory]
GROUP BY ProductKey
ORDER BY ProductKey
GO
/* First testing results: */
-- (606 row(s) affected)
-- Table 'MyFactProductInventory'. Scan count 5, logical reads 602219,
-- physical reads 81, read-ahead reads 379178, lob logical reads 0,
-- lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times: CPU time = 23665 ms, elapsed time = 18398 ms.
/* Second testing results: */
-- (606 row(s) affected)
-- Table 'MyFactProductInventory'. Scan count 5, logical reads 601939,
-- physical reads 0, read-ahead reads 602119, lob logical reads 0,
-- lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times: CPU time = 20670 ms, elapsed time = 22654 ms.
/* Third testing results: */
-- (606 row(s) affected)
-- Table 'MyFactProductInventory'. Scan count 5, logical reads 602244,
-- physical reads 0, read-ahead reads 598015, lob logical reads 0,
-- lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times: CPU time = 21123 ms, elapsed time = 20832 ms.

6.1. Create the non-clustered regular index


CREATE NONCLUSTERED INDEX [IX_MyFactProductInventory_Nonclustered]
ON [MyFactProductInventory](UnitCost, UnitsBalance, ProductKey)
GO

7.1. Select the table with the non-clustered regular index


SELECT ProductKey, SUM(UnitCost) AS 'SumUnitCost',
  AVG(UnitsBalance) AS 'AvgUnitsBalance'
FROM [dbo].[MyFactProductInventory]
GROUP BY ProductKey
ORDER BY ProductKey
GO
/* First testing results: */
-- (606 row(s) affected)
-- Table 'MyFactProductInventory'. Scan count 5, logical reads 290552,
-- physical reads 0, read-ahead reads 32391, lob logical reads 0,
-- lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times: CPU time = 31323 ms, elapsed time = 8665 ms.
/* Second testing results: */
-- (606 row(s) affected)
-- Table 'MyFactProductInventory'. Scan count 5, logical reads 290652,
-- physical reads 0, read-ahead reads 0, lob logical reads 0,
-- lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times: CPU time = 32448 ms, elapsed time = 8428 ms.
/* Third testing results: */
-- (606 row(s) affected)
-- Table 'MyFactProductInventory'. Scan count 5, logical reads 290597,
-- physical reads 0, read-ahead reads 0, lob logical reads 0,
-- lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times: CPU time = 31138 ms, elapsed time = 8351 ms.
ALTER INDEX [IX_MyFactProductInventory_Nonclustered]
ON [MyFactProductInventory] DISABLE
GO

6.2. Create the columnstore index


CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_MyFactProductInventory_ColumnStore]
ON [MyFactProductInventory](UnitCost, UnitsBalance, ProductKey)
GO

7.2 Select the table with the columnstore index


SELECT ProductKey, SUM(UnitCost) AS 'SumUnitCost',
  AVG(UnitsBalance) AS 'AvgUnitsBalance'
FROM [dbo].[MyFactProductInventory]
GROUP BY ProductKey
ORDER BY ProductKey
GO
/* First testing results: */
-- (606 row(s) affected)
-- Table 'MyFactProductInventory'. Scan count 4, logical reads 12497,
-- physical reads 4, read-ahead reads 19973, lob logical reads 0,
-- lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times: CPU time = 874 ms, elapsed time = 559 ms.
/* Second testing results: */
-- (606 row(s) affected)
-- Table 'MyFactProductInventory'. Scan count 4, logical reads 12320,
-- physical reads 0, read-ahead reads 4853, lob logical reads 0,
-- lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times: CPU time = 1043 ms, elapsed time = 363 ms.
/* Third testing results: */
-- (606 row(s) affected)
-- Table 'MyFactProductInventory'. Scan count 4, logical reads 12370,
-- physical reads 0, read-ahead reads 9278, lob logical reads 0,
-- lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times: CPU time = 874 ms, elapsed time = 435 ms.

8. Cleanup


DROP INDEX [IX_MyFactProductInventory_ColumnStore]
ON [dbo].[MyFactProductInventory]
GO
TRUNCATE TABLE [dbo].[MyFactProductInventory]
GO
DROP TABLE [dbo].[MyFactProductInventory]
GO
SET STATISTICS TIME OFF
GO
SET STATISTICS IO OFF
GO

Conclusion: the above results clearly show that the performance of the query is extremely fast after creating the columnstore index
  • for logical reads: approximately 48 times faster than the clustered and 23 times faster than the non-clustered index
  • for the elapsed time: approximately 33 times faster than the clustered and 16 times faster than the non-clustered index


Additional references:
Task: to show how simply to hack a system stored procedure in SQL Server 2005 and higher by means of DAC (Dedicated Administrator Connection) and resource database (the read-only mssqlsystemresource database)




WARNING: by hacking any part of SQL Server, e.g. a system stored procedure, you will lose the warranty from Microsoft. This blog post has the educational purpose only. For testing purposes, you should try it on the development machine only. I do not take any responsibility for any damages caused by this article.

Recommendations:
  • make a backup of the mssqlsystemresource database
  • if necessary, add “Full control” permissions for the user of “NT Service\MSSLQSERVER” (the account of SQL Server (MSSQLSERVER) service) on “C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\mssqlsystemresource.ldf and mssqlsystemresource.mdf”
  • run the cmd.exe as an administrator

The process of altering, i.e. hacking, e.g. of the system stored procedure sys.sp_who2 consists of the following steps:
1. Windows > Run => RUN A NEW CMD WINDOW


net stop mssqlserver
net start mssqlserver
cd C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn
sqlcmd -S .\
sys.sp_who2
go
exit
exit @echo CLOSE THE CMD WINDOW

2. Windows > Run => RUN A NEW CMD WINDOW


net stop mssqlserver
cd C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn
sqlservr -s mssqlserver -m
@echo DO NOT CLOSE THE CMD WINDOW

3. Windows > Run => RUN A NEW CMD WINDOW


cd C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn
sqlcmd -S .\ -A
use mssqlsystemresource
go
alter database mssqlsystemresource set read_write
go
sp_helptext 'sys.sp_who2'
go
alter procedure sys.sp_who2 as select 'hacked procedure'
go
sp_helptext 'sys.sp_who2'
go
alter database mssqlsystemresource set read_only
go
exit
exit @echo CLOSE ALL RUNNING CMD WINDOWS

4. Windows > Run => RUN A NEW CMD WINDOW


net start mssqlserver
cd C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn
sqlcmd -S .\
sys.sp_who2
go
exit
exit @echo CLOSE THE CMD WINDOW


Abbreviations:
  • -A > dedicated admin connection
  • -S > server
  • -m > single user admin mode
  • -s > name (alternate registry key name)
  • .\ > localhost and default instance


Figure 1: SQLCMD result – the procedure sys.sp_who2 before hacking

Figure 1: SQLCMD result – the procedure sys.sp_who2 before hacking


Figure 2: SQLCMD result – the procedure sys.sp_who2 after hacking

Figure 2: SQLCMD result – the procedure sys.sp_who2 after hacking


Additional references:
Task: to create a SSIS package to incremental data loading by means of Change Tracking



The process of creating a SSIS solution consists of the following steps:
1. creating of a SSDT (SQL Server Data Tools) SSIS project
2. enabling of the ‘CHANGE TRACKING’ feature on a database and table level
3. creating of the ‘MySettings’ table and inserting a default value into the ‘LastVersion’ column
4. creating of the ‘Version_Get’ stored procedure
5. getting and setting of the ‘@LastVersion’ and ‘@CurrentVersion’ variables
6. creating of the FAKE DWH ‘BusinessEntity_DWH’ table
7. FAKE inserting into the ‘BusinessEntity’ table
8. incremental data loading into the fake DWH ‘BusinessEntity_DWH’ table (the core of ‘CHANGE TRACKING’)
9. saving of the ‘@CurrentVersion’ variable into the ‘LastVersion’ column
10. disabling of the ‘CHANGE TRACKING’ feature on a table and database level; deleting of created objects

Change Tracking – Control Flow

Figure 1: Change Tracking – Control Flow


Change Tracking – Data Flow

Figure 2: Change Tracking – Data Flow


Source code: SSIS Package (SQL Server 2012 – SSDT)
Note: do not forget to customize the ConnectionString in the Project.params setting

Additional references:
Task: to create a SSIS package to delete duplicates by means of Fuzzy logic



The process of creating a SSIS solution consists of the following steps:
1. creating of a SSDT (SQL Server Data Tools) SSIS project
2. defining of connection to data sources
3. defining of connection to saving duplicate records
4. defining of connection to saving error messages
5. defining of connection to a database
6. creating of a table in a target database
7. creating of a Data Flow diagram
8. removing of exact duplicates
9. removing of “similar” duplicates by means of Fuzzy logic
10. saving of retrieved duplicates

Deleting duplicates by means of Fuzzy logic

Figure 1: Deleting duplicates by means of Fuzzy logic


Source code: SSIS Package (SQL Server 2012 – SSDT)
Note: do not forget to customize the Connection Managers in solution setting

Additional references: