Archive for November, 2012

DWH basics – partitioning

Posted: November 3, 2012 in MSSQL
Tags:
Task: to show how to horizontal partition a database table and how to apply different data compression to this partitions


SQL Server supports only one type of horizontal partitioning, the range partitions. It is the partitioning strategy in which data is partitioned based on the range that the value of a particular field falls in.

Figure 1: UML – Use Case

Figure 1: UML – Use Case

The process of horizontal partitioning of a database table and data compression consists of the following steps:
1. Create a new test database with two different filegroups (NOTE: the path C:\Temp\SQL Server\Partitioning\Primary and Secondary must already exist!)


IF EXISTS(
SELECT name
FROM sys.databases
WHERE name = N'Test_Partitioning')
DROP DATABASE Test_Partitioning
GO
CREATE DATABASE Test_Partitioning
ON PRIMARY
(NAME = 'TestPartitioning_Part2010',
FILENAME =
'C:\Temp\SQL Server\Partitioning\Primary\TestPartitioning_Part2010.mdf',
SIZE = 5, -- ISSUE: The CREATE DATABASE statement failed. The primary file
-- must be at least 5 MB to accommodate a copy of the model database.
MAXSIZE = 100,
FILEGROWTH = 1),
FILEGROUP TestPartitioning_Part2011
(NAME = 'TestPartitioning_Part2011',
FILENAME =
'C:\Temp\SQL Server\Partitioning\Secondary\TestPartitioning_Part2011.ndf',
SIZE = 2,
MAXSIZE = 100,
FILEGROWTH = 1),
FILEGROUP TestPartitioning_Part2012
(NAME = 'TestPartitioning_Part2012',
FILENAME =
'C:\Temp\SQL Server\Partitioning\Secondary\TestPartitioning_Part2012.ndf',
SIZE = 2,
MAXSIZE = 100,
FILEGROWTH = 1)
GO

2. Create the partition range function


USE Test_Partitioning
GO
CREATE PARTITION FUNCTION TestPartitioning_PartitionFunction(DATE)
AS RANGE LEFT FOR
VALUES('2010-12-31', '2012-01-01')
-- NOTE: interval: (- infinity, 2010-12-31],
--                 [2011-01-01, 2011-12-31] and
--                 [2012-01-01, infinity +)
GO

3. Create the partition scheme and attach the partition scheme to filegroups


USE Test_Partitioning
GO
CREATE PARTITION SCHEME TestPartitioning_PartitionScheme
AS PARTITION TestPartitioning_PartitionFunction
TO([PRIMARY], TestPartitioning_Part2011, TestPartitioning_Part2012)
GO

4. Create a table with the partition key and the partition scheme


USE Test_Partitioning
GO
CREATE TABLE dbo.Test_PartitionedTable
(ID INT NOT NULL,
[Date] DATE)
ON TestPartitioning_PartitionScheme([Date])
GO

5. Create the index on the partitioned table (optional and recommended)


USE Test_Partitioning
GO
CREATE UNIQUE CLUSTERED INDEX IX_TestPartition_Table
ON dbo.Test_PartitionedTable([Date])
ON TestPartitioning_PartitionScheme([Date])
GO

6. Insert data into the partitioned table


USE Test_Partitioning
GO
INSERT INTO dbo.Test_PartitionedTable(ID, [Date])
VALUES
(1, DATEADD(YEAR, -3, GETDATE())), -- inserted in the partition 2009
(2, DATEADD(YEAR, -2, GETDATE())), -- inserted in the partition 2010
(3, DATEADD(YEAR, -1, GETDATE())), -- inserted in the partition 2011
(4,                   GETDATE())   -- inserted in the partition 2012
GO

7. Test data from the dbo.Test_PartitionedTable


USE Test_Partitioning
GO
SELECT *
FROM dbo.Test_PartitionedTable
/*
SSMS RESULTS:
ID  Date
1   2009-10-19
2   2010-10-19
3   2011-10-19
4   2012-10-19
*/
GO

8. Verify rows inserted in partitions


USE Test_Partitioning
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   2	 0	                   0	              NONE
72057594039304192   245575913	1	   2	              72057594039304192	  1	 0	                   0	              NONE
72057594039369728   245575913	1	   3	              72057594039369728	  1	 0	                   0	              NONE
*/
GO

9. SPLIT – add the new partition P4 for the next year 2013


USE Test_Partitioning
GO
ALTER DATABASE Test_Partitioning
ADD FILEGROUP TestPartitioning_Part2013
GO
ALTER DATABASE Test_Partitioning
ADD FILE
(NAME = 'TestPartitioning_Part2013',
FILENAME =
'C:\Temp\SQL Server\Partitioning\Secondary\TestPartitioning_Part2013.ndf',
SIZE = 2,
MAXSIZE = 100,
FILEGROWTH = 1)
TO FILEGROUP TestPartitioning_Part2013
GO
ALTER PARTITION SCHEME TestPartitioning_PartitionScheme
NEXT USED TestPartitioning_Part2013
GO
ALTER PARTITION FUNCTION TestPartitioning_PartitionFunction()
SPLIT RANGE('2013-01-01')
-- NOTE: interval: (- infinity, 2010-12-31],
--                 [2011-01-01, 2011-12-31],
--                 [2012-01-01, 2012-12-31] and
--                 [2013-01-01, infinity +)
GO

10. SWITCH IN – move data from the staging table to the empty newly added partition P4


USE Test_Partitioning
GO

--INSERT INTO dbo.Test_PartitionedTable(ID, [Date])
--VALUES
--(5, DATEADD(YEAR,  1, GETDATE())), -- inserted in the partition 2013
--(6, DATEADD(YEAR,  2, GETDATE()))  -- inserted in the partition 2014
--GO

IF EXISTS(
SELECT name
FROM sys.databases
WHERE name = N'Test_Partitioning_Staging')
DROP DATABASE Test_Partitioning_Staging
GO
CREATE DATABASE Test_Partitioning_Staging
ON PRIMARY
(NAME = 'Test_Partitioning_Staging',
FILENAME =
'C:\Temp\SQL Server\Partitioning\Primary\Test_Partitioning_Staging.mdf',
SIZE = 5, -- ISSUE: The CREATE DATABASE statement failed. The primary file
-- must be at least 5 MB to accommodate a copy of the model database.
MAXSIZE = 100,
FILEGROWTH = 1)
GO


USE Test_Partitioning_Staging
GO

CREATE TABLE dbo.Test_NonPartitionedStagingTable
(ID INT NOT NULL,
[Date] DATE)
GO
CREATE UNIQUE CLUSTERED INDEX IX_Test_NonPartitionedStagingTable
ON dbo.Test_NonPartitionedStagingTable([Date])
GO
INSERT INTO dbo.Test_NonPartitionedStagingTable(ID, [Date])
VALUES
(5, DATEADD(YEAR,  1, GETDATE())), -- inserted in the partition 2013
(6, DATEADD(YEAR,  2, GETDATE()))  -- inserted in the partition 2014
/*
SSMS RESULTS:
ID  Date
5   2013-10-31
6   2014-10-31
*/
GO


USE Test_Partitioning
GO

ALTER DATABASE Test_Partitioning
MODIFY FILEGROUP TestPartitioning_Part2012 DEFAULT
GO

CREATE TABLE dbo.Test_NonPartitionedStagingTemp
(ID INT NOT NULL,
[Date] DATE)
GO
CREATE UNIQUE CLUSTERED INDEX IX_Test_NonPartitionedStagingTemp
ON dbo.Test_NonPartitionedStagingTemp([Date])
GO
ALTER TABLE dbo.Test_NonPartitionedStagingTemp
ADD CONSTRAINT PartitionFunction CHECK ([Date] > '2013-01-01'
AND [Date] IS NOT NULL)
GO
INSERT INTO dbo.Test_NonPartitionedStagingTemp(ID, [Date])
SELECT ID, [Date]
FROM Test_Partitioning_Staging.dbo.Test_NonPartitionedStagingTable
GO
SELECT * FROM dbo.Test_NonPartitionedStagingTemp
GO
/*
SSMS RESULTS:
ID  Date
5   2013-10-31
6   2014-10-31
*/

ALTER TABLE dbo.Test_NonPartitionedStagingTemp
SWITCH TO dbo.Test_PartitionedTable PARTITION 4
GO
DROP TABLE dbo.Test_NonPartitionedStagingTemp
GO

ALTER DATABASE Test_Partitioning
MODIFY FILEGROUP [PRIMARY] DEFAULT
GO

SELECT *
FROM dbo.Test_PartitionedTable
/*
SSMS RESULTS:
ID  Date
1   2009-10-19
2   2010-10-19
3   2011-10-19
4   2012-10-19
5   2013-10-19
6   2014-10-19
*/
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   2	 0	                   0	              NONE
72057594039304192   245575913	1	   2	              72057594039304192	  1	 0	                   0	              NONE
72057594039435264   245575913	1	   3	              72057594039435264	  1	 0	                   0                  NONE
72057594039369728   245575913	1	   4	              72057594039369728	  2	 0	                   0	              NONE
*/
GO

11. SWITCH OUT – move data from the first partition P1 to the archive table dbo.Test_NonPartitionedArchiveTable


USE Test_Partitioning
GO

CREATE TABLE dbo.Test_NonPartitionedArchiveTable
(ID INT NOT NULL,
[Date] DATE)
ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX IX_NonPartitioned_Archive
ON dbo.Test_NonPartitionedArchiveTable([Date])
GO

ALTER TABLE dbo.Test_PartitionedTable SWITCH PARTITION 1
TO dbo.Test_NonPartitionedArchiveTable
GO

SELECT *
FROM dbo.Test_NonPartitionedArchiveTable
/*
SSMS RESULTS:
ID  Date
1   2009-10-19
2   2010-10-19
*/
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   0	 0	                   0	              NONE
72057594039304192   245575913	1	   2	              72057594039304192	  1	 0	                   0	              NONE
72057594039435264   245575913	1	   3	              72057594039435264	  1	 0	                   0                  NONE
72057594039369728   245575913	1	   4	              72057594039369728	  2	 0	                   0	              NONE
*/
GO

12. MERGE – based on ‘2012-01-01’, i.e. P1 = PRIMARY FG (2010) + SECONDARY FG (2011)


USE Test_Partitioning
GO

ALTER PARTITION FUNCTION TestPartitioning_PartitionFunction()
MERGE RANGE('2012-01-01')
-- NOTE: interval: (- infinity, 2010-12-31]
--                 [2011-01-01, 2011-12-31],
--                 [2012-01-01, 2012-12-31] and
--                 [2013-01-01, infinity +)
GO
SELECT *
FROM dbo.Test_PartitionedTable
/*
SSMS RESULTS:
ID  Date
3   2011-10-19
4   2012-10-19
5   2013-10-19
6   2014-10-19
*/
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   0	 0	                   0	              NONE
72057594039304192   245575913	1	   2	              72057594039304192	  2	 0	                   0	              NONE
72057594039369728   245575913	1	   3	              72057594039369728	  2	 0	                   0	              NONE
*/
GO

INSERT INTO dbo.Test_PartitionedTable(ID, [Date])
VALUES
(1, DATEADD(YEAR, -3, GETDATE())), -- inserted in the partition 2009
(2, DATEADD(YEAR, -2, GETDATE()))  -- inserted in the partition 2010
GO
SELECT *
FROM dbo.Test_PartitionedTable
/*
SSMS RESULTS:
ID  Date
1   2009-10-19
2   2010-10-19
3   2011-10-19
4   2012-10-19
5   2013-10-19
6   2014-10-19
*/
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   2	 0	                   0	              NONE
72057594039304192   245575913	1	   2	              72057594039304192	  2	 0	                   0	              NONE
72057594039369728   245575913	1	   3	              72057594039369728	  2	 0	                   0	              NONE
*/
GO

13. Apply compression to your partitioned table


USE Test_Partitioning
GO
ALTER TABLE Test_PartitionedTable
REBUILD PARTITION = All
WITH 
(
DATA_COMPRESSION = PAGE ON Partitions(1), -- NOTE: (1 TO 3)
DATA_COMPRESSION = ROW  ON Partitions(2) ,
DATA_COMPRESSION = NONE ON Partitions(3)
);
GO

14. Apply compression to your partitioned index


USE Test_Partitioning
GO
ALTER INDEX IX_TestPartition_Table
ON dbo.Test_PartitionedTable
REBUILD PARTITION  = All
WITH 
(
DATA_COMPRESSION = PAGE ON Partitions(1),
DATA_COMPRESSION = ROW  ON Partitions(2),
DATA_COMPRESSION = NONE ON Partitions(3)
);
GO

15. Apply compression to your unpartitioned index


--USE Test_Partitioning
--GO
--ALTER INDEX YourUnpartitionedIndex
--ON dbo.Test_PartitionedTable
--REBUILD WITH (DATA_COMPRESSION = ROW);
--GO

16. Testing of application three types of data compression


USE Test_Partitioning
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   2	 0	                   2	              PAGE
72057594039304192   245575913	1	   2	              72057594039304192	  2	 0	                   1	              ROW
72057594039369728   245575913	1	   3	              72057594039369728	  2	 0	                   0	              NONE
*/
GO


Recommendations:
  • filegroups can be setup as read-only. It will increase the performance
  • relational and cube partitions should be based on the same column
  • SQL Server 7.0 introduced partitioning through partitioned views
  • partitioning: when the number of rows is higher than 1M
  • roughly 25GB per partition is well manageable (100GB would be poorly manageable)
  • operations: SPLIT, MERGE and SWITCH are meta data operations only and do not involve any movement of data
  • usage: to migrate older data from more expensive disk to less expensive disk
Quotes from references:
  • “Though it is possible to create all partitions on PRIMARY but it would be best if these different partitions are stored in a separate file groups. This gives some performance improvement even in the case of single core computers. It would be best if these file groups are on different discs on a multi core processing machine.”
  • “A partitioned table may have a partitioned index. Partition aligned index views may also be created for this table.
    There may be question in your mind if it is possible to partition your table using multiple columns. The answer may be YES or NO. Why? No, because there is no such direct support for this in SQL Server. Yes, because you can still do that by using persisted computed column based on any number of columns you want.”
  • “For partitioning your existing table just drop the clustered index on your table and recreate it on the required partition scheme.”
  • “Though SQL Server does not directly support List Partitioning, you can create list partitions by tricking the partition function to specify the values with the LEFT clause. After that, put a CHECK constraint on the table, so that no other values are allowed to be inserted in the table specifying Partition Key column any value other than the ‘list’ of values.”
  • “It must be remembered that indexes can be partitioned using a different partition key than the table. The index can also have different numbers of partitions than the table. We cannot, however, partition the clustered index differently from the table. To partition an index, ON clause is used, specifying the partition scheme along with the column when creating the index:
    If your table and index use the same Partition function then they are called Aligned. If they go further and also use the same partition scheme as well, then they are called Storage Aligned (note this in the figure below). If you use the same partition function for partitioning index as used by the table, then generally performance is improved.”
Additional references: