Archive for June, 2012

SSAS basics – building my first cube

Posted: June 30, 2012 in MSSQL
Tags:
Task: to give a basic look at the building of my first data cube by means of SSDT by Microsoft Visual Studio



To build a data cube we will need to follow these few steps:

1. Creating a data source
To connect to this data source the service account is being used. Using the service account (the account that runs the SQL Server Analysis Server service) is common even in production. Make sure that the service account has privileges to read this data source.

Figure 1: Creating a data source – Configuration

Figure 1: Creating a data source – Configuration



2. Creating a data source view
Select FactInternetSales and click on the Add Related Tables button.

Figure 2a: Creating a data source view – Configuration

Figure 2a: Creating a data source view – Configuration


Figure 2b: Creating a data source view – Diagram of facts and dimensions

Figure 2b: Creating a data source view – Diagram of facts and dimensions



3. Building my first cube
Select the data source from the first step, then click on the Suggest button and at the end select measures and dimensions.

Figure 3a: Building my first cube

Figure 3a: Building my first cube


Figure 3b: Tweaking dimensions for user-friendly appearance

Figure 3b: Tweaking dimensions for user-friendly appearance


Any warnings appear after tweaking. We will ignore warnings for this basic example.

4. Deploying and querying the cube
At the last step, click on the Deploy Solution item on the Build menu and publish the cube.

Figure 4: Querying the cube

Figure 4: Querying the cube


Double click on the cube and navigate to the Browser tab. For example, drag and drop measures such as ExtendedAmount and dimensions and hierarchies such as Fiscal Quarter – Month hierarchy under the Due Date dimension into the query pane. We will retrieve requested data.

Source code: SSAS Package (SQL Server 2008 R2 and BIDS; Adventure Works DW2008R2)
Note: do not forget to customize connection strings etc

Additional references:
Task: to give a basic look at the import of a CSV file to a SQL Server table by means of SSDT by Microsoft Visual Studio



Sales per region,Africa,Asia,Europe
January,34,67,56
February,36,87,78
March,31,56,88
April,29,67,92
May,54,71,68
June,68,71,54
July,54,71,68
August,68,71,54
September,92,67,29
October,88,56,31
November,78,87,36
December,56,67,34

Source code 1: CSV data



USE [Test]
GO

CREATE TABLE [dbo].[SSIS_CSV_Import]
(
    [Month]      [varchar](9)    NOT NULL,
    [SaleAfrica] [numeric](2, 0) NOT NULL,
    [SaleAsia]   [numeric](2, 0) NOT NULL,
    [SaleEurope] [numeric](2, 0) NOT NULL
)
GO

Source code 2: Table structure


Flat File Connection Manager Editor – General

Figure 1: Flat File Connection Manager Editor – General


Flat File Connection Manager Editor – Columns

Figure 2: Flat File Connection Manager Editor – Columns


Flat File Connection Manager Editor – Advanced

Figure 3: Flat File Connection Manager Editor – Advanced


SQL Destination Editor – Connection Manager

Figure 4: SQL Destination Editor – Connection Manager


SQL Destination Editor – Mappings

Figure 5: SQL Destination Editor – Mappings


SSIS Package – Data Flow

Figure 6: SSIS Package - Data Flow



SELECT [Month], [SaleAfrica], [SaleAsia], [SaleEurope]
FROM   [dbo].[SSIS_CSV_Import]
GO
/* Result:
   Month       | Africa | Asia | Europe
   January       34       67	 56
   February      36       87     78
   March         31       56     88
   April         29       67     92
   May	         54       71     68
   June	         68       71     54
   July	         54       71     68
   August        68       71     54
   September     92       67     29
   October       88       56     31
   November      78       87     36
   December      56       67     34
*/

Source code 3: SQL query for result verification

Additional references:

Parallel query processing in SQL Server

Posted: June 24, 2012 in MSSQL
Tags:
Task: to give a basic look at the subsystem for parallel query processing in SQL Server



/* Testing laptop: ProBook 6540b, Intel Core i5 */

USE [AdventureWorks2012]
GO

SELECT [cpu_count]                     AS 'Logical CPU Count',
       [hyperthread_ratio]             AS 'Hyperthread Ratio',
       [cpu_count]/[hyperthread_ratio] AS 'Physical CPU Count'
FROM   [sys].[dm_os_sys_info]
GO
/* Result:
   Logical CPU Count | Hyperthread Ratio | Physical CPU Count
   4                   4	           1
*/

Source code 1: System configuration



SELECT [name], [value]
FROM   [sys].[configurations]
WHERE  [name] = 'cost threshold for parallelism'
       OR
       [name] = 'max degree of parallelism'
GO
/* Result:
   name                           | value
   cost threshold for parallelism   5 => default value, i.e. 5 seconds
   max degree of parallelism        0 => default value, i.e. parallelism
                                         with an automatic count of
                                         processors
*/


SELECT   *
FROM     [Sales].[SalesOrderDetail]        AS [sod]
         INNER JOIN [Production].[Product] AS [p]
         ON [sod].[ProductID] = [p].[ProductID]
ORDER BY [Style]
GO

Source code 2: Parallel query processing providing that the cost threshold parameter equals to 5


Execution plan for parallel query processing regarding the cost threshold parameter

Figure 1: Execution plan for parallel query processing regarding the cost threshold parameter



EXEC sp_configure 'show advanced option', '1'
RECONFIGURE WITH OVERRIDE
GO
EXEC sp_configure "max degree of parallelism", 0
RECONFIGURE WITH OVERRIDE
GO
EXEC sp_configure "cost threshold for parallelism", 10
RECONFIGURE WITH OVERRIDE
GO
EXEC sp_configure 'show advanced option', '0'
RECONFIGURE WITH OVERRIDE
GO

SELECT [name], [value]
FROM   [sys].[configurations]
WHERE  [name] = 'cost threshold for parallelism'
       OR
       [name] = 'max degree of parallelism'
GO
/* Result:
   name                           | value
   cost threshold for parallelism   10 => custom value i.e., 10 seconds
   max degree of parallelism        0  => default value i.e., parallelism
                                          with an automatic count of
                                          processors
*/


SELECT   *
FROM     [Sales].[SalesOrderDetail]        AS [sod]
         INNER JOIN [Production].[Product] AS [p]
         ON [sod].[ProductID] = [p].[ProductID]
ORDER BY [Style]
GO


EXEC sp_configure 'show advanced option', '1'
RECONFIGURE WITH OVERRIDE
GO
EXEC sp_configure "max degree of parallelism", 0
RECONFIGURE WITH OVERRIDE
GO
EXEC sp_configure "cost threshold for parallelism", 5
RECONFIGURE WITH OVERRIDE
GO

SELECT [name], [value]
FROM   [sys].[configurations]
WHERE  [name] = 'cost threshold for parallelism'
       OR
       [name] = 'max degree of parallelism'
GO
/* Result:
   name                           | value
   cost threshold for parallelism   5 => default value i.e., 5 seconds
   max degree of parallelism        0 => default value i.e., parallelism
                                         with an automatic count of
                                         processors
*/

Source code 3: Serial query processing providing that the cost threshold parameter equals to 10


Execution plan for serial query processing regarding the cost threshold parameter

Figure 2: Execution plan for serial query processing regarding the cost threshold parameter



SELECT   *
FROM     [Sales].[SalesOrderDetail]
ORDER BY [ProductID]

SELECT   *
FROM     [Sales].[SalesOrderDetail]
ORDER BY [ProductID]
OPTION   (MAXDOP 1) -- Note: query hint for one processor
GO

Source code 4: Execution plan – costs comparison for parallel and serial with OPTION (MAXDOP 1) query processing


Execution plan - cost 40% for parallel vs. 60% for serial with OPTION (MAXDOP 1) query processing

Figure 3: Execution plan - cost 40% for parallel vs. 60% for serial with OPTION (MAXDOP 1) query processing


Setting of physical and logical CPU count simulation

Figure 4: Setting of physical and logical CPU count simulation



SELECT [cpu_count]                     AS 'Logical CPU Count',
       [hyperthread_ratio]             AS 'Hyperthread Ratio',
       [cpu_count]/[hyperthread_ratio] AS 'Physical CPU Count'
FROM   [sys].[dm_os_sys_info]
GO
/* Result:
   Logical CPU Count | Hyperthread Ratio | Physical CPU Count
   16                  4	           4
*/

Source code 5: Simulating 4 physical and 16 logical CPU count

Note: Express Edition of SQL Server does not support simulating physical CPU count

General recommendations:
  • OLTP: on Pure OLTP system, where the transactions are smaller and queries are not long but very quick usually, set the “Maximum Degree of Parallelism” to 1 (one).
  • Data-warehousing / Reporting server: as queries will be running for long time, it is advised to set the “Maximum Degree of Parallelism” to 0 (zero).
  • Mixed System (OLTP & OLAP): the right balance has to be found. To set the “Maximum Degree of Parallelism” to 2, which means the query still uses parallelism but only on 2 CPUs. However, to keep the “Cost Threshold for Parallelism” very high.

Additional references: