Task: to present a near real-time (or batch) enterprise search platform built on the Apache Lucene project

Cloudera Search offers the following methods for indexing data at scale:

  • NRT indexing (Lily HBase NRT indexing or Flume NRT indexing)
  • batch indexing (Spark or MapReduce indexing: MapReduceIndexerTool or Lily HBase batch indexing)

Architecture: NRT Cloudera Search

Environment:

  • Hadoop (big data) cluster: Cloudera (either an existing Cloudera infrastructure or Cloudera Quickstart VM)
  • Cloudera Search services: HBase with ZooKeeper, Key-Value Store Indexer (Lily NRT HBase indexer) and Solr; supporting services: Cloudera Manager, Hue, HDFS and YARN (with MapReduce included)
  • data ingestion: e.g. Talend Open Studio / Solr UI update (optional)
  • testing: Solr UI select / SOAP UI (optional)

Setup:
a) creating/enabling HBase table ‘Solr_Test’ with column family ‘cities1000’ and enabling replication for Lily HBase NRT indexing

hbase shell
create 'Solr_Test', 'cities1000'
alter 'Solr_Test', {NAME => 'cities1000', REPLICATION_SCOPE => 1}
enable 'Solr_Test'

Listing: HBase shell

b) creating Solr collection ‘solr_test-collection’ and schema ‘schema.xml’

solrctl instancedir --generate $HOME/solr_test-collection

download: schema.xml

solrctl instancedir --create solr_test-collection $HOME/solr_test-collection
solrctl collection --create solr_test-collection

Listing: Solr shell

c) creating Lily HBase configuration files: ‘morphlines.conf’ and ‘morphline-hbase-mapper.xml’ and adding indexer
download: morphlines.conf
download: morphline-hbase-mapper.xml

hbase-indexer add-indexer \
--name SolrTestIndexer \
--indexer-conf $HOME/solr_test-collection/conf/morphline-hbase-mapper.xml \
--connection-param solr.collection=solr_test-collection \
--zookeeper quickstart.cloudera:2181

Listing: HBase indexer

hbase-indexer list-indexers

Listing: HBase indexers

d) additional settings
URL: http://quickstart.cloudera:7180

Key-Value Store Indexer -> “logging”
log4j.logger.org.kitesdk.morphline=TRACE
log4j.logger.com.ngdata=TRACE


Lily: logging setting

Cloudera Manager -> Clusters -> Key-Value Store Indexer -> Configuration
Java Heap Size of Lily HBase Indexer in Bytes -> 50 MB -> e.g. 1 GB (based on the input)


Lily: heap size setting

Cloudera Manager -> Clusters -> HBase -> Configuration
Java Heap Size of HBase Master in Bytes -> 50 MB -> e.g. 1 GB (based on the input)


HBase: heap size master setting

Java Heap Size of HBase RegionServer in Bytes -> 50 MB -> e.g. 1 GB (based on the input)


HBase: heap size region setting

e) verifying that the indexer works
URL: http://quickstart.cloudera:8983


Solr: indexer (empty)

Note: HBase indexer log file: /var/log/hbase-solr/lily-hbase-indexer*.log.out

f) data ingest


Talend: data ingestion


Hue -> HBase: ingested data

Optional Solr data ingest via an update URL (note: not related to the HBase part!):
http://quickstart.cloudera:8983/solr/solr_test-collection_shard1_replica1/update/csv?commit=true&separator=%09&fieldnames=id,name,,alternative_names,latitude,longitude,,,countrycode,,,,,,population,elevation,,timezone,lastupdate&stream.file=/home/cloudera/solr_test-collection/cities1000.txt&overwrite=true&stream.contentType=text/plain;charset=utf-8

g) testing
URL: http://quickstart.cloudera:8983

At this point, if you run data ingestion (e.g. via a job in Talend), the new data will appear as query results in Solr within a few seconds (i.e. near real-time).


Solr: indexed data (documents)


Solr: query result

The query field q accepts the field:value format and supports wildcard symbols.

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "indent": "true",
      "q": "name:Botta*",
      "_": "1544275090709",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "countrycode": "IT",
        "alternative_names": "Bottanuco",
        "elevation": "222",
        "id": "3181668",
        "lastupdate": "2014-04-13",
        "timezone": "Europe/Rome",
        "name": "Bottanuco",
        "longitude": "9.50903",
        "latitude": "45.63931",
        "population": "5121",
        "_version_": 1619289130669179000
      },
      {
        "countrycode": "IT",
        "alternative_names": "Botta",
        "elevation": "",
        "id": "9036161",
        "lastupdate": "2014-05-20",
        "timezone": "Europe/Rome",
        "name": "Botta",
        "longitude": "9.53257",
        "latitude": "45.83222",
        "population": "751",
        "_version_": 1619289135325905000
      }
    ]
  }
}

Optional SOAP UI REST: http://quickstart.cloudera:8983/solr/solr_test-collection_shard1_replica1/select?q=name%3ABotta~&sort=score+desc%2C+name+asc&rows=6&fl=name%2C+score&wt=xml&indent=true


SOAP UI: REST query result

Abbreviations

  • EDH: enterprise data hub
  • DL: data lake
  • NRT: near real-time

Sources

References


EDH/DL vs EDW – Architecture Use Cases

Posted: January 13, 2018 in Hadoop
Tags:

Task: to compare EDH/DL vs. EDW and present architecture use cases based on the main Apache Hadoop distributions known (IMHO) at the time of writing: Cloudera (CDH) / Hortonworks (HDP)

EDH (source: Wikipedia)

A data hub is a collection of data from multiple sources organized for distribution, sharing, and often subsetting and sharing. Generally this data distribution is in the form of a hub and spoke architecture.

A data hub differs from a data warehouse in that it is generally unintegrated and often at different grains. It differs from an operational data store because a data hub does not need to be limited to operational data.

A data hub differs from a data lake by homogenizing data and possibly serving data in multiple desired formats, rather than simply storing it in one place, and by adding other value to the data such as de-duplication, quality, security, and a standardized set of query services. A Data Lake tends to store data in one place for availability, and allow/require the consumer to process or add value to the data.

DL (source: Wikipedia)

A data lake is a system or repository of data stored in its natural format, usually object blobs or files. A data lake is usually a single store of all enterprise data including raw copies of source system data and transformed data used for tasks such as reporting, visualization, analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video).

EDW (source: Wikipedia)

In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise.

Lambda and Kappa Architectures

Lambda (source: Wikipedia)

Is a data-processing architecture designed to handle massive quantities of data by taking advantage of both batch and stream-processing methods.

Kappa (source: Milinda Pathigage)

Is a software architecture pattern. Rather than using a relational DB like SQL or a key-value store like Cassandra, the canonical data store in a Kappa Architecture system is an append-only immutable log. From the log, data is streamed through a computational system and fed into auxiliary stores for serving.

Kappa Architecture is a simplification of Lambda Architecture. A Kappa Architecture system is like a Lambda Architecture system with the batch processing system removed. To replace batch processing, data is simply fed through the streaming system quickly.

Other sources:

Data Storage Formats / Data Storage Engines

  • Text (Data Storage Format)
  • Sequence File (Data Storage Format)
  • Apache Avro (Data Storage Format)
  • Apache Parquet (Data Storage Format)
  • Apache Optimized Row Columnar – ORC (Data Storage Format)
  • Apache HBase (Data Storage Engine)
  • Apache Kudu (Data Storage Engine)

Text:

  • More specifically text = csv, tsv, json records…
  • Convenient format to use to exchange with other applications or scripts that produce or read delimited files
  • Human readable and parsable
  • Stored data is bulky and not as efficient to query
  • Schema completely integrated with data
  • Does not support block compression

Sequence File:

  • Provides a persistent data structure for binary key-value pairs
  • Row based
  • Commonly used to transfer data between MapReduce jobs
  • Can be used as an archive to pack small files in Hadoop
  • Supports splitting even when the data is compressed

Apache Avro:

  • Widely used as a serialization platform
  • Row-based (row major format), offers a compact and fast binary format
  • Schema is encoded in the file, so the data can be untagged
  • Files support block compression and are splittable
  • Supports schema evolution
  • Supports nested data
  • No internal indexes (HDFS directory-based partitioning technique can be applied for fast random data access)
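
As a hedged illustration of the two bullets on embedded schema and directory-based partitioning, a minimal Hive sketch (it assumes Hive 0.14+, where STORED AS AVRO is available; all object names are illustrative only):

CREATE TABLE events_avro (
  id BIGINT,
  name STRING,
  attributes MAP<STRING,STRING>     -- nested/complex types are supported
)
PARTITIONED BY (event_date STRING)  -- HDFS directory-based partitioning for faster lookups
STORED AS AVRO;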

Apache Parquet:

  • Column-oriented binary file format (column major format suitable for efficient data analytics)
  • Uses the record shredding and assembly algorithm described in Google’s Dremel paper
  • Each data file contains the values for a set of rows
  • Efficient in terms of disk I/O when specific columns need to be queried
  • Integrated compression (provides very good compaction ratios) and indexes
  • HDFS directory-based partitioning technique can be applied for fast random data access
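
As a hedged illustration of the partitioning bullet above, a minimal Impala/Hive sketch; the table and column names are assumptions, not part of the original post:

CREATE TABLE sales_parquet (
  id BIGINT,
  customer STRING,
  amount DECIMAL(10,2)
)
PARTITIONED BY (sale_year INT)
STORED AS PARQUET;

-- Only the sale_year=2017 directories are scanned for this aggregate:
SELECT SUM(amount) FROM sales_parquet WHERE sale_year = 2017;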

Apache ORC – Optimized Row Columnar:

  • Considered the evolution of the RCFile (originally part of Hive)
  • Stores collections of rows and within the collection the row data is stored in columnar format
  • Introduces a lightweight indexing that enables skipping of irrelevant blocks of rows
  • Splittable: allows parallel processing of row collections
  • It comes with basic statistics on columns (min, max, sum, and count)
  • Integrated compression

Apache HBase:

  • Scalable and distributed NoSQL database on HDFS for storing key-value pairs (note: based on Google’s Bigtable) for hosting of very large tables: billions of rows X millions of columns
  • Keys are indexed which typically provides very quick access to the records
  • Suitable for: random, realtime read/write access to Big Data
  • Schemaless
  • Supports security labels

Apache Kudu:

  • Scalable and distributed table-based storage
  • Provides indexing and columnar data organization to achieve a good compromise between ingestion speed and analytics performance
  • As with HBase, the Kudu API allows modifying data already stored in the system
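
A minimal Impala sketch of the last bullet (modifying already stored data in place); it assumes an Impala deployment with Kudu integration, and all object names are illustrative only:

CREATE TABLE events_kudu (
  id BIGINT,
  ts TIMESTAMP,
  payload STRING,
  PRIMARY KEY (id)
)
PARTITION BY HASH (id) PARTITIONS 4
STORED AS KUDU;

-- Unlike the HDFS file formats above, already stored rows can be modified in place:
UPDATE events_kudu SET payload = 'changed' WHERE id = 42;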

Data Storage Formats / Data Storage Engines Benchmarks:

Text (e.g. JSON): do not use it for processing!

Sequence File: relevant for MapReduce jobs; not suitable as a main data storage format!

Apache Avro: a fast, universal encoder for structured data. Due to very efficient serialization and deserialization, this format can guarantee very good performance whenever access to all the attributes of a record is required at the same time – data transportation, staging areas etc.

Apache Parquet / Apache Kudu: columnar stores deliver very good flexibility between fast data ingestion, fast random data lookup and scalable data analytics, ensuring at the same time system simplicity – only one technology for storing the data. Kudu excels at faster random lookups, while Parquet excels at faster data scans and ingestion.

Apache ORC: minor differences in comparison to Apache Parquet (note: at the time of writing Impala does not support the ORC file format!)

Apache HBase: delivers very good random data access performance and the biggest flexibility in how data representations can be stored (schema-less tables). The performance of batch processing of HBase data heavily depends on the chosen data model and typically cannot compete with the other technologies in this field. Therefore, analytics on HBase data should be performed only rarely.

As an alternative to a single storage technology, a hybrid system could be considered, composed of raw storage for batch processing (like Parquet) and an indexing layer (like HBase) for random access. Notably, such an approach comes at the price of data duplication, higher overall system complexity and higher maintenance costs. So, if system simplicity is one of the important factors, Apache Kudu appears to be a good compromise.

Advantages / Disadvantages of “Row” and “Column” oriented Storages / Data Access Patterns:

  • In “row oriented” storage, the full contents of a record in a database are stored as a sequence of adjacent bytes. Reading a full record in row format is thus an efficient operation. However, reading the third column of each record in a file is not particularly efficient; disks read data in minimum amounts of 1 block (typically 4 KB), which means that even if the exact location of the 3rd column of each record is known, lots of irrelevant data will be read and then discarded.
  • In the simplest form of “column oriented” storage, there is a separate file for each column in the table; for a single record each of its columns is written into a different file. Reading a full record in this format therefore requires reading a small amount of data from each file – not efficient. However, reading the third column of each record in a file is very efficient. There are ways of encoding data in “column-oriented” format which do not require file-per-column, but they all (in various ways) store column values from multiple records adjacent to each other.
  • Data access patterns which are oriented around reading whole records are best with “row oriented” formats. A typical example is a “call center” which retrieves a customer record and displays all fields of that record on the screen at once. Such applications often fall into the category “OLTP” (online transaction processing).
  • Queries which search large numbers of records for a small set of “matches” work well with “column oriented” formats. A typical example is “select count(*) from large_data_set where col3>10”. In this case, only col3 from the dataset is ever needed, and the “column oriented” layout minimises the total amount of disk reads needed. Operations which calculate sum/min/max and similar aggregate values over a set of records also work efficiently with column-oriented formats. Such applications often fall into the category “OLAP” (online analytics processing). (See the SQL sketch after this list.)
  • “Column oriented” storage also allows data to be compressed better than row-oriented formats, because all values in a column are adjacent and they all have the same data type. A type-specific compression algorithm can then be used (e.g. one specialized for compressing integers or dates or strings).
  • “Column oriented” storage does have a number of disadvantages. As noted earlier, reading a whole record is less efficient. Inserting records is also less efficient, as is deleting records. Supporting atomic and transactional behaviour is also more complex.
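
A short SQL sketch of the two access patterns described above; large_data_set and col3 come from the bullet above, while the customer table and customer_id column are assumptions used only for illustration:

-- OLTP-style access: fetch one whole record -> favours "row oriented" storage
SELECT * FROM customer WHERE customer_id = 42;

-- OLAP-style access: scan a single column over many records -> favours "column oriented" storage
SELECT COUNT(*) FROM large_data_set WHERE col3 > 10;
SELECT MIN(col3), MAX(col3), SUM(col3) FROM large_data_set;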

Infrastructure Overview (source: Cloudera)

Master Node (source: Cloudera)

Runs the Hadoop master daemons: NameNode, Standby NameNode, YARN Resource Manager and History Server, the HBase Master daemon, Sentry server, and the Impala StateStore Server and Catalog Server. Master nodes are also the location where ZooKeeper and JournalNodes are installed. The daemons can often share a single pool of servers. Depending on the cluster size, the roles can instead each be run on a dedicated server. Kudu Master Servers should also be deployed on master nodes.

Worker Node (source: Cloudera)

Runs the HDFS DataNode, YARN NodeManager, HBase RegionServer, Impala impalad, Search worker daemons and Kudu Tablet Servers.

Edge Node (source: Cloudera)

Contains all client-facing configurations and services, including gateway configurations for HDFS, YARN, Impala, Hive, and HBase. The edge node is also a good place for Hue, Oozie, HiveServer2, and Impala HAProxy. HiveServer2 and Impala HAProxy serve as a gateway to external applications such as Business Intelligence (BI) tools.

Utility Node (source: Cloudera)

Runs Cloudera Manager and the Cloudera Management Services. It can also host a MySQL (or another supported) database instance, which is used by Cloudera Manager, Hive, Sentry and other Hadoop-related projects.

Figure: Hortonworks Data Platform (source: Hortonworks)

Reference Vendor’s Infrastructure Architectures:

Security Overview (source: Cloudera)

Apache Atlas: Data Governance and Metadata framework for Hadoop: NOT supported by CDH platform; use CDH Navigator instead

Apache Knox: REST API and Application Gateway for the Apache Hadoop Ecosystem: NOT supported by CDH platform; a standard firewall will give you more or less the same functionality with respect to network security. More advanced security (authorization, authentication, encryption) is provided by other components in the stack (Kerberos, Sentry, HDFS encryption, etc.)

Apache Metron: Real-time big data security (cyber-crime): NOT supported by CDH platform; use http://spot.incubator.apache.org/ instead

Apache Ranger: Framework to enable, monitor and manage comprehensive data security across the Hadoop platform: NOT supported by CDH platform; use Apache Sentry instead

Apache Sentry: a system for enforcing fine-grained, role-based authorization to data and metadata stored on a Hadoop cluster: SUPPORTED by CDH platform; whether you use Sentry or Ranger depends on which Hadoop distribution you are using, Cloudera or Hortonworks (Apache Sentry – owned by Cloudera, Apache Ranger – owned by Hortonworks; Ranger does not support Impala)


Figure: Security overview (source: Cloudera)


Figure: Security architecture (source: Cloudera)


Figure: Securing and governing a multi-tenant data lake (source: Dataworks Summit)

References:

Big data introduction II

Posted: February 22, 2017 in Hadoop
Tags:

Big data introduction

Posted: November 29, 2016 in Hadoop
Tags:

ODS – universal codebooks and hierarchies

Posted: August 6, 2016 in MSSQL
Tags:

Task: to show how to create universal codebooks and hierarchies for an Operational Data Store (3NF) in DW solutions

a) universal codebooks:

Figure: Universal codebook diagram

CREATE TABLE [dbo].[CodeBook](
 [AutoID] [int] IDENTITY(1,1) NOT NULL,
 [CodeBookTypeAutoID] [int] NULL,
 [CodeBookID] [nvarchar](50) NULL,
 [CodeBookText] [nvarchar](50) NULL,
 [CodeBookTextEnglish] [nvarchar](50) NULL,
 [SourceSystemAutoID] [int] NULL,
 [ExtractionDate] [int] NULL,
 CONSTRAINT [PK_CodeBook] PRIMARY KEY CLUSTERED 
( [AutoID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[CodeBookType](
 [AutoID] [int] IDENTITY(1,1) NOT NULL,
 [CodeBookTypeID] [nvarchar](50) NULL,
 [CodeBookTypeText] [nvarchar](50) NULL,
 [CodeBookTypeTextEnglish] [nvarchar](50) NULL,
 [SourceSystemAutoID] [int] NULL,
 [ExtractionDate] [int] NULL,
 CONSTRAINT [PK_CodeBookType] PRIMARY KEY CLUSTERED 
( [AutoID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[CodeBook2Contract](
 [CodeBookTypeAutoID] [int] NULL,
 [CodeBookAutoID] [int] NULL,
 [ContractAutoID] [int] NULL,
 [ValidFrom] [int] NULL,
 [ValidTo] [int] NULL,
 [SourceSystemAutoID] [int] NULL,
 [ExtractionDate] [int] NULL
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[Contract](
 [AutoID] [int] IDENTITY(1,1) NOT NULL,
 [ContractID] [nvarchar](50) NULL,
 [ValidFrom] [int] NULL,
 [ValidTo] [int] NULL,
 [SourceSystemAutoID] [int] NULL,
 [ExtractionDate] [int] NULL,
 CONSTRAINT [PK_Contract] PRIMARY KEY CLUSTERED 
( [AutoID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


ALTER TABLE [dbo].[CodeBook2Contract] WITH NOCHECK ADD CONSTRAINT [FK_CodeBook2Contract_CodeBook] FOREIGN KEY([CodeBookAutoID])
REFERENCES [dbo].[CodeBook] ([AutoID])
GO
ALTER TABLE [dbo].[CodeBook2Contract] CHECK CONSTRAINT [FK_CodeBook2Contract_CodeBook]
GO

ALTER TABLE [dbo].[CodeBook2Contract] WITH NOCHECK ADD CONSTRAINT [FK_CodeBook2Contract_CodeBookType] FOREIGN KEY([CodeBookTypeAutoID])
REFERENCES [dbo].[CodeBookType] ([AutoID])
GO
ALTER TABLE [dbo].[CodeBook2Contract] CHECK CONSTRAINT [FK_CodeBook2Contract_CodeBookType]
GO

ALTER TABLE [dbo].[CodeBook2Contract] WITH NOCHECK ADD CONSTRAINT [FK_CodeBook2Contract_Contract] FOREIGN KEY([ContractAutoID])
REFERENCES [dbo].[Contract] ([AutoID])
GO
ALTER TABLE [dbo].[CodeBook2Contract] CHECK CONSTRAINT [FK_CodeBook2Contract_Contract]
GO


ALTER TABLE [dbo].[CodeBook] WITH NOCHECK ADD CONSTRAINT [FK_CodeBook_CodeBookType] FOREIGN KEY([CodeBookTypeAutoID])
REFERENCES [dbo].[CodeBookType] ([AutoID])
GO
ALTER TABLE [dbo].[CodeBook] CHECK CONSTRAINT [FK_CodeBook_CodeBookType]
GO


SET IDENTITY_INSERT [dbo].[CodeBookType] ON
GO
INSERT [dbo].[CodeBookType] ([AutoID], [CodeBookTypeID], [CodeBookTypeText], [CodeBookTypeTextEnglish], [SourceSystemAutoID], [ExtractionDate]) VALUES (1, N'CS', NULL, N'Contract status', NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[CodeBookType] OFF
GO


SET IDENTITY_INSERT [dbo].[CodeBook] ON
GO
INSERT [dbo].[CodeBook] ([AutoID], [CodeBookTypeAutoID], [CodeBookID], [CodeBookText], [CodeBookTextEnglish], [SourceSystemAutoID], [ExtractionDate]) VALUES (1, 1, N'CSO', NULL, N'Open contract', NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[CodeBook] OFF
GO


SET IDENTITY_INSERT [dbo].[Contract] ON
GO
INSERT [dbo].[Contract] ([AutoID], [ContractID], [ValidFrom], [ValidTo], [SourceSystemAutoID], [ExtractionDate]) VALUES (1, N'1', NULL, NULL, NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[Contract] OFF
GO


INSERT [dbo].[CodeBook2Contract] ([CodeBookTypeAutoID], [CodeBookAutoID], [ContractAutoID], [ValidFrom], [ValidTo], [SourceSystemAutoID], [ExtractionDate]) VALUES (1, 1, 1, NULL, NULL, NULL, NULL)
GO

 

Figure: Universal codebook result

b) universal hierarchies:

Figure: Universal hierarchy diagram

CREATE TABLE [dbo].[HierarchyType](
 [AutoID] [int] IDENTITY(1,1) NOT NULL,
 [HierarchyTypeID] [nvarchar](50) NULL,
 [HierarchyTypeText] [nvarchar](50) NULL,
 [HierarchyTypeTextEnglish] [nvarchar](50) NULL,
 [SourceSystemAutoID] [int] NULL,
 [ExtractionDate] [int] NULL,
 CONSTRAINT [PK_HierarchyType] PRIMARY KEY CLUSTERED 
( [AutoID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[Hierarchy](
 [AutoID] [int] IDENTITY(1,1) NOT NULL,
 [HierarchyID] [nvarchar](50) NULL,
 [HierarchyTypeAutoID] [int] NULL,
 [HierarchyText] [nvarchar](50) NULL,
 [HierarchyTextEnglish] [nvarchar](50) NULL,
 [SourceSystemAutoID] [int] NULL,
 [ExtractionDate] [int] NULL,
 CONSTRAINT [PK_Hierarchy] PRIMARY KEY CLUSTERED 
( [AutoID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[HierarchyLinkType](
 [AutoID] [int] IDENTITY(1,1) NOT NULL,
 [HierarchyLinkTypeID] [nvarchar](50) NULL,
 [HierarchyTypeSourceAutoID] [int] NULL,
 [HierarchyTypeDestinationAutoID] [int] NULL,
 [HierarchyLinkTypeText] [nvarchar](50) NULL,
 [HierarchyLinkTypeTextEnglish] [nvarchar](50) NULL,
 [ValidFrom] [int] NULL,
 [ValidTo] [int] NULL,
 [SourceSystemAutoID] [int] NULL,
 [ExtractionDate] [int] NULL,
 CONSTRAINT [PK_HierarchyLinkType] PRIMARY KEY CLUSTERED 
( [AutoID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


CREATE TABLE [dbo].[HierarchyLink](
 [AutoID] [int] IDENTITY(1,1) NOT NULL,
 [HierarchyLinkID] [nvarchar](50) NULL,
 [HierarchyLinkTypeAutoID] [int] NULL,
 [HierarchySourceAutoID] [int] NULL,
 [HierarchyDestinationAutoID] [int] NULL,
 [ValidFrom] [int] NULL,
 [ValidTo] [int] NULL,
 [SourceSystemAutoID] [int] NULL,
 [ExtractionDate] [int] NULL,
 CONSTRAINT [PK_HierarchyLink] PRIMARY KEY CLUSTERED 
( [AutoID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


ALTER TABLE [dbo].[Hierarchy] WITH NOCHECK ADD CONSTRAINT [FK_Hierarchy_HierarchyType] FOREIGN KEY([HierarchyTypeAutoID])
REFERENCES [dbo].[HierarchyType] ([AutoID])
GO
ALTER TABLE [dbo].[Hierarchy] CHECK CONSTRAINT [FK_Hierarchy_HierarchyType]
GO


ALTER TABLE [dbo].[HierarchyLinkType] WITH NOCHECK ADD CONSTRAINT [FK_HierarchyLinkType_HierarchyType] FOREIGN KEY([HierarchyTypeSourceAutoID])
REFERENCES [dbo].[HierarchyType] ([AutoID])
GO
ALTER TABLE [dbo].[HierarchyLinkType] CHECK CONSTRAINT [FK_HierarchyLinkType_HierarchyType]
GO


ALTER TABLE [dbo].[HierarchyLink] WITH NOCHECK ADD CONSTRAINT [FK_HierarchyLink_Hierarchy_S] FOREIGN KEY([HierarchySourceAutoID])
REFERENCES [dbo].[Hierarchy] ([AutoID])
GO
ALTER TABLE [dbo].[HierarchyLink] CHECK CONSTRAINT [FK_HierarchyLink_Hierarchy_S]
GO

ALTER TABLE [dbo].[HierarchyLink] WITH NOCHECK ADD CONSTRAINT [FK_HierarchyLink_Hierarchy_D] FOREIGN KEY([HierarchyDestinationAutoID])
REFERENCES [dbo].[Hierarchy] ([AutoID])
GO
ALTER TABLE [dbo].[HierarchyLink] CHECK CONSTRAINT [FK_HierarchyLink_Hierarchy_D]
GO

ALTER TABLE [dbo].[HierarchyLink] WITH NOCHECK ADD CONSTRAINT [FK_HierarchyLink_HierarchyLinkType] FOREIGN KEY([HierarchyLinkTypeAutoID])
REFERENCES [dbo].[HierarchyLinkType] ([AutoID])
GO
ALTER TABLE [dbo].[HierarchyLink] CHECK CONSTRAINT [FK_HierarchyLink_HierarchyLinkType]
GO


SET IDENTITY_INSERT [dbo].[HierarchyType] ON 
GO
INSERT [dbo].[HierarchyType] ([AutoID], [HierarchyTypeID], [HierarchyTypeText], [HierarchyTypeTextEnglish], [SourceSystemAutoID], [ExtractionDate]) VALUES (1, N'ORG_L1', NULL, N'Level1', NULL, NULL)
GO
INSERT [dbo].[HierarchyType] ([AutoID], [HierarchyTypeID], [HierarchyTypeText], [HierarchyTypeTextEnglish], [SourceSystemAutoID], [ExtractionDate]) VALUES (2, N'ORG_L2', NULL, N'Level2', NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[HierarchyType] OFF
GO


SET IDENTITY_INSERT [dbo].[Hierarchy] ON 
GO
INSERT [dbo].[Hierarchy] ([AutoID], [HierarchyID], [HierarchyTypeAutoID], [HierarchyText], [HierarchyTextEnglish], [SourceSystemAutoID], [ExtractionDate]) VALUES (1, N'ORG_L1_1', 1, NULL, N'ORG Level1 A', NULL, NULL)
GO
INSERT [dbo].[Hierarchy] ([AutoID], [HierarchyID], [HierarchyTypeAutoID], [HierarchyText], [HierarchyTextEnglish], [SourceSystemAutoID], [ExtractionDate]) VALUES (2, N'ORG_L2_1', 2, NULL, N'ORG Level2 A', NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[Hierarchy] OFF
GO


SET IDENTITY_INSERT [dbo].[HierarchyLinkType] ON 
GO
INSERT [dbo].[HierarchyLinkType] ([AutoID], [HierarchyLinkTypeID], [HierarchyTypeSourceAutoID], [HierarchyTypeDestinationAutoID], [HierarchyLinkTypeText], [HierarchyLinkTypeTextEnglish], [ValidFrom], [ValidTo], [SourceSystemAutoID], [ExtractionDate]) VALUES (1, N'ORG_L1_L2', 1, 2, NULL, N'ORG Level1 - Level2', NULL, NULL, NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[HierarchyLinkType] OFF
GO


SET IDENTITY_INSERT [dbo].[HierarchyLink] ON 
GO
INSERT [dbo].[HierarchyLink] ([AutoID], [HierarchyLinkID], [HierarchyLinkTypeAutoID], [HierarchySourceAutoID], [HierarchyDestinationAutoID], [ValidFrom], [ValidTo], [SourceSystemAutoID], [ExtractionDate]) VALUES (1, N'ORG_L1_1-ORG_L2_1', 1, 1, 2, NULL, NULL, NULL, NULL)
GO
SET IDENTITY_INSERT [dbo].[HierarchyLink] OFF
GO

 

Figure: Universal hierarchy result

Conclusion:

a) universal codebooks: one set of tables (3x codebook + 1 per entity) can contain the records of multiple codebooks (e.g. Contract status, Contract payment frequency, Client type etc.)
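
For illustration, a minimal T-SQL query against the tables defined above that resolves the “Contract status” codebook value for each contract (it relies only on the sample INSERTs shown earlier):

SELECT c.ContractID,
       cbt.CodeBookTypeTextEnglish AS CodeBookType,
       cb.CodeBookID,
       cb.CodeBookTextEnglish      AS CodeBookValue
FROM dbo.CodeBook2Contract c2c
JOIN dbo.Contract     c   ON c.AutoID   = c2c.ContractAutoID
JOIN dbo.CodeBook     cb  ON cb.AutoID  = c2c.CodeBookAutoID
JOIN dbo.CodeBookType cbt ON cbt.AutoID = c2c.CodeBookTypeAutoID
WHERE cbt.CodeBookTypeID = N'CS'   -- CS = Contract status (see the sample INSERTs)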

b) universal hierarchies: one set of tables (4 “common” tables) can contain the records of multiple hierarchies (e.g. Organisational structure, Business structure, Sales channels etc.). By using the entities HierarchyLinkType and HierarchyLink it is possible to create combinations between “source” and “destination” (bi-directional connections).
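
Similarly, a minimal T-SQL query that expands one hierarchy link type (ORG Level1 -> Level2) into its source/destination pairs, again based only on the tables and sample rows above:

SELECT hlt.HierarchyLinkTypeID,
       hs.HierarchyID          AS SourceHierarchyID,
       hs.HierarchyTextEnglish AS SourceText,
       hd.HierarchyID          AS DestinationHierarchyID,
       hd.HierarchyTextEnglish AS DestinationText
FROM dbo.HierarchyLink hl
JOIN dbo.HierarchyLinkType hlt ON hlt.AutoID = hl.HierarchyLinkTypeAutoID
JOIN dbo.Hierarchy hs ON hs.AutoID = hl.HierarchySourceAutoID
JOIN dbo.Hierarchy hd ON hd.AutoID = hl.HierarchyDestinationAutoID
WHERE hlt.HierarchyLinkTypeID = N'ORG_L1_L2'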

Source code:

Additional references:

Task: to give a brief overview, based on my own experience acquired during the last few days in our company, of building mobile BI (Business Intelligence) solutions related to SQL Server. This concept will be demonstrated mainly by means of a virtualized three-server testing environment for a SharePoint/mobile BI platform using Hyper-V as Microsoft's virtualization technology


At the time of writing this blog post, there is no comprehensive cross-platform environment in the Microsoft world for easily building mobile BI solutions for tablets and smartphones alike (iPad/iPhone, Android and Windows 8/Phone devices). Microsoft sees its future in the project called Mobile Helix Link, in Power Map (codename “GeoFlow”) + Power Query (codename “Data Explorer”) for Excel, and mainly in the technology codenamed “InfoNav” as a feature of Power BI (note: Power BI – self-service “power” tools: Power Pivot, Power View, Power Query and Power Map). For now, we can build our mobile BI solutions based on SQL Server as a data source in the following (different) ways:

  • IIS (Internet Information Services; as a very low cost solution): SSRS (SQL Server Reporting Services; note: limited use for the Apple platform, details are described below)
  • SharePoint Server: Excel Services (easier solutions), PerformancePoint Services (much more complex solutions)
  • OWA Server (Office Web Apps): Excel
  • 3rd party developer tools/platforms or ready-to-use solutions

Below is a description of the hardware and software configuration of a three-server testing environment for building SharePoint BI in general and mobile BI solutions based on this Microsoft platform.

Three-server SharePoint testing environment:

The bullets below describe the results of my investigation into some possibilities for mobile BI solutions based on the SQL Server platform.

Mobile BI (iPad/iPhone, Android, Windows 8/Phone devices) platform:

    1. IIS:

  • SSRS:
  • iPad: Safari/Chrome/Opera – NO, Mercury (Chrome and Firefox alternative) – NO, FF/IE – not supported
  • either to use a native application from app store, e.g.:
  • Mobi Reports Pro by Mobi Weave, Inc. (note: very good solution – I recommend its use)
  • SSRS Report Viewer Pro by Ororo a.s. (note: iPad only)
  • or to use Remote Desktop Services/Terminal Services (Windows Server) instead of native application (note: in addition one CAL per user)
  • Android: FF – OK, Chrome/Opera – NO, Safari/IE – not supported
  • Windows 8: IE – OK, alternative – FF, Chrome/Opera – NO, Safari – not supported on tablet?
    2. SharePoint 2013:

  • Excel Services:
  • iPad: Safari – OK (note: slicers – OK), alternatives – Chrome/Mercury (Chrome and Firefox alternative), Opera – NO, FF/IE – not supported
  • Android: FF – OK (note: slicers – NO), Chrome/Opera/(Built-in Browser/Dolphin) – NO, Safari/IE – not supported
  • Windows 8: IE – OK, alternatives – FF/Chrome/Opera, Safari – not supported
  • PerformancePoint Services (Dashboards):
  • OLAP/Excel Services:
  • iPad: Safari – OK (note: slicers – OK), alternatives – Chrome/Mercury (Chrome and Firefox alternative), Opera – NO, FF/IE – not supported
  • Android: FF – OK (note: slicers – NO), Chrome/Opera/(Built-in Browser/Dolphin) – NO, Safari/IE – not supported
  • Windows 8: IE – OK, alternative – FF, Chrome/Opera – OK (note: zoom – NO), Safari – not supported
  • SSRS:
  • iPad: Safari/Chrome/Opera – NO, Mercury (Chrome and Firefox alternative) – NO, FF/IE – not supported
  • Android: FF – OK, Chrome/Opera/(Built-in Browser/Dolphin) – NO, Safari/IE – not supported
  • Windows 8: IE – OK, alternative – FF, Chrome/Opera – NO, Safari – not supported on tablet?
  • PowerPivot/PowerView: the output is rendered in Silverlight
  • iPad: SL – not supported on iOS (note: SL is available only for Windows and Macintosh)
  • Android: SL – not supported
  • Windows 8: IE – OK, alternatives – FF/Chrome/Opera (note: scroll – NO), Safari – not supported
    3. OWA 2013 (Office Web Apps):

  • Deploy Office Web Apps Server
  • Configure SharePoint 2013 to use Office Web Apps
    4.a) 3rd party developer tools/platforms:

  • Build cross-platform iOS, Android, Mac and Windows apps with C# and .NET by Xamarin
  • UIFramework for .NET by ComponentArt
  • Data Visualization for Visual Studio by ComponentArt
    4.b) 3rd party ready-to-use solutions:

  • Datazen – Mobile BI for Windows 8, iPad/iPhone and Android by ComponentArt
  • Tableau Server by Tableau Software (note: I recommend its use)
  • iPad: web browsers – OK and native free app – OK
  • Android: web browsers – OK and native free app – OK
  • Windows 8: web browsers – OK (note: a native app isn’t available)
  • Mobi Reports Pro – BI for SSRS on iPad and iPhone by Mobi Weave
  • Mobi Office – Mobile Content Management by Mobi Weave

Testing devices:

  • iPad: 6.1.3 (note: iPhone – not tested)
  • Android: Samsung Tab 10.1 with OS: 4.0.3 (note: Android on a phone – not tested)
  • Windows 8: ThinkPad Tablet (note: Windows Phone – not tested)

Recommendations:

  • iPad: Mercury (note: for the sake of drill down in zoomed mode – PerformancePoint Services), alternatives – Chrome/Safari + Mobi Reports Pro for SSRS
  • Android: FF (note: without slicers)
  • Windows 8: IE (note: alternative – FF)
  • note: cross-platform independence is possible to achieve by using Remote Desktop Services/Terminal Services (Windows Server; in addition one CAL per user)


Figure 1: Dashboard (PerformancePoint Services)


Figure 2: Dashboard (Tableau Server)


Conclusion: nowadays (at the time of writing the blog post), as a universal cross-platform solution for building mobile BI applications directly with Microsoft tools/frameworks, using Excel Services (better suited for analysts) or PerformancePoint Services dashboards with charts and scorecards (better suited for top management) on the SharePoint platform seems to be the best, but not ideal, choice


Documents:

Additional references:
Task: to show how to render a spatial data report in SSRS


The new spatial data types geometry and geography were introduced in SQL Server 2008, and the new map feature in SQL Server Reporting Services 2008 R2.
The report will use the AdventureWorksDW2012 database and show reseller sales amount by state (US).
The process of creating the map report consists of the following steps:

Code 1: Create datasets


Data Source:
Data Source=localhost;Initial Catalog=AdventureWorksDW2012

Dataset:
dsResellerSales
SELECT g.StateProvinceCode, SUM(f.SalesAmount) AS 'SalesAmount'
FROM dbo.FactResellerSales f JOIN dbo.DimDate d ON d.DateKey = f.ShipDateKey
 JOIN dbo.DimReseller s ON s.ResellerKey = f.ResellerKey
 JOIN dbo.DimGeography g ON g.GeographyKey = s.GeographyKey
WHERE d.CalendarYear = @CalendarYear AND g.CountryRegionCode = 'US'
GROUP BY g.StateProvinceCode

Dataset:
dsOrderYears
SELECT DISTINCT YEAR(f.OrderDate) AS 'OrderYear'
FROM dbo.FactResellerSales f JOIN dbo.DimDate d ON d.DateKey = f.ShipDateKey
 JOIN dbo.DimReseller s ON s.ResellerKey = f.ResellerKey
 JOIN dbo.DimGeography g ON g.GeographyKey = s.GeographyKey
WHERE g.CountryRegionCode = 'US'
ORDER BY OrderYear


Figure 1: New map layer – choose a source of spatial data


Figure 2: New map layer – choose spatial data and map view options


Figure 3: New map layer – choose map visualization


Figure 4: New map layer – choose the analytical dataset


Figure 5: New map layer – specify the match fields for spatial and analytical data


Figure 6: New map layer – choose color theme and data visualization


Figure 7: Map layers – right click – map properties


Figure 8: Report parameter properties – calendar year


Figure 9: Preview reseller sales amount map


Source code:

Additional references:
Task: to design an architecture for automated team-based development and deployment of data warehouse and business intelligence projects by means of open source or free products


Figure 1: UML – Use Case (Enterprise Architect)

Abbreviations/Notes:
SVN+WinMerge+VS(Visual Studio: SSDT/BIDS – Free Product, Schema and Data Compare):
SVN+VS:
Figure 2: SVN+VS (DB and BI Projects)

Jenkins:
Figure 3: Jenkins – Build Pipeline (Source: Build Pipeline Plugin's Site)

Parameters:
  • Environment (Choice): DEV, TEST, PROD
  • INCLUDE_DB_STG (Boolean value): TRUE
  • INCLUDE_DB_ODS (Boolean value): TRUE
  • INCLUDE_DB_DWH (Boolean value): TRUE
  • INCLUDE_DB_LOG (Boolean value): TRUE
  • INCLUDE_SSIS_STG (Boolean value): TRUE
  • INCLUDE_SSIS_ODS (Boolean value): TRUE
  • INCLUDE_SSIS_DWH (Boolean value): TRUE
  • INCLUDE_SSAS (Boolean value): TRUE
  • INCLUDE_SSRS (Boolean value): TRUE
Source Code Management:
Build Triggers:
  • Build periodically: H H(0-5) * * 1-5 (note: every working day during night hours)
Extra Plugins:

Build:
1. Execute windows batch command – BUILD_DEVENV.bat


"%WORKSPACE%\CI_JENKINS\BUILD_DEVENV.bat" %Environment%

1.1 BUILD_DEVENV.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET devenv="C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\devenv.exe"
 SET solution="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\TEST.sln"
 %devenv% %solution% /build Debug
GOTO END

:TEST
 SET devenv="C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\devenv.exe"
 SET solution="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\TEST.sln"
 %devenv% %solution% /build Release
GOTO END

:PROD
 SET devenv="C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\devenv.exe"
 SET solution="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\TEST.sln"
 %devenv% %solution% /build Release
GOTO END

:UNKNOWN
GOTO END

:END

2. Execute windows batch command – DB_RDB_BACKUP.bat


"%WORKSPACE%\CI_JENKINS\DB_RDB_BACKUP.bat" %Environment%

2.1 DB_RDB_BACKUP.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET sqlcmd="sqlcmd.exe"
 SET backup="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\DB_RDB_BACKUP.sql"
 SET server="server\test"
 %sqlcmd% -S %server% -i %backup%
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

2.2 DB_RDB_BACKUP.sql


BACKUP DATABASE [DWH_TEST_DWH] TO  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\Backup\DWH_TEST_DWH.bak' WITH NOFORMAT, NOINIT,  NAME = N'DWH_TEST_DWH-Full Database Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

BACKUP DATABASE [DWH_TEST_LOG] TO  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\Backup\DWH_TEST_LOG.bak' WITH NOFORMAT, NOINIT,  NAME = N'DWH_TEST_LOG-Full Database Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

BACKUP DATABASE [DWH_TEST_ODS] TO  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\Backup\DWH_TEST_ODS.bak' WITH NOFORMAT, NOINIT,  NAME = N'DWH_TEST_ODS-Full Database Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

BACKUP DATABASE [DWH_TEST_STG] TO  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\Backup\DWH_TEST_STG.bak' WITH NOFORMAT, NOINIT,  NAME = N'DWH_TEST_STG-Full Database Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

2.3 DB_RDB_RESTORE.bat (optional)


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET sqlcmd="sqlcmd.exe"
 SET backup="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\DB_RDB_RESTORE.sql"
 SET server="server\test"
 %sqlcmd% -S %server% -i %backup%
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

2.4 DB_RDB_RESTORE.sql (optional)


USE [master]
ALTER DATABASE [DWH_TEST_DWH] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
RESTORE DATABASE [DWH_TEST_DWH] FROM  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\Backup\DWH_TEST_DWH.bak' WITH  FILE = 4,  MOVE N'DWH_TEST_DWH' TO N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\DATA\DWH_TEST_DWH.mdf',  MOVE N'DWH_TEST_DWH_log' TO N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\DATA\DWH_TEST_DWH.ldf',  NOUNLOAD,  REPLACE,  STATS = 5
ALTER DATABASE [DWH_TEST_DWH] SET MULTI_USER
GO

USE [master]
ALTER DATABASE [DWH_TEST_LOG] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
RESTORE DATABASE [DWH_TEST_LOG] FROM  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\Backup\DWH_TEST_LOG.bak' WITH  FILE = 4,  MOVE N'DWH_TEST_LOG' TO N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\DATA\DWH_TEST_LOG.mdf',  MOVE N'DWH_TEST_LOG_log' TO N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\DATA\DWH_TEST_LOG.ldf',  NOUNLOAD,  REPLACE,  STATS = 5
ALTER DATABASE [DWH_TEST_LOG] SET MULTI_USER
GO

USE [master]
ALTER DATABASE [DWH_TEST_ODS] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
RESTORE DATABASE [DWH_TEST_ODS] FROM  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\Backup\DWH_TEST_ODS.bak' WITH  FILE = 4,  MOVE N'DWH_TEST_ODS' TO N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\DATA\DWH_TEST_ODS.mdf',  MOVE N'DWH_TEST_ODS_log' TO N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\DATA\DWH_TEST_ODS.ldf',  NOUNLOAD,  REPLACE,  STATS = 5
ALTER DATABASE [DWH_TEST_ODS] SET MULTI_USER
GO

USE [master]
ALTER DATABASE [DWH_TEST_STG] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
RESTORE DATABASE [DWH_TEST_STG] FROM  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\Backup\DWH_TEST_STG.bak' WITH  FILE = 4,  MOVE N'DWH_TEST_STG' TO N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\DATA\DWH_TEST_STG.mdf',  MOVE N'DWH_TEST_STG_log' TO N'D:\MSSQL\MSSQL11.TEST\MSSQL11.TEST\MSSQL\DATA\DWH_TEST_STG.ldf',  NOUNLOAD,  REPLACE,  STATS = 5
ALTER DATABASE [DWH_TEST_STG] SET MULTI_USER
GO

3. Execute windows batch command – DB_SSIS_BACKUP.bat


"%WORKSPACE%\CI_JENKINS\DB_SSIS_BACKUP.bat" %Environment%

3.1 DB_SSIS_BACKUP.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET sqlcmd="sqlcmd.exe"
 SET backup="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\DB_SSIS_BACKUP.sql"
 SET server="server"
 %sqlcmd% -S %server% -i %backup%
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

3.2 DB_SSIS_BACKUP.sql


BACKUP DATABASE [SSISDB] TO  DISK = N'D:\MSSQL\Backup\SSISDB.bak' WITH NOFORMAT, NOINIT,  NAME = N'SSISDB-Full Database Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

3.3 DB_SSIS_RESTORE.bat (optional)


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET sqlcmd="sqlcmd.exe"
 SET backup="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\DB_SSIS_RESTORE.sql"
 SET server="server"
 %sqlcmd% -S %server% -i %backup%
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

3.4 DB_SSIS_RESTORE.sql (optional)


USE [master]
ALTER DATABASE [SSISDB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
RESTORE DATABASE [SSISDB] FROM  DISK = N'D:\MSSQL\Backup\SSISDB.bak' WITH  FILE = 3,  MOVE N'data' TO N'D:\MSSQL\Data\SSISDB.mdf',  MOVE N'log' TO N'D:\MSSQL\Data\SSISDB.ldf',  NOUNLOAD,  REPLACE,  STATS = 5
ALTER DATABASE [SSISDB] SET MULTI_USER
GO

4. Execute windows batch command – DB_SSRS_BACKUP.bat


"%WORKSPACE%\CI_JENKINS\DB_SSRS_BACKUP.bat" %Environment%

4.1 DB_SSRS_BACKUP.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET sqlcmd="sqlcmd.exe"
 SET backup="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\DB_SSRS_BACKUP.sql"
 SET server="server\test"
 %sqlcmd% -S %server% -i %backup%
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

4.2 DB_SSRS_BACKUP.sql


BACKUP DATABASE [ReportServer$TEST] TO  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL\Backup\ReportServer$TEST.bak' WITH NOFORMAT, NOINIT,  NAME = N'ReportServer$TEST-Full Database Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

BACKUP DATABASE [ReportServer$TESTTempDB] TO  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL\Backup\ReportServer$TESTTempDB.bak' WITH NOFORMAT, NOINIT,  NAME = N'ReportServer$TESTTempDB-Full Database Backup', SKIP, NOREWIND, NOUNLOAD,  STATS = 10
GO

4.3 DB_SSRS_RESTORE.bat (optional)


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET sqlcmd="sqlcmd.exe"
 SET backup="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\DB_SSRS_RESTORE.sql"
 SET server="server\test"
 %sqlcmd% -S %server% -i %backup%
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

4.4 DB_SSRS_RESTORE.sql (optional)


USE [master]
ALTER DATABASE [ReportServer$TEST] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
BACKUP LOG [ReportServer$TEST] TO  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL\Backup\ReportServer$TEST_LogBackup_2013-05-25_11-54-42.bak' WITH NOFORMAT, NOINIT,  NAME = N'ReportServer$TEST_LogBackup_2013-05-25_11-54-42', NOSKIP, NOREWIND, NOUNLOAD,  NORECOVERY ,  STATS = 5
RESTORE DATABASE [ReportServer$TEST] FROM  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL\Backup\ReportServer$TEST.bak' WITH  FILE = 2,  NOUNLOAD,  REPLACE,  STATS = 5
ALTER DATABASE [ReportServer$TEST] SET MULTI_USER
GO

USE [master]
ALTER DATABASE [ReportServer$TESTTempDB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE
RESTORE DATABASE [ReportServer$TESTTempDB] FROM  DISK = N'D:\MSSQL\MSSQL11.TEST\MSSQL\Backup\ReportServer$TESTTempDB.bak' WITH  FILE = 2,  NOUNLOAD,  REPLACE,  STATS = 5
ALTER DATABASE [ReportServer$TESTTempDB] SET MULTI_USER
GO

4.5 SSRS_BACKUP.bat (optional)


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET ItemPath="/SSRS_TEST"
 SET ReportServerURL=http://server/ReportServer_TEST
 SET BackupFolder="C:\temp\CI_JENKINS\Backups\SSRS"
 SET ScriptFolder="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSRS_BACKUP.rss"
 RS -i %ScriptFolder% -s %ReportServerURL% -v ItemPath="%ItemPath%" -v BackupFolder="%BackupFolder%"
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

4.6 SSRS_BACKUP.rss (optional)


Public Sub Main()
 
Try
 
rs.Credentials = System.Net.CredentialCache.DefaultCredentials
 
Dim Items as CatalogItem()
 
Dim Item as CatalogItem
 
Dim ReportName As String
 
Items = rs.ListChildren(ItemPath, false)
 
Console.Writeline()
 
Console.Writeline("Reports Back Up Started.")
 
For Each Item in Items
 
ReportName = ItemPath + "/" + Item.Name
 
Dim reportDefinition As Byte() = Nothing
 
Dim rdlReport As New System.Xml.XmlDocument
reportDefinition = rs.GetReportDefinition(ReportName)
Dim Stream As New MemoryStream(reportDefinition)
 
Dim curDate as Date = Date.Now()
Dim strDate as String = curDate.ToString("dd-MM-yyyy")
 
Dim BackupFolderNew as String = BackupFolder+"\"+strDate+"\"+ItemPath
 
If(Not System.IO.Directory.Exists(BackupFolderNew )) Then
    System.IO.Directory.CreateDirectory(BackupFolderNew)
End If
 
rdlReport.Load(Stream)
rdlReport.Save(BackupFolderNew + "\" + Item.Name +".rdl")
 
Console.Writeline("Report " + Item.Name +".rdl Backed up Successfully")
 
Next
 
Console.Writeline("Reports Back Up Completed.")
 
Console.Writeline()
 
catch e As Exception
 
Console.Writeline(e.Message)
 
End Try
 
End Sub

5. Execute windows batch command – SSAS_BACKUP.bat


"%WORKSPACE%\CI_JENKINS\SSAS_BACKUP.bat" %Environment%

5.1 SSAS_BACKUP.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET ASCMD="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSAS_ASCMD.exe"
 SET XMLA="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSAS_BACKUP.xmla"
 SET SERVER="server\test"
 %ASCMD% -S %SERVER% -i %XMLA%
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

5.2 SSAS_BACKUP.xmla


<Backup xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>SSAS</DatabaseID>
  </Object>
  <File>SSAS.abf</File>
  <AllowOverwrite>true</AllowOverwrite>
</Backup>

5.3 SSAS_RESTORE.xmla (optional)


<Restore xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <File>SSAS.abf</File>
  <DatabaseName>SSAS</DatabaseName>
  <AllowOverwrite>true</AllowOverwrite>
</Restore>

6. Execute windows batch command – INCLUDE_DB_STG.bat


"%WORKSPACE%\CI_JENKINS\INCLUDE_DB_STG.bat" %INCLUDE_DB_STG%, %Environment%

6.1 INCLUDE_DB_STG.bat


IF [%1]==[true] (
 GOTO INCLUDED
)
IF [%1]==[false] (
 GOTO END
)

:INCLUDED
 "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\DB_STG_DEPLOY.bat" %2
GOTO END

:END

6.2 DB_STG_DEPLOY.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET SqlPackage="C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe"
 SET publish="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\DB_STG\DB_STG_server.publish.xml"
 SET dacpac="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\DB_STG\bin\Debug\DB_STG.dacpac"
 %SqlPackage% /pr:%publish% /sf:%dacpac% /a:Publish
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

7. Execute windows batch command – INCLUDE_DB_ODS.bat


"%WORKSPACE%\CI_JENKINS\INCLUDE_DB_ODS.bat" %INCLUDE_DB_ODS%, %Environment%

7.1 INCLUDE_DB_ODS.bat


IF [%1]==[true] (
 GOTO INCLUDED
)
IF [%1]==[false] (
 GOTO END
)

:INCLUDED
 "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\DB_ODS_DEPLOY.bat" %2
GOTO END

:END

7.2 DB_ODS_DEPLOY.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET SqlPackage="C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe"
 SET publish="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\DB_ODS\DB_ODS_server.publish.xml"
 SET dacpac="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\DB_ODS\bin\Debug\DB_ODS.dacpac"
 %SqlPackage% /pr:%publish% /sf:%dacpac% /a:Publish
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

8. Execute windows batch command – INCLUDE_DB_DWH.bat


"%WORKSPACE%\CI_JENKINS\INCLUDE_DB_DWH.bat" %INCLUDE_DB_DWH%, %Environment%

8.1 INCLUDE_DB_DWH.bat


IF [%1]==[true] (
 GOTO INCLUDED
)
IF [%1]==[false] (
 GOTO END
)

:INCLUDED
 "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\DB_DWH_DEPLOY.bat" %2
GOTO END

:END

8.2 DB_DWH_DEPLOY.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET SqlPackage="C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe"
 SET publish="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\DB_DWH\DB_DWH_server.publish.xml"
 SET dacpac="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\DB_DWH\bin\Debug\DB_DWH.dacpac"
 %SqlPackage% /pr:%publish% /sf:%dacpac% /a:Publish
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

9. Execute windows batch command – INCLUDE_DB_LOG.bat


"%WORKSPACE%\CI_JENKINS\INCLUDE_DB_LOG.bat" %INCLUDE_DB_LOG%, %Environment%

9.1 INCLUDE_DB_LOG.bat


IF [%1]==[true] (
 GOTO INCLUDED
)
IF [%1]==[false] (
 GOTO END
)

:INCLUDED
 "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\DB_LOG_DEPLOY.bat" %2
GOTO END

:END

9.2 DB_LOG_DEPLOY.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET SqlPackage="C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\SqlPackage.exe"
 SET publish="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\DB_LOG\DB_LOG_server.publish.xml"
 SET dacpac="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\DB_LOG\bin\Debug\DB_LOG.dacpac"
 %SqlPackage% /pr:%publish% /sf:%dacpac% /a:Publish
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

10. Execute windows batch command – INCLUDE_SSIS_STG.bat


"%WORKSPACE%\CI_JENKINS\INCLUDE_SSIS_STG.bat" %INCLUDE_SSIS_STG%, %Environment%

10.1 INCLUDE_SSIS_STG.bat


IF [%1]==[true] (
 GOTO INCLUDED
)
IF [%1]==[false] (
 GOTO END
)

:INCLUDED
 "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSIS_STG_DEPLOY.bat" %2
GOTO END

:END

10.2 SSIS_STG_DEPLOY.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET ISDeploymentWizard="ISDeploymentWizard.exe"
 SET ispac="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\SSIS_STG\bin\Development\SSIS_STG.ispac"
 SET server="server"
 SET SSIS="/SSISDB/TEST/SSIS_STG"
 %ISDeploymentWizard% /S /SP:%ispac% /DS:%server% /DP:%SSIS%
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

11. Execute windows batch command – INCLUDE_SSIS_ODS.bat


"%WORKSPACE%\CI_JENKINS\INCLUDE_SSIS_ODS.bat" %INCLUDE_SSIS_ODS%, %Environment%

11.1 INCLUDE_SSIS_ODS.bat


IF [%1]==[true] (
 GOTO INCLUDED
)
IF [%1]==[false] (
 GOTO END
)

:INCLUDED
 "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSIS_ODS_DEPLOY.bat" %2
GOTO END

:END

11.2 SSIS_ODS_DEPLOY.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET ISDeploymentWizard="ISDeploymentWizard.exe"
 SET ispac="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\SSIS_ODS\bin\Development\SSIS_ODS.ispac"
 SET server="server"
 SET SSIS="/SSISDB/TEST/SSIS_ODS"
 %ISDeploymentWizard% /S /SP:%ispac% /DS:%server% /DP:%SSIS%
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

12. Execute windows batch command – INCLUDE_SSIS_DWH.bat


"%WORKSPACE%\CI_JENKINS\INCLUDE_SSIS_DWH.bat" %INCLUDE_SSIS_DWH%, %Environment%

12.1 INCLUDE_SSIS_DWH.bat


IF [%1]==[true] (
 GOTO INCLUDED
)
IF [%1]==[false] (
 GOTO END
)

:INCLUDED
 "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSIS_DWH_DEPLOY.bat" %2
GOTO END

:END

12.2 SSIS_DWH_DEPLOY.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET ISDeploymentWizard="ISDeploymentWizard.exe"
 SET ispac="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\SSIS_DWH\bin\Development\SSIS_DWH.ispac"
 SET server="server"
 SET SSIS="/SSISDB/TEST/SSIS_DWH"
 %ISDeploymentWizard% /S /SP:%ispac% /DS:%server% /DP:%SSIS%
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

13. Execute windows batch command – INCLUDE_SSAS.bat


"%WORKSPACE%\CI_JENKINS\INCLUDE_SSAS.bat" %INCLUDE_SSAS%, %Environment%

13.1 INCLUDE_SSAS.bat


IF [%1]==[true] (
 GOTO INCLUDED
)
IF [%1]==[false] (
 GOTO END
)

:INCLUDED
 "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSAS_DEPLOY.bat" %2
GOTO END

:END

13.2 SSAS_DEPLOY.bat


IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET Deployment="C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Binn\ManagementStudio\Microsoft.AnalysisServices.Deployment.exe"
 SET asdatabase="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\SSAS\bin\SSAS.asdatabase"
 %Deployment% %asdatabase% /s
GOTO END

:TEST
GOTO END

:PROD
GOTO END

:UNKNOWN
GOTO END

:END

14. Execute windows batch command – INCLUDE_SSRS.bat


"%WORKSPACE%\CI_JENKINS\INCLUDE_SSRS.bat" %INCLUDE_SSRS%, %Environment%

14.1 INCLUDE_SSRS.bat


IF [%1]==[true] (
 GOTO INCLUDED
)
IF [%1]==[false] (
 GOTO END
)

:INCLUDED
 "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSRS_DEPLOY.bat" %2
GOTO END

:END

14.2 SSRS_DEPLOY.bat


@echo off
::Script Variables
SET LOGFILE="C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSRS_TEST.log"
SET SCRIPTLOCATION=C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS



REM SET REPORTSERVER=http://server/ReportServer_TEST

IF [%1]==[DEV] (
 GOTO DEV
) 
IF [%1]==[TEST] (
 GOTO TEST
)
IF [%1]==[PROD] (
 GOTO PROD
) ELSE (
 GOTO UNKNOWN
)

:DEV
 SET REPORTSERVER=http://server/ReportServer_TEST
GOTO END

:TEST
 SET REPORTSERVER=http://server/ReportServer_TEST
GOTO END

:PROD
 SET REPORTSERVER=http://server/ReportServer_TEST
GOTO END

:UNKNOWN
 SET REPORTSERVER=http://server/ReportServer_TEST
GOTO END

:END



SET RS=rs.exe
SET TIMEOUT=60

::Clear Log file
IF EXIST %logfile% DEL %logfile%

::Write Log Header
ECHO Reporting Services Scripter Load Log >>%LOGFILE%
ECHO. >>%LOGFILE%
ECHO Starting Load at %DATE% %TIME% >>%LOGFILE%
ECHO SCRIPTLOCATION = %SCRIPTLOCATION% >>%LOGFILE%
ECHO REPORTSERVER   = %REPORTSERVER% >>%LOGFILE%
ECHO BACKUPLOCATION = %BACKUPLOCATION% >>%LOGFILE%
ECHO SCRIPTLEVEL    = SQL2008 >>%LOGFILE%
ECHO TIMEOUT        = %TIMEOUT% >>%LOGFILE%
ECHO RS             = %rs% >>%LOGFILE%
ECHO. >>%LOGFILE%

::Run Scripts

%rs% -i "%SCRIPTLOCATION%\SSRS_DEPLOY.rss" -s %REPORTSERVER% -l %TIMEOUT% -v REGIONFOLDER="" -v PARENTFOLDER="SSRS_TEST" -v RDLNAME="Report_TEST" -v DATASOURCE="" >>%LOGFILE% 2>&1
ECHO. >>%LOGFILE%

%rs% -i "%SCRIPTLOCATION%\SSRS_DEPLOY.rss" -s %REPORTSERVER% -l %TIMEOUT% -v REGIONFOLDER="SSRS_TEST" -v PARENTFOLDER="SSRS_TEST" -v RDLNAME="Report_TEST_1" -v DATASOURCE="" >>%LOGFILE% 2>&1
ECHO. >>%LOGFILE%

%rs% -i "%SCRIPTLOCATION%\SSRS_DEPLOY.rss" -s %REPORTSERVER% -l %TIMEOUT% -v REGIONFOLDER="SSRS_TEST" -v PARENTFOLDER="SSRS_TEST" -v RDLNAME="Report_TEST_2" -v DATASOURCE="" >>%LOGFILE% 2>&1
ECHO. >>%LOGFILE%

%rs% -i "%SCRIPTLOCATION%\SSRS_DEPLOY.rss" -s %REPORTSERVER% -l %TIMEOUT% -v REGIONFOLDER="SSRS_TEST" -v PARENTFOLDER="SSRS_TEST" -v RDLNAME="Report_TEST_3" -v DATASOURCE="" >>%LOGFILE% 2>&1
ECHO. >>%LOGFILE%

%rs% -i "%SCRIPTLOCATION%\SSRS_DEPLOY.rss" -s %REPORTSERVER% -l %TIMEOUT% -v REGIONFOLDER="SSRS_TEST" -v PARENTFOLDER="SSRS_TEST" -v RDLNAME="Report_TEST_4" -v DATASOURCE="" >>%LOGFILE% 2>&1
ECHO. >>%LOGFILE%

ECHO. >>%LOGFILE%
ECHO Finished Load at %DATE% %TIME% >>%LOGFILE%
ECHO. >>%LOGFILE%

14.3 SSRS_DEPLOY.rss


Public Sub Main()
	Dim pathToReport As String = "../workspace/SSRS_TEST/"

	Dim region As String = REGIONFOLDER 
	Dim folder As String = PARENTFOLDER
	Dim name As String = RDLNAME
	Dim data As String = DATASOURCE

	Dim parent As String = region + "/" + folder
	Dim location As String = pathToReport + name + ".rdl"

	Dim overwrite As Boolean = True
	Dim reportContents As Byte() = Nothing
	Dim warnings As Warning() = Nothing

	Dim fullpath As String = parent + "/" + name

	'Common CatalogItem properties
	Dim descprop As New [Property]
	descprop.Name = "Description"
	descprop.Value = ""
	Dim hiddenprop As New [Property]
	hiddenprop.Name = "Hidden"
	hiddenprop.Value = "False"

	Dim props(1) As [Property]
	props(0) = descprop
	props(1) = hiddenprop

	'console.writeline("region:   {0}", region)
	'console.writeline("folder:   {0}", folder)
	'console.writeline("name:     {0}", name)
	'console.writeline("parent=region/folder: {0}", parent)
	'console.writeline("location=name.rdl: {0}", location)
	'console.writeline("fullpath=parent/folder/name: {0}", fullpath)

	'Read RDL definition from disk
	Try
		Dim stream As FileStream = File.OpenRead(location)
		reportContents = New [Byte](stream.Length-1) {}
		stream.Read(reportContents, 0, CInt(stream.Length))
		stream.Close()

		warnings = RS.CreateReport(name, parent, overwrite, reportContents, props)

		If Not (warnings Is Nothing) Then
			Dim warning As Warning
			For Each warning In warnings
				Console.WriteLine(warning.Message)
			Next warning
		Else
			Console.WriteLine("Report: {0} published successfully with no warnings", name)
		End If

		'Set report DataSource references
		''Dim dataSources(0) As DataSource

		''Dim dsr0 As New DataSourceReference
		''dsr0.Reference = region + "/Data Sources/MyDataSource"
		''Dim ds0 As New DataSource
		''ds0.Item = CType(dsr0, DataSourceDefinitionOrReference)
		''ds0.Name = data 
		''dataSources(0) = ds0

		''RS.SetItemDataSources(fullpath, dataSources)

		''Console.Writeline("Report DataSources set successfully")
        ''Console.WriteLine("Report: {0} published successfully", name)

	Catch e As IOException
		Console.WriteLine(e.Message)
	Catch e As SoapException
		Console.WriteLine("Error : " + e.Detail.Item("ErrorCode").InnerText + " (" + e.Detail.Item("Message").InnerText + ")")
		Console.WriteLine("Report: {0} published with error", name)
	End Try
End Sub

14.4 SSRS_TEST.log (output log)


Reporting Services Scripter Load Log 
 
Starting Load at st 22. 05. 2013 10:40:36,81 
SCRIPTLOCATION = C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS 
REPORTSERVER   = http://server/ReportServer_TEST 
BACKUPLOCATION =  
SCRIPTLEVEL    = SQL2008 
TIMEOUT        = 60 
RS             = rs.exe 
 
Report: Report_TEST published successfully with no warnings
The command completed successfully
 
off rs.exe -i "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSRS_DEPLOY.rss" -s http://server/ReportServer_TEST -l 60 -v REGIONFOLDER="SSRS_TEST" -v PARENTFOLDER="SSRS_TEST" -v RDLNAME="Report_TEST_1" -v DATASOURCE="" 
off ECHO. 
off rs.exe -i "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSRS_DEPLOY.rss" -s http://server/ReportServer_TEST -l 60 -v REGIONFOLDER="SSRS_TEST" -v PARENTFOLDER="SSRS_TEST" -v RDLNAME="Report_TEST_2" -v DATASOURCE="" 
off ECHO. 
off rs.exe -i "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSRS_DEPLOY.rss" -s http://server/ReportServer_TEST -l 60 -v REGIONFOLDER="SSRS_TEST" -v PARENTFOLDER="SSRS_TEST" -v RDLNAME="Report_TEST_3" -v DATASOURCE="" 
off ECHO. 
off rs.exe -i "C:\Program Files (x86)\Jenkins\jobs\DWH_BI\workspace\CI_JENKINS\SSRS_DEPLOY.rss" -s http://server/ReportServer_TEST -l 60 -v REGIONFOLDER="SSRS_TEST" -v PARENTFOLDER="SSRS_TEST" -v RDLNAME="Report_TEST_4" -v DATASOURCE="" 
off ECHO. 
 
Finished Load at st 22. 05. 2013 10:40:53,28 


Recommendations:
  • Offline development: it allows developers to work independently, to debug their own code and database structures, and to commit only already functional code to the online database project repository
  • Units for development (granularity – smaller units are better):
  • DB (table, view, stored procedure, function etc.)
  • SSIS (package etc.)
  • SSAS (cube, dimension etc.)
  • SSRS (data set, report etc.)
  • MSBuild.exe: builds database projects only
  • devenv.exe: builds database projects (after installing SSDT) as well as business intelligence projects (SSIS, SSAS, SSRS)
Source code:
Additional references:

A simple hack of a login password

Posted: March 30, 2013 in MSSQL
Tags:
Task: to show how simple it is to hack a login password in SQL Server, and to compare the elapsed/estimated decryption times for different password lengths in different SQL Server versions




WARNING: by hacking any part of SQL Server, e.g. a login password, you will lose the warranty from Microsoft. This blog post has an educational purpose only. For testing purposes, you should try it on a development machine only. I do not take any responsibility for any damage caused by this article.

Recommendations:
  • run cmd.exe as an administrator
  • ERROR: cuModuleLoad() 209 – NVIDIA users require ForceWare 310.32 or later (driver update); AMD users require exactly Catalyst 13.1
  • set convenient paths to the hashcat utility (C:\Temp\Hashcat) and to the text files (PASSWORDS and HASHES)

Prerequisites:

1. SSMS > T-SQL > get hashes of login passwords


SELECT [NAME], [PASSWORD_HASH]
FROM [SYS].[SQL_LOGINS]

or

SELECT [NAME], 
LOGINPROPERTY([SYS].[SYSLOGINS].[NAME],'PasswordHash') AS PasswordHash
FROM [SYS].[SYSLOGINS]
WHERE LOGINPROPERTY([SYS].[SYSLOGINS].[NAME],'PasswordHash') IS NOT NULL
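
The hex string that goes into the HASHES files can be generated directly in T-SQL, and a recovered password can be double-checked against the stored hash with the built-in PWDCOMPARE function. A minimal sketch (the login name 'test_login' and the candidate password are placeholders):

SELECT [NAME],
CONVERT(VARCHAR(MAX), [PASSWORD_HASH], 1) AS [PASSWORD_HASH_HEX] -- '0x0100...'/'0x0200...' string for the HASHES.txt files
FROM [SYS].[SQL_LOGINS]

-- verify a cracked candidate password against the stored hash
SELECT [NAME]
FROM [SYS].[SQL_LOGINS]
WHERE [NAME] = 'test_login'
AND PWDCOMPARE('_1Tc5&dI', [PASSWORD_HASH]) = 1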

2. Windows > Explorer > download hashcat utility and create auxiliary files

  • create C:\Temp\Hashcat\MSSQL05-08R2_PASSWORDS_CPU.txt
  • create C:\Temp\Hashcat\MSSQL05-08R2_PASSWORDS_GPU.txt
  • create C:\Temp\Hashcat\MSSQL05-08R2_HASHES.txt
  • create C:\Temp\Hashcat\MSSQL12_PASSWORDS_CPU.txt
  • create C:\Temp\Hashcat\MSSQL12_PASSWORDS_GPU.txt
  • create C:\Temp\Hashcat\MSSQL12_HASHES.txt
  • download the hashcat utility and unzip it to C:\Temp\Hashcat\

Tests:

SW & HW > Laptop > Lenovo Thinkpad E530

  • Windows 7
  • Processor: Intel Core i5 3210 Ivy Bridge
  • RAM: 12GB
  • Graphics: NVIDIA GeForce GT630M 2GB Optimus + Intel HD Graphics 4000

1. Windows > Run > run a new CMD window – MSSQL05-08R2_CMD_CPU.bat


cd C:\Temp\Hashcat\hashcat-0.44
echo Started: %time%
hashcat-cli64.exe -a 3 --pw-min=5 --pw-max=5 -m 131 -p : -o "C:\Temp\Hashcat\MSSQL05-08R2_PASSWORDS_CPU.txt" --output-format=0 -n 4 "C:\Temp\Hashcat\MSSQL05-08R2_HASHES.txt" -1 ?l?u?d?s ?1?1?1?1?1
echo Stopped: %time%
pause


2. Windows > Run > run a new CMD window – MSSQL05-08R2_CMD_GPU.bat


cd C:\Temp\Hashcat\oclHashcat-lite-0.15
cudaHashcat-lite64.exe -m 132 -p : -o "C:\Temp\Hashcat\MSSQL05-08R2_PASSWORDS_GPU.txt" --outfile-format=3 --gpu-temp-abort=100 --pw-min=5 --pw-max=5 -1 ?l?u?d?s "C:\Temp\Hashcat\MSSQL05-08R2_HASHES.txt" ?1?1?1?1?1
pause


3. Windows > Run > run a new CMD window – MSSQL12_CMD_CPU.bat


cd C:\Temp\Hashcat\hashcat-0.44
echo Started: %time%
hashcat-cli64.exe -a 3 --pw-min=5 --pw-max=5 -m 1731 -p : -o "C:\Temp\Hashcat\MSSQL12_PASSWORDS_CPU.txt" --output-format=0 -n 4 "C:\Temp\Hashcat\MSSQL12_HASHES.txt" -1 ?l?u?d?s ?1?1?1?1?1
echo Stopped: %time%
pause


4. Windows > Run > run a new CMD window – MSSQL12_CMD_GPU.bat


cd C:\Temp\Hashcat\oclHashcat-lite-0.15
cudaHashcat-lite64.exe -m 1732 -p : -o "C:\Temp\Hashcat\MSSQL12_PASSWORDS_GPU.txt" --outfile-format=3 --gpu-temp-abort=100 --pw-min=5 --pw-max=5 -1 ?l?u?d?s "C:\Temp\Hashcat\MSSQL12_HASHES.txt" ?1?1?1?1?1
pause

NOTE: not supported at the moment

Abbreviations:
  • -a 3 – the attack mode; 3 means brute force
  • --pw-min=5 --pw-max=5 – the password is at least 5 and at most 5 characters long
  • -m 131 – a SQL Server 2005-2008 R2 hash (CPU)
  • -m 132 – a SQL Server 2005-2008 R2 hash (GPU)
  • -m 1731 – a SQL Server 2012 hash (CPU)
  • -m 1732 – a SQL Server 2012 hash (GPU; note: not supported at the moment)
  • -p : -o "C:\Temp\Hashcat\MSSQL05-08R2_PASSWORDS_CPU.txt" – the separator character and the output file name and location (CPU)
  • -p : -o "C:\Temp\Hashcat\MSSQL05-08R2_PASSWORDS_GPU.txt" – the separator character and the output file name and location (GPU)
  • -p : -o "C:\Temp\Hashcat\MSSQL12_PASSWORDS_CPU.txt" – the separator character and the output file name and location (CPU)
  • -p : -o "C:\Temp\Hashcat\MSSQL12_PASSWORDS_GPU.txt" – the separator character and the output file name and location (GPU)
  • --output-format=0 – the format of the output file (CPU)
  • --outfile-format=3 – the format of the output file (GPU)
  • -n 4 – the number of threads to use
  • "C:\Temp\Hashcat\MSSQL05-08R2_HASHES.txt" – the name and location of the hash file for SQL Server 2005-2008 R2
  • "C:\Temp\Hashcat\MSSQL12_HASHES.txt" – the name and location of the hash file for SQL Server 2012
  • -1 ?l?u?d?s – custom charset 1: the character types to try in the brute force; l = lower-case letters, u = upper-case letters, d = digits and s = special characters (!@#, etc.); use ?a for all types
  • ?1?1?1?1?1 – the mask: one ?1 position per password character, up to pw-max
  • --gpu-temp-abort=100 – abort automatically when the GPU reaches 100 degrees Celsius

Notes:
  • oclHashcat-lite – cracks a single hash only
  • oclHashcat-plus – cracks multiple hashes

Results:

Figure 1: CMD result – MSSQL05-08R2 (4-character password) – CPU

Figure 2: CMD result – MSSQL05-08R2 (5-character password) – CPU

Figure 3: CMD result – MSSQL05-08R2 (6-character password) – CPU

Figure 4: CMD result – MSSQL05-08R2 (8-character password) – CPU

Figure 5: CMD result – MSSQL05-08R2 (4-character password) – GPU

Figure 6: CMD result – MSSQL05-08R2 (5-character password) – GPU

Figure 7: CMD result – MSSQL05-08R2 (6-character password) – GPU

Figure 8: CMD result – MSSQL05-08R2 (8-character password) – GPU

Figure 9: CMD result – MSSQL12 (4-character password) – CPU

Figure 10: CMD result – MSSQL12 (5-character password) – CPU

Figure 11: CMD result – MSSQL12 (6-character password) – CPU

Figure 12: CMD result – MSSQL12 (8-character password) – CPU

Table results (decryption time; the tested password is shown per column):

SQL Server version      CPU                                                   GPU
password [chars]        _1Tc [4]   _1Tc& [5]   _1Tc5& [6]   _1Tc5&dI [8]      _1Tc [4]   _1Tc& [5]   _1Tc5& [6]   _1Tc5&dI [8]
2005-08R2               <1s        7m          12h 30m      >113h             2s         7s          2h 19m       2y 143d
2012                    <3s        >31m        >2h          >63h              (GPU mode -m 1732 not supported at the moment)

Notes:
  • light green (in the figures above) – elapsed time
  • yellow (in the figures above) – estimated time

Conclusion: hashcat – the advanced password recovery utility – is a very useful tool, and not only for decrypting SQL Server login passwords. The differences in decryption time between SQL Server versions are caused by the use of different hash algorithms: versions 2005-08R2 use only a SHA-1 hash, whereas version 2012 already uses SHA-2 (SHA-512 concretely). To further increase computing power and decrease decryption time, it would be possible to connect more computers into a grid, etc.
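
The algorithm difference can also be reproduced directly in T-SQL. The following sketch assumes the commonly documented layout of a 2012 password hash (2-byte version header 0x0200, 4-byte salt, then SHA-512 over the UTF-16LE password concatenated with the salt); the login name and the candidate password are placeholders:

DECLARE @password NVARCHAR(128)  = N'_1Tc5&dI'
DECLARE @stored   VARBINARY(MAX) =
(SELECT [PASSWORD_HASH] FROM [SYS].[SQL_LOGINS] WHERE [NAME] = 'test_login')
DECLARE @salt     VARBINARY(4)   = SUBSTRING(@stored, 3, 4)        -- bytes 3-6: the salt

SELECT CASE
WHEN HASHBYTES('SHA2_512', CAST(@password AS VARBINARY(MAX)) + @salt)
     = SUBSTRING(@stored, 7, 64)                                   -- bytes 7-70: the SHA-512 digest
THEN 'candidate password matches the stored hash'
ELSE 'candidate password does not match'
END AS [Result]

For 2005-08R2 hashes the layout is analogous (0x0100 header, 4-byte salt, 20-byte SHA-1 digest), which is exactly why those hashes are so much faster to brute-force.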


Source Code:
Additional references:

DWH basics – partitioning

Posted: November 3, 2012 in MSSQL
Tags:
Task: to show how to horizontally partition a database table and how to apply different data compression to its partitions


SQL Server supports only one type of horizontal partitioning: range partitioning. It is a partitioning strategy in which data is partitioned based on the range into which the value of a particular column falls.

Figure 1: UML – Use Case

The process of horizontally partitioning a database table and applying data compression consists of the following steps:
1. Create a new test database with the PRIMARY filegroup and two additional filegroups (NOTE: the folders C:\Temp\SQL Server\Partitioning\Primary and C:\Temp\SQL Server\Partitioning\Secondary must already exist!)


IF EXISTS(
SELECT name
FROM sys.databases
WHERE name = N'Test_Partitioning')
DROP DATABASE Test_Partitioning
GO
CREATE DATABASE Test_Partitioning
ON PRIMARY
(NAME = 'TestPartitioning_Part2010',
FILENAME =
'C:\Temp\SQL Server\Partitioning\Primary\TestPartitioning_Part2010.mdf',
SIZE = 5, -- ISSUE: The CREATE DATABASE statement failed. The primary file
-- must be at least 5 MB to accommodate a copy of the model database.
MAXSIZE = 100,
FILEGROWTH = 1),
FILEGROUP TestPartitioning_Part2011
(NAME = 'TestPartitioning_Part2011',
FILENAME =
'C:\Temp\SQL Server\Partitioning\Secondary\TestPartitioning_Part2011.ndf',
SIZE = 2,
MAXSIZE = 100,
FILEGROWTH = 1),
FILEGROUP TestPartitioning_Part2012
(NAME = 'TestPartitioning_Part2012',
FILENAME =
'C:\Temp\SQL Server\Partitioning\Secondary\TestPartitioning_Part2012.ndf',
SIZE = 2,
MAXSIZE = 100,
FILEGROWTH = 1)
GO

2. Create the partition range function


USE Test_Partitioning
GO
CREATE PARTITION FUNCTION TestPartitioning_PartitionFunction(DATE)
AS RANGE LEFT FOR
VALUES('2010-12-31', '2012-01-01')
-- NOTE: RANGE LEFT - each boundary value belongs to the partition on its left:
--       P1 = (- infinity, 2010-12-31],
--       P2 = [2011-01-01, 2012-01-01] and
--       P3 = [2012-01-02, infinity +)
GO
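
To double-check which partition a given date maps to under these RANGE LEFT boundaries, the built-in $PARTITION function can be queried (a small sanity check; the dates are illustrative):

USE Test_Partitioning
GO
SELECT
$PARTITION.TestPartitioning_PartitionFunction('2010-12-31') AS P_2010_12_31, -- 1
$PARTITION.TestPartitioning_PartitionFunction('2012-01-01') AS P_2012_01_01, -- 2 (the boundary value belongs to the left partition)
$PARTITION.TestPartitioning_PartitionFunction('2012-01-02') AS P_2012_01_02  -- 3
GO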

3. Create the partition scheme and attach the partition scheme to filegroups


USE Test_Partitioning
GO
CREATE PARTITION SCHEME TestPartitioning_PartitionScheme
AS PARTITION TestPartitioning_PartitionFunction
TO([PRIMARY], TestPartitioning_Part2011, TestPartitioning_Part2012)
GO

4. Create a table with the partition key and the partition scheme


USE Test_Partitioning
GO
CREATE TABLE dbo.Test_PartitionedTable
(ID INT NOT NULL,
[Date] DATE)
ON TestPartitioning_PartitionScheme([Date])
GO

5. Create the index on the partitioned table (optional and recommended)


USE Test_Partitioning
GO
CREATE UNIQUE CLUSTERED INDEX IX_TestPartition_Table
ON dbo.Test_PartitionedTable([Date])
ON TestPartitioning_PartitionScheme([Date])
GO

6. Insert data into the partitioned table


USE Test_Partitioning
GO
INSERT INTO dbo.Test_PartitionedTable(ID, [Date])
VALUES
(1, DATEADD(YEAR, -3, GETDATE())), -- 2009 value -> partition 1 (2010 and earlier)
(2, DATEADD(YEAR, -2, GETDATE())), -- 2010 value -> partition 1 (2010 and earlier)
(3, DATEADD(YEAR, -1, GETDATE())), -- 2011 value -> partition 2
(4,                   GETDATE())   -- 2012 value -> partition 3
GO

7. Test data from the dbo.Test_PartitionedTable


USE Test_Partitioning
GO
SELECT *
FROM dbo.Test_PartitionedTable
/*
SSMS RESULTS:
ID  Date
1   2009-10-19
2   2010-10-19
3   2011-10-19
4   2012-10-19
*/
GO

8. Verify rows inserted in partitions


USE Test_Partitioning
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   2	 0	                   0	              NONE
72057594039304192   245575913	1	   2	              72057594039304192	  1	 0	                   0	              NONE
72057594039369728   245575913	1	   3	              72057594039369728	  1	 0	                   0	              NONE
*/
GO
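
An alternative check groups the rows by the partition number returned by the $PARTITION function (a small sketch):

USE Test_Partitioning
GO
SELECT $PARTITION.TestPartitioning_PartitionFunction([Date]) AS partition_number,
COUNT(*) AS [rows]
FROM dbo.Test_PartitionedTable
GROUP BY $PARTITION.TestPartitioning_PartitionFunction([Date])
ORDER BY partition_number
GO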

9. SPLIT – add the new partition P4 for the next year 2013


USE Test_Partitioning
GO
ALTER DATABASE Test_Partitioning
ADD FILEGROUP TestPartitioning_Part2013
GO
ALTER DATABASE Test_Partitioning
ADD FILE
(NAME = 'TestPartitioning_Part2013',
FILENAME =
'C:\Temp\SQL Server\Partitioning\Secondary\TestPartitioning_Part2013.ndf',
SIZE = 2,
MAXSIZE = 100,
FILEGROWTH = 1)
TO FILEGROUP TestPartitioning_Part2013
GO
ALTER PARTITION SCHEME TestPartitioning_PartitionScheme
NEXT USED TestPartitioning_Part2013
GO
ALTER PARTITION FUNCTION TestPartitioning_PartitionFunction()
SPLIT RANGE('2013-01-01')
-- NOTE: interval after the SPLIT (RANGE LEFT):
--       P1 = (- infinity, 2010-12-31],
--       P2 = [2011-01-01, 2012-01-01],
--       P3 = [2012-01-02, 2013-01-01] and
--       P4 = [2013-01-02, infinity +)
GO
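
The boundary values of the partition function can be verified after the SPLIT via the catalog views (a small sketch):

USE Test_Partitioning
GO
SELECT pf.name AS partition_function, prv.boundary_id, prv.value AS boundary_value
FROM sys.partition_functions pf
JOIN sys.partition_range_values prv ON prv.function_id = pf.function_id
WHERE pf.name = 'TestPartitioning_PartitionFunction'
ORDER BY prv.boundary_id
GO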

10. SWITCH IN – move data from the staging table to the empty newly added partition P4


USE Test_Partitioning
GO

--INSERT INTO dbo.Test_PartitionedTable(ID, [Date])
--VALUES
--(5, DATEADD(YEAR,  1, GETDATE())), -- inserted in the partition 2013
--(6, DATEADD(YEAR,  2, GETDATE()))  -- inserted in the partition 2014
--GO

IF EXISTS(
SELECT name
FROM sys.databases
WHERE name = N'Test_Partitioning_Staging')
DROP DATABASE Test_Partitioning_Staging
GO
CREATE DATABASE Test_Partitioning_Staging
ON PRIMARY
(NAME = 'Test_Partitioning_Staging',
FILENAME =
'C:\Temp\SQL Server\Partitioning\Primary\Test_Partitioning_Staging.mdf',
SIZE = 5, -- ISSUE: The CREATE DATABASE statement failed. The primary file
-- must be at least 5 MB to accommodate a copy of the model database.
MAXSIZE = 100,
FILEGROWTH = 1)
GO


USE Test_Partitioning_Staging
GO

CREATE TABLE dbo.Test_NonPartitionedStagingTable
(ID INT NOT NULL,
[Date] DATE)
GO
CREATE UNIQUE CLUSTERED INDEX IX_Test_NonPartitionedStagingTable
ON dbo.Test_NonPartitionedStagingTable([Date])
GO
INSERT INTO dbo.Test_NonPartitionedStagingTable(ID, [Date])
VALUES
(5, DATEADD(YEAR,  1, GETDATE())), -- 2013 value (will be switched into partition 4)
(6, DATEADD(YEAR,  2, GETDATE()))  -- 2014 value (will be switched into partition 4)
/*
SSMS RESULTS:
ID  Date
5   2013-10-31
6   2014-10-31
*/
GO


USE Test_Partitioning
GO

ALTER DATABASE Test_Partitioning
MODIFY FILEGROUP TestPartitioning_Part2012 DEFAULT
GO

CREATE TABLE dbo.Test_NonPartitionedStagingTemp
(ID INT NOT NULL,
[Date] DATE)
GO
CREATE UNIQUE CLUSTERED INDEX IX_Test_NonPartitionedStagingTemp
ON dbo.Test_NonPartitionedStagingTemp([Date])
GO
ALTER TABLE dbo.Test_NonPartitionedStagingTemp
ADD CONSTRAINT PartitionFunction CHECK ([Date] > '2013-01-01'
AND [Date] IS NOT NULL)
GO
INSERT INTO dbo.Test_NonPartitionedStagingTemp(ID, [Date])
SELECT ID, [Date]
FROM Test_Partitioning_Staging.dbo.Test_NonPartitionedStagingTable
GO
SELECT * FROM dbo.Test_NonPartitionedStagingTemp
GO
/*
SSMS RESULTS:
ID  Date
5   2013-10-31
6   2014-10-31
*/

ALTER TABLE dbo.Test_NonPartitionedStagingTemp
SWITCH TO dbo.Test_PartitionedTable PARTITION 4
GO
DROP TABLE dbo.Test_NonPartitionedStagingTemp
GO

ALTER DATABASE Test_Partitioning
MODIFY FILEGROUP [PRIMARY] DEFAULT
GO

SELECT *
FROM dbo.Test_PartitionedTable
/*
SSMS RESULTS:
ID  Date
1   2009-10-19
2   2010-10-19
3   2011-10-19
4   2012-10-19
5   2013-10-19
6   2014-10-19
*/
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   2	 0	                   0	              NONE
72057594039304192   245575913	1	   2	              72057594039304192	  1	 0	                   0	              NONE
72057594039435264   245575913	1	   3	              72057594039435264	  1	 0	                   0                  NONE
72057594039369728   245575913	1	   4	              72057594039369728	  2	 0	                   0	              NONE
*/
GO

11. SWITCH OUT – move data from the first partition P1 to the archive table dbo.Test_NonPartitionedArchiveTable


USE Test_Partitioning
GO

CREATE TABLE dbo.Test_NonPartitionedArchiveTable
(ID INT NOT NULL,
[Date] DATE)
ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX IX_NonPartitioned_Archive
ON dbo.Test_NonPartitionedArchiveTable([Date])
GO

ALTER TABLE dbo.Test_PartitionedTable SWITCH PARTITION 1
TO dbo.Test_NonPartitionedArchiveTable
GO

SELECT *
FROM dbo.Test_NonPartitionedArchiveTable
/*
SSMS RESULTS:
ID  Date
1   2009-10-19
2   2010-10-19
*/
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   0	 0	                   0	              NONE
72057594039304192   245575913	1	   2	              72057594039304192	  1	 0	                   0	              NONE
72057594039435264   245575913	1	   3	              72057594039435264	  1	 0	                   0                  NONE
72057594039369728   245575913	1	   4	              72057594039369728	  2	 0	                   0	              NONE
*/
GO

12. MERGE – remove the boundary ‘2012-01-01’, i.e. merge the 2011 and 2012 partitions into one


USE Test_Partitioning
GO

ALTER PARTITION FUNCTION TestPartitioning_PartitionFunction()
MERGE RANGE('2012-01-01')
-- NOTE: interval after the MERGE:
--       P1 = (- infinity, 2010-12-31],
--       P2 = [2011-01-01, 2013-01-01] and
--       P3 = [2013-01-02, infinity +)
GO
SELECT *
FROM dbo.Test_PartitionedTable
/*
SSMS RESULTS:
ID  Date
3   2011-10-19
4   2012-10-19
5   2013-10-19
6   2014-10-19
*/
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   0	 0	                   0	              NONE
72057594039304192   245575913	1	   2	              72057594039304192	  2	 0	                   0	              NONE
72057594039369728   245575913	1	   3	              72057594039369728	  2	 0	                   0	              NONE
*/
GO

INSERT INTO dbo.Test_PartitionedTable(ID, [Date])
VALUES
(1, DATEADD(YEAR, -3, GETDATE())), -- 2009 value -> partition 1 (2010 and earlier)
(2, DATEADD(YEAR, -2, GETDATE()))  -- 2010 value -> partition 1 (2010 and earlier)
GO
SELECT *
FROM dbo.Test_PartitionedTable
/*
SSMS RESULTS:
ID  Date
1   2009-10-19
2   2010-10-19
3   2011-10-19
4   2012-10-19
5   2013-10-19
6   2014-10-19
*/
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   2	 0	                   0	              NONE
72057594039304192   245575913	1	   2	              72057594039304192	  2	 0	                   0	              NONE
72057594039369728   245575913	1	   3	              72057594039369728	  2	 0	                   0	              NONE
*/
GO

13. Apply compression to your partitioned table


USE Test_Partitioning
GO
ALTER TABLE Test_PartitionedTable
REBUILD PARTITION = All
WITH 
(
DATA_COMPRESSION = PAGE ON Partitions(1), -- NOTE: a range such as Partitions(1 TO 3) is also possible
DATA_COMPRESSION = ROW  ON Partitions(2) ,
DATA_COMPRESSION = NONE ON Partitions(3)
);
GO
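
Before rebuilding, the expected space savings of a given compression type can be estimated per partition with the built-in procedure sp_estimate_data_compression_savings (a small sketch; NULL means all indexes/partitions):

USE Test_Partitioning
GO
EXEC sp_estimate_data_compression_savings
@schema_name      = 'dbo',
@object_name      = 'Test_PartitionedTable',
@index_id         = NULL,
@partition_number = NULL,
@data_compression = 'PAGE'
GO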

14. Apply compression to your partitioned index


USE Test_Partitioning
GO
ALTER INDEX IX_TestPartition_Table
ON dbo.Test_PartitionedTable
REBUILD PARTITION  = All
WITH 
(
DATA_COMPRESSION = PAGE ON Partitions(1),
DATA_COMPRESSION = ROW  ON Partitions(2),
DATA_COMPRESSION = NONE ON Partitions(3)
);
GO

15. Apply compression to your unpartitioned index


--USE Test_Partitioning
--GO
--ALTER INDEX YourUnpartitionedIndex
--ON dbo.Test_PartitionedTable
--REBUILD WITH (DATA_COMPRESSION = ROW);
--GO

16. Verify that the three types of data compression were applied


USE Test_Partitioning
GO
SELECT *
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID) = 'Test_PartitionedTable'
ORDER BY partition_number
/*
SSMS RESULTS:
partition_id        object_id   index_id   partition_number   hobt_id             rows   filestream_filegroup_id   data_compression   data_compression_desc 
72057594039238656   245575913	1	   1	              72057594039238656   2	 0	                   2	              PAGE
72057594039304192   245575913	1	   2	              72057594039304192	  2	 0	                   1	              ROW
72057594039369728   245575913	1	   3	              72057594039369728	  2	 0	                   0	              NONE
*/
GO


Recommendations:
  • filegroups can be set up as read-only, which can improve performance (see the sketch after this list)
  • relational and cube partitions should be based on the same column
  • SQL Server 7.0 introduced partitioning through partitioned views
  • partitioning starts to pay off when the number of rows is higher than roughly 1M
  • roughly 25GB per partition is well manageable (100GB would be poorly manageable)
  • operations: SPLIT, MERGE and SWITCH are metadata-only operations and do not move any data as long as the affected partitions are empty (SWITCH is always metadata-only)
  • usage: to migrate older data from more expensive disks to less expensive disks
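
A minimal sketch of the read-only filegroup recommendation, reusing the filegroups created above (marking a filegroup read-only requires exclusive access to the database, so run it only once no more data will be written to that partition):

USE master
GO
ALTER DATABASE Test_Partitioning
MODIFY FILEGROUP TestPartitioning_Part2011 READ_ONLY
GO
-- revert when the filegroup has to accept changes again
ALTER DATABASE Test_Partitioning
MODIFY FILEGROUP TestPartitioning_Part2011 READ_WRITE
GO
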
Quotes from references:
  • “Though it is possible to create all partitions on PRIMARY but it would be best if these different partitions are stored in a separate file groups. This gives some performance improvement even in the case of single core computers. It would be best if these file groups are on different discs on a multi core processing machine.”
  • “A partitioned table may have a partitioned index. Partition aligned index views may also be created for this table.
    There may be question in your mind if it is possible to partition your table using multiple columns. The answer may be YES or NO. Why? No, because there is no such direct support for this in SQL Server. Yes, because you can still do that by using persisted computed column based on any number of columns you want.”
  • “For partitioning your existing table just drop the clustered index on your table and recreate it on the required partition scheme.”
  • “Though SQL Server does not directly support List Partitioning, you can create list partitions by tricking the partition function to specify the values with the LEFT clause. After that, put a CHECK constraint on the table, so that no other values are allowed to be inserted in the table specifying Partition Key column any value other than the ‘list’ of values.”
  • “It must be remembered that indexes can be partitioned using a different partition key than the table. The index can also have different numbers of partitions than the table. We cannot, however, partition the clustered index differently from the table. To partition an index, ON clause is used, specifying the partition scheme along with the column when creating the index:
    If your table and index use the same Partition function then they are called Aligned. If they go further and also use the same partition scheme as well, then they are called Storage Aligned (note this in the figure below). If you use the same partition function for partitioning index as used by the table, then generally performance is improved.”
Additional references: