Monday 9 December 2013

SQL 2014 - early view

It seems like only yesterday that SQL 2012 was released but 2014 in around the corner already. It’s already at CTP2 (community technology preview two) and is looking interesting.

As Microsoft has moved to a quicker release cycle it will become more and more difficult it stay up to date. Indeed, many of the big hit features of 2014 are not relevant to many organisations so there will be a lot of customers who stay on SQL 2012 or wait for the 2000-next. However, these features might be exactly what you need right now so let’s have a look.

The big hitter in this release is Hekaton and its related features. Hekaton is a Greek word meaning hundred and that describes what Hekaton offers, performance improvements of one hundred times for transactional processing. Wow, one hundred times! The changes comes from moving to an in-memory model for the relational engine. This removes the impact of moving data between disk and memory and radically reduces the impact of memory based operations. 

The Greek phalanx comes to SQL 2014 - image by Andrew Becraft

Moving to in memory is, in many ways, caching up with industry trends. In the past disk access was thousands of times slower than memory, server didn’t have enough memory to keep most of their data in disk and it was assumed that servers were not trust worthy enough to keep their data is memory only. These have changed with the move to SSDs (solid state disks) for storage, increases in memory and server resilience improvements. These days “memory is the new disk” (Jim Gray) and the move to in memory is gathering speed.

Hekaton’s includes in-memory “latch less” data access model, improved algorithms for data access, natively compiled stored procedures and other associated features including delayed durability for transactions, updatable column store indexes and AlwaysOn improvements. There’s a lot in there and it’ll take a while to digest the full set of changes.

As expected, there is also better integration between on premise and Azure databases. On premise databases can now backup to the cloud easier (building on a SQL2012 service pack feature) and leverage a new managed service for Azure cloud backup. Built-in backup encryption is thrown in for good measure, while this is a small feature it’s an essential for backing up to the cloud so helps to complete the cloud picture.

In BI, the biggest feature is support added to the PowerView for the multi-dimensional model. Like the backup to Azure, this was already available as a service pack plus update. Not much on this list, but that will happen when releases are so close together.

A few extras should be added by the time SQL 2014 gets to RTM (ready to manufacture).

Friday 10 May 2013

SSIS and fleixible table design

A lot of databases “feature” an interesting design pattern that I’m calling a repeating table design. I’ve seen this used for campaigns and similar business problems and data structures. When a new task is created a new set of tables are created for this task. This can cause problems for the ETL developer as the table the ETL packages need to target change over time and cannot be predicted at design time. Today, we’ll look a way to allow SSIS to target this structure and overcome the difficulties it imposes. Our issues are: how do we consume this structure in SSIS? How do we read from these different tables using one dataflow? How do we ensure that any new tables matching this structure are automatically supported?

Where you're head? Image by MGSPiller
First, we’ll create a set of these tables. I’m using the using the AdventureWorks2008R2 version of adventure works DB for demo purposes. Apply the script below to get started.
SELECT
      S.*
INTO
      SalesOrderHeader_Mason_Roberts
FROM
      Sales.SalesOrderHeader S
WHERE
      S.CustomerID IN (
                  SELECT
                        C.CustomerID
                  FROM
                        SALES.Customer C
                  INNER JOIN
                        Person.Person P
                  ON C.PersonID = P.BusinessEntityID
                  WHERE
                        P.FirstName = 'Mason'
                  AND
                        P.LastName = 'Roberts'
            )    

SELECT
      S.*
INTO
      SalesOrderHeader_Dalton_Perez
FROM
      Sales.SalesOrderHeader S
WHERE
      S.CustomerID IN (
                  SELECT
                        C.CustomerID
                  FROM
                        SALES.Customer C
                  INNER JOIN
                        Person.Person P
                  ON C.PersonID = P.BusinessEntityID
                  WHERE
                        P.FirstName = 'Dalton'
                  AND
                        P.LastName = 'Perez'
            )    
           

SELECT
      S.*
INTO
      SalesOrderHeader_Jennifer_Simmons
FROM
      Sales.SalesOrderHeader S
WHERE
      S.CustomerID IN (
                  SELECT
                        C.CustomerID
                  FROM
                        SALES.Customer C
                  INNER JOIN
                        Person.Person P
                  ON C.PersonID = P.BusinessEntityID
                  WHERE
                        P.FirstName = 'Jennifer'
                  AND
                        P.LastName = 'Simmons'
            )    
           
SELECT
      S.*
INTO
      SalesOrderHeader_Jason_Griffin
FROM
      Sales.SalesOrderHeader S
WHERE
      S.CustomerID IN (
                  SELECT
                        C.CustomerID
                  FROM
                        SALES.Customer C
                  INNER JOIN
                        Person.Person P
                  ON C.PersonID = P.BusinessEntityID
                  WHERE
                        P.FirstName = 'Jason'
                  AND
                        P.LastName = 'Griffin' 
            )    




This creates copies of the SalesOrderHeader table for four customers. Our issue now is, how do we consume this structure in SSIS? How do we read from these different tables using one dataflow? How do we ensure that any new tables matching this structure are automatically supported?
These questions can be answered by using catalogue views to determine the set of tables required. This set can be saved in an object that the Foreach Loop Container can traverse. A dataflow inside the container can then select the appropriate table as the input for data source.
First, we need to get the list of tables required. We can use a query like this to get the list of tables.
SELECT
      name
FROM
      sys.tables
WHERE
      name LIKE 'SalesOrderHeader[_]%'



This list needs to be saved in an object the Foreach Loop Container can iterate through. There are a few options here but the easiest way is to populate an ADO object variable. To do this, first create a user variable setting it’s data type to object. Now create an Execute SQL task. Set the task’s result set to “Full result set” and the SQLStatement to your equivalent of the query above. On the Result Set tab ensure your object variable is selected and that its Result Name is set to 0. This will populate your variable with the required result set. This result set is an ADO dataset that the Foreach Loop Container can use.
Now, moving on to the Foreach Loop Container. First create a new variable. The Foreach Loop Container will populate this with the required table name as you run through the loop. Set its default value to one of your table names. You need a the default value for SSIS to generate its deeisgn time meta-data when the data source is created later in the process.
Okay, time to add the Foreach Loop Container and set it up. On the collection tab, set the Enumerator to “Foreach ADO Enumerator”. The options available in the “Enumerator Configuration” section will change. Use the ADO object source variable option to select your variable. On the variable mapping tab select the variable you’ve created for the table name. The value of this variable will change as the container works through the item in the ADO data set.
The collection tab should look like this:

The variable mapping looks like this…

Next, add a data flow, or whatever you need, to do the actual processing. Add a data source and configure it to use the named table. As usual, there are a few options here. I’ve set the data access mode to use Table name or view name variable option and selected my variable.  SSIS will use the table name you’ve specified when the variable was created to define the query’s meta-data. Select the columns that you need and they are available as in any standard dataflow. The data source will look something like this…

I also add a derived column transform to the dataflow to capture the name of the table the data is coming from. 
Outside these, everything else will be standard SSIS. As usual, there are a few more wrinkles for anybody living in the real work but for now we’re good to go.
Last point, this is not a good database design pattern. Just because you do this doesn’t mean you should, so please don’t take this post as an invitation to use this repeating table pattern.