Tuesday 21 January 2014

Enterprise Information Management (EIM) webinar


Doing a webinar on Enterprise Information Management (EIM) next week, the 30th of January. The EIM toolset covers Master Data Services and Data Quality Services, both of which are really important to a successful BI initiative.

This topic is a bit funny in one way. Most IT personnel aren’t excited by the subject and often don’t really get it. On the other hand, business users are really excited by the tools and the opportunity they offer to add value. For me, even though I’m a technologist, I’ve always believed in adding value to the business. So I’m excited by this area, which offers real value to a business and is often a prerequisite to building the data marts and warehouses that are the area I’m most interested in.

We’re planning to do a webinar most months on an interesting BI topic so watch this space…


Webinar link: http://www.storm.ie/insights/enterprise-information-management-webinar

Friday 17 January 2014

Speaking events in March 2014

I have a number of speaking events coming up in March. Starting on the first, I'm in Budapest for SQL Saturday 278, and on the twenty-second I’m in Exeter for SQL Saturday 269. For both of these, I’ll be covering the use of dynamic SQL for searches. I’ve done this presentation a few times and I’m looking forward to doing it again; it always seems to hit the mark. It’s useful, relevant and a good fit for the SQL Saturday attendees.

On the twenty-seventh I'm in Belfast presenting on delivering a successful BI project. This is a non-technical event focusing on the challenges that implementing a BI solution raises and how to address them. This will be the first time I’ve done this presentation, so I’m looking forward to seeing how it goes and what the reception is. I'm excited, as I feel this topic really adds value and can help people in the real world.

Links to the sites:

Monday 9 December 2013

SQL 2014 - early view

It seems like only yesterday that SQL 2012 was released but 2014 is around the corner already. It’s already at CTP2 (Community Technology Preview 2) and is looking interesting.

As Microsoft moves to a quicker release cycle it will become more and more difficult to stay up to date. Indeed, many of the big-hit features of 2014 are not relevant to many organisations, so a lot of customers will stay on SQL 2012 or wait for the next release. However, these features might be exactly what you need right now, so let’s have a look.

The big hitter in this release is Hekaton and its related features. Hekaton is a Greek word meaning hundred, and that describes what Hekaton offers: performance improvements of up to one hundred times for transactional processing. Wow, one hundred times! The change comes from moving to an in-memory model for the relational engine. This removes the cost of moving data between disk and memory and radically reduces the cost of memory-based operations.

The Greek phalanx comes to SQL 2014 - image by Andrew Becraft

Moving to in-memory is, in many ways, catching up with industry trends. In the past, disk access was thousands of times slower than memory, servers didn’t have enough memory to keep most of their data in memory, and it was assumed that servers were not trustworthy enough to keep their data in memory only. These things have changed with the move to SSDs (solid state disks) for storage, increases in memory and improvements in server resilience. These days “memory is the new disk” (Jim Gray) and the move to in-memory is gathering speed.

Hekaton includes an in-memory, latch-free data access model, improved algorithms for data access, natively compiled stored procedures and other associated features, including delayed durability for transactions, updatable columnstore indexes and AlwaysOn improvements. There’s a lot in there and it’ll take a while to digest the full set of changes.
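To give a flavour of the syntax, here’s a minimal sketch based on the CTP2 bits (the final syntax may change before RTM, and the table and procedure names are made up for illustration):

```sql
-- Requires a database with a MEMORY_OPTIMIZED_DATA filegroup.
CREATE TABLE dbo.SessionState
(
    SessionID INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserID INT NOT NULL,
    LastTouched DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- A natively compiled stored procedure over the in-memory table.
CREATE PROCEDURE dbo.TouchSession @SessionID INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    UPDATE dbo.SessionState
    SET LastTouched = SYSUTCDATETIME()
    WHERE SessionID = @SessionID;
END
GO
```

Note the restrictions that come with native compilation – schema binding, an atomic block and a required execution context – which hint at how different this engine is under the covers.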

As expected, there is also better integration between on-premises and Azure databases. On-premises databases can now back up to the cloud more easily (building on a SQL 2012 service pack feature) and leverage a new managed service for Azure cloud backup. Built-in backup encryption is thrown in for good measure; while this is a small feature, it’s essential when backing up to the cloud, so it helps complete the cloud picture.
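A quick sketch of what backup to Azure with encryption looks like (the storage account, container, credential and certificate names below are all placeholders, not real values):

```sql
-- Credential holding the storage account name and access key (placeholder values).
CREATE CREDENTIAL AzureBackupCredential
WITH IDENTITY = 'mystorageaccount',
     SECRET = '<storage access key>';

-- Back up straight to Azure blob storage, encrypting the backup
-- with a previously created server certificate.
BACKUP DATABASE AdventureWorks2012
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/AdventureWorks2012.bak'
WITH CREDENTIAL = 'AzureBackupCredential',
     COMPRESSION,
     ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCertificate);
```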

In BI, the biggest feature is the support added to Power View for the multidimensional model. Like backup to Azure, this was already available as a service pack update. Not much on this list, but that will happen when releases are so close together.

A few extras should be added by the time SQL 2014 gets to RTM (release to manufacturing).

Friday 10 May 2013

SSIS and flexible table design

A lot of databases “feature” an interesting design pattern that I’m calling a repeating table design. I’ve seen it used for campaigns and similar business problems and data structures. When a new task is created, a new set of tables is created for that task. This can cause problems for the ETL developer, as the tables the ETL packages need to target change over time and cannot be predicted at design time. Today, we’ll look at a way to allow SSIS to target this structure and overcome the difficulties it imposes.

Where's your head? Image by MGSPiller
First, we’ll create a set of these tables. I’m using the AdventureWorks2008R2 database for demo purposes. Run the script below to get started.
SELECT
      S.*
INTO
      SalesOrderHeader_Mason_Roberts
FROM
      Sales.SalesOrderHeader S
WHERE
      S.CustomerID IN (
                  SELECT
                        C.CustomerID
                  FROM
                        Sales.Customer C
                  INNER JOIN
                        Person.Person P
                  ON C.PersonID = P.BusinessEntityID
                  WHERE
                        P.FirstName = 'Mason'
                  AND
                        P.LastName = 'Roberts'
            )    

SELECT
      S.*
INTO
      SalesOrderHeader_Dalton_Perez
FROM
      Sales.SalesOrderHeader S
WHERE
      S.CustomerID IN (
                  SELECT
                        C.CustomerID
                  FROM
                        Sales.Customer C
                  INNER JOIN
                        Person.Person P
                  ON C.PersonID = P.BusinessEntityID
                  WHERE
                        P.FirstName = 'Dalton'
                  AND
                        P.LastName = 'Perez'
            )    
           

SELECT
      S.*
INTO
      SalesOrderHeader_Jennifer_Simmons
FROM
      Sales.SalesOrderHeader S
WHERE
      S.CustomerID IN (
                  SELECT
                        C.CustomerID
                  FROM
                        Sales.Customer C
                  INNER JOIN
                        Person.Person P
                  ON C.PersonID = P.BusinessEntityID
                  WHERE
                        P.FirstName = 'Jennifer'
                  AND
                        P.LastName = 'Simmons'
            )    
           
SELECT
      S.*
INTO
      SalesOrderHeader_Jason_Griffin
FROM
      Sales.SalesOrderHeader S
WHERE
      S.CustomerID IN (
                  SELECT
                        C.CustomerID
                  FROM
                        Sales.Customer C
                  INNER JOIN
                        Person.Person P
                  ON C.PersonID = P.BusinessEntityID
                  WHERE
                        P.FirstName = 'Jason'
                  AND
                        P.LastName = 'Griffin' 
            )    




This creates copies of the SalesOrderHeader table for four customers. Our issue now is, how do we consume this structure in SSIS? How do we read from these different tables using one dataflow? How do we ensure that any new tables matching this structure are automatically supported?
These questions can be answered by using catalogue views to determine the set of tables required. This set can be saved in an object that the Foreach Loop Container can traverse. A dataflow inside the container can then select the appropriate table as the input for its data source.
First, we need to get the list of tables required. We can use a query like this to get the list of tables.
SELECT
      name
FROM
      sys.tables
WHERE
      name LIKE 'SalesOrderHeader[_]%'



This list needs to be saved in an object the Foreach Loop Container can iterate through. There are a few options here, but the easiest is to populate an ADO object variable. To do this, first create a user variable, setting its data type to Object. Now create an Execute SQL Task. Set the task’s ResultSet to “Full result set” and the SQLStatement to your equivalent of the query above. On the Result Set tab, ensure your object variable is selected and that its Result Name is set to 0. This will populate your variable with the required result set, an ADO dataset that the Foreach Loop Container can use.
Now, moving on to the Foreach Loop Container. First create a new string variable. The Foreach Loop Container will populate this with the required table name as it runs through the loop. Set its default value to one of your table names. You need the default value so SSIS can generate its design-time metadata when the data source is created later in the process.
Okay, time to add the Foreach Loop Container and set it up. On the Collection tab, set the Enumerator to “Foreach ADO Enumerator”. The options available in the “Enumerator Configuration” section will change; use the ADO object source variable option to select your object variable. On the Variable Mappings tab, select the variable you’ve created for the table name. The value of this variable will change as the container works through the items in the ADO dataset.
The collection tab should look like this:

The variable mapping looks like this…

Next, add a data flow, or whatever you need, to do the actual processing. Add a data source and configure it to use the named table. As usual, there are a few options here; I’ve set the data access mode to the “Table name or view name variable” option and selected my variable. SSIS will use the table name you specified when the variable was created to define the query’s metadata. Select the columns that you need and they are available as in any standard dataflow. The data source will look something like this…

I also add a derived column transform to the dataflow to capture the name of the table the data is coming from. 
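Assuming the table-name variable is called User::TableName (use whatever name you gave yours), the derived column is just a pass-through of that variable:

```
Derived Column Name: SourceTableName
Derived Column:      <add as new column>
Expression:          @[User::TableName]
```

This way every row carries the name of the repeating table it was read from, which is handy for auditing and debugging downstream.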
Outside these, everything else is standard SSIS. As usual, there are a few more wrinkles for anybody living in the real world, but for now we’re good to go.
Last point: this is not a good database design pattern. Just because you can do this doesn’t mean you should, so please don’t take this post as an invitation to use the repeating table pattern.

Thursday 11 October 2012

Dangers of dynamic SQL

In my previous post I outlined why dynamic SQL can be really fantastic and is something you need in your toolbox. I also mentioned a few reasons why you should avoid using dynamic SQL and the potential dangers you expose yourself to when going down this road. I’ll cover these drawbacks in more detail in this post.
We’ll be looking at:
·        It’s difficult
·        Hard to test
·        Security issues
·        Cause potential performance issues
Image by Jonathan Gill
It’s difficult
Yes, dynamic SQL is more difficult to write than traditional hardcoded SQL. Being more difficult doesn’t automatically make this a problem, but it is an issue you need to be aware of. It’s a particular problem for a lot of application developers, who need to cover the whole application stack. A developer recently mentioned to me how much he hates doing CSS. If you don’t know what CSS is, trust me, it’s pretty far away from database development. He needs to do a day or two of CSS every year and just doesn’t do it often enough to get comfortable with it. I’ve had a similar problem writing XQuery and XML, which I need to spend a few days on most years. The average application developer, who writes most of the SQL code that gets into production, needs to know an awful lot of different technologies. Most of these need to be known at a “good enough” level rather than expert level. Expecting typical application developers to know SQL well enough to use complex techniques like dynamic SQL is neither realistic nor even fair. If you decide you need dynamic SQL, allow more time to develop it, possibly look for your most database-focused application developer and maybe get help from a specialist database developer.
Hard to test
By its nature, different code is generated at run time, and it is therefore much more difficult to ensure you have complete code coverage and that every possibility has been tested and is correct. You’ll need to be methodical and thorough when testing. Testing will need to consider correctness, completeness and performance for each possible combination. Not easy.
Security issues
Security can be a big issue here. There are two areas you need to look at: SQL injection and broken ownership chains. Both of these are big topics and need attention.
Starting with SQL injection. SQL injection happens when an application user types code into a UI and this code is executed by the server. This exposes the server to running anything the end user wants. An example makes this clearer.
The user wants to search for customer “x”. This generates code like this:
SELECT CustomerName, OtherFields
FROM Customer
WHERE CustomerName = 'x'
The “bad” user changes the search string to “x'; SELECT @@VERSION --”. This generates code like:
SELECT CustomerName, OtherFields
FROM Customer
WHERE CustomerName = 'x';
SELECT @@VERSION --'
The server will run the code, executing the extra section. The end user now has more information about the application than the designers intended, and much more dangerous and revealing code can be included. The core thing to remember is to parameterise your code using the sp_executesql stored procedure; I’ll cover this in more detail when going through a sample implementation.
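As a quick sketch of the parameterised version of the search above (a minimal example, not the full implementation):

```sql
-- The user's input travels as a parameter value, not as part of the
-- query text, so it cannot change the shape of the query.
DECLARE @sql NVARCHAR(MAX) = N'
    SELECT CustomerName, OtherFields
    FROM Customer
    WHERE CustomerName = @CustomerName;';

EXEC sp_executesql
    @sql,
    N'@CustomerName NVARCHAR(100)',
    @CustomerName = N'x''; SELECT @@VERSION --';  -- treated as a plain string
```

Here the injection attempt is simply compared against CustomerName as data and returns nothing.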
Next, dynamic SQL breaks ownership chains. Ownership chains are used to control access to objects. Often, an application logs into SQL Server with a user that has no rights to directly query tables or views. This user has the right to execute the set of stored procedures required for the application role. The stored procedures are owned by another user that has rights to access the necessary tables and objects. By giving the application user execution rights on the stored procedures, they indirectly get rights to access these tables. Dynamic SQL breaks these ownership chains. You have a few options to overcome this: you can execute the stored procedure or code section using a user with elevated rights, or you can sign the stored procedure using a certificate that extends its rights. Again, this is a large area and I won’t cover it in detail here.
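The first option, running the procedure in the context of a user that does have rights on the underlying tables, looks roughly like this (procedure and table names are made up for illustration; certificate signing is the more fine-grained alternative):

```sql
-- EXECUTE AS OWNER makes the dynamic SQL run with the owner's rights,
-- sidestepping the broken ownership chain.
CREATE PROCEDURE dbo.SearchCustomers @CustomerName NVARCHAR(100)
WITH EXECUTE AS OWNER
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) = N'
        SELECT CustomerName
        FROM dbo.Customer
        WHERE CustomerName = @CustomerName;';

    EXEC sp_executesql @sql, N'@CustomerName NVARCHAR(100)', @CustomerName;
END
```

Be careful with this: you are deliberately widening the rights the code runs with, so keep the procedure tightly scoped.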
Cause potential performance issues
Wow, slow down – isn’t this supposed to improve performance? Well, there are cases where it can be bad for performance. The biggest issue is plan cache bloat. When a query is run, an execution plan is generated, used to run the query and saved in memory. It will be re-used if the same query is run again. If a lot of different queries hit the server, the plan cache will fill up memory and reduce the amount free for data pages and other uses. The problem here is that a different plan will be generated for each version of the query’s code. The entire query text is used to match against the cache, so the two queries below are seen as different queries and each gets an entry in the plan cache.
SELECT *
FROM Customer C INNER JOIN Address A ON C.AddressID = A.AddressID
WHERE C.Name = 'ABC'

SELECT *
FROM Customer C INNER JOIN Address A ON C.AddressID = A.AddressID
WHERE C.Name = '123'
There are a number of things you can do. Making the query parameterised (which also counters injection attacks) is the best option. It’s also possible to clear the plan cache periodically (a bit crude, but an option for some use cases), force parameterisation or enable optimisation for ad hoc workloads. Looks like more details are required here too!
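For reference, the last two options look like this (the database name is a placeholder):

```sql
-- "optimize for ad hoc workloads" caches single-use plans as small stubs
-- until the query is seen a second time, curbing cache bloat.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;

-- Forced parameterisation makes the engine parameterise literals itself,
-- so the two queries above would share one plan.
ALTER DATABASE MyDatabase SET PARAMETERIZATION FORCED;
```

Both are blunt, server- or database-wide instruments; explicit parameterisation in your own code remains the better fix.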





Sunday 7 October 2012

Progress?

It’s been a while since my last blog. Finding it surprisingly difficult to keep these running and to stop Word putting green lines under everything I write.
Been very busy the last few months, with a lot of travel for work, and finding it more draining than I expected. Anyway, I’m working on a BI solution for a customer, using 2008 R2, some SSIS and a fairly simple cube. “Older” technology, but there are still a few interesting things in the project and I’m gathering ammunition for an eventual blog or two.
Did a user group presentation a few weeks ago. It was kind of a first for me: the first time I presented externally, in front of a public crowd. But I’ve done a lot of internal presentations, informal training and some customer presentations, so I felt ready for it. I covered dynamic SQL, which reminds me to follow up on that blog I did a few weeks ago. I’ll follow this up with my first SQL Saturday in November; it’ll be in Pordenone, Italy on November 17th. Excited – first time in Italy and looking forward to it. I’ll be covering dynamic SQL again and have a lot of work to do to streamline the presentation. Busy, busy, but it’s all progress.

Tuesday 31 July 2012

Dynamic SQL – the case for and against

I want to talk a bit about writing dynamic SQL. Dynamic doesn’t mean exciting or interesting; it simply means that the SQL queries are created at run time, although I think it is both exciting and interesting. This is a big area and it will take a few posts to cover all the detail I want to, but we’ll get started today.
Before getting down to details, there are two main use cases for dynamic SQL. It is frequently used by DBAs to automate maintenance activities; I will not be covering this in these posts. It is also used for complex and unpredictable queries, like customer searches, where it can allow your queries to find the needle you need without sifting through the entire haystack of other stuff. I’ll look at this use case.

Use dynamic SQL to find a needle in a haystack
Image by iluvcocacola
First, why use dynamic SQL? The answer is that it can lead to incredible performance and scalability increases. It becomes necessary as data sets grow to medium and larger sizes. In some cases, the improvement in query response time can be a factor of millions.
Wow, improvements of millions of times? That raises the next question: why not use dynamic SQL all the time? There are a number of answers and I’ll look at a few here.
1.      Dynamic SQL will not improve every type of query and we’ll look at recognising the types of query to target in later posts.
2.      Dynamic SQL has a number of drawbacks, including:
·        Dynamic code is more difficult to write
·        It raises a number of security difficulties that need to be resolved.
·        It does not always fit easily into development frameworks like ORM tools.
·        Can lead to memory issues.
·        It is harder to test and to change over time.
Now that sounds like something to avoid, doesn’t it? The simple answer is yes, you should avoid using dynamic SQL, but sometimes it is necessary and beneficial. You need to recognise when it is appropriate, and when you do need it, you need to know the implications and the techniques required to avoid the issues.
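To give a flavour of the search use case before the detailed posts (a deliberately minimal sketch; the table and column names are made up, and the real implementation needs the safeguards discussed above):

```sql
-- Build only the predicates the caller actually supplied, so the optimiser
-- sees a simple, targeted query instead of one catch-all query with many
-- optional conditions.
DECLARE @CustomerName NVARCHAR(100) = N'ABC',
        @City         NVARCHAR(100) = NULL;

DECLARE @sql NVARCHAR(MAX) =
    N'SELECT CustomerID, CustomerName FROM dbo.Customer WHERE 1 = 1';

IF @CustomerName IS NOT NULL
    SET @sql += N' AND CustomerName = @CustomerName';
IF @City IS NOT NULL
    SET @sql += N' AND City = @City';

-- Parameterised execution keeps the generated code safe and cacheable.
EXEC sp_executesql @sql,
    N'@CustomerName NVARCHAR(100), @City NVARCHAR(100)',
    @CustomerName, @City;
```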
That’s all for today. I’ll cover more about the potential issues raised by dynamic SQL in the next post and follow up with details on how to implement it later.