Thursday 11 October 2012

Dangers of dynamic SQL

In my previous post I outlined why dynamic SQL can be really fantastic and is something you need in your toolbox. I also mentioned a few reasons why you should avoid using dynamic SQL and the potential dangers you expose yourself to when going down this road. I’ll cover these drawbacks in more detail in this post.
Will be looking at:
·        It’s difficult
·        It’s hard to test
·        It raises security issues
·        It can cause performance issues
Image by Jonathan Gill
It’s difficult
Yes, dynamic SQL is more difficult to write than traditional hardcoded SQL. Being more difficult doesn’t automatically make this a problem, but it is an issue you need to be aware of. This is a particular problem for a lot of application developers who need to cover the whole application stack. A developer recently mentioned to me how much he hates doing CSS. If you don’t know what CSS is, trust me, it’s pretty far away from database development. He needs to do a day or two of CSS every year and just doesn’t do it often enough to get comfortable with it. I’ve had a similar problem writing XQuery and XML, which I need to spend a few days on most years. The average application developer, who writes most of the SQL code that gets into production, needs to know an awful lot of different technologies. Most of these need to be known at a “good enough” level rather than an expert one. Expecting typical application developers to know SQL well enough to use complex techniques like dynamic SQL is neither realistic nor even fair. If you decide you need dynamic SQL, you need to allow more time to develop it, possibly look for your most database-focused application developer and maybe get help from a specialist database developer.
Hard to test
By its nature, dynamic SQL generates different code at run time, so it is much more difficult to ensure you have complete code coverage and that every possibility has been tested and is correct. You’ll need to be methodical and thorough when testing. Testing will need to consider correctness, completeness and performance for each possible combination. Not easy.
Security issues
Security can be a big issue here. There are two areas you need to look at – SQL injection and broken ownership chains. Both of these are big topics and need attention.
Starting with SQL injection. SQL injection happens when an application user types code into a UI and this code is executed by the server. This exposes the server to running anything the end user wants. An example makes this clearer.
A user wants to search for customer “x”. This generates code like this:
SELECT CustomerName, OtherFields
FROM Customer
WHERE CustomerName = 'x'
The “bad” user changes the search string to “x';SELECT @@VERSION --”. This generates code like:
SELECT CustomerName, OtherFields
FROM Customer
WHERE CustomerName = 'x';
SELECT @@VERSION --'
The server will run the code and execute the extra statement. The end user now has more information about the application than the application designers intended, and much more dangerous and revealing code can be injected. I will cover this topic in more detail later, but the core thing to remember is to parameterise your code using the sp_executesql stored procedure. I’ll cover this in more detail when going through a sample implementation.
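As a taster, here is a minimal sketch of the parameterised approach using sp_executesql, with the table and column names carried over from the example above (the wrapper itself is illustrative):

```sql
-- Build the statement with a parameter placeholder instead of
-- concatenating the user's input into the SQL string.
DECLARE @sql  nvarchar(max),
        @name nvarchar(100) = N'x'';SELECT @@VERSION';  -- hostile input

SET @sql = N'SELECT CustomerName, OtherFields
             FROM Customer
             WHERE CustomerName = @CustomerName';

-- sp_executesql treats @CustomerName strictly as data, so the
-- injected text is searched for as a (non-existent) customer
-- name and is never executed as code.
EXEC sp_executesql
     @sql,
     N'@CustomerName nvarchar(100)',
     @CustomerName = @name;
```

As a bonus, parameterising like this also helps the plan cache problem covered further down, because every call shares the same query text.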
Next, dynamic SQL breaks ownership chains. Ownership chains are used to control access to objects. Often, an application logs into SQL Server with a user that has no rights to directly query tables or views. This user has the right to execute a set of stored procedures required for the application role. The stored procedures are owned by another user that has rights to access the necessary tables and objects. By giving the application user execution rights on the stored procedure, they indirectly get rights to access these tables. Dynamic SQL breaks these ownership chains. You have a few options to overcome this. You can execute the stored procedure or code section using a user with elevated rights. Or, you can sign the stored procedure using a certificate that extends its rights. Again, this is a large area and I won’t cover it in detail here.
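To illustrate the first option, a procedure can be marked to run under its owner’s security context, so the dynamic statement is checked against the owner’s table permissions rather than the caller’s. The procedure and object names here are invented for the sketch:

```sql
CREATE PROCEDURE dbo.CustomerByName
    @CustomerName nvarchar(100)
WITH EXECUTE AS OWNER   -- dynamic SQL now runs with the owner's rights
AS
BEGIN
    DECLARE @sql nvarchar(max) =
        N'SELECT CustomerName
          FROM dbo.Customer
          WHERE CustomerName = @CustomerName';

    -- Still parameterised, to avoid re-introducing injection.
    EXEC sp_executesql
         @sql,
         N'@CustomerName nvarchar(100)',
         @CustomerName = @CustomerName;
END;
```

Certificate signing achieves the same end with finer-grained control over exactly which rights are extended, at the cost of more setup.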
Cause potential performance issues
Wow, slow down – isn’t this supposed to improve performance? Well, there are cases where it can be bad for performance. The biggest issue is plan cache bloat. When a query is run, an execution plan is generated, used to run the query and saved in memory. It will be re-used if the same query is run again. If a lot of different queries hit the server, the cache fills up memory and reduces the amount free for data pages and other uses. The problem here is that a different plan will be generated for each version of the query’s code. The entire query text is used to match the cache, so the two queries below are seen as different queries and each gets an entry in the plan cache.
SELECT *
FROM Customer C INNER JOIN Address A ON C.AddressID = A.AddressID
WHERE C.Name = 'ABC'

SELECT *
FROM Customer C INNER JOIN Address A ON C.AddressID = A.AddressID
WHERE C.Name = '123'
There are a number of things you can do. Parameterising the query (which also counters injection attacks) is the best option. It’s also possible to clear the plan cache periodically (a bit crude, but an option for some use cases), to force parameterisation, or to optimise for ad-hoc queries. Looks like more details are required here too!
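A quick way to see whether plan cache bloat is actually happening is to count single-use ad-hoc plans; the sp_configure call below shows the “optimize for ad hoc workloads” option mentioned above (available from 2008 onwards). A sketch:

```sql
-- A large count and size here is the classic symptom of
-- plan cache bloat from unparameterised dynamic SQL.
SELECT COUNT(*)                          AS single_use_plans,
       SUM(size_in_bytes) / 1024 / 1024 AS cache_mb
FROM sys.dm_exec_cached_plans
WHERE objtype = 'Adhoc'
  AND usecounts = 1;

-- Mitigation: cache only a small stub until a plan is reused.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;
```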





Sunday 7 October 2012

Progress?

It’s been a while since my last blog. Finding it surprisingly difficult to keep these running and to stop Word putting green lines under everything I write.
Been very busy the last few months, a lot of travel for work and finding it more draining than I expected. Anyway, working on a BI solution for a customer. Using 2008 R2, some SSIS and a fairly simple cube. “Older” technology, but still a few interesting things in the project and getting ammunition for an eventual blog or two.
Did a user group presentation a few weeks ago. It was kind of a first for me. First time I presented externally and in front of a public crowd. But, I’ve done a lot of internal presentations and informal training and some customer presentations, so felt ready for it. Covered dynamic SQL, so that reminds me to follow up on that blog I did a few weeks ago. Will follow this up with my first SQL Saturday in November. It’ll be in Pordenone, Italy on November 17th. Excited, first time in Italy and looking forward to it. Will be covering dynamic SQL again and have a lot of work to do to streamline the presentation. Busy, busy, but it’s all progress.

Tuesday 31 July 2012

Dynamic SQL – the case for and against

I want to talk a bit about writing dynamic SQL. Dynamic doesn’t mean exciting or interesting – it simply means that the SQL queries are created at run time – although, I think, it is both exciting and interesting. This is a big area and it will take a few posts to cover all the detail I want to, but we’ll get started today.
Before getting down to details, there are two main use cases for dynamic SQL. It is frequently used by DBAs to automate maintenance activities; I will not be covering this in these blogs. It is also used for complex and unpredictable queries, like customer searches, where it can allow your queries to find that needle you need without sifting through the entire haystack of other stuff. I’ll look at this use case.

Use dynamic SQL to find a needle in a haystack
Image by iluvcocacola
First, why use dynamic SQL? The answer is that it can lead to incredible performance and scalability increases. It becomes necessary as data sets grow to medium and larger sizes. In some cases, the improvement in query response time can be a factor of millions.
Wow, improvements of millions of times? That raises the next question: why not use dynamic SQL all the time? There are a number of answers and I’ll look at a few here.
1.      Dynamic SQL will not improve every type of query and we’ll look at recognising the types of query to target in later posts.
2.      Dynamic SQL has a number of drawbacks, including:
·        Dynamic code is more difficult to write.
·        It raises a number of security difficulties that need to be resolved.
·        It does not always fit easily into development frameworks like ORM tools.
·        It can lead to memory issues.
·        It is harder to test and to change over time.
Now that sounds like something to avoid, doesn’t it? The simple answer is yes, you should avoid using dynamic SQL, but sometimes it is necessary and beneficial. You need to recognise when it is appropriate. When you do need it, you need to know the implications and the techniques required to avoid the issues.
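To give a flavour of the customer-search use case before the detailed posts, here is a rough sketch of the shape this code usually takes – only the criteria the caller actually supplies end up in the WHERE clause. All object names here are invented for the example:

```sql
CREATE PROCEDURE dbo.SearchCustomer
    @Name nvarchar(100) = NULL,
    @City nvarchar(100) = NULL
AS
BEGIN
    DECLARE @sql nvarchar(max) =
        N'SELECT CustomerID, Name, City
          FROM dbo.Customer
          WHERE 1 = 1';

    -- Each clause is added only when its search value was supplied,
    -- so every combination of criteria gets a plan suited to it.
    IF @Name IS NOT NULL SET @sql += N' AND Name = @Name';
    IF @City IS NOT NULL SET @sql += N' AND City = @City';

    EXEC sp_executesql
         @sql,
         N'@Name nvarchar(100), @City nvarchar(100)',
         @Name = @Name, @City = @City;
END;
```

Doing this safely and efficiently is exactly what the follow-up posts are about.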
That’s all for today. I’ll cover more about the potential issues raised by dynamic SQL in the next post and follow up with details on how to implement it later.

Friday 27 July 2012

Achieved MCM SQL 2008 Qualification

I want to announce that I’ve gotten the MCM SQL 2008 qualification. I got this a few weeks ago and was waiting for the announcement from Microsoft but the official announcement is taking longer than I expected so I’m putting it here.
It’s been a long road getting here and I’ve learnt a lot along the way. Still a bit surprised that I managed to get to the end of the process. Next step is the BI MCITP to complete the set, and after that looking at SQL 2012.

Tuesday 3 July 2012

Mirroring Lync

Recently I did some work on mirroring the supporting SQL Server databases used by Lync. Lync is Microsoft’s unified communications offering, providing lots of communications functionality like telephony, video conferencing etc. I won’t be covering either mirroring or Lync in much depth as there is already a lot of material on the web for both of these technologies. I’ll look at the specifics required by Lync and some of the discovery process that could be re-used to support mirroring on another application that does not natively support mirroring.
After analysing the problems and issues, I determined that a hot failover was not possible for Lync using mirroring, but that a warm failover was possible. A hot failover will automatically bring the mirrored databases up on the mirror server and start serving queries automatically if the principal server has failed. A warm failover will not fail over automatically but does support an easy manual failover when required. The mirrored databases are kept up to date on the secondary / mirror and can be brought online if the main / primary server fails or needs to be brought offline for operations like service pack patching.
There are two reasons why Lync cannot support a hot failover. There is no option in the Lync configuration facility to enter the failover partner’s address, and the Lync solution uses cross database queries.
Normally, you add an extra option to database connection strings when using mirroring. This provides the address of the failover partner and is needed when the client connects to the database while the normal primary is down. If the client does not know the address of the mirror, it cannot connect. Mirroring is built into the SQL client stack and clients can get the mirror server’s address from the active primary and automatically re-route connections if there is a failover while they are connected. But this is not reliable: it will not work if the client is not connected when the failover happens and the primary is not running when they try to connect. So, the absence of an option to specify the failover address in Lync’s configuration means that a hot failover will not be reliable and will not work in some usage cases.
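For comparison, an application that does expose its connection string would normally carry the mirror’s address in the Failover Partner keyword, something like this (server and database names are placeholders):

```
Server=SQLPRIMARY;Failover Partner=SQLMIRROR;Database=rtc;Integrated Security=SSPI;
```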
There is a possibility that the connection string is held in a configuration file somewhere and can be manually edited to provide the failover partner. I chose not to pursue this option due to concerns over maintainability and the possibility of missing some entries.
This means that the address of the mirror must be specified in the hosts file when there is a failover to ensure normal operation is possible. This is done by editing the hosts file, normally found at C:\Windows\System32\drivers\etc\hosts. There is more detail on this on the web.
The next issue is the use of cross database queries. Lync includes a number of databases – the exact number depends on which features you are using – and some of these use cross database queries. A cross database query happens when a query connected to one database references tables or objects in another database on the same server using a three part name. Cross database queries mean that these databases must be running on the same server all the time to function correctly. If one database automatically fails over, this dependency is broken and the system becomes unstable. This is the second reason why the failover must be a manual process. Furthermore, the system uses cross database ownership chains that require a special setting to be applied to the database.
I identified this issue by using the sys.databases system view to examine the Lync databases. I found that is_db_chaining_on was set to 1 for the rtc and rtcdyn databases. If you’re trying to get mirroring working on another vendor solution that does not support mirroring, it is worth creating an empty dummy database on the server with the default options, comparing the other fields in this view and researching any differences you find. Setting is_db_chaining_on to the required value is done with the ALTER DATABASE xx SET DB_CHAINING ON command. The wrinkle here is that this cannot be applied when restoring the database. The database must be brought online and the setting then applied, so doing a manual failover and then applying the setting needs to be included in your deployment process.
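The discovery step was along these lines (the dummy database name is a placeholder):

```sql
-- Compare the Lync databases against a freshly created database
-- with default options; any column that differs is a setting the
-- vendor application depends on.
SELECT name,
       is_db_chaining_on,
       is_trustworthy_on,
       recovery_model_desc
FROM sys.databases
WHERE name IN (N'rtc', N'rtcdyn', N'DummyDefaults');

-- After a manual failover, re-apply the non-default setting:
ALTER DATABASE rtc SET DB_CHAINING ON;
```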
Other than these issues, there were no major problems. The databases were using the simple recovery model and needed to be changed to full recovery for mirroring, and log backup jobs were created to prevent uncontrolled log growth. Logins also needed to be identified and moved. There were no other dependencies that needed to be duplicated.
So, you end up with a relatively simple manual process that needs to be documented and followed when a failover is required. Not perfect, but much better than needing to build servers from scratch and find / restore backups in an emergency, and it does provide a temporary server for patching / server maintenance. Of course, this isn’t formally supported by Microsoft and has not been exhaustively tested for all the various Lync components.
By the way, SQL 2012 AlwaysOn has added Availability Groups, which specify a set of databases that must run on the same server. If one database fails over, the rest are dragged over too. This would remove the issue caused by the cross database queries. That would justify more research into removing the obstacle caused by the Lync configuration tools and make a full hot failover a possibility.

Monday 28 May 2012

Hello...

Hello and welcome to my blog.

My name is Ian Meade and I'm starting a blog. This is the first post; hopefully there'll be more posts and the posts might even get better. Plan is, use this to get into the habit of blogging and gradually build something. Will be using this to practise the techniques of blogging, find the spell checker and may start to do good stuff.

Interested in databases, software development and coffee. Other than that, pretty boring, but what can you do?
What's the surprise?

This is a repost of a blog I did somewhere else, originally posted 19/7/2011. Still trying to remember / work out what the surprise is. I think I was unimpressed by the recommended index in the query plan. Could be a little clearer...

Execution plans and a surprise

It's a while since my last blog but here goes...
You use a query’s execution plan to work out how SQL Server is running your query and returning its results to you. In this post I am going to:

·         Show how to get estimated and actual query plans.
·         Show how to save them to allow other people to look at them.
·         Show a few things around using them.
There are two versions of execution plan, estimated and actual.
·         Estimated can be generated without running a query. It is the plan that SQL will try to use when it runs the query.
·         The actual plan is, surprisingly, the actual plan recorded when the query is actually run.
The actual plan has some details not in the estimated plan, and there are some conditions where the actual plan will differ from the generated estimated plan. Using the actual plan allows you to see the results of your query and get an idea of how long it takes to run. If you’re looking at performance, also run the query without generating the actual plan, as generating it adds a lot of overhead and does not reflect the query’s actual run time. Clear out the caches as well to give a clean basis for comparison.
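On a test server, clearing the caches between timing runs looks like this (never do this on a production server – it flushes the caches for everybody):

```sql
CHECKPOINT;             -- write dirty pages so the clean-buffer drop clears everything
DBCC DROPCLEANBUFFERS;  -- empty the data cache (cold reads on the next run)
DBCC FREEPROCCACHE;     -- empty the plan cache (fresh compilation on the next run)
```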
You can get the estimated plan in SSMS using the Query \ Display Estimated Execution Plan menu option (or CTRL+L for anybody who just can’t wait). It will take a few seconds to return the plan. This is not a reflection of how long SQL takes to generate the plan when running the query.
The actual plan is generated when a query is run and will be returned if you set the Query \ Include Actual Execution Plan option before running the query.
This will return the actual details recorded when the query was run. A good thing to look out for is differences between the actual and estimated row counts. This indicates that SQL could not determine how many rows it would need to process, and there is a good chance a better execution plan was possible if more accurate row counts had been available. This can be caused by a lack of statistics, out of date statistics or using table variables with large row sets. For us, table variables are more likely to be an issue. I’ve seen cases where changing from table variables to temporary tables has cut query time from around 20 minutes to around 10 seconds. I’ve also seen cases of queries going from around 1 second to around 1/100 of a second. But, that’s not what I want to write about today.
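A minimal illustration of the difference (object names invented): the table variable carries no statistics, so the optimiser assumes a tiny row count however many rows it holds, while the temporary table gets real statistics.

```sql
-- No statistics: the optimiser guesses the row count.
DECLARE @Orders TABLE (OrderID int PRIMARY KEY);

-- Statistics available: the optimiser sees the real row count.
CREATE TABLE #Orders (OrderID int PRIMARY KEY);

-- Load both with the same large row set, run the same query
-- against each with the actual plan turned on, and compare the
-- estimated versus actual rows on each operator.
SELECT COUNT(*) FROM @Orders;
SELECT COUNT(*) FROM #Orders;
DROP TABLE #Orders;
```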
Once you have the plan, you can start to use it to improve your query performance.
The easiest thing you can do is write two versions of your query and compare the relative cost. A lot of the time recoding the query can make a difference in performance and readability.
In SSMS, hovering over the query icons will show a tool tip. Details change with the task type. You can get more details by selecting the task and displaying its properties.
The size of the arrows joining tasks tells you how many rows the task is returning. The tool tip will tell you the number and size of estimated and actual rows. Differences in row counts indicate a statistics problem. The size of the rows multiplied by the number of rows gives the total data returned by the task. Therefore, the size of rows can be more significant than simply the number of rows returned.
In 2008, the plan can return a list of missing indexes and tell you how much adding them would improve your query. It comes up in green. Right clicking it and selecting Missing Index Details… opens the index’s creation code in a new query window. In 2005, and in 2008, the supporting details are stored in DMVs and can be retrieved using queries. Be careful with these hints. They can recommend a number of slightly different but similar indexes when one covering index could improve all the queries with less overhead. The index definition can be bad too. Look at this query and its recommended index.

Query
SELECT
      *
FROM
      FactSales
WHERE
      ProductKey = 12345
Index
/*
Missing Index Details from SQLQuery3.sql - (local)\dwh.DWH_TEST_08062011 (ERGO\imeade (57))
The Query Processor estimates that implementing the following index could improve the query cost by 99.962%.
*/
/*
USE [DWH_TEST_08062011]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[FactSales] ([ProductKey])
INCLUDE ([every field in the table])
GO
*/

Look at the expected improvement and the include clause above. The improvement looks great, but you do not want the include in this index. It’ll copy the entire table without adding much value. This is a table of about 35 GB in a 40 GB database. This index would give you a 75 GB database that has a few point queries with excellent performance. Adding the index without the include will take about 1 MB and still give excellent gains for the test query. By the way, I didn’t know this and am quite shocked to discover it.
Back to the plan details. You can get even more detail by right clicking on the plan and selecting Show Execution Plan XML… This isn’t easy to read but does have an important benefit: you can save the XML and open it again as a graphical execution plan. In 2008, you can right click on the plan and select Save Execution Plan As… This lets you save the XML as a file with the sqlplan extension. SQL recognises this extension and allows you to open the plan in SSMS. You can then save a plan to allow before and after comparison if you make major changes, or to monitor a query after a few weeks of production data have hit the database. You could also email the plan to somebody else to look at. This is much more useful than a screen shot of an isolated task from an execution plan.
You can do the same in 2005 – a bit more difficult, but it can be done. I haven’t got a copy handy to check the steps, but I think you get the XML plan and then copy and paste it into a file with the sqlplan extension.