Tuesday 3 July 2012

Mirroring Lync

Recently I did some work on mirroring the support SQL databases used by Lync. Lync is Microsoft’s unified communications offering providing lots of communications functionality like telephones, video conferencing etc… I won’t be covering either mirroring or Lync in much depth as there is already a lot of stuff on the web for both of these technologies. I’ll look at the specifics required by Lync and some of the discovery process that could be re-used to support mirroring on another application that does not natively support mirroring.
After analysing the problems and issues I determined that a hot fail over was not possible for Lync using mirroring but that a warm failover was possible. A hot failover will automatically bring the mirrored databases up on the mirror server and start serving queries automatically if the principal server has failed. A warm failover will not failover automatically but does support an easy manual failover when required. The mirrored databases are kept up to date on the secondary / mirror and can be brought online if the main / primary server fails or needs to be brought offline for operations like service pack patching.
There are two reasons why Lync cannot support a hot fail over.  There is no option in the Lync configuration facility to enter the failover partner’s address and the Lync solution uses cross data queries.
Normally, you add an extra option to database connection strings when using mirroring. This provides the address of the failover and is needed when the client connects to the database when the normal primary is down. If the client does not know the address of the mirror it cannot connect. Mirroring is built into the SQL client stack and the clients can get the mirroring server’s address from the active primary and automatically re-route connections if there is a failover while they are connected. But, this is not reliable and will not work if the client is not connected when the failover happens and the primary is not running when they try to connect. So, the absence of an option to specify the failover address in Lync’s configuration means that a hot failover will not be reliable and will not work in some usage cases.
There is a possibility that the connection string is held in a configuration file somewhere and can be manually edited to provide the failover partner. I chose not to pursue option due to concerns over maintainability the possibility of missing some entries.
This means that the address of the mirror must be specified in the hosts file when there is a failover to ensure normal operation is possible. This is done by editing the hosts.txt file normally found in C:\Windows\System32\drivers\etc\hosts.txt. There is more detail on this on the web.
The next issue is the use of cross database queries. Lync includes a number of databases, the exact number of these depends on which features you are using, some of these use cross database queries. A cross database query is used when a query connected to one database references tables or objects in another database on the same server using a three part name. The cross database query means that these databases must be running on the same server all the time to function correctly. If one database automatically fails over this dependency is broken and the system becomes unstable. This is the second reason why the fail over must be a manual process. Furthermore, the system uses cross database ownerships chains that require a special setting to be applied to the database.
I identified this issue by using the sys.databases system view to examine the Lync databases. I found that the is_db_chaining_on was set to 1 for the rtc and rtcdyn databases. If you’re trying to get mirroring working on another vendor solution that does not support mirroring it is worth creating an empty dummy database on the server with the default options and comparing the other fields in this view and researching any differences you find. Setting is_db_chaining_on to the required value is done by applying the ALTER DATABASE xx SET DB_CHAINING ON command. The wrinkle here is, this cannot be applied when restoring the database. It must be brought online and then applied, so doing a manual failover and then applying the setting needs to be included in your deployment process.
Other than these issues, there were no major problems. The databases were using the simple recovery model and needed to be changed to full recovery for mirroring and log backup jobs were created to prevent uncontrolled log growth. Logins also needed to be identified and moved. There were no other dependencies that needed to be duplicated.
So, you end up with a relatively simple manual process that needs to be documented and followed when a failover is required. Not perfect but much better then needing to build servers from scratch and find / restore backups is an emergency and does provide a temporary server for patching / server maintenance. Of course, this isn’t formally supported by Microsoft and has not been exhaustively tested for all the various Lync components.
By the way, SQL 2012 Always On has added Availability Groups to specify a set of databases that must run on the same server. If one database fails over the rest are dragged over too. This would remove the issue caused by the cross database queries. That would justify more researching into removing the obstacle caused by the Lync configuration tools and make a full hot failover a possibility.

1 comment:

  1. Missed a bit, you also need to allow snapshot isolation on the rtc and rtcdyn databases using ALTER DATABASE xx SET ALLOW_SNAPSHOT_ISOLATION ON

    ReplyDelete