Having worked over the last few years with a number of customers who use CRM Online, I have found that one question often comes up during the project lifecycle: we have chosen the SaaS version of CRM, but what does that mean for high availability and disaster recovery? The easiest way to look at it is that you have outsourced the problem to Microsoft, who take it on for you and leave you free to focus on the application part of the solution. Even so, you can be sure that at some point questions in the business continuity space will come up.
When implementing CRM on premises, the problem domain is well understood. People know they have servers, infrastructure, networking, storage and SQL Server, among other things, to think about, and in most organisations people are familiar with discussions about DR and HA in these areas. In the cloud you may have outsourced the solution, but availability and disaster recovery scenarios can still occur, so it's important to understand what Microsoft will do on your behalf and how it may affect you.
Whenever I have looked for information on this, I have tended to find a number of fragmented sources, and it's not always easy to get a straight explanation of how things stack up. With this in mind, what follows is my interpretation of the information I have read. I am sure I have got the odd thing wrong, so I welcome feedback so that I can update this article, and hopefully it will provide a useful source for others.
Top Level Microsoft Statements
The official statements from Microsoft include:
CRM Online has a financially backed 99.9% uptime commitment. If the SLA is not met, we receive service credits
Microsoft has operations staff monitoring its services 24x7
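To put the 99.9% figure in perspective, the short sketch below converts an uptime percentage into the downtime it allows over a period. It is plain arithmetic and assumes a 30-day month and a 365-day year; how Microsoft actually measures uptime and calculates credits is defined in their SLA, not here.

```python
from datetime import timedelta

def allowed_downtime(uptime_percent: float, period: timedelta) -> timedelta:
    """Return the downtime permitted by an uptime commitment over a period."""
    # The unavailable fraction is whatever the commitment does not cover.
    unavailable_fraction = 1 - uptime_percent / 100
    return period * unavailable_fraction

# Assumed period lengths, for illustration only.
month = timedelta(days=30)
year = timedelta(days=365)

print(allowed_downtime(99.9, month))  # 0:43:12
print(allowed_downtime(99.9, year))   # 8:45:36
```

In other words, 99.9% still allows roughly three quarters of an hour of downtime in a month before any credits apply.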
When considering the overall availability and recovery of CRM, it is important to understand the dependency tree for CRM and its fellow services:
Depends on Office 365
Depends on Azure Active Directory
[Optional] Can depend on ADFS
Would depend on your on-premises Active Directory (if using ADFS)
Can use SharePoint Online
Can use Azure custom components
Can use Marketplace products
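One way to reason about the dependency tree is to write it down as data and ask which components could affect CRM's availability for a given setup. This is my own illustrative model, not anything Microsoft publishes: the names mirror the list above, the `required` flag separates the "depends on" entries from the "can use" ones, and placing ADFS under Azure AD (with on-premises AD only mattering when ADFS is used) is my reading of the list.

```python
from __future__ import annotations

from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    required: bool = True  # "depends on" (True) vs optional "can use" (False)
    children: list[Component] = field(default_factory=list)

def in_scope(root: Component, used_optional: set[str]) -> list[str]:
    """List every component whose outage could affect the root for this setup."""
    result: list[str] = []

    def walk(node: Component) -> None:
        for child in node.children:
            # Optional components only matter if this deployment actually uses them.
            if child.required or child.name in used_optional:
                result.append(child.name)
                walk(child)

    walk(root)
    return result

# Hypothetical tree built from the list above.
on_prem_ad = Component("On-premises Active Directory")
adfs = Component("ADFS", required=False, children=[on_prem_ad])
azure_ad = Component("Azure Active Directory", children=[adfs])
crm = Component("CRM Online", children=[
    Component("Office 365"),
    azure_ad,
    Component("SharePoint Online", required=False),
    Component("Azure custom components", required=False),
    Component("Marketplace products", required=False),
])

print(in_scope(crm, used_optional=set()))
# ['Office 365', 'Azure Active Directory']
print(in_scope(crm, used_optional={"ADFS", "SharePoint Online"}))
# ['Office 365', 'Azure Active Directory', 'ADFS', 'On-premises Active Directory', 'SharePoint Online']
```

The point of the exercise is that choosing ADFS pulls your own on-premises infrastructure back into CRM Online's availability story.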
Understanding some technical bits
When you set up Dynamics CRM you choose a region, and Microsoft will have two data centres in that region. For example, if we choose Europe, we may have Dublin as the primary data centre and Amsterdam as the secondary sister data centre. In this scenario Microsoft would do the following with our data:
Two copies of your data are written to local storage in the primary data centre
An additional two copies of the data are written to storage in the secondary data centre
Additionally, a daily backup is taken by default and held in offsite storage
Under the hood, Microsoft uses SQL Server Always On for the SQL instances that back CRM, which allows them to make these additional writes across the data centre pair. This is important because it allows them to fail over from Dublin to Amsterdam seamlessly for the customer in a local data centre DR scenario. If this occurred, we would expect the RPO (recovery point objective) and RTO (recovery time objective) to be in the region of seconds to a small number of minutes, with almost zero data loss.
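The data placement above can be summarised as a toy model of where a recovery would come from. The site names are hypothetical, the model deliberately ignores how SQL Server Always On actually replicates, and the offsite backup is treated as a single logical copy because the text doesn't say how many copies it has.

```python
from typing import Optional

# Hypothetical model of the data placement described above.
SITES = {
    "primary": {"copies": 2, "current": True},         # e.g. Dublin
    "secondary": {"copies": 2, "current": True},       # e.g. Amsterdam
    "offsite_backup": {"copies": 1, "current": False}, # only as current as the last daily backup
}

def recovery_source(failed: set[str]) -> Optional[str]:
    """Pick where data would be recovered from after losing the given sites."""
    survivors = [site for site in SITES if site not in failed]
    # Prefer a continuously written copy, which means near-zero data loss.
    for site in survivors:
        if SITES[site]["current"]:
            return site
    # Otherwise fall back to the daily backup, losing changes made since it ran.
    return survivors[0] if survivors else None

print(recovery_source(set()))                     # primary
print(recovery_source({"primary"}))               # secondary
print(recovery_source({"primary", "secondary"}))  # offsite_backup
```

Only when both paired data centres are lost does recovery drop back to the daily backup, which is where the longer recovery times and recovery points discussed below come in.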
Recovery Time Objective
In terms of RTO we are very much in Microsoft's hands. If our primary CRM data centre goes down and Microsoft flips the switch to move us to the sister data centre, our recovery time would be the time it takes Microsoft to make that decision plus the time it takes to execute the failover.
I have not been able to find a clear SLA statement on this, so we assume Microsoft would make the call on a case-by-case basis depending on the issue.
If we felt the downtime was going to be extended, we would have the option of taking our daily backup and deploying it to CRM in another region. This could be an option if the outage in our primary data centre was going to last too long before the failover to the sister data centre, or if both paired data centres were down. If we took this option, we would need to consider how to move back to the original data centres once they are back online. This would be via another backup and restore, which would involve some downtime. The URL for our CRM instance would also change in this scenario.
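The trade-off above can be written as a small decision helper. Everything in it is hypothetical: since I couldn't find a failover SLA, the decision time, failover time, restore time and tolerable downtime are all estimates you would supply yourself. It recommends restoring to another region only when waiting would exceed what the business can tolerate and the restore would actually be quicker.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class OutageEstimate:
    decision_time: timedelta       # estimated time for Microsoft to decide to fail over
    failover_time: timedelta       # estimated time to execute the failover
    restore_time: timedelta        # our estimate to restore the daily backup elsewhere
    tolerable_downtime: timedelta  # what the business can accept

    @property
    def wait_rto(self) -> timedelta:
        # RTO if we wait: time to decide plus time to execute.
        return self.decision_time + self.failover_time

def choose_recovery(est: OutageEstimate) -> str:
    if est.wait_rto <= est.tolerable_downtime:
        return "wait for Microsoft failover"
    if est.restore_time < est.wait_rto:
        # Accepts data loss back to the last daily backup, a URL change,
        # and a later backup/restore to move back to the original region.
        return "restore daily backup to another region"
    return "wait for Microsoft failover"

est = OutageEstimate(
    decision_time=timedelta(hours=2),
    failover_time=timedelta(minutes=30),
    restore_time=timedelta(hours=1),
    tolerable_downtime=timedelta(hours=1),
)
print(choose_recovery(est))  # restore daily backup to another region
```

In practice the restore path also carries the data loss and roll-back costs noted in the comment, so the threshold for choosing it should be set conservatively.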
Recovery Point Objective
We can assume there are three possible recovery points available on the cloud service:
The most common will be based on the cross-data-centre writes that CRM uses out of the box. We would expect the recovery point to be within milliseconds or seconds of the state of the system at the time it went down
The next option would be to roll back to the last system backup. This would put our recovery point anywhere between 0 and 24 hours before the failure, depending on what time of day we roll back and when the backup was taken
The final option would be to roll back to a custom backup. In this case the recovery point would be the time the backup was taken
In the default scenario, we expect Microsoft to recover the CRM service, and our recovery point would be almost exactly the time the system went down.
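For the daily backup option, the data at risk is everything written between the backup and the failure. The sketch below computes that window. The 02:00 backup time is a made-up example, as I haven't seen the actual time the system backup runs stated anywhere.

```python
from datetime import datetime, timedelta

def last_daily_backup(failure: datetime, backup_hour: int) -> datetime:
    """Most recent daily backup at or before the failure, assuming a fixed hour."""
    candidate = failure.replace(hour=backup_hour, minute=0, second=0, microsecond=0)
    if candidate > failure:
        # Today's backup hasn't run yet, so use yesterday's.
        candidate -= timedelta(days=1)
    return candidate

def data_loss_window(failure: datetime, backup_hour: int) -> timedelta:
    """Changes made in this window are lost when restoring the daily backup."""
    return failure - last_daily_backup(failure, backup_hour)

# Hypothetical: backup runs at 02:00 and the failure happens at 17:30.
print(data_loss_window(datetime(2016, 3, 1, 17, 30), backup_hour=2))  # 15:30:00
# A failure at 01:00 falls before that day's backup, so almost a day is lost.
print(data_loss_window(datetime(2016, 3, 1, 1, 0), backup_hour=2))    # 23:00:00
```

The result always falls in the 0 to 24 hour range described above.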
What about backups?
In addition to Microsoft's multi-write approach, which offers a good recovery point, system backups are also taken daily and kept for around three days.
It would be possible for us to use one of these backups as a restore point.
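With daily backups kept for around three days, only a handful of system restore points exist at any moment. This sketch lists them, assuming exactly three days of retention and a fixed, hypothetical backup hour; the real retention is only described as "around" three days.

```python
from datetime import datetime, timedelta

def available_restore_points(now: datetime, backup_hour: int,
                             retention: timedelta = timedelta(days=3)) -> list[datetime]:
    """Daily system backups still within the retention window, newest first."""
    latest = now.replace(hour=backup_hour, minute=0, second=0, microsecond=0)
    if latest > now:
        latest -= timedelta(days=1)
    points = []
    point = latest
    # Walk back one day at a time until a backup is older than the retention period.
    while now - point <= retention:
        points.append(point)
        point -= timedelta(days=1)
    return points

for p in available_restore_points(datetime(2016, 3, 4, 9, 0), backup_hour=2):
    print(p)
# 2016-03-04 02:00:00
# 2016-03-03 02:00:00
# 2016-03-02 02:00:00
```

Anything older than that window is gone, which is one more reason to take a custom backup when you know you may need a specific restore point.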
We expect that in most cases we would not use this approach and would instead rely on Microsoft. We would be more likely to use backups to restore to sandbox instances of CRM if we wanted to troubleshoot something.
There is also the option to create custom backups at specific times for the same purposes.
Considering all of the above, our approach places a lot of trust in Microsoft. However, CRM could be working fine as a service while we accidentally break our own solution. For this reason, we feel good practice is to create a custom backup before any deployment activity so that we have a safe rollback if the deployment has problems.
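The pre-deployment backup habit can be captured as a small wrapper. The three callables are hypothetical placeholders for however you take a custom backup, run the deployment and restore (for example by hand in the admin portal, or with your own tooling); I am deliberately not naming a specific API.

```python
from typing import Callable

def deploy_with_safety_backup(
    create_backup: Callable[[str], str],  # hypothetical: takes a label, returns a backup id
    deploy: Callable[[], None],           # hypothetical: runs the deployment, raises on failure
    restore: Callable[[str], None],       # hypothetical: restores the given backup id
    label: str,
) -> str:
    """Take a custom backup, deploy, and roll back to that backup if the deployment fails."""
    backup_id = create_backup(label)
    try:
        deploy()
    except Exception:
        # The service itself is healthy; our own change broke things,
        # so return to the known-good state captured just before.
        restore(backup_id)
        raise
    return backup_id

# Usage with fake steps that simply record what happened.
log: list[tuple[str, str]] = []

def fake_backup(label: str) -> str:
    log.append(("backup", label))
    return "backup-001"

def fake_restore(backup_id: str) -> None:
    log.append(("restore", backup_id))

def broken_deploy() -> None:
    raise RuntimeError("deployment failed")

try:
    deploy_with_safety_backup(fake_backup, broken_deploy, fake_restore, "pre-release")
except RuntimeError:
    pass
print(log)  # [('backup', 'pre-release'), ('restore', 'backup-001')]
```

Re-raising after the restore keeps the failure visible, so the rollback doesn't quietly hide a broken deployment.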
[Diagrams: CRM High Availability; Azure AD High Availability]
As I mentioned earlier, I have not come across another source discussing these considerations for choosing CRM Online, so I am hoping my interpretation of the material out there is useful to others. Please also let me know about anything I may have missed or misinterpreted.