Datacenter switchover in Exchange 2010 are operationally complex because recovery of mailbox data (DAG) and client access (namespace) are tied together there. So if you lose all or a significant portion of your CAS, or the VIP for the array, or a significant portion of your DAG, you were in a situation where you needed to do a datacenter switchover. In Exchange 2010, perhaps the biggest single point of failure in the messaging system is the FQDN that you give to users because it tells the user where to go. In the Exchange 2010 paradigm, changing where that FQDN goes isn’t easy because you have to change DNS, and then handle DNS latency, which in some parts of the world is challenging. And you have name caches in browsers that are typically about 30 minutes or more that also have to be handled.
In Exchange 2013, a client can receive multiple IP Addresses from DNS for a given FQDN.
In Exchange 2013, the namespace doesn’t need to move with the DAG. Exchange leverages fault tolerance built into the namespace through multiple IP addresses, load balancing (and if need be, the ability to take servers in and out of service).
This means the namespace is no longer a single point of failure as it was in Exchange 2010.
Since almost all client access in Exchange 2013 now relies on HTTP (Outlook, Outlook Anywhere, EAS, EWS, OWA, and EAC), if the first IP Address on a HTTP stack fails, the HTTP client will try the next and so on. If a Virtual IP of a CAS array were to fail, the client can automatically connect to other IPs to access the same service in a matter of seconds, instead of waiting minutes for DNS to failover. For Example, if a client tries one and it fails, it waits about 20 seconds and then tries the next one in the list. Thus, if you lose the VIP for the Client Access server array, recovery for the clients happens automatically, and in about 21 seconds !!!
Great news right?
If you lose your CAS array, you don’t need to perform a datacenter switchover. Clients are automatically redirected to a second datacenter that has operating Client Access servers, which remains unaffected by the outage (because you don’t do a switchover). Instead of working to recover service, the service recovers itself and you can focus on fixing the core issue
If you lose the load balancer in your primary site, you simply turn it off (or maybe turn off the VIP) and repair or replace it. Clients that aren’t already using the VIP in the secondary datacenter will automatically fail over to the secondary VIP without any change of namespace, and without any change in DNS. Not only does that mean you no longer have to perform a switchover, but it also means that all of the time normally associated with a datacenter switchover recovery isn’t spent.
Example DAG Design for Exchange 2013: Because you can fail over the namespace between datacenters, all that’s needed to achieve a datacenter failover is a mechanism for failover of the Mailbox server role across datacenters. To get automatic failover for the DAG, you simply architect a solution where the DAG is evenly split between two datacenters, and then place the witness server in a third location so that it can be arbitrated by DAG members in either datacenter, regardless of the state of the network between the datacenters that contain the DAG members.
How to deal with intermittent failures in Exchange 2013: An intermittent failure requires some sort of extra administrative action to be taken because it might be the result of a replacement device being put into service. In this scenario, the administrator can perform a namespace switchover by simply removing the VIP for the device being replaced from DNS. Then during that service period, no clients will be trying to connect to it. After the replacement process has completed, the administrator can add the VIP back to DNS, and clients will eventually start using it.