Datacenter switchover in Exchange 2010 is operationally complex because recovery of mailbox data (the DAG) and recovery of client access (the namespace) are tied together. If you lose all or a significant portion of your Client Access servers, the VIP for the CAS array, or a significant portion of your DAG, you are in a situation where you need to perform a datacenter switchover. In Exchange 2010, perhaps the biggest single point of failure in the messaging system is the FQDN that you give to users, because it tells the client where to go. In the Exchange 2010 paradigm, changing where that FQDN resolves isn’t easy: you have to change DNS and then handle DNS replication latency, which in some parts of the world is challenging, and you also have to deal with name caches in browsers, which typically last 30 minutes or more.


In Exchange 2013, a client can receive multiple IP addresses from DNS for a given FQDN.


In Exchange 2013, the namespace doesn’t need to move with the DAG. Exchange leverages fault tolerance built into the namespace through multiple IP addresses, load balancing, and, if need be, the ability to take servers in and out of service.

This means the namespace is no longer a single point of failure as it was in Exchange 2010.

Since almost all client access in Exchange 2013 relies on HTTP (Outlook, Outlook Anywhere, EAS, EWS, OWA, and EAC), if the first IP address returned for an FQDN fails, the HTTP client tries the next one, and so on. If the virtual IP (VIP) of a CAS array fails, clients can automatically connect to another IP for the same service in a matter of seconds, instead of waiting minutes for a DNS change to propagate. For example, if a client tries one IP and it fails, it waits about 20 seconds and then tries the next one in the list. So if you lose the VIP for the Client Access server array, recovery for the clients happens automatically, in roughly 21 seconds.
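The fallback behavior described above can be sketched in Python. This is only an illustration: real Exchange clients do this inside the Windows HTTP stack, and the FQDN, port, and ~20-second per-address timeout here are assumed values, not anything mandated by Exchange.

```python
# Rough sketch of multi-IP fallback: when DNS returns several addresses for
# one FQDN, try each in turn with a short timeout instead of waiting on DNS.
import socket

def connect_with_fallback(fqdn, port=443, timeout=20.0):
    """Try each IP address DNS returns for fqdn until one accepts a TCP connection."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            fqdn, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)          # give each VIP ~20 s before moving on
        try:
            sock.connect(sockaddr)
            return sock                   # first reachable VIP wins
        except OSError as err:
            sock.close()
            last_error = err              # this VIP is down; try the next one
    raise ConnectionError(f"no address for {fqdn} accepted a connection") from last_error
```

If the first VIP is unreachable, the total delay the client sees is roughly one timeout period before the next address is tried, which is where the "about 21 seconds" figure comes from.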


Great news, right?


If you lose your CAS array, you don’t need to perform a datacenter switchover. Clients are automatically redirected to a second datacenter that has operating Client Access servers and is unaffected by the outage. Instead of working to recover service, the service recovers itself, and you can focus on fixing the core issue.


If you lose the load balancer in your primary site, you simply turn it off (or turn off its VIP) and repair or replace it. Clients that aren’t already using the VIP in the secondary datacenter automatically fail over to the secondary VIP, with no change of namespace and no change in DNS. Not only does this mean you no longer have to perform a switchover, it also means that none of the time normally associated with a datacenter switchover recovery is spent.


Example DAG design for Exchange 2013: Because the namespace can fail over between datacenters, all that’s needed to achieve a datacenter failover is a mechanism for failing over the Mailbox server role across datacenters. To get automatic failover for the DAG, architect a solution in which the DAG is split evenly between two datacenters, and place the witness server in a third location so that it can be arbitrated by DAG members in either datacenter, regardless of the state of the network between the two datacenters that contain the DAG members.
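The arbitration logic behind this design can be sketched as a simple majority-vote calculation. This is a simplified model of the Windows failover clustering quorum that DAGs rely on, assuming a witness that always contributes one vote; it is an illustration, not Exchange code.

```python
# Minimal quorum sketch for a DAG split across two datacenters plus a
# witness server in a third site: a partition survives only if it holds a
# strict majority of votes (DAG members plus the witness).

def has_quorum(members_alive, total_members, witness_reachable):
    """Return True if this partition of the DAG holds a strict majority of votes."""
    total_votes = total_members + 1                      # each member votes, plus the witness
    votes_held = members_alive + (1 if witness_reachable else 0)
    return votes_held > total_votes // 2                 # strict majority required

# Four-member DAG, two members per datacenter: if the link between the two
# datacenters fails, whichever half can still reach the witness in the third
# site holds 3 of 5 votes, keeps quorum, and mounts the databases.
```

This is why the witness belongs in a third location: if it sat in either datacenter, losing that datacenter would take the witness vote down with it.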


How to deal with intermittent failures in Exchange 2013: An intermittent failure might still require administrative action, for example when a replacement device is being put into service. In this scenario, the administrator can perform a namespace switchover by simply removing the VIP of the device being replaced from DNS. During that service window, no clients will try to connect to it. After the replacement process has completed, the administrator adds the VIP back to DNS, and clients will eventually start using it again.
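The remove-then-restore procedure above can be modeled as edits to a DNS record set. The FQDN and VIP addresses below are made-up examples; in practice you would edit the real DNS zone and wait for client caches to expire.

```python
# Toy model of a DNS-based namespace switchover: pull one VIP out of the
# record set during maintenance, then add it back afterwards.

record_set = {"mail.contoso.com": ["192.0.2.10", "198.51.100.10"]}  # primary, secondary VIPs

def remove_vip(fqdn, vip):
    record_set[fqdn].remove(vip)    # clients stop receiving this VIP as their caches expire

def add_vip(fqdn, vip):
    record_set[fqdn].append(vip)    # clients gradually start using it again

remove_vip("mail.contoso.com", "192.0.2.10")  # take the primary VIP out for maintenance
# ... replace or repair the load balancer ...
add_vip("mail.contoso.com", "192.0.2.10")     # restore it once service is verified
```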

Comments


  4. KenL

    Hi,
    A question about how external HTTP clients (Outlook Anywhere or OWA) deal with multiple IP addresses.
    Do you just add two IP addresses to the A record? And if so, does the HTTP client know to use the first one on the list first?
    The reason I am asking is that we want the primary datacentre to take the load during normal operation.
    And, the same question for internal traffic using Windows 2008 R2 DNS: do we need to turn off round robin to keep the order of the IP addresses presented to the Outlook Anywhere or OWA client?

    • Jomon Jose

      @KenL
      I have the same question. I think this can be achieved with some sort of geo load balancer doing round robin across multiple datacenters.


  5. Exchange’s single global namespace requires more than one CAS server, and indeed more than one IP address to be specified for the single name that clients will use to connect.

