Bare Metal Restore for Virtual Machines in Hyper-V (Part 2)

[Published on 25 Jan. 2011 / Last Updated on 25 Jan. 2011]

This article discusses some shortcomings of Hyper-V disaster recovery plans.

Introduction

In the first part of this article series, I explained that although I had a comprehensive disaster recovery plan in place on my network, I ran into some unexpected issues when I had to rebuild the network after a lightning strike. In this article, I want to shed some light on those shortcomings and talk about how I have addressed them.

Let's Begin

As you may recall from my first article, the primary line of defense that I use in protecting my network is Microsoft’s System Center Data Protection Manager 2007 (DPM 2007). I had configured DPM 2007 to take an incremental, block-level backup of each of my virtual hard drives every fifteen minutes.

I was fortunate in that my DPM 2007 server and the storage array attached to it both survived the lightning strike. However, one of the primary casualties was a Hyper-V host server that had been running several mission-critical virtual machines.

Once I had assessed the damage, my plan was to replace the damaged server, install Hyper-V, and then restore my backups from DPM 2007. Ultimately, however, that plan proved shortsighted at best.

The problem was that DPM 2007 can only protect resources on a server that is running a DPM agent, and that same agent is also used during the restore process. I had an empty Hyper-V host server with no guest servers and no DPM agents.

Unfortunately, DPM doesn’t allow you to blindly copy an agent from the DPM 2007 server. The agent is server specific, and it will only work if the server is joined to the Active Directory. Joining my servers to the Active Directory simply wasn’t an option for me because every one of my domain controllers had been destroyed by the lightning strike. For all practical purposes, the Active Directory didn’t exist! That meant that even though I had full backups for all of my servers, I couldn’t use any of them.

I got lucky in that part of my disaster recovery plan had involved shutting down all of my virtual servers once every six weeks and exporting the virtual machines to an external hard drive. When the export process completed, I would unplug the external hard drive and store it in a safe place. As a result, the external drive had not been damaged by the lightning strike.

I was able to recover my servers by importing the virtual machines from the removable hard drive. Of course it had been several weeks since I had last exported my virtual servers, so my backups were outdated. It didn’t really matter though, because the images allowed me to bring my virtual servers online. Once online, the agents connected to my DPM server, and I was able to perform a restore operation that brought all of my servers back to a current state.
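To put rough numbers on why the outdated images "didn't really matter," here is a minimal sketch of the arithmetic. The two intervals come from the schedule described above; the constant and function names are made up for illustration, and the figures assume that every export and every DPM backup actually ran on schedule.

```python
from datetime import timedelta

# Intervals taken from the article; the names are illustrative only.
EXPORT_INTERVAL = timedelta(weeks=6)    # manual VM export to the external drive
DPM_INTERVAL = timedelta(minutes=15)    # DPM 2007 incremental, block-level backup


def worst_case_staleness(dpm_restore_possible: bool) -> timedelta:
    """Upper bound on the age of recovered data, assuming backups ran on schedule.

    If the DPM restore can run (the agent is back online), the data is at most
    one DPM interval old. Otherwise only the exported image is available.
    """
    return DPM_INTERVAL if dpm_restore_possible else EXPORT_INTERVAL


# Number of DPM recovery points taken during a single export cycle.
recovery_points_per_export = EXPORT_INTERVAL // DPM_INTERVAL

print(worst_case_staleness(dpm_restore_possible=False))
print(worst_case_staleness(dpm_restore_possible=True))
print(recovery_points_per_export)
```

The exported image only has to be recent enough to boot and let the agent reach the DPM server. After that, the second restore closes the gap from as much as six weeks to roughly fifteen minutes.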

After I had gotten all of my servers back up and running, a few things began to occur to me. First, my recovery process had taken a full week, which would have been completely unacceptable for most organizations. Granted, a few days were spent waiting for replacement hardware to arrive, but even the actual recovery process took days to complete. In some ways this is to be expected simply because of the sheer volume of data that I was recovering, but there is no denying that the process would have gone much faster if I had not had to perform two separate restorations (a restoration from an exported image and a second restoration from DPM 2007).

Another thing that I began to realize was that the recovery would have been impossible had I not taken the time to export all of my virtual servers, even though DPM 2007 contained backups of everything. As such, I knew that I had really dodged a bullet and that I needed a better plan for next time.

Before I tell you about my new disaster recovery strategy, some of you are probably wondering how DPM 2007 is normally able to recover a server after a catastrophic failure. Normally, when a computer that is protected by DPM 2007 fails, the rest of the servers on the network are still online. As long as the Active Directory is still functional, you can use the Active Directory Users and Computers console to reset the computer account for the failed server.

As I mentioned earlier, DPM 2007 depends on the Active Directory. You can only restore a backup if the protected server has an agent that is able to communicate with the DPM server. Furthermore, the agent can only be deployed if the server is a domain member.

Resetting an Active Directory computer account is a way of telling Windows that the computer that is associated with the account is gone, but that you want to retain the computer account information and eventually associate it with a different server. This means that you can install Windows onto a replacement server, give the replacement server the same computer name as the server that failed, and then join the server to your domain. In doing so, the new server takes on the identity of the server that it is replacing.

Before you can complete the recovery process, you must still deploy an agent to the server, but doing so is simple. You can use the DPM 2007 management console to deploy the agent, and once the agent has been installed, you can restore the server’s backup.

Even this method isn’t entirely foolproof though. As I mentioned earlier, it will only work so long as the Active Directory is still functional. The other problem with this technique is that Windows will not allow you to reset the computer account for a domain controller (at least not in a way that facilitates replacing a dead server). As such, the method that I just described can only be used to recover member servers. It will not work for domain controllers.
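The conditions described above can be summarized as a small decision sketch. It is only a model of the reasoning in this article, not anything that DPM 2007 or Windows exposes: the class, enum, and function names are hypothetical, and the fallback branch simply reflects what worked in my own recovery.

```python
from dataclasses import dataclass
from enum import Enum


class RecoveryPath(Enum):
    # Normal path: works only for member servers while AD is functional.
    RESET_ACCOUNT = "reset computer account, rejoin domain, deploy agent, restore from DPM"
    # Fallback used after the lightning strike.
    IMPORT_EXPORT = "import exported VM, let the agent reconnect, restore from DPM"
    NONE_DESCRIBED = "no recovery path described in this article"


@dataclass
class FailedServer:
    name: str                    # hypothetical server name
    is_domain_controller: bool
    has_exported_image: bool     # an exported VM image survived the disaster


def choose_recovery_path(server: FailedServer, ad_functional: bool) -> RecoveryPath:
    """Pick a recovery path using only the rules stated in the article."""
    # The computer account reset needs a working Active Directory and
    # cannot be used to replace a dead domain controller.
    if ad_functional and not server.is_domain_controller:
        return RecoveryPath.RESET_ACCOUNT
    # Otherwise the DPM backups are unreachable until an exported image
    # brings the server (and its agent) back online.
    if server.has_exported_image:
        return RecoveryPath.IMPORT_EXPORT
    return RecoveryPath.NONE_DESCRIBED


member = FailedServer("FILE01", is_domain_controller=False, has_exported_image=False)
dc = FailedServer("DC01", is_domain_controller=True, has_exported_image=True)

print(choose_recovery_path(member, ad_functional=True).value)
print(choose_recovery_path(dc, ad_functional=False).value)
print(choose_recovery_path(member, ad_functional=False).value)
```

The last call is the case that nearly caught me out: a member server with DPM backups but no exported image has no way back once the Active Directory is gone.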
