Moving DPM 2010
Have you been tasked with migrating Data Protection Manager (DPM) to a new server? Have you found that the TechNet resources for it don't even scratch the surface? Me too. Read below as I discuss my adventure of moving DPM to a new server.
Long story short, the department of the University I work for is going to be hosting the IT services of another, smaller department. We already do this for 5 other departments on campus so this one will make 6. The first 5 are primarily managed by us with some being more managed than others. They have some web people and such but our Support department goes out there, our networking team manages everything from the switches on up, their mailbox datastores are on our Exchange server, etc. This 6th department we're hosting is just that: hosting. The way it was explained to me was that it's their IT car being stored in our virtual garage. They still change the oil and keep it clean; we just keep the roof up and the door locked.
I'll get back to taking in other departments soon; let's talk about our backup setup. My predecessor was the primary guy who managed the backup system. We run Microsoft's Data Protection Manager (part of their System Center suite of IT management software) for short-term backups and IBM Tivoli for our long-term backups. Since I'm still relatively fresh to this position, my boss has only tasked me with managing DPM (Tivoli stuff will come later).
We have 2 older HP ProLiant servers running Windows Server 2008 R2. Each server has 2x 300GB 15k SAS drives in RAID 1 holding the OS and DPM 2010. Each server also has its own 20TB (10x 2TB) array where the DPM-dedicated local SQL database is kept (SQL Server 2008). One server (a primary) backs up (aka "Protects") files and databases on our production servers as well as the user data on a handful of executive-level workstations. The other server (a secondary) protects the primary DPM server. For as long as I can tell, we've been over 90% full on our backup space. Usually, we've had between 1.2TB and 1.5TB free in the DPM storage pool. We weren't sure this little free space was enough to accommodate the backup storage requirements of this new department under our wing, so we set out to expand our SAN storage and move our DPM server to a VM in our virtual infrastructure.
At first my "management" of DPM was merely investigating alerts. Sync issues? Run a synchronization job. Recovery Points failing? Try manually creating one. Host unreachable? Find out why. Maybe uninstall/reinstall the Protection Agent. With us being at near capacity with our current load and still taking on more, we needed to make some changes.
I knew that we'd eventually want to move DPM into our virtual environment. I knew that we'd eventually want to upgrade to DPM 2012. I knew that we'd eventually have more storage for it. What I didn't know was that we were going to try and do it all at the same time. I'd discussed with my boss the need to make a few VMs in our test environment to test migrating and upgrading DPM and he gave the green light.
Just after making the test VMs (both Windows Server 2008), my boss told me I should make the second VM Windows Server 2012 Core. (For those that aren't familiar, Server 2012 Core is just like Server 2012 but with no GUI. You log in and all you see is a CMD window and a black background. You can do everything through the command line or you can open up PowerShell and do everything that way.) I figured it wasn't that big of a deal. I'm a good Googler and can figure out simple commands through PowerShell.
PREP FOR TESTING
The first hiccup came when trying to install DPM on Server 2012 Core. System Center products (Configuration Manager, Data Protection Manager, Operations Manager, etc.) as well as recent Exchange and Lync require Silverlight to run the installer. If you don't have Silverlight installed, you'll just see a blank white box on the screen. I experienced this issue with both the DPM 2012 and DPM 2010 installers. I didn't realize it was a Silverlight issue until I showed it to my office-mate, who said that the Lync and Exchange installers require it. At least for the configuration and testing, no more Core mode. I used the Add-WindowsFeature cmdlet in PowerShell to add the Desktop-Experience and Server-Gui-Mgmt-Infra features. (As an aside, you will have to reboot when adding or removing the GUI for Windows Server 2012.)
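For anyone retracing this step, here's a rough sketch of putting the GUI back on a Server 2012 Core install from PowerShell. The feature names are the ones I understand Server 2012 to use (Server-Gui-Mgmt-Infra for the graphical management tools, Server-Gui-Shell for the full desktop shell), so treat them as assumptions and check them against the Get-WindowsFeature output on your own box first.

```powershell
# Run in an elevated PowerShell session on the Server 2012 Core VM.

# Check the current state of the GUI-related features before changing anything.
Get-WindowsFeature -Name Server-Gui-Mgmt-Infra, Server-Gui-Shell, Desktop-Experience

# Add the graphical management tools and the full shell.
# -Restart reboots the server automatically if the install requires it.
Install-WindowsFeature -Name Server-Gui-Mgmt-Infra, Server-Gui-Shell -Restart
```

If the VM was installed as Core from the start, the GUI binaries may not be on disk. In that case Install-WindowsFeature will likely need a -Source pointing at the install media, or access to Windows Update.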
(As for setting up a server for the first time, SCONFIG.exe does a pretty good job of making configuration of a new server easy without having the GUI. If you're accustomed to setting things up in the GUI with Server Manager, SCONFIG.exe is similarly easy to use through the command-line.)
The second hiccup was more of an issue found through reading rather than an actual issue. The process for migrating DPM to another server according to MS is basically to take a backup of the DPM database, put it somewhere (like a flash drive, external HDD or another server on the domain), remove the first server from the domain, set up your second server (same hostname, same IP address) and use your new DPM install to restore the DPM database. For this to actually work, the 2nd server has to be running the SAME version of the program (2007, 2010, 2012, 2012 SP1, etc.) and have the exact SAME hotfixes installed. Because of this, it's impossible to migrate the DPM database from DPM 2010 on one server into DPM 2012 on another server. (FWIW, the list of build numbers for DPM can be found here.)
After realizing that I needed to uninstall DPM 2012 and install 2010, I then realized another caveat of Server Core: there's no Control Panel and thus no Add/Remove Programs. Yes, there is the Remove-WindowsFeature cmdlet in PowerShell, but DPM (and its accompanying copy of SQL Server) isn't a role; they're separate programs. The gist of uninstalling programs without Add/Remove Programs is opening Regedit.exe, finding the program, finding the uninstall string in that program's registry keys, and then running that uninstall string as a command in an elevated PowerShell session.
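The registry hunt can also be done from PowerShell itself instead of clicking through Regedit. This is a minimal sketch, not what I actually ran at the time. The two registry paths are the standard Windows uninstall locations, but the '*Data Protection Manager*' name filter is an assumption, so loosen it if nothing comes back.

```powershell
# Standard registry locations for uninstall entries (64-bit and 32-bit programs).
$uninstallKeys = @(
    'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*',
    'HKLM:\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*'
)

# List matching programs with their uninstall strings.
# The display-name pattern is an assumption; adjust it for your install.
Get-ItemProperty -Path $uninstallKeys -ErrorAction SilentlyContinue |
    Where-Object { $_.DisplayName -like '*Data Protection Manager*' } |
    Select-Object DisplayName, DisplayVersion, UninstallString |
    Format-List
```

From there, the UninstallString value is what you copy and run from the elevated prompt. The same filter with '*SQL Server*' should turn up the bundled SQL instance's entries.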
TESTING
This is what I'd learned over the course of 2 days, Monday and Tuesday. On Wednesday, my boss asked me to have this done by Friday to meet a deadline of finalizing that migration I'd mentioned earlier. I felt an odd mix of LMAO and thinking that if nothing went wrong in the testing (the testing that I hadn't actually started yet), then it could be done. I really wanted to do this well and do it right since this was the first project of this caliber I'd been given.
As a just-in-case, my boss asked me to open a support case with Microsoft so I'd have someone on the Bat-phone ready to help if I needed it. After it was all said and done, Prateek C. was very helpful and professional.
In the actual testing (making a backup of a DPM database, decommissioning the DPM server and then using the second server to restore that same DPM database backup), I learned that it was more complex than the TechNet article led me to believe. Assuming you've already moved the DPMDB.bak file onto your new DPM server, the PowerShell command to restore the database is
DpmSync -RestoreDb -DbLoc (DPMDB location)
The problem is that you'll likely end up with an error about not having access. In order to restore the database, I had to use SQL 2008 Management Studio, detach DPMDB, remove DPMDB and then restore using the DPMDB.bak file we took from the first server.
That actually works. When you open up the Administrator Console, you'll see your old protection groups and all of their protected members. (If it doesn't work and the DPM Administrator Console can't access the DPMDB, the console will still open but only show the top menu bar and the action pane on the right with no available options. The console will also freeze when you try to shut it down.)
However, each member will likely complain about "Disk Missing" under Status. When you go to the Management section and click the Disks tab, you'll see the same disk you had on the old server but it too will say that the disk is missing. This is because that disk isn't on the new server you set up; it's still on the old one. In order to remove the missing disk, you have to stop protection on the members that were using that disk. When you stop protection on all of the members of a Protection Group, it deletes the Protection Group. Once you've done that, you have to create a new Protection Group and add the protected members back into it. In testing, I was only using one protection group and one test server. In production, we have over 500 protected members and about 30 Protection Groups.
If you're following along so far, what you've read is that TechNet's documentation for moving DPM to a new server boils down to moving the DPMDB database over to a new server running the same build number of DPM. It doesn't bring over any of your replica or recovery point data, so don't be surprised when the DPMDB file for your 20TB of backup data is actually 2GB. (FWIW, you CAN move your replicas over but you have to do it one at a time and the previously created recovery point data will still live on the old server.) Also, when you do move the DPMDB database over to the new DPM server as discussed above, you still have to remove the old "missing" disk and recreate all of your Protection Groups.
The good part about this method is that as long as the IP address and the hostname are the same on the new DPM server as they were on the prior one and you're using the same DPM version, the protected hosts won't know the difference, so you *shouldn't* have to re-install the Protection Agents on each server you're backing up. However, you're still basically setting up everything in DPM by hand (so.... many.... clicks....).
At this point, you're probably asking yourself why even go through the hassle of moving everything over if you have to reconfigure most of the program as well? I asked myself the same thing.
THE NEW PLAN
The plan I ended up coming up with was actually an idea I'd mentioned sarcastically weeks prior. My sarcastic idea was to set up the new server with the latest version of DPM (System Center Data Protection Manager 2012 R2 for those wondering), write down what's being backed up and in what protection group, write down the configuration for the Protection Groups (retention range, when to sync, recovery point schedules, etc.), mass uninstall the DPM 2010 Protection Agent, mass install the new DPM 2012 R2 Protection Agent and then configure the Protection Groups on the new server. This way, you get the latest and greatest MS backup server and you don't have to deal with the caveats of actually migrating the data to a new server. This basically became my actual plan.
When you stop protection of a member in a protection group, you'll have the option of keeping or deleting the backed up data. If you keep the data, the member will have a status of "Inactive Replica Available". This means that even though this DPM server isn't backing up this data source any longer, you can still recover files from that replica.
My plan was to set up a new server, but instead of killing all of the protection groups, uninstalling the old protection agents, installing the new protection agents and recreating the protection groups on the new server all at once, I'd do it one server or one protection group at a time.
For example, our file server backup would be something I'd move over Friday right before leaving since it's the largest single member of any protection group we have (3TB+). The only time that data isn't being backed up is the time between stopping protection of that member on the old server and when the replica has been made on the new server. Any files changed or deleted in that time won't be backed up by the new server. That window of vulnerability will be the largest for our file server. That's why it'd be something to do over the weekend (while the changes to files on the file server are at a minimum). All of the other protected members should go relatively quickly.
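To put a rough number on that window, here's a back-of-the-envelope sketch. The function name is made up for illustration and the 100 MB/s sustained throughput is purely an assumption, not something I measured. Plug in whatever your network and disks actually manage.

```powershell
# Hypothetical helper: estimate how long an initial replica takes to create.
function Get-ReplicaHours {
    param(
        [double]$SizeTB,          # size of the data source in TB
        [double]$ThroughputMBps   # assumed sustained throughput in MB/s
    )
    $sizeMB  = $SizeTB * 1024 * 1024      # TB -> MB
    $seconds = $sizeMB / $ThroughputMBps  # transfer time in seconds
    [math]::Round($seconds / 3600, 1)     # hours, rounded to one decimal
}

# A 3TB file server at an assumed 100 MB/s:
Get-ReplicaHours -SizeTB 3 -ThroughputMBps 100   # 8.7
```

Even under that fairly optimistic assumption, the file server is unprotected for most of a working day, which is why it's a Friday-evening job rather than a Tuesday-morning one.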
Another thing to consider in this plan is Disk and CPU bottlenecks. For example, while DPM was creating replicas for 5 large SQL databases and 1 large system volume, the Disk and CPU graphs in ResMon (Resource Monitor) were mostly maxed out.
It's important to keep the DPM server working hard making new replicas but you don't want to overload it. Yes, it'll all get done eventually but the longer it takes creating the replicas, the more time that's going by without actually backing up the to-be-protected source or computer.
THE CONCLUSION
It actually all went over pretty well. I had installed DPM 2012 but there was an issue with DPM 2012 running on Server 2012. DPM seems to work great until you try to recover a file. When you click the "Recovery" tab, DPM crashes. After consulting Google, I found that others had experienced this issue as well. It was fixed in DPM 2012 SP1 so we upgraded to that and it's been running fine.
I spent most of last week stopping protection on the old server, removing the DPM 2010 protection agent, installing the DPM 2012 SP1 protection agent and reconfiguring the protection group. Before stopping protection on a data source, I used the "Modify Protection Group" wizard to view things such as retention range, how often to sync and create recovery points, when to initiate an Express Full Backup, etc. I'd note down everything I needed to know when recreating the same protection group on the new server.
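The "which data source lives in which protection group" part of that note-taking could probably be scripted rather than copied out of the wizard. This is a hedged sketch, not what I actually did. It assumes the DPM 2010 Management Shell cmdlets Get-ProtectionGroup and Get-Datasource, and the property names (FriendlyName, ProductionServerName, Name) are my best recollection, so pipe an object to Get-Member to confirm them first. The server name and output path are placeholders.

```powershell
# Run from the DPM Management Shell on the old DPM 2010 server.
# 'DPM01' and the CSV path are hypothetical; substitute your own.
$dpmServer = 'DPM01'
$outFile   = 'C:\Temp\dpm-inventory.csv'

# One row per protected data source, tagged with its protection group.
# Property names are assumptions; verify with Get-Member before relying on them.
Get-ProtectionGroup -DPMServerName $dpmServer | ForEach-Object {
    $pg = $_
    Get-Datasource -ProtectionGroup $pg |
        Select-Object @{ Name = 'ProtectionGroup'; Expression = { $pg.FriendlyName } },
                      ProductionServerName,
                      Name
} | Export-Csv -Path $outFile -NoTypeInformation
```

That only captures membership. Retention range and the sync, recovery point and Express Full schedules would still need to be noted from the wizard as described above. On DPM 2012 and later, the cmdlets carry a DPM prefix (Get-DPMProtectionGroup, Get-DPMDatasource).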
After that, all that was left to do was add it to Cacti, write some documentation on what I did and why (so the next guy that tackles this isn't going in completely blind) and raise a glass. If I had more time and spent most of that additional time on Google, I might have resolved the inability to actually migrate the data over. However, given the time constraints, simply building a new server was the best and easiest option for us.
Hopefully this will help someone who's looking at tackling the same task. This was the first project I've handled of this caliber so I'm very proud of how smoothly it seemed to go and I learned a lot about DPM, making servers and test servers in our environment, as well as just testing as a whole. My boss and his boss are also happy with the job I did so I can't ask for more.