Monday, 22 December 2014

TFS Backup/Restore – Lessons Learnt

I was upgrading a TFS instance from 2012.2 to 2013.4 today and ran into some unexpected situations while backing up and restoring the databases. I thought it was worth sharing the experience, since anyone else doing an upgrade could easily make the same mistakes.

Lesson 1
I had done a trial upgrade before today’s live upgrade, and for it I used the backup schedule in the TFS admin console to trigger a scheduled full backup of TFS manually. This was a hardware migration upgrade, and it worked fine in the trial. Today, when I started the actual upgrade, I took TFS offline by executing the “TFSServiceControl quiesce” and “net stop WAS” commands. After that, as in the trial, I started the scheduled backup manually from the TFS admin console.
After about 30 minutes nothing had happened: no backups had been created in the specified path. That is when my colleagues and I realized this could be because we had taken the TFS services offline. (Taking TFS offline was necessary to make sure no team members from our three global offices accessed the current TFS while the upgrade was going on, so that we could be sure no data would be missing from the new production TFS.)
We started the TFS services again using “TFSServiceControl unquiesce” and “net start WAS”, followed by an “iisreset”. Backups started being created right away. So it is confirmed that the TFS services, especially the Job Agent service, must be running for scheduled backups from the admin console to work. Since we still wanted TFS offline, we took it offline again and took manual DB backups instead, including a backup of the Reporting Services key. The list of items we backed up is below for reference. (The bad thing was that we lost about one hour of the planned schedule.)
All Collection DBs
Tfs_Analysis DB
Tfs_Warehouse DB
ReportServer DB
ReportServerTempDB
Reporting Services key
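
For anyone who has to fall back to manual backups the same way, here is a minimal sketch (not the exact script we ran) that generates T-SQL full-backup statements for the relational databases in the list above. The collection DB name, the backup folder and the WITH options are assumptions for illustration. Tfs_Analysis is left out on the assumption that it lives in Analysis Services and so is not backed up with BACKUP DATABASE, and the Reporting Services key needs its own export step.

```python
from datetime import date

# Relational databases from the list above. "Tfs_Collection1" is a
# hypothetical placeholder; substitute your real collection DB names.
# Tfs_Analysis is assumed to be an Analysis Services database and is skipped.
DATABASES = [
    "Tfs_Collection1",
    "Tfs_Warehouse",
    "ReportServer",
    "ReportServerTempDB",
]


def backup_statement(db_name, backup_dir, day):
    """Return a T-SQL full-backup statement for one database."""
    path = f"{backup_dir}\\{db_name}_{day:%Y%m%d}.bak"
    # Escape the bracketed identifier and the quoted path literal.
    safe_name = db_name.replace("]", "]]")
    safe_path = path.replace("'", "''")
    return (
        f"BACKUP DATABASE [{safe_name}] "
        f"TO DISK = N'{safe_path}' "
        "WITH INIT, CHECKSUM, STATS = 10;"
    )


def backup_script(databases, backup_dir, day):
    """Join one backup statement per database into a single script."""
    return "\n".join(backup_statement(db, backup_dir, day) for db in databases)


if __name__ == "__main__":
    # The backup folder is an assumed example path.
    print(backup_script(DATABASES, r"D:\TfsBackups", date(2014, 12, 22)))
```

The generated script can be reviewed and then run from SQL Server Management Studio or sqlcmd while TFS is quiesced, since plain SQL backups do not depend on the TFS Job Agent.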

Lesson 2
The second one was a bit of an unfortunate situation. While we were taking down the build servers and so on, we saw a Windows Update notification. We did not think it would affect our work, since we had seen nothing on the intermediate machine we were using for the upgrade. We took backups of all the TFS DBs and started restoring them on an intermediate server, to run the upgrade there before moving the databases to the new production hardware. The largest of the six collection DBs was 182 GB. Restoring it on SQL Server 2008 R2 SP1 (simulating the current production TFS environment) took around two hours to reach 80% completion, and then the restore suddenly crashed. The error message said the SQL service could not be found. When we checked, we found that SQL Server was down: a Windows update had shut down the SQL service while the restore was in progress. We disabled Windows updates and restarted the DB restores. This cost us two additional hours.
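
A pre-flight check along these lines might have saved us those two hours. This is only a sketch, assuming an English-language Windows where “sc query” prints a “STATE : 4 RUNNING” style line, the Windows Update service is named wuauserv, and SQL Server runs as the default instance service MSSQLSERVER. The helper names are hypothetical. A stopped update service is a hint rather than a guarantee, so disabling updates for the duration of the restore is still the safer option.

```python
import re
import subprocess


def service_state(sc_output):
    """Extract the state name (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def query_service(name):
    """Run `sc query <name>`; return the state, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["sc", "query", name], capture_output=True, text=True
        )
    except OSError:
        # The sc tool is not available (for example, not running on Windows).
        return None
    return service_state(result.stdout)


def safe_to_restore(update_state, sql_state):
    """Only start a long restore if updates are stopped and SQL Server is up."""
    return update_state == "STOPPED" and sql_state == "RUNNING"


if __name__ == "__main__":
    # Service names are assumptions; named SQL instances use MSSQL$<instance>.
    update_state = query_service("wuauserv")
    sql_state = query_service("MSSQLSERVER")
    if safe_to_restore(update_state, sql_state):
        print("OK to start the restore.")
    else:
        print(f"Hold on: wuauserv={update_state}, MSSQLSERVER={sql_state}")
```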

1 comment:

Sachith Abhayaratne said...

Thanks for Sharing ! (Y)