As part of a major CRM upgrade we upgraded from Windows 2008R2 and SQL Server 2008R2 to Windows 2012R2 and SQL Server 2016 (woo-hoo!).  Our CRM product is a little complex (see my working environment for some details), and it’s large (over 1TB for the main database), and we needed to perform the upgrade in all of our non-prod environments (4 total for this one).  Since it’s a critical revenue-generating system, our production upgrade would need to be as fast as possible.  Due to the size of the data and the amount of systems and integrations, we were already expecting to have a multi-day outage for the upgrade and full post-upgrade testing.

To migrate a server you not only need to migrate databases, you need to migrate logins, linked servers (and their logins/passwords), logins and their passwords, server permissions, jobs and job steps, operators, notifications, job history … the list is long.  We needed a process which would reduce the amount of manual effort involved, which would also reduce the probability of errors from missing something.  We couldn’t simply perform an in-place upgrade of SQL Server because we also needed to upgrade the OS, and given the amount of customizations the CRM application does to the OS we couldn’t be sure it would work; we decided it would be much better to provide a clean field for a new installation.

We also needed a mechanism which could provide a plan to rollback to the exact original state.  In the past, for rollback purposes we’ve taken VMWare clones, which are basically copies of the entire VM (including VMDKs); this means that the entire server is copied, which not only takes up space for the second copy but also introduces multiple hours of downtime prior to the actual upgrade to be able to copy all of the data.  Our storage and VM team had a big question – how could we leverage our SAN technology to speed up this process?  What they discovered was some great magic ju-ju!

Using VMWare, we were able to take advantage of their VAAI – vSphere storage APIs Array Integration.  This set of APIs interacts with storage to perform the clones.  By ensuring that our VM guests’ VMDKs were all on the same datastore we also ensured that they were all on the same aggregate on the filer (we have a NetApp, your terminology may be different), and this allowed the storage component to simply create snapshots for the VMWare clone process.  This took under 15 minutes to complete for all three servers, instead of over 4 hours, which was great!  We had our safety net.

Next I wanted to figure out the fastest way to upgrade from SQL 2008R2 to SQL 2016.  I had tested some other cases of doing in-place upgrades and they all just worked, so I figured we could first try that.  Because these are all VMs, and all of our database files are all on VMDKs (including system databases, errorlog files, etc.; nothing on the C drive!), we could move the VMDKs from one server to another, and then do an in-place SQL upgrade because you can’t simply copy the system databases (master, msdb) from a prior version of SQL Server and have it work.  There are some gotchas here.  Our servers are built with the same layout for drives:

  • C – OS
  • D – application installs (some older servers do not have this)
  • I – Main data and SQL system database files
  • J – Database log files
  • K – TempDB files

But not all of our older CRM servers were built to this standard; specifically SQL Server would be installed on the C drive instead of the new D drive.  More on this a little later.

Basically we would have this process:

  1. Existing server is SQL1
    1. Windows 2008R2
    2. SQL 2008R2
  2. Build new server SQL2
    1. Windows 2012R2
    2. SQL 2008R2   ** This is important!  Needs to be the exact same version/build
  3. (do a bunch of other prep work, specific to our environment)
  4. Shut off SQL1
  5. Rename SQL1 to SQL1_OLD
  6. Rename SQL2 to SQL1
  7. Delete the VMDKs from SQL1
  8. Move the VMDKs from SQL1_OLD to SQL1
  9. Bring up SQL server on SQL1 (Windows 2012R2)

From here SQL Server would not necessarily work; the data directories on the drives were “I:\MSSQL10_50.MSSQLSERVER” but the startup parameters were using our new build of “I:\SQLServer\MSSQL10_50.MSSQLSERVER”, so I simply moved the directory to the expected location.    With this we need to start SQL Server without recovering any databases, then change the locations of those databases in master. Unlike what some web sites claim, SQL does not need to be started in single-user mode to do this.

      1. Move directory “I:\MSSQL10_50.MSSQLSERVER” to “I:\SQLServer\MSSQL10_50.MSSQLSERVER
        1. The startup parameters are already looking for master in “I:\SQLServer\MSSQL10_50.MSSQLSERVER”
      2. Start SQL in master-only recovery mode (Trace flag 3608)
        1. Can use NET START MSSQLSERVER /T3608 from a command window
      3. Connect to SQL Server using SSMS and move databases. This changes the catalog so the next time SQL tries to start them up it will look in the correct location.
        alter database model modify file ( name = modeldev, filename = 'I:\SQLServer\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\model.mdf')
        alter database model modify file ( name = modellog, filename = 'I:\SQLServer\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\modellog.ldf')
        alter database MSDB modify file ( name = MSDBData, filename = 'I:\SQLServer\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\MSDBData.mdf')
        alter database MSDB modify file ( name = MSDBLog, filename = 'I:\SQLServer\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\MSDBLog.ldf')
      4. If SQL Server thinks the MSSQLSYSTEMRESOURCE database is on C, it’s a trick to fix it, since the resource database technically can’t be moved; SQL Server expects it to be where the binaries are located. When starting up with trace flag 3608 the resource database is NOT started (just master), so this will only show as an issue after moving the system databases and starting SQL in normal mode, and to fix requires 2 SQL restarts.
        1. In the errorlog you’ll see:
          Starting up database 'mssqlsystemresource'.
          Error: 17204, Severity: 16, State: 1.
          FCB::Open failed: Could not open file C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\mssqlsystemresource.mdf for file number 1.  OS error: 3(The system cannot find the path specified.).
          The resource database has been detected in two different locations. Attaching the resource database in the same directory as sqlservr.exe at 'D:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\sqlservr.exe' instead of the currently attached resource database at 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\mssqlsystemresource.mdf'.
          Starting up database 'mssqlsystemresource'.
          The resource database build version is 10.50.6000. This is an informational message only. No user action is required.
        2. Copy folder “D:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER” to “C:\Program Files\Microsoft SQL Server\”
        3. Start SQL Server, it will find the resource database but in a location we don’t want it to use. Stop SQL Server.
        4. Delete folder “C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER”
        5. Start SQL Server. The errorlog should show:
          The resource database has been detected in two different locations. Attaching the resource database in the same directory as sqlservr.exe at 'D:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\sqlservr.exe' instead of the currently attached resource database at 'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn\mssqlsystemresource.mdf'.
          Starting up database 'mssqlsystemresource'.
          The resource database build version is 10.50.6000. This is an informational message only. No user action is required.
          
        6. Start SQL Server, validate works as expected
      5. SQL Agent has subsystems which use binaries (SSIS, PowerShell, replication, etc.), and if you’re changing the install directory like this you also need to change those.  The easiest way is to delete all of the entries using C and run “exec msdb.dbo.sp_verify_subsystems 1” (http://support.microsoft.com/kb/914171/en-us).

DELETE  dbo.syssubsystems
WHERE   subsystem_dll LIKE 'C:\%'
EXEC msdb.dbo.sp_verify_subsystems 1

    From here I just performed an in-place upgrade from SQL 2008R2 to SQL 2016 and everything. just. worked.  Trust me, I was surprised it all worked.