Overview

We recently performed a major version upgrade of our CRM product, including upgrading to SQL 2016 (see my working environment for details on our environment, and this post for part of our CRM upgrade shenanigans).  As part of Thomas LaRock’s guidance we planned on updating all statistics.  For our CRM environment we have a large amount of data and not a lot of time for the upgrade and post-upgrade maintenance, so I needed to make it as optimal as possible.  Enter parallel updating of statistics, new to SQL 2016!  (Ok, and backported to SQL 2014 SP1CU6).

Parallel updating of statistics allows us to not only have stats done in parallel but perhaps to take advantage of the merry-go-round (aka “Advanced Scan”) in Enterprise Edition?  It’s a good theory, but unfortunately as this link and this link show, difficult to demonstrate.  My results don’t prove anything but I like to think that it was a good thing.

The challenge with parallel operations in T-SQL is that … well … T-SQL commands are single-threaded, so you can’t easily spawn multiple threads to perform different actions asynchronously/in parallel (I’m working on a way to do this though).  In the case of post-upgrade work where we were already in an outage and didn’t need to have a repeatable process, what I did was good enough.  Since all of the statistics were to be done, I created a control table to hold the records, collected the names and table/index names all of the statistics from all databases, and built a small engine to process the records.

Collect the statistics

To collect the statistics I used the basic queries from my custom stats updater (which uses sys.dm_db_stats_properties) to create a system procedure that I called “sp_MultiThreadUpdateStats_PrepWork”, and just ran it in every database:

EXEC sys.sp_MSforeachdb
 'EXEC (''USE [?]; EXEC sp_MultiThreadUpdateStats_PrepWork'')'

The proc inserted the metadata into a table “StatsToUpdate” in our DBA database “SQLAdmin”.  The basic process is to have a proc which can:

  • Iterate over all of the non-updated statistics in the table;
  • Set a value in the control table to show that it is handling that statistic, as well as the START datetime and the SPID it’s running under;
  • Perform the stats update; and
  • Set the END datetime for that record

Then comes the fun engine part – how to control concurrency in the engine, or, as I like to call it, my Case for Cursors.

Drive the Engine

I needed to create a process to run the stats updater in parallel, but not to rely on T-SQL, so I used the SQLQueryStress tool.  With a proc as the engine, I just need to set the “Number of Threads” value to a realistic number.  I want to be able to allow parallelism to happen if the stats updating needs it, without being starved for processor resources, but also not to overload the memory grants and prevent the sessions from actually running.   So the target was the number of processors on the machine divided by the MAXDOP setting, -1 (with a minimum of half the number of processors).  On prod CRM, for instance, with 32 processors and MAXDOP set to 4, I set it to 7 threads to allow 4 CPUs to do other work.

The query in SQLQueryStress to drive the engine was basic:

	EXEC sys.sp_MSforeachdb
	'IF EXISTS ( SELECT  *
			FROM    SQLAdmin.dbo.StatsToUpdate AS stu
			WHERE   stu.DBName = ''?''
					AND stu.UpdatingBySPID IS NULL)
	BEGIN
		EXEC (''USE [?]; EXEC sp_MultiThreadUpdateStats_MultiThreaded'')
	END'

While there are any stats un-updated, keep updating stats.  Since the control table has the database name, and we need to update all stats, go through all records.

The procedure

Basically the procedure needed to iterate through all stats which had not been started, and start updating it.  To provide concurrency, I used a cursor with the following hints:

	SELECT  stu.StatsToUpdateID
		  , QUOTENAME(stu.SchemaName)
		  , QUOTENAME(stu.TableName)
		  , QUOTENAME(stu.name)
	FROM    SQLAdmin.dbo.StatsToUpdate AS stu WITH ( READPAST, XLOCK )
	WHERE   stu.UpdatingBySPID IS NULL
			AND stu.DBName = DB_NAME()
	FOR UPDATE

The breakdown:

  • Local cursor – local to the session, does not need to be explicitly closed and deallocated
  • Dynamic cursor – instead of a Keyset cursor, rerun the query on each fetch.  This way in case the underlying data changes so the next row no longer complies with the query, the next fetch will not get that row.
  • SCROLL_LOCKS cursor hint– Specifies that positioned updates or deletes made through the cursor are guaranteed to succeed. SQL Server locks the rows as they are read into the cursor to ensure their availability for later modifications.
  • WITH (READPAST) table hint – Specifies that the Database Engine not read rows that are locked by other transactions.
  • WITH (XLOCK)  table hint – Specifies that exclusive locks are to be taken and held until the transaction completes.
  • FOR UPDATE cursor hint – Allow updates through the cursor

The combination of (Dynamic cursor, SCROLL_LOCKS, READPAST, and XLOCK) allows multiple sessions to pick from the bin of work to be done, without fear of stepping on each others’ toes.  Without the READPAST hint, for example, each one coming in will be blocked trying to read the record held by the XLOCK.  Without the XLOCK, every session will get every record and try to process them.  The Local cursor is simply a lazy man’s preference, and the FOR UPDATE is so we can update the status after the UPDATE STATISTICS finishes without having to go back to determine which record we were on.  The rest of the engine is fairly straightforward – set the statuses, build and execute the UPDATE STATISTICS string, and set the end time status:

OPEN StatsCur

WHILE 1 = 1
    BEGIN
        FETCH StatsCur
        INTO
            @RowNum
          , @SchemaName
          , @TableName
          , @StatsName
        IF @@FETCH_STATUS <> 0
            BREAK

        BEGIN TRY

            UPDATE  SQLAdmin.dbo.StatsToUpdate
            SET
                    UpdatingBySPID = @@SPID
                  , UpdatingStartTime = GETDATE()
            WHERE CURRENT OF StatsCur

            SELECT  @SQL = 'UPDATE STATISTICS ' + @SchemaName + '.' + @TableName + ' (' + @StatsName + ') WITH FULLSCAN;'

            EXEC ( @SQL )

            UPDATE  SQLAdmin.dbo.StatsToUpdate
            SET     UpdatingEndTime = GETDATE()
            WHERE CURRENT OF StatsCur;

        END TRY
        BEGIN CATCH

            THROW
            CONTINUE

        END CATCH

    END

There may be cases where updating one stat will take much longer than others, this allows other threads to keep going.