SQL Server Statistics
I love ’em. They are near and dear to my heart. They are so very important to the query optimizer, and when they ain’t happy, ain’t nobody happy. I discovered stats early in my career via Inside Microsoft SQL Server 2000 by Kalen Delaney (which was my bible).

What some people don’t realize is that when SQL Server auto-updates stats, it generally happens while a query is running: the optimizer realizes the rows have changed enough to invalidate the stats, so it rebuilds them as part of that query’s execution. Because of that, the update has to be relatively fast so the query doesn’t suddenly take ten times longer than it should. With the database option to auto update statistics asynchronously, that behavior changes: the triggering query carries on with the stale stats while a separate thread updates them in the background. The same sampling logic is still used; it just doesn’t hold up that one query.

The catch is that with great speed comes smaller and smaller samples. The sampling ratio is roughly inverse to the size of the data, so the larger the table, the smaller the percentage of rows sampled; with large data sets it’s easy to see sampling ratios of 0.001%. One thousandth of a percent. That’s not a realistic number for the optimizer to work with.
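If you’re curious what sampling ratios your stats actually ended up with, the sys.dm_db_stats_properties DMF exposes rows versus rows_sampled (plus the modification counter). A quick query along these lines will surface the worst offenders; the commented ALTER DATABASE line is the asynchronous option mentioned above, with a placeholder database name.

-- Optional: let auto stats updates run on a background thread instead of
-- holding up the triggering query (the sampling logic itself is unchanged).
-- ALTER DATABASE [YourDatabase] SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Effective sampling ratio for every statistic on user tables in the current database.
SELECT  OBJECT_SCHEMA_NAME(s.object_id)  AS schema_name,
        OBJECT_NAME(s.object_id)         AS table_name,
        s.name                           AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.rows_sampled,
        CAST(100.0 * sp.rows_sampled / NULLIF(sp.rows, 0) AS decimal(9, 3)) AS sample_pct,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sample_pct;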
When I first worked with a large data warehouse, it had stability issues. One of the things I found was that the stats were set to auto update and there was no process to update them manually. With 500 million records in one table we easily saw 0.001% sampling – that’s 500,000,000 * 0.00001 = 5,000 records. The reporting and DW teams would review all execution plans prior to releasing them into production, but a lot of the plans came in at a low cost (like under 20 farkles) with nested loops. One of the knobs I tweaked to improve performance was a basic process to update all of the stats in the DW once or twice per month, depending on the amount of data change. After getting all of the stats updated we revisited some of those plans – the costs were now in the thousands, with some nice hash joins. Reality had arrived!
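That monthly process isn’t reproduced here, but a bare-bones version of the “update everything with a full scan” idea looks something like this (sp_updatestats would be simpler, but it sticks with the default sampling rate, which defeats the purpose):

-- Build and run a FULLSCAN update for every statistic on every user table.
DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'UPDATE STATISTICS '
             + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id)) + N'.'
             + QUOTENAME(OBJECT_NAME(s.object_id))
             + N' ' + QUOTENAME(s.name)
             + N' WITH FULLSCAN;' + NCHAR(13) + NCHAR(10)
FROM   sys.stats AS s
WHERE  OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1;

EXEC sys.sp_executesql @sql;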
Fast forward a lot of years. I prefer to have my stats as accurate as they can be (I aim for 100% sampling), balanced against the available maintenance windows and the workload the maintenance itself puts on SQL Server. These days I work with Ola Hallengren’s maintenance scripts (see my environment info), and while he’s done great work for index maintenance, the solution has some limitations for statistics maintenance. The parameters for updating stats are basically all or nothing, with an option to only update the ones that were modified. The challenge with only doing modified stats is that there is no threshold to set: if a table has 100 million rows and one row gets updated, that statistic counts as modified, and you’re looking at scanning 100 million rows to update it.
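For reference, the stats-related knobs on Ola’s IndexOptimize look roughly like this for a stats-only run with index maintenance switched off (the parameter values here are just an example; check his documentation for the full set):

-- Stats-only pass: skip index maintenance, update all statistics,
-- but only the ones that have been modified at all (no threshold available).
EXECUTE dbo.IndexOptimize
        @Databases = 'USER_DATABASES',
        @FragmentationLow = NULL,
        @FragmentationMedium = NULL,
        @FragmentationHigh = NULL,
        @UpdateStatistics = 'ALL',        -- 'ALL' / 'INDEX' / 'COLUMNS' / NULL
        @OnlyModifiedStatistics = 'Y',    -- a single modified row qualifies
        @LogToTable = 'Y';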
So I built a process to update stats based on threshold values and to perform the updates more efficiently than simply iterating through every individual stat. It picks up stats whose modified row count is over one threshold as well as stats that were auto-updated with a sampling ratio under another. The process works with Ola’s CommandLog table, basically following the same logic and populating the fields the same way. On our largest OLTP servers (vendor-provided, with some tables that see a lot of data churn) we run this process hourly to make sure we’re not caught with our pants down.
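My actual procedure isn’t reproduced here, but the core of the selection logic looks something like the sketch below: grab the stats whose modification counter is over one threshold or whose last sampling ratio was under another, full-scan just those, and log each command to Ola’s CommandLog table. The threshold values and the simplified logging are illustrative, not what I actually run.

-- Sketch: rebuild only the stats that are heavily modified or poorly sampled.
DECLARE @ModificationThreshold bigint       = 100000;  -- rows modified since last update
DECLARE @MinSamplePercent      decimal(5,2) = 25.0;    -- lowest acceptable sampling ratio

DECLARE @Command nvarchar(max), @LogID int;

DECLARE stats_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT N'UPDATE STATISTICS '
         + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id)) + N'.'
         + QUOTENAME(OBJECT_NAME(s.object_id))
         + N' ' + QUOTENAME(s.name) + N' WITH FULLSCAN;'
    FROM   sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE  OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
      AND ( sp.modification_counter >= @ModificationThreshold
         OR 100.0 * sp.rows_sampled / NULLIF(sp.rows, 0) < @MinSamplePercent );

OPEN stats_cursor;
FETCH NEXT FROM stats_cursor INTO @Command;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Log to Ola's CommandLog table (simplified; his procedures populate more columns).
    INSERT INTO dbo.CommandLog (DatabaseName, CommandType, Command, StartTime)
    VALUES (DB_NAME(), 'UPDATE_STATISTICS', @Command, SYSDATETIME());

    SET @LogID = SCOPE_IDENTITY();

    EXEC sys.sp_executesql @Command;

    UPDATE dbo.CommandLog SET EndTime = SYSDATETIME() WHERE ID = @LogID;

    FETCH NEXT FROM stats_cursor INTO @Command;
END;

CLOSE stats_cursor;
DEALLOCATE stats_cursor;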