SQL Server Statistics
I love ’em. They are near and dear to my heart. They are so very important to the query optimizer, and when they ain’t happy ain’t nobody happy. I discovered stats early on in my career, via Inside Microsoft SQL Server 2000 by Kalen Delaney (which was my bible). What some people don’t realize is that when SQL Server is auto updating stats it’s generally while a query is running, where the optimizer realizes the rows have changed sufficiently to invalidate the stats and needs to rebuild them and does so as part of that query execution; as such, the implementation needs to be relatively fast so that query execution isn’t necessarily taking 1000% times longer than it should. With the db option to auto update statistics asynchronously the default behavior changes so the triggering query keeps on with the original stats but a separate thread is fired to update the stats; in this case the same sampling logic is still used, it’s just not affecting that one query. Well, with great speed comes smaller and smaller amounts of data being sampled: the sampling ratio is generally inverse to the size of the data, so the larger your data the fewer rows are sampled; with larger data sets it’s easy to see sampling ratios of 0.001%. One thousandth of a percent. That’s not a realistic number for the optimizer to use. Continue reading