Tuesday, April 12, 2016

Cardinality Estimator - Part 2- Out of Date Statistics

Continuing from our series on Cardinality Estimator 2014, Let us look at the first difference between 2 CE models. The way estimates are made when the statistics are out of date or stale is the first difference we will be exploring. Couple of prerequisites for this demo

  1. Download SQL Server 2012 Adventureworks database from here
  2. Restore the downloaded database into a 2014 instance and switch the compatibility level to 120 ( SQL 2014 )  

Let us check the row count of the table [Production].[TransactionHistory]

EXEC sp_spaceused '[Production].[TransactionHistory]'
GO


Contains 113,443 rows. Lets insert 10,000 ( little less than 10%) rows so that it doesn't cause the statistic to be automatically updated and statistic remains out of date. Note that 20% of table needs to change for statistics to be updated automatically.

The plan is to insert rows, run queries against both the Cardinality Estimator models and compare the estimates.



Step 1: Insert 10,000 rows.

Insert into [Production].[TransactionHistory]([ProductID]
      ,[ReferenceOrderID]
      ,[ReferenceOrderLineID]
      ,[TransactionDate]
      ,[TransactionType]
      ,[Quantity]
      ,[ActualCost]
      ,[ModifiedDate])
SELECT Top 10000
       [ProductID]
      ,[ReferenceOrderID]
      ,[ReferenceOrderLineID]
      ,[TransactionDate]
      ,[TransactionType]
      ,[Quantity]
      ,[ActualCost]
      ,[ModifiedDate]
  FROM [Production].[TransactionHistory]
  WHERE [ModifiedDate] BETWEEN '20070101' and '20080101'

 
Step 2: Run query against both the models


  SELECT *
  FROM [Production].[TransactionHistory]  WHERE ProductID = 714  OPTION (QUERYTRACEON 9481)



  SELECT *
  FROM [Production].[TransactionHistory]
  WHERE ProductID = 714



Note that the hint "OPTION (QUERYTRACEON 9481)" will force the query to use old CE 70 in SQL Server 2014.Observe the estimates for both.




CE 70:





CE 120:




They are the same. Let us observer the histogram below.




Notice that the histogram contains value "714" as a range hi key and it estimates 645 rows. Estimate on the query plan is slightly higher. As SQL Server knows that approximately 10% of the rows have changed, it makes an adjustment ( 645 + ~10% of 645 ) on the both the Cardinality Estimator models.


Step 3 :


Now let us observe for another value


  SELECT *
  FROM [Production].[TransactionHistory]   WHERE ProductID = 746  OPTION (QUERYTRACEON 9481)



  SELECT *
  FROM [Production].[TransactionHistory]
  WHERE ProductID = 746


Let us compare the estimates now


CE 70:



CE 120:




Now we see a difference in the estimates. Let us check the histogram for the value.




The Value doesn't exist as a Range Hi Key and hence "Avg Range Rows" is used for the estimate. CE 70 uses exactly the value estimated by the histogram but CE 120 (new CE) makes the adjustment to the estimate from Histogram, even if the estimate is a approximate guess via Average Range rows. So, CE 70 estimates the value to be 206 as shown above, while CE 120 estimates it to be slightly higher, at 224.15


So, the takeaway from the post is Cardinality Estimator 120 will adjust the estimate obtained from histogram when statistics are out of date and even when the value has to be guessed via "Avg Range rows".




Sunday, April 3, 2016

SQL Server 2014 - Cardinality Estimator - Part 1

It has been a while since I blogged. Just moved to few other mediums like speaking at user group and SQL Saturday events in the last few months and it has kept me away from blogging for a while. Now back to blogging again. Hope to do a better balancing act :)

Starting from where I left, which was on Statistics and Histograms, I now move on to Cardinality Estimator, one of the major changes of SQL Server 2014

What is Cardinality Estimator?

Cardinality Estimator is responsible for providing the estimates on number of rows returned for each operator. If you have observed a query plan before, the estimated row count is provided by the cardinality estimator. What happens behind the scene is

  1. Query optimizer instructs cardinality estimator (CE) to find the estimated rows for each operation ( read a table, filter, join etc )
  2.  Cardinality estimator checks statistics and histogram to find the number of rows returned
  3.  Query optimizer based on the inputs given by CE, picks the right operator and subsequently the right plan

Obviously, CE plays a key role in getting right plan. Incorrect estimates would lead to poor plan and spiral out into poor performance.

CE has remained unchanged since SQL Server 7.0 and SQL Server 2014 brought about changes on the way estimates are made and on the algorithm used in making estimates. 

Run the following query with actual plan ( Query ->Include Actual Execution Plan ) on "Adventureworks2012". 

SELECT * FROM 
[Production].[TransactionHistory]
WHERE ProductID < 100

Right click on the select operator and click on properties as shown below


Observe the property "CardinalityEstimatonModelVersion". Value equal to 70 indicates it is using Old CE.




Query uses old CE as the compatibility level of "Adventureworks2012" database is 110 (SQL 2012). Switch the compatibility level to 2014 or higher,
rerun the query and you will notice the "CardinalityEstimatorModelVersion" to be 120, which indicates it uses the new cardinality estimator.

In a series of posts ( expect around 10 ), I would be covering the various changes in new cardinality estimator.