Wednesday, January 13, 2016

SQL Server Statistics - Histograms

Most of us are aware of Statistics in SQL Server and its purpose. But still, just to get started, statistics store information on the distribution of data in a column(s), which helps the query optimizer pick the right query plans. Effectively, it helps to answer questions like "how many rows with value 25 on column "Age" or how many more than 15 etc..

This post will get little in depth on how statistics are actually stored in the database. Statistics are actually stored as "histograms" ( https://en.wikipedia.org/wiki/Histogram). One may take a look at histogram used in statistics by expanding a commonly queried table in your database, expand statistics -> properties -> Click on "Details" tab.

 
Interesting part of the image above is the histogram steps. One would notice that histogram steps contain entries for selected values in the column. Basically what histogram steps intend to convey is the number of rows that exist for a particular or range of values.
For example, for a column containing
marks, it attempts to say
 
0 - 10   there are 15 rows
10 - 30 - there are 45 rows
50 - 80 - there are 60 rows
80 - 100 - there are 20 rows and so on..
 
Observe RANGE_HI_KEY column. It indicates the highest value in the range. Each row represents the range starting from the ending of previous row ( Previous row RANGE HI KEY value )  to the value in RANGE_HIGH_KEY. For example the second row "" indicates distribution between x to y
 
  1. RANGE_ROWS Column indicate the total number of rows in the range ( Previous hi key to current hi key )
  2. DISTINCT_RANGE_ROWS indicate how many among them are distinct
  3. AVG_RANGE_ROWS indicate the average row returned per distinct value
  4. EQ_ROWS indicate total the number of rows for the range hi key value alone.
 
The most important part is number of steps in the histogram are fixed to 200. Meaning the histogram will have maximum of 200 entries / rows irrespective of number of rows the table. Even if there are 1 million rows, 200 entries will describe the column's data distribution.
So, why this post on statistics? :) Armed with this knowledge, let us dive into deeper concepts like cardinality estimator and how it has changed in SQL 2014 in upcoming posts.
 


12 comments:

  1. Thanks For sharing an informative blog keep rocking bring more details...At Sarkari result adda, you can check the updated RRB Recruitment 2020 notifications for both fresher and experienced candidates under various departments and then you can apply for the post which is suitable to your qualification.

    ReplyDelete
  2. I have been reading for the past two days about your blogs and topics, still on fetching! Wondering about your words on each line was massively effective. Techno-based information has been fetched in each of your topics. Sure it will enhance and fill the queries of the public needs. Feeling so glad about your article. Thanks…! keep it up
    Ai & Artificial Intelligence Course in Chennai
    PHP Training in Chennai
    Ethical Hacking Course in Chennai Blue Prism Training in Chennai
    UiPath Training in Chennai

    ReplyDelete
  3. You comprehend your errands stand separated of the gathering. there may be something uncommon almost them. It appears to me all of them are in truth amazing!File Viewer Plus Activation Key Free

    ReplyDelete

  4. Here we will be making your work a lot easier by our perfect anniversary wishes for the wife that will make your bond even more special and stronger. Rekindle your feelings and love with these pretty words and wedding anniversary wishes for wife. .Anniversary Words For Wife

    ReplyDelete
  5. Hi, I wish to be a regular contributor of your blog. I have read your blog. Your information is really useful for us.
    Oracle Cloud infrastructure Training
    SAP SCM Training
    ASP .NET Training

    ReplyDelete
  6. Thanks for sharing such an useful and helpful post.
    SQL course in Pune

    ReplyDelete