Sunday, November 22, 2009

Covering Index - Include NonKey columns


To help my impatient readers reaching here thro google search, let me come to the point right away.

What is Include Non Key columns in SQL 2005?

Include Non Key Columns allows one to add columns only to the leaf of a non clustered index. Include Non Key columns option is applicable only for Non Clustered indexes.

Whats the use?

* SQL Server has a restriction that the Key columns of an index ( the columns that are present in the non leaf levels of an index) can have a maximum size
of 900 bytes. Also, Number of columns within an index cant exceed 16. Non key
columns are not counted for these restrictions. So if you are likely to execeed
900 bytes Include non key columns.
* Including Non key columns increases your chances of covering your query and hence boosting performance.

How to include non key columns?

SSMS->User Table->Indexes->Index properties->Included Columns.
Refer Screenshot below.


/*****************************************************************************************/

Now for my usual lengthy write up.

Covered indexes have a few drawbacks as stated above. 'Include Non Key columns' is wonderful especially in the following scenario.


DECLARE  @dt DATETIME


SET @dt = Getdate()


SELECT supplier_xid,

       customer_xid,

       product_xid,

       trans_city,

       trans_date,

       full_payment_receieved,

       payment_mode

FROM   dbo.supplier_product_customer

WHERE  trans_city = 'Moscow'

       AND payment_mode = 'Amex'


SELECT Datediff(ms,@dt,Getdate())


Supplier_xid,Customer_xid,Product_xid,Trans_city,Trans_date Columns from the Primary key.Trans_city column has a nonclustered index.The stats obtained by running the query are provided below.The table contains 112155 rows.

Time taken : 343 ms

IO stats : Table 'Supplier_Product_Customer'. Scan count 1, logical
reads 402, physical reads 3, read-ahead reads 3, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Overall query cost : 0.32

Query Plan is shown below:


It shows Key lookup which means that the query engine refered to Non Clustered index
key and jumped to the table to get some data.This doesnt come as a surprise as the Clustered index and Non clustered index doesnt contain the columns full_payment_receieved, Payment_mode columns, as they are not covered by the Non clustered index.
So ,How can we improve it?
One way is to add these two columns to the Nonclustered index on the trans_city column. Thats not a great idea because

* Already there are 5 key columns that are part of the index ( 4 Cluster
+ 1 Non cluster ). Adding two more would make the index's non leaf
pages huge.

* Payment_mode,full_payment_receieved dont have great selectivity. In
other words,Payment_mode column can have a maximum of 6 disitnct
values. ie.Amex / master/cheque /Cash/Visa/Nets. So its not going to
filter out most of the rows.Trans_city column which has reasonable
selectivity ( greater filtering ability/column not having many
duplicate values ) filters most of the rows and only the rest are
filtered by Payment_mode column(filters 1/6th). This makes Payment_mode
a bad candidate for indexing.'full_payment_receieved' has only Yes/No
values and is not part of the filtering criteria itself, so it
doesnt make great sense to store them at non leaf levels of
the index.

Just to stress a basic point in indexes, its the non leaf levels ( higher/upper
levels of the B+ tree ) which direct the way to search for the required row. More
selective ( filering out more rows) the columns in non leaf, shorter the distance to reach the required row(s) and hence better the performance.

So whats the next option? Including these two columns at leaf level alone using
SQL 2k5's Include non key columns.


CREATE NONCLUSTERED INDEX [NCIX_Supplier_Product_Customer_Trans_City] ON [dbo].[Supplier_Product_Customer] (

      [Trans_city] ASC)

INCLUDE ([Payment_mode],[full_payment_receieved])

WITH (pad_index = OFF,

      statistics_norecompute = OFF,

      sort_in_tempdb = OFF,

      ignore_dup_key = OFF,

      drop_existing = OFF,

      online = OFF,

      allow_row_locks = ON,

      allow_page_locks = ON)

ON [PRIMARY]


So, now let us run the same query.


DBCC FREEPROCCACHE


DBCC DROPCLEANBUFFERS



       --To Clear the cache


SET STATISTICS  io  ON

--To capture IO stats


DECLARE  @dt DATETIME


SET @dt = Getdate()


SELECT supplier_xid,

       customer_xid,

       product_xid,

       trans_city,

       trans_date,

       full_payment_receieved,

       payment_mode

FROM   dbo.supplier_product_customer

WHERE  trans_city = 'Moscow'

       AND payment_mode = 'Amex'


SELECT Datediff(ms,@dt,Getdate())



Query plan screen shot is provided above. Query plan doesnt have any Key lookup and Shows NonClustered index seek which implies the query was fully covered with in the index.


Other stats:

Time taken :170 ms

IO stats : Table 'Supplier_Product_Customer'. Scan count 1,
logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.


Query cost :0.003

You can see the difference. 170 ms against 340 ms which is almost 50% improvement.
IO stats indicate a tremendous improvement. 5 reads against 402 reads.
Query cost has improved 100 times from 0.32 to 0.003.

So to sum up, Include Nonkey column is ideal when

* Most of the columns have been covered and 1/2 columns have been left out.

* Left out columns are not a part of filtering criteria.

* Even if the left out column is a part of filtering criteria, if it is not
a major filtering condition and has a bad selectivity factor.

* Like for any indexed column, included columns shouldnt be frequently
updated.

6 comments:

karthik yella said...

A Very Detailed explaination. Excellent Blog Post.

Nagaraj Venkatesan said...

Thank you karthik

KRishna said...

Very good article
see more articles from you.

Nagaraj Venkatesan said...

@Krishna,
Thanks. I will write more. :) :)

Melika NoKaOi said...

Exactly what I was looking for. Thanks!

Sudhir DBAKings said...

Nice post very helpful

dbakings