Sunday, November 15, 2009

Covering Index


Before we get started with this post please read the last one here.Even if you dont, just know that the leaf of a non clustered index contains Clustered index key.

So what are we upto now?
Covered indexes.

Covered indexes arguablly gets the best out of an index. When all the columns that
are required by the query ( Both Select columns - Projection and where clause columns ( selection criteria ) )are present in an index, then the index is said to be covered index.Covered index performs much better than Non Clustered index + Row look up as the query need not go( rather jump ) to the Data page to fetch the data.
All data access starts and ends within the index itself. The number of data pages in a table are always greater ( theortically greater than equal to ) than the leaf pages of an index, as data pages need to hold all the columns in the table whereas the index leaf pages hold only the indexed columns. Let us see an example..

Consider the following table


CREATE TABLE [dbo].[Supplier_Product_Customer] (

  [Supplier_xid]  [INT]    NOT NULL,

  [Customer_xid]  [INT]    NOT NULL,

  [Product_xid]   [INT]    NOT NULL,

  [trans_date]    [DATETIME]    NOT NULL,

  [Qty]           [INT]    NULL,

  [Trans_city]    [VARCHAR](100)    NULL,

  [Note]          [VARCHAR](500)    NULL,

  [Rate_per_unit] FLOAT,

  [comments]      VARCHAR(500),

  [balance_left]  INT,

  [Payment_mode]  VARCHAR(50)

  CONSTRAINT [PK_Supplier_Product_Customer] PRIMARY KEY CLUSTERED ( [Supplier_xid] ASC,[Customer_xid] ASC,[Product_xid] ASC,[trans_date] ASC ) WITH ( pad_index = OFF,statistics_norecompute = OFF,ignore_dup_key = OFF,allow_row_locks = on,allow_page_locks = on,FILLFACTOR = 90 ) ON [PRIMARY])

ON [PRIMARY]


A typical transaction table with a primary key on Supplier id ( Supplier_xid) , Customer id ([Customer_xid] ),Product ID ([Product_xid]) and transaction date ( [trans_date] ). So We have clustered index key on our primary key columns.

The table has 112055 rows. Total number of pages is 1020. Consider the following query.


SET STATISTICS  IO  ON

--To get IO stats



SELECT supplier_xid,

       customer_xid,

       product_xid,

       trans_city,

       trans_date

FROM   dbo.supplier_product_customer

WHERE  trans_city = 'Perth'


The query returns 22316 rows.

IO Statistics are provided below.

Table 'Supplier_Product_Customer'. Scan count 1, logical reads 1029, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Grab a look at the query plan.




Query Plan shows a clustered index scan.

Points to note are

* Query plan uses Clustered index scan
* Total Logical Reads : 1029
* Query cost: 0.881357

From IO stats and query plan its obvious that the entire table was read/scanned.

To make this query use a covered index, let us create a nonclustered index on Trans_city column.


CREATE NONCLUSTERED INDEX [NCIX_Supplier_Product_Customer_Trans_City] ON [dbo].[Supplier_Product_Customer] (

      [Trans_city] ASC)

WITH (pad_index = OFF,

      statistics_norecompute = OFF,

      sort_in_tempdb = OFF,

      ignore_dup_key = OFF,

      drop_existing = OFF,

      online = OFF,

      allow_row_locks = ON,

      allow_page_locks = ON)

ON [PRIMARY]


Let us run the same query again:


SET STATISTICS  IO  ON

--To get IO stats



SELECT supplier_xid,

       customer_xid,

       product_xid,

       trans_city,

       trans_date

FROM   dbo.supplier_product_customer

WHERE  trans_city = 'Perth'


The query returns the same 22316 rows.

Query plan provided below.



IO Stats are provided below.

Table 'Supplier_Product_Customer'. Scan count 1, logical reads 100, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Points to note are

* Query plan uses Non Clustered index seek
* Total Logical Reads : 100
* Query cost: 0.105275

Query plan indicates a non clustered index seek using the new created NCIX_Supplier_Product_Customer_Trans_City index instead of the clustered index scan.
The number of IOs are also very very less and its down to 100 compared to 1029 in clustered index scan. The cost of the query shows great improvement from 0.88 to 0.105. Hence, there is a significant improvement in the performance as well.

The reason as explained, the query engine accesses only NCIX_Supplier_Product_Customer_Trans_City index and not the table at all as all the columns are present with in the index itself. The out of the columns selected Supplier_xid,Customer_xid,Product_xid,Trans_city,Trans_date make the clustered index which is present in the leaf nodes of Non clustered index. Trans_city column is the non clustered index key and so its also present in the non clustered index. Non Clustered index is definitely smaller than the entire table and hence less number of reads.. So better performance.Had it been any version before SQL 2k then using non clustered index would have needed 22316 IOS + cost of traversing index non leaf pages , which lot more expensive than table scan and hence it would have perfomed a table scan again.But with SQL 2k and after, the unwanted table scan was prevented.

Covered index in SQL 2k has a drawback.SQL 2005 has gone one more step ahead and improved this performance even more by making an enhancement on covered indexes.We will look into the drawback and the improvement done on SQL 2k5 in the next
post.

No comments: