Sunday, October 11, 2009

Fill Factor - 3

As already mentioned this post will attempt to answer the follwoing question

> While deciding fill factor, What options are we left with when we don't have a good idea of how many inserts are happening?

There are two DMVs which can help us in such a case.

1) sys.dm_db_index_operational_stats
2) sys.dm_db_index_usage_stats

Let us take up the first one - sys.dm_db_index_operational_stats.
Index_operational_stats provides very very valubale info like

> Leaf_insert_count/Leaf_delete_count/Leaf_update_count
> Leaf_allocation_count

As the name suggests the columns Leaf_insert_count,Leaf_delete_count,Leaf_update_count provide
the number of rows inserted,updated,deleted on the index. So when you want to check for the important
table then query for the clustered index of the table as clustered index's leaf level represents
the actual table. It can be done by

FROM sys.dm_db_index_operational_stats
( db_id() , object_id('Supplier_Product_Customer'), 1 , NULL )
WHERE Index_id = 1

Just a small note here. Sometimes the leaf_delete_count can be zero( or less than actual number of deleted rows ) even if deletion had occur ed. In that case one can check leaf_ghost_count which would represent the number of deleted rows. The deleted rows are not physically moved but are marked as deleted ie logical deletion.SqlServer periodically physically removes deleted rows from the disk but not immediately after the deletion.leaf_ghost_count also has number greater than zero when transactions are rolled back.

Leaf_allocation_count indicates the total number of new pages that were allocated to the table.This can relate to the Page splits, if most of your inserts are completely randomized and very little is happening at the tail of the index.

There are quite a few other useful columns row_lock_count,row_lock_wait_in_ms but I will leave them for you to explore.

So, we have all the info that's needed to compute the fill factor. Apply the
on the method specified here then you should get your ideal fill factor..right.. But not so easy..Like all wonderful things in life, this one also comes with a drawback. One major drawback is that the details about the table are retained on the
DMV only till the meta data of the table remains in cache. Once the table's meta data moves out of cache, the row containing the useful info is removed from DMV and the values are reset to zero. But how long it will be retained depends on how well your table is getting used and how well other tables are getting used as well.

So, one option you have is to dump the DMV's result periodically into a table and then later use it for analysis. As querying this DMV for a single ( or few ) table doesn't acquire any locks on the actual table,and it is not a resource consuming exercises. But, the analysis can be very tedious because one needs to check whether the table's meta data moved out of the cache or not by comparing the result recorded previous time.For ex: if your number of inserts recorded at 10 AM is less than number of inserts recorded at 9:45 AM then its likely that the table has moved out of the cache between 9:45 and 10. So, analysis can be painful but its the best option available.

Then the Second option we have is sys.dm_db_index_usage_stats .Column of interest is user_updates.There are other interesting columns like user_seeks, user_scans, last_user_seek, last_user_scans but now the one that's needed is user_updates. User_updates column indicates the number of changes that occur ed on the index. In other words,User_updates number represents the total number of insert,update and delete queries that were fired against the table. Point to note
is, its total number of queries and not rows which kind of dents our cause. But it can be useful if you are aware of your queries say something like all your inserts/updates/deletes affect one row or 'x' rows at a time.The good thing is that the DMV retains the value till the next restart of the SQLServer.Still its not a accurate method of measuring as its a cumulative sum of update, delete ,insert queries and doesn't provide the breakup.

After estimating the inserts,updates and deletes and setting the initial fill factor one needs to monitor fragmentation numbers, page count( for checking number of page growth ), page splits and if possible data distribution. Data distribution would help one to check whether the data distribution is skewed and up to what extent.Skewed data can definitely disturb your calculations and make indexes perform
below par. 2k5 doesn't provide a perfect solution for skewed data distribution and its problems.But 2k8 to a certain extent does with Filtered indexes. Will write a piece on that soon.

At last I am kind of done with the series on Fill factors and fragmentation.
Will stay away from this one at least for the next few posts :)


Deepak said...

Excellent 3 part article on Fill Factor. Fill Factor fulfilled now!!!

Thanks again.

Nagaraj Venkatesan said...

:) Thanks for reading Deepak