Tuesday, November 9, 2010

Finding CPU Pressure - dm_os_schedulers DMV


The last post dealt with checking CPU pressure using wait_stats DMV. But, to get a complete picture of CPU Pressure, the script provided in the previous post alone wouldn't suffice. We need the help of additional DMV
sys.dm_os_schedulers.

Why we need the help of sys.dm_os_schedulers?
As already mentioned, Waitstats DMV captures waiting time for a group of wait types. WaitStats DMV works in the following way. WaitStats checks whether any processes waits at any of the defined wait types. If yes, then WaitStats tracks resource wait time experienced by the process. After the resource wait is over, waitstats tracks the signal wait/CPU wait. Once, both the waits are completed, the waitstats reflect in the DMV.

There have been scenarios where a process doesn't experience any resource wait and just has a CPU wait/Signal wait. In such a case, the waiting time misses the wait stats, and wait stats doesn't reflect the CPU pressure experienced at all. In other words, if a process doesn't wait for any resource and directly gets
into the CPU queue and spends long time only in runnable state, then
sys.dm_os_wait_stats doesn't reflect it. The Same scenario is explained in detail by MVP Linchi Shea over here.

In such a scenario, one can use sys.dm_os_schedulers to detect CPU pressure.
Consider the following query.


SELECT scheduler_id,
       cpu_id,
       current_tasks_count,
       runnable_tasks_count,
       current_workers_count,
       active_workers_count,
       work_queue_count
FROM   sys.dm_os_schedulers
WHERE  scheduler_id < 255; 


Simple and straight forward query on sys.dm_os_schedulers DMV. Returns one row per CPU. Scheduler_id < 255 filters out schedulers used by system processes. Detailed explanation on the DMV can be obtained from here. Most important column is runnable_tasks_count which indicates the number of tasks that are waiting
on the runnable queue. Once a process moves out of the runnable state into running state, the value of the column reduces by one. So, the DMV always indicates the current state of the server and the runnable queue.dm_os_schedulers doesnt hold historical data since last restart unlike waitstats dmv. So, this needs to
polled lot more frequently, perhaps every 5 minutes if necessary. Any non zero value noted on runnable_tasks_count is a indication of CPU pressure and requires close monitoring.

To sum it up, CPU monitoring for SQL Server would include three parts.

* Directly checking CPU using sys.os_ring_buffers dmv as explained here.
* Checking waitstats dmv as explained here
* checking sys.os_schedulers as explained in this post.


No comments: