Sunday, October 31, 2010

Finding CPU Pressure using wait_stats DMV

CPU Pressure forms vital part in performance monitoring and sys.dm_os_wait_stats is a good place to check. Some background info before we look at the DMV.

Waiting time for a SQL Process is divided into two. They are Resource waits and signal waits( CPU waits ). When a process/query is submitted to SQL Server, SQL Server checks whether all resources required for the process are available.Resources would include things like IO, Network, Memory, Locks, Latch waits etc. For Ex: If a process is waiting for another process to release its lock, then it becomes a resource wait because of locks. If a process wants to write something to a disk but its has to wait for some other IO activity to complete then its a resource wait because of IO. So all these waiting time gets categorized as resource waits..

After a process grabs all the resources it requires, the process gets scheduled to run/executed by the Processor/ CPU. So, it enters the CPU queue. If there are already many processes executing, then it has to wait to get the CPU. The time spent waiting for the CPU is termed as signal waits.

sys.dm_os_wait_stats DMV provides the cumulative sum of the waiting time experienced by all processes at each wait type. dm_os_wait_stats DMV is a SQL Server 2005/2008 equivalent of DBCC SQLPERF(waitstats). A lot more accurate explanation about the DMV is provided here . dm_os_wait_stats maintains these stats since last restart of the server or since the last execution DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);

sys.dm_os_wait_stats provides about 485 wait types/wait events and waiting time spent at each wait type. Two interesting columns of this DMV are

* wait_time_ms
* signal_wait_time_ms

Wait_time_ms gives the total time spent at each wait type ( Resource + cpu wait ). signal_wait_time_ms specifies the time spent on waiting for CPU. Resource waittime can be found by wait_time_ms - signal_wait_time_ms.

SELECT SUM(signal_wait_time_ms) * 100 / SUM(wait_time_ms) AS
       SUM(signal_wait_time_ms)                           AS cpu_wait,
       SUM(wait_time_ms - signal_wait_time_ms)            AS resource_wait,
       SUM(wait_time_ms)                                  AS total_wait_time
FROM   sys.dm_os_wait_stats 

Consider the query above. cpu_pressure_percentage provides percentage of time spent waiting for CPU, by finding the ratio of signal_time and total wait time. Microsoft recommends this value to be less than 25% for a normal system. Value any higher would indicate a system suffering likely from a CPU stress.
Most common causes for CPU stress/high signal waits can be

* Poor and ineffective query plans / missing indexes / table scans
* Too many Plan Recompliations
* Slow CPU and requires a upgrade

As a production DBA, its a good idea to collect the resultset dm_os_wait_stats DMV once few hours or twice or thrice a day and perhaps analyze the results to notice any surge in wait time. This can be one proactive approach to detect potential major problems.

Saturday, October 23, 2010

Backup path

Sometimes after taking a backup, one can forget the path where the backup was taken.
The simple script below provides the list of backups taken on the server, backup type ( log - l, Full -D..),backup start date and time, lsn numbers and the path of the backup. Simple and handy.

SELECT database_name,
FROM   msdb.dbo.backupset,
WHERE  backupset.media_set_id = backupmediafamily.media_set_id
ORDER  BY backup_start_date DESC 

Tuesday, October 19, 2010

Currently executing query - wait status

Recently, I had a developer who came to my desk with a request.

' I started a execution of a script from a program. the program has been running for a long time. I don't know whether its executing or hanging. Can you check from database whether the script is running or hanging and if hanging at which statement it is hanging?. If possible please tell me why its hanging..'

Earlier in my blog, I posted the following script to check the currently running query.

SELECT spid,
       TEXT                       AS [query],
       Db_name(sysprocesses.dbid) AS dbname,
FROM   sys.sysprocesses
       OUTER APPLY Fn_get_sql(sql_handle)
WHERE  spid > 50

The above query would give the stored procedure's name or starting statements of the script/batch and not the exact statement within the stored procedure or batch that is currently executing. To find that one should use sys.dm_exec_requests

SELECT session_id,
       Substring(txt.TEXT, ( statement_start_offset / 2 ) + 1,
       ( ( CASE statement_end_offset
       WHEN -1 THEN Datalength(txt.TEXT)
       ELSE statement_end_offset
           - statement_start_offset ) / 2 ) + 1) AS statement_text,
FROM   sys.dm_exec_requests
       CROSS APPLY sys.Dm_exec_sql_text([sql_handle]) AS txt
WHERE  session_id <> @@SPID
       AND session_id > 50 

The query above gives the currently executing statement within a batch. sys.dm_exec_requests provides the columns statement_start_offset ,statement_end_offset which specify the starting and ending positions of the currently executing query within a batch, in bytes. As 'text' column returned by dm_exec_sql_text is of nvarchar datatype, statement_start_offset/2 is required to get to the starting postion of the query. statement_end_offset returns -1 to indicate the end of the batch.So, statement_text column of the query answers the first part of the developer's request.

Now, for the next part , ie finding the status of the query ie.. running/hanging/sleeping etc

A user request can take the following major states.

> Sleeping : Query completed execution and has nothing to do. Connection
is still not closed.
> Running : Currently running/executing.
> Runnable: Can run but waiting for a processor. Implies waiting in
processor queue.
> Suspended: Waiting for some resource or event. Wait may be because of
locks, IO completion, Latch or Memory wait etc.

Sleeping requests are not shown in sys.dm_exec_requests. Runnable/Suspended state requests are the ones we need to watch out for. Runnable/Suspended states are not abnormal, as long as they dont stay at the same state for a long time.

Now for the third part..' Why it is hanging..'

The wait_type column on the resultset will return the reason why a request is waiting. To know the meaning of each wait type refer here

Popular few wait types are given below

CXPACKET - Wait beacuse of Parallelism.
LOGBUFFER - Waiting for the logbuffer to release some space in the memory.
LCK_M% - Waiting for lock to be released. Refer to Blocking_Session_id for finding the blocking process.

Other interesting columns returned by dm_exec_requests are wait_time returning time spent waiting in ms, percent_complete indicating progress of the query,
cpu_time,reads,writes,logical_reads and row_count. percent_complete mostly indicates the progress for select queries,backup process but not for DML operations.

So, a quick execution of the script and checking the wait state details can clearly
show the current progress/ status of the request.