Monday, January 18, 2010

Recursive CTEs



CTE or Common table expression was introduced in SQL 2k5. CTE is like a run time/inline view, which provides temporary result during runtime.
Not to sure, how many developers are using it effectively, but it is one of the best things added in SQL Server 2005. Its definitely late to introduce CTE in 2010, So I wont. Refer here

Perhaps the best thing about CTE is the ease with which one can write recursive queries. Recurisve querying can be very handy especially in graph and tree traversals, parsing hierarchy etc.A sample recursive query and its query plan are provided below.

;
WITH cte_hierarchy

     AS (SELECT emp_name,

                emp_id,

                boss_id

         FROM   hierarchy

         WHERE  emp_id = @emp_id

         UNION ALL

         SELECT hierarchy..emp_name,

                hierarchy.emp_id,

                hierarchy.boss_id

         FROM   cte_hierarchy,

                hierarchy

         WHERE  hierarchy.emp_id = cte_hierarchy.boss_id)

SELECT *

FROM   cte_hierarchy

OPTION (MAXRECURSION 0);


I have a table hierarchy which has the columns emp_id to indicate id of the employee,emp_name for employee name and boss_id for immediate higher up id.
The query is attempting to find all the higher ups in the oraganization ladder
for a particular employee. ie..If a progammer's id is given as a input
the output would be each row for Programmerid ->Team lead -> Project manager ->Senior Project manager -> director -> CEO.

So How it works?

Step 1 -> On the cte_hierarchy's definition, the first part of the
definition, ie Select emp_name,emp_id,boss_id from hierarchy where emp_id =
@emp_id'
is executed. The upper part of the query ( non recursive member )
is called the anchor query.

Step 2 -> The results obtained from anchor query are placed in temporary
table.

Step 3 -> Recursive query runs on the temporary table. The Results at each
execution are loaded again to the temporary table. The recursive query is
executed till it returns zero rows.

The optimal index strategy for a recursive ctes is to create a index for the where clause of recursive query.If possible, one can try to cover the recursive query by including the columns in the select list.So, I have created a index on the hierarchy table on the column emp_id.


CREATE CLUSTERED INDEX [CIX_hierarchy_emp_id] ON [dbo].[hierarchy] (

      [emp_id])


Now let us understand the components in the query plan



Query plan picture is provided above.

Index seek operator: Right most one at the top. Used by the anchor query
for the condition emp_id = @emp_id

Compute scalar operator : Compute scalar is used to initialize the counter
which counts the number of recursive calls. Initially set to 0. Note that by default recursive calls are limited to 100. By using the hint OPTION (MAXRECURSION 0 ),we can get the maximum number of recursive calls possible which 32677.

Concatenation operator : Concatenation operator is used to combine the
results of anchor query and recursive query.

Nested loop operator : Nested loop operator marks the start of recursive
section.Used for iteration for executing recursion.

Index seek operator : Index seek operator is used by the recursive query
to fetch the required rows from the temporary table. The highlighted section
'Number of executes' indicates the number of times the query was called
recursively. As the recursive query is executed many number of times, its
crucial to pick the correct index for the recursive query and ensure its a
index seek and not a table scan. In other words, on recursive
CTE, recursive query should be optimized to provide best possible performance.

Compute Scalar : Compute scalar operator increments the counter to
keep track of number of recursive calls. This is to ensure that the number of
recursive calls dont exceed the maximum allowed by SQL Server. If 'OPTION
(MAXRECURSION 0)' was nt used, then a 'Assert operator' would have
been present to check the number of recursive calls.

Table / Lazy spool operator : Lazy spool operator used to store temporary
results obtained from each recursive call.


The Rest of the query plan is self explanatory.

I performed a performance comparison of recursive queries against conventional method. Will provide the results in the next post.

No comments: