Sunday, January 31, 2010

Database on a Network Drive



How to create a database on a network drive?

Though not a recommended practice, if you need to resort to this option, due to space constraints then follow the following steps.

Step 1:
DBCC traceon ( 1807 )

Step 2:

CREATE DATABASE [test] ON  PRIMARY

( NAME = N'test', FILENAME = N'\\Machinename\\network_shared_machinename\\path\test.mdf' , SIZE = 2048KB , FILEGROWTH = 1024KB )

 LOG ON

( NAME = N'test_log', FILENAME = N'\\Machinename\\network_shared_machinename\path\test_log.ldf' , SIZE = 1024KB , FILEGROWTH = 10%)


GO



The SQL service account/user account should have full rights to access the shared machine and the shared folder.Also note that you can only use the script to create the database, as management studio doesn't show shared drives even after enabling trace flag ( 1807 ).

Having a database on a network/shared drive makes the db more vulnerable to corruption issues and performance can also dip. So, certainly not a recommended practice, even for stage/development environments but perhaps can be used temporary arrangement.

Refer here for more details

Wednesday, January 27, 2010

Recursive CTE vs temp table - Performance comparison



I ended last post discussing the query plan of a recursive cte. On this one, we will see the performance comparison of recursive cte based solution against a while loop based solution.

The table structure remains same with three columns emp_id,emp_name and boss_id.
The problem statement is, when given a employee id, the query/TSQL batch should traverse the hierarchy ladder and provide all the boss_ids upto the CEO/ the top most position.

The query used and the query plan obtained for a recursive cte are provided here .
If we were do the same thing using a conventional technique say using a while loop,
then the TSQL batch would look like


SET nocount  ON
DECLARE  @dt DATETIME
DECLARE  @emp_id INT
DECLARE  @rowcnt      INT,
         @last_row_id INT
CREATE TABLE #full_hierarchy (
  rowid    INT    IDENTITY ( 1 , 1 )    PRIMARY KEY,
  emp_name VARCHAR(100),
  emp_id   INT,
  boss_id  INT)
SET @dt = Getdate()
SET @emp_id = 1
INSERT INTO #full_hierarchy
           (emp_name,
            emp_id,
            boss_id)
SELECT emp_name,
       emp_id,
       boss_id
FROM   hierarchy
WHERE  emp_id = @emp_id
SET @rowcnt = @@ROWCOUNT
SET @last_row_id = 0
WHILE (@rowcnt != 0)
  BEGIN
    INSERT INTO #full_hierarchy
               (emp_name,
                emp_id,
                boss_id)
    SELECT hierarchy.emp_name,
           hierarchy.emp_id,
           hierarchy.boss_id
    FROM   #full_hierarchy fh,
           hierarchy
    WHERE  rowid > @last_row_id
           AND fh.boss_id = hierarchy.emp_id -- Recurisve query
    
    SET @rowcnt = @@ROWCOUNT
    
    SET @last_row_id = @@IDENTITY - @rowcnt
  END
SELECT emp_name,
       emp_id,
       boss_id
FROM   #full_hierarchy
PRINT Datediff(ms,@dt,Getdate())
DROP TABLE #full_hierarchy


Query plan for the same is provided below.



The query plan that repeats at each recursive call is alone provided. I have a clustered index seek at the temp table and at hierarchy table which means the plan is pretty good. So when compared against the CTE based solution, we get the following results.



CTE wins hands down on performance when compared to temp table based solution.
But most important part is one needs to have the recursive part of the CTE indexed perfectly. Without the correct index on hierarchy table, I noticed that there was not much of a difference between CTE and temp table solution.So, the inference that we can draw out of these results is CTE uses indexes effectively and is definitively a better solution while solving recursion based problems.



PS: Shifting house last weekend. No internet connection @ home. So post delayed by 2 days :(

Monday, January 18, 2010

Recursive CTEs



CTE or Common table expression was introduced in SQL 2k5. CTE is like a run time/inline view, which provides temporary result during runtime.
Not to sure, how many developers are using it effectively, but it is one of the best things added in SQL Server 2005. Its definitely late to introduce CTE in 2010, So I wont. Refer here

Perhaps the best thing about CTE is the ease with which one can write recursive queries. Recurisve querying can be very handy especially in graph and tree traversals, parsing hierarchy etc.A sample recursive query and its query plan are provided below.

;
WITH cte_hierarchy

     AS (SELECT emp_name,

                emp_id,

                boss_id

         FROM   hierarchy

         WHERE  emp_id = @emp_id

         UNION ALL

         SELECT hierarchy..emp_name,

                hierarchy.emp_id,

                hierarchy.boss_id

         FROM   cte_hierarchy,

                hierarchy

         WHERE  hierarchy.emp_id = cte_hierarchy.boss_id)

SELECT *

FROM   cte_hierarchy

OPTION (MAXRECURSION 0);


I have a table hierarchy which has the columns emp_id to indicate id of the employee,emp_name for employee name and boss_id for immediate higher up id.
The query is attempting to find all the higher ups in the oraganization ladder
for a particular employee. ie..If a progammer's id is given as a input
the output would be each row for Programmerid ->Team lead -> Project manager ->Senior Project manager -> director -> CEO.

So How it works?

Step 1 -> On the cte_hierarchy's definition, the first part of the
definition, ie Select emp_name,emp_id,boss_id from hierarchy where emp_id =
@emp_id'
is executed. The upper part of the query ( non recursive member )
is called the anchor query.

Step 2 -> The results obtained from anchor query are placed in temporary
table.

Step 3 -> Recursive query runs on the temporary table. The Results at each
execution are loaded again to the temporary table. The recursive query is
executed till it returns zero rows.

The optimal index strategy for a recursive ctes is to create a index for the where clause of recursive query.If possible, one can try to cover the recursive query by including the columns in the select list.So, I have created a index on the hierarchy table on the column emp_id.


CREATE CLUSTERED INDEX [CIX_hierarchy_emp_id] ON [dbo].[hierarchy] (

      [emp_id])


Now let us understand the components in the query plan



Query plan picture is provided above.

Index seek operator: Right most one at the top. Used by the anchor query
for the condition emp_id = @emp_id

Compute scalar operator : Compute scalar is used to initialize the counter
which counts the number of recursive calls. Initially set to 0. Note that by default recursive calls are limited to 100. By using the hint OPTION (MAXRECURSION 0 ),we can get the maximum number of recursive calls possible which 32677.

Concatenation operator : Concatenation operator is used to combine the
results of anchor query and recursive query.

Nested loop operator : Nested loop operator marks the start of recursive
section.Used for iteration for executing recursion.

Index seek operator : Index seek operator is used by the recursive query
to fetch the required rows from the temporary table. The highlighted section
'Number of executes' indicates the number of times the query was called
recursively. As the recursive query is executed many number of times, its
crucial to pick the correct index for the recursive query and ensure its a
index seek and not a table scan. In other words, on recursive
CTE, recursive query should be optimized to provide best possible performance.

Compute Scalar : Compute scalar operator increments the counter to
keep track of number of recursive calls. This is to ensure that the number of
recursive calls dont exceed the maximum allowed by SQL Server. If 'OPTION
(MAXRECURSION 0)' was nt used, then a 'Assert operator' would have
been present to check the number of recursive calls.

Table / Lazy spool operator : Lazy spool operator used to store temporary
results obtained from each recursive call.


The Rest of the query plan is self explanatory.

I performed a performance comparison of recursive queries against conventional method. Will provide the results in the next post.

Monday, January 11, 2010

Job Owner,SQL Agent Account, Proxy accounts



For each SQL job that is created, there is a SQL/Windows account assigned as job owner.
The SQL Agent Service runs using a Windows account which is called the agent account.
It is interesting to see the account that is used to execute the jobs under different scenarios.


* When the job owner is a sysadmin account then the job runs using the SQL Agent account.

A small example is shown below. A sample job test is created. The job owner is set to SA.The TSQL step prints the current user name using the function SYSTEM_user.


Job owner is set to SA (Ref pic above ).


The step details are shown above.



As you can see that the job has been executed under Local System account which is my SQL agent account.

Just in case, if you dont want to execute under the agent account but on a different account then you can use the option run the job as option under advanced tab of each step.

But note that 'Run as ' option works only when the following are true.

* When the job owner is SA.
* When the Step type is T-SQL.

* When the job owner is a Non System Admin account, the SQL Job runs under the Job owner account.

Now I have changed the job owner to a Non Sysadmin account called 'Test'.
Note that 'Test' is ordinary SQL account.


By executing the job you can see from the image below that the job runs under context of the account 'Test'.


Now let me add another step which executes a simple operating system command.



Now the execution fails with the error 'Non-SysAdmins have been denied permission to run CmdExec job steps. The step failed.'



As you may understand clearly from the error message, When the job owner is Non Sysadmin one doesnt have the right to execute operating system commands.

This can be a sort of drawback in certain cases where one is not interested in giving full rights to all jobs. In such a scenario, SQL Server gives an option to create a proxy account and execute the job using SQL Agent proxy account ie., When a Non Sysadmin owned job executes 'Operating system command' step or
'Activex Script step' then it runs under the context of Proxy account. Note that the Proxy account should be a Sysadmin account on the SQL Server. Another requirement of proxy accounts are SQL Server's service account should be a part of Windows Admin group while using proxy account.More details on Proxy account can be found here.

Just a few steps on how to enable proxy account are shown below.
Under SQLAgent properties -> Job system tab ->Uncheck SQL Agent
Picture provided below.



Key in your User id / password and Domain details for your Proxy account.
The proxy acct I have configured is proxy_acct.



Now let us execute the job again to see the results.



The job executed successfully. Notice that the job ran under the context of proxy_acct.



Also, note that the T-SQL step still executed under the context of
'Test' account. The Operating system command executed under the Proxy account.

The proxy account creation steps explained here are applicable for SQL 2k.
For 2k5 refer here

Monday, January 4, 2010

Difference between Left outer join and *=



As many of you may be aware *= is a Microsoft proprietary operator used as a equvivalent for left outer join. *= doesnt adhere to the ANSCI standards and is not supported from SQL 2k5 versions.Though both the operators are meant for the same purpose, there is a major difference in the way they operate. Let us run thro the difference.

Consider the following query


SELECT jobs.job_id,

       jobs.job_desc,

       employee.emp_id,

       employee.fname

FROM   jobs

       LEFT OUTER JOIN employee

         ON jobs.job_id = employee.job_id

WHERE  employee.job_id IS NULL


The 'job' table contains the list of jobs. The 'employee' table contains the employee details and the job_id which indicates the job assigned to the employee.The query attempts to find the job that is not assigned to any employee. The query returns a single row as shown below.



Now let us write the *= equvivalent of it.


SELECT jobs.job_id,

       jobs.job_desc,

       employee.emp_id,

       employee.fname

FROM   jobs,

       employee

WHERE  (jobs.job_id *= employee.job_id)

       AND employee.job_id IS NULL


The query returns all the rows in the jobs table instead of getting just the job table row that didn't have any associated employee. To understand why this happens let us first understand the way internally left outer join is processed.

Left outer join has 3 steps.

* Step 1 : Cartisian product of the two tables are obtained.
* Step 2 : On Condition clause is applied.
* Step 3 : Outer rows are added.ie For left outer join the left table unmatched rows are added.

After these three steps are applied, then the temporary result is passed for additional 'where' conditions. A look at the query plan will explain the same.



As you may notice the query plan shows left outer join operator which performs the steps 1,2,3 mentioned above. The Next 'Filter' operator in the plan is responsible for filtering out the where condition employee.job_id is null . So, to sum it up left outer join is processed first and then the filter. Let us now see the *= 's query plan.



*= 's plan shows the following.
* Step 1 : The job table ( outer table ) is scanned first.
* Step 2 : The required columns are gathered.
* Step 3 : The filters ( where clause ) is applied at one go.

The order in which the two filters ie.. (jobs.job_id *=employee.job_id) and employee.job_id is null are applied is purely in the hands of SQL Server's optimiser and the query doesnt have any control over it.So, when ' employee.job_id is null ' filter is applied first, then no rows on employee table would qualify as all rows on employee table have a non null value on job_id column. Then if jobs.job_id *=employee.job_id filter is applied, as there are no rows from employee table for a possible match of jobs.job_id =employee.job_id, all the rows from 'jobs' table are added to final resultset as outer rows ( null values on employee table column ).So, to sum it up, the 'where' condition filters are applied in any order unlike Left outer join and hence *= returns a unexpected resultset.

Same query if we rewrite using 'having' clause then it provides a plan similar to the one seen for left outer join.


SELECT   jobs.job_id,

         jobs.job_desc,

         employee.emp_id,

         employee.fname,

         employee.job_id

FROM     jobs,

         employee

WHERE    (jobs.job_id *= employee.job_id)

GROUP BY jobs.job_id,

         jobs.job_desc,

         employee.emp_id,

         employee.fname,

         employee.job_id

HAVING   employee.job_id IS NULL


After comparing both the options, one is better off using LEFT OUTER JOIN instead of *= as LEFT OUTER JOIN makes the query lot more readable and clear about what the query is doing. Also, *= is not supported in SQL 2k5 and hence the query should be rewritten at the time of migration.