Monday, January 4, 2010

Difference between Left outer join and *=



As many of you may be aware *= is a Microsoft proprietary operator used as a equvivalent for left outer join. *= doesnt adhere to the ANSCI standards and is not supported from SQL 2k5 versions.Though both the operators are meant for the same purpose, there is a major difference in the way they operate. Let us run thro the difference.

Consider the following query


SELECT jobs.job_id,

       jobs.job_desc,

       employee.emp_id,

       employee.fname

FROM   jobs

       LEFT OUTER JOIN employee

         ON jobs.job_id = employee.job_id

WHERE  employee.job_id IS NULL


The 'job' table contains the list of jobs. The 'employee' table contains the employee details and the job_id which indicates the job assigned to the employee.The query attempts to find the job that is not assigned to any employee. The query returns a single row as shown below.



Now let us write the *= equvivalent of it.


SELECT jobs.job_id,

       jobs.job_desc,

       employee.emp_id,

       employee.fname

FROM   jobs,

       employee

WHERE  (jobs.job_id *= employee.job_id)

       AND employee.job_id IS NULL


The query returns all the rows in the jobs table instead of getting just the job table row that didn't have any associated employee. To understand why this happens let us first understand the way internally left outer join is processed.

Left outer join has 3 steps.

* Step 1 : Cartisian product of the two tables are obtained.
* Step 2 : On Condition clause is applied.
* Step 3 : Outer rows are added.ie For left outer join the left table unmatched rows are added.

After these three steps are applied, then the temporary result is passed for additional 'where' conditions. A look at the query plan will explain the same.



As you may notice the query plan shows left outer join operator which performs the steps 1,2,3 mentioned above. The Next 'Filter' operator in the plan is responsible for filtering out the where condition employee.job_id is null . So, to sum it up left outer join is processed first and then the filter. Let us now see the *= 's query plan.



*= 's plan shows the following.
* Step 1 : The job table ( outer table ) is scanned first.
* Step 2 : The required columns are gathered.
* Step 3 : The filters ( where clause ) is applied at one go.

The order in which the two filters ie.. (jobs.job_id *=employee.job_id) and employee.job_id is null are applied is purely in the hands of SQL Server's optimiser and the query doesnt have any control over it.So, when ' employee.job_id is null ' filter is applied first, then no rows on employee table would qualify as all rows on employee table have a non null value on job_id column. Then if jobs.job_id *=employee.job_id filter is applied, as there are no rows from employee table for a possible match of jobs.job_id =employee.job_id, all the rows from 'jobs' table are added to final resultset as outer rows ( null values on employee table column ).So, to sum it up, the 'where' condition filters are applied in any order unlike Left outer join and hence *= returns a unexpected resultset.

Same query if we rewrite using 'having' clause then it provides a plan similar to the one seen for left outer join.


SELECT   jobs.job_id,

         jobs.job_desc,

         employee.emp_id,

         employee.fname,

         employee.job_id

FROM     jobs,

         employee

WHERE    (jobs.job_id *= employee.job_id)

GROUP BY jobs.job_id,

         jobs.job_desc,

         employee.emp_id,

         employee.fname,

         employee.job_id

HAVING   employee.job_id IS NULL


After comparing both the options, one is better off using LEFT OUTER JOIN instead of *= as LEFT OUTER JOIN makes the query lot more readable and clear about what the query is doing. Also, *= is not supported in SQL 2k5 and hence the query should be rewritten at the time of migration.

No comments: