Sunday, December 7, 2008

Best Practices - Querying - Avoiding Sub Queries

One of the first steps in basic query tuning is avoiding unneccasary sub queries. I understand this topic has been written again and again. But just couldnt resist writing one on it.


In general, Sub queries are costlier than a join. If a query can be achieved using a join its always the best way. If there is definitely a need to use a sub query, then 'Correlated Sub query' can be good option. Correlated Sub query is a case where a main query column becomes a part of sub query condition. Let us see a few examples.

Consider the query to find the list of customers who have not borrowed money. There are two ways of writing it.
Normal Sub query :

SELECT name
FROM   customer
WHERE  name NOT IN (SELECT customer_name
                    FROM   borrower)


Correlated Subquery:

SELECT name
FROM   customer c
WHERE  NOT EXISTS (SELECT customer_name
                   FROM   borrower
                   WHERE  c.name = customer_name)


In correlated sub queries one also has a chance of improving by creating an index on the sub query 'where' clause if necessary. When index is created on the sub query's where condition it would be extremely effective as the sub query table would not be scanned at all and only the index would be scanned. Also, Correlated sub query can be extended to few more attributes, easily. Ex:


SELECT name
FROM   customer c
WHERE  NOT EXISTS (SELECT customer_name
                   FROM   borrower
                   WHERE  c.name = customer_name
                          AND c.some_attribute = attribute)


Another scenario where sub queries can be easily avoided using a simple variable( or even a join ) is shown below.


SELECT name
FROM   cusotmer c
WHERE  name = (SELECT cusotmer_name
               FROM   borrower
               WHERE  loan_number = 123)



This can be changed to


DECLARE  @StrCustomerName VARCHAR(100)

SELECT @StrCustomerName = custmer_name
FROM   borrower
WHERE  loan_number = 123

SELECT name
FROM   customer
WHERE  name = @StrCustomerName



This would again reduce the IOs/Scans to a great extent. The reasons are obvious.So we can always say that we can replace a sub query to variable when



  • The sub query result is 1 row for sure.


  • The sub query is not correlated.

Another interesting scenario where a sub query can be with a variable is when finding average/highest in a group. Assume you want to find the student(s) with average score.


Standarad way of writing it would be


SELECT student_name
FROM   class
WHERE  MARK = (SELECT Avg
(MARK)
               FROM   class)


Optimized way would be


DECLARE  @intAvgMarks INT

SELECT @intAvgMarks = Avg
(MARK)
FROM   student

SELECT student_name
FROM   class
WHERE  MARK = @intAvgMarks


Simple is it not.. . There are few more workarounds for sub queries which I will try to cover the next few posts. So always try your best to say bye bye to sub queries :)