One of the first steps in basic query tuning is avoiding unneccasary sub queries. I understand this topic has been written again and again. But just couldnt resist writing one on it.
In general, Sub queries are costlier than a join. If a query can be achieved using a join its always the best way. If there is definitely a need to use a sub query, then 'Correlated Sub query' can be good option. Correlated Sub query is a case where a main query column becomes a part of sub query condition. Let us see a few examples.
Consider the query to find the list of customers who have not borrowed money. There are two ways of writing it.
Normal Sub query :
SELECT name
FROM customer
WHERE name NOT IN (SELECT customer_name
FROM borrower)
Correlated Subquery:
SELECT name
FROM customer c
WHERE NOT EXISTS (SELECT customer_name
FROM borrower
WHERE c.name = customer_name)
In correlated sub queries one also has a chance of improving by creating an index on the sub query 'where' clause if necessary. When index is created on the sub query's where condition it would be extremely effective as the sub query table would not be scanned at all and only the index would be scanned. Also, Correlated sub query can be extended to few more attributes, easily. Ex:
SELECT name
FROM customer c
WHERE NOT EXISTS (SELECT customer_name
FROM borrower
WHERE c.name = customer_name
AND c.some_attribute = attribute)
FROM cusotmer c
WHERE name = (SELECT cusotmer_name
FROM borrower
WHERE loan_number = 123)
SELECT @StrCustomerName = custmer_name
FROM borrower
WHERE loan_number = 123
SELECT name
FROM customer
WHERE name = @StrCustomerName
This would again reduce the IOs/Scans to a great extent. The reasons are obvious.So we can always say that we can replace a sub query to variable when
- The sub query result is 1 row for sure.
- The sub query is not correlated.
Another interesting scenario where a sub query can be with a variable is when finding average/highest in a group. Assume you want to find the student(s) with average score.
Standarad way of writing it would be
SELECT student_name
FROM class
WHERE MARK = (SELECT Avg(MARK)
FROM class)
Optimized way would be
DECLARE @intAvgMarks INT
SELECT @intAvgMarks = Avg(MARK)
FROM student
SELECT student_name
FROM class
WHERE MARK = @intAvgMarks
Simple is it not.. . There are few more workarounds for sub queries which I will try to cover the next few posts. So always try your best to say bye bye to sub queries :)