Scalar functions as discussed in the last post can be a performance bottleneck.Another place where Scalar functions ( both user defined and system ) can become a performance bottleneck is when used on a where clause of the query. Scalar functions when used on a where clause of a query can make the optimizer pick a table scan, even if the index is present, making the index useless.
/****** Object: Index [ncix_Supplier_product_customer_trans_date] ******/
CREATE NONCLUSTERED INDEX [ncix_Supplier_product_customer_trans_date]
ON [dbo].[Supplier_Product_Customer] ( [trans_date] ASC )
ON [PRIMARY]
A non clustered index on trans_date column of supplier_product_customer table has been created.When one needs to find the list of transactions for a particular day ( cutting the time part of the , the most common method of writing the query would be
SELECT *
FROM [Supplier_Product_customer]
WHERE Datediff(dd, [trans_date], '11/19/2009') = 0
But,the problem with such a query is that it takes table scan instead of using the Non clustered index created on trans_date. The reason is that scalar function datediff used on trans_date column stops the query analyzer from using the index.
A better way of writing it would be
SELECT *
FROM supplier_product_customer
WHERE trans_date >= '20091119'
AND trans_date < '20091120'
The picture above shows a non clustered index seek instead of table scan. There are few other common scenarios where functions can be avoided. Let me briefly list them here
* Left() string fuction can be effectively replaced by like
'str%'
* Usage of upper/lower string functions can be avoided
when the database has a case insensitive collation.
* isnull(col1,'xyz') = 'xyz' can be replaced by col1 = 'xyz'
or col1 is null
Note that OR conditions do use indexes and but at times they don't.
Please check before use.
* Getting data older than 60/n days query.
Standard way of doing it would be
Where datediff(dd, trans_date, getdate()) > 60
Replaced by
Where trans_date < CONVERT(VARCHAR, getdate()-60, 112)
To Conclude, one should try the best to avoid having scalar functions in where clauses.
Sunday, June 6, 2010
Scalar functions - on where clause
Subscribe to:
Post Comments (Atom)
4 comments:
Thanks for the post!
Could where clauses possibly return the wrong information?
select m.id
from table m
where m.course in
('05','06','07','08','09','10')
and m.type = 'AuditOnly'
and m.gender = 'M'
returns zero, but:
select m.id
from table m
where m.course in
('05','06','07','08','09','10')
intersect
select m.id
from table m
where m.type = 'AuditOnly'
intersect
select m.id
from table m
where m.gender = 'M'
returns a few hundred. Why the difference?
Interesting comment Anony. Thanks. The difference is because the way in which intersect operates. There is no problem with where clause.
Interesct executes each part of the query individually and then prints the results like a set interesection ie prints the rows that have the same value on the select column.But interesect doesnt check whether all the 3 conditions are satisfied on the same row. Meaning interesect doesnt check whether the your m.id at all the three queries comes from the same row. As where clause applies all the conditions at one go it checks whether all the conditions are satisfied for the same row. Grab a look at the sample query posted by me below..
create table #temp(id int, c1 varchar(5),c2 varchar(5))
Insert into #temp Select 1,'A','Z'
Insert into #temp Select 1,'B','Y'
Insert into #temp Select 1,'C','X'
Select * from #temp
where id = 1 and c1 = 'B' and c2 = 'X'
Select id
from #temp where id = 1
intersect
Select id
from #temp where c1 = 'B'
intersect
Select id
from #temp where c2 = 'X'
Please post if still not clear or if you have any more doubts.
Thanks! However, what if the ID is unique?
If the ID is unique, then the results for the case explained should be same.
Post a Comment