User defined Scalar function is often chosen as a easier alternative for joining two or multiple tables while writing queries by developers. Other reason given by 'Few' developers is that its provides better readability on a long query. Both are bad reasons for using functions as scalar functions when used inappropriately can potentially kill the performance.
Scalar function when used on a query is executed for every row of the result set, just like a cursor would do. Row by Row processing is very very expensive when compared to SET based processing provided by joins. Scalar functions also makes indexes on the joined tables less effective or even useless.Above factors make joins, in general, a better option when they can replace a scalar function.
We will see the same with an example.
Consider the following tables student , student_marks . Student table contains student details and student_marks contain marks obtained by students on 5 subjects. Table definition and random data population scripts are provided below.
/***********************************************/
/************* Table [dbo].[student_marks]******/
CREATE TABLE [dbo].[student_marks]
(
[sid] [INT] NOT NULL,
[marks] [INT] NULL,
[subject_name] [VARCHAR](50) NOT NULL,
[subject_id] [INT] NOT NULL,
CONSTRAINT [PK_student_marks] PRIMARY KEY CLUSTERED ( [sid] ASC,
[subject_id] ASC )
)
ON [PRIMARY]
GO
/****** Table [dbo].[Student] ******/
CREATE TABLE [dbo].[Student]
(
[sid] [INT] NOT NULL,
[student_name] [VARCHAR](50) NULL,
CONSTRAINT [PK_Student] PRIMARY KEY CLUSTERED ( [sid] ASC )
)
ON [PRIMARY]
GO
Random data population scripts
/*************Student table data population*********/
DECLARE @id INT
SET @id = 0
WHILE @id <= 100000
BEGIN
INSERT INTO student
SELECT @id,
CHAR(( @id % 26 ) + 65) + '-' + CONVERT(VARCHAR, @id)
SET @id = @id + 1
END
GO
/*************Student_marks table population**********************/
INSERT INTO student_marks
SELECT sid,
Round(Abs(( Rand(sid + x.subid) * 1000000000000 ) - Round((
( Rand(sid + x.subid) *
1000000000000 ) / 100 ), 0) * 100), 0),
x.subject,
x.subid
FROM student,
(SELECT 'math' AS subject,
1 AS subid
UNION
SELECT 'physics',
2
UNION
SELECT 'chemistry',
3
UNION
SELECT 'computer_science',
4
UNION
SELECT 'tamil',
5) AS x
The Scripts provided would insert 100,001 rows in student table and 500,005 rows student marks table.Primary key clustered index has been created on sid column on student table and sid,subject_id columns on student_marks table.
Assume that the requirement is to provide the student names and marks of all students in the school.Solution using scalar function would involve creating a function to get student name from student table and querying the student_marks table for obtaining rest of the details. Simple function definition would look like
/****** Object: UserDefinedFunction [dbo].[student_name] ******/
SET ansi_nulls ON
GO
SET quoted_identifier ON
GO
CREATE FUNCTION [dbo].[Student_name](@sid INT)
RETURNS VARCHAR(50)
AS
BEGIN
DECLARE @student_name VARCHAR(50)
SELECT @student_name = student_name
FROM student
WHERE sid = @sid
RETURN @student_name
END
After creating the function, the following query using scalar function is executed.
SET statistics io ON
GO
DECLARE @dt DATETIME
SET @dt = Getdate()
SELECT dbo.Student_name(sid),
*
FROM student_marks
WHERE sid < 20000
SELECT Datediff(ms, @dt, Getdate())
Perfomance Stats are provided below:
Time taken :55640 ms ( 0:56 Seconds)
Rows Returned :100000
IO details : Table Table 'student_marks'. Scan count 1, logical reads 415, physical reads 56, read-ahead reads 412, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Query plan:
The query plan doesnt show whats exactly happening with in the function/
how many times was the function precisely called.So, to find the same, let me use the DMV dm_exec_query_Stats
SELECT Substring(TEXT, ( statement_start_offset / 2 ) + 1, ( ( CASE
statement_end_offset
WHEN -1 THEN Datalength(TEXT)
ELSE statement_end_offset
END
- statement_start_offset ) / 2 ) + 1) AS statement_text,
Isnull(Object_name(objectid), '-') AS obj_name,
execution_count,
max_elapsed_time,
max_logical_reads,
max_logical_writes
FROM sys.dm_exec_query_stats
OUTER APPLY sys.Dm_exec_sql_text(plan_handle) AS sqltext
WHERE TEXT NOT LIKE '%dm_exec_query_Stats%'
ORDER BY execution_count DESC
Execution count column on dm_exec_query_stats DMV clearly shows that the query Select dbo.student_name(sid),* From student_marks where sid < 20000 in the function dbo.student_name has been executed 10000 times , which is exactly the same number of rows returned. So its basically passing a number ( sid ) and making the functions' code execute over and over which makes it painfully slow.
Let us Consider the join based solution:
SET statistics io ON
GO
DECLARE @dt DATETIME
SET @dt = Getdate()
SELECT student_name,
student_marks.*
FROM student_marks,
student
WHERE student_marks.sid = student.sid
AND student_marks.sid < 20000
SELECT Datediff(ms, @dt, Getdate())
Performance Stats :
Time taken : 2876 ms ( 2.9 Seconds)
Rows Returned: 100,000
IO details : Table 'student_marks'. Scan count 1, logical reads 415, physical reads 3, read-ahead reads 412, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Student'. Scan count 1, logical reads 60, physical reads 1, read-ahead reads 58, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Query plan:
Join based solution completes the query in less than three seconds, which is 20 times faster than Scalar functions. IO also indicates lesser number of reads. The student table is also read only once against 100,000 times as in the case of scalar functions. Query Stats DMV confirm the same.
So the conclusion is, one should always avoid using scalar functions as much as possible and should be used only when we don't have any other option to replace it.Complex string manipulation requirements or date calculation requirements are perhaps, plausible excuses for Scalar functions, but still they have to be used with caution as they have been found as performance bottlenecks more often than not.
Tuesday, May 25, 2010
Performance comparison - Functions versus Joins
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment