Monday, December 14, 2009

Error - Invalid length parameter passed to the SUBSTRING function.



This is my 25th post. kind of special. :)
So, I am starting a new cateogory called 'Errors'. Under Errors we will deal with those sudden errors which pop up out of the blue. A bit of ( sometimes a lot )
of investigation clears the air and provides a good story to tell.I have one such story to tell today.

I had one developer who came to my desk and said 'Suddenly my stored procedure is failing with the following error

'Invalid length parameter passed to the SUBSTRING function.'

The developer told me that the Stored procedure was running absolutely fine for the
last 2 years in production and there has been no change on it.( This is somethng all
developers say :) ). True, there was no change on the sp. With bit of debugging, i found that the following query was failing.


SELECT *

FROM   dbo.authors_test

WHERE  Charindex(' ',city) - 1 > 0

       AND auto_id IN (SELECT auto_id from authors_test

                       WHERE  Left(city,Charindex(' ',city) - 1) = 'San')



Its obvious that the query was failing because of the function
left(city,charindex(' ',city) -1 ).Function left(city,charindex(' ',city) -1 ) would get a negative 2nd parameter when there is no space in the column city. So the query would fail when it applies the function on any row which doesnt contain ' ' on the column city. But, however if thats the case then the query should have failed long back and not all of a sudden. And also, the developer claimed that as he has done the checking charindex(' ',city) -1 > 0 on outer query to filter out all the rows without the space are filtered out first.Let us check the actual reason the query failed.

The table had a primary key clustered index in auto_id. No other indexes.


SELECT *

FROM   dbo.authors_test

WHERE  Charindex(' ',city) - 1 > 0

       AND auto_id IN (SELECT auto_id from authors_test

                       WHERE  Left(city,Charindex(' ',city) - 1) = 'San')


When I 1st ran the query the table contained little over 90 records and it failed with the error
'Invalid length parameter passed to the SUBSTRING function.'

To check the claim of the application team that the query ran perfectly few days before,I took backup of the database that was few days old and ran the same query. It worked perfectly. The table contained around 50 odd rows and the query worked fine. When checked with application team on the row count, they replied saying the increase in the row count was normal and the data didnt have any dirty/junk/Null values on city column.

So difference in row counts gave me a bit of clue and I started comparing the query plans of the old and new table. First the old one had a query plan which is shown below.




As you can understand, the query plan works in the following way

Step 1 : filter out the rows using the condition
charindex(' ',city) -1 > 0.
uses a Clustered index scan as there are no useful index to use.
Output out of the Clustered index scan operator is auto_id as it
will be used to join with the sub query.

Step 2 : Use a Nested loop join operator to join with the sub query.
Step 3 : Uses a Clustered index seek operator with auto_id as input from
outer query.Clustered index seek operator uses the Clusered
index to directly jump to the row with auto_id it is interested
in. After finding the row directly it applies the conditon
left(city,charindex(' ',city) -1 ) only against the
auto_id selected.

Now let us look at the same query that fails on the latest database.



The estimated query plan( only estimated plan possible as the query fails and hence full plan is not available) shows two differences.

1) The Clustered index seek operator is replaced by Clustered index scan operator at the inner loop ( sub query ).
2) Nested loop join is replaced by Merge join.

The query plan clearly provides the reason why the query failed. As in the second case the Clustered index scan operator on the subquery would scan all the rows in the table for the condition left(city,charindex(' ',city) -1 ) and then match it with the input auto_id. So , in that case though the sub query will read all the values in the table which means charindex(' ',city) -1 ) > 0 condition
specified at the outer query is in vain. The query took a clustered index scan instead of seek as there were more rows in the table than before. The optimizer swithces from a seek to scan when it feels that the cost of seek( mostly IOs ) would be more than that of a scan. How it decides is a potential post on its own, which I will perhaps cover later.As of now, understand that a switch from seek to scan can occur beacuse of data distribution/missing statistics/fragmented indexes etc.

So, a change in query plan can not only slow down your system, but also cause runtime errors in scenarios like this. This also proves the importance of reindexing, as a seek can become a scan even when a reindexing
job fails.So, lessons learnt are

> Always ensure you read the query plan before you deploy
any code to production
> Try your best to visualize the table load and perform data
validations in a fool proof way.
> Make sure reindex jobs are running at the correctly, so
that the queries pick the correct plan.

Noticed that for such queries SQL 2k5 takes a cluster scan and hence errors are indicated early. But still, as any RDBMS changes query plans with respect to the factors explained above, its good to be wary of such scenarios.

No comments: