Tuesday, March 27, 2012

Inserting UniCode / Special characters in tables


Consider the situation where one needs to store multilingual data / Special characters into a table. For example Chinese characters or Tamil characters. Most of the folks would be aware that one should use NVarchar column instead of Varchar column as Nvarchar column can store unicode characters. This post explains the problem one faces while inserting special characters from a query. Consider the following script to insert some special character data into database


CREATE TABLE #sample
  (
     id       INT,
     spl_char NVARCHAR(500)
  )
GO
INSERT INTO #sample
SELECT 1,
       '我的妻子塞尔伽'
GO
INSERT INTO #sample
SELECT 2,
       'மறத்தமிழன் ' 


The script executes successfully.Let us see the results. Refer to picture below.



We are surprised to see that the special characters are not inserted correctly. We have set the column as Nvarchar but still the special characters appear corrupted. Why?

The reason is when one is expilictly specifying the special character within quotation, one needs to prefix it with the letter N. For ex, while specifying 'மறத்தமிழன்', one needs to specify it as N'மறத்தமிழன்'. The reason is when a string is enclosed with single quotes, its automatically converted to Non Unicode data type or Varchar/char data type. Specifying the letter N before the quotes informs SQL Server that the next string contains unique code character and should be treated as Nvarchar.

Let us modify the script and try using inserting special / Unicode characters.


CREATE TABLE #sample
  (
     id       INT,
     spl_char NVARCHAR(500)
  )
GO
INSERT INTO #sample
SELECT 1,
       N'我的妻子塞尔伽'
GO
INSERT INTO #sample
SELECT 2,
       N'மறத்தமிழன் '
GO
SELECT *
FROM   #sample; 


The result shows that the multilingual characters are now correctly displayed.



So one shouldn't forget to include the letter N while specifying NVarchar or special characters explicitly.