Sunday, December 31, 2017

Azure SQL Database - Backup

Most of you must be knowing that backup in Azure SQL Database fully automatic. After the database is created and the database is automatically backed up by Azure. One can restore to any point in time for a period of 7 days for the basic subscription and 35 days for standard and premium.

Most of the above are already known to you. But, what can be little surprising for someone learning Azure SQL Database is, one can never access backup files used for recovering the database. Meaning, you can only restore the database as another azure sql database but you can never download the automatic backup files like you would do when you configure backups on your on-premise database.

So, what do you do when you need the backup file of Azure SQL Database for reference?

Create a bacpac file backup (similar to your full back up on - premise database ) of Azure SQL Database


1) Connect to the Azure SQL Database via SSMS
2) Right click tasks-> Export -Data-Tier Application. Follow the screenshots below to manually take a backup of Azure SQL Database




So, if you need an older backup of the azure database (yesterday 8:00 am backup), then recover the older version of the database using the Azure portal to a different azure sql database and extract the bacpac file using the steps explained above.

Saturday, December 23, 2017

My Interview at Data Platform Geeks 2017

Dear All,

Earlier this year in August, I had the privilege of speaking at Data Platform Summit 2017 at Bangalore. As always a terrific experience to hangout, interact and learn from the best in the business. Here is my interview to DPS Team about my experience at the summit






Adaptive Query Processing - Adaptive Joins - Plan resue - 3



Continuing on the series on Adaptive Query Processing covered over here, this post would cover what happens to adaptive joins when plans are reused.

To give a little bit of context to the post, the earlier posts in the series explained how SQL Server 2017's Adaptive joins made it possible for SQL Server to switch between nested loop join and hash join in query plan during runtime depending upon the number of rows returned by one of the joining tables. "Adaptive join threshold" is a row count threshold used to decide whether to go for Nested loop join or hash join.
So, the question is what happens when the query plan is reused? Does the Query plan dynamically switch between the nested loop and hash join when the query plan is reused or does it just go with the last choice? What is the role of "Adaptive join threshold" when the plan is reused? To check it out, let's test it


DBCC FREEPROCCACHE

GO

EXECUTE sp_executesql

         N'SELECT  [fo].[Order Key], [si].[Lead Time Days], [fo].[Quantity]

FROM    [Fact].[Order] AS [fo]

INNER  JOIN [Dimension].[Stock Item] AS [si]

       ON [fo].[Stock Item Key] = [si].[Stock Item Key]

WHERE   [fo].[Quantity] = @quantity', N'@quantity int',@quantity = 360



EXECUTE sp_executesql

         N'SELECT  [fo].[Order Key], [si].[Lead Time Days], [fo].[Quantity]

FROM    [Fact].[Order] AS [fo]

INNER  JOIN [Dimension].[Stock Item] AS [si]

       ON [fo].[Stock Item Key] = [si].[Stock Item Key]

WHERE   [fo].[Quantity] = @quantity', N'@quantity int',@quantity = 130



Step 1: Clearing the cache
Step 2: Run the query using sp_executesql and pass the variable, so that plan is reused
Step 3: Run the same query with a different parameter value, so that the plan is reused with a different value.

Don't forget to turn on the Show query plan option to see the query plan.
Query plans for both the queries provided below







Both use Adaptive Joins which is a good news.
Now, let's compare the properties of Adaptive Join operator to understand more




First observation: Actual join type for both executions are different. So, Adaptive Join switches the join operator dynamically even when the plan is reused
 Second Observation: Adaptive threshold of rows remains the same and this implies it is fixed at the compile time or when the plan was first generated. Hence "Estimated Join Type" also remains the same.


 So, to summarize, Adaptive Join threshold is computed at the compile time. So, when the plans are reused, the same adaptive join threshold is used for all executions. However, Adaptive join still switches the join operator at each execution, using the adaptive join threshold calculated at the first time the plan was generated.

Saturday, December 2, 2017

Data Channel Interview - 09 - Mr.Madhivanan on Best Practices in TSQL Programming

Dear All,

Very happy to share the 9th interview of #DataChannel. Mr.Madhivanan Ramachandran, 10 time Microsoft MVP Award winner and a Data Platform expert from India discusses "Best Practices for TSQL Programming".  Mr.Madhivanan demystifies quite a few concepts with his to the point answers in this interview. Happy Learning!!!





Regards,
Nagaraj 

Wednesday, November 29, 2017

Adaptive Query Processing and Automatic Tuning - Adaptive Joins - 2

Continuing from the introduction written over here, let’s look at one of the aspects of Adaptive Query Processing which is Adaptive Joins.

Before we start Adaptive Joins, SQL Server's Query optimizer internally has 3 join types.

Nested Loop Join - Usually picked by optimizer, when tables participating in the joins are small or when one of the table is big and indexed on joining column
Hash Join: One of the table is big but doesn't have an index
Merge Join: Both the tables are big, and the joining column is indexed

So, the size of the table or the rows participating in the join plays a major role SQL's Query optimizer picking the correct join type. Let’s say, SQL Server estimates few rows participating in the join at compile time and picks "nested join" but during runtime, the join extracts larger number of rows and thereby making the choice of plan ineffective. 

SQL Server 2017's adaptive join addresses this problem. With Adaptive Join, SQL Server comes with plan attribute called "Adaptive Row Threshold" which is a row count threshold estimated by SQL Server.  "Adaptive Row Threshold" will help SQL Server dynamically alter the plan choice at run time. Adaptive join in short will work in the following way

  • If the number of rows participating in the join are greater than "Adaptive Row Threshold" rows, then "Hash Join" operator is used for join operation
  • If the number of rows participating in the join are lesser than "Adaptive Row Threshold" rows, then "Nested loop Join" operator is used for join operation
Consider the following query:

SELECT  [fo].[Order Key], [si].[Lead Time Days], [fo].[Quantity]
FROM    [Fact].[Order] AS [fo]
INNER JOIN [Dimension].[Stock Item] AS [si]
       ON [fo].[Stock Item Key] = [si].[Stock Item Key]
WHERE   [fo].[Quantity] = 360;

Observe the picture below:

Query Optimizer fixes 68 rows as Adaptive join threshold.
Estimated number of rows for the outer table in the join was 213. As 213 is greater than 68, estimated join type is Hash Match Join. During runtime, 206 rows from outer table participated in the join and as it was greater than "Adaptive join threshold" of 68, Hash match join was selected.
Sharing another example where Adaptive Join switches the join type in runtime. Refer to pic below

In this case, threshold was 46 rows and estimate were 87 rows. However, runtime row count was just 36 rows hence switching the join type to Nested loop join from an estimate of "Hash Join"
 The ability to alter execution plan operator in runtime would make SQL Server's query optimizer weed out most of the incorrect plan choices during runtime.

Saturday, November 25, 2017

Tricks - Copy T SQL Result to Excel

Sometimes, some of the silliest of tricks save so much of time and effort. Very common scenario is to copy a query result to excel. Most take either of the 2 approaches

* Copy the result from result grid of management studio and paste to Excel as shown below


The problem with above approach is sometimes, when one pastes the result to excel, the formatting can go haywire, especially if the result contains carriage return (new line) and commas as shown below.


* Export to excel via export wizard

Little tedious and sometimes getting the mapping right can be a pain.

Over the years, the simplest and effective technique, I have used is running T SQL query directly from excel (Yes, you can do that :)). Not just that, it gives a cleanly formatted result all the time!!! Step by step instructions provided below

How to run TSQL Query from Excel:

Open Excel and move to data tab and click on the icon "Other Data Sources" and select "SQL Server" as shown below.


Enter the server to connect to and the credentials


Select the database against which you want to run the query. Pick a random table (doesn't matter if you are not going to query it. Any table is fine)


Click next and finish (we are not done yet :))



Click on properties



Uncheck background refresh check box



Click on the "Definition" tab and change the command type to "SQL" from "Table"



Paste the query you want to run on the "Command Text" box as shown below



Result would be a nice formatted excel sheet as shown below



From excel once can not only get select query results but stored procedures as well. Excel also servers as an excellent tool to query database when one needs to access the database on a machine that doesn't have SQL Server Management Studio. Hope the tip was useful!!!

Sunday, November 12, 2017

Adaptive Query Processing and Automatic Tuning - Introduction - 1


SQL Server 2017 and Azure SQL Database are being touted as the self learning, artificial intelligent database. One of the main reasons for the same is due to release of the following features

 1) Adaptive Query Processing

2) Automatic Tuning
Adaptive Query Processing is SQL Server's first attempt using artificial intelligence to fix poor performing queries due to incorrect cardinality estimates. Query optimizer produces better plans using "Adaptive Query Processing" which is enabled by default when one is on SQL 2017's compatibility mode (CE 140). Queries can benefit via Adaptive Query Processing in the following scenarios.

1) Adaptive Join -> Query optimizer makes a intelligent choice between Nested Loop and Hash Join in runtime

2) Memory Grant Feedback -> Query optimizer adjusts the memory granted to a query depending upon the previous run's execution results

3) Interleaved Execution -> Portion of the query is executed while generating the query plan to come with the estimates. Applies currently for Multi Statement Table Valued Functions alone

 The above scenarios and operators are likely to expand in upcoming releases or patches

 Other aspect of auto driven query tuning is "Automatic Tuning" feature. Automatic Query tuning helps in the following scenarios
 Automatic Tuning:

1) Force Last Good Plan: Attempts to fix query regression by detecting any sudden change in query performance due to plan change and forcing the last good performing plan. one can turn it on using the command below
ALTER DATABASE DBName SET AUTOMATIC_TUNING ( FORCE_LAST_GOOD_PLAN = ON );
2) Automatic Index creation and deletion: Feature available only in Azure SQL Database. By studying the workload, the database engine automatically deletes and creates indexes!!!
This is just a introduction post. As usual,will be writing a series of posts to cover the individual sections in detail.

Friday, October 27, 2017

SQL Azure Databases - Active Secondary - Failover Groups - Part 4

150th Blog Post . So, the post involves video demo :)

Previous posts on this series was about using "Geo Replication" to configure active secondary. However, it comes with few limitations like

* No transparent failover capabilities like Always on Listener

* No availability groups and group failover possible

So, Azure Failover Groups provides "Read Only" / "Read Write" connection URLs which allows applications transparent failovers. Read only and Read write workloads get different connection strings and upon failover, applications connect to the appropriate node automatically

Also, one can combine group of databases as a unit of failover and make them failover together as well, as done in Always on availability groups. 

Step by step instructions on configuring availability groups provided below

* Configure Primary Server, Secondary Server and Geo Replication as explained in http://strictlysql.blogspot.sg/2017/05/sql-azure-databases-active-secondary.html

* Click on the primary server ( not database ) and select failover groups


* Click on the add group and

          * Provide the failover group a name
          * Pick the existing database "db1" to be added to the failover group
          * Pick the existing secondary server
          * Click on create


* Once done, failover group is listed as shown below


* Click on the failover group, Read only and Read Write URLs can be seen as shown below


* To perform failover click on the failover icon. Post failover, failover group looks like the picture below


To make the understanding easier, I have recorded the video of the demo. Please take a look. Happy learning



Wednesday, September 6, 2017

Dynamic Threshold for Statistic Update - SQL Server 2016

By default, on a SQL Server Database, statistics are updated automatically when one hits a threshold of changes. If the Auto Update Statistics setting is turned on by default, the threshold for number of changes to trigger auto statistic update before SQL Server 2016 was

For tables with more than 500 rows - 20% of tables row count + 500 changes
For tables with less than 500 rows - 500 changes

However, in modern day times, the above formula may not be effective. 10 million row tables is pretty much a norm and not typically considered huge. The threshold of changes for 10 million row table would be 2,000,500 changes, which is a huge number of changes required to trigger auto stat update. This effectively implies, on larger tables one would be running without updated statistics almost all the time as the threshold is too high as the tables get bigger. To address this problem, SQL Server 2016 has a behavior called dynamic statistic threshold,  which is enabled by default.


As the table gets bigger, the threshold for statistic update reduces. For tables greater than 25,000 rows dynamic statistics is applicable. For example consider the following table with 1 million rows

Let’s update statistic before starting the task



Make note the updated stat time (12:12 AM)

Now let’s run the following script to make 100,000 changes (10%)

SET ROWCOUNT 100000
GO
/* SET Rowcount so that we can update just 100K rows */

Update Production.product_1M
SET StandardCost = StandardCost + 5
WHERE  Productid % 5 = 0
GO

Please note on older versions of SQL Server, by default, 10% changes wouldn't suffice to trigger auto stat update.


Now let’s check if statistics are updated. A select is query has to be fired on the table to trigger auto statistic update.


Surprisingly they are updated as the statistic updated time has changed to 12:15 AM.

Though there is no official exact formula how statistic update has been triggered, following pic shows how the threshold percentage reduces with row count of the table.


Picture referred from the following blog: https://blogs.msdn.microsoft.com/saponsqlserver/2011/09/07/changes-to-automatic-update-statistics-in-sql-server-traceflag-2371/

Dynamic Statistic update threshold is not a new feature and has been available since SQL Server 2008 via the trace flag 2371. Meaning, one can enable "Dynamic Statistic update threshold" on SQL Server 2008 and after by turning the trace flag 2371 on. On SQL Server 2016 and after, "Dynamic Statistic update threshold" has been made a default behavior.

Sunday, September 3, 2017

DBCC CloneDatabase for SQL Server 2012

Dear All,

In case you are wondering, title of the post is not a typo :) DBCC CloneDatabase is going to be available for SQL Server 2012 :)

For folks who don't know, DBCC CloneDatabase is a feature introduced in SQL Server 2014 SP2 for creating a skeleton copy of the database. When I mean skeleton copy, I mean without the data but the exact schema, table structure, index, constraints, file and file groups and the best of all - table and index statistics. Using DBCC CloneDatabase, one can actually get a exact copy of the database ( including actual statistics ) but without the data. This is extremely useful in performance troubleshooting as one can clone a production database using DBCC CloneDatabase and backup the cloned database and restore to DEV / UAT to get estimated plans. SQL Server 2016 onwards Querystore's data and reports are also included to the cloned database. One can read more about DBCC CloneDatabase over here

Coming back to the title of the post, Microsoft product team will be extending DBCC CloneDatabase to SQL Server 2012 as a part of SQL Server 2012 Service Pack 4. SQL Server 2012 Service Pack 4 ( SP4 ) is due to be released in September 2017. A excellent move by product team to extend cloned database SQL Server 2012. Check out the official announcement of SP4 over here. More importantly, observe the comments section :) 

Monday, August 28, 2017

Data Channel Interview - 08 - Leila Etaati on Machine Learning using Microsoft BI

Dear All,

Very happy to share the 8th interview of #DataChannel. Dr. Leila Etaati MVP, MCT from New Zealand discusses Machine Learning using Microsoft Business Intelligence tools in this interview. Leila gives us an insight into Machine Learning process and explains how Azure ML, Power BI, R and SQL Server can be used in it. Thanks Leila for the informative interview




Friday, June 9, 2017

SQL Azure Databases - Active Secondary - Failover - Part 3

Continuing on the series on SQL Azure Databases - Active Secondary, this post will explore the task of failing over to secondary.

Fairly straightforward task, just one needs to take care of few items.

Step 1: Login into portal.azure.com and click on Databases. The primary and secondary databases are listed



Step 2: Click on the primary and then click on "Geo Replication"



Step 3: Scroll down and click on the secondary as shown below


Step 4: Click on Failover as shown below


Step 5: Click "Ok" to confirm Failover



Step 6: The dotted lines between primary and secondary, showing the failover progress, turn to solid line upon completion of failover


Step 7: After failover, the replication role changes confirming the successful failover as shown below



Step 8: Secondary server accepts write operations as shown below


Few important things to know

This failover is like a DB Mirroring failover. This means application will need to change their connection strings to secondary server upon failover. Unlike, Always on Availability groups, azure geo replication explained above, doesn't support transparent application failover.

However, Azure Failover Groups (right now in preview) allows transparent failover for applications. More about that on next post :)

Saturday, May 27, 2017

SQL Azure Databases - Active Secondary - How to connect? - Part 2

As a continuation of "SQL Azure Databases - Active Secondary" series started earlier, this post will cover how to connect to a azure SQL Database primary and secondary after configuring "Geo Replication".

To some, it may sound like "what is the big deal in connecting to, after configuring it". If you are new to azure, it can take a while for you figure out how to connect and at sometimes it can turn out to be one of those annoying simple things, which refuse come off in easily :)

As described in the earlier post, we have configured the following

* Database called "rajprimary" on "rajprimarysvr" server
* Geo replicated database copy of "rajprimary" on "rajsecondary" server  


To connect to the primary or the secondary server, follow the steps provided below

Configure the firewall:


* Login to your portal.azure.com

* Click on SQL Databases on the left 





* Click on the primary / secondary databases ( as shown under replication role), depending upon the one you would like to connect to


* Click on set firewall as shown below


* Firewall settings are turned on and IP Addresses that are listed alone are allowed to access. Click on "add client IP" for allowing the machine that you are using, to connect to the azure database. 



* You may add other client IP Addresses to, if they are to connect to the database. Click on save after adding.



Connect to the database

* Start sql server management studio on the machine you are logged on, and connect to ".database.windows.net". In this case it would be rajprimarysvr.database.windows.net

 if you are not sure of your database servername, you may look at by clicking on "sql databases" -> "databasename" -> "Properties"

 * Provide the user account and password used while setting up the server

Upon connecting, one would notice that only database related components are visible. Server related components like SqlAgent, Server Error logs are not seen as we are using "Sql database" service on azure.


Same steps can be repeated for secondary server too. More on Azure - Geo replication in upcoming posts. 

Thursday, May 11, 2017

Find the number of rows for a value, without querying the table

Dear all,

Being a DBA, developers approach me with questions like can you run this query for me in production?

Select *
FROM
[dbo].[FactInternetSales]

WHERE UnitPrice = 32.6

Lets, say the table contains few billion rows and UnitPrice is not an indexed column, then query would take ages to provide the result. Not just that, it causes a massive I/O on a busy production database.  

For the above situation, I would usually ask the developer, "Would you need a accurate value or is it ok if it is approximate?". If approximate numbers are fairly sufficient then, one can answer these type of questions without actually querying the table. How?

Simple - STATISTICS is the word :)

To find the statistic that will be useful for the query, please use the following script and provide the table name and column name. The script provides the statistic name we should be checking.

Declare @table_name varchar(100)
Declare @Column_name varchar(100)

SET @table_name = 'FactInternetSales'
SET @Column_name = 'UnitPrice'

SELECT OBJECT_NAME(s.object_id) AS object_name, 
    COL_NAME(sc.object_id, sc.column_id) AS column_name, 
    s.name AS statistics_name 
FROM sys.stats AS
JOIN sys.stats_columns AS sc 
    ON s.stats_id = sc.stats_id 
AND s.object_id = sc.object_id 
WHERE  
OBJECT_NAME(s.object_id) like @table_name
and    
COL_NAME(sc.object_id, sc.column_id) like @Column_name
ORDER BY s.name;

After finding the statistic name, just go to the table, expand "Statistics" and double click on the stat that you are interested in. The click on the details section, find the value interested in on "Range_HI_Key" Column and observe the "EQ_Rows" section to get the approximate number of rows. 



Most of you would know that statistics can be used to find query estimates but few (I guess) would use it operationally for these kind of requests. One can also use them to estimate for queries of this type too

Select count(*), 
ProductKey 
from [FactInternetSales]
Group by ProductKey

Select count(*) from
[FactInternetSales]
WHERE UnitPrice > 40  -- ( Possible but can get little tricky at times, more of it on upcoming posts :) )

Few quick pointers:

The above method relies on the following prerequisites

* Either Auto create Stats should be turned on or the column should be the leading column of any index for the statistic to be present. Auto Create Stats are on by default

* For the stat to be reasonably accurate, one needs to have Auto Update Statistics on ( which is also "on" by default )

* Automatically created statistics are named like '_WA%'. For a column to have a auto created stat, the column should have been queried at least once since database creation. Indexed column would not need auto created statistics

* To figure out when was the statistic last updated, use the following query 

SELECT 
sp.stats_id, name, filter_definition, last_updated, rows,
rows_sampled, steps, unfiltered_rows, modification_counter  
FROM sys.stats AS stat  
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp 
WHERE stat.object_id = object_id('people_Data')
and name like '_WA_Sys_0000000D_1273C1CD';

If the value you are looking for doesn't appear as a "Range_HI_Key" on the histogram then refer to the next biggest value on the "Range_Hi_Key" and observe "Avg_Range_Rows" for the estimates. Please refer to this post http://strictlysql.blogspot.sg/2016/01/sql-server-statistics-histograms.html for interpreting histograms.

There is always more to statistics. Will cover them in upcoming posts.