Thursday, May 11, 2017

Find the number of rows for a value, without querying the table

Dear all,

As a DBA, I am often approached by developers with questions like, "Can you run this query for me in production?"

SELECT *
FROM [dbo].[FactInternetSales]
WHERE UnitPrice = 32.6

Let's say the table contains a few billion rows and UnitPrice is not an indexed column; the query would take ages to return a result. Not only that, it would cause massive I/O on a busy production database.

In such situations, I usually ask the developer, "Do you need an accurate value, or is an approximate one OK?" If approximate numbers are sufficient, one can answer these types of questions without actually querying the table. How?

Simple - STATISTICS is the word :)

To find the statistic that will be useful for the query, use the following script, providing the table name and column name. The script returns the name of the statistic we should check.

DECLARE @table_name varchar(100)
DECLARE @Column_name varchar(100)

SET @table_name = 'FactInternetSales'
SET @Column_name = 'UnitPrice'

SELECT OBJECT_NAME(s.object_id) AS object_name,
    COL_NAME(sc.object_id, sc.column_id) AS column_name,
    s.name AS statistics_name
FROM sys.stats AS s
JOIN sys.stats_columns AS sc
    ON s.stats_id = sc.stats_id
    AND s.object_id = sc.object_id
WHERE OBJECT_NAME(s.object_id) LIKE @table_name
  AND COL_NAME(sc.object_id, sc.column_id) LIKE @Column_name
ORDER BY s.name;

After finding the statistic name, go to the table in SSMS, expand "Statistics" and double-click the statistic you are interested in. Then click on the "Details" section, find the value you are interested in under the "RANGE_HI_KEY" column, and read the "EQ_ROWS" column to get the approximate number of rows.
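If you prefer a query over the SSMS UI, the same histogram can be read with DBCC SHOW_STATISTICS. A minimal sketch (the statistic name below is a placeholder; use the name returned by the script above):

DBCC SHOW_STATISTICS ('dbo.FactInternetSales', '_WA_Sys_00000007_0425A276')
WITH HISTOGRAM;
-- Locate the row where RANGE_HI_KEY = 32.6 and read EQ_ROWS
-- for the approximate number of matching rows.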



Most of you would know that statistics can be used for query estimates, but few (I guess) use them operationally for these kinds of requests. They can also be used to estimate results for queries like these:

SELECT ProductKey, COUNT(*)
FROM [FactInternetSales]
GROUP BY ProductKey

SELECT COUNT(*)
FROM [FactInternetSales]
WHERE UnitPrice > 40  -- possible, but can get a little tricky at times; more on it in upcoming posts :)

A few quick pointers:

The above method relies on the following prerequisites:

* Either "Auto Create Statistics" should be turned on, or the column should be the leading column of an index, for the statistic to exist. Auto Create Statistics is on by default.

* For the statistic to be reasonably accurate, "Auto Update Statistics" needs to be on (also on by default).

* Automatically created statistics are named like '_WA%'. For a column to have an auto-created statistic, the column must have been queried at least once since database creation. Indexed columns do not need auto-created statistics.

* To figure out when the statistic was last updated, use the following query:

SELECT sp.stats_id, stat.name, stat.filter_definition, sp.last_updated, sp.rows,
       sp.rows_sampled, sp.steps, sp.unfiltered_rows, sp.modification_counter
FROM sys.stats AS stat
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
WHERE stat.object_id = OBJECT_ID('people_Data')
  AND stat.name LIKE '_WA_Sys_0000000D_1273C1CD';
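To verify the first two prerequisites, the database-level settings can be checked from sys.databases:

SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
FROM sys.databases
WHERE name = DB_NAME();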

If the value you are looking for does not appear as a "RANGE_HI_KEY" in the histogram, refer to the next larger "RANGE_HI_KEY" value and use "AVG_RANGE_ROWS" for the estimate. Please refer to this post http://strictlysql.blogspot.sg/2016/01/sql-server-statistics-histograms.html for interpreting histograms.

There is always more to statistics. Will cover them in upcoming posts.


Sunday, May 7, 2017

SQL Azure Databases - Active Secondary - How to configure? - Part 1

Dear all,

I have taken the Azure dive at last :)
In case you are new to Azure: Azure is Microsoft's cloud platform, offering lots of services for handling data platform tasks. There are tons and tons of material on it, and you can get a good introduction from the following links - here and here

This post is about SQL Azure Database - a PaaS service that allows one to host just a database in the cloud - and a feature of it called "active secondary". Geo Replication - Active Secondary is a feature of SQL Azure's Premium service tier (similar to our Enterprise edition, but certainly not as costly :) ), which synchronizes data asynchronously to another server (either in the same geographical location or a different one) and allows one to run read-only queries against it.

In my opinion, the specifications below are perhaps the most common requirements of modern-day mid-size applications:

  • A database with good performance to store and retrieve data
  • A simple disaster recovery / high availability solution
  • A readable secondary, if possible

All of this at a really affordable cost (will discuss that part soon), in just a few clicks, is what makes SQL Azure tick. This post will run through the details of how to configure it.

Prerequisite:

1) Grab a Microsoft Azure subscription :)

 Demo :

Now for the step-by-step "how to" screenshots :)

1) Log in to portal.azure.com, click on "SQL Databases", then click "Create SQL Database"




2) Provide the server name. Though one pays only for the database, a server must be created to host it. The server is maintained by Azure, and no charges apply for the server when using the "SQL Azure Database" service.



 


Provide unique names for:

·         Database name
·         Resource group (more about it later)
·         Server name
 
 3) Click on "Configure required settings" and provide the server details as shown below
 
 
4) Set the service tier to "Premium"

One needs to pick Premium to have a readable secondary
5) Click on "Create database" to complete the primary setup. Creation takes a few minutes

 
 
6) Configure Geo Replication - secondary

Once the notification bar indicates that the primary has been created, click on "SQL Databases" to find the newly created primary database.
Pick "Geo Replication" as shown below


7) Configure Geo Replication

The screen below allows one to pick the geographical location of the secondary server. For lower latencies, locations closer to the primary are preferred

 

8) Once a location is picked, one is prompted to fill in the rest of the secondary server's details
 
 
 9) Create a secondary server similar to the primary, providing details such as the server name and admin login name
 
 
10) Create the secondary server


Upon clicking OK, the primary and secondary start to synchronize as shown below

 
11) Synchronization completion
The tick marks indicate synchronization completion as shown below

Synchronization can take a few minutes or longer depending on the data size. Once complete, click on "SQL Databases" in the Azure Portal to verify, as shown below
 
 
With that, the setup of an active secondary on a SQL Azure database is complete. Pretty simple, as it takes only a few clicks and has almost zero prerequisites. A similar on-premises setup would take days to months in most places.
That's a lengthy first post on SQL Azure - Active Secondary. In subsequent posts, I will cover how to connect to the primary and secondary, failover, monitoring, how applications can benefit from the readable secondary, and of course the cost involved.
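For those who prefer scripting over the portal, the same steps can be sketched with the Azure CLI. The resource group, server, database and admin names below are all placeholders, and exact parameters may vary with your CLI version:

# Create the secondary server in another region (names are placeholders)
az sql server create --name myserver-secondary --resource-group myRG \
    --location southeastasia --admin-user myadmin --admin-password '<password>'

# Start geo-replication: create a readable secondary of the primary database
az sql db replica create --name myDB --server myserver-primary \
    --resource-group myRG --partner-server myserver-secondary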

 

Thursday, May 4, 2017

Data Channel Interview - 07 - Steve Knutson on Managing Sharepoint Databases


7th Interview of #DataChannel is out :)

Very happy to have Mr. Steve Knutson, Office Servers and SharePoint MVP, share his knowledge on "Managing SharePoint Databases" on #DataChannel. Steve demystifies quite a few items for SQL Server DBAs managing SharePoint databases, like "What are the dos and don'ts for SharePoint databases?", "Why are SharePoint DBs huge?", "How to size SharePoint databases", "What are RBS / FILESTREAM / blobs in SharePoint databases?" and many more.

Steve is to the point in his answers and, in my opinion, the interview is very useful for database administrators managing SharePoint databases. Happy watching!!!