Saturday, December 31, 2016

Data Channel - Interview 02 - Hamid Fard on SQL Server Memory Capacity Planning

Dear All,

Welcome to the second interview of Data Channel. Very happy and proud to have Mr Hamid Fard, MVP and MCM from Malaysia for Data Channel. Hamid discusses SQL Server Memory Capacity Planning on Data Channel. Discussion covers topics like SQL Server Memory sizing, AWE, Plan Cache, In Memory OLTP Capacity Planning and Buffer Pool Extension.

Interview Video provided below:

Excerpt from the interview provided below.

What is Capacity Planning? What does one needs to know?

While managing SQL Server, DBA's need to estimate and come up with the sufficient memory required for database instance. There have been many cases where the database server was setup and released to production and upon analysis it was found that the memory available for the server was insufficient and this resulted in poor performance. So, DBA's should plan and carefully estimate the right memory required for SQL Server.

There is a common complaint that SQL Server uses all the memory in the server. How does SQL Server use the memory?

SQL Server is a memory hungry and one shouldn't be scared if it does so!!! SQL Server loads all the data into the memory for all its queries and then the results are returned to client. If we have lot of memory, more data can be placed on the memory and that improves the performance of queries

Does SQL Server ever release the memory to Windows?

Yes. Lazy writer process scans the buffer pool when the SQL Server or Windows operating system is under memory pressure, and releases the unused pages in buffer pool. The least recently used memory pages are released by Lazy writer. If the pages are dirty ( unwritten to disk ), then they are written to disk before being removed from memory.

Is there any guideline for setting the maximum memory?

Ones needs to come up with a formula based on the Windows Server & Edition and SQL Server Version & Edition. A formula which I will be explaining, will help in coming up with maximum memory setting. 

Maximum memory setting allocates only for the buffer pool. One needs to estimate the memory required for SQL MGR, memory used by worker and threads, columnstore cache, plan cache and reduce the same from available physical memory to derive the maximum memory setting for SQL Server.

Can you please share the formula used? Has that been documented anywhere by you? Would be useful for many.

Yes. I have blogged the same.  Please refer to my blogpost below where I have written about it in detail 

Is there a recommended method to set the maximum memory?

Please refer to the blog written by me provided below

Blog post clearly explains how to size the memory for the following items

1) Reserve memory for windows
2) SQL Server DLLs
3) MTL - SQL Server Memory To Leave area
4) Plan Cache
5) Buffer Pool 

Is there a need to size differently for OLTP and OLAP systems? Why ? 

OLAP systems involves more of "Select" queries and hence one needs to have bigger buffer pool to cache more data pages. Otherwise, OLAP systems will have lots of disk IO.
OLTP Systems normally comprises of insert statements, random I/Os, smaller transactions and don't have huge select statements. As the types of workloads are different, a slightly different formula for different systems. Some OLTP systems also have huge reports causing massive select statements, but the formula can serve as a general guideline.

Buffer Pool Extension ( BPE ) was introduced in SQL Server 2014. How do you think it helps in "Memory Capacity Planning"? 

Buffer Pool Extension is good for companies using old hardware but running on SQL Server 2014, where one is unable to upgrade the memory. Buffer Pool Extension is normally used with SSD disks, but SSD disks response rates are in micro seconds and not nano seconds as it is in usual memory. So, BPE doesn't give a drastic performance gain.

My suggestion is to use it as last resort and ideally one should look to upgrade the RAM. Even if one decides to use BPE, suggestion is to use it with faster disks like PCI RAM disks

Another major technology to release in SQL Server 2014, is "In Memory" OLTP. How does one size if the database uses "In Memory OLTP" tables?

"In Memory OLTP" is one of the good features to have released in 2014 and improved further in SQL Server 2016. However, one needs to carefully size them to achieve the performance gain.

Are there been scenarios where "In Memory OLTP" actually resulted in negative performance?

Yes. If one doesn't size the memory carefully, it can even result in negative performance. Refer to the formula provided below.

One needs to calculate 4 different areas

1) Table & Row Data - Table data size is the sum of timestamp size, In memory Index Pointers size and actual data size ( row size )

   Begin & End Timestamp - 24 bytes

   In memory index pointers - 8 + 8 + 8 + 8 ( because 8 bytes      per index and there are 4 indexes )

   Data Size - 200 bytes in this example

So the per row table data for this example is 256 bytes. 

2) Hash Index - Size of hash index depends upon bucket count. It is as follows

( Bucket Count rounded up to nearest power of  2) * 
( Index Pointer ) * ( number of hash indexes )

     In this case - 5,000,000 nearest power of 2 is 2^23. So,            2^23 * 8 * 3

3) Memory for Non Clustered Index -  Derived using indexed column ( key column ) size and number of unique values on the column

 Index Pointer + Sum( Key Column Data Size ) * Unique Rows

4) Memory for Row Versioning - In memory OLTP uses row versioning which implies for every operation, the older rows are copied and changes are applied on the latest version. So, the formula is 

       Longest Transaction Duration per Second * Peak rows updated / deleted * Row Size 

So, for the above example, for a table that has a maximum 30% of its rows changing at a time, would require 3.6 GB of memory even though the table size is 1.2 GB.

Ideally, In memory OLTP tables are best suited for tables with insert only and fewer changes ( Deletes / Updates ) on it.

SQL Server 2016 SP1 announcement saw quite a few changes to SQL Server features availability across edition. How do you see the changes that are related to memory?

I was shocked(pleasantly) to see the changes. The big change was making columnstore and "In Memory" feature available on Standard and lower editions. The memory used by Columnstore is
capped to 32 GB and is outside of buffer pool.Hence one needs to calculate the memory for column store and then substract the same from physical memory to come up with the maximum memory 
required for buffer pool.

"In Memory" OLTP tables have a maximum of 32 GB of memory per database which is also outside buffer pool. One needs to take that into account as well while sizing
the memory.

What are the warning signs if one fails to configure memory correctly or downsizes it? 

Internal memory pressure is experienced if insufficient memory is sized. It can be felt via

1) Stolen Pages
2) Plan cache issues
3) Not enough buffer pool memory to keep all the data

Another thing one may want to check is unnecessary or unused "Non Clustered Indexes". As the indexes are updated, they are loaded to memory as well. So, unused indexes not just add up to the disk usage but also waste buffer pool space.

Edit from Nagaraj: Some of the content is debatable as indicated in the comments. Viewers discretion is advised

Monday, December 26, 2016

Query Store- Part 5: Checking forced plans and Query Store Failures

Most of you would know that using query store one can force query plans. Is there a need to check what plans are forced? Why?

Fundamentally, two important reasons for checking forced plans. 

1) To check if the query is yielding the desired performance?. The performance needs to be obviously better than the traditional query performance in most of the cases.

2) To look for query store failures while forcing plans.

This post will deal with point 2, query store failures while forcing plans.

What are query store failures?

Query store failures are events where query store is unable to force the plan it is supposed to. 

For example, - Let's say you have set the query store to force the index "NCIX_dt_present" for a particular query, and assume you have dropped the index, then query store does the following

1) Ignores the plan to be forced and picks up the next best query plan of optimizer's choice

2) Marks the failure to force plan on query store on the table "sys.query_store_plan" in the column "force_failure_count" 

For example,

Consider the following query

Select session_id,cpu_time,dt,logical_reads 
from [dbo].[Process_monitor]
WHERE dt between '20150715' and '20150715 00:15'

Refer to the screenshot from query store

Plan 99 uses "NCIX_dt_present"( a non clustered Index ) for the query

To ensure that the query always uses the Index "NCIX_dt_present", lets force the plan 99 on query 92 as shown below.

Run the select query few times and verify if plan is getting forced. Now lets drop the index

DROP INDEX Process_monitor.NCIX_dt_present

Re run the select query again few times

Select session_id,cpu_time,dt,logical_reads 
from [dbo].[Process_monitor]
WHERE dt between '20150715' and '20150715 00:15'

List of query store failures can be found from the query below

CAST(query_plan AS XML) AS 'Execution Plan',
rs.first_execution_time, rs.last_execution_time,
FROM sys.query_store_plan qp
INNER JOIN sys.query_store_query q
ON qp.query_id = q.query_id
INNER JOIN sys.query_store_query_text qt
ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_runtime_stats rs
ON qp.plan_id = rs.plan_id
WHERE qp.is_forced_plan = 1
AND qp.force_failure_count > 0
Order by rs.last_execution_time

qp.force_failure_count indicates the number of failures of query store to force plan. qp.last_force_failure_reason_desc" gives the reason why plan couldn't be forced. 

It is strongly recommended to track query store failures and forced plans to avoid unexpected performance results.

Thursday, December 22, 2016

R and SQL Server 2016

Yesterday, I delivered another session for Singapore SQL PASS Chapter. The session titled "R and SQL Server 2016" was giving an overview on "R" integration with SQL Server 2016 and also demonstrated how database administrators can apply "R" in their day to day operations. Video provided below. Thanks to Engineers.SG team for recording the same.

Sunday, December 11, 2016

My Interview at SQL Server Geeks 2016

One of the biggest highlights of this year for me was speaking at SQL Server Geeks Annual Summit 2016. In the summit, I gave an interview to the SQL Server Geeks team on my summit experience. Video provided below

Thursday, December 8, 2016

Query Store - Part 4 - Find the plans and queries forced by Query Store

As most of you know, one of the most fundamental uses of query store is to force query plans. Unlike query hints, Query store doesn't demand code a change nor it is complex to implement like plan guides. One can force a plan without make a query / code change and it is pretty simple to achieve the same via user friendly GUI. To figure out how to force a plan, please refer here

But, one of the important things to do, after forcing plans using query store is to track the plans that are being forced. One can do that using the following query

CAST(query_plan AS XML) AS 'Execution Plan',
rs.first_execution_time, rs.last_execution_time,
rs.avg_query_max_used_memory, qp.force_failure_count
FROM sys.query_store_plan qp
INNER JOIN sys.query_store_query q
ON qp.query_id = q.query_id
INNER JOIN sys.query_store_query_text qt
ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_runtime_stats rs
ON qp.plan_id = rs.plan_id
WHERE qp.is_forced_plan = 1
Order by rs.last_execution_time

The reasons why one needs to track forced plans are interesting. That will be covered in the next post :)

Saturday, December 3, 2016

Data Channel - Interview 01 - Andreas Wolter on Columnstore

Dear all,

Welcome to the first ever interview of DataChannel. I am honored to have Mr.Andreas Wolter, a German MVP, MCM, MCSM for the first interview. Detailed profile of Andreas provided below.

Andreas Wolter has over 16 years of experience with SQL Server and earned the Master-Certification both on SQL Server 2008 (MCM) and SQL Server 2012 (MCSM)– as one of only 2 experts worldwide. Besides that he has also been awarded with the MVP for SQL Server since 2014.

He is the founder of Sarpedon Quality Lab, originally from Germany, specializing in Development and Optimization of SQL Server Database- and Datawarehouse-architectures with focus on performance, scalability and security.


Topic of discussion is Columnstore Index. 

Interview video recording provided below:

Excerpts from the Interview provided below:

 What is Columnstore Index, how is it different from traditional Index?

ColumnStore Index is supposed to be optimized for access of large volume of data in the range of millions of rows. Columnstore index is optimized for single column access queries which are common in datawarehouse style applications. Columnstore Indexes design combined with underlying compression helps in fetching large volume of data at a much faster pace from OLAP style queries. Columnstore index has improved so much more in SQL Server 2016 to make it much more flexible.

What are the standard gains for applications using columnstore index?

First, application workload needs to be suitable to columnstore index.  The major gains would be

Compression: If the application is a typical datawarehouse application with "star schema" design, with 20 million or more rows, one is likely to receive 50% data compression by using columnstore index. Effectiveness of compression also depends on the data type of columns

Speed: On the speed front, it can be double or triple times or sometimes even 100 times faster, depending upon how the query has been designed and how the query was performing previously.

 How does SQL Server make such a improvement possible? What is the architecture of columnstore that makes it possible? 

It is a fundamental change in the data storage making it possible. Columnstore is hugely compressed resulting in massive reduction of I/O. Even today, in the era of SSDs, disk IO is the slowest part of the system. Compression in columnstore does provide half of the improvement in columnstore. And of course, there is more to it too.

How does one pick the tables that would benefit from columnstore? How does one design the application to leverage upon the benefits of columnstore?

Ideally datawarehouse style applications would benefit. Tables running into several million rows and wide tables with many columns (for ex 50 columns) would benefit from columnstore index
The width and length table need to be huge to benefit via columnstore index. The type of queries commonly executed by the application is also important. Columnstore would help queries fetching larger number of rows and wouldn't make a difference on queries reading 1 or 2 rows. Especially queries which involve aggregation functions with group by clause reading lots of rows would benefit the most.

How are the columnstore indexes stored on the disk? Are they differently stored compared to traditional Indexes?

Totally different. Only thing that is common is 8KB Pages for storing data. Unlike traditional Index pages which contain data of multiple fields and multiple data types, columnstore stores only one type of column data on a single page (One data type). These pages are organized as segments and each segment contains data from just one single column. This makes large compression possible.
Second part is, when fetching the data, if the query is selecting only 5 columns, column store reads the only the segments that contain the 5 columns. The traditional index would be forced to read all the columns as each page contains all the columns of the table or index.

Columnstore Index is one of those technologies which has improved a lot since 2012. Can you drive us thro the lifecycle of improvements it has had till 2016?

Columnstore was actually first introduced in SQL Server 2008 in the Parallel Data warehouse edition of SQL Server. 
In SQL Server 2012, once the nonclustered columnstore was put on the table, the table was write protected and supported only read only queries. Due to this limitation it very few people adapted or started using it.

In SQL Server 2014, clustered columnstore was introduced. Clustered columnstore by design contains all columns of the table. Clustered columnstore was updatable. However, Non clustered columnstore was still read only.

In SQL Server 2016, both clustered and non clustered column store indexes are updatable. However, one can't mix columns on clustered and non clustered indexes. Meaning, one can either have a clustered columnstore index or a non clustered columnstore index on the same table.

Andreas has prepared a Index Cheat Sheet to showcase the various combinations with Indexes. Please check it out over here

We understand that clustered columnstore index would contain all the columns of the table. Do I still need traditional indexes?

Yes. Good question. You will still need them if you have queries which pick smaller set of data. Traditional non clustered indexes are required for single row or fewer row lookups. Fetching lesser number of rows or specific rows from compressed segments would be harder and traditional indexes would help those requests.

What is Real Time Operational Analytics?

Let’s split this into 2 parts - Analytics is what we have discussed using all along using columnstore. Real time operational data refers to the data that is constantly changed via inserts / updates / Deletes. 
Real time operational analytics refers to the same table being used for two kinds of workload - both regular operations like insert / update /deletes and also datawarehouse style analytic queries. This has been made possible in SQL Server 2016 via the "Delta" store in columnstore indexes. "Delta Store" is an area of columnstore, which contains the hot data or recently changed data in an uncompressed format. This combination of static data on compressed segments and hot data on "Delta" Store makes real time operational analytics possible in SQL Server 2016.

The other buzzword is In Memory OLTP. Can it be combined with columnstore? How would applications benefit out of these two technologies?

The other much improved technology of SQL Server 2016 is in "In Memory". Yes, In SQL Server 2016, one can combine the columnstore technology and In memory table, which gives faster access to real time data. "Real Time Operational Analytics" with "in memory OLTP", makes the access to operational data lock free, latch free and also let analytic workload benefit via columnstore technology

What are the warning signs in using columnstore? What are the "Dont's" while using columnstore?

The queries/ workload used should suit the design of columnstore index. 
1) Application with most of the queries using "select * " or many columns may not benefit via columnstore index
2) Database schema design also matters. Shouldn't be a too complex schema design like snowflake / galaxy schema design. Datawarehouse applications using star schema design are most suitable
3) Functions or queries used also matters - For ex: - functions like min / max are suitable while commands like "distinct" are unsuitable for columnstore