I've been teaching and writing about common SQL Server mistakes for many years. I wrote a blog post about them years ago too; however, as time has marched on, guidance has changed a bit. This article expands on my previous article and points out how these mistakes apply to SQL Server, Azure SQL Database, and Azure SQL Managed Instance.
For many years I've found users making the same mistakes. I call them mistakes; however, in most cases it is more a matter of things not being done properly because the people managing the environment don't know any better. Here are some of the more critical items that anyone installing and supporting SQL Server should know about:
- Backups
- DBCC CHECKDB
- Memory settings
- Statistics
- Index maintenance
- MAXDOP and cost threshold for parallelism
- SQL Server Agent alerts
Backups
I always check backups first when looking at a new system. Having proper backups to meet recovery objectives is critical. Data loss can be detrimental to an organization. When looking at backups, I check the recovery model and the backup history for each database. I usually find a combination of the following:
- No backup at all – no record of any backup for the database
- Missing backups – no log backups for a database using the full recovery model
- No recent backups – last backup was weeks/months/years old
Misconfigured backups are detrimental to an organization when a recovery situation comes up. Having to tell customers that they've lost data is never fun or easy. Having proper backups to meet SLAs should be any organization's top priority, in addition to making sure copies of these backups are stored in a secondary, offsite location.
This situation applies to on-premises SQL Server and IaaS. Azure SQL Database and Azure SQL Managed Instance have managed backups.
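As a rough illustration, the three findings listed above can be expressed as a small classification routine. This is a minimal Python sketch, not a monitoring tool: the function name, its inputs, and the 7-day cutoff for a "recent" full backup are all assumptions made for the example. In practice the recovery model and backup dates would come from the server's own metadata.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def backup_findings(recovery_model: str,
                    last_full: Optional[datetime],
                    last_log: Optional[datetime],
                    now: datetime,
                    max_full_age: timedelta = timedelta(days=7)) -> List[str]:
    """Return the backup problems found for one database.

    max_full_age is a hypothetical cutoff; use whatever your
    recovery objectives actually require.
    """
    findings = []
    if last_full is None:
        # No record of any backup for the database.
        findings.append("no backup at all")
    elif now - last_full > max_full_age:
        # The last backup is older than the agreed cutoff.
        findings.append("no recent backups")
    # Log backups are expected only under the full recovery model.
    if recovery_model.upper() == "FULL" and last_log is None:
        findings.append("missing log backups")
    return findings
```

An empty list means none of the three problems was detected for that database. It does not prove the backups are restorable.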
DBCC CHECKDB
Unfortunately, database corruption happens. Without regularly checking for corruption, customers can find themselves in a bad place, with no backups to recover from when that corruption affects the physical data. To check for corruption, DBCC CHECKDB should be run against each database on a regular basis. What I find is very similar to what I find with backups:
- No DBCC CHECKDBs performed at all
- DBCC CHECKDBs being performed only on select databases
- DBCC CHECKDBs last performed months or years ago
The worst case is a scheduled job that is reporting failed DBCC CHECKDBs.
It is never pleasant finding corruption, or having a customer reach out with a corruption issue, when the corruption is in a heap or clustered index and there are no backups from before the corruption occurred. In these cases, the corruption is in the actual data, and restoring from before the corruption is, in most cases, the only option. In cases where the corruption is in a nonclustered index, rebuilding the index is the fix.
In a few situations, I've had to work with customers who have nasty corruption without proper backups where I've been able to script out the database and manually copy all the usable data into a newly created database. These costly situations can be easily avoided by running DBCC CHECKDB and having proper backup retention.
I advise customers to run DBCC CHECKDB on on-premises SQL Server, IaaS, Azure SQL Database, and Azure SQL Managed Instance. Azure does a great job checking for physical corruption; however, I feel that consumers need to be checking for logical corruption.
Memory Settings
A default installation of Microsoft SQL Server has the minimum server memory value set to 0 and the maximum server memory value set to 2147483647 MB, which is 2 petabytes. Prior to SQL Server 2012, the maximum server memory value only applied to the buffer pool, so customers needed to limit the amount of memory the buffer pool could use to save memory for the operating system and other processes. SQL Server 2012 introduced a memory manager rewrite so that the maximum server memory value applies to all SQL Server memory allocations.
It is highly advisable to set a maximum value for your SQL Server instance. Jonathan Kehayias has written a blog post, "How much memory does my SQL Server actually need?", with a formula that helps establish the baseline for the maximum memory value. In cases of a shared SQL Server, I recommend that my clients set the minimum value to 30% of the memory on the server.
In situations with multiple instances or where the server is used for SQL Server, SSIS, SSAS, or SSRS, you need to evaluate how much memory those other systems need and reduce the maximum server memory value to allow adequate memory for the OS and the other services.
This issue is valid for on-premises, IaaS, and partially for Azure SQL Managed Instance. Managed Instance sets a max server memory value based on the deployed tier; however, when I tested resizing the environment, the max memory value was not dynamically changed. In that situation, you would need to manually update the value. This issue does not apply to Azure SQL Database.
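To make the arithmetic concrete, here is a minimal Python sketch of a baseline calculation. It assumes the rule of thumb from that blog post as it is commonly summarized: reserve 1 GB for the operating system, 1 GB for each 4 GB of RAM between 4 GB and 16 GB, and 1 GB for every 8 GB above 16 GB. Treat that reading, the function names, and the handling of boundary values as assumptions and check the post itself before relying on the numbers. The 30% minimum is the shared-server recommendation mentioned above.

```python
def suggested_max_server_memory_mb(total_ram_gb: int) -> int:
    """Baseline 'max server memory' in MB for a dedicated SQL Server.

    Assumed rule of thumb: 1 GB for the OS, plus 1 GB per 4 GB of RAM
    between 4 and 16 GB, plus 1 GB per 8 GB of RAM above 16 GB.
    """
    reserved_gb = 1.0
    if total_ram_gb > 4:
        reserved_gb += (min(total_ram_gb, 16) - 4) / 4
    if total_ram_gb > 16:
        reserved_gb += (total_ram_gb - 16) / 8
    return int((total_ram_gb - reserved_gb) * 1024)


def shared_server_min_memory_mb(total_ram_gb: int) -> int:
    """'min server memory' in MB at 30% of installed RAM (shared servers)."""
    return total_ram_gb * 1024 * 30 // 100
```

Under these assumptions a dedicated server with 64 GB of RAM reserves 10 GB, which leaves 55296 MB as the starting value for max server memory. This is a baseline to monitor and adjust, and it does not account for other instances or services on the same machine.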
Statistics
The query optimizer uses statistics to build execution plans. This means SQL Server needs statistics to be up to date so that the query optimizer has a better chance of building a good execution plan. By default, statistics are updated after 20% + 500 rows of data have been modified. That can take a long time on larger tables. Beginning with compatibility level 130, the threshold for statistics updates on large tables has been lowered. For SQL Server 2008 R2 – 2014, you could lower this threshold using trace flag 2371.
I regularly find that customers are not manually updating statistics, and even with the lower threshold, I've found that manually updating makes an environment more stable.
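To show how much the threshold changes on a large table, here is a minimal Python sketch comparing the two rules. It assumes the formula Microsoft documents for compatibility level 130 and later, the smaller of 500 + 20% of the rows and the square root of 1,000 times the row count. The function name is hypothetical, and the special cases for very small tables are ignored.

```python
import math


def auto_update_threshold(rows: int, compat_level: int = 130) -> float:
    """Approximate number of modifications before an automatic stats update.

    Ignores the special handling of very small tables.
    """
    classic = 500 + rows / 5  # 500 rows plus 20% of the table
    if compat_level >= 130:
        # The lowered threshold only wins on larger tables.
        return min(classic, math.sqrt(1000 * rows))
    return classic
```

For a table with 1,000,000 rows, the older rule requires 200,500 modifications, while the lowered threshold is roughly 31,623. For a table with 10,000 rows both rules give 2,500, which is why the change only matters on large tables.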
I recommend that customers use a third-party script to update statistics. Ola Hallengren has published a widely used Maintenance Solution for SQL Server. Part of that process is his Index Optimize procedure, which can take additional parameters to update statistics.
@UpdateStatistics
- ALL = update index and column statistics
- INDEX = update index statistics
- COLUMNS = update column statistics
- NULL = do not perform statistics maintenance (this is the default)
@OnlyModifiedStatistics
- Y = update statistics only if rows have been modified since the most recent statistics update
- N = update statistics regardless of whether any rows have been modified
I've found that customers who are using third-party products or scripts to perform index maintenance based upon the fragmentation level of the index are not considering that reorganizations do not update statistics like rebuilds do. Many of these third-party applications have options for updating statistics, just like Ola's Index Optimize procedure; you just need to turn them on.
Updating statistics applies to on-premises, IaaS, Azure SQL Database, and Azure SQL Managed Instance.
Index Maintenance
Performing index maintenance by removing fragmentation from your indexes is still important. Some retired documentation from Microsoft stated that index fragmentation can have a negative impact of 13% to 460%, depending on the size of the environment and the level of fragmentation. While hardware such as intelligent SANs, solid-state disks, and other advancements have helped speed things up, wasted space in an index can translate to wasted space in the buffer pool as well as wasted I/O.
Fragmentation occurs through regular operations such as inserts, updates, and deletes. To remediate this, proper index maintenance of rebuilding or reorganizing your indexes is needed. I again turn to Ola Hallengren for his Index Optimize script. Ola's script provides the ability to rebuild or reorganize based on the level of fragmentation and a minimum number of pages. Many third-party tools offer the same logic. SQL Server Database Maintenance plans prior to SQL Server 2016 only allowed you to rebuild or reorganize all indexes. Beginning with SQL Server 2016, you can specify similar logic based on fragmentation levels. Don't forget those statistics, though, if you are using smart logic based on fragmentation levels.
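The "smart logic" described above is essentially a pair of thresholds plus a minimum size. Here is a minimal Python sketch of that decision. The function name is hypothetical, and the default values of 5%, 30%, and 1,000 pages are commonly used starting points that I am assuming for illustration, not settings taken from any particular tool's configuration.

```python
def index_action(fragmentation_pct: float, page_count: int,
                 reorganize_at: float = 5.0, rebuild_at: float = 30.0,
                 min_pages: int = 1000) -> str:
    """Pick a maintenance action for one index.

    All three thresholds are assumptions; tune them for your workload.
    """
    if page_count < min_pages or fragmentation_pct < reorganize_at:
        # Small or lightly fragmented indexes are not worth touching.
        return "skip"
    if fragmentation_pct >= rebuild_at:
        return "rebuild"
    # Reorganizing does not update statistics, so handle them separately.
    return "reorganize"
```

Keeping the thresholds as parameters makes it easy to exclude or relax specific indexes that fragment constantly by design.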
I like Ola's script and third-party tools that log to a table. I can then query the table to see if I have any index hot spots where fragmentation is constantly occurring at high levels, troubleshoot why fragmentation is so prevalent, and determine whether anything can be done.
There are exceptions to every rule or best practice. Some patterns of data access lead to constant fragmentation. The cost of constantly rebuilding or reorganizing those tables may not be worth it, and they can be excluded from maintenance. Those situations should be evaluated on a case-by-case basis.
This applies to on-premises, IaaS, Azure SQL Database, and Azure SQL Managed Instance.
MAXDOP
I find that max degree of parallelism and cost threshold for parallelism are typically left at the default values on client servers. For MAXDOP, the default value is zero, which means an 'unlimited' number of CPUs could be used to execute a parallel region of a query (technically, up to 64 processors unless you enable a trace flag to use more).
A decade ago, when processors had lower core counts, this value was acceptable. Today, with high core density and multi-socket servers, an unlimited number of CPUs for parallelism isn't so good. Microsoft has given guidance on what values to use for MAXDOP.
If you are on SQL Server 2008 – SQL Server 2014, for a single NUMA node with 8 or fewer logical processors, keep MAXDOP at or below the number of logical processors. If you have more than 8 logical processors, keep MAXDOP at 8. If you have multiple NUMA nodes with 8 or fewer logical processors per NUMA node, keep MAXDOP at or below the number of logical processors per NUMA node. With more than 8 per NUMA node, keep MAXDOP at 8.
SQL Server 2016 introduced soft-NUMA nodes. During service startup, if the Database Engine detects more than 8 physical cores per NUMA node or socket, soft-NUMA nodes are created automatically. The engine takes care of placing logical processors from the same physical core into different soft-NUMA nodes. For that reason, we have slightly different guidance for MAXDOP for SQL Server 2016 onwards.
If you are on SQL Server 2016 and up, for a single NUMA node with 8 or fewer logical processors, keep MAXDOP at or below the number of logical processors. If you have more than 8 logical processors, keep MAXDOP at 8. If you have multiple NUMA nodes with 16 or fewer logical processors per NUMA node, keep MAXDOP at or below the number of logical processors per NUMA node. With more than 16 per NUMA node, keep MAXDOP at half the number of logical processors per NUMA node, with a maximum value of 16.
If you are mostly virtualized on machines with 8 or fewer logical processors and a default MAXDOP, you're probably OK. If you have big physical hardware with defaults, then you should look at optimizing MAXDOP.
All the figures above are guidelines, not hard truths. Workloads vary, and you should take yours into account when determining the optimal value.
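The guidance boils down to a few comparisons. Here is a minimal Python sketch of it, following Microsoft's published table as I understand it: a cap of 8 for a single NUMA node on any version and for every case before SQL Server 2016, and the 16-per-node rule for multiple NUMA nodes on SQL Server 2016 and later. The function name and parameters are hypothetical, and the result is an upper bound to start testing from, not a setting to apply blindly.

```python
def recommended_maxdop(logical_per_numa_node: int, numa_nodes: int,
                       sql_2016_or_later: bool = True) -> int:
    """Suggested upper bound for MAXDOP.

    For a single NUMA node, pass the total number of logical
    processors as logical_per_numa_node.
    """
    if numa_nodes <= 1 or not sql_2016_or_later:
        # Single NUMA node, or any layout before SQL Server 2016: cap at 8.
        return min(logical_per_numa_node, 8)
    if logical_per_numa_node <= 16:
        # Multiple NUMA nodes, 16 or fewer logical processors per node.
        return logical_per_numa_node
    # Multiple NUMA nodes, more than 16 per node: half, capped at 16.
    return min(logical_per_numa_node // 2, 16)
```

For example, a SQL Server 2016 machine with two NUMA nodes of 24 logical processors each yields 12, while the same hardware on SQL Server 2014 yields 8.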
Configuring MAXDOP applies to on-premises, IaaS, and Azure SQL Managed Instance. However, there is a database scoped configuration that can be applied per database starting with SQL Server 2016, and this applies to Azure SQL Database.
Cost Threshold for Parallelism
Cost threshold for parallelism has a default value of 5. The history of this number goes back to the early days of SQL Server and the workstation that workload testing was performed on. With modern hardware, the cost estimation of 5 is outdated. Testing has shown that increasing the number from 5 to a higher value will keep shorter-running queries from having a parallel plan. I tend to recommend increasing this value after examining the plan cache. In many cases I start with a value of 25, then monitor further and adjust from there if needed. For more information about tuning cost threshold for parallelism, see Jonathan Kehayias's post, "Tuning 'cost threshold for parallelism' from the Plan Cache".
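To see what raising the threshold does, here is a minimal Python sketch that filters a list of estimated plan costs. It relies on the documented behavior that a parallel plan is considered only when the estimated cost of the serial plan is higher than the threshold. The function name is hypothetical, and any costs you pass in would come from your own plan cache.

```python
from typing import Iterable, List


def parallel_candidates(estimated_costs: Iterable[float],
                        cost_threshold: float = 5) -> List[float]:
    """Return the serial plan costs that exceed the threshold.

    Only these queries are eligible for a parallel plan.
    """
    return [cost for cost in estimated_costs if cost > cost_threshold]
```

With made-up costs of 0.8, 6.2, 18.5, 40.0, and 310.0, four queries qualify for parallelism at the default of 5, but only the two most expensive ones qualify at 25.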
This applies to on-premises, IaaS, and Azure SQL Managed Instance.
SQL Server Agent Alerts
Everyone should be leveraging SQL Agent alerts unless they have a third-party application monitoring for the same error conditions. Configuring alerts is easy and free, and having them configured will give you critical information when your servers are having problems.
I wrote an article titled "SQL Server Agent Alerts", providing step-by-step instructions on how to create alerts for severity 19-25 errors and error 825. Enabling these alerts is easy: enable Database Mail, create a mail operator, and then create the alerts. This can be accomplished using the GUI or with T-SQL. I encourage everyone to script out this process using T-SQL and make it part of their standard server build.
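One way to make the alerts part of a standard build is to generate the T-SQL instead of typing it by hand. Here is a minimal Python sketch that builds the statements for severities 19 through 25 and error 825 using the msdb procedures sp_add_alert and sp_add_notification. The alert names, the 60-second delay between responses, and the function name are assumptions for illustration, and the sketch assumes Database Mail is enabled and the operator already exists.

```python
def agent_alert_script(operator_name: str) -> str:
    """Build T-SQL that creates the alerts and e-mail notifications.

    Alert names and the 60-second delay are hypothetical choices.
    """
    # Escape single quotes so the name is a valid T-SQL string literal.
    operator = operator_name.replace("'", "''")
    # (alert name, message_id, severity); message_id 0 matches on severity.
    alerts = [(f"Severity {sev} Error", 0, sev) for sev in range(19, 26)]
    # Error 825 is matched by message id, so its severity must be 0.
    alerts.append(("Error 825 Read-Retry Required", 825, 0))

    statements = []
    for name, message_id, severity in alerts:
        statements.append(
            f"EXEC msdb.dbo.sp_add_alert @name = N'{name}', "
            f"@message_id = {message_id}, @severity = {severity}, "
            f"@enabled = 1, @delay_between_responses = 60, "
            f"@include_event_description_in = 1;"
        )
        # notification_method 1 means e-mail.
        statements.append(
            f"EXEC msdb.dbo.sp_add_notification @alert_name = N'{name}', "
            f"@operator_name = N'{operator}', @notification_method = 1;"
        )
    return "\n".join(statements)
```

Calling `print(agent_alert_script("DBA Team"))` produces sixteen statements, one alert and one notification for each of the eight conditions. Review them before running against a server.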
This applies to on-premises, IaaS, and Azure SQL Managed Instance.
Summary
As you can see, there are many settings that should be modified from the defaults after installing SQL Server. This is not a comprehensive list; however, it does cover many of the more critical and performance-impacting issues I find, and that I have lumped under my "SQL Server mishaps" category.