Tim Radney

Dealing with high severity errors in SQL Server

April 8, 2015 by in SQL Performance | 1 Comment
Free eBook : Query Optimization with SentryOne Plan Explorer
SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Register to Download

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

db_error

In my previous article on SQL Server Agent Alerts, I provided step-by-step instructions on how to set up and configure SQL Agent alerts for high severity errors 19-25 as well as error 825. In this article, I'm going to discuss these errors in detail, and share what you should do if they happen in your environment.

Errors with a severity level of 19 or higher stop the current batch from completing. Errors with a severity of 20 and higher are fatal errors and terminate the current client connection. These errors may also impact all of the processes in the database. Fatal errors are exactly what the name implies: the process that is running is terminated and the client connection is closed.

Severity 19 Errors

A severity 19 error is an error due to lack of a resource. This means that an internal limit (that you can't configure) has been exceeded and caused the current batch to end. These errors rarely occur and there is little that you can do to correct the issue. If a severity 19 error occurs you should contact your primary support provider; typically, that would be Microsoft.

In all my years of working with SQL Server, I cannot recall any incident where a severity 19 error was generated. Even searching Bing, I've had trouble finding occurrences of the error; the few references I found were related to an early version of SQL Server, and referenced a bug within SQL Server itself.

Severity 20 Errors

A severity 20 error is a fatal error in the current process. This indicates that a statement encountered a problem and was terminated. As this only impacts the current process it is very unlikely that the database itself has been damaged. These errors are tied to an individual statement so you will need to gather the entire error message and reach out to the person or team responsible for that bit of code. This could be in-house or possibly the vendor of the application. An example error is:

Error: 18056, Severity 20, State: 29
The client was unable to reuse a session with SPID 123, which had been reset for connection pooling.

For this error I would reach out to the application developer or vendor, since the error is related to a pooled connection encountering an error when trying to reset. I would also review the SQL Server logs which may have a more detailed error message regarding what is actually happening to cause the error.

Severity 21 Errors

A severity 21 error is a fatal error in the database that affects all processes using that database.

I have seen this error occur when trying to restore a database using Enterprise features to a Standard Edition instance, as well as when a database is corrupt and the user tries to access a corrupt page. An example error message of this type is:

Error: 605, Severity: 21, State 1
Attempt to fetch logical page (1:8574233) in database 'DB_NAME' belongs to object '0', not to object 'Table01'.

When attempting to restore a database that is using Enterprise features to a Standard Edition instance, you will have to first remove the Enterprise features. For example, if you are using data compression or change data capture, you will first have to stop using and remove those features from the database, back up the database, and then restore it to the Standard Edition instance. You can use the DMV sys.dm_db_persisted_sku_features to check whether you have any Enterprise-only features in use.

For the corruption errors you will need to run DBCC CHECKDB to determine the extent of the corruption and go from there. If you are fortunate, the error will be in a nonclustered index that you can rebuild and resolve the issue. If the corruption is more severe, you could be looking at a restore operation. To better understand corruption and how to resolve various aspects of corruption, I encourage you to review the various blog post by Paul Randal. Paul has an entire category on corruption that you can view here:

Running DBCC CHECKDB as part of a regularly-scheduled job against your databases is highly recommended to detect corruption as early as possible. If you are not regularly checking for corruption, then you are at a huge risk of not being able to recover the corrupt data.

Severity 22 Errors

A severity 22 error is a fatal error due to table integrity being suspect, basically indicating that the table or index specified in the message is damaged. Corruption happens and happens often. Our experience is that the majority of corruption occurs due to an I/O subsystem-related issue. If you run into a severity 22 error, you will need to run DBCC CHECKDB to determine the extent of the damage. An example error is:

Error: 5180, Severity: 22, State: 1
Could not open XYZ for invalid file ID ## in database. Table or database may be corrupted.

If the error is in a nonclustered index, then you could just rebuild the index and fix the corruption. If the corruption is in a heap or clustered index, then you will need to restore the database to a consistent state.

I have seen reports where the corruption was in memory but not on disk. In that case a restart of the instance or setting the database offline and then online should clear up the error.

Severity 23 Errors

A severity 23 error is another fatal error reporting that the database itself has an integrity issue. The resolution is much like that of a severity 22 error, where you need to immediately run DBCC CHECKDB to find the full extent of the damage to the database.

This level of corruption is detected as effecting the entire database. This could be corruption within the data file itself or corruption within the log file. The details of the error will direct you toward the root problem. For instance, the following error points out that we would need to restore our database or attempt to rebuild the log. For consistency, I would restore from my most recent backup and all available transaction log backups.

Error: 9004, Severity: 23 State: 6
An error occurred while processing the log for database 'db_name'. If possible, restore from backup. If a backup is not available, it might be necessary to rebuild the log.

Severity 24 Errors

A severity 24 error is a fatal error related to a hardware. This message would occur due to some type of media failure. The most common of these types of errors I have seen are related to issues with memory and I/O errors. For example:

Error: 832, Severity: 24, State: 1
A page that should have been constant has changed (expected checksum: <expected value>, actual checksum: <actual value>, database <db_id>, file <filename>, page <page#>). This usually indicates a memory failure or other hardware or OS corruption.

When errors like this occur you should contact your system support team to run memory test on your server and give the server a good health check. This error could be bad memory or a memory scribbler (a kernel process or something that is changing SQL Server's memory).

Another example:

Error: 824, Severity: 24, State: 2
SQL Server detected a logical consistency-based I/O error: incorrect pageid (expected 1:123; actual 0:0). It occurred during a reads of page (1:123) in database ID <id>. Additional messages in the SQL Server error log or system event log may provide more detail.

This error indicates a consistency error in the primary data file of the database. You would need to immediately run DBCC CHECKDB to determine the extent of the corruption and take the appropriate action to repair or restore the database.

Severity 25 Errors

A severity 25 error is a fatal system error. I have heard that severity 25 is more or less a catch-all for miscellaneous fatal errors. I have only seen this error when related to failed upgrades: something prevents one of the upgrade scripts from running, and a severity 25 error is thrown. You would get an error similar to:

Script level upgrade for database 'master' failed because upgrade step 'sqlagent90_sysdbupg.sql' encountered error 598, state 1, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the 'master' database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.

In this case, errors prior to this message indicated an incorrect path for the default data location for SQL Server. Once that was corrected the upgrade ran successfully.

Error 825

Error 825 is often referred to as the read-retry warning, however the condition is for both read and write operations. This error lets you know that a retry of the operation was needed and how many times SQL Server had to retry the attempt before it was successful. SQL Server will retry the operations up to four times, after four retry attempts it will raise an 823 or 824 error. Error 825 messages will be similar to the following:

A read of the file 'path to file name\db_name.mdf' at offset 0x00000002000 succeeded after failing 2 time(s) with error: incorrect checksum (expected: XYZ; actual ABC). Additional messages in the SQL Server error log and system event log may provide more detail. This error condition threatens database integrity and must be corrected. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

These messages are important as they are indicative that you have a larger problem with your disk subsystem. Troubleshooting methods would be to run DBCC CHECKDB to ensure the database is consistent, as the error recommends, as well as review the Windows event logs for errors from the operating system or storage devices. You should get your storage and hardware support team to review the underlying I/O subsystem for errors as well.

Summary

Having SQL Agent alerts configured is free and easy. Being proactive and responsive to these alerts is important to help minimize downtime for you and your customers. As you have now learned, many things can affect SQL Server and the consistency of your databases, and the best defense for being able to recover from these errors is having good backups and knowing the various repair options for DBCC CHECKDB. It's always recommended to run DBCC CHECKDB regularly against your databases to detect corruption as early as possible, as the quicker you find corruption, the more likely you are to have the data backed up so that you can restore with no data loss.