Fellow MVP Jamie Thomson recently pointed out that there is a "wrong results" bug in SQL Server that can manifest itself when the following conditions are true:
- You have an indexed view that joins at least two tables;
- those tables are constrained in either direction by a single-column foreign key;
- you perform updates to the base table(s) using
MERGE
which includes bothUPDATE
and (DELETE
orINSERT
) actions; and, - you subsequently issue queries that reference the index on the view (intentionally or not).
Unfortunately, the Knowledge Base article describing the problem (KB #2756471) is pretty light on details. They don't tell you how to reproduce the problem, or even what, specifically, you should be looking for to see if this affects you; and they don't even make mention of MERGE
(which is actually the core of the problem, not NOEXPAND
, and not a simple update). There are some additional details in the Connect item that brought about the fix; hopefully the KB article will be updated with more details before long.
In the meantime, the result you may see is incorrect data – or better put, stale data: The query may show you the old version of the updated row(s)! I spent a few minutes trying to reproduce this scenario in AdventureWorks, and failed miserably. Thankfully, Paul White (blog | @SQL_Kiwi) wrote an outstanding post describing the scenario and showing a full repro of the issue.
I don't think I can emphasize how serious this is.
Surely millions of customers are using indexed views, many of those have migrated their DML code to use MERGE
, and a large number of those are on Enterprise Edition (or aren't but are using the NOEXPAND
hint or are referencing the index directly). Paul was quick to point out that NOEXPAND
is not required to reproduce the problem in Enterprise Edition, and also discovered many of the other details required to reproduce the bug.
This post is not intended to steal any thunder from Jamie's or Paul's posts; just an attempt to reiterate the concern and to raise awareness of this issue. If you're in the habit of ignoring Cumulative Updates, opting to wait for Service Packs, and there is any chance that this issue could be affecting you right now, you owe it to yourself, not to mention your stakeholders and customers, to take this issue seriously.
So what should you do?
Well, what you do next depends on what version and edition of SQL Server you're running, and whether the bug actually affects you (or could).
- You should update to the latest Cumulative Update for your branch:
Branch Fixed in CU Build Minimum build required
to apply updateKB Article
(Download)2008 Service Pack 3 CU #8 10.00.5828 10.00.5500 KB #2771833 2008 R2 Service Pack 1 CU #10 10.50.2868 10.50.2500 KB #2783135 2008 R2 Service Pack 2 CU #4 10.50.4270 10.00.4000 KB #2777358 2012 RTM CU #5 11.00.2395 11.00.2100 KB #2777772 2012 Service Pack 1 CU #2 11.00.3339 11.00.3000 KB #2790947 Table 1 : Builds that contain the fix
- If you don't apply the fix, then you need to test all the references to your views to validate that they return correct results in all cases – including after you've updated the base tables using
MERGE
. If they don't (or you suspect that they might later be affected), then you should rebuild the clustered index on all affected views (or repair the indexed view(s) usingDBCC CHECKTABLE
, as Paul has described in his post), and stop usingMERGE
against these tables until you've applied the fix. If you continue usingMERGE
against the base tables, prepare to continue repairing the views in order to avoid the problem.
- A quicker fix would be to prevent the damaged indexed view from being used at all, by using any of the following methods required:
- apply the query hint
OPTION (EXPAND VIEWS)
to all relevant queries; - remove any explicit references to the index on the view;
- in Standard or other editions where indexed views are not matched automatically, remove all instances of
NOEXPAND
.
But this, of course, would largely defeat the purpose of the indexed view – may as well just drop the index. That said, it is usually better to get the right results slowly, than to get the wrong results quickly; so maybe that's okay.
- apply the query hint
SQL Server 2008 SP3
SQL Server 2008 R2 SP1/SP2
SQL Server 2012 RTM/SP1
Your options if you are on one of these builds:
SQL Server 2008 RTM/SP1/SP2
SQL Server 2008 R2 RTM
Unfortunately, you're on a build that is no longer in mainstream support, and it is unlikely this issue will be fixed for you (unless you are on extended support and you make a lot of noise). So your options are limited here – either move to a supported branch per the table above, and apply the Cumulative Update, or choose one of the other options mentioned previously.
SQL Server 2000
SQL Server 2005
Well, the bad news is that you're also on a build that is no longer supported. The good news is that in this specific case it doesn't matter – you can't use MERGE
anyway, so this bug can't affect you.
Other MERGE issues
Sadly, this is far from the first bug we've seen with MERGE
, and it likely won't be the last. Here is a quick selection of a dozen MERGE
bugs that are still marked as active on Connect:
- #773895 : MERGE Incorrectly Reports Unique Key Violations
- #766165 : MERGE evaluates filtered index per row, not post operation, which causes filtered index violation
- #723696 : Basic MERGE upsert causing deadlocks
- #713699 : A system assertion check has failed ("cxrowset.cpp":1528)
- #699055 : MERGE query plans allow FK and CHECK constraint violations
- #685800 : Parameterized DELETE and MERGE Allow Foreign Key Constraint Violations
- #654746 : merge in SQL2008 SP2 still suffers from "Attempting to set a non-NULL-able column's value to NULL"
- #635778 : NOT MATCHED and MATCHED parts of a SQL MERGE statement are not optimized
- #633132 : MERGE INTO WITH FILTERED SOURCE does not work properly
- #596086 : MERGE statement bug when INSERT/DELETE used and filtered index
- #583719 : MERGE statement treats non-nullable computed columns incorrectly in some scenarios
- #539084 : MERGE Stmt : Search condition on a non-key column and an ORDER BY in source dervied table breaks MERGE completely
Now, it may be the case that some of these bugs have actually been fixed, but their status is wrong because the loop back to Connect has not been closed. Even if that is the case, it can't be the true for all of them (and potentially for others that I did not uncover).
In addition, it has been demonstrated by Dan Guzman that MERGE
is not immune from race conditions and other concurrency issues. The workaround is to use HOLDLOCK
(or a higher isolation level); however, it is a common misconception that MERGE
is completely atomic and not prone to this problem at all. Therefore I'll wonder aloud: how many MERGE
statements out there include HOLDLOCK
(or are being executed under SERIALIZABLE
)? How many have been thoroughly tested for issues relating to concurrency?
Conclusion
Personally, I think the syntax is great (albeit daunting to learn), but every time an issue comes up, it erodes my confidence in the practicality of replacing existing DML with the new construct.
With that in mind, not to be Chicken Little, but I would not feel comfortable recommending anyone to use MERGE
unless they implement extremely comprehensive testing. Some of these issues are also present with standard UPSERT
methodologies, but the problems are more obvious there. MERGE
, merely through its single-statement nature, makes you want to believe in magic. Maybe someday it will deliver, but right now I know it's not going to be able to saw a person in half without some serious help.
Has this bug been fixed in SQL Server 2014 or 2016?
As far as I know, the bug never existed on SQL Server 2014 or 2016.
Thanks for the article and the answer to Darek's question!