Guest Author : Andy Mallon (@AMtwo)
If you're familiar with supporting the database behind Microsoft Dynamics CRM, you probably know that it's not the fastest-performing database. Honestly, that shouldn't be a surprise–it's not designed to be a screaming-fast database. It's designed to be a flexible database. Most Customer Relationship Management (CRM) systems are designed to be flexible so that they can meet the needs of many businesses in many industries with vastly different business requirements. They put those requirements ahead of database performance. That's probably smart business, but I'm not a business person–I'm a database person. My experience with Dynamics CRM is when people come to me and say
Andy, the database is slow
One recent occurrence was with a report failing due to a 5-minute query timeout. With the proper indexes, we should be able to get a few hundred rows really fast. I got my hands on the query and some example parameters, dropped it into Plan Explorer, and ran it a few times in our Test environment (I'm doing all this in Test–that's going to be important later). I wanted to make sure I was running it with a warm cache, so that I could use "the best of the worst" for my benchmark. The query was a big nasty SELECT
with a CTE, and a bunch of joins. Unfortunately, I can't provide the exact query, since it had some customer-specific business logic (Sorry!).
7 minutes, 37 seconds is as good as it gets.
Right off the bat, there's a lot of bad going on here. 1.5 million reads is a heck of a lot of I/O. 457 seconds to return 200 rows is slow. The Cardinality Estimator expected 2 rows, instead of 200. And there were a lot of writes–since this query is only a SELECT
statement, this means we must be spilling to TempDb. Maybe I'll get lucky, and be able to create an index to eliminate a table scan and speed this thing up. What's the plan look like?
Looks like an apatosaurus, or maybe a giraffe.
There will be no quick hits
Let me pause for a moment to explain something about Dynamics CRM. It uses views. It uses nested views. It uses nested views to enforce row-level security. In Dynamics parlance, these row-level-security-enforcing nested views are called "filtered views." Every query from the application goes through these filtered views. The only "supported" way to perform data access is to use these filtered views.
Recall I said this query was referencing a bunch of tables? Well, it's referencing a bunch of filtered views. So the complicated query I was handed is actually several layers more complicated. At this point, I got a fresh cup of coffee, and switched to a bigger monitor.
A great way to solve problems is to start at the start. I zoomed in on the SELECT operator, and followed the arrows to see what was going on:
Even on my 34" ultra-wide monitor, I had to fiddle with the display settings for the plan to see this much. Plan Explorer can rotate plans 90 degrees to make "tall" plans fit on a wide monitor.
Look at all those table-valued function calls! Followed immediately by a really expensive hash match. My Spidey Sense started to tingle. What is fn_GetMaxPrivilegeDepthMask
, and why is it being called 30 times? I bet this is a problem. When you see "Table-valued function" as an operator in a plan, that actually means it's a multi-statement table-valued function. If it were an inline table-valued function, it would get incorporated into the larger plan, and not be a black box. Multi-statement table-valued functions are evil. Don't use them. The Cardinality Estimator isn't able to make accurate estimates. The Query Optimizer isn't able to optimize them in the context of the larger query. From a performance perspective, they don't scale.
Even though this TVF is an out-of-the-box piece of code from Dynamics CRM, my Spidey Sense tells me that it's the problem. Forget this big nasty query with a big scary plan. Lets step into that function and see what's going on:
create function [dbo].[fn_GetMaxPrivilegeDepthMask](@ObjectTypeCode int)
returns @d table(PrivilegeDepthMask int)
-- It is by design that we return a table with only one row and column
as
begin
declare @UserId uniqueidentifier
select @UserId = dbo.fn_FindUserGuid()
declare @t table(depth int)
-- from user roles
insert into @t(depth)
select
--privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global)
-- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global)
-- do an AND with 0x0F ( =15) to get basic/local/deep/global
max(rp.PrivilegeDepthMask % 0x0F)
as PrivilegeDepthMask
from
PrivilegeBase priv
join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId)
join Role r on (rp.RoleId = r.ParentRootRoleId)
join SystemUserRoles ur on (r.RoleId = ur.RoleId and ur.SystemUserId = @UserId)
join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId)
where
potc.ObjectTypeCode = @ObjectTypeCode and
priv.AccessRight & 0x01 = 1
-- from user's teams roles
insert into @t(depth)
select
--privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global)
-- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global)
-- do an AND with 0x0F ( =15) to get basic/local/deep/global
max(rp.PrivilegeDepthMask % 0x0F)
as PrivilegeDepthMask
from
PrivilegeBase priv
join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId)
join Role r on (rp.RoleId = r.ParentRootRoleId)
join TeamRoles tr on (r.RoleId = tr.RoleId)
join SystemUserPrincipals sup on (sup.PrincipalId = tr.TeamId and sup.SystemUserId = @UserId)
join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId)
where
potc.ObjectTypeCode = @ObjectTypeCode and
priv.AccessRight & 0x01 = 1
insert into @d select max(depth) from @t
return
end
GO
This function follows a classic pattern in multi-statement TVFs:
- Declare a variable that is used as a constant
- Insert into a table variable
- Return that table variable
There's nothing fancy going on here. We could re-write these multiple statements as a single SELECT
statement. If we can write it as a single SELECT
statement, we can re-write this as an inline TVF.
Let's do it
If it isn't obvious, I'm about to re-write code provided by a software vendor. I've never met a software vendor that considers this to be "supported" behavior. If you change the out-of-the-box application code, you are on your own. Microsoft certainly considers this "unsupported" behavior for Dynamics. I'm going to do it anyway, since I'm using the test environment and I'm not playing around in production. Re-writing this function took just a couple minutes–so why not give it a try and see what happens? Here's what my version of the function looks like:
create function [dbo].[fn_GetMaxPrivilegeDepthMask](@ObjectTypeCode int)
returns table
-- It is by design that we return a table with only one row and column
as
RETURN
-- from user roles
select PrivilegeDepthMask = max(PrivilegeDepthMask)
from (
select
--privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global)
-- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global)
-- do an AND with 0x0F ( =15) to get basic/local/deep/global
max(rp.PrivilegeDepthMask % 0x0F)
as PrivilegeDepthMask
from
PrivilegeBase priv
join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId)
join Role r on (rp.RoleId = r.ParentRootRoleId)
join SystemUserRoles ur on (r.RoleId = ur.RoleId and ur.SystemUserId = dbo.fn_FindUserGuid())
join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId)
where
potc.ObjectTypeCode = @ObjectTypeCode and
priv.AccessRight & 0x01 = 1
UNION ALL
-- from user's teams roles
select
--privilege depth mask = 1(basic) 2(local) 4(deep) and 8(global)
-- 16(inherited read) 32(inherited local) 64(inherited deep) and 128(inherited global)
-- do an AND with 0x0F ( =15) to get basic/local/deep/global
max(rp.PrivilegeDepthMask % 0x0F)
as PrivilegeDepthMask
from
PrivilegeBase priv
join RolePrivileges rp on (rp.PrivilegeId = priv.PrivilegeId)
join Role r on (rp.RoleId = r.ParentRootRoleId)
join TeamRoles tr on (r.RoleId = tr.RoleId)
join SystemUserPrincipals sup on (sup.PrincipalId = tr.TeamId and sup.SystemUserId = dbo.fn_FindUserGuid())
join PrivilegeObjectTypeCodes potc on (potc.PrivilegeId = priv.PrivilegeId)
where
potc.ObjectTypeCode = @ObjectTypeCode and
priv.AccessRight & 0x01 = 1
)x
GO
I went back to my original test query, dumped the cache, and re-ran it a few times. Here's the slowest run time, when using my version of the TVF:
It's still not the most efficient query in the world, but it's fast enough–I don't need to make it any faster. Except… I had to modify Microsoft's code to make it happen. That's not ideal. Let's take a look at the full plan with the new TVF:
Goodbye apatosaurus, hello PEZ dispenser!
It's still a really gnarly plan, but if you look at the start, all those black box TVF calls are gone. The super-expensive hash match is gone. SQL Server gets right down to work without that big bottleneck of TVF calls (the work behind the TVF is now inline with the rest of the SELECT
):
Big picture impact
Where is this TVF actually used? Nearly every single filtered view in Dynamics CRM uses this function call. There are 246 filtered views and 206 of them reference this function. It is a critical function as part of the Dynamics row-level security implementation. Virtually every single query from the application to the databases calls this function at least once–usually a few times. This is a two-sided coin: on one hand, fixing this function will likely act as a turbo boost for the entire application; on the other hand, there's no way for me to do regression tests for everything that touches this function.
Wait a second–if this function call is so core to our performance, and so core to Dynamics CRM, then it follows that everyone who uses Dynamics is hitting this performance bottleneck. We opened a case with Microsoft, and I called a few folks to get the ticket bumped along to the engineering team responsible for this code. With a little luck, this updated version of the function will make it into the box (and the cloud) in a future release of Dynamics CRM.
This isn't the only multi-statement TVF in Dynamics CRM–I made the same type of change to fn_UserSharedAttributesAccess
for another performance issue. And there are more TVFs that I haven't touched because they haven't caused problems.
A lesson to everyone, even if you're not using Dynamics
Repeat after me: MULTI-STATEMENT TABLE VALUED FUNCTIONS ARE EVIL!
Re-factor your code to avoid using multi-statement TVFs. If you are trying to tune code, and you see a multi-statement TVF, look at it critically. You can't always change the code (or it may be a violation of your support contract if you do), but if you can change the code, do it. Tell your software vendor to stop using multi-statement TVFs. Make the world a better place by eliminating some of these nasty functions from your database.
Nicely done Andy!
The only time I have had success with ISVs (Microsoft in this case) is when my client was the 800-Pound-Gorilla client of said ISV. The most memorable such time was when my client had the VP of Developement AND the CEO of the ISV on the phone with me in under 2 minutes!! :-D Gorilla client can MAKE the ISV fix bad stuff (SAP and Trace Flag 2371 for example?!?). So search around for a VERY BIG CLIENT on Dynamics and get THEM to have a phone call with the Microsoft Dynamics team.
Ah the olde filtered views. Had many of hours refactoring the code in our reports. Another pain for us has been the datetime fields and the fact they work out the local time.
Nice write up
Thank you, Andy!
Enjoyed to read your clearly written post.
How exactly did you improve fn_UserSharedAttributesAccess?
What are performance numbers?
Regards,
Andrei
I don't have the exact code or performance numbers handy for the other TVF–but if I recall correctly, that function affects certain column-level security, as implemented within Dynamics CRM. It was a very simple TVF that was needlessly made into a Multi-statement TVF. If you carefully rearrange the function definition to be inline, you'll see some significant performance gains (assuming that you're using that column-level security feature).
~Andy
Small correction to your statement " Virtually every single query from the application to the databases calls this function at least once–usually a few times."
Filtered views are used only for reports. Regular Retrieve and RetrieveMultiple APIs use views not Filtered views. For example, for account entity Account view is used.
Row security filters (where conditions) added based on caller depth of read privilege for entity.
For reports proposed improvement is significant.
A minor point, but can't you get rid of two of the three MAX calls you have and cut the % calls down to one instead of two? It seems that your two inner queries should simply return rp.PrivilegeDepthMask UNION ALL'd together and let the outer SELECT do the max(rp.PrivilegeDepthMask % 0x0F). That should in theory be a tad more efficient.
I also realized the horrible SCALAR UDF they have will void the use of parallelism when it is appropriate, such as in a large report. Presumably dbo.fn_FindUserGuid() finds the "identifier" of the current user somehow. That should absolutely be known by the code and always passed into any sproc (or generated code if any). That would not only avoid the unnecessary expense of the lookup but would avoid the scalar UDF and all of the bad things that come along with that.
Agree with both Kevin's points. Double usage of dbo.fn_FindUserGuid() is introduced by refactoring TVF proposed by this article.
The best will be avoid dynamic computation of MaxPrivilegeDepthMask using joins. It should be precomputed and persist in a table
CREATE TABLE [dbo].[MaxPrivilegeReadDepthMask](
[SystemUserId] [uniqueidentifier] not NULL,
[ObjectTypeCode] [int] not NULL,
[MaxPrivilegeReadDepthMask] [int] not NULL,
CONSTRAINT [cndx_PrimaryKey_MaxPrivilegeReadDepthMask] PRIMARY KEY CLUSTERED
(
[SystemUserId] ASC,
[ObjectTypeCode] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 80) ON [PRIMARY]
) ON [PRIMARY]
GO
–This TVF will be a simple select
create function [dbo].[fn_GetMaxPrivilegeDepthMaskNew](@ObjectTypeCode int, @UserId uniqueidentifier)
returns table
— It is by design that we return a table with only one row and column
as
RETURN
select MaxPrivilegeReadDepthMask from [dbo].[MaxPrivilegeReadDepthMask] where SystemUserId = @UserId and ObjectTypeCode = @ObjectTypeCode
GO
Recomputation of data in table MaxPrivilegeDepthMask should be done when
a) user/team roles are changed
b) association user with a team (which has a role) changes
Hi Andy. I'm a little late seeing this but it's interesting. Question – did you verify the results of the query/report thoroughly after this change? I ask because in your duration/cpu/rows stats it looks like the actual rows coming back is 12 from your revised query versus 200 in the original, which would obviously be a bit problematic…. Am I missing something?