Aaron Bertrand

Removing the default trace – Part 2

SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

[ Part 1 | Part 2 | Part 3 ]

In the first post in this series, I showed the analysis I used to determine that the default trace is not for us. While looking at what information we actually needed to collect in its place (file size changes), and how this should be exposed to users, I considered the following points about the default trace:

  • it only captures automatic events;
  • it does not capture the "culprit" batch that caused the event, unless you're lucky enough that it was also captured for another reason (e.g. DDL); and,
  • it captures events using local time (our servers are Eastern and obey DST).

In its defense, it does capture a lot of important information about those automatic events. After we disable the default trace, we may still want to review events that happened before the change and were captured in those files. But once the default trace is disabled, the row no longer exists in sys.traces, so you can't determine the path to the .trc files from there. Here is where the inflexibility of the default trace actually provides a benefit: the files are hardcoded to reside in the same folder as SERVERPROPERTY(N'ErrorLogFileName'). So, even if the default trace is disabled, we can still pull data from the files using the following query (with adjustments to show the data in UTC):

;WITH dst AS
(
    SELECT s,e,d 
      FROM (VALUES ('20190310','20191103',240),('20191103','20200308',300),
                   ('20200308','20201101',240),('20201101','20210314',300),
                   ('20210314','20211107',240)) AS dst(s,e,d)
),    -- will add 2022, 2023, etc. later (if DST is still a thing then)
p AS
(

    SELECT TracePath = REVERSE(SUBSTRING(p, CHARINDEX(N'\', p), 260)) + N'log.trc' FROM 
    (SELECT REVERSE((CONVERT(nvarchar(max), SERVERPROPERTY(N'ErrorLogFileName'))))) AS s(p)
), 
trc AS
(
    SELECT src = 'trc', 
      t.DatabaseName, 
      t.[FileName], 
      DurationSeconds = CONVERT(decimal(18,3),Duration/1000000.0),
      StartTimeUTC = DATEADD(MINUTE, COALESCE(st1.d,0), t.StartTime),
      EndTimeUTC   = DATEADD(MINUTE, COALESCE(st2.d,0), t.EndTime),
      FileType = CASE t.EventClass WHEN 92 THEN 'Data' WHEN 93 THEN 'Log' END,
      Culprit = TextData, 
      IsAutomatic = 'true', 
      ChangeMB = CONVERT(bigint, IntegerData)*8/1024, 
      Principal = t.LoginName, 
      t.HostName, 
      App = CASE WHEN ApplicationName LIKE N'%Management Studio%Query%' 
                      THEN N'SSMS - Query Window'
                 WHEN ApplicationName LIKE N'%Management Studio%'
                      THEN N'SSMS - GUI!'
                 ELSE ApplicationName END
    FROM p CROSS APPLY sys.fn_trace_gettable(p.TracePath, DEFAULT) AS t
    LEFT OUTER JOIN dst AS st1 ON  t.StartTime >= DATEADD(HOUR,2,st1.s) 
                               AND t.StartTime <  DATEADD(HOUR,2,st1.e)
    LEFT OUTER JOIN dst AS st2 ON  t.EndTime   >= DATEADD(HOUR,2,st2.s) 
                               AND t.EndTime   <  DATEADD(HOUR,2,st2.e)
    WHERE t.EventClass IN (92,93)
)
SELECT * 
  FROM trc
  ORDER BY StartTimeUTC DESC;

Sample results from one server, where there were definitely some manual and automatic events that happened (click to enlarge):

Writing a replacement

The Extended Events session I formulated to replace this, which would also capture manual file size changes and the query text that caused automatic events, is as follows:

CREATE EVENT SESSION FileSizeChanges ON SERVER 
ADD EVENT sqlserver.database_file_size_change
(
  ACTION
  (
    sqlserver.sql_text,
    sqlserver.client_app_name,
    sqlserver.client_hostname,
    sqlserver.username,
    sqlserver.server_principal_name
  )
)
ADD TARGET package0.event_file
(
  SET filename       = N'W:\SomePath\FileSizeChanges.xel',
  MAX_FILE_SIZE      = 100, -- MB
  MAX_ROLLOVER_FILES = 100  -- 100 MB x 100 = max 10 GB
)
WITH
(
  MAX_MEMORY            = 8192 KB,
  EVENT_RETENTION_MODE  = ALLOW_SINGLE_EVENT_LOSS,
  MAX_DISPATCH_LATENCY  = 30 SECONDS,
  MAX_EVENT_SIZE        = 0 KB,
  TRACK_CAUSALITY       = OFF,
  STARTUP_STATE         = ON
);

ALTER EVENT SESSION FileSizeChanges ON SERVER STATE = START;

While it looks like username and server_principal_name may be redundant, I actually found cases where the latter was NULL (and it seems this problem has been around for a while).

Checking the results

I enabled that session on February 22nd, so it is missing the events the default trace captured on the 12th, but it captured more than the single autogrowth event from the 23rd. I ran the following query to get a result of the same shape as above:

;WITH FileInfo(XEPath) AS
(
  SELECT LEFT(BasePath,COALESCE(NULLIF(CHARINDEX(SessionName,BasePath)-1,-1),0)) 
         + SessionName + N'*.xel' 
    FROM
    (
      SELECT xmlsrc.data.value(N'(@name)[1]', N'nvarchar(max)'), SessionName
        FROM 
        (
          SELECT CONVERT(xml,target_data), s.[name]
            FROM sys.dm_xe_session_targets AS t
            INNER JOIN sys.dm_xe_sessions AS s
            ON s.[address] = t.event_session_address
            WHERE s.[name] = N'FileSizeChanges'
        ) AS xefile (TargetData, SessionName)
        CROSS APPLY TargetData.nodes(N'//EventFileTarget/File') AS xmlsrc(data)
    ) AS InnerData(BasePath, SessionName)
),
SessionData(EventData) AS 
(
  SELECT CONVERT(xml, TargetData.event_data) FROM FileInfo
  CROSS APPLY sys.fn_xe_file_target_read_file(FileInfo.XEPath, NULL, NULL, NULL) AS TargetData
), 
src AS
(
  SELECT 
    EndTimeUTC   = x.d.value(N'(@timestamp)[1]', N'datetime2'),
    DatabaseID   = x.d.value(N'(data  [@name="database_id"]/value)[1]',           N'int'),
    [FileName]   = x.d.value(N'(data  [@name="file_name"]/value)[1]',             N'sysname'),
    Duration     = x.d.value(N'(data  [@name="duration"]/value)[1]',              N'int'),
    FileType     = x.d.value(N'(data  [@name="file_type"]/text)[1]',              N'varchar(4)'),
    Culprit      = x.d.value(N'(action[@name="sql_text"]/value)[1]',              N'nvarchar(max)'),
    IsAutomatic  = x.d.value(N'(data  [@name="is_automatic"]/value)[1]',          N'varchar(5)'),
    ChangeKB     = x.d.value(N'(data  [@name="size_change_kb"]/value)[1]',        N'bigint'),
    Principal    = x.d.value(N'(action[@name="server_principal_name"]/value)[1]', N'sysname'),
    username     = x.d.value(N'(action[@name="username"]/value)[1]',              N'sysname'),
    AppName      = x.d.value(N'(action[@name="client_app_name"]/value)[1]',       N'sysname'),
    HostName     = x.d.value(N'(action[@name="client_hostname"]/value)[1]',       N'sysname')
  FROM SessionData CROSS APPLY EventData.nodes(N'/event') AS x(d)
)
SELECT 
  src = 'xe', 
  DatabaseName    = DB_NAME(DatabaseID), 
  [FileName], 
  DurationSeconds = CONVERT(decimal(18,3), Duration/1000000.0),
  StartTimeUTC    = DATEADD(MICROSECOND, -Duration, EndTimeUTC), 
  EndTimeUTC,
  FileType, 
  Culprit, 
  IsAutomatic, 
  ChangeMB  = CONVERT(decimal(18,3), ChangeKB / 1024.0), 
  Principal = COALESCE([Principal], COALESCE(NULLIF(username,N''), N'?')),
  HostName, 
  App = CASE WHEN AppName LIKE N'%Management Studio%Query%' 
                  THEN N'SSMS - Query Window'
             WHEN AppName LIKE N'%Management Studio%'       
                  THEN N'SSMS - GUI!'
             ELSE AppName END
FROM src
ORDER BY StartTimeUTC DESC;

The results show the additional fun I had, including (gasp!) running a "Shrink Database" task from the UI (click to enlarge):

Deploying it everywhere

Confident that I could now get a more complete picture of file size change events on any server, it was time to deploy. I used a CMS query window to first disable the default trace everywhere, and set the show advanced options back the way I found it:

IF EXISTS 
(
  SELECT 1 FROM sys.configurations 
    WHERE name = N'default trace enabled' 
    AND value_in_use = 1
)
BEGIN
  DECLARE @OriginalAdvancedOptions bit = 
  (
    SELECT CONVERT(bit, value_in_use)
      FROM sys.configurations 
      WHERE name = N'show advanced options'
  );

  IF @OriginalAdvancedOptions = 0 -- need to turn this on if it's not already
  BEGIN
    EXEC sys.sp_configure @configname = N'show advanced options', @configvalue = 1;
    RECONFIGURE WITH OVERRIDE;
  END

  EXEC   sys.sp_configure @configname = N'default trace enabled', @configvalue = 0;

  IF @OriginalAdvancedOptions = 0 -- need to set it back (else leave it)
  BEGIN
    EXEC sys.sp_configure @configname = N'show advanced options', @configvalue = 0;
  END

  RECONFIGURE WITH OVERRIDE;
END
GO

Then, to create the Extended Event session, I need to use dynamic SQL, because some servers might have different paths for SERVERPROPERTY(N'ErrorLogFileName') and that argument can't be parameterized.

DECLARE @path nvarchar(max) = (SELECT REVERSE(SUBSTRING(p, CHARINDEX(N'\', p), 4000)) 
  FROM (SELECT REVERSE((CONVERT(nvarchar(max), SERVERPROPERTY(N'ErrorLogFileName'))))) AS s(p));

IF EXISTS (SELECT 1 FROM sys.dm_xe_sessions WHERE name = N'FileSizeChanges')
BEGIN
  EXEC sys.sp_executesql N'DROP EVENT SESSION FileSizeChanges ON SERVER;';
END

DECLARE @sql nvarchar(max) = N' CREATE EVENT SESSION FileSizeChanges ON SERVER 
ADD EVENT sqlserver.database_file_size_change
(
  ACTION
  (
    sqlserver.sql_text,
    sqlserver.client_app_name,
    sqlserver.client_hostname,
    sqlserver.username,
    sqlserver.server_principal_name
  )
)
ADD TARGET package0.event_file
(
  SET filename       = ''' + @path + N'FileSizeChanges.xel'',
  MAX_FILE_SIZE      = 100, -- MB
  MAX_ROLLOVER_FILES = 100  -- 100 MB x 100 = max 10 GB
)
WITH
(
  MAX_MEMORY            = 8192 KB,
  EVENT_RETENTION_MODE  = ALLOW_SINGLE_EVENT_LOSS,
  MAX_DISPATCH_LATENCY  = 30 SECONDS,
  MAX_EVENT_SIZE        = 0 KB,
  TRACK_CAUSALITY       = OFF,
  STARTUP_STATE         = ON
);

ALTER EVENT SESSION FileSizeChanges ON SERVER STATE = START;';

EXEC sys.sp_executesql @sql;

The proof is in the pudding

I created a mock workload that was intentionally heavy with things that would log events to the default trace, then ran it multiple times with and without the default trace enabled, to show that observer effect can have an impact on the workload.

SELECT [starting] = sysdatetime();
GO

EXEC sys.sp_executesql N'CREATE OR ALTER PROCEDURE dbo.dostuff
AS
BEGIN
  SET NOCOUNT ON;
  SELECT DISTINCT TOP (1000) object_id, column_id INTO #blat FROM sys.all_columns;
  ALTER TABLE #blat ADD CONSTRAINT PK_wahoo PRIMARY KEY (object_id, column_id);
  ALTER TABLE #blat ADD CONSTRAINT DF_what DEFAULT(1) FOR object_id;
  CREATE INDEX IX_what ON #blat(column_id);
  DROP TABLE #blat;
END';

EXEC dbo.dostuff;

CREATE USER smidgen WITHOUT LOGIN;

ALTER ROLE db_owner ADD MEMBER smidgen;

DBCC TRACEON(2861) WITH NO_INFOMSGS;
DBCC TRACEOFF(2861) WITH NO_INFOMSGS;

DROP USER smidgen;
GO 5000

SELECT [finished] = sysdatetime();
GO

Average runtime:

Default trace Average workload time
Enabled
147.4s
Disabled
131.6s

A 10-11% runtime reduction is certainly not huge in isolation, but it's a big win if you think about the cumulative impact across an entire fleet of 200+ servers.

What's next?

Don't do this yet. We still have to talk about some side effects of disabling the default trace, and create views so that users can easily consume the session data without becoming XQuery experts. Stay tuned!

[ Part 1 | Part 2 | Part 3 ]