Parse parameter default values using PowerShell - Part 1

[ Part 1 | Part 2 | Part 3 ]

If you have ever tried to determine the default values for stored procedure parameters, you probably have marks on your forehead from hitting it on your desk repeatedly and violently. Most articles that talk about retrieving parameter information (like this tip) don’t even mention the word default. This is because, except for the raw text stored in the object’s definition, the information is not anywhere in the catalog views. There are columns has_default_value and default_value in sys.parameters that look promising, but they are only ever populated for CLR modules.

Deriving default values using T-SQL is cumbersome and error-prone. I recently answered a question on Stack Overflow about this problem, and it took me down memory lane. Back in 2006, I complained via multiple Connect items about the lack of visibility of the default values for parameters in the catalog views. However, the problem still exists in SQL Server 2019. (Here’s the only item I’ve found that made it to the new feedback system.)

While it is an inconvenience that the default values aren’t exposed in the metadata, they’re most likely not there because parsing them out of the object text (in any language, but particularly in T-SQL) is hard. It’s difficult to even find the start and end of the parameter list because T-SQL’s parsing capability is so limited, and there are more edge cases than you can imagine. A few examples:

You can’t rely on the presence of ( and ) to indicate the parameter list, since they are optional (and may be found throughout the parameter list)
You can't easily parse for the first AS to mark the beginning of the body, since it can appear for other reasons
You can't rely on the presence of BEGIN to mark the beginning of the body, since it is optional
It is hard to split on commas, since they can appear inside comments, within string literals, and as part of data type declarations (think (precision, scale))
It is very hard to parse away both types of comments, which can appear anywhere (including inside string literals), and can be nested
You can inadvertently find important keywords, commas, and equals signs inside string literals and comments
You can have default values that aren't numbers or string literals (think {fn curdate()} or GETDATE)

There are so many little syntax variations that normal string parsing techniques are rendered ineffective. Have I seen AS already? Was it between a parameter name and a data type? Was it after a right parenthesis that surrounds the entire parameter list, or [one?] that didn’t have a match before the last time I saw a parameter? Is that comma separating two parameters or is it part of precision and scale? When you’re looping through a string one word at a time, it goes on and on, and there are so many bits you need to track.

Take this (intentionally ridiculous, but still syntactically valid) example:

/* AS BEGIN , @a int = 7, comments can appear anywhere */
CREATE PROCEDURE dbo.some_procedure 
  -- AS BEGIN, @a int = 7 'blat' AS =
  /* AS BEGIN, @a int = 7 'blat' AS = -- */
  @a AS /* comment here because -- chaos */ int = 5,
  @b AS varchar(64) = 'AS = /* BEGIN @a, int = 7 */ ''blat''',
  @c AS int = -- 12 
              6 
AS
    -- @d int = 72,
    DECLARE @e int = 5;
    SET @e = 6;

Parsing the default values out of that definition using T-SQL is hard. Really hard. With no BEGIN to properly mark the end of the parameter list, all the comment mess, and all the cases where keywords like AS can mean different things, you will probably have a complex set of nested expressions involving more SUBSTRING and CHARINDEX patterns than you’ve ever seen in one place before. And you’ll probably still end up with @d and @e looking like procedure parameters instead of local variables.

Thinking about the problem some more, and searching to see if anyone had managed anything new in the last decade, I came across this great post by Michael Swart. In that post, Michael uses the ScriptDom's TSqlParser to remove both single-line and multi-line comments from a block of T-SQL. So I wrote some PowerShell code to step through a procedure to see which other tokens were identified. Let’s take a simpler example without all the intentional problems:

CREATE PROCEDURE dbo.procedure1
  @param1 int
AS PRINT 1;
GO

Open Visual Studio Code (or your favorite PowerShell IDE) and save a new file called Test1.ps1. The only prerequisite is to have the latest version of Microsoft.SqlServer.TransactSql.ScriptDom.dll (which you can download and extract from sqlpackage here) in the same folder as the .ps1 file. Copy this code, save, and then run or debug:

# need to extract this DLL from latest sqlpackage; place it in same folder
# https://docs.microsoft.com/en-us/sql/tools/sqlpackage-download
Add-Type -Path "Microsoft.SqlServer.TransactSql.ScriptDom.dll";

# set up a parser object using the most recent version available 
$parser = [Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser]($true)::New(); 

# and an error collector
$errors = [System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]]::New();

# this ultimately won't come from a constant - think file, folder, database
# can be a batch or multiple batches, just keeping it simple to start

$procedure = @"
CREATE PROCEDURE dbo.procedure1
  @param1 AS int
AS PRINT 1;
GO
"@

# now we need to try parsing
$block = $parser.Parse([System.IO.StringReader]::New($procedure), [ref]$errors);

# parse the whole thing, which is a set of one or more batches
foreach ($batch in $block.Batches)
{
    # each batch contains one or more statements
    # (though a valid create procedure statement is also always just one batch)
    foreach ($statement in $batch.Statements)
    {
        # output the type of statement
        Write-Host "  ====================================";
        Write-Host "    $($statement.GetType().Name)";
        Write-Host "  ====================================";        

        # each statement has one or more tokens in its token stream
        foreach ($token in $statement.ScriptTokenStream)
        {
            # those tokens have properties to indicate the type
            # as well as the actual text of the token
            Write-Host "  $($token.TokenType.ToString().PadRight(16)) : $($token.Text)";
        }
    }
}

The results:

====================================
CreateProcedureStatement
====================================

Create : CREATE
WhiteSpace :
Procedure : PROCEDURE
WhiteSpace :
Identifier : dbo
Dot : .
Identifier : procedure1
WhiteSpace :
WhiteSpace :
Variable : @param1
WhiteSpace :
As : AS
WhiteSpace :
Identifier : int
WhiteSpace :
As : AS
WhiteSpace :
Print : PRINT
WhiteSpace :
Integer : 1
Semicolon : ;
WhiteSpace :
Go : GO
EndOfFile :

To get rid of some of the noise, we can filter out a few TokenTypes inside the last for loop:

      foreach ($token in $statement.ScriptTokenStream)
      {
         if ($token.TokenType -notin "WhiteSpace", "Go", "EndOfFile", "SemiColon")
         {
           Write-Host "  $($token.TokenType.ToString().PadRight(16)) : $($token.Text)";
         }
      }

Ending up with a more concise series of tokens:

====================================
CreateProcedureStatement
====================================

Create : CREATE
Procedure : PROCEDURE
Identifier : dbo
Dot : .
Identifier : procedure1
Variable : @param1
As : AS
Identifier : int
As : AS
Print : PRINT
Integer : 1

The way this maps to a procedure visually:

Each token parsed from this simple procedure body.

You can already see the problems we’ll have trying to reconstruct parameter names, data types, and even finding the end of the parameter list. After looking into this some more, I came across a post by Dan Guzman that highlighted a ScriptDom class called TSqlFragmentVisitor, which identifies fragments of a block of parsed T-SQL. If we change the tactic just a bit, we can inspect fragments instead of tokens. A fragment is essentially a set of one or more tokens, and also has its own type hierarchy. As far as I know, there is no ScriptFragmentStream to iterate through fragments, but we can use a Visitor pattern to do essentially the same thing. Let’s create a new file called Test2.ps1, paste in this code, and run it:

Add-Type -Path "Microsoft.SqlServer.TransactSql.ScriptDom.dll";

$parser = [Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser]($true)::New(); 

$errors = [System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]]::New();

$procedure = @"
CREATE PROCEDURE dbo.procedure1
  @param1 AS int
AS PRINT 1;
GO
"@

$fragment = $parser.Parse([System.IO.StringReader]::New($procedure), [ref]$errors);
$visitor = [Visitor]::New();
$fragment.Accept($visitor);

class Visitor: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragmentVisitor 
{
   [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragment] $fragment)
   {
       Write-Host $fragment.GetType().Name;
   }
}

Results (interesting ones for this exercise in bold):

TSqlScript
TSqlBatch
CreateProcedureStatement
ProcedureReference
SchemaObjectName
Identifier
Identifier
ProcedureParameter
Identifier
SqlDataTypeReference
SchemaObjectName
Identifier
StatementList
PrintStatement
IntegerLiteral

If we try to map this visually to our previous diagram, it gets a little more complex. Each of these fragments is itself a stream of one or more tokens, and sometimes they will overlap. Several statement tokens and keywords aren’t even recognized on their own as part of a fragment, like CREATE, PROCEDURE, AS, and GO. The latter is understandable since it isn’t even T-SQL at all, but the parser still has to understand that it separates batches.

Comparing the way statement tokens and fragment tokens are recognized.

To rebuild any fragment in code, we can iterate through its tokens during a visit to that fragment. This lets us derive things like the name of the object and the parameter fragments with much less tedious parsing and conditionals, though we still have to loop inside each fragment’s token stream. If we change Write-Host $fragment.GetType().Name; in the previous script to this:

[void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragment] $fragment)
{
  if ($fragment.GetType().Name -in ("ProcedureParameter", "ProcedureReference"))
  {
    $output = "";
    Write-Host "==========================";
    Write-Host "  $($fragment.GetType().Name)";
    Write-Host "==========================";

    for ($i = $fragment.FirstTokenIndex; $i -le $fragment.LastTokenIndex; $i++)
    {
      $token = $fragment.ScriptTokenStream[$i];
      $output += $token.Text;
    }
    Write-Host $output;
  }
}

The output is:

==========================
ProcedureReference
==========================

dbo.procedure1

==========================
ProcedureParameter
==========================

@param1 AS int

We have the object and schema name together without having to perform any additional iteration or concatenation. And we have the whole line involved in any parameter declaration, including the parameter name, the data type, and any default value that might exist. Interestingly, the visitor handles @param1 int and int as two distinct fragments, essentially double-counting the data type. The former is a ProcedureParameter fragment, and the latter is a SchemaObjectName. We really only care about the first SchemaObjectName reference (dbo.procedure1) or, more specifically, only the one that follows ProcedureReference. I promise we’ll deal with those, just not all of them today. If we change the $procedure constant to this (adding a comment and a default value):

$procedure = @"
CREATE PROCEDURE dbo.procedure1
  @param1 AS int = /* comment */ -64
AS PRINT 1;
GO
"@

Then the output becomes:

==========================
ProcedureReference
==========================

dbo.procedure1

==========================
ProcedureParameter
==========================

@param1 AS int = /* comment */ -64

This still includes any tokens in the output that are actually comments. Inside the for loop, we can filter out any token types we want to ignore in order to address this (I also remove superfluous AS keywords in this example, but you may not want to do that if you're reconstructing module bodies):

for ($i = $fragment.FirstTokenIndex; $i -le $fragment.LastTokenIndex; $i++)
{
  $token = $fragment.ScriptTokenStream[$i];
  if ($token.TokenType -notin ("MultiLineComment", "SingleLineComment", "As"))
  {
    $output += $token.Text;
  }
}

The output is cleaner, but still not perfect.

==========================
ProcedureReference
==========================

dbo.procedure1

==========================
ProcedureParameter
==========================

@param1 int = -64

If we want to separate the parameter name, data type, and default value, it becomes more complex. While we’re looping through the token stream for any given fragment, we can split out the parameter name from any data type declarations by just tracking when we hit an EqualsSign token. Replacing the for loop with this additional logic:

if ($fragment.GetType().Name -in ("ProcedureParameter","SchemaObjectName"))
{
    $output  = "";
    $param   = ""; 
    $type    = "";
    $default = "";
    $seenEquals = $false;

      for ($i = $fragment.FirstTokenIndex; $i -le $fragment.LastTokenIndex; $i++)
      {
        $token = $fragment.ScriptTokenStream[$i];
        if ($token.TokenType -notin ("MultiLineComment", "SingleLineComment", "As"))
        {
          if ($fragment.GetType().Name -eq "ProcedureParameter")
          {
            if (!$seenEquals)
            {
              if ($token.TokenType -eq "EqualsSign") 
              { 
                $seenEquals = $true; 
              }
              else 
              { 
                if ($token.TokenType -eq "Variable") 
                {
                  $param += $token.Text; 
                }
                else 
                {
                  $type += $token.Text; 
                }
              }
            }
            else
            { 
              if ($token.TokenType -ne "EqualsSign")
              {
                $default += $token.Text; 
              }
            }
          }
          else 
          {
            $output += $token.Text.Trim(); 
          }
        }
      }

      if ($param.Length   -gt 0) { $output  = "Param name: "   + $param.Trim(); }
      if ($type.Length    -gt 0) { $type    = "`nParam type: " + $type.Trim(); }
      if ($default.Length -gt 0) { $default = "`nDefault:    " + $default.TrimStart(); }
      Write-Host $output $type $default;
}

Now the output is:

==========================
ProcedureReference
==========================

dbo.procedure1

==========================
ProcedureParameter
==========================

Param name: @param1
Param type: int
Default: -64

That's better, but there is still more to solve. There are parameter keywords I have ignored so far, like OUTPUT and READONLY, and we need logic when our input is a batch with more than one procedure. I’ll deal with those issues in part 2.

In the meantime, experiment! There are a lot of other powerful things you can do with ScriptDOM, TSqlParser, and TSqlFragmentVisitor.

[ Part 1 | Part 2 | Part 3 ]

Parse parameter default values using PowerShell – Part 1

1 thought on “Parse parameter default values using PowerShell – Part 1”