In my last post, I showed how to use TSqlParser
and TSqlFragmentVisitor
to extract important information from a T-SQL script containing stored procedure definitions. With that script, I left out a few things, such as how to parse out the OUTPUT
and READONLY
keywords for parameters, and how to parse multiple objects together. Today, I wanted to provide a script that handles those things, mention a few other future enhancements, and share a GitHub repository I created for this work.
Previously, I used a simple example like this:
CREATE PROCEDURE dbo.procedure1
@param1 AS int = /* comment */ -64
AS PRINT 1;
GO
And with the Visitor code I provided, the output to the console was:
ProcedureReference
==========================
dbo.procedure1
==========================
ProcedureParameter
==========================
Param name: @param1
Param type: int
Default: -64
Now, what if the script passed in looked more like this? It combines the intentionally terrible procedure definition from before with a couple of other elements that you might expect to cause problems, like user-defined type names, two different forms of the OUT
/OUTPUT
keyword, Unicode in parameter values (and in parameter names!), keywords as constants, and ODBC escape literals.
/* AS BEGIN , @a int = 7, comments can appear anywhere */
CREATE PROCEDURE dbo.some_procedure
-- AS BEGIN, @a int = 7 'blat' AS =
/* AS BEGIN, @a int = 7 'blat' AS = -- */
@a AS /* comment here because -- chaos */ int = 5,
@b AS varchar(64) = 'AS = /* BEGIN @a, int = 7 */ ''blat''',
@c AS int = -- 12
6
AS
-- @d int = 72,
DECLARE @e int = 5;
SET @e = 6;
GO
CREATE PROCEDURE [dbo].another_procedure
(
@p1 AS [int] = /* 1 */ 1,
@p2 datetime = getdate OUTPUT,-- comment,
@p3 date = {ts '2020-02-01 13:12:49'},
@p4 dbo.tabletype READONLY,
@p5 geography OUT,
@p6 sysname = N'学中'
)
AS SELECT 5
The previous script doesn’t quite handle multiple objects correctly, and we need to add a few logical elements to account for OUTPUT
and READONLY
. Specifically, Output
and ReadOnly
are not token types, but rather they are recognized as an Identifier
. So we need some extra logic to find identifiers with those explicit names within any ProcedureParameter
fragment. You might spot a few other minor changes:
Add-Type -Path "Microsoft.SqlServer.TransactSql.ScriptDom.dll";
$parser = [Microsoft.SqlServer.TransactSql.ScriptDom.TSql150Parser]($true)::New();
$errors = [System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]]::New();
$procedure = @"
/* AS BEGIN , @a int = 7, comments can appear anywhere */
CREATE PROCEDURE dbo.some_procedure
-- AS BEGIN, @a int = 7 'blat' AS =
/* AS BEGIN, @a int = 7 'blat' AS = -- */
@a AS /* comment here because -- chaos */ int = 5,
@b AS varchar(64) = 'AS = /* BEGIN @a, int = 7 */ ''blat''',
@c AS int = -- 12
6
AS
-- @d int = 72,
DECLARE @e int = 5;
SET @e = 6;
GO
CREATE PROCEDURE [dbo].another_procedure
(
@p1 AS [int] = /* 1 */ 1,
@p2 datetime = getdate OUTPUT,-- comment,
@p3 date = {ts '2020-02-01 13:12:49'},
@p4 dbo.tabletype READONLY,
@p5 geography OUT,
@p6 sysname = N'学中'
)
AS SELECT 5
"@
$fragment = $parser.Parse([System.IO.StringReader]::New($procedure), [ref]$errors);
$visitor = [Visitor]::New();
$fragment.Accept($visitor);
class Visitor: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragmentVisitor
{
[void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragment] $fragment)
{
$fragmentType = $fragment.GetType().Name;
if ($fragmentType -in ("ProcedureParameter", "ProcedureReference"))
{
if ($fragmentType -eq "ProcedureReference")
{
Write-Host "`n==========================";
Write-Host " $($fragmentType)";
Write-Host "==========================";
}
$output = "";
$param = "";
$type = "";
$default = "";
$extra = "";
$isReadOnly = $false;
$isOutput = $false;
$seenEquals = $false;
for ($i = $fragment.FirstTokenIndex; $i -le $fragment.LastTokenIndex; $i++)
{
$token = $fragment.ScriptTokenStream[$i];
if ($token.TokenType -notin ("MultiLineComment", "SingleLineComment", "As"))
{
if ($fragmentType -eq "ProcedureParameter")
{
if ($token.TokenType -eq "Identifier" -and
($token.Text.ToUpper -in ("OUT", "OUTPUT", "READONLY"))
{
$extra = $token.Text.ToUpper();
if ($extra -eq "READONLY")
{
$isReadOnly = $true;
}
else
{
$isOutput = $true;
}
}
if (!$seenEquals)
{
if ($token.TokenType -eq "EqualsSign")
{
$seenEquals = $true;
}
else
{
if ($token.TokenType -eq "Variable")
{
$param += $token.Text;
}
else
{
if (!$isOutput -and !$isReadOnly)
{
$type += $token.Text;
}
}
}
}
else
{
if ($token.TokenType -ne "EqualsSign" -and !$isOutput -and !$isReadOnly)
{
$default += $token.Text;
}
}
}
else
{
$output += $token.Text.Trim();
}
}
}
if ($param.Length -gt 0) { $output = "`nParam name: " + $param.Trim(); }
if ($type.Length -gt 0) { $type = "`nParam type: " + $type.Trim(); }
if ($default.Length -gt 0) { $default = "`nDefault: " + $default.TrimStart(); }
if ($isReadOnly) { $extra = "`nRead Only: yes"; }
if ($isOutput) { $extra = "`nOutput: yes"; }
Write-Host $output $type $default $extra;
}
}
}
This code is for demonstration purposes only, and there is zero chance it is the most current. Please see details below about downloading a more recent version.
The output in this case:
ProcedureReference
==========================
dbo.some_procedure
Param name: @a
Param type: int
Default: 5
Param name: @b
Param type: varchar(64)
Default: 'AS = /* BEGIN @a, int = 7 */ "blat"'
Param name: @c
Param type: int
Default: 6
==========================
ProcedureReference
==========================
[dbo].another_procedure
Param name: @p1
Param type: [int]
Default: 1
Param name: @p2
Param type: datetime
Default: getdate
Output: yes
Param name: @p3
Param type: date
Default: {ts '2020-02-01 13:12:49'}
Param name: @p4
Param type: dbo.tabletype
Read Only: yes
Param name: @p5
Param type: geography
Output: yes
Param name: @p6
Param type: sysname
Default: N'学中'
That’s some pretty powerful parsing, even though there are some tedious edge cases and a lot of conditional logic. I’d love to see TSqlFragmentVisitor
expanded so some of its token types have additional properties (like SchemaObjectName.IsFirstAppearance
and ProcedureParameter.DefaultValue
), and see new token types added (like FunctionReference
). But even now, this is light years beyond a brute force parser you might write in any language, never mind T-SQL.
There are still a couple of limitations I haven’t addressed yet, though:
- This only addresses stored procedures. The code to handle all three types of user-defined functions is similar, but there is no handy
FunctionReference
fragment type, so instead you need to identify the firstSchemaObjectName
fragment (or the first set ofIdentifier
andDot
tokens) and ignore any subsequent instances. Currently the code in this post will return all the information about the parameters to a function, but it will not return the function’s name. Feel free to use it for singletons or batches containing only stored procedures, but you might find the output confusing for multiple mixed object types. The latest version in the repository below does handle functions perfectly fine. - This code doesn’t save state. Outputting to the console within each Visit is easy, but collecting the data from multiple visits, to then pipeline elsewhere, is a little more complex, primarily because of the way the Visitor pattern works.
- The code above can’t accept input directly. To simplify demonstration here, it’s just a raw script where you paste your T-SQL block as a constant. The eventual goal is to support input from a file, an array of files, a folder, an array of folders, or pulling module definitions from a database. And the output can be anywhere: to the console, to a file, to a database… so the sky’s the limit there. Some of that work has happened in the meantime, but none of that has been written in the simple version you see above.
- There is no error handling. Again, for brevity and ease of consumption, the code here does not worry about handling inevitable exceptions, though the most destructive thing that can happen in its present form is that a batch won’t appear in the output if it can’t be properly parsed (like
CREATE STUPID PROCEDURE dbo.whatever
). When we start using databases and/or the file system, proper error handling will become that much more important.
You might wonder, where am I going to maintain ongoing work on this and fix all of these things? Well, I have put it on GitHub, have tentatively called the project ParamParser, and already have contributors helping with improvements. The current version of the code already looks quite different from the sample above, and by the time you read this some of the limitations mentioned here might already be addressed. I only want to maintain the code in one place; this tip is more about showing a minimal sample of how it can work, and highlighting that there is a project out there dedicated to simplifying this task.
In the next segment, I'll talk more about how my friend and colleague, Will White, helped me get from the standalone script you see above to the much more powerful module you'll find on GitHub.
If you have the need to parse default values from parameters in the meantime, feel free to download the code and try it out. And as I suggested before, experiment on your own, because there are lots of other powerful things you can do with these classes and the Visitor pattern.