Follow-up #1 on leading wildcard seeks

Follow-up #1 on leading wildcard seeks In my last post, "One way to get an index seek for a leading wildcard," I mentioned that you would need triggers to deal with maintaining the fragments I recommended. A couple of people have contacted me to ask if I could demonstrate those triggers.

To simplify from the previous post, let's assume we have the following tables – a set of Companies, and then a CompanyNameFragments table that allows for pseudo-wildcard searching against any substring of the company name:

CREATE TABLE dbo.Companies
(
  CompanyID  int CONSTRAINT PK_Companies PRIMARY KEY,
  Name       nvarchar(100) NOT NULL
);
GO

CREATE TABLE dbo.CompanyNameFragments
(
  CompanyID int NOT NULL,
  Fragment  nvarchar(100) NOT NULL
);

CREATE CLUSTERED INDEX CIX_CNF ON dbo.CompanyNameFragments(Fragment, CompanyID);

Given this function for generating fragments (the only change from the original article is I've increased @input to support 100 characters):

CREATE FUNCTION dbo.CreateStringFragments( @input nvarchar(100) )
RETURNS TABLE WITH SCHEMABINDING
AS
  RETURN 
  (
    WITH x(x) AS 
    (
      SELECT 1 UNION ALL SELECT x+1 FROM x WHERE x < (LEN(@input))
    )
    SELECT Fragment = SUBSTRING(@input, x, LEN(@input)) FROM x
  );
GO

We can create a single trigger that can handle all three operations:

CREATE TRIGGER dbo.Company_MaintainFragments
ON dbo.Companies
FOR INSERT, UPDATE, DELETE
AS
BEGIN
  SET NOCOUNT ON;

  DELETE f FROM dbo.CompanyNameFragments AS f
    INNER JOIN deleted AS d 
    ON f.CompanyID = d.CompanyID;

  INSERT dbo.CompanyNameFragments(CompanyID, Fragment)
    SELECT i.CompanyID, fn.Fragment
    FROM inserted AS i 
    CROSS APPLY dbo.CreateStringFragments(i.Name) AS fn;
END
GO

This works without any checking for which type of operation happened because:

For an UPDATE or a DELETE, the DELETE will happen - for an UPDATE, we are not going to bother trying to match fragments that will remain the same; we're just going to blow them all away, so they can be replaced en masse. For an INSERT, the DELETE statement will have no effect, because there will be no rows in deleted.
For an INSERT or an UPDATE, the INSERT will happen. For a DELETE, the INSERT statement will have no effect, because there will be no rows in inserted.

Now, just to make sure it works, let's perform some changes to the Companies table and then inspect our two tables.

-- First, let's insert two companies 
-- (table contents after insert shown in figure 1 below)

INSERT dbo.Companies(Name) VALUES(N'Banana'), (N'Acme Corp');

-- Now, let's update company 2 to 'Orange' 
-- (table contents after update shown in figure 2 below):

UPDATE dbo.Companies SET Name = N'Orange' WHERE CompanyID = 2;

-- Finally, delete company #1 
-- (table contents after delete shown in figure 3 below):

DELETE dbo.Companies WHERE CompanyID = 1;

Figure 1: Initial table contents

Figure 2: Table contents after update

Figure 3: Table contents after delete

Caveat (for the referential integrity folks)

Note that if you set up proper foreign keys between these two tables, you'll have to use an instead of trigger to handle deletes, otherwise you'll have a chicken and egg problem - you can't wait until *after* the parent row is deleted to remove the child rows. So you would have to set up ON DELETE CASCADE (which I personally don't tend to like), or your two triggers would look like this (the after trigger would still have to perform a DELETE/INSERT pair in the case of an UPDATE):

CREATE TRIGGER dbo.Company_DeleteFragments
ON dbo.Companies
INSTEAD OF DELETE
AS
BEGIN
  SET NOCOUNT ON;

  DELETE f FROM dbo.CompanyNameFragments AS f
    INNER JOIN deleted AS d
    ON f.CompanyID = d.CompanyID;

  DELETE c FROM dbo.Companies AS c
    INNER JOIN deleted AS d
    ON c.CompanyID = d.CompanyID;
END
GO

CREATE TRIGGER dbo.Company_MaintainFragments
ON dbo.Companies
FOR INSERT, UPDATE
AS
BEGIN
  SET NOCOUNT ON;

  DELETE f FROM dbo.CompanyNameFragments AS f
    INNER JOIN deleted AS d
    ON f.CompanyID = d.CompanyID;

  INSERT dbo.CompanyNameFragments(CompanyID, Fragment)
    SELECT i.CompanyID, fn.Fragment
    FROM inserted AS i
    CROSS APPLY dbo.CreateStringFragments(i.Name) AS fn;
END
GO

Summary

This post was aimed at showing how easy it is to set up triggers that will maintain seekable string fragments to improve wildcard searches, at least for moderately-sized strings. Now, I still know that this kind of comes across as a wacky idea, but I keep talking about it because I am convinced there are good use cases out there.

In my next post, I will show how to see the impact of this choice: You can easily set up representative workloads to compare the resource costs of maintaining the fragments against the performance savings at query time. I will look at varying string lengths as well as different workload balances (mostly read vs. mostly write) and try to find sweet spots and danger zones.

12 thoughts on “Follow-up #1 on leading wildcard seeks”

Yavor says:

February 9, 2017 at 9:30 AM

Thanks for following up on it Aaron! However, I am still a bigger fan on the reverse function with computed column, seems just easier to maintain.

Aaron Bertrand says:

February 9, 2017 at 12:36 PM

Yeah, that solution is great for '%this search', but not for '%this search%'.

FABRICIO LIMA says:

February 13, 2017 at 12:41 PM

HI Aaron,

Great post.

If someone use Windows collation (Brazil Default Collation), we can improve the performance of a query with ‘%this search%’ changing the Collation at the execution time in a where clause.

I did one test with great results here:
https://www.fabriciolima.net/blog/2017/02/08/improve-the-query-performance-with-like-string-changing-only-the-collation/

Thanks for all this information.

Thomas Franz says:

March 22, 2017 at 10:29 AM

I would suggest to use an IF UPDATE(name) block in the trigger, so that the INSERT / DELETE of the fragments occurs only, when the name-column is changed (in a real table there are a lot of columns and a company name will be changed very seldom).

Thomas Franz says:

March 22, 2017 at 11:11 AM

BTW: you can save a little bit CPU (about 7 % for 1.8 mio rows plus about 10 % TempDB-IO) , if you modify the function to calculate the input len only once:

[sql]
CREATE FUNCTION dbo.CreateStringFragments( @input nvarchar(100) )
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
WITH x(x, L) AS (SELECT 1 x, LEN(@input) L UNION ALL SELECT x+1, L FROM x WHERE x < L)
SELECT Fragment = SUBSTRING(@input, x, L) FROM x
);
[/sql]

david berlinger says:

March 22, 2017 at 1:23 PM

the storage requirement for this index is huge. given that there is no problem with a text index like a* , why are we going insane trying to find *a?
make a second table containing the reversed the data and search for the reverse string in the reversed table.
i.e :tbl data "this is my data" make a table with the reversed data "atad ym si siht" also with a text index.
look for da* in the primary table and for *ta as at* in the reverse table.
a simple set of triggers could ensure that any changes to the primary table are reflected in the reversed table.

david berlinger says:

March 22, 2017 at 1:32 PM

actually now that I think about it, why not make an even simpler solution. add a computed field to the table as the sql function reverse(string field) and add a text index on the computed column. search for *abc in the reverse computed column as cba*

Aaron Bertrand says:

March 22, 2017 at 1:58 PM

Because reverse works for flipping LIKE '%a' but it doesn't help at all with LIKE '%a%'. This isn't *just* about leading wildcards, but since we all know trailing wildcards only aren't a problem…

Aaron Bertrand says:

March 22, 2017 at 1:59 PM

Yes, please see the all comments on the original post. Again, REVERSE() doesn't solve everything that this solution solves.

Aaron Bertrand says:

March 22, 2017 at 3:39 PM

IF (UPDATE) doesn't tell you that, though. It tells you that name was in the SET list, but not that the value actually changed. You need to do a manual comparison, and do it for each row, if you want to optimize that way.

Thomas Franz says:

March 22, 2017 at 3:52 PM

of course. But when you have a procedure that runs an
UPDATE dbo.Companies SET last_contact_date = GetDate() where ID = @ID;
this procedure does not need to drop / recreate the fragments. The same is true when the DBA adds a new column and uses an UPDATE to initialize it.

I know, there are too many frameworks that reads the whole record (all columns), show some of them in the GUI and executes an UPDATE to all columns after the customer clicked on the save button, but there are usually very much other opportunities that are more specific.

Aaron Bertrand says:

March 22, 2017 at 3:53 PM

Yep, plenty of places for optimization when needed. Just wanted the basic struct without it becoming a thesis. :-)

Comments are closed.