Itzik Ben-Gan

Fundamentals of table expressions, Part 9 – Views, compared with derived tables and CTEs

June 9, 2021 by in T-SQL Queries | 2 Comments
SentryOne eBooks

In these books, you will find useful, hand-picked articles that will help give insight into some of your most vexing performance problems. These articles were written by several of the SQL Server industry’s leading experts, including Paul White, Paul Randal, Jonathan Kehayias, Erin Stellato, Glenn Berry, Aaron Bertrand, and Joe Sack.

Free Download

Featured Author

Paul Randal, CEO of SQLskills, writes about knee-jerk performance tuning, DBCC, and SQL Server internals.

Paul’s Posts

This is the 9th part in a series about named table expressions. In Part 1 I provided the background to named table expressions, which include derived tables, common table expressions (CTEs), views and inline table valued functions (iTVFs). In Part 2, Part 3 and Part 4 I focused on derived tables. In Part 5, Part 6, Part 7 and Part 8 I focused on CTEs. As I explained, derived tables and CTEs are statement-scoped named table expressions. Once the statement that defines them finishes, they’re gone.

We are now ready to proceed to coverage of reusable named table expressions. That is, ones that are created as an object in the database, and stay there permanently unless dropped. As such, they are accessible and reusable to all who have the right permissions. Views and iTVFs fall into this category. The difference between the two is primarily that the former doesn’t support input parameters and the latter does.

In this article I start the coverage of views. Like I did before, I’ll first focus on logical, or conceptual, aspects, and at a later point proceed to optimization aspects. With the first article on views, I want to start light, focusing on what a view is, using correct terminology, and compare design considerations of views with those of the previously discussed derived tables and CTEs.

In my examples I’ll use a sample database called TSQLV5. You can find the script that creates and populates it here, and its ER diagram here.

What is a view?

As usual when discussing relational theory, we SQL practitioners are often told that the terminology we’re using is wrong. So, in this spirit, right off the bat, I’ll start by saying that when you use the term tables and views, it’s wrong. I’ve learned this from Chris Date.

Recall that a table is SQL’s counterpart to a relation (oversimplifying the discussion around values and variables a bit). A table could be a base table defined as an object in the database, or it could be a table returned by an expression—more specifically, a table expression. That’s similar to the fact that a relation could be one that is returned from a relational expression. A table expression could be a query.

Now, what is a view? It’s a named table expression, much like a CTE is a named table expression. It’s just that like I said, a view is a reusable named table expression that is created as an object in the database, and is accessible to those who have the right permissions. This is all to say, a view is a table. It’s not a base table, but a table nonetheless. So just like saying “a rectangle and a square” or “a whisky and a Lagavulin” would seem strange (unless you had too much Lagavulin!), using “tables and views” is as improper.

Syntax

Here’s the T-SQL syntax for a CREATE VIEW statement:

CREATE [ OR ALTER ] VIEW [ < schema name > . ] < table name > [ (< target columns >) ]
[ WITH < view attributes including SCHEMABINDING > ]
AS
  < table expression >
  [ WITH CHECK OPTION ]
  [ ; ]

The CREATE VIEW statement must be the first, and only, statement in the batch.

Note that the CREATE OR ALTER part was introduced in SQL Server 2016 SP1, so if you’re using an earlier version, you will need to work with separate CREATE VIEW and ALTER VIEW statements depending on whether the object already exists or not. As you probably well know, altering an existing object retains assigned permissions. That’s one of the reasons why usually it’s sensible to alter an existing object as opposed to dropping and recreating it. What catches some people by surprise is that altering a view doesn’t retain the existing view attributes; those need to be respecified if you want to retain them.

Here’s an example for a simple view definition representing USA customers:

USE TSQLV5;
GO

CREATE OR ALTER VIEW Sales.USACustomers
AS
  SELECT custid, companyname
  FROM Sales.Customers
  WHERE country = N'USA';
GO

And here’s a statement that queries the view:

SELECT custid, companyname
FROM Sales.USACustomers;

Between the statement that creates the view, and the statement that queries it, you will find the very same three elements that are involved in a statement against a derived table or a CTE:

  1. The inner table expression (the view’s inner query)
  2. The assigned table name (the view name)
  3. The statement with the outer query against the view

Those of you with a keen eye will have noticed that there are actually two table expressions involved here. There’s the inner one (the view’s inner query), and there’s the outer one (the query in the statement against the view). In the statement with the query against the view, the query itself is a table expression, and once you add the terminator, it becomes a statement. This might sound picky, but if you get this, and call things by their right names, it does reflect on your knowledge. And isn’t it great when you know that you know?

Also, all of the requirements from the table expression in derived tables and CTEs that we discussed earlier in the series, apply to the table expression that the view is based on. As a reminder, the requirements are:

  • All of the table expression’s columns must have names
  • All of the table expression’s column names must be unique
  • The table expression’s rows have no order

If you need to refresh your understanding of what’s behind these requirements, see the section “A table expression is a table” in Part 2 of the series. Make sure that you especially understand the "no order" part. As a brief reminder, a table expression is a table, and as such has no order. That’s why you cannot create a view based on a query with an ORDER BY clause, unless this clause is there to support a TOP or OFFSET-FETCH filter. And even with this exception allowing the inner query to have an ORDER BY clause, you want to remember that if the outer query against the view doesn’t have its own ORDER BY clause, you don’t get a guarantee that the query will return the rows in any particular order, never mind the observed behavior. This is super important to understand!

Nesting and multiple references

When discussing design considerations of derived tables and CTEs, I compared the two in terms of both nesting and multiple references. Now let’s see how views fare in these departments. I’ll start with nesting. For this purpose, we’ll compare code that returns years in which more than 70 customers placed orders using derived tables, CTEs and views. You already saw the code with derived tables and CTEs earlier in the series. Here’s the code that handles the task using derived tables:

SELECT orderyear, numcusts
FROM ( SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
       FROM ( SELECT YEAR(orderdate) AS orderyear, custid
              FROM Sales.Orders ) AS D1
       GROUP BY orderyear ) AS D2
WHERE numcusts > 70;

I pointed out that the main downside that I see with derived tables here is the fact that you nest derived table definitions, and this can lead to complexity in understanding, maintaining and troubleshooting such code.

Here’s the code that handles the same task using CTEs:

WITH C1 AS ( SELECT YEAR(orderdate) AS orderyear, custid FROM Sales.Orders ), C2 AS ( SELECT orderyear, COUNT(DISTINCT custid) AS numcusts FROM C1 GROUP BY orderyear ) SELECT orderyear, numcusts FROM C2 WHERE numcusts > 70;

I pointed out that to me this feels like much clearer code due to the lack of nesting. You can see each step in the solution from start to end separately in its own unit, with the solution’s logic flowing clearly from top to bottom. Therefore, I see the CTE option as an improvement over derived tables in this respect.

Now to views. Remember, one of the main benefits of views is reusability. You can also control access permissions. The development of the units involved is a bit more similar to CTEs in the sense that you can focus your attention on one unit at a time from start to end. Moreover, you have the flexibility to decide whether to create a separate view per unit in the solution, or, perhaps just one view based on a query involving statement-scoped named table expressions.

You would go with the former when each of the units needs to be reusable. Here’s the code you would use in such a case, creating three views:

-- Sales.OrderYears
CREATE OR ALTER VIEW Sales.OrderYears
AS 
  SELECT YEAR(orderdate) AS orderyear, custid
  FROM Sales.Orders;
GO

-- Sales.YearlyCustCounts
CREATE OR ALTER VIEW Sales.YearlyCustCounts
AS
  SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
  FROM Sales.OrderYears
  GROUP BY orderyear;
GO

-- Sales.YearlyCustCountsMin70
CREATE OR ALTER VIEW Sales.YearlyCustCountsAbove70
AS
  SELECT orderyear, numcusts
  FROM Sales.YearlyCustCounts
  WHERE numcusts > 70;
GO

You can query each of the views independently, but here’s the code you would use to return what the original task was after.

SELECT orderyear, numcusts
FROM Sales.YearlyCustCountsAbove70;

If there’s a reusability requirement only for the outermost part (what the original task required), there’s no real need to develop three different views. You could create one view based on a query involving CTEs or derived tables. Here’s how you would do that with a query involving CTEs:

CREATE OR ALTER VIEW Sales.YearlyCustCountsAbove70
AS
  WITH C1 AS
  (
    SELECT YEAR(orderdate) AS orderyear, custid
    FROM Sales.Orders
  ),
  C2 AS
  (
    SELECT orderyear, COUNT(DISTINCT custid) AS numcusts
    FROM C1
    GROUP BY orderyear
  )
  SELECT orderyear, numcusts
  FROM C2
  WHERE numcusts > 70;
GO

By the way, if it wasn’t obvious, the CTEs that the view’s inner query is based on can be recursive.

Let’s proceed to cases where you need multiple references to the same table expression from the outer query. The task for this example is to compute the yearly order count per year, and compare the count in each year with the preceding year. The easiest way to achieve this is actually to use the LAG window function, but we’ll use a join between two instances of a table expression representing yearly order counts just to compare a multi-reference case among the three tools.

This is the code that we used earlier in the series to handle the task with derived tables:

SELECT CUR.orderyear, CUR.numorders,
  CUR.numorders - PRV.numorders AS diff
FROM ( SELECT YEAR(orderdate) AS orderyear, COUNT(*) AS numorders
       FROM Sales.Orders
       GROUP BY YEAR(orderdate) ) AS CUR
  LEFT OUTER JOIN
     ( SELECT YEAR(orderdate) AS orderyear, COUNT(*) AS numorders
       FROM Sales.Orders
       GROUP BY YEAR(orderdate) ) AS PRV
    ON CUR.orderyear = PRV.orderyear + 1;

There’s a very clear downside here. You have to repeat the definition of the table expression twice. You are essentially defining two named table expressions based on the same query code.

Here’s the code that handles the same task using CTEs:

WITH OrdCount AS
(
  SELECT YEAR(orderdate) AS orderyear, COUNT(*) AS numorders
  FROM Sales.Orders
  GROUP BY YEAR(orderdate)
)
SELECT CUR.orderyear, CUR.numorders,
  CUR.numorders - PRV.numorders AS diff
FROM OrdCount AS CUR
  LEFT OUTER JOIN OrdCount AS PRV
    ON CUR.orderyear = PRV.orderyear + 1;

There’s a clear advantage here; you define only one named table expression based on a single instance of the inner query, and refer to it twice from the outer query.

Views are more similar to CTEs in this sense. You define only one view based on only one copy of the query, like so:

CREATE OR ALTER VIEW Sales.YearlyOrderCounts
AS
  SELECT YEAR(orderdate) AS orderyear, COUNT(*) AS numorders
  FROM Sales.Orders
  GROUP BY YEAR(orderdate);
GO

But better than with CTEs, you’re not limited to reusing the named table expression only in the outer statement. You can reuse the view name any number of times that you like, with any number of unrelated queries, so long as you have the right permissions. Here’s the code to achieve the task by using multiple references to the view:

SELECT CUR.orderyear, CUR.numorders,
  CUR.numorders - PRV.numorders AS diff
FROM Sales.YearlyOrderCounts AS CUR
  LEFT OUTER JOIN Sales.YearlyOrderCounts AS PRV
    ON CUR.orderyear = PRV.orderyear + 1;

It seems like views are more similar to CTEs than to derived tables, with the extra functionality of being a more reusable tool, with the ability to control permissions. Or to turn it around, it’s probably appropriate to think of a CTE as a statement-scoped view. Now what could be really wonderful is if we also had a named table expression with a wider scope than that of a CTE, narrower than that of a view. For instance, wouldn’t it have been great if we had a session-level scoped named table expression?

Summary

I love this topic. There’s so much in table expressions that is rooted in relational theory, which in turn is rooted in mathematics. I love knowing what the right terms for things are, and generally making sure that I have the foundations figured out carefully, even if to some it might seem like being picky and over pedantic. Looking back at my learning process over the years, I can see a very clear path between insisting on a good grasp of foundations, using correct terminology, and really knowing your stuff later when it gets to the far more advanced and complex stuff.

So, what are the critical pieces when it comes to views?

  • A view is a table.
  • It is a table that is derived from a query (a table expression).
  • It is given a name that to the user appears like a table name, since it is a table name.
  • It is created as a permanent object in the database.
  • You can control access permissions against the view.

Views are similar to CTEs in a number of ways. In the sense that you develop your solutions in a modular way, focusing on one unit at a time from start to end. Also in the sense that you can have multiple references to the view name from the outer query. But better than CTEs, views are not limited only to the outer statement’s scope, rather are reusable until dropped from the database.

There’s a lot more to say about views, and I’ll continue the discussion next month. In the meanwhile, I want to leave you with a thought. With derived tables and CTEs you could make a case in favor of SELECT * in an inner query. See the case I made for it in Part 3 in the series for details. Could you make a similar case with views, or is it a bad idea with those?