The very first blog post on this site, way back in July of 2012, talked about best approaches for running totals. Since then, I've been asked on multiple occasions how I would approach the problem if the running totals were more complex – specifically, if I needed to calculate running totals for multiple entities – say, each customer's orders.

The original example used a fictitious case of a city issuing speeding tickets; the running total was simply aggregating and keeping a running count of the number of speeding tickets by day (regardless of who the ticket was issued to or how much it was for). A more complex (but practical) example might be aggregating the running total value of speeding tickets, grouped by driver's license, per day. Let's imagine the following table:

CREATE TABLE dbo.SpeedingTickets ( IncidentID INT IDENTITY(1,1) PRIMARY KEY, LicenseNumber INT NOT NULL, IncidentDate DATE NOT NULL, TicketAmount DECIMAL(7,2) NOT NULL ); CREATE UNIQUE INDEX x ON dbo.SpeedingTickets(LicenseNumber, IncidentDate) INCLUDE(TicketAmount);

*You might ask, DECIMAL(7,2), really? How fast are these people going? Well, in Canada, for example, it's not all that hard to get a $10,000 speeding fine.*

Now, let's populate the table with some sample data. I won't get into all of the specifics here, but this should produce about 6,000 rows representing multiple drivers and multiple ticket amounts over a month-long period:

;WITH TicketAmounts(ID,Value) AS ( -- 10 arbitrary ticket amounts SELECT i,p FROM ( VALUES(1,32.75),(2,75), (3,109),(4,175),(5,295), (6,68.50),(7,125),(8,145),(9,199),(10,250) ) AS v(i,p) ), LicenseNumbers(LicenseNumber,[newid]) AS ( -- 1000 random license numbers SELECT TOP (1000) 7000000 + number, n = NEWID() FROM [master].dbo.spt_values WHERE number BETWEEN 1 AND 999999 ORDER BY n ), JanuaryDates([day]) AS ( -- every day in January 2014 SELECT TOP (31) DATEADD(DAY, number, '20140101') FROM [master].dbo.spt_values WHERE [type] = N'P' ORDER BY number ), Tickets(LicenseNumber,[day],s) AS ( -- match *some* licenses to days they got tickets SELECT DISTINCT l.LicenseNumber, d.[day], s = RTRIM(l.LicenseNumber) FROM LicenseNumbers AS l CROSS JOIN JanuaryDates AS d WHERE CHECKSUM(NEWID()) % 100 = l.LicenseNumber % 100 AND (RTRIM(l.LicenseNumber) LIKE '%' + RIGHT(CONVERT(CHAR(8), d.[day], 112),1) + '%') OR (RTRIM(l.LicenseNumber+1) LIKE '%' + RIGHT(CONVERT(CHAR(8), d.[day], 112),1) + '%') ) INSERT dbo.SpeedingTickets(LicenseNumber,IncidentDate,TicketAmount) SELECT t.LicenseNumber, t.[day], ta.Value FROM Tickets AS t INNER JOIN TicketAmounts AS ta ON ta.ID = CONVERT(INT,RIGHT(t.s,1))-CONVERT(INT,LEFT(RIGHT(t.s,2),1)) ORDER BY t.[day], t.LicenseNumber;

This might seem a little too involved, but one of the biggest challenges I often have when composing these blog posts is constructing a suitable amount of realistic "random" / arbitrary data. If you have a better method for arbitrary data population, by all means, don't use my mumblings as an example – they're peripheral to the point of this post.

### Approaches

There are various ways to solve this problem in T-SQL. Here are seven approaches, along with their associated plans. I've left out techniques like cursors (because they will be undeniably slower) and date-based recursive CTEs (because they depend on contiguous days).

#### Subquery #1

SELECT LicenseNumber, IncidentDate, TicketAmount, RunningTotal = TicketAmount + COALESCE( ( SELECT SUM(TicketAmount) FROM dbo.SpeedingTickets AS s WHERE s.LicenseNumber = o.LicenseNumber AND s.IncidentDate < o.IncidentDate ), 0) FROM dbo.SpeedingTickets AS o ORDER BY LicenseNumber, IncidentDate;

#### Subquery #2

SELECT LicenseNumber, IncidentDate, TicketAmount, RunningTotal = ( SELECT SUM(TicketAmount) FROM dbo.SpeedingTickets WHERE LicenseNumber = t.LicenseNumber AND IncidentDate <= t.IncidentDate ) FROM dbo.SpeedingTickets AS t ORDER BY LicenseNumber, IncidentDate;

#### Self-join

SELECT t1.LicenseNumber, t1.IncidentDate, t1.TicketAmount, RunningTotal = SUM(t2.TicketAmount) FROM dbo.SpeedingTickets AS t1 INNER JOIN dbo.SpeedingTickets AS t2 ON t1.LicenseNumber = t2.LicenseNumber AND t1.IncidentDate >= t2.IncidentDate GROUP BY t1.LicenseNumber, t1.IncidentDate, t1.TicketAmount ORDER BY t1.LicenseNumber, t1.IncidentDate;

#### Outer apply

SELECT t1.LicenseNumber, t1.IncidentDate, t1.TicketAmount, RunningTotal = SUM(t2.TicketAmount) FROM dbo.SpeedingTickets AS t1 OUTER APPLY ( SELECT TicketAmount FROM dbo.SpeedingTickets WHERE LicenseNumber = t1.LicenseNumber AND IncidentDate <= t1.IncidentDate ) AS t2 GROUP BY t1.LicenseNumber, t1.IncidentDate, t1.TicketAmount ORDER BY t1.LicenseNumber, t1.IncidentDate;

#### SUM OVER() using RANGE (2012+ only)

SELECT LicenseNumber, IncidentDate, TicketAmount, RunningTotal = SUM(TicketAmount) OVER ( PARTITION BY LicenseNumber ORDER BY IncidentDate RANGE UNBOUNDED PRECEDING ) FROM dbo.SpeedingTickets ORDER BY LicenseNumber, IncidentDate;

*Plan for SUM OVER() using RANGE*

#### SUM OVER() using ROWS (2012+ only)

SELECT LicenseNumber, IncidentDate, TicketAmount, RunningTotal = SUM(TicketAmount) OVER ( PARTITION BY LicenseNumber ORDER BY IncidentDate ROWS UNBOUNDED PRECEDING ) FROM dbo.SpeedingTickets ORDER BY LicenseNumber, IncidentDate;

*Plan for SUM OVER() using ROWS*

#### Set-based iteration

With credit to Hugo Kornelis (@Hugo_Kornelis) for Chapter #4 in SQL Server MVP Deep Dives Volume #1, this approach combines a set-based approach and a cursor approach.

DECLARE @x TABLE ( LicenseNumber INT NOT NULL, IncidentDate DATE NOT NULL, TicketAmount DECIMAL(7,2) NOT NULL, RunningTotal DECIMAL(7,2) NOT NULL, rn INT NOT NULL, PRIMARY KEY(LicenseNumber, IncidentDate) ); INSERT @x(LicenseNumber, IncidentDate, TicketAmount, RunningTotal, rn) SELECT LicenseNumber, IncidentDate, TicketAmount, TicketAmount, ROW_NUMBER() OVER (PARTITION BY LicenseNumber ORDER BY IncidentDate) FROM dbo.SpeedingTickets; DECLARE @rn INT = 1, @rc INT = 1; WHILE @rc > 0 BEGIN SET @rn += 1; UPDATE [current] SET RunningTotal = [last].RunningTotal + [current].TicketAmount FROM @x AS [current] INNER JOIN @x AS [last] ON [current].LicenseNumber = [last].LicenseNumber AND [last].rn = @rn - 1 WHERE [current].rn = @rn; SET @rc = @@ROWCOUNT; END SELECT LicenseNumber, IncidentDate, TicketAmount, RunningTotal FROM @x ORDER BY LicenseNumber, IncidentDate;

Due to its nature, this approach produces many identical plans in the process of updating the table variable, all of which are similar to the self-join and outer apply plans, but are able to use a seek:

*One of many UPDATE plans produced through set-based iteration*

The only difference between each plan in each iteration is the row count. Through each successive iteration, the number of rows affected should stay the same or go down, since the number of rows affected at each iteration represents the number of drivers with tickets on that number of days (or, more precisely, the number of days at that "rank").

### Performance Results

Here are how the approaches stacked up, as shown by SQL Sentry Plan Explorer, with the exception of the set-based iteration approach which, because it consists of many individual statements, does not represent well when compared to the rest.

*Plan Explorer runtime metrics for six of the seven approaches*

In addition to reviewing the plans and comparing runtime metrics in Plan Explorer, I also measured raw runtime in Management Studio. Here are the results of running each query 10 times, keeping in mind that this also includes render time in SSMS:

*Runtime duration, in milliseconds, for all seven approaches (10 iterations)*

So, if you are on SQL Server 2012 or better, the best approach seems to be `SUM OVER()`

using `ROWS UNBOUNDED PRECEDING`

. If you are not on SQL Server 2012, the second subquery approach seemed to be optimal in terms of runtime, in spite of the high number of reads compared to, say, the `OUTER APPLY`

query. In all cases, of course, you should test these approaches, adapted to your schema, against your own system. Your data, indexes, and other factors may lead to a different solution being most optimal in your environment.

### Other Complexities

Now, the unique index signifies that any LicenseNumber + IncidentDate combination will contain a single cumulative total, in the event where a specific driver gets multiple tickets on any given day. This business rule helps simplify our logic a little bit, avoiding the need for a tie-breaker to produce deterministic running totals.

If you do have cases where you may have multiple rows for any given LicenseNumber + IncidentDate combination, you can break the tie using another column that helps make the combination unique (obviously the source table would no longer have a unique constraint on those two columns). Note that this is possible even in cases where the `DATE`

column is actually `DATETIME`

– many people assume that date/time values are unique, but this is certainly not always guaranteed, regardless of granularity.

In my case, I could use the `IDENTITY`

column, `IncidentID`

; here is how I would adjust each solution (acknowledging that there may be better ways; just throwing out ideas):

/* --------- subquery #1 --------- */ SELECT LicenseNumber, IncidentDate, TicketAmount, RunningTotal = TicketAmount + COALESCE( ( SELECT SUM(TicketAmount) FROM dbo.SpeedingTickets AS s WHERE s.LicenseNumber = o.LicenseNumber AND (s.IncidentDate < o.IncidentDate -- added this line: OR (s.IncidentDate = o.IncidentDate AND s.IncidentID < o.IncidentID)) ), 0) FROM dbo.SpeedingTickets AS o ORDER BY LicenseNumber, IncidentDate; /* --------- subquery #2 --------- */ SELECT LicenseNumber, IncidentDate, TicketAmount, RunningTotal = ( SELECT SUM(TicketAmount) FROM dbo.SpeedingTickets WHERE LicenseNumber = t.LicenseNumber AND IncidentDate <= t.IncidentDate -- added this line: AND IncidentID <= t.IncidentID ) FROM dbo.SpeedingTickets AS t ORDER BY LicenseNumber, IncidentDate; /* --------- self-join --------- */ SELECT t1.LicenseNumber, t1.IncidentDate, t1.TicketAmount, RunningTotal = SUM(t2.TicketAmount) FROM dbo.SpeedingTickets AS t1 INNER JOIN dbo.SpeedingTickets AS t2 ON t1.LicenseNumber = t2.LicenseNumber AND t1.IncidentDate >= t2.IncidentDate -- added this line: AND t1.IncidentID >= t2.IncidentID GROUP BY t1.LicenseNumber, t1.IncidentDate, t1.TicketAmount ORDER BY t1.LicenseNumber, t1.IncidentDate; /* --------- outer apply --------- */ SELECT t1.LicenseNumber, t1.IncidentDate, t1.TicketAmount, RunningTotal = SUM(t2.TicketAmount) FROM dbo.SpeedingTickets AS t1 OUTER APPLY ( SELECT TicketAmount FROM dbo.SpeedingTickets WHERE LicenseNumber = t1.LicenseNumber AND IncidentDate <= t1.IncidentDate -- added this line: AND IncidentID <= t1.IncidentID ) AS t2 GROUP BY t1.LicenseNumber, t1.IncidentDate, t1.TicketAmount ORDER BY t1.LicenseNumber, t1.IncidentDate; /* --------- SUM() OVER using RANGE --------- */ SELECT LicenseNumber, IncidentDate, TicketAmount, RunningTotal = SUM(TicketAmount) OVER ( PARTITION BY LicenseNumber ORDER BY IncidentDate, IncidentID RANGE UNBOUNDED PRECEDING -- added this column ^^^^^^^^^^^^ ) FROM dbo.SpeedingTickets ORDER BY LicenseNumber, IncidentDate; /* --------- SUM() OVER using ROWS --------- */ SELECT LicenseNumber, IncidentDate, TicketAmount, RunningTotal = SUM(TicketAmount) OVER ( PARTITION BY LicenseNumber ORDER BY IncidentDate, IncidentID ROWS UNBOUNDED PRECEDING -- added this column ^^^^^^^^^^^^ ) FROM dbo.SpeedingTickets ORDER BY LicenseNumber, IncidentDate; /* --------- set-based iteration --------- */ DECLARE @x TABLE ( -- added this column, and made it the PK: IncidentID INT PRIMARY KEY, LicenseNumber INT NOT NULL, IncidentDate DATE NOT NULL, TicketAmount DECIMAL(7,2) NOT NULL, RunningTotal DECIMAL(7,2) NOT NULL, rn INT NOT NULL ); -- added the additional column to the INSERT/SELECT: INSERT @x(IncidentID, LicenseNumber, IncidentDate, TicketAmount, RunningTotal, rn) SELECT IncidentID, LicenseNumber, IncidentDate, TicketAmount, TicketAmount, ROW_NUMBER() OVER (PARTITION BY LicenseNumber ORDER BY IncidentDate, IncidentID) -- and added this tie-breaker column ------------------------------^^^^^^^^^^^^ FROM dbo.SpeedingTickets; -- the rest of the set-based iteration solution remained unchanged

Another complication you may come across is when you're not after the whole table, but rather a subset (say, in this case, the first week of January). You'll have to make adjustments by adding `WHERE`

clauses, and keep those predicates in mind when you have correlated subqueries as well.

VERY NICE SOULUTION

IT WORKED FOR ME

GREAT ………

THANX

Best solution presentation I have ever seen! WOW!!

clear, concise, easy to follow, multiple options, and performance metrics to boot.

Do More!!! This is the Gold Standard for how to instruct other users.

SUM OVER() using ROWS works beautifully when applied to presentation of customer statements with a running balance per branch.

How to use the window function so that the partitioning happens on unsorted dataset? It by default sorts the partitioned column and then applies the aggregation. My requirement is to do regular Cumulative Average as I would do in e.g. Google Sheets at the link below:

https://docs.google.com/spreadsheets/d/15XOfaJTj00VZJ2OyWOSKIW8QdXwdizL-tNUf6Tyj-ks/edit#gid=0

What do you think is computational complexity of Query2? I see O(n^2) so it may not work well for large tables

I didn't test at larger scale, but I would probably lean toward aggregating over smaller sets on a larger table anyway, as opposed to just aggregating over the whole thing. Segmented by primary key, clustering key, partitioning key, etc. could help. If you have a billion rows what use would you have for outputting running totals over the whole billion rows?