Skip to content

The SQL Over() function

Introduction

The SQL OVER() function is a powerful tool for performing complex calculations across a set of table rows related to the current row. It is commonly used for computing moving averages, cumulative sums, running totals, and other statistics.

Data Example

To illustrate the concepts of OVER, let’s use the following SQL data example:

CREATE TABLE personnages (
name VARCHAR(50) NOT NULL,
type VARCHAR(50) NOT NULL,
strength INT NOT NULL
);
INSERT INTO personnages (name, type, strength) VALUES
('Asterix', 'Gaulois', 100),
('Obelix', 'Gaulois', 100),
('Cesar', 'Romain', 20),
('Brutus', 'Romain', 10),
('Ziguepus', 'Romain', 5),
('Panoramix', 'Gaulois', 50),
('Assurancetourix', 'Gaulois', 20),
('Abraracourcix', 'Gaulois', 80);

This table personnages contains characters from the famous comic series, categorized into two types: ‘Gaulois’ and ‘Romain’, with varying strengths. We will use this data to demonstrate how the SQL OVER() function works with partitions and frames.

Note: Tests are conducted using the Docker image of SQL Server 2022 (mcr.microsoft.com/mssql/server:2022-latest).

Basic Concepts: Partitions and Frames

Partitions and frames are essential when working with the OVER function. They can work together to define subsets of rows within a table, allowing you to perform calculations on specific groups of data.

Partitions

A partition is a subset of a table’s rows, grouped based on the values of a specific column.

In Our Example

There are two partitions: ‘Gaulois’ and ‘Romain’.

Windows (or Frames)

A window (or frame) is a dynamic subset of rows, defined by a specific number of preceding and following rows.

Types of Frames

  • Unbounded Frame: Includes all preceding and following rows.
  • Unbounded Preceding: Includes all preceding rows.
  • Unbounded Following: Includes all following rows.
  • Bounded Frame: Includes a specific number of preceding and following rows.

At each iteration, the frame adjusts based on the current row.

In Our Example

Let’s consider a frame with 2 preceding rows and 1 following row.

IterationCurrent RowFrame (Rows)
1AsterixAsterix, Obelix
2ObelixAsterix, Obelix, Panoramix
3PanoramixAsterix, Obelix, Panoramix, Assurancetourix
4AssurancetourixObelix, Panoramix, Assurancetourix, Abraracourcix
5AbraracourcixPanoramix, Assurancetourix, Abraracourcix, Cesar
6CesarAssurancetourix, Abraracourcix, Cesar, Brutus
7BrutusAbraracourcix, Cesar, Brutus, Ziguepus
8ZiguepusCesar, Brutus, Ziguepus

The SQL OVER Clause

The OVER clause defines a window of rows for a query result set. Rows can be partitioned and ordered before applying the window function. This clause is used with functions to compute values like moving averages, cumulative sums, running totals, statistics, and plenty of other useful use cases.

Some Examples

Simple OVER

The simple OVER() allows applying a function to the entire dataset acting as an unbounded window.

SELECT name,
type,
strength,
SUM(strength) OVER () AS sum_strength
FROM personnages;

With the output:

nametypestrengthsum_strength
AsterixGaulois100385
ObelixGaulois100385
CesarRomain20385
BrutusRomain10385
ZiguepusRomain5385
PanoramixGaulois50385
AssurancetourixGaulois20385
AbraracourcixGaulois80385

OVER with ORDER BY

Unbounded

Let’s now consider the OVER clause with an ORDER BY clause summing the strength row by row.

SELECT name,
type,
strength,
SUM(strength) OVER (ORDER BY type ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_force
FROM personnages;

With the output:

nametypestrengthcumulative_force
AsterixGaulois100100
ObelixGaulois100200
PanoramixGaulois50250
AssurancetourixGaulois20270
AbraracourcixGaulois80350
CesarRomain20370
BrutusRomain10380
ZiguepusRomain5385
Bounded

But it can also be bounded to sum on a window of 2 rows before and 1 row after the current row.

SELECT name,
type,
strength,
SUM(strength) OVER (ORDER BY type ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS windows_total_force
FROM personnages;

With the output:

nametypestrengthwindows_total_force
AsterixGaulois100200
ObelixGaulois100250
PanoramixGaulois50270
AssurancetourixGaulois20250
AbraracourcixGaulois80170
CesarRomain20130
BrutusRomain10115
ZiguepusRomain535

OVER with a Partition

SELECT name,
type,
strength,
SUM(strength) OVER (ORDER BY type) AS total_strength_by_type
FROM personnages;

With the output:

nametypestrengthtotal_strength_by_type
AsterixGaulois100350
ObelixGaulois100350
PanoramixGaulois50350
AssurancetourixGaulois20350
AbraracourcixGaulois80350
CesarRomain2035
BrutusRomain1035
ZiguepusRomain535

OVER with a Partition and an ORDER BY

SELECT name,
type,
strength,
SUM(strength) OVER (PARTITION BY type ORDER BY name) AS total_strength_by_type_ordered
FROM personnages;

With the output:

nametypestrengthtotal_strength_by_type_ordered
AbraracourcixGaulois8080
AssurancetourixGaulois20100
AsterixGaulois100200
ObelixGaulois100300
PanoramixGaulois50350
BrutusRomain1010
CesarRomain2030
ZiguepusRomain535

Subtleties

Some behaviors can be surprising when using the OVER clause. Let’s take the following example where the Asterix character is duplicated in the table:

SELECT name,
type,
strength,
SUM(strength) OVER (PARTITION BY type ORDER BY name) AS total_strength_by_type_ordered
FROM personnages;

With the output:

linenametypestrengthtotal_strength_by_type_ordered
1AbraracourcixGaulois8080
2AssurancetourixGaulois20100
3AsterixGaulois100300
4AsterixGaulois100300
5ObelixGaulois100400
6PanoramixGaulois50450
7BrutusRomain1010
8CesarRomain2030
9ZiguepusRomain535

This behavior can be surprising and should be taken into account when using the OVER clause, as lines 3 and 4 provide the same aggregation due to the duplicate entry. But this is not the case with an ORDER BY clause:

SELECT name,
type,
strength,
SUM(strength) OVER (PARTITION BY type ORDER BY name ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS total_strength_by_type_ordered
FROM personnages;

With the output:

nametypestrengthtotal_strength_by_type_ordered
AbraracourcixGaulois80100
AssurancetourixGaulois20200
AsterixGaulois100300
AsterixGaulois100320
ObelixGaulois100350
PanoramixGaulois50250
BrutusRomain1030
CesarRomain2035
ZiguepusRomain535

Interesting Use Cases

The OVER clause can be used with a variety of functions to perform complex calculations on the result set. Here are some examples:

These functions can be used to calculate the cumulative distribution, the percentage rank, the lag value, or the first value of a partition, among other use cases.

Conclusion

Understanding the OVER function is fundamental to performing complex calculations on a result set. By defining partitions and frames, you can compute various statistics, moving averages, cumulative sums, and other valuable insights without doing a subquery or multiple SQL calls.