Database

Window Function

Examples and the reason to use window function in SQL Server

  • #sql
  • #window

In SQL, window function performs calculation across a set of rows, that are somehow related to the current row without aggregating the data.

AVG, MIN and MAX

Here are a few straightforward examples using AVG, MIN and MAX.

SELECT
    id,
    name,
    category,
    AVG(price) OVER(),
    MIN(price) OVER(),
    MAX(price) OVER()
FROM books;

The output will yield the following with three new columns with the same value.

idnamecategoryavg(price)min(price)max(price)
1book1motivation50.105.20120.90
2book2motivation50.105.20120.90
3book3health50.105.20120.90

Difference from AVG

Next, we can calculate the price difference and the percentage with OVER.

SELECT
    id,
    name,
    category,
    ROUND(AVG(price) OVER(), 2),
    ROUND((price - AVG(price) OVER()), 2) AS diff_avg,
    ROUND(ABS(price - AVG(price) OVER()) / (AVG(price) OVER() * 100)) AS pct_diff_avg
FROM books;

It will the results that looks something shown below.

idnamecategoryavg(price)diff_avgpct_diff_avg
1book1motivation50.10-20.10xx
2book2motivation50.105.60xx
3book3health50.1010.80xx

Partitioning

PARTITION BY is very similar to GROUPBY, just that partition by happens within the OVER() clause.

Partition by Category

SELECT
    id,
    name,
    category,
    publisher,
    AVG(price) OVER(PARTITION BY category) AS avg_cat,
    AVG(price) OVER(PARTITION BY category, publisher) AS avg_cat_pub
FROM books;

It will produce the results similar to below.

idnamecategorypublisheravg_catavg_cat_pub
1book1motivationpelangi30.0032.00
2book2motivationpopular30.0025.00
3book3healthpelangi20.8020.80
4book4motivationpopular30.0025.00
5book5healthpelangi20.8020.80

The first partition by uses the category column as the baseline to perform the calculation. Each books that belongs to the same category should have the same value. The second partition uses the category column as well as the publisher column to perform the calculation. It is also expected for those who fall under the same category and publisher will have the same result.

Ranking

Row Number

A rank can be computed based on a desired column with the window function.

SELECT
    id,
    name,
    category,
    price,
    ROW_NUMBER() OVER(ORDER BY price DESC) AS rank_by_price
FROM books;
idnamecategorypricerank_by_price
1book1motivation99.991
2book2motivation70.002
3book3health30.503
4book4motivation10.004
5book5health5.805

The table will list down the rows according to the rank obtained in ascending order.

Additionally, we can partition the ranking according to a column, category for example.

SELECT
    id,
    name,
    category,
    price,
    ROW_NUMBER() OVER(PARTITION BY category ORDER BY price DESC) AS rank_partition_by_price
FROM books;

The result will now rank the rows within its own partition.

idnamecategorypricerank_partition_by_price
1book1motivation99.991
2book2motivation70.002
3book3health30.501
4book5health10.002
5book4motivation5.801

Problem statement: Create a new column that populate the column with "Yes" if the book is top 3 by category, "No" if otherwise.

To tackle this issue, we can use the CASE statement.

SELECT
    id,
    name,
    category,
    price,
    CASE
        WHEN ROW_NUMBER() OVER(PARTITION BY category ORDER BY price DESC) <= 3 THEN 'Yes'
        ELSE 'No'
    END AS top3_of_category
FROM books;

The result will be similar to this.

idnamecategorypricetop3_of_category
1book1motivation99.99Yes
2book2motivation70.00Yes
3book3motivation30.50Yes
4book5motivation10.00No
5book4motivation5.80No
6book6health100.00Yes

Rank & Dense Rank

Rank and dense rank is pretty similar to how ROW_NUMBER will work, but with a subtlety. They are useful to rank records that have the exact same value against a target column.

SELECT
    id,
    name,
    category,
    price,
    RANK() OVER(PARTITION BY category ORDER BY price DESC) AS rank_partition_by_price
FROM books;
idnamecategorypricerank_partition_by_price
1book1motivation99.991
2book2motivation70.002
3book3motivation70.002
4book5motivation60.504
5book4motivation60.504
6book6motivation30.006

The rank for the same records are the same, but the subsequent rank is skipped depending on how many records are having the same value.

Dense rank on the other hand, preserves the ranking sequence and does not skip any of the subsequent rank.

Below shows the results obtained when we replace the RANK keyword to DENSE_RANK in the same query.

idnamecategorypricerank_partition_by_price
1book1motivation99.991
2book2motivation70.002
3book3motivation70.002
4book5motivation60.503
5book4motivation60.503
6book6motivation30.004

Lag and Lead

  • Lag - bring in the previous value within the rank
  • Lead - bring in the next value within the rank
SELECT
    id,
    name,
    category,
    price,
    LAG(price) OVER(PARTITION BY category ORDER BY price) a,
    LEAD(price) OVER(PARTITION BY category ORDER BY price) b
FROM books;

The query above will yield the results shown on the table below.

idnamecategorypriceab
1book1motivation99.99null70.00
2book2motivation70.0099.9970.00
3book3motivation70.0070.0060.50
4book5motivation60.5070.0060.50
5book4motivation60.5060.5030.00
6book6motivation30.0060.50null

Both lag and lead accepts an additional optional parameter that dictates the period to be used.

LAG(price, 2)
LEAD(price, 5)

Usage

A few usages for window function

  • Ranking
  • Moving averages
  • Cumulative (running total)
  • Period-to-period comparison
  • Percentile calculations
  • Opening and closing prices
  • Time series analysis

References

Mo Chen. WINDOW FUNCTIONS | Advanced SQL.ย YouTube. https://www.youtube.com/watch?v=efrR9eP2hUo