Window Functions

SQL window functions, also known as analytic functions, are used to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions, which return a single result for a group of rows, window functions perform calculations for each row within the query result set.

Basic Syntax

The syntax for window functions generally includes the function name followed by an OVER clause, which defines the window or set of rows for the function to operate on.

function_name() OVER (

[PARTITION BY column_list]

[ORDER BY column_list]

[frame_clause]

)

Common Window Functions

ROW_NUMBER():

The ROW_NUMBER() function in SQL is a window function that assigns a unique sequential integer to rows within a result set's partition. This is often used for ranking or for ordering rows in a more granular fashion than the ORDER BY clause alone

Suppose we have a table employees with the following data:

The ROW_NUMBER() function can be used to assign a unique row number to each row within a partition of the result set. For example, if we want to assign a unique row number to each employee within their department based on their salary in descending order, we can use the following query:

SELECT id, name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees;

This query will partition the result set by the department column and then order the rows within each partition by the salary column in descending order. The ROW_NUMBER() function will then assign a unique row number to each row within these partitions.

The result might look like this:

Practical Use Case: Top-N Per Group

A common use case for ROW_NUMBER() is to find the top-N records within each group. For example, if we want to find the top 2 highest-paid employees in each department, we can use the ROW_NUMBER() function with a common table expression (CTE):

WITH RankedEmployees AS ( SELECT id, name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees ) SELECT id, name, department, salary FROM RankedEmployees WHERE row_num <= 2;

This query will first assign row numbers to each employee within their department based on their salary in descending order. It then filters the result to include only the top 2 employees in each department.

The result might look like this:

The ROW_NUMBER() function is a versatile tool in SQL for assigning unique sequential numbers to rows within partitions of a result set, which can be particularly useful for ranking and filtering data.

RANK(), DENSE_RANK():

The RANK() and DENSE_RANK() functions in SQL are both window functions that assign rankings to rows within a partition of a result set based on the values of a specified column. The primary difference between the two functions lies in how they handle ties (rows with the same value).

Suppose we have a table sales with the following data:

SELECT salesperson, sales_amount, RANK() OVER (ORDER BY sales_amount DESC) AS rank, DENSE_RANK() OVER (ORDER BY sales_amount DESC) AS dense_rank FROM sales;

This query ranks the sales amounts in descending order across all salespersons using both RANK() and DENSE_RANK():

In this example,

Bob has two rows with a sales amount of 700, so both rows are ranked 2 with RANK() and ranked 2 with DENSE_RANK().

The RANK() function skips rank 3 and assigns rank 4 to Charlie's 600 sales amount, while DENSE_RANK() assigns the next rank, 3, without skipping.

NTILE(n):

The NTILE(n) function is a window function in SQL that distributes the rows in an ordered partition into a specified number of approximately equal groups or buckets. Each row is assigned a bucket number ranging from 1 to n. Suppose we have a table employees with the following data:

Let's use the NTILE(4) function to divide the employees into 4 buckets based on their salaries:

SELECT id, name, salary, NTILE(4) OVER (ORDER BY salary DESC) AS salary_bucket FROM employees;

This query will assign each employee to one of four buckets based on their salary, ordered in descending order. The result might look something like this:

In this example:

The employees with the highest salaries, David and Frank, are placed in the first bucket.
Charlie and Grace are in the second bucket.
Alice and Heidi are in the third bucket.
Bob and Eve are in the fourth bucket.

Each bucket contains approximately the same number of rows. If the number of rows cannot be evenly divided by n, the buckets may have different sizes, with some buckets having one more row than others.

LAG(), LEAD():

The LEAD() and LAG() functions are window functions in SQL that allow you to access data from a subsequent or preceding row in the result set, respectively. These functions are useful for comparing values between rows within a partition of the data.

Suppose we have a table sales with the following data:

The LAG() function allows you to access data from a preceding row. For example, if we want to compare each salesperson's sales amount with their previous sales, we can use LAG().

The LEAD() function allows you to access data from a subsequent row. For example, if we want to compare each salesperson's sales amount with their next sales, we can use LEAD():

SELECT salesperson, sales_date, sales_amount, LEAD(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS next_sales ,LAG(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS previous_sales FROM sales;

FIRST_VALUE(), LAST_VALUE():

The FIRST_VALUE() and LAST_VALUE() functions are window functions in SQL that return the first and last value in a specified window or partition, respectively. These functions are useful for retrieving values based on the order of rows within a partition.

Suppose we have a table sales with the following data:

The FIRST_VALUE() function returns the first value in an ordered partition. For example, if we want to get the first sales amount for each salesperson based on the sales date. This query returns the sales amount for each row along with the first sales amount for the salesperson based on the order of sales date.

The LAST_VALUE() function returns the last value in an ordered partition. To get the last sales amount for each salesperson based on the sales date. This query returns the sales amount for each row along with the last sales amount for the salesperson based on the order of sales date.

Note: The window frame clause (RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) is important for LAST_VALUE() to ensure that the entire partition is considered. Without this clause, LAST_VALUE() might return unexpected results because it defaults to only considering rows up to the current row.

SELECT salesperson,

sales_date, sales_amount,

FIRST_VALUE(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS first_sales_amount,

LAST_VALUE(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_sales_amount

FROM sales;

DataMindSphere

Search This Blog

Window Functions

Comments

Post a Comment

Popular posts from this blog

TechUplift: Elevating Your Expertise in Every Click

Relationships between tables

SQL Fundamentals