SQL window functions, also known as analytic functions, are used to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions, which return a single result for a group of rows, window functions perform calculations for each row within the query result set.
Basic Syntax
The syntax for window functions generally includes the function name
followed by an OVER clause, which
defines the window or set of rows for the function to operate on.
function_name() OVER (
[PARTITION BY
column_list]
[ORDER BY column_list]
[frame_clause]
)
Common Window Functions
ROW_NUMBER():
The ROW_NUMBER()
function in SQL is a window function that assigns a unique sequential
integer to rows within a result set's partition. This is often used for
ranking or for ordering rows in a more granular fashion than the ORDER BY
clause alone
Suppose we have a table employees
with the following data:
The ROW_NUMBER()
function
can be used to assign a unique row number to each row within a partition of the
result set. For example, if we want to assign a unique row number to each
employee within their department based on their salary in descending order, we
can use the following query:
SELECT id,
name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM
employees;
This query will partition the result set by the department
column and then order the
rows within each partition by the salary
column in descending order. The ROW_NUMBER()
function will then assign a unique row number to each row within these
partitions.
The result might look like this:
Practical Use Case: Top-N Per Group
A common use case for ROW_NUMBER() is to find the top-N records within each group. For example, if we want
to find the top 2 highest-paid employees in each department, we can use the ROW_NUMBER() function with a common table
expression (CTE):
WITH RankedEmployees AS ( SELECT id, name,
department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM
employees ) SELECT id, name, department, salary
FROM RankedEmployees WHERE
row_num <= 2;
This query will first assign row numbers to each employee within their
department based on their salary in descending order. It then filters the
result to include only the top 2 employees in each department.
The result might look like this:
The ROW_NUMBER()
function
is a versatile tool in SQL for assigning unique sequential numbers to rows
within partitions of a result set, which can be particularly useful for ranking
and filtering data.
RANK(), DENSE_RANK():
The RANK()
and DENSE_RANK()
functions in SQL are both window
functions that assign rankings to rows within a partition of a result set based
on the values of a specified column. The primary difference between the two
functions lies in how they handle ties (rows with the same value).
Suppose we have a table sales
with the following data:
SELECT
salesperson, sales_amount, RANK() OVER (ORDER BY sales_amount DESC)
AS rank, DENSE_RANK()
OVER (ORDER BY sales_amount DESC)
AS dense_rank FROM
sales;
This query ranks the sales amounts in descending order
across all salespersons using both RANK()
and DENSE_RANK()
:
In this example,
Bob has two rows with a
sales amount of 700, so both rows are ranked 2 with RANK() and ranked 2 with DENSE_RANK().
The RANK() function skips rank
3 and assigns rank 4 to Charlie's 600 sales amount, while DENSE_RANK() assigns the next rank, 3, without skipping.
NTILE(n):
The NTILE(n)
function is a window function in SQL that distributes the rows in an ordered
partition into a specified number of approximately equal groups or buckets.
Each row is assigned a bucket number ranging from 1 to n. Suppose we have a table employees
with the following data:
NTILE(4)
function to divide the employees into 4 buckets based on their salaries:SELECT id,
name, salary, NTILE(4)
OVER (ORDER BY salary DESC) AS salary_bucket FROM
employees;
This query will assign each employee to one of four
buckets based on their salary, ordered in descending order. The result might
look something like this:
In this example:
- The employees with the highest
salaries, David and Frank, are placed in the first bucket.
- Charlie and Grace are in the
second bucket.
- Alice and Heidi are in the third
bucket.
- Bob and Eve are in the fourth
bucket.
Each bucket contains approximately the same number of rows. If the number
of rows cannot be evenly divided by n, the buckets may have different sizes, with some buckets having one more
row than others.
LAG(), LEAD():
The LEAD()
and LAG()
functions are
window functions in SQL that allow you to access data from a subsequent or
preceding row in the result set, respectively. These functions are useful for
comparing values between rows within a partition of the data.
Suppose we have a table sales
with the following data:
LAG()
function allows you to access data from a preceding row. For example, if we
want to compare each salesperson's sales amount with their previous sales, we
can use LAG()
.LEAD()
function allows you to access data from a subsequent row. For example, if we
want to compare each salesperson's sales amount with their next sales, we can
use LEAD()
:FIRST_VALUE(), LAST_VALUE():
The FIRST_VALUE()
and LAST_VALUE()
functions are window functions in SQL
that return the first and last value in a specified window or partition,
respectively. These functions are useful for retrieving values based on the
order of rows within a partition.
Suppose we have a table sales
with the following data:
The FIRST_VALUE()
function
returns the first value in an ordered partition. For example, if we want to get
the first sales amount for each salesperson based on the sales date. This query
returns the sales amount for each row along with the first sales amount for the
salesperson based on the order of sales date.
The LAST_VALUE()
function
returns the last value in an ordered partition. To get the last sales amount
for each salesperson based on the sales date. This query returns the sales
amount for each row along with the last sales amount for the salesperson based
on the order of sales date.
Note: The
window frame clause (RANGE BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) is important for LAST_VALUE()
to ensure that the entire partition is
considered. Without this clause, LAST_VALUE()
might return unexpected results because it defaults to only considering rows up
to the current row.
SELECT
salesperson,
sales_date,
sales_amount,
FIRST_VALUE(sales_amount)
OVER (PARTITION
BY salesperson ORDER
BY sales_date) AS
first_sales_amount,
LAST_VALUE(sales_amount)
OVER (PARTITION
BY salesperson ORDER
BY sales_date RANGE
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS
last_sales_amount
FROM sales;
Comments
Post a Comment