Skip to main content

Window Functions

SQL window functions, also known as analytic functions, are used to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions, which return a single result for a group of rows, window functions perform calculations for each row within the query result set.


Basic Syntax

The syntax for window functions generally includes the function name followed by an OVER clause, which defines the window or set of rows for the function to operate on.

function_name() OVER (

 [PARTITION BY column_list]

[ORDER BY column_list]

[frame_clause]

)


Common Window Functions

ROW_NUMBER()

The ROW_NUMBER() function in SQL is a window function that assigns a unique sequential integer to rows within a result set's partition. This is often used for ranking or for ordering rows in a more granular fashion than the ORDER BY clause alone

Suppose we have a table employees with the following data:


The ROW_NUMBER() function can be used to assign a unique row number to each row within a partition of the result set. For example, if we want to assign a unique row number to each employee within their department based on their salary in descending order, we can use the following query:

SELECT id, name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees;

This query will partition the result set by the department column and then order the rows within each partition by the salary column in descending order. The ROW_NUMBER() function will then assign a unique row number to each row within these partitions.

The result might look like this:


Practical Use Case: Top-N Per Group

A common use case for ROW_NUMBER() is to find the top-N records within each group. For example, if we want to find the top 2 highest-paid employees in each department, we can use the ROW_NUMBER() function with a common table expression (CTE):

WITH RankedEmployees AS ( SELECT id, name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees ) SELECT id, name, department, salary FROM RankedEmployees WHERE row_num <= 2;

This query will first assign row numbers to each employee within their department based on their salary in descending order. It then filters the result to include only the top 2 employees in each department.

The result might look like this:

The ROW_NUMBER() function is a versatile tool in SQL for assigning unique sequential numbers to rows within partitions of a result set, which can be particularly useful for ranking and filtering data.


RANK(), DENSE_RANK():

The RANK() and DENSE_RANK() functions in SQL are both window functions that assign rankings to rows within a partition of a result set based on the values of a specified column. The primary difference between the two functions lies in how they handle ties (rows with the same value).

Suppose we have a table sales with the following data:


SELECT salesperson, sales_amount, RANK() OVER (ORDER BY sales_amount DESC) AS rank, DENSE_RANK() OVER (ORDER BY sales_amount DESC) AS dense_rank FROM sales;

This query ranks the sales amounts in descending order across all salespersons using both RANK() and DENSE_RANK():


In this example,

Bob has two rows with a sales amount of 700, so both rows are ranked 2 with RANK() and ranked 2 with DENSE_RANK().

The RANK() function skips rank 3 and assigns rank 4 to Charlie's 600 sales amount, while DENSE_RANK() assigns the next rank, 3, without skipping.


NTILE(n)

The NTILE(n) function is a window function in SQL that distributes the rows in an ordered partition into a specified number of approximately equal groups or buckets. Each row is assigned a bucket number ranging from 1 to n. Suppose we have a table employees with the following data:


      Let's use the NTILE(4) function to divide the employees into 4 buckets based on their salaries:

SELECT id, name, salary, NTILE(4) OVER (ORDER BY salary DESC) AS salary_bucket FROM employees;

This query will assign each employee to one of four buckets based on their salary, ordered in descending order. The result might look something like this:


In this example:

  • The employees with the highest salaries, David and Frank, are placed in the first bucket.
  • Charlie and Grace are in the second bucket.
  • Alice and Heidi are in the third bucket.
  • Bob and Eve are in the fourth bucket.

Each bucket contains approximately the same number of rows. If the number of rows cannot be evenly divided by n, the buckets may have different sizes, with some buckets having one more row than others.


LAG(), LEAD()

The LEAD() and LAG() functions are window functions in SQL that allow you to access data from a subsequent or preceding row in the result set, respectively. These functions are useful for comparing values between rows within a partition of the data.

Suppose we have a table sales with the following data:


The LAG() function allows you to access data from a preceding row. For example, if we want to compare each salesperson's sales amount with their previous sales, we can use LAG().

The LEAD() function allows you to access data from a subsequent row. For example, if we want to compare each salesperson's sales amount with their next sales, we can use LEAD():

SELECT salesperson, sales_date, sales_amount, LEAD(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS next_sales ,LAG(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS previous_sales FROM sales;

 

FIRST_VALUE(), LAST_VALUE():

The FIRST_VALUE() and LAST_VALUE() functions are window functions in SQL that return the first and last value in a specified window or partition, respectively. These functions are useful for retrieving values based on the order of rows within a partition.

Suppose we have a table sales with the following data:

The FIRST_VALUE() function returns the first value in an ordered partition. For example, if we want to get the first sales amount for each salesperson based on the sales date. This query returns the sales amount for each row along with the first sales amount for the salesperson based on the order of sales date.

The LAST_VALUE() function returns the last value in an ordered partition. To get the last sales amount for each salesperson based on the sales date. This query returns the sales amount for each row along with the last sales amount for the salesperson based on the order of sales date.

Note: The window frame clause (RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) is important for LAST_VALUE() to ensure that the entire partition is considered. Without this clause, LAST_VALUE() might return unexpected results because it defaults to only considering rows up to the current row.

SELECT salesperson,

 sales_date, sales_amount,

FIRST_VALUE(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS first_sales_amount,

LAST_VALUE(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_sales_amount

FROM sales;


Comments

Popular posts from this blog

TechUplift: Elevating Your Expertise in Every Click

  Unlock the potential of data with SQL Fundamental: Master querying, managing, and manipulating databases effortlessly. Empower your database mastery with PL/SQL: Unleash the full potential of Oracle databases through advanced programming and optimization. Unlock the Potential of Programming for Innovation and Efficiency.  Transform raw data into actionable insights effortlessly. Empower Your Data Strategy with Power Dataware: Unleash the Potential of Data for Strategic Insights and Decision Making.

Python Topics

Learning Python can be an exciting and rewarding journey, especially given its versatility and widespread use in various fields like web development, data science, automation, and more. Here's a structured guide to help you learn Python effectively, covering essential topics from beginner to advanced levels. Beginner Level Introduction to Python Installation and setup Python syntax and interactive shell Writing and running your first Python script Basic Concepts Variables and data types (integers, floats, strings, booleans) Basic arithmetic operations String operation Comments and documentation Control Structures Conditional statements ( if ,  elif ,  else ) Loops ( for ,  while ) Data Structures Lists Tuples Dictionaries Sets Functions Defining and calling functions Function arguments and return values Lambda functions Built-in functions Modules and Packages Importing modules Standard library overview (e.g.,  math ,  datetime ,  random ) Installing and using external packages

DAX Functions

These are just some of the many DAX functions available in Power BI. Each  function serves a specific purpose and can be used to perform a wide range of calculations and transformations on your data. Aggregation Functions: SUM : Calculates the sum of values. AVERAGE : Calculates the arithmetic mean of values. MIN : Returns the smallest value in a column. MAX : Returns the largest value in a column. COUNT : Counts the number of rows in a table or column. COUNTA : Counts the number of non-blank values in a column. DISTINCTCOUNT : Counts the number of unique values in a column. Logical Functions: IF : Returns one value if a condition is true and another value if it's false. AND : Returns TRUE if all the conditions are true, otherwise FALSE. OR : Returns TRUE if any of the conditions are true, otherwise FALSE. NOT : Returns the opposite of a logical value. Text Functions: CONCATENATE : Concatenates strings together. LEFT : Returns the leftmost characters from a text string. RIGHT : Ret