Skip to main content

Window Functions

SQL window functions, also known as analytic functions, are used to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions, which return a single result for a group of rows, window functions perform calculations for each row within the query result set.


Basic Syntax

The syntax for window functions generally includes the function name followed by an OVER clause, which defines the window or set of rows for the function to operate on.

function_name() OVER (

 [PARTITION BY column_list]

[ORDER BY column_list]

[frame_clause]

)


Common Window Functions

ROW_NUMBER()

The ROW_NUMBER() function in SQL is a window function that assigns a unique sequential integer to rows within a result set's partition. This is often used for ranking or for ordering rows in a more granular fashion than the ORDER BY clause alone

Suppose we have a table employees with the following data:


The ROW_NUMBER() function can be used to assign a unique row number to each row within a partition of the result set. For example, if we want to assign a unique row number to each employee within their department based on their salary in descending order, we can use the following query:

SELECT id, name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees;

This query will partition the result set by the department column and then order the rows within each partition by the salary column in descending order. The ROW_NUMBER() function will then assign a unique row number to each row within these partitions.

The result might look like this:


Practical Use Case: Top-N Per Group

A common use case for ROW_NUMBER() is to find the top-N records within each group. For example, if we want to find the top 2 highest-paid employees in each department, we can use the ROW_NUMBER() function with a common table expression (CTE):

WITH RankedEmployees AS ( SELECT id, name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees ) SELECT id, name, department, salary FROM RankedEmployees WHERE row_num <= 2;

This query will first assign row numbers to each employee within their department based on their salary in descending order. It then filters the result to include only the top 2 employees in each department.

The result might look like this:

The ROW_NUMBER() function is a versatile tool in SQL for assigning unique sequential numbers to rows within partitions of a result set, which can be particularly useful for ranking and filtering data.


RANK(), DENSE_RANK():

The RANK() and DENSE_RANK() functions in SQL are both window functions that assign rankings to rows within a partition of a result set based on the values of a specified column. The primary difference between the two functions lies in how they handle ties (rows with the same value).

Suppose we have a table sales with the following data:


SELECT salesperson, sales_amount, RANK() OVER (ORDER BY sales_amount DESC) AS rank, DENSE_RANK() OVER (ORDER BY sales_amount DESC) AS dense_rank FROM sales;

This query ranks the sales amounts in descending order across all salespersons using both RANK() and DENSE_RANK():


In this example,

Bob has two rows with a sales amount of 700, so both rows are ranked 2 with RANK() and ranked 2 with DENSE_RANK().

The RANK() function skips rank 3 and assigns rank 4 to Charlie's 600 sales amount, while DENSE_RANK() assigns the next rank, 3, without skipping.


NTILE(n)

The NTILE(n) function is a window function in SQL that distributes the rows in an ordered partition into a specified number of approximately equal groups or buckets. Each row is assigned a bucket number ranging from 1 to n. Suppose we have a table employees with the following data:


      Let's use the NTILE(4) function to divide the employees into 4 buckets based on their salaries:

SELECT id, name, salary, NTILE(4) OVER (ORDER BY salary DESC) AS salary_bucket FROM employees;

This query will assign each employee to one of four buckets based on their salary, ordered in descending order. The result might look something like this:


In this example:

  • The employees with the highest salaries, David and Frank, are placed in the first bucket.
  • Charlie and Grace are in the second bucket.
  • Alice and Heidi are in the third bucket.
  • Bob and Eve are in the fourth bucket.

Each bucket contains approximately the same number of rows. If the number of rows cannot be evenly divided by n, the buckets may have different sizes, with some buckets having one more row than others.


LAG(), LEAD()

The LEAD() and LAG() functions are window functions in SQL that allow you to access data from a subsequent or preceding row in the result set, respectively. These functions are useful for comparing values between rows within a partition of the data.

Suppose we have a table sales with the following data:


The LAG() function allows you to access data from a preceding row. For example, if we want to compare each salesperson's sales amount with their previous sales, we can use LAG().

The LEAD() function allows you to access data from a subsequent row. For example, if we want to compare each salesperson's sales amount with their next sales, we can use LEAD():

SELECT salesperson, sales_date, sales_amount, LEAD(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS next_sales ,LAG(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS previous_sales FROM sales;

 

FIRST_VALUE(), LAST_VALUE():

The FIRST_VALUE() and LAST_VALUE() functions are window functions in SQL that return the first and last value in a specified window or partition, respectively. These functions are useful for retrieving values based on the order of rows within a partition.

Suppose we have a table sales with the following data:

The FIRST_VALUE() function returns the first value in an ordered partition. For example, if we want to get the first sales amount for each salesperson based on the sales date. This query returns the sales amount for each row along with the first sales amount for the salesperson based on the order of sales date.

The LAST_VALUE() function returns the last value in an ordered partition. To get the last sales amount for each salesperson based on the sales date. This query returns the sales amount for each row along with the last sales amount for the salesperson based on the order of sales date.

Note: The window frame clause (RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) is important for LAST_VALUE() to ensure that the entire partition is considered. Without this clause, LAST_VALUE() might return unexpected results because it defaults to only considering rows up to the current row.

SELECT salesperson,

 sales_date, sales_amount,

FIRST_VALUE(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS first_sales_amount,

LAST_VALUE(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_sales_amount

FROM sales;


Comments

Popular posts from this blog

Performance Optimization

Performance optimization in SQL is crucial for ensuring that your database queries run efficiently, especially as the size and complexity of your data grow. Here are several strategies and techniques to optimize SQL performance: Indexing Create Indexes : Primary Key and Unique Indexes : These are automatically indexed. Ensure that your tables have primary keys and unique constraints where applicable. Foreign Keys : Index foreign key columns to speed up join operations. Composite Indexes : Use these when queries filter on multiple columns. The order of columns in the index should match the order in the query conditions. Avoid Over-Indexing:  Too many indexes can slow down write operations (INSERT, UPDATE, DELETE). Only index columns that are frequently used in WHERE clauses, JOIN conditions, and as sorting keys. Query Optimization Use SELECT Statements Efficiently : SELECT Only Necessary Columns : Avoid using SELECT * ; specify only ...

DAX UPPER Function

The DAX UPPER function in Power BI is used to convert all characters in a text string to uppercase. This function is useful for standardizing text data, ensuring consistency in text values, and performing case-insensitive comparisons. Syntax: UPPER(<text>) <text>: The text string that you want to convert to uppercase. Purpose: The UPPER function helps ensure that text data is consistently formatted in uppercase. This can be essential for tasks like data cleaning, preparing text for comparisons, and ensuring uniformity in text-based fields. E xample: Suppose you have a table named "Customers" with a column "Name" that contains names in mixed case. You want to create a new column that shows all names in uppercase. UppercaseName = UPPER(Customers[Name]) Example Scenario: Assume you have the following "Customers" table: You can use the UPPER function as follows: Using the UPPER function, you can convert all names to uppercase: UppercaseName = ...

Understanding the Power BI ecosystem and workflow

Understanding the Power BI ecosystem and workflow involves getting familiar with the various components of Power BI and how they interact to provide a comprehensive data analysis and visualization solution. Here's a detailed explanation: Power BI Ecosystem The Power BI ecosystem consists of several interconnected components that work together to enable users to connect to data sources, transform and model data, create visualizations, and share insights. The main components are: Power BI Desktop Power BI Service Power BI Mobile Power BI Gateway Power BI Report Server Power BI Embedded PowerBI Workflow Here’s a typical workflow in the Power BI ecosystem: Step 1: Connect to Data Sources Power BI Desktop:  Connect to various data sources like Excel, SQL databases, cloud services, and more. Power BI Gateway:  If using on-premises data sources, install and configure the gateway for secure data transfer. Step 2: Data Transformation and Modeling Power BI Desktop:  Use Power Query...