Skip to main content

Window Functions

SQL window functions, also known as analytic functions, are used to perform calculations across a set of table rows that are related to the current row. Unlike aggregate functions, which return a single result for a group of rows, window functions perform calculations for each row within the query result set.


Basic Syntax

The syntax for window functions generally includes the function name followed by an OVER clause, which defines the window or set of rows for the function to operate on.

function_name() OVER (

 [PARTITION BY column_list]

[ORDER BY column_list]

[frame_clause]

)


Common Window Functions

ROW_NUMBER()

The ROW_NUMBER() function in SQL is a window function that assigns a unique sequential integer to rows within a result set's partition. This is often used for ranking or for ordering rows in a more granular fashion than the ORDER BY clause alone

Suppose we have a table employees with the following data:


The ROW_NUMBER() function can be used to assign a unique row number to each row within a partition of the result set. For example, if we want to assign a unique row number to each employee within their department based on their salary in descending order, we can use the following query:

SELECT id, name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees;

This query will partition the result set by the department column and then order the rows within each partition by the salary column in descending order. The ROW_NUMBER() function will then assign a unique row number to each row within these partitions.

The result might look like this:


Practical Use Case: Top-N Per Group

A common use case for ROW_NUMBER() is to find the top-N records within each group. For example, if we want to find the top 2 highest-paid employees in each department, we can use the ROW_NUMBER() function with a common table expression (CTE):

WITH RankedEmployees AS ( SELECT id, name, department, salary, ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num FROM employees ) SELECT id, name, department, salary FROM RankedEmployees WHERE row_num <= 2;

This query will first assign row numbers to each employee within their department based on their salary in descending order. It then filters the result to include only the top 2 employees in each department.

The result might look like this:

The ROW_NUMBER() function is a versatile tool in SQL for assigning unique sequential numbers to rows within partitions of a result set, which can be particularly useful for ranking and filtering data.


RANK(), DENSE_RANK():

The RANK() and DENSE_RANK() functions in SQL are both window functions that assign rankings to rows within a partition of a result set based on the values of a specified column. The primary difference between the two functions lies in how they handle ties (rows with the same value).

Suppose we have a table sales with the following data:


SELECT salesperson, sales_amount, RANK() OVER (ORDER BY sales_amount DESC) AS rank, DENSE_RANK() OVER (ORDER BY sales_amount DESC) AS dense_rank FROM sales;

This query ranks the sales amounts in descending order across all salespersons using both RANK() and DENSE_RANK():


In this example,

Bob has two rows with a sales amount of 700, so both rows are ranked 2 with RANK() and ranked 2 with DENSE_RANK().

The RANK() function skips rank 3 and assigns rank 4 to Charlie's 600 sales amount, while DENSE_RANK() assigns the next rank, 3, without skipping.


NTILE(n)

The NTILE(n) function is a window function in SQL that distributes the rows in an ordered partition into a specified number of approximately equal groups or buckets. Each row is assigned a bucket number ranging from 1 to n. Suppose we have a table employees with the following data:


      Let's use the NTILE(4) function to divide the employees into 4 buckets based on their salaries:

SELECT id, name, salary, NTILE(4) OVER (ORDER BY salary DESC) AS salary_bucket FROM employees;

This query will assign each employee to one of four buckets based on their salary, ordered in descending order. The result might look something like this:


In this example:

  • The employees with the highest salaries, David and Frank, are placed in the first bucket.
  • Charlie and Grace are in the second bucket.
  • Alice and Heidi are in the third bucket.
  • Bob and Eve are in the fourth bucket.

Each bucket contains approximately the same number of rows. If the number of rows cannot be evenly divided by n, the buckets may have different sizes, with some buckets having one more row than others.


LAG(), LEAD()

The LEAD() and LAG() functions are window functions in SQL that allow you to access data from a subsequent or preceding row in the result set, respectively. These functions are useful for comparing values between rows within a partition of the data.

Suppose we have a table sales with the following data:


The LAG() function allows you to access data from a preceding row. For example, if we want to compare each salesperson's sales amount with their previous sales, we can use LAG().

The LEAD() function allows you to access data from a subsequent row. For example, if we want to compare each salesperson's sales amount with their next sales, we can use LEAD():

SELECT salesperson, sales_date, sales_amount, LEAD(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS next_sales ,LAG(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS previous_sales FROM sales;

 

FIRST_VALUE(), LAST_VALUE():

The FIRST_VALUE() and LAST_VALUE() functions are window functions in SQL that return the first and last value in a specified window or partition, respectively. These functions are useful for retrieving values based on the order of rows within a partition.

Suppose we have a table sales with the following data:

The FIRST_VALUE() function returns the first value in an ordered partition. For example, if we want to get the first sales amount for each salesperson based on the sales date. This query returns the sales amount for each row along with the first sales amount for the salesperson based on the order of sales date.

The LAST_VALUE() function returns the last value in an ordered partition. To get the last sales amount for each salesperson based on the sales date. This query returns the sales amount for each row along with the last sales amount for the salesperson based on the order of sales date.

Note: The window frame clause (RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) is important for LAST_VALUE() to ensure that the entire partition is considered. Without this clause, LAST_VALUE() might return unexpected results because it defaults to only considering rows up to the current row.

SELECT salesperson,

 sales_date, sales_amount,

FIRST_VALUE(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date) AS first_sales_amount,

LAST_VALUE(sales_amount) OVER (PARTITION BY salesperson ORDER BY sales_date RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_sales_amount

FROM sales;


Comments

Popular posts from this blog

Power BI tenant settings and admin portal

As of my last update, Power BI offers a dedicated admin portal for managing settings and configurations at the tenant level. Here's an overview of Power BI tenant settings and the admin portal: 1. Power BI Admin Portal: Access : The Power BI admin portal is accessible to users with admin privileges in the Power BI service. URL : You can access the admin portal at https://app.powerbi.com/admin-portal . 2. Tenant Settings: General Settings : Configure general settings such as tenant name, regional settings, and language settings. Tenant Administration : Manage user licenses, permissions, and access rights for Power BI within the organization. Usage Metrics : View usage metrics and reports to understand how Power BI is being used across the organization. Service Health : Monitor the health status of the Power BI service and receive notifications about service incidents and outages. Audit Logs : Access audit logs to track user activities, access requests, and administrative actions wit...

Performance Optimization

Performance optimization in SQL is crucial for ensuring that your database queries run efficiently, especially as the size and complexity of your data grow. Here are several strategies and techniques to optimize SQL performance: Indexing Create Indexes : Primary Key and Unique Indexes : These are automatically indexed. Ensure that your tables have primary keys and unique constraints where applicable. Foreign Keys : Index foreign key columns to speed up join operations. Composite Indexes : Use these when queries filter on multiple columns. The order of columns in the index should match the order in the query conditions. Avoid Over-Indexing:  Too many indexes can slow down write operations (INSERT, UPDATE, DELETE). Only index columns that are frequently used in WHERE clauses, JOIN conditions, and as sorting keys. Query Optimization Use SELECT Statements Efficiently : SELECT Only Necessary Columns : Avoid using SELECT * ; specify only ...

DAX UPPER Function

The DAX UPPER function in Power BI is used to convert all characters in a text string to uppercase. This function is useful for standardizing text data, ensuring consistency in text values, and performing case-insensitive comparisons. Syntax: UPPER(<text>) <text>: The text string that you want to convert to uppercase. Purpose: The UPPER function helps ensure that text data is consistently formatted in uppercase. This can be essential for tasks like data cleaning, preparing text for comparisons, and ensuring uniformity in text-based fields. E xample: Suppose you have a table named "Customers" with a column "Name" that contains names in mixed case. You want to create a new column that shows all names in uppercase. UppercaseName = UPPER(Customers[Name]) Example Scenario: Assume you have the following "Customers" table: You can use the UPPER function as follows: Using the UPPER function, you can convert all names to uppercase: UppercaseName = ...