SQL window functions are a powerful tool for performing complex calculations across sets of rows that are related to the current query row. Unlike aggregate functions, window functions do not cause rows to become grouped into a single output row; instead, the rows retain their separate identities.
In this blog post, we’ll explore several common SQL window functions, including ROW_NUMBER()
, RANK()
, DENSE_RANK()
, NTILE()
, LAG()
, LEAD()
, and OVER()
. We’ll discuss what each function does, provide an example, and explain the output.
The ROW_NUMBER()
function assigns a unique sequential integer to rows within a partition of a result set, starting at 1 for the first row in each partition.
ROW_NUMBER() OVER (PARTITION BY column_name ORDER BY column_name)
SELECT
employee_id,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
FROM
employees;
The RANK()
function provides a ranking of rows within a partition of a result set, with gaps in the ranking values when there are ties
RANK() OVER (PARTITION BY column_name ORDER BY column_name)
SELECT
employee_id,
department_id,
salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank
FROM
employees;
Explanation: This example ranks employees within their departments by salary. If two employees have the same salary, they will receive the same rank, and the next rank will be skipped.
The DENSE_RANK()
function is similar to RANK()
, but it does not leave gaps in the ranking sequence when there are ties.
DENSE_RANK() OVER (PARTITION BY column_name ORDER BY column_name)
SELECT
employee_id,
department_id,
salary,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
FROM
employees;
Explanation: Here, employees are ranked by salary within their departments. Unlike RANK()
, DENSE_RANK()
does not skip any ranks, even if there are ties.
The NTILE()
function divides rows into a specified number of approximately equal groups and assigns a number to each row indicating its group.
NTILE(num_buckets) OVER (PARTITION BY column_name ORDER BY column_name)
SELECT
employee_id,
department_id,
salary,
NTILE(4) OVER (PARTITION BY department_id ORDER BY salary DESC) AS quartile
FROM
employees;
Explanation: In this example, employees within each department are divided into four quartiles based on their salary. The quartile
column indicates which quartile each employee falls into.
The LAG()
function allows you to access data from a previous row in the same result set without using a self-join.
LAG(column_name, offset, default) OVER (PARTITION BY column_name ORDER BY column_name)
SELECT
employee_id,
department_id,
salary,
LAG(salary, 1, 0) OVER (PARTITION BY department_id ORDER BY salary) AS previous_salary
FROM
employees;
Explanation: This example retrieves the salary from the previous row within the same department, ordered by salary. If there is no previous row, the default value (0) is used.
The LEAD()
function is similar to LAG()
, but it accesses data from a subsequent row.
LEAD(column_name, offset, default) OVER (PARTITION BY column_name ORDER BY column_name)
SELECT
employee_id,
department_id,
salary,
LEAD(salary, 1, 0) OVER (PARTITION BY department_id ORDER BY salary) AS next_salary
FROM
employees;
Explanation: This example retrieves the salary from the next row within the same department, ordered by salary. If there is no next row, the default value (0) is used.
The OVER()
clause defines the window (or set of rows) that the window function operates on. It can be used with any of the window functions mentioned above.
function() OVER (PARTITION BY column_name ORDER BY column_name)
SELECT
employee_id,
department_id,
salary,
SUM(salary) OVER (PARTITION BY department_id) AS total_salary
FROM
employees;
Explanation: This example calculates the total salary for each department. The SUM(salary) OVER (PARTITION BY department_id)
computes the sum of salaries within each department.
SQL window functions are incredibly versatile and can simplify complex queries that involve calculations across sets of rows. By understanding and using functions like ROW_NUMBER()
, RANK()
, DENSE_RANK()
, NTILE()
, LAG()
, LEAD()
, and OVER()
, you can perform sophisticated data analysis directly within your SQL queries. Start incorporating these powerful functions into your SQL toolkit to enhance your data manipulation capabilities.