Everything about Analytic Functions in Advanced SQL queries

The Analytic Functions have been implemented to leverage SQL queries and make them more useful and powerful. These functions are calculated over sets of rows and then return the result in the current row. These functions could be an aggregate function like COUNT, MIN, MAX, AVG, SUM, or a pure one like DENSE_RANK.

OVER

The inseparable part of the analytic function is OVER. This is a mandatory part of using these function if they get () (open close parenthesis) in front of OVER means the analytic function indicates the widest possible scope.

select
    department_id,
    sum(salary) over() "Total"
from hr.employees
where department_id in (10,20,30)
 
  

The total SUM of departments 10,20,30 is returned next to each department.

Note the SUM function is used here without the GROUP BY clause. Also, it is possible to use KEEP clause to fetch out more columns.








select
    department_id,
    salary,
    min(last_name) keep (dense_rank first order by salary) over(),
    min(salary) over() "Total"
from hr.employees
where department_id in (10,20,30)
 
  

PARTITION BY

This part of the Analytic function acts as same as GROUP BY clause to group rows.

select
    last_name,
    salary,
    department_id,
    salary,
    max(salary) over (partition by department_id),
    JOB_ID,
    max(salary) over (partition by job_id)
from hr.employees
where department_id in (10,20,30)
 
  

Raphaely is the PU_MAN of department 30 and he has the best paid in this department.

LEAD, LAG, and ROW_NUMBER

These analytic functions are the member of the ranking family and in order to use them note, do not forget using ORDER BY:

select
    last_name,
    hire_date,
    row_number() over(order by hire_date),
    lag(last_name,1,'-') over(order by hire_date) ,
    lead(last_name,1,'-') over(order by hire_date)
from hr.employees
where department_id in (10,20,30,40,50)
 
  

ROW_NUMBER returns a row number in the specific order. LAG returns the previous nth row based on what number you passed as the second input argument, and LEAD returns next nth rows based on what ever you passed as the second input argument. here for both case we passed 1 as the second argument, then in the first row LAG returns null but if you determined the third argument that what you need instead of null and it will return the third argument, here '-' will return, and LEAD returns next value for LAST_NAME in the next record in the current record and so on and so forth.

select
    last_name,
    salary,
    lag(last_name,1,'-') over(order by salary) ,
    lag(last_name,2,'-') over(order by salary) ,
    lag(last_name,3,'-') over(order by salary)
from hr.employees
where department_id in (10,20,30,40)
 
  

RANK and DENSE_RANK

select
    last_name,
    department_id,
    row_number() over(order by department_id) "Row_Number" ,
    rank() over(order by department_id) "Rank" ,
    dense_rank() over(order by department_id) "Dense_Rank"
from hr.employees
where department_id in (10,20,30,40)

The function ROW_NUMBER always returns unique numbers based on OVER() clause. Both RANK and DENSE_RANK return duplicate numbers for employees with the same DEPARTMENT_ID. the difference between them is DENSE_RANK that it does not skip any numbers.

Note that you can not use Analytic Functions in where clause, only in the SELECT and ORDER BY clause.

select
last_name,
department_id,
row_number() over(partition by department_id order by department_id) "Row_Number" ,
rank() over(partition by department_id order by department_id) "Rank" ,
dense_rank() over(partition by department_id order by department_id) "Dense_Rank"
from hr.employees
where department_id in (10,20,30,40)
order by department_id,"Dense_Rank"
 
  

WINDOW

When you use ROWS BETWEEN clause in an analytic function, indeed you make a window on the query, it is one of the most powerful features in the SQL queries:

select
    last_name,
    hire_date,
    salary,
    max(salary) over(order by hire_date,last_name rows between 
                        unbounded preceding and 1 preceding) max_before,
    max(salary) over(order by hire_date,last_name rows between 1 
                             following and unbounded following) max_after
from hr.employees
where department_id in (10,20,30,40)
order by hire_date,last_name
 
  

On each row, the maximum salary before the current row and the maximum salary after the current row are demonstrated.

When SUM is used with ROWS BETWEEN we can produce cumulative totals.

select
    last_name,
    salary,
    sum(salary) over(order by last_name rows between unbounded preceding 
                                                and current row) Cum_sum
from hr.employees
where department_id in (10,20,30,40)
 
  

The lowest possible bound is UNBOUNDED PRECEDING, the current row is CURRENT ROW and the highest possible row is UNBOUNDED FOLLOWING.

Rows preceding and following the current row are retrieved with "n" PRECEDING and "n" FOLLOWING where "n" is the relative position of the current row.

Note when not specifying BETWEEN, the window implicitly ends at the CURRENT ROW.

select
    last_name,
    salary,
    sum(salary) over(order by Salary rows 1 preceding) Cum_sum
from hr.employees
where department_id in (10,20,30,40)
 
  

The SUM is calculated for rows between the previous and the current row. The window start at the position immediately preceding the current row and the current row.

RANGE is similar to ROWS but the intervals are not a number of rows. They are either numeric or DATE values.

select
    last_name,
    salary,
    salary*.9 LOW,
    salary*1.1 HIGH,
    count(*) over(order by Salary range between salary*.1 preceding 
                  and salary*.1 following) cnt
from hr.employees
where department_id in (10,20,30,40)
 
  

On the first row; SALARY is 2500, low bound is 2250 and upper bound is 2750 then COUNT should calculate how many records exist between this range so also value 2600 in the next row could be in this range then the result is 2, for next row SALARY is 2600 and low bound is 2340 and upper bound is 2860, how many records are exist in this range; there are values 2500,2600,2800 so the COUNT returns 3, so on and so forth.

It is very important to understand that if two rows get the same value, they are both either included or excluded from the window:

select
    ename,
    sal,
    sum(sal) over(order by sal rows  unbounded preceding) sum_rows,
    sum(sal) over(order by sal range unbounded preceding) sum_ranges
from scott.emp
 
  

Until the third row, everything is going equally in both column SUM_ROWS and SUM_RANGES but in the fourth row, SALARY is 1250 and also in fifth-row SALARY is 1250 then RANGES include both of them in the window and calculate 5350 but ROWS calculate 4100.

ROWS CURRENT ROW points to exactly one row; RANGE CURRENT ROW points to all row where the sort key is equal to the current row.

With dates and TIMESTAMPs, the interval could be a number of days, a day-to-seconds interval or a year-to-month interval.

select
    last_name,
    hire_date,
    salary,
    
    avg(salary) over(order by trunc(hire_date,'MM') 
                range between interval '1' month preceding 
                and 
                interval '1' month preceding) "previous",
    
    avg(salary) over(order by trunc(hire_date,'MM') 
                range current row) "current",
    
    avg(salary) over(order by trunc(hire_date,'MM') 
                range between interval '1' month following 
                and 
                interval '1' month following) "next",
    
    avg(salary) over(order by trunc(hire_date,'MM') 
                range between interval '1' month preceding 
                and 
                interval '1' month following) "3months"
from hr.employees
order by hire_date
 
  

The sort key is the month of hire date, The previous column evaluates the average salary for the employees hired in the month before the current employee was hired, the current column includes the average of employees hired in the same month of the current row and the next column relates to the employees hired in the month after the hire date of the current employee.

FIRST_VALUE and LAST_VALUE

The discrete bounds of the current window are returned by the FIRST_VALUE and LAST_VALUE:

select
    last_name,
    hire_date,
    
    first_value(last_name || '(' || hire_date || ')') 
    over (order by hire_date 
          range between 30 preceding and 30 following) "first",
    
    last_value(last_name || '(' || hire_date || ')') 
    over (order by hire_date 
          range between 30 preceding and 30 following) "last"
from hr.employees
order by hire_date
 
  

The window starts 30 days before the current hire date and ends 30 days after the current hire date, and FIRST_VALUE returns the first value in this window and LAST_VALUE returns the last value.

in 10g and later the IGNORE NULLS clause returns the first and last non-null values.

select
    ename,
    sal,
    comm,
    first_value(comm ignore nulls) 
               over (order by sal 
               rows between 1 following and unbounded following) "next",
    
    last_value(comm ignore nulls) 
               over (order by sal 
               rows between unbounded preceding and 1 preceding) "previous"
from scott.emp
 
  

Using Analytic Functions with GROUP BY clause

Analytics can be used with aggregation, When combining analytics with aggregation, all expression in the ORDER BY clause and column expression have to be either part of the GROUP BY expressions or aggregate functions.

select
    row_number() over(order by deptno) "Row_Number",
    deptno,
    sum(sal),
    to_char(100*ratio_to_report(sum(sal)) over(),'990.00L','NLS_CURRENCY=%') pct
from scott.emp
group by deptno
 
  

RATIO_TO_REPORT returns the percentage of the total salary of the department to the overall total.

To view or add a comment, sign in

More articles by Arash Atarzadeh

  • Generate Aggregate Functions in PLSQL

    When you work with buit-in aggregate and analytic functions, you must think in terms of SETs. The reason is very…

  • SQLJ in Oracle Database PL/SQL

    Create a Type : CREATE OR REPLACE TYPE SimpleStringArrayType AS TABLE OF VARCHAR2(500) 2. Create a sample JAVA code…

    1 Comment
  • The Oracle Database XQUERY In Action

    In Oracle 10gR2 and later, XQuery extends the XML capabilities. XQuery is defined by the World Wide Web Consortium at…

  • To Improve PIVOT queries in ORACLE Database 11gR2 Via Oracle Database Data Cartridge

    Oracle unveiled one of its most powerful functions in the field of backend database programming revolutionary for using…

    3 Comments
  • Advanced SQL Queries Tips and Tricks

    The Article is intended to clear all dark sides of Queries species and it begins with JOINS and will end with Hierarchy…

    1 Comment
  • Do not underestimate Conversion functions in ORACLE SQL

    SQL conversion functions are designed to transform the quiddity of the data type of a column value, expression or…

  • The Secrets of Oracle SQL Functions

    In the beginning of this discussion, I would like to mention there are three Table types in SYS schema available when…

    4 Comments
  • Advanced SQL Aggregation Topics

    I'm sure all of you are familiar with the concepts and structures of aggregation functions is SQL, so in the following,…

  • SQL For Modeling

    To describe Model in SQL lets start with one simple description says a model is a multidimensional array created with…

Insights from the community

Others also viewed

Explore topics