A famous person once said: Oracle’s analytic SQL functions rock, Oracle’s analytic SQL functions roll. I totally agree. However, I would like to add to this that they rock and roll on a much faster beat if you know more about them. To explain what I mean, let’s take an example:
HR@XE> create table t (n number)
2 /
Table created.
HR@XE> insert into t (n) values (100)
2 /
1 row created.
HR@XE> insert into t (n) values (200)
2 /
1 row created.
HR@XE> insert into t (n) values (300)
2 /
1 row created.
HR@XE> insert into t (n) values (300)
2 /
1 row created.
HR@XE> commit
2 /
Commit complete.
HR@XE> begin
2 dbms_stats.gather_table_stats('HR','T');
3 end;
4 /
PL/SQL procedure successfully completed.
HR@XE> select n from t order by n
2 /
N
----------
100
200
300
300
Now, I want to return the most frequent value of N (or, what’s called in statistics: mode). In the example above, it is 300. The following query, using the analytic function dense_rank, will do that for us:
HR@XE> select n
2 from
3 (select n,
4 dense_rank() over(
5 order by cnt desc) as
6 rnk
7 from
8 (select n,
9 count(*) as
10 cnt
11 from t
12 group by n)
13 x)
14 y
15 where rnk = 1
16 /
N
----------
300
Execution Plan
----------------------------------------------------------
Plan hash value: 4148838322
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3 | 78 | 5 (40)| 00:00:01 |
|* 1 | VIEW | | 3 | 78 | 5 (40)| 00:00:01 |
|* 2 | WINDOW SORT PUSHED RANK| | 3 | 9 | 5 (40)| 00:00:01 |
| 3 | HASH GROUP BY | | 3 | 9 | 5 (40)| 00:00:01 |
| 4 | TABLE ACCESS FULL | T | 4 | 12 | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("RNK"=1)
2 - filter(DENSE_RANK() OVER ( ORDER BY COUNT(*) DESC )<=1)
Now, did you know that you could tune the above query by using an additional aggregate function called: FIRST.
KEEP … FIRST/LAST aggregate functions allow you to return the result of an aggregate applied over a set of rows that rank as the first or last with respect to a given order specification.
I will rewrite the above query like this:
HR@XE> select max(n) keep(dense_rank first
2 order by cnt desc) n
3 from
4 (select n,
5 count(*) cnt
6 from t
7 group by n)
8 /
N
----------
300
Execution Plan
----------------------------------------------------------
Plan hash value: 548494858
-----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 26 | 4 (25)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 26 | | |
| 2 | VIEW | | 3 | 78 | 4 (25)| 00:00:01 |
| 3 | HASH GROUP BY | | 3 | 9 | 4 (25)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T | 4 | 12 | 3 (0)| 00:00:01 |
-----------------------------------------------------------------------------
So, getting rid of the subquery and replacing it with KEEP… FIRST made a difference (as you can see from the Execution Plan), and gave us the same result. Sometimes, reading, or just quickly scanning, the documentation or other technical books and websites, is a good thing. It opens your eyes to some hidden, less used, options or commands or functions… that can come in handy in many situations like the one I descibed above.
Related articles:
Tagged analytic, function, sql | Comments Closed | Trackbacks Closed

















The two example queries will not return the same results in general.
If there are 2 values of n in the table with equal maximum frequencies (i.e. two modes), then the first query will return both top ranked values. The second query will return the maximum of the two top-ranked values.
Also, see:
select stats_mode(n) from t;
… which will return strictly one of the modes, although we can’t predict which one.
February 2nd, 2006, at 7:51 am #Anthony, you’re right. That’s an important distinction. So, to test, I added the following two rows to t:
Then, I ran the first query:
I got:
N ---------- 400 300When I ran the second query:
I got:
N ---------- 400Which proves that KEEP…FIRST will only keep one value: the greatest number.
Of course, you could change to first query to select the maximum:
which effectively makes it return the same result as the one with KEEP in this example.
However, testing stats_mode:
Returned:
STATS_MODE(N) ------------- 300stats_mode does not work in 9i or below.
Thanks for this clarification.
February 2nd, 2006, at 12:25 pm #