msgbartop
News, views, tips and tricks on Oracle and other fun stuff
msgbarbottom

Keep first, Keep last

A famous person once said: Oracle’s analytic SQL functions rock, Oracle’s analytic SQL functions roll. I totally agree. However, I would like to add to this that they rock and roll on a much faster beat if you know more about them. To explain what I mean, let’s take an example:

HR@XE> create table t (n number)
  2  /

Table created.

HR@XE> insert into t (n) values (100)
  2  /

1 row created.

HR@XE> insert into t (n) values (200)
  2  /

1 row created.

HR@XE> insert into t (n) values (300)
  2  /

1 row created.

HR@XE> insert into t (n) values (300)
  2  /

1 row created.

HR@XE> commit
  2  /

Commit complete.

HR@XE> begin
  2    dbms_stats.gather_table_stats('HR','T');
  3  end;
  4  /

PL/SQL procedure successfully completed.

HR@XE> select n from t order by n
  2  /

         N
----------
       100
       200
       300
       300

Now, I want to return the most frequent value of N (or, what’s called in statistics: mode). In the example above, it is 300. The following query, using the analytic function dense_rank, will do that for us:

HR@XE> select n
  2  from
  3    (select n,
  4       dense_rank() over(
  5     order by cnt desc) as
  6    rnk
  7     from
  8      (select n,
  9         count(*) as
 10      cnt
 11       from t
 12       group by n)
 13    x)
 14  y
 15  where rnk = 1
 16  /

         N
----------
       300


Execution Plan
----------------------------------------------------------
Plan hash value: 4148838322

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |     3 |    78 |     5  (40)| 00:00:01 |
|*  1 |  VIEW                    |      |     3 |    78 |     5  (40)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |     3 |     9 |     5  (40)| 00:00:01 |
|   3 |    HASH GROUP BY         |      |     3 |     9 |     5  (40)| 00:00:01 |
|   4 |     TABLE ACCESS FULL    | T    |     4 |    12 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNK"=1)
   2 - filter(DENSE_RANK() OVER ( ORDER BY COUNT(*) DESC )<=1)

Now, did you know that you could tune the above query by using an additional aggregate function called: FIRST.

KEEP … FIRST/LAST aggregate functions allow you to return the result of an aggregate applied over a set of rows that rank as the first or last with respect to a given order specification.

I will rewrite the above query like this:

HR@XE> select max(n) keep(dense_rank first
  2  order by cnt desc) n
  3  from
  4    (select n,
  5       count(*) cnt
  6     from t
  7     group by n)
  8  /

         N
----------
       300


Execution Plan
----------------------------------------------------------
Plan hash value: 548494858

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     1 |    26 |     4  (25)| 00:00:01 |
|   1 |  SORT AGGREGATE      |      |     1 |    26 |            |          |
|   2 |   VIEW               |      |     3 |    78 |     4  (25)| 00:00:01 |
|   3 |    HASH GROUP BY     |      |     3 |     9 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T    |     4 |    12 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

So, getting rid of the subquery and replacing it with KEEP… FIRST made a difference (as you can see from the Execution Plan), and gave us the same result. Sometimes, reading, or just quickly scanning, the documentation or other technical books and websites, is a good thing. It opens your eyes to some hidden, less used, options or commands or functions… that can come in handy in many situations like the one I descibed above.


Filed in Oracle, Tips on 02 Feb 06 | Tags: , ,


Reader's Comments

  1. |

    The two example queries will not return the same results in general.

    If there are 2 values of n in the table with equal maximum frequencies (i.e. two modes), then the first query will return both top ranked values. The second query will return the maximum of the two top-ranked values.

    Also, see:

    select stats_mode(n) from t;

    … which will return strictly one of the modes, although we can’t predict which one.

  2. |

    Anthony, you’re right. That’s an important distinction. So, to test, I added the following two rows to t:

    insert into t (n) values (400)
    /
    insert into t (n) values (400)
    /
    

    Then, I ran the first query:

    HR@XE> select n
      2  from
      3    (select n,
      4       dense_rank() over(
      5     order by cnt desc) as
      6    rnk
      7     from
      8      (select n,
      9         count(*) as
     10     cnt
     11      from t
     12      group by n)
     13   x)
     14  y
     15  where rnk = 1
     16  /
    

    I got:

             N
    ----------
           400
           300
    

    When I ran the second query:

    HR@XE> select max(n) keep(dense_rank first
      2  order by cnt desc) n
      3  from
      4    (select n,
      5       count(*) cnt
      6     from t
      7     group by n)
      8  /
    

    I got:

             N
    ----------
           400
    

    Which proves that KEEP…FIRST will only keep one value: the greatest number.

    Of course, you could change to first query to select the maximum:

    select max(n) from ( ...first query ...)
    

    which effectively makes it return the same result as the one with KEEP in this example.

    However, testing stats_mode:

    HR@XE> select stats_mode(n) from t;
    

    Returned:

    STATS_MODE(N)
    -------------
              300
    

    stats_mode does not work in 9i or below.

    Thanks for this clarification.