msgbartop
News, views, tips and tricks on Oracle and other fun stuff
msgbarbottom

Back to basics: self joins

A self join is a join of a table to itself. This table appears twice (or more) in the FROM clause and is followed by table aliases that qualify column names in the join condition and the SELECT clause. Take for example the employees table, the manager of one employee is also an employee. The rows for both are in the same employees table.

To get information about an employee and her manager, you have to join the employee table to itself, thereby treating employees as if it were two separate tables. The following example query uses a self join to return the name of each employee along with the name of the employee’s manager:

SELECT e.last_name employee, m.last_name manager
  FROM employees e INNER JOIN employees m
       ON e.manager_id = m.employee_id;


         EMPLOYEE                  MANAGER          
------------------------- ------------------------- 
Kochhar                   King                      
De Haan                   King                      
Hunold                    De Haan                   
Ernst                     Hunold                    
Austin                    Hunold                    
Pataballa                 Hunold                    

...                   

106 row(s) retrieved

But

SELECT COUNT (*)
  FROM employees;

               COUNT(*)                
-------------------------------------- 
                                   107 

Even though the employees table has 107 rows, the previous query returned only 106 rows. This is because there is an employee without a manager_id. Oracle excludes that employee’s row from the result set while performing the self inner join. To include that employee without a manager, you need an outer join:

SELECT e.last_name employee, m.last_name manager
  FROM employees e LEFT OUTER JOIN employees m
       ON e.manager_id = m.employee_id;


         EMPLOYEE                  MANAGER          
------------------------- ------------------------- 
King                     
Kochhar                   King                      
De Haan                   King                      
Hunold                    De Haan                   
Ernst                     Hunold                    
Austin                    Hunold                    
Pataballa                 Hunold                    

...                   

107 row(s) retrieved

The previous examples showed self equijoins. However, there are situations when you need to perform self non-equijoins. For example, let’s assume that you are in charge of organizing an interdepartmental soccer competition for the following departments:

SELECT department_name
  FROM departments
 WHERE department_id IN (10, 20, 30, 40);

        DEPARTMENT_NAME        
------------------------------ 
Administration                 
Marketing                      
Purchasing                     
Human Resources                

4 row(s) retrieved

You decide that each department plays against the other three departments only once, which means that the output of the query you are going to write cannot contain one combination Administration, Marketing and another Marketing, Administration because that means that each department plays against the others twice. The solution is to use a self non-equijoin query like this:

SELECT d1.department_name dept1, d2.department_name dept2
  FROM departments d1 INNER JOIN departments d2
       ON d1.department_id < d2.department_id
 WHERE d1.department_id IN (10, 20, 30, 40)
   AND d2.department_id IN (10, 20, 30, 40);

             DEPT1                          DEPT2             
------------------------------ ------------------------------ 
Administration                 Marketing                      
Administration                 Purchasing                     
Administration                 Human Resources                
Marketing                      Purchasing                     
Marketing                      Human Resources                
Purchasing                     Human Resources                

6 row(s) retrieved

That’s it for self joins :)

Sources:


Filed in Joins, Oracle on 11 Jul 06 | Tags: ,


Reader's Comments

  1. |

    What is the difference between using JOIN/ON and just using WHERE?

    For example, take your second query, and instead make it FROM employee e, employee m WHERE e.manager_id(+) = m.employee_id

    I’ve never used the JOIN/ON syntax. Does it have an advantage?

  2. |

    Hi Rob,

    The difference between using JOIN/ON and just using WHERE is that the first uses the ANSI/ISO SQL92 standard syntax, and the second uses the good old Oracle syntax.

    I refer you to this post for some discussion about the difference between the two and the advantages/disadvantages of using the ANSI SQL syntax (make sure you read the comments).

    By the way,

    FROM employees e LEFT OUTER JOIN employees m
       ON e.manager_id = m.employee_id
    

    translates to

        FROM employees e, employees m
     WHERE e.manager_id = m.employee_id (+)
    

    The (+) operator following m.employee_id means that you want to display a row from the m table, even though there exists no corresponding row in the e table, and that’s what the LEFT OUTER JOIN above means.

  3. |

    The (+) operator following m.employee_id means that you want to display a row from the m table, even though there exists no corresponding row in the e table

    Eddie, i think it must be the other way around. This code gets all rows from e, not m. This (+) is usually confusing to many people, it makes the sql get all the rows from the table on the other side.

  4. |

    Just a bit off topic. I clicked on the Spain’s flag on top of the screen and the text it showed make no sense at all! The damn thing even translate the code so it is a bit futile to run that translation.

  5. |

    Yas, you’re right, the corrected statement that describes

    WHERE e.manager_id = m.employee_id (+)

    is:

    The (+) operator following m.employee_id means that you want to display a row from the e table, even though there exists no corresponding row in the m table.

    Thanks for catching it.

  6. |

    As stated by Eddie: WHERE d1.department_id IN (10, 20, 30, 40) AND d2.department_id IN (10, 20, 30, 40); What if the department_ids are around 100 in number, would it not be a roblem to enter all department_ids in the query. Using Oracle I can solve it, however using ANSI do we have to use a subquery there as in Oracle or is there a seperate way. Please let me know.