A self join is a join of a table to itself. This table appears twice (or more) in the FROM clause and is followed by table aliases that qualify column names in the join condition and the SELECT clause. Take for example the employees table, the manager of one employee is also an employee. The rows for both are in the same employees table.
To get information about an employee and her manager, you have to join the employee table to itself, thereby treating employees as if it were two separate tables. The following example query uses a self join to return the name of each employee along with the name of the employee’s manager:
SELECT e.last_name employee, m.last_name manager
FROM employees e INNER JOIN employees m
ON e.manager_id = m.employee_id;
EMPLOYEE MANAGER
------------------------- -------------------------
Kochhar King
De Haan King
Hunold De Haan
Ernst Hunold
Austin Hunold
Pataballa Hunold
...
106 row(s) retrieved
But
SELECT COUNT (*)
FROM employees;
COUNT(*)
--------------------------------------
107
Even though the employees table has 107 rows, the previous query returned only 106 rows. This is because there is an employee without a manager_id. Oracle excludes that employee’s row from the result set while performing the self inner join. To include that employee without a manager, you need an outer join:
SELECT e.last_name employee, m.last_name manager
FROM employees e LEFT OUTER JOIN employees m
ON e.manager_id = m.employee_id;
EMPLOYEE MANAGER
------------------------- -------------------------
King
Kochhar King
De Haan King
Hunold De Haan
Ernst Hunold
Austin Hunold
Pataballa Hunold
...
107 row(s) retrieved
The previous examples showed self equijoins. However, there are situations when you need to perform self non-equijoins. For example, let’s assume that you are in charge of organizing an interdepartmental soccer competition for the following departments:
SELECT department_name
FROM departments
WHERE department_id IN (10, 20, 30, 40);
DEPARTMENT_NAME
------------------------------
Administration
Marketing
Purchasing
Human Resources
4 row(s) retrieved
You decide that each department plays against the other three departments only once, which means that the output of the query you are going to write cannot contain one combination Administration, Marketing and another Marketing, Administration because that means that each department plays against the others twice. The solution is to use a self non-equijoin query like this:
SELECT d1.department_name dept1, d2.department_name dept2
FROM departments d1 INNER JOIN departments d2
ON d1.department_id < d2.department_id
WHERE d1.department_id IN (10, 20, 30, 40)
AND d2.department_id IN (10, 20, 30, 40);
DEPT1 DEPT2
------------------------------ ------------------------------
Administration Marketing
Administration Purchasing
Administration Human Resources
Marketing Purchasing
Marketing Human Resources
Purchasing Human Resources
6 row(s) retrieved
That’s it for self joins
Sources:
Possibly related:
- SQL Joins as Seen on a Diagram
- Back to basics: cross joins
- Back to basics: anti-joins and semi-joins
- Give Me One Minute And I’ll Tell You If You Are Liberal Or Conservative
- Back to basics: inner joins
Tagged concepts, join | Post a Comment


















What is the difference between using JOIN/ON and just using WHERE?
For example, take your second query, and instead make it
FROM employee e, employee m WHERE e.manager_id(+) = m.employee_id
I’ve never used the JOIN/ON syntax. Does it have an advantage?
July 11th, 2006, at 3:25 pm #Hi Rob,
The difference between using JOIN/ON and just using WHERE is that the first uses the ANSI/ISO SQL92 standard syntax, and the second uses the good old Oracle syntax.
I refer you to this post for some discussion about the difference between the two and the advantages/disadvantages of using the ANSI SQL syntax (make sure you read the comments).
By the way,
translates to
The (+) operator following m.employee_id means that you want to display a row from the m table, even though there exists no corresponding row in the e table, and that’s what the LEFT OUTER JOIN above means.
July 11th, 2006, at 6:45 pm #The (+) operator following m.employee_id means that you want to display a row from the m table, even though there exists no corresponding row in the e table
Eddie, i think it must be the other way around. This code gets all rows from e, not m. This (+) is usually confusing to many people, it makes the sql get all the rows from the table on the other side.
July 12th, 2006, at 2:12 am #Just a bit off topic.
July 12th, 2006, at 7:27 am #I clicked on the Spain’s flag on top of the screen and the text it showed make no sense at all!
The damn thing even translate the code so it is a bit futile to run that translation.
Yas, you’re right, the corrected statement that describes
WHERE e.manager_id = m.employee_id (+)
is:
The (+) operator following m.employee_id means that you want to display a row from the e table, even though there exists no corresponding row in the m table.
Thanks for catching it.
July 12th, 2006, at 9:46 am #As stated by Eddie:
May 31st, 2007, at 12:36 am #WHERE d1.department_id IN (10, 20, 30, 40)
AND d2.department_id IN (10, 20, 30, 40);
What if the department_ids are around 100 in number, would it not be a roblem to enter all department_ids in the query. Using Oracle I can solve it, however using ANSI do we have to use a subquery there as in Oracle or is there a seperate way. Please let me know.