msgbartop
News, views, tips and tricks on Oracle and other fun stuff
msgbarbottom

Back to basics: anti-joins and semi-joins

I almost forgot that the Oracle Joins series is not complete yet. So here is the last episode, a quick and easy anti-join and semi-join refresher.

Anti-joins:
Anti-joins are written using the NOT EXISTS or NOT IN constructs. An anti-join between two tables returns rows from the first table for which there are no corresponding rows in the second table. In other words, it returns rows that fail to match the sub-query on the right side.

Suppose you want a list of departments with no employees. You could write a query like this:

SELECT   d.department_name
    FROM departments d
MINUS
SELECT   d.department_name
    FROM departments d, employees e
   WHERE d.department_id = e.department_id
ORDER BY department_name;

The above query will give the desired results, but it might be clearer to write the query using an anti-join:

SELECT   d.department_name
    FROM departments d
   WHERE NOT EXISTS (SELECT NULL
                       FROM employees e
                      WHERE e.department_id = d.department_id)
ORDER BY d.department_name;

Semi-joins:
Semi-joins are written using the EXISTS or IN constructs. A semi-join between two tables returns rows from the first table where one or more matches are found in the second table. The difference between a semi-join and a conventional join is that rows in the first table will be returned at most once.

Suppose you want a list of departments with at least one employee. You could write the query like this:

SELECT   d.department_name
    FROM departments d, employees e
   WHERE d.department_id = e.department_id
ORDER BY department_name;

The department name in the query result will appear as many times as the number of employees in it. So, for example if a department has 30 employees then that department will appear in the query output 30 times.

To eliminate the duplicate rows, you could use the DISTINCT or GROUP BY keywords. A more elegant solution is to use a semi-join between the departments and employees tables instead of a conventional join:

SELECT   d.department_name
    FROM departments d
   WHERE EXISTS (SELECT NULL
                   FROM employees e
                  WHERE e.department_id = d.department_id)
ORDER BY d.department_name;

The above query will list the departments that have at least one employee. The department will appear only once in the query output no matter how many employees it has.

Sources and Resources:


Filed in Joins, Oracle on 01 May 07 | Tags: ,


Reader's Comments

  1. |

    A great set of articles on joins, however, there is a big GOTCHA! waiting for Oracle developers adopting ANSI joins:

    If you are writing queries that select from many tables, like denormalizing data for warehousing, and the sum of the columns in those tables exceeds 1,050, you’ll always get ORA-01445.

    This occurs from 9i through 10g2, and has happened often enough to me that I avoid my preference for ANSI syntax on Oracle databases, unless I’m just selecting from a few tables.

    Just something to watch out for.

    thanks again for the articles!

  2. |

    Thanks for this tip Eric. Very interesting and good to know. One would think that “ANSI SQL” and “Oracle SQL” should behave the same in Oracle. So, This gotcha looks like an Oracle bug to me.