← Back to Index

Chapter 5: Joins

JOIN, subqueries & UNION

1. Join Overview

Relational databases store data across multiple tables to avoid redundancy. JOINs let you combine rows from two or more tables based on a related column, reconstructing the complete picture at query time.

πŸ’‘ Why not store everything in one big table? Duplicated data wastes storage, makes updates error-prone, and violates normalization principles. JOINs give you the best of both worlds: normalized storage with flexible querying.

We'll use these sample tables throughout this chapter:

CREATE TABLE departments (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE employees (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    dept_id INT,
    manager_id INT,
    salary DECIMAL(10,2),
    FOREIGN KEY (dept_id) REFERENCES departments(id),
    FOREIGN KEY (manager_id) REFERENCES employees(id)
);

CREATE TABLE projects (
    id INT PRIMARY KEY AUTO_INCREMENT,
    title VARCHAR(200) NOT NULL,
    dept_id INT,
    FOREIGN KEY (dept_id) REFERENCES departments(id)
);

-- Sample data
INSERT INTO departments (id, name) VALUES
(1, 'Engineering'), (2, 'Marketing'), (3, 'Sales'), (4, 'HR');

INSERT INTO employees (id, name, dept_id, manager_id, salary) VALUES
(1, 'Alice',   1, NULL, 95000),
(2, 'Bob',     1, 1,    85000),
(3, 'Charlie', 2, 1,    72000),
(4, 'Diana',   2, 3,    68000),
(5, 'Eve',     NULL, 1, 78000),
(6, 'Frank',   3, NULL, 82000);

INSERT INTO projects (id, title, dept_id) VALUES
(1, 'Website Redesign', 1),
(2, 'Mobile App', 1),
(3, 'Ad Campaign', 2);

Note that Eve has no department (dept_id = NULL), and the HR department has no employees. These edge cases will help illustrate different join behaviors.

2. INNER JOIN

Returns only rows that have matching values in both tables. This is the most common join type.

Basic Syntax

SELECT e.name AS employee, d.name AS department
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;

Result (Eve is excluded because her dept_id is NULL, HR is excluded because no employee belongs to it):

employeedepartment
AliceEngineering
BobEngineering
CharlieMarketing
DianaMarketing
FrankSales

Multi-Table Join

You can chain multiple joins to connect three or more tables:

SELECT e.name AS employee, d.name AS department, p.title AS project
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id
INNER JOIN projects p ON d.id = p.dept_id;
employeedepartmentproject
AliceEngineeringWebsite Redesign
AliceEngineeringMobile App
BobEngineeringWebsite Redesign
BobEngineeringMobile App
CharlieMarketingAd Campaign
DianaMarketingAd Campaign

Table Aliases

Aliases (e, d, p) make queries shorter and more readable. They're especially important when a column name exists in multiple tables:

-- Without alias: ambiguous 'name' column
SELECT employees.name, departments.name
FROM employees
INNER JOIN departments ON employees.dept_id = departments.id;

-- With alias: clear and concise
SELECT e.name AS employee, d.name AS department
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id;

πŸ’‘ JOIN without a prefix keyword defaults to INNER JOIN. Writing INNER JOIN explicitly is recommended for clarity.

3. LEFT JOIN / RIGHT JOIN

Outer joins keep all rows from one side, filling in NULLs where there's no match on the other side.

LEFT JOIN

Returns all rows from the left table, with matching rows from the right. If no match, right-side columns are NULL.

SELECT e.name AS employee, d.name AS department
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id;
employeedepartment
AliceEngineering
BobEngineering
CharlieMarketing
DianaMarketing
EveNULL
FrankSales

RIGHT JOIN

Returns all rows from the right table, with matching rows from the left.

SELECT e.name AS employee, d.name AS department
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id;
employeedepartment
AliceEngineering
BobEngineering
CharlieMarketing
DianaMarketing
FrankSales
NULLHR

Finding Unmatched Rows

A common pattern: use a LEFT JOIN + WHERE ... IS NULL to find rows without a match.

-- Employees without a department
SELECT e.name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id
WHERE d.id IS NULL;

-- Departments with no employees
SELECT d.name
FROM departments d
LEFT JOIN employees e ON d.id = e.dept_id
WHERE e.id IS NULL;

πŸ”„ FULL OUTER JOIN:

  • β€’ PostgreSQL, SQL Server, and Oracle support FULL OUTER JOIN natively β€” returns all rows from both sides
  • β€’ MySQL does not support FULL OUTER JOIN. Emulate it with a UNION of LEFT and RIGHT joins:
SELECT e.name, d.name FROM employees e LEFT JOIN departments d ON e.dept_id = d.id
UNION
SELECT e.name, d.name FROM employees e RIGHT JOIN departments d ON e.dept_id = d.id;

4. CROSS JOIN

Returns the Cartesian product β€” every row from the left table combined with every row from the right table. If table A has M rows and table B has N rows, the result has M Γ— N rows.

-- Generate all possible employee-department combinations
SELECT e.name AS employee, d.name AS department
FROM employees e
CROSS JOIN departments d;

With 6 employees and 4 departments, this returns 24 rows.

Practical Use Cases

CROSS JOIN is useful for generating combinations:

-- Generate a calendar grid: all dates Γ— all time slots
SELECT d.date_val, t.slot_name
FROM (
    SELECT '2025-01-01' AS date_val
    UNION ALL SELECT '2025-01-02'
    UNION ALL SELECT '2025-01-03'
) d
CROSS JOIN (
    SELECT 'Morning' AS slot_name
    UNION ALL SELECT 'Afternoon'
    UNION ALL SELECT 'Evening'
) t;

πŸ’‘ Caution: CROSS JOIN can produce enormous result sets. A cross join of two tables with 1,000 rows each yields 1,000,000 rows. Always use it intentionally and on small datasets.

5. Self Join

A self join joins a table with itself. This is essential when rows in the same table have a hierarchical or peer relationship β€” for example, employees and their managers.

Employee-Manager Example

-- Find each employee's manager
SELECT
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;
employeemanager
AliceNULL
BobAlice
CharlieAlice
DianaCharlie
EveAlice
FrankNULL

Employees Who Earn More Than Their Manager

SELECT
    e.name AS employee,
    e.salary AS emp_salary,
    m.name AS manager,
    m.salary AS mgr_salary
FROM employees e
INNER JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary;

Finding Colleagues (Same Department)

SELECT
    e1.name AS employee1,
    e2.name AS employee2,
    d.name AS department
FROM employees e1
INNER JOIN employees e2 ON e1.dept_id = e2.dept_id AND e1.id < e2.id
INNER JOIN departments d ON e1.dept_id = d.id;

πŸ’‘ The condition e1.id < e2.id prevents duplicate pairs (Alice-Bob and Bob-Alice) and self-pairs (Alice-Alice).

6. Subqueries

A subquery is a SELECT statement nested inside another query. Subqueries can appear in the WHERE clause, FROM clause, or SELECT list.

Scalar Subquery

Returns a single value. Can be used anywhere a single value is expected.

-- Employees who earn above the average salary
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Show each employee's salary and the company average
SELECT
    name,
    salary,
    (SELECT AVG(salary) FROM employees) AS avg_salary,
    salary - (SELECT AVG(salary) FROM employees) AS diff
FROM employees;

IN Subquery

Returns a list of values. Use with the IN operator to filter rows matching any value in the list.

-- Employees in departments that have projects
SELECT name, dept_id
FROM employees
WHERE dept_id IN (SELECT DISTINCT dept_id FROM projects);

-- Employees NOT in any department with projects
SELECT name, dept_id
FROM employees
WHERE dept_id NOT IN (SELECT DISTINCT dept_id FROM projects WHERE dept_id IS NOT NULL);

πŸ’‘ Watch out for NULL with NOT IN: If the subquery returns any NULL value, NOT IN will return no rows at all. Always add WHERE column IS NOT NULL inside the subquery, or use NOT EXISTS instead.

EXISTS Subquery

Returns TRUE if the subquery returns at least one row. Often more efficient than IN for large datasets.

-- Departments that have at least one employee
SELECT d.name
FROM departments d
WHERE EXISTS (
    SELECT 1 FROM employees e WHERE e.dept_id = d.id
);

-- Departments that have NO employees
SELECT d.name
FROM departments d
WHERE NOT EXISTS (
    SELECT 1 FROM employees e WHERE e.dept_id = d.id
);

FROM Subquery (Derived Table)

A subquery in the FROM clause creates a temporary result set (derived table) that you can query like a regular table.

-- Department salary statistics
SELECT
    dept_summary.department,
    dept_summary.emp_count,
    dept_summary.avg_salary,
    dept_summary.max_salary
FROM (
    SELECT
        d.name AS department,
        COUNT(e.id) AS emp_count,
        ROUND(AVG(e.salary), 2) AS avg_salary,
        MAX(e.salary) AS max_salary
    FROM departments d
    INNER JOIN employees e ON d.id = e.dept_id
    GROUP BY d.id, d.name
) AS dept_summary
WHERE dept_summary.emp_count >= 2;

πŸ”„ Subquery vs JOIN performance:

  • β€’ In many cases, JOINs and subqueries produce the same results. The optimizer may rewrite one into the other
  • β€’ EXISTS is usually faster than IN for correlated subqueries on large tables
  • β€’ Derived tables (FROM subqueries) can sometimes be replaced with WITH (CTE) for better readability (MySQL 8.0+)
  • β€’ Always check with EXPLAIN to verify the actual execution plan

7. UNION & UNION ALL

UNION combines the result sets of two or more SELECT statements into a single result. The columns must match in number and be compatible in type.

UNION (Removes Duplicates)

-- All names from both employees and departments
SELECT name, 'employee' AS type FROM employees
UNION
SELECT name, 'department' AS type FROM departments;

UNION ALL (Keeps Duplicates)

-- Combine recent orders and archived orders
SELECT id, user_id, total_amount, created_at FROM orders
UNION ALL
SELECT id, user_id, total_amount, created_at FROM orders_archive;

πŸ’‘ UNION vs UNION ALL performance: UNION performs a de-duplication step (like DISTINCT), which requires sorting or hashing. If you know there are no duplicates, or you want to keep them, use UNION ALL β€” it's significantly faster.

ORDER BY with UNION

Place ORDER BY at the end to sort the combined result. It applies to the entire UNION, not individual SELECTs.

SELECT name, salary, 'high' AS tier
FROM employees WHERE salary >= 80000
UNION ALL
SELECT name, salary, 'standard' AS tier
FROM employees WHERE salary < 80000
ORDER BY salary DESC;

Practical Example: Search Across Tables

-- Search for "Engineering" across multiple tables
SELECT 'department' AS source, name AS result FROM departments WHERE name LIKE '%Engineering%'
UNION ALL
SELECT 'project' AS source, title AS result FROM projects WHERE title LIKE '%Engineering%'
UNION ALL
SELECT 'employee' AS source, name AS result FROM employees WHERE name LIKE '%Engineering%';

πŸ”„ UNION rules across databases:

  • β€’ UNION and UNION ALL work the same in MySQL, PostgreSQL, SQL Server, and SQLite
  • β€’ PostgreSQL and SQL Server also support INTERSECT (rows in both) and EXCEPT (rows in first but not second)
  • β€’ MySQL 8.0.31+ added support for INTERSECT and EXCEPT

8. Chapter Summary

Join Types

  • β€’ INNER JOIN β€” only matching rows from both tables
  • β€’ LEFT JOIN β€” all left rows + matched right rows
  • β€’ RIGHT JOIN β€” all right rows + matched left rows
  • β€’ CROSS JOIN β€” Cartesian product (M Γ— N rows)
  • β€’ Self Join β€” a table joined with itself

Subqueries

  • β€’ Scalar β€” returns a single value
  • β€’ IN / NOT IN β€” returns a list of values
  • β€’ EXISTS / NOT EXISTS β€” checks for row existence
  • β€’ Derived Table β€” subquery in FROM clause

UNION Operations

  • β€’ UNION β€” combine + remove duplicates
  • β€’ UNION ALL β€” combine + keep duplicates (faster)
  • β€’ Columns must match in number and type
  • β€’ ORDER BY applies to the final combined result

Key Takeaways

  • β€’ Use INNER JOIN for strict matches
  • β€’ Use LEFT JOIN to preserve all left-side rows
  • β€’ LEFT JOIN + IS NULL finds orphan records
  • β€’ Prefer EXISTS over IN for correlated checks
  • β€’ Use UNION ALL when duplicates aren't an issue
Previous
Ch4: Table Design
πŸ“š
Next
Ch6: Aggregation