1. Aggregate Functions
Aggregate functions compute a single result from a set of rows. They are the foundation of data analysis in SQL. MySQL provides five core aggregate functions:
| Function | Description | NULLs |
|---|---|---|
COUNT() | Counts number of rows | See below |
SUM() | Totals numeric values | Ignored |
AVG() | Averages numeric values | Ignored |
MIN() | Finds minimum value | Ignored |
MAX() | Finds maximum value | Ignored |
Let's work with a sample orders table:
CREATE TABLE orders (
id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT NOT NULL,
product VARCHAR(100),
amount DECIMAL(10,2),
quantity INT,
order_date DATE
);
INSERT INTO orders (customer_id, product, amount, quantity, order_date) VALUES
(1, 'Laptop', 999.99, 1, '2024-01-15'),
(2, 'Mouse', 29.99, 3, '2024-01-16'),
(1, 'Keyboard', 79.99, 1, '2024-01-20'),
(3, 'Monitor', 349.99, 2, '2024-02-01'),
(2, 'Webcam', 59.99, 1, '2024-02-05'),
(1, 'Headset', 89.99, 1, '2024-02-10'),
(3, 'USB Hub', NULL, 1, '2024-02-15'),
(4, 'SSD', 129.99, 2, '2024-03-01'),
(2, 'RAM', 64.99, 4, '2024-03-10'),
(1, 'Cable', 9.99, 5, '2024-03-15');
Basic Usage
-- Count all rows
SELECT COUNT(*) AS total_orders FROM orders;
-- Result: 10
-- Sum of all amounts (NULLs ignored)
SELECT SUM(amount) AS total_revenue FROM orders;
-- Average order amount
SELECT AVG(amount) AS avg_order FROM orders;
-- Cheapest and most expensive products
SELECT MIN(amount) AS cheapest, MAX(amount) AS most_expensive FROM orders;
-- Combine multiple aggregates
SELECT
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue,
AVG(amount) AS avg_amount,
MIN(order_date) AS first_order,
MAX(order_date) AS last_order
FROM orders;
COUNT(*) vs COUNT(column)
Key Point: COUNT(*) vs COUNT(column)
COUNT(*) counts all rows, including those with NULL values. COUNT(column) counts only non-NULL values in that column. This distinction is critical when your data contains NULLs.
-- COUNT(*) counts all rows: 10
SELECT COUNT(*) AS all_rows FROM orders;
-- COUNT(amount) ignores the NULL row: 9
SELECT COUNT(amount) AS non_null_amounts FROM orders;
-- COUNT(DISTINCT ...) counts unique values
SELECT COUNT(DISTINCT customer_id) AS unique_customers FROM orders;
-- Result: 4
Using DISTINCT Inside Aggregates
-- Total quantity of distinct products ordered
SELECT COUNT(DISTINCT product) AS unique_products FROM orders;
-- Sum only distinct amounts (rarely used, but valid)
SELECT SUM(DISTINCT amount) AS sum_distinct FROM orders;
2. GROUP BY
GROUP BY divides rows into groups. Each group is then processed by an aggregate function to produce one output row per group.
Basic Grouping
-- Orders per customer
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id;
-- Result:
-- customer_id | order_count | total_spent
-- 1 | 4 | 1179.96
-- 2 | 3 | 154.97
-- 3 | 2 | 349.99
-- 4 | 1 | 129.99
Grouping by Multiple Columns
-- Monthly orders per customer
SELECT
customer_id,
DATE_FORMAT(order_date, '%Y-%m') AS month,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id, DATE_FORMAT(order_date, '%Y-%m')
ORDER BY customer_id, month;
GROUP BY with Expressions
-- Group by year and month
SELECT
YEAR(order_date) AS yr,
MONTH(order_date) AS mo,
SUM(amount) AS monthly_revenue
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY yr, mo;
-- Group by price range
SELECT
CASE
WHEN amount < 50 THEN 'Budget'
WHEN amount < 200 THEN 'Mid-range'
ELSE 'Premium'
END AS price_tier,
COUNT(*) AS product_count,
AVG(amount) AS avg_price
FROM orders
WHERE amount IS NOT NULL
GROUP BY price_tier
ORDER BY avg_price;
WITH ROLLUP
WITH ROLLUP generates subtotals and a grand total automatically:
SELECT
customer_id,
SUM(amount) AS total_spent,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id WITH ROLLUP;
-- The last row with customer_id = NULL is the grand total
-- customer_id | total_spent | order_count
-- 1 | 1179.96 | 4
-- 2 | 154.97 | 3
-- 3 | 349.99 | 2
-- 4 | 129.99 | 1
-- NULL | 1814.91 | 10
-- Use GROUPING() to identify rollup rows (MySQL 8.0+)
SELECT
IF(GROUPING(customer_id), 'Grand Total', customer_id) AS customer,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id WITH ROLLUP;
3. HAVING
HAVING filters groups after aggregation. Think of it as WHERE for groups.
| Clause | Filters | Timing | Can Use Aggregates? |
|---|---|---|---|
WHERE |
Individual rows | Before grouping | No |
HAVING |
Groups | After grouping | Yes |
-- Customers who have placed more than 2 orders
SELECT
customer_id,
COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
HAVING COUNT(*) > 2;
-- Customers who spent more than $200
SELECT
customer_id,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 200;
-- Combine WHERE and HAVING:
-- Among 2024 orders, find customers with average order > $100
SELECT
customer_id,
AVG(amount) AS avg_order,
COUNT(*) AS order_count
FROM orders
WHERE order_date >= '2024-01-01'
AND amount IS NOT NULL
GROUP BY customer_id
HAVING AVG(amount) > 100
ORDER BY avg_order DESC;
SQL execution order matters here:
-- Logical execution order:
-- 1. FROM - pick the table
-- 2. WHERE - filter rows
-- 3. GROUP BY - form groups
-- 4. HAVING - filter groups
-- 5. SELECT - compute output columns
-- 6. ORDER BY - sort results
-- 7. LIMIT - restrict result count
4. Window Functions (MySQL 8.0+)
Window functions perform calculations across a set of rows related to the current row, without collapsing the result into a single row like GROUP BY does. They use the OVER() clause.
-- Syntax:
-- function_name() OVER (
-- [PARTITION BY column, ...]
-- [ORDER BY column [ASC|DESC], ...]
-- [frame_clause]
-- )
ROW_NUMBER(), RANK(), DENSE_RANK()
These three ranking functions number rows within each partition. The difference appears when there are ties:
-- Sample data for ranking demonstration
CREATE TABLE exam_scores (
student VARCHAR(50),
subject VARCHAR(50),
score INT
);
INSERT INTO exam_scores VALUES
('Alice', 'Math', 95), ('Bob', 'Math', 90),
('Carol', 'Math', 90), ('Dave', 'Math', 85),
('Alice', 'English', 88), ('Bob', 'English', 92),
('Carol', 'English', 92), ('Dave', 'English', 78);
-- Compare all three ranking functions
SELECT
subject,
student,
score,
ROW_NUMBER() OVER (PARTITION BY subject ORDER BY score DESC) AS row_num,
RANK() OVER (PARTITION BY subject ORDER BY score DESC) AS rnk,
DENSE_RANK() OVER (PARTITION BY subject ORDER BY score DESC) AS dense_rnk
FROM exam_scores;
-- For Math subject:
-- student | score | row_num | rnk | dense_rnk
-- Alice | 95 | 1 | 1 | 1
-- Bob | 90 | 2 | 2 | 2
-- Carol | 90 | 3 | 2 | 2 (tie!)
-- Dave | 85 | 4 | 4 | 3 (rank skips 3; dense_rank doesn't)
SUM() OVER, AVG() OVER
Aggregate functions can be used as window functions to compute running totals, averages, etc.:
-- Running total by order date
SELECT
id,
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders
WHERE amount IS NOT NULL
ORDER BY order_date;
-- Per-customer running total
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS customer_running_total
FROM orders
WHERE amount IS NOT NULL;
-- Percentage of total
SELECT
product,
amount,
SUM(amount) OVER () AS grand_total,
ROUND(amount / SUM(amount) OVER () * 100, 2) AS pct_of_total
FROM orders
WHERE amount IS NOT NULL;
PARTITION BY and ORDER BY
-- PARTITION BY divides rows into groups (like GROUP BY, but keeps all rows)
-- ORDER BY defines the order within each partition
SELECT
customer_id,
product,
amount,
-- Average per customer
AVG(amount) OVER (PARTITION BY customer_id) AS customer_avg,
-- Rank within customer by amount
RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS price_rank,
-- Overall rank
RANK() OVER (ORDER BY amount DESC) AS overall_rank
FROM orders
WHERE amount IS NOT NULL;
LEAD() and LAG()
Access data from the next or previous row without self-joining:
-- Compare each order with the previous one
SELECT
order_date,
amount,
LAG(amount, 1) OVER (ORDER BY order_date) AS prev_amount,
LEAD(amount, 1) OVER (ORDER BY order_date) AS next_amount,
amount - LAG(amount, 1) OVER (ORDER BY order_date) AS diff_from_prev
FROM orders
WHERE amount IS NOT NULL
ORDER BY order_date;
-- With default values for first/last rows
SELECT
order_date,
amount,
LAG(amount, 1, 0) OVER (ORDER BY order_date) AS prev_amount,
LEAD(amount, 1, 0) OVER (ORDER BY order_date) AS next_amount
FROM orders
WHERE amount IS NOT NULL;
-- FIRST_VALUE and LAST_VALUE
SELECT
customer_id,
product,
amount,
FIRST_VALUE(product) OVER (
PARTITION BY customer_id ORDER BY order_date
) AS first_product_bought,
LAST_VALUE(product) OVER (
PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_product_bought
FROM orders;
Frame Clause: ROWS BETWEEN
The frame clause defines exactly which rows relative to the current row are included in the window calculation:
-- Syntax:
-- ROWS BETWEEN <start> AND <end>
--
-- Options for start/end:
-- UNBOUNDED PRECEDING (first row of partition)
-- n PRECEDING (n rows before current)
-- CURRENT ROW
-- n FOLLOWING (n rows after current)
-- UNBOUNDED FOLLOWING (last row of partition)
-- 3-row moving average
SELECT
order_date,
amount,
AVG(amount) OVER (
ORDER BY order_date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS moving_avg_3
FROM orders
WHERE amount IS NOT NULL;
-- Cumulative sum (default when ORDER BY is present)
SELECT
order_date,
amount,
SUM(amount) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_sum
FROM orders
WHERE amount IS NOT NULL;
-- Sum of current + next 2 rows
SELECT
order_date,
amount,
SUM(amount) OVER (
ORDER BY order_date
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
) AS forward_sum
FROM orders
WHERE amount IS NOT NULL;
Named Windows (WINDOW Clause)
-- Avoid repeating window definitions
SELECT
customer_id,
order_date,
amount,
SUM(amount) OVER w AS running_total,
AVG(amount) OVER w AS running_avg,
COUNT(*) OVER w AS running_count
FROM orders
WHERE amount IS NOT NULL
WINDOW w AS (PARTITION BY customer_id ORDER BY order_date)
ORDER BY customer_id, order_date;
MySQL vs PostgreSQL: Window Functions
MySQL 8.0+: Supports all standard window functions. Added in 8.0 (2018). No support for RANGE with interval offsets (e.g., RANGE BETWEEN INTERVAL 7 DAY PRECEDING).
PostgreSQL: Full window function support since version 8.4 (2009). Supports RANGE with interval types, GROUPS frame mode, and FILTER (WHERE ...) clause on aggregates. Also supports EXCLUDE in frame specs.
Tip: If you need FILTER (WHERE ...) in MySQL, use SUM(CASE WHEN ... END) as a workaround.
5. Practical Example: Sales Report
Let's build a comprehensive sales reporting system using a realistic dataset:
-- Setup: products and sales tables
CREATE TABLE products (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
category VARCHAR(50) NOT NULL,
price DECIMAL(10,2) NOT NULL
);
CREATE TABLE sales (
id INT PRIMARY KEY AUTO_INCREMENT,
product_id INT NOT NULL,
quantity INT NOT NULL,
sale_date DATE NOT NULL,
FOREIGN KEY (product_id) REFERENCES products(id)
);
INSERT INTO products (name, category, price) VALUES
('Laptop Pro', 'Electronics', 1299.99),
('Wireless Mouse','Electronics', 29.99),
('Desk Lamp', 'Office', 45.99),
('Notebook Set', 'Office', 12.99),
('Headphones', 'Electronics', 199.99),
('Monitor 27"', 'Electronics', 449.99),
('Pen Pack', 'Office', 8.99),
('Keyboard', 'Electronics', 79.99);
INSERT INTO sales (product_id, quantity, sale_date) VALUES
(1, 2, '2024-01-05'), (2, 10, '2024-01-08'), (3, 5, '2024-01-12'),
(4, 20, '2024-01-15'), (5, 3, '2024-01-20'), (1, 1, '2024-02-02'),
(6, 4, '2024-02-10'), (2, 8, '2024-02-14'), (7, 30, '2024-02-18'),
(3, 7, '2024-02-22'), (8, 6, '2024-03-01'), (1, 3, '2024-03-05'),
(5, 5, '2024-03-10'), (4, 15, '2024-03-15'), (6, 2, '2024-03-20'),
(2, 12, '2024-03-25'), (7, 25, '2024-04-01'), (8, 4, '2024-04-05'),
(3, 8, '2024-04-10'), (1, 2, '2024-04-15');
Monthly Sales Summary
SELECT
DATE_FORMAT(s.sale_date, '%Y-%m') AS month,
COUNT(DISTINCT s.id) AS total_transactions,
SUM(s.quantity) AS units_sold,
SUM(s.quantity * p.price) AS revenue,
ROUND(AVG(s.quantity * p.price), 2) AS avg_transaction_value
FROM sales s
JOIN products p ON s.product_id = p.id
GROUP BY DATE_FORMAT(s.sale_date, '%Y-%m')
ORDER BY month;
-- Add month-over-month growth using window functions
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
ROUND(
(revenue - LAG(revenue) OVER (ORDER BY month))
/ LAG(revenue) OVER (ORDER BY month) * 100,
2
) AS growth_pct
FROM (
SELECT
DATE_FORMAT(s.sale_date, '%Y-%m') AS month,
SUM(s.quantity * p.price) AS revenue
FROM sales s
JOIN products p ON s.product_id = p.id
GROUP BY DATE_FORMAT(s.sale_date, '%Y-%m')
) monthly_data
ORDER BY month;
Top N Products per Category
-- Top 2 products by revenue in each category
SELECT category, name, total_revenue, rnk
FROM (
SELECT
p.category,
p.name,
SUM(s.quantity * p.price) AS total_revenue,
ROW_NUMBER() OVER (
PARTITION BY p.category
ORDER BY SUM(s.quantity * p.price) DESC
) AS rnk
FROM sales s
JOIN products p ON s.product_id = p.id
GROUP BY p.category, p.name
) ranked
WHERE rnk <= 2
ORDER BY category, rnk;
-- With DENSE_RANK to handle ties
SELECT category, name, total_revenue, rnk
FROM (
SELECT
p.category,
p.name,
SUM(s.quantity * p.price) AS total_revenue,
DENSE_RANK() OVER (
PARTITION BY p.category
ORDER BY SUM(s.quantity * p.price) DESC
) AS rnk
FROM sales s
JOIN products p ON s.product_id = p.id
GROUP BY p.category, p.name
) ranked
WHERE rnk <= 2
ORDER BY category, rnk;
Running Total and Moving Average
-- Daily running total and 3-day moving average
SELECT
sale_date,
daily_revenue,
SUM(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total,
ROUND(
AVG(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2
) AS moving_avg_3day
FROM (
SELECT
s.sale_date,
SUM(s.quantity * p.price) AS daily_revenue
FROM sales s
JOIN products p ON s.product_id = p.id
GROUP BY s.sale_date
) daily_data
ORDER BY sale_date;
-- Category-level cumulative revenue
SELECT
p.category,
s.sale_date,
SUM(s.quantity * p.price) AS daily_revenue,
SUM(SUM(s.quantity * p.price)) OVER (
PARTITION BY p.category
ORDER BY s.sale_date
) AS category_running_total
FROM sales s
JOIN products p ON s.product_id = p.id
GROUP BY p.category, s.sale_date
ORDER BY p.category, s.sale_date;
Complete Dashboard Query
-- A single query combining multiple analytics
SELECT
p.category,
p.name AS product,
SUM(s.quantity) AS units_sold,
SUM(s.quantity * p.price) AS revenue,
ROUND(
SUM(s.quantity * p.price)
/ SUM(SUM(s.quantity * p.price)) OVER () * 100,
2
) AS revenue_share_pct,
RANK() OVER (ORDER BY SUM(s.quantity * p.price) DESC) AS overall_rank,
RANK() OVER (
PARTITION BY p.category
ORDER BY SUM(s.quantity * p.price) DESC
) AS category_rank
FROM sales s
JOIN products p ON s.product_id = p.id
GROUP BY p.category, p.name
ORDER BY revenue DESC;
6. Chapter Summary
Aggregate Functions
COUNT,SUM,AVG,MIN,MAXare the five core aggregatesCOUNT(*)counts all rows;COUNT(col)skips NULLs- Use
DISTINCTinside aggregates for unique values
GROUP BY & HAVING
GROUP BYcollapses rows into groupsWITH ROLLUPadds subtotals and a grand totalHAVINGfilters groups;WHEREfilters rows
Window Functions
ROW_NUMBER,RANK,DENSE_RANKfor rankingLEAD,LAGto access adjacent rows- Frame clauses like
ROWS BETWEENcontrol the window WINDOWclause to name and reuse definitions
Best Practices
- Filter with
WHEREfirst to reduce grouping work - Use window functions to avoid correlated subqueries
- Name your windows when reusing the same definition
- Watch for NULLs in aggregate calculations