1 聚合函数
聚合函数对一组值执行计算并返回单个结果。它们通常与 GROUP BY 配合使用,但也可以单独使用来计算整个表的统计值。
COUNT — 计数
-- COUNT(*) 统计所有行,包括 NULL
SELECT COUNT(*) AS total_orders FROM orders;
-- COUNT(column) 统计该列非 NULL 的行数
SELECT COUNT(email) AS has_email FROM customers;
-- COUNT(DISTINCT column) 统计去重后的非 NULL 值数
SELECT COUNT(DISTINCT category) AS category_count FROM products;
-- 组合使用:一条语句获取多个统计值
SELECT
COUNT(*) AS total_products,
COUNT(discount_price) AS on_sale,
COUNT(DISTINCT brand) AS brand_count
FROM products;
SUM / AVG — 求和与平均
-- 订单总金额
SELECT SUM(amount) AS total_revenue FROM orders
WHERE order_date >= '2024-01-01';
-- 商品平均价格
SELECT AVG(price) AS avg_price FROM products;
-- SUM 和 AVG 自动忽略 NULL 值
SELECT
SUM(bonus) AS total_bonus,
AVG(bonus) AS avg_bonus
FROM employees;
-- 如果 10 条记录中 3 条 bonus 为 NULL,AVG 的分母是 7 而非 10
MIN / MAX — 最小与最大
-- 最低价和最高价
SELECT
MIN(price) AS cheapest,
MAX(price) AS most_expensive
FROM products;
-- MIN/MAX 也可用于日期和字符串
SELECT
MIN(order_date) AS first_order,
MAX(order_date) AS last_order
FROM orders;
-- 综合示例:订单统计概览
SELECT
COUNT(*) AS order_count,
SUM(amount) AS total_amount,
AVG(amount) AS avg_amount,
MIN(amount) AS min_amount,
MAX(amount) AS max_amount
FROM orders
WHERE YEAR(order_date) = 2024;
💡 COUNT(*) vs COUNT(column)
COUNT(*) 统计所有行(包含 NULL),COUNT(column) 只统计该列值不为 NULL 的行。当列中有 NULL 值时,两者结果不同。如果只想知道行数,用 COUNT(*) 性能更好。
2 GROUP BY 分组
GROUP BY 将结果集按一个或多个列的值分组,每组返回一行。SELECT 中只能出现分组列或聚合函数。
基本分组
-- 每个类别的商品数量
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category;
-- 每个客户的订单总额
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent,
AVG(amount) AS avg_order
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC;
多列分组
-- 按年月统计销售额
SELECT
YEAR(order_date) AS order_year,
MONTH(order_date) AS order_month,
COUNT(*) AS order_count,
SUM(amount) AS monthly_revenue
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date)
ORDER BY order_year, order_month;
-- 每个部门、每个职位的平均工资
SELECT
department,
job_title,
COUNT(*) AS headcount,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department, job_title
ORDER BY department, avg_salary DESC;
WITH ROLLUP — 小计与合计
WITH ROLLUP 在分组结果后自动添加小计行和总计行,适合生成报表。
-- 每个类别的销售额,附带总计
SELECT
COALESCE(category, '【总计】') AS category,
COUNT(*) AS product_count,
SUM(price) AS total_price
FROM products
GROUP BY category WITH ROLLUP;
-- 结果示例:
-- +-----------+---------------+-------------+
-- | category | product_count | total_price |
-- +-----------+---------------+-------------+
-- | 手机 | 5 | 25000.00 |
-- | 笔记本 | 3 | 18000.00 |
-- | 平板 | 4 | 12000.00 |
-- | 【总计】 | 12 | 55000.00 |
-- +-----------+---------------+-------------+
-- 多列 ROLLUP:先按 year 小计,再总计
SELECT
COALESCE(CAST(YEAR(order_date) AS CHAR), '合计') AS order_year,
COALESCE(CAST(MONTH(order_date) AS CHAR), '小计') AS order_month,
SUM(amount) AS revenue
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date) WITH ROLLUP;
3 HAVING 过滤
HAVING 在分组之后过滤,而 WHERE 在分组之前过滤行。这是 SQL 执行顺序决定的。
WHERE vs HAVING
| 特性 | WHERE | HAVING |
|---|---|---|
| 执行时机 | 分组之前 | 分组之后 |
| 过滤对象 | 单行数据 | 分组后的结果 |
| 能否用聚合函数 | 不能 | 能 |
| 能否引用别名 | 不能 | MySQL 允许 |
HAVING 使用示例
-- 找出平均价格超过 100 的类别
SELECT
category,
AVG(price) AS avg_price,
COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING avg_price > 100;
-- 找出下单超过 5 次的客户
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING order_count >= 5
ORDER BY total_spent DESC;
-- WHERE 和 HAVING 同时使用
-- 先用 WHERE 过滤 2024 年的订单,再用 HAVING 筛选大客户
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
WHERE YEAR(order_date) = 2024
GROUP BY customer_id
HAVING total_spent > 10000
ORDER BY total_spent DESC;
SQL 执行顺序
理解执行顺序是掌握 WHERE 与 HAVING 的关键:
-- SQL 的逻辑执行顺序(非书写顺序):
-- 1. FROM — 确定数据源
-- 2. WHERE — 过滤行
-- 3. GROUP BY — 分组
-- 4. HAVING — 过滤分组
-- 5. SELECT — 选择列 / 计算表达式
-- 6. DISTINCT — 去重
-- 7. ORDER BY — 排序
-- 8. LIMIT — 限制行数
💡 性能提示
能用 WHERE 过滤的条件不要放在 HAVING 中。WHERE 先执行,减少了参与分组的数据量,效率更高。
4 窗口函数(MySQL 8.0+)
窗口函数在不减少结果行数的情况下进行聚合计算。与 GROUP BY 不同,窗口函数保留每一行的数据,同时添加统计列。语法:函数() OVER (PARTITION BY ... ORDER BY ...)
排名函数
-- ROW_NUMBER():顺序编号,不存在并列
-- RANK():并列时编号相同,后续编号跳过
-- DENSE_RANK():并列时编号相同,后续编号不跳过
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num,
RANK() OVER (ORDER BY salary DESC) AS rank_num,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank_num
FROM employees;
-- 结果示例(salary: 10000, 8000, 8000, 6000):
-- row_num: 1, 2, 3, 4
-- rank_num: 1, 2, 2, 4 (跳过了 3)
-- dense_rank_num: 1, 2, 2, 3 (不跳过)
PARTITION BY — 分组内计算
-- 每个部门内按工资排名
SELECT
name,
department,
salary,
RANK() OVER (
PARTITION BY department
ORDER BY salary DESC
) AS dept_rank
FROM employees;
-- 每个类别内的商品价格排名
SELECT
product_name,
category,
price,
DENSE_RANK() OVER (
PARTITION BY category
ORDER BY price DESC
) AS price_rank
FROM products;
聚合窗口函数
-- SUM() OVER — 累计求和(Running Total)
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
-- AVG() OVER — 每个部门的平均工资(附在每行)
SELECT
name,
department,
salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg,
salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg
FROM employees;
-- COUNT() OVER — 累计计数
SELECT
order_date,
customer_id,
COUNT(*) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS customer_order_seq
FROM orders;
LEAD / LAG — 前后行对比
-- LAG(column, offset, default):取前 N 行的值
-- LEAD(column, offset, default):取后 N 行的值
SELECT
order_date,
amount,
LAG(amount, 1, 0) OVER (ORDER BY order_date) AS prev_amount,
amount - LAG(amount, 1, 0) OVER (ORDER BY order_date) AS change_amount,
LEAD(amount, 1) OVER (ORDER BY order_date) AS next_amount
FROM orders;
-- 环比增长率
SELECT
order_month,
monthly_revenue,
LAG(monthly_revenue, 1) OVER (ORDER BY order_month) AS prev_month_revenue,
ROUND(
(monthly_revenue - LAG(monthly_revenue, 1) OVER (ORDER BY order_month))
/ LAG(monthly_revenue, 1) OVER (ORDER BY order_month) * 100,
2
) AS growth_rate_pct
FROM (
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS order_month,
SUM(amount) AS monthly_revenue
FROM orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
) monthly;
Frame 子句 — 精确控制窗口范围
-- ROWS BETWEEN 定义窗口的物理范围
-- 3 日移动平均
SELECT
order_date,
daily_revenue,
AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3day
FROM (
SELECT order_date, SUM(amount) AS daily_revenue
FROM orders
GROUP BY order_date
) daily;
-- 常用 Frame 写法:
-- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW 从第一行到当前行(默认)
-- ROWS BETWEEN 2 PRECEDING AND CURRENT ROW 前 2 行 + 当前行
-- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING 前 1 行 + 当前行 + 后 1 行
-- ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING 当前行到最后一行
🔄 MySQL vs PostgreSQL 窗口函数
MySQL 8.0 引入了窗口函数,PostgreSQL 从 8.4(2009 年)起就支持。PostgreSQL 提供更丰富的功能:
-- 两者通用语法相同
SELECT RANK() OVER (PARTITION BY dept ORDER BY salary DESC) FROM employees;
-- PostgreSQL 额外支持:
-- 1. FILTER 子句
-- SELECT COUNT(*) FILTER (WHERE status = 'active') OVER () FROM users;
-- 2. GROUPS frame 类型(基于值的分组)
-- GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
-- 3. EXCLUDE 子句
-- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING EXCLUDE CURRENT ROW
-- MySQL 8.0 支持 ROWS 和 RANGE,但不支持 GROUPS 和 EXCLUDE
5 实战:销售报表
以下示例基于订单表和商品表,展示实际业务场景中的聚合查询。
月度销售汇总
SELECT
DATE_FORMAT(o.order_date, '%Y-%m') AS month,
COUNT(DISTINCT o.id) AS order_count,
COUNT(DISTINCT o.customer_id) AS customer_count,
SUM(oi.quantity) AS items_sold,
SUM(oi.quantity * oi.unit_price) AS revenue,
ROUND(SUM(oi.quantity * oi.unit_price) / COUNT(DISTINCT o.id), 2) AS avg_order_value
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
WHERE o.order_date >= '2024-01-01'
GROUP BY DATE_FORMAT(o.order_date, '%Y-%m')
ORDER BY month;
Top N 商品(每个类别)
-- 每个类别销量前 3 的商品
SELECT category, product_name, total_sold, category_rank
FROM (
SELECT
p.category,
p.product_name,
SUM(oi.quantity) AS total_sold,
DENSE_RANK() OVER (
PARTITION BY p.category
ORDER BY SUM(oi.quantity) DESC
) AS category_rank
FROM products p
JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.category, p.product_name
) ranked
WHERE category_rank <= 3
ORDER BY category, category_rank;
累计销售额与移动平均
-- 每日累计销售额 + 7 日移动平均
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue,
ROUND(AVG(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS moving_avg_7day
FROM (
SELECT
order_date,
SUM(amount) AS daily_revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY order_date
) daily
ORDER BY order_date;
同比分析(Year-over-Year)
-- 月度同比对比
SELECT
curr.month_label,
curr.revenue AS current_year,
prev.revenue AS previous_year,
ROUND(
(curr.revenue - prev.revenue) / prev.revenue * 100, 2
) AS yoy_growth_pct
FROM (
SELECT
MONTH(order_date) AS m,
DATE_FORMAT(order_date, '%m月') AS month_label,
SUM(amount) AS revenue
FROM orders
WHERE YEAR(order_date) = 2024
GROUP BY MONTH(order_date), DATE_FORMAT(order_date, '%m月')
) curr
LEFT JOIN (
SELECT
MONTH(order_date) AS m,
SUM(amount) AS revenue
FROM orders
WHERE YEAR(order_date) = 2023
GROUP BY MONTH(order_date)
) prev ON curr.m = prev.m
ORDER BY curr.m;
💡 报表查询优化
对于频繁查询的报表,考虑以下优化方式:① 在 order_date 列上建立索引;② 对于历史数据,使用汇总表(summary table)预先计算结果;③ 复杂报表可创建视图简化查询。
6 本章要点
🔢 聚合函数
- •
COUNT(*)统计所有行,COUNT(col)排除 NULL - •
SUM()、AVG()自动忽略 NULL 值 - •
MIN()/MAX()适用于数字、日期和字符串 - •
COUNT(DISTINCT col)统计不重复的非 NULL 值
📦 GROUP BY 分组
- • SELECT 中非聚合列必须出现在 GROUP BY 中
- • 支持多列分组和表达式分组
- •
WITH ROLLUP生成小计和总计行 - •
HAVING过滤分组,WHERE过滤行
🪟 窗口函数
- •
ROW_NUMBER()、RANK()、DENSE_RANK()排名 - •
PARTITION BY分组 +ORDER BY排序 - •
LAG()/LEAD()访问前后行 - •
ROWS BETWEEN定义窗口范围
📊 实战技巧
- • 用窗口函数实现 Top N、累计求和、移动平均
- • 同比 / 环比分析结合子查询或
LAG() - • 能用 WHERE 过滤的不要放到 HAVING
- • 复杂报表考虑视图和汇总表优化