Aggregate Functions & Grouping
Background
In the examples below, assume we have a table called users
with the following columns:
name
– the user’s nameage
– the user’s agecity
– the city where the user livessignup_date
– the date the user signed upemail
– the user’s email address
COUNT()
- Returns the number of rows that match a specified condition.
COUNT(*)
counts all rows, including those with NULLs. So doesCOUNT(1)
.COUNT(column_name)
only counts non-null values.
SELECT COUNT(*) FROM users
SELECT COUNT(city) FROM users
SUM()
- Adds all the non-null values in a numeric column. Treat null values as 0.
SELECT SUM(age) FROM users
MIN()
/ MAX()
- Finds the smallest or largest value in a column.
SELECT MIN(age), MAX(age) FROM users
AVG()
- Calculates the average of non-null numeric values in a column.
SELECT AVG(age) FROM users
GROUP BY
- Groups rows that share the same value in a column so aggregate functions can be applied to each group.
SELECT city, COUNT(*) FROM users
GROUP BY city
This returns a table like the following:
city | count |
---|---|
New York | 10 |
Boston | 7 |
San Diego | 5 |
HAVING
- Filters grouped records (unlike
WHERE
, which filters rows before aggregating).
SELECT city, COUNT(*)
FROM users
GROUP BY city
HAVING COUNT(*) > 5
This returns a table like the following:
city | count |
---|---|
New York | 10 |
Boston | 7 |
DISTINCT
To view unique values
Examples
- Get all the unique cities users come from:
SELECT DISTINCT city FROM users
city New York Boston San Diego Seattle - Get unique (city, age) pairs:
SELECT DISTINCT city, age FROM users
city age New York 25 Boston 30 San Diego 25 New York 30 Seattle 40 - This shows the unique pairs of city and age — meaning the same city can appear more than once, but only if it’s paired with a different age.
Using DISTINCT
in aggregations
SELECT COUNT(DISTINCT city) AS unique_cities
FROM users
unique_cities |
---|
4 |
DISTINCT
goes inside the aggregate function rather than at the beginning of theSELECT
clause.
Query Clause Order
The order in which we write SQL clauses matters. Here’s the standard order for everything we’ve learned so far:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
Reference
These notes are based on the Mode Intermediate SQL Tutorial. I’ve written them to help myself quickly recall aggregation and grouping tools in SQL.