Lecture 4: Aggregations, Partitions & Window Functions
Databases can represent a very large volume of information, and we need to determine some aggregations (summaries) of data to handle them correctly in our "human" analysis. We may need to
SELECT count(*) FROM facilities
SELECT count(*) FROM facilities WHERE name LIKE 'Tennis%'
Understanding the order in which SQL processes a query is essential:
FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
WHERE filters rows before groupingHAVING filters groups after aggregationSELECT (like aliases or window functions) can't be used in WHERE or HAVINGGROUP BY creates groups of rows, then aggregates each group separately:
SELECT bookings.memid as memberid, count(bookings.bookid) as cnt FROM bookings
LEFT JOIN members ON members.memid = bookings.memid
GROUP BY bookings.memid
ORDER BY cnt desc
Each unique value in the GROUP BY column becomes one row in the result.
WHERE filters rows before grouping. HAVING filters groups after aggregation:
SELECT memid, COUNT(*) as booking_count
FROM bookings
GROUP BY memid
HAVING COUNT(*) > 10;
Compare with WHERE (which cannot use aggregates):
-- This would ERROR:
SELECT memid, COUNT(*) FROM bookings
WHERE COUNT(*) > 10 -- ❌ Can't use aggregate in WHERE
GROUP BY memid;
Provider note: In MySQL, HAVING can reference column aliases. In PostgreSQL/Oracle, you must repeat the aggregate expression. SQLite allows both.
ORDER BY sorts the final result. It can be combined with ASC (ascending) and DESC (descending):
SELECT * FROM members
ORDER BY joindate DESC;
You can sort by multiple columns:
SELECT * FROM bookings
ORDER BY facid ASC, starttime DESC;
CASE WHEN allows if/else logic inside your queries:
SELECT name,
CASE
WHEN membercost = 0 THEN 'Free'
WHEN membercost < 10 THEN 'Cheap'
ELSE 'Expensive'
END as price_category
FROM facilities;
Very powerful combined with aggregations:
SELECT f.name,
SUM(CASE WHEN b.memid = 0 THEN b.slots * f.guestcost
ELSE b.slots * f.membercost END) as revenue
FROM bookings b
JOIN facilities f ON b.facid = f.facid
GROUP BY f.name;
One can consider using the result of a SELECT as a regular table:
SELECT memberid, cnt FROM (
SELECT bookings.memid as memberid, count(bookings.bookid) as cnt FROM bookings
LEFT JOIN members ON members.memid = bookings.memid
GROUP BY bookings.memid
ORDER BY cnt desc
) mem_bookings;
The subquery must have an alias (here: mem_bookings)
Window functions perform calculations across rows related to the current row, without collapsing them into groups:
| Function | Description | Example |
|---|---|---|
| ROW_NUMBER() | Unique row number | ROW_NUMBER() OVER (ORDER BY sales) |
| RANK() | Rank with gaps | RANK() OVER (ORDER BY score DESC) |
| DENSE_RANK() | Rank without gaps | DENSE_RANK() OVER (ORDER BY score) |
| SUM() OVER() | Running total | SUM(amount) OVER (ORDER BY date) |
| AVG() OVER() | Moving average | AVG(price) OVER (PARTITION BY category) |
Remember: window functions are computed in SELECT, so we can't filter on them with WHERE. We need a subquery:
SELECT
memberid, cnt, pos
FROM (
SELECT
bookings.memid as memberid,
COUNT(bookings.bookid) as cnt,
RANK() OVER (ORDER BY COUNT(bookings.bookid) DESC) as pos
FROM bookings
LEFT JOIN members ON members.memid = bookings.memid
GROUP BY bookings.memid
) mem_bookings
WHERE pos <= 4;
Note: RANK() requires MySQL 8.0 or higher. For older versions, use variables or LIMIT.
For older MySQL versions, use LIMIT instead:
SELECT
bookings.memid as memberid,
COUNT(bookings.bookid) as cnt
FROM bookings
LEFT JOIN members ON members.memid = bookings.memid
GROUP BY bookings.memid
ORDER BY cnt DESC
LIMIT 4;
Or use user-defined variables:
SET @rank = 0;
SELECT memberid, cnt, @rank := @rank + 1 AS pos
FROM (
SELECT memid as memberid, COUNT(*) as cnt
FROM bookings
GROUP BY memid
ORDER BY cnt DESC
) ranked
LIMIT 4;
Using the bookings database, write queries for:
-- Example solution for #1:
SELECT f.name,
SUM(b.slots *
CASE WHEN b.memid = 0
THEN f.guestcost
ELSE f.membercost END) as revenue
FROM bookings b
JOIN facilities f ON b.facid = f.facid
GROUP BY f.facid, f.name
ORDER BY revenue DESC;
Practice using window functions (MySQL 8.0+):
-- Example solution for #1:
SELECT
f.name,
COUNT(*) as booking_count,
RANK() OVER (ORDER BY COUNT(*) DESC) as rank
FROM bookings b
JOIN facilities f ON b.facid = f.facid
GROUP BY f.facid, f.name;
A CTE is a temporary named result set that exists within a single SQL statement.
-- Basic CTE syntax
WITH cte_name AS (
SELECT ...
)
SELECT * FROM cte_name;
-- Single CTE
WITH monthly_revenue AS (
SELECT
MONTH(starttime) AS month,
SUM(slots * f.guestcost) AS revenue
FROM bookings b
JOIN facilities f ON b.facid = f.facid
GROUP BY MONTH(starttime)
)
SELECT month, revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS growth
FROM monthly_revenue;
WITH keyword introduces the CTEAS and the subquery in parenthesesChain multiple CTEs with commas - later CTEs can reference earlier ones:
WITH
facility_bookings AS (
SELECT facid, COUNT(*) AS total_bookings
FROM bookings
GROUP BY facid
),
facility_revenue AS (
SELECT facid, SUM(slots * guestcost) AS revenue
FROM bookings b
JOIN facilities f USING(facid)
GROUP BY facid
)
SELECT
f.name,
fb.total_bookings,
fr.revenue
FROM facilities f
JOIN facility_bookings fb USING(facid)
JOIN facility_revenue fr USING(facid)
ORDER BY fr.revenue DESC;
CTEs are especially powerful for multi-level aggregations:
WITH member_stats AS (
SELECT
memid,
COUNT(*) AS booking_count,
SUM(slots) AS total_slots
FROM bookings
GROUP BY memid
),
member_rankings AS (
SELECT
memid,
booking_count,
total_slots,
RANK() OVER (ORDER BY booking_count DESC) AS booking_rank
FROM member_stats
)
SELECT m.firstname, m.surname, mr.*
FROM member_rankings mr
JOIN members m USING(memid)
WHERE booking_rank <= 10;
Recursive CTEs can traverse hierarchical data (e.g., organization charts, categories):
-- Find member referral chains
WITH RECURSIVE referral_chain AS (
-- Base case: members who referred someone
SELECT memid, firstname, surname, recommendedby, 1 AS level
FROM members
WHERE recommendedby IS NULL
UNION ALL
-- Recursive case: members referred by previous level
SELECT m.memid, m.firstname, m.surname, m.recommendedby, rc.level + 1
FROM members m
JOIN referral_chain rc ON m.recommendedby = rc.memid
)
SELECT * FROM referral_chain ORDER BY level, surname;
Note: CTEs require MySQL 8.0+ or PostgreSQL 8.4+, MariaDB 10.2.1+
Rewrite this complex query using CTEs for better readability:
-- Original query with subqueries
SELECT f.name, sub.total_bookings,
(SELECT AVG(slots) FROM bookings WHERE facid = f.facid) AS avg_slots
FROM facilities f
JOIN (SELECT facid, COUNT(*) AS total_bookings
FROM bookings GROUP BY facid) sub ON f.facid = sub.facid
WHERE sub.total_bookings > (SELECT AVG(cnt) FROM
(SELECT COUNT(*) AS cnt FROM bookings GROUP BY facid) x);
Challenge:
-- Example: Check query execution plan
EXPLAIN SELECT f.name, COUNT(*)
FROM bookings b
JOIN facilities f ON b.facid = f.facid
GROUP BY f.facid;
Create a comprehensive report showing:
-- Hint: You'll need to combine:
-- - Multiple aggregations (SUM, COUNT, AVG)
-- - Window functions for percentage and ranking
-- - Subqueries for total revenue calculation
-- - DISTINCT for unique member counting