Way-Up

Databases

Lecture 4: Aggregations, Partitions & Window Functions

Relational databases : Querying with aggregations


Databases can represent a very large volume of information, and we need to determine some aggregations (summaries) of data to handle them correctly in our "human" analysis. We may need to

Relational databases : Querying with aggregations (2)


SELECT count(*) FROM facilities

Relational databases : Querying with aggregations (3)


SELECT count(*) FROM facilities WHERE name LIKE 'Tennis%'

Query processing order

Understanding the order in which SQL processes a query is essential:

Relational databases: summarizing with GROUP BY

GROUP BY creates groups of rows, then aggregates each group separately:

SELECT bookings.memid as memberid, count(bookings.bookid) as cnt FROM bookings
LEFT JOIN members ON members.memid = bookings.memid
GROUP BY bookings.memid
ORDER BY cnt desc

Each unique value in the GROUP BY column becomes one row in the result.

Filtering groups with HAVING

WHERE filters rows before grouping. HAVING filters groups after aggregation:

SELECT memid, COUNT(*) as booking_count
FROM bookings
GROUP BY memid
HAVING COUNT(*) > 10;

Compare with WHERE (which cannot use aggregates):

-- This would ERROR:
SELECT memid, COUNT(*) FROM bookings
WHERE COUNT(*) > 10  -- ❌ Can't use aggregate in WHERE
GROUP BY memid;

Provider note: In MySQL, HAVING can reference column aliases. In PostgreSQL/Oracle, you must repeat the aggregate expression. SQLite allows both.

Relational databases: sorting with ORDER BY

ORDER BY sorts the final result. It can be combined with ASC (ascending) and DESC (descending):

SELECT * FROM members
ORDER BY joindate DESC;

You can sort by multiple columns:

SELECT * FROM bookings
ORDER BY facid ASC, starttime DESC;

Conditional logic with CASE WHEN

CASE WHEN allows if/else logic inside your queries:

SELECT name,
       CASE
           WHEN membercost = 0 THEN 'Free'
           WHEN membercost < 10 THEN 'Cheap'
           ELSE 'Expensive'
       END as price_category
FROM facilities;

Very powerful combined with aggregations:

SELECT f.name,
       SUM(CASE WHEN b.memid = 0 THEN b.slots * f.guestcost
                ELSE b.slots * f.membercost END) as revenue
FROM bookings b
JOIN facilities f ON b.facid = f.facid
GROUP BY f.name;

Relational databases : sub SELECTs

One can consider using the result of a SELECT as a regular table:

SELECT memberid, cnt FROM (
	SELECT bookings.memid as memberid, count(bookings.bookid) as cnt FROM bookings
	LEFT JOIN members ON members.memid = bookings.memid
	GROUP BY bookings.memid
	ORDER BY cnt desc
) mem_bookings;

The subquery must have an alias (here: mem_bookings)

MySQL Window Functions (8.0+)

Window functions perform calculations across rows related to the current row, without collapsing them into groups:

Function Description Example
ROW_NUMBER() Unique row number ROW_NUMBER() OVER (ORDER BY sales)
RANK() Rank with gaps RANK() OVER (ORDER BY score DESC)
DENSE_RANK() Rank without gaps DENSE_RANK() OVER (ORDER BY score)
SUM() OVER() Running total SUM(amount) OVER (ORDER BY date)
AVG() OVER() Moving average AVG(price) OVER (PARTITION BY category)

Relational databases : sub SELECTs with RANK (MySQL 8.0+)

Remember: window functions are computed in SELECT, so we can't filter on them with WHERE. We need a subquery:

SELECT
    memberid, cnt, pos
FROM (
    SELECT
        bookings.memid as memberid,
        COUNT(bookings.bookid) as cnt,
        RANK() OVER (ORDER BY COUNT(bookings.bookid) DESC) as pos
    FROM bookings
    LEFT JOIN members ON members.memid = bookings.memid
    GROUP BY bookings.memid
) mem_bookings
WHERE pos <= 4;

Note: RANK() requires MySQL 8.0 or higher. For older versions, use variables or LIMIT.

Alternative for MySQL < 8.0 (without RANK)

For older MySQL versions, use LIMIT instead:

SELECT
    bookings.memid as memberid,
    COUNT(bookings.bookid) as cnt
FROM bookings
LEFT JOIN members ON members.memid = bookings.memid
GROUP BY bookings.memid
ORDER BY cnt DESC
LIMIT 4;

Or use user-defined variables:

SET @rank = 0;
SELECT memberid, cnt, @rank := @rank + 1 AS pos
FROM (
    SELECT memid as memberid, COUNT(*) as cnt
    FROM bookings
    GROUP BY memid
    ORDER BY cnt DESC
) ranked
LIMIT 4;

Exercise: Aggregations Practice

Using the bookings database, write queries for:

  1. Calculate the total revenue per facility
  2. Find the average number of slots booked per booking
  3. Find the facility with the highest total revenue
  4. List members who have made more than 10 bookings
  5. Calculate monthly revenue for each facility
-- Example solution for #1:
SELECT f.name,
       SUM(b.slots *
           CASE WHEN b.memid = 0
           THEN f.guestcost
           ELSE f.membercost END) as revenue
FROM bookings b
JOIN facilities f ON b.facid = f.facid
GROUP BY f.facid, f.name
ORDER BY revenue DESC;

Exercise: Window Functions

Practice using window functions (MySQL 8.0+):

  1. Rank facilities by total bookings
  2. Calculate running total of bookings per member
  3. Find the difference between each member's bookings and the average
  4. Assign row numbers to bookings ordered by date
-- Example solution for #1:
SELECT
    f.name,
    COUNT(*) as booking_count,
    RANK() OVER (ORDER BY COUNT(*) DESC) as rank
FROM bookings b
JOIN facilities f ON b.facid = f.facid
GROUP BY f.facid, f.name;

Common Table Expressions (CTEs)

A CTE is a temporary named result set that exists within a single SQL statement.

-- Basic CTE syntax
WITH cte_name AS (
    SELECT ...
)
SELECT * FROM cte_name;

CTE Syntax in Detail

-- Single CTE
WITH monthly_revenue AS (
    SELECT
        MONTH(starttime) AS month,
        SUM(slots * f.guestcost) AS revenue
    FROM bookings b
    JOIN facilities f ON b.facid = f.facid
    GROUP BY MONTH(starttime)
)
SELECT month, revenue,
       revenue - LAG(revenue) OVER (ORDER BY month) AS growth
FROM monthly_revenue;

Multiple CTEs

Chain multiple CTEs with commas - later CTEs can reference earlier ones:

WITH
    facility_bookings AS (
        SELECT facid, COUNT(*) AS total_bookings
        FROM bookings
        GROUP BY facid
    ),
    facility_revenue AS (
        SELECT facid, SUM(slots * guestcost) AS revenue
        FROM bookings b
        JOIN facilities f USING(facid)
        GROUP BY facid
    )
SELECT
    f.name,
    fb.total_bookings,
    fr.revenue
FROM facilities f
JOIN facility_bookings fb USING(facid)
JOIN facility_revenue fr USING(facid)
ORDER BY fr.revenue DESC;

CTEs with Aggregations

CTEs are especially powerful for multi-level aggregations:

WITH member_stats AS (
    SELECT
        memid,
        COUNT(*) AS booking_count,
        SUM(slots) AS total_slots
    FROM bookings
    GROUP BY memid
),
member_rankings AS (
    SELECT
        memid,
        booking_count,
        total_slots,
        RANK() OVER (ORDER BY booking_count DESC) AS booking_rank
    FROM member_stats
)
SELECT m.firstname, m.surname, mr.*
FROM member_rankings mr
JOIN members m USING(memid)
WHERE booking_rank <= 10;

Recursive CTEs

Recursive CTEs can traverse hierarchical data (e.g., organization charts, categories):

-- Find member referral chains
WITH RECURSIVE referral_chain AS (
    -- Base case: members who referred someone
    SELECT memid, firstname, surname, recommendedby, 1 AS level
    FROM members
    WHERE recommendedby IS NULL

    UNION ALL

    -- Recursive case: members referred by previous level
    SELECT m.memid, m.firstname, m.surname, m.recommendedby, rc.level + 1
    FROM members m
    JOIN referral_chain rc ON m.recommendedby = rc.memid
)
SELECT * FROM referral_chain ORDER BY level, surname;

CTE vs Subqueries: When to Use Which?

Use CTEs when:

  • Query is complex and needs structure
  • Same subquery is used multiple times
  • You need recursion
  • You want self-documenting code

Use Subqueries when:

  • Query is simple and straightforward
  • Subquery is used only once
  • Performance is critical (inline may be faster)
  • Working with older MySQL versions (<8.0)

Note: CTEs require MySQL 8.0+ or PostgreSQL 8.4+, MariaDB 10.2.1+

Exercise: CTEs

Rewrite this complex query using CTEs for better readability:

-- Original query with subqueries
SELECT f.name, sub.total_bookings,
       (SELECT AVG(slots) FROM bookings WHERE facid = f.facid) AS avg_slots
FROM facilities f
JOIN (SELECT facid, COUNT(*) AS total_bookings
      FROM bookings GROUP BY facid) sub ON f.facid = sub.facid
WHERE sub.total_bookings > (SELECT AVG(cnt) FROM
      (SELECT COUNT(*) AS cnt FROM bookings GROUP BY facid) x);

Challenge:

  1. Create a CTE for booking counts per facility
  2. Create a CTE for the average booking count
  3. Join CTEs in the main query

MySQL Performance Tips for Aggregations

-- Example: Check query execution plan
EXPLAIN SELECT f.name, COUNT(*)
FROM bookings b
JOIN facilities f ON b.facid = f.facid
GROUP BY f.facid;

Advanced Exercise: Complex Query

Create a comprehensive report showing:

-- Hint: You'll need to combine:
-- - Multiple aggregations (SUM, COUNT, AVG)
-- - Window functions for percentage and ranking
-- - Subqueries for total revenue calculation
-- - DISTINCT for unique member counting

Slide Overview