TDD with AI

Test-Driven Development

Lecture 5

Write tests first, then let AI implement the code

Why TDD is Essential with AI

Tests are your safety net against AI mistakes

AI Without TDD

  • Code works initially, breaks later
  • No verification of correctness
  • Regressions go unnoticed
  • Refactoring is risky
  • Edge cases are missed

AI With TDD

  • Tests define expected behavior
  • Immediate verification of AI output
  • Regressions caught instantly
  • Safe refactoring (tests stay green)
  • Edge cases are explicitly tested

TDD transforms AI from a code generator into a verified solution provider.

TDD in the AIDD Workflow

Where Test-Driven Development fits

Discover → Plan → Review → Execute → Commit → Test

Execute Phase

TDD is the implementation method:

  • Write failing test (RED)
  • AI implements to pass (GREEN)
  • Refactor with confidence

Test Phase

Validate the complete feature:

  • Integration tests
  • E2E user journeys
  • Human testing scripts

The TDD Cycle: Red-Green-Refactor

RED (write failing test) → GREEN (make it pass) → REFACTOR (clean up) → repeat

1. RED

Write a test that fails. This defines what you want to achieve.

2. GREEN

Write the minimum code to make the test pass. No more.

3. REFACTOR

Clean up the code while keeping tests green.

TDD with AI: The Workflow

1. You write the test: define the expected behavior first
2. Run it (it should fail): confirm the test is valid
3. AI implements: generate code to pass the test
4. Run the tests (they should pass): verify AI's implementation

Key insight: YOU define what "correct" means. AI figures out HOW to achieve it.

Example: User Registration (RED)

Step 1: Write a failing test for the first requirement

# test_user_registration.py
import pytest
from app.auth import register_user

def test_register_user_with_valid_data():
    """User can register with valid email and password."""
    result = register_user(
        email="test@example.com",
        password="SecurePass123"
    )

    assert result.success is True
    assert result.user.email == "test@example.com"
    assert result.user.id is not None

def test_register_user_with_invalid_email():
    """Registration fails with invalid email format."""
    result = register_user(
        email="not-an-email",
        password="SecurePass123"
    )

    assert result.success is False
    assert "email" in result.errors
Run test: pytest test_user_registration.py → FAILS (function doesn't exist yet)

Example: User Registration (GREEN)

Step 2: Ask AI to implement the minimum code to pass

Implement the `register_user` function to pass these tests:

[paste the test code]

Requirements from vision.md:
- Validate email format (RFC 5322)
- Password: min 8 chars, 1 uppercase, 1 number
- Return a result object with success, user, or errors

Write ONLY what's needed to pass the tests. No extra features.
# app/auth.py (AI-generated)
import re
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    id: str
    email: str

@dataclass
class RegistrationResult:
    success: bool
    user: Optional[User] = None
    errors: Optional[dict] = None

def register_user(email: str, password: str) -> RegistrationResult:
    errors = {}

    # Validate email (simplified check; full RFC 5322 is far more complex)
    if not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', email):
        errors['email'] = 'Invalid email format'

    # Validate password
    if len(password) < 8 or not re.search(r'[A-Z]', password) or not re.search(r'\d', password):
        errors['password'] = 'Password must be 8+ chars with 1 uppercase and 1 number'

    if errors:
        return RegistrationResult(success=False, errors=errors)

    user = User(id=str(uuid.uuid4()), email=email)
    return RegistrationResult(success=True, user=user)

Example: User Registration (REFACTOR)

Step 3: Clean up while keeping tests green

What to Refactor

  • Extract validation into separate functions
  • Add type hints
  • Improve error messages
  • Extract regex patterns to constants

The Rule

Tests must stay green.

If a refactor breaks a test, either:
1. The refactor is wrong, OR
2. The test was testing implementation, not behavior

Refactor this code while keeping all tests passing:
- Extract validation into `validate_email()` and `validate_password()` functions
- Add proper type hints
- Move regex patterns to module-level constants

Do NOT change the function signature or return types.
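
One possible shape for the refactored module (a sketch of what the prompt above might produce; any version that keeps the tests green is acceptable):

# app/auth.py (refactored -- illustrative)
import re
import uuid
from dataclasses import dataclass
from typing import Optional

# Regex patterns extracted to module-level constants
EMAIL_PATTERN = re.compile(r'^[\w\.-]+@[\w\.-]+\.\w+$')
UPPERCASE_PATTERN = re.compile(r'[A-Z]')
DIGIT_PATTERN = re.compile(r'\d')

@dataclass
class User:
    id: str
    email: str

@dataclass
class RegistrationResult:
    success: bool
    user: Optional[User] = None
    errors: Optional[dict] = None

def validate_email(email: str) -> Optional[str]:
    """Return an error message for an invalid email, or None if valid."""
    if not EMAIL_PATTERN.match(email):
        return 'Invalid email format'
    return None

def validate_password(password: str) -> Optional[str]:
    """Return an error message for a weak password, or None if valid."""
    if len(password) < 8 or not UPPERCASE_PATTERN.search(password) or not DIGIT_PATTERN.search(password):
        return 'Password must be 8+ chars with 1 uppercase and 1 number'
    return None

def register_user(email: str, password: str) -> RegistrationResult:
    errors = {}

    email_error = validate_email(email)
    if email_error:
        errors['email'] = email_error

    password_error = validate_password(password)
    if password_error:
        errors['password'] = password_error

    if errors:
        return RegistrationResult(success=False, errors=errors)

    return RegistrationResult(success=True, user=User(id=str(uuid.uuid4()), email=email))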

The Golden Rule: One Test at a Time

WRONG: Batch Testing

Write all tests first:
- test_valid_registration
- test_invalid_email
- test_weak_password
- test_duplicate_email
- test_email_verification

Then implement everything

Problem: Overwhelming, hard to debug, AI generates bloated code

RIGHT: Incremental

Cycle 1: test_valid_registration
  → implement → pass → refactor

Cycle 2: test_invalid_email
  → implement → pass → refactor

Cycle 3: test_weak_password
  → implement → pass → refactor

...

Each cycle is focused and verifiable

Types of Tests in TDD

Unit Tests

Test individual functions/methods in isolation

Speed: Milliseconds
Coverage: High
Mocking: Heavy

Integration Tests

Test components working together

Speed: Seconds
Coverage: Medium
Mocking: Selective

E2E Tests

Test complete user flows

Speed: Minutes
Coverage: Low
Mocking: None

Testing Pyramid: Many unit tests, fewer integration tests, even fewer E2E tests.
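
To make the mocking distinction above concrete, here is a minimal sketch (the `Mailer` class and `send_welcome_email` function are hypothetical, used only for illustration). A unit test replaces the collaborator with a mock; an integration test would exercise a real test mailer instead.

# Unit test with heavy mocking: nothing leaves the process, so it runs in milliseconds
from unittest.mock import Mock

class Mailer:
    def send(self, to: str, subject: str) -> bool:
        raise NotImplementedError  # real implementation would talk to an SMTP server

def send_welcome_email(mailer: Mailer, email: str) -> bool:
    return mailer.send(to=email, subject="Welcome!")

def test_send_welcome_email_unit():
    """Unit test: the mailer is mocked, so only our own logic is exercised."""
    mailer = Mock(spec=Mailer)
    mailer.send.return_value = True

    assert send_welcome_email(mailer, "test@example.com") is True
    mailer.send.assert_called_once_with(to="test@example.com", subject="Welcome!")

# An integration test would pass a real (test) Mailer and assert the message
# actually arrives, trading speed for confidence in the wiring.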

Using AI to Generate Tests

AI can help write tests, but YOU validate them

Generate pytest tests for this function specification:

Function: `calculate_shipping_cost(weight: float, distance: int, expedited: bool) -> float`

Requirements:
- Base rate: $5 + $0.50 per kg
- Distance multiplier: 1.0 for < 100 km, 1.5 for 100-500 km, 2.0 for > 500 km
- Expedited adds 50% to the final cost
- Minimum charge: $10
- Max weight: 50 kg (raise ValueError if exceeded)

Generate tests for:
- Normal calculation
- Each distance tier
- Expedited option
- Minimum charge
- Weight limit error
- Edge cases (0 weight, 0 distance)
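
The response might look something like the sketch below; the `shipping` module name is an assumption, and every expected value should be re-derived by hand from the requirements before you trust the tests.

# test_shipping.py -- illustrative output to validate, not a reference solution
import pytest
from shipping import calculate_shipping_cost  # hypothetical module

def test_normal_calculation():
    """20 kg over 50 km: (5 + 0.50 * 20) * 1.0 = $15.00."""
    assert calculate_shipping_cost(weight=20, distance=50, expedited=False) == 15.0

def test_mid_distance_tier():
    """20 kg over 200 km: 15 * 1.5 = $22.50."""
    assert calculate_shipping_cost(weight=20, distance=200, expedited=False) == 22.5

def test_long_distance_tier():
    """20 kg over 600 km: 15 * 2.0 = $30.00."""
    assert calculate_shipping_cost(weight=20, distance=600, expedited=False) == 30.0

def test_expedited_adds_50_percent():
    """Expedited on the mid tier: 22.50 * 1.5 = $33.75."""
    assert calculate_shipping_cost(weight=20, distance=200, expedited=True) == 33.75

def test_minimum_charge_applies():
    """1 kg over 10 km works out to $5.50, which is bumped to the $10 minimum."""
    assert calculate_shipping_cost(weight=1, distance=10, expedited=False) == 10.0

def test_weight_over_limit_raises():
    """Anything above 50 kg is rejected."""
    with pytest.raises(ValueError):
        calculate_shipping_cost(weight=51, distance=10, expedited=False)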

Common TDD Mistakes to Avoid

Testing Implementation

Tests that break when you refactor internal code. Test BEHAVIOR, not HOW it works (see the sketch below).

Tests After Code

Writing code first, then tests to match. This defeats the purpose of TDD.

Too Many Assertions

One test checking 10 things. Split into focused tests with single assertions.

Ignoring Edge Cases

Only testing happy path. AI misses edge cases unless you specify them.
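
A sketch of the behavior-vs-implementation difference, assuming the refactored `app/auth.py` sketched earlier (where the email regex was extracted into an `EMAIL_PATTERN` constant):

# Brittle: asserts on HOW register_user is implemented (an internal constant)
def test_email_pattern_constant():
    from app import auth
    assert auth.EMAIL_PATTERN.pattern == r'^[\w\.-]+@[\w\.-]+\.\w+$'

# Robust: asserts on WHAT register_user promises to do
def test_rejects_invalid_email():
    from app.auth import register_user
    result = register_user(email="not-an-email", password="SecurePass123")
    assert result.success is False
    assert "email" in result.errors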

Effective Test Prompts for AI

Test Generation

Generate pytest tests for [function].

Test cases needed:
1. [Normal case]
2. [Edge case 1]
3. [Error case]

Use fixtures for setup.
Include docstrings explaining
each test's purpose.

Implementation from Test

Here is my failing test:
[paste test code]

Implement the function to make
this test pass.

- Use only standard library
- Follow existing code style
- Add type hints
- Minimal implementation
Key phrase: "Implement the MINIMUM code to make this test pass."

Code Coverage Fundamentals

Measure what your tests actually test

Line Coverage

% of code lines executed by tests

Most common metric

Branch Coverage

% of decision branches taken (if/else)

Catches more edge cases

Function Coverage

% of functions called at least once

High-level overview

Path Coverage

% of all possible execution paths

Most thorough, rarely 100%

Key insight: Branch coverage reveals more bugs than line coverage alone.
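
A small illustration of the difference (hypothetical function, shown only to make the point): a single test can execute every line yet still skip a branch.

import pytest

def apply_discount(price: float, is_member: bool) -> float:
    total = price
    if is_member:
        total *= 0.9  # 10% member discount
    return total

def test_member_discount():
    assert apply_discount(100.0, is_member=True) == pytest.approx(90.0)

# This one test touches every line (100% line coverage), but the implicit
# "else" path (is_member=False) is never taken, so branch coverage stays
# below 100% and flags the untested case.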

Measuring Coverage

Run tests with coverage:

# Python (pytest-cov)
pytest --cov=app --cov-report=html --cov-branch

# JavaScript (Jest)
jest --coverage --collectCoverageFrom='src/**/*.js'

# Java (JaCoCo)
mvn test jacoco:report

Sample output:

Name              Stmts  Branch  Cover
---------------------------------------
app/auth.py          45      12    93%
app/models.py        22       4   100%
app/utils.py         18       8    61%
---------------------------------------
TOTAL               85      24    88%

AI Coverage Prompt

"My coverage report shows utils.py at 61% (lines 12-18, 25 uncovered). Here's the code: [paste code] Generate tests to cover the missing branches and error paths."

Coverage Targets by Code Type

Code Type        Minimum   Target   Rationale
Business Logic   80%       90%+     Core value, must be reliable
API Endpoints    70%       85%      Entry points, validation critical
Data Models      60%       75%      Getters/setters less critical
Utilities        80%       95%      Reused everywhere, bugs propagate
Error Handling   70%       85%      Recovery paths must work

Don't Chase 100%

Diminishing returns past 90%. Focus on critical paths, not trivial code.

Enforce Minimums

Configure CI/CD to fail if coverage drops below threshold.

Enforcing Coverage in CI/CD

# pytest.ini - fail under threshold
[pytest]
addopts = --cov=app --cov-fail-under=80

# GitHub Actions workflow
- name: Run tests with coverage
  run: pytest --cov=app --cov-report=xml

- name: Upload to Codecov
  uses: codecov/codecov-action@v3

- name: Coverage gate
  run: |
    coverage report --fail-under=80

Coverage Gates

  • Absolute: Total coverage > 80%
  • Delta: PR can't decrease coverage
  • New code: Changed files must have 90%+

Tools: Codecov, Coveralls, SonarQube

Best practice: Block merging PRs that reduce overall coverage.
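
As one way to wire up the delta and new-code gates listed above, a Codecov configuration along these lines is a common pattern (values are examples; confirm the options against Codecov's documentation for your setup):

# codecov.yml -- illustrative gate configuration
coverage:
  status:
    project:
      default:
        target: 80%        # absolute gate: total coverage must stay at or above 80%
        threshold: 0%      # delta gate: a PR may not reduce coverage at all
    patch:
      default:
        target: 90%        # new-code gate: changed lines need 90%+ coverage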

TDD Best Practices with AI

1. Test First, Always

Never let AI write code before you have a failing test.

2. Small Cycles

One test → one implementation. Don't batch.

3. Descriptive Names

test_user_cannot_register_with_duplicate_email, not test_reg_3

4. Run Tests Often

After every change. Catch regressions immediately.
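
One way to make "run tests often" automatic is a hook that runs the suite before every commit. A minimal sketch, assuming the pre-commit tool and a pytest project:

# .pre-commit-config.yaml -- run the test suite before each commit
repos:
  - repo: local
    hooks:
      - id: pytest
        name: run pytest
        entry: pytest
        language: system
        pass_filenames: false
        always_run: true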

If you can't write a test for it, you don't understand the requirement well enough.

Key Takeaways

RED → GREEN → REFACTOR

Tests First

Define behavior before code

One at a Time

Single test, single cycle

Behavior

Test what, not how

Coverage

80%+ on critical paths

CI/CD

Enforce coverage gates

Safety Net

Protect against AI regressions

Questions?

Test-Driven Development with AI

Next: AI Limitations & When Not to Use AI
