TDD with AI

Test-Driven Development

Lecture 5

Write tests first, then let AI implement the code

Why TDD is Essential with AI

Tests are your safety net against AI mistakes

AI Without TDD

  • Code works initially, breaks later
  • No verification of correctness
  • Regressions go unnoticed
  • Refactoring is risky
  • Edge cases are missed

AI With TDD

  • Tests define expected behavior
  • Immediate verification of AI output
  • Regressions caught instantly
  • Safe refactoring (tests stay green)
  • Edge cases are explicitly tested

TDD transforms AI from a code generator into a verified solution provider.

TDD in the AIDD Workflow

Where Test-Driven Development fits

Discover → Plan → Review → Execute → Commit → Test

Execute Phase

TDD is the implementation method:

  • Write failing test (RED)
  • AI implements to pass (GREEN)
  • Refactor with confidence

Test Phase

Validate the complete feature:

  • Integration tests
  • E2E user journeys
  • Human testing scripts

The TDD Cycle: Red-Green-Refactor

RED (write failing test) → GREEN (make it pass) → REFACTOR (clean up) → repeat

1. RED

Write a test that fails. This defines what you want to achieve.

2. GREEN

Write the minimum code to make the test pass. No more.

3. REFACTOR

Clean up the code while keeping tests green.

TDD with AI: The Workflow

1. You write the test: define the expected behavior first
2. Run it (it should fail): confirm the test is valid
3. AI implements: generate code to pass the test
4. Run the tests (they should pass): verify AI's implementation

Key insight: YOU define what "correct" means. AI figures out HOW to achieve it.

Example: User Registration (RED)

Step 1: Write a failing test for the first requirement

# test_user_registration.py
import pytest
from app.auth import register_user

def test_register_user_with_valid_data():
    """User can register with valid email and password."""
    result = register_user(
        email="test@example.com",
        password="SecurePass123"
    )

    assert result.success is True
    assert result.user.email == "test@example.com"
    assert result.user.id is not None

def test_register_user_with_invalid_email():
    """Registration fails with invalid email format."""
    result = register_user(
        email="not-an-email",
        password="SecurePass123"
    )

    assert result.success is False
    assert "email" in result.errors
Run test: pytest test_user_registration.py → FAILS (function doesn't exist yet)

Example: User Registration (GREEN)

Step 2: Ask AI to implement the minimum code to pass

Implement the `register_user` function to pass these tests:

[paste the test code]

Requirements from vision.md:
- Validate email format (RFC 5322)
- Password: min 8 chars, 1 uppercase, 1 number
- Return a result object with success, user, or errors

Write ONLY what's needed to pass the tests. No extra features.
# app/auth.py (AI-generated)
import re
import uuid
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    id: str
    email: str

@dataclass
class RegistrationResult:
    success: bool
    user: Optional[User] = None
    errors: Optional[dict] = None

def register_user(email: str, password: str) -> RegistrationResult:
    errors = {}

    # Validate email (simplified check; full RFC 5322 is far more complex)
    if not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', email):
        errors['email'] = 'Invalid email format'

    # Validate password
    if len(password) < 8 or not re.search(r'[A-Z]', password) or not re.search(r'\d', password):
        errors['password'] = 'Password must be 8+ chars with 1 uppercase and 1 number'

    if errors:
        return RegistrationResult(success=False, errors=errors)

    user = User(id=str(uuid.uuid4()), email=email)
    return RegistrationResult(success=True, user=user)

Example: User Registration (REFACTOR)

Step 3: Clean up while keeping tests green

What to Refactor

  • Extract validation into separate functions
  • Add type hints
  • Improve error messages
  • Extract regex patterns to constants

The Rule

Tests must stay green.

If a refactor breaks a test, either:
1. The refactor is wrong, OR
2. The test was testing implementation, not behavior

Refactor this code while keeping all tests passing:
- Extract validation into `validate_email()` and `validate_password()` functions
- Add proper type hints
- Move regex patterns to module-level constants

Do NOT change the function signature or return types.
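
One possible shape for the refactored module (a sketch of what the prompt above might produce; any version that keeps the tests green is acceptable):

# app/auth.py (refactored -- illustrative)
import re
import uuid
from dataclasses import dataclass
from typing import Optional

# Regex patterns extracted to module-level constants
EMAIL_PATTERN = re.compile(r'^[\w\.-]+@[\w\.-]+\.\w+$')
UPPERCASE_PATTERN = re.compile(r'[A-Z]')
DIGIT_PATTERN = re.compile(r'\d')

@dataclass
class User:
    id: str
    email: str

@dataclass
class RegistrationResult:
    success: bool
    user: Optional[User] = None
    errors: Optional[dict] = None

def validate_email(email: str) -> Optional[str]:
    """Return an error message for an invalid email, or None if valid."""
    if not EMAIL_PATTERN.match(email):
        return 'Invalid email format'
    return None

def validate_password(password: str) -> Optional[str]:
    """Return an error message for a weak password, or None if valid."""
    if len(password) < 8 or not UPPERCASE_PATTERN.search(password) or not DIGIT_PATTERN.search(password):
        return 'Password must be 8+ chars with 1 uppercase and 1 number'
    return None

def register_user(email: str, password: str) -> RegistrationResult:
    errors = {}

    email_error = validate_email(email)
    if email_error:
        errors['email'] = email_error

    password_error = validate_password(password)
    if password_error:
        errors['password'] = password_error

    if errors:
        return RegistrationResult(success=False, errors=errors)

    return RegistrationResult(success=True, user=User(id=str(uuid.uuid4()), email=email))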

The Golden Rule: One Test at a Time

WRONG: Batch Testing

Write all tests first:
- test_valid_registration
- test_invalid_email
- test_weak_password
- test_duplicate_email
- test_email_verification

Then implement everything

Problem: Overwhelming, hard to debug, AI generates bloated code

RIGHT: Incremental

Cycle 1: test_valid_registration
  → implement → pass → refactor

Cycle 2: test_invalid_email
  → implement → pass → refactor

Cycle 3: test_weak_password
  → implement → pass → refactor

...

Each cycle is focused and verifiable

Types of Tests in TDD

Unit Tests

Test individual functions/methods in isolation

Speed: Milliseconds
Coverage: High
Mocking: Heavy

Integration Tests

Test components working together

Speed: Seconds
Coverage: Medium
Mocking: Selective

E2E Tests

Test complete user flows

Speed: Minutes
Coverage: Low
Mocking: None

Testing Pyramid: Many unit tests, fewer integration tests, even fewer E2E tests.
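
To make the mocking distinction above concrete, here is a minimal sketch (the `Mailer` class and `send_welcome_email` function are hypothetical, used only for illustration). A unit test replaces the collaborator with a mock; an integration test would exercise a real test mailer instead.

# Unit test with heavy mocking: nothing leaves the process, so it runs in milliseconds
from unittest.mock import Mock

class Mailer:
    def send(self, to: str, subject: str) -> bool:
        raise NotImplementedError  # real implementation would talk to an SMTP server

def send_welcome_email(mailer: Mailer, email: str) -> bool:
    return mailer.send(to=email, subject="Welcome!")

def test_send_welcome_email_unit():
    """Unit test: the mailer is mocked, so only our own logic is exercised."""
    mailer = Mock(spec=Mailer)
    mailer.send.return_value = True

    assert send_welcome_email(mailer, "test@example.com") is True
    mailer.send.assert_called_once_with(to="test@example.com", subject="Welcome!")

# An integration test would pass a real (test) Mailer and assert the message
# actually arrives, trading speed for confidence in the wiring.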

Using AI to Generate Tests

AI can help write tests, but YOU validate them

Generate pytest tests for this function specification:

Function: `calculate_shipping_cost(weight: float, distance: int, expedited: bool) -> float`

Requirements:
- Base rate: $5 + $0.50 per kg
- Distance multiplier: 1.0 for < 100 km, 1.5 for 100-500 km, 2.0 for > 500 km
- Expedited adds 50% to the final cost
- Minimum charge: $10
- Max weight: 50 kg (raise ValueError if exceeded)

Generate tests for:
- Normal calculation
- Each distance tier
- Expedited option
- Minimum charge
- Weight limit error
- Edge cases (0 weight, 0 distance)
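
The response might look something like the sketch below; the `shipping` module name is an assumption, and every expected value should be re-derived by hand from the requirements before you trust the tests.

# test_shipping.py -- illustrative output to validate, not a reference solution
import pytest
from shipping import calculate_shipping_cost  # hypothetical module

def test_normal_calculation():
    """20 kg over 50 km: (5 + 0.50 * 20) * 1.0 = $15.00."""
    assert calculate_shipping_cost(weight=20, distance=50, expedited=False) == 15.0

def test_mid_distance_tier():
    """20 kg over 200 km: 15 * 1.5 = $22.50."""
    assert calculate_shipping_cost(weight=20, distance=200, expedited=False) == 22.5

def test_long_distance_tier():
    """20 kg over 600 km: 15 * 2.0 = $30.00."""
    assert calculate_shipping_cost(weight=20, distance=600, expedited=False) == 30.0

def test_expedited_adds_50_percent():
    """Expedited on the mid tier: 22.50 * 1.5 = $33.75."""
    assert calculate_shipping_cost(weight=20, distance=200, expedited=True) == 33.75

def test_minimum_charge_applies():
    """1 kg over 10 km works out to $5.50, which is bumped to the $10 minimum."""
    assert calculate_shipping_cost(weight=1, distance=10, expedited=False) == 10.0

def test_weight_over_limit_raises():
    """Anything above 50 kg is rejected."""
    with pytest.raises(ValueError):
        calculate_shipping_cost(weight=51, distance=10, expedited=False)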

Common TDD Mistakes to Avoid

Testing Implementation

Tests that break when you refactor internal code. Test BEHAVIOR, not HOW it works (see the sketch below).

Tests After Code

Writing code first, then tests to match. This defeats the purpose of TDD.

Too Many Assertions

One test checking 10 things. Split into focused tests with single assertions.

Ignoring Edge Cases

Only testing happy path. AI misses edge cases unless you specify them.
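
A sketch of the behavior-vs-implementation difference, assuming the refactored `app/auth.py` sketched earlier (where the email regex was extracted into an `EMAIL_PATTERN` constant):

# Brittle: asserts on HOW register_user is implemented (an internal constant)
def test_email_pattern_constant():
    from app import auth
    assert auth.EMAIL_PATTERN.pattern == r'^[\w\.-]+@[\w\.-]+\.\w+$'

# Robust: asserts on WHAT register_user promises to do
def test_rejects_invalid_email():
    from app.auth import register_user
    result = register_user(email="not-an-email", password="SecurePass123")
    assert result.success is False
    assert "email" in result.errors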

Effective Test Prompts for AI

Test Generation

Generate pytest tests for [function].

Test cases needed:
1. [Normal case]
2. [Edge case 1]
3. [Error case]

Use fixtures for setup.
Include docstrings explaining
each test's purpose.

Implementation from Test

Here is my failing test:
[paste test code]

Implement the function to make
this test pass.

- Use only standard library
- Follow existing code style
- Add type hints
- Minimal implementation
Key phrase: "Implement the MINIMUM code to make this test pass."

Code Coverage Fundamentals

Measure what your tests actually test

Line Coverage

% of code lines executed by tests

Most common metric

Branch Coverage

% of decision branches taken (if/else)

Catches more edge cases

Function Coverage

% of functions called at least once

High-level overview

Path Coverage

% of all possible execution paths

Most thorough, rarely 100%

Key insight: Branch coverage reveals more bugs than line coverage alone.
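
A small illustration of the difference (hypothetical function, shown only to make the point): a single test can execute every line yet still skip a branch.

import pytest

def apply_discount(price: float, is_member: bool) -> float:
    total = price
    if is_member:
        total *= 0.9  # 10% member discount
    return total

def test_member_discount():
    assert apply_discount(100.0, is_member=True) == pytest.approx(90.0)

# This one test touches every line (100% line coverage), but the implicit
# "else" path (is_member=False) is never taken, so branch coverage stays
# below 100% and flags the untested case.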

Measuring Coverage

Run tests with coverage:

# Python (pytest-cov)
pytest --cov=app --cov-report=html --cov-branch

# JavaScript (Jest)
jest --coverage --collectCoverageFrom='src/**/*.js'

# Java (JaCoCo)
mvn test jacoco:report

Sample output:

Name              Stmts  Branch  Cover
---------------------------------------
app/auth.py          45      12    93%
app/models.py        22       4   100%
app/utils.py         18       8    61%
---------------------------------------
TOTAL               85      24    88%

AI Coverage Prompt

"My coverage report shows utils.py at 61% (lines 12-18, 25 uncovered). Here's the code: [paste code] Generate tests to cover the missing branches and error paths."

Coverage Targets by Code Type

Code Type        Minimum   Target   Rationale
Business Logic   80%       90%+     Core value, must be reliable
API Endpoints    70%       85%      Entry points, validation critical
Data Models      60%       75%      Getters/setters less critical
Utilities        80%       95%      Reused everywhere, bugs propagate
Error Handling   70%       85%      Recovery paths must work

Don't Chase 100%

Diminishing returns past 90%. Focus on critical paths, not trivial code.

Enforce Minimums

Configure CI/CD to fail if coverage drops below threshold.

Enforcing Coverage in CI/CD

# pytest.ini - fail under threshold
[pytest]
addopts = --cov=app --cov-fail-under=80

# GitHub Actions workflow
- name: Run tests with coverage
  run: pytest --cov=app --cov-report=xml

- name: Upload to Codecov
  uses: codecov/codecov-action@v3

- name: Coverage gate
  run: |
    coverage report --fail-under=80

Coverage Gates

  • Absolute: Total coverage > 80%
  • Delta: PR can't decrease coverage
  • New code: Changed files must have 90%+

Tools: Codecov, Coveralls, SonarQube

Best practice: Block merging PRs that reduce overall coverage.
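
As one way to wire up the delta and new-code gates listed above, a Codecov configuration along these lines is a common pattern (values are examples; confirm the options against Codecov's documentation for your setup):

# codecov.yml -- illustrative gate configuration
coverage:
  status:
    project:
      default:
        target: 80%        # absolute gate: total coverage must stay at or above 80%
        threshold: 0%      # delta gate: a PR may not reduce coverage at all
    patch:
      default:
        target: 90%        # new-code gate: changed lines need 90%+ coverage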

TDD Best Practices with AI

1. Test First, Always

Never let AI write code before you have a failing test.

2. Small Cycles

One test → one implementation. Don't batch.

3. Descriptive Names

test_user_cannot_register_with_duplicate_email, not test_reg_3

4. Run Tests Often

After every change. Catch regressions immediately.
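
One way to make "run tests often" automatic is a hook that runs the suite before every commit. A minimal sketch, assuming the pre-commit tool and a pytest project:

# .pre-commit-config.yaml -- run the test suite before each commit
repos:
  - repo: local
    hooks:
      - id: pytest
        name: run pytest
        entry: pytest
        language: system
        pass_filenames: false
        always_run: true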

If you can't write a test for it, you don't understand the requirement well enough.

Key Takeaways

RED → GREEN → REFACTOR

Tests First

Define behavior before code

One at a Time

Single test, single cycle

Behavior

Test what, not how

Coverage

80%+ on critical paths

CI/CD

Enforce coverage gates

Safety Net

Protect against AI regressions

Questions?

Test-Driven Development with AI

Next: AI Limitations & When Not to Use AI
