Test-Driven Development
Lecture 5
Write tests first, then let AI implement the code
2026 WayUp
Tests are your safety net against AI mistakes
TDD transforms AI from a code generator into a verified solution provider.
Where Test-Driven Development fits
TDD is the implementation method:
Validate the complete feature:
Write a test that fails. This defines what you want to achieve.
Write the minimum code to make the test pass. No more.
Clean up the code while keeping tests green.
1. You write the test: define expected behavior first
2. Run it (should fail): confirm the test is valid
3. AI implements: generate code to pass the test
4. Run tests (should pass): verify AI's implementation
Step 1: Write a failing test for the first requirement
# test_user_registration.py
import pytest
from app.auth import register_user


def test_register_user_with_valid_data():
    """User can register with valid email and password."""
    result = register_user(
        email="test@example.com",
        password="SecurePass123"
    )
    assert result.success is True
    assert result.user.email == "test@example.com"
    assert result.user.id is not None


def test_register_user_with_invalid_email():
    """Registration fails with invalid email format."""
    result = register_user(
        email="not-an-email",
        password="SecurePass123"
    )
    assert result.success is False
    assert "email" in result.errors
pytest test_user_registration.py → FAILS (function doesn't exist yet)
Step 2: Ask AI to implement the minimum code to pass
# app/auth.py (AI-generated)
import re
import uuid
from dataclasses import dataclass


@dataclass
class User:
    id: str
    email: str


@dataclass
class RegistrationResult:
    success: bool
    user: User = None
    errors: dict = None


def register_user(email: str, password: str) -> RegistrationResult:
    errors = {}

    # Validate email (simplified pattern, not full RFC 5322)
    if not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', email):
        errors['email'] = 'Invalid email format'

    # Validate password
    if len(password) < 8 or not re.search(r'[A-Z]', password) or not re.search(r'\d', password):
        errors['password'] = 'Password must be 8+ chars with 1 uppercase and 1 number'

    if errors:
        return RegistrationResult(success=False, errors=errors)

    user = User(id=str(uuid.uuid4()), email=email)
    return RegistrationResult(success=True, user=user)
Step 3: Clean up while keeping tests green
Tests must stay green.
If a refactor breaks a test, either:
1. The refactor is wrong, OR
2. The test was testing implementation, not behavior
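For concreteness, here is one way the Step 3 refactor could look for the Step 2 code: the validation rules move into small private helpers and a precompiled pattern, while the behavior the tests check stays identical. This is a sketch, not the only valid cleanup, and the helper names are illustrative.

# app/auth.py after the refactor: same behavior, tests still green
import re
import uuid
from dataclasses import dataclass
from typing import Optional

# Compiled once instead of inline in register_user
EMAIL_PATTERN = re.compile(r'^[\w\.-]+@[\w\.-]+\.\w+$')


@dataclass
class User:
    id: str
    email: str


@dataclass
class RegistrationResult:
    success: bool
    user: Optional[User] = None
    errors: Optional[dict] = None


def _validate_email(email: str, errors: dict) -> None:
    # Same rule as before, just extracted into a named helper
    if not EMAIL_PATTERN.match(email):
        errors['email'] = 'Invalid email format'


def _validate_password(password: str, errors: dict) -> None:
    if (len(password) < 8
            or not re.search(r'[A-Z]', password)
            or not re.search(r'\d', password)):
        errors['password'] = 'Password must be 8+ chars with 1 uppercase and 1 number'


def register_user(email: str, password: str) -> RegistrationResult:
    errors: dict = {}
    _validate_email(email, errors)
    _validate_password(password, errors)
    if errors:
        return RegistrationResult(success=False, errors=errors)
    return RegistrationResult(success=True, user=User(id=str(uuid.uuid4()), email=email))

If a change like this broke a test, that would point to either a wrong refactor or a test coupled to implementation details, which is exactly the distinction above.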
Write all tests first:
- test_valid_registration
- test_invalid_email
- test_weak_password
- test_duplicate_email
- test_email_verification
Then implement everything
Problem: Overwhelming, hard to debug, AI generates bloated code
Cycle 1: test_valid_registration → implement → pass → refactor
Cycle 2: test_invalid_email → implement → pass → refactor
Cycle 3: test_weak_password → implement → pass → refactor
...
Each cycle is focused and verifiable
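As an illustration, a later cycle (test_duplicate_email from the list above) might look like the sketch below, building on the Step 2 module. The in-memory _registered_emails set is an assumption made to keep the example small; a real implementation would check a database or user repository.

# New failing test first:
def test_register_user_with_duplicate_email():
    """Registration fails if the email is already registered."""
    register_user(email="taken@example.com", password="SecurePass123")
    result = register_user(email="taken@example.com", password="SecurePass123")
    assert result.success is False
    assert "email" in result.errors


# ...then the minimal change in app/auth.py to make it pass
# (User, RegistrationResult and the existing validation stay as before).
_registered_emails = set()


def register_user(email: str, password: str) -> RegistrationResult:
    errors = {}
    if email in _registered_emails:
        errors['email'] = 'Email already registered'
    # ... existing email and password validation unchanged ...
    if errors:
        return RegistrationResult(success=False, errors=errors)
    _registered_emails.add(email)
    return RegistrationResult(success=True, user=User(id=str(uuid.uuid4()), email=email))

Only one new behavior enters the code per cycle, so both the diff and the failure surface stay small.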
Unit tests
Test individual functions/methods in isolation (see the sketch after this list)
Speed: Milliseconds
Coverage: High
Mocking: Heavy
Integration tests
Test components working together
Speed: Seconds
Coverage: Medium
Mocking: Selective
End-to-end (E2E) tests
Test complete user flows
Speed: Minutes
Coverage: Low
Mocking: None
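To make the mocking contrast concrete, here is a hedged sketch: a unit test that isolates register_user behind a mocked email dependency, next to an integration-style test that lets the real pieces run together. Both send_welcome_email and login are assumed functions used only for illustration; they are not part of the code shown earlier.

# Unit test: isolate register_user and mock its (assumed) email dependency
from unittest.mock import patch

from app.auth import register_user
from app.auth import login  # assumed to exist, for the integration sketch below


def test_registration_sends_welcome_email_unit():
    # Assumes app.auth calls a send_welcome_email helper -- illustrative only
    with patch("app.auth.send_welcome_email") as mock_send:
        result = register_user(email="test@example.com", password="SecurePass123")
    assert result.success is True
    mock_send.assert_called_once_with("test@example.com")


# Integration test: no mocks, the real components run together
def test_registration_then_login_integration():
    register_user(email="test@example.com", password="SecurePass123")
    assert login(email="test@example.com", password="SecurePass123").success is True

Mocking keeps the unit test in the millisecond range; the integration test trades speed for confidence that the pieces actually fit together.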
AI can help write tests, but YOU validate them
Tests that break when you refactor internal code. Test BEHAVIOR, not HOW it works.
Writing code first, then tests to match. This defeats the purpose of TDD.
One test checking 10 things. Split into focused tests with single assertions.
Only testing happy path. AI misses edge cases unless you specify them.
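One way to counter the happy-path and "one test checking 10 things" anti-patterns is to enumerate edge cases explicitly and keep each assertion focused, for example with pytest.mark.parametrize. The cases below are a sketch based on the registration rules from earlier in this lecture.

import pytest
from app.auth import register_user


# Each case runs as its own focused test with a single clear point of failure
@pytest.mark.parametrize("email, password, expected_error", [
    ("not-an-email", "SecurePass123", "email"),          # malformed email
    ("", "SecurePass123", "email"),                      # empty email
    ("test@example.com", "Shor7t", "password"),          # too short
    ("test@example.com", "alllowercase1", "password"),   # no uppercase letter
    ("test@example.com", "NoDigitsHere", "password"),    # no number
])
def test_registration_rejects_invalid_input(email, password, expected_error):
    result = register_user(email=email, password=password)
    assert result.success is False
    assert expected_error in result.errors

Each parameter set shows up as its own entry in the pytest report, so a failure points at exactly one rule.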
Generate pytest tests for [function].
Test cases needed:
1. [Normal case]
2. [Edge case 1]
3. [Error case]
Use fixtures for setup.
Include docstrings explaining each test's purpose.
Here is my failing test:
[paste test code]
Implement the function to make this test pass.
- Use only standard library
- Follow existing code style
- Add type hints
- Minimal implementation
Measure what your tests actually test
Line coverage: % of code lines executed by tests
Most common metric
Branch coverage: % of decision branches taken (if/else)
Catches more edge cases (see the sketch after this list)
Function coverage: % of functions called at least once
High-level overview
Path coverage: % of all possible execution paths
Most thorough, rarely 100%
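A small sketch of why branch coverage catches more than line coverage: apply_discount below is a hypothetical function, and the single test executes every one of its lines, yet the "not a member" branch is never taken, so a run with --cov-branch still reports a gap.

def apply_discount(price: float, is_member: bool) -> float:
    # Hypothetical function, used only to illustrate the coverage gap
    total = price
    if is_member:
        total -= 10.0  # flat member discount
    return total


def test_member_gets_discount():
    # This one test executes every line above (100% line coverage),
    # but the is_member=False path is never exercised, so branch
    # coverage reports an uncovered branch.
    assert apply_discount(100.0, is_member=True) == 90.0

Adding a second case with is_member=False closes the gap.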
Run tests with coverage:
# Python (pytest-cov)
pytest --cov=app --cov-report=html --cov-branch

# JavaScript (Jest)
jest --coverage --collectCoverageFrom='src/**/*.js'

# Java (JaCoCo)
mvn test jacoco:report
Sample output:
Name            Stmts   Branch   Cover
---------------------------------------
app/auth.py        45       12     93%
app/models.py      22        4    100%
app/utils.py       18        8     61%
---------------------------------------
TOTAL              85       24     88%
"My coverage report shows utils.py at 61% (lines 12-18, 25 uncovered). Here's the code: [paste code] Generate tests to cover the missing branches and error paths."
| Code Type | Minimum | Target | Rationale |
|---|---|---|---|
| Business Logic | 80% | 90%+ | Core value, must be reliable |
| API Endpoints | 70% | 85% | Entry points, validation critical |
| Data Models | 60% | 75% | Getters/setters less critical |
| Utilities | 80% | 95% | Reused everywhere, bugs propagate |
| Error Handling | 70% | 85% | Recovery paths must work |
Diminishing returns past 90%. Focus on critical paths, not trivial code.
Configure CI/CD to fail if coverage drops below threshold.
# pytest.ini - fail under threshold
[pytest]
addopts = --cov=app --cov-fail-under=80

# GitHub Actions workflow
- name: Run tests with coverage
  run: pytest --cov=app --cov-report=xml
- name: Upload to Codecov
  uses: codecov/codecov-action@v3
- name: Coverage gate
  run: |
    coverage report --fail-under=80
Tools: Codecov, Coveralls, SonarQube
Never let AI write code before you have a failing test.
One test → one implementation. Don't batch.
Use descriptive names: test_user_cannot_register_with_duplicate_email, not test_reg_3.
Run the full suite after every change. Catch regressions immediately.
If you can't write a test for it, you don't understand the requirement well enough.
Tests First: define behavior before code
One at a Time: single test, single cycle
Behavior: test what, not how
Coverage: 80%+ on critical paths
CI/CD: enforce coverage gates
Safety Net: protect against AI regressions
Test-Driven Development with AI
Next: AI Limitations & When Not to Use AI
2026 WayUp - way-up.io