Model Deployment
Deploy your trained computer vision model as a production-ready API service
Objectives
By the end of this practical work, you will be able to:
- Build a FastAPI inference service for your trained model
- Containerize your application with Docker
- Deploy to cloud platforms (optional)
Prerequisites
- Completed Practical Work 5 with a trained model
- Python 3.9+ installed
- Docker installed and running
- Basic understanding of REST APIs
Install required packages:
pip install fastapi uvicorn pillow torch torchvision python-multipart
Note: If you used TensorFlow in previous sessions, install tensorflow instead of torch and torchvision.
Instructions
Step 1: Save Your Trained Model
First, ensure your trained model from the previous session is properly saved. If you haven't saved it yet, use the following code:
import torch
# Save the entire model (PyTorch)
torch.save(model, 'model.pth') # (#1:Save complete model with architecture)
# Or save just the state dict (recommended)
torch.save(model.state_dict(), 'model_weights.pth') # (#2:Save only the weights)
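Optionally, confirm the checkpoint was written correctly by reloading it right away. A quick sanity check, assuming model is still in scope from training:
# Reload the saved weights and verify they fit the architecture
state_dict = torch.load('model_weights.pth', map_location='cpu')
model.load_state_dict(state_dict)  # raises RuntimeError on key/shape mismatch
print(f"Reloaded {len(state_dict)} parameter tensors")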
For TensorFlow/Keras:
import tensorflow as tf
# Save the model
model.save('model.keras') # (#1:Save model in Keras format)
Step 2: Create Project Structure
Create the following project structure for your deployment:
- cv-api/
  - app/
    - __init__.py
    - main.py
    - preprocessing.py
    - model.py
  - models/
    - model.pth
  - Dockerfile
  - requirements.txt
  - .dockerignore
# Create project structure
mkdir -p cv-api/app cv-api/models
touch cv-api/app/__init__.py
touch cv-api/app/main.py cv-api/app/preprocessing.py cv-api/app/model.py
touch cv-api/Dockerfile cv-api/requirements.txt cv-api/.dockerignore
# Copy your trained model
cp model.pth cv-api/models/
Step 3: Write Preprocessing Utility
Create app/preprocessing.py with image preprocessing functions:
from PIL import Image
import torch
from torchvision import transforms
import io
# Define the same transforms used during training
transform = transforms.Compose([ # (#1:Match training transforms)
    transforms.Resize((224, 224)), # (#2:Resize to model input size)
    transforms.ToTensor(), # (#3:Convert to tensor)
    transforms.Normalize( # (#4:Normalize with ImageNet stats)
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
def preprocess_image(image_bytes: bytes) -> torch.Tensor:
    """
    Preprocess image bytes for model inference.

    Args:
        image_bytes: Raw image bytes from upload

    Returns:
        Preprocessed tensor ready for model
    """
    image = Image.open(io.BytesIO(image_bytes)) # (#5:Load image from bytes)

    # Convert to RGB if necessary
    if image.mode != 'RGB': # (#6:Handle grayscale/RGBA images)
        image = image.convert('RGB')

    # Apply transforms and add batch dimension
    tensor = transform(image) # (#7:Apply preprocessing)
    tensor = tensor.unsqueeze(0) # (#8:Add batch dimension [1, C, H, W])
    return tensor
flowchart TB
A["Image Bytes"] --> B["PIL Image.open()"]
B --> C{"RGB?"}
C -->|No| D["Convert to RGB"]
C -->|Yes| E["Resize 224x224"]
D --> E
E --> F["ToTensor()"]
F --> G["Normalize"]
G --> H["unsqueeze(0)"]
H --> I["Ready for Model"]
style A fill:#e8f4fc,stroke:#3498db
style I fill:#e8fcf4,stroke:#27ae60
style C fill:#fcf4e8,stroke:#e67e22
Note: Preprocessing must match the transforms used during training.
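To sanity-check the preprocessing in isolation before wiring it into the API, you can run it on a local file. A minimal sketch, assuming you run it from the cv-api/ directory and have a sample image at test_image.jpg (adjust the path as needed):
# Quick check of the preprocessing pipeline
from app.preprocessing import preprocess_image

with open("test_image.jpg", "rb") as f:
    tensor = preprocess_image(f.read())
print(tensor.shape)  # expected: torch.Size([1, 3, 224, 224])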
Step 4: Create Model Loading Utility
Create app/model.py to handle model loading:
import torch
import torch.nn as nn
from torchvision import models
from pathlib import Path
# Define your class labels (adjust based on your dataset)
CLASS_LABELS = ['cat', 'dog'] # (#1:Update with your actual classes)
def load_model(model_path: str = "models/model.pth") -> nn.Module:
    """
    Load the trained model from disk.

    Args:
        model_path: Path to the saved model weights

    Returns:
        Loaded model in evaluation mode
    """
    # Recreate the model architecture
    model = models.resnet18(weights=None) # (#2:Same architecture as training)
    num_classes = len(CLASS_LABELS)
    model.fc = nn.Linear(model.fc.in_features, num_classes) # (#3:Adjust final layer)

    # Load trained weights
    model_path = Path(model_path)
    if model_path.exists():
        state_dict = torch.load(model_path, map_location='cpu') # (#4:Load on CPU)
        model.load_state_dict(state_dict)
    else:
        raise FileNotFoundError(f"Model not found at {model_path}")

    model.eval() # (#5:Set to evaluation mode)
    return model
def predict(model: nn.Module, tensor: torch.Tensor) -> dict:
    """
    Run inference on preprocessed tensor.

    Args:
        model: Loaded PyTorch model
        tensor: Preprocessed image tensor

    Returns:
        Dictionary with prediction results
    """
    with torch.no_grad(): # (#6:Disable gradient computation)
        outputs = model(tensor)
        probabilities = torch.softmax(outputs, dim=1) # (#7:Convert to probabilities)
        confidence, predicted_idx = torch.max(probabilities, 1) # (#8:Get top prediction)

    return {
        "class": CLASS_LABELS[predicted_idx.item()],
        "confidence": round(confidence.item(), 4),
        "probabilities": { # (#9:Return all class probabilities)
            label: round(prob, 4)
            for label, prob in zip(CLASS_LABELS, probabilities[0].tolist())
        }
    }
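Before building the API around these utilities, a quick end-to-end smoke test catches path and architecture mismatches early. A sketch, assuming the project layout above, a copied model at models/model.pth, and a sample image at test_image.jpg:
# smoke_test.py — run from the cv-api/ directory
from app.model import load_model, predict
from app.preprocessing import preprocess_image

model = load_model()  # reads models/model.pth by default
with open("test_image.jpg", "rb") as f:
    tensor = preprocess_image(f.read())
print(predict(model, tensor))  # e.g. {'class': 'cat', 'confidence': 0.95, ...}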
Step 5: Create FastAPI Application
Create app/main.py with the API endpoints:
from fastapi import FastAPI, File, UploadFile, HTTPException # (#1:Import FastAPI components)
from fastapi.responses import JSONResponse
import logging
from .preprocessing import preprocess_image # (#2:Import preprocessing)
from .model import load_model, predict, CLASS_LABELS # (#3:Import model utilities)
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Initialize FastAPI app
app = FastAPI(
    title="Computer Vision API", # (#4:API metadata)
    description="Image classification API using trained deep learning model",
    version="1.0.0"
)
# Load model at startup
model = None # (#5:Global model variable)
@app.on_event("startup") # (#6:Load model on startup)
async def startup_event():
    global model
    logger.info("Loading model...")
    model = load_model()
    logger.info("Model loaded successfully!")
@app.get("/health") # (#7:Health check endpoint)
async def health_check():
"""
Health check endpoint for monitoring.
Returns status and model availability.
"""
return {
"status": "healthy",
"model_loaded": model is not None,
"classes": CLASS_LABELS
}
@app.post("/predict") # (#8:Prediction endpoint)
async def predict_image(file: UploadFile = File(...)): # (#9:Accept file upload)
"""
Predict the class of an uploaded image.
- **file**: Image file (JPEG, PNG, etc.)
Returns predicted class and confidence score.
"""
# Validate file type
if not file.content_type.startswith("image/"): # (#10:Validate content type)
raise HTTPException(
status_code=400,
detail="File must be an image"
)
try:
# Read image bytes
image_bytes = await file.read() # (#11:Read uploaded file)
# Preprocess
tensor = preprocess_image(image_bytes) # (#12:Preprocess image)
# Predict
result = predict(model, tensor) # (#13:Run inference)
return JSONResponse(content={
"filename": file.filename,
"prediction": result
})
except Exception as e:
logger.error(f"Prediction error: {str(e)}")
raise HTTPException(
status_code=500,
detail=f"Prediction failed: {str(e)}"
)
@app.get("/") # (#14:Root endpoint)
async def root():
return {
"message": "Computer Vision API",
"docs": "/docs",
"health": "/health",
"predict": "/predict"
}
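You can also exercise the endpoints without starting a server by using FastAPI's TestClient. A minimal sketch (TestClient relies on the httpx package, which may need to be installed separately):
# test_api.py — run with pytest from the cv-api/ directory
from fastapi.testclient import TestClient
from app.main import app

def test_health():
    # Entering the context manager runs the startup event, so the model loads first
    with TestClient(app) as client:
        response = client.get("/health")
        assert response.status_code == 200
        assert response.json()["model_loaded"] is True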
Step 6: Test Locally with Uvicorn
Create requirements.txt:
fastapi==0.109.0
uvicorn[standard]==0.27.0
python-multipart==0.0.6
pillow==10.2.0
torch==2.2.0
torchvision==0.17.0
Install dependencies and run the server:
# Navigate to project directory
cd cv-api
# Install dependencies
pip install -r requirements.txt
# Run the server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Success: Your API is now running at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.
flowchart LR
A["Client"] -->|"POST /predict"| B["Uvicorn\n:8000"]
B --> C["FastAPI\nRouter"]
C --> D["Preprocessing"]
D -->|"Tensor"| E["Model\nInference"]
E --> F["JSON\nResponse"]
F -->|"prediction"| A
style A fill:#636e72,color:#fff
style B fill:#00b894,color:#fff
style C fill:#0984e3,color:#fff
style D fill:#e17055,color:#fff
style E fill:#6c5ce7,color:#fff
style F fill:#00b894,color:#fff
Requests flow through the FastAPI application stack for image classification.
Step 7: Test with Curl and Sample Images
Test your API endpoints using curl:
# Test health endpoint
curl http://localhost:8000/health
# Test prediction with an image
curl -X POST "http://localhost:8000/predict" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@test_image.jpg"
Expected response from /health:
{
  "status": "healthy",
  "model_loaded": true,
  "classes": ["cat", "dog"]
}
Expected response from /predict:
{
  "filename": "test_image.jpg",
  "prediction": {
    "class": "cat",
    "confidence": 0.9523,
    "probabilities": {
      "cat": 0.9523,
      "dog": 0.0477
    }
  }
}
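If you prefer Python to curl, the same prediction request can be sent with the requests library (an assumption: requests is installed and the server is running on localhost:8000):
import requests

# Upload a sample image to the prediction endpoint
with open("test_image.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8000/predict",
        files={"file": ("test_image.jpg", f, "image/jpeg")},
    )
response.raise_for_status()
print(response.json())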
flowchart TB
subgraph API["FastAPI Application"]
A["GET /"] --> B["API Info"]
C["GET /health"] --> D["Status + Model Info"]
E["POST /predict"] --> F["Image Classification"]
end
F --> G["Preprocess"]
G --> H["Model"]
H --> I["JSON Response"]
style A fill:#74b9ff,color:#fff
style C fill:#27ae60,color:#fff
style E fill:#3498db,color:#fff
style I fill:#00b894,color:#fff
Three endpoints: root for API info, health for monitoring, predict for inference.
Step 8: Write Dockerfile
Create the Dockerfile:
FROM python:3.10-slim # (#1:Use slim Python image)
WORKDIR /app # (#2:Set working directory)
# Install system dependencies
RUN apt-get update && apt-get install -y \ # (#3:Install system packages; curl is needed for the HEALTHCHECK)
    libgl1 \
    libglib2.0-0 \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Copy requirements first for better caching
COPY requirements.txt . # (#4:Copy requirements)
RUN pip install --no-cache-dir -r requirements.txt # (#5:Install Python deps)
# Copy application code
COPY app/ ./app/ # (#6:Copy app code)
COPY models/ ./models/ # (#7:Copy trained model)
# Expose port
EXPOSE 8000 # (#8:Expose API port)
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s \ # (#9:Configure health check)
    CMD curl -f http://localhost:8000/health || exit 1
# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"] # (#10:Start server)
Create .dockerignore:
__pycache__
*.pyc
*.pyo
.git
.gitignore
.env
*.md
.pytest_cache
.coverage
venv/
.venv/
Step 9: Build Docker Image
Build the Docker image:
# Build the image
docker build -t cv-api:latest .
# List images to verify
docker images | grep cv-api
Note: The first build may take several minutes as it downloads the base image and installs PyTorch/TensorFlow.
Step 10: Run Container and Test
Run the Docker container:
# Run container
docker run -d --name cv-api -p 8000:8000 cv-api:latest # (#1:Run in detached mode)
# Check container logs
docker logs cv-api # (#2:View startup logs)
# Test the endpoints
curl http://localhost:8000/health
# Test prediction
curl -X POST "http://localhost:8000/predict" \
-F "file=@test_image.jpg"
# Stop and remove container when done
docker stop cv-api && docker rm cv-api
Success: Your containerized API is now running and ready for deployment!
flowchart TB
subgraph Build["Docker Build"]
A["python:3.10-slim"] --> B["Install Dependencies"]
B --> C["Copy app/"]
C --> D["Copy models/"]
D --> E["cv-api:latest"]
end
subgraph Run["Docker Run"]
E --> F["Container"]
F -->|":8000"| G["Host Port"]
end
style A fill:#0db7ed,color:#fff
style E fill:#384d54,color:#fff
style F fill:#27ae60,color:#fff
style G fill:#e67e22,color:#fff
Note: Docker image size varies (~2-4GB with PyTorch). First build may take several minutes.
Step 11: Push to Docker Hub (Optional)
Share your image via Docker Hub:
# Login to Docker Hub
docker login
# Tag your image (replace 'yourusername')
docker tag cv-api:latest yourusername/cv-api:latest
# Push to Docker Hub
docker push yourusername/cv-api:latest
Step 12: Deploy to Cloud (Optional)
Choose a cloud platform for deployment:
Option A: Render
# Create render.yaml in project root
# Then connect your GitHub repo to Render
# render.yaml
services:
  - type: web
    name: cv-api
    env: docker
    dockerfilePath: ./Dockerfile
    healthCheckPath: /health
Option B: Railway
# Install Railway CLI
npm install -g @railway/cli
# Login and deploy
railway login
railway init
railway up
Option C: HuggingFace Spaces
- Create a new Space at huggingface.co/spaces
- Select "Docker" as the SDK
- Push your code to the Space repository
Note: HuggingFace Spaces offers free CPU hosting for ML demos (GPU hardware is available as a paid upgrade), making it a convenient way to share computer vision models.
flowchart TB
A["Docker Image"] --> B{"Deployment Target"}
B --> C["Render"]
B --> D["Railway"]
B --> E["HuggingFace Spaces"]
B --> F["Docker Hub"]
C --> G["render.yaml"]
D --> H["railway up"]
E --> I["Dockerfile SDK"]
F --> J["docker push"]
style A fill:#384d54,color:#fff
style B fill:#e67e22,color:#fff
style C fill:#46b1c9,color:#fff
style D fill:#0b0d0e,color:#fff
style E fill:#ff9d00,color:#fff
style F fill:#0db7ed,color:#fff
Choose a platform based on your needs: free tier, GPU support, or enterprise features.
Expected Output
After completing this practical work, you should have:
- A working REST API that accepts image uploads
- Two endpoints: /health for monitoring and /predict for inference
- JSON responses containing predicted class and confidence scores
- A Dockerized application ready for deployment
- Interactive API documentation at /docs
Verification: Your API should respond to health checks and return accurate predictions for test images from your training dataset.
Deliverables
- Source Code: Complete app/ directory with all Python files
- Dockerfile: Working Dockerfile and .dockerignore
- Requirements: requirements.txt with all dependencies
- API Documentation: Screenshots of the /docs page
- Test Results: Screenshots or logs showing successful predictions
- Deployment URL: (If deployed) Live URL to your running API
Bonus Challenges
- Batch Prediction Endpoint: Add a /predict/batch endpoint that accepts multiple images and returns predictions for all of them in a single request (see the example below)
- Model Versioning: Implement model versioning with a /models endpoint that lists available model versions and allows switching between them
- Prometheus Metrics: Add Prometheus metrics using prometheus-fastapi-instrumentator for monitoring request latency, throughput, and error rates
- Authentication: Add API key authentication using FastAPI's security utilities (see the sketch at the end of this section)
- Rate Limiting: Implement rate limiting to prevent API abuse
Example batch prediction endpoint:
from typing import List

@app.post("/predict/batch")
async def predict_batch(files: List[UploadFile] = File(...)): # (#1:Accept multiple files)
    """Predict classes for multiple images."""
    results = []
    for file in files:
        image_bytes = await file.read()
        tensor = preprocess_image(image_bytes)
        result = predict(model, tensor)
        results.append({
            "filename": file.filename,
            "prediction": result
        })
    return {"predictions": results}
sequenceDiagram
participant C as Client
participant A as FastAPI
participant P as Preprocessor
participant M as Model
C->>A: POST /predict/batch (files[])
loop For each image
A->>P: preprocess(image)
P->>M: tensor
M->>A: prediction
end
A->>C: JSON predictions[]
Batch endpoint processes multiple images in a single request for efficiency.
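For the Authentication bonus challenge, here is a minimal API-key sketch built on FastAPI's security utilities. The header name and key value are hypothetical; in a real deployment the key should come from an environment variable or secret store:
from fastapi import Depends, HTTPException, Security
from fastapi.security import APIKeyHeader

API_KEY = "change-me"  # hypothetical; load from an environment variable in practice
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)  # hypothetical header name

async def require_api_key(api_key: str = Security(api_key_header)):
    """Reject requests that do not carry the expected API key."""
    if api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

# Attach the dependency to any route that should be protected, for example:
# @app.post("/predict", dependencies=[Depends(require_api_key)])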