Model Deployment
Deploy your trained computer vision model as a production-ready API service
Objectives
By the end of this practical work, you will be able to:
- Build a FastAPI inference service for your trained model
- Containerize your application with Docker
- Deploy to cloud platforms (optional)
Prerequisites
- Completed Practical Work 5 with a trained model
- Python 3.9+ installed
- Docker installed and running
- Basic understanding of REST APIs
Install required packages:
pip install fastapi uvicorn pillow torch torchvision python-multipart
Note: If you used TensorFlow in previous sessions, install tensorflow instead of torch and torchvision.
Instructions
Step 1: Save Your Trained Model
First, ensure your trained model from the previous session is properly saved. If you haven't saved it yet, use the following code:
import torch
# Save the entire model (PyTorch)
torch.save(model, 'model.pth') # (#1:Save complete model with architecture)
# Or save just the state dict (recommended)
torch.save(model.state_dict(), 'model_weights.pth') # (#2:Save only the weights)
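Optionally, confirm the checkpoint was written correctly by reloading it right away. A quick sanity check, assuming model is still in scope from training:
# Reload the saved weights and verify they fit the architecture
state_dict = torch.load('model_weights.pth', map_location='cpu')
model.load_state_dict(state_dict)  # raises RuntimeError on key/shape mismatch
print(f"Reloaded {len(state_dict)} parameter tensors")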
For TensorFlow/Keras:
import tensorflow as tf
# Save the model
model.save('model.keras') # (#1:Save model in Keras format)
Step 2: Create Project Structure
Create the following project structure for your deployment:
- cv-api/
  - app/
    - __init__.py
    - main.py
    - preprocessing.py
    - model.py
  - models/
    - model.pth
  - Dockerfile
  - requirements.txt
  - .dockerignore
# Create project structure
mkdir -p cv-api/app cv-api/models
touch cv-api/app/__init__.py
touch cv-api/app/main.py cv-api/app/preprocessing.py cv-api/app/model.py
touch cv-api/Dockerfile cv-api/requirements.txt cv-api/.dockerignore
# Copy your trained model
cp model.pth cv-api/models/
Step 3: Write Preprocessing Utility
Create app/preprocessing.py with image preprocessing functions:
from PIL import Image
import torch
from torchvision import transforms
import io
# Define the same transforms used during training
transform = transforms.Compose([ # (#1:Match training transforms)
    transforms.Resize((224, 224)), # (#2:Resize to model input size)
    transforms.ToTensor(), # (#3:Convert to tensor)
    transforms.Normalize( # (#4:Normalize with ImageNet stats)
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])
def preprocess_image(image_bytes: bytes) -> torch.Tensor:
    """
    Preprocess image bytes for model inference.

    Args:
        image_bytes: Raw image bytes from upload

    Returns:
        Preprocessed tensor ready for model
    """
    image = Image.open(io.BytesIO(image_bytes)) # (#5:Load image from bytes)

    # Convert to RGB if necessary
    if image.mode != 'RGB': # (#6:Handle grayscale/RGBA images)
        image = image.convert('RGB')

    # Apply transforms and add batch dimension
    tensor = transform(image) # (#7:Apply preprocessing)
    tensor = tensor.unsqueeze(0) # (#8:Add batch dimension [1, C, H, W])
    return tensor
flowchart TB
A["Image Bytes"] --> B["PIL Image.open()"]
B --> C{"RGB?"}
C -->|No| D["Convert to RGB"]
C -->|Yes| E["Resize 224x224"]
D --> E
E --> F["ToTensor()"]
F --> G["Normalize"]
G --> H["unsqueeze(0)"]
H --> I["Ready for Model"]
style A fill:#e8f4fc,stroke:#3498db
style I fill:#e8fcf4,stroke:#27ae60
style C fill:#fcf4e8,stroke:#e67e22
Note: Preprocessing must match the transforms used during training.
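To sanity-check the preprocessing in isolation before wiring it into the API, you can run it on a local file. A minimal sketch, assuming you run it from the cv-api/ directory and have a sample image at test_image.jpg (adjust the path as needed):
# Quick check of the preprocessing pipeline
from app.preprocessing import preprocess_image

with open("test_image.jpg", "rb") as f:
    tensor = preprocess_image(f.read())
print(tensor.shape)  # expected: torch.Size([1, 3, 224, 224])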
Step 4: Create Model Loading Utility
Create app/model.py to handle model loading:
import torch
import torch.nn as nn
from torchvision import models
from pathlib import Path
# Define your class labels (adjust based on your dataset)
CLASS_LABELS = ['cat', 'dog'] # (#1:Update with your actual classes)
def load_model(model_path: str = "models/model.pth") -> nn.Module:
    """
    Load the trained model from disk.

    Args:
        model_path: Path to the saved model weights

    Returns:
        Loaded model in evaluation mode
    """
    # Recreate the model architecture
    model = models.resnet18(weights=None) # (#2:Same architecture as training)
    num_classes = len(CLASS_LABELS)
    model.fc = nn.Linear(model.fc.in_features, num_classes) # (#3:Adjust final layer)

    # Load trained weights
    model_path = Path(model_path)
    if model_path.exists():
        state_dict = torch.load(model_path, map_location='cpu') # (#4:Load on CPU)
        model.load_state_dict(state_dict)
    else:
        raise FileNotFoundError(f"Model not found at {model_path}")

    model.eval() # (#5:Set to evaluation mode)
    return model
def predict(model: nn.Module, tensor: torch.Tensor) -> dict:
    """
    Run inference on preprocessed tensor.

    Args:
        model: Loaded PyTorch model
        tensor: Preprocessed image tensor

    Returns:
        Dictionary with prediction results
    """
    with torch.no_grad(): # (#6:Disable gradient computation)
        outputs = model(tensor)
        probabilities = torch.softmax(outputs, dim=1) # (#7:Convert to probabilities)
        confidence, predicted_idx = torch.max(probabilities, 1) # (#8:Get top prediction)

    return {
        "class": CLASS_LABELS[predicted_idx.item()],
        "confidence": round(confidence.item(), 4),
        "probabilities": { # (#9:Return all class probabilities)
            label: round(prob, 4)
            for label, prob in zip(CLASS_LABELS, probabilities[0].tolist())
        }
    }
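Before building the API around these utilities, a quick end-to-end smoke test catches path and architecture mismatches early. A sketch, assuming the project layout above, a copied model at models/model.pth, and a sample image at test_image.jpg:
# smoke_test.py — run from the cv-api/ directory
from app.model import load_model, predict
from app.preprocessing import preprocess_image

model = load_model()  # reads models/model.pth by default
with open("test_image.jpg", "rb") as f:
    tensor = preprocess_image(f.read())
print(predict(model, tensor))  # e.g. {'class': 'cat', 'confidence': 0.95, ...}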
Step 5: Create FastAPI Application
Create app/main.py with the API endpoints:
from fastapi import FastAPI, File, UploadFile, HTTPException # (#1:Import FastAPI components)
from fastapi.responses import JSONResponse
import logging
from .preprocessing import preprocess_image # (#2:Import preprocessing)
from .model import load_model, predict, CLASS_LABELS # (#3:Import model utilities)
# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# Initialize FastAPI app
app = FastAPI(
    title="Computer Vision API", # (#4:API metadata)
    description="Image classification API using trained deep learning model",
    version="1.0.0"
)
# Load model at startup
model = None # (#5:Global model variable)
@app.on_event("startup") # (#6:Load model on startup)
async def startup_event():
    global model
    logger.info("Loading model...")
    model = load_model()
    logger.info("Model loaded successfully!")
@app.get("/health") # (#7:Health check endpoint)
async def health_check():
"""
Health check endpoint for monitoring.
Returns status and model availability.
"""
return {
"status": "healthy",
"model_loaded": model is not None,
"classes": CLASS_LABELS
}
@app.post("/predict") # (#8:Prediction endpoint)
async def predict_image(file: UploadFile = File(...)): # (#9:Accept file upload)
"""
Predict the class of an uploaded image.
- **file**: Image file (JPEG, PNG, etc.)
Returns predicted class and confidence score.
"""
# Validate file type
if not file.content_type.startswith("image/"): # (#10:Validate content type)
raise HTTPException(
status_code=400,
detail="File must be an image"
)
try:
# Read image bytes
image_bytes = await file.read() # (#11:Read uploaded file)
# Preprocess
tensor = preprocess_image(image_bytes) # (#12:Preprocess image)
# Predict
result = predict(model, tensor) # (#13:Run inference)
return JSONResponse(content={
"filename": file.filename,
"prediction": result
})
except Exception as e:
logger.error(f"Prediction error: {str(e)}")
raise HTTPException(
status_code=500,
detail=f"Prediction failed: {str(e)}"
)
@app.get("/") # (#14:Root endpoint)
async def root():
return {
"message": "Computer Vision API",
"docs": "/docs",
"health": "/health",
"predict": "/predict"
}
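You can also exercise the endpoints without starting a server by using FastAPI's TestClient. A minimal sketch (TestClient relies on the httpx package, which may need to be installed separately):
# test_api.py — run with pytest from the cv-api/ directory
from fastapi.testclient import TestClient
from app.main import app

def test_health():
    # Entering the context manager runs the startup event, so the model loads first
    with TestClient(app) as client:
        response = client.get("/health")
        assert response.status_code == 200
        assert response.json()["model_loaded"] is True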
Step 6: Test Locally with Uvicorn
Create requirements.txt:
fastapi==0.109.0
uvicorn[standard]==0.27.0
python-multipart==0.0.6
pillow==10.2.0
torch==2.2.0
torchvision==0.17.0
Install dependencies and run the server:
# Navigate to project directory
cd cv-api
# Install dependencies
pip install -r requirements.txt
# Run the server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
Success: Your API is now running at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.
flowchart LR
A["Client"] -->|"POST /predict"| B["Uvicorn\n:8000"]
B --> C["FastAPI\nRouter"]
C --> D["Preprocessing"]
D -->|"Tensor"| E["Model\nInference"]
E --> F["JSON\nResponse"]
F -->|"prediction"| A
style A fill:#636e72,color:#fff
style B fill:#00b894,color:#fff
style C fill:#0984e3,color:#fff
style D fill:#e17055,color:#fff
style E fill:#6c5ce7,color:#fff
style F fill:#00b894,color:#fff
Requests flow through the FastAPI application stack for image classification.
Step 7: Test with Curl and Sample Images
Test your API endpoints using curl:
# Test health endpoint
curl http://localhost:8000/health
# Test prediction with an image
curl -X POST "http://localhost:8000/predict" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@test_image.jpg"
Expected response from /health:
{
  "status": "healthy",
  "model_loaded": true,
  "classes": ["cat", "dog"]
}
Expected response from /predict:
{
  "filename": "test_image.jpg",
  "prediction": {
    "class": "cat",
    "confidence": 0.9523,
    "probabilities": {
      "cat": 0.9523,
      "dog": 0.0477
    }
  }
}
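If you prefer Python to curl, the same prediction request can be sent with the requests library (an assumption: requests is installed and the server is running on localhost:8000):
import requests

# Upload a sample image to the prediction endpoint
with open("test_image.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8000/predict",
        files={"file": ("test_image.jpg", f, "image/jpeg")},
    )
response.raise_for_status()
print(response.json())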
flowchart TB
subgraph API["FastAPI Application"]
A["GET /"] --> B["API Info"]
C["GET /health"] --> D["Status + Model Info"]
E["POST /predict"] --> F["Image Classification"]
end
F --> G["Preprocess"]
G --> H["Model"]
H --> I["JSON Response"]
style A fill:#74b9ff,color:#fff
style C fill:#27ae60,color:#fff
style E fill:#3498db,color:#fff
style I fill:#00b894,color:#fff
Three endpoints: root for API info, health for monitoring, predict for inference.
Step 8: Write Dockerfile
Create the Dockerfile:
FROM python:3.10-slim # (#1:Use slim Python image)
WORKDIR /app # (#2:Set working directory)
# Install system dependencies
RUN apt-get update && apt-get install -y \ # (#3:Install system packages; curl is needed for the HEALTHCHECK)
    libgl1 \
    libglib2.0-0 \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Copy requirements first for better caching
COPY requirements.txt . # (#4:Copy requirements)
RUN pip install --no-cache-dir -r requirements.txt # (#5:Install Python deps)
# Copy application code
COPY app/ ./app/ # (#6:Copy app code)
COPY models/ ./models/ # (#7:Copy trained model)
# Expose port
EXPOSE 8000 # (#8:Expose API port)
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s \ # (#9:Configure health check)
    CMD curl -f http://localhost:8000/health || exit 1
# Run the application
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"] # (#10:Start server)
Create .dockerignore:
__pycache__
*.pyc
*.pyo
.git
.gitignore
.env
*.md
.pytest_cache
.coverage
venv/
.venv/
Step 9: Build Docker Image
Build the Docker image:
# Build the image
docker build -t cv-api:latest .
# List images to verify
docker images | grep cv-api
Note: The first build may take several minutes as it downloads the base image and installs PyTorch/TensorFlow.
Step 10: Run Container and Test
Run the Docker container:
# Run container
docker run -d --name cv-api -p 8000:8000 cv-api:latest # (#1:Run in detached mode)
# Check container logs
docker logs cv-api # (#2:View startup logs)
# Test the endpoints
curl http://localhost:8000/health
# Test prediction
curl -X POST "http://localhost:8000/predict" \
-F "file=@test_image.jpg"
# Stop and remove container when done
docker stop cv-api && docker rm cv-api
Success: Your containerized API is now running and ready for deployment!
flowchart TB
subgraph Build["Docker Build"]
A["python:3.10-slim"] --> B["Install Dependencies"]
B --> C["Copy app/"]
C --> D["Copy models/"]
D --> E["cv-api:latest"]
end
subgraph Run["Docker Run"]
E --> F["Container"]
F -->|":8000"| G["Host Port"]
end
style A fill:#0db7ed,color:#fff
style E fill:#384d54,color:#fff
style F fill:#27ae60,color:#fff
style G fill:#e67e22,color:#fff
Note: Docker image size varies (~2-4GB with PyTorch). First build may take several minutes.
Step 11: Push to Docker Hub (Optional)
Share your image via Docker Hub:
# Login to Docker Hub
docker login
# Tag your image (replace 'yourusername')
docker tag cv-api:latest yourusername/cv-api:latest
# Push to Docker Hub
docker push yourusername/cv-api:latest
Step 12: Deploy to Cloud (Optional)
Choose a cloud platform for deployment:
Option A: Render
# Create render.yaml in project root
# Then connect your GitHub repo to Render
# render.yaml
services:
  - type: web
    name: cv-api
    env: docker
    dockerfilePath: ./Dockerfile
    healthCheckPath: /health
Option B: Railway
# Install Railway CLI
npm install -g @railway/cli
# Login and deploy
railway login
railway init
railway up
Option C: HuggingFace Spaces
- Create a new Space at huggingface.co/spaces
- Select "Docker" as the SDK
- Push your code to the Space repository
Note: HuggingFace Spaces offers free CPU hosting for ML demos (GPU hardware is available as a paid upgrade), making it a convenient way to share computer vision models.
flowchart TB
A["Docker Image"] --> B{"Deployment Target"}
B --> C["Render"]
B --> D["Railway"]
B --> E["HuggingFace Spaces"]
B --> F["Docker Hub"]
C --> G["render.yaml"]
D --> H["railway up"]
E --> I["Dockerfile SDK"]
F --> J["docker push"]
style A fill:#384d54,color:#fff
style B fill:#e67e22,color:#fff
style C fill:#46b1c9,color:#fff
style D fill:#0b0d0e,color:#fff
style E fill:#ff9d00,color:#fff
style F fill:#0db7ed,color:#fff
Choose a platform based on your needs: free tier, GPU support, or enterprise features.
Expected Output
After completing this practical work, you should have:
- A working REST API that accepts image uploads
- Two endpoints: /health for monitoring and /predict for inference
- JSON responses containing predicted class and confidence scores
- A Dockerized application ready for deployment
- Interactive API documentation at /docs
Verification: Your API should respond to health checks and return accurate predictions for test images from your training dataset.
Deliverables
- Source Code: Complete app/ directory with all Python files
- Dockerfile: Working Dockerfile and .dockerignore
- Requirements: requirements.txt with all dependencies
- API Documentation: Screenshots of the /docs page
- Test Results: Screenshots or logs showing successful predictions
- Deployment URL: (If deployed) Live URL to your running API
Bonus Challenges
- Batch Prediction Endpoint: Add a /predict/batch endpoint that accepts multiple images and returns predictions for all of them in a single request (see the example below)
- Model Versioning: Implement model versioning with a /models endpoint that lists available model versions and allows switching between them
- Prometheus Metrics: Add Prometheus metrics using prometheus-fastapi-instrumentator for monitoring request latency, throughput, and error rates
- Authentication: Add API key authentication using FastAPI's security utilities (see the sketch at the end of this section)
- Rate Limiting: Implement rate limiting to prevent API abuse
Example batch prediction endpoint:
from typing import List

@app.post("/predict/batch")
async def predict_batch(files: List[UploadFile] = File(...)): # (#1:Accept multiple files)
    """Predict classes for multiple images."""
    results = []
    for file in files:
        image_bytes = await file.read()
        tensor = preprocess_image(image_bytes)
        result = predict(model, tensor)
        results.append({
            "filename": file.filename,
            "prediction": result
        })
    return {"predictions": results}
sequenceDiagram
participant C as Client
participant A as FastAPI
participant P as Preprocessor
participant M as Model
C->>A: POST /predict/batch (files[])
loop For each image
A->>P: preprocess(image)
P->>M: tensor
M->>A: prediction
end
A->>C: JSON predictions[]
Batch endpoint processes multiple images in a single request for efficiency.
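For the Authentication bonus challenge, here is a minimal API-key sketch built on FastAPI's security utilities. The header name and key value are hypothetical; in a real deployment the key should come from an environment variable or secret store:
from fastapi import Depends, HTTPException, Security
from fastapi.security import APIKeyHeader

API_KEY = "change-me"  # hypothetical; load from an environment variable in practice
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)  # hypothetical header name

async def require_api_key(api_key: str = Security(api_key_header)):
    """Reject requests that do not carry the expected API key."""
    if api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid or missing API key")

# Attach the dependency to any route that should be protected, for example:
# @app.post("/predict", dependencies=[Depends(require_api_key)])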