Computer Vision for Business

Session 6: Deployment & Integration Strategies

Production architectures, optimization, and monitoring

Session 6 Overview

Taking CV from notebook to production (3 hours)
graph LR A["Part 1
Architecture
(1h)"] --> B["Part 2
Production
(1h)"] B --> C["Part 3
Lab 7
(1h)"] style A fill:#e1f5fe style B fill:#c8e6c9 style C fill:#fff3e0

Learning Objectives

The Journey to Production

From prototype to deployed system
graph LR A["Jupyter
Notebook"] --> B["Clean
Code"] B --> C["API
Wrapper"] C --> D["Container"] D --> E["Deploy"] E --> F["Monitor"] style A fill:#ffcdd2 style F fill:#c8e6c9

Cloud vs Edge Deployment

Two fundamentally different approaches
graph TB subgraph Cloud["Cloud Deployment"] C1["Images uploaded to cloud"] C2["Powerful servers process"] C3["Results returned via API"] end subgraph Edge["Edge Deployment"] E1["Images processed locally"] E2["On-device inference"] E3["Results available immediately"] end style Cloud fill:#e3f2fd style Edge fill:#e8f5e9

Cloud Deployment

Leveraging cloud infrastructure

Advantages:

  • Unlimited compute resources
  • Easy horizontal scaling
  • Access to powerful GPUs
  • No hardware management
  • Pay-per-use pricing

Disadvantages:

  • Network latency (100ms+)
  • Ongoing cloud costs
  • Requires internet
  • Data leaves premises
  • Vendor lock-in risk

Edge Deployment

Processing at the source

Advantages:

  • Ultra-low latency (<10ms)
  • Works offline
  • Data stays local
  • Predictable costs
  • No bandwidth costs

Disadvantages:

  • Limited compute power
  • Model size constraints
  • Hardware management
  • Harder to update
  • Upfront hardware cost

Deployment Decision Matrix

Match deployment to requirements
Factor            | Cloud               | Edge
Latency needs     | 100ms+ acceptable   | Real-time required
Processing type   | Batch processing    | Streaming/continuous
Model complexity  | Large models OK     | Must be lightweight
Connectivity      | Reliable internet   | Offline/remote
Data sensitivity  | Can send to cloud   | Must stay local
Scale             | Variable workloads  | Predictable volume

Choosing Your Deployment

flowchart TD A{Latency?} -->|>100ms| B{Variable
load?} A -->|<50ms| C{Offline?} B -->|Yes| D["Cloud"] B -->|No| E{Sensitive
data?} C -->|Yes| F["Edge"] C -->|No| G["Hybrid"] E -->|Yes| F E -->|No| D style D fill:#e3f2fd style F fill:#e8f5e9 style G fill:#fff3e0

Production Architecture Patterns

Three common approaches
graph TB subgraph P1["Pattern 1: Sync API"] A1["Request"] --> B1["Process"] --> C1["Response"] end subgraph P2["Pattern 2: Async Queue"] A2["Request"] --> B2["Queue"] B2 --> C2["Worker"] C2 --> D2["Callback"] end subgraph P3["Pattern 3: Hybrid"] A3["Edge Filter"] --> B3["Cloud Process"] end style P1 fill:#e3f2fd style P2 fill:#fff3e0 style P3 fill:#e8f5e9

Pattern 1: Simple API Gateway

Synchronous request-response
sequenceDiagram participant Client participant API as API Gateway participant CV as CV Service participant Model Client->>API: POST /classify (image) API->>CV: Forward request CV->>Model: Run inference Model-->>CV: Predictions CV-->>API: Results API-->>Client: JSON response

Best for: Low-to-medium traffic, simple operations

Tools: FastAPI, Flask, AWS Lambda, Cloud Run

Pattern 2: Async Processing Queue

Decoupled processing for high throughput
graph LR A["Client"] --> B["API"] B --> C["Message Queue"] C --> D["Worker 1"] C --> E["Worker 2"] C --> F["Worker N"] D & E & F --> G["Results Store"] G --> H["Callback/Poll"] style C fill:#fff3e0 style G fill:#c8e6c9

Best for: Batch processing, variable loads, long tasks

Tools: RabbitMQ, AWS SQS, Redis Queue, Celery
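As a concrete illustration of the queue pattern, a minimal Celery sketch (assumes a Redis broker on localhost; the task body is a placeholder for your real model wrapper):

# tasks.py -- minimal async-queue sketch with Celery + Redis
from celery import Celery

queue_app = Celery("cv_tasks",
                   broker="redis://localhost:6379/0",
                   backend="redis://localhost:6379/1")

@queue_app.task
def classify_image_task(image_path: str) -> dict:
    """Worker-side task: load the image, run inference, return the result."""
    # Placeholder result -- swap in your real model wrapper here
    return {"path": image_path, "class": "electronics", "confidence": 0.93}

# API side: enqueue and return immediately; the client polls for the result
# async_result = classify_image_task.delay("/uploads/product_123.jpg")
# async_result.get(timeout=30)  # or expose async_result.ready() via a /status endpoint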

Pattern 3: Edge + Cloud Hybrid

Intelligent routing between edge and cloud
graph TB subgraph Edge["Edge Device"] A["Camera"] --> B["Lightweight
Model"] B --> C{"Simple
case?"} end C -->|Yes| D["Local
Result"] C -->|No| E["Cloud
Full Model"] E --> F["Cloud
Result"] style Edge fill:#e8f5e9 style D fill:#c8e6c9 style F fill:#e3f2fd

Strategy: the lightweight edge model handles the bulk of simple cases locally (often around 80%), escalating only complex or low-confidence cases to the full cloud model
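A minimal sketch of that routing logic, assuming a small on-device model behind a predict() wrapper and a cloud endpoint like the FastAPI /classify service shown later (the wrapper interface, cutoff, and URL are illustrative):

import requests

CONFIDENCE_CUTOFF = 0.85                            # below this, escalate to the cloud
CLOUD_URL = "https://cv.example.com/classify"       # illustrative endpoint

def classify_hybrid(image_path: str, edge_model) -> dict:
    """Run the lightweight edge model first; escalate hard cases to the cloud."""
    label, confidence = edge_model.predict(image_path)   # assumed wrapper interface
    if confidence >= CONFIDENCE_CUTOFF:
        return {"class": label, "confidence": confidence, "source": "edge"}

    # Low confidence: send the raw image to the full cloud model
    with open(image_path, "rb") as f:
        response = requests.post(CLOUD_URL, files={"file": f}, timeout=10)
    result = response.json()
    result["source"] = "cloud"
    return result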

Implementation: FastAPI

Modern Python API framework
graph TB FA["FastAPI"] subgraph Fast["Fast"] F1["Async by default"] F2["High performance"] end subgraph Dev["Developer Friendly"] D1["Type hints"] D2["Auto-validation"] D3["Interactive docs"] end subgraph Prod["Production Ready"] P1["OpenAPI/Swagger"] P2["Easy testing"] P3["Middleware support"] end FA --> Fast FA --> Dev FA --> Prod style Fast fill:#d5f5e3 style Dev fill:#d4e6f1 style Prod fill:#fdebd0

FastAPI: Project Setup

from fastapi import FastAPI, File, UploadFile, HTTPException
import torch
from torchvision import transforms
from PIL import Image
import io
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(
    title="Product Classification API",
    description="CV API for product classification",
    version="1.0.0"
)

# Global model (loaded once at startup)
model = None
classes = None
transform = None

FastAPI: Model Loading at Startup

@app.on_event("startup")
async def load_model():
    """Load model at application startup."""
    global model, classes, transform

    logger.info("Loading model...")

    # Load TorchScript model
    model = torch.jit.load("product_classifier.pt")
    model.eval()

    classes = ["electronics", "clothing", "furniture", "food"]

    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                           [0.229, 0.224, 0.225])
    ])

    logger.info("Model loaded successfully!")

FastAPI: Health Check Endpoint

Essential for load balancers and monitoring
@app.get("/health")
async def health_check():
    """Health check endpoint for load balancers."""
    return {
        "status": "healthy",
        "model_loaded": model is not None,
        "version": "1.0.0"
    }

@app.get("/")
async def root():
    """API documentation redirect."""
    return {
        "message": "Product Classification API",
        "docs": "/docs",
        "health": "/health"
    }

FastAPI: Classification Endpoint

@app.post("/classify")
async def classify_image(file: UploadFile = File(...)):
    """Classify a product image."""

    # Validate file type
    if file.content_type not in ["image/jpeg", "image/png"]:
        raise HTTPException(400, "Invalid image format")

    try:
        contents = await file.read()
        image = Image.open(io.BytesIO(contents)).convert("RGB")
        input_tensor = transform(image).unsqueeze(0)

        with torch.no_grad():
            outputs = model(input_tensor)
            probs = torch.softmax(outputs, dim=1)
            confidence, predicted = probs.max(1)

        return {
            "class": classes[predicted.item()],
            "confidence": round(confidence.item(), 4)
        }
    except Exception as e:
        logger.error(f"Error: {str(e)}")
        raise HTTPException(500, str(e))

FastAPI: Batch Classification

from typing import List

@app.post("/batch_classify")
async def batch_classify(files: List[UploadFile] = File(...)):
    """Classify multiple product images."""
    results = []

    for file in files:
        try:
            contents = await file.read()
            image = Image.open(io.BytesIO(contents)).convert("RGB")
            input_tensor = transform(image).unsqueeze(0)

            with torch.no_grad():
                outputs = model(input_tensor)
                probs = torch.softmax(outputs, dim=1)
                confidence, predicted = probs.max(1)

            results.append({
                "filename": file.filename,
                "class": classes[predicted.item()],
                "confidence": round(confidence.item(), 4)
            })
        except Exception as e:
            results.append({"filename": file.filename, "error": str(e)})

    return {"results": results}

Running the API

Development and production modes
# Development (with auto-reload)
uvicorn app:app --reload --host 0.0.0.0 --port 8000

# Production (with workers)
uvicorn app:app --host 0.0.0.0 --port 8000 --workers 4

# Or with Gunicorn for more control
gunicorn app:app -w 4 -k uvicorn.workers.UvicornWorker
graph LR A["Client"] --> B["Uvicorn/Gunicorn"] B --> C["Worker 1"] B --> D["Worker 2"] B --> E["Worker N"] style B fill:#e1f5fe

Performance Optimization

Making models production-fast
graph TB subgraph Techniques["Optimization Techniques"] A["Quantization
2-4x faster"] B["Pruning
1.5-3x faster"] C["ONNX/TensorRT
2-5x faster"] D["Batching
2-8x throughput"] end style Techniques fill:#e8f5e9

Optimization Techniques Comparison

Technique              | Speedup | Trade-off
Quantization           | 2-4x    | Minor accuracy loss (0.5-2%)
Model Pruning          | 1.5-3x  | May need fine-tuning
Knowledge Distillation | 2-10x   | Requires training a smaller model
TensorRT/ONNX          | 2-5x    | Hardware-specific
Batching               | 2-8x    | Adds latency for single requests
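Batching in the last row means grouping several pending requests into one forward pass, since GPUs are far more efficient on batches than on single images. A minimal micro-batching sketch (illustrative; production systems typically delegate this to a serving framework such as TorchServe or Triton):

import torch

def classify_batch(model, tensors, classes):
    """Run one forward pass over a list of preprocessed (3, 224, 224) tensors."""
    batch = torch.stack(tensors)                      # shape: (N, 3, 224, 224)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)
        confidences, indices = probs.max(dim=1)
    return [
        {"class": classes[i.item()], "confidence": round(c.item(), 4)}
        for c, i in zip(confidences, indices)
    ]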

Optimization: Quantization

Reduce precision from FP32 to INT8
graph LR A["FP32 Model
100MB"] --> B["Quantize"] B --> C["INT8 Model
25MB"] D["FP32 Inference
100ms"] --> E["Quantize"] E --> F["INT8 Inference
30ms"] style C fill:#c8e6c9 style F fill:#c8e6c9
import os
import torch
import torch.quantization

def get_model_size(m):
    """Approximate on-disk size in MB by saving the state dict to a temp file."""
    torch.save(m.state_dict(), "_size_check.pt")
    size_mb = os.path.getsize("_size_check.pt") / 1e6
    os.remove("_size_check.pt")
    return round(size_mb, 1)

# Dynamic quantization (easiest): Linear-layer weights stored as INT8
quantized_model = torch.quantization.quantize_dynamic(
    model,
    {torch.nn.Linear},
    dtype=torch.qint8
)

# Check size reduction
print(f"Original: {get_model_size(model)}MB")
print(f"Quantized: {get_model_size(quantized_model)}MB")

Optimization: ONNX Export

Cross-platform inference optimization
# Export to ONNX format
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}}
)

# Run with ONNX Runtime
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_array = dummy_input.numpy()   # any float32 array shaped (batch, 3, 224, 224)
outputs = session.run(None, {"input": input_array})
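A rough way to compare the two runtimes is to time repeated inference on the same input, continuing from the export snippet above (a measurement sketch only; absolute numbers depend entirely on hardware):

import time
import torch

def time_inference(fn, runs=100):
    """Average wall-clock latency of fn() over several runs, in milliseconds."""
    fn()                                   # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000

with torch.no_grad():
    torch_ms = time_inference(lambda: model(dummy_input))
onnx_ms = time_inference(lambda: session.run(None, {"input": dummy_input.numpy()}))
print(f"PyTorch: {torch_ms:.1f} ms  |  ONNX Runtime: {onnx_ms:.1f} ms")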

Monitoring and Observability

What to track in production
graph TB M["Monitoring"] subgraph Perf["Performance"] P1["Latency p50/p95/p99"] P2["Throughput"] P3["Error rates"] end subgraph Qual["Quality"] Q1["Accuracy drift"] Q2["Confidence distribution"] Q3["False positive rate"] end subgraph Sys["System"] S1["CPU/GPU usage"] S2["Memory"] S3["Queue depth"] end subgraph Biz["Business"] B1["Requests/hour"] B2["User feedback"] B3["Revenue impact"] end M --> Perf M --> Qual M --> Sys M --> Biz style Perf fill:#d4e6f1 style Qual fill:#d5f5e3 style Sys fill:#fdebd0 style Biz fill:#e8daef

Key Production Metrics

Category    | Metrics         | Alert Threshold
Performance | p95 latency     | > 500ms
Errors      | Error rate      | > 1%
Quality     | Avg confidence  | < 0.7
System      | GPU memory      | > 90%
Queue       | Queue depth     | > 1000

Tools: Prometheus, Grafana, DataDog, CloudWatch
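These metrics are easy to expose from the API process itself. A minimal sketch with the prometheus_client library (the metric names and port are illustrative):

import time
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("cv_requests_total", "Classification requests", ["status"])
LATENCY = Histogram("cv_inference_seconds", "Inference latency in seconds")

start_http_server(9100)   # Prometheus scrapes http://host:9100/metrics

def classify_with_metrics(classify_fn, image):
    """Wrap any classify function with request counting and latency timing."""
    start = time.perf_counter()
    try:
        result = classify_fn(image)
        REQUESTS.labels(status="ok").inc()
        return result
    except Exception:
        REQUESTS.labels(status="error").inc()
        raise
    finally:
        LATENCY.observe(time.perf_counter() - start)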

Drift Detection

When production data differs from training
graph LR A["Training Data
Distribution"] --> B{"Compare"} C["Production Data
Distribution"] --> B B -->|Similar| D["OK"] B -->|Different| E["DRIFT!
Retrain needed"] style D fill:#c8e6c9 style E fill:#ffcdd2

Drift Detection Implementation

import logging
import numpy as np
from scipy import stats

class DriftDetector:
    def __init__(self, baseline, threshold=0.05):
        self.baseline = baseline
        self.threshold = threshold
        self.recent = []

    def add_prediction(self, pred):
        self.recent.append(pred)
        if len(self.recent) >= 100:
            self._check_drift()
            self.recent = []

    def _check_drift(self):
        recent = np.array(self.recent)
        # Two-sample Kolmogorov-Smirnov test against the training-time baseline
        stat, p_value = stats.ks_2samp(
            np.asarray(self.baseline).flatten(),
            recent.flatten()
        )
        if p_value < self.threshold:
            logging.warning("DRIFT DETECTED! KS p-value = %.4f", p_value)

Handling Uncertain Predictions

Confidence thresholding strategies
graph TB A["Model Prediction"] --> B{"Confidence
> threshold?"} B -->|Yes| C["Auto-accept"] B -->|No| D["Manual Review"] E["High threshold (0.9)"] --> F["Few errors
More reviews"] G["Low threshold (0.6)"] --> H["More automation
More errors"] style C fill:#c8e6c9 style D fill:#fff3e0

Confidence Thresholding

def classify_with_threshold(model, image, threshold=0.7):
    """Route low-confidence predictions for review."""
    confidence, prediction = model.predict(image)

    if confidence < threshold:
        return {
            "status": "review_required",
            "confidence": confidence,
            "suggestion": prediction
        }

    return {
        "status": "auto_accepted",
        "class": prediction,
        "confidence": confidence
    }

# Tune threshold based on:
# - Cost of errors vs cost of review
# - Acceptable error rate
# - Review capacity

Graceful Degradation

What to do when the model is uncertain
graph TB A["Primary Model"] --> B{"Confident?"} B -->|Yes| C["Return Result"] B -->|No| D["Fallback Model"] D --> E{"Confident?"} E -->|Yes| F["Return Result"] E -->|No| G["Human Review Queue"] style C fill:#c8e6c9 style F fill:#c8e6c9 style G fill:#fff3e0

Lab 7: End-to-End Pipeline

Product Image Quality Checker
graph LR A["Product
Image"] --> B["Technical
Check"] B --> C["Visual
Analysis"] C --> D["Quality
Score"] D --> E["Accept/
Reject"] style A fill:#e1f5fe style E fill:#c8e6c9

Quality Levels: Excellent (score ≥ 0.9), Good (≥ 0.7), Acceptable (≥ 0.5), Rejected (< 0.5)

Lab 7: Pipeline Architecture

graph TB subgraph Input A["Image Upload"] end subgraph Technical["Technical Checks"] B["Resolution"] C["Aspect Ratio"] D["File Size"] end subgraph Visual["AI Analysis"] E["Lighting"] F["Background"] G["Focus"] H["Composition"] end subgraph Output I["Quality Score"] J["Recommendations"] end A --> Technical --> Visual --> Output style Technical fill:#e3f2fd style Visual fill:#e8f5e9

Lab 7: Data Structures

import anthropic
import json
from dataclasses import dataclass
from enum import Enum

class ImageQuality(Enum):
    EXCELLENT = "excellent"
    GOOD = "good"
    ACCEPTABLE = "acceptable"
    REJECTED = "rejected"

@dataclass
class QualityAssessment:
    quality: ImageQuality
    issues: list
    recommendations: list
    scores: dict  # {"technical": 0.9, "visual": 0.8}

class ProductImageChecker:
    def __init__(self, api_key: str):
        self.client = anthropic.Anthropic(api_key=api_key)
        self.criteria = {
            "resolution": {"min_width": 800, "min_height": 800},
            "aspect_ratio": {"min": 0.8, "max": 1.2}
        }

Lab 7: Technical Quality Check

def _check_technical(self, image_path: str) -> dict:
    """Check technical aspects without AI."""
    import os
    from PIL import Image

    issues = []
    score = 1.0

    img = Image.open(image_path)
    width, height = img.size

    # Resolution check
    if width < 800 or height < 800:
        issues.append(f"Resolution too low: {width}x{height}")
        score -= 0.3

    # Aspect ratio check
    ratio = width / height
    if ratio < 0.8 or ratio > 1.2:
        issues.append(f"Aspect ratio {ratio:.2f} not ideal")
        score -= 0.1

    # File size check (rough quality indicator)
    file_size = os.path.getsize(image_path)
    if file_size < 50000:  # 50KB
        issues.append("Image may be over-compressed")
        score -= 0.2

    return {"issues": issues, "score": max(0, score)}

Lab 7: Visual Quality with Claude

def _check_visual(self, image_path: str) -> dict:
    """AI-powered visual quality assessment."""
    # encode_image(): base64-encodes the file (helper assumed defined elsewhere in the lab)
    image_data = encode_image(image_path)

    response = self.client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64", "media_type": "image/jpeg",
                    "data": image_data}},
                {"type": "text", "text": """Rate this product image:
                {"lighting": 0-1, "background": 0-1,
                 "focus": 0-1, "composition": 0-1,
                 "issues": [], "recommendations": []}
                Return JSON only."""}
            ]
        }]
    )
    return json.loads(response.content[0].text)
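Model responses sometimes wrap the JSON in extra prose, so a defensive parse is worth a few lines. A hedged helper (illustrative; it could replace the json.loads call above):

import json
import re

def parse_json_response(text: str) -> dict:
    """Extract the first JSON object from a model response, tolerating extra prose."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise ValueError(f"No JSON object found in response: {text[:100]}")

# In _check_visual:
# return parse_json_response(response.content[0].text)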

Lab 7: Complete Check Method

def check_image(self, image_path: str) -> QualityAssessment:
    """Run complete quality assessment."""
    # Technical checks
    tech = self._check_technical(image_path)

    # Visual analysis
    visual = self._check_visual(image_path)

    # Calculate overall score
    visual_avg = sum([visual["lighting"], visual["background"],
                      visual["focus"], visual["composition"]]) / 4
    overall = (tech["score"] + visual_avg) / 2

    # Determine quality level
    if overall >= 0.9:
        quality = ImageQuality.EXCELLENT
    elif overall >= 0.7:
        quality = ImageQuality.GOOD
    elif overall >= 0.5:
        quality = ImageQuality.ACCEPTABLE
    else:
        quality = ImageQuality.REJECTED

    return QualityAssessment(
        quality=quality,
        issues=tech["issues"] + visual.get("issues", []),
        recommendations=visual.get("recommendations", []),
        scores={"technical": tech["score"], "visual": visual_avg}
    )

Lab 7: Using the Checker

def main():
    checker = ProductImageChecker(api_key="your-key")

    result = checker.check_image("product.jpg")

    print(f"Quality: {result.quality.value}")
    print(f"Scores: {result.scores}")

    if result.issues:
        print("\nIssues:")
        for issue in result.issues:
            print(f"  - {issue}")

    if result.recommendations:
        print("\nRecommendations:")
        for rec in result.recommendations:
            print(f"  - {rec}")

# Output:
# Quality: good
# Scores: {'technical': 0.9, 'visual': 0.75}
# Issues:
#   - Background slightly cluttered
# Recommendations:
#   - Use solid white background

Self-Practice Assignment 6

Duration: 2 hours | Deadline: End of course

Task: Deployment-Ready Project

  1. Clean Up Code (30 min)
    • Refactor into clean functions/classes
    • Add docstrings and type hints
  2. Create API (45 min)
    • Wrap model in FastAPI
    • Add health check + classify endpoints
    • Include error handling
  3. Documentation (45 min)
    • Write comprehensive README
    • Include API examples
    • Document limitations

Deliverable: Complete project with API and documentation

Session 6 Summary

graph TB S6["Session 6"] subgraph Dep["Deployment"] D1["Cloud vs Edge"] D2["Architecture patterns"] D3["FastAPI"] end subgraph Opt["Optimization"] O1["Quantization"] O2["ONNX export"] O3["Batching"] end subgraph Mon["Monitoring"] M1["Key metrics"] M2["Drift detection"] M3["Alerting"] end subgraph Edge["Edge Cases"] E1["Confidence thresholds"] E2["Fallback strategies"] E3["Human-in-the-loop"] end S6 --> Dep S6 --> Opt S6 --> Mon S6 --> Edge style Dep fill:#d4e6f1 style Opt fill:#d5f5e3 style Mon fill:#fdebd0 style Edge fill:#e8daef

Course Complete!

You've completed Computer Vision for Business
graph LR S1["Session 1
Foundations"] --> S2["Session 2
Business Apps"] S2 --> S3["Session 3
Cloud APIs"] S3 --> S4["Session 4
Custom Models"] S4 --> S5["Session 5
Ethics"] S5 --> S6["Session 6
Deployment"] style S6 fill:#c8e6c9
Congratulations! You're now ready to implement CV solutions in production.
