When and how to build your own computer vision models
By the end of this session, you will:
- Know when to call a cloud vision API and when to build a custom model
- Apply transfer learning (feature extraction vs. fine-tuning) with PyTorch
- Train a custom image classifier, augment its data, and evaluate it with the right metrics
A 3-hour journey through custom model development
From off-the-shelf to fully custom
Decision factors
Factor-by-factor comparison
| Factor | Cloud API | Custom Model |
|---|---|---|
| Time to deploy | Hours to days | Weeks to months |
| Data requirements | None | 100s to 1000s labeled images |
| Cost structure | Per-call (scales with usage) | Fixed (training + infra) |
| Expertise | API integration | ML engineering |
| Customization | Limited | Full control |
| Privacy | Data sent to cloud | Data stays on-premise |
When custom becomes cheaper
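The table above implies a break-even point: per-call fees grow with usage, while a custom model's costs are mostly fixed. A back-of-envelope sketch of that crossover; all prices below are illustrative assumptions, not vendor quotes, and the helper name is ours:

```python
# Break-even between per-call API pricing and a custom model.
# Every number here is a made-up assumption for illustration.
api_cost_per_call = 0.0015    # assumed $/image for a cloud vision API
custom_fixed_cost = 8_000.0   # assumed one-off training + engineering cost ($)
custom_monthly_infra = 300.0  # assumed hosting/inference cost ($/month)

def breakeven_calls_per_month(months: int = 12) -> float:
    """Monthly call volume above which the custom model is cheaper over `months`."""
    total_fixed = custom_fixed_cost + custom_monthly_infra * months
    return total_fixed / (api_cost_per_call * months)

print(f"Break-even: {breakeven_calls_per_month():,.0f} calls/month over a year")
```

With these made-up numbers the crossover sits around 640k calls per month; the point is the shape of the trade-off, not the exact figure.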
Standing on the shoulders of giants
Neural networks learn hierarchically
Early layers learn generic features such as edges and textures; deeper layers learn increasingly task-specific ones. That generic base is why a backbone pre-trained on ImageNet transfers to new tasks.
Two main strategies
Feature extraction freezes the pre-trained backbone and trains only a new classification head; fine-tuning also updates some or all of the backbone weights.
When to use each
| Aspect | Feature Extraction | Fine-Tuning |
|---|---|---|
| Training time | Fast (minutes) | Slower (hours) |
| Data needed | Small (50-200/class) | Larger (200-1000/class) |
| Domain similarity | Similar to ImageNet | Different from ImageNet |
| Risk of overfitting | Lower | Higher |
| Potential accuracy | Good | Better |
Choosing your backbone
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| MobileNetV3 | 5M params | Very Fast | Good | Mobile/Edge |
| EfficientNet-B0 | 5M params | Fast | Very Good | Balanced |
| ResNet-50 | 25M params | Medium | Excellent | General |
| ViT-B/16 | 86M params | Slow | Best | Max accuracy |
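All four backbones in the table ship with torchvision. A sketch of a loader helper, assuming torchvision >= 0.13's weights enums; the `load_backbone` function and its selection are our own illustration, and note that each architecture exposes its classification head under a different attribute (`fc` for ResNet, `classifier` for MobileNet/EfficientNet, `heads` for ViT):

```python
from torchvision import models

def load_backbone(name: str):
    """Load a pre-trained backbone from the comparison table (illustrative helper)."""
    if name == "mobilenet_v3":
        return models.mobilenet_v3_large(weights=models.MobileNet_V3_Large_Weights.IMAGENET1K_V2)
    if name == "efficientnet_b0":
        return models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
    if name == "resnet50":
        return models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    if name == "vit_b_16":
        return models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
    raise ValueError(f"Unknown backbone: {name}")
```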
What we'll build
Standard ImageFolder structure
```
data/
  train/
    class_a/
      image001.jpg
      image002.jpg
      ...
    class_b/
      image001.jpg
      ...
  val/
    class_a/
      ...
    class_b/
      ...
```
```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def create_dataloaders(data_dir: str, batch_size: int = 32):
    """Create train and validation dataloaders."""
    # Standard ImageNet normalization (required for pre-trained models)
    normalize = transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
    train_transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize
    ])
    val_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize
    ])

    # Load datasets from folder structure
    train_data = datasets.ImageFolder(
        f"{data_dir}/train",
        transform=train_transform
    )
    val_data = datasets.ImageFolder(
        f"{data_dir}/val",
        transform=val_transform
    )

    # Create dataloaders
    train_loader = DataLoader(
        train_data, batch_size=batch_size,
        shuffle=True, num_workers=4
    )
    val_loader = DataLoader(
        val_data, batch_size=batch_size,
        shuffle=False, num_workers=4
    )
    return train_loader, val_loader, train_data.classes

# Usage
train_loader, val_loader, classes = create_dataloaders("data/defects")
print(f"Classes: {classes}")  # e.g. ['dent', 'good', 'scratch'] (ImageFolder sorts alphabetically)
```
Loading and modifying a pre-trained model
```python
import torch.nn as nn
from torchvision import models

def create_model(num_classes: int, freeze_base: bool = True):
    """Create transfer learning model."""
    # Load pre-trained ResNet-50
    model = models.resnet50(
        weights=models.ResNet50_Weights.IMAGENET1K_V2
    )

    # Freeze base layers first, so the head we add next stays trainable
    if freeze_base:
        for param in model.parameters():
            param.requires_grad = False

    # Replace classification head; fresh layers default to requires_grad=True
    num_features = model.fc.in_features  # 2048 for ResNet-50
    model.fc = nn.Linear(num_features, num_classes)

    return model
```
What we're modifying
Only the final fully-connected layer (model.fc) is replaced; the convolutional backbone keeps its ImageNet weights.
```python
def train_model(model, train_loader, val_loader, epochs=10, lr=0.001):
    """Train the model."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    # Optimize only the parameters that are not frozen
    optimizer = torch.optim.Adam(
        filter(lambda p: p.requires_grad, model.parameters()),
        lr=lr
    )

    for epoch in range(epochs):
        # Training phase
        model.train()
        running_loss = 0.0
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        # Validation phase
        model.eval()
        val_loss = 0.0
        correct = 0
        total = 0
        with torch.no_grad():
            for inputs, labels in val_loader:
                inputs, labels = inputs.to(device), labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                val_loss += loss.item()
                _, predicted = outputs.max(1)
                total += labels.size(0)
                correct += predicted.eq(labels).sum().item()
        val_acc = correct / total

        print(f"Epoch {epoch+1}/{epochs}")
        print(f"  Train Loss: {running_loss/len(train_loader):.4f}")
        print(f"  Val Loss: {val_loss/len(val_loader):.4f}")
        print(f"  Val Acc: {val_acc:.2%}")
    return model
```
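The optimizer above only sees unfrozen parameters, which matches feature extraction (freeze_base=True). For the fine-tuning strategy from the comparison table, a common pattern is to unfreeze the last backbone stage and give it a smaller learning rate than the fresh head. A minimal sketch, assuming ResNet-50; the function name and learning rates are illustrative, not part of the pipeline above:

```python
import torch
import torch.nn as nn
from torchvision import models

def create_finetune_model(num_classes: int):
    """Fine-tuning sketch: train the last ResNet stage plus a new head."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    for param in model.parameters():
        param.requires_grad = False            # freeze everything first
    for param in model.layer4.parameters():
        param.requires_grad = True             # then unfreeze the last stage
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh, trainable head

    # Discriminative learning rates: pre-trained layers move slowly,
    # the randomly initialized head moves faster
    optimizer = torch.optim.Adam([
        {"params": model.layer4.parameters(), "lr": 1e-4},
        {"params": model.fc.parameters(), "lr": 1e-3},
    ])
    return model, optimizer
```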
```python
def train_defect_classifier():
    """Complete training pipeline for defect detection."""
    # Create data loaders
    train_loader, val_loader, classes = create_dataloaders(
        "data/defects", batch_size=32
    )
    print(f"Classes: {classes}")
    print(f"Training samples: {len(train_loader.dataset)}")

    # Initialize model (feature extraction mode)
    model = create_model(
        num_classes=len(classes),
        freeze_base=True
    )

    # Train
    model = train_model(model, train_loader, val_loader, epochs=15)

    # Save model
    torch.save(model.state_dict(), "defect_classifier.pth")
    print("Model saved!")
    return model, classes

# Run training
model, classes = train_defect_classifier()
```
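Once the weights are saved, inference reuses the validation-style preprocessing. A hypothetical predict_image helper; the name and interface are our own sketch, not part of the pipeline above:

```python
import torch
from PIL import Image
from torchvision import transforms

def predict_image(model, image_path: str, classes, device="cpu"):
    """Classify a single image with the trained model (illustrative helper)."""
    # Same preprocessing as the validation transform above
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    model.eval()
    batch = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    idx = int(probs.argmax())
    return classes[idx], float(probs[idx])

# Usage: label, confidence = predict_image(model, "sample.jpg", classes)
```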
Artificially expand your dataset
Fast, flexible augmentation library
```python
import albumentations as A
from albumentations.pytorch import ToTensorV2

def create_augmentation_pipeline(task_type: str = "general"):
    """Create augmentation pipeline based on task."""
    if task_type == "general":
        return A.Compose([
            A.RandomResizedCrop(height=224, width=224, scale=(0.8, 1.0)),
            A.HorizontalFlip(p=0.5),
            A.Rotate(limit=15, p=0.5),
            A.ColorJitter(
                brightness=0.2, contrast=0.2,
                saturation=0.2, p=0.5
            ),
            A.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]
            ),
            ToTensorV2()
        ])
```
Match augmentations to your use case
```python
    # ... continuation of create_augmentation_pipeline()
    if task_type == "defect_detection":
        # Manufacturing: preserve size, simulate sensor noise
        return A.Compose([
            A.Resize(224, 224),
            A.HorizontalFlip(p=0.5),
            A.VerticalFlip(p=0.5),
            A.RandomBrightnessContrast(brightness_limit=0.3, p=0.5),
            A.GaussNoise(var_limit=(10, 50), p=0.3),
            A.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
            ToTensorV2()
        ])
    elif task_type == "document":
        # Documents: perspective, lighting
        return A.Compose([
            A.Perspective(scale=(0.05, 0.1), p=0.5),
            A.Affine(rotate=(-5, 5), shear=(-5, 5), p=0.5),
            A.RandomBrightnessContrast(p=0.5),
            A.GaussianBlur(blur_limit=(3, 5), p=0.3),
            A.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225]),
            ToTensorV2()
        ])
```
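Albumentations operates on NumPy arrays and is invoked as transform(image=...)["image"], while torchvision's ImageFolder yields PIL images, so the two need a small adapter. A minimal sketch; the AlbumentationsImageFolder class is our own illustration:

```python
import numpy as np
from torch.utils.data import Dataset
from torchvision import datasets

class AlbumentationsImageFolder(Dataset):
    """Adapter: ImageFolder yields PIL images, Albumentations wants NumPy arrays."""
    def __init__(self, root: str, transform):
        self.inner = datasets.ImageFolder(root)
        self.transform = transform
        self.classes = self.inner.classes

    def __len__(self):
        return len(self.inner)

    def __getitem__(self, idx):
        image, label = self.inner[idx]                # PIL image
        image = np.array(image)                       # HWC uint8 array
        image = self.transform(image=image)["image"]  # augmented tensor
        return image, label
```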
Train custom models without writing code
| Platform | Strengths | Best For |
|---|---|---|
| Google Vertex AI | Auto architecture, one-click deploy | GCP users, production |
| AWS Rekognition Custom Labels | Few-shot (10+ images), pay-per-use | AWS ecosystem |
| Azure Custom Vision | User-friendly, ONNX export | Edge deployment |
| Roboflow | Great annotation, versioning | Object detection |
| Teachable Machine | Free, instant results | Learning, demos |
Measuring what matters
The foundation of classification metrics
|  | Actually + | Actually - |
|---|---|---|
| Predicted + | True Positive (TP) | False Positive (FP) |
| Predicted - | False Negative (FN) | True Negative (TN) |
| Metric | Formula | Use When |
|---|---|---|
| Accuracy | (TP + TN) / Total | Balanced classes |
| Precision | TP / (TP + FP) | False positives costly |
| Recall | TP / (TP + FN) | False negatives costly |
| F1 Score | 2 * (P * R) / (P + R) | Imbalanced classes |
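The formulas above in code form, with a made-up confusion matrix that shows how accuracy can mislead when classes are imbalanced:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the table's metrics from raw confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative counts: 90 TP, 10 FP, 30 FN, 870 TN
print(classification_metrics(90, 10, 30, 870))
# accuracy 0.96 looks great, yet recall is only 0.75 (f1 ~ 0.82)
```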
Different applications need different metrics
Additional metrics for detection tasks
```python
def calculate_iou(box1, box2):
    """Calculate IoU between two boxes [x1, y1, x2, y2]."""
    # Calculate intersection
    x1 = max(box1[0], box2[0])
    y1 = max(box1[1], box2[1])
    x2 = min(box1[2], box2[2])
    y2 = min(box1[3], box2[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)

    # Calculate union
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union if union > 0 else 0

# Example: 30x30 intersection (900) over a union of 2300
iou = calculate_iou([10, 10, 50, 50], [20, 20, 60, 60])
print(f"IoU: {iou:.2f}")  # IoU: 0.39
```
Duration: 1.5 hours | Deadline: Before Session 5
Next session: Ethics, Governance & Final Presentations