Building a Preprocessing Pipeline
Create a complete data augmentation and preprocessing pipeline for computer vision tasks
Objectives
By the end of this practical work, you will be able to:
- Build a complete augmentation pipeline with Albumentations
- Visualize augmentation effects on images
- Create a PyTorch DataLoader for efficient batch processing
- Verify batch shapes and value ranges for model compatibility
Prerequisites
- Basic understanding of Python and NumPy
- Familiarity with image processing concepts
- Completed Practical Work 1 and 2 (recommended)
Install required packages:
pip install albumentations torch torchvision
Note: The Albumentations API changes between major releases (for example, ShiftScaleRotate and the var_limit parameter of GaussNoise, both used below, are deprecated in 2.x). This practical targets the 1.x API, so adjust parameter names if you install a newer version.
Instructions
Step 1: Install Albumentations and Explore Documentation
Start by installing Albumentations and exploring its capabilities:
import albumentations as A # (#1:Import Albumentations library)
from albumentations.pytorch import ToTensorV2 # (#2:PyTorch tensor conversion)
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Check version and explore available transforms
print(f"Albumentations version: {A.__version__}") # (#3:Verify installation)
# List some available transforms
transforms_list = [name for name in dir(A) if name[0].isupper()]
print(f"Available transforms: {len(transforms_list)}") # (#4:Explore available transforms)
Note: Albumentations is optimized for speed and provides a wide variety of augmentation techniques specifically designed for computer vision tasks.
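Albumentations transforms are called with keyword arguments and return a dictionary of results, which is why the snippets below read result["image"]. A minimal sketch of the call convention, using a synthetic placeholder image:
dummy = np.zeros((100, 100, 3), dtype=np.uint8)  # Synthetic all-black RGB image
flipped = A.HorizontalFlip(p=1.0)(image=dummy)  # Pass named targets, get a dict back
print(flipped["image"].shape)  # (100, 100, 3) - same shape, mirrored horizontally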
Step 2: Define Geometric Augmentation Pipeline
Create geometric transformations that modify spatial properties:
geometric_transform = A.Compose([
A.HorizontalFlip(p=0.5), # (#1:Random horizontal flip with 50% probability)
A.VerticalFlip(p=0.1), # (#2:Random vertical flip with 10% probability)
A.Rotate(limit=30, p=0.5), # (#3:Random rotation between -30 and +30 degrees)
A.ShiftScaleRotate(
shift_limit=0.1, # (#4:Shift image by up to 10% of dimensions)
scale_limit=0.2, # (#5:Scale image by up to 20%)
rotate_limit=15, # (#6:Additional rotation up to 15 degrees)
border_mode=cv2.BORDER_REFLECT,
p=0.5
),
A.Affine(
scale=(0.9, 1.1), # (#7:Affine scaling range)
translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)}, # (#8:Translation range)
p=0.3
),
])
# Test on a sample image
image = cv2.imread("sample_image.jpg")
assert image is not None, "sample_image.jpg not found or unreadable"  # cv2.imread returns None on failure
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # (#9:Convert BGR to RGB)
transformed = geometric_transform(image=image)
transformed_image = transformed["image"] # (#10:Extract transformed image)
Step 3: Define Photometric Augmentation Pipeline
Create transformations that modify color and intensity properties:
photometric_transform = A.Compose([
A.RandomBrightnessContrast(
brightness_limit=0.2, # (#1:Adjust brightness by up to 20%)
contrast_limit=0.2, # (#2:Adjust contrast by up to 20%)
p=0.5
),
A.HueSaturationValue(
hue_shift_limit=20, # (#3:Shift hue by up to 20)
sat_shift_limit=30, # (#4:Shift saturation by up to 30)
val_shift_limit=20, # (#5:Shift value by up to 20)
p=0.5
),
A.GaussNoise(
var_limit=(10.0, 50.0), # (#6:Add Gaussian noise with variance range)
p=0.3
),
A.GaussianBlur(
blur_limit=(3, 7), # (#7:Apply Gaussian blur with kernel size 3-7)
p=0.2
),
A.CLAHE(
clip_limit=4.0, # (#8:Contrast Limited Adaptive Histogram Equalization)
p=0.3
),
A.ColorJitter(
brightness=0.2, # (#9:Random color jittering)
contrast=0.2,
saturation=0.2,
hue=0.1,
p=0.3
),
])
Warning: Be careful with noise and blur parameters - too much can degrade image quality and hurt model performance.
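One rough way to keep an eye on this is to measure how far an augmented image drifts from the original, for example with PSNR. A sketch (reuses the image loaded in Step 2 and treats one random draw from the photometric pipeline as the distortion):
augmented = photometric_transform(image=image)["image"]  # One random draw from the pipeline
mse = np.mean((image.astype(np.float64) - augmented.astype(np.float64)) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")  # Higher = closer to original
print(f"PSNR vs. original: {psnr:.1f} dB")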
Step 4: Combine into Full Training Transform
Merge geometric and photometric transforms into a complete pipeline:
train_transform = A.Compose([
# Resize to standard input size
A.Resize(height=224, width=224), # (#1:Resize to model input size)
# Geometric augmentations
A.HorizontalFlip(p=0.5), # (#2:Horizontal flip)
A.ShiftScaleRotate(
shift_limit=0.1,
scale_limit=0.2,
rotate_limit=30,
p=0.5
), # (#3:Combined geometric transform)
# Photometric augmentations
A.OneOf([ # (#4:Randomly select one of these transforms)
A.RandomBrightnessContrast(p=1),
A.HueSaturationValue(p=1),
A.ColorJitter(p=1),
], p=0.5),
A.OneOf([ # (#5:Randomly select one noise/blur transform)
A.GaussNoise(var_limit=(10.0, 30.0), p=1),
A.GaussianBlur(blur_limit=(3, 5), p=1),
A.MotionBlur(blur_limit=5, p=1),
], p=0.2),
# Normalize and convert to tensor
A.Normalize(
mean=[0.485, 0.456, 0.406], # (#6:ImageNet mean values)
std=[0.229, 0.224, 0.225], # (#7:ImageNet std values)
),
ToTensorV2(), # (#8:Convert to PyTorch tensor (C, H, W) format)
])
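The ordering matters: Resize comes first so all images share a size, Normalize operates on the NumPy image, and ToTensorV2 must come last. A quick sanity check (a sketch, reusing the image from Step 2):
sample = train_transform(image=image)["image"]  # Full pipeline, one random draw
print(f"shape={tuple(sample.shape)}, dtype={sample.dtype}")  # Expect (3, 224, 224), torch.float32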
Step 5: Create Validation Transform
Define a minimal transform for validation/test data:
val_transform = A.Compose([
A.Resize(height=224, width=224), # (#1:Resize to match training size)
A.Normalize(
mean=[0.485, 0.456, 0.406], # (#2:Same normalization as training)
std=[0.229, 0.224, 0.225],
),
ToTensorV2(), # (#3:Convert to tensor)
])
# Example usage
val_transformed = val_transform(image=image)
val_tensor = val_transformed["image"]
print(f"Validation tensor shape: {val_tensor.shape}") # (#4:Should be (3, 224, 224))
Note: Validation transforms should only include deterministic operations (resize, normalize) - no random augmentations.
Step 6: Visualize Augmented Versions
Create a visualization grid showing multiple augmented versions of the same image:
def visualize_augmentations(image, transform, n_samples=9):
"""Visualize N augmented versions of the same image."""
fig, axes = plt.subplots(3, 3, figsize=(12, 12)) # (#1:Create 3x3 grid)
axes = axes.flatten()
# Show original image first
axes[0].imshow(image)
axes[0].set_title("Original", fontsize=12)
axes[0].axis("off")
# Create visualization transform (without normalize/tensor)
viz_transform = A.Compose([
t for t in transform.transforms # (#2:Filter transforms)
if not isinstance(t, (A.Normalize, ToTensorV2))
])
# Generate augmented versions
for i in range(1, n_samples):
augmented = viz_transform(image=image) # (#3:Apply augmentation)
axes[i].imshow(augmented["image"])
axes[i].set_title(f"Augmented {i}", fontsize=12)
axes[i].axis("off")
plt.suptitle("Data Augmentation Visualization", fontsize=14, y=1.02)
plt.tight_layout()
plt.savefig("augmentation_grid.png", dpi=150, bbox_inches="tight") # (#4:Save visualization)
plt.show()
# Generate visualization
visualize_augmentations(image, train_transform, n_samples=9) # (#5:Call visualization function)
[Pipeline diagram: input image -> resize to 224x224 -> geometric and photometric augmentation -> ImageNet normalization -> tensor in (C, H, W) format]
Step 7: Create Custom PyTorch Dataset Class
Implement a custom Dataset that applies transforms:
import torch
from torch.utils.data import Dataset, DataLoader
from pathlib import Path
class ImageDataset(Dataset):
"""Custom PyTorch Dataset with Albumentations transforms."""
def __init__(self, image_dir, transform=None):
self.image_dir = Path(image_dir) # (#1:Store image directory path)
self.transform = transform # (#2:Store transform pipeline)
# Collect all image paths
self.image_paths = list(self.image_dir.glob("*.jpg")) + \
list(self.image_dir.glob("*.png")) # (#3:Gather image files)
print(f"Found {len(self.image_paths)} images")
def __len__(self):
return len(self.image_paths) # (#4:Return dataset size)
def __getitem__(self, idx):
# Load image
image_path = self.image_paths[idx]
        image = cv2.imread(str(image_path)) # (#5:Load image with OpenCV)
        if image is None:
            raise FileNotFoundError(f"Could not read {image_path}")  # cv2.imread fails silently
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # (#6:Convert to RGB)
# Apply transforms
if self.transform:
transformed = self.transform(image=image) # (#7:Apply Albumentations transform)
image = transformed["image"]
# Extract label from filename or directory (example)
label = 0 # (#8:Placeholder - implement your labeling logic)
return image, label # (#9:Return image tensor and label)
# Create dataset instances
train_dataset = ImageDataset("data/train", transform=train_transform)
val_dataset = ImageDataset("data/val", transform=val_transform)
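The label placeholder in __getitem__ is intentionally left open. One common scheme (a sketch, assuming a hypothetical per-class folder layout such as data/train/cat/001.jpg) derives labels from parent directory names:
image_dir = Path("data/train")
# rglob recurses into class subfolders (the plain glob above only sees top-level files)
image_paths = sorted(image_dir.rglob("*.jpg")) + sorted(image_dir.rglob("*.png"))
class_names = sorted({p.parent.name for p in image_paths})  # One class per subdirectory
class_to_idx = {name: i for i, name in enumerate(class_names)}
# Then in __getitem__: label = class_to_idx[image_path.parent.name]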
Step 8: Initialize DataLoader with Batching
Create DataLoaders for efficient batch processing:
# Create DataLoaders
train_loader = DataLoader(
train_dataset,
batch_size=32, # (#1:Batch size - adjust based on GPU memory)
shuffle=True, # (#2:Shuffle training data each epoch)
num_workers=4, # (#3:Parallel data loading workers)
pin_memory=True, # (#4:Faster GPU transfer)
drop_last=True, # (#5:Drop incomplete last batch)
)
val_loader = DataLoader(
val_dataset,
batch_size=32,
shuffle=False, # (#6:No shuffle for validation)
num_workers=4,
pin_memory=True,
)
print(f"Training batches: {len(train_loader)}")
print(f"Validation batches: {len(val_loader)}")
Tip: Set num_workers=0 on Windows if you encounter multiprocessing issues.
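A related pitfall: on some PyTorch versions, NumPy-based randomness can repeat across workers because the RNG state is duplicated when workers fork. The defensive pattern from the PyTorch reproducibility docs is a worker_init_fn (a sketch; pass it as DataLoader(..., worker_init_fn=seed_worker)):
import random
def seed_worker(worker_id):
    # Derive a distinct per-worker seed from PyTorch's base seed
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)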
Step 9: Verify Batch Shapes and Value Ranges
Validate that the DataLoader produces correctly formatted batches:
def verify_dataloader(dataloader, name="DataLoader"):
"""Verify batch shapes and value ranges."""
print(f"\n{'='*50}")
print(f"Verifying {name}")
print(f"{'='*50}")
# Get a single batch
images, labels = next(iter(dataloader)) # (#1:Get first batch)
# Check shapes
print(f"Batch shape: {images.shape}") # (#2:Expected: (B, C, H, W))
print(f"Labels shape: {labels.shape}")
# Check data type
print(f"Image dtype: {images.dtype}") # (#3:Should be float32)
print(f"Labels dtype: {labels.dtype}")
# Check value ranges
print(f"Image min: {images.min():.4f}") # (#4:Check min value)
print(f"Image max: {images.max():.4f}") # (#5:Check max value)
print(f"Image mean: {images.mean():.4f}") # (#6:Check mean)
print(f"Image std: {images.std():.4f}") # (#7:Check std)
# Verify normalization (should be approximately mean=0, std=1)
per_channel_mean = images.mean(dim=[0, 2, 3]) # (#8:Per-channel statistics)
per_channel_std = images.std(dim=[0, 2, 3])
print(f"Per-channel mean: {per_channel_mean}")
print(f"Per-channel std: {per_channel_std}")
# Memory info
print(f"Batch memory: {images.element_size() * images.nelement() / 1e6:.2f} MB") # (#9:Memory usage)
return images, labels
train_images, train_labels = verify_dataloader(train_loader, "Training DataLoader")
val_images, val_labels = verify_dataloader(val_loader, "Validation DataLoader")
Expected: Images should have shape (32, 3, 224, 224) and values roughly centered around 0 after normalization.
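To eyeball what the model will actually see, undo the ImageNet normalization before plotting. A sketch, using the first image of the batch fetched above:
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
img = (train_images[0] * std + mean).clamp(0, 1)  # Invert Normalize back to [0, 1]
plt.imshow(img.permute(1, 2, 0).numpy())  # (C, H, W) -> (H, W, C) for matplotlib
plt.axis("off")
plt.show()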
Step 10: Benchmark Augmentation Speed
Measure the performance of your augmentation pipeline:
import time
def benchmark_augmentation(transform, image, n_iterations=1000):
"""Benchmark augmentation speed."""
# Warmup
for _ in range(10): # (#1:Warmup iterations)
transform(image=image)
# Benchmark
start_time = time.time() # (#2:Start timer)
for _ in range(n_iterations):
transform(image=image)
elapsed = time.time() - start_time # (#3:Calculate elapsed time)
images_per_second = n_iterations / elapsed # (#4:Calculate throughput)
ms_per_image = (elapsed / n_iterations) * 1000 # (#5:Calculate latency)
print(f"Processed {n_iterations} images in {elapsed:.2f}s")
print(f"Throughput: {images_per_second:.1f} images/second")
print(f"Latency: {ms_per_image:.2f} ms/image")
return images_per_second
# Benchmark different transforms
print("Training transform benchmark:")
train_speed = benchmark_augmentation(train_transform, image) # (#6:Benchmark training transforms)
print("\nValidation transform benchmark:")
val_speed = benchmark_augmentation(val_transform, image) # (#7:Benchmark validation transforms)
print(f"\nValidation is {val_speed/train_speed:.1f}x faster than training") # (#8:Compare speeds)
Performance Tip: Albumentations is optimized with OpenCV and NumPy, making it significantly faster than PIL-based alternatives.
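Note that the numbers above measure the transforms alone; disk I/O, image decoding, and batch collation are excluded. For an end-to-end figure, time a full pass over the DataLoader (a sketch):
start = time.time()
n_images = 0
for images, labels in train_loader:  # Includes disk reads, decoding, and collation
    n_images += images.size(0)
elapsed = time.time() - start
print(f"End-to-end throughput: {n_images / elapsed:.1f} images/second")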
Expected Output
After completing this practical work, you should have:
- A complete augmentation pipeline combining geometric and photometric transforms
- Training and validation transform pipelines with appropriate differences
- A visualization grid showing augmented versions of your images
- A working PyTorch Dataset class with integrated Albumentations transforms
- DataLoaders producing batches with correct shapes: (batch_size, 3, 224, 224)
- Verified value ranges (normalized values roughly between -3 and 3)
- Performance benchmarks showing augmentation throughput
Sample Output
Verifying Training DataLoader
==================================================
Batch shape: torch.Size([32, 3, 224, 224])
Labels shape: torch.Size([32])
Image dtype: torch.float32
Labels dtype: torch.int64
Image min: -2.1179
Image max: 2.6400
Image mean: 0.0234
Image std: 1.0156
Per-channel mean: tensor([-0.0312, 0.0456, 0.0348])
Per-channel std: tensor([0.9987, 1.0234, 1.0247])
Batch memory: 19.27 MB
Training transform benchmark:
Processed 1000 images in 2.34s
Throughput: 427.4 images/second
Latency: 2.34 ms/image
Deliverables
- Jupyter Notebook: Complete notebook with all code cells executed and outputs visible
- Augmentation Visualization Grid: Saved image file (augmentation_grid.png) showing the original plus 8 augmented versions
- Custom Dataset Class: Reusable ImageDataset class with proper documentation
Bonus Challenges
- Implement Mixup Augmentation: Create a custom collate function that performs Mixup between pairs of images in a batch
- Test CutMix: Implement CutMix augmentation that cuts and pastes patches between training images
- Compare with torchvision transforms: Benchmark the same augmentations using torchvision.transforms and compare speed and memory usage
- AutoAugment: Explore Albumentations' AutoAugment policies for automated augmentation selection
Mixup Implementation Hint
def mixup_data(x, y, alpha=0.2):
"""Apply Mixup augmentation to a batch."""
lam = np.random.beta(alpha, alpha) # (#1:Sample mixing coefficient)
batch_size = x.size(0)
    index = torch.randperm(batch_size, device=x.device) # (#2:Random permutation for pairing, on the same device as x)
mixed_x = lam * x + (1 - lam) * x[index] # (#3:Mix images)
y_a, y_b = y, y[index] # (#4:Keep both labels)
return mixed_x, y_a, y_b, lam # (#5:Return mixed data)
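To train on the mixed batch, interpolate the loss with the same coefficient. A sketch, where model and criterion are placeholders for your network and loss function, and (images, labels) is a batch from train_loader:
mixed_x, y_a, y_b, lam = mixup_data(images, labels, alpha=0.2)
outputs = model(mixed_x)
loss = lam * criterion(outputs, y_a) + (1 - lam) * criterion(outputs, y_b)  # Interpolated loss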