Practical Work 3

Building a Preprocessing Pipeline

Create a complete data augmentation and preprocessing pipeline for computer vision tasks

Duration: 1.5 hours
Difficulty: Intermediate
Session 3 - Feature Engineering

Objectives

By the end of this practical work, you will be able to:

  • Build a complete augmentation pipeline with Albumentations
  • Visualize augmentation effects on images
  • Create a PyTorch DataLoader for efficient batch processing
  • Verify batch shapes and value ranges for model compatibility

Prerequisites

  • Basic understanding of Python and NumPy
  • Familiarity with image processing concepts
  • Completed Practical Work 1 and 2 (recommended)

Install required packages:

pip install albumentations torch torchvision

Instructions

Step 1: Install Albumentations and Explore Documentation

Start by installing Albumentations and exploring its capabilities:

import albumentations as A  # (#1:Import Albumentations library)
from albumentations.pytorch import ToTensorV2  # (#2:PyTorch tensor conversion)
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Check version and explore available transforms
print(f"Albumentations version: {A.__version__}")  # (#3:Verify installation)

# List some available transforms
transforms_list = [name for name in dir(A) if name[0].isupper()]
print(f"Available transforms: {len(transforms_list)}")  # (#4:Explore available transforms)

Note: Albumentations is optimized for speed and provides a wide variety of augmentation techniques specifically designed for computer vision tasks.

Step 2: Define Geometric Augmentation Pipeline

Create geometric transformations that modify spatial properties:

geometric_transform = A.Compose([
    A.HorizontalFlip(p=0.5),  # (#1:Random horizontal flip with 50% probability)
    A.VerticalFlip(p=0.1),  # (#2:Random vertical flip with 10% probability)
    A.Rotate(limit=30, p=0.5),  # (#3:Random rotation between -30 and +30 degrees)
    A.ShiftScaleRotate(
        shift_limit=0.1,  # (#4:Shift image by up to 10% of dimensions)
        scale_limit=0.2,  # (#5:Scale image by up to 20%)
        rotate_limit=15,  # (#6:Additional rotation up to 15 degrees)
        border_mode=cv2.BORDER_REFLECT,
        p=0.5
    ),
    A.Affine(
        scale=(0.9, 1.1),  # (#7:Affine scaling range)
        translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)},  # (#8:Translation range)
        p=0.3
    ),
])

# Test on a sample image
image = cv2.imread("sample_image.jpg")
assert image is not None, "sample_image.jpg not found or unreadable"  # cv2.imread returns None on failure
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # (#9:Convert BGR to RGB)
transformed = geometric_transform(image=image)
transformed_image = transformed["image"]  # (#10:Extract transformed image)
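
The sample filename here is just an example. If you have no image on disk, a minimal sketch to synthesize a test image instead (the gradients and circle are arbitrary choices):

# Optional fallback: synthesize an RGB test image if sample_image.jpg is unavailable
height, width = 480, 640
synthetic = np.zeros((height, width, 3), dtype=np.uint8)
synthetic[..., 0] = np.linspace(0, 255, width, dtype=np.uint8)            # red gradient, left to right
synthetic[..., 1] = np.linspace(0, 255, height, dtype=np.uint8)[:, None]  # green gradient, top to bottom
cv2.circle(synthetic, (width // 2, height // 2), 100, (255, 255, 255), -1)  # white reference circle
image = synthetic  # use in place of the loaded image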

Step 3: Define Photometric Augmentation Pipeline

Create transformations that modify color and intensity properties:

photometric_transform = A.Compose([
    A.RandomBrightnessContrast(
        brightness_limit=0.2,  # (#1:Adjust brightness by up to 20%)
        contrast_limit=0.2,  # (#2:Adjust contrast by up to 20%)
        p=0.5
    ),
    A.HueSaturationValue(
        hue_shift_limit=20,  # (#3:Shift hue by up to 20)
        sat_shift_limit=30,  # (#4:Shift saturation by up to 30)
        val_shift_limit=20,  # (#5:Shift value by up to 20)
        p=0.5
    ),
    A.GaussNoise(
        var_limit=(10.0, 50.0),  # (#6:Add Gaussian noise with variance range)
        p=0.3
    ),
    A.GaussianBlur(
        blur_limit=(3, 7),  # (#7:Apply Gaussian blur with kernel size 3-7)
        p=0.2
    ),
    A.CLAHE(
        clip_limit=4.0,  # (#8:Contrast Limited Adaptive Histogram Equalization)
        p=0.3
    ),
    A.ColorJitter(
        brightness=0.2,  # (#9:Random color jittering)
        contrast=0.2,
        saturation=0.2,
        hue=0.1,
        p=0.3
    ),
])
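
The photometric pipeline is applied the same way as the geometric one and leaves the spatial layout unchanged; a quick check, reusing the image from Step 2:

photo_out = photometric_transform(image=image)["image"]
print(f"Shape preserved: {photo_out.shape == image.shape}")  # photometric ops change values, not geometry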

Warning: Be careful with noise and blur parameters - too much can degrade image quality and hurt model performance.

Expected Output: Before/After Augmentation

[Figure: three panels - the original image, a brightened version, and a blurred/dimmed version]

Photometric augmentations modify brightness, contrast, saturation, and add noise/blur.

Step 4: Combine into Full Training Transform

Merge geometric and photometric transforms into a complete pipeline:

train_transform = A.Compose([
    # Resize to standard input size
    A.Resize(height=224, width=224),  # (#1:Resize to model input size)

    # Geometric augmentations
    A.HorizontalFlip(p=0.5),  # (#2:Horizontal flip)
    A.ShiftScaleRotate(
        shift_limit=0.1,
        scale_limit=0.2,
        rotate_limit=30,
        p=0.5
    ),  # (#3:Combined geometric transform)

    # Photometric augmentations
    A.OneOf([  # (#4:Randomly select one of these transforms)
        A.RandomBrightnessContrast(p=1),
        A.HueSaturationValue(p=1),
        A.ColorJitter(p=1),
    ], p=0.5),

    A.OneOf([  # (#5:Randomly select one noise/blur transform)
        A.GaussNoise(var_limit=(10.0, 30.0), p=1),
        A.GaussianBlur(blur_limit=(3, 5), p=1),
        A.MotionBlur(blur_limit=5, p=1),
    ], p=0.2),

    # Normalize and convert to tensor
    A.Normalize(
        mean=[0.485, 0.456, 0.406],  # (#6:ImageNet mean values)
        std=[0.229, 0.224, 0.225],  # (#7:ImageNet std values)
    ),
    ToTensorV2(),  # (#8:Convert to PyTorch tensor (C, H, W) format)
])
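
Before wiring the pipeline into a dataset, a quick smoke test (reusing the image from Step 2) confirms the output is a normalized float tensor:

# Smoke test: one pass through the full training pipeline
sample = train_transform(image=image)["image"]
print(f"Train tensor shape: {sample.shape}")  # expected: torch.Size([3, 224, 224])
print(f"Train tensor dtype: {sample.dtype}")  # expected: torch.float32
print(f"Value range: [{sample.min():.2f}, {sample.max():.2f}]")  # roughly [-2.2, 2.7] after normalization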

Step 5: Create Validation Transform

Define a minimal transform for validation/test data:

val_transform = A.Compose([
    A.Resize(height=224, width=224),  # (#1:Resize to match training size)
    A.Normalize(
        mean=[0.485, 0.456, 0.406],  # (#2:Same normalization as training)
        std=[0.229, 0.224, 0.225],
    ),
    ToTensorV2(),  # (#3:Convert to tensor)
])

# Example usage
val_transformed = val_transform(image=image)
val_tensor = val_transformed["image"]
print(f"Validation tensor shape: {val_tensor.shape}")  # (#4:Should be (3, 224, 224))

Note: Validation transforms should only include deterministic operations (resize, normalize) - no random augmentations.
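
One way to verify this: a deterministic pipeline must return identical tensors on repeated calls. A short check (torch is imported here just for the comparison):

import torch

# Deterministic pipelines give identical results on repeated calls
out1 = val_transform(image=image)["image"]
out2 = val_transform(image=image)["image"]
assert torch.equal(out1, out2), "val_transform should be deterministic"
print("Validation transform is deterministic")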

Step 6: Visualize Augmented Versions

Create a visualization grid showing multiple augmented versions of the same image:

def visualize_augmentations(image, transform, n_samples=9):
    """Visualize N augmented versions of the same image."""
    fig, axes = plt.subplots(3, 3, figsize=(12, 12))  # (#1:Create 3x3 grid)
    axes = axes.flatten()

    # Show original image first
    axes[0].imshow(image)
    axes[0].set_title("Original", fontsize=12)
    axes[0].axis("off")

    # Create visualization transform (without normalize/tensor)
    viz_transform = A.Compose([
        t for t in transform.transforms  # (#2:Filter transforms)
        if not isinstance(t, (A.Normalize, ToTensorV2))
    ])

    # Generate augmented versions
    for i in range(1, n_samples):
        augmented = viz_transform(image=image)  # (#3:Apply augmentation)
        axes[i].imshow(augmented["image"])
        axes[i].set_title(f"Augmented {i}", fontsize=12)
        axes[i].axis("off")

    plt.suptitle("Data Augmentation Visualization", fontsize=14, y=1.02)
    plt.tight_layout()
    plt.savefig("augmentation_grid.png", dpi=150, bbox_inches="tight")  # (#4:Save visualization)
    plt.show()

# Generate visualization
visualize_augmentations(image, train_transform, n_samples=9)  # (#5:Call visualization function)
Expected Output: 3x3 Augmentation Visualization Grid

[Figure: 3x3 grid - Original, horizontal flip, rotation, brightened, darkened, scaled, flip + rotation, blurred, and contrast-adjusted versions]

Each run produces different random augmentations; your results will vary.
Augmentation Pipeline Flow

Input Image -> Resize (224x224) -> Geometric Augment -> Photometric Augment -> Normalize (ImageNet) -> Tensor (C, H, W)

Training pipeline applies random augmentations; validation only resizes and normalizes.

Step 7: Create Custom PyTorch Dataset Class

Implement a custom Dataset that applies transforms:

import torch
from torch.utils.data import Dataset, DataLoader
from pathlib import Path

class ImageDataset(Dataset):
    """Custom PyTorch Dataset with Albumentations transforms."""

    def __init__(self, image_dir, transform=None):
        self.image_dir = Path(image_dir)  # (#1:Store image directory path)
        self.transform = transform  # (#2:Store transform pipeline)

        # Collect all image paths
        self.image_paths = sorted(list(self.image_dir.glob("*.jpg")) +
                                  list(self.image_dir.glob("*.png")))  # (#3:Gather image files in deterministic order)

        print(f"Found {len(self.image_paths)} images")

    def __len__(self):
        return len(self.image_paths)  # (#4:Return dataset size)

    def __getitem__(self, idx):
        # Load image
        image_path = self.image_paths[idx]
        image = cv2.imread(str(image_path))  # (#5:Load image with OpenCV)
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # (#6:Convert to RGB)

        # Apply transforms
        if self.transform:
            transformed = self.transform(image=image)  # (#7:Apply Albumentations transform)
            image = transformed["image"]

        # Extract label from filename or directory (example)
        label = 0  # (#8:Placeholder - see the labeling sketch below)

        return image, label  # (#9:Return image tensor and label)

# Create dataset instances
train_dataset = ImageDataset("data/train", transform=train_transform)
val_dataset = ImageDataset("data/val", transform=val_transform)
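
The label placeholder in __getitem__ depends on how your data is organized. As one example, assuming a per-class subdirectory layout such as data/train/cats/ and data/train/dogs/ (an assumption, not something this practical work mandates), labels can be derived from the parent directory name:

# Sketch: derive integer labels from class subdirectory names
# Assumes data/train/<class_name>/*.jpg|*.png - adapt to your own layout
class LabeledImageDataset(ImageDataset):
    def __init__(self, image_dir, transform=None):
        self.image_dir = Path(image_dir)
        self.transform = transform
        self.image_paths = sorted(self.image_dir.glob("*/*.jpg")) + \
                           sorted(self.image_dir.glob("*/*.png"))
        classes = sorted({p.parent.name for p in self.image_paths})
        self.class_to_idx = {name: i for i, name in enumerate(classes)}  # class name -> integer label
        print(f"Found {len(self.image_paths)} images in {len(classes)} classes")

    def __getitem__(self, idx):
        image, _ = super().__getitem__(idx)  # reuse loading and transform logic
        label = self.class_to_idx[self.image_paths[idx].parent.name]
        return image, label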

Step 8: Initialize DataLoader with Batching

Create DataLoaders for efficient batch processing:

# Create DataLoaders
train_loader = DataLoader(
    train_dataset,
    batch_size=32,  # (#1:Batch size - adjust based on GPU memory)
    shuffle=True,  # (#2:Shuffle training data each epoch)
    num_workers=4,  # (#3:Parallel data loading workers)
    pin_memory=True,  # (#4:Faster GPU transfer)
    drop_last=True,  # (#5:Drop incomplete last batch)
)

val_loader = DataLoader(
    val_dataset,
    batch_size=32,
    shuffle=False,  # (#6:No shuffle for validation)
    num_workers=4,
    pin_memory=True,
)

print(f"Training batches: {len(train_loader)}")
print(f"Validation batches: {len(val_loader)}")

Tip: Set num_workers=0 on Windows if you encounter multiprocessing issues.
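
Relatedly, if you need reproducible shuffling and augmentations across runs, here is a sketch following the standard PyTorch seeding recipe (the seed 42 is arbitrary). Albumentations draws randomness from Python's random and NumPy, so both are seeded inside each worker:

import random

def seed_worker(worker_id):
    # Derive a per-worker seed from the DataLoader's base seed
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)

reproducible_loader = DataLoader(
    train_dataset,
    batch_size=32,
    shuffle=True,
    num_workers=4,
    worker_init_fn=seed_worker,  # seeds NumPy and random in each worker
    generator=g,                 # controls the shuffling order
)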

Step 9: Verify Batch Shapes and Value Ranges

Validate that the DataLoader produces correctly formatted batches:

def verify_dataloader(dataloader, name="DataLoader"):
    """Verify batch shapes and value ranges."""
    print(f"\n{'='*50}")
    print(f"Verifying {name}")
    print(f"{'='*50}")

    # Get a single batch
    images, labels = next(iter(dataloader))  # (#1:Get first batch)

    # Check shapes
    print(f"Batch shape: {images.shape}")  # (#2:Expected: (B, C, H, W))
    print(f"Labels shape: {labels.shape}")

    # Check data type
    print(f"Image dtype: {images.dtype}")  # (#3:Should be float32)
    print(f"Labels dtype: {labels.dtype}")

    # Check value ranges
    print(f"Image min: {images.min():.4f}")  # (#4:Check min value)
    print(f"Image max: {images.max():.4f}")  # (#5:Check max value)
    print(f"Image mean: {images.mean():.4f}")  # (#6:Check mean)
    print(f"Image std: {images.std():.4f}")  # (#7:Check std)

    # Verify normalization (should be approximately mean=0, std=1)
    per_channel_mean = images.mean(dim=[0, 2, 3])  # (#8:Per-channel statistics)
    per_channel_std = images.std(dim=[0, 2, 3])
    print(f"Per-channel mean: {per_channel_mean}")
    print(f"Per-channel std: {per_channel_std}")

    # Memory info
    print(f"Batch memory: {images.element_size() * images.nelement() / 1e6:.2f} MB")  # (#9:Memory usage)

    return images, labels

train_images, train_labels = verify_dataloader(train_loader, "Training DataLoader")
val_images, val_labels = verify_dataloader(val_loader, "Validation DataLoader")

Expected: Images should have shape (32, 3, 224, 224) and values roughly centered around 0 after normalization.
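
To inspect a batch visually, the ImageNet normalization has to be undone first; a minimal sketch using the training batch returned above:

def denormalize(tensor, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Invert A.Normalize so a tensor can be displayed with matplotlib."""
    mean = torch.tensor(mean).view(3, 1, 1)
    std = torch.tensor(std).view(3, 1, 1)
    return (tensor * std + mean).clamp(0, 1)

# Display the first four images of the verified training batch
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, img in zip(axes, train_images[:4]):
    ax.imshow(denormalize(img).permute(1, 2, 0).numpy())  # (C, H, W) -> (H, W, C)
    ax.axis("off")
plt.show()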

Expected Batch Properties

Shape: (32, 3, 224, 224) - (batch, channels, height, width)
Min value: ~ -2.1
Mean: ~ 0
Max value: ~ 2.6
dtype: float32

After ImageNet normalization, pixel values are centered around 0. Your exact values may vary.

Step 10: Benchmark Augmentation Speed

Measure the performance of your augmentation pipeline:

import time

def benchmark_augmentation(transform, image, n_iterations=1000):
    """Benchmark augmentation speed."""
    # Warmup
    for _ in range(10):  # (#1:Warmup iterations)
        transform(image=image)

    # Benchmark
    start_time = time.time()  # (#2:Start timer)
    for _ in range(n_iterations):
        transform(image=image)
    elapsed = time.time() - start_time  # (#3:Calculate elapsed time)

    images_per_second = n_iterations / elapsed  # (#4:Calculate throughput)
    ms_per_image = (elapsed / n_iterations) * 1000  # (#5:Calculate latency)

    print(f"Processed {n_iterations} images in {elapsed:.2f}s")
    print(f"Throughput: {images_per_second:.1f} images/second")
    print(f"Latency: {ms_per_image:.2f} ms/image")

    return images_per_second

# Benchmark different transforms
print("Training transform benchmark:")
train_speed = benchmark_augmentation(train_transform, image)  # (#6:Benchmark training transforms)

print("\nValidation transform benchmark:")
val_speed = benchmark_augmentation(val_transform, image)  # (#7:Benchmark validation transforms)

print(f"\nValidation is {val_speed/train_speed:.1f}x faster than training")  # (#8:Compare speeds)

Performance Tip: Albumentations is optimized with OpenCV and NumPy, making it significantly faster than PIL-based alternatives.

Expected Output: Benchmark Comparison

Training transform: ~450 images/second
Validation transform: ~1200 images/second
(images per second; higher is better)

The validation transform is roughly 2-3x faster because it skips the random augmentations. Your speeds depend on hardware.
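
Note that this benchmark times the transform alone on an in-memory image; end-to-end loading speed also depends on disk I/O and worker parallelism. A sketch that times the DataLoader itself (throughput will vary with num_workers and storage):

def benchmark_dataloader(dataloader, max_batches=50):
    """Measure end-to-end loading throughput, including I/O and workers."""
    start = time.time()
    n_images = 0
    for i, (images, labels) in enumerate(dataloader):
        n_images += images.size(0)
        if i + 1 >= max_batches:
            break
    elapsed = time.time() - start
    print(f"Loaded {n_images} images in {elapsed:.2f}s "
          f"({n_images / elapsed:.1f} images/second end-to-end)")

benchmark_dataloader(train_loader)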

Expected Output

After completing this practical work, you should have:

  • A complete augmentation pipeline combining geometric and photometric transforms
  • Training and validation transform pipelines with appropriate differences
  • A visualization grid showing augmented versions of your images
  • A working PyTorch Dataset class with integrated Albumentations transforms
  • DataLoaders producing batches with correct shapes: (batch_size, 3, 224, 224)
  • Verified value ranges (normalized values roughly between -3 and 3)
  • Performance benchmarks showing augmentation throughput

Sample Output

==================================================
Verifying Training DataLoader
==================================================
Batch shape: torch.Size([32, 3, 224, 224])
Labels shape: torch.Size([32])
Image dtype: torch.float32
Labels dtype: torch.int64
Image min: -2.1179
Image max: 2.6400
Image mean: 0.0234
Image std: 1.0156
Per-channel mean: tensor([-0.0312,  0.0456,  0.0348])
Per-channel std: tensor([0.9987, 1.0234, 1.0247])
Batch memory: 19.27 MB

Training transform benchmark:
Processed 1000 images in 2.34s
Throughput: 427.4 images/second
Latency: 2.34 ms/image

Deliverables

  • Jupyter Notebook: Complete notebook with all code cells executed and outputs visible
  • Augmentation Visualization Grid: Saved image file (augmentation_grid.png) showing 9 augmented versions
  • Custom Dataset Class: Reusable ImageDataset class with proper documentation

Bonus Challenges

  • Implement Mixup Augmentation: Create a custom collate function that performs Mixup between pairs of images in a batch
  • Test CutMix: Implement CutMix augmentation that cuts and pastes patches between training images
  • Compare with torchvision transforms: Benchmark the same augmentations using torchvision.transforms and compare speed and memory usage
  • AutoAugment: Explore learned augmentation policies such as AutoAugment (available in torchvision.transforms) for automated augmentation selection

Mixup Implementation Hint

def mixup_data(x, y, alpha=0.2):
    """Apply Mixup augmentation to a batch."""
    lam = np.random.beta(alpha, alpha)  # (#1:Sample mixing coefficient)
    batch_size = x.size(0)
    index = torch.randperm(batch_size)  # (#2:Random permutation for pairing)

    mixed_x = lam * x + (1 - lam) * x[index]  # (#3:Mix images)
    y_a, y_b = y, y[index]  # (#4:Keep both labels)

    return mixed_x, y_a, y_b, lam  # (#5:Return mixed data)
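
When training with Mixup, the loss is computed against both label sets and weighted by lam; a sketch of the corresponding training-step fragment (model and criterion, e.g. nn.CrossEntropyLoss, are assumed to exist):

# Sketch: using mixup_data inside a training step (model and criterion assumed)
mixed_x, y_a, y_b, lam = mixup_data(images, labels, alpha=0.2)
outputs = model(mixed_x)
loss = lam * criterion(outputs, y_a) + (1 - lam) * criterion(outputs, y_b)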

Resources