Building a Preprocessing Pipeline
Create a complete data augmentation and preprocessing pipeline for computer vision tasks
Objectives
By the end of this practical work, you will be able to:
- Build a complete augmentation pipeline with Albumentations
- Visualize augmentation effects on images
- Create a PyTorch DataLoader for efficient batch processing
- Verify batch shapes and value ranges for model compatibility
Prerequisites
- Basic understanding of Python and NumPy
- Familiarity with image processing concepts
- Completed Practical Work 1 and 2 (recommended)
Install required packages:
pip install albumentations torch torchvision
Note: The Albumentations API changes between major releases (for example, ShiftScaleRotate and the var_limit parameter of GaussNoise, both used below, are deprecated in 2.x). This practical targets the 1.x API, so adjust parameter names if you install a newer version.
Instructions
Step 1: Install Albumentations and Explore Documentation
Start by installing Albumentations and exploring its capabilities:
import albumentations as A # (#1:Import Albumentations library)
from albumentations.pytorch import ToTensorV2 # (#2:PyTorch tensor conversion)
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Check version and explore available transforms
print(f"Albumentations version: {A.__version__}") # (#3:Verify installation)
# List some available transforms
transforms_list = [name for name in dir(A) if name[0].isupper()]
print(f"Available transforms: {len(transforms_list)}") # (#4:Explore available transforms)
Note: Albumentations is optimized for speed and provides a wide variety of augmentation techniques specifically designed for computer vision tasks.
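Albumentations transforms are called with keyword arguments and return a dictionary of results, which is why the snippets below read result["image"]. A minimal sketch of the call convention, using a synthetic placeholder image:
dummy = np.zeros((100, 100, 3), dtype=np.uint8)  # Synthetic all-black RGB image
flipped = A.HorizontalFlip(p=1.0)(image=dummy)  # Pass named targets, get a dict back
print(flipped["image"].shape)  # (100, 100, 3) - same shape, mirrored horizontally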
Step 2: Define Geometric Augmentation Pipeline
Create geometric transformations that modify spatial properties:
geometric_transform = A.Compose([
A.HorizontalFlip(p=0.5), # (#1:Random horizontal flip with 50% probability)
A.VerticalFlip(p=0.1), # (#2:Random vertical flip with 10% probability)
A.Rotate(limit=30, p=0.5), # (#3:Random rotation between -30 and +30 degrees)
A.ShiftScaleRotate(
shift_limit=0.1, # (#4:Shift image by up to 10% of dimensions)
scale_limit=0.2, # (#5:Scale image by up to 20%)
rotate_limit=15, # (#6:Additional rotation up to 15 degrees)
border_mode=cv2.BORDER_REFLECT,
p=0.5
),
A.Affine(
scale=(0.9, 1.1), # (#7:Affine scaling range)
translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)}, # (#8:Translation range)
p=0.3
),
])
# Test on a sample image
image = cv2.imread("sample_image.jpg")
assert image is not None, "sample_image.jpg not found or unreadable"  # cv2.imread returns None on failure
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # (#9:Convert BGR to RGB)
transformed = geometric_transform(image=image)
transformed_image = transformed["image"] # (#10:Extract transformed image)
Step 3: Define Photometric Augmentation Pipeline
Create transformations that modify color and intensity properties:
photometric_transform = A.Compose([
A.RandomBrightnessContrast(
brightness_limit=0.2, # (#1:Adjust brightness by up to 20%)
contrast_limit=0.2, # (#2:Adjust contrast by up to 20%)
p=0.5
),
A.HueSaturationValue(
hue_shift_limit=20, # (#3:Shift hue by up to 20)
sat_shift_limit=30, # (#4:Shift saturation by up to 30)
val_shift_limit=20, # (#5:Shift value by up to 20)
p=0.5
),
A.GaussNoise(
var_limit=(10.0, 50.0), # (#6:Add Gaussian noise with variance range)
p=0.3
),
A.GaussianBlur(
blur_limit=(3, 7), # (#7:Apply Gaussian blur with kernel size 3-7)
p=0.2
),
A.CLAHE(
clip_limit=4.0, # (#8:Contrast Limited Adaptive Histogram Equalization)
p=0.3
),
A.ColorJitter(
brightness=0.2, # (#9:Random color jittering)
contrast=0.2,
saturation=0.2,
hue=0.1,
p=0.3
),
])
Warning: Be careful with noise and blur parameters - too much can degrade image quality and hurt model performance.
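One rough way to keep an eye on this is to measure how far an augmented image drifts from the original, for example with PSNR. A sketch (reuses the image loaded in Step 2 and treats one random draw from the photometric pipeline as the distortion):
augmented = photometric_transform(image=image)["image"]  # One random draw from the pipeline
mse = np.mean((image.astype(np.float64) - augmented.astype(np.float64)) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")  # Higher = closer to original
print(f"PSNR vs. original: {psnr:.1f} dB")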
Step 4: Combine into Full Training Transform
Merge geometric and photometric transforms into a complete pipeline:
train_transform = A.Compose([
# Resize to standard input size
A.Resize(height=224, width=224), # (#1:Resize to model input size)
# Geometric augmentations
A.HorizontalFlip(p=0.5), # (#2:Horizontal flip)
A.ShiftScaleRotate(
shift_limit=0.1,
scale_limit=0.2,
rotate_limit=30,
p=0.5
), # (#3:Combined geometric transform)
# Photometric augmentations
A.OneOf([ # (#4:Randomly select one of these transforms)
A.RandomBrightnessContrast(p=1),
A.HueSaturationValue(p=1),
A.ColorJitter(p=1),
], p=0.5),
A.OneOf([ # (#5:Randomly select one noise/blur transform)
A.GaussNoise(var_limit=(10.0, 30.0), p=1),
A.GaussianBlur(blur_limit=(3, 5), p=1),
A.MotionBlur(blur_limit=5, p=1),
], p=0.2),
# Normalize and convert to tensor
A.Normalize(
mean=[0.485, 0.456, 0.406], # (#6:ImageNet mean values)
std=[0.229, 0.224, 0.225], # (#7:ImageNet std values)
),
ToTensorV2(), # (#8:Convert to PyTorch tensor (C, H, W) format)
])
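The ordering matters: Resize comes first so all images share a size, Normalize operates on the NumPy image, and ToTensorV2 must come last. A quick sanity check (a sketch, reusing the image from Step 2):
sample = train_transform(image=image)["image"]  # Full pipeline, one random draw
print(f"shape={tuple(sample.shape)}, dtype={sample.dtype}")  # Expect (3, 224, 224), torch.float32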
Step 5: Create Validation Transform
Define a minimal transform for validation/test data:
val_transform = A.Compose([
A.Resize(height=224, width=224), # (#1:Resize to match training size)
A.Normalize(
mean=[0.485, 0.456, 0.406], # (#2:Same normalization as training)
std=[0.229, 0.224, 0.225],
),
ToTensorV2(), # (#3:Convert to tensor)
])
# Example usage
val_transformed = val_transform(image=image)
val_tensor = val_transformed["image"]
print(f"Validation tensor shape: {val_tensor.shape}") # (#4:Should be (3, 224, 224))
Note: Validation transforms should only include deterministic operations (resize, normalize) - no random augmentations.
Step 6: Visualize Augmented Versions
Create a visualization grid showing multiple augmented versions of the same image:
def visualize_augmentations(image, transform, n_samples=9):
"""Visualize N augmented versions of the same image."""
fig, axes = plt.subplots(3, 3, figsize=(12, 12)) # (#1:Create 3x3 grid)
axes = axes.flatten()
# Show original image first
axes[0].imshow(image)
axes[0].set_title("Original", fontsize=12)
axes[0].axis("off")
# Create visualization transform (without normalize/tensor)
viz_transform = A.Compose([
t for t in transform.transforms # (#2:Filter transforms)
if not isinstance(t, (A.Normalize, ToTensorV2))
])
# Generate augmented versions
for i in range(1, n_samples):
augmented = viz_transform(image=image) # (#3:Apply augmentation)
axes[i].imshow(augmented["image"])
axes[i].set_title(f"Augmented {i}", fontsize=12)
axes[i].axis("off")
plt.suptitle("Data Augmentation Visualization", fontsize=14, y=1.02)
plt.tight_layout()
plt.savefig("augmentation_grid.png", dpi=150, bbox_inches="tight") # (#4:Save visualization)
plt.show()
# Generate visualization
visualize_augmentations(image, train_transform, n_samples=9) # (#5:Call visualization function)
[Pipeline diagram: input image -> resize to 224x224 -> geometric and photometric augmentation -> ImageNet normalization -> tensor in (C, H, W) format]
Step 7: Create Custom PyTorch Dataset Class
Implement a custom Dataset that applies transforms:
import torch
from torch.utils.data import Dataset, DataLoader
from pathlib import Path
class ImageDataset(Dataset):
"""Custom PyTorch Dataset with Albumentations transforms."""
def __init__(self, image_dir, transform=None):
self.image_dir = Path(image_dir) # (#1:Store image directory path)
self.transform = transform # (#2:Store transform pipeline)
# Collect all image paths
self.image_paths = list(self.image_dir.glob("*.jpg")) + \
list(self.image_dir.glob("*.png")) # (#3:Gather image files)
print(f"Found {len(self.image_paths)} images")
def __len__(self):
return len(self.image_paths) # (#4:Return dataset size)
def __getitem__(self, idx):
# Load image
image_path = self.image_paths[idx]
        image = cv2.imread(str(image_path)) # (#5:Load image with OpenCV)
        if image is None:
            raise FileNotFoundError(f"Could not read {image_path}")  # cv2.imread fails silently
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # (#6:Convert to RGB)
# Apply transforms
if self.transform:
transformed = self.transform(image=image) # (#7:Apply Albumentations transform)
image = transformed["image"]
# Extract label from filename or directory (example)
label = 0 # (#8:Placeholder - implement your labeling logic)
return image, label # (#9:Return image tensor and label)
# Create dataset instances
train_dataset = ImageDataset("data/train", transform=train_transform)
val_dataset = ImageDataset("data/val", transform=val_transform)
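The label placeholder in __getitem__ is intentionally left open. One common scheme (a sketch, assuming a hypothetical per-class folder layout such as data/train/cat/001.jpg) derives labels from parent directory names:
image_dir = Path("data/train")
# rglob recurses into class subfolders (the plain glob above only sees top-level files)
image_paths = sorted(image_dir.rglob("*.jpg")) + sorted(image_dir.rglob("*.png"))
class_names = sorted({p.parent.name for p in image_paths})  # One class per subdirectory
class_to_idx = {name: i for i, name in enumerate(class_names)}
# Then in __getitem__: label = class_to_idx[image_path.parent.name]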
Step 8: Initialize DataLoader with Batching
Create DataLoaders for efficient batch processing:
# Create DataLoaders
train_loader = DataLoader(
train_dataset,
batch_size=32, # (#1:Batch size - adjust based on GPU memory)
shuffle=True, # (#2:Shuffle training data each epoch)
num_workers=4, # (#3:Parallel data loading workers)
pin_memory=True, # (#4:Faster GPU transfer)
drop_last=True, # (#5:Drop incomplete last batch)
)
val_loader = DataLoader(
val_dataset,
batch_size=32,
shuffle=False, # (#6:No shuffle for validation)
num_workers=4,
pin_memory=True,
)
print(f"Training batches: {len(train_loader)}")
print(f"Validation batches: {len(val_loader)}")
Tip: Set num_workers=0 on Windows if you encounter multiprocessing issues.
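A related pitfall: on some PyTorch versions, NumPy-based randomness can repeat across workers because the RNG state is duplicated when workers fork. The defensive pattern from the PyTorch reproducibility docs is a worker_init_fn (a sketch; pass it as DataLoader(..., worker_init_fn=seed_worker)):
import random
def seed_worker(worker_id):
    # Derive a distinct per-worker seed from PyTorch's base seed
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)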
Step 9: Verify Batch Shapes and Value Ranges
Validate that the DataLoader produces correctly formatted batches:
def verify_dataloader(dataloader, name="DataLoader"):
"""Verify batch shapes and value ranges."""
print(f"\n{'='*50}")
print(f"Verifying {name}")
print(f"{'='*50}")
# Get a single batch
images, labels = next(iter(dataloader)) # (#1:Get first batch)
# Check shapes
print(f"Batch shape: {images.shape}") # (#2:Expected: (B, C, H, W))
print(f"Labels shape: {labels.shape}")
# Check data type
print(f"Image dtype: {images.dtype}") # (#3:Should be float32)
print(f"Labels dtype: {labels.dtype}")
# Check value ranges
print(f"Image min: {images.min():.4f}") # (#4:Check min value)
print(f"Image max: {images.max():.4f}") # (#5:Check max value)
print(f"Image mean: {images.mean():.4f}") # (#6:Check mean)
print(f"Image std: {images.std():.4f}") # (#7:Check std)
# Verify normalization (should be approximately mean=0, std=1)
per_channel_mean = images.mean(dim=[0, 2, 3]) # (#8:Per-channel statistics)
per_channel_std = images.std(dim=[0, 2, 3])
print(f"Per-channel mean: {per_channel_mean}")
print(f"Per-channel std: {per_channel_std}")
# Memory info
print(f"Batch memory: {images.element_size() * images.nelement() / 1e6:.2f} MB") # (#9:Memory usage)
return images, labels
train_images, train_labels = verify_dataloader(train_loader, "Training DataLoader")
val_images, val_labels = verify_dataloader(val_loader, "Validation DataLoader")
Expected: Images should have shape (32, 3, 224, 224) and values roughly centered around 0 after normalization.
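To eyeball what the model will actually see, undo the ImageNet normalization before plotting. A sketch, using the first image of the batch fetched above:
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
img = (train_images[0] * std + mean).clamp(0, 1)  # Invert Normalize back to [0, 1]
plt.imshow(img.permute(1, 2, 0).numpy())  # (C, H, W) -> (H, W, C) for matplotlib
plt.axis("off")
plt.show()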
Step 10: Benchmark Augmentation Speed
Measure the performance of your augmentation pipeline:
import time
def benchmark_augmentation(transform, image, n_iterations=1000):
"""Benchmark augmentation speed."""
# Warmup
for _ in range(10): # (#1:Warmup iterations)
transform(image=image)
# Benchmark
start_time = time.time() # (#2:Start timer)
for _ in range(n_iterations):
transform(image=image)
elapsed = time.time() - start_time # (#3:Calculate elapsed time)
images_per_second = n_iterations / elapsed # (#4:Calculate throughput)
ms_per_image = (elapsed / n_iterations) * 1000 # (#5:Calculate latency)
print(f"Processed {n_iterations} images in {elapsed:.2f}s")
print(f"Throughput: {images_per_second:.1f} images/second")
print(f"Latency: {ms_per_image:.2f} ms/image")
return images_per_second
# Benchmark different transforms
print("Training transform benchmark:")
train_speed = benchmark_augmentation(train_transform, image) # (#6:Benchmark training transforms)
print("\nValidation transform benchmark:")
val_speed = benchmark_augmentation(val_transform, image) # (#7:Benchmark validation transforms)
print(f"\nValidation is {val_speed/train_speed:.1f}x faster than training") # (#8:Compare speeds)
Performance Tip: Albumentations is optimized with OpenCV and NumPy, making it significantly faster than PIL-based alternatives.
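Note that the numbers above measure the transforms alone; disk I/O, image decoding, and batch collation are excluded. For an end-to-end figure, time a full pass over the DataLoader (a sketch):
start = time.time()
n_images = 0
for images, labels in train_loader:  # Includes disk reads, decoding, and collation
    n_images += images.size(0)
elapsed = time.time() - start
print(f"End-to-end throughput: {n_images / elapsed:.1f} images/second")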
Expected Output
After completing this practical work, you should have:
- A complete augmentation pipeline combining geometric and photometric transforms
- Training and validation transform pipelines with appropriate differences
- A visualization grid showing augmented versions of your images
- A working PyTorch Dataset class with integrated Albumentations transforms
- DataLoaders producing batches with correct shapes: (batch_size, 3, 224, 224)
- Verified value ranges (normalized values roughly between -3 and 3)
- Performance benchmarks showing augmentation throughput
Sample Output
Verifying Training DataLoader
==================================================
Batch shape: torch.Size([32, 3, 224, 224])
Labels shape: torch.Size([32])
Image dtype: torch.float32
Labels dtype: torch.int64
Image min: -2.1179
Image max: 2.6400
Image mean: 0.0234
Image std: 1.0156
Per-channel mean: tensor([-0.0312, 0.0456, 0.0348])
Per-channel std: tensor([0.9987, 1.0234, 1.0247])
Batch memory: 19.27 MB
Training transform benchmark:
Processed 1000 images in 2.34s
Throughput: 427.4 images/second
Latency: 2.34 ms/image
Deliverables
- Jupyter Notebook: Complete notebook with all code cells executed and outputs visible
- Augmentation Visualization Grid: Saved image file (augmentation_grid.png) showing the original plus 8 augmented versions
- Custom Dataset Class: Reusable ImageDataset class with proper documentation
Bonus Challenges
- Implement Mixup Augmentation: Create a custom collate function that performs Mixup between pairs of images in a batch
- Test CutMix: Implement CutMix augmentation that cuts and pastes patches between training images
- Compare with torchvision transforms: Benchmark the same augmentations using torchvision.transforms and compare speed and memory usage
- AutoAugment: Explore Albumentations' AutoAugment policies for automated augmentation selection
Mixup Implementation Hint
def mixup_data(x, y, alpha=0.2):
"""Apply Mixup augmentation to a batch."""
lam = np.random.beta(alpha, alpha) # (#1:Sample mixing coefficient)
batch_size = x.size(0)
    index = torch.randperm(batch_size, device=x.device) # (#2:Random permutation for pairing, on the same device as x)
mixed_x = lam * x + (1 - lam) * x[index] # (#3:Mix images)
y_a, y_b = y, y[index] # (#4:Keep both labels)
return mixed_x, y_a, y_b, lam # (#5:Return mixed data)
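To train on the mixed batch, interpolate the loss with the same coefficient. A sketch, where model and criterion are placeholders for your network and loss function, and (images, labels) is a batch from train_loader:
mixed_x, y_a, y_b, lam = mixup_data(images, labels, alpha=0.2)
outputs = model(mixed_x)
loss = lam * criterion(outputs, y_a) + (1 - lam) * criterion(outputs, y_b)  # Interpolated loss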