Practical Work 1

Getting Started with Computer Vision

Set up your environment and explore the fundamentals of image processing with OpenCV and the MNIST dataset

Duration: 45 minutes
Difficulty: Beginner
Session 1 - Introduction

Objectives

By the end of this practical work, you will be able to:

  • Set up a development environment for computer vision projects
  • Load and display images using OpenCV
  • Explore the MNIST dataset and understand its structure
  • Understand how images are represented as numerical arrays

Prerequisites

  • Python 3.8 or higher installed
  • Basic Python programming knowledge
  • Familiarity with NumPy arrays (helpful but not required)

Install the required packages:

pip install opencv-python numpy matplotlib tensorflow

Instructions

Step 1: Environment Setup

Choose one of the following options to set up your development environment:

Option A: Google Colab (Recommended for beginners)

  1. Go to Google Colab
  2. Create a new notebook
  3. All required libraries are pre-installed

Option B: Local Environment with Virtual Environment

# Create a new project directory
mkdir computer-vision-lab
cd computer-vision-lab

# Create and activate virtual environment
python -m venv venv

# On Windows:
venv\Scripts\activate

# On macOS/Linux:
source venv/bin/activate

# Install required packages
pip install opencv-python numpy matplotlib tensorflow

Tip: Using a virtual environment keeps your project dependencies isolated and prevents conflicts with other Python projects.
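
Before moving on, you can confirm that each package installed correctly. The sketch below queries installed versions by their PyPI distribution names (note that `opencv-python` installs the module you import as `cv2`); it is a quick sanity check, not part of the graded work.

```python
from importlib.metadata import version, PackageNotFoundError

# PyPI distribution names (the OpenCV package is "opencv-python",
# but the module you import is "cv2").
packages = ["opencv-python", "numpy", "matplotlib", "tensorflow"]

for pkg in packages:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: NOT INSTALLED")
```

If any line prints `NOT INSTALLED`, re-run the `pip install` command above inside the activated virtual environment.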

Step 2: Load and Display an Image with OpenCV

Create a new Python file or notebook cell and add the following code:

import cv2  # (#1:Import OpenCV library)
import matplotlib.pyplot as plt  # (#2:Import matplotlib for displaying images)
import numpy as np  # (#3:Import NumPy for array operations)

# Download a sample image or use your own
# For this example, we'll create a simple gradient image
image = np.zeros((256, 256, 3), dtype=np.uint8)  # (#4:Create a black image)

# Create a gradient effect
for i in range(256):
    image[i, :, 0] = i  # (#5:Blue channel gradient)
    image[:, i, 1] = i  # (#6:Green channel gradient)

# Display the image
plt.figure(figsize=(8, 8))  # (#7:Set figure size)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))  # (#8:Convert BGR to RGB for matplotlib)
plt.title('Generated Gradient Image')
plt.axis('off')  # (#9:Hide axis)
plt.show()

Info: OpenCV uses BGR (Blue, Green, Red) color ordering by default, while matplotlib expects RGB. Always convert when displaying OpenCV images with matplotlib.

Expected Output: Gradient Image
256x256 pixels - Blue channel increases top-to-bottom, Green channel increases left-to-right
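
The BGR/RGB distinction is easy to see directly in NumPy: for a 3-channel image, `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)` is equivalent to reversing the channel axis. A minimal sketch using a single made-up pixel (no OpenCV required):

```python
import numpy as np

# One-pixel BGR "image": blue=200, green=100, red=50.
bgr = np.array([[[200, 100, 50]]], dtype=np.uint8)

# Reversing the last (channel) axis swaps B and R, which is exactly
# what COLOR_BGR2RGB does for a 3-channel image.
rgb = bgr[..., ::-1]

print(bgr[0, 0])  # [200 100  50]  (B, G, R)
print(rgb[0, 0])  # [ 50 100 200]  (R, G, B)
```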

Step 3: Explore Image Properties

Understand how images are represented as numerical arrays:

# Explore image properties
print(f"Image shape: {image.shape}")  # (#1:Height, Width, Channels)
print(f"Image data type: {image.dtype}")  # (#2:Data type - usually uint8)
print(f"Min pixel value: {image.min()}")  # (#3:Minimum value in the array)
print(f"Max pixel value: {image.max()}")  # (#4:Maximum value in the array)
print(f"Image size in bytes: {image.nbytes}")  # (#5:Memory footprint)

# Access individual pixels
print(f"\nPixel at (100, 100): {image[100, 100]}")  # (#6:BGR values at specific location)
print(f"Blue channel value: {image[100, 100, 0]}")
print(f"Green channel value: {image[100, 100, 1]}")
print(f"Red channel value: {image[100, 100, 2]}")

Expected output: You should see the image dimensions (256, 256, 3), dtype of uint8, and pixel values ranging from 0 to 255.
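
Because an image is just a NumPy array, cropping a region or extracting a channel is ordinary slicing. This sketch rebuilds the gradient image from Step 2 so it runs on its own:

```python
import numpy as np

# Rebuild the gradient image from Step 2.
image = np.zeros((256, 256, 3), dtype=np.uint8)
for i in range(256):
    image[i, :, 0] = i  # blue increases top-to-bottom
    image[:, i, 1] = i  # green increases left-to-right

# Crop a 100x100 region: rows 50-149, columns 50-149.
crop = image[50:150, 50:150]
print(crop.shape)     # (100, 100, 3)

# Extract the blue channel as a 2D array.
blue = image[:, :, 0]
print(blue.shape)     # (256, 256)
print(blue[200, 10])  # 200 -- equals the row index
```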

Step 4: Load MNIST Dataset with Keras

Load the famous MNIST handwritten digits dataset:

from tensorflow.keras.datasets import mnist  # (#1:Import MNIST from Keras)

# Load the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()  # (#2:Load training and test sets)

# Explore dataset dimensions
print("Training set:")
print(f"  Images shape: {X_train.shape}")  # (#3:60,000 images of 28x28 pixels)
print(f"  Labels shape: {y_train.shape}")  # (#4:60,000 labels)

print("\nTest set:")
print(f"  Images shape: {X_test.shape}")  # (#5:10,000 images)
print(f"  Labels shape: {y_test.shape}")

print(f"\nPixel value range: {X_train.min()} to {X_train.max()}")  # (#6:Grayscale values 0-255)
print(f"Label values: {np.unique(y_train)}")  # (#7:Digits 0-9)

Note: The first time you run this, TensorFlow will download the MNIST dataset (approximately 11 MB).

Expected Console Output
Training set:
  Images shape: (60000, 28, 28)
  Labels shape: (60000,)

Test set:
  Images shape: (10000, 28, 28)
  Labels shape: (10000,)

Pixel value range: 0 to 255
Label values: [0 1 2 3 4 5 6 7 8 9]
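
A common next step after loading MNIST is to scale pixel values from [0, 255] down to [0.0, 1.0] before training a model. The sketch below uses a random stand-in array so it runs without downloading the dataset; with the real data you would apply the same division to `X_train` and `X_test`.

```python
import numpy as np

# Stand-in for X_train: random uint8 "images" with the same dtype and range.
X = np.random.randint(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Scale to [0, 1]: convert to float first, then divide by the max value.
X_norm = X.astype("float32") / 255.0

print(X_norm.dtype)                # float32
print(X_norm.min(), X_norm.max())  # both within [0.0, 1.0]
```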

Step 5: Visualize MNIST Samples Grid

Create a 4x4 grid to display sample images from the dataset:

# Create a 4x4 grid of sample images
fig, axes = plt.subplots(4, 4, figsize=(10, 10))  # (#1:Create subplot grid)

for i, ax in enumerate(axes.flat):  # (#2:Iterate through all subplots)
    ax.imshow(X_train[i], cmap='gray')  # (#3:Display image in grayscale)
    ax.set_title(f'Label: {y_train[i]}', fontsize=12)  # (#4:Show the digit label)
    ax.axis('off')  # (#5:Hide axes for cleaner look)

plt.suptitle('MNIST Dataset - First 16 Samples', fontsize=16)  # (#6:Add main title)
plt.tight_layout()  # (#7:Adjust spacing)
plt.show()

Expected Output: 4x4 MNIST Sample Grid
First 16 samples from the MNIST training set (28x28 grayscale), shown in dataset order with labels 5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7. `load_data()` always returns the samples in the same order, so everyone sees these same digits.

Step 6: Calculate and Plot Class Distribution

Analyze the distribution of digits in the training set:

# Calculate class distribution
unique, counts = np.unique(y_train, return_counts=True)  # (#1:Count occurrences of each digit)

# Create histogram
plt.figure(figsize=(10, 6))
plt.bar(unique, counts, color='steelblue', edgecolor='black')  # (#2:Create bar chart)
plt.xlabel('Digit Class', fontsize=12)
plt.ylabel('Number of Samples', fontsize=12)
plt.title('MNIST Training Set - Class Distribution', fontsize=14)
plt.xticks(unique)  # (#3:Show all digit labels on x-axis)

# Add count labels on top of bars
for i, (digit, count) in enumerate(zip(unique, counts)):  # (#4:Annotate each bar)
    plt.text(digit, count + 100, str(count), ha='center', fontsize=10)

plt.tight_layout()
plt.show()

# Print statistics
print("Class Distribution Summary:")
for digit, count in zip(unique, counts):
    percentage = count / len(y_train) * 100
    print(f"  Digit {digit}: {count:,} samples ({percentage:.1f}%)")

Expected Output: Class Distribution Histogram
Bar heights for digits 0-9: 5,923; 6,742; 5,958; 6,131; 5,842; 5,421; 5,918; 6,265; 5,851; 5,949. MNIST is well balanced, with roughly 6,000 samples per class; the dataset is fixed, so your counts should match these exactly.
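
One way to quantify "well balanced" is the imbalance ratio: the largest class count divided by the smallest. Using the per-class counts printed by the code above:

```python
import numpy as np

# Per-class counts for digits 0-9 in the MNIST training set.
counts = np.array([5923, 6742, 5958, 6131, 5842,
                   5421, 5918, 6265, 5851, 5949])

# Largest class (digit 1) over smallest class (digit 5).
ratio = counts.max() / counts.min()
print(f"Imbalance ratio: {ratio:.2f}")  # 1.24 -- close to 1, i.e. balanced
```

A ratio near 1 means no class dominates; heavily skewed datasets can reach ratios of 10 or more and usually need resampling or class weighting.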

Step 7: Display Samples from Each Class

Show one example of each digit (0-9):

# Display one sample from each class
fig, axes = plt.subplots(2, 5, figsize=(15, 6))  # (#1:2 rows, 5 columns for digits 0-9)

for digit in range(10):  # (#2:Loop through digits 0-9)
    # Find the first occurrence of this digit
    idx = np.where(y_train == digit)[0][0]  # (#3:Get index of first matching sample)

    # Calculate subplot position
    row = digit // 5  # (#4:Row index)
    col = digit % 5   # (#5:Column index)

    # Display the image
    axes[row, col].imshow(X_train[idx], cmap='gray')
    axes[row, col].set_title(f'Digit: {digit}', fontsize=14, fontweight='bold')
    axes[row, col].axis('off')

plt.suptitle('One Sample from Each MNIST Class', fontsize=16)
plt.tight_layout()
plt.show()

Warning: Make sure to run the cells in order. The variables X_train and y_train must be loaded before this step.

Expected Output: One Sample from Each Class
2x5 grid showing the first training sample found for each digit class (0-9). The dataset order is fixed, so everyone sees the same ten samples.
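
The `np.where(y_train == digit)[0][0]` idiom in Step 7 deserves a closer look: `np.where` on a boolean mask returns a tuple of index arrays, so the first `[0]` unpacks the tuple and the second takes the first matching index. A tiny illustration:

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9, 2, 1, 3])

# np.where returns a tuple of index arrays; [0] unpacks it,
# the second [0] takes the first match.
first = np.where(labels == 1)[0][0]
print(first)  # 3

# np.argmax on the boolean mask is an equivalent one-step idiom.
print(np.argmax(labels == 1))  # 3
```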

Expected Output

After completing this practical work, you should have:

  • A working development environment with all required libraries
  • A gradient image displayed using OpenCV and matplotlib
  • Understanding of image dimensions: 60,000 training images, 10,000 test images, each 28x28 pixels
  • A 4x4 grid visualization showing sample MNIST digits
  • A histogram showing approximately equal distribution across all 10 digit classes (~5,400-6,700 samples each)
  • A visualization showing one representative sample from each digit class (0-9)

Success Criteria: All visualizations render correctly, and you can explain what each property (shape, dtype, min, max) tells us about the image data.

Deliverables

  • Jupyter Notebook: Complete notebook (.ipynb) with all code cells executed and outputs visible
  • Screenshots: Screenshots of the following visualizations:
    • 4x4 MNIST samples grid
    • Class distribution histogram
    • Samples from each class visualization

Bonus Challenges

  • Challenge 1: Fashion-MNIST

    Repeat all exercises using the Fashion-MNIST dataset instead:

    from tensorflow.keras.datasets import fashion_mnist
    (X_train_fashion, y_train_fashion), (X_test_fashion, y_test_fashion) = fashion_mnist.load_data()
    
    # Class names for Fashion-MNIST
    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
  • Challenge 2: Mean Pixel Value per Class

    Calculate and compare the mean pixel value for each digit class. This can reveal interesting patterns about how different digits are typically drawn:

    # Calculate mean pixel value for each class
    mean_values = []
    for digit in range(10):
        digit_images = X_train[y_train == digit]
        mean_val = digit_images.mean()
        mean_values.append(mean_val)
        print(f"Digit {digit}: Mean pixel value = {mean_val:.2f}")
    
    # Visualize as a bar chart
    plt.bar(range(10), mean_values)
    plt.xlabel('Digit')
    plt.ylabel('Mean Pixel Value')
    plt.title('Mean Pixel Value by Digit Class')
    plt.show()
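
For Challenge 1, Fashion-MNIST labels are integers 0-9 just like MNIST, so the `class_names` list maps each label to a readable title (useful for subplot titles in the grid from Step 5). A sketch with stand-in labels, so no download is needed:

```python
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Stand-in labels; with the real data these come from y_train_fashion.
sample_labels = [9, 0, 3]
for label in sample_labels:
    print(f"Label {label} -> {class_names[label]}")  # e.g. 9 -> Ankle boot
```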

Resources