Getting Started with Computer Vision
Set up your environment and explore the fundamentals of image processing with OpenCV and the MNIST dataset
Objectives
By the end of this practical work, you will be able to:
- Set up a development environment for computer vision projects
- Load and display images using OpenCV
- Explore the MNIST dataset and understand its structure
- Understand how images are represented as numerical arrays
Prerequisites
- Python 3.8 or higher installed
- Basic Python programming knowledge
- Familiarity with NumPy arrays (helpful but not required)
Install the required packages:
pip install opencv-python numpy matplotlib tensorflow
Instructions
Step 1: Environment Setup
Choose one of the following options to set up your development environment:
Option A: Google Colab (Recommended for beginners)
- Go to Google Colab
- Create a new notebook
- All required libraries are pre-installed
Option B: Local Environment with Virtual Environment
# Create a new project directory
mkdir computer-vision-lab
cd computer-vision-lab
# Create and activate virtual environment
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install required packages
pip install opencv-python numpy matplotlib tensorflow
Tip: Using a virtual environment keeps your project dependencies isolated and prevents conflicts with other Python projects.
Step 2: Load and Display an Image with OpenCV
Create a new Python file or notebook cell and add the following code:
import cv2 # (#1:Import OpenCV library)
import matplotlib.pyplot as plt # (#2:Import matplotlib for displaying images)
import numpy as np # (#3:Import NumPy for array operations)
# Download a sample image or use your own
# For this example, we'll create a simple gradient image
image = np.zeros((256, 256, 3), dtype=np.uint8) # (#4:Create a black image)
# Create a gradient effect
for i in range(256):
    image[i, :, 0] = i # (#5:Blue channel gradient)
    image[:, i, 1] = i # (#6:Green channel gradient)
# Display the image
plt.figure(figsize=(8, 8)) # (#7:Set figure size)
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)) # (#8:Convert BGR to RGB for matplotlib)
plt.title('Generated Gradient Image')
plt.axis('off') # (#9:Hide axis)
plt.show()
Info: OpenCV uses BGR (Blue, Green, Red) color ordering by default, while matplotlib expects RGB. Always convert when displaying OpenCV images with matplotlib.
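The channel swap mentioned above can be sketched without loading any real image. The snippet below uses a single synthetic BGR pixel and reverses the last axis with NumPy slicing, which produces the same result as `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`:

```python
import numpy as np

# A 1x1 "image" holding one pure-blue pixel in OpenCV's BGR order.
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)  # B=255, G=0, R=0

# Reversing the channel axis swaps B and R, giving the RGB order
# matplotlib expects; cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB) is equivalent.
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the blue pixel is now [0, 0, 255] in RGB order
```

If you skip the conversion, matplotlib will render blue regions as red and vice versa, which is the most common "my colors look wrong" bug when mixing the two libraries.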
Step 3: Explore Image Properties
Understand how images are represented as numerical arrays:
# Explore image properties
print(f"Image shape: {image.shape}") # (#1:Height, Width, Channels)
print(f"Image data type: {image.dtype}") # (#2:Data type - usually uint8)
print(f"Min pixel value: {image.min()}") # (#3:Minimum value in the array)
print(f"Max pixel value: {image.max()}") # (#4:Maximum value in the array)
print(f"Image size in bytes: {image.nbytes}") # (#5:Memory footprint)
# Access individual pixels
print(f"\nPixel at (100, 100): {image[100, 100]}") # (#6:BGR values at specific location)
print(f"Blue channel value: {image[100, 100, 0]}")
print(f"Green channel value: {image[100, 100, 1]}")
print(f"Red channel value: {image[100, 100, 2]}")
Expected output: You should see the image dimensions (256, 256, 3), dtype of uint8, and pixel values ranging from 0 to 255.
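The `uint8` dtype is more than a detail: values are limited to 0-255, and arithmetic wraps around (modulo 256). A minimal standalone sketch of why this matters when doing math on raw pixel arrays:

```python
import numpy as np

# uint8 pixels hold 0-255; adding past 255 wraps around silently.
a = np.array([250], dtype=np.uint8)
print(a + 10)                    # [4] -- wrapped, not 260

# Casting to a wider type first avoids the wraparound.
print(a.astype(np.int32) + 10)   # [260]
```

This is why image-processing code often converts to a wider or floating-point type before brightness adjustments or averaging, then clips back to the 0-255 range.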
Step 4: Load MNIST Dataset with Keras
Load the famous MNIST handwritten digits dataset:
from tensorflow.keras.datasets import mnist # (#1:Import MNIST from Keras)
# Load the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data() # (#2:Load training and test sets)
# Explore dataset dimensions
print("Training set:")
print(f" Images shape: {X_train.shape}") # (#3:60,000 images of 28x28 pixels)
print(f" Labels shape: {y_train.shape}") # (#4:60,000 labels)
print("\nTest set:")
print(f" Images shape: {X_test.shape}") # (#5:10,000 images)
print(f" Labels shape: {y_test.shape}")
print(f"\nPixel value range: {X_train.min()} to {X_train.max()}") # (#6:Grayscale values 0-255)
print(f"Label values: {np.unique(y_train)}") # (#7:Digits 0-9)
Note: The first time you run this, TensorFlow will download the MNIST dataset (approximately 11 MB).
Training set:
  Images shape: (60000, 28, 28)
  Labels shape: (60000,)

Test set:
  Images shape: (10000, 28, 28)
  Labels shape: (10000,)

Pixel value range: 0 to 255
Label values: [0 1 2 3 4 5 6 7 8 9]
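Since the raw pixels are `uint8` values in 0-255, a common next step (in later labs, not required here) is scaling them to floats in [0, 1]. A minimal sketch on a tiny synthetic array standing in for an MNIST batch, so it runs without the download:

```python
import numpy as np

# Tiny stand-in for an MNIST batch: uint8 values in 0-255.
batch = np.array([[0, 128, 255]], dtype=np.uint8)

# Scale to float32 in [0, 1] -- typical preprocessing before feeding
# images to a neural network.
scaled = batch.astype("float32") / 255.0
print(scaled.min(), scaled.max())  # 0.0 1.0
```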
Step 5: Visualize MNIST Samples Grid
Create a 4x4 grid to display sample images from the dataset:
# Create a 4x4 grid of sample images
fig, axes = plt.subplots(4, 4, figsize=(10, 10)) # (#1:Create subplot grid)
for i, ax in enumerate(axes.flat): # (#2:Iterate through all subplots)
    ax.imshow(X_train[i], cmap='gray') # (#3:Display image in grayscale)
    ax.set_title(f'Label: {y_train[i]}', fontsize=12) # (#4:Show the digit label)
    ax.axis('off') # (#5:Hide axes for cleaner look)
plt.suptitle('MNIST Dataset - First 16 Samples', fontsize=16) # (#6:Add main title)
plt.tight_layout() # (#7:Adjust spacing)
plt.show()
Step 6: Calculate and Plot Class Distribution
Analyze the distribution of digits in the training set:
# Calculate class distribution
unique, counts = np.unique(y_train, return_counts=True) # (#1:Count occurrences of each digit)
# Create histogram
plt.figure(figsize=(10, 6))
plt.bar(unique, counts, color='steelblue', edgecolor='black') # (#2:Create bar chart)
plt.xlabel('Digit Class', fontsize=12)
plt.ylabel('Number of Samples', fontsize=12)
plt.title('MNIST Training Set - Class Distribution', fontsize=14)
plt.xticks(unique) # (#3:Show all digit labels on x-axis)
# Add count labels on top of bars
for digit, count in zip(unique, counts): # (#4:Annotate each bar)
    plt.text(digit, count + 100, str(count), ha='center', fontsize=10)
plt.tight_layout()
plt.show()
# Print statistics
print("Class Distribution Summary:")
for digit, count in zip(unique, counts):
    percentage = count / len(y_train) * 100
    print(f"  Digit {digit}: {count:,} samples ({percentage:.1f}%)")
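When the labels are small non-negative integers, as here, `np.bincount` is a compact alternative to `np.unique(..., return_counts=True)`. A standalone sketch using toy labels in place of `y_train` (so it runs without the MNIST download):

```python
import numpy as np

# Toy labels standing in for y_train.
labels = np.array([0, 1, 1, 2, 2, 2, 9])

# minlength=10 guarantees one bin per digit class, even for
# digits that never appear in the labels.
counts = np.bincount(labels, minlength=10)
print(counts)  # [1 2 3 0 0 0 0 0 0 1]
```

Note that `np.bincount` always returns one entry per integer from 0 up to the maximum (or `minlength`), whereas `np.unique` only reports classes that actually occur.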
Step 7: Display Samples from Each Class
Show one example of each digit (0-9):
# Display one sample from each class
fig, axes = plt.subplots(2, 5, figsize=(15, 6)) # (#1:2 rows, 5 columns for digits 0-9)
for digit in range(10): # (#2:Loop through digits 0-9)
    # Find the first occurrence of this digit
    idx = np.where(y_train == digit)[0][0] # (#3:Get index of first matching sample)
    # Calculate subplot position
    row = digit // 5 # (#4:Row index)
    col = digit % 5 # (#5:Column index)
    # Display the image
    axes[row, col].imshow(X_train[idx], cmap='gray')
    axes[row, col].set_title(f'Digit: {digit}', fontsize=14, fontweight='bold')
    axes[row, col].axis('off')
plt.suptitle('One Sample from Each MNIST Class', fontsize=16)
plt.tight_layout()
plt.show()
Warning: Make sure to run the cells in order. The variables X_train and y_train must be loaded before this step.
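The indexing pattern used in Step 7 (`np.where(...)[0][0]`) and the boolean masking used in the bonus challenge (`X_train[y_train == digit]`) can be sketched on a tiny synthetic label array, so this runs without the dataset:

```python
import numpy as np

# Toy labels standing in for y_train.
y = np.array([7, 3, 3, 5, 3])

# np.where returns a tuple of index arrays; [0] takes the array for
# the first (only) axis, and a second [0] picks the first match.
first_three = np.where(y == 3)[0][0]
print(first_three)  # 1

# A boolean mask selects every matching entry at once; on the real
# data, X_train[y_train == digit] selects all images of one digit.
print((y == 3).sum())  # 3
```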
Expected Output
After completing this practical work, you should have:
- A working development environment with all required libraries
- A gradient image displayed using OpenCV and matplotlib
- Understanding of image dimensions: 60,000 training images, 10,000 test images, each 28x28 pixels
- A 4x4 grid visualization showing sample MNIST digits
- A histogram showing a roughly balanced distribution across all 10 digit classes (approximately 5,400-6,750 samples each)
- A visualization showing one representative sample from each digit class (0-9)
Success Criteria: All visualizations render correctly, and you can explain what each property (shape, dtype, min, max) tells us about the image data.
Deliverables
- Jupyter Notebook: Complete notebook (.ipynb) with all code cells executed and outputs visible
- Screenshots: Screenshots of the following visualizations:
- 4x4 MNIST samples grid
- Class distribution histogram
- Samples from each class visualization
Bonus Challenges
- Challenge 1: Fashion-MNIST
Repeat all exercises using the Fashion-MNIST dataset instead:
from tensorflow.keras.datasets import fashion_mnist
(X_train_fashion, y_train_fashion), (X_test_fashion, y_test_fashion) = fashion_mnist.load_data()
# Class names for Fashion-MNIST
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
- Challenge 2: Mean Pixel Value per Class
Calculate and compare the mean pixel value for each digit class. This can reveal interesting patterns about how different digits are typically drawn:
# Calculate mean pixel value for each class
mean_values = []
for digit in range(10):
    digit_images = X_train[y_train == digit]
    mean_val = digit_images.mean()
    mean_values.append(mean_val)
    print(f"Digit {digit}: Mean pixel value = {mean_val:.2f}")
# Visualize as a bar chart
plt.bar(range(10), mean_values)
plt.xlabel('Digit')
plt.ylabel('Mean Pixel Value')
plt.title('Mean Pixel Value by Digit Class')
plt.show()