Computer Vision for Business

Session 3: Hands-on Cloud Vision APIs

Building CV applications with cloud APIs from Google, AWS, Azure, and Anthropic

Session 3: Learning Objectives

Today's goals (3 hours)

Use Cloud APIs

Effectively leverage cloud-based CV services

Compare Providers

Understand offerings from major providers

Build Applications

Create functional CV-powered tools

Session Structure: Overview (45min) + Hands-on Labs (1h45) + Optimization (30min)

Session 3 Roadmap

Our journey today

1. Cloud API Overview (30min)
2. Provider Comparison (15min)
3. Lab 1: OCR (25min)
4. Lab 2: Products (30min)
5. Lab 3: Scenes (30min)
6. Lab 4: Full App (20min)
7. Cost Optimization (30min)

Why Use Cloud Vision APIs?

Benefits over building from scratch

1 No ML Expertise

Pre-trained models, production-ready from day one

No need to hire data scientists or ML engineers

2 Fast Time-to-Market

Minutes to integrate, instant scaling

Focus on business logic, not infrastructure

3 Cost Efficiency

Pay per use, no infrastructure costs

Start small, scale automatically

4 Enterprise Ready

SLAs, security certifications

Compliance and support included

How Cloud Vision APIs Work

The request-response cycle

1. Your App: load & encode the image
2. API Request: HTTP POST
3. AI Model: process the image
4. Analysis: extract features
5. Results: generate analysis
6. API Reply: JSON response
7. Your App: parse & use the results
Key Insight: All cloud vision APIs follow this pattern - differences are in pricing, features, and model quality
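
To make the cycle concrete, here is a minimal sketch of steps 1-7 as a raw HTTP call against Anthropic's Messages API (the SDK used in today's labs wraps exactly this request). It assumes an ANTHROPIC_API_KEY environment variable and a local receipt.jpg.

import base64
import os
import requests

# Steps 1-2: load, encode, and POST the image
with open("receipt.jpg", "rb") as f:
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 512,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_b64,
                }},
                {"type": "text",
                 "text": "Describe this image in one sentence."},
            ],
        }],
    },
)

# Steps 6-7: parse the JSON reply
print(response.json()["content"][0]["text"])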

The Cloud Vision Landscape

Major players in the ecosystem

Traditional CV APIs

Specialized, predefined tasks with structured outputs

Google Cloud Vision Amazon Rekognition Azure Computer Vision

Multimodal LLMs

Open-ended reasoning, natural language interaction

Claude Vision GPT-4 Vision Google Gemini

Specialized Services

Purpose-built for specific use cases

Document AI Face APIs AutoML / Custom Labels

Google Cloud Vision

Comprehensive traditional CV API

Strengths

  • OCR: Industry-leading text extraction
  • Labels: 10,000+ object categories
  • Web Entities: Find similar images online
  • Document AI: Structured extraction

Best For

  • Document processing
  • Content moderation
  • Landmark recognition
  • Product search

Pricing: ~$1.50/1,000 images
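
A minimal OCR sketch with the google-cloud-vision client library, assuming pip install google-cloud-vision and configured Google Cloud credentials (e.g. GOOGLE_APPLICATION_CREDENTIALS):

from google.cloud import vision

def google_ocr(image_path: str) -> str:
    """Extract text with Google Cloud Vision's text detection."""
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)
    # The first annotation holds the full extracted text
    if response.text_annotations:
        return response.text_annotations[0].description
    return ""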

Amazon Rekognition

AWS's computer vision service

Strengths

  • Face Analysis: Emotions, age, attributes
  • Celebrity Recognition: 100K+ celebrities
  • Video Analysis: Real-time streaming
  • Custom Labels: Train your own models

Best For

  • Security & surveillance
  • Media & entertainment
  • User verification
  • Video content analysis

Pricing: Tier-based, volume discounts
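
A minimal face-analysis sketch with boto3, assuming pip install boto3 and configured AWS credentials:

import boto3

def rekognition_faces(image_path: str) -> list:
    """Detect faces and their attributes with Amazon Rekognition."""
    client = boto3.client("rekognition")
    with open(image_path, "rb") as f:
        response = client.detect_faces(
            Image={"Bytes": f.read()},
            Attributes=["ALL"],  # emotions, age range, etc.
        )
    return response["FaceDetails"]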

Azure Computer Vision

Microsoft's vision intelligence

Strengths

  • Dense Captioning: Detailed descriptions
  • Spatial Analysis: People counting, zones
  • Read API: Multi-language OCR
  • Image Analysis 4.0: Florence foundation model

Best For

  • Accessibility (alt text)
  • Retail analytics
  • Multi-language documents
  • Enterprise integration

Pricing: Competitive, good free tier
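
A minimal captioning sketch against the Azure Computer Vision v3.2 REST API; AZURE_CV_ENDPOINT and AZURE_CV_KEY are assumed environment variables holding your resource's endpoint and subscription key:

import os
import requests

def azure_caption(image_path: str) -> str:
    """Get an image caption from Azure Computer Vision."""
    endpoint = os.environ["AZURE_CV_ENDPOINT"]
    key = os.environ["AZURE_CV_KEY"]
    with open(image_path, "rb") as f:
        response = requests.post(
            f"{endpoint}/vision/v3.2/analyze",
            params={"visualFeatures": "Description"},
            headers={
                "Ocp-Apim-Subscription-Key": key,
                "Content-Type": "application/octet-stream",
            },
            data=f.read(),
        )
    response.raise_for_status()
    captions = response.json()["description"]["captions"]
    return captions[0]["text"] if captions else ""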

Multimodal LLMs for Vision

The new generation: Claude, GPT-4V, Gemini

Flexible Input

Any image type Natural language prompt Context & examples

Advanced Reasoning

Understands context Follows instructions Multi-step reasoning

Structured Output

JSON on request Natural language Code generation
Key Advantage: No predefined tasks - ask anything about images!

Provider Comparison

Side-by-side comparison

Provider | Type | Best Features | Pricing Model
Google Cloud Vision | Traditional | OCR, labels, web entities | Per image
Amazon Rekognition | Traditional | Face, video, custom labels | Tiered
Azure CV | Traditional | Captioning, spatial, reading | Per transaction
Claude/GPT-4V/Gemini | Multimodal LLM | Open-ended reasoning | Per token

Choosing the Right Service

Match your needs to the right API

Document/OCR Processing

Recommended: Google Cloud Vision, Azure Read API

Best for receipts, invoices, forms, ID cards

Face Analysis & Verification

Recommended: Amazon Rekognition

Best for security, user verification, demographics

Open-ended Analysis

Recommended: Claude Vision, GPT-4 Vision

Best for complex reasoning, custom tasks, products

Video & Streaming

Recommended: Amazon Rekognition Video

Best for surveillance, live streams, media analysis

Use Case to Service Mapping

Real-world applications matched to APIs

Use Case | Recommended Service
Receipt/Invoice OCR | Google Document AI, Azure Form Recognizer (now Document Intelligence)
Face verification | Amazon Rekognition Face
Product analysis | Claude Vision, GPT-4 Vision
Content moderation | Google SafeSearch, Amazon Rekognition Moderation
Retail foot traffic | Azure Spatial Analysis
Complex reasoning | Claude Vision, Gemini Pro Vision

Hands-on Labs Overview

What we'll build today

1 Lab 1: OCR

Receipt Processing → Extract text → Parse structure

2 Lab 2: Products

Product Cataloging → Analyze attributes → Generate metadata

3 Lab 3: Scenes

Retail Analysis → Understand context → Business insights

4 Lab 4: Full App

ExpenseTracker → Combine skills → Production code

Lab Setup

Environment preparation

# Install required packages
# pip install anthropic pillow requests

import os
from pathlib import Path
import base64

# Create working directory
WORK_DIR = Path("cv_workshop")
WORK_DIR.mkdir(exist_ok=True)

# Sample images for testing
SAMPLE_IMAGES = [
    "receipt.jpg",      # For OCR
    "products.jpg",     # For object detection
    "storefront.jpg",   # For scene analysis
    "document.pdf"      # For document processing
]

Image Encoding Helper

Converting images to Base64 for APIs

Image File (JPG/PNG/WebP) → Read Binary ("rb" mode) → Base64 Encode (standard_b64encode) → UTF-8 String (ready for API)

def encode_image(image_path: str) -> str:
    """Encode image to base64 string for API calls."""
    with open(image_path, "rb") as image_file:
        return base64.standard_b64encode(
            image_file.read()
        ).decode("utf-8")

# Usage
image_data = encode_image("receipt.jpg")
# Returns: "/9j/4AAQSkZJRg..." (base64 string)

Lab 1: Document Processing with OCR

Extract structured data from receipts

Receipt Image (input) → Claude Vision (API call) → Analyze & Extract → Structured JSON (output)

Extracts

  • Vendor Name
  • Date
  • Line Items
  • Total/Tax
Why Claude for OCR? Handles messy layouts, understands context, outputs clean JSON

Lab 1: OCR Function Setup

import anthropic
from pathlib import Path

def extract_receipt_data(image_path: str) -> str:
    """Extract structured receipt data; returns Claude's JSON string."""
    client = anthropic.Anthropic()

    # Determine media type from extension
    ext = Path(image_path).suffix.lower()
    media_types = {
        '.jpg': 'image/jpeg',
        '.jpeg': 'image/jpeg',
        '.png': 'image/png',
        '.webp': 'image/webp'
    }
    media_type = media_types.get(ext, 'image/jpeg')

    # Encode image to base64
    image_data = encode_image(image_path)

Lab 1: Making the API Call

    # Call Claude Vision API
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": media_type,
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "..."  # Prompt on next slide
                }
            ]
        }]
    )

    return response.content[0].text

Lab 1: The OCR Prompt

Structured prompts get structured results

# The prompt that extracts structured data
prompt = """
Analyze this receipt and extract as JSON:
{
    "vendor_name": "",
    "date": "",
    "items": [
        {"name": "", "price": 0.00}
    ],
    "subtotal": 0.00,
    "tax": 0.00,
    "total": 0.00
}
Return only valid JSON, no additional text.
If any field is unclear, use null.
"""
Pro Tip: Show the exact JSON schema you want - Claude follows it precisely

Lab 1: Sample Output

What Claude returns from a restaurant receipt

{
    "vendor_name": "Café de Flore",
    "date": "2024-03-15",
    "items": [
        {"name": "Espresso", "price": 4.50},
        {"name": "Croissant", "price": 3.20},
        {"name": "Orange Juice", "price": 5.00}
    ],
    "subtotal": 12.70,
    "tax": 1.27,
    "total": 13.97
}

Lab 2: Product Analysis & Cataloging

Automate e-commerce product metadata

Product Photo (input) → Claude Vision (analyze) → Extract Data → Listing Ready (output)

Category Classification

Automatic product categorization

Attribute Extraction

Colors, materials, sizes

Marketing Copy

Compelling descriptions

SEO Keywords

Search optimization

Lab 2: Product Analysis Code

def analyze_product(image_path: str) -> str:
    """Analyze a product image for e-commerce cataloging."""
    client = anthropic.Anthropic()
    image_data = encode_image(image_path)

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1500,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_data
                }},
                {"type": "text", "text": """..."""}  # Next slide
            ]
        }]
    )
    return response.content[0].text

Lab 2: Product Analysis Prompt

prompt = """
Analyze this product for e-commerce listing.
Return JSON with:

{
    "category": "Main category",
    "subcategory": "Specific subcategory",
    "attributes": {
        "colors": [],
        "materials": [],
        "size_estimate": ""
    },
    "marketing_title": "Max 10 words, compelling",
    "description": "2-3 sentences, benefit-focused",
    "search_keywords": ["10", "relevant", "keywords"]
}

Be accurate about visible attributes.
Don't guess what you can't see clearly.
"""

Lab 2: Sample Product Output

{
    "category": "Fashion",
    "subcategory": "Women's Handbags",
    "attributes": {
        "colors": ["cognac brown", "gold accents"],
        "materials": ["leather", "metal hardware"],
        "size_estimate": "medium, ~30cm width"
    },
    "marketing_title": "Elegant Cognac Leather Tote with Gold Hardware",
    "description": "A sophisticated everyday tote crafted from
    premium cognac leather. Features secure zip closure and
    elegant gold-tone hardware for timeless style.",
    "search_keywords": ["leather tote", "brown handbag",
    "cognac bag", "gold hardware", "women's purse", ...]
}

Lab 2: Batch Product Processing

Process entire catalogs efficiently

Product Folder (input) → For Each Image (loop) → Analyze (API call) → Complete Catalog (output)

def process_product_catalog(image_folder: str) -> list:
    """Process all product images in a folder."""
    results = []
    image_extensions = {'.jpg', '.jpeg', '.png', '.webp'}

    for image_path in Path(image_folder).iterdir():
        if image_path.suffix.lower() in image_extensions:
            print(f"Processing: {image_path.name}")
            analysis = analyze_product(str(image_path))
            results.append({
                "filename": image_path.name,
                "analysis": analysis
            })
    return results
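
One way to run it over a folder and persist the catalog (the folder and output file names here are illustrative):

import json
from pathlib import Path

catalog = process_product_catalog("product_images")
Path("catalog.json").write_text(json.dumps(catalog, indent=2))
print(f"Cataloged {len(catalog)} products")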

Lab 3: Scene Understanding

Analyze retail environments for business insights

Environment Analysis

Store type, size, layout, ambiance

Customer Insights

Count, demographics, behaviors

Merchandising Review

Displays, signage, product visibility

Operations Assessment

Cleanliness, safety, staff, queues

Lab 3: Scene Analysis Code

def analyze_retail_scene(image_path: str) -> str:
    """Analyze a retail environment for insights."""
    client = anthropic.Anthropic()
    image_data = encode_image(image_path)

    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image", "source": {
                    "type": "base64",
                    "media_type": "image/jpeg",
                    "data": image_data
                }},
                {"type": "text", "text": """..."""}
            ]
        }]
    )
    return response.content[0].text

Lab 3: Scene Analysis Prompt

prompt = """
Analyze this retail environment comprehensively:

1. SCENE: Store type, estimated size, time of day,
   overall ambiance

2. CUSTOMERS: Approximate count, visible demographics,
   current activities and behaviors

3. MERCHANDISING: Display quality, signage effectiveness,
   product visibility, promotional materials

4. OPERATIONS: Cleanliness score (1-10), safety hazards,
   staff visibility, queue management

5. RECOMMENDATIONS: List 3 specific, actionable
   improvements with expected business impact

Be specific, quantitative where possible, and
business-focused in your analysis.
"""

Lab 3: Business Applications

How retail scene analysis drives value

Store Audits

  • Compliance checking
  • Brand standards
  • Mystery shopping

Operations

  • Queue monitoring
  • Staff allocation
  • Peak hour analysis

Marketing

  • Display effectiveness
  • Signage visibility
  • Promotion impact

Safety

  • Hazard detection
  • Crowd density
  • Emergency paths

Lab 4: Building a Complete Application

ExpenseTracker: CV-powered expense management

Receipt Photos (input) → OCR Extraction (process) → Auto-Categorization (classify) → Data Validation (verify) → Expense Database (store) → Reports & Analytics (output)

Lab 4: ExpenseTracker Architecture

Class-based design for maintainability

ExpenseTracker Class

Properties: expenses: List, client: Anthropic

Public: add_receipt(), get_summary()

Private: _extract_receipt(), _categorize()

Expense Data Structure

Fields: id, timestamp, image_path

Data: category, extracted data dict

Lab 4: ExpenseTracker Class

import json
from datetime import datetime

class ExpenseTracker:
    """Expense tracking using computer vision."""

    def __init__(self):
        self.expenses = []
        self.client = anthropic.Anthropic()

    def add_receipt(self, image_path: str,
                    category: str | None = None) -> dict:
        """Process a receipt and add to expenses."""
        # Extract data using CV
        extracted = self._extract_receipt(image_path)

        # Parse JSON response
        try:
            receipt_data = json.loads(extracted)
        except json.JSONDecodeError:
            receipt_data = {"raw": extracted, "error": True}

Lab 4: Adding Receipts

        # Create expense record
        expense = {
            "id": len(self.expenses) + 1,
            "timestamp": datetime.now().isoformat(),
            "image_path": image_path,
            "category": category or self._categorize(receipt_data),
            "data": receipt_data
        }
        self.expenses.append(expense)
        return expense

    def _extract_receipt(self, image_path: str) -> str:
        """Use CV to extract receipt data."""
        # Reuse our extract_receipt_data function
        return extract_receipt_data(image_path)

Lab 4: Auto-Categorization

Keyword-based category assignment

    def _categorize(self, data: dict) -> str:
        """Auto-categorize based on vendor name."""
        vendor = data.get("vendor_name", "").lower()

        categories = {
            "restaurant": ["restaurant", "cafe", "pizza",
                          "burger", "sushi", "bistro"],
            "grocery": ["supermarket", "grocery", "carrefour",
                       "auchan", "monoprix"],
            "transport": ["uber", "taxi", "sncf", "ratp"],
            "office": ["staples", "office", "amazon"],
        }

        for cat, keywords in categories.items():
            if any(kw in vendor for kw in keywords):
                return cat
        return "other"

Lab 4: Generating Summaries

    def get_summary(self) -> dict:
        """Generate expense summary by category."""
        by_category = {}

        for expense in self.expenses:
            cat = expense["category"]
            total = expense["data"].get("total", 0) or 0

            if cat not in by_category:
                by_category[cat] = {"count": 0, "total": 0}

            by_category[cat]["count"] += 1
            by_category[cat]["total"] += float(total)

        return {
            "total_expenses": len(self.expenses),
            "by_category": by_category,
            "grand_total": sum(
                c["total"] for c in by_category.values()
            )
        }

Lab 4: Using ExpenseTracker

# Initialize tracker
tracker = ExpenseTracker()

# Add receipts
tracker.add_receipt("lunch_receipt.jpg")
tracker.add_receipt("office_supplies.jpg")
tracker.add_receipt("taxi_receipt.jpg")

# Get summary
summary = tracker.get_summary()
print(f"Total expenses: {summary['total_expenses']}")
print(f"Grand total: €{summary['grand_total']:.2f}")
print("By category:")
for cat, stats in summary["by_category"].items():
    print(f"  {cat}: {stats['count']} expense, €{stats['total']:.2f}")

# Output:
# Total expenses: 3
# Grand total: €127.45
# By category:
#   restaurant: 1 expense, €23.50
#   office: 1 expense, €89.95
#   transport: 1 expense, €14.00

API Performance Comparison

Measuring and comparing API performance

Test Images (input) → APIs 1/2/3 (parallel test) → Measure (metrics) → Best Choice (decision)

Latency

Response time per image

Accuracy

Correctness of results

Cost

Total cost at expected volume

API Benchmarking Framework

import time
from typing import Callable

def benchmark_api(
    api_function: Callable,
    test_images: list,
    runs: int = 3
) -> dict:
    """Benchmark an API across multiple images."""
    results = {
        "timings": [], "successes": 0, "failures": 0
    }

    for image in test_images:
        for _ in range(runs):
            start = time.time()
            try:
                api_function(image)
                results["successes"] += 1
            except Exception:
                results["failures"] += 1
            results["timings"].append(time.time() - start)

    results["avg_time"] = sum(results["timings"]) / len(results["timings"])
    return results
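
For example, to benchmark the Lab 1 extractor (receipt2.jpg is a hypothetical second sample image):

stats = benchmark_api(extract_receipt_data,
                      ["receipt.jpg", "receipt2.jpg"])
print(f"Avg: {stats['avg_time']:.2f}s | "
      f"{stats['successes']} ok, {stats['failures']} failed")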

Cost Optimization Strategies

Reducing API costs at scale

Raw Image (input) → Preprocess (optimize) → Cache Check (hit/miss) → API Call (if needed)

Resize & Compress

Reduce file size

Crop ROI

Focus on relevant areas

Return Cached

Skip duplicate calls

Store Result

Cache for future use

Cost Optimization Techniques

Practical strategies for production

1 Image Preprocessing

Resize to optimal dimensions - many APIs charge by size

Can reduce costs by 50-90% with minimal quality loss

2 Result Caching

Hash images, store results - avoid duplicate processing

Especially valuable for user-uploaded duplicate content

3 Batch Processing

Use batch endpoints for volume discounts when available (see the sketch after this list)

Process during off-peak hours for lower rates
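
As one concrete option, recent versions of the anthropic SDK expose a Message Batches API that processes requests asynchronously at a discount. A rough sketch - parameter shapes may differ by SDK version, so check the current docs:

import anthropic

client = anthropic.Anthropic()

# Submit one request per sample image (prompt elided here -
# reuse the image + text content blocks from Lab 1)
batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"img-{i}",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": "..."}],
            },
        }
        for i in range(len(SAMPLE_IMAGES))
    ]
)
# Poll batch.processing_status until "ended", then fetch results
print(batch.id, batch.processing_status)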

Tiered Processing Strategy

Use cheap APIs for filtering, premium for analysis

All Images (100%) → Tier 1: Fast Filter ($0.001/image) → Quality Check (pass/fail) → Tier 2: Analysis ($0.01/image)
Example: Use basic classification to filter out 80% of images before expensive multimodal analysis
Approach | Cost per 1,000 images
All images through Tier 2 | $10.00
Tiered processing (20% pass to Tier 2) | $3.00
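
A sketch of the routing logic; cheap_filter and premium_analyze are hypothetical stand-ins for your Tier 1 and Tier 2 services:

def tiered_analysis(image_paths: list) -> list:
    """Send only images that pass the cheap filter to the expensive tier."""
    results = []
    for path in image_paths:
        # Tier 1 (~$0.001/image): hypothetical fast classifier
        if not cheap_filter(path):
            continue  # ~80% of images stop here
        # Tier 2 (~$0.01/image): hypothetical premium analysis
        results.append(premium_analyze(path))
    return results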

Image Preprocessing for Cost Savings

from PIL import Image
import io

def optimize_image(image_path: str,
                   max_size: int = 1024,
                   quality: int = 85) -> bytes:
    """Optimize image size before API call."""
    with Image.open(image_path) as img:
        # Resize if too large
        if max(img.size) > max_size:
            ratio = max_size / max(img.size)
            new_size = tuple(int(d * ratio) for d in img.size)
            img = img.resize(new_size, Image.LANCZOS)

        # Convert to RGB if needed
        if img.mode in ('RGBA', 'P'):
            img = img.convert('RGB')

        # Save to bytes with compression
        buffer = io.BytesIO()
        img.save(buffer, format='JPEG', quality=quality)
        return buffer.getvalue()

# Can reduce file size by 70-90% while maintaining quality
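
Since encode_image reads from disk, base64-encode the optimized bytes directly before sending them to the API:

import base64

# Encode the optimized bytes (instead of the original file)
optimized_b64 = base64.standard_b64encode(
    optimize_image("receipt.jpg")
).decode("utf-8")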

Simple Result Caching

import hashlib
import json
from pathlib import Path

CACHE_DIR = Path("api_cache")
CACHE_DIR.mkdir(exist_ok=True)

def get_cached_or_call(image_path: str, api_func) -> dict:
    """Check cache before calling API."""
    # Generate hash of image content
    with open(image_path, "rb") as f:
        image_hash = hashlib.md5(f.read()).hexdigest()

    cache_file = CACHE_DIR / f"{image_hash}.json"

    # Return cached if exists
    if cache_file.exists():
        return json.loads(cache_file.read_text())

    # Call API and cache result
    result = api_func(image_path)
    cache_file.write_text(json.dumps(result))
    return result

Robust Error Handling

Production-ready API calls

import time
from typing import Optional

import anthropic

def call_api_with_retry(
    func,
    max_retries: int = 3,
    base_delay: float = 1.0
) -> Optional[dict]:
    """Call API with exponential backoff retry."""
    for attempt in range(max_retries):
        try:
            return func()
        except anthropic.RateLimitError:
            delay = base_delay * (2 ** attempt)
            print(f"Rate limited. Waiting {delay}s...")
            time.sleep(delay)
        except anthropic.APIError as e:
            print(f"API error: {e}")
            if attempt == max_retries - 1:
                raise
    return None
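
For example, wrapping the Lab 1 extractor:

result = call_api_with_retry(
    lambda: extract_receipt_data("receipt.jpg")
)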

Security Best Practices

Protecting your API integration

Credentials Management

  • Environment variables (see the sketch below)
  • Secret managers (AWS Secrets, Azure Key Vault)
  • Never commit keys to version control
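
A minimal sketch of the environment-variable approach (the anthropic client also reads ANTHROPIC_API_KEY automatically if no key is passed):

import os
import anthropic

# The key lives in the environment, never in source control
client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])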

Data Privacy

  • PII detection and redaction
  • Image anonymization before API calls
  • Data retention policies

Network Security

HTTPS only, VPC endpoints, IP allowlisting

Monitoring

Usage alerts, anomaly detection, audit logging

Self-Practice Assignment 3

Duration: 1.5 hours | Deadline: Before Session 4

1 Choose a Use Case (15 min)

  • Select a business problem from Session 2
  • Or propose your own application idea

2 Implementation (45 min)

  • Build a working prototype using cloud APIs
  • Process at least 5 sample images
  • Include basic error handling

3 Documentation (30 min)

  • Document code with comments
  • Write README explaining usage
  • Include sample outputs

Deliverable: GitHub repository or zip file with code and documentation

Assignment Project Ideas

Inspiration for your CV application

Business Card Scanner

Contact extraction, CRM integration

Menu Scanner

Price extraction, allergen detection

Plant Identifier

Species recognition, disease detection

Fashion Analyzer

Style classification, similar items search

Session 3 Summary

Key takeaways from today

Cloud APIs Are Powerful

No ML expertise needed, instant production-ready, multiple providers

Choose Wisely

Traditional vs Multimodal, match use case to API, consider cost structure

Build Smart

Structured prompts, error handling, caching & optimization

Practice Makes Perfect

Start simple, iterate quickly, document everything

Looking Ahead: Session 4

Custom Models & Transfer Learning

Session 3: Cloud APIs (today) → Session 4: Custom Models (next)

When to Build Your Own

Decision criteria for custom models

Transfer Learning

Leverage pre-trained models

PyTorch Training

Hands-on model training

AutoML Platforms

No-code alternatives

Next Session: When and how to build custom CV models for specialized needs
