Working with REST APIs

Data Formats

Session 4

Fetching and integrating data from web APIs for machine learning projects

Client API

Introduction to Web APIs

Application Programming Interfaces for the web

What is an API?

Application Programming Interface: a contract defining how software components interact

Analogy: Like a restaurant menu that defines available operations and expected responses

Web API

  • An API accessed over HTTP/HTTPS
  • Enables remote data retrieval
  • Language-agnostic communication
  • Structured data formats (JSON, XML)

REST APIs

  • REpresentational State Transfer
  • Popular architectural style
  • Stateless client-server communication
  • Resource-based URLs

Why use APIs? Fetch data programmatically instead of downloading files by hand or scraping web pages.

REST Request/Response Flow

Understanding the communication cycle

  1. Request: GET /users
  2. API: query the database
  3. Database: return data
  4. Response: JSON (200 OK)
  5. Parse: extract data
  6. DataFrame: analysis

The client (a Python script) sends an HTTP request → the API server processes it and queries the database → the database returns the data → the API sends back a JSON response (200 OK) → the client parses it and builds a DataFrame for analysis.
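The six steps above can be sketched end to end in Python. This is a minimal sketch, assuming pandas is installed; the helper names `fetch_records` and `records_to_dataframe` are hypothetical, and `pd.json_normalize` is one way (not the only one) to flatten nested fields such as `address.city` into columns.

```python
import pandas as pd
import requests

USERS_URL = "https://jsonplaceholder.typicode.com/users"


def fetch_records(url, timeout=10):
    """Steps 1-4: send a GET request and return the parsed JSON body."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()       # step 5: JSON text -> Python lists and dicts


def records_to_dataframe(records):
    """Step 6: flatten a list of JSON objects into a DataFrame.

    Nested objects become dotted column names, e.g. {"address": {"city": ...}}
    becomes a column called "address.city".
    """
    return pd.json_normalize(records)
```

Calling `records_to_dataframe(fetch_records(USERS_URL))` runs the whole cycle. Keeping the parsing step in its own function means it can be checked with sample data, without a network connection.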

How REST APIs Work

Key concepts and HTTP methods

Resources

Data entities represented by URLs

Example: /users, /posts, /comments

Endpoints

URLs that represent specific resources

Example: https://api.example.com/users

HTTP Methods (Verbs)

  • GET: retrieve data (read)
  • POST: create new data
  • PUT: update existing data
  • DELETE: remove data

Focus: For data science, we primarily use GET to retrieve data.
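To see how each verb maps onto the requests library, the sketch below builds (but does not send) one request per method with `requests.Request(...).prepare()`. The `api.example.com/users` endpoint is a placeholder, as in the examples in this session; in real code you would normally call `requests.get`, `requests.post`, `requests.put`, or `requests.delete` directly.

```python
import requests

BASE_URL = "https://api.example.com/users"  # placeholder endpoint, not a real API

# Build (but do not send) one request per HTTP method
prepared = {
    # GET: read user 1
    "GET": requests.Request("GET", f"{BASE_URL}/1").prepare(),
    # POST: create a new user from a JSON body
    "POST": requests.Request("POST", BASE_URL, json={"name": "Ada"}).prepare(),
    # PUT: update user 1 with new data
    "PUT": requests.Request("PUT", f"{BASE_URL}/1", json={"name": "Ada L."}).prepare(),
    # DELETE: remove user 1
    "DELETE": requests.Request("DELETE", f"{BASE_URL}/1").prepare(),
}

for method, req in prepared.items():
    print(req.method, req.url)
```

Passing `json=` makes requests serialize the body and set the `Content-Type: application/json` header for you.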

Making Your First API Request

Using Python's requests library

The requests Library

Python's most popular HTTP library for API interactions

Installation: pip install requests

import requests

# Define the API endpoint URL
url = "https://jsonplaceholder.typicode.com/posts/1"

# Send a GET request
response = requests.get(url)

# The response object contains the server's reply
print(response.status_code)  # e.g., 200
print(response.json())       # The data in JSON format

Result: response.json() automatically parses JSON into Python dictionaries and lists.

Understanding the Response

HTTP status codes and response bodies

HTTP Status Codes indicate the result of your request

Success Codes

  • 200 OK: request successful
  • 201 Created: resource created

Client Error Codes

  • 400 Bad Request: invalid request
  • 401 Unauthorized: authentication required
  • 403 Forbidden: no permission
  • 404 Not Found: resource missing

Server Error Codes

  • 500 Internal Server Error: server problem
  • 503 Service Unavailable: service down

Response Body

The actual data, typically in JSON format with key-value pairs
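The status-code groups above follow a simple numeric pattern (2xx, 4xx, 5xx). The sketch below classifies a code and looks up its standard reason phrase with Python's built-in `http.HTTPStatus`; the function name `describe_status` is hypothetical. With requests you rarely need this by hand: `response.ok` is True for any status below 400, and `response.reason` holds the phrase the server sent.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a label such as '404 Not Found (client error)' for a status code."""
    if 200 <= code < 300:
        category = "success"
    elif 300 <= code < 400:
        category = "redirection"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. "Not Found"
    except ValueError:
        phrase = "Unknown"  # code not defined in http.HTTPStatus
    return f"{code} {phrase} ({category})"


for code in (200, 201, 401, 404, 500, 503):
    print(describe_status(code))
```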

Exercise 1: Simple GET Request

Fetch user data from a public API

Your Goal

Fetch a list of users from https://jsonplaceholder.typicode.com/users

1 Write Script

Use Python's requests library

2 Make Request

Send a GET request to the endpoint

3 Check Status

Verify status code is 200 (success)

4 Extract Data

Print the name of the first user

Hint: response.json()[0]['name'] gets the first user's name.
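One possible solution to Exercise 1, split into two small functions so the data-handling step can be checked without a network connection. The function names `first_user_name` and `run_exercise_1` are hypothetical.

```python
import requests

USERS_URL = "https://jsonplaceholder.typicode.com/users"


def first_user_name(users):
    """Step 4: return the 'name' of the first user, or None if the list is empty."""
    return users[0]["name"] if users else None


def run_exercise_1(url=USERS_URL):
    # Steps 1-2: send a GET request (the timeout avoids hanging forever)
    response = requests.get(url, timeout=10)
    # Step 3: only read the body when the request succeeded
    if response.status_code != 200:
        print(f"Request failed with status {response.status_code}")
        return None
    name = first_user_name(response.json())
    print(name)
    return name
```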

Working with API Parameters

Path and query parameters

Path Parameters

Part of the URL path to identify a specific resource

Example:

/users/1

/posts/42/comments

Query Parameters

Key-value pairs at the end of URL to filter or sort

Example:

/posts?userId=1

/users?role=admin&limit=10
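Path parameters are usually inserted into the URL with string formatting. If a value might contain characters such as `/` or spaces, `urllib.parse.quote` with `safe=""` percent-encodes it so it stays a single path segment. A small sketch; the helper `build_path_url` is hypothetical and `api.example.com` is a placeholder.

```python
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder base URL


def build_path_url(*segments):
    """Join path segments onto BASE_URL, percent-encoding each one.

    safe="" makes quote() encode "/" as well, so a value cannot
    accidentally split into extra path segments.
    """
    encoded = [quote(str(segment), safe="") for segment in segments]
    return BASE_URL + "/" + "/".join(encoded)


print(build_path_url("users", 1))               # https://api.example.com/users/1
print(build_path_url("posts", 42, "comments"))  # https://api.example.com/posts/42/comments
```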

Passing Query Parameters with requests

import requests

# Method 1: Include in URL string
response = requests.get('https://api.example.com/posts?userId=1')

# Method 2: Use params parameter (cleaner, handles encoding)
params = {'userId': 1, 'limit': 10}
response = requests.get('https://api.example.com/posts', params=params)

print(response.url)  # See the final URL with parameters
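To check how `params` is encoded without contacting a server, you can prepare the request instead of sending it. A value containing a space is encoded automatically. A sketch; the `q` parameter is a hypothetical example and `api.example.com` remains a placeholder.

```python
import requests

params = {"userId": 1, "limit": 10, "q": "machine learning"}

# Build the request but do not send it; .url shows the final encoded URL
prepared = requests.Request("GET", "https://api.example.com/posts", params=params).prepare()
print(prepared.url)
```

The printed URL contains `userId=1`, `limit=10`, and `q=machine+learning`, which is why the `params` argument is the safer choice when values come from user input.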

Exercise 2: Filtering API Results

Using query parameters

Your Goal

Fetch comments for a specific post using https://jsonplaceholder.typicode.com/comments

1 Query Parameter

Use postId=1 to filter comments

2 Write Script

Use the params argument in requests

3 Count Results

Print the number of comments received

4 Extract Email

Print the email of the first commenter

Hint: len(response.json()) gives you the count.
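One possible solution to Exercise 2, again keeping the counting and extraction logic in a separate function that can be checked with sample data. The names `summarize_comments` and `run_exercise_2` are hypothetical.

```python
import requests

COMMENTS_URL = "https://jsonplaceholder.typicode.com/comments"


def summarize_comments(comments):
    """Steps 3-4: return (number of comments, email of the first commenter or None)."""
    first_email = comments[0]["email"] if comments else None
    return len(comments), first_email


def run_exercise_2(post_id=1):
    # Steps 1-2: requests builds ?postId=1 from the params dict
    response = requests.get(COMMENTS_URL, params={"postId": post_id}, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx responses
    count, email = summarize_comments(response.json())
    print(f"{count} comments; first commenter: {email}")
    return count, email
```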

Handling Authentication

Securing API access

Why authenticate? To identify clients, enforce security, and implement rate limiting

API Keys

A unique string sent with each request

Common method: Header-based

headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}
response = requests.get(url, headers=headers)

OAuth Tokens

For delegated access (e.g., "Login with Google")

More complex: Multi-step flow

  • Request token from auth server
  • User grants permission
  • Use token in API requests

Security: Never commit API keys to version control. Store them in environment variables or in config files that are excluded from version control (e.g., listed in .gitignore).
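A minimal way to follow this advice is to read the key from an environment variable at runtime. A sketch, assuming a variable named `MY_API_KEY` (a hypothetical name) and an API that expects a Bearer token; some APIs use a different header (for example `X-API-Key`) or a query parameter, so check the provider's documentation.

```python
import os


def auth_headers(env_var="MY_API_KEY"):  # hypothetical variable name
    """Build an Authorization header from an environment variable.

    Raises a clear error instead of silently sending an unauthenticated request.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {api_key}"}
```

Set the variable in your shell (for example `export MY_API_KEY=...`), then pass `headers=auth_headers()` to `requests.get`. The key never appears in the source file.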

Exercise 3: Error Handling

Writing robust API code

Your Goal

Write code that gracefully handles API errors

Task Steps

  • Request a non-existent post: /posts/99999
  • Check if status_code is 200
  • If not, print an error message
  • Wrap in try...except for network errors

Example Pattern

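import requests

url = "https://jsonplaceholder.typicode.com/posts/99999"

try:
    # timeout prevents the script from hanging on an unresponsive server
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        data = response.json()
    else:
        print(f"Error: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"Network error: {e}")

Best practice: Always check status codes before processing response data.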
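An alternative to comparing `status_code` by hand is `response.raise_for_status()`, which raises `requests.exceptions.HTTPError` for 4xx and 5xx responses. The sketch below (`fetch_json` is a hypothetical helper name) reports HTTP errors, other request failures, and malformed JSON separately, and returns `None` on any failure.

```python
import requests


def fetch_json(url, timeout=10):
    """Return parsed JSON from url, or None if anything goes wrong."""
    try:
        response = requests.get(url, timeout=timeout)
        # Raises requests.exceptions.HTTPError for 4xx/5xx status codes
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        # Must come first: HTTPError is a subclass of RequestException
        print(f"HTTP error: {e}")
    except requests.exceptions.RequestException as e:
        # Connection errors, timeouts, malformed URLs, etc.
        print(f"Request failed: {e}")
    except ValueError as e:
        # Invalid JSON body (older requests versions raise a plain ValueError)
        print(f"Invalid JSON: {e}")
    return None
```

For the exercise URL, `fetch_json("https://jsonplaceholder.typicode.com/posts/99999")` should print an HTTP error and return `None`, since that post does not exist.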

Integrating with Orange Data Mining

Convert API data to Orange tables

Use the Python Script widget to fetch API data and convert it to an Orange.data.Table

from Orange.data import Table, Domain, StringVariable
import requests

# 1. Fetch data from API
response = requests.get('https://jsonplaceholder.typicode.com/users', timeout=10)
response.raise_for_status()  # stop on HTTP errors
users_data = response.json()

# 2. Define the domain (columns)
# Text columns go in metas: Orange attributes must be numeric or categorical
domain = Domain([], metas=[
    StringVariable('name'),
    StringVariable('email'),
    StringVariable('city')
])

# 3. Extract data into list format (one row per user, in meta order)
data = [
    [u['name'], u['email'], u['address']['city']]
    for u in users_data
]

# 4. Create Orange table
out_data = Table.from_list(domain, data)
Remember: Store the result in out_data variable to pass to connected widgets.

Final Exercise: World Countries Data

Real-world API integration

Challenge: RestCountries API

Fetch real-world country data and visualize it in Orange

API: RestCountries

Endpoint: https://restcountries.com/v3.1/all

1 Fetch Data

Use requests to get all countries

2 Define Domain

name and region (text), population and area (numeric)

3 Parse JSON

Navigate nested JSON structure

4 Create Table

Assign to out_data

Bonus: Connect output to a GeoMap widget and visualize countries by population!
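Step 3 is the trickiest part of this challenge: each country in the RestCountries v3.1 response is a nested object, with the common name under `name.common`. The sketch below flattens the four fields with `.get()` defaults so a missing key does not crash the script. It assumes that response layout and uses the API's `fields` filter to request only the needed fields; `fetch_countries` and `country_rows` are hypothetical helper names. The rows can then be passed to `Table.from_list`, with name and region as `StringVariable` metas and population and area as `ContinuousVariable` attributes.

```python
import requests

COUNTRIES_URL = "https://restcountries.com/v3.1/all"


def fetch_countries(timeout=30):
    """Step 1: download country records, asking only for the fields we use."""
    params = {"fields": "name,region,population,area"}
    response = requests.get(COUNTRIES_URL, params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()


def country_rows(countries):
    """Step 3: flatten nested JSON into [name, region, population, area] rows."""
    rows = []
    for country in countries:
        name = country.get("name", {}).get("common", "")  # nested: name -> common
        region = country.get("region", "")
        # Missing numbers become NaN so they can be treated as missing values
        population = country.get("population", float("nan"))
        area = country.get("area", float("nan"))
        rows.append([name, region, population, area])
    return rows
```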

Key Takeaways

  1. APIs are structured. REST APIs provide predictable, documented access to data, which is more reliable than web scraping.
  2. HTTP status codes matter. Always check the response status before processing data.
  3. Parameters filter results. Use query and path parameters to get exactly the data you need.
  4. Error handling is essential. Network failures and API errors will happen, so code defensively.

Next: Learn web scraping for when APIs aren't available.
