Data Formats
Session 4
Fetching and integrating data from web APIs for machine learning projects
2026 WayUp
Application Programming Interfaces for the web
Application Programming Interface - A contract defining how software components interact
Analogy: Like a restaurant menu that defines available operations and expected responses
Understanding the communication cycle
Client (Python script) sends HTTP request → API server processes → Database queries data → API returns JSON response → Client parses and creates DataFrame
Key concepts and HTTP methods
Resources: data entities represented by URLs
Example: /users, /posts, /comments
Endpoints: URLs that represent specific resources
Example: https://api.example.com/users
| GET | Retrieve data (read) |
| POST | Create new data |
| PUT | Update existing data |
| DELETE | Remove data |
For fetching data, you will mostly use GET to retrieve data.
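To make the method table concrete, here is a minimal sketch that builds one request per method without sending anything. It uses the standard library's urllib.request so it runs offline; with the requests library the equivalent calls are requests.get, requests.post, requests.put and requests.delete. The helper name build, the BASE URL and the payloads are illustrative assumptions, not part of any real API.

```python
import json
from urllib.request import Request

# Placeholder host used elsewhere in these slides; nothing is sent to it.
BASE = "https://api.example.com/users"


def build(method, url, payload=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, headers=headers, method=method)


built = [
    build("GET", BASE),                            # read all users
    build("POST", BASE, {"name": "Ada"}),          # create a user
    build("PUT", f"{BASE}/1", {"name": "Ada L."}), # update user 1
    build("DELETE", f"{BASE}/1"),                  # remove user 1
]

for req in built:
    print(req.get_method(), req.full_url)
```

Only POST and PUT carry a body here; GET and DELETE identify the resource through the URL alone.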
Using Python's requests library
Python's most popular HTTP library for API interactions
Installation: pip install requests
import requests
# Define the API endpoint URL
url = "https://jsonplaceholder.typicode.com/posts/1"
# Send a GET request
response = requests.get(url)
# The response object contains the server's reply
print(response.status_code) # e.g., 200
print(response.json()) # The data in JSON format
response.json() automatically parses JSON into Python dictionaries and lists.
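As a rough illustration of what that parsing step produces, the sketch below runs json.loads from the standard library on a JSON string. The body text is a made-up sample, loosely shaped like a list of jsonplaceholder posts, so the snippet runs without a network connection.

```python
import json

# Made-up sample body: a JSON array containing two JSON objects.
body = '[{"userId": 1, "id": 1, "title": "first post"}, {"userId": 1, "id": 2, "title": "second post"}]'

posts = json.loads(body)

print(type(posts).__name__)     # list  (JSON array -> Python list)
print(type(posts[0]).__name__)  # dict  (JSON object -> Python dict)
print(posts[0]["title"])        # first post
print(len(posts))               # 2
```

Once parsed, the data is ordinary Python: index into lists with integers and into dictionaries with string keys.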
HTTP status codes and response bodies
HTTP Status Codes indicate the result of your request
| 200 OK | Request successful |
| 201 Created | Resource created |
| 400 Bad Request | Invalid request |
| 401 Unauthorized | Auth required |
| 403 Forbidden | No permission |
| 404 Not Found | Resource missing |
| 500 Internal Server Error | Server problem |
| 503 Service Unavailable | Service down |
Response body: the actual data, typically in JSON format with key-value pairs
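The status code table can be turned into a small lookup. This sketch uses the standard library's http.HTTPStatus for the official phrases and groups codes by their first digit; the function name describe and the category labels are illustrative choices.

```python
from http import HTTPStatus


def describe(code):
    """Return a short description such as '404 Not Found: client error'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code is not a registered HTTP status

    if 200 <= code < 300:
        category = "success"
    elif 400 <= code < 500:
        category = "client error"
    elif 500 <= code < 600:
        category = "server error"
    else:
        category = "other"
    return f"{code} {phrase}: {category}"


for code in (200, 201, 404, 503):
    print(describe(code))
```

A useful rule of thumb: 4xx means the request needs fixing, while 5xx means the server failed and the same request may succeed later.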
Fetch user data from a public API
Fetch a list of users from https://jsonplaceholder.typicode.com/users
Use Python's requests library
Send a GET request to the endpoint
Verify status code is 200 (success)
Print the name of the first user
response.json()[0]['name'] gets the first user's name.
Path and query parameters
Path parameters: part of the URL path that identifies a specific resource
Example:
/users/1
/posts/42/comments
Query parameters: key-value pairs at the end of the URL that filter or sort results
Example:
/posts?userId=1
/users?role=admin&limit=10
import requests
# Method 1: Include in URL string
response = requests.get('https://api.example.com/posts?userId=1')
# Method 2: Use params parameter (cleaner, handles encoding)
params = {'userId': 1, 'limit': 10}
response = requests.get('https://api.example.com/posts', params=params)
print(response.url) # See the final URL with parameters
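To see what the params argument does to the URL, this sketch assembles a URL by hand with urllib.parse from the standard library, so it runs without sending a request. The helper build_url is hypothetical; requests performs a comparable encoding step internally.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base, path_id=None, params=None):
    """Append an optional path parameter and URL-encoded query parameters."""
    url = base if path_id is None else f"{base}/{path_id}"
    if params:
        url = f"{url}?{urlencode(params)}"
    return url


# Query parameters filter a collection.
filtered = build_url("https://api.example.com/posts", params={"userId": 1, "limit": 10})
print(filtered)  # https://api.example.com/posts?userId=1&limit=10

# A path parameter selects one resource.
single = build_url("https://api.example.com/posts", path_id=42)
print(single)    # https://api.example.com/posts/42

# Special characters are escaped, which is why a dict is safer than string concatenation.
search = build_url("https://api.example.com/posts", params={"q": "machine learning"})
print(search)    # https://api.example.com/posts?q=machine+learning

# The query string can be parsed back; every value comes back as a list of strings.
print(parse_qs(urlsplit(filtered).query))
```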
Using query parameters
Fetch comments for a specific post using https://jsonplaceholder.typicode.com/comments
Use postId=1 to filter comments
Use the params argument in requests
Print the number of comments received
Print the email of the first commenter
len(response.json()) gives you the count.
Securing API access
Why authenticate? To identify clients, enforce security, and implement rate limiting
API key: a unique string sent with each request
Common method: Header-based
headers = {
    'Authorization': 'Bearer YOUR_API_KEY'
}
response = requests.get(url, headers=headers)
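Hard-coding a key in a script makes it easy to leak. One common alternative, sketched here under assumptions, reads the key from an environment variable and builds the header from it. The variable name EXAMPLE_API_KEY and the helper auth_headers are hypothetical, and the Bearer scheme is only one convention; check the documentation of the API you use.

```python
import os


def auth_headers(env_var="EXAMPLE_API_KEY"):
    """Build an Authorization header from a key stored in the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {api_key}"}


# Demonstration only: provide a dummy key if none is set in the environment.
os.environ.setdefault("EXAMPLE_API_KEY", "demo-key-123")

headers = auth_headers()
print(sorted(headers))  # ['Authorization']
```

The resulting dictionary can be passed as the headers argument of requests.get.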
OAuth: for delegated access (e.g., "Login with Google")
More complex: a multi-step flow
Writing robust API code
Write code that gracefully handles API errors
Request a resource that does not exist: /posts/99999
Check whether status_code is 200
Use try...except for network errors
try:
    response = requests.get(url)
    if response.status_code == 200:
        data = response.json()
    else:
        print(f"Error: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"Network error: {e}")
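The same pattern can be wrapped in a reusable helper. In this sketch the HTTP call is passed in as the get argument, so the logic can be exercised offline with a stand-in; in real use you would pass requests.get. The names fetch_json, FakeResponse and fake_get are hypothetical. The helper catches OSError because requests.exceptions.RequestException is a subclass of it, and ValueError because a body that is not valid JSON raises a ValueError subclass.

```python
def fetch_json(get, url, timeout=10):
    """Return (data, error): exactly one of the two is None."""
    try:
        response = get(url, timeout=timeout)
    except OSError as exc:  # covers requests' network exceptions
        return None, f"Network error: {exc}"
    if response.status_code != 200:
        return None, f"HTTP error: {response.status_code}"
    try:
        return response.json(), None
    except ValueError as exc:  # body was not valid JSON
        return None, f"Invalid JSON: {exc}"


class FakeResponse:
    """Stand-in for a response object, for offline demonstration only."""

    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


def fake_get(url, timeout=None):
    """Stand-in for requests.get that simulates three outcomes."""
    if "unreachable" in url:
        raise ConnectionError("host unreachable")
    if url.endswith("/posts/99999"):
        return FakeResponse(404, {})
    return FakeResponse(200, {"id": 1})


print(fetch_json(fake_get, "https://api.example.com/posts/1"))
print(fetch_json(fake_get, "https://api.example.com/posts/99999"))
print(fetch_json(fake_get, "https://unreachable.example.com/posts/1"))
```

Returning the error as a value instead of printing it lets the caller decide whether to retry, skip the record, or stop.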
Convert API data to Orange tables
Use the Python Script widget to fetch API data and convert to Orange.data.Table
from Orange.data import Table, Domain, ContinuousVariable, StringVariable
import requests
# 1. Fetch data from API
response = requests.get('https://jsonplaceholder.typicode.com/users')
users_data = response.json()
# 2. Define the domain (columns)
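# StringVariable columns must go in metas; Orange attributes must be numeric or categorical
domain = Domain([], metas=[
    StringVariable('name'),
    StringVariable('email'),
    StringVariable('city')
])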
# 3. Extract data into list format
data = [
    [u['name'], u['email'], u['address']['city']]
    for u in users_data
]
# 4. Create Orange table
out_data = Table.from_list(domain, data)
Assign the table to the out_data variable to pass it to connected widgets.
Real-world API integration
Fetch real-world country data and visualize it in Orange
API: RestCountries
Endpoint: https://restcountries.com/v3.1/all
Use requests to get all countries
Extract these fields: name, region, population, area
Navigate nested JSON structure
Assign to out_data
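The nested-JSON step can be sketched offline as follows. The records are made-up samples, assumed to mirror the RestCountries layout in which the display name sits under name → common; verify the field names against a real response before relying on them. The helper country_row is hypothetical.

```python
def country_row(country):
    """Flatten one nested country record into [name, region, population, area]."""
    name = country.get("name", {}).get("common", "")  # nested: name -> common
    region = country.get("region", "")
    population = country.get("population")            # None if missing
    area = country.get("area")                        # None if missing
    return [name, region, population, area]


# Made-up sample records (not real figures); the second one lacks "area".
sample = [
    {"name": {"common": "Exampleland", "official": "Republic of Exampleland"},
     "region": "Europe", "population": 1000000, "area": 50000.0},
    {"name": {"common": "Testonia"}, "region": "Asia", "population": 250000},
]

rows = [country_row(c) for c in sample]
for row in rows:
    print(row)
```

Using .get with a default keeps one incomplete record from stopping the whole import; decide how to treat the resulting None values before building the table.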
1. APIs are structured. REST APIs provide predictable, documented access to data, a better option than web scraping.
2. HTTP status codes matter. Always check the response status before processing data.
3. Parameters filter results. Use query and path parameters to get exactly the data you need.
4. Error handling is essential. Network failures and API errors will happen, so code defensively.