REST API Data Retrieval
Fetch and analyze data from public REST APIs using Python and pandas
Objectives
By the end of this practical work, you will be able to:
- Understand REST API concepts (endpoints, HTTP methods, JSON)
- Make HTTP GET requests using Python's requests library
- Parse JSON responses and extract relevant data
- Handle pagination in API responses
- Transform API data into pandas DataFrames for analysis
- Integrate REST API data with Orange Data Mining
Prerequisites
- Python 3.8+ installed
- Basic understanding of JSON format
- Orange Data Mining (optional, for visualization)
Install required packages:
pip install requests pandas
API Selection
We'll work with the REST Countries API, a free public API with no authentication required:
- REST Countries API - Country information (population, area, languages, etc.)
Note: This API does not require an API key, which makes it convenient for learning. Many other public APIs do require one.
Instructions
Step 1: Make Your First API Request
Start by fetching data for a single country:
import requests
# Fetch data for France
url = "https://restcountries.com/v3.1/name/france"
response = requests.get(url)
# Check response status
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers.get('content-type')}")
# Parse JSON response
if response.status_code == 200:
    data = response.json()
    print(f"Number of results: {len(data)}")
    print(f"Type of data: {type(data)}")
else:
    print(f"Error: {response.status_code}")
Expected: Status 200, JSON content type, list with 1 country.
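Checking only for status 200 still leaves response.json() free to fail if the server sends HTML or an empty body. The sketch below wraps the status and content-type checks in one small helper. The names parse_json_response and FakeResponse are hypothetical: FakeResponse is a stand-in so the example runs offline, and with the real library you would pass the result of requests.get(url, timeout=10) instead.

```python
class FakeResponse:
    """Stand-in exposing the attributes of requests.Response used below."""

    def __init__(self, status_code, headers, payload):
        self.status_code = status_code
        self.headers = headers
        self._payload = payload

    def json(self):
        return self._payload


def parse_json_response(response):
    """Return the decoded JSON body, or raise ValueError with a clear message."""
    if response.status_code != 200:
        raise ValueError(f"Unexpected status code: {response.status_code}")
    content_type = response.headers.get("content-type", "")
    if "application/json" not in content_type.lower():
        raise ValueError(f"Expected JSON, got content type: {content_type!r}")
    return response.json()


ok = FakeResponse(200, {"content-type": "application/json"}, [{"name": "France"}])
print(parse_json_response(ok))

not_found = FakeResponse(404, {"content-type": "application/json"}, {"message": "Not Found"})
try:
    parse_json_response(not_found)
except ValueError as exc:
    print(f"Request failed: {exc}")
```

Raising an exception rather than printing keeps later steps from running on data that was never loaded.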
Step 2: Explore the JSON Structure
Understand the nested JSON structure:
import json
# Pretty print the first result
country = data[0]
print(json.dumps(country, indent=2)[:1000]) # First 1000 chars
# Access nested data
print(f"\nCountry: {country['name']['common']}")
print(f"Official Name: {country['name']['official']}")
print(f"Capital: {country['capital'][0]}")
print(f"Population: {country['population']:,}")
print(f"Area: {country['area']:,} km²")
print(f"Region: {country['region']}")
print(f"Subregion: {country['subregion']}")
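Direct indexing such as country['capital'][0] raises KeyError or IndexError when a record lacks that field, which matters once you move from France to the full list of countries. A minimal sketch of a tolerant lookup follows. get_nested is a hypothetical helper, and the sample records are made up to mimic the shape of the API response.

```python
def get_nested(obj, path, default=None):
    """Walk dict keys and list indexes in `path`; return `default` if any step is missing."""
    current = obj
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Made-up sample records shaped like the API response
france = {"name": {"common": "France"}, "capital": ["Paris"]}
no_capital = {"name": {"common": "Antarctica"}}

print(get_nested(france, ["capital", 0], "N/A"))      # Paris
print(get_nested(no_capital, ["capital", 0], "N/A"))  # N/A
print(get_nested(france, ["name", "common"]))         # France
```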
Step 3: Fetch All Countries
Get data for all countries at once:
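# Fetch all countries
# Note: the API may reject /all unless a "fields" filter is supplied (HTTP 400).
# If that happens, add params={"fields": "name,capital,..."} with the fields you need.
url = "https://restcountries.com/v3.1/all"
response = requests.get(url, timeout=30)
response.raise_for_status()  # Stop here on an HTTP error instead of parsing an error body
all_countries = response.json()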
print(f"Total countries: {len(all_countries)}")
# Preview first 5 country names
for country in all_countries[:5]:
    name = country['name']['common']
    pop = country.get('population', 0)
    print(f" - {name}: {pop:,} people")
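The /all endpoint returns every record in a single response, so this exercise never needs pagination. Many other APIs split results into pages. The sketch below shows one common pattern, page-number pagination, under stated assumptions: the page and per_page parameters, the fetch_page callable, and the fake data source are all hypothetical and stand in for a real paginated endpoint.

```python
def fetch_all_pages(fetch_page, per_page=100, max_pages=1000):
    """Collect items from a page-numbered API until a short or empty page appears.

    `fetch_page(page, per_page)` must return the list of items for that page.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        items.extend(batch)
        if len(batch) < per_page:  # Last page reached
            break
    return items


# Offline stand-in for a paginated endpoint holding 7 records
RECORDS = [f"item-{i}" for i in range(1, 8)]


def fake_fetch_page(page, per_page):
    start = (page - 1) * per_page
    return RECORDS[start:start + per_page]


result = fetch_all_pages(fake_fetch_page, per_page=3)
print(len(result))  # 7
```

With requests, fetch_page would typically wrap a call like requests.get(url, params={...}, timeout=10).json(), using whatever parameter names the API in question documents. The max_pages cap guards against an endpoint that never returns a short page.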
Step 4: Extract Data to Dictionary
Create a function to extract key fields:
def extract_country_data(country):
    """Extract relevant fields from a country JSON object."""
    # Handle missing fields gracefully
    capitals = country.get('capital', ['N/A'])
    capital = capitals[0] if capitals else 'N/A'
    languages = country.get('languages', {})
    lang_list = list(languages.values()) if languages else []
    currencies = country.get('currencies', {})
    currency_names = [c.get('name', 'Unknown') for c in currencies.values()]
    return {
        'name': country['name']['common'],
        'official_name': country['name']['official'],
        'capital': capital,
        'region': country.get('region', 'Unknown'),
        'subregion': country.get('subregion', 'Unknown'),
        'population': country.get('population', 0),
        'area': country.get('area', 0),
        'languages': ', '.join(lang_list),
        'currencies': ', '.join(currency_names),
        'landlocked': country.get('landlocked', False),
        'un_member': country.get('unMember', False)
    }
# Test with France (data still holds the France response from Step 1;
# all_countries[0] is not guaranteed to be France)
france_data = extract_country_data(data[0])
for key, value in france_data.items():
    print(f"{key}: {value}")
Step 5: Convert to DataFrame
Process all countries into a pandas DataFrame:
import pandas as pd
# Extract data for all countries
countries_data = [extract_country_data(c) for c in all_countries]
# Create DataFrame
df = pd.DataFrame(countries_data)
# Display basic info
print(f"DataFrame shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
print(f"\nData types:\n{df.dtypes}")
# Preview the data
print(f"\nFirst 10 countries:")
print(df[['name', 'capital', 'population', 'region']].head(10))
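As an alternative to a hand-written extraction function, pandas can flatten nested dictionaries itself with pd.json_normalize. The sketch below uses two made-up records shaped like the API response. Nested keys become dotted column names, while lists such as capital are kept as Python lists and need one extra step.

```python
import pandas as pd

# Made-up records shaped like the REST Countries response
records = [
    {"name": {"common": "France", "official": "French Republic"},
     "capital": ["Paris"], "population": 1000},
    {"name": {"common": "Peru", "official": "Republic of Peru"},
     "capital": ["Lima"], "population": 2000},
]

flat = pd.json_normalize(records)
print(sorted(flat.columns))
# ['capital', 'name.common', 'name.official', 'population']

# Lists are not flattened, so take the first element explicitly
flat["capital"] = flat["capital"].apply(lambda c: c[0] if c else "N/A")
print(flat[["name.common", "capital", "population"]])
```

The custom function from Step 4 gives more control over defaults and column names; json_normalize is quicker when you just want to explore an unfamiliar response.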
Step 6: Basic Data Analysis
Analyze the country data:
# Population statistics
print("=== Population Statistics ===")
print(f"Total world population: {df['population'].sum():,}")
print(f"Average population: {df['population'].mean():,.0f}")
print(f"Median population: {df['population'].median():,.0f}")
# Top 10 most populous countries
print("\n=== Top 10 Most Populous Countries ===")
top_10 = df.nlargest(10, 'population')[['name', 'population', 'region']]
print(top_10.to_string(index=False))
# Countries by region
print("\n=== Countries by Region ===")
region_counts = df['region'].value_counts()
print(region_counts)
# Landlocked countries
landlocked = df[df['landlocked']]
print(f"\n=== Landlocked Countries: {len(landlocked)} ===")
print(landlocked['name'].head(10).tolist())
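Because extract_country_data substitutes 0 for a missing population or area, the mean and median above treat "unknown" as a real zero. One way to avoid that, sketched here on made-up numbers, is to turn zeros into missing values before computing statistics. Whether a zero really means "missing" is an assumption to check against the actual data.

```python
import pandas as pd

# Made-up sample: 0 stands for "value missing in the API response"
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "population": [100, 0, 300, 200],
})

print(sample["population"].mean())  # 150.0, the zero drags the mean down

# mask() replaces values with NaN wherever the condition is True
cleaned = sample["population"].mask(sample["population"] == 0)
print(cleaned.mean())               # 200.0, NaN is skipped by default
print(cleaned.isna().sum())         # 1
```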
Step 7: Filter by Region
Use API filtering to get specific regions:
# Fetch only European countries
url = "https://restcountries.com/v3.1/region/europe"
response = requests.get(url)
european_countries = response.json()
print(f"European countries: {len(european_countries)}")
# Create European DataFrame
europe_data = [extract_country_data(c) for c in european_countries]
df_europe = pd.DataFrame(europe_data)
# Analysis
print(f"\nTotal European population: {df_europe['population'].sum():,}")
print(f"\nLargest European countries by area:")
print(df_europe.nlargest(5, 'area')[['name', 'area', 'population']])
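Filtering by region is done through the URL path. The REST Countries documentation also describes a fields query parameter that limits which keys each record contains, which keeps responses small. A minimal sketch of building such URLs with the standard library follows. build_url is a hypothetical helper, and it is worth checking the current API documentation for any limit on the number of fields.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://restcountries.com/v3.1"


def build_url(endpoint, value=None, fields=None):
    """Build e.g. .../region/europe?fields=name,population (hypothetical helper)."""
    url = f"{BASE_URL}/{endpoint}"
    if value is not None:
        url += "/" + quote(value)  # Escape spaces and other unsafe characters
    if fields:
        # Keep commas readable instead of encoding them as %2C
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


print(build_url("region", "europe", ["name", "population"]))
# https://restcountries.com/v3.1/region/europe?fields=name,population
print(build_url("name", "south africa"))
# https://restcountries.com/v3.1/name/south%20africa
```

When calling requests.get directly, the same filter can be passed as params={"fields": "name,population"} and the library builds the query string for you.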
Step 8: Save to CSV
Export the data for later use:
# Save all countries
df.to_csv('countries_data.csv', index=False)
print(f"Saved {len(df)} countries to countries_data.csv")
# Save European countries separately
df_europe.to_csv('european_countries.csv', index=False)
print(f"Saved {len(df_europe)} European countries to european_countries.csv")
# Verify the files
import os
for filename in ['countries_data.csv', 'european_countries.csv']:
    size = os.path.getsize(filename)
    print(f"{filename}: {size:,} bytes")
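One thing to watch when reading these files back: by default pd.read_csv treats strings such as N/A and NA as missing values, so the 'N/A' placeholder used for capitals comes back as NaN. The sketch below demonstrates this on a small in-memory CSV with made-up rows.

```python
import io

import pandas as pd

csv_text = "name,capital\nFrance,Paris\nAntarctica,N/A\n"

# Default behaviour: 'N/A' is parsed as a missing value
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default["capital"].isna().sum())  # 1

# keep_default_na=False keeps the text exactly as written
df_text = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(df_text["capital"].tolist())         # ['Paris', 'N/A']
```

Either behaviour can be the right one; the point is to choose it deliberately rather than be surprised by NaN values later.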
Step 9: Integrate with Orange (Optional)
Create an Orange Data Table from the API data:
from Orange.data import Table, Domain, StringVariable, ContinuousVariable, DiscreteVariable
# Define the domain
domain = Domain(
    # Continuous (numeric) features
    [ContinuousVariable("population"),
     ContinuousVariable("area")],
    # Target variable (discrete)
    [DiscreteVariable("region", values=list(df['region'].unique()))],
    # Meta variables (string/text)
    [StringVariable("name"),
     StringVariable("capital"),
     StringVariable("languages"),
     DiscreteVariable("landlocked", values=["False", "True"]),
     DiscreteVariable("un_member", values=["False", "True"])]
)
# Prepare data as list of lists
data_list = []
for _, row in df.iterrows():
    data_list.append([
        row['population'],
        row['area'],
        row['region'],
        row['name'],
        row['capital'],
        row['languages'],
        str(row['landlocked']),
        str(row['un_member'])
    ])
# Create Orange table
out_data = Table.from_list(domain, data_list)
print(f"Created Orange table with {len(out_data)} rows")
Success! You can now use this data in Orange widgets for visualization and analysis.
Step 10: Visualization (Optional)
Create visualizations using matplotlib:
import matplotlib.pyplot as plt
# Two side-by-side bar charts: country counts and total population by region
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Chart 1: Countries per region
region_counts = df['region'].value_counts()
axes[0].bar(region_counts.index, region_counts.values, color='steelblue')
axes[0].set_title('Number of Countries by Region')
axes[0].set_xlabel('Region')
axes[0].set_ylabel('Count')
axes[0].tick_params(axis='x', rotation=45)
# Chart 2: Total population by region
pop_by_region = df.groupby('region')['population'].sum().sort_values(ascending=False)
axes[1].bar(pop_by_region.index, pop_by_region.values / 1e9, color='coral')
axes[1].set_title('Total Population by Region (Billions)')
axes[1].set_xlabel('Region')
axes[1].set_ylabel('Population (Billions)')
axes[1].tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.savefig('countries_analysis.png', dpi=150)
print("Saved visualization to countries_analysis.png")
Expected Output
After completing this practical work, you should have:
- A working Python script that fetches data from REST Countries API
- A CSV file with data for about 250 countries and territories
- Basic statistical analysis of world population and geography
- An Orange Data Table ready for visual analysis
- Visualizations showing regional distributions
Deliverables
- Python Script: Complete API data retrieval script (.py file)
- CSV Export: countries_data.csv with all extracted data
- Visualization: Chart showing population or country distribution
- Report: Answer these questions:
- How many countries are UN members?
- Which region has the highest total population?
- What is the largest landlocked country by area?
- How many unique languages are spoken across all countries?
Bonus Challenges
- Challenge 1: Use the Open-Meteo API to fetch weather data for each capital city
- Challenge 2: Calculate population density (population/area) and find the most/least dense countries
- Challenge 3: Create a world map visualization using the latitude/longitude data
- Challenge 4: Combine with another API (e.g., World Bank) to add GDP data
API Endpoints Reference
| Endpoint | Description |
|---|---|
| /v3.1/all | Get all countries |
| /v3.1/name/{name} | Search by country name |
| /v3.1/region/{region} | Filter by region |
| /v3.1/alpha/{code} | Get by country code |
| /v3.1/lang/{language} | Filter by language |