Practical Work 4

REST API Data Retrieval

Fetch and analyze data from public REST APIs using Python and pandas

Duration: 2-3 hours
Difficulty: Beginner
Session: REST APIs

Objectives

By the end of this practical work, you will be able to:

  • Understand REST API concepts (endpoints, HTTP methods, JSON)
  • Make HTTP GET requests using Python's requests library
  • Parse JSON responses and extract relevant data
  • Handle pagination in API responses
  • Transform API data into pandas DataFrames for analysis
  • Integrate REST API data with Orange Data Mining

Prerequisites

  • Python 3.8+ installed
  • Basic understanding of JSON format
  • Orange Data Mining (optional, for visualization)

Install required packages:

pip install requests pandas

API Selection

We'll work with the REST Countries API (https://restcountries.com), a free public API that requires no authentication.

Note: Like many public APIs, it doesn't require an API key, which makes it a good fit for learning.

Instructions

Step 1: Make Your First API Request

Start by fetching data for a single country:

import requests

# Fetch data for France
url = "https://restcountries.com/v3.1/name/france"
response = requests.get(url)

# Check response status
print(f"Status Code: {response.status_code}")
print(f"Content Type: {response.headers.get('content-type')}")

# Parse JSON response
if response.status_code == 200:
    data = response.json()
    print(f"Number of results: {len(data)}")
    print(f"Type of data: {type(data)}")
else:
    print(f"Error: {response.status_code}")

Expected: Status 200, JSON content type, list with 1 country.
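
The URL in Step 1 follows a pattern that recurs throughout this practical work: a base address, a path naming the resource, and optionally a query string. The sketch below assembles such URLs with the standard library. `BASE_URL` and `build_url` are hypothetical helper names, not part of any library, and `fields` is the API's query parameter for limiting which attributes come back.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://restcountries.com/v3.1"  # hypothetical constant for this sketch


def build_url(*path_parts, fields=None):
    """Join path segments onto the base URL and add an optional fields filter."""
    # Percent-encode each segment so names with spaces stay valid in a URL
    path = "/".join(quote(str(part), safe="") for part in path_parts)
    url = f"{BASE_URL}/{path}"
    if fields:
        # Keep commas literal so the query reads fields=name,population
        url += "?" + urlencode({"fields": ",".join(fields)}, safe=",")
    return url


print(build_url("name", "france"))
print(build_url("region", "europe", fields=["name", "population"]))
```

The resulting strings can be passed to requests.get() in place of a hand-typed URL.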

Step 2: Explore the JSON Structure

Understand the nested JSON structure:

import json

# Pretty print the first result
country = data[0]
print(json.dumps(country, indent=2)[:1000])  # First 1000 chars

# Access nested data
print(f"\nCountry: {country['name']['common']}")
print(f"Official Name: {country['name']['official']}")
print(f"Capital: {country['capital'][0]}")
print(f"Population: {country['population']:,}")
print(f"Area: {country['area']:,} km²")
print(f"Region: {country['region']}")
print(f"Subregion: {country['subregion']}")
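
Direct indexing such as country['capital'][0] raises an error when a record lacks that key or holds an empty list, which is why Step 4 falls back to default values. A minimal sketch of a defensive accessor follows. `dig` is a hypothetical helper, and both sample records are hand-written stand-ins that mirror the shape of an API response, not real output.

```python
def dig(obj, *keys, default=None):
    """Follow keys and indexes into nested dicts/lists; return default if a step is missing."""
    current = obj
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hand-written samples shaped like REST Countries records
sample = {
    "name": {"common": "France", "official": "French Republic"},
    "capital": ["Paris"],
}
no_capital = {"name": {"common": "Antarctica"}}

print(dig(sample, "name", "common"))                  # France
print(dig(sample, "capital", 0))                      # Paris
print(dig(no_capital, "capital", 0, default="N/A"))   # N/A
```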

Step 3: Fetch All Countries

Get data for all countries at once:

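# Fetch all countries
# The /all endpoint may reject requests without a `fields` filter,
# so ask only for the fields that Step 4 extracts
fields = "name,capital,region,subregion,population,area,languages,currencies,landlocked,unMember"
url = f"https://restcountries.com/v3.1/all?fields={fields}"
response = requests.get(url)
response.raise_for_status()  # Stop here if the request failed
all_countries = response.json()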

print(f"Total countries: {len(all_countries)}")

# Preview first 5 country names
for country in all_countries[:5]:
    name = country['name']['common']
    pop = country.get('population', 0)
    print(f"  - {name}: {pop:,} people")
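
REST Countries returns every record in a single response, so this practical work never has to page through results. Many other APIs split large result sets into pages. The sketch below shows one common pattern, assuming a hypothetical API whose responses look like {"results": [...], "next_page": ...}. The fetch function is passed in as an argument, so a stand-in can replace the network call while testing.

```python
def fetch_all_pages(fetch_page, max_pages=100):
    """Collect items from a page-numbered API.

    fetch_page(page) is a hypothetical callable returning a dict shaped like
    {"results": [...], "next_page": <int or None>}.
    """
    items = []
    page = 1
    for _ in range(max_pages):  # hard cap guards against an endless loop
        payload = fetch_page(page)
        items.extend(payload.get("results", []))
        page = payload.get("next_page")
        if page is None:  # the last page has no successor
            break
    return items


# Stand-in for a real API: three pages of made-up items
FAKE_PAGES = {
    1: {"results": ["a", "b"], "next_page": 2},
    2: {"results": ["c", "d"], "next_page": 3},
    3: {"results": ["e"], "next_page": None},
}


def fake_fetch_page(page):
    return FAKE_PAGES[page]


all_items = fetch_all_pages(fake_fetch_page)
print(all_items)  # ['a', 'b', 'c', 'd', 'e']
```

With requests, fetch_page could wrap a call such as requests.get(url, params={"page": page}).json(), where the name of the page parameter depends on the API in question.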

Step 4: Extract Data to Dictionary

Create a function to extract key fields:

def extract_country_data(country):
    """Extract relevant fields from a country JSON object."""
    # Handle missing fields gracefully
    capitals = country.get('capital', ['N/A'])
    capital = capitals[0] if capitals else 'N/A'

    languages = country.get('languages', {})
    lang_list = list(languages.values()) if languages else []

    currencies = country.get('currencies', {})
    currency_names = [c.get('name', 'Unknown') for c in currencies.values()]

    return {
        'name': country['name']['common'],
        'official_name': country['name']['official'],
        'capital': capital,
        'region': country.get('region', 'Unknown'),
        'subregion': country.get('subregion', 'Unknown'),
        'population': country.get('population', 0),
        'area': country.get('area', 0),
        'languages': ', '.join(lang_list),
        'currencies': ', '.join(currency_names),
        'landlocked': country.get('landlocked', False),
        'un_member': country.get('unMember', False)
    }

# Test with France (the single result fetched in Step 1)
france_data = extract_country_data(data[0])
for key, value in france_data.items():
    print(f"{key}: {value}")

Step 5: Convert to DataFrame

Process all countries into a pandas DataFrame:

import pandas as pd

# Extract data for all countries
countries_data = [extract_country_data(c) for c in all_countries]

# Create DataFrame
df = pd.DataFrame(countries_data)

# Display basic info
print(f"DataFrame shape: {df.shape}")
print(f"\nColumns: {list(df.columns)}")
print(f"\nData types:\n{df.dtypes}")

# Preview the data
print("\nFirst 10 countries:")
print(df[['name', 'capital', 'population', 'region']].head(10))

Step 6: Basic Data Analysis

Analyze the country data:

# Population statistics
print("=== Population Statistics ===")
print(f"Total world population: {df['population'].sum():,}")
print(f"Average population: {df['population'].mean():,.0f}")
print(f"Median population: {df['population'].median():,.0f}")

# Top 10 most populous countries
print("\n=== Top 10 Most Populous Countries ===")
top_10 = df.nlargest(10, 'population')[['name', 'population', 'region']]
print(top_10.to_string(index=False))

# Countries by region
print("\n=== Countries by Region ===")
region_counts = df['region'].value_counts()
print(region_counts)

# Landlocked countries (the column is already boolean, so it can filter directly)
landlocked = df[df['landlocked']]
print(f"\n=== Landlocked Countries: {len(landlocked)} ===")
print(landlocked['name'].head(10).tolist())

Step 7: Filter by Region

Use API filtering to get specific regions:

# Fetch only European countries
url = "https://restcountries.com/v3.1/region/europe"
response = requests.get(url)
european_countries = response.json()

print(f"European countries: {len(european_countries)}")

# Create European DataFrame
europe_data = [extract_country_data(c) for c in european_countries]
df_europe = pd.DataFrame(europe_data)

# Analysis (the region covers all of Europe, not only EU member states)
print(f"\nTotal European population: {df_europe['population'].sum():,}")
print("\nLargest European countries by area:")
print(df_europe.nlargest(5, 'area')[['name', 'area', 'population']])

Step 8: Save to CSV

Export the data for later use:

# Save all countries
df.to_csv('countries_data.csv', index=False)
print(f"Saved {len(df)} countries to countries_data.csv")

# Save European countries separately
df_europe.to_csv('european_countries.csv', index=False)
print(f"Saved {len(df_europe)} European countries to european_countries.csv")

# Verify the files
import os
for filename in ['countries_data.csv', 'european_countries.csv']:
    size = os.path.getsize(filename)
    print(f"{filename}: {size:,} bytes")
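
Columns such as languages hold comma-separated text, so it is worth confirming that commas inside a value survive a CSV round trip. The sketch below uses the standard csv module with an in-memory buffer and hand-written sample rows; pandas' to_csv applies the same minimal quoting by default.

```python
import csv
import io

# Hand-written sample rows; the first value contains commas
rows = [
    {"name": "Switzerland", "languages": "French, Swiss German, Italian, Romansh"},
    {"name": "France", "languages": "French"},
]

# Write the rows to an in-memory CSV "file"
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "languages"])
writer.writeheader()
writer.writerows(rows)

# Read them back and compare with the originals
buffer.seek(0)
restored = list(csv.DictReader(buffer))
print(restored == rows)  # True: the quoted value came back as one field
```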

Step 9: Integrate with Orange (Optional)

Create an Orange Data Table from the API data:

from Orange.data import Table, Domain, StringVariable, ContinuousVariable, DiscreteVariable

# Define the domain
domain = Domain(
    # Continuous (numeric) features
    [ContinuousVariable("population"),
     ContinuousVariable("area")],
    # Target variable (discrete)
    [DiscreteVariable("region", values=list(df['region'].unique()))],
    # Meta variables (string/text)
    [StringVariable("name"),
     StringVariable("capital"),
     StringVariable("languages"),
     DiscreteVariable("landlocked", values=["False", "True"]),
     DiscreteVariable("un_member", values=["False", "True"])]
)

# Prepare data as list of lists
data_list = []
for _, row in df.iterrows():
    data_list.append([
        row['population'],
        row['area'],
        row['region'],
        row['name'],
        row['capital'],
        row['languages'],
        str(row['landlocked']),
        str(row['un_member'])
    ])

# Create Orange table
out_data = Table.from_list(domain, data_list)
print(f"Created Orange table with {len(out_data)} rows")

Success! You can now use this data in Orange widgets for visualization and analysis.

Step 10: Visualization (Optional)

Create visualizations using matplotlib (not covered by the earlier install command; add it with pip install matplotlib):

import matplotlib.pyplot as plt

# One figure with two bar charts side by side
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Chart 1: Countries per region
region_counts = df['region'].value_counts()
axes[0].bar(region_counts.index, region_counts.values, color='steelblue')
axes[0].set_title('Number of Countries by Region')
axes[0].set_xlabel('Region')
axes[0].set_ylabel('Count')
axes[0].tick_params(axis='x', rotation=45)

# Chart 2: Total population by region
pop_by_region = df.groupby('region')['population'].sum().sort_values(ascending=False)
axes[1].bar(pop_by_region.index, pop_by_region.values / 1e9, color='coral')
axes[1].set_title('Total Population by Region (Billions)')
axes[1].set_xlabel('Region')
axes[1].set_ylabel('Population (Billions)')
axes[1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.savefig('countries_analysis.png', dpi=150)
print("Saved visualization to countries_analysis.png")

Expected Output

After completing this practical work, you should have:

  • A working Python script that fetches data from the REST Countries API
  • A CSV file with data for 250+ countries
  • Basic statistical analysis of world population and geography
  • An Orange Data Table ready for visual analysis
  • Visualizations showing regional distributions

Deliverables

  • Python Script: Complete API data retrieval script (.py file)
  • CSV Export: countries_data.csv with all extracted data
  • Visualization: Chart showing population or country distribution
  • Report: Answer these questions:
    1. How many countries are UN members?
    2. Which region has the highest total population?
    3. What is the largest landlocked country by area?
    4. How many unique languages are spoken across all countries?
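
For question 4, note that Step 4 stores each country's languages as one comma-joined string, so counting distinct strings in that column would count combinations rather than languages. A minimal sketch of splitting the values first is shown below; `unique_languages` is a hypothetical helper and the rows are hand-written samples.

```python
def unique_languages(rows):
    """Return the set of distinct language names across all rows."""
    found = set()
    for row in rows:
        for language in row["languages"].split(","):
            language = language.strip()
            if language:  # skip the empty string left by rows with no languages
                found.add(language)
    return found


# Hand-written samples shaped like the output of extract_country_data
sample_rows = [
    {"name": "France", "languages": "French"},
    {"name": "Belgium", "languages": "German, French, Dutch"},
    {"name": "Antarctica", "languages": ""},
]

langs = unique_languages(sample_rows)
print(len(langs))     # 3
print(sorted(langs))  # ['Dutch', 'French', 'German']
```

With the DataFrame from Step 5, the same function can be called as unique_languages(df.to_dict('records')).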

Bonus Challenges

  • Challenge 1: Use the Open-Meteo API to fetch weather data for each capital city
  • Challenge 2: Calculate population density (population/area) and find the most/least dense countries
  • Challenge 3: Create a world map visualization using the latitude/longitude data
  • Challenge 4: Combine with another API (e.g., World Bank) to add GDP data
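
For Challenge 2, keep in mind that Step 4 records a missing area as 0, so a plain division would fail or produce infinite values. The sketch below treats such rows as "density unknown". The function name and the three sample countries are made up for illustration.

```python
def population_density(population, area):
    """People per km², or None when the area is missing or zero."""
    if not area or area <= 0:
        return None
    return population / area


# Made-up sample data; "Noarea" mimics a record whose area defaulted to 0
countries = [
    {"name": "Smallland", "population": 1_000_000, "area": 500.0},
    {"name": "Wideland", "population": 2_000_000, "area": 400_000.0},
    {"name": "Noarea", "population": 1_000, "area": 0},
]

densities = {
    c["name"]: population_density(c["population"], c["area"]) for c in countries
}

# Rank from most to least dense, leaving out rows with unknown density
ranked = sorted(
    (name for name, density in densities.items() if density is not None),
    key=densities.get,
    reverse=True,
)
print(ranked)  # ['Smallland', 'Wideland']
```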

API Endpoints Reference

Endpoint                 Description
/v3.1/all                Get all countries
/v3.1/name/{name}        Search by country name
/v3.1/region/{region}    Filter by region
/v3.1/alpha/{code}       Get by country code
/v3.1/lang/{language}    Filter by language

Resources