Building LLM Agents & Custom Skills

Workshop AI-Driven Development — Session 2

4 hours

From API calls to autonomous agents in vanilla Java

Session Agenda

4 intensive hours

Part 1 — Lecture (1h)

  • Session 1 retrospective
  • What is an LLM agent?
  • Agent architecture & loop
  • Calling the OpenAI API in Java
  • Tools & registry pattern
  • Custom skills

Part 2 — Lab (3h)

  • Setup & first API call
  • Build agent core
  • Implement tools
  • Build skills
  • Personal project
  • Tests & review
  • Demos

Session 1 Retrospective

What did we learn? Where do we go next?

Discussion

  • What worked well?
  • Where did you get stuck?
  • Did AI make mistakes?
  • Did you understand the generated code?

Key Takeaways

  • Context is king — better specs = better output
  • AI doesn’t replace understanding
  • AIDD workflow structures thinking
  • AI mistakes are predictable

What is an LLM Agent?

From passive tool to autonomous actor

Simple LLM Usage

  • Single prompt → response
  • No memory between calls
  • No ability to act
  • Human orchestrates everything

LLM Agent

  • Multi-step autonomous execution
  • Maintains conversation context
  • Uses tools to act on the world
  • Observe → Think → Act loop

Definition: an agent is an LLM that can call tools, observe results, and decide what to do next — autonomously.

Agent Architecture

Four components working together

  • LLM Core — GPT-5-mini reasoning
  • Tools — actions: files, HTTP, calc
  • Memory — conversation history
  • Orchestration — agent loop logic

The Agent Loop

Observe → Think → Act → repeat until done

  1. OBSERVE — read the user request or tool result
  2. THINK — the LLM decides: respond or call a tool?
  3. ACT — execute the tool OR return the answer

  tool result → next iteration; text response → done

Key insight: the agent keeps looping until the LLM decides no more tools are needed.

Calling GPT-5-mini in Vanilla Java

HttpClient + org.json — no frameworks

var body = new JSONObject()
  .put("model", "gpt-5-mini")
  .put("messages", new JSONArray(messages))
  .put("tools", registry.declarations())
  .put("tool_choice", "auto");

var req = HttpRequest.newBuilder()
  .uri(URI.create(
    "https://api.openai.com/v1"
    + "/chat/completions"))
  .header("Content-Type",
          "application/json")
  .header("Authorization",
          "Bearer " + API_KEY)
  .POST(BodyPublishers.ofString(
      body.toString()))
  .build();

var resp = client.send(req,
    BodyHandlers.ofString());

Built-in Java, one dependency

  • HttpClient — Java 11+, no libraries
  • org.json — builds the JSON body programmatically
  • messages — conversation history array
  • tools — auto-generated from registry
  • Bearer token — OpenAI auth pattern

Security: API_KEY = System.getenv("OPENAI_API_KEY") — never hardcode it.
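A small sketch of that advice: fail fast when the variable is missing instead of sending an empty Bearer token and getting a confusing 401. `requireKey` is a hypothetical helper, not part of any SDK.

```java
public class ApiKeyCheck {
    // Hypothetical helper: reject a missing or blank key up front
    // instead of letting the API call fail with an auth error later.
    static String requireKey(String key) {
        if (key == null || key.isBlank()) {
            throw new IllegalStateException("OPENAI_API_KEY is not set");
        }
        return key;
    }

    public static void main(String[] args) {
        // In the agent: String API_KEY = requireKey(System.getenv("OPENAI_API_KEY"));
        System.out.println(requireKey("sk-demo"));   // prints: sk-demo
    }
}
```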

Parsing LLM Responses

Text response or tool call — two paths

var json = new JSONObject(resp.body());
var message = json
    .getJSONArray("choices")
    .getJSONObject(0)
    .getJSONObject("message");

if (message.has("tool_calls")) {
    var calls = message.getJSONArray(
        "tool_calls");
    for (int j = 0; j < calls.length(); j++) {
        // org.json's JSONArray iterates as Object,
        // so index access is needed to get JSONObjects
        var call = calls.getJSONObject(j);
        var fn = call.getJSONObject(
            "function");
        String name = fn.getString("name");
        var args = new JSONObject(
            fn.getString("arguments"));
        String id = call.getString("id");
        // → execute tool, send result back
    }
} else {
    String text = message.getString(
        "content");
    // → final answer
}

Two response types

  • tool_calls — LLM wants to use tools
  • content — LLM has a final answer
  • arguments is a JSON string — parse it
  • id — must match in the tool result

Important: arguments is a string, not an object. Always wrap it with new JSONObject(fn.getString("arguments")).

Tools: Interface + Registry

The pattern behind every agent framework

interface Tool {
    String name();
    String description();
    JSONObject parameters();
    String execute(JSONObject args);
}

class ToolRegistry {
    Map<String, Tool> tools = new HashMap<>();

    void register(Tool t) {
        tools.put(t.name(), t);
    }
    String run(String name, JSONObject args) {
        return tools.get(name).execute(args);
    }
    JSONArray declarations() {
        var arr = new JSONArray();
        for (var t : tools.values()) {
            arr.put(new JSONObject()
              .put("type", "function")
              .put("function", new JSONObject()
                .put("name", t.name())
                .put("description", t.description())
                .put("parameters", t.parameters())));
        }
        return arr;
    }
}

Universal pattern

  • Interface — 4 methods: name, description, parameters, execute
  • Registry — central map of all tools
  • declarations() — generates OpenAI tool format
  • Extensible: just implement Tool

Fun fact: this is the same pattern used by LangChain, Spring AI, and Claude’s tool system.

Example: Calculator Tool

A concrete tool implementation

class CalculatorTool implements Tool {
    public String name() {
        return "calculate";
    }
    public String description() {
        return "Evaluate a math expression";
    }
    public JSONObject parameters() {
        return new JSONObject("""
          {"type":"object","properties":{
            "expression":{"type":"string",
              "description":"e.g. 2+3*4"}
          },"required":["expression"]}""");
    }
    public String execute(JSONObject args) {
        String expr = args.getString(
            "expression");
        // Simple eval (or use ScriptEngine)
        return String.valueOf(eval(expr));
    }
}

4 methods, one job

  • name() — how the LLM refers to it
  • description() — helps LLM decide when to use it
  • parameters() — OpenAI function schema (JSON Schema)
  • execute() — does the actual work
  • Registration: registry.register(new CalculatorTool())

Lab preview: in the lab you’ll implement at least 2 tools like this.
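The `eval` helper the tool calls is not shown on the slide. One way to fill it in, assuming only the four basic operators and parentheses are needed: a small recursive-descent parser, so the tool has no dependency on `ScriptEngine` (which was removed from the JDK in Java 15 anyway).

```java
// Minimal sketch of an eval helper for CalculatorTool:
// a recursive-descent parser for + - * / and parentheses.
public class SimpleEval {
    private final String src;
    private int pos;

    private SimpleEval(String src) {
        this.src = src.replaceAll("\\s+", "");
    }

    public static double eval(String expression) {
        SimpleEval e = new SimpleEval(expression);
        double v = e.expr();
        if (e.pos != e.src.length())
            throw new IllegalArgumentException(
                "Unexpected input: " + e.src.substring(e.pos));
        return v;
    }

    // expr := term (('+'|'-') term)*
    private double expr() {
        double v = term();
        while (pos < src.length()
                && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            double t = term();
            v = (op == '+') ? v + t : v - t;
        }
        return v;
    }

    // term := factor (('*'|'/') factor)*
    private double term() {
        double v = factor();
        while (pos < src.length()
                && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            double f = factor();
            v = (op == '*') ? v * f : v / f;
        }
        return v;
    }

    // factor := number | '(' expr ')' | '-' factor
    private double factor() {
        if (src.charAt(pos) == '(') {
            pos++;                     // consume '('
            double v = expr();
            pos++;                     // consume ')'
            return v;
        }
        if (src.charAt(pos) == '-') {  // unary minus
            pos++;
            return -factor();
        }
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.'))
            pos++;
        return Double.parseDouble(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("2+3*4"));   // prints: 14.0
    }
}
```

With this in place, `execute` can simply `return String.valueOf(SimpleEval.eval(expr));`.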

The Complete Agent Loop

30 lines of Java — that’s the entire agent

var messages = new ArrayList<JSONObject>();
messages.add(systemMsg(prompt));
messages.add(userMsg(input));

for (int i = 0; i < MAX_ITER; i++) {
    var resp = llm.call(messages,
        registry.declarations());
    var msg = resp.getJSONArray("choices")
        .getJSONObject(0)
        .getJSONObject("message");

    if (msg.has("tool_calls")) {
        messages.add(msg);
        var calls = msg.getJSONArray(
            "tool_calls");
        for (int j = 0; j < calls.length(); j++) {
            var tc = calls.getJSONObject(j);
            var fn = tc.getJSONObject(
                "function");
            var result = registry.run(
                fn.getString("name"),
                new JSONObject(
                  fn.getString("arguments")));
            messages.add(new JSONObject()
              .put("role", "tool")
              .put("tool_call_id",
                   tc.getString("id"))
              .put("content", result));
        }
    } else {
        return msg.getString("content");
    }
}

The heart of every agent

  • for loop — safety cap on iterations
  • llm.call — send messages + tool declarations
  • tool_calls? → execute each, add results to history
  • content? → return final answer
  • tool_call_id — links result to the call

That’s it. A complete LLM agent. Everything else is just adding more tools.

Vanilla Java vs Frameworks

Understand the pattern, then pick the right tool

                   Vanilla Java          LangChain4j        Spring AI
Dependencies       org.json only         ~20 JARs           Spring Boot stack
Lines for agent    ~150                  ~30                ~20
Learning curve     Just Java             New abstractions   Spring ecosystem
Flexibility        Total control         Plugin-based       Convention-based
Best for           Learning, prototypes  Medium projects    Enterprise
You understand     Everything            Mostly             Framework magic

Our approach: we use vanilla Java so you see what frameworks do under the hood. Once you understand the loop, any framework becomes transparent.

What is a Skill?

A tool is a function. A skill is an AI-powered capability.

Definition

A skill bundles a system prompt + specialized tools + output format into a single reusable unit. Think of it as a plugin for your agent.

Example: a “Code Review” skill that reads files, analyzes patterns, and outputs a structured report.

Good Skill Properties

  • Single responsibility — one job, done well
  • Clear contract — defined input & output
  • Error handling — fails gracefully
  • Composable — works with other skills
  • Testable — verifiable in isolation

Skill Architecture in Java

System prompt + tools = one reusable capability

class Skill {
    private final String name;
    private final String systemPrompt;
    private final List<Tool> tools;

    Skill(String name, String prompt,
          List<Tool> tools) {
        this.name = name;
        this.systemPrompt = prompt;
        this.tools = tools;
    }

    String execute(String userInput) {
        var registry = new ToolRegistry();
        tools.forEach(registry::register);
        var agent = new Agent(
            systemPrompt, registry);
        return agent.run(userInput);
    }
}

Encapsulation

  • System prompt — focuses the LLM on one task
  • Specialized tools — only what this skill needs
  • execute() — runs a full agent loop internally
  • Composable — agents can use skills as tools

Key insight: a skill is just a focused agent. Skills can even call other skills.
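The "skills as tools" idea can be sketched with an adapter. This is a simplified version using plain `String` input/output so it runs without org.json; `SkillTool`/`wrapSkill` are hypothetical names, and in the real code the adapter would implement the `Tool` interface and unpack a `JSONObject` argument.

```java
import java.util.Map;
import java.util.function.Function;

public class SkillAsTool {
    // Simplified stand-in for the Tool interface on the slides.
    interface SimpleTool {
        String name();
        String execute(String input);
    }

    // Hypothetical adapter: any skill (a function that internally runs
    // its own agent loop) becomes a tool a parent agent can register.
    static SimpleTool wrapSkill(String name, Function<String, String> skill) {
        return new SimpleTool() {
            public String name() { return name; }
            public String execute(String input) { return skill.apply(input); }
        };
    }

    public static void main(String[] args) {
        // A real skill would call reviewSkill::execute; we fake the
        // agent loop with a trivial transform for the demo.
        SimpleTool reviewAsTool =
            wrapSkill("codeReview", input -> "REVIEW OF " + input.toUpperCase());

        Map<String, SimpleTool> registry = Map.of(reviewAsTool.name(), reviewAsTool);
        System.out.println(registry.get("codeReview").execute("Agent.java"));
        // prints: REVIEW OF AGENT.JAVA
    }
}
```

The design point: because a skill exposes the same execute-style contract as a tool, composition (skill calls skill) needs no new machinery.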

Example: Code Review Skill

A reusable skill with a focused prompt and read-only tools

var reviewSkill = new Skill(
    "codeReview",
    """
    You are a code reviewer. Analyze
    the given code for:
    1. Security vulnerabilities
    2. Performance issues
    3. Readability problems
    4. Potential bugs
    Return a structured report with
    severity: critical / major / minor.
    """,
    List.of(
        new ReadFileTool(),
        new CountLinesTool()
    )
);

String report = reviewSkill.execute(
    "Review src/agent/Agent.java");
System.out.println(report);

Why this works

  • Focused prompt — exactly what to look for
  • Read-only tools — no write access
  • Structured output — severity levels
  • Reusable — call on any file, any project

Safety: a review skill should NEVER have write tools. Read-only only.
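A minimal read-only file helper such a skill could build on. Simplified to a plain `String` path (the real `ReadFileTool` would unpack a `JSONObject` arg); the size cap and the error-as-string return are design choices of this sketch, not shown on the slides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFileHelper {
    static final long MAX_BYTES = 64 * 1024;   // keep tool output LLM-sized

    static String readFile(String path) {
        try {
            Path p = Path.of(path);
            if (Files.size(p) > MAX_BYTES) {
                return "ERROR: file larger than " + MAX_BYTES + " bytes";
            }
            return Files.readString(p);
        } catch (IOException e) {
            // Return errors as text, not exceptions: the agent loop
            // feeds tool output back to the LLM, which can then react.
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.writeString(tmp, "hello agent");
        System.out.println(readFile(tmp.toString()));   // prints: hello agent
    }
}
```

Note there is no write method anywhere: the safety rule above is enforced by the tool surface, not by the prompt.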

Skill Ideas for Your Projects

Four ready-to-build skills

Code Review

Reads Java files, analyzes for bugs, security, and style. Outputs a structured report with severity levels.

Tools: ReadFile, CountLines

Test Generator

Reads a Java class, generates JUnit test cases with edge cases and assertions. Outputs test file content.

Tools: ReadFile, ListMethods

Documentation

Reads code and comments, generates Javadoc or README sections. Outputs formatted markdown.

Tools: ReadFile, ReadDirectory

Data Analyzer

Reads CSV/JSON data, computes statistics, generates a human-readable summary with key insights.

Tools: ReadFile, Calculate

Skills in Claude Code

The same concept, productized — SKILL.md format

# ~/.claude/skills/review/SKILL.md
---
name: review
description: Review code for bugs and style
user-invocable: true
allowed-tools: Read, Grep
argument-hint: [file-path]
---

Review the following file for:
1. Security vulnerabilities
2. Performance issues
3. Readability problems
4. Potential bugs

File to review: $ARGUMENTS

Return a structured report with
severity levels: critical / major / minor.
Cite specific line numbers.

Anatomy of a SKILL.md

  • Frontmatter (YAML between ---) — metadata
  • name — becomes the /review command
  • description — when to auto-activate
  • allowed-tools — restricts what Claude can do
  • $ARGUMENTS — user input after the command
  • Body — the system prompt / instructions

Usage

/review src/Agent.java

Same pattern: your Java Skill class = Claude Code SKILL.md. System prompt + tools + input → structured output.

Example: The /craft Skill

Turn the CRAFT methodology into a reusable command

# ~/.claude/skills/craft/SKILL.md
---
name: craft
description: Generate a CRAFT-structured
  prompt for AI-driven development
user-invocable: true
allowed-tools: Read, Grep, Glob
argument-hint: [feature-description]
---

The user wants to build: $ARGUMENTS

Generate a CRAFT prompt by:

1. **Context**: Use Read and Grep to scan
   the project. Identify: framework, lang,
   existing files, conventions, types.

2. **Requirement**: Restate what the user
   wants with precise acceptance criteria.
   Add edge cases and error behavior.

3. **Action**: Specify exact file(s) to
   create or modify, with full paths.

4. **Format**: State the tech constraints
   (language, framework, patterns, types).

5. **Test**: Describe how to verify the
   result: expected behavior, edge cases,
   what must NOT happen.

Output the CRAFT prompt in a code block
ready to copy-paste into an AI assistant.

What /craft does

Instead of writing CRAFT prompts manually, the skill auto-generates them by reading your codebase first.

Example

/craft add user login with email

Claude reads your project, finds the stack, existing auth files, types, then outputs a complete CRAFT prompt.

Why it matters

  • Context is auto-discovered, not typed
  • File paths are real, not guessed
  • Types match your actual codebase
  • Acceptance criteria are specific

Meta: this skill uses AI to write better AI prompts. That’s the power of skills.

Project Ideas for the Challenge

Agent-powered projects — pick one and build it

Code Review Agent

Reads Java files, analyzes quality, generates structured review. Skills: file reading, pattern detection, report generation.

Study Assistant

Reads course notes, answers questions, generates flashcards and quizzes. Skills: document parsing, Q&A, quiz generation.

Data Pipeline Agent

Reads CSV/JSON, cleans data, computes stats, generates summary. Skills: file I/O, data parsing, text-based visualization.

DevOps Helper

Reads log files, diagnoses issues, suggests fixes. Skills: command execution, log analysis, troubleshooting.

Multi-API Aggregator

Queries 2+ public APIs (weather, news, stocks), combines data, generates a briefing. Skills: HTTP fetch, data merge, summary.

Your Own Agent!

Any agent with: 2+ custom tools, 1+ reusable skill, conversation memory, and error handling. Be creative.

Challenge Requirements

What you must deliver

Required

  • Complete vision.md with agent architecture
  • Working agent loop (observe-think-act)
  • 2+ custom tools implemented
  • 1+ reusable skill
  • 1 unit test (tool) + 1 integration test (agent)
  • GitHub repo with clean commits
  • Cross code review completed

Bonus

  • 3+ tools
  • Multi-turn conversation memory
  • Skill composition (skill calls skill)
  • Error recovery in agent loop
  • README with architecture diagram
  • Interactive CLI interface
  • Tool output formatting

Grading Rubric

How your project will be graded

Criterion           Points   Details
AIDD Methodology    /15      vision.md, GitHub issues, clean commits, workflow followed
Agent Core          /25      Working agent loop, LLM API integration, message history, proper exit
Tools & Skills      /25      2+ working tools, 1+ skill, clean interfaces, error handling
Code Quality        /15      Clean Java, proper OOP (interfaces, encapsulation), no dead code
Tests               /10      1 unit test (tool) + 1 integration test (agent loop), all passing
Demo                /10      Clarity, live demo, technical depth, honest AI assessment

Total: /100. Agent quality and skill design count 50%. A well-designed agent with 2 solid tools beats a messy agent with 5 broken ones.

Cross Code Review

Learn by evaluating others’ agent code

  1. Feature Freeze — stop coding, push everything
  2. Repo Exchange — get your partner's GitHub link
  3. Review Agent — run their agent, read tools & skills
  4. Open Issues — 1 positive + 2 suggestions
  5. Discuss — 5 min in pairs; explain the why

Look for: clean interfaces, good tool design, working agent loop
Flag: missing error handling, hardcoded prompts, no separation of concerns

Prepare Your Demo

5 minutes to showcase your agent

Structure (5 min)

  1. 30s — What problem your agent solves
  2. 2min — Live demo: run the agent, show tool calls
  3. 1min — Architecture: agent loop, tool design, skills
  4. 1min — What the AI did well / poorly
  5. 30s — What you would improve with more time

Tips

  • Prepare a scripted prompt that triggers tool calls
  • Have a backup recording in case the API is slow
  • Show the agent loop in action (print tool calls)
  • Show passing tests in terminal
  • Be honest about limitations

What You Take Away

Skills acquired in this session

Agent Architecture

  • LLM core + tools + memory + loop
  • Observe → Think → Act cycle
  • Vanilla Java HttpClient + org.json
  • OpenAI function calling

Skills & Tools

  • Interface + registry pattern
  • Single-responsibility tools
  • Composable skills
  • Code review as example

Critical Thinking

  • What happens inside agent frameworks
  • When agents help vs overcomplicate
  • Responsibility for agent actions
  • Vanilla code → framework fluency

You now understand what happens inside every AI agent framework. The patterns are universal — Java, Python, TypeScript, the loop is the same.

Ready to Build Your Agent?

3 hours to build an autonomous LLM agent

Timeline

  • 0:00–0:20 — Setup + first API call
  • 0:20–1:00 — Build agent core
  • 1:00–1:30 — Implement 2+ tools
  • 1:30–1:45 — Build a skill
  • 1:45–2:30 — Personal project
  • 2:30–2:45 — Tests + cross review
  • 2:45–3:00 — Demos

Requirements Reminder

  • Working agent loop
  • 2+ custom tools
  • 1+ reusable skill
  • 1 unit + 1 integration test
  • Clean commits on GitHub
  • Cross code review done

Head to the lab → practical-work-2-personal-challenge.html
