Building LLM Agents & Custom Skills

Workshop AI-Driven Development — Session 2

4 hours

From API calls to autonomous agents in vanilla Java

Session Agenda

4 intensive hours

Part 1 — Lecture (1h)

  • Session 1 retrospective
  • What is an LLM agent?
  • Agent architecture & loop
  • Calling the OpenAI API in Java
  • Tools & registry pattern
  • Custom skills

Part 2 — Lab (3h)

  • Setup & first API call
  • Build agent core
  • Implement tools
  • Build skills
  • Personal project
  • Tests & review
  • Demos

Session 1 Retrospective

What did we learn? Where do we go next?

Discussion

  • What worked well?
  • Where did you get stuck?
  • Did AI make mistakes?
  • Did you understand the generated code?

Key Takeaways

  • Context is king — better specs = better output
  • AI doesn’t replace understanding
  • AIDD workflow structures thinking
  • AI mistakes are predictable

What is an LLM Agent?

From passive tool to autonomous actor

Simple LLM Usage

  • Single prompt → response
  • No memory between calls
  • No ability to act
  • Human orchestrates everything

LLM Agent

  • Multi-step autonomous execution
  • Maintains conversation context
  • Uses tools to act on the world
  • Observe → Think → Act loop

Definition: an agent is an LLM that can call tools, observe results, and decide what to do next — autonomously.

Agent Architecture

Four components working together

  • LLM Core — GPT-5-mini reasoning
  • Tools — actions: files, HTTP, calc
  • Memory — conversation history
  • Orchestration — agent loop logic

The Agent Loop

Observe → Think → Act → repeat until done

  1. OBSERVE — read the user request or tool result
  2. THINK — the LLM decides: respond or call a tool?
  3. ACT — execute the tool OR return the answer

  tool result → next iteration; text response → done

Key insight: the agent keeps looping until the LLM decides no more tools are needed.

Calling GPT-5-mini in Vanilla Java

HttpClient + org.json — no frameworks

var body = new JSONObject()
  .put("model", "gpt-5-mini")
  .put("messages", new JSONArray(messages))
  .put("tools", registry.declarations())
  .put("tool_choice", "auto");

var req = HttpRequest.newBuilder()
  .uri(URI.create(
    "https://api.openai.com/v1"
    + "/chat/completions"))
  .header("Content-Type",
          "application/json")
  .header("Authorization",
          "Bearer " + API_KEY)
  .POST(BodyPublishers.ofString(
      body.toString()))
  .build();

var resp = client.send(req,
    BodyHandlers.ofString());

Built-in Java, one dependency

  • HttpClient — Java 11+, no libraries
  • org.json — builds the JSON body programmatically
  • messages — conversation history array
  • tools — auto-generated from registry
  • Bearer token — OpenAI auth pattern

Security: API_KEY = System.getenv("OPENAI_API_KEY") — never hardcode it.
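A small sketch of that advice: fail fast when the variable is missing instead of sending an empty Bearer token and getting a confusing 401. `requireKey` is a hypothetical helper, not part of any SDK.

```java
public class ApiKeyCheck {
    // Hypothetical helper: reject a missing or blank key up front
    // instead of letting the API call fail with an auth error later.
    static String requireKey(String key) {
        if (key == null || key.isBlank()) {
            throw new IllegalStateException("OPENAI_API_KEY is not set");
        }
        return key;
    }

    public static void main(String[] args) {
        // In the agent: String API_KEY = requireKey(System.getenv("OPENAI_API_KEY"));
        System.out.println(requireKey("sk-demo"));   // prints: sk-demo
    }
}
```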

Parsing LLM Responses

Text response or tool call — two paths

var json = new JSONObject(resp.body());
var message = json
    .getJSONArray("choices")
    .getJSONObject(0)
    .getJSONObject("message");

if (message.has("tool_calls")) {
    var calls = message.getJSONArray(
        "tool_calls");
    for (int j = 0; j < calls.length(); j++) {
        // org.json's JSONArray iterates as Object,
        // so index access is needed to get JSONObjects
        var call = calls.getJSONObject(j);
        var fn = call.getJSONObject(
            "function");
        String name = fn.getString("name");
        var args = new JSONObject(
            fn.getString("arguments"));
        String id = call.getString("id");
        // → execute tool, send result back
    }
} else {
    String text = message.getString(
        "content");
    // → final answer
}

Two response types

  • tool_calls — LLM wants to use tools
  • content — LLM has a final answer
  • arguments is a JSON string — parse it
  • id — must match in the tool result

Important: arguments is a string, not an object. Always wrap it with new JSONObject(fn.getString("arguments")).

Tools: Interface + Registry

The pattern behind every agent framework

interface Tool {
    String name();
    String description();
    JSONObject parameters();
    String execute(JSONObject args);
}

class ToolRegistry {
    Map<String, Tool> tools = new HashMap<>();

    void register(Tool t) {
        tools.put(t.name(), t);
    }
    String run(String name, JSONObject args) {
        return tools.get(name).execute(args);
    }
    JSONArray declarations() {
        var arr = new JSONArray();
        for (var t : tools.values()) {
            arr.put(new JSONObject()
              .put("type", "function")
              .put("function", new JSONObject()
                .put("name", t.name())
                .put("description", t.description())
                .put("parameters", t.parameters())));
        }
        return arr;
    }
}

Universal pattern

  • Interface — 4 methods: name, description, parameters, execute
  • Registry — central map of all tools
  • declarations() — generates OpenAI tool format
  • Extensible: just implement Tool

Fun fact: this is the same pattern used by LangChain, Spring AI, and Claude’s tool system.

Example: Calculator Tool

A concrete tool implementation

class CalculatorTool implements Tool {
    public String name() {
        return "calculate";
    }
    public String description() {
        return "Evaluate a math expression";
    }
    public JSONObject parameters() {
        return new JSONObject("""
          {"type":"object","properties":{
            "expression":{"type":"string",
              "description":"e.g. 2+3*4"}
          },"required":["expression"]}""");
    }
    public String execute(JSONObject args) {
        String expr = args.getString(
            "expression");
        // Simple eval (or use ScriptEngine)
        return String.valueOf(eval(expr));
    }
}

4 methods, one job

  • name() — how the LLM refers to it
  • description() — helps LLM decide when to use it
  • parameters() — OpenAI function schema (JSON Schema)
  • execute() — does the actual work
  • Registration: registry.register(new CalculatorTool())

Lab preview: in the lab you’ll implement at least 2 tools like this.
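The `eval` helper the tool calls is not shown on the slide. One way to fill it in, assuming only the four basic operators and parentheses are needed: a small recursive-descent parser, so the tool has no dependency on `ScriptEngine` (which was removed from the JDK in Java 15 anyway).

```java
// Minimal sketch of an eval helper for CalculatorTool:
// a recursive-descent parser for + - * / and parentheses.
public class SimpleEval {
    private final String src;
    private int pos;

    private SimpleEval(String src) {
        this.src = src.replaceAll("\\s+", "");
    }

    public static double eval(String expression) {
        SimpleEval e = new SimpleEval(expression);
        double v = e.expr();
        if (e.pos != e.src.length())
            throw new IllegalArgumentException(
                "Unexpected input: " + e.src.substring(e.pos));
        return v;
    }

    // expr := term (('+'|'-') term)*
    private double expr() {
        double v = term();
        while (pos < src.length()
                && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            double t = term();
            v = (op == '+') ? v + t : v - t;
        }
        return v;
    }

    // term := factor (('*'|'/') factor)*
    private double term() {
        double v = factor();
        while (pos < src.length()
                && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            double f = factor();
            v = (op == '*') ? v * f : v / f;
        }
        return v;
    }

    // factor := number | '(' expr ')' | '-' factor
    private double factor() {
        if (src.charAt(pos) == '(') {
            pos++;                     // consume '('
            double v = expr();
            pos++;                     // consume ')'
            return v;
        }
        if (src.charAt(pos) == '-') {  // unary minus
            pos++;
            return -factor();
        }
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.'))
            pos++;
        return Double.parseDouble(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("2+3*4"));   // prints: 14.0
    }
}
```

With this in place, `execute` can simply `return String.valueOf(SimpleEval.eval(expr));`.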

The Complete Agent Loop

30 lines of Java — that’s the entire agent

var messages = new ArrayList<JSONObject>();
messages.add(systemMsg(prompt));
messages.add(userMsg(input));

for (int i = 0; i < MAX_ITER; i++) {
    var resp = llm.call(messages,
        registry.declarations());
    var msg = resp.getJSONArray("choices")
        .getJSONObject(0)
        .getJSONObject("message");

    if (msg.has("tool_calls")) {
        messages.add(msg);
        var calls = msg.getJSONArray(
            "tool_calls");
        for (int j = 0; j < calls.length(); j++) {
            var tc = calls.getJSONObject(j);
            var fn = tc.getJSONObject(
                "function");
            var result = registry.run(
                fn.getString("name"),
                new JSONObject(
                  fn.getString("arguments")));
            messages.add(new JSONObject()
              .put("role", "tool")
              .put("tool_call_id",
                   tc.getString("id"))
              .put("content", result));
        }
    } else {
        return msg.getString("content");
    }
}

The heart of every agent

  • for loop — safety cap on iterations
  • llm.call — send messages + tool declarations
  • tool_calls? → execute each, add results to history
  • content? → return final answer
  • tool_call_id — links result to the call

That’s it. A complete LLM agent. Everything else is just adding more tools.

Vanilla Java vs Frameworks

Understand the pattern, then pick the right tool

                   Vanilla Java          LangChain4j        Spring AI
Dependencies       org.json only         ~20 JARs           Spring Boot stack
Lines for agent    ~150                  ~30                ~20
Learning curve     Just Java             New abstractions   Spring ecosystem
Flexibility        Total control         Plugin-based       Convention-based
Best for           Learning, prototypes  Medium projects    Enterprise
You understand     Everything            Mostly             Framework magic

Our approach: we use vanilla Java so you see what frameworks do under the hood. Once you understand the loop, any framework becomes transparent.

What is a Skill?

A tool is a function. A skill is an AI-powered capability.

Definition

A skill bundles a system prompt + specialized tools + output format into a single reusable unit. Think of it as a plugin for your agent.

Example: a “Code Review” skill that reads files, analyzes patterns, and outputs a structured report.

Good Skill Properties

  • Single responsibility — one job, done well
  • Clear contract — defined input & output
  • Error handling — fails gracefully
  • Composable — works with other skills
  • Testable — verifiable in isolation

Skill Architecture in Java

System prompt + tools = one reusable capability

class Skill {
    private final String name;
    private final String systemPrompt;
    private final List<Tool> tools;

    Skill(String name, String prompt,
          List<Tool> tools) {
        this.name = name;
        this.systemPrompt = prompt;
        this.tools = tools;
    }

    String execute(String userInput) {
        var registry = new ToolRegistry();
        tools.forEach(registry::register);
        var agent = new Agent(
            systemPrompt, registry);
        return agent.run(userInput);
    }
}

Encapsulation

  • System prompt — focuses the LLM on one task
  • Specialized tools — only what this skill needs
  • execute() — runs a full agent loop internally
  • Composable — agents can use skills as tools

Key insight: a skill is just a focused agent. Skills can even call other skills.
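The "skills as tools" idea can be sketched with an adapter. This is a simplified version using plain `String` input/output so it runs without org.json; `SkillTool`/`wrapSkill` are hypothetical names, and in the real code the adapter would implement the `Tool` interface and unpack a `JSONObject` argument.

```java
import java.util.Map;
import java.util.function.Function;

public class SkillAsTool {
    // Simplified stand-in for the Tool interface on the slides.
    interface SimpleTool {
        String name();
        String execute(String input);
    }

    // Hypothetical adapter: any skill (a function that internally runs
    // its own agent loop) becomes a tool a parent agent can register.
    static SimpleTool wrapSkill(String name, Function<String, String> skill) {
        return new SimpleTool() {
            public String name() { return name; }
            public String execute(String input) { return skill.apply(input); }
        };
    }

    public static void main(String[] args) {
        // A real skill would call reviewSkill::execute; we fake the
        // agent loop with a trivial transform for the demo.
        SimpleTool reviewAsTool =
            wrapSkill("codeReview", input -> "REVIEW OF " + input.toUpperCase());

        Map<String, SimpleTool> registry = Map.of(reviewAsTool.name(), reviewAsTool);
        System.out.println(registry.get("codeReview").execute("Agent.java"));
        // prints: REVIEW OF AGENT.JAVA
    }
}
```

The design point: because a skill exposes the same execute-style contract as a tool, composition (skill calls skill) needs no new machinery.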

Example: Code Review Skill

A reusable skill with a focused prompt and read-only tools

var reviewSkill = new Skill(
    "codeReview",
    """
    You are a code reviewer. Analyze
    the given code for:
    1. Security vulnerabilities
    2. Performance issues
    3. Readability problems
    4. Potential bugs
    Return a structured report with
    severity: critical / major / minor.
    """,
    List.of(
        new ReadFileTool(),
        new CountLinesTool()
    )
);

String report = reviewSkill.execute(
    "Review src/agent/Agent.java");
System.out.println(report);

Why this works

  • Focused prompt — exactly what to look for
  • Read-only tools — no write access
  • Structured output — severity levels
  • Reusable — call on any file, any project

Safety: a review skill should NEVER have write tools. Read-only only.
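A minimal read-only file helper such a skill could build on. Simplified to a plain `String` path (the real `ReadFileTool` would unpack a `JSONObject` arg); the size cap and the error-as-string return are design choices of this sketch, not shown on the slides.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFileHelper {
    static final long MAX_BYTES = 64 * 1024;   // keep tool output LLM-sized

    static String readFile(String path) {
        try {
            Path p = Path.of(path);
            if (Files.size(p) > MAX_BYTES) {
                return "ERROR: file larger than " + MAX_BYTES + " bytes";
            }
            return Files.readString(p);
        } catch (IOException e) {
            // Return errors as text, not exceptions: the agent loop
            // feeds tool output back to the LLM, which can then react.
            return "ERROR: " + e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.writeString(tmp, "hello agent");
        System.out.println(readFile(tmp.toString()));   // prints: hello agent
    }
}
```

Note there is no write method anywhere: the safety rule above is enforced by the tool surface, not by the prompt.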

Skill Ideas for Your Projects

Four ready-to-build skills

Code Review

Reads Java files, analyzes for bugs, security, and style. Outputs a structured report with severity levels.

Tools: ReadFile, CountLines

Test Generator

Reads a Java class, generates JUnit test cases with edge cases and assertions. Outputs test file content.

Tools: ReadFile, ListMethods

Documentation

Reads code and comments, generates Javadoc or README sections. Outputs formatted markdown.

Tools: ReadFile, ReadDirectory

Data Analyzer

Reads CSV/JSON data, computes statistics, generates a human-readable summary with key insights.

Tools: ReadFile, Calculate

Skills in Claude Code

The same concept, productized — SKILL.md format

# ~/.claude/skills/review/SKILL.md
---
name: review
description: Review code for bugs and style
user-invocable: true
allowed-tools: Read, Grep
argument-hint: [file-path]
---

Review the following file for:
1. Security vulnerabilities
2. Performance issues
3. Readability problems
4. Potential bugs

File to review: $ARGUMENTS

Return a structured report with
severity levels: critical / major / minor.
Cite specific line numbers.

Anatomy of a SKILL.md

  • Frontmatter (YAML between ---) — metadata
  • name — becomes the /review command
  • description — when to auto-activate
  • allowed-tools — restricts what Claude can do
  • $ARGUMENTS — user input after the command
  • Body — the system prompt / instructions

Usage

/review src/Agent.java

Same pattern: your Java Skill class = Claude Code SKILL.md. System prompt + tools + input → structured output.

Example: The /craft Skill

Turn the CRAFT methodology into a reusable command

# ~/.claude/skills/craft/SKILL.md
---
name: craft
description: Generate a CRAFT-structured
  prompt for AI-driven development
user-invocable: true
allowed-tools: Read, Grep, Glob
argument-hint: [feature-description]
---

The user wants to build: $ARGUMENTS

Generate a CRAFT prompt by:

1. **Context**: Use Read and Grep to scan
   the project. Identify: framework, lang,
   existing files, conventions, types.

2. **Requirement**: Restate what the user
   wants with precise acceptance criteria.
   Add edge cases and error behavior.

3. **Action**: Specify exact file(s) to
   create or modify, with full paths.

4. **Format**: State the tech constraints
   (language, framework, patterns, types).

5. **Test**: Describe how to verify the
   result: expected behavior, edge cases,
   what must NOT happen.

Output the CRAFT prompt in a code block
ready to copy-paste into an AI assistant.

What /craft does

Instead of writing CRAFT prompts manually, the skill auto-generates them by reading your codebase first.

Example

/craft add user login with email

Claude reads your project, finds the stack, existing auth files, types, then outputs a complete CRAFT prompt.

Why it matters

  • Context is auto-discovered, not typed
  • File paths are real, not guessed
  • Types match your actual codebase
  • Acceptance criteria are specific

Meta: this skill uses AI to write better AI prompts. That’s the power of skills.

Project Ideas for the Challenge

Agent-powered projects — pick one and build it

Code Review Agent

Reads Java files, analyzes quality, generates structured review. Skills: file reading, pattern detection, report generation.

Study Assistant

Reads course notes, answers questions, generates flashcards and quizzes. Skills: document parsing, Q&A, quiz generation.

Data Pipeline Agent

Reads CSV/JSON, cleans data, computes stats, generates summary. Skills: file I/O, data parsing, text-based visualization.

DevOps Helper

Reads log files, diagnoses issues, suggests fixes. Skills: command execution, log analysis, troubleshooting.

Multi-API Aggregator

Queries 2+ public APIs (weather, news, stocks), combines data, generates a briefing. Skills: HTTP fetch, data merge, summary.

Your Own Agent!

Any agent with: 2+ custom tools, 1+ reusable skill, conversation memory, and error handling. Be creative.

Challenge Requirements

What you must deliver

Required

  • Complete vision.md with agent architecture
  • Working agent loop (observe-think-act)
  • 2+ custom tools implemented
  • 1+ reusable skill
  • 1 unit test (tool) + 1 integration test (agent)
  • GitHub repo with clean commits
  • Cross code review completed

Bonus

  • 3+ tools
  • Multi-turn conversation memory
  • Skill composition (skill calls skill)
  • Error recovery in agent loop
  • README with architecture diagram
  • Interactive CLI interface
  • Tool output formatting

Grading Rubric

How your project will be graded

Criterion           Points   Details
AIDD Methodology    /15      vision.md, GitHub issues, clean commits, workflow followed
Agent Core          /25      Working agent loop, LLM API integration, message history, proper exit
Tools & Skills      /25      2+ working tools, 1+ skill, clean interfaces, error handling
Code Quality        /15      Clean Java, proper OOP (interfaces, encapsulation), no dead code
Tests               /10      1 unit test (tool) + 1 integration test (agent loop), all passing
Demo                /10      Clarity, live demo, technical depth, honest AI assessment

Total: /100. Agent quality and skill design count 50%. A well-designed agent with 2 solid tools beats a messy agent with 5 broken ones.

Cross Code Review

Learn by evaluating others’ agent code

  1. Feature Freeze — stop coding, push everything
  2. Repo Exchange — get your partner's GitHub link
  3. Review Agent — run their agent, read tools & skills
  4. Open Issues — 1 positive + 2 suggestions
  5. Discuss — 5 min in pairs; explain the why

Look for: clean interfaces, good tool design, working agent loop
Flag: missing error handling, hardcoded prompts, no separation of concerns

Prepare Your Demo

5 minutes to showcase your agent

Structure (5 min)

  1. 30s — What problem your agent solves
  2. 2min — Live demo: run the agent, show tool calls
  3. 1min — Architecture: agent loop, tool design, skills
  4. 1min — What the AI did well / poorly
  5. 30s — What you would improve with more time

Tips

  • Prepare a scripted prompt that triggers tool calls
  • Have a backup recording in case the API is slow
  • Show the agent loop in action (print tool calls)
  • Show passing tests in terminal
  • Be honest about limitations

What You Take Away

Skills acquired in this session

Agent Architecture

  • LLM core + tools + memory + loop
  • Observe → Think → Act cycle
  • Vanilla Java HttpClient + org.json
  • OpenAI function calling

Skills & Tools

  • Interface + registry pattern
  • Single-responsibility tools
  • Composable skills
  • Code review as example

Critical Thinking

  • What happens inside agent frameworks
  • When agents help vs overcomplicate
  • Responsibility for agent actions
  • Vanilla code → framework fluency

You now understand what happens inside every AI agent framework. The patterns are universal — Java, Python, TypeScript, the loop is the same.

Ready to Build Your Agent?

3 hours to build an autonomous LLM agent

Timeline

  • 0:00–0:20 — Setup + first API call
  • 0:20–1:00 — Build agent core
  • 1:00–1:30 — Implement 2+ tools
  • 1:30–1:45 — Build a skill
  • 1:45–2:30 — Personal project
  • 2:30–2:45 — Tests + cross review
  • 2:45–3:00 — Demos

Requirements Reminder

  • Working agent loop
  • 2+ custom tools
  • 1+ reusable skill
  • 1 unit + 1 integration test
  • Clean commits on GitHub
  • Cross code review done

Head to the lab → practical-work-2-personal-challenge.html
