EngineeringSep 30, 2025Updated Jun 25, 20269 min read

How AI agents actually work

An AI agent is an LLM call, a list of tools, and a loop. A walkthrough of a working coding agent in a little over 200 lines of Python, no framework required.

Dave EbbelaarSenior AI Engineer

TL;DR

AI agents work through one mechanism. A loop around an LLM call. Your code sends the model a conversation history plus a list of tools, where each tool is just a name, a description, and an input schema. The model either answers in text or requests a tool. When it requests one, your code executes the matching Python function, appends the result to the message list, and calls the model again. That repeats until the model answers without asking for a tool. A working coding agent needs only three tools (read a file, list files, edit a file) and fits in a little over 200 lines of Python with no framework. Answering one question that needs a tool takes two LLM calls, one where the model decides to use the tool, and one where it turns the result into an answer. The same loop powers Cursor, Claude Code, and almost every agentic application, whether the model behind it comes from Anthropic, OpenAI, or Google.

You can build a working AI coding agent in a little over 200 lines of Python. No framework, no orchestration layer, one main.py file. It lists the files in a directory, reads them, and edits them on instruction. I asked it to add a print statement to a file and it did.

That small demo carries the answer to how AI agents work. An agent is a connection with a large language model where you give it a list of tools, the model decides when to use one, and your code executes the tool and feeds the result back. Cursor, Claude Code, and Copilot are this loop with more tools and a much bigger system prompt.

If you've played with agent frameworks but want to understand what they do behind the scenes, this walkthrough is for you. It follows my video on how AI agents actually work and a repo you can clone and run script by script. Credit first. The original repository comes from Francis, a member of our Data Freelancer community, and I forked his work into a runbook of seven scripts that build the agent step by step.

An agent is an LLM, a list of tools, and a loop

All an agent really is, is an LLM call where the model gets a list of tools, decides whether it needs one, and hands the decision back to your code for execution. That cycle is the difference between an agent and a plain call to a large language model.

The agent class that carries it needs exactly three attributes. A client, the Anthropic Python SDK wrapper you use to talk to the model. A messages list, which holds the conversation as dictionaries with a role and content, the same format OpenAI and Anthropic both use. And a tools list, which holds the tool definitions.

That's the whole state. No vector database, no planner, no graph.

Function calling is the one prerequisite worth learning before any of this, because tools are what make this an agent rather than a chat completion. Everything below builds on that principle.

Tools are specifications, not functions

A tool is an object with a name, a description, and an input schema. The definition exists so the model can answer one question. What actions can I take, and what parameters do they need?

In the repo, each tool is a Pydantic BaseModel with those three fields, matching the data types the models expect. If you're building AI agents with Python, Pydantic is a must-have library, and the rest of the walkthrough assumes you know its basics.

A minimal coding agent needs three tools:

read_file takes a file path and returns the contents.
list_files takes a path and lists the files and directories inside it.
edit_file takes a path, an old_text, and a new_text, and creates the file if it doesn't exist.

Notice how small the input schemas stay. The LLM only needs to figure out where it is and what the path should be. Python handles the rest.

At this stage nothing executes. Defining tools and implementing them are separate steps, and keeping them separate is what makes the whole thing legible. When the third script runs, it prints "agent initialized with three tools" by taking the length of the tools list. Specification done.

Python does the heavy lifting

The implementations behind those three specifications are standard library code you've probably written before. This is the shape of them:

Python

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()
 
def list_files(path: str) -> list[str]:
    items = []
    for item in sorted(os.listdir(path)):
        full = os.path.join(path, item)
        items.append(f"{item}/" if os.path.isdir(full) else item)
    return items
 
def edit_file(path: str, old_text: str, new_text: str) -> None:
    content = ""
    if os.path.exists(path):
        with open(path) as f:
            content = f.read()
    with open(path, "w") as f:
        f.write(content.replace(old_text, new_text))

That edit function deserves a second look. It reads the file, runs Python's built-in content.replace() on the old text, and writes the result back. Pass an empty old_text and you've created a new file. When I tested it with "hello world" and then asked it to swap that for "hello world, how are you?", the file changed in place.

Inside the agent class, the same functions get try/except wrapped around them so a failed call returns an exception message instead of crashing the application. The agent reads the error and can course-correct from there.

There's one more piece, an execute_tool function that takes the model's output, loops through it, and checks which tool the model picked. Did it decide to use read_file? Go execute it. Plain if/else, nothing clever.

And this is already pretty much what a coding agent does. If you've worked with Cursor, Copilot, or Claude Code, this is the move. Go through the files, read everything, take the developer's instruction, and replace old text with new text.

The chat loop is where it becomes an agent

Everything up to this point runs without AI. The fourth script calls agent.list_files() directly as a plain method, which only tests that the function works; the model isn't involved yet. The chat method is where it takes over the decisions.

The flow starts simple. Take the user's input and append it to the messages list as a dictionary with role "user". Convert the Pydantic tool definitions to the exact format the Anthropic SDK wants. Send messages and tools to Claude Sonnet 4.5. Then parse what comes back, because the response can contain two kinds of content:

Python

for block in response.content:
    if block.type == "text":
        # the model answered directly, collect the text
    elif block.type == "tool_use":
        # the model wants an action: append the tool_use block
        # (id, name, input), execute the tool, append the
        # result to messages, and call the model again

A text block means the model decided to just reply. Ask "what can you help me with?" and it describes its file operations without touching a tool. A tool_use block means the model decided it needs to do something first, like reading a file, listing a directory, or making an edit. Your code appends the tool_use content in Anthropic's required format, executes the function, appends the result, and sends the whole conversation back to the model for another pass.

Run it with "what files are in the current directory?" and the print statements show the sequence. First the model calls list_files, then it responds. Behind the scenes that's two LLM calls. The first one decides "I need a tool, I need more information." The second one has the file list in context and writes the answer.

I fully understand this feels abstract the first time. There's a lot of if/else going on. Go through it a couple of times, and use AI to fill in the gaps. Paste a piece of the code and ask what it's doing and why. But this is the main agent loop powering almost all agentic AI applications today. Anthropic, Google, OpenAI, they all run on this principle of function calling and tools.

Stuck building prototypes?

Get our production stack

What you're missing is the architecture, evaluation, and judgment to ship a real project.

See Curriculum

A while loop and a system prompt make it a product

The sixth script wraps the chat method in an interactive CLI. A while True loop captures user input, breaks when you type exit or quit, and prints the assistant's response each turn. Same loop shape as the agent itself, one level up. With that running, you have an AI coding assistant living in your terminal, similar to how you work with Claude Code.

Script seven adds a personality. The agent gets a system prompt telling it that it operates in a terminal environment and should output plain text. My first version still produced asterisks everywhere, so I added one line telling it not to use any asterisk characters in its response. Ran it again, clean output. That tweak cycle, edit the prompt, run, inspect, is also how you debug an agent.

The system prompt is also where the real products differentiate. The system-prompts-and-models-of-ai-tools repository on GitHub collects the prompts behind popular tools, and Cursor's agent prompt is extensive. Our agent has three tools and a couple of instruction lines. Cursor has the same loop with far more machinery specified on top. This is where Cursor starts, and where Claude Code starts. You can keep adding functionality from here.

The final main.py adds logging, writing every step to an agent.log file. Then came the test run. I asked it to create a short test.py to prove it can create files, asked it to add another function (it wrote calculate_square and wired it into main), and finally asked it to empty the file. It did all three.

A small thing I love about working this way is that typos don't matter. The model looks at tokens. You can mangle the instruction and it still understands.

Run it yourself

Clone the repo and work through the runbook in order. Each script is the previous one plus one new concept:

Script	What it adds
01 basic	Confirms the script runs and the API key loads
02 agent class	Client, messages list, tools list
03 define tools	The three tool specifications in Pydantic
04 implement tools	The Python functions plus execute_tool
05 chat method	The agent loop
06 interactive CLI	while True input loop with exit and quit
07 personality	The system prompt

You run everything with uv, a fast Python package and project manager that can replace pip. Each file carries inline dependency metadata, a uv feature that runs scripts with dependencies like anthropic and pydantic without setting up a virtual environment. You also need an API key from console.anthropic.com. I used Claude Sonnet 4.5, which had been released the same day I recorded the video, and it handled every test.

Once the loop clicks, frameworks stop being magic. You can look at any agent library and see the messages list, the tool registry, and the execution loop underneath the abstractions. That mental model is the foundation for the engineering work in how to build production AI systems, and the natural next step is learning how to build reliable AI agents, where the question shifts from how the loop works to where it belongs in a real system.

FAQ

Do I need a framework like LangChain to build an AI agent?

No. The working agent here is a little over 200 lines of plain Python using the Anthropic SDK and Pydantic. Frameworks wrap this same loop in abstractions, which can help later, but building it once without one teaches you what every framework is actually doing.

What is the agent loop?

The agent loop is the cycle where your code sends the conversation and tool list to the model, the model either answers in text or requests a tool, your code executes the requested tool and appends the result, and the model gets called again. The loop ends when the model replies with text instead of a tool request.

What is the difference between an AI agent and an LLM call?

Tools and execution. A plain LLM call takes a prompt and returns text. An agent gets a list of tools it can decide to use, and your code executes those decisions and feeds the results back, which lets the model act on files, APIs, and systems instead of only generating words.

How many LLM calls does an agent need to answer one question?

At least two when a tool is involved. The first call is the model deciding it needs more information and picking a tool. The second call happens after your code executes the tool, when the model turns the result into an answer. Questions that need no tool resolve in one call.

What does a tool definition look like to the model?

A name, a description, and an input schema. That's the full contract. The model never executes anything itself; it fills in the parameters and your application runs the matching function.