Workflows9 min read
How AI agents actually work
An AI agent is an LLM call, a list of tools, and a loop. A walkthrough of a working coding agent in a little over 200 lines of Python, no framework required.
You can build a working AI coding agent in a little over 200 lines of Python. No framework, no orchestration layer, one main.py file. It lists the files in a directory, reads them, and edits them on instruction. I asked it to add a print statement to a file and it did.
That small demo carries the answer to how AI agents work. An agent is a connection with a large language model where you give it a list of tools, the model decides when to use one, and your code executes the tool and feeds the result back. Cursor, Claude Code, and Copilot are this loop with more tools and a much bigger system prompt.
If you've played with agent frameworks but want to understand what they do behind the scenes, this walkthrough is for you. It follows my video on how AI agents actually work and a repo you can clone and run script by script. Credit first. The original repository comes from Francis, a member of our Data Freelancer community, and I forked his work into a runbook of seven scripts that build the agent step by step.
An agent is an LLM, a list of tools, and a loop
All an agent really is, is an LLM call where the model gets a list of tools, decides whether it needs one, and hands the decision back to your code for execution. That cycle is the difference between an agent and a plain call to a large language model.
The agent class that carries it needs exactly three attributes. A client, the Anthropic Python SDK wrapper you use to talk to the model. A messages list, which holds the conversation as dictionaries with a role and content, the same format OpenAI and Anthropic both use. And a tools list, which holds the tool definitions.
That's the whole state. No vector database, no planner, no graph.
Function calling is the one prerequisite worth learning before any of this, because tools are what make this an agent rather than a chat completion. Everything below builds on that principle.
Tools are specifications, not functions
A tool is an object with a name, a description, and an input schema. The definition exists so the model can answer one question. What actions can I take, and what parameters do they need?
In the repo, each tool is a Pydantic BaseModel with those three fields, matching the data types the models expect. If you're building AI agents with Python, Pydantic is a must-have library, and the rest of the walkthrough assumes you know its basics.
A minimal coding agent needs three tools:
read_filetakes a file path and returns the contents.list_filestakes a path and lists the files and directories inside it.edit_filetakes a path, anold_text, and anew_text, and creates the file if it doesn't exist.
Notice how small the input schemas stay. The LLM only needs to figure out where it is and what the path should be. Python handles the rest.
At this stage nothing executes. Defining tools and implementing them are separate steps, and keeping them separate is what makes the whole thing legible. When the third script runs, it prints "agent initialized with three tools" by taking the length of the tools list. Specification done.
Python does the heavy lifting
The implementations behind those three specifications are standard library code you've probably written before. This is the shape of them:
def read_file(path: str) -> str:
with open(path) as f:
return f.read()
def list_files(path: str) -> list[str]:
items = []
for item in sorted(os.listdir(path)):
full = os.path.join(path, item)
items.append(f"{item}/" if os.path.isdir(full) else item)
return items
def edit_file(path: str, old_text: str, new_text: str) -> None:
content = ""
if os.path.exists(path):
with open(path) as f:
content = f.read()
with open(path, "w") as f:
f.write(content.replace(old_text, new_text))That edit function deserves a second look. It reads the file, runs Python's built-in content.replace() on the old text, and writes the result back. Pass an empty old_text and you've created a new file. When I tested it with "hello world" and then asked it to swap that for "hello world, how are you?", the file changed in place.
Inside the agent class, the same functions get try/except wrapped around them so a failed call returns an exception message instead of crashing the application. The agent reads the error and can course-correct from there.
There's one more piece, an execute_tool function that takes the model's output, loops through it, and checks which tool the model picked. Did it decide to use read_file? Go execute it. Plain if/else, nothing clever.
And this is already pretty much what a coding agent does. If you've worked with Cursor, Copilot, or Claude Code, this is the move. Go through the files, read everything, take the developer's instruction, and replace old text with new text.
The chat loop is where it becomes an agent
Everything up to this point runs without AI. The fourth script calls agent.list_files() directly as a plain method, which only tests that the function works; the model isn't involved yet. The chat method is where it takes over the decisions.
The flow starts simple. Take the user's input and append it to the messages list as a dictionary with role "user". Convert the Pydantic tool definitions to the exact format the Anthropic SDK wants. Send messages and tools to Claude Sonnet 4.5. Then parse what comes back, because the response can contain two kinds of content:
for block in response.content:
if block.type == "text":
# the model answered directly, collect the text
elif block.type == "tool_use":
# the model wants an action: append the tool_use block
# (id, name, input), execute the tool, append the
# result to messages, and call the model againA text block means the model decided to just reply. Ask "what can you help me with?" and it describes its file operations without touching a tool. A tool_use block means the model decided it needs to do something first, like reading a file, listing a directory, or making an edit. Your code appends the tool_use content in Anthropic's required format, executes the function, appends the result, and sends the whole conversation back to the model for another pass.
Run it with "what files are in the current directory?" and the print statements show the sequence. First the model calls list_files, then it responds. Behind the scenes that's two LLM calls. The first one decides "I need a tool, I need more information." The second one has the file list in context and writes the answer.
I fully understand this feels abstract the first time. There's a lot of if/else going on. Go through it a couple of times, and use AI to fill in the gaps. Paste a piece of the code and ask what it's doing and why. But this is the main agent loop powering almost all agentic AI applications today. Anthropic, Google, OpenAI, they all run on this principle of function calling and tools.
GenAI Accelerator
The gap between a demo and production
Anyone can wire up an LLM call. The real skill is designing, evaluating, and shipping systems that hold up.
A while loop and a system prompt make it a product
The sixth script wraps the chat method in an interactive CLI. A while True loop captures user input, breaks when you type exit or quit, and prints the assistant's response each turn. Same loop shape as the agent itself, one level up. With that running, you have an AI coding assistant living in your terminal, similar to how you work with Claude Code.
Script seven adds a personality. The agent gets a system prompt telling it that it operates in a terminal environment and should output plain text. My first version still produced asterisks everywhere, so I added one line telling it not to use any asterisk characters in its response. Ran it again, clean output. That tweak cycle, edit the prompt, run, inspect, is also how you debug an agent.
The system prompt is also where the real products differentiate. The system-prompts-and-models-of-ai-tools repository on GitHub collects the prompts behind popular tools, and Cursor's agent prompt is extensive. Our agent has three tools and a couple of instruction lines. Cursor has the same loop with far more machinery specified on top. This is where Cursor starts, and where Claude Code starts. You can keep adding functionality from here.
The final main.py adds logging, writing every step to an agent.log file. Then came the test run. I asked it to create a short test.py to prove it can create files, asked it to add another function (it wrote calculate_square and wired it into main), and finally asked it to empty the file. It did all three.
A small thing I love about working this way is that typos don't matter. The model looks at tokens. You can mangle the instruction and it still understands.
Run it yourself
Clone the repo and work through the runbook in order. Each script is the previous one plus one new concept:
| Script | What it adds |
|---|---|
| 01 basic | Confirms the script runs and the API key loads |
| 02 agent class | Client, messages list, tools list |
| 03 define tools | The three tool specifications in Pydantic |
| 04 implement tools | The Python functions plus execute_tool |
| 05 chat method | The agent loop |
| 06 interactive CLI | while True input loop with exit and quit |
| 07 personality | The system prompt |
You run everything with uv, a fast Python package and project manager that can replace pip. Each file carries inline dependency metadata, a uv feature that runs scripts with dependencies like anthropic and pydantic without setting up a virtual environment. You also need an API key from console.anthropic.com. I used Claude Sonnet 4.5, which had been released the same day I recorded the video, and it handled every test.
Once the loop clicks, frameworks stop being magic. You can look at any agent library and see the messages list, the tool registry, and the execution loop underneath the abstractions. That mental model is the foundation for the engineering work in how to build production AI systems, and the natural next step is learning how to build reliable AI agents, where the question shifts from how the loop works to where it belongs in a real system.
FAQ
Do I need a framework like LangChain to build an AI agent?
No. The working agent here is a little over 200 lines of plain Python using the Anthropic SDK and Pydantic. Frameworks wrap this same loop in abstractions, which can help later, but building it once without one teaches you what every framework is actually doing.
What is the agent loop?
The agent loop is the cycle where your code sends the conversation and tool list to the model, the model either answers in text or requests a tool, your code executes the requested tool and appends the result, and the model gets called again. The loop ends when the model replies with text instead of a tool request.
What is the difference between an AI agent and an LLM call?
Tools and execution. A plain LLM call takes a prompt and returns text. An agent gets a list of tools it can decide to use, and your code executes those decisions and feeds the results back, which lets the model act on files, APIs, and systems instead of only generating words.
How many LLM calls does an agent need to answer one question?
At least two when a tool is involved. The first call is the model deciding it needs more information and picking a tool. The second call happens after your code executes the tool, when the model turns the result into an answer. Questions that need no tool resolve in one call.
What does a tool definition look like to the model?
A name, a description, and an input schema. That's the full contract. The model never executes anything itself; it fills in the parameters and your application runs the matching function.
