Workflows · 12 min read

How I think about a personal AI operating system

Why I'm building my personal AI operating system from first principles instead of cloning agent harnesses like OpenClaw, with three layers, a boring backend, and a context hub.

The whole developer community has been raving about AI agent harnesses these past weeks. OpenClaw, Hermes Agent, Nano Claw, Zero Claw, all the claws that are out there right now. And I get it. This is one of the most exciting things we've seen in AI since the introduction of LLMs.

But one thing concerns me. Everyone seems to be jumping on these out-of-the-box GitHub repositories. Clone the repo, plug in your API keys, give it access to your WhatsApp and your credit card, and go. We've already seen countless reports of leaked API keys and leaked data, which is exactly what you'd expect when you give a powerful system access to a whole bunch of your personal information.

So in this post I want to share a conceptual blueprint of how I'm building my own AI operating system for my company Datalumina from the ground up. Not a complete tutorial, that may come later. A way of thinking about the architecture, so you can start building a personal AI operating system that fits you.

Experiment with OpenClaw and Hermes, then build your own

My take on this is pretty similar to what I said when large language models first came out. Back then everyone jumped on frameworks like LangChain and CrewAI. Great to start with. But new technologies attract everyone at once, and the projects bloat. They blow up. You end up with layers of abstraction you never asked for.

The same thing is happening with agent harnesses right now, and the open source dynamic makes it inevitable. These tools need to work for a whole lot of people. So instead of just the WhatsApp integration you need, you get ten other connectors you'll never touch. Every integration, every config option, every abstraction is in there because someone else needed it. You carry all of it, including the attack surface.

That doesn't mean you should ignore these tools. The opposite. I recommend that every developer pull up the OpenClaw and Hermes Agent repositories and experiment with them, because there are real ideas in there. What I do is look at the repositories, try to understand what I can learn from them, and reverse engineer the parts that fit my specific needs. The soul.md idea in my own system came straight from OpenClaw, more on that below.

Building it yourself makes you a better engineer. You also end up with a codebase you can actually maintain, understand, and tailor to your own use cases.

What an AI operating system actually is

This week I saw a presentation by Jensen Huang, CEO of Nvidia, where he showed a slide on what an AI platform or operating system really is. It starts with multimodal input. Text, voice, images, pretty much anything you throw at it. It has short-term and long-term memory. There are the models, the actual LLMs driving the system. Those models can use sub-agents, which nowadays are mostly skills in the form of markdown files. The agents can use tools, CLI commands, and MCP servers to integrate with external services. They get some form of computer use, so a browser and terminal commands. And they work with files, both structured data in a database and unstructured data on a file system the agent can crawl through and search.

That's the conceptual picture. The question nobody really has the answer to right now is how to set this all up. Where does it live? How do you run it in a scalable and secure way?

Think about a traditional operating system, the one you're on right now. Whenever you want new capabilities, you install an app or reuse a tool you already have for a new use case. An AI operating system should work the same way. Install new capabilities. Don't rebuild the entire system every time, and don't spin up another GitHub repository, deployment, and configuration for every new workflow.

The three layers: triggers, schedules, and agents

Beyond the core building blocks, there are three architectural layers to take into account.

| Layer | Trigger | Example |
| --- | --- | --- |
| Trigger-based actions | External event | An email comes in, a new subscriber joins your list |
| Scheduled workflows | Time | Every Tuesday at 9:00 a.m., run a competitor analysis |
| User-invoked agents | You | A WhatsApp message, a Slack command, a Claude Code session |

Layer one is the if-X-happens-then-do-Y pattern. Webhooks, API endpoints, event-driven processing.

Layer two is your cron jobs and recurring tasks. Every Tuesday at 9:00 a.m. I want a competitor analysis, so a series of agents does research, creates a report, and reports back to me.

Layer three is the agent layer, user-invoked and typically via chat. You pick up your phone, send a message through whatever integration you've set up (Claude Code, WhatsApp, Slack, Telegram), and that triggers an agent. Instead of a direct if-this-then-that, the agent dynamically decides what to do and maybe asks follow-up questions about what information it needs to pull in.

Layers one and two are backend processes. They run around the clock. When you're on the beach, they keep running. The agent layer is where you automate personal tasks, stuff you'd normally do yourself, now done through an agent in a loop with you. You give input, the agent comes back, you give feedback.

This split matters for reliability. Business automations and processes that need to run in the background belong in layers one and two, because they work without human intervention. Don't push everything through the agent layer.

The beauty of bringing this together in one operating system is that the layers work together. Your agents can trigger the webhooks and run the cron jobs, then use the results in their own context, because everything is built on the same infrastructure.
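To make the shared-infrastructure point concrete, here is a minimal sketch in plain Python. It assumes a single task registry that all three layers call into. The names `TASKS`, `task`, `on_webhook`, `on_schedule`, and `agent_tool_call` are hypothetical and only illustrate the shape, not how any particular harness does it.

```python
from typing import Callable, Dict

# Hypothetical registry: the one place where a capability gets "installed".
TASKS: Dict[str, Callable[..., str]] = {}


def task(name: str):
    """Register a function under a name so any layer can call it."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        TASKS[name] = func
        return func
    return decorator


@task("competitor_analysis")
def competitor_analysis(company: str) -> str:
    # Placeholder for the real research pipeline.
    return f"report for {company}"


def on_webhook(event: dict) -> str:
    """Layer one: an external event decides which task runs."""
    return TASKS[event["task"]](**event["args"])


def on_schedule(task_name: str, **args) -> str:
    """Layer two: the clock decides which task runs."""
    return TASKS[task_name](**args)


def agent_tool_call(task_name: str, **args) -> str:
    """Layer three: an agent picks the task as a tool and reads the result."""
    result = TASKS[task_name](**args)
    return f"tool result: {result}"
```

Adding a capability then means registering one more function, and every layer can reach it without a new repository or deployment.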

Where the system lives

The big question before any of the layers is where to put this. Do you buy a Mac Mini? Put it on a server? Run it locally? It can all work. It depends.

I decided to put everything on a server in the cloud, managed through Docker Compose. It's a Python-based backend. FastAPI creates the endpoints, and Caddy acts as the reverse proxy and handles HTTPS for incoming requests. Every time I integrate with a new system (email, a booked meeting, a published YouTube video), events come in via webhooks, hit an API endpoint, land in a Redis queue, and get executed by Celery workers. Celery Beat handles the scheduled jobs. Postgres stores the events and metadata.

The pattern inside every webhook handler stays the same. Verify, persist, dispatch. Verify the signature so other people can't send data into your system. Store the event in the database so you never lose data. Then dispatch the processing to a background worker. I cover this layer in depth in how to build webhooks for your AI platform.
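A rough sketch of verify, persist, dispatch using only the standard library. The stand-ins are assumptions made so the example runs anywhere: `sqlite3` in place of Postgres, `queue.Queue` in place of the Redis queue and Celery workers, and an HMAC-SHA256 signature with a shared `SECRET`. Real providers each define their own signature scheme, so check their docs before copying this.

```python
import hashlib
import hmac
import queue
import sqlite3

SECRET = b"replace-me"  # assumption: a shared secret agreed with the sender

db = sqlite3.connect(":memory:")  # stand-in for Postgres
db.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, source TEXT, payload TEXT)"
)
jobs = queue.Queue()  # stand-in for the Redis queue feeding the workers


def sign(body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def handle_webhook(source: str, body: bytes, signature: str) -> int:
    # 1. Verify: reject anything not signed with the shared secret.
    expected = sign(body).encode("utf-8")
    if not hmac.compare_digest(expected, signature.encode("utf-8")):
        raise PermissionError("invalid signature")

    # 2. Persist: store the raw event before doing any work on it.
    cur = db.execute(
        "INSERT INTO events (source, payload) VALUES (?, ?)",
        (source, body.decode("utf-8")),
    )
    db.commit()

    # 3. Dispatch: hand only the event id to a background worker.
    jobs.put(cur.lastrowid)
    return cur.lastrowid
```

Because the event is stored before it is queued, a crashed worker can be retried from the database instead of losing the payload.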

The language and tools don't matter that much. You can build this in Go, in TypeScript, in whatever you're comfortable with. What matters is that you understand the architectural layers you need to orchestrate.

Right now this is a single repository on a single deployment. I don't need microservices. I want a simple system where I understand every building block and know exactly what to do when I want to add a capability. There's a CI/CD pipeline through GitHub Actions. I push to main, it auto-deploys to the server, and I get a Slack notification saying we're good to go again.

Monitoring runs through Sentry and Grafana, and that part is critical. When you're rapidly expanding a system like this and using AI to build it, you don't read all of your code anymore. We're past that point. Errors will happen. Sentry catches them, and Grafana shows me whether the server is still healthy.

The agent layer in practice

The agent layer is what tools like OpenClaw made popular, a simple interface on your phone through which you can set powerful AI agents to work.

The flow runs on the same infrastructure as layer one. WhatsApp receives a message, sends a webhook to your API endpoint, you verify it, persist it, and dispatch it to a worker. But instead of a predefined series of functions and classes, the processing step is an AI agent. When the agent is done, it replies back through the same channel and the loop closes.

There's a spectrum for how heavy that agent should be. On the light end, a single LLM API call with a few tools, set up with something like Pydantic AI. On the heavy end, the Claude Agent SDK spawning a full Claude Code subprocess in the cloud, which can do anything you can do with Claude Code at your own computer. That's the heaviest agent you can run right now.

One idea I took directly from OpenClaw is the soul.md file. It's a deep file describing who you are at the core. It covers your values, your mission, your goals, how you want to show up, why you're doing things. Start the system prompt with that soul, follow with the task instructions, and give it to a heavy model like Opus 4.6. Especially when you're chatting with it, the responses get noticeably more creative and more personal, depending on how deep you want to go.
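Putting the soul first and the task second comes down to string assembly. A minimal sketch, where the function name `build_system_prompt` and the `---` separator are illustrative assumptions and not something prescribed by OpenClaw:

```python
from pathlib import Path


def build_system_prompt(soul_path: Path, task_instructions: str) -> str:
    """Return a system prompt with the soul first, then the task instructions."""
    soul = soul_path.read_text(encoding="utf-8").strip()
    # The separator is an arbitrary choice to keep the two parts visually distinct.
    return f"{soul}\n\n---\n\n{task_instructions.strip()}"
```

The result would be passed as the system prompt of whichever model client you use. The same soul.md can then be shared across every agent, with only the task instructions changing per workflow.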

My WhatsApp agent is deliberately lean. It can do web search, so I can say "do some research on X" from my phone. It has a tool to save content ideas, so when a YouTube video idea hits me, I talk to WhatsApp and the idea lands on my file system, ready to pick up when I'm back at my computer. And it has the most powerful tool, delegate task, which spawns that full Claude Code subprocess in the cloud.

That last one gets pricey. Tell Opus 4.6 to go do research and it might run for 10 minutes and cost five bucks. For some cases that's totally fine, but just from quick experiments I hit $50 in API costs. That's why every delegated run gets max turns and a max budget, and why I keep deciding per workflow whether it needs the full Claude Code runtime or whether Pydantic AI with an LLM and some tools is enough. Quick responses? Sonnet with basic tools works. Heavy-duty research? Opus in a full Claude Code runtime, with the costs monitored. Choosing the right level of autonomy per task is the actual engineering decision, and it's the same trade-off I describe in how to build reliable AI agents.
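The max-turns and max-budget guard can be sketched independently of any agent SDK. In this hypothetical loop, `step` stands in for one agent turn and returns whether the task is done plus what that turn cost. The names `run_with_limits` and `RunResult` are assumptions for illustration, and real SDKs expose their own limit settings.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RunResult:
    turns: int
    cost_usd: float
    stopped_because: str  # "done", "max_turns", or "max_budget"


def run_with_limits(
    step: Callable[[int], Tuple[bool, float]],
    max_turns: int,
    max_budget_usd: float,
) -> RunResult:
    """Call step(turn) until it reports done or a limit is reached.

    The budget is checked after each turn, so the final cost can exceed
    the budget by at most the cost of one turn.
    """
    cost = 0.0
    for turn in range(1, max_turns + 1):
        done, turn_cost = step(turn)
        cost += turn_cost
        if done:
            return RunResult(turn, cost, "done")
        if cost >= max_budget_usd:
            return RunResult(turn, cost, "max_budget")
    return RunResult(max_turns, cost, "max_turns")
```

Recording `stopped_because` alongside the run makes it easy to see later which workflows keep hitting their ceiling and may need a lighter setup or a higher limit.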

The context hub

The backend is half the system. The other half is data and context.

I run a PostgreSQL database for event data and metadata, plus a file system the agents can crawl through. The file system lives in its own GitHub repository, almost everything markdown, version controlled and searchable. It primarily lives on my computer, synced through GitHub, so cloud workflows can pull it into memory while I work on the same files locally with Claude Code.

The top level has folders for identity, inbox, areas, projects, knowledge, and archive. Identity holds who I am. Mission, goals, values, both business and personal. Inbox is the dump zone for ideas and should be close to empty after you've processed things. Areas is growing fastest. Content, products, clients, health, anything I want to build automations around gets a folder. Projects holds active builds and research before they graduate into an area. Knowledge is the knowledge base for research, SOPs, and documents that don't belong to one area. And archive is where old files go, with the agents instructed to stay out, because agents that crawl through stale files bloat their own context window.

One thing I added to this context layer is tiered context loading, an idea I got from a project called Open Viking. Every folder gets an abstract.md, literally one line answering "what is this folder?", and an overview describing the workflows and relationships involved. The full files are the third tier. An agent can scan the entire repository in under 2,000 tokens by reading just the abstracts, then go one level deeper only where it needs to. Without this, agents peek into files, read everything, bloat the context window, and then figure out they didn't need any of it. You can enforce the system through your CLAUDE.md or agents.md rules, and it becomes the agent's navigation for the whole file system.
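As a sketch of how the first two tiers could be read programmatically: the function below collects only the one-line abstracts and skips the archive folder. The file name `overview.md` and the helper names are assumptions, since the post only says each folder has an overview. In practice the agent may simply follow these rules from its instructions instead of calling code.

```python
from pathlib import Path
from typing import Dict

SKIP = {"archive", ".git"}  # folders the agents are told to stay out of


def scan_abstracts(root: Path) -> Dict[str, str]:
    """Tier one: map each folder (relative path) to its one-line abstract."""
    abstracts: Dict[str, str] = {}
    for abstract in sorted(root.rglob("abstract.md")):
        rel = abstract.parent.relative_to(root)
        if any(part in SKIP for part in rel.parts):
            continue
        lines = abstract.read_text(encoding="utf-8").strip().splitlines()
        abstracts[rel.as_posix()] = lines[0] if lines else ""
    return abstracts


def load_overview(root: Path, folder: str) -> str:
    """Tier two: read the overview only for a folder the agent selected."""
    return (root / folder / "overview.md").read_text(encoding="utf-8")
```

The full files stay untouched until the abstract and overview suggest they are relevant, which is what keeps the scan cheap.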

The context hub is also where skills live, the new capabilities you want your agents to have. I'm working on a LinkedIn writer, a YouTube packager, a thumbnail creator, an AI pulse skill that scans what's going on around the topics and people I follow, and a slide creator. The slides in the video this post is based on were created by an agent using that skill. I wrote the story and the narrative, the agent produced the visuals.

How I'd start

Start from first principles. Instead of cloning the next hyped GitHub repository and putting all your data in it, think about what you actually need, what you want to build, and what stack you're comfortable with. Use Claude Code, use AI, but be in the loop and be precise about how you build. Once the foundation is solid, that's when you can go heavy with AI.

Then start with the layer that solves your most immediate problem. Mostly automating personal stuff you do at a computer? Agents with skills are enough, and you may not even need a dedicated deployment. Integrating with a lot of external systems, or automating something for a business, a department, or a team? Look at layers one and two.

Persist everything. These agents produce things, events run, and processes will fail and get stuck. A storage layer that leaves traces, events in my case, is what lets you debug and maintain the system over time.
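One way to leave those traces is to store a status on every event and update it as work progresses, so stuck or failed runs can be queried later. This is a minimal sketch using `sqlite3` as a stand-in for Postgres. The table layout, the status names, and the helper functions are all assumptions for illustration.

```python
import sqlite3
from typing import List, Optional


def open_log() -> sqlite3.Connection:
    """Create an in-memory event log (stand-in for a Postgres table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY, kind TEXT NOT NULL, status TEXT NOT NULL, "
        "updated_at REAL NOT NULL, error TEXT)"
    )
    return conn


def record(conn: sqlite3.Connection, kind: str, now: float) -> int:
    """Store a new event with status 'received' and return its id."""
    cur = conn.execute(
        "INSERT INTO events (kind, status, updated_at) VALUES (?, 'received', ?)",
        (kind, now),
    )
    conn.commit()
    return cur.lastrowid


def mark(conn: sqlite3.Connection, event_id: int, status: str, now: float,
         error: Optional[str] = None) -> None:
    """Move an event to a new status, keeping the error text if there is one."""
    conn.execute(
        "UPDATE events SET status = ?, updated_at = ?, error = ? WHERE id = ?",
        (status, now, error, event_id),
    )
    conn.commit()


def stuck(conn: sqlite3.Connection, now: float, older_than_s: float) -> List[int]:
    """Ids of events that have not finished and were last touched too long ago."""
    rows = conn.execute(
        "SELECT id FROM events WHERE status IN ('received', 'processing') "
        "AND updated_at < ? ORDER BY id",
        (now - older_than_s,),
    )
    return [row[0] for row in rows]
```

A scheduled job could call `stuck` and alert on the result, which turns the storage layer into a basic health check as well as a debugging aid.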

Think long-term. You can set something up in a weekend, but treat it as a foundation to build on, not a finished product. New tools will keep coming out, and you want a foundation that grows with you instead of reinventing the wheel every time.

And context is king. Your AI is only as good as the context it has access to, so the context hub deserves as much attention as the agent runtime.

I also want to be transparent. I'm literally just figuring this out. The webhook automations and scheduled workflows are standard software engineering, the kind of thing I cover in how to build production AI systems. The interconnectivity part, Claude Code agents running with skills and markdown context, is highly experimental, and it's not something we implement for clients yet. In the beginning it will produce suboptimal output. Stay in the loop, improve the skills, and improve the tools they use.

FAQ

Should I use OpenClaw or build my own AI operating system?

Experiment with OpenClaw and Hermes Agent. You'll learn a lot from how they're put together. For a system you trust with your personal data, build your own. Community tools have to serve everyone, so they ship integrations and abstractions you'll never use, and you inherit their attack surface along with them. Reverse engineer the ideas that fit your needs instead.

Is a personal AI operating system production-ready?

Parts of it. Webhooks, schedules, workers, databases, and deployment are standard software engineering. The agent layer with personal context, skills, and broad tool access is experimental, and it's not something I implement for clients yet.

Do I need a cloud server to run this?

Not always. If your workflows are personal and interactive, agents with skills running locally in Claude Code may be enough. Once you need webhooks, scheduled background work, or always-on integrations, a small server with Docker Compose becomes useful.

What is the most important part of an AI operating system?

Context. Your AI is only as good as the context it can reach. A well-structured context hub with tiered loading often matters more than the choice of model or interface.

How do I keep agent costs under control?

Set max turns and a max budget on anything that spawns a full coding-agent runtime. A single Opus research task can run 10 minutes and cost five bucks, and I hit $50 just experimenting. Use a lighter model with a few tools for quick tasks and reserve the full Claude Code runtime for heavy work.

Written by

Dave Ebbelaar

Senior AI Engineer

AI engineer and founder of Datalumina. Dave helps developers build production AI systems and turn technical skills into client work.