How to build your own AI platform

A practical architecture for a personal AI platform with webhooks, scheduled workflows, agents, durable events, and a tiered context hub.

Most AI platform projects start in the wrong place.

Someone finds a new agent harness on GitHub, clones it, plugs in API keys, connects WhatsApp, adds a credit card, and starts handing personal data to a system they barely understand. That is convenient for an afternoon. It is also how you end up with leaked keys, leaked data, and a codebase you cannot maintain once the demo gets serious.

I look at tools like Open Claw differently. I study the repository, learn the useful patterns, then rebuild the parts I actually need. You become a better engineer that way, and you end up with a system you can change without waiting for a framework to catch up.

This article is a practical companion to the broader guide on how to build production AI systems. It is the blueprint behind the AI platform I am building for Datalumina, not a complete tutorial. It covers triggers, schedules, agents, durable events, and a context hub that agents can use without crawling everything blindly.

What an AI platform needs to do

An AI platform, which some people call an AI operating system, is not one chat interface. It is a way to install new capabilities without rebuilding the whole system every time.

The conceptual pieces are familiar now. You have multimodal input such as text, voice, images, and files. You have short-term and long-term memory. You have LLMs that can call tools, sub-agents, CLI commands, MCP servers, browsers, terminals, APIs, databases, and files. You also have unstructured context on a file system, which is very different from traditional application data in a relational database.

The open question is how to run all of that in a way that stays understandable. Where does it live? How do new workflows get added? How do you keep one integration from becoming a separate repository, deployment, and configuration problem?

I think about it as three layers on the same infrastructure.

The three platform layers

Layer one is trigger-based actions. If X happens, do Y. These are webhooks, API endpoints, and event-driven workflows. An email comes in. A new subscriber joins your list. A form is submitted. A meeting is booked. The external system fires an event, and your platform receives it.

Layer two is scheduled workflows. These are cron jobs and recurring tasks. Every Tuesday at 9 a.m., run a competitor analysis. Every morning, generate a report. Every hour, recover stuck events or sync CRM data.

Layer three is the agent layer. This is user-invoked, usually through chat. You send a message through WhatsApp, Slack, Telegram, Claude Code, or another interface. An agent decides what to do, asks follow-up questions when needed, calls tools, and sends a result back.

| Layer | Trigger | Who invokes it | When it runs |
|---|---|---|---|
| Layer 1, triggers | Webhook or API event | External system | When an event fires |
| Layer 2, schedules | Cron or Celery Beat | The scheduler | On a fixed interval |
| Layer 3, agents | Chat message | You | When you send a message |

Each layer solves a different problem. Agents are useful for personal tasks where you want to stay in the loop. Trigger-based actions and schedules are better for business workflows that need to run in the background.

Start with the layer that solves the immediate problem. The advantage of one platform is that the layers can call each other later. An agent can trigger a workflow. A scheduled job can pull context from the same hub. A webhook can create an event that a worker, agent, or report uses downstream.

Where the platform runs

My current setup runs on a cloud server with Docker Compose. It is a Python backend, but the exact language matters less than the architecture.

FastAPI handles the endpoints. Caddy sits in front as the reverse proxy and terminates HTTPS for incoming requests. That is how the webhook endpoints are exposed to services like WhatsApp, email, meeting tools, forms, YouTube, and other systems.

When an event comes in, Redis acts as the queue. Celery workers pick tasks from the queue and run the actual work. That separation is important. The API receives and acknowledges the event quickly. The worker handles the slow part later.
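
To make that separation concrete, here is a minimal sketch of the same shape using only the standard library. A `queue.Queue` stands in for Redis and a thread stands in for a Celery worker; `receive_event` and `worker_loop` are hypothetical names for illustration, not code from the actual platform.

```python
import queue
import threading
import time

task_queue: queue.Queue = queue.Queue()  # stands in for Redis
results: list[str] = []


def receive_event(payload: dict) -> dict:
    """API side: enqueue and acknowledge immediately, without doing the work."""
    task_queue.put(payload)
    return {"status": "accepted"}


def worker_loop() -> None:
    """Worker side: pull tasks and do the slow part. None is the stop signal."""
    while True:
        payload = task_queue.get()
        if payload is None:
            break
        time.sleep(0.05)  # placeholder for slow work (LLM call, API sync, ...)
        results.append(f"processed {payload['id']}")


worker = threading.Thread(target=worker_loop)
worker.start()

ack = receive_event({"id": "evt-1"})  # returns before the work is done
task_queue.put(None)                  # stop the worker once the backlog is empty
worker.join()
```

The sender only ever waits for `receive_event`. How long the worker takes is invisible to it, which is what keeps webhook providers from timing out and retrying.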

Celery Beat gives the same system scheduled jobs. The same workers that process webhook events can also run daily reports, competitor analysis, cleanup jobs, and CRM syncs. The worker can talk to Postgres, external APIs, tools, the file-based context hub, and the Claude Agent SDK when a task needs a full agent runtime.

Right now this is one repository on one deployment. No microservices. I do not need them yet. I want one system where I understand every building block and know exactly what to change when I add a capability.

I also wired up CI/CD early. I build locally, push to main, GitHub Actions runs, the server auto-deploys, and Slack tells me when the deployment is done. That matters because a platform like this grows through small workflow additions.

Layer one, trigger-based actions

Trigger-based actions are webhooks, API endpoints, and event-driven workflows.

Inside FastAPI, I define endpoints for each source system. In the source application, I configure the webhook. When this event happens, send this payload to this endpoint.

Every event gets persisted to Postgres before any real work runs. Then it gets processed asynchronously by a worker. The worker can be plain Python code, a DAG-style workflow, or an agent, depending on the task.

Folder structure matters more than people expect. If every integration and workflow lands in the same file, the platform turns into a pile of scripts. Give workflows their own subfolders. Make it easy to find the code that handles a specific event.
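
One way to keep that structure honest is a small registry that maps each event type to the function that handles it, with every handler living in its own workflow module. The sketch below is an assumption about how this could be wired. The `handles` decorator, the event type names, and the module layout in the comments are all hypothetical.

```python
from typing import Callable

# Hypothetical layout, one subfolder per workflow:
#   app/workflows/newsletter/handler.py
#   app/workflows/meetings/handler.py
Handler = Callable[[dict], str]
HANDLERS: dict[str, Handler] = {}


def handles(event_type: str) -> Callable[[Handler], Handler]:
    """Register a function as the handler for one event type."""
    def decorator(func: Handler) -> Handler:
        if event_type in HANDLERS:
            raise ValueError(f"duplicate handler for {event_type!r}")
        HANDLERS[event_type] = func
        return func
    return decorator


@handles("subscriber.created")
def welcome_subscriber(payload: dict) -> str:
    return f"welcome email queued for {payload['email']}"


@handles("meeting.booked")
def prepare_meeting(payload: dict) -> str:
    return f"briefing queued for {payload['attendee']}"


def dispatch(event_type: str, payload: dict) -> str:
    """Look up the handler; unknown event types fail loudly, not silently."""
    try:
        handler = HANDLERS[event_type]
    except KeyError:
        raise LookupError(f"no handler registered for {event_type!r}") from None
    return handler(payload)
```

With this shape, adding a workflow means adding one folder and one decorated function, and finding the code for an event is a search for its event type string.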

Security is not optional. If Meta is sending WhatsApp messages into your system, verify the signature before you trust the payload.

Python
import hashlib
import hmac
import json
import os

from fastapi import FastAPI, Header, HTTPException, Request

from app.db import store_event
from app.tasks import process_webhook

app = FastAPI()

# App secret used to sign webhooks; the environment variable name is an assumption.
WHATSAPP_SECRET = os.environ["WHATSAPP_SECRET"]

def verify_signature(raw_body: bytes, provided: str | None, secret: str) -> bool:
    if not provided:
        return False
    digest = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    # Meta sends the header as "sha256=<hex digest>", so compare against that form.
    expected = f"sha256={digest}"
    # Constant-time comparison on bytes avoids leaking the signature through timing.
    return hmac.compare_digest(expected.encode(), provided.encode())

@app.post("/webhooks/whatsapp")
async def whatsapp_webhook(
    request: Request,
    x_hub_signature_256: str | None = Header(default=None),
):
    raw_body = await request.body()

    # Step 1. Verify the webhook is legitimate.
    if not verify_signature(raw_body, x_hub_signature_256, WHATSAPP_SECRET):
        raise HTTPException(status_code=403, detail="Invalid signature")

    # Step 2. Parse and persist so the event is not lost.
    payload = json.loads(raw_body)
    # Illustrative path; adjust it to the payload shape your provider actually sends.
    event_id = payload["data"]["id"]
    store_event(event_id, payload)

    # Step 3. Dispatch to a Celery worker in the background.
    process_webhook.delay(event_id, payload)
    return {"status": "accepted"}

Verify. Persist. Dispatch. That is the handler.
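
The `store_event` call carries most of the durability, and its implementation is not shown here. The following is only a sketch under assumptions: SQLite from the standard library stands in for Postgres so the example runs anywhere, and the `events` table, its columns, and the status values are hypothetical. The property worth copying is idempotency, since many webhook providers redeliver an event when they do not get a timely response.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a Postgres connection
conn.execute(
    """
    CREATE TABLE events (
        id TEXT PRIMARY KEY,
        payload TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending',
        created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def store_event(event_id: str, payload: dict) -> bool:
    """Insert the event once. Returns False if this id was already stored."""
    cursor = conn.execute(
        "INSERT OR IGNORE INTO events (id, payload) VALUES (?, ?)",
        (event_id, json.dumps(payload)),
    )
    conn.commit()
    return cursor.rowcount == 1


def mark_status(event_id: str, status: str) -> None:
    """Move an event through pending -> processing -> completed or failed."""
    conn.execute("UPDATE events SET status = ? WHERE id = ?", (status, event_id))
    conn.commit()


first = store_event("evt-1", {"text": "hello"})
duplicate = store_event("evt-1", {"text": "hello"})  # a redelivery is ignored
mark_status("evt-1", "completed")
```

In Postgres the equivalent of `INSERT OR IGNORE` is `INSERT ... ON CONFLICT (id) DO NOTHING`. A handler built on this could skip the dispatch step whenever `store_event` returns `False`, so a redelivered webhook does not run the same work twice.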

Layer two, scheduled workflows

Scheduled workflows use the same backend, but the trigger is time.

In Python, I use Celery Beat. Any cron-style tool can work. Decide which actions need to run on a regular interval, set the schedule, and point it at the code.

The pattern is the same as webhooks. A trigger fires. A worker runs. The event updates in the database. The output is stored so you can inspect what happened later.

For the workflow code itself, I often use a DAG, a directed acyclic graph of nodes. Run this node first, pass its output to the next node, then continue. It is similar to how tools like n8n or Zapier think about workflows, but written entirely in Python. We have been building client automations this way for about three years, and that same structure plugs into the platform cleanly.

Python
from celery.schedules import crontab

# Legacy uppercase setting name. With lowercase settings (Celery 4+), the same
# dictionary is assigned to beat_schedule instead.
CELERYBEAT_SCHEDULE = {
    "daily-report": {
        "task": "app.tasks.run_workflow",
        "schedule": crontab(hour=7, minute=0),  # every day at 07:00
        "args": ["daily_report_workflow"],
    },
    "competitor-analysis": {
        "task": "app.tasks.run_workflow",
        "schedule": crontab(hour=9, minute=0, day_of_week="tuesday"),  # Tuesdays at 09:00
        "args": ["competitor_analysis_workflow"],
    },
}

When the task fires, the worker imports the workflow, calls something like daily_report_workflow.run(), and writes the result back to the event record. Pending becomes completed. If it fails, you have a trace to debug.
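
As a rough illustration of that flow, the sketch below pairs a tiny DAG runner with a `run_workflow` function that records the outcome. It is not the implementation used on the platform. The `Workflow` class, the `WORKFLOWS` lookup table, the node names, and the plain dictionary standing in for the event record are all assumptions; `graphlib.TopologicalSorter` from the standard library provides the ordering.

```python
import traceback
from graphlib import TopologicalSorter
from typing import Callable

Node = Callable[[dict], object]


class Workflow:
    """A tiny DAG: each node receives the results of the nodes that ran before it."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.deps: dict[str, set[str]] = {}

    def node(self, name: str, func: Node, after: tuple[str, ...] = ()) -> None:
        self.nodes[name] = func
        self.deps[name] = set(after)

    def run(self) -> dict:
        results: dict = {}
        # static_order() yields every node after all of its dependencies
        # and raises CycleError if the graph is not acyclic.
        for name in TopologicalSorter(self.deps).static_order():
            results[name] = self.nodes[name](results)
        return results


daily_report_workflow = Workflow()
daily_report_workflow.node("fetch", lambda r: [3, 5, 8])
daily_report_workflow.node("total", lambda r: sum(r["fetch"]), after=("fetch",))
daily_report_workflow.node(
    "render", lambda r: f"Total signups: {r['total']}", after=("total",)
)

WORKFLOWS = {"daily_report_workflow": daily_report_workflow}


def run_workflow(name: str, event: dict) -> dict:
    """Run a workflow by name and record the outcome on the event record."""
    event["status"] = "processing"
    try:
        event["result"] = WORKFLOWS[name].run()
        event["status"] = "completed"
    except Exception:
        event["status"] = "failed"
        event["error"] = traceback.format_exc()  # the trace to debug later
    return event


event = run_workflow("daily_report_workflow", {"status": "pending"})
```

Catching the exception inside the task is a deliberate choice in this sketch. The failure lands on the event record where you can query it later, instead of living only in a worker log.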

Layer three, the AI agent layer

The agent layer is what made tools like Open Claw feel exciting. A simple interface on your phone can kick off real work.

In my setup, a WhatsApp or Slack message starts the flow. The message triggers a webhook. The webhook hits FastAPI. The request gets verified, persisted, and dispatched to a worker. The difference is that the worker does not call a fixed Python function. It runs an AI agent.

There are levels to this. For lightweight work, the agent can be a normal LLM API call with a few tools, built with Pydantic AI, LangChain, or another framework. For heavier work, the worker can use the Claude Agent SDK to spawn a Claude Code subprocess in the cloud.

That spectrum matters. A simple LLM-with-tools agent is cheap and quick. A full Claude Code runtime can do much more, but it can also run for 10 minutes and cost real money.

The delegate task tool is the most powerful agent tool I have right now. It can spin up Claude Code in the cloud and do the kind of work I would normally do at my computer. I set max_turns and max_budget because this gets expensive fast. I already hit $50 in API costs while experimenting with it.
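
The exact option names depend on the SDK and its version, so the sketch below does not call the Claude Agent SDK at all. It shows the underlying idea in plain Python: a guard that refuses to start another turn once a turn limit or a spend ceiling is reached. `BudgetGuard`, `fake_agent_step`, and the cost figures are hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an agent run has used up its turns or its budget."""


@dataclass
class BudgetGuard:
    max_turns: int
    max_budget_usd: float
    turns: int = 0
    spent_usd: float = 0.0

    def check(self) -> None:
        """Call before each turn; raises if another turn is not allowed."""
        if self.turns >= self.max_turns:
            raise BudgetExceeded(f"turn limit of {self.max_turns} reached")
        if self.spent_usd >= self.max_budget_usd:
            raise BudgetExceeded(f"budget of ${self.max_budget_usd:.2f} reached")

    def record(self, cost_usd: float) -> None:
        """Call after each turn with what that turn cost."""
        self.turns += 1
        self.spent_usd += cost_usd


def fake_agent_step(turn: int) -> tuple[str, float]:
    """Placeholder for one model call; returns (output, cost in USD)."""
    return f"step {turn}", 0.40


def run_agent(guard: BudgetGuard) -> str:
    """Loop until the guard stops the run, then report why.

    A real loop would also exit when the agent reports that it is done.
    """
    try:
        while True:
            guard.check()
            _, cost = fake_agent_step(guard.turns + 1)
            guard.record(cost)
    except BudgetExceeded as exc:
        return f"stopped: {exc}"


budget_limited = BudgetGuard(max_turns=20, max_budget_usd=1.00)
outcome = run_agent(budget_limited)
```

Checking before the turn, not after it, means the run can overshoot the ceiling by at most one turn's cost.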

One pattern I took from Open Claw is a soul.md file. It describes identity, mission, values, goals, how you want to show up, and why you do the work. When you put that before the task instructions, especially with a stronger model, chat-based agents become more personal and more useful.
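
Mechanically, "before the task instructions" can be as simple as string assembly. This is a minimal sketch, assuming the identity file is plain markdown on disk; `build_system_prompt` and the blank-line separator are illustrative choices, not something prescribed by any framework.

```python
from pathlib import Path


def build_system_prompt(soul_path: Path, task_instructions: str) -> str:
    """Put the identity file first, then the task-specific instructions."""
    soul = ""
    if soul_path.exists():
        soul = soul_path.read_text(encoding="utf-8").strip()
    # Drop empty parts so a missing soul.md does not leave a stray separator.
    parts = [part for part in (soul, task_instructions.strip()) if part]
    return "\n\n".join(parts)
```

A call such as `build_system_prompt(Path("identity/soul.md"), "Summarize today's inbox.")` returns the identity text followed by the task, and falls back to the task alone if the file is missing. The path in that call is an example, not the actual location in the context hub.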

My WhatsApp agent is intentionally lean right now. It has three tools:

  • Web search for quick research tasks.
  • A content idea saver that writes rough ideas to the file system.
  • A delegate task tool that spawns a Claude Code subprocess with budget limits.

For quick responses, I use a lighter setup. For heavier work, I use a stronger model and a fuller runtime. The platform pattern does not change. Receive the message, persist the event, run the worker, send the result back through the same channel.

Data and context

The backend is only half of the platform. The other half is context.

I use Postgres for event data and metadata. Then I use a separate markdown-based file system for context the agents can crawl. This part is still experimental, but it is the piece that makes agents much more useful.

The top level has six folders. Identity holds mission, values, goals, and personal context. Inbox is where raw ideas land before they are processed. Areas contains long-running parts of the business and life. Projects contains active builds and research. Knowledge stores research, SOPs, and reusable material. Archive holds old information the agents should usually ignore.

Everything is markdown or close to it. Everything is version controlled. Everything is searchable.

The most useful addition is tiered context loading, an idea I picked up from Open Viking. Every folder has three levels, from an abstract.md to an overview to the full files underneath.

The abstract.md is one line explaining what the folder is. The overview goes deeper and describes the area, workflows, and relationships. The full files are opened only when needed. At level zero, an agent can scan the whole repository through the abstracts alone in under 2,000 tokens.

That saves a lot of waste. Instead of an agent reading whole files, filling the context window, and only then realizing the document was irrelevant, it starts with abstracts, moves into overviews, and reads full files only when the task calls for it.
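
The three levels can be sketched as three small functions over the folder tree. This is an illustration under assumptions, not the actual hub code: only `abstract.md` is named above, so the `overview.md` file name is a guess, and `select_folders` is a deliberately naive keyword filter standing in for the agent's own judgment.

```python
from pathlib import Path


def read_abstracts(hub: Path) -> dict[str, str]:
    """Level 0: one line per folder, cheap enough to scan the whole hub."""
    abstracts = {}
    for path in sorted(hub.rglob("abstract.md")):
        folder = path.parent.relative_to(hub).as_posix()
        abstracts[folder] = path.read_text(encoding="utf-8").strip()
    return abstracts


def read_overview(hub: Path, folder: str) -> str:
    """Level 1: the longer description of one folder the agent picked."""
    overview = hub / folder / "overview.md"  # file name is an assumption
    return overview.read_text(encoding="utf-8") if overview.exists() else ""


def list_full_files(hub: Path, folder: str) -> list[str]:
    """Level 2: list the documents so the agent opens only what it needs."""
    skip = {"abstract.md", "overview.md"}
    return sorted(
        p.name for p in (hub / folder).glob("*.md") if p.name not in skip
    )


def select_folders(abstracts: dict[str, str], keywords: list[str]) -> list[str]:
    """Naive relevance filter; in practice the agent reads the abstracts and chooses."""
    lowered = [k.lower() for k in keywords]
    return [
        folder
        for folder, text in abstracts.items()
        if any(k in text.lower() for k in lowered)
    ]
```

The token savings come from the order of calls: `read_abstracts` runs once over everything, `read_overview` only for the folders that look relevant, and full files are opened last, one at a time.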

You can teach this behavior through a CLAUDE.md, AGENTS.md, or equivalent rules file for the agent harness you use. That file becomes the navigation layer for the context hub.

Zoom out and the platform has two connected systems. The backend runs on the cloud server. The context hub lives locally, syncs through GitHub, and can be pulled by cloud workflows when a cron job, webhook, or agent needs it. When I am at my computer, I can use local Claude Code against the same files.

Inside the context hub, skills.md files become leverage. I am working on a LinkedIn writer, a YouTube packager, a thumbnail creator, an AI pulse skill that scans the internet for topics I care about, and a slide creator. The slides behind the original video were created with that slide skill. I wrote the story and narrative, then asked the agent to create the visuals.

For your own platform, define the skills you reach for most often first.

Monitoring and observability

I run Sentry and Grafana on top of this.

That is not decoration. When a system is expanding quickly, moving lots of data, and using AI to help write parts of the code, errors will happen. Sentry catches application errors. Grafana helps me watch whether the server is alive and healthy.

You need this before the platform becomes important. Once workflows depend on it, silent failure is the worst failure mode.

What to build first

Build from first principles. Do not start by cloning the next hyped repository and dropping your private data into it. Ask what you need, which stack you understand, and which workflow deserves to exist first. Use AI to help you build, but stay in the loop.

Start with the layer that solves your immediate problem. If you want to automate personal tasks, begin with agents and skills. You might not need a dedicated deployment yet. If you integrate with many external systems, begin with webhooks. If you automate team or business workflows, begin with triggers and schedules.

Persist everything. Agents produce output. Workflows run. Events fail, retry, and get stuck. A durable event record gives you something to inspect when the system behaves strangely.
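
A durable record also makes recovery a query instead of guesswork. The sketch below shows one way an hourly job could requeue stuck events. It assumes a hypothetical `events` table with `status` and `updated_at` columns, and it uses SQLite from the standard library as a stand-in for Postgres so the example runs anywhere.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")  # stand-in for Postgres
conn.execute(
    "CREATE TABLE events (id TEXT PRIMARY KEY, status TEXT, updated_at TEXT)"
)


def recover_stuck_events(now: datetime, max_age: timedelta) -> list[str]:
    """Reset events stuck in 'processing' longer than max_age back to 'pending'."""
    cutoff = (now - max_age).isoformat()
    rows = conn.execute(
        "SELECT id FROM events "
        "WHERE status = 'processing' AND updated_at < ? ORDER BY id",
        (cutoff,),
    ).fetchall()
    stuck_ids = [row[0] for row in rows]
    conn.executemany(
        "UPDATE events SET status = 'pending', updated_at = ? WHERE id = ?",
        [(now.isoformat(), event_id) for event_id in stuck_ids],
    )
    conn.commit()
    return stuck_ids


# Example data: timestamps are ISO 8601 strings in UTC, so string comparison
# in SQL matches chronological order.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("evt-old", "processing", (now - timedelta(hours=3)).isoformat()),
        ("evt-new", "processing", (now - timedelta(minutes=5)).isoformat()),
        ("evt-done", "completed", (now - timedelta(hours=3)).isoformat()),
    ],
)
requeued = recover_stuck_events(now, max_age=timedelta(hours=1))
```

Only the event that has been processing for three hours is requeued. The recent one is left alone because its worker may still be running, and the completed one is never touched.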

Think long term. This is not a weekend project you build once and forget. You can set up the foundation over a weekend, but the value comes from adding workflows over time without rebuilding the platform for every new tool.

Context is the multiplier. Your AI is only as useful as the information it can reach. A context hub gives agents the files, skills, identity, and project knowledge they need to do useful work.

Where this is still experimental

I want to be clear about where this stands.

The webhook and scheduled workflow parts are standard software engineering. We have built those patterns for clients for years. The experimental part is the interconnectivity. Claude Code agents run in the cloud, skills and markdown files feed the context, and agents decide when to use which capability.

That part is still messy. Early agent outputs are often suboptimal. The skills need iteration. The tools need limits. The context needs structure. You still need to stay in the loop.

But this is the direction I think serious AI platforms are moving. Not one more chat wrapper, and not a giant framework you do not understand. A small production system with events, workers, schedules, agents, observability, and context, built in a way you can actually maintain.

Written by

Dave Ebbelaar

Senior AI Engineer

AI engineer and founder of Datalumina. Dave helps developers build production AI systems and turn technical skills into client work.