How to build webhooks for your AI platform
Webhook architecture for production AI systems. Verify signatures, store events, enforce idempotency, dispatch async workers, and recover failures.
Webhooks are the trigger layer of an AI platform.
A customer sends a WhatsApp message. A payment succeeds. A form is submitted. A new file lands in a tool you use. Each event arrives as an HTTP request, and your system has to decide whether to trust it, store it, and turn it into work.
That sounds small. It is not.
In production AI systems, the webhook layer is where a lot of reliability is either created or lost. If the handler is slow, the sender retries. If the signature check is weak, anyone can push fake events into your system. If you process before storing, one worker crash can erase the only copy of the event.
The pattern I use is simple. Verify, persist, dispatch.
The handler receives the request, verifies the sender, stores the event, sends a task to a worker, and returns 200. Every slow or risky step happens after that, outside the request cycle.
Where webhooks fit in an AI platform
In the AI operating system architecture from the webhook and agent platform video, there are three layers:
- Webhooks and API events for trigger-based actions.
- Scheduled workflows for recurring jobs.
- Agents for user-invoked work through chat or another interface.
Webhooks are layer one. They are the best fit when an external system needs to notify you that something happened.
This could be a WhatsApp message, Stripe payment, GitHub issue, Linear update, CRM change, file upload, or support ticket. The integration changes. The architecture should not.
Most webhook handlers should do the same five things every time:
- Read the raw request body.
- Verify the sender signature.
- Validate and store the event.
- Enqueue work for a background worker.
- Return a fast 2xx response.
That is the boundary. The AI workflow can be complex, but the intake layer should stay plain.
Keep the handler small
The handler should not call an LLM.
It should not run an agent. It should not call three external APIs. It should not write a long report, update a CRM, search a knowledge base, or wait for a tool loop to finish.
Those jobs belong in a worker.
Here is why. A sender such as Stripe, GitHub, Meta, or Shopify expects your endpoint to answer quickly. If your handler waits eight seconds for an LLM call and the sender times out after five, the sender assumes delivery failed. It retries. Now your system may process the same event twice.
The bug looks random from the outside. You get duplicate WhatsApp replies, repeated invoice syncs, repeated agent runs, or two database records for one real-world event.
The cause is usually boring. The request handler did too much.
Treat the handler as an intake clerk. It checks the sender, writes down what arrived, hands off the job, and moves on.
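The split can be sketched with the standard library alone. This is an illustration only: `queue.Queue` and a thread stand in for Redis and Celery, and `slow_workflow` is a hypothetical placeholder for an LLM or agent call. The handler hands off the event ID and returns immediately, and the slow step runs in the worker.

```python
import queue
import threading
import time

jobs: "queue.Queue[str | None]" = queue.Queue()
results: dict[str, str] = {}


def slow_workflow(event_id: str) -> str:
    time.sleep(0.3)  # stands in for an LLM call or agent run
    return f"processed {event_id}"


def worker() -> None:
    while True:
        event_id = jobs.get()
        try:
            if event_id is None:  # sentinel: stop the worker
                return
            results[event_id] = slow_workflow(event_id)
        finally:
            jobs.task_done()


def handler(event_id: str) -> int:
    jobs.put(event_id)  # hand off the job; do not wait for the workflow
    return 200


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

started = time.perf_counter()
status = handler("evt_1")
handler_seconds = time.perf_counter() - started  # far below the 0.3 s workflow

jobs.join()  # waiting here is only for the demo; a real handler never waits
jobs.put(None)
worker_thread.join()
```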
Verify the signature before trusting the payload
Webhook endpoints are public by design. Anyone who knows the URL can send a request unless you verify the sender.
Most serious webhook providers sign the payload. The exact format differs, but the common pattern is HMAC-SHA256 over the raw request body using a shared secret. The provider sends the signature in a header. Your endpoint recomputes the signature and compares the two values.
Use a constant-time comparison.
In Python, that usually means secrets.compare_digest(). Do not use == for signatures. Normal string comparison can leak timing information, and this is an easy place to use the safer primitive.
Also, hash the raw request bytes. Not parsed JSON. Not json.dumps(payload).
FastAPI makes it easy to receive a Pydantic model, but signature verification needs the exact bytes the sender signed. Parsed and re-serialized JSON can change whitespace, key order, and encoding. A real request can fail verification even when the payload is legitimate.
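A small demonstration of that failure mode, using a hypothetical secret and payload. The sender signed compact JSON. Parsing the body and re-serializing it with `json.dumps` adds spaces after separators, so the HMAC of the re-serialized bytes no longer matches.

```python
import hashlib
import hmac
import json

SECRET = b"whsec_test"  # hypothetical shared secret


def hmac_hex(data: bytes) -> str:
    return hmac.new(SECRET, data, hashlib.sha256).hexdigest()


# The exact bytes the sender signed: compact JSON, no spaces.
raw_body = b'{"id":"evt_1","amount":4200,"currency":"eur"}'
signature = hmac_hex(raw_body)

# Correct: hash the raw request bytes.
raw_ok = hmac.compare_digest(hmac_hex(raw_body), signature)

# Wrong: parse, then re-serialize. json.dumps uses ", " and ": " separators,
# so the bytes (and therefore the HMAC) change.
reserialized = json.dumps(json.loads(raw_body)).encode()
reserialized_ok = hmac.compare_digest(hmac_hex(reserialized), signature)

print(reserialized)             # b'{"id": "evt_1", "amount": 4200, "currency": "eur"}'
print(raw_ok, reserialized_ok)  # True False
```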
If verification fails, return 403 and log the attempt with metadata such as provider, endpoint, timestamp, and request ID. Do not return a detailed explanation to the caller.
Persist the event before processing
Once the signature is valid, store the event before you do anything else.
This is the part that saves you later.
Store the raw or validated payload in Postgres. Include the provider, event type, received timestamp, processing status, and a stable idempotency key. That key is usually the sender event ID, message ID, payment ID, or delivery ID.
Persistence gives you three things:
- A replay log when a worker fails.
- A debugging trail when a workflow behaves strangely.
- A place to enforce idempotency before work starts.
Without this, an event can disappear between "request received" and "task completed." That is not production infrastructure. That is a best-effort script with an HTTP endpoint attached.
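A minimal sketch of that event record. `sqlite3` stands in for Postgres here so the example runs anywhere. The table and column names are assumptions. The columns mirror the fields listed above, with an attempt counter and last error added for the retry handling described later. In Postgres you would likely use `JSONB` and `TIMESTAMPTZ` rather than `TEXT`.

```python
import json
import sqlite3
from datetime import datetime, timezone

# sqlite3 stands in for Postgres; the schema is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE webhook_events (
        id              INTEGER PRIMARY KEY,
        idempotency_key TEXT NOT NULL UNIQUE,
        provider        TEXT NOT NULL,
        event_type      TEXT NOT NULL,
        payload         TEXT NOT NULL,
        received_at     TEXT NOT NULL,
        status          TEXT NOT NULL DEFAULT 'pending',
        attempts        INTEGER NOT NULL DEFAULT 0,
        last_error      TEXT
    )
    """
)


def store_event(key: str, provider: str, event_type: str, payload: dict) -> None:
    """Write the event before any processing happens."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO webhook_events"
            " (idempotency_key, provider, event_type, payload, received_at)"
            " VALUES (?, ?, ?, ?, ?)",
            (key, provider, event_type, json.dumps(payload),
             datetime.now(timezone.utc).isoformat()),
        )


store_event("evt_1", "stripe", "payment_intent.succeeded", {"id": "evt_1"})
row = conn.execute(
    "SELECT provider, event_type, status FROM webhook_events WHERE idempotency_key = ?",
    ("evt_1",),
).fetchone()
print(row)  # ('stripe', 'payment_intent.succeeded', 'pending')
```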
Make idempotency explicit
Webhook senders retry. They retry when your endpoint times out, when the network drops, when you return a non-2xx response, and sometimes when their own delivery system is uncertain.
Your system should expect the same event more than once.
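The fix is a unique constraint on the idempotency key. Insert the event first and let the constraint reject duplicates. Checking for the key and then inserting is not enough, because two concurrent retries can both pass the check. If the insert conflicts, the event has already been accepted: return 200 and do not enqueue another task.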
That response matters. From the sender's point of view, the event has been accepted. You do not want to keep triggering retries for an event you already stored.
Idempotency should happen at the database boundary, not only in memory. A local cache can disappear on deploy, restart, or crash. The database is the authority.
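A sketch of "insert first, let the constraint decide". In Postgres this would typically be `INSERT ... ON CONFLICT (idempotency_key) DO NOTHING`, followed by a check on whether a row was written. Here `sqlite3` stands in, and a duplicate shows up as `IntegrityError`. The `enqueued` list is a hypothetical stand-in for the task queue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webhook_events ("
    " idempotency_key TEXT NOT NULL UNIQUE,"
    " status TEXT NOT NULL DEFAULT 'pending')"
)
enqueued: list[str] = []  # stands in for the task queue


def accept(key: str) -> tuple[int, str]:
    """Insert first and let the unique constraint decide; never check-then-insert."""
    try:
        with conn:
            conn.execute("INSERT INTO webhook_events (idempotency_key) VALUES (?)", (key,))
    except sqlite3.IntegrityError:
        # Already stored: acknowledge so the sender stops retrying, enqueue nothing.
        return 200, "already_accepted"
    enqueued.append(key)
    return 200, "accepted"


print(accept("evt_1"))  # (200, 'accepted')
print(accept("evt_1"))  # (200, 'already_accepted')
print(enqueued)         # ['evt_1']
```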
Dispatch work to async workers
After the event is stored, enqueue a task and return.
In my platform setup, FastAPI receives the request, Redis holds the queue, and Celery workers execute the tasks. The same pattern works with other stacks too, like BullMQ in TypeScript, Sidekiq in Ruby, a managed queue, or a cloud task system.
The task payload should stay small. Pass the idempotency key, not the whole world. The worker can load the event from Postgres, mark it as processing, run the workflow, store the result, and mark it as completed.
That worker is where the real AI platform work belongs:
- Run a deterministic workflow.
- Call an LLM or agent.
- Pull context from a file system or database.
- Call external APIs.
- Send a notification or response.
Now the slow part is isolated. If the LLM is down, the webhook sender does not need to know. If the agent fails, you can retry the task. If processing takes a minute, the original HTTP request has already been acknowledged.
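A sketch of the worker side of that lifecycle. Here `process_inbound_event` is a plain function standing in for the Celery task, `sqlite3` stands in for Postgres, and `run_workflow` is a hypothetical placeholder for the real AI work. The claim step moves a row from `pending` to `processing` only once, so a duplicate task for the same key does nothing.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE webhook_events ("
    " idempotency_key TEXT NOT NULL UNIQUE, payload TEXT NOT NULL,"
    " status TEXT NOT NULL, result TEXT)"
)
conn.execute(
    "INSERT INTO webhook_events VALUES (?, ?, 'pending', NULL)",
    ("evt_1", json.dumps({"text": "hello"})),
)
conn.commit()


def run_workflow(payload: dict) -> str:
    # Hypothetical placeholder for an LLM call, agent run, or API calls.
    return payload["text"].upper()


def process_inbound_event(idempotency_key: str) -> None:
    # Claim the event: only a 'pending' row can move to 'processing'.
    with conn:
        claimed = conn.execute(
            "UPDATE webhook_events SET status = 'processing'"
            " WHERE idempotency_key = ? AND status = 'pending'",
            (idempotency_key,),
        ).rowcount
    if not claimed:
        return  # already processed or in progress
    (raw,) = conn.execute(
        "SELECT payload FROM webhook_events WHERE idempotency_key = ?",
        (idempotency_key,),
    ).fetchone()
    result = run_workflow(json.loads(raw))
    with conn:
        conn.execute(
            "UPDATE webhook_events SET status = 'completed', result = ?"
            " WHERE idempotency_key = ?",
            (result, idempotency_key),
        )


process_inbound_event("evt_1")
process_inbound_event("evt_1")  # duplicate task for the same key: no-op
print(conn.execute("SELECT status, result FROM webhook_events").fetchone())
# ('completed', 'HELLO')
```

If `run_workflow` raises, the row stays in `processing`. That is where the retry and dead-letter handling from the next section takes over.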
Handle failures like part of the design
Workers fail. APIs time out. Models return bad structured output. Deployments happen at awkward moments. One event exposes a payload shape you did not know existed.
Do not treat that as exceptional. Build for it.
Configure retries with backoff for failures that might recover. Store each attempt count and error message. After retries are exhausted, move the job into a dead-letter queue and alert somewhere you actually check.
Treat the dead-letter queue as a repair queue, not a trash folder.
Each entry should answer a few basic questions:
- Which event failed?
- Which workflow handled it?
- What error occurred?
- How many times did it retry?
- Can it be replayed safely?
This is especially important for AI workflows because the failure may be downstream of the webhook. The intake might be fine, but the agent call, schema validation, tool call, or external API can fail later. If you only log the original HTTP request, you miss the actual problem.
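One way to make each dead-letter entry answer those questions is a small record type. The field names and the `replay_safe` flag are assumptions for illustration. In practice, "safe to replay" usually means the workflow's side effects are idempotent.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DeadLetterEntry:
    idempotency_key: str  # Which event failed?
    workflow: str         # Which workflow handled it?
    error: str            # What error occurred?
    attempts: int         # How many times did it run (retries + 1)?
    replay_safe: bool     # Can it be replayed safely?
    failed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def replayable(entries: list[DeadLetterEntry]) -> list[str]:
    """Keys an operator can re-enqueue after fixing the underlying problem."""
    return [e.idempotency_key for e in entries if e.replay_safe]


dead_letters = [
    DeadLetterEntry("evt_1", "whatsapp_reply", "ValidationError: missing 'intent'", 5, True),
    DeadLetterEntry("evt_2", "invoice_sync", "HTTP 409 from CRM", 5, False),
]
print(replayable(dead_letters))  # ['evt_1']
```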
A FastAPI and Celery example
This example shows the shape of the handler. Real integrations need their provider-specific header names, timestamp checks, and signature formats.
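```python
import hashlib
import hmac
import logging
import os
import secrets

from fastapi import FastAPI, Header, HTTPException, Request
from pydantic import BaseModel, ValidationError

# Application helpers (not shown). store_event is assumed to INSERT with
# ON CONFLICT (idempotency_key) DO NOTHING and return True only for a new row.
from app.db import store_event
from app.tasks import process_inbound_event  # Celery task

app = FastAPI()
logger = logging.getLogger("webhooks")

# Load the shared secret from the environment; never hardcode it.
WEBHOOK_SECRET = os.environ["WEBHOOK_SECRET"]


class InboundPayload(BaseModel):
    event: str
    data: dict


def verify_signature(raw_body: bytes, provided: str | None, secret: str) -> bool:
    """Recompute HMAC-SHA256 over the raw bytes and compare in constant time."""
    if not provided:
        return False
    expected = hmac.new(
        secret.encode("utf-8"),
        raw_body,
        hashlib.sha256,
    ).hexdigest()
    # Compare bytes so a non-ASCII header value cannot raise TypeError.
    return secrets.compare_digest(expected.encode(), provided.encode())


@app.post("/webhooks/inbound")
async def inbound_webhook(
    request: Request,
    x_signature: str | None = Header(default=None),
) -> dict[str, str]:
    # 1. Verify: hash the exact bytes the sender signed, before parsing them.
    raw_body = await request.body()
    if not verify_signature(raw_body, x_signature, WEBHOOK_SECRET):
        # Log metadata for investigation; tell the caller nothing more.
        logger.warning(
            "Rejected webhook: provider=%s path=%s request_id=%s",
            "inbound",
            request.url.path,
            request.headers.get("x-request-id"),  # if a proxy sets one
        )
        raise HTTPException(status_code=403, detail="Invalid signature")

    # Parse and validate only after the sender is trusted.
    try:
        payload = InboundPayload.model_validate_json(raw_body)
    except ValidationError:
        raise HTTPException(status_code=400, detail="Invalid payload") from None

    idempotency_key = payload.data.get("id")
    if not idempotency_key:
        raise HTTPException(status_code=400, detail="Missing event id")

    # 2. Store: the unique constraint decides whether this event is new,
    # so two concurrent retries cannot both be dispatched.
    created = store_event(
        idempotency_key=idempotency_key,
        provider="inbound",
        event_type=payload.event,
        payload=payload.model_dump(),
        status="pending",
    )
    if not created:
        return {"status": "already_accepted"}

    # 3. Dispatch: pass only the key. If this call fails, the row stays
    # 'pending' and can be re-dispatched later.
    process_inbound_event.delay(idempotency_key)
    return {"status": "accepted"}
```

The important detail is the order. Verify first. Store second. Dispatch third.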
The worker receives the idempotency key, loads the event, runs the workflow, and updates the status. That workflow can be a normal Python function, a DAG-style process, an agent call, or a longer automation. The handler does not need to know.
Use separate endpoints per integration once the system grows, such as /webhooks/whatsapp, /webhooks/stripe, /webhooks/github, and so on. Each provider has a different signature format and payload shape, but the intake pattern stays the same.
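A sketch of per-provider verification behind one shared intake path. The two verifiers are modeled on the formats GitHub (`X-Hub-Signature-256: sha256=<hex>`) and Stripe (`t=<timestamp>,v1=<hex>` over `"<t>.<body>"`) document. The five-minute tolerance, the lowercase header keys, and the `intake` helper are assumptions. Check each provider's documentation before relying on this.

```python
import hashlib
import hmac
import time
from typing import Callable, Mapping

# (body, lowercase headers, secret) -> is the sender genuine?
Verifier = Callable[[bytes, Mapping[str, str], bytes], bool]


def verify_prefixed_hex(body: bytes, headers: Mapping[str, str], secret: bytes) -> bool:
    """Modeled on GitHub: x-hub-signature-256: sha256=<hex HMAC-SHA256 of the body>."""
    provided = headers.get("x-hub-signature-256", "")
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), provided.encode())


def verify_timestamped(
    body: bytes, headers: Mapping[str, str], secret: bytes, tolerance: int = 300
) -> bool:
    """Modeled on Stripe: t=<unix time>,v1=<hex HMAC-SHA256 of "<t>.<body>">."""
    timestamp, candidates = None, []
    for item in headers.get("stripe-signature", "").split(","):
        name, _, value = item.strip().partition("=")
        if name == "t":
            timestamp = value
        elif name == "v1":
            candidates.append(value)
    try:
        signed_at = int(timestamp)
    except (TypeError, ValueError):
        return False
    if abs(time.time() - signed_at) > tolerance:
        return False  # outside the window: treat as a possible replay
    signed = timestamp.encode() + b"." + body
    expected = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)


VERIFIERS: dict[str, Verifier] = {
    "github": verify_prefixed_hex,
    "stripe": verify_timestamped,
}


def intake(
    provider: str,
    body: bytes,
    headers: Mapping[str, str],
    secrets_by_provider: Mapping[str, bytes],
) -> int:
    """Shared entry point behind /webhooks/<provider>; only verification differs."""
    verifier = VERIFIERS.get(provider)
    secret = secrets_by_provider.get(provider)
    if verifier is None or secret is None or not verifier(body, headers, secret):
        return 403
    # From here the pipeline is identical: store, dispatch, return 2xx.
    return 200
```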
Production traps to avoid
| Trap | What happens | Better pattern |
|---|---|---|
| Calling an LLM inside the handler | The sender times out and retries a valid event | Enqueue the task and return 200 |
| Verifying parsed JSON | Legitimate signatures fail after parsing changes the bytes | Hash the raw request body |
| Skipping idempotency | Retry spikes create duplicate work | Use a database unique constraint |
| Processing before persistence | Crashes erase the only copy of the event | Store the event first |
| Dropping exhausted tasks | Failures vanish from view | Use a dead-letter queue with alerts |
The first trap is the one I see most often. It starts as one quick API call. Then the handler grows into the whole workflow.
Keep the boundary clean. The handler accepts work. The worker does work.
Where to go next
Webhooks are one infrastructure layer inside the larger production AI systems stack. The same intake pattern also supports scheduled jobs and agent workflows because everything becomes a durable event that workers can process.
For the broader architecture behind this setup, read how to build your own AI platform. If you want a production-ready starting point with webhooks, schedulers, workers, and monitoring already wired together, the GenAI Launchpad is the closest match.
