Career · Mar 13, 2026 · Updated Jun 11, 2026 · 15 min read

How to deliver custom AI solutions for clients

The client AI delivery process behind 50+ custom B2B projects, covering discovery calls, proof of concept vs MVP scoping, two-week sprints priced at 10 to 20k euros, one standardized Python stack, and deployment on a Hetzner VM.

Dave Ebbelaar, Senior AI Engineer

Most GenAI pilots still fail to deliver measurable impact. I've been repeating this for three years, and it still holds. The models keep getting more capable, and shipping something that survives production is still freaking hard. It's almost never the model. It's the use case you pick, the expectations you set, and the process you run between the first call and the handoff.

At Datalumina we've delivered over 50 custom B2B AI solutions, and we run the whole operation with two people plus subcontractors. This post is the full process, from discovery call to deployed system. It covers how we qualify projects, scope them, price them, build them, and keep them running. If you're an aspiring freelancer or a developer who wants to sell custom AI work, this is the delivery side of the freelance tech roadmap. The same process applies inside a 9-to-5, except the steps are usually split across multiple people and teams.

Discovery is where you win or lose the project

Everything starts with a 45-minute discovery call. The goal is to uncover what the actual problem is, not what the client thinks they need. When LLMs first came out, many clients arrived with a specific request: "We see AI can do this now, can you implement it?" Start from that biased point and you skip the due diligence. The cases clients brought us themselves were often too complex, too big in scope, or had no clear ROI.

So spend real time on the ROI question, and be brutally honest about it. Custom software costs serious money even at today's delivery speed. If there's no clear return behind the solution, say so before anyone writes a proposal.

Then look for the quick wins instead of the moonshots. Right now we're building a document processing pipeline for a client. It's a big, pressing problem, because almost every employee processes documents against a set of standards. It's a great use case, but the number of employees, documents, and document types multiplies the edge cases. Meanwhile, shadow one employee for an afternoon and you'll often find hidden processes that simple automation and integrations can handle, sometimes without AI at all. The spectrum runs from n8n or Zapier-style connections that move data around to an LLM pipeline that processes an entire organization's documents. Open the conversation across that whole range, and pick the quick wins first.

Watch for red flags too. "We want AI because everyone is doing it" opens a conversation, but it doesn't survive the ROI question. Without clear success criteria, there's no way to tell good output from bad or to course-correct. Some clients expect magic from day one. And confirm data access early: check with security that you can actually build and deploy what the project requires.

Before committing, we take the client through a use case evaluation. Is the workflow simple and repeatable? Are there clear success and failure rules we can check? Can errors be escalated to a human? In customer care we do this constantly. The system handles the 80% of tickets it can, classifies the 20% it can't, and escalates those to humans. A human-in-the-loop step (a required review or approval) is different from escalation and can get you live quicker, because a person signs off on every output before it takes effect. And always ask what happens when the AI gets it wrong. For some workflows someone downstream catches the error. For customer-facing or critical processes, the bar goes up.
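Here is a minimal sketch of that escalation rule. It assumes a classification step whose structured output carries a category, a confidence score, and an in-scope flag. The field names and the 0.8 threshold are hypothetical, and a standard-library dataclass stands in for the Pydantic model the stack in this post would use.

```python
from dataclasses import dataclass
from enum import Enum


class Route(str, Enum):
    AUTOMATE = "automate"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class TicketClassification:
    """Structured output of a hypothetical ticket-classification step."""
    category: str
    confidence: float  # 0.0 to 1.0, as reported by the classifier
    in_scope: bool     # does this category fall inside the automated workflow?


# Hypothetical cut-off; tune it per project against your eval set.
CONFIDENCE_THRESHOLD = 0.8


def route_ticket(result: TicketClassification) -> Route:
    """Automate only tickets that are in scope AND classified confidently."""
    if result.in_scope and result.confidence >= CONFIDENCE_THRESHOLD:
        return Route.AUTOMATE
    # Everything else (out of scope or uncertain) goes to a human.
    return Route.ESCALATE


if __name__ == "__main__":
    tickets = [
        TicketClassification("order_status", 0.93, True),
        TicketClassification("refund_dispute", 0.95, False),
        TicketClassification("order_status", 0.55, True),
    ]
    for ticket in tickets:
        print(ticket.category, route_ticket(ticket).value)
```

Keeping the routing decision in plain code on top of structured output makes it deterministic and easy to test. The share of tickets that gets automated becomes a threshold you can adjust as the classifier improves.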

The projects that pass tend to fall into five categories that keep coming back for us. Those are document processing, content generation, customer support, internal knowledge assistants, and data extraction.

Have the accuracy conversation before you build anything

Your first build typically gets to 70 or 80% accuracy. After iteration, you can usually get to 90%. Some clients expect 99% from day one, because they're used to deterministic software and have bought from traditional development companies for decades. Building with LLMs is a different paradigm, and educating the client on that is critical.

We made this mistake early. We were too excited about the technology and not clear enough up front, and it always comes back to you. Even now, some projects hit the stage where the build is objectively in a good ballpark and the client still feels it's not there yet. The last part is disproportionately hard, and that's where most projects fail.

So frame it as iterative development, not one-shot perfection. Deliver quickly, show what version one looks like, and present the plan to improve it. Build feedback loops that make it easy for the client to send feedback back to you. AI is excellent at processing large amounts of feedback and turning it into pull requests you can test, so the loop from "this output is wrong" to a fix keeps getting shorter.

One more expectation to set is client involvement. If domain knowledge is required to judge what good looks like, the client will need to invest time and people. Say that at the start, or you'll eventually hear "we outsourced this because we don't have the capability internally, why do we still have to be hands-on?"

Scope the first build as a proof of concept or an MVP

Scoping is one of the trickiest things in tech, and AI makes it harder. We start from a single core workflow. What are we automating, what does the input look like, what does the output need to look like. From there we list must-haves and nice-to-haves, identify technical risk early, and set clear acceptance criteria. The common failure is over-building the first version. Most solutions have one core metric they need to hit, and it's easy to over-engineer the front end and the integrations while that metric is still not in place. Decide what the KPI is and optimize for it first.

Every first project gets framed as one of two things. A proof of concept means we're confident we can build something, but not 100% confident the current state of AI will hit the client's KPIs, so we prove it first. Be up front about what that means. A proof of concept is usually a local demo and a report, not a working solution that adds value. The client takes input data, sees the output, and both sides agree whether it's where it needs to be. If it's not, we run a gap analysis. This is typically also the first invoice.
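The "both sides agree whether it's where it needs to be" step can be made concrete. This sketch compares proof-of-concept outputs against examples the client has labeled, checks the resulting accuracy against the agreed KPI, and groups the failures as a starting point for the gap analysis. The names, the exact-match scoring, and the per-label grouping are assumptions for illustration. Many tasks need a looser comparison than string equality.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class LabeledExample:
    input_id: str
    expected: str   # what the client's domain expert says is correct
    predicted: str  # what the proof of concept produced


def evaluate_poc(examples: list[LabeledExample], target_accuracy: float) -> dict[str, Any]:
    """Score PoC output against client labels and summarize the gap to the KPI."""
    if not examples:
        raise ValueError("need at least one labeled example")
    failures = [e for e in examples if e.predicted != e.expected]
    accuracy = 1 - len(failures) / len(examples)
    return {
        "accuracy": accuracy,
        "target": target_accuracy,
        "meets_kpi": accuracy >= target_accuracy,
        # Failures grouped by the correct label show where the gap concentrates.
        "failures_by_label": dict(Counter(e.expected for e in failures)),
        "failed_ids": [e.input_id for e in failures],
    }


if __name__ == "__main__":
    examples = [
        LabeledExample("doc-1", "invoice", "invoice"),
        LabeledExample("doc-2", "contract", "contract"),
        LabeledExample("doc-3", "invoice", "invoice"),
        LabeledExample("doc-4", "contract", "invoice"),
    ]
    print(evaluate_poc(examples, target_accuracy=0.9))
```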

An MVP is different. If we already know the approach works, because we've built something similar or the solution is simpler, we scope the first iteration as a minimum viable product, the smallest thing we can deliver, deploy, and hand over that actually produces value.

Write the proposal around two-week sprints, and plan for the gaps

After the discovery call (sometimes a second call), we write a proposal from a rough standard document. It opens with the problem statement framed in the client's own words, then the proposed solution with explicit scope boundaries, the technical approach (architecture diagrams, features, as deep as the project's complexity demands), what's included and what's not, success criteria and how we measure them, and finally timelines and pricing. We send it over and plan a proposal meeting to walk through it together.

Beginners often forget the ongoing costs. "I'll build this chatbot for 5K" sounds fine until someone asks what happens afterwards. Where does it run, your environment or theirs? What are the infrastructure costs? What does the LLM API usage cost? You can't predict it precisely up front, but give a rough estimate of the ongoing costs next to the build cost.
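For that rough estimate, a back-of-the-envelope calculation is usually enough. This sketch multiplies request volume by token counts and per-million-token prices, then adds fixed infrastructure. Every number in the example is a placeholder assumption, not a real provider price. Running a low and a high scenario gives the client a range instead of a falsely precise figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostAssumptions:
    requests_per_month: int
    input_tokens_per_request: int
    output_tokens_per_request: int
    input_price_per_million: float   # placeholder, check your provider's pricing
    output_price_per_million: float  # placeholder, check your provider's pricing
    infra_per_month: float           # VM, backups, monitoring, etc.


def monthly_cost(a: CostAssumptions) -> dict[str, float]:
    """Rough monthly running cost: LLM token spend plus fixed infrastructure."""
    input_cost = a.requests_per_month * a.input_tokens_per_request / 1_000_000 * a.input_price_per_million
    output_cost = a.requests_per_month * a.output_tokens_per_request / 1_000_000 * a.output_price_per_million
    llm = input_cost + output_cost
    return {
        "llm": round(llm, 2),
        "infra": round(a.infra_per_month, 2),
        "total": round(llm + a.infra_per_month, 2),
    }


if __name__ == "__main__":
    # Low and high usage scenarios give the client a range, not a single number.
    low = CostAssumptions(3_000, 4_000, 500, 2.0, 8.0, 50.0)
    high = CostAssumptions(10_000, 6_000, 800, 2.0, 8.0, 50.0)
    print("low:", monthly_cost(low))
    print("high:", monthly_cost(high))
```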

We price in two-week sprints at 10 to 20k euros per sprint. With a small team (me, my co-founder, and subcontractors depending on the contract) we can run multiple sprints in parallel. A sprint follows a fixed shape. Days one and two are scaffolding and architecture, days three to eight are core development (building the workflow, integrating the LLMs, iterating on quality), and the final days are testing, polish, and a demo. New clients get a kickoff call, ideally Monday afternoon, to agree on communication. Then we push for async updates. Clients sometimes ask for stand-ups; we push back. I optimize the whole business for as few meetings as possible.

The sprint model has one structural challenge. The gap. In an ideal world, sprint two starts when sprint one ends. In the real world, you deliver, and the client disappears to test internally. People are busy, someone is on holiday, priorities reshuffle, and the next sprint lands anywhere from two to eight weeks later. That's rough on cash flow, especially solo, which is how I started as a freelance data scientist. We mitigate it by overplanning, keeping another client or project in the pipeline even when a client says they'll continue back-to-back. The gaps never fully go away. You plan around them.

Standardize your stack so every project looks the same

There's nothing special about how we build, and that's the point. Everything is standardized, and that's the actual secret behind fast turnover, heavy AI use, and consistent quality. The back end is always Python, built on FastAPI, Celery, and Redis (Redis acts as the broker that holds scheduled tasks until Celery workers pick them up), with PostgreSQL databases, usually self-hosted Supabase. Workflows are DAG-based and come hooked up with Pydantic AI. When a project needs a front end, we reach for Next.js with shadcn, which pairs well with Supabase for authentication; that side is mostly my co-founder's territory. For LLM providers we almost always use Azure OpenAI. I'm not the happiest with it (the interface changes almost every month and the quotas are annoying), but for most of our clients it's the only way to meet data compliance and security requirements.
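The Launchpad's internals aren't shown in this post, so here is a hypothetical, dependency-free sketch of what "DAG-based" means. Each step declares which steps it depends on. The steps run in topological order using the standard-library graphlib module (Python 3.9+), and every step reads earlier outputs from a shared context. In the real stack the steps would call LLMs through Pydantic AI and run inside Celery tasks. Neither is modeled here.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

Context = dict[str, Any]
Step = Callable[[Context], Any]


def run_dag(steps: dict[str, Step], deps: dict[str, set[str]], event: dict[str, Any]) -> Context:
    """Run every step after the steps it depends on.

    Each step receives a shared context holding the input event and the
    outputs of all steps that have already run, keyed by step name.
    """
    context: Context = {"event": event}
    for name in TopologicalSorter(deps).static_order():
        context[name] = steps[name](context)
    return context


if __name__ == "__main__":
    steps: dict[str, Step] = {
        "extract": lambda ctx: ctx["event"]["body"].strip().lower(),
        "classify": lambda ctx: "refund" if "refund" in ctx["extract"] else "other",
        "route": lambda ctx: "escalate" if ctx["classify"] == "refund" else "automate",
    }
    deps = {"extract": set(), "classify": {"extract"}, "route": {"classify"}}
    result = run_dag(steps, deps, {"body": "  I want a REFUND  "})
    print(result["route"])
```

TopologicalSorter raises graphlib.CycleError if the dependencies contain a cycle, so a miswired workflow fails before any step runs.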

Every new client project starts by forking our GenAI Launchpad repository, the back-end infrastructure and framework we built at Datalumina. A few bash scripts bring up the local Docker environment, including the database, authentication, and vector search if the project needs it. Most importantly, the file structure is identical across projects, down to folder and file names. I can jump into any client codebase and know exactly where everything lives. That's the only way to work on multiple projects at the same time without it turning into a mess. Same stack, same patterns, same deployment process, every single project.

The stack barely changes either. We've run the same one for two years. LLMs change monthly, but the client systems in production keep working; we just swap out the model. FastAPI, Redis, Celery, Postgres, and your front end have been stable technologies for years. Build the system once, make sure it can adopt new models, and stop reinventing the wheel underneath it.
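One way to keep model swaps cheap is sketched here under stated assumptions. The idea is to read the model deployment name for each workflow step from configuration, so moving to a new model is an environment change rather than a code change. The environment variable names and the fallback value are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMSettings:
    deployment: str
    temperature: float


def load_llm_settings(step: str) -> LLMSettings:
    """Resolve the model for one workflow step from environment variables.

    Hypothetical convention: LLM_DEPLOYMENT_<STEP> overrides the global
    LLM_DEPLOYMENT, so a single step can move to a new model first.
    """
    deployment = (
        os.environ.get(f"LLM_DEPLOYMENT_{step.upper()}")
        or os.environ.get("LLM_DEPLOYMENT")
        or "default-deployment"  # placeholder fallback
    )
    temperature = float(os.environ.get("LLM_TEMPERATURE", "0"))
    return LLMSettings(deployment=deployment, temperature=temperature)


if __name__ == "__main__":
    print(load_llm_settings("classify"))
```

Per-step overrides let you trial a new model on one step, run the evals, and roll it out more widely once the numbers hold.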

Use Claude Code, but keep the setup vanilla

Claude Code is our primary development tool across all projects, and it changed our scaling math. Last year we wanted to grow with more developers, but the jump from a small hands-on team to a real agency means recruiting, training, and retaining people while you stay responsible for every project. Profits dip while stress climbs. We decided not to. We run the development company at its current size (multiple six figures), scale through AI instead, and put the spare capacity into building our own product. Compared to three years ago we produce 5 to 10x more output. What used to take us two months now ships in two weeks, which is exactly why the two-week sprint works.

My setup is close to vanilla. Boris, the Anthropic engineer who created Claude Code, has said he mostly just uses it as-is, and that matches my experience. It's very good out of the box. I run Opus 4.6 (the model I used at the time of recording), with one to three sessions at a time on the same branch. I skip the ten-agent worktree systems because I can't oversee what they're doing. Plan mode first, then execute.

The part that actually matters is documentation, because every session starts fresh and knows nothing about your codebase. Keep your CLAUDE.md files up to date, including in subfolders. Keep a docs folder and let AI maintain it. And prime every new session. Tell it which two documents and two files to read for the task, instead of letting it re-index the codebase and burn half the context window before the work starts.

Two things I'm experimenting with are the superpowers skill set, which adds a planning agent, an execution agent, and a subagent workflow for long-running tasks, and Anthropic's official skill-creator. My rule now is that anything I do in a project more than two or three times becomes a skill. Standardization also keeps the agent on rails. Hand it the Launchpad structure as the scope and it doesn't wander off building its own architecture. And because we've worked in this stack for years, we instantly spot when a PR puts something in the wrong folder or uses the wrong naming convention.

Set up evals and monitoring before anything goes live

Every project gets unit tests, integration tests, and LLM evals. The evals are straightforward in our setup because every processing step outputs a structured Pydantic model, so we know what goes in and what each step should produce. Early in a project we mock data with AI. Once the system is live, we pull raw records from the production database and turn real payloads into eval cases. For example: run this raw JSON event through the workflow, and at step five the escalation flag should be true. Lately we've been building Claude Code skills around this, so when a client reports an issue, a script pulls that ticket from the database into the project and assesses what went wrong. The full method is in how to set up LLM evals.
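Here is a minimal sketch of an eval case built from a production payload. It replays the raw JSON through the workflow and checks one field of one step's output, the same shape as the "escalation flag at step five" example. Plain dicts stand in for the Pydantic models, and toy_workflow is a hypothetical two-step stand-in for a real pipeline.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

# A workflow returns each step's structured output, keyed by step name.
Workflow = Callable[[dict[str, Any]], dict[str, dict[str, Any]]]


@dataclass(frozen=True)
class EvalCase:
    name: str
    raw_event: str   # raw JSON payload, e.g. copied from the production database
    step: str        # which step's output to check
    field: str       # which field in that output
    expected: Any


def run_eval(case: EvalCase, workflow: Workflow) -> tuple[bool, Any]:
    """Replay one recorded event and compare a single intermediate field."""
    outputs = workflow(json.loads(case.raw_event))
    actual = outputs[case.step][case.field]
    return actual == case.expected, actual


def toy_workflow(event: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Hypothetical two-step stand-in for a real support workflow."""
    text = event["message"].lower()
    classify = {"intent": "refund" if "refund" in text else "question"}
    route = {"escalate": classify["intent"] == "refund"}
    return {"classify": classify, "route": route}


if __name__ == "__main__":
    case = EvalCase(
        name="refund request escalates",
        raw_event='{"ticket_id": 123, "message": "I want a refund"}',
        step="route",
        field="escalate",
        expected=True,
    )
    passed, actual = run_eval(case, toy_workflow)
    print(case.name, "PASS" if passed else f"FAIL (got {actual!r})")
```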

In production we rely on two tools. Langfuse traces every LLM call, and the Launchpad ships wired up to it, so every workflow we build is tracked automatically. One client's customer support pipeline, which we've been optimizing for a year and a half, processes a request every 15 to 20 minutes; when the client says something is off, we drill down to the exact step with the full context and prompt in front of us. Without that you're flying blind.
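Langfuse handles this for you and adds a UI for browsing traces. To show what a trace actually captures, here is a standard-library stand-in (not Langfuse's API): a decorator that records each step's inputs, its output or error, and its duration. The step and function names are hypothetical.

```python
import functools
import time
from typing import Any, Callable

# In-memory stand-in for a tracing backend such as Langfuse.
TRACE: list[dict[str, Any]] = []


def traced(step_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator that records inputs, output or error, and duration per call."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: dict[str, Any] = {"step": step_name, "args": args, "kwargs": kwargs}
            start = time.perf_counter()
            try:
                record["output"] = fn(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on success and failure, so every call leaves a record.
                record["seconds"] = time.perf_counter() - start
                TRACE.append(record)
        return wrapper
    return decorator


@traced("build_prompt")
def build_prompt(ticket: str) -> str:
    return f"Classify this support ticket:\n{ticket}"


if __name__ == "__main__":
    build_prompt("Where is my order?")
    last = TRACE[-1]
    print(last["step"], repr(last["output"]))
```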

Sentry watches the application itself, around the FastAPI endpoints, and pings a Slack channel when something errors. This matters because async systems fail silently. The server doesn't crash, the endpoint stays available, and the event simply never gets processed. A recent example was a webhook payload that arrived as a Pydantic model where the code expected a dictionary, flagged by Sentry down to the file and line. Sentry has a copy-as-markdown button; paste that into Claude Code and it fixes the issue on the spot.
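One defensive pattern against that class of bug is sketched here as a hypothetical helper. It normalizes every incoming payload to a plain dict at the boundary and raises a clear TypeError for anything unexpected, so the error tracker reports it instead of the event silently going unprocessed. It detects Pydantic v2's model_dump() method by duck typing, so the sketch runs without Pydantic installed.

```python
import dataclasses
from collections.abc import Mapping
from typing import Any


def payload_to_dict(payload: Any) -> dict[str, Any]:
    """Normalize a webhook payload to a plain dict at the system boundary.

    Accepts a mapping, anything exposing Pydantic v2's model_dump(), or a
    dataclass instance. Anything else raises, so the error tracker sees a
    clear exception instead of the event silently going unprocessed.
    """
    if isinstance(payload, Mapping):
        return dict(payload)
    if callable(getattr(payload, "model_dump", None)):
        return payload.model_dump()
    # is_dataclass() is also True for the class itself, so exclude types.
    if dataclasses.is_dataclass(payload) and not isinstance(payload, type):
        return dataclasses.asdict(payload)
    raise TypeError(f"Unsupported payload type: {type(payload).__name__}")
```

Downstream code can then always work with `data = payload_to_dict(payload)`, regardless of how the payload arrived.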

Deploy on a bare VM and block everything by default

We work locally, push to GitHub, and deploy to a bare Hetzner VM. No managed services. We clone the repository on the VM and run the Docker startup commands, and for continuous deployment we set up a pull-based pipeline. A GitHub Action triggers, and a script on the VM pulls the new version in. Caddy handles the reverse proxy and automatic HTTPS, so the app lives on a clean URL instead of the VM's IP address. We started out on Microsoft Azure and AWS and can't be bothered anymore. Hetzner is literally 10 times cheaper and has been reliable for us for years. When a client requires their own environment (here in the Netherlands that usually means Azure, AWS, or GCP), the same approach transfers; if you can deploy on a bare VM, you can figure out the managed clouds.
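Here is a minimal sketch of the script on the VM side. It assumes the repository is cloned at a hypothetical /opt/client-app and that the services run under Docker Compose, as the FAQ below mentions. This post doesn't describe how the GitHub Action reaches the VM, so that trigger is left out, and the actual Launchpad scripts may differ. It's written in Python to match the rest of the stack; a shell script would do the same job.

```python
import subprocess
from pathlib import Path

# Hypothetical path where the repository is cloned on the VM.
REPO_DIR = Path("/opt/client-app")

# Pull the new version, then rebuild and restart the Compose services.
DEPLOY_COMMANDS: list[list[str]] = [
    ["git", "pull", "--ff-only"],
    ["docker", "compose", "up", "-d", "--build"],
]


def deploy(repo_dir: Path = REPO_DIR, dry_run: bool = False) -> list[list[str]]:
    """Run the deploy commands in order inside repo_dir.

    check=True aborts at the first failing command, so a failed pull never
    triggers a rebuild. With dry_run=True, nothing is executed and the
    planned commands are returned.
    """
    commands = [list(cmd) for cmd in DEPLOY_COMMANDS]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return commands


if __name__ == "__main__":
    for cmd in deploy(dry_run=True):
        print(" ".join(cmd))
```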

Security is where newer developers get burned, because the default state of a deployment is exposed. Block all IPs by default, then whitelist the services that need access. We use NordLayer to give the team a dedicated static IP through a VPN, so connecting to a production database from a laptop only works through that IP. Put a firewall on the server with only ports 80 and 443 open for network traffic, and use private networks internally where possible. That's the bare minimum for client deployments.
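The actual enforcement belongs in the server's firewall, not in application code. Still, writing the default-deny rule out as a function makes it easy to reason about: ports 80 and 443 are open to everyone, and everything else requires an allowlisted source. The addresses are hypothetical; 203.0.113.0/24 is a range reserved for documentation.

```python
import ipaddress

# Hypothetical allowlist: the team's static VPN exit IP and the private network.
ALLOWED_SOURCES = [
    ipaddress.ip_network("203.0.113.10/32"),  # example address from the documentation range
    ipaddress.ip_network("10.0.0.0/16"),      # internal traffic between servers
]

# The only ports open to the whole internet.
PUBLIC_PORTS = {80, 443}


def is_allowed(source_ip: str, port: int) -> bool:
    """Default deny: web ports for everyone, anything else only for allowlisted sources."""
    if port in PUBLIC_PORTS:
        return True
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_SOURCES)
```

For example, a laptop outside the VPN can reach the site on 443 but not the database port, while the same request from the VPN's static IP goes through.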

People ask about Kubernetes. We've never needed it. With Celery in the middle, a single VM that you can scale up in size goes much further than most people expect, and in all the years we've run this architecture we've never hit its limit. We just don't work on projects at the size where Kubernetes becomes the question.

Keep your prices and play the long game

The unit economics of development work are changing. The tech industry ran on time-based pricing for decades, hourly rates and day rates, and that model gets shaky when agents can do a week's worth of work while you're at the gym. Right now there's a gap. Companies still understand the old price of software, and you can deliver 5 to 10x faster than that price assumed. We keep our pricing the same instead of racing to the bottom, use the speed to take on more projects, and overdeliver inside each sprint, because "can it also do X" requests are now cheap to say yes to. The market will adjust. That's exactly why I consider this a golden era for freelance developers and small AI development companies, and you can even start next to a job, in evenings and weekends, with agents working on client projects during the day.

The real money in software is the long game of becoming the client's software partner, or AI development partner. The beginning is always the hardest part, with the scoping issues and expectation setting this post covers. Going from zero to one is harder than going from one to "can you add a couple of features?" Once trust is built, new projects come naturally; we have clients we've worked with for five years, and the work keeps coming. Software is never done. There's maintenance, there are new features, there are updates, and recurring sprints become predictable revenue. On top of that you can charge maintenance fees and bill for ongoing infrastructure and API costs.

You also don't need many clients. Analytics, data science, machine learning, software development, and data engineering all fall into the same bucket for how you work and what projects you can take on, and one big contract can put you at the six-figure mark as a freelancer. We see that happen regularly in our community. In tech, you don't want to jump from small project to small project the way a design agency juggles requests. Fewer clients, deeper work.

Next step

Take one workflow from a company you know, run it through the use case evaluation above, and see whether it scopes into a proof of concept or an MVP. That exercise alone will tell you more than another tutorial. For the positioning and client-finding side of this work, start with the freelance tech roadmap, and if you're selling yourself as the builder, the freelance AI engineer guide covers offers and proof.

If you want the full A-to-Z with the templates and repositories we use, the GenAI Accelerator teaches the technical side of everything in this post, and Data Freelancer helps you land your first client if you already have a technical background. The video version of this whole walkthrough is How I build and ship custom AI solutions for clients.

FAQ

How much should you charge for a custom AI project?

We price two-week sprints at 10 to 20k euros per sprint, and frame the first engagement as a proof of concept or an MVP. Always include a rough estimate of ongoing costs (infrastructure and LLM API usage) in the proposal, because the build price is not the whole price.

What accuracy can a client expect from a custom AI solution?

A first build typically reaches 70 to 80% accuracy, and iteration usually gets it to 90%. The last stretch is disproportionately hard, so design for it. Escalate the cases the system can't handle to a human, and frame the project as iterative development from the first call.

What is the difference between a proof of concept and an MVP in AI projects?

A proof of concept proves the KPI is reachable. It's usually a local demo and a report, not a deployed product, and the client pays for the proof. An MVP is the smallest deployed system that produces real value. Say explicitly which one you're scoping, because clients sometimes expect a working solution from a proof of concept.

Do you need Kubernetes to deploy client AI solutions?

In my experience, no. We deploy everything with Docker Compose on a single Hetzner VM behind Caddy, and with Celery handling the task queue, a VM you can scale up in size has covered every project we've run. We've never hit the limit of this setup.

Can you deliver client AI projects next to a full-time job?

Yes, and right now is a good moment for it. Pick quick-win projects with clear ROI, work in two-week sprints, and let coding agents handle work in the background. One solid client relationship can grow into recurring sprints, and a single large contract can reach the six-figure mark.