Engineering · 7 min read

How to build a private knowledge chatbot

A high-level architecture for building a private knowledge chatbot with Python, FastAPI, React, Supabase, retrieval, citations, model APIs, and a full implementation path.

A knowledge chatbot is one of the best starter projects for AI engineering because it forces you to connect the pieces that matter in real applications.

You need a user. You need documents. You need ingestion. You need retrieval. You need an answer format. You need citations. You need a way for someone to trust the result.

That is why I like the Document Copilot project. It is an internal AI chatbot for analysts who need to ask questions over a curated document corpus and get sourced answers back. The example workflow is based on an analyst team that spends a large part of the week reading SEC filings before producing analysis. The product goal is to reduce that intake work while keeping every answer grounded in source documents.

This post is the high-level architecture. If you want the full build, watch the four-hour implementation. You can also open the starting branch and compare it with the full implementation.

Start with the workflow

Before choosing tools, define the workflow.

For Document Copilot, the user is an analyst. The corpus is SEC filings. The questions are specific: compare risk-factor language, revenue segments, margins, supplier concentration, capital expenditures, and AI infrastructure language across companies and years.

The trust standard is stricter than a normal chatbot. The system should answer only from the corpus, cite source passages, and let the user inspect the underlying text. If the corpus does not support an answer, the chatbot should say so instead of filling the gap with a confident guess. It should also avoid investment advice and unsupported conclusions.

A document chatbot becomes useful when users can verify the answer. A confident paragraph with no source trail is weak proof.

The architecture has two paths

Think about the system as two separate paths.

| Path | Purpose | Main components |
| --- | --- | --- |
| Ingestion path | Prepare documents before users ask questions | Download, parse, chunk, embed, store |
| Chat path | Answer user questions from prepared knowledge | Auth, retrieval, generation, citations, chat history |

The ingestion path runs before the chat experience. It turns raw documents into structured database records and searchable chunks.

The chat path runs when a user asks a question. It retrieves relevant passages, asks the model to answer from those passages, streams the result back to the browser, and stores the conversation.

Keeping those paths separate makes the system easier to reason about. If answers are weak, you can inspect the ingestion and retrieval layers before blaming the model.

The stack

The Document Copilot stack is practical and intentionally familiar:

| Layer | Tooling |
| --- | --- |
| Backend | Python, FastAPI, Pydantic, SQLAlchemy, Alembic |
| Frontend | Vite, React, TypeScript |
| Database | Supabase Postgres |
| Retrieval | Supabase pgvector and Postgres full-text search |
| Auth | Supabase Auth |
| LLM and embeddings | OpenAI |
| Deployment target | Railway |

FastAPI owns the backend boundary. React owns the browser experience. Supabase handles authentication and durable state, covering users, chat threads, messages, source documents, chunks, embeddings, and citation metadata.

That split is important. The browser should never call OpenAI directly or hold privileged database credentials. It should authenticate the user, render the interface, and send requests to the backend. The backend verifies the user, performs retrieval, calls the model, validates the answer, and persists the result.

Privacy starts with deployment choices. You can deploy and manage the app, database, auth, storage, and retrieval layer as an internal tool behind company login. That keeps documents, chunks, chat history, and citations inside your own infrastructure boundary.

The part to be honest about is the model call. If you use a hosted model, prompts and retrieved passages still leave your backend for a third-party model API. You can host open-source models yourself and keep the whole path private, but for many retrieval systems those models are still not good enough for the answer quality users expect.

A practical middle ground is an enterprise model endpoint such as Amazon Bedrock or Azure OpenAI. Bedrock is explicit that prompts and completions are not stored or logged, not used to train models, and not distributed to third parties. Azure OpenAI models are stateless for inference, prompts and completions are not used to train base models, and approved customers can turn off abuse-monitoring data storage. Treat zero data retention as an enterprise configuration to confirm, not a default assumption. For a sensitive internal chatbot, that is often the privacy posture you want. Your app stays internal, and the model provider contract handles retention, training use, and geography.

The ingestion path

The ingestion path turns raw files into something the chatbot can search.

For Document Copilot, the sample corpus is SEC 10-K filings for Apple, Microsoft, NVIDIA, Amazon, and Alphabet across fiscal years 2021 to 2025. The pipeline downloads filings, converts HTML to Markdown, loads document metadata into Supabase, chunks the filing text, creates embeddings, and stores each chunk with metadata.

A good ingestion record needs enough structure for retrieval and enough context for citations. At minimum, store the document ID, source metadata, document type, date fields, chunk text, and embedding. If the corpus is long or sectioned, add section names, page or location metadata, token count, and a search vector. Those fields are not decoration. They are what lets the chatbot retrieve the right passage and show the user where the answer came from.

You can add more later, but this is enough to retrieve useful passages and show citations back to the user.
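To make the record concrete, here is a minimal sketch of what one stored chunk could look like. The class name, field names, and the `citation_label` helper are my own assumptions for illustration, not the actual Document Copilot schema, and the sample values are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ChunkRecord:
    """One searchable chunk plus the metadata needed to cite it (hypothetical schema)."""

    document_id: str
    company: str                      # source metadata, e.g. the filer
    document_type: str                # e.g. "10-K"
    fiscal_year: int
    filing_date: date
    chunk_index: int                  # position of the chunk within the document
    text: str
    embedding: list[float] = field(default_factory=list)
    section: Optional[str] = None     # useful for long, sectioned documents
    token_count: Optional[int] = None

    def citation_label(self) -> str:
        """Build a human-readable pointer back to the source passage."""
        where = f", {self.section}" if self.section else ""
        return (
            f"{self.company} {self.document_type} FY{self.fiscal_year}"
            f"{where} (chunk {self.chunk_index})"
        )


# Placeholder values, not real filing data.
record = ChunkRecord(
    document_id="doc-001",
    company="Example Corp",
    document_type="10-K",
    fiscal_year=2023,
    filing_date=date(2024, 1, 31),
    chunk_index=12,
    text="Example risk-factor passage.",
    section="Item 1A. Risk Factors",
)
print(record.citation_label())
```

The point of the helper is that a citation can be rendered from the chunk row alone, without a second lookup at answer time.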

The chunking step deserves attention. Bad chunks create bad answers. If chunks are too large, retrieval becomes noisy. If chunks are too small, the model lacks context. The Document Copilot implementation also handles neighboring chunks, so the system can pull surrounding context when a retrieved passage needs a little more support.
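A rough sketch of the two ideas in that paragraph, overlapping chunks and neighbor expansion, might look like this. It splits on whitespace for simplicity, which is an assumption on my part: a real pipeline would more likely count tokens and respect section boundaries. The function names and default sizes are hypothetical.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; each window shares `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def with_neighbors(chunks: list[str], index: int, radius: int = 1) -> list[str]:
    """Return the chunk at `index` plus up to `radius` chunks on each side."""
    lo = max(0, index - radius)
    hi = min(len(chunks), index + radius + 1)
    return chunks[lo:hi]


sample = " ".join(str(i) for i in range(10))
pieces = chunk_words(sample, size=4, overlap=1)
print(pieces)                     # three windows that overlap by one word
print(with_neighbors(pieces, 0))  # the first chunk and its right-hand neighbor
```

Size and overlap are the two knobs to tune against real questions: larger windows carry more context per hit, smaller ones keep retrieval precise.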

The chat path

The chat path starts when the user asks a question.

In Document Copilot, the React app sends the question and the user's Supabase access token to FastAPI. The backend verifies the user, checks thread access, extracts the latest user message, creates a retriever, and streams the answer back to the browser.

The flow is simple from the user's point of view: they sign in, ask a question, and get an answer with citations. Behind that, once the user and thread access are verified, FastAPI retrieves relevant document passages, asks the assistant to answer from those passages, attaches citations, and persists the final message with the citation records.
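Stripped of the framework, the backend side of that flow can be sketched as one function with its dependencies passed in. Everything here is hypothetical: the `ChatDeps` hooks stand in for Supabase token verification, database access checks, retrieval, the model call, and persistence, and the sketch leaves out streaming.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class AccessDenied(Exception):
    """Raised when the token is invalid or the user cannot access the thread."""


@dataclass
class ChatDeps:
    # Hypothetical hooks; a real app would wrap auth, database, retrieval, and model calls.
    verify_token: Callable[[str], Optional[str]]   # token -> user id, or None if invalid
    user_owns_thread: Callable[[str, str], bool]   # (user id, thread id) -> allowed?
    retrieve: Callable[[str], list[dict]]          # question -> retrieved passages
    generate: Callable[[str, list[dict]], dict]    # (question, passages) -> answer payload
    save_message: Callable[[str, dict], None]      # (thread id, answer payload) -> None


def handle_chat(deps: ChatDeps, token: str, thread_id: str, question: str) -> dict:
    """Verify the caller, retrieve evidence, generate an answer, and persist it."""
    user_id = deps.verify_token(token)
    if user_id is None or not deps.user_owns_thread(user_id, thread_id):
        raise AccessDenied("invalid token or no access to this thread")
    passages = deps.retrieve(question)          # the backend decides what the model sees
    answer = deps.generate(question, passages)
    deps.save_message(thread_id, answer)        # store the answer with its citations
    return answer
```

Passing the dependencies in like this also makes the order of operations easy to test with fakes before any real service is connected.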

This is where the project starts to feel like a real product instead of a local demo. The user can come back to previous chats. The system knows which thread a message belongs to. Citations can be opened later. The backend owns the logic that decides what the model sees.

Retrieval is the core engineering layer

The retrieval layer decides what evidence the model gets.

Document Copilot uses hybrid retrieval:

  1. Embed the user's query.
  2. Run semantic search with pgvector.
  3. Run keyword search with Postgres full-text search.
  4. Fuse both result lists with Reciprocal Rank Fusion.
  5. Hydrate the selected chunks with document metadata and optional neighboring context.

This is a better default than relying on embeddings alone. Semantic search is useful when the user's wording differs from the source text. Keyword search is useful when exact terms, tickers, financial language, filing sections, or named entities matter. The fusion step gives you a ranked list that benefits from both.
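The fusion step is small enough to show in full. This is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of chunk IDs; the function name and the chunk IDs are made up, and `k = 60` is the constant commonly used with RRF, not necessarily what Document Copilot uses.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: each item scores the sum of 1 / (k + rank) over the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic_hits = ["c3", "c1", "c7"]   # hypothetical vector-search results, best first
keyword_hits = ["c1", "c9", "c3"]    # hypothetical full-text results, best first

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print([chunk_id for chunk_id, _ in fused])
```

Because RRF uses only ranks, you never have to reconcile cosine distances with full-text relevance scores. Chunks that appear in both lists, here `c1` and `c3`, rise above chunks found by only one method.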

For a deeper retrieval-specific build, the hybrid search walkthrough is a good companion. This post stays at the system level.

The trust layer

The most important part of a knowledge chatbot is the trust layer.

Document Copilot encodes that in the assistant contract. The assistant answers only from retrieved passages, cites factual claims, and includes source excerpts for citations. If the corpus cannot support the answer, it returns insufficient_evidence. It also avoids investment advice and does not infer beyond the filings.

The implementation also uses a typed output called GroundedAnswer. It contains the answer, the citations, and an insufficient_evidence flag. That structure gives the backend something concrete to validate before it stores or streams the final result.
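Here is one way that validation could look. The three `GroundedAnswer` fields come from the description above; the `Citation` fields, the `validate_answer` function, and its specific rules are my assumptions. The project uses Pydantic, but this sketch sticks to standard-library dataclasses so it runs without dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    chunk_id: str   # hypothetical field: which retrieved chunk supports the claim
    excerpt: str    # hypothetical field: source text shown to the user


@dataclass
class GroundedAnswer:
    answer: str
    citations: list[Citation] = field(default_factory=list)
    insufficient_evidence: bool = False


def validate_answer(result: GroundedAnswer, retrieved: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the answer passes these checks.

    `retrieved` maps chunk id -> chunk text for the passages the model was given.
    """
    problems = []
    if result.insufficient_evidence:
        # Assumed rule: a refusal should not carry citations.
        if result.citations:
            problems.append("insufficient_evidence answers should not carry citations")
        return problems
    if not result.citations:
        problems.append("answer has no citations")
    for citation in result.citations:
        if citation.chunk_id not in retrieved:
            problems.append(f"citation {citation.chunk_id} was not in the retrieved set")
        elif citation.excerpt not in retrieved[citation.chunk_id]:
            problems.append(f"excerpt for {citation.chunk_id} does not appear in the source chunk")
    return problems
```

The excerpt check is deliberately strict: if the quoted text is not literally in the chunk, the backend can reject or retry before the user ever sees the answer.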

For client work, this is the difference between a nice prototype and a system someone can actually use. Optimize for inspectable answers before confident phrasing.

A simple build order

If you are building your own version, start smaller than the full Document Copilot implementation. Pick one narrow corpus first. Create a basic ingestion script, store documents and chunks in Postgres, add embeddings, then build a search endpoint before you touch the chat UI.

Once retrieval works, add the chat endpoint, a simple React interface, and citations. Auth, chat history, deployment, and smoke tests come after the core loop works end to end.

Do not start with every feature. Get one document set working first. Ask real questions. Inspect the retrieved chunks. Check whether the answer uses the right evidence. That loop will teach you more than adding another UI feature.
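One lightweight way to run that loop is to write down a few real questions with the chunks you expect to see, then measure how often retrieval surfaces them. This is a sketch under assumptions: `hit_rate_at_k`, the test-case format, and the fake search function are all hypothetical stand-ins for your own search endpoint.

```python
from typing import Callable


def hit_rate_at_k(cases: list[dict], search: Callable[[str], list[str]], k: int = 5) -> float:
    """Fraction of questions whose top-k results include at least one expected chunk id."""
    if not cases:
        return 0.0
    hits = 0
    for case in cases:
        top_k = search(case["question"])[:k]
        if set(top_k) & set(case["expected_chunk_ids"]):
            hits += 1
    return hits / len(cases)


# Fake search results standing in for a real retrieval call.
fake_index = {
    "What are the supply chain risks?": ["c4", "c9", "c2"],
    "How did services revenue change?": ["c7", "c8"],
}
cases = [
    {"question": "What are the supply chain risks?", "expected_chunk_ids": ["c9"]},
    {"question": "How did services revenue change?", "expected_chunk_ids": ["c1"]},
]

rate = hit_rate_at_k(cases, lambda q: fake_index.get(q, []))
print(f"hit rate: {rate:.2f}")
```

A number like this is crude, but it tells you whether a weak answer is a retrieval problem or a generation problem.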

Why this is a strong portfolio project

A knowledge chatbot is a strong AI engineering portfolio project because it shows judgment across the full stack.

It shows that you can work with documents, databases, APIs, auth, retrieval, LLM calls, citations, UI state, and deployment. It also gives you a realistic client story. A team has documents, they ask repeated questions, and they need answers with sources.

That makes it useful for freelance AI engineering too. Many companies do not need a brand-new model. They need a system that makes their existing knowledge easier to use.

If you are building toward client work, use this project as proof that you can connect AI to a real workflow. Then read the freelance AI engineer roadmap to turn that proof into a service offer.

Written by

Dave Ebbelaar

Senior AI Engineer

AI engineer and founder of Datalumina. Dave helps developers build production AI systems and turn technical skills into client work.