Blog

Data

RAG, embeddings, evaluation, and the data layer behind useful AI products.

May 14, 2026 · Data

How to build hybrid search for RAG

Hybrid search for RAG combines BM25, dense embeddings, reciprocal rank fusion, reranking, and retrieval evaluation so your system retrieves better evidence.

May 10, 2026 · Data

How to build agentic RAG in pure Python

Build agentic RAG from scratch with three file tools that list, grep, and read. It is the same loop coding agents run, applied to your own knowledge base, plus the production hardening that makes it hold up.

Feb 13, 2025 · Data

How to prepare data for AI agents

Build an open-source pipeline that prepares documents, PDFs, and websites for AI agents, using Docling extraction, hybrid chunking, embeddings, and vector search in Python.

Oct 15, 2024 · Data

How to implement hybrid search with PostgreSQL

A full walkthrough of hybrid search with PostgreSQL: semantic search through pgvectorscale, keyword search through full-text search and ts_rank_cd, and Cohere reranking on top, with all the data kept in one database.

Oct 1, 2024 · Data

How to build RAG with PostgreSQL

Build RAG with PostgreSQL using pgvector and pgvectorscale, keeping relational data and embeddings in one database, with similarity search, structured output, and advanced filtering.

Dec 21, 2023 · Data

PostgreSQL as a vector database: a beginner's guide to pgvector

PostgreSQL works as a vector database with the pgvector extension. This guide covers a side-by-side speed test against Pinecone, the tables LangChain creates for you, and how to host it for free on Supabase.