DataOct 15, 2024Updated Jun 11, 20269 min read

How to implement hybrid search with PostgreSQL

A full walkthrough of hybrid search with PostgreSQL, combining semantic search through pgvectorscale, keyword search through full-text search and ts_rank_cd, and Cohere reranking on top, all in one database.

Dave EbbelaarSenior AI Engineer

TL;DR

To implement hybrid search with PostgreSQL, store your documents and embeddings in one table, then run two searches against it. Semantic search through the pgvectorscale extension and keyword search through Postgres full-text search ranked with ts_rank_cd. Combine both result sets, remove duplicate IDs, and pass the merged candidates to a reranking model such as Cohere's to produce the final order. In my test on 1,000 CNN news articles embedded with OpenAI's text-embedding-3-small, keyword search surfaced two documents that semantic search missed entirely, which is the point. Full-text search catches exact terms like codes and abbreviations that embeddings can gloss over. The whole pipeline runs from one Docker Compose file and a single Postgres database, no dedicated vector database required.

You don't need a dedicated vector database to build an advanced RAG pipeline. PostgreSQL handles semantic search with embeddings through pgvectorscale and keyword search through its built-in full-text search. Combine the two, add a reranking model on top, and you get a retrieval setup that has been shown to work better than semantic search with vectors alone.

This post walks through the full implementation. It covers setting up the database with Docker, embedding 1,000 news articles, running semantic search, adding keyword search, merging both into hybrid search, and reranking the results with Cohere. Everything runs against one Postgres database.

It builds on my earlier Postgres RAG tutorial, where I set up PostgreSQL for RAG without a dedicated vector database and explain how the pgvector extension works. You can follow this guide on its own, but that one fills in the mechanics of the vector store if you want them. For the theory behind hybrid retrieval, BM25, and rank fusion, read my post on hybrid search for RAG. This one is about the Postgres-specific implementation.

One database, two search methods

The whole stack starts from a single Docker Compose file. It pulls the Timescale PostgreSQL image, which ships with the extensions and settings needed for similarity search already installed. Run docker compose up -d in the docker folder (or right-click the compose file in Cursor or VS Code and select Compose Up) and you have a database listening on the default port.

To follow along you need four things, a Docker installation, Python, an OpenAI API key, and a Postgres GUI client so you can inspect the tables. I use TablePlus. A Cohere API key is optional, only needed for the reranking step at the end, and a free key works. I'm on a free key myself.

Connect to the database with the default localhost settings to confirm it's running. For you the database should be empty at this point. Then create a virtual environment and install the requirements. That's the setup, five steps total, and the repository linked from the video has all of it written out.

Load the data and create the embeddings

The demo corpus is the CNN/DailyMail dataset from the datasets library on pip. It holds around 280,000 short news articles. I select 1,000 of them, enough to make the database feel real without slowing the tutorial down.

Swap in whatever corpus you want. Whether it's company documents, an internal knowledge base, or customer care guidelines, the mechanism stays the same, you only change the column structure. For this demonstration I use the article column and ignore the highlights.

Before running anything, copy the example env file to .env and add your OpenAI API key. The embedding function calls OpenAI's text-embedding-3-small, and the dimensions are configured in settings.py to match that model. The project uses pydantic-settings, so API keys, database settings, and the table name (documents) all live in one place. Change the table name there if you want a different one.

The insert script applies one function across the dataframe. For each row it creates an embedding from the article text, then builds a record. The four fields are an ID (a UUID based on the datetime), metadata with the creation time, the content, and the embedding. Embedding 1,000 articles took a couple of minutes on my machine. You could speed that up with async calls, but I kept it simple and sequential for the tutorial.

With the dataframe ready, three commands finish the job. Create the documents table, create the search indexes, and upsert the data. Two indexes matter here. The first is the vector search index built on pgvectorscale, which powers semantic search. The second is a GIN index, which powers keyword search. Same table, two retrieval paths.

Semantic search is the baseline

The test query for the rest of the walkthrough is "Is there any news related to London?". The database holds 1,000 CNN articles covering global news. Why London? No real reason, the query is arbitrary, and validating the data itself is a job for another tutorial. It's here to show the mechanism.

Semantic search here works the way you already know if you've built basic RAG. The vector store wraps Timescale's vector client, a helper library for similarity search on Postgres. It embeds the query, computes cosine distance against the stored vectors, and returns the closest matches. With the limit set to 5, it returns 5 results, ranked by distance.

That's standard RAG retrieval, the part you should already be familiar with if you're reading about advanced pipelines.

Keyword search with Postgres full-text search

Postgres ships with strong keyword search out of the box, and you reach it through plain SQL. The keyword search method in the vector store is a single SQL statement. It selects the ID and content, matches the query against the content column with to_tsvector using the English configuration, and ranks the matches with ts_rank_cd, which is Postgres's relevance scoring for full-text search. The repository's implementation looks like this in essence:

SQL

SELECT id, content, ts_rank_cd(to_tsvector('english', content), query) AS rank
FROM documents, plainto_tsquery('english', 'Is there any news related to London?') query
WHERE to_tsvector('english', content) @@ query
ORDER BY rank DESC
LIMIT 5;

Run the same London query through it. The limit is 5, but only 2 results come back.

That strictness is both the strength and the weakness of keyword search. There's no fuzzy matching. A document only matches when it contains all of the query's meaningful words. I'm not 100% sure how Postgres handles every step behind the scenes, but the English configuration does preprocessing that filters stop words like "is" and "any", so what remains is essentially "news" and "London", and both have to appear. The official PostgreSQL full-text search documentation covers the internals better than I can here.

The real use case is exact terms. Inside a company corpus you'll have abbreviations, product codes, and specific technical keywords that matter a lot. Semantic search alone can miss those. Keyword search guarantees an exact match on top of it.

Combining the two into hybrid search

The hybrid search method is less clever than the name suggests, and that's fine. It runs the keyword search, runs the semantic search, concatenates the results, and removes duplicate IDs, keeping the first occurrence. Deduplication matters because you'll send these results to a language model later, and duplicate records waste tokens, compute, and processing power.

I set both retrieval limits to 10 for the test. Semantic search always fills its limit when the table is big enough, so it returned 10. Keyword search returned its strict 2. The combined, deduplicated set held 12 records, indexed 0 through 11.

That number is the whole argument for hybrid search in one line. Getting 12 unique records from a possible 12 means the keyword search found 2 documents the semantic search missed entirely. Whether those 2 were more relevant for this particular query, I'm honestly not sure, and for the mechanism it doesn't matter. Combining the two methods casts a wider net over what you can retrieve from the database.

Stuck building prototypes?

Get our production stack

What you're missing is the architecture, evaluation, and judgment to ship a real project.

See Curriculum

Reranking fixes the ordering problem

The merged list has a problem. Its ordering is naive. Keyword results sit on top because keyword search ran first, and each list is only ranked within itself. The semantic results are ordered by cosine distance, the keyword results by ts_rank_cd. Those two scores are different metrics on different scales. You can't compare them, so the combined list has no honest global order.

Reranking solves this. A reranking model looks at the query and all of the candidate results together, then produces one fresh ranking across the entire set. You specify a top_n and keep only the best. In the vector store this is a rerank boolean on the hybrid search method, off by default, plus the cutoff.

I use Cohere's rerank model. I've heard good things about it, and Anthropic published a post on contextual retrieval reporting strong results with reranking as a stage in the pipeline. Running hybrid search with rerank=True and a top 5 is fast, even though it performs the semantic search, the keyword search, and the reranking in one call.

For the London query, all five surviving results came from the semantic side. For this specific question, the model judged the keyword matches less relevant than the semantic ones. Your corpus and your queries will behave differently, which is exactly why you rerank instead of trusting either method's internal order.

One upgrade is straightforward to add. Don't send the raw user query to the keyword search at all. Put an extraction step in between. Take the query, use instructor or OpenAI structured outputs to pull out the two most relevant keywords, then run a keyword search for each of them individually. Combine everything and hand it to the reranker. Full sentences make mediocre keyword queries; actual keywords are what full-text search wants.

From there the pipeline ends the way every RAG pipeline ends. The synthesizer class takes the ranked results and the user question, sends them to a language model, and returns a grounded answer. For the London query it produced a clean summary of the retrieved articles. That's end-to-end RAG, from raw articles to embedded rows to hybrid retrieval to a final answer, on one database.

Where BM25 fits in Postgres

After my previous video, several comments asked about BM25, and it deserves a straight answer. BM25 is currently the state of the art for keyword ranking in retrieval systems, and it's typically what people mean when they say hybrid search. Postgres doesn't support it natively.

There are workarounds. Extensions like pg_search implement BM25 inside Postgres, so if exact-term ranking quality is critical for your application, look into those. I haven't implemented it in this setup yet. For most applications, full-text search with ts_rank_cd plus a reranker on top will already go a long way.

If you want the deeper comparison of BM25, dense embeddings, and rank fusion, including evaluation numbers, that's covered in the hybrid search for RAG post.

Why I keep everything in one database

Everything in this pipeline is open source except the two API calls, to OpenAI for embeddings and Cohere for reranking. You can take it and run it anywhere, whether that's locally, on Azure, AWS, or Google, in Docker on a server or a container instance. You're in complete control, with no external vector database in the loop.

Dedicated vector databases have their place. But for the projects I work on, I love this setup, mostly because the whole application lives in one database. The documents table with the embeddings sits right next to the table where I log events and records. That makes the whole process a lot easier to operate.

This retrieval layer is one piece of the broader skill set covered in my guide to building production AI systems. For the full implementation with all the code, watch the PostgreSQL hybrid search tutorial and work through the repository linked there, especially the vector store class, to understand how it works behind the scenes.

FAQ

Does PostgreSQL support hybrid search?

Yes. Postgres runs semantic search through the pgvector or pgvectorscale extensions and keyword search through its built-in full-text search with tsvector and ts_rank_cd. Combining the two result sets and reranking them gives you hybrid search without any extra infrastructure.

Does Postgres support BM25?

Not natively. Postgres full-text search uses its own ranking functions like ts_rank_cd, not BM25. Extensions such as pg_search add BM25 ranking inside Postgres if your application needs it, but full-text search plus a reranker covers most use cases.

Do I need a dedicated vector database for RAG?

No. A Postgres database with the pgvectorscale extension handles embedding storage, similarity search, and keyword search in one place. That also means your application data and your embeddings live in the same database, which simplifies operations.

Why does keyword search return fewer results than semantic search?

Because full-text matching is strict. A document only matches when it contains all the meaningful terms in the query after stop-word filtering, with no fuzzy matching. In my test, a limit of 5 returned just 2 documents. Semantic search ranks every document by distance, so it always fills its limit.

Is the Cohere reranker required?

No, it's optional. The hybrid search works without it, just with a naive ordering. A free Cohere API key is enough for the reranking step, and you can substitute another reranking model.