- vector search
- multi-vector search
- keyword search
- metadata filtering
How TopK differs from other “hybrid” search systems
Most databases that offer hybrid search maintain separate vector and keyword indexes. When a query is executed, they:
- Run a separate query against each index
- Collect the top results from each query (e.g. first 100 + 100 candidates)
- Use techniques like Reciprocal Rank Fusion (RRF) to merge and rerank these two sets of results
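The three steps above can be illustrated with a small, self-contained Python sketch. It is not TopK code, nor any particular database's implementation: the document scores are made-up values, top_ids and rrf_merge are hypothetical helper names, and an equal-weight blend of the two scores is assumed to stand in for "true" relevance. With a candidate depth of 2 per index, a document that ranks third in both lists never reaches the fusion step, even though its blended score is the best.

```python
# Toy per-document scores (made-up values): (vector_score, keyword_score).
DOCS = {
    "a": (0.95, 0.10),
    "b": (0.90, 0.20),
    "c": (0.80, 0.80),  # strong on both signals, but third in each list
    "d": (0.10, 0.95),
    "e": (0.20, 0.90),
}


def top_ids(score_index, depth):
    """Return the `depth` best document IDs for one score column."""
    ranked = sorted(DOCS, key=lambda d: DOCS[d][score_index], reverse=True)
    return ranked[:depth]


def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Steps 1 and 2: query each index separately and keep only the top candidates.
vector_top = top_ids(0, depth=2)
keyword_top = top_ids(1, depth=2)

# Step 3: fuse the two candidate lists.
fused = rrf_merge([vector_top, keyword_top])

# Reference ranking: score every document with an equal-weight blend (assumption).
exact = sorted(DOCS, key=lambda d: 0.5 * DOCS[d][0] + 0.5 * DOCS[d][1], reverse=True)

print("fused candidates:", fused)
print("best by blended score:", exact[0])
```

Raising the candidate depth makes such misses less likely, but only at the cost of fetching and merging more results per query.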
This approach is fundamentally probabilistic: the final top-k results are not guaranteed to be the actual best candidates, because a document can be missed entirely if it doesn’t appear in either index’s top results.
TopK is different. It runs the query through a single index (vector + keyword), ensuring that the “top 100” results are the actual top 100, not just a probabilistic approximation.
With TopK, you can:
- Retrieve documents based on multiple embeddings (multi-vector retrieval)
- Combine semantic similarity (e.g. vector search) with keyword search (true hybrid retrieval)
- Filter documents by their metadata
- Apply custom scoring functions that blend multiple ranking factors (custom scoring)
Implementing Hybrid Search (Vector + Keyword)
Hybrid retrieval combines semantic similarity (vector-based search) with exact keyword matching. This approach ensures that documents with direct keyword matches are considered alongside those that are semantically similar to the query.
Let’s define a collection with one keyword_index() and one semantic_index(). A hybrid query over it works as follows:
- We retrieve documents based on semantic meaning (content_similarity) and keyword matching (text_score).
- The filter() ensures that documents contain at least one relevant keyword.
- The topk() function weights the scores, prioritizing semantic meaning (60%) while still considering keyword matches (40%).
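The scoring logic described in these three points can be sketched in plain Python. This is a stand-alone simulation, not the TopK SDK: hybrid_topk is a hypothetical function, the documents and their scores are invented, and both scores are assumed to be on a comparable 0 to 1 scale (real keyword scores such as BM25 are not bounded this way).

```python
import heapq

# Invented documents with precomputed per-query scores (assumed 0..1 scale).
CANDIDATES = [
    {"id": "intro", "content_similarity": 0.90, "text_score": 0.20},
    {"id": "faq", "content_similarity": 0.50, "text_score": 0.90},
    {"id": "blog", "content_similarity": 0.95, "text_score": 0.00},  # no keyword match
    {"id": "guide", "content_similarity": 0.80, "text_score": 0.60},
]


def hybrid_topk(docs, k, w_semantic=0.6, w_keyword=0.4):
    """Filter, score, and select the k best documents in a single pass."""
    scored = (
        (w_semantic * d["content_similarity"] + w_keyword * d["text_score"], d["id"])
        for d in docs
        if d["text_score"] > 0  # stand-in for filter(): require a keyword match
    )
    # nlargest keeps only k items while scanning every remaining document once.
    best = heapq.nlargest(k, scored)
    return [(doc_id, score) for score, doc_id in best]


results = hybrid_topk(CANDIDATES, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.2f}")
```

Because every document that passes the filter is scored before the top k are chosen, the result is the exact top k under the blended score rather than a merge of two pre-truncated lists.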
Implementing Complex Search (Keyword + Vector + Filtering + Reranking)
In TopK, you can combine keyword search, vector search, filtering and reranking in a single query. This allows you to fetch the most relevant results while maintaining steady performance, with no overfetching.
Custom Scoring Functions
TopK allows you to define custom scoring functions by combining:
- Semantic similarity score
- Keyword score (BM25)
- Vector distance
- “Bring-your-own” precomputed importance score
Defining a Collection with Custom Scoring Fields
Querying with a Custom Scoring Function
- First, we retrieve documents based on both semantic similarity (content_score) and precomputed importance (importance_score).
- Then, the topk() function gives 80% weight to content score and 20% weight to document importance.
- Sorting by a custom scoring function allows us to boost more critical documents, ensuring that highly relevant but less “important” content doesn’t dominate.
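To make the effect of the 80/20 weighting concrete, here is a minimal Python sketch that ranks the same documents with and without the importance term. It does not use the TopK SDK: rank_by_blend is a hypothetical helper, and the document IDs and scores are invented for illustration.

```python
# Invented documents: a per-query content score and a precomputed importance score.
SCORED_DOCS = [
    {"id": "changelog", "content_score": 0.90, "importance_score": 0.10},
    {"id": "security-advisory", "content_score": 0.85, "importance_score": 0.90},
    {"id": "tutorial", "content_score": 0.60, "importance_score": 0.50},
]


def rank_by_blend(docs, w_content=0.8, w_importance=0.2):
    """Return document IDs ordered by a weighted sum of the two scores."""

    def blended(d):
        return w_content * d["content_score"] + w_importance * d["importance_score"]

    return [d["id"] for d in sorted(docs, key=blended, reverse=True)]


content_only = rank_by_blend(SCORED_DOCS, w_content=1.0, w_importance=0.0)
blended_order = rank_by_blend(SCORED_DOCS)  # 80% content, 20% importance

print("content only:", content_only)
print("80/20 blend: ", blended_order)
```

With content score alone, the changelog ranks first; once importance contributes 20% of the score, the security advisory overtakes it even though its content score is slightly lower.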