Unified Retrieval
Unified retrieval lets you combine multiple retrieval techniques, such as multi-vector search, hybrid (vector + text) search, and custom scoring functions, within a single query. Combining them in one query gives you finer control over how search results are ranked.
With TopK, you can:
- Retrieve documents based on multiple embeddings — multi-vector retrieval.
- Combine semantic similarity (or vector distance) with keyword search — hybrid retrieval.
- Apply custom scoring functions that blend multiple ranking factors — custom scoring.
Multi-Vector Retrieval
In some cases, a single vector representation of a document isn’t enough. For example, in a research paper database, you might want to:
- Retrieve documents based on both a summary of the entire paper and a summary of individual paragraphs.
- Rank results by combining scores from multiple embeddings.
Defining a Collection with Multiple Embeddings
Querying with Multiple Embeddings
Explanation
- We retrieve documents based on both the full paper summary and the paragraph summary.
- The top_k() function blends the two scores, giving 70% weight to the full paper and 30% weight to the paragraph.
- This ensures that papers that are relevant as a whole rank higher, while the relevance of specific paragraphs is still taken into account.
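The 70/30 blend described above can be sketched in plain Python. This is only an illustration of the arithmetic, not the TopK SDK: the function names (`cosine_similarity`, `rank_multi_vector`), the field names (`paper_embedding`, `paragraph_embedding`), and the choice of cosine similarity as the per-embedding score are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_multi_vector(docs, query_vec, k, paper_weight=0.7, paragraph_weight=0.3):
    """Score each document on both embeddings, blend the scores, and keep the top k."""
    scored = []
    for doc in docs:
        # Hypothetical field names: one embedding per paper, one per paragraph summary.
        paper_score = cosine_similarity(doc["paper_embedding"], query_vec)
        paragraph_score = cosine_similarity(doc["paragraph_embedding"], query_vec)
        blended = paper_weight * paper_score + paragraph_weight * paragraph_score
        scored.append((doc["id"], blended))
    # Highest blended score first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

docs = [
    {"id": "a", "paper_embedding": [1.0, 0.0], "paragraph_embedding": [0.0, 1.0]},
    {"id": "b", "paper_embedding": [0.0, 1.0], "paragraph_embedding": [1.0, 0.0]},
]
multi_vector_results = rank_multi_vector(docs, [1.0, 0.0], k=2)
```

In this toy data, document "a" matches the query on its paper-level embedding and "b" only on its paragraph-level embedding, so "a" ranks first because the paper-level score carries more weight.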
Vector + Text Retrieval (Hybrid Search)
Hybrid retrieval combines semantic similarity (vector-based search) with exact keyword matching. This ensures that documents containing explicit matches to the query keywords are considered alongside semantic similarity.
Defining a Collection for Hybrid Retrieval
Querying with Hybrid Retrieval
Explanation
- We retrieve documents based on semantic meaning (content_score) and keyword matching (keyword_match).
- The filter() call ensures that every returned document contains at least one relevant keyword.
- The top_k() function weights the scores, prioritizing semantic meaning (60%) while still considering keyword matches (40%).
This balances precision and recall, capturing both exact keyword matches and meaningful context.
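As a rough sketch of the filter-then-blend logic above, the following plain Python mimics the 60/40 weighting. It is not the TopK SDK. The names `keyword_match_score` and `rank_hybrid` are hypothetical, the semantic score is assumed to be precomputed in a `content_score` field, and the keyword score is a simple fraction of matched query terms standing in for whatever text-relevance score the real engine uses.

```python
def keyword_match_score(text, keywords):
    """Fraction of query keywords that appear as whole tokens in the text."""
    tokens = set(text.lower().split())
    hits = sum(1 for keyword in keywords if keyword.lower() in tokens)
    return hits / len(keywords)

def rank_hybrid(docs, keywords, k, semantic_weight=0.6, keyword_weight=0.4):
    """Drop documents with no keyword hit, then blend semantic and keyword scores."""
    scored = []
    for doc in docs:
        kw_score = keyword_match_score(doc["content"], keywords)
        if kw_score == 0:
            # Filter step: require at least one matching keyword.
            continue
        blended = semantic_weight * doc["content_score"] + keyword_weight * kw_score
        scored.append((doc["id"], blended))
    # Highest blended score first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

hybrid_docs = [
    {"id": "a", "content": "neural retrieval with dense vectors", "content_score": 0.9},
    {"id": "b", "content": "keyword retrieval and inverted indexes", "content_score": 0.5},
    {"id": "c", "content": "gardening tips for spring", "content_score": 0.95},
]
hybrid_results = rank_hybrid(hybrid_docs, ["retrieval", "keyword"], k=2)
```

Document "c" has the highest semantic score in this toy data but contains neither keyword, so the filter removes it before ranking. Document "a" still edges out "b" because semantic similarity carries the larger weight.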
Custom Scoring Functions
TopK allows you to define powerful scoring functions by combining semantic similarity with additional fields, such as a precomputed importance score.
Defining a Collection with Custom Scoring Fields
Querying with a Custom Scoring Function
Explanation
- We retrieve documents based on both semantic similarity (content_score) and precomputed importance (importance_score).
- The top_k() function gives 80% weight to content relevance and 20% weight to document importance.
- This method allows you to boost more critical documents, ensuring that highly relevant but less “important” content doesn’t dominate.
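The 80/20 weighting above amounts to a weighted sum per document. The sketch below shows that arithmetic in plain Python rather than through the TopK SDK. The function name `rank_custom` and the `importance` field are hypothetical, and both scores are assumed to be precomputed and on a comparable 0 to 1 scale.

```python
def rank_custom(docs, k, content_weight=0.8, importance_weight=0.2):
    """Blend semantic relevance with a precomputed importance value and keep the top k."""
    scored = [
        (
            doc["id"],
            content_weight * doc["content_score"] + importance_weight * doc["importance"],
        )
        for doc in docs
    ]
    # Highest blended score first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

custom_docs = [
    {"id": "a", "content_score": 0.9, "importance": 0.1},
    {"id": "b", "content_score": 0.8, "importance": 1.0},
    {"id": "c", "content_score": 0.3, "importance": 1.0},
]
custom_results = rank_custom(custom_docs, k=2)
```

Here "b" overtakes the slightly more relevant "a" because of its high importance, while "c" shows that importance alone cannot lift a weakly relevant document into the top results at this weighting.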