In some cases, a single vector search may not be sufficient to retrieve the most relevant results. In TopK, you can run multiple vector searches in a single query.
Additionally, you can apply a different ranking score to each individual vector search result. For example, in a research paper database, you might want to:
- Retrieve documents based on both the summary of the entire paper and summaries of individual paragraphs.
- Rank results by combining the scores from multiple embeddings.
A collection for this use case needs one semantic index per summary field:

    from topk_sdk.schema import text, semantic_index

    # Assumes `client` is an already-initialized TopK client.
    client.collections().create(
        "articles",
        schema={
            "title": text().required(),
            "paper_summary": text().index(semantic_index()),      # Embedding of full paper
            "paragraph_summary": text().index(semantic_index()),  # Embedding of a paragraph
        },
    )

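Next, run a multi-vector search. The fn.semantic_similarity() function computes the semantic similarity between a query string and an indexed field; here it is applied to the paper_summary and paragraph_summary fields, and the results are returned as paper_score and paragraph_score: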

    from topk_sdk.query import select, field, fn

    docs = client.collection("articles").query(
        select(
            "title",
            paper_score=fn.semantic_similarity("paper_summary", "deep learning optimization"),
            paragraph_score=fn.semantic_similarity("paragraph_summary", "stochastic gradient descent"),
        )
        .topk(field("paper_score") * 0.7 + field("paragraph_score") * 0.3, 10)
    )

    # Example results:
    [
        {
            "_id": "2",
            "title": "On the Importance of Initialization and Momentum in Deep Learning",
            "paper_score": 0.9774298071861267,
            "paragraph_score": 0.9783554673194885,
            # "blend_score": 0.9777075052261353,
        },
        {
            "_id": "1",
            "title": "Understanding the Difficulty of Training Deep Feedforward Neural Networks",
            "paper_score": 0.9773390889167786,
            "paragraph_score": 0.976479709148407,
            # "blend_score": 0.977081298828125,
        },
    ]

Let’s break down the example above:
1. Calculate the semantic similarity for each field against its own query string: "deep learning optimization" against the full paper summary, and "stochastic gradient descent" against the paragraph summary.
2. Blend the two scores by giving 70% weight to the full paper summary and 30% weight to the paragraph summary.
3. Retrieve the top 10 results ranked by the blended score.
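
To make the blending step concrete, the sketch below reproduces the weighted sum locally in plain Python, using the two scores from the example results above. It does not call TopK; the `blend` and `top_k` helpers are hypothetical names introduced only for illustration, and the ranking is assumed to be a simple descending sort on the blended value.

```python
import heapq

# Scores copied from the example results above.
results = [
    {"_id": "1", "paper_score": 0.9773390889167786, "paragraph_score": 0.976479709148407},
    {"_id": "2", "paper_score": 0.9774298071861267, "paragraph_score": 0.9783554673194885},
]

# Weights from the query: 70% full-paper summary, 30% paragraph summary.
WEIGHTS = {"paper_score": 0.7, "paragraph_score": 0.3}


def blend(doc, weights=WEIGHTS):
    """Weighted sum of the per-field similarity scores (hypothetical helper)."""
    return sum(doc[name] * weight for name, weight in weights.items())


def top_k(docs, k=10):
    """Return up to k documents with the highest blended score, best first."""
    return heapq.nlargest(k, docs, key=blend)


ranked = top_k(results)
for doc in ranked:
    print(doc["_id"], blend(doc))
```

Document "2" ranks first, and the computed values closely match the commented `blend_score` entries in the example results. Changing the weights in `WEIGHTS` is a quick way to see how much the ordering depends on the 70/30 split before changing the query itself.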