Semantic Search
Semantic search enables you to find relevant documents based on semantic similarity rather than exact keyword matches. With TopK, you can implement powerful semantic search in a few lines of code—without needing third-party embedding models or reranking services.
TopK comes with built-in embeddings and reranking, making it incredibly easy to build high-quality retrieval pipelines.
Implementation
1. Defining a collection schema
To use semantic search, you first need to create a collection whose schema includes a semantic index. This enables TopK to automatically generate embeddings for your text fields.
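For example, with the Python SDK, creating a collection with a semantic index on the `title` field might look like the following sketch (the API key, region, and collection name are placeholders; consult the SDK reference for the exact schema helpers):

```python
from topk_sdk import Client
from topk_sdk.schema import text, semantic_index

# Placeholder credentials and region; substitute your own values.
client = Client(api_key="YOUR_TOPK_API_KEY", region="YOUR_REGION")

client.collections().create(
    "books",
    schema={
        # semantic_index() tells TopK to embed this field automatically.
        "title": text().required().index(semantic_index()),
    },
)
```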
Explanation
- The `semantic_index()` on `title` ensures the provided text is automatically embedded.
- Other fields do not need to be in the schema to be stored and queried; they can still be upserted as part of a document.
If you want to use your own embeddings instead of TopK's built-in `semantic_index()`, see Custom Embeddings.
2. Running a semantic search query
Once the schema is set up, querying for semantically similar documents is simple:
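A sketch of such a query with the Python SDK (the `books` collection and the `title_similarity` alias are illustrative; check the query reference for exact signatures):

```python
from topk_sdk.query import select, field, fn

# `client` is the Client instance created earlier.
docs = client.collection("books").query(
    select(
        "title",
        # Score each document by the semantic similarity of `title` to the query.
        title_similarity=fn.semantic_similarity("title", "catcher in the rye"),
    )
    # Keep the 10 highest-scoring documents.
    .topk(field("title_similarity"), 10)
    # Optionally rerank the results with the built-in reranking model.
    .rerank()
)
```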
What’s Happening Here?
- The `semantic_similarity()` function computes the similarity between the query “catcher in the rye” and values stored in the `title` field.
- TopK automatically embeds the query using the model specified in `semantic_index()`.
- The results are ranked based on similarity, and the top 10 most relevant documents are returned.
- The optional `.rerank()` call uses a reranking model to improve the relevance of the results.
This works out of the box—no need to manage embeddings, external APIs, or reranking models.
3. Combining Semantic and Text Search
You may want to combine keyword search with semantic search for more precise results.
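For illustration, a hybrid query might combine a BM25 score with a semantic similarity score in a single ranking expression (a sketch; it assumes the `title` field also carries a keyword index so that `match()` and `fn.bm25_score()` apply, and the 0.3/0.7 weights are arbitrary):

```python
from topk_sdk.query import select, field, fn, match

docs = client.collection("books").query(
    select(
        "title",
        # Lexical relevance over keyword-indexed text fields.
        text_score=fn.bm25_score(),
        # Semantic similarity between the query and the `title` field.
        title_similarity=fn.semantic_similarity("title", "catcher in the rye"),
    )
    # BM25 only scores documents that match the keyword filter.
    .filter(match("catcher in the rye"))
    # Custom scoring function: weight lexical vs. semantic relevance to taste.
    .topk(field("text_score") * 0.3 + field("title_similarity") * 0.7, 10)
)
```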
This blends keyword relevance (BM25) with semantic similarity using a custom scoring function suited to your use case, ensuring your search results capture both exact matches and contextual meaning.
Customization
Bring your own embeddings
If you want to bring your own embeddings instead of using `semantic_index()`, you can store them in a `vector()` field and query using `vector_distance()`.
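As a sketch, the schema for such a collection could declare an explicit vector field (the `f32_vector()` type, the `dimension` value, and the `cosine` metric are assumptions used to illustrate the shape; match them to your embedding model and the schema reference):

```python
from topk_sdk.schema import text, f32_vector, vector_index

client.collections().create(
    "books",
    schema={
        "title": text().required(),
        # Store your own embeddings in a fixed-dimension vector field.
        "title_embedding": f32_vector(dimension=1536)
            .required()
            .index(vector_index(metric="cosine")),
    },
)
```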
To query with custom embeddings, use `vector_distance()` instead of `semantic_similarity()`:
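A sketch of the query (here `query_embedding` stands in for a vector produced by your own embedding model, and the `asc=True` flag assumes a distance metric where lower values mean closer matches):

```python
from topk_sdk.query import select, field, fn

# Placeholder vector; in practice this comes from your embedding model
# and must match the dimension of the indexed field.
query_embedding = [0.12, -0.03, 0.27]

docs = client.collection("books").query(
    select(
        "title",
        # Distance between the stored embedding and the query embedding.
        title_distance=fn.vector_distance("title_embedding", query_embedding),
    )
    # Return the 10 closest documents (smaller distance is better).
    .topk(field("title_distance"), 10, asc=True)
)
```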
Using custom embeddings is useful if:
- You have a domain-specific embedding model (e.g., medical, legal, or technical documents).
- You need embeddings that are consistent across multiple systems.
For most use cases, TopK’s built-in `semantic_index()` is the easiest and most efficient way to implement semantic search.
You can still use our built-in reranking model by calling `.rerank()` on a query with custom embeddings. In this case, you will need to pass the query and fields to `.rerank()` explicitly.
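For instance (a sketch; the parameter names follow the pattern described above, so verify them against the `.rerank()` reference):

```python
from topk_sdk.query import select, field, fn

query_embedding = [0.12, -0.03, 0.27]  # placeholder vector from your own model

docs = client.collection("books").query(
    select(
        "title",
        title_distance=fn.vector_distance("title_embedding", query_embedding),
    )
    .topk(field("title_distance"), 10, asc=True)
    # With custom embeddings, TopK cannot infer the query text or source fields,
    # so pass them to the reranker explicitly.
    .rerank(query="catcher in the rye", fields=["title"])
)
```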
Lexical scoring with reranking
You can also use lexical scoring with reranking. This scores documents using BM25 and then uses semantic similarity to rerank the results.
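A sketch of this pattern (it assumes the scored text fields have a keyword index so BM25 applies; depending on your query, you may need to pass the query and fields to `.rerank()` explicitly, as noted above):

```python
from topk_sdk.query import select, field, fn, match

docs = client.collection("books").query(
    select(
        "title",
        # Lexical relevance computed with BM25.
        text_score=fn.bm25_score(),
    )
    # Only documents matching the keyword query are scored.
    .filter(match("catcher in the rye"))
    .topk(field("text_score"), 10)
    # Rerank the BM25 results with the built-in semantic reranking model.
    .rerank()
)
```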