Semantic search
With TopK, you can implement vector-powered semantic search in just a few lines of code.
TopK comes with built-in embeddings and reranking, removing the need for third-party embedding models or custom reranking solutions.
How to perform a semantic search
In the following example, we’ll:
- Define a collection schema for semantic search.
- Add documents to the collection.
- Query the collection with semantic search.
Define a collection schema
Semantic search is enabled by adding a semantic_index() to a text field in the collection schema:
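A minimal Python sketch of what this can look like. The client setup and helper names used here (Client, collections().create(), text(), and the import paths) are assumptions about the TopK Python SDK and may differ from the current release, so check the SDK reference for exact signatures.

```python
from topk_sdk import Client
from topk_sdk.schema import text, semantic_index  # assumed import path

# Hypothetical credentials and region; replace with your own values.
client = Client(api_key="YOUR_API_KEY", region="YOUR_REGION")

# Create a "books" collection where the `title` field gets a semantic index.
client.collections().create(
    "books",
    schema={
        "title": text().required().index(semantic_index()),
    },
)
```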
This configuration automatically generates embeddings and enables keyword search for the specified text fields.
If you want to use your own embeddings instead of TopK’s built-in semantic_index(), see Vector Search.
Add documents to the collection
Let’s add some documents to the collection:
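Here is a sketch of inserting a few documents, assuming the collection handle exposes an upsert() method and documents carry an _id field (both assumptions about the SDK):

```python
# Upsert a few example documents. Embeddings for the `title` field are
# generated automatically by the semantic index; no manual embedding step.
client.collection("books").upsert(
    [
        {"_id": "gatsby", "title": "The Great Gatsby"},
        {"_id": "mockingbird", "title": "To Kill a Mockingbird"},
        {"_id": "catcher", "title": "The Catcher in the Rye"},
        {"_id": "1984", "title": "1984"},
    ]
)
```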
Perform a semantic search
To search for documents based on semantic similarity, use the semantic_similarity() function:
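A sketch of such a query, assuming the query builder lives in topk_sdk.query and exposes select(), fn.semantic_similarity(), field(), topk(), and rerank(); exact names and signatures may differ in your SDK version:

```python
from topk_sdk.query import select, field, fn  # assumed import path

results = client.collection("books").query(
    select(
        "title",
        # Score each document's `title` against the query text.
        title_similarity=fn.semantic_similarity("title", "classic American novel"),
    )
    # Keep the 10 documents with the highest similarity score.
    .topk(field("title_similarity"), 10)
    # Optionally rerank the results with TopK's built-in reranking model.
    .rerank()
)
```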
Let’s break down the example above:
- The semantic_similarity() function computes the similarity between the query "classic American novel" and the text value stored in the title field for each document.
- TopK performs automatic query embedding under the hood using the model specified in the semantic_index() function.
- The results are ranked based on similarity, and the top 10 most relevant documents are returned.
- The optional .rerank() call uses a reranking model to improve the relevance of the results. For more information, see our Reranking guide.
This works out of the box—no need to manage embeddings, external APIs, or third-party reranking models.
Combining semantic and keyword search
For certain use cases, you might want to use a combination of keyword search and semantic search:
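One way to express this is a custom scoring expression that blends a keyword (BM25) score with a semantic similarity score. This is a sketch only: match(), fn.bm25_score(), and the arithmetic on field() expressions are assumptions about the query builder and may need adjusting for your SDK version.

```python
from topk_sdk.query import select, field, fn, match  # assumed import path

results = client.collection("books").query(
    select(
        "title",
        # Keyword relevance (BM25) for the query terms.
        text_score=fn.bm25_score(),
        # Semantic similarity between the query and the `title` field.
        semantic_score=fn.semantic_similarity("title", "classic American novel"),
    )
    # Restrict candidates to documents that match the keyword query.
    .filter(match("classic American novel"))
    # Blend both signals with a custom weighting and keep the top 10.
    .topk(field("text_score") * 0.3 + field("semantic_score") * 0.7, 10)
)
```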
The example above combines keyword relevance (BM25) with semantic similarity through a custom scoring function suited to your use case, so your search results capture both exact matches and contextual meaning.