TopK enables you to add semantic search to your documents with just a few lines of code.
There is no need to build or maintain a separate embedding pipeline.
In the following example, we’ll:
- Define a collection schema: create a collection configured for semantic search.
- Add documents: insert documents into the collection.
- Run a semantic query: retrieve documents using a free-form text query.
Define a collection schema
Semantic search is enabled by adding a semantic_index() to a text() field in the collection schema:
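from topk_sdk.schema import text, semantic_index

# Assumes `client` is an already-initialized TopK client.
client.collections().create(
    "books",
    schema={
        # Required text field with a semantic index attached
        "title": text().required().index(semantic_index()),
    },
)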
This configuration automatically generates embeddings for the specified text fields and also enables keyword search on them.
If you want to use your own embeddings instead of TopK’s built-in semantic_index(), see the Vector Search guide.
Add documents to the collection
Let’s add some documents to the collection:
client.collection("books").upsert(
    [
        {"_id": "gatsby", "title": "The Great Gatsby"},
        {"_id": "1984", "title": "1984"},
        {"_id": "catcher", "title": "The Catcher in the Rye"},
    ],
)
Run a semantic query
To search for documents based on semantic similarity, use the fn.semantic_similarity() function:
from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity("title", "classic American novel"),
    )
    .topk(field("title_similarity"), 10)
)
# Example results:
[
    {
        "_id": "catcher",
        "title": "The Catcher in the Rye",
        "title_similarity": 0.9497610926628113
    },
    {
        "_id": "gatsby",
        "title": "The Great Gatsby",
        "title_similarity": 0.9480283856391907
    }
]
Let’s break down the example above:
- The semantic_similarity() function computes the similarity between the query "classic American novel" and the text value stored in the title field for each document.
- TopK performs automatic query embedding under the hood.
- The results are ranked based on similarity, and the top 10 most relevant documents are returned.
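To make the ranking step concrete, here is a minimal, self-contained sketch of similarity-based top-k retrieval. It does not use the TopK SDK and is not a description of TopK's internals: the three-dimensional vectors are made-up stand-ins for real embeddings, cosine similarity is an assumed metric, and the helper names cosine_similarity and top_k are hypothetical.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = (
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    )
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


# Toy "embeddings" (assumed values, for illustration only).
toy_vectors = {
    "gatsby": [0.9, 0.1, 0.0],
    "1984": [0.1, 0.9, 0.2],
    "catcher": [0.8, 0.2, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

similarity_ranked = top_k(toy_query, toy_vectors, 2)
for doc_id, score in similarity_ranked:
    print(doc_id, round(score, 4))
```

In the real query, the embedding of both the documents and the query text is handled by TopK, so only the select(...).topk(...) call is needed.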
This works out of the box, with no need to manage embeddings or external APIs.
Combining semantic and keyword search
For certain use cases, you might want to use a combination of keyword search and semantic search:
from topk_sdk.query import select, field, fn, match

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity("title", "catcher"),
        text_score=fn.bm25_score(),  # Keyword-based relevance
    )
    # Ensure the book contains the keyword "classic" in any of the text-indexed fields
    .filter(match("classic"))
    # Add 70% weight to semantic similarity and 30% weight to keyword relevance
    .topk(field("title_similarity") * 0.7 + field("text_score") * 0.3, 10)
)
The example above combines keyword relevance (BM25) with semantic similarity, so the results reflect both exact keyword matches and contextual meaning.
The weights in the scoring expression can be adjusted to suit your use case.
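The effect of the weighted scoring expression can be checked offline with a small sketch in plain Python. The candidate scores below are invented for illustration, and hybrid_score is a hypothetical helper that mirrors the 0.7 / 0.3 weighting from the query rather than anything provided by the SDK.

```python
def hybrid_score(similarity, bm25, w_semantic=0.7, w_keyword=0.3):
    """Weighted sum of a semantic similarity score and a BM25 keyword score."""
    return w_semantic * similarity + w_keyword * bm25


# Made-up scores: "a" is the closest semantic match, "b" has the strongest keyword match.
candidates = [
    {"_id": "a", "title_similarity": 0.95, "text_score": 0.2},
    {"_id": "b", "title_similarity": 0.80, "text_score": 1.5},
    {"_id": "c", "title_similarity": 0.60, "text_score": 0.4},
]

hybrid_ranked = sorted(
    candidates,
    key=lambda doc: hybrid_score(doc["title_similarity"], doc["text_score"]),
    reverse=True,
)

for doc in hybrid_ranked:
    score = hybrid_score(doc["title_similarity"], doc["text_score"])
    print(doc["_id"], round(score, 3))
```

BM25 scores are not bounded to the 0 to 1 range, so a strong keyword match can outweigh a higher semantic similarity even at a 30% weight, as document "b" does here. It may be worth inspecting the typical score ranges in your own data before settling on weights.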