TopK enables you to add semantic search to your documents with just a few lines of code. There is no need to build or maintain a separate embedding pipeline. In the following example, we’ll:
  1. Define a collection schema: create a collection configured for semantic search.
  2. Add documents: insert documents into the collection.
  3. Run a semantic query: retrieve documents using a free-form text query.

Define a collection schema

Semantic search is enabled by adding a semantic_index() to a text() field in the collection schema:
from topk_sdk.schema import text, semantic_index

# `client` is assumed to be an already-initialized TopK client.
client.collections().create(
    "books",
    schema={
        "title": text().required().index(semantic_index()),
    },
)
With this configuration, TopK automatically generates embeddings for the indexed text fields and also enables keyword search on them.
If you want to use your own embeddings instead of TopK’s built-in semantic_index(), see the Vector Search guide.

Add documents to the collection

Let’s add some documents to the collection:
client.collection("books").upsert(
    [
        {"_id": "gatsby", "title": "The Great Gatsby"},
        {"_id": "1984", "title": "1984"},
        {"_id": "catcher", "title": "The Catcher in the Rye"},
    ],
)
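
As a rough mental model of what the upsert() call above does, the sketch below assumes the usual insert-or-replace meaning keyed on _id. It is a plain-Python illustration, not TopK's implementation; the `store` dictionary and the local `upsert` helper are hypothetical names introduced only for this example.

```python
from typing import Dict, List


def upsert(store: Dict[str, dict], docs: List[dict]) -> None:
    """Insert each document, replacing any stored document with the same _id."""
    for doc in docs:
        # Copy the document so later edits by the caller do not leak into the store.
        store[doc["_id"]] = dict(doc)


# Hypothetical in-memory stand-in for a collection.
store: Dict[str, dict] = {}

upsert(store, [
    {"_id": "gatsby", "title": "The Great Gatsby"},
    {"_id": "1984", "title": "1984"},
])

# Re-sending an existing _id overwrites that document instead of duplicating it
# (assumed behavior, labeled as such above).
upsert(store, [{"_id": "1984", "title": "Nineteen Eighty-Four"}])
```

Under this assumption, re-running the same upsert is safe: the collection ends up with one document per _id.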

Run a semantic query

To search for documents based on semantic similarity, use the fn.semantic_similarity() function:
from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity("title", "classic American novel"),
    )
    .topk(field("title_similarity"), 10)
    .rerank()
)

# Example results:

[
  {
    "_id": "catcher",
    "title": "The Catcher in the Rye",
    "title_similarity": 0.9497610926628113,
    "_rank": 0,
    "_rerank_score": 0.048159245401620865,
  },
  {
    "_id": "gatsby",
    "title": "The Great Gatsby",
    "title_similarity": 0.9480283856391907,
    "_rank": 1,
    "_rerank_score": 0.02818089909851551,
  }
]
Let’s break down the example above:
  1. The semantic_similarity() function computes the similarity between the query "classic American novel" and the text value stored in the title field for each document.
  2. TopK performs automatic query embedding under the hood using the model specified in the semantic_index() function.
  3. The results are ranked based on similarity, and the top 10 most relevant documents are returned.
  4. The optional .rerank() call uses a reranking model to improve relevance of the results. For more information, see our Reranking guide.
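
The first three steps above can be pictured with a small, self-contained sketch. It assumes cosine similarity over embedding vectors, which is a common choice but is not confirmed here as TopK's metric. The vectors are made up for illustration and do not come from any real model, and `cosine_similarity` and `top_k` are hypothetical helpers, not part of topk_sdk.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: List[float],
    doc_vecs: Dict[str, List[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score every document against the query and keep the k most similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings"; real embeddings have many more dimensions.
toy_embeddings = {
    "gatsby": [0.9, 0.1, 0.2],
    "1984": [0.1, 0.9, 0.3],
    "catcher": [0.8, 0.2, 0.1],
}
toy_query = [1.0, 0.0, 0.1]  # stand-in for the embedded query text

results = top_k(toy_query, toy_embeddings, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

With these toy vectors, "gatsby" and "catcher" point in nearly the same direction as the query and rank ahead of "1984". In the real service the embedding and ranking happen server-side, so none of this code is needed.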
The query above works out of the box, with no need to manage embeddings, external APIs, or third-party reranking models.

For certain use cases, you might want to combine keyword search with semantic search:
from topk_sdk.query import select, field, fn, match

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity("title", "catcher"),
        text_score=fn.bm25_score(),  # Keyword-based relevance
    )
    # Keep only books that contain the keyword "classic" in any text-indexed field.
    .filter(match("classic"))
    # Weight semantic similarity at 70% and keyword relevance at 30%.
    .topk(field("title_similarity") * 0.7 + field("text_score") * 0.3, 10)
    .rerank()
)
The example above combines keyword relevance (BM25) with semantic similarity, so your search results capture both exact matches and contextual meaning. The weights in the scoring expression can be tuned to suit your use case.
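
To see how the weighted expression affects ordering, here is a minimal plain-Python sketch of the same 70/30 combination. The per-document scores are invented for illustration, and `hybrid_rank` is a hypothetical helper rather than anything in topk_sdk.

```python
from typing import Dict, List, Tuple


def hybrid_rank(
    candidates: Dict[str, Dict[str, float]],
    semantic_weight: float = 0.7,
    keyword_weight: float = 0.3,
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank documents by a weighted sum of semantic and keyword scores."""
    scored = [
        (
            doc_id,
            semantic_weight * scores["title_similarity"]
            + keyword_weight * scores["text_score"],
        )
        for doc_id, scores in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented scores: "a" is the best semantic match, "b" the best keyword match.
toy_scores = {
    "a": {"title_similarity": 0.90, "text_score": 0.10},
    "b": {"title_similarity": 0.60, "text_score": 0.95},
    "c": {"title_similarity": 0.40, "text_score": 0.20},
}

ranked = hybrid_rank(toy_scores, k=2)
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

Here "b" overtakes "a" because its strong keyword score outweighs its lower similarity. BM25 scores are not bounded to the [0, 1] range the way the toy values are, so in practice the weights also have to compensate for the difference in scale between the two signals.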