TopK enables you to add state-of-the-art semantic search to your documents with just a few lines of code. With TopK’s semantic search, there is no embedding pipeline to build, no vector store to operate, and no reranking service to maintain. It’s as simple as adding semantic_index() to your collection schema and querying with fn.semantic_similarity():
from topk_sdk.schema import text, semantic_index
from topk_sdk.query import select, field, fn

# Assumes `client` is an already-initialized TopK client.
client.collections().create("books", schema={
    "title": text().required().index(semantic_index()),
})

docs = client.collection("books").query(
    select("title", title_similarity=fn.semantic_similarity("title", "classic novel"))
    .sort(field("title_similarity"), asc=False)
    .limit(10)
)
Under the hood, semantic_index() is powered by Iso-ModernColBERT, TopK’s own multi-vector embedding model, combined with Sparse Multi-Vector Encoding (SMVE) for scalable retrieval and quantized MaxSim reranking.
Why multi-vector? Single-vector (dense) embeddings compress an entire document into one point in high-dimensional space.
Multi-vector models like Iso-ModernColBERT keep one embedding per token, enabling token-level matching via MaxSim scoring. This consistently outperforms dense models on out-of-domain content, long documents, specific clauses, tables, and structured data.
Read High-Quality Search, Out of the Box for benchmarks and a deep-dive into the architecture.
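To make the difference concrete, here is a minimal sketch of MaxSim scoring in plain Python. The 2-dimensional token vectors are made up for illustration, the `maxsim` and `pooled_score` helpers are hypothetical, and mean pooling is only a stand-in for a single-vector encoder; none of this reflects TopK's actual implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def mean_pool(tokens):
    """Collapse token vectors into one vector (stand-in for a dense encoder)."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def pooled_score(query_tokens, doc_tokens):
    """Single-vector score: dot product of the two pooled vectors."""
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))


# Toy embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # contains an exact match for each query token
doc_b = [[0.7, 0.7]]                          # one "average" token, no exact match

maxsim_a = maxsim(query, doc_a)
maxsim_b = maxsim(query, doc_b)
pooled_a = pooled_score(query, doc_a)
pooled_b = pooled_score(query, doc_b)

print("MaxSim:", maxsim_a, maxsim_b)
print("Pooled:", pooled_a, pooled_b)
```

In this toy setup, MaxSim ranks `doc_a` first because each query token finds its own best-matching document token, while the pooled score prefers `doc_b` because the token-level detail has been averaged away.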
In the following example, we’ll:
  1. Define a collection schema: create a collection configured for semantic search.
  2. Add documents: insert documents into the collection.
  3. Run a semantic query: retrieve documents using a free-form text query.

Define a collection schema

Semantic search is enabled by adding a semantic_index() to a text() field in the collection schema:
from topk_sdk.schema import text, semantic_index

client.collections().create(
    "books",
    schema={
        "title": text().required().index(semantic_index()),
    },
)
This configuration automatically generates multi-vector embeddings for the field and enables keyword search. Documents are indexed with sub-second lag, so they become searchable almost immediately after they are written.
If you want to use your own embeddings instead of TopK’s built-in semantic_index(), see the Vector Search guide.

Add documents to the collection

Let’s add some documents to the collection:
client.collection("books").upsert(
    [
        {"_id": "gatsby", "title": "The Great Gatsby"},
        {"_id": "1984", "title": "1984"},
        {"_id": "catcher", "title": "The Catcher in the Rye"},
    ],
)

Run a semantic query

To search for documents based on semantic similarity, use the fn.semantic_similarity() function:
from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity("title", "classic American novel"),
    )
    .sort(field("title_similarity"), asc=False)
    .limit(10)
)

# Example results:

[
  {
    "_id": "catcher",
    "title": "The Catcher in the Rye",
    "title_similarity": 0.9497610926628113
  },
  {
    "_id": "gatsby",
    "title": "The Great Gatsby",
    "title_similarity": 0.9480283856391907
  }
]
Let’s break down the example above:
  1. The semantic_similarity() function encodes the query "classic American novel" into multi-vector token embeddings using Iso-ModernColBERT and scores each document via quantized MaxSim — comparing every query token against every document token to find the best alignment.
  2. Candidate retrieval is accelerated by SMVE, which uses fast sparse approximations to identify a small set of candidates before the full MaxSim pass.
  3. The results are ranked by their MaxSim score and the top 10 most relevant documents are returned.
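The retrieve-then-rerank flow described in these steps can be sketched in plain Python. This is not SMVE, whose internals are not described here: the `sparse_key` helper (index of a token vector's largest component) is a hypothetical, deliberately crude stand-in for a sparse approximation, and all vectors are toy values. The sketch only shows the shape of the pipeline: a cheap lookup narrows the candidate set, and full MaxSim runs on the survivors.

```python
from collections import defaultdict


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Full late-interaction score: best document token per query token, summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def sparse_key(vec):
    """Hypothetical sparse code: index of the largest component of a token vector."""
    return max(range(len(vec)), key=lambda i: vec[i])


def build_inverted_index(docs):
    """Map each sparse key to the set of document ids containing a token with that key."""
    index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[sparse_key(token)].add(doc_id)
    return index


def candidate_ids(query_tokens, index):
    """Stage 1: cheap lookup of documents sharing at least one sparse key with the query."""
    found = set()
    for q in query_tokens:
        found |= index.get(sparse_key(q), set())
    return found


def search(query_tokens, docs, index, k):
    """Stage 2: exact MaxSim over the candidates only, then keep the top k."""
    scored = [(doc_id, maxsim(query_tokens, docs[doc_id]))
              for doc_id in candidate_ids(query_tokens, index)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy corpus (illustrative values only).
docs = {
    "a": [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
    "b": [[0.8, 0.2, 0.0]],
    "c": [[0.0, 0.1, 0.9]],
}
index = build_inverted_index(docs)
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

candidates = candidate_ids(query, index)
results = search(query, docs, index, k=2)
print(results)
```

Here document `c` shares no sparse key with the query, so it is never scored with MaxSim; only `a` and `b` reach the reranking stage.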
This works out of the box: there is no embedding pipeline, vector store, or reranking service to manage.
For certain use cases, you might want to combine keyword search with semantic search:
from topk_sdk.query import select, field, fn, match

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity("title", "catcher"),
        text_score=fn.bm25_score(),  # Keyword-based relevance
    )
    # Keep only documents that contain the keyword "classic" in any text-indexed field
    .filter(match("classic"))
    # Weight semantic similarity at 70% and keyword relevance at 30%
    .sort(field("title_similarity") * 0.7 + field("text_score") * 0.3, asc=False)
    .limit(10)
)
The example above combines keyword relevance (BM25) with semantic similarity, so your search results capture both exact matches and contextual meaning, using a custom scoring function that you can tune for your use case.
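To see how the 70/30 weighting affects ordering, here is a small worked example in plain Python. The document ids and scores are invented for illustration and the `blend` helper is hypothetical; it simply mirrors the arithmetic of the sort expression above.

```python
def blend(semantic, keyword, w_semantic=0.7, w_keyword=0.3):
    """Weighted sum of a semantic similarity score and a keyword (BM25) score."""
    return w_semantic * semantic + w_keyword * keyword


# Hypothetical per-document scores (not real query output).
candidates = [
    {"_id": "doc-x", "title_similarity": 0.90, "text_score": 0.2},
    {"_id": "doc-y", "title_similarity": 0.80, "text_score": 1.5},
    {"_id": "doc-z", "title_similarity": 0.95, "text_score": 0.0},
]

# Ranking by semantic similarity alone.
semantic_order = [
    d["_id"]
    for d in sorted(candidates, key=lambda d: d["title_similarity"], reverse=True)
]

# Ranking by the blended score.
hybrid_order = [
    d["_id"]
    for d in sorted(
        candidates,
        key=lambda d: blend(d["title_similarity"], d["text_score"]),
        reverse=True,
    )
]

print("semantic only:", semantic_order)
print("hybrid 70/30: ", hybrid_order)
```

With these made-up numbers, `doc-y` moves from last to first once the keyword score is blended in. BM25 scores are not confined to a fixed range, so a strong keyword match can outweigh the semantic term even at a 30% weight; it is worth checking the scale of both scores on your own data before settling on weights.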