TopK enables you to add semantic search to your documents with just a few lines of code. There is no need to build or maintain a separate embedding pipeline. In the following example, we’ll:
  1. Define a collection schema: create a collection configured for semantic search.
  2. Add documents: insert documents into the collection.
  3. Run a semantic query: retrieve documents using a free-form text query.

Define a collection schema

Semantic search is enabled by adding a semantic_index() to a text() field in the collection schema:
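```python
from topk_sdk.schema import text, semantic_index

# Assumes `client` is an already initialized TopK client.
client.collections().create(
    "books",
    schema={
        # required(): every document must have a title.
        # semantic_index(): enables semantic search on the title field.
        "title": text().required().index(semantic_index()),
    },
)
```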
This configuration automatically generates embeddings for the specified text fields and also enables keyword search on them.
If you want to use your own embeddings instead of TopK’s built-in semantic_index(), see the Vector Search guide.

Add documents to the collection

Let’s add some documents to the collection:
```python
client.collection("books").upsert(
    [
        {"_id": "gatsby", "title": "The Great Gatsby"},
        {"_id": "1984", "title": "1984"},
        {"_id": "catcher", "title": "The Catcher in the Rye"},
    ],
)
```
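Conceptually, an upsert inserts a document when its _id is new and replaces the stored document when the _id already exists. The following is a minimal in-memory sketch of those semantics, assuming whole-document replacement keyed by _id and assuming that the schema's required() title means every document must carry a title. The upsert_local function and store dict are hypothetical names for illustration, not part of the TopK SDK.

```python
from typing import Any

Document = dict[str, Any]


def upsert_local(store: dict[str, Document], docs: list[Document]) -> None:
    """Hypothetical in-memory model of upsert semantics (not the TopK SDK).

    Assumes whole-document replacement keyed by `_id`, and that the schema's
    `.required()` on "title" means every document must carry a string title.
    """
    # Validate everything first so a bad document leaves the store unchanged.
    for doc in docs:
        if not isinstance(doc.get("_id"), str):
            raise ValueError(f"document is missing a string _id: {doc!r}")
        if not isinstance(doc.get("title"), str):
            raise ValueError(f"document {doc['_id']!r} is missing required field 'title'")
    for doc in docs:
        store[doc["_id"]] = dict(doc)  # insert if new, replace if the _id exists


store: dict[str, Document] = {}
upsert_local(store, [
    {"_id": "gatsby", "title": "The Great Gatsby"},
    {"_id": "1984", "title": "1984"},
    {"_id": "catcher", "title": "The Catcher in the Rye"},
])
# Upserting an existing _id replaces that document instead of adding a new one.
upsert_local(store, [{"_id": "1984", "title": "Nineteen Eighty-Four"}])
print(len(store), store["1984"]["title"])
```

Validating the whole batch before writing keeps the sketch all-or-nothing; whether TopK applies the same rule to a batch with one invalid document is not stated here.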

Run a semantic query

To search for documents based on semantic similarity, use the fn.semantic_similarity() function:
```python
from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        # Score each document by how close its title is in meaning to the query text
        title_similarity=fn.semantic_similarity("title", "classic American novel"),
    )
    # Return the 10 documents with the highest similarity score
    .topk(field("title_similarity"), 10)
)

# Example results:

[
  {
    "_id": "catcher",
    "title": "The Catcher in the Rye",
    "title_similarity": 0.9497610926628113
  },
  {
    "_id": "gatsby",
    "title": "The Great Gatsby",
    "title_similarity": 0.9480283856391907
  }
]
```
Let’s break down the example above:
  1. The semantic_similarity() function computes the similarity between the query "classic American novel" and the text value stored in the title field for each document.
  2. TopK automatically embeds the query text under the hood.
  3. Documents are ranked by this similarity score, and the 10 most relevant documents are returned.
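This section does not specify how semantic_similarity() is computed internally. A common approach for embedding-based similarity is the cosine similarity between the query vector and each document vector, followed by keeping the k highest scores. The sketch below illustrates that ranking step with tiny hand-written vectors; the vectors and the cosine_similarity and top_k helpers are hypothetical stand-ins for the embeddings TopK generates, so the scores will not match the example results above.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[tuple[float, str]]:
    """Score every document against the query and keep the k best, highest first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    return heapq.nlargest(k, scored)


# Hypothetical 3-dimensional "embeddings"; real embeddings typically have
# hundreds of dimensions or more.
doc_vecs = {
    "gatsby": [0.9, 0.1, 0.2],
    "1984": [0.2, 0.9, 0.1],
    "catcher": [0.8, 0.2, 0.3],
}
# Hypothetical stand-in for the embedded query "classic American novel".
query_vec = [1.0, 0.1, 0.2]

# k=10 with only 3 documents returns all of them, best match first.
for score, doc_id in top_k(query_vec, doc_vecs, k=10):
    print(f"{doc_id}: {score:.3f}")
```

heapq.nlargest avoids fully sorting every score when k is much smaller than the number of documents; a production search engine would also use an index rather than scoring every document.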
This works out of the box, with no need to manage embeddings or call external APIs. For certain use cases, you might want to combine keyword search and semantic search:
```python
from topk_sdk.query import select, field, fn, match

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.semantic_similarity("title", "catcher"),
        text_score=fn.bm25_score(),  # Keyword-based relevance
    )
    # Keep only documents whose text-indexed fields contain the keyword "classic"
    .filter(match("classic"))
    # Weight semantic similarity at 70% and keyword relevance at 30%
    .topk(field("title_similarity") * 0.7 + field("text_score") * 0.3, 10)
)
```
The example above combines keyword relevance (BM25) with semantic similarity, so your search results capture both exact matches and contextual meaning, ranked by a custom scoring function that you can tune for your use case.
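To see what the scoring expression in the topk() call does, the following sketch replays it locally on made-up scores. Documents are first filtered by keyword, then ranked by 0.7 * title_similarity + 0.3 * text_score. The candidate rows, their scores, and the hybrid_rank and contains_keyword helpers are hypothetical; in TopK both the filtering and the scoring happen server-side. The keyword check here is a plain lowercase whole-word comparison, which is an assumption and may differ from TopK's tokenization.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    text: str                 # text-indexed content of the document
    title_similarity: float   # hypothetical semantic_similarity() value
    text_score: float         # hypothetical bm25_score() value


def contains_keyword(text: str, keyword: str) -> bool:
    """Rough stand-in for match(): case-insensitive whole-word check (assumed tokenization)."""
    return keyword.lower() in text.lower().split()


def hybrid_rank(candidates: list[Candidate], keyword: str, k: int,
                semantic_weight: float = 0.7,
                keyword_weight: float = 0.3) -> list[tuple[float, str]]:
    """Filter by keyword, then rank by a weighted sum of the two scores, highest first."""
    matching = [c for c in candidates if contains_keyword(c.text, keyword)]
    scored = [
        (semantic_weight * c.title_similarity + keyword_weight * c.text_score, c.doc_id)
        for c in matching
    ]
    return sorted(scored, reverse=True)[:k]


candidates = [
    Candidate("a", "A classic coming-of-age story", title_similarity=0.95, text_score=1.2),
    Candidate("b", "Classic jazz recordings", title_similarity=0.40, text_score=2.0),
    Candidate("c", "A modern thriller", title_similarity=0.90, text_score=0.0),
]

# "c" is dropped by the keyword filter even though its similarity is high.
for score, doc_id in hybrid_rank(candidates, "classic", k=10):
    print(f"{doc_id}: {score:.3f}")
```

One design point worth checking on your own data: BM25 scores are not bounded to [0, 1] the way the similarity scores in the example results appear to be, so fixed 70/30 weights express relative importance only approximately. Inspecting the typical range of each score before choosing weights can help.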