TopK is built for high-performance dense vector search workloads. It is designed to:
  • Maintain >98% recall, reducing the likelihood of missing relevant results in applications such as recommendation systems, image search, and semantic search.
  • Deliver consistent low latency (p99 < 50 ms). See the benchmarks for details.
  • Support large-scale single-collection deployments as well as multi-tenant architectures.

Define a collection schema with a vector field

Define a schema with a vector field and add a vector_index():
from topk_sdk.schema import text, f32_vector, vector_index

client.collections().create(
    "books",
    schema={
        "title": text().required(),
        "title_embedding": f32_vector(dimension=1536).required().index(vector_index(metric="cosine")),
    },
)
Supported vector field types: See the schema reference for full API details.

To retrieve the top-k nearest neighbors of a query vector, use the fn.vector_distance() function. It computes the distance (or similarity) between a stored vector field and a query vector, based on the distance metric configured in the vector index (e.g., cosine or Euclidean). You can then sort by the computed value to return the closest matches.
from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        published_year=field("published_year"),
        # Compute vector similarity between the vector embedding of the string "epic fantasy adventure"
        # and the embedding stored in the `title_embedding` field.
        title_similarity=fn.vector_distance("title_embedding", [0.1, 0.2, 0.3, ...]),
    )
    # Return the top 10 results.
    # Sort direction depends on the metric: a smaller Euclidean distance means closer,
    # while a larger cosine similarity means closer.
    # With Euclidean distance, sort in ascending order (asc=True).
    .topk(field("title_similarity"), 10)
)

# Example results:
[
  {
    "_id": "2",
    "title": "Lord of the Rings",
    "title_similarity": 0.8150404095649719
  },
  {
    "_id": "1",
    "title": "The Catcher in the Rye",
    "title_similarity": 0.7825378179550171
  }
]
Let’s break down the example above:
  1. Compute the cosine similarity between the query embedding and the title_embedding field using the fn.vector_distance() function.
  2. Store the computed cosine similarity in the title_similarity field.
  3. Return the top 10 results sorted by the title_similarity field in descending order.
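
To make these three steps concrete, here is a minimal local sketch in plain Python. It is not how TopK is implemented: it scores a few in-memory documents by brute force and keeps the best k. The helper names (cosine_similarity, euclidean_distance, top_k), the "score" and "embedding" keys, and the toy 3-dimensional embeddings are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    # Larger value = closer match.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Smaller value = closer match.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(docs, query, k, score_fn, asc=False):
    # Steps 1 and 2: compute the score and store it alongside each document.
    scored = [
        {"_id": d["_id"], "title": d["title"], "score": score_fn(d["embedding"], query)}
        for d in docs
    ]
    # Step 3: sort (descending unless asc=True) and keep the first k.
    scored.sort(key=lambda d: d["score"], reverse=not asc)
    return scored[:k]

# Hypothetical toy data; real embeddings would have many more dimensions.
docs = [
    {"_id": "1", "title": "A", "embedding": [1.0, 0.0, 0.0]},
    {"_id": "2", "title": "B", "embedding": [0.0, 1.0, 0.0]},
    {"_id": "3", "title": "C", "embedding": [0.9, 0.1, 0.0]},
]
query = [1.0, 0.0, 0.0]

by_cosine = top_k(docs, query, 2, cosine_similarity)             # descending
by_euclid = top_k(docs, query, 2, euclidean_distance, asc=True)  # ascending
```

The only difference between the two calls is the sort direction, which mirrors the note in the query example about using asc=True with Euclidean distance.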

Combine vector search with metadata filtering

Vector search can be combined with metadata filtering by adding a filter() stage to the query:
from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.vector_distance("title_embedding", [0.1, 0.2, 0.3, ...]),
        published_year=field("published_year"),
    )
    .filter(field("published_year") > 2000)
    .topk(field("title_similarity"), 10)
)
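
Conceptually, the filter restricts which documents are eligible, and the ranking then applies only to those that pass. The sketch below shows that behavior in plain Python over hypothetical documents with precomputed title_similarity values. The function name filtered_top_k and the sample data are assumptions, and the sketch says nothing about how TopK executes filtered search internally.

```python
# Hypothetical documents with already-computed similarity scores.
docs = [
    {"title": "Old but close", "published_year": 1954, "title_similarity": 0.95},
    {"title": "Recent and close", "published_year": 2011, "title_similarity": 0.90},
    {"title": "Recent and far", "published_year": 2005, "title_similarity": 0.40},
]

def filtered_top_k(docs, min_year_exclusive, k):
    # Keep only documents published after the given year ...
    eligible = [d for d in docs if d["published_year"] > min_year_exclusive]
    # ... then rank the remaining documents by similarity, highest first.
    eligible.sort(key=lambda d: d["title_similarity"], reverse=True)
    return eligible[:k]

results = filtered_top_k(docs, 2000, 10)
```

Note that the highest-scoring document is excluded because it fails the filter, so fewer than k results can come back when few documents match.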