Vector search

Vector search is a core feature of TopK. It is designed to:

Stay above 98% recall, so your application (e.g. recommendation, image search, semantic search) rarely misses relevant results.
Provide consistently low latency (under 50 ms at p99). See our benchmarks.
Support large-scale single-collection as well as multi-tenant use cases.
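Recall is typically measured as recall@k: the share of the true k nearest neighbors (found by an exhaustive, exact search) that the approximate index also returns. Below is a minimal sketch of that calculation. It assumes this standard definition, because TopK's own benchmark methodology is not described here, and the IDs are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k neighbors that the approximate search also returned."""
    k = len(exact_ids)
    if k == 0:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / k


# Hypothetical IDs: exact (brute-force) top-5 vs. an approximate index's top-5.
exact = ["a", "b", "c", "d", "e"]
approx = ["a", "b", "c", "d", "x"]
print(recall_at_k(approx, exact))  # 0.8
```

At 98% recall, on average about 98 of every 100 true nearest neighbors appear in the results.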

How to run a vector search

Prerequisites

Define a schema with a vector field, either f32_vector(), u8_vector(), i8_vector() or binary_vector(), and add a vector_index() to it:

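from topk_sdk.schema import text, f32_vector, vector_index

# Assumes `client` is an already-initialized TopK client.
client.collections().create(
    "books",
    schema={
        "title": text().required(),
        # 1536-dimensional f32 embeddings, indexed for cosine-similarity search
        "title_embedding": f32_vector(dimension=1536)
        .required()
        .index(vector_index(metric="cosine")),
    },
)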

When defining a vector field, you need to specify the dimension of the vector (1536 in the example above). To perform a vector search on this field, index it with a vector_index() and specify the metric parameter (here, "cosine").
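The metric determines how closeness is scored, and both vectors must have the declared dimension. The sketch below assumes the standard definitions of cosine similarity (larger means closer) and Euclidean distance (smaller means closer). It only illustrates the math and is not TopK's internal implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 = same direction, -1.0 = opposite.

    Larger values mean closer vectors.
    """
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.hypot(*a) * math.hypot(*b)
    if norms == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norms


def euclidean_distance(a, b):
    """Straight-line (L2) distance: 0.0 means identical, smaller means closer."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    return math.dist(a, b)


query = [1.0, 0.0]
doc = [1.0, 1.0]
print(round(cosine_similarity(query, doc), 4))  # 0.7071
print(euclidean_distance(query, doc))           # 1.0
```

The two metrics order results in opposite directions. That is why the sort direction passed to topk() depends on the metric.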

Find the closest neighbors

To find the top-k closest neighbors of the query vector, use the vector_distance() function. It computes a numeric score, whose meaning depends on the metric specified in the vector index, which you can then use to sort the results:

from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        published_year=field("published_year"),
        # Cosine similarity between the query embedding (for example, the embedding
        # of the string "epic fantasy adventure") and the `title_embedding` field.
        title_similarity=fn.vector_distance("title_embedding", [0.1, 0.2, 0.3, ...]),
    )
    # Return the top 10 results. Larger cosine similarity means closer, so sort in
    # descending order (the default). For euclidean distance, smaller means closer,
    # so sort in ascending order with asc=True.
    .topk(field("title_similarity"), 10)
)

# Example results:
[
  {
    "_id": "2",
    "title": "Lord of the Rings",
    "title_similarity": 0.8150404095649719
  },
  {
    "_id": "1",
    "title": "The Catcher in the Rye",
    "title_similarity": 0.7825378179550171
  }
]

Let’s break down the example above:

Compute the cosine similarity between the query embedding and the title_embedding field using the vector_distance() function.
Store the computed cosine similarity in the title_similarity field.
Return the top 10 results sorted by the title_similarity field in descending order.
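Conceptually, the query approximates an exhaustive scan that scores every document and keeps the best 10. The vector index trades a small amount of recall for speed. Below is a brute-force sketch of that reference behavior. It uses hypothetical 2-dimensional data and assumes the standard cosine formula. The hypothetical `asc` flag shows how the sort direction flips for a distance metric.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Standard cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def brute_force_topk(docs, query, k, score_fn, asc=False):
    """Exact top-k: score every document, then keep the k best.

    asc=False keeps the highest scores (similarities); asc=True keeps the
    lowest (distances).
    """
    scored = [
        {"_id": d["_id"], "title": d["title"], "score": score_fn(d["title_embedding"], query)}
        for d in docs
    ]
    pick = heapq.nsmallest if asc else heapq.nlargest
    return pick(k, scored, key=lambda r: r["score"])


# Hypothetical 2-dimensional embeddings; the schema above uses 1536 dimensions.
docs = [
    {"_id": "1", "title": "The Catcher in the Rye", "title_embedding": [0.9, 0.1]},
    {"_id": "2", "title": "Lord of the Rings", "title_embedding": [0.2, 0.8]},
    {"_id": "3", "title": "Hypothetical Book", "title_embedding": [0.5, 0.5]},
]
query = [0.1, 0.9]

by_cosine = brute_force_topk(docs, query, k=2, score_fn=cosine_similarity)
by_euclidean = brute_force_topk(docs, query, k=2, score_fn=math.dist, asc=True)
print([r["_id"] for r in by_cosine])     # ['2', '3']
print([r["_id"] for r in by_euclidean])  # ['2', '3']
```

An exact scan like this is also the ground truth against which recall is measured.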

Combine vector search with metadata filtering

Vector search can be easily combined with metadata filtering by adding a filter() stage to the query:

from topk_sdk.query import select, field, fn

docs = client.collection("books").query(
    select(
        "title",
        title_similarity=fn.vector_distance("title_embedding", [0.1, 0.2, 0.3, ...]),
        published_year=field("published_year"),
    )
    .filter(field("published_year") > 2000)
    .topk(field("title_similarity"), 10)
)
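Read as a pipeline, select() → filter() → topk() ranks only the documents that pass the filter. It can therefore return up to 10 matching books even when most of the overall nearest neighbors are older. The sketch below contrasts that behavior with a naive rank-then-filter approach. It uses hypothetical, precomputed scores and illustrates result semantics only, not how TopK executes filtered queries.

```python
import heapq

# Hypothetical rows, as if select() had already computed `title_similarity`.
rows = [
    {"_id": "1", "published_year": 1951, "title_similarity": 0.22},
    {"_id": "2", "published_year": 1954, "title_similarity": 0.99},
    {"_id": "3", "published_year": 2005, "title_similarity": 0.78},
    {"_id": "4", "published_year": 2012, "title_similarity": 0.35},
]


def filter_then_topk(rows, k, predicate):
    """filter() before topk(): rank only the rows that pass the predicate."""
    passing = [r for r in rows if predicate(r)]
    return heapq.nlargest(k, passing, key=lambda r: r["title_similarity"])


def topk_then_filter(rows, k, predicate):
    """Naive alternative: rank everything first, then drop rows that fail."""
    best = heapq.nlargest(k, rows, key=lambda r: r["title_similarity"])
    return [r for r in best if predicate(r)]


def after_2000(row):
    return row["published_year"] > 2000


print([r["_id"] for r in filter_then_topk(rows, 2, after_2000)])  # ['3', '4']
print([r["_id"] for r in topk_then_filter(rows, 2, after_2000)])  # ['3']
```

Ranking first and filtering afterwards returns fewer than k results here, because one of the top two rows fails the filter.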
