# Sparse vector search

TopK provides native support for sparse vector search, enabling exact retrieval over high-dimensional sparse representations.

It is designed to:

* Provide **100% recall** (exact search).
* Support learned sparse representations such as [SPLADE](https://github.com/naver/splade).
* Deliver consistent **low latency** (p99 \< 20 ms). See the [benchmarks](https://www.topk.io/benchmarks) for details.
* Support **large-scale single-collection** deployments as well as **multi-tenant** architectures.

## Define a collection schema with a sparse vector field

Define a schema with a sparse vector field and add a [`vector_index()`](/sdk/topk-py/schema#vector_index):

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.schema import text, f32_sparse_vector, vector_index

  client.collections().create(
      "books",
      schema={
          "title": text().required(),
          "title_embedding": f32_sparse_vector()
            .required()
            .index(vector_index(metric = "dot_product")),
      },
  )
  ```

  ```typescript Javascript theme={null}
  import { text, f32SparseVector, vectorIndex } from "topk-js/schema";

  await client.collections().create("books", {
    title: text().required(),
    title_embedding: f32SparseVector()
      .required()
      .index(vectorIndex({ metric: "dot_product" })),
  });
  ```

</CodeGroup>

Supported sparse vector field types:

* **[`f32_sparse_vector()`](/sdk/topk-py/schema#f32_sparse_vector)**: Sparse float32 embeddings
* **[`u8_sparse_vector()`](/sdk/topk-py/schema#u8_sparse_vector)**: Sparse uint8 embeddings

See the [schema reference](/sdk/topk-py/schema) for full API details.

<Note>
  Sparse vectors do not have a fixed dimension, so you don't need to specify the vector dimension when defining the field.
</Note>
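
To illustrate why no dimension is needed, here is a minimal plain-Python sketch of a sparse vector as a mapping from index to weight. The `to_sparse` helper is hypothetical and is not part of the TopK SDK; it only shows that vectors with very different index ranges share the same representation.

```python
def to_sparse(dense: list[float]) -> dict[int, float]:
    """Keep only the non-zero entries of a dense vector as {index: weight}."""
    return {i: w for i, w in enumerate(dense) if w != 0.0}


# Vectors with different implied lengths are both valid sparse vectors,
# because only the non-zero indices are stored.
short = to_sparse([0.0, 0.5, 0.0])
long = to_sparse([0.0] * 30000 + [0.9])

print(short)  # {1: 0.5}
print(long)   # {30000: 0.9}
```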

<Warning>
  TopK supports only the `dot_product` metric for sparse vectors. This metric is compatible with both fixed and learned
  sparse vector representations.
</Warning>

## Perform a sparse vector search

To retrieve the top-k nearest neighbors of a query vector, use the [`fn.vector_distance()`](/sdk/topk-py/query#vector_distance) function.

`fn.vector_distance()` computes the distance (or similarity) between a stored sparse vector field and a query vector, based on the distance metric configured in the vector index (e.g., dot product).

You can use the computed value to sort and return the closest matches.

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.query import select, field, fn
  from topk_sdk.data import f32_sparse_vector

  docs = client.collection("books").query(
      select(
          "title",
          published_year=field("published_year"),
          # Compute relevance score between the sparse vector embedding of the string "epic fantasy adventure"
          # and the embedding stored in the `title_embedding` field.
          title_score=fn.vector_distance(
            "title_embedding",
            f32_sparse_vector({0: 0.12, 6: 0.67, ...}),
          )
      )
      # Return top 10 results
      .topk(field("title_score"), 10)
  )

  # Example results:
  [
    {
      "_id": "2",
      "title": "Lord of the Rings",
      "title_score": 0.8150404095649719
    },
    {
      "_id": "1",
      "title": "The Catcher in the Rye",
      "title_score": 0.7825378179550171,
    }
  ]
  ```

  ```js Javascript theme={null}
  import { select, field, fn } from "topk-js/query";
  import { f32SparseVector } from "topk-js/data";

  const docs = await client.collection("books").query(
    select({
      title: field("title"),
      published_year: field("published_year"),
      title_score: fn.vectorDistance(
        "title_embedding",
        // Compute relevance score between the sparse vector embedding of the string "epic fantasy adventure"
        // and the embedding stored in the `title_embedding` field.
        f32SparseVector({0: 0.12, 6: 0.67, ...})
      ),
    }).topk(field("title_score"), 10)
  );

  // Example results:
  [
    {
      _id: '2',
      title: 'Lord of the Rings',
      title_score: 0.8150404095649719
    },
    {
      _id: '1',
      title: 'The Catcher in the Rye',
      title_score: 0.7825378179550171
    }
  ]
  ```
</CodeGroup>

Let's break down the example above:

1. Compute the sparse dot product between the query embedding and the `title_embedding` field using the `vector_distance()` function.
2. Store the computed dot product score in the `title_score` field.
3. Return the top 10 results sorted by the `title_score` field in descending order.
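
The three steps above can be sketched in plain Python over an in-memory list. This is an illustration of the scoring logic only, not a description of how TopK executes the query; `sparse_dot`, `top_k`, and the sample documents are hypothetical.

```python
import heapq


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of two sparse vectors; only shared indices contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[i] for i, w in a.items() if i in b)


def top_k(docs: list[dict], query: dict[int, float], k: int) -> list[dict]:
    # Steps 1 and 2: compute the score and store it under `title_score`.
    scored = [
        {"_id": d["_id"], "title_score": sparse_dot(d["title_embedding"], query)}
        for d in docs
    ]
    # Step 3: keep the k highest scores, in descending order.
    return heapq.nlargest(k, scored, key=lambda d: d["title_score"])


# Hypothetical documents and query, for illustration only.
docs = [
    {"_id": "1", "title_embedding": {0: 1.0, 3: 2.0}},
    {"_id": "2", "title_embedding": {0: 0.5, 6: 1.0}},
    {"_id": "3", "title_embedding": {9: 4.0}},
]
query = {0: 2.0, 6: 3.0}

results = top_k(docs, query, k=2)
print(results)
# [{'_id': '2', 'title_score': 4.0}, {'_id': '1', 'title_score': 2.0}]
```

Document `3` shares no index with the query, so its score is zero and it falls outside the top 2.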

## Combine sparse vector search with metadata filtering

Sparse vector search can be combined with metadata filtering by adding a [`filter()`](/sdk/topk-py/query#filter) stage to the query:

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.query import select, field, fn
  from topk_sdk.data import f32_sparse_vector

  docs = client.collection("books").query(
      select(
          "title",
          title_score=fn.vector_distance(
              "title_embedding",
              f32_sparse_vector({0: 0.12, 6: 0.67, ...}),
          ),
          published_year=field("published_year"),
      )
      .filter(field("published_year") > 2000)
      .topk(field("title_score"), 10)
  )
  ```

  ```js Javascript theme={null}
  import { select, field, fn } from "topk-js/query";
  import { f32SparseVector } from "topk-js/data";

  const docs = await client.collection("books").query(
    select({
      title: field("title"),
      title_score: fn.vectorDistance(
        "title_embedding",
        f32SparseVector({0: 0.12, 6: 0.67, ...})
      ),
      published_year: field("published_year"),
    })
      .filter(field("published_year").gt(2000))
      .topk(field("title_score"), 10)
  );
  ```
</CodeGroup>
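
As a rough mental model of the filtered query's result, the plain-Python sketch below keeps only documents that satisfy the predicate and then ranks the survivors by sparse dot product. It assumes nothing about how TopK evaluates filters internally; `filtered_top_k` and the sample data are hypothetical.

```python
def sparse_dot(query: dict[int, float], stored: dict[int, float]) -> float:
    """Dot product over the indices present in the query vector."""
    return sum(w * stored.get(i, 0.0) for i, w in query.items())


def filtered_top_k(docs: list[dict], query: dict[int, float], min_year: int, k: int) -> list[str]:
    # Keep only documents published strictly after `min_year`.
    candidates = [d for d in docs if d["published_year"] > min_year]
    # Rank the remaining documents by score, highest first.
    ranked = sorted(
        candidates,
        key=lambda d: sparse_dot(query, d["title_embedding"]),
        reverse=True,
    )
    return [d["_id"] for d in ranked[:k]]


# Hypothetical documents, for illustration only.
docs = [
    {"_id": "a", "published_year": 1995, "title_embedding": {0: 5.0}},
    {"_id": "b", "published_year": 2005, "title_embedding": {0: 1.0}},
    {"_id": "c", "published_year": 2010, "title_embedding": {6: 1.0}},
    {"_id": "d", "published_year": 2000, "title_embedding": {0: 9.0}},
]

ids = filtered_top_k(docs, {0: 1.0, 6: 2.0}, min_year=2000, k=10)
print(ids)  # ['c', 'b']
```

Documents `a` and `d` would score highest on the dot product alone, but neither passes the `published_year > 2000` predicate, so they never appear in the results.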
