# Semantic search

> Add state-of-the-art semantic search to your app with no embedding pipeline, no vector store, and no reranking service to maintain.

<Info>
  **Prerequisites**

  * TopK account ([Sign up here](https://console.topk.io/login))
  * TopK API key ([Get an API key here](https://console.topk.io/api-key))
</Info>

TopK lets you add semantic search to your documents with just a few lines of code.

Add [`semantic_index()`](/sdk/topk-py/schema#semantic_index) to your collection schema and query with [`fn.semantic_similarity()`](/sdk/topk-py/query#semantic_similarity):

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.schema import text, semantic_index
  from topk_sdk.query import select, field, fn

  # `client` is assumed to be an already-initialized TopK client.
  client.collections().create("books", schema={
      "title": text().required().index(semantic_index()),
  })

  docs = client.collection("books").query(
      select("title", title_similarity=fn.semantic_similarity("title", "classic novel"))
      .sort(field("title_similarity"), asc=False)
      .limit(10)
  )
  ```

  ```typescript Javascript theme={null}
  import { text, semanticIndex } from "topk-js/schema";
  import { select, field, fn } from "topk-js/query";

  // `client` is assumed to be an already-initialized TopK client.
  await client.collections().create("books", {
    title: text().required().index(semanticIndex()),
  });

  const docs = await client.collection("books").query(
    select({ title: field("title"), title_similarity: fn.semanticSimilarity("title", "classic novel") })
      .sort(field("title_similarity"), false)
      .limit(10)
  );
  ```

  ```sql SQL theme={null}
  CREATE TABLE books (
    title TEXT NOT NULL INDEX semantic_index()
  );

  INSERT INTO books (_id, title)
  VALUES
    ('gatsby',  'The Great Gatsby'),
    ('1984',    '1984'),
    ('catcher', 'The Catcher in the Rye');

  SELECT title, semantic_similarity(title, 'classic novel') AS title_similarity
  FROM books
  ORDER BY title_similarity DESC
  LIMIT 10;
  ```
</CodeGroup>

Under the hood, `semantic_index()` is powered by [Iso-ModernColBERT](https://huggingface.co/topk-io/Iso-ModernColBERT), TopK's own multi-vector embedding model, combined with [Sparse Multi-Vector Encoding (SMVE)](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval) for scalable retrieval and quantized *MaxSim* reranking.

<Note>
  **Why multi-vector?** Single-vector (dense) embeddings compress an entire document into one point in high-dimensional space, which can blur the token-level details that separate a relevant passage from a merely related one.

  Multi-vector models like [Iso-ModernColBERT](https://huggingface.co/topk-io/Iso-ModernColBERT) keep one embedding per token, enabling token-level matching via MaxSim scoring. This consistently outperforms dense models on out-of-domain content, long documents, specific clauses, tables, and structured data.

  Read [High-Quality Search, Out of the Box](https://www.topk.io/blog/20260611-semantic-index-multi-vector-retrieval) for benchmarks and a deep-dive into the architecture.
</Note>
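
To make the contrast concrete, here is a minimal sketch in plain Python. It is not TopK's implementation: the 2-dimensional vectors are made-up stand-ins for token embeddings, mean pooling is only one simple way to build a single vector, and `maxsim` and `mean_pool` are illustrative helper names. It shows how averaging tokens into one vector can hide a document that matches every query token, while MaxSim still finds it.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """For each query token keep its best-matching document token, then sum."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def mean_pool(tokens: List[Vector]) -> Vector:
    """Collapse token vectors into one vector by averaging each dimension."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


# Toy token embeddings (assumed values, for illustration only).
query = [[1.0, 0.0], [0.0, 1.0]]

# doc_a contains exact matches for both query tokens, plus two unrelated
# tokens that happen to cancel them out when averaged.
doc_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

# doc_b has a single token that is only partially aligned with each query token.
doc_b = [[0.6, 0.8]]

maxsim_a = maxsim(query, doc_a)  # 2.0: both query tokens find an exact match
maxsim_b = maxsim(query, doc_b)  # roughly 1.4

pooled_a = dot(mean_pool(query), mean_pool(doc_a))  # 0.0: the average of doc_a is the zero vector
pooled_b = dot(mean_pool(query), mean_pool(doc_b))  # roughly 0.7

print("MaxSim prefers doc_a:", maxsim_a > maxsim_b)
print("Pooled single vector prefers doc_b:", pooled_b > pooled_a)
```

In this toy setup the two scoring schemes rank the documents in opposite order, which is the kind of token-level signal a single pooled vector cannot represent.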

## How to perform a semantic search

In the following example, we'll:

<Steps>
  <Step title="Define a collection schema">Create a collection configured for semantic search.</Step>
  <Step title="Add documents">Insert documents into the collection.</Step>
  <Step title="Run a semantic query">Retrieve documents using a free-form text query.</Step>
</Steps>

### Define a collection schema

Semantic search is enabled by adding a [`semantic_index()`](/sdk/topk-py/schema#semantic_index) to a [`text()`](/sdk/topk-py/schema#text) field in the collection schema:

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.schema import text, semantic_index

  client.collections().create(
      "books",
      schema={
          "title": text().required().index(semantic_index()),
      },
  )
  ```

  ```typescript Javascript theme={null}
  import { text, semanticIndex } from "topk-js/schema";

  await client.collections().create("books", {
    title: text().required().index(semanticIndex()),
  });
  ```

  ```sql SQL theme={null}
  CREATE TABLE books (
    title TEXT NOT NULL INDEX semantic_index()
  );
  ```
</CodeGroup>

This configuration automatically generates multi-vector embeddings for the field and enables keyword search. Documents are indexed with sub-second lag, so they become searchable almost immediately after they are written.

<Tip>
  If you want to use your own embeddings instead of TopK's built-in `semantic_index()`, see the [Vector Search](/guides/vector-search) guide.
</Tip>

### Add documents to the collection

Let's add some documents to the collection:

<CodeGroup>
  ```python Python theme={null}
  client.collection("books").upsert(
      [
          {"_id": "gatsby", "title": "The Great Gatsby"},
          {"_id": "1984", "title": "1984"},
          {"_id": "catcher", "title": "The Catcher in the Rye"},
      ],
  )
  ```

  ```typescript Javascript theme={null}
  await client.collection("books").upsert([
    { _id: "gatsby", title: "The Great Gatsby" },
    { _id: "1984", title: "1984" },
    { _id: "catcher", title: "The Catcher in the Rye" },
  ]);
  ```

  ```sql SQL theme={null}
  INSERT INTO books (_id, title)
  VALUES
    ('gatsby',  'The Great Gatsby'),
    ('1984',    '1984'),
    ('catcher', 'The Catcher in the Rye');
  ```
</CodeGroup>

### Run a semantic query

To search for documents based on semantic similarity, use the [`fn.semantic_similarity()`](/sdk/topk-py/query#semantic_similarity) function:

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.query import select, field, fn

  docs = client.collection("books").query(
      select(
          "title",
          title_similarity=fn.semantic_similarity("title", "classic American novel"),
      )
      .sort(field("title_similarity"), asc=False)
      .limit(10)
  )

  # Example results:

  [
    {
      "_id": "catcher",
      "title": "The Catcher in the Rye",
      "title_similarity": 0.9497610926628113
    },
    {
      "_id": "gatsby",
      "title": "The Great Gatsby",
      "title_similarity": 0.9480283856391907
    }
  ]
  ```

  ```typescript Javascript theme={null}
  import { select, field, fn } from "topk-js/query";

  const docs = await client.collection("books").query(
    select({
      title: field("title"),
      title_similarity: fn.semanticSimilarity("title", "classic American novel"),
    })
      .sort(field("title_similarity"), false)
      .limit(10)
  );

  // Example results:

  [
    {
      _id: 'catcher',
      title: 'The Catcher in the Rye',
      title_similarity: 0.9497610926628113
    },
    {
      _id: 'gatsby',
      title: 'The Great Gatsby',
      title_similarity: 0.9480283856391907
    }
  ]
  ```

  ```sql SQL theme={null}
  SELECT
    title,
    semantic_similarity(title, 'classic American novel') AS title_similarity
  FROM books
  ORDER BY title_similarity DESC
  LIMIT 10;
  ```
</CodeGroup>

Let's break down the example above:

1. The `semantic_similarity()` function encodes the query `"classic American novel"` into multi-vector token embeddings using Iso-ModernColBERT and scores each document via quantized *MaxSim*: each query token is compared against every document token, its best match is kept, and the per-token maxima are summed.
2. Candidate retrieval is accelerated by [SMVE](https://www.topk.io/blog/20260311-smve-multi-vector-retrieval), which uses fast sparse approximations to identify a small set of candidates before the full MaxSim pass.
3. The results are ranked by their MaxSim score and the top 10 most relevant documents are returned.

This works **out of the box**, with no embedding pipeline, vector store, or reranking service to manage.
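
The retrieve-then-rerank flow described in the steps above can be sketched in plain Python. This is a generic two-stage illustration under stated assumptions, not SMVE itself: `cheap_score` is a hypothetical stand-in for the fast first-stage approximation (the real sparse encoding is described in the linked blog post), the `candidates` parameter is invented for the example, and the vectors are made-up toy values.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Exact late-interaction score: best document token per query token, summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def cheap_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Hypothetical first-stage proxy: dot product of the summed token vectors."""
    query_sum = [sum(column) for column in zip(*query_tokens)]
    doc_sum = [sum(column) for column in zip(*doc_tokens)]
    return dot(query_sum, doc_sum)


def search(
    query_tokens: List[Vector],
    corpus: Dict[str, List[Vector]],
    candidates: int,
    top_k: int,
) -> List[Tuple[str, float]]:
    # Stage 1: shortlist the best `candidates` documents using the cheap proxy.
    shortlist = sorted(
        corpus,
        key=lambda doc_id: cheap_score(query_tokens, corpus[doc_id]),
        reverse=True,
    )[:candidates]
    # Stage 2: run the full MaxSim pass on the shortlist only, then rank.
    scored = [(doc_id, maxsim(query_tokens, corpus[doc_id])) for doc_id in shortlist]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy corpus of token embeddings (assumed values, for illustration only).
corpus = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[0.6, 0.8]],
    "c": [[-1.0, 0.0]],
}
query = [[1.0, 0.0], [0.0, 1.0]]

results = search(query, corpus, candidates=2, top_k=10)
print([doc_id for doc_id, _ in results])  # ['a', 'b']; 'c' never reaches the MaxSim pass
```

The point of the split is cost: the expensive token-by-token comparison runs only on the shortlist, so the quality of the final ranking depends on the first stage not dropping relevant documents.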

## Combining semantic and keyword search

For certain use cases, you might want to use a combination of **keyword search** and **semantic search**:

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.query import select, field, fn, match

  docs = client.collection("books").query(
      select(
          "title",
          title_similarity=fn.semantic_similarity("title", "catcher"),
          text_score=fn.bm25_score(),  # Keyword-based relevance
      )
      # Keep only books containing the keyword "classic" in any text-indexed field
      .filter(match("classic"))
      # Weight semantic similarity at 70% and keyword relevance at 30%
      .sort(field("title_similarity") * 0.7 + field("text_score") * 0.3, asc=False)
      .limit(10)
  )
  ```

  ```typescript Javascript theme={null}
  import { select, field, fn, match } from "topk-js/query";

  const docs = await client.collection("books").query(
    select({
      title: field("title"),
      title_similarity: fn.semanticSimilarity("title", "catcher"),
      text_score: fn.bm25Score(), // Keyword-based relevance
    })
      .filter(match("classic")) // Ensure the book contains the keyword "classic" in any of the text-indexed fields
      // Add 70% weight to semantic similarity and 30% weight to keyword relevance
      .sort(field("title_similarity").mul(0.7).add(field("text_score").mul(0.3)), false)
      .limit(10)
  );
  ```

  ```sql SQL theme={null}
  SELECT
    title,
    semantic_similarity(title, 'catcher') AS title_similarity,
    bm25_score() AS text_score
  FROM books
  WHERE match('classic')
  ORDER BY title_similarity * 0.7 + text_score * 0.3 DESC
  LIMIT 10;
  ```
</CodeGroup>

The example above combines **keyword relevance (BM25)** with **semantic similarity**, so results reflect both exact matches and contextual meaning. The weights define a **custom scoring function** that you can tune for your use case.
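
When choosing weights, it helps to check how the two scores behave on their own scales. The sketch below reproduces the `title_similarity * 0.7 + text_score * 0.3` expression in plain Python with made-up scores; `hybrid_score` and the two document names are hypothetical. It assumes the BM25 score is not normalized to the same range as the similarity score, which is true of BM25 in general but should be verified against the values your own queries return.

```python
def hybrid_score(semantic: float, bm25: float, semantic_weight: float = 0.7) -> float:
    """Weighted sum of a semantic similarity score and a BM25 score."""
    return semantic_weight * semantic + (1.0 - semantic_weight) * bm25


# Made-up scores for two hypothetical documents.
docs = {
    "strong_semantic": {"title_similarity": 0.95, "text_score": 1.0},
    "strong_keyword": {"title_similarity": 0.60, "text_score": 4.0},
}

ranked = sorted(
    docs,
    key=lambda doc_id: hybrid_score(
        docs[doc_id]["title_similarity"], docs[doc_id]["text_score"]
    ),
    reverse=True,
)

for doc_id in ranked:
    scores = docs[doc_id]
    print(doc_id, round(hybrid_score(scores["title_similarity"], scores["text_score"]), 3))
```

Here the keyword-heavy document ranks first even though semantic similarity carries 70% of the weight, because its BM25 score is several times larger in absolute terms. If that is not the behavior you want, rescale one of the scores or adjust the weights accordingly.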
