# Semantic search

TopK enables you to add semantic search to your documents with just a few lines of code.
There is no need to build or maintain a separate embedding pipeline.

## How to perform a semantic search

In the following example, we'll:

<Steps>
  <Step title="Define a collection schema">Create a collection configured for semantic search.</Step>
  <Step title="Add documents">Insert documents into the collection.</Step>
  <Step title="Run a semantic query">Retrieve documents using a free-form text query.</Step>
</Steps>

### Define a collection schema

Semantic search is enabled by adding a [`semantic_index()`](/sdk/topk-py/schema#semantic_index) to a [`text()`](/sdk/topk-py/schema#text) field in the collection schema:

<CodeGroup>
  ```python Python
  # Assumes `client` is an already-initialized TopK client.
  from topk_sdk.schema import text, semantic_index

  client.collections().create(
      "books",
      schema={
          "title": text().required().index(semantic_index()),
      },
  )
  ```

  ```typescript Javascript
  // Assumes `client` is an already-initialized TopK client.
  import { text, semanticIndex } from "topk-js/schema";

  await client.collections().create("books", {
    title: text().required().index(semanticIndex()),
  });
  ```
</CodeGroup>

With this configuration, TopK automatically generates embeddings for each text field that has a semantic index and also enables keyword search on those fields.

<Tip>
  If you want to use your own embeddings instead of TopK's built-in `semantic_index()`, see the [Vector Search](/concepts/vector-search) guide.
</Tip>

### Add documents to the collection

Let's add some documents to the collection:

<CodeGroup>
  ```python Python
  client.collection("books").upsert(
      [
          {"_id": "gatsby", "title": "The Great Gatsby"},
          {"_id": "1984", "title": "1984"},
          {"_id": "catcher", "title": "The Catcher in the Rye"},
      ],
  )
  ```

  ```typescript Javascript
  await client.collection("books").upsert([
    { _id: "gatsby", title: "The Great Gatsby" },
    { _id: "1984", title: "1984" },
    { _id: "catcher", title: "The Catcher in the Rye" },
  ]);
  ```
</CodeGroup>

### Run a semantic query

To search for documents based on semantic similarity, use the [`fn.semantic_similarity()`](/sdk/topk-py/query#semantic_similarity) function:

<CodeGroup>
  ```python Python
  from topk_sdk.query import select, field, fn

  docs = client.collection("books").query(
      select(
          "title",
          title_similarity=fn.semantic_similarity("title", "classic American novel"),
      )
      .topk(field("title_similarity"), 10)
  )

  # Example results:

  [
    {
      "_id": "catcher",
      "title": "The Catcher in the Rye",
      "title_similarity": 0.9497610926628113
    },
    {
      "_id": "gatsby",
      "title": "The Great Gatsby",
      "title_similarity": 0.9480283856391907
    }
  ]
  ```

  ```typescript Javascript
  import { select, field, fn } from "topk-js/query";

  const docs = await client.collection("books").query(
    select({
      title: field("title"),
      title_similarity: fn.semanticSimilarity("title", "classic American novel"),
    })
      .topk(field("title_similarity"), 10)
  );

  // Example results:

  [
    {
      _id: 'catcher',
      title: 'The Catcher in the Rye',
      title_similarity: 0.9497610926628113
    },
    {
      _id: 'gatsby',
      title: 'The Great Gatsby',
      title_similarity: 0.9480283856391907
    }
  ]
  ```
</CodeGroup>

Let's break down the example above:

1. The `semantic_similarity()` function computes the similarity between the query `"classic American novel"` and the text value stored in the `title` field for each document.
2. TopK performs automatic query embedding under the hood.
3. The results are ranked based on similarity, and the top 10 most relevant documents are returned.

This works **out of the box**, with no need to manage embeddings or external APIs.
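
As a rough illustration of the ranking described above, the sketch below scores a few documents against a query and keeps the best matches. Everything in it is an assumption made for illustration: the three-dimensional vectors are hand-made, `cosine_similarity` and `top_k` are hypothetical helpers, and cosine similarity is only one common choice of metric. TopK generates real embeddings for you, and this page does not specify which model or similarity metric it uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the k (doc_id, score) pairs with the highest similarity."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
toy_embeddings = {
    "gatsby": [0.8, 0.2, 0.1],
    "1984": [0.1, 0.9, 0.3],
    "catcher": [0.9, 0.1, 0.2],
}
# Hypothetical embedding of the query text.
query_embedding = [1.0, 0.0, 0.0]

ranked = top_k(query_embedding, toy_embeddings, k=2)
for doc_id, score in ranked:
    print(doc_id, round(score, 4))
```

With these toy vectors, the two documents whose vectors point in nearly the same direction as the query rank first, and the unrelated one is dropped by the `k=2` cutoff.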

## Combining semantic and keyword search

For some use cases, you may want to combine **keyword search** and **semantic search** in a single query:

<CodeGroup>
  ```python Python
  from topk_sdk.query import select, field, fn, match

  docs = client.collection("books").query(
      select(
          "title",
          title_similarity=fn.semantic_similarity("title", "catcher"),
          text_score=fn.bm25_score(),  # Keyword-based relevance
      )
      # Keep only books that contain the keyword "classic" in any text-indexed field
      .filter(match("classic"))
      # Weight semantic similarity at 70% and keyword relevance at 30%
      .topk(field("title_similarity") * 0.7 + field("text_score") * 0.3, 10)
  )
  ```

  ```typescript Javascript
  import { select, field, fn, match } from "topk-js/query";

  const docs = await client.collection("books").query(
    select({
      title: field("title"),
      title_similarity: fn.semanticSimilarity("title", "catcher"),
      text_score: fn.bm25Score(), // Keyword-based relevance
    })
      .filter(match("classic")) // Ensure the book contains the keyword "classic" in any of the text-indexed fields
      .topk(
        // Add 70% weight to semantic similarity and 30% weight to keyword relevance
        field("title_similarity").mul(0.7).add(field("text_score").mul(0.3)),
        10
      )
  );
  ```
</CodeGroup>

The example above combines **keyword relevance (BM25)** with **semantic similarity** in a **custom scoring function**, so your results reflect both exact keyword matches and contextual meaning. Adjust the weights to suit your use case.
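
To make the weighting concrete, here is a minimal sketch of the same 70/30 arithmetic applied to two candidates outside of TopK. The scores, the document IDs `a` and `b`, and the `combined_score` helper are hypothetical and chosen only to show the calculation.

```python
def combined_score(similarity, bm25, w_semantic=0.7, w_keyword=0.3):
    """Weighted sum of a semantic similarity score and a BM25 score."""
    return w_semantic * similarity + w_keyword * bm25


# Hypothetical per-document scores, named after the fields selected in the query.
candidates = [
    {"_id": "a", "title_similarity": 0.90, "text_score": 0.5},
    {"_id": "b", "title_similarity": 0.80, "text_score": 2.0},
]

for doc in candidates:
    doc["score"] = combined_score(doc["title_similarity"], doc["text_score"])

# Highest combined score first, as topk() would order the results.
ranked = sorted(candidates, key=lambda doc: doc["score"], reverse=True)
for doc in ranked:
    print(doc["_id"], round(doc["score"], 2))
```

In this sketch, document `b` outranks `a` despite its lower semantic similarity, because BM25 scores are not bounded to a fixed range and can outweigh the similarity term even at a 30% weight. If the two scores live on very different scales in your data, consider tuning the weights accordingly.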
