
# Search

> Retrieve the most relevant passages from your documents.

Search runs a natural-language query against your documents and returns the most relevant passages.

## How it works

When you run **Search**, TopK:

<Steps>
  <Step title="Searches across your documents">
    Searches one or more datasets for the passages most relevant to the query.
  </Step>

  <Step title="Ranks the best matches">
    Results are ranked by relevance so the highest-signal passages come first.
  </Step>

  <Step title="Returns passages with document context">
    Each result includes the matched passage, document ID, dataset, page references, and requested metadata.
  </Step>
</Steps>

## Usage

Once your documents are processed, you can start retrieving relevant passages immediately.

<Tabs>
  <Tab title="CLI" icon="terminal">
    ```bash theme={null}
    topk search "travel reimbursement policy" -d policies
    ```
  </Tab>

  <Tab title="Python SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/python.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=97cbee7891538170fd752e1afbc98095" width="128" height="128" data-path="icons/python.svg">
    <CodeGroup>
      ```python Sync theme={null}
      import os
      from topk_sdk import Client

      client = Client(
          api_key=os.environ.get("TOPK_API_KEY"),
          region="aws-us-east-1-elastica",
      )

      for result in client.search(
          query="travel reimbursement policy",
          datasets=["policies"],
          top_k=10,
      ):
          print(result)
      ```

      ```python Async theme={null}
      import os
      import asyncio
      from topk_sdk import AsyncClient

      client = AsyncClient(
          api_key=os.environ.get("TOPK_API_KEY"),
          region="aws-us-east-1-elastica",
      )

      async def main() -> None:
          async for result in client.search(
              query="travel reimbursement policy",
              datasets=["policies"],
              top_k=10,
          ):
              print(result)

      asyncio.run(main())
      ```
    </CodeGroup>
  </Tab>

  <Tab title="JavaScript SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/js.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=7642cf18b45f52a70f141214b3d0eca1" width="24" height="24" data-path="icons/js.svg">
    ```typescript theme={null}
    import { Client } from "topk-js";

    const client = new Client({
      apiKey: process.env.TOPK_API_KEY,
      region: "aws-us-east-1-elastica",
    });

    for await (const message of client.search("travel reimbursement policy", ["policies"], 10)) {
      console.log(message);
    }
    ```
  </Tab>
</Tabs>

Here's an example of a **Search** query over a financial dataset:

<Info>
  **Query:**

  What was the total net income of Bank of America in 2024?

  **Search results:**

  * <Badge color="purple">1</Badge> Condensed Statement of Cash Flows showing net income of \$27,132m (2024) vs \$26,515m (2023)
    <a href="https://topk-docs.s3.us-east-2.amazonaws.com/bank_of_america_2024.pdf#page=170" target="_blank" rel="noopener noreferrer">bank\_of\_america\_2024.pdf</a> <Badge color="gray" shape="pill">p. 170</Badge>
  * <Badge color="purple">2</Badge> Consolidated Statement of Comprehensive Income: net income line item for 2024–2022
    <a href="https://topk-docs.s3.us-east-2.amazonaws.com/bank_of_america_2024.pdf#page=92" target="_blank" rel="noopener noreferrer">bank\_of\_america\_2024.pdf</a> <Badge color="gray" shape="pill">p. 92</Badge>
  * <Badge color="purple">3</Badge> Supporting figure from the filing (tabular financial excerpt)
    <a href="https://topk-docs.s3.us-east-2.amazonaws.com/boa-ask-ref-3-figure.jpg" target="_blank" rel="noopener noreferrer">boa-ask-ref-3-figure.jpg</a>
  * <Badge color="purple">4</Badge> Key performance indicators—selected annual financial data (including net income)
    <a href="https://topk-docs.s3.us-east-2.amazonaws.com/bank_of_america_2024.pdf#page=33" target="_blank" rel="noopener noreferrer">bank\_of\_america\_2024.pdf</a> <Badge color="gray" shape="pill">pp. 33–36</Badge>
  * <Badge color="purple">5</Badge> Segment results tying to total-corporation net income
    <a href="https://topk-docs.s3.us-east-2.amazonaws.com/bank_of_america_2024.pdf#page=166" target="_blank" rel="noopener noreferrer">bank\_of\_america\_2024.pdf</a> <Badge color="gray" shape="pill">pp. 166–168</Badge>
  * <Badge color="purple">6</Badge> Executive summary—summary income statement and balance sheet excerpts
    <a href="https://topk-docs.s3.us-east-2.amazonaws.com/bank_of_america_2024.pdf#page=29" target="_blank" rel="noopener noreferrer">bank\_of\_america\_2024.pdf</a> <Badge color="gray" shape="pill">pp. 29–30</Badge>
</Info>

## Understanding search results

Search returns a list of the most relevant Search Results based on the provided query.
[Search Results](/core/search#search-result) are objects referencing specific passages extracted from the original documents.

### Search Result

Each Search Result has the following fields:

| Field        | Type     | Description                                                                                                                         |
| ------------ | -------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `doc_id`     | `string` | The ID of the source document assigned at upload time                                                                               |
| `doc_name`   | `string` | The file name of the source document assigned at upload time                                                                        |
| `doc_type`   | `string` | The MIME type of the source document (e.g. `application/pdf`)                                                                       |
| `dataset`    | `string` | The dataset the document belongs to                                                                                                 |
| `content_id` | `string` | A unique identifier for the content of this Search Result                                                                           |
| `content`    | `object` | The matched content — see [Content types](#content-types)                                                                           |
| `metadata`   | `object` | Metadata fields attached to the source document at upload time — must be requested, see [Retrieving metadata](#retrieving-metadata) |
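
As a rough illustration of the table above, the fields can be mirrored in a small local record type. This is only a sketch: `SearchResultRecord` and `to_record` are hypothetical names, not part of the TopK SDK, and the code assumes you already have a result as a plain `dict` keyed by the field names in the table.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SearchResultRecord:
    """Hypothetical local mirror of the Search Result fields (not an SDK type)."""

    doc_id: str
    doc_name: str
    doc_type: str
    dataset: str
    content_id: str
    content: dict[str, Any]
    # Stays empty unless metadata fields were requested in the search call.
    metadata: dict[str, Any] = field(default_factory=dict)


def to_record(raw: dict[str, Any]) -> SearchResultRecord:
    """Build a record from a dict that uses the field names in the table."""
    return SearchResultRecord(
        doc_id=raw["doc_id"],
        doc_name=raw["doc_name"],
        doc_type=raw["doc_type"],
        dataset=raw["dataset"],
        content_id=raw["content_id"],
        content=raw["content"],
        # Treat a missing or null metadata entry as "no metadata requested".
        metadata=raw.get("metadata") or {},
    )
```

Keeping `metadata` optional reflects that it is only populated when fields are requested.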

### Content types

A Search Result contains one of three content types:

* **Chunk** — a text passage extracted from a document, with optional source page number(s)
* **Image** — an image extracted from a document
* **Page** — an image of a page from a document, with source page number
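
A minimal sketch of branching on the content type, assuming the result has been converted to a plain `dict` shaped like the Python example output under [Retrieving metadata](#retrieving-metadata), where the content sits under `content.data.Chunk`. The `"Image"` and `"Page"` keys are assumed by analogy with `"Chunk"`, and `content_kind` and `chunk_text` are hypothetical helpers, not SDK functions.

```python
from typing import Any, Optional


def content_kind(content: dict[str, Any]) -> Optional[str]:
    """Return the single key under "data" (e.g. "Chunk"), or None if unclear."""
    data = content.get("data")
    if isinstance(data, dict) and len(data) == 1:
        return next(iter(data))
    return None


def chunk_text(content: dict[str, Any]) -> Optional[str]:
    """Return the passage text for a Chunk; None for any other content type."""
    if content_kind(content) != "Chunk":
        return None
    return content["data"]["Chunk"].get("text")
```

Text-only consumers can skip results where `chunk_text` returns `None` and route the remaining ones to whatever handles images.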

## Scoping the search

Query across multiple datasets or apply document filters to narrow the scope of the query.

### Scoping to specific datasets

Restricting a search to specific datasets is useful when you want:

* More targeted results
* Less ambiguity across unrelated document sets
* Tighter control over what content an agent is allowed to see

<Tabs>
  <Tab title="CLI" icon="terminal">
    Use `-d` / `--dataset` (repeatable):

    ```bash theme={null}
    topk search "What was the total net income of Bank of America in 2024?" -d finance -d compliance
    ```
  </Tab>

  <Tab title="Python SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/python.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=97cbee7891538170fd752e1afbc98095" width="128" height="128" data-path="icons/python.svg">
    To specify the datasets to search against, pass them in `datasets=`:

    <CodeGroup>
      ```python Sync theme={null}
      for result in client.search(
          query="What was the total net income of Bank of America in 2024?",
          datasets=["finance", "compliance"],
          top_k=10,
      ):
          print(result)
      ```

      ```python Async theme={null}
      import asyncio

      async def main() -> None:
          async for result in client.search(
              query="What was the total net income of Bank of America in 2024?",
              datasets=["finance", "compliance"],
              top_k=10,
          ):
              print(result)

      asyncio.run(main())
      ```
    </CodeGroup>
  </Tab>

  <Tab title="JavaScript SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/js.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=7642cf18b45f52a70f141214b3d0eca1" width="24" height="24" data-path="icons/js.svg">
    To specify the datasets to search against, pass them in the `datasets` argument:

    ```typescript theme={null}
    for await (const message of client.search(
      "What was the total net income of Bank of America in 2024?",
      ["finance", "compliance"],
      10,
    )) {
      console.log(message);
    }
    ```
  </Tab>
</Tabs>
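
When a query spans several datasets, each Search Result carries a `dataset` field, so results can be regrouped on the client. The following is a sketch that assumes the results have been collected as plain dicts; `group_by_dataset` is a hypothetical helper, not an SDK function.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_by_dataset(
    results: Iterable[dict[str, Any]],
) -> dict[str, list[dict[str, Any]]]:
    """Group results by their "dataset" field, keeping the incoming order."""
    grouped: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)
    for result in results:
        grouped[result["dataset"]].append(result)
    # Return a plain dict so missing keys raise instead of creating entries.
    return dict(grouped)
```

Because the incoming order is preserved, results within each group remain in the order in which the search returned them.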

### Filter documents

Sometimes a dataset might contain documents that should not be considered for the query. You can filter out documents that don't match your criteria by providing a [filter expression](/documents/query#filtering).

Filter expressions operate on the **metadata fields** of documents, so any metadata attached to your documents at upload time can be used to narrow the search scope.

This is useful when you want to query:

* Documents within a **specific time range**
* Documents matching a **particular category** or type
* Documents associated with a **specific group** or owner
* Documents the end user is **permitted to access**

<Tabs>
  <Tab title="Python SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/python.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=97cbee7891538170fd752e1afbc98095" width="128" height="128" data-path="icons/python.svg">
    <CodeGroup>
      ```python Sync theme={null}
      from topk_sdk.query import field

      for result in client.search(
          query="travel reimbursement limit",
          datasets=[
              {
                  "dataset": "policies",
                  "filter": field("department").eq("finance").and_(
                      field("year").eq(2024)
                  ),
              }
          ],
          top_k=10,
      ):
          print(result)
      ```

      ```python Async theme={null}
      import asyncio
      from topk_sdk.query import field

      async def main() -> None:
          async for result in client.search(
              query="travel reimbursement limit",
              datasets=[
                  {
                      "dataset": "policies",
                      "filter": field("department").eq("finance").and_(
                          field("year").eq(2024)
                      ),
                  }
              ],
              top_k=10,
          ):
              print(result)

      asyncio.run(main())
      ```
    </CodeGroup>
  </Tab>

  <Tab title="JavaScript SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/js.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=7642cf18b45f52a70f141214b3d0eca1" width="24" height="24" data-path="icons/js.svg">
    ```typescript theme={null}
    import { field } from "topk-js/query";

    for await (const message of client.search(
      "travel reimbursement limit",
      [
        {
          dataset: "policies",
          filter: field("department").eq("finance").and(field("year").eq(2024)),
        },
      ],
      10,
    )) {
      console.log(message);
    }
    ```
  </Tab>
</Tabs>
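
The examples above pass each scoped dataset as an object with `dataset` and `filter` keys. If the same filter should apply to several datasets, those entries can be generated rather than written by hand. This is a sketch under assumptions: `scope_datasets` is a hypothetical helper, the filter is treated as an opaque value, and it assumes one filter expression can be reused across entries.

```python
from typing import Any, Iterable


def scope_datasets(names: Iterable[str], filter_expr: Any) -> list[dict[str, Any]]:
    """Build one {"dataset": ..., "filter": ...} entry per dataset name."""
    # filter_expr is passed through untouched; building it is left to the SDK.
    return [{"dataset": name, "filter": filter_expr} for name in names]
```

With the Python SDK, the result would be passed as `datasets=scope_datasets(["policies", "finance"], field("department").eq("finance"))`, using the same `field` builder as in the examples above.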

## Retrieving metadata

By default, metadata fields are not included in search results. Pass the field names you want returned and they will appear on each Search Result.

<Tabs>
  <Tab title="CLI" icon="terminal">
    Use `--field` (repeatable) and `--output json` to include metadata field(s) in the output:

    ```bash theme={null}
    topk search "refund policy" -d policies --field title --field author --field year --output json
    ```

    ```json title="Example output" expandable theme={null}
    [
      {
        "metadata": {
          "title": "Refund Policy 2024",
          "author": "Finance Team",
          "year": 2024
        },
        "doc_id": "refund-policy-2024",
        "doc_type": "application/pdf",
        "dataset": "policies",
        "content_id": "doc#refund-policy-2024#chunk#12",
        "doc_name": "refund_policy_2024.pdf",
        "content": {
          "text": "Refund requests must be submitted within 30 days of purchase. Approved refunds are processed within 5–7 business days.",
          "doc_pages": [3]
        }
      }
    ]
    ```
  </Tab>

  <Tab title="Python SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/python.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=97cbee7891538170fd752e1afbc98095" width="128" height="128" data-path="icons/python.svg">
    Pass a list of metadata field names to include in the output:

    <CodeGroup>
      ```python Sync theme={null}
      for result in client.search(
          query="refund policy",
          datasets=["policies"],
          top_k=10,
          select_fields=["title", "author", "year"],  # document metadata fields
      ):
          print(result)
      ```

      ```python Async theme={null}
      import asyncio

      async def main() -> None:
          async for result in client.search(
              query="refund policy",
              datasets=["policies"],
              top_k=10,
              select_fields=["title", "author", "year"],  # document metadata fields
          ):
              print(result)

      asyncio.run(main())
      ```
    </CodeGroup>

    ```json title="Example output" expandable theme={null}
    {
      "metadata": {
        "title": "Refund Policy 2024",
        "author": "Finance Team",
        "year": 2024
      },
      "doc_id": "refund-policy-2024",
      "doc_type": "application/pdf",
      "dataset": "policies",
      "content_id": "doc#refund-policy-2024#chunk#12",
      "doc_name": "refund_policy_2024.pdf",
      "content": {
        "data": {
          "Chunk": {
            "text": "Refund requests must be submitted within 30 days of purchase. Approved refunds are processed within 5–7 business days.",
            "doc_pages": [3]
          }
        }
      }
    }
    ```
  </Tab>

  <Tab title="JavaScript SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/js.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=7642cf18b45f52a70f141214b3d0eca1" width="24" height="24" data-path="icons/js.svg">
    Pass a list of metadata field names to include in the output:

    ```typescript theme={null}
    for await (const message of client.search(
      "refund policy",
      ["policies"],
      10,
      undefined,
      ["title", "author", "year"], // document metadata fields
    )) {
      console.log(message);
    }
    ```

    ```json title="Example output" expandable theme={null}
    {
      "metadata": {
        "title": "Refund Policy 2024",
        "author": "Finance Team",
        "year": 2024
      },
      "docId": "refund-policy-2024",
      "docType": "application/pdf",
      "dataset": "policies",
      "contentId": "doc#refund-policy-2024#chunk#12",
      "docName": "refund_policy_2024.pdf",
      "content": {
        "type": "chunk",
        "data": {
          "text": "Refund requests must be submitted within 30 days of purchase. Approved refunds are processed within 5–7 business days.",
          "docPages": [3]
        }
      }
    }
    ```
  </Tab>
</Tabs>

If present on the original document, the requested metadata appears on the returned [Search Result](/core/search#search-result).
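
Requested metadata is often used to label a passage when showing it to a user. The sketch below builds a short citation string from a result, assuming it is available as a plain `dict` shaped like the Python example output above. `format_citation` is a hypothetical helper, and the `title` field is only present if your documents carry it and you requested it.

```python
from typing import Any


def format_citation(result: dict[str, Any]) -> str:
    """Return a label like "<title or file name> (<dataset>), p. <pages>"."""
    metadata = result.get("metadata") or {}
    # Prefer a requested "title" metadata field; fall back to the file name.
    label = metadata.get("title") or result["doc_name"]
    chunk = result.get("content", {}).get("data", {}).get("Chunk", {})
    pages = chunk.get("doc_pages") or []
    page_part = ""
    if pages:
        prefix = "p." if len(pages) == 1 else "pp."
        page_part = f", {prefix} " + ", ".join(str(p) for p in pages)
    return f"{label} ({result['dataset']}){page_part}"
```

Falling back to `doc_name` keeps the label usable when metadata was not requested or the field is missing on the source document.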
