# Ingest

> Upload your documents to TopK Context Engine.

Before you can query your documents in TopK, you need to upload them. Once a document is uploaded,
TopK parses, chunks, and indexes its content to power [Ask](/core/ask), [Search](/core/search), and [Research](/core/research).

## Uploading documents

Documents are uploaded to a **dataset**. A dataset is a named container for your documents.
Each document in a dataset is identified by a **document ID**.

<Tip>
  Learn how to create your dataset [here](/datasets#create-a-dataset).
</Tip>

<Tabs>
  <Tab title="CLI" icon="terminal">
    ```bash expandable theme={null}
    # Upload all supported files in the directory
    topk upload ./docs --dataset my-docs

    # Upload all supported files in the directory and subdirectories recursively
    topk upload ./docs -r --dataset my-docs

    # Upload all supported files matching a glob pattern
    topk upload "./docs/*" --dataset my-docs

    # Upload all supported files matching a recursive glob pattern
    topk upload "./docs/**/*" -r --dataset my-docs

    # Upload all PDFs matching a glob pattern
    topk upload "./docs/*.pdf" --dataset my-docs

    # Upload all PDFs matching a recursive glob pattern
    topk upload "./docs/**/*.pdf" --dataset my-docs

    # Upload a single file
    topk upload ./report.pdf --dataset my-docs
    ```
  </Tab>

  <Tab title="Python SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/python.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=97cbee7891538170fd752e1afbc98095" width="128" height="128" data-path="icons/python.svg">
    <CodeGroup>
      ```python Sync theme={null}
      import os
      from pathlib import Path
      from topk_sdk import Client

      client = Client(
          api_key=os.environ["TOPK_API_KEY"],
          region=os.environ["TOPK_REGION"],
      )

      handle = client.dataset("my-docs").upsert_file(
          "report-2024",                       # document ID
          Path("./report.pdf"),                # path to file
          {"title": "Annual Report 2024"},     # document metadata (optional)
      )

      print(handle)  # e.g. "hdl_abc123"
      ```

      ```python Async theme={null}
      import os
      from pathlib import Path
      from topk_sdk import AsyncClient

      client = AsyncClient(
          api_key=os.environ["TOPK_API_KEY"],
          region=os.environ["TOPK_REGION"],
      )

      handle = await client.dataset("my-docs").upsert_file(
          "report-2024",                       # document ID
          Path("./report.pdf"),                # path to file
          {"title": "Annual Report 2024"},     # document metadata (optional)
      )

      print(handle)  # e.g. "hdl_abc123"
      ```
    </CodeGroup>
  </Tab>

  <Tab title="JavaScript SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/js.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=7642cf18b45f52a70f141214b3d0eca1" width="24" height="24" data-path="icons/js.svg">
    ```typescript theme={null}
    import { Client } from "topk-js";

    const client = new Client({
      apiKey: process.env.TOPK_API_KEY!,
      region: process.env.TOPK_REGION!,
    });

    const handle = await client.dataset("my-docs").upsertFile(
      "report-2024",                    // document ID
      { path: "./report.pdf" },         // path to file
      { title: "Annual Report 2024" },  // document metadata (optional)
    );

    console.log(handle);  // e.g. "hdl_abc123"
    ```
  </Tab>
</Tabs>
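
The Python SDK examples above upload one file per call. To upload several files from a script, one option is to loop over the paths and collect the returned processing handles. The sketch below is illustrative and not part of the SDK: `upload_all`, the path-derived document ID scheme, and the `source_path` metadata key are all hypothetical, and `dataset` is assumed to behave like `client.dataset("my-docs")` in the examples above.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def upload_all(dataset: Any, paths: Iterable[Path], base_dir: Path) -> Dict[str, Any]:
    """Upload each file and return a mapping of document ID -> processing handle.

    `dataset` must expose `upsert_file(doc_id, path, metadata)`, matching the
    call shape shown in the SDK examples above.
    """
    handles: Dict[str, Any] = {}
    for path in sorted(paths):
        relative = path.relative_to(base_dir)
        # Hypothetical ID scheme: relative path without its extension,
        # with path separators replaced by "-".
        doc_id = "-".join(relative.with_suffix("").parts)
        # Hypothetical metadata key recording where the file came from.
        metadata = {"source_path": relative.as_posix()}
        handles[doc_id] = dataset.upsert_file(doc_id, path, metadata)
    return handles
```

A call might look like `upload_all(client.dataset("my-docs"), paths, Path("./docs"))`. Under this ID scheme, files that differ only by extension (for example `report.pdf` and `report.md`) map to the same document ID, so choose a different scheme if that can occur in your directory.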

### Supported formats

| Format   | MIME type         | Extensions      |
| -------- | ----------------- | --------------- |
| PDF      | `application/pdf` | `.pdf`          |
| Markdown | `text/markdown`   | `.md`           |
| HTML     | `text/html`       | `.html`         |
| PNG      | `image/png`       | `.png`          |
| JPEG     | `image/jpeg`      | `.jpg`, `.jpeg` |
| GIF      | `image/gif`       | `.gif`          |
| WebP     | `image/webp`      | `.webp`         |
| TIFF     | `image/tiff`      | `.tiff`, `.tif` |
| BMP      | `image/bmp`       | `.bmp`          |
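
When uploading through an SDK, it can help to skip unsupported files before making any calls. The following is a minimal sketch built only from the table above: `SUPPORTED_MIME_TYPES`, `mime_type_for`, and `supported_files` are hypothetical helpers, and matching extensions case-insensitively is an assumption about what is convenient locally, not a statement about how TopK detects formats.

```python
from pathlib import Path
from typing import List, Optional

# Extension -> MIME type, copied from the supported formats table.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".md": "text/markdown",
    ".html": "text/html",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".tiff": "image/tiff",
    ".tif": "image/tiff",
    ".bmp": "image/bmp",
}


def mime_type_for(path: Path) -> Optional[str]:
    """Return the MIME type for a listed extension, or None if it is not listed."""
    # Assumption: compare extensions case-insensitively (".PDF" == ".pdf").
    return SUPPORTED_MIME_TYPES.get(path.suffix.lower())


def supported_files(directory: Path, recursive: bool = False) -> List[Path]:
    """Return the files under `directory` whose extension appears in the table."""
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in directory.glob(pattern)
        if p.is_file() and mime_type_for(p) is not None
    )
```

For example, `supported_files(Path("./docs"), recursive=True)` returns a sorted list of candidate paths that can then be passed to `upsert_file` one at a time.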

### Attaching metadata

Each document can carry its own metadata, such as a title, author, date, or category.
Pass the metadata as the third argument to `upsert_file` (Python) or `upsertFile` (JavaScript).

<Tabs>
  <Tab title="Python SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/python.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=97cbee7891538170fd752e1afbc98095" width="128" height="128" data-path="icons/python.svg">
    <CodeGroup>
      ```python Sync theme={null}
      from pathlib import Path

      handle = client.dataset("my-docs").upsert_file(
          "report-2024",
          Path("./report.pdf"),
          {
              "title": "Annual Report 2024",
              "author": "Finance Team",
              "year": 2024,
              "category": "financials",
              "tags": ["annual-report", "investor-relations"],
              "categories": ["finance", "public-company"],
          },
      )
      ```

      ```python Async theme={null}
      from pathlib import Path

      handle = await client.dataset("my-docs").upsert_file(
          "report-2024",
          Path("./report.pdf"),
          {
              "title": "Annual Report 2024",
              "author": "Finance Team",
              "year": 2024,
              "category": "financials",
              "tags": ["annual-report", "investor-relations"],
              "categories": ["finance", "public-company"],
          },
      )
      ```
    </CodeGroup>
  </Tab>

  <Tab title="JavaScript SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/js.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=7642cf18b45f52a70f141214b3d0eca1" width="24" height="24" data-path="icons/js.svg">
    ```typescript theme={null}
    const handle = await client.dataset("my-docs").upsertFile(
      "report-2024",
      { path: "./report.pdf" },
      {
        title: "Annual Report 2024",
        author: "Finance Team",
        year: 2024,
        category: "financials",
        tags: ["annual-report", "investor-relations"],
        categories: ["finance", "public-company"],
      },
    );
    ```
  </Tab>
</Tabs>

<Note>
  Metadata values can also include **lists** of strings or numbers, for example `tags` or `categories`.

  See all supported metadata types [here](/documents/upsert#supported-types).
</Note>

## Processing documents

After a document is uploaded, TopK returns a **processing handle**, a reference to the processing job.
The document is not available for retrieval until processing completes.

Processing runs asynchronously in the background and can take from a few seconds to a few minutes, depending on document size and complexity.

<Info>
  A document may not be retrievable immediately after upload. Check its status or wait for processing to complete before querying it.
</Info>

### Checking the status

To check whether a document has been processed, pass the **processing handle** returned by the upload call to `check_handle` (Python) or `checkHandle` (JavaScript). The call returns a boolean.

<Tabs>
  <Tab title="Python SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/python.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=97cbee7891538170fd752e1afbc98095" width="128" height="128" data-path="icons/python.svg">
    <CodeGroup>
      ```python Sync theme={null}
      import os
      from topk_sdk import Client

      client = Client(
          api_key=os.environ["TOPK_API_KEY"],
          region=os.environ["TOPK_REGION"],
      )

      processed = client.dataset("my-docs").check_handle(handle)
      print(processed)  # True or False
      ```

      ```python Async theme={null}
      import os
      from topk_sdk import AsyncClient

      client = AsyncClient(
          api_key=os.environ["TOPK_API_KEY"],
          region=os.environ["TOPK_REGION"],
      )

      processed = await client.dataset("my-docs").check_handle(handle)

      print(processed)  # True or False
      ```
    </CodeGroup>
  </Tab>

  <Tab title="JavaScript SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/js.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=7642cf18b45f52a70f141214b3d0eca1" width="24" height="24" data-path="icons/js.svg">
    ```typescript theme={null}
    import { Client } from "topk-js";

    const client = new Client({
      apiKey: process.env.TOPK_API_KEY!,
      region: process.env.TOPK_REGION!,
    });

    const processed = await client.dataset("my-docs").checkHandle(handle);

    console.log(processed);  // true or false
    ```
  </Tab>
</Tabs>

### Waiting for completion

Instead of checking the status yourself, you can block until processing completes: pass `--wait` to the CLI, or pass the **processing handle** to the SDK:

<Tabs>
  <Tab title="CLI" icon="terminal">
    Pass `--wait` to block until processing is complete:

    ```bash theme={null}
    topk upload ./report.pdf --dataset my-docs --wait
    ```
  </Tab>

  <Tab title="Python SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/python.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=97cbee7891538170fd752e1afbc98095" width="128" height="128" data-path="icons/python.svg">
    <CodeGroup>
      ```python Sync theme={null}
      import os
      from topk_sdk import Client

      client = Client(
          api_key=os.environ["TOPK_API_KEY"],
          region=os.environ["TOPK_REGION"],
      )

      client.dataset("my-docs").wait_for_handle(handle)

      # Document is now ready to be queried
      ```

      ```python Async theme={null}
      import os
      from topk_sdk import AsyncClient

      client = AsyncClient(
          api_key=os.environ["TOPK_API_KEY"],
          region=os.environ["TOPK_REGION"],
      )

      await client.dataset("my-docs").wait_for_handle(handle)

      # Document is now ready to be queried
      ```
    </CodeGroup>

    You can customize polling behavior by passing a `WaitConfig`:

    <CodeGroup>
      ```python Sync theme={null}
      from topk_sdk import WaitConfig

      client.dataset("my-docs").wait_for_handle(
          handle,
          WaitConfig(frequency_secs=10, timeout_secs=600),
      )
      ```

      ```python Async theme={null}
      from topk_sdk import WaitConfig

      await client.dataset("my-docs").wait_for_handle(
          handle,
          WaitConfig(frequency_secs=10, timeout_secs=600),
      )
      ```
    </CodeGroup>

    | Parameter        | Default | Description                               |
    | ---------------- | ------- | ----------------------------------------- |
    | `frequency_secs` | `5`     | Seconds between status checks             |
    | `timeout_secs`   | `300`   | Maximum seconds to wait before timing out |
  </Tab>

  <Tab title="JavaScript SDK" icon="https://mintcdn.com/topk/8NBkS0nek3e9o6Vi/icons/js.svg?fit=max&auto=format&n=8NBkS0nek3e9o6Vi&q=85&s=7642cf18b45f52a70f141214b3d0eca1" width="24" height="24" data-path="icons/js.svg">
    ```typescript theme={null}
    import { Client } from "topk-js";

    const client = new Client({
      apiKey: process.env.TOPK_API_KEY!,
      region: process.env.TOPK_REGION!,
    });

    await client.dataset("my-docs").waitForHandle(handle);

    // Document is now ready to be queried
    ```

    You can customize polling behavior by passing a `WaitConfig`:

    ```typescript theme={null}
    await client.dataset("my-docs").waitForHandle(handle, {
      frequencySecs: 10,
      timeoutSecs: 600,
    });
    ```

    | Parameter       | Default | Description                               |
    | --------------- | ------- | ----------------------------------------- |
    | `frequencySecs` | `5`     | Seconds between status checks             |
    | `timeoutSecs`   | `300`   | Maximum seconds to wait before timing out |
  </Tab>
</Tabs>
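
For intuition, the polling behavior that `frequency_secs` and `timeout_secs` describe can be approximated with a loop around `check_handle`. This is a minimal sketch, not the SDK's implementation: `wait_until_processed` and its `sleep` and `clock` parameters are hypothetical, and raising `TimeoutError` on timeout is an assumption, since this page does not say what the SDK raises.

```python
import time
from typing import Callable


def wait_until_processed(
    is_processed: Callable[[], bool],
    frequency_secs: float = 5,
    timeout_secs: float = 300,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll `is_processed` until it returns True or the timeout is reached.

    `sleep` and `clock` are injectable so the loop can be tested without
    real delays.
    """
    deadline = clock() + timeout_secs
    while True:
        if is_processed():
            return
        # Stop if the next poll would land after the deadline.
        if clock() + frequency_secs > deadline:
            raise TimeoutError(f"not processed after {timeout_secs} seconds")
        sleep(frequency_secs)
```

With a real client, the check could be supplied as `lambda: client.dataset("my-docs").check_handle(handle)`. In most cases the built-in `wait_for_handle` is the simpler choice; a hand-written loop is mainly useful when you want to log progress or cancel between polls.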
