Ingest

Before you can query documents in TopK, you need to upload them. Once a document is uploaded, TopK parses, chunks, and indexes its content to power Ask, Search, and Research.

Uploading documents

Documents are uploaded to a dataset, which is a named container for your documents. Each document in a dataset is identified by a document ID.

To create a dataset, see the full example at the end of this page.

CLI

# Upload all supported files in the directory
topk upload ./docs --dataset my-docs

# Upload all supported files in the directory and subdirectories recursively
topk upload ./docs -r --dataset my-docs

# Upload all supported files in the directory by glob pattern
topk upload "./docs/*" --dataset my-docs

# Upload all supported files in the directory by glob pattern recursively
topk upload "./docs/**/*" -r --dataset my-docs

# Upload all PDFs in the directory by glob pattern
topk upload "./docs/*.pdf" --dataset my-docs

# Upload all PDFs in the directory by glob pattern recursively
topk upload "./docs/**/*.pdf" --dataset my-docs

# Upload a single file with an explicit document ID
topk upload ./report.pdf --dataset my-docs --id report-2024

Python SDK

import os
from pathlib import Path
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ["TOPK_REGION"],
)

resp = client.dataset("my-docs").upsert_file(
    "report-2024",                       # document ID
    Path("./report.pdf"),                # path to file
    {"title": "Annual Report 2024"},     # optional metadata
)

print(resp.handle)  # e.g. "hdl_abc123"

JavaScript SDK

import { Client } from "topk-js";

const client = new Client({
  apiKey: process.env.TOPK_API_KEY!,
  region: process.env.TOPK_REGION!,
});

const resp = await client.dataset("my-docs").upsertFile(
  "report-2024",             // document ID
  "./report.pdf",            // path to file
  { title: "Annual Report 2024" },  // optional metadata
);

console.log(resp.handle);  // e.g. "hdl_abc123"

Supported formats

Format	MIME type	Extensions
PDF	`application/pdf`	`.pdf`
Markdown	`text/markdown`	`.md`
HTML	`text/html`	`.html`
PNG	`image/png`	`.png`
JPEG	`image/jpeg`	`.jpg`, `.jpeg`
GIF	`image/gif`	`.gif`
WebP	`image/webp`	`.webp`
TIFF	`image/tiff`	`.tiff`, `.tif`
BMP	`image/bmp`	`.bmp`
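
When uploading from a script rather than the CLI, it can help to filter files against the table above before sending them. The following is a minimal sketch that does not depend on the SDK. The names `SUPPORTED_MIME_TYPES`, `mime_type_for`, `is_supported`, and `supported_files` are illustrative and not part of TopK. The check looks only at the file extension, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path
from typing import List, Optional

# Extension -> MIME type, copied from the "Supported formats" table.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".md": "text/markdown",
    ".html": "text/html",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".tiff": "image/tiff",
    ".tif": "image/tiff",
    ".bmp": "image/bmp",
}


def mime_type_for(path: Path) -> Optional[str]:
    """Return the MIME type for a supported extension, or None otherwise."""
    # Assumption: extensions are matched case-insensitively.
    return SUPPORTED_MIME_TYPES.get(path.suffix.lower())


def is_supported(path: Path) -> bool:
    """True if the file extension appears in the supported-formats table."""
    return mime_type_for(path) is not None


def supported_files(directory: Path, recursive: bool = False) -> List[Path]:
    """List supported files in a directory, sorted for a stable order."""
    candidates = directory.rglob("*") if recursive else directory.glob("*")
    return sorted(p for p in candidates if p.is_file() and is_supported(p))
```

For example, `supported_files(Path("./docs"), recursive=True)` returns the paths that could then be passed one by one to `upsert_file`.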

Attaching metadata

Each document can carry metadata: additional information such as title, author, date, category, or any custom fields you define. Metadata values can also be arrays, for example tags or categories.

Python SDK

from pathlib import Path

# `client` is the Client created in the upload example above.
resp = client.dataset("my-docs").upsert_file(
    "report-2024",
    Path("./report.pdf"),
    {
        "title": "Annual Report 2024",
        "author": "Finance Team",
        "year": 2024,
        "category": "financials",
        "tags": ["annual-report", "investor-relations"],
        "categories": ["finance", "public-company"],
    },
)

JavaScript SDK

// `client` is the Client created in the upload example above.
const resp = await client.dataset("my-docs").upsertFile(
  "report-2024",
  "./report.pdf",
  {
    title: "Annual Report 2024",
    author: "Finance Team",
    year: 2024,
    category: "financials",
    tags: ["annual-report", "investor-relations"],
    categories: ["finance", "public-company"],
  },
);
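
When many files are uploaded in one run, metadata is often derived from the file itself rather than typed by hand. The helper below is a minimal sketch of that idea. `build_metadata` and the `file_type` field are hypothetical names chosen for illustration, and the source does not state which value types TopK accepts beyond the strings, numbers, and arrays of strings shown above.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def build_metadata(path: Path, author: str, tags: Iterable[str] = ()) -> Dict[str, Any]:
    """Derive a metadata dict for one file. Field names are illustrative."""
    metadata: Dict[str, Any] = {
        # Turn "annual-report_2024" into "Annual Report 2024".
        "title": path.stem.replace("-", " ").replace("_", " ").title(),
        "author": author,
        # Hypothetical field: the extension without the leading dot.
        "file_type": path.suffix.lower().lstrip("."),
    }
    # Store tags as an array, deduplicated and sorted; omit the key if empty.
    tag_list = sorted(set(tags))
    if tag_list:
        metadata["tags"] = tag_list
    return metadata
```

The returned dict can be passed as the third argument to `upsert_file`, in place of the literal metadata in the example above.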

Processing documents

After a document is uploaded, TopK returns a processing handle, which is a reference to the processing job. Processing happens asynchronously in the background and may take from a few seconds to a few minutes, depending on document size and complexity.

Note: a document cannot be retrieved until processing has finished, so it may not be available immediately after upload.

Checking the status

After uploading a document, you can check whether it has been processed using the processing handle.

Python SDK

import os
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ["TOPK_REGION"],
)

# `resp` is the response returned by upsert_file in the upload example.
processed = client.dataset("my-docs").check_handle(resp.handle)
print(processed)  # True or False

JavaScript SDK

import { Client } from "topk-js";

const client = new Client({
  apiKey: process.env.TOPK_API_KEY!,
  region: process.env.TOPK_REGION!,
});

// `resp` is the response returned by upsertFile in the upload example.
const processed = await client.dataset("my-docs").checkHandle(resp.handle);

console.log(processed);  // true or false
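
Because `check_handle` returns immediately, a caller that wants to bound how long it waits can wrap it in a polling loop. The following is a minimal sketch, not part of the SDK. `poll_until_processed` and its parameters are hypothetical names, and it relies only on `check_handle(handle)` returning `True` or `False` as shown above. The `sleep` and `clock` parameters exist so the loop can be tested without real delays.

```python
import time
from typing import Any, Callable


def poll_until_processed(
    dataset: Any,
    handle: str,
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check_handle until it returns True or the timeout elapses.

    `dataset` is whatever client.dataset("my-docs") returns; only its
    check_handle(handle) method is used. Returns True if processing
    finished in time and False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if dataset.check_handle(handle):
            return True
        # Give up once the deadline has passed; the status was just checked.
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With the real client this would be called as `poll_until_processed(client.dataset("my-docs"), resp.handle, timeout_s=120)`. The default timeout and interval are arbitrary starting points, not recommended values.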

Waiting for completion

Alternatively, you can block until the document has been processed. The SDKs do this with the processing handle; the CLI uses a flag.

CLI

Pass --wait to block until processing is complete:

topk upload ./report.pdf --dataset my-docs --wait

Python SDK

import os
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ["TOPK_REGION"],
)

# `resp` is the response returned by upsert_file in the upload example.
client.dataset("my-docs").wait_for_handle(resp.handle)

# Document is now ready to be queried

JavaScript SDK

import { Client } from "topk-js";

const client = new Client({
  apiKey: process.env.TOPK_API_KEY!,
  region: process.env.TOPK_REGION!,
});

// `resp` is the response returned by upsertFile in the upload example.
await client.dataset("my-docs").waitForHandle(resp.handle);

// Document is now ready to be queried
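
The same pattern extends to several documents: upload them all first, then wait on each handle. The sketch below assumes only the two calls shown on this page, `upsert_file(doc_id, path, metadata)` and `wait_for_handle(handle)`. `upload_and_wait` is a hypothetical helper, and using the file stem as the document ID is an illustrative choice, not a TopK convention.

```python
from pathlib import Path
from typing import Any, Dict, Iterable


def upload_and_wait(dataset: Any, files: Iterable[Path]) -> Dict[str, str]:
    """Upload each file, then wait for every processing handle.

    `dataset` is whatever client.dataset("my-docs") returns. Returns a
    mapping of document ID to processing handle.
    """
    # Assumption: the file stem ("report" for report.pdf) is the document ID.
    planned = [(path.stem, path) for path in files]
    ids = [doc_id for doc_id, _ in planned]
    if len(set(ids)) != len(ids):
        raise ValueError("file stems must be unique to serve as document IDs")

    handles: Dict[str, str] = {}
    # Submit every upload before waiting, since processing is asynchronous.
    for doc_id, path in planned:
        resp = dataset.upsert_file(doc_id, path, {"title": path.name})
        handles[doc_id] = resp.handle

    # Block on each handle in turn; all documents are processed on return.
    for handle in handles.values():
        dataset.wait_for_handle(handle)
    return handles
```

Submitting all uploads before the first wait lets processing of earlier documents proceed while later ones are still being sent. How `wait_for_handle` reports a failed job is not described here, so the sketch does not handle that case.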

Full example

Python SDK

import os
from pathlib import Path
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ["TOPK_REGION"],
)

# Create dataset
client.datasets().create("my-docs")

# Upload document
resp = client.dataset("my-docs").upsert_file(
    "report-2024",
    Path("./report.pdf"),
    {"title": "Annual Report 2024"},
)

# Wait for processing
client.dataset("my-docs").wait_for_handle(resp.handle)

print("Document ready.")

JavaScript SDK

import { Client } from "topk-js";

const client = new Client({
  apiKey: process.env.TOPK_API_KEY!,
  region: process.env.TOPK_REGION!,
});

// Create dataset
await client.datasets().create("my-docs");

// Upload document
const resp = await client.dataset("my-docs").upsertFile(
  "report-2024",
  "./report.pdf",
  { title: "Annual Report 2024" },
);

// Wait for processing
await client.dataset("my-docs").waitForHandle(resp.handle);

console.log("Document ready.");
