To run queries in TopK, your documents need to be uploaded. Once a document is uploaded, TopK parses, chunks, and indexes its content to power Ask, Search, and Research.

Uploading documents

Documents are uploaded to a dataset. A dataset is a named container for your documents. Each document in a dataset is identified by a document ID.
Learn how to create your dataset here.
# Upload all supported files in the directory
topk upload ./docs --dataset my-docs

# Upload all supported files in the directory and subdirectories recursively
topk upload ./docs -r --dataset my-docs

# Upload all supported files in the directory by glob pattern
topk upload "./docs/*" --dataset my-docs

# Upload all supported files in the directory by glob pattern recursively
topk upload "./docs/**/*" -r --dataset my-docs

# Upload all PDFs in the directory by glob pattern
topk upload "./docs/*.pdf" --dataset my-docs

# Upload all PDFs in the directory by glob pattern recursively
topk upload "./docs/**/*.pdf" --dataset my-docs

# Upload a single file with an explicit document ID
topk upload ./report.pdf --dataset my-docs --id report-2024

Supported formats

Format     MIME type         Extensions
PDF        application/pdf   .pdf
Markdown   text/markdown     .md
HTML       text/html         .html
PNG        image/png         .png
JPEG       image/jpeg        .jpg, .jpeg
GIF        image/gif         .gif
WebP       image/webp        .webp
TIFF       image/tiff        .tiff, .tif
BMP        image/bmp         .bmp
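
To preview which files in a local directory match the supported formats before uploading, a small filter can help. The following is a minimal sketch, not part of the TopK SDK or CLI: the names `SUPPORTED_EXTENSIONS`, `is_supported`, and `find_supported_files` are hypothetical, the extension set is copied from the table above, and case-insensitive matching is an assumption rather than documented TopK behavior.

```python
from pathlib import Path

# Extensions copied from the supported formats table above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".md", ".html", ".png", ".jpg", ".jpeg",
    ".gif", ".webp", ".tiff", ".tif", ".bmp",
}


def is_supported(path: Path) -> bool:
    """Return True if the file extension appears in the supported formats table.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    return path.suffix.lower() in SUPPORTED_EXTENSIONS


def find_supported_files(directory: Path, recursive: bool = False) -> list[Path]:
    """List supported files in a directory, optionally descending into subdirectories."""
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in directory.glob(pattern) if p.is_file() and is_supported(p)
    )
```

Calling `find_supported_files(Path("./docs"), recursive=True)` returns a sorted list of local paths with a supported extension. It only inspects file names, so it says nothing about whether TopK will successfully parse a given file.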

Attaching metadata

Each document can carry metadata: additional information such as title, author, date, category, or any custom fields you want to attach. Metadata values can also be arrays, for example tags or categories.
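import os
from pathlib import Path
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ["TOPK_REGION"],
)

# Arguments: document ID, path to the file, metadata
resp = client.dataset("my-docs").upsert_file(
    "report-2024",
    Path("./report.pdf"),
    {
        "title": "Annual Report 2024",
        "author": "Finance Team",
        "year": 2024,
        "category": "financials",
        "tags": ["annual-report", "investor-relations"],
        "categories": ["finance", "public-company"],
    },
)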
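
When uploading several files with metadata, the same `upsert_file` call can be wrapped in a loop. The sketch below is one possible approach under stated assumptions: `build_metadata` and `upload_pdfs` are hypothetical helpers, not SDK functions; using the file stem as the document ID and deriving the title from the file name are illustrative choices; and `client` is expected to be an initialized `topk_sdk` `Client` whose `upsert_file` takes a document ID, a path, and a metadata dict, and returns a response with a `handle` attribute, as in the examples on this page.

```python
from pathlib import Path
from typing import Any


def build_metadata(path: Path, tags: list[str]) -> dict[str, Any]:
    """Derive a simple metadata dict from a file path (illustrative fields only)."""
    return {
        "title": path.stem.replace("-", " ").title(),
        "extension": path.suffix.lstrip(".").lower(),
        "tags": list(tags),  # array-valued metadata field
    }


def upload_pdfs(
    client: Any, dataset_name: str, directory: Path, tags: list[str]
) -> dict[str, Any]:
    """Upload every PDF directly inside `directory` and collect processing handles.

    Returns a mapping of document ID to processing handle.
    Assumption: the file stem is an acceptable document ID.
    """
    dataset = client.dataset(dataset_name)
    handles: dict[str, Any] = {}
    for path in sorted(directory.glob("*.pdf")):
        doc_id = path.stem
        resp = dataset.upsert_file(doc_id, path, build_metadata(path, tags))
        handles[doc_id] = resp.handle
    return handles
```

Keeping the returned handles makes it possible to check or wait on each document's processing later. Two files with the same stem would map to the same document ID, so a real script may need a different ID scheme.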

Processing documents

After a document is uploaded, TopK returns a processing handle, which is a reference to the processing job. Processing happens asynchronously in the background and may take a few seconds to a few minutes, depending on document size and complexity.
Note: A document is not available for retrieval until processing has finished, so it may not appear in results immediately after upload.

Checking the status

After uploading a document, you can check whether it has been processed using the processing handle.
import os
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ["TOPK_REGION"],
)

# `resp` is the response returned by upsert_file() when the document was uploaded
processed = client.dataset("my-docs").check_handle(resp.handle)
print(processed)  # True or False
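
Because `check_handle` returns a boolean, it can be polled in a loop when you want a timeout of your own. The following is a minimal sketch, assuming `dataset` is the object returned by `client.dataset(...)` and that `check_handle` returns `True` once processing has finished; `poll_until_processed`, `timeout_s`, and `interval_s` are hypothetical names introduced here, not part of the SDK.

```python
import time
from typing import Any


def poll_until_processed(
    dataset: Any,
    handle: Any,
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll check_handle until it returns True or the timeout elapses.

    Returns True if the document was processed in time, False otherwise.
    The handle is always checked at least once, even with a zero timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if dataset.check_handle(handle):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

With the client from the example above, this could be called as `poll_until_processed(client.dataset("my-docs"), resp.handle, timeout_s=120)`. The default values are arbitrary and should be tuned to the size of your documents.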

Waiting for completion

Alternatively, you can block until the document has been processed. In the Python SDK, pass the processing handle to wait_for_handle, as in the full example below. In the CLI, pass --wait to block until processing is complete:
topk upload ./report.pdf --dataset my-docs --wait

Full example

import os
from pathlib import Path
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ["TOPK_REGION"],
)

# Create dataset
client.datasets().create("my-docs")

# Upload document
resp = client.dataset("my-docs").upsert_file(
    "report-2024",
    Path("./report.pdf"),
    {"title": "Annual Report 2024"},
)

# Wait for processing
client.dataset("my-docs").wait_for_handle(resp.handle)

print("Document ready.")