To run queries in TopK, your documents need to be uploaded. Once a document is uploaded, TopK parses, chunks, and indexes its content to power Ask, Search, and Research.

Uploading documents

Documents are uploaded to a dataset. A dataset is a named container for your documents. Each document in a dataset is identified by a document ID.
If you don't have a dataset yet, see the guide on creating a dataset. The examples below upload files with the `topk` CLI.

```shell
# Upload all supported files in the directory
topk upload ./docs --dataset my-docs

# Upload all supported files in the directory and subdirectories recursively
topk upload ./docs -r --dataset my-docs

# Upload all supported files in the directory by glob pattern
topk upload "./docs/*" --dataset my-docs

# Upload all supported files in the directory by glob pattern recursively
topk upload "./docs/**/*" -r --dataset my-docs

# Upload all PDFs in the directory by glob pattern
topk upload "./docs/*.pdf" --dataset my-docs

# Upload all PDFs in the directory by glob pattern recursively
topk upload "./docs/**/*.pdf" -r --dataset my-docs

# Upload a single file
topk upload ./report.pdf --dataset my-docs
```
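If you would rather upload a folder from Python than from the CLI, the directory walk is easy to do yourself. The sketch below assumes that `upsert_file(doc_id, path, metadata)` behaves as in the metadata example on this page and returns the processing handle. The `upload_directory` helper, the use of the file stem as the document ID, and the `filename` metadata key are all hypothetical conventions, not part of the TopK SDK.

```python
from pathlib import Path
from typing import Any, Dict, Protocol


class UploadTarget(Protocol):
    """Anything with an upsert_file method, such as client.dataset("my-docs")."""

    def upsert_file(self, doc_id: str, path: Path, metadata: Dict[str, Any]) -> Any: ...


def upload_directory(
    dataset: UploadTarget,
    root: Path,
    pattern: str = "*.pdf",
    recursive: bool = False,
) -> Dict[str, Any]:
    """Upload every file under `root` matching `pattern`.

    Hypothetical convention: the file stem becomes the document ID and the
    file name is stored as metadata. Returns {document ID: processing handle}.
    """
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    handles: Dict[str, Any] = {}
    for path in sorted(p for p in matches if p.is_file()):
        doc_id = path.stem
        if doc_id in handles:
            # Two files with the same stem would map to the same document ID.
            raise ValueError(f"duplicate document ID {doc_id!r} for {path}")
        handles[doc_id] = dataset.upsert_file(doc_id, path, {"filename": path.name})
    return handles
```

For example, `upload_directory(client.dataset("my-docs"), Path("./docs"), recursive=True)` would upload every PDF under `./docs`. Raising on duplicate stems is a deliberate choice: because upserts are keyed by document ID, `sub/a.pdf` would otherwise silently replace `a.pdf`.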

Supported formats

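| Format   | MIME type         | Extensions      |
|----------|-------------------|-----------------|
| PDF      | `application/pdf` | `.pdf`          |
| Markdown | `text/markdown`   | `.md`           |
| HTML     | `text/html`       | `.html`         |
| PNG      | `image/png`       | `.png`          |
| JPEG     | `image/jpeg`      | `.jpg`, `.jpeg` |
| GIF      | `image/gif`       | `.gif`          |
| WebP     | `image/webp`      | `.webp`         |
| TIFF     | `image/tiff`      | `.tiff`, `.tif` |
| BMP      | `image/bmp`       | `.bmp`          |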
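To pre-check a local folder before uploading, the table above can be encoded as a lookup from extension to MIME type. This is a local helper sketch, not part of the TopK SDK. Matching the extension case-insensitively (so `REPORT.PDF` counts as a PDF) is an assumption, since the table only lists lowercase extensions.

```python
from pathlib import Path
from typing import Optional

# Extension -> MIME type, transcribed from the supported-formats table.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".md": "text/markdown",
    ".html": "text/html",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
    ".tiff": "image/tiff",
    ".tif": "image/tiff",
    ".bmp": "image/bmp",
}


def supported_mime_type(path: Path) -> Optional[str]:
    """Return the MIME type for a supported file, or None if it is not in the table.

    Only the last suffix is considered, compared case-insensitively (an assumption).
    """
    return SUPPORTED_MIME_TYPES.get(path.suffix.lower())
```

Files for which this returns `None`, such as `.txt` or `.docx`, are not listed as supported on this page.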

Attaching metadata

Each document can be associated with its own metadata. You can attach additional information about the document, such as title, author, date, or category.
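```python
import os
from pathlib import Path

from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ["TOPK_REGION"],
)

# Upload ./report.pdf as document "report-2024" with custom metadata.
# The return value is the processing handle for this upload.
handle = client.dataset("my-docs").upsert_file(
    "report-2024",         # document ID
    Path("./report.pdf"),  # local file to upload
    {
        "title": "Annual Report 2024",
        "author": "Finance Team",
        "year": 2024,
        "category": "financials",
        "tags": ["annual-report", "investor-relations"],
        "categories": ["finance", "public-company"],
    },
)
```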
Metadata values can also be lists of strings or numbers, for example tags or categories. See the metadata types reference for the full list of supported types.
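Before uploading, it can help to catch metadata values that don't match the shapes shown on this page: strings, numbers, and lists of strings or numbers. The sketch below is deliberately conservative and hypothetical. It checks only those shapes, so it treats anything else (including booleans and `None`) as unsupported even though the full metadata types reference may allow more, and it assumes lists should not mix strings and numbers.

```python
from typing import Any, Dict, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def metadata_problems(metadata: Dict[str, Any]) -> List[str]:
    """Describe each value that is not a string, a number, or a list of either."""
    problems = []
    for key, value in metadata.items():
        if isinstance(value, str) or _is_number(value):
            continue
        if isinstance(value, list):
            # Accept lists made only of strings or only of numbers.
            if all(isinstance(v, str) for v in value) or all(_is_number(v) for v in value):
                continue
            problems.append(f"{key}: list must contain only strings or only numbers")
        else:
            problems.append(f"{key}: unsupported type {type(value).__name__}")
    return problems
```

An empty result means every value has one of the shapes used in the example above; a non-empty result lists the offending keys.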

Processing documents

After a document is uploaded, TopK returns a processing handle — a reference to the processing job. The document is not ready for retrieval until it has been processed. Processing happens asynchronously in the background and may take a few seconds to a few minutes, depending on document size and complexity.
Your documents might not be ready for retrieval immediately after upload.

Checking the status

After uploading a document, you can check whether it has been processed using the processing handle.
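```python
import os
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ["TOPK_REGION"],
)

# `handle` is the processing handle returned when the document was uploaded,
# e.g. by upsert_file in the metadata example above.
processed = client.dataset("my-docs").check_handle(handle)
print(processed)  # True or False
```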
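When you upload several documents, you can keep each handle keyed by its document ID and check them all in one pass. This is a minimal sketch, assuming `check_handle` returns a bool as shown above. The `pending_documents` helper and the dict layout are hypothetical, not part of the TopK SDK.

```python
from typing import Any, Dict, List, Protocol


class HandleChecker(Protocol):
    """Anything with a check_handle method, such as client.dataset("my-docs")."""

    def check_handle(self, handle: Any) -> bool: ...


def pending_documents(dataset: HandleChecker, handles: Dict[str, Any]) -> List[str]:
    """Return the IDs of documents whose processing has not finished yet."""
    return [doc_id for doc_id, handle in handles.items() if not dataset.check_handle(handle)]
```

For example, `pending_documents(client.dataset("my-docs"), {"report-2024": handle})` returns `[]` once the report is ready for retrieval.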

Waiting for completion

Alternatively, you can wait for the document to be processed using the processing handle. With the CLI, pass `--wait` to block until processing is complete:

```shell
topk upload ./report.pdf --dataset my-docs --wait
```
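In Python, one way to get the same blocking behavior is to poll `check_handle` until it returns `True`. The SDK may offer a dedicated wait call, which you should prefer if it exists. The sketch below is only an illustration built on `check_handle`. The `wait_until_processed` helper is hypothetical, and its default timeout and polling interval are arbitrary choices.

```python
import time
from typing import Any, Callable, Protocol


class HandleChecker(Protocol):
    """Anything with a check_handle method, such as client.dataset("my-docs")."""

    def check_handle(self, handle: Any) -> bool: ...


def wait_until_processed(
    dataset: HandleChecker,
    handle: Any,
    timeout: float = 300.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check_handle until it returns True or `timeout` seconds elapse.

    Returns True if processing finished and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if dataset.check_handle(handle):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

Using `time.monotonic` keeps the deadline correct even if the system clock changes. The injectable `sleep` argument exists only to make the loop easy to test.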