TopK is a hybrid retrieval engine built on object storage for 10x lower cost and massive scale. It supports dense/sparse vector search, multi-vector retrieval, powerful filtering, custom ranking, and managed inference in one API. For unstructured document search use cases, we provide a datasets abstraction that allows you to ingest files, search, and get answers from your private documents. Connect your data to agents using our CLI or MCP server.

Get Started

Prerequisites
A simple example to get you started with TopK. See our guides for more complex examples.
Step 1: Install Python SDK

pip install topk-sdk
Step 2: Initialize the client

Set up the TopK client with your API key and region:
import os
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ.get("TOPK_REGION", "aws-us-east-1-elastica"),
)
See available regions for a full list of supported regions.
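The snippet above raises a bare `KeyError` if `TOPK_API_KEY` is unset. If you would rather fail with an explicit message, you can resolve both settings before constructing the client. This is a minimal sketch using only the standard library; `resolve_topk_settings` is a hypothetical helper, not part of `topk_sdk`, and it treats an empty `TOPK_REGION` as unset.

```python
import os
from typing import Dict, Mapping, Optional

# Default region used by the quickstart when TOPK_REGION is not set.
DEFAULT_REGION = "aws-us-east-1-elastica"


def resolve_topk_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the api_key/region keyword arguments for Client().

    Hypothetical helper (not part of topk_sdk). Reads from os.environ by default.
    """
    source = os.environ if env is None else env
    api_key = source.get("TOPK_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("TOPK_API_KEY is not set; export it before creating the client")
    # An unset or empty TOPK_REGION falls back to the default region.
    region = source.get("TOPK_REGION") or DEFAULT_REGION
    return {"api_key": api_key, "region": region}


# Usage with the SDK (not executed here):
# client = Client(**resolve_topk_settings())
```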
Step 3: Create a collection

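from topk_sdk.schema import text, keyword_index, semantic_index

client.collections().create(
  "quickstart",
  schema={
    # Required text field, indexed for keyword search
    "title": text().required().index(keyword_index()),
    # Text field indexed for semantic search
    "content": text().index(semantic_index()),
  },
)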
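Since `title` is declared `.required()`, documents without it are presumably rejected when you upsert them. A client-side pre-check can surface such documents before the request is sent. This is a hypothetical sketch: `missing_required_fields` is not an SDK function, and it assumes every document also needs an `_id`, as every document in the upsert step has one.

```python
from typing import Iterable, List, Sequence, Tuple

# Fields this sketch treats as mandatory: `_id` (assumed) and `title`
# (declared .required() in the quickstart schema).
REQUIRED_FIELDS: Sequence[str] = ("_id", "title")


def missing_required_fields(
    docs: Iterable[dict], required: Sequence[str] = REQUIRED_FIELDS
) -> List[Tuple[int, str]]:
    """Return (position, field_name) for every required field that is absent or None."""
    problems: List[Tuple[int, str]] = []
    for position, doc in enumerate(docs):
        for name in required:
            if doc.get(name) is None:
                problems.append((position, name))
    return problems


docs = [
    {"_id": "1", "title": "Catcher in the Rye"},
    {"_id": "2", "content": "..."},  # no title
]
print(missing_required_fields(docs))  # [(1, 'title')]
```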
Step 4: Upsert documents

client.collection("quickstart").upsert([
  {
      "_id": "1",
      "title": "Catcher in the Rye",
      "content": "IF YOU REALLY WANT TO HEAR about it, the first thing you'll probably want to know is ...",
      "author": "J.D. Salinger",
      "year": 1951,
  },
  {
      "_id": "2",
      "title": "1984",
      "content": "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, ...",
      "author": "George Orwell",
      "year": 1949,
  },
  # ... more documents
])
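For more than a handful of documents, you will probably want to split the upsert into several calls. The sketch below is a generic batching helper; `batched` is hypothetical, and the default batch size of 500 is an arbitrary assumption rather than a documented TopK limit.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(docs: Iterable[dict], size: int = 500) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Usage with the SDK (not executed here; all_documents is a hypothetical list):
# for batch in batched(all_documents):
#     client.collection("quickstart").upsert(batch)
```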
Step 5: Query indexed data

from topk_sdk.query import select, field, fn

results = client.collection("quickstart").query(
  select(
    # Select document fields to return
    "_id", "title", "author",
    # Compute semantic similarity of content field with the query
    similarity_score=fn.semantic_similarity(
      "content",
      "What is the meaning of life?",
    ),
  )
  # Filter documents by metadata (assumes documents carry a numeric "rating" field)
  .filter(field("rating") >= 3.0)
  # Rank using the computed similarity score and rating
  .sort(field("rating") * field("similarity_score"))
  # Get top 10 highest ranked documents
  .limit(10)
)
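To make the query's semantics concrete, here is a pure-Python emulation over in-memory documents whose similarity scores are already known. It is only an illustration, not how TopK executes queries: it assumes `.sort()` returns the highest values first (as the "top 10 highest ranked" comment implies) and that documents without a `rating` fail the filter. `emulate_quickstart_query` and its `similarity` argument are hypothetical stand-ins for the server-side `fn.semantic_similarity`.

```python
import heapq
from typing import Dict, List


def emulate_quickstart_query(
    docs: List[Dict], similarity: Dict[str, float], k: int = 10
) -> List[Dict]:
    """Local illustration of the query above: filter, score, then keep the top k.

    `similarity` stands in for fn.semantic_similarity, mapping _id -> score.
    """
    candidates = []
    for doc in docs:
        rating = doc.get("rating")
        # .filter(field("rating") >= 3.0): documents without a rating are dropped (assumed).
        if rating is None or rating < 3.0:
            continue
        score = similarity[doc["_id"]]
        candidates.append(
            {
                # select("_id", "title", "author", similarity_score=...)
                "_id": doc["_id"],
                "title": doc.get("title"),
                "author": doc.get("author"),
                "similarity_score": score,
                # .sort(field("rating") * field("similarity_score"))
                "rank": rating * score,
            }
        )
    # .limit(k), assuming the sort returns the highest values first.
    return heapq.nlargest(k, candidates, key=lambda d: d["rank"])
```

Because the rank is a product, a highly rated document can outrank one with a slightly higher similarity score; changing the expression passed to `.sort()` changes that trade-off.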
To learn more about how to use the Python SDK, see the Python SDK documentation.
Step 1: Install CLI

brew tap topk-io/topk
brew install topk
If you don’t have Homebrew installed, see https://brew.sh for installation instructions.
Step 2: Authenticate

topk login
This command prompts you to either create a new API key or set an existing one. You can also skip the topk login command and authenticate by providing your API key via the TOPK_API_KEY environment variable:
export TOPK_API_KEY="your-api-key"
Step 3: Create a dataset

A dataset is a named container for your documents. To create a dataset, run the following command:
topk dataset create quickstart --region aws-us-east-1-elastica
The --region flag determines where your data is stored. See available regions.
Step 4: Upload a file

Download a sample PDF financial report:
curl -L https://topk-docs.s3.us-east-2.amazonaws.com/bank_of_america_2024.pdf -o report.pdf
Then upload it to your dataset:
topk upload ./report.pdf --dataset quickstart
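To upload several files, you can loop over a directory and run the same `topk upload` command for each one. This is a hypothetical Python sketch that uses only the `--dataset` flag shown above; whether the CLI can take multiple files in one call is not covered here, so it issues one command per file. By default it only prints the commands; pass `dry_run=False` to execute them. The `./reports` directory in the usage comment is a placeholder.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List


def upload_commands(directory: str, dataset: str, pattern: str = "*.pdf") -> List[List[str]]:
    """Build one `topk upload <file> --dataset <dataset>` argument list per matching file."""
    files = sorted(p for p in Path(directory).glob(pattern) if p.is_file())
    return [["topk", "upload", str(path), "--dataset", dataset] for path in files]


def upload_directory(directory: str, dataset: str, dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them one by one."""
    commands = upload_commands(directory, dataset)
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True raises CalledProcessError on the first failed upload.
            subprocess.run(argv, check=True)
    return commands


# Preview, then run for real:
# upload_directory("./reports", "quickstart")
# upload_directory("./reports", "quickstart", dry_run=False)
```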
Step 5: Ask a question

topk ask "What was the total net income of Bank of America in 2024?" -d quickstart
Answer:
  • Bank of America’s total net income for 2024 was $27,132 million (approximately $27.1 billion). 1 2 3 4 5 6
  • The 2024 net income represented an increase from the $26.5 billion reported in 2023. 3 4 6
  • The increase was driven by higher noninterest income, partially offset by a higher provision for credit losses and lower net interest income. 3 6
Citations:
{
  "facts": [
    {
      "fact": "Bank of America's total net income for the fiscal year 2024 was $27,132 million.",
      "ref_ids": ["1", "2", "3", "4", "5", "6"]
    },
    {
      "fact": "The 2024 net income of $27.1 billion represented an increase from the $26.5 billion reported in 2023.",
      "ref_ids": ["3", "4", "6"]
    },
    {
      "fact": "The increase was driven by higher noninterest income, partially offset by a higher provision for credit losses and lower net interest income.",
      "ref_ids": ["3", "6"]
    }
  ],
  "refs": {
    "1": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#130",
      "doc_name": "report.pdf",
      "content": {
        "text": "## Condensed Statement of Cash Flows\n[row_1]; []=Net income; [2024]=27,132; [2023]=26,515; [2022]=27,528\n...",
        "doc_pages": [170]
      }
    },
    "2": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#428",
      "doc_name": "report.pdf",
      "content": {
        "text": "## Consolidated Statement of Comprehensive Income\n[row_0]; [Dollars in millions]=Net income; [2024]=27,132; [2023]=26,515; [2022]=27,528\n...",
        "doc_pages": [92]
      }
    },
    "3": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#image#92",
      "doc_name": "report.pdf",
      "content": {
        "mime_type": "image/jpeg",
        "data": "<base64 encoded image>"
      }
    },
    "4": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#129",
      "doc_name": "report.pdf",
      "content": {
        "text": "## Key Performance Indicators\n[row_7]=Income statement; []=Net income; [2024]=27,132; [2023]=26,515\n...",
        "doc_pages": [33, 34, 35, 36]
      }
    },
    "5": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#854",
      "doc_name": "report.pdf",
      "content": {
        "text": "## Results of Business Segments\n[row_9]; [Item]=Net income; [Total Corporation (2) 2024]=27,132; [Total Corporation (2) 2023]=26,515\n...",
        "doc_pages": [166, 167, 168]
      }
    },
    "6": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#131",
      "doc_name": "report.pdf",
      "content": {
        "text": "## Executive Summary > Financial Highlights\n[row_7]=Net income; [2024]=27,132; [2023]=26,515\n...",
        "doc_pages": [29, 30]
      }
    }
  },
  "confidence": 100.0
}
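The Citations JSON links each fact to entries in `refs` through `ref_ids`. Assuming you have that JSON as a string (how to obtain it programmatically is not covered on this page), the sketch below resolves each fact to the document names and page numbers it cites. The field names are taken from the output above; `pages_by_fact` is a hypothetical helper, and refs without `doc_pages` (such as the image ref) contribute no pages.

```python
import json
from typing import Dict, List, Set, Tuple


def pages_by_fact(citations: dict) -> List[Tuple[str, Dict[str, List[int]]]]:
    """Map each fact to {doc_name: sorted page numbers} using its ref_ids."""
    refs = citations.get("refs", {})
    resolved = []
    for fact in citations.get("facts", []):
        pages: Dict[str, Set[int]] = {}
        for ref_id in fact.get("ref_ids", []):
            ref = refs.get(ref_id)
            if ref is None:
                continue  # dangling reference id
            doc_pages = ref.get("content", {}).get("doc_pages", [])
            pages.setdefault(ref["doc_name"], set()).update(doc_pages)
        resolved.append((fact["fact"], {name: sorted(p) for name, p in pages.items()}))
    return resolved


# Trimmed excerpt of the output above: one fact citing a text ref and an image ref.
sample = json.loads("""
{
  "facts": [{"fact": "Bank of America's total net income for the fiscal year 2024 was $27,132 million.",
             "ref_ids": ["1", "3"]}],
  "refs": {
    "1": {"doc_name": "report.pdf", "content": {"text": "...", "doc_pages": [170]}},
    "3": {"doc_name": "report.pdf", "content": {"mime_type": "image/jpeg", "data": "..."}}
  }
}
""")
for fact, pages in pages_by_fact(sample):
    print(fact, pages)
```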
To learn more about how to use TopK CLI, see the CLI documentation.

Integrations

Python SDK

Full Python SDK reference.

JavaScript SDK

Full TypeScript/JavaScript SDK reference.

CLI

Upload files, search, and ask questions directly from the terminal.

MCP Server

Connect TopK to any MCP-compatible AI agent via the Model Context Protocol.

Security & Compliance

TopK is SOC 2 Type I certified. Visit the trust center for full details.

Data encryption

All data is encrypted in transit and at rest.

Access control

Role-based access control with full auditability.

Private Deployment

Deploy inside your own VPC for complete isolation and data residency. Contact us for more details.

Learn More

Architecture

Learn about TopK’s architecture and how it works.

Concepts

Discover core concepts and how they work together.