Introduction

TopK is a hybrid retrieval engine built on object storage for 10x lower cost and massive scale. It supports dense/sparse vector search, multi-vector retrieval, powerful filtering, custom ranking, and managed inference in one API. For unstructured document search use cases, we provide a datasets abstraction that allows you to ingest files, search, and get answers from your private documents. Connect your data to agents using our CLI or MCP server.

Get Started

Prerequisites

TopK account (Sign up here)
TopK API key (Get an API key here)

Hybrid Search

This simple example gets you started with TopK. Check out our guides for more complex examples.

The example is shown for the Python SDK, the JavaScript SDK, and SQL.

Install Python SDK

pip install topk-sdk

uv add topk-sdk

Initialize the client

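Set up the TopK client with your API key and region. Each Python step below is shown twice: first with the synchronous Client, then with the asynchronous AsyncClient.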

import os
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ.get("TOPK_REGION", "aws-us-east-1-elastica"),
)

import os
from topk_sdk import AsyncClient

client = AsyncClient(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ.get("TOPK_REGION", "aws-us-east-1-elastica"),
)

See available regions for a full list of supported regions.
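
The snippets above read the API key from TOPK_API_KEY and fall back to a default region when TOPK_REGION is unset. A minimal sketch of that lookup as a standalone helper, using only the standard library, is shown below. The helper name `load_topk_settings` and its error message are illustrative assumptions, not part of the TopK SDK. Unlike the snippets above, it also treats an empty TOPK_REGION as unset.

```python
import os
from typing import Mapping, Optional

DEFAULT_REGION = "aws-us-east-1-elastica"  # default used in the snippets above


def load_topk_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return keyword arguments for the client, failing early on a missing key.

    Hypothetical helper for illustration; not an SDK function.
    """
    env = os.environ if env is None else env
    api_key = env.get("TOPK_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("TOPK_API_KEY is not set; create an API key first")
    # An empty or whitespace-only region falls back to the default.
    region = env.get("TOPK_REGION", "").strip() or DEFAULT_REGION
    return {"api_key": api_key, "region": region}
```

The result can be passed on as `Client(**load_topk_settings())`, which reports a missing key with a readable message instead of a bare KeyError.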

Create a collection

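from topk_sdk.schema import text, keyword_index, semantic_index

client.collections().create(
  "quickstart",
  schema={
    # Required text field with a keyword index
    "title": text().required().index(keyword_index()),
    # Text field with a semantic index
    "content": text().index(semantic_index()),
  }
)

from topk_sdk.schema import text, keyword_index, semantic_index

await client.collections().create(
  "quickstart",
  schema={
    # Required text field with a keyword index
    "title": text().required().index(keyword_index()),
    # Text field with a semantic index
    "content": text().index(semantic_index()),
  }
)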

Upsert documents

client.collection("quickstart").upsert([
  {
      "_id": "1",
      "title": "Catcher in the Rye",
      "content": "IF YOU REALLY WANT TO HEAR about it, the first thing you'll probably want to know is ...",
      "author": "J.D. Salinger",
      "rating": 3.8,
  },
  {
      "_id": "2",
      "title": "1984",
      "content": "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, ...",
      "author": "George Orwell",
      "rating": 4.7,
  },
  ...
])

await client.collection("quickstart").upsert([
  {
      "_id": "1",
      "title": "Catcher in the Rye",
      "content": "IF YOU REALLY WANT TO HEAR about it, the first thing you'll probably want to know is ...",
      "author": "J.D. Salinger",
      "rating": 3.8
  },
  {
      "_id": "2",
      "title": "1984",
      "content": "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, ...",
      "author": "George Orwell",
      "rating": 4.7
  },
  ...
])
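
The schema above marks title as required, and every document in the example carries a string _id. A minimal sketch of a client-side check that catches violations before calling upsert could look like the following. `check_documents` is a hypothetical helper, and the rules it enforces are inferred from this example rather than taken from the SDK.

```python
from typing import Any, Dict, List


def check_documents(docs: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for position, doc in enumerate(docs):
        doc_id = doc.get("_id")
        # Assumption: _id is a non-empty string, as in the example documents.
        if not isinstance(doc_id, str) or not doc_id:
            problems.append(f"document {position}: missing or non-string _id")
        # title is declared with .required() in the quickstart schema.
        if not isinstance(doc.get("title"), str):
            problems.append(f"document {position}: title must be a string")
    return problems
```

Running the check before upsert keeps schema mistakes local and easy to read, rather than discovering them from a server-side error.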

Query indexed data

from topk_sdk.query import select, field, fn

client.collection("quickstart").query(
  select(
    # Select document fields to return
    "_id", "title", "author",
    # Compute semantic similarity of content field with the query
    similarity_score = fn.semantic_similarity(
      "content",
      "What is the meaning of life?",
    )
  )
  # Filter documents by metadata
  .filter(field("rating") >= 3.0)
  # Rank using the computed similarity score and rating
  .sort(field("rating") * field("similarity_score"), asc=False)
  # Get top 10 highest ranked documents
  .limit(10)
)

from topk_sdk.query import select, field, fn

await client.collection("quickstart").query(
  select(
    # Select document fields to return
    "_id", "title", "author",
    # Compute semantic similarity of content field with the query
    similarity_score = fn.semantic_similarity(
      "content",
      "What is the meaning of life?",
    )
  )
  # Filter documents by metadata
  .filter(field("rating") >= 3.0)
  # Rank using the computed similarity score and rating
  .sort(field("rating") * field("similarity_score"), asc=False)
  # Get top 10 highest ranked documents
  .limit(10)
)
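
The query filters on rating, multiplies rating by the computed similarity_score, sorts in descending order, and keeps the first ten results. The sketch below reproduces that pipeline in plain Python to make the ranking arithmetic concrete. It does not call TopK: the similarity scores are made-up numbers for illustration, the third document is invented, and `rank` is a hypothetical helper.

```python
from typing import Dict, List


def rank(docs: List[Dict], min_rating: float = 3.0, limit: int = 10) -> List[Dict]:
    """Filter by rating, then order by rating * similarity_score, descending."""
    kept = [d for d in docs if d["rating"] >= min_rating]
    kept.sort(key=lambda d: d["rating"] * d["similarity_score"], reverse=True)
    return kept[:limit]


# Similarity scores below are invented for illustration only.
docs = [
    {"_id": "1", "title": "Catcher in the Rye", "rating": 3.8, "similarity_score": 0.5},
    {"_id": "2", "title": "1984", "rating": 4.7, "similarity_score": 0.25},
    {"_id": "3", "title": "Hypothetical low-rated book", "rating": 2.0, "similarity_score": 0.9},
]

for doc in rank(docs):
    print(doc["_id"], doc["rating"] * doc["similarity_score"])
```

With these numbers, document 1 outranks document 2 despite its lower rating because its similarity score is higher, and document 3 is removed by the filter before ranking.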

To learn more about how to use the Python SDK, see the Python SDK documentation.

Install JavaScript SDK

npm install topk-js

yarn add topk-js

pnpm add topk-js

Initialize the client

Set up the TopK client with your API key and region.

import { Client } from "topk-js";

const client = new Client({
  apiKey: process.env.TOPK_API_KEY!,
  region: process.env.TOPK_REGION ?? "aws-us-east-1-elastica",
});

See available regions for a full list of supported regions.

Create a collection

import { text, keywordIndex, semanticIndex } from "topk-js/schema";

await client.collections().create("quickstart", {
  title: text().required().index(keywordIndex()),
  content: text().index(semanticIndex()),
});

Upsert documents

await client.collection("quickstart").upsert([
  {
    _id: "1",
    title: "Catcher in the Rye",
    content: "IF YOU REALLY WANT TO HEAR about it, the first thing you'll probably want to know is ...",
    author: "J.D. Salinger",
    rating: 3.8,
  },
  {
    _id: "2",
    title: "1984",
    content: "It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, ...",
    author: "George Orwell",
    rating: 4.7,
  },
  // ...
]);

Query indexed data

import { select, field, fn } from "topk-js/query";

await client.collection("quickstart").query(
  select({
    title: field("title"),
    author: field("author"),
    // Compute semantic similarity of content field with the query
    similarity_score: fn.semanticSimilarity(
      "content",
      "What is the meaning of life?",
    ),
  })
  // Filter documents by metadata
  .filter(field("rating").gte(3.0))
  // Rank using the computed similarity score and rating
  .sort(field("rating").mul(field("similarity_score")), false)
  // Get top 10 highest ranked documents
  .limit(10)
);

To learn more about how to use the JavaScript SDK, see the JavaScript SDK documentation.

Connect

To use SQL, connect with any PostgreSQL client. Use your API key as the password.

psql "host=<region>.sql.topk.io port=5432 user=topk password=$TOPK_API_KEY dbname=topk"
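
The connection string above uses the libpq keyword/value format, in which values containing spaces or quotes must be single-quoted and escaped. A minimal sketch of building the string programmatically is shown below. `build_dsn` and `quote_value` are hypothetical helpers; the host pattern, port, user, and database name are copied from the psql command above.

```python
def quote_value(value: str) -> str:
    """Quote a value for a libpq keyword/value connection string."""
    # Backslashes and single quotes must be escaped with a backslash.
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_dsn(region: str, api_key: str) -> str:
    """Assemble the connection string used by the psql example above."""
    params = {
        "host": f"{region}.sql.topk.io",
        "port": "5432",
        "user": "topk",
        "password": api_key,
        "dbname": "topk",
    }
    return " ".join(f"{key}={quote_value(val)}" for key, val in params.items())
```

Quoting every value is slightly more verbose than the hand-written command, but it stays valid if the API key ever contains a space, quote, or backslash.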

Create a collection

CREATE TABLE quickstart (
  title   TEXT NOT NULL INDEX keyword_index(),
  content TEXT          INDEX semantic_index(),
  rating  FLOAT
);

Upsert documents

INSERT INTO quickstart (_id, title, content, author, rating)
VALUES
  ('1', 'Catcher in the Rye',
   'IF YOU REALLY WANT TO HEAR about it, the first thing you''ll probably want to know is ...',
   'J.D. Salinger', 3.8),
  ('2', '1984',
   'It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, ...',
   'George Orwell', 4.7);

Query indexed data

SELECT
  _id,
  title,
  semantic_similarity(content, 'What is the meaning of life?') AS similarity_score
FROM quickstart
WHERE rating >= 3.0
ORDER BY similarity_score * rating DESC
LIMIT 10;

File Search

The workflow is shown for the CLI, the Python SDK, and the JavaScript SDK.

Install CLI

brew tap topk-io/topk
brew install topk

If you don’t have Homebrew installed, you can install it here.

Authenticate

topk login

This command prompts you to either create a new API key or set an existing one. You can also skip the topk login command and authenticate by providing your API key via the TOPK_API_KEY environment variable:

export TOPK_API_KEY="your-api-key"

Create a dataset

A dataset is a named container for your documents. To create a dataset, run the following command:

topk dataset create quickstart --region aws-us-east-1-elastica

The --region flag determines where your data is stored. See available regions.

Upload a file

Download a sample PDF financial report:

curl -L https://topk-docs.s3.us-east-2.amazonaws.com/bank_of_america_2024.pdf -o report.pdf

Upload the file to your dataset:

topk upload ./report.pdf --dataset quickstart

Ask a question

topk ask "What was the total net income of Bank of America in 2024?" -d quickstart

Answer:

Bank of America’s total net income for 2024 was $27,132 million (approximately $27.1 billion). [1][2][3][4][5][6]
The 2024 net income represented an increase from the $26.5 billion reported in 2023. [3][4][6]
The increase was driven by higher noninterest income, partially offset by a higher provision for credit losses and lower net interest income. [3][6]

Citations:

[1] Condensed Statement of Cash Flows showing net income of $27,132m (2024) vs $26,515m (2023) (bank_of_america_2024.pdf, p. 170)
[2] Consolidated Statement of Comprehensive Income: net income line item for 2024–2022 (bank_of_america_2024.pdf, p. 92)
[3] Supporting figure from the filing (tabular financial excerpt) (boa-ask-ref-3-figure.jpg)
[4] Key performance indicators: selected annual financial data (including net income) (bank_of_america_2024.pdf, pp. 33–36)
[5] Segment results tying to total-corporation net income (bank_of_america_2024.pdf, pp. 166–168)
[6] Executive summary: summary income statement and balance sheet excerpts (bank_of_america_2024.pdf, pp. 29–30)

Full JSON output

{
  "facts": [
    {
      "fact": "Bank of America's total net income for the fiscal year 2024 was $27,132 million.",
      "ref_ids": ["1", "2", "3", "4", "5", "6"]
    },
    {
      "fact": "The 2024 net income of $27.1 billion represented an increase from the $26.5 billion reported in 2023.",
      "ref_ids": ["3", "4", "6"]
    },
    {
      "fact": "The increase was driven by higher noninterest income, partially offset by a higher provision for credit losses and lower net interest income.",
      "ref_ids": ["3", "6"]
    }
  ],
  "refs": {
    "1": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#130",
      "doc_name": "report.pdf",
      "content": {
        "text": "## Condensed Statement of Cash Flows\n[row_1]; []=Net income; [2024]=27,132; [2023]=26,515; [2022]=27,528\n...",
        "doc_pages": [170]
      }
    },
    "2": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#428",
      "doc_name": "report.pdf",
      "content": {
        "text": "## Consolidated Statement of Comprehensive Income\n[row_0]; [Dollars in millions]=Net income; [2024]=27,132; [2023]=26,515; [2022]=27,528\n...",
        "doc_pages": [92]
      }
    },
    "3": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#image#92",
      "doc_name": "report.pdf",
      "content": {
        "mime_type": "image/jpeg",
        "data": "<base64 encoded image>"
      }
    },
    "4": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#129",
      "doc_name": "report.pdf",
      "content": {
        "text": "## Key Performance Indicators\n[row_7]=Income statement; []=Net income; [2024]=27,132; [2023]=26,515\n...",
        "doc_pages": [33, 34, 35, 36]
      }
    },
    "5": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#854",
      "doc_name": "report.pdf",
      "content": {
        "text": "## Results of Business Segments\n[row_9]; [Item]=Net income; [Total Corporation (2) 2024]=27,132; [Total Corporation (2) 2023]=26,515\n...",
        "doc_pages": [166, 167, 168]
      }
    },
    "6": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#131",
      "doc_name": "report.pdf",
      "content": {
        "text": "## Executive Summary > Financial Highlights\n[row_7]=Net income; [2024]=27,132; [2023]=26,515\n...",
        "doc_pages": [29, 30]
      }
    }
  },
  "confidence": 100.0
}
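
Each entry in facts lists the ref_ids that support it, and each entry in refs carries the source document and, for text chunks, the doc_pages it came from. A minimal sketch of joining the two is shown below. It assumes the JSON above is already available as a string (how it is captured from the CLI is not covered here), and `pages_per_fact` is a hypothetical helper. The sample is a trimmed-down stand-in with the same shape as the output above.

```python
import json
from typing import Dict, List


def pages_per_fact(raw: str) -> Dict[str, List[int]]:
    """Map each fact to the sorted, de-duplicated pages that support it."""
    answer = json.loads(raw)
    result = {}
    for item in answer["facts"]:
        pages = set()
        for ref_id in item["ref_ids"]:
            content = answer["refs"][ref_id]["content"]
            # Image references have no doc_pages, so default to an empty list.
            pages.update(content.get("doc_pages", []))
        result[item["fact"]] = sorted(pages)
    return result


# Trimmed-down sample with the same shape as the output above.
sample = json.dumps({
    "facts": [{"fact": "Net income was $27,132 million.", "ref_ids": ["1", "3"]}],
    "refs": {
        "1": {"doc_name": "report.pdf", "content": {"text": "...", "doc_pages": [170]}},
        "3": {"doc_name": "report.pdf", "content": {"mime_type": "image/jpeg", "data": ""}},
    },
})
print(pages_per_fact(sample))
```

A mapping like this makes it straightforward to show page-level citations next to each statement in the answer.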

To learn more about how to use TopK CLI, see the CLI documentation.

Install Python SDK

pip install topk-sdk

uv add topk-sdk

Initialize the client

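Set up the TopK client with your API key and region. Each Python step below is shown twice: first with the synchronous Client, then with the asynchronous AsyncClient.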

import os
from topk_sdk import Client

client = Client(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ.get("TOPK_REGION", "aws-us-east-1-elastica"),
)

import os
from topk_sdk import AsyncClient

client = AsyncClient(
    api_key=os.environ["TOPK_API_KEY"],
    region=os.environ.get("TOPK_REGION", "aws-us-east-1-elastica"),
)

See available regions for a full list of supported regions.

Create a dataset

client.datasets().create("quickstart")

await client.datasets().create("quickstart")

Upload a file

Download a sample PDF financial report:

curl -L https://topk-docs.s3.us-east-2.amazonaws.com/bank_of_america_2024.pdf -o report.pdf

Upload the file to your dataset by providing the document ID, the path to the file, and optional metadata:

from pathlib import Path

handle = client.dataset("quickstart").upsert_file(
    "bank-of-america-annual-report-2024",  # document ID
    Path("./report.pdf"),                  # path to file
    {
        "ticker": "BAC",
        "doc_type": "annual_report",
        "fiscal_year": 2024,
    },
)

from pathlib import Path

handle = await client.dataset("quickstart").upsert_file(
    "bank-of-america-annual-report-2024",  # document ID
    Path("./report.pdf"),                  # path to file
    {
        "ticker": "BAC",
        "doc_type": "annual_report",
        "fiscal_year": 2024,
    },
)

After the file has been uploaded, wait for it to be processed:

client.dataset("quickstart").wait_for_handle(handle)

await client.dataset("quickstart").wait_for_handle(handle)

Ask a question

for message in client.ask("What was the total net income of Bank of America in 2024?", ["quickstart"]):
    print(message)

async for message in client.ask("What was the total net income of Bank of America in 2024?", ["quickstart"]):
    print(message)

Full JSON output

{
  "facts": [
    {
      "fact": "Bank of America's total net income for the fiscal year 2024 was $27,132 million.",
      "ref_ids": ["1", "2", "3", "4", "5", "6"]
    },
    {
      "fact": "The 2024 net income of $27.1 billion represented an increase from the $26.5 billion reported in 2023.",
      "ref_ids": ["3", "4", "6"]
    },
    {
      "fact": "The increase in 2024 net income was driven by higher noninterest income, although this was partially offset by a higher provision for credit losses and lower net interest income.",
      "ref_ids": ["3", "6"]
    }
  ],
  "refs": {
    "1": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#130",
      "doc_name": "report.pdf",
      "content": {
        "data": {
          "Chunk": {
            "text": "## Condensed Statement of Cash Flows\n[row_1]; []=Net income; [2024]=27,132; [2023]=26,515; [2022]=27,528\n...",
            "doc_pages": [170]
          }
        }
      }
    },
    "2": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#428",
      "doc_name": "report.pdf",
      "content": {
        "data": {
          "Chunk": {
            "text": "## Consolidated Statement of Comprehensive Income\n[row_0]; [Dollars in millions]=Net income; [2024]=27,132; [2023]=26,515; [2022]=27,528\n...",
            "doc_pages": [92]
          }
        }
      }
    },
    "3": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#image#92",
      "doc_name": "report.pdf",
      "content": {
        "data": {
          "Image": {
            "mime_type": "image/jpeg",
            "data": "<base64 encoded image>"
          }
        }
      }
    },
    "4": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#129",
      "doc_name": "report.pdf",
      "content": {
        "data": {
          "Chunk": {
            "text": "## Key Performance Indicators\n[row_7]=Income statement; []=Net income; [2024]=27,132; [2023]=26,515\n...",
            "doc_pages": [33, 34, 35, 36]
          }
        }
      }
    },
    "5": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#854",
      "doc_name": "report.pdf",
      "content": {
        "data": {
          "Chunk": {
            "text": "## Results of Business Segments\n[row_9]; [Item]=Net income; [Total Corporation (2) 2024]=27,132; [Total Corporation (2) 2023]=26,515\n...",
            "doc_pages": [166, 167, 168]
          }
        }
      }
    },
    "6": {
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "quickstart",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#131",
      "doc_name": "report.pdf",
      "content": {
        "data": {
          "Chunk": {
            "text": "## Executive Summary > Financial Highlights\n[row_7]=Net income; [2024]=27,132; [2023]=26,515\n...",
            "doc_pages": [29, 30]
          }
        }
      }
    }
  },
  "confidence": 100.0
}
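
In the output above, each reference's content.data holds a single key, Chunk or Image, naming the kind of content. A minimal sketch of dispatching on that key is shown below. It assumes the message has been converted to plain dictionaries with the shape shown above (the objects the SDK actually yields may differ), and `describe_ref` is a hypothetical helper.

```python
from typing import Any, Dict


def describe_ref(ref: Dict[str, Any]) -> str:
    """Summarize one reference as a short, printable citation."""
    data = ref["content"]["data"]
    if "Chunk" in data:
        # Text chunks carry the pages they were extracted from.
        pages = ", ".join(str(p) for p in data["Chunk"]["doc_pages"])
        return f"{ref['doc_name']} (text, pages {pages})"
    if "Image" in data:
        return f"{ref['doc_name']} (image, {data['Image']['mime_type']})"
    return f"{ref['doc_name']} (unknown content kind)"


# Sample references shaped like entries "6" and "3" above, with content trimmed.
chunk_ref = {
    "doc_name": "report.pdf",
    "content": {"data": {"Chunk": {"text": "...", "doc_pages": [29, 30]}}},
}
image_ref = {
    "doc_name": "report.pdf",
    "content": {"data": {"Image": {"mime_type": "image/jpeg", "data": ""}}},
}
print(describe_ref(chunk_ref))
print(describe_ref(image_ref))
```

The fallback branch keeps the helper from failing if a content kind other than the two shown here appears.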

To learn more about how to use the Python SDK, see the Python SDK documentation.

Install JavaScript SDK

npm install topk-js

yarn add topk-js

pnpm add topk-js

Initialize the client

import { Client } from "topk-js";

const client = new Client({
  apiKey: "your-api-key",
  region: "aws-us-east-1-elastica",
});

See available regions for a full list of supported regions.

Create a dataset

await client.datasets().create("quickstart");

Upload a file

Download a sample PDF financial report:

curl -L https://topk-docs.s3.us-east-2.amazonaws.com/bank_of_america_2024.pdf -o report.pdf

const handle = await client.dataset("quickstart").upsertFile(
  "bank-of-america-annual-report-2024", // document ID
  { path: "./report.pdf" },             // path to file
  {
    ticker: "BAC",
    doc_type: "annual_report",
    fiscal_year: 2024,
  },
);

Wait for the file to be processed:

await client.dataset("quickstart").waitForHandle(handle);

Ask a question

for await (const message of client.ask("What was the total net income of Bank of America in 2024?", ["quickstart"])) {
  console.log(message);
}

Full JSON output

{
  "facts": [
    {
      "fact": "Bank of America's total net income for the fiscal year 2024 was $27,132 million.",
      "refIds": ["1", "2", "3", "4", "5", "6"]
    },
    {
      "fact": "The 2024 net income of $27.1 billion represented an increase from the $26.5 billion reported in 2023.",
      "refIds": ["3", "4", "6"]
    },
    {
      "fact": "The increase in 2024 net income was driven by higher noninterest income, although this was partially offset by a higher provision for credit losses and lower net interest income.",
      "refIds": ["3", "6"]
    }
  ],
  "refs": {
    "1": {
      "docId": "bank-of-america-annual-report-2024",
      "docType": "application/pdf",
      "dataset": "quickstart",
      "contentId": "doc#bank-of-america-annual-report-2024#chunk#130",
      "docName": "report.pdf",
      "content": {
        "type": "chunk",
        "data": {
          "text": "## Condensed Statement of Cash Flows\n[row_1]; []=Net income; [2024]=27,132; [2023]=26,515; [2022]=27,528\n...",
          "docPages": [170]
        }
      }
    },
    "2": {
      "docId": "bank-of-america-annual-report-2024",
      "docType": "application/pdf",
      "dataset": "quickstart",
      "contentId": "doc#bank-of-america-annual-report-2024#chunk#428",
      "docName": "report.pdf",
      "content": {
        "type": "chunk",
        "data": {
          "text": "## Consolidated Statement of Comprehensive Income\n[row_0]; [Dollars in millions]=Net income; [2024]=27,132; [2023]=26,515; [2022]=27,528\n...",
          "docPages": [92]
        }
      }
    },
    "3": {
      "docId": "bank-of-america-annual-report-2024",
      "docType": "application/pdf",
      "dataset": "quickstart",
      "contentId": "doc#bank-of-america-annual-report-2024#image#92",
      "docName": "report.pdf",
      "content": {
        "type": "image",
        "data": {
          "mimeType": "image/jpeg",
          "data": "<base64 encoded image>"
        }
      }
    },
    "4": {
      "docId": "bank-of-america-annual-report-2024",
      "docType": "application/pdf",
      "dataset": "quickstart",
      "contentId": "doc#bank-of-america-annual-report-2024#chunk#129",
      "docName": "report.pdf",
      "content": {
        "type": "chunk",
        "data": {
          "text": "## Key Performance Indicators\n[row_7]=Income statement; []=Net income; [2024]=27,132; [2023]=26,515\n...",
          "docPages": [33, 34, 35, 36]
        }
      }
    },
    "5": {
      "docId": "bank-of-america-annual-report-2024",
      "docType": "application/pdf",
      "dataset": "quickstart",
      "contentId": "doc#bank-of-america-annual-report-2024#chunk#854",
      "docName": "report.pdf",
      "content": {
        "type": "chunk",
        "data": {
          "text": "## Results of Business Segments\n[row_9]; [Item]=Net income; [Total Corporation (2) 2024]=27,132; [Total Corporation (2) 2023]=26,515\n...",
          "docPages": [166, 167, 168]
        }
      }
    },
    "6": {
      "docId": "bank-of-america-annual-report-2024",
      "docType": "application/pdf",
      "dataset": "quickstart",
      "contentId": "doc#bank-of-america-annual-report-2024#chunk#131",
      "docName": "report.pdf",
      "content": {
        "type": "chunk",
        "data": {
          "text": "## Executive Summary > Financial Highlights\n[row_7]=Net income; [2024]=27,132; [2023]=26,515\n...",
          "docPages": [29, 30]
        }
      }
    }
  },
  "confidence": 100.0
}

To learn more about how to use the JavaScript SDK, see the JavaScript SDK documentation.

Integrations

Python SDK

Full Python SDK reference.

JavaScript SDK

Full TypeScript/JavaScript SDK reference.

CLI

Upload files, search, and ask questions directly from the terminal.

MCP Server

Connect TopK to any MCP-compatible AI agent via the Model Context Protocol.

Security & Compliance

TopK is SOC 2 Type I certified. Visit the trust center for full details.

Data encryption

All data is encrypted in transit and at rest.

Access control

Role-based access control with full auditability.

Private Deployment

Deploy inside your own VPC for complete isolation and data residency. Contact us for more details.
