TopK answers natural-language queries over your documents. It retrieves the most relevant parts of your documents and synthesizes a grounded answer with source citations.

How it works

When you run an Ask, TopK:

  1. Understands your query. It interprets your question and organizes it into a clear sequence of answerable steps.
  2. Searches your documents. It finds the passages most relevant to your query.
  3. Generates a grounded answer. It produces an answer based on the retrieved evidence.
  4. Returns the answer with source citations. The response contains facts, citations, and a confidence score. See Understanding the answer.

Usage

Once a dataset is created and your documents are processed, you can run agentic queries against them:
topk ask "What was the total net income of Bank of America in 2024?" -d my-docs
Here’s an example Ask query against a corporate filing (financial knowledge base):
Query: What was the total net income of Bank of America in 2024?
Answer:
  • Bank of America’s total net income for the fiscal year 2024 was $27,132 million. [1] [2] [3] [4] [5] [6]
  • The 2024 net income of $27.1 billion represented an increase from the $26.5 billion reported in 2023. [3] [4] [6]
  • The increase in 2024 net income was driven by higher noninterest income, although this was partially offset by a higher provision for credit losses and lower net interest income. [3] [6]

Understanding the answer

The answer consists of three fields:
  • facts — individual statements answering the query, each backed by one or more citations
  • refs — a map from citation number to a Search Result
  • confidence — a score between 0 and 100 indicating confidence in the answer
{
  "facts": [
    {
      "fact": "Bank of America's total net income for the fiscal year 2024 was $27,132 million.",
      "ref_ids": ["1", "2", "3", "4", "5", "6"]
    },
    {
      "fact": "The 2024 net income represented an increase from the $26.5 billion reported in 2023.",
      "ref_ids": ["3", "4", "6"]
    },
    {
      "fact": "The increase was driven by higher noninterest income, partially offset by a higher provision for credit losses and lower net interest income.",
      "ref_ids": ["3", "6"]
    }
  ],
  "refs": {
    "1": { ... }, // Search Result 1
    "2": { ... }, // Search Result 2
    "3": { ... }, // Search Result 3
    "4": { ... }, // Search Result 4
    "5": { ... }, // Search Result 5
    "6": { ... }  // Search Result 6
  },
  "confidence": 100.0
}
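As a minimal sketch of reading this structure in Python, the snippet below pairs each fact with the refs entries that its ref_ids point to. It assumes the answer has already been parsed into a dict with the shape shown above. The helper name resolve_citations and the abbreviated sample refs are illustrative and not part of TopK.

```python
import json

# Abbreviated sample with the shape shown above; the ref bodies are trimmed
# to a single field for illustration.
SAMPLE_ANSWER = json.loads("""
{
  "facts": [
    {
      "fact": "Bank of America's total net income for the fiscal year 2024 was $27,132 million.",
      "ref_ids": ["1", "2"]
    }
  ],
  "refs": {
    "1": {"doc_name": "bank_of_america_2024.pdf"},
    "2": {"doc_name": "bank_of_america_2024.pdf"}
  },
  "confidence": 100.0
}
""")


def resolve_citations(answer):
    """Return (fact_text, [cited refs]) pairs, skipping IDs missing from refs."""
    refs = answer.get("refs", {})
    resolved = []
    for item in answer.get("facts", []):
        cited = [refs[ref_id] for ref_id in item.get("ref_ids", []) if ref_id in refs]
        resolved.append((item["fact"], cited))
    return resolved


for fact, cited in resolve_citations(SAMPLE_ANSWER):
    sources = ", ".join(ref["doc_name"] for ref in cited)
    print(f"{fact} (sources: {sources})")
```

Skipping unknown IDs instead of raising is a design choice for this sketch; stricter callers may prefer to treat a missing ref as an error.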

Citations

Citations are numbered. Each fact’s ref_ids list points to entries in refs, where each key is a citation number and each value is a Search Result: the matched passage or image along with its document ID, file name, dataset, and any requested metadata.

To narrow the scope of a query, you can query across specific datasets or apply document filters.

Scoping to specific datasets

Every ask must specify at least one dataset to query against.
Pass --dataset (or -d) once for each dataset you want to include:
topk ask "What was the total net income of Bank of America in 2024?" -d finance -d compliance
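When the command is assembled from a script, the repeatable flag can be generated from a list. The following is a rough Python sketch that uses only the topk ask invocation and -d flag shown above. The helper name build_ask_command is hypothetical, and the snippet only builds and prints the command without running it.

```python
import shlex


def build_ask_command(query, datasets):
    """Build the argument list for `topk ask`, adding one -d flag per dataset."""
    if not datasets:
        # Mirrors the rule that at least one dataset must be specified.
        raise ValueError("at least one dataset is required")
    args = ["topk", "ask", query]
    for name in datasets:
        args += ["-d", name]
    return args


command = build_ask_command(
    "What was the total net income of Bank of America in 2024?",
    ["finance", "compliance"],
)
# Print a shell-quoted form of the command for inspection.
print(shlex.join(command))
```

If the topk CLI is installed, the returned list can be passed to subprocess.run as-is, which avoids quoting problems with queries that contain spaces or punctuation.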

Document filtering

A dataset may contain documents that should not be considered for a given query. If your documents include metadata fields, you can provide a filter expression over those fields to exclude documents that don’t match your criteria. This is useful when you want to query:
  • Documents within a specific time range
  • Documents matching a particular category or type
  • Documents associated with a specific group or owner
  • Documents the end user is permitted to access
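# Python SDK example. Assumes `client` is an already-initialized TopK client.
from topk_sdk.query import field

# Restrict the "policies" dataset to finance-department documents from 2024.
for message in client.ask(
    "What is the travel reimbursement limit?",
    [
        {
            "dataset": "policies",
            # Both conditions must hold for a document to be considered.
            "filter": field("department").eq("finance").and_(
                field("year").eq(2024)
            ),
        }
    ],
):
    # Print each message yielded by the ask call.
    print(message)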

Retrieving metadata

By default, metadata fields are not included in citations. Pass the field names you want returned and they will appear on each cited Search Result.
Use --field (repeatable) together with --output json to include metadata fields in the output:
topk ask "What was the total net income of Bank of America in 2024?" -d finance --field title --field year --output json
Example output:
{
  "facts": [
    {
      "fact": "Bank of America's total net income for the fiscal year 2024 was $27,132 million.",
      "ref_ids": ["1", "2"]
    }
  ],
  "refs": {
    "1": {
      "metadata": {
        "title": "Bank of America 2024 Annual Report",
        "year": 2024
      },
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "finance",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#130",
      "doc_name": "bank_of_america_2024.pdf",
      "content": {
        "text": "## Condensed Statement of Cash Flows\n[row_1]; []=Net income; [2024]=27,132; [2023]=26,515; [2022]=27,528\n...",
        "doc_pages": [170]
      }
    },
    "2": {
      "metadata": {
        "title": "Bank of America 2024 Annual Report",
        "year": 2024
      },
      "doc_id": "bank-of-america-annual-report-2024",
      "doc_type": "application/pdf",
      "dataset": "finance",
      "content_id": "doc#bank-of-america-annual-report-2024#chunk#428",
      "doc_name": "bank_of_america_2024.pdf",
      "content": {
        "text": "## Consolidated Statement of Comprehensive Income\n[row_0]; [Dollars in millions]=Net income; [2024]=27,132; [2023]=26,515; [2022]=27,528\n...",
        "doc_pages": [92]
      }
    }
  },
  "confidence": 100.0
}
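Once metadata is included, the JSON output can be turned into a short source list. The sketch below is one possible way to do that in Python, assuming the output has the shape of the example above. The function name cited_sources is illustrative, and the sample is an abbreviated copy of that example with only the fields the code reads.

```python
import json

# Abbreviated copy of the example output above.
OUTPUT = """
{
  "refs": {
    "1": {
      "metadata": {"title": "Bank of America 2024 Annual Report", "year": 2024},
      "doc_name": "bank_of_america_2024.pdf",
      "content": {"doc_pages": [170]}
    },
    "2": {
      "metadata": {"title": "Bank of America 2024 Annual Report", "year": 2024},
      "doc_name": "bank_of_america_2024.pdf",
      "content": {"doc_pages": [92]}
    }
  },
  "confidence": 100.0
}
"""


def cited_sources(answer):
    """Return one (citation number, title, file name, pages) tuple per ref."""
    rows = []
    # Sort numerically so that citation "10" follows "9" rather than "1".
    for number, ref in sorted(answer["refs"].items(), key=lambda kv: int(kv[0])):
        title = ref.get("metadata", {}).get("title", "")
        pages = ref.get("content", {}).get("doc_pages", [])
        rows.append((number, title, ref["doc_name"], pages))
    return rows


for number, title, doc_name, pages in cited_sources(json.loads(OUTPUT)):
    page_list = ", ".join(str(page) for page in pages)
    print(f"[{number}] {title} ({doc_name}), pages: {page_list}")
```

The metadata lookup falls back to an empty title because metadata fields appear only when they are requested with --field.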