Search your documents and return the most relevant passages based on your natural-language query.

How it works

When you run Search, TopK:
  1. Searches across your documents: it looks in one or more datasets for the passages most relevant to the query.
  2. Ranks the best matches: results are ordered by relevance so the highest-signal passages come first.
  3. Returns passages with document context: each result includes the matched passage, document ID, dataset, page references, and any requested metadata.
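The three steps can be illustrated with a deliberately simple, self-contained sketch. TopK's actual retrieval and ranking model is not described on this page, so the code below swaps in plain term overlap as a stand-in relevance score. The corpus, `toy_search`, and `Hit` are hypothetical and not part of any TopK API.

```python
import re
from dataclasses import dataclass

# Hypothetical toy corpus: dataset name -> list of (doc_id, passage) pairs.
CORPUS = {
    "policies": [
        ("travel-policy", "Travel reimbursement requires receipts for all expenses."),
        ("refund-policy", "Refund requests must be submitted within 30 days."),
    ],
    "handbook": [
        ("onboarding", "New hires receive a laptop on their first day."),
    ],
}


@dataclass
class Hit:
    doc_id: str
    dataset: str
    passage: str
    score: float


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_search(query: str, datasets: list, top_k: int = 3) -> list:
    terms = tokenize(query)
    hits = []
    # 1. Search: score every passage in the selected datasets.
    for dataset in datasets:
        for doc_id, passage in CORPUS.get(dataset, []):
            overlap = len(terms & tokenize(passage))
            if overlap:
                # Stand-in relevance: fraction of query terms found in the passage.
                hits.append(Hit(doc_id, dataset, passage, overlap / len(terms)))
    # 2. Rank: highest-scoring passages first.
    hits.sort(key=lambda hit: hit.score, reverse=True)
    # 3. Return: each hit carries its document ID and dataset as context.
    return hits[:top_k]


for hit in toy_search("travel reimbursement policy", ["policies", "handbook"]):
    print(f"{hit.score:.2f} {hit.dataset}/{hit.doc_id}: {hit.passage}")
```

A real search engine replaces the overlap score with a learned or lexical relevance model, but the shape of the pipeline (search, rank, return with context) is the same.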

Usage

Once your documents are processed, you can start retrieving relevant passages immediately.
topk search "travel reimbursement policy" -d policies
Here’s an example of a Search query over a financial dataset:
Query: What was the total net income of Bank of America in 2024?

Understanding search results

Search returns a list of the most relevant Search Results based on the provided query. Search Results are objects referencing specific passages extracted from the original documents.

Search Result

Each Search Result has the following fields:
  • doc_id (string): The ID of the source document, assigned at upload time
  • doc_name (string): The file name of the source document, assigned at upload time
  • doc_type (string): The MIME type of the source document (e.g. application/pdf)
  • dataset (string): The dataset the document belongs to
  • content_id (string): A unique identifier for the content of this Search Result
  • content (object): The matched content (see Content types)
  • metadata (object): Metadata fields attached to the source document at upload time; included only when requested (see Retrieving metadata)
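When handling the JSON produced by `--output json`, it can help to mirror the Search Result shape locally. The `SearchResult` class below is a hypothetical helper, not part of the TopK SDK; its field names follow the field list above and the example output in Retrieving metadata.

```python
import dataclasses
from typing import Any, Dict


@dataclasses.dataclass
class SearchResult:
    """Hypothetical local mirror of the documented Search Result fields."""

    doc_id: str
    doc_name: str
    doc_type: str
    dataset: str
    content_id: str
    content: Dict[str, Any]
    # Metadata only appears when requested, so default to an empty dict.
    metadata: Dict[str, Any] = dataclasses.field(default_factory=dict)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "SearchResult":
        """Build a SearchResult from one element of the JSON output."""
        return cls(
            doc_id=raw["doc_id"],
            doc_name=raw["doc_name"],
            doc_type=raw["doc_type"],
            dataset=raw["dataset"],
            content_id=raw["content_id"],
            content=raw["content"],
            metadata=raw.get("metadata") or {},
        )


result = SearchResult.from_dict({
    "doc_id": "refund-policy-2024",
    "doc_name": "refund_policy_2024.pdf",
    "doc_type": "application/pdf",
    "dataset": "policies",
    "content_id": "doc#refund-policy-2024#chunk#12",
    "content": {"text": "Refund requests must be submitted within 30 days of purchase.", "doc_pages": [3]},
})
print(result.doc_name, result.metadata)  # metadata was not requested, so it is {}
```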

Content types

There are three content types a search result can contain:
  • Chunk — a text passage extracted from a document, with optional source page number(s)
  • Image — an image extracted from a document
  • Page — an image of a page from a document, with source page number
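A consumer of search results typically needs to handle each content type differently. The sketch below is hypothetical: it recognizes a Chunk by its `text` key (as in the JSON example on this page) and, because the field layout of Image and Page content is not shown here, falls back to a generic label for anything else.

```python
from typing import Any, Dict


def describe_content(content: Dict[str, Any]) -> str:
    """Summarize a result's content for display.

    Assumption: a Chunk is recognized by its "text" key; Image and Page
    content is not inspected and gets a generic label.
    """
    if "text" in content:
        pages = content.get("doc_pages") or []
        if pages:
            return f"{content['text']} (p. {', '.join(str(p) for p in pages)})"
        return content["text"]
    return "[image or page content]"


print(describe_content({"text": "Refunds take 5 to 7 business days.", "doc_pages": [3]}))
```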

Query across multiple datasets or apply document filters to narrow the scope of the query.

Scoping to specific datasets

Restricting a query to specific datasets is useful when you want:
  • More targeted results
  • Less ambiguity across unrelated document sets
  • Tighter control over what content an agent is allowed to see
Use -d / --dataset (repeatable):
topk search "What was the total net income of Bank of America in 2024?" -d finance -d compliance
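When a query spans several datasets, the `dataset` field on each Search Result records where a passage came from. Assuming the results are saved as JSON (by combining the `-d` flags with `--output json`, as in Retrieving metadata), a small hypothetical helper can group the ranked results per dataset. `group_by_dataset` and `load_results` are illustrative names, not part of the TopK CLI or SDK.

```python
import json
from collections import defaultdict
from typing import Dict, List


def group_by_dataset(results: List[dict]) -> Dict[str, List[str]]:
    """Map each dataset to the content IDs it contributed, keeping rank order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for result in results:  # results arrive ranked, most relevant first
        grouped[result["dataset"]].append(result["content_id"])
    return dict(grouped)


def load_results(path: str) -> List[dict]:
    """Read a file produced with `--output json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, after redirecting the output of `topk search "..." -d finance -d compliance --output json` to `results.json`, calling `group_by_dataset(load_results("results.json"))` shows how many of the top passages each dataset supplied.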

Filter documents

A dataset may contain documents that should not be considered for a query. To exclude them, provide a filter expression: it is evaluated against the documents' metadata fields, and only matching documents are searched. If your documents include metadata fields, filtering on them is useful when you want to query:
  • Documents within a specific time range
  • Documents matching a particular category or type
  • Documents associated with a specific group or owner
  • Documents the end user is permitted to access
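from topk_sdk.query import field

# `client` is assumed to be an already-initialized TopK client.
# Search the "policies" dataset, restricted to finance documents from 2024.
for result in client.search(
    query="travel reimbursement limit",
    datasets=[
        {
            "dataset": "policies",
            # Only documents whose metadata matches this expression are searched
            "filter": field("department").eq("finance").and_(
                field("year").eq(2024)
            ),
        }
    ],
    top_k=10,  # maximum number of results to return
):
    print(result)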
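To make the semantics of that filter concrete, here is the same condition evaluated locally over hypothetical metadata. It is only an illustration: `matches_filter` and the sample documents are not part of the SDK, and TopK applies the real filter expression itself. Only documents for which the condition holds are considered by the search.

```python
from typing import Any, Dict


def matches_filter(metadata: Dict[str, Any]) -> bool:
    """Local, plain-Python reading of
    field("department").eq("finance").and_(field("year").eq(2024)).
    """
    return metadata.get("department") == "finance" and metadata.get("year") == 2024


# Hypothetical documents in the "policies" dataset and their metadata.
documents = {
    "travel-2024": {"department": "finance", "year": 2024},
    "travel-2023": {"department": "finance", "year": 2023},
    "hiring-2024": {"department": "hr", "year": 2024},
}

in_scope = [doc_id for doc_id, meta in documents.items() if matches_filter(meta)]
print(in_scope)  # ['travel-2024']
```

Both conditions must hold because the expressions are combined with `and_`, so a finance document from 2023 and an HR document from 2024 are both excluded.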

Retrieving metadata

By default, metadata fields are not included in search results. Pass the field names you want returned and they will appear on each Search Result.
Use --field (repeatable) and --output json to include metadata field(s) in the output:
topk search "refund policy" -d policies --field title --field author --field year --output json
Example output
[
  {
    "metadata": {
      "title": "Refund Policy 2024",
      "author": "Finance Team",
      "year": 2024
    },
    "doc_id": "refund-policy-2024",
    "doc_type": "application/pdf",
    "dataset": "policies",
    "content_id": "doc#refund-policy-2024#chunk#12",
    "doc_name": "refund_policy_2024.pdf",
    "content": {
      "text": "Refund requests must be submitted within 30 days of purchase. Approved refunds are processed within 5–7 business days.",
      "doc_pages": [3]
    }
  }
]
If present on the original document, the requested metadata appears on the returned Search Result.
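Because a requested field appears only when the source document has it, code that consumes the JSON output should not assume every metadata key is present. A minimal sketch, using a hypothetical `cite` helper, that formats a citation from each result and falls back to `doc_name` when `title` is missing:

```python
import json
from typing import Any, Dict

# The example output shown above, trimmed to the fields used here.
EXAMPLE_OUTPUT = """
[
  {
    "metadata": {"title": "Refund Policy 2024", "author": "Finance Team", "year": 2024},
    "doc_name": "refund_policy_2024.pdf",
    "dataset": "policies",
    "content": {"text": "Refund requests must be submitted within 30 days of purchase.", "doc_pages": [3]}
  }
]
"""


def cite(result: Dict[str, Any]) -> str:
    """Build a short citation string for one Search Result."""
    metadata = result.get("metadata") or {}
    # A requested field only appears if the source document has it,
    # so fall back to the file name when "title" is absent or empty.
    title = metadata.get("title") or result["doc_name"]
    pages = result["content"].get("doc_pages") or []
    page_note = f", p. {', '.join(str(p) for p in pages)}" if pages else ""
    return f"{title} [{result['dataset']}]{page_note}"


for result in json.loads(EXAMPLE_OUTPUT):
    print(cite(result))  # Refund Policy 2024 [policies], p. 3
```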