Search your documents and return the most relevant passages based on your natural-language query.

How it works

When you run Search, TopK:
1. Searches across your documents
   TopK searches one or more datasets for the passages most relevant to the query.

2. Ranks the best matches
   Results are ranked by relevance so the highest-signal passages come first.

3. Returns passages with document context
   Each result includes the matched passage, document ID, dataset, page references, and requested metadata.
For example, for the query:
What does the policy say about contractor access?
results look like this:
Example search results
[
  {
    "doc_id": "vendor-access-policy",
    "doc_type": "application/pdf",
    "dataset": "policies",
    "content": {
      "chunk": {
        "text": "Contractors may be granted access only through approved, time-limited credentials.",
        "doc_pages": [4]
      }
    },
    "metadata": {
      "title": "Vendor Access Policy",
      "department": "finance",
      "year": 2024
    }
  },
  {
    "doc_id": "security-standards",
    "doc_type": "text/markdown",
    "dataset": "policies",
    "content": {
      "chunk": {
        "text": "All contractor access must be sponsored by a full-time employee and reviewed on a quarterly basis.",
        "doc_pages": []
      }
    },
    "metadata": {
      "title": "Security Standards",
      "department": "it",
      "year": 2024
    }
  },
  {
    "doc_id": "access-control-diagram",
    "doc_type": "image/png",
    "dataset": "policies",
    "content": {
      "image": {
        "data": "iVBORw0KGgoAAAANSUhEUgAAAMgAAADICAYAAACtWK6eAAAABmJLR0QA/wD/AP+gvaeTAAAF...",
        "mime_type": "image/png"
      }
    },
    "metadata": {
      "title": "Access Control Diagram",
      "department": "it"
    }
  }
]
Search gives you the evidence directly, without deciding how to interpret it. This makes it useful for:
  • Powering RAG pipelines
  • Building custom answering or summarization layers on top of retrieved passages
  • Feeding high-signal evidence into agents or downstream workflows
  • Inspecting and verifying source passages directly
  • Adding semantic search to applications
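As an illustration of the RAG use case, the sketch below turns results shaped like the example above into a numbered, citable context block for a downstream prompt. It assumes the results have already been parsed into Python dictionaries with the fields shown in the example (doc_id, content.chunk.text, content.chunk.doc_pages, metadata). The build_context helper is hypothetical and not part of TopK.

```python
from typing import Any


def build_context(results: list[dict[str, Any]]) -> str:
    """Format text passages as numbered, citable context lines.

    Hypothetical helper: assumes each result follows the shape of the
    example search results. Results without a text chunk (for example,
    image results) are skipped.
    """
    lines = []
    for result in results:
        chunk = result.get("content", {}).get("chunk")
        if chunk is None:
            continue  # no text passage to cite; handle images separately if needed
        # Fall back to the document ID when no title metadata was requested.
        title = result.get("metadata", {}).get("title", result["doc_id"])
        pages = chunk.get("doc_pages") or []
        page_note = f", p. {', '.join(str(p) for p in pages)}" if pages else ""
        lines.append(f"[{len(lines) + 1}] {chunk['text']} ({title}{page_note})")
    return "\n".join(lines)


# Trimmed copies of the example results; the image payload is a placeholder.
example = [
    {
        "doc_id": "vendor-access-policy",
        "content": {"chunk": {
            "text": "Contractors may be granted access only through approved, time-limited credentials.",
            "doc_pages": [4],
        }},
        "metadata": {"title": "Vendor Access Policy"},
    },
    {
        "doc_id": "security-standards",
        "content": {"chunk": {
            "text": "All contractor access must be sponsored by a full-time employee and reviewed on a quarterly basis.",
            "doc_pages": [],
        }},
        "metadata": {"title": "Security Standards"},
    },
    {
        "doc_id": "access-control-diagram",
        "content": {"image": {"data": "<base64 image bytes>", "mime_type": "image/png"}},
        "metadata": {"title": "Access Control Diagram"},
    },
]

print(build_context(example))
```

Keeping the title and page numbers next to each passage lets a downstream answering layer cite its sources, and skipping non-text results avoids passing raw image bytes into a text prompt.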

Usage

Once your documents are processed, you can start retrieving relevant passages immediately.
topk search "travel reimbursement policy" -d policies
Pass --top-k to control the number of results (defaults to 10):
topk search "travel reimbursement policy" -d policies --top-k 20
Query across multiple datasets or apply document filters to narrow the scope of the query.

Scoping to specific datasets

Scoping a search to specific datasets is useful when you want:
  • More targeted results
  • Less ambiguity across unrelated document sets
  • Tighter control over what content an agent is allowed to see
Use -d / --dataset (repeatable):
topk search "What was the total net income of Bank of America in 2024?" -d finance -d compliance
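If you call the CLI from a script, assembling the arguments as a list avoids shell-quoting problems with free-text queries. The sketch below only uses the flags shown above (-d and --top-k). Combining them in one command and the build_search_command helper itself are assumptions for illustration, not part of TopK.

```python
import shlex
from typing import List, Optional


def build_search_command(
    query: str, datasets: List[str], top_k: Optional[int] = None
) -> List[str]:
    """Build an argv list for `topk search` (hypothetical helper)."""
    argv = ["topk", "search", query]
    for name in datasets:
        argv += ["-d", name]  # -d is repeatable, one flag per dataset
    if top_k is not None:
        argv += ["--top-k", str(top_k)]
    return argv


argv = build_search_command(
    "travel reimbursement policy", ["finance", "compliance"], top_k=20
)
# Print a copy-pasteable shell form of the command.
print(shlex.join(argv))
```

With the topk CLI installed, the list can be passed directly to subprocess.run(argv, check=True), so the query is never re-parsed by a shell.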

Filtering documents

A dataset may contain documents that should not be considered for a given query. You can exclude documents that don’t match your criteria by providing a filter expression. Filter expressions operate on document metadata fields. For example, if you uploaded documents with metadata such as department, year, doc_type, or author, you can use those fields to limit what Search is allowed to retrieve.
This is useful when you want to query:
  • Documents within a specific time range
  • Documents matching a particular category or type
  • Documents associated with a specific group or owner
  • Documents the user is permitted to access
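import os
from topk_sdk import Client
from topk_sdk.query import field

# Read the API key from the environment rather than hard-coding it.
client = Client(
    api_key=os.environ.get("TOPK_API_KEY"),
    region="aws-us-east-1-elastica",
)

# Search the "policies" dataset, restricted to finance documents from 2024.
results = client.search(
    query="travel reimbursement limit",
    datasets=[
        {
            "dataset": "policies",
            # The filter applies to this dataset entry only.
            "filter": field("department").eq("finance").and_(
                field("year").eq(2024)
            ),
        }
    ],
    top_k=10,
)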
Use source-level filters, such as the per-dataset filter above, when the restriction is part of where the search should look. That keeps retrieval focused and improves the quality of the returned matches.

Retrieving metadata

The passage text alone is often not enough. You may also want metadata such as title, author, date, or any custom fields you attached during upload, for example to render richer results, group by source attributes, or carry context into downstream agents.
Use --field (repeatable):
topk search "refund policy" -d policies --field title --field author --field year
The requested metadata appears on each returned result.
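Once metadata is attached to each result, grouping by a source attribute takes only a few lines. The sketch below assumes results parsed into Python dictionaries shaped like the example search results earlier on this page. The group_by_metadata helper is hypothetical, and results that lack the requested field are collected under None.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_metadata(results: List[Dict[str, Any]], key: str) -> Dict[Any, List[str]]:
    """Group document IDs by one metadata field (hypothetical helper)."""
    groups: Dict[Any, List[str]] = defaultdict(list)
    for result in results:
        # Results without the field end up in the None group.
        value = result.get("metadata", {}).get(key)
        groups[value].append(result["doc_id"])
    return dict(groups)


# Metadata copied from the example search results; content omitted for brevity.
example_results = [
    {"doc_id": "vendor-access-policy", "metadata": {"department": "finance", "year": 2024}},
    {"doc_id": "security-standards", "metadata": {"department": "it", "year": 2024}},
    {"doc_id": "access-control-diagram", "metadata": {"department": "it"}},
]

for department, doc_ids in group_by_metadata(example_results, "department").items():
    print(department, doc_ids)
```

Because relevance order is preserved within each group, the first document ID in a group is still that group's best match.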