Concepts - TopK

Dataset

Dataset is a high-level abstraction for storing, searching, and getting grounded answers from unstructured files (pdf, markdown, html, …). Document files uploaded into a dataset are parsed into context-aware chunks with relevant metadata and stored inside a collection. This gives our query sub-agent the ability to retrieve the most relevant context and generate a highly accurate, grounded answer.

Collection

Collection is a low-level abstraction for storing JSON-like documents with indexed fields. Collections have an opt-in schema that defines required and optional fields, field data types, and field indexes. Every document stored in a collection must have an _id field as a unique primary key.

Indexes

Documents inside a collection can have multiple indexed fields defined in the collection schema. Field indexes enable efficient retrieval of documents based on dense and sparse vector embeddings, multi-vector embeddings (late interaction), keywords (BM25), semantic similarity, and their combinations.

The ability to store and search multiple indexed fields per document minimizes storage overhead, makes filtering more efficient, and gives users flexibility at query time without having to re-ingest their data or maintain multiple indexes.

Filtering

Filtering allows queries to select only documents that match a specific condition. Filter expressions can be simple (comparison) or complex (AND/OR operators, ANY/ALL operators, regex patterns, and more). Additionally, you can use computed fields (for example, similarity score) to filter documents in the result set.

Filters are always applied before sorting and aggregation (top-k) to guarantee that the final results contain every document matching the filter, even if there is just one. We also guarantee that recall stays the same (or improves) when using highly selective filters.
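To see why the ordering matters, here is a minimal sketch in plain Python. It does not use the TopK SDK or reflect its internals; the `docs` list and the helpers `pre_filter_top_k` and `post_filter_top_k` are hypothetical names made up for illustration. It compares filtering before top-k with taking the top-k first and filtering afterwards.

```python
def pre_filter_top_k(docs, predicate, k):
    """Filter first, then sort by score and keep the best k."""
    matching = [d for d in docs if predicate(d)]
    return sorted(matching, key=lambda d: d["score"], reverse=True)[:k]


def post_filter_top_k(docs, predicate, k):
    """Keep the best k by score first, then filter (matches can be lost)."""
    top = sorted(docs, key=lambda d: d["score"], reverse=True)[:k]
    return [d for d in top if predicate(d)]


# Toy collection: only one document matches the filter, and it scores low.
docs = [
    {"_id": "a", "score": 0.9, "lang": "en"},
    {"_id": "b", "score": 0.8, "lang": "en"},
    {"_id": "c", "score": 0.7, "lang": "en"},
    {"_id": "d", "score": 0.1, "lang": "fr"},
]


def is_french(doc):
    return doc["lang"] == "fr"


pre = pre_filter_top_k(docs, is_french, k=2)    # the single match survives
post = post_filter_top_k(docs, is_french, k=2)  # the match was cut off before filtering
```

In this toy data the only matching document has the lowest score, so the post-filter variant returns nothing while the pre-filter variant still returns it.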

Custom Scoring

Similarity is not the same as relevance. Custom scoring expressions enable relevance tuning (for example, boosting more recent documents) inside the query without having to over-fetch results and re-score them in the application layer. You can combine computed fields (similarity score, recency score, etc.) with any metadata fields (for example, source quality) inside your scoring expression to define your ranking.

Scoring expressions are always computed before sorting and aggregation to guarantee that the final results contain the most relevant documents according to your ranking logic.
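As an illustration of the idea, the sketch below combines a similarity score with a recency score and a source-quality field before taking the top-k. It is plain Python rather than TopK query syntax; the weights (0.7 and 0.3), the 30-day half-life, and the names `recency_score` and `rank` are assumptions chosen for the example, not values or functions from TopK.

```python
from datetime import datetime, timezone


def recency_score(published, now, half_life_days=30.0):
    """Exponential decay: 1.0 for a brand-new document, 0.5 after one half-life."""
    age_days = (now - published).total_seconds() / 86400.0
    return 0.5 ** (age_days / half_life_days)


def rank(docs, now, k):
    """Score every candidate with the custom expression, then sort and keep the top k."""
    scored = []
    for d in docs:
        boost = 0.7 + 0.3 * recency_score(d["published"], now)
        scored.append({**d, "score": d["similarity"] * boost * d["source_quality"]})
    return sorted(scored, key=lambda d: d["score"], reverse=True)[:k]


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
docs = [
    {"_id": "old", "similarity": 0.90, "source_quality": 1.0,
     "published": datetime(2023, 1, 31, tzinfo=timezone.utc)},
    {"_id": "new", "similarity": 0.85, "source_quality": 1.0,
     "published": datetime(2025, 1, 30, tzinfo=timezone.utc)},
]

# "old" has the higher raw similarity, but the recency boost ranks "new" first.
top = rank(docs, now, k=1)
```

Because the score is computed for every candidate before the cutoff, the more recent document is not lost even though its raw similarity is lower.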

Compute-Storage Separation

All data in TopK is durably stored in object storage. Read/write compute nodes are stateless, which means that any node can immediately take over serving requests in case of a node failure. This decoupled architecture enables cost-effective scaling and high availability without having to run consensus-based replication (Raft or Paxos) inside the cluster.

Read-Write Separation

Different applications have different read/write patterns and latency requirements. We designed our system with decoupled read and write paths to minimize the impact of write/indexing/compaction operations on query performance. This enables sustained high-throughput writes without query latency spikes caused by background compaction or indexing contending for resources on the same node.

Multi-tenancy

TopK supports massively multi-tenant use cases with partitioned collections. Each partition within a collection is fully isolated, which ensures that documents from different tenants are not visible to each other. This design also enables TopK to scale write throughput and read throughput horizontally with the number of partitions (tenants) in a collection.

Partitioned collections behave like regular collections, supporting all index types and full query capabilities.

Storing documents for a specific tenant

To store documents for a specific tenant, provide the tenant ID as partition_name alongside the collection_name when creating the collection client. Partitions are created implicitly on the first write.

client.collection("books", "tenant-1234").upsert([
    {"_id": "doc-1", "title": "The Great Gatsby", "author": "F. Scott Fitzgerald"},
    {"_id": "doc-2", "title": "To Kill a Mockingbird", "author": "Harper Lee"},
])

await client.collection("books", "tenant-1234").upsert([
  { _id: "doc-1", title: "The Great Gatsby", author: "F. Scott Fitzgerald" },
  { _id: "doc-2", title: "To Kill a Mockingbird", author: "Harper Lee" },
]);

Querying documents for a specific tenant

As with writes, you can query documents for a specific tenant by providing the tenant ID as partition_name alongside the collection_name when creating the collection client. Queries against a partition that does not exist return a PartitionNotFound error.

client.collection("books", "tenant-1234").query(
    select("title")
    .filter(match("gatsby"))
    .limit(10)
)

await client.collection("books", "tenant-1234").query(
  select({
    title: field("title"),
  })
  .filter(match("gatsby"))
  .limit(10)
);
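The partition behavior described above (implicit creation on first write, isolation between tenants, and an error for missing partitions) can be modeled with a small in-memory toy. This is a conceptual sketch in plain Python, not the TopK SDK: `ToyPartitionedCollection` and its local `PartitionNotFound` exception class are hypothetical stand-ins defined here for illustration.

```python
class PartitionNotFound(Exception):
    """Raised by this toy model when a query targets a partition with no writes."""


class ToyPartitionedCollection:
    def __init__(self):
        self._partitions = {}  # partition_name -> {_id: document}

    def upsert(self, partition_name, docs):
        # The partition is created implicitly on the first write.
        partition = self._partitions.setdefault(partition_name, {})
        for doc in docs:
            partition[doc["_id"]] = doc

    def query(self, partition_name, predicate):
        if partition_name not in self._partitions:
            raise PartitionNotFound(partition_name)
        # Only documents in the requested partition are ever scanned.
        return [d for d in self._partitions[partition_name].values() if predicate(d)]


books = ToyPartitionedCollection()
books.upsert("tenant-1234", [{"_id": "doc-1", "title": "The Great Gatsby"}])
books.upsert("tenant-5678", [{"_id": "doc-1", "title": "Moby-Dick"}])

# The same _id exists in both partitions, but each tenant sees only its own document.
hits = books.query("tenant-1234", lambda d: "gatsby" in d["title"].lower())
```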

Read Consistency

TopK supports three different consistency levels, allowing you to choose the right trade-off between consistency, performance, and cost. By default, we provide a Balanced Consistency Mode, which balances data freshness (~750ms p99 write-to-queryable) and query efficiency for most applications. Below, we explain how TopK handles data writes and reads and how each consistency mode impacts behavior.

Balanced Consistency (Default)

Reads in this mode consider both indexed files and the most recent writes. While there may be a small delay of less than a second for some recent writes to appear, this mode offers lower cost compared to strong consistency. It is ideal for most real-world applications where near-real-time updates are sufficient.

client.collection("my_collection").query(
    query,
    # no need to specify consistency mode
)

await client.collection("my_collection").query(
  query
  // no need to specify the consistency mode
);

How It Works:

The Router checks both compacted files and a cached view of the WAL (the refresh interval is less than 1s).
This introduces a chance of delay: if a write has just been appended to the WAL but hasn’t been cached yet, it may not show up in a read.
However, this delay is minimal (less than 1s in most cases), making it a practical and efficient default.

Indexed Consistency

Reads in this mode only consider fully compacted files and ignore recent WAL writes to deliver consistently low query latency. This is best suited for workloads with an asynchronous write path that do not need recent writes to become visible in queries quickly.

client.collection("my_collection").query(
    ..., # query
    consistency="indexed",
)

await client
  .collection("my_collection")
  .query(query, { consistency: "indexed" });

How It Works:

The Router forwards queries only to the Executor, which reads from compacted files
WAL is ignored, meaning queries are always served from stable, processed data
This reduces query latency and load, making it the most cost-efficient option for high-throughput reads

Strong Consistency

Reads in this mode always reflect the latest writes. While this ensures that all queries see the most recent updates, it comes with higher latency and cost due to additional WAL reads. This mode is recommended for cases where clients always need to see the most recent writes.

client.collection("my_collection").query(
    ..., # query
    consistency="strong",
)

await client
  .collection("my_collection")
  .query(query, { consistency: "strong" });

How It Works:

Before serving a read, the Router explicitly checks the WAL to ensure the latest writes are reflected
This guarantees that all queries see the most recent updates but adds overhead because it requires an additional lookup
Because of this extra work, strong consistency is more expensive than the other modes

Choosing the Right Mode

Consistency Mode	Freshness	Cost	Query performance
Balanced (Default)	Near real-time (less than 1s)	Low	Good
Indexed	Only compacted data	Low	Fastest
Strong	All writes are visible	Higher	Slower

For most use cases, Balanced Consistency offers the best trade-off between performance and correctness. However, if you prioritize low query latency over recency, Indexed Consistency is the right choice. When no staleness is allowed, Strong Consistency ensures every read reflects the latest write.
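The difference between the three modes can be modeled with a toy in-memory store. The sketch below is plain Python and is not the TopK implementation or SDK: `ToyStore` and its methods are hypothetical names, and the background cache refresh and compaction are triggered manually so that the example stays deterministic.

```python
class ToyStore:
    """Toy model: compacted data, a write-ahead log (WAL), and a cached WAL view."""

    def __init__(self):
        self.compacted = {}   # documents that have already been compacted
        self.wal = []         # recent writes, in arrival order
        self.cached_wal = []  # periodically refreshed snapshot of the WAL

    def write(self, doc):
        self.wal.append(doc)

    def refresh_cache(self):
        # Stands in for the periodic refresh of the cached WAL view.
        self.cached_wal = list(self.wal)

    def compact(self):
        for doc in self.wal:
            self.compacted[doc["_id"]] = doc
        self.wal.clear()
        self.cached_wal.clear()

    def read(self, consistency="balanced"):
        if consistency == "indexed":
            recent = []               # ignore the WAL entirely
        elif consistency == "balanced":
            recent = self.cached_wal  # may lag slightly behind the WAL
        elif consistency == "strong":
            recent = self.wal         # always check the live WAL
        else:
            raise ValueError(f"unknown consistency mode: {consistency!r}")
        view = dict(self.compacted)
        for doc in recent:
            view[doc["_id"]] = doc
        return sorted(view)  # sorted list of visible document IDs


store = ToyStore()
store.write({"_id": "a"})
store.compact()          # "a" is now in compacted data
store.write({"_id": "b"})
store.refresh_cache()    # "b" is now in the cached WAL view
store.write({"_id": "c"})  # "c" is only in the live WAL

indexed_ids = store.read("indexed")
balanced_ids = store.read()  # balanced is the default
strong_ids = store.read("strong")
```

In this scenario the indexed read sees only `a`, the balanced read sees `a` and `b`, and the strong read sees all three documents.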

LSN-based Consistency

For even more precise control over consistency, TopK also supports LSN (Log Sequence Number) based consistency. This approach allows you to ensure read-after-write consistency by specifying the exact sequence number of a write operation in your queries.

For detailed information about using LSNs in queries, see our LSN-based Consistency guide in the Query documentation.
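As a rough mental model of read-after-write with sequence numbers, the sketch below assigns an increasing LSN to each write and lets a query require that everything up to a given LSN is visible. It is a plain-Python toy, not TopK's API: `ToyLog`, `min_lsn`, and the catch-up behavior inside `query` are assumptions made for illustration, and the actual query syntax is described in the guide referenced above.

```python
class ToyLog:
    """Toy model of a log where readers lag behind writers."""

    def __init__(self):
        self.entries = []     # entry at index i has LSN i + 1
        self.applied_lsn = 0  # highest LSN visible to readers
        self.visible = {}     # documents visible to readers

    def write(self, doc):
        self.entries.append(doc)
        return len(self.entries)  # the LSN assigned to this write

    def apply_next(self):
        """Make the next log entry visible to readers."""
        doc = self.entries[self.applied_lsn]
        self.visible[doc["_id"]] = doc
        self.applied_lsn += 1

    def query(self, min_lsn=0):
        if min_lsn > len(self.entries):
            raise ValueError(f"LSN {min_lsn} has not been written")
        # Catch up until the requested LSN is visible, then answer.
        while self.applied_lsn < min_lsn:
            self.apply_next()
        return sorted(self.visible)


log = ToyLog()
lsn_a = log.write({"_id": "a"})
lsn_b = log.write({"_id": "b"})

stale = log.query()                    # nothing applied yet
after_a = log.query(min_lsn=lsn_a)     # guaranteed to include "a"
after_b = log.query(min_lsn=lsn_b)     # guaranteed to include "a" and "b"
```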
