Collections store documents and provide the interface for querying them efficiently.

Creating a collection

Call create() on client.collections() with these parameters:

name (string, required): Collection name, unique within the account.
schema (Map<String, FieldSpec>, required): Document structure and search capabilities.
from topk_sdk.schema import int, text, semantic_index

# `client` is assumed to be an already-configured TopK client
client.collections().create(
    "books",
    schema={
        "title": text().required().index(semantic_index()),
        "published_year": int().required(),
    },
)
Field names starting with _ are reserved for internal use.

Schema

Opt-in schema

TopK is schemaless by default: fields without a declared type can store any value. When a type is specified, data is validated during upsert. Indexed fields require explicit types.
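To make the opt-in behavior concrete, here is a minimal pure-Python model of upsert-time validation. It is a sketch under stated assumptions, not TopK's implementation: the `SCHEMA` dict and `validate()` helper are hypothetical, Python types stand in for TopK field types, and untyped fields (here `tags`) are accepted as-is.

```python
from typing import Any

# Hypothetical schema: field name -> expected Python type.
# Fields that do not appear here are untyped and accept any value.
SCHEMA: dict[str, type] = {"title": str, "published_year": int}


def validate(doc: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return type errors for a document; an empty list means it would be accepted."""
    errors = []
    for field, value in doc.items():
        expected = schema.get(field)
        if expected is None or value is None:
            # Untyped field, or a null value (nullability is a separate check).
            continue
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            errors.append(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    return errors


print(validate({"title": "Dune", "published_year": 1965, "tags": ["scifi", 1]}, SCHEMA))  # []
print(validate({"title": "Dune", "published_year": "1965"}, SCHEMA))
# ['published_year: expected int, got str']
```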

Field types

Type                 Use case
text()               Strings, descriptions, content
bytes()              Binary data, images, files
int()                Integers, counts, IDs
float()              Decimal numbers, prices
bool()               True/false values
list(value_type)     Arrays (text, integer, float)
f32_vector(dim)      Dense embeddings (most common)
u8_vector(dim)       Quantized embeddings
i8_vector(dim)       Signed quantized embeddings
binary_vector(dim)   Binary embeddings
f32_sparse_vector()  Sparse embeddings
u8_sparse_vector()   Quantized sparse embeddings
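The quantized types trade precision for memory. As a rough illustration, the sketch below applies two common techniques: min-max scalar quantization onto 0..255, and sign-based binarization packed eight bits per byte. Both helpers are hypothetical, the clipping range of [-1, 1] is an assumption suited to normalized embeddings, and this is not necessarily how TopK or your embedding provider quantizes vectors, nor the exact encoding that `u8_vector` or `binary_vector` expect.

```python
def quantize_u8(vec: list[float], lo: float = -1.0, hi: float = 1.0) -> list[int]:
    """Min-max scalar quantization: map [lo, hi] linearly onto 0..255, clipping outliers."""
    scale = 255 / (hi - lo)
    return [round((min(max(x, lo), hi) - lo) * scale) for x in vec]


def quantize_binary(vec: list[float]) -> list[int]:
    """Sign-based binarization: 1 bit per dimension, packed into bytes (first bit = MSB)."""
    bits = [1 if x > 0 else 0 for x in vec]
    packed = []
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)  # pad a final partial byte with zero bits
        packed.append(byte)
    return packed


print(quantize_u8([-1.0, 0.0, 1.0]))  # [0, 128, 255]
print(quantize_binary([0.3, -0.2, 0.9, -0.5, 0.1, 0.0, -0.7, 0.4, 0.8]))  # [169, 128]
```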

Required fields

Fields are optional by default. Add .required() to make them mandatory:
"name": text().required()  # Must be present
"price": int()            # Can be null
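A minimal sketch of the required-field rule, using a hypothetical `missing_required()` check. It assumes that both an absent field and an explicit null count as missing for a required field; TopK's exact handling of null in required fields is not described here.

```python
# Hypothetical set of required fields, mirroring "name": text().required() above.
REQUIRED_FIELDS = {"name"}


def missing_required(doc: dict, required: set[str]) -> list[str]:
    """Return required fields that are absent or null, sorted for stable output.

    Assumption: an explicit None counts as missing for a required field.
    Optional fields such as "price" are never reported, even when null.
    """
    return sorted(field for field in required if doc.get(field) is None)


print(missing_required({"name": "Lamp", "price": None}, REQUIRED_FIELDS))  # []
print(missing_required({"price": 25}, REQUIRED_FIELDS))  # ['name']
```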

Indexes

Indexes enable search capabilities. Without indexes, only exact filtering is possible.

Index types

Vector Index

Used for vector search. Supports up to 2^14 (16,384) dimensions.
"embedding": f32_vector(dimension=1536).index(vector_index(metric="cosine"))
Similarity metrics compatibility:
Vector type          cosine   euclidean   dot_product   hamming
f32_vector
u8_vector
i8_vector
f32_sparse_vector
u8_sparse_vector
binary_vector
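For intuition, the four metrics can be written in a few lines of plain Python. This is a sketch, not TopK's scoring code: the helper names are hypothetical, sparse vectors are modeled as `{index: value}` dicts, and binary vectors as packed `bytes`. Higher cosine and dot product values mean more similar; lower euclidean and hamming distances mean more similar.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector lengths."""
    norm = math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot_product(a, b) / norm


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sparse_dot_product(a: dict[int, float], b: dict[int, float]) -> float:
    """Only indices present in both vectors contribute; iterate over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length packed bit vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0
print(sparse_dot_product({0: 1.0, 5: 2.0}, {5: 3.0, 9: 1.0}))  # 6.0
print(hamming(bytes([0b1010]), bytes([0b0110])))  # 2
```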

Keyword Index

Traditional text search ranked by BM25 relevance scoring. Keyword matching is fast and requires no embeddings.
"title": text().index(keyword_index())
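BM25 ranks a document higher when it contains the query terms often, when those terms are rare across the collection, and when the document is not unusually long. The sketch below is textbook Okapi BM25 with common defaults (`k1=1.2`, `b=0.75`) and a Lucene-style IDF. The lowercase whitespace tokenizer, the parameter values, and the `bm25_scores()` helper are assumptions, not TopK's tokenizer or tuning. It expects at least one non-empty document.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score = 0.0
        for term in query.lower().split():
            freq = tf[term]
            if freq == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * freq * (k1 + 1) / (freq + norm)
        scores.append(score)
    return scores


docs = ["the quick brown fox", "the lazy dog", "quick quick fox jumps"]
print(bm25_scores("quick fox", docs))  # third doc scores highest, second scores 0.0
```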

Semantic Index

A convenience option that embeds the field's text automatically, so you do not have to compute embeddings yourself.
"title": text().index(semantic_index())
See semantic_index() for model details.
I