Collections organize your documents, define their schema, and enable fast vector search, filtering, keyword search, semantic search, and multi-vector search.

Creating a collection

To create a collection, call the create() method on client.collections():
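from topk_sdk.schema import int, text, semantic_index

# Assumes `client` is an already-initialized TopK client.
# Note: importing `int` shadows Python's built-in int() in this module.
client.collections().create(
    "books",
    schema={
        # Required text field with automatic embeddings for semantic search
        "title": text().required().index(semantic_index()),
        # Required integer field (not indexed)
        "published_year": int().required(),
    },
)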
Field names starting with an underscore (_) are reserved for internal use.

Schema

Opt-in schema

TopK is schemaless by default: fields without a declared type can store any value. When a field has a type, its values are validated during upsert.
Fields you want to index must have an explicit type.
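Conceptually, opt-in validation works like the sketch below: typed fields are checked against their declared type, and untyped fields pass through unchanged. This is a hypothetical plain-Python illustration, not TopK's implementation. The `validate_types` helper, the use of Python types in place of text()/int(), and treating None as an allowed null value are all assumptions.

```python
from typing import Any

# Hypothetical stand-in for a schema: field name -> expected Python type.
# Real TopK schemas use text(), int(), etc.; this is not the TopK API.
Schema = dict[str, type]


def validate_types(doc: dict[str, Any], schema: Schema) -> list[str]:
    """Return type errors for `doc`; an empty list means it would be accepted."""
    errors = []
    for field, value in doc.items():
        expected = schema.get(field)
        if expected is None or value is None:
            # Untyped field (any value allowed) or null value (assumed allowed
            # for optional fields): nothing to check.
            continue
        if expected is int and isinstance(value, bool):
            # bool subclasses int in Python, so reject it explicitly.
            errors.append(f"{field}: expected int, got bool")
        elif not isinstance(value, expected):
            errors.append(f"{field}: expected {expected.__name__}, got {type(value).__name__}")
    return errors


schema = {"title": str, "published_year": int}
print(validate_types({"title": "Dune", "published_year": 1965, "tags": ["sf"]}, schema))  # → []
print(validate_types({"title": "Dune", "published_year": "1965"}, schema))
# → ['published_year: expected int, got str']
```

The untyped "tags" field is accepted with any value, mirroring the schemaless-by-default behavior described above.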

Field types

| Type | Use case |
| --- | --- |
| text() | Strings, descriptions, content, IDs |
| bytes() | Binary data, images, files |
| int() | Integers, counts, IDs |
| float() | Decimal numbers, prices |
| bool() | true/false values |
| list(value_type) | Arrays of text, integer, or float elements |
| f8_vector(dim) | 8-bit float embeddings |
| f16_vector(dim) | 16-bit float embeddings |
| f32_vector(dim) | Dense embeddings (most common) |
| u8_vector(dim) | Quantized embeddings |
| i8_vector(dim) | Signed quantized embeddings |
| binary_vector(dim) | Binary embeddings |
| f32_sparse_vector() | Sparse embeddings |
| u8_sparse_vector() | Quantized sparse embeddings |
| matrix(dim, value_type) | Multi-vector embeddings |
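The element width of each dense vector type largely determines its storage cost. The sketch below computes the raw payload size of a single vector. It assumes dense packing with no index or metadata overhead, and it assumes that `dim` counts bits for binary_vector. TopK's actual storage layout is not described here.

```python
import math

# Bits per element for each dense vector type (raw payload only;
# index structures and metadata overhead are ignored).
# Assumption: for binary_vector, `dim` is the number of bits.
BITS_PER_ELEMENT = {
    "f32_vector": 32,
    "f16_vector": 16,
    "f8_vector": 8,
    "u8_vector": 8,
    "i8_vector": 8,
    "binary_vector": 1,
}


def raw_vector_bytes(vector_type: str, dim: int) -> int:
    """Bytes needed to store one vector of `dim` elements, rounded up to whole bytes."""
    return math.ceil(dim * BITS_PER_ELEMENT[vector_type] / 8)


for vt in BITS_PER_ELEMENT:
    print(f"{vt:14} {raw_vector_bytes(vt, 1536):>5} bytes")
```

For a 1536-dimensional embedding, this gives 6144 bytes as f32, 3072 as f16, 1536 for any 8-bit type, and 192 as a binary vector. That 32x range is the main reason to consider quantized types.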

Required fields

Fields are optional by default. Add required() to make a field mandatory: a required field must be present in every document during upsert, and documents missing it are rejected with a validation error.
from topk_sdk.schema import int, text

schema = {
    "name": text().required(),   # Must be present in all documents
    "price": int(),              # Can be omitted (null)
}
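The required-field rule amounts to a presence check on each document. The following is a hypothetical plain-Python sketch, not TopK's implementation. It makes two assumptions: an explicit None counts as missing, and documents are checked one at a time. The source does not say whether TopK rejects an entire upsert batch or only the offending documents.

```python
from typing import Any


def missing_required(doc: dict[str, Any], required: set[str]) -> set[str]:
    """Return the required fields that are absent (or None) in `doc`."""
    return {f for f in required if doc.get(f) is None}


def check_batch(docs: list[dict[str, Any]], required: set[str]) -> list[tuple[int, set[str]]]:
    """Return (index, missing fields) for each document that would be rejected."""
    rejected = []
    for i, doc in enumerate(docs):
        missing = missing_required(doc, required)
        if missing:
            rejected.append((i, missing))
    return rejected


required = {"name"}  # mirrors "name": text().required() above
docs = [
    {"name": "Lamp", "price": 25},
    {"name": "Chair"},  # price omitted: allowed because it is optional
    {"price": 10},      # name missing: would be rejected
]
print(check_batch(docs, required))  # → [(2, {'name'})]
```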

Indexes

Only indexed fields can be searched. Non-indexed fields support exact-match filters only.

Vector Index

Used for vector search. Supports dimensions up to 2^14 (16,384). Enabled by vector_index().
from topk_sdk.schema import f32_vector, vector_index

schema = {
    "embedding": f32_vector(dimension=1536).index(vector_index(metric="cosine")),
}
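Similarity metric compatibility (✓ = supported):

| Vector type | cosine | euclidean | dot_product | hamming |
| --- | --- | --- | --- | --- |
| f8_vector | ✓ | ✓ | ✓ | |
| f16_vector | ✓ | ✓ | ✓ | |
| f32_vector | ✓ | ✓ | ✓ | |
| u8_vector | ✓ | ✓ | ✓ | |
| i8_vector | ✓ | ✓ | ✓ | |
| binary_vector | | | | ✓ |
| f32_sparse_vector | | | ✓ | |
| u8_sparse_vector | | | ✓ | |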
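The metric names above refer to standard formulas, sketched here in plain Python for reference. These are textbook definitions, not TopK internals. How TopK turns them into ranking scores (for example, whether euclidean is ranked by ascending distance) is not specified here. Representing sparse vectors as an {index: value} dict and binary vectors as packed bytes are assumptions made for illustration.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vectors' lengths."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two bit-packed binary vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of sparse vectors stored as {index: value}; only shared indices count."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b[i] for i, v in a.items() if i in b)


q, d = [1.0, 2.0, 2.0], [2.0, 0.0, 1.0]
print(dot_product(q, d), round(cosine(q, d), 3), round(euclidean(q, d), 3))  # → 4.0 0.596 2.449
print(hamming(bytes([0b10100000]), bytes([0b01100000])))  # → 2
print(sparse_dot({0: 1.0, 5: 2.0}, {5: 3.0, 9: 1.0}))  # → 6.0
```

In a sparse dot product, only dimensions that are non-zero in both vectors contribute. This is why iterating over the smaller map is enough.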

Multi Vector Index

Enables multi-vector search on matrix() fields using the maxsim metric for late-interaction scoring. Enabled by multi_vector_index(). See multi-vector search for more information.
from topk_sdk.schema import matrix, multi_vector_index

schema = {
    "token_embeddings": matrix(
        dimension=1536,
        value_type="f32"
    ).index(
        multi_vector_index(metric="maxsim")
    ),
}
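Late-interaction (maxsim) scoring compares each query vector with every row of a document's matrix, keeps the best match per query vector, and sums those maxima. The sketch below follows the common ColBERT-style definition and uses the inner product as the per-pair similarity. That choice, and treating each matrix row as one token embedding, are assumptions rather than documented TopK behavior.

```python
def dot(a: list[float], b: list[float]) -> float:
    """Inner product of two token embeddings."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: list[list[float]], doc: list[list[float]]) -> float:
    """Late-interaction score: match each query vector to its most similar
    document vector, then sum those best-match scores."""
    return sum(max(dot(q, d) for d in doc) for q in query)


# Each inner list is one row of a matrix() field (e.g. one token embedding).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "a": [[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]],
    "b": [[0.5, 0.25]],
}
scores = {name: maxsim(query, rows) for name, rows in docs.items()}
print(scores)  # → {'a': 1.5, 'b': 0.75}
```

Document "a" scores higher because each query vector finds a strong, separate match among its rows. A single pooled vector could not capture this.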

Keyword Index

Traditional text search with BM25 relevance scoring. Fast keyword matching with no embedding overhead. Enabled by keyword_index().
from topk_sdk.schema import keyword_index, text

schema = {
    "title": text().index(keyword_index()),
}
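BM25 ranks documents by combining term frequency, inverse document frequency (rarer terms weigh more), and document-length normalization. The following is a minimal Okapi BM25 sketch with the common defaults k1=1.2 and b=0.75, a Lucene-style IDF, and naive lowercase whitespace tokenization. TopK's tokenizer and parameters are not documented here, so its actual scores may differ.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against `query` with Okapi BM25 (Lucene-style IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Saturating term frequency, normalized by document length
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


titles = ["The Rust Programming Language", "Programming Pearls", "Dune"]
scores = bm25_scores("rust programming", titles)
ranked = [t for _, t in sorted(zip(scores, titles), reverse=True)]
print(ranked)  # → ['The Rust Programming Language', 'Programming Pearls', 'Dune']
```

The first title ranks highest because it matches both terms, including "rust", which appears in only one document and therefore gets a higher IDF.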

Semantic Index

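Convenience index that generates embeddings for a text field automatically, so you can run semantic search without computing embeddings yourself. Enabled by semantic_index().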
from topk_sdk.schema import semantic_index, text

schema = {
    "title": text().index(semantic_index()),
}
See semantic_index() for model details.