Documents in TopK are JSON-like objects composed of key-value pairs. Each document within a collection:
  • Must include a unique _id field
  • Must conform to the collection schema for indexed fields
Fields defined in the schema can be indexed for vector, keyword, or other retrieval strategies.
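The `_id` requirement can be checked client-side before any request is sent. The following is a minimal sketch, not part of the TopK SDK: `validate_ids` is a hypothetical helper, and treating an empty `_id` or a duplicate `_id` within one batch as an error is an assumption about what you want rather than documented TopK behavior.

```python
def validate_ids(documents):
    """Raise ValueError unless every document has a unique, non-empty string _id."""
    seen = set()
    for position, doc in enumerate(documents):
        doc_id = doc.get("_id")
        # Assumption: an empty string is rejected here even though it is technically a string.
        if not isinstance(doc_id, str) or not doc_id:
            raise ValueError(f"document at position {position} needs a non-empty string _id")
        # Assumption: two documents with the same _id in one batch are treated as a mistake.
        if doc_id in seen:
            raise ValueError(f"duplicate _id in batch: {doc_id!r}")
        seen.add(doc_id)
    return documents


books = [
    {"_id": "book-1", "title": "The Great Gatsby"},
    {"_id": "book-2", "title": "To Kill a Mockingbird"},
]
validate_ids(books)  # passes silently and returns the same list
```

Schema conformance for indexed fields is still enforced by the service; this check only catches missing or malformed identifiers early.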

Upsert documents in a collection

To upsert documents, pass a list of documents to the upsert() function:
client.collection("books").upsert(
    [
        {
            "_id": "book-1",
            "title": "The Great Gatsby",
            "published_year": 1925,
            "title_embedding": [0.12, 0.67, 0.82, 0.53, ...]
        },
        {
            "_id": "book-2",
            "title": "To Kill a Mockingbird",
            "published_year": 1960,
            "title_embedding": [0.42, 0.53, 0.65, 0.33, ...]
        },
        {
            "_id": "book-3",
            "title": "1984",
            "published_year": 1949,
            "title_embedding": [0.59, 0.33, 0.71, 0.61, ...]
        }
    ]
)
  • Every document must have a string _id field.
  • If a document with the specified _id doesn’t exist, a new document will be inserted.
  • If a document with the same _id already exists, the existing document will be replaced with the new one.
The upsert() function does not perform a partial update or merge: the entire existing document is replaced.
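The replace-not-merge behavior is easy to trip over when only one field changes. The sketch below imitates it with a plain dictionary so it runs without a TopK client; `upsert_replace` and `merged_document` are hypothetical helpers written for illustration, and the dictionary is only a stand-in for a collection.

```python
def upsert_replace(store, documents):
    """Imitate the documented behavior: a document fully replaces any existing one with the same _id."""
    for doc in documents:
        store[doc["_id"]] = dict(doc)


def merged_document(existing, changes):
    """Build the full document client-side: existing fields, overridden by the changed ones."""
    return {**existing, **changes}


store = {}  # in-memory stand-in for a collection, not the TopK client
original = {"_id": "book-1", "title": "The Great Gatsby", "published_year": 1925}
upsert_replace(store, [original])

# Sending only the changed field drops every other field of the stored document.
upsert_replace(store, [{"_id": "book-1", "published_year": 1926}])
lost_title = "title" not in store["book-1"]

# To change a single field, send the whole document again.
upsert_replace(store, [merged_document(original, {"published_year": 1926})])
```

In practice this means keeping, or first fetching, the full document whenever a single field needs to change.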
Each document you send is serialized as a Protocol Buffers (protobuf) message. The encoded size of that message must be 128KB or smaller.
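A rough client-side guard can flag documents that are likely to exceed the limit. This is only a sketch under stated assumptions: it measures the compact JSON encoding, which is not the protobuf encoding and can differ from it in either direction, and it assumes "128KB" means 128 * 1024 bytes. `approximate_size` and `oversized_ids` are hypothetical helpers, not SDK functions.

```python
import json

MAX_DOCUMENT_BYTES = 128 * 1024  # assumption: "128KB" is read as 128 * 1024 bytes


def approximate_size(doc):
    """Return the byte length of the compact JSON encoding, as a rough proxy only."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def oversized_ids(documents, limit=MAX_DOCUMENT_BYTES):
    """List the _id of every document whose approximate size exceeds the limit."""
    return [doc["_id"] for doc in documents if approximate_size(doc) > limit]


small = {"_id": "book-1", "title": "The Great Gatsby"}
large = {"_id": "book-2", "notes": "x" * (200 * 1024)}  # "notes" is a made-up field
flagged = oversized_ids([small, large])
```

Because the proxy is inexact, it is best used with a margin below the limit; the service remains the authority on whether a document is accepted.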

Additional (non-schema) fields

You may include fields that are not defined in the collection schema. These fields:
  • Are stored with the document
  • Can be returned to the client in query results
  • Can be used for filtering in queries
Fields that are not defined in the schema are not indexed. If you want to use a field for vector search, semantic search, keyword search or multi-vector search, it must be declared in the schema and have a corresponding index defined.
from topk_sdk.schema import text, int, f32_vector, vector_index, keyword_index
from topk_sdk.query import select, field, fn

client.collections().create(
    "books",
    schema={
        "title": text().index(keyword_index()).required(),
        "published_year": int().required(),
        "title_embedding": f32_vector(dimension=1024).index(vector_index(metric="cosine")).required(),
    },
)

client.collection("books").upsert([
    {
        "_id": "book-1",
        "title": "The Great Gatsby",
        "published_year": 1925,
        "title_embedding": [0.12, 0.67, 0.82, 0.53, ...],
        "tags": ["fiction", "classic"], # non-schema field
        "source_url": "https://example.com/gatsby", # non-schema field
    }
])

client.collection("books").query(
    select(
        "title",
        "source_url",
        title_similarity=fn.semantic_similarity("title", "classic American novel"),
    )
    .filter(field("tags").contains("fiction")) # non-schema fields can still be used for filtering
)
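To see at a glance which fields of a document will be stored but not indexed, the document's keys can be compared with the schema's field names. The sketch below assumes the three field names from the "books" schema in the example; `split_fields` is a hypothetical helper and does not query TopK for the actual schema.

```python
# Field names copied from the example "books" schema.
SCHEMA_FIELDS = {"title", "published_year", "title_embedding"}


def split_fields(doc, schema_fields=SCHEMA_FIELDS):
    """Return (schema_part, additional_part); _id is left out of both parts."""
    schema_part = {k: v for k, v in doc.items() if k in schema_fields}
    additional_part = {
        k: v for k, v in doc.items() if k not in schema_fields and k != "_id"
    }
    return schema_part, additional_part


doc = {
    "_id": "book-1",
    "title": "The Great Gatsby",
    "published_year": 1925,
    "tags": ["fiction", "classic"],
    "source_url": "https://example.com/gatsby",
}
schema_part, additional_part = split_fields(doc)
```

Here `additional_part` holds `tags` and `source_url`: fields that are stored, returned, and filterable, but not available to index-backed search.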

Supported types

TopK documents are a flat structure of key-value pairs. The following value types are supported:
| Type | Python Type | JavaScript Type | Helper Function |
|---|---|---|---|
| String | str | string | - |
| Integer | int | number | - |
| Float | float | number | - |
| Boolean | bool | boolean | - |
| String list | list[str] | string[] | string_list() |
| F32 list | list[float] | number[] | f32_list() |
| F64 list | use helper | use helper | f64_list() |
| I32 list | use helper | use helper | i32_list() |
| I64 list | use helper | use helper | i64_list() |
| U32 list | use helper | use helper | u32_list() |
| F8 vector | use helper | use helper | f8_vector() |
| F16 vector | use helper | use helper | f16_vector() |
| F32 vector | list[float] | number[] | f32_vector() |
| U8 vector | use helper | use helper | u8_vector() |
| I8 vector | use helper | use helper | i8_vector() |
| Binary vector | use helper | use helper | binary_vector() |
| F32 sparse vector | use helper | use helper | f32_sparse_vector() |
| U8 sparse vector | use helper | use helper | u8_sparse_vector() |
| Matrix | use helper | use helper | matrix() |
| Bytes | use helper | use helper | bytes() |
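The Python column of the table can be turned into a small check that reports which values in a document are plain Python objects with a listed native type and which would need a helper function. This is a sketch that follows the table only; how the SDK itself infers types is not described here. `native_type` is a hypothetical helper, and `in_print` and `page_counts` are made-up fields.

```python
def native_type(value):
    """Return the table's type name for plain Python values, or None if a helper is needed."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "Boolean"
    if isinstance(value, str):
        return "String"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    # Assumption: an empty list is treated as ambiguous and reported as None.
    if isinstance(value, list) and value:
        if all(isinstance(item, str) for item in value):
            return "String list"
        if all(isinstance(item, float) for item in value):
            return "F32 list or F32 vector"
    return None  # per the table, every other type goes through a helper function


doc = {
    "_id": "book-3",
    "title": "1984",
    "published_year": 1949,
    "in_print": True,
    "tags": ["fiction"],
    "title_embedding": [0.59, 0.33],
    "page_counts": [328, 336],
}
needs_helper = [name for name, value in doc.items() if native_type(value) is None]
```

In this example only `page_counts` is flagged, because a list of integers corresponds to the I32, I64, or U32 list rows, which the table marks as "use helper".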