Create a collection
Collections are the core data structure in TopK. They are used to store and query documents.
Creating a collection
The topk_sdk.schema module contains the schema definition for the fields in a collection.
To create a collection, pass the collection name and the schema to the create method.
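A minimal sketch of what such a call might look like, assuming an already-configured client object is passed in. The collection name books, the published_year field, the collections() accessor, and the schema=, dimension= and metric= keyword names are assumptions made for illustration and are not confirmed by this page; check the SDK reference before relying on them.

```python
def create_books_collection(client):
    """Create a hypothetical "books" collection on an already-configured client."""
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from topk_sdk import schema

    return client.collections().create(  # collections() accessor is an assumption
        "books",
        schema={  # keyword name is an assumption
            # Required text field with a keyword index.
            "title": schema.text().required().index(schema.keyword_index()),
            # Required f32 vector field with a cosine vector index.
            # The dimension= and metric= keyword names are assumptions.
            "title_embedding": schema.f32_vector(dimension=1536)
            .required()
            .index(schema.vector_index(metric="cosine")),
            # Optional integer field (fields are optional by default).
            "published_year": schema.int(),
        },
    )
```

Accessing the field helpers through the schema module, rather than importing int, float, bool and bytes directly, avoids shadowing the Python built-ins of the same name.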
Schema
Data types
TopK supports the following data types. More are on our roadmap.
int()
int() is used to define an integer field in the schema.
float()
float() is used to define a float field in the schema.
bool()
bool() is used to define a boolean field in the schema.
text()
text() is used to define a text field in the schema.
f32_vector()
f32_vector() is used to define a vector field with f32 values. Pass the vector dimension as a parameter (required, greater than 0); it is validated when documents are upserted.
u8_vector()
u8_vector() is used to define a vector field with u8 values. Pass the vector dimension as a parameter (required, greater than 0); it is validated when documents are upserted.
binary_vector()
binary_vector() is used to define a binary vector packed into u8 values. Pass the vector dimension as a parameter (required, greater than 0); it is validated when documents are upserted.
Binary vector dimension is defined in terms of the number of bytes, not bits. For a 1024-bit binary vector, the dimension TopK expects is 128 (1024 / 8).
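The bits-to-bytes arithmetic can be illustrated with a small standard-library sketch. The pack_bits helper is hypothetical and not part of the SDK, and the bit order within each byte (most significant bit first) is an assumption, since this page does not specify one.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into bytes, 8 bits per byte, MSB first."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            # Shift the accumulated bits left and append the next one.
            byte = (byte << 1) | (1 if bit else 0)
        packed.append(byte)
    return bytes(packed)


bits = [1, 0] * 512        # a 1024-bit binary vector
packed = pack_bits(bits)   # 1024 bits packed into u8 values
dimension = len(packed)    # number of bytes: the dimension to declare
```

Here dimension is 128, matching the 1024 / 8 calculation above.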
bytes()
bytes() is used to define a bytes field in the schema.
Properties
required()
required() is used to mark a field as required. All fields are optional by default.
_id
For example, a field named title can be marked as required by calling required() on its definition.
Methods
index()
index() is used to create an index on a field.
Semantic index
The semantic_index() method is used to create both a vector and a keyword index on a given field. This allows you to do both semantic search and keyword search over the same field. Note that semantic_index() can only be used on fields of the text() data type.
Optionally, you can pass a model parameter to the semantic_index() method. Supported models are:
cohere/embed-english-v3
cohere/embed-multilingual-v3 (default)
Vector index
The vector_index() method is used to create a vector index. Only fields with f32_vector(), u8_vector(), or binary_vector() data types can be indexed with a vector index.
For example, a vector index can be created on a title_embedding field with the cosine similarity metric.
The available metrics are:
euclidean
cosine
dot_product
hamming (only supported for the binary_vector() type)
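For readers unfamiliar with these metric names, the sketch below gives their textbook definitions in plain Python. It assumes equal-length inputs. It is illustrative only: this page does not state TopK's exact conventions (for example squared versus unsquared euclidean distance, or whether scores are reported as similarities or distances), and the function names are chosen here to mirror the metric names.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine of the angle between two non-zero vectors."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def hamming(a, b):
    """Number of differing bits between two byte-packed binary vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

Hamming distance counts bit differences, which is why it only makes sense for byte-packed binary vectors.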
Need support or want to give some feedback? You can join our community or drop us an email at support@topk.io.
Keyword index
The keyword_index() method is used to create a keyword index on a text() field.
index(keyword_index()) can only be called on fields of the text() data type.