# Create a collection

Collections organize your documents, define their schema, and enable fast vector search, filtering, keyword search, semantic search, and multi-vector search.

## Creating a collection

To create a collection, call the [`create()`](/sdk/topk-py#create) method on the [`client.collections()`](/sdk/topk-py#collectionsclient) object:

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.schema import int, text, semantic_index

  # Assumes `client` is an already-initialized TopK client.
  client.collections().create(
      "books",
      schema={
          "title": text().required().index(semantic_index()),
          "published_year": int().required(),
      },
  )
  ```

  ```typescript Javascript theme={null}
  import { int, text, semanticIndex } from "topk-js/schema";

  // Assumes `client` is an already-initialized TopK client.
  await client.collections().create("books", {
    title: text().required().index(semanticIndex()),
    published_year: int().required(),
  });
  ```
</CodeGroup>

<Warning>
  Field names starting with `_` are reserved for internal use.
</Warning>

## Schema

### Opt-in schema

TopK is schemaless by default: a field without a declared type can store any value. When a field has a declared type, its values are validated during upsert.

<Info>Indexed fields require **explicit types**.</Info>
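
To make the opt-in behavior concrete, here is a minimal plain-Python sketch of the rule described above. It does not use the TopK SDK: `validate_types` and the mapping of field names to Python types are hypothetical stand-ins, and the real validation is performed by TopK during upsert.

```python
# Illustrative model only: TopK performs this check itself during upsert.
# `validate_types` and this dict-based schema are hypothetical, not SDK APIs.

def validate_types(doc: dict, schema: dict) -> list[str]:
    """Return type errors for typed fields; fields absent from `schema` are not checked."""
    errors = []
    for name, value in doc.items():
        expected = schema.get(name)
        if expected is None:
            continue  # untyped field: any value is accepted
        # bool is a subclass of int in Python, so exclude it unless bool is expected
        is_stray_bool = isinstance(value, bool) and expected is not bool
        if is_stray_bool or not isinstance(value, expected):
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


schema = {"title": str, "published_year": int}

ok = {"title": "Dune", "published_year": 1965, "tags": ["sci-fi", 42]}
bad = {"title": "Dune", "published_year": "1965"}

print(validate_types(ok, schema))   # [] because the untyped "tags" field is accepted as-is
print(validate_types(bad, schema))  # one error, for published_year
```

Only `title` and `published_year` are checked here; `tags` has no declared type, so it passes regardless of what it holds.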

### Field types

| Type                                                           | Use case                                   |
| -------------------------------------------------------------- | ------------------------------------------ |
| [`text()`](/sdk/topk-py/schema#text)                           | Strings, descriptions, content, IDs        |
| [`bytes()`](/sdk/topk-py/schema#bytes)                         | Binary data, images, files                 |
| [`int()`](/sdk/topk-py/schema#int)                             | Integers, counts, IDs                      |
| [`float()`](/sdk/topk-py/schema#float)                         | Decimal numbers, prices                    |
| [`bool()`](/sdk/topk-py/schema#bool)                           | true/false values                          |
| [`list(value_type)`](/sdk/topk-py/schema#list)                 | Arrays of text, integer, or float elements |
| [`struct(fields)`](/sdk/topk-py/schema#struct)                 | Nested objects with named fields           |
| [`f8_vector(dim)`](/sdk/topk-py/schema#f8_vector)              | 8-bit float embeddings                     |
| [`f16_vector(dim)`](/sdk/topk-py/schema#f16_vector)            | 16-bit float embeddings                    |
| [`f32_vector(dim)`](/sdk/topk-py/schema#f32_vector)            | Dense embeddings (most common)             |
| [`u8_vector(dim)`](/sdk/topk-py/schema#u8_vector)              | Quantized embeddings                       |
| [`i8_vector(dim)`](/sdk/topk-py/schema#i8_vector)              | Signed quantized embeddings                |
| [`binary_vector(dim)`](/sdk/topk-py/schema#binary_vector)      | Binary embeddings                          |
| [`f32_sparse_vector()`](/sdk/topk-py/schema#f32_sparse_vector) | Sparse embeddings                          |
| [`u8_sparse_vector()`](/sdk/topk-py/schema#u8_sparse_vector)   | Quantized sparse embeddings                |
| [`matrix(dim, value_type)`](/sdk/topk-py/schema#matrix)        | Multi-vector embeddings                    |

### Required fields

Fields are optional by default. Add [`required()`](/sdk/topk-py/schema#required) to make a field mandatory: it must then be present in every document at upsert, and documents missing it are rejected with a validation error.

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.schema import int, text

  schema = {
      "name": text().required(),   # Must be present in all documents
      "price": int(),              # Can be omitted (null)
  }
  ```

  ```typescript Javascript theme={null}
  import { int, text } from "topk-js/schema";

  const schema = {
    name: text().required(),   // Must be present in all documents
    price: int(),              // Can be omitted (null)
  };
  ```
</CodeGroup>
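
The required-field rule can be modeled in a few lines of plain Python. This is a sketch under stated assumptions, not the SDK's behavior: `missing_required` is a hypothetical helper, and it only checks whether a key is present, since this page does not say how an explicit null in a required field is treated.

```python
# Illustrative model only; `missing_required` is a hypothetical helper, not an SDK call.

def missing_required(doc: dict, required: set[str]) -> list[str]:
    """Return the required field names that are absent from `doc`, sorted."""
    return sorted(name for name in required if name not in doc)


required = {"name"}  # mirrors the schema above: name is required, price is optional

print(missing_required({"name": "Widget"}, required))  # [] -> document would be accepted
print(missing_required({"price": 10}, required))       # ['name'] -> document would be rejected
```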

## Indexes

Only indexed fields can be searched. Non-indexed fields support exact-match filters only.

### Vector index

Enables vector search on vector fields. Supports up to 2^14 (16,384) dimensions. Enabled by [`vector_index()`](/sdk/topk-py/schema#vector_index).

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.schema import f32_vector, vector_index

  schema = {
      "embedding": f32_vector(dimension=1536).index(vector_index(metric="cosine")),
  }
  ```

  ```typescript Javascript theme={null}
  import { f32Vector, vectorIndex } from "topk-js/schema";

  const schema = {
    embedding: f32Vector({ dimension: 1536 }).index(vectorIndex({ metric: "cosine" })),
  };
  ```
</CodeGroup>

**Similarity metric compatibility:**

| Vector Type         | `cosine` | `euclidean` | `dot_product` | `hamming` |
| ------------------- | :------: | :---------: | :-----------: | :-------: |
| `f8_vector`         |     ✅    |      ✅      |       ✅       |     —     |
| `f16_vector`        |     ✅    |      ✅      |       ✅       |     —     |
| `f32_vector`        |     ✅    |      ✅      |       ✅       |     —     |
| `u8_vector`         |     ✅    |      ✅      |       ✅       |     —     |
| `i8_vector`         |     ✅    |      ✅      |       ✅       |     —     |
| `binary_vector`     |     —    |      —      |       —       |     ✅     |
| `f32_sparse_vector` |     —    |      —      |       ✅       |     —     |
| `u8_sparse_vector`  |     —    |      —      |       ✅       |     —     |
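
If you build schemas programmatically, it can help to check a metric against this table before calling the API. The sketch below simply encodes the table as a lookup; `SUPPORTED_METRICS` and `check_metric` are hypothetical helpers that are not part of the TopK SDK.

```python
# Compatibility table from this page, encoded as a lookup.
# `SUPPORTED_METRICS` and `check_metric` are hypothetical helpers, not SDK APIs.

DENSE_METRICS = {"cosine", "euclidean", "dot_product"}

SUPPORTED_METRICS = {
    "f8_vector": DENSE_METRICS,
    "f16_vector": DENSE_METRICS,
    "f32_vector": DENSE_METRICS,
    "u8_vector": DENSE_METRICS,
    "i8_vector": DENSE_METRICS,
    "binary_vector": {"hamming"},
    "f32_sparse_vector": {"dot_product"},
    "u8_sparse_vector": {"dot_product"},
}


def check_metric(vector_type: str, metric: str) -> None:
    """Raise ValueError if the table does not list `metric` for `vector_type`."""
    allowed = SUPPORTED_METRICS.get(vector_type)
    if allowed is None:
        raise ValueError(f"unknown vector type: {vector_type}")
    if metric not in allowed:
        raise ValueError(
            f"{metric!r} is not supported for {vector_type}; use one of {sorted(allowed)}"
        )


check_metric("f32_vector", "cosine")  # listed in the table, so this passes silently

try:
    check_metric("binary_vector", "cosine")
except ValueError as err:
    print(err)  # binary vectors only list hamming
```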

### Multi-vector index

Enables multi-vector search on [`matrix()`](/sdk/topk-py/schema#matrix) fields using the maxsim metric for late-interaction scoring. Enabled by [`multi_vector_index()`](/sdk/topk-py/schema#multi_vector_index). See [multi-vector search](/concepts/multi-vector-search) for more information.

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.schema import matrix, multi_vector_index

  schema = {
      "token_embeddings": matrix(
          dimension=1536,
          value_type="f32"
      ).index(
          multi_vector_index(metric="maxsim")
      ),
  }
  ```

  ```typescript Javascript theme={null}
  import { matrix, multiVectorIndex } from "topk-js/schema";

  const schema = {
    token_embeddings: matrix({
      dimension: 1536,
      valueType: "f32",
    }).index(
      multiVectorIndex({ metric: "maxsim" })
    ),
  };
  ```
</CodeGroup>
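
For intuition about late-interaction scoring, here is a plain-Python sketch of the standard MaxSim formula: for each query token vector, take its best match among the document's token vectors, then sum those maxima. It assumes dot product as the token-level similarity. How TopK computes or normalizes the score internally is not specified on this page, so treat `maxsim` as a hypothetical reference function rather than the service's implementation.

```python
# Plain-Python sketch of the standard MaxSim (late-interaction) score.
# Assumption: dot product is the token-level similarity; TopK's exact
# scoring and any normalization are not specified on this page.

def maxsim(query: list[list[float]], doc: list[list[float]]) -> float:
    """Sum, over query token vectors, of the best dot product against any doc token vector."""
    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, d) for d in doc) for q in query)


query = [[1.0, 0.0], [0.0, 1.0]]              # two query token embeddings
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # has a strong match for both query tokens
doc_b = [[1.0, 0.0], [0.5, 0.0]]              # matches only the first query token

print(maxsim(query, doc_a))  # 2.0
print(maxsim(query, doc_b))  # 1.0
```

Each query token contributes independently, which is why `doc_a` outscores `doc_b` even though both contain an exact match for the first token.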

### Keyword index

Provides traditional keyword search ranked by BM25 relevance, with no embedding overhead. Enabled by [`keyword_index()`](/sdk/topk-py/schema#keyword_index).

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.schema import keyword_index, text

  schema = {
      "title": text().index(keyword_index()),
  }
  ```

  ```typescript Javascript theme={null}
  import { keywordIndex, text } from "topk-js/schema";

  const schema = {
    title: text().index(keywordIndex()),
  };
  ```
</CodeGroup>
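
As background on what BM25 relevance means, the following is a textbook Okapi BM25 sketch in plain Python. It is not TopK's implementation: the lowercase whitespace tokenizer, the `k1=1.2` and `b=0.75` defaults, and the `bm25_scores` function itself are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Textbook Okapi BM25 over whitespace-tokenized text.
# Assumptions: k1=1.2 and b=0.75 are common defaults and the tokenizer is a
# plain lowercase split; TopK's tokenizer and parameters are not documented here.

def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document in `docs` against `query`; higher means more relevant."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


titles = ["The Catcher in the Rye", "To Kill a Mockingbird", "The Great Gatsby"]
scores = bm25_scores("great gatsby", titles)
print(scores)  # only the third title contains a query term, so only it scores above zero
```

Rare terms get a higher inverse document frequency, and the length normalization keeps long documents from winning purely by repeating words.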

### Semantic index

Convenience index that generates embeddings for a text field automatically, so you do not need to supply your own vectors. Enabled by [`semantic_index()`](/sdk/topk-py/schema#semantic_index).

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.schema import semantic_index, text

  schema = {
      "title": text().index(semantic_index()),
  }
  ```

  ```typescript Javascript theme={null}
  import { semanticIndex, text } from "topk-js/schema";

  const schema = {
    title: text().index(semanticIndex()),
  };
  ```
</CodeGroup>

See [`semantic_index()`](/sdk/topk-py/schema#semantic_index) for details.
