# Upsert documents

Documents in TopK are JSON-like objects composed of key-value pairs.

Each document within a collection:

* Must include a unique `_id` field
* Must conform to the collection schema for indexed fields

Fields defined in the schema can be indexed for vector search, keyword search, or other retrieval strategies.

## Upsert documents in a collection

To upsert documents, pass a list of documents to the [`upsert()`](/sdk/topk-py#upsert) function:

<CodeGroup>
  ```python Python theme={null}
  client.collection("books").upsert(
      [
          {
              "_id": "book-1",
              "title": "The Great Gatsby",
              "published_year": 1925,
              "title_embedding": [0.12, 0.67, 0.82, 0.53, ...]
          },
          {
              "_id": "book-2",
              "title": "To Kill a Mockingbird",
              "published_year": 1960,
              "title_embedding": [0.42, 0.53, 0.65, 0.33, ...]
          },
          {
              "_id": "book-3",
              "title": "1984",
              "published_year": 1949,
              "title_embedding": [0.59, 0.33, 0.71, 0.61, ...]
          }
      ]
  )
  ```

  ```typescript Javascript theme={null}
  await client.collection("books").upsert([
    {
      _id: "book-1",
      title: "The Great Gatsby",
      published_year: 1925,
      title_embedding: [0.12, 0.67, 0.82, 0.53],
    },
    {
      _id: "book-2",
      title: "To Kill a Mockingbird",
      published_year: 1960,
      title_embedding: [0.42, 0.53, 0.65, 0.33],
    },
    {
      _id: "book-3",
      title: "1984",
      published_year: 1949,
      title_embedding: [0.59, 0.33, 0.71, 0.61],
    },
  ]);
  ```
</CodeGroup>

* Every document must have a **string** `_id` field.
* If a document with the specified `_id` doesn't exist, a new document will be **inserted**.
* If a document with the same `_id` already exists, the existing document will be **replaced** with the new one.
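
The `_id` rules above can be checked on the client before a batch is sent. The following is a minimal sketch in plain Python, not part of the TopK SDK: `validate_batch` is a hypothetical helper name, and flagging repeated ids within one batch is an assumption about what you would want to catch, since only one document can exist per `_id`.

```python
from typing import Any


def validate_batch(docs: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in a batch; an empty list means it passed."""
    problems: list[str] = []
    seen: set[str] = set()
    for position, doc in enumerate(docs):
        doc_id = doc.get("_id")
        # A missing `_id` shows up here as None, which is not a string either.
        if not isinstance(doc_id, str):
            problems.append(
                f"document {position}: `_id` must be a string, got {type(doc_id).__name__}"
            )
            continue
        if doc_id in seen:
            problems.append(f"document {position}: duplicate `_id` {doc_id!r} in this batch")
        seen.add(doc_id)
    return problems


batch = [
    {"_id": "book-1", "title": "The Great Gatsby"},
    {"_id": 2, "title": "To Kill a Mockingbird"},  # integer id: reported
    {"_id": "book-1", "title": "1984"},            # repeated id: reported
]
for problem in validate_batch(batch):
    print(problem)
```

Running the check first keeps a malformed document from failing the whole request; the server remains the authority on what is accepted.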

<Note>
  The `upsert()` function does not perform a *partial update* or *merge*: the entire document is replaced.
</Note>
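
Because the whole document is replaced, changing a single field means sending every field again. One way to do this is to merge the changes into a copy of the stored document on the client and upsert the result. The sketch below assumes you already hold the stored document as a `dict` (how you fetch it is left out); `build_replacement` is a hypothetical helper, not an SDK function, and the merge is shallow.

```python
from typing import Any


def build_replacement(existing: dict[str, Any], changes: dict[str, Any]) -> dict[str, Any]:
    """Combine a stored document with changed fields into one full document.

    The result is a new dict suitable for passing to upsert(); `existing`
    is left untouched. Nested values are overwritten, not merged.
    """
    if "_id" in changes and changes["_id"] != existing["_id"]:
        raise ValueError("changes must not alter `_id`")
    return {**existing, **changes}


stored = {"_id": "book-1", "title": "The Great Gatsby", "published_year": 1925}
replacement = build_replacement(stored, {"published_year": 1926})
print(replacement)
```

Upserting only `{"_id": "book-1", "published_year": 1926}` instead would drop `title` from the stored document.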

<Warning>
  Each document you send is serialized as a Protocol Buffers (protobuf) message. The encoded size of that message must be **128KB or smaller**.
</Warning>
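
The exact protobuf size is only known once the SDK has encoded the document, but a rough pre-flight check can catch obviously oversized documents early. The sketch below uses the compact JSON encoding as a stand-in for size, which is an approximation and not the protobuf size. It also assumes "128KB" means 131,072 bytes. `approx_size_bytes` and `oversized_ids` are hypothetical helper names.

```python
import json
from typing import Any

MAX_DOC_BYTES = 128 * 1024  # assumption: "128KB" read as 131,072 bytes


def approx_size_bytes(doc: dict[str, Any]) -> int:
    """Rough size proxy: length of the compact UTF-8 JSON encoding (not protobuf)."""
    encoded = json.dumps(doc, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))


def oversized_ids(docs: list[dict[str, Any]], limit: int = MAX_DOC_BYTES) -> list[str]:
    """Return the `_id` of every document whose approximate size exceeds `limit`."""
    return [doc["_id"] for doc in docs if approx_size_bytes(doc) > limit]


docs = [
    {"_id": "small", "body": "short text"},
    {"_id": "large", "body": "x" * 200_000},
]
print(oversized_ids(docs))  # → ['large']
```

Treat the result as an early warning only: a document near the limit may pass this check and still be rejected, or the reverse.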

## Additional (non-schema) fields

You may include fields that are not defined in the collection schema.

These fields:

* Are stored with the document
* Can be returned to the client in query results
* Can be used for **filtering** in queries

Fields that are not defined in the schema are not indexed.
If you want to use a field for [vector search](/concepts/vector-search), [semantic search](/concepts/semantic-search), [keyword search](/concepts/keyword-search), or [multi-vector search](/concepts/multi-vector-search), it must be declared in the schema and have a corresponding index defined.

<CodeGroup>
  ```python Python theme={null}
  from topk_sdk.schema import text, int, f32_vector, vector_index, keyword_index
  from topk_sdk.query import select, field, fn

  client.collections().create(
      "books",
      schema={
          "title": text().index(keyword_index()).required(),
          "published_year": int().required(),
          "title_embedding": f32_vector(dimension=1024).index(vector_index(metric="cosine")).required(),
      },
  )

  client.collection("books").upsert([
      {
          "_id": "book-1",
          "title": "The Great Gatsby",
          "published_year": 1925,
          "title_embedding": [0.12, 0.67, 0.82, 0.53, ...],
          "tags": ["fiction", "classic"], # non-schema field
          "source_url": "https://example.com/gatsby", # non-schema field
      }
  ])

  client.collection("books").query(
      select(
          "title",
          "source_url",
          title_similarity=fn.semantic_similarity("title", "classic American novel"),
      )
      .filter(field("tags").contains("fiction")) # non-schema fields can still be used for filtering
  )
  ```

  ```typescript Javascript theme={null}
  import { text, int, f32Vector, vectorIndex, keywordIndex } from "topk-js/schema";
  import { select, field, fn } from "topk-js/query";

  await client.collections().create("books", {
    title: text().index(keywordIndex()).required(),
    published_year: int().required(),
    title_embedding: f32Vector({ dimension: 1024 })
      .index(vectorIndex({ metric: "cosine" }))
      .required()
  });

  await client.collection("books").upsert([
    {
      _id: "book-1",
      title: "The Great Gatsby",
      published_year: 1925,
      title_embedding: [0.12, 0.67, 0.82, 0.53],
      tags: ["fiction", "classic"], // non-schema field
      source_url: "https://example.com/gatsby", // non-schema field
    },
  ]);

  const docs = await client.collection("books").query(
    select({
      title: field("title"),
      source_url: field("source_url"),
      title_similarity: fn.semanticSimilarity("title", "classic American novel"),
    })
    .filter(field("tags").contains("fiction")) // non-schema fields can still be used for filtering
  );
  ```
</CodeGroup>

## Supported types

TopK documents are a flat structure of key-value pairs.

The following value types are supported:

| Type                  | Python Type      | JavaScript Type       | Helper Function                                                |
| --------------------- | ---------------- | --------------------- | -------------------------------------------------------------- |
| **String**            | `str`            | `string`              | -                                                              |
| **Integer**           | `int`            | `number`              | -                                                              |
| **Float**             | `float`          | `number`              | -                                                              |
| **Boolean**           | `bool`           | `boolean`             | -                                                              |
| **String list**       | `list[str]`      | `string[]`            | [`string_list()`](../sdk/topk-py/data#string-list)             |
| **F32 list**          | `list[float]`    | `number[]`            | [`f32_list()`](../sdk/topk-py/data#f32-list)                   |
| **F64 list**          | *use helper*     | *use helper*          | [`f64_list()`](../sdk/topk-py/data#f64-list)                   |
| **I32 list**          | *use helper*     | *use helper*          | [`i32_list()`](../sdk/topk-py/data#i32-list)                   |
| **I64 list**          | *use helper*     | *use helper*          | [`i64_list()`](../sdk/topk-py/data#i64-list)                   |
| **U32 list**          | *use helper*     | *use helper*          | [`u32_list()`](../sdk/topk-py/data#u32-list)                   |
| **F8 vector**         | *use helper*     | *use helper*          | [`f8_vector()`](../sdk/topk-py/data#f8-vector)                 |
| **F16 vector**        | *use helper*     | *use helper*          | [`f16_vector()`](../sdk/topk-py/data#f16-vector)               |
| **F32 vector**        | `list[float]`    | `number[]`            | [`f32_vector()`](../sdk/topk-py/data#f32-vector)               |
| **U8 vector**         | *use helper*     | *use helper*          | [`u8_vector()`](../sdk/topk-py/data#u8-vector)                 |
| **I8 vector**         | *use helper*     | *use helper*          | [`i8_vector()`](../sdk/topk-py/data#i8-vector)                 |
| **Binary vector**     | *use helper*     | *use helper*          | [`binary_vector()`](../sdk/topk-py/data#binary-vector)         |
| **F32 sparse vector** | *use helper*     | *use helper*          | [`f32_sparse_vector()`](../sdk/topk-py/data#f32-sparse-vector) |
| **U8 sparse vector**  | *use helper*     | *use helper*          | [`u8_sparse_vector()`](../sdk/topk-py/data#u8-sparse-vector)   |
| **Matrix**            | *use helper*     | *use helper*          | [`matrix()`](../sdk/topk-py/data#matrix-2)                     |
| **Bytes**             | *use helper*     | *use helper*          | [`bytes()`](../sdk/topk-py/data#bytes)                         |
| **Struct**            | `dict[str, Any]` | `Record<string, any>` | [`struct()`](../sdk/topk-py/data#struct)                       |
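
To see which rows of the table a plain Python value falls under without a helper, the mapping can be written out as a small classifier. This is a sketch that mirrors only the "Python Type" column above; `native_type` is a hypothetical name, and it does not reproduce how the SDK itself converts values. A `list[float]` appears in two rows, so the sketch reports both rather than guessing.

```python
from typing import Any


def native_type(value: Any) -> str:
    """Name the table row a plain Python value matches, or report that a helper is needed."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    if isinstance(value, dict):
        return "Struct"
    if isinstance(value, list) and value:
        if all(isinstance(item, str) for item in value):
            return "String list"
        if all(isinstance(item, float) for item in value):
            return "F32 list / F32 vector"
    # Empty lists, integer lists, bytes, and everything else have no plain-type row.
    return "use helper"


doc = {
    "_id": "book-1",
    "published_year": 1925,
    "in_print": True,
    "tags": ["fiction", "classic"],
    "title_embedding": [0.12, 0.67, 0.82, 0.53],
    "page_counts": [180, 218],
}
for key, value in doc.items():
    print(f"{key}: {native_type(value)}")
```

Here `page_counts` is reported as `use helper` because the table lists no plain Python type for integer lists; a helper such as `i32_list()` states the intended type explicitly.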
