Overview - TopK

TopK exposes a PostgreSQL wire protocol endpoint — any standard SQL client can connect and issue queries and writes against your collections.

Prerequisites

API key — sign in to console.topk.io and generate an API key.
Region — see docs.topk.io/regions for available regions.

Setup

Connect using any PostgreSQL-compatible client — psql, psycopg2, node-postgres, tokio-postgres, and others.

psql "host=<region>.sql.topk.io port=5432 user=topk password=<api-key> dbname=topk"

Replace <region> with your selected region and <api-key> with your API key.

See available regions and get your API key.

Quick Start

Create a collection, insert documents, and run a semantic search with SQL:

-- 1. Create a collection
CREATE TABLE books (
    title   TEXT NOT NULL,
    content TEXT          INDEX semantic_index(),
    author  TEXT NOT NULL,
    rating  FLOAT
);

-- 2. Insert documents
INSERT INTO books (_id, title, content, author, rating)
VALUES
    ('1', 'The Hobbit',
     'A hobbit embarks on an unexpected journey through Middle-earth.',
     'Tolkien', 4.3),
    ('2', '1984',
     'A dystopian novel about totalitarian surveillance and control.',
     'George Orwell', 4.7),
    ('3', 'Dune',
     'An epic saga of politics, religion, and ecology on a desert planet.',
     'Frank Herbert', 4.5);

-- 3. Search
SELECT _id, title, semantic_similarity(content, 'political control and oppression') AS score
FROM books
ORDER BY score DESC
LIMIT 10;

The INDEX clause is a TopK extension to SQL that lets you declare search indexes inline with your column definitions.

Schemaless by default

TopK collections are schemaless by default. Documents can have any fields with any types — no schema declaration is required to store or query them. The only exception: fields you want to index must be declared, because index configuration (metric, index type, etc.) is attached to the field definition.

Column types are inferred from the schema, search function return values, and explicit casts. Columns that can’t be typed statically (e.g. unindexed fields with mixed values) fall back to JSON. Use the :: cast operator to force a specific wire type.

For example, to cast published_year to int4 and rating to float8, use:

SELECT title, published_year::int4, rating::float8 FROM books LIMIT 10;

SQL Reference

Table references

Statements use <table> as either a collection name or a collection plus partition:

<table> ::= [schema.]collection[$partition]

The schema prefix is accepted but ignored — all collections live in a single namespace per project.

Partitions

Partitions can be specified with $ or the PARTITION keyword:

Form	Example
`collection`	`books`
`schema.collection`	`public.books`
`collection$partition`	`books$2024`
`schema.collection$partition`	`public.books$2024`
`collection PARTITION name`	`books PARTITION 2024`

Partition syntax applies to SELECT, INSERT, UPDATE, and DELETE only. CREATE TABLE and DROP TABLE name collections without partitions; partitions are created implicitly on first write.

SELECT

The primary query interface. Compute search scores in the projection, filter by metadata or text predicates in WHERE, and rank results by any scoring expression in ORDER BY.

SELECT <projection> FROM <table>
  [WHERE <filter>]
  [ORDER BY <expr> [ASC | DESC] LIMIT <n> [OFFSET <m>]]

OFFSET requires a LIMIT — OFFSET alone is not supported.

Projection

Each item in the select list must be one of:

A named column: SELECT title, rating
A struct subfield: SELECT metadata.publisher
An aliased expression: SELECT rating * 2 AS double_rating — expressions must have an AS alias
A search function: SELECT vector_distance(embedding, '[1,0,0,0]'::f32_vector) AS score
A wire-type cast: SELECT rating::float8 AS rating_f64

Wire-type casts (::int4, ::float8, ::text, …) control how the column is typed on the wire, which is useful when your driver infers the wrong type. Unaliased casts use the inner expression as the column name (SELECT title::text → column title). See Output type mapping for the full table.

COUNT(*) returns the number of matching documents. The result column is named _count unless aliased with AS.

WHERE filters

Operator	Example
`=`, `<>`, `!=`, `<`, `<=`, `>`, `>=`	`rating > 4.0`
`AND`, `OR`, `NOT`	`genre = 'fantasy' AND in_print = true`
`IS NULL`, `IS NOT NULL`	`checksum IS NOT NULL`
`IN`, `NOT IN`	`genre IN ('fantasy', 'fiction')`
`BETWEEN`, `NOT BETWEEN`	`published_year BETWEEN 1900 AND 2000`
`LIKE`, `NOT LIKE`	`title LIKE 'The%'`
`~` (regex)	`author ~ 'Tol.*'`
Arithmetic	`rating * 10 > 40`
`CASE WHEN … THEN … ELSE … END`	`CASE WHEN rating > 4.0 THEN 'top' ELSE 'other' END AS tier`
`contains(field, scalar)`	`contains(tags, 'classic')`
`match(query [, field [, weight [, all]]])`	`match('hobbit rings', title)`
`match_tokens(tokens [, field [, all]])`	`match_tokens(ARRAY['love', 'classic'], tags)`
`should(query [, field [, weight]])`	`should('rings', title)`
`match_all(field, query)`	`match_all(title, 'hobbit rings')`
`match_any(field, query)`	`match_any(tags, ARRAY['love', 'classic'])`

match(...) and match_tokens(...) search keyword-indexed text and can be ranked with bm25_score(). Combine text searches with AND / OR; add metadata filters with AND:

WHERE (match('hobbit', title) OR match('dune', title))
  AND rating > 4.0

Text searches can be combined with each other using OR, and with metadata filters using AND.

Search Functions

TopK extends SQL with search-specific functions for scoring and filtering. Scoring functions compute relevance in SELECT; text search predicates filter documents in WHERE.

Scoring functions

Scoring functions compute a relevance score for each document. Call them in SELECT, give them an alias, then use that alias in ORDER BY to rank results. All scoring functions return the score as f32:

SELECT _id, title, bm25_score() AS score
FROM books
WHERE match('dune', title)
ORDER BY score DESC
LIMIT 10;

Function	Description
`vector_distance(field, query [, skip_refine])`	Dense or sparse ANN distance
`multi_vector_distance(field, query [, candidates])`	Multi-vector MaxSim distance
`semantic_similarity(field, query)`	Semantic embedding similarity
`bm25_score([b, k1])`	Keyword relevance score; requires `match(...)` or `match_tokens(...)` in `WHERE`
`boost(score, condition, factor)`	Multiply score when condition is true

Text search predicates

Text search predicates filter documents in WHERE based on keyword matches. match(...) and match_tokens(...) enable BM25 scoring via bm25_score().

SELECT _id, title, bm25_score() AS score
FROM books
WHERE match('dune', title)
ORDER BY score DESC
LIMIT 10;

Function	Description
`match(query [, field [, weight [, all]]])`	Keyword text search
`match_tokens(tokens [, field [, all]])`	Keyword token search
`should(query [, field [, weight]])`	Optional BM25 scoring term without filtering
`match_all(field, query)`	Boolean predicate — all terms must match
`match_any(field, query)`	Boolean predicate — any term must match
`contains(field, scalar)`	List membership or string substring

should(...) adds an optional BM25 scoring term without filtering documents from the result set — documents containing the term score higher, while documents that do not contain it remain eligible. On its own it matches the entire collection; combine it with match(...) using AND when some terms are required and others should only influence ranking:

SELECT _id, title, bm25_score() AS score
FROM books
WHERE match('hobbit rings', title) AND should('lord', title)
ORDER BY score DESC
LIMIT 10;

match_all(...) and match_any(...) are boolean-only predicates — they filter documents but do not affect relevance ranking:

SELECT _id, title
FROM books
WHERE match_any(title, ARRAY['dune', 'hobbit'])
  AND rating > 4.0;

Text searches can be combined with each other using OR, and with metadata filters using AND:

SELECT _id, title, bm25_score() AS score
FROM books
WHERE (match('hobbit', title) OR match('dune', title))
  AND rating > 4.0
ORDER BY score DESC
LIMIT 10;

match(...) and match_tokens(...) cannot be combined with metadata conditions using OR. Use match_any(...) or match_all(...) for keyword predicates that need to appear in an OR alongside other conditions — they work in any logical expression but do not affect bm25_score().

Example queries

-- Vector ANN
SELECT
    _id,
    title,
    vector_distance(embedding, '[1,0,0,0]'::f32_vector) AS vec_dist
FROM books
ORDER BY vec_dist DESC  -- DESC for cosine/dot_product; ASC for euclidean
LIMIT 3;

-- Sparse vector ANN
SELECT
    _id,
    title,
    vector_distance(sparse_emb, '{"0":1.0,"1":0.5}'::f32_sparse_vector) AS vec_dist
FROM books
ORDER BY vec_dist DESC
LIMIT 3;

-- Multi-vector MaxSim
SELECT
    _id,
    title,
    multi_vector_distance(multi_emb, '[[1.0,0.0,0.0,0.0]]'::f32_matrix) AS vec_dist
FROM books
ORDER BY vec_dist DESC
LIMIT 3;

-- Full-text BM25
SELECT
    _id,
    title,
    bm25_score() AS bm25_score
FROM books
WHERE match('hobbit rings', title)
ORDER BY bm25_score DESC
LIMIT 5;

-- Semantic similarity
SELECT
    _id,
    title,
    semantic_similarity(bio, 'tales of magic and adventure') AS sem_similarity
FROM books
ORDER BY sem_similarity DESC
LIMIT 3;

-- Hybrid: vector + boost
SELECT
    _id,
    title,
    vector_distance(embedding, '[1,0,0,0]'::f32_vector) AS vec_dist
FROM books
WHERE match_any(title, 'hobbit')
ORDER BY boost(vec_dist, in_print = true, 1.5)
LIMIT 5;

-- Boolean keyword predicates
SELECT _id, title
FROM books
WHERE match_all(title, 'the hobbit')
   OR match_any(title, ARRAY['dune', '1984'])
LIMIT 5;

GROUP BY

Group rows by one or more key expressions and compute aggregations per group.

SELECT <keys and/or aggregate calls> FROM <table>
  [WHERE <filter>]
  GROUP BY <key> [, <key> ...]
  [HAVING <filter on aggregates>]
  [ORDER BY <expr> [ASC | DESC] LIMIT <n>]

A GROUP BY key must be a bare identifier — either a real column (GROUP BY genre) or a SELECT-list alias for a computed expression:

SELECT published_year < 1940 AS is_classic, COUNT(*) AS count
FROM books
GROUP BY is_classic;

Every item in the SELECT list of a GROUP BY query must be either a group key or an aggregate function call — arbitrary columns are not allowed.

Function	Description
`COUNT(*)`	Number of rows in the group
`COUNT(field)`	Number of non-null values of `field` in the group
`SUM(field)`	Sum of `field` in the group
`MIN(field)`	Minimum value of `field` in the group
`MAX(field)`	Maximum value of `field` in the group
`AVG(field)`	Average value of `field` in the group

HAVING filters on the grouped/aggregated output and requires a GROUP BY clause:

SELECT genre, COUNT(*) AS count, AVG(rating) AS avg_rating
FROM books
GROUP BY genre
HAVING COUNT(*) > 1
ORDER BY avg_rating DESC
LIMIT 5;

GROUP BY ALL and ROLLUP / CUBE / GROUPING SETS modifiers are not supported.

INSERT

Upsert semantics: inserting a document with an existing _id replaces it. _id is required and must appear in the column list.

INSERT INTO <table> (<col>, ...) VALUES (<val>, ...) [, (<val>, ...) ...];

Scalars use plain literals. TopK-native values use ::topk_type casts or constructor calls — see Type System. Prefer casts in VALUES:

INSERT INTO books (_id, title, author, published_year, rating, embedding, sparse_emb)
VALUES
    (
        'hobbit', 'The Hobbit', 'Tolkien', 1937, 4.3,
        '[1.0, 0.0, 0.0, 0.0]'::f32_vector,
        '{"0": 1.0, "1": 0.5}'::f32_sparse_vector
    );

UPDATE

Updates one or more fields on existing documents. _id cannot be updated. A WHERE clause is required and must resolve to a set of document IDs. Value expressions in SET follow the same rules as INSERT VALUES.

UPDATE <table> SET <col> = <val> [, ...] WHERE _id = '<id>';
UPDATE <table> SET <col> = <val> [, ...] WHERE _id IN ('<id1>', '<id2>', ...);

DELETE

Deletes documents by ID or by filter expression. A WHERE clause is required unless the target is a partition, in which case the entire partition is dropped.

DELETE FROM <table> WHERE _id = '<id>';
DELETE FROM <table> WHERE _id IN ('<id1>', '<id2>', ...);
DELETE FROM <table> WHERE <filter_expr>;
DELETE FROM <collection>$<partition>;

CREATE TABLE

Schema is defined once at collection creation. Indexes are declared inline on each column using a TopK-specific INDEX clause — standalone CREATE INDEX statements are not supported.

CREATE TABLE [IF NOT EXISTS] <table> (
    <column> <type> [NOT NULL] [INDEX <method>(<options>)],
    ...
);

IF NOT EXISTS suppresses the error if the collection already exists.

Column types

SQL type	TopK field type
`BOOLEAN`	`boolean`
`INTEGER` / `BIGINT` / `INT`	`integer`
`SMALLINT` / `INT2` / `INT4`	`integer`
`FLOAT` / `REAL` / `DOUBLE PRECISION`	`float`
`TEXT` / `VARCHAR`	`text`
`BYTEA`	`bytes`
`TEXT[]`	`list<string>`
`INTEGER[]`	`list<integer>`
`FLOAT[]`	`list<float>`
`JSONB`	`struct`
`f32_vector(n)`	`f32_vector(n)`
`f16_vector(n)`	`f16_vector(n)`
`f8_vector(n)`	`f8_vector(n)`
`u8_vector(n)`	`u8_vector(n)`
`i8_vector(n)`	`i8_vector(n)`
`binary_vector(n)`	`binary_vector(n)`
`f32_sparse_vector`	`f32_sparse_vector`
`f16_sparse_vector`	`f16_sparse_vector`
`f8_sparse_vector`	`f8_sparse_vector`
`u8_sparse_vector`	`u8_sparse_vector`
`i8_sparse_vector`	`i8_sparse_vector`
`f32_matrix(n)`	`f32_matrix(n)`
`f16_matrix(n)`	`f16_matrix(n)`
`f8_matrix(n)`	`f8_matrix(n)`
`u8_matrix(n)`	`u8_matrix(n)`
`i8_matrix(n)`	`i8_matrix(n)`

NOT NULL marks a field as required. All columns are optional by default.

Index types

Method	Applies to	Options
`keyword_index()`	`TEXT`, `VARCHAR`	type: `text` (default) — tokenized before indexing `exact` — indexed as a single term
`semantic_index()`	`TEXT`, `VARCHAR`	—
`vector_index()`	`_vector(n)`, `_sparse_vector`	metric: `cosine`, `dot_product`, `euclidean`, `hamming`
`multi_vector_index()`	`*_matrix(n)`	metric: `maxsim` quantization: `1bit`, `2bit`, `scalar` width, top_k

Example

CREATE TABLE books (
    title          TEXT NOT NULL               INDEX keyword_index(),
    author         TEXT NOT NULL,
    published_year INTEGER NOT NULL,
    rating         FLOAT,
    genre          TEXT                        INDEX keyword_index(type = 'exact'),
    in_print       BOOLEAN,
    bio            TEXT                        INDEX semantic_index(),
    embedding      f32_vector(4)               INDEX vector_index(metric = 'cosine'),
    sparse_emb     f32_sparse_vector           INDEX vector_index(metric = 'dot_product'),
    multi_emb      f32_matrix(4)               INDEX multi_vector_index(metric = 'maxsim'),
    tags           TEXT[],
    checksum       BYTEA,
    metadata       JSONB
);

DROP TABLE

Permanently deletes a collection and all of its documents and indexes. This operation is irreversible.

DROP TABLE [IF EXISTS] <table>;

IF EXISTS suppresses the error if the collection does not exist.

information_schema

TopK exposes two information_schema virtual tables for inspecting collections and their schemas. Specify column names explicitly in the select list — SELECT * is not supported on virtual tables.

information_schema.tables

Returns one row per collection in the project. Use it to list collections or check whether a specific collection exists.

SELECT table_name, table_schema, table_type FROM information_schema.tables;

WHERE clauses are accepted but silently ignored — all collections are always returned.

Column	Type	Value
`table_name`	`text`	collection name
`table_schema`	`text`	`"public"`
`table_type`	`text`	`"BASE TABLE"`
`table_owner`	`text`	`"topk"`

information_schema.columns

Returns one row per declared field in a collection. Filter by collection name using WHERE table_name = '<name>'.

SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'books';

Additional AND clauses (e.g. AND table_schema = 'public') are accepted but silently ignored.

Column	Type	Value
`column_name`	`text`	field name
`data_type`	`text`	see mapping below
`is_nullable`	`text`	`"YES"` or `"NO"`
`table_name`	`text`	collection name

Data type mapping:

TopK field type	`data_type`
`text`	`text`
`integer`	`bigint`
`float`	`double precision`
`boolean`	`boolean`
`bytes`	`bytea`
`*_vector(n)` (all dense variants)	`real[]`
`*_sparse_vector` (all variants)	`jsonb`
`list`, `struct`, `*_matrix`	`jsonb`

EXPLAIN

Returns the parsed Statement AST as a single plan TEXT column:

EXPLAIN <statement>;
EXPLAIN VERBOSE <statement>;

Session commands

Command	Behavior
`SET consistency_level = 'indexed'`	Indexed consistency for subsequent reads
`SET consistency_level = 'strong'`	Strong consistency for subsequent reads
`SET consistency_level = 'default'`	Clears the session override (router default)
`SHOW consistency_level`	Returns the current consistency level

SET/SHOW only recognize consistency_level; all other variable names return an error. The following commands are accepted and silently succeed:

Command
`BEGIN`
`COMMIT`
`ROLLBACK`
`DISCARD <anything>`

PostgreSQL Compatibility

TopK speaks the PostgreSQL wire protocol but is purpose-built for search — not general-purpose relational queries. Some SQL features are intentionally absent because they don’t map to search semantics or would work against search performance.

Queries

Single ORDER BY key — Ranking is a single scoring expression. Combine multiple signals into one composite score rather than sorting by multiple columns: ORDER BY vec_dist * 0.7 + bm25_score() * 0.3 DESC. No SELECT * — Selecting all columns is not supported. Indexed vector fields cannot be projected directly — use search functions (vector_distance, semantic_similarity, …) to get scores from them instead. COUNT(*) without GROUP BY is a standalone aggregate — it returns a document count and cannot be combined with other columns in the same SELECT list. Inside a GROUP BY query, COUNT(*) behaves as an ordinary per-group aggregate and can be combined with other keys/aggregates — see GROUP BY. GROUP BY keys must be identifiers — group by a real column or a SELECT-list alias for a computed expression; you can’t group by an inline expression directly. GROUP BY ALL and ROLLUP/CUBE/GROUPING SETS are not supported. No whole-value equality on complex types — tags = ARRAY['a'] is not supported. Use contains(tags, 'a') to check whether a single value exists in a list field — contains checks for one value at a time, not a list of values.

Writes

Upsert-only INSERT — INSERT always replaces a document when _id already exists, so ON CONFLICT is built-in and not needed. INSERT … SELECT and RETURNING are not supported. ::topk_type casts in VALUES — Standard PostgreSQL casts (::float8, CAST(… AS text)) aren’t supported inside INSERT/UPDATE values. Use TopK-native casts (::f32_vector, ::f32_sparse_vector, …) instead — see Type System.

Transactions

BEGIN/COMMIT/ROLLBACK are accepted without error so that drivers that auto-wrap statements in transactions (psycopg2, SQLAlchemy, JDBC) connect without modification. Writes are not transactional; ROLLBACK does not undo changes.

Structs

Individual struct subfields can be selected using dot notation:

SELECT metadata.publisher, metadata.year FROM books LIMIT 10;

To retrieve multiple subfields, list each one explicitly — selecting a whole struct column is not supported:

-- not supported
SELECT metadata FROM books;

Type System

TopK-native literals use ::topk_type casts (preferred in INSERT/UPDATE VALUES and search-function arguments) or equivalent constructor calls. PostgreSQL wire-type casts (::float8, …) apply only in SELECT projection.

Constructor	Example
`f32_vector(ARRAY[…])`	`'[0.1, 0.2, 0.3]'::f32_vector` or `f32_vector(ARRAY[0.1, 0.2, 0.3])`
`f16_vector(ARRAY[…])`	(same pattern)
`f8_vector(ARRAY[…])`	(same pattern)
`u8_vector(ARRAY[…])`	(same pattern)
`i8_vector(ARRAY[…])`	(same pattern)
`binary_vector(ARRAY[…])`	(same pattern)
`f32_sparse_vector(ARRAY[idx], ARRAY[val])`	`f32_sparse_vector(ARRAY[0, 2], ARRAY[1.0, 0.5])` or `'{"0":1.0,"2":0.5}'::f32_sparse_vector` or `'{"indices":[0,2],"values":[1.0,0.5]}'::f32_sparse_vector`
`f16_sparse_vector(…)`	(same pattern)
`f8_sparse_vector(…)`	(same pattern)
`u8_sparse_vector(…)`	(same pattern)
`i8_sparse_vector(…)`	(same pattern)
`f32_matrix(ARRAY[ARRAY[row1],…])`	`'[[1.0, 0.0], [0.5, 0.5]]'::f32_matrix`
`f16_matrix(…)`	(same pattern)
`f8_matrix(…)`	(same pattern)
`u8_matrix(…)`	(same pattern)
`i8_matrix(…)`	(same pattern)
`bytes('hexstring')`	`bytes('deadbeef')`
`struct(key1, val1, …)`	`struct('publisher', 'Penguin', 'pages', 320)`
`ARRAY[elem, …]`	`ARRAY['classic', 'fiction']` (list)

Complex types → JSON

Vector, matrix, sparse, and list types are returned as JSON when selected. Indexed vector fields cannot be projected directly — use scoring functions to compute scores from them. The table below documents the wire format for each type:

TopK type	JSON wire representation
Dense vector	`[0.1, 0.2, 0.3]`
Sparse vector	`{"indices":[0,2],"values":[1.0,0.5]}` or `{"0":1.0,"2":0.5}`
Matrix (multi-vector)	`[[1.0,0.0],[0.5,0.1]]` (row-major)
List	`["a","b"]` / `[1,2,3]`
Struct	`{"publisher":"Scribner","pages":180}`
Binary	`\xdeadbeef`

Output type mapping

pgwire maps SELECT-list expressions to PostgreSQL OIDs. Explicit ::cast in the projection list overrides inference. Casts are stripped from the query plan — they only affect the wire type.

Expression	pg OID
`::bool`	16 `BOOL`
`::smallint` / `::int2`	21 `INT2`
`::int` / `::int4`	23 `INT4`
`::bigint` / `::int8`	20 `INT8`
`::real` / `::float4`	700 `FLOAT4`
`::float` / `::float8`	701 `FLOAT8`
`::text`	25 `TEXT`
`::bytea`	17 `BYTEA`
`::json` / `::jsonb`	114 `JSON`
plain column (no cast)	114 `JSON`
search function (no cast)	700 `FLOAT4`
`COUNT(*)` (no cast)	20 `INT8`

​Prerequisites

​Setup

​Quick Start

​Schemaless by default

​SQL Reference

​Table references

​Partitions

​SELECT

​Projection

​WHERE filters

​Search Functions

Scoring functions

Text search predicates

Example queries

​GROUP BY

​INSERT

​UPDATE

​DELETE

​CREATE TABLE

​Column types

​Index types

​Example

​DROP TABLE

​information_schema

​information_schema.tables

​information_schema.columns

​EXPLAIN

​Session commands

​PostgreSQL Compatibility

​Queries

​Writes

​Transactions

​Structs

​Type System

​Complex types → JSON

​Output type mapping

Prerequisites

Setup

Quick Start

Schemaless by default

SQL Reference

Table references

Partitions

SELECT

Projection

WHERE filters

Search Functions

GROUP BY

INSERT

UPDATE

DELETE

CREATE TABLE

Column types

Index types

Example

DROP TABLE

information_schema

information_schema.tables

information_schema.columns

EXPLAIN

Session commands

PostgreSQL Compatibility

Queries

Writes

Transactions

Structs

Type System

Complex types → JSON

Output type mapping