Skip to main content
TopK exposes a PostgreSQL wire protocol endpoint — any standard SQL client can connect and issue queries and writes against your collections.

Prerequisites

Setup

Connect using any PostgreSQL-compatible client — psql, psycopg2, node-postgres, tokio-postgres, and others.
psql "host=<region>.sql.topk.io port=5432 user=topk password=<api-key> dbname=topk"
Replace <region> with your selected region and <api-key> with your API key.
See available regions and get your API key.

Quick Start

Create a collection, insert documents, and run a semantic search with SQL:
-- 1. Create a collection
CREATE TABLE books (
    title   TEXT NOT NULL,
    content TEXT          INDEX semantic_index(),
    author  TEXT NOT NULL,
    rating  FLOAT
);

-- 2. Insert documents
INSERT INTO books (_id, title, content, author, rating)
VALUES
    ('1', 'The Hobbit',
     'A hobbit embarks on an unexpected journey through Middle-earth.',
     'Tolkien', 4.3),
    ('2', '1984',
     'A dystopian novel about totalitarian surveillance and control.',
     'George Orwell', 4.7),
    ('3', 'Dune',
     'An epic saga of politics, religion, and ecology on a desert planet.',
     'Frank Herbert', 4.5);

-- 3. Search
SELECT _id, title, semantic_similarity(content, 'political control and oppression') AS score
FROM books
ORDER BY score DESC
LIMIT 10;
The INDEX clause is a TopK extension to SQL that lets you declare search indexes inline with your column definitions.

Schemaless by default

TopK collections are schemaless by default. Documents can have any fields with any types — no schema declaration is required to store or query them. The only exception: fields you want to index must be declared, because index configuration (metric, index type, etc.) is attached to the field definition.
Column types are inferred from the schema, search function return values, and explicit casts. Columns that can’t be typed statically (e.g. unindexed fields with mixed values) fall back to JSON. Use the :: cast operator to force a specific wire type.
For example, to cast published_year to int4 and rating to float8, use:
SELECT title, published_year::int4, rating::float8 FROM books LIMIT 10;

SQL Reference

Table references

Statements use <table> as either a collection name or a collection plus partition:
<table> ::= [schema.]collection[$partition]
The schema prefix is accepted but ignored — all collections live in a single namespace per project.

Partitions

Partitions can be specified with $ or the PARTITION keyword:
FormExample
collectionbooks
schema.collectionpublic.books
collection$partitionbooks$2024
schema.collection$partitionpublic.books$2024
collection PARTITION namebooks PARTITION 2024
Partition syntax applies to SELECT, INSERT, UPDATE, and DELETE only. CREATE TABLE and DROP TABLE name collections without partitions; partitions are created implicitly on first write.

SELECT

The primary query interface. Compute search scores in the projection, filter by metadata or text predicates in WHERE, and rank results by any scoring expression in ORDER BY.
SELECT <projection> FROM <table>
  [WHERE <filter>]
  [ORDER BY <expr> [ASC | DESC] LIMIT <n>]

Projection

Each item in the select list must be one of:
  • A named column: SELECT title, rating
  • A struct subfield: SELECT metadata.publisher
  • An aliased expression: SELECT rating * 2 AS double_rating — expressions must have an AS alias
  • A search function: SELECT vector_distance(embedding, '[1,0,0,0]'::f32_vector) AS score
  • A wire-type cast: SELECT rating::float8 AS rating_f64
Wire-type casts (::int4, ::float8, ::text, …) control how the column is typed on the wire, which is useful when your driver infers the wrong type. Unaliased casts use the inner expression as the column name (SELECT title::text → column title). See Output type mapping for the full table.
COUNT(*) returns the number of matching documents. The result column is named _count unless aliased with AS.

WHERE filters

OperatorExample
=, <>, !=, <, <=, >, >=rating > 4.0
AND, OR, NOTgenre = 'fantasy' AND in_print = true
IS NULL, IS NOT NULLchecksum IS NOT NULL
IN, NOT INgenre IN ('fantasy', 'fiction')
BETWEEN, NOT BETWEENpublished_year BETWEEN 1900 AND 2000
LIKE, NOT LIKEtitle LIKE 'The%'
~ (regex)author ~ 'Tol.*'
Arithmeticrating * 10 > 40
CASE WHEN … THEN … ELSE … ENDCASE WHEN rating > 4.0 THEN 'top' ELSE 'other' END AS tier
contains(field, scalar)contains(tags, 'classic')
match(query [, field [, weight [, all]]])match('hobbit rings', title)
match_tokens(tokens [, field [, all]])match_tokens(ARRAY['love', 'classic'], tags)
match_all(field, query)match_all(title, 'hobbit rings')
match_any(field, query)match_any(tags, ARRAY['love', 'classic'])
match(...) and match_tokens(...) search keyword-indexed text and can be ranked with bm25_score(). Combine text searches with AND / OR; add metadata filters with AND:
WHERE (match('hobbit', title) OR match('dune', title))
  AND rating > 4.0
Text searches can be combined with each other using OR, and with metadata filters using AND.

Search Functions

TopK extends SQL with search-specific functions for scoring and filtering. Scoring functions compute relevance in SELECT; text search predicates filter documents in WHERE.
Scoring functions
Scoring functions compute a relevance score for each document. Call them in SELECT, give them an alias, then use that alias in ORDER BY to rank results. All scoring functions return the score as f32:
SELECT _id, title, bm25_score() AS score
FROM books
WHERE match('dune', title)
ORDER BY score DESC
LIMIT 10;
FunctionDescription
vector_distance(field, query [, skip_refine])Dense or sparse ANN distance
multi_vector_distance(field, query [, candidates])Multi-vector MaxSim distance
semantic_similarity(field, query)Semantic embedding similarity
bm25_score([b, k1])Keyword relevance score; requires match(...) or match_tokens(...) in WHERE
boost(score, condition, factor)Multiply score when condition is true
Text search predicates
Text search predicates filter documents in WHERE based on keyword matches. match(...) and match_tokens(...) enable BM25 scoring via bm25_score().
SELECT _id, title, bm25_score() AS score
FROM books
WHERE match('dune', title)
ORDER BY score DESC
LIMIT 10;
FunctionDescription
match(query [, field [, weight [, all]]])Keyword text search
match_tokens(tokens [, field [, all]])Keyword token search
match_all(field, query)Boolean predicate — all terms must match
match_any(field, query)Boolean predicate — any term must match
contains(field, scalar)List membership or string substring
match_all(...) and match_any(...) are boolean-only predicates — they filter documents but do not affect relevance ranking:
SELECT _id, title
FROM books
WHERE match_any(title, ARRAY['dune', 'hobbit'])
  AND rating > 4.0;
Text searches can be combined with each other using OR, and with metadata filters using AND:
SELECT _id, title, bm25_score() AS score
FROM books
WHERE (match('hobbit', title) OR match('dune', title))
  AND rating > 4.0
ORDER BY score DESC
LIMIT 10;
match(...) and match_tokens(...) cannot be combined with metadata conditions using OR. Use match_any(...) or match_all(...) for keyword predicates that need to appear in an OR alongside other conditions — they work in any logical expression but do not affect bm25_score().
Example queries
-- Vector ANN
SELECT
    _id,
    title,
    vector_distance(embedding, '[1,0,0,0]'::f32_vector) AS vec_dist
FROM books
ORDER BY vec_dist DESC  -- DESC for cosine/dot_product; ASC for euclidean
LIMIT 3;

-- Sparse vector ANN
SELECT
    _id,
    title,
    vector_distance(sparse_emb, '{"0":1.0,"1":0.5}'::f32_sparse_vector) AS vec_dist
FROM books
ORDER BY vec_dist DESC
LIMIT 3;

-- Multi-vector MaxSim
SELECT
    _id,
    title,
    multi_vector_distance(multi_emb, '[[1.0,0.0,0.0,0.0]]'::f32_matrix) AS vec_dist
FROM books
ORDER BY vec_dist DESC
LIMIT 3;

-- Full-text BM25
SELECT
    _id,
    title,
    bm25_score() AS bm25_score
FROM books
WHERE match('hobbit rings', title)
ORDER BY bm25_score DESC
LIMIT 5;

-- Semantic similarity
SELECT
    _id,
    title,
    semantic_similarity(bio, 'tales of magic and adventure') AS sem_similarity
FROM books
ORDER BY sem_similarity DESC
LIMIT 3;

-- Hybrid: vector + boost
SELECT
    _id,
    title,
    vector_distance(embedding, '[1,0,0,0]'::f32_vector) AS vec_dist
FROM books
WHERE match_any(title, 'hobbit')
ORDER BY boost(vec_dist, in_print = true, 1.5)
LIMIT 5;

-- Boolean keyword predicates
SELECT _id, title
FROM books
WHERE match_all(title, 'the hobbit')
   OR match_any(title, ARRAY['dune', '1984'])
LIMIT 5;

INSERT

Upsert semantics: inserting a document with an existing _id replaces it. _id is required and must appear in the column list.
INSERT INTO <table> (<col>, ...) VALUES (<val>, ...) [, (<val>, ...) ...];
Scalars use plain literals. TopK-native values use ::topk_type casts or constructor calls — see Type System. Prefer casts in VALUES:
INSERT INTO books (_id, title, author, published_year, rating, embedding, sparse_emb)
VALUES
    (
        'hobbit', 'The Hobbit', 'Tolkien', 1937, 4.3,
        '[1.0, 0.0, 0.0, 0.0]'::f32_vector,
        '{"0": 1.0, "1": 0.5}'::f32_sparse_vector
    );

UPDATE

Updates one or more fields on existing documents. _id cannot be updated. A WHERE clause is required and must resolve to a set of document IDs. Value expressions in SET follow the same rules as INSERT VALUES.
UPDATE <table> SET <col> = <val> [, ...] WHERE _id = '<id>';
UPDATE <table> SET <col> = <val> [, ...] WHERE _id IN ('<id1>', '<id2>', ...);

DELETE

Deletes documents by ID or by filter expression. A WHERE clause is required unless the target is a partition, in which case the entire partition is dropped.
DELETE FROM <table> WHERE _id = '<id>';
DELETE FROM <table> WHERE _id IN ('<id1>', '<id2>', ...);
DELETE FROM <table> WHERE <filter_expr>;
DELETE FROM <collection>$<partition>;

CREATE TABLE

Schema is defined once at collection creation. Indexes are declared inline on each column using a TopK-specific INDEX clause — standalone CREATE INDEX statements are not supported.
CREATE TABLE [IF NOT EXISTS] <table> (
    <column> <type> [NOT NULL] [INDEX <method>(<options>)],
    ...
);
IF NOT EXISTS suppresses the error if the collection already exists.

Column types

SQL typeTopK field type
BOOLEANboolean
INTEGER / BIGINT / INTinteger
SMALLINT / INT2 / INT4integer
FLOAT / REAL / DOUBLE PRECISIONfloat
TEXT / VARCHARtext
BYTEAbytes
TEXT[]list<string>
INTEGER[]list<integer>
FLOAT[]list<float>
JSONBstruct
f32_vector(n)f32_vector(n)
f16_vector(n)f16_vector(n)
f8_vector(n)f8_vector(n)
u8_vector(n)u8_vector(n)
i8_vector(n)i8_vector(n)
binary_vector(n)binary_vector(n)
f32_sparse_vectorf32_sparse_vector
f16_sparse_vectorf16_sparse_vector
f8_sparse_vectorf8_sparse_vector
u8_sparse_vectoru8_sparse_vector
i8_sparse_vectori8_sparse_vector
f32_matrix(n)f32_matrix(n)
f16_matrix(n)f16_matrix(n)
f8_matrix(n)f8_matrix(n)
u8_matrix(n)u8_matrix(n)
i8_matrix(n)i8_matrix(n)
NOT NULL marks a field as required. All columns are optional by default.

Index types

MethodApplies toOptions
keyword_index()TEXT, VARCHAR
semantic_index()TEXT, VARCHAR
vector_index()*_vector(n), *_sparse_vectormetric: cosine, dot_product, euclidean, hamming
multi_vector_index()*_matrix(n)metric: maxsim
quantization: 1bit, 2bit, scalar
width, top_k

Example

CREATE TABLE books (
    title          TEXT NOT NULL               INDEX keyword_index(),
    author         TEXT NOT NULL,
    published_year INTEGER NOT NULL,
    rating         FLOAT,
    genre          TEXT,
    in_print       BOOLEAN,
    bio            TEXT                        INDEX semantic_index(),
    embedding      f32_vector(4)               INDEX vector_index(metric = 'cosine'),
    sparse_emb     f32_sparse_vector           INDEX vector_index(metric = 'dot_product'),
    multi_emb      f32_matrix(4)               INDEX multi_vector_index(metric = 'maxsim'),
    tags           TEXT[],
    checksum       BYTEA,
    metadata       JSONB
);

DROP TABLE

Permanently deletes a collection and all of its documents and indexes. This operation is irreversible.
DROP TABLE [IF EXISTS] <table>;
IF EXISTS suppresses the error if the collection does not exist.

information_schema

TopK exposes two information_schema virtual tables for inspecting collections and their schemas. Specify column names explicitly in the select list — SELECT * is not supported on virtual tables.

information_schema.tables

Returns one row per collection in the project. Use it to list collections or check whether a specific collection exists.
SELECT table_name, table_schema, table_type FROM information_schema.tables;
WHERE clauses are accepted but silently ignored — all collections are always returned.
ColumnTypeValue
table_nametextcollection name
table_schematext"public"
table_typetext"BASE TABLE"
table_ownertext"topk"

information_schema.columns

Returns one row per declared field in a collection. Filter by collection name using WHERE table_name = '<name>'.
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'books';
Additional AND clauses (e.g. AND table_schema = 'public') are accepted but silently ignored.
ColumnTypeValue
column_nametextfield name
data_typetextsee mapping below
is_nullabletext"YES" or "NO"
table_nametextcollection name
Data type mapping:
TopK field typedata_type
texttext
integerbigint
floatdouble precision
booleanboolean
bytesbytea
*_vector(n) (all dense variants)real[]
*_sparse_vector (all variants)jsonb
list, struct, *_matrixjsonb

EXPLAIN

Returns the parsed Statement AST as a single plan TEXT column:
EXPLAIN <statement>;
EXPLAIN VERBOSE <statement>;

Session commands

CommandBehavior
SET consistency_level = 'indexed'Indexed consistency for subsequent reads
SET consistency_level = 'strong'Strong consistency for subsequent reads
SET consistency_level = 'default'Clears the session override (router default)
SHOW consistency_levelReturns the current consistency level
SET/SHOW only recognize consistency_level; all other variable names return an error. The following commands are accepted and silently succeed:
Command
BEGIN
COMMIT
ROLLBACK
DISCARD <anything>

PostgreSQL Compatibility

TopK speaks the PostgreSQL wire protocol but is purpose-built for search — not general-purpose relational queries. Some SQL features are intentionally absent because they don’t map to search semantics or would work against search performance.

Queries

No OFFSET — Search indexes are optimized for top-k retrieval. Skipping N results requires scanning and discarding ranked candidates, which defeats the purpose. Express pagination through your ranking score instead. Single ORDER BY key — Ranking is a single scoring expression. Combine multiple signals into one composite score rather than sorting by multiple columns: ORDER BY vec_dist * 0.7 + bm25_score() * 0.3 DESC. No SELECT * — Selecting all columns is not supported. Indexed vector fields cannot be projected directly — use search functions (vector_distance, semantic_similarity, …) to get scores from them instead. COUNT(*) is a standalone aggregate — it returns a document count and cannot be combined with other columns in the same SELECT list. No whole-value equality on complex typestags = ARRAY['a'] is not supported. Use contains(tags, 'a') to check whether a single value exists in a list field — contains checks for one value at a time, not a list of values.

Writes

Upsert-only INSERTINSERT always replaces a document when _id already exists, so ON CONFLICT is built-in and not needed. INSERT … SELECT and RETURNING are not supported. ::topk_type casts in VALUES — Standard PostgreSQL casts (::float8, CAST(… AS text)) aren’t supported inside INSERT/UPDATE values. Use TopK-native casts (::f32_vector, ::f32_sparse_vector, …) instead — see Type System.

Transactions

BEGIN/COMMIT/ROLLBACK are accepted without error so that drivers that auto-wrap statements in transactions (psycopg2, SQLAlchemy, JDBC) connect without modification. Writes are not transactional; ROLLBACK does not undo changes.

Structs

Individual struct subfields can be selected using dot notation:
SELECT metadata.publisher, metadata.year FROM books LIMIT 10;
To retrieve multiple subfields, list each one explicitly — selecting a whole struct column is not supported:
-- not supported
SELECT metadata FROM books;

Type System

TopK-native literals use ::topk_type casts (preferred in INSERT/UPDATE VALUES and search-function arguments) or equivalent constructor calls. PostgreSQL wire-type casts (::float8, …) apply only in SELECT projection.
ConstructorExample
f32_vector(ARRAY[…])'[0.1, 0.2, 0.3]'::f32_vector or f32_vector(ARRAY[0.1, 0.2, 0.3])
f16_vector(ARRAY[…])(same pattern)
f8_vector(ARRAY[…])(same pattern)
u8_vector(ARRAY[…])(same pattern)
i8_vector(ARRAY[…])(same pattern)
binary_vector(ARRAY[…])(same pattern)
f32_sparse_vector(ARRAY[idx], ARRAY[val])f32_sparse_vector(ARRAY[0, 2], ARRAY[1.0, 0.5]) or '{"0":1.0,"2":0.5}'::f32_sparse_vector or '{"indices":[0,2],"values":[1.0,0.5]}'::f32_sparse_vector
f16_sparse_vector(…)(same pattern)
f8_sparse_vector(…)(same pattern)
u8_sparse_vector(…)(same pattern)
i8_sparse_vector(…)(same pattern)
f32_matrix(ARRAY[ARRAY[row1],…])'[[1.0, 0.0], [0.5, 0.5]]'::f32_matrix
f16_matrix(…)(same pattern)
f8_matrix(…)(same pattern)
u8_matrix(…)(same pattern)
i8_matrix(…)(same pattern)
bytes('hexstring')bytes('deadbeef')
struct(key1, val1, …)struct('publisher', 'Penguin', 'pages', 320)
ARRAY[elem, …]ARRAY['classic', 'fiction'] (list)

Complex types → JSON

Vector, matrix, sparse, and list types are returned as JSON when selected. Indexed vector fields cannot be projected directly — use scoring functions to compute scores from them. The table below documents the wire format for each type:
TopK typeJSON wire representation
Dense vector[0.1, 0.2, 0.3]
Sparse vector{"indices":[0,2],"values":[1.0,0.5]} or {"0":1.0,"2":0.5}
Matrix (multi-vector)[[1.0,0.0],[0.5,0.1]] (row-major)
List["a","b"] / [1,2,3]
Struct{"publisher":"Scribner","pages":180}
Binary\xdeadbeef

Output type mapping

pgwire maps SELECT-list expressions to PostgreSQL OIDs. Explicit ::cast in the projection list overrides inference. Casts are stripped from the query plan — they only affect the wire type.
Expressionpg OID
::bool16 BOOL
::smallint / ::int221 INT2
::int / ::int423 INT4
::bigint / ::int820 INT8
::real / ::float4700 FLOAT4
::float / ::float8701 FLOAT8
::text25 TEXT
::bytea17 BYTEA
::json / ::jsonb114 JSON
plain column (no cast)114 JSON
search function (no cast)700 FLOAT4
COUNT(*) (no cast)20 INT8