vector_distance() or semantic_similarity() or custom properties computed inside select()
k results based on the provided logical expression
The select() function is used to initialize the select stage of a query. It accepts key-value pairs of field names and field expressions:
Use the field() function to select fields from a document. In the select stage, you can also rename existing fields or define computed fields using function expressions.
vector_distance(field, vector): Computes distance between vectors for vector search. This function is available for all dense and sparse vector types.
bm25_score(): Calculates relevance scores using the BM25 algorithm for keyword search.
semantic_similarity(field, query): Measures semantic similarity between the provided text query and the field’s embedding.
The vector_distance() function is used to compute the vector score between a query vector and a vector field in a collection.
There are multiple ways to represent a query vector:
[0.1, 0.2, 0.3, ...] - Array of numbers resolved as a dense float32 vector
f32_vector([...]) - Helper function returning a dense float32 vector
u8_vector([...]) - Helper function returning a dense u8 vector
binary_vector([...]) - Helper function returning a binary vector
{ 0: 0.1, 1: 0.2, 2: 0.3, ... } - Mapping from index → value resolved as a sparse float32 vector
f32_sparse_vector({ ... }) - Helper function returning a sparse float32 vector
u8_sparse_vector({ ... }) - Helper function returning a sparse u8 vector
Pass skip_refine=True to bypass the internal distance refinement step. This improves performance for queries with a large top_k at the cost of lower accuracy.
Avoid skip_refine=True unless you’re using a large top_k and a custom reranking model to get the final ranking.
To use the vector_distance() function, you must have a vector index defined on the field you’re computing the vector distance against.
To use fn.bm25_score() in your query, you must include a match predicate in your filter stage.
To use the fn.bm25_score() function, you must have a keyword index defined in your collection schema.
The semantic_similarity() function is used to compute the similarity between a text query and a text field in a collection.
To use the semantic_similarity() function, you must have a semantic index defined on the field you’re computing the similarity on.
select()
(e.g. vector similarity or BM25 score) and more.
Filter expressions support all
The match() function is the backbone of keyword search in TopK. It allows you to search for documents that contain specific keywords or phrases.
You can configure the match() function to:
The match() function accepts the following parameters:
Use the all parameter when a text must contain all terms (separated by a delimiter).
When all is false (default), it is equivalent to the OR operator.
When all is true, it is equivalent to the AND operator.
Searching for "catcher" in your documents is as simple as using the match() function in the filter stage of your query:
The match() function can be configured to match all terms when using a delimiter. A term delimiter is any non-alphanumeric character.
To ensure that all terms are matched, use the all parameter:
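The any-term versus all-terms behavior can be modeled in plain Python. This is only an illustrative sketch of the semantics, not TopK's implementation: the helpers split_terms and matches are hypothetical names, require_all stands in for the all parameter, and lowercasing is an assumption about how terms are normalized.

```python
import re


def split_terms(text: str) -> list[str]:
    """Split text into terms on any run of non-alphanumeric characters."""
    # Lowercasing is an assumption made for this sketch only.
    return [term for term in re.split(r"[^0-9A-Za-z]+", text.lower()) if term]


def matches(query: str, document_text: str, require_all: bool = False) -> bool:
    """Return True if the document contains any (or all) of the query terms."""
    query_terms = split_terms(query)
    document_terms = set(split_terms(document_text))
    if require_all:
        # Equivalent to joining the terms with AND.
        return all(term in document_terms for term in query_terms)
    # Default: equivalent to joining the terms with OR.
    return any(term in document_terms for term in query_terms)
```

With this model, the query "catcher rye" matches a text containing only "catcher" by default, but no longer matches once require_all is set.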
weight
parameter:
"catcher"
and were published in 1997, or documents that were published between 1920 and 1980.
The and operator can be used to combine multiple logical expressions.
The or operator can be used to combine multiple logical expressions.
The not helper can be used to negate a logical expression. It takes an expression as an argument and inverts its logic.
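As a rough illustration of how and, or and not compose, the sketch below evaluates the same kind of condition over an in-memory list using Python's own boolean operators. It is not TopK query syntax: the sample documents and the field names title and published_year are hypothetical, and the year range is assumed to be inclusive.

```python
# Hypothetical sample documents; field names are assumptions for this sketch.
books = [
    {"title": "The Catcher in the Rye", "published_year": 1951},
    {"title": "Catcher Reissue", "published_year": 1997},
    {"title": "Modern Novel", "published_year": 2005},
]


def contains_catcher(doc: dict) -> bool:
    return "catcher" in doc["title"].lower()


def selected(doc: dict) -> bool:
    # (contains "catcher" AND published in 1997) OR (published 1920..1980)
    return (contains_catcher(doc) and doc["published_year"] == 1997) or (
        1920 <= doc["published_year"] <= 1980
    )


selected_titles = [doc["title"] for doc in books if selected(doc)]
# Negating the whole expression returns the complement of the selection.
rejected_titles = [doc["title"] for doc in books if not selected(doc)]
```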
The choose operator evaluates a condition and returns the first argument if the condition is true, else the second argument.
The boost operator multiplies the scoring expression by the provided boost value if the condition is true. Otherwise, the scoring expression is unchanged (multiplied by 1).
The coalesce operator replaces null values with a provided value.
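The behavior of choose, boost and coalesce can be written as three small Python functions. This is a sketch of the semantics only: the function signatures and argument order are assumptions made for illustration and may differ from the actual TopK operators.

```python
from typing import Optional


def choose(condition: bool, if_true: float, if_false: float) -> float:
    """Return the first value when the condition holds, else the second."""
    return if_true if condition else if_false


def boost(score: float, condition: bool, boost_value: float) -> float:
    """Multiply the score by boost_value when the condition holds, else by 1."""
    return score * (boost_value if condition else 1.0)


def coalesce(value: Optional[float], default: float) -> float:
    """Replace a null (None) value with the provided default."""
    return default if value is None else value
```

For example, boost(0.5, True, 2.0) doubles the score, while boost(0.5, False, 2.0) leaves it at 0.5.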
The eq operator can be used to match documents that have a field with a specific value.
The ne operator can be used to match documents that have a field with a value that is not equal to a specific value.
The is_null operator can be used to match documents that have a field with a value that is null.
The is_not_null operator can be used to match documents that have a field with a value that is not null.
The gt operator can be used to match documents that have a field with a value greater than a specific value.
The gte operator can be used to match documents that have a field with a value greater than or equal to a specific value.
The lt operator can be used to match documents that have a field with a value less than a specific value.
The lte operator can be used to match documents that have a field with a value less than or equal to a specific value.
The starts_with operator can be used on string fields to match documents that start with a given prefix. This is especially useful in multi-tenant applications where document IDs can be structured as {tenant_id}/{document_id} and starts_with can then be used to scope the query to a specific tenant.
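The tenant-scoping idea can be sketched with Python's str.startswith. The helper scope_to_tenant and the sample IDs are hypothetical; the point is that the prefix should include the trailing slash so that one tenant ID cannot match another tenant whose ID merely begins with the same characters.

```python
def scope_to_tenant(document_ids: list[str], tenant_id: str) -> list[str]:
    """Keep only IDs of the form {tenant_id}/{document_id} for one tenant."""
    # Including the "/" separator prevents "acme" from matching "acme-labs/...".
    prefix = f"{tenant_id}/"
    return [doc_id for doc_id in document_ids if doc_id.startswith(prefix)]


ids = ["acme/doc-1", "acme/doc-2", "acme-labs/doc-1", "globex/doc-9"]
acme_ids = scope_to_tenant(ids, "acme")
```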
The contains operator can be used on string fields to match documents that include a specific substring. It is case-sensitive and is particularly useful in scenarios where you need to filter results based on a portion of a string.
The match_all operator returns true if all terms in the query are present in the field with a keyword index.
When using the match_all operator against a text field, it must be used in conjunction with a keyword index defined in your collection schema.
The match_any operator returns true if any term in the query is present in the field with a keyword index.
When using the match_any operator against a text field, it must be used in conjunction with a keyword index defined in your collection schema.
The add operator can be used to add two numbers.
The sub operator can be used to subtract two numbers.
The mul operator can be used to multiply two numbers.
The div operator can be used to divide two numbers.
The abs operator returns the absolute value of a number, which is useful for calculating distances or differences.
The min operator returns the smaller of two values, commonly used for clamping or setting upper bounds. It can work with both scalar values and other fields or expressions.
The max operator returns the larger of two values, commonly used for clamping or setting lower bounds. It can work with both scalar values and other fields or expressions.
The ln operator calculates the natural logarithm, useful for logarithmic scaling and dampening large values.
The exp operator calculates the exponential function (e^x), useful for exponential scaling and boosting.
The sqrt operator calculates the square root, useful for dampening values and creating non-linear transformations.
The square operator multiplies a number by itself (x²), useful for amplifying differences and creating quadratic transformations.
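To make the typical uses of these operators concrete, here is a small sketch using Python's math module in place of query expressions. The helper names and the idea of a view_count field are hypothetical; the sketch only shows the arithmetic, not how such expressions are written in a TopK query.

```python
import math


def dampened_popularity(view_count: float) -> float:
    """Logarithmic dampening: large counts grow the score only slowly."""
    # max(...) sets a lower bound of 0 so the logarithm is always defined.
    return math.log(1.0 + max(view_count, 0.0))


def clamp(value: float, lower: float, upper: float) -> float:
    """min sets the upper bound, max sets the lower bound."""
    return max(lower, min(value, upper))


def year_distance(published_year: int, target_year: int) -> int:
    """abs turns a signed difference into a distance."""
    return abs(published_year - target_year)
```

A view count of 1000 is dampened to roughly 6.9, so a document with a thousand times more views does not dominate the ranking by a factor of a thousand.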
topk()
and count()
collectors.
Use the topk() function to return the top k results. The topk() function accepts the following parameters:
title_similarity
, you can use the following query:
Use the count() function to get the total number of documents matching the query. If there are no filters, count() will return the total number of documents in the collection.
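The two collectors can be modeled over an in-memory list as follows. This is a sketch of the behavior, not the TopK API: the signatures, the asc flag and the sample documents are assumptions, and the title_similarity field simply reuses the name from the example above.

```python
import heapq
from typing import Callable, Optional


def topk(documents: list[dict], score_field: str, k: int, asc: bool = False) -> list[dict]:
    """Return the k documents with the highest (or lowest, if asc) score."""
    pick = heapq.nsmallest if asc else heapq.nlargest
    return pick(k, documents, key=lambda doc: doc[score_field])


def count(documents: list[dict], predicate: Optional[Callable[[dict], bool]] = None) -> int:
    """Count matching documents; with no filter, count the whole collection."""
    if predicate is None:
        return len(documents)
    return sum(1 for doc in documents if predicate(doc))


docs = [
    {"title": "A", "title_similarity": 0.2},
    {"title": "B", "title_similarity": 0.9},
    {"title": "C", "title_similarity": 0.5},
]
top_two = [doc["title"] for doc in topk(docs, "title_similarity", 2)]
```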
topk or count function at the end.
The rerank() function is used to rerank the results of a query. Read more about it in our reranking guide.
When you perform a write operation (e.g. upsert), you receive an LSN as a string that represents the sequence number of that write in the system’s log.
You can use this LSN in subsequent queries to ensure that the query only returns results that are at least as recent as that write operation.
Calling lsn = client.collection().upsert() returns an LSN, which you can then pass to client.collection().query(..., lsn=lsn).
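The read-after-write guarantee can be illustrated with a toy in-memory model. ToyStore is entirely hypothetical and is not how TopK is implemented: it assumes writes are appended to a log and applied later, and it catches up synchronously where a real system would presumably wait until the requested LSN has been applied.

```python
from typing import Optional


class ToyStore:
    """Hypothetical store whose writes become visible only once applied."""

    def __init__(self) -> None:
        self.pending: list[tuple[int, str, dict]] = []  # (lsn, doc_id, doc)
        self.docs: dict[str, dict] = {}
        self.applied_lsn = 0
        self._next_lsn = 1

    def upsert(self, doc_id: str, doc: dict) -> str:
        """Append the write to the log and return its LSN as a string."""
        lsn = self._next_lsn
        self._next_lsn += 1
        self.pending.append((lsn, doc_id, doc))
        return str(lsn)

    def _apply_up_to(self, target_lsn: int) -> None:
        # Apply pending writes in log order until the target LSN is reached.
        while self.pending and self.pending[0][0] <= target_lsn:
            lsn, doc_id, doc = self.pending.pop(0)
            self.docs[doc_id] = doc
            self.applied_lsn = lsn

    def query(self, lsn: Optional[str] = None) -> dict[str, dict]:
        """Without an LSN the read may be stale; with one it reflects that write."""
        if lsn is not None and self.applied_lsn < int(lsn):
            self._apply_up_to(int(lsn))
        return dict(self.docs)
```

In this model a query issued right after an upsert may miss the new document, while the same query with lsn=lsn is guaranteed to include it.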