Reranking

Vector search or keyword-based functions (like BM25 or ANN) can fetch the right ballpark of results quickly, but the order might not reflect true relevance. Reranking is used to improve relevance and accuracy. TopK comes with a built-in reranking functionality using the rerank() function.

`rerank()`

The rerank() function is called on the query instance and accepts the following parameters:

model

string

default:"cohere/rerank-v3.5"

The model to use for reranking. Currently, only cohere/rerank-v3.5 is supported.

query

string

The query text to rerank against. Uses arguments from semantic_similarity() function if not specified.

fields

string[]

List of fields to use for reranking. Uses fields from semantic_similarity() function if not specified.

topk_multiple

Multiple of top-k to rerank. For example, if topk=10 and topk_multiple=2, reranker takes 20 results from the original query and returns the top 10 results.

.rerank()

# or

.rerank(
    model="cohere/rerank-v3.5",
    query="catcher",
    fields=["title", "description"],
    topk_multiple=2
)

Lexical scoring with reranking

Consider the following example for searching documents with the "how to reset a router" query:

client.collections().create(
    "documents",
    schema={
        "title": text().index(semantic_index()).required(),
        "content": text().index(semantic_index()).required(),
    }
)

client.collection(name).upsert([
    {
        "_id": "1",
        "title": "How to reset router settings to fix intermittent connectivity",
        "content": "To fix intermittent connectivity issues related to your device, you need to unplug your modem or router for 10 seconds, then plug them back in. This will cause a reset of the network hardware and often resolves temporary disruptions.",
    },
    {
        "_id": "2",
        "title": "How to reset your device",
        "content": "Factory reset procedures vary by device. Press and hold the reset button for 10 seconds. This clears all settings, returning the router to default configuration.",
    },
])

client.collection(name).query(
    select(
        "title",
        "content",
        title_score=fn.bm25_score()
    )
    .filter(match("how to reset a router", field="title"))
    .topk(field("title_score"), 10)
)

# Results:

[
    {
        "_id": "1",
        "title": "How to reset router settings to fix intermittent connectivity",
        "title_score": 0.942161500453949,
        "content": "To fix intermittent connectivity issues related to your device, you need to unplug your modem or router for 10 seconds, then plug them back in. This will cause a reset of the network hardware and often resolves temporary disruptions.",
    },
    {
        "_id": "2",
        "title": "How to reset your device",
        "title_score": 0.4156554341316223,
        "content": "Factory reset procedures vary by device. Press and hold the reset button for 10 seconds. This clears all settings, returning the router to default configuration.",
    }
]

Lexical scoring methods like BM25 rely on exact term overlap. Here’s how they score: Document "1" is ranked higher because:

It contains "how", "to", "reset", "router" (some of which match the query directly).
Phrases like "how to reset router" appear close together.

Document "2" is ranked lower because:

Although it describes a true factory reset, it doesn’t include the exact phrase "how to reset a router" but includes "how to reset your device".
The term "router" might not appear (assume it uses "device").

Lexical scoring is useful for keyword-based search, but it doesn’t capture the true user intent - a guide on how to reset a router.

using rerank() to boost relevant results

Let’s apply the rerank() passing the "how to reset a router" query matching against the "content" field to rerank the results:

client.collection(name).query(
    select(
        "title",
        "content",
        title_score=fn.bm25_score(),
    )
    .filter(match("how to reset a router", field="title"))
    .topk(field("title_score"), 10)
    .rerank(
        fields=["content"],
        query="how to reset a router"
    )
)

# Results:

[
    {
        "_id": "2",
        "title": "How to reset your device",
        "title_score": 0.4156554341316223,
        "content": "Factory reset procedures vary by device. Press and hold the reset button for 10 seconds. This clears all settings, returning the router to default configuration.",
        "_rerank_score": 0.8374209403991699
    },
    {
        "_id": "1",
        "title": "How to reset router settings to fix intermittent connectivity",
        "title_score": 0.942161500453949,
        "content": "To fix intermittent connectivity issues related to your device, you need to unplug your modem or router for 10 seconds, then plug them back in. This will cause a reset of the network hardware and often resolves temporary disruptions.",
        "_rerank_score": 0.6334335207939148
    }
]

As you can see, using the rerank() function, the relevance of the results is improved - the actual guide on how to reset a router is ranked higher, even though keyword "router" does not appear in the "title" field.

Get Started

Guides

Document API

Collection API

`rerank()`

Lexical scoring with reranking

using rerank() to boost relevant results

Get Started

Guides

Document API

Collection API

​rerank()

​Lexical scoring with reranking

​using rerank() to boost relevant results

`rerank()`

Lexical scoring with reranking

using rerank() to boost relevant results