
fix dim() (embedding dimensions) method for qdrant cloud dense encoders#3079

Open
shanbady wants to merge 14 commits into main from shanbady/cloud-dense-encoder-dense-fix

Conversation

@shanbady
Contributor

What are the relevant tickets?

Closes https://github.com/mitodl/hq/issues/10629

Description (What does it do?)

This PR resolves a bug that occurs when attempting to use the new Qdrant Cloud encoder with a dense model.

How can this be tested?

  1. Check out the shanbady/qdrant-upgrade branch.
  2. Set settings.QDRANT_DENSE_MODEL = "openai/text-embedding-3-small" and settings.QDRANT_ENCODER = "vector_search.encoders.qdrant_cloud.QdrantCloudEncoder".
  3. Run the following and see it fail:

```python
from vector_search.utils import dense_encoder

encoder = dense_encoder()
encoder.dim()
```

  4. Check out this branch, re-run the above, and see it succeed.

Additional Context

We have the option to use the cloud encoder for OpenAI embeddings once we have migrated, but we will keep the legacy encoder for now until we have a chance to fully test it (it seems to be working without issues, AFAIK).

shanbady and others added 9 commits March 2, 2026 13:39
* unify key generation for point ids

* fix tests

* adding platform to vector key

* fix tests

* fixing other methods requiring point key

* fix point key

* fixing test

* account for platform=None
* adding sparse encoder util

* adding sparse encoder setting

* add sparse enc

* adding sparse hash encoder

* adding scikit-learn

* fix sparse encoder

* fix topic embedding

* fix default vectorizer name

* adding cloud inference capability

* adding openai api key to options dict

* fix limits

* docstring updates

* adding test

* some optimizations

* fixing limit for prefetch queries

* hide hybrid search behind posthog feature flag

* scale prefetch with offset

* fix yield return

* fix sparse hash threshold calculation

* switching hybrid search to be a url param

* remove search params from groupby

* adding cache decorator to sparse encoder

* fix test

* fix test

* add default encoding name

* fix tests

* fix stop_words param

* adding test for hybrid flag and group_by

* pinning tokenizer to None for tests

* fix sparse embedding when searching
@shanbady shanbady marked this pull request as ready for review March 23, 2026 18:16
@shanbady shanbady changed the title fixing dim() method for qdrant cloud encoder fix dim() (embedding dimensions) method for qdrant cloud dense encoders Mar 24, 2026
Base automatically changed from shanbady/qdrant-upgrade to main March 25, 2026 13:30
@mbertrand mbertrand self-assigned this Mar 25, 2026
Copilot AI review requested due to automatic review settings March 26, 2026 20:05
Contributor

Copilot AI left a comment


Pull request overview

Fixes a runtime error when using the Qdrant Cloud encoder with dense embedding models by making dim() return the correct embedding vector size (needed for Qdrant collection configuration).

Changes:

  • Update QdrantCloudEncoder to compute embedding dimensions via litellm.get_model_info(...).
  • Adjust tiktoken model lookup to use the encoder’s model_short_name() (supports provider-prefixed model names like openai/...).
  • Minor whitespace tweak in vector_search().

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| vector_search/utils.py | Minor formatting change (blank line) in the vector_search() flow. |
| vector_search/encoders/qdrant_cloud.py | Fixes the model token encoding lookup and adds a dim() implementation for Qdrant Cloud dense encoders. |
Comments suppressed due to low confidence (1)

vector_search/encoders/qdrant_cloud.py:55

  • Add a unit test for QdrantCloudEncoder.dim() that mocks litellm.get_model_info and verifies it returns the expected embedding dimension (especially for provider-prefixed model names like openai/text-embedding-3-small). This will prevent regressions in collection creation where encoder_dense.dim() is required.
    def dim(self):
        """
        Return the dimension of the embeddings
        """
        info = litellm.get_model_info(self.model_short_name())
        return info["output_vector_size"]

shanbady and others added 3 commits March 26, 2026 17:24
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Member

@mbertrand mbertrand left a comment


👍

Comment on lines +52 to +79
```python
def dim(self):
    """
    Return the dimension of the embeddings
    """
    info = litellm.get_model_info(self.model_short_name())
    if not isinstance(info, dict):
        msg = (
            f"Could not determine embedding dimension: litellm.get_model_info("
            f"{self.model_short_name()!r}) returned {type(info).__name__}, "
            "expected a dict with an 'output_vector_size' field."
        )
        raise TypeError(msg)
    if "output_vector_size" not in info:
        msg = (
            "Could not determine embedding dimension: 'output_vector_size' "
            f"missing from litellm.get_model_info({self.model_short_name()!r}) "
            "response."
        )
        raise ValueError(msg)
    dim = info["output_vector_size"]
    if not isinstance(dim, int):
        msg = (
            "Could not determine embedding dimension: 'output_vector_size' "
            f"from litellm.get_model_info({self.model_short_name()!r}) is of "
            f"type {type(dim).__name__}, expected int."
        )
        raise TypeError(msg)
    return dim
```
Member


Would be good to have a parametrized unit test for this function but otherwise everything works, LGTM
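Such a parametrized test could look like the sketch below. It uses a minimal stand-in class and a stubbed lookup table in place of the repo's actual QdrantCloudEncoder and litellm.get_model_info, so the class name, constructor, and lookup here are illustrative assumptions (the embedding sizes match OpenAI's published dimensions for these models):

```python
# Sketch of a parametrized dim() test. CloudEncoderStub and get_model_info
# are stand-ins for QdrantCloudEncoder and litellm.get_model_info.


class CloudEncoderStub:
    """Minimal stand-in mirroring the dim() logic under review."""

    def __init__(self, model_name):
        self.model_name = model_name

    def model_short_name(self):
        # Strip a provider prefix like "openai/", as the real encoder does.
        return self.model_name.split("/", 1)[-1]

    def dim(self):
        """Return the embedding dimension reported for the model."""
        info = get_model_info(self.model_short_name())
        return info["output_vector_size"]


def get_model_info(model):
    """Stub lookup table standing in for litellm.get_model_info."""
    return {
        "text-embedding-3-small": {"output_vector_size": 1536},
        "text-embedding-3-large": {"output_vector_size": 3072},
    }[model]


# In the repo this loop would be a pytest.mark.parametrize over the cases,
# with mock.patch applied to litellm.get_model_info instead of a stub.
for model, expected in [
    ("openai/text-embedding-3-small", 1536),
    ("openai/text-embedding-3-large", 3072),
]:
    assert CloudEncoderStub(model).dim() == expected
```

In the real test module, one would likely patch litellm.get_model_info where the encoder imports it and also assert that it was called with the provider-stripped model_short_name().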


3 participants