fix(testing): Fix Kyutai Speech-To-Text, LLaVA-OneVision, and LongCatFlash test failures on main CI by harshaljanjani · Pull Request #44695 · huggingface/transformers

harshaljanjani · 2026-03-14T09:05:35Z

What does this PR do?

The following failing tests were identified and fixed in this PR:

→ Kyutai Speech-To-Text: The PR [processors] Unbloating simple processors, refactored ProcessorMixin.call to use explicit keyword-only params instead of accepting positional arguments; but the KyutaiSTT integration tests were still calling processor(samples) positionally; the audio samples in the current state mapped to the images param.
→ LLaVA-OneVision: The PR Load a tiny video to make CI faster introduced local video file path mappings. LlavaOnevision's setUpClass was still building paths to Big_Buck_Bunny_720_10s_10MB.mp4 and sample_demo_1.mp4 in the repo root.
→ LongCatFlash: The PR [V5] Return a BatchEncoding dict from apply_chat_template by default again changed apply_chat_template to return BatchEncoding dict instead of a tensor. The test was passing this dict directly to model.generate and tried to access .shape on the dict; this fixes that :)

Note: The test still fails with an AssertionError, I'm not too sure and it could be flaky, but the crash should be resolved :)

cc: @Rocketknight1 @zucchini-nlp

CI Failures

Before the fix (feel free to cross-check; these errors are reproducible):

After the fix (feel free to cross-check):

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you fix any necessary existing tests?

… main

github-actions · 2026-03-14T09:06:46Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: kyutai_speech_to_text, llava_onevision, longcat_flash

harshaljanjani · 2026-03-18T10:01:50Z

cc: @Rocketknight1 @zucchini-nlp Just a gentle ping :)

zucchini-nlp

left one comment, otherwise lgmt

zucchini-nlp · 2026-03-18T10:47:29Z

tests/models/llava_onevision/test_processing_llava_onevision.py

-        local_videos = [
-            os.path.join(repo_root, "Big_Buck_Bunny_720_10s_10MB.mp4"),
-            os.path.join(repo_root, "sample_demo_1.mp4"),
-        ]


we also load local images above from the same root path. IIRC we made sure these artifacts are cached when loading from hub so I dont know why they are being created here

cc @ydshieh for this

++, I didn't notice this previously but now that I've read into it a bit more, I guess even this image creation block isn't needed either for the same reason (ref: 05c0e1d); just the local_tiny_video logic staying intact should suffice but I'd love to know if I'm missing something.

fix: Fix KyutaiSTT, LlavaOnevision, and LongcatFlash test failures on…

28557c3

… main

harshaljanjani marked this pull request as ready for review March 14, 2026 09:12

github-actions bot requested a review from ydshieh March 14, 2026 09:12

zucchini-nlp reviewed Mar 18, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(testing): Fix Kyutai Speech-To-Text, LLaVA-OneVision, and LongCatFlash test failures on main CI #44695

fix(testing): Fix Kyutai Speech-To-Text, LLaVA-OneVision, and LongCatFlash test failures on main CI #44695
harshaljanjani wants to merge 1 commit intohuggingface:mainfrom
harshaljanjani:fix/kyutai-llava-longcat-test-failures

harshaljanjani commented Mar 14, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 14, 2026

Uh oh!

harshaljanjani commented Mar 18, 2026

Uh oh!

zucchini-nlp left a comment

Uh oh!

zucchini-nlp Mar 18, 2026

Uh oh!

harshaljanjani Mar 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

harshaljanjani commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Before submitting

Uh oh!

github-actions bot commented Mar 14, 2026

Uh oh!

harshaljanjani commented Mar 18, 2026

Uh oh!

zucchini-nlp left a comment

Choose a reason for hiding this comment

Uh oh!

zucchini-nlp Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

harshaljanjani Mar 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

harshaljanjani commented Mar 14, 2026 •

edited

Loading