
fix: restore throughput timeline hydration and Gemini timeouts #122

Open: RobertTLange wants to merge 18 commits into main from conc_reset

RobertTLange commented Apr 12, 2026

What changed

  • centralize LLM timeout and retry policy across providers, and convert Gemini client timeouts to the millisecond unit expected by google-genai
  • restore async runner slot reservation and completed-job persistence safety, with added regression coverage for duplicate persistence and slot reuse edge cases
  • stop persisting sampling/evaluation worker lane ids and fall back to timestamp-based throughput lane inference for runtime views
  • disable browser caching for the local WebUI HTML shells and fix Throughput tab hydration so right_tab=throughput restores reliably after data loads
  • improve the Throughput runtime timeline UI with adaptive per-worker row sizing and a default-visible Hide Plot / Show Plot toggle
  • improve the local WebUI UX by adding dashboard sorting controls and surfacing the active results directory in the meta header
  • fix the compare view so negative best scores remain visible instead of being clamped away by summary rendering
  • update the unreleased changelog to cover the latest runtime and WebUI changes
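A centralized timeout and retry policy could look roughly like the sketch below, which keeps every duration in seconds internally and converts to milliseconds only at the Gemini boundary. `LLMRequestPolicy`, its fields, and the exponential backoff formula are hypothetical illustrations, not the project's actual API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMRequestPolicy:
    """Hypothetical shared timeout/retry policy; all durations are in seconds."""

    timeout_s: float = 60.0
    max_retries: int = 3
    backoff_base_s: float = 1.0
    backoff_max_s: float = 30.0

    def __post_init__(self) -> None:
        # Reject values that would make every request fail immediately.
        if self.timeout_s <= 0:
            raise ValueError("timeout_s must be positive")
        if self.max_retries < 0:
            raise ValueError("max_retries must be non-negative")

    def backoff_s(self, attempt: int) -> float:
        """Exponential backoff before retry number `attempt` (0-based), capped."""
        return min(self.backoff_max_s, self.backoff_base_s * (2**attempt))

    def gemini_timeout_ms(self) -> int:
        """google-genai expects milliseconds, so convert once, here."""
        # round() avoids float artifacts such as 0.07 * 1000 -> 70.00000000000001.
        return round(self.timeout_s * 1000)
```

With google-genai, the converted value would typically be passed as `types.HttpOptions(timeout=policy.gemini_timeout_ms())` when constructing `genai.Client`. Keeping the unit conversion in a single method lets other providers keep consuming `timeout_s` in seconds.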
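The PR does not describe the async runner internals. As a minimal asyncio sketch of the two invariants it names (a reserved slot is always released, and a completed job is persisted exactly once even when completions overlap), consider the following. `SlotTracker`, `run`, and the in-memory `saved` list are hypothetical stand-ins for the real runner and results store:

```python
import asyncio
from typing import Awaitable, Callable


class SlotTracker:
    """Hypothetical sketch: bounded worker slots plus persist-once bookkeeping."""

    def __init__(self, max_slots: int) -> None:
        self._slots = asyncio.Semaphore(max_slots)
        self._persisted: set[str] = set()
        self._lock = asyncio.Lock()
        self.saved: list[str] = []  # stands in for the real results store

    async def run(self, job_id: str, work: Callable[[], Awaitable[object]]) -> None:
        # The slot is held only while `work` runs and is released by the
        # context manager even if `work` raises, so failed attempts and
        # retries cannot leak capacity.
        async with self._slots:
            result = await work()
        await self._persist_once(job_id, result)

    async def _persist_once(self, job_id: str, result: object) -> None:
        # The lock keeps check-and-write atomic even if a real write awaits I/O,
        # so overlapping completions for the same job id are saved only once.
        async with self._lock:
            if job_id in self._persisted:
                return
            self._persisted.add(job_id)
            self.saved.append(f"{job_id}:{result}")
```

Releasing the slot before persisting keeps slow writes from blocking new work; whether the real runner orders these steps the same way is an assumption.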

Why

  • Gemini requests were timing out far too early: google-genai interprets its timeout value in milliseconds, so the shared timeout (specified in seconds) was effectively read as milliseconds
  • async runs were vulnerable to slot-accounting and duplicate-persistence regressions under retries and overlapping completions
  • the throughput runtime timeline was both fragile and hard to read: hydration could fail after load, worker lanes could regress, and dense runs compressed multiple workers into unreadable rows
  • the local WebUI still had a few workflow rough edges around stale cached JS, result discovery, dashboard scanning, and compare summaries
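The PR does not spell out how timestamp-based lane inference works. One plausible approach, sketched here under that assumption, is greedy interval partitioning: sort jobs by start timestamp and reuse the lowest-numbered lane whose previous job has already ended. `Job` and `infer_lanes` are hypothetical names:

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str
    start: float  # start timestamp in seconds
    end: float  # end timestamp in seconds


def infer_lanes(jobs: list[Job]) -> dict[str, int]:
    """Assign each job a display lane so jobs sharing a lane never overlap."""
    free_lanes: list[int] = []  # min-heap of lane ids currently idle
    busy: list[tuple[float, int]] = []  # min-heap of (end_time, lane_id)
    next_lane = 0
    assignment: dict[str, int] = {}
    for job in sorted(jobs, key=lambda j: (j.start, j.end)):
        # A job ending exactly when another starts does not count as overlap.
        while busy and busy[0][0] <= job.start:
            _, lane = heapq.heappop(busy)
            heapq.heappush(free_lanes, lane)
        if free_lanes:
            lane = heapq.heappop(free_lanes)
        else:
            lane = next_lane
            next_lane += 1
        assignment[job.job_id] = lane
        heapq.heappush(busy, (job.end, lane))
    return assignment
```

Because the greedy pass opens a new lane only when every existing lane is busy, it uses exactly as many lanes as the peak concurrency in the timestamps, which is one way to get stable rows without persisting worker lane ids.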
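The PR does not say how browser caching is disabled for the HTML shells. Assuming the local WebUI server is built on the standard library's `http.server`, one option is to attach no-cache headers only to HTML responses, so the shell is always refetched rather than served from the browser cache while other assets keep default caching. `NO_CACHE_HEADERS`, `cache_headers_for`, and `NoCacheHTMLHandler` are hypothetical names:

```python
from http.server import SimpleHTTPRequestHandler

# Headers that tell the browser not to reuse a cached copy of the response.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-store, no-cache, must-revalidate, max-age=0",
    "Pragma": "no-cache",
    "Expires": "0",
}


def cache_headers_for(path: str) -> dict[str, str]:
    """Return extra headers for a request path; only HTML shells are uncached."""
    clean = path.split("?", 1)[0]
    if clean == "/" or clean.endswith(".html"):
        return dict(NO_CACHE_HEADERS)
    return {}


class NoCacheHTMLHandler(SimpleHTTPRequestHandler):
    def end_headers(self) -> None:
        # `path` may be unset if the request line failed to parse.
        for name, value in cache_headers_for(getattr(self, "path", "")).items():
            self.send_header(name, value)
        super().end_headers()
```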

How to test

  • pytest -q tests/test_llm_client_backends.py tests/test_llm_retry_constants.py tests/test_gemini_provider.py
  • pytest -q tests/test_async_runner_recovery.py tests/test_pipeline_timing.py tests/test_runtime_timeline_webui.py
  • pytest -q tests/test_compare_webui.py tests/test_index_webui.py
  • uv run ruff check tests --exclude tests/file.py
  • uv run mypy --follow-imports=skip --ignore-missing-imports tests/test_*.py tests/conftest.py
  • uv run --with pytest-cov pytest -q -m "not requires_secrets" --cov=shinka --cov-report=term-missing --cov-report=xml:coverage.xml

Context

  • the branch now combines backend safety fixes, timeout normalization, and a set of local WebUI quality-of-life fixes needed to make throughput analysis and local monitoring reliable again
  • commits are kept split by concern so reviewers can inspect runtime safety, WebUI behavior, and changelog/docs updates independently
