fix: set PostgreSQL serial sequence MINVALUE and enable CYCLE (#12505)
RogerHYang wants to merge 13 commits into main from
Conversation
#11944)

* feat(playground): add experiment record toggle with ephemeral experiment tracking

  Replace the separate experimentId/experimentIsEphemeral fields on PlaygroundInstance with a single PlaygroundInstanceExperiment object and a dedicated setInstanceExperiment action. The record toggle swaps with a running indicator during execution, and ephemeral experiments are filtered from the "View Experiment" link.
* refactor: simplify experiment tracking in playground callbacks
  - Remove redundant ?? undefined in experiment ID derivation
  - Replace filter().map() with flatMap to eliminate non-null assertion
  - Read recordExperiments from the store snapshot instead of closing over it to avoid unnecessary subscription re-creation
  - Fix missing deps in onCompleted callback
* fix: format TypeScript files to pass CI
* fix: resolve lint errors for unused updateInstance and missing setInstanceExperiment dep
* fix: include ephemeral experiments in compare details query and guard against undefined

  When Record is OFF, experiments are created as ephemeral on the backend. The ExperimentCompareDetailsQuery was not passing includeEphemeral: true, so ephemeral experiments were excluded from results, causing experimentsById to be missing the entry and crashing with "Cannot read properties of undefined (reading 'repetitions')". Also adds a null guard in initializeSelectionState as a defensive fallback.
* fix: include ephemeral experiments in all filterIds-based experiment queries

  Extends the includeEphemeral: true fix to all remaining compare-page queries that use filterIds to look up experiments by ID. These pages are not currently reachable with ephemeral IDs via the normal UI, but the fix prevents potential crashes (e.g. unguarded .repetitions access in ExperimentCompareListPage) if they ever are, and makes the behavior consistent across all query sites.

---------

Co-authored-by: Roger Yang <roger.yang@arize.com>
* chore: Remove legacy client
* Update documentation
* fix: Properly use kwargs
* fix: Clean notebook metadata and fix malformed markdown cell

  Fix tutorials/log_traces_to_phoenix.ipynb, which had execution_count and outputs fields on a markdown cell, causing ruff format to fail. Run make clean-notebooks and make format across all changed notebooks.
* Update MIGRATION.md
* Linting fixes
* Formatting fixes
* Linter fixes
* fix: Normalize Unicode escapes to literal emojis in notebooks

  The clean-notebooks CI check requires literal emoji characters rather than JSON Unicode escape sequences.
…erve/db commands (#12336)

Reorganize phoenix.server.main into a thin arg-parser dispatcher that delegates to phoenix.server.cli.commands.{serve,db}. This enables a standalone `db migrate` for running migrations without starting the server — useful in container init scripts and CI pipelines.

BREAKING CHANGE: CLI flags now follow the subcommand instead of preceding it. For example:

Before: phoenix --dev serve
After: phoenix serve --dev

Other changes:
- Remove deprecated --enable-websockets flag
- Move welcome banner into app lifespan so it prints after startup
- Update dev scripts (package.json, start-phoenix.sh) for the new arg order
- Relocate grpc port tests to tests/unit/server/cli/commands/test_serve.py
- Document migration steps in MIGRATION.md (v14)
* feat(experiments): background experiment runner with lifecycle management (backend)
* add nullable prompt_version_id
* feat(experiments): background experiment runner frontend (#12380)
  * feat(experiments): background experiment runner frontend changes
  * feat(experiments): add search field and column selector to experiments table
  * switch to badge
  * feat(experiments): improve experiments table UX
    - Persist column visibility and sizing per dataset to localStorage
    - Add experiment ID column with copy-to-clipboard (hidden by default)
    - Add tooltips with arrows to job status badges
    - Rename status column to "job status", use medium badge size
    - Fix stale data by adding store-and-network fetch policy
    - Make Icon font-size relative (em) so it scales with its parent
    - Use info notification for experiment stopped instead of success
    - Update navigation dialog copy and button order
  * fix: actually show the error
  * feat: add an ExperimentDetailsPage
  * chore: fmt
  * fix lint
* fix schema gen
* use .map() for transformation functions
* dedup annotations by name
* revert .claude
* feat: add experiment job columns, hide by default (#12408)
* fix: experiment runner resume bugs, eval race condition, and polymorphic loading
  - Fix polymorphic lazy-loading errors by using with_polymorphic("*") for ExperimentEvent queries and expunging ORM objects from the session
  - Fix the _ensure_eval_buffer race with on_task_success by deferring eval scanning until all tasks are done, preventing duplicate eval jobs
  - Use upsert for annotation persistence to handle pre-existing failed annotations on resume, with RETURNING to populate annotation IDs
  - Preserve eval exception tracebacks via error_exc on EvaluationResult
  - Redact common path prefixes from stack traces stored in the DB
  - Fix migration downgrade to drop the correct tables
* rename experiment execution configs to experiment jobs
* clean up
* feat(experiments): rename experiment events to logs and harden errors
  - Rename experiment_events to experiment_logs and the polymorphic models (task/eval/job logs)
  - Rename the SYSTEM error category to EXPERIMENT across DB, GraphQL, and runner
  - Return experiment error stack traces only to admins when authentication is enabled
  - Rename the migration module to add_experiment_jobs; update experiment_runner and tests
  - Prepend basename on the playground "View experiment" compare link
* remove unused experiment_error.py (superseded by experiment_log.py)
* refactor(api): rename Experiment.backgroundJob and chat connection input
  - Expose the experiment runner as Experiment.job instead of backgroundJob; align UI and Relay
  - Rename the ChatCompletionInput/OverDataset connection field to connectionConfig; update subscriptions and playground payloads

---------

Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikeldking@gmail.com>
Co-authored-by: Alexander Song <axiomofjoy@gmail.com>
* chore: Remove protobuf evaluation pipeline

  Remove the protobuf-based single-evaluation ingestion pipeline end-to-end. The pyarrow annotation pipeline (POST /v1/evaluations with application/x-pandas-arrow) is now the sole evaluation ingestion path.

  Deleted modules:
  - proto/trace/v1/evaluation.proto + generated pb2/pyi
  - db/insertion/evaluation.py (protobuf → SQL inserter)
  - session/evaluation.py (DataFrame → pb.Evaluation encoder)
  - trace/exporter.py (HttpExporter protobuf transport)

  Modified:
  - POST /v1/evaluations now only accepts application/x-pandas-arrow
  - BulkInserter: removed the evaluation queue and insertion loop
  - Scaffolder: fixture evals now ingested via annotation precursors
  - launch_app(): no longer auto-ingests TraceDataset.evaluations (documented in MIGRATION.md)
* fix: Resolve CI failures from protobuf evaluation pipeline removal
  - Regenerate OpenAPI TypeScript types to reflect the protobuf removal
  - Add grpc_port=0 to the unit-test app fixture (replaces removed patch_grpc_server)
  - Restore PHOENIX_GRPC_PORT in the integration test env (the server still needs it for OTLP)
  - Fix isort ordering for phoenix.client imports in both conftest files
* fix: Add else branch for unrecognized eval index in fixture loader

  The logger.info call ran unconditionally even when neither the if nor the elif branch matched, falsely reporting that annotations were enqueued. Add an else branch that warns and returns early for unrecognized index patterns.
* fix: Revert out-of-scope changes
* fix: More cleanup
* refactor: preserve startup evaluations without protobuf
* fix: resolve mypy type-var error for pd.Series in evaluations
…12407)

Consolidate on asyncpg as the sole PostgreSQL driver. Previously, migrations used a separate sync psycopg engine while the server used asyncpg. Now both paths use asyncpg via SQLAlchemy's run_sync bridge, preserving Mode 3 (external transaction) semantics.

Key changes:
- engines.py: remove sync psycopg engine creation; create a disposable AsyncEngine with NullPool for migrations (with correct SSL/IAM config) to avoid cross-event-loop pool contamination; refactor aio_postgresql_engine for clarity
- migrate.py: accept a disposable AsyncEngine, run migrations via run_sync, and dispose of it after use
- env.py: add `if connection.in_transaction(): connection.commit()` before the explicit begin() to handle run_sync's auto-begun transaction
- pg_config.py: remove the psycopg branch, simplify to asyncpg-only
- pyproject.toml: remove psycopg[binary,pool] from dev and pg deps
- populate_project_sessions.py: switch to an async engine with run_sync
- MIGRATION.md: document the psycopg removal for v14
- tests: update unit and integration tests to remove psycopg usage
Failed experiment tasks and evals were silently dropped — no DB record was written when the LLM/evaluator returned a non-retryable error or when retries were exhausted. This left gaps in experiment results (e.g. 30 of 512 examples missing).

Root cause of the UniqueViolationError on persist: get_db_experiment_run set trace= (a SQLAlchemy relationship) on ExperimentRun, causing a cascade auto-INSERT that bypassed the ON CONFLICT DO UPDATE upsert in _persist_run.

Changes:
- Persist ExperimentRun with error for permanent task failures
- Persist error annotations for permanent eval failures
- Persist error records when retries are exhausted (_persist_exhausted_retry)
- Fix UniqueViolationError by removing the trace= relationship from ExperimentRun construction (the trace_id FK is sufficient for the _persist_run upsert)
- Use _persist_run (upsert) instead of raw session.add for template errors
- Move success-path code to try/except/else for clearer separation
- Extract _persist_eval_results to deduplicate annotation upsert logic
- Move eval broadcasting from on_eval_success into EvalWorkItem.execute()
- Remove notify_subscribers from on_failure (now purely bookkeeping)
- Enforce a consistent structure: persist → (fail? on_failure + return) → broadcast → callback
- Add exc_info=True to all persist-failure warnings for diagnostics
- Add unit tests for task and eval error persistence paths
- Document operating principles in the WorkItem docstring
…12464) Add optional span and experimentRun fields to ChatCompletionSubscriptionError so the frontend can display partial trace data (latency, token counts, cost) and link to the persisted error run for failed experiment examples.
Adds periodic refetching (every 3s) to ExperimentsTable when any experiment has a RUNNING job status, providing live progress updates. Polling stops automatically when all jobs complete.
fix(experiments): filter errors by level, use LATERAL join, rename to ExperimentLog

- Filter the lastError dataloader and errors resolver to level='ERROR' only
- Rename GraphQL type ExperimentError → ExperimentLog; add a level field
- Rename subtables experiment_*_events → experiment_*_logs
- Add typed aliases ExperimentLogCategory and ExperimentLogLevel
- Replace the composite index with a partial index WHERE level='ERROR'
- Use a LATERAL join on Postgres for an index-only lookup; ROW_NUMBER fallback on SQLite
- Replace the correlated subquery with a window function for SQLite compat
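A sketch of the ROW_NUMBER fallback described above — latest ERROR-level log per experiment — runnable against SQLite via the stdlib. The schema and data are illustrative, not the actual experiment_*_logs tables (on Postgres a LATERAL join would serve the same query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """CREATE TABLE experiment_logs (
           id INTEGER PRIMARY KEY,
           experiment_id INTEGER,
           level TEXT,
           message TEXT
       );
       INSERT INTO experiment_logs VALUES
           (1, 1, 'INFO',  'started'),
           (2, 1, 'ERROR', 'first failure'),
           (3, 1, 'ERROR', 'second failure'),
           (4, 2, 'ERROR', 'timeout');"""
)

# Rank each experiment's logs newest-first, keeping only ERROR rows
# (mirroring the partial-index predicate WHERE level='ERROR'),
# then take rank 1 per experiment.
rows = conn.execute(
    """SELECT experiment_id, message FROM (
           SELECT experiment_id, message,
                  ROW_NUMBER() OVER (
                      PARTITION BY experiment_id ORDER BY id DESC
                  ) AS rn
           FROM experiment_logs
           WHERE level = 'ERROR'
       ) WHERE rn = 1
       ORDER BY experiment_id"""
).fetchall()
assert rows == [(1, "second failure"), (2, "timeout")]
```

Window functions require SQLite 3.25+, which is why the correlated-subquery variant had to be replaced rather than merely tuned.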
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.
daea61b to b1e74ca
Add a migration that sets MINVALUE to -2147483648 and enables CYCLE on all serial (32-bit) sequences. This doubles the usable ID range from ~2.1B to ~4.3B and allows sequences to wrap around instead of erroring when they reach MAXVALUE. Also update the DDL generator to emit ALTER SEQUENCE statements for sequences with non-default attributes.

Caveats:
- After CYCLE wraps around, sorting by ID will no longer correspond to insertion order. Code that uses ID ordering as a proxy for creation time should use a timestamp column instead.
- Primary key collisions are possible after a full cycle if any row still holds an ID from the previous cycle — this doesn't require a large table, just a single surviving row at the colliding ID. Migrating to bigint after a wrap-around does not help, because the colliding rows already occupy the wrapped-to IDs. The fix would be to migrate to bigint and set the next value past the largest existing ID (e.g. 2147483648), resuming in the range above the old int range.
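A pure-Python model of the wrap-around behavior, to make the first caveat concrete. The class is an illustrative stand-in for PostgreSQL's nextval semantics under CYCLE, not Phoenix code:

```python
# Hypothetical model of a serial sequence after ALTER SEQUENCE ...
# MINVALUE -2147483648 CYCLE. Names and structure are illustrative.
INT32_MIN = -(2**31)   # -2147483648
INT32_MAX = 2**31 - 1  # 2147483647

class CyclingSequence:
    """Approximates nextval() for a sequence with CYCLE enabled."""

    def __init__(self, last_value: int):
        self.last_value = last_value

    def nextval(self) -> int:
        if self.last_value == INT32_MAX:
            # With CYCLE, the sequence restarts at MINVALUE instead of
            # raising "reached maximum value of sequence".
            self.last_value = INT32_MIN
        else:
            self.last_value += 1
        return self.last_value

seq = CyclingSequence(last_value=INT32_MAX - 2)
ids = [seq.nextval() for _ in range(3)]
assert ids == [2147483646, 2147483647, -2147483648]

# Insertion order and ID order now disagree: the newest row has the
# smallest ID, so ORDER BY id is no longer a proxy for creation time.
assert ids != sorted(ids)
```

The same model shows the collision caveat: a second full cycle would hand out -2147483648 again, so a single surviving row at that ID is enough to trigger a primary-key violation.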
b1e74ca to 852df8e
Concern: ID ordering invariants break after sequence wrap-around

The CYCLE behavior means that after the sequence reaches MAXVALUE (2,147,483,647), newly inserted rows will receive IDs starting from MINVALUE (-2,147,483,648). This is correct from a sequence-exhaustion standpoint, but it breaks a pervasive implicit invariant throughout the codebase: higher ID = newer record.

1. Cursor-based pagination and "newest first" ordering

The REST and GraphQL pagination layers use

After wrap-around, newly inserted records with negative IDs sort after all positive IDs in descending order. Users paginating "newest first" would not see post-wrap records on the first page — they'd appear at the very end of the results instead.

2.
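A minimal sketch of the descending-ID hazard described above, with illustrative values (`seq` records true insertion order; the last two rows arrived after wrap-around):

```python
# Rows in insertion order; the negative IDs were assigned post-wrap.
rows = [
    {"id": 2147483646, "seq": 0},
    {"id": 2147483647, "seq": 1},
    {"id": -2147483648, "seq": 2},  # post-wrap insert
    {"id": -2147483647, "seq": 3},  # post-wrap insert
]

# "Newest first" implemented as ORDER BY id DESC: the genuinely newest
# rows (seq 2 and 3) sort to the very end instead of the front.
newest_first = sorted(rows, key=lambda r: r["id"], reverse=True)
assert [r["seq"] for r in newest_first] == [1, 0, 3, 2]

# Ordering by a creation timestamp (proxied here by seq) stays correct.
by_created = sorted(rows, key=lambda r: r["seq"], reverse=True)
assert [r["seq"] for r in by_created] == [3, 2, 1, 0]
```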
7eb4e12 to 0b930ee
Add a migration that sets MINVALUE to -2147483648 and enables CYCLE on all serial (32-bit) sequences. This doubles the usable ID range from ~2.1B to ~4.3B and allows sequences to wrap around instead of erroring when they reach MAXVALUE. Also update the DDL generator to emit ALTER SEQUENCE statements for sequences with non-default attributes.
Note: after CYCLE wraps around, sorting by ID will no longer correspond to insertion order (e.g. rows with negative IDs will sort before older positive-ID rows). Code that uses ID ordering as a proxy for creation time should use a timestamp column instead. Primary key collisions are theoretically possible after a full cycle, but moot at ~4.3B rows.
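For illustration, a hedged sketch of the kind of statement the migration and the updated DDL generator would emit per serial sequence. The function and sequence name are hypothetical, not the actual Phoenix generator; the SQL shape is standard PostgreSQL ALTER SEQUENCE syntax:

```python
INT32_MIN = -(2**31)  # -2147483648

def alter_sequence_ddl(sequence_name: str) -> str:
    # Emit ALTER SEQUENCE only for sequences with non-default attributes,
    # per the PR description. Illustrative helper, not Phoenix code.
    return f"ALTER SEQUENCE {sequence_name} MINVALUE {INT32_MIN} CYCLE"

ddl = alter_sequence_ddl("projects_id_seq")
assert ddl == "ALTER SEQUENCE projects_id_seq MINVALUE -2147483648 CYCLE"
```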