
fix: set PostgreSQL serial sequence MINVALUE and enable CYCLE #12505

Draft
RogerHYang wants to merge 13 commits into main from roger/pg-sequence-cycle

Conversation

@RogerHYang
Contributor

@RogerHYang RogerHYang commented Apr 2, 2026

Add a migration that sets MINVALUE to -2147483648 and enables CYCLE on all serial (32-bit) sequences. This doubles the usable ID range from ~2.1B to ~4.3B and allows sequences to wrap around instead of erroring when they reach MAXVALUE. Also update the DDL generator to emit ALTER SEQUENCE statements for sequences with non-default attributes.

Note: after CYCLE wraps around, sorting by ID will no longer correspond to insertion order (e.g. rows with negative IDs will sort before older positive-ID rows). Code that uses ID ordering as a proxy for creation time should use a timestamp column instead. Primary key collisions are theoretically possible after a full cycle, but moot at ~4.3B rows.
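
The shape of the DDL this migration emits can be sketched as follows. This is an illustrative sketch, not the actual migration code: the sequence names (e.g. `spans_id_seq`) just follow PostgreSQL's default `<table>_<column>_seq` convention and are assumptions.

```python
# Hypothetical sketch of the per-sequence DDL described above. Sequence
# names are illustrative, not taken from the actual migration.

INT32_MIN = -(2**31)  # -2147483648

def alter_sequence_ddl(sequence_name: str) -> str:
    """Build the ALTER SEQUENCE statement that widens the range and enables CYCLE."""
    return f"ALTER SEQUENCE {sequence_name} MINVALUE {INT32_MIN} CYCLE;"

for seq in ["spans_id_seq", "traces_id_seq"]:
    print(alter_sequence_ddl(seq))
```

Applied to a sequence, the statement lets `nextval()` continue from -2147483648 after reaching MAXVALUE instead of raising an exhaustion error.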

RogerHYang and others added 12 commits March 31, 2026 18:42
#11944)

* feat(playground): add experiment record toggle with ephemeral experiment tracking

Replace separate experimentId/experimentIsEphemeral fields on PlaygroundInstance
with a single PlaygroundInstanceExperiment object and dedicated setInstanceExperiment
action. The record toggle swaps with a running indicator during execution, and
ephemeral experiments are filtered from the "View Experiment" link.

* refactor: simplify experiment tracking in playground callbacks

- Remove redundant ?? undefined in experiment ID derivation
- Replace filter().map() with flatMap to eliminate non-null assertion
- Read recordExperiments from store snapshot instead of closing over it
  to avoid unnecessary subscription re-creation
- Fix missing deps in onCompleted callback

* fix: format TypeScript files to pass CI

* fix: resolve lint errors for unused updateInstance and missing setInstanceExperiment dep

* fix: include ephemeral experiments in compare details query and guard against undefined

When Record is OFF, experiments are created as ephemeral on the backend.
The ExperimentCompareDetailsQuery was not passing includeEphemeral: true,
so ephemeral experiments were excluded from results, causing experimentsById
to be missing the entry and crashing with "Cannot read properties of undefined
(reading 'repetitions')". Also adds a null guard in initializeSelectionState
as a defensive fallback.

* fix: include ephemeral experiments in all filterIds-based experiment queries

Extends the includeEphemeral: true fix to all remaining compare page queries
that use filterIds to look up experiments by ID. These pages are not currently
reachable with ephemeral IDs via the normal UI, but the fix prevents potential
crashes (e.g. unguarded .repetitions access in ExperimentCompareListPage) if
they ever are, and makes the behavior consistent across all query sites.

---------

Co-authored-by: Roger Yang <roger.yang@arize.com>
* chore: Remove legacy client

* Update documentation

* fix: Properly use kwargs

* fix: Clean notebook metadata and fix malformed markdown cell

Fix tutorials/log_traces_to_phoenix.ipynb which had execution_count
and outputs fields on a markdown cell, causing ruff format to fail.
Run make clean-notebooks and make format across all changed notebooks.

* Update MIGRATION.md

* Linting fixes

* Formatting fixes

* Linter fixes

* fix: Normalize Unicode escapes to literal emojis in notebooks

The clean-notebooks CI check requires literal emoji characters
rather than JSON Unicode escape sequences.
…erve/db commands (#12336)

Reorganize phoenix.server.main into a thin arg-parser dispatcher that
delegates to phoenix.server.cli.commands.{serve,db}.  This enables
standalone `db migrate` for running migrations without starting the
server — useful in container init scripts and CI pipelines.

BREAKING CHANGE: CLI flags now follow the subcommand instead of
preceding it.  For example:
  Before: phoenix --dev serve
  After:  phoenix serve --dev

Other changes:
- Remove deprecated --enable-websockets flag
- Move welcome banner into app lifespan so it prints after startup
- Update dev scripts (package.json, start-phoenix.sh) for new arg order
- Relocate grpc port tests to tests/unit/server/cli/commands/test_serve.py
- Document migration steps in MIGRATION.md (v14)
* feat(experiments): background experiment runner with lifecycle management (backend)

* add nullable prompt_version_id

* feat(experiments): background experiment runner frontend (#12380)

* feat(experiments): background experiment runner frontend changes

* feat(experiments): add search field and column selector to experiments table

* switch to badge

* feat(experiments): improve experiments table UX

- Persist column visibility and sizing per dataset to localStorage
- Add experiment ID column with copy-to-clipboard (hidden by default)
- Add tooltips with arrows to job status badges
- Rename status column to "job status", use medium badge size
- Fix stale data by adding store-and-network fetch policy
- Make Icon font-size relative (em) so it scales with parent
- Use info notification for experiment stopped instead of success
- Update navigation dialog copy and button order

* fix: actually show the error

* feat: add an ExperimentDetailsPage

* chore: fmt

* fix lint

---------

Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikeldking@gmail.com>

* fix schema gen

* use .map() for transformation functions

* dedup annotations by name

* revert .claude

* feat: add experiment job columns, hide by default (#12408)

* fix: experiment runner resume bugs, eval race condition, and polymorphic loading

- Fix polymorphic lazy loading errors by using with_polymorphic("*") for
  ExperimentEvent queries and expunging ORM objects from session
- Fix _ensure_eval_buffer race with on_task_success by deferring eval
  scanning until all tasks are done, preventing duplicate eval jobs
- Use upsert for annotation persistence to handle pre-existing failed
  annotations on resume, with RETURNING to populate annotation IDs
- Preserve eval exception tracebacks via error_exc on EvaluationResult
- Redact common path prefixes from stack traces stored in DB
- Fix migration downgrade to drop correct tables

* rename experiment execution configs to experiment jobs

* clean up

* feat(experiments): rename experiment events to logs and harden errors

- Rename experiment_events to experiment_logs and polymorphic models (task/eval/job logs).
- Rename SYSTEM error category to EXPERIMENT across DB, GraphQL, and runner.
- Return experiment error stack traces only to admins when authentication is enabled.
- Rename migration module to add_experiment_jobs; update experiment_runner and tests.
- Prepend basename on playground "View experiment" compare link.

* remove unused experiment_error.py (superseded by experiment_log.py)

* refactor(api): rename Experiment.backgroundJob and chat connection input

- Expose experiment runner as Experiment.job instead of backgroundJob; align UI and Relay.
- Rename ChatCompletionInput/OverDataset connection field to connectionConfig; update subscriptions and playground payloads.

---------

Co-authored-by: Mikyo King <mikyo@arize.com>
Co-authored-by: Mikyo King <mikeldking@gmail.com>
Co-authored-by: Alexander Song <axiomofjoy@gmail.com>
* chore: Remove protobuf evaluation pipeline

Remove the protobuf-based single-evaluation ingestion pipeline
end-to-end. The pyarrow annotation pipeline (POST /v1/evaluations
with application/x-pandas-arrow) is now the sole evaluation
ingestion path.

Deleted modules:
- proto/trace/v1/evaluation.proto + generated pb2/pyi
- db/insertion/evaluation.py (protobuf → SQL inserter)
- session/evaluation.py (DataFrame → pb.Evaluation encoder)
- trace/exporter.py (HttpExporter protobuf transport)

Modified:
- POST /v1/evaluations now only accepts application/x-pandas-arrow
- BulkInserter: removed evaluation queue and insertion loop
- Scaffolder: fixture evals now ingested via annotation precursors
- launch_app(): no longer auto-ingests TraceDataset.evaluations
  (documented in MIGRATION.md)

* fix: Resolve CI failures from protobuf evaluation pipeline removal

- Regenerate OpenAPI TypeScript types to reflect protobuf removal
- Add grpc_port=0 to unit test app fixture (replaces removed patch_grpc_server)
- Restore PHOENIX_GRPC_PORT in integration test env (server still needs it for OTLP)
- Fix isort ordering for phoenix.client imports in both conftest files

* fix: Add else branch for unrecognized eval index in fixture loader

The logger.info ran unconditionally even when neither if/elif branch
matched, falsely reporting annotations were enqueued. Add an else
branch that warns and returns early for unrecognized index patterns.

* fix: Revert out-of-scope changes

* fix: More cleanup

* refactor: preserve startup evaluations without protobuf

* fix: resolve mypy type-var error for pd.Series in evaluations
…12407)

Consolidate on asyncpg as the sole PostgreSQL driver. Previously,
migrations used a separate sync psycopg engine while the server used
asyncpg. Now both paths use asyncpg via SQLAlchemy's run_sync bridge,
preserving Mode 3 (external transaction) semantics.

Key changes:
- engines.py: remove sync psycopg engine creation; create a disposable
  AsyncEngine with NullPool for migrations (with correct SSL/IAM config)
  to avoid cross-event-loop pool contamination; refactor
  aio_postgresql_engine for clarity
- migrate.py: accept a disposable AsyncEngine, run migrations via
  run_sync, and dispose after use
- env.py: add `if connection.in_transaction(): connection.commit()`
  before explicit begin() to handle run_sync's auto-begun transaction
- pg_config.py: remove psycopg branch, simplify to asyncpg-only
- pyproject.toml: remove psycopg[binary,pool] from dev and pg deps
- populate_project_sessions.py: switch to async engine with run_sync
- MIGRATION.md: document psycopg removal for v14
- tests: update unit and integration tests to remove psycopg usage
Failed experiment tasks and evals were silently dropped — no DB record was
written when the LLM/evaluator returned a non-retryable error or when retries
exhausted. This left gaps in experiment results (e.g. 30 of 512 examples missing).

Root cause of UniqueViolationError on persist: get_db_experiment_run set
trace= (SQLAlchemy relationship) on ExperimentRun, causing cascade auto-INSERT
that bypassed the ON CONFLICT DO UPDATE upsert in _persist_run.

Changes:
- Persist ExperimentRun with error for permanent task failures
- Persist error annotations for permanent eval failures
- Persist error records when retries exhaust (_persist_exhausted_retry)
- Fix UniqueViolationError by removing trace= relationship from ExperimentRun
  construction (trace_id FK is sufficient for _persist_run upsert)
- Use _persist_run (upsert) instead of raw session.add for template errors
- Move success-path code to try/except/else for clearer separation
- Extract _persist_eval_results to deduplicate annotation upsert logic
- Move eval broadcasting from on_eval_success into EvalWorkItem.execute()
- Remove notify_subscribers from on_failure (now purely bookkeeping)
- Enforce consistent structure: persist → (fail? on_failure + return) → broadcast → callback
- Add exc_info=True to all persist failure warnings for diagnostics
- Add unit tests for task and eval error persistence paths
- Document operating principles in WorkItem docstring
…12464)

Add optional span and experimentRun fields to ChatCompletionSubscriptionError
so the frontend can display partial trace data (latency, token counts, cost)
and link to the persisted error run for failed experiment examples.
Adds periodic refetching (every 3s) to ExperimentsTable when any
experiment has a RUNNING job status, providing live progress updates.
Polling stops automatically when all jobs complete.
fix(experiments): filter errors by level, use LATERAL join, rename to ExperimentLog

- Filter lastError dataloader and errors resolver to level='ERROR' only
- Rename GraphQL type ExperimentError → ExperimentLog, add level field
- Rename subtables experiment_*_events → experiment_*_logs
- Add typed aliases ExperimentLogCategory and ExperimentLogLevel
- Replace composite index with partial index WHERE level='ERROR'
- Use LATERAL join on Postgres for index-only lookup; ROW_NUMBER fallback on SQLite
- Replace correlated subquery with window function for SQLite compat
@RogerHYang RogerHYang requested a review from a team as a code owner April 2, 2026 04:07
@github-project-automation github-project-automation bot moved this to 📘 Todo in phoenix Apr 2, 2026
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 2, 2026
@claude
Contributor

claude bot commented Apr 2, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

@RogerHYang RogerHYang force-pushed the roger/pg-sequence-cycle branch 6 times, most recently from daea61b to b1e74ca on April 2, 2026 05:13
Add a migration that sets MINVALUE to -2147483648 and enables CYCLE on
all serial (32-bit) sequences. This doubles the usable ID range from
~2.1B to ~4.3B and allows sequences to wrap around instead of erroring
when they reach MAXVALUE. Also update the DDL generator to emit
ALTER SEQUENCE statements for sequences with non-default attributes.

Caveats:
- After CYCLE wraps around, sorting by ID will no longer correspond to
  insertion order. Code that uses ID ordering as a proxy for creation
  time should use a timestamp column instead.
- Primary key collisions are possible after a full cycle if any row
  still holds an ID from the previous cycle — this doesn't require a
  large table, just a single surviving row at the colliding ID.
  Migrating to bigint after a wrap-around does not help because the
  colliding rows already occupy the wrapped-to IDs. The fix would be
  to migrate to bigint and set the next value past the largest existing
  ID (e.g. 2147483648), resuming in the range above the old int range.
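
The recovery path described above can be sketched as DDL. This is a hedged sketch under assumed table and sequence names, not a tested runbook: widen the column to bigint, then restart the sequence just past the old 32-bit range so new IDs cannot collide with surviving rows.

```python
# Hypothetical sketch of the post-wrap recovery: table/sequence names
# ("spans", "spans_id_seq") are illustrative assumptions.

INT32_MAX = 2**31 - 1  # 2147483647

def recovery_ddl(table: str, sequence: str) -> list[str]:
    return [
        # Widen the PK column first so it can hold values above INT32_MAX.
        f"ALTER TABLE {table} ALTER COLUMN id TYPE bigint;",
        # Resume numbering above the old int range; CYCLE is no longer needed.
        f"ALTER SEQUENCE {sequence} AS bigint NO MAXVALUE NO CYCLE "
        f"RESTART WITH {INT32_MAX + 1};",
    ]
```

In practice the restart value should be `max(2147483648, largest existing id + 1)`; the constant shown covers the common case where no row exceeds the old int range.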
@RogerHYang RogerHYang force-pushed the roger/pg-sequence-cycle branch from b1e74ca to 852df8e on April 2, 2026 05:21
@RogerHYang RogerHYang marked this pull request as draft April 2, 2026 19:02
@anticorrelator
Contributor

Concern: ID ordering invariants break after sequence wrap-around

The CYCLE behavior means that after the sequence reaches MAXVALUE (2,147,483,647), newly inserted rows will receive IDs starting from MINVALUE (-2,147,483,648). This is correct from a sequence exhaustion standpoint, but it breaks a pervasive implicit invariant throughout the codebase: higher ID = newer record.

1. Cursor-based pagination and "newest first" ordering

The REST and GraphQL pagination layers use ORDER BY id DESC with keyset pagination (WHERE id <= cursor_rowid) across roughly twenty endpoints in src/phoenix/server/api/routers/v1/ (spans, traces, annotations, datasets, projects, prompts, experiments, users, sessions) and the GraphQL layer (queries.py, types/Project.py, types/Trace.py).

After wrap-around, newly inserted records with negative IDs sort after all positive IDs in descending order. Users paginating "newest first" would not see post-wrap records on the first page — they'd appear at the very end of results instead.
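
The inversion is easy to see with a toy example (plain Python standing in for ORDER BY id DESC):

```python
# After wrap-around the newest rows carry negative IDs, so a descending
# sort by ID pushes them to the tail of the result set.
pre_wrap_ids = [1, 2, 2147483647]            # inserted before the wrap
post_wrap_ids = [-2147483648, -2147483647]   # inserted after the wrap (newer!)

newest_first = sorted(pre_wrap_ids + post_wrap_ids, reverse=True)
print(newest_first)  # [2147483647, 2, 1, -2147483647, -2147483648]
# The genuinely newest rows (negative IDs) land at the end, not the front.
```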

2. MAX(id) and row_number(ORDER BY id) as "latest version" / "sequence number" proxies

Several data loaders use ID ordering as a proxy for recency or sequence:

  • LatestPromptVersionIdDataLoader (latest_prompt_version_ids.py:31) uses func.max(PromptVersion.id) to find the latest version (comment in code: "max version id (which is the latest)"). After wrap, MAX() returns the highest positive ID — an older version — not the newest negative-ID version. Same pattern exists in dataset example revisions across Dataset.py, dataset_mutations.py, and experiments.py.

  • Sequence numbering in ExperimentSequenceNumberDataLoader (experiment_sequence_number.py:30) and PromptVersionSequenceNumberDataLoader (prompt_version_sequence_number.py:25) use row_number().over(order_by=Model.id). After wrap, negative IDs sort first in ASC order, scrambling ordinals for all existing records.

  • PromptVersion.previous_version (types/PromptVersion.py:90) uses WHERE id < self.id_attr — a post-wrap version with a negative ID would lose its link to all pre-wrap versions since every positive ID is "greater than" every negative ID.

These are silent logic errors — no exception is thrown, just wrong data returned to users.
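
The MAX(id) failure mode reduces to the same arithmetic, illustrated here with made-up version IDs:

```python
# The "latest version" proxy: max(id) keeps returning an old positive ID
# even though the newest version was assigned a negative ID after the wrap.
version_ids = [2147483646, 2147483647, -2147483648]  # last entry is the newest

latest_by_max = max(version_ids)
assert latest_by_max == 2147483647  # an older version wins — silently wrong
```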

Practical likelihood

This only manifests after ~2.1B rows per table, which is extremely unlikely for most deployments. But the failure mode is subtle (wrong results, not errors), making it harder to diagnose when it does occur.

Possible mitigation strategies

  1. Document the invariant break: At minimum, add a comment block in the migration (and ideally in the PR description) listing the code paths that assume ID monotonicity, so future maintainers know what to audit if the BIGINT migration hasn't happened by the time any table approaches 2B rows.

  2. Track a BIGINT migration for high-volume tables: If spans and traces are expected to be the highest-volume tables, migrating their PKs to BIGINT would eliminate the wrap-around scenario entirely for practical purposes. This could be a follow-up tracked issue.

  3. Add created_at-based ordering where correctness matters: For the "latest version" lookups and sequence numbering, switching from MAX(id) / ORDER BY id to timestamp-based ordering would make these code paths resilient to ID wrap-around regardless of column type.

  4. Monitoring/alerting: Add a check (or document one for operators) that alerts when any sequence's last_value exceeds a threshold (e.g., 80% of MAXVALUE), giving time to execute a BIGINT migration before wrap-around occurs.

None of these need to block this PR if the team accepts the trade-off consciously — the immediate benefit of preventing hard sequence exhaustion errors is real. But the affected code paths should be documented so the team can prioritize accordingly.
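
The threshold check in mitigation 4 is a one-liner once an operator has last_value and max_value in hand (on PostgreSQL 10+ these can be read from the pg_sequences view); the function name and 80% default below are illustrative assumptions:

```python
# Hypothetical sketch of the sequence-exhaustion alert from mitigation 4.
# Inputs correspond to last_value and max_value as exposed by pg_sequences.
def sequence_near_exhaustion(last_value: int, max_value: int,
                             threshold: float = 0.8) -> bool:
    """Alert once a sequence has consumed `threshold` of its positive range."""
    return last_value >= threshold * max_value

print(sequence_near_exhaustion(1_800_000_000, 2_147_483_647))  # True: ~84% used
print(sequence_near_exhaustion(1_000_000_000, 2_147_483_647))  # False: ~47% used
```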

@RogerHYang RogerHYang force-pushed the version-14 branch 3 times, most recently from 7eb4e12 to 0b930ee on April 7, 2026 14:42
Base automatically changed from version-14 to main April 7, 2026 16:44
4 participants