Skip to content

Add translator support for hgvs uncertain ranges#632

Open
theferrit32 wants to merge 14 commits intoga4gh:v3from
theferrit32:feature/609-uncertain-ranges
Open

Add translator support for hgvs uncertain ranges#632
theferrit32 wants to merge 14 commits intoga4gh:v3from
theferrit32:feature/609-uncertain-ranges

Conversation

@theferrit32
Copy link
Copy Markdown
Contributor

No description provided.

theferrit32 and others added 14 commits April 21, 2026 18:55
Closes ga4gh#609.

ClinVar reports CNVs with imprecise breakpoints as nested parenthesized
HGVS (e.g. (A_B)_(C_D)dup, (?_N)_(M_?)del). hgvs 2.0.0a0 parses these as
nested Interval positions; vrs-python now maps them to VRS Range values
in SequenceLocation.start/end.

- pyproject.toml: bump hgvs>=2.0.0a0,<3.0.
- hgvs_tools.py: add _hgvs_pos_to_vrs / _vrs_pos_to_hgvs helpers that
  handle 1-based inclusive ↔ 0-based interbase shift per side.
- CnvTranslator._from_hgvs: route through the helper; emits Range
  start/end for uncertain bounds.
- HgvsTools.get_position_and_state: raise a clear ValueError when an
  Allele path is given an uncertain-range expression (LiteralSequence
  Expression can't represent unknown reference).
- HgvsTools._to_sequence_variant: emit nested uncertain Intervals when
  a VRS location carries a Range, wrapping the certain side in a
  non-uncertain Interval so hgvs's formatter can compare start/end.
- Tests: no-network unit tests for the helpers plus parametrized
  CnvTranslator tests and AlleleTranslator from/to tests (VCR).
- Add notebook 7_Uncertain_Range_HGVS.ipynb demoing the feature.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hgvs 2.0.0a0 parses the exact side of mixed forms like g.100_(200_?)del
as Interval(uncertain=False), not SimplePosition. Treating any Interval
as uncertain produced degenerate Range([N,N]) positions and diverging
SequenceLocation digests for the same variant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the rejection from get_position_and_state into extract_allele_values
right after parse(), so the failure is signaled as a representation limit
rather than a data-unavailability error. This also makes the raises test
network-free; drop its @pytest.mark.vcr.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cassettes for the four CnvTranslator._from_hgvs parametrizations and
the AlleleTranslator._to_hgvs round-trip. Recorded against local
seqrepo-rest-service; each verified to replay standalone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add HgvsTools.build_cnv_location so CnvTranslator doesn't need to import
_hgvs_pos_to_vrs directly across module boundaries. Keeps positional
representation details inside hgvs_tools.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mixed int+Range PosEdit round-trip (covers the as_interval=True wrapper
for the exact side), and a regression test that the ref='' workaround
is still required (ref=None raises HGVSError at str-time).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Execute notebook end-to-end so execution_counts and iopub timestamps
come from a single session. No content changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cstring

_is_uncertain_range now returns TypeGuard[hgvs.location.Interval] so Pylance
can narrow `pos` to Interval inside `if _is_uncertain_range(pos):` branches
(previously the .start/.end access on the narrowed member produced a
type-checker warning since SimplePosition and BaseOffsetPosition don't have
those attrs).

Also rewrite the _hgvs_pos_to_vrs docstring to list all three shapes it
handles — the helper isn't CNV-specific; branch 3 is the same arithmetic
used for plain certain positions on the Allele path. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rather than silently returning a wrong-but-valid integer when handed an
input shape that has no faithful VRS coordinate (intronic position, c./n.
transcript interval, or an unexpected non-uncertain Interval wrapper),
raise a typed error that names the limitation. Nested the Interval checks
so the branch ordering is structural rather than dependent on call order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces six duplicated `pos.start.base - 1` / `pos.end.base` expressions
with two named locals computed once via the self-validating primitive.
The function is now defensible against intronic and BaseOffsetInterval
inputs without relying on upstream caller guards — typed errors from
_hgvs_pos_to_vrs propagate instead of opaque AttributeError or silent
intron-offset drops.

The ins branch keeps direct attribute access because hgvs insertion
semantics use a single pos.start.base for both endpoints with no shift,
which doesn't fit the side="start"/"end" convention.

Add TestGetPositionAndState with three defensive tests:
- uncertain-range input returns (Range, Range, str), locking the
  "rejection is a translator-layer concern" boundary;
- intronic input raises ValueError;
- BaseOffsetInterval input raises TypeError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…slator

HgvsTools is a parsing/conversion utility shared by AlleleTranslator and
CnvTranslator; its error messages should not name sibling translator
classes. The "use CnvTranslator for del/dup" guidance now lives in
AlleleTranslator._from_hgvs, which is the layer that knows about its
sibling. extract_allele_values keeps its intronic-rejection guard, since
intronic is an HGVS/VRS limitation rather than a translator-routing
decision.

Expose the uncertain-range check through HgvsTools.has_uncertain_range
so the translator doesn't need to import a module-private underscore
helper across module boundaries — matches the precedent from fb3055b
(HgvsTools.build_cnv_location).

The existing test_from_hgvs_uncertain_range_raises is unchanged (still
network-free, since rejection still happens before any data-proxy call —
just one stack frame higher).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@theferrit32 theferrit32 marked this pull request as ready for review April 22, 2026 19:09
@theferrit32 theferrit32 requested review from a team as code owners April 22, 2026 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant