Add translator support for hgvs uncertain ranges#632
Open
theferrit32 wants to merge 14 commits intoga4gh:v3from
Open
Add translator support for hgvs uncertain ranges#632theferrit32 wants to merge 14 commits intoga4gh:v3from
theferrit32 wants to merge 14 commits intoga4gh:v3from
Conversation
Closes ga4gh#609. ClinVar reports CNVs with imprecise breakpoints as nested parenthesized HGVS (e.g. (A_B)_(C_D)dup, (?_N)_(M_?)del). hgvs 2.0.0a0 parses these as nested Interval positions; vrs-python now maps them to VRS Range values in SequenceLocation.start/end. - pyproject.toml: bump hgvs>=2.0.0a0,<3.0. - hgvs_tools.py: add _hgvs_pos_to_vrs / _vrs_pos_to_hgvs helpers that handle 1-based inclusive ↔ 0-based interbase shift per side. - CnvTranslator._from_hgvs: route through the helper; emits Range start/end for uncertain bounds. - HgvsTools.get_position_and_state: raise a clear ValueError when an Allele path is given an uncertain-range expression (LiteralSequence Expression can't represent unknown reference). - HgvsTools._to_sequence_variant: emit nested uncertain Intervals when a VRS location carries a Range, wrapping the certain side in a non-uncertain Interval so hgvs's formatter can compare start/end. - Tests: no-network unit tests for the helpers plus parametrized CnvTranslator tests and AlleleTranslator from/to tests (VCR). - Add notebook 7_Uncertain_Range_HGVS.ipynb demoing the feature. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hgvs 2.0.0a0 parses the exact side of mixed forms like g.100_(200_?)del as Interval(uncertain=False), not SimplePosition. Treating any Interval as uncertain produced degenerate Range([N,N]) positions and diverging SequenceLocation digests for the same variant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the rejection from get_position_and_state into extract_allele_values right after parse(), so the failure is signaled as a representation limit rather than a data-unavailability error. This also makes the raises test network-free; drop its @pytest.mark.vcr. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cassettes for the four CnvTranslator._from_hgvs parametrizations and the AlleleTranslator._to_hgvs round-trip. Recorded against local seqrepo-rest-service; each verified to replay standalone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add HgvsTools.build_cnv_location so CnvTranslator doesn't need to import _hgvs_pos_to_vrs directly across module boundaries. Keeps positional representation details inside hgvs_tools. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mixed int+Range PosEdit round-trip (covers the as_interval=True wrapper for the exact side), and a regression test that the ref='' workaround is still required (ref=None raises HGVSError at str-time). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Execute notebook end-to-end so execution_counts and iopub timestamps come from a single session. No content changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cstring _is_uncertain_range now returns TypeGuard[hgvs.location.Interval] so Pylance can narrow `pos` to Interval inside `if _is_uncertain_range(pos):` branches (previously the .start/.end access on the narrowed member produced a type-checker warning since SimplePosition and BaseOffsetPosition don't have those attrs). Also rewrite the _hgvs_pos_to_vrs docstring to list all three shapes it handles — the helper isn't CNV-specific; branch 3 is the same arithmetic used for plain certain positions on the Allele path. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rather than silently returning a wrong-but-valid integer when handed an input shape that has no faithful VRS coordinate (intronic position, c./n. transcript interval, or an unexpected non-uncertain Interval wrapper), raise a typed error that names the limitation. Nested the Interval checks so the branch ordering is structural rather than dependent on call order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces six duplicated `pos.start.base - 1` / `pos.end.base` expressions with two named locals computed once via the self-validating primitive. The function is now defensible against intronic and BaseOffsetInterval inputs without relying on upstream caller guards — typed errors from _hgvs_pos_to_vrs propagate instead of opaque AttributeError or silent intron-offset drops. The ins branch keeps direct attribute access because hgvs insertion semantics use a single pos.start.base for both endpoints with no shift, which doesn't fit the side="start"/"end" convention. Add TestGetPositionAndState with three defensive tests: - uncertain-range input returns (Range, Range, str), locking the "rejection is a translator-layer concern" boundary; - intronic input raises ValueError; - BaseOffsetInterval input raises TypeError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…slator HgvsTools is a parsing/conversion utility shared by AlleleTranslator and CnvTranslator; its error messages should not name sibling translator classes. The "use CnvTranslator for del/dup" guidance now lives in AlleleTranslator._from_hgvs, which is the layer that knows about its sibling. extract_allele_values keeps its intronic-rejection guard, since intronic is an HGVS/VRS limitation rather than a translator-routing decision. Expose the uncertain-range check through HgvsTools.has_uncertain_range so the translator doesn't need to import a module-private underscore helper across module boundaries — matches the precedent from fb3055b (HgvsTools.build_cnv_location). The existing test_from_hgvs_uncertain_range_raises is unchanged (still network-free, since rejection still happens before any data-proxy call — just one stack frame higher). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.