Implement backward seek for updateCursor, fixes a quadratic behavior in HOLSourceAST.mkFileline#1907
Open
tanyongkiam wants to merge 1 commit intoHOL-Theorem-Prover:mcandidatefrom
Open
Conversation
Previously, any backward seek in updateCursor reset to position 0 and
rescanned from scratch ("TODO walk backwards"). Now there are three cases:
- Same line (p >= pos - col): pos - col is the line start, so the new
col is col - (pos - p) in O(1), no scan at all.
- Earlier line: a single backward scan from pos-1 counts newlines in
[p, pos) and stops at the first newline before p, giving both the
line delta and the new line start in one pass.
- Events crossed (e.g. #line pragmas applied in (p, pos]): fall back
to reset, as before.
This fixes O(N^2) behavior when processing files with N definitions.
mkFileline is called with non-monotone positions: expandDec advances the
cursor to the opening quote content, then the printer's first startSpan
seeks back to the val keyword a few bytes earlier. With the old reset,
definition K at byte position B_K cost O(B_K) per backward seek, giving
O(N^2) total. The same-line fast path reduces this to O(1) per definition,
fixing the hang on large generated files like armv86aScript.sml (480K
lines, 8117 definitions).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Member
|
Looks good to me. Will merge in 12 hours or so, giving @digama0 a chance to weigh in. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
bin/Holmakehangs when processing large machine-generated HOL scriptssuch as
armv86aScript.sml(480K lines, 8117 definitions). The full HOLsource translator (
HOLSource.fileToReader) is used byholdeptooltoscan dependency information, and its runtime was O(N²) in the number of
definitions.
Root cause
HOLSourceAST.updateCursoralways reset to byte position 0 on anybackward seek:
Backward seeks arise because
expandDecandprintDecsshare a singlefilelinecursor but visit positions in opposite order within eachdeclaration. For
val _ = Define \...``:expandDeccallsfilelineat the quote content position (just afterthe opening backtick)
printDecsthen callsfilelineat thevalkeyword position (a fewbytes earlier)
The backward seek is only ~20 bytes, but the old code reset to 0 and
rescanned from the start of the file each time. For definition K at byte
position B_K this cost O(B_K), giving O(N²) total.
Fix
Teach
updateCursorto walk backwards directly. There are now three cases:p >= pos - col):pos - colis the line start, socoladjusts bycol - (pos - p)in O(1) with no scan. This coversthe common case.
pos-1counts newlinesin
[p, pos)and stops at the first newline beforep, giving boththe line delta and new line start in one pass.
#linepragmas applied in(p, pos]): fallback to reset, as before.
Total work across all definitions is O(N).
Diagnosis by Claude here: #❄️ HOL4 > Candidate SHAs for HOL4 master @ 💬