Skip to content

Implement backward seek for updateCursor, fixes a quadratic behavior in HOLSourceAST.mkFileline#1907

Open
tanyongkiam wants to merge 1 commit intoHOL-Theorem-Prover:mcandidatefrom
tanyongkiam:mcandidate
Open

Implement backward seek for updateCursor, fixes a quadratic behavior in HOLSourceAST.mkFileline#1907
tanyongkiam wants to merge 1 commit intoHOL-Theorem-Prover:mcandidatefrom
tanyongkiam:mcandidate

Conversation

@tanyongkiam
Copy link
Copy Markdown
Contributor

@tanyongkiam tanyongkiam commented Apr 15, 2026

Problem

bin/Holmake hangs when processing large machine-generated HOL scripts
such as armv86aScript.sml (480K lines, 8117 definitions). The full HOL
source translator (HOLSource.fileToReader) is used by holdeptool to
scan dependency information, and its runtime was O(N²) in the number of
definitions.

Root cause

HOLSourceAST.updateCursor always reset to byte position 0 on any
backward seek:

  if p < pos then updateCursor (newCursor body evts, p) (* TODO walk backwards *) 

Backward seeks arise because expandDec and printDecs share a single
fileline cursor but visit positions in opposite order within each
declaration. For val _ = Define \...``:

  1. expandDec calls fileline at the quote content position (just after
    the opening backtick)
  2. printDecs then calls fileline at the val keyword position (a few
    bytes earlier)

The backward seek is only ~20 bytes, but the old code reset to 0 and
rescanned from the start of the file each time. For definition K at byte
position B_K this cost O(B_K), giving O(N²) total.

Fix

Teach updateCursor to walk backwards directly. There are now three cases:

  • Same line (p >= pos - col): pos - col is the line start, so
    col adjusts by col - (pos - p) in O(1) with no scan. This covers
    the common case.
  • Earlier line: a single backward scan from pos-1 counts newlines
    in [p, pos) and stops at the first newline before p, giving both
    the line delta and new line start in one pass.
  • Events crossed (e.g. #line pragmas applied in (p, pos]): fall
    back to reset, as before.

Total work across all definitions is O(N).

Diagnosis by Claude here: #❄️ HOL4 > Candidate SHAs for HOL4 master @ 💬

@tanyongkiam tanyongkiam requested a review from digama0 April 15, 2026 13:15
Previously, any backward seek in updateCursor reset to position 0 and
rescanned from scratch ("TODO walk backwards"). Now there are three cases:

  - Same line (p >= pos - col): pos - col is the line start, so the new
    col is col - (pos - p) in O(1), no scan at all.
  - Earlier line: a single backward scan from pos-1 counts newlines in
    [p, pos) and stops at the first newline before p, giving both the
    line delta and the new line start in one pass.
  - Events crossed (e.g. #line pragmas applied in (p, pos]): fall back
    to reset, as before.

This fixes O(N^2) behavior when processing files with N definitions.
mkFileline is called with non-monotone positions: expandDec advances the
cursor to the opening quote content, then the printer's first startSpan
seeks back to the val keyword a few bytes earlier. With the old reset,
definition K at byte position B_K cost O(B_K) per backward seek, giving
O(N^2) total. The same-line fast path reduces this to O(1) per definition,
fixing the hang on large generated files like armv86aScript.sml (480K
lines, 8117 definitions).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tanyongkiam tanyongkiam changed the title Fix O(N^2) behavior in HOLSourceAST.mkFileline Implement backward seek for updateCursor, fixes a quadratic behavior in HOLSourceAST.mkFileline Apr 15, 2026
@mn200
Copy link
Copy Markdown
Member

mn200 commented Apr 15, 2026

Looks good to me. Will merge in 12 hours or so, giving @digama0 a chance to weigh in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants