Skip to content

Holmake: add locking for concurrent safety (#1826)#1905

Open
xrchz wants to merge 6 commits intodevelopfrom
holmakemulti
Open

Holmake: add locking for concurrent safety (#1826)#1905
xrchz wants to merge 6 commits intodevelopfrom
holmakemulti

Conversation

@xrchz
Copy link
Copy Markdown
Member

@xrchz xrchz commented Apr 13, 2026

When two separate Holmake processes build in directories that share a common dependency, both may try to rebuild the shared directory simultaneously, leading to race conditions and build failures.

Fix this by acquiring a POSIX fcntl advisory lock on a per-directory lockfile (.hol/holmake.lock) before building targets in that directory. If another Holmake process is already building there, the second process blocks until the first finishes, then re-checks timestamps and skips targets that are already up-to-date.

The locking is added to both build paths:

  • HM_GraphBuildJ1 (jobs=1): tracks the currently locked directory and acquires/releases on directory transitions.
  • multibuild (jobs>1): uses a refcount-based lock map since multiple jobs from different directories may be active concurrently.

On non-Unix systems or if locking fails (e.g., unsupported filesystem), Holmake warns and proceeds without locking, preserving the previous behaviour.

When two separate Holmake processes build in directories that share a
common dependency, both may try to rebuild the shared directory
simultaneously, leading to race conditions and build failures.

Fix this by acquiring a POSIX fcntl advisory lock on a per-directory
lockfile (.hol/holmake.lock) before building targets in that directory.
If another Holmake process is already building there, the second process
blocks until the first finishes, then re-checks timestamps and skips
targets that are already up-to-date.

The locking is added to both build paths:
- HM_GraphBuildJ1 (jobs=1): tracks the currently locked directory and
  acquires/releases on directory transitions.
- multibuild (jobs>1): uses a refcount-based lock map since multiple
  jobs from different directories may be active concurrently.

On non-Unix systems or if locking fails (e.g., unsupported filesystem),
Holmake warns and proceeds without locking, preserving the previous
behaviour.
@charles-cooper
Copy link
Copy Markdown
Contributor

if it's not too much extra work (or even the same amount of work?) per-file would be good for users too. actually strictly better i think

@xrchz
Copy link
Copy Markdown
Member Author

xrchz commented Apr 13, 2026

if it's not too much extra work (or even the same amount of work?) per-file would be good for users too. actually strictly better i think

I think it's a fair amount of more work for not that much benefit... but we could look into it...

@xrchz
Copy link
Copy Markdown
Member Author

xrchz commented Apr 13, 2026

(This comment was generated by Claude)

I analysed what per-file (per-target) locking would entail versus the current per-directory approach. Here is the summary:

Complexity increase: Per-file locking is roughly 2-3x more code (~150-200 lines vs ~110). The main complications are:

  1. BIC_BuildScript produces multiple files atomically (*Theory.sig, *Theory.sml, *Theory.dat). These are separate graph nodes sharing one command (linked via find_nodes_by_command). A per-file lock would need to understand this grouping — you would lock on the script name rather than individual targets, and you need care with lock ordering to avoid deadlocks if two processes try to lock the same group differently.

  2. poly_compile writes both .uo and sometimes .ui from a single SML file compilation. Another mini multi-target case.

  3. The async gap in multibuild: Locks must be held across the NewJobupdate callback boundary, meaning file descriptors need to stay alive across the async dispatch. Requires tracking per-target fds rather than a simple per-directory one.

  4. Lock call sites multiply: Per-directory has ~2 lock/unlock points (directory transitions). Per-file requires lock/unlock at each of the ~6-8 command dispatch sites in both build paths.

Practical benefit is minimal for the reported use case: The issue describes two separate Holmake processes both needing the same shared directory X rebuilt. Both processes build all targets in X (since they depend on theories from X). With per-file locking, both would try to build the same files in the same dependency order, so they would just take turns on each file serially — same total wall-clock time as per-directory, but with more lock overhead. Per-file only helps if the two processes need different subsets of targets in X, which is uncommon.

Where per-file would matter: If a future single Holmake with --jobs=N built across directories with fine-grained parallelism, per-file locking could enable more concurrency. But that is a larger architectural change beyond the scope of this issue.

@xrchz
Copy link
Copy Markdown
Member Author

xrchz commented Apr 13, 2026

(This comment was generated by Claude)

Correction to my earlier analysis: I was wrong that both processes would always need "all of directory X built". Holmake already does fine-grained dependency tracking via mkneeded — only targets transitively required by the local build get marked Pending{needed=true}, and find_runnable only picks those. I confirmed empirically that an unrelated theory in a shared directory stays Pending{needed=false} and is not built.

This means per-file locking has a more meaningful benefit than I originally claimed: if dirB needs theory A from shared X, and dirC needs theory B from X, per-directory locking forces dirC to wait while dirB builds A, even though they are working on independent targets. Per-file locking would allow both to proceed concurrently on their respective targets, only blocking when both actually touch the same file.

So @charles-cooper's point is stronger than I gave it credit for. The complexity cost is still real (multi-target grouping for BIC_BuildScript, async fd lifetime in multibuild, more lock call sites), but the benefit is also real for projects with large shared directories containing independent theories.

@charles-cooper
Copy link
Copy Markdown
Contributor

you would lock on the script name rather than individual targets, and you need care with lock ordering to avoid deadlocks if two processes try to lock the same group differently

without looking at the actual implementation code i'm skeptical about this. you would just have .hol/someScript.sml.lock instead of .hol/holmake.lock.

@xrchz xrchz changed the title Holmake: add per-directory locking for concurrent safety (#1826) Holmake: add locking for concurrent safety (#1826) Apr 14, 2026
@mn200
Copy link
Copy Markdown
Member

mn200 commented Apr 15, 2026

The test-code seems as if it has to be run manually. Can you not include it in the build’s testing?

xrchz added 3 commits April 15, 2026 10:21
createDirIfNecessary: handle the race where two concurrent Holmake
processes both try to create the same directory (e.g., .hol/objs).
Previously, the second process would crash with EEXIST. Now catch
OS.SysErr on mkDir and re-check whether the path is already a
directory, only failing if it isn't.
Add dirlock test to parallel_tests so it runs automatically during
selftest builds (bin/build -t).
@xrchz
Copy link
Copy Markdown
Member Author

xrchz commented Apr 15, 2026

The test-code seems as if it has to be run manually. Can you not include it in the build’s testing?

should be included now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants