Add a separate option to run the API server using mmap#89
Add a separate option to run the API server using mmap#89
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the API/index loading pipeline to support optional memory-mapped file I/O and to load the index’s protein/mapping data from new binary formats, aiming to improve startup and query performance.
Changes:
- Add a
--mmapCLI option and plumb it throughapi::startinto index loading. - Switch index construction to use
sa_serverloader functions and new binary inputs (proteins.bin,mapping.bin). - Update dependencies/lockfile to new
unipept-indexcrates and introducesa-server.
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| index/src/lib.rs | Switches index loading to sa_server loaders and constructs the new Searcher with a mapping file + use_mmap. |
| index/Cargo.toml | Replaces prior deps with sa-server (and pins several deps to a feature branch). |
| api/src/main.rs | Adds --mmap flag and passes it into start(...). |
| api/src/lib.rs | Updates start(...) signature and switches expected index data files to .bin + mapping input. |
| Cargo.lock | Updates git dependency sources and adds memmap2/sa-server transitive deps. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
SimonVandeVyver
left a comment
There was a problem hiding this comment.
I looked at the code and ran the api, everything worked as expected.
I performed some API calls while using the mmap feature, with the mmap feature disables and using the develop branch, which all return the same results
| sa-compression = { git = "https://github.com/unipept/unipept-index.git" } | ||
| sa-index = { git = "https://github.com/unipept/unipept-index.git" } | ||
| sa-mappings = { git = "https://github.com/unipept/unipept-index.git" } | ||
| sa-server = { git = "https://github.com/unipept/unipept-index.git", branch = "feature/speedup-loading-index" } |
There was a problem hiding this comment.
The branch of the dependency is still using the feature branch
Important
This PR can only be merged, when all other PR's in other repositories are merged:
unipept/unipept-index#34
unipept/unipept-utilities#2
Before merging, the dependencies need to be updated to point to the main branch. Right now they point to a separate branch for testing purposes
This pull request updates how the index and protein data are loaded in the API, introducing support for memory-mapped file loading and switching to new binary file formats for improved speed and efficiency. The most important changes are grouped below:
Index and Data Loading Improvements:
startfunction inapi/src/lib.rsnow takes ause_mmapboolean parameter, allowing the API to load index files using memory-mapped I/O for faster access. This parameter is also exposed as a command-line argument (--mmap) inapi/src/main.rs.proteins.binandmapping.bininstead ofproteins.tsv), and uses updated loader functions from the newsa_serverdependency.Indexobject is updated to use the newSearcherstructure, taking advantage of the new mapping file and memory-mapped loading.Code Cleanup: