
Add a separate option to run the API server using mmap#89

Open
tibvdm wants to merge 11 commits into develop from feature/speedup-loading-index

Collaborator

@tibvdm tibvdm commented Mar 30, 2026

Important

This PR can only be merged once the related PRs in the other repositories are merged:
unipept/unipept-index#34
unipept/unipept-utilities#2

Before merging, the dependencies need to be updated to point to the main branch. Right now they point to a separate branch for testing purposes.

This pull request updates how the index and protein data are loaded in the API, introducing support for memory-mapped file loading and switching to new binary file formats for improved speed and efficiency. The most important changes are grouped below:

Index and Data Loading Improvements:

  • The start function in api/src/lib.rs now takes a use_mmap boolean parameter, allowing the API to load index files using memory-mapped I/O for faster access. This parameter is also exposed as a command-line argument (--mmap) in api/src/main.rs.
  • The index now loads protein and mapping data from new binary files (proteins.bin and mapping.bin instead of proteins.tsv), and uses updated loader functions from the new sa_server dependency.
  • The construction of the Index object is updated to use the new Searcher structure, taking advantage of the new mapping file and memory-mapped loading.
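
The flag plumbing described above can be sketched as follows. This is a minimal, std-only illustration, not the code from this PR: the `Options` struct, `parse_args`, and `data_files` are hypothetical names, the hand-rolled parsing stands in for whatever argument parser `api/src/main.rs` actually uses, and the assumption that both binary files sit directly in one index directory is not confirmed by the description. Only the `--mmap` flag, the `use_mmap` boolean, and the file names `proteins.bin` and `mapping.bin` come from the PR text.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical container for the values that would be passed on to `start`.
/// The PR only states that `start` takes a `use_mmap` boolean.
#[derive(Debug, PartialEq)]
pub struct Options {
    pub index_dir: PathBuf,
    pub use_mmap: bool,
}

/// Parse the command-line arguments (program name already stripped).
/// `--mmap` enables memory-mapped loading; the single remaining
/// positional argument is taken as the index directory (an assumption).
pub fn parse_args<I, S>(args: I) -> Result<Options, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut index_dir: Option<PathBuf> = None;
    let mut use_mmap = false;

    for arg in args {
        match arg.as_ref() {
            "--mmap" => use_mmap = true,
            other if other.starts_with("--") => {
                return Err(format!("unknown flag: {}", other));
            }
            other => {
                // `replace` returns the previous value, so `Some` means
                // a second positional argument was supplied.
                if index_dir.replace(PathBuf::from(other)).is_some() {
                    return Err("expected exactly one index directory".to_string());
                }
            }
        }
    }

    let index_dir = index_dir.ok_or_else(|| "missing index directory".to_string())?;
    Ok(Options { index_dir, use_mmap })
}

/// Resolve the two binary inputs named in the PR description.
/// Placing both files directly under `index_dir` is an assumption.
pub fn data_files(index_dir: &Path) -> (PathBuf, PathBuf) {
    (index_dir.join("proteins.bin"), index_dir.join("mapping.bin"))
}
```

A caller would typically feed it `std::env::args().skip(1)` and hand `opts.use_mmap` on to `start`. Keeping the flag off by default means existing deployments keep their current loading behaviour unless they opt in.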

Code Cleanup:

  • Old code for loading the suffix array from text and compressed formats is removed, as this is now handled by the new loader functions.


Copilot AI left a comment


Pull request overview

This PR updates the API/index loading pipeline to support optional memory-mapped file I/O and to load the index’s protein/mapping data from new binary formats, aiming to improve startup and query performance.

Changes:

  • Add a --mmap CLI option and plumb it through api::start into index loading.
  • Switch index construction to use sa_server loader functions and new binary inputs (proteins.bin, mapping.bin).
  • Update dependencies/lockfile to new unipept-index crates and introduce sa-server.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 5 comments.

File | Description
index/src/lib.rs | Switches index loading to sa_server loaders and constructs the new Searcher with a mapping file + use_mmap.
index/Cargo.toml | Replaces prior deps with sa-server (and pins several deps to a feature branch).
api/src/main.rs | Adds --mmap flag and passes it into start(...).
api/src/lib.rs | Updates start(...) signature and switches expected index data files to .bin + mapping input.
Cargo.lock | Updates git dependency sources and adds memmap2/sa-server transitive deps.


Comment thread index/src/lib.rs Outdated
Comment thread index/src/lib.rs Outdated
Comment thread index/src/lib.rs Outdated
Comment thread index/Cargo.toml Outdated
Comment thread index/src/lib.rs Outdated
@tibvdm tibvdm requested a review from SimonVandeVyver April 1, 2026 11:40
@tibvdm tibvdm marked this pull request as ready for review April 1, 2026 11:41
Contributor

@SimonVandeVyver SimonVandeVyver left a comment


I looked at the code and ran the API; everything worked as expected.
I performed some API calls with the mmap feature enabled, with it disabled, and on the develop branch; all three return the same results.

Comment thread index/Cargo.toml
sa-compression = { git = "https://github.com/unipept/unipept-index.git" }
sa-index = { git = "https://github.com/unipept/unipept-index.git" }
sa-mappings = { git = "https://github.com/unipept/unipept-index.git" }
sa-server = { git = "https://github.com/unipept/unipept-index.git", branch = "feature/speedup-loading-index" }
Contributor


This dependency still points to the feature branch.

3 participants