Indexes 5: Adds spk repo index subcommand for index generation and updates #1340

Open
dcookspi wants to merge 17 commits into index-4-indexed-repository-and-fbindex from index-5-repo-cmds-config-and-cli-flags

Conversation

@dcookspi
Collaborator

@dcookspi dcookspi commented Mar 20, 2026

This adds a new repo index subcommand to spk for index generation and updates. It adds the --use-indexes and --no-indexes flags for repository index usage. This also updates the resolvo solver to get global variable data from an indexed repository, which allows resolvo to solve without needing to restart its solves.

Indexing

The index is designed to help the solvers with solve times. It doesn't contain enough data to help with other spk operations like building and testing a package.

Indexing can be enabled or disabled in the spk config file. If indexing is enabled, you have to generate an index with spk repo index before trying to use it. Indexes are not generated on the fly (outside of automated tests).

To generate an index (for the origin):

  • spk repo index --disable-repo local

To update an existing index, e.g. after a new python package was published:

  • spk repo index --disable-repo local --update python

The flatbuffer index data is stored in a file in the underlying spfs repo, in an index/spk/ sub-directory.

If index use is enabled in the config file, it can be disabled with the --no-indexes command line flag. If index use is disabled by default, it can be enabled with the --use-indexes flag. If index use is enabled, but no index has been generated, spk will fall back to using the underlying repo directly (it acts as it would before this change).
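
The precedence described above can be sketched roughly as follows. This is only an illustration of the behaviour in this description, not the code in the PR: `CliOverride`, `DataSource`, and `choose_source` are hypothetical names, and the config value is assumed to be a plain bool.

```rust
/// Hypothetical stand-in for the two command line flags described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CliOverride {
    /// Neither flag was given, so the config file decides.
    None,
    /// `--use-indexes` was given.
    UseIndexes,
    /// `--no-indexes` was given.
    NoIndexes,
}

/// Where the package data ends up being read from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataSource {
    Index,
    UnderlyingRepo,
}

/// Decide the data source: a command line flag overrides the config
/// default, and a missing index always falls back to the repo.
pub fn choose_source(config_enabled: bool, cli: CliOverride, index_exists: bool) -> DataSource {
    let wanted = match cli {
        CliOverride::UseIndexes => true,
        CliOverride::NoIndexes => false,
        CliOverride::None => config_enabled,
    };
    if wanted && index_exists {
        DataSource::Index
    } else {
        DataSource::UnderlyingRepo
    }
}
```

For example, `choose_source(true, CliOverride::None, false)` models "enabled, but no index has been generated" and yields `DataSource::UnderlyingRepo`.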

Speed Differences

Generating the index file on our repo (sizes below) takes about 2 minutes. Updating a package in an existing index, such as after a new build is published, takes a few seconds.

Sample solver time improvements using this indexing

The numbers come from this setup:

  • an origin repo that has 2245 packages, 23540 versions, 82517 builds (11 erroring, about 30% deprecated), and 141 global vars
  • the index loads in ~0.0003 seconds unverified, or ~0.2 seconds verified, and is about 76 MB on disk with trimmed down deprecated builds (107 MB with full deprecated builds)
  • these times are from a rough average of 3-4 runs with index verification disabled
  • a "toolset" below is a set of requests for the named DCC and our typical in-house plugins and tools
Requests        | Solution size | Num.    | Solve time  |  Indexed solve time, no retries
                | (# packages)  | Retries | (seconds)   |  (seconds)
-----------------------------------------------------------------------------------------
python          |        4      |    1    |     0.17    |   0.03 
boost-python    |        8      |    1    |     0.31    |   0.05 
python-torch    |       37      |    2    |     0.58    |   0.15 
widget toolset  |       60      |    2    |     3.44    |   0.48 
katana toolset  |      181      |   10    |    18.32    |   2.97 
nuke toolset    |      280      |   12    |    24.32    |   4.55 (*)
houdini toolset |      211      |   19    |    37.24    |   6.58 (*)
maya toolset    |      403      |   20    |    59.50    |   9.52 (*)

The indexing doesn't have a noticeable impact (to users) on smaller solves. But it allows our larger solves to finish in under 10 seconds, or about 1/6th of the time they currently do. The times marked with (*) are improved further by the changes in PR6: (#1344).
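
The "about 1/6th" figure can be checked directly against the table. The sketch below copies the two timing columns from the table and computes the ratio for each row; `TIMINGS`, `speedup`, and `report` are names made up for this example.

```rust
/// (request, solve seconds, indexed solve seconds), copied from the table above.
pub const TIMINGS: [(&str, f64, f64); 8] = [
    ("python", 0.17, 0.03),
    ("boost-python", 0.31, 0.05),
    ("python-torch", 0.58, 0.15),
    ("widget toolset", 3.44, 0.48),
    ("katana toolset", 18.32, 2.97),
    ("nuke toolset", 24.32, 4.55),
    ("houdini toolset", 37.24, 6.58),
    ("maya toolset", 59.50, 9.52),
];

/// How many times faster the indexed solve is than the plain solve.
pub fn speedup(plain: f64, indexed: f64) -> f64 {
    plain / indexed
}

/// One line per request, with the speedup rounded to one decimal place.
pub fn report() -> Vec<String> {
    TIMINGS
        .iter()
        .map(|(name, plain, indexed)| format!("{name}: {:.1}x", speedup(*plain, *indexed)))
        .collect()
}
```

The four larger toolsets come out between roughly 5x and 6.5x, which is consistent with the "about 1/6th" summary.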

This is number 5 in the chain of PRs for adding indexes to spk solves:

  1. Indexes 1: Change Package and related traits to not return references to fields #1336
  2. Indexes 2: Add new_unchecked() constructors to spk schema objects #1337
  3. Indexes 3: Adds flatbuffers schema and SolverPackageSpec for indexes to spk #1338
  4. Indexes 4: Adds Indexes for SPK repositories #1339
  5. this PR
  6. Indexes 6: Changes version_filter field in index schema #1344
  7. Indexes 7: Adds a lock file around index generation and updates #1354
  8. Indexes 8: Fixes the spk build or mkb crash with indexes enabled #1355
  9. Indexes 9 - Adds messaging on package events to kafka #1356

@dcookspi dcookspi self-assigned this Mar 20, 2026
@dcookspi dcookspi added the enhancement (New feature or request), SPI AOI (Area of interest for SPI), and pr-chain (This PR doesn't target the main branch, don't merge!) labels Mar 20, 2026
@dcookspi dcookspi changed the title from "Indexes 5: Adds 'spk repo index' subcommand for index generation and updates" to "Indexes 5: Adds spk repo index subcommand for index generation and updates" Mar 20, 2026
@dcookspi dcookspi force-pushed the index-5-repo-cmds-config-and-cli-flags branch from 9da2f7c to d88602e Compare March 20, 2026 01:22
@codecov

codecov Bot commented Mar 20, 2026

@dcookspi dcookspi requested review from jrray and rydrman March 20, 2026 19:15
@dcookspi dcookspi force-pushed the index-5-repo-cmds-config-and-cli-flags branch from d88602e to 027a156 Compare March 20, 2026 19:27
@dcookspi dcookspi force-pushed the index-4-indexed-repository-and-fbindex branch from 8530adc to 68bb519 Compare March 20, 2026 19:30
Comment thread crates/spk-cli/cmd-repo/src/cmd_repo.rs Outdated
Comment thread crates/spk-cli/common/src/flags.rs
Comment thread crates/spk-cli/group4/src/cmd_view.rs
@dcookspi dcookspi force-pushed the index-4-indexed-repository-and-fbindex branch from 68bb519 to 1e28bf4 Compare March 20, 2026 19:53
@dcookspi dcookspi force-pushed the index-5-repo-cmds-config-and-cli-flags branch from 027a156 to cdaf8f9 Compare March 20, 2026 19:55
Comment thread crates/spk-storage/src/storage/flatbuffer_index.rs
@dcookspi dcookspi force-pushed the index-4-indexed-repository-and-fbindex branch from 1e28bf4 to 854446a Compare March 21, 2026 01:07
@dcookspi dcookspi force-pushed the index-5-repo-cmds-config-and-cli-flags branch from cdaf8f9 to b70cba2 Compare March 21, 2026 01:08
@dcookspi dcookspi force-pushed the index-4-indexed-repository-and-fbindex branch from 854446a to a58142a Compare March 25, 2026 01:13
@dcookspi dcookspi force-pushed the index-5-repo-cmds-config-and-cli-flags branch from b70cba2 to 4698954 Compare March 25, 2026 01:15
@dcookspi dcookspi force-pushed the index-4-indexed-repository-and-fbindex branch 2 times, most recently from b32582c to 1a43f64 Compare March 27, 2026 19:30
@dcookspi dcookspi force-pushed the index-5-repo-cmds-config-and-cli-flags branch 2 times, most recently from b8f7428 to 3871cc5 Compare March 27, 2026 23:28
Comment thread crates/spk-cli/cmd-repo/src/cmd_repo.rs Outdated
Comment thread crates/spk-cli/common/src/flags.rs Outdated
Comment on lines +1070 to +1077
pub use_indexes: bool,

/// Do not get the package data from the repo index, always use
/// the repo instead. This only applies to non-destructive repo
/// operations. This option can be configured as the default in
/// spk's config file.
#[clap(long, conflicts_with = "use_indexes")]
pub no_indexes: bool,
Collaborator

Why have both? Can a "global" option to disable index use despite what an individual repo is configured to do exist at a higher level?

Collaborator Author

It lets a site (or user) enable indexes in the spk config file (so that is the default for all uses) and disable them for some command line runs, and vice versa: if a site (or user) disables indexes in the spk config file, this lets them be enabled for some command line runs.

We're likely to enable indexes in the config file, and probably use --no-indexes sometimes (if there's an issue as a workaround, or for testing something). But another site might prefer it the other way around for some reason.

Collaborator

I agree with the concept as you describe the usage pattern but still dislike these two options existing here at the same level in the configuration hierarchy.

We already have a configuration pattern of some config property that can be set in a config file but overridden with an env var (or possibly a command-line option). Having these two with opposite meanings creates confusion about which gets set where and which overrules the other.

This could be a case for using an enum instead of one or two bool options:

  • an option that disables indexes globally and overrules any repo-specific setting
  • an option that delegates to repo-specific settings (default)
  • an option that enables indexes globally but doesn't overrule any repo-specific setting (we'd likely use this one in our config file)
  • (maybe) an option that enables indexes globally and overrules any repo-specific setting, but this one feels questionable

I'd be okay with having a flag like no_indexes that acts as an alias / shortcut for picking the option that globally disables indexes, but this wouldn't map to a setting that lives in the config file.
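
To make the four options above concrete, here is a rough sketch of how such an enum could resolve against a per-repo setting. The names `IndexPolicy` and `use_index` are hypothetical, and the choice to treat a repo with no setting as "off" under the delegating option is an assumption made for this example, not something decided in this thread.

```rust
/// Hypothetical global policy, one variant per option in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum IndexPolicy {
    /// Disable indexes globally, overruling any repo-specific setting.
    DisableAll,
    /// Delegate to the repo-specific settings (the default).
    #[default]
    PerRepo,
    /// Enable indexes globally unless a repo-specific setting says otherwise.
    EnableUnlessRepoOverrides,
    /// Enable indexes globally, overruling any repo-specific setting.
    EnableAll,
}

/// Resolve whether one repository should use its index.
/// `repo_setting` is `None` when the repo's config says nothing.
pub fn use_index(policy: IndexPolicy, repo_setting: Option<bool>) -> bool {
    match policy {
        IndexPolicy::DisableAll => false,
        // Assumption for this sketch: an unset repo defaults to no index.
        IndexPolicy::PerRepo => repo_setting.unwrap_or(false),
        IndexPolicy::EnableUnlessRepoOverrides => repo_setting.unwrap_or(true),
        IndexPolicy::EnableAll => true,
    }
}
```

A `no_indexes` style flag would then just be shorthand for selecting `IndexPolicy::DisableAll` on the command line.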

Collaborator Author

As per our discussion today, I've updated the spk config file structure to remove the global index settings and have them on each spk repository section, replaced the command line options with a single flag with an enum of values, and changed the defaults to use an index if one exists, except for the local repo.

@dcookspi dcookspi force-pushed the index-5-repo-cmds-config-and-cli-flags branch 2 times, most recently from 39f3a57 to 50c68f1 Compare April 9, 2026 18:17
@dcookspi
Collaborator Author

Updated the spk repo index --update ... option so it can be specified multiple times, and fixed a couple of bugs related to updating specific package/versions, or a deleted package/version, in the index.

@dcookspi dcookspi requested a review from jrray April 10, 2026 00:54
Comment thread crates/spk-cli/cmd-repo/src/cmd_repo.rs Outdated
if !update.is_empty() {
// Update the existing index for the given package/version
let start = Instant::now();
let idents: Vec<VersionIdent> = update
Collaborator Author

@dcookspi dcookspi Apr 14, 2026

Todo:

  • Swap OptVersionIdent in for VersionIdent for the list of things to update.

Collaborator Author

I've swapped this in and removed the reliance on the default 0.0.0 version.
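
For readers unfamiliar with the difference, the idea is that the version part of an `--update` argument becomes optional instead of silently defaulting to 0.0.0. The sketch below illustrates that with a stand-in type; `UpdateTarget` and `parse_update_target` are hypothetical and are not spk's `OptVersionIdent` or its parser.

```rust
/// Hypothetical stand-in for an identifier with an optional version.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UpdateTarget {
    pub name: String,
    /// `None` means "every version of the package", not version 0.0.0.
    pub version: Option<String>,
}

/// Parse `name` or `name/version` as given to a repeated `--update` option.
pub fn parse_update_target(arg: &str) -> Result<UpdateTarget, String> {
    let (name, version) = match arg.split_once('/') {
        Some((name, version)) => (name, Some(version)),
        None => (arg, None),
    };
    if name.is_empty() {
        return Err(format!("missing package name in '{arg}'"));
    }
    if version == Some("") {
        return Err(format!("missing version after '/' in '{arg}'"));
    }
    Ok(UpdateTarget {
        name: name.to_string(),
        version: version.map(String::from),
    })
}
```

With this shape, `python` and `python/3.9.7` stay distinguishable all the way down to the index update code.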

dcookspi added 17 commits April 30, 2026 15:39
… flatbuffers index and configuration

Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Adds --use-indexes and --no-indexes flags to repository.
Updates resolvo solver to get global variables data from an indexed repository.

Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
…ndles.

Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
…dex use.

Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
… option

Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
times, and fixes bugs when using it to update a specific package
version.

Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
instead of just the ones for the packages being updated.

Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
@dcookspi dcookspi force-pushed the index-5-repo-cmds-config-and-cli-flags branch from 52b5be2 to 7556016 Compare May 1, 2026 17:40

// spk repo index ...
Self::Index { repo, update } => {
// Generate or update an index a repo. The repo must
Collaborator

Suggested change
- // Generate or update an index a repo. The repo must
+ // Generate or update an index in a repo. The repo must

Comment on lines +144 to +146
Err(err) => {
// There isn't an existing index, so generate one from scratch that
// will also include the update package version.
Collaborator

Can you match on a more specific error before assuming the problem is that the index doesn't exist?
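
Something along these lines is what is meant. The error type here is invented for the example (`IndexLoadError`, `Action`, and `plan_update` are not names from the PR), and it assumes the load error can distinguish "no index yet" from other failures.

```rust
/// Hypothetical error type for loading an existing index.
#[derive(Debug, PartialEq, Eq)]
pub enum IndexLoadError {
    /// There is no index file yet.
    NotFound,
    /// The index exists but could not be read or verified.
    Corrupt(String),
}

/// What the update command should do next.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    UpdateExisting,
    GenerateFromScratch,
}

/// Only a missing index triggers a full regeneration.
/// Any other error is passed back to the caller instead of being hidden.
pub fn plan_update(load_result: Result<(), IndexLoadError>) -> Result<Action, IndexLoadError> {
    match load_result {
        Ok(()) => Ok(Action::UpdateExisting),
        Err(IndexLoadError::NotFound) => Ok(Action::GenerateFromScratch),
        Err(other) => Err(other),
    }
}
```

That way a corrupt or unreadable index surfaces as an error rather than being quietly replaced.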

Comment on lines +1009 to +1010
/// The index use command line setting options that can be used to
/// override index usage set in the spk config file. The default for a
Collaborator

Suggested change
- /// The index use command line setting options that can be used to
- /// override index usage set in the spk config file. The default for a
+ /// The options that can be used to
+ /// override index usage set in the spk config file. The default for a

I had a really hard time parsing that sentence.

/// using the matching environment variables.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, ValueEnum)]
pub enum IndexUse {
/// Use the index use settings from the repository configurations
Collaborator

Suggested change
- /// Use the index use settings from the repository configurations
+ /// Use the settings from the repository configurations

It's the "Use ... use" I'm having trouble with.

// Check whether using the indexes for the repos is globally
// disabled by the spk command, such as 'spk repo index' or
// 'spk info'.
let disable_all_index_use = DISABLE_INDEX_USE.load(std::sync::atomic::Ordering::Relaxed);
Collaborator

Using Relaxed here is inconsistent with using Release on line 70, which would imply pairing with Acquire here. However, since this is only for doing an atomic read/write, it is fine to use Relaxed in both places. Either way, this load is not strictly guaranteed to see a prior store. Use Mutex<bool> for that guarantee.
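
For reference, the consistent Relaxed version would look roughly like this. It is a minimal sketch: the static is a stand-in named `DISABLE_INDEX_USE_SKETCH` so it is not mistaken for the real `DISABLE_INDEX_USE`, and the two helper functions are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Stand-in for the global flag in the diff (the name is changed on purpose).
static DISABLE_INDEX_USE_SKETCH: AtomicBool = AtomicBool::new(false);

/// Called by commands that must never read from an index.
pub fn disable_index_use() {
    // Relaxed is enough here: the flag is a standalone value and does
    // not publish any other memory that readers depend on.
    DISABLE_INDEX_USE_SKETCH.store(true, Ordering::Relaxed);
}

/// Checked before opening a repository through its index.
pub fn index_use_disabled() -> bool {
    DISABLE_INDEX_USE_SKETCH.load(Ordering::Relaxed)
}
```

Within a single thread the load after the store always sees `true`. Across threads, ordering relative to other work has to come from somewhere else, such as a thread join or a lock.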

Comment thread docs/ref/indexes.md
packages at a once, e.g.:
`spk repo index -r origin --update python --update zlib`

The `--update` option take a package/version as well. This lets the
Collaborator

Suggested change
- The `--update` option take a package/version as well. This lets the
+ The `--update` option takes a package/version as well. This lets the

Comment thread docs/ref/indexes.md

The `--update` option take a package/version as well. This lets the
update be restricted to a specific version of a package. This can make
for shorter update times for packages with large numbers of versions,
Collaborator

Suggested change
- for shorter update times for packages with large numbers of versions,
+ for shorter update times for packages with a large number of versions,

Comment thread docs/ref/indexes.md
Those commands will read in the existing index for the repository and
update the versions and builds of the named package in the index. It
is faster than generating an index from scratch. It has to be run once
per repository to update the given package or packages in that
Collaborator

Suggested change
- per repository to update the given package or packages in that
+ per repository to update the given package or packages in each

Comment thread docs/ref/indexes.md
See `spk repo index -h` for more details.


## Index vs Repository mismatches - updates are important
Collaborator

The heading capitalization in this document is inconsistent (or I'm not seeing the pattern). Please pick one style.

Comment thread docs/ref/indexes.md
spfs fs repository.


### Structure and types in SPK
Collaborator

Another consistency nit, we can't decide if it's Spk or SPK.

Labels

enhancement New feature or request pr-chain This PR doesn't target the main branch, don't merge! SPI AOI Area of interest for SPI

2 participants