Indexes 5: Adds spk repo index subcommand for index generation and updates#1340
Indexes 5: Adds spk repo index subcommand for index generation and updates#1340dcookspi wants to merge 17 commits intoindex-4-indexed-repository-and-fbindexfrom
spk repo index subcommand for index generation and updates#1340Conversation
spk repo index subcommand for index generation and updates
9da2f7c to
d88602e
Compare
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
d88602e to
027a156
Compare
8530adc to
68bb519
Compare
68bb519 to
1e28bf4
Compare
027a156 to
cdaf8f9
Compare
1e28bf4 to
854446a
Compare
cdaf8f9 to
b70cba2
Compare
854446a to
a58142a
Compare
b70cba2 to
4698954
Compare
b32582c to
1a43f64
Compare
b8f7428 to
3871cc5
Compare
| pub use_indexes: bool, | ||
|
|
||
| /// Do not get the package data from the repo index, always use | ||
| /// the repo instead. This only applies to non-destructive repo | ||
| /// operations. This option can be configured as the default in | ||
| /// spk's config file. | ||
| #[clap(long, conflicts_with = "use_indexes")] | ||
| pub no_indexes: bool, |
There was a problem hiding this comment.
Why have both? Can a "global" option to disable index use despite what an individual repo is configured to do exist at a higher level?
There was a problem hiding this comment.
It lets a site (or user) enable indexes in the spk config file (so the default for all uses), and disable them for some command line runs, and visa versa - if a site (or user( disables indexes in the spk config file, this lets them be enabled for some command line runs.
We're likely to enable indexes in the config file, and probably use --no-indexes sometimes (if there's an issue as a workaround, or for testing something). But another site might prefer it the other way around for some reason.
There was a problem hiding this comment.
I agree with the concept as you describe the usage pattern but still dislike these two options existing here at the same level in the configuration hierarchy.
We already have a configuration pattern of some config property that can be set in a config file but overridden with an env var (or possibly a command-line option). Having these two with opposite meanings creates confusion about which gets set where and which overrules the other.
This could be a case for needing something other than one or two bool options but use an enum instead:
- an option that disables indexes globally and overrules any repo-specific setting
- an option that delegates to repo-specific settings (default)
- an option that enables indexes globally but doesn't overrule any repo-specific setting (we'd likely use this one in our config file)
- (maybe) an option that enables indexes globally and overrules any repo-specific setting, but this one feels questionable
I'd be okay with having a flag like no_indexes that acts as an alias / shortcut for picking the option that globally disables indexes, but this wouldn't map to a setting that lives in the config file.
There was a problem hiding this comment.
As per our discussion today, I've updated the spk config file structure to remove the global index settings and have them on each spk repository section, replaced the command line options with a single flag with an enum of values, and changed the defaults to use an index if one exists, except for the local repo.
39f3a57 to
50c68f1
Compare
|
Updated the |
| if !update.is_empty() { | ||
| // Update the existing index for the given package/version | ||
| let start = Instant::now(); | ||
| let idents: Vec<VersionIdent> = update |
There was a problem hiding this comment.
Todo:
- Swap
OptVersionIdentin forVersionIdentfor the list of things to update.
There was a problem hiding this comment.
I've swapped this in and removed the reliance on the default 0.0.0 version.
e5ee8b6 to
07eb6a8
Compare
5540bc7 to
ab04c9f
Compare
07eb6a8 to
a3c15eb
Compare
… flabuffers index and configuration Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Adds --use-indexes and --no-indexes flags to repository. Updates resolvo solver to get global variables data from an indexed repository. Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
…ndles. Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
…dex use. Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
… option Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
times, and fixes bugs when using it to update a specific package version. Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
instead of just the ones for the packages being updated. Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
Signed-off-by: David Gilligan-Cook <dcook@imageworks.com>
52b5be2 to
7556016
Compare
|
|
||
| // spk repo index ... | ||
| Self::Index { repo, update } => { | ||
| // Generate or update an index a repo. The repo must |
There was a problem hiding this comment.
| // Generate or update an index a repo. The repo must | |
| // Generate or update an index in a repo. The repo must |
| Err(err) => { | ||
| // There isn't an existing index, so generate one from scratch that | ||
| // will also include the update package version. |
There was a problem hiding this comment.
Can you match on a more specific error before assuming the problem is that the index doesn't exist?
| /// The index use command line setting options that can be used to | ||
| /// override index usage set in the spk config file. The default for a |
There was a problem hiding this comment.
| /// The index use command line setting options that can be used to | |
| /// override index usage set in the spk config file. The default for a | |
| /// The options that can be used to | |
| /// override index usage set in the spk config file. The default for a |
I had a really hard time parsing that sentence.
| /// using the matching environment variables. | ||
| #[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, ValueEnum)] | ||
| pub enum IndexUse { | ||
| /// Use the index use settings from the repository configurations |
There was a problem hiding this comment.
| /// Use the index use settings from the repository configurations | |
| /// Use the settings from the repository configurations |
It's the "Use ... use" I'm having trouble with.
| // Check whether using the indexes for the repos is globally | ||
| // disabled by the spk command, such as 'spk repo index' or | ||
| // 'spk info'. | ||
| let disable_all_index_use = DISABLE_INDEX_USE.load(std::sync::atomic::Ordering::Relaxed); |
There was a problem hiding this comment.
Using Relaxed here is inconsistent with using Release on line 70, which would imply pairing with Acquire here. However, since this is only for doing an atomic read/write, it is fine to use Relaxed in both places. Either way, this load is not strictly guaranteed to see a prior store. Use Mutex<bool> for that guarantee.
| packages at a once, e.g.: | ||
| `spk repo index -r origin --update python --update zlib` | ||
|
|
||
| The `--update` option take a package/version as well. This lets the |
There was a problem hiding this comment.
| The `--update` option take a package/version as well. This lets the | |
| The `--update` option takes a package/version as well. This lets the |
|
|
||
| The `--update` option take a package/version as well. This lets the | ||
| update be restricted to a specific version of a package. This can make | ||
| for shorter update times for packages with large numbers of versions, |
There was a problem hiding this comment.
| for shorter update times for packages with large numbers of versions, | |
| for shorter update times for packages with a large number of versions, |
| Those commands will read in the existing index for the repository and | ||
| update the versions and builds of the named package in the index. It | ||
| is faster than generating an index from scratch. It has to be run once | ||
| per repository to update the given package or packages in that |
There was a problem hiding this comment.
| per repository to update the given package or packages in that | |
| per repository to update the given package or packages in each |
| See `spk repo index -h` for more details. | ||
|
|
||
|
|
||
| ## Index vs Repository mismatches - updates are important |
There was a problem hiding this comment.
The heading capitalization in this document is inconsistent (or I'm not seeing the pattern). Please pick one style.
| spfs fs repository. | ||
|
|
||
|
|
||
| ### Structure and types in SPK |
There was a problem hiding this comment.
Another consistency nit, we can't decide if it's Spk or SPK.
This adds a new
repo indexsubcommand to spk for index generation and updates. It adds the--use-indexesand--no-indexesflags for repository index usage. This also updatesresolvosolver to get global variables data from an indexed repository. This allowsresolvoto solve without needing to restart its solves.Indexing
The index is designed to help the solvers with solve times. It doesn't contain enough data to help with other spk operations like building and testing a package.
Indexing can be enabled or disabled in the spk config file. If indexing is enabled, you have to generate an index, with
spk repo index, prior to trying to use it. They are not generated on the fly (outside of automated tests).To generate an index (for the origin):
spk repo index --disable-repo localTo update an existing index, e.g. after a new python package was published:
spk repo index --disable-repo local --update pythonThe flatbuffer index data is stored in a file in the underlying spfs repo in a
index/spk/sub-directory.If index use is enabled in the config file, it can be disabled with the
--no-indexescommand line flag. If index use is disabled by default, it can be enbled with the--use-indexesflag. If index use is enabled, but no index has been generated, spk will fallback to using the underlying repo directly (it acts as it would before this change).Speed Diferences
Generating the index file on our repo (sizes below) takes about 2 minutes. Updating a package in an existing index, such as after a new build is published, takes a few seconds.
Sample solver time improvements using this indexing
The numbers come from this setup:
The indexing doesn't have a noticable impact (to users) on smaller solves. But it allows our larger solves to finish in under 10 seconds, or about 1/6th of the time they currently do. The times marked with (*) are improved further by the changes in PR6: (#1344).
This is the final 5 of 5 chained PRs for adding indexes to spk solves:
new_unchecked()constructors to spk schema objects #1337version_filterfield in index schema #1344