-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Split runfile tests up to improve helix usage #53452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
marcpopMSFT
wants to merge
4
commits into
main
Choose a base branch
from
marcpopMSFT-improvebuildperf
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
046c304
Split RunFileTests into 5 classes to improve Helix shard distribution
marcpopMSFT ff55904
Save off the performance analysis that was done
marcpopMSFT 4d6a104
Merge remote-tracking branch 'origin/main' into marcpopMSFT-improvebu…
marcpopMSFT 3cbbbc7
Remove unused using directives from split RunFileTests files
marcpopMSFT File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| # SDK PR Build Performance Analysis | ||
|
|
||
| ## Overview | ||
|
|
||
| Analysis of the SDK public PR build pipeline (definition 101, `dotnet-sdk-public-ci`) to identify performance improvement opportunities, focusing on Helix test shard distribution. | ||
|
|
||
| ## Pipeline Statistics (20 recent PR builds) | ||
|
|
||
| | Metric | Value | | ||
| |---|---| | ||
| | Execution time p50 | 80.6 min | | ||
| | Execution time mean | 81.2 min | | ||
| | Execution time range | 50–116 min | | ||
| | Queue time p50 | 0.5 min | | ||
| | Queue time mean | 26.2 min (severe outliers) | | ||
|
|
||
| ## Critical Path | ||
|
|
||
| The **TestBuild: linux (x64)** job is consistently the critical path, taking 95.4 min in the slow build vs 56.9 min in the fast build. Within this job, Helix work items run across ~100 machines with ~7.4x parallelism, meaning long individual shards directly extend the critical path. | ||
|
|
||
| ## Key Findings | ||
|
|
||
| ### dotnet.Tests.dll Shard Imbalance | ||
|
|
||
| The `AssemblyScheduler` partitions test assemblies into Helix work items by IL-visible public method count per class, with a limit of 16 methods per shard. It never splits a class across shards. | ||
|
|
||
| **Problem:** `RunFileTests` was a single class with ~150 IL methods. The scheduler placed it entirely in shard 19 alongside `RunCommandTests` (4 methods), creating a 42-minute shard while all other dotnet.Tests.dll shards completed in under 8 minutes. | ||
|
|
||
| | Shard | Duration | Tests | Classes | | ||
| |---|---|---|---| | ||
| | Shard 19 | **42.1 min** | 234 | RunCommandTests, RunFileTests | | ||
| | Shard 11 | 7.0 min | 169 | SdkInfoProviderTests, WorkloadsInfoProviderTests, GivenDotnetPackageAdd | | ||
| | Shard 16 | 4.3 min | 62 | GivenDotnetRootEnv, GivenDotnetRunBuildsCsproj | | ||
| | All others | <3 min each | varies | varies | | ||
|
|
||
| ### Microsoft.NET.Publish.Tests.dll Shard Imbalance | ||
|
|
||
| Similarly, the `GivenThatWeWantToRunILLink` classes dominate: | ||
|
|
||
| | Shard | Duration | Tests | Classes | | ||
| |---|---|---|---| | ||
| | Shard 9 | **21.2 min** | 153 | ILLink2, ILLink3 | | ||
| | Shard 8 | **16.7 min** | 133 | PublishWithIfDifferent, PublishWithoutConflicts, ILLink1 | | ||
| | Shard 5 | 7.9 min | 85 | PublishASelfContainedApp, PublishASingleFileApp | | ||
| | All others | <6 min each | varies | varies | | ||
|
|
||
| ## Improvement Opportunities (Ranked) | ||
|
|
||
| | # | Opportunity | Expected Savings | Effort | | ||
| |---|---|---|---| | ||
| | 1 | Reduce queue time outliers (infra) | ~25 min (p90) | Medium | | ||
| | 2 | **Split RunFileTests into multiple classes** | **~30 min** | **Low** | | ||
| | 3 | Further split ILLink classes | ~15 min | Low | | ||
| | 4 | Add time-aware scheduling to AssemblyScheduler | ~10 min | Medium | | ||
| | 5 | Parallelize restore + build within legs | ~5 min | Medium | | ||
| | 6 | Cache NuGet packages across Helix work items | ~3 min | Medium | | ||
| | 7 | Reduce TestBuild leg count on non-critical platforms | ~10 min | Low | | ||
| | 8 | Skip unchanged test assemblies | Variable | High | | ||
| | 9 | Pre-warm Helix machines | ~2 min | Medium | | ||
| | 10 | Binary log analysis for build-time targets | ~5 min | Medium | | ||
|
|
||
| ## Implementation: RunFileTests Split (Opportunity #2) | ||
|
|
||
| ### Problem | ||
|
|
||
| `RunFileTests` was a 6,210-line class with ~150 public methods. The `AssemblyScheduler` (16-method limit) cannot split within a class, so it all ended up in one shard taking 42 minutes. | ||
|
|
||
| ### Solution | ||
|
|
||
| Split into 5 test classes inheriting from a shared `RunFileTestBase`: | ||
|
|
||
| | Class | Tests | Description | | ||
| |---|---|---| | ||
| | `RunFileTests` | 31 | Path resolution, precedence, stdin, multifile | | ||
| | `RunFileTests_BuildOptions` | 28 | Working dir, Dir.Build.props, arguments, binary log, verbosity, resources | | ||
| | `RunFileTests_BuildCommands` | 28 | Restore, build, publish, pack, clean, launch profiles | | ||
| | `RunFileTests_Directives` | 24 | Defines, package/SDK/project refs, include directive, user secrets, CSC arguments | | ||
| | `RunFileTests_CscOnlyAndApi` | 42 | Up-to-date checks, csc-only mode, API tests, entry points, MSBuild get | | ||
|
|
||
| `RunFileTestBase` holds shared static fields, helper methods (`DirectiveError`, `VerifyBinLogEvaluationDataCount`), and the `Build()` instance method. | ||
|
|
||
| **Total test count preserved:** 153 methods (123 Fact + 27 Theory + 3 PlatformSpecificFact). | ||
|
|
||
| ### Expected Impact | ||
|
|
||
| The single 42-minute shard should distribute across ~5 shards of ~8–10 minutes each, reducing the critical path of **TestBuild: linux (x64)** by approximately 30 minutes. | ||
|
|
||
| ### How the Scheduler Works | ||
|
|
||
| The `AssemblyScheduler` (in `test/HelixTasks/AssemblyScheduler.cs`) scans IL metadata for public test classes and their method counts. Classes are sorted alphabetically, then accumulated into partitions. When the running method count reaches the limit (16 for TestBuild, 32 for FullFramework), a new partition is started. The class that triggers the limit is included in the current partition, not moved to the next. | ||
|
|
||
| `SDKCustomCreateXUnitWorkItemsWithTestExclusion` (in `test/HelixTasks/`) creates Helix work items from these partitions, using `--filter "ClassName1|ClassName2"` for each shard. | ||
|
|
||
| ## Next Steps | ||
|
|
||
| - **Opportunity #3:** Further split `GivenThatWeWantToRunILLink1/2/3` classes in `Microsoft.NET.Publish.Tests` to reduce the 21-minute and 17-minute shards. | ||
| - **Opportunity #4:** Consider adding time-based weighting to the `AssemblyScheduler` to account for tests that are individually slow (e.g., ILLink tests that build and publish full apps). | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,195 @@ | ||
| // Licensed to the .NET Foundation under one or more agreements. | ||
| // The .NET Foundation licenses this file to you under the MIT license. | ||
|
|
||
| using Microsoft.Build.Framework; | ||
| using Microsoft.Build.Logging.StructuredLogger; | ||
| using Microsoft.DotNet.Cli.Commands; | ||
| using Microsoft.DotNet.Cli.Commands.Run; | ||
| using Microsoft.DotNet.Cli.Utils; | ||
| using Microsoft.DotNet.FileBasedPrograms; | ||
| using Microsoft.DotNet.ProjectTools; | ||
|
|
||
| namespace Microsoft.DotNet.Cli.Run.Tests; | ||
|
|
||
| public abstract class RunFileTestBase(ITestOutputHelper log) : SdkTest(log) | ||
| { | ||
| internal static string s_includeExcludeDefaultKnownExtensions | ||
| => field ??= string.Join(", ", CSharpDirective.IncludeOrExclude.DefaultMapping.Select(static e => e.Extension)); | ||
|
|
||
| internal static readonly string s_program = /* lang=C#-Test */ """ | ||
| if (args.Length > 0) | ||
| { | ||
| Console.WriteLine("echo args:" + string.Join(";", args)); | ||
| } | ||
| Console.WriteLine("Hello from " + System.Reflection.Assembly.GetExecutingAssembly().GetName().Name); | ||
| #if !DEBUG | ||
| Console.WriteLine("Release config"); | ||
| #endif | ||
| #if CUSTOM_DEFINE | ||
| Console.WriteLine("Custom define"); | ||
| #endif | ||
| """; | ||
|
|
||
| internal static readonly string s_programDependingOnUtil = /* lang=C#-Test */ """ | ||
| if (args.Length > 0) | ||
| { | ||
| Console.WriteLine("echo args:" + string.Join(";", args)); | ||
| } | ||
| Console.WriteLine("Hello, " + Util.GetMessage()); | ||
| """; | ||
|
|
||
| internal static readonly string s_util = /* lang=C#-Test */ """ | ||
| static class Util | ||
| { | ||
| public static string GetMessage() | ||
| { | ||
| return "String from Util"; | ||
| } | ||
| } | ||
| """; | ||
|
|
||
| internal static readonly string s_programReadingEmbeddedResource = /* lang=C#-Test */ """ | ||
| var assembly = System.Reflection.Assembly.GetExecutingAssembly(); | ||
| var resourceName = assembly.GetManifestResourceNames().SingleOrDefault(); | ||
|
|
||
| if (resourceName is null) | ||
| { | ||
| Console.WriteLine("Resource not found"); | ||
| return; | ||
| } | ||
|
|
||
| using var stream = assembly.GetManifestResourceStream(resourceName)!; | ||
| using var reader = new System.Resources.ResourceReader(stream); | ||
| Console.WriteLine(reader.Cast<System.Collections.DictionaryEntry>().Single()); | ||
| """; | ||
|
|
||
| internal static readonly string s_resx = """ | ||
| <root> | ||
| <data name="MyString"> | ||
| <value>TestValue</value> | ||
| </data> | ||
| </root> | ||
| """; | ||
|
|
||
| internal static readonly string s_consoleProject = $""" | ||
| <Project Sdk="Microsoft.NET.Sdk"> | ||
| <PropertyGroup> | ||
| <OutputType>Exe</OutputType> | ||
| <TargetFramework>{ToolsetInfo.CurrentTargetFramework}</TargetFramework> | ||
| <ImplicitUsings>enable</ImplicitUsings> | ||
| </PropertyGroup> | ||
| </Project> | ||
| """; | ||
|
|
||
| internal static readonly string s_launchSettings = """ | ||
| { | ||
| "profiles": { | ||
| "TestProfile1": { | ||
| "commandName": "Project", | ||
| "environmentVariables": { | ||
| "Message": "TestProfileMessage1" | ||
| } | ||
| }, | ||
| "TestProfile2": { | ||
| "commandName": "Project", | ||
| "environmentVariables": { | ||
| "Message": "TestProfileMessage2" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| """; | ||
|
|
||
| /// <summary> | ||
| /// Used when we need an out-of-tree base test directory to avoid having implicit build files | ||
| /// like Directory.Build.props in scope and negating the optimizations we want to test. | ||
| /// </summary> | ||
| internal static string OutOfTreeBaseDirectory => field ??= PrepareOutOfTreeBaseDirectory(); | ||
|
|
||
| internal static bool HasCaseInsensitiveFileSystem | ||
| { | ||
| get | ||
| { | ||
| return RuntimeInformation.IsOSPlatform(OSPlatform.Windows) | ||
| || RuntimeInformation.IsOSPlatform(OSPlatform.OSX); | ||
| } | ||
| } | ||
|
|
||
| /// <inheritdoc cref="OutOfTreeBaseDirectory"/> | ||
| private static string PrepareOutOfTreeBaseDirectory() | ||
| { | ||
| string outOfTreeBaseDirectory = TestPathUtility.ResolveTempPrefixLink(Path.Join(Path.GetTempPath(), "dotnetSdkTests")); | ||
| Directory.CreateDirectory(outOfTreeBaseDirectory); | ||
|
|
||
| // Create NuGet.config in our out-of-tree base directory. | ||
| var sourceNuGetConfig = Path.Join(SdkTestContext.Current.TestExecutionDirectory, "NuGet.config"); | ||
| var targetNuGetConfig = Path.Join(outOfTreeBaseDirectory, "NuGet.config"); | ||
| File.Copy(sourceNuGetConfig, targetNuGetConfig, overwrite: true); | ||
|
|
||
| // Check there are no implicit build files that would prevent testing optimizations. | ||
| VirtualProjectBuildingCommand.CollectImplicitBuildFiles(new DirectoryInfo(outOfTreeBaseDirectory), [], out var exampleMSBuildFile); | ||
| exampleMSBuildFile.Should().BeNull(because: "there should not be any implicit build files in the temp directory or its parents " + | ||
| "so we can test optimizations that would be disabled with implicit build files present"); | ||
|
|
||
| return outOfTreeBaseDirectory; | ||
| } | ||
|
|
||
| internal static string DirectiveError(string path, int line, string messageFormat, params ReadOnlySpan<object> args) | ||
| { | ||
| return $"{path}({line}): {FileBasedProgramsResources.DirectiveError}: {string.Format(messageFormat, args)}"; | ||
| } | ||
|
|
||
| internal static void VerifyBinLogEvaluationDataCount(string binaryLogPath, int expectedCount) | ||
| { | ||
| var records = BinaryLog.ReadRecords(binaryLogPath).ToList(); | ||
| records.Count(static r => r.Args is ProjectEvaluationStartedEventArgs).Should().Be(expectedCount); | ||
| records.Count(static r => r.Args is ProjectEvaluationFinishedEventArgs).Should().Be(expectedCount); | ||
| } | ||
|
|
||
| private protected void Build( | ||
| TestDirectory testInstance, | ||
| BuildLevel expectedLevel, | ||
| ReadOnlySpan<string> args = default, | ||
| string expectedOutput = "Hello from Program", | ||
| string programFileName = "Program.cs", | ||
| string? workDir = null, | ||
| Func<TestCommand, TestCommand>? customizeCommand = null) | ||
| { | ||
| string prefix = expectedLevel switch | ||
| { | ||
| BuildLevel.None => CliCommandStrings.NoBinaryLogBecauseUpToDate + Environment.NewLine, | ||
| BuildLevel.Csc => CliCommandStrings.NoBinaryLogBecauseRunningJustCsc + Environment.NewLine, | ||
| BuildLevel.All => string.Empty, | ||
| _ => throw new ArgumentOutOfRangeException(paramName: nameof(expectedLevel)), | ||
| }; | ||
|
|
||
| var command = new DotnetCommand(Log, ["run", programFileName, "-bl", .. args]) | ||
| .WithWorkingDirectory(workDir ?? testInstance.Path); | ||
|
|
||
| if (customizeCommand != null) | ||
| { | ||
| command = customizeCommand(command); | ||
| } | ||
|
|
||
| command.Execute() | ||
| .Should().Pass() | ||
| .And.HaveStdOut(prefix + expectedOutput); | ||
|
|
||
| var binlogs = new DirectoryInfo(workDir ?? testInstance.Path) | ||
| .EnumerateFiles("*.binlog", SearchOption.TopDirectoryOnly); | ||
|
|
||
| binlogs.Select(f => f.Name) | ||
| .Should().BeEquivalentTo( | ||
| expectedLevel switch | ||
| { | ||
| BuildLevel.None or BuildLevel.Csc => [], | ||
| BuildLevel.All => ["msbuild.binlog"], | ||
| _ => throw new ArgumentOutOfRangeException(paramName: nameof(expectedLevel), message: expectedLevel.ToString()), | ||
| }); | ||
|
|
||
| foreach (var binlog in binlogs) | ||
| { | ||
| binlog.Delete(); | ||
| } | ||
| } | ||
| } |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this doc file be committed given it's probably outdated now?