Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
97 changes: 97 additions & 0 deletions documentation/general/pr-build-performance-analysis.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# SDK PR Build Performance Analysis

## Overview

Analysis of the SDK public PR build pipeline (definition 101, `dotnet-sdk-public-ci`) to identify performance improvement opportunities, focusing on Helix test shard distribution.

## Pipeline Statistics (20 recent PR builds)

| Metric | Value |
|---|---|
| Execution time p50 | 80.6 min |
| Execution time mean | 81.2 min |
| Execution time range | 50–116 min |
| Queue time p50 | 0.5 min |
| Queue time mean | 26.2 min (severe outliers) |

## Critical Path

The **TestBuild: linux (x64)** job is consistently the critical path, taking 95.4 min in the slow build vs 56.9 min in the fast build. Within this job, Helix work items run across ~100 machines with ~7.4x parallelism, meaning long individual shards directly extend the critical path.

## Key Findings

### dotnet.Tests.dll Shard Imbalance

The `AssemblyScheduler` partitions test assemblies into Helix work items by IL-visible public method count per class, with a limit of 16 methods per shard. It never splits a class across shards.

**Problem:** `RunFileTests` was a single class with ~150 IL methods. The scheduler placed it entirely in shard 19 alongside `RunCommandTests` (4 methods), creating a 42-minute shard while all other dotnet.Tests.dll shards completed in under 8 minutes.

| Shard | Duration | Tests | Classes |
|---|---|---|---|
| Shard 19 | **42.1 min** | 234 | RunCommandTests, RunFileTests |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this doc file be committed given it's probably outdated now?

| Shard 11 | 7.0 min | 169 | SdkInfoProviderTests, WorkloadsInfoProviderTests, GivenDotnetPackageAdd |
| Shard 16 | 4.3 min | 62 | GivenDotnetRootEnv, GivenDotnetRunBuildsCsproj |
| All others | <3 min each | varies | varies |

### Microsoft.NET.Publish.Tests.dll Shard Imbalance

Similarly, the `GivenThatWeWantToRunILLink` classes dominate:

| Shard | Duration | Tests | Classes |
|---|---|---|---|
| Shard 9 | **21.2 min** | 153 | ILLink2, ILLink3 |
| Shard 8 | **16.7 min** | 133 | PublishWithIfDifferent, PublishWithoutConflicts, ILLink1 |
| Shard 5 | 7.9 min | 85 | PublishASelfContainedApp, PublishASingleFileApp |
| All others | <6 min each | varies | varies |

## Improvement Opportunities (Ranked)

| # | Opportunity | Expected Savings | Effort |
|---|---|---|---|
| 1 | Reduce queue time outliers (infra) | ~25 min (p90) | Medium |
| 2 | **Split RunFileTests into multiple classes** | **~30 min** | **Low** |
| 3 | Further split ILLink classes | ~15 min | Low |
| 4 | Add time-aware scheduling to AssemblyScheduler | ~10 min | Medium |
| 5 | Parallelize restore + build within legs | ~5 min | Medium |
| 6 | Cache NuGet packages across Helix work items | ~3 min | Medium |
| 7 | Reduce TestBuild leg count on non-critical platforms | ~10 min | Low |
| 8 | Skip unchanged test assemblies | Variable | High |
| 9 | Pre-warm Helix machines | ~2 min | Medium |
| 10 | Binary log analysis for build-time targets | ~5 min | Medium |

## Implementation: RunFileTests Split (Opportunity #2)

### Problem

`RunFileTests` was a 6,210-line class with ~150 public methods. The `AssemblyScheduler` (16-method limit) cannot split within a class, so it all ended up in one shard taking 42 minutes.

### Solution

Split into 5 test classes inheriting from a shared `RunFileTestBase`:

| Class | Tests | Description |
|---|---|---|
| `RunFileTests` | 31 | Path resolution, precedence, stdin, multifile |
| `RunFileTests_BuildOptions` | 28 | Working dir, Dir.Build.props, arguments, binary log, verbosity, resources |
| `RunFileTests_BuildCommands` | 28 | Restore, build, publish, pack, clean, launch profiles |
| `RunFileTests_Directives` | 24 | Defines, package/SDK/project refs, include directive, user secrets, CSC arguments |
| `RunFileTests_CscOnlyAndApi` | 42 | Up-to-date checks, csc-only mode, API tests, entry points, MSBuild get |

`RunFileTestBase` holds shared static fields, helper methods (`DirectiveError`, `VerifyBinLogEvaluationDataCount`), and the `Build()` instance method.

**Total test count preserved:** 153 methods (123 Fact + 27 Theory + 3 PlatformSpecificFact).

### Expected Impact

The single 42-minute shard should distribute across ~5 shards of ~8–10 minutes each, reducing the critical path of **TestBuild: linux (x64)** by approximately 30 minutes.

### How the Scheduler Works

The `AssemblyScheduler` (in `test/HelixTasks/AssemblyScheduler.cs`) scans IL metadata for public test classes and their method counts. Classes are sorted alphabetically, then accumulated into partitions. When the running method count reaches the limit (16 for TestBuild, 32 for FullFramework), a new partition is started. The class that triggers the limit is included in the current partition, not moved to the next.

`SDKCustomCreateXUnitWorkItemsWithTestExclusion` (in `test/HelixTasks/`) creates Helix work items from these partitions, using `--filter "ClassName1|ClassName2"` for each shard.

## Next Steps

- **Opportunity #3:** Further split `GivenThatWeWantToRunILLink1/2/3` classes in `Microsoft.NET.Publish.Tests` to reduce the 21-minute and 17-minute shards.
- **Opportunity #4:** Consider adding time-based weighting to the `AssemblyScheduler` to account for tests that are individually slow (e.g., ILLink tests that build and publish full apps).
195 changes: 195 additions & 0 deletions test/dotnet.Tests/CommandTests/Run/RunFileTestBase.cs
Original file line number Diff line number Diff line change
@@ -0,0 +1,195 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.

using Microsoft.Build.Framework;
using Microsoft.Build.Logging.StructuredLogger;
using Microsoft.DotNet.Cli.Commands;
using Microsoft.DotNet.Cli.Commands.Run;
using Microsoft.DotNet.Cli.Utils;
using Microsoft.DotNet.FileBasedPrograms;
using Microsoft.DotNet.ProjectTools;

namespace Microsoft.DotNet.Cli.Run.Tests;

public abstract class RunFileTestBase(ITestOutputHelper log) : SdkTest(log)
{
internal static string s_includeExcludeDefaultKnownExtensions
=> field ??= string.Join(", ", CSharpDirective.IncludeOrExclude.DefaultMapping.Select(static e => e.Extension));

internal static readonly string s_program = /* lang=C#-Test */ """
if (args.Length > 0)
{
Console.WriteLine("echo args:" + string.Join(";", args));
}
Console.WriteLine("Hello from " + System.Reflection.Assembly.GetExecutingAssembly().GetName().Name);
#if !DEBUG
Console.WriteLine("Release config");
#endif
#if CUSTOM_DEFINE
Console.WriteLine("Custom define");
#endif
""";

internal static readonly string s_programDependingOnUtil = /* lang=C#-Test */ """
if (args.Length > 0)
{
Console.WriteLine("echo args:" + string.Join(";", args));
}
Console.WriteLine("Hello, " + Util.GetMessage());
""";

internal static readonly string s_util = /* lang=C#-Test */ """
static class Util
{
public static string GetMessage()
{
return "String from Util";
}
}
""";

internal static readonly string s_programReadingEmbeddedResource = /* lang=C#-Test */ """
var assembly = System.Reflection.Assembly.GetExecutingAssembly();
var resourceName = assembly.GetManifestResourceNames().SingleOrDefault();

if (resourceName is null)
{
Console.WriteLine("Resource not found");
return;
}

using var stream = assembly.GetManifestResourceStream(resourceName)!;
using var reader = new System.Resources.ResourceReader(stream);
Console.WriteLine(reader.Cast<System.Collections.DictionaryEntry>().Single());
""";

internal static readonly string s_resx = """
<root>
<data name="MyString">
<value>TestValue</value>
</data>
</root>
""";

internal static readonly string s_consoleProject = $"""
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>{ToolsetInfo.CurrentTargetFramework}</TargetFramework>
<ImplicitUsings>enable</ImplicitUsings>
</PropertyGroup>
</Project>
""";

internal static readonly string s_launchSettings = """
{
"profiles": {
"TestProfile1": {
"commandName": "Project",
"environmentVariables": {
"Message": "TestProfileMessage1"
}
},
"TestProfile2": {
"commandName": "Project",
"environmentVariables": {
"Message": "TestProfileMessage2"
}
}
}
}
""";

/// <summary>
/// Used when we need an out-of-tree base test directory to avoid having implicit build files
/// like Directory.Build.props in scope and negating the optimizations we want to test.
/// </summary>
internal static string OutOfTreeBaseDirectory => field ??= PrepareOutOfTreeBaseDirectory();

internal static bool HasCaseInsensitiveFileSystem
{
get
{
return RuntimeInformation.IsOSPlatform(OSPlatform.Windows)
|| RuntimeInformation.IsOSPlatform(OSPlatform.OSX);
}
}

/// <inheritdoc cref="OutOfTreeBaseDirectory"/>
private static string PrepareOutOfTreeBaseDirectory()
{
string outOfTreeBaseDirectory = TestPathUtility.ResolveTempPrefixLink(Path.Join(Path.GetTempPath(), "dotnetSdkTests"));
Directory.CreateDirectory(outOfTreeBaseDirectory);

// Create NuGet.config in our out-of-tree base directory.
var sourceNuGetConfig = Path.Join(SdkTestContext.Current.TestExecutionDirectory, "NuGet.config");
var targetNuGetConfig = Path.Join(outOfTreeBaseDirectory, "NuGet.config");
File.Copy(sourceNuGetConfig, targetNuGetConfig, overwrite: true);

// Check there are no implicit build files that would prevent testing optimizations.
VirtualProjectBuildingCommand.CollectImplicitBuildFiles(new DirectoryInfo(outOfTreeBaseDirectory), [], out var exampleMSBuildFile);
exampleMSBuildFile.Should().BeNull(because: "there should not be any implicit build files in the temp directory or its parents " +
"so we can test optimizations that would be disabled with implicit build files present");

return outOfTreeBaseDirectory;
}

internal static string DirectiveError(string path, int line, string messageFormat, params ReadOnlySpan<object> args)
{
return $"{path}({line}): {FileBasedProgramsResources.DirectiveError}: {string.Format(messageFormat, args)}";
}

internal static void VerifyBinLogEvaluationDataCount(string binaryLogPath, int expectedCount)
{
var records = BinaryLog.ReadRecords(binaryLogPath).ToList();
records.Count(static r => r.Args is ProjectEvaluationStartedEventArgs).Should().Be(expectedCount);
records.Count(static r => r.Args is ProjectEvaluationFinishedEventArgs).Should().Be(expectedCount);
}

private protected void Build(
TestDirectory testInstance,
BuildLevel expectedLevel,
ReadOnlySpan<string> args = default,
string expectedOutput = "Hello from Program",
string programFileName = "Program.cs",
string? workDir = null,
Func<TestCommand, TestCommand>? customizeCommand = null)
{
string prefix = expectedLevel switch
{
BuildLevel.None => CliCommandStrings.NoBinaryLogBecauseUpToDate + Environment.NewLine,
BuildLevel.Csc => CliCommandStrings.NoBinaryLogBecauseRunningJustCsc + Environment.NewLine,
BuildLevel.All => string.Empty,
_ => throw new ArgumentOutOfRangeException(paramName: nameof(expectedLevel)),
};

var command = new DotnetCommand(Log, ["run", programFileName, "-bl", .. args])
.WithWorkingDirectory(workDir ?? testInstance.Path);

if (customizeCommand != null)
{
command = customizeCommand(command);
}

command.Execute()
.Should().Pass()
.And.HaveStdOut(prefix + expectedOutput);

var binlogs = new DirectoryInfo(workDir ?? testInstance.Path)
.EnumerateFiles("*.binlog", SearchOption.TopDirectoryOnly);

binlogs.Select(f => f.Name)
.Should().BeEquivalentTo(
expectedLevel switch
{
BuildLevel.None or BuildLevel.Csc => [],
BuildLevel.All => ["msbuild.binlog"],
_ => throw new ArgumentOutOfRangeException(paramName: nameof(expectedLevel), message: expectedLevel.ToString()),
});

foreach (var binlog in binlogs)
{
binlog.Delete();
}
}
}
Loading
Loading