DrawingCanvas API: Replace imperative extension methods with stateful canvas-based drawing model#377

Open
JimBobSquarePants wants to merge 201 commits into main from js/canvas-api

Conversation

@JimBobSquarePants (Member) commented Mar 1, 2026

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practices as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Breaking Changes: DrawingCanvas API

Fix #106
Fix #244
Fix #344
Fix #367

This is a major breaking change. The library's public API has been completely redesigned around a canvas-based drawing model, replacing the previous collection of imperative extension methods.

What changed

The old API surface — dozens of IImageProcessingContext extension methods like DrawLine(), DrawPolygon(), FillPolygon(), DrawBeziers(), DrawImage(), DrawText(), etc. — has been removed entirely. These methods were individually simple but suffered from several architectural limitations:

  • Each call was an independent image processor that rasterized and composited in isolation, making it impossible to batch or reorder operations.
  • State (blending mode, clip paths, transforms) had to be passed to every single call.
  • There was no way for an alternate rendering backend to intercept or accelerate a sequence of draw calls.

The new model: DrawingCanvas

All drawing now goes through IDrawingCanvas / DrawingCanvas<TPixel>, a stateful canvas that queues draw commands and flushes them as a batch.

Via Image.Mutate() (most common)

```csharp
using SixLabors.ImageSharp.Drawing;
using SixLabors.ImageSharp.Drawing.Processing;

image.Mutate(ctx => ctx.ProcessWithCanvas(canvas =>
{
    // Fill a path
    canvas.Fill(Brushes.Solid(Color.Red), new EllipsePolygon(200, 200, 100));

    // Stroke a path
    canvas.Draw(Pens.Solid(Color.Blue, 3), new RectangularPolygon(50, 50, 200, 100));

    // Draw a polyline
    canvas.DrawLine(Pens.Solid(Color.Green, 2), new PointF(0, 0), new PointF(100, 100));

    // Draw text
    canvas.DrawText(
        new RichTextOptions(font) { Origin = new PointF(10, 10) },
        "Hello, World!",
        brush: Brushes.Solid(Color.Black),
        pen: null);

    // Draw an image
    canvas.DrawImage(sourceImage, sourceRect, destinationRect);

    // Save/Restore state (options, clip paths)
    canvas.Save(new DrawingOptions
    {
        GraphicsOptions = new GraphicsOptions { BlendPercentage = 0.5f }
    });
    canvas.Fill(brush, path);
    canvas.Restore();

    // Apply arbitrary image processing to a path region
    canvas.Process(path, inner => inner.Brightness(0.5f));

    // Commands are flushed on Dispose (or call canvas.Flush() explicitly)
}));
```

Standalone usage (without Image.Mutate)

DrawingCanvas<TPixel> can be constructed directly against an image frame:

```csharp
// Against the root frame of an image:
using var canvas = DrawingCanvas<Rgba32>.FromRootFrame(image, new DrawingOptions());

canvas.Fill(brush, path);
canvas.Draw(pen, path);
canvas.Flush();
```

```csharp
// Against a specific frame of an image:
using var canvas = DrawingCanvas<Rgba32>.FromImage(image, frameIndex: 0, new DrawingOptions());
// ...
```

```csharp
// Against an ImageFrame directly:
using var canvas = DrawingCanvas<Rgba32>.FromFrame(frame, new DrawingOptions());
// ...
```

Canvas state management

The canvas supports a save/restore stack (similar to HTML Canvas or SkCanvas):

```csharp
int saveCount = canvas.Save();              // push current state
canvas.Save(options, clipPath1, clipPath2); // push and replace state

canvas.Restore();              // pop one level
canvas.RestoreTo(saveCount);   // pop to a specific level
```

State includes DrawingOptions (graphics options, shape options, transform) and clip paths. SaveLayer creates an offscreen layer that composites back on Restore.
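The save/restore semantics can be sketched independently of the library. The following is a minimal model of the stack behaviour described above; `CanvasState`, `StateStack`, and their members are illustrative stand-ins, not ImageSharp.Drawing types.

```csharp
using System;
using System.Collections.Generic;

// Illustrative stand-in for the real canvas state (DrawingOptions + clip paths).
record CanvasState(float BlendPercentage, IReadOnlyList<string> ClipPaths);

// Minimal model of the canvas save/restore stack semantics.
class StateStack
{
    private readonly Stack<CanvasState> stack = new();
    private CanvasState current = new(1f, Array.Empty<string>());

    public CanvasState Current => this.current;

    // Save() pushes the current state and returns the depth before the push,
    // so RestoreTo(saveCount) can later unwind to exactly this point.
    public int Save()
    {
        int depth = this.stack.Count;
        this.stack.Push(this.current);
        return depth;
    }

    // Pop one level.
    public void Restore()
    {
        if (this.stack.Count > 0)
        {
            this.current = this.stack.Pop();
        }
    }

    // Pop repeatedly until the stack is back at the recorded depth.
    public void RestoreTo(int saveCount)
    {
        while (this.stack.Count > saveCount)
        {
            this.current = this.stack.Pop();
        }
    }

    // Stand-in for the "push and replace state" style overloads.
    public void Replace(CanvasState state) => this.current = state;
}
```

With this model, `RestoreTo(saveCount)` after nested `Save()` calls returns the state to exactly what it was when that save count was recorded, mirroring the HTML Canvas/SkCanvas behaviour referenced above.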

IDrawingBackend — bring your own renderer

The library's rasterization and composition pipeline is abstracted behind IDrawingBackend. This interface has the following methods:

| Method | Purpose |
| --- | --- |
| `FlushCompositions<TPixel>` | Flushes queued composition operations for the target. |
| `TryReadRegion<TPixel>` | Read pixels back from the target (needed for `Process()` and `DrawImage()`). |

The library ships with DefaultDrawingBackend (CPU, tiled fixed-point rasterizer). An experimental WebGPU compute-shader backend (ImageSharp.Drawing.WebGPU) is also available, demonstrating how alternate backends plug in. Users can provide their own implementations — for example, GPU-accelerated backends, SVG emitters, or recording/replay layers.

Backends are registered on Configuration:

```csharp
configuration.SetDrawingBackend(myCustomBackend);
```
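To make the recording/replay idea concrete, here is a minimal sketch. The `ISimpleBackend` interface below is a deliberately simplified stand-in — the real `IDrawingBackend` methods are generic over `TPixel` and have richer signatures — so only the shape of the idea is shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for a drawing backend: flush a batch of queued
// commands, and optionally read pixels back from the target.
interface ISimpleBackend
{
    void FlushCompositions(IReadOnlyList<string> commands);
    bool TryReadRegion(int x, int y, int width, int height, out byte[] pixels);
}

// A recording layer: captures every flushed batch instead of rasterizing.
// Useful for testing, or for replaying a draw sequence against another target.
class RecordingBackend : ISimpleBackend
{
    public List<string[]> Batches { get; } = new();

    public void FlushCompositions(IReadOnlyList<string> commands)
        => this.Batches.Add(commands.ToArray());

    public bool TryReadRegion(int x, int y, int width, int height, out byte[] pixels)
    {
        // A pure recorder has no pixels to read back.
        pixels = Array.Empty<byte>();
        return false;
    }
}
```

A real backend would rasterize and composite inside `FlushCompositions`; a recorder simply stores the batches for later replay.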

Migration guide

| Old API | New API |
| --- | --- |
| `ctx.Fill(color, path)` | `ctx.ProcessWithCanvas(c => c.Fill(Brushes.Solid(color), path))` |
| `ctx.Fill(brush, path)` | `ctx.ProcessWithCanvas(c => c.Fill(brush, path))` |
| `ctx.Draw(pen, path)` | `ctx.ProcessWithCanvas(c => c.Draw(pen, path))` |
| `ctx.DrawLine(pen, points)` | `ctx.ProcessWithCanvas(c => c.DrawLine(pen, points))` |
| `ctx.DrawPolygon(pen, points)` | `ctx.ProcessWithCanvas(c => c.Draw(pen, new Polygon(new LinearLineSegment(points))))` |
| `ctx.FillPolygon(brush, points)` | `ctx.ProcessWithCanvas(c => c.Fill(brush, new Polygon(new LinearLineSegment(points))))` |
| `ctx.DrawText(text, font, color, origin)` | `ctx.ProcessWithCanvas(c => c.DrawText(new RichTextOptions(font) { Origin = origin }, text, Brushes.Solid(color), null))` |
| `ctx.DrawImage(overlay, opacity)` | `ctx.ProcessWithCanvas(c => c.DrawImage(overlay, sourceRect, destRect))` |
| Multiple independent draw calls | Single `ProcessWithCanvas` block — commands are batched and flushed together |

Other breaking changes in this PR

  • AntialiasSubpixelDepth removed — The rasterizer now uses a fixed 256-step (8-bit) subpixel depth. The old AntialiasSubpixelDepth property (default: 16) controlled how many vertical subpixel steps the rasterizer used per pixel row. The new fixed-point scanline rasterizer integrates area/cover analytically per cell rather than sampling at discrete subpixel rows, so the "depth" is a property of the coordinate precision (24.8 fixed-point), not a tunable sample count. 256 steps gives ~0.4% coverage granularity — more than sufficient for all practical use cases. The old default of 16 (~6.25% granularity) could produce visible banding on gentle slopes.
  • GraphicsOptions.Antialias — now controls RasterizationMode (antialiased vs aliased). When false, coverage is snapped to binary using AntialiasThreshold.
  • GraphicsOptions.AntialiasThreshold — new property (0–1, default 0.5) controlling the coverage cutoff in aliased mode. Pixels with coverage at or above this value become fully opaque; pixels below are discarded.
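The two antialiasing bullets above can be sketched with plain scalar logic. This is a simplified model — the real rasterizer works on fixed-point cell coverage, and `SnapCoverage` is a hypothetical helper name, not library API.

```csharp
using System;

// Aliased mode: coverage at or above the threshold becomes fully opaque,
// anything below is discarded.
static float SnapCoverage(float coverage, float antialiasThreshold = 0.5f)
    => coverage >= antialiasThreshold ? 1f : 0f;

Console.WriteLine(SnapCoverage(0.51f)); // 1 (kept, fully opaque)
Console.WriteLine(SnapCoverage(0.49f)); // 0 (discarded)

// Subpixel coverage granularity: the fixed 256-step depth vs the old default of 16.
Console.WriteLine(1.0 / 256); // 0.00390625 (~0.4% steps)
Console.WriteLine(1.0 / 16);  // 0.0625 (6.25% steps)
```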

Benchmarks

All benchmarks run under the following environment.

```
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.26200
Unknown processor
.NET SDK=10.0.103
  [Host] : .NET 8.0.24 (8.0.2426.7010), X64 RyuJIT

Toolchain=InProcessEmitToolchain  InvocationCount=1  IterationCount=40
LaunchCount=3  UnrollFactor=1  WarmupCount=40
```

DrawPolygonAll - Renders a 7200x4800px path of the state of Mississippi with a 2px stroke.

| Method | Mean | Error | StdDev | Median | Ratio | RatioSD |
| --- | --- | --- | --- | --- | --- | --- |
| SkiaSharp | 42.20 ms | 2.197 ms | 6.976 ms | 38.18 ms | 1.00 | 0.00 |
| SystemDrawing | 44.10 ms | 0.172 ms | 0.538 ms | 44.05 ms | 1.07 | 0.16 |
| ImageSharp | 12.09 ms | 0.083 ms | 0.269 ms | 12.06 ms | 0.29 | 0.05 |
| ImageSharpWebGPU | 12.47 ms | 0.291 ms | 0.940 ms | 12.71 ms | 0.30 | 0.05 |

FillParis - Renders a 1096x1060px scene containing 50K fill paths.

| Method | Mean | Error | StdDev | Ratio | RatioSD |
| --- | --- | --- | --- | --- | --- |
| SkiaSharp | 104.46 ms | 0.356 ms | 1.145 ms | 1.00 | 0.00 |
| SystemDrawing | 148.53 ms | 0.327 ms | 1.033 ms | 1.42 | 0.02 |
| ImageSharp | 66.32 ms | 0.999 ms | 3.083 ms | 0.64 | 0.03 |
| ImageSharpWebGPU | 41.95 ms | 0.457 ms | 1.368 ms | 0.40 | 0.01 |

@antonfirsov (Member) commented Mar 30, 2026

> CPU is now faster than SkiaSharp on my machine

That superiority is only part of the picture because it assumes many available cores and a single user. As always, I'm skeptical about algorithm-level parallelization on CPU because it may actually hurt perf for services under high load. IMO the blog post describing Blaze lacks rigor by omitting any analysis of how the algorithm scales with the number of threads, which makes my skepticism even stronger. The author doesn't seem to consider server-side applications (I assume it's not his area of focus), but for us, the main application is likely still the server.

We need empirical proof that the algorithm has good parallel efficiency/speedup. Ideally there should be a parametric benchmark showing how the algorithm scales as threads are added. There should be a sweet spot where efficiency is good enough, and I don't think that sweet spot is as high as ProcessorCount because the scaling is rarely linear. If I'm proven wrong, that would be good news of course.

@JimBobSquarePants (Member, Author) commented:

> That superiority is only part of the picture because it assumes many available cores and a single user. […] We need an empirical proof that the algorithm has a good parallel efficiency/speedup.

@antonfirsov

OK... I've wired the backend up so it respects MaxDegreeOfParallelism from the configuration everywhere

I've also reduced the parallel steps in the CPU scene builder. There are now only two that always run, plus one optional step when clipping or dashing is required:

  • Command preparation (path clipping, dashing)
  • Geometry linearization (building retained rasterizable data)
  • Row-band execution (rasterization + brush composition)

All three use the same pattern:

```csharp
int partitionCount = Math.Min(
    workItemCount,
    requestedParallelism == -1 ? Environment.ProcessorCount : requestedParallelism);

Parallel.For(
    0,
    partitionCount,
    new ParallelOptions { MaxDegreeOfParallelism = partitionCount },
    ...);
```
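As a self-contained illustration of that pattern (with hypothetical names — `GetPartitionCount` and the slice arithmetic here are mine, not the library's):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Clamp parallelism to the number of work items, treating -1 as
// "use all processors" (mirroring MaxDegreeOfParallelism semantics).
static int GetPartitionCount(int workItemCount, int requestedParallelism)
    => Math.Min(
        workItemCount,
        requestedParallelism == -1 ? Environment.ProcessorCount : requestedParallelism);

int work = 10;
int partitions = GetPartitionCount(work, requestedParallelism: 4);

long total = 0;
Parallel.For(
    0,
    partitions,
    new ParallelOptions { MaxDegreeOfParallelism = partitions },
    p =>
    {
        // Each partition handles a contiguous slice of the work items.
        int start = p * work / partitions;
        int end = (p + 1) * work / partitions;
        for (int i = start; i < end; i++)
        {
            Interlocked.Add(ref total, i); // stand-in for real per-item work
        }
    });

Console.WriteLine(total); // 45 (= 0 + 1 + ... + 9): every item processed exactly once
```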

Given that the parallel approach we take is used for almost all image processing operations, I think that if this model caused problems under server load, it would have shown up years ago across the entire library.

On top of that I've managed to massively improve performance and reduce the task workload by doing a few things.

  • Fused stroking and transforming with rasterization. We no longer pay the up-front cost of mapping to and from the clipping library's types, or of stroking, during batch preparation. This is true for both GPU and CPU.
  • For CPU I've removed the per-row cost of renting and returning the Vector4 buffer during pixel blending by adding an overload to the API in ImageSharp that allows passing a buffer. Our solid brush does not need a color buffer anymore either.
  • For GPU I've copied the same memory setup Vello uses for our initial memory arenas, scaling them based on the exact requirements returned by the processor pipeline should the initial allocation be too small. We also chunk processing for massive scenes, matching some ongoing work by the Vello team. The arenas are also shared across flushes in a thread-safe manner, which improves processing time.

I'm getting good competitive numbers across all test scenarios.
