perf: `fn rav1d_prepare_intra_edges`: pre-slice by ERobsham · Pull Request #1415 · memorysafety/rav1d

ERobsham · 2025-05-25T15:42:39Z

Part of fn prepare_intra_edges is slower in rav1d than dav1d #1394.

rework rav1d_prepare_intra_edges to precalculate and take a slice of the data to be iterated over so we only incur the overhead of taking the DisjointImmutGuard once, which also allows us to safely use the unsafe unsafe fn get_unchecked(..) within the loop avoiding the overhead of any runtime bounds checks, since the slice() opertaion already implcitly bounds checked the slice we're asking for.

inital hyperfine results show a overall improvement (albeit fairly minisucle):

Benchmark 1: ./target/release/dav1d -q -i test-data/Chimera-AV1-10bit-1920x1080-6191kbps.ivf -o /dev/null
  Time (mean ± σ):      9.384 s ±  0.057 s    [User: 108.379 s, System: 6.500 s]
  Range (min … max):    9.342 s …  9.516 s    10 runs

Benchmark 1: ./target.main/release/dav1d -q -i test-data/Chimera-AV1-10bit-1920x1080-6191kbps.ivf -o /dev/null
  Time (mean ± σ):      9.490 s ±  0.076 s    [User: 110.062 s, System: 6.521 s]
  Range (min … max):    9.383 s …  9.606 s    10 runs

(changed version took ~98.88% of the original runtime)

rework `rav1d_prepare_intra_edges` to precalculate and take a slice of the data to be iterated over so we only incur the overhead of taking the `DisjointImmutGuard` once, which also allows us to safely use the unsafe `unsafe fn get_unchecked(..)` within the loop avoiding the overhead of any runtime bounds checks, since the `slice()` opertaion already implcitly bounds checked the slice we're asking for. inital hyperfine results show a overall improvement (albeit fairly minisucle): ``` Benchmark 1: ./target/release/dav1d -q -i test-data/Chimera-AV1-10bit-1920x1080-6191kbps.ivf -o /dev/null Time (mean ± σ): 9.384 s ± 0.057 s [User: 108.379 s, System: 6.500 s] Range (min … max): 9.342 s … 9.516 s 10 runs Benchmark 1: ./target.main/release/dav1d -q -i test-data/Chimera-AV1-10bit-1920x1080-6191kbps.ivf -o /dev/null Time (mean ± σ): 9.490 s ± 0.076 s [User: 110.062 s, System: 6.521 s] Range (min … max): 9.383 s … 9.606 s 10 runs ``` (changed version took ~98.88% of the original runtime)

src/ipred_prepare.rs

ERobsham · 2025-05-25T17:45:47Z

Okay, looks like I missed running the test script with some flags that CI uses, so I'll take a look at that...

However, this errror seems fairly concerning for the general concept of hoisting the borrow out of the loop:

thread 'rav1d-worker-0' panicked at src/disjoint_mut.rs:837:13:
	overlapping DisjointMut:
 current:    & _[734..35296] on ThreadId(2) at src/ipred_prepare.rs:253:22
existing: &mut _[25920..25984] on ThreadId(3) at src/itx.rs:98:33

It seems like maybe there is no getting around taking a DisjointImmutGuard at each iteration?

Although that makes me wonder if there is an intentional mechanism keeping those index() calls from overlapping a current mut borrow that isn't working the same for slice()?

Otherwise, couldn't we still run into the same issue for the index() calls if that other thread is ever allowed to take a mut borrow from the same chunk we're looping over here? (just with a lower likelihood of hitting this case, since its a smaller chunk getting immutably borrowed at once)

fix tests when running with `-n` flag run `cargo fmt`

ERobsham · 2025-05-25T18:25:02Z

Okay, updated to ensure tests pass while running the following locally:

cargo build --release
rm -rf build

./.github/workflows/test.sh \
    -r target/release/dav1d \
    -s target/release/seek_stress

./.github/workflows/test.sh \
    -r target/release/dav1d \
    -s target/release/seek_stress \
    -n

./.github/workflows/test.sh \
    -r target/release/dav1d \
    -s target/release/seek_stress \
    -f 1

./.github/workflows/test.sh \
    -r target/release/dav1d \
    -s target/release/seek_stress \
    -f 2

However, Im not seeing the overlap issue locally, so that one is still rather concerning.

kkysen

cargo build --release
rm -rf build

./.github/workflows/test.sh \
    -r target/release/dav1d \
    -s target/release/seek_stress

./.github/workflows/test.sh \
    -r target/release/dav1d \
    -s target/release/seek_stress \
    -n

./.github/workflows/test.sh \
    -r target/release/dav1d \
    -s target/release/seek_stress \
    -f 1

./.github/workflows/test.sh \
    -r target/release/dav1d \
    -s target/release/seek_stress \
    -f 2

However, Im not seeing the overlap issue locally, so that one is still rather concerning.

The overlap checks are only run with debug_assertions on (they should really be on always, but they are very bad for perf, and we don't know a better way to implement it), so you need to test with the opt-dev or checked-release profiles.

kkysen · 2025-05-26T08:51:47Z

src/ipred_prepare.rs

+                    0
+                };
+                let dst_slice =
+                    &*(dst + slice_offset - 1isize).slice::<BD>((px_have - 1) * abs_stride + 1);


The slice here overlaps ones from other threads, as the individual indices accessed are interleaved. That's why .index::<BD>() works, but not a single .slice::<BD>(). It's unfortunately a very tricky problem.

ERobsham mentioned this pull request May 25, 2025

fn prepare_intra_edges is slower in rav1d than dav1d #1394

Open

kkysen reviewed May 25, 2025

View reviewed changes

src/ipred_prepare.rs Outdated Show resolved Hide resolved

fixes + suggestion from PR

46e6d7e

fix tests when running with `-n` flag run `cargo fmt`

kkysen reviewed May 26, 2025

View reviewed changes

kkysen added the performance label Jun 1, 2025

kkysen changed the title ~~perf improvement re: issue #1394~~ perf: fn rav1d_prepare_intra_edges: pre-slice Jun 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: `fn rav1d_prepare_intra_edges`: pre-slice#1415

perf: `fn rav1d_prepare_intra_edges`: pre-slice#1415
ERobsham wants to merge 2 commits intomemorysafety:mainfrom
ERobsham:dev/prepare_intra_edges-perf-test

ERobsham commented May 25, 2025 •

edited by kkysen

Loading

Uh oh!

Uh oh!

ERobsham commented May 25, 2025

Uh oh!

ERobsham commented May 25, 2025 •

edited by kkysen

Loading

Uh oh!

kkysen left a comment

Uh oh!

kkysen May 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ERobsham commented May 25, 2025 • edited by kkysen Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

ERobsham commented May 25, 2025

Uh oh!

ERobsham commented May 25, 2025 • edited by kkysen Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kkysen left a comment

Choose a reason for hiding this comment

Uh oh!

kkysen May 26, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ERobsham commented May 25, 2025 •

edited by kkysen

Loading

ERobsham commented May 25, 2025 •

edited by kkysen

Loading