This is a follow-up to making the rav1d video decoder 1% faster, where we compared profiler snapshots of rav1d (the Rust implementation) and dav1d (the C baseline) to find specific functions that were slower in the Rust implementation.

Today, we are going to pay off a small debt from that post: since dav1d and rav1d share the same hand-written assembly functions, we used them as anchors to navigate the different implementations - they, at least, should match exactly! And they did. Well, almost all of them did.

This, dear reader, is the story of the one function that didn't.

An Overview

We'll need to ask - and answer! - three "Whys" today:

Using the same techniques from last time, we'll see that a specific assembly function is, indeed, slower in the Rust version.

1. But why? ➡️ Because loading data in the Rust version is slower, which we discover using samply's special asm view.
2. But why? ➡️ Because the Rust version stores much more data on the stack, which we find by playing with some arguments and looking at the generated LLVM IR.
3. But why? ➡️ Because the compiler cannot optimize away a specific Rust abstraction across function pointers!
4. Which we fix by switching to a more compiler-friendly version (PR).

Side note: again, we'll be running all these benchmarks on a MacBook, so our tools are a tad limited and we'll have to resort to some guesswork. Leave a comment if you know more - or, even better, write an article about profiling on macOS.

Discuss on r/rust, lobsters, HN!

filter4_pri_edged_8bpc

Let's rerun the benchmark after the previous post's changes:

./rav1d $ git checkout cfd3f59 && cargo build --release
./rav1d $ sudo samply record ./target/release/dav1d -q -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 1

We'll switch to the inverted call stack view and filter for the cdef_ functions, resulting in the following clippings.
The assembly functions are the ones with the _neon suffix. On the left is dav1d (C), and on the right rav1d (Rust):

On the top i...