Why is calling my asm function from Rust slower than calling it from C?

https://news.ycombinator.com/rss Hits: 3
Summary

This is a follow-up to making the rav1d video decoder 1% faster, where we compared profiler snapshots of rav1d (the Rust implementation) and dav1d (the C baseline) to find specific functions that were slower in the Rust implementation.

Today, we are going to pay off a small debt from that post: since dav1d and rav1d share the same hand-written assembly functions, we used them as anchors to navigate the different implementations - they, at least, should match exactly! And they did. Well, almost all of them did.

This, dear reader, is the story of the one function that didn’t.

An Overview

We’ll need to ask - and answer! - three ‘Whys’ today:

Using the same techniques from last time, we’ll see that a specific assembly function is, indeed, slower in the Rust version.

But why? ➡️ Because loading data in the Rust version is slower, which we discover using samply’s special asm view. [1]
But why? ➡️ Because the Rust version stores much more data on the stack, which we find by playing with some arguments and looking at the generated LLVM IR. [2]
But why? ➡️ Because the compiler cannot optimize away a specific Rust abstraction across function pointers! [3]
Which we fix by switching to a more compiler-friendly version (PR). [4]

Side note: again, we’ll be running all these benchmarks on a MacBook, so our tools are a tad limited and we’ll have to resort to some guesswork. Leave a comment if you know more - or, even better, write an article about profiling on macOS 🍎💨.

Discuss on r/rust, lobsters, HN! 👋

filter4_pri_edged_8bpc

Let’s rerun the benchmark after the previous post’s changes:

    ./rav1d $ git checkout cfd3f59 && cargo build --release
    ./rav1d $ sudo samply record ./target/release/dav1d -q -i Chimera-AV1-8bit-1920x1080-6736kbps.ivf -o /dev/null --threads 1

We’ll switch to the inverted call stack view and filter for the cdef_ functions, resulting in the following clippings. The assembly functions are the ones with the _neon suffix.

On the left is dav1d (C), and on the right rav1d (Rust):

On the top i...
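
To make the third ‘Why’ from the overview a bit more concrete, here is a minimal sketch of the general pattern, not the actual rav1d code. Every name in it (`WithOffset`, `FilterFn`, `sum_bytes`, `call_with_wrapper`, and the "lean" variants) is hypothetical. The assumption is that when a wrapper struct is handed by pointer to a callee the optimizer cannot see into, such as hand-written assembly reached through a function pointer, the struct generally has to be written out to the caller's stack even if the callee never reads it.

```rust
/// Hypothetical Rust-side wrapper: a base pointer plus bookkeeping that the
/// callee below never reads.
#[repr(C)]
pub struct WithOffset {
    pub base: *const u8,
    pub offset: usize,
    pub stride: isize,
}

/// Function-pointer type shaped like an asm entry point that also receives a
/// pointer to the wrapper (hypothetical signature).
pub type FilterFn =
    unsafe extern "C" fn(src: *const u8, len: usize, ctx: *const WithOffset) -> u32;

/// Stand-in for the assembly routine: it sums the bytes and ignores `_ctx`.
pub unsafe extern "C" fn sum_bytes(src: *const u8, len: usize, _ctx: *const WithOffset) -> u32 {
    let bytes = unsafe { std::slice::from_raw_parts(src, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}

/// "Heavy" caller: `ctx` is passed by pointer to a callee that is opaque in
/// the general case, so the struct must exist in memory at the call.
pub fn call_with_wrapper(f: FilterFn, buf: &[u8], offset: usize) -> u32 {
    let ctx = WithOffset {
        base: buf.as_ptr(),
        offset,
        stride: buf.len() as isize,
    };
    let src = &buf[offset..];
    // SAFETY: `src` is a valid slice and `ctx` outlives the call.
    unsafe { f(src.as_ptr(), src.len(), &ctx) }
}

/// Leaner signature: only the values the callee actually uses.
pub type LeanFilterFn = unsafe extern "C" fn(src: *const u8, len: usize) -> u32;

/// Same stand-in routine without the unused wrapper argument.
pub unsafe extern "C" fn sum_bytes_lean(src: *const u8, len: usize) -> u32 {
    let bytes = unsafe { std::slice::from_raw_parts(src, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}

/// "Lean" caller: nothing extra needs to be built for the call.
pub fn call_lean(f: LeanFilterFn, buf: &[u8], offset: usize) -> u32 {
    let src = &buf[offset..];
    // SAFETY: `src` is a valid slice for the duration of the call.
    unsafe { f(src.as_ptr(), src.len()) }
}
```

Both callers return the same result; the difference is only in what the caller has to set up before the indirect call. Whether a given compiler version actually emits the extra stores depends on inlining and optimization level, so the generated LLVM IR or assembly is the place to check.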

First seen: 2025-12-29 21:01

Last seen: 2025-12-29 23:02