Bitwise conversion of doubles using only FP multiplication and addition (2020)


Summary

In the words of Tom Lehrer, “this is completely pointless, but may prove useful to some of you some day, perhaps in a somewhat bizarre set of circumstances.” The problem is as follows: suppose you’re working in a programming environment that provides only an IEEE-754 double-precision floating point (“double”) type, and no operations that can access that type’s representation (such as C++ bitwise cast operations or JavaScript’s DataView object). You have a double and you want to convert it to its bitwise representation as two unsigned 32-bit integers (stored as doubles), or vice versa.

This problem comes up from time to time, but I was curious about a different question: how restricted can your programming environment be? Could you do it with just floating point multiplication and addition? Bitwise conversion using floating point operations can be useful in situations like limited interpreted languages or C++ constexpr contexts. Generally, double-to-int conversion can be done using a binary search, comparing with powers of two to figure out the bits of the exponent. From there, the fraction bits can be extracted, either by binary searching further or by using the knowledge of the exponent to scale the fraction bits into the integer range.

But can it be done without bitwise operations, branches, exponentiation, division, or floating point comparisons? It seemed improbable at first, but I’ve discovered the answer is yes: multiplication and addition are mostly sufficient, with a few notable caveats. Even without these restrictions, different NaN values cannot be distinguished or generated (without bitwise conversion) in most environments, but using only multiplication and addition, it is impossible to convert NaN, Infinity, or -Infinity into an unsigned 32-bit value. The other problematic value is “negative zero”, which cannot be differentiated from “positive zero” using addition and multiplication.
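For comparison, the conventional approach described above (a binary search over powers of two for the exponent, then scaling the fraction bits into the integer range) might look like the following Python sketch. It deliberately relies on comparisons, branches and integer division, so it is not the restricted multiplication-and-addition technique the article is about. The names `double_to_words` and `_POW2` are hypothetical. NaN is mapped to a common default quiet NaN encoding (0x7FF80000, 0) and negative zero encodes as positive zero; both are assumptions chosen to match the caveats described above.

```python
import math

# Hypothetical helper table: _POW2[e] == 2.0**e for every e in [-1074, 1023],
# built with exact repeated doubling and halving (no pow, no ldexp).
_MIN_EXP, _MAX_EXP = -1074, 1023
_POW2 = {0: 1.0}
for _e in range(1, _MAX_EXP + 1):
    _POW2[_e] = _POW2[_e - 1] * 2.0
for _e in range(-1, _MIN_EXP - 1, -1):
    _POW2[_e] = _POW2[_e + 1] * 0.5


def double_to_words(x: float) -> tuple[float, float]:
    """Return (hi, lo): the upper and lower 32 bits of x's IEEE-754 encoding,
    each stored as a float holding an unsigned integer."""
    if x != x:
        # NaN payloads are not observable here; emit a default quiet NaN.
        return (float(0x7FF80000), 0.0)
    sign = 0.0
    if x < 0.0:
        sign = 1.0
        x = -x
    if x == 0.0:
        # -0.0 < 0.0 is False, so negative zero is encoded as +0.0.
        return (0.0, 0.0)
    if x == math.inf:
        biased, frac = 2047.0, 0.0
    else:
        # Binary search for the largest e such that 2**e <= x.
        lo, hi = _MIN_EXP, _MAX_EXP
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if _POW2[mid] <= x:
                lo = mid
            else:
                hi = mid - 1
        e = lo
        if e >= -1022:
            # Normal number: x * 2**-e is in [1, 2); scale its 52 fraction
            # bits into the integer range.
            biased = float(e + 1023)
            frac = (x * _POW2[-e] - 1.0) * _POW2[52]
        else:
            # Subnormal: biased exponent 0 and fraction x * 2**1074, applied
            # as two exact scalings so both factors are in the table.
            biased = 0.0
            frac = x * _POW2[1023] * _POW2[51]
    frac_hi = float(math.floor(frac * _POW2[-32]))  # top 20 fraction bits
    frac_lo = frac - frac_hi * _POW2[32]            # low 32 fraction bits
    hi_word = sign * _POW2[31] + biased * _POW2[20] + frac_hi
    return (hi_word, frac_lo)
```

For example, `double_to_words(1.0)` returns `(1072693248.0, 0.0)`, i.e. 0x3FF00000 and 0, and `double_to_words(-2.5)` returns `(3221487616.0, 0.0)`, i.e. 0xC0040000 and 0. Building the power-of-two table by repeated doubling and halving keeps every scaling step exact: multiplying by a power of two only rounds on overflow or when bits fall off the bottom of the subnormal range, which never happens in these steps.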
All my code uses subtraction, although it could be removed by substitu...
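The caveats about NaN, the infinities, and negative zero described above can be probed empirically. The following Python sketch is a small experiment rather than a proof, and the helper names (`derived_values`, `same_up_to_zero_sign`, `check_caveats`) and the pool of constants are illustrative assumptions. It enumerates about a thousand expressions built from a starting value using only `+` and `*`, then checks that a special starting value never yields a finite result, and that starting from `0.0` or `-0.0` gives results that compare equal expression by expression.

```python
import math

# Assumed pool of finite constants; any finite values would do for this experiment.
CONSTANTS = [0.0, -0.0, 1.0, -1.0, 0.5, 3.0, 1e300, -1e-300]


def derived_values(x: float, rounds: int = 2) -> list[float]:
    """Values computed from x in `rounds` rounds, where each round adds and
    multiplies every value found so far with each constant and with every
    other value found so far. The enumeration order depends only on `rounds`,
    so two calls line up expression by expression."""
    values = [x]
    for _ in range(rounds):
        new = []
        for v in values:
            for c in CONSTANTS:
                new.append(v + c)
                new.append(v * c)
            for w in values:
                new.append(v + w)
                new.append(v * w)
        values = values + new
    return values


def same_up_to_zero_sign(p: float, q: float) -> bool:
    """True if p and q compare equal (so 0.0 matches -0.0) or are both NaN."""
    return p == q or (math.isnan(p) and math.isnan(q))


def check_caveats() -> None:
    # NaN and the infinities never lead back to a finite number.
    for special in (math.nan, math.inf, -math.inf):
        assert not any(math.isfinite(v) for v in derived_values(special))
    # +0.0 and -0.0 give results that compare equal at every position...
    pos, neg = derived_values(0.0), derived_values(-0.0)
    assert all(same_up_to_zero_sign(p, q) for p, q in zip(pos, neg))
    # ...and the only sign differences are on zero results.
    diffs = [p for p, q in zip(pos, neg)
             if not math.isnan(p) and math.copysign(1.0, p) != math.copysign(1.0, q)]
    assert diffs and all(p == 0.0 for p in diffs)


if __name__ == "__main__":
    check_caveats()
    print("caveats hold for", len(derived_values(0.0)), "expressions")
```

The underlying reasoning: under IEEE-754 arithmetic, any sum or product with a NaN operand is NaN, and any sum or product with an infinite operand is infinite or NaN (for example Infinity × 0 or Infinity + (-Infinity)), so once a special value enters a computation built from `+` and `*` alone, no finite result can come out. Likewise, for a nonzero finite x, x + (±0) is x and x × (±0) is a zero, so the two zeros can only ever produce results that differ in the sign of a zero.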
