A bitwise reproducible deep learning framework

https://news.ycombinator.com/rss Hits: 8
Summary

RepDL: Reproducible Deep Learning

This research project is for academic and non-production purposes. Your suggestions and contributions are warmly welcomed.

RepDL is a specialized library designed to facilitate reproducible deep learning by guaranteeing bitwise identical outcomes across various hardware platforms for identical training or inference tasks.

Citation:

    @misc{xie_repdl_2025,
      title = {{RepDL}: {Bit}-level {Reproducible} {Deep} {Learning} {Training} and {Inference}},
      url = {https://arxiv.org/abs/2510.09180},
      author = {Xie, Peichen and Zhang, Xian and Chen, Shuo},
      year = {2025},
      note = {arXiv: 2510.09180},
    }

Get Started

Before setting up RepDL, ensure that PyTorch and the corresponding CUDA version are installed on your system. To build and install RepDL, execute the following commands in your terminal:

    git clone https://github.com/microsoft/RepDL.git
    cd RepDL
    pip install .

The easiest way to enable reproducible inference for an existing PyTorch model is to wrap it with the following code:

    import repdl
    model = repdl.from_torch_module(model)

For reproducible training, refer to the example script located at examples/mnist_training.py. The output of this script is consistent across different devices, as shown below:

    Hash of the initial model: 2a2d133895b1684e55d0f152ead2914b55adc551d9790a4b4585309b79c60362
    Hash of the trained model: 31ee86a7f75dbd76bac22f209617eb7349f93b0ede116dba924f328cac3013f1
    Test accuracy of the model on the 10000 test images: 0.9804
    Hash of the logits: 0ca34dcd37105b4af690c46a62407b4b9b097d1063864c6d99f4309aeb4bca3e

Reproducible Operations, Functions, and Modules

Many operations in PyTorch are non-reproducible, even if torch.use_deterministic_algorithms(True) is set. For example, the following operations on a tensor x: torch.mm(x, x), torch.div(x, 10), and torch.sqrt(x) can produce different results on different devices.
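A commonly cited reason for such differences is that floating-point addition is not associative, so a different accumulation order (for instance, across parallel hardware) can round differently. The sketch below illustrates only that effect, using the Python standard library; it does not use PyTorch or RepDL, and the helper name `sum_in_order` is hypothetical.

```python
def sum_in_order(values, order):
    """Accumulate `values` one at a time, visiting indices in `order`."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Three float64 values chosen so that the grouping changes the rounded result.
values = [1e16, 1.0, -1e16]

# (1e16 + 1.0) + -1e16: the 1.0 is lost to rounding before the cancellation.
forward = sum_in_order(values, [0, 1, 2])

# (1e16 + -1e16) + 1.0: the large terms cancel first, so the 1.0 survives.
reordered = sum_in_order(values, [0, 2, 1])

print(forward, reordered)            # 0.0 1.0
print(forward.hex(), reordered.hex())  # the bit patterns differ as well
```

The same three inputs yield two different sums purely because of the order of accumulation, which is the kind of variation a matrix multiply or other reduction can exhibit when its internal summation order is not fixed.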
However, with RepDL, the equivalent operations repdl.ops.mm(x, x), repdl.ops.div(x, 10), and repdl.ops.sqrt(x) will prod...
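The hashes printed by the MNIST training example suggest a compact way to check bitwise equality: two runs match only if their serialized bytes are identical. The sketch below shows that idea with the standard library only. How the RepDL example script actually serializes and hashes a model is not stated here, so the SHA-256 digest, the little-endian float32 packing, and the helper name `hash_float32_values` are all assumptions made for illustration.

```python
import hashlib
import struct

def hash_float32_values(values):
    """Return the SHA-256 hex digest of `values` packed as little-endian float32.

    Assumption: this packing is illustrative and is not RepDL's own format.
    """
    packed = struct.pack(f"<{len(values)}f", *values)
    return hashlib.sha256(packed).hexdigest()

# Two hypothetical runs that produced exactly the same numbers.
run_a = [0.5, 0.25, 1.0]
run_b = [0.5, 0.25, 1.0]

# A hypothetical run whose last value differs by one float32 unit in the last place.
run_c = [0.5, 0.25, 1.0 + 2.0 ** -23]

digest_a = hash_float32_values(run_a)
digest_b = hash_float32_values(run_b)
digest_c = hash_float32_values(run_c)

print(digest_a == digest_b)  # True: identical bytes give identical digests
print(digest_a == digest_c)  # False: a single-bit difference changes the digest
```

Comparing digests rather than printed accuracy makes the check strict: two runs can report the same accuracy while still differing in the low bits of their weights.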

First seen: 2025-12-29 04:59

Last seen: 2025-12-29 12:00