Sopro TTS: A 169M model with zero-shot voice cloning that runs on the CPU

https://news.ycombinator.com/rss Hits: 23
Summary

Sopro TTS

Sopro (from the Portuguese word for “breath/blow”) is a lightweight English text-to-speech model I trained as a side project. Sopro is composed of dilated convs (à la WaveNet) and lightweight cross-attention layers, instead of the common Transformer architecture. Even though Sopro is not SOTA across most voices and situations, I still think it’s a cool project made with a very low budget (trained on a single L40S GPU), and it can be improved with better data.

Some of the main features are:

- 169M parameters
- Streaming
- Zero-shot voice cloning
- 0.25 RTF on CPU (measured on an M3 base model), meaning it generates 30 seconds of audio in 7.5 seconds
- 3-12 seconds of reference audio for voice cloning

Instructions

I only pinned the minimum dependency versions so you can install the package without having to create a separate env. However, some versions of Torch work best. For example, on my M3 CPU, torch==2.6.0 (without torchvision) achieves ~3× more performance.

(Optional)

    conda create -n soprotts python=3.10
    conda activate soprotts

From PyPI

    pip install sopro

From the repo

    git clone https://github.com/samuel-vitorino/sopro
    cd sopro
    pip install -e .

Examples

CLI

    soprotts \
      --text "Sopro is a lightweight 169 million parameter text-to-speech model. Some of the main features are streaming, zero-shot voice cloning, and 0.25 real-time factor on the CPU." \
      --ref_audio ref.wav \
      --out out.wav

You have the expected temperature and top_p parameters, alongside:

- --style_strength (controls the FiLM strength; increasing it can improve or reduce voice similarity; default 1.0)
- --no_stop_head to disable early stopping
- --stop_threshold and --stop_patience (number of consecutive frames that must be classified as final before stopping)
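To make the interaction between --stop_threshold and --stop_patience concrete, here is a minimal sketch of patience-based early stopping. It is not Sopro's actual code: the function name `first_stop_frame`, the idea that the stop head emits a per-frame probability compared against the threshold, and the example values are all assumptions made for illustration.

```python
from typing import Iterable, Optional


def first_stop_frame(
    stop_probs: Iterable[float], threshold: float, patience: int
) -> Optional[int]:
    """Return the index of the frame at which generation would stop.

    Hypothetical illustration: a frame counts as "final" when its stop
    probability is >= threshold, and generation stops once `patience`
    consecutive frames have been classified as final.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    consecutive = 0
    for index, prob in enumerate(stop_probs):
        if prob >= threshold:
            consecutive += 1
            if consecutive >= patience:
                return index
        else:
            # A single non-final frame resets the streak.
            consecutive = 0
    return None  # The stop condition was never met.


# Made-up per-frame stop probabilities, for illustration only.
probs = [0.1, 0.8, 0.2, 0.7, 0.9, 0.95, 0.3]
print(first_stop_frame(probs, threshold=0.5, patience=3))  # 5
print(first_stop_frame(probs, threshold=0.5, patience=1))  # 1
```

Under these assumptions, a higher patience makes stopping more conservative: the isolated spike at frame 1 ends generation when patience is 1, but is ignored when three consecutive final frames are required.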
For short sentences, the...
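The real-time factor quoted in the feature list can be checked with a little arithmetic. RTF is the time spent generating divided by the duration of the audio produced, so values below 1 mean faster than real time. The helper names below are mine, not part of the sopro package.

```python
def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def generation_time(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock time needed to generate `audio_seconds` of audio."""
    return audio_seconds * rtf


# Figures from the feature list: 30 seconds of audio in 7.5 seconds.
print(real_time_factor(7.5, 30.0))   # 0.25
print(generation_time(30.0, 0.25))   # 7.5
```

The 0.25 figure was measured on an M3 base model, so expect a different RTF on other CPUs or Torch versions.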

First seen: 2026-01-08 21:49

Last seen: 2026-01-09 19:52