Zonos Beta Release: Open-Source Voice Cloning by Zyphra

Zyphra launches **Zonos-v0.1**, an open-source voice cloning model with **high-fidelity speech, real-time performance, and multilingual support**. It can clone voices with **5-30s of audio** and runs efficiently on **RTX 4090 GPUs**. A major step for AI voice synthesis.

February 12, 2025 13:24


Zyphra has introduced Zonos-v0.1, an open-source voice cloning model under the Apache 2.0 license, offering high-fidelity speech synthesis with multilingual support. Here’s what you need to know:


Key Highlights


1️⃣ Models & Architecture

  • Two models: a 1.6B-parameter transformer and a hybrid model of similar size.
  • Available under Apache 2.0, making them free for developers and researchers.


2️⃣ Training & Multilingual Capabilities

  • Trained on 200,000 hours of speech data across multiple languages.
  • The training data is predominantly English, but the model also supports Chinese, Japanese, French, Spanish, and German.


3️⃣ Core Features

  • Voice Cloning: Requires just 5 to 30 seconds of sample audio for cloning.
  • Expressiveness: Controls for speaking rate, pitch, audio quality, and emotions like happiness, anger, fear, sadness, and surprise.
  • Audio Quality: Native 44 kHz output for realistic speech generation.
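
The 5 to 30 second guideline for reference audio is easy to check before a clip is handed to any cloning pipeline. The following is a minimal sketch using only Python's standard `wave` module; it assumes the reference clip is an uncompressed PCM WAV file, and the helper names (`clip_duration_seconds`, `is_usable_reference`, `make_silent_wav`) are hypothetical illustrations, not part of the Zonos codebase.

```python
import io
import wave

MIN_SECONDS = 5.0   # lower bound quoted in the article
MAX_SECONDS = 30.0  # upper bound quoted in the article


def clip_duration_seconds(wav_file) -> float:
    """Return the duration of a PCM WAV file (a path or a file-like object)."""
    with wave.open(wav_file, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_usable_reference(wav_file) -> bool:
    """True if the clip length falls inside the 5-30 s range (hypothetical check)."""
    return MIN_SECONDS <= clip_duration_seconds(wav_file) <= MAX_SECONDS


def make_silent_wav(seconds: float, sample_rate: int = 44100) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, used here only for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)          # mono
        wf.setsampwidth(2)          # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for seconds in (2, 10, 45):
        ok = is_usable_reference(make_silent_wav(seconds, sample_rate=8000))
        print(f"{seconds:>2} s clip usable as reference: {ok}")
```

A check like this only validates length; it says nothing about noise, clipping, or whether the clip contains a single speaker, all of which would also affect cloning quality.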


4️⃣ Performance & Efficiency

  • Optimized for real-time applications, with a latency of 200-300 ms.
  • Runs efficiently on NVIDIA RTX 4090 GPUs, with a real-time factor above 1, meaning it generates audio faster than the audio takes to play back.
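
The real-time factor can be made concrete with a little arithmetic. The sketch below assumes the convention implied above, where the factor is seconds of audio produced per second of compute, so values above 1 mean generation outpaces playback. Some speech papers use the inverse ratio, where lower is better. The function names and the timings in the demo are hypothetical, not measured Zonos results.

```python
def real_time_factor(audio_seconds: float, compute_seconds: float) -> float:
    """Seconds of audio generated per second of compute (assumed convention)."""
    if compute_seconds <= 0:
        raise ValueError("compute_seconds must be positive")
    return audio_seconds / compute_seconds


def keeps_up_with_playback(audio_seconds: float, compute_seconds: float) -> bool:
    """True when audio is generated faster than it is played back."""
    return real_time_factor(audio_seconds, compute_seconds) > 1.0


if __name__ == "__main__":
    # Hypothetical timings for illustration only.
    audio_s, compute_s = 10.0, 5.0
    rtf = real_time_factor(audio_s, compute_s)
    print(f"real-time factor: {rtf:.2f}")
    print(f"keeps up with playback: {keeps_up_with_playback(audio_s, compute_s)}")
```

Throughput and latency are separate measures: a factor above 1 indicates that streaming playback will not stall once it has started, while the 200-300 ms figure describes how long a listener waits before the first audio arrives.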


5️⃣ Accessibility & Deployment

  • Available on Hugging Face, with model weights for both architectures.
  • Supports a Gradio-based UI, Docker setup, and an API for cloud-based use.


6️⃣ Feedback & Limitations

  • Praised for high-quality output and expressiveness, but some users report occasional audio artifacts and alignment issues.
  • Zyphra plans future updates to enhance language support, pronunciation accuracy, emotional control, and inference efficiency.


7️⃣ Community & Reception

  • Posts on X (formerly Twitter) show strong enthusiasm, particularly around its open-source nature and potential for developers.


Why It Matters

Zonos-v0.1 represents a major step forward in open-source voice cloning, offering tools that rival proprietary models. With its high-quality speech synthesis, low-latency performance, and multilingual capabilities, it opens new possibilities for TTS applications, AI-powered assistants, and creative content development.

For developers and researchers in text-to-speech and AI voice cloning, this is an exciting opportunity to contribute, test, and build upon a truly open AI model.
