ElevenLabs, a company best known for its AI-generated voice technology, entered the speech-to-text (STT) market with the launch of Scribe on February 26, 2025. The move expands its portfolio and applies its audio AI expertise to a space dominated by established players like OpenAI’s Whisper, Google’s Gemini 2.0 Flash, and Deepgram’s Nova-3.
At launch, Scribe posted strong benchmark results, with a reported accuracy of 96.7% for English and 98.7% for Italian on benchmarks such as FLEURS and Common Voice. It supports 99 languages, including some that existing models cover less often, such as Serbian, Cantonese, and Malayalam. Its lower word error rate (WER) in these underserved languages has been positioned as a competitive edge.
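To make the accuracy and WER figures concrete, here is a minimal Python sketch of how word error rate is commonly computed: the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The article does not say how the published percentages were derived, so reading "accuracy" as 1 minus WER is an assumption here, and the function names are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                 # deletion (reference word missed)
                curr[j - 1] + 1,             # insertion (extra hypothesis word)
                prev[j - 1] + substitution,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def accuracy_from_wer(wer: float) -> float:
    """Assumed convention: accuracy = 1 - WER (not confirmed by the article)."""
    return 1.0 - wer


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.3f}, accuracy: {accuracy_from_wer(wer):.3f}")
```

Under this convention, a 96.7% accuracy figure would correspond to roughly 3.3 word errors per 100 reference words. Real benchmark pipelines also normalize punctuation and casing before scoring, which this sketch only approximates by lowercasing.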
Scribe also offers features that go past basic transcription: speaker diarization (supporting up to 32 speakers in a single audio file), word-level timestamps, and tagging of non-speech events such as laughter and applause. These features serve a range of industries, from media and entertainment to business transcription and research.
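As a rough illustration of what these features enable downstream, the Python sketch below merges word-level output into speaker turns. The `Word` record and its fields (`text`, `start`, `end`, `speaker`, `kind`) are hypothetical stand-ins invented for this example; they are not taken from Scribe's documented response format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word-level record; field names are assumptions."""
    text: str
    start: float          # seconds from the start of the audio
    end: float            # seconds from the start of the audio
    speaker: str          # diarization label, e.g. "speaker_0"
    kind: str = "word"    # "word" or "event" (laughter, applause, ...)


def group_into_turns(words: list[Word]) -> list[dict]:
    """Merge consecutive words from the same speaker into one turn."""
    turns: list[dict] = []
    for w in words:
        # Render non-speech events in parentheses so they stand out.
        token = f"({w.text})" if w.kind == "event" else w.text
        if turns and turns[-1]["speaker"] == w.speaker:
            turns[-1]["tokens"].append(token)
            turns[-1]["end"] = w.end
        else:
            turns.append(
                {"speaker": w.speaker, "start": w.start, "end": w.end, "tokens": [token]}
            )
    return [
        {
            "speaker": t["speaker"],
            "start": t["start"],
            "end": t["end"],
            "text": " ".join(t["tokens"]),
        }
        for t in turns
    ]


if __name__ == "__main__":
    sample = [
        Word("Hello", 0.0, 0.4, "speaker_0"),
        Word("there", 0.5, 0.9, "speaker_0"),
        Word("laughter", 1.0, 1.6, "speaker_1", kind="event"),
        Word("Hi", 1.7, 1.9, "speaker_1"),
    ]
    for turn in group_into_turns(sample):
        print(f'[{turn["start"]:.1f}-{turn["end"]:.1f}] {turn["speaker"]}: {turn["text"]}')
```

A captioning or meeting-notes tool could use turns like these to produce a readable transcript with speaker labels and time ranges.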
Scribe enters a crowded and rapidly evolving STT market, where accuracy, cost, language support, and ease of integration define success. Priced at $0.40 per hour of input audio (with a 50% discount for early adopters), it is competitive with existing solutions but will still face cost-benefit comparisons with lower-priced alternatives.
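For readers budgeting a transcription workload, the pricing above reduces to simple arithmetic. The Python sketch below uses only the two figures quoted in this article ($0.40 per hour and a 50% early-adopter discount); it assumes strictly linear per-hour billing with no minimums, rounding rules, or volume tiers, none of which the article addresses.

```python
from decimal import Decimal

# Figures quoted in the article; billing details beyond them are assumptions.
LIST_PRICE_PER_HOUR = Decimal("0.40")     # USD per hour of input audio
EARLY_ADOPTER_DISCOUNT = Decimal("0.50")  # 50% off for early adopters


def estimate_cost(audio_hours: float, discounted: bool = False) -> Decimal:
    """Estimated USD cost, assuming simple linear per-hour pricing."""
    rate = LIST_PRICE_PER_HOUR
    if discounted:
        rate = rate * (Decimal("1") - EARLY_ADOPTER_DISCOUNT)
    # Convert via str() so a float such as 2.5 becomes an exact Decimal.
    return (Decimal(str(audio_hours)) * rate).quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(estimate_cost(100))                   # list price for 100 hours
    print(estimate_cost(100, discounted=True))  # early-adopter price
```

At these rates, 100 hours of audio would cost $40.00 at list price and $20.00 with the early-adopter discount.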
While Scribe is currently optimized for pre-recorded audio, ElevenLabs has already hinted at plans for a low-latency, real-time version, which could broaden its appeal for live applications such as customer service, closed captioning, and real-time transcription in media broadcasting.
Because Scribe has only just launched, reactions are still emerging, but early tech discussions and industry analysts point to mixed perspectives.
Scribe’s launch represents a significant step forward for ElevenLabs as it moves beyond text-to-speech into speech-to-text technology. The company’s reputation, coupled with a recent $3.3 billion valuation and $180 million funding round, suggests it has the resources to challenge incumbents in the space.
However, the true test of its impact will lie in adoption rates, long-term performance across diverse real-world use cases, and how well it addresses privacy concerns and cost-effectiveness. As more users put Scribe to the test, the industry will gain a clearer picture of whether it stands as a transformative STT solution or simply a strong competitor in an already crowded field.
A virtual event with ElevenLabs' development team is scheduled for next week, which may provide further insights into the model’s design, future roadmap, and user feedback.
For now, Scribe holds promise, but it has yet to show that it can balance innovation with practical user needs.