Every engine, scored on the same script.
No vendor marketing. Ranked on third-party Quality Elo from the Artificial Analysis Speech Arena, with our own measured latency, languages, cloning, price, and rights in the open. Filter by what you are building.
Scores from an independent arena, latencies we measured ourselves, engine costs in the open: this is how you know the routing is on your side, not the vendor’s. We put it all in one place so the trade-offs are yours to judge.
Showing all 5 engines, unfiltered, in leaderboard order (new engines last, unranked).
Latency: our own measurement 2026-06-10 on the same routed path that serves the studio, not a server SLA. Quality Elo: from the Artificial Analysis Speech Arena (full leaderboard below).
Quality Elo from the Artificial Analysis Speech Arena, retrieved 2026-06-10. For context, the highest-rated model in the arena is Fun-Realtime-TTS at 1228.06. Latencies are our own measured wall-clock numbers.
* MAI Voice 2: Score is for MAI-Voice-1; MAI-Voice-2 is not yet arena-rated.
* Zonos: Baseline rating with limited arena votes so far.
Latency: our own measurement 2026-06-10 on the same routed path that serves the studio (script: "The northern lights drifted across the sky, slow and silent, like breathing."), not a server SLA. Quality Elo: third-party, from the Artificial Analysis Speech Arena (see attribution above).
- Fastest engine (Kokoro): 973 ms
- Most expressive (Gemini): 2770 ms
- Engines measured: 5
- Last measured: 2026-06-10
The same test, every time.
One script, measured the same way across every engine so the numbers are comparable and reproducible.
Step 1: Same script, every engine
Every engine receives the identical passage: "The northern lights drifted across the sky, slow and silent, like breathing." No engine gets a head start.
Step 2: Measure wall-clock latency
We time each request from the moment it is sent until the full audio bytes arrive, three runs per engine, and publish the median. This is wall-clock time to full audio through our routing gateway, measured locally. It is not a server SLA.
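The measurement in this step can be sketched roughly as follows. This is a minimal illustration, not our production harness: `median_latency_ms` and `fake_engine` are hypothetical names, and `synthesize` stands in for whatever call sends the script through the gateway and returns only once all audio bytes have arrived.

```python
import statistics
import time
from typing import Callable

# The shared test passage quoted in Step 1.
SCRIPT = "The northern lights drifted across the sky, slow and silent, like breathing."


def median_latency_ms(
    synthesize: Callable[[str], bytes],
    script: str = SCRIPT,
    runs: int = 3,
) -> float:
    """Median wall-clock time in ms from sending the request to holding the full audio."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        audio = synthesize(script)  # assumed to block until every audio byte is received
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if not audio:
            raise RuntimeError("engine returned no audio")
        samples.append(elapsed_ms)
    return statistics.median(samples)


def fake_engine(text: str) -> bytes:
    """Hypothetical stand-in for a real engine call; sleeps briefly, returns dummy bytes."""
    time.sleep(0.01)
    return b"\x00" * 16


if __name__ == "__main__":
    print(f"{median_latency_ms(fake_engine):.0f} ms")
```

With three runs, the median is simply the middle sample, so a single slow outlier (a cold start, say) does not move the published number.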
Step 3: Quality from a third-party arena
Quality Elo comes from the Artificial Analysis Speech Arena, a user-vote arena where listeners blind-compare engines. We do not score quality ourselves, so no engine grades its own homework.
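To read an Elo gap, it can help to turn it into an expected head-to-head win rate. The sketch below uses the classic Elo formula on a 400-point scale. That is an assumption on our part: the arena may fit its ratings with a different model or scale, so treat the result as a rough guide only. `expected_win_rate` is a hypothetical helper name.

```python
def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Expected share of blind comparisons that A wins over B under the classic Elo model.

    Assumes the standard logistic curve with a 400-point scale; the arena's
    actual rating model may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / 400.0))


if __name__ == "__main__":
    # Equal ratings give an even split.
    print(round(expected_win_rate(1200.0, 1200.0), 2))
    # A 100-point lead over a hypothetical rival rated 1128.06.
    print(round(expected_win_rate(1228.06, 1128.06), 2))
```

Under this assumption, a 100-point lead corresponds to winning about 64% of blind comparisons, and equal ratings to an even split.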
Step 4: Publish in the open
Results are dated (last run: 2026-06-10) and re-run whenever an engine changes.