The engines behind the layer.
Every model we route to, presented as a card: specs, scores, rights, and a preview sample. Self-hosted engines plug in the same way.
We publish each engine’s real cost for one reason: so you can see we route on merit and charge one flat price. The raw rates are here in the open, the numbers we pay underneath, so you never have to take our word for what is fair.
- 5
- Real engines today
- $0.62/M
- Cheapest engine (Kokoro)
- 973ms
- Fastest measured latency
- 1
- Endpoint for all engines
The whole roster on one line.
Every engine against the same rows, so the trade-offs are obvious before you read a single card. Press play in any column to hear it.
* Quality Elo from the Artificial Analysis Speech Arena, retrieved June 10, 2026. It is a user-vote arena rating; the top model of all rated is Fun-Realtime-TTS at 1228.06. Latency figures are our own measured wall-clock numbers, measured 2026-06-10on the same routed path that serves the studio, not a server SLA. Engine rates are the providers’ published list rates, checked 2026-06-11; when a provider moves a rate, we update it here.
* MAI Voice 2: Score is for MAI-Voice-1; MAI-Voice-2 is not yet arena-rated.
* Zonos: Baseline rating with limited arena votes so far.
Every engine, side by side.
Specs, rights, sample voices, and pricing shown on every card. Compare them against the open benchmark before you pick.
The only engine here that acts your bracketed [emotion] directions.
- Quality Elo
- 1225
- Latency
- 2770 ms (measured 2026-06-10)
- Languages
- 24
- Price
- $1/M in + $20/M out
- Rights
- Commercial use; outputs are yours
Cheapest. Clean, plain read. Ignores cues.
- Quality Elo
- 1060
- Latency
- 973 ms (measured 2026-06-10)
- Languages
- 8
- Price
- $0.62/M in · $0 out
- Rights
- Apache-2.0 model; commercial OK
xAI voice with 5 personas. Plain read, ignores cues.
- Quality Elo
- 1197
- Latency
- 2444 ms (measured 2026-06-10)
- Languages
- 1
- Price
- $15/M in · $0 out
- Rights
- Commercial use; outputs are yours
Microsoft voice with real style and speed controls.
- Quality Elo
- 1007 *
- Latency
- 2426 ms (measured 2026-06-10)
- Languages
- 1
- Price
- $22/M in · $0 out
- Rights
- Commercial use; outputs are yours
Open-weight Zyphra engine with four accent voices.
- Quality Elo
- 1000 *
- Latency
- 4523 ms (measured 2026-06-10)
- Languages
- 1
- Price
- $7/M in · $0 out
- Rights
- Apache-2.0 model; commercial OK
One endpoint. Every engine.
- Single POST /api/speech endpoint for all engines
- Switch engines with one field change
- API key managed server-side only
POST /api/speech
Swap engines with one field - key stays server-side
Hear the engines for yourself.
Open the console and generate real audio through each engine. No sign-up required to listen.