cantari
Live · Localize

Bring your audio into another language.

Dubbing & Translation takes spoken audio, transcribes it with a Whisper-class model, translates the script, and re-voices it in a new language on a multilingual engine. Three real steps, chained into one flow and live today.

One take, eight languages

Record once. Ship everywhere.

Dubbing is three real steps. We transcribe your recording, translate the script into the language you pick (you can edit every word before it is voiced), and re-voice it on the multilingual engine.

Your recording: EPISODE-12.MP3

01 Transcribe 02 Translate 03 Re-voice

Español (es): Spanish. Translated + re-voiced. Shown on the homepage.
Français (fr): French. Translated + re-voiced.
Deutsch (de): German. Translated + re-voiced.
Italiano (it): Italian. Translated + re-voiced.
Português (pt): Portuguese. Translated + re-voiced.
日本語 (ja): Japanese. Translated + re-voiced.
हिन्दी (hi): Hindi. Translated + re-voiced.
العربية (ar): Arabic. Translated + re-voiced.

Dubbing carries your words, and the way they are read, through Gemini Flash, the multilingual engine. It does not lip-sync video, and you review and edit the translated script before anything is re-voiced.

How it works

From a source recording to audio you own.

Step 1: Transcribe the source

Your original audio is turned into text by a Whisper-class model. Edit the transcript before you translate it.

Step 2: Translate the script

The script is translated into one of eight languages, with bracketed [stage directions] kept in place. Review it before voicing.

Step 3: Re-voice it

Gemini Flash, the multilingual engine, speaks the translated script in the new language, in a voice you choose.

Step 4: Export and own

Download the dubbed audio as MP3 or WAV, yours to publish commercially.
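
Step 2 says bracketed [stage directions] stay in place through translation. One common way to get that behavior is to swap each cue for a placeholder before translating and swap it back afterwards. The sketch below illustrates that general technique only; it is not a description of how Cantari does it, and the names `protect_cues`, `restore_cues`, and the `__CUE0__` placeholder format are hypothetical.

```python
import re

# Matches one bracketed direction such as [whispering] or [pause].
CUE_PATTERN = re.compile(r"\[[^\[\]]+\]")
# Matches the hypothetical placeholder format used below.
PLACEHOLDER = re.compile(r"__CUE(\d+)__")


def protect_cues(script: str) -> tuple[str, list[str]]:
    """Swap each bracketed cue for a numbered placeholder."""
    cues: list[str] = []

    def stash(match: re.Match) -> str:
        cues.append(match.group(0))
        return f"__CUE{len(cues) - 1}__"

    return CUE_PATTERN.sub(stash, script), cues


def restore_cues(translated: str, cues: list[str]) -> str:
    """Put the original cues back where their placeholders ended up."""
    return PLACEHOLDER.sub(lambda m: cues[int(m.group(1))], translated)


masked, cues = protect_cues("[whispering] Hello there. [pause] Welcome back.")
# masked is "__CUE0__ Hello there. __CUE1__ Welcome back."
# After translating `masked`, call restore_cues(translated_text, cues).
```

The approach assumes the translation step leaves the placeholders untouched, which is one more reason to review the translated script before voicing it.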

Capabilities

What Dubbing & Translation gives you.

One pipeline, three steps

Transcribe, translate, then re-voice, chained into a single flow so you are not stitching three tools together by hand.

Voiced by live engines

The final speech runs on the same real engines as Text to Speech, so the output is genuinely generated voice.

Review before you ship

Designed so you can check the translated script before it is voiced, because a dub is only as good as its words.

Own the result

Every dub you generate is yours to export and publish, with commercial rights and no watermark.

Powered by

The engines behind it.

Gemini Flash
Expressive - follows [cues]

The only engine here that acts on your bracketed [emotion] directions.

Quality Elo: 1225
Latency: 2770 ms (measured 2026-06-10)
Languages: 24
Rights: Commercial use; outputs are yours
Cue-following: Expressive

Quality Elo from the Artificial Analysis Speech Arena, retrieved June 10, 2026. Latencies are our own real wall-clock numbers.

Live now

Dubbing & Translation is generating in the studio today.

Live today, end to end. A Whisper-class model transcribes the source, a translation model turns the script into the target language, and the final voice runs on Gemini Flash, our multilingual engine (Kokoro, Grok Voice, MAI Voice 2, and Zonos are English-only, so the dub uses Gemini). English-source audio works best. You can edit the text at every step before moving on.
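
The flow described above (transcribe, translate, re-voice, with an edit point between steps) can be sketched as a small chain of functions. This is a minimal illustration, not Cantari's code or API: `dub`, `DubResult`, and every parameter name are hypothetical, and the three stages are passed in as plain callables so that no real service is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Review = Callable[[str], str]  # takes text, returns the (possibly edited) text


@dataclass
class DubResult:
    transcript: str
    translation: str
    audio: bytes


def dub(
    recording: bytes,
    target_lang: str,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    revoice: Callable[[str, str], bytes],
    review_transcript: Optional[Review] = None,
    review_translation: Optional[Review] = None,
) -> DubResult:
    """Run the three stages in order, pausing for edits between them."""
    transcript = transcribe(recording)
    if review_transcript is not None:
        transcript = review_transcript(transcript)  # edit before translating

    translation = translate(transcript, target_lang)
    if review_translation is not None:
        translation = review_translation(translation)  # edit before voicing

    audio = revoice(translation, target_lang)
    return DubResult(transcript, translation, audio)
```

Keeping the review hooks as separate optional callables mirrors the point that the text can be edited at every step before moving on.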

Engine connected · Tool live
Open Dubbing
Questions

The honest answers.

What Dubbing & Translation can and cannot do today, in plain language.

Can I dub audio today?
Yes, end to end. Open Dubbing, transcribe a recording, translate the script into one of eight languages, then re-voice it. Every step runs live and you can edit the text before moving on.
How does it work under the hood?
A composite pipeline: a Whisper-class model transcribes the source, a translation model turns the script into the target language, and Gemini Flash voices the result. All three run inside one studio.
Which languages can I dub into?
Eight: Spanish, French, German, Italian, Portuguese, Japanese, Hindi, and Arabic. English-source audio transcribes most reliably, so it makes the best starting point.
Why does the dub only use Gemini Flash?
Because it is the only multilingual engine in Cantari. Kokoro, Grok Voice, MAI Voice 2, and Zonos are English-only today, so offering them for a non-English dub would not be honest. Gemini Flash speaks roughly two dozen languages.
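
The engine rule in the answers above can be written down as a small lookup. This is an illustrative sketch built only from the engine names and language codes listed on this page; the function name `engines_for_dub` and the data layout are hypothetical, not part of any Cantari API.

```python
# The eight dub targets listed on this page, keyed by language code.
DUB_TARGETS = {
    "es": "Spanish", "fr": "French", "de": "German", "it": "Italian",
    "pt": "Portuguese", "ja": "Japanese", "hi": "Hindi", "ar": "Arabic",
}

# Engine name -> whether the page describes it as multilingual.
ENGINES = {
    "Gemini Flash": True,
    "Kokoro": False,
    "Grok Voice": False,
    "MAI Voice 2": False,
    "Zonos": False,
}


def engines_for_dub(target: str) -> list[str]:
    """Return the engines that can voice a dub into the given language."""
    if target not in DUB_TARGETS:
        raise ValueError(f"unsupported dub target: {target!r}")
    # Every dub target is non-English, so only multilingual engines qualify.
    return [name for name, multilingual in ENGINES.items() if multilingual]
```

With the engines as listed today, any of the eight targets resolves to Gemini Flash alone.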
Keep exploring
By language

Eight target languages, live today. Each has its own honest guide to dubbing into it.

Start generating with Dubbing & Translation.

Free to start, no credit meter. Open the console and hear it for yourself.