How to dub audio into another language

Transcribe a recording, translate the script, and re-voice it in one of eight languages. Three real steps, with editable text between them.

Updated June 11, 2026

How the pipeline works

Dubbing & Translation chains three real steps into one flow, and you can edit the text at every stage before moving on. Nothing is hidden between steps: each one produces text or audio you can inspect, correct, and re-run.

Step 01, Transcribe: upload or record audio, and a Whisper-class model turns it into an editable transcript.

Step 02, Translate: pick a target language, and a fast language model produces an editable translated script.

Step 03, Re-voice: pick a voice, and the translated script becomes finished dubbed audio.
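As a rough illustration of how the three steps chain together, here is a minimal Python sketch. It is not the tool's real interface: the function dub, the review hook, and the three callables passed in are all hypothetical names chosen for this example. The review hook stands in for the point where you edit the text before moving on.

```python
from typing import Callable

# Every name here is hypothetical; none of it is the tool's real interface.
Review = Callable[[str, str], str]


def keep_as_is(stage: str, text: str) -> str:
    """Default review hook: accept the text unchanged."""
    return text


def dub(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    revoice: Callable[[str, str], bytes],
    target_language: str,
    voice: str,
    review: Review = keep_as_is,
) -> bytes:
    # Step 01: audio -> transcript, then a chance to correct it.
    transcript = review("transcript", transcribe(audio))
    # Step 02: transcript -> translated script, then a chance to correct it.
    translation = review("translation", translate(transcript, target_language))
    # Step 03: translated script -> dubbed audio.
    return revoice(translation, voice)


def fix_typo(stage: str, text: str) -> str:
    # Example edit: correct a mis-heard word in the transcript only.
    return text.replace("wrold", "world") if stage == "transcript" else text


result = dub(
    b"raw audio bytes",
    transcribe=lambda audio: "hello wrold",            # stub model
    translate=lambda text, lang: f"({lang}) {text}",   # stub translator
    revoice=lambda text, voice: f"{voice}:{text}".encode(),  # stub voice engine
    target_language="es",
    voice="Kore",
    review=fix_typo,
)
print(result)  # b'Kore:(es) hello world'
```

The stubs only show the order of operations: the correction made to the transcript is what reaches the translator, and the translated script is what reaches the voice.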

Steps unlock progressively. Translate is disabled until a transcript exists, and Re-voice is disabled until a translation exists. Every step is re-runnable, so you can fix the transcript and translate again, or tweak the translation and re-voice it.
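The unlocking rule can be pictured as two simple checks on the session state. The sketch below is an assumption about how such gating might be modeled, not the studio's actual code; DubSession and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DubSession:
    """Hypothetical session state; field names are illustrative only."""
    transcript: str = ""
    translation: str = ""

    def can_translate(self) -> bool:
        # Translate unlocks only once a non-empty transcript exists.
        return bool(self.transcript.strip())

    def can_revoice(self) -> bool:
        # Re-voice unlocks only once a non-empty translation exists.
        return bool(self.translation.strip())


session = DubSession()
print(session.can_translate(), session.can_revoice())  # False False

session.transcript = "Hello and welcome."
print(session.can_translate(), session.can_revoice())  # True False

session.translation = "Hola y bienvenidos."
print(session.can_translate(), session.can_revoice())  # True True
```

Because the checks read the current text rather than a "step completed" flag, editing the transcript or the translation and re-running a later step needs no extra bookkeeping in this model.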

A dub is only as good as its words. The editable transcript and translation between steps exist so you catch mistakes before they are spoken.

Audio only, no video lip-sync

This tool works on audio. It does not edit video, align mouths, or time-stretch the dub to match the original's pacing. If you upload an mp4 video file, we read its audio track and the result is still audio. You bring the dubbed track back into your video editor yourself.

Supported languages

You can translate and dub into eight languages today. The studio shows each one under its native name:

Language      Shown in the studio as
Spanish       Español
French        Français
German        Deutsch
Italian       Italiano
Portuguese    Português
Japanese      日本語
Hindi         हिन्दी
Arabic        العربية

Inputs and limits

The transcribe step accepts mp3, wav, m4a, webm, ogg, and flac uploads (it also reads mpga, mpeg, mp4, and oga extensions), up to 25 MB per file. You can also record directly in the browser, up to 5 minutes per take.
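If you prepare files in bulk, a quick local check against these limits can save a failed upload. The following is a minimal sketch, not the service's own validation: check_upload is a hypothetical helper, and treating 25 MB as 25 × 1024 × 1024 bytes is an assumption (the service may count megabytes differently).

```python
from pathlib import Path

# Extensions listed in the text above.
ACCEPTED_EXTENSIONS = {
    "mp3", "wav", "m4a", "webm", "ogg", "flac",
    "mpga", "mpeg", "mp4", "oga",
}

# Assumption: 25 MB is read as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    # Compare extensions case-insensitively and without the leading dot.
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 25 MB")
    return problems


print(check_upload("interview.MP3", 4_000_000))   # []
print(check_upload("interview.mov", 4_000_000))   # ['unsupported extension: mov']
```

The check looks only at the file name and size. It does not inspect the audio itself, so a mislabeled file would still pass here.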

Scripts are capped at 30,000 characters at both the transcript and translation stages. As a rule of thumb, about 1,000 characters is a minute of audio, so the cap is roughly half an hour of speech.
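The arithmetic behind that estimate is easy to reproduce. This small sketch uses the cap and the rule of thumb quoted above. The helper names are hypothetical, and Python's len counts Unicode code points, which may not match exactly how the service counts characters.

```python
MAX_SCRIPT_CHARS = 30_000   # cap stated in the text
CHARS_PER_MINUTE = 1_000    # rule of thumb from the text, not a measured rate


def estimate_minutes(script: str) -> float:
    """Rough spoken length in minutes, using the 1,000-characters-per-minute rule."""
    return len(script) / CHARS_PER_MINUTE


def fits_cap(script: str) -> bool:
    """True if the script is within the 30,000-character cap."""
    return len(script) <= MAX_SCRIPT_CHARS


script = "a" * 4_500
print(estimate_minutes(script))              # 4.5
print(fits_cap(script))                      # True
print(MAX_SCRIPT_CHARS / CHARS_PER_MINUTE)   # 30.0
```

Real speaking rates vary by language and voice, so treat the result as a planning figure rather than a guaranteed duration.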

Translating and dubbing require sign-in. Before the re-voice step runs, a preflight line shows you the exact character count that will be generated, so there are no surprises against your monthly allowance.

The re-voicing engine

Dubbed audio is voiced by Gemini Flash, the only multilingual engine in the lineup. Kokoro, Grok Voice, MAI Voice 2, and Zonos are English-only, so they are not offered here. You pick from Gemini Flash's five voices: Kore, Charon, Aoede, Puck, and Zephyr.

Bracketed [stage directions] in your script are kept in place and left untranslated, so performance cues like [warmly] or [pause] survive the translation step and reach the voice engine intact.
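One common way to keep bracketed cues out of a translator's reach is to swap them for placeholders, translate, then swap them back. The sketch below illustrates that general technique. It is an assumption for illustration, not a description of how this tool does it, and protect_cues, restore_cues, and the placeholder format are hypothetical.

```python
import re

# Matches a single non-nested [bracketed cue].
CUE_PATTERN = re.compile(r"\[[^\[\]]*\]")
# Matches the numbered placeholders inserted by protect_cues.
PLACEHOLDER_PATTERN = re.compile(r"⟦(\d+)⟧")


def protect_cues(text: str) -> tuple[str, list[str]]:
    """Replace each cue with a numbered placeholder and return the cues."""
    cues: list[str] = []

    def stash(match: re.Match) -> str:
        cues.append(match.group(0))
        return f"⟦{len(cues) - 1}⟧"

    return CUE_PATTERN.sub(stash, text), cues


def restore_cues(text: str, cues: list[str]) -> str:
    """Put the original cues back in place of their placeholders."""
    return PLACEHOLDER_PATTERN.sub(lambda m: cues[int(m.group(1))], text)


masked, cues = protect_cues("[warmly] Hello there. [pause] Welcome back.")
print(masked)  # ⟦0⟧ Hello there. ⟦1⟧ Welcome back.

# str.upper() is a stand-in for a real translation call.
translated = masked.upper()
print(restore_cues(translated, cues))  # [warmly] HELLO THERE. [pause] WELCOME BACK.
```

The stand-in translation changes every word but leaves the placeholders alone, so the cues come back exactly as written and in the same positions.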

The finished dub plays back in the studio and auto-saves to your library, ready to download.