cantari
For teams shipping in many markets

Carry your audio into another language.

Reaching a new market should not mean re-recording from scratch. The dubbing pipeline transcribes your audio, translates the script, and re-voices it in the target language. It is live today, with a review step before anything ships.

No credit card · Real engines · The audio is yours

Generated voice
MP3 + WAV · yours to export
The moment

The episode works. You know it works, because the analytics say a third of your audience watches with Spanish captions on. The content is finished and paid for; the only thing between it and a new market is a voice that speaks the language.

Why this is hard

What localization & dubbing actually need.

We would rather name the friction plainly than pretend it away. Here is the problem this page is about.
The honest problem

Shipping content into another market usually means a separate recording session, a separate budget, and a separate vendor per language. The friction is the chain: transcribe, translate, then re-voice, each step a handoff. A pipeline that chains those steps and re-voices on engines you already trust removes the stitching, so localization is a flow rather than three projects.

How Cantari helps

Real features, mapped to the job.

Every item here works today, or says plainly where it is still in progress.

One pipeline, three steps

Transcribe, translate, then re-voice, chained into one flow so you are not stitching three separate tools together by hand. Live today, end to end, with the script editable at every step.

Re-voiced by live engines

The final speech runs on the same real engines as Text to Speech, so a dubbed line is genuinely generated voice you can audition.

Review before you ship

You can check and edit the translated script before it is voiced, because a dub is only as good as its words.

Own the dubbed result

Export MP3 or WAV with commercial rights and no watermark, yours to publish in the new market.

Worked example

One line, two markets

Script fragment · Gemini Flash
EN

The recipe takes ten minutes, and you already own every ingredient.

ES

La receta toma diez minutos, y ya tienes todos los ingredientes.

Line 2, real Gemini Flash output, unedited.

Gemini Flash voices both lines: it is the multilingual engine the dubbing pipeline uses for the re-voice step, so the Spanish you hear is generated speech, not a recording we licensed.

The honest arithmetic · about 1,000 characters is a minute of speech
8 target languages in the dubbing pipeline
3 chained steps: transcribe, translate, re-voice
1 flow instead of three vendor handoffs
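
The rule of thumb above is easy to turn into a quick estimate. The sketch below assumes the page's figure of roughly 1,000 characters per minute holds for your script. The name `estimate_minutes` and the constant are illustrative and not part of any Cantari API, and real pacing will vary with language, voice, and punctuation.

```python
# Rule of thumb quoted on this page; treat it as an approximation.
CHARS_PER_MINUTE = 1000


def estimate_minutes(script: str, chars_per_minute: int = CHARS_PER_MINUTE) -> float:
    """Rough spoken length of a script, in minutes."""
    return len(script) / chars_per_minute


line_es = "La receta toma diez minutos, y ya tienes todos los ingredientes."
seconds = estimate_minutes(line_es) * 60
print(f"{len(line_es)} characters is roughly {seconds:.1f} s of speech")
```

By the same estimate, a 12,000-character script would come to about twelve minutes of dubbed audio per language.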
The workflow

How it goes, step by step.

Step 1: Transcribe the source

Your original audio is turned into text through the transcription layer.

Step 2: Translate the script

The transcript is translated into your target language, ready for you to review.

Step 3: Re-voice it

The live TTS engines speak the translated script in the new language, in a voice you choose.

Step 4: Export and own

Download the dubbed audio as MP3 or WAV, yours to publish commercially.
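
For readers who think in code, the four steps above can be sketched as one chained function. This is a minimal illustration of the flow, not Cantari's implementation: `dub`, `DubResult`, and the four stage callables are hypothetical names, and the stand-in stages below only mimic the shape of each step so the sketch runs without any service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DubResult:
    transcript: str   # text recovered from the source audio
    translation: str  # reviewed script in the target language
    audio: bytes      # re-voiced speech, ready to export


def dub(
    source_audio: bytes,
    target_lang: str,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    review: Callable[[str], str],
    revoice: Callable[[str, str], bytes],
) -> DubResult:
    """Chain the steps in order; every stage is a hypothetical callable."""
    transcript = transcribe(source_audio)          # Step 1: transcribe
    draft = translate(transcript, target_lang)     # Step 2: translate
    approved = review(draft)                       # edit the script before voicing
    audio = revoice(approved, target_lang)         # Step 3: re-voice
    return DubResult(transcript, approved, audio)  # Step 4: caller exports the audio


# Stand-in stages (assumptions for illustration only).
result = dub(
    b"fake-audio",
    "es",
    transcribe=lambda audio: "The recipe takes ten minutes.",
    translate=lambda text, lang: f"[{lang}] {text}",
    review=lambda draft: draft.strip(),
    revoice=lambda text, lang: text.encode("utf-8"),
)
print(result.translation)
```

Passing the stages in as arguments keeps the review step explicit: nothing reaches `revoice` until the translated script has gone through `review`.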

Recommended engine

Start with Gemini Flash.

Gemini Flash supports the widest language range here and acts on cues, so a translated line can keep the tone of the original rather than reading flat.

Gemini Flash · Expressive - follows [cues]

The only engine here that acts on your bracketed [emotion] directions.

Quality Elo: 1225
Latency: 2770 ms (measured 2026-06-10)
Languages: 24
Rights: Commercial use; outputs are yours
Cue-following: Expressive
Hear a line for this use case

The same story, now in a language your audience already speaks.

Real Gemini Flash output, recorded unedited.

The honest answers.

What Cantari can and cannot do for localization & dubbing today, in plain language.

Can I dub audio end to end today?
Yes. Open Dubbing, transcribe a recording, translate the script into one of eight languages, then re-voice it. Every step runs live, and you can edit the text before moving on. English-source audio transcribes most reliably.
How does the pipeline work under the hood?
It is a composite: transcription, then translation, then Gemini Flash for the final voice, all inside one studio. Gemini is the multilingual engine here; the other engines are English-only today, so offering them for a non-English dub would not be honest.
Do I own the dubbed audio?
Yes. Every dub you generate is yours to export and publish commercially, with no watermark.
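
The engine rule from the "under the hood" answer can be written as a small filter. This is a sketch under stated assumptions: only Gemini Flash is named on this page, so the second engine is a placeholder, and `Engine`, `engines_for`, and the use of language codes such as "es" are hypothetical rather than Cantari's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    multilingual: bool


# Only Gemini Flash is named on this page; the second entry is a placeholder.
ENGINES = [
    Engine("Gemini Flash", multilingual=True),
    Engine("english-only-engine (placeholder)", multilingual=False),
]


def engines_for(target_lang: str, engines: list[Engine] = ENGINES) -> list[str]:
    """Return the engine names that can be offered for target_lang."""
    # Assumption: language codes start with "en" for English (e.g. "en", "en-GB").
    if target_lang.lower().startswith("en"):
        return [e.name for e in engines]
    # Non-English dubs are limited to multilingual engines.
    return [e.name for e in engines if e.multilingual]


print(engines_for("es"))
```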
Keep exploring

Try Cantari for localization & dubbing.

Free to start, no credit meter. Open the studio and hear it for yourself.