cantari
For edtech and curriculum teams

Dialog practice in two languages, from one studio.

Every lesson needs listening audio: dialogs, drills, comprehension passages, in the target language and the learner's own. Generate both sides on the multilingual engine, and regenerate the set when the curriculum changes.

No credit card · Real engines · The audio is yours

Generated voice
MP3 + WAV · yours to export
The moment

Your learners can read Spanish; what they cannot do is survive hearing it at full speed. Every unit needs dialog audio, two speakers with natural pacing, re-recorded whenever the curriculum changes. Hiring native speakers for every drill in a sixty-unit course is how edtech budgets die.

Why this is hard

What language learning actually needs.

We would rather name the friction plainly than pretend it away. Here is the problem this page is about.
The honest problem

Listening practice is the most expensive part of a language curriculum to produce: it needs native-sounding speech in the target language, plus instruction in the learner's language, for every drill in every unit. Studio sessions per language pair do not scale, so courses ship with too little audio and learners meet the spoken language for the first time in the wild. Generated speech on a multilingual engine makes listening drills as revisable as the worksheets around them.

How Cantari helps

Real features, mapped to the job.

Every item here works today, or says plainly where it is still in progress.

Two languages, one engine

Gemini Flash is the multilingual engine here, so the target-language line and the instruction line come from the same studio, in the same session.

Dialog with intent

Bracketed cues like [cheerfully] or [slowly, clearly] shape the delivery, so a drill can sound like a market stall rather than a dictation test.

Drills that revise with the curriculum

When unit four gets rewritten, regenerate unit four. The flat allowance means audio stops being the reason a curriculum update waits.

Translate existing lessons

The live dubbing pipeline carries recorded lesson audio into eight languages: transcribe, translate, re-voice, with the script editable at each step.

Worked example

Listening drill, unit 4: at the market (A2)

Script fragment · Gemini Flash
NARRATOR

Listen to the vendor's question, then answer out loud in the pause.

VENDOR

¿Cuánto quiere? ¿Medio kilo, o un kilo entero?

NARRATOR

She asked how much you want: half a kilo, or a whole kilo.

Line 2, real Gemini Flash output, unedited.

Gemini Flash voices both languages in one drill. It is the same multilingual engine behind the dubbing pipeline, so the Spanish is generated speech, not a recording you have to license.

The honest arithmetic · about 1,000 characters is a minute of speech
~1 min of listening drill from every 1,000 characters
2 speakers in a dialog, one multilingual engine
8 languages the dubbing pipeline translates into
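
As a rough sketch of that arithmetic, the Python below estimates drill length from character count. The 1,000-characters-per-minute figure is the rule of thumb above; the function name and the choice to count every character (spaces and punctuation included) are our assumptions for illustration, and real pacing will vary by language and voice.

```python
# Rule of thumb from this page: about 1,000 characters is a minute of speech.
CHARS_PER_MINUTE = 1000


def estimate_seconds(text: str, chars_per_minute: int = CHARS_PER_MINUTE) -> float:
    """Rough speech duration in seconds, counting every character in the text."""
    return len(text) / chars_per_minute * 60


# The three lines of the unit 4 market drill shown above.
drill = [
    "Listen to the vendor's question, then answer out loud in the pause.",
    "¿Cuánto quiere? ¿Medio kilo, o un kilo entero?",
    "She asked how much you want: half a kilo, or a whole kilo.",
]

total = sum(estimate_seconds(line) for line in drill)
print(f"Estimated drill length: {total:.1f} s")
```

Treat the result as a planning number for sizing a unit, not a promise about the generated file's length.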
The workflow

How it goes, step by step.

Step 1: Write the drill

Script the dialog with speaker labels: instruction lines in the learner's language, practice lines in the target language.

Step 2: Voice each speaker

Pick a Gemini Flash voice per speaker so the vendor and the narrator stay distinct across the unit.

Step 3: Generate and listen for pacing

Generate the lines, check the target-language pacing, and add cues like [slowly, clearly] where beginners need room.

Step 4: Export to your course

Export MP3 or WAV per line or per drill, yours to embed in the app or LMS.
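
One way to prepare a drill for steps 1 to 3 is to keep the script as plain text with speaker labels and parse it before generating. The sketch below is illustrative only: the `SPEAKER: text` layout, the voice names, and the helper functions are hypothetical and not part of Cantari's studio or any API. It also assumes a bracketed cue goes at the start of a line.

```python
from dataclasses import dataclass


@dataclass
class Line:
    speaker: str
    text: str


def parse_script(script: str) -> list[Line]:
    """Parse 'SPEAKER: text' lines; blank lines are skipped."""
    lines = []
    for raw in script.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        # Split on the first colon only, so colons inside the text survive.
        speaker, sep, text = raw.partition(":")
        if not sep:
            raise ValueError(f"missing speaker label: {raw!r}")
        lines.append(Line(speaker.strip(), text.strip()))
    return lines


def with_cue(line: Line, cue: str) -> Line:
    """Prefix a bracketed delivery cue (placement is an assumption)."""
    return Line(line.speaker, f"[{cue}] {line.text}")


def assign_voices(lines: list[Line], voices: dict[str, str]) -> list[tuple[str, str]]:
    """Pair each line with its speaker's voice; fail loudly on an unmapped speaker."""
    missing = {line.speaker for line in lines} - set(voices)
    if missing:
        raise KeyError(f"no voice chosen for: {sorted(missing)}")
    return [(voices[line.speaker], line.text) for line in lines]


SCRIPT = """
NARRATOR: Listen to the vendor's question, then answer out loud in the pause.
VENDOR: ¿Cuánto quiere? ¿Medio kilo, o un kilo entero?
NARRATOR: She asked how much you want: half a kilo, or a whole kilo.
"""

# Placeholder voice names, not real voice identifiers.
VOICES = {"NARRATOR": "voice-a", "VENDOR": "voice-b"}

lines = parse_script(SCRIPT)
lines[1] = with_cue(lines[1], "slowly, clearly")  # give beginners room
for voice, text in assign_voices(lines, VOICES):
    print(f"{voice}: {text}")
```

Keeping one fixed voice per speaker label in a single mapping is what stops the vendor and the narrator from drifting apart across a unit.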

Curriculum notes

Designing language-learning audio that learners can keep up with.

Repetition is the curriculum, so make it cheap

A learner needs the same structure voiced five ways: statement, question, faster, slower, with a distractor. On a per-character meter those variants are where a language learning budget quietly dies; on a flat allowance, generating all five becomes the default lesson design instead of a luxury.

Two voices keep the drill legible

Hold one voice for instructions in the learner's language and a different voice for the target language, and never swap them. Learners stop translating the frame and start listening for the content, because the voice itself tells them which language is coming next.

Mind the engine's language list

Gemini Flash, the multilingual engine here, speaks roughly two dozen languages, and the dubbing pipeline translates into eight. Before scripting a unit, check the language pair you teach against those lists. If your target language is missing, we would rather you learn that on this page than after a pilot.
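
That check is easy to script before a unit is written. The sketch below is a minimal illustration, not a Cantari tool: the function name is hypothetical, the eight dubbing targets are the ones named in the FAQ on this page, and the engine list is a placeholder holding only the two languages heard in the examples here, which you would replace with the engine's full published list.

```python
# The eight dubbing targets named in the FAQ on this page.
DUBBING_TARGETS = {
    "Spanish", "French", "German", "Italian",
    "Portuguese", "Japanese", "Hindi", "Arabic",
}

# Placeholder: only the two languages heard in this page's examples.
# Fill in the engine's full published list before relying on the result.
ENGINE_LANGUAGES = {"English", "Spanish"}


def check_language(target: str, engine_languages: set[str]) -> dict[str, bool]:
    """Report whether a target language appears in each list (case-insensitive)."""
    key = target.casefold()
    return {
        "engine": key in {name.casefold() for name in engine_languages},
        "dubbing": key in {name.casefold() for name in DUBBING_TARGETS},
    }


print(check_language("Spanish", ENGINE_LANGUAGES))
# A False here only means "not in the lists as typed above".
print(check_language("Korean", ENGINE_LANGUAGES))
```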

Recommended engine

Start with Gemini Flash.

Gemini Flash is the only multilingual engine here and the only one that acts on cues, which is exactly the pairing a language drill needs: real target-language speech, delivered with intent.

Gemini Flash · Expressive, follows [cues]

The only engine here that acts on your bracketed [emotion] directions.

Quality Elo: 1225
Latency: 2770 ms (measured 2026-06-10)
Languages: 24
Rights: Commercial use; outputs are yours
Cue-following: Expressive
Hear a line for this use case

Escucha otra vez. ¿Cuánto cuesta el kilo de tomates?

Real Gemini Flash output, recorded unedited.

The honest answers.

What Cantari can and cannot do for language learning today, in plain language.

Which languages can I generate?

Gemini Flash speaks roughly two dozen languages, and it is the engine we route multilingual work to. The dubbing pipeline's translation step currently covers eight: Spanish, French, German, Italian, Portuguese, Japanese, Hindi, and Arabic.

Can I slow the audio down for beginners?

Not as a dial on the multilingual engine today; real speed control exists on MAI Voice 2, which is English-only. The honest technique for target-language drills is writing shorter sentences and adding a cue like [slowly, clearly], which Gemini Flash acts on.

Do I own the drill audio?

Yes. Every generation is yours to export and ship inside a paid course or app, with no watermark and no attribution required.

Keep exploring

Try Cantari for language learning.

Free to start, no credit meter. Open the studio and hear it for yourself.