cantari
For game developers

Placeholder lines and barks, generated as you build.

Waiting on final VO blocks iteration. Generate temp dialogue and ambient barks now so you can hear pacing in the build, then swap in final audio when you have it. Cue-directed delivery gives characters a read with intent.

No credit card · Real engines · The audio is yours

Worked example

Bark set: dockside guards, night shift

Script fragment · Gemini Flash
GUARD 1

[bored] Third watch and nothing but rats. Again.

GUARD 2

[alarmed] Wait. That crate just moved. That crate just moved!

GUARD 1

[angry] Sound the bell! Sound the bell, you idiot!

Line 2, real Gemini Flash output, unedited.

Gemini Flash: the same temp voice goes bored, alarmed, and furious on cue, which is what a bark set needs before final VO exists. One voice per character keeps the build coherent.

The honest arithmetic · about 1,000 characters is a minute of speech
~60 characters in a typical bark
2 passes cover an 800-bark sheet at that length
5 live engines, each with its own fixed voices
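The figures above reduce to simple multiplication. Below is a minimal sketch in Python. It assumes this page's rules of thumb: about 1,000 characters per minute of speech and about 60 characters per bark. The helper name `sheet_speech_minutes` is hypothetical and is not part of any Cantari API.

```python
# Rule-of-thumb figures quoted on this page; treat them as rough assumptions.
CHARS_PER_MINUTE = 1_000  # about 1,000 characters is a minute of speech
CHARS_PER_BARK = 60       # a typical bark is about 60 characters


def sheet_speech_minutes(bark_count: int, chars_per_bark: int = CHARS_PER_BARK) -> float:
    """Estimate minutes of generated speech for a bark sheet (hypothetical helper)."""
    total_chars = bark_count * chars_per_bark
    return total_chars / CHARS_PER_MINUTE


if __name__ == "__main__":
    barks = 800
    print(f"{barks} barks ~ {barks * CHARS_PER_BARK:,} characters "
          f"~ {sheet_speech_minutes(barks):.0f} minutes of speech")
```

At these figures, an 800-bark sheet comes to roughly 48,000 characters, or about 48 minutes of audio.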
Why this is hard

What game developers actually need.

We would rather name the friction plainly than pretend it away. Here is the problem this page is about.
The honest problem

During development, dialogue is in flux and final voice acting comes late. Designers need to hear lines in the build to test pacing, timing, and triggers, but commissioning VO for lines that will still change is wasteful. Generated placeholder dialogue lets you iterate on a fully voiced build now, with cue-directed delivery so even temp lines carry character, and swap in final audio when the script settles.

Sound familiar?

There are eight hundred barks in the spreadsheet and the playtest is Friday. The guards need to sound alarmed, the merchant needs to sound bored, and none of it is final enough to put in front of a voice actor yet. Right now every line in the build is your own voice, recorded at your desk, and the testers can tell.

How Cantari helps

Real features, mapped to the job.

Every item here works today, or says plainly where it is still in progress.

Cue-directed character lines

Bracketed [angry] or [nervous] cues let a temp line carry character, so you can test how a bark reads in context. Gemini Flash performs the direction.

Iterate before final VO

Generate placeholder dialogue in seconds and hear it in the build, so pacing and triggers get tested long before final audio is commissioned.

Many reads, fast

Generate variations of a bark quickly to find the line that fits, then export the keepers.

Own what you ship

Export MP3 or WAV with commercial rights and no watermark, whether the line is temp or final.

The workflow

How it goes, step by step.

Step 1: Write the lines

Drop character dialogue and barks into Text to Speech, with bracketed cues for delivery.

Step 2: Generate temp audio

Pick a voice per character and generate placeholder lines to drop into the build.

Step 3: Test in context

Hear the lines in your build, iterate on the script, and regenerate what changes.

Step 4: Export the keepers

Export MP3 or WAV for any line you keep. Commercial rights, no watermark.
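Steps 1 and 2 assume each line carries a leading bracketed cue, with one voice per character. If your bark sheet lives in a text file, you can organize it before pasting lines into Text to Speech. Below is a minimal sketch that assumes a hypothetical `CHARACTER: [cue] line` format, one bark per line. It splits each line into character, cue, and text, then groups barks by character. The format and the names `Bark` and `parse_sheet` are illustrative assumptions. The cue stays inline in the text you generate, as in Step 1.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical sheet format: "CHARACTER: [cue] spoken text", one bark per line.
LINE_RE = re.compile(r"^(?P<character>[^:]+):\s*(?:\[(?P<cue>[^\]]+)\]\s*)?(?P<text>.+)$")


@dataclass
class Bark:
    character: str
    cue: Optional[str]  # e.g. "angry"; None when the line has no leading cue
    text: str           # spoken text with the cue removed

    def tts_input(self) -> str:
        """Rebuild the line with its cue inline, ready to paste into Text to Speech."""
        return f"[{self.cue}] {self.text}" if self.cue else self.text


def parse_sheet(sheet: str) -> Dict[str, List[Bark]]:
    """Group barks by character (in first-seen order) so each gets one voice."""
    by_character: Dict[str, List[Bark]] = defaultdict(list)
    for raw in sheet.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank or malformed lines
        bark = Bark(match["character"].strip(), match["cue"], match["text"].strip())
        by_character[bark.character].append(bark)
    return dict(by_character)


if __name__ == "__main__":
    sheet = """
    GUARD 1: [bored] Third watch and nothing but rats. Again.
    GUARD 2: [alarmed] Wait. That crate just moved. That crate just moved!
    GUARD 1: [angry] Sound the bell! Sound the bell, you idiot!
    """
    for character, barks in parse_sheet(sheet).items():
        print(character)
        for bark in barks:
            print("  ", bark.tts_input())
```

Grouping by character keeps the "one voice per character" rule easy to check before you generate anything.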

Production notes

AI voice acting for games, from placeholder to ship.

Barks are a numbers problem

A bark sheet is hundreds of tiny lines, and the cost that matters is the re-render: combat tuning changes a trigger, and twelve guards need new lines by Friday. At about 60 characters a bark, regenerating a whole faction is minutes of studio time on a flat allowance, so iterating on dialogue stops being a budget conversation.
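The re-render stays small when you regenerate only what changed. Below is a minimal sketch that assumes each bark has a stable ID in your spreadsheet; this is a hypothetical convention, not something Cantari requires. It diffs two versions of the sheet and totals the characters that need new audio. Each count uses the full line text, cue included, as a rough upper bound.

```python
from typing import Dict, List, Tuple

CHARS_PER_MINUTE = 1_000  # rule of thumb from this page


def lines_to_regenerate(old: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], int]:
    """Return IDs of barks that are new or changed, plus their total characters.

    `old` and `new` map a hypothetical bark ID to its full line text, cue included,
    so changing only the cue also counts as a change. Barks removed from the new
    sheet are ignored, since they need no audio.
    """
    changed = [bark_id for bark_id, text in new.items() if old.get(bark_id) != text]
    total_chars = sum(len(new[bark_id]) for bark_id in changed)
    return changed, total_chars


if __name__ == "__main__":
    old_sheet = {
        "guard_01": "[bored] Third watch and nothing but rats. Again.",
        "guard_02": "[alarmed] Wait. That crate just moved!",
    }
    new_sheet = {
        "guard_01": "[bored] Third watch and nothing but rats. Again.",  # unchanged
        "guard_02": "[angry] Sound the bell!",                          # rewritten
        "guard_03": "[nervous] Keep your light on the door.",           # new
    }
    ids, chars = lines_to_regenerate(old_sheet, new_sheet)
    print(ids, chars, f"~{chars / CHARS_PER_MINUTE:.2f} min of speech")
```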

WAV out, any game engine in

Exports are plain MP3 or WAV files, no wrapper, no SDK, no per-seat license. They drop into Unity, Unreal, Godot, or a homegrown game engine the same way: as audio assets in the pipeline you already have. There is no integration to maintain and nothing that breaks when your engine updates.

Placeholder today, decision later

Cue-directed temp lines are good enough to playtest a game's pacing and triggers, which is their actual job. When the script locks, commission actors for the keepers or ship the generated lines: commercial rights are identical either way, and the build never waited on the casting call.

Recommended engine

Start with Gemini Flash.

Gemini Flash acts bracketed [emotion] cues, so even a placeholder bark can read angry, nervous, or triumphant and tell you whether the line lands in context.

Gemini Flash · Expressive, follows [cues]

The only engine here that acts your bracketed [emotion] directions.

Quality Elo: 1225
Latency: 2770 ms (measured 2026-06-10)
Languages: 24
Rights: Commercial use; outputs are yours
Cue-following: Expressive
Hear a line for this use case

[nervous] I do not think we are alone down here. Keep your light on the door.

Real Gemini Flash output, recorded unedited.

The honest answers.

What Cantari can and cannot do for game dev today, in plain language.

Is this meant to replace voice actors?
No. The honest pitch is placeholder and temp dialogue so you can iterate on a voiced build before final VO is commissioned. Many teams swap in actor audio for the final ship; generated lines keep development moving in the meantime.
Can I clone a specific character voice?
Coming soon. When cloning ships, you will be able to clone a voice from a short clip (about 10 to 20 seconds is the sweet spot) with the speaker's permission, and it will join your picker next to the roster voices. English comes first, and consent will be required for every clone.
Do I own the lines I generate?
Yes. Every generation is yours to export and ship commercially, with no watermark, whether the line is temp or final.

Try Cantari for game dev.

Free to start, no credit meter. Open the studio and hear it for yourself.