
What is a TTS cue? Bracketed emotion directions, explained

A cue is a stage direction in square brackets, like [whispering], that tells a voice engine how to deliver the next line. This entry covers what cues are, which engines act on them, and which ignore them.

Updated June 11, 2026

The definition

A cue is a performance direction written in square brackets inside a TTS script. It is never spoken aloud; instead, a cue-following engine reads it as direction for the words after it, the way an actor reads a stage direction in a play.

[whispering] The map was wrong. The door was never locked.

[pause] [excited] It opens from the inside!
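To make the syntax concrete, here is a minimal sketch of how a script line could be split into its cues and its spoken text. It assumes a cue is any run of characters between one pair of square brackets with no nesting; the function name split_cues and the regular expression are illustrative, not part of any engine's API.

```python
import re

# Assumption: a cue is any non-empty text inside one pair of square
# brackets, with no nested brackets.
CUE_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_cues(line: str) -> tuple[list[str], str]:
    """Return (cues, spoken_text) for a single script line.

    Hypothetical helper: cues are collected in order of appearance,
    and the spoken text is the line with every cue removed.
    """
    cues = [cue.strip() for cue in CUE_PATTERN.findall(line)]
    spoken = CUE_PATTERN.sub("", line)
    # Collapse the gaps left behind where cues were removed.
    spoken = re.sub(r"\s+", " ", spoken).strip()
    return cues, spoken


cues, spoken = split_cues("[pause] [excited] It opens from the inside!")
print(cues)    # ['pause', 'excited']
print(spoken)  # It opens from the inside!
```

The cues come back as a list because a line can carry more than one direction, as the second example line does.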

Why cues exist

Punctuation can only carry so much. A question mark changes pitch and a comma adds a beat, but nothing in plain text says read this line like you are out of breath. Cues give writers a way to direct delivery, line by line, without leaving the script. They are the difference between text that is read and text that is performed.

Anything descriptive can be a cue: an emotion like [nervously], a volume like [quietly], a pace like [pause], or something specific like [out of breath]. Cue-following engines interpret them the way an actor would interpret a margin note.

The honest catch: most engines ignore them

Cue-following is a property of the engine, not of the script. Write [sobbing] for an engine that does not support direction and one of two things happens: a good pipeline strips the bracket so the cue is silently skipped, or a bad one reads the word "sobbing" aloud. Neither performs it.

On Cantari's roster, Gemini Flash is the engine that acts cues, and it is the default for exactly that reason. The other four (Kokoro, Grok Voice, MAI Voice 2, Zonos) give a plain read, and the studio badges every engine with Acts [cues] or Plain read so the behavior is never a surprise.
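The "good pipeline" behavior can be sketched in a few lines: pass the script through untouched for an engine that acts cues, and strip the brackets for one that gives a plain read. This is only an illustration under stated assumptions. The capability table mirrors the roster described on this page, but the dictionary, the function name prepare_script, and the choice to treat unknown engines as plain-read are hypothetical and do not describe Cantari's actual implementation.

```python
import re

# Assumption: a cue is any non-empty text inside one pair of square brackets.
CUE = re.compile(r"\[[^\[\]]+\]")

# Behavior as described on this page; the lookup table itself is illustrative.
ACTS_CUES = {
    "Gemini Flash": True,
    "Kokoro": False,
    "Grok Voice": False,
    "MAI Voice 2": False,
    "Zonos": False,
}


def prepare_script(text: str, engine: str) -> str:
    """Return the text to send to the given engine.

    Cue-following engines get the script unchanged. For every other
    engine (including unknown ones, a deliberately cautious assumption)
    the cues are removed so they are never read aloud.
    """
    if ACTS_CUES.get(engine, False):
        return text
    cleaned_lines = []
    for line in text.splitlines():
        without_cues = CUE.sub("", line)
        # Tidy the whitespace left where a cue used to be.
        cleaned_lines.append(re.sub(r"\s+", " ", without_cues).strip())
    return "\n".join(cleaned_lines)


print(prepare_script("[sobbing] I tried.", "Kokoro"))        # I tried.
print(prepare_script("[sobbing] I tried.", "Gemini Flash"))  # [sobbing] I tried.
```

Defaulting unknown engines to stripping is the safer failure mode: a skipped cue loses some performance, while a cue read aloud breaks the line.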

In the studio, the Enhance button can insert cues for you: a directing pass adds [emotion] markers where they help, with one-click undo. See the studio guide.

Writing better cues

Place the cue immediately before the line it directs, keep it short, and change it when the mood changes; a single [calm] at the top of a page will not carry a whole scene. The cue section of the Text to Speech guide shows the studio's full cue palette and worked examples.
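Two of these guidelines, keeping cues short and placing them before the text they direct, are mechanical enough to check automatically. The sketch below is a hypothetical linter, not a studio feature: the name lint_cues, the three-word limit, and the warning wording are all assumptions chosen for illustration.

```python
import re

# Assumption: a cue is any non-empty text inside one pair of square brackets.
CUE = re.compile(r"\[([^\[\]]+)\]")

# Assumed threshold for "short"; [out of breath] still fits within it.
MAX_CUE_WORDS = 3


def lint_cues(script: str) -> list[str]:
    """Return warnings for cues that are long or have nothing to direct."""
    warnings = []
    for number, line in enumerate(script.splitlines(), start=1):
        for match in CUE.finditer(line):
            cue = match.group(1).strip()
            if len(cue.split()) > MAX_CUE_WORDS:
                warnings.append(f"line {number}: cue [{cue}] is long")
            # Spoken text that follows this cue on the same line,
            # ignoring any further cues.
            rest = CUE.sub("", line[match.end():]).strip()
            if not rest:
                warnings.append(f"line {number}: cue [{cue}] has no text after it")
    return warnings


for warning in lint_cues("Hello there. [calm]\n[speaking very slowly and sadly] Goodbye."):
    print(warning)
```

A check like this cannot judge whether a cue still fits the mood of a scene; that part of the advice remains a writer's call.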