A voice that can go softly, and the sound beneath it.
Guided sessions live or die on delivery. Direct the read with [softly] and [whispering] cues, then generate the ambience bed in Sound & Music, so a session is two generated layers and one export away.
No credit card · Real engines · The audio is yours

Your members press play at eleven p.m., in the dark, and the session has to hold them from the first breath. A rushed read ruins it, and a hired narrator costs more per session than the app earns per member. You need a voice that can go softly on cue, and a bed of sound underneath it.
What Meditation & Wellness actually needs.
Wellness audio is the hardest read to fake: pace, breath, and softness are the whole product. Most generated voice reads too fast and too brightly for a body scan or a sleep story, and recording a human narrator for a growing session library is a per-minute cost that compounds. The need is delivery you can direct, softly, slowly, down to the line, plus the ambience underneath, without booking two studios.
Real features, mapped to the job.
Every item here works today, or says plainly where it is still in progress.
Delivery you can direct
Bracketed [softly], [whispering], or [slowly] cues are acted by Gemini Flash, so a wind-down actually winds down instead of reading at podcast pace.
Ambience beds, generated here
Sound & Music is live today: describe rain on a roof, a low drone, a slow piano, and Lyria 3 composes an instrumental bed to sit under the voice.
A library that grows weekly
New sessions are a script away. The flat allowance means a fifty-session library does not carry fifty narrator invoices.
Own every session
Export voice and bed as MP3 or WAV with commercial rights and no watermark, yours to publish in your app.
Wind-down: the last four lines of a sleep session
softly Let the day set itself down. You do not need to hold it anymore.
softly Feel the weight of the blanket. The room is doing the resting for you.
whispering Tomorrow is already taken care of. Nothing is asked of you tonight.
whispering Just this breath. And the next one. That is all.
Gemini Flash, voice Aoede, stepping down from [softly] to [whispering] across the four lines. Generate a rain or drone bed in Sound & Music and lay it underneath in your editor.
- ~10 min
- of speech from a 10,000-character script, before the pauses you add
- 2
- generated layers: the voice and the music bed
How it goes, step by step.
Step 1: Script the session with cues
Write the passage and mark the delivery: [softly] for the body of the session, [whispering] for the final minutes.
Step 2: Generate the voice
Render on Gemini Flash and listen for pace. Re-cue and regenerate any line that pushes too hard.
Step 3: Generate the bed in Sound & Music
Describe the ambience in plain language and Lyria 3 composes an instrumental clip to sit underneath.
Step 4: Mix and export
Lay the bed under the voice in your editor, then ship both files. They are yours, commercially, no watermark.
Directing a meditation read, line by line.
Wellness scripts fail in production for craft reasons, not technical ones. Three notes from building meditation sessions with the studio's own tools.
Pace is written, not dialed
There is no slow-down dial on the expressive engine, so a meditation script controls pace the honest way: short sentences, one image per line, and cues like [slowly] or [whispering] where the read should sink. Sentence breaks are your rests. Write them where you want the listener to breathe, and the engine follows.
Silence is part of the session
Generated speech comes back without the long pauses a body scan needs, and that is fine: render the guidance in sections, then open the gaps in your editor, ten seconds here, thirty there. The silence stays exactly as long as you decide, and re-rendering one line never moves the bed underneath it.
Audition at a whisper before you commit
Gemini Flash acts [softly] and [whispering], which is the trait guided meditation work leans on; for unguided tracks, skip the voice and let Sound & Music carry the session alone. Not every voice in the roster suits a sleep story, and softness exposes brightness fast, so audition the quietest line in your script first, not the loudest.
Start with Gemini Flash.
Gemini Flash is the only engine here that acts [softly] and [whispering] cues, and softness on cue is the entire job in this category.
The only engine here that acts your bracketed [emotion] directions.
- Quality Elo
- 1225
- Latency
- 2770 ms (measured 2026-06-10)
- Languages
- 24
- Rights
- Commercial use; outputs are yours
“[softly] Settle in. There is nowhere else you need to be for the next ten minutes.”
The honest answers.
What Cantari can and cannot do for meditation & wellness today, in plain language.
Can the voice really whisper?
Is the background music generated too?
Do I own the sessions commercially?
Try Cantari for meditation & wellness.
Free to start, no credit meter. Open the studio and hear it for yourself.