For wellness apps and coaches

A voice that can go softly, and the sound beneath it.

Guided sessions live or die on delivery. Direct the read with [softly] and [whispering] cues, then generate the ambience bed in Sound & Music, so a session is two generated layers and one export away.

Start free · Open the studio →

No credit card · Real engines · The audio is yours

Painted meditation corner with a floor cushion, a lit candle, and a singing bowl at dusk

Generated voice

MP3 + WAV · yours to export

The moment

Your members press play at eleven p.m., in the dark, and the session has to hold them from the first breath. A rushed read ruins it, and a hired narrator costs more per session than the app earns per member. You need a voice that can go softly on cue, and a bed of sound underneath it.

Why this is hard

What meditation and wellness audio actually needs.

We would rather name the friction plainly than pretend it away. Here is the problem this page is about.

The honest problem

Wellness audio is the hardest read to fake: pace, breath, and softness are the whole product. Most generated voice reads too fast and too brightly for a body scan or a sleep story, and recording a human narrator for a growing session library is a per-minute cost that compounds. The need is delivery you can direct (softly, slowly, down to the line), plus the ambience underneath, without booking two studios.

How Cantari helps

Real features, mapped to the job.

Every item here works today, or says plainly where it is still in progress.

Delivery you can direct

Bracketed [softly], [whispering], or [slowly] cues are acted by Gemini Flash, so a wind-down actually winds down instead of reading at podcast pace.

Ambience beds, generated here

Sound & Music is live today: describe rain on a roof, a low drone, a slow piano, and Lyria 3 composes an instrumental bed to sit under the voice.

A library that grows weekly

New sessions are a script away. The flat allowance means a fifty-session library does not carry fifty narrator invoices.

Own every session

Export voice and bed as MP3 or WAV with commercial rights and no watermark, yours to publish in your app.

Worked example

Wind-down: the last four lines of a sleep session

Script fragment · Gemini Flash

[softly] Let the day set itself down. You do not need to hold it anymore.

[softly] Feel the weight of the blanket. The room is doing the resting for you.

[whispering] Tomorrow is already taken care of. Nothing is asked of you tonight.

[whispering] Just this breath. And the next one. That is all.

Line 1, real Gemini Flash output, unedited.

Gemini Flash, voice Aoede, stepping down from [softly] to [whispering] across the four lines. Generate a rain or drone bed in Sound & Music and lay it underneath in your editor.

The honest arithmetic · about 1,000 characters is a minute of speech

~10 min of speech from a 10,000-character script, before the pauses you add
2 generated layers: the voice and the music bed
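
The rule of thumb above is easy to turn into a quick planning check. This is a minimal sketch, not part of Cantari: the function name `estimate_minutes` and the constant are hypothetical, and it assumes every character in the script counts toward the total, bracketed cues included. Treat the result as a rough estimate, since real pace varies by voice and cue.

```python
# Rule of thumb from this page: about 1,000 characters is a minute of speech.
# This is an approximation, not a measured constant.
CHARS_PER_MINUTE = 1_000


def estimate_minutes(script: str, pause_seconds: float = 0.0) -> float:
    """Estimate session length in minutes.

    Speech time comes from the character count; pause_seconds is the total
    silence you plan to add in your editor afterwards.
    """
    speech_minutes = len(script) / CHARS_PER_MINUTE
    return speech_minutes + pause_seconds / 60


script = "x" * 10_000                     # stand-in for a 10,000-character script
print(estimate_minutes(script))           # 10.0
print(estimate_minutes(script, 120))      # 12.0 once two minutes of pauses are added
```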

The workflow

How it goes, step by step.

Step 1: Script the session with cues

Write the passage and mark the delivery: [softly] for the body of the session, [whispering] for the final minutes.

Step 2: Generate the voice

Render on Gemini Flash and listen for pace. Re-cue and regenerate any line that pushes too hard.

Step 3: Generate the bed in Sound & Music

Describe the ambience in plain language and Lyria 3 composes an instrumental clip to sit underneath.

Step 4: Mix and export

Lay the bed under the voice in your editor, then ship both files. They are yours, commercially, no watermark.
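
Step 1 can be done by hand, but for a longer script a few lines of code can apply the cues consistently. The sketch below is an illustration under assumptions: `cue_script` and `whisper_tail` are hypothetical names, and it measures the "final minutes" in lines rather than in time. It only prepares the text; generating the voice still happens in the studio.

```python
def cue_script(lines: list[str], whisper_tail: int = 2) -> list[str]:
    """Prefix each line with [softly], then [whispering] for the last lines.

    whisper_tail is how many closing lines get the [whispering] cue.
    """
    switch_at = max(len(lines) - whisper_tail, 0)
    return [
        f"[{'softly' if i < switch_at else 'whispering'}] {line.strip()}"
        for i, line in enumerate(lines)
    ]


wind_down = [
    "Let the day set itself down. You do not need to hold it anymore.",
    "Feel the weight of the blanket. The room is doing the resting for you.",
    "Tomorrow is already taken care of. Nothing is asked of you tonight.",
    "Just this breath. And the next one. That is all.",
]

for cued_line in cue_script(wind_down):
    print(cued_line)
```

With the default `whisper_tail=2`, the four wind-down lines come out with two [softly] cues followed by two [whispering] cues, the same step-down as the worked example.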

Craft notes

Directing a meditation read, line by line.

Wellness scripts fail in production for craft reasons, not technical ones. Three notes from building meditation sessions with the studio's own tools.

Pace is written, not dialed

There is no slow-down dial on the expressive engine, so a meditation script controls pace the honest way: short sentences, one image per line, and cues like [slowly] or [whispering] where the read should sink. Sentence breaks are your rests. Write them where you want the listener to breathe, and the engine follows.

Silence is part of the session

Generated speech comes back without the long pauses a body scan needs, and that is fine: render the guidance in sections, then open the gaps in your editor, ten seconds here, thirty there. The silence stays exactly as long as you decide, and re-rendering one line never moves the bed underneath it.
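
If you would rather script the gaps than drag them open in an editor, the same idea can be sketched with Python's standard `wave` module. This is an assumption-laden example, not a Cantari feature: `silence_bytes` and `join_with_gaps` are hypothetical helpers, and they assume each section was exported as an uncompressed PCM WAV with the same sample rate, channel count, and sample width. The code joins the voice sections only; the bed is still mixed underneath separately.

```python
import wave


def silence_bytes(seconds, framerate, nchannels, sampwidth):
    """Return PCM silence of the given length.

    8-bit WAV samples are unsigned, so silence is the midpoint 0x80;
    wider samples are signed, so silence is zero.
    """
    fill = b"\x80" if sampwidth == 1 else b"\x00"
    return fill * (int(seconds * framerate) * nchannels * sampwidth)


def join_with_gaps(section_paths, gap_seconds, out_path):
    """Concatenate WAV sections, with gap_seconds[i] of silence after section i."""
    if not section_paths or len(gap_seconds) != len(section_paths) - 1:
        raise ValueError("need one gap between each pair of sections")

    # Read the format from the first section; every other section must match it.
    with wave.open(str(section_paths[0]), "rb") as first:
        fmt = (first.getnchannels(), first.getsampwidth(), first.getframerate())

    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        for i, path in enumerate(section_paths):
            with wave.open(str(path), "rb") as src:
                src_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if src_fmt != fmt:
                    raise ValueError(f"{path} does not match the first section's format")
                out.writeframes(src.readframes(src.getnframes()))
            if i < len(gap_seconds):
                out.writeframes(silence_bytes(gap_seconds[i], fmt[2], fmt[0], fmt[1]))
```

A call such as `join_with_gaps(["intro.wav", "scan.wav", "close.wav"], [10, 30], "session_voice.wav")` (file names are placeholders) would put ten seconds of silence after the first section and thirty after the second. Because each gap is an explicit number, re-rendering one section changes only that section's length.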

Audition at a whisper before you commit

Gemini Flash acts [softly] and [whispering], which is the trait guided meditation work leans on; for unguided tracks, skip the voice and let Sound & Music carry the session alone. Not every voice in the roster suits a sleep story, and softness exposes brightness fast, so audition the quietest line in your script first, not the loudest.

Recommended engine

Start with Gemini Flash.

Gemini Flash is the only engine here that acts [softly] and [whispering] cues, and softness on cue is the entire job in this category.

Gemini Flash · Expressive - follows [cues]

The only engine here that acts your bracketed [emotion] directions.

Quality Elo: 1225
Latency: 2770 ms (measured 2026-06-10)
Languages: 24
Rights: Commercial use; outputs are yours

Cue-following: Expressive

Hear a line for this use case

“[softly] Settle in. There is nowhere else you need to be for the next ten minutes.”

Real Gemini Flash output, recorded unedited.

Tools behind it: Text to Speech · Sound & Music · Audiobook Studio

The honest answers.

What Cantari can and cannot do for meditation & wellness today, in plain language.

Can the voice really whisper?

Gemini Flash acts bracketed cues, including [softly] and [whispering], and the sample on this page is its real output, unedited, so you can judge it yourself. The plain-read engines will ignore those cues, and the picker says so up front.

Is the background music generated too?

Yes. Sound & Music is live in the studio, composed by Lyria 3, a music model in preview. It makes instrumental beds and atmospheres; it does not make voices, and one-shot effects like a single bell strike are not here yet.

Do I own the sessions commercially?

Yes. Both the voice and the music bed are yours to export and publish in a paid app or membership, with no watermark and no attribution required.

Keep exploring

Try Cantari for meditation & wellness.

Free to start, no credit meter. Open the studio and hear it for yourself.

Start free · Open the studio