For hosts and producers

Intros, ad reads, and pickups in your show's voice.

Not every line needs the studio. Generate the host intro, the mid-roll ad read, or a one-word pickup you forgot to record, in a consistent voice that matches the rest of the episode.

Start free · Open the studio →

No credit card · Real engines · The audio is yours

The moment

The interview is cut and it is good. What is missing is ninety seconds of connective tissue: a fresh cold open, a mid-roll for the new sponsor, and a one-line correction for a date your guest got wrong. None of it is worth setting the studio back up for.

Why this is hard

What a podcast actually needs.

We would rather name the friction plainly than pretend it away. Here is the problem this page is about.

The honest problem

Most of a podcast is a real conversation, and it should stay that way. But the connective tissue (the intro, the sponsor read, the correction you need to drop in after the fact) often means booking the booth again for thirty seconds of audio. Generating those pieces keeps the show moving without a second session, in a voice that stays consistent from episode to episode.

How Cantari helps

Real features, mapped to the job.

Every item here works today, or says plainly where it is still in progress.

Consistent show voice

Lock one voice for your intros and ad reads so every episode opens the same way, without re-matching a recording setup.

Acted ad reads

Bracketed [warmly] or [upbeat] cues let a sponsor read carry the right tone. Gemini Flash performs them.

Show notes from the episode itself

Speech to Text is live: upload the finished episode and work from the transcript, pulling show notes, quotes, and chapter markers instead of re-listening with a notepad.

Own every clip

Export MP3 or WAV with commercial rights and no watermark, ready to splice into the master.

Worked example

Cold open and mid-roll, episode 41

Script fragment · Gemini Flash

[intrigued] My guest today spent nine years inside the agency she is about to take apart.

[plain] That conversation, right after this.

AD READ

[warmly] This episode is supported by Fieldnotes, the journaling app my producer will not stop quoting at me.

Line 3, real Gemini Flash output, unedited.

Gemini Flash, voice Kore: the [warmly] cue is the difference between a sponsor read and a sponsor apology. The cold open takes [intrigued] so the hook leans in.

The honest arithmetic · about 1,000 characters is a minute of speech

~500 characters in a 30-second cold open
~60 seconds of mid-roll from a 1,000-character read
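
The same arithmetic as a rough sketch, useful for checking a script against a slot length before you generate. It assumes only the rule of thumb above (about 1,000 characters per minute); real pacing varies with the voice, the cues, and the punctuation, and the function names are our own illustration. It counts every character, bracketed cues included, so treat the result as an estimate.

```python
# Rule of thumb from this page: about 1,000 characters is a minute of speech.
CHARS_PER_MINUTE = 1000


def estimated_seconds(script: str, chars_per_minute: int = CHARS_PER_MINUTE) -> float:
    """Rough spoken length of a script, in seconds."""
    return len(script) / chars_per_minute * 60


def character_budget(seconds: float, chars_per_minute: int = CHARS_PER_MINUTE) -> int:
    """Roughly how many characters fit in a slot of the given length."""
    return round(seconds * chars_per_minute / 60)


cold_open_budget = character_budget(30)  # characters for a 30-second cold open
mid_roll_seconds = estimated_seconds("x" * 1000)  # a 1,000-character read
```

If a sponsor buys a 60-second slot, `character_budget(60)` gives the target length to write to; trim or pad the copy before generating rather than after.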

The workflow

How it goes, step by step.

Step 1: Write the segment

Drop your intro, ad read, or pickup line into Text to Speech.

Step 2: Match your show voice

Pick the voice you use across the show and add cues for the ad reads.

Step 3: Generate and splice

Generate the segment, export it, and drop it into the episode where it belongs.

Platform notes

What podcast platforms say about generated voice.

No major podcast directory bans synthetic narration. What they police is deception, and the difference matters once your podcast carries a generated segment.

Spotify's line is impersonation, not AI

Spotify's platform rules target deceptive content: posing as another person, or manufactured media passed off as real. A podcast using generated voice for its own intros and ad reads breaks no rule there; cloning someone else's voice without permission does. That is the same line our consent-gated cloning draws, which is not a coincidence.

Source: Spotify Platform Rules

Generate the frame, record the conversation

The honest split for a podcast workflow: humans carry the interview, and generated voice carries the connective tissue around it (the cold open, the mid-roll, the correction you remember at midnight). Listeners came for the conversation. Nothing about the frame around it needs a booth.

Mine the episode after it ships

Run the finished episode through Speech to Text and the transcript becomes the podcast's paper trail: show notes, pull quotes for promotion, a searchable record of what was actually said. One recording, three more assets, no extra session.

Spotify's rules were checked live on June 11, 2026 at the link above. Other directories publish their own terms; we link only pages we verified ourselves.

Recommended engine

Start with Gemini Flash.

Gemini Flash acts the [warmly] or [upbeat] cues an ad read needs, so a sponsor segment sounds intentional rather than flat.

Gemini Flash · Expressive, follows [cues]

The only engine here that acts your bracketed [emotion] directions.

Quality Elo: 1225
Latency: 2770 ms (measured 2026-06-10)
Languages: 24
Rights: Commercial use; outputs are yours

Cue-following: Expressive

Hear a line for this use case

“[warmly] This episode is brought to you by the people who actually listen to the end.”

Real Gemini Flash output, recorded unedited.

Tools behind it: Text to Speech · Speech to Text · Sound & Music

The honest answers.

What Cantari can and cannot do for podcasts today, in plain language.

Should I generate the whole episode?

We would not pitch that. The strength here is the connective tissue: intros, ad reads, and pickups. The conversation itself is yours to record. Generated voice fills the gaps around it, and keeps them sounding the same from episode to episode.

Can I get a transcript of my episode?

Yes. Speech to Text went live in June 2026: upload the audio or record straight into the studio, and a Whisper-class model returns a transcript you can edit, copy, or save to your library. It is the same transcription layer the dubbing pipeline runs on.

Do I own the generated segments?

Yes. Every generation is yours to export and use commercially, with no watermark.

Keep exploring

Try Cantari for podcasts.

Free to start, no credit meter. Open the studio and hear it for yourself.

Start free · Open the studio