cantari
Convert

Text to WAV

Straight from generation to uncompressed file, with nothing lossy in between.

Three steps

How does Text to WAV work?

Step 1: Write or paste the script

Scripts up to 30,000 characters. Bracketed cues like [softly] are stage directions, and Gemini Flash performs them.

Step 2: Pick an engine and voice

Five engines, each listing its real voices, described by character rather than hype. Audition before you commit.

Step 3: Generate, play, download

Press generate, hear the take in the browser, and download the WAV file. It is yours from that moment.

The output

Why WAV as the output?

WAV out of a speech engine means the take reaches you uncompressed: PCM straight from generation to file, with no lossy encode in the path. That is how Gemini Flash works here; it generates raw PCM that gets a WAV header and nothing else, so the download is the first and only rendering of that performance.
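For the curious, "a WAV header and nothing else" can be sketched in a few lines of Python using the standard-library `wave` module. This is an illustration, not the service's actual code: the function name `pcm_to_wav` is hypothetical, and the 24 kHz, 16-bit, mono values are taken from the spec stated further down this page.

```python
import io
import wave

SAMPLE_RATE = 24_000  # Hz, per the spec stated on this page
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit
CHANNELS = 1          # mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container (hypothetical helper).

    The samples are written as-is; no resampling or encoding happens.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The samples going in are the samples coming out; the only thing added is the header that tells a player how to read them.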

The other WAV path is long-form: the Audiobook Studio stitches chapter takes into a single 24 kHz mono WAV, assembled in your browser. Audio editors work with WAV natively, so cuts, fades, and loudness work never stack a second compression pass on top of the voice.
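Stitching uncompressed takes is simple because the frames can be appended directly. The sketch below shows the idea in Python with the standard-library `wave` module. It is not the Audiobook Studio's code (that runs in the browser), and `stitch_wavs` is a hypothetical name; the sketch assumes every chapter shares one sample rate, sample width, and channel count.

```python
import io
import wave


def stitch_wavs(chapters: list[bytes]) -> bytes:
    """Concatenate WAV files that share one format (hypothetical helper)."""
    if not chapters:
        raise ValueError("need at least one chapter")

    out = io.BytesIO()
    fmt = None  # (channels, sample width, sample rate) of the first chapter
    with wave.open(out, "wb") as dst:
        for chapter in chapters:
            with wave.open(io.BytesIO(chapter), "rb") as src:
                this_fmt = (
                    src.getnchannels(),
                    src.getsampwidth(),
                    src.getframerate(),
                )
                if fmt is None:
                    fmt = this_fmt
                    dst.setnchannels(fmt[0])
                    dst.setsampwidth(fmt[1])
                    dst.setframerate(fmt[2])
                elif this_fmt != fmt:
                    raise ValueError("chapters must share one audio format")
                # Append the raw frames; nothing is re-encoded.
                dst.writeframes(src.readframes(src.getnframes()))
    return out.getvalue()
```

Because no frame is decoded or re-encoded along the way, the stitched file contains exactly the audio of its chapters, in order.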

Real uses

Where text-to-WAV fits a real workflow

  • Audiobook chapters: stitched exports leave the studio as 24 kHz mono WAV, a sensible master for spoken word.
  • Post-production: editors who manage loudness and mastering prefer a WAV source, so the only lossy encode is the final one.
  • Cue-driven reads: scripts with bracketed stage directions go to Gemini Flash, and Gemini Flash happens to deliver WAV.
  • Game and app voice lines: dialogue archived uncompressed stays clean through every conversion a build pipeline throws at it.
The honest specifics
  • Scripts up to 30,000 characters
  • House math: about 1,000 characters is a minute of audio
  • Output: WAV (24 kHz, 16-bit, mono) from Gemini Flash and stitched audiobook chapters
  • No watermark, yours to keep
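The house math above also gives a rough file size, since uncompressed audio grows at a fixed rate: 24,000 samples per second at 2 bytes each is 48,000 bytes per second, or about 2.9 MB per minute. A small sketch of that arithmetic follows; the function names are hypothetical, the 16-bit depth comes from the FAQ below, and the 44-byte header is an assumption based on the canonical PCM WAV layout.

```python
CHARS_PER_MINUTE = 1_000  # house math from this page
SAMPLE_RATE = 24_000      # Hz
BYTES_PER_SAMPLE = 2      # 16-bit
CHANNELS = 1              # mono
WAV_HEADER_BYTES = 44     # assumption: canonical PCM WAV header


def estimate_minutes(char_count: int) -> float:
    """Approximate audio length for a script of this many characters."""
    return char_count / CHARS_PER_MINUTE


def estimate_wav_bytes(char_count: int) -> int:
    """Approximate WAV size for a script of this many characters."""
    seconds = estimate_minutes(char_count) * 60
    audio_bytes = round(seconds * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)
    return WAV_HEADER_BYTES + audio_bytes


# A full 30,000-character script: about 30 minutes, roughly 86 MB.
print(estimate_minutes(30_000), estimate_wav_bytes(30_000))
```

These are estimates only: real takes run longer or shorter depending on the voice, the pacing, and how many cues the script carries.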
Straight answers

Text to WAV questions, answered honestly.

Which engine converts text to WAV?
Gemini Flash. It generates PCM that is wrapped as a WAV file on the way to you; the other four engines and cloned voices return MP3 instead. Pick Gemini Flash when you want WAV directly, or when the script leans on bracketed cues like [whispering], which it performs.
What sample rate and channel count is the WAV?
24 kHz, 16-bit, mono, for both a Gemini Flash take and a stitched audiobook chapter export. That is a spoken-word spec, not a music one: compact, clean, and exactly what the engine actually generates, with no upsampling theater.
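If you want to confirm the spec on a file you have downloaded, the header can be read with Python's standard-library `wave` module. A minimal sketch, with `matches_spoken_word_spec` as a hypothetical helper name and `take.wav` as a placeholder path:

```python
import io
import wave

EXPECTED = (24_000, 2, 1)  # sample rate in Hz, bytes per sample, channels


def matches_spoken_word_spec(wav_bytes: bytes) -> bool:
    """Return True if the WAV is 24 kHz, 16-bit, mono (hypothetical helper)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        actual = (wav.getframerate(), wav.getsampwidth(), wav.getnchannels())
    return actual == EXPECTED


# Usage, assuming a downloaded file at a placeholder path:
# with open("take.wav", "rb") as f:
#     print(matches_spoken_word_spec(f.read()))
```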
Is WAV noticeably better than MP3 for AI speech?
On a single listen, few people can tell. The difference appears downstream: a WAV source lets an editor cut, process, and re-encode without compounding compression losses, which is why post-production asks for it even when listeners never would.
Can I get an MP3 from the WAV later?
Of course; any converter or editor will encode it, and going from WAV to MP3 once is exactly the encode order you want. Or skip the step entirely and generate on one of the four engines that emit MP3 in the first place.

Want the longer read? Open the Text to Speech guide in the docs.

Your script is one generate away.

Paste it in, pick a voice, and the finished file downloads with full commercial rights. Free to start, no credit meter.