Question 1

Which engine converts text to WAV?

Accepted Answer

Gemini Flash. It generates raw PCM, which is wrapped in a WAV container before it reaches you; the other four engines and cloned voices return MP3 instead. Pick Gemini Flash when you want WAV directly, or when the script relies on bracketed cues such as [whispering], which the engine performs.
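For readers curious what "wrapped as a WAV file" means in practice, here is a minimal sketch using Python's standard wave module. It is an illustration, not the service's actual code: the function name pcm_to_wav_bytes is hypothetical, and the defaults (24 kHz, 16-bit, mono) are simply the figures this page quotes for the WAV.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV (RIFF) container.

    Hypothetical helper for illustration only. The audio data is not
    re-encoded; a header describing it is written in front of it.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)      # 1 = mono
        wf.setsampwidth(sample_width)  # bytes per sample, 2 = 16-bit
        wf.setframerate(sample_rate)   # samples per second
        wf.writeframes(pcm)            # header sizes are patched on close
    return buf.getvalue()


# One second of 16-bit mono silence at 24 kHz, used as stand-in PCM.
silence = b"\x00\x00" * 24000
wav_bytes = pcm_to_wav_bytes(silence)
```

The point of the sketch is that the wrap is lossless: the samples pass through untouched and only a short header is added.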

Question 2

What sample rate and channel count does the WAV use?

Accepted Answer

24 kHz, 16-bit, mono, for both a single Gemini Flash take and a stitched audiobook chapter export. That is a spoken-word spec, not a music one: compact, clean, and exactly what the engine generates, with no upsampling theater.
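If you want to confirm those figures on a downloaded file, or estimate how large a take will be, a short sketch with Python's standard wave module is enough. The helper names wav_spec and pcm_bytes_per_minute are made up for this example; the arithmetic is just sample rate times bytes per sample times channels.

```python
import io
import wave


def wav_spec(data: bytes) -> dict:
    """Read the format fields from the header of a WAV file held in memory."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "bits": wf.getsampwidth() * 8,
            "channels": wf.getnchannels(),
            "seconds": wf.getnframes() / wf.getframerate(),
        }


def pcm_bytes_per_minute(sample_rate: int = 24000, bits: int = 16,
                         channels: int = 1) -> int:
    """Uncompressed audio payload per minute, excluding the WAV header."""
    return sample_rate * (bits // 8) * channels * 60
```

At 24 kHz, 16-bit, mono, pcm_bytes_per_minute() works out to 2,880,000 bytes, so roughly 2.9 MB of audio per minute of speech. To check a file on disk, pass its contents in, for example wav_spec(open("take.wav", "rb").read()), where take.wav is a placeholder name.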

Question 3

Is WAV noticeably better than MP3 for AI speech?

Accepted Answer

On a single listen, few people can tell the two apart. The difference appears downstream: WAV is uncompressed, so an editor can cut, process, and re-encode without compounding losses, which is why post-production asks for it even when listeners never would.

Question 4

Can I get an MP3 from the WAV later?

Accepted Answer

Yes. Any converter or editor will encode it, and encoding from WAV to MP3 once is the order you want, because the only lossy step happens at the very end. Or skip the step entirely and generate on one of the four engines that emit MP3 in the first place.
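As one possible way to do that conversion yourself, the sketch below shells out to the ffmpeg command-line tool from Python. It assumes ffmpeg is installed, on your PATH, and built with the libmp3lame encoder; the function names, file names, and the 128k bitrate are illustrative choices, not anything this service prescribes.

```python
import shutil
import subprocess
from pathlib import Path


def mp3_command(wav_path, mp3_path, bitrate: str = "128k") -> list:
    """Build an ffmpeg command that encodes a WAV file to MP3 in one pass."""
    return [
        "ffmpeg",
        "-y",                       # overwrite the output file if it exists
        "-i", str(wav_path),        # input WAV
        "-codec:a", "libmp3lame",   # MP3 encoder (needs an ffmpeg build with LAME)
        "-b:a", bitrate,            # target audio bitrate
        str(mp3_path),
    ]


def wav_to_mp3(wav_path, mp3_path=None, bitrate: str = "128k") -> Path:
    """Encode wav_path to MP3 and return the output path."""
    wav_path = Path(wav_path)
    mp3_path = Path(mp3_path) if mp3_path else wav_path.with_suffix(".mp3")
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(mp3_command(wav_path, mp3_path, bitrate), check=True)
    return mp3_path
```

Calling wav_to_mp3("take.wav") would write take.mp3 next to the original. Keep the WAV as the master and encode from it each time, so a take is never re-encoded from an earlier MP3.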

Text to WAV

How does Text to WAV work?

Step 1: Write or paste the script

Step 2: Pick an engine and voice

Step 3: Generate, play, download

Why WAV as the output?

Where text-to-WAV fits a real workflow

Text to WAV questions, answered honestly.

Related formats.

Your script is one generate away.