Formats

Every format we read and write.

These lists are generated from the same configuration the studio runs on, so they cannot drift from what the tools actually accept. If a format is missing here, the uploader will reject it as well.

Verified against the studio · Updated 2026-06-10
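Keeping every list in one configuration suggests a simple shape for that configuration. The following is a minimal Python sketch, not the studio's real code. It assumes a hypothetical `UPLOAD_RULES` table and a hypothetical `check_upload` helper. The extensions and size caps are copied from the cards on this page, and reading "MB" as 1,000,000 bytes is also an assumption.

```python
from pathlib import Path
from typing import Optional

# Hypothetical registry: a sketch, not the studio's real configuration.
# Extensions and caps are copied from the cards on this page; treating
# "MB" as 1,000,000 bytes is an assumption.
MB = 1_000_000
AUDIO_UPLOADS = {".mp3", ".wav", ".m4a", ".mp4", ".webm", ".ogg", ".flac"}

UPLOAD_RULES = {
    "speech_to_text": {"accept": AUDIO_UPLOADS, "max_bytes": 25 * MB},
    # Dubbing transcribes first, so it shares Speech to Text's list.
    "dubbing": {"accept": AUDIO_UPLOADS, "max_bytes": 25 * MB},
    # Cloning takes the same set minus MP4.
    "voice_cloning": {"accept": AUDIO_UPLOADS - {".mp4"}, "max_bytes": 20 * MB},
}


def check_upload(tool: str, filename: str, size_bytes: int) -> Optional[str]:
    """Return an error message, or None if the file passes both checks."""
    rules = UPLOAD_RULES[tool]
    ext = Path(filename).suffix.lower()  # "memo.M4A" -> ".m4a"
    if ext not in rules["accept"]:
        return f"{ext or 'no extension'} is not accepted by {tool}"
    if size_bytes > rules["max_bytes"]:
        return f"file is larger than {rules['max_bytes'] // MB} MB"
    return None
```

For example, `check_upload("voice_cloning", "interview.mp4", 5 * MB)` returns `".mp4 is not accepted by voice_cloning"`. Because the published lists and the upload check both read from one table, they cannot disagree.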

Bring audio in

What you can upload.

Three tools take audio files. Dubbing transcribes first, so it reads the same uploads as Speech to Text. Voice Cloning takes the same set minus MP4.

Speech to Text

Uploads up to 25 MB per file

Upload a recording, get an accurate transcript back from a Whisper-class model.

You can upload: .mp3, .wav, .m4a, .mp4, .webm, .ogg, .flac

We export: .txt (plain text); download any transcript as .txt.

Voice Cloning

Reference clips up to 20 MB · Up to 2 minutes of reference audio

A short reference clip becomes a reusable voice. In-browser recordings are re-encoded to WAV for you.

You can upload: .mp3, .wav, .m4a, .webm, .ogg, .flac
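The two-minute cap on reference audio can be checked from a WAV header alone, because the duration is the frame count divided by the sample rate. The following is a minimal sketch using Python's standard `wave` module. It assumes a PCM WAV clip, which matches how in-browser recordings are re-encoded, and it assumes the cap applies per clip. `reference_clip_ok` and `make_silent_wav` are hypothetical helpers, not studio APIs.

```python
import io
import wave

# Assumption: the card's "up to 2 minutes of reference audio" applies per clip.
MAX_REFERENCE_SECONDS = 120


def wav_duration_seconds(data: bytes) -> float:
    """Duration of a PCM WAV file: frame count divided by sample rate."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def reference_clip_ok(data: bytes) -> bool:
    """True if the clip fits within the reference-audio cap."""
    return wav_duration_seconds(data) <= MAX_REFERENCE_SECONDS


def make_silent_wav(seconds: int, sample_rate: int = 8_000) -> bytes:
    """Build a mono 16-bit silent WAV, handy for trying the check out."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    return buf.getvalue()
```

Compressed formats such as MP3 or M4A do not store the duration this simply and need a decoder to measure it. That is one reason re-encoding browser recordings to WAV is convenient.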

Dubbing & Translation

Uploads up to 25 MB per file · Translated scripts up to 30,000 characters

Dubbing transcribes your audio first, so it reads exactly the same uploads as Speech to Text.

You can upload: .mp3, .wav, .m4a, .mp4, .webm, .ogg, .flac

We export: .mp3, .wav

Bring words in

Manuscripts and chapters.

The Audiobook Studio takes text, not audio: paste the whole book or import a plain-text file, then export finished chapters.

Audiobook Studio

Manuscripts up to 150,000 characters

Paste a whole manuscript or import a plain-text file; export finished chapters as audio.

You can upload: .txt (plain text); paste your manuscript or import a .txt file.

We export: .mp3, .wav

Take audio out

What your generations export as.

No uploads here; you type, we generate. The engine you pick decides the container, and every export carries commercial rights.

Text to Speech

Scripts up to 30,000 characters

Type or paste a script; the engine you pick decides whether the audio comes back as MP3 or WAV.

You can upload: no files; type or paste your script.

We export: .mp3, .wav

Sound & Music

Prompts up to 600 characters

A plain-language prompt returns a finished instrumental clip.

You can upload: no files; describe the clip you want.

We export: .mp3 (MP3), finished instrumental clips from Lyria 3.

At a glance

The full matrix.

Every format against every tool, computed from the same registry as the cards above.

Format	Speech to Text	Voice Cloning	Dubbing & Translation	Audiobook Studio	Text to Speech	Sound & Music
.mp3	In	In	In, Out	Out	Out	Out
.wav	In	In	In, Out	Out	Out	-
.m4a	In	In	In	-	-	-
.mp4	In	-	In	-	-	-
.webm	In	In	In	-	-	-
.ogg	In	In	In	-	-	-
.flac	In	In	In	-	-	-
.txt	Out	-	-	In	-	-

In = accepted as upload · Out = available as export · - = not supported
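One per-tool registry can feed both the cards and the matrix. Here is a minimal sketch of that idea in Python. It assumes a hypothetical `TOOLS` table that mirrors this page's cards, and `cell` and `matrix` are illustrative helpers, not the studio's code. Each cell is derived by checking a format against a tool's input and output lists.

```python
# Hypothetical per-tool registry mirroring this page's cards (a sketch,
# not the studio's real configuration).
AUDIO_IN = [".mp3", ".wav", ".m4a", ".mp4", ".webm", ".ogg", ".flac"]

TOOLS = {
    "Speech to Text": {"in": AUDIO_IN, "out": [".txt"]},
    "Voice Cloning": {"in": [f for f in AUDIO_IN if f != ".mp4"], "out": []},
    "Dubbing & Translation": {"in": AUDIO_IN, "out": [".mp3", ".wav"]},
    "Audiobook Studio": {"in": [".txt"], "out": [".mp3", ".wav"]},
    "Text to Speech": {"in": [], "out": [".mp3", ".wav"]},
    "Sound & Music": {"in": [], "out": [".mp3"]},
}

# Row order for the matrix: every audio format, then text.
FORMATS = AUDIO_IN + [".txt"]


def cell(tool: str, fmt: str) -> str:
    """'In', 'Out', 'In, Out', or '-' for one tool/format pair."""
    marks = []
    if fmt in TOOLS[tool]["in"]:
        marks.append("In")
    if fmt in TOOLS[tool]["out"]:
        marks.append("Out")
    return ", ".join(marks) or "-"


def matrix() -> list[list[str]]:
    """Header row plus one row per format, tools in registry order."""
    header = ["Format", *TOOLS]
    rows = [[fmt, *(cell(tool, fmt) for tool in TOOLS)] for fmt in FORMATS]
    return [header, *rows]
```

Printing `"\t".join(row)` for each row of `matrix()` reproduces a tab-separated table like the one above. Adding a format to a tool's list updates its card and its matrix column together.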

Convert guides

One page per conversion.

Each common conversion has its own guide: what the format is, which devices and apps actually produce such files, and the real size and length caps, with the same registry behind every number.

Straight answers

Format questions, answered honestly.

Is there a file size limit?

Yes, and we publish the real numbers: Speech to Text and Dubbing take uploads up to 25 MB per file, Voice Cloning takes reference clips up to 20 MB and up to two minutes long, and the Audiobook Studio takes manuscripts up to 150,000 characters, which is about two and a half hours of finished narration.
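The "two and a half hours" figure is easy to sanity-check with arithmetic. The following sketch uses two rule-of-thumb assumptions, not measured studio figures: about 6 characters per word (counting the trailing space and punctuation) and a narration pace of about 160 words per minute.

```python
# Rule-of-thumb assumptions, not measured studio figures.
CHARS_PER_WORD = 6      # average word plus its trailing space/punctuation
WORDS_PER_MINUTE = 160  # typical audiobook narration pace


def narration_hours(chars: int) -> float:
    """Rough finished-audio length, in hours, for `chars` characters of text."""
    words = chars / CHARS_PER_WORD
    minutes = words / WORDS_PER_MINUTE
    return minutes / 60


# 150,000 chars / 6 = 25,000 words; / 160 wpm = 156.25 min
print(f"{narration_hours(150_000):.1f} h")  # → 2.6 h
```

A slower narrator or longer average words push the estimate toward three hours. A faster pace brings it closer to two.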

What about video files like MP4?

MP4 works today in Speech to Text and Dubbing: we read the audio track and ignore the picture. Other video formats like MOV are not supported yet; if you can export the audio as MP3, WAV, or M4A first, all three audio tools will read it.
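If your editor cannot export audio, you can do the "export the audio first" step on your own machine with ffmpeg. Here is a minimal Python sketch that builds and runs the command. It assumes ffmpeg is installed and on your PATH, and it assumes the video's audio track is AAC, which is typical for MOV files from phones. In that case the track can be copied into an .m4a file without re-encoding. The helper names are hypothetical. Nothing here is part of the studio.

```python
import subprocess
from pathlib import Path


def extract_audio_cmd(src: str) -> list[str]:
    """ffmpeg arguments that pull the audio track out of a video file.

    -vn drops the video stream; -c:a copy keeps the audio bits untouched.
    Stream copy into .m4a assumes the source audio is AAC; other codecs
    would need re-encoding (for example to MP3) instead.
    """
    dst = str(Path(src).with_suffix(".m4a"))
    return ["ffmpeg", "-y", "-i", src, "-vn", "-c:a", "copy", dst]


def extract_audio(src: str) -> None:
    """Run ffmpeg (must be installed and on PATH); raises on failure."""
    subprocess.run(extract_audio_cmd(src), check=True)
```

For example, `extract_audio("clip.mov")` writes `clip.m4a` next to the original, and any of the three audio tools can read that file.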

What format should I record in?

Whatever your recorder makes. Browser recordings usually come out as WebM and phone voice memos as M4A, and we read both. For voice cloning, in-browser recordings are re-encoded to clean WAV automatically before upload, so you do not have to convert anything yourself.

Why do some exports come back as WAV and others as MP3?

The engine decides. Gemini Flash generates raw PCM that we deliver as WAV; the other engines and your cloned voices return MP3. Stitched chapter exports from the Audiobook Studio are always 24 kHz mono WAV, and Sound & Music clips are MP3.
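Delivering raw PCM as WAV amounts to prepending a RIFF header that records the channel count, sample width, and sample rate. Here is a minimal sketch with Python's standard `wave` module. It assumes 16-bit little-endian mono PCM at 24 kHz. The 24 kHz mono figure comes from the chapter exports above, but the engine's exact PCM parameters are an assumption here. `pcm_to_wav` is a hypothetical helper.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24_000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM samples in a WAV (RIFF) container.

    Defaults assume 16-bit mono at 24 kHz. If the real parameters differ,
    pass them in; a wrong sample rate plays back at the wrong speed.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # wave writes the RIFF header and size fields
    return buf.getvalue()
```

The header adds only 44 bytes, so one second of 24 kHz 16-bit mono audio becomes a 48,044-byte file. The bulk of a WAV is the uncompressed samples themselves, which is why WAV exports are larger than MP3.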

Bring a file and see for yourself.

Drop a recording into Speech to Text and read the transcript in seconds. Free to start, no credit meter.
