Every format we read and write.
These lists are generated from the same configuration the studio runs on, so they cannot drift from what the tools actually accept. If a format is missing here, the upload will tell you the same thing.
Verified against the studio · Updated 2026-06-10
What you can upload.
Three tools take audio files. Dubbing transcribes first, so it reads the same uploads as Speech to Text; cloning takes the same set minus video containers.
Speech to Text
Uploads up to 25 MB per file
Upload a recording, get an accurate transcript back from a Whisper-class model.
- .mp3 (MP3): The everywhere format. We also read .mpga and .mpeg files as MP3.
- .wav (WAV): Uncompressed PCM audio.
- .m4a (M4A): AAC audio, what phone voice memos usually record.
- .mp4 (MP4): A video container; we read its audio track.
- .webm (WebM): What browser recorders produce.
- .ogg (OGG): Open container, usually Vorbis or Opus. We also read .oga.
- .flac (FLAC): Lossless compression; bigger files, nothing thrown away.
- .txt (Plain text): Download any transcript as .txt.
Voice Cloning
Reference clips up to 20 MB · Up to 2 minutes of reference audio
A short reference clip becomes a reusable voice. In-browser recordings are re-encoded to WAV for you.
- .mp3 (MP3): The everywhere format. We also read .mpga and .mpeg files as MP3.
- .wav (WAV): Uncompressed PCM audio.
- .m4a (M4A): AAC audio, what phone voice memos usually record.
- .webm (WebM): What browser recorders produce.
- .ogg (OGG): Open container, usually Vorbis or Opus. We also read .oga.
- .flac (FLAC): Lossless compression; bigger files, nothing thrown away.
Dubbing & Translation
Uploads up to 25 MB per file · Translated scripts up to 30,000 characters
Dubbing transcribes your audio first, so it reads exactly the same uploads as Speech to Text.
- .mp3 (MP3): The everywhere format. We also read .mpga and .mpeg files as MP3.
- .wav (WAV): Uncompressed PCM audio.
- .m4a (M4A): AAC audio, what phone voice memos usually record.
- .mp4 (MP4): A video container; we read its audio track.
- .webm (WebM): What browser recorders produce.
- .ogg (OGG): Open container, usually Vorbis or Opus. We also read .oga.
- .flac (FLAC): Lossless compression; bigger files, nothing thrown away.
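The upload rules above can be written down as a small lookup plus one check. The following is a minimal sketch, not the studio's actual code: the tool keys, the `UPLOAD_RULES` structure, and the `check_upload` function are hypothetical names chosen for illustration, and it assumes "MB" means 2^20 bytes, which this page does not specify.

```python
from pathlib import PurePath

MB = 1024 * 1024  # assumption: the page does not say whether MB is 10^6 or 2^20 bytes

# Extension aliases named on this page: .mpga/.mpeg are read as MP3, .oga as OGG.
ALIASES = {".mpga": ".mp3", ".mpeg": ".mp3", ".oga": ".ogg"}

# The six audio formats every upload tool accepts, per the lists above.
AUDIO = {".mp3", ".wav", ".m4a", ".webm", ".ogg", ".flac"}

# Hypothetical registry; keys and layout are illustrative only.
UPLOAD_RULES = {
    "speech-to-text": {"extensions": AUDIO | {".mp4"}, "max_bytes": 25 * MB},
    "voice-cloning": {"extensions": AUDIO, "max_bytes": 20 * MB},
    "dubbing": {"extensions": AUDIO | {".mp4"}, "max_bytes": 25 * MB},
}


def check_upload(tool, filename, size_bytes):
    """Return (ok, reason) for a proposed upload."""
    rules = UPLOAD_RULES.get(tool)
    if rules is None:
        return False, f"{tool} does not take file uploads"
    # Normalise case, then map aliases onto their canonical extension.
    ext = PurePath(filename).suffix.lower()
    ext = ALIASES.get(ext, ext)
    if ext not in rules["extensions"]:
        return False, f"{ext or 'no extension'} is not accepted by {tool}"
    if size_bytes > rules["max_bytes"]:
        return False, f"file exceeds the {rules['max_bytes'] // MB} MB cap"
    return True, "ok"
```

For example, `check_upload("voice-cloning", "memo.mp4", 1_000_000)` fails on the extension, because cloning does not list .mp4. The sketch checks only the file name and size; the 2-minute cap on reference audio would need the file to be decoded and is left out.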
Manuscripts and chapters.
The Audiobook Studio takes text, not audio: paste the whole book or import a plain-text file, then export finished chapters.
Audiobook Studio
Manuscripts up to 150,000 characters
Paste a whole manuscript or import a plain-text file; export finished chapters as audio.
- .txt (Plain text): Paste your manuscript or import a .txt file.
What your generations export as.
No uploads here; you type, we generate. The engine you pick decides the container, and every export carries commercial rights.
Text to Speech
Scripts up to 30,000 characters
Type or paste a script; the engine you pick decides whether the audio comes back as MP3 or WAV.
No file uploads: type or paste your script.
Sound & Music
Prompts up to 600 characters
A plain-language prompt returns a finished instrumental clip.
No file uploads: describe the clip you want.
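The text-based tools are capped by character count, not file size. A minimal sketch of a pre-submit check follows, using the caps stated on this page. The `TEXT_LIMITS` keys and the `chars_over_limit` function are hypothetical, and it assumes characters are counted the way Python's `len` counts them (Unicode code points); the page does not say how the studio counts.

```python
# Character caps as stated on this page; the keys are hypothetical labels.
TEXT_LIMITS = {
    "text-to-speech": 30_000,
    "dubbing-script": 30_000,
    "audiobook-manuscript": 150_000,
    "sound-and-music-prompt": 600,
}


def chars_over_limit(kind, text):
    """Return how many characters exceed the cap (0 if the text fits)."""
    # Assumption: one character == one Unicode code point, as len() counts.
    return max(0, len(text) - TEXT_LIMITS[kind])
```

Returning the overage, not just a yes or no, lets an editor show how much text needs trimming before the script or prompt is submitted.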
The full matrix.
Every format against every tool, computed from the same registry as the cards above.
| Format | Speech to Text | Voice Cloning | Dubbing & Translation | Audiobook Studio | Text to Speech | Sound & Music |
|---|---|---|---|---|---|---|
| .mp3 | Upload | Upload | Upload + Export | Export | Export | Export |
| .wav | Upload | Upload | Upload + Export | Export | Export | |
| .m4a | Upload | Upload | Upload | | | |
| .mp4 | Upload | | Upload | | | |
| .webm | Upload | Upload | Upload | | | |
| .ogg | Upload | Upload | Upload | | | |
| .flac | Upload | Upload | Upload | | | |
| .txt | Export | | | Upload | | |
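Computing the matrix from a single registry could look like the sketch below. The `REGISTRY` layout, the `cell` and `matrix_rows` helpers, and the "Upload" / "Export" cell labels are assumptions for illustration; only the accepts/exports facts are taken from this page.

```python
TOOLS = ["Speech to Text", "Voice Cloning", "Dubbing & Translation",
         "Audiobook Studio", "Text to Speech", "Sound & Music"]

FORMATS = [".mp3", ".wav", ".m4a", ".mp4", ".webm", ".ogg", ".flac", ".txt"]

AUDIO_IN = [".mp3", ".wav", ".m4a", ".webm", ".ogg", ".flac"]

# Hypothetical registry shape: one entry per tool, filled from this page.
REGISTRY = {
    "Speech to Text": {"accepts": AUDIO_IN + [".mp4"], "exports": [".txt"]},
    "Voice Cloning": {"accepts": AUDIO_IN, "exports": []},
    "Dubbing & Translation": {"accepts": AUDIO_IN + [".mp4"],
                              "exports": [".mp3", ".wav"]},
    "Audiobook Studio": {"accepts": [".txt"], "exports": [".mp3", ".wav"]},
    "Text to Speech": {"accepts": [], "exports": [".mp3", ".wav"]},
    "Sound & Music": {"accepts": [], "exports": [".mp3"]},
}


def cell(tool, fmt):
    """Describe what one tool does with one format ('' if nothing)."""
    entry = REGISTRY[tool]
    parts = []
    if fmt in entry["accepts"]:
        parts.append("Upload")
    if fmt in entry["exports"]:
        parts.append("Export")
    return " + ".join(parts)


def matrix_rows():
    """One row per format: the extension, then one cell per tool."""
    return [[fmt] + [cell(tool, fmt) for tool in TOOLS] for fmt in FORMATS]
```

Because both the per-tool cards and the matrix would read from the same `REGISTRY`, adding a format in one place updates every view, which is the "cannot drift" property this page describes.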
One page per conversion.
Each common conversion has its own guide: what the format is, what typically produces such files, and the real limits, with every number drawn from the same registry.
Format questions, answered honestly.
Is there a file size limit?
What about video files like MP4?
What format should I record in?
Why do some exports come back as WAV and others as MP3?
Bring a file and see for yourself.
Drop a recording into Speech to Text and read the transcript in seconds. Free to start, no credit meter.