How to clone your voice from a short recording
Clone your voice from 10 to 20 seconds of clean audio. What to record, the consent attestation, language support, and what delete actually does.
Updated June 11, 2026
What voice cloning will do
Voice cloning is coming soon. When it ships it will live on the Your voices page in the app. You will give it a short reference clip of one person speaking, and it will build a reusable voice you can generate speech with anywhere a cloned voice is offered. This guide describes how it will work so you know what to expect; we label the feature as upcoming rather than imply it already works.
Creating a clone will take about a minute. It is never instant, and we do not pretend otherwise. Once created, your clone will speak on Sonic, and everything you generate with it will land in your private library automatically.
Recording the reference clip
About 10 to 20 seconds of clean, single-speaker audio is the sweet spot. The hard cap is 2 minutes of reference audio and 20 MB per file. More audio does not reliably make a better clone; cleaner audio does.
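To make the limits concrete, here is a minimal sketch of a pre-upload check. It is not the app's code. It assumes the 2-minute cap applies to the total reference audio and the 20 MB cap to each file, and it reads "20 MB" as 20 × 1024 × 1024 bytes. The function name and message wording are hypothetical. The 10 to 20 second sweet spot is reported as advice only and never blocks an upload.

```python
# Hypothetical pre-upload check. The limits come from this article; how the
# real app measures or enforces them is an assumption.
MAX_TOTAL_SECONDS = 120.0            # "hard cap is 2 minutes of reference audio"
MAX_FILE_BYTES = 20 * 1024 * 1024    # "20 MB per file" (assumed to mean MiB)
SWEET_SPOT_SECONDS = (10.0, 20.0)    # recommended range, not enforced


def check_reference_clips(clips: list[tuple[float, int]]) -> tuple[bool, list[str]]:
    """Check (duration_seconds, size_bytes) pairs against the limits.

    Returns (ok, messages). Hard-limit violations make ok False;
    sweet-spot advice is added to messages but never blocks.
    """
    if not clips:
        return False, ["no reference audio provided"]
    ok = True
    messages: list[str] = []
    for index, (_, size_bytes) in enumerate(clips):
        if size_bytes > MAX_FILE_BYTES:
            ok = False
            messages.append(f"file {index} exceeds 20 MB")
    total = sum(duration for duration, _ in clips)
    if total > MAX_TOTAL_SECONDS:
        ok = False
        messages.append(f"{total:.1f} s of audio exceeds the 2-minute cap")
    low, high = SWEET_SPOT_SECONDS
    if total < low:
        messages.append("under 10 s: below the recommended 10 to 20 s")
    elif total > high:
        messages.append("over 20 s: more audio does not reliably improve the clone")
    return ok, messages
```

A single 15-second, 1 MB clip passes with no messages. A 5-second clip also passes, but with one advisory message.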
You can record directly in the browser or upload a file (mp3, wav, m4a, webm, ogg, or flac). Browser recordings are automatically re-encoded to plain 24 kHz mono WAV on your machine before upload, because raw browser recorder output is unreliable with the cloning engine. If that conversion fails, the original recording is uploaded instead, so your capture is never lost.
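The article does not say how the in-browser conversion is implemented. The Python sketch below only illustrates the transformation it describes: downmix to mono, resample to 24 kHz, write a WAV file, and fall back to the original bytes if anything goes wrong. The 16-bit sample width, the naive linear resampler, and all function names are assumptions made for illustration.

```python
import io
import sys
import wave
from array import array

TARGET_RATE = 24_000  # 24 kHz mono, as the article describes


def downmix(channels: list[list[float]]) -> list[float]:
    """Average every channel into one mono track (raises on empty input)."""
    length = min(len(ch) for ch in channels)
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(length)]


def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Naive linear-interpolation resampler with no anti-aliasing filter."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate
    last = len(samples) - 1
    out = []
    for j in range(out_len):
        pos = j * step
        i = int(pos)
        frac = pos - i
        a = samples[min(i, last)]
        b = samples[min(i + 1, last)]
        out.append(a + (b - a) * frac)
    return out


def to_wav_24k_mono(channels: list[list[float]], src_rate: int) -> bytes:
    """Downmix, resample to 24 kHz, and write 16-bit PCM WAV bytes."""
    mono = resample_linear(downmix(channels), src_rate, TARGET_RATE)
    pcm = array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in mono))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples (assumed)
        w.setframerate(TARGET_RATE)
        w.writeframes(pcm.tobytes())
    return buf.getvalue()


def prepare_upload(original: bytes, channels: list[list[float]], src_rate: int) -> tuple[bytes, str]:
    """Return (payload, kind); fall back to the original recording on any failure."""
    try:
        return to_wav_24k_mono(channels, src_rate), "wav"
    except Exception:
        return original, "original"
```

The fallback catches every exception on purpose. A failed conversion should never cost you the recording, so the untouched original is uploaded instead.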
Read like you talk, not like a robot. A clip that mixes statements, questions, and a little energy gives the clone a real pitch range to learn from.
One speaker only. Background music, crosstalk, or a second voice in the clip will degrade the clone or cause the build to fail.
The prepared reading scripts
Improvising a minute of speech on the spot is harder than it sounds, so the create form offers four prepared passages in different registers. Each runs roughly 110 to 130 words, about 45 to 60 seconds at an easy pace, and deliberately mixes statements, questions, and exclamations.
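As a quick sanity check on those timings, reading time is simply words divided by pace. The 130 words-per-minute "easy pace" used here is an assumption, not a figure from the app.

```python
def seconds_to_read(words: int, words_per_minute: float = 130.0) -> float:
    """Reading time in seconds at a steady pace (130 wpm is an assumed easy pace)."""
    return words / words_per_minute * 60.0


for words in (110, 130):
    print(words, "words ->", round(seconds_to_read(words), 1), "s")
```

At 130 wpm, 110 words take about 50.8 seconds and 130 words take 60 seconds, which falls inside the stated 45 to 60 second range. Reading 110 words in 45 seconds implies a brisker pace of roughly 147 wpm.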
The consent attestation
Before you can create a clone, you must tick a required consent checkbox. It reads, in full:
"This is my voice, or I have the speaker's permission to clone it. I understand cloned voices I create are mine to use and mine to answer for."
This is enforced on the server, not just in the form: no consent, no clone. Cloning someone without their permission, or using a clone to impersonate a real person, is against the terms of use and is on you, not us.
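"Enforced on the server" means the server re-checks the attestation itself instead of trusting that the form's checkbox was ticked. The sketch below shows that pattern. The handler name, the `consent` field, and the return value are all hypothetical. It accepts only a literal boolean `True`, so a missing field or a string such as `"true"` is rejected.

```python
class ConsentRequired(Exception):
    """Raised when a create-clone request lacks the consent attestation."""


def handle_create_clone(payload: dict) -> dict:
    # Hypothetical server handler. The client form's checkbox is never trusted
    # on its own: the submitted field is checked again here.
    if payload.get("consent") is not True:
        raise ConsentRequired("consent attestation is required to create a clone")
    # Validating the reference audio and starting the build would follow here.
    return {"accepted": True}
```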
Cloned voices you create are yours to use and yours to answer for. Do not clone a voice you do not have permission to clone.
Languages
English is the tested path. The language picker also offers Spanish, French, German, Italian, Portuguese, Hindi, and Japanese, and each is labeled untested in the picker, because that is the truth: we have not verified clone quality in those languages yet. You are welcome to try them; just know you are off the tested path.
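One way to picture the picker is as a small table of languages, each with a tested flag that drives its label. The language codes and the exact label text below are assumptions. Only the language list and the rule that English alone is tested come from this article.

```python
# Hypothetical picker data: the codes and label format are illustrative.
LANGUAGES = {
    "en": ("English", True),
    "es": ("Spanish", False),
    "fr": ("French", False),
    "de": ("German", False),
    "it": ("Italian", False),
    "pt": ("Portuguese", False),
    "hi": ("Hindi", False),
    "ja": ("Japanese", False),
}


def picker_label(code: str) -> str:
    """Show the language name, marked untested unless quality has been verified."""
    name, tested = LANGUAGES[code]
    return name if tested else f"{name} (untested)"
```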
Testing, generating, and where audio saves
Each voice in your list has a Test panel: type a line (up to 500 characters) and press Generate to hear your clone speak it. Generating with a cloned voice requires sign-in and draws on the same monthly character allowance as every other engine. The resulting audio is saved server-side to your private library as an MP3, tagged with the voice's name.
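The rules for a test generation can be summarized as a few checks before anything is spent. In this sketch, `Account`, `generate_test_line`, and the library item fields are hypothetical. It also assumes that characters are counted as Python string length and that empty text is refused. The actual speech synthesis is omitted.

```python
from dataclasses import dataclass, field

MAX_TEST_CHARS = 500  # Test panel limit from the article


class GenerationRefused(Exception):
    pass


@dataclass
class Account:
    signed_in: bool
    chars_remaining: int                  # shared monthly character allowance
    library: list[dict] = field(default_factory=list)


def generate_test_line(account: Account, voice_name: str, text: str) -> dict:
    if not account.signed_in:
        raise GenerationRefused("sign-in required for cloned voices")
    if not text or len(text) > MAX_TEST_CHARS:
        raise GenerationRefused("test text must be 1 to 500 characters")
    if len(text) > account.chars_remaining:
        raise GenerationRefused("text exceeds the remaining monthly allowance")
    account.chars_remaining -= len(text)  # same pool as every other engine
    item = {"format": "mp3", "voice": voice_name, "chars": len(text)}
    account.library.append(item)          # saved to the private library
    return item
```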
Your reference clip is also kept, privately, in your account's storage, so it stays attached to the voice it built.
Deleting a clone
Delete uses an inline two-step confirm so a stray click cannot destroy a voice. When you confirm, we remove the voice from your account, delete it from the cloning engine, and remove the stored reference clip. The voice is gone; there is no trash folder or restore.
Audio you already generated with that voice stays in your library until you delete those items yourself, the same as any other generation.
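The delete flow in this section can be sketched as a two-click button plus three removals. The `Store` layout, the class and function names, and the order of the removals are assumptions. What matters is that the first click only arms the button, the second click removes the voice from the account, the cloning engine, and reference storage, and generated audio in the library is left alone.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    voices: set[str] = field(default_factory=set)           # voices on the account
    engine_voices: set[str] = field(default_factory=set)    # voices at the cloning engine
    reference_clips: set[str] = field(default_factory=set)  # stored reference audio
    library: list[dict] = field(default_factory=list)       # generated audio items


def delete_voice(store: Store, voice_id: str) -> None:
    store.voices.discard(voice_id)           # remove from the account
    store.engine_voices.discard(voice_id)    # delete from the cloning engine
    store.reference_clips.discard(voice_id)  # remove the stored reference clip
    # store.library is deliberately untouched: generated audio stays.


class DeleteButton:
    """Inline two-step confirm: the first click only arms, the second deletes."""

    def __init__(self, store: Store, voice_id: str) -> None:
        self.store = store
        self.voice_id = voice_id
        self.armed = False

    def click(self) -> bool:
        if not self.armed:
            self.armed = True  # ask for confirmation instead of deleting
            return False
        self.armed = False
        delete_voice(self.store, self.voice_id)
        return True

    def cancel(self) -> None:
        self.armed = False
```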