How to Do a Voiceover on TikTok

TikTok can record your voice or read text aloud in-app. Here is how both work, and how to use a natural voice that does not sound robotic.

Step by step

How to do a voiceover on TikTok

  1. To use a generated voice, start in Cantari: paste your script, pick a voice, generate a natural read in seconds, and export the audio file (MP3 on every plan, WAV on paid plans). No microphone, no retakes.

  2. To record your own voice in TikTok instead, open the TikTok editor after recording or uploading your clips, tap the editing tools, and find Voiceover (the microphone icon).

  3. Drag the playhead to where the narration should start, press and hold the record button while you speak, and release to stop.

  4. To use TikTok's built-in text to speech, add a text element, tap it, and choose Text to speech. TikTok then reads the text aloud in one of its preset voices.

TikTok's own text-to-speech voices are a small, recognizable set. To use a natural, distinctive voice, generate the line in Cantari, assemble the video with that audio in CapCut (the video editor from TikTok's parent company, ByteDance), and post the finished video to TikTok.

Straight answers

Voiceover on TikTok, answered.

Can I use a custom voice instead of TikTok's text to speech?
Yes, but not directly inside TikTok. Generate the voice as an audio file, build the video in an editor like CapCut with that file on the audio track, then upload the finished video to TikTok. TikTok's in-app voices are limited to its presets.
Why does the TikTok text-to-speech voice sound the same as everyone else's?
Because everyone picks from the same handful of preset voices. If you want a natural voice that is your own, generate it separately and add it in your editor before posting.
Does a voiceover replace the original sound?
No. TikTok layers the voiceover over your clips and lets you lower or mute the original clip volume in the editor, so you control the mix.

Your TikTok voiceover starts with the voice.

Generate a natural read from your script in seconds, export it, and add it the way this guide shows. No microphone required.