Documentation
How the studio works, in plain language. Every guide describes the product as it runs today: real limits, real numbers, nothing aspirational.
Start here
Tools
The editor, choosing an engine by character, directing a read with bracketed cues, the fine controls, and Enhance.
Updated June 11, 2026
How to clone your voice from a short recording
Clone your voice from 10 to 20 seconds of clean audio. What to record, the consent attestation, language support, and what delete actually does.
Updated June 11, 2026
How to re-voice a recording as one of your cloned voices
Upload or record a take, pick a clone you built, and convert. What carries over, the real caps, and how usage meters by audio length.
Updated June 11, 2026
How to dub audio into another language
Transcribe a recording, translate the script, and re-voice it in one of 8 languages. Three real steps with an editable transcript between each.
Updated June 11, 2026
How to turn a manuscript into an audiobook
Import a manuscript, split it into chapters and segments, render each line as real audio, and export stitched chapter WAVs. Beta, documented honestly.
Updated June 11, 2026
How to transcribe audio to text
Upload or record audio and get an accurate plain-text transcript from a Whisper-class model. Formats, the 25 MB cap, and what metering really counts.
Updated June 11, 2026
How to generate background music and ambience
Describe a mood and get a finished instrumental clip from Lyria 3 (preview). Prompt tips, what the tool makes today, and what it honestly does not.
Updated June 11, 2026
Cantari Scribe
Install, pair with a one-time code, hold a key, and clean text lands at your cursor in any app. How metering, privacy, and the beta caveats actually work.
Updated June 12, 2026
Setting up Cantari Scribe, step by step
From download to your first placed sentence: the SmartScreen prompt, the pairing code, the hotkey, and what each part of the little pill means.
Updated June 12, 2026
Scribe troubleshooting: when dictation does not land
The known dead ends and their fixes: SmartScreen, busy microphones, stolen hotkeys, apps that fight paste, Signed out, and the allowance meter.
Updated June 12, 2026
Platform
The full roster by character, plus how to read the benchmark: third-party quality scores and latencies we measure ourselves.
Updated June 11, 2026
How usage, limits, and file caps work
What a character buys, when the meter resets, what happens at the limit, and the real caps on uploads and scripts.
Updated June 11, 2026
Who owns the audio you generate
You own your generations: commercial use, worldwide, no watermark, no attribution. What we store, what delete does, and your cloning responsibilities.
Updated June 11, 2026
Supported audio and file formats
Every format each tool reads and writes, with the real caps: 25 MB uploads, 20 MB clone clips, 150,000-character manuscripts, MP3 and WAV out.
Updated June 11, 2026
Plans and billing: what each plan includes and how payment works
The three plans and their real allowances, what counts toward the meter, how checkout and cancellation work, and what happens when you hit the cap.
Updated June 11, 2026
How the community works: sharing, remixing, and moderation
Share your own clips with explicit consent, remix anyone's script into your studio, join the discussions, and see exactly how moderation works.
Updated June 11, 2026
Trust & safety
No child stock voices, no cloning minors, no child-like presets. The five reasons: consent, privacy law, voiceprint law, replica law, and the abuse record.
Updated June 11, 2026
Voice cloning consent and the law
Why consent is the legal foundation of voice cloning: publicity rights, voiceprint statutes with real damages, GDPR biometrics, and what genuine permission looks like.
Updated June 11, 2026
Cloning celebrities, politicians, and other public figures
Famous voices are the most legally protected voices, not the least. The ELVIS Act, the FCC's robocall ruling, the New Hampshire deepfake fine, and where parody actually stands.
Updated June 11, 2026
Voices of the dead: who can say yes to a posthumous clone
Death does not end a voice's legal protection. Postmortem publicity rights, California's digital replica law, how estates actually license voices, and what we allow.
Updated June 11, 2026
Deepfake disclosure laws: when synthetic audio must say so
The labeling rules arriving for AI audio: the EU AI Act's transparency article, US state election-deepfake laws, platform policies, and where Cantari already stands.
Updated June 11, 2026
Voice cloning scams: what they sound like and how to protect your family
The family-emergency voice scam, the case that reached Congress, what the FTC and FBI advise, the passphrase habit worth adopting today, and what we do on our side.
Updated June 11, 2026
Voice banking: preserving your own voice, and consent done right
The best case for voice cloning: people banking their voices ahead of ALS, performers licensing replicas on their own terms, and how you will preserve a voice here when cloning ships (coming soon).
Updated June 11, 2026
Glossary
Text to speech (TTS) is software that turns written words into spoken audio. What modern neural engines actually do, and how to size a script in minutes.
Updated June 11, 2026
What is speech to text? Transcription, explained simply
Speech to text (STT) turns recorded speech into written text. How modern transcription models work, what limits their accuracy, and what they cannot do yet.
Updated June 11, 2026
What is a TTS cue? Bracketed emotion directions, explained
A cue is a stage direction in square brackets, like [whispering], that tells a voice engine how to deliver the next line. Which engines act them, and which ignore them.
Updated June 11, 2026
What is speakable text normalization?
Normalization rewrites numbers, dates, and abbreviations into the words a voice should actually say: Dr. into Doctor, 1982 into nineteen eighty-two. Why it matters for long-form audio.
Updated June 11, 2026
What is voice drift in AI narration?
Voice drift is the slow change in a synthetic narrator's tone, pacing, or energy across long-form audio. Why it happens, and how chaptered workflows keep hour seven sounding like hour one.
Updated June 11, 2026
What is TTS latency? Time to first byte vs full audio
TTS latency is how long an engine takes to speak. The number depends entirely on where you stop the clock: the first streamed byte, or the complete audio file.
Updated June 11, 2026
What is a Quality Elo score for AI voices?
Quality Elo is a listener-vote rating for voice engines, borrowed from chess: blind pairwise comparisons produce a score nobody can self-award. How to read one, with the attribution.
Updated June 11, 2026
What is voice cloning consent, and why does it matter?
Voice cloning consent is the speaker's explicit permission before their voice is cloned. What good consent practice looks like, and how it is enforced here.
Updated June 11, 2026
Dubbing vs subtitling: what is the difference?
Dubbing replaces the voice track in a new language; subtitling keeps the original audio and translates on screen. The honest trade-offs, and which one this studio does.
Updated June 11, 2026
What is zero data retention (ZDR)?
Zero data retention means an AI provider processes your request and keeps nothing: no stored prompts, no stored outputs, no training on your text. What ZDR covers, and what it does not.
Updated June 11, 2026
Looking for an API reference? There is no public API yet. API docs ship the day the API does.