Documentation

How the studio works, in plain language. Every guide describes the product as it runs today: real limits, real numbers, nothing aspirational.

Start here

Getting started: your first generation in five minutes

Create an account, learn the free plan, make your first clip, and find it in your library. No credit card, no setup.

Updated June 11, 2026

Tools

How to use the Text to Speech studio

The editor, choosing an engine by character, directing a read with bracketed cues, the fine controls, and Enhance.

Updated June 11, 2026

How to clone your voice from a short recording

Clone your voice from 10 to 20 seconds of clean audio. What to record, the consent attestation, language support, and what delete actually does.

Updated June 11, 2026

How to re-voice a recording as one of your cloned voices

Upload or record a take, pick a clone you built, and convert. What carries over, the real caps, and how usage meters by audio length.

Updated June 11, 2026

How to dub audio into another language

Transcribe a recording, translate the script, and re-voice it in one of 8 languages. Three real steps with an editable transcript between each.

Updated June 11, 2026

How to turn a manuscript into an audiobook

Import a manuscript, split it into chapters and segments, render each line as real audio, and export stitched chapter WAVs. Beta, documented honestly.

Updated June 11, 2026

How to transcribe audio to text

Upload or record audio and get an accurate plain-text transcript from a Whisper-class model. Formats, the 25 MB cap, and what metering really counts.

Updated June 11, 2026

How to generate background music and ambience

Describe a mood and get a finished instrumental clip from Lyria 3 (preview). Prompt tips, what the tool makes today, and what it honestly does not.

Updated June 11, 2026

Cantari Scribe

Cantari Scribe: push-to-talk dictation for Windows

Install, pair with a one-time code, hold a key, and clean text lands at your cursor in any app. How metering, privacy, and the beta caveats actually work.

Updated June 12, 2026

Setting up Cantari Scribe, step by step

From download to your first placed sentence: the SmartScreen prompt, the pairing code, the hotkey, and what each part of the little pill means.

Updated June 12, 2026

Scribe troubleshooting: when dictation does not land

The known dead ends and their fixes: SmartScreen, busy microphones, stolen hotkeys, apps that fight paste, Signed out, and the allowance meter.

Updated June 12, 2026

Platform

The five engines, and which one to pick

The full roster by character, plus how to read the benchmark: third-party quality scores and latencies we measure ourselves.

Updated June 11, 2026

How usage, limits, and file caps work

What a character buys, when the meter resets, what happens at the limit, and the real caps on uploads and scripts.

Updated June 11, 2026

Who owns the audio you generate

You own your generations: commercial use, worldwide, no watermark, no attribution. What we store, what delete does, and your cloning responsibilities.

Updated June 11, 2026

Supported audio and file formats

Every format each tool reads and writes, with the real caps: 25 MB uploads, 20 MB clone clips, 150,000-character manuscripts, MP3 and WAV out.

Updated June 11, 2026

Plans and billing: what each plan includes and how payment works

The three plans and their real allowances, what counts toward the meter, how checkout and cancellation work, and what happens when you hit the cap.

Updated June 11, 2026

How the community works: sharing, remixing, and moderation

Share your own clips with explicit consent, remix anyone's script into your studio, join the discussions, and see exactly how moderation works.

Updated June 11, 2026

Trust & safety

Why Cantari has no children's voices

No child stock voices, no cloning minors, no child-like presets. The five reasons: consent, privacy law, voiceprint law, replica law, and the abuse record.

Updated June 11, 2026

Voice cloning consent and the law

Why consent is the legal foundation of voice cloning: publicity rights, voiceprint statutes with real damages, GDPR biometrics, and what genuine permission looks like.

Updated June 11, 2026

Cloning celebrities, politicians, and other public figures

Famous voices are the most legally protected voices, not the least. The ELVIS Act, the FCC's robocall ruling, the New Hampshire deepfake fine, and where parody actually stands.

Updated June 11, 2026

Voices of the dead: who can say yes to a posthumous clone

Death does not end a voice's legal protection. Postmortem publicity rights, California's digital replica law, how estates actually license voices, and what we allow.

Updated June 11, 2026

Deepfake disclosure laws: when synthetic audio must say so

The labeling rules arriving for AI audio: the EU AI Act's transparency article, US state election-deepfake laws, platform policies, and where Cantari already stands.

Updated June 11, 2026

Voice cloning scams: what they sound like and how to protect your family

The family-emergency voice scam, the case that reached Congress, what the FTC and FBI advise, the passphrase habit worth adopting today, and what we do on our side.

Updated June 11, 2026

Voice banking: preserving your own voice, and consent done right

The best case for voice cloning: people banking their voices ahead of ALS, performers licensing replicas on their own terms, and how you will preserve a voice here when cloning ships (coming soon).

Updated June 11, 2026

Glossary

What is text to speech? TTS, explained simply

Text to speech (TTS) is software that turns written words into spoken audio. What modern neural engines actually do, and how to size a script in minutes.

Updated June 11, 2026

What is speech to text? Transcription, explained simply

Speech to text (STT) turns recorded speech into written text. How modern transcription models work, what limits their accuracy, and what they cannot do yet.

Updated June 11, 2026

What is a TTS cue? Bracketed emotion directions, explained

A cue is a stage direction in square brackets, like [whispering], that tells a voice engine how to deliver the next line. Which engines act on cues, and which ignore them.

Updated June 11, 2026

What is speakable text normalization?

Normalization rewrites numbers, dates, and abbreviations into the words a voice should actually say: Dr. into Doctor, 1982 into nineteen eighty-two. Why it matters for long-form audio.

Updated June 11, 2026

What is voice drift in AI narration?

Voice drift is the slow change in a synthetic narrator's tone, pacing, or energy across long-form audio. Why it happens, and how chaptered workflows keep hour seven sounding like hour one.

Updated June 11, 2026

What is TTS latency? Time to first byte vs full audio

TTS latency is how long an engine takes to speak. The number depends entirely on where you stop the clock: the first streamed byte, or the complete audio file.

Updated June 11, 2026

What is a Quality Elo score for AI voices?

Quality Elo is a listener-vote rating for voice engines, borrowed from chess: blind pairwise comparisons produce a score nobody can self-award. How to read one, with the attribution.

Updated June 11, 2026

What is voice cloning consent, and why does it matter?

Voice cloning consent is the speaker's explicit permission before their voice is cloned. What good consent practice looks like, and how it is enforced here.

Updated June 11, 2026

Dubbing vs subtitling: what is the difference?

Dubbing replaces the voice track in a new language; subtitling keeps the original audio and translates on screen. The honest trade-offs, and which one this studio does.

Updated June 11, 2026

What is zero data retention (ZDR)?

Zero data retention means an AI provider processes your request and keeps nothing: no stored prompts, no stored outputs, no training on your text. What ZDR covers, and what it does not.

Updated June 11, 2026

Looking for an API reference? There is no public API yet. API docs ship the day the API does.